On Sun, 5 Oct 2025 11:52:42 -0400
Jon Maloy
On 2025-10-03 00:31, David Gibson wrote:
On Thu, Oct 02, 2025 at 08:34:05PM -0400, Jon Maloy wrote:
We add a cache table to keep track of the contents of the kernel ARP and NDP tables. The table is fed from the just introduced netlink based
[...]
+void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, + const uint8_t *mac) +{ + struct neigh_table *t = &neigh_table; + struct neigh_table_entry *e; + ssize_t slot; + + /* MAC address might change sometimes */ + e = fwd_neigh_table_find(c, addr); + if (e) { + if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw))) + return; + if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw)) + return;
This doesn't make sense to me. From the way its looked up in 4/9, the IP addresses in the table are as seen by the host, not by the guest (we look up the table *after* applying NAT). Which makes guest_gw not meaningful here.
You _do_ need to handle the case that addr is loopback (which guest_gw might be translated to), and that's handled by your dummy entries.
The other case we might need to consider here is if the (translated) address is the host's IP. Do we want to give out_tap_addr for that case? Or the host's real MAC on the template interface? If the latter will that be in the host ARP table, or would we have to handle it specially?
This is my current understanding of this:
1) Adding the loopback entries to the neigbour table is harmless, but unnecessary.
Loopback entries yes, but I thought you would add the equivalent guest-side entries.
fwd_neigh_mac_get() will always return our_tap_mac when it doesn't find an entry in the table, which is the case here.
Kind of, because while with IPv4: $ pasta sh -c 'ip n a 127.0.0.1 dev lo lladdr 00:ba:db:ad:1d:ea; ip n' 0.0.0.0 dev lo lladdr 00:ba:db:ad:1d:ea PERMANENT that won't match anything, with IPv6: $ pasta sh -c 'ip n a ::1 dev lo lladdr 00:ba:db:ad:1d:ea; ip n' ::1 dev lo lladdr 00:ba:db:ad:1d:ea PERMANENT that would match the loopback address. I don't think we actually need to care about this, though.
This addition was a mistake.
2) Regarding the default gw, we have two cases:
2.1) guest_gw IP is the same as the host_gw IP:
I'm afraid we're losing a substantial amount of time on ambiguous terminology, so let me clarify what my understanding of these terms is: a. "guest_gw IP" -> any of the two IP addresses, in the guest, possibly mapping to the host ...that's because the default gateway address in the guest doesn't really matter, so I don't think you're referring to the IP address of the default gateway in the guest. b. "host_gw IP" -> the IP address of the default gateway on the host
In this case, we'll have an event trying to inserting an entry into the table with the host_gw's real mac address. In v11, this mapping was announced to the guest, but later contradicted by all ARP/NDP messages he receives, because they all come from PASTA/PASST and uses our_tap_mac as source mac in the message header. This is probably harmless, but causes confusion in the guest and a warning wireshark. In v12, I therefore chose to add the entry but suppress the announcements for the gw, and also the updates of the mac fields from subsequent events. This is not right either. We either have to be consistent, and add the real host_gw mac in all messages sent from the purported gw address, or we just as consistently use our_tap_mac. I think the latter is simpler with less consequences for the code.
I also think we should use our_tap_mac for any message coming from the host itself: they're not coming from the gateway (as seen by the host), which we are effectively shadowing.
I think we should just simply suppress all events from the host gw in such cases.
This is generally fine, I think, but it comes with the additional problem that, should we ever make the map_guest_addr or map_host_loopback parameters configurable at runtime, we'll miss the entry at that point and we'll need to rework this. If it doesn't cause substantial effort, I would suggest that, if entries always refer to IP addresses as seen from the guest (David's suggestion), you would actually add the entries mapping to the host to the table, resolving to our_tap_mac. Those addresses are *not* c->ip[46].guest_gw though. They are c->ip[46].map_guest_addr and c->ip[46].map_host_loopback. See nat_outbound() for an overview of translated addresses.
(Here I have a question: I don't see any NAT mapping making it possible to communicate between the guest and the the host gw when guest gw IP and host gw IP are are equal. Shouldn't there be one?)
Right, there's no such parameter at the moment, because the assumption is that, if you need to reach the default gateway of the host, you would specify a non-default --map-host-loopback (possibly 'none'). On the other hand we plan to make this more generic, perhaps as part of https://bugs.passt.top/show_bug.cgi?id=140, perhaps as part of another effort. The flow table should enable arbitrary NATs, and was implemented with that in mind. We just don't have a suitable configuration front-end at the moment, even though command line parameters would probably be enough to cover most common cases.
2.2) guest_gw IP is different from host_gw IP: This case is simpler. We just treat the host_gw as any other local host, add and update its entry at new events, and pass it on to the guest as an announcement at first contact, consistently using that host's true mapping.
3) Regarding the host address, we also have two cases:
3.1) guest IP is the same as the host IP (on the default interface):
My interpretation of the terms here is: c. "guest IP" -> one of the configured guest IP addresses (c->ip[46].addr) d. "host IP" -> one of the IP addresses configured on the host (not in struct ctx, but two of those will become c->ip[46].addr by default)
In this case the guest can only reach the host by using the guest_gw IP (which lands on the host),
Not according to my interpretation of "guest_gw IP" (a. above).
some or the other IPs on the host, or the map_host_loopback IP, as far as I understand.
Yes. And c->ip[46].map_guest_addr on top of them.
The host IP will never enter the neigbour table by ARP/NDP event, and there is no reason for us to add a dummy for it.
If none of the host IP addresses are visible to the guest, right, there's no need for any matching entry.
3.2) guest IP differs from the host IP. In this case we let fwd_neigh_table_init() add an entry for the host as if it were any other host, using the IP and mac addresses from the template interface.
This assumes that we'll use MAC addresses assigned to host interfaces, instead of using our_tap_mac. I think it looks more elegant than the alternative, but mind that there can be 2^32 - 1 interfaces on Linux (see dev_index_reserve() in net/core/dev.c), so that might flood the table right away. Maybe our_tap_mac isn't that bad after all.
Conclusion: - We use nat_inbound() instead of nat_outbound() before consulting the table, like you suggest.
If this implies standardising on guest-side addresses, it makes sense to me as well.
- We need to manually add an entry representing the host in the case the host IP differs from the guest IP. This entry is announced. In the case the IPs are the same we don't add any entry.
I don't think host and guest IP addresses are really related to this point. It's rather about the two sets of addresses that might map to the host.
- Local host entries are added by ARP/NDP events, but only if nat_inbound() doesn't translate the IP address (will that ever happen?). We suppress all events from the host gw in case it has the same IP as the guest gw (unless this is covered by the nat_inbound() check?).
I wouldn't hide any address like that, to make this a bit more future-proof.
- We may want a NAT mapping making it possible to reach the host gw in the case it has the same IP as the guest gw. This has no effect on our neighbour table.
If you agree with my above analysis I can go ahead and post a v13 of the series early next week, including the above changes and fixes for your other comments to the series.
...the rest sounds good to me. -- Stefano