On 2025-10-03 00:31, David Gibson wrote:
On Thu, Oct 02, 2025 at 08:34:05PM -0400, Jon Maloy wrote:
We add a cache table to keep track of the contents of the kernel ARP and NDP tables. The table is fed from the just introduced netlink based
[...]
+void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, + const uint8_t *mac) +{ + struct neigh_table *t = &neigh_table; + struct neigh_table_entry *e; + ssize_t slot; + + /* MAC address might change sometimes */ + e = fwd_neigh_table_find(c, addr); + if (e) { + if (inany_equals(addr, &inany_from_v4(c->ip4.guest_gw))) + return; + if (inany_equals(addr, (union inany_addr *)&c->ip6.guest_gw)) + return;
This doesn't make sense to me. From the way its looked up in 4/9, the IP addresses in the table are as seen by the host, not by the guest (we look up the table *after* applying NAT). Which makes guest_gw not meaningful here.
You _do_ need to handle the case that addr is loopback (which guest_gw might be translated to), and that's handled by your dummy entries.
The other case we might need to consider here is if the (translated) address is the host's IP. Do we want to give out_tap_addr for that case? Or the host's real MAC on the template interface? If the latter will that be in the host ARP table, or would we have to handle it specially?
This is my current understanding of this: 1) Adding the loopback entries to the neigbour table is harmless, but unnecessary. fwd_neigh_mac_get() will always return our_tap_mac when it doesn't find an entry in the table, which is the case here. This addition was a mistake. 2) Regarding the default gw, we have two cases: 2.1) guest_gw IP is the same as the host_gw IP: In this case, we'll have an event trying to inserting an entry into the table with the host_gw's real mac address. In v11, this mapping was announced to the guest, but later contradicted by all ARP/NDP messages he receives, because they all come from PASTA/PASST and uses our_tap_mac as source mac in the message header. This is probably harmless, but causes confusion in the guest and a warning wireshark. In v12, I therefore chose to add the entry but suppress the announcements for the gw, and also the updates of the mac fields from subsequent events. This is not right either. We either have to be consistent, and add the real host_gw mac in all messages sent from the purported gw address, or we just as consistently use our_tap_mac. I think the latter is simpler with less consequences for the code. I think we should just simply suppress all events from the host gw in such cases. (Here I have a question: I don't see any NAT mapping making it possible to communicate between the guest and the the host gw when guest gw IP and host gw IP are are equal. Shouldn't there be one?) 2.2) guest_gw IP is different from host_gw IP: This case is simpler. We just treat the host_gw as any other local host, add and update its entry at new events, and pass it on to the guest as an announcement at first contact, consistently using that host's true mapping. 3) Regarding the host address, we also have two cases: 3.1) guest IP is the same as the host IP (on the default interface): In this case the guest can only reach the host by using the guest_gw IP (which lands on the host), some or the other IPs on the host, or the map_host_loopback IP, as far as I understand. The host IP will never enter the neigbour table by ARP/NDP event, and there is no reason for us to add a dummy for it. 3.2) guest IP differs from the host IP. In this case we let fwd_neigh_table_init() add an entry for the host as if it were any other host, using the IP and mac addresses from the template interface. Conclusion: - We use nat_inbound() instead of nat_outbound() before consulting the table, like you suggest. - We need to manually add an entry representing the host in the case the host IP differs from the guest IP. This entry is announced. In the case the IPs are the same we don't add any entry. - Local host entries are added by ARP/NDP events, but only if nat_inbound() doesn't translate the IP address (will that ever happen?). We suppress all events from the host gw in case it has the same IP as the guest gw (unless this is covered by the nat_inbound() check?). - We may want a NAT mapping making it possible to reach the host gw in the case it has the same IP as the guest gw. This has no effect on our neighbour table. If you agree with my above analysis I can go ahead and post a v13 of the series early next week, including the above changes and fixes for your other comments to the series. ///jon
+ + memcpy(e->mac, mac, ETH_ALEN); + return; + } + + e = t->free; + if (!e) { + debug("Failed to allocate neighbour table entry"); + return; + } + t->free = e->next; + + slot = neigh_table_slot(c, addr); + e->next = t->slots[slot]; + t->slots[slot] = e; + + memcpy(&e->addr, addr, sizeof(*addr)); + memcpy(e->mac, mac, ETH_ALEN); +} + +/** + * fwd_neigh_table_free() - Remove an entry from a slot and add it to free list + * @c: Execution context + * @addr: IP address used to find the slot for the entry + */ +void fwd_neigh_table_free(const struct ctx *c, const union inany_addr *addr) +{ + ssize_t slot = neigh_table_slot(c, addr); + struct neigh_table *t = &neigh_table; + struct neigh_table_entry *e, **prev; + + prev = &t->slots[slot]; + e = t->slots[slot]; + while (e && !inany_equals(&e->addr, addr)) { + prev = &e->next; + e = e->next; + } + if (!e) + return; + + *prev = e->next; + e->next = t->free; + t->free = e; + memset(&e->addr, 0, sizeof(*addr)); + memset(e->mac, 0, ETH_ALEN);
As Stefano noted earlier, 0xff is probably a better placeholder here, since 00:00:00:00:00:00 is a valid MAC.
+} + +/** + * fwd_neigh_mac_get() - Look up MAC address in the ARP/NDP table + * @c: Execution context + * @addr: Neighbour IP address used as lookup key + * @mac: Buffer for Ethernet MAC to return, found or default value. + * + * Return: true if real MAC found, false if not found or if failure + */ +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, + uint8_t *mac) +{ + const struct neigh_table_entry *e = fwd_neigh_table_find(c, addr); + + if (e) + memcpy(mac, e->mac, ETH_ALEN); + else + memcpy(mac, c->our_tap_mac, ETH_ALEN); + + return !!e; +} + +/** + * fwd_neigh_table_init() - Initialize the neighbour table + * @c: Execution context + */ +void fwd_neigh_table_init(const struct ctx *c) +{ + struct neigh_table *t = &neigh_table; + const uint8_t *omac = c->our_tap_mac; + struct neigh_table_entry *e; + int i; + + memset(t, 0, sizeof(*t)); + for (i = 0; i < NEIGH_TABLE_SIZE; i++) { + e = &t->entries[i]; + e->next = t->free; + t->free = e; + } + + /* These addresses must always map to our own MAC address */ + fwd_neigh_table_update(c, &inany_loopback4, omac); + fwd_neigh_table_update(c, &inany_loopback6, omac);
These make sense.
+ fwd_neigh_table_update(c, &inany_from_v4(c->ip4.guest_gw), omac); + fwd_neigh_table_update(c, (union inany_addr *)&c->ip6.guest_gw, omac);
But these don't. For two reasons: * As noted guest_gw is only meaningful guest side, but the table is based on host side addresses. * These will be no-ops, since you explicitly exclude these addresses in fwd_neigh_table_update().
+} + /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral * * Work out what ports the host thinks are emphemeral and record it for later diff --git a/fwd.h b/fwd.h index 65c7c96..6ca743c 100644 --- a/fwd.h +++ b/fwd.h @@ -56,5 +56,12 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, const struct flowside *ini, struct flowside *tgt); uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, const struct flowside *ini, struct flowside *tgt); +void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr, + const uint8_t *mac); +void fwd_neigh_table_free(const struct ctx *c, + const union inany_addr *addr); +bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, + uint8_t *mac); +void fwd_neigh_table_init(const struct ctx *c);
#endif /* FWD_H */ diff --git a/netlink.c b/netlink.c index 3fe2fdd..4be5fcf 100644 --- a/netlink.c +++ b/netlink.c @@ -1192,10 +1192,13 @@ static void nl_neigh_msg_read(const struct ctx *c, struct nlmsghdr *nh) inany_from_af(&addr, ndm->ndm_family, dst); inany_ntop(dst, ip_str, sizeof(ip_str));
- if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) + if (nh->nlmsg_type == RTM_NEWNEIGH && ndm->ndm_state & NUD_VALID) { trace("neigh table update: %s / %s", ip_str, mac_str); - else + fwd_neigh_table_update(c, &addr, mac); + } else { trace("neigh table delete: %s / %s", ip_str, mac_str); + fwd_neigh_table_free(c, &addr); + } }
/** -- 2.50.1