On Thu, 9 Oct 2025 14:51:02 +1100
David Gibson wrote:
On Wed, Oct 08, 2025 at 12:01:18PM +0200, Stefano Brivio wrote:
On Wed, 8 Oct 2025 11:27:32 +1100 David Gibson wrote:
On Tue, Oct 07, 2025 at 12:10:22PM +0200, Stefano Brivio wrote:
On Fri, 3 Oct 2025 14:41:56 +1000 David Gibson wrote:
On Thu, Oct 02, 2025 at 08:34:06PM -0400, Jon Maloy wrote:
ARP announcements and unsolicited NAs should be handled with caution because of the risk of malicious users emitting them to disrupt network communication.
There is, however, one case where we know it is legitimate and safe for us to send out such messages: the one time we switch from using ctx->own_tap_mac to a MAC address received via the recently added neighbour subscription function. Later changes to the MAC address of a host in an existing entry cannot be fully trusted, so we refrain from sending announcements in such cases.
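The rule in the paragraph above reduces to one comparison per table update: announce only when the stored MAC moves away from our own tap MAC, and stay silent on any later change. A minimal C sketch of that decision, assuming a simplified entry that only holds a MAC; `struct neigh_entry`, `neigh_should_announce()` and `neigh_update()` are hypothetical names for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical, simplified neighbour entry: only the MAC matters here */
struct neigh_entry {
	unsigned char mac[MAC_LEN];
};

/* Announce only on the switch from our own tap MAC to a learned MAC:
 * changes to an already learned MAC are not trusted enough. */
static bool neigh_should_announce(const unsigned char *old_mac,
				  const unsigned char *new_mac,
				  const unsigned char *own_tap_mac)
{
	return !memcmp(old_mac, own_tap_mac, MAC_LEN) &&
	       memcmp(new_mac, own_tap_mac, MAC_LEN) != 0;
}

/* Store the new MAC; return true if the caller should send an ARP
 * announcement (IPv4) or an unsolicited NA (IPv6) to the guest. */
static bool neigh_update(struct neigh_entry *e, const unsigned char *new_mac,
			 const unsigned char *own_tap_mac)
{
	bool announce;

	assert(e && new_mac && own_tap_mac);
	announce = neigh_should_announce(e->mac, new_mac, own_tap_mac);
	memcpy(e->mac, new_mac, MAC_LEN);
	return announce;
}
```

In this sketch `neigh_update()` returns true only on a transition away from our own tap MAC, so a host whose MAC later changes again is updated silently.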
When sending this type of message, we notice that the guest accepts the update, but shortly afterwards asks for confirmation in the form of a regular ARP/NS request. This request is answered with the new value, and we get exactly the effect we wanted.
This commit adds this functionality.
Signed-off-by: Jon Maloy
---
v10: - Made small changes based on feedback from David G.
v11: - Moved from 'Gratuitous ARP reply' model to 'ARP Announcement' model.
v12: - Excluded loopback and default GW addresses from the ARP/NA
       announcements sent to the guest
---
 arp.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 arp.h |  2 ++
 fwd.c | 16 ++++++++++++++++
 ndp.c | 10 ++++++++++
 ndp.h |  1 +
 5 files changed, 71 insertions(+)
diff --git a/arp.c b/arp.c
index ad088b1..b08780f 100644
--- a/arp.c
+++ b/arp.c
@@ -146,3 +146,45 @@ void arp_send_init_req(const struct ctx *c)
 	debug("Sending initial ARP request for guest MAC address");
 	tap_send_single(c, &req, sizeof(req));
 }
+
+/**
+ * arp_announce() - Send an ARP announcement for an IPv4 host
+ * @c:		Execution context
+ * @ip:		IPv4 address we announce as owned by @mac
+ * @mac:	MAC address to advertise for @ip
+ */
+void arp_announce(const struct ctx *c, struct in_addr *ip,
+		  const unsigned char *mac)
+{
+	char ip_str[INET_ADDRSTRLEN];
+	char mac_str[ETH_ADDRSTRLEN];
+	struct {
+		struct ethhdr eh;
+		struct arphdr ah;
+		struct arpmsg am;
+	} __attribute__((__packed__)) annc;
+
+	/* Ethernet header */
+	annc.eh.h_proto = htons(ETH_P_ARP);
+	memcpy(annc.eh.h_dest, MAC_BROADCAST, sizeof(annc.eh.h_dest));
+	memcpy(annc.eh.h_source, mac, sizeof(annc.eh.h_source));
+
+	/* ARP header */
+	annc.ah.ar_op = htons(ARPOP_REQUEST);
+	annc.ah.ar_hrd = htons(ARPHRD_ETHER);
+	annc.ah.ar_pro = htons(ETH_P_IP);
+	annc.ah.ar_hln = ETH_ALEN;
+	annc.ah.ar_pln = 4;
+
+	/* ARP message */
+	memcpy(annc.am.sha, mac, sizeof(annc.am.sha));
+	memcpy(annc.am.sip, ip, sizeof(annc.am.sip));
+	memcpy(annc.am.tha, MAC_BROADCAST, sizeof(annc.am.tha));
+	memcpy(annc.am.tip, ip, sizeof(annc.am.tip));
As noted in several earlier revisions, having sip == tip (but with different mac addresses) looks odd. Is that what the RFCs say to do for ARP announcements?
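For reference, RFC 5227 (section 2.3) defines an ARP Announcement as identical to an ARP Probe, except that both the sender and target IP address fields contain the address being announced, so sip == tip is what the RFC describes. For the Probe (section 2.1.1) it also says the target hardware address field is ignored and SHOULD be set to all zeroes. A self-contained sketch of such a frame, using plain stdint types instead of passt's `struct ethhdr`/`struct arphdr`/`struct arpmsg`; `struct arp_annc_frame` and `arp_annc_build()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Ethernet header + ARP for IPv4 over Ethernet, as laid out on the wire */
struct arp_annc_frame {
	uint8_t  eth_dst[6];
	uint8_t  eth_src[6];
	uint16_t eth_proto;	/* 0x0806: ARP */
	uint16_t ar_hrd;	/* 1: Ethernet */
	uint16_t ar_pro;	/* 0x0800: IPv4 */
	uint8_t  ar_hln;	/* 6 */
	uint8_t  ar_pln;	/* 4 */
	uint16_t ar_op;		/* 1: request */
	uint8_t  sha[6];
	uint8_t  sip[4];
	uint8_t  tha[6];
	uint8_t  tip[4];
} __attribute__((__packed__));

static_assert(sizeof(struct arp_annc_frame) == 42, "Ethernet + ARP size");

/* Build an RFC 5227-style announcement of @ip as owned by @mac */
static void arp_annc_build(struct arp_annc_frame *f, const struct in_addr *ip,
			   const uint8_t *mac)
{
	memset(f, 0, sizeof(*f));	/* also leaves tha all zeroes */

	memset(f->eth_dst, 0xff, sizeof(f->eth_dst));	/* broadcast */
	memcpy(f->eth_src, mac, sizeof(f->eth_src));
	f->eth_proto = htons(0x0806);

	f->ar_hrd = htons(1);
	f->ar_pro = htons(0x0800);
	f->ar_hln = 6;
	f->ar_pln = 4;
	f->ar_op = htons(1);

	memcpy(f->sha, mac, sizeof(f->sha));
	memcpy(f->sip, ip, sizeof(f->sip));	/* sender IP: announced address */
	memcpy(f->tip, ip, sizeof(f->tip));	/* target IP: the same address */
}
```

Unlike the patch, which copies MAC_BROADCAST into `tha`, this sketch leaves `tha` zeroed as RFC 5227 recommends; per the RFC, the field is ignored by receivers either way.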
+	inet_ntop(AF_INET, ip, ip_str, sizeof(ip_str));
+	eth_ntop(mac, mac_str, sizeof(mac_str));
+	debug("Announcing ARP for %s / %s", ip_str, mac_str);
+
+	tap_send_single(c, &annc, sizeof(annc));
+}
diff --git a/arp.h b/arp.h
index d5ad0e1..4862e90 100644
--- a/arp.h
+++ b/arp.h
@@ -22,5 +22,7 @@ struct arpmsg {

 int arp(const struct ctx *c, struct iov_tail *data);
 void arp_send_init_req(const struct ctx *c);
+void arp_announce(const struct ctx *c, struct in_addr *ip,
+		  const unsigned char *mac);

 #endif /* ARP_H */
diff --git a/fwd.c b/fwd.c
index c34bb1c..ade97c8 100644
--- a/fwd.c
+++ b/fwd.c
@@ -26,6 +26,8 @@
 #include "passt.h"
 #include "lineread.h"
 #include "flow_table.h"
+#include "arp.h"
+#include "ndp.h"

 /* Empheral port range: values from RFC 6335 */
 static in_port_t fwd_ephemeral_min = (1 << 15) + (1 << 14);
@@ -140,6 +142,20 @@ void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,

 	memcpy(&e->addr, addr, sizeof(*addr));
 	memcpy(e->mac, mac, ETH_ALEN);
+
+	if (inany_equals(addr, &inany_loopback4))
+		return;
+	if (inany_equals(addr, &inany_loopback6))
+		return;
Since you need these explicit checks anyway, there's not much point to the dummy entries you created - you could exit on these addresses before even looking up the table.
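Filtering the loopback addresses before looking anything up, as suggested above, makes dummy table entries for them unnecessary. A minimal sketch, assuming a 16-byte address type that stores IPv4 as IPv4-mapped IPv6 (similar in spirit to passt's `union inany_addr`, but not its actual definition) and a small fixed-size array instead of a real table; `struct addr16`, `neigh_lookup()` and `neigh_update()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAC_LEN		6
#define NEIGH_MAX	8

/* Hypothetical 16-byte address: IPv4 stored as ::ffff:a.b.c.d */
struct addr16 {
	unsigned char a[16];
};

struct neigh {
	bool used;
	struct addr16 addr;
	unsigned char mac[MAC_LEN];
};

static struct neigh table[NEIGH_MAX];

static const struct addr16 loopback4 = { { [10] = 0xff, [11] = 0xff,
					   [12] = 127, [15] = 1 } };
static const struct addr16 loopback6 = { { [15] = 1 } };

static bool addr_equal(const struct addr16 *x, const struct addr16 *y)
{
	return !memcmp(x->a, y->a, sizeof(x->a));
}

static struct neigh *neigh_lookup(const struct addr16 *addr)
{
	for (size_t i = 0; i < NEIGH_MAX; i++) {
		if (table[i].used && addr_equal(&table[i].addr, addr))
			return &table[i];
	}
	return NULL;
}

/* Return false if @addr is never tracked, or if the table is full */
static bool neigh_update(const struct addr16 *addr, const unsigned char *mac)
{
	struct neigh *e;

	assert(addr && mac);

	/* Bail out before any lookup: loopback entries are never learnt */
	if (addr_equal(addr, &loopback4) || addr_equal(addr, &loopback6))
		return false;

	if (!(e = neigh_lookup(addr))) {
		for (size_t i = 0; i < NEIGH_MAX && !e; i++) {
			if (!table[i].used)
				e = &table[i];
		}
		if (!e)
			return false;
		e->used = true;
		e->addr = *addr;
	}
	memcpy(e->mac, mac, MAC_LEN);
	return true;
}
```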
I guess those entries make sense if we can drop all these checks as a result. I think we should be able to.
We couldn't in this version, because that might have allowed the entries for loopback to be updated, which is certainly wrong. But it will all need re-examination after moving everything over to guest-side addresses, which AIUI is the plan for the next spin.
Yes, I was talking about the next version. For context, when we first discussed the possibility of these entries with Jon, my assumption was that the whole series used guest-side link-layer addresses exclusively,
We did use guest-side link-layer addresses - host-side LL addresses might not even exist. The question is whether we use guest-side or host-side IP addresses to index the table.
Sorry, yes, I meant to write network and I wrote link-layer.
but that wasn't the case, hence (I think) the current struggle. If we go in that direction, I hope it's possible.
Thinking a bit more closely, I don't think it is, for much the same reason it wasn't in this draft.
According to the rules Jon and I thrashed out elsewhere in the thread, there are certain guest-side addresses that must be locked to use our_tap_mac. We're essentially shadowing something that might exist on the host side, so we should use our MAC, not the MAC of whatever is shadowed.
Just pre-populating an entry won't do the trick, because it could be overwritten if the right events occur for the shadowed host.
Right, sorry, I omitted another bit of context: I've been suggesting to Jon that he introduce some kind of "permanent" or "administrative" bit, and keep those entries at the beginning of the chain, exactly for the reason you mention. I can imagine we'll need those at some point if we ever want to offer explicit link-layer address mapping, and they're probably convenient for the day one can change map_guest_addr and map_host_loopback at runtime. We can also happily skip that for the moment, though: it's another problem we can keep for later.
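A rough sketch of the "permanent" bit idea: permanent entries sit at the head of the chain, so a lookup always hits them first, and updates coming from neighbour events refuse to overwrite them. This assumes a single singly linked chain rather than a hash table, IPv4-only keys, and hypothetical names (`struct neigh`, `NEIGH_PERMANENT`, `neigh_find()`, `neigh_add_permanent()`, `neigh_learn()`); none of it is taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAC_LEN		6
#define NEIGH_PERMANENT	0x1	/* administratively set, never overwritten */

struct neigh {
	struct neigh *next;
	uint32_t addr;			/* IPv4, network order */
	unsigned char mac[MAC_LEN];
	unsigned int flags;
};

/* Invariant: permanent entries first, dynamic entries after them */
static struct neigh *chain;

static struct neigh *neigh_find(uint32_t addr)
{
	for (struct neigh *e = chain; e; e = e->next) {
		if (e->addr == addr)
			return e;
	}
	return NULL;
}

static struct neigh *neigh_alloc(uint32_t addr, const unsigned char *mac,
				 unsigned int flags)
{
	struct neigh *e;

	assert(mac);
	if (!(e = calloc(1, sizeof(*e))))
		return NULL;
	e->addr = addr;
	e->flags = flags;
	memcpy(e->mac, mac, MAC_LEN);
	return e;
}

/* Pin an address, e.g. a shadowed one, to a fixed MAC such as our tap MAC */
static bool neigh_add_permanent(uint32_t addr, const unsigned char *mac)
{
	struct neigh *e = neigh_alloc(addr, mac, NEIGH_PERMANENT);

	if (!e)
		return false;
	e->next = chain;
	chain = e;
	return true;
}

/* Learn a MAC from a neighbour event: permanent entries always win */
static bool neigh_learn(uint32_t addr, const unsigned char *mac)
{
	struct neigh *e = neigh_find(addr), **pp;

	if (e && (e->flags & NEIGH_PERMANENT))
		return false;
	if (e) {
		memcpy(e->mac, mac, MAC_LEN);
		return true;
	}
	if (!(e = neigh_alloc(addr, mac, 0)))
		return false;

	/* Insert right after the permanent entries at the head */
	for (pp = &chain; *pp && ((*pp)->flags & NEIGH_PERMANENT);
	     pp = &(*pp)->next)
		;
	e->next = *pp;
	*pp = e;
	return true;
}
```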
By the way, while those entries are probably more elegant because they let us skip explicit cases, they might be a bit more complicated to manage than the explicit checks once we get to change addresses and routes dynamically using a netlink monitor: at that point, we might need to remove some entries based on old addresses or default gateways.
But given that this is already complicated enough, we can keep that problem for later, and just go with the simplest possible approach (whatever it is) for the moment.
-- Stefano