On Wed, 8 Oct 2025 11:27:32 +1100
David Gibson wrote:
On Tue, Oct 07, 2025 at 12:10:22PM +0200, Stefano Brivio wrote:
On Fri, 3 Oct 2025 14:41:56 +1000
David Gibson wrote:
On Thu, Oct 02, 2025 at 08:34:06PM -0400, Jon Maloy wrote:
ARP announcements and unsolicited NAs should be handled with caution because of the risk of malicious users emitting them to disturb network communication.
There is however one case where we know it is legitimate and safe for us to send out such messages: the one time we switch from using ctx->own_tap_mac to a MAC address received via the recently added neighbour subscription function. Later changes to the MAC address of a host in an existing entry cannot be fully trusted, so we abstain from sending announcements in such cases.
When sending this type of message, we notice that the guest accepts the update, but shortly afterwards asks for a confirmation in the form of a regular ARP/NS request. This is responded to with the new value, and we have exactly the effect we wanted.
This commit adds this functionality.
Signed-off-by: Jon Maloy
---
v10: -Made small changes based on feedback from David G.
v11: -Moved from 'Gratuitous ARP reply' model to 'ARP Announcement' model.
v12: -Excluding loopback and default GW addresses from the ARP/NA
      announcement to be sent to the guest
---
 arp.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 arp.h |  2 ++
 fwd.c | 16 ++++++++++++++++
 ndp.c | 10 ++++++++++
 ndp.h |  1 +
 5 files changed, 71 insertions(+)

diff --git a/arp.c b/arp.c
index ad088b1..b08780f 100644
--- a/arp.c
+++ b/arp.c
@@ -146,3 +146,45 @@ void arp_send_init_req(const struct ctx *c)
 	debug("Sending initial ARP request for guest MAC address");
 	tap_send_single(c, &req, sizeof(req));
 }
+
+/**
+ * arp_announce() - Send an ARP announcement for an IPv4 host
+ * @c:		Execution context
+ * @ip:		IPv4 address we announce as owned by @mac
+ * @mac:	MAC address to advertise for @ip
+ */
+void arp_announce(const struct ctx *c, struct in_addr *ip,
+		  const unsigned char *mac)
+{
+	char ip_str[INET_ADDRSTRLEN];
+	char mac_str[ETH_ADDRSTRLEN];
+	struct {
+		struct ethhdr eh;
+		struct arphdr ah;
+		struct arpmsg am;
+	} __attribute__((__packed__)) annc;
+
+	/* Ethernet header */
+	annc.eh.h_proto = htons(ETH_P_ARP);
+	memcpy(annc.eh.h_dest, MAC_BROADCAST, sizeof(annc.eh.h_dest));
+	memcpy(annc.eh.h_source, mac, sizeof(annc.eh.h_source));
+
+	/* ARP header */
+	annc.ah.ar_op = htons(ARPOP_REQUEST);
+	annc.ah.ar_hrd = htons(ARPHRD_ETHER);
+	annc.ah.ar_pro = htons(ETH_P_IP);
+	annc.ah.ar_hln = ETH_ALEN;
+	annc.ah.ar_pln = 4;
+
+	/* ARP message */
+	memcpy(annc.am.sha, mac, sizeof(annc.am.sha));
+	memcpy(annc.am.sip, ip, sizeof(annc.am.sip));
+	memcpy(annc.am.tha, MAC_BROADCAST, sizeof(annc.am.tha));
+	memcpy(annc.am.tip, ip, sizeof(annc.am.tip));

As noted in several earlier revisions, having sip == tip (but with different mac addresses) looks odd. Is that what the RFCs say to do for ARP announcements?
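
On the sip == tip question: as far as I recall, RFC 5227 (section 2.3) describes an ARP Announcement as an ARP Request, broadcast on the link, in which the sender and target IP address fields both carry the address being announced, and in which the target hardware address is ignored and should be all zeroes. Below is a minimal standalone sketch of that layout. It is not the code from the patch: struct sk_arp_frame and sk_arp_announce_fill() are hypothetical names, the numeric constants stand in for ETH_P_ARP, ARPHRD_ETHER, ETH_P_IP and ARPOP_REQUEST, and zeroing tha (where the patch uses MAC_BROADCAST) reflects my reading of the RFC rather than anything confirmed in this thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-wire layout: Ethernet header followed by an IPv4 ARP body */
struct sk_arp_frame {
	uint8_t  eth_dst[6];
	uint8_t  eth_src[6];
	uint16_t eth_proto;
	uint16_t ar_hrd;
	uint16_t ar_pro;
	uint8_t  ar_hln;
	uint8_t  ar_pln;
	uint16_t ar_op;
	uint8_t  sha[6];
	uint8_t  sip[4];
	uint8_t  tha[6];
	uint8_t  tip[4];
} __attribute__((__packed__));

/* Fill @f with an ARP announcement stating that @ip is owned by @mac */
static void sk_arp_announce_fill(struct sk_arp_frame *f,
				 const uint8_t mac[6],
				 const struct in_addr *ip)
{
	assert(f && mac && ip);

	memset(f, 0, sizeof(*f));		/* leaves tha as all zeroes */

	memset(f->eth_dst, 0xff, sizeof(f->eth_dst));	/* link broadcast */
	memcpy(f->eth_src, mac, sizeof(f->eth_src));
	f->eth_proto = htons(0x0806);		/* ARP ethertype */

	f->ar_hrd = htons(1);			/* Ethernet */
	f->ar_pro = htons(0x0800);		/* IPv4 */
	f->ar_hln = 6;
	f->ar_pln = 4;
	f->ar_op  = htons(1);			/* ARP request */

	memcpy(f->sha, mac, sizeof(f->sha));
	memcpy(f->sip, &ip->s_addr, sizeof(f->sip));	/* sender IP ... */
	memcpy(f->tip, &ip->s_addr, sizeof(f->tip));	/* ... equals target IP */
}
```

A caller would hand the resulting 42-byte frame to whatever transmits raw frames towards the guest, which in the patch is tap_send_single().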
+	inet_ntop(AF_INET, ip, ip_str, sizeof(ip_str));
+	eth_ntop(mac, mac_str, sizeof(mac_str));
+	debug("Announcing ARP for %s / %s", ip_str, mac_str);
+
+	tap_send_single(c, &annc, sizeof(annc));
+}
diff --git a/arp.h b/arp.h
index d5ad0e1..4862e90 100644
--- a/arp.h
+++ b/arp.h
@@ -22,5 +22,7 @@ struct arpmsg {

 int arp(const struct ctx *c, struct iov_tail *data);
 void arp_send_init_req(const struct ctx *c);
+void arp_announce(const struct ctx *c, struct in_addr *ip,
+		  const unsigned char *mac);

 #endif /* ARP_H */
diff --git a/fwd.c b/fwd.c
index c34bb1c..ade97c8 100644
--- a/fwd.c
+++ b/fwd.c
@@ -26,6 +26,8 @@
 #include "passt.h"
 #include "lineread.h"
 #include "flow_table.h"
+#include "arp.h"
+#include "ndp.h"

 /* Empheral port range: values from RFC 6335 */
 static in_port_t fwd_ephemeral_min = (1 << 15) + (1 << 14);
@@ -140,6 +142,20 @@ void fwd_neigh_table_update(const struct ctx *c, const union inany_addr *addr,

 	memcpy(&e->addr, addr, sizeof(*addr));
 	memcpy(e->mac, mac, ETH_ALEN);
+
+	if (inany_equals(addr, &inany_loopback4))
+		return;
+	if (inany_equals(addr, &inany_loopback6))
+		return;

Since you need these explicit checks anyway, there's not much point to the dummy entries you created - you could exit on these addresses before even looking up the table.
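
To make the "exit before even looking up the table" suggestion concrete, here is a minimal standalone sketch. Everything in it is hypothetical and only loosely modelled on the patch: struct sk_neigh, sk_table, sk_is_loopback() and sk_neigh_update() are invented for illustration, the table is a plain array rather than the real neighbour table, and the default gateway exclusion mentioned in the v12 changelog is not shown.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define SK_TABLE_SIZE 8

/* Hypothetical neighbour entry: address family, address bytes, MAC */
struct sk_neigh {
	bool    used;
	int     af;
	uint8_t addr[16];
	uint8_t mac[6];
};

static struct sk_neigh sk_table[SK_TABLE_SIZE];

/* True for exactly 127.0.0.1 or ::1, mirroring the two checks in the patch */
static bool sk_is_loopback(int af, const void *addr)
{
	if (af == AF_INET) {
		const struct in_addr *a4 = addr;

		return a4->s_addr == htonl(INADDR_LOOPBACK);
	}
	return IN6_IS_ADDR_LOOPBACK((const struct in6_addr *)addr);
}

/* Store @mac for @addr; return true if an announcement would be warranted */
static bool sk_neigh_update(int af, const void *addr, const uint8_t mac[6])
{
	size_t alen = (af == AF_INET) ? 4 : 16;
	struct sk_neigh *slot = NULL;
	size_t i;

	assert(af == AF_INET || af == AF_INET6);

	/* Bail out before touching the table at all */
	if (sk_is_loopback(af, addr))
		return false;

	for (i = 0; i < SK_TABLE_SIZE; i++) {
		struct sk_neigh *e = &sk_table[i];

		if (e->used && e->af == af && !memcmp(e->addr, addr, alen)) {
			slot = e;	/* existing entry wins over a free slot */
			break;
		}
		if (!e->used && !slot)
			slot = e;	/* remember the first free slot */
	}
	if (!slot)
		return false;		/* table full */

	slot->used = true;
	slot->af = af;
	memcpy(slot->addr, addr, alen);
	memcpy(slot->mac, mac, sizeof(slot->mac));
	return true;
}
```

With the check placed first, loopback addresses never reach the lookup, so no dummy entries are needed to protect them. The trade-off is the one discussed in this thread: the special cases stay explicit in the code instead of being encoded in the table.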
I guess those entries make sense if we can drop all these checks as a result. I think we should be able to.
We couldn't in this version, because that might have allowed the entries for loopback to be updated, which is certainly wrong. But it will all need re-examination after moving everything over to guest-side addresses, which AIUI is the plan for the next spin.
Yes, I was talking about the next version. For context, when we first discussed the possibility of these entries with Jon, my assumption was that the whole series used guest-side link-layer addresses exclusively, but that wasn't the case, hence (I think) the current struggle. If we go in that direction, I hope it's possible.

By the way, while they are probably more elegant because we can skip explicit cases, they might be a bit more complicated to manage compared to those explicit cases the day we get to change addresses and routes dynamically using a netlink monitor, because at that point we might need to remove some entries based on old addresses / default gateways.

But given that this is already complicated enough, we can keep that problem for later, and just go with the simplest possible approach (whatever it is) for the moment.

-- 
Stefano