[PATCH] Send an initial ARP request to resolve the guest IP address
When restarting passt while QEMU keeps running with a configured
"reconnect-ms" setting, the port forwardings will stop working until
the guest sends some outgoing network traffic.
Reason: Although QEMU reconnects successfully to the unix domain
socket of the new passt process, that one no longer knows the guest's
MAC address and uses instead a broadcast MAC address. However, this
is ignored by the guest, at least if the guest runs Linux. Only after
the guest sends some network package on its own initiative, passt will
know the MAC address and will be able to establish forwarded
connections.
This change fixes this issue by sending an ARP request to resolve the
guest's MAC address via its IP address, which we do know, right after
the unix domain socket (re)connection.
The only case where the IP is "wrong" would be if the configuration
changed, and/or on the very first start right after qemu started. But
in those cases, we just wouldn't get an ARP response, and can't do
anything until we receive the guest's DHCP request - just as before.
In other words, in the worst case an ARP request would be harmless.
Signed-off-by: Volker Diels-Grabsch
On Sun, Sep 07, 2025 at 01:01:08PM +0200, Volker Diels-Grabsch wrote:
When restarting passt while QEMU keeps running with a configured "reconnect-ms" setting, the port forwardings will stop working until the guest sends some outgoing network traffic.
Reason: Although QEMU reconnects successfully to the unix domain socket of the new passt process, that one no longer knows the guest's MAC address and uses instead a broadcast MAC address. However, this is ignored by the guest, at least if the guest runs Linux.
Huh... I thought Linux would respond to that. I wonder if there's some sysctl config or version difference going on here.
Only after the guest sends some network package on its own initiative, passt will
[Aside: Although "packet" and "package" usually mean the same thing in English, it's always "network packets" not "network packages" and always "software packages" not "software packets"]
know the MAC address and will be able to establish forwarded connections.
This change fixes this issue by sending an ARP request to resolve the guest's MAC address via its IP address, which we do know, right after the unix domain socket (re)connection.
Right. The guest doesn't necessarily use the IP we give it, but it usually will, so this will work most of the time.
The only case where the IP is "wrong" would be if the configuration changed, and/or on the very first start right after qemu started. But in those cases, we just wouldn't get an ARP response, and can't do anything until we receive the guest's DHCP request - just as before. In other words, in the worst case an ARP request would be harmless.
Right. This seems like a good idea to me. It won't help in all cases, but it will help in some, and I can't see any harm it could do. Some comments on the implementation details below.
Signed-off-by: Volker Diels-Grabsch
--- arp.c | 34 ++++++++++++++++++++++++++++++++++ arp.h | 1 + tap.c | 15 +++++++++++++-- 3 files changed, 48 insertions(+), 2 deletions(-) diff --git a/arp.c b/arp.c index 44677ad..561581a 100644 --- a/arp.c +++ b/arp.c @@ -112,3 +112,37 @@ int arp(const struct ctx *c, struct iov_tail *data)
return 1; } + +/** + * send_initial_arp_req() - Send initial ARP request to retrieve guest MAC address
Wrap to 80 columns, please. Also, since this is exposed from this module the name should start with 'arp_'.
+ * @c: Execution context + */ +void send_initial_arp_req(const struct ctx *c) +{ + struct { + struct ethhdr eh; + struct arphdr ah; + struct arpmsg am; + } __attribute__((__packed__)) req; + + /* Ethernet header */ + req.eh.h_proto = htons(ETH_P_ARP); + memcpy(req.eh.h_dest, c->guest_mac, sizeof(req.eh.h_dest));
At this point we expect guest_mac to be the broadcast address, but it seems like explicitly using broadcast would be a little more robust if that changes at some point.
+ memcpy(req.eh.h_source, c->our_tap_mac, sizeof(req.eh.h_source)); + + /* ARP header */ + req.ah.ar_op = htons(ARPOP_REQUEST); + req.ah.ar_hrd = htons(ARPHRD_ETHER); + req.ah.ar_pro = htons(ETH_P_IP); + req.ah.ar_hln = ETH_ALEN; + req.ah.ar_pln = 4; + + /* ARP message */ + memcpy(req.am.sha, c->our_tap_mac, sizeof(req.am.sha)); + memcpy(req.am.sip, &c->ip4.our_tap_addr, sizeof(req.am.sip)); + memcpy(req.am.tha, c->guest_mac, sizeof(req.am.tha)); + memcpy(req.am.tip, &c->ip4.addr, sizeof(req.am.tip)); + + info("sending initial ARP request to retrieve guest MAC address after reconnect"); + tap_send_single(c, &req, sizeof(req)); +} diff --git a/arp.h b/arp.h index 86bcbf8..5490144 100644 --- a/arp.h +++ b/arp.h @@ -21,5 +21,6 @@ struct arpmsg { } __attribute__((__packed__));
int arp(const struct ctx *c, struct iov_tail *data); +void send_initial_arp_req(const struct ctx *c);
#endif /* ARP_H */ diff --git a/tap.c b/tap.c index 7ba6399..47dedd5 100644 --- a/tap.c +++ b/tap.c @@ -1355,6 +1355,16 @@ static void tap_start_connection(const struct ctx *c) ev.events = EPOLLIN | EPOLLRDHUP; ev.data.u64 = ref.u64; epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); + + switch (c->mode) { + case MODE_PASST: + send_initial_arp_req(c); + break; + case MODE_PASTA: + break; + case MODE_VU: + break; + }
I don't think we want to make this conditional on MODE_PASST. I think MODE_VU could suffer from the same problem. So could MODE_PASTA, although it would take a more unusual setup to trigger it. In any case, sending it unconditionally should be at worst harmless and is simpler.
}
/** @@ -1504,8 +1514,9 @@ void tap_backend_init(struct ctx *c) tap_sock_unix_init(c);
/* In passt mode, we don't know the guest's MAC address until it - * sends us packets. Use the broadcast address so that our - * first packets will reach it. + * sends us packets (e.g. responds to our initial ARP request). + * Until then, use the broadcast address so that our first + * packets will reach it. */ memset(&c->guest_mac, 0xff, sizeof(c->guest_mac)); break; -- 2.47.2
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
Dear David, Thanks a lot for your timely review. I applied all of your suggested improvements, and moreover introduced MAC_BROADCAST, analogous to the already existing MAC_ZERO, to improve readability. Regarding the maximum of 80 columns per line, I did fix it at the one place you asked for, but noticed that there are two other places where my patch hits that limit. However, I'm not sure about the expected code formatting when breaking those lines. So if those need to be broken as well, I'd be grateful for coding style suggestions. Best regards, Volker
When restarting passt while QEMU keeps running with a configured
"reconnect-ms" setting, the port forwardings will stop working until
the guest sends some outgoing network traffic.
Reason: Although QEMU reconnects successfully to the unix domain
socket of the new passt process, that one no longer knows the guest's
MAC address and uses instead a broadcast MAC address. However, this
is ignored by the guest, at least if the guest runs Linux. Only after
the guest sends some network package on its own initiative, passt will
know the MAC address and will be able to establish forwarded
connections.
This change fixes this issue by sending an ARP request to resolve the
guest's MAC address via its IP address, which we do know, right after
the unix domain socket (re)connection.
The only case where the IP is "wrong" would be if the configuration
changed, and/or on the very first start right after qemu started. But
in those cases, we just wouldn't get an ARP response, and can't do
anything until we receive the guest's DHCP request - just as before.
In other words, in the worst case an ARP request would be harmless.
Signed-off-by: Volker Diels-Grabsch
On Mon, Sep 08, 2025 at 11:22:44AM +0200, Volker Diels-Grabsch wrote:
When restarting passt while QEMU keeps running with a configured "reconnect-ms" setting, the port forwardings will stop working until the guest sends some outgoing network traffic.
Reason: Although QEMU reconnects successfully to the unix domain socket of the new passt process, that one no longer knows the guest's MAC address and uses instead a broadcast MAC address. However, this is ignored by the guest, at least if the guest runs Linux. Only after the guest sends some network package on its own initiative, passt will know the MAC address and will be able to establish forwarded connections.
This change fixes this issue by sending an ARP request to resolve the guest's MAC address via its IP address, which we do know, right after the unix domain socket (re)connection.
The only case where the IP is "wrong" would be if the configuration changed, and/or on the very first start right after qemu started. But in those cases, we just wouldn't get an ARP response, and can't do anything until we receive the guest's DHCP request - just as before. In other words, in the worst case an ARP request would be harmless.
Signed-off-by: Volker Diels-Grabsch
Reviewed-by: David Gibson
--- arp.c | 34 ++++++++++++++++++++++++++++++++++ arp.h | 1 + tap.c | 9 ++++++--- util.h | 1 + 4 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/arp.c b/arp.c index 44677ad..c217356 100644 --- a/arp.c +++ b/arp.c @@ -112,3 +112,37 @@ int arp(const struct ctx *c, struct iov_tail *data)
return 1; } + +/** + * arp_send_init_req() - Send initial ARP request to retrieve guest MAC address + * @c: Execution context + */ +void arp_send_init_req(const struct ctx *c) +{ + struct { + struct ethhdr eh; + struct arphdr ah; + struct arpmsg am; + } __attribute__((__packed__)) req; + + /* Ethernet header */ + req.eh.h_proto = htons(ETH_P_ARP); + memcpy(req.eh.h_dest, MAC_BROADCAST, sizeof(req.eh.h_dest)); + memcpy(req.eh.h_source, c->our_tap_mac, sizeof(req.eh.h_source)); + + /* ARP header */ + req.ah.ar_op = htons(ARPOP_REQUEST); + req.ah.ar_hrd = htons(ARPHRD_ETHER); + req.ah.ar_pro = htons(ETH_P_IP); + req.ah.ar_hln = ETH_ALEN; + req.ah.ar_pln = 4; + + /* ARP message */ + memcpy(req.am.sha, c->our_tap_mac, sizeof(req.am.sha)); + memcpy(req.am.sip, &c->ip4.our_tap_addr, sizeof(req.am.sip)); + memcpy(req.am.tha, MAC_BROADCAST, sizeof(req.am.tha)); + memcpy(req.am.tip, &c->ip4.addr, sizeof(req.am.tip)); + + info("sending initial ARP request to retrieve guest MAC address after reconnect"); + tap_send_single(c, &req, sizeof(req)); +} diff --git a/arp.h b/arp.h index 86bcbf8..d5ad0e1 100644 --- a/arp.h +++ b/arp.h @@ -21,5 +21,6 @@ struct arpmsg { } __attribute__((__packed__));
int arp(const struct ctx *c, struct iov_tail *data); +void arp_send_init_req(const struct ctx *c);
#endif /* ARP_H */ diff --git a/tap.c b/tap.c index 7ba6399..4249353 100644 --- a/tap.c +++ b/tap.c @@ -1355,6 +1355,8 @@ static void tap_start_connection(const struct ctx *c) ev.events = EPOLLIN | EPOLLRDHUP; ev.data.u64 = ref.u64; epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); + + arp_send_init_req(c); }
/** @@ -1504,10 +1506,11 @@ void tap_backend_init(struct ctx *c) tap_sock_unix_init(c);
/* In passt mode, we don't know the guest's MAC address until it - * sends us packets. Use the broadcast address so that our - * first packets will reach it. + * sends us packets (e.g. responds to our initial ARP request). + * Until then, use the broadcast address so that our first + * packets will reach it. */ - memset(&c->guest_mac, 0xff, sizeof(c->guest_mac)); + memcpy(&c->guest_mac, MAC_BROADCAST, sizeof(c->guest_mac)); break; }
diff --git a/util.h b/util.h index 2a8c38f..3719f0c 100644 --- a/util.h +++ b/util.h @@ -97,6 +97,7 @@ void abort_with_msg(const char *fmt, ...) #define FD_PROTO(x, proto) \ (IN_INTERVAL(c->proto.fd_min, c->proto.fd_max, (x)))
+#define MAC_BROADCAST ((uint8_t [ETH_ALEN]){ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }) #define MAC_ZERO ((uint8_t [ETH_ALEN]){ 0 }) #define MAC_IS_ZERO(addr) (!memcmp((addr), MAC_ZERO, ETH_ALEN))
-- 2.47.3
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
David Gibson wrote:
Looks good to me. It would be nice to also send an Neighbour Discovery request to accomplish the same thing for IPv6 only guests, but that could be a separate patch.
Good point! I'd prefer to do it in the same patch, as I'd also like fix another minor detail in this one. Just for the sake of clarity: I had a look at RFC4861 and there are many types of Neighbour Discovery requests. For our purpose, I believe that we want to send a "Neighbor Solicitation Message" described in section 4.3: https://datatracker.ietf.org/doc/html/rfc4861#section-4.3 Do you agree, or did you have something else in mind?
Note that even with this patch, active TCP connections (and in some cases UDP flows) will be broken by a passt restart.
Indeed, that is unavoidable for a user-space tool opening TCP and UDP connections, I guess, unless "passt" itself is wrapped into another process or system tool that keeps those connections open. But let's not go into that. If I really needed that level of independence, there is still the option to use a VPN like Wireguard, and then have a fixed passt process that is never restarted, and forwards only the wireguard UDP port. The service discrimination would then happen on IP rather than port level, so that (re)configuration of the service assignments to VM could happen always on-the-fly directly by the Cryptokey Routing table. (And maybe there are even better options for that use case, this is just the first simple solution that came to my mind, even though it would incur some crypto overhead unless you are already using wireguard anyways.) Best regards, Volker -- .---<<<((()))>>>---. | [[||]] | '---<<<((()))>>>---'
Okay, so here is the improved patch, which now performs NDP in addition to ARP. I tested everything locally with QEMU VMs running a minimal Debian system, and it all worked flawlessly. Moreover, the log messages are improved, and I added new log message "Guest MAC address: ..." as soon as we discover the guest's MAC address or notice a change of it.
When restarting passt while QEMU keeps running with a configured
"reconnect-ms" setting, the port forwardings will stop working until
the guest sends some outgoing network traffic.
Reason: Although QEMU reconnects successfully to the unix domain
socket of the new passt process, that one no longer knows the guest's
MAC address and uses instead the broadcast MAC address. However, this
is ignored by the guest, at least if the guest runs Linux. Only after
the guest sends some network package on its own initiative, passt will
know the MAC address and will be able to establish forwarded
connections.
This change fixes this issue by sending an ARP and an NDP request to
resolve the guest's MAC address via its IPv4 and IPv6 address, which
we do know, right after the unix domain socket (re)connection.
The only case where the IP is "wrong" would be if the configuration
changed, or on the very first start right after qemu started. But in
those cases, we just wouldn't get an ARP/NDP response, and can't do
anything until we receive the guest's DHCP request - just as before.
In other words, in the worst case the ARP/NDP requests would be
harmless.
Signed-off-by: Volker Diels-Grabsch
On Tue, Sep 09, 2025 at 04:49:20PM +0200, Volker Diels-Grabsch wrote:
When restarting passt while QEMU keeps running with a configured "reconnect-ms" setting, the port forwardings will stop working until the guest sends some outgoing network traffic.
Reason: Although QEMU reconnects successfully to the unix domain socket of the new passt process, that one no longer knows the guest's MAC address and uses instead the broadcast MAC address. However, this is ignored by the guest, at least if the guest runs Linux. Only after the guest sends some network package on its own initiative, passt will know the MAC address and will be able to establish forwarded connections.
This change fixes this issue by sending an ARP and an NDP request to resolve the guest's MAC address via its IPv4 and IPv6 address, which we do know, right after the unix domain socket (re)connection.
The only case where the IP is "wrong" would be if the configuration changed, or on the very first start right after qemu started. But in those cases, we just wouldn't get an ARP/NDP response, and can't do anything until we receive the guest's DHCP request - just as before. In other words, in the worst case the ARP/NDP requests would be harmless.
Signed-off-by: Volker Diels-Grabsch
Reviewed-by: David Gibson
diff --git a/tap.c b/tap.c index 7ba6399..ea61eae 100644 --- a/tap.c +++ b/tap.c @@ -1088,6 +1088,7 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data, { struct ethhdr eh_storage; const struct ethhdr *eh; + char bufmac[ETH_ADDRSTRLEN];
We'd generally prefer to move this local to the if block where it's used.
pcap_iov(data->iov, data->cnt, data->off);
@@ -1097,6 +1098,7 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
if (memcmp(c->guest_mac, eh->h_source, ETH_ALEN)) { memcpy(c->guest_mac, eh->h_source, ETH_ALEN); + info("Guest MAC address: %s", eth_ntop(c->guest_mac, bufmac, sizeof(bufmac))); proto_update_l2_buf(c->guest_mac, NULL); }
@@ -1355,6 +1357,11 @@ static void tap_start_connection(const struct ctx *c) ev.events = EPOLLIN | EPOLLRDHUP; ev.data.u64 = ref.u64; epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); + + info("Sending initial ARP and NDP request to retrieve" + " guest MAC address after reconnect");
I think it's going to be rare that we care about this, so I'd demote it to a debug().
+ arp_send_init_req(c); + ndp_send_init_req(c); }
/** @@ -1503,11 +1510,12 @@ void tap_backend_init(struct ctx *c) case MODE_PASST: tap_sock_unix_init(c);
- /* In passt mode, we don't know the guest's MAC address until it - * sends us packets. Use the broadcast address so that our - * first packets will reach it. + /* In passt mode, we don't know the guest's MAC address until + * it sends us packets (e.g. responds to our initial ARP or + * NDP request). Until then, use the broadcast address so + * that our first packets will have a chance to reach it. */ - memset(&c->guest_mac, 0xff, sizeof(c->guest_mac)); + memcpy(&c->guest_mac, MAC_BROADCAST, sizeof(c->guest_mac)); break; }
diff --git a/util.h b/util.h index 2a8c38f..3719f0c 100644 --- a/util.h +++ b/util.h @@ -97,6 +97,7 @@ void abort_with_msg(const char *fmt, ...) #define FD_PROTO(x, proto) \ (IN_INTERVAL(c->proto.fd_min, c->proto.fd_max, (x)))
+#define MAC_BROADCAST ((uint8_t [ETH_ALEN]){ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }) #define MAC_ZERO ((uint8_t [ETH_ALEN]){ 0 }) #define MAC_IS_ZERO(addr) (!memcmp((addr), MAC_ZERO, ETH_ALEN))
-- 2.47.3
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Tue, 9 Sep 2025 16:49:20 +0200
Volker Diels-Grabsch
When restarting passt while QEMU keeps running with a configured "reconnect-ms" setting, the port forwardings will stop working until the guest sends some outgoing network traffic.
Reason: Although QEMU reconnects successfully to the unix domain socket of the new passt process, that one no longer knows the guest's MAC address and uses instead the broadcast MAC address. However, this is ignored by the guest, at least if the guest runs Linux. Only after the guest sends some network package on its own initiative, passt will know the MAC address and will be able to establish forwarded connections.
This change fixes this issue by sending an ARP and an NDP request to resolve the guest's MAC address via its IPv4 and IPv6 address, which we do know, right after the unix domain socket (re)connection.
The only case where the IP is "wrong" would be if the configuration changed, or on the very first start right after qemu started. But in those cases, we just wouldn't get an ARP/NDP response, and can't do anything until we receive the guest's DHCP request - just as before. In other words, in the worst case the ARP/NDP requests would be harmless.
Thanks for the implementation, this looks like a small but quite relevant feature we missed until now. I have a couple of comments on top of David's ones:
Signed-off-by: Volker Diels-Grabsch
--- arp.c | 33 +++++++++++++++++++++++++++++++++ arp.h | 1 + ndp.c | 19 +++++++++++++++++++ ndp.h | 1 + tap.c | 16 ++++++++++++---- util.h | 1 + 6 files changed, 67 insertions(+), 4 deletions(-) diff --git a/arp.c b/arp.c index 44677ad..c1bd63b 100644 --- a/arp.c +++ b/arp.c @@ -112,3 +112,36 @@ int arp(const struct ctx *c, struct iov_tail *data)
return 1; } + +/** + * arp_send_init_req() - Send initial ARP request to retrieve guest MAC address + * @c: Execution context + */ +void arp_send_init_req(const struct ctx *c) +{ + struct { + struct ethhdr eh; + struct arphdr ah; + struct arpmsg am; + } __attribute__((__packed__)) req; + + /* Ethernet header */ + req.eh.h_proto = htons(ETH_P_ARP); + memcpy(req.eh.h_dest, MAC_BROADCAST, sizeof(req.eh.h_dest)); + memcpy(req.eh.h_source, c->our_tap_mac, sizeof(req.eh.h_source)); + + /* ARP header */ + req.ah.ar_op = htons(ARPOP_REQUEST); + req.ah.ar_hrd = htons(ARPHRD_ETHER); + req.ah.ar_pro = htons(ETH_P_IP); + req.ah.ar_hln = ETH_ALEN; + req.ah.ar_pln = 4; + + /* ARP message */ + memcpy(req.am.sha, c->our_tap_mac, sizeof(req.am.sha)); + memcpy(req.am.sip, &c->ip4.our_tap_addr, sizeof(req.am.sip)); + memcpy(req.am.tha, MAC_BROADCAST, sizeof(req.am.tha)); + memcpy(req.am.tip, &c->ip4.addr, sizeof(req.am.tip)); + + tap_send_single(c, &req, sizeof(req)); +} diff --git a/arp.h b/arp.h index 86bcbf8..d5ad0e1 100644 --- a/arp.h +++ b/arp.h @@ -21,5 +21,6 @@ struct arpmsg { } __attribute__((__packed__));
int arp(const struct ctx *c, struct iov_tail *data); +void arp_send_init_req(const struct ctx *c);
#endif /* ARP_H */ diff --git a/ndp.c b/ndp.c index eb090cd..b3bdedb 100644 --- a/ndp.c +++ b/ndp.c @@ -438,3 +438,22 @@ void ndp_timer(const struct ctx *c, const struct timespec *now) first: next_ra = now->tv_sec + interval; } + +/** + * ndp_send_init_req() - Send initial NDP NS to retrieve guest MAC address + * @c: Execution context + */ +void ndp_send_init_req(const struct ctx *c) +{ + struct ndp_ns ns = { + .ih = { + .icmp6_type = NS, + .icmp6_code = 0, + .icmp6_router = 0, /* Reserved */ + .icmp6_solicited = 0, /* Reserved */ + .icmp6_override = 0, /* Reserved */ + }, + .target_addr = c->ip6.addr + }; + ndp_send(c, &c->ip6.addr, &ns, sizeof(ns)); +} diff --git a/ndp.h b/ndp.h index b1dd5e8..781ea86 100644 --- a/ndp.h +++ b/ndp.h @@ -11,5 +11,6 @@ struct icmp6hdr; int ndp(const struct ctx *c, const struct in6_addr *saddr, struct iov_tail *data); void ndp_timer(const struct ctx *c, const struct timespec *now); +void ndp_send_init_req(const struct ctx *c);
#endif /* NDP_H */ diff --git a/tap.c b/tap.c index 7ba6399..ea61eae 100644 --- a/tap.c +++ b/tap.c @@ -1088,6 +1088,7 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data, { struct ethhdr eh_storage; const struct ethhdr *eh; + char bufmac[ETH_ADDRSTRLEN];
pcap_iov(data->iov, data->cnt, data->off);
@@ -1097,6 +1098,7 @@ void tap_add_packet(struct ctx *c, struct iov_tail *data,
if (memcmp(c->guest_mac, eh->h_source, ETH_ALEN)) { memcpy(c->guest_mac, eh->h_source, ETH_ALEN); + info("Guest MAC address: %s", eth_ntop(c->guest_mac, bufmac, sizeof(bufmac))); proto_update_l2_buf(c->guest_mac, NULL); }
@@ -1355,6 +1357,11 @@ static void tap_start_connection(const struct ctx *c) ev.events = EPOLLIN | EPOLLRDHUP; ev.data.u64 = ref.u64; epoll_ctl(c->epollfd, EPOLL_CTL_ADD, c->fd_tap, &ev); + + info("Sending initial ARP and NDP request to retrieve" + " guest MAC address after reconnect"); + arp_send_init_req(c);
This should be conditional to whether we have IPv4 support enabled or not, and the check would need to be analogous to the one from tap4_handler() (sorry, it's a bit hidden): if (!c->ifi4 || ...) return ...;
+ ndp_send_init_req(c);
And this should only happen if IPv6 is enabled, see tap6_handler(): if (!c->ifi6 || ...) return ...; and also, arguably, iff NDP support is not disabled by means of --no-ndp (c->no_ndp). Strictly speaking, we could send this anyway and still fit the current documentation of --no-ndp: --no-ndp Disable NDP responses. NDP messages coming from guest or target namespace will be ignored. but this would make --no-ndp a misnomer, and given that we'll ignore neighbour advertisements, it makes no sense to send a solicitation anyway. All in all, I would just not do this on c->no_ndp. If you can think of a terse way of updating the man page to reflect this, that would be appreciated, but I think it's also fine like it is. By the way, we'll also ignore responses on --no-icmp. I just realised that the man page is currently inaccurate, because it refers to echo messages only, but in tap6_handler() we have: if (proto == IPPROTO_ICMPV6) { ... if (c->no_icmp) continue; ... if (ndp(c, saddr, &ndp_data)) continue; ... } So I think we should update the man page to mention that --no-icmp means no ICMP and no ICMPv6, and also skip sending the NDP solicitation in that case. Or update the code to reflect what the man page says, but then the option could be considered a misnomer, so I wouldn't go this way.
}
/** @@ -1503,11 +1510,12 @@ void tap_backend_init(struct ctx *c) case MODE_PASST: tap_sock_unix_init(c);
- /* In passt mode, we don't know the guest's MAC address until it - * sends us packets. Use the broadcast address so that our - * first packets will reach it. + /* In passt mode, we don't know the guest's MAC address until + * it sends us packets (e.g. responds to our initial ARP or
I don't think the response is an example, so I wouldn't use "e.g." here, rather "i.e." / "that is", if that's the expected behaviour.
+ * NDP request). Until then, use the broadcast address so + * that our first packets will have a chance to reach it. */ - memset(&c->guest_mac, 0xff, sizeof(c->guest_mac)); + memcpy(&c->guest_mac, MAC_BROADCAST, sizeof(c->guest_mac)); break; }
diff --git a/util.h b/util.h index 2a8c38f..3719f0c 100644 --- a/util.h +++ b/util.h @@ -97,6 +97,7 @@ void abort_with_msg(const char *fmt, ...) #define FD_PROTO(x, proto) \ (IN_INTERVAL(c->proto.fd_min, c->proto.fd_max, (x)))
+#define MAC_BROADCAST ((uint8_t [ETH_ALEN]){ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff })
This can be easily wrapped to fit 80 columns without otherwise affecting readability, see examples just above and below: #define MAC_BROADCAST \ ((uint8_t [ETH_ALEN]){ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff })
#define MAC_ZERO ((uint8_t [ETH_ALEN]){ 0 }) #define MAC_IS_ZERO(addr) (!memcmp((addr), MAC_ZERO, ETH_ALEN))
The rest looks good to me! -- Stefano
Dear Stefano, Thanks for your timely review. I agree with almost everything of it. Just a small clarification on this one: Stefano Brivio wrote:
@@ -1503,11 +1510,12 @@ void tap_backend_init(struct ctx *c) case MODE_PASST: tap_sock_unix_init(c);
- /* In passt mode, we don't know the guest's MAC address until it - * sends us packets. Use the broadcast address so that our - * first packets will reach it. + /* In passt mode, we don't know the guest's MAC address until + * it sends us packets (e.g. responds to our initial ARP or
I don't think the response is an example, so I wouldn't use "e.g." here, rather "i.e." / "that is", if that's the expected behaviour.
The reason for using "e.g." is the following: There is still the "usual" case where the passt client (QEMU) was freshly started together with passt. In that case, it *will not* respond to neither our ARP nor NDP request, simply because it won't recognize the IPv4/6 addresses, because it doesn't yet have any. In that situation, we'll learn about the guest's MAC address only after it sends a DCHP request, or NDP, or similar to us, on its own initiative - not as a response to anything we might have sent. So I'd propose to either keep the "e.g." wording, or to extend the comment by enumerating all possible cases. (Regarding the latter, I don't feel confident enough to be able to really enumerate them all. In addition to DCHP, NDP NS and DHCPv6, a sufficient strange client network stack might even broadcast nonsense packets to us, which we might not process further, but still learn the guest's MAC address from.) Best regards, Volker -- .---<<<((()))>>>---. | [[||]] | '---<<<((()))>>>---'
On Tue, 9 Sep 2025 12:10:45 +0200
Volker Diels-Grabsch
David Gibson wrote:
Looks good to me. It would be nice to also send an Neighbour Discovery request to accomplish the same thing for IPv6 only guests, but that could be a separate patch.
Good point! I'd prefer to do it in the same patch, as I'd also like fix another minor detail in this one.
Just for the sake of clarity: I had a look at RFC4861 and there are many types of Neighbour Discovery requests. For our purpose, I believe that we want to send a "Neighbor Solicitation Message" described in section 4.3:
https://datatracker.ietf.org/doc/html/rfc4861#section-4.3
Do you agree, or did you have something else in mind?
I was about to comment on this, but from the new patch you sent, I see you already figured out it's a Neighbour Solicitation and that we already have some bits of code for that. :)
Note that even with this patch, active TCP connections (and in some cases UDP flows) will be broken by a passt restart.
Indeed, that is unavoidable for a user-space tool opening TCP and UDP connections, I guess, unless "passt" itself is wrapped into another process or system tool that keeps those connections open. But let's not go into that.
Actually, it's not really unavoidable, in the sense that we recently added (see migrate.c, repair.c, passt-repair.c, and passt-repair(1)) support for migration of live TCP connections triggered by vhost-user commands: https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#migrating-back-e... which is based on the TCP_REPAIR socket option in the Linux kernel, which was in turn added to support a similar feature in CRIU: https://criu.org/TCP_connection and while this was done with KubeVirt in mind: https://github.com/kubevirt/enhancements/blob/main/veps/sig-network/passt/pa... that is, migration between two different nodes / hosts, there's nothing that really prevents migration between two instances of passt via, for example, load/dump from/to a binary file. Actually, we initially wanted to add the file option for testing purposes, but we skipped it eventually and went straight ahead for the direct implementation. Some bits of "documentation": git log migrate.c test/migrate/basic ...yes, a new website with some space for this stuff is in (infinitesimally slow) progress. -- Stefano
On Tue, Sep 09, 2025 at 05:55:30PM +0200, Stefano Brivio wrote:
On Tue, 9 Sep 2025 12:10:45 +0200 Volker Diels-Grabsch
wrote: David Gibson wrote:
Looks good to me. It would be nice to also send an Neighbour Discovery request to accomplish the same thing for IPv6 only guests, but that could be a separate patch.
Good point! I'd prefer to do it in the same patch, as I'd also like fix another minor detail in this one.
Just for the sake of clarity: I had a look at RFC4861 and there are many types of Neighbour Discovery requests. For our purpose, I believe that we want to send a "Neighbor Solicitation Message" described in section 4.3:
https://datatracker.ietf.org/doc/html/rfc4861#section-4.3
Do you agree, or did you have something else in mind?
I was about to comment on this, but from the new patch you sent, I see you already figured out it's a Neighbour Solicitation and that we already have some bits of code for that. :)
Note that even with this patch, active TCP connections (and in some cases UDP flows) will be broken by a passt restart.
Indeed, that is unavoidable for a user-space tool opening TCP and UDP connections, I guess, unless "passt" itself is wrapped into another process or system tool that keeps those connections open. But let's not go into that.
Actually, it's not really unavoidable, in the sense that we recently added (see migrate.c, repair.c, passt-repair.c, and passt-repair(1)) support for migration of live TCP connections triggered by vhost-user commands:
Right, but doing this requires knowing in advance, support from qemu and a bunch of infrastructure. It's not going to work if you just kill and restart passt. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
participants (3)
-
David Gibson
-
Stefano Brivio
-
Volker Diels-Grabsch