[PATCH 0/2] vhost-user, dhcp: Fix iPXE network boot over vhost-user
iPXE network boot over vhost-user was broken because passt
unconditionally skipped TCP/UDP checksum computation, relying on
VIRTIO_NET_HDR_F_DATA_VALID to tell the guest the checksums were
valid. Linux guests happened to work because their virtio-net driver
honours DATA_VALID regardless of feature negotiation, but iPXE
verifies checksums strictly and never negotiates VIRTIO_NET_F_GUEST_CSUM.
This series handles the VIRTIO_NET_F_GUEST_CSUM feature correctly to fix that,
and adds a minimal --dhcp-boot option that populates the BOOTP/DHCP 'file'
field, providing just enough for testing iPXE UDP and TCP support.
This can be tested as follows:
- Create an iPXE configuration file
cat > boot-alpine.ipxe <
According to the virtio-net specification, when the VIRTIO_NET_F_GUEST_CSUM
feature
is negotiated, the device can set VIRTIO_NET_HDR_F_DATA_VALID in the
virtio-net header to indicate that packet checksums have been validated,
allowing the guest to skip verification. Without this feature, the device
must provide fully checksummed packets.
The vhost-user TCP and UDP paths were unconditionally skipping checksum
computation, regardless of whether GUEST_CSUM was negotiated. This
went undetected with Linux guests because Linux's virtio-net driver
honours VIRTIO_NET_HDR_F_DATA_VALID regardless of whether
VIRTIO_NET_F_GUEST_CSUM was negotiated, marking such packets as
CHECKSUM_UNNECESSARY and skipping verification.
iPXE, however, does not negotiate GUEST_CSUM, ignores the DATA_VALID
flag entirely, and always verifies checksums. This caused TCP
connections to fail: the SYN-ACK had a zero TCP checksum, iPXE rejected
it, and the connection timed out in SYN_RCVD.
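To illustrate why a zero checksum is fatal for a strict receiver, here is a
minimal sketch of an RFC 1071 style verification. The names csum_fold_sum()
and csum_ok() are hypothetical and come from neither iPXE nor passt; the
pseudo-header contribution is assumed to be passed in as a pre-computed sum,
and segments are assumed to be shorter than 64 KiB so the 32-bit accumulator
cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add up 16-bit big-endian words of buf on top of an
 * initial sum (e.g. the pseudo-header), then fold carries into 16 bits.
 */
static uint16_t csum_fold_sum(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	assert(buf || !len);

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)			/* odd trailing byte, zero-padded */
		sum += (uint32_t)buf[len - 1] << 8;

	while (sum >> 16)		/* end-around carry */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/* Hypothetical helper: a segment whose checksum field was filled in
 * correctly sums to 0xffff once the field itself is included.
 */
static int csum_ok(const uint8_t *buf, size_t len, uint32_t pseudo_sum)
{
	return csum_fold_sum(buf, len, pseudo_sum) == 0xffff;
}
```

A receiver applying a check of this kind to a segment whose checksum field
was left at zero sees a mismatch for almost any content and drops the
segment, which matches the SYN-ACK behaviour described above.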
Adding --pcap happened to mask the bug, because the pcap code path
forces checksum computation to ensure correct captures.
Offer VIRTIO_NET_F_GUEST_CSUM in the device features, and only skip
checksum computation when the guest has actually negotiated it. When
GUEST_CSUM is not negotiated, always compute valid checksums as required
by the specification.
We keep setting VIRTIO_NET_HDR_F_DATA_VALID unconditionally in
VU_HEADER: when GUEST_CSUM is negotiated, the flag lets the guest skip
checksum verification; when it is not, the spec says the guest should
ignore the flags field, so setting it is harmless.
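The resulting decision can be sketched in isolation as follows. This is not
the passt code: has_feature(), csum_flags() and the SKETCH_* values are
hypothetical stand-ins for vu_has_feature() and the IP4_CSUM / TCP_CSUM flags
used in the patch, and the negotiated feature set is assumed to be available
as a plain 64-bit mask. Only the VIRTIO_NET_F_GUEST_CSUM bit number and the
VIRTIO_NET_HDR_F_DATA_VALID value are taken from the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_NET_F_GUEST_CSUM		1	/* feature bit (virtio spec) */
#define VIRTIO_NET_HDR_F_DATA_VALID	2	/* header flag (virtio spec) */

/* Hypothetical flag values, for this sketch only */
#define SKETCH_IP4_CSUM	(1U << 0)
#define SKETCH_L4_CSUM	(1U << 1)

static_assert(VIRTIO_NET_F_GUEST_CSUM < 64, "feature bit must fit the mask");

/* Hypothetical: true if 'bit' is set in the negotiated feature mask */
static bool has_feature(uint64_t negotiated, unsigned int bit)
{
	return negotiated & (1ULL << bit);
}

/* Hypothetical: which checksums the device has to compute for a frame.
 * The IPv4 header checksum is always needed; the TCP/UDP checksum can be
 * skipped only if the guest negotiated GUEST_CSUM and no capture is active.
 */
static uint32_t csum_flags(uint64_t negotiated, bool pcap)
{
	uint32_t flags = SKETCH_IP4_CSUM;

	if (pcap || !has_feature(negotiated, VIRTIO_NET_F_GUEST_CSUM))
		flags |= SKETCH_L4_CSUM;

	return flags;
}
```

With an empty feature mask, as for a guest that never negotiates GUEST_CSUM,
csum_flags() asks for both checksums. The DATA_VALID flag in the header is
left out of the decision on purpose, since it is set unconditionally.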
Signed-off-by: Laurent Vivier
Add a --dhcp-boot option that populates the 'file' field in DHCP reply
messages with the given filename.
Using --dhcp-boot together with --no-dhcp is rejected at startup.
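As a rough, standalone illustration of the behaviour described above, the
sketch below stores a boot filename, rejects the conflicting option
combination, and copies the name into a reply. The struct and function names
(sketch_cfg, sketch_reply, cfg_set_boot(), cfg_check(), reply_fill_file())
are hypothetical and plain snprintf() stands in for whatever helper the real
code uses. The 128-byte size is that of the BOOTP 'file' field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BOOTP_FILE_LEN	128	/* size of the BOOTP 'file' field */

/* Hypothetical, reduced configuration and reply structures */
struct sketch_cfg {
	char dhcp_boot[BOOTP_FILE_LEN];
	int no_dhcp;
};

struct sketch_reply {
	char file[BOOTP_FILE_LEN];
};

static_assert(sizeof(((struct sketch_reply *)0)->file) ==
	      sizeof(((struct sketch_cfg *)0)->dhcp_boot),
	      "dhcp_boot must have the same size as the reply 'file' field");

/* Store the boot filename; -1 if it doesn't fit with its terminating NUL */
static int cfg_set_boot(struct sketch_cfg *cfg, const char *name)
{
	int n = snprintf(cfg->dhcp_boot, sizeof(cfg->dhcp_boot), "%s", name);

	return (n < 0 || (size_t)n >= sizeof(cfg->dhcp_boot)) ? -1 : 0;
}

/* Reject a boot filename if the DHCP server is disabled */
static int cfg_check(const struct sketch_cfg *cfg)
{
	return (cfg->no_dhcp && cfg->dhcp_boot[0]) ? -1 : 0;
}

/* Copy the whole buffer: cfg is assumed to be zero-initialised, so bytes
 * after the name are zero and an unset name yields an all-zero field.
 */
static void reply_fill_file(struct sketch_reply *r,
			    const struct sketch_cfg *cfg)
{
	memcpy(r->file, cfg->dhcp_boot, sizeof(r->file));
}
```

Copying the full buffer rather than the string keeps the "option not given"
case identical to zeroing the field, which is why the two sizes are checked
at build time.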
Signed-off-by: Laurent Vivier
On Fri, Apr 03, 2026 at 10:02:03AM +0200, Laurent Vivier wrote:
According to the virtio-net specification, when the VIRTIO_NET_F_GUEST_CSUM feature is negotiated, the device can set VIRTIO_NET_HDR_F_DATA_VALID in the virtio-net header to indicate that packet checksums have been validated, allowing the guest to skip verification. Without this feature, the device must provide fully checksummed packets.
The vhost-user TCP and UDP paths were unconditionally skipping checksum computation, regardless of whether GUEST_CSUM was negotiated. This went undetected with Linux guests because Linux's virtio-net driver honours VIRTIO_NET_HDR_F_DATA_VALID regardless of whether VIRTIO_NET_F_GUEST_CSUM was negotiated, marking such packets as CHECKSUM_UNNECESSARY and skipping verification.
iPXE, however, does not negotiate GUEST_CSUM, ignores the DATA_VALID flag entirely, and always verifies checksums. This caused TCP connections to fail: the SYN-ACK had a zero TCP checksum, iPXE rejected it, and the connection timed out in SYN_RCVD.
Adding --pcap happened to mask the bug, because the pcap code path forces checksum computation to ensure correct captures.
Offer VIRTIO_NET_F_GUEST_CSUM in the device features, and only skip checksum computation when the guest has actually negotiated it. When GUEST_CSUM is not negotiated, always compute valid checksums as required by the specification.
We keep setting VIRTIO_NET_HDR_F_DATA_VALID unconditionally in VU_HEADER: when GUEST_CSUM is negotiated, the flag lets the guest skip checksum verification; when it is not, the spec says the guest should ignore the flags field, so setting it is harmless.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson
---
 tcp_vu.c     | 8 ++++++--
 udp_vu.c     | 7 ++++---
 vhost_user.c | 1 +
 3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/tcp_vu.c b/tcp_vu.c
index 1927b14e0962..49d39e7de201 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -126,6 +126,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE];
 	size_t optlen, hdrlen, iov_cnt, iov_used;
 	struct vu_virtq_element flags_elem[2];
+	uint32_t csum_flags = IP4_CSUM;
 	struct iovec flags_iov[64];
 	struct tcp_syn_opts opts;
 	struct iov_tail payload;
@@ -137,6 +138,9 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	int elem_cnt;
 	int ret;

+	if (*c->pcap || !vu_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
+		csum_flags |= TCP_CSUM;
+
 	hdrlen = tcp_vu_hdrlen(CONN_V6(conn));

 	elem_cnt = vu_collect(vdev, vq, &flags_elem[0], 1,
@@ -175,7 +179,7 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	iov_from_buf(payload.iov, payload.cnt, payload.off, &opts, optlen);
 	tcp_fill_headers(c, conn, &eh, CONN_V4(conn) ? &ip4h : NULL,
 			 CONN_V6(conn) ? &ip6h : NULL, &th, &payload,
-			 optlen, IP4_CSUM | (*c->pcap ? TCP_CSUM : 0), seq);
+			 optlen, csum_flags, seq);

 	vu_pad(flags_elem[0].in_sg, iov_cnt, hdrlen + optlen);
 	vu_flush(vdev, vq, flags_elem, elem_cnt, hdrlen + optlen);
@@ -516,7 +520,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn)

 	hdrlen = tcp_vu_hdrlen(v6);
 	check = IP4_CSUM;
-	if (*c->pcap)
+	if (*c->pcap || !vu_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
 		check |= TCP_CSUM;
 	for (i = 0, previous_dlen = -1; i < frame_cnt; i++) {
 		struct iovec *iov = &iov_vu[frame[i].idx_iovec];
diff --git a/udp_vu.c b/udp_vu.c
index 5bc9509a1b98..ed888a2baab3 100644
--- a/udp_vu.c
+++ b/udp_vu.c
@@ -234,12 +234,13 @@ void udp_vu_sock_to_tap(const struct ctx *c, int s, int n, flow_sidx_t tosidx)
 		if (iov_cnt > 0) {
 			struct iov_tail data = IOV_TAIL(iov_vu, iov_cnt, VNET_HLEN);
 			size_t l4len = udp_vu_prepare(c, &data, toside, dlen);
-			if (*c->pcap) {
+			if (!vu_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM) ||
+			    *c->pcap)
 				udp_vu_csum(toside, &data, l4len);
+			vu_pad(iov_vu, iov_cnt, hdrlen + dlen);
+			if (*c->pcap)
 				pcap_iov(iov_vu, iov_cnt, VNET_HLEN,
 					 hdrlen + dlen - VNET_HLEN);
-			}
-			vu_pad(iov_vu, iov_cnt, hdrlen + dlen);
 			vu_flush(vdev, vq, elem, elem_used, hdrlen + dlen);
 			vu_queue_notify(vdev, vq);
 		}
diff --git a/vhost_user.c b/vhost_user.c
index f062badd3311..a1259c2624c0 100644
--- a/vhost_user.c
+++ b/vhost_user.c
@@ -322,6 +322,7 @@ static bool vu_get_features_exec(struct vu_dev *vdev,
 {
 	uint64_t features =
 		1ULL << VIRTIO_F_VERSION_1 |
+		1ULL << VIRTIO_NET_F_GUEST_CSUM |
 		1ULL << VIRTIO_NET_F_MRG_RXBUF |
 		1ULL << VHOST_F_LOG_ALL |
 		1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
-- 
2.53.0
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Fri, Apr 03, 2026 at 10:02:04AM +0200, Laurent Vivier wrote:
Add a --dhcp-boot option that populates the 'file' field in DHCP reply messages with the given filename.
Using --dhcp-boot together with --no-dhcp is rejected at startup.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson
---
 conf.c  | 10 ++++++++++
 dhcp.c  |  5 ++++-
 passt.1 |  6 ++++++
 passt.h |  1 +
 4 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/conf.c b/conf.c
index ae37bf96dac4..2cf1bc7420d3 100644
--- a/conf.c
+++ b/conf.c
@@ -984,6 +984,7 @@ static void usage(const char *name, FILE *f, int status)
 		" --no-udp Disable UDP protocol handler\n"
 		" --no-icmp Disable ICMP/ICMPv6 protocol handler\n"
 		" --no-dhcp Disable DHCP server\n"
+		" --dhcp-boot FILE DHCP boot filename option\n"
 		" --no-ndp Disable NDP responses\n"
 		" --no-dhcpv6 Disable DHCPv6 server\n"
 		" --no-ra Disable router advertisements\n"
@@ -1537,6 +1538,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		{"migrate-exit", no_argument, NULL, 29 },
 		{"migrate-no-linger", no_argument, NULL, 30 },
 		{"stats", required_argument, NULL, 31 },
+		{"dhcp-boot", required_argument, NULL, 32 },
 		{ 0 },
 	};
 	const char *optstring = "+dqfel:hs:F:I:p:P:m:a:n:M:g:i:o:D:S:H:461t:u:T:U:";
@@ -1775,6 +1777,11 @@ void conf(struct ctx *c, int argc, char **argv)
 				die("Can't display statistics if not running in foreground");
 			c->stats = strtol(optarg, NULL, 0);
 			break;
+		case 32:
+			if (snprintf_check(c->dhcp_boot, sizeof(c->dhcp_boot),
+					   "%s", optarg))
+				die("Invalid DHCP bootfile name: %s", optarg);
+			break;
 		case 'd':
 			c->debug = 1;
 			c->quiet = 0;
@@ -2206,6 +2213,9 @@ void conf(struct ctx *c, int argc, char **argv)
 		c->no_dhcpv6 = 1;
 	}

+	if (c->no_dhcp && c->dhcp_boot[0])
+		die("--dhcp-boot cannot be used with --no-dhcp");
+
 	get_dns(c);

 	if (!*c->pasta_ifn) {
diff --git a/dhcp.c b/dhcp.c
index 1ff8cba9f93d..7fdb6e051dec 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -344,6 +344,9 @@ int dhcp(const struct ctx *c, struct iov_tail *data)
 	    m->op != BOOTREQUEST)
 		return -1;

+	static_assert(sizeof(reply.file) == sizeof(c->dhcp_boot),
+		      "dhcp_boot must have the same size as reply.file");
+
 	reply.op = BOOTREPLY;
 	reply.htype = m->htype;
 	reply.hlen = m->hlen;
@@ -357,7 +360,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data)
 	reply.giaddr = m->giaddr;
 	memcpy(&reply.chaddr, m->chaddr, sizeof(reply.chaddr));
 	memset(&reply.sname, 0, sizeof(reply.sname));
-	memset(&reply.file, 0, sizeof(reply.file));
+	memcpy(&reply.file, c->dhcp_boot, sizeof(reply.file));
 	reply.magic = m->magic;

 	for (i = 0; i < ARRAY_SIZE(opts); i++)
diff --git a/passt.1 b/passt.1
index 13e8df9de9f3..a6ac2884bf3f 100644
--- a/passt.1
+++ b/passt.1
@@ -342,6 +342,12 @@ Disable the DHCP server. DHCP client requests coming from guest or target
 namespace will be silently dropped. Implied if there is no gateway on the
 selected IPv4 default route.

+.TP
+.BR \-\-dhcp-boot " " \fIfile
+Set the boot filename in DHCP replies to \fIfile\fR. This populates the
+\fIfile\fR field in BOOTP/DHCP reply messages, which can be used for
+network booting (PXE). Cannot be used with \fB--no-dhcp\fR.
+
 .TP
 .BR \-\-no-ndp
 Disable Neighbor Discovery. NDP messages coming from guest or target
diff --git a/passt.h b/passt.h
index 62b8dcdf0a41..4044a4c07ea3 100644
--- a/passt.h
+++ b/passt.h
@@ -255,6 +255,7 @@ struct ctx {

 	char hostname[PASST_MAXDNAME];
 	char fqdn[PASST_MAXDNAME];
+	char dhcp_boot[128];

 	int ifi6;
 	struct ip6_ctx ip6;
-- 
2.53.0