[PATCH v2 00/13] vhost-user,udp: Handle multiple iovec entries per virtqueue element
Some virtio-net drivers (notably iPXE) provide descriptors where the
vnet header and the frame payload are in separate buffers, resulting in
two iovec entries per virtqueue element. Currently, the RX (host to
guest) path assumes a single iovec per element, which triggers:

  ASSERTION FAILED in virtqueue_map_desc (virtio.c:403): num_sg < max_num_sg

This series reworks the UDP vhost-user receive path to support multiple
iovec entries per element, fixing the iPXE crash.

This series only addresses the UDP path. TCP vhost-user will be updated
to use multi-iov elements in a subsequent series.

v2:
  - add iov_truncate(), iov_memset()
  - remove iov_tail_truncate() and iov_tail_zero_end()
  - manage 802.3 minimum frame size

Laurent Vivier (13):
  iov: Add iov_truncate() helper and use it in vu handlers
  vhost-user: Centralise 802.3 frame padding in vu_collect() and vu_flush()
  vhost-user: Use ARRAY_SIZE(elem) instead of VIRTQUEUE_MAX_SIZE
  udp_vu: Use iov_tail to manage virtqueue buffers
  udp_vu: Move virtqueue management from udp_vu_sock_recv() to its caller
  iov: Add IOV_PUT_HEADER() to write header data back to iov_tail
  udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()
  udp_vu: Use iov_tail in udp_vu_prepare()
  vu_common: Pass iov_tail to vu_set_vnethdr()
  vu_common: Accept explicit iovec counts in vu_set_element()
  vu_common: Accept explicit iovec count per element in vu_init_elem()
  vu_common: Prepare to use multibuffer with guest RX
  vhost-user,udp: Use 2 iovec entries per element

 iov.c          |  69 +++++++++++++++++
 iov.h          |  17 ++++-
 tcp_vu.c       |  73 +++++++++---------
 udp.c          |  72 +++++++++---------
 udp_internal.h |  10 ++-
 udp_vu.c       | 197 +++++++++++++++++++++++++------------------------
 vu_common.c    | 127 +++++++++++++++++++------------
 vu_common.h    |  20 +++--
 8 files changed, 356 insertions(+), 229 deletions(-)

-- 
2.53.0
Add a generic iov_truncate() function that truncates an IO vector to a
given number of bytes, returning the number of iov entries that contain
data after truncation.
Use it in udp_vu_sock_recv() and tcp_vu_sock_recv() to replace the
open-coded truncation logic that adjusted iov entries after recvmsg().
Also convert the direct iov_len assignment in tcp_vu_send_flag() to use
iov_truncate() for consistency.
Add an ASSERT() in tcp_vu_data_from_sock() to quiet the Coverity error:
  passt/tcp_vu.c:457:3:
    19. overflow_const: Expression "dlen + hdrlen", where "dlen" is
    known to be equal to -86, and "hdrlen" is known to be equal to 86,
    underflows the type of "dlen + hdrlen", which is type "unsigned long".
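As a rough sketch of the intended semantics (not necessarily the exact
implementation added in iov.c; boundary behaviour here is an assumption),
iov_truncate() shrinks the entry that crosses the limit and reports how
many entries still carry data:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Sketch: limit the total length described by iov[0..n) to 'bytes',
 * shrinking the entry that crosses the limit, and return the number
 * of entries that still contain data after truncation. */
static size_t iov_truncate(struct iovec *iov, size_t n, size_t bytes)
{
	size_t i;

	for (i = 0; i < n && bytes; i++) {
		if (iov[i].iov_len > bytes)
			iov[i].iov_len = bytes;
		bytes -= iov[i].iov_len;
	}

	return i;
}
```

This replaces the open-coded pattern of walking the array after
recvmsg() and patching the last entry's iov_len by hand.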
Signed-off-by: Laurent Vivier
The per-protocol padding done by vu_pad() in tcp_vu.c and udp_vu.c was
only correct for single-buffer frames, and assumed the padding area always
fell within the first iov. It also relied on each caller computing the
right MAX(..., ETH_ZLEN + VNET_HLEN) size for vu_collect() and calling
vu_pad() at the right point.
Centralise padding logic into the two shared vhost-user helpers instead:
- vu_collect() now ensures at least ETH_ZLEN + VNET_HLEN bytes of buffer
space are collected, so there is always room for a minimum-sized frame.
- vu_flush() computes the actual frame length (accounting for
VIRTIO_NET_F_MRG_RXBUF multi-buffer frames) and passes the padded
length to vu_queue_fill().
A new iov_memset() helper in iov.c zero-fills the padding area in each
buffer before iov_truncate() sets the logical frame size. The callers in
tcp_vu.c, udp_vu.c and vu_send_single() use iov_memset() directly,
replacing the now-removed vu_pad() helper and the MAX(..., ETH_ZLEN +
VNET_HLEN) size calculations passed to vu_collect().
Centralising padding here will also ease the move to multi-iovec per
element support, since there will be a single place to update.
In vu_send_single(), fix padding, truncation and data copy to use the
requested frame size rather than the total available buffer space from
vu_collect(), which could be larger. Also add matching padding, truncation
and explicit size to vu_collect() for the DUP_ACK path in
tcp_vu_send_flag().
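A minimal sketch of an iov_memset() of the usual shape (the signature
follows QEMU's helper of the same name; the version added in iov.c may
differ in details):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Sketch: fill 'bytes' bytes of the area described by iov[0..n),
 * starting 'offset' bytes in, with 'fillc'; return the number of
 * bytes actually set (less than 'bytes' if the iovec is shorter). */
static size_t iov_memset(const struct iovec *iov, size_t n,
			 size_t offset, int fillc, size_t bytes)
{
	size_t done = 0, i;

	for (i = 0; i < n && done < bytes; i++) {
		size_t len = iov[i].iov_len;

		if (offset >= len) {	/* this entry is fully skipped */
			offset -= len;
			continue;
		}
		len -= offset;
		if (len > bytes - done)
			len = bytes - done;
		memset((char *)iov[i].iov_base + offset, fillc, len);
		done += len;
		offset = 0;
	}

	return done;
}
```

Callers zero the padding area with iov_memset(iov, n, frame_len, 0,
pad_len) before iov_truncate() sets the logical frame size, so short
frames are padded to ETH_ZLEN without assuming the padding fits in the
first buffer.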
Signed-off-by: Laurent Vivier
When passing the element count to vu_init_elem() or vu_collect(), or
using it as a loop bound, use ARRAY_SIZE(elem) instead of the
VIRTQUEUE_MAX_SIZE constant.
No functional change.
Signed-off-by: Laurent Vivier
Replace direct iovec pointer arithmetic in UDP vhost-user handling with
iov_tail operations.
udp_vu_sock_recv() now takes an iov/cnt pair instead of using the
file-scoped iov_vu array, and returns the data length rather than the
iov count. Internally it uses iov_drop_header() to skip past L2/L3/L4
headers before receiving, and iov_tail_clone() to build the recvmsg()
iovec, removing the manual pointer offset and restore pattern.
udp_vu_prepare() and udp_vu_csum() take a const struct iov_tail *
instead of referencing iov_vu directly, making data flow explicit.
udp_vu_csum() uses iov_drop_header() and IOV_REMOVE_HEADER() to locate
the UDP header and payload, replacing manual offset calculations via
vu_payloadv4()/vu_payloadv6().
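The header-skipping step can be sketched as follows, assuming an
iov_tail of the usual shape (an iovec array plus an offset of bytes
already consumed; field names here are stand-ins, not passt's exact
definition):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/* Minimal stand-in for passt's iov_tail (shape assumed). */
struct iov_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Sketch: drop 'len' leading bytes from the tail, as the
 * header-skipping helper does for L2/L3/L4 headers before
 * the payload is received into the remaining space. */
static bool iov_tail_drop(struct iov_tail *tail, size_t len)
{
	size_t total = 0, i;

	for (i = 0; i < tail->cnt; i++)
		total += tail->iov[i].iov_len;

	if (tail->off + len > total)
		return false;	/* not enough room left */

	tail->off += len;
	return true;
}
```

Because only the offset moves, there is no need for the previous
pattern of saving, shifting and restoring iov_base/iov_len pointers
around recvmsg().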
Signed-off-by: Laurent Vivier
udp_vu_sock_recv() currently mixes two concerns: receiving data from the
socket and managing virtqueue buffers (collecting, rewinding, releasing).
This makes the function harder to reason about and couples socket I/O
with virtqueue state.
Move all virtqueue operations (vu_collect(), vu_init_elem(),
vu_queue_rewind(), vu_set_vnethdr(), and the queue-readiness check)
into udp_vu_sock_to_tap(), which is the only caller. This turns
udp_vu_sock_recv() into a pure socket receive function that simply reads
into the provided iov array and adjusts its length.
Signed-off-by: Laurent Vivier
Add a counterpart to IOV_PEEK_HEADER() that writes header data back
to an iov_tail after modification. If the header pointer matches the
original iov buffer location, it only advances the offset. Otherwise,
it copies the data using iov_from_buf().
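The put-back rule can be sketched as below, covering only the case
where the header fits in the first buffer (the real macro goes through
iov_from_buf() and also handles headers split across buffers; the
iov_tail shape here is assumed):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Minimal stand-in for passt's iov_tail (shape assumed). */
struct iov_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Sketch of the IOV_PUT_HEADER() rule: if 'hdr' already points into
 * the tail's buffer, the header was edited in place, so only advance
 * the offset; otherwise copy the modified bytes back first. */
static bool iov_put_header(struct iov_tail *tail, const void *hdr,
			   size_t len)
{
	char *base = (char *)tail->iov[0].iov_base + tail->off;

	if (tail->iov[0].iov_len - tail->off < len)
		return false;	/* split headers not handled here */

	if ((const char *)hdr != base)
		memcpy(base, hdr, len);

	tail->off += len;
	return true;
}
```

This mirrors IOV_PEEK_HEADER(), which either returns a direct pointer
into the buffer or fills a caller-provided bounce copy.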
Signed-off-by: Laurent Vivier
Change udp_update_hdr4() and udp_update_hdr6() to take a separate
struct udphdr pointer and an iov_tail for the payload, instead of a
struct udp_payload_t pointer and an explicit data length.
This decouples the header update functions from the udp_payload_t memory
layout, which assumes all headers and data sit in a single contiguous
buffer. The vhost-user path uses virtqueue-provided scatter-gather
buffers where this assumption does not hold; passing an iov_tail lets
both the tap path and the vhost-user path share the same functions
without casting through layout-specific helpers.
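A sketch of the new shape, with the UDP length derived from the tail
rather than from an explicit dlen parameter (ports, checksum and the
iov_tail definition are illustrative stand-ins, not passt's code):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/udp.h>
#include <stddef.h>
#include <sys/uio.h>

/* Minimal stand-in for passt's iov_tail (shape assumed). */
struct iov_tail {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Bytes remaining in the tail: total iovec length minus offset. */
static size_t iov_tail_size(const struct iov_tail *t)
{
	size_t total = 0, i;

	for (i = 0; i < t->cnt; i++)
		total += t->iov[i].iov_len;

	return total - t->off;
}

/* Sketch: the header is a separate struct udphdr *, the payload an
 * iov_tail, so no contiguous udp_payload_t layout is assumed. */
static void udp_update_hdr_sketch(struct udphdr *uh,
				  const struct iov_tail *data)
{
	uh->uh_ulen = htons(sizeof(*uh) + iov_tail_size(data));
}
```

The tap path can still pass a single-entry iov_tail over its
contiguous buffer, so both paths share one implementation.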
Signed-off-by: Laurent Vivier
Rework udp_vu_prepare() to use IOV_REMOVE_HEADER() and IOV_PUT_HEADER()
to walk through Ethernet, IP and UDP headers instead of the layout-specific
helpers (vu_eth(), vu_ip(), vu_payloadv4(), vu_payloadv6()) that assume a
contiguous buffer. The payload length is now implicit in the iov_tail, so
drop the dlen parameter.
Signed-off-by: Laurent Vivier
Refactor vu_set_vnethdr() to take an iov_tail pointer instead of a
direct pointer to the virtio_net_hdr_mrg_rxbuf structure.
This makes the function use IOV_PEEK_HEADER() and IOV_PUT_HEADER()
to read and write the virtio-net header through the iov_tail abstraction.
Signed-off-by: Laurent Vivier
Previously, vu_set_element() derived the number of iovec entries from
whether the pointer was NULL or not (using !!out_sg and !!in_sg). This
implicitly limited each virtqueue element to at most one iovec per
direction.
Change the function signature to accept explicit out_num and in_num
parameters, allowing callers to specify multiple iovec entries per
element when needed. Update all existing call sites to pass the
equivalent values (0 for NULL pointers, 1 for valid pointers).
No functional change.
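The signature change can be sketched as below (the element structure
is a minimal stand-in for the vhost-user type, not its exact
definition):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Minimal stand-in for the vhost-user element (shape assumed). */
struct vu_virtq_element {
	unsigned int out_num, in_num;
	struct iovec *out_sg, *in_sg;
};

/* Sketch of the new signature: iovec counts are explicit instead of
 * being derived from pointer nullness (!!out_sg / !!in_sg), so a
 * caller may attach several entries per direction. */
static void vu_set_element(struct vu_virtq_element *elem,
			   struct iovec *out_sg, unsigned int out_num,
			   struct iovec *in_sg, unsigned int in_num)
{
	elem->out_num = out_num;
	elem->out_sg = out_sg;
	elem->in_num = in_num;
	elem->in_sg = in_sg;
}
```

Existing call sites pass (NULL, 0) or (ptr, 1), matching the old
behaviour exactly.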
Signed-off-by: Laurent Vivier
Commit 1b95bd6fa114 ("vhost_user: fix multibuffer from linux")
introduced multi-buffer handling on the TX path (from the guest), but
iPXE also requires multi-buffer handling on the RX path (to the
guest). Make the parameter generic and global in preparation.
No functional change.
Signed-off-by: Laurent Vivier
Extend vu_init_elem() to accept an iov_per_elem parameter specifying
how many iovec entries to assign to each virtqueue element. The iov
array is now strided by iov_per_elem rather than 1.
Update all callers to pass 1, preserving existing behavior.
No functional change.
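The strided assignment can be sketched as below (element structure is
a minimal stand-in, as is the RX-only in_sg assignment):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Minimal stand-in for the vhost-user element (shape assumed). */
struct vu_virtq_element {
	unsigned int out_num, in_num;
	struct iovec *out_sg, *in_sg;
};

/* Sketch: hand out iov_per_elem entries of the flat iov array to
 * each element, striding by iov_per_elem instead of the previous
 * fixed stride of 1. */
static void vu_init_elem(struct vu_virtq_element *elem,
			 struct iovec *iov, int elem_cnt,
			 int iov_per_elem)
{
	int i;

	for (i = 0; i < elem_cnt; i++) {
		elem[i].out_num = 0;
		elem[i].out_sg = NULL;
		elem[i].in_num = iov_per_elem;
		elem[i].in_sg = &iov[i * iov_per_elem];
	}
}
```

With iov_per_elem == 1 the layout is identical to the old code, which
is why all current callers are unaffected.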
Signed-off-by: Laurent Vivier
iPXE places the vnet header in one virtqueue descriptor and the payload
in another. When passt maps these descriptors, it needs two iovecs per
virtqueue element to handle this layout.
Without this, passt crashes with:
ASSERTION FAILED in virtqueue_map_desc (virtio.c:403): num_sg < max_num_sg
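The resulting element layout for an iPXE-style guest can be sketched
as below (buffer sizes are illustrative; 12 bytes matches struct
virtio_net_hdr_mrg_rxbuf):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>

/* Illustrative sizes only. */
#define VNET_HDR_LEN	12	/* struct virtio_net_hdr_mrg_rxbuf */
#define FRAME_LEN	1514	/* maximum non-jumbo Ethernet frame */

/* Sketch: an iPXE-style RX element maps to two iovec entries, one
 * per guest descriptor: the vnet header, then the frame payload. */
static void map_ipxe_element(struct iovec in_sg[2],
			     void *vnet_hdr_buf, void *frame_buf)
{
	in_sg[0].iov_base = vnet_hdr_buf;
	in_sg[0].iov_len = VNET_HDR_LEN;
	in_sg[1].iov_base = frame_buf;
	in_sg[1].iov_len = FRAME_LEN;
}
```

With max_num_sg raised to 2 per element, virtqueue_map_desc() can map
both descriptors instead of tripping the assertion.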
Signed-off-by: Laurent Vivier