On Wed, 15 Jan 2025 17:22:30 +0100
Laurent Vivier <lvivier(a)redhat.com> wrote:

> Under some conditions, linux can provide several buffers
> in the same element (multiple entries in the iovec array).
>
> I didn't identify what changed between the kernel guest that
> provides one buffer and the one that provides several
> (doesn't seem to be a kernel change or a configuration change).

Perhaps memory pressure, or different page accounting between kernels?

> Fix the following assert:
>
> ASSERTION FAILED in virtqueue_map_desc (virtio.c:402): num_sg < max_num_sg
>
> What I can see is the buffer can be splitted in two iovecs:
> - vnet header
> - packet data
>
> This change manages this special case but the real fix will be to allow
> tap_add_packet() to manage iovec array.
>
> Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>

Applied. I just wonder, if it makes sense as a follow-up:

> ---
>  vu_common.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/vu_common.c b/vu_common.c
> index 6d365bea5fe2..431fba6be0c0 100644
> --- a/vu_common.c
> +++ b/vu_common.c
> @@ -18,6 +18,8 @@
>  #include "pcap.h"
>  #include "vu_common.h"
>
> +#define VU_MAX_TX_BUFFER_NB	2
> +
>  /**
>   * vu_packet_check_range() - Check if a given memory zone is contained in
>   * 			     a mapped guest memory region
> @@ -168,10 +170,15 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
>
>  	count = 0;
>  	out_sg_count = 0;
> -	while (count < VIRTQUEUE_MAX_SIZE) {
> +	while (count < VIRTQUEUE_MAX_SIZE &&
> +	       out_sg_count + VU_MAX_TX_BUFFER_NB <= VIRTQUEUE_MAX_SIZE) {
>  		int ret;
>
> -		vu_set_element(&elem[count], &out_sg[out_sg_count], NULL);
> +		elem[count].out_num = VU_MAX_TX_BUFFER_NB;
> +		elem[count].out_sg = &out_sg[out_sg_count];
> +		elem[count].in_num = 0;
> +		elem[count].in_sg = NULL;
> +
>  		ret = vu_queue_pop(vdev, vq, &elem[count]);
>  		if (ret < 0)
>  			break;
> @@ -181,11 +188,20 @@ static void vu_handle_tx(struct vu_dev *vdev, int index,
>  			warn("virtio-net transmit queue contains no out buffers");
>  			break;
>  		}
> -		ASSERT(elem[count].out_num == 1);
> +		if (elem[count].out_num == 1) {
> +			tap_add_packet(vdev->context,
> +				       elem[count].out_sg[0].iov_len - hdrlen,
> +				       (char *)elem[count].out_sg[0].iov_base +
> +				        hdrlen);
> +		} else {
> +			/* vnet header can be in a separate iovec */
> +			ASSERT(elem[count].out_num == 2);

I suppose we don't have strong guarantees about this. What about
discarding the packet with a debug() message, at least until we have a
more elegant solution, if this happens?

For UDP and ICMP, that's the best thing we can do. For TCP, we could
just discard a part of it, and the peer would tell our guest, but it's
surely not practical to look into the packet here, so dropping it
altogether would look reasonable.

> +			ASSERT(elem[count].out_sg[0].iov_len == (size_t)hdrlen);

And similarly here (with an err() message), even though there's
probably an issue in the hypervisor if this happens, but it doesn't
mean we're doomed.

> +			tap_add_packet(vdev->context,
> +				       elem[count].out_sg[1].iov_len,
> +				       (char *)elem[count].out_sg[1].iov_base);
> +		}
>
> -		tap_add_packet(vdev->context,
> -			       elem[count].out_sg[0].iov_len - hdrlen,
> -			       (char *)elem[count].out_sg[0].iov_base + hdrlen);
>  		count++;
>  	}
>  	tap_handler(vdev->context, now);

-- 
Stefano
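
For illustration only, the "discard instead of ASSERT()" follow-up discussed above could be shaped roughly like the sketch below. tx_frame_from_sg() is a hypothetical helper, not an existing passt function, and fprintf() merely stands in for debug()/err(). The length check in the single-buffer case is an extra assumption that is not part of the patch.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/uio.h>

/* Hypothetical helper (not in passt): find the frame payload in a popped TX
 * element. Returns 0 and sets *data and *len on success, -1 if the buffer
 * layout is not one of the two handled cases and the frame should be dropped.
 */
static int tx_frame_from_sg(const struct iovec *sg, unsigned int num,
			    size_t hdrlen, const char **data, size_t *len)
{
	if (num == 1) {
		/* vnet header and packet data share a single buffer */
		if (sg[0].iov_len < hdrlen) {	/* assumed extra check */
			fprintf(stderr, "TX buffer shorter than vnet header\n");
			return -1;
		}
		*data = (const char *)sg[0].iov_base + hdrlen;
		*len = sg[0].iov_len - hdrlen;
		return 0;
	}

	if (num != 2) {
		/* stand-in for debug(): unexpected layout, drop the frame */
		fprintf(stderr, "Dropping frame with %u out buffers\n", num);
		return -1;
	}

	if (sg[0].iov_len != hdrlen) {
		/* stand-in for err(): first buffer is not just the header */
		fprintf(stderr, "Dropping frame, first buffer is %zu bytes\n",
			sg[0].iov_len);
		return -1;
	}

	/* vnet header in the first iovec, packet data in the second */
	*data = (const char *)sg[1].iov_base;
	*len = sg[1].iov_len;
	return 0;
}
```

In vu_handle_tx(), the caller would then pass data and len to tap_add_packet() only when the helper returns 0, and otherwise skip the frame while still keeping the element so that it gets returned to the guest.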