On Mon, Jul 28, 2025 at 06:55:50PM +0200, Eugenio Perez Martin wrote:
On Thu, Jul 24, 2025 at 3:03 AM David Gibson
wrote: On Wed, Jul 09, 2025 at 07:47:46PM +0200, Eugenio Pérez wrote:
The vhost-kernel module is async by nature: the driver (pasta) places a few buffers in the virtqueue and the device (vhost-kernel) trust the
s/trust/trusts/
Fixing in the next version.
driver will not modify them until it uses them. To implement it is not possible with TCP at the moment, as tcp_buf trust it can reuse the buffers as soon as tcp_payload_flush() finish.
To achieve async let's make tcp_buf work with a circular ring, so vhost can transmit at the same time pasta is queing more data. When a buffer is received from a TCP socket, the element is placed in the ring and sock_head is moved: [][][][] ^ ^ | | | sock_head | tail tap_head
When the data is sent to vhost through the tx queue, tap_head is moved forward: [][][][] ^ ^ | | | sock_head | tap_head | tail
Finally, the tail move forward when vhost has used the tx buffers, so tcp_payload (and all lower protocol buffers) can be reused. [][][][] ^ | sock_head tap_head tail
This all sounds good. I wonder if it might be clearer to do this circular queue conversion as a separate patch series. I think it makes sense even without the context of vhost (it's closer to how most network things work).
Sure it can be done.
In the case of error queueing to the vhost virtqueue, sock_head moves backwards. The only possible error is that the queue is full, as
sock_head moves backwards? Or tap_head moves backwards?
Sock head moves backwards. Tap_head cannot move backwards as vhost does not have a way to report "the last X packets has not been sent".
Right, I realised that as I read further.
virtio-net does not report success on packet sending.
Starting as simple as possible, and only implementing the count variables in this patch so it keeps working as previously. The circular behavior will be added on top.
From ~16BGbit/s to ~13Gbit/s compared with write(2) to the tap.
I don't really understand what you're comparing here.
Sending through vhost-net vs write(2) to tap device.
Ok. That's a bit dissapointing.
Signed-off-by: Eugenio Pérez
--- tcp_buf.c | 63 +++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 23 deletions(-) diff --git a/tcp_buf.c b/tcp_buf.c index 242086d..0437120 100644 --- a/tcp_buf.c +++ b/tcp_buf.c @@ -53,7 +53,12 @@ static_assert(MSS6 <= sizeof(tcp_payload[0].data), "MSS6 is greater than 65516")
/* References tracking the owner connection of frames in the tap outqueue */ static struct tcp_tap_conn *tcp_frame_conns[TCP_FRAMES_MEM]; -static unsigned int tcp_payload_used; +static unsigned int tcp_payload_sock_used, tcp_payload_tap_used;
I think the "payload" here is a hangover from when we had separate queues for flags-only and data-containing packets. We can probably drop it and make a bunch of names shorter.
Maybe we can short even more if we isolate this in its own circular_buffer.h or equivalent. UDP will also need it.
Maybe, yes. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson