Re: [PATCH 5/5] tap: Improve handling of partially received frames on qemu socket

26 Jul 2024

On Fri, Jul 26, 2024 at 01:39:13PM +0200, Stefano Brivio wrote:
...
On Fri, 26 Jul 2024 17:20:31 +1000
David Gibson wrote:
...
Because the Unix socket to qemu is a stream socket, we have no guarantee
of where the boundaries between recv() calls will lie.  Typically they
will lie on frame boundaries, because that's how qemu will send them, but
we can't rely on it.
Currently we handle this case by detecting when we have received a partial
frame and performing a blocking recv() to get the remainder, and only then
processing the frames. Change it so instead we save the partial frame
persistently and include it as the first thing processed next time we
receive data from the socket.  This handles a number of (unlikely) cases
which previously would not be dealt with correctly:
* If qemu sent a partial frame then waited some time before sending the
  remainder, previously we could block here for an unacceptably long time
* If qemu sent a tiny partial frame (< 4 bytes) we'd leave the loop without
  doing the partial frame handling, which would put us out of sync with
  the stream from qemu
* If the blocking recv() only received some of the remainder of the
  frame, not all of it, we'd return, leaving us out of sync with the
  stream again
Caveat: This could memmove() a moderate amount of data (ETH_MAX_MTU).  This
is probably acceptable because it's an unlikely case in practice.  If
necessary we could mitigate this by using a true ring buffer.
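For reference, a minimal C sketch of the approach described above; it is not the patch's actual code. It assumes, as the "< 4 bytes" case implies, that qemu prefixes each frame with a 4-byte big-endian length. struct framebuf, framebuf_feed(), handle_frame(), frames_seen, FRAME_MAX_LEN and BUF_SIZE are all hypothetical names, and memcpy() from a caller-supplied chunk stands in for recv() into the free tail of the buffer. Every complete frame is handled; whatever remains, be it a partial body or a partial header, is memmove()d to the front and completed by the next call, so nothing ever blocks.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap on the length field, matching ETH_MAX_MTU (0xFFFF) */
#define FRAME_MAX_LEN	65535U
/* Hypothetical buffer size: room for a partial frame plus new data */
#define BUF_SIZE	(4 * (FRAME_MAX_LEN + sizeof(uint32_t)))

struct framebuf {
	uint8_t data[BUF_SIZE];
	size_t len;		/* bytes held; any partial frame is at the start */
};

static size_t frames_seen;	/* hypothetical stand-in for real processing */

static void handle_frame(const uint8_t *frame, uint32_t flen)
{
	(void)frame;
	(void)flen;
	frames_seen++;
}

/* Append one chunk (memcpy() standing in for recv() into the free tail),
 * handle every complete frame, and keep the leftover for the next call.
 * Returns -1 if the chunk doesn't fit or a length field is bogus (which
 * would mean we are out of sync with the stream). */
static int framebuf_feed(struct framebuf *fb, const uint8_t *chunk, size_t n)
{
	size_t off = 0;

	if (n > sizeof(fb->data) - fb->len)
		return -1;
	memcpy(fb->data + fb->len, chunk, n);
	fb->len += n;

	while (fb->len - off >= sizeof(uint32_t)) {
		uint32_t flen;

		memcpy(&flen, fb->data + off, sizeof(flen));
		flen = ntohl(flen);
		if (flen > FRAME_MAX_LEN)
			return -1;
		if (fb->len - off - sizeof(flen) < flen)
			break;		/* body incomplete: wait for more data */
		handle_frame(fb->data + off + sizeof(flen), flen);
		off += sizeof(flen) + flen;
	}

	/* Leftover is a partial header (< 4 bytes) or a partial frame:
	 * this is the memmove() the caveat refers to */
	assert(off <= fb->len);
	memmove(fb->data, fb->data + off, fb->len - off);
	fb->len -= off;
	return 0;
}
```

Feeding two frames one byte at a time, or in chunks that split a length header, ends with both frames handled and fb.len back at 0; only the leftover bytes of at most one frame are ever moved per call.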
I don't think that that memmove() is a problem if we have a single
recv(), even if it happens to be one memmove() for every recv() (guest
filling up the buffer, common in throughput tests and bulk transfers),
because it's very small in relative terms anyway.
I think the ring buffer would be worth it with multiple recv() calls per
epoll wakeup, with EPOLLET.
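A true ring buffer would leave the partial frame where it is and only advance a consumer index. A rough C sketch under stated assumptions: the names are hypothetical, RING_SIZE is a power of two only to make index wrapping cheap, the same 4-byte big-endian length prefix is assumed, and ring_write() stands in for recv() (a real version might use recvmsg() with two iovecs for the free space on either side of the wrap). The cost moves elsewhere: a frame straddling the end of the array is no longer contiguous, and this sketch simply copies each frame out, which a real implementation would probably want to avoid.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4096	/* hypothetical; must be a power of two */

struct ring {
	uint8_t data[RING_SIZE];
	size_t head;	/* total bytes written (free-running) */
	size_t tail;	/* total bytes consumed (free-running) */
};

static size_t ring_used(const struct ring *r)
{
	return r->head - r->tail;
}

/* Stand-in for recv(): copy n bytes in, wrapping at the end of data[] */
static int ring_write(struct ring *r, const uint8_t *src, size_t n)
{
	size_t pos = r->head & (RING_SIZE - 1);
	size_t first = n < RING_SIZE - pos ? n : RING_SIZE - pos;

	if (n > RING_SIZE - ring_used(r))
		return -1;
	memcpy(r->data + pos, src, first);
	if (n > first)
		memcpy(r->data, src + first, n - first);
	r->head += n;
	return 0;
}

/* Copy n bytes starting off bytes past tail, without consuming them */
static void ring_peek(const struct ring *r, size_t off, uint8_t *dst, size_t n)
{
	size_t pos = (r->tail + off) & (RING_SIZE - 1);
	size_t first = n < RING_SIZE - pos ? n : RING_SIZE - pos;

	assert(off + n <= ring_used(r));
	memcpy(dst, r->data + pos, first);
	if (n > first)
		memcpy(dst + first, r->data, n - first);
}

/* Pop one complete frame into dst and return its length; return -1 if
 * only part of a frame (or of its header) is buffered, -2 if it won't fit.
 * A partial frame is never moved: it simply stays between tail and head. */
static long ring_pop_frame(struct ring *r, uint8_t *dst, size_t dstlen)
{
	uint32_t flen;

	if (ring_used(r) < sizeof(flen))
		return -1;
	ring_peek(r, 0, (uint8_t *)&flen, sizeof(flen));
	flen = ntohl(flen);
	if (flen > dstlen)
		return -2;
	if (ring_used(r) - sizeof(flen) < flen)
		return -1;
	ring_peek(r, sizeof(flen), dst, flen);
	r->tail += sizeof(flen) + flen;
	return (long)flen;
}
```

With head and tail placed 3 bytes before the end of the array, a frame whose length header straddles the wrap is reported as partial until its body arrives, and is then popped intact without any bytes having been shifted.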
First, as noted on the earlier patch, I don't think multiple recv()s
per wakeup require EPOLLET, though the reverse is true: with EPOLLET
we have to keep calling recv() until it fails with EAGAIN.
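The asymmetry comes from edge-triggered semantics: with EPOLLET, epoll only reports new arrivals, so a handler must drain the socket until recv() fails with EAGAIN, whereas a level-triggered handler is free to do one recv() or several per wakeup. A small Linux-only C sketch of that draining loop; watch_edge_triggered() and drain() are hypothetical helper names, and the received data is simply discarded.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: watch fd for input, edge-triggered */
static int watch_edge_triggered(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLET, .data.fd = fd };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* With EPOLLET, keep calling recv() until it fails with EAGAIN; stopping
 * early could leave queued data that never triggers another wakeup.
 * Returns bytes read, or -1 on error; *ncalls counts successful recv()s. */
static ssize_t drain(int fd, char *buf, size_t chunk, size_t *ncalls)
{
	ssize_t total = 0;

	assert(chunk > 0);
	for (;;) {
		ssize_t n = recv(fd, buf, chunk, MSG_DONTWAIT);

		if (n < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return total;	/* drained: safe to wait again */
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return total;		/* peer closed the connection */
		(*ncalls)++;
		total += n;
	}
}
```

With 100 bytes queued on a Unix stream socket and 16-byte reads, drain() needs several recv() calls within the one wakeup, and a following epoll_wait() with a zero timeout reports nothing, since no new data has arrived.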

Regardless, AFAICT the proportion of memmove()s to data received would
be the same whether we do multiple recv()s per wakeup or the same
number of recv()s spread across multiple wakeups.

Of course, if the recv()s line up with frame boundaries, as we expect,
then it doesn't matter anyway, since we won't get partial frames and
we won't need memmove()s.
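To make the proportionality argument concrete, a toy C model (hypothetical, not passt code) that counts how many bytes the memmove() approach would copy, given the on-wire frame sizes and the sizes successive recv() calls return. How those recv()s are grouped into epoll wakeups is not even an input, and recv()s that line up with frame boundaries give zero.

```c
#include <assert.h>
#include <stddef.h>

/* frames[]: on-wire frame sizes (4-byte length header included);
 * chunks[]: sizes returned by successive recv() calls.
 * After each recv(), the incomplete tail is what gets memmove()d to the
 * front of the buffer; return the total number of bytes moved. */
static size_t memmoved_bytes(const size_t *frames, size_t nframes,
			     const size_t *chunks, size_t nchunks)
{
	size_t received = 0, complete = 0, f = 0, moved = 0;

	for (size_t c = 0; c < nchunks; c++) {
		received += chunks[c];
		/* consume every frame that now ends within received data */
		while (f < nframes && complete + frames[f] <= received)
			complete += frames[f++];
		assert(complete <= received);
		moved += received - complete;
	}
	return moved;
}
```

With made-up sizes {64, 1518, 64}, recv()s of exactly those sizes move nothing, while recv()s of 100 and 1546 bytes move the 36-byte head of the second frame once.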

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson