On Tue, Apr 9, 2024 at 5:28 PM
From: Jon Maloy
When reading received messages from a socket with MSG_PEEK, we may want to read the contents with an offset, like we can do with pread/preadv() when reading files. Currently, it is not possible to do that.
In this commit, we add support for the SO_PEEK_OFF socket option for TCP, in a similar way it is done for Unix Domain sockets.
In the iperf3 log examples shown below, we can observe a throughput improvement of 15-20 % in the direction host->namespace when using the protocol splicer 'pasta' (https://passt.top). This is a consistent result.
pasta(1) and passt(1) implement user-mode networking for network namespaces (containers) and virtual machines by means of a translation layer between Layer-2 network interface and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo).
Received, pending TCP data to the container/guest is kept in kernel buffers until acknowledged, so the tool routinely needs to fetch new data from socket, skipping data that was already sent.
Suggested-by: Paolo Abeni
Reviewed-by: Stefano Brivio Signed-off-by: Jon Maloy
Reviewed-by: Eric Dumazet