On Mon, 25 Sep 2023 14:57:40 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> On Sat, Sep 23, 2023 at 12:06:10AM +0200, Stefano Brivio wrote:
> > Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
> > ---
> >  passt.1 | 33 +++++++++++++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> >
> > diff --git a/passt.1 b/passt.1
> > index 1ad4276..bcbe6fd 100644
> > --- a/passt.1
> > +++ b/passt.1
> > @@ -926,6 +926,39 @@ If the sending window cannot be queried, it will always be announced as the
> >  current sending buffer size to guest or target namespace. This might affect
> >  throughput of TCP connections.
> >
> > +.SS Tuning for high throughput
> > +
> > +On Linux, by default, the maximum memory that can be set for receive and send
> > +socket buffers is 208 KiB. Those limits are set by the
> > +\fI/proc/sys/net/core/rmem_max\fR and \fI/proc/sys/net/core/wmem_max\fR files,
> > +see \fBsocket\fR(7).
> > +
> > +As of Linux 6.5, while the TCP implementation can dynamically shrink buffers
> > +depending on utilisation even above those limits, such a small limit will
>
> "shrink buffers" and "even above those limits" don't seem to quite
> work together.

Oops. I guess I should simply s/shrink/grow/ here.

If we don't use SO_RCVBUF, yes... but we currently do, and with that,
we can get a much larger initial window (as we do now).

On the other hand, maybe, as mentioned in my follow-up about 3/5, we
should drop SO_RCVBUF for TCP sockets.

> > +reflect on the advertised TCP window at the beginning of a connection, and the
>
> Hmmm.... while [rw]mem_max might limit that initial window size, I
> wouldn't expect increasing the limits alone to increase that initial
> window size: wouldn't that instead be affected by the TCP default
> buffer size, i.e. the middle value in net.ipv4.tcp_rmem?

Right. Let's keep this patch for a later time then, and meanwhile check
if we should drop SO_RCVBUF, SO_SNDBUF, or both, for TCP sockets.

-- 
Stefano

> > +buffer size of the UNIX domain socket buffer used by \fBpasst\fR cannot exceed
> > +these limits anyway.
> > +
> > +Further, as of Linux 6.5, using socket options \fBSO_RCVBUF\fR and
> > +\fBSO_SNDBUF\fR will prevent TCP buffers from expanding above the \fIrmem_max\fR and
> > +\fIwmem_max\fR limits because the automatic adjustment provided by the TCP
> > +implementation is then disabled.
> > +
> > +As a consequence, \fBpasst\fR and \fBpasta\fR probe these limits at start-up and
> > +will not set TCP socket buffer sizes if they are lower than 2 MiB, because this
> > +would affect the maximum size of TCP buffers for the whole duration of a
> > +connection.
> > +
> > +Note that 208 KiB is, accounting for kernel overhead, enough to fit less than
> > +three TCP packets at the default MSS. In applications where high throughput is
> > +expected, it is therefore advisable to increase those limits to at least 2 MiB,
> > +or even 16 MiB:
> > +
> > +.nf
> > +	sysctl -w net.core.rmem_max=$((16 << 20))
> > +	sysctl -w net.core.wmem_max=$((16 << 20))
> > +.fi
>
> As noted in a previous mail, empirically, this doesn't necessarily
> seem to work better for me.
>
> I'm wondering if we'd be better off never touching RCVBUF and SNDBUF
> for TCP sockets, and letting the kernel do its adaptive thing. We
> probably still want to expand the buffers as much as we can for the
> Unix socket, though.
>
> And we likely still want expanded limits for the tests so that iperf3
> can use large buffers.
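For reference, a quick way to inspect the knobs discussed above (a sketch,
not part of the patch; see tcp(7) and socket(7) for the authoritative
description):

  # min, default, max (bytes): the middle value is the per-socket default
  # David refers to, the third one caps the kernel's TCP autotuning
  sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem

  # limits applied when a process sets SO_RCVBUF/SO_SNDBUF explicitly,
  # i.e. the values the patch suggests raising
  sysctl net.core.rmem_max net.core.wmem_max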
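And, on the test side, iperf3 can request large socket buffers explicitly
with -w; an illustrative invocation (the 8M figure is only an example, not a
value taken from this thread):

  iperf3 -s -D                       # server, daemonised
  iperf3 -c 127.0.0.1 -w 8M -t 30    # client asking for an 8 MiB socket buffer

With the default rmem_max/wmem_max, the kernel caps such a request at the
configured limit, which is why the tests still want those limits raised.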