On Mon, 17 Feb 2025 14:49:39 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Sun, Feb 16, 2025 at 11:12:15PM +0100, Stefano Brivio wrote:Yeah, I was undecided as well, then I tested and tested, and I realised that commit 904b86ade7db ("tcp: Rework window handling, timers, add SO_RCVLOWAT and pools for sockets/pipes") added this gem, still there: if (read >= (long)c->tcp.pipe_size * 10 / 100) continue; if (!bitmap_isset(rcvlowat_set, conn - ts) && read > (long)c->tcp.pipe_size / 10) { int lowat = c->tcp.pipe_size / 4; setsockopt(move_from, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat)); which means that we'll not set SO_RCVLOWAT anyway, because if read > c->tcp.pipe_size / 10, we'll skip the second block, and if not, we'll skip it anyway. Now, I have a clear memory of characterising those 10% and 25% values over a wide range of pipe sizes, message sizes, etc. Other than SSH and installing packages in a container (and check that nothing gets stuck for ~one second), my basic test idea was: $ iperf3 -c localhost -p 5202 -l 100 with port 5202 forwarded to the network namespace and iperf3 server listening there. I think I added another typo while cleaning up. I probably meant: [...] read < (long)c->tcp.pipe_size / 10) { ...and if I do that, it finally works. But anyway, it's too many typos at this point and especially we never had a release with it enabled, so I'm not "fixing" this for the moment, it needs a lot more testing than I can do now. -- StefanoIn commit e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation"), this: if (!bitmap_isset(rcvlowat_set, conn - ts) && readlen > (long)c->tcp.pipe_size / 10) { (note the !) became: if (conn->flags & lowat_set_flag && readlen > (long)c->tcp.pipe_size / 10) { in the new tcp_splice_sock_handler(). We want to check, there, if we should set SO_RCVLOWAT, only if we haven't set it already. But, instead, we're checking if it's already set before we set it, so we'll never set it, of course. Fix the check and re-enable the functionality, which should give us improved CPU utilisation in non-interactive cases where we are not transferring at full pipe capacity. Fixes: e5eefe77435a ("tcp: Refactor to use events instead of states, split out spliced implementation") Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>Ouch. Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> At least insofar as this clearly corrects towards the intended behaviour. Given that we inadvertently bee using RCVLOWAT for so long, I am a bit worried that this might expose deadlocks or stalls. But, I guess we debug that when we come to it.