On Fri, Jan 17, 2025 at 10:34:05AM +0100, Stefano Brivio wrote:
> Following up on 725acd111ba3 ("tcp_splice: Set (again) TCP_NODELAY on
> both sides"), David argues that, in general, we don't know what kind
> of TCP traffic we're dealing with, on any side or path.
> 
> TCP segments might have been delivered to our socket with a PSH flag,
> but we don't have a way to know about it.
> 
> Similarly, the guest might send us segments with PSH or URG set, but
> we don't know if we should generally TCP_CORK sockets and uncork on
> those flags, because that would assume they're running a Linux kernel
> (and a particular version of it) matching the kernel that delivers
> outbound packets for us.
> 
> Given that we can't make any assumption and everything might very
> well be interactive traffic, disable Nagle's algorithm on all
> non-spliced sockets as well.
> 
> After all, John Nagle himself is nowadays recommending that delayed
> ACKs should never be enabled together with his algorithm, but we
> don't have a practical way to ensure that our environment is free
> from delayed ACKs (TCP_QUICKACK is not really usable for this
> purpose):
> 
>   https://news.ycombinator.com/item?id=34180239
> 
> Suggested-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
>  tcp.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/tcp.c b/tcp.c
> index 3b3193a..c570f42 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -756,6 +756,19 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s)
>  		trace("TCP: failed to set SO_SNDBUF to %i", v);
>  }
>  
> +/**
> + * tcp_sock_set_nodelay() - Set TCP_NODELAY option (disable Nagle's algorithm)
> + * @s:		Socket, can be -1 to avoid check in the caller
> + */
> +static void tcp_sock_set_nodelay(int s)
> +{
> +	if (s == -1)
> +		return;
> +
> +	if (setsockopt(s, SOL_TCP, TCP_NODELAY, &((int){ 1 }), sizeof(int)))
> +		trace("TCP: failed to set TCP_NODELAY on socket %i", s);

I think this deserves something a little stronger than trace().  It
might have a significant impact on latency, and if we get a failure
enabling a feature that's been in TCP longer than Linux has existed,
something pretty weird is going on.

> +}
> +
>  /**
>   * tcp_update_csum() - Calculate TCP checksum
>   * @psum:	Unfolded partial checksum of the IPv4 or IPv6 pseudo-header
> 
> @@ -1285,6 +1298,7 @@ static int tcp_conn_new_sock(const struct ctx *c, sa_family_t af)
>  		return -errno;
>  
>  	tcp_sock_set_bufsize(c, s);
> +	tcp_sock_set_nodelay(s);
>  
>  	return s;
>  }
> 
> @@ -2261,6 +2275,8 @@ static int tcp_sock_init_one(const struct ctx *c, const union inany_addr *addr,
>  		return s;
>  
>  	tcp_sock_set_bufsize(c, s);
> +	tcp_sock_set_nodelay(s);
> +

This is a listening socket, not an accepted or connecting socket.
Does NODELAY even mean anything there?

>  	return s;
>  }
> 
> @@ -2322,6 +2338,8 @@ static void tcp_ns_sock_init4(const struct ctx *c, in_port_t port)
>  	else
>  		s = -1;
>  
> +	tcp_sock_set_nodelay(s);

Same here.

>  	if (c->tcp.fwd_out.mode == FWD_AUTO)
>  		tcp_sock_ns[port][V4] = s;
>  }
> 
> @@ -2348,6 +2366,8 @@ static void tcp_ns_sock_init6(const struct ctx *c, in_port_t port)
>  	else
>  		s = -1;
>  
> +	tcp_sock_set_nodelay(s);

And here.

>  	if (c->tcp.fwd_out.mode == FWD_AUTO)
>  		tcp_sock_ns[port][V6] = s;
>  }

Do accept()ed sockets inherit NODELAY from the listening socket?  Or
should we instead be setting NODELAY after the accept4() in
tcp_listen_handler()?

In fact.. even if they do inherit, would it be simpler to just set it
at accept() time?

-- 
David Gibson (he or they)			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au		| minimalist, thank you, not the other way
						| around.
http://www.ozlabs.org/~dgibson
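
For illustration only, a rough sketch of the alternative raised above:
disabling Nagle's algorithm on the connected socket right after
accept4(), instead of relying on the option being inherited from the
listening socket.  This is not the actual tcp_listen_handler() code
from passt; the helper name and the error reporting are made up for
the example, and only the system calls themselves are assumed:

/* Sketch: set TCP_NODELAY on the socket returned by accept4(), so the
 * option is applied per connection rather than (possibly) inherited
 * from the listening socket.  accept_nodelay() is a hypothetical name.
 */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>
#include <stdio.h>

static int accept_nodelay(int listen_fd)
{
	struct sockaddr_storage sa;
	socklen_t sl = sizeof(sa);
	int s = accept4(listen_fd, (struct sockaddr *)&sa, &sl,
			SOCK_NONBLOCK | SOCK_CLOEXEC);

	if (s < 0)
		return -errno;

	/* Failure here is unexpected: report it, but keep the connection */
	if (setsockopt(s, SOL_TCP, TCP_NODELAY, &((int){ 1 }), sizeof(int)))
		perror("setsockopt(TCP_NODELAY)");

	return s;
}

Setting the option explicitly at accept() time like this sidesteps the
inheritance question either way, at the cost of one extra setsockopt()
per accepted connection.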