On Fri, Jan 17, 2025 at 4:41 PM <jmaloy(a)redhat.com> wrote:
>
> From: Jon Maloy <jmaloy(a)redhat.com>
>
> Testing with iperf3 using the "pasta" protocol splicer has revealed
> a bug in the way tcp handles window advertising in extreme memory
> squeeze situations.
>
> Under memory pressure, a socket endpoint may temporarily advertise
> a zero-sized window, but this is not stored as part of the socket data.
> The reasoning behind this is that it is considered a temporary setting
> which shouldn't influence any further calculations.
>
> However, if we happen to stall at an unfortunate value of the current
> window size, the algorithm selecting a new value will consistently fail
> to advertise a non-zero window once we have freed up enough memory.

The "if we happen to stall at an unfortunate value of the current
window size" phrase is a little vague... :-) Do you have a sense of
what might count as "unfortunate" here? That might help in crafting a
packetdrill test to reproduce this and have an automated regression
test.

> This means that this side's notion of the current window size is
> different from the one last advertised to the peer, causing the latter
> to not send any data to resolve the situation.

Since the peer last saw a zero receive window at the time of the
memory-pressure drop, shouldn't the peer be sending repeated zero
window probes, and shouldn't the local host respond to a ZWP with an
ACK with the correct non-zero window?

Do you happen to have a tcpdump .pcap of one of these cases that you can share?

> The problem occurs on the iperf3 server side, and the socket in question
> is a completely regular socket with the default settings for the
> fedora40 kernel. We do not use SO_PEEK or SO_RCVBUF on the socket.
>
> The following excerpt of a logging session, with own comments added,
> shows more in detail what is happening:
>
> // tcp_v4_rcv(->)
> //   tcp_rcv_established(->)
> [5201<->39222]: ==== Activating log @ net/ipv4/tcp_input.c/tcp_data_queue()/5257 ====
> [5201<->39222]: tcp_data_queue(->)
> [5201<->39222]: DROPPING skb [265600160..265665640], reason: SKB_DROP_REASON_PROTO_MEM
>                 [rcv_nxt 265600160, rcv_wnd 262144, snt_ack 265469200, win_now 131184]

What is "win_now"? That doesn't seem to correspond to any variable
name in the Linux source tree. Can this be renamed to the
tcp_select_window() variable it is printing, like "cur_win" or
"effective_win" or "new_win", etc?

Or perhaps you can attach your debugging patch in some email thread? I
agree with Eric that these debug dumps are a little hard to parse
without seeing the patch that allows us to understand what some of
these fields are...

I agree with Eric that probably tp->pred_flags should be cleared, and
a packetdrill test for this would be super-helpful.

thanks,
neal
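
One possible reading of the fields in the quoted DROPPING log line, offered as a sketch and not as something the thread confirms (the debugging patch is not shown): if "snt_ack" corresponds to tp->rcv_wup and "win_now" is the value returned by the kernel's tcp_receive_window(), i.e. rcv_wup + rcv_wnd - rcv_nxt clamped at zero, then the logged numbers are self-consistent. The Python below only models that arithmetic; the mapping of log field names to kernel variables is an assumption, and the function is a user-space model, not kernel code.

```python
U32_MASK = 0xFFFFFFFF


def tcp_receive_window(rcv_wup: int, rcv_wnd: int, rcv_nxt: int) -> int:
    """User-space model of tcp_receive_window(): the part of the last
    advertised window that is still open, clamped at zero."""
    # The kernel computes this in 32-bit arithmetic and reads it as signed.
    win = (rcv_wup + rcv_wnd - rcv_nxt) & U32_MASK
    if win >= 0x80000000:
        win -= 1 << 32
    return max(win, 0)


# Values copied from the quoted log line.
rcv_nxt = 265600160
rcv_wnd = 262144
snt_ack = 265469200   # ASSUMPTION: this log field is tp->rcv_wup
win_now = 131184      # ASSUMPTION: this log field is tcp_receive_window()

computed = tcp_receive_window(snt_ack, rcv_wnd, rcv_nxt)

# Length of the dropped skb, from its logged sequence range.
skb_len = 265665640 - 265600160

print("computed window:", computed)
print("matches win_now:", computed == win_now)
print("dropped skb length:", skb_len)
```

Under that reading, the dropped skb would have fit inside the still-open window, which is consistent with the logged drop reason being memory (SKB_DROP_REASON_PROTO_MEM) and not an out-of-window segment.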