On Sun, Apr 7, 2024 at 2:52 PM Jason Xing <kerneljasonxing(a)gmail.com> wrote:

On Sun, Apr 7, 2024 at 2:38 AM Eric Dumazet <edumazet(a)google.com> wrote:
>
> On Sat, Apr 6, 2024 at 8:21 PM <jmaloy(a)redhat.com> wrote:
> >
> > From: Jon Maloy <jmaloy(a)redhat.com>
> >
> > Testing of the previous commit ("tcp: add support for SO_PEEK_OFF")
> > in this series along with the pasta protocol splicer revealed a bug in
> > the way tcp handles window advertising during extreme memory squeeze
> > situations.
> >
> > The excerpt of the logging session below shows what is happening:
> >
> > [5201<->54494]: ==== Activating log @ tcp_select_window()/268 ====
> > [5201<->54494]: (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) --> TRUE
> > [5201<->54494]: tcp_select_window(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354, returning 0
> > [5201<->54494]: ADVERTISING WINDOW SIZE 0
> > [5201<->54494]: __tcp_transmit_skb(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> >
> > [5201<->54494]: tcp_recvmsg_locked(->)
> > [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > [5201<->54494]: NOT calling tcp_send_ack()
> > [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now: 250164, qlen: 83
> >
> > [...]
>
> I would prefer a packetdrill test, it is not clear what is happening...
>
> In particular, have you used SO_RCVBUF ?
>
> >
> > [5201<->54494]: tcp_recvmsg_locked(->)
> > [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > [5201<->54494]: NOT calling tcp_send_ack()
> > [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now: 250164, qlen: 1
> >
> > [5201<->54494]: tcp_recvmsg_locked(->)
> > [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now): 500328))? --> time_to_ack: 0
> > [5201<->54494]: NOT calling tcp_send_ack()
> > [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > [5201<->54494]: tcp_recvmsg_locked(<-) returning 57036 bytes, window now: 250164, qlen: 0
> >
> > [5201<->54494]: tcp_recvmsg_locked(->)
> > [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > [5201<->54494]: NOT calling tcp_send_ack()
> > [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294, tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> > [5201<->54494]: tcp_recvmsg_locked(<-) returning -11 bytes, window now: 250164, qlen: 0
> >
> > We can see that although we are advertising a window size of zero,
> > tp->rcv_wnd is not updated accordingly. This leads to a discrepancy
> > between this side's and the peer's view of the current window size.
> > - The peer thinks the window is zero, and stops sending.

Hi! In my original logic, the client sends a zero-window ACK when it
drops the skb because it is out of memory, and the peer SHOULD keep
retransmitting the dropped packet. Does the peer retransmit in this
case? The receive window of the peer SHOULD recover once the
retransmission succeeds.

> > - This side ends up in a cycle where it repeatedly calculates a new
> > window size it finds too small to advertise.

Yeah, the zero window suppressed the sending of the ACK in
__tcp_cleanup_rbuf, which I wasn't aware of. The ACK will recover the
receive window of the peer. Does it make the peer retransmit the
dropped data immediately? In my opinion, the peer still has to wait
for the retransmission timer to expire before it retransmits the
dropped packet. Isn't that so? If it is, maybe we can retransmit
immediately when we are in a zero-window state caused by a window
shrink, which would make the recovery faster.

[......]

Thanks for CCing my new email address, it's very kind of you, xing :/

Any particular reason to not cc Menglong Dong ? (I just did)

He is not working at Tencent any more. Let me CC here one more time.
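
To make the arithmetic in the log excerpt concrete, here is a small Python model of the window-update check that __tcp_cleanup_rbuf() reports. It is only a sketch under stated assumptions: receive_window() and time_to_ack() are hypothetical helper names, the formula rcv_wup + rcv_wnd - rcv_nxt is inferred from the logged values (it reproduces "win_now: 250164"), and the "reset" case at the end is a what-if for illustration, not a description of any actual patch.

```python
MASK32 = 0xFFFFFFFF  # TCP sequence numbers are 32-bit and wrap around


def receive_window(rcv_wup, rcv_wnd, rcv_nxt):
    """Bytes of the last advertised window the receiver believes are still open.

    Assumed formula: rcv_wup + rcv_wnd - rcv_nxt, evaluated as a signed
    32-bit difference and clamped at zero.
    """
    win = (rcv_wup + rcv_wnd - rcv_nxt) & MASK32
    if win >= 1 << 31:  # reinterpret as a negative signed 32-bit value
        win -= 1 << 32
    return max(win, 0)


def time_to_ack(win_now, new_win):
    """Logged check: send a window update only if the window at least doubles.

    The non-zero test on new_win is an assumption about the real code path.
    """
    return new_win != 0 and new_win >= 2 * win_now


# Values taken from the log: rcv_wnd was left untouched although a
# zero window had just been advertised.
win_now = receive_window(2812454294, 5812224, 2818016354)
stale = time_to_ack(win_now, 262144)

# What-if: the advertised zero window had also been recorded locally
# (rcv_wnd = 0, rcv_wup = rcv_nxt).
reset_win = receive_window(2818016354, 0, 2818016354)
reset = time_to_ack(reset_win, 262144)

print(f"stale state: win_now={win_now}, time_to_ack={stale}")
print(f"reset state: win_now={reset_win}, time_to_ack={reset}")
```

With the stale state, 262144 is less than 2 * 250164 = 500328, so no window update is sent, which matches the repeated "NOT calling tcp_send_ack()" lines. In the what-if case the local window is 0, so any non-zero new window passes the check.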