On Mon, Aug 18, 2025 at 07:06:29PM +0200, Stefano Brivio wrote:
On Mon, 18 Aug 2025 12:56:09 +1000 David Gibson
wrote: On Fri, Aug 15, 2025 at 06:10:41PM +0200, Stefano Brivio wrote:
If the peer shrinks the window to zero, we'll skip storing the new window, as a convenient way to cause window probes (which exceed any zero-sized window, strictly speaking) if we don't get window updates in a while.
Strictly speaking, not storing the new zero window feels slightly wrong to me - I wonder if it would be more correct to store the zero window, but still send window probes as a special case.
Yeah, it's wrong but it's not really obvious what value we should use at that point. With a recent kernel version, there would be no problem storing 0, waiting for a window update which will come soon after, and reserve the special case I already added (1 as window size) for zero-window probes, triggered via timer.
But without 8c670bdfa58e ("tcp: correct handling of extreme memory squeeze"), and with e2142825c120 ("net: tcp: send zero-window ACK when no memory"), the window update never comes, so we would need to wait for the timer to expire before we send an explicit zero-window probe, which takes a while (two seconds).
Right. I guess what I had in mind was to store the zero window, but also set up a new timer to send a new window-probe in a more reasonable timeframe.
With this trick, instead, we keep the latest advertised value, which is actually the latest correct value, and retry sending what we were originally sending as soon as we have more data from the socket, which would becomes available quite soon in all the practical cases where the kernel shrinks the window to zero.
Ah, right. So, IIUC a part of the issue is that at this point if we see a zero window we don't know if it's because *we* filled the window in the normal way - and so should wait for an update - or because the peer slammed the window shut - in which case we need to reprobe. I mean, I guess we could set a flag for that earlier, but that's fairly awkward. At least for the time being, I think the current approach of keeping the previous window is preferable.
Maybe we could store zero, and use WINDOW_DEFAULT (14600 bytes) on socket activity. The RFC simply says we need to send zero-window probes at some point, not that we should assume a broken receiver, but unfortunately between e2142825c120 and 8c670bdfa58e it's like that.
I think the existing behaviour of keeping the previous window is preferable. At least for the specific case of a transient memory error, that seems more likely to be right. Maybe? Unless we think we should slow down in the hopes of not exhausting the peer's memory again. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson