Re: [PATCH RFT 2/5] tcp: Reset STALLED flag on ACK only, check for pending socket data

27 Sep 2023

      On Mon, 25 Sep 2023 13:07:24 +1000
David Gibson  wrote:
...
I think the change itself here is sound, but I have some nits to pick
with the description and reasoning.
On Sat, Sep 23, 2023 at 12:06:07AM +0200, Stefano Brivio wrote:
...
In tcp_tap_handler(), we shouldn't reset the STALLED flag (indicating
that we ran out of tap-side window space, or that all available
socket data is already in flight -- better names welcome!
Hmm.. when you put it like that it makes me wonder if those two quite
different conditions really need the same handling.  Hrm.. I guess
both conditions mean that we can't accept data from the socket, even
if it's availble.
Right. I mean, we can also call them differently... or maybe pick a
name that reflects the outcome/handling instead of what happened.
...
...
) on any
event: do that only if the first packet in a batch has the ACK flag
set.
"First packet in a batch" may not be accurate here - we're looking at
whichever packet we were up to before calling data_from_tap().  There
could have been earlier packets in the receive batch that were already
processed.
Well, it depends on what we call "batch" -- here I meant the pool of
packets (that are passed as a batch to tcp_tap_handler()). Yes, "pool"
would be more accurate.
...
This also raises the question of why the first data packet should be
particularly privileged here.
No reason other than convenience, and yes, it can be subtly wrong.
...
I'm wondering if what we really want to
check is whether data_from_tap() advanced the ack pointer at all.
Right, we probably should do that instead.
...
I'm not clear on when the th->ack check would ever fail in practice:
aren't the only normal packets in a TCP connection without ACK the
initial SYN or an RST?  We've handled the SYN case earlier, so should
we just have a blanket case above this that if we get a packet with
!ACK, we reset the connection?
One thing that's legitimate (rarely seen, but I've seen it, I don't
remember if the Linux kernel ever does that) is a segment without ACK,
and without data, that just updates the window (for example after a
zero window).

If the sequence received/processed so far doesn't correspond to the
latest sequence sent, omitting the ACK flag is useful so that the
window update is not taken as duplicate ACK (that would trigger
retransmission).
...
...
Make sure we check for pending socket data when we reset it:
reverting back to level-triggered epoll events, as tcp_epoll_ctl()
does, isn't guaranteed to actually trigger a socket event.
Which sure seems like a kernel bug.  Some weird edge conditions for
edge-triggered seems expected, but this doesn't seem like valid
level-triggered semantics.
Hmm, yes, and by doing a quick isolated test actually this seems to work
as intended in the kernel. I should drop this and try again.
...
Hmmm... is toggling EPOLLET even what we want.  IIUC, the heart of
what's going on here is that we can't take more data from the socket
until something happens on the tap side (either the window expands, or
it acks some data).  In which case should we be toggling EPOLLIN on
the socket instead?  That seems more explicitly to be saying to the
socket side "we don't currently care if you have data available".
The reason was to act on EPOLLRDHUP at the same time. But well, we
could just mask EPOLLIN and EPOLLRDHUP, then -- I guess that would make
more sense.

For the moment being, we should probably try to see what happens
if this patch simply moves conn_flag(c, conn, ~STALLED); to where 3/5
needs it (or directly incorporate that into 3/5).

-- 
Stefano