On Thu, 9 Oct 2025 13:43:00 +1100
David Gibson
On Wed, Oct 08, 2025 at 12:01:27PM +0200, Stefano Brivio wrote:
On Wed, 8 Oct 2025 11:41:23 +1100 David Gibson
wrote: On Wed, Oct 08, 2025 at 12:42:49AM +0200, Stefano Brivio wrote:
On Tue, 7 Oct 2025 10:34:01 +1100 David Gibson
wrote: On Tue, Oct 07, 2025 at 12:32:54AM +0200, Stefano Brivio wrote:
On Fri, 3 Oct 2025 13:43:32 +1000 David Gibson
wrote: [snip] > @@ -1820,6 +1817,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > break; > seq_from_tap += size; > iov_i += count; > + if (th->fin) > + fin = 1; > > if (keep == i) > keep = -1; > > > We'd need to double check that the "accept data segment" path is safe > with len == 0, of course. For sure it's not before d2c33f45f7be ("tcp: Convert tcp_data_from_tap() to use iov_tail"), because we might add zero-length segments to the tcp_iov array, and that would make backporting an otherwise simple and critical fix to slightly older versions rather complicated.
Kinda. It's not that complicated to deal with that case, by wrapping the actual data processing in an `if (len) { ... }`
That's needed for sure, but do we risk looping forever on a particular batch of FIN segments without data with a given series of sequence numbers?
Right now the loop terminates because the sequence moves forward. I'm not sure what happens without data while moving 'keep' around. Maybe it takes a few minutes to figure out (I haven't tried) but I wouldn't call that trivial.
That should be fine, because we need to advance the sequence by one for the FIN anyway, so we will move forwards. I might have forgotten that in my quick example, which is a bug, but an easy one to fix.
But we don't, in that loop, and there's a specific reason for it. FIN segments are a special case in that, if you receive more than one, you don't advance the sequence by more than one, even if the segments themselves are in the expected sequence.
Ok. *thinks*. So, if we both set fin = 1 and advance the sequence inside an if (th->fin && !fin), that should do the trick, yes?
Yes, but mind that we must then rewind the sequence on 'partial_send' as well as on any non-retried sendmsg() failure because we want to pretend, in those cases, that we never received a FIN (we won't set TAP_FIN_RCVD either). I think it will be less awkward and probably more robust altogether to make sure that the 'keep' mechanism can't be fooled by FIN segments with looping sequences. Actually, now that we have packet pools (I didn't have them when I first implemented this logic in dc169643a457 "tcp: Full batched processing for tap messages"), do we want to try and reorder segments first, separately? It's not great for cache locality but perhaps worth at least profiling, and it would probably make things simpler. In that case, though, maybe it's appropriate to have a complete, less intrusive fix for out-of-sequence FIN handling. If we advance the sequence there, and rewind it later, I think it's not that bad as an intermediate step.
If you want to move the sequence increase into that loop (which might make sense with some extra care), perhaps it's worth doing that together with the whole https://bugs.passt.top/show_bug.cgi?id=125 at this point.
Yeah, maybe.
By the way a separate reordering step should make this easier as well, but still I'm not sure it's justified, I think it depends on what perf(1) says about it. -- Stefano