On Tue, 7 Oct 2025 10:34:01 +1100
David Gibson
On Tue, Oct 07, 2025 at 12:32:54AM +0200, Stefano Brivio wrote:
On Fri, 3 Oct 2025 13:43:32 +1000 David Gibson
wrote: On Thu, Oct 02, 2025 at 01:51:04PM +0200, Stefano Brivio wrote:
On Thu, 2 Oct 2025 13:02:09 +1000 David Gibson
wrote: On Thu, Oct 02, 2025 at 12:52:31PM +1000, David Gibson wrote:
On Thu, Oct 02, 2025 at 02:06:45AM +0200, Stefano Brivio wrote: [snip] > diff --git a/tcp.c b/tcp.c > index 3f7dc82..5a7a607 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -1769,7 +1769,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, > } > } > > - if (th->fin) > + if (th->fin && seq == seq_from_tap) > fin = 1;
Can a FIN segment also contain data? My quick googling suggests yes.
Yes, absolutely, my slow wiresharking over the years also confirms, and it's so often the case that (I think) this issue doesn't happen so frequently as it only occurs if we have a FIN segment without data.
Makes sense.
If we have a data segment, with FIN set, that we can't fully transmit, we already set 'partial_send' and won't set TAP_FIN_RCVD as a result.
Another case where we want to ignore a FIN segment with data is if we have a gap before it, but in that case we'll eventually set 'keep' and return early.
Ah, right. I'd noticed we set fin = 1 in that case, but forgotten about the exit before setting TAP_FIN_RCVD if keep is set.
If so, doesn't this logic need to go after we process the data processing, so that seq_from_tap points to the end of the packet's data, rather than the beginning? (And the handling of zero-length packets would also need revision to match).
This made sense to me for a moment but now I'm struggling to understand or remember why. What I want to check here is that a FIN segment without data (I should have specified in the commit message) is acceptable because its sequence is as expected.
Right. This is correct for zero-data FIN segments, but I think as a side-effect you've made it ignore certain FIN segments _with_ data. It will work in the common case where the data exactly follows on from what we already have. But in the case where the segment has some data we already have and some new data, the fin = 1 won't trip because seq != seq_from_tap. There isn't another place that will catch it instead, AFAICT.
I guess it will be fine in the end, because with all the data acked, the guest should retransmit the FIN with no data.
But going back to FIN segments with data: why should we sum the length to seq_from_tap before comparing the sequence? I don't understand what additional check you want to introduce, or what case you want to cover.
I was thinking about the case above, but I didn't explain it very well.
Following on from that, it seems to me like it would make sense for FIN segments to also participate in the 'keep' mechanism. It should work eventually, but I expect it would be smoother in the case that we get a final burst of packets in a stream out of order.
FIN segments with data already go through that dance.
Without data, I guess you're right, we might have in the same batch (not that I've ever seen it happening in practice) a FIN segment without data that we process first (and now discard because of the sequence number), and some data before that we process later, so we shouldn't throw away the FIN segment because of that. We should, conceptually, reorder it as well.
It probably makes things more complicated for a reason that's not so critical (ignoring a FIN is fine, we'll get another one), and I wanted to have the simplest possible fix here.
Let me see if I can make this entirely correct without a substantially bigger change, I haven't really tried.
How about this:
diff --git a/tcp.c b/tcp.c index 7da41797..42e576b4 100644 --- a/tcp.c +++ b/tcp.c @@ -1774,10 +1774,7 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, } }
- if (th->fin) - fin = 1; - - if (!len) + if (!len && !th->fin) continue;
seq_offset = seq_from_tap - seq; @@ -1820,6 +1817,8 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, break; seq_from_tap += size; iov_i += count; + if (th->fin) + fin = 1;
if (keep == i) keep = -1;
We'd need to double check that the "accept data segment" path is safe with len == 0, of course.
For sure it's not before d2c33f45f7be ("tcp: Convert tcp_data_from_tap() to use iov_tail"), because we might add zero-length segments to the tcp_iov array, and that would make backporting an otherwise simple and critical fix to slightly older versions rather complicated.
Kinda. It's not that complicated to deal with that case, by wrapping the actual data processing in an `if (len) { ... }`
That's needed for sure, but do we risk looping forever on a particular batch of FIN segments without data with a given series of sequence numbers? Right now the loop terminates because the sequence moves forward. I'm not sure what happens without data while moving 'keep' around. Maybe it takes a few minutes to figure out (I haven't tried) but I wouldn't call that trivial.
After that commit, I'm not sure about the behaviour of iov_tail_clone(). I think it will return 0, but it should be tested (that assumption, itself, but also that the fix still works).
Right, I think it's safe in that case - that's what I was looking at.
Note that I didn't test this.
But I think that will treat dataless and with-data FINs the same way, and let them use the keep mechanism.
Given that the only advantage of doing this would be to handle a rare (I guess) corner case, that is, an out-of-sequence FIN segment with data, which is not critical anyway because the FIN will be retransmitted, I would rather keep this (critical) fix as it is.
I would suggest filing a separate ticket or anyway sending a separate patch if you want to fix that other case.
Fair enough. I'll wait for you to get the basic fix merge, then I might batch this cleanup/fixup with the reorg to clarify bytes_acked handling.
Okay, thanks, merged now. -- Stefano