On Thu, 21 Nov 2024 05:26:17 +0100
Stefano Brivio <sbrivio(a)redhat.com> wrote:

> On Thu, 21 Nov 2024 13:38:09 +1100
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
> 
> > On Wed, Nov 20, 2024 at 07:43:44AM +0100, Stefano Brivio wrote:
> > > On Wed, 20 Nov 2024 12:02:00 +1100
> > > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> > > > On Tue, Nov 19, 2024 at 08:53:44PM +0100, Stefano Brivio wrote:
> > > > > RFC 9293, 3.8.4 says:
> > > > > 
> > > > >   Implementers MAY include "keep-alives" in their TCP implementations
> > > > >   (MAY-5), although this practice is not universally accepted. Some
> > > > >   TCP implementations, however, have included a keep-alive mechanism.
> > > > >   To confirm that an idle connection is still active, these
> > > > >   implementations send a probe segment designed to elicit a response
> > > > >   from the TCP peer. Such a segment generally contains SEG.SEQ =
> > > > >   SND.NXT-1 and may or may not contain one garbage octet of data. If
> > > > >   keep-alives are included, the application MUST be able to turn them
> > > > >   on or off for each TCP connection (MUST-24), and they MUST default to
> > > > >   off (MUST-25).
> > > > > 
> > > > > but currently, tcp_data_from_tap() is not aware of this and will
> > > > > schedule a fast re-transmit on the second keep-alive (because it's
> > > > > also a duplicate ACK), ignoring the fact that the sequence number was
> > > > > rewound to SND.NXT-1.
> > > > > 
> > > > > ACK these keep-alive segments, reset the activity timeout, and ignore
> > > > > them for the rest.
> > > > > 
> > > > > At some point, we could think of implementing an approximation of
> > > > > keep-alive segments on outbound sockets, for example by setting
> > > > > TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL, so that we send a single
> > > > > keep-alive segment at approximately the same time, and never reset the
> > > > > connection. That's beyond the scope of this fix, though.
> > > > > 
> > > > > Reported-by: Tim Besard <tim.besard(a)gmail.com>
> > > > > Link: https://github.com/containers/podman/discussions/24572
> > > > > Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
> > > > > ---
> > > > >  tcp.c | 14 ++++++++++++++
> > > > >  1 file changed, 14 insertions(+)
> > > > > 
> > > > > diff --git a/tcp.c b/tcp.c
> > > > > index f357920..1eb85bb 100644
> > > > > --- a/tcp.c
> > > > > +++ b/tcp.c
> > > > > @@ -1763,6 +1763,20 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
> > > > >  			continue;
> > > > >  
> > > > >  		seq = ntohl(th->seq);
> > > > > +		if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) {
> > > > > +			flow_trace(conn,
> > > > > +				   "keep-alive sequence: %u, previous: %u",
> > > > > +				   seq, conn->seq_from_tap);
> > > > > +
> > > > > +			tcp_send_flag(c, conn, ACK);
> > > > > +			tcp_timer_ctl(c, conn);
> > > > > +
> > > > > +			if (p->count == 1)
> > > > > +				return 1;
> > > > 
> > > > I'm not sure what this test is for. Shouldn't the continue be
> > > > sufficient?
> > > 
> > > I don't think we want to go through tcp_update_seqack_from_tap(),
> > > tcp_tap_window_update() and the like on a keep-alive segment.
> > 
> > Ah, I see. But that is an optimisation, right? It shouldn't be
> > necessary for correctness.

*Shouldn't*.

> The code itself is two lines longer, of course, with an additional
> early return. Considering all the possible side effects of looking at
> window values from a keep-alive segment looks to me more complicated
                       ^ lonely, I should say
> than the alternative, though.
> 
> > > But if we receive something else in this batch, that's going to be a
> > > data segment that happened to arrive just after the keep-alive, so, in
> > > that case, we have to do the normal processing, by ignoring just this
> > > segment and hitting 'continue'.
> > > 
> > > Strictly speaking, the 'continue' is enough and correct, but I think
> > > that returning early in the obviously common case is simpler and more
> > > robust.
> > 
> > Hrm. Doesn't seem simpler to me, but I can see the point of the
> > change so,
> > 
> > Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>

-- 
Stefano
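
For illustration, the socket-side approximation floated in the commit
message ("setting TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL") could
look something like the sketch below. The function name and the exact
values are made up for the example, and this is not part of the patch;
it only shows the standard Linux socket options involved:

	/* Sketch only, not from the patch: enable a single early
	 * keep-alive probe on an outbound socket. TCP_KEEPIDLE of 1 s
	 * makes the first probe go out roughly when the guest sends
	 * its own keep-alive; a TCP_KEEPINTVL near the Linux maximum
	 * (32767 s) means follow-up probes are effectively never
	 * sent, so the probing side doesn't reset the connection.
	 */
	#include <sys/socket.h>
	#include <netinet/in.h>
	#include <netinet/tcp.h>

	static int tcp_sock_approx_keepalive(int s)
	{
		int on = 1;
		int idle = 1;		/* seconds before first probe */
		int intvl = 32767;	/* seconds between probes */

		if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE,
			       &on, sizeof(on)))
			return -1;

		if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE,
			       &idle, sizeof(idle)))
			return -1;

		return setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL,
				  &intvl, sizeof(intvl));
	}

Whether passt/pasta would want exactly this (as opposed to, say, also
capping TCP_KEEPCNT) is an open design question, as the commit message
itself notes.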