Re: [PATCH 2/2] tcp: Acknowledge keep-alive segments, ignore them for the rest

21 Nov 2024

On Thu, 21 Nov 2024 13:38:09 +1100
David Gibson  wrote:
...
On Wed, Nov 20, 2024 at 07:43:44AM +0100, Stefano Brivio wrote:
...
On Wed, 20 Nov 2024 12:02:00 +1100
David Gibson  wrote:
...
On Tue, Nov 19, 2024 at 08:53:44PM +0100, Stefano Brivio wrote:
...
RFC 9293, 3.8.4 says:
Implementers MAY include "keep-alives" in their TCP implementations
   (MAY-5), although this practice is not universally accepted.  Some
   TCP implementations, however, have included a keep-alive mechanism.
   To confirm that an idle connection is still active, these
   implementations send a probe segment designed to elicit a response
   from the TCP peer.  Such a segment generally contains SEG.SEQ =
   SND.NXT-1 and may or may not contain one garbage octet of data.  If
   keep-alives are included, the application MUST be able to turn them
   on or off for each TCP connection (MUST-24), and they MUST default to
   off (MUST-25).
but currently, tcp_data_from_tap() is not aware of this and will
schedule a fast re-transmit on the second keep-alive (because it's
also a duplicate ACK), ignoring the fact that the sequence number was
rewinded to SND.NXT-1.
ACK these keep-alive segments, reset the activity timeout, and ignore
them for the rest.
At some point, we could think of implementing an approximation of
keep-alive segments on outbound sockets, for example by setting
TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL, so that we send a single
keep-alive segment at approximately the same time, and never reset the
connection. That's beyond the scope of this fix, though.
Reported-by: Tim Besard 
Link: https://github.com/containers/podman/discussions/24572
Signed-off-by: Stefano Brivio 
---
 tcp.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tcp.c b/tcp.c
index f357920..1eb85bb 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1763,6 +1763,20 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn,
      	continue;
seq = ntohl(th->seq);
+		if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) {
+			flow_trace(conn,
+				   "keep-alive sequence: %u, previous: %u",
+				   seq, conn->seq_from_tap);
+
+			tcp_send_flag(c, conn, ACK);
+			tcp_timer_ctl(c, conn);
+
+			if (p->count == 1)
+				return 1;
I'm not sure what this test is for.  Shouldn't the continue be sufficient?
I don't think we want to go through tcp_update_seqack_from_tap(),
tcp_tap_window_update() and the like on a keep-alive segment.
Ah, I see.  But that is an optimisation, right?  It shouldn't be
necessary for correctness.
*Shouldn't*.
...
...
But if we receive something else in this batch, that's going to be a
data segment that happened to arrive just after the keep-alive, so, in
that case, we have to do the normal processing, by ignoring just this
segment and hitting 'continue'.
Strictly speaking, the 'continue' is enough and correct, but I think
that returning early in the obviously common case is simpler and more
robust.
Hrm.  Doesn't seem simpler to me, but I can see the point of the
change so,
The code itself is two lines longer, of course, with an additional
early return. Considering all the possible side effects of looking at
window values from a keep-alive segment looks to me more complicated
than the alternative, though.
...
Reviewed-by: David Gibson 
-- 
Stefano