Re: [PATCH 1/1] tcp: Clarify logic calculating how much guest data to ack

8 Oct 2025

On Fri,  3 Oct 2025 16:30:51 +1000
David Gibson  wrote:
...
This is fairly complex, because we have a method we prefer but we need to
fall back to a simpler one in a bunch of cases.  Slightly reorganise the
code to make the flow clearer, and add a large comment giving the
rationale.
I think this is a strict improvement on the original and I was about to
apply it regardless of my pending series with TCP fixes (it looks
completely independent to me) and a few nits I had, but then I noticed
one bit that might be substantially misleading, at the end.

So here come all my comments:
...
Signed-off-by: David Gibson 
---
 tcp.c | 68 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/tcp.c b/tcp.c
index 7da41797..85eb2c32 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1014,35 +1014,51 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
  uint32_t new_wnd_to_tap = prev_wnd_to_tap;
  int s = conn->sock;
-	if (!bytes_acked_cap) {
-		conn->seq_ack_to_tap = conn->seq_from_tap;
-		if (SEQ_LT(conn->seq_ack_to_tap, prev_ack_to_tap))
-			conn->seq_ack_to_tap = prev_ack_to_tap;
-	} else {
-		if ((unsigned)SNDBUF_GET(conn) < SNDBUF_SMALL ||
-		    tcp_rtt_dst_low(conn) || CONN_IS_CLOSING(conn) ||
-		    (conn->flags & LOCAL) || force_seq) {
-			conn->seq_ack_to_tap = conn->seq_from_tap;
-		} else if (conn->seq_ack_to_tap != conn->seq_from_tap) {
-			if (!tinfo) {
-				tinfo = &tinfo_new;
-				if (getsockopt(s, SOL_TCP, TCP_INFO, tinfo, &sl))
-					return 0;
-			}
-
-			/* This trips a cppcheck bug in some versions, including
-			 * cppcheck 2.18.3.
-			 * https://sourceforge.net/p/cppcheck/discussion/general/thread/fecde59085/
-			 */
-			/* cppcheck-suppress [uninitvar,unmatchedSuppression] */
-			conn->seq_ack_to_tap = tinfo->tcpi_bytes_acked +
-				conn->seq_init_from_tap;
-
-			if (SEQ_LT(conn->seq_ack_to_tap, prev_ack_to_tap))
-				conn->seq_ack_to_tap = prev_ack_to_tap;
+	/* At this point we could ack all the data we've accepted for forwarding
+	 * (seq_from_tap).  When possible, however, we want to only ack what the
+	 * peer has acked.  This makes it appear to the guest more like a direct
+	 * connection to the peer, and may improve flow control behaviour.
For consistency, as we don't use "ack" as a verb anywhere else, maybe
spell it out as "acknowledge" / "acknowledged".
...
+	 *
+	 * For it to be possible and worth it we need:
+	 *  - The TCP_INFO Linux extension which gives us the peer acked bytes
+	 *  - Not to be told not to (force_seq)
+	 *  - Not half-closed in the peer->guest direction
+	 *      With no data coming from the peer, we won't get further events
+	 *      which would prompt us to recheck bytes_acked.  We could poll on
+	 *      a timer, but that's more trouble than it's worth.
Strictly speaking, we could (and usually do) get further events
prompting us to check bytes_acked, in the form of segments from the
guest, but perhaps we can just leave this detail out for brevity,
unless you want to try and factor that in.
...
+	 *  - Not a host local connection
The tcp_rtt_dst_low() is a trick to consider "local" also anything (VMs)
that's connected to us via veth.

It's not local from a network segment perspective, but it's local to
the machine, and the same consideration applies (somewhat surprisingly,
for veth). Same here, I guess we could leave this out for brevity.
...
+	 *      Data goes directly from socket to socket in this case, with
+	 *      nothing meaningful "in flight".
+	 *  - Large enough send buffer
+	 *      If this is small, there's not enough in flight to bother.
+	 */
+	if (bytes_acked_cap && !force_seq &&
+	    !CONN_IS_CLOSING(conn) &&
+	    !(conn->flags & LOCAL) && !tcp_rtt_dst_low(conn) &&
+	    (unsigned)SNDBUF_GET(conn) >= SNDBUF_SMALL) {
+		if (!tinfo) {
+			tinfo = &tinfo_new;
+			if (getsockopt(s, SOL_TCP, TCP_INFO, tinfo, &sl))
+				return 0;
      }
+
+		/* This trips a cppcheck bug in some versions, including
+		 * cppcheck 2.18.3.
+		 * https://sourceforge.net/p/cppcheck/discussion/general/thread/fecde59085/
+		 */
+		/* cppcheck-suppress [uninitvar,unmatchedSuppression] */
+		conn->seq_ack_to_tap = tinfo->tcpi_bytes_acked +
+			conn->seq_init_from_tap;
Maybe fix the indentation while at it?

		conn->seq_ack_to_tap = tinfo->tcpi_bytes_acked +
				       conn->seq_init_from_tap;
...
+	} else {
+		/* Fall back to acking everything we have */
Maybe specifically refer to what we got so far,

		/* Fall back to acknowledging everything we got */

?
...
+		conn->seq_ack_to_tap = conn->seq_from_tap;
  }
+	/* If the guest is retransmitting, don't let our ACKed sequence go
+	 * backwards */
This is the misleading part I realised about, after I mentioned it in:

  https://archives.passt.top/passt-dev/20251007003219.3f286b1d@elisabeth/

...the reason why we risk rewinding the acknowledged sequence isn't
that the guest is retransmitting, because in that case we wouldn't have
advanced conn->seq_to_tap to begin with.

The reason is that one of those conditions for using bytes_acked you
listed above happened to be false, and now it becomes true again.

The only practical one I can think of is the array used by
tcp_rtt_dst_low() getting full at some point, but later we re-insert the
peer we're talking to in the table.

By the way, for consistency:

	/* Multi-line
	 * comment
	 */
...
+	if (SEQ_LT(conn->seq_ack_to_tap, prev_ack_to_tap))
+		conn->seq_ack_to_tap = prev_ack_to_tap;
The reason behind the current code structure is to skip this if we
didn't touch conn->seq_ack_to_tap at all, but the compiler will probably
figure this out by itself, and even if it doesn't, I guess it's more
efficient to do this unconditionally anyway.
...
+
  if (!snd_wnd_cap) {
      tcp_get_sndbuf(conn);
      new_wnd_to_tap = MIN(SNDBUF_GET(conn), MAX_WINDOW);
-- 
Stefano

    

Re: [PATCH 1/1] tcp: Clarify logic calculating how much guest data to ack

Stefano Brivio