Re: [PATCH v3 16/16] tcp: Don't defer hash table removal

4 Dec 2023

On Sat, Dec 02, 2023 at 05:34:58AM +0100, Stefano Brivio wrote:
...
On Fri, 1 Dec 2023 11:07:50 +1100
David Gibson  wrote:
...
On Thu, Nov 30, 2023 at 01:45:32PM +0100, Stefano Brivio wrote:
...
On Thu, 30 Nov 2023 13:02:22 +1100
David Gibson  wrote:
...
When a TCP connection is closed, we mark it by setting events to CLOSED,
then some time later we do final cleanups: closing sockets, removing from
the hash table and so forth.
This does mean that when making a hash lookup we need to exclude any
apparent matches that are CLOSED, since they represent a stale connection.
This can happen in practice if one connection closes and a new one with the
same endpoints is started shortly afterward.
Checking for CLOSED is quite specific to TCP however, and won't work when
we extend the hash table to more general flows.  So, alter the code to
immediately remove the connection from the hash table when CLOSED, although
we still defer closing sockets and other cleanup.
Signed-off-by: David Gibson 
---
 tcp.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tcp.c b/tcp.c
index 74d06bf..17c7cba 100644
--- a/tcp.c
+++ b/tcp.c
@@ -781,6 +781,9 @@ static void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
      tcp_timer_ctl(c, conn);
 }
+static void tcp_hash_remove(const struct ctx *c,
+			    const struct tcp_tap_conn *conn);
+
 /**
  * conn_event_do() - Set and log connection events, update epoll state
  * @c:		Execution context
@@ -825,7 +828,9 @@ static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
      flow_dbg(conn, "%s",
      	 num == -1 	       ? "CLOSED" : tcp_event_str[num]);
-	if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
+	if (event == CLOSED)
+		tcp_hash_remove(c, conn);
+	else if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
      conn_flag(c, conn, ACTIVE_CLOSE);
  else
      tcp_epoll_ctl(c, conn);
@@ -1150,7 +1155,7 @@ static int tcp_hash_match(const struct tcp_tap_conn *conn,
      	  const union inany_addr *faddr,
      	  in_port_t eport, in_port_t fport)
 {
-	if (conn->events != CLOSED && inany_equals(&conn->faddr, faddr) &&
+	if (inany_equals(&conn->faddr, faddr) &&
      conn->eport == eport && conn->fport == fport)
      return 1;
@@ -1308,7 +1313,6 @@ static void tcp_conn_destroy(struct ctx *c, union flow *flow)
  if (conn->timer != -1)
      close(conn->timer);
-	tcp_hash_remove(c, conn);
  flow_table_compact(c, flow);
I was pretty sure, due to the way I originally implemented this, that
removing an entry from the hash table without compacting the table
afterwards, with an event possibly coming between the two, would
present some inconsistency while we're handling that event.
But looking at it now, I don't see any issue with this. I just wanted
to raise it in case you're aware of (but didn't think about) some
concern in this sense.
I think it's ok.  The thing is that compacting the connection table
itself is largely independent of the hash table, whose buckets are
separately indexed.  A hash remove shuffles things around in the hash
buckets, but doesn't change where connections sit in the connection
table.  A connection table compaction changes the indices in the
connection table, which requires updating things in the hash buckets,
but not moving things around in the buckets - exactly the same entries
are in every hash bucket, it's just that one of them has a new "name"
now.
...
By the way, the reason why I deferred tcp_hash_remove() back then was
to save cycles between epoll events and get higher CRR rates, but I
think the effect is negligible anyway.
Right.. to process a FIN and the next SYN at once, I guess?
That's one example, yes, but in any case it was an optimisation for...
...
I figured
this might make a difference, but probably not much.  There's no
syscall here, and batching doesn't reduce the total amount of work in
this case.
supposedly better data locality, with batching. But never
micro-benchmarked, and surely negligible anyway.
Hmm.. except hash tables are by construction non-local, so I wouldn't
really expect batching unrelated hash entries to do that.  Even if the
connection table entries themselves are close, which is more likely,
they take a cacheline each on the most common platform, so that's not
likely to win anything.  In fact if anything, I'd expect better
locality with the non-deferred approach - we're triggering the hash
remove when we've already been working on that connection, so it
should be cache hot.

I guess batching does win some icache locality.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson