On Sat, Dec 02, 2023 at 05:34:58AM +0100, Stefano Brivio wrote:
> On Fri, 1 Dec 2023 11:07:50 +1100
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
> 
> > On Thu, Nov 30, 2023 at 01:45:32PM +0100, Stefano Brivio wrote:
> > > On Thu, 30 Nov 2023 13:02:22 +1100
> > > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> > > 
> > > > When a TCP connection is closed, we mark it by setting events to
> > > > CLOSED, then some time later we do final cleanups: closing sockets,
> > > > removing from the hash table and so forth.
> > > > 
> > > > This does mean that when making a hash lookup we need to exclude any
> > > > apparent matches that are CLOSED, since they represent a stale
> > > > connection.  This can happen in practice if one connection closes and
> > > > a new one with the same endpoints is started shortly afterward.
> > > > 
> > > > Checking for CLOSED is quite specific to TCP however, and won't work
> > > > when we extend the hash table to more general flows.  So, alter the
> > > > code to immediately remove the connection from the hash table when
> > > > CLOSED, although we still defer closing sockets and other cleanup.
> > > > 
> > > > Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> > > > ---
> > > >  tcp.c | 10 +++++++---
> > > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/tcp.c b/tcp.c
> > > > index 74d06bf..17c7cba 100644
> > > > --- a/tcp.c
> > > > +++ b/tcp.c
> > > > @@ -781,6 +781,9 @@ static void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > > >  		tcp_timer_ctl(c, conn);
> > > >  }
> > > >  
> > > > +static void tcp_hash_remove(const struct ctx *c,
> > > > +			    const struct tcp_tap_conn *conn);
> > > > +
> > > >  /**
> > > >   * conn_event_do() - Set and log connection events, update epoll state
> > > >   * @c:		Execution context
> > > > @@ -825,7 +828,9 @@ static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn,
> > > >  		flow_dbg(conn, "%s", num == -1 ?
> > > >  			 "CLOSED" : tcp_event_str[num]);
> > > >  
> > > > -	if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > > +	if (event == CLOSED)
> > > > +		tcp_hash_remove(c, conn);
> > > > +	else if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD))
> > > >  		conn_flag(c, conn, ACTIVE_CLOSE);
> > > >  	else
> > > >  		tcp_epoll_ctl(c, conn);
> > > > @@ -1150,7 +1155,7 @@ static int tcp_hash_match(const struct tcp_tap_conn *conn,
> > > >  			  const union inany_addr *faddr,
> > > >  			  in_port_t eport, in_port_t fport)
> > > >  {
> > > > -	if (conn->events != CLOSED && inany_equals(&conn->faddr, faddr) &&
> > > > +	if (inany_equals(&conn->faddr, faddr) &&
> > > >  	    conn->eport == eport && conn->fport == fport)
> > > >  		return 1;
> > > >  
> > > > @@ -1308,7 +1313,6 @@ static void tcp_conn_destroy(struct ctx *c, union flow *flow)
> > > >  	if (conn->timer != -1)
> > > >  		close(conn->timer);
> > > >  
> > > > -	tcp_hash_remove(c, conn);
> > > >  	flow_table_compact(c, flow);
> > > 
> > > I was pretty sure, due to the way I originally implemented this, that
> > > removing an entry from the hash table without compacting the table
> > > afterwards, with an event possibly coming between the two, would
> > > present some inconsistency while we're handling that event.
> > > 
> > > But looking at it now, I don't see any issue with this. I just wanted
> > > to raise it in case you're aware of (but didn't think about) some
> > > concern in this sense.
> > 
> > I think it's ok.  The thing is that compacting the connection table
> > itself is largely independent of the hash table, whose buckets are
> > separately indexed.  A hash remove shuffles things around in the hash
> > buckets, but doesn't change where connections sit in the connection
> > table.  A connection table compaction changes the indices in the
> > connection table, which requires updating things in the hash buckets,
> > but not moving things around in the buckets - exactly the same entries
> > are in every hash bucket, it's just that one of them has a new "name"
> > now.
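
Just to make that independence concrete for anyone reading along, here is
a toy sketch of the idea - this is not the actual passt flow table or hash
code; the sizes, the one-index-per-bucket layout and the toy_* names are
all made up for illustration:

  /* Toy model, not passt code: a flat connection table plus hash buckets
   * that store connection table indices (-1 = empty; initialisation to -1
   * elided).  At most one index per bucket, to keep the example short.
   */
  #define TOY_BUCKETS   64
  #define TOY_MAX_CONNS 128

  struct toy_conn {
          int events;             /* endpoints, ports, ... elided */
  };

  static struct toy_conn conn_table[TOY_MAX_CONNS];
  static int bucket[TOY_BUCKETS];
  static unsigned n_conns;

  /* Hash removal only touches bucket contents: no connection moves in
   * conn_table.
   */
  static void toy_hash_remove(unsigned idx)
  {
          unsigned b;

          for (b = 0; b < TOY_BUCKETS; b++)
                  if (bucket[b] == (int)idx)
                          bucket[b] = -1;
  }

  /* Table compaction moves the last entry into the hole and renames it in
   * whichever bucket referenced it: no entry joins or leaves any bucket.
   */
  static void toy_table_compact(unsigned hole)
  {
          unsigned last = --n_conns;
          unsigned b;

          if (hole == last)
                  return;

          conn_table[hole] = conn_table[last];

          for (b = 0; b < TOY_BUCKETS; b++)
                  if (bucket[b] == (int)last)
                          bucket[b] = (int)hole;
  }

In this toy model, removing from the hash at CLOSED time and compacting
the table later change different things, and at every step in between the
buckets only reference live connections under their current indices.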
"CLOSED" : tcp_event_str[num]); - if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD)) + if (event == CLOSED) + tcp_hash_remove(c, conn); + else if ((event == TAP_FIN_RCVD) && !(conn->events & SOCK_FIN_RCVD)) conn_flag(c, conn, ACTIVE_CLOSE); else tcp_epoll_ctl(c, conn); @@ -1150,7 +1155,7 @@ static int tcp_hash_match(const struct tcp_tap_conn *conn, const union inany_addr *faddr, in_port_t eport, in_port_t fport) { - if (conn->events != CLOSED && inany_equals(&conn->faddr, faddr) && + if (inany_equals(&conn->faddr, faddr) && conn->eport == eport && conn->fport == fport) return 1; @@ -1308,7 +1313,6 @@ static void tcp_conn_destroy(struct ctx *c, union flow *flow) if (conn->timer != -1) close(conn->timer); - tcp_hash_remove(c, conn); flow_table_compact(c, flow);I was pretty sure, due to the way I originally implemented this, that removing an entry from the hash table without compacting the table afterwards, with an event possibly coming between the two, would present some inconsistency while we're handling that event. But looking at it now, I don't see any issue with this. I just wanted to raise it in case you're aware of (but didn't think about) some concern in this sense.By the way, the reason why I deferred tcp_hash_remove() back then was to save cycles between epoll events and get higher CRR rates, but I think the effect is negligible anyway.Right.. to process a FIN and the next SYN at once, I guess?I figured this might make a difference, but probably not much. There's no syscall here, and batching doesn't reduce the total amount of work in this case.supposedly better data locality, with batching. But never micro-benchmarked, and surely negligible anyway.