On Wed, Dec 06, 2023 at 08:37:27PM +0100, Stefano Brivio wrote:On Mon, 4 Dec 2023 14:16:10 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Right, either that or name the variable MAX_NUM_FLOWS or something. Eh, whatever.We implement our hash table with pointers to the entry for each bucket (or NULL). However, the entries are always allocated within the flow table, meaning that a flow index will suffice, halving the size of the hash table. For TCP, just a flow index would be enough, but future uses will want to expand the hash table to cover indexing either side of a flow, so use a flow_sidx_t as the type for each hash bucket. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- flow.h | 11 +++++++++++ tcp.c | 34 +++++++++++++++++++++++----------- 2 files changed, 34 insertions(+), 11 deletions(-) diff --git a/flow.h b/flow.h index c2a5190..959b461 100644 --- a/flow.h +++ b/flow.h @@ -53,6 +53,17 @@ static_assert(sizeof(flow_sidx_t) <= sizeof(uint32_t), #define FLOW_SIDX_NONE ((flow_sidx_t){ .flow = FLOW_MAX })In hindsight, while reviewing the functions below: FLOW_MAX should probably be MAX_FROM_BITS(FLOW_INDEX_BITS) - 1 instead (and those >= comparisons would happily become >), so that we don't need to have a "maximum" value that's also not allowed (then, strictly speaking, it's more than the maximum).Yes: we need to stop when we reach something matching @sidx *or* we hit an empty entry. Otherwise we'll never terminate if the entry isn't in there.+/** + * flow_sidx_eq() - Test if two sidx values are equal + * @a, @b: sidx values + * + * Return: true iff @a and @b refer to the same side of the same flow + */ +static inline bool flow_sidx_eq(flow_sidx_t a, flow_sidx_t b) +{ + return (a.flow == b.flow) && (a.side == b.side); +} + union flow; void flow_table_compact(struct ctx *c, union flow *hole); diff --git a/tcp.c b/tcp.c index 09acf7f..7e438b7 100644 --- a/tcp.c +++ b/tcp.c @@ -574,7 +574,7 @@ static unsigned int tcp6_l2_flags_buf_used; #define CONN(idx) (&(FLOW(idx)->tcp)) /* Table for lookup from remote address, local port, remote port */ -static struct tcp_tap_conn *tc_hash[TCP_HASH_TABLE_SIZE]; +static flow_sidx_t tc_hash[TCP_HASH_TABLE_SIZE]; static_assert(ARRAY_SIZE(tc_hash) >= FLOW_MAX, "Safe linear probing requires hash table larger than connection table"); @@ -1197,10 +1197,13 @@ static unsigned int tcp_conn_hash(const struct ctx *c, static inline unsigned tcp_hash_probe(const struct ctx *c, const struct tcp_tap_conn *conn) { + flow_sidx_t sidx = FLOW_SIDX(conn, TAPSIDE); unsigned b; /* Linear probing */ - for (b = tcp_conn_hash(c, conn); tc_hash[b] && tc_hash[b] != conn; + for (b = tcp_conn_hash(c, conn); + !flow_sidx_eq(tc_hash[b], FLOW_SIDX_NONE) &&Do we actually need to check for FLOW_SIDX_NONE explicitly? That is, sidx we get as input here should never be FLOW_SIDX_NONE.I wonder if it makes sense to take care of the possible "overflow" outcome from step L4. of algorithm L you mentioned in 1/3. It *shouldn't* because you're enforcing the minimum size of the hash table, I wonder if it's a good idea anyway.Yeah, I wondered that too, it's probably a good idea for safety. I'll look at implementing that.Hm, fair point. I think the while looked uglier in some earlier versions before I added the _probe() helper so it was duplicated in several places.+ !flow_sidx_eq(tc_hash[b], sidx); b = (b + 1) % TCP_HASH_TABLE_SIZE) ;I respect the fact that this is fundamentally a for loop. :) On the other hand: unsigned b = tcp_conn_hash(c, conn); while (!flow_sidx_eq(tc_hash[b], sidx)) b = (b + 1) % TCP_HASH_TABLE_SIZE); ...would be a bit more readable?-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson@@ -1216,7 +1219,7 @@ static void tcp_hash_insert(const struct ctx *c, struct tcp_tap_conn *conn) { unsigned b = tcp_hash_probe(c, conn); - tc_hash[b] = conn; + tc_hash[b] = FLOW_SIDX(conn, TAPSIDE); flow_dbg(conn, "hash table insert: sock %i, bucket: %u", conn->sock, b); } @@ -1229,16 +1232,18 @@ static void tcp_hash_remove(const struct ctx *c, const struct tcp_tap_conn *conn) { unsigned b = tcp_hash_probe(c, conn), s; + union flow *flow = flow_at_sidx(tc_hash[b]); - if (!tc_hash[b]) + if (!flow) return; /* Redundant remove */ flow_dbg(conn, "hash table remove: sock %i, bucket: %u", conn->sock, b); /* Scan the remainder of the cluster */ - for (s = (b + 1) % TCP_HASH_TABLE_SIZE; tc_hash[s]; + for (s = (b + 1) % TCP_HASH_TABLE_SIZE; + (flow = flow_at_sidx(tc_hash[s])); s = (s + 1) % TCP_HASH_TABLE_SIZE) { - unsigned h = tcp_conn_hash(c, tc_hash[s]); + unsigned h = tcp_conn_hash(c, &flow->tcp); if (in_mod_range(h, b, s, TCP_HASH_TABLE_SIZE)) { /* tc_hash[s] can live in tc_hash[b]'s slot */ @@ -1248,7 +1253,7 @@ static void tcp_hash_remove(const struct ctx *c, } } - tc_hash[b] = NULL; + tc_hash[b] = FLOW_SIDX_NONE; } /** @@ -1263,10 +1268,10 @@ void tcp_tap_conn_update(const struct ctx *c, struct tcp_tap_conn *old, { unsigned b = tcp_hash_probe(c, old); - if (!tc_hash[b]) + if (!flow_at_sidx(tc_hash[b])) return; /* Not in hash table, nothing to update */ - tc_hash[b] = new; + tc_hash[b] = FLOW_SIDX(new, TAPSIDE); debug("TCP: hash table update: old index %u, new index %u, sock %i, " "bucket: %u", FLOW_IDX(old), FLOW_IDX(new), new->sock, b); @@ -1289,16 +1294,18 @@ static struct tcp_tap_conn *tcp_hash_lookup(const struct ctx *c, in_port_t eport, in_port_t fport) { union inany_addr aany; + union flow *flow; unsigned b; inany_from_af(&aany, af, faddr); for (b = tcp_hash(c, &aany, eport, fport); - tc_hash[b] && !tcp_hash_match(tc_hash[b], &aany, eport, fport); + (flow = flow_at_sidx(tc_hash[b])) + && !tcp_hash_match(&flow->tcp, &aany, eport, fport);Same as above about readability (somehow clashing with correctness).b = (b + 1) % TCP_HASH_TABLE_SIZE) ; - return tc_hash[b]; + return &flow->tcp; } /** @@ -3090,6 +3097,11 @@ static void tcp_sock_refill_init(const struct ctx *c) */ int tcp_init(struct ctx *c) { + unsigned b; + + for (b = 0; b < TCP_HASH_TABLE_SIZE; b++) + tc_hash[b] = FLOW_SIDX_NONE; + if (c->ifi4) tcp_sock4_iov_init(c);