On Tue, Jan 13, 2026 at 11:13:03PM +0100, Stefano Brivio wrote:
On Thu, 8 Jan 2026 13:29:48 +1100 David Gibson
wrote: Now that listening sockets include a reference to the forwarding rule which created them we can, in many cases, avoid a linear search of the forwarding table when we want to find the relevant rule. Instead we can take the rule index from the socket's epoll reference, and use that to immediately find the correct rule.
This is conceptually simple, but requires a moderate amount of plumbing to get the index from the reference through to the rule lookup. We still allow fall back to linear search if we don't have the index, and this may (rarely) be used in the udp_flush_flow() case, where we could get packets for one flow on a different flow's socket, rather than through a listening socket as usual.
Signed-off-by: David Gibson
--- flow.c | 13 +++++++++++-- flow_table.h | 2 +- fwd.c | 3 +-- fwd.h | 1 + icmp.c | 2 +- tcp.c | 4 ++-- udp.c | 14 +++++++++----- udp_flow.c | 14 ++++++++------ udp_flow.h | 2 +- udp_internal.h | 4 ++-- 10 files changed, 37 insertions(+), 22 deletions(-) diff --git a/flow.c b/flow.c index 38f88732..85759970 100644 --- a/flow.c +++ b/flow.c @@ -482,12 +482,13 @@ struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif, * flow_target() - Determine where flow should forward to, and move to TGT * @c: Execution context * @flow: Flow to forward + * @rule_hint: Index of relevant forwarding rule, or -1 if unknown * @proto: Protocol * * Return: pointer to the target flowside information */ struct flowside *flow_target(const struct ctx *c, union flow *flow, - uint8_t proto) + int rule_hint, uint8_t proto) { char estr[INANY_ADDRSTRLEN], ostr[INANY_ADDRSTRLEN]; struct flow_common *f = &flow->f; @@ -515,7 +516,15 @@ struct flowside *flow_target(const struct ctx *c, union flow *flow, else goto nofwd;
- if (!(rule = fwd_rule_search(fwd, ini))) { + if (rule_hint >= 0) { + ASSERT((unsigned)rule_hint < fwd->count); + rule = &fwd->rules[rule_hint]; + if (!fwd_rule_match(rule, ini)) { + flow_dbg(flow, + "Unexpected mismatching forward rule"); + goto nofwd; + } + } else if (!(rule = fwd_rule_search(fwd, ini))) {
*If* we need this fwd_rule_search() / fwd_rule_match() / else if fwd_rule_search() pattern in more places, it might be convenient to hide the complexity by taking 'rule_hint' as argument for fwd_rule_search() perhaps, and then do something like this in fwd_rule_search():
for (i = hint == -1 ? 0 : hint; i < fwd->count; i++) { if (fwd_rule_match(&fwd->rules[i], ini)) return &fwd->rules[i]; if (hint) break; }
return NULL;
but I haven't really thought this through. It makes fwd_rule_search() a bit ugly.
I considered that approach, but decided against it. However, looking again I think that might make dealing with the coverity issue in the earlier patch a bit cleaner, so I'm reconsidering.
/* This shouldn't happen, because if there's no rule for * it we should have no listening socket that would let * us get here diff --git a/flow_table.h b/flow_table.h index 5ee13acc..73de13ba 100644 --- a/flow_table.h +++ b/flow_table.h @@ -207,7 +207,7 @@ const struct flowside *flow_target_af(union flow *flow, uint8_t pif, const void *saddr, in_port_t sport, const void *daddr, in_port_t dport); struct flowside *flow_target(const struct ctx *c, union flow *flow, - uint8_t proto); + int rule_hint, uint8_t proto);
union flow *flow_set_type(union flow *flow, enum flow_type type); #define FLOW_SET_TYPE(flow_, t_, var_) (&flow_set_type((flow_), (t_))->var_) diff --git a/fwd.c b/fwd.c index 6727d26f..250b470c 100644 --- a/fwd.c +++ b/fwd.c @@ -415,8 +415,7 @@ void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags, * * Returns: true if the rule applies to the flow, false otherwise */ -static bool fwd_rule_match(const struct fwd_rule *rule, - const struct flowside *ini) +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini) { return inany_matches(&ini->oaddr, fwd_rule_addr(rule)) && ini->oport >= rule->first && ini->oport <= rule->last; diff --git a/fwd.h b/fwd.h index 435f422a..2af7791d 100644 --- a/fwd.h +++ b/fwd.h @@ -106,6 +106,7 @@ struct fwd_ports { void fwd_rule_add(struct fwd_ports *fwd, uint8_t flags, const union inany_addr *addr, const char *ifname, in_port_t first, in_port_t last, in_port_t to); +bool fwd_rule_match(const struct fwd_rule *rule, const struct flowside *ini); const struct fwd_rule *fwd_rule_search(const struct fwd_ports *fwd, const struct flowside *ini); void fwd_rules_print(const struct fwd_ports *fwd); diff --git a/icmp.c b/icmp.c index 9564c496..0f4d48bb 100644 --- a/icmp.c +++ b/icmp.c @@ -183,7 +183,7 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, return NULL;
flow_initiate_af(flow, PIF_TAP, af, saddr, id, daddr, id); - if (!(tgt = flow_target(c, flow, proto))) + if (!(tgt = flow_target(c, flow, -1, proto))) goto cancel;
if (flow->f.pif[TGTSIDE] != PIF_HOST) { diff --git a/tcp.c b/tcp.c index fc03e38f..89b22a51 100644 --- a/tcp.c +++ b/tcp.c @@ -1657,7 +1657,7 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, ini = flow_initiate_af(flow, PIF_TAP, af, saddr, srcport, daddr, dstport);
- if (!(tgt = flow_target(c, flow, IPPROTO_TCP))) + if (!(tgt = flow_target(c, flow, -1, IPPROTO_TCP))) goto cancel;
if (flow->f.pif[TGTSIDE] != PIF_HOST) { @@ -2496,7 +2496,7 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, goto cancel; }
- if (!flow_target(c, flow, IPPROTO_TCP)) + if (!flow_target(c, flow, ref.listen.rule, IPPROTO_TCP)) goto cancel;
switch (flow->f.pif[TGTSIDE]) { diff --git a/udp.c b/udp.c index 761221f6..d7e47e52 100644 --- a/udp.c +++ b/udp.c @@ -838,12 +838,13 @@ static void udp_buf_sock_to_tap(const struct ctx *c, int s, int n, * udp_sock_fwd() - Forward datagrams from a possibly unconnected socket * @c: Execution context * @s: Socket to forward from + * @rule_hint: Forwarding rule to use, or -1 if unknown * @frompif: Interface to which @s belongs * @port: Our (local) port number of @s * @now: Current timestamp */ -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, - in_port_t port, const struct timespec *now) +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, + uint8_t frompif, in_port_t port, const struct timespec *now) { union sockaddr_inany src; union inany_addr dst; @@ -868,7 +869,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, continue; }
- tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, now); + tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, + rule_hint, now); topif = pif_at_sidx(tosidx);
if (pif_is_socket(topif)) { @@ -910,8 +912,10 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events, const struct timespec *now) { - if (events & (EPOLLERR | EPOLLIN)) - udp_sock_fwd(c, ref.fd, ref.listen.pif, ref.listen.port, now); + if (events & (EPOLLERR | EPOLLIN)) { + udp_sock_fwd(c, ref.fd, ref.listen.rule, + ref.listen.pif, ref.listen.port, now); + } }
/** diff --git a/udp_flow.c b/udp_flow.c index 8907f2f7..b82c6c06 100644 --- a/udp_flow.c +++ b/udp_flow.c @@ -139,6 +139,7 @@ static int udp_flow_sock(const struct ctx *c, * udp_flow_new() - Common setup for a new UDP flow * @c: Execution context * @flow: Initiated flow + * @rule_hint: Index of forwarding rule, or -1 if unknown * @now: Timestamp * * Return: sidx for the target side of the new UDP flow, or FLOW_SIDX_NONE @@ -147,13 +148,13 @@ static int udp_flow_sock(const struct ctx *c, * #syscalls getsockname */ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, - const struct timespec *now) + int rule_hint, const struct timespec *now) { struct udp_flow *uflow = NULL; const struct flowside *tgt; unsigned sidei;
- if (!(tgt = flow_target(c, flow, IPPROTO_UDP))) + if (!(tgt = flow_target(c, flow, rule_hint, IPPROTO_UDP))) goto cancel;
uflow = FLOW_SET_TYPE(flow, FLOW_UDP, udp); @@ -216,6 +217,7 @@ cancel: * @dst: Our (local) address to which the datagram is arriving * @port: Our (local) port number to which the datagram is arriving * @s_in: Source socket address, filled in by recvmmsg() + * @rule_hint: Index of forwarding rule, or -1 if unknown * @now: Timestamp * * #syscalls fcntl arm:fcntl64 ppc64:fcntl64|fcntl i686:fcntl64 @@ -226,7 +228,7 @@ cancel: flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, const union inany_addr *dst, in_port_t port, const union sockaddr_inany *s_in, - const struct timespec *now) + int rule_hint, const struct timespec *now) { const struct flowside *ini; struct udp_flow *uflow; @@ -260,7 +262,7 @@ flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, return FLOW_SIDX_NONE; }
- return udp_flow_new(c, flow, now); + return udp_flow_new(c, flow, rule_hint, now); }
/** @@ -316,7 +318,7 @@ flow_sidx_t udp_flow_from_tap(const struct ctx *c, return FLOW_SIDX_NONE; }
- return udp_flow_new(c, flow, now); + return udp_flow_new(c, flow, -1, now); }
/** @@ -332,7 +334,7 @@ static void udp_flush_flow(const struct ctx *c, { /* We don't know exactly where the datagrams will come from, but we know * they'll have an interface and oport matching this flow */ - udp_sock_fwd(c, uflow->s[sidei], uflow->f.pif[sidei], + udp_sock_fwd(c, -1, uflow->s[sidei], uflow->f.pif[sidei],
I guess this is fixed on your local branch because this patch (and only this one) applied with some fuzz, but in case it's not... this passes -1 as socket ('s' argument), not as 'rule_hint'.
Oops, no. Not sure where the fuzz came from, but this was wrong in my tree..
If I swap it with uflow->s[sidei] it all works as expected.
.. but not any more.
uflow->f.side[sidei].oport, now); }
diff --git a/udp_flow.h b/udp_flow.h index 4c528e95..14e0f920 100644 --- a/udp_flow.h +++ b/udp_flow.h @@ -35,7 +35,7 @@ struct udp_flow *udp_at_sidx(flow_sidx_t sidx); flow_sidx_t udp_flow_from_sock(const struct ctx *c, uint8_t pif, const union inany_addr *dst, in_port_t port, const union sockaddr_inany *s_in, - const struct timespec *now); + int rule_hint, const struct timespec *now); flow_sidx_t udp_flow_from_tap(const struct ctx *c, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, diff --git a/udp_internal.h b/udp_internal.h index 96d11cff..f0ce8f14 100644 --- a/udp_internal.h +++ b/udp_internal.h @@ -28,7 +28,7 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp, size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp, const struct flowside *toside, size_t dlen, bool no_udp_csum); -void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, - in_port_t port, const struct timespec *now); +void udp_sock_fwd(const struct ctx *c, int s, int rule_hint, + uint8_t frompif, in_port_t port, const struct timespec *now);
#endif /* UDP_INTERNAL_H */
So, uh, having reached this point, I was asking myself "is this really everything"? :)
Well, it's not exactly everything as you mentioned in the cover letter, perhaps the biggest missing part (other than client / interface) is outbound forwarding, but still, reaching this point with just 14 relatively small patches is quite impressive.
I guess it's also merit of how generic / well structured the flow table and especially the flow "stages" are.
Thank you? As also noted in the cover letter, I think this is now complete enough to be interesting. There is a lot still to do though: * Allow configuration of the target address as well as matching address * Allow addresses to be configured for outbound / splice mappings * Allow n to 1 port remapping * Remove port bitmap (or rather make them local variables only) * This is a bit complicated because if we want to implement port scanning for passt, we might need it to be persistent again * Allow "auto" mappings to be configured * Allow "auto" mappings to be for specific bind addresses * Outbound forwarding table * Configuring specific NATs * ? Merging separate tcp/udp in/out tables into one * Remove global forwarding mode (other than as a local during conf) * Probably a bunch more I'm not remembering And, of course, dynamic updates. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson