On Wed, Jul 01, 2026 at 02:08:02AM +0200, Stefano Brivio wrote:
On Fri, 26 Jun 2026 17:10:03 +1000 David Gibson
wrote: Forwarding rules for the -[tTuU] options are parsed in fwd_rule_parse() and fwd_rule_parse_ports(). Currently those use a mix of loosely recursive descent parsing helpers, consuming input from left to right, and ad-hoc C parsing - searching for delimeters with strchr() or the like
Nit if you respin: delimiters.
Fixed.
and breaking down on that basis.
As well as being a slightly awkward mix, the current approach will not work for adding target addresses to the rules: we can't easily split the listening from target parts first, because the ':' delimiter could appear multiple times for multiple port ranges, or in IPv6 addresses. However, without splitting the target part first, we can't first split off the listening address and interface because the '/' delimiter could appear in the target portion, but not the listening portion, e.g.: -t 12345:192.0.1.1/12345
To address this, rewrite the entirety of the parsing so we consume the input left to right in a more-or-less recursive descent manner. This means we no longer rely on certain delimiters having a single meaning over the whole input, just the next part of it.
Because of the semantics of port specs, we need to make several passes over the list of comma separated chunks. Previously, some of the logic was duplicated between the two passes, making it fragile. Now, we introduce a parse_port_chunk() helper which we re-use in each pass to make it clearer we parse exactly the same way on each pass.
Signed-off-by: David Gibson
--- fwd_rule.c | 262 ++++++++++++++++++++++++++++++----------------------- parse.c | 29 ++++++ parse.h | 2 + 3 files changed, 181 insertions(+), 112 deletions(-) diff --git a/fwd_rule.c b/fwd_rule.c index b14df340..8ae21b99 100644 --- a/fwd_rule.c +++ b/fwd_rule.c @@ -439,17 +439,96 @@ fail: fwd_rule_fmt(&rule, rulestr, sizeof(rulestr))); }
-/* - * for_each_chunk - Step through delimited chunks of a string - * @p_: Pointer to start of each chunk (updated) - * @ep_: Pointer to end of each chunk (updated) - * @s_: String to step through - * @sep_: String of all allowed delimiters +/** + * enum fwd_port_chunk_kind - Kind of port specifier piece
Nit: I think we should use kerneldoc-style comments also for enum (see enum udp_iov_idx, udp.c for an example).
Sure, done.
+ */ +enum fwd_port_chunk_kind { + CHUNK_ALL, /* "all" */ + CHUNK_AUTO, /* "auto" */ + CHUNK_EXCLUDE, /* "~1111-2222" */ + CHUNK_INCLUDE, /* "1111-2222" */ +}; + +/** + * parse_port_chunk() - Parse one chunk of a port specifier + * @cursor: Parsing point (see parse.c) + * @kindp: Updated with kind of chunk we parsed + * @lrange: Updated with listening port range (for INCLUDE & EXCLUDE) + * @trange: Updated with target port range (for EXCLUDE) + */ +static bool parse_port_chunk(const char **cursor, enum fwd_port_chunk_kind *kindp, + struct port_range *lrange, struct port_range *trange)
Nit: this exceeds 80 columns.
Oops, fixed.
I'm wondering whether this doesn't rather belong in parse.c: having it here makes it hard to spot what it returns (the "Theory of Operation" comment from parse.c), and anyway it's pure parsing, nothing else.
Maybe. I left it (and parse_laddrif()) here because they're so specific to the syntax of -[tTuU] and unlikely to be wanted for anything else.
+{ + struct port_range lr = { 0 }, tr = { 0 }; + enum fwd_port_chunk_kind kind; + const char *p = *cursor; + + if (parse_literal(&p, "all")) { + kind = CHUNK_ALL; + } else if (parse_literal(&p, "auto")) { + kind = CHUNK_AUTO; + } else if (parse_literal(&p, "~")) { + kind = CHUNK_EXCLUDE; + if (!parse_port_range(&p, &lr)) + return false; + } else if (parse_port_range(&p, &lr)) { + kind = CHUNK_INCLUDE; + + if (parse_literal(&p, ":")) { + if (!parse_port_range(&p, &tr)) + return false; + } else { + tr = lr; + } + } else { + return false; + } + + *kindp = kind; + *lrange = lr; + if (trange) + *trange = tr; + *cursor = p;
I wonder if it's worth it being rigid about avoiding side effects on failure, in these functions. In general I agree it's desirable, but in practice we don't need it here and it's quite a few extra lines with slightly confusing variable names.
So, side effects on failure is a pretty big footgun: foo_t foo = FOO_DEFAULT; if (parse_foo(&p, &foo)) { /* handle foo override */ } else { /* uh oh foo could be clobbered */ } Which is why I was strict about this. On the other hand, the footgun's still kind of there when you start combining things. foo_t foo = FOO_DEFAULT; bar_t bar; if (parse_foo(&p, &foo) && parse_bar(&p, &bar)) { /* do stuff */ } else { /* Uh oh, foo could be clobbered if parse_foo() succeded, but parse_bar() failed */ } This is one of the nasties which makes me less happy with this framework than I was hoping.
+ return true; +} + +/** + * parse_laddrif() - Parse listening address+ifname
This makes it look like + is a separator, maybe "address%ifname"?
The name isn't really descriptive I think. Maybe we could take for granted that it's about the listening part, and call it parse_addrifname()? Or parse_addr_ifname()?
Sure, adjusted.
+ * @cursor: Parsing cursor (see parse.c) + * @addr: Updated with parsed inany address (NULL for *) + * @abuf: Buffer to store address + * @ifname: Updated with parsed interface name ("" if none) */ -#define for_each_chunk(p_, ep_, s_, sep_) \ - for ((p_) = (s_); \ - (ep_) = (p_) + strcspn((p_), (sep_)), *(p_); \ - (p_) = *(ep_) ? (ep_) + 1 : (ep_)) +static bool parse_laddrif(const char **cursor, + const union inany_addr **addr, + union inany_addr *abuf, + char *ifname) +{ + union inany_addr atmp = inany_any6; + char iftmp[IFNAMSIZ] = {0}; + const char *p; + + if (p = *cursor,
The advantage of using the comma operator isn't really clear to me here. Can't we just do this assignment unconditionally as initialisation?
We could here, but not for the else if below. We need the reinitialization for the else if, because if some but not all of the parse chain for the if clause suceeds, we'll arrive at the else if with p no longer at the start of the string. So, doing it the same way for both if and else if seemed the least bad. This is the #1 example of the sequencing subtlety that I'm not thrilled with in this scheme. I experimented with soe other approaches, but they had their own problems: * Many of the earlier drafts had parse_foo(p, &ep, ...), strtoul() style. With *ep not updated on failure (which seems like the obvious choice), there's a different footgun: have_optional_foo = parse_foo(start, &p, &foo); if (!parse_the_rest(p, &p)) die(); We need p, not start as the first parameter to parse_the_rest(), so it starts after the foo. If there was a foo, if there wasn't a foo, p is uninitialized. Damn. * We could change the convention so that *ep is updated to the entry value of p on failure. Maybe that would be better, but I'm worried there's some other gotcha. At the least it means more caution in all the helpers to enforce that rigid requirement. * We could have each parser take the start point, and return the end point, or NULL on failure. But that means more hassle and verbosity in the callers because we need to keep around parse points to go back to on any non-fatal failure of a sub-parser. Plus it means chaining things together is the rather awkward: ep = parse_last_thing(parse_middle_thing(parse_first_thing(p))); I'd love to have a better way, but I don't see it. Or at least, not in C: I can see lovely ways to do it with closures, generics and higher-order functions.
By the way, while we don't adhere to it, I think MISRA C contains many guidelines that are actually useful (maybe more than half of them?), and MISRA C:2012 forbids this kind of usage because of readability issues:
https://www.mathworks.com/help/bugfinder/ref/misrac2012rule12.3.html
As a rule, I completely agree. It was still the least worst way I could see. The , operator is the only way to reinitialize the cursor between the if condition and the else if condition, short of introducing a bunch of extra booleans.
+ parse_inany(&p, &atmp) && + parse_ifspec(&p, iftmp) && + parse_literal(&p, "/")) { + /* Specific listening address */ + *addr = abuf; + } else if (p = *cursor, + parse_literal(&p, "*"), + parse_ifspec(&p, iftmp) && + parse_literal(&p, "/")) { + /* Missing or "*" address */ + *addr = NULL; + } else { + return false; + } + + *abuf = atmp; + memcpy(ifname, iftmp, IFNAMSIZ); + *cursor = p; + return true; +}
/** * fwd_rule_parse_ports() - Parse port range(s) specifier @@ -466,87 +545,70 @@ static void fwd_rule_parse_ports(struct fwd_table *fwd, bool del, uint8_t proto, const char *spec) { uint8_t exclude[PORT_BITMAP_SIZE] = { 0 }; + enum fwd_port_chunk_kind kind; + struct port_range lrange; bool exclude_only = true; - const char *p, *ep; uint8_t flags = 0; + const char *p; unsigned i;
- /* Parse excluded ranges and "auto" in the first pass */ - for_each_chunk(p, ep, spec, ",") { - struct port_range xrange; - - /* Include range, parse later */ - if (parse_literal(&p, "all") || isdigit(*p)) - continue; - - if (parse_literal(&p, "auto")) { - if (p != ep) /* Garbage after the keyword */ - goto bad; + /* Consider excluded ranges and "auto" in the first pass */ + p = spec; + do { + if (!parse_port_chunk(&p, &kind, &lrange, NULL)) + goto bad;
+ switch (kind) { + case CHUNK_AUTO: if (!(fwd->caps & FWD_CAP_SCAN)) { die( "'auto' port forwarding is only allowed for pasta"); } - flags |= FWD_SCAN; - continue; - } - - /* Should be an exclude range */ - if (!parse_literal(&p, "~")) - goto bad; - - if (!parse_port_range(&p, &xrange)) - goto bad; - if (p != ep) /* Garbage after the range */ - goto bad; - - for (i = xrange.first; i <= xrange.last; i++) - bitmap_set(exclude, i); - } - - /* Now process base ranges, skipping exclusions */ - for_each_chunk(p, ep, spec, ",") { - struct port_range orig_range, mapped_range; - - /* Handle "all" like exclude only */ - if (parse_literal(&p, "all")) { - if (p != ep) /* Garbage after the keyword */ - goto bad; + break;
- continue; + case CHUNK_EXCLUDE: + for (i = lrange.first; i <= lrange.last; i++) + bitmap_set(exclude, i); + break; + default: + ; /* Handled later */ } + } while (parse_literal(&p, ","));
- if (!isdigit(*p)) - /* Already parsed */ - continue; + /* Consider included ranges in next pass */ + p = spec; + do { + struct port_range trange;
- if (!parse_port_range(&p, &orig_range)) + if (!parse_port_chunk(&p, &kind, &lrange, &trange)) goto bad;
- exclude_only = false; + switch (kind) { + case CHUNK_AUTO: /* already handled */ + case CHUNK_EXCLUDE: /* already handled */ + case CHUNK_ALL: /* handled later */ + continue;
- if (parse_literal(&p, ":")) { - /* There's a range to map to as well */ - if (!parse_port_range(&p, &mapped_range)) - goto bad; - if ((mapped_range.last - mapped_range.first) != - (orig_range.last - orig_range.first)) + case CHUNK_INCLUDE: + exclude_only = false; + if (trange.last - trange.first != + lrange.last - lrange.first) goto bad; - } else { - mapped_range = orig_range; - }
- if (p != ep) /* Garbage after the ranges */ + fwd_rule_range_except(fwd, del, proto, addr, ifname, + lrange.first, lrange.last, + exclude, trange.first, flags); + break; + default: goto bad; + } + } while (parse_literal(&p, ","));
- fwd_rule_range_except(fwd, del, proto, addr, ifname, - orig_range.first, orig_range.last, - exclude, - mapped_range.first, flags); - } + if (!parse_eoi(p)) + goto bad; /* trailing garbage */
- /* Finally handle "all" and exclude only specs */ + /* Finally handle "all" and exclude only cases */ if (exclude_only) { fwd_port_map_ephemeral(exclude);
@@ -569,10 +631,11 @@ bad: void fwd_rule_parse(char optname, bool del, const char *optarg, struct fwd_table *fwd) { - char buf[BUFSIZ], *spec, *ifname = NULL; - union inany_addr addr_buf = inany_any6; - const union inany_addr *addr = &addr_buf; + const union inany_addr *addr; + union inany_addr addr_buf; + char ifname[IFNAMSIZ]; uint8_t proto; + const char *p;
if (optname == 't' || optname == 'T') proto = IPPROTO_TCP; @@ -581,7 +644,8 @@ void fwd_rule_parse(char optname, bool del, const char *optarg, else assert(0);
- if (!strcmp(optarg, "none")) { + if (p = optarg,
Same as above: do we really need this? It's local anyway.
Here, no, but the idea was using this as the consistent style was safer than only using it when we strictly need it for an else if.
+ parse_literal(&p, "none") && parse_eoi(p)) { unsigned i;
for (i = 0; i < fwd->count; i++) { @@ -593,45 +657,19 @@ void fwd_rule_parse(char optname, bool del, const char *optarg, return; }
- strncpy(buf, optarg, sizeof(buf) - 1); - - if ((spec = strchr(buf, '/'))) { - const char *p = buf; - - *spec = 0; - spec++; - - if (optname != 't' && optname != 'u') + if (p = optarg, + parse_laddrif(&p, &addr, &addr_buf, ifname)) { + if (optname == 'T' || optname == 'U') die("Listening address not allowed for -%c %s", optname, optarg); - - if ((ifname = strchr(buf, '%'))) { - *ifname = 0; - ifname++; - - /* spec is already advanced one past the '/', - * so the length of the given ifname is: - * (spec - ifname - 1) - */ - if (spec - ifname - 1 >= IFNAMSIZ) { - die("Interface name '%s' is too long (max %u)", - ifname, IFNAMSIZ - 1); - } - } - - if (ifname == buf + 1 || /* Interface without address */ - !strcmp(buf, "*")) /* Explicit wildcard address */ - addr = NULL; - else if (!parse_inany(&p, &addr_buf) && parse_eoi(p)) - die("Bad forwarding address '%s'", buf); } else { - spec = buf; - + /* No address or ifname */ addr = NULL; + ifname[0] = '\0';
Actually, here, having side effects from parse_laddrif() failing would be rather convenient and intuitive.
Yes, but it kind of just moves the awkwardness into the function. This is roughly the difference between having a sub-parser "fail", requiring the caller to fall back, or having it "succeed" by parsing an empty string into a default value. parse_laddrif() uses the former style. parse_ifspec() uses the latter.
}
if (optname == 'T' || optname == 'U') { - assert(!addr && !ifname); + assert(!addr && !*ifname);
if (!(fwd->caps & FWD_CAP_IFNAME)) { warn( @@ -640,18 +678,18 @@ void fwd_rule_parse(char optname, bool del, const char *optarg,
if (fwd->caps & FWD_CAP_IPV4) { fwd_rule_parse_ports(fwd, del, proto, - &inany_loopback4, NULL, - spec); + &inany_loopback4, NULL, p); } if (fwd->caps & FWD_CAP_IPV6) { fwd_rule_parse_ports(fwd, del, proto, - &inany_loopback6, NULL, - spec); + &inany_loopback6, NULL, p); } return; }
- ifname = "lo"; + static_assert(sizeof("lo") <= sizeof(ifname), + "ifname buffer too small"); + strcpy(ifname, "lo"); }
/* No need for dual stack if we only have one IP version */ @@ -660,13 +698,13 @@ void fwd_rule_parse(char optname, bool del, const char *optarg, else if (!addr && !(fwd->caps & FWD_CAP_IPV6)) addr = &inany_any4;
- if (ifname && !(fwd->caps & FWD_CAP_IFNAME)) { + if (*ifname && !(fwd->caps & FWD_CAP_IFNAME)) { die( "Device binding for '-%c %s' unsupported (requires kernel 5.7+)", optname, optarg); }
- fwd_rule_parse_ports(fwd, del, proto, addr, ifname, spec); + fwd_rule_parse_ports(fwd, del, proto, addr, *ifname ? ifname : NULL, p); }
/** diff --git a/parse.c b/parse.c index 3e0dbd45..d83ea9f9 100644 --- a/parse.c +++ b/parse.c @@ -18,6 +18,7 @@ #include
#include #include +#include #include "common.h" #include "parse.h" @@ -212,3 +213,31 @@ bool parse_inany_(const char **cursor, union inany_addr *addr,
return false; } + +/** + * parse_ifspec() - Parse a interface name specifier (starting with %) + * @ifname: On success updated with parsed name (must have IFNAMSIZ space) + * + * This will accept a missing specifier (empty string), setting ifname to "" + */ +bool parse_ifspec(const char **cursor, char *ifname) +{ + const char *p = *cursor; + size_t len; + + if (!parse_literal(&p, "%")) { + /* No interface specifier */ + ifname[0] = '\0'; + return true; + } + + /* ifnames can have anything that's not '/' or whitespace */
dev_valid_name(), net/core/dev.c in the Linux kernel, also forbids '.' and '..', to avoid breaking sysfs. We don't really _need_ to check for that as we don't create interfaces, but strictly speaking the comment isn't accurate, and we might want to filter for consistency anyway.
Ah, missed that. Fixed.
+ len = strcspn(p, "/ \f\n\r\t\v"); + if (!len || len >= IFNAMSIZ) + return false; + + memcpy(ifname, p, len); + ifname[len] = '\0'; + *cursor = p + len; + return true; +} diff --git a/parse.h b/parse.h index 08b038cf..1079f5bf 100644 --- a/parse.h +++ b/parse.h @@ -32,4 +32,6 @@ bool parse_inany_(const char **cursor, union inany_addr *addr,
#define parse_inany(cursor, addr) parse_inany_((cursor), (addr), NULL)
+bool parse_ifspec(const char **cursor, char *ifname); + #endif /* _PARSE_H */
Other than those comments, it all looks good to me, and also like an obvious, much needed improvement. I tested it quite a bit and I didn't manage to break things.
-- Stefano
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson