[PATCH v8 00/19] Dynamic configuration update implementation
Changes in v8: * Implement --add, --delete, and --clear in 19/19, to add forwarding rules instead of replacing tables, delete existing rules, and explicitly clear tables * Address Laurent's comments for 15/19 and 17/19 * In 10/19, instead of passing SOCK_NONBLOCK to accept4(), explicitly set O_NONBLOCK on the listening socket. Using SOCK_NONBLOCK doesn't do what we want, as it results in setting O_NONBLOCK on the new socket rather than on the listening one * Note: 18/19 is left as it is, I didn't address pending comments yet * Note: this doesn't include yet changes for AppArmor and SELinux policies, as well as changes for the template Fedora spec file. I'm still working on them Changes in v7: * Addressed comments from Laurent in 6/18, 8/18, 9/18, 10/18, 11/18, 12/18, 14/18, 15/18 (details in commit messages of single patches, before my Signed-off-by) * Note: this doesn't include yet --add and --delete, I'm still working on that Changes in v6: * Addressed comments from Jon in 10/18, 11/18, 14/18, and 16/18 * Dodged all warnings from static checkers (Coverity Scan and clang-tidy) with changes in 10/18, 11/18, 16/18, and with a new patch, 18/18 * This does *not* include yet the implementation of --add and --delete switches for pesto as I originally intended, I'm rather far from being done with those. At the moment I just have a "mode selection" implementation for command line parsing but merging rules to / removing rules from / clearing the current table is something I barely started (and what I have at the moment isn't really valuable anyway) David wrote: --- Here's the next draft of dynamic configuration updates. This now can successfully update rules, though I've not tested it very extensively. Patches 1..8/18 are preliminary reworks that make sense even without pesto - feel free to apply if you're happy with them. I don't think the rest should be applied yet; we need to at least harden it so passt can't be blocked indefinitely by a client which sends a partial update then waits. Based on my earlier series reworking static checking invocation. TODO: - Don't allow a client which sends a partial configuration then blocks also block passt - Allow pesto to clear existing configuration, not just add - Allow pesto selectively delete existing rules, not just add Changes in v5: * If multiple clients connect at once, they're now blocked until the first one finishes, instead of later ones being discarded Changes in v4: * Merged with remainder of forward rule parsing rework series * Fix some bugs in rule checking pointed out by Laurent * Significantly cleaned up option parsing code * Changed from replacing all existing rules to adding new rules (clear and remove still TBD) * Somewhat simplified protocol (pif names and rules sent in a single pass) * pesto is now allocation free * Fixed commit message and style nits pointed out by Stefano Changes in v3: * Removed already applied ASSERT() rename * Renamed serialisation functions * Incorporated Stefano's extensions, reworked and fixed * Several additional cleanups / preliminary reworks Changes in v2: * Removed already applied cleanups * Reworked assert() patch to handle -DNDEBUG properly * Numerous extra patches: * Factored out serialisation helpers and use them for migration as well * Reworked to allow ip.[ch] and inany.[ch] to be shared with pesto * Reworks to share some forwarding rule datatypes with pesto * Implemented sending pif names and current ruleset to pesto --- David Gibson (17): conf, fwd: Stricter rule checking in fwd_rule_add() fwd_rule: Move ephemeral port probing to fwd_rule.c fwd, conf: Move rule parsing code to fwd_rule.[ch] fwd_rule: Move conflict checking back within fwd_rule_add() fwd: Generalise fwd_rules_info() pif: Limit pif names to 128 bytes fwd_rule: Fix some format specifiers pesto: Introduce stub configuration tool pesto, log: Share log.h (but not log.c) with pesto tool pesto, conf: Have pesto connect to passt and check versions pesto: Expose list of pifs to pesto and display them ip: Prepare ip.[ch] for sharing with pesto tool inany: Prepare inany.[ch] for sharing with pesto tool pesto: Read current ruleset from passt/pasta and optionally display it pesto: Parse and add new rules from command line pesto, conf: Send updated rules from pesto back to passt/pasta conf, fwd: Allow switching to new rules received from pesto Stefano Brivio (2): fwd_rule: Fix static checkers warnings in fwd_rule_add() pesto, conf, fwd_rule: Add options and modes to add, delete, clear rules .gitignore | 2 + Makefile | 53 ++-- common.h | 116 +++++++++ conf.c | 696 ++++++++++++++++++++++----------------------------- conf.h | 2 + epoll_type.h | 4 + flow.c | 4 +- fwd.c | 169 ++++--------- fwd.h | 41 +-- fwd_rule.c | 680 +++++++++++++++++++++++++++++++++++++++++++++++-- fwd_rule.h | 68 ++++- inany.c | 19 +- inany.h | 17 +- ip.c | 56 +---- ip.h | 4 +- lineread.c | 2 +- log.h | 53 +++- passt.1 | 5 + passt.c | 8 + passt.h | 8 + pesto.1 | 271 ++++++++++++++++++++ pesto.c | 520 ++++++++++++++++++++++++++++++++++++++ pesto.h | 54 ++++ pif.c | 2 +- pif.h | 7 +- serialise.c | 7 + serialise.h | 1 + siphash.h | 13 + tap.c | 52 ++++ util.h | 110 +------- 30 files changed, 2252 insertions(+), 792 deletions(-) create mode 100644 common.h create mode 100644 pesto.1 create mode 100644 pesto.c create mode 100644 pesto.h -- 2.43.0
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
From: David Gibson
The new checks are actually sufficient but not enough for Coverity
Scan. Now that fwd->sock_count and new->last are affected or supplied
by clients, we need explicit (albeit redundant) checks on them.
Signed-off-by: Stefano Brivio
Instead of just being able to replace the existing forwarding table,
implement --add and --delete options to maintain the table and add
or delete specific ports.
The option --clear PIF forces the clearing of a table, instead.
These options can be combined arbitrarily and are handled as
sequential commands, as now described in pesto(1).
If no option is given before forwarding specifiers for a matching
table, the command line is interpreted as a replacement of the
existing rules.
To this end:
- there's no protocol change, as pesto is anyway sending updated
copies of the table
- the forwarding table functions now include a new fwd_rule_del(),
which deletes existing rule only if a matching one is found
- a trivial fwd_rule_clear() is factored out from the existing
conf_handler() implementation, so that it can be directly used
in pesto
The entry points for parsing of port specifiers now take an additional
'del' parameter which is passed down all the way before reaching the
fwd_rule_add() implementation. If a rule should be deleted, at that
point, fwd_rule_del() is called instead.
Signed-off-by: Stefano Brivio
On Wed, May 06, 2026 at 01:47:10AM +0200, Stefano Brivio wrote:
From: David Gibson
Start implementing pesto in earnest. Create a control/configuration socket in passt. Have pesto connect to it and retrieve a server greeting Perform some basic version checking.
Signed-off-by: David Gibson
[sbrivio: Avoid potential recursive calling between conf_accept() and conf_close(), reported by clang-tidy]
Huh. For some reason that warning didn't trip for me. Although it's technically true they can mutually recurse, I believe they're both tail calls, so it shouldn't eat the stack.
[sbrivio: In conf(), check we're not exceeding sizeof(c->control_path) instead of sizeof(c->socket_path), and, in pesto's main(), print argv[optind] instead of argv[1] to indicate an invalid socket path, both reported by Jon Maloy] [sbrivio: In pesto's main(), drop unnecessary newline from error message, reported by Laurent] [sbrivio: Don't use SOCK_NONBLOCK on accept4(), as that only applies to the *new* file descriptor, which we don't want -- set O_NONBLOCK on the listening file descriptor using fcntl()]
Making the new (accepted) socket non-blocking was the intended behaviour here. We also want non-blocking for the listening socket, but that was already done in feab892c7 ("tap, repair: Use SOCK_NONBLOCK and SOCK_CLOEXEC on Unix sockets"). WIth the current design, I guess we don't want non-blocking on the accepted socket, although I don't think it actually matters very much. We will want non-blocking it when we change this to read out the updated rules incrementally, rather than all at once. [snip]
@@ -1072,6 +1080,19 @@ static void conf_open_files(struct ctx *c) if (c->pidfile_fd < 0) die_perror("Couldn't open PID file %s", c->pidfile); } + + c->fd_control = -1; + if (*c->control_path) { + c->fd_control_listen = sock_unix(c->control_path); + if (c->fd_control_listen < 0) { + die_perror("Couldn't open control socket %s", + c->control_path); + } + if (fcntl(c->fd_control_listen, F_SETFL, O_NONBLOCK)) + die_perror("Couldn't set O_NONBLOCK on control socket");
So, this is unneccessary because of feab892c7. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, May 06, 2026 at 01:47:19AM +0200, Stefano Brivio wrote:
Instead of just being able to replace the existing forwarding table,
As of my last version, we already added, rather than replacing.
implement --add and --delete options to maintain the table and add or delete specific ports.
The option --clear PIF forces the clearing of a table, instead.
These options can be combined arbitrarily and are handled as sequential commands, as now described in pesto(1).
If no option is given before forwarding specifiers for a matching table, the command line is interpreted as a replacement of the existing rules.
To this end:
- there's no protocol change, as pesto is anyway sending updated copies of the table
- the forwarding table functions now include a new fwd_rule_del(), which deletes existing rule only if a matching one is found
- a trivial fwd_rule_clear() is factored out from the existing conf_handler() implementation, so that it can be directly used in pesto
The entry points for parsing of port specifiers now take an additional 'del' parameter which is passed down all the way before reaching the fwd_rule_add() implementation. If a rule should be deleted, at that point, fwd_rule_del() is called instead.
Signed-off-by: Stefano Brivio
--- conf.c | 26 ++++++---------- fwd_rule.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++------- fwd_rule.h | 4 ++- pesto.1 | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++ pesto.c | 53 ++++++++++++++++++++++++++++--- 5 files changed, 227 insertions(+), 32 deletions(-) diff --git a/conf.c b/conf.c index 3f48793..909c34c 100644 --- a/conf.c +++ b/conf.c @@ -1849,16 +1849,16 @@ void conf(struct ctx *c, int argc, char **argv)
if (name == 't') { opt_t = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'u') { opt_u = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'T') { opt_T = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } else if (name == 'U') { opt_U = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } } while (name != -1);
@@ -1910,13 +1910,13 @@ void conf(struct ctx *c, int argc, char **argv)
if (c->mode == MODE_PASTA) { if (!opt_t) - fwd_rule_parse('t', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('t', false, "auto", c->fwd[PIF_HOST]); if (!opt_T) - fwd_rule_parse('T', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('T', false, "auto", c->fwd[PIF_SPLICE]); if (!opt_u) - fwd_rule_parse('u', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('u', false, "auto", c->fwd[PIF_HOST]); if (!opt_U) - fwd_rule_parse('U', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('U', false, "auto", c->fwd[PIF_SPLICE]); }
conf_sock_listen(c); @@ -2135,14 +2135,8 @@ void conf_handler(struct ctx *c, uint32_t events) unsigned pif;
/* Clear pending tables */ - for (pif = 0; pif < PIF_NUM_TYPES; pif++) { - struct fwd_table *fwd = c->fwd_pending[pif]; - - if (!fwd) - continue; - fwd->count = 0; - fwd->sock_count = 0; - } + for (pif = 0; pif < PIF_NUM_TYPES; pif++) + fwd_rule_clear(c->fwd_pending[pif]);
/* FIXME: this could block indefinitely if the client doesn't * write as much as it should diff --git a/fwd_rule.c b/fwd_rule.c index 03e8e80..eb9a601 100644 --- a/fwd_rule.c +++ b/fwd_rule.c @@ -180,6 +180,66 @@ static bool fwd_rule_conflicts(const struct fwd_rule *a, const struct fwd_rule * return true; }
+/** + * fwd_rule_clear() - Clear a forwarding table + * @fwd: Table to clear (might be NULL) + */ +void fwd_rule_clear(struct fwd_table *fwd) +{ + if (!fwd) + return; +
Not essential, but I wonder if it would be wise to verify that there are no currently open sockets associated with any of the rules.
+ fwd->count = 0; + fwd->sock_count = 0; +} + +/** + * fwd_rule_del() - Partially validate and delete a rule from a forwarding table + * @fwd: Table to delete from + * @rule: Rule to delete (must match an existing rule) + * + * Return: 0 on success, negative error code on failure (-ENOENT if not found) + * + * NOTE: This function can't be used for a forwarding table with valid sockets + * stored in fwd->rulesocks.
The exact meaning of this isn't very clear to me. Does "valid" mean "open" or something else? I think what you're getting at is that every entry in fwd->socks[] must be -1. Or at least every entry with index in [0,sock_count)
+ */ +static int fwd_rule_del(struct fwd_table *fwd, const struct fwd_rule *rule) +{ + unsigned num, i; + + for (i = 0; i < fwd->count; i++) { + if (fwd_rule_conflicts(rule, &fwd->rules[i])) + break; + }
So, this deletes any conflicting rule, not only exact matches. That's not very clear from the description of @rule.
+ + if (i == fwd->count) { + char newstr[FWD_RULE_STRLEN]; + + warn("Couldn't find forwarding rule to delete: %s", + fwd_rule_fmt(rule, newstr, sizeof(newstr))); + return -ENOENT; + } + + /* Don't use anything else from 'rule' as passed, it's not validated */ + rule = &fwd->rules[i]; + num = (unsigned)rule->last - rule->first + 1; + + fwd->count--; + + memmove((void *)(fwd->rulesocks + i), (void *)(fwd->rulesocks + i + 1),
I don't think the explicit (void *) casts are necessary - they should be implicit from memmove()s signature.
+ (fwd->count - i) * sizeof(*fwd->rulesocks));
Is memmove() guaranteed to be safe if given a zero length? That will occur if deleting the last entry.
+ /* TODO: move sockets stored starting from fwd->rulesocks[i + i], should + * we ever need to delete rules from a table with open sockets. + */ + fwd->sock_count -= num; + + memmove(fwd->rules + i, fwd->rules + i + 1, + (fwd->count - i) * sizeof(*fwd->rules)
Again, is this safe if i == fwd->count?
+ + return 0; +} + /** * fwd_rule_add() - Validate and add a rule to a forwarding table * @fwd: Table to add to @@ -368,6 +428,7 @@ static int parse_keyword(const char *s, const char **endptr, const char *kw) * fwd_rule_range_except() - Set up forwarding for a range of ports minus a * bitmap of exclusions * @fwd: Forwarding table to be updated + * @del: Delete resulting rules from forwarding table, instead of adding
Clunky, but it gets the job done.
* @proto: Protocol to forward * @addr: Listening address * @ifname: Listening interface @@ -377,8 +438,8 @@ static int parse_keyword(const char *s, const char **endptr, const char *kw) * @to: Port to translate @first to when forwarding * @flags: Flags for forwarding entries */ -static void fwd_rule_range_except(struct fwd_table *fwd, uint8_t proto, - const union inany_addr *addr, +static void fwd_rule_range_except(struct fwd_table *fwd, bool del, + uint8_t proto, const union inany_addr *addr, const char *ifname, uint16_t first, uint16_t last, const uint8_t *exclude, uint16_t to, @@ -418,8 +479,13 @@ static void fwd_rule_range_except(struct fwd_table *fwd, uint8_t proto, rule.last = i - 1; rule.to = base + delta;
- if (fwd_rule_add(fwd, &rule) < 0) - goto fail; + if (del) { + if (fwd_rule_del(fwd, &rule) < 0) + goto fail; + } else { + if (fwd_rule_add(fwd, &rule) < 0) + goto fail; + }
base = i - 1; } @@ -445,12 +511,13 @@ fail: /** * fwd_rule_parse_ports() - Parse port range(s) specifier * @fwd: Forwarding table to be updated + * @del: Delete resulting rules from forwarding table, instead of adding * @proto: Protocol to forward * @addr: Listening address for forwarding * @ifname: Interface name for listening * @spec: Port range(s) specifier */ -static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, +static void fwd_rule_parse_ports(struct fwd_table *fwd, bool del, uint8_t proto, const union inany_addr *addr, const char *ifname, const char *spec) @@ -507,7 +574,7 @@ static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, /* Exclude ephemeral ports */ fwd_port_map_ephemeral(exclude);
- fwd_rule_range_except(fwd, proto, addr, ifname, + fwd_rule_range_except(fwd, del, proto, addr, ifname, 1, NUM_PORTS - 1, exclude, 1, flags | FWD_WEAK); return; @@ -537,7 +604,7 @@ static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, if (p != ep) /* Garbage after the ranges */ goto bad;
- fwd_rule_range_except(fwd, proto, addr, ifname, + fwd_rule_range_except(fwd, del, proto, addr, ifname, orig_range.first, orig_range.last, exclude, mapped_range.first, flags); @@ -551,10 +618,12 @@ bad: /** * fwd_rule_parse() - Parse port configuration option * @optname: Short option name, t, T, u, or U + * @del: Delete resulting rules from forwarding table, instead of adding * @optarg: Option argument (port specification) * @fwd: Forwarding table to be updated */ -void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) +void fwd_rule_parse(char optname, bool del, const char *optarg, + struct fwd_table *fwd) { union inany_addr addr_buf = inany_any6, *addr = &addr_buf; char buf[BUFSIZ], *spec, *ifname = NULL; @@ -632,12 +701,12 @@ void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) optname, optarg);
if (fwd->caps & FWD_CAP_IPV4) { - fwd_rule_parse_ports(fwd, proto, + fwd_rule_parse_ports(fwd, del, proto, &inany_loopback4, NULL, spec); } if (fwd->caps & FWD_CAP_IPV6) { - fwd_rule_parse_ports(fwd, proto, + fwd_rule_parse_ports(fwd, del, proto, &inany_loopback6, NULL, spec); } @@ -653,7 +722,7 @@ void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) optname, optarg); }
- fwd_rule_parse_ports(fwd, proto, addr, ifname, spec); + fwd_rule_parse_ports(fwd, del, proto, addr, ifname, spec); }
/** diff --git a/fwd_rule.h b/fwd_rule.h index f43b37d..ae9a3cb 100644 --- a/fwd_rule.h +++ b/fwd_rule.h @@ -100,9 +100,11 @@ void fwd_probe_ephemeral(void);
const union inany_addr *fwd_rule_addr(const struct fwd_rule *rule); const char *fwd_rule_fmt(const struct fwd_rule *rule, char *dst, size_t size); -void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd); +void fwd_rule_parse(char optname, bool del, const char *optarg, + struct fwd_table *fwd); int fwd_rule_read(int fd, struct fwd_rule *rule); int fwd_rule_write(int fd, const struct fwd_rule *rule); +void fwd_rule_clear(struct fwd_table *fwd); int fwd_rule_add(struct fwd_table *fwd, const struct fwd_rule *new);
/** diff --git a/pesto.1 b/pesto.1 index 1ea1d12..cd0f3dc 100644 --- a/pesto.1 +++ b/pesto.1 @@ -31,6 +31,42 @@ Be verbose. .BR \-h ", " \-\-help Display a help message and exit.
+.TP +.BR \-A ", " \-\-add +Add the port forwarding specifiers following this option to the current +forwarding table, rather than replacing it. + +This option can be given multiple times, as it might follow previous deletions +(see \fB--delete\fR below), and implies that all the specifiers following it, +before a further \fB--delete\fR option occurs, will be handled as additions. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + +.TP +.BR \-D ", " \-\-delete +Delete the port forwarding specifiers following this option from the current +forwarding table, rather than adding them it. + +This option can be given multiple times, as it might follow previous additions +(see \fB--add\fR above), and implies that all the specifiers following it, +before a further \fB--add\fR option occurs, will be handled as deletions. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + +.TP +.BR \-C ", " \-\-clear " " \fIpif +Clear the forwarding table associated to a given \fIpif\fR, that is, a +conceptual type of interface in \fBpasst\fR(1) or \fBpasta\fR(1) representing a +specific data path and direction. + +The available \fIpif\fR names can be obtained by querying the current forwarding +configuration, which can be done by calling \fBpesto\fR(1) without options. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + .TP .BR \-t ", " \-\-tcp-ports " " \fIspec Configure TCP port forwarding to guest or namespace. \fIspec\fR can be one of: @@ -162,6 +198,55 @@ Configure UDP port forwarding from target namespace to init namespace. .BR \-\-version Show version and exit.
+.SH NOTES + +.SS Adding, deleting, clearing rules + +The options \fB--add\fR, \fB--delete\fR, and \fB--clear\fR are handled as +sequential commands to manipulate the current forwarding tables. If none of them +is given, forwarding specifiers for a given table are intended as replacement of +the corresponding table. That is:
I thought we wanted to default to add, rather than replace. That seems both a little simpler to implement and agruably more likely to be what peopke want.
+.nf + pesto -t 1024 -U 1025 +.fi + +will \fBreplace\fR the current TCP inbound port forwarding table with a single +rule, forwarding port 1024, and will similarly replace the UDP outbound +forwarding table with a single forwarding rule for port 1025. This usage is a +short-hand form for: + +.nf + pesto -C HOST -t 1024 -C SPLICE -U 1025 +.fi + +The options \fB--add\fR and \fB--delete\fR are used to \fBadd new specific +rules or delete existing ones\fR, instead of replacing tables. For example: + +.nf + pesto -A -t 2000 -D -t 3000 -U 5000 +.fi + +will add a forwarding rule for inbound TCP port 2000, and delete inbound TCP +port 3000 as well as outbound UDP port 5000 from the existing set of rules. + +All these options are interpreted as sequential commands and can be arbitrarily +combined. For example: + +.nf + pesto -A -t 2000 -C HOST -A -T 3000 -t 2001 -D -u 5000 +.fi + +will, in order: + +.RS +- add inbound TCP port 2000 +- clear inbound ports, reverting the addition above +- add outbound TCP port 3000 +- add inbound TCP port 2001 +- delete inbound UDP port 5000 +.RE + .SH AUTHORS
Stefano Brivio
, diff --git a/pesto.c b/pesto.c index 73fdc39..143d4c6 100644 --- a/pesto.c +++ b/pesto.c @@ -55,6 +55,9 @@ static void usage(const char *name, FILE *f, int status) FPRINTF(f, "Usage: %s [OPTION]... PATH\n", name); FPRINTF(f, "\n" + " -A, --add Add following specifiers to forwards\n" + " -D, --delete Delete following specifiers instead\n" + " -C, --clear PIF Clear forwarding table for given PIF\n" " -t, --tcp-ports SPEC TCP inbound port forwarding\n" " can be specified multiple times\n" " SPEC can be:\n" @@ -298,6 +301,9 @@ int main(int argc, char **argv) {"debug", no_argument, NULL, 'd' }, {"help", no_argument, NULL, 'h' }, {"version", no_argument, NULL, 1 }, + {"add", no_argument, NULL, 'A' }, + {"delete", no_argument, NULL, 'D' }, + {"clear", required_argument, NULL, 'C' }, {"tcp-ports", required_argument, NULL, 't' }, {"udp-ports", required_argument, NULL, 'u' }, {"tcp-ns", required_argument, NULL, 'T' }, @@ -305,9 +311,11 @@ int main(int argc, char **argv) {"show", no_argument, NULL, 's' }, { 0 }, }; + enum { MODE_CLEAR, MODE_ADD, MODE_DEL } mode = MODE_CLEAR;
MODE_CLEAR doesn't make sense to me. Unlike add or del, clear is a once-off operation, it's not clear to me how it would affect the interpretation of -[tTuU].
+ bool inbound_cleared = false, outbound_cleared = false; struct pif_configuration *inbound, *outbound; + const char *optstring = "dhADC:t:u:T:U:s"; struct sockaddr_un a = { AF_UNIX, "" }; - const char *optstring = "dht:u:T:U:s"; struct configuration conf = { 0 }; bool update = false, show = false; struct pesto_hello hello; @@ -339,11 +347,16 @@ int main(int argc, char **argv) case -1: case 0: break; + case 'C': case 't': case 'u': case 'T': case 'U': - /* Parse these options after we've read state from passt/pasta */ + case 'A': + case 'D': + /* Parse these options after we've read state from + * passt/pasta + */ update = true; break; case 's': @@ -439,13 +452,38 @@ int main(int argc, char **argv) optname = getopt_long(argc, argv, optstring, options, NULL);
switch (optname) { + case 'A': + mode = MODE_ADD; + break; + case 'D': + mode = MODE_DEL; + break; + case 'C': + if (!strcmp(optarg, "HOST")) { + fwd_rule_clear(&inbound->fwd); + inbound_cleared = true; + } else if (!strcmp(optarg, "SPLICE")) { + fwd_rule_clear(&outbound->fwd);
outbound will be NULL if talking to passt, so this could SEGV.
+ outbound_cleared = true; + } else { + die("Unsupported pif name %s", optarg); + }
For the time being pesto is limited to a single "inbound" and single "outbound" table, simply because we haven't devised syntax for anything else. However, we don't actually need that for --clear. Since it already takes a pif name we can use pif_conf_by_name() to clear an arbitrary named pif's rules.
+ + break; case 't': case 'u': if (!inbound) { die("Can't use -%c, no inbound interface", optname); } - fwd_rule_parse(optname, optarg, &inbound->fwd); + + if (mode == MODE_CLEAR && !inbound_cleared) { + fwd_rule_clear(&inbound->fwd); + inbound_cleared = true; + } + + fwd_rule_parse(optname, mode == MODE_DEL, optarg, + &inbound->fwd); break; case 'T': case 'U': @@ -453,7 +491,14 @@ int main(int argc, char **argv) die("Can't use -%c, no outbound interface", optname); } - fwd_rule_parse(optname, optarg, &outbound->fwd); + + if (mode == MODE_CLEAR && !outbound_cleared) { + fwd_rule_clear(&outbound->fwd); + outbound_cleared = true; + } + + fwd_rule_parse(optname, mode == MODE_DEL, optarg, + &inbound->fwd); break; default: continue; -- 2.43.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, May 06, 2026 at 01:47:00AM +0200, Stefano Brivio wrote:
Changes in v8: * Implement --add, --delete, and --clear in 19/19, to add forwarding rules instead of replacing tables, delete existing rules, and explicitly clear tables * Address Laurent's comments for 15/19 and 17/19 * In 10/19, instead of passing SOCK_NONBLOCK to accept4(), explicitly set O_NONBLOCK on the listening socket. Using SOCK_NONBLOCK doesn't do what we want, as it results in setting O_NONBLOCK on the new socket rather than on the listening one * Note: 18/19 is left as it is, I didn't address pending comments yet * Note: this doesn't include yet changes for AppArmor and SELinux policies, as well as changes for the template Fedora spec file. I'm still working on them
I haven't re-reviewed the whole series, but these changes all seem good, with the exception of 19/19 and a few concerns on 10/19 which I've sent separate mails about.
Changes in v7: * Addressed comments from Laurent in 6/18, 8/18, 9/18, 10/18, 11/18, 12/18, 14/18, 15/18 (details in commit messages of single patches, before my Signed-off-by) * Note: this doesn't include yet --add and --delete, I'm still working on that
Changes in v6: * Addressed comments from Jon in 10/18, 11/18, 14/18, and 16/18 * Dodged all warnings from static checkers (Coverity Scan and clang-tidy) with changes in 10/18, 11/18, 16/18, and with a new patch, 18/18 * This does *not* include yet the implementation of --add and --delete switches for pesto as I originally intended, I'm rather far from being done with those. At the moment I just have a "mode selection" implementation for command line parsing but merging rules to / removing rules from / clearing the current table is something I barely started (and what I have at the moment isn't really valuable anyway)
David wrote:
--- Here's the next draft of dynamic configuration updates. This now can successfully update rules, though I've not tested it very extensively.
Patches 1..8/18 are preliminary reworks that make sense even without pesto - feel free to apply if you're happy with them. I don't think the rest should be applied yet; we need to at least harden it so passt can't be blocked indefinitely by a client which sends a partial update then waits.
Based on my earlier series reworking static checking invocation.
TODO: - Don't allow a client which sends a partial configuration then blocks also block passt - Allow pesto to clear existing configuration, not just add - Allow pesto selectively delete existing rules, not just add
Changes in v5: * If multiple clients connect at once, they're now blocked until the first one finishes, instead of later ones being discarded Changes in v4: * Merged with remainder of forward rule parsing rework series * Fix some bugs in rule checking pointed out by Laurent * Significantly cleaned up option parsing code * Changed from replacing all existing rules to adding new rules (clear and remove still TBD) * Somewhat simplified protocol (pif names and rules sent in a single pass) * pesto is now allocation free * Fixed commit message and style nits pointed out by Stefano Changes in v3: * Removed already applied ASSERT() rename * Renamed serialisation functions * Incorporated Stefano's extensions, reworked and fixed * Several additional cleanups / preliminary reworks Changes in v2: * Removed already applied cleanups * Reworked assert() patch to handle -DNDEBUG properly * Numerous extra patches: * Factored out serialisation helpers and use them for migration as well * Reworked to allow ip.[ch] and inany.[ch] to be shared with pesto * Reworks to share some forwarding rule datatypes with pesto * Implemented sending pif names and current ruleset to pesto ---
David Gibson (17): conf, fwd: Stricter rule checking in fwd_rule_add() fwd_rule: Move ephemeral port probing to fwd_rule.c fwd, conf: Move rule parsing code to fwd_rule.[ch] fwd_rule: Move conflict checking back within fwd_rule_add() fwd: Generalise fwd_rules_info() pif: Limit pif names to 128 bytes fwd_rule: Fix some format specifiers pesto: Introduce stub configuration tool pesto, log: Share log.h (but not log.c) with pesto tool pesto, conf: Have pesto connect to passt and check versions pesto: Expose list of pifs to pesto and display them ip: Prepare ip.[ch] for sharing with pesto tool inany: Prepare inany.[ch] for sharing with pesto tool pesto: Read current ruleset from passt/pasta and optionally display it pesto: Parse and add new rules from command line pesto, conf: Send updated rules from pesto back to passt/pasta conf, fwd: Allow switching to new rules received from pesto
Stefano Brivio (2): fwd_rule: Fix static checkers warnings in fwd_rule_add() pesto, conf, fwd_rule: Add options and modes to add, delete, clear rules
.gitignore | 2 + Makefile | 53 ++-- common.h | 116 +++++++++ conf.c | 696 ++++++++++++++++++++++----------------------------- conf.h | 2 + epoll_type.h | 4 + flow.c | 4 +- fwd.c | 169 ++++--------- fwd.h | 41 +-- fwd_rule.c | 680 +++++++++++++++++++++++++++++++++++++++++++++++-- fwd_rule.h | 68 ++++- inany.c | 19 +- inany.h | 17 +- ip.c | 56 +---- ip.h | 4 +- lineread.c | 2 +- log.h | 53 +++- passt.1 | 5 + passt.c | 8 + passt.h | 8 + pesto.1 | 271 ++++++++++++++++++++ pesto.c | 520 ++++++++++++++++++++++++++++++++++++++ pesto.h | 54 ++++ pif.c | 2 +- pif.h | 7 +- serialise.c | 7 + serialise.h | 1 + siphash.h | 13 + tap.c | 52 ++++ util.h | 110 +------- 30 files changed, 2252 insertions(+), 792 deletions(-) create mode 100644 common.h create mode 100644 pesto.1 create mode 100644 pesto.c create mode 100644 pesto.h
-- 2.43.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On 5/6/26 07:38, David Gibson wrote:
On Wed, May 06, 2026 at 01:47:10AM +0200, Stefano Brivio wrote:
From: David Gibson
Start implementing pesto in earnest. Create a control/configuration socket in passt. Have pesto connect to it and retrieve a server greeting Perform some basic version checking.
Signed-off-by: David Gibson
[sbrivio: Avoid potential recursive calling between conf_accept() and conf_close(), reported by clang-tidy] Huh. For some reason that warning didn't trip for me. Although it's technically true they can mutually recurse, I believe they're both tail calls, so it shouldn't eat the stack.
[sbrivio: In conf(), check we're not exceeding sizeof(c->control_path) instead of sizeof(c->socket_path), and, in pesto's main(), print argv[optind] instead of argv[1] to indicate an invalid socket path, both reported by Jon Maloy] [sbrivio: In pesto's main(), drop unnecessary newline from error message, reported by Laurent] [sbrivio: Don't use SOCK_NONBLOCK on accept4(), as that only applies to the *new* file descriptor, which we don't want -- set O_NONBLOCK on the listening file descriptor using fcntl()]
Making the new (accepted) socket non-blocking was the intended behaviour here. We also want non-blocking for the listening socket, but that was already done in feab892c7 ("tap, repair: Use SOCK_NONBLOCK and SOCK_CLOEXEC on Unix sockets").
This patch has been dropped since v6 of the series. Thanks, Laurent
On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
This adds parsing of options using fwd_rule_parse(), validates them and adds them to the existing rules. It doesn't yet send those rules back to passt or pasta.
Message-ID: <20260322141843.4095972-3-sbrivio@redhat.com> [dwg: Based on an early draft by Stefano] Signed-off-by: David Gibson
[sbrivio: Recycled usage messages for -T and -U from conf.c as suggested by Laurent, dropped unrelated whitespace change] [sbrivio: Add description of -t, -u, -T, -U to pesto.1]
And -s, --show ?
Signed-off-by: Stefano Brivio
Reviewed-by: Laurent Vivier --- Makefile | 1 + fwd_rule.c | 2 +- fwd_rule.h | 1 + pesto.1 | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++++ pesto.c | 111 ++++++++++++++++++++++++++++++++++++++++++++-- 5 files changed, 237 insertions(+), 5 deletions(-)
On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
We can now receive updates to the forwarding rules from the pesto client and store them in a "pending" copy of the forwarding tables. Implement switching to using the new rules.
The logic is in a new fwd_listen_switch(). For now this closes all listening sockets related to the old tables, swaps the active and pending tables, then listens based on the new tables. In future we look to improve this so that we don't temporarily stop listening on ports that both the old and new tables specify.
Signed-off-by: David Gibson
[sbrivio: In fwd_listen_switch(), use the destination size as argument to memcpy(), instead of sizeof(tmp), as suggested by Laurent] Signed-off-by: Stefano Brivio
Reviewed-by: Laurent Vivier
--- conf.c | 5 ++--- fwd.c | 34 ++++++++++++++++++++++++++++++++++ fwd.h | 1 + 3 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/conf.c b/conf.c index 76344da..3f48793 100644 --- a/conf.c +++ b/conf.c @@ -2160,15 +2160,14 @@ void conf_handler(struct ctx *c, uint32_t events) fwd_rules_dump(info, fwd->rules, fwd->count, " ", ""); } + + fwd_listen_switch(c); }
if (events & EPOLLHUP) { debug("Configuration client hangup"); - goto close; }
- return; - close: conf_close(c);
diff --git a/fwd.c b/fwd.c index d93d2e5..0697435 100644 --- a/fwd.c +++ b/fwd.c @@ -534,6 +534,40 @@ int fwd_listen_init(const struct ctx *c) return 0; }
+/** + * fwd_listen_switch() - Switch from current to pending rules table + * @c: Execution context + */ +void fwd_listen_switch(struct ctx *c) +{ + struct fwd_table *tmp[PIF_NUM_TYPES]; + unsigned i; + + /* Stop listening on the old tables */ + for (i = 0; i < PIF_NUM_TYPES; i++) { + struct fwd_table *fwd = c->fwd[i]; + + if (!fwd) + continue; + + debug("Flushing %u old %s rules", fwd->count, pif_name(i)); + fwd_listen_close(fwd); + fwd->count = fwd->sock_count = 0; + } + + /* Swap active and pending tables */ + static_assert(sizeof(tmp) == sizeof(c->fwd) && + sizeof(tmp) == sizeof(c->fwd_pending), + "Temporary has wrong size"); + memcpy(&tmp, (void *)c->fwd, sizeof(tmp)); + memcpy((void *)c->fwd, (void *)c->fwd_pending, sizeof(c->fwd)); + memcpy((void *)c->fwd_pending, &tmp, sizeof(c->fwd_pending)); + + /* Start listening on the new tables */ + if (fwd_listen_init(c) < 0) + err("Error switching to new forwarding rules"); +} + /* See enum in kernel's include/net/tcp_states.h */ #define UDP_LISTEN 0x07 #define TCP_LISTEN 0x0a diff --git a/fwd.h b/fwd.h index ac24782..b60697d 100644 --- a/fwd.h +++ b/fwd.h @@ -61,6 +61,7 @@ int fwd_listen_sync(const struct ctx *c, uint8_t pif, const struct fwd_scan *tcp, const struct fwd_scan *udp); void fwd_listen_close(const struct fwd_table *fwd); int fwd_listen_init(const struct ctx *c); +void fwd_listen_switch(struct ctx *c);
bool nat_inbound(const struct ctx *c, const union inany_addr *addr, union inany_addr *translated);
On 5/6/26 01:47, Stefano Brivio wrote:
The new checks are actually sufficient but not enough for Coverity Scan. Now that fwd->sock_count and new->last are affected or supplied by clients, we need explicit (albeit redundant) checks on them.
Signed-off-by: Stefano Brivio
Reviewed-by: Laurent Vivier
--- fwd_rule.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/fwd_rule.c b/fwd_rule.c index b55e4df..03e8e80 100644 --- a/fwd_rule.c +++ b/fwd_rule.c @@ -271,13 +271,22 @@ int fwd_rule_add(struct fwd_table *fwd, const struct fwd_rule *new) warn("Too many rules (maximum %d)", ARRAY_SIZE(fwd->rules)); return -ENOSPC; } + if ((fwd->sock_count + num) > ARRAY_SIZE(fwd->socks)) { warn("Rules require too many listening sockets (maximum %d)", ARRAY_SIZE(fwd->socks)); return -ENOSPC; } + /* Redundant, to make static checkers happy */ + if (fwd->sock_count > ARRAY_SIZE(fwd->socks)) + return -ENOSPC;
fwd->rulesocks[fwd->count] = &fwd->socks[fwd->sock_count]; + + /* Redundant ('num' checked above), but not for static checkers */ + if (new->last > ARRAY_SIZE(fwd->socks) + new->first) + return -ENOSPC; + for (port = new->first; port <= new->last; port++) fwd->rulesocks[fwd->count][port - new->first] = -1;
On Wed, May 06, 2026 at 09:06:05AM +0200, Laurent Vivier wrote:
On 5/6/26 07:38, David Gibson wrote:
On Wed, May 06, 2026 at 01:47:10AM +0200, Stefano Brivio wrote:
From: David Gibson
Start implementing pesto in earnest. Create a control/configuration socket in passt. Have pesto connect to it and retrieve a server greeting Perform some basic version checking.
Signed-off-by: David Gibson
[sbrivio: Avoid potential recursive calling between conf_accept() and conf_close(), reported by clang-tidy] Huh. For some reason that warning didn't trip for me. Although it's technically true they can mutually recurse, I believe they're both tail calls, so it shouldn't eat the stack.
[sbrivio: In conf(), check we're not exceeding sizeof(c->control_path) instead of sizeof(c->socket_path), and, in pesto's main(), print argv[optind] instead of argv[1] to indicate an invalid socket path, both reported by Jon Maloy] [sbrivio: In pesto's main(), drop unnecessary newline from error message, reported by Laurent] [sbrivio: Don't use SOCK_NONBLOCK on accept4(), as that only applies to the *new* file descriptor, which we don't want -- set O_NONBLOCK on the listening file descriptor using fcntl()]
Making the new (accepted) socket non-blocking was the intended behaviour here. We also want non-blocking for the listening socket, but that was already done in feab892c7 ("tap, repair: Use SOCK_NONBLOCK and SOCK_CLOEXEC on Unix sockets").
This patch has been dropped since v6 of the series.
Oh, right. Sorry, I forgot it had been part of this series; thought it was already merged from an earlier series. What was the reason for dropping it? -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 6 May 2026 15:38:30 +1000
David Gibson
On Wed, May 06, 2026 at 01:47:10AM +0200, Stefano Brivio wrote:
From: David Gibson
Start implementing pesto in earnest. Create a control/configuration socket in passt. Have pesto connect to it and retrieve a server greeting Perform some basic version checking.
Signed-off-by: David Gibson
[sbrivio: Avoid potential recursive calling between conf_accept() and conf_close(), reported by clang-tidy] Huh. For some reason that warning didn't trip for me. Although it's technically true they can mutually recurse, I believe they're both tail calls, so it shouldn't eat the stack.
[sbrivio: In conf(), check we're not exceeding sizeof(c->control_path) instead of sizeof(c->socket_path), and, in pesto's main(), print argv[optind] instead of argv[1] to indicate an invalid socket path, both reported by Jon Maloy] [sbrivio: In pesto's main(), drop unnecessary newline from error message, reported by Laurent] [sbrivio: Don't use SOCK_NONBLOCK on accept4(), as that only applies to the *new* file descriptor, which we don't want -- set O_NONBLOCK on the listening file descriptor using fcntl()]
Making the new (accepted) socket non-blocking was the intended behaviour here. We also want non-blocking for the listening socket, but that was already done in feab892c7 ("tap, repair: Use SOCK_NONBLOCK and SOCK_CLOEXEC on Unix sockets").
Oops, now that Laurent mentioned it, I realised I dropped it accidentally while / after debugging things on v6, and:
WIth the current design, I guess we don't want non-blocking on the accepted socket, although I don't think it actually matters very much.
...this is the issue I was trying to fix: if the accepted socket is non-blocking, messages are cut short sometimes, and in general things don't work. I don't remember if this was while testing things on Fedora or Debian, it only happened in one of the two environments. So, while it was accidental (I really didn't leave any note for a cover letter, so I'm almost certain there was no other reason for me to drop it), I think it's actually a good idea to drop it for the following reasons: - O_NONBLOCK on the accepted socket breaks things - the rest looks correct to me but fairly out of scope, and I have very limited time for testing things in detail right now, so I'd rather keep that patch for later Without that patch, and my follow-up change to this patch, we're just adding two lines for this specific behaviour, instead of 18.
We will want non-blocking it when we change this to read out the updated rules incrementally, rather than all at once.
Right, so maybe we can keep that patch for that moment. Or even before, again, I think it's all correct and desirable, just out of scope (this isn't the reason why I dropped it though, I would have written a note). -- Stefano
On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
We can now receive updates to the forwarding rules from the pesto client and store them in a "pending" copy of the forwarding tables. Implement switching to using the new rules.
The logic is in a new fwd_listen_switch(). For now this closes all listening sockets related to the old tables, swaps the active and pending tables, then listens based on the new tables. In future we look to improve this so that we don't temporarily stop listening on ports that both the old and new tables specify.
Signed-off-by: David Gibson
[sbrivio: In fwd_listen_switch(), use the destination size as argument to memcpy(), instead of sizeof(tmp), as suggested by Laurent] Signed-off-by: Stefano Brivio --- conf.c | 5 ++--- fwd.c | 34 ++++++++++++++++++++++++++++++++++ fwd.h | 1 + 3 files changed, 37 insertions(+), 3 deletions(-) diff --git a/conf.c b/conf.c index 76344da..3f48793 100644 --- a/conf.c +++ b/conf.c @@ -2160,15 +2160,14 @@ void conf_handler(struct ctx *c, uint32_t events) fwd_rules_dump(info, fwd->rules, fwd->count, " ", ""); } + + fwd_listen_switch(c); }
if (events & EPOLLHUP) { debug("Configuration client hangup"); - goto close; }
- return; - close: conf_close(c);
diff --git a/fwd.c b/fwd.c index d93d2e5..0697435 100644 --- a/fwd.c +++ b/fwd.c @@ -534,6 +534,40 @@ int fwd_listen_init(const struct ctx *c) return 0; }
+/** + * fwd_listen_switch() - Switch from current to pending rules table + * @c: Execution context + */ +void fwd_listen_switch(struct ctx *c) +{ + struct fwd_table *tmp[PIF_NUM_TYPES]; + unsigned i; + + /* Stop listening on the old tables */ + for (i = 0; i < PIF_NUM_TYPES; i++) { + struct fwd_table *fwd = c->fwd[i]; + + if (!fwd) + continue; + + debug("Flushing %u old %s rules", fwd->count, pif_name(i)); + fwd_listen_close(fwd); + fwd->count = fwd->sock_count = 0; + } + + /* Swap active and pending tables */ + static_assert(sizeof(tmp) == sizeof(c->fwd) && + sizeof(tmp) == sizeof(c->fwd_pending), + "Temporary has wrong size");
At this point: c->fwd[PIF_HOST] = &fwd_in; c->fwd[PIF_SPLICE] = &fwd_out; c->fwd_pending[PIF_HOST] = &fwd_in_pending; c->fwd_pending[PIF_SPLICE] = &fwd_out_pending;
+ memcpy(&tmp, (void *)c->fwd, sizeof(tmp)); + memcpy((void *)c->fwd, (void *)c->fwd_pending, sizeof(c->fwd)); + memcpy((void *)c->fwd_pending, &tmp, sizeof(c->fwd_pending));
At this point: c->fwd[PIF_HOST] = &fwd_in_pending; c->fwd[PIF_SPLICE] = &fwd_out_pending; c->fwd_pending[PIF_HOST] = &fwd_in; c->fwd_pending[PIF_SPLICE] = &fwd_out; Perhaps it should be noted somewhere to avoid confusion in the future? Or to copy the content of the rules rather than the pointer to the rules? Thanks, Laurent
On Wed, May 06, 2026 at 09:55:32AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 15:38:30 +1000 David Gibson
wrote: On Wed, May 06, 2026 at 01:47:10AM +0200, Stefano Brivio wrote:
From: David Gibson
Start implementing pesto in earnest. Create a control/configuration socket in passt. Have pesto connect to it and retrieve a server greeting Perform some basic version checking.
Signed-off-by: David Gibson
[sbrivio: Avoid potential recursive calling between conf_accept() and conf_close(), reported by clang-tidy] Huh. For some reason that warning didn't trip for me. Although it's technically true they can mutually recurse, I believe they're both tail calls, so it shouldn't eat the stack.
[sbrivio: In conf(), check we're not exceeding sizeof(c->control_path) instead of sizeof(c->socket_path), and, in pesto's main(), print argv[optind] instead of argv[1] to indicate an invalid socket path, both reported by Jon Maloy] [sbrivio: In pesto's main(), drop unnecessary newline from error message, reported by Laurent] [sbrivio: Don't use SOCK_NONBLOCK on accept4(), as that only applies to the *new* file descriptor, which we don't want -- set O_NONBLOCK on the listening file descriptor using fcntl()]
Making the new (accepted) socket non-blocking was the intended behaviour here. We also want non-blocking for the listening socket, but that was already done in feab892c7 ("tap, repair: Use SOCK_NONBLOCK and SOCK_CLOEXEC on Unix sockets").
Oops, now that Laurent mentioned it, I realised I dropped it accidentally while / after debugging things on v6, and:
Ah, right.
WIth the current design, I guess we don't want non-blocking on the accepted socket, although I don't think it actually matters very much.
...this is the issue I was trying to fix: if the accepted socket is non-blocking, messages are cut short sometimes, and in general things don't work.
Hrm. I was pretty sure setting it blocking just meant you'd always get *some* data instead of EAGAIN. I don't believe it prevents either short reads or short writes. Both sides should already be using {read,write}_all_buf() to handle short read/writes, so I'm really not sure where it's going wrong if the accepted socket is non-blocking (other than maybe spinning on EAGAIN more than we'd like).
I don't remember if this was while testing things on Fedora or Debian, it only happened in one of the two environments.
So, while it was accidental (I really didn't leave any note for a cover letter, so I'm almost certain there was no other reason for me to drop it), I think it's actually a good idea to drop it for the following reasons:
- O_NONBLOCK on the accepted socket breaks things
The earlier patch doesn't affect O_NONBLOCK on the accepted socket, only on the listening socket (it's not inherited).
- the rest looks correct to me but fairly out of scope, and I have very limited time for testing things in detail right now, so I'd rather keep that patch for later
It's in scope because O_NONBLOCK on the listening socket is essential to implementing the "concurrent client blocks" instead of "concurrent client is rejected" behaviour. Without O_NONBLOCK on the listening socket, we can't safely call conf_accept() anywhere other than in response to the epoll event - because looking for additional connections after we close one could block.
Without that patch, and my follow-up change to this patch, we're just adding two lines for this specific behaviour, instead of 18.
We will want non-blocking it when we change this to read out the updated rules incrementally, rather than all at once.
Right, so maybe we can keep that patch for that moment. Or even before,
"that patch" meaning the sock_unix() one? Again, that affects the listening socket behaviour. What we need for incrementally reading the rules is about the accepted socket behaviour.
again, I think it's all correct and desirable, just out of scope (this isn't the reason why I dropped it though, I would have written a note).
-- Stefano
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 6 May 2026 16:45:27 +1000
David Gibson
On Wed, May 06, 2026 at 01:47:19AM +0200, Stefano Brivio wrote:
Instead of just being able to replace the existing forwarding table,
As of my last version, we already added, rather than replacing.
Right, I noticed that, but this isn't the default behaviour we discussed, so I assumed it was accidental, and planned to go back and check the reason why. Given that it wasn't accidental, I'll simply adjust this part of the commit message.
implement --add and --delete options to maintain the table and add or delete specific ports.
The option --clear PIF forces the clearing of a table, instead.
These options can be combined arbitrarily and are handled as sequential commands, as now described in pesto(1).
If no option is given before forwarding specifiers for a matching table, the command line is interpreted as a replacement of the existing rules.
To this end:
- there's no protocol change, as pesto is anyway sending updated copies of the table
- the forwarding table functions now include a new fwd_rule_del(), which deletes existing rule only if a matching one is found
- a trivial fwd_rule_clear() is factored out from the existing conf_handler() implementation, so that it can be directly used in pesto
The entry points for parsing of port specifiers now take an additional 'del' parameter which is passed down all the way before reaching the fwd_rule_add() implementation. If a rule should be deleted, at that point, fwd_rule_del() is called instead.
Signed-off-by: Stefano Brivio
--- conf.c | 26 ++++++---------- fwd_rule.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++------- fwd_rule.h | 4 ++- pesto.1 | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++ pesto.c | 53 ++++++++++++++++++++++++++++--- 5 files changed, 227 insertions(+), 32 deletions(-) diff --git a/conf.c b/conf.c index 3f48793..909c34c 100644 --- a/conf.c +++ b/conf.c @@ -1849,16 +1849,16 @@ void conf(struct ctx *c, int argc, char **argv)
if (name == 't') { opt_t = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'u') { opt_u = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'T') { opt_T = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } else if (name == 'U') { opt_U = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } } while (name != -1);
@@ -1910,13 +1910,13 @@ void conf(struct ctx *c, int argc, char **argv)
if (c->mode == MODE_PASTA) { if (!opt_t) - fwd_rule_parse('t', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('t', false, "auto", c->fwd[PIF_HOST]); if (!opt_T) - fwd_rule_parse('T', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('T', false, "auto", c->fwd[PIF_SPLICE]); if (!opt_u) - fwd_rule_parse('u', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('u', false, "auto", c->fwd[PIF_HOST]); if (!opt_U) - fwd_rule_parse('U', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('U', false, "auto", c->fwd[PIF_SPLICE]); }
conf_sock_listen(c); @@ -2135,14 +2135,8 @@ void conf_handler(struct ctx *c, uint32_t events) unsigned pif;
/* Clear pending tables */ - for (pif = 0; pif < PIF_NUM_TYPES; pif++) { - struct fwd_table *fwd = c->fwd_pending[pif]; - - if (!fwd) - continue; - fwd->count = 0; - fwd->sock_count = 0; - } + for (pif = 0; pif < PIF_NUM_TYPES; pif++) + fwd_rule_clear(c->fwd_pending[pif]);
/* FIXME: this could block indefinitely if the client doesn't * write as much as it should diff --git a/fwd_rule.c b/fwd_rule.c index 03e8e80..eb9a601 100644 --- a/fwd_rule.c +++ b/fwd_rule.c @@ -180,6 +180,66 @@ static bool fwd_rule_conflicts(const struct fwd_rule *a, const struct fwd_rule * return true; }
+/** + * fwd_rule_clear() - Clear a forwarding table + * @fwd: Table to clear (might be NULL) + */ +void fwd_rule_clear(struct fwd_table *fwd) +{ + if (!fwd) + return; +
Not essential, but I wonder if it would be wise to verify that there are no currently open sockets associated with any of the rules.
With a loop, I suppose. I can add it as a TODO comment because I guess it would be good to handle that case (open sockets left) for fwd_rule_del() as well, and a part of the implementation can probably be common.
+ fwd->count = 0; + fwd->sock_count = 0; +} + +/** + * fwd_rule_del() - Partially validate and delete a rule from a forwarding table + * @fwd: Table to delete from + * @rule: Rule to delete (must match an existing rule) + * + * Return: 0 on success, negative error code on failure (-ENOENT if not found) + * + * NOTE: This function can't be used for a forwarding table with valid sockets + * stored in fwd->rulesocks.
The exact meaning of this isn't very clear to me. Does "valid" mean "open" or something else?
It means valid at some point, not necessarily open right now. I'll change it to "open" for clarity.
I think what you're getting at is that every entry in fwd->socks[] must be -1. Or at least every entry with index in [0,sock_count)
Yes.
+ */ +static int fwd_rule_del(struct fwd_table *fwd, const struct fwd_rule *rule) +{ + unsigned num, i; + + for (i = 0; i < fwd->count; i++) { + if (fwd_rule_conflicts(rule, &fwd->rules[i])) + break; + }
So, this deletes any conflicting rule, not only exact matches. That's not very clear from the description of @rule.
It deletes the first one (but given that fwd_rule_conflicts() is called on insertion, there should be a single one). It's good enough for our purposes right now, even though we might want to make that more sophisticated in the future. I'll change the description of @rule.
+ + if (i == fwd->count) { + char newstr[FWD_RULE_STRLEN]; + + warn("Couldn't find forwarding rule to delete: %s", + fwd_rule_fmt(rule, newstr, sizeof(newstr))); + return -ENOENT; + } + + /* Don't use anything else from 'rule' as passed, it's not validated */ + rule = &fwd->rules[i]; + num = (unsigned)rule->last - rule->first + 1; + + fwd->count--; + + memmove((void *)(fwd->rulesocks + i), (void *)(fwd->rulesocks + i + 1),
I don't think the explicit (void *) casts are necessary - they should be implicit from memmove()s signature.
They aren't from a C point of view, but clang-tidy reports a bugprone-multi-level-implicit-pointer-conversion warning if I don't do that, because it's not obvious that we want the same kind of cast as the implicit one.
+ (fwd->count - i) * sizeof(*fwd->rulesocks));
Is memmove() guaranteed to be safe if given a zero length? That will occur if deleting the last entry.
In practice, this is quite commonly done in the Linux kernel and elsewhere and I never hit issues, against different C libraries. I'm actually not sure what issue I should get, C11 says that it "copies n characters from the object pointed to by s2 into the object pointed to by s1", and if "n" is 0, nothing is done. There's no specific mention of that in C11, but I don't think there needs to be one, unlike with realloc() with a zero size and possibly others problematic cases. So, yes, as far as I know it's "safe", but with no explicit guarantee (and I don't think any is needed).
+ /* TODO: move sockets stored starting from fwd->rulesocks[i + i], should + * we ever need to delete rules from a table with open sockets. + */ + fwd->sock_count -= num; + + memmove(fwd->rules + i, fwd->rules + i + 1, + (fwd->count - i) * sizeof(*fwd->rules)
Again, is this safe if i == fwd->count?
Same as above.
+ + return 0; +} + /** * fwd_rule_add() - Validate and add a rule to a forwarding table * @fwd: Table to add to @@ -368,6 +428,7 @@ static int parse_keyword(const char *s, const char **endptr, const char *kw) * fwd_rule_range_except() - Set up forwarding for a range of ports minus a * bitmap of exclusions * @fwd: Forwarding table to be updated + * @del: Delete resulting rules from forwarding table, instead of adding
Clunky, but it gets the job done.
* @proto: Protocol to forward * @addr: Listening address * @ifname: Listening interface @@ -377,8 +438,8 @@ static int parse_keyword(const char *s, const char **endptr, const char *kw) * @to: Port to translate @first to when forwarding * @flags: Flags for forwarding entries */ -static void fwd_rule_range_except(struct fwd_table *fwd, uint8_t proto, - const union inany_addr *addr, +static void fwd_rule_range_except(struct fwd_table *fwd, bool del, + uint8_t proto, const union inany_addr *addr, const char *ifname, uint16_t first, uint16_t last, const uint8_t *exclude, uint16_t to, @@ -418,8 +479,13 @@ static void fwd_rule_range_except(struct fwd_table *fwd, uint8_t proto, rule.last = i - 1; rule.to = base + delta;
- if (fwd_rule_add(fwd, &rule) < 0) - goto fail; + if (del) { + if (fwd_rule_del(fwd, &rule) < 0) + goto fail; + } else { + if (fwd_rule_add(fwd, &rule) < 0) + goto fail; + }
base = i - 1; } @@ -445,12 +511,13 @@ fail: /** * fwd_rule_parse_ports() - Parse port range(s) specifier * @fwd: Forwarding table to be updated + * @del: Delete resulting rules from forwarding table, instead of adding * @proto: Protocol to forward * @addr: Listening address for forwarding * @ifname: Interface name for listening * @spec: Port range(s) specifier */ -static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, +static void fwd_rule_parse_ports(struct fwd_table *fwd, bool del, uint8_t proto, const union inany_addr *addr, const char *ifname, const char *spec) @@ -507,7 +574,7 @@ static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, /* Exclude ephemeral ports */ fwd_port_map_ephemeral(exclude);
- fwd_rule_range_except(fwd, proto, addr, ifname, + fwd_rule_range_except(fwd, del, proto, addr, ifname, 1, NUM_PORTS - 1, exclude, 1, flags | FWD_WEAK); return; @@ -537,7 +604,7 @@ static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, if (p != ep) /* Garbage after the ranges */ goto bad;
- fwd_rule_range_except(fwd, proto, addr, ifname, + fwd_rule_range_except(fwd, del, proto, addr, ifname, orig_range.first, orig_range.last, exclude, mapped_range.first, flags); @@ -551,10 +618,12 @@ bad: /** * fwd_rule_parse() - Parse port configuration option * @optname: Short option name, t, T, u, or U + * @del: Delete resulting rules from forwarding table, instead of adding * @optarg: Option argument (port specification) * @fwd: Forwarding table to be updated */ -void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) +void fwd_rule_parse(char optname, bool del, const char *optarg, + struct fwd_table *fwd) { union inany_addr addr_buf = inany_any6, *addr = &addr_buf; char buf[BUFSIZ], *spec, *ifname = NULL; @@ -632,12 +701,12 @@ void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) optname, optarg);
if (fwd->caps & FWD_CAP_IPV4) { - fwd_rule_parse_ports(fwd, proto, + fwd_rule_parse_ports(fwd, del, proto, &inany_loopback4, NULL, spec); } if (fwd->caps & FWD_CAP_IPV6) { - fwd_rule_parse_ports(fwd, proto, + fwd_rule_parse_ports(fwd, del, proto, &inany_loopback6, NULL, spec); } @@ -653,7 +722,7 @@ void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) optname, optarg); }
- fwd_rule_parse_ports(fwd, proto, addr, ifname, spec); + fwd_rule_parse_ports(fwd, del, proto, addr, ifname, spec); }
/** diff --git a/fwd_rule.h b/fwd_rule.h index f43b37d..ae9a3cb 100644 --- a/fwd_rule.h +++ b/fwd_rule.h @@ -100,9 +100,11 @@ void fwd_probe_ephemeral(void);
const union inany_addr *fwd_rule_addr(const struct fwd_rule *rule); const char *fwd_rule_fmt(const struct fwd_rule *rule, char *dst, size_t size); -void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd); +void fwd_rule_parse(char optname, bool del, const char *optarg, + struct fwd_table *fwd); int fwd_rule_read(int fd, struct fwd_rule *rule); int fwd_rule_write(int fd, const struct fwd_rule *rule); +void fwd_rule_clear(struct fwd_table *fwd); int fwd_rule_add(struct fwd_table *fwd, const struct fwd_rule *new);
/** diff --git a/pesto.1 b/pesto.1 index 1ea1d12..cd0f3dc 100644 --- a/pesto.1 +++ b/pesto.1 @@ -31,6 +31,42 @@ Be verbose. .BR \-h ", " \-\-help Display a help message and exit.
+.TP +.BR \-A ", " \-\-add +Add the port forwarding specifiers following this option to the current +forwarding table, rather than replacing it. + +This option can be given multiple times, as it might follow previous deletions +(see \fB--delete\fR below), and implies that all the specifiers following it, +before a further \fB--delete\fR option occurs, will be handled as additions. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + +.TP +.BR \-D ", " \-\-delete +Delete the port forwarding specifiers following this option from the current +forwarding table, rather than adding them it. + +This option can be given multiple times, as it might follow previous additions +(see \fB--add\fR above), and implies that all the specifiers following it, +before a further \fB--add\fR option occurs, will be handled as deletions. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + +.TP +.BR \-C ", " \-\-clear " " \fIpif +Clear the forwarding table associated to a given \fIpif\fR, that is, a +conceptual type of interface in \fBpasst\fR(1) or \fBpasta\fR(1) representing a +specific data path and direction. + +The available \fIpif\fR names can be obtained by querying the current forwarding +configuration, which can be done by calling \fBpesto\fR(1) without options. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + .TP .BR \-t ", " \-\-tcp-ports " " \fIspec Configure TCP port forwarding to guest or namespace. \fIspec\fR can be one of: @@ -162,6 +198,55 @@ Configure UDP port forwarding from target namespace to init namespace. .BR \-\-version Show version and exit.
+.SH NOTES + +.SS Adding, deleting, clearing rules + +The options \fB--add\fR, \fB--delete\fR, and \fB--clear\fR are handled as +sequential commands to manipulate the current forwarding tables. If none of them +is given, forwarding specifiers for a given table are intended as replacement of +the corresponding table. That is:
I thought we wanted to default to add, rather than replace. That seems both a little simpler to implement and agruably more likely to be what peopke want.
We previously discussed we would default to replace for brevity, because otherwise users would need to explicitly clear tables, but also because it's consistent with: - the idea that we replace the current table - the existing command line syntax that just "programs" a new table. That is, the cost of typing: -A -t 22 is marginally higher compared to: -t 22 but: -C HOST -t 22 in case a replacement is desired looks substantially more to me. It's really not that important at the moment as Podman will anyway explicitly pass options. Of course it's not a great idea to change this later for non-automated users, but in any case I think the reason I explained above (and we discussed in the past) are still valid.
+.nf + pesto -t 1024 -U 1025 +.fi + +will \fBreplace\fR the current TCP inbound port forwarding table with a single +rule, forwarding port 1024, and will similarly replace the UDP outbound +forwarding table with a single forwarding rule for port 1025. This usage is a +short-hand form for: + +.nf + pesto -C HOST -t 1024 -C SPLICE -U 1025 +.fi + +The options \fB--add\fR and \fB--delete\fR are used to \fBadd new specific +rules or delete existing ones\fR, instead of replacing tables. For example: + +.nf + pesto -A -t 2000 -D -t 3000 -U 5000 +.fi + +will add a forwarding rule for inbound TCP port 2000, and delete inbound TCP +port 3000 as well as outbound UDP port 5000 from the existing set of rules. + +All these options are interpreted as sequential commands and can be arbitrarily +combined. For example: + +.nf + pesto -A -t 2000 -C HOST -A -T 3000 -t 2001 -D -u 5000 +.fi + +will, in order: + +.RS +- add inbound TCP port 2000 +- clear inbound ports, reverting the addition above +- add outbound TCP port 3000 +- add inbound TCP port 2001 +- delete inbound UDP port 5000 +.RE + .SH AUTHORS
Stefano Brivio
, diff --git a/pesto.c b/pesto.c index 73fdc39..143d4c6 100644 --- a/pesto.c +++ b/pesto.c @@ -55,6 +55,9 @@ static void usage(const char *name, FILE *f, int status) FPRINTF(f, "Usage: %s [OPTION]... PATH\n", name); FPRINTF(f, "\n" + " -A, --add Add following specifiers to forwards\n" + " -D, --delete Delete following specifiers instead\n" + " -C, --clear PIF Clear forwarding table for given PIF\n" " -t, --tcp-ports SPEC TCP inbound port forwarding\n" " can be specified multiple times\n" " SPEC can be:\n" @@ -298,6 +301,9 @@ int main(int argc, char **argv) {"debug", no_argument, NULL, 'd' }, {"help", no_argument, NULL, 'h' }, {"version", no_argument, NULL, 1 }, + {"add", no_argument, NULL, 'A' }, + {"delete", no_argument, NULL, 'D' }, + {"clear", required_argument, NULL, 'C' }, {"tcp-ports", required_argument, NULL, 't' }, {"udp-ports", required_argument, NULL, 'u' }, {"tcp-ns", required_argument, NULL, 'T' }, @@ -305,9 +311,11 @@ int main(int argc, char **argv) {"show", no_argument, NULL, 's' }, { 0 }, }; + enum { MODE_CLEAR, MODE_ADD, MODE_DEL } mode = MODE_CLEAR; MODE_CLEAR doesn't make sense to me. Unlike add or del, clear is a once-off operation, it's not clear to me how it would affect the interpretation of -[tTuU].
It needs to be a mode, and the default one, to make sure we default to replacing tables for a given direction of forwarding. It's not exactly one-off as it's a starting mode until -A or -D are given. The interpretation is as described in the man page.
+ bool inbound_cleared = false, outbound_cleared = false; struct pif_configuration *inbound, *outbound; + const char *optstring = "dhADC:t:u:T:U:s"; struct sockaddr_un a = { AF_UNIX, "" }; - const char *optstring = "dht:u:T:U:s"; struct configuration conf = { 0 }; bool update = false, show = false; struct pesto_hello hello; @@ -339,11 +347,16 @@ int main(int argc, char **argv) case -1: case 0: break; + case 'C': case 't': case 'u': case 'T': case 'U': - /* Parse these options after we've read state from passt/pasta */ + case 'A': + case 'D': + /* Parse these options after we've read state from + * passt/pasta + */ update = true; break; case 's': @@ -439,13 +452,38 @@ int main(int argc, char **argv) optname = getopt_long(argc, argv, optstring, options, NULL);
switch (optname) { + case 'A': + mode = MODE_ADD; + break; + case 'D': + mode = MODE_DEL; + break; + case 'C': + if (!strcmp(optarg, "HOST")) { + fwd_rule_clear(&inbound->fwd); + inbound_cleared = true; + } else if (!strcmp(optarg, "SPLICE")) { + fwd_rule_clear(&outbound->fwd);
outbound will be NULL if talking to passt, so this could SEGV.
Ah, right, nice catch, I'll fix that in v9.
+ outbound_cleared = true; + } else { + die("Unsupported pif name %s", optarg); + }
For the time being pesto is limited to a single "inbound" and single "outbound" table, simply because we haven't devised syntax for anything else. However, we don't actually need that for --clear. Since it already takes a pif name we can use pif_conf_by_name() to clear an arbitrary named pif's rules.
I'll change that in v9, assuming it works (I haven't tried yet).
+ + break; case 't': case 'u': if (!inbound) { die("Can't use -%c, no inbound interface", optname); } - fwd_rule_parse(optname, optarg, &inbound->fwd); + + if (mode == MODE_CLEAR && !inbound_cleared) { + fwd_rule_clear(&inbound->fwd); + inbound_cleared = true; + } + + fwd_rule_parse(optname, mode == MODE_DEL, optarg, + &inbound->fwd); break; case 'T': case 'U': @@ -453,7 +491,14 @@ int main(int argc, char **argv) die("Can't use -%c, no outbound interface", optname); } - fwd_rule_parse(optname, optarg, &outbound->fwd); + + if (mode == MODE_CLEAR && !outbound_cleared) { + fwd_rule_clear(&outbound->fwd); + outbound_cleared = true; + } + + fwd_rule_parse(optname, mode == MODE_DEL, optarg, + &inbound->fwd); break; default: continue; -- 2.43.0
-- Stefano
On Wed, May 06, 2026 at 10:12:21AM +0200, Laurent Vivier wrote:
On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
We can now receive updates to the forwarding rules from the pesto client and store them in a "pending" copy of the forwarding tables. Implement switching to using the new rules.
The logic is in a new fwd_listen_switch(). For now this closes all listening sockets related to the old tables, swaps the active and pending tables, then listens based on the new tables. In future we look to improve this so that we don't temporarily stop listening on ports that both the old and new tables specify.
Signed-off-by: David Gibson
[sbrivio: In fwd_listen_switch(), use the destination size as argument to memcpy(), instead of sizeof(tmp), as suggested by Laurent] Signed-off-by: Stefano Brivio --- conf.c | 5 ++--- fwd.c | 34 ++++++++++++++++++++++++++++++++++ fwd.h | 1 + 3 files changed, 37 insertions(+), 3 deletions(-) diff --git a/conf.c b/conf.c index 76344da..3f48793 100644 --- a/conf.c +++ b/conf.c @@ -2160,15 +2160,14 @@ void conf_handler(struct ctx *c, uint32_t events) fwd_rules_dump(info, fwd->rules, fwd->count, " ", ""); } + + fwd_listen_switch(c); } if (events & EPOLLHUP) { debug("Configuration client hangup"); - goto close; } - return; - close: conf_close(c); diff --git a/fwd.c b/fwd.c index d93d2e5..0697435 100644 --- a/fwd.c +++ b/fwd.c @@ -534,6 +534,40 @@ int fwd_listen_init(const struct ctx *c) return 0; } +/** + * fwd_listen_switch() - Switch from current to pending rules table + * @c: Execution context + */ +void fwd_listen_switch(struct ctx *c) +{ + struct fwd_table *tmp[PIF_NUM_TYPES]; + unsigned i; + + /* Stop listening on the old tables */ + for (i = 0; i < PIF_NUM_TYPES; i++) { + struct fwd_table *fwd = c->fwd[i]; + + if (!fwd) + continue; + + debug("Flushing %u old %s rules", fwd->count, pif_name(i)); + fwd_listen_close(fwd); + fwd->count = fwd->sock_count = 0; + } + + /* Swap active and pending tables */ + static_assert(sizeof(tmp) == sizeof(c->fwd) && + sizeof(tmp) == sizeof(c->fwd_pending), + "Temporary has wrong size");
At this point:
c->fwd[PIF_HOST] = &fwd_in; c->fwd[PIF_SPLICE] = &fwd_out;
c->fwd_pending[PIF_HOST] = &fwd_in_pending; c->fwd_pending[PIF_SPLICE] = &fwd_out_pending;
+ memcpy(&tmp, (void *)c->fwd, sizeof(tmp)); + memcpy((void *)c->fwd, (void *)c->fwd_pending, sizeof(c->fwd)); + memcpy((void *)c->fwd_pending, &tmp, sizeof(c->fwd_pending));
At this point:
c->fwd[PIF_HOST] = &fwd_in_pending; c->fwd[PIF_SPLICE] = &fwd_out_pending;
c->fwd_pending[PIF_HOST] = &fwd_in; c->fwd_pending[PIF_SPLICE] = &fwd_out;
Perhaps it should be noted somewhere to avoid confusion in the future? Or to copy the content of the rules rather than the pointer to the rules?
Ah, good point. The problem is the names of the globals - it's not really one "real" and one "pending" table, just two tables that switch between those roles. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 6 May 2026 18:21:32 +1000
David Gibson
On Wed, May 06, 2026 at 09:55:32AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 15:38:30 +1000 David Gibson
wrote: On Wed, May 06, 2026 at 01:47:10AM +0200, Stefano Brivio wrote:
From: David Gibson
Start implementing pesto in earnest. Create a control/configuration socket in passt. Have pesto connect to it and retrieve a server greeting Perform some basic version checking.
Signed-off-by: David Gibson
[sbrivio: Avoid potential recursive calling between conf_accept() and conf_close(), reported by clang-tidy] Huh. For some reason that warning didn't trip for me. Although it's technically true they can mutually recurse, I believe they're both tail calls, so it shouldn't eat the stack.
[sbrivio: In conf(), check we're not exceeding sizeof(c->control_path) instead of sizeof(c->socket_path), and, in pesto's main(), print argv[optind] instead of argv[1] to indicate an invalid socket path, both reported by Jon Maloy] [sbrivio: In pesto's main(), drop unnecessary newline from error message, reported by Laurent] [sbrivio: Don't use SOCK_NONBLOCK on accept4(), as that only applies to the *new* file descriptor, which we don't want -- set O_NONBLOCK on the listening file descriptor using fcntl()]
Making the new (accepted) socket non-blocking was the intended behaviour here. We also want non-blocking for the listening socket, but that was already done in feab892c7 ("tap, repair: Use SOCK_NONBLOCK and SOCK_CLOEXEC on Unix sockets").
Oops, now that Laurent mentioned it, I realised I dropped it accidentally while / after debugging things on v6, and:
Ah, right.
WIth the current design, I guess we don't want non-blocking on the accepted socket, although I don't think it actually matters very much.
...this is the issue I was trying to fix: if the accepted socket is non-blocking, messages are cut short sometimes, and in general things don't work.
Hrm. I was pretty sure setting it blocking just meant you'd always get *some* data instead of EAGAIN. I don't believe it prevents either short reads or short writes.
Maybe, but something caused actual problems for me, otherwise I wouldn't have played with it at all. The current behaviour with this patch is something I tested quite heavily by now.
Both sides should already be using {read,write}_all_buf() to handle short read/writes, so I'm really not sure where it's going wrong if the accepted socket is non-blocking (other than maybe spinning on EAGAIN more than we'd like).
I don't remember if this was while testing things on Fedora or Debian, it only happened in one of the two environments.
So, while it was accidental (I really didn't leave any note for a cover letter, so I'm almost certain there was no other reason for me to drop it), I think it's actually a good idea to drop it for the following reasons:
- O_NONBLOCK on the accepted socket breaks things
The earlier patch doesn't affect O_NONBLOCK on the accepted socket, only on the listening socket (it's not inherited).
- the rest looks correct to me but fairly out of scope, and I have very limited time for testing things in detail right now, so I'd rather keep that patch for later
It's in scope because O_NONBLOCK on the listening socket is essential
This can be implemented in two lines like the current version does. The rest is out of scope.
to implementing the "concurrent client blocks" instead of "concurrent client is rejected" behaviour. Without O_NONBLOCK on the listening socket, we can't safely call conf_accept() anywhere other than in response to the epoll event - because looking for additional connections after we close one could block.
Without that patch, and my follow-up change to this patch, we're just adding two lines for this specific behaviour, instead of 18.
We will want non-blocking it when we change this to read out the updated rules incrementally, rather than all at once.
Right, so maybe we can keep that patch for that moment. Or even before,
"that patch" meaning the sock_unix() one? Again, that affects the listening socket behaviour. What we need for incrementally reading the rules is about the accepted socket behaviour.
Yes. That doesn't just affect the listening socket, it affects a bunch of things, and they can all be checked and revisited, more conveniently, together, at a later point. -- Stefano
On Wed, 6 May 2026 10:12:21 +0200
Laurent Vivier
On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
We can now receive updates to the forwarding rules from the pesto client and store them in a "pending" copy of the forwarding tables. Implement switching to using the new rules.
The logic is in a new fwd_listen_switch(). For now this closes all listening sockets related to the old tables, swaps the active and pending tables, then listens based on the new tables. In future we look to improve this so that we don't temporarily stop listening on ports that both the old and new tables specify.
Signed-off-by: David Gibson
[sbrivio: In fwd_listen_switch(), use the destination size as argument to memcpy(), instead of sizeof(tmp), as suggested by Laurent] Signed-off-by: Stefano Brivio --- conf.c | 5 ++--- fwd.c | 34 ++++++++++++++++++++++++++++++++++ fwd.h | 1 + 3 files changed, 37 insertions(+), 3 deletions(-) diff --git a/conf.c b/conf.c index 76344da..3f48793 100644 --- a/conf.c +++ b/conf.c @@ -2160,15 +2160,14 @@ void conf_handler(struct ctx *c, uint32_t events) fwd_rules_dump(info, fwd->rules, fwd->count, " ", ""); } + + fwd_listen_switch(c); }
if (events & EPOLLHUP) { debug("Configuration client hangup"); - goto close; }
- return; - close: conf_close(c);
diff --git a/fwd.c b/fwd.c index d93d2e5..0697435 100644 --- a/fwd.c +++ b/fwd.c @@ -534,6 +534,40 @@ int fwd_listen_init(const struct ctx *c) return 0; }
+/** + * fwd_listen_switch() - Switch from current to pending rules table + * @c: Execution context + */ +void fwd_listen_switch(struct ctx *c) +{ + struct fwd_table *tmp[PIF_NUM_TYPES]; + unsigned i; + + /* Stop listening on the old tables */ + for (i = 0; i < PIF_NUM_TYPES; i++) { + struct fwd_table *fwd = c->fwd[i]; + + if (!fwd) + continue; + + debug("Flushing %u old %s rules", fwd->count, pif_name(i)); + fwd_listen_close(fwd); + fwd->count = fwd->sock_count = 0; + } + + /* Swap active and pending tables */ + static_assert(sizeof(tmp) == sizeof(c->fwd) && + sizeof(tmp) == sizeof(c->fwd_pending), + "Temporary has wrong size");
At this point:
c->fwd[PIF_HOST] = &fwd_in; c->fwd[PIF_SPLICE] = &fwd_out;
c->fwd_pending[PIF_HOST] = &fwd_in_pending; c->fwd_pending[PIF_SPLICE] = &fwd_out_pending;
+ memcpy(&tmp, (void *)c->fwd, sizeof(tmp)); + memcpy((void *)c->fwd, (void *)c->fwd_pending, sizeof(c->fwd)); + memcpy((void *)c->fwd_pending, &tmp, sizeof(c->fwd_pending));
At this point:
c->fwd[PIF_HOST] = &fwd_in_pending; c->fwd[PIF_SPLICE] = &fwd_out_pending;
c->fwd_pending[PIF_HOST] = &fwd_in; c->fwd_pending[PIF_SPLICE] = &fwd_out;
Yeah, makes sense, I can change that in v9.
Perhaps it should be noted somewhere to avoid confusion in the future?
What do you think should be noted exactly, and where? Can you show a practical example of the change you're proposing?
Or to copy the content of the rules rather than the pointer to the rules?
Thanks, Laurent
-- Stefano
On Wed, May 06, 2026 at 10:22:20AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 16:45:27 +1000 David Gibson
wrote: On Wed, May 06, 2026 at 01:47:19AM +0200, Stefano Brivio wrote:
Instead of just being able to replace the existing forwarding table,
As of my last version, we already added, rather than replacing.
Right, I noticed that, but this isn't the default behaviour we discussed, so I assumed it was accidental, and planned to go back and check the reason why.
Given that it wasn't accidental, I'll simply adjust this part of the commit message.
implement --add and --delete options to maintain the table and add or delete specific ports.
The option --clear PIF forces the clearing of a table, instead.
These options can be combined arbitrarily and are handled as sequential commands, as now described in pesto(1).
If no option is given before forwarding specifiers for a matching table, the command line is interpreted as a replacement of the existing rules.
To this end:
- there's no protocol change, as pesto is anyway sending updated copies of the table
- the forwarding table functions now include a new fwd_rule_del(), which deletes existing rule only if a matching one is found
- a trivial fwd_rule_clear() is factored out from the existing conf_handler() implementation, so that it can be directly used in pesto
The entry points for parsing of port specifiers now take an additional 'del' parameter which is passed down all the way before reaching the fwd_rule_add() implementation. If a rule should be deleted, at that point, fwd_rule_del() is called instead.
Signed-off-by: Stefano Brivio
--- conf.c | 26 ++++++---------- fwd_rule.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++------- fwd_rule.h | 4 ++- pesto.1 | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++ pesto.c | 53 ++++++++++++++++++++++++++++--- 5 files changed, 227 insertions(+), 32 deletions(-) diff --git a/conf.c b/conf.c index 3f48793..909c34c 100644 --- a/conf.c +++ b/conf.c @@ -1849,16 +1849,16 @@ void conf(struct ctx *c, int argc, char **argv)
if (name == 't') { opt_t = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'u') { opt_u = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'T') { opt_T = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } else if (name == 'U') { opt_U = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } } while (name != -1);
@@ -1910,13 +1910,13 @@ void conf(struct ctx *c, int argc, char **argv)
if (c->mode == MODE_PASTA) { if (!opt_t) - fwd_rule_parse('t', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('t', false, "auto", c->fwd[PIF_HOST]); if (!opt_T) - fwd_rule_parse('T', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('T', false, "auto", c->fwd[PIF_SPLICE]); if (!opt_u) - fwd_rule_parse('u', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('u', false, "auto", c->fwd[PIF_HOST]); if (!opt_U) - fwd_rule_parse('U', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('U', false, "auto", c->fwd[PIF_SPLICE]); }
conf_sock_listen(c); @@ -2135,14 +2135,8 @@ void conf_handler(struct ctx *c, uint32_t events) unsigned pif;
/* Clear pending tables */ - for (pif = 0; pif < PIF_NUM_TYPES; pif++) { - struct fwd_table *fwd = c->fwd_pending[pif]; - - if (!fwd) - continue; - fwd->count = 0; - fwd->sock_count = 0; - } + for (pif = 0; pif < PIF_NUM_TYPES; pif++) + fwd_rule_clear(c->fwd_pending[pif]);
/* FIXME: this could block indefinitely if the client doesn't * write as much as it should diff --git a/fwd_rule.c b/fwd_rule.c index 03e8e80..eb9a601 100644 --- a/fwd_rule.c +++ b/fwd_rule.c @@ -180,6 +180,66 @@ static bool fwd_rule_conflicts(const struct fwd_rule *a, const struct fwd_rule * return true; }
+/** + * fwd_rule_clear() - Clear a forwarding table + * @fwd: Table to clear (might be NULL) + */ +void fwd_rule_clear(struct fwd_table *fwd) +{ + if (!fwd) + return; +
Not essential, but I wonder if it would be wise to verify that there are no currently open sockets associated with any of the rules.
With a loop, I suppose. I can add it as a TODO comment because I guess it would be good to handle that case (open sockets left) for fwd_rule_del() as well, and a part of the implementation can probably be common.
+ fwd->count = 0; + fwd->sock_count = 0; +} + +/** + * fwd_rule_del() - Partially validate and delete a rule from a forwarding table + * @fwd: Table to delete from + * @rule: Rule to delete (must match an existing rule) + * + * Return: 0 on success, negative error code on failure (-ENOENT if not found) + * + * NOTE: This function can't be used for a forwarding table with valid sockets + * stored in fwd->rulesocks.
The exact meaning of this isn't very clear to me. Does "valid" mean "open" or something else?
It means valid at some point, not necessarily open right now. I'll change it to "open" for clarity.
I'm not sure what "valid at some point" means, either.
I think what you're getting at is that every entry in fwd->socks[] must be -1. Or at least every entry with index in [0,sock_count)
Yes.
+ */ +static int fwd_rule_del(struct fwd_table *fwd, const struct fwd_rule *rule) +{ + unsigned num, i; + + for (i = 0; i < fwd->count; i++) { + if (fwd_rule_conflicts(rule, &fwd->rules[i])) + break; + }
So, this deletes any conflicting rule, not only exact matches. That's not very clear from the description of @rule.
It deletes the first one
Oh, good point. Which actually elevates this to a bug, not just a debate about the best semantics, because...
(but given that fwd_rule_conflicts() is called on insertion, there should be a single one).
... that's not correct. "conflicts" is not transitive, so (for example) in the cases below: -t 1000-2000 -t 4000-5000 --delete -t 500-5500 -t 127.0.0.1/100 -t 127.0.0.2/100 --delete -t 100 The deleted rule conflicts with both the added rules, but they don't conflict with each other. I don't think "delete all conflicting rules" is a great either, since it means that: -t 1000-1999 -t 2000-2999 --delete -t 1500-2500 maps nothing at all, which seems pretty surprising.
It's good enough for our purposes right now, even though we might want to make that more sophisticated in the future. I'll change the description of @rule.
I really think the current behaviour is too confusing. Only deleting exact matches (and giving an error if there's a conflict that's not an exact match) I think *is* good enough for now, so that's what I'd suggest.
+ + if (i == fwd->count) { + char newstr[FWD_RULE_STRLEN]; + + warn("Couldn't find forwarding rule to delete: %s", + fwd_rule_fmt(rule, newstr, sizeof(newstr))); + return -ENOENT; + } + + /* Don't use anything else from 'rule' as passed, it's not validated */ + rule = &fwd->rules[i]; + num = (unsigned)rule->last - rule->first + 1; + + fwd->count--; + + memmove((void *)(fwd->rulesocks + i), (void *)(fwd->rulesocks + i + 1),
I don't think the explicit (void *) casts are necessary - they should be implicit from memmove()s signature.
They aren't from a C point of view, but clang-tidy reports a bugprone-multi-level-implicit-pointer-conversion warning if I don't do that, because it's not obvious that we want the same kind of cast as the implicit one.
Ah, ok.
+ (fwd->count - i) * sizeof(*fwd->rulesocks));
Is memmove() guaranteed to be safe if given a zero length? That will occur if deleting the last entry.
In practice, this is quite commonly done in the Linux kernel and elsewhere and I never hit issues, against different C libraries.
Ok.
I'm actually not sure what issue I should get, C11 says that it "copies n characters from the object pointed to by s2 into the object pointed to by s1", and if "n" is 0, nothing is done.
There's no specific mention of that in C11, but I don't think there needs to be one, unlike with realloc() with a zero size and possibly others problematic cases.
So, yes, as far as I know it's "safe", but with no explicit guarantee (and I don't think any is needed).
+ /* TODO: move sockets stored starting from fwd->rulesocks[i + i], should + * we ever need to delete rules from a table with open sockets. + */ + fwd->sock_count -= num; + + memmove(fwd->rules + i, fwd->rules + i + 1, + (fwd->count - i) * sizeof(*fwd->rules)
Again, is this safe if i == fwd->count?
Same as above.
+ + return 0; +} + /** * fwd_rule_add() - Validate and add a rule to a forwarding table * @fwd: Table to add to @@ -368,6 +428,7 @@ static int parse_keyword(const char *s, const char **endptr, const char *kw) * fwd_rule_range_except() - Set up forwarding for a range of ports minus a * bitmap of exclusions * @fwd: Forwarding table to be updated + * @del: Delete resulting rules from forwarding table, instead of adding
Clunky, but it gets the job done.
* @proto: Protocol to forward * @addr: Listening address * @ifname: Listening interface @@ -377,8 +438,8 @@ static int parse_keyword(const char *s, const char **endptr, const char *kw) * @to: Port to translate @first to when forwarding * @flags: Flags for forwarding entries */ -static void fwd_rule_range_except(struct fwd_table *fwd, uint8_t proto, - const union inany_addr *addr, +static void fwd_rule_range_except(struct fwd_table *fwd, bool del, + uint8_t proto, const union inany_addr *addr, const char *ifname, uint16_t first, uint16_t last, const uint8_t *exclude, uint16_t to, @@ -418,8 +479,13 @@ static void fwd_rule_range_except(struct fwd_table *fwd, uint8_t proto, rule.last = i - 1; rule.to = base + delta;
- if (fwd_rule_add(fwd, &rule) < 0) - goto fail; + if (del) { + if (fwd_rule_del(fwd, &rule) < 0) + goto fail; + } else { + if (fwd_rule_add(fwd, &rule) < 0) + goto fail; + }
base = i - 1; } @@ -445,12 +511,13 @@ fail: /** * fwd_rule_parse_ports() - Parse port range(s) specifier * @fwd: Forwarding table to be updated + * @del: Delete resulting rules from forwarding table, instead of adding * @proto: Protocol to forward * @addr: Listening address for forwarding * @ifname: Interface name for listening * @spec: Port range(s) specifier */ -static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, +static void fwd_rule_parse_ports(struct fwd_table *fwd, bool del, uint8_t proto, const union inany_addr *addr, const char *ifname, const char *spec) @@ -507,7 +574,7 @@ static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, /* Exclude ephemeral ports */ fwd_port_map_ephemeral(exclude);
- fwd_rule_range_except(fwd, proto, addr, ifname, + fwd_rule_range_except(fwd, del, proto, addr, ifname, 1, NUM_PORTS - 1, exclude, 1, flags | FWD_WEAK); return; @@ -537,7 +604,7 @@ static void fwd_rule_parse_ports(struct fwd_table *fwd, uint8_t proto, if (p != ep) /* Garbage after the ranges */ goto bad;
- fwd_rule_range_except(fwd, proto, addr, ifname, + fwd_rule_range_except(fwd, del, proto, addr, ifname, orig_range.first, orig_range.last, exclude, mapped_range.first, flags); @@ -551,10 +618,12 @@ bad: /** * fwd_rule_parse() - Parse port configuration option * @optname: Short option name, t, T, u, or U + * @del: Delete resulting rules from forwarding table, instead of adding * @optarg: Option argument (port specification) * @fwd: Forwarding table to be updated */ -void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) +void fwd_rule_parse(char optname, bool del, const char *optarg, + struct fwd_table *fwd) { union inany_addr addr_buf = inany_any6, *addr = &addr_buf; char buf[BUFSIZ], *spec, *ifname = NULL; @@ -632,12 +701,12 @@ void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) optname, optarg);
if (fwd->caps & FWD_CAP_IPV4) { - fwd_rule_parse_ports(fwd, proto, + fwd_rule_parse_ports(fwd, del, proto, &inany_loopback4, NULL, spec); } if (fwd->caps & FWD_CAP_IPV6) { - fwd_rule_parse_ports(fwd, proto, + fwd_rule_parse_ports(fwd, del, proto, &inany_loopback6, NULL, spec); } @@ -653,7 +722,7 @@ void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd) optname, optarg); }
- fwd_rule_parse_ports(fwd, proto, addr, ifname, spec); + fwd_rule_parse_ports(fwd, del, proto, addr, ifname, spec); }
/** diff --git a/fwd_rule.h b/fwd_rule.h index f43b37d..ae9a3cb 100644 --- a/fwd_rule.h +++ b/fwd_rule.h @@ -100,9 +100,11 @@ void fwd_probe_ephemeral(void);
const union inany_addr *fwd_rule_addr(const struct fwd_rule *rule); const char *fwd_rule_fmt(const struct fwd_rule *rule, char *dst, size_t size); -void fwd_rule_parse(char optname, const char *optarg, struct fwd_table *fwd); +void fwd_rule_parse(char optname, bool del, const char *optarg, + struct fwd_table *fwd); int fwd_rule_read(int fd, struct fwd_rule *rule); int fwd_rule_write(int fd, const struct fwd_rule *rule); +void fwd_rule_clear(struct fwd_table *fwd); int fwd_rule_add(struct fwd_table *fwd, const struct fwd_rule *new);
/** diff --git a/pesto.1 b/pesto.1 index 1ea1d12..cd0f3dc 100644 --- a/pesto.1 +++ b/pesto.1 @@ -31,6 +31,42 @@ Be verbose. .BR \-h ", " \-\-help Display a help message and exit.
+.TP +.BR \-A ", " \-\-add +Add the port forwarding specifiers following this option to the current +forwarding table, rather than replacing it. + +This option can be given multiple times, as it might follow previous deletions +(see \fB--delete\fR below), and implies that all the specifiers following it, +before a further \fB--delete\fR option occurs, will be handled as additions. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + +.TP +.BR \-D ", " \-\-delete +Delete the port forwarding specifiers following this option from the current +forwarding table, rather than adding them it. + +This option can be given multiple times, as it might follow previous additions +(see \fB--add\fR above), and implies that all the specifiers following it, +before a further \fB--add\fR option occurs, will be handled as deletions. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + +.TP +.BR \-C ", " \-\-clear " " \fIpif +Clear the forwarding table associated to a given \fIpif\fR, that is, a +conceptual type of interface in \fBpasst\fR(1) or \fBpasta\fR(1) representing a +specific data path and direction. + +The available \fIpif\fR names can be obtained by querying the current forwarding +configuration, which can be done by calling \fBpesto\fR(1) without options. + +See the section \fBAdding, deleting, clearing rules\fR in the \fBNOTES\fR for +more details. + .TP .BR \-t ", " \-\-tcp-ports " " \fIspec Configure TCP port forwarding to guest or namespace. \fIspec\fR can be one of: @@ -162,6 +198,55 @@ Configure UDP port forwarding from target namespace to init namespace. .BR \-\-version Show version and exit.
+.SH NOTES + +.SS Adding, deleting, clearing rules + +The options \fB--add\fR, \fB--delete\fR, and \fB--clear\fR are handled as +sequential commands to manipulate the current forwarding tables. If none of them +is given, forwarding specifiers for a given table are intended as replacement of +the corresponding table. That is:
I thought we wanted to default to add, rather than replace. That seems both a little simpler to implement and agruably more likely to be what peopke want.
We previously discussed we would default to replace for brevity,
Oh, ok. Apparently I'd forgotten.
because otherwise users would need to explicitly clear tables, but also because it's consistent with:
- the idea that we replace the current table
- the existing command line syntax that just "programs" a new table.
That is, the cost of typing:
-A -t 22
is marginally higher compared to:
-t 22
but:
-C HOST -t 22
in case a replacement is desired looks substantially more to me.
Ok, fair enough.
It's really not that important at the moment as Podman will anyway explicitly pass options. Of course it's not a great idea to change this later for non-automated users, but in any case I think the reason I explained above (and we discussed in the past) are still valid.
+.nf + pesto -t 1024 -U 1025 +.fi + +will \fBreplace\fR the current TCP inbound port forwarding table with a single +rule, forwarding port 1024, and will similarly replace the UDP outbound +forwarding table with a single forwarding rule for port 1025. This usage is a +short-hand form for: + +.nf + pesto -C HOST -t 1024 -C SPLICE -U 1025 +.fi + +The options \fB--add\fR and \fB--delete\fR are used to \fBadd new specific +rules or delete existing ones\fR, instead of replacing tables. For example: + +.nf + pesto -A -t 2000 -D -t 3000 -U 5000 +.fi + +will add a forwarding rule for inbound TCP port 2000, and delete inbound TCP +port 3000 as well as outbound UDP port 5000 from the existing set of rules. + +All these options are interpreted as sequential commands and can be arbitrarily +combined. For example: + +.nf + pesto -A -t 2000 -C HOST -A -T 3000 -t 2001 -D -u 5000 +.fi + +will, in order: + +.RS +- add inbound TCP port 2000 +- clear inbound ports, reverting the addition above +- add outbound TCP port 3000 +- add inbound TCP port 2001 +- delete inbound UDP port 5000 +.RE + .SH AUTHORS
Stefano Brivio
, diff --git a/pesto.c b/pesto.c index 73fdc39..143d4c6 100644 --- a/pesto.c +++ b/pesto.c @@ -55,6 +55,9 @@ static void usage(const char *name, FILE *f, int status) FPRINTF(f, "Usage: %s [OPTION]... PATH\n", name); FPRINTF(f, "\n" + " -A, --add Add following specifiers to forwards\n" + " -D, --delete Delete following specifiers instead\n" + " -C, --clear PIF Clear forwarding table for given PIF\n" " -t, --tcp-ports SPEC TCP inbound port forwarding\n" " can be specified multiple times\n" " SPEC can be:\n" @@ -298,6 +301,9 @@ int main(int argc, char **argv) {"debug", no_argument, NULL, 'd' }, {"help", no_argument, NULL, 'h' }, {"version", no_argument, NULL, 1 }, + {"add", no_argument, NULL, 'A' }, + {"delete", no_argument, NULL, 'D' }, + {"clear", required_argument, NULL, 'C' }, {"tcp-ports", required_argument, NULL, 't' }, {"udp-ports", required_argument, NULL, 'u' }, {"tcp-ns", required_argument, NULL, 'T' }, @@ -305,9 +311,11 @@ int main(int argc, char **argv) {"show", no_argument, NULL, 's' }, { 0 }, }; + enum { MODE_CLEAR, MODE_ADD, MODE_DEL } mode = MODE_CLEAR; MODE_CLEAR doesn't make sense to me. Unlike add or del, clear is a once-off operation, it's not clear to me how it would affect the interpretation of -[tTuU].
It needs to be a mode, and the default one, to make sure we default to replacing tables for a given direction of forwarding.
It's not exactly one-off as it's a starting mode until -A or -D are given. The interpretation is as described in the man page.
Yeah, ok, it's necessary to support the replace-by-default semantics.
+ bool inbound_cleared = false, outbound_cleared = false; struct pif_configuration *inbound, *outbound; + const char *optstring = "dhADC:t:u:T:U:s"; struct sockaddr_un a = { AF_UNIX, "" }; - const char *optstring = "dht:u:T:U:s"; struct configuration conf = { 0 }; bool update = false, show = false; struct pesto_hello hello; @@ -339,11 +347,16 @@ int main(int argc, char **argv) case -1: case 0: break; + case 'C': case 't': case 'u': case 'T': case 'U': - /* Parse these options after we've read state from passt/pasta */ + case 'A': + case 'D': + /* Parse these options after we've read state from + * passt/pasta + */ update = true; break; case 's': @@ -439,13 +452,38 @@ int main(int argc, char **argv) optname = getopt_long(argc, argv, optstring, options, NULL);
switch (optname) { + case 'A': + mode = MODE_ADD; + break; + case 'D': + mode = MODE_DEL; + break; + case 'C': + if (!strcmp(optarg, "HOST")) { + fwd_rule_clear(&inbound->fwd); + inbound_cleared = true; + } else if (!strcmp(optarg, "SPLICE")) { + fwd_rule_clear(&outbound->fwd);
outbound will be NULL if talking to passt, so this could SEGV.
Ah, right, nice catch, I'll fix that in v9.
In principle 'inbound' could also be NULL, although not with any current passt version. Generalising as mentioned below should generalise the fix as well though.
+ outbound_cleared = true; + } else { + die("Unsupported pif name %s", optarg); + }
For the time being pesto is limited to a single "inbound" and single "outbound" table, simply because we haven't devised syntax for anything else. However, we don't actually need that for --clear. Since it already takes a pif name we can use pif_conf_by_name() to clear an arbitrary named pif's rules.
I'll change that in v9, assuming it works (I haven't tried yet).
+ + break; case 't': case 'u': if (!inbound) { die("Can't use -%c, no inbound interface", optname); } - fwd_rule_parse(optname, optarg, &inbound->fwd); + + if (mode == MODE_CLEAR && !inbound_cleared) { + fwd_rule_clear(&inbound->fwd); + inbound_cleared = true; + } + + fwd_rule_parse(optname, mode == MODE_DEL, optarg, + &inbound->fwd); break; case 'T': case 'U': @@ -453,7 +491,14 @@ int main(int argc, char **argv) die("Can't use -%c, no outbound interface", optname); } - fwd_rule_parse(optname, optarg, &outbound->fwd); + + if (mode == MODE_CLEAR && !outbound_cleared) { + fwd_rule_clear(&outbound->fwd); + outbound_cleared = true; + } + + fwd_rule_parse(optname, mode == MODE_DEL, optarg, + &inbound->fwd); break; default: continue; -- 2.43.0
-- Stefano
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 6 May 2026 10:39:30 +0200
Stefano Brivio
On Wed, 6 May 2026 10:12:21 +0200 Laurent Vivier
wrote: On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
We can now receive updates to the forwarding rules from the pesto client and store them in a "pending" copy of the forwarding tables. Implement switching to using the new rules.
The logic is in a new fwd_listen_switch(). For now this closes all listening sockets related to the old tables, swaps the active and pending tables, then listens based on the new tables. In future we look to improve this so that we don't temporarily stop listening on ports that both the old and new tables specify.
Signed-off-by: David Gibson
[sbrivio: In fwd_listen_switch(), use the destination size as argument to memcpy(), instead of sizeof(tmp), as suggested by Laurent] Signed-off-by: Stefano Brivio --- conf.c | 5 ++--- fwd.c | 34 ++++++++++++++++++++++++++++++++++ fwd.h | 1 + 3 files changed, 37 insertions(+), 3 deletions(-) diff --git a/conf.c b/conf.c index 76344da..3f48793 100644 --- a/conf.c +++ b/conf.c @@ -2160,15 +2160,14 @@ void conf_handler(struct ctx *c, uint32_t events) fwd_rules_dump(info, fwd->rules, fwd->count, " ", ""); } + + fwd_listen_switch(c); }
if (events & EPOLLHUP) { debug("Configuration client hangup"); - goto close; }
- return; - close: conf_close(c);
diff --git a/fwd.c b/fwd.c index d93d2e5..0697435 100644 --- a/fwd.c +++ b/fwd.c @@ -534,6 +534,40 @@ int fwd_listen_init(const struct ctx *c) return 0; }
+/** + * fwd_listen_switch() - Switch from current to pending rules table + * @c: Execution context + */ +void fwd_listen_switch(struct ctx *c) +{ + struct fwd_table *tmp[PIF_NUM_TYPES]; + unsigned i; + + /* Stop listening on the old tables */ + for (i = 0; i < PIF_NUM_TYPES; i++) { + struct fwd_table *fwd = c->fwd[i]; + + if (!fwd) + continue; + + debug("Flushing %u old %s rules", fwd->count, pif_name(i)); + fwd_listen_close(fwd); + fwd->count = fwd->sock_count = 0; + } + + /* Swap active and pending tables */ + static_assert(sizeof(tmp) == sizeof(c->fwd) && + sizeof(tmp) == sizeof(c->fwd_pending), + "Temporary has wrong size");
At this point:
c->fwd[PIF_HOST] = &fwd_in; c->fwd[PIF_SPLICE] = &fwd_out;
c->fwd_pending[PIF_HOST] = &fwd_in_pending; c->fwd_pending[PIF_SPLICE] = &fwd_out_pending;
+ memcpy(&tmp, (void *)c->fwd, sizeof(tmp)); + memcpy((void *)c->fwd, (void *)c->fwd_pending, sizeof(c->fwd)); + memcpy((void *)c->fwd_pending, &tmp, sizeof(c->fwd_pending));
At this point:
c->fwd[PIF_HOST] = &fwd_in_pending; c->fwd[PIF_SPLICE] = &fwd_out_pending;
c->fwd_pending[PIF_HOST] = &fwd_in; c->fwd_pending[PIF_SPLICE] = &fwd_out;
Yeah, makes sense, I can change that in v9.
Perhaps it should be noted somewhere to avoid confusion in the future?
What do you think should be noted exactly, and where? Can you show a practical example of the change you're proposing?
...I'm leaving like it is in v9 to make sure I'm not misinterpreting you, and also because the current (v8) version is obviously correct and I also tested it fairly heavily by now. I'd suggest optimising this (and commenting as needed) in a separate patch later. -- Stefano
On Wed, May 06, 2026 at 10:49:55AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 10:39:30 +0200 Stefano Brivio
wrote: On Wed, 6 May 2026 10:12:21 +0200 Laurent Vivier
wrote: On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
We can now receive updates to the forwarding rules from the pesto client and store them in a "pending" copy of the forwarding tables. Implement switching to using the new rules.
The logic is in a new fwd_listen_switch(). For now this closes all listening sockets related to the old tables, swaps the active and pending tables, then listens based on the new tables. In future we look to improve this so that we don't temporarily stop listening on ports that both the old and new tables specify.
Signed-off-by: David Gibson
[sbrivio: In fwd_listen_switch(), use the destination size as argument to memcpy(), instead of sizeof(tmp), as suggested by Laurent] Signed-off-by: Stefano Brivio --- conf.c | 5 ++--- fwd.c | 34 ++++++++++++++++++++++++++++++++++ fwd.h | 1 + 3 files changed, 37 insertions(+), 3 deletions(-) diff --git a/conf.c b/conf.c index 76344da..3f48793 100644 --- a/conf.c +++ b/conf.c @@ -2160,15 +2160,14 @@ void conf_handler(struct ctx *c, uint32_t events) fwd_rules_dump(info, fwd->rules, fwd->count, " ", ""); } + + fwd_listen_switch(c); }
if (events & EPOLLHUP) { debug("Configuration client hangup"); - goto close; }
- return; - close: conf_close(c);
diff --git a/fwd.c b/fwd.c index d93d2e5..0697435 100644 --- a/fwd.c +++ b/fwd.c @@ -534,6 +534,40 @@ int fwd_listen_init(const struct ctx *c) return 0; }
+/** + * fwd_listen_switch() - Switch from current to pending rules table + * @c: Execution context + */ +void fwd_listen_switch(struct ctx *c) +{ + struct fwd_table *tmp[PIF_NUM_TYPES]; + unsigned i; + + /* Stop listening on the old tables */ + for (i = 0; i < PIF_NUM_TYPES; i++) { + struct fwd_table *fwd = c->fwd[i]; + + if (!fwd) + continue; + + debug("Flushing %u old %s rules", fwd->count, pif_name(i)); + fwd_listen_close(fwd); + fwd->count = fwd->sock_count = 0; + } + + /* Swap active and pending tables */ + static_assert(sizeof(tmp) == sizeof(c->fwd) && + sizeof(tmp) == sizeof(c->fwd_pending), + "Temporary has wrong size");
At this point:
c->fwd[PIF_HOST] = &fwd_in; c->fwd[PIF_SPLICE] = &fwd_out;
c->fwd_pending[PIF_HOST] = &fwd_in_pending; c->fwd_pending[PIF_SPLICE] = &fwd_out_pending;
+ memcpy(&tmp, (void *)c->fwd, sizeof(tmp)); + memcpy((void *)c->fwd, (void *)c->fwd_pending, sizeof(c->fwd)); + memcpy((void *)c->fwd_pending, &tmp, sizeof(c->fwd_pending));
At this point:
c->fwd[PIF_HOST] = &fwd_in_pending; c->fwd[PIF_SPLICE] = &fwd_out_pending;
c->fwd_pending[PIF_HOST] = &fwd_in; c->fwd_pending[PIF_SPLICE] = &fwd_out;
Yeah, makes sense, I can change that in v9.
Perhaps it should be noted somewhere to avoid confusion in the future?
What do you think should be noted exactly, and where? Can you show a practical example of the change you're proposing?
...I'm leaving like it is in v9 to make sure I'm not misinterpreting you, and also because the current (v8) version is obviously correct and I also tested it fairly heavily by now.
I'd suggest optimising this (and commenting as needed) in a separate patch later.
As noted in another branch of this thread, I think all it really needs is renaming the globals that c->fwd and c->fwd_pending point to. They should just be fwd_in_[12] (or an array of 2 tables), instead of implying a semantic difference between the plain and "pending" copies. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 6 May 2026 18:48:10 +1000
David Gibson
On Wed, May 06, 2026 at 10:22:20AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 16:45:27 +1000 David Gibson
wrote: On Wed, May 06, 2026 at 01:47:19AM +0200, Stefano Brivio wrote:
Instead of just being able to replace the existing forwarding table,
As of my last version, we already added, rather than replacing.
Right, I noticed that, but this isn't the default behaviour we discussed, so I assumed it was accidental, and planned to go back and check the reason why.
Given that it wasn't accidental, I'll simply adjust this part of the commit message.
implement --add and --delete options to maintain the table and add or delete specific ports.
The option --clear PIF forces the clearing of a table, instead.
These options can be combined arbitrarily and are handled as sequential commands, as now described in pesto(1).
If no option is given before forwarding specifiers for a matching table, the command line is interpreted as a replacement of the existing rules.
To this end:
- there's no protocol change, as pesto is anyway sending updated copies of the table
- the forwarding table functions now include a new fwd_rule_del(), which deletes existing rule only if a matching one is found
- a trivial fwd_rule_clear() is factored out from the existing conf_handler() implementation, so that it can be directly used in pesto
The entry points for parsing of port specifiers now take an additional 'del' parameter which is passed down all the way before reaching the fwd_rule_add() implementation. If a rule should be deleted, at that point, fwd_rule_del() is called instead.
Signed-off-by: Stefano Brivio
--- conf.c | 26 ++++++---------- fwd_rule.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++------- fwd_rule.h | 4 ++- pesto.1 | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++ pesto.c | 53 ++++++++++++++++++++++++++++--- 5 files changed, 227 insertions(+), 32 deletions(-) diff --git a/conf.c b/conf.c index 3f48793..909c34c 100644 --- a/conf.c +++ b/conf.c @@ -1849,16 +1849,16 @@ void conf(struct ctx *c, int argc, char **argv)
if (name == 't') { opt_t = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'u') { opt_u = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'T') { opt_T = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } else if (name == 'U') { opt_U = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } } while (name != -1);
@@ -1910,13 +1910,13 @@ void conf(struct ctx *c, int argc, char **argv)
if (c->mode == MODE_PASTA) { if (!opt_t) - fwd_rule_parse('t', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('t', false, "auto", c->fwd[PIF_HOST]); if (!opt_T) - fwd_rule_parse('T', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('T', false, "auto", c->fwd[PIF_SPLICE]); if (!opt_u) - fwd_rule_parse('u', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('u', false, "auto", c->fwd[PIF_HOST]); if (!opt_U) - fwd_rule_parse('U', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('U', false, "auto", c->fwd[PIF_SPLICE]); }
conf_sock_listen(c); @@ -2135,14 +2135,8 @@ void conf_handler(struct ctx *c, uint32_t events) unsigned pif;
/* Clear pending tables */ - for (pif = 0; pif < PIF_NUM_TYPES; pif++) { - struct fwd_table *fwd = c->fwd_pending[pif]; - - if (!fwd) - continue; - fwd->count = 0; - fwd->sock_count = 0; - } + for (pif = 0; pif < PIF_NUM_TYPES; pif++) + fwd_rule_clear(c->fwd_pending[pif]);
/* FIXME: this could block indefinitely if the client doesn't * write as much as it should diff --git a/fwd_rule.c b/fwd_rule.c index 03e8e80..eb9a601 100644 --- a/fwd_rule.c +++ b/fwd_rule.c @@ -180,6 +180,66 @@ static bool fwd_rule_conflicts(const struct fwd_rule *a, const struct fwd_rule * return true; }
+/** + * fwd_rule_clear() - Clear a forwarding table + * @fwd: Table to clear (might be NULL) + */ +void fwd_rule_clear(struct fwd_table *fwd) +{ + if (!fwd) + return; +
Not essential, but I wonder if it would be wise to verify that there are no currently open sockets associated with any of the rules.
With a loop, I suppose. I can add it as a TODO comment because I guess it would be good to handle that case (open sockets left) for fwd_rule_del() as well, and a part of the implementation can probably be common.
+ fwd->count = 0; + fwd->sock_count = 0; +} + +/** + * fwd_rule_del() - Partially validate and delete a rule from a forwarding table + * @fwd: Table to delete from + * @rule: Rule to delete (must match an existing rule) + * + * Return: 0 on success, negative error code on failure (-ENOENT if not found) + * + * NOTE: This function can't be used for a forwarding table with valid sockets + * stored in fwd->rulesocks.
The exact meaning of this isn't very clear to me. Does "valid" mean "open" or something else?
It means valid at some point, not necessarily open right now. I'll change it to "open" for clarity.
I'm not sure what "valid at some point" means, either.
That it was a valid socket file descriptor (an open one) at some point.
I think what you're getting at is that every entry in fwd->socks[] must be -1. Or at least every entry with index in [0,sock_count)
Yes.
+ */ +static int fwd_rule_del(struct fwd_table *fwd, const struct fwd_rule *rule) +{ + unsigned num, i; + + for (i = 0; i < fwd->count; i++) { + if (fwd_rule_conflicts(rule, &fwd->rules[i])) + break; + }
So, this deletes any conflicting rule, not only exact matches. That's not very clear from the description of @rule.
It deletes the first one
Oh, good point. Which actually elevates this to a bug, not just a debate about the best semantics, because...
(but given that fwd_rule_conflicts() is called on insertion, there should be a single one).
... that's not correct. "conflicts" is not transitive, so (for example) in the cases below: -t 1000-2000 -t 4000-5000 --delete -t 500-5500 -t 127.0.0.1/100 -t 127.0.0.2/100 --delete -t 100 The deleted rule conflicts with both the added rules, but they don't conflict with each other.
Right, yes, for partially overlapping rules that's true. But that's not what Podman needs right now, so I think it can be fixed later.
I don't think "delete all conflicting rules" is a great either, since it means that: -t 1000-1999 -t 2000-2999 --delete -t 1500-2500 maps nothing at all, which seems pretty surprising.
It's good enough for our purposes right now, even though we might want to make that more sophisticated in the future. I'll change the description of @rule.
I really think the current behaviour is too confusing. Only deleting exact matches (and giving an error if there's a conflict that's not an exact match) I think *is* good enough for now, so that's what I'd suggest.
...except that it's not implemented by any function and it's not exactly trivial either, and delaying the implementation further makes this useless (at least for Podman, which we can approximate to "essentially useless"), so I'd rather go with something that doesn't take care of partially overlapping ranges, rather than no feature at all. -- Stefano
On 5/6/26 10:52, David Gibson wrote:
On Wed, May 06, 2026 at 10:49:55AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 10:39:30 +0200 Stefano Brivio
wrote: On Wed, 6 May 2026 10:12:21 +0200 Laurent Vivier
wrote: On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
We can now receive updates to the forwarding rules from the pesto client and store them in a "pending" copy of the forwarding tables. Implement switching to using the new rules.
The logic is in a new fwd_listen_switch(). For now this closes all listening sockets related to the old tables, swaps the active and pending tables, then listens based on the new tables. In future we look to improve this so that we don't temporarily stop listening on ports that both the old and new tables specify.
Signed-off-by: David Gibson
[sbrivio: In fwd_listen_switch(), use the destination size as argument to memcpy(), instead of sizeof(tmp), as suggested by Laurent] Signed-off-by: Stefano Brivio --- conf.c | 5 ++--- fwd.c | 34 ++++++++++++++++++++++++++++++++++ fwd.h | 1 + 3 files changed, 37 insertions(+), 3 deletions(-) diff --git a/conf.c b/conf.c index 76344da..3f48793 100644 --- a/conf.c +++ b/conf.c @@ -2160,15 +2160,14 @@ void conf_handler(struct ctx *c, uint32_t events) fwd_rules_dump(info, fwd->rules, fwd->count, " ", ""); } + + fwd_listen_switch(c); }
if (events & EPOLLHUP) { debug("Configuration client hangup"); - goto close; }
- return; - close: conf_close(c);
diff --git a/fwd.c b/fwd.c index d93d2e5..0697435 100644 --- a/fwd.c +++ b/fwd.c @@ -534,6 +534,40 @@ int fwd_listen_init(const struct ctx *c) return 0; }
+/** + * fwd_listen_switch() - Switch from current to pending rules table + * @c: Execution context + */ +void fwd_listen_switch(struct ctx *c) +{ + struct fwd_table *tmp[PIF_NUM_TYPES]; + unsigned i; + + /* Stop listening on the old tables */ + for (i = 0; i < PIF_NUM_TYPES; i++) { + struct fwd_table *fwd = c->fwd[i]; + + if (!fwd) + continue; + + debug("Flushing %u old %s rules", fwd->count, pif_name(i)); + fwd_listen_close(fwd); + fwd->count = fwd->sock_count = 0; + } + + /* Swap active and pending tables */ + static_assert(sizeof(tmp) == sizeof(c->fwd) && + sizeof(tmp) == sizeof(c->fwd_pending), + "Temporary has wrong size");
At this point:
c->fwd[PIF_HOST] = &fwd_in; c->fwd[PIF_SPLICE] = &fwd_out;
c->fwd_pending[PIF_HOST] = &fwd_in_pending; c->fwd_pending[PIF_SPLICE] = &fwd_out_pending;
+ memcpy(&tmp, (void *)c->fwd, sizeof(tmp)); + memcpy((void *)c->fwd, (void *)c->fwd_pending, sizeof(c->fwd)); + memcpy((void *)c->fwd_pending, &tmp, sizeof(c->fwd_pending));
At this point:
c->fwd[PIF_HOST] = &fwd_in_pending; c->fwd[PIF_SPLICE] = &fwd_out_pending;
c->fwd_pending[PIF_HOST] = &fwd_in; c->fwd_pending[PIF_SPLICE] = &fwd_out;
Yeah, makes sense, I can change that in v9.
Perhaps it should be noted somewhere to avoid confusion in the future?
What do you think should be noted exactly, and where? Can you show a practical example of the change you're proposing?
...I'm leaving like it is in v9 to make sure I'm not misinterpreting you, and also because the current (v8) version is obviously correct and I also tested it fairly heavily by now.
I'd suggest optimising this (and commenting as needed) in a separate patch later.
As noted in another branch of this thread, I think all it really needs is renaming the globals that c->fwd and c->fwd_pending point to. They should just be fwd_in_[12] (or an array of 2 tables), instead of implying a semantic difference between the plain and "pending" copies.
I agree with that. Thanks, Laurent
On Wed, 6 May 2026 09:13:19 +0200
Laurent Vivier
On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
This adds parsing of options using fwd_rule_parse(), validates them and adds them to the existing rules. It doesn't yet send those rules back to passt or pasta.
Message-ID: <20260322141843.4095972-3-sbrivio@redhat.com> [dwg: Based on an early draft by Stefano] Signed-off-by: David Gibson
[sbrivio: Recycled usage messages for -T and -U from conf.c as suggested by Laurent, dropped unrelated whitespace change] [sbrivio: Add description of -t, -u, -T, -U to pesto.1] And -s, --show ?
Oops, I forgot that one as well. Added in v9. -- Stefano
On Wed, 6 May 2026 11:11:05 +0200
Laurent Vivier
On 5/6/26 10:52, David Gibson wrote:
On Wed, May 06, 2026 at 10:49:55AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 10:39:30 +0200 Stefano Brivio
wrote: On Wed, 6 May 2026 10:12:21 +0200 Laurent Vivier
wrote: On 5/6/26 01:47, Stefano Brivio wrote:
From: David Gibson
We can now receive updates to the forwarding rules from the pesto client and store them in a "pending" copy of the forwarding tables. Implement switching to using the new rules.
The logic is in a new fwd_listen_switch(). For now this closes all listening sockets related to the old tables, swaps the active and pending tables, then listens based on the new tables. In future we look to improve this so that we don't temporarily stop listening on ports that both the old and new tables specify.
Signed-off-by: David Gibson
[sbrivio: In fwd_listen_switch(), use the destination size as argument to memcpy(), instead of sizeof(tmp), as suggested by Laurent] Signed-off-by: Stefano Brivio --- conf.c | 5 ++--- fwd.c | 34 ++++++++++++++++++++++++++++++++++ fwd.h | 1 + 3 files changed, 37 insertions(+), 3 deletions(-) diff --git a/conf.c b/conf.c index 76344da..3f48793 100644 --- a/conf.c +++ b/conf.c @@ -2160,15 +2160,14 @@ void conf_handler(struct ctx *c, uint32_t events) fwd_rules_dump(info, fwd->rules, fwd->count, " ", ""); } + + fwd_listen_switch(c); }
if (events & EPOLLHUP) { debug("Configuration client hangup"); - goto close; }
- return; - close: conf_close(c);
diff --git a/fwd.c b/fwd.c index d93d2e5..0697435 100644 --- a/fwd.c +++ b/fwd.c @@ -534,6 +534,40 @@ int fwd_listen_init(const struct ctx *c) return 0; }
+/** + * fwd_listen_switch() - Switch from current to pending rules table + * @c: Execution context + */ +void fwd_listen_switch(struct ctx *c) +{ + struct fwd_table *tmp[PIF_NUM_TYPES]; + unsigned i; + + /* Stop listening on the old tables */ + for (i = 0; i < PIF_NUM_TYPES; i++) { + struct fwd_table *fwd = c->fwd[i]; + + if (!fwd) + continue; + + debug("Flushing %u old %s rules", fwd->count, pif_name(i)); + fwd_listen_close(fwd); + fwd->count = fwd->sock_count = 0; + } + + /* Swap active and pending tables */ + static_assert(sizeof(tmp) == sizeof(c->fwd) && + sizeof(tmp) == sizeof(c->fwd_pending), + "Temporary has wrong size");
At this point:
c->fwd[PIF_HOST] = &fwd_in; c->fwd[PIF_SPLICE] = &fwd_out;
c->fwd_pending[PIF_HOST] = &fwd_in_pending; c->fwd_pending[PIF_SPLICE] = &fwd_out_pending;
+ memcpy(&tmp, (void *)c->fwd, sizeof(tmp)); + memcpy((void *)c->fwd, (void *)c->fwd_pending, sizeof(c->fwd)); + memcpy((void *)c->fwd_pending, &tmp, sizeof(c->fwd_pending));
At this point:
c->fwd[PIF_HOST] = &fwd_in_pending; c->fwd[PIF_SPLICE] = &fwd_out_pending;
c->fwd_pending[PIF_HOST] = &fwd_in; c->fwd_pending[PIF_SPLICE] = &fwd_out;
Yeah, makes sense, I can change that in v9.
Perhaps it should be noted somewhere to avoid confusion in the future?
What do you think should be noted exactly, and where? Can you show a practical example of the change you're proposing?
...I'm leaving like it is in v9 to make sure I'm not misinterpreting you, and also because the current (v8) version is obviously correct and I also tested it fairly heavily by now.
I'd suggest optimising this (and commenting as needed) in a separate patch later.
As noted in another branch of this thread, I think all it really needs is renaming the globals that c->fwd and c->fwd_pending point to. They should just be fwd_in_[12] (or an array of 2 tables), instead of implying a semantic difference between the plain and "pending" copies.
I agree with that.
Okay, I see, and I agree as well. I'd still keep that as a separate patch (there are quite a few follow-ups we'll need anyway) as it's not really fundamental right now. -- Stefano
On Wed, 6 May 2026 10:56:01 +0200
Stefano Brivio
On Wed, 6 May 2026 18:48:10 +1000 David Gibson
wrote: On Wed, May 06, 2026 at 10:22:20AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 16:45:27 +1000 David Gibson
wrote: On Wed, May 06, 2026 at 01:47:19AM +0200, Stefano Brivio wrote:
Instead of just being able to replace the existing forwarding table,
As of my last version, we already added, rather than replacing.
Right, I noticed that, but this isn't the default behaviour we discussed, so I assumed it was accidental, and planned to go back and check the reason why.
Given that it wasn't accidental, I'll simply adjust this part of the commit message.
implement --add and --delete options to maintain the table and add or delete specific ports.
The option --clear PIF forces the clearing of a table, instead.
These options can be combined arbitrarily and are handled as sequential commands, as now described in pesto(1).
If no option is given before forwarding specifiers for a matching table, the command line is interpreted as a replacement of the existing rules.
To this end:
- there's no protocol change, as pesto is anyway sending updated copies of the table
- the forwarding table functions now include a new fwd_rule_del(), which deletes existing rule only if a matching one is found
- a trivial fwd_rule_clear() is factored out from the existing conf_handler() implementation, so that it can be directly used in pesto
The entry points for parsing of port specifiers now take an additional 'del' parameter which is passed down all the way before reaching the fwd_rule_add() implementation. If a rule should be deleted, at that point, fwd_rule_del() is called instead.
Signed-off-by: Stefano Brivio
--- conf.c | 26 ++++++---------- fwd_rule.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++------- fwd_rule.h | 4 ++- pesto.1 | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++ pesto.c | 53 ++++++++++++++++++++++++++++--- 5 files changed, 227 insertions(+), 32 deletions(-) diff --git a/conf.c b/conf.c index 3f48793..909c34c 100644 --- a/conf.c +++ b/conf.c @@ -1849,16 +1849,16 @@ void conf(struct ctx *c, int argc, char **argv)
if (name == 't') { opt_t = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'u') { opt_u = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'T') { opt_T = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } else if (name == 'U') { opt_U = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } } while (name != -1);
@@ -1910,13 +1910,13 @@ void conf(struct ctx *c, int argc, char **argv)
if (c->mode == MODE_PASTA) { if (!opt_t) - fwd_rule_parse('t', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('t', false, "auto", c->fwd[PIF_HOST]); if (!opt_T) - fwd_rule_parse('T', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('T', false, "auto", c->fwd[PIF_SPLICE]); if (!opt_u) - fwd_rule_parse('u', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('u', false, "auto", c->fwd[PIF_HOST]); if (!opt_U) - fwd_rule_parse('U', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('U', false, "auto", c->fwd[PIF_SPLICE]); }
conf_sock_listen(c); @@ -2135,14 +2135,8 @@ void conf_handler(struct ctx *c, uint32_t events) unsigned pif;
/* Clear pending tables */ - for (pif = 0; pif < PIF_NUM_TYPES; pif++) { - struct fwd_table *fwd = c->fwd_pending[pif]; - - if (!fwd) - continue; - fwd->count = 0; - fwd->sock_count = 0; - } + for (pif = 0; pif < PIF_NUM_TYPES; pif++) + fwd_rule_clear(c->fwd_pending[pif]);
/* FIXME: this could block indefinitely if the client doesn't * write as much as it should diff --git a/fwd_rule.c b/fwd_rule.c index 03e8e80..eb9a601 100644 --- a/fwd_rule.c +++ b/fwd_rule.c @@ -180,6 +180,66 @@ static bool fwd_rule_conflicts(const struct fwd_rule *a, const struct fwd_rule * return true; }
+/** + * fwd_rule_clear() - Clear a forwarding table + * @fwd: Table to clear (might be NULL) + */ +void fwd_rule_clear(struct fwd_table *fwd) +{ + if (!fwd) + return; +
Not essential, but I wonder if it would be wise to verify that there are no currently open sockets associated with any of the rules.
With a loop, I suppose. I can add it as a TODO comment because I guess it would be good to handle that case (open sockets left) for fwd_rule_del() as well, and a part of the implementation can probably be common.
+ fwd->count = 0; + fwd->sock_count = 0; +} + +/** + * fwd_rule_del() - Partially validate and delete a rule from a forwarding table + * @fwd: Table to delete from + * @rule: Rule to delete (must match an existing rule) + * + * Return: 0 on success, negative error code on failure (-ENOENT if not found) + * + * NOTE: This function can't be used for a forwarding table with valid sockets + * stored in fwd->rulesocks.
The exact meaning of this isn't very clear to me. Does "valid" mean "open" or something else?
It means valid at some point, not necessarily open right now. I'll change it to "open" for clarity.
I'm not sure what "valid at some point" means, either.
That it was a valid socket file descriptor (an open one) at some point.
I think what you're getting at is that every entry in fwd->socks[] must be -1. Or at least every entry with index in [0,sock_count)
Yes.
+ */ +static int fwd_rule_del(struct fwd_table *fwd, const struct fwd_rule *rule) +{ + unsigned num, i; + + for (i = 0; i < fwd->count; i++) { + if (fwd_rule_conflicts(rule, &fwd->rules[i])) + break; + }
So, this deletes any conflicting rule, not only exact matches. That's not very clear from the description of @rule.
It deletes the first one
Oh, good point. Which actually elevates this to a bug, not just a debate about the best semantics, because...
(but given that fwd_rule_conflicts() is called on insertion, there should be a single one).
... that's not correct. "conflicts" is not transitive, so (for example) in the cases below: -t 1000-2000 -t 4000-5000 --delete -t 500-5500 -t 127.0.0.1/100 -t 127.0.0.2/100 --delete -t 100 The deleted rule conflicts with both the added rules, but they don't conflict with each other.
Right, yes, for partially overlapping rules that's true. But that's not what Podman needs right now, so I think it can be fixed later.
I don't think "delete all conflicting rules" is a great either, since it means that: -t 1000-1999 -t 2000-2999 --delete -t 1500-2500 maps nothing at all, which seems pretty surprising.
It's good enough for our purposes right now, even though we might want to make that more sophisticated in the future. I'll change the description of @rule.
I really think the current behaviour is too confusing. Only deleting exact matches (and giving an error if there's a conflict that's not an exact match) I think *is* good enough for now, so that's what I'd suggest.
...except that it's not implemented by any function and it's not exactly trivial either, and delaying the implementation further makes this useless (at least for Podman, which we can approximate to "essentially useless"), so I'd rather go with something that doesn't take care of partially overlapping ranges, rather than no feature at all.
...but I ended up implementing exactly that in v10 as a re-spin was needed anyway and I realised that the implementation is absolutely trivial. -- Stefano
On Wed, May 06, 2026 at 10:56:02AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 18:48:10 +1000 David Gibson
wrote: On Wed, May 06, 2026 at 10:22:20AM +0200, Stefano Brivio wrote:
On Wed, 6 May 2026 16:45:27 +1000 David Gibson
wrote: On Wed, May 06, 2026 at 01:47:19AM +0200, Stefano Brivio wrote:
Instead of just being able to replace the existing forwarding table,
As of my last version, we already added, rather than replacing.
Right, I noticed that, but this isn't the default behaviour we discussed, so I assumed it was accidental, and planned to go back and check the reason why.
Given that it wasn't accidental, I'll simply adjust this part of the commit message.
implement --add and --delete options to maintain the table and add or delete specific ports.
The option --clear PIF forces the clearing of a table, instead.
These options can be combined arbitrarily and are handled as sequential commands, as now described in pesto(1).
If no option is given before forwarding specifiers for a matching table, the command line is interpreted as a replacement of the existing rules.
To this end:
- there's no protocol change, as pesto is anyway sending updated copies of the table
- the forwarding table functions now include a new fwd_rule_del(), which deletes existing rule only if a matching one is found
- a trivial fwd_rule_clear() is factored out from the existing conf_handler() implementation, so that it can be directly used in pesto
The entry points for parsing of port specifiers now take an additional 'del' parameter which is passed down all the way before reaching the fwd_rule_add() implementation. If a rule should be deleted, at that point, fwd_rule_del() is called instead.
Signed-off-by: Stefano Brivio
--- conf.c | 26 ++++++---------- fwd_rule.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++------- fwd_rule.h | 4 ++- pesto.1 | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++ pesto.c | 53 ++++++++++++++++++++++++++++--- 5 files changed, 227 insertions(+), 32 deletions(-) diff --git a/conf.c b/conf.c index 3f48793..909c34c 100644 --- a/conf.c +++ b/conf.c @@ -1849,16 +1849,16 @@ void conf(struct ctx *c, int argc, char **argv)
if (name == 't') { opt_t = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'u') { opt_u = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_HOST]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_HOST]); } else if (name == 'T') { opt_T = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } else if (name == 'U') { opt_U = true; - fwd_rule_parse(name, optarg, c->fwd[PIF_SPLICE]); + fwd_rule_parse(name, false, optarg, c->fwd[PIF_SPLICE]); } } while (name != -1);
@@ -1910,13 +1910,13 @@ void conf(struct ctx *c, int argc, char **argv)
if (c->mode == MODE_PASTA) { if (!opt_t) - fwd_rule_parse('t', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('t', false, "auto", c->fwd[PIF_HOST]); if (!opt_T) - fwd_rule_parse('T', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('T', false, "auto", c->fwd[PIF_SPLICE]); if (!opt_u) - fwd_rule_parse('u', "auto", c->fwd[PIF_HOST]); + fwd_rule_parse('u', false, "auto", c->fwd[PIF_HOST]); if (!opt_U) - fwd_rule_parse('U', "auto", c->fwd[PIF_SPLICE]); + fwd_rule_parse('U', false, "auto", c->fwd[PIF_SPLICE]); }
conf_sock_listen(c); @@ -2135,14 +2135,8 @@ void conf_handler(struct ctx *c, uint32_t events) unsigned pif;
/* Clear pending tables */ - for (pif = 0; pif < PIF_NUM_TYPES; pif++) { - struct fwd_table *fwd = c->fwd_pending[pif]; - - if (!fwd) - continue; - fwd->count = 0; - fwd->sock_count = 0; - } + for (pif = 0; pif < PIF_NUM_TYPES; pif++) + fwd_rule_clear(c->fwd_pending[pif]);
/* FIXME: this could block indefinitely if the client doesn't * write as much as it should diff --git a/fwd_rule.c b/fwd_rule.c index 03e8e80..eb9a601 100644 --- a/fwd_rule.c +++ b/fwd_rule.c @@ -180,6 +180,66 @@ static bool fwd_rule_conflicts(const struct fwd_rule *a, const struct fwd_rule * return true; }
+/** + * fwd_rule_clear() - Clear a forwarding table + * @fwd: Table to clear (might be NULL) + */ +void fwd_rule_clear(struct fwd_table *fwd) +{ + if (!fwd) + return; +
Not essential, but I wonder if it would be wise to verify that there are no currently open sockets associated with any of the rules.
With a loop, I suppose. I can add it as a TODO comment because I guess it would be good to handle that case (open sockets left) for fwd_rule_del() as well, and a part of the implementation can probably be common.
+ fwd->count = 0; + fwd->sock_count = 0; +} + +/** + * fwd_rule_del() - Partially validate and delete a rule from a forwarding table + * @fwd: Table to delete from + * @rule: Rule to delete (must match an existing rule) + * + * Return: 0 on success, negative error code on failure (-ENOENT if not found) + * + * NOTE: This function can't be used for a forwarding table with valid sockets + * stored in fwd->rulesocks.
The exact meaning of this isn't very clear to me. Does "valid" mean "open" or something else?
It means valid at some point, not necessarily open right now. I'll change it to "open" for clarity.
I'm not sure what "valid at some point" means, either.
That it was a valid socket file descriptor (an open one) at some point.
I think what you're getting at is that every entry in fwd->socks[] must be -1. Or at least every entry with index in [0,sock_count)
Yes.
+ */ +static int fwd_rule_del(struct fwd_table *fwd, const struct fwd_rule *rule) +{ + unsigned num, i; + + for (i = 0; i < fwd->count; i++) { + if (fwd_rule_conflicts(rule, &fwd->rules[i])) + break; + }
So, this deletes any conflicting rule, not only exact matches. That's not very clear from the description of @rule.
It deletes the first one
Oh, good point. Which actually elevates this to a bug, not just a debate about the best semantics, because...
(but given that fwd_rule_conflicts() is called on insertion, there should be a single one).
... that's not correct. "conflicts" is not transitive, so (for example) in the cases below: -t 1000-2000 -t 4000-5000 --delete -t 500-5500 -t 127.0.0.1/100 -t 127.0.0.2/100 --delete -t 100 The deleted rule conflicts with both the added rules, but they don't conflict with each other.
Right, yes, for partially overlapping rules that's true. But that's not what Podman needs right now, so I think it can be fixed later.
The second example involves no ranges at all.
I don't think "delete all conflicting rules" is a great either, since it means that: -t 1000-1999 -t 2000-2999 --delete -t 1500-2500 maps nothing at all, which seems pretty surprising.
It's good enough for our purposes right now, even though we might want to make that more sophisticated in the future. I'll change the description of @rule.
I really think the current behaviour is too confusing. Only deleting exact matches (and giving an error if there's a conflict that's not an exact match) I think *is* good enough for now, so that's what I'd suggest.
...except that it's not implemented by any function and it's not exactly trivial either, and delaying the implementation further makes this useless (at least for Podman, which we can approximate to "essentially useless"), so I'd rather go with something that doesn't take care of partially overlapping ranges, rather than no feature at all.
-- Stefano
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
participants (3)
-
David Gibson
-
Laurent Vivier
-
Stefano Brivio