[PATCH v3 00/11] Introduce multiple addresses

Jon Maloy

30 Jan 2026 30 Jan '26

10:44 p.m.

This version contains what I perceive as the least controversial parts of my previous RFC series. It basically makes address handling behave like before, but now allowing multiple addresses both at the host side and the guest side. v2: - Added the earlier standalone CIDR commit to the head of the series. - Replaced the guest namespace interface subscriptions with just an address observation feature, so that it works with both PASTA and PASST. - Unified 'no_copy_addrs' and 'copy_addrs' code paths, as suggested by David G. - Multiple other changes, also based on feedback from David. - Removed the host interface subscription patches, -for now. I intend to re-add them once this series is applied. - Outstanding question: When do we add an IPv4 link local address to the guest? Only in local/opaque mode? Only when explicitly requested? Always? v3: - Unified the IPv4 and IPv6 arrays into one array - Changed prefix_len to always eb in IPv6/IpV4 mapped format - Updated migration protocol to v3, handling multiple addresses - Many other smaller changes, based on feedback from the PASST team Jon Maloy (11): conf: Support CIDR notation for -a/--address option ip: Add IN4_MASK() macro for IPv4 netmask calculation ip: Introduce unified multi-address data structures fwd: Check all configured addresses in guest accessibility functions arp: Check all configured addresses in ARP filtering pasta: Extract pasta_ns_conf_ip4/6() to reduce nesting conf: Allow multiple -a/--address options per address family migrate: Rename v1 address functions to v2 for clarity ip: Track observed guest IPv4 addresses in unified address array ip: Track observed guest IPv6 addresses in unified address array conf: Select addresses for DHCP and NDP distribution arp.c | 17 ++-- conf.c | 249 ++++++++++++++++++++++++++++++++---------------------- conf.h | 6 ++ dhcp.c | 23 +++-- dhcp.h | 2 +- dhcpv6.c | 23 +++-- dhcpv6.h | 2 +- fwd.c | 216 ++++++++++++++++++++++++++++++++++++++-------- fwd.h | 6 ++ inany.c | 68 +++++++++++++++ inany.h | 1 + ip.c | 22 +++++ ip.h | 9 ++ migrate.c | 244 ++++++++++++++++++++++++++++++++++++++++++++++++---- ndp.c | 19 +++-- ndp.h | 4 +- passt.1 | 17 ++-- passt.h | 134 ++++++++++++++++++++++++++--- pasta.c | 197 ++++++++++++++++++++++++------------------ tap.c | 117 +++++++++++++++++++++---- 20 files changed, 1080 insertions(+), 296 deletions(-) -- 2.52.0

Show replies by date

Jon Maloy

30 Jan 30 Jan

...

Extend the -a/--address option to accept addresses in CIDR notation (e.g., 192.168.1.1/24 or 2001:db8::1/64) as an alternative to using separate -a and -n options.

We add a new inany_prefix_pton() helper function that: - Parses address strings with optional /prefix_len suffix - Validates prefix length based on address family (0-32 for IPv4, 0-128 for IPv6), including handling of IPv4-to-IPv6 mapping case. - Returns identified address family, if any.

For IPv4, the prefix length is stored in ip4.prefix_len when provided. Mixing -n and CIDR notation results in an error to catch likely user mistakes.

Also fix a bug in conf_ip4_prefix() that was incorrectly using the global 'optarg' instead of its 'arg' parameter.

Signed-off-by: Jon Maloy

--- v3: Fixes after feedback from Laurent, David and Stefano Notably, updated man page for the -a option

v4: Fixes based on feedback from David G: - Handling prefix length adjustment when IPv4-to-IPv6 mapping - Removed redundant !IN6_IS_ADDR_V4MAPPED(&addr.a6) test - Simplified tests of acceptable address types - Merged documentation and code commits - Some documentation text clarifications

v5: - Moved address/prefix parsing into a refactored inany_prefix_pton() function. - inany_prefix_pton() now only caluclates IPv6 style prefix lengths - Stricter distinction between error causes. - Some refactoring of the 'case a:' branch in conf() - Some small fixes in passt.1 --- conf.c | 58 +++++++++++++++++++++++++++++------------------- inany.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ inany.h | 1 + ip.c | 21 ++++++++++++++++++ ip.h | 2 ++ passt.1 | 17 ++++++++++----- 6 files changed, 139 insertions(+), 28 deletions(-)

diff --git a/conf.c b/conf.c index 2942c8c..98d5d17 100644 --- a/conf.c +++ b/conf.c @@ -682,7 +682,7 @@ static int conf_ip4_prefix(const char *arg) return -1; } else { errno = 0; - len = strtoul(optarg, NULL, 0); + len = strtoul(arg, NULL, 0); if (len > 32 || errno) return -1; } @@ -896,7 +896,7 @@ static void usage(const char *name, FILE *f, int status) " a zero value disables assignment\n" " default: 65520: maximum 802.3 MTU minus 802.3 header\n" " length, rounded to 32 bits (IPv4 words)\n" - " -a, --address ADDR Assign IPv4 or IPv6 address ADDR\n" + " -a, --address ADDR Assign IPv4 or IPv6 address ADDR[/PREFIXLEN]\n" " can be specified zero to two times (for IPv4 and IPv6)\n" " default: use addresses from interface with default route\n" " -n, --netmask MASK Assign IPv4 MASK, dot-decimal or bits\n" @@ -1498,6 +1498,7 @@ void conf(struct ctx *c, int argc, char **argv) const char *optstring = "+dqfel:hs:F:I:p:P:m:a:n:M:g:i:o:D:S:H:461t:u:T:U:"; const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt"; char userns[PATH_MAX] = { 0 }, netns[PATH_MAX] = { 0 }; + bool prefix_from_cidr = false, prefix_from_opt = false; bool copy_addrs_opt = false, copy_routes_opt = false; enum fwd_ports_mode fwd_default = FWD_NONE; bool v4_only = false, v6_only = false; @@ -1808,35 +1809,46 @@ void conf(struct ctx *c, int argc, char **argv) c->mtu = mtu; break; } - case 'a': - if (inet_pton(AF_INET6, optarg, &c->ip6.addr) && - !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr) && - !IN6_IS_ADDR_LOOPBACK(&c->ip6.addr) && - !IN6_IS_ADDR_V4MAPPED(&c->ip6.addr) && - !IN6_IS_ADDR_V4COMPAT(&c->ip6.addr) && - !IN6_IS_ADDR_MULTICAST(&c->ip6.addr)) { - if (c->mode == MODE_PASTA) - c->ip6.no_copy_addrs = true; - break; - } - - if (inet_pton(AF_INET, optarg, &c->ip4.addr) && - !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.addr) && - !IN4_IS_ADDR_BROADCAST(&c->ip4.addr) && - !IN4_IS_ADDR_LOOPBACK(&c->ip4.addr) && - !IN4_IS_ADDR_MULTICAST(&c->ip4.addr)) { + case 'a': { + union inany_addr addr = { 0 };

You're unconditionally calling inany_prefix_pton(), so you shouldn't need to initialise this.

...

+ int prefix_len = 0;

Or this.

...

+ int af; + + af = inany_prefix_pton(optarg, &addr, &prefix_len);

You need to check for errors from inany_prefix_pton().

...

+ if (inany_is_unspecified(&addr) || + inany_is_multicast(&addr) || + inany_is_loopback(&addr) || + IN6_IS_ADDR_V4COMPAT(&addr.a6)) + die("Invalid address: %s", optarg);

For sanity / extensibility, it probably also makes to fail here if af is not either AF_INET or AF_INET6.

...

+ + if (prefix_len && !prefix_from_opt) + prefix_from_cidr = true; + else if (prefix_len) + die("Can't mix CIDR with -n"); + + if (af == AF_INET) { + c->ip4.addr = *inany_v4(&addr); + c->ip4.prefix_len = prefix_len ? prefix_len - 96 : + ip4_class_prefix_len(&c->ip4.addr); if (c->mode == MODE_PASTA) c->ip4.no_copy_addrs = true; - break; + } else if (af == AF_INET6) { + c->ip6.addr = addr.a6; + if (c->mode == MODE_PASTA) + c->ip6.no_copy_addrs = true; + } else { + die("Invalid prefix length: %d", prefix_len);

This error message does not seem to match the conditions that trigger it.

...

} - - die("Invalid address: %s", optarg); break; + } case 'n': + if (prefix_from_cidr) + die("Can't use both -n and CIDR prefix length"); c->ip4.prefix_len = conf_ip4_prefix(optarg); if (c->ip4.prefix_len < 0) die("Invalid netmask: %s", optarg); - + prefix_from_opt = true; break; case 'M': parse_mac(c->our_tap_mac, optarg); diff --git a/inany.c b/inany.c index 7680439..00d44e0 100644 --- a/inany.c +++ b/inany.c @@ -11,6 +11,7 @@ #include #include #include +#include

#include "util.h" #include "ip.h" @@ -57,3 +58,70 @@ int inany_pton(const char *src, union inany_addr *dst)

return 0; } + +/** inany_prefix_pton - Parse an IPv[46] address with prefix length adjustment

"adjustment" doesn't make sense to me here.

...

+ * @src: IPv[46] address string + * @dst: output buffer, filled with parsed address + * @prefix_len: pointer to prefix length + * + * Return: AF_INET for IPv4, AF_INET6 for IPv6, -1 on error

I'm not particularly convinced that returning the address family here is useful, since the caller can call inany_v4() just as easily as we can. But if we do return the address family, the it probably makes more sense to use AF_UNSPEC for errors, rather than -1, since it's not - strictly speaking - guaranteed that one of the AF_* constants has the value -1.

...

+ */ +int inany_prefix_pton(char *src, union inany_addr *dst, int *prefix_len)

If we are returning an address family, the return type should be sa_family_t.

...

+{ + bool mapped = false; + struct in6_addr a6; + struct in_addr a4; + char *slash; + char *end; + int af; + + *prefix_len = 0;

As noted below, 0 is not a suitable value for "missing prefix", because 0 length prefixes are real and important. I think it would be better to make the prefix length non-optional for this functional. The caller can fall back to inany_pton() if this fails. That makes this function more easily reusable for cases where we _require_ a prefix length.

...

+ + /* Check for presence of /prefix_len suffix */ + slash = strchr(src, '/'); + if (slash) + *slash = '\0'; + + /* Read address */ + if (inet_pton(AF_INET, src, &a4)) { + inany_from_af(dst, AF_INET, &a4); + af = AF_INET; + } else if (inet_pton(AF_INET6, src, &a6)) { + inany_from_af(dst, AF_INET6, &a6); + af = AF_INET6; + if (inany_v4(dst)) + mapped = true; + } else { + memset(dst, 0, sizeof(*dst)); + return -1; + } + + if (!slash) + return mapped ? AF_INET : af;

You can avoid messing around with the mapped temporary by unconditionally deriving the address family from inany_v4().

...

+ + /* Read prefix_len - /0 is not allowed */

/0 should be allowed. It doesn't make much sense for -a, but it's valid and important if we use this in future for routes (a 0 length prefix indicates a default route).

...

+ errno = 0; + *prefix_len = strtoul(slash + 1, &end, 10); + if (errno || *end || *prefix_len == 0) + return -1; + + if (mapped) { + /* IPv4-mapped: prefix already in IPv6 format, must be 96-128 */ + if (*prefix_len < 96 || *prefix_len > 128) + return -1; + return AF_INET; + } + + if (af == AF_INET) { + /* Native IPv4: convert to IPv6 format */ + if (*prefix_len > 32) + return -1; + *prefix_len += 96;

If you make this adjustment before the previous stanza you again don't need the 'mapped' local and can use inany_v4().

...

+ return AF_INET; + } + + /* Native IPv6: keep as-is */ + if (*prefix_len > 128) + return -1;

And if you move this before the if (mapped) stanza, you can remove the duplicated check for > 128.

...

+ return AF_INET6; +} diff --git a/inany.h b/inany.h index 61b36fb..316ee44 100644 --- a/inany.h +++ b/inany.h @@ -295,5 +295,6 @@ static inline void inany_siphash_feed(struct siphash_state *state,

const char *inany_ntop(const union inany_addr *src, char *dst, socklen_t size); int inany_pton(const char *src, union inany_addr *dst); +int inany_prefix_pton(char *src, union inany_addr *dst, int *prefix_len);

#endif /* INANY_H */ diff --git a/ip.c b/ip.c index 9a7f4c5..40dc24e 100644 --- a/ip.c +++ b/ip.c @@ -13,6 +13,8 @@ */

#include +#include + #include "util.h" #include "ip.h"

@@ -67,3 +69,22 @@ found: *proto = nh; return true; } + +/** + * ip4_class_prefix_len() - Get class based prefix length for IPv4 address + * @addr: IPv4 address + * + * Return: prefix length based on address class, or 32 for other + */ +int ip4_class_prefix_len(const struct in_addr *addr) +{ + in_addr_t a = ntohl(addr->s_addr); + + if (IN_CLASSA(a)) + return 32 - IN_CLASSA_NSHIFT; + if (IN_CLASSB(a)) + return 32 - IN_CLASSB_NSHIFT; + if (IN_CLASSC(a)) + return 32 - IN_CLASSC_NSHIFT; + return 32; +} diff --git a/ip.h b/ip.h index 5830b92..bd28640 100644 --- a/ip.h +++ b/ip.h @@ -135,4 +135,6 @@ static const struct in_addr in4addr_broadcast = { 0xffffffff }; #define IPV6_MIN_MTU 1280 #endif

+int ip4_class_prefix_len(const struct in_addr *addr); + #endif /* IP_H */ diff --git a/passt.1 b/passt.1 index db0d662..53537c4 100644 --- a/passt.1 +++ b/passt.1 @@ -156,10 +156,14 @@ By default, the advertised MTU is 65520 bytes, that is, the maximum 802.3 MTU minus the length of a 802.3 header, rounded to 32 bits (IPv4 words).

.TP -.BR \-a ", " \-\-address " " \fIaddr +.BR \-a ", " \-\-address " " \fIaddr\fR[/\fIprefix_len\fR] Assign IPv4 \fIaddr\fR via DHCP (\fByiaddr\fR), or \fIaddr\fR via DHCPv6 (option 5) and an \fIaddr\fR-based prefix via NDP Router Advertisement (option type 3) for an IPv6 \fIaddr\fR. +An optional /\fIprefix_len\fR (1-32 for IPv4, 1-128 for IPv6) can be +appended in CIDR notation (e.g. 192.0.2.1/24). This is an alternative to +using the \fB-n\fR, \fB--netmask\fR option. Mixing CIDR notation with +\fB-n\fR results in an error. This option can be specified zero (for defaults) to two times (once for IPv4, once for IPv6). By default, assigned IPv4 and IPv6 addresses are taken from the host interfaces @@ -172,10 +176,13 @@ is assigned for IPv4, and no additional address will be assigned for IPv6. .TP .BR \-n ", " \-\-netmask " " \fImask Assign IPv4 netmask \fImask\fR, expressed as dot-decimal or number of bits, via -DHCP (option 1). -By default, the netmask associated to the host address matching the assigned one -is used. If there's no matching address on the host, the netmask is determined -according to the CIDR block of the assigned address (RFC 4632). +DHCP (option 1). Alternatively, the prefix length can be specified using CIDR +notation with the \fB-a\fR, \fB--address\fR option (e.g. \fB-a\fR 192.0.2.1/24). +Mixing \fB-n\fR with CIDR notation results in an error. +If no address is indicated, the netmask associated with the adopted host address, +if any, is used. If an address is indicated, but without a prefix length, the +netmask is determined based on the corresponding network class. In all other +cases, the netmask is determined by using the indicated prefix length.

.TP .BR \-M ", " \-\-mac-addr " " \fIaddr -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

2:16 p.m.

New subject: [PATCH v3 04/11] fwd: Check all configured addresses in guest accessibility functions

On Fri, Jan 30, 2026 at 04:44:40PM -0500, Jon Maloy wrote:

...

As a preparation for handling multiple addresses, we update fwd_guest_accessible4() and fwd_guest_accessible6() to check against all addresses in the unified addrs[] array using the for_each_addr() macro.

This ensures that when multiple addresses are configured via -a options, inbound traffic for any of them is correctly detected as having no valid forwarding path, and subsequently dropped. This occurs when a peer address collides with an address the guest is using, and we have no translation for it.

Signed-off-by: Jon Maloy

--- v2: Updated commit log to make it clearer v3: Adapted to changes earlier in the series --- fwd.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fwd.c b/fwd.c index 54248a3..20d581d 100644 --- a/fwd.c +++ b/fwd.c @@ -502,6 +502,8 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini) static bool fwd_guest_accessible4(const struct ctx *c, const struct in_addr *addr)

With the changes to this point, there's no longer a point to having separate fwd_guest_accessible4() and fwd_guest_accessible6() functions. fwd_guest_accessible() can check against the combined list in a single pass. This is an example of the simplfications that I think will outweigh the complications introduced by using merged list for v4 and v6 addresses.

...

{ + const struct inany_addr_entry *e; + if (IN4_IS_ADDR_LOOPBACK(addr)) return false;

@@ -513,12 +515,15 @@ static bool fwd_guest_accessible4(const struct ctx *c, if (IN4_IS_ADDR_UNSPECIFIED(addr)) return false;

- /* For IPv4, addr_seen is initialised to addr, so is always a valid - * address + /* Check against all configured guest addresses */ + for_each_addr(c, e, AF_INET) + if (IN4_ARE_ADDR_EQUAL(addr, inany_v4(&e->addr))) + return false; + + /* Also check addr_seen: it tracks the address the guest is actually + * using, which may differ from configured addresses. */ - if ((first_v4(c) && - IN4_ARE_ADDR_EQUAL(addr, inany_v4(&first_v4(c)->addr))) || - IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) + if (IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) return false;

return true; @@ -535,11 +540,15 @@ static bool fwd_guest_accessible4(const struct ctx *c, static bool fwd_guest_accessible6(const struct ctx *c, const struct in6_addr *addr) { + const struct inany_addr_entry *e; + if (IN6_IS_ADDR_LOOPBACK(addr)) return false;

- if (first_v6(c) && IN6_ARE_ADDR_EQUAL(addr, &first_v6(c)->addr.a6)) - return false; + /* Check against all configured guest addresses */ + for_each_addr(c, e, AF_INET6) + if (IN6_ARE_ADDR_EQUAL(addr, &e->addr.a6)) + return false;

/* For IPv6, addr_seen starts unspecified, because we don't know what LL * address the guest will take until we see it. Only check against it @@ -714,7 +723,7 @@ bool nat_inbound(const struct ctx *c, const union inany_addr *addr, first_v4(c) && inany_equals(addr, &first_v4(c)->addr)) { *translated = inany_from_v4(c->ip4.map_guest_addr); } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_guest_addr) && - first_v6(c) && inany_equals6(addr, &first_v6(c)->addr.a6)) { + first_v6(c) && inany_equals(addr, &first_v6(c)->addr)) { translated->a6 = c->ip6.map_guest_addr; } else if (fwd_guest_accessible(c, addr)) { *translated = *addr; -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

1:52 p.m.

New subject: [PATCH v3 02/11] ip: Add IN4_MASK() macro for IPv4 netmask calculation

On Fri, Jan 30, 2026 at 04:44:38PM -0500, Jon Maloy wrote:

...

Add a convenience macro to compute an IPv4 netmask from a prefix length. This simplifies netmask calculations throughout the codebase.

Signed-off-by: Jon Maloy

Reviewed-by: David Gibson

...

--- conf.c | 2 +- dhcp.c | 2 +- ip.h | 2 ++ 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/conf.c b/conf.c index 98d5d17..5188c02 100644 --- a/conf.c +++ b/conf.c @@ -1147,7 +1147,7 @@ static void conf_print(const struct ctx *c) if (!c->no_dhcp) { uint32_t mask;

- mask = htonl(0xffffffff << (32 - c->ip4.prefix_len)); + mask = IN4_MASK(c->ip4.prefix_len);

info("DHCP:"); info(" assign: %s", diff --git a/dhcp.c b/dhcp.c index 6b9c2e3..c552f01 100644 --- a/dhcp.c +++ b/dhcp.c @@ -404,7 +404,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data)

info(" from %s", eth_ntop(m->chaddr, macstr, sizeof(macstr)));

- mask.s_addr = htonl(0xffffffff << (32 - c->ip4.prefix_len)); + mask.s_addr = IN4_MASK(c->ip4.prefix_len); memcpy(opts[1].s, &mask, sizeof(mask)); memcpy(opts[3].s, &c->ip4.guest_gw, sizeof(c->ip4.guest_gw)); memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr)); diff --git a/ip.h b/ip.h index bd28640..c829d84 100644 --- a/ip.h +++ b/ip.h @@ -17,6 +17,8 @@ (ntohl(((struct in_addr *)(a))->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET) #define IN4_IS_ADDR_MULTICAST(a) \ (IN_MULTICAST(ntohl(((struct in_addr *)(a))->s_addr))) +#define IN4_MASK(prefix_len) \ + (htonl(0xffffffff << (128 - (prefix_len)))) #define IN4_ARE_ADDR_EQUAL(a, b) \ (((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr) #define IN4ADDR_LOOPBACK_INIT \ -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

Jon Maloy

5 Feb 5 Feb

1:56 a.m.

New subject: [PATCH v3 01/11] conf: Support CIDR notation for -a/--address option

On 2026-02-04 07:50, David Gibson wrote:

...

On Fri, Jan 30, 2026 at 04:44:37PM -0500, Jon Maloy wrote:

...
Extend the -a/--address option to accept addresses in CIDR notation (e.g., 192.168.1.1/24 or 2001:db8::1/64) as an alternative to using separate -a and -n options.

We add a new inany_prefix_pton() helper function that: - Parses address strings with optional /prefix_len suffix - Validates prefix length based on address family (0-32 for IPv4, 0-128 for IPv6), including handling of IPv4-to-IPv6 mapping case. - Returns identified address family, if any.

For IPv4, the prefix length is stored in ip4.prefix_len when provided. Mixing -n and CIDR notation results in an error to catch likely user mistakes.

Also fix a bug in conf_ip4_prefix() that was incorrectly using the global 'optarg' instead of its 'arg' parameter.

Signed-off-by: Jon Maloy

--- v3: Fixes after feedback from Laurent, David and Stefano Notably, updated man page for the -a option

v4: Fixes based on feedback from David G: - Handling prefix length adjustment when IPv4-to-IPv6 mapping - Removed redundant !IN6_IS_ADDR_V4MAPPED(&addr.a6) test - Simplified tests of acceptable address types - Merged documentation and code commits - Some documentation text clarifications

v5: - Moved address/prefix parsing into a refactored inany_prefix_pton() function. - inany_prefix_pton() now only caluclates IPv6 style prefix lengths - Stricter distinction between error causes. - Some refactoring of the 'case a:' branch in conf() - Some small fixes in passt.1 --- conf.c | 58 +++++++++++++++++++++++++++++------------------- inany.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ inany.h | 1 + ip.c | 21 ++++++++++++++++++ ip.h | 2 ++ passt.1 | 17 ++++++++++----- 6 files changed, 139 insertions(+), 28 deletions(-)

diff --git a/conf.c b/conf.c index 2942c8c..98d5d17 100644 --- a/conf.c +++ b/conf.c @@ -682,7 +682,7 @@ static int conf_ip4_prefix(const char *arg) return -1; } else { errno = 0; - len = strtoul(optarg, NULL, 0); + len = strtoul(arg, NULL, 0); if (len > 32 || errno) return -1; } @@ -896,7 +896,7 @@ static void usage(const char *name, FILE *f, int status) " a zero value disables assignment\n" " default: 65520: maximum 802.3 MTU minus 802.3 header\n" " length, rounded to 32 bits (IPv4 words)\n" - " -a, --address ADDR Assign IPv4 or IPv6 address ADDR\n" + " -a, --address ADDR Assign IPv4 or IPv6 address ADDR[/PREFIXLEN]\n" " can be specified zero to two times (for IPv4 and IPv6)\n" " default: use addresses from interface with default route\n" " -n, --netmask MASK Assign IPv4 MASK, dot-decimal or bits\n" @@ -1498,6 +1498,7 @@ void conf(struct ctx *c, int argc, char **argv) const char *optstring = "+dqfel:hs:F:I:p:P:m:a:n:M:g:i:o:D:S:H:461t:u:T:U:"; const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt"; char userns[PATH_MAX] = { 0 }, netns[PATH_MAX] = { 0 }; + bool prefix_from_cidr = false, prefix_from_opt = false; bool copy_addrs_opt = false, copy_routes_opt = false; enum fwd_ports_mode fwd_default = FWD_NONE; bool v4_only = false, v6_only = false; @@ -1808,35 +1809,46 @@ void conf(struct ctx *c, int argc, char **argv) c->mtu = mtu; break; } - case 'a': - if (inet_pton(AF_INET6, optarg, &c->ip6.addr) && - !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr) && - !IN6_IS_ADDR_LOOPBACK(&c->ip6.addr) && - !IN6_IS_ADDR_V4MAPPED(&c->ip6.addr) && - !IN6_IS_ADDR_V4COMPAT(&c->ip6.addr) && - !IN6_IS_ADDR_MULTICAST(&c->ip6.addr)) { - if (c->mode == MODE_PASTA) - c->ip6.no_copy_addrs = true; - break; - } - - if (inet_pton(AF_INET, optarg, &c->ip4.addr) && - !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.addr) && - !IN4_IS_ADDR_BROADCAST(&c->ip4.addr) && - !IN4_IS_ADDR_LOOPBACK(&c->ip4.addr) && - !IN4_IS_ADDR_MULTICAST(&c->ip4.addr)) { + case 'a': { + union inany_addr addr = { 0 };

You're unconditionally calling inany_prefix_pton(), so you shouldn't need to initialise this.

Yes, to guarantee I get an invalid address in case the parsing fails. However, I should do it inside the funtion instead.

...

...
+ int prefix_len = 0;

Or this.

Same.

...

...
+ int af; + + af = inany_prefix_pton(optarg, &addr, &prefix_len);

You need to check for errors from inany_prefix_pton().

I do that. Further down.

...

...
+ if (inany_is_unspecified(&addr) || + inany_is_multicast(&addr) || + inany_is_loopback(&addr) || + IN6_IS_ADDR_V4COMPAT(&addr.a6)) + die("Invalid address: %s", optarg);

For sanity / extensibility, it probably also makes to fail here if af is not either AF_INET or AF_INET6.

This is what I do further down.

...

...
+ + if (prefix_len && !prefix_from_opt) + prefix_from_cidr = true; + else if (prefix_len) + die("Can't mix CIDR with -n"); + + if (af == AF_INET) { + c->ip4.addr = *inany_v4(&addr); + c->ip4.prefix_len = prefix_len ? prefix_len - 96 : + ip4_class_prefix_len(&c->ip4.addr); if (c->mode == MODE_PASTA) c->ip4.no_copy_addrs = true; - break; + } else if (af == AF_INET6) { + c->ip6.addr = addr.a6; + if (c->mode == MODE_PASTA) + c->ip6.no_copy_addrs = true; + } else { + die("Invalid prefix length: %d", prefix_len);

This error message does not seem to match the conditions that trigger it.

The only way you can obtain a valid address and a return value different from AF_INET and AF_INET6 is that the prefix_len parsing failed or the value was invalid. All the above is logically correct, although we can discuss if it is the cleanest and best algorithm. The fact that you misunderstand this is a sign it is not.

...

...
} - - die("Invalid address: %s", optarg); break; + } case 'n': + if (prefix_from_cidr) + die("Can't use both -n and CIDR prefix length"); c->ip4.prefix_len = conf_ip4_prefix(optarg); if (c->ip4.prefix_len < 0) die("Invalid netmask: %s", optarg); - + prefix_from_opt = true; break; case 'M': parse_mac(c->our_tap_mac, optarg); diff --git a/inany.c b/inany.c index 7680439..00d44e0 100644 --- a/inany.c +++ b/inany.c @@ -11,6 +11,7 @@ #include #include #include +#include

#include "util.h" #include "ip.h" @@ -57,3 +58,70 @@ int inany_pton(const char *src, union inany_addr *dst)

return 0; } + +/** inany_prefix_pton - Parse an IPv[46] address with prefix length adjustment

"adjustment" doesn't make sense to me here.

...
+ * @src: IPv[46] address string + * @dst: output buffer, filled with parsed address + * @prefix_len: pointer to prefix length + * + * Return: AF_INET for IPv4, AF_INET6 for IPv6, -1 on error

I'm not particularly convinced that returning the address family here is useful, since the caller can call inany_v4() just as easily as we can. But if we do return the address family, the it probably makes more sense to use AF_UNSPEC for errors, rather than -1, since it's not - strictly speaking - guaranteed that one of the AF_* constants has the value -1.

AF_UNSPEC is a valid code, meaning "any" or "don't care". That doesn't sound like a suitable error code, and we do want to know if the parsing failed. I think I'll go for just a 0/-1 return value instead.

...

...
+ */ +int inany_prefix_pton(char *src, union inany_addr *dst, int *prefix_len)

If we are returning an address family, the return type should be sa_family_t.

Ok.

...

...
+{ + bool mapped = false; + struct in6_addr a6; + struct in_addr a4; + char *slash; + char *end; + int af; + + *prefix_len = 0;

As noted below, 0 is not a suitable value for "missing prefix", because 0 length prefixes are real and important.

0 is a valid value, but the question is if /0 ia a valid suffix in our CIDR format. I see below that you think so.

...

I think it would be better to make the prefix length non-optional for this functional. The caller can fall back to inany_pton() if this fails. That makes this function more easily reusable for cases where we _require_ a prefix length.

I can try that.

...

...
+ + /* Check for presence of /prefix_len suffix */ + slash = strchr(src, '/'); + if (slash) + *slash = '\0'; + + /* Read address */ + if (inet_pton(AF_INET, src, &a4)) { + inany_from_af(dst, AF_INET, &a4); + af = AF_INET; + } else if (inet_pton(AF_INET6, src, &a6)) { + inany_from_af(dst, AF_INET6, &a6); + af = AF_INET6; + if (inany_v4(dst)) + mapped = true; + } else { + memset(dst, 0, sizeof(*dst)); + return -1; + } + + if (!slash) + return mapped ? AF_INET : af;

You can avoid messing around with the mapped temporary by unconditionally deriving the address family from inany_v4().

Ok.

...

...
+ + /* Read prefix_len - /0 is not allowed */

/0 should be allowed. It doesn't make much sense for -a, but it's valid and important if we use this in future for routes (a 0 length prefix indicates a default route).

ok. That changes a few things.

...

...
+ errno = 0; + *prefix_len = strtoul(slash + 1, &end, 10); + if (errno || *end || *prefix_len == 0) + return -1; + + if (mapped) { + /* IPv4-mapped: prefix already in IPv6 format, must be 96-128 */ + if (*prefix_len < 96 || *prefix_len > 128) + return -1; + return AF_INET; + } + + if (af == AF_INET) { + /* Native IPv4: convert to IPv6 format */ + if (*prefix_len > 32) + return -1; + *prefix_len += 96;

If you make this adjustment before the previous stanza you again don't need the 'mapped' local and can use inany_v4().

Good point.

...

...
+ return AF_INET; + } + + /* Native IPv6: keep as-is */ + if (*prefix_len > 128) + return -1;

And if you move this before the if (mapped) stanza, you can remove the duplicated check for > 128.

I'll give it a try. ///jon

...

...
+ return AF_INET6; +} diff --git a/inany.h b/inany.h index 61b36fb..316ee44 100644 --- a/inany.h +++ b/inany.h @@ -295,5 +295,6 @@ static inline void inany_siphash_feed(struct siphash_state *state,

const char *inany_ntop(const union inany_addr *src, char *dst, socklen_t size); int inany_pton(const char *src, union inany_addr *dst); +int inany_prefix_pton(char *src, union inany_addr *dst, int *prefix_len);

#endif /* INANY_H */ diff --git a/ip.c b/ip.c index 9a7f4c5..40dc24e 100644 --- a/ip.c +++ b/ip.c @@ -13,6 +13,8 @@ */

#include +#include + #include "util.h" #include "ip.h"

@@ -67,3 +69,22 @@ found: *proto = nh; return true; } + +/** + * ip4_class_prefix_len() - Get class based prefix length for IPv4 address + * @addr: IPv4 address + * + * Return: prefix length based on address class, or 32 for other + */ +int ip4_class_prefix_len(const struct in_addr *addr) +{ + in_addr_t a = ntohl(addr->s_addr); + + if (IN_CLASSA(a)) + return 32 - IN_CLASSA_NSHIFT; + if (IN_CLASSB(a)) + return 32 - IN_CLASSB_NSHIFT; + if (IN_CLASSC(a)) + return 32 - IN_CLASSC_NSHIFT; + return 32; +} diff --git a/ip.h b/ip.h index 5830b92..bd28640 100644 --- a/ip.h +++ b/ip.h @@ -135,4 +135,6 @@ static const struct in_addr in4addr_broadcast = { 0xffffffff }; #define IPV6_MIN_MTU 1280 #endif

+int ip4_class_prefix_len(const struct in_addr *addr); + #endif /* IP_H */ diff --git a/passt.1 b/passt.1 index db0d662..53537c4 100644 --- a/passt.1 +++ b/passt.1 @@ -156,10 +156,14 @@ By default, the advertised MTU is 65520 bytes, that is, the maximum 802.3 MTU minus the length of a 802.3 header, rounded to 32 bits (IPv4 words).

.TP -.BR \-a ", " \-\-address " " \fIaddr +.BR \-a ", " \-\-address " " \fIaddr\fR[/\fIprefix_len\fR] Assign IPv4 \fIaddr\fR via DHCP (\fByiaddr\fR), or \fIaddr\fR via DHCPv6 (option 5) and an \fIaddr\fR-based prefix via NDP Router Advertisement (option type 3) for an IPv6 \fIaddr\fR. +An optional /\fIprefix_len\fR (1-32 for IPv4, 1-128 for IPv6) can be +appended in CIDR notation (e.g. 192.0.2.1/24). This is an alternative to +using the \fB-n\fR, \fB--netmask\fR option. Mixing CIDR notation with +\fB-n\fR results in an error. This option can be specified zero (for defaults) to two times (once for IPv4, once for IPv6). By default, assigned IPv4 and IPv6 addresses are taken from the host interfaces @@ -172,10 +176,13 @@ is assigned for IPv4, and no additional address will be assigned for IPv6. .TP .BR \-n ", " \-\-netmask " " \fImask Assign IPv4 netmask \fImask\fR, expressed as dot-decimal or number of bits, via -DHCP (option 1). -By default, the netmask associated to the host address matching the assigned one -is used. If there's no matching address on the host, the netmask is determined -according to the CIDR block of the assigned address (RFC 4632). +DHCP (option 1). Alternatively, the prefix length can be specified using CIDR +notation with the \fB-a\fR, \fB--address\fR option (e.g. \fB-a\fR 192.0.2.1/24). +Mixing \fB-n\fR with CIDR notation results in an error. +If no address is indicated, the netmask associated with the adopted host address, +if any, is used. If an address is indicated, but without a prefix length, the +netmask is determined based on the corresponding network class. In all other +cases, the netmask is determined by using the indicated prefix length.

.TP .BR \-M ", " \-\-mac-addr " " \fIaddr -- 2.52.0

Jon Maloy

2:01 a.m.

New subject: [PATCH v3 04/11] fwd: Check all configured addresses in guest accessibility functions

On 2026-02-04 08:16, David Gibson wrote:

...

On Fri, Jan 30, 2026 at 04:44:40PM -0500, Jon Maloy wrote:

...
As a preparation for handling multiple addresses, we update fwd_guest_accessible4() and fwd_guest_accessible6() to check against all addresses in the unified addrs[] array using the for_each_addr() macro.

This ensures that when multiple addresses are configured via -a options, inbound traffic for any of them is correctly detected as having no valid forwarding path, and subsequently dropped. This occurs when a peer address collides with an address the guest is using, and we have no translation for it.

Signed-off-by: Jon Maloy

--- v2: Updated commit log to make it clearer v3: Adapted to changes earlier in the series --- fwd.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fwd.c b/fwd.c index 54248a3..20d581d 100644 --- a/fwd.c +++ b/fwd.c @@ -502,6 +502,8 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini) static bool fwd_guest_accessible4(const struct ctx *c, const struct in_addr *addr)

With the changes to this point, there's no longer a point to having separate fwd_guest_accessible4() and fwd_guest_accessible6() functions. fwd_guest_accessible() can check against the combined list in a single pass.

Absolutely. I have already identified several functions which can be unified in a similar way, and have already implemented some. But I still want to wait with such changes until this series has been applied. ///jon

...

This is an example of the simplfications that I think will outweigh the complications introduced by using merged list for v4 and v6 addresses.

...
{ + const struct inany_addr_entry *e; + if (IN4_IS_ADDR_LOOPBACK(addr)) return false;

@@ -513,12 +515,15 @@ static bool fwd_guest_accessible4(const struct ctx *c, if (IN4_IS_ADDR_UNSPECIFIED(addr)) return false;

- /* For IPv4, addr_seen is initialised to addr, so is always a valid - * address + /* Check against all configured guest addresses */ + for_each_addr(c, e, AF_INET) + if (IN4_ARE_ADDR_EQUAL(addr, inany_v4(&e->addr))) + return false; + + /* Also check addr_seen: it tracks the address the guest is actually + * using, which may differ from configured addresses. */ - if ((first_v4(c) && - IN4_ARE_ADDR_EQUAL(addr, inany_v4(&first_v4(c)->addr))) || - IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) + if (IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) return false;

return true; @@ -535,11 +540,15 @@ static bool fwd_guest_accessible4(const struct ctx *c, static bool fwd_guest_accessible6(const struct ctx *c, const struct in6_addr *addr) { + const struct inany_addr_entry *e; + if (IN6_IS_ADDR_LOOPBACK(addr)) return false;

- if (first_v6(c) && IN6_ARE_ADDR_EQUAL(addr, &first_v6(c)->addr.a6)) - return false; + /* Check against all configured guest addresses */ + for_each_addr(c, e, AF_INET6) + if (IN6_ARE_ADDR_EQUAL(addr, &e->addr.a6)) + return false;

/* For IPv6, addr_seen starts unspecified, because we don't know what LL * address the guest will take until we see it. Only check against it @@ -714,7 +723,7 @@ bool nat_inbound(const struct ctx *c, const union inany_addr *addr, first_v4(c) && inany_equals(addr, &first_v4(c)->addr)) { *translated = inany_from_v4(c->ip4.map_guest_addr); } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_guest_addr) && - first_v6(c) && inany_equals6(addr, &first_v6(c)->addr.a6)) { + first_v6(c) && inany_equals(addr, &first_v6(c)->addr)) { translated->a6 = c->ip6.map_guest_addr; } else if (fwd_guest_accessible(c, addr)) { *translated = *addr; -- 2.52.0

David Gibson

6 Feb 6 Feb

4:26 a.m.

New subject: [PATCH v3 01/11] conf: Support CIDR notation for -a/--address option

On Wed, Feb 04, 2026 at 07:56:35PM -0500, Jon Maloy wrote:

...

On 2026-02-04 07:50, David Gibson wrote:

...
On Fri, Jan 30, 2026 at 04:44:37PM -0500, Jon Maloy wrote:

...
Extend the -a/--address option to accept addresses in CIDR notation (e.g., 192.168.1.1/24 or 2001:db8::1/64) as an alternative to using separate -a and -n options.

We add a new inany_prefix_pton() helper function that: - Parses address strings with optional /prefix_len suffix - Validates prefix length based on address family (0-32 for IPv4, 0-128 for IPv6), including handling of IPv4-to-IPv6 mapping case. - Returns identified address family, if any.

For IPv4, the prefix length is stored in ip4.prefix_len when provided. Mixing -n and CIDR notation results in an error to catch likely user mistakes.

Also fix a bug in conf_ip4_prefix() that was incorrectly using the global 'optarg' instead of its 'arg' parameter.

Signed-off-by: Jon Maloy

--- v3: Fixes after feedback from Laurent, David and Stefano Notably, updated man page for the -a option

v4: Fixes based on feedback from David G: - Handling prefix length adjustment when IPv4-to-IPv6 mapping - Removed redundant !IN6_IS_ADDR_V4MAPPED(&addr.a6) test - Simplified tests of acceptable address types - Merged documentation and code commits - Some documentation text clarifications

v5: - Moved address/prefix parsing into a refactored inany_prefix_pton() function. - inany_prefix_pton() now only caluclates IPv6 style prefix lengths - Stricter distinction between error causes. - Some refactoring of the 'case a:' branch in conf() - Some small fixes in passt.1 --- conf.c | 58 +++++++++++++++++++++++++++++------------------- inany.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ inany.h | 1 + ip.c | 21 ++++++++++++++++++ ip.h | 2 ++ passt.1 | 17 ++++++++++----- 6 files changed, 139 insertions(+), 28 deletions(-)

diff --git a/conf.c b/conf.c index 2942c8c..98d5d17 100644 --- a/conf.c +++ b/conf.c @@ -682,7 +682,7 @@ static int conf_ip4_prefix(const char *arg) return -1; } else { errno = 0; - len = strtoul(optarg, NULL, 0); + len = strtoul(arg, NULL, 0); if (len > 32 || errno) return -1; } @@ -896,7 +896,7 @@ static void usage(const char *name, FILE *f, int status) " a zero value disables assignment\n" " default: 65520: maximum 802.3 MTU minus 802.3 header\n" " length, rounded to 32 bits (IPv4 words)\n" - " -a, --address ADDR Assign IPv4 or IPv6 address ADDR\n" + " -a, --address ADDR Assign IPv4 or IPv6 address ADDR[/PREFIXLEN]\n" " can be specified zero to two times (for IPv4 and IPv6)\n" " default: use addresses from interface with default route\n" " -n, --netmask MASK Assign IPv4 MASK, dot-decimal or bits\n" @@ -1498,6 +1498,7 @@ void conf(struct ctx *c, int argc, char **argv) const char *optstring = "+dqfel:hs:F:I:p:P:m:a:n:M:g:i:o:D:S:H:461t:u:T:U:"; const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt"; char userns[PATH_MAX] = { 0 }, netns[PATH_MAX] = { 0 }; + bool prefix_from_cidr = false, prefix_from_opt = false; bool copy_addrs_opt = false, copy_routes_opt = false; enum fwd_ports_mode fwd_default = FWD_NONE; bool v4_only = false, v6_only = false; @@ -1808,35 +1809,46 @@ void conf(struct ctx *c, int argc, char **argv) c->mtu = mtu; break; } - case 'a': - if (inet_pton(AF_INET6, optarg, &c->ip6.addr) && - !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr) && - !IN6_IS_ADDR_LOOPBACK(&c->ip6.addr) && - !IN6_IS_ADDR_V4MAPPED(&c->ip6.addr) && - !IN6_IS_ADDR_V4COMPAT(&c->ip6.addr) && - !IN6_IS_ADDR_MULTICAST(&c->ip6.addr)) { - if (c->mode == MODE_PASTA) - c->ip6.no_copy_addrs = true; - break; - } - - if (inet_pton(AF_INET, optarg, &c->ip4.addr) && - !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.addr) && - !IN4_IS_ADDR_BROADCAST(&c->ip4.addr) && - !IN4_IS_ADDR_LOOPBACK(&c->ip4.addr) && - !IN4_IS_ADDR_MULTICAST(&c->ip4.addr)) { + case 'a': { + union inany_addr addr = { 0 };

You're unconditionally calling inany_prefix_pton(), so you shouldn't need to initialise this.

Yes, to guarantee I get an invalid address in case the parsing fails. However, I should do it inside the funtion instead.

Right.

...

...
...
+ int prefix_len = 0;

Or this.

Same.

...
...
+ int af; + + af = inany_prefix_pton(optarg, &addr, &prefix_len);

You need to check for errors from inany_prefix_pton().

I do that. Further down.

Ok, but it's hard to follow when separated from the call like that.

...

...
...
+ if (inany_is_unspecified(&addr) || + inany_is_multicast(&addr) || + inany_is_loopback(&addr) || + IN6_IS_ADDR_V4COMPAT(&addr.a6)) + die("Invalid address: %s", optarg);

For sanity / extensibility, it probably also makes to fail here if af is not either AF_INET or AF_INET6.

This is what I do further down.

Ah, true.

...

...
...
+ + if (prefix_len && !prefix_from_opt) + prefix_from_cidr = true; + else if (prefix_len) + die("Can't mix CIDR with -n"); + + if (af == AF_INET) { + c->ip4.addr = *inany_v4(&addr); + c->ip4.prefix_len = prefix_len ? prefix_len - 96 : + ip4_class_prefix_len(&c->ip4.addr); if (c->mode == MODE_PASTA) c->ip4.no_copy_addrs = true; - break; + } else if (af == AF_INET6) { + c->ip6.addr = addr.a6; + if (c->mode == MODE_PASTA) + c->ip6.no_copy_addrs = true; + } else { + die("Invalid prefix length: %d", prefix_len);

This error message does not seem to match the conditions that trigger it.

The only way you can obtain a valid address and a return value different from AF_INET and AF_INET6 is that the prefix_len parsing failed or the value was invalid.

Right, but it's a minor layering violation to assume that in the caller.

...

All the above is logically correct, although we can discuss if it is the cleanest and best algorithm. The fact that you misunderstand this is a sign it is not.

...
...
} - - die("Invalid address: %s", optarg); break; + } case 'n': + if (prefix_from_cidr) + die("Can't use both -n and CIDR prefix length"); c->ip4.prefix_len = conf_ip4_prefix(optarg); if (c->ip4.prefix_len < 0) die("Invalid netmask: %s", optarg); - + prefix_from_opt = true; break; case 'M': parse_mac(c->our_tap_mac, optarg); diff --git a/inany.c b/inany.c index 7680439..00d44e0 100644 --- a/inany.c +++ b/inany.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "util.h" #include "ip.h" @@ -57,3 +58,70 @@ int inany_pton(const char *src, union inany_addr *dst) return 0; } + +/** inany_prefix_pton - Parse an IPv[46] address with prefix length adjustment

"adjustment" doesn't make sense to me here.

...
+ * @src: IPv[46] address string + * @dst: output buffer, filled with parsed address + * @prefix_len: pointer to prefix length + * + * Return: AF_INET for IPv4, AF_INET6 for IPv6, -1 on error

I'm not particularly convinced that returning the address family here is useful, since the caller can call inany_v4() just as easily as we can. But if we do return the address family, the it probably makes more sense to use AF_UNSPEC for errors, rather than -1, since it's not - strictly speaking - guaranteed that one of the AF_* constants has the value -1.

AF_UNSPEC is a valid code, meaning "any" or "don't care". That doesn't sound like a suitable error code, and we do want to know if the parsing failed. I think I'll go for just a 0/-1 return value instead.

Ok.

...

...
...
+ */ +int inany_prefix_pton(char *src, union inany_addr *dst, int *prefix_len)

If we are returning an address family, the return type should be sa_family_t. Ok.

...
+{ + bool mapped = false; + struct in6_addr a6; + struct in_addr a4; + char *slash; + char *end; + int af; + + *prefix_len = 0;

As noted below, 0 is not a suitable value for "missing prefix", because 0 length prefixes are real and important.

0 is a valid value, but the question is if /0 ia a valid suffix in our CIDR format. I see below that you think so.

For -a, maybe not. But we want to be able to re-use this for things where it will be.

...

...
I think it would be better to make the prefix length non-optional for this functional. The caller can fall back to inany_pton() if this fails. That makes this function more easily reusable for cases where we _require_ a prefix length.

I can try that.

...
...
+ + /* Check for presence of /prefix_len suffix */ + slash = strchr(src, '/'); + if (slash) + *slash = '\0'; + + /* Read address */ + if (inet_pton(AF_INET, src, &a4)) { + inany_from_af(dst, AF_INET, &a4); + af = AF_INET; + } else if (inet_pton(AF_INET6, src, &a6)) { + inany_from_af(dst, AF_INET6, &a6); + af = AF_INET6; + if (inany_v4(dst)) + mapped = true; + } else { + memset(dst, 0, sizeof(*dst)); + return -1; + } + + if (!slash) + return mapped ? AF_INET : af;

You can avoid messing around with the mapped temporary by unconditionally deriving the address family from inany_v4().

Ok.

...
...
+ + /* Read prefix_len - /0 is not allowed */

/0 should be allowed. It doesn't make much sense for -a, but it's valid and important if we use this in future for routes (a 0 length prefix indicates a default route).

ok. That changes a few things.

...
...
+ errno = 0; + *prefix_len = strtoul(slash + 1, &end, 10); + if (errno || *end || *prefix_len == 0) + return -1; + + if (mapped) { + /* IPv4-mapped: prefix already in IPv6 format, must be 96-128 */ + if (*prefix_len < 96 || *prefix_len > 128) + return -1; + return AF_INET; + } + + if (af == AF_INET) { + /* Native IPv4: convert to IPv6 format */ + if (*prefix_len > 32) + return -1; + *prefix_len += 96;

If you make this adjustment before the previous stanza you again don't need the 'mapped' local and can use inany_v4().

Good point.

...
...
+ return AF_INET; + } + + /* Native IPv6: keep as-is */ + if (*prefix_len > 128) + return -1;

And if you move this before the if (mapped) stanza, you can remove the duplicated check for > 128.

I'll give it a try.

///jon

...
...
+ return AF_INET6; +} diff --git a/inany.h b/inany.h index 61b36fb..316ee44 100644 --- a/inany.h +++ b/inany.h @@ -295,5 +295,6 @@ static inline void inany_siphash_feed(struct siphash_state *state, const char *inany_ntop(const union inany_addr *src, char *dst, socklen_t size); int inany_pton(const char *src, union inany_addr *dst); +int inany_prefix_pton(char *src, union inany_addr *dst, int *prefix_len); #endif /* INANY_H */ diff --git a/ip.c b/ip.c index 9a7f4c5..40dc24e 100644 --- a/ip.c +++ b/ip.c @@ -13,6 +13,8 @@ */ #include +#include + #include "util.h" #include "ip.h" @@ -67,3 +69,22 @@ found: *proto = nh; return true; } + +/** + * ip4_class_prefix_len() - Get class based prefix length for IPv4 address + * @addr: IPv4 address + * + * Return: prefix length based on address class, or 32 for other + */ +int ip4_class_prefix_len(const struct in_addr *addr) +{ + in_addr_t a = ntohl(addr->s_addr); + + if (IN_CLASSA(a)) + return 32 - IN_CLASSA_NSHIFT; + if (IN_CLASSB(a)) + return 32 - IN_CLASSB_NSHIFT; + if (IN_CLASSC(a)) + return 32 - IN_CLASSC_NSHIFT; + return 32; +} diff --git a/ip.h b/ip.h index 5830b92..bd28640 100644 --- a/ip.h +++ b/ip.h @@ -135,4 +135,6 @@ static const struct in_addr in4addr_broadcast = { 0xffffffff }; #define IPV6_MIN_MTU 1280 #endif +int ip4_class_prefix_len(const struct in_addr *addr); + #endif /* IP_H */ diff --git a/passt.1 b/passt.1 index db0d662..53537c4 100644 --- a/passt.1 +++ b/passt.1 @@ -156,10 +156,14 @@ By default, the advertised MTU is 65520 bytes, that is, the maximum 802.3 MTU minus the length of a 802.3 header, rounded to 32 bits (IPv4 words). .TP -.BR \-a ", " \-\-address " " \fIaddr +.BR \-a ", " \-\-address " " \fIaddr\fR[/\fIprefix_len\fR] Assign IPv4 \fIaddr\fR via DHCP (\fByiaddr\fR), or \fIaddr\fR via DHCPv6 (option 5) and an \fIaddr\fR-based prefix via NDP Router Advertisement (option type 3) for an IPv6 \fIaddr\fR. +An optional /\fIprefix_len\fR (1-32 for IPv4, 1-128 for IPv6) can be +appended in CIDR notation (e.g. 192.0.2.1/24). This is an alternative to +using the \fB-n\fR, \fB--netmask\fR option. Mixing CIDR notation with +\fB-n\fR results in an error. This option can be specified zero (for defaults) to two times (once for IPv4, once for IPv6). By default, assigned IPv4 and IPv6 addresses are taken from the host interfaces @@ -172,10 +176,13 @@ is assigned for IPv4, and no additional address will be assigned for IPv6. .TP .BR \-n ", " \-\-netmask " " \fImask Assign IPv4 netmask \fImask\fR, expressed as dot-decimal or number of bits, via -DHCP (option 1). -By default, the netmask associated to the host address matching the assigned one -is used. If there's no matching address on the host, the netmask is determined -according to the CIDR block of the assigned address (RFC 4632). +DHCP (option 1). Alternatively, the prefix length can be specified using CIDR +notation with the \fB-a\fR, \fB--address\fR option (e.g. \fB-a\fR 192.0.2.1/24). +Mixing \fB-n\fR with CIDR notation results in an error. +If no address is indicated, the netmask associated with the adopted host address, +if any, is used. If an address is indicated, but without a prefix length, the +netmask is determined based on the corresponding network class. In all other +cases, the netmask is determined by using the indicated prefix length. .TP .BR \-M ", " \-\-mac-addr " " \fIaddr -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

4:29 a.m.

New subject: [PATCH v3 04/11] fwd: Check all configured addresses in guest accessibility functions

On Wed, Feb 04, 2026 at 08:01:45PM -0500, Jon Maloy wrote:

...

On 2026-02-04 08:16, David Gibson wrote:

...
On Fri, Jan 30, 2026 at 04:44:40PM -0500, Jon Maloy wrote:

...
As a preparation for handling multiple addresses, we update fwd_guest_accessible4() and fwd_guest_accessible6() to check against all addresses in the unified addrs[] array using the for_each_addr() macro.

This ensures that when multiple addresses are configured via -a options, inbound traffic for any of them is correctly detected as having no valid forwarding path, and subsequently dropped. This occurs when a peer address collides with an address the guest is using, and we have no translation for it.

Signed-off-by: Jon Maloy

--- v2: Updated commit log to make it clearer v3: Adapted to changes earlier in the series --- fwd.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fwd.c b/fwd.c index 54248a3..20d581d 100644 --- a/fwd.c +++ b/fwd.c @@ -502,6 +502,8 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini) static bool fwd_guest_accessible4(const struct ctx *c, const struct in_addr *addr)

With the changes to this point, there's no longer a point to having separate fwd_guest_accessible4() and fwd_guest_accessible6() functions. fwd_guest_accessible() can check against the combined list in a single pass.

Absolutely. I have already identified several functions which can be unified in a similar way, and have already implemented some. But I still want to wait with such changes until this series has been applied.

That doesn't make a lot of sense to me. I mean, I understand doing things one step at a time. But in this case keeping the old structure means a bunch of extra churn, and the intermediate steps are more confusing to read and review than they need to be.

...

...
This is an example of the simplfications that I think will outweigh the complications introduced by using merged list for v4 and v6 addresses.

...
{ + const struct inany_addr_entry *e; + if (IN4_IS_ADDR_LOOPBACK(addr)) return false; @@ -513,12 +515,15 @@ static bool fwd_guest_accessible4(const struct ctx *c, if (IN4_IS_ADDR_UNSPECIFIED(addr)) return false; - /* For IPv4, addr_seen is initialised to addr, so is always a valid - * address + /* Check against all configured guest addresses */ + for_each_addr(c, e, AF_INET) + if (IN4_ARE_ADDR_EQUAL(addr, inany_v4(&e->addr))) + return false; + + /* Also check addr_seen: it tracks the address the guest is actually + * using, which may differ from configured addresses. */ - if ((first_v4(c) && - IN4_ARE_ADDR_EQUAL(addr, inany_v4(&first_v4(c)->addr))) || - IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) + if (IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) return false; return true; @@ -535,11 +540,15 @@ static bool fwd_guest_accessible4(const struct ctx *c, static bool fwd_guest_accessible6(const struct ctx *c, const struct in6_addr *addr) { + const struct inany_addr_entry *e; + if (IN6_IS_ADDR_LOOPBACK(addr)) return false; - if (first_v6(c) && IN6_ARE_ADDR_EQUAL(addr, &first_v6(c)->addr.a6)) - return false; + /* Check against all configured guest addresses */ + for_each_addr(c, e, AF_INET6) + if (IN6_ARE_ADDR_EQUAL(addr, &e->addr.a6)) + return false; /* For IPv6, addr_seen starts unspecified, because we don't know what LL * address the guest will take until we see it. Only check against it @@ -714,7 +723,7 @@ bool nat_inbound(const struct ctx *c, const union inany_addr *addr, first_v4(c) && inany_equals(addr, &first_v4(c)->addr)) { *translated = inany_from_v4(c->ip4.map_guest_addr); } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_guest_addr) && - first_v6(c) && inany_equals6(addr, &first_v6(c)->addr.a6)) { + first_v6(c) && inany_equals(addr, &first_v6(c)->addr)) { translated->a6 = c->ip6.map_guest_addr; } else if (fwd_guest_accessible(c, addr)) { *translated = *addr; -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

9:24 a.m.

New subject: [PATCH v3 03/11] ip: Introduce unified multi-address data structures

On Fri, Jan 30, 2026 at 04:44:39PM -0500, Jon Maloy wrote:

...

As preparation for supporting multiple addresses per interface, we replace the single addr/prefix_len fields with an array. The array consists of a new struct inany_addr_entry containing an address and prefix length, both in inany_addr format.

There are only two functional changes: - The indicated IPv6 prefix length is now properly stored, instead of being ignored and overridden with the hardcoded value 64, as as has been the case until now. - Since even IPv4 addresses now are stored in IPv6 format, we also store the corresponding prefix length in that format, i.e. using the range [96,128] instead of [0,32].

Signed-off-by: Jon Maloy

--- v2: -Using inany_addr instead of protocol specific addresses as entry address field.

v3: -Merging into one array, directly in struct ctx -Changed prefix_len and flags fields in struct inany_addr_entry to uint8_t, since that makes the struct directly migratable --- arp.c | 5 +- conf.c | 145 ++++++++++++++++++++++++++++++++----------------------- conf.h | 6 +++ dhcp.c | 8 +-- dhcpv6.c | 6 +-- fwd.c | 17 ++++--- ip.c | 11 +++-- ip.h | 5 ++ ndp.c | 6 +-- passt.h | 123 +++++++++++++++++++++++++++++++++++++++++++--- pasta.c | 15 ++++-- tap.c | 5 +- 12 files changed, 251 insertions(+), 101 deletions(-)

diff --git a/arp.c b/arp.c index bb042e9..f16beac 100644 --- a/arp.c +++ b/arp.c @@ -54,7 +54,8 @@ static bool ignore_arp(const struct ctx *c, return true;

/* Don't resolve the guest's assigned address, either. */ - if (!memcmp(am->tip, &c->ip4.addr, sizeof(am->tip))) + if (first_v4(c) &&

first_v4() isn't a great name as a global function. But then I guess later patches in the series will remove most of the uses, so it might be ok in the interim.

...

+ !memcmp(am->tip, inany_v4(&first_v4(c)->addr), sizeof(am->tip))) return true;

return false; @@ -145,7 +146,7 @@ void arp_send_init_req(const struct ctx *c) memcpy(req.am.sha, c->our_tap_mac, sizeof(req.am.sha)); memcpy(req.am.sip, &c->ip4.our_tap_addr, sizeof(req.am.sip)); memcpy(req.am.tha, MAC_BROADCAST, sizeof(req.am.tha)); - memcpy(req.am.tip, &c->ip4.addr, sizeof(req.am.tip)); + memcpy(req.am.tip, inany_v4(&first_v4(c)->addr), sizeof(req.am.tip));

debug("Sending initial ARP request for guest MAC address"); tap_send_single(c, &req, sizeof(req)); diff --git a/conf.c b/conf.c index 5188c02..bb6bcf8 100644 --- a/conf.c +++ b/conf.c @@ -56,7 +56,7 @@ #define IP4_LL_GUEST_GW (struct in_addr){ htonl_constant(0xa9fe0202) } /* 169.254.2.2, libslirp default: 10.0.2.2 */

-#define IP4_LL_PREFIX_LEN 16 +#define IP4_LL_PREFIX_LEN 112 /* 96 + 16, in IPv6 format */

I don't think the encoding should be changed without also changing the #define name.

...

#define IP6_LL_GUEST_GW (struct in6_addr) \ {{{ 0xfe, 0x80, 0, 0, 0, 0, 0, 0, \ @@ -693,11 +693,11 @@ static int conf_ip4_prefix(const char *arg) /** * conf_ip4() - Verify or detect IPv4 support, get relevant addresses * @ifi: Host interface to attempt (0 to determine one) - * @ip4: IPv4 context (will be written) + * @c: Execution context (will be written)

Putting @c as the first parameter is a pretty strong convention.

...

* * Return: interface index for IPv4, or 0 on failure. */ -static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4) +static unsigned int conf_ip4(unsigned int ifi, struct ctx *c) { if (!ifi) ifi = nl_get_ext_if(nl_sock, AF_INET); @@ -707,9 +707,9 @@ static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4) return 0; }

- if (IN4_IS_ADDR_UNSPECIFIED(&ip4->guest_gw)) { + if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.guest_gw)) { int rc = nl_route_get_def(nl_sock, ifi, AF_INET, - &ip4->guest_gw); + &c->ip4.guest_gw); if (rc < 0) { debug("Couldn't discover IPv4 gateway address: %s", strerror_(-rc)); @@ -717,60 +717,57 @@ static unsigned int conf_ip4(unsigned int ifi, struct ip4_ctx *ip4) } }

- if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) { + if (!count_v4(c)) {

Do you need count_v4() at this stage of the series? Testing first_v4() would work just as well here, and so far this is the only caller of count_v4().

...

+ struct in_addr addr = {0,}; + int prefix_len = 0; int rc = nl_addr_get(nl_sock, ifi, AF_INET, - &ip4->addr, &ip4->prefix_len, NULL); + &addr, &prefix_len, NULL); if (rc < 0) { debug("Couldn't discover IPv4 address: %s", strerror_(-rc)); return 0; } - } + if (IN4_IS_ADDR_UNSPECIFIED(&addr)) + return 0;

- if (!ip4->prefix_len) { - in_addr_t addr = ntohl(ip4->addr.s_addr); - if (IN_CLASSA(addr)) - ip4->prefix_len = (32 - IN_CLASSA_NSHIFT); - else if (IN_CLASSB(addr)) - ip4->prefix_len = (32 - IN_CLASSB_NSHIFT); - else if (IN_CLASSC(addr)) - ip4->prefix_len = (32 - IN_CLASSC_NSHIFT); - else - ip4->prefix_len = 32; + c->addrs[c->addr_count].addr = inany_from_v4(addr); + c->addrs[c->addr_count].prefix_len = prefix_len + 96; + c->addrs[c->addr_count].flags = INANY_ADDR_HOST; + c->addr_count++;

I suspect an add_prefix() type function will be worth it, by the time we're done.

...

+ c->ip4.addr_seen = addr; }

- ip4->addr_seen = ip4->addr; - - ip4->our_tap_addr = ip4->guest_gw; - - if (IN4_IS_ADDR_UNSPECIFIED(&ip4->addr)) - return 0; + c->ip4.our_tap_addr = c->ip4.guest_gw;

return ifi; }

/** * conf_ip4_local() - Configure IPv4 addresses and attributes for local mode - * @ip4: IPv4 context (will be written) + * @c: Execution context (will be written) */ -static void conf_ip4_local(struct ip4_ctx *ip4) +static void conf_ip4_local(struct ctx *c) { - ip4->addr_seen = ip4->addr = IP4_LL_GUEST_ADDR; - ip4->our_tap_addr = ip4->guest_gw = IP4_LL_GUEST_GW; - ip4->prefix_len = IP4_LL_PREFIX_LEN; + c->addrs[c->addr_count].addr = inany_from_v4(IP4_LL_GUEST_ADDR); + c->ip4.addr_seen = *inany_v4(&c->addrs[c->addr_count].addr); + c->ip4.our_tap_addr = c->ip4.guest_gw = IP4_LL_GUEST_GW; + c->addrs[c->addr_count].prefix_len = IP4_LL_PREFIX_LEN;

I'd add the 96 here, rather than putting it into the constant.

...

+ c->addr_count++;

- ip4->no_copy_addrs = ip4->no_copy_routes = true; + c->ip4.no_copy_addrs = c->ip4.no_copy_routes = true; }

/** * conf_ip6() - Verify or detect IPv6 support, get relevant addresses * @ifi: Host interface to attempt (0 to determine one) - * @ip6: IPv6 context (will be written) + * @c: Execution context (will be written) * * Return: interface index for IPv6, or 0 on failure. */ -static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6) +static unsigned int conf_ip6(unsigned int ifi, struct ctx *c) { + struct inany_addr_entry *e; + union inany_addr addr = {0,}; int prefix_len = 0; int rc;

@@ -782,8 +779,8 @@ static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6) return 0; }

- if (IN6_IS_ADDR_UNSPECIFIED(&ip6->guest_gw)) { - rc = nl_route_get_def(nl_sock, ifi, AF_INET6, &ip6->guest_gw); + if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.guest_gw)) { + rc = nl_route_get_def(nl_sock, ifi, AF_INET6, &c->ip6.guest_gw); if (rc < 0) { debug("Couldn't discover IPv6 gateway address: %s", strerror_(-rc)); @@ -791,21 +788,28 @@ static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6) } }

- rc = nl_addr_get(nl_sock, ifi, AF_INET6, - IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) ? &ip6->addr : NULL, - &prefix_len, &ip6->our_tap_ll); + rc = nl_addr_get(nl_sock, ifi, AF_INET6, &addr.a6, + &prefix_len, &c->ip6.our_tap_ll); if (rc < 0) { debug("Couldn't discover IPv6 address: %s", strerror_(-rc)); return 0; }

- ip6->addr_seen = ip6->addr; + if (!count_v6(c)) {

Same comment as for count_v4().

...

+ c->addrs[c->addr_count].addr = addr; + c->addrs[c->addr_count].prefix_len = prefix_len ? prefix_len : 64; + c->addrs[c->addr_count].flags = INANY_ADDR_HOST; + c->addr_count++; + } + + e = first_v6(c); + c->ip6.addr_seen = e->addr.a6;

- if (IN6_IS_ADDR_LINKLOCAL(&ip6->guest_gw)) - ip6->our_tap_ll = ip6->guest_gw; + if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.guest_gw)) + c->ip6.our_tap_ll = c->ip6.guest_gw;

- if (IN6_IS_ADDR_UNSPECIFIED(&ip6->addr) || - IN6_IS_ADDR_UNSPECIFIED(&ip6->our_tap_ll)) + if (IN6_IS_ADDR_UNSPECIFIED(&e->addr.a6) || + IN6_IS_ADDR_UNSPECIFIED(&c->ip6.our_tap_ll)) return 0;

return ifi; @@ -813,13 +817,13 @@ static unsigned int conf_ip6(unsigned int ifi, struct ip6_ctx *ip6)

/** * conf_ip6_local() - Configure IPv6 addresses and attributes for local mode - * @ip6: IPv6 context (will be written) + * @c: Execution context (will be written) */ -static void conf_ip6_local(struct ip6_ctx *ip6) +static void conf_ip6_local(struct ctx *c) { - ip6->our_tap_ll = ip6->guest_gw = IP6_LL_GUEST_GW; + c->ip6.our_tap_ll = c->ip6.guest_gw = IP6_LL_GUEST_GW;

- ip6->no_copy_addrs = ip6->no_copy_routes = true; + c->ip6.no_copy_addrs = c->ip6.no_copy_routes = true; }

/** @@ -1145,13 +1149,15 @@ static void conf_print(const struct ctx *c) buf4, sizeof(buf4)));

if (!c->no_dhcp) { + struct inany_addr_entry *e = first_v4(c);

You never test for e being NULL in this path. Maybe that's not possible unless c->no_dhcp, but that's pretty non-obvious from here.

...

uint32_t mask;

- mask = IN4_MASK(c->ip4.prefix_len); + mask = IN4_MASK(e->prefix_len);

info("DHCP:"); info(" assign: %s", - inet_ntop(AF_INET, &c->ip4.addr, buf4, sizeof(buf4))); + inet_ntop(AF_INET, inany_v4(&e->addr), + buf4, sizeof(buf4))); info(" mask: %s", inet_ntop(AF_INET, &mask, buf4, sizeof(buf4))); info(" router: %s", @@ -1189,7 +1195,8 @@ static void conf_print(const struct ctx *c) goto dns6;

info(" assign: %s", - inet_ntop(AF_INET6, &c->ip6.addr, buf6, sizeof(buf6))); + inet_ntop(AF_INET6, &first_v6(c)->addr.a6,

Similarly no test for NULL first_v6() before dereferencing.

...

+ buf6, sizeof(buf6))); info(" router: %s", inet_ntop(AF_INET6, &c->ip6.guest_gw, buf6, sizeof(buf6))); info(" our link-local: %s", @@ -1811,6 +1818,7 @@ void conf(struct ctx *c, int argc, char **argv) } case 'a': { union inany_addr addr = { 0 }; + struct inany_addr_entry *e; int prefix_len = 0; int af;

@@ -1828,13 +1836,20 @@ void conf(struct ctx *c, int argc, char **argv) die("Can't mix CIDR with -n");

if (af == AF_INET) { - c->ip4.addr = *inany_v4(&addr); - c->ip4.prefix_len = prefix_len ? prefix_len - 96 : - ip4_class_prefix_len(&c->ip4.addr); + e = &c->addrs[c->addr_count]; + e->addr = addr; + e->prefix_len = prefix_len ? prefix_len : + ip4_class_prefix_len(inany_v4(&addr)); + e->flags = INANY_ADDR_CONFIGURED; + c->addr_count++; if (c->mode == MODE_PASTA) c->ip4.no_copy_addrs = true; } else if (af == AF_INET6) { - c->ip6.addr = addr.a6; + e = &c->addrs[c->addr_count]; + e->addr = addr; + e->prefix_len = prefix_len ? prefix_len : 64; + e->flags = INANY_ADDR_CONFIGURED; + c->addr_count++; if (c->mode == MODE_PASTA) c->ip6.no_copy_addrs = true; } else { @@ -1842,14 +1857,21 @@ void conf(struct ctx *c, int argc, char **argv) } break; } - case 'n': + case 'n': { + struct inany_addr_entry *e; + int plen; + if (prefix_from_cidr) die("Can't use both -n and CIDR prefix length"); - c->ip4.prefix_len = conf_ip4_prefix(optarg); - if (c->ip4.prefix_len < 0) + plen = conf_ip4_prefix(optarg); + if (plen < 0) die("Invalid netmask: %s", optarg); + e = first_v4(c); + if (e) + e->prefix_len = plen + 96; prefix_from_opt = true; break; + } case 'M': parse_mac(c->our_tap_mac, optarg); break; @@ -1993,9 +2015,9 @@ void conf(struct ctx *c, int argc, char **argv)

nl_sock_init(c, false); if (!v6_only) - c->ifi4 = conf_ip4(ifi4, &c->ip4); + c->ifi4 = conf_ip4(ifi4, c); if (!v4_only) - c->ifi6 = conf_ip6(ifi6, &c->ip6); + c->ifi6 = conf_ip6(ifi6, c);

if (c->ifi4 && c->mtu < IPV4_MIN_MTU) { warn("MTU %"PRIu16" is too small for IPv4 (minimum %u)", @@ -2018,14 +2040,14 @@ void conf(struct ctx *c, int argc, char **argv) if (!c->ifi4 && !v6_only) { info("IPv4: no external interface as template, use local mode");

- conf_ip4_local(&c->ip4); + conf_ip4_local(c); c->ifi4 = -1; }

if (!c->ifi6 && !v4_only) { info("IPv6: no external interface as template, use local mode");

- conf_ip6_local(&c->ip6); + conf_ip6_local(c); c->ifi6 = -1; }

@@ -2134,7 +2156,8 @@ void conf(struct ctx *c, int argc, char **argv) if (!c->ifi6) { c->no_ndp = 1; c->no_dhcpv6 = 1; - } else if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr)) { + } else if (!first_v6(c) || + IN6_IS_ADDR_UNSPECIFIED(&first_v6(c)->addr.a6)) { c->no_dhcpv6 = 1; }

diff --git a/conf.h b/conf.h index b45ad74..e86fe96 100644 --- a/conf.h +++ b/conf.h @@ -6,6 +6,12 @@ #ifndef CONF_H #define CONF_H

+/* Flags indicating origin and role of an address + * To be used in struct inany_addr_entry + */ +#define CONF_ADDR_CONFIGURED (1 << 0) /* User set via -a */

As Stefano said, I think "USER" or "CMDLINE" would be preferable to "CONFIGURED".

...

+#define CONF_ADDR_HOST (1 << 1) /* From host interface */ + enum passt_modes conf_mode(int argc, char *argv[]); void conf(struct ctx *c, int argc, char **argv);

diff --git a/dhcp.c b/dhcp.c index c552f01..933a8cf 100644 --- a/dhcp.c +++ b/dhcp.c @@ -352,7 +352,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data) reply.secs = 0; reply.flags = m->flags; reply.ciaddr = m->ciaddr; - reply.yiaddr = c->ip4.addr; + reply.yiaddr = *inany_v4(&first_v4(c)->addr);

Again, maybe it's impossible for first_v4() to be NULL if !no_dhcp, but that's not obvious from here. If it's your intention to rely on that, an ASSERT would make sense.

...

reply.siaddr = 0; reply.giaddr = m->giaddr; memcpy(&reply.chaddr, m->chaddr, sizeof(reply.chaddr)); @@ -404,7 +404,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data)

info(" from %s", eth_ntop(m->chaddr, macstr, sizeof(macstr)));

- mask.s_addr = IN4_MASK(c->ip4.prefix_len); + mask.s_addr = IN4_MASK(first_v4(c)->prefix_len);

- 96, since the entry has an IPv6 encoded prefix length. Or maybe it would be nicer to replace first_v4() with something that extracts the first v4 address calling inany_v4() for you, and also extracts the matching prefix, doing the re-encoding to IPv4 format.

...

memcpy(opts[1].s, &mask, sizeof(mask)); memcpy(opts[3].s, &c->ip4.guest_gw, sizeof(c->ip4.guest_gw)); memcpy(opts[54].s, &c->ip4.our_tap_addr, sizeof(c->ip4.our_tap_addr)); @@ -412,7 +412,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data) /* If the gateway is not on the assigned subnet, send an option 121 * (Classless Static Routing) adding a dummy route to it. */ - if ((c->ip4.addr.s_addr & mask.s_addr) + if ((inany_v4(&first_v4(c)->addr)->s_addr & mask.s_addr) != (c->ip4.guest_gw.s_addr & mask.s_addr)) { /* a.b.c.d/32:0.0.0.0, 0:a.b.c.d */ opts[121].slen = 14; @@ -469,7 +469,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data) if (m->flags & FLAG_BROADCAST) dst = in4addr_broadcast; else - dst = c->ip4.addr; + dst = *inany_v4(&first_v4(c)->addr);

tap_udp4_send(c, c->ip4.our_tap_addr, 67, dst, 68, &reply, dlen);

diff --git a/dhcpv6.c b/dhcpv6.c index e4df0db..a1e5f15 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -625,7 +625,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, if (mh->type == TYPE_CONFIRM && server_id) return -1;

- if (dhcpv6_ia_notonlink(data, &c->ip6.addr)) { + if (dhcpv6_ia_notonlink(data, &first_v6(c)->addr.a6)) {

Likewise a helper that returns the first v6 address *as* a struct in6_addr might be neater.

...

dhcpv6_send_ia_notonlink(c, data, &client_id_base, ntohs(client_id->l), mh->xid); @@ -679,7 +679,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,

tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546, mh->xid, &resp, n); - c->ip6.addr_seen = c->ip6.addr; + c->ip6.addr_seen = first_v6(c)->addr.a6;

return 1; } @@ -703,5 +703,5 @@ void dhcpv6_init(const struct ctx *c) memcpy(resp_not_on_link.server_id.duid_lladdr, c->our_tap_mac, sizeof(c->our_tap_mac));

- resp.ia_addr.addr = c->ip6.addr; + resp.ia_addr.addr = first_v6(c)->addr.a6; } diff --git a/fwd.c b/fwd.c index 44a0e10..54248a3 100644 --- a/fwd.c +++ b/fwd.c @@ -516,7 +516,8 @@ static bool fwd_guest_accessible4(const struct ctx *c, /* For IPv4, addr_seen is initialised to addr, so is always a valid * address */ - if (IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr) || + if ((first_v4(c) && + IN4_ARE_ADDR_EQUAL(addr, inany_v4(&first_v4(c)->addr))) || IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) return false;

@@ -537,7 +538,7 @@ static bool fwd_guest_accessible6(const struct ctx *c, if (IN6_IS_ADDR_LOOPBACK(addr)) return false;

- if (IN6_ARE_ADDR_EQUAL(addr, &c->ip6.addr)) + if (first_v6(c) && IN6_ARE_ADDR_EQUAL(addr, &first_v6(c)->addr.a6)) return false;

/* For IPv6, addr_seen starts unspecified, because we don't know what LL @@ -586,10 +587,10 @@ static void nat_outbound(const struct ctx *c, const union inany_addr *addr, *translated = inany_loopback4; else if (inany_equals6(addr, &c->ip6.map_host_loopback)) *translated = inany_loopback6; - else if (inany_equals4(addr, &c->ip4.map_guest_addr)) - *translated = inany_from_v4(c->ip4.addr); - else if (inany_equals6(addr, &c->ip6.map_guest_addr)) - translated->a6 = c->ip6.addr; + else if (first_v4(c) && inany_equals4(addr, &c->ip4.map_guest_addr)) + *translated = first_v4(c)->addr; + else if (first_v6(c) && inany_equals6(addr, &c->ip6.map_guest_addr)) + translated->a6 = first_v6(c)->addr.a6;

Implementing the forward table for the tap pif will make this cleaner, when I get to it.

...

else *translated = *addr; } @@ -710,10 +711,10 @@ bool nat_inbound(const struct ctx *c, const union inany_addr *addr, inany_equals6(addr, &in6addr_loopback)) { translated->a6 = c->ip6.map_host_loopback; } else if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_guest_addr) && - inany_equals4(addr, &c->ip4.addr)) { + first_v4(c) && inany_equals(addr, &first_v4(c)->addr)) { *translated = inany_from_v4(c->ip4.map_guest_addr); } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_guest_addr) && - inany_equals6(addr, &c->ip6.addr)) { + first_v6(c) && inany_equals6(addr, &first_v6(c)->addr.a6)) { translated->a6 = c->ip6.map_guest_addr; } else if (fwd_guest_accessible(c, addr)) { *translated = *addr; diff --git a/ip.c b/ip.c index 40dc24e..a86753f 100644 --- a/ip.c +++ b/ip.c @@ -74,17 +74,18 @@ found: * ip4_class_prefix_len() - Get class based prefix length for IPv4 address * @addr: IPv4 address * - * Return: prefix length based on address class, or 32 for other + * Return: prefix length in IPv6 format (96 + IPv4 prefix) based on address + * class, or 128 for other

Here again, I don't think you should change the encoding without changing the function name. As I've said, I think we should consistently use IPv6 encoding *when dealing with an inany*. But functions that deal specifically with IPv4 addresses, should use the IPv4 prefix length encoding.

...

*/ int ip4_class_prefix_len(const struct in_addr *addr) { in_addr_t a = ntohl(addr->s_addr);

if (IN_CLASSA(a)) - return 32 - IN_CLASSA_NSHIFT; + return 96 + 32 - IN_CLASSA_NSHIFT; if (IN_CLASSB(a)) - return 32 - IN_CLASSB_NSHIFT; + return 96 + 32 - IN_CLASSB_NSHIFT; if (IN_CLASSC(a)) - return 32 - IN_CLASSC_NSHIFT; - return 32; + return 96 + 32 - IN_CLASSC_NSHIFT; + return 128; } diff --git a/ip.h b/ip.h index c829d84..91d6d9a 100644 --- a/ip.h +++ b/ip.h @@ -137,6 +137,11 @@ static const struct in_addr in4addr_broadcast = { 0xffffffff }; #define IPV6_MIN_MTU 1280 #endif

+/* Maximum number of addresses per address family */ +#define IP4_MAX_ADDRS 16 +#define IP6_MAX_ADDRS 16

Separate defines here don't really make sense now you have a single address array.

...

+#define INANY_MAX_ADDRS (IP4_MAX_ADDRS + IP6_MAX_ADDRS) + int ip4_class_prefix_len(const struct in_addr *addr);

#endif /* IP_H */ diff --git a/ndp.c b/ndp.c index eb9e313..f60c298 100644 --- a/ndp.c +++ b/ndp.c @@ -257,7 +257,7 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) .valid_lifetime = ~0U, .pref_lifetime = ~0U, }, - .prefix = c->ip6.addr, + .prefix = first_v6(c)->addr.a6, .source_ll = { .header = { .type = OPT_SRC_L2_ADDR, @@ -466,8 +466,8 @@ void ndp_send_init_req(const struct ctx *c) .icmp6_solicited = 0, /* Reserved */ .icmp6_override = 0, /* Reserved */ }, - .target_addr = c->ip6.addr + .target_addr = first_v6(c)->addr.a6 }; debug("Sending initial NDP NS request for guest MAC address"); - ndp_send(c, &c->ip6.addr, &ns, sizeof(ns)); + ndp_send(c, &first_v6(c)->addr.a6, &ns, sizeof(ns)); } diff --git a/passt.h b/passt.h index 79d01dd..ec5517d 100644 --- a/passt.h +++ b/passt.h @@ -64,11 +64,42 @@ enum passt_modes { MODE_VU, };

+/* Flags indicating origin and role of an address + * in struct inany_addr_entry + */ +#define INANY_ADDR_CONFIGURED BIT(0) /* User set via -a */ +#define INANY_ADDR_HOST BIT(1) /* From host interface */

These appear to be redundant with with CONF_ADDR_*. These are specific to how we handle configuration, not general properties of a prefix, so I don't think these ones belong.

...

+ +/** + * for_each_addr() - Iterate over addresses in unified array + * @c: Pointer to struct ctx + * @e: Pointer variable for current entry (struct inany_addr_entry *)

Wll, @c going first is a strongish convention, but the loop variable going first in a for_each_() macro is an even stronger convention.

...

+ * @af: Address family filter: AF_INET, AF_INET6, or 0 for all

Iterating over just one address type is going to become less and less common as we unify handling in various places. So it might be simpler to just have a single iterator, and explicitly filter the "wrong" type in the places we need to.

...

+ * + * Note: @_i is the internal loop counter, uses _next_addr_idx() helper + */ +#define for_each_addr(c, e, af) \ + for (int _i = _next_addr_idx((c), 0, (af)); \ + _i < (c)->addr_count && ((e) = &(c)->addrs[_i], true); \ + _i = _next_addr_idx((c), _i + 1, (af))) + +/* first_v4(), first_v6(), count_v4(), count_v6() defined after struct ctx */ + +/** + * struct inany_addr_entry - Unified IPv4/IPv6 address entry + * @addr: IPv4 (as mapped) or IPv6 address + * @prefix_len: Prefix length in IPv6/IPv4-mapped [0,128]/[96,128] format + * @flags: INANY_ADDR_* flags + */ +struct inany_addr_entry { + union inany_addr addr; + uint8_t prefix_len; + uint8_t flags; +}; + /** * struct ip4_ctx - IPv4 execution context - * @addr: IPv4 address assigned to guest * @addr_seen: Latest IPv4 address seen as source from tap - * @prefixlen: IPv4 prefix length (netmask) * @guest_gw: IPv4 gateway as seen by the guest * @map_host_loopback: Outbound connections to this address are NATted to the * host's 127.0.0.1 @@ -84,10 +115,7 @@ enum passt_modes { * @no_copy_addrs: Don't copy all addresses when configuring namespace */ struct ip4_ctx { - /* PIF_TAP addresses */ - struct in_addr addr; struct in_addr addr_seen; - int prefix_len; struct in_addr guest_gw; struct in_addr map_host_loopback; struct in_addr map_guest_addr; @@ -107,7 +135,6 @@ struct ip4_ctx {

/** * struct ip6_ctx - IPv6 execution context - * @addr: IPv6 address assigned to guest * @addr_seen: Latest IPv6 global/site address seen as source from tap * @addr_ll_seen: Latest IPv6 link-local address seen as source from tap * @guest_gw: IPv6 gateway as seen by the guest @@ -125,8 +152,6 @@ struct ip4_ctx { * @no_copy_addrs: Don't copy all addresses when configuring namespace */ struct ip6_ctx { - /* PIF_TAP addresses */ - struct in6_addr addr; struct in6_addr addr_seen; struct in6_addr addr_ll_seen; struct in6_addr guest_gw; @@ -257,6 +282,10 @@ struct ctx { int ifi6; struct ip6_ctx ip6;

+ /* Unified address array for both IPv4 (mapped) and IPv6 */ + struct inany_addr_entry addrs[INANY_MAX_ADDRS]; + int addr_count; +

THe new fields need to be added to the structure comment

...

char pasta_ifn[IF_NAMESIZE]; unsigned int pasta_ifi; int pasta_conf_ns; @@ -294,6 +323,84 @@ struct ctx { bool migrate_exit; };

+/** + * first_v4() - Get first IPv4 address entry + * @c: Pointer to struct ctx + * + * Return: pointer to first IPv4 entry, or NULL if none + */ +static inline struct inany_addr_entry *first_v4(const struct ctx *c) +{ + for (int i = 0; i < c->addr_count; i++) + if (inany_v4(&c->addrs[i].addr)) + return (struct inany_addr_entry *)&c->addrs[i]; + return NULL; +} + +/** + * first_v6() - Get first IPv6 address entry + * @c: Pointer to struct ctx + * + * Return: pointer to first IPv6 entry, or NULL if none + */ +static inline struct inany_addr_entry *first_v6(const struct ctx *c) +{ + for (int i = 0; i < c->addr_count; i++) + if (!inany_v4(&c->addrs[i].addr)) + return (struct inany_addr_entry *)&c->addrs[i]; + return NULL; +} + +/** + * count_v4() - Count IPv4 addresses + * @c: Pointer to struct ctx + * + * Return: number of IPv4 addresses + */ +static inline int count_v4(const struct ctx *c) +{ + int count = 0; + + for (int i = 0; i < c->addr_count; i++) + if (inany_v4(&c->addrs[i].addr)) + count++; + return count; +} + +/** + * count_v6() - Count IPv6 addresses + * @c: Pointer to struct ctx + * + * Return: number of IPv6 addresses + */ +static inline int count_v6(const struct ctx *c) +{ + int count = 0; + + for (int i = 0; i < c->addr_count; i++) + if (!inany_v4(&c->addrs[i].addr)) + count++; + return count; +} + +/** + * _next_addr_idx() - Find next address index matching family filter + * @c: Pointer to struct ctx + * @i: Starting index + * @af: Address family filter: AF_INET, AF_INET6, or 0 for all + * + * Return: next matching index, or addr_count if none found + */ +static inline int _next_addr_idx(const struct ctx *c, int i, int af) +{ + while (i < c->addr_count) { + if (!af || ((af == AF_INET) == !!inany_v4(&c->addrs[i].addr))) + return i; + i++; + } + return i; +}

This is a pretty fiddly way of doing the iteration. Simpler to have a plain loop through the array, and skip the entries you don't want.

...

+ void proto_update_l2_buf(const unsigned char *eth_d);

#endif /* PASST_H */ diff --git a/pasta.c b/pasta.c index c307b8a..08f35f4 100644 --- a/pasta.c +++ b/pasta.c @@ -338,10 +338,12 @@ void pasta_ns_conf(struct ctx *c)

if (c->ifi4) { if (c->ip4.no_copy_addrs) { + struct inany_addr_entry *e = first_v4(c);

NULL check?

...

+ rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, AF_INET, - &c->ip4.addr, - c->ip4.prefix_len); + inany_v4(&e->addr), + e->prefix_len); } else { rc = nl_addr_dup(nl_sock, c->ifi4, nl_sock_ns, c->pasta_ifi, @@ -387,10 +389,13 @@ void pasta_ns_conf(struct ctx *c) 0, IFF_NOARP);

if (c->ip6.no_copy_addrs) { - if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr)) { + struct inany_addr_entry *e = first_v6(c); + + if (e && !IN6_IS_ADDR_UNSPECIFIED(&e->addr.a6)) {

If it was UNSPECIFIED, then first_v6() would return NULL, no?

...

rc = nl_addr_set(nl_sock_ns, - c->pasta_ifi, AF_INET6, - &c->ip6.addr, 64); + c->pasta_ifi, + AF_INET6, + &e->addr.a6, 64); } } else { rc = nl_addr_dup(nl_sock, c->ifi6, diff --git a/tap.c b/tap.c index 9d1344b..f3073c6 100644 --- a/tap.c +++ b/tap.c @@ -951,8 +951,9 @@ resume: c->ip6.addr_seen = *saddr; }

- if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr)) - c->ip6.addr = *saddr; + if (first_v6(c) && + IN6_IS_ADDR_UNSPECIFIED(&first_v6(c)->addr.a6))

Again, if the address were unspecified, first_v6() would return NULL, wouldn't it?

...

+ first_v6(c)->addr.a6 = *saddr; } else if (!IN6_IS_ADDR_UNSPECIFIED(saddr)){ c->ip6.addr_seen = *saddr; } -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

9:40 a.m.

New subject: [PATCH v3 06/11] pasta: Extract pasta_ns_conf_ip4/6() to reduce nesting

On Fri, Jan 30, 2026 at 04:44:42PM -0500, Jon Maloy wrote:

...

Extract the IPv4 and IPv6 namespace configuration code from pasta_ns_conf() into separate static functions. This reduces indentation depth and prepares for adding multi-address support.

No functional change.

This seems fine in isolation, but I think it moves in the wrong direction in context. With the unified address array, we want to step through and set all the addresses in a single pass - this kind of works against that.

...

Signed-off-by: Jon Maloy --- pasta.c | 182 ++++++++++++++++++++++++++++++-------------------------- 1 file changed, 96 insertions(+), 86 deletions(-)

diff --git a/pasta.c b/pasta.c index 08f35f4..de0ba14 100644 --- a/pasta.c +++ b/pasta.c @@ -303,6 +303,98 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid, die_perror("Failed to join network namespace"); }

+/** + * pasta_ns_conf_ip4() - Configure IPv4 in namespace + * @c: Execution context + */ +static void pasta_ns_conf_ip4(struct ctx *c) +{ + int rc = 0; + + if (c->ip4.no_copy_addrs) { + struct inany_addr_entry *e = first_v4(c); + + rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, AF_INET, + inany_v4(&e->addr), e->prefix_len - 96); + } else { + rc = nl_addr_dup(nl_sock, c->ifi4, + nl_sock_ns, c->pasta_ifi, AF_INET); + } + + if (rc < 0) { + die("Couldn't set IPv4 address(es) in namespace: %s", + strerror_(-rc)); + } + + if (c->ip4.no_copy_routes) { + rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi, + AF_INET, &c->ip4.guest_gw); + } else { + rc = nl_route_dup(nl_sock, c->ifi4, nl_sock_ns, + c->pasta_ifi, AF_INET); + } + + if (rc < 0) { + die("Couldn't set IPv4 route(s) in guest: %s", + strerror_(-rc)); + } +} + +/** + * pasta_ns_conf_ip6() - Configure IPv6 in namespace + * @c: Execution context + */ +static void pasta_ns_conf_ip6(struct ctx *c) +{ + struct inany_addr_entry *e; + int rc = 0; + + rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi, &c->ip6.addr_ll_seen); + if (rc < 0) { + warn("Can't get LL address from namespace: %s", + strerror_(-rc)); + } + + rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi); + if (rc < 0) { + warn("Can't set nodad for LL in namespace: %s", + strerror_(-rc)); + } + + /* We dodged DAD: re-enable neighbour solicitations */ + nl_link_set_flags(nl_sock_ns, c->pasta_ifi, 0, IFF_NOARP); + + if (c->ip6.no_copy_addrs) { + e = first_v6(c); + + if (e && !IN6_IS_ADDR_UNSPECIFIED(&e->addr.a6)) { + rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, + AF_INET6, &e->addr.a6, 64); + } + } else { + rc = nl_addr_dup(nl_sock, c->ifi6, + nl_sock_ns, c->pasta_ifi, AF_INET6); + } + + if (rc < 0) { + die("Couldn't set IPv6 address(es) in namespace: %s", + strerror_(-rc)); + } + + if (c->ip6.no_copy_routes) { + rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi, + AF_INET6, &c->ip6.guest_gw); + } else { + rc = nl_route_dup(nl_sock, c->ifi6, + nl_sock_ns, c->pasta_ifi, AF_INET6); + } + + if (rc < 0) { + die("Couldn't set IPv6 route(s) in guest: %s", + strerror_(-rc)); + } +} + /** * pasta_ns_conf() - Set up loopback and tap interfaces in namespace as needed * @c: Execution context @@ -336,93 +428,11 @@ void pasta_ns_conf(struct ctx *c)

nl_link_set_flags(nl_sock_ns, c->pasta_ifi, flags, flags);

- if (c->ifi4) { - if (c->ip4.no_copy_addrs) { - struct inany_addr_entry *e = first_v4(c); - - rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, - AF_INET, - inany_v4(&e->addr), - e->prefix_len); - } else { - rc = nl_addr_dup(nl_sock, c->ifi4, - nl_sock_ns, c->pasta_ifi, - AF_INET); - } - - if (rc < 0) { - die("Couldn't set IPv4 address(es) in namespace: %s", - strerror_(-rc)); - } - - if (c->ip4.no_copy_routes) { - rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi, - AF_INET, - &c->ip4.guest_gw); - } else { - rc = nl_route_dup(nl_sock, c->ifi4, nl_sock_ns, - c->pasta_ifi, AF_INET); - } - - if (rc < 0) { - die("Couldn't set IPv4 route(s) in guest: %s", - strerror_(-rc)); - } - } + if (c->ifi4) + pasta_ns_conf_ip4(c);

- if (c->ifi6) { - rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi, - &c->ip6.addr_ll_seen); - if (rc < 0) { - warn("Can't get LL address from namespace: %s", - strerror_(-rc)); - } - - rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi); - if (rc < 0) { - warn("Can't set nodad for LL in namespace: %s", - strerror_(-rc)); - } - - /* We dodged DAD: re-enable neighbour solicitations */ - nl_link_set_flags(nl_sock_ns, c->pasta_ifi, - 0, IFF_NOARP); - - if (c->ip6.no_copy_addrs) { - struct inany_addr_entry *e = first_v6(c); - - if (e && !IN6_IS_ADDR_UNSPECIFIED(&e->addr.a6)) { - rc = nl_addr_set(nl_sock_ns, - c->pasta_ifi, - AF_INET6, - &e->addr.a6, 64); - } - } else { - rc = nl_addr_dup(nl_sock, c->ifi6, - nl_sock_ns, c->pasta_ifi, - AF_INET6); - } - - if (rc < 0) { - die("Couldn't set IPv6 address(es) in namespace: %s", - strerror_(-rc)); - } - - if (c->ip6.no_copy_routes) { - rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi, - AF_INET6, - &c->ip6.guest_gw); - } else { - rc = nl_route_dup(nl_sock, c->ifi6, - nl_sock_ns, c->pasta_ifi, - AF_INET6); - } - - if (rc < 0) { - die("Couldn't set IPv6 route(s) in guest: %s", - strerror_(-rc)); - } - } + if (c->ifi6) + pasta_ns_conf_ip6(c); }

proto_update_l2_buf(c->guest_mac); -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

9:47 a.m.

New subject: [PATCH v3 07/11] conf: Allow multiple -a/--address options per address family

On Fri, Jan 30, 2026 at 04:44:43PM -0500, Jon Maloy wrote:

...

We enable configuration of multiple IPv4 and IPv6 addresses by allowing repeated use of the -a/--address option.

- We update option parsing to append addresses to the unified addrs[] array, with limit checks for IP4_MAX_ADDRS and IP6_MAX_ADDRS.

I don't see any point to separate v4 and v6 limits, now there's a unified array.

...

- Each address specified via -a, but with no prefix length indicated, gets a class-based default prefix length.

That's not new, is it? I think we want to be careful with this. We need to maintain compatibility with previously working options, but I think we want to strongly encourage prefix lengths to be explicitly specified. Address classes are very anachronistic now, so I don't think we want to add any more uses of them than we have to for compatibility.

...

- If no -a option is given, addresses/prefix lengths are inherited from the template interface. - If a prefix length is to be added, it has to be done in CIDR format, except for the very first address.

I don't really follow what that means.

...

- We configure all indicated addresses in the namespace interface using the for_each_addr() macro.

Signed-off-by: Jon Maloy

--- v2: - Adapted to previous code changes v3: - Adapted to single-array strategy - Changes according to feedback from S. Brivio and G Gibson. --- conf.c | 23 ++++++++++++++++------- pasta.c | 21 ++++++++++++++------- 2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/conf.c b/conf.c index bb6bcf8..d73a3dd 100644 --- a/conf.c +++ b/conf.c @@ -803,13 +803,13 @@ static unsigned int conf_ip6(unsigned int ifi, struct ctx *c) }

e = first_v6(c); - c->ip6.addr_seen = e->addr.a6; + if (e) + c->ip6.addr_seen = e->addr.a6;

This seems like it belongs in an earlier patch.

...

if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.guest_gw)) c->ip6.our_tap_ll = c->ip6.guest_gw;

- if (IN6_IS_ADDR_UNSPECIFIED(&e->addr.a6) || - IN6_IS_ADDR_UNSPECIFIED(&c->ip6.our_tap_ll))

Missed this in an earlier patch, but !!e and an unspecified address shouldn't be possible, no?

...

+ if (!count_v6(c) || IN6_IS_ADDR_UNSPECIFIED(&c->ip6.our_tap_ll)) return 0;

return ifi; @@ -901,9 +901,11 @@ static void usage(const char *name, FILE *f, int status) " default: 65520: maximum 802.3 MTU minus 802.3 header\n" " length, rounded to 32 bits (IPv4 words)\n" " -a, --address ADDR Assign IPv4 or IPv6 address ADDR[/PREFIXLEN]\n" - " can be specified zero to two times (for IPv4 and IPv6)\n" + " can be specified multiple times (limit: %d IPv4, %d IPv6)\n" " default: use addresses from interface with default route\n" - " -n, --netmask MASK Assign IPv4 MASK, dot-decimal or bits\n" + " -n, --netmask MASK Assign IPv4 MASK, dot-decimal or bits\n", + IP4_MAX_ADDRS, IP6_MAX_ADDRS); + FPRINTF(f, " default: netmask from matching address on the host\n" " -M, --mac-addr ADDR Use source MAC address ADDR\n" " default: 9a:55:9a:55:9a:55 (locally administered)\n" @@ -1836,6 +1838,9 @@ void conf(struct ctx *c, int argc, char **argv) die("Can't mix CIDR with -n");

if (af == AF_INET) { + if (count_v4(c) >= IP4_MAX_ADDRS) + die("Too many IPv4 addresses"); + e = &c->addrs[c->addr_count]; e->addr = addr; e->prefix_len = prefix_len ? prefix_len : @@ -1845,6 +1850,9 @@ void conf(struct ctx *c, int argc, char **argv) if (c->mode == MODE_PASTA) c->ip4.no_copy_addrs = true; } else if (af == AF_INET6) { + if (count_v6(c) >= IP6_MAX_ADDRS) + die("Too many IPv6 addresses"); + e = &c->addrs[c->addr_count]; e->addr = addr; e->prefix_len = prefix_len ? prefix_len : 64; @@ -1861,6 +1869,8 @@ void conf(struct ctx *c, int argc, char **argv) struct inany_addr_entry *e; int plen;

+ if (count_v4(c) > 1) + die("-n can only be used with first address"); if (prefix_from_cidr) die("Can't use both -n and CIDR prefix length"); plen = conf_ip4_prefix(optarg); @@ -2156,8 +2166,7 @@ void conf(struct ctx *c, int argc, char **argv) if (!c->ifi6) { c->no_ndp = 1; c->no_dhcpv6 = 1; - } else if (!first_v6(c) || - IN6_IS_ADDR_UNSPECIFIED(&first_v6(c)->addr.a6)) { + } else if (!count_v6(c)) { c->no_dhcpv6 = 1; }

diff --git a/pasta.c b/pasta.c index de0ba14..8cb5873 100644 --- a/pasta.c +++ b/pasta.c @@ -312,10 +312,14 @@ static void pasta_ns_conf_ip4(struct ctx *c) int rc = 0;

if (c->ip4.no_copy_addrs) { - struct inany_addr_entry *e = first_v4(c); + const struct inany_addr_entry *e;

- rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, AF_INET, - inany_v4(&e->addr), e->prefix_len - 96); + for_each_addr(c, e, AF_INET) { + rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, AF_INET, + inany_v4(&e->addr), e->prefix_len - 96); + if (rc < 0) + break; + }

As noted on the previous patch, I think a single pass through the array makes sense, only changing the inner part of the loop for v4 vs. v6 (we could potentially update nl_addr_set() to take an inany, which might make it cleaner).

...

} else { rc = nl_addr_dup(nl_sock, c->ifi4, nl_sock_ns, c->pasta_ifi, AF_INET); @@ -346,7 +350,6 @@ static void pasta_ns_conf_ip4(struct ctx *c) */ static void pasta_ns_conf_ip6(struct ctx *c) { - struct inany_addr_entry *e; int rc = 0;

rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi, &c->ip6.addr_ll_seen); @@ -365,11 +368,15 @@ static void pasta_ns_conf_ip6(struct ctx *c) nl_link_set_flags(nl_sock_ns, c->pasta_ifi, 0, IFF_NOARP);

if (c->ip6.no_copy_addrs) { - e = first_v6(c); + const struct inany_addr_entry *e;

- if (e && !IN6_IS_ADDR_UNSPECIFIED(&e->addr.a6)) { + for_each_addr(c, e, AF_INET6) { + if (IN6_IS_ADDR_UNSPECIFIED(&e->addr.a6)) + continue; rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, - AF_INET6, &e->addr.a6, 64); + AF_INET6, &e->addr.a6, e->prefix_len); + if (rc < 0) + break; } } else { rc = nl_addr_dup(nl_sock, c->ifi6, -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

9:34 a.m.

New subject: [PATCH v3 05/11] arp: Check all configured addresses in ARP filtering

On Fri, Jan 30, 2026 at 04:44:41PM -0500, Jon Maloy wrote:

...

As a preparation for handling multiple addresses, we update ignore_arp() to check against all addresses in the unified addrs[] array using the for_each_addr() macro.

Signed-off-by: Jon Maloy

Reviewed-by: David Gibson Except that I think updates I've suggested to for_each_addr() will require small updates here too.

...

--- v3: Adapted to single-array changes earlier in this series --- arp.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arp.c b/arp.c index f16beac..c493f9e 100644 --- a/arp.c +++ b/arp.c @@ -41,6 +41,8 @@ static bool ignore_arp(const struct ctx *c, const struct arphdr *ah, const struct arpmsg *am) { + const struct inany_addr_entry *e; + if (ah->ar_hrd != htons(ARPHRD_ETHER) || ah->ar_pro != htons(ETH_P_IP) || ah->ar_hln != ETH_ALEN || @@ -53,10 +55,10 @@ static bool ignore_arp(const struct ctx *c, !memcmp(am->sip, am->tip, sizeof(am->sip))) return true;

- /* Don't resolve the guest's assigned address, either. */ - if (first_v4(c) && - !memcmp(am->tip, inany_v4(&first_v4(c)->addr), sizeof(am->tip))) - return true; + /* Don't resolve any of the guest's addresses */ + for_each_addr(c, e, AF_INET) + if (!memcmp(am->tip, inany_v4(&e->addr), sizeof(am->tip))) + return true;

return false; } -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

9:50 a.m.

New subject: [PATCH v3 08/11] migrate: Rename v1 address functions to v2 for clarity

On Fri, Jan 30, 2026 at 04:44:44PM -0500, Jon Maloy wrote:

...

Some migration address structures and functions have a _v1 suffix. This is confusing, since they are currently handling version 2 of the migration protocol. We are now going to introduce a new version 3 of the protocol, so we choose to give these functions the correct suffix _v2 instead. This is in correspondence with current reality, and will help make a clearer distinction between the old and the new versions of those functions.

I think the historical reason for this is that those specific components didn't change format from v1 to v2. But, given that v1 existed only very briefly and is not now supported, that's not very relevant. So, Reviewed-by: David Gibson And furthermore I think it makes sense to apply this independent of the rest of the series.

...

Signed-off-by: Jon Maloy --- migrate.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/migrate.c b/migrate.c index 48d63a0..7398d26 100644 --- a/migrate.c +++ b/migrate.c @@ -29,13 +29,13 @@ #define MIGRATE_MAGIC 0xB1BB1D1B0BB1D1B0

/** - * struct migrate_seen_addrs_v1 - Migratable guest addresses for v1 state stream + * struct migrate_seen_addrs_v2 - Migratable guest addresses for v2 protocol * @addr6: Observed guest IPv6 address * @addr6_ll: Observed guest IPv6 link-local address * @addr4: Observed guest IPv4 address * @mac: Observed guest MAC address */ -struct migrate_seen_addrs_v1 { +struct migrate_seen_addrs_v2 { struct in6_addr addr6; struct in6_addr addr6_ll; struct in_addr addr4; @@ -43,7 +43,7 @@ struct migrate_seen_addrs_v1 { } __attribute__((packed));

/** - * seen_addrs_source_v1() - Copy and send guest observed addresses from source + * seen_addrs_source_v2() - Copy and send guest observed addresses from source * @c: Execution context * @stage: Migration stage, unused * @fd: File descriptor for state transfer @@ -51,10 +51,10 @@ struct migrate_seen_addrs_v1 { * Return: 0 on success, positive error code on failure */ /* cppcheck-suppress [constParameterCallback, unmatchedSuppression] */ -static int seen_addrs_source_v1(struct ctx *c, +static int seen_addrs_source_v2(struct ctx *c, const struct migrate_stage *stage, int fd) { - struct migrate_seen_addrs_v1 addrs = { + struct migrate_seen_addrs_v2 addrs = { .addr6 = c->ip6.addr_seen, .addr6_ll = c->ip6.addr_ll_seen, .addr4 = c->ip4.addr_seen, @@ -71,17 +71,17 @@ static int seen_addrs_source_v1(struct ctx *c, }

/** - * seen_addrs_target_v1() - Receive and use guest observed addresses on target + * seen_addrs_target_v2() - Receive and use guest observed addresses on target * @c: Execution context * @stage: Migration stage, unused * @fd: File descriptor for state transfer * * Return: 0 on success, positive error code on failure */ -static int seen_addrs_target_v1(struct ctx *c, +static int seen_addrs_target_v2(struct ctx *c, const struct migrate_stage *stage, int fd) { - struct migrate_seen_addrs_v1 addrs; + struct migrate_seen_addrs_v2 addrs;

(void)stage;

@@ -100,8 +100,8 @@ static int seen_addrs_target_v1(struct ctx *c, static const struct migrate_stage stages_v2[] = { { .name = "observed addresses", - .source = seen_addrs_source_v1, - .target = seen_addrs_target_v1, + .source = seen_addrs_source_v2, + .target = seen_addrs_target_v2, }, { .name = "prepare flows", -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

9 Feb 9 Feb

11:17 p.m.

New subject: [PATCH v3 09/11] ip: Track observed guest IPv4 addresses in unified address array

On Fri, Jan 30, 2026 at 04:44:45PM -0500, Jon Maloy wrote:

...

We remove the addr_seen field in struct ip4_ctx and replace it by setting a new INANY_ADDR_OBSERVED flag in the corresponding entry in the unified address array. If the seen address is not present in the array we add it first.

We also update the migration protocol to v3, so that it can distribute multiple addresses from the unified array.

Signed-off-by: Jon Maloy

--- v3: - Updated migration protocol --- conf.c | 2 - fwd.c | 50 ++++++++++++---- migrate.c | 172 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- passt.h | 3 +- tap.c | 40 +++++++++++-- 5 files changed, 241 insertions(+), 26 deletions(-)

diff --git a/conf.c b/conf.c index d73a3dd..c5818a8 100644 --- a/conf.c +++ b/conf.c @@ -734,7 +734,6 @@ static unsigned int conf_ip4(unsigned int ifi, struct ctx *c) c->addrs[c->addr_count].prefix_len = prefix_len + 96; c->addrs[c->addr_count].flags = INANY_ADDR_HOST; c->addr_count++; - c->ip4.addr_seen = addr; }

c->ip4.our_tap_addr = c->ip4.guest_gw; @@ -749,7 +748,6 @@ static unsigned int conf_ip4(unsigned int ifi, struct ctx *c) static void conf_ip4_local(struct ctx *c) { c->addrs[c->addr_count].addr = inany_from_v4(IP4_LL_GUEST_ADDR); - c->ip4.addr_seen = *inany_v4(&c->addrs[c->addr_count].addr); c->ip4.our_tap_addr = c->ip4.guest_gw = IP4_LL_GUEST_GW; c->addrs[c->addr_count].prefix_len = IP4_LL_PREFIX_LEN; c->addr_count++; diff --git a/fwd.c b/fwd.c index 20d581d..5af482d 100644 --- a/fwd.c +++ b/fwd.c @@ -491,6 +491,30 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini) ((ini->oport == 53) || (ini->oport == 853)); }

+/** + * fwd_guest_addr4() - Get first observed IPv4 guest address + * @c: Execution context + * + * Return: first observed IPv4 address, if any, otherwise + * first configured, if any, otherwise NULL + */ +static const struct in_addr *fwd_guest_addr4(const struct ctx *c) +{ + const struct inany_addr_entry *e; + + /* Find first observed address */ + for_each_addr(c, e, AF_INET) + if (e->flags & INANY_ADDR_OBSERVED) + return inany_v4(&e->addr); + + /* Fallback to first address */ + e = first_v4(c); + if (e) + return inany_v4(&e->addr); + + return NULL; +} + /** * fwd_guest_accessible4() - Is IPv4 address guest-accessible * @c: Execution context @@ -515,17 +539,11 @@ static bool fwd_guest_accessible4(const struct ctx *c, if (IN4_IS_ADDR_UNSPECIFIED(addr)) return false;

- /* Check against all configured guest addresses */ + /* Check against all guest addresses (configured and observed) */ for_each_addr(c, e, AF_INET) if (IN4_ARE_ADDR_EQUAL(addr, inany_v4(&e->addr))) return false;

- /* Also check addr_seen: it tracks the address the guest is actually - * using, which may differ from configured addresses. - */ - if (IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen)) - return false; - return true; }

@@ -768,10 +786,16 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, * match. */ if (inany_v4(&ini->eaddr)) { - if (c->host_lo_to_ns_lo) + if (c->host_lo_to_ns_lo) { tgt->eaddr = inany_loopback4; - else - tgt->eaddr = inany_from_v4(c->ip4.addr_seen); + } else { + const struct in_addr *guest_addr; + + guest_addr = fwd_guest_addr4(c); + if (!guest_addr) + return PIF_NONE; + tgt->eaddr = inany_from_v4(*guest_addr);

Converting from inany to in_addr then back again is a bit clunky here.

...

+ } tgt->oaddr = inany_any4; } else { if (c->host_lo_to_ns_lo) @@ -803,7 +827,11 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, tgt->oport = ini->eport;

if (inany_v4(&tgt->oaddr)) { - tgt->eaddr = inany_from_v4(c->ip4.addr_seen); + const struct in_addr *guest_addr = fwd_guest_addr4(c); + + if (!guest_addr) + return PIF_NONE; + tgt->eaddr = inany_from_v4(*guest_addr); } else { if (inany_is_linklocal6(&tgt->oaddr)) tgt->eaddr.a6 = c->ip6.addr_ll_seen; diff --git a/migrate.c b/migrate.c index 7398d26..a84af1c 100644 --- a/migrate.c +++ b/migrate.c @@ -29,7 +29,7 @@ #define MIGRATE_MAGIC 0xB1BB1D1B0BB1D1B0

/** - * struct migrate_seen_addrs_v2 - Migratable guest addresses for v2 protocol + * struct migrate_seen_addrs_v2 - Migratable observed guest addresses

I don't really see the rationale for this change.

...

* @addr6: Observed guest IPv6 address * @addr6_ll: Observed guest IPv6 link-local address * @addr4: Observed guest IPv4 address @@ -43,7 +43,19 @@ struct migrate_seen_addrs_v2 { } __attribute__((packed));

/** - * seen_addrs_source_v2() - Copy and send guest observed addresses from source + * struct migrate_seen_addrs_v3 - Migratable observed guest addresses + * @addr_count: Number of address entries + * @addrs: Array of observed address entries + * @mac: Observed guest MAC address + */ +struct migrate_seen_addrs_v3 { + uint8_t addr_count; + struct inany_addr_entry addrs[INANY_MAX_ADDRS];

Having a fixed array size here in the migration format isn't ideal. It would be better to send a count, then send as many addresses as we have, then immediately continue with whatever else. That means that if we adjust the number of concurrent addresses we support, it doesn't need a chnage to the migration stream format.

...

+ unsigned char mac[ETH_ALEN]; +} __packed;

I'd be inclined to move the migration changes into a separate patch. They won't really work properly until both v4 and v6 observed addresses are in the unified array. That will necessitate glue code to squeeze the new information as best you can into the old format. But, that might actually be useful to test migrating from an older passt to a newer one. Since you're making a new migration stream version anyway, it might be better to make the new format transfer new addresses, not just observed ones. That means the format will be robust when/if we allow reconfiguring the other addresses at runtime.

...

+ +/** + * seen_addrs_source_v2() - Send observed addresses from source * @c: Execution context * @stage: Migration stage, unused * @fd: File descriptor for state transfer @@ -57,11 +69,19 @@ static int seen_addrs_source_v2(struct ctx *c, struct migrate_seen_addrs_v2 addrs = { .addr6 = c->ip6.addr_seen, .addr6_ll = c->ip6.addr_ll_seen, - .addr4 = c->ip4.addr_seen, }; + const struct inany_addr_entry *e;

(void)stage;

+ /* Find first observed IPv4 address */ + for_each_addr(c, e, AF_INET) { + if (e->flags & INANY_ADDR_OBSERVED) { + addrs.addr4 = *inany_v4(&e->addr); + break; + } + }

Can you re-use fwd_guest_addr4() here?

...

+ memcpy(addrs.mac, c->guest_mac, sizeof(addrs.mac));

if (write_all_buf(fd, &addrs, sizeof(addrs))) @@ -71,7 +91,34 @@ static int seen_addrs_source_v2(struct ctx *c, }

/** - * seen_addrs_target_v2() - Receive and use guest observed addresses on target + * migrate_observed_addr4() - Add observed IPv4 address to address array + * @c: Execution context + * @addr: IPv4 address to add + */ +static void migrate_observed_addr4(struct ctx *c, const struct in_addr *addr) +{ + struct inany_addr_entry *e; + + if (IN4_IS_ADDR_UNSPECIFIED(addr)) + return; + + for_each_addr(c, e, AF_INET) { + if (IN4_ARE_ADDR_EQUAL(addr, inany_v4(&e->addr))) { + e->flags |= INANY_ADDR_OBSERVED; + return; + } + } + + if (c->addr_count < INANY_MAX_ADDRS) {

Hmm. If we don't have space for the observed addr, that might be a problem. Maybe we need to reserve a slot for it?

...

+ c->addrs[c->addr_count].addr = inany_from_v4(*addr); + c->addrs[c->addr_count].prefix_len = 0; + c->addrs[c->addr_count].flags = INANY_ADDR_OBSERVED; + c->addr_count++; + } +} + +/** + * seen_addrs_target_v2() - Receive observed addresses on target * @c: Execution context * @stage: Migration stage, unused * @fd: File descriptor for state transfer @@ -82,6 +129,7 @@ static int seen_addrs_target_v2(struct ctx *c, const struct migrate_stage *stage, int fd) { struct migrate_seen_addrs_v2 addrs; + struct in_addr addr4;

(void)stage;

@@ -90,7 +138,100 @@ static int seen_addrs_target_v2(struct ctx *c,

c->ip6.addr_seen = addrs.addr6; c->ip6.addr_ll_seen = addrs.addr6_ll; - c->ip4.addr_seen = addrs.addr4; + + /* Avoid alignment warning */ + addr4 = addrs.addr4; + migrate_observed_addr4(c, &addr4); + + memcpy(c->guest_mac, addrs.mac, sizeof(c->guest_mac)); + + return 0; +} + +/** + * seen_addrs_source_v3() - Send observed addresses from source + * @c: Execution context + * @stage: Migration stage, unused + * @fd: File descriptor for state transfer + * + * Return: 0 on success, positive error code on failure + */ +/* cppcheck-suppress [constParameterCallback, unmatchedSuppression] */ +static int seen_addrs_source_v3(struct ctx *c, + const struct migrate_stage *stage, int fd) +{ + struct migrate_seen_addrs_v3 addrs = { .addr_count = 0 }; + const struct inany_addr_entry *e; + + (void)stage; + + /* Copy only observed address entries */ + for_each_addr(c, e, 0) { + if ((e->flags & INANY_ADDR_OBSERVED) && + addrs.addr_count < INANY_MAX_ADDRS) + addrs.addrs[addrs.addr_count++] = *e; + } + + memcpy(addrs.mac, c->guest_mac, sizeof(addrs.mac)); + + if (write_all_buf(fd, &addrs, sizeof(addrs))) + return errno; + + return 0; +} + +/** + * migrate_observed_addr() - Add observed address to address array + * @c: Execution context + * @addr: Address entry to add + */ +static void migrate_observed_addr(struct ctx *c, + const struct inany_addr_entry *addr)

Surely migrate_observed_addr4() can be rewritten more easily in terms of this function.

...

+{ + struct inany_addr_entry *e; + + if (inany_is_unspecified(&addr->addr)) + return; + + /* Check if address already exists, just mark it observed */ + for_each_addr(c, e, 0) { + if (inany_equals(&e->addr, &addr->addr)) { + e->flags |= INANY_ADDR_OBSERVED; + return; + } + } + + /* Add new entry if there's room */ + if (c->addr_count < INANY_MAX_ADDRS) + c->addrs[c->addr_count++] = *addr; +} + +/** + * seen_addrs_target_v3() - Receive observed addresses on target + * @c: Execution context + * @stage: Migration stage, unused + * @fd: File descriptor for state transfer + * + * Return: 0 on success, positive error code on failure + */ +static int seen_addrs_target_v3(struct ctx *c, + const struct migrate_stage *stage, int fd) +{ + struct migrate_seen_addrs_v3 addrs; + uint8_t i; + + (void)stage; + + if (read_all_buf(fd, &addrs, sizeof(addrs))) + return errno; + + /* Merge observed addresses into existing array */ + for (i = 0; i < addrs.addr_count && i < INANY_MAX_ADDRS; i++) { + struct inany_addr_entry e = addrs.addrs[i]; + + migrate_observed_addr(c, &e); + } + memcpy(c->guest_mac, addrs.mac, sizeof(c->guest_mac));

return 0; @@ -116,8 +257,29 @@ static const struct migrate_stage stages_v2[] = { { 0 }, };

+/* Stages for version 3 (multiple observed IPv4 addresses) */ +static const struct migrate_stage stages_v3[] = { + { + .name = "observed addresses", + .source = seen_addrs_source_v3, + .target = seen_addrs_target_v3, + }, + { + .name = "prepare flows", + .source = flow_migrate_source_pre, + .target = NULL, + }, + { + .name = "transfer flows", + .source = flow_migrate_source, + .target = flow_migrate_target, + }, + { 0 }, +}; + /* Supported encoding versions, from latest (most preferred) to oldest */ static const struct migrate_version versions[] = { + { 3, stages_v3, }, { 2, stages_v2, }, /* v1 was released, but not widely used. It had bad endianness for the * MSS and omitted timestamps, which meant it usually wouldn't work. diff --git a/passt.h b/passt.h index ec5517d..73084ae 100644 --- a/passt.h +++ b/passt.h @@ -69,6 +69,7 @@ enum passt_modes { */ #define INANY_ADDR_CONFIGURED BIT(0) /* User set via -a */ #define INANY_ADDR_HOST BIT(1) /* From host interface */ +#define INANY_ADDR_OBSERVED BIT(2) /* Seen in guest traffic */

/** * for_each_addr() - Iterate over addresses in unified array @@ -99,7 +100,6 @@ struct inany_addr_entry {

/** * struct ip4_ctx - IPv4 execution context - * @addr_seen: Latest IPv4 address seen as source from tap * @guest_gw: IPv4 gateway as seen by the guest * @map_host_loopback: Outbound connections to this address are NATted to the * host's 127.0.0.1 @@ -115,7 +115,6 @@ struct inany_addr_entry { * @no_copy_addrs: Don't copy all addresses when configuring namespace */ struct ip4_ctx { - struct in_addr addr_seen; struct in_addr guest_gw; struct in_addr map_host_loopback; struct in_addr map_guest_addr; diff --git a/tap.c b/tap.c index f3073c6..f2512f2 100644 --- a/tap.c +++ b/tap.c @@ -161,6 +161,38 @@ void tap_send_single(const struct ctx *c, const void *data, size_t l2len) } }

+/** + * tap_check_src_addr4() - Note an IPv4 address seen in guest traffic + * @c: Execution context + * @addr: IPv4 address seen as source from guest + * + * Add the address to addrs[] with OBSERVED flag if not already present + */ +static void tap_check_src_addr4(struct ctx *c, const struct in_addr *addr) +{ + struct inany_addr_entry *e; + + /* Check if already in array */ + for_each_addr(c, e, AF_INET) { + if (IN4_ARE_ADDR_EQUAL(addr, inany_v4(&e->addr))) { + e->flags |= INANY_ADDR_OBSERVED; + return; + } + } + + /* Add new entry if space available */ + if (c->addr_count >= INANY_MAX_ADDRS) { + debug("Address table full, can't add new IPv4 address"); + return; + } + + c->addrs[c->addr_count].addr = inany_from_v4(*addr); + c->addrs[c->addr_count].prefix_len = 0; + c->addrs[c->addr_count].flags = INANY_ADDR_OBSERVED; + c->addr_count++; + debug("Added new observed IPv4 address");

This is almost identical to migrate_observed_addr4(), it should be possible to have a common "add observed address" function. In addition, this function will add observed addresses, but never remove them, which will cause problems. Most obviously, it adds the new address at the end, but we pick the first observed address when we need one to send to, so we will no longer update to sending to the new observed address. It also means if the guest keeps changing address, we'll fill up our array. For the time being, I think we'd be better off sticking to a single observed address at one time - when we see a new one, we de-tag any existing OBSERVED address, and remove/replace it if it doesn't have one of the other bits set. We'll probably need to tackle multiple observed addresses at some point, but that will need a broader rethink.

...

+} + /** * tap_ip6_daddr() - Normal IPv6 destination address for inbound packets * @c: Execution context @@ -771,8 +803,8 @@ resume: continue; }

- if (iph->saddr && c->ip4.addr_seen.s_addr != iph->saddr) - c->ip4.addr_seen.s_addr = iph->saddr; + if (iph->saddr) + tap_check_src_addr4(c, (const struct in_addr *)&iph->saddr);

if (!iov_drop_header(&data, hlen)) continue; @@ -950,10 +982,6 @@ resume: if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_seen)) { c->ip6.addr_seen = *saddr; } - - if (first_v6(c) && - IN6_IS_ADDR_UNSPECIFIED(&first_v6(c)->addr.a6)) - first_v6(c)->addr.a6 = *saddr; } else if (!IN6_IS_ADDR_UNSPECIFIED(saddr)){ c->ip6.addr_seen = *saddr; } -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

11:30 p.m.

New subject: [PATCH v3 10/11] ip: Track observed guest IPv6 addresses in unified address array

On Fri, Jan 30, 2026 at 04:44:46PM -0500, Jon Maloy wrote:

...

We remove the addr_seen and addr_ll_seen fields in struct ip6_ctx and replace them by setting INANY_ADDR_OBSERVED and INANY_ADDR_LINKLOCAL flags in the corresponding entry in the unified address array. If the seen address is not present in the array we add it first.

This completes the unification of address storage for both IPv4 and IPv6, enabling future support for multiple guest addresses per family.

Signed-off-by: Jon Maloy

A number of the comments on 9/11 will also apply here.

...

--- v3: - Adapted to previous changes in this series --- conf.c | 5 ---- dhcpv6.c | 6 ++--- dhcpv6.h | 2 +- fwd.c | 62 +++++++++++++++++++++++++++++++------------ migrate.c | 64 +++++++++++++++++++++++++++++++++++++++------ passt.h | 5 +--- pasta.c | 15 ++++++++++- tap.c | 78 +++++++++++++++++++++++++++++++++++++++++++++---------- 8 files changed, 185 insertions(+), 52 deletions(-)

diff --git a/conf.c b/conf.c index c5818a8..6892cea 100644 --- a/conf.c +++ b/conf.c @@ -764,7 +764,6 @@ static void conf_ip4_local(struct ctx *c) */ static unsigned int conf_ip6(unsigned int ifi, struct ctx *c) { - struct inany_addr_entry *e; union inany_addr addr = {0,}; int prefix_len = 0; int rc; @@ -800,10 +799,6 @@ static unsigned int conf_ip6(unsigned int ifi, struct ctx *c) c->addr_count++; }

- e = first_v6(c); - if (e) - c->ip6.addr_seen = e->addr.a6; - if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.guest_gw)) c->ip6.our_tap_ll = c->ip6.guest_gw;

diff --git a/dhcpv6.c b/dhcpv6.c index a1e5f15..67307d9 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -546,10 +546,11 @@ static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data, * Return: 0 if it's not a DHCPv6 message, 1 if handled, -1 on failure */ int dhcpv6(struct ctx *c, struct iov_tail *data, - const struct in6_addr *saddr, const struct in6_addr *daddr) + const struct in6_addr *daddr) { const struct opt_server_id *server_id = NULL; const struct opt_hdr *client_id = NULL; +

Unrelated whitespace change.

...

/* The _storage variables can't be local to the blocks they're used in, * because IOV_*_HEADER() may return pointers to them which are * dereferenced afterwards. Since we don't have Rust-like lifetime @@ -587,8 +588,6 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, if (mlen + sizeof(*uh) != ntohs(uh->len) || mlen < sizeof(*mh)) return -1;

- c->ip6.addr_ll_seen = *saddr; -

It's stored differently, but we still want to update the observed ip6_ll address here.

...

src = &c->ip6.our_tap_ll;

mh = IOV_REMOVE_HEADER(data, mh_storage); @@ -679,7 +678,6 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,

tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546, mh->xid, &resp, n); - c->ip6.addr_seen = first_v6(c)->addr.a6;

return 1; } diff --git a/dhcpv6.h b/dhcpv6.h index c706dfd..8cbc769 100644 --- a/dhcpv6.h +++ b/dhcpv6.h @@ -7,7 +7,7 @@ #define DHCPV6_H

int dhcpv6(struct ctx *c, struct iov_tail *data, - struct in6_addr *saddr, struct in6_addr *daddr); + struct in6_addr *daddr); void dhcpv6_init(const struct ctx *c);

#endif /* DHCPV6_H */ diff --git a/fwd.c b/fwd.c index 5af482d..fc26eba 100644 --- a/fwd.c +++ b/fwd.c @@ -515,6 +515,36 @@ static const struct in_addr *fwd_guest_addr4(const struct ctx *c) return NULL; }

+/** + * fwd_guest_addr6() - Get first observed IPv6 guest address + * @c: Execution context + * @linklocal: If true, find link-local address; if false, find global + * + * Return: first observed IPv6 address, if any, + * or first configured, if any, or NULL + */ +static const struct in6_addr *fwd_guest_addr6(const struct ctx *c, + bool linklocal) +{ + const struct inany_addr_entry *e; + + /* Find first observed address of matching scope */ + for_each_addr(c, e, AF_INET6) { + if (!!(e->flags & INANY_ADDR_LINKLOCAL) != linklocal) + continue; + if (e->flags & INANY_ADDR_OBSERVED) + return &e->addr.a6; + } + + /* Fallback to first address of matching scope */ + for_each_addr(c, e, AF_INET6) { + if (!!(e->flags & INANY_ADDR_LINKLOCAL) == linklocal) + return &e->addr.a6; + } + + return NULL; +} + /** * fwd_guest_accessible4() - Is IPv4 address guest-accessible * @c: Execution context @@ -563,19 +593,11 @@ static bool fwd_guest_accessible6(const struct ctx *c, if (IN6_IS_ADDR_LOOPBACK(addr)) return false;

- /* Check against all configured guest addresses */ + /* Check against all guest addresses (configured, host, or observed) */ for_each_addr(c, e, AF_INET6) if (IN6_ARE_ADDR_EQUAL(addr, &e->addr.a6)) return false;

- /* For IPv6, addr_seen starts unspecified, because we don't know what LL - * address the guest will take until we see it. Only check against it - * if it has been set to a real address. - */ - if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_seen) && - IN6_ARE_ADDR_EQUAL(addr, &c->ip6.addr_seen)) - return false; - return true; }

@@ -798,10 +820,16 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, } tgt->oaddr = inany_any4; } else { - if (c->host_lo_to_ns_lo) + if (c->host_lo_to_ns_lo) { tgt->eaddr = inany_loopback6; - else - tgt->eaddr.a6 = c->ip6.addr_seen; + } else { + const struct in6_addr *guest_addr6; + + guest_addr6 = fwd_guest_addr6(c, false); + if (!guest_addr6) + return PIF_NONE; + tgt->eaddr.a6 = *guest_addr6; + } tgt->oaddr = inany_any6; }

@@ -833,10 +861,12 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, return PIF_NONE; tgt->eaddr = inany_from_v4(*guest_addr); } else { - if (inany_is_linklocal6(&tgt->oaddr)) - tgt->eaddr.a6 = c->ip6.addr_ll_seen; - else - tgt->eaddr.a6 = c->ip6.addr_seen; + bool linklocal = inany_is_linklocal6(&tgt->oaddr); + const struct in6_addr *guest_addr6 = fwd_guest_addr6(c, linklocal); + + if (!guest_addr6) + return PIF_NONE; + tgt->eaddr.a6 = *guest_addr6; }

return PIF_TAP; diff --git a/migrate.c b/migrate.c index a84af1c..20da1f7 100644 --- a/migrate.c +++ b/migrate.c @@ -66,10 +66,7 @@ struct migrate_seen_addrs_v3 { static int seen_addrs_source_v2(struct ctx *c, const struct migrate_stage *stage, int fd) { - struct migrate_seen_addrs_v2 addrs = { - .addr6 = c->ip6.addr_seen, - .addr6_ll = c->ip6.addr_ll_seen, - }; + struct migrate_seen_addrs_v2 addrs = { 0 }; const struct inany_addr_entry *e;

(void)stage; @@ -82,6 +79,23 @@ static int seen_addrs_source_v2(struct ctx *c, } }

+ /* Find first observed IPv6 addresses (global and link-local) */ + for_each_addr(c, e, AF_INET6) { + struct in6_addr tmp; + + if (!(e->flags & INANY_ADDR_OBSERVED)) + continue; + if (e->flags & INANY_ADDR_LINKLOCAL) { + tmp = addrs.addr6_ll; + if (IN6_IS_ADDR_UNSPECIFIED(&tmp)) + addrs.addr6_ll = e->addr.a6; + } else { + tmp = addrs.addr6; + if (IN6_IS_ADDR_UNSPECIFIED(&tmp)) + addrs.addr6 = e->addr.a6; + } + }

You could reuse fwd_guest_addr6() here.

...

+ memcpy(addrs.mac, c->guest_mac, sizeof(addrs.mac));

if (write_all_buf(fd, &addrs, sizeof(addrs))) @@ -117,6 +131,37 @@ static void migrate_observed_addr4(struct ctx *c, const struct in_addr *addr) } }

+/** + * migrate_observed_addr6() - Add observed IPv6 address to address array + * @c: Execution context + * @addr: IPv6 address to add + */ +static void migrate_observed_addr6(struct ctx *c, const struct in6_addr *addr) +{ + uint8_t flags = INANY_ADDR_OBSERVED; + struct inany_addr_entry *e; + + if (IN6_IS_ADDR_UNSPECIFIED(addr)) + return; + + if (IN6_IS_ADDR_LINKLOCAL(addr)) + flags |= INANY_ADDR_LINKLOCAL; + + for_each_addr(c, e, AF_INET6) { + if (IN6_ARE_ADDR_EQUAL(addr, &e->addr.a6)) { + e->flags |= flags; + return; + } + } + + if (c->addr_count < INANY_MAX_ADDRS) { + c->addrs[c->addr_count].addr.a6 = *addr; + c->addrs[c->addr_count].prefix_len = 0; + c->addrs[c->addr_count].flags = flags; + c->addr_count++; + }

Similar comments to v4: * Can use a unified function for migration and non-migration updates * Better to stick to 1 LL and 1 non-LL observed address for now, I think.

...

+} + /** * seen_addrs_target_v2() - Receive observed addresses on target * @c: Execution context @@ -129,6 +174,7 @@ static int seen_addrs_target_v2(struct ctx *c, const struct migrate_stage *stage, int fd) { struct migrate_seen_addrs_v2 addrs; + struct in6_addr addr6, addr6_ll; struct in_addr addr4;

(void)stage; @@ -136,12 +182,14 @@ static int seen_addrs_target_v2(struct ctx *c, if (read_all_buf(fd, &addrs, sizeof(addrs))) return errno;

- c->ip6.addr_seen = addrs.addr6; - c->ip6.addr_ll_seen = addrs.addr6_ll; - - /* Avoid alignment warning */ + /* Copy from packed struct to avoid alignment issues */ addr4 = addrs.addr4; + addr6 = addrs.addr6; + addr6_ll = addrs.addr6_ll; + migrate_observed_addr4(c, &addr4); + migrate_observed_addr6(c, &addr6); + migrate_observed_addr6(c, &addr6_ll);

memcpy(c->guest_mac, addrs.mac, sizeof(c->guest_mac));

diff --git a/passt.h b/passt.h index 73084ae..7040781 100644 --- a/passt.h +++ b/passt.h @@ -70,6 +70,7 @@ enum passt_modes { #define INANY_ADDR_CONFIGURED BIT(0) /* User set via -a */ #define INANY_ADDR_HOST BIT(1) /* From host interface */ #define INANY_ADDR_OBSERVED BIT(2) /* Seen in guest traffic */ +#define INANY_ADDR_LINKLOCAL BIT(3) /* Link-local address */

Isn't this redundant with the contents of the address itself?

...

/** * for_each_addr() - Iterate over addresses in unified array @@ -134,8 +135,6 @@ struct ip4_ctx {

/** * struct ip6_ctx - IPv6 execution context - * @addr_seen: Latest IPv6 global/site address seen as source from tap - * @addr_ll_seen: Latest IPv6 link-local address seen as source from tap * @guest_gw: IPv6 gateway as seen by the guest * @map_host_loopback: Outbound connections to this address are NATted to the * host's [::1] @@ -151,8 +150,6 @@ struct ip4_ctx { * @no_copy_addrs: Don't copy all addresses when configuring namespace */ struct ip6_ctx { - struct in6_addr addr_seen; - struct in6_addr addr_ll_seen; struct in6_addr guest_gw; struct in6_addr map_host_loopback; struct in6_addr map_guest_addr; diff --git a/pasta.c b/pasta.c index 8cb5873..7c2efd8 100644 --- a/pasta.c +++ b/pasta.c @@ -350,12 +350,19 @@ static void pasta_ns_conf_ip4(struct ctx *c) */ static void pasta_ns_conf_ip6(struct ctx *c) { + struct in6_addr addr_ll; int rc = 0;

- rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi, &c->ip6.addr_ll_seen); + rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi, &addr_ll); if (rc < 0) { warn("Can't get LL address from namespace: %s", strerror_(-rc)); + } else if (c->addr_count < INANY_MAX_ADDRS) { + c->addrs[c->addr_count].addr.a6 = addr_ll; + c->addrs[c->addr_count].prefix_len = 64; + c->addrs[c->addr_count].flags = + INANY_ADDR_OBSERVED | INANY_ADDR_LINKLOCAL; + c->addr_count++; }

rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi); @@ -373,6 +380,11 @@ static void pasta_ns_conf_ip6(struct ctx *c) for_each_addr(c, e, AF_INET6) { if (IN6_IS_ADDR_UNSPECIFIED(&e->addr.a6)) continue; + + /* Skip link-local - kernel auto-configures */ + if (e->flags & INANY_ADDR_LINKLOCAL) + continue; +

I'm not sure we need this. The kernel _will_ auto-configure, but if the user has specified a specific one, we should probably use it. Usually I wouldn't expect there to be an LL address in the array at this point

...

rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, AF_INET6, &e->addr.a6, e->prefix_len); if (rc < 0) @@ -440,6 +452,7 @@ void pasta_ns_conf(struct ctx *c)

if (c->ifi6) pasta_ns_conf_ip6(c); +

Unrelated whitespace change.

...

}

proto_update_l2_buf(c->guest_mac); diff --git a/tap.c b/tap.c index f2512f2..ba60f4b 100644 --- a/tap.c +++ b/tap.c @@ -193,6 +193,41 @@ static void tap_check_src_addr4(struct ctx *c, const struct in_addr *addr) debug("Added new observed IPv4 address"); }

+/** + * tap_check_src_addr6() - Note an IPv6 address seen in guest traffic + * @c: Execution context + * @addr: IPv6 address seen as source from guest + * + * Add the address to addrs[] with OBSERVED flag if not already present. + * Link-local addresses are also marked with LINKLOCAL flag. + */ +static void tap_check_src_addr6(struct ctx *c, const struct in6_addr *addr) +{ + uint8_t flags = INANY_ADDR_OBSERVED; + struct inany_addr_entry *e; + + if (IN6_IS_ADDR_LINKLOCAL(addr)) + flags |= INANY_ADDR_LINKLOCAL; + + /* Check if already in array */ + for_each_addr(c, e, AF_INET6) { + if (IN6_ARE_ADDR_EQUAL(addr, &e->addr.a6)) { + e->flags |= flags; + return; + } + } + + /* Add new entry if space available */ + if (c->addr_count < INANY_MAX_ADDRS) { + c->addrs[c->addr_count].addr.a6 = *addr; + c->addrs[c->addr_count].prefix_len = 0; + c->addrs[c->addr_count].flags = flags; + c->addr_count++; + } else { + warn("Address table full, can't add IPv6 address"); + } +} + /** * tap_ip6_daddr() - Normal IPv6 destination address for inbound packets * @c: Execution context @@ -203,9 +238,33 @@ static void tap_check_src_addr4(struct ctx *c, const struct in_addr *addr) const struct in6_addr *tap_ip6_daddr(const struct ctx *c, const struct in6_addr *src) { - if (IN6_IS_ADDR_LINKLOCAL(src)) - return &c->ip6.addr_ll_seen; - return &c->ip6.addr_seen; + bool want_ll = IN6_IS_ADDR_LINKLOCAL(src); + const struct inany_addr_entry *e; + + /* Find first observed address of matching scope */ + for_each_addr(c, e, AF_INET6) { + bool is_ll = !!(e->flags & INANY_ADDR_LINKLOCAL); + + if (is_ll != want_ll) + continue; + if (e->flags & INANY_ADDR_OBSERVED) + return &e->addr.a6; + } + + /* Fallback to first address of matching scope */ + for_each_addr(c, e, AF_INET6) { + bool is_ll = !!(e->flags & INANY_ADDR_LINKLOCAL); + + if (is_ll == want_ll) + return &e->addr.a6; + } + + /* Last resort: return first IPv6 address */

I think we have to match scope - an LL address is not meaningful to a peer who's not in scope. So I think all you can do is return NULL at this point.

...

+ e = first_v6(c); + if (e) + return &e->addr.a6; + + return &in6addr_any; }

/** @@ -976,15 +1035,8 @@ resume: continue; }

- if (IN6_IS_ADDR_LINKLOCAL(saddr)) { - c->ip6.addr_ll_seen = *saddr; - - if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_seen)) { - c->ip6.addr_seen = *saddr; - } - } else if (!IN6_IS_ADDR_UNSPECIFIED(saddr)){ - c->ip6.addr_seen = *saddr; - } + if (!IN6_IS_ADDR_UNSPECIFIED(saddr)) + tap_check_src_addr6(c, saddr);

if (proto == IPPROTO_ICMPV6) { struct iov_tail ndp_data; @@ -1015,7 +1067,7 @@ resume: if (proto == IPPROTO_UDP) { struct iov_tail uh_data = data;

- if (dhcpv6(c, &uh_data, saddr, daddr)) + if (dhcpv6(c, &uh_data, daddr)) continue; }

-- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

David Gibson

11:46 p.m.

New subject: [PATCH v3 11/11] conf: Select addresses for DHCP and NDP distribution

On Fri, Jan 30, 2026 at 04:44:47PM -0500, Jon Maloy wrote:

...

As a part of the multiple address support feature, we add a new algorithm for DHCP/DHCPv6/NDP address selection and refactor the conf_print() function accordingly.

The fwd_first_usable_addr() and fwd_select_addr() functions now take a context pointer and address family parameter to work with the unified address array using for_each_addr().

Signed-off-by: Jon Maloy --- conf.c | 60 +++++++++++++++++++++++++---------------- dhcp.c | 17 +++++++++--- dhcp.h | 2 +- dhcpv6.c | 17 ++++++++++-- fwd.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ fwd.h | 6 +++++ ndp.c | 15 ++++++++--- ndp.h | 4 +-- passt.h | 3 +++ 9 files changed, 171 insertions(+), 35 deletions(-)

diff --git a/conf.c b/conf.c index 6892cea..4c53d33 100644 --- a/conf.c +++ b/conf.c @@ -46,6 +46,7 @@ #include "lineread.h" #include "isolation.h" #include "log.h" +#include "fwd.h" #include "vhost_user.h"

#define NETNS_RUN_DIR "/run/netns" @@ -1099,10 +1100,12 @@ enum passt_modes conf_mode(int argc, char *argv[]) static void conf_print(const struct ctx *c) { char buf4[INET_ADDRSTRLEN], buf6[INET6_ADDRSTRLEN]; - char bufmac[ETH_ADDRSTRLEN], ifn[IFNAMSIZ]; + char bufmac[ETH_ADDRSTRLEN]; int i;

if (c->ifi4 > 0 || c->ifi6 > 0) { + char ifn[IFNAMSIZ]; + info("Template interface: %s%s%s%s%s", c->ifi4 > 0 ? if_indextoname(c->ifi4, ifn) : "", c->ifi4 > 0 ? " (IPv4)" : "", @@ -1144,20 +1147,22 @@ static void conf_print(const struct ctx *c) buf4, sizeof(buf4)));

if (!c->no_dhcp) { - struct inany_addr_entry *e = first_v4(c); - uint32_t mask; - - mask = IN4_MASK(e->prefix_len); - - info("DHCP:"); - info(" assign: %s", - inet_ntop(AF_INET, inany_v4(&e->addr), - buf4, sizeof(buf4))); - info(" mask: %s", - inet_ntop(AF_INET, &mask, buf4, sizeof(buf4))); - info(" router: %s", - inet_ntop(AF_INET, &c->ip4.guest_gw, - buf4, sizeof(buf4))); + const struct inany_addr_entry *e; + + e = fwd_first_usable_addr(c, AF_INET, + INANY_ADDR_OBSERVED);

The fact that INANY_ADDR_OBSERVER is things to skip rather than things to look for is not obvious here at the call site.

...

+ if (e) { + uint32_t mask = IN4_MASK(e->prefix_len); + + info("DHCP:"); + info(" assign: %s", + inany_ntop(&e->addr, buf4, sizeof(buf4))); + info(" mask: %s", + inet_ntop(AF_INET, &mask, buf4, sizeof(buf4))); + info(" router: %s", + inet_ntop(AF_INET, &c->ip4.guest_gw, + buf4, sizeof(buf4))); + } }

for (i = 0; !IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns[i]); i++) { @@ -1189,14 +1194,23 @@ static void conf_print(const struct ctx *c) else goto dns6;

- info(" assign: %s", - inet_ntop(AF_INET6, &first_v6(c)->addr.a6, - buf6, sizeof(buf6))); - info(" router: %s", - inet_ntop(AF_INET6, &c->ip6.guest_gw, buf6, sizeof(buf6))); - info(" our link-local: %s", - inet_ntop(AF_INET6, &c->ip6.our_tap_ll, - buf6, sizeof(buf6))); + { + const struct inany_addr_entry *e; + + e = fwd_first_usable_addr(c, AF_INET6, + INANY_ADDR_OBSERVED | + INANY_ADDR_LINKLOCAL); + if (e) { + info(" assign: %s", + inany_ntop(&e->addr, buf6, sizeof(buf6))); + info(" router: %s", + inet_ntop(AF_INET6, &c->ip6.guest_gw, + buf6, sizeof(buf6))); + info(" our link-local: %s", + inet_ntop(AF_INET6, &c->ip6.our_tap_ll, + buf6, sizeof(buf6))); + } + }

dns6: for (i = 0; !IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns[i]); i++) { diff --git a/dhcp.c b/dhcp.c index 933a8cf..31ea4f1 100644 --- a/dhcp.c +++ b/dhcp.c @@ -31,6 +31,7 @@ #include "passt.h" #include "tap.h" #include "log.h" +#include "fwd.h" #include "dhcp.h"

/** @@ -300,8 +301,9 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len) * * Return: 0 if it's not a DHCP message, 1 if handled, -1 on failure */ -int dhcp(const struct ctx *c, struct iov_tail *data) +int dhcp(struct ctx *c, struct iov_tail *data) { + union inany_addr yiaddr = { 0 }; char macstr[ETH_ADDRSTRLEN]; size_t mlen, dlen, opt_len; struct in_addr mask, dst; @@ -314,7 +316,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data) struct msg m_storage; struct msg const *m; struct msg reply; - unsigned int i; + int i;

eh = IOV_REMOVE_HEADER(data, eh_storage); iph = IOV_PEEK_HEADER(data, iph_storage); @@ -344,6 +346,15 @@ int dhcp(const struct ctx *c, struct iov_tail *data) m->op != BOOTREQUEST) return -1;

+ /* Select address to offer */ + { + const struct inany_addr_entry *e; + + e = fwd_select_addr(c, AF_INET, INANY_ADDR_DHCP, + INANY_ADDR_OBSERVED);

The fact we're using a different call here, where we actually select the address, from the call where we print the address to use seems fragile.

...

+ if (e) + yiaddr = e->addr;

.. and if !e? I think you want to not respond to the DHCP at all, rather than reply with an unspecified address.

...

+ } reply.op = BOOTREPLY; reply.htype = m->htype; reply.hlen = m->hlen; @@ -352,7 +363,7 @@ int dhcp(const struct ctx *c, struct iov_tail *data) reply.secs = 0; reply.flags = m->flags; reply.ciaddr = m->ciaddr; - reply.yiaddr = *inany_v4(&first_v4(c)->addr); + reply.yiaddr = *inany_v4(&yiaddr); reply.siaddr = 0; reply.giaddr = m->giaddr; memcpy(&reply.chaddr, m->chaddr, sizeof(reply.chaddr)); diff --git a/dhcp.h b/dhcp.h index cd50c99..7326c7d 100644 --- a/dhcp.h +++ b/dhcp.h @@ -6,7 +6,7 @@ #ifndef DHCP_H #define DHCP_H

-int dhcp(const struct ctx *c, struct iov_tail *data); +int dhcp(struct ctx *c, struct iov_tail *data); void dhcp_init(void);

#endif /* DHCP_H */ diff --git a/dhcpv6.c b/dhcpv6.c index 67307d9..32494e9 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -31,6 +31,7 @@ #include "passt.h" #include "tap.h" #include "log.h" +#include "fwd.h"

/** * struct opt_hdr - DHCPv6 option header @@ -564,6 +565,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, struct opt_hdr client_id_storage; /* cppcheck-suppress [variableScope,unmatchedSuppression] */ struct opt_ia_na ia_storage; + struct in6_addr *ia_addr = NULL; const struct in6_addr *src; struct msg_hdr mh_storage; const struct msg_hdr *mh; @@ -612,6 +614,17 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, if (ia && ntohs(ia->hdr.l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta))) return -1;

+ /* Select address to assign */ + { + struct inany_addr_entry *e; + + e = fwd_select_addr(c, AF_INET6, INANY_ADDR_DHCP6, + INANY_ADDR_OBSERVED | INANY_ADDR_LINKLOCAL); + if (e) { + ia_addr = &e->addr.a6; + resp.ia_addr.addr = *ia_addr; + } + }

Same comments as for DHCPv4. But more generally, this seems like a lot of work for an interim step. Why not go straight to advertising all the addresses with DHCPv6?

...

resp.hdr.type = TYPE_REPLY; switch (mh->type) { case TYPE_REQUEST: @@ -624,7 +637,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, if (mh->type == TYPE_CONFIRM && server_id) return -1;

- if (dhcpv6_ia_notonlink(data, &first_v6(c)->addr.a6)) { + if (dhcpv6_ia_notonlink(data, ia_addr)) {

dhcpv6_send_ia_notonlink(c, data, &client_id_base, ntohs(client_id->l), mh->xid); @@ -701,5 +714,5 @@ void dhcpv6_init(const struct ctx *c) memcpy(resp_not_on_link.server_id.duid_lladdr, c->our_tap_mac, sizeof(c->our_tap_mac));

- resp.ia_addr.addr = first_v6(c)->addr.a6; + /* Address is set dynamically in dhcpv6() */ } diff --git a/fwd.c b/fwd.c index fc26eba..e41ae9b 100644 --- a/fwd.c +++ b/fwd.c @@ -491,6 +491,88 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini) ((ini->oport == 53) || (ini->oport == 853)); }

+/** + * fwd_first_usable_addr() - Find first assignable address (read-only) + * @c: Execution context + * @af: Address family (AF_INET or AF_INET6) + * @skip: Flags to skip (e.g., INANY_ADDR_LINKLOCAL | INANY_ADDR_OBSERVED) + * + * Prefers CONFIGURED over HOST addresses.

Not sure we explicitly need this. CONFIGURED addresses should generally go before HOST addresses anyway. Once we allow dynamic reconfiguration of addresses we'll need to re-examine our priority policies anyway. I think it will be cleaner to control priority by controlling the order we insert / update them, rather than prioritising when we look up.

...

+ * Return: pointer to first usable address, or NULL + */ +const struct inany_addr_entry * +fwd_first_usable_addr(const struct ctx *c, int af, int skip) +{ + const struct inany_addr_entry *e; + + for_each_addr(c, e, af) { + if (e->flags & skip) + continue; + if (e->flags & INANY_ADDR_CONFIGURED) + return e; + } + + for_each_addr(c, e, af) { + if (e->flags & skip) + continue; + if (e->flags & INANY_ADDR_HOST) + return e; + } + + return NULL; +} + +/** + * fwd_select_addr() - Select address for protocol assignment (DHCP/NDP/DHCPv6) + * @c: Execution context + * @af: Address family (AF_INET or AF_INET6) + * @assign_flag: Flag to set on selected address (INANY_ADDR_DHCP, etc.) + * @skip: Flags to skip (e.g., INANY_ADDR_LINKLOCAL | INANY_ADDR_OBSERVED) + * + * Selection priority: + * 1. Address already having @assign_flag set + * 2. Address with INANY_ADDR_CONFIGURED (user -a option) + * 3. Address with INANY_ADDR_HOST (from template interface) + * + * Return: pointer to selected address entry, or NULL if none found + */ +struct inany_addr_entry *fwd_select_addr(struct ctx *c, int af, + int assign_flag, int skip) +{ + struct inany_addr_entry *e; + + /* First: find address already having assign_flag */ + for_each_addr(c, e, af) { + if (e->flags & skip) + continue; + if (e->flags & assign_flag) + return e; + } + + /* Second: find CONFIGURED address, set assign_flag */ + for_each_addr(c, e, af) { + if (e->flags & skip) + continue; + if (e->flags & INANY_ADDR_CONFIGURED) { + e->flags |= assign_flag;

What's the value of tagging with a persistent flag?

...

+ return e; + } + } + + /* Third: find HOST address, set assign_flag */ + for_each_addr(c, e, af) { + if (e->flags & skip) + continue; + if (e->flags & INANY_ADDR_HOST) { + e->flags |= assign_flag; + return e; + } + } + + return NULL; +} + /** * fwd_guest_addr4() - Get first observed IPv4 guest address * @c: Execution context diff --git a/fwd.h b/fwd.h index 7792582..91b15dc 100644 --- a/fwd.h +++ b/fwd.h @@ -9,6 +9,7 @@

union inany_addr; struct flowside; +struct inany_addr_entry;

/* Number of ports for both TCP and UDP */ #define NUM_PORTS (1U << 16) @@ -63,4 +64,9 @@ void fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr, uint8_t *mac); void fwd_neigh_table_init(const struct ctx *c);

+const struct inany_addr_entry *fwd_first_usable_addr(const struct ctx *c, + int af, int skip); +struct inany_addr_entry *fwd_select_addr(struct ctx *c, int af, + int assign_flag, int skip); + #endif /* FWD_H */ diff --git a/ndp.c b/ndp.c index f60c298..2211f86 100644 --- a/ndp.c +++ b/ndp.c @@ -32,6 +32,7 @@ #include "passt.h" #include "tap.h" #include "log.h" +#include "fwd.h"

#define RT_LIFETIME 65535

@@ -236,7 +237,7 @@ void ndp_unsolicited_na(const struct ctx *c, const struct in6_addr *addr) * @c: Execution context * @dst: IPv6 address to send the RA to */ -static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) +static void ndp_ra(struct ctx *c, const struct in6_addr *dst) { struct ndp_ra ra = { .ih = { @@ -257,7 +258,7 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) .valid_lifetime = ~0U, .pref_lifetime = ~0U, }, - .prefix = first_v6(c)->addr.a6, + .prefix = IN6ADDR_ANY_INIT, .source_ll = { .header = { .type = OPT_SRC_L2_ADDR, @@ -265,8 +266,14 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) }, }, }; + struct inany_addr_entry *e; unsigned char *ptr = NULL;

+ /* Select address/prefix to advertise */ + e = fwd_select_addr(c, AF_INET6, INANY_ADDR_NDP, INANY_ADDR_LINKLOCAL); + if (e) + ra.prefix = e->addr.a6;

As for DHCP if !e we just shouldn't send this RA. As for DHPCPv6, having this interim step seems awkward - we want to send RAs for all the addresses. Btw, now that we're actually tracking prefix length, we should probably avoid RAs if prefix_len != 64 - SLAAC address configuration won't work if that's not the case.

...

+ ptr = &ra.var[0];

if (c->mtu) { @@ -353,7 +360,7 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) * * Return: 0 if not handled here, 1 if handled, -1 on failure */ -int ndp(const struct ctx *c, const struct in6_addr *saddr, +int ndp(struct ctx *c, const struct in6_addr *saddr, struct iov_tail *data) { struct icmp6hdr ih_storage; @@ -407,7 +414,7 @@ static time_t next_ra; * @c: Execution context * @now: Current (monotonic) time */ -void ndp_timer(const struct ctx *c, const struct timespec *now) +void ndp_timer(struct ctx *c, const struct timespec *now) { time_t max_rtr_adv_interval = DEFAULT_MAX_RTR_ADV_INTERVAL; time_t min_rtr_adv_interval, interval; diff --git a/ndp.h b/ndp.h index 56b756d..726aef6 100644 --- a/ndp.h +++ b/ndp.h @@ -8,9 +8,9 @@

struct icmp6hdr;

-int ndp(const struct ctx *c, const struct in6_addr *saddr, +int ndp(struct ctx *c, const struct in6_addr *saddr, struct iov_tail *data); -void ndp_timer(const struct ctx *c, const struct timespec *now); +void ndp_timer(struct ctx *c, const struct timespec *now); void ndp_send_init_req(const struct ctx *c); void ndp_unsolicited_na(const struct ctx *c, const struct in6_addr *addr);

diff --git a/passt.h b/passt.h index 7040781..f5f8c68 100644 --- a/passt.h +++ b/passt.h @@ -71,6 +71,9 @@ enum passt_modes { #define INANY_ADDR_HOST BIT(1) /* From host interface */ #define INANY_ADDR_OBSERVED BIT(2) /* Seen in guest traffic */ #define INANY_ADDR_LINKLOCAL BIT(3) /* Link-local address */ +#define INANY_ADDR_DHCP BIT(4) /* IPv4: assigned via DHCP */ +#define INANY_ADDR_DHCP6 BIT(5) /* IPv6: assigned via DHCPv6 */ +#define INANY_ADDR_NDP BIT(6) /* IPv6: prefix via NDP/RA */

/** * for_each_addr() - Iterate over addresses in unified array -- 2.52.0

-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson

Age (days ago)

Last active (days ago)

List overview

Download

26 comments

2 participants

participants (2)

David Gibson
Jon Maloy

[PATCH v3 00/11] Introduce multiple addresses

tags

participants (2)