[PATCH v3] netlink: Return prefix length for IPv6 addresses in nl_addr_get()
nl_addr_get() was not setting the prefix_len output parameter for
IPv6 addresses, only for IPv4. This meant callers always got 0 for
IPv6, forcing them to use a hardcoded default (64).
Fix by assigning *prefix_len even in the IPv6 case.
We also add another functional change. We now check for if an AF_INET
address is link local, in which case we have to skip it. Although it
is conventional to set the scope of such addresses to RT_SCOPE_LINK,
this is not stated in any RFC, and we cannot trust it to have been set
correctly by the user, just as we cannot trust it to have been set
correctly for any other AF_INET address. We therefore add this check
explicitly on the address itself.
Signed-off-by: Jon Maloy
On Sat, Mar 07, 2026 at 01:41:57PM -0500, Jon Maloy wrote:
nl_addr_get() was not setting the prefix_len output parameter for IPv6 addresses, only for IPv4. This meant callers always got 0 for IPv6, forcing them to use a hardcoded default (64).
Fix by assigning *prefix_len even in the IPv6 case.
We also add another functional change. We now check for if an AF_INET address is link local, in which case we have to skip it. Although it is conventional to set the scope of such addresses to RT_SCOPE_LINK, this is not stated in any RFC, and we cannot trust it to have been set correctly by the user, just as we cannot trust it to have been set correctly for any other AF_INET address. We therefore add this check explicitly on the address itself.
Signed-off-by: Jon Maloy
--- v2: - Simplified according to feedback from S. Brivio - Added test for AF_INET link local property v3: - Corrected claim about convention for IPv4 link local address scope in commit log. --- netlink.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/netlink.c b/netlink.c index 82a2f0c..f0e8fa7 100644 --- a/netlink.c +++ b/netlink.c @@ -752,7 +752,7 @@ int nl_addr_set_ll_nodad(int s, unsigned int ifi) * @ifi: Interface index in outer network namespace * @af: Address family * @addr: Global address to fill - * @prefix_len: Mask or prefix length, to fill (for IPv4) + * @prefix_len: Mask or prefix length, to fill * @addr_l: Link-scoped address to fill (for IPv6) * * Return: 0 on success, negative error code on failure @@ -777,6 +777,7 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af, nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWADDR) { struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh); struct rtattr *rta; + bool link_local; size_t na;
if (ifa->ifa_index != ifi || ifa->ifa_flags & IFA_F_DEPRECATED) @@ -788,18 +789,14 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af, (af == AF_INET6 && rta->rta_type != IFA_ADDRESS)) continue;
- if (af == AF_INET && ifa->ifa_prefixlen > prefix_max) { - memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta)); + link_local = (af == AF_INET6) ? + ifa->ifa_scope >= RT_SCOPE_LINK : + IN4_IS_ADDR_LINKLOCAL(RTA_DATA(rta));
Is ifa->ifa_scope not populatedcorrectly for IPv4 link local addresses?
- prefix_max = *prefix_len = ifa->ifa_prefixlen; - } else if (af == AF_INET6 && addr && - ifa->ifa_scope < RT_SCOPE_LINK && - ifa->ifa_prefixlen > prefix_max) { + if (ifa->ifa_prefixlen > prefix_max && !link_local) { memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta)); - - prefix_max = ifa->ifa_prefixlen; + prefix_max = *prefix_len = ifa->ifa_prefixlen; } - if (addr_l && af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK && ifa->ifa_prefixlen > prefix_max_ll) { -- 2.52.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
The patch looks good to me now, but I have two questions:
On Sat, 7 Mar 2026 13:41:57 -0500
Jon Maloy
nl_addr_get() was not setting the prefix_len output parameter for IPv6 addresses, only for IPv4. This meant callers always got 0 for IPv6, forcing them to use a hardcoded default (64).
Fix by assigning *prefix_len even in the IPv6 case.
We also add another functional change. We now check for if an AF_INET address is link local, in which case we have to skip it.
The reason why the original code skipped IPv6 link-local addresses and not IPv4 link-local ones is that copying a IPv6 link-local address clearly makes no sense and breaks things. For IPv4 I wasn't quite sure, and it seemed to work just like other addresses, so I never took care of excluding them. I tend to think it's correct to exclude them, also for consistency with IPv6, but I'm not quite sure if we risk breaking something. I have some vague recollection of link-local addresses being used in some cloud (probably Google Computing Platform), at least for some Podman tests. I'll try to find some pointers to it. Did you already look into the matter, though? By the way, this makes things inconsistent with nl_addr_dup() (used by the vast majority of users), where IPv4 link-local addresses are copied just like all the other ones.
Although it is conventional to set the scope of such addresses to RT_SCOPE_LINK,
You mean that users manually do that? I think it's kind of rare actually. Does the kernel do that? Or configuration agents such as NetworkManager? I wonder a bit what you consider as "user" here.
this is not stated in any RFC, and we cannot trust it to have been set correctly by the user, just as we cannot trust it to have been set correctly for any other AF_INET address. We therefore add this check explicitly on the address itself.
-- Stefano
On 2026-03-09 05:47, David Gibson wrote:
On Sat, Mar 07, 2026 at 01:41:57PM -0500, Jon Maloy wrote:
nl_addr_get() was not setting the prefix_len output parameter for IPv6 addresses, only for IPv4. This meant callers always got 0 for IPv6, forcing them to use a hardcoded default (64).
Fix by assigning *prefix_len even in the IPv6 case.
We also add another functional change. We now check for if an AF_INET address is link local, in which case we have to skip it. Although it is conventional to set the scope of such addresses to RT_SCOPE_LINK, this is not stated in any RFC, and we cannot trust it to have been set correctly by the user, just as we cannot trust it to have been set correctly for any other AF_INET address. We therefore add this check explicitly on the address itself.
Signed-off-by: Jon Maloy
--- v2: - Simplified according to feedback from S. Brivio - Added test for AF_INET link local property v3: - Corrected claim about convention for IPv4 link local address scope in commit log. --- netlink.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/netlink.c b/netlink.c index 82a2f0c..f0e8fa7 100644 --- a/netlink.c +++ b/netlink.c @@ -752,7 +752,7 @@ int nl_addr_set_ll_nodad(int s, unsigned int ifi) * @ifi: Interface index in outer network namespace * @af: Address family * @addr: Global address to fill - * @prefix_len: Mask or prefix length, to fill (for IPv4) + * @prefix_len: Mask or prefix length, to fill * @addr_l: Link-scoped address to fill (for IPv6) * * Return: 0 on success, negative error code on failure @@ -777,6 +777,7 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af, nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWADDR) { struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh); struct rtattr *rta; + bool link_local; size_t na;
if (ifa->ifa_index != ifi || ifa->ifa_flags & IFA_F_DEPRECATED) @@ -788,18 +789,14 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af, (af == AF_INET6 && rta->rta_type != IFA_ADDRESS)) continue;
- if (af == AF_INET && ifa->ifa_prefixlen > prefix_max) { - memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta)); + link_local = (af == AF_INET6) ? + ifa->ifa_scope >= RT_SCOPE_LINK : + IN4_IS_ADDR_LINKLOCAL(RTA_DATA(rta));
Is ifa->ifa_scope not populatedcorrectly for IPv4 link local addresses?
It isn't populated by the kernel, and my conclusion was we cannot trust it is set correctly. Also see my response to Stefano about this. /jon
- prefix_max = *prefix_len = ifa->ifa_prefixlen; - } else if (af == AF_INET6 && addr && - ifa->ifa_scope < RT_SCOPE_LINK && - ifa->ifa_prefixlen > prefix_max) { + if (ifa->ifa_prefixlen > prefix_max && !link_local) { memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta)); - - prefix_max = ifa->ifa_prefixlen; + prefix_max = *prefix_len = ifa->ifa_prefixlen; } - if (addr_l && af == AF_INET6 && ifa->ifa_scope == RT_SCOPE_LINK && ifa->ifa_prefixlen > prefix_max_ll) { -- 2.52.0
On 2026-03-09 05:56, Stefano Brivio wrote:
The patch looks good to me now, but I have two questions:
On Sat, 7 Mar 2026 13:41:57 -0500 Jon Maloy
wrote: nl_addr_get() was not setting the prefix_len output parameter for IPv6 addresses, only for IPv4. This meant callers always got 0 for IPv6, forcing them to use a hardcoded default (64).
Fix by assigning *prefix_len even in the IPv6 case.
We also add another functional change. We now check for if an AF_INET address is link local, in which case we have to skip it.
The reason why the original code skipped IPv6 link-local addresses and not IPv4 link-local ones is that copying a IPv6 link-local address clearly makes no sense and breaks things.
For IPv4 I wasn't quite sure, and it seemed to work just like other addresses, so I never took care of excluding them.
I tend to think it's correct to exclude them, also for consistency with IPv6, but I'm not quite sure if we risk breaking something. I have some vague recollection of link-local addresses being used in some cloud (probably Google Computing Platform), at least for some Podman tests. I'll try to find some pointers to it.
Did you already look into the matter, though?
Honestly I though it was just an oversight.
By the way, this makes things inconsistent with nl_addr_dup() (used by the vast majority of users), where IPv4 link-local addresses are copied just like all the other ones.
Although it is conventional to set the scope of such addresses to RT_SCOPE_LINK,
You mean that users manually do that? I think it's kind of rare actually. Does the kernel do that? Or configuration agents such as NetworkManager? I wonder a bit what you consider as "user" here.
It is normally set by NetworkManager, but I don't think that means we can trust it at 100%. Anyway, I will remove this change in an update, as we agreed upon this morning. /jon
this is not stated in any RFC, and we cannot trust it to have been set correctly by the user, just as we cannot trust it to have been set correctly for any other AF_INET address. We therefore add this check explicitly on the address itself.
On Mon, 9 Mar 2026 10:56:06 +0100
Stefano Brivio
[...]
The reason why the original code skipped IPv6 link-local addresses and not IPv4 link-local ones is that copying a IPv6 link-local address clearly makes no sense and breaks things.
For IPv4 I wasn't quite sure, and it seemed to work just like other addresses, so I never took care of excluding them.
I tend to think it's correct to exclude them, also for consistency with IPv6, but I'm not quite sure if we risk breaking something. I have some vague recollection of link-local addresses being used in some cloud (probably Google Computing Platform), at least for some Podman tests. I'll try to find some pointers to it.
Paul found this, from a GCE environment used in Podman's CI: https://api.cirrus-ci.com/v1/task/5902961064280064/logs/journal.log (look for "Route IPv4 info"): Mar 09 16:58:37 cirrus-task-5902961064280064 cloud-init[573]: ci-info: +++++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++++ Mar 09 16:58:37 cirrus-task-5902961064280064 cloud-init[573]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ Mar 09 16:58:37 cirrus-task-5902961064280064 cloud-init[573]: ci-info: | Route | Destination | Gateway | Genmask | Interface | Flags | Mar 09 16:58:37 cirrus-task-5902961064280064 cloud-init[573]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ Mar 09 16:58:37 cirrus-task-5902961064280064 cloud-init[573]: ci-info: | 0 | 0.0.0.0 | 10.128.0.1 | 0.0.0.0 | ens4 | UG | Mar 09 16:58:37 cirrus-task-5902961064280064 cloud-init[573]: ci-info: | 1 | 10.128.0.1 | 0.0.0.0 | 255.255.255.255 | ens4 | UH | Mar 09 16:58:37 cirrus-task-5902961064280064 cloud-init[573]: ci-info: | 2 | 169.254.169.254 | 10.128.0.1 | 255.255.255.255 | ens4 | UGH | Mar 09 16:58:37 cirrus-task-5902961064280064 cloud-init[573]: ci-info: +-------+-----------------+------------+-----------------+-----------+-------+ ...so there's a route to a link-local address, because that happens to be the DNS server. The host doesn't have a link-local IPv4 address though, so nothing we would risk missing with this kind of change. In any case, as agreed, better to leave that out as you did in v4, because it's not a trivial task to exclude that we're going to break something at this point. -- Stefano
On Mon, Mar 09, 2026 at 02:04:40PM -0400, Jon Maloy wrote:
On 2026-03-09 05:56, Stefano Brivio wrote:
The patch looks good to me now, but I have two questions:
On Sat, 7 Mar 2026 13:41:57 -0500 Jon Maloy
wrote: nl_addr_get() was not setting the prefix_len output parameter for IPv6 addresses, only for IPv4. This meant callers always got 0 for IPv6, forcing them to use a hardcoded default (64).
Fix by assigning *prefix_len even in the IPv6 case.
We also add another functional change. We now check for if an AF_INET address is link local, in which case we have to skip it.
The reason why the original code skipped IPv6 link-local addresses and not IPv4 link-local ones is that copying a IPv6 link-local address clearly makes no sense and breaks things.
For IPv4 I wasn't quite sure, and it seemed to work just like other addresses, so I never took care of excluding them.
I tend to think it's correct to exclude them, also for consistency with IPv6, but I'm not quite sure if we risk breaking something. I have some vague recollection of link-local addresses being used in some cloud (probably Google Computing Platform), at least for some Podman tests. I'll try to find some pointers to it.
Did you already look into the matter, though?
Honestly I though it was just an oversight.
By the way, this makes things inconsistent with nl_addr_dup() (used by the vast majority of users), where IPv4 link-local addresses are copied just like all the other ones.
Although it is conventional to set the scope of such addresses to RT_SCOPE_LINK,
You mean that users manually do that? I think it's kind of rare actually. Does the kernel do that? Or configuration agents such as NetworkManager? I wonder a bit what you consider as "user" here.
It is normally set by NetworkManager,
Or, presumably by ip(8).
but I don't think that means we can trust it at 100%.
Actually, I think this means we should trust it more. I'd see this as an explicit indication that regardless of any normal conventions, it's being treated as a link-local address in the configuration of this specific host, so we should too. That's moot for the current patch, based on our earlier discussions, but I think it's a relevant point for what we do in future. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
participants (3)
-
David Gibson
-
Jon Maloy
-
Stefano Brivio