There's no point in letting a container perform duplicate address
detection, as we'll silently discard neighbour solicitations with
unspecified source addresses anyway, without relaying them to anybody.

And we realised that it's not harmless, see the whole discussion around
https://github.com/containers/podman/pull/23561#discussion_r1711639663:
we can't communicate with the container right away because of that,
which is surely annoying for tests, but it could also be an issue for
use cases with very short-lived containers or namespaces.

Disabling DAD via procfs configuration would be simpler than all this,
but we don't own the namespace (unless we spawn a shell), so we
shouldn't mess with procfs entries, assuming it's even possible.

Set the nodad attribute, and prevent DAD from being triggered on link
up, before we can set that attribute.

v2:
  - in 4/7, instead of doing the whole nl_route_dup()-vendored dance to
    keep addresses in a single buffer, send NLM_F_REPLACE requests right
    away, but use nl_send() instead of nl_do(), and check for answers to
    our further requests later. Use warn() instead of die() if we can't
    set nodad attributes

  - in 5/7, make nl_addr_get_ll() get a pointer to struct in6_addr
    instead of a generic void pointer, and warn(), don't die(), if it
    fails

Stefano Brivio (7):
  netlink: Fix typo in function comment for nl_addr_get()
  netlink, pasta: Split MTU setting functionality out of nl_link_up()
  netlink, pasta: Turn nl_link_up() into a generic function to set link
    flags
  netlink, pasta: Disable DAD for link-local addresses on namespace
    interface
  netlink, pasta: Fetch link-local address from namespace interface
    once it's up
  pasta: Disable neighbour solicitations on device up to prevent DAD
  netlink: Fix typo in function comment for nl_addr_set()

 netlink.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 netlink.h |   6 ++-
 pasta.c   |  29 ++++++++++-
 3 files changed, 164 insertions(+), 15 deletions(-)

-- 
2.43.0
Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/netlink.c b/netlink.c
index 093de26..e6a315e 100644
--- a/netlink.c
+++ b/netlink.c
@@ -682,7 +682,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
  * @prefix_len:	Mask or prefix length, to fill (for IPv4)
  * @addr_l:	Link-scoped address to fill (for IPv6)
  *
- * Return: 9 on success, negative error code on failure
+ * Return: 0 on success, negative error code on failure
  */
 int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int *prefix_len, void *addr_l)
-- 
2.43.0
As we'll use nl_link_up() for more than just bringing up devices, it
will become awkward to carry empty MTU values around whenever we call
it.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 netlink.c | 35 +++++++++++++++++++++++++----------
 netlink.h |  3 ++-
 pasta.c   |  7 +++++--
 3 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/netlink.c b/netlink.c
index e6a315e..e33765e 100644
--- a/netlink.c
+++ b/netlink.c
@@ -942,14 +942,14 @@ int nl_link_set_mac(int s, unsigned int ifi, const void *mac)
 }
 
 /**
- * nl_link_up() - Bring link up
+ * nl_link_set_mtu() - Set link MTU
  * @s:		Netlink socket
  * @ifi:	Interface index
- * @mtu:	If non-zero, set interface MTU
+ * @mtu:	Interface MTU
  *
  * Return: 0 on success, negative error code on failure
  */
-int nl_link_up(int s, unsigned int ifi, int mtu)
+int nl_link_set_mtu(int s, unsigned int ifi, int mtu)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -959,17 +959,32 @@ int nl_link_up(int s, unsigned int ifi, int mtu)
 	} req = {
 		.ifm.ifi_family	  = AF_UNSPEC,
 		.ifm.ifi_index	  = ifi,
-		.ifm.ifi_flags	  = IFF_UP,
-		.ifm.ifi_change	  = IFF_UP,
 		.rta.rta_type	  = IFLA_MTU,
 		.rta.rta_len	  = RTA_LENGTH(sizeof(unsigned int)),
 		.mtu		  = mtu,
 	};
-	ssize_t len = sizeof(req);
 
-	if (!mtu)
-		/* Shorten request to drop MTU attribute */
-		len = offsetof(struct req_t, rta);
+	return nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req));
+}
+
+/**
+ * nl_link_up() - Bring link up
+ * @s:		Netlink socket
+ * @ifi:	Interface index
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int nl_link_up(int s, unsigned int ifi)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct ifinfomsg ifm;
+	} req = {
+		.ifm.ifi_family	  = AF_UNSPEC,
+		.ifm.ifi_index	  = ifi,
+		.ifm.ifi_flags	  = IFF_UP,
+		.ifm.ifi_change	  = IFF_UP,
+	};
 
-	return nl_do(s, &req, RTM_NEWLINK, 0, len);
+	return nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req));
 }
diff --git a/netlink.h b/netlink.h
index 3a1f0de..87d27ae 100644
--- a/netlink.h
+++ b/netlink.h
@@ -23,6 +23,7 @@ int nl_addr_dup(int s_src, unsigned int ifi_src,
 		int s_dst, unsigned int ifi_dst, sa_family_t af);
 int nl_link_get_mac(int s, unsigned int ifi, void *mac);
 int nl_link_set_mac(int s, unsigned int ifi, const void *mac);
-int nl_link_up(int s, unsigned int ifi, int mtu);
+int nl_link_set_mtu(int s, unsigned int ifi, int mtu);
+int nl_link_up(int s, unsigned int ifi);
 
 #endif /* NETLINK_H */
diff --git a/pasta.c b/pasta.c
index 615ff7b..3a0652e 100644
--- a/pasta.c
+++ b/pasta.c
@@ -288,7 +288,7 @@ void pasta_ns_conf(struct ctx *c)
 {
 	int rc = 0;
 
-	rc = nl_link_up(nl_sock_ns, 1 /* lo */, 0);
+	rc = nl_link_up(nl_sock_ns, 1 /* lo */);
 	if (rc < 0)
 		die("Couldn't bring up loopback interface in namespace: %s",
 		    strerror(-rc));
@@ -303,7 +303,10 @@ void pasta_ns_conf(struct ctx *c)
 		    strerror(-rc));
 
 	if (c->pasta_conf_ns) {
-		nl_link_up(nl_sock_ns, c->pasta_ifi, c->mtu);
+		if (c->mtu != -1)
+			nl_link_set_mtu(nl_sock_ns, c->pasta_ifi, c->mtu);
+
+		nl_link_up(nl_sock_ns, c->pasta_ifi);
 
 		if (c->ifi4) {
 			if (c->ip4.no_copy_addrs) {
-- 
2.43.0
In the next patches, we'll reuse it to set flags other than IFF_UP.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 netlink.c | 11 +++++++----
 netlink.h |  3 ++-
 pasta.c   |  4 ++--
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/netlink.c b/netlink.c
index e33765e..873e6c7 100644
--- a/netlink.c
+++ b/netlink.c
@@ -968,13 +968,16 @@ int nl_link_set_mtu(int s, unsigned int ifi, int mtu)
 }
 
 /**
- * nl_link_up() - Bring link up
+ * nl_link_set_flags() - Set link flags
  * @s:		Netlink socket
  * @ifi:	Interface index
+ * @set:	Device flags to set
+ * @change:	Mask of device flag changes
  *
  * Return: 0 on success, negative error code on failure
  */
-int nl_link_up(int s, unsigned int ifi)
+int nl_link_set_flags(int s, unsigned int ifi,
+		      unsigned int set, unsigned int change)
 {
 	struct req_t {
 		struct nlmsghdr nlh;
@@ -982,8 +985,8 @@ int nl_link_up(int s, unsigned int ifi)
 	} req = {
 		.ifm.ifi_family	  = AF_UNSPEC,
 		.ifm.ifi_index	  = ifi,
-		.ifm.ifi_flags	  = IFF_UP,
-		.ifm.ifi_change	  = IFF_UP,
+		.ifm.ifi_flags	  = set,
+		.ifm.ifi_change	  = change,
 	};
 
 	return nl_do(s, &req, RTM_NEWLINK, 0, sizeof(req));
diff --git a/netlink.h b/netlink.h
index 87d27ae..178f8ae 100644
--- a/netlink.h
+++ b/netlink.h
@@ -24,6 +24,7 @@ int nl_addr_dup(int s_src, unsigned int ifi_src,
 int nl_link_get_mac(int s, unsigned int ifi, void *mac);
 int nl_link_set_mac(int s, unsigned int ifi, const void *mac);
 int nl_link_set_mtu(int s, unsigned int ifi, int mtu);
-int nl_link_up(int s, unsigned int ifi);
+int nl_link_set_flags(int s, unsigned int ifi,
+		      unsigned int set, unsigned int change);
 
 #endif /* NETLINK_H */
diff --git a/pasta.c b/pasta.c
index 3a0652e..96545b1 100644
--- a/pasta.c
+++ b/pasta.c
@@ -288,7 +288,7 @@ void pasta_ns_conf(struct ctx *c)
 {
 	int rc = 0;
 
-	rc = nl_link_up(nl_sock_ns, 1 /* lo */);
+	rc = nl_link_set_flags(nl_sock_ns, 1 /* lo */, IFF_UP, IFF_UP);
 	if (rc < 0)
 		die("Couldn't bring up loopback interface in namespace: %s",
 		    strerror(-rc));
@@ -306,7 +306,7 @@ void pasta_ns_conf(struct ctx *c)
 		if (c->mtu != -1)
 			nl_link_set_mtu(nl_sock_ns, c->pasta_ifi, c->mtu);
 
-		nl_link_up(nl_sock_ns, c->pasta_ifi);
+		nl_link_set_flags(nl_sock_ns, c->pasta_ifi, IFF_UP, IFF_UP);
 
 		if (c->ifi4) {
 			if (c->ip4.no_copy_addrs) {
-- 
2.43.0
It makes no sense for a container or a guest to try and perform
duplicate address detection for their link-local address, as we won't
relay neighbour solicitations with an unspecified source address
anyway.

While they perform duplicate address detection, the link-local address
is not usable, which prevents us from bringing up containers, in
particular, and communicating with them right away via IPv6.

This is not enough to prevent DAD and reach the container right away:
we'll need a couple more patches.

As we send NLM_F_REPLACE requests right away, while we still have to
read out other addresses on the same socket, we can't use nl_do():
keep a count of messages we send (addresses we change) and deal with
the answers to those NLM_F_REPLACE requests in a separate loop, later.

Link: https://github.com/containers/podman/pull/23561#discussion_r1711639663
Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 netlink.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 netlink.h |  1 +
 pasta.c   |  6 ++++++
 3 files changed, 62 insertions(+)

diff --git a/netlink.c b/netlink.c
index 873e6c7..59f2fd9 100644
--- a/netlink.c
+++ b/netlink.c
@@ -673,6 +673,61 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	return 0;
 }
 
+/**
+ * nl_addr_set_ll_nodad() - Set IFA_F_NODAD on IPv6 link-local addresses
+ * @s:		Netlink socket
+ * @ifi:	Interface index in target namespace
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int nl_addr_set_ll_nodad(int s, unsigned int ifi)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct ifaddrmsg ifa;
+	} req = {
+		.ifa.ifa_family    = AF_INET6,
+		.ifa.ifa_index     = ifi,
+	};
+	unsigned ll_addrs = 0;
+	struct nlmsghdr *nh;
+	char buf[NLBUFSIZ];
+	ssize_t status;
+	uint32_t seq;
+
+	seq = nl_send(s, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
+	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWADDR) {
+		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
+		struct rtattr *rta;
+		size_t na;
+
+		if (ifa->ifa_index != ifi || ifa->ifa_scope != RT_SCOPE_LINK)
+			continue;
+
+		ifa->ifa_flags |= IFA_F_NODAD;
+
+		for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
+		     rta = RTA_NEXT(rta, na)) {
+			/* If 32-bit flags are used, add IFA_F_NODAD there */
+			if (rta->rta_type == IFA_FLAGS)
+				*(uint32_t *)RTA_DATA(rta) |= IFA_F_NODAD;
+		}
+
+		nl_send(s, nh, RTM_NEWADDR, NLM_F_REPLACE, nh->nlmsg_len);
+		ll_addrs++;
+	}
+
+	if (status < 0)
+		return status;
+
+	seq += ll_addrs;
+
+	nl_foreach(nh, status, s, buf, seq)
+		warn("netlink: Unexpected response message");
+
+	return status;
+}
+
 /**
  * nl_addr_get() - Get most specific global address, given interface and family
  * @s:		Netlink socket
diff --git a/netlink.h b/netlink.h
index 178f8ae..66a44ad 100644
--- a/netlink.h
+++ b/netlink.h
@@ -19,6 +19,7 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int *prefix_len, void *addr_l);
 int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		const void *addr, int prefix_len);
+int nl_addr_set_ll_nodad(int s, unsigned int ifi);
 int nl_addr_dup(int s_src, unsigned int ifi_src,
 		int s_dst, unsigned int ifi_dst, sa_family_t af);
 int nl_link_get_mac(int s, unsigned int ifi, void *mac);
diff --git a/pasta.c b/pasta.c
index 96545b1..17eed15 100644
--- a/pasta.c
+++ b/pasta.c
@@ -340,6 +340,12 @@ void pasta_ns_conf(struct ctx *c)
 		}
 
 		if (c->ifi6) {
+			rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi);
+			if (rc < 0) {
+				warn("Can't set nodad for LL in namespace: %s",
+				     strerror(-rc));
+			}
+
 			if (c->ip6.no_copy_addrs) {
 				rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
 						 AF_INET6, &c->ip6.addr, 64);
-- 
2.43.0
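The flag handling in the loop above touches two places: the legacy
8-bit ifa_flags field, and the 32-bit IFA_FLAGS attribute if the kernel
included one in the dump. A minimal standalone sketch of just that
step, outside of passt, could look like this. addr_msg_set_nodad() is
a hypothetical name, and the sketch assumes Linux UAPI headers that
define IFA_FLAGS; it uses memcpy() rather than a pointer cast only to
stay clear of alignment assumptions.

```c
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

/* Hypothetical helper: mark an RTM_NEWADDR message as "no DAD", both in
 * the 8-bit ifa_flags field and, if present, in the 32-bit IFA_FLAGS
 * attribute. Returns the number of IFA_FLAGS attributes updated.
 */
static int addr_msg_set_nodad(struct nlmsghdr *nh)
{
	struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
	struct rtattr *rta;
	int updated = 0;
	int na;

	ifa->ifa_flags |= IFA_F_NODAD;

	for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
	     rta = RTA_NEXT(rta, na)) {
		uint32_t flags;

		if (rta->rta_type != IFA_FLAGS ||
		    RTA_PAYLOAD(rta) < sizeof(flags))
			continue;

		/* Read, set the bit, write back */
		memcpy(&flags, RTA_DATA(rta), sizeof(flags));
		flags |= IFA_F_NODAD;
		memcpy(RTA_DATA(rta), &flags, sizeof(flags));
		updated++;
	}

	return updated;
}
```

The message modified this way could then be sent back as RTM_NEWADDR
with NLM_F_REPLACE, which is what the patch does with nl_send().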
On Thu, Aug 15, 2024 at 10:36:46AM +0200, Stefano Brivio wrote:
> [...]
> +		nl_send(s, nh, RTM_NEWADDR, NLM_F_REPLACE, nh->nlmsg_len);
> +		ll_addrs++;
> +	}
> +
> +	if (status < 0)
> +		return status;

Ah... one gotcha with the nl_send() in the loop.  We should make sure
we get the responses from any of those we sent, even if the original
request failed.  Otherwise we'll be out of sync on the netlink socket
again.

> +	seq += ll_addrs;
> +
> +	nl_foreach(nh, status, s, buf, seq)
> +		warn("netlink: Unexpected response message");

I don't think this will work right if there's > 1 address.  It will be
looking for the last sequence number on the first iteration and will
die in nl_status() when it mismatches.

Maybe just loop on nl_next() until you get the last seq number, then
call nl_status()?  That also means you could just save the seq each
time you nl_send(), overwriting the previous one, rather than relying
on the fact that we allocate seqs, well, sequentially.

> +
> +	return status;
> +}
> [...]

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Thu, 15 Aug 2024 20:38:17 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> On Thu, Aug 15, 2024 at 10:36:46AM +0200, Stefano Brivio wrote:
> > [...]
> > +		nl_send(s, nh, RTM_NEWADDR, NLM_F_REPLACE, nh->nlmsg_len);
> > +		ll_addrs++;
> > +	}
> > +
> > +	if (status < 0)
> > +		return status;
> 
> Ah... one gotcha with the nl_send() in the loop.  We should make sure
> we get the responses from any of those we sent, even if the original
> request failed.  Otherwise we'll be out of sync on the netlink socket
> again.

I'm ignoring the return code of nl_send(), so, minus the issue you're
raising about nl_foreach() below, that should already be sorted, right?

> > +	seq += ll_addrs;
> > +
> > +	nl_foreach(nh, status, s, buf, seq)
> > +		warn("netlink: Unexpected response message");
> 
> I don't think this will work right if there's > 1 address.  It will be
> looking for the last sequence number on the first iteration and will
> die in nl_status() when it mismatches.

Ah, oops, right.

> Maybe just loop on nl_next() until you get the last seq number, then
> call nl_status()?

How do I check for errors on the answers before the next one? I mean,
nl_foreach() should fit here, it's just that I need to start from the
right sequence number.

> That also means you could just save the seq each time you nl_send(),
> overwriting the previous one, rather than relying on the fact that we
> allocate seqs, well, sequentially.

I don't understand how this fits with calling nl_next() until I get to
the last sequence number.

Letting that aside, can't I simply use nl_foreach(), but start with the
sequence of the first nl_send() instead of the last one?

-- 
Stefano
On Thu, Aug 15, 2024 at 12:59:32PM +0200, Stefano Brivio wrote:
> On Thu, 15 Aug 2024 20:38:17 +1000
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
> 
> > On Thu, Aug 15, 2024 at 10:36:46AM +0200, Stefano Brivio wrote:
> > > [...]
> > > +	if (status < 0)
> > > +		return status;
> > 
> > Ah... one gotcha with the nl_send() in the loop.  We should make sure
> > we get the responses from any of those we sent, even if the original
> > request failed.  Otherwise we'll be out of sync on the netlink socket
> > again.
> 
> I'm ignoring the return code of nl_send(), so, minus the issue you're
> raising about nl_foreach() below, that should already be sorted, right?

No.  The return code from nl_send() is mostly irrelevant - it's just
the sequence number (other errors die()).  But the point is you've
queued requests, so the kernel will queue responses and if you exit
the function here, nothing will consume them.

> > > +	seq += ll_addrs;
> > > +
> > > +	nl_foreach(nh, status, s, buf, seq)
> > > +		warn("netlink: Unexpected response message");
> > 
> > I don't think this will work right if there's > 1 address.  It will be
> > looking for the last sequence number on the first iteration and will
> > die in nl_status() when it mismatches.
> 
> Ah, oops, right.
> 
> > Maybe just loop on nl_next() until you get the last seq number, then
> > call nl_status()?
> 
> How do I check for errors on the answers before the next one? I mean,
> nl_foreach() should fit here, it's just that I need to start from the
> right sequence number.
> 
> > That also means you could just save the seq each time you nl_send(),
> > overwriting the previous one, rather than relying on the fact that we
> > allocate seqs, well, sequentially.
> 
> I don't understand how this fits with calling nl_next() until I get to
> the last sequence number.
> 
> Letting that aside, can't I simply use nl_foreach(), but start with the
> sequence of the first nl_send() instead of the last one?

Uh.. yeah, it's a bit fiddly.  Especially since in those foreach loops
status does double duty as the remaining data in the current message
and as the status code.

# Option 1

Assuming contiguous sequence numbers, which is true for now.

- Change the nl_send() within the first loop to
	last_seq = nl_send(...)

Then immediately after the first loop

	int status2 = status;

	for (seq++; seq <= last_seq; seq++) {
		nl_foreach(nh, status2, s, buf, seq)
			;
		if (status == 0)
			status = status2;
	}

At this point you will have consumed all the responses and status will
have the first reported error code.

# Option 2

Refactor nl_status() to have a version that reports sequence number
instead of taking & checking it.  Loop on nl_next() until
nl_status_variant() returns <= 0 *and* the last sequence number.

# Option 3

Open-coded version of (2)

	ssize_t err = status;

	do {
		nh = nl_next(s, buf, nh, &status);
		if (err == 0 && nh->nlmsg_type == NLMSG_ERROR) {
			struct nlmsgerr *errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
			err = errmsg->error;
		}
	} while (nh->nlmsg_seq != last_seq ||
		 (nh->nlmsg_type != NLMSG_DONE && nh->nlmsg_type != NLMSG_ERROR));

And at this point, again, you've consumed all the responses and 'err'
has the first error code.

I think this is roughly what I was suggesting originally, but it is
messier than I thought.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Fri, 16 Aug 2024 10:55:45 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> On Thu, Aug 15, 2024 at 12:59:32PM +0200, Stefano Brivio wrote:
> > On Thu, 15 Aug 2024 20:38:17 +1000
> > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> > 
> > > On Thu, Aug 15, 2024 at 10:36:46AM +0200, Stefano Brivio wrote:
> > > > [...]
> > > > +	if (status < 0)
> > > > +		return status;
> > > 
> > > Ah... one gotcha with the nl_send() in the loop.  We should make sure
> > > we get the responses from any of those we sent, even if the original
> > > request failed.  Otherwise we'll be out of sync on the netlink socket
> > > again.
> > 
> > I'm ignoring the return code of nl_send(), so, minus the issue you're
> > raising about nl_foreach() below, that should already be sorted, right?
> 
> No.  The return code from nl_send() is mostly irrelevant - it's just
> the sequence number (other errors die()).  But the point is you've
> queued requests, so the kernel will queue responses and if you exit
> the function here, nothing will consume them.

Oh, that's what I missed: you were referring to this return statement.
Sure, I understand that we need to consume those, hence the
nl_foreach() later, but I missed the fact that, of course, we wouldn't
necessarily reach it.

> > [...]
> > Letting that aside, can't I simply use nl_foreach(), but start with the
> > sequence of the first nl_send() instead of the last one?
> 
> Uh.. yeah, it's a bit fiddly.  Especially since in those foreach loops
> status does double duty as the remaining data in the current message
> and as the status code.
> 
> # Option 1
> 
> Assuming contiguous sequence numbers, which is true for now.
> 
> - Change the nl_send() within the first loop to
> 	last_seq = nl_send(...)
> 
> Then immediately after the first loop
> 
> 	int status2 = status;
> 
> 	for (seq++; seq <= last_seq; seq++) {
> 		nl_foreach(nh, status2, s, buf, seq)
> 			;
> 		if (status == 0)
> 			status = status2;
> 	}
> 
> At this point you will have consumed all the responses and status will
> have the first reported error code.

This looks to me like the easiest to follow, thanks for the thorough
descriptions! I'm going with this one in v3.

> # Option 2
> 
> Refactor nl_status() to have a version that reports sequence number
> instead of taking & checking it.  Loop on nl_next() until
> nl_status_variant() returns <= 0 *and* the last sequence number.
> 
> # Option 3
> 
> Open-coded version of (2)
> 
> 	ssize_t err = status;
> 
> 	do {
> 		nh = nl_next(s, buf, nh, &status);
> 		if (err == 0 && nh->nlmsg_type == NLMSG_ERROR) {
> 			struct nlmsgerr *errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
> 			err = errmsg->error;
> 		}
> 	} while (nh->nlmsg_seq != last_seq ||
> 		 (nh->nlmsg_type != NLMSG_DONE && nh->nlmsg_type != NLMSG_ERROR));
> 
> And at this point, again, you've consumed all the responses and 'err'
> has the first error code.
> 
> I think this is roughly what I was suggesting originally, but it is
> messier than I thought.

-- 
Stefano
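For reference, the status handling behind Option 1 (read one response
per queued request, no matter what, and report the first error seen)
can be modelled without any netlink socket. This is only a sketch
under that assumption: drain_keep_first_error() and its array of
per-request results are hypothetical stand-ins for the nl_foreach()
calls, not passt code.

```c
#include <stddef.h>

/* Hypothetical model of "drain everything, keep the first error":
 * @results[i] stands for the status reported for the i-th queued request
 * (0 or a negative error code), @status is the outcome of the initial
 * dump. Every entry is consumed, even if @status is already an error.
 */
static int drain_keep_first_error(const int *results, size_t n, int status,
				  size_t *consumed)
{
	size_t i;

	*consumed = 0;
	for (i = 0; i < n; i++) {
		int status2 = results[i];	/* one response per request */

		(*consumed)++;
		if (status == 0)
			status = status2;
	}

	return status;
}
```

The point of the model is that the loop never exits early, so nothing
is left queued on the socket, while the return value still reflects
the earliest failure.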
As soon as we bring up the interface, the Linux kernel will set up a
link-local address for it, so we can fetch it and start using it right
away, if we need a link-local address to communicate with the container
before we see any traffic coming from it.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 netlink.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 netlink.h |  1 +
 pasta.c   |  7 +++++++
 3 files changed, 55 insertions(+)

diff --git a/netlink.c b/netlink.c
index 59f2fd9..06a3816 100644
--- a/netlink.c
+++ b/netlink.c
@@ -794,6 +794,53 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 	return status;
 }
 
+/**
+ * nl_addr_get_ll() - Get first IPv6 link-local address for a given interface
+ * @s:		Netlink socket
+ * @ifi:	Interface index in outer network namespace
+ * @addr:	Link-local address to fill
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct ifaddrmsg ifa;
+	} req = {
+		.ifa.ifa_family    = AF_INET6,
+		.ifa.ifa_index     = ifi,
+	};
+	struct nlmsghdr *nh;
+	bool found = false;
+	char buf[NLBUFSIZ];
+	ssize_t status;
+	uint32_t seq;
+
+	seq = nl_send(s, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
+	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWADDR) {
+		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
+		struct rtattr *rta;
+		size_t na;
+
+		if (ifa->ifa_index != ifi || ifa->ifa_scope != RT_SCOPE_LINK ||
+		    found)
+			continue;
+
+		for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
+		     rta = RTA_NEXT(rta, na)) {
+			if (rta->rta_type != IFA_ADDRESS)
+				continue;
+
+			if (!found) {
+				memcpy(addr, RTA_DATA(rta), RTA_PAYLOAD(rta));
+				found = true;
+			}
+		}
+	}
+	return status;
+}
+
 /**
  * nl_add_set() - Set IP addresses for given interface and address family
  * @s:		Netlink socket
diff --git a/netlink.h b/netlink.h
index 66a44ad..b51e99c 100644
--- a/netlink.h
+++ b/netlink.h
@@ -19,6 +19,7 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int *prefix_len, void *addr_l);
 int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		const void *addr, int prefix_len);
+int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr);
 int nl_addr_set_ll_nodad(int s, unsigned int ifi);
 int nl_addr_dup(int s_src, unsigned int ifi_src,
 		int s_dst, unsigned int ifi_dst, sa_family_t af);
diff --git a/pasta.c b/pasta.c
index 17eed15..e8883bd 100644
--- a/pasta.c
+++ b/pasta.c
@@ -340,6 +340,13 @@ void pasta_ns_conf(struct ctx *c)
 		}
 
 		if (c->ifi6) {
+			rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi,
+					    &c->ip6.addr_ll_seen);
+			if (rc < 0) {
+				warn("Can't get LL address from namespace: %s",
+				     strerror(-rc));
+			}
+
 			rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi);
 			if (rc < 0) {
 				warn("Can't set nodad for LL in namespace: %s",
-- 
2.43.0
On Thu, Aug 15, 2024 at 10:36:47AM +0200, Stefano Brivio wrote:
> As soon as we bring up the interface, the Linux kernel will set up a
> link-local address for it, so we can fetch it and start using it right
> away, if we need a link-local address to communicate with the container
> before we see any traffic coming from it.
>
> Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>

Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
As soon as the kernel notifier for IPv6 address configuration
(addrconf_notify()) sees that we bring the target interface up
(NETDEV_UP), it will schedule duplicate address detection, so, by
itself, setting the nodad flag later is useless, because that won't
stop a detection that's already in progress.

However, if we disable neighbour solicitations with IFF_NOARP (which
is a misnomer for IPv6 interfaces, but there's no possibility of
mixing things up), the notifier will not trigger DAD, because it can't
be done, of course, without neighbour solicitations.

Set IFF_NOARP as we bring up the device, and drop it after we had a
chance to set the nodad attribute on the link.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 pasta.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/pasta.c b/pasta.c
index e8883bd..1142f03 100644
--- a/pasta.c
+++ b/pasta.c
@@ -303,10 +303,15 @@ void pasta_ns_conf(struct ctx *c)
 		    strerror(-rc));
 
 	if (c->pasta_conf_ns) {
+		unsigned int flags = IFF_UP;
+
 		if (c->mtu != -1)
 			nl_link_set_mtu(nl_sock_ns, c->pasta_ifi, c->mtu);
 
-		nl_link_set_flags(nl_sock_ns, c->pasta_ifi, IFF_UP, IFF_UP);
+		if (c->ifi6) /* Avoid duplicate address detection on link up */
+			flags |= IFF_NOARP;
+
+		nl_link_set_flags(nl_sock_ns, c->pasta_ifi, flags, flags);
 
 		if (c->ifi4) {
 			if (c->ip4.no_copy_addrs) {
@@ -353,6 +358,10 @@ void pasta_ns_conf(struct ctx *c)
 				     strerror(-rc));
 			}
 
+			/* We dodged DAD: re-enable neighbour solicitations */
+			nl_link_set_flags(nl_sock_ns, c->pasta_ifi,
+					  0, IFF_NOARP);
+
 			if (c->ip6.no_copy_addrs) {
 				rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
 						 AF_INET6, &c->ip6.addr, 64);
-- 
2.43.0
Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/netlink.c b/netlink.c
index 06a3816..afd6efd 100644
--- a/netlink.c
+++ b/netlink.c
@@ -842,7 +842,7 @@ int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr)
 }
 
 /**
- * nl_add_set() - Set IP addresses for given interface and address family
+ * nl_addr_set() - Set IP addresses for given interface and address family
  * @s:		Netlink socket
  * @ifi:	Interface index
  * @af:		Address family
-- 
2.43.0