A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families") changed netlink behaviour so that the NLMSG_DONE terminating a bunch of responses can go in the same datagram as those responses, rather than in a separate one. Our netlink code is supposed to handle that behaviour, and indeed does so for most cases, using the nl_foreach() macro. However, there was a subtle error in nl_route_dup() which doesn't work with this change. f00b1534 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own subtle error. The problem arises because nl_route_dup(), unlike other cases doesn't just make a single pass through all the responses to a netlink request. It needs to get all the routes, then make multiple passes through them. We don't really have anywhere to buffer multiple datagrams, so we only support the case where all the routes fit in a single datagram - but we need to fail gracefully when that's not the case. After receiving the first datagram of responses (with nl_next()) we have a first loop scanning them. It needs to exit when either we run out of messages in the datagram (!NLMSG_OK()) or when we get a message indicating the last response (nl_status() <= 0). What we do after the loop depends on which exit case we had. If we saw the last response, we're done, but otherwise we need to receive more datagrams to discard the rest of the responses. We attempt to check for that second case by re-checking NLMSG_OK(nh, status). However in the got-last-response case, we've altered status from the number of remaining bytes to the error code (usually 0). That means NLMSG_OK() now returns false even if it didn't during the loop check. To fix this we need separate variables for the number of bytes left and the final status code. We also checked status after the loop, but this was redundant: we can only exit the loop with NLMSG_OK() == true if status <= 0. Reported-by: Martin Pitt <mpitt(a)redhat.com> Fixes: f00b153414b1 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request") Link: https://github.com/containers/podman/issues/22052 Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- netlink.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/netlink.c b/netlink.c index f93f3770..632304c1 100644 --- a/netlink.c +++ b/netlink.c @@ -504,7 +504,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src, .rta.rta_len = RTA_LENGTH(sizeof(unsigned int)), .ifi = ifi_src, }; - ssize_t nlmsgs_size, status; + ssize_t nlmsgs_size, left, status; unsigned dup_routes = 0; struct nlmsghdr *nh; char buf[NLBUFSIZ]; @@ -518,9 +518,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src, * routes in the buffer at once. */ nh = nl_next(s_src, buf, NULL, &nlmsgs_size); - for (status = nlmsgs_size; - NLMSG_OK(nh, status) && (status = nl_status(nh, status, seq)) > 0; - nh = NLMSG_NEXT(nh, status)) { + for (left = nlmsgs_size; + NLMSG_OK(nh, left) && (status = nl_status(nh, left, seq)) > 0; + nh = NLMSG_NEXT(nh, left)) { struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh); struct rtattr *rta; size_t na; @@ -550,8 +550,7 @@ int nl_route_dup(int s_src, unsigned int ifi_src, } } - if (nh->nlmsg_type != NLMSG_DONE && - (!NLMSG_OK(nh, status) || status > 0)) { + if (!NLMSG_OK(nh, left)) { /* Process any remaining datagrams in a different * buffer so we don't overwrite the first one. */ @@ -577,9 +576,9 @@ int nl_route_dup(int s_src, unsigned int ifi_src, * to calculate dependencies: let the kernel do that. */ for (i = 0; i < dup_routes; i++) { - for (nh = (struct nlmsghdr *)buf, status = nlmsgs_size; - NLMSG_OK(nh, status); - nh = NLMSG_NEXT(nh, status)) { + for (nh = (struct nlmsghdr *)buf, left = nlmsgs_size; + NLMSG_OK(nh, left); + nh = NLMSG_NEXT(nh, left)) { uint16_t flags = nh->nlmsg_flags; int rc; -- 2.44.0
On Tue, 19 Mar 2024 15:53:41 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:A recent kernel change 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families") changed netlink behaviour so that the NLMSG_DONE terminating a bunch of responses can go in the same datagram as those responses, rather than in a separate one. Our netlink code is supposed to handle that behaviour, and indeed does so for most cases, using the nl_foreach() macro. However, there was a subtle error in nl_route_dup() which doesn't work with this change. f00b1534 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") attempted to fix this, but has its own subtle error. The problem arises because nl_route_dup(), unlike other cases doesn't just make a single pass through all the responses to a netlink request. It needs to get all the routes, then make multiple passes through them. We don't really have anywhere to buffer multiple datagrams, so we only support the case where all the routes fit in a single datagram - but we need to fail gracefully when that's not the case. After receiving the first datagram of responses (with nl_next()) we have a first loop scanning them. It needs to exit when either we run out of messages in the datagram (!NLMSG_OK()) or when we get a message indicating the last response (nl_status() <= 0). What we do after the loop depends on which exit case we had. If we saw the last response, we're done, but otherwise we need to receive more datagrams to discard the rest of the responses. We attempt to check for that second case by re-checking NLMSG_OK(nh, status). However in the got-last-response case, we've altered status from the number of remaining bytes to the error code (usually 0). That means NLMSG_OK() now returns false even if it didn't during the loop check. To fix this we need separate variables for the number of bytes left and the final status code. We also checked status after the loop, but this was redundant: we can only exit the loop with NLMSG_OK() == true if status <= 0. Reported-by: Martin Pitt <mpitt(a)redhat.com> Fixes: f00b153414b1 ("netlink: Don't try to get further datagrams in nl_route_dup() on NLMSG_DONE") Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request") Link: https://github.com/containers/podman/issues/22052 Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>Applied. -- Stefano