On Wed, 20 Mar 2024 11:51:59 +0100 Paul Holzinger <pholzing(a)redhat.com> wrote:On 20/03/2024 06:33, David Gibson wrote:I was about to reply as I just applied this with s/warn/info/ here :)Since f919dc7a4b1c ("conf, netlink: Don't require a default route to start"), if there is only one host interface with routes, we will pick that as the template interface, even if there are no default routes for an IP version. Unfortunately this selection had a serious flaw: in some cases it would 'return' in the middle of an nl_foreach() loop, meaning we wouldn't consume all the netlink responses for our query. This could cause later netlink operations to fail as we read leftover responses from the aborted query. Rewrite the interface detection to avoid this problem. While we're there: * Perform detection of both default and non-default routes in a single pass, avoiding an ugly goto * Give more detail on error and working but unusual paths about the situation (no suitable interface, multiple possible candidates, etc.). Fixes: f919dc7a4b1c ("conf, netlink: Don't require a default route to start") Link: https://bugs.passt.top/show_bug.cgi?id=83 Link: https://github.com/containers/podman/issues/22052 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2270257 Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 4 ++-- netlink.c | 62 ++++++++++++++++++++++++++++++++++--------------------- 2 files changed, 40 insertions(+), 26 deletions(-) diff --git a/conf.c b/conf.c index 644752cc..9e0318a5 100644 --- a/conf.c +++ b/conf.c @@ -584,7 +584,7 @@ static unsigned int conf_ip4(unsigned int ifi, ifi = nl_get_ext_if(nl_sock, AF_INET); if (!ifi) { - info("No interface with a route for IPv4: disabling IPv4"); + info("Couldn't pick external interface: disabling IPv4"); return 0; } @@ -656,7 +656,7 @@ static unsigned int conf_ip6(unsigned int ifi, ifi = nl_get_ext_if(nl_sock, AF_INET6); if (!ifi) { - info("No interface with a route for IPv6: disabling IPv6"); + info("Couldn't pick external interface: disabling IPv6"); return 0; } diff --git a/netlink.c b/netlink.c index 632304c1..c0a5f158 100644 --- a/netlink.c +++ b/netlink.c @@ -254,8 +254,8 @@ unsigned int nl_get_ext_if(int s, sa_family_t af) .rtm.rtm_type = RTN_UNICAST, .rtm.rtm_family = af, }; - bool default_only = true; - unsigned int ifi = 0; + unsigned defifi = 0, anyifi = 0; + unsigned ndef = 0, nany = 0; struct nlmsghdr *nh; struct rtattr *rta; char buf[NLBUFSIZ]; @@ -263,7 +263,6 @@ unsigned int nl_get_ext_if(int s, sa_family_t af) uint32_t seq; size_t na; -again: /* Look for an interface with a default route first, failing that, look * for any interface with a route, and pick it only if it's the only * interface with a route. @@ -271,46 +270,61 @@ again: seq = nl_send(s, &req, RTM_GETROUTE, NLM_F_DUMP, sizeof(req)); nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWROUTE) { struct rtmsg *rtm = (struct rtmsg *)NLMSG_DATA(nh); + unsigned thisifi = 0; - if (default_only) { - if (ifi || rtm->rtm_dst_len || rtm->rtm_family != af) - continue; - } else { - if (rtm->rtm_family != af) - continue; - } + if (rtm->rtm_family != af) + continue; for (rta = RTM_RTA(rtm), na = RTM_PAYLOAD(nh); RTA_OK(rta, na); rta = RTA_NEXT(rta, na)) { if (rta->rta_type == RTA_OIF) { - if (!default_only && ifi && - ifi != *(unsigned int *)RTA_DATA(rta)) - return 0; - - ifi = *(unsigned int *)RTA_DATA(rta); + thisifi = *(unsigned int *)RTA_DATA(rta); } else if (rta->rta_type == RTA_MULTIPATH) { const struct rtnexthop *rtnh; rtnh = (struct rtnexthop *)RTA_DATA(rta); + thisifi = rtnh->rtnh_ifindex; + } + } - if (!default_only && ifi && - (int)ifi != rtnh->rtnh_ifindex) - return 0; + if (!thisifi) + continue; /* No interface for this route */ - ifi = rtnh->rtnh_ifindex; - } + if (rtm->rtm_dst_len == 0) { + /* Default route */ + ndef++; + if (!defifi) + defifi = thisifi; + } else { + /* Non-default route */ + nany++; + if (!anyifi) + anyifi = thisifi; } } if (status < 0) warn("netlink: RTM_GETROUTE failed: %s", strerror(-status)); - if (!ifi && default_only) { - default_only = false; - goto again; + if (defifi) { + if (ndef > 1) + info("Multiple default %s routes, picked first", + af == AF_INET ? "IPv4" : "IPv6"); + return defifi; } - return ifi; + if (anyifi) { + if (nany == 1) + return anyifi; + + warn("Multiple interfaces with %s routes, use -i to select one", + af == AF_INET ? "IPv4" : "IPv6");This should not be a warning, for me this always triggers because I have two interfaces with link local addresses and no global ipv6 route as I do not have any ipv6 connection.Or maybe the correct fix is to never consider ipv6 link local routes for this logic? At least I cannot see the purpose of using a interface with only a link local route. > + } > + > + if (!nany) > + warn("No interfaces with %s routes", af == AF_INET ? "IPv4" : "IPv6");...and here, because if one has no IPv6 routes we would reintroduce the issue we just fixed in 338b6321ac0d ("conf: No routable interface for IPv4 or IPv6 is informational, not a warning"). I think the purpose of picking interfaces based on routes for link-local destinations is for practical test setups like the one described in https://github.com/containers/podman/issues/21896. Functionally it doesn't make sense, but it shouldn't harm either (right?).Thanks for checking! Let's hope that was the case in the failing test. -- Stefano+ + return 0; } /**I reproduced by having a second interface and confirm this patch fixes it.