[PATCH v2 00/17] netlink fixes and cleanups
We've had several bugs in the past that were quite tricky to debug, but would have been much easier if we'd known that a netlink operation had failed. So, it would be desirable to actually detect and report failures of netlink operations. While working on that, I discovered that there are a number of other issues ranging from very small to medium sized with the way we use netlink. This series addresses many of them. Link: https://bugs.passt.top/show_bug.cgi?id=60 Link: https://bugs.passt.top/show_bug.cgi?id=67 This series was tested as based on the pending patches adding C11 support, though I believe it trivially rebases onto current main. Changes since v1: * Assorted minor fixes based on Stefano's review * Rebased on C11 branch David Gibson (17): netlink: Split up functionality of nl_link() netlink: Split nl_addr() into separate operation functions netlink: Split nl_route() into separate operation functions netlink: Use struct in_addr for IPv4 addresses, not bare uint32_t netlink: Explicitly pass netlink sockets to operations netlink: Make nl_*_dup() use a separate datagram for each request netlink: Start sequence number from 1 instead of 0 netlink: Treat send() or recv() errors as fatal netlink: Fill in netlink header fields from nl_req() netlink: Add nl_do() helper for simple operations with error checking netlink: Clearer reasoning about the netlink response buffer size netlink: Split nl_req() to allow processing multiple response datagrams netlink: Add nl_foreach_oftype to filter response message types netlink: Propagate errors for "set" operations netlink: Always process all responses to a netlink request netlink: Propagate errors for "dump" operations netlink: Propagate errors for "dup" operations conf.c | 66 ++++- netlink.c | 843 ++++++++++++++++++++++++++++++++++-------------------- netlink.h | 27 +- pasta.c | 75 +++-- 4 files changed, 658 insertions(+), 353 deletions(-) -- 2.41.0
nl_link() performs a number of functions: it can bring links up, set MAC
address and MTU and also retrieve the existing MAC. This makes for a small
number of lines of code, but high conceptual complexity: it's quite hard
to follow what's going on both in nl_link() itself and it's also not very
obvious which function its callers are intending to use.
Clarify this, by splitting nl_link() into nl_link_up(), nl_link_set_mac(),
and nl_link_get_mac(). The first brings up a link, optionally setting the
MTU, the others get or set the MAC address.
This fixes an arguable bug in pasta_ns_conf(): it looks as though that was
intended to retrieve the guest MAC whether or not c->pasta_conf_ns is set.
However, it only actually does so in the !c->pasta_conf_ns case: the fact
that we set up==1 means we would only ever set, never get, the MAC in the
nl_link() call in the other path. We get away with this because the MAC
will quickly be discovered once we receive packets on the tap interface.
Still, it's neater to always get the MAC address here.
Signed-off-by: David Gibson
nl_addr() can perform three quite different operations based on the 'op'
parameter, each of which uses a different subset of the parameters. Split
them up into a function for each operation. This does use more lines of
code, but the overlap wasn't that great, and the separated logic is much
easier to follow.
It's also clearer in the callers what we expect the netlink operations to
do, and what information it uses.
Signed-off-by: David Gibson
nl_route() can perform 3 quite different operations based on the 'op'
parameter. Split this into separate functions for each one. This requires
more lines of code, but makes the internal logic of each operation much
easier to follow.
Signed-off-by: David Gibson
This improves consistency with IPv6 and makes it harder to misuse these as
some other sort of value.
Signed-off-by: David Gibson
All the netlink operations currently implicitly use one of the two global
netlink sockets, sometimes depending on an 'ns' parameter. Change them
all to explicitly take the socket to use (or two sockets to use in the case
of the *_dup() functions). As well as making these functions strictly more
general, it makes the callers easier to follow because we're passing a
socket variable with a name rather than an unexplained '0' or '1' for the
ns parameter.
Signed-off-by: David Gibson
nl_req() is designed to handle a single netlink request message: it only
receives a single reply datagram for the request, and only waits for a
single NLMSG_DONE or NLMSG_ERROR message at the beginning to clear out
things from previous requests.
However, in both nl_addr_dup() and nl_route_dup() we can send multiple
request messages as a single datagram, with a single nl_req() call.
This can easily mean that the replies nl_req() collects get out of
sync with requests. We only get away with this because after we call
these functions we don't make any netlink calls where we need to parse
the replies.
This is fragile, so alter nl_*_dup() to make an nl_req() call for each
address it is adding in the target namespace.
For nl_route_dup() this fixes an additional minor problem: because
routes can have dependencies, some of the route add requests might
fail on the first attempt, so we need to repeat the requests a number
of times. When we did that, we weren't updating the sequence number
on each new attempt. This works, but not updating the sequence number
for each new request isn't ideal. Now that we're making the requests
one at a time, it's easier to make sure we update the sequence number
each time.
Link: https://bugs.passt.top/show_bug.cgi?id=67
Signed-off-by: David Gibson
Netlink messages have a sequence number that's used to match requests to
responses. It mostly doesn't matter what it is as long as it monotonically
increases, so we just use a global counter which we advance with each
request.
However, we start this counter at 0, so our very first request has sequence
number 0, which is usually reserved for asynchronous messages from the
kernel which aren't in response to a specific request. Since we don't (for
now) use such async messages, this doesn't really matter, but it's not
good practce. So start the sequence at 1 instead.
Link: https://bugs.passt.top/show_bug.cgi?id=67
Signed-off-by: David Gibson
Errors on send() or recv() calls on a netlink socket don't indicate errors
with the netlink operations we're attempting, but rather that something's
gone wrong with the mechanics of netlink itself. We don't really expect
this to ever happen, and if it does, it's not clear what we could to to
recover.
So, treat errors from these calls as fatal, rather than returning the error
up the stack. This makes handling failures in the callers of nl_req()
simpler.
Signed-off-by: David Gibson
Currently netlink functions need to fill in a full netlink header, as well
as a payload then call nl_req() to submit that to the kernel. It makes
things a bit terser if we just give the relevant header fields as
parameters to nl_req() and have it complete the header.
Signed-off-by: David Gibson
So far we never checked for errors reported on netlink operations via
NLMSG_ERROR messages. This has led to several subtle and tricky to debug
situations which would have been obvious if we knew that certain netlink
operations had failed.
Introduce a nl_do() helper that performs netlink "do" operations (that is
making a single change without retreiving complex information) with much
more thorough error checking. As well as returning an error code if we
get an NLMSG_ERROR message, we also check for unexpected behaviour in
several places. That way if we've made a mistake in our assumptions about
how netlink works it should result in a clear error rather than some subtle
misbehaviour.
We update those calls to nl_req() that can use the new wrapper to do so.
We will extend those to better handle errors in future. We don't touch
non-"do" operations for now, those are a bit trickier.
Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson
Currently we set NLBUFSIZ large enough for 8192 netlink headers (128kiB in
total), and reference netlink(7). However netlink(7) says nothing about
reponse buffer sizes, and the documents which do reference 8192 *bytes* not
8192 headers.
Update NLBUFSIZ to 64kiB with a more detailed rationale.
Link: https://bugs.passt.top/show_bug.cgi?id=67
Signed-off-by: David Gibson
Currently nl_req() sends the request, and receives a single response
datagram which we then process. However, a single request can result in
multiple response datagrams. That happens nearly all the time for DUMP
requests, where the 'DONE' message usually comes in a second datagram after
the NEW{LINK|ADDR|ROUTE} messages. It can also happen if there are just
too many objects to dump in a single datagram.
Allow our netlink code to process multiple response datagrams by splitting
nl_req() into three different helpers: nl_send() just sends a request,
without getting a response. nl_status() checks a single message to see if
it indicates the end of the reponses for our request. nl_next() moves onto
the next response message, whether it's in a datagram we already received
or we need to recv() a new one. We also add a 'for'-style macro to use
these to step through every response message to a request across multiple
datagrams.
While we're at it, be more thourough with checking that our sequence
numbers are in sync.
Link: https://bugs.passt.top/show_bug.cgi?id=67
Signed-off-by: David Gibson
In most cases where processing response messages, we expect only one type
of message (excepting NLMSG_DONE or NLMSG_ERROR), and so we need a test
and continue to skip anything else. Add a helper macro to do this.
This also fixes a bug in nl_get_ext_if() where we didn't have such a test
and if we got a message other than RTM_NEWROUTE we would have parsed
its contents as nonsense.
Also add a warning message if we get such an unexpected message type, which
could be useful for debugging if we ever hit it.
Signed-off-by: David Gibson
Currently if anything goes wrong while we're configuring the namespace
network with --config-net, we'll just ignore it and carry on. This might
lead to a silently unconfigured or misconfigured namespace environment.
For simple "set" operations based on nl_do() we can now detect failures
reported via netlink. Propagate those errors up to pasta_ns_conf() and
report them usefully.
Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson
A single netlink request can result in multiple response datagrams. We
process multiple response datagrams in some circumstances, but there are
cases where we exit early and will leave remaining datagrams in the queue.
These will be flushed in nl_send() before we send another request.
This is confusing, and not what we need to reliably check for errors from
netlink operations. So, instead, make sure we always process all the
response datagrams whenever we send a request (excepting fatal errors).
In most cases this is just a matter of avoiding early exits from nl_foreach
loops. nl_route_dup() is a bit trickier, because we need to retain all the
routes we're going to try to copy in a single buffer. Here we instead use
a secondary buffer to flush any remaining datagrams, and report an error
if there are any additional routes in those datagrams .
Link: https://bugs.passt.top/show_bug.cgi?id=67
Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson
Currently if we receive any netlink errors while discovering network
configuration from the host, we'll just ignore it and carry on. This
might lead to cryptic error messages later on, or even silent
misconfiguration.
We now have the mechanisms to detect errors from get/dump netlink
operations. Propgate these errors up to the callers and report them usefully.
Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson
We now detect errors on netlink "set" operations while configuring the
pasta namespace with --config-net. However in many cases rather than
a simple "set" we use a more complex "dup" function to copy
configuration from the host to the namespace. We're not yet properly
detecting and reporting netlink errors for that case.
Change the "dup" operations to propagate netlink errors to their
caller, pasta_ns_conf() and report them there.
Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson
On Thu, 3 Aug 2023 17:19:39 +1000
David Gibson
We've had several bugs in the past that were quite tricky to debug, but would have been much easier if we'd known that a netlink operation had failed. So, it would be desirable to actually detect and report failures of netlink operations. While working on that, I discovered that there are a number of other issues ranging from very small to medium sized with the way we use netlink. This series addresses many of them.
Link: https://bugs.passt.top/show_bug.cgi?id=60 Link: https://bugs.passt.top/show_bug.cgi?id=67
This series was tested as based on the pending patches adding C11 support, though I believe it trivially rebases onto current main.
Changes since v1: * Assorted minor fixes based on Stefano's review * Rebased on C11 branch
Applied (with minor formatting fixes in pasta_ns_conf()). -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio