On Mon, 24 Jul 2023 16:09:29 +1000
David Gibson
So far we have never checked for errors reported on netlink operations via NLMSG_ERROR messages. This has led to several subtle, tricky-to-debug situations which would have been obvious if we had known that certain netlink operations had failed.
Introduce a nl_do() helper that performs netlink "do" operations (that is, making a single change without retrieving complex information) with much more thorough error checking. As well as returning an error code if we get an NLMSG_ERROR message, we also check for unexpected behaviour in several places. That way, if we've made a mistake in our assumptions about how netlink works, it should result in a clear error rather than some subtle misbehaviour.
We update those calls to nl_req() that can use the new wrapper to do so. We will extend those to better handle errors in future. We don't touch non-"do" operations for now, since those are a bit trickier.
Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson
---
 netlink.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/netlink.c b/netlink.c
index 3170344..cdd65c0 100644
--- a/netlink.c
+++ b/netlink.c
@@ -148,6 +148,47 @@ static ssize_t nl_req(int s, char *buf, void *req,
 	return n;
 }
+/**
+ * nl_do() - Send netlink "do" request, and wait for acknowledgement
+ * @s:		Netlink socket
+ * @req:	Request (will fill netlink header)
+ * @type:	Request type
+ * @flags:	Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
+ * @len:	Request length
+ *
+ * Return: 0 on success, negative error code on error
+ */
+static int nl_do(int s, void *req, uint16_t type, uint16_t flags, ssize_t len)
+{
+	struct nlmsghdr *nh;
+	char buf[NLBUFSIZ];
+	uint16_t seq;
+	ssize_t n;
+
+	n = nl_req(s, buf, req, type, flags, len);
+	seq = ((struct nlmsghdr *)req)->nlmsg_seq;
+
+	for (nh = (struct nlmsghdr *)buf;
+	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
+		struct nlmsgerr *errmsg;
+
+		if (nh->nlmsg_seq != seq)
+			die("netlink: Unexpected response sequence number");
+
+		switch (nh->nlmsg_type) {
+		case NLMSG_DONE:
+			return 0;
+		case NLMSG_ERROR:
+			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
+			return errmsg->error;
This is an errno, we should probably print it here

...and, now reading 14/17 and 16/17: saving repeated strerror() calls there. On the other hand this has the advantage of one single error message instead of two, but... hmm.

-- 
Stefano
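To make the suggestion above concrete, here is a minimal sketch of reporting the errno at the point where the NLMSG_ERROR message is parsed. It is not code from this series. nl_status_sketch() is a hypothetical name, it works on an already received buffer instead of calling nl_req(), it logs with fprintf() rather than the project's own logging helpers, and it returns -EPROTO on a sequence mismatch or a missing terminating message where the patch calls die(). The error field carries a negative errno, so it is negated before being passed to strerror(), and a value of 0 is a plain acknowledgement.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* Hypothetical helper (not part of the patch): scan a buffer of netlink
 * replies for the request with sequence number seq.
 *
 * Returns 0 on NLMSG_DONE or on an acknowledgement (error == 0), the
 * negative errno carried by NLMSG_ERROR otherwise, and -EPROTO (an
 * assumption of this sketch) if the reply is malformed or incomplete.
 */
static int nl_status_sketch(void *buf, ssize_t n, uint32_t seq)
{
	struct nlmsghdr *nh;

	for (nh = (struct nlmsghdr *)buf;
	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
		struct nlmsgerr *errmsg;

		if (nh->nlmsg_seq != seq)
			return -EPROTO;

		switch (nh->nlmsg_type) {
		case NLMSG_DONE:
			return 0;
		case NLMSG_ERROR:
			/* Make sure the payload can hold a struct nlmsgerr */
			if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(*errmsg)))
				return -EPROTO;

			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);

			/* error is 0 for an ACK, a negative errno otherwise */
			if (errmsg->error)
				fprintf(stderr, "netlink: request failed: %s\n",
					strerror(-errmsg->error));

			return errmsg->error;
		default:
			break;
		}
	}

	/* Ran out of messages without seeing NLMSG_DONE or NLMSG_ERROR */
	return -EPROTO;
}
```

With the message printed once here, callers could check the return value without calling strerror() again, at the cost of a second, less specific message if they also log the failure themselves.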