So far we never checked for errors reported on netlink operations via
NLMSG_ERROR messages. This has led to several subtle and tricky-to-debug
situations which would have been obvious if we knew that certain netlink
operations had failed.

Introduce a nl_do() helper that performs netlink "do" operations (that is,
making a single change without retrieving complex information) with much
more thorough error checking. As well as returning an error code if we
get an NLMSG_ERROR message, we also check for unexpected behaviour in
several places. That way, if we've made a mistake in our assumptions about
how netlink works, it should result in a clear error rather than some subtle
misbehaviour.

We update those calls to nl_req() that can use the new wrapper to do so.
We will extend those to better handle errors in future. We don't touch
non-"do" operations for now; those are a bit trickier.

Link: https://bugs.passt.top/show_bug.cgi?id=60
Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
netlink.c | 59 ++++++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 47 insertions(+), 12 deletions(-)
diff --git a/netlink.c b/netlink.c
index 3170344..cdd65c0 100644
--- a/netlink.c
+++ b/netlink.c
@@ -148,6 +148,47 @@ static ssize_t nl_req(int s, char *buf, void *req,
 	return n;
 }
+/**
+ * nl_do() - Send netlink "do" request, and wait for acknowledgement
+ * @s: Netlink socket
+ * @req: Request (will fill netlink header)
+ * @type: Request type
+ * @flags: Extra request flags (NLM_F_REQUEST and NLM_F_ACK assumed)
+ * @len: Request length
+ *
+ * Return: 0 on success, negative error code on error
+ */
+static int nl_do(int s, void *req, uint16_t type, uint16_t flags, ssize_t len)
+{
+	struct nlmsghdr *nh;
+	char buf[NLBUFSIZ];
+	uint16_t seq;
+	ssize_t n;
+
+	n = nl_req(s, buf, req, type, flags, len);
+	seq = ((struct nlmsghdr *)req)->nlmsg_seq;
+
+	for (nh = (struct nlmsghdr *)buf;
+	     NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
+		struct nlmsgerr *errmsg;
+
+		if (nh->nlmsg_seq != seq)
+			die("netlink: Unexpected response sequence number");
+
+		switch (nh->nlmsg_type) {
+		case NLMSG_DONE:
+			return 0;
+		case NLMSG_ERROR:
+			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
+			return errmsg->error;

This is an errno value, so we should probably print it here. Now that I've
read 14/17 and 16/17: doing so would also save the repeated strerror()
calls there. On the other hand, the current approach has the advantage of
a single error message instead of two, but... hmm.
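
For illustration only, here is a minimal sketch of the first option, that is, reporting the error once at the point where the NLMSG_ERROR message is parsed. It is not taken from the series: nl_check_reply() and fake_nl_reply() are hypothetical names, fprintf() stands in for whatever logging helper the tree uses, and a sequence mismatch returns -EPROTO instead of calling die() so that the snippet stays self-contained.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* Hypothetical test helper: build a single NLMSG_ERROR reply in @buf, the
 * way the kernel answers a request carrying NLM_F_ACK (error == 0 is a plain
 * acknowledgement). @buf must be suitably aligned and large enough.
 * Returns the length of the message.
 */
static ssize_t fake_nl_reply(void *buf, uint32_t seq, int error)
{
	struct nlmsghdr *nh = buf;
	struct nlmsgerr *errmsg;

	memset(buf, 0, NLMSG_SPACE(sizeof(*errmsg)));
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(*errmsg));
	nh->nlmsg_type = NLMSG_ERROR;
	nh->nlmsg_seq = seq;

	errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
	errmsg->error = error;

	return nh->nlmsg_len;
}

/* Hypothetical sketch: walk the reply in @buf (@n bytes) to the request with
 * sequence number @seq, printing any reported error exactly once, here.
 * Returns 0 on success, negative error code on error.
 */
static int nl_check_reply(const char *what, void *buf, ssize_t n, uint32_t seq)
{
	struct nlmsghdr *nh;

	for (nh = buf; NLMSG_OK(nh, n); nh = NLMSG_NEXT(nh, n)) {
		struct nlmsgerr *errmsg;

		if (nh->nlmsg_seq != seq) {
			fprintf(stderr, "netlink: %s: unexpected sequence\n",
				what);
			return -EPROTO;
		}

		switch (nh->nlmsg_type) {
		case NLMSG_DONE:
			return 0;
		case NLMSG_ERROR:
			errmsg = (struct nlmsgerr *)NLMSG_DATA(nh);
			/* error is 0 for an ack, a negative errno otherwise */
			if (errmsg->error)
				fprintf(stderr, "netlink: %s: %s\n", what,
					strerror(-errmsg->error));
			return errmsg->error;
		default:
			break;
		}
	}

	/* Ran out of messages without seeing an ack or an error */
	return -EPROTO;
}
```

With something along these lines, callers would only need to test the return value, at the cost of the helper deciding the wording of the message for all of them.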
--
Stefano