[PATCH 0/7] Assorted fixes for UDP socket and error handling problems
My recent changes to UDP socket management caused some unfortunate side effects. In particular, it has some bad interactions with UDP error handling. Fix several bugs here, along with some reworks to allow that. David Gibson (7): udp: Fix breakage of UDP error handling by PKTINFO support udp: Be quieter about errors on UDP receive udp: Pass socket & flow information direction to error handling functions udp: Deal with errors as we go in udp_sock_fwd() udp: Add udp_pktinfo() helper udp: Minor re-organisation of udp_sock_recverr() udp: Propagate errors on listening and brand new sockets udp.c | 215 +++++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 138 insertions(+), 77 deletions(-) -- 2.49.0
We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our
UDP sockets. This lets us obtain and properly handle the specific local
address used when we're "listening" with a socket on 0.0.0.0 or ::.
However, the PKTINFO cmsgs this option generates appear on error queue
messages as well as regular datagrams. udp_sock_recverr() doesn't expect
this and so flags an unrecoverable error when it can't parse the control
message.
Correct this by adding space in udp_sock_recverr()s control buffer for the
additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather
than only looking at the first one.
Link: https://bugs.passt.top/show_bug.cgi?id=99
Fixes: f4b0dd8b06 ("udp: Use PKTINFO cmsgs to get destination address..")
Reported-by: Stefano Brivio
If we get an error on UDP receive, either in udp_peek_addr() or
udp_sock_recv(), we'll print an error message. However, this could be
a perfectly routine UDP error triggered by an ICMP, which need not go to
the error log.
This doesn't usually happen, because before receiving we typically clear
the error queue from udp_sock_errs(). However, it's possible an error
could be flagged after udp_sock_errs() but before we receive. So it's
better to handle this error "silently" (trace level only). We'll bail out
of the receive, return to the epoll loop, and get an EPOLLERR where we'll
handle and report the error properly.
In particular there's one situation that can trigger this case much more
easily. If we start a new outbound UDP flow to a local destination with
nothing listening, we'll get a more or less immediate connection refused
error. So, we'll get that error on the very first receive after the
connect(). That will occur in udp_flow_defer() -> udp_flush_flow() ->
udp_sock_fwd() -> udp_peek_addr() -> recvmsg(). This path doesn't call
udp_sock_errs() first, so isn't (imperfectly) protected the way we are
most of the time.
Fixes: 84ab1305faba ("udp: Polish udp_vu_sock_info() and remove from...")
Fixes: 69e5393c3722 ("udp: Move some more of sock_handler tasks into...")
Signed-off-by: David Gibson
udp_sock_recverr() and udp_sock_errs() take an epoll reference from which
they obtain both the socket fd to receive errors from, and - for flow
specific sockets - the flow and side the socket is associated with.
We have some upcoming cases where we want to clear errors when we're not
directly associated with receiving an epoll event, so it's not natural to
have an epoll reference. Therefore, make these functions take the socket
and flow from explicit parameters.
Signed-off-by: David Gibson
When we get an epoll event on a listening socket, we first deal with any
errors (udp_sock_errs()), then with any received packets (udp_sock_fwd()).
However, it's theoretically possible that new errors could get flagged on
the socket after we call udp_sock_errs(), in which case we could get errors
returned in in udp_sock_fwd() -> udp_peek_addr() -> recvmsg().
In fact, we do deal with this correctly, although the path is somewhat
non-obvious. The recvmsg() error will cause us to bail out of
udp_sock_fwd(), but the EPOLLERR event will now be flagged, so we'll come
back here next epoll loop and call udp_sock_errs().
Except.. we call udp_sock_fwd() from udp_flush_flow() as well as from
epoll events. This is to deal with any packets that arrived between bind()
and connect(), and so might not be associated with the socket's intended
flow. This expects udp_sock_fwd() to flush _all_ queued datagrams, so that
anything received later must be for the correct flow.
At the moment, udp_sock_errs() might fail to flush all datagrams if errors
occur. In particular this can happen in practice for locally reported
errors which occur immediately after connect() (e.g. connecting to a local
port with nothing listening).
We can deal with the problem case, and also make the flow a little more
natural for the common case by having udp_sock_fwd() call udp_sock_errs()
to handle errors as the occur, rather than trying to deal with all errors
in advance.
Signed-off-by: David Gibson
Currently we open code parsing the control message for IP_PKTINFO in
udp_peek_addr(). We have an upcoming case where we want to parse PKTINFO
in another place, so split this out into a helper function.
While we're there, make the parsing a bit more robust: scan all cmsgs to
look for the one we want, rather than assuming there's only one.
Signed-off-by: David Gibson
I took the liberty of applying this with two minor changes:
On Tue, 15 Apr 2025 17:16:22 +1000
David Gibson
Currently we open code parsing the control message for IP_PKTINFO in udp_peek_addr(). We have an upcoming case where we want to parse PKTINFO in another place, so split this out into a helper function.
While we're there, make the parsing a bit more robust: scan all cmsgs to look for the one we want, rather than assuming there's only one.
Signed-off-by: David Gibson
--- udp.c | 52 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 36 insertions(+), 16 deletions(-) diff --git a/udp.c b/udp.c index 0bec499d..352ab83b 100644 --- a/udp.c +++ b/udp.c @@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c, tap_icmp6_send(c, saddr, eaddr, &msg, msglen); }
+/** + * udp_pktinfo() - Retreive packet destination address from cmsg
Nit: "Retrieve", and:
+ * @msg: msghdr into which message has been received + * @dst: (Local) destination address of message in @mh (output) + * + * Return: 0 on success, -1 if the information was missing (@dst is set to + * inany_any6). + */ +static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) +{ + struct cmsghdr *hdr; + + for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) { + if (hdr->cmsg_level == IPPROTO_IP && + hdr->cmsg_type == IP_PKTINFO) { + const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr); + + *dst = inany_from_v4(i4->ipi_addr); + return 0; + } + + if (hdr->cmsg_level == IPPROTO_IPV6 && + hdr->cmsg_type == IPV6_PKTINFO) { + const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr); + + dst->a6 = i6->ipi6_addr; + return 0; + } + } + + err("Missing PKTINFO cmsg on datagram");
...from a quick glance at the matching kernel path, this looks a bit too likely for err(), so I got a bit scared, and turned it to debug(). I think warn() is actually more correct than debug() (I think it shouldn't really be err() though because we can keep using this flow), but I would feel a lot more confident about it once we have rate-limited versions of those. Of course, I'll change this back to err() or warn() if you have a compelling reason. Similar reasoning in 7/7 where this is called. By the way, two messages (here and in caller) look redundant to me, regardless of their severity, but I'm not sure if you have further changes in mind where these separate prints would make sense. -- Stefano
On Tue, Apr 15, 2025 at 08:54:00PM +0200, Stefano Brivio wrote:
I took the liberty of applying this with two minor changes:
On Tue, 15 Apr 2025 17:16:22 +1000 David Gibson
wrote: Currently we open code parsing the control message for IP_PKTINFO in udp_peek_addr(). We have an upcoming case where we want to parse PKTINFO in another place, so split this out into a helper function.
While we're there, make the parsing a bit more robust: scan all cmsgs to look for the one we want, rather than assuming there's only one.
Signed-off-by: David Gibson
--- udp.c | 52 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 36 insertions(+), 16 deletions(-) diff --git a/udp.c b/udp.c index 0bec499d..352ab83b 100644 --- a/udp.c +++ b/udp.c @@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c, tap_icmp6_send(c, saddr, eaddr, &msg, msglen); }
+/** + * udp_pktinfo() - Retreive packet destination address from cmsg
Nit: "Retrieve", and:
Oops, thanks.
+ * @msg: msghdr into which message has been received + * @dst: (Local) destination address of message in @mh (output) + * + * Return: 0 on success, -1 if the information was missing (@dst is set to + * inany_any6). + */ +static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) +{ + struct cmsghdr *hdr; + + for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) { + if (hdr->cmsg_level == IPPROTO_IP && + hdr->cmsg_type == IP_PKTINFO) { + const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr); + + *dst = inany_from_v4(i4->ipi_addr); + return 0; + } + + if (hdr->cmsg_level == IPPROTO_IPV6 && + hdr->cmsg_type == IPV6_PKTINFO) { + const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr); + + dst->a6 = i6->ipi6_addr; + return 0; + } + } + + err("Missing PKTINFO cmsg on datagram");
...from a quick glance at the matching kernel path, this looks a bit too likely for err(), so I got a bit scared, and turned it to debug().
Ehh.. fair enough.
I think warn() is actually more correct than debug() (I think it shouldn't really be err() though because we can keep using this flow),
So, technically we don't have a flow at this point: the point here is that if we don't get the pktinfo we don't have the information we need to determine the right flow. But, regardless, you're right, warn() would be the right level in theory.
but I would feel a lot more confident about it once we have rate-limited versions of those.
Ok.
Of course, I'll change this back to err() or warn() if you have a compelling reason.
Similar reasoning in 7/7 where this is called.
By the way, two messages (here and in caller) look redundant to me, regardless of their severity, but I'm not sure if you have further changes in mind where these separate prints would make sense.
In this patch there aren't two messages, it's only in this function. There are two when I add the other caller in 7/7 which was an oversight. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
Usually we work with the "exit early" flow style, where we return early
on "error" conditions in functions. We don't currently do this in
udp_sock_recverr() for the case where we don't have a flow to associate
the error with.
Reorganise to use the "exit early" style, which will make some subsequent
changes less awkward.
Signed-off-by: David Gibson
udp_sock_recverr() processes errors on UDP sockets and attempts to
propagate them as ICMP packets on the tap interface. To do this it
currently requires the flow with which the error is associated as a
parameter. If that's missing it will clear the error condition, but not
propagate it.
That means that we largely ignore errors on "listening" sockets. It also
means we may discard some errors on flow specific sockets if they occur
very shortly after the socket is created. In udp_flush_flow() we need to
clear any datagrams received between bind() and connect() which might not
be associated with the "final" flow for the socket. If we get errors
before that point we'll ignore them in the same way because we don't know
the flow they're associated with in advance.
This can happen in practice if we have errors which occur almost
immediately after connect(), such as ECONNREFUSED when we connect() to a
local address where nothing is listening.
Between the extended error message itself and the PKTINFO information we
do actually have enough information to find the correct flow. So, rather
than ignoring errors where we don't have a flow "hint", determine the flow
the hard way in udp_sock_recverr().
Signed-off-by: David Gibson
On Tue, 15 Apr 2025 17:16:24 +1000
David Gibson
udp_sock_recverr() processes errors on UDP sockets and attempts to propagate them as ICMP packets on the tap interface. To do this it currently requires the flow with which the error is associated as a parameter. If that's missing it will clear the error condition, but not propagate it.
That means that we largely ignore errors on "listening" sockets. It also means we may discard some errors on flow specific sockets if they occur very shortly after the socket is created. In udp_flush_flow() we need to clear any datagrams received between bind() and connect() which might not be associated with the "final" flow for the socket. If we get errors before that point we'll ignore them in the same way because we don't know the flow they're associated with in advance.
This can happen in practice if we have errors which occur almost immediately after connect(), such as ECONNREFUSED when we connect() to a local address where nothing is listening.
Between the extended error message itself and the PKTINFO information we do actually have enough information to find the correct flow. So, rather than ignoring errors where we don't have a flow "hint", determine the flow the hard way in udp_sock_recverr().
Signed-off-by: David Gibson
--- udp.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/udp.c b/udp.c index 6db3accc..c04a091b 100644 --- a/udp.c +++ b/udp.c @@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) * @c: Execution context * @s: Socket to receive errors from * @sidx: Flow and side of @s, or FLOW_SIDX_NONE if unknown + * @pif: Interface on which the error occurred + * (only used if @sidx == FLOW_SIDX_NONE) + * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) * * Return: 1 if error received and processed, 0 if no more errors in queue, < 0 * if there was an error reading the queue * * #syscalls recvmsg */ -static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) +static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, + uint8_t pif, in_port_t port) { struct errhdr { struct sock_extended_err ee; union sockaddr_inany saddr; }; char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; + const struct errhdr *eh = NULL; char data[ICMP6_MAX_DLEN]; - const struct errhdr *eh; struct cmsghdr *hdr; struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) }; + union sockaddr_inany src; struct msghdr mh = { + .msg_name = &src, + .msg_namelen = sizeof(src), .msg_iov = &iov, .msg_iovlen = 1, .msg_control = buf, @@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) hdr->cmsg_type == IP_RECVERR) || (hdr->cmsg_level == IPPROTO_IPV6 && hdr->cmsg_type == IPV6_RECVERR)) - break; + break; }
if (!hdr) { @@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
if (!flow_sidx_valid(sidx)) { - trace("Ignoring received IP_RECVERR cmsg on listener socket"); - return 1; + /* No hint from the socket, determine flow from addresses */ + union inany_addr dst; + + if (udp_pktinfo(&mh, &dst) < 0) { + warn("Missing PKTINFO on UDP error");
...changed this to debug(), see my comments to 5/7. -- Stefano
On Tue, Apr 15, 2025 at 08:54:17PM +0200, Stefano Brivio wrote:
On Tue, 15 Apr 2025 17:16:24 +1000 David Gibson
wrote: udp_sock_recverr() processes errors on UDP sockets and attempts to propagate them as ICMP packets on the tap interface. To do this it currently requires the flow with which the error is associated as a parameter. If that's missing it will clear the error condition, but not propagate it.
That means that we largely ignore errors on "listening" sockets. It also means we may discard some errors on flow specific sockets if they occur very shortly after the socket is created. In udp_flush_flow() we need to clear any datagrams received between bind() and connect() which might not be associated with the "final" flow for the socket. If we get errors before that point we'll ignore them in the same way because we don't know the flow they're associated with in advance.
This can happen in practice if we have errors which occur almost immediately after connect(), such as ECONNREFUSED when we connect() to a local address where nothing is listening.
Between the extended error message itself and the PKTINFO information we do actually have enough information to find the correct flow. So, rather than ignoring errors where we don't have a flow "hint", determine the flow the hard way in udp_sock_recverr().
Signed-off-by: David Gibson
--- udp.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/udp.c b/udp.c index 6db3accc..c04a091b 100644 --- a/udp.c +++ b/udp.c @@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) * @c: Execution context * @s: Socket to receive errors from * @sidx: Flow and side of @s, or FLOW_SIDX_NONE if unknown + * @pif: Interface on which the error occurred + * (only used if @sidx == FLOW_SIDX_NONE) + * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) * * Return: 1 if error received and processed, 0 if no more errors in queue, < 0 * if there was an error reading the queue * * #syscalls recvmsg */ -static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) +static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, + uint8_t pif, in_port_t port) { struct errhdr { struct sock_extended_err ee; union sockaddr_inany saddr; }; char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; + const struct errhdr *eh = NULL; char data[ICMP6_MAX_DLEN]; - const struct errhdr *eh; struct cmsghdr *hdr; struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) }; + union sockaddr_inany src; struct msghdr mh = { + .msg_name = &src, + .msg_namelen = sizeof(src), .msg_iov = &iov, .msg_iovlen = 1, .msg_control = buf, @@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) hdr->cmsg_type == IP_RECVERR) || (hdr->cmsg_level == IPPROTO_IPV6 && hdr->cmsg_type == IPV6_RECVERR)) - break; + break; }
if (!hdr) { @@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
if (!flow_sidx_valid(sidx)) { - trace("Ignoring received IP_RECVERR cmsg on listener socket"); - return 1; + /* No hint from the socket, determine flow from addresses */ + union inany_addr dst; + + if (udp_pktinfo(&mh, &dst) < 0) { + warn("Missing PKTINFO on UDP error");
...changed this to debug(), see my comments to 5/7.
Actually this one can go away entirely. As you pointed out it's redundant with the one in udp_pktinfo() itself. Oh well, the extra debug() should be harmless enough. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Tue, 15 Apr 2025 17:16:17 +1000
David Gibson
My recent changes to UDP socket management caused some unfortunate side effects. In particular, it has some bad interactions with UDP error handling. Fix several bugs here, along with some reworks to allow that.
David Gibson (7): udp: Fix breakage of UDP error handling by PKTINFO support udp: Be quieter about errors on UDP receive udp: Pass socket & flow information direction to error handling functions udp: Deal with errors as we go in udp_sock_fwd() udp: Add udp_pktinfo() helper udp: Minor re-organisation of udp_sock_recverr() udp: Propagate errors on listening and brand new sockets
Applied with minor changes as described for 5/7 and 7/7. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio