[PATCH v2 00/13] Clean up to tap errors and epoll dispatch
Getting from an epoll event to the relevant handler function is currently several levels of functions and tests. This series simplifies this to be pretty close to a single switch on a value in the epoll ref dispatching directly to the appropriate handler. Doing this requires some preliminary cleaning up of the handling of errors or disconnects on the tap device. Changes since v1: * Give listening TCP sockets their own reference type * Fold "tap reset" series into this one Changes since v2 of tap reset series: * More thorough cleanup of handling error events on the listening Unix socket. Changes since v1 of the tap reset series: * Two extra patches that further clean up the reset path David Gibson (13): tap: Clean up tap reset path tap: Clean up behaviour for errors on listening Unix socket tap: Fold reset handling into tap_handler_pasta() tap: Fold reset handling into tap_handler_passt() epoll: Generalize epoll_ref to cover things other than sockets epoll: Always use epoll_ref for the epoll data variable epoll: Fold sock_handler into general switch on epoll event fd epoll: Split handling of ICMP and ICMPv6 sockets epoll: Tiny cleanup to udp_sock_handler() epoll: Split handling of TCP timerfds into its own handler function epoll: Split handling of listening TCP sockets into their own handler epoll: Split listening Unix domain socket into its own type epoll: Use different epoll types for passt and pasta tap fds icmp.c | 118 +++++++++++++++++++++++++-------------------- icmp.h | 9 ++-- passt.c | 90 ++++++++++++++++++++-------------- passt.h | 56 +++++++++++++++++----- pasta.c | 8 +++- tap.c | 133 ++++++++++++++++++++++++++------------------------- tap.h | 7 ++- tcp.c | 84 ++++++++++++++------------------ tcp.h | 28 +++++++---- tcp_conn.h | 4 +- tcp_splice.c | 8 ++-- tcp_splice.h | 2 +- udp.c | 16 +++---- util.c | 27 ++++++++--- 14 files changed, 334 insertions(+), 256 deletions(-) -- 2.41.0
In tap_handler() if we get an error on the tap device or socket, we use
tap_sock_init() to re-initialise it. However, what we actually need for
this reset case has remarkably little in common with the case where we're
initialising for the first time:
* Re-initialising the packet pools is unnecessary
* The case of a passed in fd (--fd) isn't relevant
* We don't even call this for pasta mode
* We will never re-call tap_sock_unix_init() because we never clear
fd_tap_listen
In fact the only thing we do in tap_sock_init() relevant to the reset case
is to remove the fd from the epoll and close it... which isn't used in the
first initialisation case.
So make a new tap_sock_reset() function just for this case, and simplify
tap_sock_init() slightly as being used only for the first time case.
Signed-off-by: David Gibson
We call tap_sock_unix_new() to handle a new connection to the qemu socket
if we get an EPOLLIN event on c->fd_tap_listen. If we get any other event
on the fd, we'll fall through to the "tap reset" path. But that won't do
anything relevant to the listening socket, it will just close the already
connected socket. Furthermore, the only other event we're subscribed to
for the listening socket is EPOLLRDHUP, which doesn't apply to a non
connected socket.
Change the subscribed events from EPOLLRDHUP to EPOLLERR to catch general
errors - not that there's any obvious case that would cause this event on
a listening socket. Since we don't really expect it, and it's not obvious
how we'd recover, treat it as a fatal error if we ever do get that event.
Finally, fold all this handling into the tap_sock_unix_new() function,
there's no real reason to split it between there and tap_handler().
Signed-off-by: David Gibson
If tap_handler_pasta() fails, we reset the connection. But in the case of
pasta the "reset" is just a fatal error. Fold the die() calls directly
into tap_handler_pasta() for simplicity.
Signed-off-by: David Gibson
We call tap_sock_reset() if tap_handler_passt() fails, or if we get an
error event on the socket. Fold that logic into tap_handler() passt itself
which simplifies the caller.
Signed-off-by: David Gibson
On Thu, 10 Aug 2023 12:33:06 +1000
David Gibson
We call tap_sock_reset() if tap_handler_passt() fails, or if we get an error event on the socket. Fold that logic into tap_handler() passt itself which simplifies the caller.
Signed-off-by: David Gibson
--- tap.c | 63 ++++++++++++++++++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 31 deletions(-) diff --git a/tap.c b/tap.c index 866ca4d..501af33 100644 --- a/tap.c +++ b/tap.c @@ -891,19 +891,41 @@ append: return in->count; }
+/** + * tap_sock_reset() - Handle closing or failure of connect AF_UNIX socket + * @c: Execution context + */ +static void tap_sock_reset(struct ctx *c) +{ + if (c->one_off) { + info("Client closed connection, exiting"); + exit(EXIT_SUCCESS); + } + + /* Close the connected socket, wait for a new connection */ + epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_tap, NULL); + close(c->fd_tap); + c->fd_tap = -1; +} + /** * tap_handler_passt() - Packet handler for AF_UNIX file descriptor * @c: Execution context + * @events: epoll events * @now: Current timestamp - * - * Return: -ECONNRESET on receive error, 0 otherwise */ -static int tap_handler_passt(struct ctx *c, const struct timespec *now) +static void tap_handler_passt(struct ctx *c, uint32_t events, + const struct timespec *now) { struct ethhdr *eh; ssize_t n, rem; char *p;
+ if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) { + tap_sock_reset(c); + return; + } + redo: p = pkt_buf; rem = 0; @@ -914,12 +936,13 @@ redo: n = recv(c->fd_tap, p, TAP_BUF_FILL, MSG_DONTWAIT); if (n < 0) { if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) - return 0; + return;
epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_tap, NULL); close(c->fd_tap);
- return -ECONNRESET; + tap_sock_reset(c);
...also reported by covscan: close() before this is redundant now. -- Stefano
On Thu, Aug 10, 2023 at 09:49:46PM +0200, Stefano Brivio wrote:
On Thu, 10 Aug 2023 12:33:06 +1000 David Gibson
wrote: We call tap_sock_reset() if tap_handler_passt() fails, or if we get an error event on the socket. Fold that logic into tap_handler() passt itself which simplifies the caller.
Signed-off-by: David Gibson
--- tap.c | 63 ++++++++++++++++++++++++++++++----------------------------- 1 file changed, 32 insertions(+), 31 deletions(-) diff --git a/tap.c b/tap.c index 866ca4d..501af33 100644 --- a/tap.c +++ b/tap.c @@ -891,19 +891,41 @@ append: return in->count; }
+/** + * tap_sock_reset() - Handle closing or failure of connect AF_UNIX socket + * @c: Execution context + */ +static void tap_sock_reset(struct ctx *c) +{ + if (c->one_off) { + info("Client closed connection, exiting"); + exit(EXIT_SUCCESS); + } + + /* Close the connected socket, wait for a new connection */ + epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_tap, NULL); + close(c->fd_tap); + c->fd_tap = -1; +} + /** * tap_handler_passt() - Packet handler for AF_UNIX file descriptor * @c: Execution context + * @events: epoll events * @now: Current timestamp - * - * Return: -ECONNRESET on receive error, 0 otherwise */ -static int tap_handler_passt(struct ctx *c, const struct timespec *now) +static void tap_handler_passt(struct ctx *c, uint32_t events, + const struct timespec *now) { struct ethhdr *eh; ssize_t n, rem; char *p;
+ if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) { + tap_sock_reset(c); + return; + } + redo: p = pkt_buf; rem = 0; @@ -914,12 +936,13 @@ redo: n = recv(c->fd_tap, p, TAP_BUF_FILL, MSG_DONTWAIT); if (n < 0) { if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) - return 0; + return;
epoll_ctl(c->epollfd, EPOLL_CTL_DEL, c->fd_tap, NULL); close(c->fd_tap);
- return -ECONNRESET; + tap_sock_reset(c);
...also reported by covscan: close() before this is redundant now.
Oops, yes. I spotted that, but then forgot to fix it. Fixed for a new spin. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
The epoll_ref type includes fields for the IP protocol of a socket, and the
socket fd. However, we already have a few things in the epoll which aren't
protocol sockets, and we may have more in future. Rename these fields to
an abstract "fd type" and file descriptor for more generality.
Similarly, rather than using existing IP protocol numbers for the type,
introduce our own number space. For now these just correspond to the
supported protocols, but we'll expand on that in future.
Signed-off-by: David Gibson
epoll_ref contains a variety of information useful when handling epoll
events on our sockets, and we place it in the epoll_event data field
returned by epoll. However, for a few other things we use the 'fd' field
in the standard union of types for that data field.
This actually introduces a bug which is vanishingly unlikely to hit in
practice, but very nasty if it ever did: theoretically if we had a very
large file descriptor number for fd_tap or fd_tap_listen it could overflow
into bits that overlap with the 'proto' field in epoll_ref. With some
very bad luck this could mean that we mistakenly think an event on a
regular socket is an event on fd_tap or fd_tap_listen.
More practically, using different (but overlapping) fields of the
epoll_data means we can't unify dispatch for the various different objects
in the epoll. Therefore use the same epoll_ref as the data for the tap
fds and the netns quit fd, adding new fd type values to describe them.
Signed-off-by: David Gibson
When we process events from epoll_wait(), we check for a number of special
cases before calling sock_handler() which then dispatches based on the
protocol type of the socket in the event.
Fold these cases together into a single switch on the fd type recorded in
the epoll data field.
Signed-off-by: David Gibson
On Thu, 10 Aug 2023 12:33:09 +1000
David Gibson
When we process events from epoll_wait(), we check for a number of special cases before calling sock_handler() which then dispatches based on the protocol type of the socket in the event.
Fold these cases together into a single switch on the fd type recorded in the epoll data field.
Signed-off-by: David Gibson
--- passt.c | 54 +++++++++++++++++++++++++++--------------------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/passt.c b/passt.c index 45e9fbf..665262b 100644 --- a/passt.c +++ b/passt.c @@ -64,29 +64,6 @@ char *epoll_type_str[EPOLL_TYPE_MAX + 1] = { [EPOLL_TYPE_TAP] = "tap device", };
-/** - * sock_handler() - Event handler for L4 sockets - * @c: Execution context - * @ref: epoll reference - * @events: epoll events - * @now: Current timestamp - */ -static void sock_handler(struct ctx *c, union epoll_ref ref, - uint32_t events, const struct timespec *now) -{ - trace("%s: packet from %s %i (events: 0x%08x)", - c->mode == MODE_PASST ? "passt" : "pasta", - EPOLL_TYPE_STR(ref.type), ref.fd, events); - - if (!c->no_tcp && ref.type == EPOLL_TYPE_TCP) - tcp_sock_handler(c, ref, events, now); - else if (!c->no_udp && ref.type == EPOLL_TYPE_UDP) - udp_sock_handler(c, ref, events, now); - else if (!c->no_icmp && - (ref.type == EPOLL_TYPE_ICMP || ref.type == EPOLL_TYPE_ICMPV6)) - icmp_sock_handler(c, ref, events, now); -} - /** * post_handler() - Run periodic and deferred tasks for L4 protocol handlers * @c: Execution context @@ -330,13 +307,36 @@ loop:
for (i = 0; i < nfds; i++) { union epoll_ref ref = *((union epoll_ref *)&events[i].data.u64); + uint32_t eventmask = events[i].events;
- if (ref.type == EPOLL_TYPE_TAP) + trace("%s: epoll event on %s %i (events: 0x%08x)", + c.mode == MODE_PASST ? "passt" : "pasta", + EPOLL_TYPE_STR(ref.type), ref.fd, events);
Gah, sorry I missed this earlier, but covscan reported it -- you
probably wanted to pass 'eventmask' to trace().
Actually, a long time ago, I was pondering about a macro that would
print constants' names ("EPOLLIN", etc.) instead, but then I thought
that once you remember the values from
On Thu, Aug 10, 2023 at 09:49:37PM +0200, Stefano Brivio wrote:
On Thu, 10 Aug 2023 12:33:09 +1000 David Gibson
wrote: When we process events from epoll_wait(), we check for a number of special cases before calling sock_handler() which then dispatches based on the protocol type of the socket in the event.
Fold these cases together into a single switch on the fd type recorded in the epoll data field.
Signed-off-by: David Gibson
--- passt.c | 54 +++++++++++++++++++++++++++--------------------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/passt.c b/passt.c index 45e9fbf..665262b 100644 --- a/passt.c +++ b/passt.c @@ -64,29 +64,6 @@ char *epoll_type_str[EPOLL_TYPE_MAX + 1] = { [EPOLL_TYPE_TAP] = "tap device", };
-/** - * sock_handler() - Event handler for L4 sockets - * @c: Execution context - * @ref: epoll reference - * @events: epoll events - * @now: Current timestamp - */ -static void sock_handler(struct ctx *c, union epoll_ref ref, - uint32_t events, const struct timespec *now) -{ - trace("%s: packet from %s %i (events: 0x%08x)", - c->mode == MODE_PASST ? "passt" : "pasta", - EPOLL_TYPE_STR(ref.type), ref.fd, events); - - if (!c->no_tcp && ref.type == EPOLL_TYPE_TCP) - tcp_sock_handler(c, ref, events, now); - else if (!c->no_udp && ref.type == EPOLL_TYPE_UDP) - udp_sock_handler(c, ref, events, now); - else if (!c->no_icmp && - (ref.type == EPOLL_TYPE_ICMP || ref.type == EPOLL_TYPE_ICMPV6)) - icmp_sock_handler(c, ref, events, now); -} - /** * post_handler() - Run periodic and deferred tasks for L4 protocol handlers * @c: Execution context @@ -330,13 +307,36 @@ loop:
for (i = 0; i < nfds; i++) { union epoll_ref ref = *((union epoll_ref *)&events[i].data.u64); + uint32_t eventmask = events[i].events;
- if (ref.type == EPOLL_TYPE_TAP) + trace("%s: epoll event on %s %i (events: 0x%08x)", + c.mode == MODE_PASST ? "passt" : "pasta", + EPOLL_TYPE_STR(ref.type), ref.fd, events);
Gah, sorry I missed this earlier, but covscan reported it -- you probably wanted to pass 'eventmask' to trace().
Oops, good catch. Fixed for a new spin
Actually, a long time ago, I was pondering about a macro that would print constants' names ("EPOLLIN", etc.) instead, but then I thought that once you remember the values from
, hex values might be a bit easier on eyes when you're digging through thousands of them... or maybe not. I don't know actually.
I'm inclined to leave it as hex for simplicity. The fact that we could have multiple events means it's not really simple macro territory; we'd have to format a bunch of stuff. I don't think that's worth bothering with for a trace message unless we repeatedly find it's making our debugging more difficult, which hasn't happened so far. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
We have different epoll type values for ICMP and ICMPv6 sockets, but they
both call the same handler function, icmp_sock_handler(). However that
function does essentially nothing in common for the two cases. So, split
it into icmp_sock_handler() and icmpv6_sock_handler() and dispatch them
separately from the top level.
While we're there remove some parameters that the function was never using
anyway. Also move the test for c->no_icmp into the functions, so that all
the logic specific to ICMP is within the handler, rather than in the top
level dispatch code.
Signed-off-by: David Gibson
Move the test for c->no_udp into the function itself, rather than in the
dispatching switch statement to better localize the UDP specific logic, and
make for greated consistency with other handler functions.
Signed-off-by: David Gibson
tcp_sock_handler() actually handles several different types of fd events.
This includes timerfds that aren't sockets at all. The handling of these
has essentially nothing in common with the other cases. So, give the
TCP timers there own epoll_type value and dispatch directly to their
handler. This also means we can remove the timer field from tcp_epoll_ref,
the information it encoded is now implicit in the epoll_type value.
Signed-off-by: David Gibson
tcp_sock_handler() handles both listening TCP sockets, and connected TCP
sockets, but what it needs to do in those cases has essentially nothing in
common. Therefore, give listening sockets their own epoll_type value and
dispatch directly to their own handler from the top level. Furthermore,
the two handlers need essentially entirely different information from the
reference: we re-(ab)used the index field in the tcp_epoll_ref to indicate
the port for the listening socket, but that's not the same meaning. So,
switch listening sockets to their own reference type which we can lay out
as we please. That lets us remove the listen and outbound fields from the
normal (connected) tcp_epoll_ref, reducing it to just the connection table
index.
Signed-off-by: David Gibson
tap_handler() actually handles events on three different types of object:
the /dev/tap character device (pasta), a connected Unix domain socket
(passt) or a listening Unix domain socket (passt).
The last, in particular, really has no handling in common with the others,
so split it into its own epoll type and directly dispatch to the relevant
handler from the top level.
Signed-off-by: David Gibson
Currently we have a single epoll event type for the "tap" fd, which could
be either a handle on a /dev/net/tun device (pasta) or a connected Unix
socket (passt). However for the two modes we call different handler
functions. Simplify this a little by using different epoll types and
dispatching directly to the correct handler function.
Signed-off-by: David Gibson
participants (2)
-
David Gibson
-
Stefano Brivio