These changes started off as part of the series re-introducing EPOLLET for tap event handling. That's now turned out to be of lower priority, but along the way we fixed a bug where we could truncate frames from the kernel tap interface. This is a respin of that patch, plus a few minor preliminary cleanups. Various minor tweaks based on feedback from the original posting as part of the tap EPOLLET series. David Gibson (4): tap: Split out handling of EPOLLIN events tap: Improve handling of EINTR in tap_passt_input() tap: Restructure in tap_pasta_input() tap: Don't risk truncating frames on full buffer in tap_pasta_input() tap.c | 98 +++++++++++++++++++++++++++++++++++------------------------ 1 file changed, 58 insertions(+), 40 deletions(-) -- 2.46.0
Currently, tap_handler_pas{st,ta}() check for EPOLLRDHUP, EPOLLHUP and EPOLLERR events, then assume anything left is EPOLLIN. We have some future cases that may want to also handle EPOLLOUT, so in preparation explicitly handle EPOLLIN, moving the logic to a subfunction. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 50 ++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 36 insertions(+), 14 deletions(-) diff --git a/tap.c b/tap.c index 852d8376..c8abc068 100644 --- a/tap.c +++ b/tap.c @@ -982,24 +982,17 @@ static void tap_sock_reset(struct ctx *c) } /** - * tap_handler_passt() - Packet handler for AF_UNIX file descriptor + * tap_passt_input() - Handler for new data on the socket to qemu * @c: Execution context - * @events: epoll events * @now: Current timestamp */ -void tap_handler_passt(struct ctx *c, uint32_t events, - const struct timespec *now) +static void tap_passt_input(struct ctx *c, const struct timespec *now) { static const char *partial_frame; static ssize_t partial_len = 0; ssize_t n; char *p; - if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) { - tap_sock_reset(c); - return; - } - tap_flush_pools(); if (partial_len) { @@ -1052,20 +1045,33 @@ void tap_handler_passt(struct ctx *c, uint32_t events, } /** - * tap_handler_pasta() - Packet handler for /dev/net/tun file descriptor + * tap_handler_passt() - Event handler for AF_UNIX file descriptor * @c: Execution context * @events: epoll events * @now: Current timestamp */ -void tap_handler_pasta(struct ctx *c, uint32_t events, +void tap_handler_passt(struct ctx *c, uint32_t events, const struct timespec *now) +{ + if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) { + tap_sock_reset(c); + return; + } + + if (events & EPOLLIN) + tap_passt_input(c, now); +} + +/** + * tap_pasta_input() - Handler for new data on the socket to hypervisor + * @c: Execution context + * @now: Current timestamp + */ +static void tap_pasta_input(struct ctx *c, const struct timespec *now) { ssize_t n, len; int ret; - if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) - die("Disconnect event on /dev/net/tun device, exiting"); - redo: n = 0; @@ -1102,6 +1108,22 @@ restart: die("Error on tap device, exiting"); } +/** + * tap_handler_pasta() - Packet handler for /dev/net/tun file descriptor + * @c: Execution context + * @events: epoll events + * @now: Current timestamp + */ +void tap_handler_pasta(struct ctx *c, uint32_t events, + const struct timespec *now) +{ + if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) + die("Disconnect event on /dev/net/tun device, exiting"); + + if (events & EPOLLIN) + tap_pasta_input(c, now); +} + /** * tap_sock_unix_open() - Create and bind AF_UNIX socket * @sock_path: Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix) -- 2.46.0
When tap_passt_input() gets an error from recv() it (correctly) does not print any error message for EINTR, EAGAIN or EWOULDBLOCK. However in all three cases it returns from the function. That makes sense for EAGAIN and EWOULDBLOCK, since we then want to wait for the next EPOLLIN event before trying again. For EINTR, however, it makes more sense to retry immediately - as it stands we're likely to get a renewer EPOLLIN event immediately in that case, since we're using level triggered signalling. So, handle EINTR separately by immediately retrying until we succeed or get a different type of error. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/tap.c b/tap.c index c8abc068..8977b3f2 100644 --- a/tap.c +++ b/tap.c @@ -1003,10 +1003,13 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now) memmove(pkt_buf, partial_frame, partial_len); } - n = recv(c->fd_tap, pkt_buf + partial_len, TAP_BUF_BYTES - partial_len, - MSG_DONTWAIT); + do { + n = recv(c->fd_tap, pkt_buf + partial_len, + TAP_BUF_BYTES - partial_len, MSG_DONTWAIT); + } while ((n < 0) && errno == EINTR); + if (n < 0) { - if (errno != EINTR && errno != EAGAIN && errno != EWOULDBLOCK) { + if (errno != EAGAIN && errno != EWOULDBLOCK) { err_perror("Receive error on guest connection, reset"); tap_sock_reset(c); } -- 2.46.0
tap_pasta_input() has a rather confusing structure, using two gotos. Remove these by restructuring the function to have the main loop condition based on filling our buffer space, with errors or running out of data treated as the exception, rather than the other way around. This allows us to handle the EINTR which triggered the 'restart' goto with a continue. The outer 'redo' was triggered if we completely filled our buffer, to flush it and do another pass. This one is unnecessary since we don't (yet) use EPOLLET on the tap device: if there's still more data we'll get another event and re-enter the loop. Along the way handle a couple of extra edge cases: - Check for EWOULDBLOCK as well as EAGAIN for the benefit of any future ports where those might not have the same value - Detect EOF on the tap device and exit in that case Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 45 +++++++++++++++++++-------------------------- 1 file changed, 19 insertions(+), 26 deletions(-) diff --git a/tap.c b/tap.c index 8977b3f2..145587fc 100644 --- a/tap.c +++ b/tap.c @@ -1073,42 +1073,35 @@ void tap_handler_passt(struct ctx *c, uint32_t events, static void tap_pasta_input(struct ctx *c, const struct timespec *now) { ssize_t n, len; - int ret; - -redo: - n = 0; tap_flush_pools(); -restart: - while ((len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n)) > 0) { - if (len < (ssize_t)sizeof(struct ethhdr) || - len > (ssize_t)ETH_MAX_MTU) { - n += len; - continue; - } + for (n = 0; n < (ssize_t)TAP_BUF_BYTES; n += len) { + len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n); + if (len == 0) { + die("EOF on tap device, exiting"); + } else if (len < 0) { + if (errno == EINTR) { + len = 0; + continue; + } - tap_add_packet(c, len, pkt_buf + n); + if (errno == EAGAIN && errno == EWOULDBLOCK) + break; /* all done for now */ - if ((n += len) == TAP_BUF_BYTES) - break; - } + die("Error on tap device, exiting"); + } - if (len < 0 && errno == EINTR) - goto restart; + /* Ignore frames of bad length */ + if (len < (ssize_t)sizeof(struct ethhdr) || + len > (ssize_t)ETH_MAX_MTU) + continue; - ret = errno; + tap_add_packet(c, len, pkt_buf + n); + } tap_handler(c, now); - - if (len > 0 || ret == EAGAIN) - return; - - if (n == TAP_BUF_BYTES) - goto redo; - - die("Error on tap device, exiting"); } /** -- 2.46.0
tap_pasta_input() keeps reading frames from the tap device until the buffer is full. However, this has an ugly edge case, when we get close to buffer full, we will provide just the remaining space as a read() buffer. If this is shorter than the next frame to read, the tap device will truncate the frame and discard the remainder. Adjust the code to make sure we always have room for a maximum size frame. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tap.c b/tap.c index 145587fc..41af6a6d 100644 --- a/tap.c +++ b/tap.c @@ -1076,8 +1076,8 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now) tap_flush_pools(); - for (n = 0; n < (ssize_t)TAP_BUF_BYTES; n += len) { - len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n); + for (n = 0; n <= (ssize_t)TAP_BUF_BYTES - ETH_MAX_MTU; n += len) { + len = read(c->fd_tap, pkt_buf + n, ETH_MAX_MTU); if (len == 0) { die("EOF on tap device, exiting"); -- 2.46.0
On Fri, 6 Sep 2024 21:49:35 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:These changes started off as part of the series re-introducing EPOLLET for tap event handling. That's now turned out to be of lower priority, but along the way we fixed a bug where we could truncate frames from the kernel tap interface. This is a respin of that patch, plus a few minor preliminary cleanups. Various minor tweaks based on feedback from the original posting as part of the tap EPOLLET series. David Gibson (4): tap: Split out handling of EPOLLIN events tap: Improve handling of EINTR in tap_passt_input() tap: Restructure in tap_pasta_input() tap: Don't risk truncating frames on full buffer in tap_pasta_input()Applied. -- Stefano