This is a handful of simple cleanups which I made while investigating
https://bugs.passt.top/show_bug.cgi?id=41. Note that these don't
themselves actually address that bug, they're just unrelated cleanups
that happen to be in adjacent code.

These are barely tested at all. I've had some crises right before
going away that mean I haven't had a chance to polish these. I'm
posting them so they're out there rather than waiting until I'm back
in two weeks.

These are based on my tap send unification series and the TCP socket
pool cleanup series.

David Gibson (4):
  tap: Don't pcap frames that didn't get sent
  tap: Eliminate goto from tap_handler()
  tcp: Remove 'recvmsg' goto from tcp_data_from_sock
  tcp: Remove 'zero_len' goto from tcp_data_from_sock

 tap.c | 49 +++++++++++++++++++++++++++----------------------
 tcp.c | 37 +++++++++++++++++--------------------
 2 files changed, 44 insertions(+), 42 deletions(-)

--
2.39.1
In tap_send_frames() we send a number of frames to the tap device, then
also write them to the pcap capture file (if configured). However the tap
send can partially fail (short write()s or similar), meaning that some
of the requested frames weren't actually sent, but we still write those
frames to the capture file.

We do give a debug message in this case, but it's misleading to add frames
that we know weren't sent to the capture file. Rework to avoid this.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 tap.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/tap.c b/tap.c
index af9bc15..dd22490 100644
--- a/tap.c
+++ b/tap.c
@@ -309,10 +309,12 @@ void tap_icmp6_send(const struct ctx *c,
  * @iov: Array of buffers, each containing one frame
  * @n: Number of buffers/frames in @iov
  *
+ * Returns: number of frames successfully sent
+ *
  * #syscalls:pasta write
  */
-static void tap_send_frames_pasta(struct ctx *c,
-				  const struct iovec *iov, size_t n)
+static size_t tap_send_frames_pasta(struct ctx *c,
+				    const struct iovec *iov, size_t n)
 {
 	size_t i;
 
@@ -324,6 +326,8 @@ static void tap_send_frames_pasta(struct ctx *c,
 			i--;
 		}
 	}
+
+	return n;
 }
 
 /**
@@ -356,10 +360,12 @@ static void tap_send_remainder(const struct ctx *c, const struct iovec *iov,
  * @iov: Array of buffers, each containing one frame
  * @n: Number of buffers/frames in @iov
  *
+ * Returns: number of frames successfully sent
+ *
  * #syscalls:passt sendmsg
  */
-static void tap_send_frames_passt(const struct ctx *c,
-				  const struct iovec *iov, size_t n)
+static size_t tap_send_frames_passt(const struct ctx *c,
+				    const struct iovec *iov, size_t n)
 {
 	struct msghdr mh = {
 		.msg_iov = (void *)iov,
@@ -370,7 +376,7 @@ static void tap_send_frames_passt(const struct ctx *c,
 
 	sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT);
 	if (sent < 0)
-		return;
+		return 0;
 
 	/* Check for any partial frames due to short send */
 	for (i = 0; i < n; i++) {
@@ -385,8 +391,7 @@ static void tap_send_frames_passt(const struct ctx *c,
 		i++;
 	}
 
-	if (i < n)
-		debug("tap: dropped %lu frames due to short send", n - i);
+	return i;
 }
 
 /**
@@ -397,15 +402,20 @@ static void tap_send_frames_passt(const struct ctx *c,
  */
 void tap_send_frames(struct ctx *c, const struct iovec *iov, size_t n)
 {
+	size_t m;
+
 	if (!n)
 		return;
 
 	if (c->mode == MODE_PASST)
-		tap_send_frames_passt(c, iov, n);
+		m = tap_send_frames_passt(c, iov, n);
 	else
-		tap_send_frames_pasta(c, iov, n);
+		m = tap_send_frames_pasta(c, iov, n);
+
+	if (m < n)
+		debug("tap: dropped %lu frames of %lu due to short send", n - m, n);
 
-	pcap_multiple(iov, n, c->mode == MODE_PASST ? sizeof(uint32_t) : 0);
+	pcap_multiple(iov, m, c->mode == MODE_PASST ? sizeof(uint32_t) : 0);
 }
 
 /**
--
2.39.1
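As a rough illustration of the counting idea in this patch (not the code
from tap.c), the sketch below takes the byte count returned by a single
vectored send and works out how many frames went out whole.
frames_fully_sent() is a hypothetical name. Unlike
tap_send_frames_passt(), it makes no attempt to push out the remainder
of a partially sent frame, it only counts.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper, not part of passt: given the byte count @sent returned
 * by one sendmsg()/writev() over @iov, return the number of frames that were
 * transmitted completely.
 */
static size_t frames_fully_sent(const struct iovec *iov, size_t n, ssize_t sent)
{
	size_t i, left;

	assert(iov || !n);

	if (sent < 0)
		return 0;	/* the send failed outright: nothing went out */

	left = (size_t)sent;

	/* Consume whole frames for as long as the byte count covers them */
	for (i = 0; i < n && left >= iov[i].iov_len; i++)
		left -= iov[i].iov_len;

	return i;
}
```

A caller in the style of tap_send_frames() would then hand only the first
frames_fully_sent(iov, n, sent) entries to the capture code, so the
capture never contains frames that are known not to have been sent.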
On Fri, 27 Jan 2023 16:11:07 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> In tap_send_frames() we send a number of frames to the tap device, then
> also write them to the pcap capture file (if configured). However the tap
> send can partially fail (short write()s or similar), meaning that some
> of the requested frames weren't actually sent, but we still write those
> frames to the capture file.
>
> We do give a debug message in this case, but it's misleading to add frames
> that we know weren't sent to the capture file. Rework to avoid this.

To be really "correct", I guess we should also truncate messages in
captures if they were sent partially, by returning the number of bytes
sent from tap_send_frames_{pasta,passt}() and then modifying the
argument to pcap_frame() in the pcap_multiple() loop.

This is relevant because, if a packet has a checksum, we could consider
it lost while checking captures.

Still, it's a vast improvement on the original, so I would apply this
even like it is -- except for two nits, below:

> Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> ---
>  tap.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/tap.c b/tap.c
> index af9bc15..dd22490 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -309,10 +309,12 @@ void tap_icmp6_send(const struct ctx *c,
>   * @iov: Array of buffers, each containing one frame
>   * @n: Number of buffers/frames in @iov
>   *
> + * Returns: number of frames successfully sent

For consistency: "Return:" -- I see now that one slipped through in
pcap_frame(). I can "fix" this on merge or in a follow-up patch, too.

> + *
>   * #syscalls:pasta write
>   */
> -static void tap_send_frames_pasta(struct ctx *c,
> -				  const struct iovec *iov, size_t n)
> +static size_t tap_send_frames_pasta(struct ctx *c,
> +				    const struct iovec *iov, size_t n)
>  {
>  	size_t i;
>
> @@ -324,6 +326,8 @@ static void tap_send_frames_pasta(struct ctx *c,
>  			i--;
>  		}
>  	}
> +
> +	return n;
>  }
>
>  /**
> @@ -356,10 +360,12 @@ static void tap_send_remainder(const struct ctx *c, const struct iovec *iov,
>   * @iov: Array of buffers, each containing one frame
>   * @n: Number of buffers/frames in @iov
>   *
> + * Returns: number of frames successfully sent

Same here.

--
Stefano
On Wed, Feb 15, 2023 at 01:17:25PM +0100, Stefano Brivio wrote:
> On Fri, 27 Jan 2023 16:11:07 +1100
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
>
> > In tap_send_frames() we send a number of frames to the tap device, then
> > also write them to the pcap capture file (if configured). However the tap
> > send can partially fail (short write()s or similar), meaning that some
> > of the requested frames weren't actually sent, but we still write those
> > frames to the capture file.
> >
> > We do give a debug message in this case, but it's misleading to add frames
> > that we know weren't sent to the capture file. Rework to avoid this.
>
> To be really "correct", I guess we should also truncate messages in
> captures if they were sent partially, by returning the number of bytes
> sent from tap_send_frames_{pasta,passt}() and then modifying the
> argument to pcap_frame() in the pcap_multiple() loop.

True.. that only applies for the pasta case, though. For passt we
always send a whole frame, or the stream would get out of sync.

> This is relevant because, if a packet has a checksum, we could consider
> it lost while checking captures.
>
> Still, it's a vast improvement on the original, so I would apply this
> even like it is -- except for two nits, below:
>
> > Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> > ---
> >  tap.c | 30 ++++++++++++++++++++----------
> >  1 file changed, 20 insertions(+), 10 deletions(-)
> >
> > diff --git a/tap.c b/tap.c
> > index af9bc15..dd22490 100644
> > --- a/tap.c
> > +++ b/tap.c
> > @@ -309,10 +309,12 @@ void tap_icmp6_send(const struct ctx *c,
> >   * @iov: Array of buffers, each containing one frame
> >   * @n: Number of buffers/frames in @iov
> >   *
> > + * Returns: number of frames successfully sent
>
> For consistency: "Return:" -- I see now that one slipped through in
> pcap_frame(). I can "fix" this on merge or in a follow-up patch, too.
>
> > + *
> >   * #syscalls:pasta write
> >   */
> > -static void tap_send_frames_pasta(struct ctx *c,
> > -				  const struct iovec *iov, size_t n)
> > +static size_t tap_send_frames_pasta(struct ctx *c,
> > +				    const struct iovec *iov, size_t n)
> >  {
> >  	size_t i;
> >
> > @@ -324,6 +326,8 @@ static void tap_send_frames_pasta(struct ctx *c,
> >  			i--;
> >  		}
> >  	}
> > +
> > +	return n;
> >  }
> >
> >  /**
> > @@ -356,10 +360,12 @@ static void tap_send_remainder(const struct ctx *c, const struct iovec *iov,
> >   * @iov: Array of buffers, each containing one frame
> >   * @n: Number of buffers/frames in @iov
> >   *
> > + * Returns: number of frames successfully sent
>
> Same here.

Fixed.

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
On Thu, 16 Feb 2023 16:20:44 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> On Wed, Feb 15, 2023 at 01:17:25PM +0100, Stefano Brivio wrote:
> > On Fri, 27 Jan 2023 16:11:07 +1100
> > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> >
> > > In tap_send_frames() we send a number of frames to the tap device, then
> > > also write them to the pcap capture file (if configured). However the tap
> > > send can partially fail (short write()s or similar), meaning that some
> > > of the requested frames weren't actually sent, but we still write those
> > > frames to the capture file.
> > >
> > > We do give a debug message in this case, but it's misleading to add frames
> > > that we know weren't sent to the capture file. Rework to avoid this.
> >
> > To be really "correct", I guess we should also truncate messages in
> > captures if they were sent partially, by returning the number of bytes
> > sent from tap_send_frames_{pasta,passt}() and then modifying the
> > argument to pcap_frame() in the pcap_multiple() loop.
>
> True.. that only applies for the pasta case, though. For passt we
> always send a whole frame, or the stream would get out of sync.

Actually, I was thinking of the passt case too: the send() in
tap_send_remainder() might fail -- but then, contrary to what I wrote,
there's nothing to truncate, because the socket back-end in QEMU
doesn't deliver partial frames either. So this patch is actually
correct in that regard.

Unrelated: given that QEMU is going to have a "reconnect" option:

  http://patchwork.ozlabs.org/project/qemu-devel/patch/20230119101645.2001683…

perhaps, if tap_send_remainder() fails, we should now close the socket
to give the guest the best chances to recover? Compared to terminating
the process, this has the advantage of keeping the whole state.

--
Stefano
The goto here really doesn't improve clarity or brevity at all. Use a
clearer construct.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 tap.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/tap.c b/tap.c
index dd22490..757fb86 100644
--- a/tap.c
+++ b/tap.c
@@ -1238,18 +1238,13 @@ void tap_handler(struct ctx *c, int fd, uint32_t events,
 	}
 
 	if ((c->mode == MODE_PASST && tap_handler_passt(c, now)) ||
-	    (c->mode == MODE_PASTA && tap_handler_pasta(c, now)))
-		goto reinit;
-
-	if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))
-		goto reinit;
+	    (c->mode == MODE_PASTA && tap_handler_pasta(c, now)) ||
+	    (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR))) {
+		if (c->one_off) {
+			info("Client closed connection, exiting");
+			exit(EXIT_SUCCESS);
+		}
 
-	return;
-reinit:
-	if (c->one_off) {
-		info("Client closed connection, exiting");
-		exit(EXIT_SUCCESS);
+		tap_sock_init(c);
 	}
-
-	tap_sock_init(c);
 }
--
2.39.1
This goto can be handled just as simply and more clearly with a do while.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 tcp.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/tcp.c b/tcp.c
index f0085e3..f7228d1 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2158,13 +2158,12 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn)
 		iov_sock[fill_bufs].iov_len = iov_rem;
 
 	/* Receive into buffers, don't dequeue until acknowledged by guest. */
-recvmsg:
-	len = recvmsg(s, &mh_sock, MSG_PEEK);
-	if (len < 0) {
-		if (errno == EINTR)
-			goto recvmsg;
+	do {
+		len = recvmsg(s, &mh_sock, MSG_PEEK);
+	} while (len < 0 && errno == EINTR);
+
+	if (len < 0)
 		goto err;
-	}
 
 	if (!len)
 		goto zero_len;
--
2.39.1
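For readers who want the retry pattern from this patch in isolation, here
is a minimal standalone sketch. It uses read() rather than recvmsg() so
that it needs no socket setup, and read_retry() is a hypothetical name
that does not exist in tcp.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not from tcp.c: read() that restarts on EINTR */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
	ssize_t rc;

	assert(buf || !len);

	/* Repeat the call only while it was interrupted by a signal */
	do {
		rc = read(fd, buf, len);
	} while (rc < 0 && errno == EINTR);

	return rc;	/* >= 0: bytes read; < 0: a real error, errno is set */
}
```

The loop condition carries the same information as the old label and
"goto recvmsg" pair, and any error other than EINTR falls through to the
caller's normal error handling.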
Nit:

On Fri, 27 Jan 2023 16:11:09 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> This goto can be handled just as simply and more clearly with a do while.
>
> Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> ---
>  tcp.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index f0085e3..f7228d1 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -2158,13 +2158,12 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn)
>  		iov_sock[fill_bufs].iov_len = iov_rem;
>
>  	/* Receive into buffers, don't dequeue until acknowledged by guest. */
> -recvmsg:
> -	len = recvmsg(s, &mh_sock, MSG_PEEK);
> -	if (len < 0) {
> -		if (errno == EINTR)
> -			goto recvmsg;
> +	do {
> +		len = recvmsg(s, &mh_sock, MSG_PEEK);
> +	} while (len < 0 && errno == EINTR);

For consistency, we shouldn't use curly brackets if there's a single
line in the loop body (only other occurrence: pasta_wait_for_ns()).

I don't have a strong preference here and I can also fix it up on
merge, by the way.

--
Stefano
On Wed, Feb 15, 2023 at 01:17:33PM +0100, Stefano Brivio wrote:
> Nit:
>
> On Fri, 27 Jan 2023 16:11:09 +1100
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
>
> > This goto can be handled just as simply and more clearly with a do while.
> >
> > Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> > ---
> >  tcp.c | 11 +++++------
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> >
> > diff --git a/tcp.c b/tcp.c
> > index f0085e3..f7228d1 100644
> > --- a/tcp.c
> > +++ b/tcp.c
> > @@ -2158,13 +2158,12 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn)
> >  		iov_sock[fill_bufs].iov_len = iov_rem;
> >
> >  	/* Receive into buffers, don't dequeue until acknowledged by guest. */
> > -recvmsg:
> > -	len = recvmsg(s, &mh_sock, MSG_PEEK);
> > -	if (len < 0) {
> > -		if (errno == EINTR)
> > -			goto recvmsg;
> > +	do {
> > +		len = recvmsg(s, &mh_sock, MSG_PEEK);
> > +	} while (len < 0 && errno == EINTR);
>
> For consistency, we shouldn't use curly brackets if there's a single
> line in the loop body (only other occurrence: pasta_wait_for_ns()).

Huh... I never even knew the braces were optional for do while.

> I don't have a strong preference here and I can also fix it up on
> merge, by the way.

--
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
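On the braces point: in C the body of a do ... while is a single
statement, so a one-line body can be written with or without a compound
block. The sketch below shows both spellings side by side on an
unrelated toy computation. All three function names are hypothetical and
nothing here comes from passt.

```c
#include <assert.h>

/* Count decimal digits of @v (0 counts as one digit), braced loop body */
static unsigned digits_braced(unsigned v)
{
	unsigned digits = 0;

	do {
		digits++;
	} while (v /= 10);

	return digits;
}

/* Same computation, with the single-statement body left unbraced */
static unsigned digits_bare(unsigned v)
{
	unsigned digits = 0;

	do
		digits++;
	while (v /= 10);

	return digits;
}

/* Sanity check that the two spellings behave identically for @v */
static void check_forms_agree(unsigned v)
{
	assert(digits_braced(v) == digits_bare(v));
}
```

Which form to use is purely a style question. The review comment above
asks for the unbraced one only for consistency with the rest of the code
base.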
This goto exists purely to move this exception case out of line. Although
that does make the "normal" path a little clearer, it comes at the cost of
not knowing where control will flow after jumping to the zero_len label.
The exceptional case isn't that long, so just put it inline.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 tcp.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/tcp.c b/tcp.c
index f7228d1..4ae5ed5 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2165,8 +2165,18 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn)
 	if (len < 0)
 		goto err;
 
-	if (!len)
-		goto zero_len;
+	if (!len) {
+		if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) {
+			if ((ret = tcp_send_flag(c, conn, FIN | ACK))) {
+				tcp_rst(c, conn);
+				return ret;
+			}
+
+			conn_event(c, conn, TAP_FIN_SENT);
+		}
+
+		return 0;
+	}
 
 	sendlen = len - already_sent;
 	if (sendlen <= 0) {
@@ -2205,18 +2215,6 @@ err:
 	}
 
 	return ret;
-
-zero_len:
-	if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) {
-		if ((ret = tcp_send_flag(c, conn, FIN | ACK))) {
-			tcp_rst(c, conn);
-			return ret;
-		}
-
-		conn_event(c, conn, TAP_FIN_SENT);
-	}
-
-	return 0;
 }
 
 /**
--
2.39.1