[PATCH v3 00/18] RFC: Unify and simplify tap send path
Although we have an abstraction for the "slow path" (DHCP, NDP) guest
bound packets, the TCP and UDP forwarding paths write directly to the
tap fd. However, it turns out how they send frames to the tap device is
more similar than it originally appears. This series unifies the
low-level tap send functions for TCP and UDP, and makes some clean ups
along the way.

This is based on my earlier outstanding series.

Changes since v2:
 * Rebase on the latest version of the UDP tap/splice socket unification
 * Rework pcap_frame() to return an error rather than printing itself,
   restoring less verbose behaviour for one error in a group of frames
 * Assorted typo fixes in comments and commit messages

Changes since v1:
 * Abstract tap header construction as well as frame send (a number of
   new patches)
 * Remove unneeded flags buf_bytes globals as well
 * Fix bug where we weren't correctly setting iov_len after the move to
   giving variable sized iovecs to send_frames()

David Gibson (18):
  pcap: Introduce pcap_frame() helper
  pcap: Replace pcapm() with pcap_multiple()
  tcp: Combine two parts of passt tap send path together
  tcp: Don't compute total bytes in a message until we need it
  tcp: Improve interface to tcp_l2_buf_flush()
  tcp: Combine two parts of pasta tap send path together
  tap, tcp: Move tap send path to tap.c
  util: Introduce hton*_constant() in place of #ifdefs
  tcp, udp: Use named field initializers in iov_init functions
  util: Parameterize ethernet header initializer macro
  tcp: Remove redundant and incorrect initialization from *_iov_init()
  tcp: Consolidate calculation of total frame size
  tap: Add "tap headers" abstraction
  tcp: Use abstracted tap header
  tap: Use different io vector bases depending on tap type
  udp: Use abstracted tap header
  tap: Improve handling of partial frame sends
  udp: Use tap_send_frames()

 dhcpv6.c |  50 +++--------
 pcap.c   |  86 ++++++------------
 pcap.h   |   3 +-
 tap.c    | 121 ++++++++++++++++++++++++++
 tap.h    |  54 ++++++++++++
 tcp.c    | 254 +++++++++++++------------------------------------
 udp.c    | 213 ++++++----------------------------------
 udp.h    |   2 +-
 util.h   |  47 ++--------
 9 files changed, 310 insertions(+), 520 deletions(-)

-- 
2.39.0
pcap(), pcapm() and pcapmm() duplicate some code for the actual writing
to the capture file. The purpose of pcapm() and pcapmm() not calling
pcap() seems to be to avoid repeatedly calling gettimeofday() and to avoid
printing errors for every packet in a batch if there's a problem. We can
accomplish that while still sharing code by adding a new helper which
takes the packet timestamp as a parameter.
Signed-off-by: David Gibson
pcapm() captures multiple frames from a msghdr; however, the only thing it
cares about in the msghdr is the list of buffers, where it assumes there is
one frame to capture per buffer. That's what we want for its single caller,
but it's not the only obvious choice here (one frame per msghdr would
arguably make more sense in isolation). In addition, pcapm() has logic
that only makes sense in the context of the passt specific path it's called
from: it skips the first 4 bytes of each buffer, because those have the
qemu vnet_len rather than the frame proper.
Make this clearer by replacing pcapm() with pcap_multiple() which more
explicitly takes one struct iovec per frame, and parameterizes how much of
each buffer to skip (i.e. the offset of the frame within the buffer).
Signed-off-by: David Gibson
tcp_l2_buf_flush() open codes the "primary" send of message to the passt
tap interface, but calls tcp_l2_buf_flush_part() to handle the case of a
short send. Combine these two passt-specific operations into
tcp_l2_buf_flush_passt() which is a little cleaner and will enable further
cleanups.
Signed-off-by: David Gibson
tcp[46]_l2_buf_bytes keep track of the total number of bytes we have
queued to send to the tap interface. tcp_l2_buf_flush_passt() uses this
to determine if sendmsg() has sent all the data we requested, or whether
we need to resend a trailing portion.
However, the logic for finding where we're up to in the case of a short
sendmsg() can equally well tell whether we've had one at all, without
knowing the total number in advance. This does require an extra loop after
each sendmsg(), but it's doing simple arithmetic on values we've already
been accessing, and it leads to overall simpler code.
tcp[46]_l2_flags_buf_bytes were being calculated, but never used for
anything, so simply remove them.
Signed-off-by: David Gibson
Currently this takes a msghdr, but the only thing we actually care
about in there is the io vector. Make it take an io vector directly.
We also have a weird side effect of zeroing @buf_used. Just pass this
by value and zero it in the caller instead.
Signed-off-by: David Gibson
tcp_l2_buf_flush() open codes the loop across each frame in a group, but
calls tcp_l2_buf_write_one() to send each frame to the pasta tuntap
device. Combine these two pasta-specific operations into
tcp_l2_buf_flush_pasta() which is a little cleaner and will enable further
cleanups.
Signed-off-by: David Gibson
On Fri, 6 Jan 2023 11:43:10 +1100
David Gibson
tcp_l2_buf_flush() open codes the loop across each frame in a group, but calls tcp_l2_buf_write_one() to send each frame to the pasta tuntap device. Combine these two pasta-specific operations into tcp_l2_buf_flush_pasta() which is a little cleaner and will enable further cleanups.
Signed-off-by: David Gibson
---
 tcp.c | 40 ++++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/tcp.c b/tcp.c
index d96122d..9960a35 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1391,23 +1391,25 @@ static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn);
 	} while (0)
 
 /**
- * tcp_l2_buf_write_one() - Write a single buffer to tap file descriptor
+ * tcp_l2_buf_flush_pasta() - Send frames on the pasta tap interface
  * @c:		Execution context
- * @iov:	struct iovec item pointing to buffer
- * @ts:	Current timestamp
- *
- * Return: 0 on success, negative error code on failure (tap reset possible)
+ * @iov:	Pointer to array of buffers, one per frame
+ * @n:		Number of buffers/frames to flush
  */
-static int tcp_l2_buf_write_one(struct ctx *c, const struct iovec *iov)
+static void tcp_l2_buf_flush_pasta(struct ctx *c,
+				   const struct iovec *iov, size_t n)
 {
-	if (write(c->fd_tap, (char *)iov->iov_base + 4, iov->iov_len - 4) < 0) {
-		debug("tap write: %s", strerror(errno));
-		if (errno != EAGAIN && errno != EWOULDBLOCK)
-			tap_handler(c, c->fd_tap, EPOLLERR, NULL);
-		return -errno;
-	}
+	size_t i;
 
-	return 0;
+	for (i = 0; i < n; i++) {
+		if (write(c->fd_tap, (char *)iov->iov_base + 4,
+			  iov->iov_len - 4) < 0) {
It took me a moment to miss this during review, but a very long time to figure out later. :( This always sends the first frame in 'iov'.

Surprisingly, pasta_tcp performance tests work just fine most of the time: for data connections we usually end up moving a single frame at a time, and retransmissions hide the issue for control messages.

I just posted a patch on top of this, you don't have to respin, and it's actually more convenient for me to apply this with a fix at this point.

-- 
Stefano
The functions which do the final steps of sending TCP packets on through
the tap interface - tcp_l2_buf_flush*() - no longer have anything that's
actually specific to TCP in them, other than comments and names. Move them
all to tap.c.
Signed-off-by: David Gibson
We have several places where we have fairly ugly #ifdefs on __BYTE_ORDER
where we need network order values in a constant expression (so we can't
use htons() or htonl()). We can do this more cleanly by using a single
__BYTE_ORDER ifdef to define htons_constant() and htonl_constant()
macros, then using those in all the other places.
Signed-off-by: David Gibson
Both the TCP and UDP iov_init functions have some large structure literals
defined in "field order" style. These are pretty hard to read since it's
not obvious what value corresponds to what field. Use named field style
initializers instead to make this clearer.
Signed-off-by: David Gibson
We have separate IPv4 and IPv6 versions of a macro to construct an
initializer for ethernet headers. However, now that we have htons_constant
it's easy to simply parameterize this with the ethernet protocol number.
Signed-off-by: David Gibson
tcp_sock[46]_iov_init() initialize the length of each iovec buffer to
MSS_DEFAULT. That will always be overwritten before use in
tcp_data_to_tap(), so it's redundant. It also wasn't correct, because it
didn't correctly account for the header lengths in all cases.
Signed-off-by: David Gibson
tcp_l2_buf_fill_headers() returns the size of the generated frame including
the ethernet header. The caller then adds on the size of the vnet_len
field to get the total frame size to be passed to the tap device.
Outside the tap code, though, we never care about the ethernet header size,
only the final total size we need to put into an iovec. So, consolidate
the total frame size calculation within tcp_l2_buf_fill_headers().
Signed-off-by: David Gibson
Currently both the TCP and UDP code need to deal in various places with the
details of the L2 headers, and also the tap-specific "vnet_len" header.
This makes abstracting the tap interface to new backends (e.g. vhost-user
or tun) more difficult.
To improve this abstraction, create a new 'tap_hdr' structure which
represents both L2 (always Ethernet at the moment, but might vary in
future) and any additional tap specific headers (such as the qemu socket's
vnet_len field). Provide helper functions and macros to initialize, update
and use it.
Signed-off-by: David Gibson
Update the TCP code to use the tap layer abstractions for initializing and
updating the L2 and lower headers. This will make adding other tap
backends in future easier.
Signed-off-by: David Gibson
Currently tap_send_frames() expects the frames it is given to include the
vnet_len field, even in pasta mode which doesn't use it (although it need
not be initialized in that case). To match, tap_iov_base() and
tap_iov_len() construct the frame in that way.
This will inconvenience future changes, so alter things to set the buffers
to include just the frame needed by the tap backend type.
Signed-off-by: David Gibson
Update the UDP code to use the tap layer abstractions for initializing and
updating the L2 and lower headers. This will make adding other tap
backends in future easier.
Signed-off-by: David Gibson
In passt mode, when writing frames to the qemu socket, we might get a short
send. If we ignored this and carried on, the qemu socket would get out of
sync, because the bytes we actually sent wouldn't correspond to the length
header we already sent. tap_send_frames_passt() handles that by doing a
blocking send to complete the message, but it has a few flaws:
* We only attempt to resend once: although it's unlikely in practice,
nothing prevents the blocking send() from also being short
* We print a debug error if send() returns non-zero, but send() returns
the number of bytes sent, so on success we actually expect it to return
the length of the remaining data, not zero.
Correct those flaws and also be a bit more thorough about reporting
problems here.
Signed-off-by: David Gibson
To send frames on the tap interface, the UDP code uses a fairly complicated
two-level batching. First multiple frames are gathered into a single
"message" for the qemu stream socket, then multiple messages are sent with
sendmmsg(). We now have tap_send_frames() which already deals with sending
a number of frames, including batching and handling partial sends. Use
that to considerably simplify things.
This does make a couple of behavioural changes:
* We used to split messages to keep them under 32kiB (except when a
single frame was longer than that). The comments claim this is
needed to stop qemu from closing the connection, but we don't have any
equivalent logic for TCP. I wasn't able to reproduce the problem with
this series, although it was apparently easy to reproduce earlier.
My suspicion is that there was never an inherent need to keep messages
small, however with larger messages (and default kernel buffer sizes)
the chances of needing more than one resend for partial send()s is
greatly increased. We previously didn't handle that case of multiple
resends correctly, but now we do.
* Previously when we got a partial send on UDP, we would resend the
remainder of the entire "message", including multiple frames. The
common code now only resends the remainder of a single frame, simply
dropping any frames which weren't even partially sent. This is what
TCP always did and is probably a better idea for UDP too.
Signed-off-by: David Gibson
On Fri, 6 Jan 2023 11:43:04 +1100
David Gibson
[...]
For some reason, performance tests consistently get stuck (both TCP and UDP, sometimes throughput, sometimes latency tests) with this series, and not without it, but I don't see any possible relationship with that. I checked debug output and I couldn't find anything obviously wrong there. I just started checking packet captures now... -- Stefano
On Tue, Jan 24, 2023 at 10:20:43PM +0100, Stefano Brivio wrote:
On Fri, 6 Jan 2023 11:43:04 +1100 David Gibson
wrote:
[...]
For some reason, performance tests consistently get stuck (both TCP and UDP, sometimes throughput, sometimes latency tests) with this series, and not without it, but I don't see any possible relationship with that.
Drat, I didn't encounter that. Any chance you could bisect to figure out which patch specifically seems to trigger it?

I wonder if this could be related to the stalls I'm debugging, although those didn't appear on the perf tests and also occur on main. I have now discovered they seem to be masked by large socket buffer sizes - more info at https://bugs.passt.top/show_bug.cgi?id=41
I checked debug output and I couldn't find anything obviously wrong there. I just started checking packet captures now...
Hrm, probably not then. The stalls I'm seeing are associated with lots of partial sends.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
On Wed, 25 Jan 2023 14:13:44 +1100
David Gibson
On Tue, Jan 24, 2023 at 10:20:43PM +0100, Stefano Brivio wrote:
On Fri, 6 Jan 2023 11:43:04 +1100 David Gibson
wrote:
[...]
For some reason, performance tests consistently get stuck (both TCP and UDP, sometimes throughput, sometimes latency tests) with this series, and not without it, but I don't see any possible relationship with that.
Drat, I didn't encounter that. Any chance you could bisect to figure out which patch specifically seems to trigger it?
I couldn't do it conclusively, yet. :/

Before "tcp: Combine two parts of passt tap send path together", no stalls at all. After that, I'm routinely getting a stall on the perf/passt_udp test, IPv4 host-to-guest with 256B MTU. I know, that test is probably meaningless as a performance figure, but it helps find issues like this, at least. :)

Yes, UDP -- the iperf3 client doesn't connect to the server, passt doesn't crash, but it's gone (zombie) by the time I get to it. I think it's the test scripts terminating it (even though I don't see anything on the terminal), and script.log ends with:

  2023/01/25 21:27:14 socat[3432381] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
  kex_exchange_identification: Connection closed by remote host
  Connection closed by UNKNOWN port 65535
  ssh-keygen: generating new host keys: RSA
  2023/01/25 21:27:14 socat[3432390] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
  kex_exchange_identification: Connection closed by remote host
  Connection closed by UNKNOWN port 65535
  2023/01/25 21:27:14 socat[3432393] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
  kex_exchange_identification: Connection closed by remote host
  Connection closed by UNKNOWN port 65535
  2023/01/25 21:27:14 socat[3432396] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
  kex_exchange_identification: Connection closed by remote host
  Connection closed by UNKNOWN port 65535
  2023/01/25 21:27:14 socat[3432399] E connect(5, AF=40 cid:94557 port:22, 16): Connection reset by peer
  kex_exchange_identification: Connection closed by remote host
  Connection closed by UNKNOWN port 65535
  DSA ECDSA ED25519
  # Warning: Permanently added 'guest' (ED25519) to the list of known hosts.

which looks like fairly normal retries.

If I run the tests with DEBUG=1, they get stuck during UDP functional testing, so I'm letting that aside for a moment. If I apply the whole series, other tests get stuck (including TCP ones).
There might be something going wrong with iperf3's (TCP) control message exchange. I'm going to run this single test next, and add some debugging prints here and there.
I wonder if this could be related to the stalls I'm debugging, although those didn't appear on the perf tests and also occur on main. I have now discovered they seem to be masked by large socket buffer sizes - more info at https://bugs.passt.top/show_bug.cgi?id=41
Maybe the subsequent failures (or even this one) could actually be related, and triggered somehow by some change in timing. I'm still clueless at the moment. -- Stefano
On Thu, 26 Jan 2023 00:21:33 +0100
Stefano Brivio
On Wed, 25 Jan 2023 14:13:44 +1100 David Gibson
wrote: On Tue, Jan 24, 2023 at 10:20:43PM +0100, Stefano Brivio wrote:
On Fri, 6 Jan 2023 11:43:04 +1100 David Gibson
wrote:
[...]
For some reason, performance tests consistently get stuck (both TCP and UDP, sometimes throughput, sometimes latency tests) with this series, and not without it, but I don't see any possible relationship with that.
Drat, I didn't encounter that. Any chance you could bisect to figure out which patch specifically seems to trigger it?
[...]
I wonder if this could be related to the stalls I'm debugging, although those didn't appear on the perf tests and also occur on main. I have now discovered they seem to be masked by large socket buffer sizes - more info at https://bugs.passt.top/show_bug.cgi?id=41
Maybe the subsequent failures (or even this one) could actually be related, and triggered somehow by some change in timing. I'm still clueless at the moment.
This turned out to be a combination of three different issues:

- left-over patches in my local qemu tree (and build) trying to address the virtio-net TX hang ultimately fixed by kernel commit d71ebe8114b4 ("virtio-net: correctly enable callback during start_xmit"). I'm using the latest upstream now, clean

- the issue you reported at https://bugs.passt.top/show_bug.cgi?id=41, I just posted a patch for it

- the issue introduced by "tcp: Combine two parts of pasta tap send path together", patch also posted

With these three sorted, finally I could apply this series! Apologies for the delay.

-- 
Stefano