[PATCH v6 0/8] Add vhost-user support to passt (part 2)
Extract buffers management code from tcp.c and move it to tcp_buf.c tcp.c keeps all the generic code and will be also used by the vhost-user functions. Also compare mode to MODE_PASTA, as we will manage vhost-user mode (MODE_VU) like MODE_PASST. v6: - reorder declarations, align summing, add curly brackets - rename tap_handler_all() to tap_handler() - rename packet_add_all_do() to tap_add_packet and remove func and line. - rename pool_flush_all() to tap_flush_pools() - rename tcp_fill_flag_header() to tcp_prepare_flags() - set optlen to 0 in tcp_prepare_flags() v5: - remove: [PATCH v4 01/10] tcp: inline tcp_l2_buf_fill_headers() - merge [PATCH v4 09/10] tcp: remove tap_hdr parameter into tcp: extract buffer management from tcp_send_flag() - update comments v4: - remove "tcp: extract buffer management from tcp_conn_tap_mss()" as the MSS size can be the same between socket and vhost-user. - rename tcp_send_flag() and tcp_data_from_sock() to tcp_buf_send_flag() and tcp_buf_data_from_sock() v3: - add 3 new patches tap: use in->buf_size rather than sizeof(pkt_buf) tcp: remove tap_hdr parameter iov: remove iov_copy() v2: - compare to MODE_PASTA in conf_open_files() too - move taph out of udp_update_hdr4()/udp_update_hdr6() Laurent Vivier (8): tcp: extract buffer management from tcp_send_flag() tcp: move buffers management functions to their own file tap: refactor packets handling functions udp: refactor UDP header update functions udp: rename udp_sock_handler() to udp_buf_sock_handler() vhost-user: compare mode MODE_PASTA and not MODE_PASST iov: remove iov_copy() tap: use in->buf_size rather than sizeof(pkt_buf) Makefile | 5 +- conf.c | 14 +- iov.c | 39 ---- iov.h | 3 - isolation.c | 10 +- passt.c | 4 +- tap.c | 134 +++++++----- tap.h | 3 + tcp.c | 580 ++++--------------------------------------------- tcp_buf.c | 513 +++++++++++++++++++++++++++++++++++++++++++ tcp_buf.h | 16 ++ tcp_internal.h | 96 ++++++++ udp.c | 68 +++--- udp.h | 2 +- 14 files changed, 799 insertions(+), 688 deletions(-) create mode 100644 tcp_buf.c create mode 100644 tcp_buf.h create mode 100644 tcp_internal.h -- 2.45.2
This commit isolates the internal data structure management used for storing
data (e.g., tcp4_l2_flags_iov[], tcp6_l2_flags_iov[], tcp4_flags_ip[],
tcp4_flags[], ...) from the tcp_send_flag() function. The extracted
functionality is relocated to a new function named tcp_fill_flag_header().
tcp_fill_flag_header() is now a generic function that accepts parameters such
as struct tcphdr and a data pointer. tcp_send_flag() utilizes this parameter to
pass memory pointers from tcp4_l2_flags_iov[] and tcp6_l2_flags_iov[].
This separation sets the stage for utilizing tcp_fill_flag_header() to
set the memory provided by the guest via vhost-user in future developments.
Signed-off-by: Laurent Vivier
I couldn't find out why this patch breaks the the pasta_podman/bats
test, yet, that is:
not ok 19 [505] Single TCP port forwarding, IPv4, tap
# (from function `bail-now' in file test/podman/test/system/helpers.bash, line 235,
# from function `assert' in file test/podman/test/system/helpers.bash, line 929,
# from function `pasta_test_do' in file test/podman/test/system/505-networking-pasta.bats, line 239,
# in test file test/podman/test/system/505-networking-pasta.bats, line 472)
# `pasta_test_do' failed
#
# [22:54:18.306131353] $ test/podman/bin/podman rm -t 0 --all --force --ignore
#
# [22:54:18.367462243] $ test/podman/bin/podman ps --all --external --format {{.ID}} {{.Names}}
#
# [22:54:18.394935392] $ test/podman/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
# [22:54:18.419773379] quay.io/libpod/testimage:20240123 1f6acd4c4a1d
#
# [22:54:19.246631856] $ test/podman/bin/podman info --format {{.Host.Pasta.Executable}}
# [22:54:20.084392405] /home/sbrivio/passt/pasta
#
# [22:54:20.167980222] $ test/podman/bin/podman run --net=pasta -p [88.198.0.164]:5727:5727/tcp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5727 5727); do socat -u TCP4-LISTEN:${port},bind=[88.198.0.164] STDOUT & done; wait
# [22:54:37.256040883] x2024/06/12 20:54:37 socat[3] E read(6, 0x7fe675cd6000, 8192): Connection reset by peer
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #| FAIL: Mismatch between data sent and received
# #| expected: = x
# #| actual: x2024/06/12 20:54:37 socat\[3\] E read\(6\, 0x7fe675cd6000\, 8192\): Connection reset by peer
# #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meaning that the data transfer is actually fine, but we reset the
connection instead of an orderly shutdown.
I found a few issues while looking for that:
On Wed, 12 Jun 2024 17:47:27 +0200
Laurent Vivier
This commit isolates the internal data structure management used for storing data (e.g., tcp4_l2_flags_iov[], tcp6_l2_flags_iov[], tcp4_flags_ip[], tcp4_flags[], ...) from the tcp_send_flag() function. The extracted functionality is relocated to a new function named tcp_fill_flag_header().
tcp_fill_flag_header() is now a generic function that accepts parameters such as struct tcphdr and a data pointer. tcp_send_flag() utilizes this parameter to pass memory pointers from tcp4_l2_flags_iov[] and tcp6_l2_flags_iov[].
This separation sets the stage for utilizing tcp_fill_flag_header() to set the memory provided by the guest via vhost-user in future developments.
Signed-off-by: Laurent Vivier
--- Notes: v6: - rename tcp_fill_flag_header() to tcp_prepare_flags() - set optlen to 0 in tcp_prepare_flags()
v5: - use tcp_fill_flag_header() rather than tcp_fill_headers4() and tcp_fill_headers6().
tcp.c | 72 +++++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 48 insertions(+), 24 deletions(-)
diff --git a/tcp.c b/tcp.c index dd8d46e08628..6800209d4122 100644 --- a/tcp.c +++ b/tcp.c @@ -1567,24 +1567,25 @@ static void tcp_update_seqack_from_tap(const struct ctx *c, }
/** - * tcp_send_flag() - Send segment with flags to tap (no payload) + * tcp_prepare_flags() - Prepare header for flags-only segment (no payload) * @c: Execution context * @conn: Connection pointer * @flags: TCP flags: if not set, send segment only if ACK is due + * @th: TCP header to update + * @data: buffer to store TCP option + * @optlen: size of the TCP option buffer (output parameter) * - * Return: negative error code on connection reset, 0 otherwise + * Return: < 0 error code on connection reset, + * 0 if there is no flag to send + * 1 otherwise
This is often called with if (tcp_send_flag(...)) or if (!tcp_send_flag(...)). Those need to be replaced with if (tcp_send_flag(...) < 0) or if (tcp_send_flag(...) >= 0) in the callers.
*/ -static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) +static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, + int flags, struct tcphdr *th, char *data, + size_t *optlen) { - struct tcp_flags_t *payload; struct tcp_info tinfo = { 0 }; socklen_t sl = sizeof(tinfo); int s = conn->sock; - size_t optlen = 0; - struct tcphdr *th; - struct iovec *iov; - size_t l4len; - char *data;
if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && !flags && conn->wnd_to_tap) @@ -1606,20 +1607,11 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) return 0;
- if (CONN_V4(conn)) - iov = tcp4_l2_flags_iov[tcp4_flags_used++]; - else - iov = tcp6_l2_flags_iov[tcp6_flags_used++]; - - payload = iov[TCP_IOV_PAYLOAD].iov_base; - th = &payload->th; - data = payload->opts; - if (flags & SYN) { int mss;
/* Options: MSS, NOP and window scale (8 bytes) */ - optlen = OPT_MSS_LEN + 1 + OPT_WS_LEN; + *optlen = OPT_MSS_LEN + 1 + OPT_WS_LEN;
*data++ = OPT_MSS; *data++ = OPT_MSS_LEN; @@ -1651,19 +1643,16 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { flags |= ACK; + *optlen = 0;
*optlen also needs to be set to 0 if (flags & RST), say: } else { *optlen = 0; if (!(flags & RST)) flags |= ACK; }
}
- th->doff = (sizeof(*th) + optlen) / 4; + th->doff = (sizeof(*th) + *optlen) / 4;
th->ack = !!(flags & ACK); th->rst = !!(flags & RST); th->syn = !!(flags & SYN); th->fin = !!(flags & FIN);
- l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, - conn->seq_to_tap); - iov[TCP_IOV_PAYLOAD].iov_len = l4len; - if (th->ack) { if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap)) conn_flag(c, conn, ~ACK_TO_TAP_DUE); @@ -1678,6 +1667,41 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) if (th->fin || th->syn) conn->seq_to_tap++;
+ return 1; +} + +/** + * tcp_send_flag() - Send segment with flags to tap (no payload) + * @c: Execution context + * @conn: Connection pointer + * @flags: TCP flags: if not set, send segment only if ACK is due + * + * Return: negative error code on connection reset, 0 otherwise + */ +static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) +{ + struct tcp_flags_t *payload; + struct iovec *iov; + size_t optlen; + size_t l4len; + int ret; + + if (CONN_V4(conn)) + iov = tcp4_l2_flags_iov[tcp4_flags_used++]; + else + iov = tcp6_l2_flags_iov[tcp6_flags_used++];
We increase the counters here, but we don't decrease them back if we hit if (ret <= 0) return ret; later.
+ + payload = iov[TCP_IOV_PAYLOAD].iov_base; + + ret = tcp_prepare_flags(c, conn, flags, &payload->th, + payload->opts, &optlen); + if (ret <= 0) + return ret; + + l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, + conn->seq_to_tap); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; + if (flags & DUP_ACK) { struct iovec *dup_iov; int i;
Here's the diff I have so far, I'm not necessarily recommending any of that, it was just a quick try: diff --git a/tcp.c b/tcp.c index 6800209..6bff6bc 100644 --- a/tcp.c +++ b/tcp.c @@ -1644,6 +1644,8 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, } else if (!(flags & RST)) { flags |= ACK; *optlen = 0; + } else { + *optlen = 0; } th->doff = (sizeof(*th) + *optlen) / 4; @@ -1695,8 +1697,14 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) ret = tcp_prepare_flags(c, conn, flags, &payload->th, payload->opts, &optlen); - if (ret <= 0) + if (ret <= 0) { + if (CONN_V4(conn)) + tcp4_flags_used--; + else + tcp6_flags_used--; + return ret; + } l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, conn->seq_to_tap); @@ -1738,7 +1746,7 @@ static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn) if (conn->events == CLOSED) return; - if (!tcp_send_flag(c, conn, RST)) + if (tcp_send_flag(c, conn, RST) >= 0) conn_event(c, conn, CLOSED); } @@ -2111,7 +2119,7 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af, } else { tcp_get_sndbuf(conn); - if (tcp_send_flag(c, conn, SYN | ACK)) + if (tcp_send_flag(c, conn, SYN | ACK) < 0) goto cancel; conn_event(c, conn, TAP_SYN_ACK_SENT); @@ -2282,9 +2290,11 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn) if (!len) { if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) { - if ((ret = tcp_send_flag(c, conn, FIN | ACK))) { + int rc = tcp_send_flag(c, conn, FIN | ACK); + + if (rc < 0) { tcp_rst(c, conn); - return ret; + return rc; } conn_event(c, conn, TAP_FIN_SENT); @@ -2716,7 +2726,7 @@ static void tcp_connect_finish(struct ctx *c, struct tcp_tap_conn *conn) return; } - if (tcp_send_flag(c, conn, SYN | ACK)) + if (tcp_send_flag(c, conn, SYN | ACK) < 0) return; conn_event(c, conn, TAP_SYN_ACK_SENT); -- Stefano
On Wed, 12 Jun 2024 23:22:10 +0200
Stefano Brivio
I couldn't find out why this patch breaks the the pasta_podman/bats test, yet, that is:
not ok 19 [505] Single TCP port forwarding, IPv4, tap # (from function `bail-now' in file test/podman/test/system/helpers.bash, line 235, # from function `assert' in file test/podman/test/system/helpers.bash, line 929, # from function `pasta_test_do' in file test/podman/test/system/505-networking-pasta.bats, line 239, # in test file test/podman/test/system/505-networking-pasta.bats, line 472) # `pasta_test_do' failed # # [22:54:18.306131353] $ test/podman/bin/podman rm -t 0 --all --force --ignore # # [22:54:18.367462243] $ test/podman/bin/podman ps --all --external --format {{.ID}} {{.Names}} # # [22:54:18.394935392] $ test/podman/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}} # [22:54:18.419773379] quay.io/libpod/testimage:20240123 1f6acd4c4a1d # # [22:54:19.246631856] $ test/podman/bin/podman info --format {{.Host.Pasta.Executable}} # [22:54:20.084392405] /home/sbrivio/passt/pasta # # [22:54:20.167980222] $ test/podman/bin/podman run --net=pasta -p [88.198.0.164]:5727:5727/tcp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5727 5727); do socat -u TCP4-LISTEN:${port},bind=[88.198.0.164] STDOUT & done; wait # [22:54:37.256040883] x2024/06/12 20:54:37 socat[3] E read(6, 0x7fe675cd6000, 8192): Connection reset by peer # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv # #| FAIL: Mismatch between data sent and received # #| expected: = x # #| actual: x2024/06/12 20:54:37 socat\[3\] E read\(6\, 0x7fe675cd6000\, 8192\): Connection reset by peer # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meaning that the data transfer is actually fine, but we reset the connection instead of an orderly shutdown.
I found a few issues while looking for that:
On Wed, 12 Jun 2024 17:47:27 +0200 Laurent Vivier
wrote: This commit isolates the internal data structure management used for storing data (e.g., tcp4_l2_flags_iov[], tcp6_l2_flags_iov[], tcp4_flags_ip[], tcp4_flags[], ...) from the tcp_send_flag() function. The extracted functionality is relocated to a new function named tcp_fill_flag_header().
tcp_fill_flag_header() is now a generic function that accepts parameters such as struct tcphdr and a data pointer. tcp_send_flag() utilizes this parameter to pass memory pointers from tcp4_l2_flags_iov[] and tcp6_l2_flags_iov[].
This separation sets the stage for utilizing tcp_fill_flag_header() to set the memory provided by the guest via vhost-user in future developments.
Signed-off-by: Laurent Vivier
--- Notes: v6: - rename tcp_fill_flag_header() to tcp_prepare_flags() - set optlen to 0 in tcp_prepare_flags()
v5: - use tcp_fill_flag_header() rather than tcp_fill_headers4() and tcp_fill_headers6().
tcp.c | 72 +++++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 48 insertions(+), 24 deletions(-)
diff --git a/tcp.c b/tcp.c index dd8d46e08628..6800209d4122 100644 --- a/tcp.c +++ b/tcp.c @@ -1567,24 +1567,25 @@ static void tcp_update_seqack_from_tap(const struct ctx *c, }
/** - * tcp_send_flag() - Send segment with flags to tap (no payload) + * tcp_prepare_flags() - Prepare header for flags-only segment (no payload) * @c: Execution context * @conn: Connection pointer * @flags: TCP flags: if not set, send segment only if ACK is due + * @th: TCP header to update + * @data: buffer to store TCP option + * @optlen: size of the TCP option buffer (output parameter) * - * Return: negative error code on connection reset, 0 otherwise + * Return: < 0 error code on connection reset, + * 0 if there is no flag to send + * 1 otherwise
This is often called with if (tcp_send_flag(...)) or if (!tcp_send_flag(...)). Those need to be replaced with if (tcp_send_flag(...) < 0) or if (tcp_send_flag(...) >= 0) in the callers.
Ah, no, sorry, you already took care of this in the new tcp_send_flag(). Then there must be something else...
*/ -static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) +static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, + int flags, struct tcphdr *th, char *data, + size_t *optlen) { - struct tcp_flags_t *payload; struct tcp_info tinfo = { 0 }; socklen_t sl = sizeof(tinfo); int s = conn->sock; - size_t optlen = 0; - struct tcphdr *th; - struct iovec *iov; - size_t l4len; - char *data;
if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && !flags && conn->wnd_to_tap) @@ -1606,20 +1607,11 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) return 0;
- if (CONN_V4(conn)) - iov = tcp4_l2_flags_iov[tcp4_flags_used++]; - else - iov = tcp6_l2_flags_iov[tcp6_flags_used++]; - - payload = iov[TCP_IOV_PAYLOAD].iov_base; - th = &payload->th; - data = payload->opts; - if (flags & SYN) { int mss;
/* Options: MSS, NOP and window scale (8 bytes) */ - optlen = OPT_MSS_LEN + 1 + OPT_WS_LEN; + *optlen = OPT_MSS_LEN + 1 + OPT_WS_LEN;
*data++ = OPT_MSS; *data++ = OPT_MSS_LEN; @@ -1651,19 +1643,16 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { flags |= ACK; + *optlen = 0;
*optlen also needs to be set to 0 if (flags & RST), say:
} else { *optlen = 0;
if (!(flags & RST)) flags |= ACK; }
}
- th->doff = (sizeof(*th) + optlen) / 4; + th->doff = (sizeof(*th) + *optlen) / 4;
th->ack = !!(flags & ACK); th->rst = !!(flags & RST); th->syn = !!(flags & SYN); th->fin = !!(flags & FIN);
- l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, - conn->seq_to_tap); - iov[TCP_IOV_PAYLOAD].iov_len = l4len; - if (th->ack) { if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap)) conn_flag(c, conn, ~ACK_TO_TAP_DUE); @@ -1678,6 +1667,41 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) if (th->fin || th->syn) conn->seq_to_tap++;
+ return 1; +} + +/** + * tcp_send_flag() - Send segment with flags to tap (no payload) + * @c: Execution context + * @conn: Connection pointer + * @flags: TCP flags: if not set, send segment only if ACK is due + * + * Return: negative error code on connection reset, 0 otherwise + */ +static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) +{ + struct tcp_flags_t *payload; + struct iovec *iov; + size_t optlen; + size_t l4len; + int ret; + + if (CONN_V4(conn)) + iov = tcp4_l2_flags_iov[tcp4_flags_used++]; + else + iov = tcp6_l2_flags_iov[tcp6_flags_used++];
We increase the counters here, but we don't decrease them back if we hit if (ret <= 0) return ret; later.
+ + payload = iov[TCP_IOV_PAYLOAD].iov_base; + + ret = tcp_prepare_flags(c, conn, flags, &payload->th, + payload->opts, &optlen); + if (ret <= 0) + return ret; + + l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, + conn->seq_to_tap); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; + if (flags & DUP_ACK) { struct iovec *dup_iov; int i;
Here's the diff I have so far, I'm not necessarily recommending any of that, it was just a quick try:
diff --git a/tcp.c b/tcp.c index 6800209..6bff6bc 100644 --- a/tcp.c +++ b/tcp.c @@ -1644,6 +1644,8 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, } else if (!(flags & RST)) { flags |= ACK; *optlen = 0; + } else { + *optlen = 0; }
th->doff = (sizeof(*th) + *optlen) / 4; @@ -1695,8 +1697,14 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
ret = tcp_prepare_flags(c, conn, flags, &payload->th, payload->opts, &optlen); - if (ret <= 0) + if (ret <= 0) { + if (CONN_V4(conn)) + tcp4_flags_used--; + else + tcp6_flags_used--; + return ret; + }
l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, conn->seq_to_tap); @@ -1738,7 +1746,7 @@ static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn) if (conn->events == CLOSED) return;
- if (!tcp_send_flag(c, conn, RST)) + if (tcp_send_flag(c, conn, RST) >= 0) conn_event(c, conn, CLOSED); }
@@ -2111,7 +2119,7 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af, } else { tcp_get_sndbuf(conn);
- if (tcp_send_flag(c, conn, SYN | ACK)) + if (tcp_send_flag(c, conn, SYN | ACK) < 0) goto cancel;
conn_event(c, conn, TAP_SYN_ACK_SENT); @@ -2282,9 +2290,11 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn)
if (!len) { if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) { - if ((ret = tcp_send_flag(c, conn, FIN | ACK))) { + int rc = tcp_send_flag(c, conn, FIN | ACK); + + if (rc < 0) { tcp_rst(c, conn); - return ret; + return rc; }
conn_event(c, conn, TAP_FIN_SENT); @@ -2716,7 +2726,7 @@ static void tcp_connect_finish(struct ctx *c, struct tcp_tap_conn *conn) return; }
- if (tcp_send_flag(c, conn, SYN | ACK)) + if (tcp_send_flag(c, conn, SYN | ACK) < 0) return;
conn_event(c, conn, TAP_SYN_ACK_SENT);
-- Stefano
Hi, I think the problem can be because tcp_l2_buf_fill_headers() has been moved out of tcp_prepare_flags() and so moved after: if (th->fin || th->syn) conn->seq_to_tap++; and con->seq_to_tap is also a parameter of tcp_l2_buf_fill_headers(). So it is increased before and not after. Could you try: diff --git a/tcp.c b/tcp.c index 6800209d4122..647f42291fcf 100644 --- a/tcp.c +++ b/tcp.c @@ -1607,6 +1607,7 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) return 0; + *optlen = 0; if (flags & SYN) { int mss; @@ -1643,7 +1644,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { flags |= ACK; - *optlen = 0; } th->doff = (sizeof(*th) + *optlen) / 4; @@ -1663,10 +1663,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (th->fin) conn_flag(c, conn, ACK_FROM_TAP_DUE); - /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ - if (th->fin || th->syn) - conn->seq_to_tap++; - return 1; } @@ -1702,6 +1698,10 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) conn->seq_to_tap); iov[TCP_IOV_PAYLOAD].iov_len = l4len; + /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ + if (th->fin || th->syn) + conn->seq_to_tap++; + if (flags & DUP_ACK) { struct iovec *dup_iov; int i; Thanks, Laurent On 13/06/2024 08:07, Stefano Brivio wrote:
On Wed, 12 Jun 2024 23:22:10 +0200 Stefano Brivio
wrote: I couldn't find out why this patch breaks the the pasta_podman/bats test, yet, that is:
not ok 19 [505] Single TCP port forwarding, IPv4, tap # (from function `bail-now' in file test/podman/test/system/helpers.bash, line 235, # from function `assert' in file test/podman/test/system/helpers.bash, line 929, # from function `pasta_test_do' in file test/podman/test/system/505-networking-pasta.bats, line 239, # in test file test/podman/test/system/505-networking-pasta.bats, line 472) # `pasta_test_do' failed # # [22:54:18.306131353] $ test/podman/bin/podman rm -t 0 --all --force --ignore # # [22:54:18.367462243] $ test/podman/bin/podman ps --all --external --format {{.ID}} {{.Names}} # # [22:54:18.394935392] $ test/podman/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}} # [22:54:18.419773379] quay.io/libpod/testimage:20240123 1f6acd4c4a1d # # [22:54:19.246631856] $ test/podman/bin/podman info --format {{.Host.Pasta.Executable}} # [22:54:20.084392405] /home/sbrivio/passt/pasta # # [22:54:20.167980222] $ test/podman/bin/podman run --net=pasta -p [88.198.0.164]:5727:5727/tcp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5727 5727); do socat -u TCP4-LISTEN:${port},bind=[88.198.0.164] STDOUT & done; wait # [22:54:37.256040883] x2024/06/12 20:54:37 socat[3] E read(6, 0x7fe675cd6000, 8192): Connection reset by peer # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv # #| FAIL: Mismatch between data sent and received # #| expected: = x # #| actual: x2024/06/12 20:54:37 socat\[3\] E read\(6\, 0x7fe675cd6000\, 8192\): Connection reset by peer # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meaning that the data transfer is actually fine, but we reset the connection instead of an orderly shutdown.
I found a few issues while looking for that:
On Wed, 12 Jun 2024 17:47:27 +0200 Laurent Vivier
wrote: This commit isolates the internal data structure management used for storing data (e.g., tcp4_l2_flags_iov[], tcp6_l2_flags_iov[], tcp4_flags_ip[], tcp4_flags[], ...) from the tcp_send_flag() function. The extracted functionality is relocated to a new function named tcp_fill_flag_header().
tcp_fill_flag_header() is now a generic function that accepts parameters such as struct tcphdr and a data pointer. tcp_send_flag() utilizes this parameter to pass memory pointers from tcp4_l2_flags_iov[] and tcp6_l2_flags_iov[].
This separation sets the stage for utilizing tcp_fill_flag_header() to set the memory provided by the guest via vhost-user in future developments.
Signed-off-by: Laurent Vivier
--- Notes: v6: - rename tcp_fill_flag_header() to tcp_prepare_flags() - set optlen to 0 in tcp_prepare_flags()
v5: - use tcp_fill_flag_header() rather than tcp_fill_headers4() and tcp_fill_headers6().
tcp.c | 72 +++++++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 48 insertions(+), 24 deletions(-)
diff --git a/tcp.c b/tcp.c index dd8d46e08628..6800209d4122 100644 --- a/tcp.c +++ b/tcp.c @@ -1567,24 +1567,25 @@ static void tcp_update_seqack_from_tap(const struct ctx *c, }
/** - * tcp_send_flag() - Send segment with flags to tap (no payload) + * tcp_prepare_flags() - Prepare header for flags-only segment (no payload) * @c: Execution context * @conn: Connection pointer * @flags: TCP flags: if not set, send segment only if ACK is due + * @th: TCP header to update + * @data: buffer to store TCP option + * @optlen: size of the TCP option buffer (output parameter) * - * Return: negative error code on connection reset, 0 otherwise + * Return: < 0 error code on connection reset, + * 0 if there is no flag to send + * 1 otherwise
This is often called with if (tcp_send_flag(...)) or if (!tcp_send_flag(...)). Those need to be replaced with if (tcp_send_flag(...) < 0) or if (tcp_send_flag(...) >= 0) in the callers.
Ah, no, sorry, you already took care of this in the new tcp_send_flag(). Then there must be something else...
*/ -static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) +static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, + int flags, struct tcphdr *th, char *data, + size_t *optlen) { - struct tcp_flags_t *payload; struct tcp_info tinfo = { 0 }; socklen_t sl = sizeof(tinfo); int s = conn->sock; - size_t optlen = 0; - struct tcphdr *th; - struct iovec *iov; - size_t l4len; - char *data;
if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && !flags && conn->wnd_to_tap) @@ -1606,20 +1607,11 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) return 0;
- if (CONN_V4(conn)) - iov = tcp4_l2_flags_iov[tcp4_flags_used++]; - else - iov = tcp6_l2_flags_iov[tcp6_flags_used++]; - - payload = iov[TCP_IOV_PAYLOAD].iov_base; - th = &payload->th; - data = payload->opts; - if (flags & SYN) { int mss;
/* Options: MSS, NOP and window scale (8 bytes) */ - optlen = OPT_MSS_LEN + 1 + OPT_WS_LEN; + *optlen = OPT_MSS_LEN + 1 + OPT_WS_LEN;
*data++ = OPT_MSS; *data++ = OPT_MSS_LEN; @@ -1651,19 +1643,16 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { flags |= ACK; + *optlen = 0;
*optlen also needs to be set to 0 if (flags & RST), say:
} else { *optlen = 0;
if (!(flags & RST)) flags |= ACK; }
}
- th->doff = (sizeof(*th) + optlen) / 4; + th->doff = (sizeof(*th) + *optlen) / 4;
th->ack = !!(flags & ACK); th->rst = !!(flags & RST); th->syn = !!(flags & SYN); th->fin = !!(flags & FIN);
- l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, - conn->seq_to_tap); - iov[TCP_IOV_PAYLOAD].iov_len = l4len; - if (th->ack) { if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap)) conn_flag(c, conn, ~ACK_TO_TAP_DUE); @@ -1678,6 +1667,41 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) if (th->fin || th->syn) conn->seq_to_tap++;
+ return 1; +} + +/** + * tcp_send_flag() - Send segment with flags to tap (no payload) + * @c: Execution context + * @conn: Connection pointer + * @flags: TCP flags: if not set, send segment only if ACK is due + * + * Return: negative error code on connection reset, 0 otherwise + */ +static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) +{ + struct tcp_flags_t *payload; + struct iovec *iov; + size_t optlen; + size_t l4len; + int ret; + + if (CONN_V4(conn)) + iov = tcp4_l2_flags_iov[tcp4_flags_used++]; + else + iov = tcp6_l2_flags_iov[tcp6_flags_used++];
We increase the counters here, but we don't decrease them back if we hit if (ret <= 0) return ret; later.
+ + payload = iov[TCP_IOV_PAYLOAD].iov_base; + + ret = tcp_prepare_flags(c, conn, flags, &payload->th, + payload->opts, &optlen); + if (ret <= 0) + return ret; + + l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, + conn->seq_to_tap); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; + if (flags & DUP_ACK) { struct iovec *dup_iov; int i;
Here's the diff I have so far, I'm not necessarily recommending any of that, it was just a quick try:
diff --git a/tcp.c b/tcp.c index 6800209..6bff6bc 100644 --- a/tcp.c +++ b/tcp.c @@ -1644,6 +1644,8 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, } else if (!(flags & RST)) { flags |= ACK; *optlen = 0; + } else { + *optlen = 0; }
th->doff = (sizeof(*th) + *optlen) / 4; @@ -1695,8 +1697,14 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
ret = tcp_prepare_flags(c, conn, flags, &payload->th, payload->opts, &optlen); - if (ret <= 0) + if (ret <= 0) { + if (CONN_V4(conn)) + tcp4_flags_used--; + else + tcp6_flags_used--; + return ret; + }
l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, conn->seq_to_tap); @@ -1738,7 +1746,7 @@ static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn) if (conn->events == CLOSED) return;
- if (!tcp_send_flag(c, conn, RST)) + if (tcp_send_flag(c, conn, RST) >= 0) conn_event(c, conn, CLOSED); }
@@ -2111,7 +2119,7 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af, } else { tcp_get_sndbuf(conn);
- if (tcp_send_flag(c, conn, SYN | ACK)) + if (tcp_send_flag(c, conn, SYN | ACK) < 0) goto cancel;
conn_event(c, conn, TAP_SYN_ACK_SENT); @@ -2282,9 +2290,11 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn)
if (!len) { if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) { - if ((ret = tcp_send_flag(c, conn, FIN | ACK))) { + int rc = tcp_send_flag(c, conn, FIN | ACK); + + if (rc < 0) { tcp_rst(c, conn); - return ret; + return rc; }
conn_event(c, conn, TAP_FIN_SENT); @@ -2716,7 +2726,7 @@ static void tcp_connect_finish(struct ctx *c, struct tcp_tap_conn *conn) return; }
- if (tcp_send_flag(c, conn, SYN | ACK)) + if (tcp_send_flag(c, conn, SYN | ACK) < 0) return;
conn_event(c, conn, TAP_SYN_ACK_SENT);
On Thu, 13 Jun 2024 10:24:11 +0200
Laurent Vivier
Hi,
I think the problem can be because tcp_l2_buf_fill_headers() has been moved out of tcp_prepare_flags() and so moved after:
if (th->fin || th->syn) conn->seq_to_tap++;
and con->seq_to_tap is also a parameter of tcp_l2_buf_fill_headers(). So it is increased before and not after.
Could you try:
diff --git a/tcp.c b/tcp.c index 6800209d4122..647f42291fcf 100644 --- a/tcp.c +++ b/tcp.c @@ -1607,6 +1607,7 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) return 0;
+ *optlen = 0; if (flags & SYN) { int mss;
@@ -1643,7 +1644,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { flags |= ACK; - *optlen = 0; }
th->doff = (sizeof(*th) + *optlen) / 4; @@ -1663,10 +1663,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (th->fin) conn_flag(c, conn, ACK_FROM_TAP_DUE);
- /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ - if (th->fin || th->syn) - conn->seq_to_tap++; - return 1; }
@@ -1702,6 +1698,10 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) conn->seq_to_tap); iov[TCP_IOV_PAYLOAD].iov_len = l4len;
+ /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ + if (th->fin || th->syn) + conn->seq_to_tap++; + if (flags & DUP_ACK) { struct iovec *dup_iov; int i;
Ah, yes, good catch, I missed this one! It works. Note that it needs to be: if ((flags & FIN) || (flags & SYN)) ... because we don't have a TCP header there, yet. -- Stefano
On 13/06/2024 12:14, Stefano Brivio wrote:
On Thu, 13 Jun 2024 10:24:11 +0200 Laurent Vivier
wrote: Hi,
I think the problem can be because tcp_l2_buf_fill_headers() has been moved out of tcp_prepare_flags() and so moved after:
if (th->fin || th->syn) conn->seq_to_tap++;
and con->seq_to_tap is also a parameter of tcp_l2_buf_fill_headers(). So it is increased before and not after.
Could you try:
diff --git a/tcp.c b/tcp.c index 6800209d4122..647f42291fcf 100644 --- a/tcp.c +++ b/tcp.c @@ -1607,6 +1607,7 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) return 0;
+ *optlen = 0; if (flags & SYN) { int mss;
@@ -1643,7 +1644,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { flags |= ACK; - *optlen = 0; }
th->doff = (sizeof(*th) + *optlen) / 4; @@ -1663,10 +1663,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (th->fin) conn_flag(c, conn, ACK_FROM_TAP_DUE);
- /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ - if (th->fin || th->syn) - conn->seq_to_tap++; - return 1; }
@@ -1702,6 +1698,10 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) conn->seq_to_tap); iov[TCP_IOV_PAYLOAD].iov_len = l4len;
+ /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ + if (th->fin || th->syn) + conn->seq_to_tap++; + if (flags & DUP_ACK) { struct iovec *dup_iov; int i;
Ah, yes, good catch, I missed this one! It works. Note that it needs to be:
if ((flags & FIN) || (flags & SYN)) ...
because we don't have a TCP header there, yet.
I'm wondering if I can do: + seq = conn->seq_to_tap; ret = tcp_prepare_flags(c, conn, flags, &payload->th, payload->opts, &optlen); if (ret <= 0) return ret; l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, - conn->seq_to_tap); + seq); to keep the flags in tcp_prepare_flags()? Thanks, Laurent
On Thu, 13 Jun 2024 12:22:54 +0200
Laurent Vivier
On 13/06/2024 12:14, Stefano Brivio wrote:
On Thu, 13 Jun 2024 10:24:11 +0200 Laurent Vivier
wrote: Hi,
I think the problem can be because tcp_l2_buf_fill_headers() has been moved out of tcp_prepare_flags() and so moved after:
if (th->fin || th->syn) conn->seq_to_tap++;
and con->seq_to_tap is also a parameter of tcp_l2_buf_fill_headers(). So it is increased before and not after.
Could you try:
diff --git a/tcp.c b/tcp.c index 6800209d4122..647f42291fcf 100644 --- a/tcp.c +++ b/tcp.c @@ -1607,6 +1607,7 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) return 0;
+ *optlen = 0; if (flags & SYN) { int mss;
@@ -1643,7 +1644,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { flags |= ACK; - *optlen = 0; }
th->doff = (sizeof(*th) + *optlen) / 4; @@ -1663,10 +1663,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (th->fin) conn_flag(c, conn, ACK_FROM_TAP_DUE);
- /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ - if (th->fin || th->syn) - conn->seq_to_tap++; - return 1; }
@@ -1702,6 +1698,10 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) conn->seq_to_tap); iov[TCP_IOV_PAYLOAD].iov_len = l4len;
+ /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ + if (th->fin || th->syn) + conn->seq_to_tap++; + if (flags & DUP_ACK) { struct iovec *dup_iov; int i;
Ah, yes, good catch, I missed this one! It works. Note that it needs to be:
if ((flags & FIN) || (flags & SYN)) ...
because we don't have a TCP header there, yet.
I'm wondering if I can do:
+ seq = conn->seq_to_tap; ret = tcp_prepare_flags(c, conn, flags, &payload->th, payload->opts, &optlen); if (ret <= 0) return ret;
l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, - conn->seq_to_tap); + seq);
to keep the flags in tcp_prepare_flags()?
Maybe, yes. I don't really have a strong preference. On the other hand tcp_send_flag() has to deal with the DUP_ACK case on its own anyway. But sure, this version would be a bit cleaner, and if vhost-user code will then want to call tcp_prepare_flags() without tcp_send_flag(), it avoids code duplication, I guess? -- Stefano
On 13/06/2024 12:49, Stefano Brivio wrote:
On Thu, 13 Jun 2024 12:22:54 +0200 Laurent Vivier
wrote: On 13/06/2024 12:14, Stefano Brivio wrote:
On Thu, 13 Jun 2024 10:24:11 +0200 Laurent Vivier
wrote: Hi,
I think the problem can be because tcp_l2_buf_fill_headers() has been moved out of tcp_prepare_flags() and so moved after:
if (th->fin || th->syn) conn->seq_to_tap++;
and con->seq_to_tap is also a parameter of tcp_l2_buf_fill_headers(). So it is increased before and not after.
Could you try:
diff --git a/tcp.c b/tcp.c index 6800209d4122..647f42291fcf 100644 --- a/tcp.c +++ b/tcp.c @@ -1607,6 +1607,7 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags) return 0;
+ *optlen = 0; if (flags & SYN) { int mss;
@@ -1643,7 +1644,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { flags |= ACK; - *optlen = 0; }
th->doff = (sizeof(*th) + *optlen) / 4; @@ -1663,10 +1663,6 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, if (th->fin) conn_flag(c, conn, ACK_FROM_TAP_DUE);
- /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ - if (th->fin || th->syn) - conn->seq_to_tap++; - return 1; }
@@ -1702,6 +1698,10 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) conn->seq_to_tap); iov[TCP_IOV_PAYLOAD].iov_len = l4len;
+ /* RFC 793, 3.1: "[...] and the first data octet is ISN+1." */ + if (th->fin || th->syn) + conn->seq_to_tap++; + if (flags & DUP_ACK) { struct iovec *dup_iov; int i;
Ah, yes, good catch, I missed this one! It works. Note that it needs to be:
if ((flags & FIN) || (flags & SYN)) ...
because we don't have a TCP header there, yet.
I'm wondering if I can do:
+ seq = conn->seq_to_tap; ret = tcp_prepare_flags(c, conn, flags, &payload->th, payload->opts, &optlen); if (ret <= 0) return ret;
l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, - conn->seq_to_tap); + seq);
to keep the flags in tcp_prepare_flags()?
Maybe, yes. I don't really have a strong preference. On the other hand tcp_send_flag() has to deal with the DUP_ACK case on its own anyway.
But sure, this version would be a bit cleaner, and if vhost-user code will then want to call tcp_prepare_flags() without tcp_send_flag(), it avoids code duplication, I guess?
Yes. I prepare a new version of the series with the changes. Thanks, Laurent
On 12/06/2024 23:22, Stefano Brivio wrote:
I couldn't find out why this patch breaks the the pasta_podman/bats test, yet, that is:
not ok 19 [505] Single TCP port forwarding, IPv4, tap # (from function `bail-now' in file test/podman/test/system/helpers.bash, line 235, # from function `assert' in file test/podman/test/system/helpers.bash, line 929, # from function `pasta_test_do' in file test/podman/test/system/505-networking-pasta.bats, line 239, # in test file test/podman/test/system/505-networking-pasta.bats, line 472) # `pasta_test_do' failed # # [22:54:18.306131353] $ test/podman/bin/podman rm -t 0 --all --force --ignore # # [22:54:18.367462243] $ test/podman/bin/podman ps --all --external --format {{.ID}} {{.Names}} # # [22:54:18.394935392] $ test/podman/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}} # [22:54:18.419773379] quay.io/libpod/testimage:20240123 1f6acd4c4a1d # # [22:54:19.246631856] $ test/podman/bin/podman info --format {{.Host.Pasta.Executable}} # [22:54:20.084392405] /home/sbrivio/passt/pasta # # [22:54:20.167980222] $ test/podman/bin/podman run --net=pasta -p [88.198.0.164]:5727:5727/tcp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5727 5727); do socat -u TCP4-LISTEN:${port},bind=[88.198.0.164] STDOUT & done; wait # [22:54:37.256040883] x2024/06/12 20:54:37 socat[3] E read(6, 0x7fe675cd6000, 8192): Connection reset by peer # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv # #| FAIL: Mismatch between data sent and received # #| expected: = x # #| actual: x2024/06/12 20:54:37 socat\[3\] E read\(6\, 0x7fe675cd6000\, 8192\): Connection reset by peer # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meaning that the data transfer is actually fine, but we reset the connection instead of an orderly shutdown.
How do you run this test? Thanks, Laurent
On Thu, 13 Jun 2024 09:31:44 +0200
Laurent Vivier
On 12/06/2024 23:22, Stefano Brivio wrote:
I couldn't find out why this patch breaks the the pasta_podman/bats test, yet, that is:
not ok 19 [505] Single TCP port forwarding, IPv4, tap # (from function `bail-now' in file test/podman/test/system/helpers.bash, line 235, # from function `assert' in file test/podman/test/system/helpers.bash, line 929, # from function `pasta_test_do' in file test/podman/test/system/505-networking-pasta.bats, line 239, # in test file test/podman/test/system/505-networking-pasta.bats, line 472) # `pasta_test_do' failed # # [22:54:18.306131353] $ test/podman/bin/podman rm -t 0 --all --force --ignore # # [22:54:18.367462243] $ test/podman/bin/podman ps --all --external --format {{.ID}} {{.Names}} # # [22:54:18.394935392] $ test/podman/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}} # [22:54:18.419773379] quay.io/libpod/testimage:20240123 1f6acd4c4a1d # # [22:54:19.246631856] $ test/podman/bin/podman info --format {{.Host.Pasta.Executable}} # [22:54:20.084392405] /home/sbrivio/passt/pasta # # [22:54:20.167980222] $ test/podman/bin/podman run --net=pasta -p [88.198.0.164]:5727:5727/tcp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5727 5727); do socat -u TCP4-LISTEN:${port},bind=[88.198.0.164] STDOUT & done; wait # [22:54:37.256040883] x2024/06/12 20:54:37 socat[3] E read(6, 0x7fe675cd6000, 8192): Connection reset by peer # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv # #| FAIL: Mismatch between data sent and received # #| expected: = x # #| actual: x2024/06/12 20:54:37 socat\[3\] E read\(6\, 0x7fe675cd6000\, 8192\): Connection reset by peer # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meaning that the data transfer is actually fine, but we reset the connection instead of an orderly shutdown.
How do you run this test?
It's in the test suite, see test/README.md. Make sure you have a local build of Podman with 'make podman' under test/. You can also skip other tests by commenting the rest out in test/run, then do ./run from test/. -- Stefano
On Thu, Jun 13, 2024 at 11:35:53AM +0200, Stefano Brivio wrote:
On Thu, 13 Jun 2024 09:31:44 +0200 Laurent Vivier
wrote: On 12/06/2024 23:22, Stefano Brivio wrote:
I couldn't find out why this patch breaks the the pasta_podman/bats test, yet, that is:
not ok 19 [505] Single TCP port forwarding, IPv4, tap # (from function `bail-now' in file test/podman/test/system/helpers.bash, line 235, # from function `assert' in file test/podman/test/system/helpers.bash, line 929, # from function `pasta_test_do' in file test/podman/test/system/505-networking-pasta.bats, line 239, # in test file test/podman/test/system/505-networking-pasta.bats, line 472) # `pasta_test_do' failed # # [22:54:18.306131353] $ test/podman/bin/podman rm -t 0 --all --force --ignore # # [22:54:18.367462243] $ test/podman/bin/podman ps --all --external --format {{.ID}} {{.Names}} # # [22:54:18.394935392] $ test/podman/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}} # [22:54:18.419773379] quay.io/libpod/testimage:20240123 1f6acd4c4a1d # # [22:54:19.246631856] $ test/podman/bin/podman info --format {{.Host.Pasta.Executable}} # [22:54:20.084392405] /home/sbrivio/passt/pasta # # [22:54:20.167980222] $ test/podman/bin/podman run --net=pasta -p [88.198.0.164]:5727:5727/tcp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5727 5727); do socat -u TCP4-LISTEN:${port},bind=[88.198.0.164] STDOUT & done; wait # [22:54:37.256040883] x2024/06/12 20:54:37 socat[3] E read(6, 0x7fe675cd6000, 8192): Connection reset by peer # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv # #| FAIL: Mismatch between data sent and received # #| expected: = x # #| actual: x2024/06/12 20:54:37 socat\[3\] E read\(6\, 0x7fe675cd6000\, 8192\): Connection reset by peer # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
meaning that the data transfer is actually fine, but we reset the connection instead of an orderly shutdown.
How do you run this test?
It's in the test suite, see test/README.md. Make sure you have a local build of Podman with 'make podman' under test/.
Running 'make' in test/ will clone/update and build podman/
You can also skip other tests by commenting the rest out in test/run, then do ./run from test/.
You can also run them manually by going into test/system in the podman tree, and running CONTAINERS_HELPER_BINARY_DIR=/path/to/passt bats 505-networking-pasta.bats With either option you'll need bats installed (the Fedora package is just 'bats') -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Move all the TCP parts using internal buffers to tcp_buf.c
and keep generic TCP management functions in tcp.c.
Add tcp_internal.h to export needed functions from tcp.c and
tcp_buf.h from tcp_buf.c
With this change we can use existing TCP functions with a
different kind of memory storage as for instance the shared
memory provided by the guest via vhost-user.
Signed-off-by: Laurent Vivier
On Wed, 12 Jun 2024 17:47:28 +0200
Laurent Vivier
Move all the TCP parts using internal buffers to tcp_buf.c and keep generic TCP management functions in tcp.c. Add tcp_internal.h to export needed functions from tcp.c and tcp_buf.h from tcp_buf.c
With this change we can use existing TCP functions with a different kind of memory storage as for instance the shared memory provided by the guest via vhost-user.
Signed-off-by: Laurent Vivier
--- Notes: v5: - as we export now tcp_l2_buf_fill_headers() move also enum tcp_iov_part to tcp_internal.h
v4: - rename tcp_send_flag() and tcp_data_from_sock() to tcp_buf_send_flag() and tcp_buf_data_from_sock()
Makefile | 5 +- tcp.c | 562 ++----------------------------------------------- tcp_buf.c | 513 ++++++++++++++++++++++++++++++++++++++++++++ tcp_buf.h | 16 ++ tcp_internal.h | 96 +++++++++ 5 files changed, 648 insertions(+), 544 deletions(-) create mode 100644 tcp_buf.c create mode 100644 tcp_buf.h create mode 100644 tcp_internal.h
diff --git a/Makefile b/Makefile index e2180b599bdb..09fc461d087e 100644 --- a/Makefile +++ b/Makefile @@ -47,7 +47,7 @@ FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS) PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \ icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c \ ndp.c netlink.c packet.c passt.c pasta.c pcap.c pif.c tap.c tcp.c \ - tcp_splice.c udp.c util.c + tcp_buf.c tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c SRCS = $(PASST_SRCS) $(QRAP_SRCS)
@@ -56,7 +56,8 @@ MANPAGES = passt.1 pasta.1 qrap.1 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h fwd.h \ flow_table.h icmp.h icmp_flow.h inany.h iov.h ip.h isolation.h \ lineread.h log.h ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h \ - siphash.h tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h + siphash.h tap.h tcp.h tcp_buf.h tcp_conn.h tcp_internal.h tcp_splice.h \ + udp.h util.h HEADERS = $(PASST_HEADERS) seccomp.h
C := \#include
\nstruct tcp_info x = { .tcpi_snd_wnd = 0 }; diff --git a/tcp.c b/tcp.c index 6800209d4122..875e318c925b 100644 --- a/tcp.c +++ b/tcp.c @@ -302,28 +302,14 @@ #include "flow.h" #include "flow_table.h" - -#define TCP_FRAMES_MEM 128 -#define TCP_FRAMES \ - (c->mode == MODE_PASST ? TCP_FRAMES_MEM : 1) +#include "tcp_internal.h" +#include "tcp_buf.h"
#define TCP_HASH_TABLE_LOAD 70 /* % */ #define TCP_HASH_TABLE_SIZE (FLOW_MAX * 100 / TCP_HASH_TABLE_LOAD)
-#define MAX_WS 8 -#define MAX_WINDOW (1 << (16 + (MAX_WS))) - /* MSS rounding: see SET_MSS() */ #define MSS_DEFAULT 536 -#define MSS4 ROUND_DOWN(IP_MAX_MTU - \ - sizeof(struct tcphdr) - \ - sizeof(struct iphdr), \ - sizeof(uint32_t)) -#define MSS6 ROUND_DOWN(IP_MAX_MTU - \ - sizeof(struct tcphdr) - \ - sizeof(struct ipv6hdr), \ - sizeof(uint32_t)) - #define WINDOW_DEFAULT 14600 /* RFC 6928 */ #ifdef HAS_SND_WND # define KERNEL_REPORTS_SND_WND(c) ((c)->tcp.kernel_snd_wnd) @@ -345,33 +331,10 @@ */ #define SOL_TCP IPPROTO_TCP
-#define SEQ_LE(a, b) ((b) - (a) < MAX_WINDOW) -#define SEQ_LT(a, b) ((b) - (a) - 1 < MAX_WINDOW) -#define SEQ_GE(a, b) ((a) - (b) < MAX_WINDOW) -#define SEQ_GT(a, b) ((a) - (b) - 1 < MAX_WINDOW) - -#define FIN (1 << 0) -#define SYN (1 << 1) -#define RST (1 << 2) -#define ACK (1 << 4) -/* Flags for internal usage */ -#define DUP_ACK (1 << 5) #define ACK_IF_NEEDED 0 /* See tcp_send_flag() */
-#define OPT_EOL 0 -#define OPT_NOP 1 -#define OPT_MSS 2 -#define OPT_MSS_LEN 4 -#define OPT_WS 3 -#define OPT_WS_LEN 3 -#define OPT_SACKP 4 -#define OPT_SACK 5 -#define OPT_TS 8 - #define TAPSIDE(conn_) ((conn_)->f.pif[1] == PIF_TAP)
-#define CONN_V4(conn) (!!inany_v4(&(conn)->faddr)) -#define CONN_V6(conn) (!CONN_V4(conn)) #define CONN_IS_CLOSING(conn) \ (((conn)->events & ESTABLISHED) && \ ((conn)->events & (SOCK_FIN_RCVD | TAP_FIN_RCVD))) @@ -408,106 +371,7 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS]; */ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
-/* Static buffers */ -/** - * struct tcp_payload_t - TCP header and data to send segments with payload - * @th: TCP header - * @data: TCP data - */ -struct tcp_payload_t { - struct tcphdr th; - uint8_t data[IP_MAX_MTU - sizeof(struct tcphdr)]; -#ifdef __AVX2__ -} __attribute__ ((packed, aligned(32))); /* For AVX2 checksum routines */ -#else -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); -#endif - -/** - * struct tcp_flags_t - TCP header and data to send zero-length - * segments (flags) - * @th: TCP header - * @opts TCP options - */ -struct tcp_flags_t { - struct tcphdr th; - char opts[OPT_MSS_LEN + OPT_WS_LEN + 1]; -#ifdef __AVX2__ -} __attribute__ ((packed, aligned(32))); -#else -} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); -#endif - -/* Ethernet header for IPv4 frames */ -static struct ethhdr tcp4_eth_src; - -static struct tap_hdr tcp4_payload_tap_hdr[TCP_FRAMES_MEM]; -/* IPv4 headers */ -static struct iphdr tcp4_payload_ip[TCP_FRAMES_MEM]; -/* TCP segments with payload for IPv4 frames */ -static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM]; - -static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516"); - -/* References tracking the owner connection of frames in the tap outqueue */ -static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM]; -static unsigned int tcp4_payload_used; - -static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM]; -/* IPv4 headers for TCP segment without payload */ -static struct iphdr tcp4_flags_ip[TCP_FRAMES_MEM]; -/* TCP segments without payload for IPv4 frames */ -static struct tcp_flags_t tcp4_flags[TCP_FRAMES_MEM]; - -static unsigned int tcp4_flags_used; - -/* Ethernet header for IPv6 frames */ -static struct ethhdr tcp6_eth_src; - -static struct tap_hdr tcp6_payload_tap_hdr[TCP_FRAMES_MEM]; -/* IPv6 headers */ -static struct ipv6hdr tcp6_payload_ip[TCP_FRAMES_MEM]; -/* TCP headers and data for IPv6 frames */ -static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM]; - -static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516"); - -/* References tracking the owner connection of frames in the tap outqueue */ -static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM]; -static unsigned int tcp6_payload_used; - -static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM]; -/* IPv6 headers for TCP segment without payload */ -static struct ipv6hdr tcp6_flags_ip[TCP_FRAMES_MEM]; -/* TCP segment without payload for IPv6 frames */ -static struct tcp_flags_t tcp6_flags[TCP_FRAMES_MEM]; - -static unsigned int tcp6_flags_used; - -/* recvmsg()/sendmsg() data for tap */ -static char tcp_buf_discard [MAX_WINDOW]; -static struct iovec iov_sock [TCP_FRAMES_MEM + 1]; - -/* - * enum tcp_iov_parts - I/O vector parts for one TCP frame - * @TCP_IOV_TAP tap backend specific header - * @TCP_IOV_ETH Ethernet header - * @TCP_IOV_IP IP (v4/v6) header - * @TCP_IOV_PAYLOAD IP payload (TCP header + data) - * @TCP_NUM_IOVS the number of entries in the iovec array - */ -enum tcp_iov_parts { - TCP_IOV_TAP = 0, - TCP_IOV_ETH = 1, - TCP_IOV_IP = 2, - TCP_IOV_PAYLOAD = 3, - TCP_NUM_IOVS -}; - -static struct iovec tcp4_l2_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS]; -static struct iovec tcp6_l2_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS]; -static struct iovec tcp4_l2_flags_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS]; -static struct iovec tcp6_l2_flags_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS]; +char tcp_buf_discard [MAX_WINDOW];
/* sendmsg() to socket */ static struct iovec tcp_iov [UIO_MAXIOV]; @@ -552,14 +416,6 @@ static uint32_t tcp_conn_epoll_events(uint8_t events, uint8_t conn_flags) return EPOLLRDHUP; }
-static void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, - unsigned long flag); -#define conn_flag(c, conn, flag) \ - do { \ - flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \ - conn_flag_do(c, conn, flag); \ - } while (0) - /** * tcp_epoll_ctl() - Add/modify/delete epoll state from connection events * @c: Execution context @@ -671,8 +527,8 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) * @conn: Connection pointer * @flag: Flag to set, or ~flag to unset */ -static void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, - unsigned long flag) +void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, + unsigned long flag) { if (flag & (flag - 1)) { int flag_index = fls(~flag); @@ -722,8 +578,8 @@ static void tcp_hash_remove(const struct ctx *c, * @conn: Connection pointer * @event: Connection event */ -static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, - unsigned long event) +void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, + unsigned long event) { int prev, new, num = fls(event);
@@ -771,12 +627,6 @@ static void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, tcp_timer_ctl(c, conn); }
-#define conn_event(c, conn, event) \ - do { \ - flow_trace(conn, "event at %s:%i", __func__, __LINE__); \ - conn_event_do(c, conn, event); \ - } while (0) - /** * tcp_rtt_dst_low() - Check if low RTT was seen for connection endpoint * @conn: Connection pointer @@ -906,104 +756,6 @@ static void tcp_update_check_tcp6(struct ipv6hdr *ip6h, struct tcphdr *th) th->check = csum(th, l4len, sum); }
-/** - * tcp_update_l2_buf() - Update Ethernet header buffers with addresses - * @eth_d: Ethernet destination address, NULL if unchanged - * @eth_s: Ethernet source address, NULL if unchanged - */ -void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s) -{ - eth_update_mac(&tcp4_eth_src, eth_d, eth_s); - eth_update_mac(&tcp6_eth_src, eth_d, eth_s); -} - -/** - * tcp_sock4_iov_init() - Initialise scatter-gather L2 buffers for IPv4 sockets - * @c: Execution context - */ -static void tcp_sock4_iov_init(const struct ctx *c) -{ - struct iphdr iph = L2_BUF_IP4_INIT(IPPROTO_TCP); - struct iovec *iov; - int i; - - tcp4_eth_src.h_proto = htons_constant(ETH_P_IP); - - for (i = 0; i < ARRAY_SIZE(tcp4_payload); i++) { - tcp4_payload_ip[i] = iph; - tcp4_payload[i].th.doff = sizeof(struct tcphdr) / 4; - tcp4_payload[i].th.ack = 1; - } - - for (i = 0; i < ARRAY_SIZE(tcp4_flags); i++) { - tcp4_flags_ip[i] = iph; - tcp4_flags[i].th.doff = sizeof(struct tcphdr) / 4; - tcp4_flags[i].th.ack = 1; - } - - for (i = 0; i < TCP_FRAMES_MEM; i++) { - iov = tcp4_l2_iov[i]; - - iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp4_payload_tap_hdr[i]); - iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp4_eth_src); - iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[i]); - iov[TCP_IOV_PAYLOAD].iov_base = &tcp4_payload[i]; - } - - for (i = 0; i < TCP_FRAMES_MEM; i++) { - iov = tcp4_l2_flags_iov[i]; - - iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp4_flags_tap_hdr[i]); - iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src; - iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp4_eth_src); - iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_flags_ip[i]); - iov[TCP_IOV_PAYLOAD].iov_base = &tcp4_flags[i]; - } -} - -/** - * tcp_sock6_iov_init() - Initialise scatter-gather L2 buffers for IPv6 sockets - * @c: Execution context - */ -static void tcp_sock6_iov_init(const struct ctx *c) -{ - struct ipv6hdr ip6 = L2_BUF_IP6_INIT(IPPROTO_TCP); - struct iovec *iov; - int i; - - tcp6_eth_src.h_proto = htons_constant(ETH_P_IPV6); - - for (i = 0; i < ARRAY_SIZE(tcp6_payload); i++) { - tcp6_payload_ip[i] = ip6; - tcp6_payload[i].th.doff = sizeof(struct tcphdr) / 4; - tcp6_payload[i].th.ack = 1; - } - - for (i = 0; i < ARRAY_SIZE(tcp6_flags); i++) { - tcp6_flags_ip[i] = ip6; - tcp6_flags[i].th.doff = sizeof(struct tcphdr) / 4; - tcp6_flags[i].th .ack = 1; - } - - for (i = 0; i < TCP_FRAMES_MEM; i++) { - iov = tcp6_l2_iov[i]; - - iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp6_payload_tap_hdr[i]); - iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp6_eth_src); - iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[i]); - iov[TCP_IOV_PAYLOAD].iov_base = &tcp6_payload[i]; - } - - for (i = 0; i < TCP_FRAMES_MEM; i++) { - iov = tcp6_l2_flags_iov[i]; - - iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp6_flags_tap_hdr[i]); - iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp6_eth_src); - iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_flags_ip[i]); - iov[TCP_IOV_PAYLOAD].iov_base = &tcp6_flags[i]; - } -} - /** * tcp_opt_get() - Get option, and value if any, from TCP header * @opts: Pointer to start of TCP options in header @@ -1227,76 +979,6 @@ bool tcp_flow_defer(const struct tcp_tap_conn *conn) return true; }
-static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn); -#define tcp_rst(c, conn) \ - do { \ - flow_dbg((conn), "TCP reset at %s:%i", __func__, __LINE__); \ - tcp_rst_do(c, conn); \ - } while (0) - -/** - * tcp_flags_flush() - Send out buffers for segments with no data (flags) - * @c: Execution context - */ -static void tcp_flags_flush(const struct ctx *c) -{ - tap_send_frames(c, &tcp6_l2_flags_iov[0][0], TCP_NUM_IOVS, - tcp6_flags_used); - tcp6_flags_used = 0; - - tap_send_frames(c, &tcp4_l2_flags_iov[0][0], TCP_NUM_IOVS, - tcp4_flags_used); - tcp4_flags_used = 0; -} - -/** - * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission - * @conns: Array of connection pointers corresponding to queued frames - * @frames: Two-dimensional array containing queued frames with sub-iovs - * @num_frames: Number of entries in the two arrays to be compared - */ -static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec (*frames)[TCP_NUM_IOVS], - int num_frames) -{ - int i; - - for (i = 0; i < num_frames; i++) { - const struct tcphdr *th = frames[i][TCP_IOV_PAYLOAD].iov_base; - struct tcp_tap_conn *conn = conns[i]; - uint32_t seq = ntohl(th->seq); - - if (SEQ_LE(conn->seq_to_tap, seq)) - continue; - - conn->seq_to_tap = seq; - } -} - -/** - * tcp_payload_flush() - Send out buffers for segments with data - * @c: Execution context - */ -static void tcp_payload_flush(const struct ctx *c) -{ - size_t m; - - m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS, - tcp6_payload_used); - if (m != tcp6_payload_used) { - tcp_revert_seq(&tcp6_frame_conns[m], &tcp6_l2_iov[m], - tcp6_payload_used - m); - } - tcp6_payload_used = 0; - - m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS, - tcp4_payload_used); - if (m != tcp4_payload_used) { - tcp_revert_seq(&tcp4_frame_conns[m], &tcp4_l2_iov[m], - tcp4_payload_used - m); - } - tcp4_payload_used = 0; -} - /** * tcp_defer_handler() - Handler for TCP deferred tasks * @c: Execution context @@ -1430,10 +1112,10 @@ static size_t tcp_fill_headers6(const struct ctx *c, * * Return: IP payload length, host order */ -static size_t tcp_l2_buf_fill_headers(const struct ctx *c, - const struct tcp_tap_conn *conn, - struct iovec *iov, size_t dlen, - const uint16_t *check, uint32_t seq) +size_t tcp_l2_buf_fill_headers(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct iovec *iov, size_t dlen, + const uint16_t *check, uint32_t seq) { const struct in_addr *a4 = inany_v4(&conn->faddr);
@@ -1459,8 +1141,8 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c, * * Return: 1 if sequence or window were updated, 0 otherwise */ -static int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, - int force_seq, struct tcp_info *tinfo) +int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, + int force_seq, struct tcp_info *tinfo) { uint32_t prev_wnd_to_tap = conn->wnd_to_tap << conn->ws_to_tap; uint32_t prev_ack_to_tap = conn->seq_ack_to_tap; @@ -1579,9 +1261,9 @@ static void tcp_update_seqack_from_tap(const struct ctx *c, * 0 if there is no flag to send * 1 otherwise */ -static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, - int flags, struct tcphdr *th, char *data, - size_t *optlen) +int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, + int flags, struct tcphdr *th, char *data, + size_t *optlen) { struct tcp_info tinfo = { 0 }; socklen_t sl = sizeof(tinfo); @@ -1678,54 +1360,9 @@ static int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, * * Return: negative error code on connection reset, 0 otherwise */ -static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) +int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) { - struct tcp_flags_t *payload; - struct iovec *iov; - size_t optlen; - size_t l4len; - int ret; - - if (CONN_V4(conn)) - iov = tcp4_l2_flags_iov[tcp4_flags_used++]; - else - iov = tcp6_l2_flags_iov[tcp6_flags_used++]; - - payload = iov[TCP_IOV_PAYLOAD].iov_base; - - ret = tcp_prepare_flags(c, conn, flags, &payload->th, - payload->opts, &optlen); - if (ret <= 0) - return ret; - - l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, - conn->seq_to_tap); - iov[TCP_IOV_PAYLOAD].iov_len = l4len; - - if (flags & DUP_ACK) { - struct iovec *dup_iov; - int i; - - if (CONN_V4(conn)) - dup_iov = tcp4_l2_flags_iov[tcp4_flags_used++]; - else - dup_iov = tcp6_l2_flags_iov[tcp6_flags_used++]; - - for (i = 0; i < TCP_NUM_IOVS; i++) - memcpy(dup_iov[i].iov_base, iov[i].iov_base, - iov[i].iov_len); - dup_iov[TCP_IOV_PAYLOAD].iov_len = iov[TCP_IOV_PAYLOAD].iov_len; - } - - if (CONN_V4(conn)) { - if (tcp4_flags_used > TCP_FRAMES_MEM - 2) - tcp_flags_flush(c); - } else { - if (tcp6_flags_used > TCP_FRAMES_MEM - 2) - tcp_flags_flush(c); - } - - return 0; + return tcp_buf_send_flag(c, conn, flags); }
/** @@ -1733,7 +1370,7 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) * @c: Execution context * @conn: Connection pointer */ -static void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn) +void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn) { if (conn->events == CLOSED) return; @@ -2160,49 +1797,6 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq) return 0; }
-/** - * tcp_data_to_tap() - Finalise (queue) highest-numbered scatter-gather buffer - * @c: Execution context - * @conn: Connection pointer - * @dlen: TCP payload length - * @no_csum: Don't compute IPv4 checksum, use the one from previous buffer - * @seq: Sequence number to be sent - */ -static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, - ssize_t dlen, int no_csum, uint32_t seq) -{ - struct iovec *iov; - size_t l4len; - - conn->seq_to_tap = seq + dlen; - - if (CONN_V4(conn)) { - struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1]; - const uint16_t *check = NULL; - - if (no_csum) { - struct iphdr *iph = iov_prev[TCP_IOV_IP].iov_base; - check = &iph->check; - } - - tcp4_frame_conns[tcp4_payload_used] = conn; - - iov = tcp4_l2_iov[tcp4_payload_used++]; - l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq); - iov[TCP_IOV_PAYLOAD].iov_len = l4len; - if (tcp4_payload_used > TCP_FRAMES_MEM - 1) - tcp_payload_flush(c); - } else if (CONN_V6(conn)) { - tcp6_frame_conns[tcp6_payload_used] = conn; - - iov = tcp6_l2_iov[tcp6_payload_used++]; - l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq); - iov[TCP_IOV_PAYLOAD].iov_len = l4len; - if (tcp6_payload_used > TCP_FRAMES_MEM - 1) - tcp_payload_flush(c); - } -} - /** * tcp_data_from_sock() - Handle new data from socket, queue to tap, in window * @c: Execution context @@ -2214,123 +1808,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, */ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn) { - uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; - int fill_bufs, send_bufs = 0, last_len, iov_rem = 0; - int sendlen, len, dlen, v4 = CONN_V4(conn); - int s = conn->sock, i, ret = 0; - struct msghdr mh_sock = { 0 }; - uint16_t mss = MSS_GET(conn); - uint32_t already_sent, seq; - struct iovec *iov; - - already_sent = conn->seq_to_tap - conn->seq_ack_from_tap; - - if (SEQ_LT(already_sent, 0)) { - /* RFC 761, section 2.1. */ - flow_trace(conn, "ACK sequence gap: ACK for %u, sent: %u", - conn->seq_ack_from_tap, conn->seq_to_tap); - conn->seq_to_tap = conn->seq_ack_from_tap; - already_sent = 0; - } - - if (!wnd_scaled || already_sent >= wnd_scaled) { - conn_flag(c, conn, STALLED); - conn_flag(c, conn, ACK_FROM_TAP_DUE); - return 0; - } - - /* Set up buffer descriptors we'll fill completely and partially. */ - fill_bufs = DIV_ROUND_UP(wnd_scaled - already_sent, mss); - if (fill_bufs > TCP_FRAMES) { - fill_bufs = TCP_FRAMES; - iov_rem = 0; - } else { - iov_rem = (wnd_scaled - already_sent) % mss; - } - - mh_sock.msg_iov = iov_sock; - mh_sock.msg_iovlen = fill_bufs + 1; - - iov_sock[0].iov_base = tcp_buf_discard; - iov_sock[0].iov_len = already_sent; - - if (( v4 && tcp4_payload_used + fill_bufs > TCP_FRAMES_MEM) || - (!v4 && tcp6_payload_used + fill_bufs > TCP_FRAMES_MEM)) { - tcp_payload_flush(c); - - /* Silence Coverity CWE-125 false positive */ - tcp4_payload_used = tcp6_payload_used = 0; - } - - for (i = 0, iov = iov_sock + 1; i < fill_bufs; i++, iov++) { - if (v4) - iov->iov_base = &tcp4_payload[tcp4_payload_used + i].data; - else - iov->iov_base = &tcp6_payload[tcp6_payload_used + i].data; - iov->iov_len = mss; - } - if (iov_rem) - iov_sock[fill_bufs].iov_len = iov_rem; - - /* Receive into buffers, don't dequeue until acknowledged by guest. */ - do - len = recvmsg(s, &mh_sock, MSG_PEEK); - while (len < 0 && errno == EINTR); - - if (len < 0) - goto err; - - if (!len) { - if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) { - if ((ret = tcp_send_flag(c, conn, FIN | ACK))) { - tcp_rst(c, conn); - return ret; - } - - conn_event(c, conn, TAP_FIN_SENT); - } - - return 0; - } - - sendlen = len - already_sent; - if (sendlen <= 0) { - conn_flag(c, conn, STALLED); - return 0; - } - - conn_flag(c, conn, ~STALLED); - - send_bufs = DIV_ROUND_UP(sendlen, mss); - last_len = sendlen - (send_bufs - 1) * mss; - - /* Likely, some new data was acked too. */ - tcp_update_seqack_wnd(c, conn, 0, NULL); - - /* Finally, queue to tap */ - dlen = mss; - seq = conn->seq_to_tap; - for (i = 0; i < send_bufs; i++) { - int no_csum = i && i != send_bufs - 1 && tcp4_payload_used; - - if (i == send_bufs - 1) - dlen = last_len; - - tcp_data_to_tap(c, conn, dlen, no_csum, seq); - seq += dlen; - } - - conn_flag(c, conn, ACK_FROM_TAP_DUE); - - return 0; - -err: - if (errno != EAGAIN && errno != EWOULDBLOCK) { - ret = -errno; - tcp_rst(c, conn); - } - - return ret; + return tcp_buf_data_from_sock(c, conn); }
/** diff --git a/tcp_buf.c b/tcp_buf.c new file mode 100644 index 000000000000..0c7d07b8d0bd --- /dev/null +++ b/tcp_buf.c @@ -0,0 +1,513 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* PASST - Plug A Simple Socket Transport + * for qemu/UNIX domain socket mode + * + * PASTA - Pack A Subtle Tap Abstraction + * for network namespace/tap device mode + * + * tcp_buf.c - TCP L2-L4 buffer management functions
This still has "L2-L4", but L4 doesn't really make sense I think. I can also drop "-L4" on merge.
+ * + * Copyright Red Hat + * Author: Stefano Brivio
+ */ + +#include +#include +#include +#include +#include + +#include + +#include + +#include "util.h" +#include "ip.h" +#include "iov.h" +#include "passt.h" +#include "tap.h" +#include "siphash.h" +#include "inany.h" +#include "tcp_conn.h" +#include "tcp_internal.h" +#include "tcp_buf.h" + +#define TCP_FRAMES_MEM 128 +#define TCP_FRAMES \ + (c->mode == MODE_PASST ? TCP_FRAMES_MEM : 1) + +/* Static buffers */ +/** + * struct tcp_payload_t - TCP header and data to send segments with payload + * @th: TCP header + * @data: TCP data + */ +struct tcp_payload_t { + struct tcphdr th; + uint8_t data[IP_MAX_MTU - sizeof(struct tcphdr)]; +#ifdef __AVX2__ +} __attribute__ ((packed, aligned(32))); /* For AVX2 checksum routines */ +#else +} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); +#endif + +/** + * struct tcp_flags_t - TCP header and data to send zero-length + * segments (flags) + * @th: TCP header + * @opts TCP options + */ +struct tcp_flags_t { + struct tcphdr th; + char opts[OPT_MSS_LEN + OPT_WS_LEN + 1]; +#ifdef __AVX2__ +} __attribute__ ((packed, aligned(32))); +#else +} __attribute__ ((packed, aligned(__alignof__(unsigned int)))); +#endif + +/* Ethernet header for IPv4 frames */ +static struct ethhdr tcp4_eth_src; + +static struct tap_hdr tcp4_payload_tap_hdr[TCP_FRAMES_MEM]; +/* IPv4 headers */ +static struct iphdr tcp4_payload_ip[TCP_FRAMES_MEM]; +/* TCP segments with payload for IPv4 frames */ +static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM]; + +static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516"); + +/* References tracking the owner connection of frames in the tap outqueue */ +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM]; +static unsigned int tcp4_payload_used; + +static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM]; +/* IPv4 headers for TCP segment without payload */ +static struct iphdr tcp4_flags_ip[TCP_FRAMES_MEM]; +/* TCP segments without payload for IPv4 frames */ +static struct tcp_flags_t tcp4_flags[TCP_FRAMES_MEM]; + +static unsigned int tcp4_flags_used; + +/* Ethernet header for IPv6 frames */ +static struct ethhdr tcp6_eth_src; + +static struct tap_hdr tcp6_payload_tap_hdr[TCP_FRAMES_MEM]; +/* IPv6 headers */ +static struct ipv6hdr tcp6_payload_ip[TCP_FRAMES_MEM]; +/* TCP headers and data for IPv6 frames */ +static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM]; + +static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516"); + +/* References tracking the owner connection of frames in the tap outqueue */ +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM]; +static unsigned int tcp6_payload_used; + +static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM]; +/* IPv6 headers for TCP segment without payload */ +static struct ipv6hdr tcp6_flags_ip[TCP_FRAMES_MEM]; +/* TCP segment without payload for IPv6 frames */ +static struct tcp_flags_t tcp6_flags[TCP_FRAMES_MEM]; + +static unsigned int tcp6_flags_used; + +/* recvmsg()/sendmsg() data for tap */ +static struct iovec iov_sock [TCP_FRAMES_MEM + 1]; + +static struct iovec tcp4_l2_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS]; +static struct iovec tcp6_l2_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS]; +static struct iovec tcp4_l2_flags_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS]; +static struct iovec tcp6_l2_flags_iov [TCP_FRAMES_MEM][TCP_NUM_IOVS]; +/** + * tcp_update_l2_buf() - Update Ethernet header buffers with addresses + * @eth_d: Ethernet destination address, NULL if unchanged + * @eth_s: Ethernet source address, NULL if unchanged + */ +void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s) +{ + eth_update_mac(&tcp4_eth_src, eth_d, eth_s); + eth_update_mac(&tcp6_eth_src, eth_d, eth_s); +} + +/** + * tcp_sock4_iov_init() - Initialise scatter-gather L2 buffers for IPv4 sockets + * @c: Execution context + */ +void tcp_sock4_iov_init(const struct ctx *c) +{ + struct iphdr iph = L2_BUF_IP4_INIT(IPPROTO_TCP); + struct iovec *iov; + int i; + + tcp4_eth_src.h_proto = htons_constant(ETH_P_IP); + + for (i = 0; i < ARRAY_SIZE(tcp4_payload); i++) { + tcp4_payload_ip[i] = iph; + tcp4_payload[i].th.doff = sizeof(struct tcphdr) / 4; + tcp4_payload[i].th.ack = 1; + } + + for (i = 0; i < ARRAY_SIZE(tcp4_flags); i++) { + tcp4_flags_ip[i] = iph; + tcp4_flags[i].th.doff = sizeof(struct tcphdr) / 4; + tcp4_flags[i].th.ack = 1; + } + + for (i = 0; i < TCP_FRAMES_MEM; i++) { + iov = tcp4_l2_iov[i]; + + iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp4_payload_tap_hdr[i]); + iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp4_eth_src); + iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[i]); + iov[TCP_IOV_PAYLOAD].iov_base = &tcp4_payload[i]; + } + + for (i = 0; i < TCP_FRAMES_MEM; i++) { + iov = tcp4_l2_flags_iov[i]; + + iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp4_flags_tap_hdr[i]); + iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src; + iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp4_eth_src); + iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_flags_ip[i]); + iov[TCP_IOV_PAYLOAD].iov_base = &tcp4_flags[i]; + } +} + +/** + * tcp_sock6_iov_init() - Initialise scatter-gather L2 buffers for IPv6 sockets + * @c: Execution context + */ +void tcp_sock6_iov_init(const struct ctx *c) +{ + struct ipv6hdr ip6 = L2_BUF_IP6_INIT(IPPROTO_TCP); + struct iovec *iov; + int i; + + tcp6_eth_src.h_proto = htons_constant(ETH_P_IPV6); + + for (i = 0; i < ARRAY_SIZE(tcp6_payload); i++) { + tcp6_payload_ip[i] = ip6; + tcp6_payload[i].th.doff = sizeof(struct tcphdr) / 4; + tcp6_payload[i].th.ack = 1; + } + + for (i = 0; i < ARRAY_SIZE(tcp6_flags); i++) { + tcp6_flags_ip[i] = ip6; + tcp6_flags[i].th.doff = sizeof(struct tcphdr) / 4; + tcp6_flags[i].th .ack = 1; + } + + for (i = 0; i < TCP_FRAMES_MEM; i++) { + iov = tcp6_l2_iov[i]; + + iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp6_payload_tap_hdr[i]); + iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp6_eth_src); + iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[i]); + iov[TCP_IOV_PAYLOAD].iov_base = &tcp6_payload[i]; + } + + for (i = 0; i < TCP_FRAMES_MEM; i++) { + iov = tcp6_l2_flags_iov[i]; + + iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp6_flags_tap_hdr[i]); + iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp6_eth_src); + iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_flags_ip[i]); + iov[TCP_IOV_PAYLOAD].iov_base = &tcp6_flags[i]; + } +} + +/** + * tcp_flags_flush() - Send out buffers for segments with no data (flags) + * @c: Execution context + */ +void tcp_flags_flush(const struct ctx *c) +{ + tap_send_frames(c, &tcp6_l2_flags_iov[0][0], TCP_NUM_IOVS, + tcp6_flags_used); + tcp6_flags_used = 0; + + tap_send_frames(c, &tcp4_l2_flags_iov[0][0], TCP_NUM_IOVS, + tcp4_flags_used); + tcp4_flags_used = 0; +} + +/** + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission + * @conns: Array of connection pointers corresponding to queued frames + * @frames: Two-dimensional array containing queued frames with sub-iovs + * @num_frames: Number of entries in the two arrays to be compared + */ +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec (*frames)[TCP_NUM_IOVS], + int num_frames) +{ + int i; + + for (i = 0; i < num_frames; i++) { + const struct tcphdr *th = frames[i][TCP_IOV_PAYLOAD].iov_base; + struct tcp_tap_conn *conn = conns[i]; + uint32_t seq = ntohl(th->seq); + + if (SEQ_LE(conn->seq_to_tap, seq)) + continue; + + conn->seq_to_tap = seq; + } +} + +/** + * tcp_payload_flush() - Send out buffers for segments with data + * @c: Execution context + */ +void tcp_payload_flush(const struct ctx *c) +{ + size_t m; + + m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS, + tcp6_payload_used); + if (m != tcp6_payload_used) { + tcp_revert_seq(&tcp6_frame_conns[m], &tcp6_l2_iov[m], + tcp6_payload_used - m); + } + tcp6_payload_used = 0; + + m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS, + tcp4_payload_used); + if (m != tcp4_payload_used) { + tcp_revert_seq(&tcp4_frame_conns[m], &tcp4_l2_iov[m], + tcp4_payload_used - m); + } + tcp4_payload_used = 0; +} + +/** + * tcp_buf_send_flag() - Send segment with flags to tap (no payload) + * @c: Execution context + * @conn: Connection pointer + * @flags: TCP flags: if not set, send segment only if ACK is due + * + * Return: negative error code on connection reset, 0 otherwise + */ +int tcp_buf_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) +{ + struct tcp_flags_t *payload; + struct iovec *iov; + size_t optlen; + size_t l4len; + int ret; + + if (CONN_V4(conn)) + iov = tcp4_l2_flags_iov[tcp4_flags_used++]; + else + iov = tcp6_l2_flags_iov[tcp6_flags_used++]; + + payload = iov[TCP_IOV_PAYLOAD].iov_base; + + ret = tcp_prepare_flags(c, conn, flags, &payload->th, + payload->opts, &optlen); + if (ret <= 0) + return ret; + + l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, + conn->seq_to_tap); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; + + if (flags & DUP_ACK) { + struct iovec *dup_iov; + int i; + + if (CONN_V4(conn)) + dup_iov = tcp4_l2_flags_iov[tcp4_flags_used++]; + else + dup_iov = tcp6_l2_flags_iov[tcp6_flags_used++]; + + for (i = 0; i < TCP_NUM_IOVS; i++) + memcpy(dup_iov[i].iov_base, iov[i].iov_base, + iov[i].iov_len); + dup_iov[TCP_IOV_PAYLOAD].iov_len = iov[TCP_IOV_PAYLOAD].iov_len; + } + + if (CONN_V4(conn)) { + if (tcp4_flags_used > TCP_FRAMES_MEM - 2) + tcp_flags_flush(c); + } else { + if (tcp6_flags_used > TCP_FRAMES_MEM - 2) + tcp_flags_flush(c); + } + + return 0; +} + +/** + * tcp_data_to_tap() - Finalise (queue) highest-numbered scatter-gather buffer + * @c: Execution context + * @conn: Connection pointer + * @dlen: TCP payload length + * @no_csum: Don't compute IPv4 checksum, use the one from previous buffer + * @seq: Sequence number to be sent + */ +static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, + ssize_t dlen, int no_csum, uint32_t seq) +{ + struct iovec *iov; + size_t l4len; + + conn->seq_to_tap = seq + dlen; + + if (CONN_V4(conn)) { + struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1]; + const uint16_t *check = NULL; + + if (no_csum) { + struct iphdr *iph = iov_prev[TCP_IOV_IP].iov_base; + check = &iph->check; + } + + tcp4_frame_conns[tcp4_payload_used] = conn; + + iov = tcp4_l2_iov[tcp4_payload_used++]; + l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; + if (tcp4_payload_used > TCP_FRAMES_MEM - 1) + tcp_payload_flush(c); + } else if (CONN_V6(conn)) { + tcp6_frame_conns[tcp6_payload_used] = conn; + + iov = tcp6_l2_iov[tcp6_payload_used++]; + l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; + if (tcp6_payload_used > TCP_FRAMES_MEM - 1) + tcp_payload_flush(c); + } +} + +/** + * tcp_buf_data_from_sock() - Handle new data from socket, queue to tap, in window + * @c: Execution context + * @conn: Connection pointer + * + * Return: negative on connection reset, 0 otherwise + * + * #syscalls recvmsg + */ +int tcp_buf_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn) +{ + uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; + int fill_bufs, send_bufs = 0, last_len, iov_rem = 0; + int sendlen, len, dlen, v4 = CONN_V4(conn); + int s = conn->sock, i, ret = 0; + struct msghdr mh_sock = { 0 }; + uint16_t mss = MSS_GET(conn); + uint32_t already_sent, seq; + struct iovec *iov; + + already_sent = conn->seq_to_tap - conn->seq_ack_from_tap; + + if (SEQ_LT(already_sent, 0)) { + /* RFC 761, section 2.1. */ + flow_trace(conn, "ACK sequence gap: ACK for %u, sent: %u", + conn->seq_ack_from_tap, conn->seq_to_tap); + conn->seq_to_tap = conn->seq_ack_from_tap; + already_sent = 0; + } + + if (!wnd_scaled || already_sent >= wnd_scaled) { + conn_flag(c, conn, STALLED); + conn_flag(c, conn, ACK_FROM_TAP_DUE); + return 0; + } + + /* Set up buffer descriptors we'll fill completely and partially. */ + fill_bufs = DIV_ROUND_UP(wnd_scaled - already_sent, mss); + if (fill_bufs > TCP_FRAMES) { + fill_bufs = TCP_FRAMES; + iov_rem = 0; + } else { + iov_rem = (wnd_scaled - already_sent) % mss; + } + + mh_sock.msg_iov = iov_sock; + mh_sock.msg_iovlen = fill_bufs + 1; + + iov_sock[0].iov_base = tcp_buf_discard; + iov_sock[0].iov_len = already_sent; + + if (( v4 && tcp4_payload_used + fill_bufs > TCP_FRAMES_MEM) || + (!v4 && tcp6_payload_used + fill_bufs > TCP_FRAMES_MEM)) { + tcp_payload_flush(c); + + /* Silence Coverity CWE-125 false positive */ + tcp4_payload_used = tcp6_payload_used = 0; + } + + for (i = 0, iov = iov_sock + 1; i < fill_bufs; i++, iov++) { + if (v4) + iov->iov_base = &tcp4_payload[tcp4_payload_used + i].data; + else + iov->iov_base = &tcp6_payload[tcp6_payload_used + i].data; + iov->iov_len = mss; + } + if (iov_rem) + iov_sock[fill_bufs].iov_len = iov_rem; + + /* Receive into buffers, don't dequeue until acknowledged by guest. */ + do + len = recvmsg(s, &mh_sock, MSG_PEEK); + while (len < 0 && errno == EINTR); + + if (len < 0) + goto err; + + if (!len) { + if ((conn->events & (SOCK_FIN_RCVD | TAP_FIN_SENT)) == SOCK_FIN_RCVD) { + if ((ret = tcp_buf_send_flag(c, conn, FIN | ACK))) { + tcp_rst(c, conn); + return ret; + } + + conn_event(c, conn, TAP_FIN_SENT); + } + + return 0; + } + + sendlen = len - already_sent; + if (sendlen <= 0) { + conn_flag(c, conn, STALLED); + return 0; + } + + conn_flag(c, conn, ~STALLED); + + send_bufs = DIV_ROUND_UP(sendlen, mss); + last_len = sendlen - (send_bufs - 1) * mss; + + /* Likely, some new data was acked too. */ + tcp_update_seqack_wnd(c, conn, 0, NULL); + + /* Finally, queue to tap */ + dlen = mss; + seq = conn->seq_to_tap; + for (i = 0; i < send_bufs; i++) { + int no_csum = i && i != send_bufs - 1 && tcp4_payload_used; + + if (i == send_bufs - 1) + dlen = last_len; + + tcp_data_to_tap(c, conn, dlen, no_csum, seq); + seq += dlen; + } + + conn_flag(c, conn, ACK_FROM_TAP_DUE); + + return 0; + +err: + if (errno != EAGAIN && errno != EWOULDBLOCK) { + ret = -errno; + tcp_rst(c, conn); + } + + return ret; +} diff --git a/tcp_buf.h b/tcp_buf.h new file mode 100644 index 000000000000..14be7b945285 --- /dev/null +++ b/tcp_buf.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * Copyright (c) 2021 Red Hat GmbH + * Author: Stefano Brivio + */ + +#ifndef TCP_BUF_H +#define TCP_BUF_H + +void tcp_sock4_iov_init(const struct ctx *c); +void tcp_sock6_iov_init(const struct ctx *c); +void tcp_flags_flush(const struct ctx *c); +void tcp_payload_flush(const struct ctx *c); +int tcp_buf_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn); +int tcp_buf_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags); + +#endif /*TCP_BUF_H */ diff --git a/tcp_internal.h b/tcp_internal.h new file mode 100644 index 000000000000..51aaa16918cf --- /dev/null +++ b/tcp_internal.h @@ -0,0 +1,96 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * Copyright (c) 2021 Red Hat GmbH + * Author: Stefano Brivio + */ + +#ifndef TCP_INTERNAL_H +#define TCP_INTERNAL_H + +#define MAX_WS 8 +#define MAX_WINDOW (1 << (16 + (MAX_WS))) + +#define MSS4 ROUND_DOWN(IP_MAX_MTU - \ + sizeof(struct tcphdr) - \ + sizeof(struct iphdr), \ + sizeof(uint32_t)) +#define MSS6 ROUND_DOWN(IP_MAX_MTU - \ + sizeof(struct tcphdr) - \ + sizeof(struct ipv6hdr), \ + sizeof(uint32_t)) + +#define SEQ_LE(a, b) ((b) - (a) < MAX_WINDOW) +#define SEQ_LT(a, b) ((b) - (a) - 1 < MAX_WINDOW) +#define SEQ_GE(a, b) ((a) - (b) < MAX_WINDOW) +#define SEQ_GT(a, b) ((a) - (b) - 1 < MAX_WINDOW) + +#define FIN (1 << 0) +#define SYN (1 << 1) +#define RST (1 << 2) +#define ACK (1 << 4) + +/* Flags for internal usage */ +#define DUP_ACK (1 << 5) +#define OPT_EOL 0 +#define OPT_NOP 1 +#define OPT_MSS 2 +#define OPT_MSS_LEN 4 +#define OPT_WS 3 +#define OPT_WS_LEN 3 +#define OPT_SACKP 4 +#define OPT_SACK 5 +#define OPT_TS 8 +#define CONN_V4(conn) (!!inany_v4(&(conn)->faddr)) +#define CONN_V6(conn) (!CONN_V4(conn)) + +/* + * enum tcp_iov_parts - I/O vector parts for one TCP frame + * @TCP_IOV_TAP tap backend specific header + * @TCP_IOV_ETH Ethernet header + * @TCP_IOV_IP IP (v4/v6) header + * @TCP_IOV_PAYLOAD IP payload (TCP header + data) + * @TCP_NUM_IOVS the number of entries in the iovec array + */ +enum tcp_iov_parts { + TCP_IOV_TAP = 0, + TCP_IOV_ETH = 1, + TCP_IOV_IP = 2, + TCP_IOV_PAYLOAD = 3, + TCP_NUM_IOVS +}; + +extern char tcp_buf_discard [MAX_WINDOW]; + +void conn_flag_do(const struct ctx *c, struct tcp_tap_conn *conn, + unsigned long flag); +#define conn_flag(c, conn, flag) \ + do { \ + flow_trace(conn, "flag at %s:%i", __func__, __LINE__); \ + conn_flag_do(c, conn, flag); \ + } while (0) + + +void conn_event_do(const struct ctx *c, struct tcp_tap_conn *conn, + unsigned long event); +#define conn_event(c, conn, event) \ + do { \ + flow_trace(conn, "event at %s:%i", __func__, __LINE__); \ + conn_event_do(c, conn, event); \ + } while (0) + +void tcp_rst_do(struct ctx *c, struct tcp_tap_conn *conn); +#define tcp_rst(c, conn) \ + do { \ + flow_dbg((conn), "TCP reset at %s:%i", __func__, __LINE__); \ + tcp_rst_do(c, conn); \ + } while (0) + +size_t tcp_l2_buf_fill_headers(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct iovec *iov, size_t dlen, + const uint16_t *check, uint32_t seq); +int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, + int force_seq, struct tcp_info *tinfo); +int tcp_prepare_flags(struct ctx *c, struct tcp_tap_conn *conn, int flags, + struct tcphdr *th, char *data, size_t *optlen); + +#endif /* TCP_INTERNAL_H */
-- Stefano
Consolidate pool_tap4() and pool_tap6() into tap_flush_pools(),
and tap4_handler() and tap6_handler() into tap_handler().
Create a generic tap_add_packet() to consolidate packet
addition logic and reduce code duplication.
The purpose is to ease the export of these functions to use
them with the vhost-user backend.
Signed-off-by: Laurent Vivier
On Wed, 12 Jun 2024 17:47:29 +0200
Laurent Vivier
Consolidate pool_tap4() and pool_tap6() into tap_flush_pools(), and tap4_handler() and tap6_handler() into tap_handler(). Create a generic tap_add_packet() to consolidate packet addition logic and reduce code duplication.
The purpose is to ease the export of these functions to use them with the vhost-user backend.
Signed-off-by: Laurent Vivier
--- Notes: v6: - rename tap_handler_all() to tap_handler() - rename packet_add_all_do() to tap_add_packet and remove func and line.
Oops, they're still in the comment to the function. I can drop that on merge.
- rename pool_flush_all() to tap_flush_pools()
v5: - update commit message and function comments
tap.c | 112 +++++++++++++++++++++++++++++++++------------------------- tap.h | 3 ++ 2 files changed, 66 insertions(+), 49 deletions(-)
diff --git a/tap.c b/tap.c index 26084b486519..ec3a0d68ea2f 100644 --- a/tap.c +++ b/tap.c @@ -921,6 +921,60 @@ append: return in->count; }
+/** + * pool_flush() - Flush both IPv4 and IPv6 packet pools + */ +void tap_flush_pools(void) +{ + pool_flush(pool_tap4); + pool_flush(pool_tap6); +} + +/** + * tap_handler() - IPv4/IPv6 and ARP packet handler for tap file descriptor + * @c: Execution context + * @now: Current timestamp + */ +void tap_handler(struct ctx *c, const struct timespec *now) +{ + tap4_handler(c, pool_tap4, now); + tap6_handler(c, pool_tap6, now); +} + +/** + * tap_add_packet() - Queue/capture packet, update notion of guest MAC address + * @c: Execution context + * @l2len: Total L2 packet length + * @p: Packet buffer + * @func: For tracing: name of calling function, NULL means no trace() + * @line: For tracing: caller line of function call + */ +void tap_add_packet(struct ctx *c, ssize_t l2len, char *p) +{ + const struct ethhdr *eh; + + pcap(p, l2len); + + eh = (struct ethhdr *)p; + + if (memcmp(c->mac_guest, eh->h_source, ETH_ALEN)) { + memcpy(c->mac_guest, eh->h_source, ETH_ALEN); + proto_update_l2_buf(c->mac_guest, NULL); + } + + switch (ntohs(eh->h_proto)) { + case ETH_P_ARP: + case ETH_P_IP: + packet_add(pool_tap4, l2len, p); + break; + case ETH_P_IPV6: + packet_add(pool_tap6, l2len, p); + break; + default: + break; + } +} + /** * tap_sock_reset() - Handle closing or failure of connect AF_UNIX socket * @c: Execution context @@ -947,7 +1001,6 @@ static void tap_sock_reset(struct ctx *c) void tap_handler_passt(struct ctx *c, uint32_t events, const struct timespec *now) { - const struct ethhdr *eh; ssize_t n, rem; char *p;
@@ -960,8 +1013,7 @@ redo: p = pkt_buf; rem = 0;
- pool_flush(pool_tap4); - pool_flush(pool_tap6); + tap_flush_pools();
n = recv(c->fd_tap, p, TAP_BUF_FILL, MSG_DONTWAIT); if (n < 0) { @@ -988,38 +1040,18 @@ redo: /* Complete the partial read above before discarding a malformed * frame, otherwise the stream will be inconsistent. */ - if (l2len < (ssize_t)sizeof(*eh) || + if (l2len < (ssize_t)sizeof(struct ethhdr) || l2len > (ssize_t)ETH_MAX_MTU) goto next;
- pcap(p, l2len); - - eh = (struct ethhdr *)p; - - if (memcmp(c->mac_guest, eh->h_source, ETH_ALEN)) { - memcpy(c->mac_guest, eh->h_source, ETH_ALEN); - proto_update_l2_buf(c->mac_guest, NULL); - } - - switch (ntohs(eh->h_proto)) { - case ETH_P_ARP: - case ETH_P_IP: - packet_add(pool_tap4, l2len, p); - break; - case ETH_P_IPV6: - packet_add(pool_tap6, l2len, p); - break; - default: - break; - } + tap_add_packet(c, l2len, p);
next: p += l2len; n -= l2len; }
- tap4_handler(c, pool_tap4, now); - tap6_handler(c, pool_tap6, now); + tap_handler(c, now);
/* We can't use EPOLLET otherwise. */ if (rem) @@ -1044,35 +1076,18 @@ void tap_handler_pasta(struct ctx *c, uint32_t events, redo: n = 0;
- pool_flush(pool_tap4); - pool_flush(pool_tap6); + tap_flush_pools(); restart: while ((len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n)) > 0) { - const struct ethhdr *eh = (struct ethhdr *)(pkt_buf + n);
- if (len < (ssize_t)sizeof(*eh) || len > (ssize_t)ETH_MAX_MTU) { + if (len < (ssize_t)sizeof(struct ethhdr) || + len > (ssize_t)ETH_MAX_MTU) { n += len; continue; }
- pcap(pkt_buf + n, len);
- if (memcmp(c->mac_guest, eh->h_source, ETH_ALEN)) { - memcpy(c->mac_guest, eh->h_source, ETH_ALEN); - proto_update_l2_buf(c->mac_guest, NULL); - } - - switch (ntohs(eh->h_proto)) { - case ETH_P_ARP: - case ETH_P_IP: - packet_add(pool_tap4, len, pkt_buf + n); - break; - case ETH_P_IPV6: - packet_add(pool_tap6, len, pkt_buf + n); - break; - default: - break; - } + tap_add_packet(c, len, pkt_buf + n);
if ((n += len) == TAP_BUF_BYTES) break; @@ -1083,8 +1098,7 @@ restart:
ret = errno;
- tap4_handler(c, pool_tap4, now); - tap6_handler(c, pool_tap6, now); + tap_handler(c, now);
if (len > 0 || ret == EAGAIN) return; diff --git a/tap.h b/tap.h index 2285a87093f9..d496bd0e4b99 100644 --- a/tap.h +++ b/tap.h @@ -70,5 +70,8 @@ void tap_handler_passt(struct ctx *c, uint32_t events, const struct timespec *now); int tap_sock_unix_open(char *sock_path); void tap_sock_init(struct ctx *c); +void tap_flush_pools(void); +void tap_handler(struct ctx *c, const struct timespec *now); +void tap_add_packet(struct ctx *c, ssize_t l2len, char *p);
#endif /* TAP_H */
-- Stefano
On 12/06/2024 17:52, Stefano Brivio wrote:
On Wed, 12 Jun 2024 17:47:29 +0200 Laurent Vivier
wrote: Consolidate pool_tap4() and pool_tap6() into tap_flush_pools(), and tap4_handler() and tap6_handler() into tap_handler(). Create a generic tap_add_packet() to consolidate packet addition logic and reduce code duplication.
The purpose is to ease the export of these functions to use them with the vhost-user backend.
Signed-off-by: Laurent Vivier
--- Notes: v6: - rename tap_handler_all() to tap_handler() - rename packet_add_all_do() to tap_add_packet and remove func and line.
Oops, they're still in the comment to the function. I can drop that on merge.
yes, please, and also the comment of tap_flush_pools() Thanks, Laurent >
- rename pool_flush_all() to tap_flush_pools()
v5: - update commit message and function comments
tap.c | 112 +++++++++++++++++++++++++++++++++------------------------- tap.h | 3 ++ 2 files changed, 66 insertions(+), 49 deletions(-)
diff --git a/tap.c b/tap.c index 26084b486519..ec3a0d68ea2f 100644 --- a/tap.c +++ b/tap.c @@ -921,6 +921,60 @@ append: return in->count; }
+/** + * pool_flush() - Flush both IPv4 and IPv6 packet pools + */ +void tap_flush_pools(void) +{ + pool_flush(pool_tap4); + pool_flush(pool_tap6); +} + +/** + * tap_handler() - IPv4/IPv6 and ARP packet handler for tap file descriptor + * @c: Execution context + * @now: Current timestamp + */ +void tap_handler(struct ctx *c, const struct timespec *now) +{ + tap4_handler(c, pool_tap4, now); + tap6_handler(c, pool_tap6, now); +} + +/** + * tap_add_packet() - Queue/capture packet, update notion of guest MAC address + * @c: Execution context + * @l2len: Total L2 packet length + * @p: Packet buffer + * @func: For tracing: name of calling function, NULL means no trace() + * @line: For tracing: caller line of function call + */ +void tap_add_packet(struct ctx *c, ssize_t l2len, char *p) +{ + const struct ethhdr *eh; + + pcap(p, l2len); + + eh = (struct ethhdr *)p; + + if (memcmp(c->mac_guest, eh->h_source, ETH_ALEN)) { + memcpy(c->mac_guest, eh->h_source, ETH_ALEN); + proto_update_l2_buf(c->mac_guest, NULL); + } + + switch (ntohs(eh->h_proto)) { + case ETH_P_ARP: + case ETH_P_IP: + packet_add(pool_tap4, l2len, p); + break; + case ETH_P_IPV6: + packet_add(pool_tap6, l2len, p); + break; + default: + break; + } +} + /** * tap_sock_reset() - Handle closing or failure of connect AF_UNIX socket * @c: Execution context @@ -947,7 +1001,6 @@ static void tap_sock_reset(struct ctx *c) void tap_handler_passt(struct ctx *c, uint32_t events, const struct timespec *now) { - const struct ethhdr *eh; ssize_t n, rem; char *p;
@@ -960,8 +1013,7 @@ redo: p = pkt_buf; rem = 0;
- pool_flush(pool_tap4); - pool_flush(pool_tap6); + tap_flush_pools();
n = recv(c->fd_tap, p, TAP_BUF_FILL, MSG_DONTWAIT); if (n < 0) { @@ -988,38 +1040,18 @@ redo: /* Complete the partial read above before discarding a malformed * frame, otherwise the stream will be inconsistent. */ - if (l2len < (ssize_t)sizeof(*eh) || + if (l2len < (ssize_t)sizeof(struct ethhdr) || l2len > (ssize_t)ETH_MAX_MTU) goto next;
- pcap(p, l2len); - - eh = (struct ethhdr *)p; - - if (memcmp(c->mac_guest, eh->h_source, ETH_ALEN)) { - memcpy(c->mac_guest, eh->h_source, ETH_ALEN); - proto_update_l2_buf(c->mac_guest, NULL); - } - - switch (ntohs(eh->h_proto)) { - case ETH_P_ARP: - case ETH_P_IP: - packet_add(pool_tap4, l2len, p); - break; - case ETH_P_IPV6: - packet_add(pool_tap6, l2len, p); - break; - default: - break; - } + tap_add_packet(c, l2len, p);
next: p += l2len; n -= l2len; }
- tap4_handler(c, pool_tap4, now); - tap6_handler(c, pool_tap6, now); + tap_handler(c, now);
/* We can't use EPOLLET otherwise. */ if (rem) @@ -1044,35 +1076,18 @@ void tap_handler_pasta(struct ctx *c, uint32_t events, redo: n = 0;
- pool_flush(pool_tap4); - pool_flush(pool_tap6); + tap_flush_pools(); restart: while ((len = read(c->fd_tap, pkt_buf + n, TAP_BUF_BYTES - n)) > 0) { - const struct ethhdr *eh = (struct ethhdr *)(pkt_buf + n);
- if (len < (ssize_t)sizeof(*eh) || len > (ssize_t)ETH_MAX_MTU) { + if (len < (ssize_t)sizeof(struct ethhdr) || + len > (ssize_t)ETH_MAX_MTU) { n += len; continue; }
- pcap(pkt_buf + n, len);
- if (memcmp(c->mac_guest, eh->h_source, ETH_ALEN)) { - memcpy(c->mac_guest, eh->h_source, ETH_ALEN); - proto_update_l2_buf(c->mac_guest, NULL); - } - - switch (ntohs(eh->h_proto)) { - case ETH_P_ARP: - case ETH_P_IP: - packet_add(pool_tap4, len, pkt_buf + n); - break; - case ETH_P_IPV6: - packet_add(pool_tap6, len, pkt_buf + n); - break; - default: - break; - } + tap_add_packet(c, len, pkt_buf + n);
if ((n += len) == TAP_BUF_BYTES) break; @@ -1083,8 +1098,7 @@ restart:
ret = errno;
- tap4_handler(c, pool_tap4, now); - tap6_handler(c, pool_tap6, now); + tap_handler(c, now);
if (len > 0 || ret == EAGAIN) return; diff --git a/tap.h b/tap.h index 2285a87093f9..d496bd0e4b99 100644 --- a/tap.h +++ b/tap.h @@ -70,5 +70,8 @@ void tap_handler_passt(struct ctx *c, uint32_t events, const struct timespec *now); int tap_sock_unix_open(char *sock_path); void tap_sock_init(struct ctx *c); +void tap_flush_pools(void); +void tap_handler(struct ctx *c, const struct timespec *now); +void tap_add_packet(struct ctx *c, ssize_t l2len, char *p);
#endif /* TAP_H */
This commit refactors the udp_update_hdr4() and udp_update_hdr6() functions
to improve code portability by replacing the udp_meta_t parameter with
more specific parameters for the IPv4 and IPv6 headers (iphdr/ipv6hdr)
and the source socket address (sockaddr_in/sockaddr_in6).
It also moves the tap_hdr_update() function call inside the udp_tap_send()
function not to have to pass the TAP header to udp_update_hdr4() and
udp_update_hdr6()
This refactor reduces complexity by making the functions more modular and
ensuring that each function operates on more narrowly scoped data structures.
This will facilitate future backend introduction like vhost-user.
Signed-off-by: Laurent Vivier
We are going to introduce a variant of the function to use
vhost-user buffers rather than passt internal buffers.
Signed-off-by: Laurent Vivier
As we are going to introduce the MODE_VU that will act like
the mode MODE_PASST, compare to MODE_PASTA rather than to add
a comparison to MODE_VU when we check for MODE_PASST.
Signed-off-by: Laurent Vivier
it was needed by a draft version of vhost-user, it is not needed
anymore.
Signed-off-by: Laurent Vivier
buf_size is set to sizeof(pkt_buf) by default. And it seems more correct
to provide the actual size of the buffer.
Later a buf_size of 0 will allow vhost-user mode to detect
guest memory buffers.
Signed-off-by: Laurent Vivier
On Wed, 12 Jun 2024 17:47:26 +0200
Laurent Vivier
Extract buffers management code from tcp.c and move it to tcp_buf.c tcp.c keeps all the generic code and will be also used by the vhost-user functions.
Also compare mode to MODE_PASTA, as we will manage vhost-user mode (MODE_VU) like MODE_PASST.
Something in this series breaks the pasta_podman/bats test, number 19 in that script (Single TCP port forwarding, IPv4, tap). I'm bisecting now... -- Stefano
On Wed, 12 Jun 2024 19:16:17 +0200
Stefano Brivio
On Wed, 12 Jun 2024 17:47:26 +0200 Laurent Vivier
wrote: Extract buffers management code from tcp.c and move it to tcp_buf.c tcp.c keeps all the generic code and will be also used by the vhost-user functions.
Also compare mode to MODE_PASTA, as we will manage vhost-user mode (MODE_VU) like MODE_PASST.
Something in this series breaks the pasta_podman/bats test, number 19 in that script (Single TCP port forwarding, IPv4, tap). I'm bisecting now...
$ git bisect good
3c6a20486425ed00ba5b631bea11135045794dc2 is the first bad commit
commit 3c6a20486425ed00ba5b631bea11135045794dc2
Author: Laurent Vivier
On Wed, 12 Jun 2024 19:37:44 +0200
Stefano Brivio
On Wed, 12 Jun 2024 19:16:17 +0200 Stefano Brivio
wrote: On Wed, 12 Jun 2024 17:47:26 +0200 Laurent Vivier
wrote: Extract buffers management code from tcp.c and move it to tcp_buf.c tcp.c keeps all the generic code and will be also used by the vhost-user functions.
Also compare mode to MODE_PASTA, as we will manage vhost-user mode (MODE_VU) like MODE_PASST.
Something in this series breaks the pasta_podman/bats test, number 19 in that script (Single TCP port forwarding, IPv4, tap). I'm bisecting now...
$ git bisect good 3c6a20486425ed00ba5b631bea11135045794dc2 is the first bad commit commit 3c6a20486425ed00ba5b631bea11135045794dc2 Author: Laurent Vivier
Date: Wed Jun 12 17:47:34 2024 +0200 tap: use in->buf_size rather than sizeof(pkt_buf)
buf_size is set to sizeof(pkt_buf) by default. And it seems more correct to provide the actual size of the buffer.
Later a buf_size of 0 will allow vhost-user mode to detect guest memory buffers.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson Signed-off-by: Stefano Brivio tap.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
...checking that patch now.
Ouch, sorry, bad bisect, I forgot to rebuild at every step. The actual
issue seems to be in 1/8 (details as a reply to that one):
$ git bisect bad
fa462103eb219dfb8e6fe7bb25d1707d8a82b2a2 is the first bad commit
commit fa462103eb219dfb8e6fe7bb25d1707d8a82b2a2
Author: Laurent Vivier
participants (3)
-
David Gibson
-
Laurent Vivier
-
Stefano Brivio