From: Jon Maloy <jmaloy(a)redhat.com>

Based on an original patch by Jon Maloy:

--
The recently added socket option SO_PEEK_OFF is not supported for
TCP/IPv6 sockets. Until we get that support into the kernel we need to
test for support in both protocols to set the global 'peek_offset_cap'
to true.
--

Compared to the original patch:
- only check for SO_PEEK_OFF support for enabled IP versions
- use sa_family_t instead of int to pass the address family around

Fixes: e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available")
Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 tcp.c | 37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/tcp.c b/tcp.c
index 0c66ac8..c031f13 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2470,6 +2470,29 @@ static void tcp_sock_refill_init(const struct ctx *c)
 	}
 }
 
+/**
+ * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel
+ * @af:		Address family, IPv4 or IPv6
+ *
+ * Return: true if supported, false otherwise
+ */
+bool tcp_probe_peek_offset_cap(sa_family_t af)
+{
+	bool ret = false;
+	int s, optv = 0;
+
+	s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
+	if (s < 0) {
+		warn_perror("Temporary TCP socket creation failed");
+	} else {
+		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
+			ret = true;
+		close(s);
+	}
+
+	return ret;
+}
+
 /**
  * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
  * @c:		Execution context
@@ -2478,9 +2501,6 @@ static void tcp_sock_refill_init(const struct ctx *c)
  */
 int tcp_init(struct ctx *c)
 {
-	unsigned int optv = 0;
-	int s;
-
 	ASSERT(!c->no_tcp);
 
 	if (c->ifi4)
@@ -2502,15 +2522,8 @@ int tcp_init(struct ctx *c)
 		NS_CALL(tcp_ns_socks_init, c);
 	}
 
-	/* Probe for SO_PEEK_OFF support */
-	s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
-	if (s < 0) {
-		warn_perror("Temporary TCP socket creation failed");
-	} else {
-		if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int)))
-			peek_offset_cap = true;
-		close(s);
-	}
+	peek_offset_cap = (!c->ifi4 || tcp_probe_peek_offset_cap(AF_INET)) &&
+			  (!c->ifi6 || tcp_probe_peek_offset_cap(AF_INET6));
 	info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
 
 	return 0;
-- 
2.43.0
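
For readers who want to try the probe outside of passt, here is a minimal standalone sketch of the same idea. It is not the code from the patch: probe_peek_off() and peek_off_cap() are hypothetical names, the two booleans stand in for c->ifi4 and c->ifi6, and plain perror() replaces warn_perror(). If the headers don't define SO_PEEK_OFF at all, the sketch simply reports the option as unsupported.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: true if SO_PEEK_OFF can be set on a TCP socket of
 * the given address family, false otherwise (including socket() failure).
 */
static bool probe_peek_off(sa_family_t af)
{
#ifdef SO_PEEK_OFF
	int s, optv = 0;
	bool ret;

	s = socket(af, SOCK_STREAM, IPPROTO_TCP);
	if (s < 0) {
		perror("Temporary TCP socket creation failed");
		return false;
	}

	/* setsockopt() returns 0 on success: the kernel knows the option */
	ret = !setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(optv));
	close(s);

	return ret;
#else
	(void)af;
	return false;
#endif
}

/* Same combination as in the patch: only enabled IP versions need to pass
 * the probe, and a disabled version is never probed at all.
 */
static bool peek_off_cap(bool v4_enabled, bool v6_enabled)
{
	return (!v4_enabled || probe_peek_off(AF_INET)) &&
	       (!v6_enabled || probe_peek_off(AF_INET6));
}
```

With this shape, peek_off_cap(true, false) corresponds to an IPv4-only run, and peek_off_cap(true, true) only returns true if both families accept the option.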
On Tue, Jul 23, 2024 at 12:09:37AM +0200, Stefano Brivio wrote:
> From: Jon Maloy <jmaloy(a)redhat.com>
>
> Based on an original patch by Jon Maloy:

Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Tue, 23 Jul 2024 00:09:37 +0200
Stefano Brivio <sbrivio(a)redhat.com> wrote:

> From: Jon Maloy <jmaloy(a)redhat.com>
>
> Based on an original patch by Jon Maloy:
>
> --
> The recently added socket option SO_PEEK_OFF is not supported for
> TCP/IPv6 sockets. Until we get that support into the kernel we need to
> test for support in both protocols to set the global 'peek_offset_cap'
> to true.
> --
>
> Compared to the original patch:
> - only check for SO_PEEK_OFF support for enabled IP versions
> - use sa_family_t instead of int to pass the address family around
>
> Fixes: e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available")

...so, with this, the probing issue is solved: on a 6.10 kernel,
SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).

However, if I disable it, for some reason, resorting to IPv4, at least
together with the flow table (applying just this patch to HEAD), I get
something that looks like one of the "old" TCP stalls. On the host:

  $ ./passt -f -t 10000 -4

and in the guest:

  # ip link set dev eth0 up
  # dhclient eth0
  # iperf3 -s -p 10000

back to the host:

  $ iperf3 -c 127.0.0.1 -p 10000
  Connecting to host 127.0.0.1, port 10000
  [  5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
  [  5]   0.00-1.00   sec  11.2 MBytes  94.3 Mbits/sec    0   5.50 MBytes
  [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes
  [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes

...the transfer never recovers.

I didn't really have time to debug this further. At the moment I would
be inclined to temporarily revert commit e63d281871ef ("tcp: leverage
support of SO_PEEK_OFF socket option when available"), but it's not a
good idea if this happens to be hiding some (unlikely?) issue with the
flow table.

-- 
Stefano
On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:
> On Tue, 23 Jul 2024 00:09:37 +0200
> Stefano Brivio <sbrivio(a)redhat.com> wrote:
>
> > From: Jon Maloy <jmaloy(a)redhat.com>
> >
> > Based on an original patch by Jon Maloy:
>
> ...so, with this, the probing issue is solved: on a 6.10 kernel,
> SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
>
> However, if I disable it, for some reason, resorting to IPv4, at least
> together with the flow table (applying just this patch to HEAD), I get
> something that looks like one of the "old" TCP stalls. On the host:
>
>   $ ./passt -f -t 10000 -4
>
> and in the guest:
>
>   # ip link set dev eth0 up
>   # dhclient eth0
>   # iperf3 -s -p 10000
>
> back to the host:
>
>   $ iperf3 -c 127.0.0.1 -p 10000
>   Connecting to host 127.0.0.1, port 10000
>   [  5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
>   [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
>   [  5]   0.00-1.00   sec  11.2 MBytes  94.3 Mbits/sec    0   5.50 MBytes
>   [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes
>   [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes
>
> ...the transfer never recovers.

Bother. I've reproduced and am debugging now.

> I didn't really have time to debug this further. At the moment I would
> be inclined to temporarily revert commit e63d281871ef ("tcp: leverage
> support of SO_PEEK_OFF socket option when available"), but it's not a
> good idea if this happens to be hiding some (unlikely?) issue with the
> flow table.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Wed, Jul 24, 2024 at 10:40:15AM +1000, David Gibson wrote:
> On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:
> > On Tue, 23 Jul 2024 00:09:37 +0200
> > Stefano Brivio <sbrivio(a)redhat.com> wrote:
> >
> > > From: Jon Maloy <jmaloy(a)redhat.com>
> > >
> > > Based on an original patch by Jon Maloy:
> >
> > ...so, with this, the probing issue is solved: on a 6.10 kernel,
> > SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
> >
> > However, if I disable it, for some reason, resorting to IPv4, at least
> > together with the flow table (applying just this patch to HEAD), I get
> > something that looks like one of the "old" TCP stalls. On the host:
> >
> >   $ ./passt -f -t 10000 -4
> >
> > and in the guest:
> >
> >   # ip link set dev eth0 up
> >   # dhclient eth0
> >   # iperf3 -s -p 10000
> >
> > back to the host:
> >
> >   $ iperf3 -c 127.0.0.1 -p 10000
> >   Connecting to host 127.0.0.1, port 10000
> >   [  5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
> >   [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> >   [  5]   0.00-1.00   sec  11.2 MBytes  94.3 Mbits/sec    0   5.50 MBytes
> >   [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes
> >   [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes
> >
> > ...the transfer never recovers.
>
> Bother. I've reproduced and am debugging now.

Found it. Looks like one of the cases where we need to set
SO_PEEK_OFF was lost somewhere in the refactorings :(.

> > I didn't really have time to debug this further. At the moment I would
> > be inclined to temporarily revert commit e63d281871ef ("tcp: leverage
> > support of SO_PEEK_OFF socket option when available"), but it's not a
> > good idea if this happens to be hiding some (unlikely?) issue with the
> > flow table.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
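
As background on why a missed SO_PEEK_OFF update can plausibly stall a transfer, the sketch below shows the general semantics of the option; it does not reproduce the actual bug or its fix, which this thread does not spell out. Assumptions: it uses an AF_UNIX stream socket pair instead of TCP, because SO_PEEK_OFF has been available there for much longer, and peek_off_demo() is a hypothetical name. Each MSG_PEEK read moves the peek offset forward, so re-reading earlier data (as for a retransmission) requires setting the offset back explicitly.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: peek twice, rewind the peek offset, peek again.
 * On success, buf holds the three peeked chunks separated by '|'.
 *
 * Return: 0 on success, -1 if SO_PEEK_OFF is unavailable or a call fails
 */
static int peek_off_demo(char buf[15])
{
#ifdef SO_PEEK_OFF
	int sv[2], off = 0, ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	if (write(sv[0], "abcdefgh", 8) != 8)
		goto out;

	/* Offset 0: the next peek starts at the head of the receive queue */
	if (setsockopt(sv[1], SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off)))
		goto out;

	/* Each peek advances the offset by the number of bytes returned */
	if (recv(sv[1], buf, 4, MSG_PEEK | MSG_DONTWAIT) != 4)
		goto out;
	buf[4] = '|';
	if (recv(sv[1], buf + 5, 4, MSG_PEEK | MSG_DONTWAIT) != 4)
		goto out;
	buf[9] = '|';

	/* Rewind: without this, the next peek starts past all queued data */
	if (setsockopt(sv[1], SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off)))
		goto out;
	if (recv(sv[1], buf + 10, 4, MSG_PEEK | MSG_DONTWAIT) != 4)
		goto out;
	buf[14] = '\0';

	ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
#else
	(void)buf;
	return -1;
#endif
}
```

A caller that tracks its own position in the stream has to keep the kernel's peek offset in sync with it on every path that moves that position backwards; skipping one such path would leave later peeks reading from the wrong place.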
On Wed, 24 Jul 2024 13:31:49 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> On Wed, Jul 24, 2024 at 10:40:15AM +1000, David Gibson wrote:
> > On Tue, Jul 23, 2024 at 10:29:36PM +0200, Stefano Brivio wrote:
> > > On Tue, 23 Jul 2024 00:09:37 +0200
> > > Stefano Brivio <sbrivio(a)redhat.com> wrote:
> > >
> > > > From: Jon Maloy <jmaloy(a)redhat.com>
> > > >
> > > > Based on an original patch by Jon Maloy:
> > >
> > > ...so, with this, the probing issue is solved: on a 6.10 kernel,
> > > SO_PEEK_OFF is not used, unless I disable IPv6 (with --ipv4-only / -4).
> > >
> > > However, if I disable it, for some reason, resorting to IPv4, at least
> > > together with the flow table (applying just this patch to HEAD), I get
> > > something that looks like one of the "old" TCP stalls. On the host:
> > >
> > >   $ ./passt -f -t 10000 -4
> > >
> > > and in the guest:
> > >
> > >   # ip link set dev eth0 up
> > >   # dhclient eth0
> > >   # iperf3 -s -p 10000
> > >
> > > back to the host:
> > >
> > >   $ iperf3 -c 127.0.0.1 -p 10000
> > >   Connecting to host 127.0.0.1, port 10000
> > >   [  5] local 127.0.0.1 port 39046 connected to 127.0.0.1 port 10000
> > >   [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > >   [  5]   0.00-1.00   sec  11.2 MBytes  94.3 Mbits/sec    0   5.50 MBytes
> > >   [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes
> > >   [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   5.50 MBytes
> > >
> > > ...the transfer never recovers.
> >
> > Bother. I've reproduced and am debugging now.
>
> Found it. Looks like one of the cases where we need to set
> SO_PEEK_OFF was lost somewhere in the refactorings :(.

Hah, great, thanks, it fixes the issue on my setup as well. Re-running
all tests now...

-- 
Stefano