[PATCH] tcp: probe for SO_PEEK_OFF both in tcpv4 and tcp6
The recently added socket option SO_PEEK_OFF is not supported for
TCP/IPv6 sockets. Until we get that support into the kernel we need to
test for support in both protocols to set the global 'peek_offset_cap´
to true.
Signed-off-by: Jon Maloy
My first approach to this was to condition the use of SO_PEEK_OFF with tcpv4, e.g., basically a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made an interesting discovery. It turns out that, unless the ´-4' option is explicitly given on the command line, all sockets are v6, even those that are later used as v4 sockets. So, the set_peek_off() call failed even for supposedly v4 sockets. I checked this by adding a printout to the tcp_listen_handler(), and noticed that all returns from the accept4() call goes into the AF_INET6 branch, even when the client (iperf3) call is using an IPv4 address. During traffic, the very same socket is marked as v4 in the tcp_tap_conn structure, and this seems to have worked just fine until I added the set_peek_offset call(). I believe this is an issue that has been introduced during the last months, since I didn't start using the ´-4' option consistently until some months ago, and then it worked. Happy summer ///jon On 2024-07-20 09:54, Jon Maloy wrote:
The recently added socket option SO_PEEK_OFF is not supported for TCP/IPv6 sockets. Until we get that support into the kernel we need to test for support in both protocols to set the global 'peek_offset_cap´ to true.
Signed-off-by: Jon Maloy
--- tcp.c | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/tcp.c b/tcp.c index c5431f1..32026ca 100644 --- a/tcp.c +++ b/tcp.c @@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c) } }
+/** + * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel + * @af: Address family, IPv4 or IPv6 + * + * Return: true if supported, false otherwise + */ +bool tcp_probe_peek_offset_cap(int af) +{ + bool ret = false; + int s, optv = 0; + + s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); + if (s < 0) { + warn_perror("Temporary TCP socket creation failed"); + } else { + if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) + ret = true; + close(s); + } + return ret; +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c) */ int tcp_init(struct ctx *c) { - unsigned int b, optv = 0; - int s; + unsigned int b;
ASSERT(!c->no_tcp);
@@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c) NS_CALL(tcp_ns_socks_init, c); }
- /* Probe for SO_PEEK_OFF support */ - s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); - if (s < 0) { - warn_perror("Temporary TCP socket creation failed"); - } else { - if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) - peek_offset_cap = true; - close(s); - } + peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) && + tcp_probe_peek_offset_cap(AF_INET6); info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not ");
return 0;
On Sat, 20 Jul 2024 10:46:07 -0400
Jon Maloy
My first approach to this was to condition the use of SO_PEEK_OFF with tcpv4, e.g., basically a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made an interesting discovery.
It turns out that, unless the ´-4' option is explicitly given on the command line, all sockets are v6, even those that are later used as v4 sockets.
Not necessarily: if IPv6 support is disabled for other reasons, such as the fact that we didn't find any routable IPv6 address, sockets will also be IPv4-only. See tcp_sock_init().
So, the set_peek_off() call failed even for supposedly v4 sockets.
I checked this by adding a printout to the tcp_listen_handler(), and noticed that all returns from the accept4() call goes into the AF_INET6 branch, even when the client (iperf3) call is using an IPv4 address. During traffic, the very same socket is marked as v4 in the tcp_tap_conn structure, and this seems to have worked just fine until I added the set_peek_offset call().
I believe this is an issue that has been introduced during the last months, since I didn't start using the ´-4' option consistently until some months ago, and then it worked.
That's not actually an issue, it's intended, see commit 8e914238b6de ("tcp: Use dual stack sockets for port forwarding when possible"). Could it be that you enabled IPv6 on your setup meanwhile, and that's why you see this now? I think I tested your changes on an isolated IPv4-only setup, as I didn't run a kernel version with SO_PEEK_OFF support at the time. -- Stefano
On Sun, Jul 21, 2024 at 11:21:18AM +0200, Stefano Brivio wrote:
On Sat, 20 Jul 2024 10:46:07 -0400 Jon Maloy
wrote: My first approach to this was to condition the use of SO_PEEK_OFF with tcpv4, e.g., basically a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made an interesting discovery.
It turns out that, unless the ´-4' option is explicitly given on the command line, all sockets are v6, even those that are later used as v4 sockets.
Not necessarily: if IPv6 support is disabled for other reasons, such as the fact that we didn't find any routable IPv6 address, sockets will also be IPv4-only. See tcp_sock_init().
Also, if your forwarding options give explicit IPv4 addresses (or even "0.0.0.0") you'll get IPv4 sockets...
So, the set_peek_off() call failed even for supposedly v4 sockets.
I checked this by adding a printout to the tcp_listen_handler(), and noticed that all returns from the accept4() call goes into the AF_INET6 branch, even when the client (iperf3) call is using an IPv4 address. During traffic, the very same socket is marked as v4 in the tcp_tap_conn structure, and this seems to have worked just fine until I added the set_peek_offset call().
I believe this is an issue that has been introduced during the last months, since I didn't start using the ´-4' option consistently until some months ago, and then it worked.
That's not actually an issue, it's intended, see commit 8e914238b6de ("tcp: Use dual stack sockets for port forwarding when possible").
..because this is exactly the reason. We're using dual stack sockets to reduce the amount of kernel memory we consume with many forwards.
Could it be that you enabled IPv6 on your setup meanwhile, and that's why you see this now? I think I tested your changes on an isolated IPv4-only setup, as I didn't run a kernel version with SO_PEEK_OFF support at the time.
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sat, 20 Jul 2024 09:54:53 -0400
Jon Maloy
The recently added socket option SO_PEEK_OFF is not supported for TCP/IPv6 sockets. Until we get that support into the kernel we need to test for support in both protocols to set the global 'peek_offset_cap´ to true.
Signed-off-by: Jon Maloy
--- tcp.c | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/tcp.c b/tcp.c index c5431f1..32026ca 100644 --- a/tcp.c +++ b/tcp.c @@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c) } }
+/** + * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel + * @af: Address family, IPv4 or IPv6 + * + * Return: true if supported, false otherwise + */ +bool tcp_probe_peek_offset_cap(int af) +{ + bool ret = false; + int s, optv = 0; + + s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); + if (s < 0) { + warn_perror("Temporary TCP socket creation failed"); + } else { + if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) + ret = true; + close(s); + } + return ret; +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c) */ int tcp_init(struct ctx *c) { - unsigned int b, optv = 0; - int s; + unsigned int b;
ASSERT(!c->no_tcp);
@@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c) NS_CALL(tcp_ns_socks_init, c); }
- /* Probe for SO_PEEK_OFF support */ - s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); - if (s < 0) { - warn_perror("Temporary TCP socket creation failed"); - } else { - if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) - peek_offset_cap = true; - close(s); - } + peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) && + tcp_probe_peek_offset_cap(AF_INET6);
I think we shouldn't probe for IPv4 SO_PEEK_OFF support if we're not interested in it, and the same applies for IPv6: those two checks should depend on whether c->ifi4 and c->ifi6 are set (following the same logic as tcp_sock_init()). In practice, since you just submitted the fix to have SO_PEEK_OFF support also for IPv6 sockets, it doesn't matter so much, but it might still be relevant for users who will stick to 6.9 and 6.10 kernel versions for a while, for whatever reason. Eventually, we might want to support IPv6 operation on IPv4-only hosts, maybe even with default options, making this even less relevant. But the day we get to it, it should be simpler to just replace all the checks of c->ifi4 / c->ifi6 used to represent IPv4 / IPv6 support, rather than replacing slightly different bits of logic. -- Stefano
participants (3)
-
David Gibson
-
Jon Maloy
-
Stefano Brivio