The recently added socket option SO_PEEK_OFF is not supported for TCP/IPv6 sockets. Until we get that support into the kernel we need to test for support in both protocols to set the global 'peek_offset_cap´ to true. Signed-off-by: Jon Maloy <jmaloy(a)redhat.com> --- tcp.c | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/tcp.c b/tcp.c index c5431f1..32026ca 100644 --- a/tcp.c +++ b/tcp.c @@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c) } } +/** + * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel + * @af: Address family, IPv4 or IPv6 + * + * Return: true if supported, false otherwise + */ +bool tcp_probe_peek_offset_cap(int af) +{ + bool ret = false; + int s, optv = 0; + + s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); + if (s < 0) { + warn_perror("Temporary TCP socket creation failed"); + } else { + if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) + ret = true; + close(s); + } + return ret; +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c) */ int tcp_init(struct ctx *c) { - unsigned int b, optv = 0; - int s; + unsigned int b; ASSERT(!c->no_tcp); @@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c) NS_CALL(tcp_ns_socks_init, c); } - /* Probe for SO_PEEK_OFF support */ - s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); - if (s < 0) { - warn_perror("Temporary TCP socket creation failed"); - } else { - if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) - peek_offset_cap = true; - close(s); - } + peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) && + tcp_probe_peek_offset_cap(AF_INET6); info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not "); return 0; -- 2.45.2
My first approach to this was to condition the use of SO_PEEK_OFF with tcpv4, e.g., basically a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made an interesting discovery. It turns out that, unless the ´-4' option is explicitly given on the command line, all sockets are v6, even those that are later used as v4 sockets. So, the set_peek_off() call failed even for supposedly v4 sockets. I checked this by adding a printout to the tcp_listen_handler(), and noticed that all returns from the accept4() call goes into the AF_INET6 branch, even when the client (iperf3) call is using an IPv4 address. During traffic, the very same socket is marked as v4 in the tcp_tap_conn structure, and this seems to have worked just fine until I added the set_peek_offset call(). I believe this is an issue that has been introduced during the last months, since I didn't start using the ´-4' option consistently until some months ago, and then it worked. Happy summer ///jon On 2024-07-20 09:54, Jon Maloy wrote:The recently added socket option SO_PEEK_OFF is not supported for TCP/IPv6 sockets. Until we get that support into the kernel we need to test for support in both protocols to set the global 'peek_offset_cap´ to true. Signed-off-by: Jon Maloy <jmaloy(a)redhat.com> --- tcp.c | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/tcp.c b/tcp.c index c5431f1..32026ca 100644 --- a/tcp.c +++ b/tcp.c @@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c) } } +/** + * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel + * @af: Address family, IPv4 or IPv6 + * + * Return: true if supported, false otherwise + */ +bool tcp_probe_peek_offset_cap(int af) +{ + bool ret = false; + int s, optv = 0; + + s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); + if (s < 0) { + warn_perror("Temporary TCP socket creation failed"); + } else { + if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) + ret = true; + close(s); + } + return ret; +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c) */ int tcp_init(struct ctx *c) { - unsigned int b, optv = 0; - int s; + unsigned int b; ASSERT(!c->no_tcp); @@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c) NS_CALL(tcp_ns_socks_init, c); } - /* Probe for SO_PEEK_OFF support */ - s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); - if (s < 0) { - warn_perror("Temporary TCP socket creation failed"); - } else { - if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) - peek_offset_cap = true; - close(s); - } + peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) && + tcp_probe_peek_offset_cap(AF_INET6); info("SO_PEEK_OFF%ssupported", peek_offset_cap ? " " : " not "); return 0;
On Sat, 20 Jul 2024 10:46:07 -0400 Jon Maloy <jmaloy(a)redhat.com> wrote:My first approach to this was to condition the use of SO_PEEK_OFF with tcpv4, e.g., basically a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made an interesting discovery. It turns out that, unless the ´-4' option is explicitly given on the command line, all sockets are v6, even those that are later used as v4 sockets.Not necessarily: if IPv6 support is disabled for other reasons, such as the fact that we didn't find any routable IPv6 address, sockets will also be IPv4-only. See tcp_sock_init().So, the set_peek_off() call failed even for supposedly v4 sockets. I checked this by adding a printout to the tcp_listen_handler(), and noticed that all returns from the accept4() call goes into the AF_INET6 branch, even when the client (iperf3) call is using an IPv4 address. During traffic, the very same socket is marked as v4 in the tcp_tap_conn structure, and this seems to have worked just fine until I added the set_peek_offset call(). I believe this is an issue that has been introduced during the last months, since I didn't start using the ´-4' option consistently until some months ago, and then it worked.That's not actually an issue, it's intended, see commit 8e914238b6de ("tcp: Use dual stack sockets for port forwarding when possible"). Could it be that you enabled IPv6 on your setup meanwhile, and that's why you see this now? I think I tested your changes on an isolated IPv4-only setup, as I didn't run a kernel version with SO_PEEK_OFF support at the time. -- Stefano
On Sun, Jul 21, 2024 at 11:21:18AM +0200, Stefano Brivio wrote:On Sat, 20 Jul 2024 10:46:07 -0400 Jon Maloy <jmaloy(a)redhat.com> wrote:Also, if your forwarding options give explicit IPv4 addresses (or even "0.0.0.0") you'll get IPv4 sockets...My first approach to this was to condition the use of SO_PEEK_OFF with tcpv4, e.g., basically a test like if (v4 && peek_offset_cap) {...} everywhere, but then I made an interesting discovery. It turns out that, unless the ´-4' option is explicitly given on the command line, all sockets are v6, even those that are later used as v4 sockets.Not necessarily: if IPv6 support is disabled for other reasons, such as the fact that we didn't find any routable IPv6 address, sockets will also be IPv4-only. See tcp_sock_init()...because this is exactly the reason. We're using dual stack sockets to reduce the amount of kernel memory we consume with many forwards.So, the set_peek_off() call failed even for supposedly v4 sockets. I checked this by adding a printout to the tcp_listen_handler(), and noticed that all returns from the accept4() call goes into the AF_INET6 branch, even when the client (iperf3) call is using an IPv4 address. During traffic, the very same socket is marked as v4 in the tcp_tap_conn structure, and this seems to have worked just fine until I added the set_peek_offset call(). I believe this is an issue that has been introduced during the last months, since I didn't start using the ´-4' option consistently until some months ago, and then it worked.That's not actually an issue, it's intended, see commit 8e914238b6de ("tcp: Use dual stack sockets for port forwarding when possible").Could it be that you enabled IPv6 on your setup meanwhile, and that's why you see this now? I think I tested your changes on an isolated IPv4-only setup, as I didn't run a kernel version with SO_PEEK_OFF support at the time.-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sat, 20 Jul 2024 09:54:53 -0400 Jon Maloy <jmaloy(a)redhat.com> wrote:The recently added socket option SO_PEEK_OFF is not supported for TCP/IPv6 sockets. Until we get that support into the kernel we need to test for support in both protocols to set the global 'peek_offset_cap´ to true. Signed-off-by: Jon Maloy <jmaloy(a)redhat.com> --- tcp.c | 36 +++++++++++++++++++++++++----------- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/tcp.c b/tcp.c index c5431f1..32026ca 100644 --- a/tcp.c +++ b/tcp.c @@ -2717,6 +2717,28 @@ static void tcp_sock_refill_init(const struct ctx *c) } } +/** + * tcp_probe_peek_offset_cap() - Check if SO_PEEK_OFF is supported by kernel + * @af: Address family, IPv4 or IPv6 + * + * Return: true if supported, false otherwise + */ +bool tcp_probe_peek_offset_cap(int af) +{ + bool ret = false; + int s, optv = 0; + + s = socket(af, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); + if (s < 0) { + warn_perror("Temporary TCP socket creation failed"); + } else { + if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) + ret = true; + close(s); + } + return ret; +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2725,8 +2747,7 @@ static void tcp_sock_refill_init(const struct ctx *c) */ int tcp_init(struct ctx *c) { - unsigned int b, optv = 0; - int s; + unsigned int b; ASSERT(!c->no_tcp); @@ -2752,15 +2773,8 @@ int tcp_init(struct ctx *c) NS_CALL(tcp_ns_socks_init, c); } - /* Probe for SO_PEEK_OFF support */ - s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP); - if (s < 0) { - warn_perror("Temporary TCP socket creation failed"); - } else { - if (!setsockopt(s, SOL_SOCKET, SO_PEEK_OFF, &optv, sizeof(int))) - peek_offset_cap = true; - close(s); - } + peek_offset_cap = tcp_probe_peek_offset_cap(AF_INET) && + tcp_probe_peek_offset_cap(AF_INET6);I think we shouldn't probe for IPv4 SO_PEEK_OFF support if we're not interested in it, and the same applies for IPv6: those two checks should depend on whether c->ifi4 and c->ifi6 are set (following the same logic as tcp_sock_init()). In practice, since you just submitted the fix to have SO_PEEK_OFF support also for IPv6 sockets, it doesn't matter so much, but it might still be relevant for users who will stick to 6.9 and 6.10 kernel versions for a while, for whatever reason. Eventually, we might want to support IPv6 operation on IPv4-only hosts, maybe even with default options, making this even less relevant. But the day we get to it, it should be simpler to just replace all the checks of c->ifi4 / c->ifi6 used to represent IPv4 / IPv6 support, rather than replacing slightly different bits of logic. -- Stefano