Downstream testing recently discovered that inbound connections can't be migrated properly, because the new socket gets an address conflict with its corresponding listening socket. It turns out this can be avoided by delaying bind() until after we're already in repair mode. Patch 4/4 is the actual fix here. Patch 3/4 adds a test program checking the behaviour to doc/platform-requirements. Patches 1 & 2 fix minor problems I spotted in doc/platform-requirements writing 3/4. Only 4/4 will need to be backported. David Gibson (4): platform requirements: Fix clang-tidy warning platform requirements: Add attributes to die() function platform requirements: Add test for address conflicts with TCP_REPAIR migrate, tcp: bind() migrated sockets in repair mode doc/platform-requirements/.gitignore | 1 + doc/platform-requirements/Makefile | 4 +- doc/platform-requirements/common.h | 1 + doc/platform-requirements/listen-vs-repair.c | 128 ++++++++++++++++++ .../reuseaddr-priority.c | 6 +- tcp.c | 38 ++++-- 6 files changed, 162 insertions(+), 16 deletions(-) create mode 100644 doc/platform-requirements/listen-vs-repair.c -- 2.49.0
Recent clang-tidy versions complain about enums defined with some but not all entries given explicit values. I'm not entirely convinced about whether that's a useful warning, but in any case we really don't need the explicit values in doc/platform-requirements/reuseaddr-priority.c, so remove them to make clang happy. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- doc/platform-requirements/reuseaddr-priority.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/platform-requirements/reuseaddr-priority.c b/doc/platform-requirements/reuseaddr-priority.c index 701b6fff..af39a39c 100644 --- a/doc/platform-requirements/reuseaddr-priority.c +++ b/doc/platform-requirements/reuseaddr-priority.c @@ -46,13 +46,13 @@ /* Different cases for receiving socket configuration */ enum sock_type { /* Socket is bound to 0.0.0.0:DSTPORT and not connected */ - SOCK_BOUND_ANY = 0, + SOCK_BOUND_ANY, /* Socket is bound to 127.0.0.1:DSTPORT and not connected */ - SOCK_BOUND_LO = 1, + SOCK_BOUND_LO, /* Socket is bound to 0.0.0.0:DSTPORT and connected to 127.0.0.1:SRCPORT */ - SOCK_CONNECTED = 2, + SOCK_CONNECTED, NUM_SOCK_TYPES, }; -- 2.49.0
Add both format string and ((noreturn)) attributes to the version of die() used in the test programs in doc/platform-requirements. As well as potentially catching problems in format strings, this means that the compiler and static checkers can properly reason about the fact that it will exit, preventing bogus warnings. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- doc/platform-requirements/common.h | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/platform-requirements/common.h b/doc/platform-requirements/common.h index 8844b1ed..e85fc2b5 100644 --- a/doc/platform-requirements/common.h +++ b/doc/platform-requirements/common.h @@ -15,6 +15,7 @@ #include <stdio.h> #include <stdlib.h> +__attribute__((format(printf, 1, 2), noreturn)) static inline void die(const char *fmt, ...) { va_list ap; -- 2.49.0
Simple test program to check the behaviour we need for bind() address conflicts between listening sockets and repair mode sockets. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- doc/platform-requirements/.gitignore | 1 + doc/platform-requirements/Makefile | 4 +- doc/platform-requirements/listen-vs-repair.c | 128 +++++++++++++++++++ 3 files changed, 131 insertions(+), 2 deletions(-) create mode 100644 doc/platform-requirements/listen-vs-repair.c diff --git a/doc/platform-requirements/.gitignore b/doc/platform-requirements/.gitignore index 3b5a10ac..f6272cf0 100644 --- a/doc/platform-requirements/.gitignore +++ b/doc/platform-requirements/.gitignore @@ -1,3 +1,4 @@ +/listen-vs-repair /reuseaddr-priority /recv-zero /udp-close-dup diff --git a/doc/platform-requirements/Makefile b/doc/platform-requirements/Makefile index 6a7d3748..83930ef8 100644 --- a/doc/platform-requirements/Makefile +++ b/doc/platform-requirements/Makefile @@ -3,8 +3,8 @@ # Copyright Red Hat # Author: David Gibson <david(a)gibson.dropbear.id.au> -TARGETS = reuseaddr-priority recv-zero udp-close-dup -SRCS = reuseaddr-priority.c recv-zero.c udp-close-dup.c +TARGETS = reuseaddr-priority recv-zero udp-close-dup listen-vs-repair +SRCS = reuseaddr-priority.c recv-zero.c udp-close-dup.c listen-vs-repair.c CFLAGS = -Wall all: cppcheck clang-tidy $(TARGETS:%=check-%) diff --git a/doc/platform-requirements/listen-vs-repair.c b/doc/platform-requirements/listen-vs-repair.c new file mode 100644 index 00000000..d31fe3f7 --- /dev/null +++ b/doc/platform-requirements/listen-vs-repair.c @@ -0,0 +1,128 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* liste-vs-repair.c + * + * Do listening sockets have address conflicts with sockets under repair + * ==================================================================== + * + * When we accept() an incoming connection the accept()ed socket will have the + * same local address as the listening socket. This can be a complication on + * migration. On the migration target we've already set up listening sockets + * according to the command line. However to restore connections that we're + * migrating in we need to bind the new sockets to the same address, which would + * be an address conflict on the face of it. This test program verifies that + * enabling repair mode before bind() correctly suppresses that conflict. + * + * Copyright Red Hat + * Author: David Gibson <david(a)gibson.dropbear.id.au> + */ + +/* NOLINTNEXTLINE(bugprone-reserved-identifier,cert-dcl37-c,cert-dcl51-cpp) */ +#define _GNU_SOURCE + +#include <arpa/inet.h> +#include <errno.h> +#include <linux/netlink.h> +#include <linux/rtnetlink.h> +#include <net/if.h> +#include <netinet/in.h> +#include <netinet/tcp.h> +#include <sched.h> +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include "common.h" + +#define PORT 13256U +#define CPORT 13257U + +/* 127.0.0.1:PORT */ +static const struct sockaddr_in addr = SOCKADDR_INIT(INADDR_LOOPBACK, PORT); + +/* 127.0.0.1:CPORT */ +static const struct sockaddr_in caddr = SOCKADDR_INIT(INADDR_LOOPBACK, CPORT); + +/* Put ourselves into a network sandbox */ +static void net_sandbox(void) +{ + /* NOLINTNEXTLINE(altera-struct-pack-align) */ + const struct req_t { + struct nlmsghdr nlh; + struct ifinfomsg ifm; + } __attribute__((packed)) req = { + .nlh.nlmsg_type = RTM_NEWLINK, + .nlh.nlmsg_flags = NLM_F_REQUEST, + .nlh.nlmsg_len = sizeof(req), + .nlh.nlmsg_seq = 1, + .ifm.ifi_family = AF_UNSPEC, + .ifm.ifi_index = 1, + .ifm.ifi_flags = IFF_UP, + .ifm.ifi_change = IFF_UP, + }; + int nl; + + if (unshare(CLONE_NEWUSER | CLONE_NEWNET)) + die("unshare(): %s\n", strerror(errno)); + + /* Bring up lo in the new netns */ + nl = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE); + if (nl < 0) + die("Can't create netlink socket: %s\n", strerror(errno)); + + if (send(nl, &req, sizeof(req), 0) < 0) + die("Netlink send(): %s\n", strerror(errno)); + close(nl); +} + +static void check(void) +{ + int s1, s2, op; + + s1 = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (s1 < 0) + die("socket() 1: %s\n", strerror(errno)); + + if (bind(s1, (struct sockaddr *)&addr, sizeof(addr))) + die("bind() 1: %s\n", strerror(errno)); + + if (listen(s1, 0)) + die("listen(): %s\n", strerror(errno)); + + s2 = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (s2 < 0) + die("socket() 2: %s\n", strerror(errno)); + + op = TCP_REPAIR_ON; + if (setsockopt(s2, SOL_TCP, TCP_REPAIR, &op, sizeof(op))) + die("TCP_REPAIR: %s\n", strerror(errno)); + + if (bind(s2, (struct sockaddr *)&addr, sizeof(addr))) + die("bind() 2: %s\n", strerror(errno)); + + if (connect(s2, (struct sockaddr *)&caddr, sizeof(caddr))) + die("connect(): %s\n", strerror(errno)); + + op = TCP_REPAIR_OFF_NO_WP; + if (setsockopt(s2, SOL_TCP, TCP_REPAIR, &op, sizeof(op))) + die("TCP_REPAIR: %s\n", strerror(errno)); + + close(s1); + close(s2); +} + +int main(int argc, char *argv[]) +{ + (void)argc; + (void)argv; + + net_sandbox(); + + check(); + + printf("Repair mode appears to properly suppress conflicts with listening sockets\n"); + + exit(0); +} -- 2.49.0
Currently on a migration target, we create then immediately bind() new sockets for the TCP connections we're reconstructing. Mostly, this works, since a socket() that is bound but hasn't had listen() or connect() called is essentially passive. However, this bind() is subject to the usual address conflict checking. In particular that means if we already have a listening socket on that port, we'll get an EADDRINUSE. This will happen for every connection we try to migrate that was initiated from outside to the guest, since we necessarily created a listening socket for that case. We set SO_REUSEADDR on the socket in an attempt to avoid this, but that's not sufficient; even with SO_REUSEADDR address conflicts are still prohibited for listening sockets. Of course once these incoming sockets are fully repaired and connect()ed they'll no longer conflict, but that doesn't help us if we fail at the bind(). We can avoid this by not calling bind() until we're already in repair mode which suppresses this transient conflict. Because of the batching of setting repair mode, to do that we need to move the bind to a step in tcp_flow_migrate_target_ext(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 38 +++++++++++++++++++++++++++----------- 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/tcp.c b/tcp.c index fa1d8857..35626c91 100644 --- a/tcp.c +++ b/tcp.c @@ -3414,13 +3414,8 @@ fail: static int tcp_flow_repair_socket(struct ctx *c, struct tcp_tap_conn *conn) { sa_family_t af = CONN_V4(conn) ? AF_INET : AF_INET6; - const struct flowside *sockside = HOSTFLOW(conn); - union sockaddr_inany a; - socklen_t sl; int s, rc; - pif_sockaddr(c, &a, &sl, PIF_HOST, &sockside->oaddr, sockside->oport); - if ((conn->sock = socket(af, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, IPPROTO_TCP)) < 0) { rc = -errno; @@ -3435,12 +3430,6 @@ static int tcp_flow_repair_socket(struct ctx *c, struct tcp_tap_conn *conn) tcp_sock_set_nodelay(s); - if (bind(s, &a.sa, sizeof(a))) { - rc = -errno; - flow_perror(conn, "Failed to bind socket for migrated flow"); - goto err; - } - if ((rc = tcp_flow_repair_on(c, conn))) goto err; @@ -3452,6 +3441,30 @@ err: return rc; } +/** + * tcp_flow_repair_bind() - Bind socket in repair mode + * @c: Execution context + * @conn: Pointer to the TCP connection structure + * + * Return: 0 on success, negative error code on failure + */ +static int tcp_flow_repair_bind(const struct ctx *c, struct tcp_tap_conn *conn) +{ + const struct flowside *sockside = HOSTFLOW(conn); + union sockaddr_inany a; + socklen_t sl; + + pif_sockaddr(c, &a, &sl, PIF_HOST, &sockside->oaddr, sockside->oport); + + if (bind(conn->sock, &a.sa, sizeof(a))) { + int rc = -errno; + flow_perror(conn, "Failed to bind socket for migrated flow"); + return rc; + } + + return 0; +} + /** * tcp_flow_repair_connect() - Connect socket in repair mode, then turn it off * @c: Execution context @@ -3618,6 +3631,9 @@ int tcp_flow_migrate_target_ext(struct ctx *c, struct tcp_tap_conn *conn, int fd /* We weren't able to create the socket, discard flow */ goto fail; + if (tcp_flow_repair_bind(c, conn)) + goto fail; + if (tcp_flow_repair_timestamp(conn, &t)) goto fail; -- 2.49.0
On Wed, 2 Apr 2025 14:13:15 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Downstream testing recently discovered that inbound connections can't be migrated properly, because the new socket gets an address conflict with its corresponding listening socket. It turns out this can be avoided by delaying bind() until after we're already in repair mode. Patch 4/4 is the actual fix here. Patch 3/4 adds a test program checking the behaviour to doc/platform-requirements. Patches 1 & 2 fix minor problems I spotted in doc/platform-requirements writing 3/4. Only 4/4 will need to be backported. David Gibson (4): platform requirements: Fix clang-tidy warning platform requirements: Add attributes to die() function platform requirements: Add test for address conflicts with TCP_REPAIR migrate, tcp: bind() migrated sockets in repair modeApplied. -- Stefano