On Fri, Jan 31, 2025 at 08:39:41PM +0100, Stefano Brivio wrote:A privileged helper to set/clear TCP_REPAIR on sockets on behalf of passt. Not used yet.I don't think a trivial change like the .gitignore really needs to be commented and credited.From David's patch: add it to .gitignore, like our other executabletargets. Co-authored-by: David Gibson <david(a)gibson.dropbear.id.au>Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- .gitignore | 1 + Makefile | 10 +++-- passt-repair.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 125 insertions(+), 3 deletions(-) create mode 100644 passt-repair.c diff --git a/.gitignore b/.gitignore index d1c8be9..5824a71 100644 --- a/.gitignore +++ b/.gitignore @@ -3,6 +3,7 @@ /passt.avx2 /pasta /pasta.avx2 +/passt-repair /qrap /pasta.1 /seccomp.h diff --git a/Makefile b/Makefile index 1383875..1b71cb0 100644 --- a/Makefile +++ b/Makefile @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \ tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \ vhost_user.c virtio.c vu_common.c QRAP_SRCS = qrap.c -SRCS = $(PASST_SRCS) $(QRAP_SRCS) +PASST_REPAIR_SRCS = passt-repair.c +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS) MANPAGES = passt.1 pasta.1 qrap.1 @@ -72,9 +73,9 @@ mandir ?= $(datarootdir)/man man1dir ?= $(mandir)/man1 ifeq ($(TARGET_ARCH),x86_64) -BIN := passt passt.avx2 pasta pasta.avx2 qrap +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair else -BIN := passt pasta qrap +BIN := passt pasta qrap passt-repair endif all: $(BIN) $(MANPAGES) docs @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt% qrap: $(QRAP_SRCS) passt.h $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\" $(QRAP_SRCS) -o qrap $(LDFLAGS) +passt-repair: $(PASST_REPAIR_SRCS) + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair $(LDFLAGS) + valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ rt_sigreturn getpid gettid kill clock_gettime mmap \ mmap2 munmap open unlink gettimeofday futex statx \ diff --git a/passt-repair.c b/passt-repair.c new file mode 100644 index 0000000..988a52c --- /dev/null +++ b/passt-repair.c @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* PASST - Plug A Simple Socket Transport + * for qemu/UNIX domain socket mode + * + * PASTA - Pack A Subtle Tap Abstraction + * for network namespace/tap device mode + * + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets + * + * Copyright (c) 2025 Red Hat GmbH + * Author: Stefano Brivio <sbrivio(a)redhat.com> + * + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along + * with byte commands mapping to TCP_REPAIR values, and switch repair mode on or + * off. Reply by echoing the command. Exit on EOF. + */ + +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/un.h> +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <limits.h> +#include <unistd.h> +#include <netdb.h> + +#include <netinet/tcp.h> + +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ + +int main(int argc, char **argv) +{ + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)] + __attribute__ ((aligned(__alignof__(struct cmsghdr)))); + struct sockaddr_un a = { AF_UNIX, "" }; + int fds[SCM_MAX_FD], s, ret, i, n; + int8_t cmd = INT8_MAX; + struct cmsghdr *cmsg; + struct msghdr msg; + struct iovec iov; + + iov = (struct iovec){ &cmd, sizeof(cmd) }; + msg = (struct msghdr){ NULL, 0, &iov, 1, buf, sizeof(buf), 0 }; + cmsg = CMSG_FIRSTHDR(&msg); + + if (argc != 2) { + fprintf(stderr, "Usage: %s PATH\n", argv[0]); + return -1; + } + + ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]); + if (ret <= 0 || ret >= (int)sizeof(a.sun_path)) { + fprintf(stderr, "Invalid socket path: %s\n", argv[1]); + return -1; + } + + if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { + perror("Failed to create AF_UNIX socket"); + return -1; + } + + if (connect(s, (struct sockaddr *)&a, sizeof(a))) { + fprintf(stderr, "Failed to connect to %s: %s\n", argv[1], + strerror(errno)); + return -1; + } + +loop: + ret = recvmsg(s, &msg, 0); + if (ret < 0) { + perror("Failed to receive message"); + return -1; + } + + if (!ret) /* Done */ + return 0; + + if (!cmsg || + cmsg->cmsg_len < CMSG_LEN(sizeof(int)) || + cmsg->cmsg_len > CMSG_LEN(sizeof(int) * SCM_MAX_FD) || + cmsg->cmsg_type != SCM_RIGHTS) + return -1; + + n = cmsg->cmsg_len / CMSG_LEN(sizeof(int)); + memcpy(fds, CMSG_DATA(cmsg), sizeof(int) * n); + + if (cmd != TCP_REPAIR_ON && cmd != TCP_REPAIR_OFF && + cmd != TCP_REPAIR_OFF_NO_WP) { + fprintf(stderr, "Unsupported command 0x%04x\n", cmd); + return -1; + } + + for (i = 0; i < n; i++) { + int o = cmd; + + if (setsockopt(fds[i], SOL_TCP, TCP_REPAIR, &o, sizeof(o))) { + fprintf(stderr, + "Setting TCP_REPAIR to %i on socket %i: %s", o, + fds[i], strerror(errno)); + return -1; + }So, I was thinking about this: I think we need to close() the fd, after calling TCP_REPAIR. If we don't, that's essentially an extra reference to the underlying kernel file object. That means: * When we close() the fd in passt, the socket won't actually go away. I think this is probably the cause of the in use ports you encountered. The current approach of exiting after the migrate is causing passt-repair to also exit, freeing up the additional references. * For incoming migrations, there's: when a migrated connection comes to a proper close on the target, the socket will be held open by the extra fd in the target side passt-repair. * At the moment, I don't think we expect more than two migrations for a single passt-repair instance (one in, and one out). But, particularly for the case of multiple failed migration attempts, I don't think we want to count on that. We're essentially leaking fd slots here, so passt-repair could run out of fds. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson