It might not be feasible for users to start passt-repair after passt is started, on a migration target, but before the migration process starts. For instance, with libvirt, the guest domain (and, hence, passt) is started on the target as part of the migration process. At least for the moment being, there's no hook a libvirt user (including KubeVirt) can use to start passt-repair before the migration starts. Add a directory watch using inotify: if PATH is a directory, instead of connecting to it, we'll watch for a .repair socket file to appear in it, and then attempt to connect to that socket. Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- contrib/selinux/passt-repair.te | 16 +++---- passt-repair.1 | 6 ++- passt-repair.c | 82 ++++++++++++++++++++++++++++++--- 3 files changed, 88 insertions(+), 16 deletions(-) diff --git a/contrib/selinux/passt-repair.te b/contrib/selinux/passt-repair.te index f171be6..7157dfb 100644 --- a/contrib/selinux/passt-repair.te +++ b/contrib/selinux/passt-repair.te @@ -61,11 +61,11 @@ allow passt_repair_t unconfined_t:unix_stream_socket { connectto read write }; allow passt_repair_t passt_t:unix_stream_socket { connectto read write }; allow passt_repair_t user_tmp_t:unix_stream_socket { connectto read write }; -allow passt_repair_t user_tmp_t:dir search; +allow passt_repair_t user_tmp_t:dir { getattr read search watch }; -allow passt_repair_t unconfined_t:sock_file { read write }; -allow passt_repair_t passt_t:sock_file { read write }; -allow passt_repair_t user_tmp_t:sock_file { read write }; +allow passt_repair_t unconfined_t:sock_file { getattr read write }; +allow passt_repair_t passt_t:sock_file { getattr read write }; +allow passt_repair_t user_tmp_t:sock_file { getattr read write }; allow passt_repair_t unconfined_t:tcp_socket { read setopt write }; allow passt_repair_t passt_t:tcp_socket { read setopt write }; @@ -80,8 +80,8 @@ allow passt_repair_t passt_t:tcp_socket { read setopt write }; allow passt_repair_t qemu_var_run_t:unix_stream_socket { connectto read write }; allow passt_repair_t virt_var_run_t:unix_stream_socket { connectto read write }; -allow passt_repair_t qemu_var_run_t:dir search; -allow passt_repair_t virt_var_run_t:dir search; +allow passt_repair_t qemu_var_run_t:dir { getattr read search watch }; +allow passt_repair_t virt_var_run_t:dir { getattr read search watch }; -allow passt_repair_t qemu_var_run_t:sock_file { read write }; -allow passt_repair_t virt_var_run_t:sock_file { read write }; +allow passt_repair_t qemu_var_run_t:sock_file { getattr read write }; +allow passt_repair_t virt_var_run_t:sock_file { getattr read write }; diff --git a/passt-repair.1 b/passt-repair.1 index 7c1b140..e65aadd 100644 --- a/passt-repair.1 +++ b/passt-repair.1 @@ -16,13 +16,17 @@ .B passt-repair is a privileged helper setting and clearing repair mode on TCP sockets on behalf of \fBpasst\fR(1), as instructed via single-byte commands over a UNIX domain -socket, specified by \fIPATH\fR. +socket. It can be used to migrate TCP connections between guests without granting additional capabilities to \fBpasst\fR(1) itself: to migrate TCP connections, \fBpasst\fR(1) leverages repair mode, which needs the \fBCAP_NET_ADMIN\fR capability (see \fBcapabilities\fR(7)) to be set or cleared. +If \fIPATH\fR represents a UNIX domain socket, \fBpasst-repair\fR(1) attempts to +connect to it. If it is a directory, \fBpasst-repair\fR(1) waits until a file +ending with \fI.repair\fR appears in it, and then attempts to connect to it. + .SH PROTOCOL \fBpasst-repair\fR(1) connects to \fBpasst\fR(1) using the socket specified via diff --git a/passt-repair.c b/passt-repair.c index e0c366e..8bb3f00 100644 --- a/passt-repair.c +++ b/passt-repair.c @@ -16,11 +16,14 @@ * off. Reply by echoing the command. Exit on EOF. */ +#include <sys/inotify.h> #include <sys/prctl.h> #include <sys/types.h> #include <sys/socket.h> +#include <sys/stat.h> #include <sys/un.h> #include <errno.h> +#include <stdbool.h> #include <stddef.h> #include <stdio.h> #include <stdlib.h> @@ -39,6 +42,8 @@ #include "seccomp_repair.h" #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ +#define REPAIR_EXT ".repair" +#define REPAIR_EXT_LEN strlen(REPAIR_EXT) /** * main() - Entry point and whole program with loop @@ -51,6 +56,9 @@ * #syscalls:repair socket s390x:socketcall i686:socketcall * #syscalls:repair recvfrom recvmsg arm:recv ppc64le:recv * #syscalls:repair sendto sendmsg arm:send ppc64le:send + * #syscalls:repair stat|statx stat64|statx statx + * #syscalls:repair fstat|fstat64 newfstatat|fstatat64 + * #syscalls:repair inotify_init1 inotify_add_watch */ int main(int argc, char **argv) { @@ -58,12 +66,14 @@ int main(int argc, char **argv) __attribute__ ((aligned(__alignof__(struct cmsghdr)))); struct sockaddr_un a = { AF_UNIX, "" }; int fds[SCM_MAX_FD], s, ret, i, n = 0; + bool inotify_dir = false; struct sock_fprog prog; int8_t cmd = INT8_MAX; struct cmsghdr *cmsg; struct msghdr msg; struct iovec iov; size_t cmsg_len; + struct stat sb; int op; prctl(PR_SET_DUMPABLE, 0); @@ -90,19 +100,77 @@ int main(int argc, char **argv) _exit(2); } - ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]); + if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { + fprintf(stderr, "Failed to create AF_UNIX socket: %i\n", errno); + _exit(1); + } + + if ((stat(argv[1], &sb))) { + fprintf(stderr, "Can't stat() %s: %i\n", argv[1], errno); + _exit(1); + } + + if ((sb.st_mode & S_IFMT) == S_IFDIR) { + char buf[sizeof(struct inotify_event) + NAME_MAX + 1]; + const struct inotify_event *ev; + char path[PATH_MAX + 1]; + ssize_t n; + int fd; + + ev = (struct inotify_event *)buf; + + if ((fd = inotify_init1(IN_CLOEXEC)) < 0) { + fprintf(stderr, "inotify_init1: %i\n", errno); + _exit(1); + } + + if (inotify_add_watch(fd, argv[1], IN_CREATE) < 0) { + fprintf(stderr, "inotify_add_watch: %i\n", errno); + _exit(1); + } + + do { + n = read(fd, buf, sizeof(buf)); + if (n < 0) { + fprintf(stderr, "inotify read: %i", errno); + _exit(1); + } + + if (n < (ssize_t)sizeof(*ev)) { + fprintf(stderr, "Short inotify read: %zi", n); + _exit(1); + } + } while (ev->len < REPAIR_EXT_LEN || + memcmp(ev->name + strlen(ev->name) - REPAIR_EXT_LEN, + REPAIR_EXT, REPAIR_EXT_LEN)); + + snprintf(path, sizeof(path), "%s/%s", argv[1], ev->name); + if ((stat(path, &sb))) { + fprintf(stderr, "Can't stat() %s: %i\n", path, errno); + _exit(1); + } + + ret = snprintf(a.sun_path, sizeof(a.sun_path), path); + inotify_dir = true; + } else { + ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]); + } + if (ret <= 0 || ret >= (int)sizeof(a.sun_path)) { - fprintf(stderr, "Invalid socket path: %s\n", argv[1]); + fprintf(stderr, "Invalid socket path"); _exit(2); } - if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { - fprintf(stderr, "Failed to create AF_UNIX socket: %i\n", errno); - _exit(1); + if ((sb.st_mode & S_IFMT) != S_IFSOCK) { + fprintf(stderr, "%s is not a socket\n", a.sun_path); + _exit(2); } - if (connect(s, (struct sockaddr *)&a, sizeof(a))) { - fprintf(stderr, "Failed to connect to %s: %s\n", argv[1], + while (connect(s, (struct sockaddr *)&a, sizeof(a))) { + if (inotify_dir && errno == ECONNREFUSED) + continue; + + fprintf(stderr, "Failed to connect to %s: %s\n", a.sun_path, strerror(errno)); _exit(1); } -- 2.43.0
On Fri, Mar 07, 2025 at 11:41:20PM +0100, Stefano Brivio wrote:It might not be feasible for users to start passt-repair after passt is started, on a migration target, but before the migration process starts. For instance, with libvirt, the guest domain (and, hence, passt) is started on the target as part of the migration process. At least for the moment being, there's no hook a libvirt user (including KubeVirt) can use to start passt-repair before the migration starts. Add a directory watch using inotify: if PATH is a directory, instead of connecting to it, we'll watch for a .repair socket file to appear in it, and then attempt to connect to that socket.So, with this change, running passt-repair /tmp would be a Bad Idea. But that is the default path used by passt. To use this safely, you really want to have a directory set aside for the use of just one passt instance, or at least passt-owning uid. I feel like we should enforce, or at least document and encourage that somewhere. Not really sure where, though, so, with some misgivings Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- contrib/selinux/passt-repair.te | 16 +++---- passt-repair.1 | 6 ++- passt-repair.c | 82 ++++++++++++++++++++++++++++++--- 3 files changed, 88 insertions(+), 16 deletions(-) diff --git a/contrib/selinux/passt-repair.te b/contrib/selinux/passt-repair.te index f171be6..7157dfb 100644 --- a/contrib/selinux/passt-repair.te +++ b/contrib/selinux/passt-repair.te @@ -61,11 +61,11 @@ allow passt_repair_t unconfined_t:unix_stream_socket { connectto read write }; allow passt_repair_t passt_t:unix_stream_socket { connectto read write }; allow passt_repair_t user_tmp_t:unix_stream_socket { connectto read write }; -allow passt_repair_t user_tmp_t:dir search; +allow passt_repair_t user_tmp_t:dir { getattr read search watch }; -allow passt_repair_t unconfined_t:sock_file { read write }; -allow passt_repair_t passt_t:sock_file { read write }; -allow passt_repair_t user_tmp_t:sock_file { read write }; +allow passt_repair_t unconfined_t:sock_file { getattr read write }; +allow passt_repair_t passt_t:sock_file { getattr read write }; +allow passt_repair_t user_tmp_t:sock_file { getattr read write }; allow passt_repair_t unconfined_t:tcp_socket { read setopt write }; allow passt_repair_t passt_t:tcp_socket { read setopt write }; @@ -80,8 +80,8 @@ allow passt_repair_t passt_t:tcp_socket { read setopt write }; allow passt_repair_t qemu_var_run_t:unix_stream_socket { connectto read write }; allow passt_repair_t virt_var_run_t:unix_stream_socket { connectto read write }; -allow passt_repair_t qemu_var_run_t:dir search; -allow passt_repair_t virt_var_run_t:dir search; +allow passt_repair_t qemu_var_run_t:dir { getattr read search watch }; +allow passt_repair_t virt_var_run_t:dir { getattr read search watch }; -allow passt_repair_t qemu_var_run_t:sock_file { read write }; -allow passt_repair_t virt_var_run_t:sock_file { read write }; +allow passt_repair_t qemu_var_run_t:sock_file { getattr read write }; +allow passt_repair_t virt_var_run_t:sock_file { getattr read write }; diff --git a/passt-repair.1 b/passt-repair.1 index 7c1b140..e65aadd 100644 --- a/passt-repair.1 +++ b/passt-repair.1 @@ -16,13 +16,17 @@ .B passt-repair is a privileged helper setting and clearing repair mode on TCP sockets on behalf of \fBpasst\fR(1), as instructed via single-byte commands over a UNIX domain -socket, specified by \fIPATH\fR. +socket. It can be used to migrate TCP connections between guests without granting additional capabilities to \fBpasst\fR(1) itself: to migrate TCP connections, \fBpasst\fR(1) leverages repair mode, which needs the \fBCAP_NET_ADMIN\fR capability (see \fBcapabilities\fR(7)) to be set or cleared. +If \fIPATH\fR represents a UNIX domain socket, \fBpasst-repair\fR(1) attempts to +connect to it. If it is a directory, \fBpasst-repair\fR(1) waits until a file +ending with \fI.repair\fR appears in it, and then attempts to connect to it. + .SH PROTOCOL \fBpasst-repair\fR(1) connects to \fBpasst\fR(1) using the socket specified via diff --git a/passt-repair.c b/passt-repair.c index e0c366e..8bb3f00 100644 --- a/passt-repair.c +++ b/passt-repair.c @@ -16,11 +16,14 @@ * off. Reply by echoing the command. Exit on EOF. */ +#include <sys/inotify.h> #include <sys/prctl.h> #include <sys/types.h> #include <sys/socket.h> +#include <sys/stat.h> #include <sys/un.h> #include <errno.h> +#include <stdbool.h> #include <stddef.h> #include <stdio.h> #include <stdlib.h> @@ -39,6 +42,8 @@ #include "seccomp_repair.h" #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ +#define REPAIR_EXT ".repair" +#define REPAIR_EXT_LEN strlen(REPAIR_EXT) /** * main() - Entry point and whole program with loop @@ -51,6 +56,9 @@ * #syscalls:repair socket s390x:socketcall i686:socketcall * #syscalls:repair recvfrom recvmsg arm:recv ppc64le:recv * #syscalls:repair sendto sendmsg arm:send ppc64le:send + * #syscalls:repair stat|statx stat64|statx statx + * #syscalls:repair fstat|fstat64 newfstatat|fstatat64 + * #syscalls:repair inotify_init1 inotify_add_watch */ int main(int argc, char **argv) { @@ -58,12 +66,14 @@ int main(int argc, char **argv) __attribute__ ((aligned(__alignof__(struct cmsghdr)))); struct sockaddr_un a = { AF_UNIX, "" }; int fds[SCM_MAX_FD], s, ret, i, n = 0; + bool inotify_dir = false; struct sock_fprog prog; int8_t cmd = INT8_MAX; struct cmsghdr *cmsg; struct msghdr msg; struct iovec iov; size_t cmsg_len; + struct stat sb; int op; prctl(PR_SET_DUMPABLE, 0); @@ -90,19 +100,77 @@ int main(int argc, char **argv) _exit(2); } - ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]); + if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { + fprintf(stderr, "Failed to create AF_UNIX socket: %i\n", errno); + _exit(1); + } + + if ((stat(argv[1], &sb))) { + fprintf(stderr, "Can't stat() %s: %i\n", argv[1], errno); + _exit(1); + } + + if ((sb.st_mode & S_IFMT) == S_IFDIR) { + char buf[sizeof(struct inotify_event) + NAME_MAX + 1]; + const struct inotify_event *ev; + char path[PATH_MAX + 1]; + ssize_t n; + int fd; + + ev = (struct inotify_event *)buf; + + if ((fd = inotify_init1(IN_CLOEXEC)) < 0) { + fprintf(stderr, "inotify_init1: %i\n", errno); + _exit(1); + } + + if (inotify_add_watch(fd, argv[1], IN_CREATE) < 0) { + fprintf(stderr, "inotify_add_watch: %i\n", errno); + _exit(1); + } + + do { + n = read(fd, buf, sizeof(buf)); + if (n < 0) { + fprintf(stderr, "inotify read: %i", errno); + _exit(1); + } + + if (n < (ssize_t)sizeof(*ev)) { + fprintf(stderr, "Short inotify read: %zi", n); + _exit(1); + } + } while (ev->len < REPAIR_EXT_LEN || + memcmp(ev->name + strlen(ev->name) - REPAIR_EXT_LEN, + REPAIR_EXT, REPAIR_EXT_LEN)); + + snprintf(path, sizeof(path), "%s/%s", argv[1], ev->name); + if ((stat(path, &sb))) { + fprintf(stderr, "Can't stat() %s: %i\n", path, errno); + _exit(1); + } + + ret = snprintf(a.sun_path, sizeof(a.sun_path), path); + inotify_dir = true; + } else { + ret = snprintf(a.sun_path, sizeof(a.sun_path), "%s", argv[1]); + } + if (ret <= 0 || ret >= (int)sizeof(a.sun_path)) { - fprintf(stderr, "Invalid socket path: %s\n", argv[1]); + fprintf(stderr, "Invalid socket path"); _exit(2); } - if ((s = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { - fprintf(stderr, "Failed to create AF_UNIX socket: %i\n", errno); - _exit(1); + if ((sb.st_mode & S_IFMT) != S_IFSOCK) { + fprintf(stderr, "%s is not a socket\n", a.sun_path); + _exit(2); } - if (connect(s, (struct sockaddr *)&a, sizeof(a))) { - fprintf(stderr, "Failed to connect to %s: %s\n", argv[1], + while (connect(s, (struct sockaddr *)&a, sizeof(a))) { + if (inotify_dir && errno == ECONNREFUSED) + continue; + + fprintf(stderr, "Failed to connect to %s: %s\n", a.sun_path, strerror(errno)); _exit(1); }-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Tue, 11 Mar 2025 12:35:46 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Fri, Mar 07, 2025 at 11:41:20PM +0100, Stefano Brivio wrote:...why? On any distribution where it's available, you can make it connect to whatever you want, and it will do nothing else than returning an error when passt tries to switch a socket to repair mode. It will just work in the KubeVirt use case we planned for, for the moment. Then sure, you can give it capabilities or run it as root, disable LSMs, and make it connect to whatever process. But you need root anyway, so there isn't much to be gained.It might not be feasible for users to start passt-repair after passt is started, on a migration target, but before the migration process starts. For instance, with libvirt, the guest domain (and, hence, passt) is started on the target as part of the migration process. At least for the moment being, there's no hook a libvirt user (including KubeVirt) can use to start passt-repair before the migration starts. Add a directory watch using inotify: if PATH is a directory, instead of connecting to it, we'll watch for a .repair socket file to appear in it, and then attempt to connect to that socket.So, with this change, running passt-repair /tmp would be a Bad Idea.But that is the default path used by passt. To use this safely, you really want to have a directory set aside for the use of just one passt instance, or at least passt-owning uid.Right, that's what happens if libvirt starts it.I feel like we should enforce, or at least document and encourage that somewhere. Not really sure where, though, so, with some misgivingsI think we'll find a more reasonable solution by the time this becomes actually usable by mere mortals using distribution packages. I would anyway drop all this once we figure out how to make it convenient for libvirt. For stand-alone usage, this is not really needed. -- Stefano
On Tue, Mar 11, 2025 at 11:44:57PM +0100, Stefano Brivio wrote:On Tue, 11 Mar 2025 12:35:46 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:I mean that if you are able to run it privileged, then it would be a bad idea to do so with /tmp as the watch directory. Since that's kind of the default place to point it, it's a footgun.On Fri, Mar 07, 2025 at 11:41:20PM +0100, Stefano Brivio wrote:...why? On any distribution where it's available, you can make it connect to whatever you want, and it will do nothing else than returning an error when passt tries to switch a socket to repair mode.It might not be feasible for users to start passt-repair after passt is started, on a migration target, but before the migration process starts. For instance, with libvirt, the guest domain (and, hence, passt) is started on the target as part of the migration process. At least for the moment being, there's no hook a libvirt user (including KubeVirt) can use to start passt-repair before the migration starts. Add a directory watch using inotify: if PATH is a directory, instead of connecting to it, we'll watch for a .repair socket file to appear in it, and then attempt to connect to that socket.So, with this change, running passt-repair /tmp would be a Bad Idea.It will just work in the KubeVirt use case we planned for, for the moment. Then sure, you can give it capabilities or run it as root, disable LSMs, and make it connect to whatever process. But you need root anyway, so there isn't much to be gained.Hm. I guess we'll see. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibsonBut that is the default path used by passt. To use this safely, you really want to have a directory set aside for the use of just one passt instance, or at least passt-owning uid.Right, that's what happens if libvirt starts it.I feel like we should enforce, or at least document and encourage that somewhere. Not really sure where, though, so, with some misgivingsI think we'll find a more reasonable solution by the time this becomes actually usable by mere mortals using distribution packages. I would anyway drop all this once we figure out how to make it convenient for libvirt. For stand-alone usage, this is not really needed.