On Wed, Jan 29, 2025 at 08:04:28AM +0100, Stefano Brivio wrote:
On Wed, 29 Jan 2025 12:29:27 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:
On Tue, Jan 28, 2025 at 07:51:31AM +0100, Stefano
Brivio wrote:
On Tue, 28 Jan 2025 12:51:59 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:
On Tue, Jan 28, 2025 at 12:15:32AM +0100, Stefano
Brivio wrote:
> A privileged helper to set/clear TCP_REPAIR on sockets on behalf of
> passt. Not used yet.
>
> Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
> ---
> Makefile | 10 +++--
> passt-repair.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 118 insertions(+), 3 deletions(-)
> create mode 100644 passt-repair.c
>
> diff --git a/Makefile b/Makefile
> index 1383875..1b71cb0 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -42,7 +42,8 @@ PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c
fwd.c \
> tcp.c tcp_buf.c tcp_splice.c tcp_vu.c udp.c udp_flow.c udp_vu.c util.c \
> vhost_user.c virtio.c vu_common.c
> QRAP_SRCS = qrap.c
> -SRCS = $(PASST_SRCS) $(QRAP_SRCS)
> +PASST_REPAIR_SRCS = passt-repair.c
> +SRCS = $(PASST_SRCS) $(QRAP_SRCS) $(PASST_REPAIR_SRCS)
>
> MANPAGES = passt.1 pasta.1 qrap.1
>
> @@ -72,9 +73,9 @@ mandir ?= $(datarootdir)/man
> man1dir ?= $(mandir)/man1
>
> ifeq ($(TARGET_ARCH),x86_64)
> -BIN := passt passt.avx2 pasta pasta.avx2 qrap
> +BIN := passt passt.avx2 pasta pasta.avx2 qrap passt-repair
> else
> -BIN := passt pasta qrap
> +BIN := passt pasta qrap passt-repair
> endif
>
> all: $(BIN) $(MANPAGES) docs
> @@ -101,6 +102,9 @@ pasta.avx2 pasta.1 pasta: pasta%: passt%
> qrap: $(QRAP_SRCS) passt.h
> $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) -DARCH=\"$(TARGET_ARCH)\"
$(QRAP_SRCS) -o qrap $(LDFLAGS)
>
> +passt-repair: $(PASST_REPAIR_SRCS)
> + $(CC) $(FLAGS) $(CFLAGS) $(CPPFLAGS) $(PASST_REPAIR_SRCS) -o passt-repair
$(LDFLAGS)
> +
> valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \
> rt_sigreturn getpid gettid kill clock_gettime mmap \
> mmap2 munmap open unlink gettimeofday futex statx \
> diff --git a/passt-repair.c b/passt-repair.c
> new file mode 100644
> index 0000000..e9b9609
> --- /dev/null
> +++ b/passt-repair.c
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +/* PASST - Plug A Simple Socket Transport
> + * for qemu/UNIX domain socket mode
> + *
> + * passt-repair.c - Privileged helper to set/clear TCP_REPAIR on sockets
> + *
> + * Copyright (c) 2025 Red Hat GmbH
> + * Author: Stefano Brivio <sbrivio(a)redhat.com>
> + *
> + * Connect to passt via UNIX domain socket, receive sockets via SCM_RIGHTS along
> + * with commands mapping to TCP_REPAIR values, and switch repair mode on or
> + * off. Reply by echoing the command. Exit if the command is INT_MAX.
> + */
> +
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +#include <errno.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <limits.h>
> +#include <unistd.h>
> +#include <netdb.h>
> +
> +#include <netinet/tcp.h>
> +
> +#define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */
> +
> +int main(int argc, char **argv)
> +{
> + char buf[CMSG_SPACE(sizeof(int) * SCM_MAX_FD)]
> + __attribute__ ((aligned(__alignof__(struct cmsghdr))));
> + struct sockaddr_un a = { AF_UNIX, "" };
> + int cmd, fds[SCM_MAX_FD], s, ret, i;
> + struct cmsghdr *cmsg;
> + struct msghdr msg;
> + struct iovec iov;
> +
> + iov = (struct iovec){ &cmd, sizeof(cmd) };
I mean, local to local, it's *probably* fine, but still a network
protocol not defined in terms of explicit width fields makes me
nervous. I'd prefer to see the cmd being a packed structure with
fixed width elements.
It actually is, because:
struct {
int cmd;
};
is a packet structure with fixed width elements. Any architecture we
build for (at least the ones I'm aware of) has a 32-bit int. We can
make it uint32_t if it makes you feel better.
Sorry, I should have said "*explicitly* fixed width fields". So, yes,
uint32_t would make me feel better :)
Changed to int8_t anyway meanwhile. We don't need all those bits.
Works or me.
I also think
we should do some sort of basic magic / version exchange.
I don't see any reason we'd need to extend the protocol, but I'd
rather have the option if we have to.
passt-repair will be packaged and distributed together with passt,
though. Versions must match.
But nothing enforces that.
Distribution packages. If I run claws-mail with the wrong version of,
say, libpixman, it won't start. If you don't use them, you're on your
own.
But shared libraries *do* have versioning checks: there are defined
compatibility semantics for sonames, and there can be symbol versions
as well.
AIUI KubeVirt
will be running passt-repair
in a different context. Which means it may well be deployed by a
different path than the passt binary
No, that's not the way it works. It needs to match, in the sense that
1. it's a KubeVirt requirement to have compatible packages between
distribution and the "base container image" and 2. this would most
likely be sourced from the "base container image" anyway.
I maintain the packages for four distributions, plus AppArmor and
SELinux policies upstream and downstream, and I take care of updating
the package in KubeVirt as well, so I guess I have a vague idea of
what's convenient, enforced, burdensome, and so on.
which means however we
distribute it's quite plausible that a downstream screwup could
mismatch the versions. We should endeavour to have a reasonably
graceful failure mode for that.
Regardless of this, I think that *this one* is an interface (I wouldn't
even call it a protocol) that needs to be set in stone, except for
hypothetical (and highly unlikely) UAPI additions which we'll be anyway
able to accommodate for easily.
Ok, I can buy that, but it's a contradictory position to "versions
must match".
It's a single socket option with three possible
values (for 13 years
now), of which we plan to use two. If we want this interface to do
anything else, it should be another interface.
So there's really no problem with this.
Besides, the helper runs with CAP_NET_ADMIN (even though CAP_NET_RAW
should ideally suffice), so it needs to be extremely simple and
auditable.
Sending and checking a magic number is not a lot of complexity, even
in something on this scale.
And
latency here might matter more than in
the rest of the migration process.
Plus
checking a magic number
should make things less damaging and more debuggable if you were to
point the repair helper at an entirely unrelated unix socket instead
of passt's repair socket.
Maybe, yes, even though I don't really see good chances for that
mistake to happen. Feel free to post a proposal, of course.
I disagree on the good chances for a mistake thing. In GSS I saw
plenty of occasions where things that shouldn't be mismatched were due
to some packaging or user screwup. And that's before even considering
the way that KubeVirt deploys its various pieces seems to provide a
number of opportunities to mess this up.
So, I'll see what I can come up with. I'm fine with requiring
matching versions if it's actually checked. Maybe a magic derived
from our git hash, or even our build-id.
Both would make things significantly less usable, because they would
make different but compatible builds incompatible, and different
implementations rather inconvenient.
Ok, so you're definitely now saying versions *don't* need to match.
For example, it might be a practical solution to have
a Go
implementation of this in KubeVirt's virt-handler itself, but if it
needs to extract information or strings from the binary, that becomes
impractical.
Ok... could we at least add just a magic number then. If we do ever
need a new protocol we can change it, otherwise the protocol immutable
for now.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson