On Wed, Oct 09, 2024 at 10:44:33PM +0200, Stefano Brivio wrote:
On Wed, 9 Oct 2024 15:07:21 +0200 Stefano Brivio <sbrivio(a)redhat.com> wrote:
> On Wed, 2 Oct 2024 15:48:26 +1000
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
>
> > In pasta mode, where addressing permits we "splice" connections, forwarding
> > directly from host socket to guest/container socket without any L2 or L3
> > processing. This gives us a very large performance improvement when it's
> > possible.
> >
> > Since the traffic is from a local socket within the guest, it will go over
> > the guest's 'lo' interface, and accordingly we set the guest side address
> > to be the loopback address. However this has a surprising side effect:
> > sometimes guests will run services that are only supposed to be used within
> > the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's
> > forwarding exposes those services to the host, which isn't generally what
> > we want.
> >
> > Correct this by instead forwarding inbound "splice" flows to the guest's
> > external address.
> >
> > Link: https://github.com/containers/podman/issues/24045
> >
> > Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> > ---
> >  conf.c  |  9 +++++++++
> >  fwd.c   | 31 +++++++++++++++++++++++--------
> >  passt.1 | 23 +++++++++++++++++++----
> >  passt.h |  2 ++
> >  4 files changed, 53 insertions(+), 12 deletions(-)
> >
> > diff --git a/conf.c b/conf.c
> > index 6e62510..b5318f3 100644
> > --- a/conf.c
> > +++ b/conf.c
> > @@ -908,6 +908,9 @@ pasta_opts:
> >  " -U, --udp-ns SPEC UDP port forwarding to init namespace\n"
> >  " SPEC is as described above\n"
> >  " default: auto\n"
> > + " --host-lo-to-ns-lo DEPRECATED:\n"
> > + " Translate host-loopback forwards to\n"
> > + " namespace loopback\n"
> >  " --userns NSPATH Target user namespace to join\n"
> >  " --netns PATH|NAME Target network namespace to join\n"
> >  " --netns-only Don't join existing user namespace\n"
> > @@ -1284,6 +1287,7 @@ void conf(struct ctx *c, int argc, char **argv)
> >  {"netns-only", no_argument, NULL, 20 },
> >  {"map-host-loopback", required_argument, NULL, 21 },
> >  {"map-guest-addr", required_argument, NULL, 22 },
> > + {"host-lo-to-ns-lo", no_argument, NULL, 23 },
> >  { 0 },
> >  };
> >  const char *logname = (c->mode == MODE_PASTA) ? "pasta" : "passt";
> > @@ -1461,6 +1465,11 @@ void conf(struct ctx *c, int argc, char **argv)
> >  conf_nat(optarg, &c->ip4.map_guest_addr,
> >  &c->ip6.map_guest_addr, NULL);
> >  break;
> > + case 23:
> > + if (c->mode != MODE_PASTA)
> > + die("--host-lo-to-ns-lo is for pasta mode only");
> > + c->host_lo_to_ns_lo = 1;
> > + break;
> >  case 'd':
> >  c->debug = 1;
> >  c->quiet = 0;
> > diff --git a/fwd.c b/fwd.c
> > index a505098..c71f5e1 100644
> > --- a/fwd.c
> > +++ b/fwd.c
> > @@ -447,20 +447,35 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
> >  (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) {
> >  /* spliceable */
> >
> > - /* Preserve the specific loopback adddress used, but let the
> > - * kernel pick a source port on the target side
> > + /* The traffic will go over the guest's 'lo' interface, but by
> > + * default use its external address, so we don't inadvertently
> > + * expose services that listen only on the guest's loopback
> > + * address. That can be overridden by --host-lo-to-ns-lo which
> > + * will instead forward to the loopback address in the guest.
> > + *
> > + * In either case, let the kernel pick the source address to
> > + * match.
> >  */
> > - tgt->oaddr = ini->eaddr;
> > + if (inany_v4(&ini->eaddr)) {
> > + if (c->host_lo_to_ns_lo)
> > + tgt->eaddr = inany_loopback4;
> > + else
> > + tgt->eaddr = inany_from_v4(c->ip4.addr_seen);
> > + tgt->oaddr = inany_any4;
> > + } else {
> > + if (c->host_lo_to_ns_lo)
> > + tgt->eaddr = inany_loopback6;
> > + else
> > + tgt->eaddr.a6 = c->ip6.addr_seen;
>
> Either this...
>
> > + tgt->oaddr = inany_any6;
>
> or this (and not something before this patch, up to 3/4) make the
> "TCP/IPv6: host to ns (spliced): big transfer" test in pasta/tcp hang,
> sometimes (about one in three/four runs), that's what I mistakenly
> reported as coming from Laurent's series at:

Huh, interesting.  Just got back from my leave and ran that group of
tests in a loop this afternoon, but didn't manage to reproduce.  I have
administrivia that will probably fill the rest of this week, but I'll
look into this as soon as I can.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson