[PATCH v4 0/7] Don't expose container loopback services to the host
podman issue #24045 pointed out that pasta's spliced forwarding logic can expose services within the namespace bound only to 127.0.0.1 or ::1 to the host. However, the namespace probably expects those to only be accessible to itself, so that's probably not what we want. This changes our forwarding logic so as not to do this. Note that the podman tests will currently fail with this series, I've submitted podman PR https://github.com/containers/podman/pull/24064 to fix that. Link: https://github.com/containers/podman/issues/24045 Changes since v3: * Added several extra patches working around failures on Debian, because its dhclient-script doesn't wait for DAD to complete Changes since v2: * Add new field do structure comment Changes since v1: * Add --host-lo-to-ns-lo option to preserve the old behaviour * Clarify the new behaviour in the man page * Add some extra patches making some other corrections to the man page David Gibson (7): arp: Fix a handful of small warts test: Explicitly wait for DAD to complete on SLAAC addresses test: Wait for DAD on DHCPv6 addresses passt.1: Mark --stderr as deprecated more prominently passt.1: Clarify and update "Handling of local addresses" section test: Clarify test for spliced inbound transfers fwd: Direct inbound spliced forwards to the guest's external address arp.c | 8 ++--- conf.c | 9 ++++++ fwd.c | 31 +++++++++++++----- passt.1 | 75 ++++++++++++++++++++++++++----------------- passt.h | 2 ++ test/passt/dhcp | 2 ++ test/passt/ndp | 4 ++- test/passt_in_ns/tcp | 8 ++--- test/passt_in_ns/udp | 4 +-- test/pasta/dhcp | 2 ++ test/pasta/ndp | 3 +- test/pasta/tcp | 16 ++++----- test/pasta/udp | 8 ++--- test/two_guests/basic | 6 +++- 14 files changed, 115 insertions(+), 63 deletions(-) -- 2.47.0
This fixes a number of harmless but slightly ugly warts in the ARP
resolution code:
* Use in4addr_any to represent 0.0.0.0 rather than hand constructing an
example.
* When comparing am->sip against 0.0.0.0 use sizeof(am->sip) instead of
sizeof(am->tip) (same value, but makes more logical sense)
* Described the guest's assigned address as such, rather than as "our
address" - that's not usually what we mean by "our address" these days
* Remove "we might have the same IP address" comment which I can't make
sense of in context (possibly it's relating to the statement below,
which already has its own comment?)
Signed-off-by: David Gibson
On Thu, 17 Oct 2024 16:32:58 +1100
David Gibson
This fixes a number of harmless but slightly ugly warts in the ARP resolution code: * Use in4addr_any to represent 0.0.0.0 rather than hand constructing an example. * When comparing am->sip against 0.0.0.0 use sizeof(am->sip) instead of sizeof(am->tip) (same value, but makes more logical sense) * Described the guest's assigned address as such, rather than as "our address" - that's not usually what we mean by "our address" these days * Remove "we might have the same IP address" comment which I can't make sense of in context (possibly it's relating to the statement below, which already has its own comment?)
Right, yes, I added the part below later, in 64c0f20ab3f8 ("arp: Don't resolve own, configured IPv4 address"). The first check comes from 2166c5872e9b ("arp: Don't answer announcements from guest or namespace") instead. The commit title makes it obvious, the comment not so much.
Signed-off-by: David Gibson
--- arp.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/arp.c b/arp.c index 53334da..d34b20e 100644 --- a/arp.c +++ b/arp.c @@ -59,14 +59,12 @@ int arp(const struct ctx *c, const struct pool *p) ah->ar_op != htons(ARPOP_REQUEST)) return 1;
- /* Discard announcements (but not 0.0.0.0 "probes"): we might have the - * same IP address, hide that. - */ - if (memcmp(am->sip, (unsigned char[4]){ 0 }, sizeof(am->tip)) && + /* Discard announcements, but not 0.0.0.0 "probes" */ + if (memcmp(am->sip, &in4addr_any, sizeof(am->sip)) && !memcmp(am->sip, am->tip, sizeof(am->sip))) return 1;
- /* Don't resolve our own address, either. */ + /* Don't resolve guest's assigned address, either. */
Nit: "the guest's assigned address".
if (!memcmp(am->tip, &c->ip4.addr, sizeof(am->tip))) return 1;
-- Stefano
On Thu, Oct 17, 2024 at 10:45:24AM +0200, Stefano Brivio wrote:
On Thu, 17 Oct 2024 16:32:58 +1100 David Gibson
wrote: This fixes a number of harmless but slightly ugly warts in the ARP resolution code: * Use in4addr_any to represent 0.0.0.0 rather than hand constructing an example. * When comparing am->sip against 0.0.0.0 use sizeof(am->sip) instead of sizeof(am->tip) (same value, but makes more logical sense) * Described the guest's assigned address as such, rather than as "our address" - that's not usually what we mean by "our address" these days * Remove "we might have the same IP address" comment which I can't make sense of in context (possibly it's relating to the statement below, which already has its own comment?)
Right, yes, I added the part below later, in 64c0f20ab3f8 ("arp: Don't resolve own, configured IPv4 address").
The first check comes from 2166c5872e9b ("arp: Don't answer announcements from guest or namespace") instead. The commit title makes it obvious, the comment not so much.
Signed-off-by: David Gibson
--- arp.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/arp.c b/arp.c index 53334da..d34b20e 100644 --- a/arp.c +++ b/arp.c @@ -59,14 +59,12 @@ int arp(const struct ctx *c, const struct pool *p) ah->ar_op != htons(ARPOP_REQUEST)) return 1;
- /* Discard announcements (but not 0.0.0.0 "probes"): we might have the - * same IP address, hide that. - */ - if (memcmp(am->sip, (unsigned char[4]){ 0 }, sizeof(am->tip)) && + /* Discard announcements, but not 0.0.0.0 "probes" */ + if (memcmp(am->sip, &in4addr_any, sizeof(am->sip)) && !memcmp(am->sip, am->tip, sizeof(am->sip))) return 1;
- /* Don't resolve our own address, either. */ + /* Don't resolve guest's assigned address, either. */
Nit: "the guest's assigned address".
Fixed.
if (!memcmp(am->tip, &c->ip4.addr, sizeof(am->tip))) return 1;
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
Getting a SLAAC address takes a little while because the kernel must
complete Duplicate Address Detection (DAD) before marking the address as
ready. In several places we have an explicit 'sleep 2' to wait for that
to complete.
Fixed length delays are never a great idea, although this one is pretty
solid. Still, it would be better to explicitly wait for DAD to complete
in case of long delays (which might happen on slow emulated hosts, or with
heavy load), and to speed the tests up if DAD completes quicker.
Replace the fixed sleeps with a loop waiting for DAD to complete. We do
this by looping waiting for all tentative addresses to disappear.
Signed-off-by: David Gibson
After running dhclient -6 we expect the DHCPv6 assigned address to be
immediately usable. That's true with the Fedora dhclient-script (and the
upstream ISC DHCP one), however it's not true with the Debian
dhclient-script. The Debian script can complete with the address still
in "tentative" state, and the address won't be usable until Duplicate
Address Detection (DAD) completes. That's arguably a bug in Debian (see
link below), but for the time being we need to work around it anyway.
We usually get away with this, because by the time we do anything where the
address matters, DAD has completed. However, it's not robust, so we should
explicitly wait for DAD to complete when we get an DHCPv6 address.
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1085231
Signed-off-by: David Gibson
On Thu, 17 Oct 2024 16:33:00 +1100
David Gibson
After running dhclient -6 we expect the DHCPv6 assigned address to be immediately usable. That's true with the Fedora dhclient-script (and the upstream ISC DHCP one), however it's not true with the Debian dhclient-script. The Debian script can complete with the address still in "tentative" state, and the address won't be usable until Duplicate Address Detection (DAD) completes. That's arguably a bug in Debian (see link below), but for the time being we need to work around it anyway.
We usually get away with this, because by the time we do anything where the address matters, DAD has completed. However, it's not robust, so we should explicitly wait for DAD to complete when we get an DHCPv6 address.
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1085231
Signed-off-by: David Gibson
--- test/passt/dhcp | 2 ++ test/pasta/dhcp | 2 ++ test/two_guests/basic | 3 +++
What about the equivalent cases in passt_in_ns/dhcp and perf/passt_tcp? -- Stefano
On Thu, Oct 17, 2024 at 10:45:29AM +0200, Stefano Brivio wrote:
On Thu, 17 Oct 2024 16:33:00 +1100 David Gibson
wrote: After running dhclient -6 we expect the DHCPv6 assigned address to be immediately usable. That's true with the Fedora dhclient-script (and the upstream ISC DHCP one), however it's not true with the Debian dhclient-script. The Debian script can complete with the address still in "tentative" state, and the address won't be usable until Duplicate Address Detection (DAD) completes. That's arguably a bug in Debian (see link below), but for the time being we need to work around it anyway.
We usually get away with this, because by the time we do anything where the address matters, DAD has completed. However, it's not robust, so we should explicitly wait for DAD to complete when we get an DHCPv6 address.
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1085231
Signed-off-by: David Gibson
--- test/passt/dhcp | 2 ++ test/pasta/dhcp | 2 ++ test/two_guests/basic | 3 +++ What about the equivalent cases in passt_in_ns/dhcp and perf/passt_tcp?
Oops, somehow I missed those. Will fix and respin. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
The description of this option says that it's deprecated, but unlike
--no-copy-addrs and --no-copy-routes it doesn't have a clear label. Add
one to make it easier to spot.
Signed-off-by: David Gibson
This section didn't mention the effect of the --map-host-loopback option
which now alters this behaviour. Update it accordingly.
It used "local addresses" to mean specifically 127.0.0.0/8 and ::1.
However, "local" could also refer to link-local addresses or to addresses
of any scope which happen to be configured on the host. Use "loopback
address" to be more precise about this.
Signed-off-by: David Gibson
The tests in pasta/tcp and pasta/udp for inbound transfers have the server
listening within the namespace explicitly bound to 127.0.0.1 or ::1. This
only works because of the behaviour of inbound splice connections, which
always appear with both source and destination addresses as loopback in
the namespace. That's not an inherent property for "spliced" connections
and arguably an undesirable one. Also update the test names to make it
clearer that these tests are expecting to exercise the "splice" path.
Interestingly this was already correct for the equivalent passt_in_ns/*,
although we also update the test names for clarity there.
Note that there are similar issues in some of the podman tests, addressed
in https://github.com/containers/podman/pull/24064
Signed-off-by: David Gibson
In pasta mode, where addressing permits we "splice" connections, forwarding
directly from host socket to guest/container socket without any L2 or L3
processing. This gives us a very large performance improvement when it's
possible.
Since the traffic is from a local socket within the guest, it will go over
the guest's 'lo' interface, and accordingly we set the guest side address
to be the loopback address. However this has a surprising side effect:
sometimes guests will run services that are only supposed to be used within
the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's
forwarding exposes those services to the host, which isn't generally what
we want.
Correct this by instead forwarding inbound "splice" flows to the guest's
external address.
Link: https://github.com/containers/podman/issues/24045
Signed-off-by: David Gibson
participants (2)
-
David Gibson
-
Stefano Brivio