On Thu, 13 Oct 2022 06:01:19 +0200
Stefano Brivio <sbrivio(a)redhat.com> wrote:

> On Tue, 11 Oct 2022 16:40:15 +1100
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
>
> > @@ -251,7 +275,19 @@ int isolate_prefork(struct ctx *c)
> >  		return -errno;
> >  	}
> >
> > -	drop_caps(); /* Relative to the new user namespace this time. */
> > +	/* Drop capabilites in our new userns */
> > +	if (c->mode == MODE_PASTA) {
> > +		/* Keep CAP_SYS_ADMIN, so that we can setns() to the
> > +		 * netns when we need to act upon it
> > +		 */
> > +		ns_caps |= 1UL << CAP_SYS_ADMIN;
> > +		/* Keep CAP_NET_BIND_SERVICE, so we can splice
> > +		 * outbound connections to low port numbers
> > +		 */
> > +		ns_caps |= 1UL << CAP_NET_BIND_SERVICE;
> > +	}
> > +
> > +	drop_caps_ep_except(ns_caps);
>
> Hmm, I didn't really look into this yet, but there seems to be an issue
> with filesystem-bound network namespaces now. Running something like:
>
>   pasta --config-net --netns /run/user/1000/netns/netns-6466ff4b-1efc-2b58-685b-cbc12feb9ccc
>
> (from Podman), this happens:
>
> [...]
> [pid 1763223] setns(7, CLONE_NEWNET) = -1 EPERM (Operation not permitted)

Ah, "of course". Podman calls us with UID 0 in the user namespace it
just created, so if we drop CAP_SYS_ADMIN in isolate_initial() we can't
join the network namespace, and if we drop CAP_NET_ADMIN we can't
configure it.

So for that case (and only for that, I suppose), we need something like
(tested):

diff --git a/isolation.c b/isolation.c
index 1769180..fee6dbd 100644
--- a/isolation.c
+++ b/isolation.c
@@ -190,7 +190,7 @@ void isolate_initial(void)
 	 * namespace if we have it, so that we can forward low ports
 	 * into the guest/namespace
 	 */
-	drop_caps_ep_except((1UL << CAP_NET_BIND_SERVICE));
+	drop_caps_ep_except(BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN));
 }

...which is a bit pointless. Better than *any* capability, but not by
far.

So, if we make this totally independent from configuration, we need
those two capabilities. We could add a "postconf" stage and cover a
tiny bit more of conf.c.

Or we could have a special path in isolate_initial() for the case we
know we're not in the init namespace.

I'm not sure. If you have a specific preference/strong opinion I would
actually be happier. :)

-- 
Stefano
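
For readers following along, the capability arithmetic discussed in this thread can be sketched roughly as below. This is an illustration only, not the code in isolation.c: mask_caps(), cap_effective() and drop_caps_except_sketch() are hypothetical names, BIT() is redefined locally, and the sketch assumes Linux with the raw capget()/capset() system calls and version 3 capability headers. It keeps only the requested bits in the effective and permitted sets. As far as setns(2) documents it, joining a network namespace needs CAP_SYS_ADMIN in the effective set, which would explain the EPERM once that bit has been dropped.

```c
/* Sketch only: helper names are hypothetical, not taken from passt */
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/capability.h>

#define BIT(n)	(1ULL << (n))	/* local definition, 64-bit wide on purpose */

/* Clear every capability not set in 'keep' from effective and permitted */
void mask_caps(struct __user_cap_data_struct *data, uint64_t keep)
{
	int i;

	for (i = 0; i < _LINUX_CAPABILITY_U32S_3; i++) {
		uint32_t mask = (uint32_t)(keep >> (32 * i));

		data[i].effective &= mask;
		data[i].permitted &= mask;
	}
}

/* Return 1 if 'cap' is in our effective set, 0 if not, -1 on error */
int cap_effective(int cap)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (syscall(SYS_capget, &hdr, data))
		return -1;

	return !!(data[cap / 32].effective & (1U << (cap % 32)));
}

/* Drop all capabilities except 'keep'; 0 on success, -1 on error */
int drop_caps_except_sketch(uint64_t keep)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (syscall(SYS_capget, &hdr, data))
		return -1;

	mask_caps(data, keep);

	if (syscall(SYS_capset, &hdr, data))
		return -1;

	return 0;
}
```

With these helpers, the Podman case would amount to calling drop_caps_except_sketch(BIT(CAP_SYS_ADMIN) | BIT(CAP_NET_ADMIN)) before the setns() call, and cap_effective(CAP_SYS_ADMIN) gives a quick way to check whether that bit survived an earlier drop.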