The intended semantics of --netns-only are pretty unclear to me. It's intended for pasta, but it's not clear whether its saying the spawned shell should only enter the target netns, or that the passt/pasta packet forwarding process should only sandbox itself in a network namespace, not a user namespace. In any case, as far as I can tell there's not actually any case in which the --netns-only option will work. If nothing else, we will always fail in sandbox(), because it attempts a number of operations which require CAP_SYS_ADMIN in our current user namespace. We drop all capabilities in our initial user namespace when we start, so the only way we can have CAP_SYS_ADMIN at this point is if we've joined a new user namespace, which we won't do with --netns-only. For pasta joining an existing namespace (the apparently intended use case), we'll actually fail before we'll fail before we get to that point: in conf_ns_check() we'll attempt to join the target network namespace. This also requires CAP_SYS_ADMIN in both our current user namespace and the user namespace which owns the target network namespace. Again, since we've dropped capabilities in our original namespace this will never be the case. For pasta creating its own network namespace we'll fail for a similar reason in yet another place. This time we'll fail in nl_sock_init() again because we attempt to enter the new network ns via NS_CALL without having regained CAP_SYS_ADMIN by joining a new user namespace. Because this happens after spawning the shell, it results in a weird failure mode, where the pasta spawned shell is running, but pasta isn't actually handling packets. Exiting the shell will lead to a hang until the process is explicitly killed. Since there's no way to invoke it, remove this feature. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 33 ++++++++++----------------------- passt.c | 10 ++++------ passt.h | 2 -- pasta.c | 28 ++++++++++------------------ util.c | 3 +-- 5 files changed, 25 insertions(+), 51 deletions(-) diff --git a/conf.c b/conf.c index cddc769..1dfbba1 100644 --- a/conf.c +++ b/conf.c @@ -498,7 +498,7 @@ static int conf_ns_check(void *arg) { struct ctx *c = (struct ctx *)arg; - if ((!c->netns_only && setns(c->pasta_userns_fd, CLONE_NEWUSER)) || + if (setns(c->pasta_userns_fd, CLONE_NEWUSER) || setns(c->pasta_netns_fd, CLONE_NEWNET)) c->pasta_userns_fd = c->pasta_netns_fd = -1; @@ -518,17 +518,12 @@ static int conf_ns_check(void *arg) static int conf_ns_opt(struct ctx *c, char *nsdir, const char *conf_userns, const char *optarg) { - int ufd = -1, nfd = -1, try, ret, netns_only_reset = c->netns_only; + int ufd = -1, nfd = -1, try, ret; char userns[PATH_MAX] = { 0 }, netns[PATH_MAX]; char *endptr; long pid_arg; pid_t pid; - if (c->netns_only && *conf_userns) { - err("Both --userns and --netns-only given"); - return -EINVAL; - } - /* It might be a PID, a netns path, or a netns name */ for (try = 0; try < 3; try++) { if (try == 0) { @@ -538,7 +533,7 @@ static int conf_ns_opt(struct ctx *c, pid = pid_arg; - if (!*conf_userns && !c->netns_only) { + if (!*conf_userns) { ret = snprintf(userns, PATH_MAX, "/proc/%i/ns/user", pid); if (ret <= 0 || ret > (int)sizeof(userns)) @@ -548,9 +543,6 @@ static int conf_ns_opt(struct ctx *c, if (ret <= 0 || ret > (int)sizeof(netns)) continue; } else if (try == 1) { - if (!*conf_userns) - c->netns_only = 1; - ret = snprintf(netns, PATH_MAX, "%s", optarg); if (ret <= 0 || ret > (int)sizeof(userns)) continue; @@ -562,19 +554,17 @@ static int conf_ns_opt(struct ctx *c, } /* Don't pass O_CLOEXEC here: ns_enter() needs those files */ - if (!c->netns_only) { - if (*conf_userns) - /* NOLINTNEXTLINE(android-cloexec-open) */ - ufd = open(conf_userns, O_RDONLY); - else if (*userns) - /* NOLINTNEXTLINE(android-cloexec-open) */ - ufd = open(userns, O_RDONLY); - } + if (*conf_userns) + /* NOLINTNEXTLINE(android-cloexec-open) */ + ufd = open(conf_userns, O_RDONLY); + else if (*userns) + /* NOLINTNEXTLINE(android-cloexec-open) */ + ufd = open(userns, O_RDONLY); /* NOLINTNEXTLINE(android-cloexec-open) */ nfd = open(netns, O_RDONLY); - if (nfd == -1 || (ufd == -1 && !c->netns_only)) { + if (nfd == -1 || ufd == -1) { if (nfd >= 0) close(nfd); @@ -604,8 +594,6 @@ static int conf_ns_opt(struct ctx *c, } } - c->netns_only = netns_only_reset; - return -ENOENT; } @@ -1046,7 +1034,6 @@ void conf(struct ctx *c, int argc, char **argv) {"tcp-ns", required_argument, NULL, 'T' }, {"udp-ns", required_argument, NULL, 'U' }, {"userns", required_argument, NULL, 2 }, - {"netns-only", no_argument, &c->netns_only, 1 }, {"nsrun-dir", required_argument, NULL, 3 }, {"config-net", no_argument, &c->pasta_conf_ns, 1 }, {"ns-mac-addr", required_argument, NULL, 4 }, diff --git a/passt.c b/passt.c index 58d9062..64edd39 100644 --- a/passt.c +++ b/passt.c @@ -199,12 +199,10 @@ static int sandbox(struct ctx *c) { int flags = CLONE_NEWIPC | CLONE_NEWNS | CLONE_NEWUTS; - if (!c->netns_only) { - if (c->pasta_userns_fd == -1) - flags |= CLONE_NEWUSER; - else - setns(c->pasta_userns_fd, CLONE_NEWUSER); - } + if (c->pasta_userns_fd == -1) + flags |= CLONE_NEWUSER; + else + setns(c->pasta_userns_fd, CLONE_NEWUSER); c->pasta_userns_fd = -1; diff --git a/passt.h b/passt.h index e541341..7f9d54b 100644 --- a/passt.h +++ b/passt.h @@ -110,7 +110,6 @@ enum passt_modes { * @gid: GID we should drop to, if started as root * @pasta_netns_fd: File descriptor for network namespace in pasta mode * @pasta_userns_fd: Descriptor for user namespace to join, -1 once joined - * @netns_only: In pasta mode, don't join or create a user namespace * @no_netns_quit: In pasta mode, don't exit if fs-bound namespace is gone * @netns_base: Base name for fs-bound namespace, if any, in pasta mode * @netns_dir: Directory of fs-bound namespace, if any, in pasta mode @@ -177,7 +176,6 @@ struct ctx { int pasta_netns_fd; int pasta_userns_fd; - int netns_only; int no_netns_quit; char netns_base[PATH_MAX]; diff --git a/pasta.c b/pasta.c index 5166082..dc35fef 100644 --- a/pasta.c +++ b/pasta.c @@ -82,16 +82,12 @@ static int pasta_wait_for_ns(void *arg) int flags = O_RDONLY | O_CLOEXEC; char ns[PATH_MAX]; - if (c->netns_only) - goto netns; - snprintf(ns, PATH_MAX, "/proc/%i/ns/user", pasta_child_pid); do while ((c->pasta_userns_fd = open(ns, flags)) < 0); while (setns(c->pasta_userns_fd, CLONE_NEWUSER) && !close(c->pasta_userns_fd)); -netns: snprintf(ns, PATH_MAX, "/proc/%i/ns/net", pasta_child_pid); do while ((c->pasta_netns_fd = open(ns, flags)) < 0); @@ -121,21 +117,18 @@ static int pasta_setup_ns(void *arg) { struct pasta_setup_ns_arg *a = (struct pasta_setup_ns_arg *)arg; char *shell; + char buf[BUFSIZ]; - if (!a->c->netns_only) { - char buf[BUFSIZ]; - - snprintf(buf, BUFSIZ, "%i %i %i", 0, a->euid, 1); + snprintf(buf, BUFSIZ, "%i %i %i", 0, a->euid, 1); - FWRITE("/proc/self/uid_map", buf, - "Cannot set uid_map in namespace"); + FWRITE("/proc/self/uid_map", buf, + "Cannot set uid_map in namespace"); - FWRITE("/proc/self/setgroups", "deny", - "Cannot write to setgroups in namespace"); + FWRITE("/proc/self/setgroups", "deny", + "Cannot write to setgroups in namespace"); - FWRITE("/proc/self/gid_map", buf, - "Cannot set gid_map in namespace"); - } + FWRITE("/proc/self/gid_map", buf, + "Cannot set gid_map in namespace"); FWRITE("/proc/sys/net/ipv4/ping_group_range", "0 0", "Cannot set ping_group_range, ICMP requests might fail"); @@ -165,9 +158,8 @@ void pasta_start_ns(struct ctx *c) pasta_child_pid = clone(pasta_setup_ns, ns_fn_stack + sizeof(ns_fn_stack) / 2, - (c->netns_only ? 0 : CLONE_NEWNET) | - CLONE_NEWIPC | CLONE_NEWPID | CLONE_NEWUSER | - CLONE_NEWUTS, + CLONE_NEWNET | CLONE_NEWIPC | CLONE_NEWPID | + CLONE_NEWUSER | CLONE_NEWUTS, (void *)&arg); if (pasta_child_pid == -1) { diff --git a/util.c b/util.c index f45bc72..11cd3f4 100644 --- a/util.c +++ b/util.c @@ -524,8 +524,7 @@ void check_root(struct ctx *c) */ int ns_enter(const struct ctx *c) { - if (!c->netns_only && - c->pasta_userns_fd != -1 && + if (c->pasta_userns_fd != -1 && setns(c->pasta_userns_fd, CLONE_NEWUSER)) exit(EXIT_FAILURE); -- 2.36.1
On Tue, 19 Jul 2022 16:23:10 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:The intended semantics of --netns-only are pretty unclear to me. It's intended for pasta, but it's not clear whether its saying the spawned shell should only enter the target netns, or that the passt/pasta packet forwarding process should only sandbox itself in a network namespace, not a user namespace.The latter. I think this is marginally more clear in the man page, but needs indeed a better explanation.In any case, as far as I can tell there's not actually any case in which the --netns-only option will work. If nothing else, we will always fail in sandbox(), because it attempts a number of operations which require CAP_SYS_ADMIN in our current user namespace. We drop all capabilities in our initial user namespace when we start, so the only way we can have CAP_SYS_ADMIN at this point is if we've joined a new user namespace, which we won't do with --netns-only. For pasta joining an existing namespace (the apparently intended use case), we'll actually fail before we'll fail before we get to that point: in conf_ns_check() we'll attempt to join the target network namespace. This also requires CAP_SYS_ADMIN in both our current user namespace and the user namespace which owns the target network namespace. Again, since we've dropped capabilities in our original namespace this will never be the case....however, we can also have UID 0 in a non-init user namespace, and that will work. This is what happens in the Podman integration case. Unfortunately the demo is broken at the moment (I had to rebase the patch with a bit of care, I'll publish the updated one soon).For pasta creating its own network namespace we'll fail for a similar reason in yet another place. This time we'll fail in nl_sock_init() again because we attempt to enter the new network ns via NS_CALL without having regained CAP_SYS_ADMIN by joining a new user namespace. Because this happens after spawning the shell, it results in a weird failure mode, where the pasta spawned shell is running, but pasta isn't actually handling packets. Exiting the shell will lead to a hang until the process is explicitly killed.Ouch, I didn't think of this. Anyway, let me get back to you in a couple of days on the whole issue. The usage is there, albeit poorly documented, with a broken demo, and no handling of (kind of) corner cases. -- Stefano
On Tue, Jul 19, 2022 at 10:39:25PM +0200, Stefano Brivio wrote:On Tue, 19 Jul 2022 16:23:10 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Definitely. At present it also appears to affect the spawned shell as well, it a rather counter-intuitive way.The intended semantics of --netns-only are pretty unclear to me. It's intended for pasta, but it's not clear whether its saying the spawned shell should only enter the target netns, or that the passt/pasta packet forwarding process should only sandbox itself in a network namespace, not a user namespace.The latter. I think this is marginally more clear in the man page, but needs indeed a better explanation.Hrm.. I thought being UID 0 just meant we started with all the capabilities, so once we've explicitly dropped them we still won't be able to do this. That seemed to be what happened when I tried running it as root.In any case, as far as I can tell there's not actually any case in which the --netns-only option will work. If nothing else, we will always fail in sandbox(), because it attempts a number of operations which require CAP_SYS_ADMIN in our current user namespace. We drop all capabilities in our initial user namespace when we start, so the only way we can have CAP_SYS_ADMIN at this point is if we've joined a new user namespace, which we won't do with --netns-only. For pasta joining an existing namespace (the apparently intended use case), we'll actually fail before we'll fail before we get to that point: in conf_ns_check() we'll attempt to join the target network namespace. This also requires CAP_SYS_ADMIN in both our current user namespace and the user namespace which owns the target network namespace. Again, since we've dropped capabilities in our original namespace this will never be the case....however, we can also have UID 0 in a non-init user namespace, and that will work.This is what happens in the Podman integration case. Unfortunately the demo is broken at the moment (I had to rebase the patch with a bit of care, I'll publish the updated one soon).Can you explain a bit more about what the podman use case is, and why it requires the netns only logic?-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonFor pasta creating its own network namespace we'll fail for a similar reason in yet another place. This time we'll fail in nl_sock_init() again because we attempt to enter the new network ns via NS_CALL without having regained CAP_SYS_ADMIN by joining a new user namespace. Because this happens after spawning the shell, it results in a weird failure mode, where the pasta spawned shell is running, but pasta isn't actually handling packets. Exiting the shell will lead to a hang until the process is explicitly killed.Ouch, I didn't think of this. Anyway, let me get back to you in a couple of days on the whole issue. The usage is there, albeit poorly documented, with a broken demo, and no handling of (kind of) corner cases.
On Wed, 20 Jul 2022 12:45:26 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Tue, Jul 19, 2022 at 10:39:25PM +0200, Stefano Brivio wrote:Right, in that case we should restrict conditions where we can spawn a shell to having UID 0 in a non-init namespace. See working example below.On Tue, 19 Jul 2022 16:23:10 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Definitely. At present it also appears to affect the spawned shell as well, it a rather counter-intuitive way.The intended semantics of --netns-only are pretty unclear to me. It's intended for pasta, but it's not clear whether its saying the spawned shell should only enter the target netns, or that the passt/pasta packet forwarding process should only sandbox itself in a network namespace, not a user namespace.The latter. I think this is marginally more clear in the man page, but needs indeed a better explanation.If you run it as root, it will drop to nobody (or user passed via --runas), and it drops capabilities anyway, so it won't be able to do that. If you run it as UID 0 in a non-init namespace, it won't change the UID, though, and even after dropping capabilities, it will be able to join a network namespace.Hrm.. I thought being UID 0 just meant we started with all the capabilities, so once we've explicitly dropped them we still won't be able to do this. That seemed to be what happened when I tried running it as root.In any case, as far as I can tell there's not actually any case in which the --netns-only option will work. If nothing else, we will always fail in sandbox(), because it attempts a number of operations which require CAP_SYS_ADMIN in our current user namespace. We drop all capabilities in our initial user namespace when we start, so the only way we can have CAP_SYS_ADMIN at this point is if we've joined a new user namespace, which we won't do with --netns-only. For pasta joining an existing namespace (the apparently intended use case), we'll actually fail before we'll fail before we get to that point: in conf_ns_check() we'll attempt to join the target network namespace. This also requires CAP_SYS_ADMIN in both our current user namespace and the user namespace which owns the target network namespace. Again, since we've dropped capabilities in our original namespace this will never be the case....however, we can also have UID 0 in a non-init user namespace, and that will work.Podman creates a network namespace (with a filesystem handle), starts slirp4netns (or pasta, in the integration draft) as UID 0 in a new user namespace, pointing it to the network namespace: # ps aux|grep pasta sbrivio 2283703 0.0 0.0 2070672 56468 pts/10 Sl+ Jul19 0:40 ./bin/podman run --net=pasta:-T,5213-5214,-U,5213-5214 -p 5203-5204:5203-5204/tcp -p 5203-5204:5203-5204/udp --rm -ti alpine sh sbrivio 2283760 0.1 0.0 85300 51120 ? Ss Jul19 0:57 /usr/bin/pasta --config-net -u 5203:5203 -t 5203:5203 -T 5213-5214 -U 5213-5214 /run/user/1000/netns/netns-3b6147d8-34e1-a516-87c3-631938a1973e # readlink /proc/2283703/ns/net net:[4026531992] # readlink /proc/2283760/ns/net net:[4026531992] # readlink /proc/2283703/ns/user user:[4026533032] # readlink /proc/2283760/ns/user user:[4026533032] It's equivalent to this example (for convenience, with PIDs instead of filesystem handles): --- [TTY #0] $ unshare -Ur # echo $$ 4117948 [TTY #1] $ nsenter --preserve-credentials -U -t 4117948 # unshare -n # ip li sh 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 # echo $$ 4126920 [TTY #0] # ./pasta -f --netns-only 4126920 Outbound interface: enp9s0, namespace interface: enp9s0 ARP: address: a8:a1:59:8e:d7:b6 DHCP: assign: 88.198.0.164 mask: 255.255.255.224 router: 88.198.0.161 DNS: 185.12.64.1 185.12.64.2 NDP/DHCPv6: assign: 2a01:4f8:222:904::2 router: fe80::1 our link-local: fe80::aaa1:59ff:fe8e:d7b6 DNS: 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1 [TTY #1] # ip li sh 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: enp9s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether f2:c0:09:fe:89:c3 brd ff:ff:ff:ff:ff:ff --- Unrelated to the Podman case: you can also do this and let pasta spawn an interactive shell with its network namespace (also created by pasta) detached: --- $ unshare -Ur # ./pasta --netns-only Cannot set ping_group_range, ICMP requests might fail $ ip li sh 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: enp9s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 link/ether a8:a1:59:8e:d7:b6 brd ff:ff:ff:ff:ff:ff --- ...if you then log out from this shell, it will hang: openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) but that's a separate issue (which I just discovered). -- StefanoThis is what happens in the Podman integration case. Unfortunately the demo is broken at the moment (I had to rebase the patch with a bit of care, I'll publish the updated one soon).Can you explain a bit more about what the podman use case is, and why it requires the netns only logic?