On Wed, 20 Jul 2022 12:45:26 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:

On Tue, Jul 19, 2022 at 10:39:25PM +0200, Stefano Brivio wrote:

Right, in that case we should restrict conditions where we can spawn a shell to having UID 0 in a non-init namespace. See working example below.

On Tue, 19 Jul 2022 16:23:10 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:

Definitely. At present it also appears to affect the spawned shell, in a rather counter-intuitive way.

The intended semantics of --netns-only are pretty unclear to me. It's intended for pasta, but it's not clear whether it's saying the spawned shell should only enter the target netns, or that the passt/pasta packet forwarding process should only sandbox itself in a network namespace, not a user namespace.

The latter. I think this is marginally more clear in the man page, but needs indeed a better explanation.

If you run it as root, it will drop to nobody (or user passed via --runas), and it drops capabilities anyway, so it won't be able to do that. If you run it as UID 0 in a non-init namespace, it won't change the UID, though, and even after dropping capabilities, it will be able to join a network namespace.

Hrm.. I thought being UID 0 just meant we started with all the capabilities, so once we've explicitly dropped them we still won't be able to do this. That seemed to be what happened when I tried running it as root.

In any case, as far as I can tell there's not actually any case in which the --netns-only option will work. If nothing else, we will always fail in sandbox(), because it attempts a number of operations which require CAP_SYS_ADMIN in our current user namespace. We drop all capabilities in our initial user namespace when we start, so the only way we can have CAP_SYS_ADMIN at this point is if we've joined a new user namespace, which we won't do with --netns-only.
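To make the capability argument above easier to check by hand, here is a minimal sketch in Python (not taken from passt; the function names are made up for illustration) that decodes the CapEff mask from the text of /proc/<pid>/status and tests the CAP_SYS_ADMIN bit. The bit number 21 comes from <linux/capability.h>. Note that the mask only says what the process holds in its own user namespace; it says nothing about the namespace that owns a target netns.

```python
# Bit index of CAP_SYS_ADMIN, as defined in <linux/capability.h>.
CAP_SYS_ADMIN = 21


def effective_caps(status_text):
    """Return the CapEff bitmask parsed from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal mask without a 0x prefix.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """Return True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


# Hypothetical sample input, standing in for a real /proc/self/status.
sample = "Name:\tpasta\nCapEff:\t0000000000000000\n"
print("CAP_SYS_ADMIN effective:",
      has_cap(effective_caps(sample), CAP_SYS_ADMIN))
```

On a Linux host, passing the contents of /proc/self/status instead of the sample string gives the answer for the current process.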
For pasta joining an existing namespace (the apparently intended use case), we'll actually fail before we get to that point: in conf_ns_check() we'll attempt to join the target network namespace. This also requires CAP_SYS_ADMIN in both our current user namespace and the user namespace which owns the target network namespace. Again, since we've dropped capabilities in our original namespace this will never be the case.

...however, we can also have UID 0 in a non-init user namespace, and that will work.

Podman creates a network namespace (with a filesystem handle), starts slirp4netns (or pasta, in the integration draft) as UID 0 in a new user namespace, pointing it to the network namespace:

# ps aux|grep pasta
sbrivio 2283703 0.0 0.0 2070672 56468 pts/10 Sl+ Jul19 0:40 ./bin/podman run --net=pasta:-T,5213-5214,-U,5213-5214 -p 5203-5204:5203-5204/tcp -p 5203-5204:5203-5204/udp --rm -ti alpine sh
sbrivio 2283760 0.1 0.0 85300 51120 ? Ss Jul19 0:57 /usr/bin/pasta --config-net -u 5203:5203 -t 5203:5203 -T 5213-5214 -U 5213-5214 /run/user/1000/netns/netns-3b6147d8-34e1-a516-87c3-631938a1973e
# readlink /proc/2283703/ns/net
net:[4026531992]
# readlink /proc/2283760/ns/net
net:[4026531992]
# readlink /proc/2283703/ns/user
user:[4026533032]
# readlink /proc/2283760/ns/user
user:[4026533032]

It's equivalent to this example (for convenience, with PIDs instead of filesystem handles):

---
[TTY #0]
$ unshare -Ur
# echo $$
4117948

[TTY #1]
$ nsenter --preserve-credentials -U -t 4117948
# unshare -n
# ip li sh
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# echo $$
4126920

[TTY #0]
# ./pasta -f --netns-only 4126920
Outbound interface: enp9s0, namespace interface: enp9s0
ARP:
    address: a8:a1:59:8e:d7:b6
DHCP:
    assign: 88.198.0.164
    mask: 255.255.255.224
    router: 88.198.0.161
DNS:
    185.12.64.1
    185.12.64.2
NDP/DHCPv6:
    assign: 2a01:4f8:222:904::2
    router: fe80::1
    our link-local: fe80::aaa1:59ff:fe8e:d7b6
DNS:
    2a01:4ff:ff00::add:2
    2a01:4ff:ff00::add:1

[TTY #1]
# ip li sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp9s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:c0:09:fe:89:c3 brd ff:ff:ff:ff:ff:ff
---

Unrelated to the Podman case: you can also do this and let pasta spawn an interactive shell with its network namespace (also created by pasta) detached:

---
$ unshare -Ur
# ./pasta --netns-only
Cannot set ping_group_range, ICMP requests might fail
$ ip li sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp9s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether a8:a1:59:8e:d7:b6 brd ff:ff:ff:ff:ff:ff
---

...if you then log out from this shell, it will hang:

openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/6500/ns/net", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)

but that's a separate issue (which I just discovered).

--
Stefano

This is what happens in the Podman integration case. Unfortunately the demo is broken at the moment (I had to rebase the patch with a bit of care, I'll publish the updated one soon).

Can you explain a bit more about what the podman use case is, and why it requires the netns only logic?
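
The condition proposed earlier in this message (only spawn a shell when we have UID 0 in a non-init user namespace) could be checked roughly as in the sketch below. This is an illustration in Python, not the passt implementation. The function names are hypothetical, and it relies on the assumption that the initial user namespace is the one whose uid_map is the single identity mapping "0 0 4294967295". That is a common heuristic, not a guarantee.

```python
# Identity mapping normally shown in /proc/self/uid_map for the initial
# user namespace: inside ID 0, outside ID 0, range covering all 32-bit IDs.
INIT_NS_MAP = (0, 0, 4294967295)


def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside, outside, count) tuples."""
    return [tuple(int(field) for field in line.split())
            for line in text.splitlines() if line.strip()]


def looks_like_init_userns(uid_map_text):
    """Heuristic: True if the map is the single full identity mapping."""
    return parse_id_map(uid_map_text) == [INIT_NS_MAP]


def may_spawn_shell(euid, uid_map_text):
    """Sketch of the proposed rule: UID 0, and not the init namespace."""
    return euid == 0 and not looks_like_init_userns(uid_map_text)


# Hypothetical map as it might look after 'unshare -Ur' run by UID 1000.
print(may_spawn_shell(0, "         0       1000          1\n"))
```

On a real system the two inputs would be os.geteuid() and the contents of /proc/self/uid_map.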