Here are some cleanup, as promised here: https://listman.redhat.com/archives/libvir-list/2023-February/237721.html Now, there are still some patches missing. Firstly, we still don't really capture error from passt. My suggestion was to wait for socket to show up with errfd open. But active wait is viewed as undesirable [1]. Secondly, Laine reported SELinux issues. Yeah, I don't see us setting SELinux label on nor socket that passt and QEMU talk to each other, nor on the log file. Speaking of which - we usually have per domain (or per helper daemon instance even) log file, while for passt we have a global one (/var/log/passt.log). I don't think that will fly if two or more SELinux enabled domains want to use passt. Thirdly, Stefano suggested a graceful shutdown for passt: have libvirt connect to the socket and close it. Since we pass --one-off, this should singal passt to exit. But I haven't implemented that because it's redundant. We can't rely on passt quitting itself and thus use the big gun (virPidFileForceCleanupPath()) at which point, the socket way is just an optimization. I might look into the first two, at some point. But not today. 1: https://listman.redhat.com/archives/libvir-list/2023-February/237663.html Michal Prívozník (4): Revert "qemu: allow passt to self-daemonize" qemu_extdevice: Make qemuExtDevicesHasDevice() check def->nets qemu_passt: Report error when getting passt PID failed qemu_passt: Don't let passt fork off src/qemu/qemu_extdevice.c | 11 +++++++++++ src/qemu/qemu_passt.c | 15 ++++++++++----- 2 files changed, 21 insertions(+), 5 deletions(-) -- 2.39.1
This reverts commit 0c4e716835eaf2a575bd063fde074c0fc7c4e4d4. This patch was pushed by my mistake. Even though it got ACKed on the list, I've raised couple of issues with it. They will be fixed in next commits. Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com> --- src/qemu/qemu_passt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c index f640a69c00..0f09bf3db8 100644 --- a/src/qemu/qemu_passt.c +++ b/src/qemu/qemu_passt.c @@ -141,23 +141,24 @@ qemuPasstStart(virDomainObj *vm, g_autofree char *passtSocketName = qemuPasstCreateSocketPath(vm, net); g_autoptr(virCommand) cmd = NULL; g_autofree char *pidfile = qemuPasstCreatePidFilename(vm, net); - g_autofree char *errbuf = NULL; char macaddr[VIR_MAC_STRING_BUFLEN]; size_t i; pid_t pid = (pid_t) -1; int exitstatus = 0; int cmdret = 0; + VIR_AUTOCLOSE errfd = -1; cmd = virCommandNew(PASST); virCommandClearCaps(cmd); - virCommandSetErrorBuffer(cmd, &errbuf); + virCommandSetPidFile(cmd, pidfile); + virCommandSetErrorFD(cmd, &errfd); + virCommandDaemonize(cmd); virCommandAddArgList(cmd, "--one-off", "--socket", passtSocketName, "--mac-addr", virMacAddrFormat(&net->mac, macaddr), - "--pid", pidfile, NULL); if (net->mtu) { @@ -263,7 +264,7 @@ qemuPasstStart(virDomainObj *vm, if (cmdret < 0 || exitstatus != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, - _("Could not start 'passt': %s"), errbuf); + _("Could not start 'passt'. exitstatus: %d"), exitstatus); goto error; } -- 2.39.1
On 2/14/23 6:51 AM, Michal Privoznik wrote:This reverts commit 0c4e716835eaf2a575bd063fde074c0fc7c4e4d4. This patch was pushed by my mistake. Even though it got ACKed on the list, I've raised couple of issues with it. They will be fixed in next commits.I admire your optimism :-)Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>Reviewed-by: Laine Stump <laine(a)redhat.com>--- src/qemu/qemu_passt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c index f640a69c00..0f09bf3db8 100644 --- a/src/qemu/qemu_passt.c +++ b/src/qemu/qemu_passt.c @@ -141,23 +141,24 @@ qemuPasstStart(virDomainObj *vm, g_autofree char *passtSocketName = qemuPasstCreateSocketPath(vm, net); g_autoptr(virCommand) cmd = NULL; g_autofree char *pidfile = qemuPasstCreatePidFilename(vm, net); - g_autofree char *errbuf = NULL; char macaddr[VIR_MAC_STRING_BUFLEN]; size_t i; pid_t pid = (pid_t) -1; int exitstatus = 0; int cmdret = 0; + VIR_AUTOCLOSE errfd = -1; cmd = virCommandNew(PASST); virCommandClearCaps(cmd); - virCommandSetErrorBuffer(cmd, &errbuf); + virCommandSetPidFile(cmd, pidfile); + virCommandSetErrorFD(cmd, &errfd); + virCommandDaemonize(cmd); virCommandAddArgList(cmd, "--one-off", "--socket", passtSocketName, "--mac-addr", virMacAddrFormat(&net->mac, macaddr), - "--pid", pidfile, NULL); if (net->mtu) { @@ -263,7 +264,7 @@ qemuPasstStart(virDomainObj *vm, if (cmdret < 0 || exitstatus != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, - _("Could not start 'passt': %s"), errbuf); + _("Could not start 'passt'. exitstatus: %d"), exitstatus); goto error; }
We can have external helper processes running for domain <interface/> too (e.g. slirp or passt). But this is not reflected in qemuExtDevicesHasDevice() which simply ignores these. Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com> --- src/qemu/qemu_extdevice.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/qemu/qemu_extdevice.c b/src/qemu/qemu_extdevice.c index fdefe59215..47e97f3565 100644 --- a/src/qemu/qemu_extdevice.c +++ b/src/qemu/qemu_extdevice.c @@ -296,6 +296,17 @@ qemuExtDevicesHasDevice(virDomainDef *def) return true; } + for (i = 0; i < def->nnets; i++) { + virDomainNetDef *net = def->nets[i]; + + if (QEMU_DOMAIN_NETWORK_PRIVATE(net)->slirp) + return true; + + if (net->type == VIR_DOMAIN_NET_TYPE_USER && + net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) + return true; + } + for (i = 0; i < def->ntpms; i++) { if (def->tpms[i]->type == VIR_DOMAIN_TPM_TYPE_EMULATOR) return true; -- 2.39.1
On 2/14/23 6:51 AM, Michal Privoznik wrote:We can have external helper processes running for domain <interface/> too (e.g. slirp or passt). But this is not reflected in qemuExtDevicesHasDevice() which simply ignores these.The slirp-helper patches missed adding the check in this (oddly-named) function (even while adding in the correct hunk to qemuExtDevicesSetupCroup()) probably because it wasn't really obvious without reading/interpreting/understanding all the code in two separate files that it was needed; my passt patches missed adding the check in this function because I was following the pattern of what was done for slirp, and slirp hadn't touched this function (nor had it touched the function that calls both of these functions, qemuSetupCgroupForExtDevices(), which is in another file). It's reasonable to think that some future person may also not notice qemuExtDevicesHasDevice(), and believe that they only need to modify qemuExtDevicesSetupCgroup(). Anyway, my point is that I think this could be avoided by adding a comment in qemuExtDevicesSetupCgroup() that points out it is only called if qemuExtDevicesHasDevice() returns true, and so any addition to qemuExtDevicesSetupCgroup() should have a corresponding addition to qemuExtDevicesHasDevice(). It's too late at night / early in the morning for my brain to compose a concise sentence to this effect, but it would make me happy if you added one before pushing.Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>Reviewed-by: Laine Stump <laine(a)redhat.com>--- src/qemu/qemu_extdevice.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/qemu/qemu_extdevice.c b/src/qemu/qemu_extdevice.c index fdefe59215..47e97f3565 100644 --- a/src/qemu/qemu_extdevice.c +++ b/src/qemu/qemu_extdevice.c @@ -296,6 +296,17 @@ qemuExtDevicesHasDevice(virDomainDef *def) return true; } + for (i = 0; i < def->nnets; i++) { + virDomainNetDef *net = def->nets[i]; + + if (QEMU_DOMAIN_NETWORK_PRIVATE(net)->slirp) + return true; + + if (net->type == VIR_DOMAIN_NET_TYPE_USER && + net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST) + return true; + } + for (i = 0; i < def->ntpms; i++) { if (def->tpms[i]->type == VIR_DOMAIN_TPM_TYPE_EMULATOR) return true;
On 2/15/23 08:22, Laine Stump wrote:On 2/14/23 6:51 AM, Michal Privoznik wrote:Sounds good, but I'd rather do that in a follow up patch, as it's a different semantic change than this commit. Done here: https://listman.redhat.com/archives/libvir-list/2023-February/237863.htmlWe can have external helper processes running for domain <interface/> too (e.g. slirp or passt). But this is not reflected in qemuExtDevicesHasDevice() which simply ignores these.The slirp-helper patches missed adding the check in this (oddly-named) function (even while adding in the correct hunk to qemuExtDevicesSetupCroup()) probably because it wasn't really obvious without reading/interpreting/understanding all the code in two separate files that it was needed; my passt patches missed adding the check in this function because I was following the pattern of what was done for slirp, and slirp hadn't touched this function (nor had it touched the function that calls both of these functions, qemuSetupCgroupForExtDevices(), which is in another file). It's reasonable to think that some future person may also not notice qemuExtDevicesHasDevice(), and believe that they only need to modify qemuExtDevicesSetupCgroup(). Anyway, my point is that I think this could be avoided by adding a comment in qemuExtDevicesSetupCgroup() that points out it is only called if qemuExtDevicesHasDevice() returns true, and so any addition to qemuExtDevicesSetupCgroup() should have a corresponding addition to qemuExtDevicesHasDevice(). It's too late at night / early in the morning for my brain to compose a concise sentence to this effect, but it would make me happy if you added one before pushing.MichalSigned-off-by: Michal Privoznik <mprivozn(a)redhat.com>Reviewed-by: Laine Stump <laine(a)redhat.com>
If qemuPasstGetPid() fails, or the passt's PID is -1 then qemuPasstSetupCgroup() returns early without any error message set. Report an appropriate error. Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com> --- src/qemu/qemu_passt.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c index 0f09bf3db8..78830fdc26 100644 --- a/src/qemu/qemu_passt.c +++ b/src/qemu/qemu_passt.c @@ -125,8 +125,11 @@ qemuPasstSetupCgroup(virDomainObj *vm, { pid_t pid = (pid_t) -1; - if (qemuPasstGetPid(vm, net, &pid) < 0 || pid <= 0) + if (qemuPasstGetPid(vm, net, &pid) < 0 || pid <= 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not get process ID of passt")); return -1; + } return virCgroupAddProcess(cgroup, pid); } -- 2.39.1
On 2/14/23 6:51 AM, Michal Privoznik wrote:If qemuPasstGetPid() fails, or the passt's PID is -1 then qemuPasstSetupCgroup() returns early without any error message set. Report an appropriate error. Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>Reviewed-by: Laine Stump <laine(a)redhat.com>--- src/qemu/qemu_passt.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c index 0f09bf3db8..78830fdc26 100644 --- a/src/qemu/qemu_passt.c +++ b/src/qemu/qemu_passt.c @@ -125,8 +125,11 @@ qemuPasstSetupCgroup(virDomainObj *vm, { pid_t pid = (pid_t) -1; - if (qemuPasstGetPid(vm, net, &pid) < 0 || pid <= 0) + if (qemuPasstGetPid(vm, net, &pid) < 0 || pid <= 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Could not get process ID of passt")); return -1; + } return virCgroupAddProcess(cgroup, pid); }
When passt starts it tries to do some security measures to restrict itself. For instance, it creates its own namespaces, umounts basically everything, drops capabilities, forks off to further restrict itself (the child is where all interesting work takes place now). This is sound, except it's causing two problems: 1) the PID file FD, which we leak into the passt process, gets closed (and thus our virPidFile*() helpers see unlocked PID file, which makes them think the process is gone), 2) the PID file no longer reflects true PID of the process. Worse, the child calls setsid() so we can't even kill the whole process group. I mean, we can but it won't be any good. Fortunately, passt has '--foreground' argument, which causes it to undergo the same security measures but without forking off the child. This in turn means, that the PID file FD won't get closed and the PID file itself contains the correct PID. Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com> --- src/qemu/qemu_passt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c index 78830fdc26..441cfe87e8 100644 --- a/src/qemu/qemu_passt.c +++ b/src/qemu/qemu_passt.c @@ -159,6 +159,7 @@ qemuPasstStart(virDomainObj *vm, virCommandDaemonize(cmd); virCommandAddArgList(cmd, + "--foreground", "--one-off", "--socket", passtSocketName, "--mac-addr", virMacAddrFormat(&net->mac, macaddr), -- 2.39.1
On Tue, 14 Feb 2023 12:51:22 +0100 Michal Privoznik <mprivozn(a)redhat.com> wrote:When passt starts it tries to do some security measures to restrict itself. For instance, it creates its own namespaces, umounts basically everything, drops capabilities, forks off to further restrict itself (the child is where all interesting work takes place now). This is sound, except it's causing two problems: 1) the PID file FD, which we leak into the passt process, gets closed (and thus our virPidFile*() helpers see unlocked PID file, which makes them think the process is gone),I didn't realise this was the case, but giving passt write (unless I'm missing something) access to a file created by libvirtd doesn't look desirable to me.2) the PID file no longer reflects true PID of the process. Worse, the child calls setsid() so we can't even kill the whole process group. I mean, we can but it won't be any good. Fortunately, passt has '--foreground' argument, which causes it to undergo the same security measures but without forking off the child.They're not the same -- unfortunately they can't be, because, on Linux, you can't change the PID of an existing process, so there's no way to enter a new PID namespace without clone(). If passt remains in the same PID namespace, it's still able to see PIDs of other processes, which is not desirable from a security perspective. Again from a security perspective, this is probably a small impact, so I guess it's fine if there's no other way around it. But I see a lot of ways around it... -- Stefano
On 2/14/23 14:02, Stefano Brivio wrote:On Tue, 14 Feb 2023 12:51:22 +0100 Michal Privoznik <mprivozn(a)redhat.com> wrote:That's not what's happening here. In fact, the opposite. We let libvirt create the file, lock and write it. Then the file's FD is leaked into passt process. This way, we can learn whether the PID file is still valid: if we just try to lock it and: a) fail, then the file is still locked, i.e. passt is still running, b) succeed, then the file is no longer locket by passt, i.e. it's no longer running. Thing is, passt doesn't even know about the FD (unless it starts closing FDs "randomly", e.g. via closefrom(3) or an alternative). This is the reason we can't use passt's --pid, because it writes the PID file without any locking dance. Therefore, libvirt can't know whether the PID file is still valid. If it did just read the file and killed PID from there it might kill a random process that was unfortunate to be assigned the same PID. BTW, there are two other issues with PID files in passt: 1) no locking means, that the file is very easily overwritten: $ passt --pid /tmp/passt.pid 2>/dev/null & \ passt --pid /tmp/passt.pid 2>/dev/null; \ pgrep passt ; cat /tmp/passt.pid [1] 21029 [1]+ Done passt --pid /tmp/passt.pid 2> /dev/null 21034 21035 21035 2) with --daemonize, only the PID of the parent process is written which is pretty much useless, as the parent quits shortly after clone(). Now, libvirt's virPidFile* machinery (src/util/virpidfile.c) ensures that case 1) won't happen (yes, we have internal APIs that read pid files unlocked, but those are not used from this code we're talking about). And this patch tries to fix the case 2) by instructing passt to not clone(). If it means that it can't use PID namespace, then so be it. It's better to not let processes behind.When passt starts it tries to do some security measures to restrict itself. For instance, it creates its own namespaces, umounts basically everything, drops capabilities, forks off to further restrict itself (the child is where all interesting work takes place now). This is sound, except it's causing two problems: 1) the PID file FD, which we leak into the passt process, gets closed (and thus our virPidFile*() helpers see unlocked PID file, which makes them think the process is gone),I didn't realise this was the case, but giving passt write (unless I'm missing something) access to a file created by libvirtd doesn't look desirable to me.I understand that, but it's already confined in plenty other ways. It's way worse to leave a process running than being able to see other PIDs.2) the PID file no longer reflects true PID of the process. Worse, the child calls setsid() so we can't even kill the whole process group. I mean, we can but it won't be any good. Fortunately, passt has '--foreground' argument, which causes it to undergo the same security measures but without forking off the child.They're not the same -- unfortunately they can't be, because, on Linux, you can't change the PID of an existing process, so there's no way to enter a new PID namespace without clone(). If passt remains in the same PID namespace, it's still able to see PIDs of other processes, which is not desirable from a security perspective.Again from a security perspective, this is probably a small impact, so I guess it's fine if there's no other way around it. But I see a lot of ways around it...Is there? If we have a helper process that fork()-s then the only way we can 'track' its PIDs (and kill them) is by placing them into a CGroup. Until we do that we have to trust helper processes to behave properly. But that's something I'd like to avoid. Michal
On Tue, 14 Feb 2023 16:30:17 +0100 Michal Prívozník <mprivozn(a)redhat.com> wrote:On 2/14/23 14:02, Stefano Brivio wrote:Okay, but the outcome is exactly what I'm saying. You're using advisory locking, what prevents passt from writing (a lot of bytes) to it? I just followed the path from open() to fcntl(), maybe I'm missing something.On Tue, 14 Feb 2023 12:51:22 +0100 Michal Privoznik <mprivozn(a)redhat.com> wrote:That's not what's happening here. In fact, the opposite. We let libvirt create the file, lock and write it.When passt starts it tries to do some security measures to restrict itself. For instance, it creates its own namespaces, umounts basically everything, drops capabilities, forks off to further restrict itself (the child is where all interesting work takes place now). This is sound, except it's causing two problems: 1) the PID file FD, which we leak into the passt process, gets closed (and thus our virPidFile*() helpers see unlocked PID file, which makes them think the process is gone),I didn't realise this was the case, but giving passt write (unless I'm missing something) access to a file created by libvirtd doesn't look desirable to me.Then the file's FD is leaked into passt process. This way, we can learn whether the PID file is still valid: if we just try to lock it and: a) fail, then the file is still locked, i.e. passt is still running, b) succeed, then the file is no longer locket by passt, i.e. it's no longer running. Thing is, passt doesn't even know about the FD (unless it starts closing FDs "randomly", e.g. via closefrom(3) or an alternative).But it can also use it for write(2), as long as the attacker knows (quite obvious information) that libvirt is starting passt. The number of that file descriptor is deterministic, too.This is the reason we can't use passt's --pid, because it writes the PID file without any locking dance. Therefore, libvirt can't know whether the PID file is still valid. If it did just read the file and killed PID from there it might kill a random process that was unfortunate to be assigned the same PID.I don't think that's enough, passt could pass a copy of that file descriptor to another process (via e.g. SCM_RIGHTS), which then locks it. It's enough to cover the unfortunate random process, but not something done intentionally. That's the reason why pidfd_open(2) and CLONE_PIDFD were introduced.BTW, there are two other issues with PID files in passt: 1) no locking means, that the file is very easily overwritten: $ passt --pid /tmp/passt.pid 2>/dev/null & \ passt --pid /tmp/passt.pid 2>/dev/null; \ pgrep passt ; cat /tmp/passt.pid [1] 21029 [1]+ Done passt --pid /tmp/passt.pid 2> /dev/null 21034 21035 21035It can be overwritten even with advisory locking, and mandatory locking isn't implemented in recent kernels. I'm not aware of any way to prevent this (and I wouldn't want to give the illusion that this is a "secure" mechanism, either).2) with --daemonize, only the PID of the parent process is written which is pretty much useless, as the parent quits shortly after clone().No, that's the PID of the child (of course!), see passt's __daemon() (util.c). It's written by the parent, sure, because the child doesn't know its own "real" PID.Now, libvirt's virPidFile* machinery (src/util/virpidfile.c) ensures that case 1) won't happen (yes, we have internal APIs that read pid files unlocked, but those are not used from this code we're talking about). And this patch tries to fix the case 2) by instructing passt to not clone(). If it means that it can't use PID namespace, then so be it. It's better to not let processes behind.Even though 2) is nothing you need to fix, I'd still suggest to *not* use that option *for this purpose*, because it's prone to races.Not if somebody manages to find a creative way to attack other processes by arbitrary code execution in passt. In any case, there's no need to leave any process running.I understand that, but it's already confined in plenty other ways. It's way worse to leave a process running than being able to see other PIDs.2) the PID file no longer reflects true PID of the process. Worse, the child calls setsid() so we can't even kill the whole process group. I mean, we can but it won't be any good. Fortunately, passt has '--foreground' argument, which causes it to undergo the same security measures but without forking off the child.They're not the same -- unfortunately they can't be, because, on Linux, you can't change the PID of an existing process, so there's no way to enter a new PID namespace without clone(). If passt remains in the same PID namespace, it's still able to see PIDs of other processes, which is not desirable from a security perspective.Yes, see my email on the other thread, archived at: https://listman.redhat.com/archives/libvir-list/2023-February/237741.html That is: just connect() and close() the socket if qemu doesn't do it. That's not more or less reliable than expecting passt to do the right thing when you give --foreground.Again from a security perspective, this is probably a small impact, so I guess it's fine if there's no other way around it. But I see a lot of ways around it...Is there?If we have a helper process that fork()-s then the only way we can 'track' its PIDs (and kill them) is by placing them into a CGroup. Until we do that we have to trust helper processes to behave properly. But that's something I'd like to avoid.By the way, you need in any case to rely on --one-off as qemu terminates, because passt should run even without libvirtd running, as long as the guest is connected (for details about this reasoning, see https://bugzilla.redhat.com/show_bug.cgi?id=1597326). -- Stefano
On 2/14/23 8:02 AM, Stefano Brivio wrote:On Tue, 14 Feb 2023 12:51:22 +0100 Michal Privoznik <mprivozn(a)redhat.com> wrote:When passt starts it tries to do some security measures to restrict itself. For instance, it creates its own namespaces, umounts basically everything, drops capabilities, forks off to further restrict itself (the child is where all interesting work takes place now). This is sound, except it's causing two problems: 1) the PID file FD, which we leak into the passt process, gets closed (and thus our virPidFile*() helpers see unlocked PID file, which makes them think the process is gone),I didn't realise this was the case, but giving passt write (unless I'm missing something) access to a file created by libvirtd doesn't look desirable to me.> 2) the PID file no longer reflects true PID of the process. > > Worse, the child calls setsid() so we can't even kill the whole > process group. I mean, we can but it won't be any good.I think that (incorrect PID in the pidfile) is happening because Michal is using the original version of my patches that were pushed - I had mimicked the behavior of slirp, where libvirt deamonizes the new process. If that process then daemonizes itself, we have some sort of "double daemon"; libvirt has saved off the pid of what it thinks is going to be the final process, but then that process further forks and exits from the process whose pid libvirt saved. But because passt was cleaning up after itself I hadn't noticed the discrepancy in pids when testing. Without going into all the details of the pidfile and locking and etc, I just want to say that if we can fork/exec dnsmasq and let it daemonize itself and create its own pidfile, then certainly we can do the same thing for passt. (and if there's a fundamental problem, then it's a fundamental problem for dnsmasq as well).> > Fortunately, passt has '--foreground' argument, which causes it > to undergo the same security measures but without forking off the > child.But if we do --foreground in combination with calling virCommandDaemonize(), then won't we still have the problem that libvirt won't know whether or not passt has failed to start (not unless we want to put in a bunch of gorp to continue grabbing stderr while watching to see if the passt process has exited, etc. It's going to take some convincing for me to think we should run passt with --foreground rather than letting it daemonize itself.They're not the same -- unfortunately they can't be, because, on Linux, you can't change the PID of an existing process, so there's no way to enter a new PID namespace without clone(). If passt remains in the same PID namespace, it's still able to see PIDs of other processes, which is not desirable from a security perspective. Again from a security perspective, this is probably a small impact, so I guess it's fine if there's no other way around it. But I see a lot of ways around it...
On 2/15/23 08:50, Laine Stump wrote:On 2/14/23 8:02 AM, Stefano Brivio wrote:Alright. I think I have a solution that would please everybody involved. I'll post it tomorrow though. I need to test it thoroughly. We would be able to get passt's PID (which is needed not only for killing it, but also for CGroup placement), NOT use --foreground and still pass errors from it to users (that is unless logfile was specified, because unfortunately, --log-file and --stderr are mutually exclusive). MichalOn Tue, 14 Feb 2023 12:51:22 +0100 Michal Privoznik <mprivozn(a)redhat.com> wrote:When passt starts it tries to do some security measures to restrict itself. For instance, it creates its own namespaces, umounts basically everything, drops capabilities, forks off to further restrict itself (the child is where all interesting work takes place now). This is sound, except it's causing two problems: 1) the PID file FD, which we leak into the passt process, gets closed (and thus our virPidFile*() helpers see unlocked PID file, which makes them think the process is gone),I didn't realise this was the case, but giving passt write (unless I'm missing something) access to a file created by libvirtd doesn't look desirable to me.> 2) the PID file no longer reflects true PID of the process. > > Worse, the child calls setsid() so we can't even kill the whole > process group. I mean, we can but it won't be any good.I think that (incorrect PID in the pidfile) is happening because Michal is using the original version of my patches that were pushed - I had mimicked the behavior of slirp, where libvirt deamonizes the new process. If that process then daemonizes itself, we have some sort of "double daemon"; libvirt has saved off the pid of what it thinks is going to be the final process, but then that process further forks and exits from the process whose pid libvirt saved. But because passt was cleaning up after itself I hadn't noticed the discrepancy in pids when testing. Without going into all the details of the pidfile and locking and etc, I just want to say that if we can fork/exec dnsmasq and let it daemonize itself and create its own pidfile, then certainly we can do the same thing for passt. (and if there's a fundamental problem, then it's a fundamental problem for dnsmasq as well).
On 2/15/23 12:04 PM, Michal Prívozník wrote:On 2/15/23 08:50, Laine Stump wrote:This last item is fixed by patches I have pending to passt itself.On 2/14/23 8:02 AM, Stefano Brivio wrote:Alright. I think I have a solution that would please everybody involved. I'll post it tomorrow though. I need to test it thoroughly. We would be able to get passt's PID (which is needed not only for killing it, but also for CGroup placement), NOT use --foreground and still pass errors from it to users (that is unless logfile was specified, because unfortunately, --log-file and --stderr are mutually exclusive).On Tue, 14 Feb 2023 12:51:22 +0100 Michal Privoznik <mprivozn(a)redhat.com> wrote:When passt starts it tries to do some security measures to restrict itself. For instance, it creates its own namespaces, umounts basically everything, drops capabilities, forks off to further restrict itself (the child is where all interesting work takes place now). This is sound, except it's causing two problems: 1) the PID file FD, which we leak into the passt process, gets closed (and thus our virPidFile*() helpers see unlocked PID file, which makes them think the process is gone),I didn't realise this was the case, but giving passt write (unless I'm missing something) access to a file created by libvirtd doesn't look desirable to me.> 2) the PID file no longer reflects true PID of the process. > > Worse, the child calls setsid() so we can't even kill the whole > process group. I mean, we can but it won't be any good.I think that (incorrect PID in the pidfile) is happening because Michal is using the original version of my patches that were pushed - I had mimicked the behavior of slirp, where libvirt deamonizes the new process. If that process then daemonizes itself, we have some sort of "double daemon"; libvirt has saved off the pid of what it thinks is going to be the final process, but then that process further forks and exits from the process whose pid libvirt saved. But because passt was cleaning up after itself I hadn't noticed the discrepancy in pids when testing. Without going into all the details of the pidfile and locking and etc, I just want to say that if we can fork/exec dnsmasq and let it daemonize itself and create its own pidfile, then certainly we can do the same thing for passt. (and if there's a fundamental problem, then it's a fundamental problem for dnsmasq as well).
On Wed, 15 Feb 2023 18:04:56 +0100 Michal Prívozník <mprivozn(a)redhat.com> wrote:On 2/15/23 08:50, Laine Stump wrote:That doesn't need to be the case (--log-file and --stderr being mutually exclusive)... if you have a use case for it, let's change that in passt. I just wanted to keep it simple for users ("give a log file, and be sure it won't spam"). Also mind that Laine's series: https://archives.passt.top/passt-dev/20230215082437.110151-1-laine@redhat.c… *should* already cover all the cases where libvirt is interested in relaying "early" errors back to the user. By the way, the one below is pretty much the patch I would have proposed for libvirt. I prepared it earlier today and didn't have a chance to test it yet, it's compile-tested only, and doesn't take cgroups into account (which, it seems, is needed no matter the lifecycle). So I'm sharing it here as reference (that's how simple I wanted it to be -- minus cgroups), or if it's convenient for you to copy and paste something. --- diff --git a/src/qemu/qemu_extdevice.c b/src/qemu/qemu_extdevice.c index fdefe59215..23d25c134a 100644 --- a/src/qemu/qemu_extdevice.c +++ b/src/qemu/qemu_extdevice.c @@ -337,12 +337,6 @@ qemuExtDevicesSetupCgroup(virQEMUDriver *driver, if (slirp && qemuSlirpSetupCgroup(slirp, cgroup) < 0) return -1; - - if (net->type == VIR_DOMAIN_NET_TYPE_USER && - net->backend.type == VIR_DOMAIN_NET_BACKEND_PASST && - qemuPasstSetupCgroup(vm, net, cgroup) < 0) { - return -1; - } } for (i = 0; i < def->ntpms; i++) { diff --git a/src/qemu/qemu_passt.c b/src/qemu/qemu_passt.c index f640a69c00..2327b3e25e 100644 --- a/src/qemu/qemu_passt.c +++ b/src/qemu/qemu_passt.c @@ -28,29 +28,12 @@ #include "virerror.h" #include "virjson.h" #include "virlog.h" -#include "virpidfile.h" #define VIR_FROM_THIS VIR_FROM_NONE VIR_LOG_INIT("qemu.passt"); -static char * -qemuPasstCreatePidFilename(virDomainObj *vm, - virDomainNetDef *net) -{ - qemuDomainObjPrivate *priv = vm->privateData; - virQEMUDriver *driver = priv->driver; - g_autofree char *shortName = virDomainDefGetShortName(vm->def); - g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); - g_autofree char *name = NULL; - - name = g_strdup_printf("%s-%s-passt", shortName, net->info.alias); - - return virPidFileBuildPath(cfg->passtStateDir, name); -} - - static char * qemuPasstCreateSocketPath(virDomainObj *vm, virDomainNetDef *net) @@ -65,17 +48,6 @@ qemuPasstCreateSocketPath(virDomainObj *vm, } -static int -qemuPasstGetPid(virDomainObj *vm, - virDomainNetDef *net, - pid_t *pid) -{ - g_autofree char *pidfile = qemuPasstCreatePidFilename(vm, net); - - return virPidFileReadPathIfLocked(pidfile, pid); -} - - int qemuPasstAddNetProps(virDomainObj *vm, virDomainNetDef *net, @@ -106,29 +78,32 @@ void qemuPasstStop(virDomainObj *vm, virDomainNetDef *net) { - g_autofree char *pidfile = qemuPasstCreatePidFilename(vm, net); - virErrorPtr orig_err; - - virErrorPreserveLast(&orig_err); - - if (virPidFileForceCleanupPath(pidfile) < 0) - VIR_WARN("Unable to kill passt process"); - - virErrorRestore(&orig_err); -} - + g_autofree char *passtSocketName = qemuPasstCreateSocketPath(vm, net); + struct sockaddr_un addr = { .sun_family = AF_UNIX }; + int fd; + + fd = socket(AF_UNIX, SOCK_STREAM, 0); + if (fd < 0) { + virReportError(errno, + "%s", _("Unable to open socket to connect to passt")); + return; + } -int -qemuPasstSetupCgroup(virDomainObj *vm, - virDomainNetDef *net, - virCgroup *cgroup) -{ - pid_t pid = (pid_t) -1; + if (virStrcpyStatic(addr.sun_path, passtSocketName) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Socket path %s too big for destination"), + passtSocketName); + goto out; + } - if (qemuPasstGetPid(vm, net, &pid) < 0 || pid <= 0) - return -1; + if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { + if (errno != ECONNREFUSED && errno != ENOENT) + virReportError(errno, + "%s", _("Unable to connect to passt to terminate it")); + } - return virCgroupAddProcess(cgroup, pid); + out: + VIR_FORCE_CLOSE(fd); } @@ -140,13 +115,9 @@ qemuPasstStart(virDomainObj *vm, virQEMUDriver *driver = priv->driver; g_autofree char *passtSocketName = qemuPasstCreateSocketPath(vm, net); g_autoptr(virCommand) cmd = NULL; - g_autofree char *pidfile = qemuPasstCreatePidFilename(vm, net); g_autofree char *errbuf = NULL; char macaddr[VIR_MAC_STRING_BUFLEN]; size_t i; - pid_t pid = (pid_t) -1; - int exitstatus = 0; - int cmdret = 0; cmd = virCommandNew(PASST); @@ -157,7 +128,6 @@ qemuPasstStart(virDomainObj *vm, "--one-off", "--socket", passtSocketName, "--mac-addr", virMacAddrFormat(&net->mac, macaddr), - "--pid", pidfile, NULL); if (net->mtu) { @@ -254,26 +224,15 @@ qemuPasstStart(virDomainObj *vm, virCommandAddArg(cmd, virBufferCurrentContent(&buf)); } - if (qemuExtDeviceLogCommand(driver, vm, cmd, "passt") < 0) return -1; - if (qemuSecurityCommandRun(driver, vm, cmd, -1, -1, &exitstatus, &cmdret) < 0) - goto error; - - if (cmdret < 0 || exitstatus != 0) { + /* passt forks once it's ready, terminates on connection closure */ + if (virCommandRun(cmd, NULL) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Could not start 'passt': %s"), errbuf); - goto error; + return -1; } return 0; - - error: - ignore_value(virPidFileReadPathIfLocked(pidfile, &pid)); - if (pid != -1) - virProcessKillPainfully(pid, true); - unlink(pidfile); - - return -1; } -- StefanoOn 2/14/23 8:02 AM, Stefano Brivio wrote:Alright. I think I have a solution that would please everybody involved. I'll post it tomorrow though. I need to test it thoroughly. We would be able to get passt's PID (which is needed not only for killing it, but also for CGroup placement), NOT use --foreground and still pass errors from it to users (that is unless logfile was specified, because unfortunately, --log-file and --stderr are mutually exclusive).On Tue, 14 Feb 2023 12:51:22 +0100 Michal Privoznik <mprivozn(a)redhat.com> wrote:When passt starts it tries to do some security measures to restrict itself. For instance, it creates its own namespaces, umounts basically everything, drops capabilities, forks off to further restrict itself (the child is where all interesting work takes place now). This is sound, except it's causing two problems: 1) the PID file FD, which we leak into the passt process, gets closed (and thus our virPidFile*() helpers see unlocked PID file, which makes them think the process is gone),I didn't realise this was the case, but giving passt write (unless I'm missing something) access to a file created by libvirtd doesn't look desirable to me.> 2) the PID file no longer reflects true PID of the process. > > Worse, the child calls setsid() so we can't even kill the whole > process group. I mean, we can but it won't be any good.I think that (incorrect PID in the pidfile) is happening because Michal is using the original version of my patches that were pushed - I had mimicked the behavior of slirp, where libvirt deamonizes the new process. If that process then daemonizes itself, we have some sort of "double daemon"; libvirt has saved off the pid of what it thinks is going to be the final process, but then that process further forks and exits from the process whose pid libvirt saved. But because passt was cleaning up after itself I hadn't noticed the discrepancy in pids when testing. Without going into all the details of the pidfile and locking and etc, I just want to say that if we can fork/exec dnsmasq and let it daemonize itself and create its own pidfile, then certainly we can do the same thing for passt. (and if there's a fundamental problem, then it's a fundamental problem for dnsmasq as well).
On 2/15/23 19:30, Stefano Brivio wrote:On Wed, 15 Feb 2023 18:04:56 +0100 Michal Prívozník <mprivozn(a)redhat.com> wrote:Thanks, this looks exactly like what we need. So for now I can just pass --stderr if there's no --log-file, to deal with those "releases" that don't have those patches merged yet.On 2/15/23 08:50, Laine Stump wrote:That doesn't need to be the case (--log-file and --stderr being mutually exclusive)... if you have a use case for it, let's change that in passt. I just wanted to keep it simple for users ("give a log file, and be sure it won't spam"). Also mind that Laine's series: https://archives.passt.top/passt-dev/20230215082437.110151-1-laine@redhat.c…On 2/14/23 8:02 AM, Stefano Brivio wrote:Alright. I think I have a solution that would please everybody involved. I'll post it tomorrow though. I need to test it thoroughly. We would be able to get passt's PID (which is needed not only for killing it, but also for CGroup placement), NOT use --foreground and still pass errors from it to users (that is unless logfile was specified, because unfortunately, --log-file and --stderr are mutually exclusive).On Tue, 14 Feb 2023 12:51:22 +0100 Michal Privoznik <mprivozn(a)redhat.com> wrote: > When passt starts it tries to do some security measures to > restrict itself. For instance, it creates its own namespaces, > umounts basically everything, drops capabilities, forks off to > further restrict itself (the child is where all interesting work > takes place now). This is sound, except it's causing two > problems: > > 1) the PID file FD, which we leak into the passt process, gets > closed (and thus our virPidFile*() helpers see unlocked PID > file, which makes them think the process is gone), I didn't realise this was the case, but giving passt write (unless I'm missing something) access to a file created by libvirtd doesn't look desirable to me.> 2) the PID file no longer reflects true PID of the process. > > Worse, the child calls setsid() so we can't even kill the whole > process group. I mean, we can but it won't be any good.I think that (incorrect PID in the pidfile) is happening because Michal is using the original version of my patches that were pushed - I had mimicked the behavior of slirp, where libvirt deamonizes the new process. If that process then daemonizes itself, we have some sort of "double daemon"; libvirt has saved off the pid of what it thinks is going to be the final process, but then that process further forks and exits from the process whose pid libvirt saved. But because passt was cleaning up after itself I hadn't noticed the discrepancy in pids when testing. Without going into all the details of the pidfile and locking and etc, I just want to say that if we can fork/exec dnsmasq and let it daemonize itself and create its own pidfile, then certainly we can do the same thing for passt. (and if there's a fundamental problem, then it's a fundamental problem for dnsmasq as well).*should* already cover all the cases where libvirt is interested in relaying "early" errors back to the user. By the way, the one below is pretty much the patch I would have proposed for libvirt. I prepared it earlier today and didn't have a chance to test it yet, it's compile-tested only, and doesn't take cgroups into account (which, it seems, is needed no matter the lifecycle). So I'm sharing it here as reference (that's how simple I wanted it to be -- minus cgroups), or if it's convenient for you to copy and paste something.This effectively disables placing passt into the CGroup set up for emulator thread. And I don't think we want that. Firstly, it makes statistics gathering report incorrect values. Secondly, these helper processes are "implementation detail" - I mean, users don't really care (from accounting POV) whether a task runs in emulator thread inside of QEMU or in a separate process. It's still an emulation and as such should be accounted for. And also, on NUMA machines we definitely want to place passt as close to the emulator as possible (i.e. if emulator thread is pinned than helper processes should be pinned too). Furthermore, it enhances security, because libvirt sets up devices controller in such way, that only devices from domain XML are allowed and everything else is forbidden. I could go on with other controllers but I believe you get the picture. Now true, for qemu:///session we don't set any CGroups as we lack the permissions to do so [1], and this is probably the target audience for this feature anyway, but for qemu:///system (when running libvirtd/virtqemud as root) we do set up CGroups and MUST place helper processes into them. I mean, if we are concerned about security (just look at the discussion about --foreground), then CGroups are definitely step in the right direction. 1: and even this might change in the future as there are some plans to let a privileged component create the CGroup for us (e.g. systemd). Michal
On Thu, Feb 16, 2023 at 09:52:27 +0100, Michal Prívozník wrote:On 2/15/23 19:30, Stefano Brivio wrote: > On Wed, 15 Feb 2023 18:04:56 +0100 > Michal Prívozník <mprivozn(a)redhat.com> wrote: >> On 2/15/23 08:50, Laine Stump wrote:[...]Just a side note. We are already at the point where we need infrastructure to pin/cgroup-place helper processes explicitly similarly to how we do this for the emulator thread. Originally the helper processes (e.g. the pr-helper) didn't do much so it didn't matter much where we placed it. With processes which do heavy lifting such as network communication or in case of the qemu-storage-daemon I'm going to implent we are in the ream of CPU hungry processes where it starts to make sense to dedicate CPU to it or separate it from the rest. The approach we pick should be generic enough so that we don't have to keep re-doing it for each helper process in the future. I plan to address that with the QSD work, but that will take some time. If you end up dealing with this sooner, please consider other helper daemons too.*should* already cover all the cases where libvirt is interested in relaying "early" errors back to the user. By the way, the one below is pretty much the patch I would have proposed for libvirt. I prepared it earlier today and didn't have a chance to test it yet, it's compile-tested only, and doesn't take cgroups into account (which, it seems, is needed no matter the lifecycle). So I'm sharing it here as reference (that's how simple I wanted it to be -- minus cgroups), or if it's convenient for you to copy and paste something.This effectively disables placing passt into the CGroup set up for emulator thread. And I don't think we want that. Firstly, it makes statistics gathering report incorrect values. Secondly, these helper processes are "implementation detail" - I mean, users don't really care (from accounting POV) whether a task runs in emulator thread inside of QEMU or in a separate process. It's still an emulation and as such should be accounted for. And also, on NUMA machines we definitely want to place passt as close to the emulator as possible (i.e. if emulator thread is pinned than helper processes should be pinned too).
On Thu, 16 Feb 2023 09:52:27 +0100 Michal Prívozník <mprivozn(a)redhat.com> wrote:On 2/15/23 19:30, Stefano Brivio wrote:I wouldn't even bother with that, the user base (especially with libvirt) is small enough that we can be quite confident that 100% of the users will upgrade as soon as a new release (most likely coming this week) includes them. I'd suggest to keep it simple, because you really can. Those are actual releases. Their naming just doesn't follow "semantic versioning".On Wed, 15 Feb 2023 18:04:56 +0100 Michal Prívozník <mprivozn(a)redhat.com> wrote:Thanks, this looks exactly like what we need. So for now I can just pass --stderr if there's no --log-file, to deal with those "releases" that don't have those patches merged yet.On 2/15/23 08:50, Laine Stump wrote:That doesn't need to be the case (--log-file and --stderr being mutually exclusive)... if you have a use case for it, let's change that in passt. I just wanted to keep it simple for users ("give a log file, and be sure it won't spam"). Also mind that Laine's series: https://archives.passt.top/passt-dev/20230215082437.110151-1-laine@redhat.c…On 2/14/23 8:02 AM, Stefano Brivio wrote: > On Tue, 14 Feb 2023 12:51:22 +0100 > Michal Privoznik <mprivozn(a)redhat.com> wrote: > >> When passt starts it tries to do some security measures to >> restrict itself. For instance, it creates its own namespaces, >> umounts basically everything, drops capabilities, forks off to >> further restrict itself (the child is where all interesting work >> takes place now). This is sound, except it's causing two >> problems: >> >> 1) the PID file FD, which we leak into the passt process, gets >> closed (and thus our virPidFile*() helpers see unlocked PID >> file, which makes them think the process is gone), > > I didn't realise this was the case, but giving passt write (unless I'm > missing something) access to a file created by libvirtd doesn't look > desirable to me. > >> 2) the PID file no longer reflects true PID of the process. >> >> Worse, the child calls setsid() so we can't even kill the whole >> process group. I mean, we can but it won't be any good. I think that (incorrect PID in the pidfile) is happening because Michal is using the original version of my patches that were pushed - I had mimicked the behavior of slirp, where libvirt deamonizes the new process. If that process then daemonizes itself, we have some sort of "double daemon"; libvirt has saved off the pid of what it thinks is going to be the final process, but then that process further forks and exits from the process whose pid libvirt saved. But because passt was cleaning up after itself I hadn't noticed the discrepancy in pids when testing. Without going into all the details of the pidfile and locking and etc, I just want to say that if we can fork/exec dnsmasq and let it daemonize itself and create its own pidfile, then certainly we can do the same thing for passt. (and if there's a fundamental problem, then it's a fundamental problem for dnsmasq as well).Alright. I think I have a solution that would please everybody involved. I'll post it tomorrow though. I need to test it thoroughly. We would be able to get passt's PID (which is needed not only for killing it, but also for CGroup placement), NOT use --foreground and still pass errors from it to users (that is unless logfile was specified, because unfortunately, --log-file and --stderr are mutually exclusive).Yes, definitely, I see now -- I thought, earlier, that cgroups were just used to handle lifecycles at the moment.*should* already cover all the cases where libvirt is interested in relaying "early" errors back to the user. By the way, the one below is pretty much the patch I would have proposed for libvirt. I prepared it earlier today and didn't have a chance to test it yet, it's compile-tested only, and doesn't take cgroups into account (which, it seems, is needed no matter the lifecycle). So I'm sharing it here as reference (that's how simple I wanted it to be -- minus cgroups), or if it's convenient for you to copy and paste something.This effectively disables placing passt into the CGroup set up for emulator thread. And I don't think we want that. Firstly, it makes statistics gathering report incorrect values. Secondly, these helper processes are "implementation detail" - I mean, users don't really care (from accounting POV) whether a task runs in emulator thread inside of QEMU or in a separate process. It's still an emulation and as such should be accounted for. And also, on NUMA machines we definitely want to place passt as close to the emulator as possible (i.e. if emulator thread is pinned than helper processes should be pinned too).[...]-- Stefano