On 2/15/23 19:30, Stefano Brivio wrote:
On Wed, 15 Feb 2023 18:04:56 +0100
Michal Prívozník <mprivozn(a)redhat.com> wrote:
On 2/15/23 08:50, Laine Stump wrote:
On 2/14/23 8:02 AM, Stefano Brivio wrote:
On Tue, 14 Feb 2023 12:51:22 +0100
Michal Privoznik <mprivozn(a)redhat.com> wrote:
> When passt starts it tries to do some security measures to
> restrict itself. For instance, it creates its own namespaces,
> umounts basically everything, drops capabilities, forks off to
> further restrict itself (the child is where all interesting work
> takes place now). This is sound, except it's causing two
> problems:
>
> 1) the PID file FD, which we leak into the passt process, gets
> closed (and thus our virPidFile*() helpers see unlocked PID
> file, which makes them think the process is gone),
I didn't realise this was the case, but giving passt write (unless I'm
missing something) access to a file created by libvirtd doesn't look
desirable to me.
> 2) the PID file no longer reflects true PID of the process.
>
> Worse, the child calls setsid() so we can't even kill the whole
> process group. I mean, we can but it won't be any good.
I think that (incorrect PID in the pidfile) is happening because Michal
is using the original version of my patches that were pushed - I had
mimicked the behavior of slirp, where libvirt deamonizes the new
process. If that process then daemonizes itself, we have some sort of
"double daemon"; libvirt has saved off the pid of what it thinks is
going to be the final process, but then that process further forks and
exits from the process whose pid libvirt saved. But because passt was
cleaning up after itself I hadn't noticed the discrepancy in pids when
testing.
Without going into all the details of the pidfile and locking and etc, I
just want to say that if we can fork/exec dnsmasq and let it daemonize
itself and create its own pidfile, then certainly we can do the same
thing for passt. (and if there's a fundamental problem, then it's a
fundamental problem for dnsmasq as well).
Alright. I think I have a solution that would please everybody involved.
I'll post it tomorrow though. I need to test it thoroughly. We would be
able to get passt's PID (which is needed not only for killing it, but
also for CGroup placement), NOT use --foreground and still pass errors
from it to users (that is unless logfile was specified, because
unfortunately, --log-file and --stderr are mutually exclusive).
That doesn't need to be the case (--log-file and --stderr being
mutually exclusive)... if you have a use case for it, let's change that
in passt. I just wanted to keep it simple for users ("give a log file,
and be sure it won't spam").
Also mind that Laine's series:
https://archives.passt.top/passt-dev/20230215082437.110151-1-laine@redhat.c…
Thanks, this looks exactly like what we need. So for now I can just pass
--stderr if there's no --log-file, to deal with those "releases" that
don't have those patches merged yet.
*should* already cover all the cases where libvirt is interested in
relaying "early" errors back to the user.
By the way, the one below is pretty much the patch I would have proposed
for libvirt. I prepared it earlier today and didn't have a chance to
test it yet, it's compile-tested only, and doesn't take cgroups into
account (which, it seems, is needed no matter the lifecycle).
So I'm sharing it here as reference (that's how simple I wanted it to
be -- minus cgroups), or if it's convenient for you to copy and paste
something.
This effectively disables placing passt into the CGroup set up for
emulator thread. And I don't think we want that. Firstly, it makes
statistics gathering report incorrect values. Secondly, these helper
processes are "implementation detail" - I mean, users don't really care
(from accounting POV) whether a task runs in emulator thread inside of
QEMU or in a separate process. It's still an emulation and as such
should be accounted for. And also, on NUMA machines we definitely want
to place passt as close to the emulator as possible (i.e. if emulator
thread is pinned than helper processes should be pinned too).
Furthermore, it enhances security, because libvirt sets up devices
controller in such way, that only devices from domain XML are allowed
and everything else is forbidden.
I could go on with other controllers but I believe you get the picture.
Now true, for qemu:///session we don't set any CGroups as we lack the
permissions to do so [1], and this is probably the target audience for
this feature anyway, but for qemu:///system (when running
libvirtd/virtqemud as root) we do set up CGroups and MUST place helper
processes into them. I mean, if we are concerned about security (just
look at the discussion about --foreground), then CGroups are definitely
step in the right direction.
1: and even this might change in the future as there are some plans to
let a privileged component create the CGroup for us (e.g. systemd).
Michal