Startup fd to avoid busywaits
Hello! I would like to propose a patch that allows the invoker to pass a "ready fd" on startup that gets written to once the setup has been completed, similar to slirp4netns's `--ready-fd` flag. Currently we have to poll the interface in a loop to wait for setup to be completed, and it would be much better if we could instead block on fd activity.
Just wanted to check if such a contribution would be welcome before putting in the work of authoring it, or if there's already a better way to wait for the interface to come up.
This is our current implementation: https://github.com/NixOS/nix/pull/15919/changes#diff-2a9176262efad1ef345d882...
Thanks,
Lisanna
Hi,
On 27/05/2026 19:08, Lisanna Dettwyler wrote:
> Hello! I would like to propose a patch that allows the invoker to pass a "ready fd" on startup that gets written to once the setup has been completed, similar to slirp4netns's `--ready-fd` flag. Currently we have to poll the interface in a loop to wait for setup to be completed, and it would be much better if we could instead block on fd activity.
> Just wanted to check if such a contribution would be welcome before putting in the work of authoring it, or if there's already a better way to wait for the interface to come up. This is our current implementation: https://github.com/NixOS/nix/pull/15919/changes#diff-2a9176262efad1ef345d882...
I am not a pasta maintainer, but this is rather simple, as we do it in Podman. By default pasta forks into the background; when the parent exits, the child is ready for connections. So all you need to do is fork/exec and then wait for the exit. That way you also easily get the exit code to know if the setup failed, and you can read stderr for errors.
From your linked code I see the use of --foreground, so the question would be: why are you using this over the default?
--
Paul Holzinger
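The fork/exec-and-wait approach described above can be sketched roughly as follows. This is only an illustration, not Podman's or Nix's actual code: `run_until_parent_exits` is a hypothetical helper name, and the argument list is left to the caller because it depends on the setup. The sketch assumes the launched command daemonizes itself only once it is ready, as described for pasta's default mode. stderr goes to a temporary file rather than a pipe so that waiting does not depend on when a background child closes its end.

```python
import subprocess
import sys
import tempfile


def run_until_parent_exits(argv, timeout=30.0):
    """Start a self-daemonizing command and block until its parent exits.

    Returns (exit_code, stderr_text). An exit code of 0 is taken to mean
    the background child is ready; anything else means setup failed.
    """
    with tempfile.TemporaryFile(mode="w+") as err:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=err,
        )
        try:
            code = proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The command never went to background: give up and clean up.
            proc.kill()
            proc.wait()
            raise
        err.seek(0)
        return code, err.read()


if __name__ == "__main__":
    # Stand-in command for demonstration; a real caller would pass the
    # pasta command line it needs instead.
    code, err = run_until_parent_exits(
        [sys.executable, "-c", "import sys; sys.exit(0)"]
    )
    print("ready" if code == 0 else f"setup failed ({code}): {err}")
```

With this shape there is no polling loop: the caller blocks in `wait()` and proceeds as soon as the parent process returns.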
Hi Lisanna,
On Wed, 27 May 2026 13:08:01 -0400 Lisanna Dettwyler wrote:
> Hello! I would like to propose a patch that allows the invoker to pass a "ready fd" on startup that gets written to once the setup has been completed, similar to slirp4netns's `--ready-fd` flag. Currently we have to poll the interface in a loop to wait for setup to be completed, and it would be much better if we could instead block on fd activity.
As I was implementing the first prototype of pasta, I spotted this in slirp4netns and I was rather surprised because...
> Just wanted to check if such a contribution would be welcome before putting in the work of authoring it, or if there's already a better way to wait for the interface to come up.
...traditionally, well-behaved UNIX daemons fork to background when they're ready, and that's what pasta does.
This fits quite naturally with typical UNIX-like tools and interfaces: if you want to start pasta (as a daemon) from a script, just do:
  [whatever comes before]
  pasta
  [whatever comes after, now that pasta is ready]
Instead of opening a file descriptor, starting a subshell, waiting for that file descriptor, etc.
This is how other tools generally start pasta (and passt). Podman calls exec.Command(), for example:
https://github.com/containers/common/blob/a5ccdae846b629b5ceaefa6ffd5c651140...
> This is our current implementation: https://github.com/NixOS/nix/pull/15919/changes#diff-2a9176262efad1ef345d882...
Ouch, that looks rather painful. :( I read this comment, a bit above:
  // Bring up pasta, for handling FOD networking. We don't let it daemonize
  // itself for process managements reasons and kill it manually when done.
but it's not clear to me what "process managements reasons" might be. Maybe we have another way to satisfy those requirements? I tried quite hard to make it all as simple and as boring as possible.
About this other comment:
  // FIXME ideally we want a notification when pasta exits, but we cannot do
  // this at present [...]
...I think ideally the easiest would be to just let pasta terminate by itself, given that you set up namespaces externally (just like Podman and Docker/rootlesskit do).
But pasta can also write a PID file, and you could pidfd_open() on its PID. I think that would be much cleaner.
While at it, a bit below:
  // TODO these redirections are crimes. pasta closes all non-stdio file
  // descriptors very early and lacks fd arguments for the namespaces we
  // want it to join. we cannot have pasta join the namespaces via pids;
  // doing so requires capabilities which pasta *also* drops very early.
...actually, pasta explicitly supports joining namespaces via PIDs, I'm not entirely sure what would prevent it in Nix. Would there be some capability we need to drop a bit later?
On that topic, you might be interested in:
https://bugs.passt.top/show_bug.cgi?id=204
and, perhaps more importantly, in these points coming from the NixPak / bubblewrap usage:
https://bugs.passt.top/show_bug.cgi?id=204#c3
https://archives.passt.top/passt-user/671252c8-88f6-45b7-b719-b82786e84bb7@g...
I'm not opposed to a --ready-fd (and a --keep-fds) option if that solves issues for you, of course, but I'd say let's make sure we're not duplicating existing (maybe more robust?) mechanisms first.
--
Stefano
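The PID file plus pidfd_open() suggestion above could look roughly like the sketch below. It is an illustration under stated assumptions, not code from pasta or Nix: `wait_for_exit` is a hypothetical helper, the PID file path is whatever the caller told the daemon to write, and `os.pidfd_open` needs Python 3.9 or later on Linux 5.3 or later. Note that a PID read from a file can in principle be reused between reading the file and opening the pidfd, so a careful caller would want to narrow that window.

```python
import os
import select


def wait_for_exit(pid_file, timeout=None):
    """Block until the process named in pid_file has exited.

    Returns True once the process is gone, False if the timeout expired.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    try:
        # A pidfd refers to this specific process, not to the PID number.
        fd = os.pidfd_open(pid)
    except ProcessLookupError:
        # No such process: it already exited and was reaped.
        return True
    try:
        # A pidfd becomes readable when the process terminates.
        readable, _, _ = select.select([fd], [], [], timeout)
        return bool(readable)
    finally:
        os.close(fd)
```

The same file descriptor can be registered with an existing poll or epoll loop instead of blocking in `select`, which gives an exit notification without a dedicated waiter thread.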
Hi all,
Thanks for the detailed replies! It looks like allowing pasta to daemonize and
waiting on the parent works just fine. The comments in the code I linked
are from a different developer associated with a fork of Nix. I think for
our purposes allowing it to exit on its own is perfectly fine, but I'll
check on this.
As far as the namespace joining goes, pasta doesn't have permissions to
join the namespaces if provided verbatim without the redirection hack, but
let me get back to you on this also.
Thanks,
Lisanna
Hi Stefano,
Indeed it would be useful if the capability dropping could be modified or moved until after the net and user namespaces were opened. I'm not that familiar with the codebase so I'm not sure where would be the best spot for that to be moved to or what capability needs to not be dropped.
Thanks,
Lisanna
On Tue, Jun 02, 2026 at 06:23:29PM -0400, Lisanna Dettwyler wrote:
> Hi Stefano,
> Indeed it would be useful if the capability dropping could be modified or moved until after the net and user namespaces were opened. I'm not that familiar with the codebase so I'm not sure where would be the best spot for that to be moved to or what capability needs to not be dropped.
We certainly could delay the capability drop, but whether it's wise is a different question. The longer we leave it, the greater attack surface we have while still privileged.
Waiting until after the namespaces are opened means we've at least parsed the command line, which is a fair bit of code. On the other hand we shouldn't have opened listening network sockets yet, so we should have relatively little exposure to either external or guest traffic.
--
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson
On Wed, 3 Jun 2026 19:29:43 +1000 David Gibson wrote:
> On Tue, Jun 02, 2026 at 06:23:29PM -0400, Lisanna Dettwyler wrote:
> > Hi Stefano,
> > Indeed it would be useful if the capability dropping could be modified or moved until after the net and user namespaces were opened. I'm not that familiar with the codebase so I'm not sure where would be the best spot for that to be moved to or what capability needs to not be dropped.
> We certainly could delay the capability drop, but whether it's wise is a different question. The longer we leave it, the greater attack surface we have while still privileged.
> Waiting until after the namespaces are opened means we've at least parsed the command line, which is a fair bit of code. On the other hand we shouldn't have opened listening network sockets yet, so we should have relatively little exposure to either external or guest traffic.
Right, I guess that's the most fundamental distinction in deciding when to drop capabilities or enforce whatever kind of restrictions, but the rest is still nice to have as soon as possible, so here we would really need to understand what the problem is (I didn't, yet).
For example, Podman passes a pre-made network namespace (via --netns), and we needed commit 594dce66d3bb ("isolation: keep CAP_SYS_PTRACE when required") to be able to join it, but I really have no idea why we could possibly need anything else to join one by PID, and it looks like that comment about capabilities was added after that commit.
But maybe that issue was caused by some other issue that has been solved meanwhile? I guess that should be checked first. If it's not solved, a small stand-alone reproducer would be helpful.
--
Stefano
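When putting together a reproducer for the capability question above, it may help to record which capabilities a process still holds at the point where joining the namespace fails. The sketch below is only a diagnostic aid under stated assumptions: it parses the `CapEff` line that Linux exposes in `/proc/PID/status`, the helper names are hypothetical, and the bit number for CAP_SYS_PTRACE (19) is taken from `linux/capability.h`. It says nothing about which capabilities pasta keeps or drops.

```python
CAP_SYS_PTRACE = 19  # bit number from <linux/capability.h>


def effective_caps(status_text):
    """Return the effective capability mask from /proc/PID/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal number.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(status_text, cap):
    """True if capability bit `cap` is set in the effective mask."""
    return bool((effective_caps(status_text) >> cap) & 1)


def read_status(pid="self"):
    """Read the status file for a PID (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

For example, `has_cap(read_status(), CAP_SYS_PTRACE)` reports whether the calling process currently has CAP_SYS_PTRACE in its effective set.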
Sure, I'll try against the HEAD of master and if it's still an issue I'll
put together a small reproducer.
Thanks,
Lisanna
participants (4)
- David Gibson
- Lisanna Dettwyler
- Paul Holzinger
- Stefano Brivio