[PATCH 0/3] More caution with NONBLOCK flag on Unix sockets
This is a revision of the one patch from my pesto series that wasn't merged: a rework to the handling of the blocking flag for both listening and accepted Unix sockets. This new version is split into several pieces to make the rationales clearer at each step. It's also a bit more cautious in what it does, so should avoid the problems with the original version. David Gibson (3): treewide: Add SOCK_CLOEXEC to accept() calls that are missing it conf, tap, repair: Uniformly use non-blocking accept() on Unix sockets conf, repair, tap: More caution about blocking flag on Unix sockets conf.c | 10 +++++++--- repair.c | 13 +++++++++---- tap.c | 10 ++++++++-- util.c | 2 +- 4 files changed, 25 insertions(+), 10 deletions(-) -- 2.54.0
sock_unix(), which creates a listening Unix socket, doesn't set the
SOCK_NONBLOCK flag, meaning that accept() will block if called with no
pending connections. Generally, this doesn't matter because we only
accept() once we've received an epoll event indicating there's a pending
connection request.
Control connections (pesto) are an exception, because the way we queue
connections requires that we call accept() when we close one connection to
see if there's another one waiting. We rely on an EAGAIN here to know that
there's nothing waiting. To handle these we have an explicit fcntl() to
enable NONBLOCK on the control listening socket.
However, always using non-blocking accept() for Unix sockets would make
things a bit more uniform, and should be a bit less fragile in the case
that we ever somehow got a spurious connection event. So, alter
sock_unix() to always use the SOCK_NONBLOCK flag. Remove the control
socket's special case fcntl(), and adjust the error handling on each
Unix socket accept() for the new behaviour. As a bonus the last adds
reporting for accept() errors on tap socket connections.
we will need non-blocking accept() for the upcoming control/configuration
socket. Always add SOCK_NONBLOCK, which is more robust and in keeping with
the normal non-blocking style of passt.
Signed-off-by: David Gibson
Generally we try to set the O_CLOEXEC flag on every fd we create. This
seems to be generally accepted security best practice these days, and we
never fork(), so certainly have no need to pass fds to children.
A handful of accept4() calls on Unix sockets are missing the SOCK_CLOEXEC
flag to set this though. Add the missing flag.
Signed-off-by: David Gibson
On Wed, May 13, 2026 at 02:14:22PM +1000, David Gibson wrote:
sock_unix(), which creates a listening Unix socket, doesn't set the SOCK_NONBLOCK flag, meaning that accept() will block if called with no pending connections. Generally, this doesn't matter because we only accept() once we've received an epoll event indicating there's a pending connection request.
Control connections (pesto) are an exception, because the way we queue connections requires that we call accept() when we close one connection to see if there's another one waiting. We rely on an EAGAIN here to know that there's nothing waiting. To handle these we have an explicit fcntl() to enable NONBLOCK on the control listening socket.
However, always using non-blocking accept() for Unix sockets would make things a bit more uniform, and should be a bit less fragile in the case that we ever somehow got a spurious connection event. So, alter sock_unix() to always use the SOCK_NONBLOCK flag. Remove the control socket's special case fcntl(), and adjust the error handling on each Unix socket accept() for the new behaviour. As a bonus the last adds reporting for accept() errors on tap socket connections.
I didn't realise it, but adding that reporting also removes a valid, if fairly minor coverity warning (at least with coverity 2026.3.0).
we will need non-blocking accept() for the upcoming control/configuration socket. Always add SOCK_NONBLOCK, which is more robust and in keeping with the normal non-blocking style of passt.
Oops. This paragraph is left over from a previous version. Can you remove on merge, if there's no other reason to respin?
Signed-off-by: David Gibson
--- conf.c | 4 +--- repair.c | 4 ++-- tap.c | 5 +++++ util.c | 2 +- 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/conf.c b/conf.c index 029b9c7c..dec43fca 100644 --- a/conf.c +++ b/conf.c @@ -1091,8 +1091,6 @@ static void conf_open_files(struct ctx *c) die_perror("Couldn't open control socket %s", c->control_path); } - if (fcntl(c->fd_control_listen, F_SETFL, O_NONBLOCK)) - die_perror("Couldn't set O_NONBLOCK on control socket"); } else { c->fd_control_listen = -1; } @@ -2087,7 +2085,7 @@ retry: fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC); if (fd < 0) { if (errno != EAGAIN) - warn_perror("accept4() on configuration listening socket"); + warn_perror("Error accept()ing configuration socket"); return; }
diff --git a/repair.c b/repair.c index 3e0e3e0a..42c4ae97 100644 --- a/repair.c +++ b/repair.c @@ -101,8 +101,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, SOCK_CLOEXEC)) < 0) { - rc = errno; - debug_perror("accept4() on TCP_REPAIR helper listening socket"); + if ((rc = errno) != EAGAIN) + warn_perror("Error accept()ing repair helper"); return rc; }
diff --git a/tap.c b/tap.c index e7cac9df..fda2da9b 100644 --- a/tap.c +++ b/tap.c @@ -1491,6 +1491,11 @@ void tap_listen_handler(struct ctx *c, uint32_t events) }
c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC); + if (c->fd_tap < 0) { + if (errno != EAGAIN) + warn_perror("Error accepting tap client"); + return; + }
if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len)) info("accepted connection from PID %i", ucred.pid); diff --git a/util.c b/util.c index 73c9d51d..204391c7 100644 --- a/util.c +++ b/util.c @@ -238,7 +238,7 @@ int sock_l4_dualstack_any(const struct ctx *c, enum epoll_type type, */ int sock_unix(char *sock_path) { - int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0); + int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0); struct sockaddr_un addr = { .sun_family = AF_UNIX, }; -- 2.54.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 13 May 2026 14:14:21 +1000
David Gibson
Generally we try to set the O_CLOEXEC flag on every fd we create. This seems to be generally accepted security best practice these days, and we never fork(), so certainly have no need to pass fds to children.
But we do clone() with CLONE_FILES (even though when we clone() to call execvp() later, we don't set CLONE_FILES), so, even though I don't see a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be automatic from the fact we don't fork(). I spent some time on it and I really couldn't find a reason why we don't have O_CLOEXEC there, so probably there isn't any, and I think this patch is fine. I would just change this paragraph to "[...] these days, and we don't need to pass file descriptors to children."
A handful of accept4() calls on Unix sockets are missing the SOCK_CLOEXEC flag to set this though. Add the missing flag.
Signed-off-by: David Gibson
--- repair.c | 5 +++-- tap.c | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/repair.c b/repair.c index 69c53077..3e0e3e0a 100644 --- a/repair.c +++ b/repair.c @@ -87,7 +87,7 @@ int repair_listen_handler(struct ctx *c, uint32_t events) /* Another client is already connected: accept and close right away. */ if (c->fd_repair != -1) { int discard = accept4(c->fd_repair_listen, NULL, NULL, - SOCK_NONBLOCK); + SOCK_NONBLOCK | SOCK_CLOEXEC);
if (discard == -1) return errno; @@ -99,7 +99,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events) return EEXIST; }
- if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, 0)) < 0) { + if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, + SOCK_CLOEXEC)) < 0) { rc = errno; debug_perror("accept4() on TCP_REPAIR helper listening socket"); return rc; diff --git a/tap.c b/tap.c index 0920a325..e7cac9df 100644 --- a/tap.c +++ b/tap.c @@ -1477,7 +1477,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events) /* Another client is already connected: accept and close right away. */ if (c->fd_tap != -1) { int discard = accept4(c->fd_tap_listen, NULL, NULL, - SOCK_NONBLOCK); + SOCK_NONBLOCK | SOCK_CLOEXEC);
if (discard == -1) return; @@ -1490,7 +1490,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events) return; }
- c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, 0); + c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len)) info("accepted connection from PID %i", ucred.pid);
-- Stefano
On Wed, 13 May 2026 15:51:27 +1000
David Gibson
On Wed, May 13, 2026 at 02:14:22PM +1000, David Gibson wrote:
sock_unix(), which creates a listening Unix socket, doesn't set the SOCK_NONBLOCK flag, meaning that accept() will block if called with no pending connections. Generally, this doesn't matter because we only accept() once we've received an epoll event indicating there's a pending connection request.
Control connections (pesto) are an exception, because the way we queue connections requires that we call accept() when we close one connection to see if there's another one waiting. We rely on an EAGAIN here to know that there's nothing waiting. To handle these we have an explicit fcntl() to enable NONBLOCK on the control listening socket.
However, always using non-blocking accept() for Unix sockets would make things a bit more uniform, and should be a bit less fragile in the case that we ever somehow got a spurious connection event. So, alter sock_unix() to always use the SOCK_NONBLOCK flag. Remove the control socket's special case fcntl(), and adjust the error handling on each Unix socket accept() for the new behaviour. As a bonus the last adds reporting for accept() errors on tap socket connections.
I didn't realise it, but adding that reporting also removes a valid, if fairly minor coverity warning (at least with coverity 2026.3.0).
we will need non-blocking accept() for the upcoming control/configuration socket. Always add SOCK_NONBLOCK, which is more robust and in keeping with the normal non-blocking style of passt.
Oops. This paragraph is left over from a previous version. Can you remove on merge, if there's no other reason to respin?
I think the comments I'm raising (the one below and the one to 3/3) actually warrant a respin at this point.
Signed-off-by: David Gibson
--- conf.c | 4 +--- repair.c | 4 ++-- tap.c | 5 +++++ util.c | 2 +- 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/conf.c b/conf.c index 029b9c7c..dec43fca 100644 --- a/conf.c +++ b/conf.c @@ -1091,8 +1091,6 @@ static void conf_open_files(struct ctx *c) die_perror("Couldn't open control socket %s", c->control_path); } - if (fcntl(c->fd_control_listen, F_SETFL, O_NONBLOCK)) - die_perror("Couldn't set O_NONBLOCK on control socket"); } else { c->fd_control_listen = -1; } @@ -2087,7 +2085,7 @@ retry: fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC); if (fd < 0) { if (errno != EAGAIN) - warn_perror("accept4() on configuration listening socket"); + warn_perror("Error accept()ing configuration socket"); return; }
diff --git a/repair.c b/repair.c index 3e0e3e0a..42c4ae97 100644 --- a/repair.c +++ b/repair.c @@ -101,8 +101,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, SOCK_CLOEXEC)) < 0) { - rc = errno; - debug_perror("accept4() on TCP_REPAIR helper listening socket"); + if ((rc = errno) != EAGAIN) + warn_perror("Error accept()ing repair helper");
See repair_wait() for the reason why this particular listening socket needs to be blocking (with a timeout).
return rc; }
diff --git a/tap.c b/tap.c index e7cac9df..fda2da9b 100644 --- a/tap.c +++ b/tap.c @@ -1491,6 +1491,11 @@ void tap_listen_handler(struct ctx *c, uint32_t events) }
c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC); + if (c->fd_tap < 0) { + if (errno != EAGAIN) + warn_perror("Error accepting tap client"); + return; + }
if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len)) info("accepted connection from PID %i", ucred.pid); diff --git a/util.c b/util.c index 73c9d51d..204391c7 100644 --- a/util.c +++ b/util.c @@ -238,7 +238,7 @@ int sock_l4_dualstack_any(const struct ctx *c, enum epoll_type type, */ int sock_unix(char *sock_path) { - int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0); + int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0); struct sockaddr_un addr = { .sun_family = AF_UNIX, }; -- 2.54.0
-- Stefano
On Sat, May 16, 2026 at 05:46:27PM +0200, Stefano Brivio wrote:
On Wed, 13 May 2026 15:51:27 +1000 David Gibson
wrote: On Wed, May 13, 2026 at 02:14:22PM +1000, David Gibson wrote:
sock_unix(), which creates a listening Unix socket, doesn't set the SOCK_NONBLOCK flag, meaning that accept() will block if called with no pending connections. Generally, this doesn't matter because we only accept() once we've received an epoll event indicating there's a pending connection request.
Control connections (pesto) are an exception, because the way we queue connections requires that we call accept() when we close one connection to see if there's another one waiting. We rely on an EAGAIN here to know that there's nothing waiting. To handle these we have an explicit fcntl() to enable NONBLOCK on the control listening socket.
However, always using non-blocking accept() for Unix sockets would make things a bit more uniform, and should be a bit less fragile in the case that we ever somehow got a spurious connection event. So, alter sock_unix() to always use the SOCK_NONBLOCK flag. Remove the control socket's special case fcntl(), and adjust the error handling on each Unix socket accept() for the new behaviour. As a bonus the last adds reporting for accept() errors on tap socket connections.
I didn't realise it, but adding that reporting also removes a valid, if fairly minor coverity warning (at least with coverity 2026.3.0).
we will need non-blocking accept() for the upcoming control/configuration socket. Always add SOCK_NONBLOCK, which is more robust and in keeping with the normal non-blocking style of passt.
Oops. This paragraph is left over from a previous version. Can you remove on merge, if there's no other reason to respin?
I think the comments I'm raising (the one below and the one to 3/3) actually warrant a respin at this point.
Agreed.
Signed-off-by: David Gibson
--- conf.c | 4 +--- repair.c | 4 ++-- tap.c | 5 +++++ util.c | 2 +- 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/conf.c b/conf.c index 029b9c7c..dec43fca 100644 --- a/conf.c +++ b/conf.c @@ -1091,8 +1091,6 @@ static void conf_open_files(struct ctx *c) die_perror("Couldn't open control socket %s", c->control_path); } - if (fcntl(c->fd_control_listen, F_SETFL, O_NONBLOCK)) - die_perror("Couldn't set O_NONBLOCK on control socket"); } else { c->fd_control_listen = -1; } @@ -2087,7 +2085,7 @@ retry: fd = accept4(c->fd_control_listen, NULL, NULL, SOCK_CLOEXEC); if (fd < 0) { if (errno != EAGAIN) - warn_perror("accept4() on configuration listening socket"); + warn_perror("Error accept()ing configuration socket"); return; }
diff --git a/repair.c b/repair.c index 3e0e3e0a..42c4ae97 100644 --- a/repair.c +++ b/repair.c @@ -101,8 +101,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events)
if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, SOCK_CLOEXEC)) < 0) { - rc = errno; - debug_perror("accept4() on TCP_REPAIR helper listening socket"); + if ((rc = errno) != EAGAIN) + warn_perror("Error accept()ing repair helper");
See repair_wait() for the reason why this particular listening socket needs to be blocking (with a timeout).
Ah, good point. That makes this whole patch pretty ill-conceived. I'll ditch everything except the error reporting addition. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
On Wed, 13 May 2026 14:14:21 +1000 David Gibson
wrote: Generally we try to set the O_CLOEXEC flag on every fd we create. This seems to be generally accepted security best practice these days, and we never fork(), so certainly have no need to pass fds to children.
But we do clone() with CLONE_FILES (even though when we clone() to call execvp() later, we don't set CLONE_FILES), so, even though I don't see a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be automatic from the fact we don't fork().
So, I did think about that when wrote it, but went for the short version rather than saying clone() with CLONE_FILES doesn't count. Now, I realised that we've both fallen for the trap again, forgetting that this has nothing to do with fork() or clone() and is, as it says right there in the name, about exec(). I'll update the message.
I spent some time on it and I really couldn't find a reason why we don't have O_CLOEXEC there, so probably there isn't any, and I think this patch is fine.
I would just change this paragraph to "[...] these days, and we don't need to pass file descriptors to children."
A handful of accept4() calls on Unix sockets are missing the SOCK_CLOEXEC flag to set this though. Add the missing flag.
Signed-off-by: David Gibson
--- repair.c | 5 +++-- tap.c | 4 ++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/repair.c b/repair.c index 69c53077..3e0e3e0a 100644 --- a/repair.c +++ b/repair.c @@ -87,7 +87,7 @@ int repair_listen_handler(struct ctx *c, uint32_t events) /* Another client is already connected: accept and close right away. */ if (c->fd_repair != -1) { int discard = accept4(c->fd_repair_listen, NULL, NULL, - SOCK_NONBLOCK); + SOCK_NONBLOCK | SOCK_CLOEXEC);
if (discard == -1) return errno; @@ -99,7 +99,8 @@ int repair_listen_handler(struct ctx *c, uint32_t events) return EEXIST; }
- if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, 0)) < 0) { + if ((c->fd_repair = accept4(c->fd_repair_listen, NULL, NULL, + SOCK_CLOEXEC)) < 0) { rc = errno; debug_perror("accept4() on TCP_REPAIR helper listening socket"); return rc; diff --git a/tap.c b/tap.c index 0920a325..e7cac9df 100644 --- a/tap.c +++ b/tap.c @@ -1477,7 +1477,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events) /* Another client is already connected: accept and close right away. */ if (c->fd_tap != -1) { int discard = accept4(c->fd_tap_listen, NULL, NULL, - SOCK_NONBLOCK); + SOCK_NONBLOCK | SOCK_CLOEXEC);
if (discard == -1) return; @@ -1490,7 +1490,7 @@ void tap_listen_handler(struct ctx *c, uint32_t events) return; }
- c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, 0); + c->fd_tap = accept4(c->fd_tap_listen, NULL, NULL, SOCK_CLOEXEC);
if (!getsockopt(c->fd_tap, SOL_SOCKET, SO_PEERCRED, &ucred, &len)) info("accepted connection from PID %i", ucred.pid);
-- Stefano
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Mon, 18 May 2026 12:28:57 +1000
David Gibson
On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
On Wed, 13 May 2026 14:14:21 +1000 David Gibson
wrote: Generally we try to set the O_CLOEXEC flag on every fd we create. This seems to be generally accepted security best practice these days, and we never fork(), so certainly have no need to pass fds to children.
But we do clone() with CLONE_FILES (even though when we clone() to call execvp() later, we don't set CLONE_FILES), so, even though I don't see a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be automatic from the fact we don't fork().
So, I did think about that when wrote it, but went for the short version rather than saying clone() with CLONE_FILES doesn't count.
Now, I realised that we've both fallen for the trap again, forgetting that this has nothing to do with fork() or clone() and is, as it says right there in the name, about exec().
No, wait, I didn't fall for it, not this time. :) That's why I was mentioning that when we call clone() and execvp() later (which would be the only path that matters), we don't set CLONE_FILES anyway. -- Stefano
On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:
On Mon, 18 May 2026 12:28:57 +1000 David Gibson
wrote: On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
On Wed, 13 May 2026 14:14:21 +1000 David Gibson
wrote: Generally we try to set the O_CLOEXEC flag on every fd we create. This seems to be generally accepted security best practice these days, and we never fork(), so certainly have no need to pass fds to children.
But we do clone() with CLONE_FILES (even though when we clone() to call execvp() later, we don't set CLONE_FILES), so, even though I don't see a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be automatic from the fact we don't fork().
So, I did think about that when wrote it, but went for the short version rather than saying clone() with CLONE_FILES doesn't count.
Now, I realised that we've both fallen for the trap again, forgetting that this has nothing to do with fork() or clone() and is, as it says right there in the name, about exec().
No, wait, I didn't fall for it, not this time. :) That's why I was mentioning that when we call clone() and execvp() later (which would be
Uh...? I'm pretty sure the only execve(2) in the entire program is where we spawn passt.avx2. That's essentially the very first thing we do, long before this point.
the only path that matters), we don't set CLONE_FILES anyway.
CLONE_FILES is irrelevant, it's lost during execve(2). -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 20 May 2026 11:04:58 +1000
David Gibson
On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:
On Mon, 18 May 2026 12:28:57 +1000 David Gibson
wrote: On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
On Wed, 13 May 2026 14:14:21 +1000 David Gibson
wrote: Generally we try to set the O_CLOEXEC flag on every fd we create. This seems to be generally accepted security best practice these days, and we never fork(), so certainly have no need to pass fds to children.
But we do clone() with CLONE_FILES (even though when we clone() to call execvp() later, we don't set CLONE_FILES), so, even though I don't see a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be automatic from the fact we don't fork().
So, I did think about that when wrote it, but went for the short version rather than saying clone() with CLONE_FILES doesn't count.
Now, I realised that we've both fallen for the trap again, forgetting that this has nothing to do with fork() or clone() and is, as it says right there in the name, about exec().
No, wait, I didn't fall for it, not this time. :) That's why I was mentioning that when we call clone() and execvp() later (which would be
Uh...? I'm pretty sure the only execve(2) in the entire program is where we spawn passt.avx2. That's essentially the very first thing we do, long before this point.
Well, grep would beg to differ, as we don't call execve() at all, but: $ grep execv *.c | grep -v qrap arch.c: execv(new_path, argv); pasta.c: execvp(a->exe, a->argv); O_CLOEXEC (or lack thereof) also matters on execvp().
the only path that matters), we don't set CLONE_FILES anyway.
CLONE_FILES is irrelevant, it's lost during execve(2).
Yes, but if you first clone(), which we actually do before calling pasta_spawn_cmd(), and then execvp(), CLONE_FILES on clone() *would* matter, because the cloned process would inherit the open files, and the process started by execvp() would then get those files as well. But as I was mentioning in that path we don't use CLONE_FILES anyway, so that's not relevant. -- Stefano
On Wed, May 20, 2026 at 01:36:48PM +0200, Stefano Brivio wrote:
On Wed, 20 May 2026 11:04:58 +1000 David Gibson
wrote: On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:
On Mon, 18 May 2026 12:28:57 +1000 David Gibson
wrote: On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
On Wed, 13 May 2026 14:14:21 +1000 David Gibson
wrote: Generally we try to set the O_CLOEXEC flag on every fd we create. This seems to be generally accepted security best practice these days, and we never fork(), so certainly have no need to pass fds to children.
But we do clone() with CLONE_FILES (even though when we clone() to call execvp() later, we don't set CLONE_FILES), so, even though I don't see a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be automatic from the fact we don't fork().
So, I did think about that when wrote it, but went for the short version rather than saying clone() with CLONE_FILES doesn't count.
Now, I realised that we've both fallen for the trap again, forgetting that this has nothing to do with fork() or clone() and is, as it says right there in the name, about exec().
No, wait, I didn't fall for it, not this time. :) That's why I was mentioning that when we call clone() and execvp() later (which would be
Uh...? I'm pretty sure the only execve(2) in the entire program is where we spawn passt.avx2. That's essentially the very first thing we do, long before this point.
Well, grep would beg to differ, as we don't call execve() at all, but:
I meant the system call execve(2), which execv() and execvp() are library wrappers around.
$ grep execv *.c | grep -v qrap arch.c: execv(new_path, argv); pasta.c: execvp(a->exe, a->argv);
Ah, I did miss the one in pasta_spawn_cmd(). Of course, we definitely don't want to leak our internal fds into the spawned command, so CLOEXEC is what we want.
O_CLOEXEC (or lack thereof) also matters on execvp().
the only path that matters), we don't set CLONE_FILES anyway.
CLONE_FILES is irrelevant, it's lost during execve(2).
Yes, but if you first clone(), which we actually do before calling pasta_spawn_cmd(), and then execvp(), CLONE_FILES on clone() *would* matter, because the cloned process would inherit the open files, and the process started by execvp() would then get those files as well.
No, it doesn't matter. If you clone() without CLONE_FILES, the new thread/process gets a copy of the handles, which do or don't survive exec() depending on O_CLOEXEC. If you clone with CLONE_FILES, the new process shares the fd table. The fd table is unshared again as part of the exec(). From execve(2):
• The file descriptor table is unshared, undoing the effect of the CLONE_FILES flag of clone(2).
.. then the now copied files do or don't survive depending on O_CLOEXEC. Either way, O_CLOEXEC has the same final effect.
But as I was mentioning in that path we don't use CLONE_FILES anyway, so that's not relevant.
-- Stefano
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 20 May 2026 22:52:47 +1000
David Gibson
On Wed, May 20, 2026 at 01:36:48PM +0200, Stefano Brivio wrote:
On Wed, 20 May 2026 11:04:58 +1000 David Gibson
wrote: On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:
On Mon, 18 May 2026 12:28:57 +1000 David Gibson
wrote: On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote:
On Wed, 13 May 2026 14:14:21 +1000 David Gibson
wrote: > Generally we try to set the O_CLOEXEC flag on every fd we create. This > seems to be generally accepted security best practice these days, and we > never fork(), so certainly have no need to pass fds to children.
But we do clone() with CLONE_FILES (even though when we clone() to call execvp() later, we don't set CLONE_FILES), so, even though I don't see a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be automatic from the fact we don't fork().
So, I did think about that when wrote it, but went for the short version rather than saying clone() with CLONE_FILES doesn't count.
Now, I realised that we've both fallen for the trap again, forgetting that this has nothing to do with fork() or clone() and is, as it says right there in the name, about exec().
No, wait, I didn't fall for it, not this time. :) That's why I was mentioning that when we call clone() and execvp() later (which would be
Uh...? I'm pretty sure the only execve(2) in the entire program is where we spawn passt.avx2. That's essentially the very first thing we do, long before this point.
Well, grep would beg to differ, as we don't call execve() at all, but:
I meant the system call execve(2), which execv() and execvp() are library wrappers around.
Oops, I missed the (2).
$ grep execv *.c | grep -v qrap arch.c: execv(new_path, argv); pasta.c: execvp(a->exe, a->argv);
Ah, I did miss the one in pasta_spawn_cmd(). Of course, we definitely don't want to leak our internal fds into the spawned command, so CLOEXEC is what we want.
O_CLOEXEC (or lack thereof) also matters on execvp().
the only path that matters), we don't set CLONE_FILES anyway.
CLONE_FILES is irrelevant, it's lost during execve(2).
Yes, but if you first clone(), which we actually do before calling pasta_spawn_cmd(), and then execvp(), CLONE_FILES on clone() *would* matter, because the cloned process would inherit the open files, and the process started by execvp() would then get those files as well.
No, it doesn't matter. If you clone() without CLONE_FILES, the new thread/process gets a copy of the handles, which do or don't survive exec() depending on O_CLOEXEC. If you clone with CLONE_FILES, the new process shares the fd table. The fd table is unshared again as part of the exec().
From execve(2):
• The file descriptor table is unshared, undoing the effect of the CLONE_FILES flag of clone(2).
.. then the now copied files do or don't survive depending on O_CLOEXEC. Either way, O_CLOEXEC has the same final effect.
Ah, right, I forgot about this part. But anyway, O_CLOEXEC is always relevant, as long as we call execvp() not right after start, regardless of having called clone() before. And in this case (pasta_spawn_cmd()) we call it rather "late", so the fact we don't call fork() is not really relevant for this purpose. -- Stefano
On Wed, May 20, 2026 at 04:22:10PM +0200, Stefano Brivio wrote:
On Wed, 20 May 2026 22:52:47 +1000 David Gibson
wrote: On Wed, May 20, 2026 at 01:36:48PM +0200, Stefano Brivio wrote:
On Wed, 20 May 2026 11:04:58 +1000 David Gibson
wrote: On Wed, May 20, 2026 at 02:37:02AM +0200, Stefano Brivio wrote:
On Mon, 18 May 2026 12:28:57 +1000 David Gibson
wrote: On Sat, May 16, 2026 at 05:46:11PM +0200, Stefano Brivio wrote: > On Wed, 13 May 2026 14:14:21 +1000 > David Gibson
wrote: > > > Generally we try to set the O_CLOEXEC flag on every fd we create. This > > seems to be generally accepted security best practice these days, and we > > never fork(), so certainly have no need to pass fds to children. > > But we do clone() with CLONE_FILES (even though when we clone() to call > execvp() later, we don't set CLONE_FILES), so, even though I don't see > a reason to skip O_CLOEXEC for c->fd_tap, this conclusion shouldn't be > automatic from the fact we don't fork(). So, I did think about that when wrote it, but went for the short version rather than saying clone() with CLONE_FILES doesn't count.
Now, I realised that we've both fallen for the trap again, forgetting that this has nothing to do with fork() or clone() and is, as it says right there in the name, about exec().
No, wait, I didn't fall for it, not this time. :) That's why I was mentioning that when we call clone() and execvp() later (which would be
Uh...? I'm pretty sure the only execve(2) in the entire program is where we spawn passt.avx2. That's essentially the very first thing we do, long before this point.
Well, grep would beg to differ, as we don't call execve() at all, but:
I meant the system call execve(2), which execv() and execvp() are library wrappers around.
Oops, I missed the (2).
$ grep execv *.c | grep -v qrap arch.c: execv(new_path, argv); pasta.c: execvp(a->exe, a->argv);
Ah, I did miss the one in pasta_spawn_cmd(). Of course, we definitely don't want to leak our internal fds into the spawned command, so CLOEXEC is what we want.
O_CLOEXEC (or lack thereof) also matters on execvp().
the only path that matters), we don't set CLONE_FILES anyway.
CLONE_FILES is irrelevant, it's lost during execve(2).
Yes, but if you first clone(), which we actually do before calling pasta_spawn_cmd(), and then execvp(), CLONE_FILES on clone() *would* matter, because the cloned process would inherit the open files, and the process started by execvp() would then get those files as well.
No, it doesn't matter. If you clone() without CLONE_FILES, the new thread/process gets a copy of the handles, which do or don't survive exec() depending on O_CLOEXEC. If you clone with CLONE_FILES, the new process shares the fd table. The fd table is unshared again as part of the exec().
From execve(2):
• The file descriptor table is unshared, undoing the effect of the CLONE_FILES flag of clone(2).
.. then the now copied files do or don't survive depending on O_CLOEXEC. Either way, O_CLOEXEC has the same final effect.
Ah, right, I forgot about this part. But anyway, O_CLOEXEC is always relevant, as long as we call execvp() not right after start, regardless of having called clone() before.
And in this case (pasta_spawn_cmd()) we call it rather "late", so the fact we don't call fork() is not really relevant for this purpose.
Right. I already removed the reference to fork() in the next vesion. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
participants (2)
-
David Gibson
-
Stefano Brivio