We use our own implementation of assert() because the glic implementation uses syscalls that aren't in our seccomp filter, see 7a8ed9459dfe ("Make assertions actually useful"). And we replaced it by an err(), followed by an abort() (that is also catched by seccomp). We don't have a coredump or a backtrace but we have at least the error message... only if logging is enabled. As this kind of information is needed in any case, replace the "err()" function by an "fprintf(stderr, ...)". Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- util.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/util.h b/util.h index b7541ce24e5a..b44b4bfdccd7 100644 --- a/util.h +++ b/util.h @@ -13,6 +13,7 @@ #include <stdint.h> #include <string.h> #include <signal.h> +#include <stdio.h> #include <arpa/inet.h> #include "log.h" @@ -67,8 +68,10 @@ #define ASSERT(expr) \ do { \ if (!(expr)) { \ - err("ASSERTION FAILED in %s (%s:%d): %s", \ - __func__, __FILE__, __LINE__, STRINGIFY(expr)); \ + fprintf(stderr, \ + "ASSERTION FAILED in %s (%s:%d): %s\n", \ + __func__, __FILE__, __LINE__, \ + STRINGIFY(expr)); \ /* This may actually SIGSYS, due to seccomp, \ * but that will still get the job done \ */ \ -- 2.45.2
On Mon, 5 Aug 2024 16:10:27 +0200 Laurent Vivier <lvivier(a)redhat.com> wrote:We use our own implementation of assert() because the glic implementation uses syscalls that aren't in our seccomp filter, see 7a8ed9459dfe ("Make assertions actually useful"). And we replaced it by an err(), followed by an abort() (that is also catched by seccomp). We don't have a coredump or a backtrace but we have at least the error message... only if logging is enabled....wait, err() should always end up somewhere (syslog, log file, or stderr). If you pick stderr, and we closed stderr, then the error will not be reported anywhere (or, possibly worse, on a socket that happens to have number 2). In which case are you losing messages? -- Stefano
On 05/08/2024 16:34, Stefano Brivio wrote:On Mon, 5 Aug 2024 16:10:27 +0200 Laurent Vivier <lvivier(a)redhat.com> wrote:Try something: diff --git a/udp.c b/udp.c index a92014806a73..7e179d3b863b 100644 --- a/udp.c +++ b/udp.c @@ -707,6 +707,8 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref, if ((n = udp_sock_recv(c, ref.fd, events, mmh_recv)) <= 0) return; + ASSERT(0); + /* We divide datagrams into batches based on how we need to send them, * determined by udp_meta[i].tosidx. To avoid either two passes through * the array, or recalculating tosidx for a single entry, we have to And then generate some UDP traffic. This exits only with: Bad system call Oh, wait, I can see the error in the syslog... but it's not very intuitive as we have the "Bad system call" displayed in the terminal. So, ignore my patch. LaurentWe use our own implementation of assert() because the glic implementation uses syscalls that aren't in our seccomp filter, see 7a8ed9459dfe ("Make assertions actually useful"). And we replaced it by an err(), followed by an abort() (that is also catched by seccomp). We don't have a coredump or a backtrace but we have at least the error message... only if logging is enabled....wait, err() should always end up somewhere (syslog, log file, or stderr). If you pick stderr, and we closed stderr, then the error will not be reported anywhere (or, possibly worse, on a socket that happens to have number 2). In which case are you losing messages?
On Mon, 5 Aug 2024 17:52:06 +0200 Laurent Vivier <lvivier(a)redhat.com> wrote:On 05/08/2024 16:34, Stefano Brivio wrote:Oops, this is an actual issue: if you run passt in foreground, then err() should _also_ print to stderr, see commit bca0fefa32e0 ("conf, passt: Make --stderr do nothing, and deprecate it")... but it doesn't. That's because we set the log_runtime flag once we're ready (whether we daemonised or not), and then we have: if (debug_print || !log_conf_parsed || (!log_runtime && (log_mask & LOG_MASK(LOG_PRI(pri))))) { (void)vfprintf(stderr, format, ap); if (newline && format[strlen(format)] != '\n') fprintf(stderr, "\n"); } See also https://archives.passt.top/passt-dev/ZnTUrhu8LNWi-krS@zatzit/. Still, while we want to keep pasta usable, we should make sure we print to stderr when passt is running in foreground. I'll send another patch... -- StefanoOn Mon, 5 Aug 2024 16:10:27 +0200 Laurent Vivier <lvivier(a)redhat.com> wrote:Try something: diff --git a/udp.c b/udp.c index a92014806a73..7e179d3b863b 100644 --- a/udp.c +++ b/udp.c @@ -707,6 +707,8 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref, if ((n = udp_sock_recv(c, ref.fd, events, mmh_recv)) <= 0) return; + ASSERT(0); + /* We divide datagrams into batches based on how we need to send them, * determined by udp_meta[i].tosidx. To avoid either two passes through * the array, or recalculating tosidx for a single entry, we have to And then generate some UDP traffic. This exits only with: Bad system call Oh, wait, I can see the error in the syslog... but it's not very intuitive as we have the "Bad system call" displayed in the terminal.We use our own implementation of assert() because the glic implementation uses syscalls that aren't in our seccomp filter, see 7a8ed9459dfe ("Make assertions actually useful"). And we replaced it by an err(), followed by an abort() (that is also catched by seccomp). We don't have a coredump or a backtrace but we have at least the error message... only if logging is enabled....wait, err() should always end up somewhere (syslog, log file, or stderr). If you pick stderr, and we closed stderr, then the error will not be reported anywhere (or, possibly worse, on a socket that happens to have number 2). In which case are you losing messages?
On Mon, Aug 05, 2024 at 04:10:27PM +0200, Laurent Vivier wrote:We use our own implementation of assert() because the glic implementation uses syscalls that aren't in our seccomp filter, see 7a8ed9459dfe ("Make assertions actually useful"). And we replaced it by an err(), followed by an abort() (that is also catched by seccomp).I think Stefano's said everything I would on the change itself, but..We don't have a coredump or a backtrace but we have at least the error message... only if logging is enabled.Whether we get a coredump shouldn't be affected by our weird ASSERT() here. If coredumps are enabled (which they're not by default on current distros, AFAICT), we should still get a coredump with the SIGSYS here, just as we would for a SIGABRT.As this kind of information is needed in any case, replace the "err()" function by an "fprintf(stderr, ...)". Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- util.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/util.h b/util.h index b7541ce24e5a..b44b4bfdccd7 100644 --- a/util.h +++ b/util.h @@ -13,6 +13,7 @@ #include <stdint.h> #include <string.h> #include <signal.h> +#include <stdio.h> #include <arpa/inet.h> #include "log.h" @@ -67,8 +68,10 @@ #define ASSERT(expr) \ do { \ if (!(expr)) { \ - err("ASSERTION FAILED in %s (%s:%d): %s", \ - __func__, __FILE__, __LINE__, STRINGIFY(expr)); \ + fprintf(stderr, \ + "ASSERTION FAILED in %s (%s:%d): %s\n", \ + __func__, __FILE__, __LINE__, \ + STRINGIFY(expr)); \ /* This may actually SIGSYS, due to seccomp, \ * but that will still get the job done \ */ \-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On 06/08/2024 02:27, David Gibson wrote:On Mon, Aug 05, 2024 at 04:10:27PM +0200, Laurent Vivier wrote:I don't know what, but there is something that prevent passts to generate a core dump (I tried abort(), (char *)0 = 0, it works with a simple program, not with passt). Moreover, if we use gdb it doesn't stop on the syscall but exit and we cannot locate the exit point. And as strace doesn't report the syscall that generates the SIGSYS, it makes hard to find which one it is. Thanks, LaurentWe use our own implementation of assert() because the glic implementation uses syscalls that aren't in our seccomp filter, see 7a8ed9459dfe ("Make assertions actually useful"). And we replaced it by an err(), followed by an abort() (that is also catched by seccomp).I think Stefano's said everything I would on the change itself, but..We don't have a coredump or a backtrace but we have at least the error message... only if logging is enabled.Whether we get a coredump shouldn't be affected by our weird ASSERT() here. If coredumps are enabled (which they're not by default on current distros, AFAICT), we should still get a coredump with the SIGSYS here, just as we would for a SIGABRT.
On Tue, 6 Aug 2024 11:27:40 +0200 Laurent Vivier <lvivier(a)redhat.com> wrote:On 06/08/2024 02:27, David Gibson wrote:This: isolation.c:374: prctl(PR_SET_DUMPABLE, 0);On Mon, Aug 05, 2024 at 04:10:27PM +0200, Laurent Vivier wrote:I don't know what, but there is something that prevent passts to generate a core dump (I tried abort(), (char *)0 = 0, it works with a simple program, not with passt).We use our own implementation of assert() because the glic implementation uses syscalls that aren't in our seccomp filter, see 7a8ed9459dfe ("Make assertions actually useful"). And we replaced it by an err(), followed by an abort() (that is also catched by seccomp).I think Stefano's said everything I would on the change itself, but..We don't have a coredump or a backtrace but we have at least the error message... only if logging is enabled.Whether we get a coredump shouldn't be affected by our weird ASSERT() here. If coredumps are enabled (which they're not by default on current distros, AFAICT), we should still get a coredump with the SIGSYS here, just as we would for a SIGABRT.Moreover, if we use gdb it doesn't stop on the syscall but exit and we cannot locate the exit point. And as strace doesn't report the syscall that generates the SIGSYS, it makes hard to find which one it is.This is probably caused by the seccomp filter instead. You could skip applying that by commenting this out: if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) || prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) die_perror("Failed to apply seccomp filter"); and find out which system call causes that... maybe we could add it to the 'make valgrind' target, then, or add a 'make debug' one. It's just important to clearly show that users should *not* use any such target for regular operation. -- Stefano
On Tue, Aug 06, 2024 at 11:27:40AM +0200, Laurent Vivier wrote:On 06/08/2024 02:27, David Gibson wrote:Oh... that'll be the PR_SET_DUMPABLE. That'll stop strace, gdb, core dumps, .. I routinely comment that out when debugging. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibsonOn Mon, Aug 05, 2024 at 04:10:27PM +0200, Laurent Vivier wrote:I don't know what, but there is something that prevent passts to generate a core dump (I tried abort(), (char *)0 = 0, it works with a simple program, not with passt). Moreover, if we use gdb it doesn't stop on the syscall but exit and we cannot locate the exit point. And as strace doesn't report the syscall that generates the SIGSYS, it makes hard to find which one it is.We use our own implementation of assert() because the glic implementation uses syscalls that aren't in our seccomp filter, see 7a8ed9459dfe ("Make assertions actually useful"). And we replaced it by an err(), followed by an abort() (that is also catched by seccomp).I think Stefano's said everything I would on the change itself, but..We don't have a coredump or a backtrace but we have at least the error message... only if logging is enabled.Whether we get a coredump shouldn't be affected by our weird ASSERT() here. If coredumps are enabled (which they're not by default on current distros, AFAICT), we should still get a coredump with the SIGSYS here, just as we would for a SIGABRT.