This should probably be filed on bugzilla but I can't be bothered
signing up for yet another service, sorry! O:-)

Short version: in a CentOS Stream 9 container, install the latest
build (0^20221015.gb3f3591-1) from the official COPR, then run

  $ passt --runas 65534 -e -t 1234
  Segmentation fault (core dumped)

Doing the same thing in a CentOS Stream 8 container doesn't result in
a crash, and the previous build (0^20220929.g06aa26f-1) is fine even
on CentOS Stream 9.

The backtrace produced by gdb doesn't look very illuminating, but
maybe it will make more sense to a developer:

  Starting program: /usr/bin/passt --runas 65534 -e -t 1234
  warning: Error disabling address space randomization: Operation not permitted
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".
  process 2856 is executing new program: /usr/bin/passt.avx2
  warning: Could not load shared library symbols for linux-vdso.so.1.
  Do you need "set solib-search-path" or "set sysroot"?
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".

  Program received signal SIGSEGV, Segmentation fault.
  0x000055663d5307ff in nl_sock_init (c=0x7fffd7fe4ed0, ns=false) at /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/netlink.c:78
  78      {
  (gdb) t a a bt

  Thread 1 (Thread 0x7fe8763da740 (LWP 2856) "passt.avx2"):
  #0  0x000055663d5307ff in nl_sock_init (c=0x7fffd7fe4ed0, ns=false) at /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/netlink.c:78
  #1  0x000055663d531296 in conf (c=<optimized out>, argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/conf.c:1547
  #2  0x000055663d5262e6 in main (argc=6, argv=0x7fffd82b2a98) at /usr/src/debug/passt-0^20221015.gb3f3591-1.el9.x86_64/passt.c:243

A very interesting thing that I've noticed is that the crash doesn't
occur when building from upstream sources (tag 2022_10_15.b3f3591, so
it should match what's in the RPM).

So I've tried looking into the compiler options used during the RPM
build, and the gcc command line for passt.avx2 looks like

  gcc -Wall -Wextra -pedantic -std=c99 -D_XOPEN_SOURCE=700 \
    -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -pie -fPIE -DPAGE_SIZE=4096 \
    -DNETNS_RUN_DIR=\"/run/netns\" -DPASST_AUDIT_ARCH=AUDIT_ARCH_X86_64 \
    -DRLIMIT_STACK_VAL=8192 -DARCH=\"x86_64\" \
    -DVERSION=\"0^20221015.gb3f3591-1.el9.x86_64\" -DTCP_HASH_NOINLINE \
    -DSIPHASH_20B_NOINLINE -DCSUM_UNALIGNED_NO_IPA -DHAS_SND_WND \
    -DHAS_BYTES_ACKED -DHAS_MIN_RTT -DHAS_GETRANDOM \
    -fstack-protector-strong -Ofast -mavx2 -ftree-vectorize \
    -funroll-loops -flto=auto -ffat-lto-objects -fexceptions -g \
    -grecord-gcc-switches -pipe -Wall -Werror=format-security \
    -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS \
    -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 \
    -fstack-protector-strong \
    -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=x86-64-v2 \
    -mtune=generic -fasynchronous-unwind-tables \
    -fstack-clash-protection -fcf-protection arch.c arp.c checksum.c \
    conf.c dhcp.c dhcpv6.c icmp.c igmp.c isolation.c lineread.c log.c \
    mld.c ndp.c netlink.c packet.c passt.c pasta.c pcap.c siphash.c \
    tap.c tcp.c tcp_splice.c udp.c util.c -o passt.avx2 -Wl,-z,relro \
    -Wl,--as-needed -Wl,-z,now \
    -specs=/usr/lib/rpm/redhat/redhat-hardened-ld \
    -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1

I tried making educated guesses at which ones among those could cause
trouble, and pretty quickly landed on the LTO stuff. Indeed, dropping

  -flto=auto -ffat-lto-objects

from the command results in a working binary, and adding

  %global _lto_cflags %nil

to the top of the spec file produces a working RPM.

Of course disabling LTO is a workaround, not a solution, especially
considering that the previous version didn't have any problem with
it, but hopefully there's enough information in here to allow the
developers to track down and resolve the underlying issue :)

-- 
Andrea Bolognani / Red Hat / Virtualization

Hi Andrea,

Thanks for reporting!

On Fri, 21 Oct 2022 10:05:03 -0700
Andrea Bolognani <abologna(a)redhat.com> wrote:

> This should probably be filed on bugzilla but I can't be bothered
> signing up for yet another service, sorry! O:-)

Ah, no worries at all, this ought to be fixed quickly enough.

> Short version: in a CentOS Stream 9 container, install the latest
> build (0^20221015.gb3f3591-1) from the official COPR, then run
>
>   $ passt --runas 65534 -e -t 1234
>   Segmentation fault (core dumped)
>
> [...]
>
> I tried making educated guesses at which ones among those could cause
> trouble, and pretty quickly landed on the LTO stuff. Indeed, dropping

Nice guess ;)

>   -flto=auto -ffat-lto-objects
>
> from the command results in a working binary, and adding
>
>   %global _lto_cflags %nil
>
> to the top of the spec file produces a working RPM.

Uh-oh. We recently worked around a couple of issues we hit with LTO
and gcc 12 (which doesn't automatically imply gcc has an issue, of
course):

  06aa26fcf398 Makefile: Hack for optimised-away store in ndp() before checksum calculation
  https://passt.top/passt/commit/?id=06aa26fcf398f5d19ab46e42996190d7f95e837a

  505a33e9f9d9 Makefile: Extend noinline workarounds for LTO and -O2 to gcc 12
  https://passt.top/passt/commit/?id=505a33e9f9d9d766e39fd9c54c6cb2136ae99ecb

...I wonder if this is somehow related. Could you also quickly try to
start it with strace and report a couple of lines before the mischief?

> Of course disabling LTO is a workaround, not a solution, especially
> considering that the previous version didn't have any problem with
> it, but hopefully there's enough information in here to allow the
> developers to track down and resolve the underlying issue :)

Probably yes. :) I'm looking into this quickly now, but I'll be
travelling tomorrow. If I fail, I hope David is faster than me ;)

-- 
Stefano

On Fri, 21 Oct 2022 21:15:07 +0200
Stefano Brivio <sbrivio(a)redhat.com> wrote:

> On Fri, 21 Oct 2022 20:48:20 +0200
> Stefano Brivio <sbrivio(a)redhat.com> wrote:
>
> > Could you also quickly try to start it with strace and report a
> > couple of lines before the mischief?
>
> Never mind, just reproduced...

Another workaround:

diff --git a/util.h b/util.h
index 27829b1..64b9a26 100644
--- a/util.h
+++ b/util.h
@@ -72,7 +72,7 @@
 #define IPV4_IS_LOOPBACK(addr)						\
 	((addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET)
 
-#define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 4)
+#define NS_FN_STACK_SIZE	(RLIMIT_STACK_VAL * 1024 / 10)
 #define NS_CALL(fn, arg)						\
 	do {								\
 		char ns_fn_stack[NS_FN_STACK_SIZE];			\

...we need to harden this "against" -fstack-protector-strong when
inlining gets quite extreme due to LTO, with some build-time
assertions, or a more reasonable (and involved) calculation of what
ns_fn_stack really needs.

I'll try to send a patch soon (again, if nobody beats me at it).

-- 
Stefano

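The arithmetic behind this workaround, and the kind of build-time assertion mentioned above, could be sketched roughly as follows. RLIMIT_STACK_VAL=8192 is taken from the gcc command line earlier in the thread (the "* 1024" in util.h suggests it is in KiB), and the two divisors come from the diff. Everything else is an assumption made for illustration: NS_CALL_MAX_INLINED, STACK_HEADROOM_BYTES, the _OLD/_NEW suffixes and both helper functions are hypothetical names, and the idea that several ns_fn_stack buffers end up in a single frame once LTO inlines the callers is only one reading of the remark above, not something the thread confirms. This is not the patch that was eventually applied.

```c
#include <assert.h>
#include <stddef.h>

/* From the thread: -DRLIMIT_STACK_VAL=8192, multiplied by 1024 in util.h */
#define RLIMIT_STACK_VAL	8192
#define STACK_LIMIT_BYTES	((size_t)RLIMIT_STACK_VAL * 1024)

/* Per-call buffer sizes before and after the diff above */
#define NS_FN_STACK_SIZE_OLD	(STACK_LIMIT_BYTES / 4)
#define NS_FN_STACK_SIZE_NEW	(STACK_LIMIT_BYTES / 10)

/* Hypothetical: how many NS_CALL() buffers might share one frame if the
 * compiler inlines their callers and does not overlap the stack slots.
 */
#define NS_CALL_MAX_INLINED	4

/* Hypothetical: room left for everything else living on the stack */
#define STACK_HEADROOM_BYTES	((size_t)64 * 1024)

/* Worst-case bytes taken by 'inlined_calls' buffers of 'per_call' bytes */
static size_t ns_stack_worst_case(size_t per_call, size_t inlined_calls)
{
	return per_call * inlined_calls;
}

/* Non-zero if the worst case plus headroom still fits the stack limit */
static int ns_stack_fits(size_t per_call, size_t inlined_calls)
{
	return ns_stack_worst_case(per_call, inlined_calls) +
	       STACK_HEADROOM_BYTES <= STACK_LIMIT_BYTES;
}

/* Build-time check that works with -std=c99: the array size becomes -1,
 * and compilation fails, if the reduced buffer size exceeds the budget.
 */
typedef char ns_stack_budget_ok[
	(NS_FN_STACK_SIZE_NEW * NS_CALL_MAX_INLINED + STACK_HEADROOM_BYTES
	 <= STACK_LIMIT_BYTES) ? 1 : -1];
```

With these assumed numbers, four 2 MiB buffers alone already add up to the whole 8 MiB limit, so ns_stack_fits(NS_FN_STACK_SIZE_OLD, 4) is false, while four buffers of roughly 819 KiB leave plenty of room. A negative array size is used instead of C11's _Static_assert only because the build shown in the thread passes -std=c99 -pedantic.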
On Fri, 21 Oct 2022 10:05:03 -0700
Andrea Bolognani <abologna(a)redhat.com> wrote:

> This should probably be filed on bugzilla but I can't be bothered
> signing up for yet another service, sorry! O:-)
>
> Short version: in a CentOS Stream 9 container, install the latest
> build (0^20221015.gb3f3591-1) from the official COPR, then run

It should now be fixed in 0^20221022.gb68da10-1:

  https://download.copr.fedorainfracloud.org/results/sbrivio/passt/centos-str…

Persistent mirror at:

  https://passt.top/builds/copr/centos-stream-9-x86_64/04973930-passt/

-- 
Stefano