[PATCH v2 0/2] Some fixes for valgrind suppressions
With the new versions of compiler and valgrind in Fedora 39, I hit some spurious test failures. Here are some workarounds. Changes since v1: * Took entirely different approach, fixing the existing suppression * Extra patch allowing optimised valgrind builds David Gibson (2): valgrind: Adjust suppression for MSG_TRUNC with NULL buffer valgrind: Don't disable optimizations for valgrind builds Makefile | 2 +- tcp.c | 9 +++++++++ test/valgrind.supp | 3 +-- 3 files changed, 11 insertions(+), 3 deletions(-) -- 2.41.0
valgrind complains if we pass a NULL buffer to recv(), even if we use
MSG_TRUNC, in which case it's actually safe. For a long time we've had
a valgrind suppression for this. It singles out the recv() in
tcp_sock_consume(), the only place we use MSG_TRUNC.
However, tcp_sock_consume() only has a single caller, which makes it a
prime candidate for inlining. If inlined, it won't appear on the stack and
valgrind won't match the correct suppression.
It appears that certain compiler versions (for example gcc-13.2.1 in
Fedora 39) will inline this function even with the -O0 we use for valgrind
builds. This breaks the suppression leading to a spurious failure in the
tests.
There's not really any way to adjust the suppression itself without making
it overly broad (we don't want to match other recv() calls). So, as a hack
explicitly prevent inlining of this function when we're making a valgrind
build. To accomplish this add an explicit -DVALGRIND when making such a
build.
Signed-off-by: David Gibson
On Thu, 16 Nov 2023 16:43:49 +1100
David Gibson
valgrind complains if we pass a NULL buffer to recv(), even if we use MSG_TRUNC, in which case it's actually safe. For a long time we've had a valgrind suppression for this. It singles out the recv() in tcp_sock_consume(), the only place we use MSG_TRUNC.
However, tcp_sock_consume() only has a single caller, which makes it a prime candidate for inlining. If inlined, it won't appear on the stack and valgrind won't match the correct suppression.
It appears that certain compiler versions (for example gcc-13.2.1 in Fedora 39) will inline this function even with the -O0 we use for valgrind builds. This breaks the suppression leading to a spurious failure in the tests.
There's not really any way to adjust the suppression itself without making it overly broad (we don't want to match other recv() calls). So, as a hack explicitly prevent inlining of this function when we're making a valgrind build. To accomplish this add an explicit -DVALGRIND when making such a build.
Signed-off-by: David Gibson
--- Makefile | 2 +- tcp.c | 9 +++++++++ test/valgrind.supp | 3 +-- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index ff21459..57b2544 100644 --- a/Makefile +++ b/Makefile @@ -128,7 +128,7 @@ qrap: $(QRAP_SRCS) passt.h valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ getpid gettid kill clock_gettime mmap \ munmap open unlink gettimeofday futex -valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) +valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) -DVALGRIND valgrind: all
.PHONY: clean diff --git a/tcp.c b/tcp.c index cfcd40a..b86d433 100644 --- a/tcp.c +++ b/tcp.c @@ -2097,6 +2097,15 @@ static void tcp_conn_from_tap(struct ctx *c, * * Return: 0 on success, negative error code from recv() on failure */ +#ifdef VALGRIND +/* valgrind doesn't realise that passing a NULL buffer to recv() is ok if using + * MSG_TRUNC. We have a supression for this in the tests, but it relies on ^ pp
(I can fix it up on merge but I guess you prefer to repost). Rest of the series looks good to me. -- Stefano
On Thu, Nov 16, 2023 at 08:21:34AM +0100, Stefano Brivio wrote:
On Thu, 16 Nov 2023 16:43:49 +1100 David Gibson
wrote: valgrind complains if we pass a NULL buffer to recv(), even if we use MSG_TRUNC, in which case it's actually safe. For a long time we've had a valgrind suppression for this. It singles out the recv() in tcp_sock_consume(), the only place we use MSG_TRUNC.
However, tcp_sock_consume() only has a single caller, which makes it a prime candidate for inlining. If inlined, it won't appear on the stack and valgrind won't match the correct suppression.
It appears that certain compiler versions (for example gcc-13.2.1 in Fedora 39) will inline this function even with the -O0 we use for valgrind builds. This breaks the suppression leading to a spurious failure in the tests.
There's not really any way to adjust the suppression itself without making it overly broad (we don't want to match other recv() calls). So, as a hack explicitly prevent inlining of this function when we're making a valgrind build. To accomplish this add an explicit -DVALGRIND when making such a build.
Signed-off-by: David Gibson
--- Makefile | 2 +- tcp.c | 9 +++++++++ test/valgrind.supp | 3 +-- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/Makefile b/Makefile index ff21459..57b2544 100644 --- a/Makefile +++ b/Makefile @@ -128,7 +128,7 @@ qrap: $(QRAP_SRCS) passt.h valgrind: EXTRA_SYSCALLS += rt_sigprocmask rt_sigtimedwait rt_sigaction \ getpid gettid kill clock_gettime mmap \ munmap open unlink gettimeofday futex -valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) +valgrind: FLAGS:=-g -O0 $(filter-out -O%,$(FLAGS)) -DVALGRIND valgrind: all
.PHONY: clean diff --git a/tcp.c b/tcp.c index cfcd40a..b86d433 100644 --- a/tcp.c +++ b/tcp.c @@ -2097,6 +2097,15 @@ static void tcp_conn_from_tap(struct ctx *c, * * Return: 0 on success, negative error code from recv() on failure */ +#ifdef VALGRIND +/* valgrind doesn't realise that passing a NULL buffer to recv() is ok if using + * MSG_TRUNC. We have a supression for this in the tests, but it relies on ^ pp
(I can fix it up on merge but I guess you prefer to repost). Rest of the series looks good to me.
Oops, yes. Resent. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
When we plan to use valgrind, we need to build passt a bit differently:
* We need debug symbols so that valgrind can match problems it finds to
meaningful locations
* We need to allow additional syscalls in the seccomp filter, because
valgrind's wrappers need them
Currently we also disable optimization (-O0). This is unfortunate, because
it will make valgrind tests even slower than they'd otherwise be. Worse,
it's possible that the asm behaviour without optimization might be
different enough that valgrind could miss a use of uninitialized variable
or other fault it would detect.
I suspect this was originally done because without it inlining could mean
that suppressions we use don't reliably match the places we want them to.
Alas, it turns out this is true even with -O0. We've now implemented a
more robust workaround for this (explicit ((noinline)) attributes when
compiled with -DVALGRIND). So, we can re-enable optimization for valgrind
builds, making them closer to regular builds.
Signed-off-by: David Gibson
participants (2)
-
David Gibson
-
Stefano Brivio