Some of these are actual fixes, others are just hacks to keep the tests running while we figure out the underlying issues. Part of this series should be reverted once we find out why iperf3 clients and servers get stuck now and then.

Stefano Brivio (5):
  test/lib: Run also iperf3 clients in background, revert to time-based wait
  test/lib: Wait for kernel to free up ports used by iperf3 before reusing them
  test/perf: Wait for neper servers in guest to be ready before starting client
  test/lib: Wait for DHCPv4 before starting DHCPv6 client in two_guests test
  test/distro: Update workarounds for Ubuntu 22.04 on s390x

 test/distro/ubuntu  |  4 +++-
 test/lib/setup      |  1 +
 test/lib/test       | 11 ++++++++---
 test/perf/passt_tcp |  4 ++++
 test/perf/passt_udp |  2 ++
 5 files changed, 18 insertions(+), 4 deletions(-)

-- 
2.35.1
Unfortunately, this partially counters David's recent efforts to speed up these tests, but it looks like iperf3 clients don't always terminate, in some rare cases I couldn't isolate yet.

For the time being, reintroduce the time-based wait, now using the configurable test duration, and terminate the servers at the end of it, in case they're stuck.

There's no point in keeping the later 'sleep 2', so drop it, and while at it, make sure that stuck servers have time to flush their JSON output before we use it.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 test/lib/test | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/test/lib/test b/test/lib/test
index d68ade4..7259383 100755
--- a/test/lib/test
+++ b/test/lib/test
@@ -40,7 +40,7 @@ test_iperf3() {
 
 	sleep 1 # Wait for server to be ready
 
-	pane_or_context_run "${__cctx}" \
+	pane_or_context_run_bg "${__cctx}" \
 		'(' \
 		' for i in $(seq 0 '${__procs}'); do' \
 		' iperf3 -c '${__dest}' -p '${__port} \
@@ -49,9 +49,12 @@ test_iperf3() {
 		' wait' \
 		')'
 
+	sleep $((__time + 5))
+
 	# If client fails to deliver control message, tell server we're done
-	pane_or_context_run "${__sctx}" \
-		'sleep 2; kill -INT $(cat s*.pid); rm s*.pid'
+	pane_or_context_run "${__sctx}" 'kill -INT $(cat s*.pid); rm s*.pid'
+
+	sleep 1 # ...and wait for output to be flushed
 
 	__jval=".end.sum_received.bits_per_second"
 	for __opt in ${@}; do
-- 
2.35.1
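The time-based pattern in this patch (clients in background, a fixed wait of the test duration plus a margin, then signalling servers through their pidfiles and giving them a moment to flush output) can be reproduced outside the harness with a small bash sketch. `start_stub_server` and `run_timed_test` are hypothetical names, and the stub servers merely stand in for `iperf3 -s`: they write a result file when signalled, loosely mimicking iperf3 flushing its JSON report. The sketch sends SIGTERM rather than the patch's SIGINT, because background jobs started from a shell without job control ignore SIGINT.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers illustrating a time-based wait.

# Stand-in for an iperf3 server: records its PID and, when terminated,
# writes a result file (as iperf3 would flush its JSON report).
start_stub_server() {
	local i="$1" dir="$2"
	(
		trap 'echo "{\"server\": ${i}}" > "${dir}/s${i}.json"; exit 0' TERM
		# 'wait' returns as soon as a trapped signal arrives
		while :; do sleep 1 & wait $!; done
	) &
	echo $! > "${dir}/s${i}.pid"
}

# run_timed_test PROCS_MINUS_ONE DURATION DIR
run_timed_test() {
	local procs="$1" duration="$2" dir="$3" i

	for i in $(seq 0 "${procs}"); do
		start_stub_server "${i}" "${dir}"
	done
	sleep 1		# wait for servers to be ready

	# Clients run in background and are never waited on directly,
	# so a client that hangs can't block the test
	(
		for i in $(seq 0 "${procs}"); do
			sleep "${duration}" &	# stand-in for 'iperf3 -c ...'
		done
		wait
	) &

	sleep $((duration + 1))	# time-based wait (the patch uses __time + 5)

	# Tell (possibly stuck) servers we're done, then let them flush output
	kill -TERM $(cat "${dir}"/s*.pid)
	rm -f "${dir}"/s*.pid
	sleep 1
}
```

For example, with `dir="$(mktemp -d)"`, running `run_timed_test 1 2 "${dir}"` starts two stub servers, returns after about five seconds, and leaves `s0.json` and `s1.json` in `${dir}`.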
If we start another server on the same port right away, it might fail to bind the port. A small delay appears to be needed; I'm not entirely sure why at this point.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 test/lib/test | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/test/lib/test b/test/lib/test
index 7259383..558d0f0 100755
--- a/test/lib/test
+++ b/test/lib/test
@@ -68,6 +68,8 @@ test_iperf3() {
 		'for i in $(seq 0 '${__procs}'); do rm s${i}.json; done'
 
 	TEST_ONE_subs="$(list_add_pair "${TEST_ONE_subs}" "__${__var}__" "${__bw}" )"
+
+	sleep 3 # Wait for kernel to free up ports
 }
 
 test_one_line() {
-- 
2.35.1
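Instead of a fixed `sleep 3`, one alternative is to retry starting the server until it manages to bind. This is a sketch, not what the patch does: `retry_cmd` is a hypothetical helper, and it assumes the wrapped server command exits with a non-zero status when the bind fails. Output from failed attempts may still end up in the same redirected file, which matters if that file is later parsed as JSON.

```shell
#!/usr/bin/env bash
# retry_cmd ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between failed attempts. Returns 0 as soon as
# CMD succeeds, or 1 if every attempt failed.
retry_cmd() {
	local attempts="$1" delay="$2" n
	shift 2

	for ((n = 1; n <= attempts; n++)); do
		"$@" && return 0
		if ((n < attempts)); then
			sleep "${delay}"
		fi
	done
	return 1
}
```

For instance, `retry_cmd 10 1 start_server`, where `start_server` stands for whatever starts a server on the port, keeps trying for roughly ten seconds and stops waiting as soon as the port is available, rather than always paying the full delay.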
Starting the tcp_rr, tcp_crr and udp_rr servers in the guest takes a bit longer than starting the corresponding clients on the host, and we end up starting clients before servers unless we add a delay there.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 test/perf/passt_tcp | 4 ++++
 test/perf/passt_udp | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/test/perf/passt_tcp b/test/perf/passt_tcp
index 5ba5450..8b912c4 100644
--- a/test/perf/passt_tcp
+++ b/test/perf/passt_tcp
@@ -167,6 +167,7 @@ lat -
 lat -
 lat -
 guestb tcp_rr --nolog -P 10001 -C 10011 -6
+sleep 1
 nsout LAT tcp_rr --nolog -P 10001 -C 10011 -6 -c -H ::1 | sed -n 's/^throughput=\(.*\)/\1/p'
 lat __LAT__ 200 150
 
@@ -177,6 +178,7 @@ lat -
 lat -
 lat -
 guestb tcp_crr --nolog -P 10001 -C 10011 -6
+sleep 1
 nsout LAT tcp_crr --nolog -P 10001 -C 10011 -6 -c -H ::1 | sed -n 's/^throughput=\(.*\)/\1/p'
 lat __LAT__ 500 350
 
@@ -210,6 +212,7 @@ lat -
 lat -
 lat -
 guestb tcp_rr --nolog -P 10001 -C 10011 -4
+sleep 1
 nsout LAT tcp_rr --nolog -P 10001 -C 10011 -4 -c -H 127.0.0.1 | sed -n 's/^throughput=\(.*\)/\1/p'
 lat __LAT__ 200 150
 
@@ -220,6 +223,7 @@ lat -
 lat -
 lat -
 guestb tcp_crr --nolog -P 10001 -C 10011 -4
+sleep 1
 nsout LAT tcp_crr --nolog -P 10001 -C 10011 -4 -c -H 127.0.0.1 | sed -n 's/^throughput=\(.*\)/\1/p'
 lat __LAT__ 500 300
 
diff --git a/test/perf/passt_udp b/test/perf/passt_udp
index fd2ddc1..3ad630e 100644
--- a/test/perf/passt_udp
+++ b/test/perf/passt_udp
@@ -138,6 +138,7 @@ lat -
 lat -
 lat -
 guestb udp_rr --nolog -P 10001 -C 10011 -6
+sleep 1
 nsout LAT udp_rr --nolog -P 10001 -C 10011 -6 -c -H ::1 | sed -n 's/^throughput=\(.*\)/\1/p'
 lat __LAT__ 200 150
 ns ip link set dev lo mtu 65535
@@ -171,6 +172,7 @@ lat -
 lat -
 lat -
 guestb udp_rr --nolog -P 10001 -C 10011 -4
+sleep 1
 nsout LAT udp_rr --nolog -P 10001 -C 10011 -4 -c -H 127.0.0.1 | sed -n 's/^throughput=\(.*\)/\1/p'
 lat __LAT__ 200 150
 ns ip link set dev lo mtu 65535
-- 
2.35.1
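A fixed `sleep 1` could still be too short on a slow guest. A polling alternative, sketched below under a few assumptions, is to wait in the guest until the server's TCP socket is in LISTEN state before starting the client. The helpers `is_listening` and `wait_listening` are hypothetical; they read the kernel's /proc/net/tcp and /proc/net/tcp6 tables, where local ports are hexadecimal and state `0A` means LISTEN. They also assume that neper's `-C` option (10011 here) is the TCP control port the client connects to first, for udp_rr as well.

```shell
#!/usr/bin/env bash
# is_listening PORT [TABLE...]
# Succeed if a TCP socket bound to local PORT is in LISTEN state (st == 0A)
# in the given /proc/net/tcp-style tables (default: the kernel's own).
is_listening() {
	local port_hex f
	local tables=()

	port_hex="$(printf '%04X' "$1")"
	shift
	[ $# -gt 0 ] || set -- /proc/net/tcp /proc/net/tcp6
	for f in "$@"; do
		[ -r "${f}" ] && tables+=("${f}")
	done
	[ ${#tables[@]} -gt 0 ] || return 1

	# Column 2 is local_address as ADDR:PORT (hex), column 4 is the state;
	# FNR > 1 skips the header line of each table
	awk -v p="${port_hex}" '
		FNR > 1 {
			n = split($2, a, ":")
			if (a[n] == p && $4 == "0A") found = 1
		}
		END { exit !found }
	' "${tables[@]}"
}

# wait_listening PORT TIMEOUT [TABLE...]: poll once per second, up to TIMEOUT s
wait_listening() {
	local port="$1" timeout="$2" t
	shift 2

	for ((t = 0; t < timeout; t++)); do
		is_listening "${port}" "$@" && return 0
		sleep 1
	done
	is_listening "${port}" "$@"
}
```

Run in the guest as, say, `wait_listening 10011 10` (assuming the helper is available there), this would return as soon as the server is ready, instead of always sleeping for a fixed time that may or may not be enough.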
I'm not sure why, but dhclient hangs if we start the DHCPv6 client right after the DHCPv4 one. Waiting in between reflects what we do in the passt_in_ns setup steps.

Eventually, this whole block could go away if we let pasta configure this network namespace with --config-net.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 test/lib/setup | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/lib/setup b/test/lib/setup
index d7921bf..7e3f6c3 100755
--- a/test/lib/setup
+++ b/test/lib/setup
@@ -216,6 +216,7 @@ setup_two_guests() {
 
 	context_run qemu_1 "/sbin/dhclient -4 --no-pid ${__ifname}"
 	context_run qemu_2 "/sbin/dhclient -4 --no-pid ${__ifname}"
+	sleep 2
 	context_run qemu_1 "/sbin/dhclient -6 --no-pid ${__ifname}"
 	context_run qemu_2 "/sbin/dhclient -6 --no-pid ${__ifname}"
 
-- 
2.35.1
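If the fixed `sleep 2` turns out to be fragile, a polling variant could check, inside each guest, that DHCPv4 has configured a global IPv4 address and that the IPv6 link-local address is no longer tentative before dhclient -6 starts. The link-local check is an untested hypothesis, not a diagnosis from this thread: DHCPv6 clients send from their link-local address, which can't be used until duplicate address detection completes. `ipv4_configured`, `ipv6_ll_ready` and `wait_for` are hypothetical helpers built on iproute2's `ip address show` with its `scope` and `tentative` filters.

```shell
#!/usr/bin/env bash
# Hypothetical readiness checks, meant to run inside each guest.

# ipv4_configured IFNAME: a global IPv4 address is present (DHCPv4 done)
ipv4_configured() {
	[ -n "$(ip -4 -o address show dev "$1" scope global)" ]
}

# ipv6_ll_ready IFNAME: a link-local IPv6 address exists and no address on
# IFNAME is still tentative (duplicate address detection not finished)
ipv6_ll_ready() {
	[ -n "$(ip -6 -o address show dev "$1" scope link)" ] &&
	[ -z "$(ip -6 -o address show dev "$1" tentative)" ]
}

# wait_for TIMEOUT CMD [ARGS...]: poll CMD once per second, up to TIMEOUT s
wait_for() {
	local timeout="$1" t
	shift

	for ((t = 0; t < timeout; t++)); do
		"$@" && return 0
		sleep 1
	done
	"$@"
}
```

In each guest, something like `wait_for 10 ipv4_configured "${__ifname}" && wait_for 10 ipv6_ll_ready "${__ifname}"` could then take the place of the fixed delay before the dhclient -6 lines.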
On Fri, Sep 23, 2022 at 02:53:26AM +0200, Stefano Brivio wrote:
> I'm not sure why, but dhclient hangs otherwise. This reflects what we
> do in the passt_in_ns setup steps. Eventually, this whole block could
> go away if we let pasta configure this network namespace with
> --config-net.

Heh.. I have a patch to do exactly that in the works.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
If we use dhclient without creating a complete network configuration, systemd-resolved stops working after a while, and this sometimes happens while we're still installing packages. Stop it, together with systemd-networkd, and remove the dhclient hook that prevents dhclient from overwriting /etc/resolv.conf.

While at it, it looks like removing snapd and needrestart actually takes more time than keeping them: drop that line.

Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
 test/distro/ubuntu | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/test/distro/ubuntu b/test/distro/ubuntu
index aa42c99..343fa03 100644
--- a/test/distro/ubuntu
+++ b/test/distro/ubuntu
@@ -191,7 +191,9 @@ test Ubuntu 22.04 (Jammy Jellyfish), s390x
 host ./qrap 5 qemu-system-s390x -m 2048 -smp 2 -serial stdio -nodefaults -nographic __BASEPATH__/prepared-jammy-server-cloudimg-s390x.img -net socket,fd=5 -net nic,model=virtio -device virtio-rng-ccw -snapshot
 host export DEBIAN_FRONTEND=noninteractive
-host apt-get -y remove needrestart snapd
+host service systemd-networkd stop
+host service systemd-resolved stop
+host rm /etc/dhcp/dhclient-enter-hooks.d/resolved-enter
 host dhclient -4
 dns_ready_wait
 host apt-get update
-- 
2.35.1
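The reasoning here hinges on /etc/resolv.conf: on Ubuntu it normally points at systemd-resolved's stub listener, 127.0.0.53, which can't answer once the service is stopped, so dhclient presumably needs to be able to write real nameservers there. A quick check of that condition might look like the hypothetical helper below; it is not part of the test suite, and it is not how dns_ready_wait is implemented.

```shell
#!/usr/bin/env bash
# resolv_conf_usable [FILE]
# Hypothetical check: succeed if FILE (default /etc/resolv.conf) lists at
# least one nameserver other than systemd-resolved's stub, 127.0.0.53,
# which stops answering once systemd-resolved is stopped.
resolv_conf_usable() {
	local file="${1:-/etc/resolv.conf}"

	awk '$1 == "nameserver" && $2 != "127.0.0.53" { found = 1 }
	     END { exit !found }' "${file}"
}
```

Run after `dhclient -4`, a failure would suggest that dhclient still isn't allowed to rewrite the file, for example because the enter hook is still in place.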