On Wed, 9 Oct 2024 11:07:07 +0200
Laurent Vivier <lvivier(a)redhat.com> wrote:

> This series of patches adds vhost-user support to passt
> and then allows passt to connect to QEMU network backend using
> virtqueue rather than a socket.
>
> With QEMU, rather than using to connect:
>
>   -netdev stream,id=s,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket
>
> we will use:
>
>   -chardev socket,id=chr0,path=/tmp/passt_1.socket
>   -netdev vhost-user,id=netdev0,chardev=chr0
>   -device virtio-net,netdev=netdev0
>   -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE
>   -numa node,memdev=memfd0
>
> The memory backend is needed to share data between passt and QEMU.
>
> Performance comparison between "-netdev stream" and "-netdev vhost-user":

On my setup, with a few tweaks (don't ask me why... we should figure out
eventually):

--
diff --git a/test/perf/passt_vu_tcp b/test/perf/passt_vu_tcp
index b434008..76bdd48 100644
--- a/test/perf/passt_vu_tcp
+++ b/test/perf/passt_vu_tcp
@@ -38,10 +38,10 @@ hout FREQ_PROCFS (echo "scale=1"; sed -n 's/cpu MHz.*: \([0-9]*\)\..*$/(\1+10^2\
 hout FREQ_CPUFREQ (echo "scale=1"; printf '( %i + 10^5 / 2 ) / 10^6\n' $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq) ) | bc -l
 hout FREQ [ -n "__FREQ_CPUFREQ__" ] && echo __FREQ_CPUFREQ__ || echo __FREQ_PROCFS__
 
-set THREADS 4
+set THREADS 4-6
 set TIME 5
 set OMIT 0.1
-set OPTS -Z -P __THREADS__ -l 1M -O__OMIT__ -N
+set OPTS -Z -O__OMIT__ -N
 
 info Throughput in Gbps, latency in µs, __THREADS__ threads at __FREQ__ GHz
 report passt_vu tcp __THREADS__ __FREQ__
@@ -55,16 +55,16 @@ iperf3s ns 10002
 bw -
 bw -
 guest ip link set dev __IFNAME__ mtu 1280
-iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 16M
+iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 16M -l 1M -P 4
 bw __BW__ 1.2 1.5
 guest ip link set dev __IFNAME__ mtu 1500
-iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 32M
+iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 32M -l 1M -P 4
 bw __BW__ 1.6 1.8
 guest ip link set dev __IFNAME__ mtu 9000
-iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 64M
+iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 64M -l 1M -P 4
 bw __BW__ 4.0 5.0
 guest ip link set dev __IFNAME__ mtu 65520
-iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 64M
+iperf3 BW guest __MAP_NS6__ 10002 __TIME__ __OPTS__ -w 256M -l 1M -P 4
 bw __BW__ 7.0 8.0
 
 iperf3k ns
@@ -93,22 +93,22 @@ tr TCP throughput over IPv4: guest to host
 iperf3s ns 10002
 
 guest ip link set dev __IFNAME__ mtu 256
-iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 2M
+iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 2M -l 1M -P 4
 bw __BW__ 0.2 0.3
 guest ip link set dev __IFNAME__ mtu 576
-iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 4M
+iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 4M -l 1M -P 4
 bw __BW__ 0.5 0.8
 guest ip link set dev __IFNAME__ mtu 1280
-iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 8M
+iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 8M -l 1M -P 4
 bw __BW__ 1.2 1.5
 guest ip link set dev __IFNAME__ mtu 1500
-iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 16M
+iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 16M -l 1M -P 4
 bw __BW__ 1.6 1.8
 guest ip link set dev __IFNAME__ mtu 9000
-iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 64M
+iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 64M -l 1M -P 4
 bw __BW__ 4.0 5.0
 guest ip link set dev __IFNAME__ mtu 65520
-iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 64M
+iperf3 BW guest __MAP_NS4__ 10002 __TIME__ __OPTS__ -w 256M -l 1M -P 4
 bw __BW__ 7.0 8.0
 
 iperf3k ns
@@ -145,7 +145,7 @@ bw -
 bw -
 bw -
 bw -
-iperf3 BW ns ::1 10001 __TIME__ __OPTS__ -w 32M
+iperf3 BW ns ::1 10001 __TIME__ __OPTS__ -w 256M -l 16k -P 6
 bw __BW__ 6.0 6.8
 
 iperf3k guest
@@ -181,7 +181,7 @@ bw -
 bw -
 bw -
 bw -
-iperf3 BW ns 127.0.0.1 10001 __TIME__ __OPTS__ -w 32M
+iperf3 BW ns 127.0.0.1 10001 __TIME__ __OPTS__ -w 256M -l 16k -P 6
 bw __BW__ 6.0 6.8
 
 iperf3k guest
--

I'm getting an even bigger
improvement in throughput (and also significantly lower latency).

Non-vhost-user first:

--
=== perf/passt_tcp
passt: throughput and latency
Throughput in Gbps, latency in µs, 4 threads at 3.6 GHz
                                    MTU: |   256B |   576B |  1280B |  1500B |  9000B | 65520B |
                                         |--------|--------|--------|--------|--------|--------|
 TCP throughput over IPv6: guest to host |      - |      - |    6.3 |    6.8 |   18.4 |   21.4 |
 TCP RR latency over IPv6: guest to host |      - |      - |      - |      - |      - |     52 |
TCP CRR latency over IPv6: guest to host |      - |      - |      - |      - |      - |    141 |
                                         |--------|--------|--------|--------|--------|--------|
 TCP throughput over IPv4: guest to host |    0.8 |    3.0 |    5.6 |    7.4 |   19.6 |   21.3 |
 TCP RR latency over IPv4: guest to host |      - |      - |      - |      - |      - |     58 |
TCP CRR latency over IPv4: guest to host |      - |      - |      - |      - |      - |    132 |
                                         |--------|--------|--------|--------|--------|--------|
 TCP throughput over IPv6: host to guest |      - |      - |      - |      - |      - |   18.0 |
 TCP RR latency over IPv6: host to guest |      - |      - |      - |      - |      - |     50 |
TCP CRR latency over IPv6: host to guest |      - |      - |      - |      - |      - |    115 |
                                         |--------|--------|--------|--------|--------|--------|
 TCP throughput over IPv4: host to guest |      - |      - |      - |      - |      - |   17.8 |
 TCP RR latency over IPv4: host to guest |      - |      - |      - |      - |      - |     60 |
TCP CRR latency over IPv6: host to guest |      - |      - |      - |      - |      - |     94 |
                                         '--------'--------'--------'--------'--------'--------'
...passed.

=== perf/passt_udp
passt: throughput and latency
Throughput in Gbps, latency in µs, 2 threads at 3.6 GHz
                                 pktlen: |   256B |   576B |  1280B |  1500B |  9000B | 65520B |
                                         |--------|--------|--------|--------|--------|--------|
 UDP throughput over IPv6: guest to host |      - |      - |    3.4 |    4.1 |   12.3 |   18.2 |
 UDP RR latency over IPv6: guest to host |      - |      - |      - |      - |      - |     49 |
                                         |--------|--------|--------|--------|--------|--------|
 UDP throughput over IPv4: guest to host |    0.8 |    1.9 |    3.7 |    4.0 |   11.1 |   17.2 |
 UDP RR latency over IPv4: guest to host |      - |      - |      - |      - |      - |     52 |
                                         |--------|--------|--------|--------|--------|--------|
 UDP throughput over IPv6: host to guest |      - |      - |    2.6 |    3.1 |    5.4 |   17.9 |
 UDP RR latency over IPv6: host to guest |      - |      - |      - |      - |      - |     48 |
                                         |--------|--------|--------|--------|--------|--------|
 UDP throughput over IPv4: host to guest |    0.9 |    2.3 |    5.6 |    7.4 |   12.8 |   16.6 |
 UDP RR latency over IPv4: host to guest |      - |      - |      - |      - |      - |     48 |
                                         '--------'--------'--------'--------'--------'--------'
...passed.

[...]

=== perf/passt_vu_tcp
passt: throughput and latency
Throughput in Gbps, latency in µs, 4-6 threads at 3.6 GHz
                                    MTU: |   256B |   576B |  1280B |  1500B |  9000B | 65520B |
                                         |--------|--------|--------|--------|--------|--------|
 TCP throughput over IPv6: guest to host |      - |      - |    8.2 |   10.1 |   14.7 |   22.3 |
 TCP RR latency over IPv6: guest to host |      - |      - |      - |      - |      - |     30 |
TCP CRR latency over IPv6: guest to host |      - |      - |      - |      - |      - |     88 |
                                         |--------|--------|--------|--------|--------|--------|
 TCP throughput over IPv4: guest to host |    1.2 |    5.3 |    9.2 |   10.1 |   18.5 |   23.7 |
 TCP RR latency over IPv4: guest to host |      - |      - |      - |      - |      - |     31 |
TCP CRR latency over IPv4: guest to host |      - |      - |      - |      - |      - |     93 |
                                         |--------|--------|--------|--------|--------|--------|
 TCP throughput over IPv6: host to guest |      - |      - |      - |      - |      - |   42.1 |
 TCP RR latency over IPv6: host to guest |      - |      - |      - |      - |      - |     30 |
TCP CRR latency over IPv6: host to guest |      - |      - |      - |      - |      - |     88 |
                                         |--------|--------|--------|--------|--------|--------|
 TCP throughput over IPv4: host to guest |      - |      - |      - |      - |      - |   48.8 |
 TCP RR latency over IPv4: host to guest |      - |      - |      - |      - |      - |     35 |
TCP CRR latency over IPv6: host to guest |      - |      - |      - |      - |      - |     79 |
                                         '--------'--------'--------'--------'--------'--------'
...passed.

=== perf/passt_vu_udp
passt: throughput and latency
Throughput in Gbps, latency in µs, 2 threads at 3.6 GHz
                                 pktlen: |   256B |   576B |  1280B |  1500B |  9000B | 65520B |
                                         |--------|--------|--------|--------|--------|--------|
 UDP throughput over IPv6: guest to host |      - |      - |    2.2 |    2.6 |   14.1 |   33.4 |
 UDP RR latency over IPv6: guest to host |      - |      - |      - |      - |      - |     32 |
                                         |--------|--------|--------|--------|--------|--------|
 UDP throughput over IPv4: guest to host |    0.4 |    1.1 |    2.6 |    2.3 |   13.6 |   28.9 |
 UDP RR latency over IPv4: guest to host |      - |      - |      - |      - |      - |     31 |
                                         |--------|--------|--------|--------|--------|--------|
 UDP throughput over IPv6: host to guest |      - |      - |    3.1 |    3.7 |   18.7 |   29.4 |
 UDP RR latency over IPv6: host to guest |      - |      - |      - |      - |      - |     35 |
                                         |--------|--------|--------|--------|--------|--------|
 UDP throughput over IPv4: host to guest |    0.5 |    1.3 |    3.3 |    3.8 |   18.6 |   37.5 |
 UDP RR latency over IPv4: host to guest |      - |      - |      - |      - |      - |     35 |
                                         '--------'--------'--------'--------'--------'--------'
...passed.
--

passt is CPU-bound only on host-to-guest tests. But there, iperf3 seems to
actually use more CPU time than passt itself.

-- 
Stefano