On Tue, 20 May 2025 17:09:44 +0200
Eugenio Perez Martin
[...]
> Now if I isolate the vhost kernel thread [1] I get much better performance, as expected:
>
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  43.1 GBytes  37.1 Gbits/sec    0             sender
> [  5]   0.00-10.04  sec  43.1 GBytes  36.9 Gbits/sec                  receiver
> After analyzing the perf output, rep_movs_alternative is the most called function in all three processes: iperf3 (~20% Self), passt.avx2 (~15% Self) and vhost (~15% Self).
Interesting... s/most called function/function using the most cycles/, I suppose. So it looks somewhat similar to https://archives.passt.top/passt-dev/20241017021027.2ac9ea53@elisabeth/ now?
> But I don't see any of them consuming 100% of a CPU in top: pasta consumes ~85% CPU, the iperf3 client and server consume ~60% each, and vhost consumes ~53%.
> So... I have mixed feelings about this :). By "default" it seems to perform worse, but maybe my test is too synthetic.
Well, surely we can't ask Podman users to pin specific stuff to given CPU threads. :)
> There is room for improvement with the mentioned optimizations, so I'd keep applying them, continuing with UDP and TCP zerocopy, and developing zerocopy vhost rx.
That definitely makes sense to me.
> With these numbers I think the series should not be merged at the moment. I could send it as an RFC if you want, but I haven't applied the comments the first one received, POC style :).
I don't think you really need to spend time semi-polishing something just to have an RFC if you're still working on it. I guess the implementation will change substantially anyway once you factor in further optimisations.

-- 
Stefano