On Thu, 26 Jan 2023 00:21:33 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:
On Wed, 25 Jan 2023 14:13:44 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:

This turned out to be a combination of three different issues:

- left-over patches in my local qemu tree (and build) trying to address the virtio-net TX hang ultimately fixed by kernel commit d71ebe8114b4 ("virtio-net: correctly enable callback during start_xmit"). I'm using the latest upstream now, clean

- the issue you reported at https://bugs.passt.top/show_bug.cgi?id=41, I just posted a patch for it

- the issue introduced by "tcp: Combine two parts of pasta tap send path together", patch also posted

With these three sorted, finally I could apply this series! Apologies for the delay.

-- 
Stefano

On Tue, Jan 24, 2023 at 10:20:43PM +0100, Stefano Brivio wrote:
[...]
On Fri, 6 Jan 2023 11:43:04 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:

Drat, I didn't encounter that. Any chance you could bisect to figure out which patch specifically seems to trigger it?

Although we have an abstraction for the "slow path" (DHCP, NDP) guest bound packets, the TCP and UDP forwarding paths write directly to the tap fd. However, it turns out how they send frames to the tap device is more similar than it originally appears.

This series unifies the low-level tap send functions for TCP and UDP, and makes some clean ups along the way.

This is based on my earlier outstanding series.

For some reason, performance tests consistently get stuck (both TCP and UDP, sometimes throughput, sometimes latency tests) with this series, and not without it, but I don't see any possible relationship with that.

I wonder if this could be related to the stalls I'm debugging, although those didn't appear on the perf tests and also occur on main. I have now discovered they seem to be masked by large socket buffer sizes - more info at https://bugs.passt.top/show_bug.cgi?id=41

Maybe the subsequent failures (or even this one) could actually be related, and triggered somehow by some change in timing. I'm still clueless at the moment.