I alluded to this in my last patchset, but here's the information I've gathered on the current problem I'm hitting running the passt distro tests. It's pretty weird.

On at least two occasions the tests have stalled during the Fedora 30, aarch64 test, with the guest getting a timeout downloading the package lists before installing new packages. For me this is selecting the mirror mirror.2degrees.nz. I'm not sure which factors are relevant to reproducing the problem though.

 * The problem seems to be that the download suddenly stops progressing partway through, causing dnf to eventually time out

 * If I manually try a "dnf clean all && dnf makecache -v" using the same guest image, it doesn't fail every time, but it fails significantly more often than not

 * It doesn't fail on the same file every time

 * I haven't been able to reproduce manually downloading the failing file with curl (tried repeatedly)

 * If I restrict dnf to a single repository rather than the whole set, I haven't managed to reproduce the problem

 * If I use qemu's -net user slirp instead of passt with the same disk image, I haven't been able to reproduce the problem (tried a bunch of times)

 * I've reproduced with the guest using both IPv4 and IPv6

 * I have reproduced what looks like the same problem with an x86 guest image under KVM (also Fedora 30), but it seems to happen much less often (seen once in 10 or more attempts)

 * Seems to reproduce fairly readily with an x86 guest under TCG though, so I'm guessing the difference is timing related.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

On Thu, 23 Jun 2022 15:59:20 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> I alluded to this in my last patchset, but here's the information I've gathered on the current problem I'm hitting running the passt distro tests. It's pretty weird.
>
> On at least two occasions the tests have stalled during the Fedora 30, aarch64 test, with the guest getting a timeout downloading the package lists before installing new packages. For me this is selecting the mirror mirror.2degrees.nz. I'm not sure which factors are relevant to reproducing the problem though.
>
>  * The problem seems to be that the download suddenly stops progressing partway through, causing dnf to eventually time out
>
>  * If I manually try a "dnf clean all && dnf makecache -v" using the same guest image, it doesn't fail every time, but it fails significantly more often than not
>
>  * It doesn't fail on the same file every time
>
>  * I haven't been able to reproduce manually downloading the failing file with curl (tried repeatedly)
>
>  * If I restrict dnf to a single repository rather than the whole set, I haven't managed to reproduce the problem

If I remember correctly, dnf downloads from multiple repositories at the same time, which might explain these two points.

>  * If I use qemu's -net user slirp instead of passt with the same disk image, I haven't been able to reproduce the problem (tried a bunch of times)
>
>  * I've reproduced with the guest using both IPv4 and IPv6
>
>  * I have reproduced what looks like the same problem with an x86 guest image under KVM (also Fedora 30), but it seems to happen much less often (seen once in 10 or more attempts)
>
>  * Seems to reproduce fairly readily with an x86 guest under TCG though, so I'm guessing the difference is timing related.

...hmm, I never hit this, and I guess our versions of qemu eventually crossed at some point -- I'm using 7.0.50 (v7.0.0-937-gd6900f445e) right now.

Passing a capture file via --pcap for that instance of passt (started at the beginning of fedora/tests) might help shine some light on this. You could also run ./test with PCAP=1, but that would capture everything, which will take a ton of space.

-- 
Stefano