On Wed, Aug 14, 2024 at 08:40:22AM +0200, Stefano Brivio wrote:
Hi Matt,
On Tue, 13 Aug 2024 22:58:42 -0700 Matt Hamilton wrote:
I am using Podman in Fedora 40, which uses pasta by default for rootless container networking.
Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`, but recently two newer versions were released, `passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
After upgrading, one pod kept going offline after a few minutes. The containers remained running, but could not make outbound connections. `journalctl` showed that the pasta process for the pod had crashed with:
Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr) && side->eport != 0 && side->fport != 0
Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023 pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023 pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
After much debugging, I isolated the trigger to a particular container making a peer-to-peer TCP connection to a remote address with port 0.
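To make the failure mode concrete, here is a minimal, hypothetical sketch in C. The field names echo the assertion message (eaddr, eport, fport), but the struct, the toy FNV-1a hash and the helpers side_is_complete(), flow_insert_hash() and flow_lookup_hash() are made up for illustration and are not passt's code; the pif check is left out. What it shows: when creating a flow, an incomplete side would be a bug in our own code, so asserting is reasonable, but when looking up a flow for an incoming packet the ports come from outside, so a zero port should just mean "no matching flow" instead of aborting the process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified flow side (not passt's struct) */
struct side {
	uint8_t eaddr[16];	/* endpoint address, IPv6 or v4-mapped */
	uint16_t eport;		/* endpoint port */
	uint16_t fport;		/* forwarding port */
};

static bool addr_is_unspecified(const uint8_t *a)
{
	static const uint8_t zero[16];

	return memcmp(a, zero, sizeof(zero)) == 0;
}

/* True if every field the hash depends on is populated */
static bool side_is_complete(const struct side *s)
{
	return !addr_is_unspecified(s->eaddr) && s->eport != 0 &&
	       s->fport != 0;
}

/* Toy FNV-1a step over a byte range */
static uint32_t fnv1a(uint32_t h, const void *p, size_t len)
{
	const uint8_t *b = p;

	for (size_t i = 0; i < len; i++) {
		h ^= b[i];
		h *= 16777619u;
	}
	return h;
}

static uint32_t side_hash(const struct side *s)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, s->eaddr, sizeof(s->eaddr));
	h = fnv1a(h, &s->eport, sizeof(s->eport));
	h = fnv1a(h, &s->fport, sizeof(s->fport));
	return h;
}

/* Hypothetical: creating a flow. An incomplete side here is our bug. */
static uint32_t flow_insert_hash(const struct side *s)
{
	assert(side_is_complete(s));
	return side_hash(s);
}

/* Hypothetical: looking up a flow for a packet. The fields come from
 * outside, so an incomplete side just means "no such flow". */
static bool flow_lookup_hash(const struct side *s, uint32_t *hash)
{
	if (!side_is_complete(s))
		return false;
	*hash = side_hash(s);
	return true;
}
```

With this split, a lookup for a packet addressed to port 0 returns false and the packet can be dropped or rejected, while the process keeps running.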
Thanks for the analysis and for the report!
Reverting passt to version 20240326 works as expected, and the container stays online. It's been a long time since I wrote any C, but the code seems clear and checks that the endpoint and forwarding ports do not equal 0. I assume that a port 0 connection is not realistic or useful, and that an actual attempt to connect to this port indicates a bug in the client code. Is this correct?
Right, that's rather unexpected because TCP port zero is reserved and not assigned, so it should never be used. However, I'm not sure how we can even reach flow_hash() with it.
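Port 0 is not only unassigned; the sockets API also gives it a special meaning. The sketch below uses plain POSIX sockets (it is not passt code, and the helper name bound_port_for_zero() is made up) to bind a TCP socket to 127.0.0.1 port 0 and read the result back with getsockname(): the kernel substitutes a free ephemeral port. This is the kind of API where a literal 0 silently means something other than "port zero".

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to 127.0.0.1, port 0, and return the port the
 * kernel actually assigned, or 0 on error. */
static uint16_t bound_port_for_zero(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	uint16_t port = 0;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return 0;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(0);	/* 0 here means "any free port" */
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
		port = ntohs(addr.sin_port);

	close(fd);
	return port;
}
```

The returned port is non-zero on success, which is why forwarding a peer's port 0 through such calls unchanged would not reproduce what the peer asked for.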
David, this seems to come from 163a339214dd ("tcp, flow: Replace TCP specific hash function with general flow hash"), any clue?
Stefano reproduced it, and I've found the issue. The assert was intended to check that we never created flows with 0 port - and we don't. Unfortunately it was also invoked when searching for an existing flow matching a new packet. Patch coming shortly.

Note that this will fix the crash, but it still won't permit the connection to port 0 to go through. I don't know if that will allow your application to run, or whether it relies on that port 0 connection.

Actually allowing the connection to go through would be much harder. It's easy to remove the explicit checks, obviously, but making sure we never pass that 0 to an API where it doesn't mean what we want it to would take some time.

-- 
David Gibson (he or they)       | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you, not the other way
                                | around.
http://www.ozlabs.org/~dgibson