Hi Matt,
On Tue, 13 Aug 2024 22:58:42 -0700
Matt Hamilton <matt(a)thmail.io> wrote:
I am using Podman in Fedora 40, which uses pasta by default for rootless
container networking.
Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`,
but recently two newer versions were released,
`passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
After upgrading, one pod kept going offline after a few minutes. The
containers remained running, but could not make outbound connections.
Journalctl revealed that the pasta process for the pod had crashed with:
Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash
(flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr)
&& side->eport != 0 && side->fport != 0
Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000
gid=1000 ses=1
subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000
gid=1000 ses=1
subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023
pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31
res=1
After much debugging, I isolated the trigger to a particular container
making a peer-to-peer TCP connection to a remote address with port 0.
Thanks for the analysis and for the report!
Reverting passt to version 20240326 works as expected, and the container
stays online. It's been a long time since I wrote any C, but the code
seems clear and checks that the endpoint and forwarding ports do not
equal 0. I assume that a port 0 connection is not realistic or useful,
and that an actual attempt to connect over this port indicates a bug in
the client code. Is this correct?
Right, that's somewhat unexpected because TCP port zero is reserved
and not assigned, so it should never be used. However, I'm not sure how
we can even reach flow_hash() with it.
David, this seems to come from 163a339214dd ("tcp, flow: Replace TCP
specific hash function with general flow hash"), any clue?
Stefano
Reproduced, and I've found the issue. The assert was intended
to check that we never create flows with port 0 - and we don't.
Unfortunately it was also invoked when searching for an existing flow
matching a new packet.
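
To illustrate the distinction between the two code paths, here is a minimal
sketch in C. It is not passt's code and not the actual patch: the struct,
the table, and the names flow_insert_sketch() and flow_lookup_sketch() are
all hypothetical. The idea it shows is that the "no zero port" invariant can
be asserted where flows are created, while the lookup path, whose key comes
from an arbitrary packet, simply reports "no match" for such a key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one side of a flow. */
struct side_key {
	uint32_t eaddr;	/* endpoint address, 0 means unspecified */
	uint16_t eport;	/* endpoint port */
	uint16_t fport;	/* forwarding port */
};

#define TABLE_SIZE 64

static struct side_key table[TABLE_SIZE];
static bool used[TABLE_SIZE];

/* True if the key could belong to a flow we would ever create. */
static bool key_is_complete(const struct side_key *k)
{
	return k->eaddr != 0 && k->eport != 0 && k->fport != 0;
}

/* Pure hash: no assertion here, because both paths call it. */
static size_t key_hash(const struct side_key *k)
{
	uint32_t h = k->eaddr * 2654435761u;

	h ^= ((uint32_t)k->eport << 16) | k->fport;
	return h % TABLE_SIZE;
}

static bool key_eq(const struct side_key *a, const struct side_key *b)
{
	return a->eaddr == b->eaddr && a->eport == b->eport &&
	       a->fport == b->fport;
}

/* Creation path: the key is ours, so the invariant is asserted. */
bool flow_insert_sketch(const struct side_key *k)
{
	size_t start, i;

	assert(key_is_complete(k));

	start = key_hash(k);
	for (i = 0; i < TABLE_SIZE; i++) {
		size_t idx = (start + i) % TABLE_SIZE;

		if (!used[idx]) {
			table[idx] = *k;
			used[idx] = true;
			return true;
		}
	}
	return false;	/* table full */
}

/* Lookup path: the key comes from a packet, so never assert on it. */
bool flow_lookup_sketch(const struct side_key *k)
{
	size_t start, i;

	if (!key_is_complete(k))
		return false;	/* no such flow can exist */

	start = key_hash(k);
	for (i = 0; i < TABLE_SIZE; i++) {
		size_t idx = (start + i) % TABLE_SIZE;

		if (!used[idx])
			return false;
		if (key_eq(&table[idx], k))
			return true;
	}
	return false;
}
```

With this split, a packet addressed to port 0 makes the lookup return
false instead of aborting the process.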
Patch coming shortly. Note that this will fix the crash, but it still
won't permit the connection to port 0 to go through. I don't know if
that will allow your application to run, or whether it relies on that
port 0 connection.
Actually allowing the connection to go through would be much harder.
It's easy to remove the explicit checks, obviously, but making sure we
never pass that 0 to an API where it doesn't mean what we want it to
would require some time.
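
One well-known example of an API where 0 does not mean "port number zero"
is bind(): on a TCP socket, port 0 asks the kernel to choose an ephemeral
port. Whether this is among the specific calls that would need auditing in
passt is an assumption on my part. The helper name bound_port_for_zero() in
the sketch below is hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a TCP socket to port 0 and return the port the kernel actually
 * assigned (host byte order), or 0 on error.
 */
uint16_t bound_port_for_zero(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	uint16_t port = 0;
	int s;

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s < 0)
		return 0;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(0);	/* "pick one for me", not port zero */

	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(s, (struct sockaddr *)&addr, &len) == 0) {
		/* getsockname() filled in an IPv4 address of the full size */
		assert(len == sizeof(addr));
		port = ntohs(addr.sin_port);
	}

	close(s);
	return port;
}
```

On success the returned port is non-zero, which is why a 0 passed straight
through to such a call would silently turn into a different port.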