On Tue, Aug 13, 2024 at 10:58:42PM -0700, Matt Hamilton wrote:
I am using Podman in Fedora 40, which uses pasta by default for rootless
container networking.
Fedora 40's base version of passt is `passt-0^20240326.g4988e2b-1.fc40`, but
recently two newer versions were released,
`passt-0^20240726.g57a21d2-1.fc40` and `0^20240806.gee36266-1.fc40`.
After upgrading, one pod kept going offline after a few minutes. The
containers remained running, but could not make outbound connections.
Journalctl revealed that the pasta process for the pod had crashed with:
Aug 08 23:07:55 dev pasta[95859]: ASSERTION FAILED in flow_hash (flow.c:566): pif != PIF_NONE && !inany_is_unspecified(&side->eaddr) && side->eport != 0 && side->fport != 0
Ouch.
Aug 08 23:07:55 dev audit[95859]: SECCOMP auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023 pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 arch=c000003e syscall=186 compat=0 ip=0x7f8f8c23b64f code=0x80000000
Aug 08 23:07:55 dev audit[95859]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=1 subj=unconfined_u:unconfined_r:container_runtime_t:s0-s0:c0.c1023 pid=95859 comm="pasta.avx2" exe="/usr/bin/pasta.avx2" sig=31 res=1
After much debugging, I isolated the trigger to a particular container
making a peer-to-peer TCP connection to a remote address with port 0.
Huh.
Reverting passt to version 20240326 works as expected, and the container
stays online. It's been a long time since I wrote any C, but the code seems
clear and checks that the endpoint and forwarding ports do not equal 0. I
assume that a port 0 connection is not realistic or useful, and that an
actual attempt to connect over this port indicates a bug in the client code.
Is this correct?
So, AFAICT the RFCs don't preclude using port 0 for connections on the
wire. However, it's usually not really sensible to do so: at least on
systems with a BSD-like socket interface, a port of 0 usually means
"unspecified" or "kernel, please pick for me". Obviously this client
is making it happen - my guess would be that a 0 port in connect() is
interpreted as a literal port 0, but I'm not sure how the server is
receiving it in this case, since a bind() with port 0 will cause the
kernel to pick a port.
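The bind() half of that is easy to check. A minimal Python sketch,
assuming a host with a usable IPv4 loopback address; this is just the plain
sockets API, nothing from pasta, and bind_port_zero is a made-up name for
illustration:

```python
import socket


def bind_port_zero() -> int:
    """Bind a TCP socket to port 0 and return the port the kernel picked."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Port 0 in bind() means "kernel, please pick for me".
        s.bind(("127.0.0.1", 0))
        # getsockname() reports the ephemeral port actually assigned.
        return s.getsockname()[1]


if __name__ == "__main__":
    print("requested port 0, kernel assigned", bind_port_zero())
```

The port reported is never 0, which is why it's hard to see how a server
could end up listening on a literal port 0 through this interface.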
So, it does look like the client is doing something weird, although
whether it's technically invalid is debatable.
Even if it is valid for the client to do this, pasta can't really
handle that case, because it's using the sockets interface to do the
forwarding. BUT, it absolutely should not be crashing - it should log
a debug message, drop the connection and carry on.
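To illustrate the intended behaviour (not pasta's actual code, which is C
and which I haven't reproduced here), a rough Python sketch of "check, log
and drop" instead of asserting. The function name side_is_usable and the
logger name are hypothetical; only the address and port conditions from the
assertion are modelled, the pif check is left out:

```python
import ipaddress
import logging

log = logging.getLogger("flow-sketch")  # hypothetical logger name


def side_is_usable(eaddr: str, eport: int, fport: int) -> bool:
    """Check the address and port conditions from the failed assertion.

    Returns False (after a debug message) instead of aborting, so the
    caller can drop the connection and carry on.
    """
    if ipaddress.ip_address(eaddr).is_unspecified:
        log.debug("dropping flow: unspecified endpoint address %s", eaddr)
        return False
    if eport == 0 or fport == 0:
        log.debug("dropping flow: port 0 (eport=%d, fport=%d)", eport, fport)
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(side_is_usable("192.0.2.1", 0, 40000))
```

With a check like this ahead of the hash, a connection to port 0 would just
be refused with a debug message rather than taking the whole process down.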
We have code which is supposed to handle this case gracefully before
reaching that assertion. I'm not immediately sure why that's not working.
One possibility is that the client _isn't_ doing something weird, but
an unusual port forwarding configuration on pasta is remapping a
sensible port to port 0, thus causing the crash.
Getting the full podman command line for the failing container would
be the next step here. If you could file a bug at
https://bugs.passt.top that would be most helpful.
I tried to make an account on bugzilla a day or two ago, but haven't
received the email confirmation link - I tried signing up using my
personal domain (used here) and a free service (gmail). I came here as a
second attempt to reach the devs!
If you can get me hooked up over there, I can file a bug with more
detailed logs and the podman command to reproduce.