On Mon, 22 Sep 2025 15:17:12 +0800, Yumei Huang wrote:
On Fri, Sep 19, 2025 at 9:38 AM, David Gibson wrote:
On Thu, Sep 18, 2025 at 09:17:14AM +0200, Stefano Brivio wrote:
On Thu, 18 Sep 2025 14:28:37 +1000, David Gibson wrote:
On Mon, Sep 15, 2025 at 08:13:19AM +0200, Stefano Brivio wrote:
On Fri, 12 Sep 2025 12:01:37 +1000, David Gibson wrote:
On Thu, Sep 11, 2025 at 11:54:25AM +0200, Stefano Brivio wrote:
On Thu, 11 Sep 2025 16:55:19 +0800, Yumei Huang wrote:
> > If no client is attached, discard outgoing frames and report them as
> > sent. This mimics the behavior of a physical host with its network
> > cable unplugged.
> >
> > Suggested-by: David Gibson
> > Signed-off-by: Yumei Huang
>
> Thanks, the fix itself obviously makes sense, but I have a few questions
> and comments:
>
> - first off, what happens if we don't return early in tap_send_frames()?
>   Commit messages for fixes (assuming this is a fix) should always say
>   what concrete problem we had, what is going to be fixed, or if we're
>   not aware of any real issue but things are just fragile / wrong

Without this we will get an EBADF in either writev() (pasta) or
sendmsg() (passt). That's basically harmless, but a bit ugly.
Explicitly catching this case results in behaviour that's probably a
bit clearer to debug if we hit it.
Putting that context in the commit message would be useful.
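For context, the change under discussion has roughly this shape. This is an
illustrative sketch, not the actual patch: it assumes passt's convention that
fd_tap is -1 while no client is attached, and the function body is a stand-in
for the real tap.c code.

/* Illustrative sketch of the early return, not the actual patch:
 * pretend the frames were sent when no tap client is attached, much
 * like a host with its network cable unplugged. */
#include <stddef.h>
#include <sys/uio.h>

struct ctx { int fd_tap; /* -1 while no guest/client is attached */ };

static size_t tap_send_frames_sketch(const struct ctx *c,
				     const struct iovec *iov, size_t n)
{
	if (c->fd_tap == -1)
		return n;	/* discard, but report all frames as sent */

	/* ...otherwise hand the frames to writev() (pasta) or
	 * sendmsg() (passt) as before... */
	(void)iov;
	return n;
}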
> - until a while ago, this couldn't happen at all. We were just blocking
>   the whole execution as long as the tap / guest / container interface
>   wasn't up and running.
>
>   I wonder when this changed and if it makes sense to go back to the
>   previous behaviour. I had just a quick look and I wonder if I
>   accidentally broke this in c9b241346569 ("conf, passt, tap: Open
>   socket and PID files before switching UID/GID").
>
>   Before that, main() would call tap_sock_init(), which would call
>   tap_sock_unix_open(), a blocking function.
>
>   Should we make the whole thing blocking again? If not, is there
>   anything else that's breaking with that? Timers, other inputs, etc.
I don't think we can quite do that. I'm not sure if it's the only reason, but for vhost-user I believe we need the epoll loop up and running before we have the tap connection fully set up, because we need it to process the vhost-user control messages. Laurent, can you verify?
We discussed this in the past, before realising that execution continues for whatever reason, and probably before I broke the assumption that the guest connection was blocking.
Yes, in the vhost-user case, the epoll loop needs to run before we have a working connection to the guest, but:
- we can anyway block until the control socket is set up (we used to do that)
The vhost-user control socket? I'm not entirely sure what you mean by "block" here. Since we need the epoll loop up, I don't see how we can block in the conventional sense.
Let's rather say "until the data setup is complete".
And by "block", I mean we would ignore any other event, obviously we have to listen to the control socket (in the main loop or in a separate dedicated loop).
Ok. We could do that. I don't think the peer-visible behaviour would really differ from what we get now, silently dropping frames to tap. I'm not convinced it's really simpler than the current approach either:
* For now, we could just skip all epoll handling if the event type isn't the control socket, but we'd need to be finer grained about this if we add anything else that needs handling before guest connection (e.g. dynamic configuration update mechanism and/or netlink monitor)
* Ignoring events in that way could lead to us busy-looping on epoll, because we might not clear events. So we're back to having to consider every event type, at least to some extent.
I'm not suggesting we do this though (see below). It's just a possibility.
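To make the busy-loop caveat concrete, here is a self-contained demonstration
(hypothetical, not passt code): with level-triggered epoll, skipping an event
without draining or deregistering the file descriptor makes epoll_wait()
report it again immediately.

/* Demo of the busy-loop caveat: a pending, never-drained event on a
 * level-triggered epoll instance is reported on every epoll_wait(). */
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

int main(void)
{
	struct epoll_event ev = { .events = EPOLLIN }, ret;
	int pfd[2], epfd, spins = 0;

	epfd = epoll_create1(0);
	pipe(pfd);
	ev.data.fd = pfd[0];
	epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev);
	write(pfd[1], "x", 1);	/* pending data we never read() */

	while (spins < 5 && epoll_wait(epfd, &ret, 1, -1) == 1)
		printf("spin %d: same event reported again\n", ++spins);

	return 0;
}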
- the vhost-user implementation autonomously throws away data received before that point
Right. It doesn't have anywhere to put it, so it doesn't have much choice.
Now, I don't think we necessarily need to stick to that approach; it was the obvious choice when passt was much simpler, and it keeps things simple in the sense that we don't need to care about cases like the ones this patch is addressing.
On the other hand, if we want to switch to a different model, we need to have a look at other possible breakages, I guess.
There are several different approaches we can take here. I discussed some with Yumei and suggested she take this one. Here's some reasoning (maybe this would also be useful in the commit message, though it's rather bulky):
# Don't listen() until the tap connection is ready
- It's not clear that the host rejecting the connection is better than the host accepting, then the connection stalling until the guest is ready.
- Would require substantial rework because we currently listen() as we parse the command line and don't store the information we'd need to do it later.
Right, that looks like a lot of effort for nothing.
# Don't accept() until the tap connection is ready
- To the peer, this will behave basically the same as this patch: the host will complete the TCP handshake, then the connection will stall until the guest is ready.
Same here.
- More work to implement, because essentially every sock-side handler has to check fd_tap and abort early
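The shape of that guard, as a hypothetical sketch (fd_tap and the context
struct follow passt's conventions, but none of this is the real handler
code):

/* Hypothetical cost of "don't accept() until ready": each sock-side
 * handler grows an early-out like this, leaving inbound connections in
 * the listening socket's backlog until a guest attaches. */
#include <stddef.h>
#include <sys/socket.h>

struct ctx { int fd_tap; /* -1 while no guest/client is attached */ };

static void sock_accept_sketch(const struct ctx *c, int listen_fd)
{
	if (c->fd_tap == -1)
		return;		/* not ready: leave it in the backlog */

	accept(listen_fd, NULL, NULL);
	/* ...normal handling... */
}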
There's one substantial issue at TCP level, though, that we're keeping with the current approach and with this patch: we'll accept inbound connections and silently stall them.
We could mitigate that by making the TCP handler aware of this, and by resetting the connection if the guest isn't there. This would at least be consistent with the case where the guest isn't listening on the port (we accept(), fail to connect to it, eventually call tcp_rst()).
True. Arguably it's less consistent with a non-passt peer that's simply not there, though. Plus, with the silent-stall approach, we have a chance that the TCP connection will recover if the guest attaches reasonably soon.
If we don't do this, I think we should at least check what happens in terms of race conditions between passt starting and the guest appearing and accepting the connection. I guess we'll retry for a bit, which is desirable, but we should check that the whole retrying thing actually works.
That's because the current approach just happened by accident.
Right. I'm not entirely sure what concrete action you're suggesting at this point, though.
What I suggested in Monday's call, and what we all seemed to agree upon (also mentioned above): *check what happens*.
Try that case, with this patch.
Does it work to cover situations where users might start passt a bit before the guest connects, and try to connect to services right away?
I suggested using ssh, which should have a quite long timeout and retry connecting for a while. You mentioned you would assist Yumei in testing this if needed.
Ah, yes, you're right and I'd forgotten that. Following up today.
I tried both 'ssh' and 'socat' (writing a big file) before a guest connects; they get a 'Connection reset' after 10s, even if the guest connects in ~2s. That's because, when ssh or socat starts, passt tries to finish the TCP handshake with the guest: it sends a SYN to the guest immediately and waits for the SYN-ACK. However, the SYN frame is dropped/lost because no guest is connected. So even though the guest connects within seconds, the TCP handshake times out, and passt returns an RST via tcp_rst().
Ah, right. We won't try to resend the SYN; that's simply not implemented. The timeout you see is SYN_TIMEOUT, a timer set by tcp_timer_ctl() and handled by tcp_timer_handler().
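As a self-contained illustration of the mechanism (not the actual tcp.c
code): arm a timer when the SYN goes out, and give up on the flow if it
fires before the handshake completes. The 10-second value matches the
timeout observed above.

/* Illustration only: a timerfd armed at SYN time; if it expires before
 * the handshake completes, the flow would be reset. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/timerfd.h>

#define SYN_TIMEOUT	10	/* seconds */

int main(void)
{
	struct itimerspec it = { .it_value.tv_sec = SYN_TIMEOUT };
	uint64_t expirations;
	int tfd = timerfd_create(CLOCK_MONOTONIC, 0);

	timerfd_settime(tfd, 0, &it, NULL);

	/* In passt, this descriptor would sit in the epoll loop next to
	 * everything else; here we simply block on it. */
	read(tfd, &expirations, sizeof(expirations));
	printf("handshake timed out: the flow would be reset here\n");

	close(tfd);
	return 0;
}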
With or without this patch, they get the same 'Connection reset'. Maybe it's something to fix?
First off, this shows that the current patch is harmless, so I would go
ahead and apply it (but see 2. below). Strictly speaking, I don't think
we really *need* to fix anything, but for sure the behaviour isn't
ideal. I see two alternatives:

1. we implement a periodic retry for the SYN segment. This would *seem*
   to give the best behaviour in this case, but:

   a. it's quite complicated (we need to calculate some delays for the
      timers, etc.), and not really transparent (which is in general a
      goal of passt)

   b. if the guest never appears, we're just wasting the client's time.
      See db2c91ae86c7 ("tcp: Set ACK flag on *all* RST segments, even
      for client in SYN-SENT state") for an example where it's important
      to fail fast

   c. if the guest appears but isn't listening on the port, see b.

2. reset right away, as I was suggesting in
   https://archives.passt.top/passt-dev/20250915081319.00e72e53@elisabeth/:
> We could mitigate that by making the TCP handler aware of this, and by
> resetting the connection if the guest isn't there. This would at least
> be consistent with the case where the guest isn't listening on the
> port (we accept(), fail to connect to it, eventually call tcp_rst()).
and let the client retry as appropriate (if implemented). Those retries
can be quite fast, see this report (from IRC) for 722d347c1932 ("tcp:
Don't reset outbound connection on SYN retries"):

3.3223: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
3.3223: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
3.3224: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 packet)
3.3224: Flow 0 (NEW): FREE -> NEW
3.3224: Flow 0 (INI): NEW -> INI
3.3224: Flow 0 (INI): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => ?
3.3224: Flow 0 (TGT): INI -> TGT
3.3224: Flow 0 (TGT): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
3.3224: Flow 0 (TCP connection): TGT -> TYPED
3.3224: Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
3.3224: Flow 0 (TCP connection): event at tcp_conn_from_tap:1489
3.3224: Flow 0 (TCP connection): TAP_SYN_RCVD: CLOSED -> SYN_SENT
3.3224: Flow 0 (TCP connection): failed to set TCP_MAXSEG on socket 21
3.3224: Flow 0 (TCP connection): Side 0 hash table insert: bucket: 294539
3.3225: Flow 0 (TCP connection): TYPED -> ACTIVE
3.3225: Flow 0 (TCP connection): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80
4.0027: pasta: epoll event on namespace timer watch 17 (events: 0x00000001)
4.3612: pasta: epoll event on /dev/net/tun device 18 (events: 0x00000001)
4.3613: tap: protocol 6, 192.168.122.14:55532 -> 192.0.0.1:80 (1 packet)
4.3613: Flow 0 (TCP connection): packet length 40 from tap
4.3613: Flow 0 (TCP connection): TCP reset at tcp_tap_handler:1989
4.3613: Flow 0 (TCP connection): flag at tcp_prepare_flags:1163
4.3613: Flow 0 (TCP connection): event at tcp_rst_do:1206
4.3613: Flow 0 (TCP connection): CLOSED: SYN_SENT -> CLOSED
4.3614: Flow 0 (TCP connection): Side 0 hash table remove: bucket: 294539
4.3614: Flow 0 (FREE): ACTIVE -> FREE
4.3614: Flow 0 (FREE): TAP [192.168.122.14]:55532 -> [192.0.0.1]:80 => HOST [0.0.0.0]:0 -> [192.0.0.1]:80

...the retry happened within one second. This is a container, so a
Linux kernel, and the client was wget.

So, in the end, I would suggest going with 2.: check if the guest /
container is connected in the TCP handler (tcp_data_from_sock()) and
reset the connection if it's not.

I would suggest checking that together with this patch. They would
still be two different patches, but I think it would be good to check /
test what happens with both of them.

--
Stefano
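P.S. For concreteness, a rough sketch of the check suggested above. The
names tcp_data_from_sock() and tcp_rst() are from the thread, but the
types and body here are illustrative stand-ins, not the real tcp.c code:

/* Illustrative sketch only: reset inbound connections in the TCP
 * handler while no guest/container is attached, instead of accepting
 * and silently stalling them.  Types are stand-ins for passt's own. */
struct ctx      { int fd_tap; /* -1 while no guest is attached */ };
struct tcp_conn { int dummy; };

void tcp_rst(struct ctx *c, struct tcp_conn *conn);	/* assumed helper */

static int tcp_data_from_sock_sketch(struct ctx *c, struct tcp_conn *conn)
{
	if (c->fd_tap == -1) {
		/* No guest: fail fast, consistently with the case where
		 * the guest isn't listening on the port, and let the
		 * client retry (quickly, as the log above shows). */
		tcp_rst(c, conn);
		return -1;
	}

	/* ...normal socket-to-tap data path... */
	return 0;
}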