On Thu, 25 Sep 2025 13:08:35 +0800 Yumei Huang wrote:
On Wed, Sep 24, 2025 at 5:56 PM Stefano Brivio wrote:
On Wed, 24 Sep 2025 11:49:28 +1000 David Gibson wrote:
So... summarising. As I see it, we have two main cases to consider: the one where the guest comes online pretty soon, and the one where it doesn't. Here's what I think the behaviour would be for these two cases with a variety of ways of handling it. This is more-or-less from the peer's perspective.
(0) Physically disconnected guest (bridged network, no passt involved)
(0a) Guest online never SYN ... SYN ... SYN ... <peer times out>
(0b) Guest online soonish SYN ... SYN ... SYN-ACK, ACK <working connection>
(1) Status quo
Passt doesn't resend SYNs, and will time out the connection after 10s.
(1a) Guest online never SYN, SYN-ACK, ACK ... ... ... ... <passt times out> RST
(1b) Guest online soonish SYN, SYN-ACK, ACK ... ... ... ... <passt times out> RST
(2) Yumei's patch
As (1), but without EBADFs
(3) passt resends SYNs
(3a) Guest online never SYN, SYN-ACK, ACK ... ... ... ... ... <passt times out> RST
(3b) Guest online soonish SYN, SYN-ACK, ACK ... ... ... ... <working connection>
(4) Passt resends SYNs + Yumei's patch
As (3), but without EBADFs
(5) passt explicitly resets when guest is not present
(5a) Guest online never SYN, SYN-ACK, ACK, RST
(5b) Guest online soonish SYN, SYN-ACK, ACK, RST
(6) Delayed listen()
(6a) Guest online never SYN, RST
(6b) Guest online soonish SYN, RST
(99) Bridged guest isn't listening (no passt)
(99a) Guest online never SYN, RST
(99b) Guest online soonish SYN, RST
=====
It all makes sense, thanks for summarising those.
So, if (99) is our model, we can match it pretty exactly with delayed listen(). But if (0) is our model, the closest we can get is (3) or (4), which I think will look fairly similar to the peer application, even though it looks different to the peer TCP stack.
I think (0) is a better model, because it means we won't reset connections if they happen to land when a still running guest has its connection to passt temporarily interrupted.
Which brings me, I think, to the same conclusion you had: we should resend SYNs.
Suggested next steps:
- Apply Yumei's patch, it doesn't change behaviour and removes the odd EBADFs
- Yumei investigates implementing SYN resends
Right, that also makes sense to me.
Glad we reached an agreement here. BTW, in case you missed it, the v2 patch was sent as https://archives.passt.top/passt-dev/20250912081705.20796-1-yuhuang@redhat.c....
I never miss patches. :) No worries, I just got a few interruptions in a row but I plan to apply it soon.
For the second part, we could probably reuse a mechanism similar to what we do for re-transmits, and perhaps rename 'retrans' in struct tcp_tap_conn to 'retries', so that we can use it for both (we're a bit tight on space there).
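To illustrate the idea of one counter serving both purposes, here is a minimal sketch. It is not passt's actual struct tcp_tap_conn: the struct conn_bits layout, the 3-bit width, and the helpers retries_bump() and handshake_done() are all hypothetical, chosen only to show how a narrow bitfield could count SYN resends first and data retransmissions afterwards.

```c
#include <assert.h>

/* Hypothetical layout, NOT passt's actual struct tcp_tap_conn. */
#define RETRIES_BITS	3	/* assumed width of the shared counter */
#define RETRIES_MAX	((1U << RETRIES_BITS) - 1)

struct conn_bits {
	unsigned int retries	 :RETRIES_BITS;	/* SYN resends, later data retransmits */
	unsigned int established :1;		/* handshake completed */
};

/* Bump the shared counter; return 0 once the budget is exhausted. */
static int retries_bump(struct conn_bits *c)
{
	if (c->retries == RETRIES_MAX)
		return 0;

	c->retries++;
	return 1;
}

/* Completing the handshake frees the counter for data retransmissions. */
static void handshake_done(struct conn_bits *c)
{
	c->established = 1;
	c->retries = 0;
}
```

Since a connection is either still in the handshake or already established, the two uses never overlap in time, which is why a single field might be enough.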
My initial thought was to call tcp_send_flag() from tcp_flow_defer(), but that doesn't seem to work. I'm trying to figure out why.
That might work, even though, I guess, the most natural alternative would be to change the handling of an expired SYN_TIMEOUT in tcp_timer_handler(). Look at this case:

	} else if (conn->flags & ACK_FROM_TAP_DUE) {
		if (!(conn->events & ESTABLISHED)) {
			flow_dbg(conn, "handshake timeout");

...it should become a bit more like this one:

		} else {
			flow_dbg(conn, "ACK timeout, retry");
			conn->retrans++;

... where we retry for a few times, before resetting the connection. With timers, you already have timed triggers, as opposed to trying things out periodically from tcp_flow_defer().

-- 
Stefano
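
As a rough illustration of the timer-driven approach described above, the following is a minimal sketch, not passt's actual tcp_timer_handler(). The names struct conn_sketch, SYN_RETRIES_MAX, enum timer_action and handshake_timeout() are hypothetical, and the retry budget of 3 is an arbitrary assumption. It only shows the decision taken each time the handshake timer expires: resend the SYN while the budget lasts, then reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout: this is NOT passt's actual code. */
#define SYN_RETRIES_MAX	3	/* assumed retry budget */

enum timer_action {
	ACT_RESEND_SYN,		/* send another SYN towards the guest */
	ACT_RESET,		/* give up and reset the connection */
};

struct conn_sketch {
	bool established;	/* handshake with the guest completed */
	unsigned int retries;	/* timeouts handled by retrying so far */
	unsigned int syns_sent;	/* SYNs sent towards the guest so far */
};

/* Called on handshake timer expiry: retry a few times, then reset. */
static enum timer_action handshake_timeout(struct conn_sketch *conn)
{
	assert(conn && !conn->established);

	if (conn->retries >= SYN_RETRIES_MAX)
		return ACT_RESET;

	conn->retries++;
	conn->syns_sent++;
	return ACT_RESEND_SYN;
}
```

With an initial SYN already sent, three timer expiries would each trigger a resend, and the fourth would reset the connection. A real implementation would also need to re-arm the timer after each resend, which this sketch leaves out.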