On Wed, 24 Sep 2025 11:49:28 +1000, David Gibson wrote:
So... summarising. As I see it, we have two main cases to consider: the one where the guest comes online pretty soon, and the one where it doesn't. Here's what I think the behaviour would be for these two cases with a variety of ways of handling it. This is more-or-less from the peer's perspective.
(0) Physically disconnected guest (bridged network, no passt involved)
(0a) Guest online never SYN ... SYN ... SYN ... <peer times out>
(0b) Guest online soonish SYN ... SYN ... SYN-ACK, ACK <working connection>
(1) Status quo
Passt doesn't resend SYNs, and will time out the connection after 10s.
(1a) Guest online never SYN, SYN-ACK, ACK ... ... ... ... <passt times out> RST
(1b) Guest online soonish SYN, SYN-ACK, ACK ... ... ... ... <passt times out> RST
(2) Yumei's patch
As (1), but without EBADFs
(3) passt resends SYNs
(3a) Guest online never SYN, SYN-ACK, ACK ... ... ... ... ... <passt times out> RST
(3b) Guest online soonish SYN, SYN-ACK, ACK ... ... ... ... <working connection>
(4) Passt resends SYNs + Yumei's patch
As (3), but without EBADFs
(5) passt explicitly resets when guest is not present
(5a) Guest online never SYN, SYN-ACK, ACK, RST
(5b) Guest online soonish SYN, SYN-ACK, ACK, RST
(6) Delayed listen()
(6a) Guest online never SYN, RST
(6b) Guest online soonish SYN, RST
(99) Bridged guest isn't listening (no passt)
(99a) Guest online never SYN, RST
(99b) Guest online soonish SYN, RST
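
For illustration, option (6) could be sketched roughly as below. This is only a sketch under assumptions: listen_if_guest_present() and the guest_present flag are hypothetical names and nothing like them is taken from passt. The idea is that while no socket on the port is in the listening state, the host kernel would normally answer an incoming SYN with RST, which is what the peer sees in (6) and (99).

```c
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not part of passt: put an already created (and
 * normally already bound) TCP socket into the listening state only once
 * the guest is known to be reachable.
 *
 * Returns 1 if the socket is listening, 0 if listen() was deferred because
 * the guest is absent, -1 if listen() failed.
 */
static int listen_if_guest_present(int fd, bool guest_present, bool *listening)
{
	if (*listening)
		return 1;		/* already accepting, nothing to do */

	if (!guest_present)
		return 0;		/* defer: peers get RST meanwhile */

	if (listen(fd, SOMAXCONN) < 0)
		return -1;

	*listening = true;
	return 1;
}
```

A caller would invoke this whenever the guest connection state changes. Note that this sketch never stops listening again if the guest goes away later, so it only covers the startup case.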
=====
It all makes sense, thanks for summarising those.
So, if (99) is our model, we can match it pretty exactly with delayed listen(). But if (0) is our model, the closest we can get is (3) or (4), which I think will look fairly similar to the peer application, even though it looks different to the peer TCP stack.
I think (0) is a better model, because it means we won't reset connections if they happen to land when a still running guest has its connection to passt temporarily interrupted.
Which brings me, I think, to the same conclusion you had: we should resend SYNs.
Suggested next steps:
- Apply Yumei's patch, it doesn't change behaviour and removes the odd EBADFs
- Yumei investigates implementing SYN resends
Right, that also makes sense to me. For the second part, we could probably reuse a mechanism similar to what we do for re-transmits, and perhaps rename 'retrans' in struct tcp_tap_conn to 'retries', so that we can use it for both (we're a bit tight on space there).

-- 
Stefano
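
As a rough sketch of the shared counter idea, something along these lines might work. Everything here is an assumption made for illustration: struct conn_sketch is a stand-in and not the real struct tcp_tap_conn, and the limits, the initial timeout and the function names are invented. One small 'retries' field counts both SYN resends (before the guest answers) and data retransmits (afterwards), with a timeout that doubles on each attempt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, not taken from passt */
#define SYN_RETRIES_MAX		6
#define DATA_RETRIES_MAX	3
#define RTO_INIT_MS		1000

enum conn_state { CONN_SYN_SENT, CONN_ESTABLISHED, CONN_RESET };

/* Illustrative stand-in, NOT the real struct tcp_tap_conn */
struct conn_sketch {
	enum conn_state state;
	unsigned retries : 3;	/* shared: SYN resends and data retransmits */
};

/* Timeout before the next attempt, doubling with each retry so far */
static uint32_t retry_timeout_ms(const struct conn_sketch *c)
{
	return (uint32_t)RTO_INIT_MS << c->retries;
}

/*
 * Timer expired without a reply from the guest.
 * Returns true if the caller should resend (SYN or data, depending on
 * state), false if the retry budget is used up and the connection is reset.
 */
static bool retry_timer_expired(struct conn_sketch *c)
{
	unsigned max = c->state == CONN_SYN_SENT ? SYN_RETRIES_MAX
						 : DATA_RETRIES_MAX;

	if (c->state == CONN_RESET)
		return false;

	if (c->retries >= max) {
		c->state = CONN_RESET;
		return false;
	}

	c->retries++;
	return true;
}

/* Guest answered: a pending handshake completes, and the counter clears */
static void guest_replied(struct conn_sketch *c)
{
	if (c->state == CONN_SYN_SENT)
		c->state = CONN_ESTABLISHED;
	c->retries = 0;
}
```

With a 3-bit field the counter can hold at most 7, so both limits have to stay within that. That is the kind of space trade-off a shared field implies.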