On Wed, 4 Feb 2026 21:57:03 +1000 David Gibson wrote:
On Mon, Feb 02, 2026 at 11:12:08PM +0100, Stefano Brivio wrote:
On Mon, 2 Feb 2026 10:24:14 +1000 David Gibson wrote:
On Sat, Jan 31, 2026 at 10:47:28AM +0100, Stefano Brivio wrote:
On Fri, 30 Jan 2026 16:58:11 +1100 David Gibson wrote:

On incoming migrations we need to bind() reconstructed sockets to their correct local address. We can't do this if the origin passt instance is in the same namespace and still has those addresses bound. Arguably that's a bug in bind()'s operation during repair mode, but for now we have to work around it.
So, to allow local-to-local migrations we close() sockets on the outgoing side as we process them. In addition to closing the connected socket we also have to close the associated listen()ing socket, because that can also cause an address conflict.
To do that, we introduced the listening_sock field in the connection state, because we had no other way to find the right listening sockets. Now that we have the forwarding table, we have a complete list of listening sockets elsewhere. We can use that instead, to close all listening sockets on outbound migration, rather than just the ones that might conflict.
This is cleaner and, importantly, saves a valuable 32 bits in the flow state structure. It does mean there is a longer window during migration where a peer attempting to connect might get a Connection Refused. I think this is an acceptable trade-off for now: arguably we should not allow local-to-local migrations in any case, since the socket closes make it impossible to safely roll back a migration as per the qemu model.
Signed-off-by: David Gibson
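---

A minimal sketch of the idea above, not passt's actual code: assuming the forwarding table keeps an array of listening socket file descriptors, closing them all on outbound migration could look roughly like this. struct fwd_tab and fwd_close_all() are hypothetical names, not passt's real structures.

    #include <stddef.h>
    #include <unistd.h>

    struct fwd_tab {            /* hypothetical stand-in for the fwd table */
        int socks[128];         /* listening socket fds, -1 if unused */
        size_t count;
    };

    /* Close every listening socket, so that a destination passt in the
     * same namespace can bind() the same addresses.
     */
    static void fwd_close_all(struct fwd_tab *tab)
    {
        size_t i;

        for (i = 0; i < tab->count; i++) {
            if (tab->socks[i] >= 0) {
                close(tab->socks[i]);
                tab->socks[i] = -1;
            }
        }
    }

On rollback, the listen()s then have to be re-established by rebinding, which is what the fwd_listen_sync() call in the first hunk below does.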
 flow.c     | 12 ++++++++++++
 fwd.c      | 21 +++++++++++++++++++++
 fwd.h      |  1 +
 tcp.c      |  9 ---------
 tcp_conn.h |  3 ---
 5 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/flow.c b/flow.c
index fd4d5f38..5207143d 100644
--- a/flow.c
+++ b/flow.c
@@ -1023,6 +1023,9 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret)
 	debug("...roll back migration");

+	if (fwd_listen_sync(c, &c->tcp.fwd_in, PIF_HOST, IPPROTO_TCP) < 0)
+		die("Failed to re-establish listening sockets");
+
 	foreach_established_tcp_flow(flow) {
 		if (FLOW_IDX(flow) >= bound)
 			break;

@@ -1147,6 +1150,15 @@ int flow_migrate_source(struct ctx *c, const struct migrate_stage *stage,
Nit: the comment to this function currently says "Send data (flow table) for flow, close listening". I fixed that up (dropped ", close listening").
Good point, thanks.
 		return flow_migrate_source_rollback(c, FLOW_MAX, rc);
 	}

+	/* HACK: A local to local migrate will fail if the origin passt has the
+	 * listening sockets still open when the destination passt tries to bind
+	 * them. This does mean there's a window where we lost our listen()s,
+	 * even if the migration is rolled back later. The only way to really
+	 * fix that is to not allow local to local migration, which arguably we
+	 * should (use namespaces for testing instead). */
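To make the address conflict concrete, here is a standalone sketch of the failure mode the comment describes, using ordinary sockets (no repair mode involved; port 8080 is an arbitrary choice):

    #include <arpa/inet.h>
    #include <errno.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in addr = {
            .sin_family = AF_INET,
            .sin_port = htons(8080),
            .sin_addr = { htonl(INADDR_LOOPBACK) },
        };
        int s1 = socket(AF_INET, SOCK_STREAM, 0);
        int s2 = socket(AF_INET, SOCK_STREAM, 0);

        bind(s1, (struct sockaddr *)&addr, sizeof(addr));
        listen(s1, 1);

        /* A second bind() to the same address in the same namespace
         * fails with EADDRINUSE, just as the destination passt's bind()
         * does while the origin still holds its listening sockets.
         */
        if (bind(s2, (struct sockaddr *)&addr, sizeof(addr)) < 0)
            fprintf(stderr, "bind: %s\n", strerror(errno));

        close(s1);
        close(s2);
        return 0;
    }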
Actually, we already use namespaces in the current tests,
Oh, nice.
but we didn't (always) do that during development, and it might be convenient in general to have the possibility to test *a part* of the implementation using the same namespace as long as it's reasonably cheap (it seems to be).
Depends on what cost you're talking about. It's cheap in terms of computational complexity and code complexity. It means, however, that we can't necessarily roll back failed migrations - i.e. resume on the origin system. That isn't really correct for the qemu migration model, which is why I think allowing local migrations probably isn't the best idea, at least by default.
Well, it's not entirely true that we can't roll back failed migrations.
Depending on the stage we're at, we might close connections, but other than that we can always continue, somehow. Losing some connections looks relatively cheap as well.
I guess that's true. I'm not sure we currently handle this even as well as is possible within the constraints.
I actually checked a while ago: nothing unexpected happens.
But sure, it's not ideal. Maybe we shouldn't have that as the default: we could add a new option and update QEMU's documentation (see below) accordingly.
There are two layers to it. Dropping connections on what's otherwise a no-op migration failure is ugly. Opening the possibility that we might not be able to rebind ports we were already listening on is worse.
But that would only happen if somebody starts an unrelated process binding the same ports, right? I'm not sure if it's something we really need to take care of.
That's just part of it, because bind() and connect() will conflict anyway if we're in the same namespace, which is a kernel issue you already noted:
Well, it's a kernel issue that the bound listen()ing sockets conflict with the half-constructed flow sockets. Having the listening sockets of the origin passt conflict with the listening sockets of the destination passt is pretty much expected, and would still be an impediment to local migration.
Ah, right, we don't set listening sockets to repair mode... should we, by the way?
Uh.. I don't think so? I'm not really sure how repair mode operates for listening sockets, or if it should even allow that. The relevant state of a listening socket is pretty much limited to its bound address, so I don't think we need any additional mechanism to extract state.
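For reference, entering and leaving repair mode is just a setsockopt() toggle. A minimal sketch, assuming Linux headers that define TCP_REPAIR (the caller needs CAP_NET_ADMIN):

    #include <netinet/tcp.h>        /* TCP_REPAIR, SOL_TCP */
    #include <sys/socket.h>

    /* Enter (on != 0) or leave (on == 0) repair mode on TCP socket s. */
    static int tcp_repair_set(int s, int on)
    {
        return setsockopt(s, SOL_TCP, TCP_REPAIR, &on, sizeof(on));
    }

SOL_TCP here is what glibc exposes; IPPROTO_TCP would work equally.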
I wasn't suggesting that for any functional purpose. Repair mode has no meaning there. But it would have the advantage, if we fix/change this in the kernel:

    https://pad.passt.top/p/TcpRepairTodo#L3
    "Repair mode sockets should not have address conflicts with
     non-repair sockets (both bind() and connect())"

that bind() calls wouldn't conflict with bound sockets in repair mode, and it would let us test a bit more of local migration. On the other hand, I'm not sure what should happen once you bring a socket, which was originally bound to a given port, out of repair mode, once there's a new socket bound to that port.

--
Stefano