On Mon, 3 Feb 2025 12:55:47 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Fri, Jan 31, 2025 at 08:39:50PM +0100, Stefano Brivio wrote:Right, exactly that.On migration, the source process asks passt-helper to set TCP sockets in repair mode, dumps the information we need to migrate connections, and closes them. At this point, we can't pass them back to passt-helper using SCM_RIGHTS, because they are closed, from that perspective, and sendmsg() will give us EBADF. But if we don't clear repair mode, the port they are bound to will not be available for binding in the target. Terminate once we're done with the migration and we reported the state. This is equivalent to clearing repair mode on the sockets we just closed.As noted on the passt-repair patch, I think this is based on a misinterpreation of the situation. I think the problem is that the sockets aren't closed in passt-repair, so the additional handle copy is keeping the underlying socket open. This appears to work, because it is causing passt-repair to also terminate.That said, we probably want to terminate on the source side after a succesful migrate anyway. At the very least we need to close() all our sockets, and delete the corresponding flows, because we don't own them any more. Quitting is probably the simplest way to do that.I'm not sure if there's an established behaviour for helpers supporting state migration. We could probably close sockets, delete flows, and keep things up and running for the rest (restart from a clean situation), but at that point we already the guest networking is already broken in a number of ways. So, yeah, maybe let's keep this instead.Which also makes me realise, on a *failed* outbound migration, we _do_ need to turn repair mode off on everything again. Is that implemented yet?No, sorry, that's the "/* TODO: rollback */" comments in flow_migrate_source_pre(), flow.c. But other than that, it should be pretty much implemented, at a migrate.c level. -- Stefano