On Mon, 3 Feb 2025 19:52:37 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> On Mon, Feb 03, 2025 at 07:09:32AM +0100, Stefano Brivio wrote:
> > On Mon, 3 Feb 2025 12:55:47 +1100
> > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> >
> > > On Fri, Jan 31, 2025 at 08:39:50PM +0100, Stefano Brivio wrote:
> > > > On migration, the source process asks passt-helper to set TCP
> > > > sockets in repair mode, dumps the information we need to migrate
> > > > connections, and closes them.
> > > >
> > > > At this point, we can't pass them back to passt-helper using
> > > > SCM_RIGHTS, because they are closed, from that perspective, and
> > > > sendmsg() will give us EBADF. But if we don't clear repair mode,
> > > > the port they are bound to will not be available for binding in
> > > > the target.
> > > >
> > > > Terminate once we're done with the migration and we reported the
> > > > state. This is equivalent to clearing repair mode on the sockets
> > > > we just closed.
> > >
> > > As noted on the passt-repair patch, I think this is based on a
> > > misinterpretation of the situation. I think the problem is that the
> > > sockets aren't closed in passt-repair, so the additional handle copy
> > > is keeping the underlying socket open. This appears to work because
> > > it is causing passt-repair to also terminate.
> >
> > Right, exactly that.
> >
> > > That said, we probably want to terminate on the source side after a
> > > successful migration anyway. At the very least we need to close()
> > > all our sockets, and delete the corresponding flows, because we
> > > don't own them any more. Quitting is probably the simplest way to
> > > do that.
> >
> > I'm not sure if there's an established behaviour for helpers
> > supporting state migration.
>
> By "helper" do you mean passt as a device helper to qemu, or
> passt-repair as a helper to passt? For the latter I wouldn't expect
> so: it's only a weirdness of our situation that we need passt-repair
> at all. If the former, I'm not really sure what you're after.

I meant passt and similar. Is there any convention we should adopt?

> > We could probably close sockets, delete flows, and keep things up
> > and running for the rest (restart from a clean situation), but at
> > that point the guest networking is already broken in a number of
> > ways. So, yeah, maybe let's keep this instead.
>
> So, I realised it's a bit more complicated than that. We need to
> identify exactly where the "point of no return" is. I'll discuss in
> our call tonight.

I think it's simply where we close sockets, by the way.

-- 
Stefano
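The diagnosis in the thread (an additional descriptor copy held by passt-repair keeps the underlying socket open) rests on a general property of file descriptors: a socket received through SCM_RIGHTS is a second reference to the same kernel socket, so close() in the original owner does not release the socket, or the port it is bound to, while the receiver still holds its copy. The following is a minimal C sketch of only that effect, assuming Linux and a working loopback interface. It does not use TCP_REPAIR (which needs CAP_NET_ADMIN), the socket is never connected (so TIME_WAIT plays no role), and the names `send_fd()`, `recv_fd()`, `try_bind()` and `demo_port_held_by_copy()` are hypothetical, not taken from passt or passt-repair.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: pass one descriptor over a UNIX socket (SCM_RIGHTS). */
static int send_fd(int unix_sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u = { { 0 } };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1, .msg_control = u.buf,
			      .msg_controllen = sizeof(u.buf) };
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	assert(fd >= 0);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(unix_sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor: the kernel installs a new fd for the same socket. */
static int recv_fd(int unix_sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1, .msg_control = u.buf,
			      .msg_controllen = sizeof(u.buf) };
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(unix_sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Bind a fresh TCP socket to 127.0.0.1:port_be (network order); 0 or errno. */
static int try_bind(in_port_t port_be)
{
	struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = port_be };
	int s = socket(AF_INET, SOCK_STREAM, 0), ret = 0;

	if (s < 0)
		return errno;
	a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(s, (struct sockaddr *)&a, sizeof(a)))
		ret = errno;
	close(s);
	return ret;
}

/* The owner binds a socket, hands a copy to a "helper", then closes its fd.
 * Reports bind() results on the same port before/after the copy is closed. */
int demo_port_held_by_copy(int *err_copy_open, int *err_copy_closed)
{
	struct sockaddr_in a = { .sin_family = AF_INET };
	socklen_t len = sizeof(a);
	int sv[2], s, copy = -1;

	a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;
	if ((s = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
	    bind(s, (struct sockaddr *)&a, sizeof(a)) ||
	    getsockname(s, (struct sockaddr *)&a, &len) ||
	    send_fd(sv[0], s) || (copy = recv_fd(sv[1])) < 0) {
		if (s >= 0)
			close(s);
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	close(sv[0]);
	close(sv[1]);
	close(s);	/* owner closes its fd: socket lives on via "copy" */
	*err_copy_open = try_bind(a.sin_port);		/* expect EADDRINUSE */
	close(copy);	/* "helper" closes its copy (or simply exits) */
	*err_copy_closed = try_bind(a.sin_port);	/* expect 0 */
	return 0;
}
```

Calling `demo_port_held_by_copy(&busy, &freed)` from a small main() should leave `busy == EADDRINUSE` and `freed == 0`: the port stays taken for as long as any process still holds a descriptor for the socket, whichever process created it. That is why a helper exiting (and thereby closing its copies) makes the port available again, independently of repair mode.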