Re: [PATCH v2 0/2] More graceful handling of migration without passt-repair

26 Feb 2025

      On Wed, 26 Feb 2025 11:27:32 +1100
David Gibson  wrote:
...
On Tue, Feb 25, 2025 at 06:43:16PM +0100, Stefano Brivio wrote:
...
On Tue, 25 Feb 2025 16:51:30 +1100
David Gibson  wrote:
...
From Red Hat internal testing we've had some reports that if
attempting to migrate without passt-repair, the failure mode is uglier
than we'd like.  The migration fails, which is somewhat expected, but
we don't correctly roll things back on the source, so it breaks
network there as well.
Handle this more gracefully allowing the migration to proceed in this
case, but allow TCP connections to break
I've now tested this reasonably:
 * I get a clean migration if there are now active flows
 * Migration completes, although connections are broken if
   passt-repair isn't connected
 * Basic test suite (minus perf)
I didn't manage to test with libvirt yet, but I'm pretty convinced the
behaviour should be better than it was.
I did, and it is. The series looks good to me and I would apply it as
it is, but I'm waiting a bit longer in case you want to try out some
variations based on my tests as well. Here's what I did.
[snip]
Thanks for the detailed instructions.  More complex than I might have
liked, but oh well.
...
$ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
  Migration: [97.59 %]error: End of file while reading data: : Input/output error
...despite --verbose the error doesn't tell much (perhaps I need
LIBVIRT_DEBUG=1 instead?), but passt terminates at this point. With
this series (I just used 'make install' from the local build), migration
succeeds instead:
$ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
  Migration: [100.00 %]
Now, on the target, I still have to figure out how to tell libvirt
to start QEMU and prepare for the migration (equivalent of '-incoming'
as we use in our tests), instead of just starting a new instance like
it does. Otherwise, I have no chance to start passt-repair there.
Perhaps it has something to do with persistent mode described here:
Ah.  So I'm pretty sure virsh migrate will automatically start qemu
with --incoming on the target.
("-incoming"), yes, see src/qemu/qemu_migration.c,
qemuMigrationDstPrepare().
...
IIUC the problem here is more about
timing: we want it to start it early, so that we have a chance to
start passt-repair and let it connect before the migration actually
happens.
For the timing itself, we could actually wait for passt-repair to be
there, with a timeout (say, 100ms).

We could also modify passt-repair to set up an inotify watcher if the
socket isn't there yet.
...
Crud... I didn't think of this before.  I don't know that there's any
sensible way to do this without having libvirt managing passt-repair
as well.
But we can't really use it as we're assuming that passt-repair will run
with capabilities virtqemud doesn't want/need.
...
I mean it's not impossible there's some option to do this,
but I doubt there's been any reason before for something outside of
libvirt to control the timing of the target qemu's creation.  I think
we need to ask libvirt people about this.
I'm looking into it (and perhaps virtiofsd had similar troubles?).
...
...
https://libvirt.org/migration.html#configuration-file-handling
Yeah.. I don't think this is relevant.
...
and --listen-address, but I'm not quite sure yet.
That is, I could only test different failures (early one on source, or
later one on target) with this, not a complete successful migration.
...
There are more fragile cases that I'm looking to fix, particularly the
die()s in flow_migrate_source_rollback() and elsewhere, however I ran
into various complications that I didn't manage to sort out today.
I'll continue looking at those tomorrow.  I'm now pretty confident
that those additional fixes won't entirely supersede the changes in
this series, so it should be fine to apply these on their own.
By the way, I think the somewhat less fragile/more obvious case where
we fail clumsily is when the target doesn't have the same address as
the source (among other possible addresses). In that case, we fail (and
terminate) with a rather awkward:
Ah, yes, that is a higher priority fragile case.
...
93.7217: ERROR:   Failed to bind socket for migrated flow: Cannot assign requested address
  93.7218: ERROR:   Flow 0 (TCP connection): Can't set up socket: (null), drop
  93.7331: ERROR:   Selecting TCP_SEND_QUEUE, socket 1: Socket operation on non-socket
  93.7333: ERROR:   Unexpected reply from TCP_REPAIR helper: -100
that's because, oops, I only took care of socket() failures in
tcp_flow_repair_socket(), but not bind() failures (!). Sorry.
No, you check for errors on both.
Well, "check", yes, but I'm not even setting an error code. I haven't
tried your 3/3 yet but look at "(null)" resulting from:

 		flow_err(flow, "Can't set up socket: %s, drop", strerror_(rc));

...rc is 0.
...
The problem is that in
tcp_flow_migrate_target() we cancel the flow allocation and carry on -
but the source will still send information for this flow, putting us
out of sync with the stream.
That, too, yes.
...
...
Once that's fixed, flow_migrate_target() should also take care of
decreasing 'count' accordingly. I just had a glimpse but didn't
really try to sketch a fix.
Adjusting count won't do the job.  Instead we'd need to keep the flow
around, but marked as "dead" somehow, so that we read but discard the
incoming information for it.  The MIGRATING state I added in one of my
drafts was supposed to help with this sort of thing.  But that's quite
a complex change.
I think it's great that you could (practically) solve it with three
lines...
...
Hrm... at least in the near term, I think it might actually be easier
to set IP_FREEBIND when we create sockets for in-migrating flows.
That way we can process them normally, they just won't do much without
the address set.  It has the additional advantage that it should work
if the higher layers only move the IP just after the migration,
instead of in advance.
Perhaps we want it anyway, but I wonder: what happens if we turn repair
mode off and we bound to a non-local address? I suppose we won't send
out anything, but I'm not sure. If we send out the first keep-alive
segment with a wrong address, we probably ruined the connection.

Once I find a solution for the target libvirt/passt-repair thing (and
the remaining SELinux issues), I'll try to have a look at this too. I
haven't tried yet a migration with a mismatching address on the target
and passt-repair available.

-- 
Stefano