On Tue, 25 Feb 2025 16:51:30 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> From Red Hat internal testing we've had some reports that, when
> attempting to migrate without passt-repair, the failure mode is uglier
> than we'd like. The migration fails, which is somewhat expected, but
> we don't correctly roll things back on the source, so it breaks
> network there as well. Handle this more gracefully, allowing the
> migration to proceed in this case, but letting TCP connections break.
>
> I've now tested this reasonably:
>
> * I get a clean migration if there are no active flows
> * Migration completes, although connections are broken, if
>   passt-repair isn't connected
> * Basic test suite (minus perf)
>
> I didn't manage to test with libvirt yet, but I'm pretty convinced the
> behaviour should be better than it was.

I did, and it is. The series looks good to me and I would apply it as
it is, but I'm waiting a bit longer in case you want to try out some
variations based on my tests as well. Here's what I did.

L0 is Debian testing, L1 are two similar (virt-clone'd) instances of
RHEL 9.5 (with passt-0^20250217.ga1e48a0-1.el9.x86_64 or a local build
with this series, qemu-kvm-9.1.0-14.el9.x86_64,
libvirt-10.10.0-7.el9.x86_64), and L2 is Alpine 3.21-ish.

The two L1 instances (hosting the source and target guests), of
course, don't need to run under libvirt, but they do in my case. They
are connected by passt, so that they share the same address
internally, but I'm forwarding different SSH ports to them. Relevant
libvirt XML snippets for the L1 instances:

  <interface type='user'>
    <mac address='52:54:00:8a:9e:c2'/>
    <portForward proto='tcp'>
      <range start='1295' to='22'/>
    </portForward>
    <model type='virtio'/>
    <backend type='passt'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </interface>

and:

  <interface type='user'>
    <mac address='52:54:00:b8:99:8c'/>
    <portForward proto='tcp'>
      <range start='11951' to='22'/>
    </portForward>
    <model type='virtio'/>
    <backend type='passt'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </interface>

...I didn't switch those to vhost-user mode yet.

I prepared the L2 guest on L1 with:

  $ wget https://dl-cdn.alpinelinux.org/alpine/v3.21/releases/cloud/nocloud_alpine-3…
  $ virt-customize -a nocloud_alpine-3.21.2-x86_64-bios-tiny-r0.qcow2 \
      --root-password password:root
  $ virt-install -d --name alpine --memory 1024 --noreboot \
      --osinfo alpinelinux3.20 \
      --network backend.type=passt,portForward0.proto=tcp,portForward0.range0.start=40922,portForward0.range0.to=2222 \
      --import --disk nocloud_alpine-3.21.2-x86_64-bios-tiny-r0.qcow2

and made sure I can connect via SSH to the second (target node) L1
with:

  $ ssh-copy-id -f -p 11951 $GATEWAY

There are some known SELinux issues at this point that I'm still
working on (similar for AppArmor), so I *temporarily* set it to
permissive mode with 'setenforce 0' on L1. Some issues were not known,
though, and it's taking me longer than expected.

Now I can start passt-repair (or not) on the source node (L1):

  # passt-repair /run/user/1001/libvirt/qemu/run/passt/8-alpine-net0.socket.repair

and open a TCP connection in the source L2 guest ('virsh console
alpine', then log in as root/root):

  # apk add inetutils-telnet
  # telnet passt.top 80

and finally ask libvirt to migrate the guest. Note that I need
"--unsafe" because I didn't care about migrating storage (it's good
enough to have the guest memory for this test).
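As an aside, for anybody reproducing this who hasn't looked at the
helper: what passt-repair does with that socket boils down to
something like the sketch below. This is my simplified illustration,
not the actual passt-repair source (the real helper also handles
batches of sockets and replies to passt, all omitted here): it
receives TCP socket file descriptors from passt over that UNIX domain
socket and sets or clears TCP_REPAIR on them, which is the operation
that needs CAP_NET_ADMIN.

/* Illustrative sketch only, NOT the actual passt-repair code: receive
 * one TCP socket as SCM_RIGHTS ancillary data over a connected UNIX
 * domain socket, together with a one-byte command, and set or clear
 * TCP_REPAIR on it accordingly.
 */
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>	/* TCP_REPAIR (Linux-specific) */

static int repair_one(int us)	/* us: connected UNIX domain socket */
{
	char cbuf[CMSG_SPACE(sizeof(int))];
	char cmd;		/* non-zero: repair on, zero: repair off */
	struct iovec iov = { .iov_base = &cmd, .iov_len = sizeof(cmd) };
	struct msghdr msg = {
		.msg_iov = &iov,	.msg_iovlen = 1,
		.msg_control = cbuf,	.msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cmsg;
	int fd, val;

	if (recvmsg(us, &msg, 0) < 0)
		return -errno;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -EINVAL;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	val = !!cmd;

	/* The privileged bit: TCP_REPAIR requires CAP_NET_ADMIN */
	if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &val, sizeof(val)) < 0) {
		int ret = -errno;

		close(fd);
		return ret;
	}

	close(fd);
	return 0;
}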
Without this series, migration fails on the source:

  $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
  Migration: [97.59 %]error: End of file while reading data: : Input/output error

...despite --verbose, the error doesn't tell us much (perhaps I need
LIBVIRT_DEBUG=1 instead?), but passt terminates at this point.

With this series (I just used 'make install' from the local build),
migration succeeds instead:

  $ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://88.198.0.161:10951/session
  Migration: [100.00 %]

Now, on the target, I still have to figure out how to tell libvirt to
start QEMU and prepare for the incoming migration (the equivalent of
the '-incoming' option we use in our tests), instead of just starting
a new instance like it does. Otherwise, I have no chance to start
passt-repair there. Perhaps it has something to do with the persistent
mode described at:

  https://libvirt.org/migration.html#configuration-file-handling

and with --listen-address, but I'm not quite sure yet.

That is, I could only test different failures with this (an early one
on the source, or a later one on the target), not a complete,
successful migration.

> There are more fragile cases that I'm looking to fix, particularly
> the die()s in flow_migrate_source_rollback() and elsewhere, however I
> ran into various complications that I didn't manage to sort out
> today. I'll continue looking at those tomorrow. I'm now pretty
> confident that those additional fixes won't entirely supersede the
> changes in this series, so it should be fine to apply these on their
> own.

By the way, I think the somewhat less fragile/more obvious case where
we fail clumsily is when the target doesn't have the same address as
the source (among other possible addresses). In that case, we fail
(and terminate) with a rather awkward:

  93.7217: ERROR: Failed to bind socket for migrated flow: Cannot assign requested address
  93.7218: ERROR: Flow 0 (TCP connection): Can't set up socket: (null), drop
  93.7331: ERROR: Selecting TCP_SEND_QUEUE, socket 1: Socket operation on non-socket
  93.7333: ERROR: Unexpected reply from TCP_REPAIR helper: -100

That's because, oops, I only took care of socket() failures in
tcp_flow_repair_socket(), but not bind() failures (!). Sorry.

Once that's fixed, flow_migrate_target() should also take care of
decreasing 'count' accordingly. I just had a glimpse but didn't really
try to sketch a fix.
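Just to illustrate the direction, though (hypothetical code, not an
actual patch against the passt tree, names are made up): the missing
piece is propagating bind() failures the same way socket() failures
already are, and closing the descriptor, so that the flow is dropped
cleanly. That's presumably also where the "Socket operation on
non-socket" noise above comes from, as the repair sequence otherwise
goes on to run against a descriptor that isn't valid anymore:

/* Hypothetical, self-contained sketch (not the actual passt code):
 * when re-creating a socket for a migrated flow, report and propagate
 * bind() failures just like socket() failures, and close the
 * descriptor, so the caller can drop the flow instead of carrying a
 * dead fd into the TCP_REPAIR sequence.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

static int migrated_flow_socket(const struct sockaddr_in6 *addr)
{
	int s = socket(AF_INET6, SOCK_STREAM | SOCK_CLOEXEC, 0);

	if (s < 0) {
		int ret = -errno;

		fprintf(stderr,
			"Failed to create socket for migrated flow: %s\n",
			strerror(errno));
		return ret;
	}

	if (bind(s, (const struct sockaddr *)addr, sizeof(*addr)) < 0) {
		int ret = -errno;	/* the branch that was missing */

		fprintf(stderr,
			"Failed to bind socket for migrated flow: %s\n",
			strerror(errno));
		close(s);		/* don't reuse the dead fd */
		return ret;		/* let the caller drop the flow */
	}

	return s;
}

With something of that shape, flow_migrate_target() could then drop
the flow and decrease 'count' whenever this returns an error, which is
the adjustment I mentioned above.

-- 
Stefano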