More debugging of bugs found with the rampstream test today.

* Send queue transfer bug

By inspection, I spotted a nasty bug which would mean we never properly
transfer the send queue (in repair mode we read it into the wrong
buffer, then transferred the right one). I think the only reason we
didn't hit this is that the send queue has been empty in each of the
cases I've seen. I think that makes sense: with high speed local
transfers, there's probably more than enough time between when the
guest is stopped and when we dump the queue for the sndbuf to drain
completely to the peer.

* Queue transfer error checking

I've made the error checking when extracting and reloading the queue a
bit more robust.

* rampstream corruption bug

I found the cause of the stream corruption bug. I don't think repair
mode fully supports SO_PEEK_OFF semantics, but apparently it shares
enough code with the normal recv() path that the peek offset was still
applied, which meant (I think) we skipped the "already sent" portion of
the rcv queue when dumping it.

I've fixed this by disabling SO_PEEK_OFF (setting it to -1) before
migration on the source. This still needs some fixing to deal correctly
with the case of a failed migration which resumes on the source.

* rampstream unexpected EOF bug

Unfortunately that wasn't the only bug. With the peek offset fixed, I
no longer get stream corruption, but I still get unexpected EOFs on the
rampstream_in test (at least with 64M [rw]mem_max). The receiving
rampstream is getting an EOF because passt is sending an RST to the
guest:

  14.1750: Flow 0 (TCP connection): TCP reset at tcp_sock_handler:2270

That happens because we get an EPOLLERR on the socket at some point
after migration. From some earlier debugging hacks, I think that's an
ECONNRESET specifically, but I haven't debugged further because I was
focused on the corruption bug.

David Gibson (3):
  migrate: Migrate guest observed addresses
  rampstream: Add utility to test for corruption of data streams
  debug

Stefano Brivio (6):
  migrate: Skeleton of live migration logic
  Add interfaces and configuration bits for passt-repair
  vhost_user: Make source quit after reporting migration state
  tcp: Get bound address for connected inbound sockets too
  migrate: Migrate TCP flows
  test: Add migration tests

 Makefile                    |  14 +-
 conf.c                      |  43 +-
 contrib/selinux/passt.te    |   2 +-
 epoll_type.h                |   6 +-
 flow.c                      | 259 +++++++++-
 flow.h                      |   8 +
 flow_table.h                |   6 +-
 migrate.c                   | 309 ++++++++++++
 migrate.h                   |  51 ++
 passt.1                     |  11 +
 passt.c                     |  21 +-
 passt.h                     |  15 +
 repair.c                    | 218 +++++++++
 repair.h                    |  16 +
 tap.c                       |  65 +--
 tcp.c                       | 921 +++++++++++++++++++++++++++++++++++-
 tcp_conn.h                  |  99 ++++
 test/.gitignore             |   1 +
 test/Makefile               |   5 +-
 test/lib/layout             |  55 ++-
 test/lib/setup              | 140 +++++-
 test/lib/test               |  48 ++
 test/migrate/basic          |  59 +++
 test/migrate/bidirectional  |  64 +++
 test/migrate/iperf3_bidir6  |  58 +++
 test/migrate/iperf3_in4     |  50 ++
 test/migrate/iperf3_in6     |  58 +++
 test/migrate/iperf3_out4    |  50 ++
 test/migrate/iperf3_out6    |  58 +++
 test/migrate/rampstream_in  |  60 +++
 test/migrate/rampstream_out |  55 +++
 test/passt.mbuto            |   5 +-
 test/rampstream-check.sh    |   3 +
 test/rampstream.c           | 143 ++++++
 test/run                    |  29 ++
 util.c                      |  62 +++
 util.h                      |  30 ++
 vhost_user.c                |  67 +--
 virtio.h                    |   4 -
 vu_common.c                 |  49 +-
 vu_common.h                 |   2 +-
 41 files changed, 3014 insertions(+), 205 deletions(-)
 create mode 100644 migrate.c
 create mode 100644 migrate.h
 create mode 100644 repair.c
 create mode 100644 repair.h
 create mode 100644 test/migrate/basic
 create mode 100644 test/migrate/bidirectional
 create mode 100644 test/migrate/iperf3_bidir6
 create mode 100644 test/migrate/iperf3_in4
 create mode 100644 test/migrate/iperf3_in6
 create mode 100644 test/migrate/iperf3_out4
 create mode 100644 test/migrate/iperf3_out6
 create mode 100644 test/migrate/rampstream_in
 create mode 100644 test/migrate/rampstream_out
 create mode 100755 test/rampstream-check.sh
 create mode 100644 test/rampstream.c

-- 
2.48.1