[PATCH v4 0/8] Draft, incomplete series introducing state migration
I applied what I could, and squashed a number patches, including those from "[PATCH 0/6] More migration improvements". I didn't test the full flow here. David Gibson (2): migrate: Make more handling common rather than vhost-user specific migrate: Don't handle the migration channel through epoll Stefano Brivio (6): flow_table: Use size in extern declaration for flowtab, export hash table Introduce facilities for guest migration on top of vhost-user infrastructure Add interfaces and configuration bits for passt-repair flow, tcp: Basic pre-migration source handler to dump sequence numbers vhost_user: Make source quit after reporting migration state Implement target side of migration Makefile | 14 +-- conf.c | 44 ++++++- epoll_type.h | 6 +- flow.c | 104 ++++++++++++++++- flow.h | 4 + flow_table.h | 5 +- migrate.c | 319 +++++++++++++++++++++++++++++++++++++++++++++++++++ migrate.h | 96 ++++++++++++++++ passt.1 | 11 ++ passt.c | 17 ++- passt.h | 17 +++ repair.c | 193 +++++++++++++++++++++++++++++++ repair.h | 16 +++ tap.c | 65 +---------- tcp.c | 170 +++++++++++++++++++++++++++ tcp_conn.h | 7 ++ util.c | 62 ++++++++++ util.h | 1 + vhost_user.c | 66 +++-------- virtio.h | 4 - vu_common.c | 49 +------- vu_common.h | 2 +- 22 files changed, 1087 insertions(+), 185 deletions(-) create mode 100644 migrate.c create mode 100644 migrate.h create mode 100644 repair.c create mode 100644 repair.h -- 2.43.0
This can all be thrown away once we stop copying the flow table.
Signed-off-by: Stefano Brivio
Add two sets (source or target) of three functions each for passt in
vhost-user mode, triggered by activity on the file descriptor passed
via VHOST_USER_PROTOCOL_F_DEVICE_STATE:
- migrate_source_pre() and migrate_target_pre() are called to prepare
for migration, before data is transferred
- migrate_source() sends, and migrate_target() receives migration data
- migrate_source_post() and migrate_target_post() are responsible for
any post-migration task
Callbacks are added to these functions with arrays of function
pointers in migrate.c. Migration handlers are versioned.
Versioned descriptions of data sections will be added to the
data_versions array, which points to versioned iovec arrays. Version
1 is currently empty and will be filled in in subsequent patches.
The source announces the data version to be used and informs the peer
about endianness, and the size of void *, time_t, flow entries and
flow hash table entries.
The target checks if the version of the source is still supported. If
it's not, it aborts the migration.
From David: vu_migrate() moves to migrate.c now.
On top of this, from David:
--
migrate: Handle sending header section from data sections
Currently the global state header is included in the list of data sections
to send for migration. This makes for an asymmetry between the source and
target sides: for the source, the header is sent after the 'pre' handlers
along with all the rest of the data. For the target side, the header must
be read first (before the 'pre' handlers), so that we know the version
which determines what all the rest of the data will be.
Change this so that the header is handled explicitly on both the source and
target side. This will make some future changes simpler as well.
--
migrate: Make migration handlers simpler and more flexible
Currently the structure of the migration callbacks is quite rigid. There
are separate tables for pre and post handlers, for source and target.
Those only prepare or fix up with no reading or writing of the stream.
The actual reading and writing is done with an iovec of buffers to
transfer. That can't handle any variably sized structures, nor sending
state which is only obtained during migration rather than being tracked
usually.
Replace both the handlers and the sections with an ordered list of
migration "stages". Each stage has a callback for both source and target
side doing whatever is necessary - these can be NULL, for example for
preparation steps that have no equivalent on the other side. Things that
are just buffers to be transferred are handled with a macro generating a
suitable stage entry.
--
Signed-off-by: Stefano Brivio
From: David Gibson
From: David Gibson
In vhost-user mode, by default, create a second UNIX domain socket
accepting connections from passt-repair, with the usual listener
socket.
When we need to set or clear TCP_REPAIR on sockets, we'll send them
via SCM_RIGHTS to passt-repair, who sets the socket option values we
ask for.
To that end, introduce batched functions to request TCP_REPAIR
settings on sockets, so that we don't have to send a single message
for each socket, on migration. When needed, repair_flush() will
send the message and check for the reply.
Signed-off-by: Stefano Brivio
Very much draft quality, but it works. Ask passt-repair to switch
TCP sockets to repair mode and dump their current sequence numbers to
the flow table, which will be transferred and used by the target in
the next step.
Signed-off-by: Stefano Brivio
On Tue, Feb 04, 2025 at 01:47:43AM +0100, Stefano Brivio wrote:
Very much draft quality, but it works. Ask passt-repair to switch TCP sockets to repair mode and dump their current sequence numbers to the flow table, which will be transferred and used by the target in the next step.
[snip]
@@ -268,6 +270,7 @@ void migrate_close(struct ctx *c) { if (c->device_state_fd != -1) { debug("Closing migration channel, fd: %d", c->device_state_fd); + epoll_del(c, c->device_state_fd);
You have a stray revert hunks here again which breaks things horribly. And therefore makes me not confident that I'm actually testing the same code you are.
close(c->device_state_fd); c->device_state_fd = -1; c->device_state_result = -1; @@ -282,14 +285,12 @@ void migrate_close(struct ctx *c) */ void migrate_request(struct ctx *c, int fd, bool target) { - debug("Migration requested, fd: %d (was %d)", - fd, c->device_state_fd); + debug("Migration requested, fd: %d", c->device_state_fd);
if (c->device_state_fd != -1) migrate_close(c);
c->device_state_fd = fd; - c->migrate_target = target;
And here. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Tue, 4 Feb 2025 14:43:07 +1100
David Gibson
On Tue, Feb 04, 2025 at 01:47:43AM +0100, Stefano Brivio wrote:
Very much draft quality, but it works. Ask passt-repair to switch TCP sockets to repair mode and dump their current sequence numbers to the flow table, which will be transferred and used by the target in the next step.
[snip]
@@ -268,6 +270,7 @@ void migrate_close(struct ctx *c) { if (c->device_state_fd != -1) { debug("Closing migration channel, fd: %d", c->device_state_fd); + epoll_del(c, c->device_state_fd);
You have a stray revert hunks here again which breaks things horribly. And therefore makes me not confident that I'm actually testing the same code you are.
Well, folding "[PATCH 6/6] migrate: Make migration handlers simpler and more flexible" into the first patches, where it belongs, was a lot of fun. As I mentioned, I didn't test this one, I just build-tested the final result.
close(c->device_state_fd); c->device_state_fd = -1; c->device_state_result = -1; @@ -282,14 +285,12 @@ void migrate_close(struct ctx *c) */ void migrate_request(struct ctx *c, int fd, bool target) { - debug("Migration requested, fd: %d (was %d)", - fd, c->device_state_fd); + debug("Migration requested, fd: %d", c->device_state_fd);
if (c->device_state_fd != -1) migrate_close(c);
c->device_state_fd = fd; - c->migrate_target = target;
And here.
Can you please post the fixed rebase...? -- Stefano
On Tue, Feb 04, 2025 at 07:44:10AM +0100, Stefano Brivio wrote:
On Tue, 4 Feb 2025 14:43:07 +1100 David Gibson
wrote: On Tue, Feb 04, 2025 at 01:47:43AM +0100, Stefano Brivio wrote:
Very much draft quality, but it works. Ask passt-repair to switch TCP sockets to repair mode and dump their current sequence numbers to the flow table, which will be transferred and used by the target in the next step.
[snip]
@@ -268,6 +270,7 @@ void migrate_close(struct ctx *c) { if (c->device_state_fd != -1) { debug("Closing migration channel, fd: %d", c->device_state_fd); + epoll_del(c, c->device_state_fd);
You have a stray revert hunks here again which breaks things horribly. And therefore makes me not confident that I'm actually testing the same code you are.
Well, folding "[PATCH 6/6] migrate: Make migration handlers simpler and more flexible" into the first patches, where it belongs, was a lot of fun.
Heh, fair point.
As I mentioned, I didn't test this one, I just build-tested the final result.
close(c->device_state_fd); c->device_state_fd = -1; c->device_state_result = -1; @@ -282,14 +285,12 @@ void migrate_close(struct ctx *c) */ void migrate_request(struct ctx *c, int fd, bool target) { - debug("Migration requested, fd: %d (was %d)", - fd, c->device_state_fd); + debug("Migration requested, fd: %d", c->device_state_fd);
if (c->device_state_fd != -1) migrate_close(c);
c->device_state_fd = fd; - c->migrate_target = target;
And here.
Can you please post the fixed rebase...?
Ah, yeah, I've been thinking this way of posting patches back and forth hasn't been working great for this rapid iteration. On the other hand I was relucant to do the patch folding myself for fear of clobbering changes you were making concurrently to the same patches. I'll try posting the whole rebase and see how we go. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On migration, the source process asks passt-helper to set TCP sockets
in repair mode, dumps the information we need to migrate connections,
and closes them.
At this point, we can't pass them back to passt-helper using
SCM_RIGHTS, because they are closed, from that perspective, and
sendmsg() will give us EBADF. But if we don't clear repair mode, the
port they are bound to will not be available for binding in the
target.
Terminate once we're done with the migration and we reported the
state. This is equivalent to clearing repair mode on the sockets we
just closed.
Signed-off-by: Stefano Brivio
It's draft quality, with a number of hacks, and it will need a partial
rewrite. Add:
- flow_migrate_target_post(), to open target-side sockets and bind
them, switch them to repair mode, connect them, and make them leave
repair mode again
- copies of flow table, 'flow_first_free' pointer, related hash table,
and hash secret. The copy of the hash secret shows that the current
declarative approach to data sections has some drawbacks
Change tcp_flow_dump_seq() into tcp_flow_repair_seq(), which can dump
as well as restore sequences (used before connecting sockets).
Once we connect sockets, before we take them out of repair mode, we
need to restore MSS and window scaling information (what would be
determined by TCP options on handshake). I'm using hardcoded values as
we don't have a way to transfer these bits of socket-side information.
Before we turn repair mode off, add sockets to the epoll list and set
up per-socket timerfd descriptors, with initial timer scheduling.
Signed-off-by: Stefano Brivio
On Tue, Feb 04, 2025 at 01:47:37AM +0100, Stefano Brivio wrote:
I applied what I could, and squashed a number patches, including those from "[PATCH 0/6] More migration improvements".
I didn't test the full flow here.
With this patch set, I'm still unable to migrate a connection. First there are some simple things: * There are, again, some bogus revert hunks in the series which I had to remove. See comments * There are some bugs in the already committed patches, I posted a couple of fixes for those Then debugging the actual migration problem. The first issue I encountered was that the connect() on the target was failing with EADDRNOTAVAIL. I think this is a timing issue - the exit of the source instance isn't happening quite in time to free up the port. I was able to work around that by closing the source side socket immediately after extracting its info with tcp_repair, instead of postponing that to the exit check_device_state. To address this properly, I think we need to do a couple of things: * Treat the source-side point of no return as being at the end of migrate_source(). Close our sockets and purge flows at that point. * On the target side, defer actually re-opening/repairing the sockets until check_state, or even send_rarp time. Laurent, if you have some insight (or can find out) what guarantees we have in terms of what points on the target come after what points on the source that would be useful to check. With the hacky workaround, the connect() now success, but I still couldn't send data from the target. From strace, the target side passt doesn't seem to be attempting doing any writes to the socket, but I'm not sure why yet. pcap on the target side does show some packets on the stream coming from the guest, although it's only one byte (0x0a) and I thought I'd typed 5ish. It then appears to retransmit several times without getting an ack from passt. I'll keep debugging tomorrow if you don't have an insight. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Tue, 4 Feb 2025 17:01:04 +1100
David Gibson
With the hacky workaround, the connect() now success, but I still couldn't send data from the target. From strace, the target side passt doesn't seem to be attempting doing any writes to the socket, but I'm not sure why yet. pcap on the target side does show some packets on the stream coming from the guest, although it's only one byte (0x0a) and I thought I'd typed 5ish. It then appears to retransmit several times without getting an ack from passt.
It looks like an issue with the window, I had something similar before it occurred to me that I had to set the window scaling. Perhaps my hardcoded value isn't good enough, after all. Can you share the pcap? -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio