On Thu, Mar 06, 2025 at 08:00:51PM +0100, Stefano Brivio wrote:...and time out after that. This will be needed because of an upcoming change to passt-repair enabling it to start before passt is started, on both source and target, by means of an inotify watch. Once the inotify watch triggers, passt-repair will connect right away, but we have no guarantees that the connection completes before we start the migration process, so wait for it (for a reasonable amount of time). Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>Hmm. So I'm pretty confident this patch does what it sets out to do. I'm increasingly dubious about this direction.--- I'm still working on the passt-repair change mentioned above. It's turning out to be a bit trickier than expected because to support typical libvirt migration usage, on the target, we also need to be able to wait for creation of a directory and, separately, for a socket file being added inside it.Oof. This significantly amplifies my concerns about the security model: one of the most obvious ways to to bound who could potentially convince passt-repair to connect is to make sure the directory exists before hand: if you don't have permissions to create the socket in that directory, passt-repair won't connect. If we're also waiting for directories to appear, it makes this even harder to analyse in terms of whether someone who's not supposed to could convince passt-repair to connect to them. Given the complexity you're hitting here, I'm really starting to think maybe we should reverse our model: have passt connect to passt-repair. The security model is easier to understand: permission to the socket and its directory is permission to use TCP_REPAIR. There's no awkward waiting on either side - I think it might actually be less total code chyurn. Judging by the early queries from the Kubevirt guys it seems to be the more obvious model people would expect. On the other hand, configuring the socket path is still an issue, and being able to tell passt where to connect to through libvirt might become more important than it is now.However, this part looks solid and it should do what we need. I tested it by starting passt-repair after sleep(1) in migrate/* tests, changing this timeout to a minute, on either source, target, or both. flow.c | 20 ++++++++++++++++++++ repair.c | 31 +++++++++++++++++++++++++++++++ repair.h | 1 + 3 files changed, 52 insertions(+) diff --git a/flow.c b/flow.c index 749c498..5e64b79 100644 --- a/flow.c +++ b/flow.c @@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret) return ret; } +/** + * flow_migrate_need_repair() - Do we need to set repair mode for any flow? + * + * Return: true if repair mode is needed, false otherwise + */ +static bool flow_migrate_need_repair(void) +{ + union flow *flow; + + foreach_established_tcp_flow(flow) + return true; + + return false; +} + /** * flow_migrate_repair_all() - Turn repair mode on or off for all flows * @c: Execution context @@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, (void)stage; (void)fd; + if (flow_migrate_need_repair()) + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; @@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, if (!count) return 0; + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; diff --git a/repair.c b/repair.c index 3ee089f..ce5a4a4 100644 --- a/repair.c +++ b/repair.c @@ -27,6 +27,11 @@ #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ +/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */ +#define REPAIR_ACCEPT_TIMEOUT_MS 100100ms is a pretty long time in the context of a stopped guest. I guess is is a worst case. We could possibly mitigate it by doing the wait when memory change logging goes on (so migration is starting, bu the guest is not get stopped). However at that point we can't rely on flow_migrate_need_repair(), because we might have no flows, but one is opened before the guest is stopped.+#define REPAIR_ACCEPT_TIMEOUT_US (REPAIR_ACCEPT_TIMEOUT_MS * 1000) +static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000);In this context the static_assert() seems like a pretty arbitrary bound. To make the reason for it clearer, I'd suggest moving it to where we construct the timeval.+ /* Pending file descriptors for next repair_flush() call, or command change */ static int repair_fds[SCM_MAX_FD]; @@ -138,6 +143,32 @@ void repair_handler(struct ctx *c, uint32_t events) repair_close(c); } +/** + * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect + * @c: Execution context + */ +void repair_wait(struct ctx *c) +{ + struct timeval tv = { .tv_sec = 0, + .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) }; + + if (c->fd_repair >= 0 || c->fd_repair_listen == -1) + return; + + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, + &tv, sizeof(tv))) { + err_perror("Set timeout on TCP_REPAIR listening socket"); + return; + } + + repair_listen_handler(c, EPOLLIN); + + tv.tv_usec = 0; + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, + &tv, sizeof(tv))) + err_perror("Clear timeout on TCP_REPAIR listening socket"); +} + /** * repair_flush() - Flush current set of sockets to helper, with current command * @c: Execution context diff --git a/repair.h b/repair.h index de279d6..1d37922 100644 --- a/repair.h +++ b/repair.h @@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c); void repair_listen_handler(struct ctx *c, uint32_t events); void repair_handler(struct ctx *c, uint32_t events); void repair_close(struct ctx *c); +void repair_wait(struct ctx *c); int repair_flush(struct ctx *c); int repair_set(struct ctx *c, int s, int cmd);-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson