...and time out after that. This will be needed because of an upcoming change to passt-repair enabling it to start before passt is started, on both source and target, by means of an inotify watch. Once the inotify watch triggers, passt-repair will connect right away, but we have no guarantees that the connection completes before we start the migration process, so wait for it (for a reasonable amount of time). Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- I'm still working on the passt-repair change mentioned above. It's turning out to be a bit trickier than expected because to support typical libvirt migration usage, on the target, we also need to be able to wait for creation of a directory and, separately, for a socket file being added inside it. However, this part looks solid and it should do what we need. I tested it by starting passt-repair after sleep(1) in migrate/* tests, changing this timeout to a minute, on either source, target, or both. flow.c | 20 ++++++++++++++++++++ repair.c | 31 +++++++++++++++++++++++++++++++ repair.h | 1 + 3 files changed, 52 insertions(+) diff --git a/flow.c b/flow.c index 749c498..5e64b79 100644 --- a/flow.c +++ b/flow.c @@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret) return ret; } +/** + * flow_migrate_need_repair() - Do we need to set repair mode for any flow? + * + * Return: true if repair mode is needed, false otherwise + */ +static bool flow_migrate_need_repair(void) +{ + union flow *flow; + + foreach_established_tcp_flow(flow) + return true; + + return false; +} + /** * flow_migrate_repair_all() - Turn repair mode on or off for all flows * @c: Execution context @@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, (void)stage; (void)fd; + if (flow_migrate_need_repair()) + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; @@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, if (!count) return 0; + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; diff --git a/repair.c b/repair.c index 3ee089f..ce5a4a4 100644 --- a/repair.c +++ b/repair.c @@ -27,6 +27,11 @@ #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ +/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */ +#define REPAIR_ACCEPT_TIMEOUT_MS 100 +#define REPAIR_ACCEPT_TIMEOUT_US (REPAIR_ACCEPT_TIMEOUT_MS * 1000) +static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000); + /* Pending file descriptors for next repair_flush() call, or command change */ static int repair_fds[SCM_MAX_FD]; @@ -138,6 +143,32 @@ void repair_handler(struct ctx *c, uint32_t events) repair_close(c); } +/** + * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect + * @c: Execution context + */ +void repair_wait(struct ctx *c) +{ + struct timeval tv = { .tv_sec = 0, + .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) }; + + if (c->fd_repair >= 0 || c->fd_repair_listen == -1) + return; + + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, + &tv, sizeof(tv))) { + err_perror("Set timeout on TCP_REPAIR listening socket"); + return; + } + + repair_listen_handler(c, EPOLLIN); + + tv.tv_usec = 0; + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, + &tv, sizeof(tv))) + err_perror("Clear timeout on TCP_REPAIR listening socket"); +} + /** * repair_flush() - Flush current set of sockets to helper, with current command * @c: Execution context diff --git a/repair.h b/repair.h index de279d6..1d37922 100644 --- a/repair.h +++ b/repair.h @@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c); void repair_listen_handler(struct ctx *c, uint32_t events); void repair_handler(struct ctx *c, uint32_t events); void repair_close(struct ctx *c); +void repair_wait(struct ctx *c); int repair_flush(struct ctx *c); int repair_set(struct ctx *c, int s, int cmd); -- 2.43.0
On Thu, Mar 06, 2025 at 08:00:51PM +0100, Stefano Brivio wrote:...and time out after that. This will be needed because of an upcoming change to passt-repair enabling it to start before passt is started, on both source and target, by means of an inotify watch. Once the inotify watch triggers, passt-repair will connect right away, but we have no guarantees that the connection completes before we start the migration process, so wait for it (for a reasonable amount of time). Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>Hmm. So I'm pretty confident this patch does what it sets out to do. I'm increasingly dubious about this direction.--- I'm still working on the passt-repair change mentioned above. It's turning out to be a bit trickier than expected because to support typical libvirt migration usage, on the target, we also need to be able to wait for creation of a directory and, separately, for a socket file being added inside it.Oof. This significantly amplifies my concerns about the security model: one of the most obvious ways to to bound who could potentially convince passt-repair to connect is to make sure the directory exists before hand: if you don't have permissions to create the socket in that directory, passt-repair won't connect. If we're also waiting for directories to appear, it makes this even harder to analyse in terms of whether someone who's not supposed to could convince passt-repair to connect to them. Given the complexity you're hitting here, I'm really starting to think maybe we should reverse our model: have passt connect to passt-repair. The security model is easier to understand: permission to the socket and its directory is permission to use TCP_REPAIR. There's no awkward waiting on either side - I think it might actually be less total code chyurn. Judging by the early queries from the Kubevirt guys it seems to be the more obvious model people would expect. On the other hand, configuring the socket path is still an issue, and being able to tell passt where to connect to through libvirt might become more important than it is now.However, this part looks solid and it should do what we need. I tested it by starting passt-repair after sleep(1) in migrate/* tests, changing this timeout to a minute, on either source, target, or both. flow.c | 20 ++++++++++++++++++++ repair.c | 31 +++++++++++++++++++++++++++++++ repair.h | 1 + 3 files changed, 52 insertions(+) diff --git a/flow.c b/flow.c index 749c498..5e64b79 100644 --- a/flow.c +++ b/flow.c @@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret) return ret; } +/** + * flow_migrate_need_repair() - Do we need to set repair mode for any flow? + * + * Return: true if repair mode is needed, false otherwise + */ +static bool flow_migrate_need_repair(void) +{ + union flow *flow; + + foreach_established_tcp_flow(flow) + return true; + + return false; +} + /** * flow_migrate_repair_all() - Turn repair mode on or off for all flows * @c: Execution context @@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, (void)stage; (void)fd; + if (flow_migrate_need_repair()) + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; @@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, if (!count) return 0; + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; diff --git a/repair.c b/repair.c index 3ee089f..ce5a4a4 100644 --- a/repair.c +++ b/repair.c @@ -27,6 +27,11 @@ #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ +/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */ +#define REPAIR_ACCEPT_TIMEOUT_MS 100100ms is a pretty long time in the context of a stopped guest. I guess is is a worst case. We could possibly mitigate it by doing the wait when memory change logging goes on (so migration is starting, bu the guest is not get stopped). However at that point we can't rely on flow_migrate_need_repair(), because we might have no flows, but one is opened before the guest is stopped.+#define REPAIR_ACCEPT_TIMEOUT_US (REPAIR_ACCEPT_TIMEOUT_MS * 1000) +static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000);In this context the static_assert() seems like a pretty arbitrary bound. To make the reason for it clearer, I'd suggest moving it to where we construct the timeval.+ /* Pending file descriptors for next repair_flush() call, or command change */ static int repair_fds[SCM_MAX_FD]; @@ -138,6 +143,32 @@ void repair_handler(struct ctx *c, uint32_t events) repair_close(c); } +/** + * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect + * @c: Execution context + */ +void repair_wait(struct ctx *c) +{ + struct timeval tv = { .tv_sec = 0, + .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) }; + + if (c->fd_repair >= 0 || c->fd_repair_listen == -1) + return; + + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, + &tv, sizeof(tv))) { + err_perror("Set timeout on TCP_REPAIR listening socket"); + return; + } + + repair_listen_handler(c, EPOLLIN); + + tv.tv_usec = 0; + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, + &tv, sizeof(tv))) + err_perror("Clear timeout on TCP_REPAIR listening socket"); +} + /** * repair_flush() - Flush current set of sockets to helper, with current command * @c: Execution context diff --git a/repair.h b/repair.h index de279d6..1d37922 100644 --- a/repair.h +++ b/repair.h @@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c); void repair_listen_handler(struct ctx *c, uint32_t events); void repair_handler(struct ctx *c, uint32_t events); void repair_close(struct ctx *c); +void repair_wait(struct ctx *c); int repair_flush(struct ctx *c); int repair_set(struct ctx *c, int s, int cmd);-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri, 7 Mar 2025 13:59:08 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Thu, Mar 06, 2025 at 08:00:51PM +0100, Stefano Brivio wrote:I don't think your concern is particularly founded because in a general case we have LSM policies preventing other processes from creating that file in that directory, and in KubeVirt's case we simply have no other processes that can access the pod mount namespace and do that. But that part of the change is not strictly necessary, surely not for a proper libvirt integration (libvirt knows the path) and also not for the integration for KubeVirt (KubeVirt can build the path). So, to dodge the obstructionism here, I'll happily drop that part. It also saves me time....and time out after that. This will be needed because of an upcoming change to passt-repair enabling it to start before passt is started, on both source and target, by means of an inotify watch. Once the inotify watch triggers, passt-repair will connect right away, but we have no guarantees that the connection completes before we start the migration process, so wait for it (for a reasonable amount of time). Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>Hmm. So I'm pretty confident this patch does what it sets out to do. I'm increasingly dubious about this direction.--- I'm still working on the passt-repair change mentioned above. It's turning out to be a bit trickier than expected because to support typical libvirt migration usage, on the target, we also need to be able to wait for creation of a directory and, separately, for a socket file being added inside it.Oof. This significantly amplifies my concerns about the security model: one of the most obvious ways to to bound who could potentially convince passt-repair to connect is to make sure the directory exists before hand: if you don't have permissions to create the socket in that directory, passt-repair won't connect. If we're also waiting for directories to appear, it makes this even harder to analyse in terms of whether someone who's not supposed to could convince passt-repair to connect to them.Given the complexity you're hitting here, I'm really starting to think maybe we should reverse our model: have passt connect to passt-repair. The security model is easier to understand: permission to the socket and its directory is permission to use TCP_REPAIR. There's no awkward waiting on either side - I think it might actually be less total code chyurn.Of course it's a bit less code, but it turns a potential concern into the actually expected situation: you have something permanently around that allows any process (because you have no practical way to restrict it, not in distributions and not in KubeVirt) to set/clear repair mode on sockets. Unless you start introducing authentication, that is. In the current model it's still a one-off operation, even with an additional wait. You start passt-repair once to do its job, and once it does the job, it terminates.Judging by the early queries from the Kubevirt guys it seems to be the more obvious model people would expect.Note that you talked to people taking care of (almost exclusively) networking there. Knowing myself a fair bit of code, architecture, and history of security weaknesses and discussions around them, I'm pretty sure you can't do the opposite. The long-term solutions are anyway having libvirt provide a phase or blocking event where this can be done, and, at least in KubeVirt's case, eventually handle this with seitan.On the other hand, configuring the socket path is still an issue, and being able to tell passt where to connect to through libvirt might become more important than it is now.It's actually not a case I can reproduce at all, as expected: passt-repair connects quickly enough once the socket is available.However, this part looks solid and it should do what we need. I tested it by starting passt-repair after sleep(1) in migrate/* tests, changing this timeout to a minute, on either source, target, or both. flow.c | 20 ++++++++++++++++++++ repair.c | 31 +++++++++++++++++++++++++++++++ repair.h | 1 + 3 files changed, 52 insertions(+) diff --git a/flow.c b/flow.c index 749c498..5e64b79 100644 --- a/flow.c +++ b/flow.c @@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret) return ret; } +/** + * flow_migrate_need_repair() - Do we need to set repair mode for any flow? + * + * Return: true if repair mode is needed, false otherwise + */ +static bool flow_migrate_need_repair(void) +{ + union flow *flow; + + foreach_established_tcp_flow(flow) + return true; + + return false; +} + /** * flow_migrate_repair_all() - Turn repair mode on or off for all flows * @c: Execution context @@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, (void)stage; (void)fd; + if (flow_migrate_need_repair()) + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; @@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, if (!count) return 0; + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; diff --git a/repair.c b/repair.c index 3ee089f..ce5a4a4 100644 --- a/repair.c +++ b/repair.c @@ -27,6 +27,11 @@ #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ +/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */ +#define REPAIR_ACCEPT_TIMEOUT_MS 100100ms is a pretty long time in the context of a stopped guest. I guess is is a worst case.We could possibly mitigate it by doing theMitigate what exactly? The expected mode of operation is with passt-repair available. The failure handling is a fallback. I can turn it into 1 ms, it just seems to increase risk for no good reason.wait when memory change logging goes on (so migration is starting, bu the guest is not get stopped). However at that point we can't rely on flow_migrate_need_repair(), because we might have no flows, but one is opened before the guest is stopped.Right, that complicates things quite a lot.Oh, I didn't think of that. Sure, I'll move it.+#define REPAIR_ACCEPT_TIMEOUT_US (REPAIR_ACCEPT_TIMEOUT_MS * 1000) +static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000);In this context the static_assert() seems like a pretty arbitrary bound. To make the reason for it clearer, I'd suggest moving it to where we construct the timeval.> + > /* Pending file descriptors for next repair_flush() call, or command change */ > static int repair_fds[SCM_MAX_FD]; > > @@ -138,6 +143,32 @@ void repair_handler(struct ctx *c, uint32_t events) > repair_close(c); > } > > +/** > + * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect > + * @c: Execution context > + */ > +void repair_wait(struct ctx *c) > +{ > + struct timeval tv = { .tv_sec = 0, > + .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) }; > + > + if (c->fd_repair >= 0 || c->fd_repair_listen == -1) > + return; > + > + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, > + &tv, sizeof(tv))) { > + err_perror("Set timeout on TCP_REPAIR listening socket"); > + return; > + } > + > + repair_listen_handler(c, EPOLLIN); > + > + tv.tv_usec = 0; > + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, > + &tv, sizeof(tv))) > + err_perror("Clear timeout on TCP_REPAIR listening socket"); > +} > + > /** > * repair_flush() - Flush current set of sockets to helper, with current command > * @c: Execution context > diff --git a/repair.h b/repair.h > index de279d6..1d37922 100644 > --- a/repair.h > +++ b/repair.h > @@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c); > void repair_listen_handler(struct ctx *c, uint32_t events); > void repair_handler(struct ctx *c, uint32_t events); > void repair_close(struct ctx *c); > +void repair_wait(struct ctx *c); > int repair_flush(struct ctx *c); > int repair_set(struct ctx *c, int s, int cmd);-- Stefano
On Fri, Mar 07, 2025 at 07:14:45AM +0100, Stefano Brivio wrote:On Fri, 7 Mar 2025 13:59:08 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:I mean, LSM properties should generally be a additional layer of protection, things should be basically ok with the vanilla Unix permission model.On Thu, Mar 06, 2025 at 08:00:51PM +0100, Stefano Brivio wrote:I don't think your concern is particularly founded because in a general case we have LSM policies preventing other processes from creating that file in that directory, and in KubeVirt's case we simply have no other processes that can access the pod mount namespace and do that....and time out after that. This will be needed because of an upcoming change to passt-repair enabling it to start before passt is started, on both source and target, by means of an inotify watch. Once the inotify watch triggers, passt-repair will connect right away, but we have no guarantees that the connection completes before we start the migration process, so wait for it (for a reasonable amount of time). Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>Hmm. So I'm pretty confident this patch does what it sets out to do. I'm increasingly dubious about this direction.--- I'm still working on the passt-repair change mentioned above. It's turning out to be a bit trickier than expected because to support typical libvirt migration usage, on the target, we also need to be able to wait for creation of a directory and, separately, for a socket file being added inside it.Oof. This significantly amplifies my concerns about the security model: one of the most obvious ways to to bound who could potentially convince passt-repair to connect is to make sure the directory exists before hand: if you don't have permissions to create the socket in that directory, passt-repair won't connect. If we're also waiting for directories to appear, it makes this even harder to analyse in terms of whether someone who's not supposed to could convince passt-repair to connect to them.But that part of the change is not strictly necessary, surely not for a proper libvirt integration (libvirt knows the path) and also not for the integration for KubeVirt (KubeVirt can build the path). So, to dodge the obstructionism here, I'll happily drop that part. It also saves me time.A (semi-)permanentaly running daemon was not what I had in mind. What Istead I was thinking that you start passt-repair shortly before the migration (as we already planned), it does exactly one accept(), then terminates once its single client is done.Given the complexity you're hitting here, I'm really starting to think maybe we should reverse our model: have passt connect to passt-repair. The security model is easier to understand: permission to the socket and its directory is permission to use TCP_REPAIR. There's no awkward waiting on either side - I think it might actually be less total code chyurn.Of course it's a bit less code, but it turns a potential concern into the actually expected situation: you have something permanently around that allows any process (because you have no practical way to restrictit, not in distributions and not in KubeVirt) to set/clear repair mode on sockets. Unless you start introducing authentication, that is.Well, the plan was the authentication was via the Unix permissions on the socket.In the current model it's still a one-off operation, even with an additional wait. You start passt-repair once to do its job, and once it does the job, it terminates.Right, also what I was intending... but of course, I realised a few hours later that I'd forgotten that our filesystem isolation means we can't connect() to a Unix socket in the wider world at migration time. There are ways to work around that, of course, but I have yet to spot one that seems better than the status quo. So, Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> I guess.Right. Probably not really valuable while it still needs CAP_NET_ADMIN, but if we do manage to change that to CAP_NET_RAW, it would make sense for passt to only use the helper if it can't do the TCP_REPAIR itself. That makes setup simpler for environments that don't have counterproductive "security" policies.Judging by the early queries from the Kubevirt guys it seems to be the more obvious model people would expect.Note that you talked to people taking care of (almost exclusively) networking there. Knowing myself a fair bit of code, architecture, and history of security weaknesses and discussions around them, I'm pretty sure you can't do the opposite. The long-term solutions are anyway having libvirt provide a phase or blocking event where this can be done, and, at least in KubeVirt's case, eventually handle this with seitan.Good to know. It does mean we're adding 100ms of downtime for the fallback case where there is no passt-repair, of course.On the other hand, configuring the socket path is still an issue, and being able to tell passt where to connect to through libvirt might become more important than it is now.It's actually not a case I can reproduce at all, as expected: passt-repair connects quickly enough once the socket is available.However, this part looks solid and it should do what we need. I tested it by starting passt-repair after sleep(1) in migrate/* tests, changing this timeout to a minute, on either source, target, or both. flow.c | 20 ++++++++++++++++++++ repair.c | 31 +++++++++++++++++++++++++++++++ repair.h | 1 + 3 files changed, 52 insertions(+) diff --git a/flow.c b/flow.c index 749c498..5e64b79 100644 --- a/flow.c +++ b/flow.c @@ -911,6 +911,21 @@ static int flow_migrate_source_rollback(struct ctx *c, unsigned bound, int ret) return ret; } +/** + * flow_migrate_need_repair() - Do we need to set repair mode for any flow? + * + * Return: true if repair mode is needed, false otherwise + */ +static bool flow_migrate_need_repair(void) +{ + union flow *flow; + + foreach_established_tcp_flow(flow) + return true; + + return false; +} + /** * flow_migrate_repair_all() - Turn repair mode on or off for all flows * @c: Execution context @@ -966,6 +981,9 @@ int flow_migrate_source_pre(struct ctx *c, const struct migrate_stage *stage, (void)stage; (void)fd; + if (flow_migrate_need_repair()) + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; @@ -1083,6 +1101,8 @@ int flow_migrate_target(struct ctx *c, const struct migrate_stage *stage, if (!count) return 0; + repair_wait(c); + if ((rc = flow_migrate_repair_all(c, true))) return -rc; diff --git a/repair.c b/repair.c index 3ee089f..ce5a4a4 100644 --- a/repair.c +++ b/repair.c @@ -27,6 +27,11 @@ #define SCM_MAX_FD 253 /* From Linux kernel (include/net/scm.h), not in UAPI */ +/* Wait for a while for TCP_REPAIR helper to connect if it's not there yet */ +#define REPAIR_ACCEPT_TIMEOUT_MS 100100ms is a pretty long time in the context of a stopped guest. I guess is is a worst case.We could possibly mitigate it by doing theMitigate what exactly? The expected mode of operation is with passt-repair available. The failure handling is a fallback. I can turn it into 1 ms, it just seems to increase risk for no good reason.-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibsonwait when memory change logging goes on (so migration is starting, bu the guest is not get stopped). However at that point we can't rely on flow_migrate_need_repair(), because we might have no flows, but one is opened before the guest is stopped.Right, that complicates things quite a lot.Oh, I didn't think of that. Sure, I'll move it.+#define REPAIR_ACCEPT_TIMEOUT_US (REPAIR_ACCEPT_TIMEOUT_MS * 1000) +static_assert(REPAIR_ACCEPT_TIMEOUT_US < 1000 * 1000);In this context the static_assert() seems like a pretty arbitrary bound. To make the reason for it clearer, I'd suggest moving it to where we construct the timeval.> + > /* Pending file descriptors for next repair_flush() call, or command change */ > static int repair_fds[SCM_MAX_FD]; > > @@ -138,6 +143,32 @@ void repair_handler(struct ctx *c, uint32_t events) > repair_close(c); > } > > +/** > + * repair_wait() - Wait (with timeout) for TCP_REPAIR helper to connect > + * @c: Execution context > + */ > +void repair_wait(struct ctx *c) > +{ > + struct timeval tv = { .tv_sec = 0, > + .tv_usec = (long)(REPAIR_ACCEPT_TIMEOUT_US) }; > + > + if (c->fd_repair >= 0 || c->fd_repair_listen == -1) > + return; > + > + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, > + &tv, sizeof(tv))) { > + err_perror("Set timeout on TCP_REPAIR listening socket"); > + return; > + } > + > + repair_listen_handler(c, EPOLLIN); > + > + tv.tv_usec = 0; > + if (setsockopt(c->fd_repair_listen, SOL_SOCKET, SO_RCVTIMEO, > + &tv, sizeof(tv))) > + err_perror("Clear timeout on TCP_REPAIR listening socket"); > +} > + > /** > * repair_flush() - Flush current set of sockets to helper, with current command > * @c: Execution context > diff --git a/repair.h b/repair.h > index de279d6..1d37922 100644 > --- a/repair.h > +++ b/repair.h > @@ -10,6 +10,7 @@ void repair_sock_init(const struct ctx *c); > void repair_listen_handler(struct ctx *c, uint32_t events); > void repair_handler(struct ctx *c, uint32_t events); > void repair_close(struct ctx *c); > +void repair_wait(struct ctx *c); > int repair_flush(struct ctx *c); > int repair_set(struct ctx *c, int s, int cmd);