[PATCH] tcp_splice, flow: Add socket to epoll set before connect(), drop assert
...otherwise, if we have a real error on connect() (that is, not
EINPROGRESS), we'll return early from tcp_splice_connect() and later
try to fetch the epoll file descriptor:
ASSERTION FAILED in flow_epollfd (flow.c:362): f->epollid < ((1 << 8) - 1)
which is still (correctly) EPOLLFD_ID_INVALID.
Replace the ASSERT() in flow_epollfd() with a warning, as it looks
like there might be harmless cases where the socket is not in the
epoll set yet, and we'll just crash for nothing. We can turn this back
to an ASSERT() once we audit these paths in more detail.
Link: https://bodhi.fedoraproject.org/updates/FEDORA-2025-93b4eb64c3#comment-44734...
Signed-off-by: Stefano Brivio
On Tue, Dec 09, 2025 at 12:53:33AM +0100, Stefano Brivio wrote:
---
I might merge this in a bit even without review as we might now have
broken distribution packages around.

 flow.c       | 7 ++++++-
 tcp_splice.c | 4 ++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/flow.c b/flow.c
index 8d72965..4f53486 100644
--- a/flow.c
+++ b/flow.c
@@ -359,7 +359,12 @@ bool flow_in_epoll(const struct flow_common *f)
  */
 int flow_epollfd(const struct flow_common *f)
 {
-	ASSERT(f->epollid < EPOLLFD_ID_MAX);
+	if (f->epollid >= EPOLLFD_ID_MAX) {
+		flow_log_(f, true, LOG_WARNING,
+			  "Invalid epollid %i for flow, assuming default",
+			  f->epollid);
+		return epoll_id_to_fd[EPOLLFD_ID_DEFAULT];
+	}
This LGTM for safety's sake, although it's conceptually ugly.
 	return epoll_id_to_fd[f->epollid];
 }
diff --git a/tcp_splice.c b/tcp_splice.c
index 717766a..4405224 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -381,14 +381,14 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
 	pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport);
 
+	conn_event(c, conn, SPLICE_CONNECT);
+
 	if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) {
 		if (errno != EINPROGRESS) {
 			flow_trace(conn, "Couldn't connect socket for splice: %s",
 				   strerror_(errno));
 			return -errno;
 		}
-
-		conn_event(c, conn, SPLICE_CONNECT);
I don't really understand the rationale for this.
 	} else {
 		conn_event(c, conn, SPLICE_ESTABLISHED);
 		return tcp_splice_connect_finish(c, conn);
I think the true fix for this specific failure on the connect-error path is to check flow_in_epoll() before calling flow_epollfd() / epoll_del() in the CLOSING path of conn_flag_do().

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
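The check suggested above (only touch the epoll set on close if the flow was actually added to it) can be sketched in isolation as follows. This is not passt code: struct flow_sketch, in_epoll() and flow_close_sketch() are hypothetical names, and an epoll descriptor of -1 stands in for EPOLLFD_ID_INVALID.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in for a flow: epfd is -1 until the socket is
 * registered, playing the role of EPOLLFD_ID_INVALID (assumption).
 */
struct flow_sketch {
	int epfd;	/* epoll descriptor the socket was added to, or -1 */
	int s;		/* socket (or any pollable descriptor) */
};

/* Hypothetical equivalent of a "was this flow ever registered?" check */
static bool in_epoll(const struct flow_sketch *f)
{
	return f->epfd >= 0;
}

/* Close path: deregister only if the descriptor was registered, so an
 * early connect() failure never reaches the epoll lookup at all.
 * Returns 0 on success, negative errno if deregistration failed.
 */
static int flow_close_sketch(struct flow_sketch *f)
{
	int ret = 0;

	if (in_epoll(f)) {
		if (epoll_ctl(f->epfd, EPOLL_CTL_DEL, f->s, NULL))
			ret = -errno;
		f->epfd = -1;
	}

	close(f->s);
	f->s = -1;

	return ret;
}
```

With this shape, a flow whose connect() failed before registration is simply closed, and neither an assertion nor a fallback to a default epoll descriptor is needed on that path.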
On Tue, 9 Dec 2025 11:09:33 +1100
David Gibson wrote:
On Tue, Dec 09, 2025 at 12:53:33AM +0100, Stefano Brivio wrote:
diff --git a/flow.c b/flow.c
index 8d72965..4f53486 100644
--- a/flow.c
+++ b/flow.c
@@ -359,7 +359,12 @@ bool flow_in_epoll(const struct flow_common *f)
  */
 int flow_epollfd(const struct flow_common *f)
 {
-	ASSERT(f->epollid < EPOLLFD_ID_MAX);
+	if (f->epollid >= EPOLLFD_ID_MAX) {
+		flow_log_(f, true, LOG_WARNING,
+			  "Invalid epollid %i for flow, assuming default",
+			  f->epollid);
+		return epoll_id_to_fd[EPOLLFD_ID_DEFAULT];
+	}
This LGTM for safety's sake, although it's conceptually ugly.
Well it's much uglier to have containers crashing randomly...
 	return epoll_id_to_fd[f->epollid];
 }
diff --git a/tcp_splice.c b/tcp_splice.c
index 717766a..4405224 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -381,14 +381,14 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
 	pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport);
 
+	conn_event(c, conn, SPLICE_CONNECT);
+
 	if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) {
 		if (errno != EINPROGRESS) {
 			flow_trace(conn, "Couldn't connect socket for splice: %s",
 				   strerror_(errno));
 			return -errno;
 		}
-
-		conn_event(c, conn, SPLICE_CONNECT);
I don't really understand the rationale for this.
If we call connect(), I think we should be ready to handle events on the socket/flow at that point. Now, it's all synchronous so we won't actually get events before we call conn_event(), but it makes more sense than the alternative, that is, having a potentially connect()ed socket around not in any epoll set.
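As an illustration of the ordering argued for above (register the socket for events first, then call connect()), here is a minimal standalone sketch. It is not passt code: connect_registered() is a hypothetical helper and a plain epoll_ctl() call stands in for conn_event(). Unlike the patch, the sketch also removes the socket from the set again when connect() fails for real; the thread indicates passt handles that in its CLOSING path instead.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: add socket @s to epoll set @epfd before calling
 * connect(), so the socket is never connecting without being watched.
 * Returns 0 if connected or connection in progress, negative errno on
 * failure (in which case @s is no longer in the set).
 */
static int connect_registered(int epfd, int s,
			      const struct sockaddr *sa, socklen_t sl)
{
	struct epoll_event ev = { .events = EPOLLOUT, .data.fd = s };

	/* Register first: from here on we can receive events for @s */
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, s, &ev))
		return -errno;

	/* EINPROGRESS is the normal outcome for a non-blocking socket */
	if (connect(s, sa, sl) && errno != EINPROGRESS) {
		int err = errno;	/* save before epoll_ctl() can change it */

		epoll_ctl(epfd, EPOLL_CTL_DEL, s, NULL);
		return -err;
	}

	return 0;
}
```

The point of the ordering is that every exit from the function leaves the socket in a well-defined state: either registered and connecting, or not registered at all.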
 	} else {
 		conn_event(c, conn, SPLICE_ESTABLISHED);
 		return tcp_splice_connect_finish(c, conn);
I think the true fix for this specific failure on the connect-error path is to check flow_in_epoll() before calling flow_epollfd() / epoll_del() in the CLOSING path of conn_flag_do().
I need to re-run tests anyway so I can merge another patch doing that but I'm trying to hurry now. A few minutes is fine though.

-- Stefano
On Tue, Dec 09, 2025 at 01:21:44AM +0100, Stefano Brivio wrote:
On Tue, 9 Dec 2025 11:09:33 +1100
David Gibson wrote:
On Tue, Dec 09, 2025 at 12:53:33AM +0100, Stefano Brivio wrote:
diff --git a/flow.c b/flow.c
index 8d72965..4f53486 100644
--- a/flow.c
+++ b/flow.c
@@ -359,7 +359,12 @@ bool flow_in_epoll(const struct flow_common *f)
  */
 int flow_epollfd(const struct flow_common *f)
 {
-	ASSERT(f->epollid < EPOLLFD_ID_MAX);
+	if (f->epollid >= EPOLLFD_ID_MAX) {
+		flow_log_(f, true, LOG_WARNING,
+			  "Invalid epollid %i for flow, assuming default",
+			  f->epollid);
+		return epoll_id_to_fd[EPOLLFD_ID_DEFAULT];
+	}
This LGTM for safety's sake, although it's conceptually ugly.
Well it's much uglier to have containers crashing randomly...
Certainly.
 	return epoll_id_to_fd[f->epollid];
 }
diff --git a/tcp_splice.c b/tcp_splice.c
index 717766a..4405224 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -381,14 +381,14 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
 	pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport);
 
+	conn_event(c, conn, SPLICE_CONNECT);
+
 	if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) {
 		if (errno != EINPROGRESS) {
 			flow_trace(conn, "Couldn't connect socket for splice: %s",
 				   strerror_(errno));
 			return -errno;
 		}
-
-		conn_event(c, conn, SPLICE_CONNECT);
I don't really understand the rationale for this.
If we call connect(), I think we should be ready to handle events on the socket/flow at that point.
Now, it's all synchronous so we won't actually get events before we call conn_event(), but it makes more sense than the alternative, that is, having a potentially connect()ed socket around not in any epoll set.
Ok, that makes sense.
Reviewed-by: David Gibson
 	} else {
 		conn_event(c, conn, SPLICE_ESTABLISHED);
 		return tcp_splice_connect_finish(c, conn);
I think the true fix for this specific failure on the connect-error path is to check flow_in_epoll() before calling flow_epollfd() / epoll_del() in the CLOSING path of conn_flag_do().
I need to re-run tests anyway so I can merge another patch doing that but I'm trying to hurry now. A few minutes is fine though.
-- Stefano
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson