On Tue, Dec 09, 2025 at 12:53:33AM +0100, Stefano Brivio wrote:
...otherwise, if we have a real error on connect() (that is, not EINPROGRESS), we'll return early from tcp_splice_connect() and later try to fetch the epoll file descriptor:
ASSERTION FAILED in flow_epollfd (flow.c:362): f->epollid < ((1 << 8) - 1)
which is still (correctly) EPOLLFD_ID_INVALID.
Replace the ASSERT() in flow_epollfd() with a warning, as it looks like there might be harmless cases where the socket is not in the epoll set yet, and we'll just crash for nothing. We can turn this back to an ASSERT() once we audit these paths in more detail.
Link: https://bodhi.fedoraproject.org/updates/FEDORA-2025-93b4eb64c3#comment-44734...
Signed-off-by: Stefano Brivio
---
I might merge this in a bit even without review as we might now have broken distribution packages around.

 flow.c       | 7 ++++++-
 tcp_splice.c | 4 ++--
 2 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/flow.c b/flow.c
index 8d72965..4f53486 100644
--- a/flow.c
+++ b/flow.c
@@ -359,7 +359,12 @@ bool flow_in_epoll(const struct flow_common *f)
  */
 int flow_epollfd(const struct flow_common *f)
 {
-	ASSERT(f->epollid < EPOLLFD_ID_MAX);
+	if (f->epollid >= EPOLLFD_ID_MAX) {
+		flow_log_(f, true, LOG_WARNING,
+			  "Invalid epollid %i for flow, assuming default",
+			  f->epollid);
+		return epoll_id_to_fd[EPOLLFD_ID_DEFAULT];
+	}
This LGTM for safety's sake, although it's conceptually ugly.
 	return epoll_id_to_fd[f->epollid];
 }
diff --git a/tcp_splice.c b/tcp_splice.c
index 717766a..4405224 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -381,14 +381,14 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn)
pif_sockaddr(c, &sa, tgtpif, &tgt->eaddr, tgt->eport);
+	conn_event(c, conn, SPLICE_CONNECT);
+
 	if (connect(conn->s[1], &sa.sa, socklen_inany(&sa))) {
 		if (errno != EINPROGRESS) {
 			flow_trace(conn, "Couldn't connect socket for splice: %s",
 				   strerror_(errno));
 			return -errno;
 		}
-
-		conn_event(c, conn, SPLICE_CONNECT);
I don't really understand the rationale for this.
 	} else {
 		conn_event(c, conn, SPLICE_ESTABLISHED);
 		return tcp_splice_connect_finish(c, conn);
I think the true fix for this specific failure on the connect-error path is to check flow_in_epoll() before calling flow_epollfd() / epoll_del() in the CLOSING path of conn_flag_do().

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
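
To make the flow_in_epoll() suggestion concrete, here is a rough, standalone sketch of the guard. It is not the actual conn_flag_do() code: struct flow_common is cut down to one field, EPOLLFD_ID_INVALID is assumed to be the same value as EPOLLFD_ID_MAX, the epoll_del() call is only indicated by a comment, and closing_epollfd() is a hypothetical helper name invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the real definitions (assumptions for this sketch) */
#define EPOLLFD_ID_MAX		((1 << 8) - 1)
#define EPOLLFD_ID_INVALID	EPOLLFD_ID_MAX	/* assumed "not registered" sentinel */

struct flow_common {
	unsigned epollid;
};

static int epoll_id_to_fd[EPOLLFD_ID_MAX];

/* True if the flow has been assigned to an epoll instance */
static bool flow_in_epoll(const struct flow_common *f)
{
	return f->epollid != EPOLLFD_ID_INVALID;
}

/* Strict lookup, as before the patch: callers must check flow_in_epoll() */
static int flow_epollfd(const struct flow_common *f)
{
	assert(f->epollid < EPOLLFD_ID_MAX);
	return epoll_id_to_fd[f->epollid];
}

/* Hypothetical helper for the CLOSING path: returns the epoll fd the flow
 * was removed from, or -1 if it was never registered (for example because
 * connect() failed with something other than EINPROGRESS).
 */
static int closing_epollfd(struct flow_common *f)
{
	int epfd;

	if (!flow_in_epoll(f))
		return -1;	/* nothing to undo, and no assertion to trip */

	epfd = flow_epollfd(f);
	/* the real code would call epoll_del() on the sockets here */
	f->epollid = EPOLLFD_ID_INVALID;
	return epfd;
}
```

With a guard like this in the caller, flow_epollfd() could keep its ASSERT(), since the connect-error path would never reach it with an invalid epollid.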