[PATCH 1/6] tcp_splice: Improve error reporting
A number of things can, at least theoretically, go wrong when forwarding
data across a spliced connection. We generally handle this by resetting
the connection on both sides. However, in many cases we don't log any
message about why the connection was reset, which can make it hard to
debug why this is happening.
Add a bunch of debug and error logging to make this easier to figure out.
Signed-off-by: David Gibson
On Wed, 20 May 2026 23:08:46 +1000
David Gibson
A number of things can, at least theoretically, go wrong when forwarding data across a spliced connection. We generally handle this by resetting the connection on both sides. However, in many cases we don't log any message about why the connection was reset, which can make it hard to debug why this is happening.
Add a bunch of debug and error logging to make this easier to figure out.
Signed-off-by: David Gibson
--- tcp_splice.c | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 42ee8abc..1359d6b8 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -502,15 +502,18 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, if (rc) flow_perror(conn, "Error retrieving SO_ERROR"); else - flow_trace(conn, "Error event on socket: %s", - strerror_(err)); - + flow_dbg(conn, "Error event on %s socket: %s", + pif_name(conn->f.pif[evsidei]), + strerror_(err)); goto reset; }
if (conn->events == SPLICE_CONNECT) { - if (!(events & EPOLLOUT)) + if (!(events & EPOLLOUT)) { + flow_err(conn, "Unexpected events 0x%x during connect", + events);
Shouldn't all the flow_err() and flow_perror() calls here be ratelimited, that is, eventually calling the err_ratelimit() function Anshu introduced recently? We don't have helpers ready for flow_err() and flow_perror(), I was about to post a patch that would go before this series but I'm not sure if there's a specific reason to avoid those.
goto reset; + } if (tcp_splice_connect_finish(c, conn)) goto reset; } @@ -545,8 +548,11 @@ retry: SPLICE_F_MOVE | SPLICE_F_NONBLOCK); while (readlen < 0 && errno == EINTR);
- if (readlen < 0 && errno != EAGAIN) + if (readlen < 0 && errno != EAGAIN) { + flow_perror(conn, "Splicing from %s socket", + pif_name(conn->f.pif[fromsidei])); goto reset; + }
flow_trace(conn, "%zi from read-side call", readlen);
@@ -569,8 +575,11 @@ retry: SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK); while (written < 0 && errno == EINTR);
- if (written < 0 && errno != EAGAIN) + if (written < 0 && errno != EAGAIN) { + flow_perror(conn, "Splicing to %s socket", + pif_name(conn->f.pif[!fromsidei])); goto reset; + }
flow_trace(conn, "%zi from write-side call (passed %zi)", written, c->tcp.pipe_size); @@ -627,8 +636,11 @@ retry: flow_foreach_sidei(sidei) { if ((conn->events & FIN_RCVD(sidei)) && !(conn->events & FIN_SENT(!sidei))) { - if (shutdown(conn->s[!sidei], SHUT_WR) < 0) + if (shutdown(conn->s[!sidei], SHUT_WR) < 0) { + flow_perror(conn, "shutdown() on %s", + pif_name(conn->f.pif[!sidei])); goto reset; + } conn_event(conn, FIN_SENT(!sidei)); } } @@ -647,8 +659,11 @@ retry: goto swap; }
- if (events & EPOLLHUP) + if (events & EPOLLHUP) { + flow_dbg(conn, "Hangup from %s socket", + pif_name(conn->f.pif[evsidei])); goto reset; + }
return;
-- Stefano
On Wed, May 20, 2026 at 04:31:34PM +0200, Stefano Brivio wrote:
On Wed, 20 May 2026 23:08:46 +1000 David Gibson
wrote: A number of things can, at least theoretically, go wrong when forwarding data across a spliced connection. We generally handle this by resetting the connection on both sides. However, in many cases we don't log any message about why the connection was reset, which can make it hard to debug why this is happening.
Add a bunch of debug and error logging to make this easier to figure out.
Signed-off-by: David Gibson
--- tcp_splice.c | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 42ee8abc..1359d6b8 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -502,15 +502,18 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, if (rc) flow_perror(conn, "Error retrieving SO_ERROR"); else - flow_trace(conn, "Error event on socket: %s", - strerror_(err)); - + flow_dbg(conn, "Error event on %s socket: %s", + pif_name(conn->f.pif[evsidei]), + strerror_(err)); goto reset; }
if (conn->events == SPLICE_CONNECT) { - if (!(events & EPOLLOUT)) + if (!(events & EPOLLOUT)) { + flow_err(conn, "Unexpected events 0x%x during connect", + events);
Shouldn't all the flow_err() and flow_perror() calls here be ratelimited, that is, eventually calling the err_ratelimit() function Anshu introduced recently?
I did think about that, I concluded it wasn't necessary here because it indicates that something has gone unexpectedly wrong at the kernel level, it's not guest triggerable. I can put in ratelimits if you still think they're necessary.
We don't have helpers ready for flow_err() and flow_perror(), I was about to post a patch that would go before this series but I'm not sure if there's a specific reason to avoid those.
goto reset; + } if (tcp_splice_connect_finish(c, conn)) goto reset; } @@ -545,8 +548,11 @@ retry: SPLICE_F_MOVE | SPLICE_F_NONBLOCK); while (readlen < 0 && errno == EINTR);
- if (readlen < 0 && errno != EAGAIN) + if (readlen < 0 && errno != EAGAIN) { + flow_perror(conn, "Splicing from %s socket", + pif_name(conn->f.pif[fromsidei])); goto reset; + }
flow_trace(conn, "%zi from read-side call", readlen);
@@ -569,8 +575,11 @@ retry: SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK); while (written < 0 && errno == EINTR);
- if (written < 0 && errno != EAGAIN) + if (written < 0 && errno != EAGAIN) { + flow_perror(conn, "Splicing to %s socket", + pif_name(conn->f.pif[!fromsidei])); goto reset; + }
flow_trace(conn, "%zi from write-side call (passed %zi)", written, c->tcp.pipe_size); @@ -627,8 +636,11 @@ retry: flow_foreach_sidei(sidei) { if ((conn->events & FIN_RCVD(sidei)) && !(conn->events & FIN_SENT(!sidei))) { - if (shutdown(conn->s[!sidei], SHUT_WR) < 0) + if (shutdown(conn->s[!sidei], SHUT_WR) < 0) { + flow_perror(conn, "shutdown() on %s", + pif_name(conn->f.pif[!sidei])); goto reset; + } conn_event(conn, FIN_SENT(!sidei)); } } @@ -647,8 +659,11 @@ retry: goto swap; }
- if (events & EPOLLHUP) + if (events & EPOLLHUP) { + flow_dbg(conn, "Hangup from %s socket", + pif_name(conn->f.pif[evsidei]));
Except for this one, which is debug level for that reason.
goto reset; + }
return;
-- Stefano
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Thu, 21 May 2026 10:43:37 +1000
David Gibson
On Wed, May 20, 2026 at 04:31:34PM +0200, Stefano Brivio wrote:
On Wed, 20 May 2026 23:08:46 +1000 David Gibson
wrote: A number of things can, at least theoretically, go wrong when forwarding data across a spliced connection. We generally handle this by resetting the connection on both sides. However, in many cases we don't log any message about why the connection was reset, which can make it hard to debug why this is happening.
Add a bunch of debug and error logging to make this easier to figure out.
Signed-off-by: David Gibson
--- tcp_splice.c | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 42ee8abc..1359d6b8 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -502,15 +502,18 @@ void tcp_splice_sock_handler(struct ctx *c, union epoll_ref ref, if (rc) flow_perror(conn, "Error retrieving SO_ERROR"); else - flow_trace(conn, "Error event on socket: %s", - strerror_(err)); - + flow_dbg(conn, "Error event on %s socket: %s", + pif_name(conn->f.pif[evsidei]), + strerror_(err)); goto reset; }
if (conn->events == SPLICE_CONNECT) { - if (!(events & EPOLLOUT)) + if (!(events & EPOLLOUT)) { + flow_err(conn, "Unexpected events 0x%x during connect", + events);
Shouldn't all the flow_err() and flow_perror() calls here be ratelimited, that is, eventually calling the err_ratelimit() function Anshu introduced recently?
I did think about that, I concluded it wasn't necessary here because it indicates that something has gone unexpectedly wrong at the kernel level, it's not guest triggerable.
I can put in ratelimits if you still think they're necessary.
I do, yes, because we're dangerously close to something the container can indirectly try to trigger, I think.
We don't have helpers ready for flow_err() and flow_perror(), I was about to post a patch that would go before this series but I'm not sure if there's a specific reason to avoid those.
goto reset; + } if (tcp_splice_connect_finish(c, conn)) goto reset; } @@ -545,8 +548,11 @@ retry: SPLICE_F_MOVE | SPLICE_F_NONBLOCK); while (readlen < 0 && errno == EINTR);
- if (readlen < 0 && errno != EAGAIN) + if (readlen < 0 && errno != EAGAIN) { + flow_perror(conn, "Splicing from %s socket", + pif_name(conn->f.pif[fromsidei])); goto reset; + }
flow_trace(conn, "%zi from read-side call", readlen);
@@ -569,8 +575,11 @@ retry: SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK); while (written < 0 && errno == EINTR);
- if (written < 0 && errno != EAGAIN) + if (written < 0 && errno != EAGAIN) { + flow_perror(conn, "Splicing to %s socket", + pif_name(conn->f.pif[!fromsidei])); goto reset; + }
flow_trace(conn, "%zi from write-side call (passed %zi)", written, c->tcp.pipe_size); @@ -627,8 +636,11 @@ retry: flow_foreach_sidei(sidei) { if ((conn->events & FIN_RCVD(sidei)) && !(conn->events & FIN_SENT(!sidei))) { - if (shutdown(conn->s[!sidei], SHUT_WR) < 0) + if (shutdown(conn->s[!sidei], SHUT_WR) < 0) { + flow_perror(conn, "shutdown() on %s", + pif_name(conn->f.pif[!sidei])); goto reset; + } conn_event(conn, FIN_SENT(!sidei)); } } @@ -647,8 +659,11 @@ retry: goto swap; }
- if (events & EPOLLHUP) + if (events & EPOLLHUP) { + flow_dbg(conn, "Hangup from %s socket", + pif_name(conn->f.pif[evsidei]));
Except for this one, which is debug level for that reason.
goto reset; + }
return;
-- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio