On Sun, Feb 18, 2024 at 10:01:24PM +0100, Stefano Brivio wrote:On Tue, 6 Feb 2024 12:17:28 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Yeah, it took me a while to discover that too :)This makes a number of changes to improve error reporting while connecting a new spliced socket: * We use flow_err() and similar functions so all messages include info on which specific flow was affected * We use strerror() to interpret raw error values * We now report errors on connection (at "trace" level, since this would allow spamming the logs) * We also look up and report some details on EPOLLERR events, which can include connection errors, since we use a non-blocking connect(). Again we use "trace" level since this can spam the logs. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp_splice.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 5ba9c8ea..49075e5c 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -349,8 +349,9 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn, ASSERT(0); if (conn->s[1] < 0) { - warn("Couldn't open connectable socket for splice (%d)", - conn->s[1]); + flow_err(conn, + "Couldn't open connectable socket for splice: %s", + strerror(-conn->s[1]));It took me a bit to convince myself that we actually store negative error codes in pools, which sounds like a neat idea, except for two things:- in tcp_sock_refill_pool(), once we get an error from tcp_conn_new_sock(), should we really continue to call it and try to get more sockets? Presumably, that will have something to do with some kind of resource exhaustion, so perhaps we shouldn't risk making it worse and just stop refilling the pool. If we do, we might have up to one negative error code in the pool, I guess.Some sort of exhaustion is certainly the most likely cause, yes.- if we have an error code in the pool, that error might have occurred a while ago, but here, we're logging it at the moment we need that socket.Fair point.Maybe it would be simpler and saner to just have -1 values in the pool (error or no socket available) and report errors in tcp_conn_new_sock() instead?Yeah, that makes sense. I'll rework to that goal.I'm still reviewing patches 17/22 to 22/22, sorry for the delay.-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson