[PATCH v2 0/4] Fix race condition while closing spliced connections
Fix bug 202, where a race condition could cause connections to be incorrectly reset in certain circumstances. Patch 2/4 is the bug fix proper. 1/4 improves error reporting and debugging messages in the vicinity. Patches 3..4/4 are some small cleanups I noticed in the area while working on the fix. Link: https://bugs.passt.top/show_bug.cgi?id=202 v2: * Formatting and comment fixes, per Stefano's review * Dropped patches 5 & 6 for now. I still think they're worthwhile, but are closely related to other oddities that need some work. I didn't want to delay the bugfix itself. David Gibson (4): tcp_splice: Improve error reporting tcp_splice: Avoid missing EOF recognition while forwarding tcp_splice: Clean up flow control path for splice forwarding tcp_splice: Simplify tracking of read/written bytes flow.h | 7 ++ log.h | 19 +++--- passt.c | 2 +- tcp_conn.h | 6 +- tcp_splice.c | 185 ++++++++++++++++++++++++++++++--------------------- tcp_splice.h | 2 +- 6 files changed, 129 insertions(+), 92 deletions(-) -- 2.54.0
A number of things can, at least theoretically, go wrong when forwarding
data across a spliced connection. We generally handle this by resetting
the connection on both sides. However, in many cases we don't log any
message about why the connection was reset, which can make it hard to
debug why this is happening.
Add a bunch of debug and error logging to make this easier to figure out.
We ratelimit for safety, which requires some tweaks to make the ratelimit
logic work with the flow specific log functions.
Signed-off-by: David Gibson
Splice forwarding can be blocked either waiting for data from one side
or waiting for space on the other. For that reason,
tcp_splice_sock_handler() on either socket can forward data in either or
both directions, depending on whether we have EPOLLIN, EPOLLOUT or both
events.
The flow control for this is quite hard to follow though, since we forward
in one direction, then sometimes loop back with a goto to do it in the
other direction. Simplify this by adding a tcp_splice_forward() function
with the logic to forward in one direction and calling it either once or
twice from tcp_splice_sock_handler().
Signed-off-by: David Gibson
tcp_splice_sock_handler() has an optimised path for the common case where
the amount we splice(2) into the pipe is exactly the same as the amount we
splice(2) out again. If the pipe is empty at that point, we stop
forwarding until we get another epoll event.
However, via a subtle chain of events, this can cause a bug for a
half-closed connection. Suppose the connection is already half-closed in
the other direction - that is, we've already called shutdown(SHUT_WR) on
the socket for which we're getting the event. In this event we're getting
the last batch of data in the other direction, and also a FIN. This can
result in EPOLLIN, EPOLLRDHUP and EPOLLHUP events simultaneously.
We read the last data from the socket and successfully splice it to the
other side. Since there is no data in the pipe, we exit the forwarding
loop. However, because we did read data, we don't set the eof flag.
Because we don't set eof, we don't (yet) propagate the FIN to the other
side, or set FIN_SENT_(!fromsidei). Therefore we don't (yet) recognize
this as a clean termination and set the CLOSING flag. We would correct
this when we get our next event, however before we can do so we process
the EPOLLHUP event. Because we haven't recognized this as a clean close
we assume it is an abrupt close and send an RST to the other side.
To avoid this, don't stop attempting to forward data on this path.
Continue for at least one more loop. If we're at EOF, we'll recognize it
on the next splice(2). If not it gives us an opportunity to forward more
data without returning to the mail epoll loop.
Reported-by: Paul Holzinger
For each each direction of each spliced connection, we keep track of how
many bytes we've read from one socket and written to the other. However,
we never actually care about the absolute values of these, only the
difference between them, which represents how much data is currently "in
flight" in the splicing pipe.
Simplify the handling by having a single variable tracking the number of
bytes in the pipe.
As a bonus, the new scheme makes it clearer that we don't need to worry
about overflows: pending can never become larger than the maximum pipe
bufffer size, well within 32-bits.
I _think_ the old scheme was safe in the case of overflow - again under
the assumption that read/written can never be further apart than the pipe
buffer size. However, it's much harder to reason about this case. It's
certainly plausible that an overflow could occur - sending 4GiB through
a local socket is entirely achievable.
Signed-off-by: David Gibson
participants (1)
-
David Gibson