Re: [PATCH 5/6] tcp_splice: Simplify EPOLLRDHUP / eof / FIN handling

21 May 2026

On Thu, May 21, 2026 at 07:40:31AM +0200, Stefano Brivio wrote:
...
On Thu, 21 May 2026 12:03:33 +1000
David Gibson  wrote:
...
On Wed, May 20, 2026 at 10:30:04PM +0200, Stefano Brivio wrote:
...
On Wed, 20 May 2026 23:08:50 +1000
David Gibson  wrote:
...
There are two ways we can tell one of our sockets has received a FIN.  We
can either see an EPOLLRDHUP epoll event, or we can get a zero-length read
(EOF) on the socket.  We currently use both, in a mildly confusing way:
we only set the FIN_RCVD() flag based on the EPOLLRDHUP event, but then
some other close-out logic is based on seeing an EOF.
Simplify this by setting the flag based only on the EOF.  To make sure we
don't miss an event if we get an EPOLLRDHUP with no data, we trigger the
forwarding path for EPOLLRDHUP as well as EPOLLIN.
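Both FIN indications can be seen side by side in a small sketch. It assumes Linux, and uses an AF_UNIX socketpair() with shutdown(SHUT_WR) as a stand-in for a TCP peer sending a FIN (AF_UNIX stream sockets also report EPOLLRDHUP when the peer shuts down writing). half_close_signals() and the SAW_* flags are hypothetical names, not passt code. Waking the read path on EPOLLRDHUP as well as EPOLLIN, as the patch does, means the zero-length read is where the FIN gets recorded, even when no data arrives with it.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define SAW_RDHUP 1	/* hypothetical: epoll reported EPOLLRDHUP */
#define SAW_EOF   2	/* hypothetical: the read path got a zero-length read */

/* The peer half-closes without sending any data.  Dispatch the read path
 * on EPOLLIN *or* EPOLLRDHUP and record which FIN indications we saw. */
static int half_close_signals(void)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP }, out = { 0 };
	int sv[2], efd, result = 0;
	char buf[64];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	efd = epoll_create1(0);
	ev.data.fd = sv[0];
	if (efd < 0 || epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev) < 0) {
		result = -1;
		goto done;
	}

	shutdown(sv[1], SHUT_WR);	/* stands in for the peer's FIN */

	if (epoll_wait(efd, &out, 1, 1000) == 1) {
		if (out.events & EPOLLRDHUP)
			result |= SAW_RDHUP;
		/* Forwarding path: the only place the EOF is detected */
		if ((out.events & (EPOLLIN | EPOLLRDHUP)) &&
		    read(sv[0], buf, sizeof(buf)) == 0)
			result |= SAW_EOF;
	}
done:
	if (efd >= 0)
		close(efd);
	close(sv[0]);
	close(sv[1]);
	return result;
}
```

With no data queued, both indications arrive together, so a single pass through the read path is enough to see the EOF.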
Signed-off-by: David Gibson 
---
 tcp_splice.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/tcp_splice.c b/tcp_splice.c
index 8fbd490f..b45f0060 100644
--- a/tcp_splice.c
+++ b/tcp_splice.c
@@ -487,7 +487,6 @@ static int tcp_splice_forward(struct ctx *c, struct
 	uint8_t lowat_set_flag = RCVLOWAT_SET(fromsidei);
 	uint8_t lowat_act_flag = RCVLOWAT_ACT(fromsidei);
 	int never_read = 1;
-	int eof = 0;
 
 	while (1) {
 		ssize_t readlen, written;
@@ -510,7 +509,7 @@ retry:
 		flow_trace(conn, "%zi from read-side call", readlen);
 
 		if (!readlen) {
-			eof = 1;
+			conn_event(conn, FIN_RCVD(fromsidei));
I'm not sure if I really found a concrete issue with this, but it looks
a bit scary, because it changes the semantics of FIN_RCVD, which used to
mean that we infer we received a FIN, regardless of whether we're done
processing all data from that half of the connection.
Now FIN_RCVD is only set if we actually processed all the data and we
hit the end of file.
True.  But the only place that tested FIN_RCVD was at the end of
tcp_splice_forward(), conditional on 'eof' anyway.  In a sense, this
was the cause of bug202: we had FIN_RCVD set, but we didn't act on it
and call shutdown() on the other side, because we didn't have eof.
That sounds like a good motivation to clean this up, just two concerns
below:
...
...
The (potential) issue I see here is that we get EPOLLRDHUP, but splice()
returns -1 with EAGAIN in errno because we had no room in the pipe,
where it would otherwise have returned 0.
Will we ever get our zero-sized "read" later? If not, we might have
missed EPOLLRDHUP *and* the end of file. I'm not entirely sure we have
guarantees in that sense from splice().
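The scenario can be reproduced with a small probe, assuming a recent Linux kernel and using an AF_UNIX socketpair() in place of a spliced TCP socket; probe_eof_vs_full_pipe() and struct splice_probe are hypothetical names. With the peer already half-closed and the pipe full, a non-blocking read-side splice() should fail with EAGAIN rather than return 0 (as far as I can tell, the kernel checks for pipe space before it reads from the socket at all). Once the pipe has been emptied, the same call reports the EOF.

```c
#define _GNU_SOURCE		/* splice(), pipe2() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct splice_probe {
	ssize_t full_ret;	/* read-side splice() result, pipe full */
	int full_errno;		/* errno from that call */
	ssize_t empty_ret;	/* same call again, pipe drained */
};

/* Hypothetical probe: the peer has already half-closed, so the socket is
 * at EOF, but the pipe we splice into starts out full. */
static int probe_eof_vs_full_pipe(struct splice_probe *p)
{
	char buf[4096];
	int sv[2], pfd[2];

	memset(buf, 0, sizeof(buf));
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe2(pfd, O_NONBLOCK) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	shutdown(sv[1], SHUT_WR);	/* EOF now pending on sv[0] */

	while (write(pfd[1], buf, sizeof(buf)) > 0)
		;			/* fill the pipe until EAGAIN */

	errno = 0;
	p->full_ret = splice(sv[0], NULL, pfd[1], NULL, sizeof(buf),
			     SPLICE_F_NONBLOCK);
	p->full_errno = errno;

	while (read(pfd[0], buf, sizeof(buf)) > 0)
		;			/* empty the pipe completely */

	p->empty_ret = splice(sv[0], NULL, pfd[1], NULL, sizeof(buf),
			      SPLICE_F_NONBLOCK);

	close(sv[0]);
	close(sv[1]);
	close(pfd[0]);
	close(pfd[1]);
	return 0;
}
```

The second call is the case break (1) relies on: with nothing in the pipe, a full pipe can't be what hides the EOF.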
It's not really about guarantees from splice.  I'm pretty sure this is
ok, reasoning as follows.
Consider all the exit points from the loop body:
 - Each return is a return -1, so we kill the connection anyway.  They
   don't matter.
 - Each continue, goto retry and the end of the body will do the read
   side splice() again, so we get another chance to see the EOF.
 - That leaves just the breaks.
Consider each break (there are three, since patch 2 of this series):
	if (written < 0) {
		if (!conn->pending[fromsidei])
			break;
(1) The pipe is empty and the write-splice returned EAGAIN, so it
didn't remove data from the pipe.
You're assuming that !conn->pending[fromsidei] means that the pipe is
empty. From what we see of it, it is.
It does mean the pipe is empty.  Everything we put in, we've taken
out.  There cannot be anything in there.
...
What the kernel can do with it, though, is different. It might return
EAGAIN even if we think we should have space, because it's resizing the
pipe under memory pressure or something like that. Or it might delay
freeing up space, or updating its accounting, for whatever reason.
Theoretically, I suppose.  But !pending doesn't just mean the pipe is
not full, it means it's completely empty.  Not being able to put any
bytes at all into an empty pipe would be *very* surprising.  So much so
that if it happened in practice, I suspect we wouldn't be safe without
epoll events on the pipe ends to notify us when it deigns to accept
some data.
...
So it would be nice to make this part robust to that. I thought setting
FIN_RCVD on EPOLLRDHUP was a good way to achieve that.
...
Therefore, the pipe must have been
empty before the write-splice.  Which means the read-splice can't have
blocked on a full pipe.
		conn_event(conn, OUT_WAIT(!fromsidei));
		break;
	}
(2) The pipe is non-empty and the write-splice returned EAGAIN, so it
must have blocked on the output socket.  We've set OUT_WAIT(), so
we'll get an EPOLLOUT at some point which will cause us to read-splice
again, meaning we get another chance to see the EOF.
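Argument (2) relies on the output socket eventually becoming writable again. A sketch of that mechanism, assuming Linux and an AF_UNIX socketpair() in place of the spliced TCP socket: fill the sending side until write() returns EAGAIN, register for EPOLLOUT (the event that, per the reasoning above, setting OUT_WAIT() gets us), and check that it fires once the receiver drains. out_wait_wakes_up() is a hypothetical name.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical sketch of the OUT_WAIT() idea: when the output socket
 * refuses data with EAGAIN, ask for EPOLLOUT and retry once it fires.
 * Reports the number of ready events before the peer reads, and whether
 * EPOLLOUT was reported after it did. */
static int out_wait_wakes_up(int *before, int *after)
{
	struct epoll_event ev = { .events = EPOLLOUT }, out;
	char buf[4096] = { 0 };
	int sv[2], efd;

	if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0, sv) < 0)
		return -1;

	while (write(sv[0], buf, sizeof(buf)) > 0)
		;			/* output socket now returns EAGAIN */

	if ((efd = epoll_create1(0)) < 0)
		goto fail;
	ev.data.fd = sv[0];
	if (epoll_ctl(efd, EPOLL_CTL_ADD, sv[0], &ev) < 0) {
		close(efd);
		goto fail;
	}

	*before = epoll_wait(efd, &out, 1, 0);	/* expect nothing ready */

	while (read(sv[1], buf, sizeof(buf)) > 0)
		;			/* receiver drains, freeing send space */

	*after = epoll_wait(efd, &out, 1, 1000) == 1 &&
		 (out.events & EPOLLOUT);

	close(efd);
	close(sv[0]);
	close(sv[1]);
	return 0;
fail:
	close(sv[0]);
	close(sv[1]);
	return -1;
}
```

The EPOLLOUT wakeup is what leads back into the forwarding loop, where the read-side splice() runs again.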
...later. But what if we don't get a zero-sized read *at all*? I'm not
sure if splice() guarantees we do get one if we reach end-of-file.
...
That's something valid and very well established for read() and recv(),
but splice() is a bit weird. The documentation says:

  A return value of 0 means end of input.

but I wouldn't assume we'll *always* get at least one in case of EOF.
What else could we plausibly get?
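One way to probe this, assuming Linux and an AF_UNIX socketpair() standing in for the TCP socket: have the peer send a few bytes and then half-close, and splice into a pipe that is emptied between calls. We'd expect the data first, and only then zero-length results, repeated on every further call rather than reported once. splice_until_eof() is a hypothetical name, and this is observed behaviour for stream sockets, not something splice(2) spells out.

```c
#define _GNU_SOURCE		/* splice() */
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe: the peer sends 5 bytes, then half-closes.  Splice
 * from the socket into the pipe three times, keeping the results. */
static int splice_until_eof(ssize_t res[3])
{
	char drain[16];
	int sv[2], pfd[2], i, rc = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(pfd) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (write(sv[1], "hello", 5) != 5)
		rc = -1;
	shutdown(sv[1], SHUT_WR);

	for (i = 0; i < 3 && !rc; i++) {
		res[i] = splice(sv[0], NULL, pfd[1], NULL, sizeof(drain),
				SPLICE_F_NONBLOCK);
		/* Empty the pipe again, as a successful write side would */
		if (res[i] > 0 && read(pfd[0], drain, sizeof(drain)) != res[i])
			rc = -1;
	}

	close(sv[0]);
	close(sv[1]);
	close(pfd[0]);
	close(pfd[1]);
	return rc;
}
```

The EOF shows up only after the queued data has been spliced out, matching the new FIN_RCVD semantics, and it stays visible to any later retry.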
...
...
[...]
	if (conn->events & FIN_RCVD(fromsidei))
		break;
(3) By the new semantics of FIN_RCVD, we *have* seen the EOF.
...
The existing implementation distinguishes between end-of-file we hit in
a given iteration, and EPOLLRDHUP we might have seen at any time.
That was actually intended.
It might be intended, but I can't see that we did anything with that
information.
We always set FIN_RCVD on it. You're right that if we only checked it on
'eof', that didn't solve much, but that wasn't necessarily intended. My
original intention was to make setting FIN_RCVD (or whatever it was
originally called) robust.
Ok, well.  I've spotted other changes to make in the vicinity that I
think will make some of this easier to reason about anyway.  So I'll
consider your points as I rework this and other patches.
...
...
That said the conditions on which we exit / retry this loop are pretty
darn confusing.  I'll see if I can improve them.
-- 
Stefano
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson