[PATCH 00/11] tcp_splice: Better exploit symmetry between sides of connection
For spliced connections, both "sides" are sockets, and for many purposes how we deal with each side is symmetric. Currently, however, we track the information for each side in independent fields in the structure, meaning we can't easily exploit that symmetry. This makes a number of reorganizations of the tcp splice code so that we can explot that symmetry to reduce code size. This will have some additional advantages when we come to integrate with the in-progress unified flow table. Based on top of the interface identifiers and automatic forwarding cleanup series. David Gibson (11): tcp_splice: Remove redundant tcp_splice_epoll_ctl() tcp_splice: Correct error handling in tcp_splice_epoll_ctl() tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl() tcp_splice: Remove unnecessary forward declaration tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl() tcp_splice: Don't pool pipes in pairs tcp_splice: Rename sides of connection from a/b to 0/1 tcp_splice: Exploit side symmetry in tcp_splice_timer() tcp_splice: Exploit side symmetry in tcp_splice_connect_finish() tcp_splice: Exploit side symmetry in tcp_splice_destroy() tcp_splice: Simplify selection of socket and pipe sides in socket handler tcp_conn.h | 45 +++--- tcp_splice.c | 389 +++++++++++++++++++++------------------------------ 2 files changed, 179 insertions(+), 255 deletions(-) -- 2.41.0
tcp_splice_conn_update() calls tcp_splice_epoll_ctl() twice: first ignoring
the return value, then checking it. This serves no purpose. If the first
call succeeds, the second call will do exactly the same thing again, since
nothing has changed in conn. If the first call fails, then
tcp_splice_epoll_ctl() itself will EPOLL_CTL_DEL both fds, meaning when
the second call tries to EPOLL_CTL_MOD them it will necessarily fail.
It appears that this duplication was introduced by accident in an
otherwise unrelated patch.
Fixes: bb708111 ("treewide: Packet abstraction with mandatory boundary checks")
Signed-off-by: David Gibson
If we get an error from epoll_ctl() in tcp_splice_epoll_ctl() we goto the
'delete' path where we remove both sockets from the epoll set and return
an error. There are several problems with this:
- We 'return -errno' after the EPOLL_CTL_DEL operations, which means the
deleting epoll_ctl() calls may have overwritten the errno values which
actually triggered the failures.
- The call from conn_flag_do() occurs when the CLOSING flag is set, in
which case we go do the delete path regardless of error. In that case
the 'return errno' is meaningless since we don't expect the EPOLL_CTL_DEL
operations to fail and we ignore the return code anyway.
- All other calls to tcp_splice_epoll_ctl() check the return code and if
non-zero immediately call conn_flag(..., CLOSING) which will call
tcp_splice_epoll_ctl() again explicitly to remove the sockets from epoll.
That means removing them when the error first occurs is redundant.
- We never specifically report an error on the epoll_ctl() operations. We
just set the connection to CLOSING, more or less silently killing it.
This could make debugging difficult in the unlikely even that we get a
failure here.
Re-organise tcp_splice_epoll_ctl() to just log a message then return in the
error case, and only EPOLL_CTL_DEL when explicitly asked to with the
CLOSING flag.
Signed-off-by: David Gibson
tcp_splice_epoll_ctl() removes both sockets from the epoll set if called
when conn->flags & CLOSING. This will always happen immediately after
setting that flag, since conn_flag_do() makes the call itself. That's also
the _only_ time it can happen: we perform the EPOLL_CTL_DEL without
clearing the conn->in_epoll flag, meaning that any further calls to
tcp_splice_epoll_ctl() would attempt EPOLL_CTL_MOD, which would necessarily
fail since the fds are no longer in the epoll.
The EPOLL_CTL_DEL path in tcp_splice_epoll_ctl() has essentially zero
overlap with anything else the function does, so just move them to be
open coded in conn_flag_do().
Caveat: this does require kernel 2.6.9 or later, in order to pass NULL as
the event structure for epoll_ctl(). I _think_ we already require newer
kernels than that for other features. We could work around that if support
for such old kernels is important.
Given that, simply directly perform the EPOLL_CTL_DEL operations from
conn_flag_do() rather than unnecessarily multiplexini
Signed-off-by: David Gibson
On Thu, 12 Oct 2023 12:51:06 +1100
David Gibson
tcp_splice_epoll_ctl() removes both sockets from the epoll set if called when conn->flags & CLOSING. This will always happen immediately after setting that flag, since conn_flag_do() makes the call itself. That's also the _only_ time it can happen: we perform the EPOLL_CTL_DEL without clearing the conn->in_epoll flag, meaning that any further calls to tcp_splice_epoll_ctl() would attempt EPOLL_CTL_MOD, which would necessarily fail since the fds are no longer in the epoll.
The EPOLL_CTL_DEL path in tcp_splice_epoll_ctl() has essentially zero overlap with anything else the function does, so just move them to be open coded in conn_flag_do().
Caveat: this does require kernel 2.6.9 or later, in order to pass NULL as the event structure for epoll_ctl(). I _think_ we already require newer kernels than that for other features.
Yes, we do, we need at least 3.13 for unprivileged user namespaces. -- Stefano
On Fri, Nov 03, 2023 at 05:20:53PM +0100, Stefano Brivio wrote:
On Thu, 12 Oct 2023 12:51:06 +1100 David Gibson
wrote: tcp_splice_epoll_ctl() removes both sockets from the epoll set if called when conn->flags & CLOSING. This will always happen immediately after setting that flag, since conn_flag_do() makes the call itself. That's also the _only_ time it can happen: we perform the EPOLL_CTL_DEL without clearing the conn->in_epoll flag, meaning that any further calls to tcp_splice_epoll_ctl() would attempt EPOLL_CTL_MOD, which would necessarily fail since the fds are no longer in the epoll.
The EPOLL_CTL_DEL path in tcp_splice_epoll_ctl() has essentially zero overlap with anything else the function does, so just move them to be open coded in conn_flag_do().
Caveat: this does require kernel 2.6.9 or later, in order to pass NULL as the event structure for epoll_ctl(). I _think_ we already require newer kernels than that for other features.
Yes, we do, we need at least 3.13 for unprivileged user namespaces.
Thanks, message updated accordingly. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
In tcp_splice.c we forward declare tcp_splice_epoll_ctl() then define it
later on. However, there are no circular dependencies which prevent us
from simply having the full definition in place of the forward declaration.
Signed-off-by: David Gibson
We initialise the events_a and events_b variables with
tcp_splice_conn_epoll_events() function, then immediately copy the values
into ev_a.events and ev_b.events. We can't simply pass &ev_[ab].events to
tcp_splice_conn_epoll_events(), because struct epoll_event is packed,
leading to 'pointer may be unaligned' warnings if we attempt that.
We can, however, make tcp_splice_conn_epoll_events() take struct
epoll_event pointers rather than raw u32 pointers, avoiding the awkward
temporaries.
Signed-off-by: David Gibson
On Thu, 12 Oct 2023 12:51:08 +1100
David Gibson
We initialise the events_a and events_b variables with tcp_splice_conn_epoll_events() function, then immediately copy the values into ev_a.events and ev_b.events. We can't simply pass &ev_[ab].events to tcp_splice_conn_epoll_events(), because struct epoll_event is packed, leading to 'pointer may be unaligned' warnings if we attempt that.
We can, however, make tcp_splice_conn_epoll_events() take struct epoll_event pointers rather than raw u32 pointers, avoiding the awkward temporaries.
Signed-off-by: David Gibson
--- tcp_splice.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 439fc1d..3419207 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -95,25 +95,26 @@ static int tcp_sock_refill_ns(void *arg); /** * tcp_splice_conn_epoll_events() - epoll events masks for given state * @events: Connection event flags - * @a: Event mask for socket with accepted connection, set on return - * @b: Event mask for connection target socket, set on return + * @a: Event for socket with accepted connection, set on return + * @b: Event for connection target socket, set on return */ static void tcp_splice_conn_epoll_events(uint16_t events, - uint32_t *a, uint32_t *b) + struct epoll_event *a, + struct epoll_event *b) { - *a = *b = 0; + a->events = b->events = 0;
if (events & SPLICE_ESTABLISHED) { if (!(events & B_FIN_SENT)) - *a = EPOLLIN | EPOLLRDHUP; + a->events = EPOLLIN | EPOLLRDHUP; if (!(events & A_FIN_SENT)) - *b = EPOLLIN | EPOLLRDHUP; + b->events = EPOLLIN | EPOLLRDHUP; } else if (events & SPLICE_CONNECT) { - *b = EPOLLOUT; + b->events = EPOLLOUT; }
- *a |= (events & A_OUT_WAIT) ? EPOLLOUT : 0; - *b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0; + a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0; + a->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
This should be b->events |= ..., but it's fixed in 7/11 anyway. -- Stefano
On Fri, Nov 03, 2023 at 05:21:34PM +0100, Stefano Brivio wrote:
On Thu, 12 Oct 2023 12:51:08 +1100 David Gibson
wrote: We initialise the events_a and events_b variables with tcp_splice_conn_epoll_events() function, then immediately copy the values into ev_a.events and ev_b.events. We can't simply pass &ev_[ab].events to tcp_splice_conn_epoll_events(), because struct epoll_event is packed, leading to 'pointer may be unaligned' warnings if we attempt that.
We can, however, make tcp_splice_conn_epoll_events() take struct epoll_event pointers rather than raw u32 pointers, avoiding the awkward temporaries.
Signed-off-by: David Gibson
--- tcp_splice.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 439fc1d..3419207 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -95,25 +95,26 @@ static int tcp_sock_refill_ns(void *arg); /** * tcp_splice_conn_epoll_events() - epoll events masks for given state * @events: Connection event flags - * @a: Event mask for socket with accepted connection, set on return - * @b: Event mask for connection target socket, set on return + * @a: Event for socket with accepted connection, set on return + * @b: Event for connection target socket, set on return */ static void tcp_splice_conn_epoll_events(uint16_t events, - uint32_t *a, uint32_t *b) + struct epoll_event *a, + struct epoll_event *b) { - *a = *b = 0; + a->events = b->events = 0;
if (events & SPLICE_ESTABLISHED) { if (!(events & B_FIN_SENT)) - *a = EPOLLIN | EPOLLRDHUP; + a->events = EPOLLIN | EPOLLRDHUP; if (!(events & A_FIN_SENT)) - *b = EPOLLIN | EPOLLRDHUP; + b->events = EPOLLIN | EPOLLRDHUP; } else if (events & SPLICE_CONNECT) { - *b = EPOLLOUT; + b->events = EPOLLOUT; }
- *a |= (events & A_OUT_WAIT) ? EPOLLOUT : 0; - *b |= (events & B_OUT_WAIT) ? EPOLLOUT : 0; + a->events |= (events & A_OUT_WAIT) ? EPOLLOUT : 0; + a->events |= (events & B_OUT_WAIT) ? EPOLLOUT : 0;
This should be b->events |= ..., but it's fixed in 7/11 anyway.
Oops. Fixed, nonetheless. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
To reduce latencies, the tcp splice code maintains a pool of pre-opened
pipes to use for new connections. This is structured as an array of pairs
of pipes, with each pipe, of course, being a pair of fds. Thus when we
use the pool, a single pool "slot" provides both the a->b and b->a pipes.
There's no strong reason to store the pool in pairs, though - we can
with not much difficulty instead take the a->b and b->a pipes for a new
connection independently from separate slots in the pool, or even take one
from the the pool and create the other as we need it, if there's only one
pipe left in the pool.
This marginally increases the length of code, but simplifies the structure
of the pipe pool. We should be able to re-shrink the code with later
changes, too.
In the process we also fix some minor bugs:
- If we both failed to find a pipe in the pool and to create a new one, we
didn't log an error and would silently drop the connection. That could
make debugging such a situation difficult. Add in an error message for
that case
- When refilling the pool, if we were only able to open a single pipe in
the pair, we attempted to rollback, but instead of closing the opened
pipe, we instead closed the pipe we failed to open (probably leading to
some ignored EBADFD errors).
Signed-off-by: David Gibson
Each spliced connection has two mostly, although not entirely, symmetric
sides. We currently call those "a" and "b" and have different fields in
the connection structure for each one.
We can better exploit that symmetry if we use two element arrays rather
thatn separately named fields. Do that in the places we can, and for the
others change the "a"/"b" terminology to 0/1 to match.
Signed-off-by: David Gibson
tcp_splice_timer() has two very similar blocks one after another that
handle the SO_RCVLOWAT flags for the two sides of the connection. We can
deduplicate this with a loop across the two sides.
Signed-off-by: David Gibson
On Thu, 12 Oct 2023 12:51:11 +1100
David Gibson
tcp_splice_timer() has two very similar blocks one after another that handle the SO_RCVLOWAT flags for the two sides of the connection. We can deduplicate this with a loop across the two sides.
Signed-off-by: David Gibson
--- tcp_splice.c | 27 +++++++++++---------------- 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 4a0580c..259d774 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -835,30 +835,25 @@ void tcp_splice_init(struct ctx *c) void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union) { struct tcp_splice_conn *conn = &conn_union->splice; + int side;
if (conn->flags & CLOSING) { tcp_splice_destroy(c, conn_union); return; }
- if ( (conn->flags & RCVLOWAT_SET_0) && - !(conn->flags & RCVLOWAT_ACT_0)) { - if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT, - &((int){ 1 }), sizeof(int))) { - trace("TCP (spliced): can't set SO_RCVLOWAT on " - "%i", conn->s[0]); - } - conn_flag(c, conn, ~RCVLOWAT_SET_0); - } + for (side = 0; side < SIDES; side++) { + uint8_t set = side == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1; + uint8_t act = side == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
- if ( (conn->flags & RCVLOWAT_SET_1) && - !(conn->flags & RCVLOWAT_ACT_1)) { - if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT, - &((int){ 1 }), sizeof(int))) { - trace("TCP (spliced): can't set SO_RCVLOWAT on " - "%i", conn->s[1]); + if ( (conn->flags & set) && !(conn->flags & act)) { ^
Nit: this extra whitespace is now useless. -- Stefano
On Fri, Nov 03, 2023 at 05:21:53PM +0100, Stefano Brivio wrote:
On Thu, 12 Oct 2023 12:51:11 +1100 David Gibson
wrote: tcp_splice_timer() has two very similar blocks one after another that handle the SO_RCVLOWAT flags for the two sides of the connection. We can deduplicate this with a loop across the two sides.
Signed-off-by: David Gibson
--- tcp_splice.c | 27 +++++++++++---------------- 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 4a0580c..259d774 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -835,30 +835,25 @@ void tcp_splice_init(struct ctx *c) void tcp_splice_timer(struct ctx *c, union tcp_conn *conn_union) { struct tcp_splice_conn *conn = &conn_union->splice; + int side;
if (conn->flags & CLOSING) { tcp_splice_destroy(c, conn_union); return; }
- if ( (conn->flags & RCVLOWAT_SET_0) && - !(conn->flags & RCVLOWAT_ACT_0)) { - if (setsockopt(conn->s[0], SOL_SOCKET, SO_RCVLOWAT, - &((int){ 1 }), sizeof(int))) { - trace("TCP (spliced): can't set SO_RCVLOWAT on " - "%i", conn->s[0]); - } - conn_flag(c, conn, ~RCVLOWAT_SET_0); - } + for (side = 0; side < SIDES; side++) { + uint8_t set = side == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1; + uint8_t act = side == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
- if ( (conn->flags & RCVLOWAT_SET_1) && - !(conn->flags & RCVLOWAT_ACT_1)) { - if (setsockopt(conn->s[1], SOL_SOCKET, SO_RCVLOWAT, - &((int){ 1 }), sizeof(int))) { - trace("TCP (spliced): can't set SO_RCVLOWAT on " - "%i", conn->s[1]); + if ( (conn->flags & set) && !(conn->flags & act)) { ^
Nit: this extra whitespace is now useless.
Corrected. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
tcp_splice_connect_finish() has two very similar blocks opening the two
pipes for each direction of the connection. We can deduplicate this with
a loop across the two sides.
Signed-off-by: David Gibson
tcp_splice_destroy() has some close-to-duplicated logic handling closing of
the socket and ipies for each side of the connection. We can use a loop
across the sides to reduce the duplication.
Signed-off-by: David Gibson
On Thu, 12 Oct 2023 12:51:13 +1100
David Gibson
tcp_splice_destroy() has some close-to-duplicated logic handling closing of the socket and ipies for each side of the connection. We can use a loop ^^^^^ pipes
across the sides to reduce the duplication.
Signed-off-by: David Gibson
--- tcp_splice.c | 32 ++++++++++++++------------------ 1 file changed, 14 insertions(+), 18 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 99ef8a4..239f6d2 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new) void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union) { struct tcp_splice_conn *conn = &conn_union->splice; + int side;
- if (conn->events & SPLICE_ESTABLISHED) { - /* Flushing might need to block: don't recycle them. */ - if (conn->pipe[0][0] != -1) { - close(conn->pipe[0][0]); - close(conn->pipe[0][1]); - conn->pipe[0][0] = conn->pipe[0][1] = -1; + for (side = 0; side < SIDES; side++) { + if (conn->events & SPLICE_ESTABLISHED) { + /* Flushing might need to block: don't recycle them. */ + if (conn->pipe[side][0] != -1) { + close(conn->pipe[side][0]); + close(conn->pipe[side][1]); + conn->pipe[side][0] = conn->pipe[side][1] = -1; + } } - if (conn->pipe[1][0] != -1) { - close(conn->pipe[1][0]); - close(conn->pipe[1][1]); - conn->pipe[1][0] = conn->pipe[1][1] = -1; + + if (side == 0 || conn->events & SPLICE_CONNECT) { + close(conn->s[side]); + conn->s[side] = -1; } - }
- if (conn->events & SPLICE_CONNECT) { - close(conn->s[1]); - conn->s[1] = -1; + conn->read[side] = conn->written[side] = 0; }
- close(conn->s[0]); - conn->s[0] = -1; - conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
With this, on SPLICE_CONNECT, we would close the [0] side, but not the [1] side. SPLICE_CONNECT means we already have an open socket for [1], though. I think it should be: [loop on sides] if (side == 1 || conn->events & SPLICE_CONNECT) { close(conn->s[side]); conn->s[1] = -1; } } and then we still need to unconditionally close conn->s[0]. Perhaps we could take both parts outside of the loop: if (conn->events & SPLICE_CONNECT) { close(conn->s[1]); conn->s[1] = -1; } close(conn->s[0]); conn->s[0] = -1; conn->read[side] = conn->written[side] = 0; The handling for the SPLICE_ESTABLISHED case looks correct to me. -- Stefano
On Fri, Nov 03, 2023 at 05:22:13PM +0100, Stefano Brivio wrote:
On Thu, 12 Oct 2023 12:51:13 +1100 David Gibson
wrote: tcp_splice_destroy() has some close-to-duplicated logic handling closing of the socket and ipies for each side of the connection. We can use a loop ^^^^^ pipes
Oops, fixed.
across the sides to reduce the duplication.
Signed-off-by: David Gibson
--- tcp_splice.c | 32 ++++++++++++++------------------ 1 file changed, 14 insertions(+), 18 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 99ef8a4..239f6d2 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new) void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union) { struct tcp_splice_conn *conn = &conn_union->splice; + int side;
- if (conn->events & SPLICE_ESTABLISHED) { - /* Flushing might need to block: don't recycle them. */ - if (conn->pipe[0][0] != -1) { - close(conn->pipe[0][0]); - close(conn->pipe[0][1]); - conn->pipe[0][0] = conn->pipe[0][1] = -1; + for (side = 0; side < SIDES; side++) { + if (conn->events & SPLICE_ESTABLISHED) { + /* Flushing might need to block: don't recycle them. */ + if (conn->pipe[side][0] != -1) { + close(conn->pipe[side][0]); + close(conn->pipe[side][1]); + conn->pipe[side][0] = conn->pipe[side][1] = -1; + } } - if (conn->pipe[1][0] != -1) { - close(conn->pipe[1][0]); - close(conn->pipe[1][1]); - conn->pipe[1][0] = conn->pipe[1][1] = -1; + + if (side == 0 || conn->events & SPLICE_CONNECT) { + close(conn->s[side]); + conn->s[side] = -1; } - }
- if (conn->events & SPLICE_CONNECT) { - close(conn->s[1]); - conn->s[1] = -1; + conn->read[side] = conn->written[side] = 0; }
- close(conn->s[0]); - conn->s[0] = -1; - conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
With this, on SPLICE_CONNECT, we would close the [0] side, but not the [1] side. SPLICE_CONNECT means we already have an open socket for [1], though. I think it should be:
[loop on sides]
if (side == 1 || conn->events & SPLICE_CONNECT) { close(conn->s[side]); conn->s[1] = -1; } }
and then we still need to unconditionally close conn->s[0]. Perhaps we could take both parts outside of the loop:
Uh.. I think you're misreading. In the updated code we have: if (side == 0 || conn->events & SPLICE_CONNECT) { close(conn->s[side]); conn->s[side] = -1; } That's an OR, so we always close side 0, and we close side 1 iff we have SPLICE_CONNECT, which matches what you're describing.
if (conn->events & SPLICE_CONNECT) { close(conn->s[1]); conn->s[1] = -1; }
close(conn->s[0]); conn->s[0] = -1; conn->read[side] = conn->written[side] = 0;
The handling for the SPLICE_ESTABLISHED case looks correct to me.
We are relying on the fact that setting SPLICE_ESTABLISHED doesn't clear SPLICE_CONNECT. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Mon, 6 Nov 2023 13:39:14 +1100
David Gibson
On Fri, Nov 03, 2023 at 05:22:13PM +0100, Stefano Brivio wrote:
On Thu, 12 Oct 2023 12:51:13 +1100 David Gibson
wrote: tcp_splice_destroy() has some close-to-duplicated logic handling closing of the socket and ipies for each side of the connection. We can use a loop ^^^^^ pipes
Oops, fixed.
across the sides to reduce the duplication.
Signed-off-by: David Gibson
--- tcp_splice.c | 32 ++++++++++++++------------------ 1 file changed, 14 insertions(+), 18 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 99ef8a4..239f6d2 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -258,30 +258,26 @@ void tcp_splice_conn_update(const struct ctx *c, struct tcp_splice_conn *new) void tcp_splice_destroy(struct ctx *c, union tcp_conn *conn_union) { struct tcp_splice_conn *conn = &conn_union->splice; + int side;
- if (conn->events & SPLICE_ESTABLISHED) { - /* Flushing might need to block: don't recycle them. */ - if (conn->pipe[0][0] != -1) { - close(conn->pipe[0][0]); - close(conn->pipe[0][1]); - conn->pipe[0][0] = conn->pipe[0][1] = -1; + for (side = 0; side < SIDES; side++) { + if (conn->events & SPLICE_ESTABLISHED) { + /* Flushing might need to block: don't recycle them. */ + if (conn->pipe[side][0] != -1) { + close(conn->pipe[side][0]); + close(conn->pipe[side][1]); + conn->pipe[side][0] = conn->pipe[side][1] = -1; + } } - if (conn->pipe[1][0] != -1) { - close(conn->pipe[1][0]); - close(conn->pipe[1][1]); - conn->pipe[1][0] = conn->pipe[1][1] = -1; + + if (side == 0 || conn->events & SPLICE_CONNECT) { + close(conn->s[side]); + conn->s[side] = -1; } - }
- if (conn->events & SPLICE_CONNECT) { - close(conn->s[1]); - conn->s[1] = -1; + conn->read[side] = conn->written[side] = 0; }
- close(conn->s[0]); - conn->s[0] = -1; - conn->read[0] = conn->written[0] = conn->read[1] = conn->written[1] = 0;
With this, on SPLICE_CONNECT, we would close the [0] side, but not the [1] side. SPLICE_CONNECT means we already have an open socket for [1], though. I think it should be:
[loop on sides]
if (side == 1 || conn->events & SPLICE_CONNECT) { close(conn->s[side]); conn->s[1] = -1; } }
and then we still need to unconditionally close conn->s[0]. Perhaps we could take both parts outside of the loop:
Uh.. I think you're misreading. In the updated code we have: if (side == 0 || conn->events & SPLICE_CONNECT) { close(conn->s[side]); conn->s[side] = -1; } That's an OR, so we always close side 0, and we close side 1 iff we have SPLICE_CONNECT, which matches what you're describing.
Gosh, yes, sorry, I read && for some reason. -- Stefano
tcp_splice_sock_handler() uses the tcp_splice_dir() helper to select
which of the socket, pipe and counter fields to use depending on which
side of the connection the socket event is coming from.
Now that we are using arrays for the two sides, rather than separate named
fields, we can instead just use a variable indicating the side and use
that to index the arrays whever we need a particular side's field.
Signed-off-by: David Gibson
On Thu, 12 Oct 2023 12:51:14 +1100
David Gibson
tcp_splice_sock_handler() uses the tcp_splice_dir() helper to select which of the socket, pipe and counter fields to use depending on which side of the connection the socket event is coming from.
Now that we are using arrays for the two sides, rather than separate named fields, we can instead just use a variable indicating the side and use that to index the arrays whever we need a particular side's field.
Signed-off-by: David Gibson
--- tcp_splice.c | 81 ++++++++++++++-------------------------------------- 1 file changed, 22 insertions(+), 59 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 239f6d2..822d15a 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -438,29 +438,6 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn, return tcp_splice_connect(c, conn, s, port); }
-/** - * tcp_splice_dir() - Set sockets/pipe pointers reflecting flow direction - * @conn: Connection pointers - * @ref_sock: Socket returned as reference from epoll - * @reverse: Reverse direction: @ref_sock is used as destination - * @from: Destination socket pointer to set - * @to: Source socket pointer to set - * @pipes: Pipe set, assigned on return - */ -static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock, - int reverse, int *from, int *to, int **pipes) -{ - if (!reverse) { - *from = ref_sock; - *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0]; - } else { - *to = ref_sock; - *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0]; - } - - *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1]; -} - /** * tcp_splice_conn_from_sock() - Attempt to init state for a spliced connection * @c: Execution context @@ -521,8 +498,7 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn, int s, uint32_t events) { uint8_t lowat_set_flag, lowat_act_flag; - int from, to, *pipes, eof, never_read; - uint32_t *seq_read, *seq_write; + int fromside, eof, never_read;
if (conn->events == SPLICE_CLOSED) return; @@ -538,14 +514,15 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn, }
if (events & EPOLLOUT) { - if (s == conn->s[0]) + if (s == conn->s[0]) { conn_event(c, conn, ~OUT_WAIT_0); - else + fromside = 1; + } else { conn_event(c, conn, ~OUT_WAIT_1); - - tcp_splice_dir(conn, s, 1, &from, &to, &pipes); + fromside = 0; + } } else { - tcp_splice_dir(conn, s, 0, &from, &to, &pipes); + fromside = s == conn->s[0] ? 0 : 1; }
if (events & EPOLLRDHUP) { @@ -566,24 +543,16 @@ swap: eof = 0; never_read = 1;
- if (from == conn->s[0]) { - seq_read = &conn->read[0]; - seq_write = &conn->written[0]; - lowat_set_flag = RCVLOWAT_SET_0; - lowat_act_flag = RCVLOWAT_ACT_0; - } else { - seq_read = &conn->read[1]; - seq_write = &conn->written[1]; - lowat_set_flag = RCVLOWAT_SET_1; - lowat_act_flag = RCVLOWAT_ACT_1; - } + lowat_set_flag = fromside == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1; + lowat_act_flag = fromside == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
while (1) { ssize_t readlen, to_write = 0, written; int more = 0;
retry: - readlen = splice(from, NULL, pipes[1], NULL, c->tcp.pipe_size, + readlen = splice(conn->s[fromside], NULL, + conn->pipe[fromside][1], NULL, c->tcp.pipe_size, SPLICE_F_MOVE | SPLICE_F_NONBLOCK); trace("TCP (spliced): %li from read-side call", readlen); if (readlen < 0) { @@ -608,7 +577,8 @@ retry: }
eintr: - written = splice(pipes[0], NULL, to, NULL, to_write, + written = splice(conn->pipe[fromside][0], NULL, + conn->s[!fromside], NULL, to_write, SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK); trace("TCP (spliced): %li from write-side call (passed %lu)", written, to_write); @@ -622,8 +592,8 @@ eintr: readlen > (long)c->tcp.pipe_size / 10) { int lowat = c->tcp.pipe_size / 4;
- setsockopt(from, SOL_SOCKET, SO_RCVLOWAT, - &lowat, sizeof(lowat)); + setsockopt(conn->s[fromside], SOL_SOCKET, + SO_RCVLOWAT, &lowat, sizeof(lowat));
conn_flag(c, conn, lowat_set_flag); conn_flag(c, conn, lowat_act_flag); @@ -632,8 +602,8 @@ eintr: break; }
- *seq_read += readlen > 0 ? readlen : 0; - *seq_write += written > 0 ? written : 0; + conn->read[fromside] += readlen > 0 ? readlen : 0; + conn->written[fromside] += written > 0 ? written : 0;
The extra whitespace had the function of making this look more "tabular", now you should add a further one (or drop it altogether depending on taste). -- Stefano
On Fri, Nov 03, 2023 at 05:21:12PM +0100, Stefano Brivio wrote:
On Thu, 12 Oct 2023 12:51:14 +1100 David Gibson
wrote: tcp_splice_sock_handler() uses the tcp_splice_dir() helper to select which of the socket, pipe and counter fields to use depending on which side of the connection the socket event is coming from.
Now that we are using arrays for the two sides, rather than separate named fields, we can instead just use a variable indicating the side and use that to index the arrays whever we need a particular side's field.
Signed-off-by: David Gibson
--- tcp_splice.c | 81 ++++++++++++++-------------------------------------- 1 file changed, 22 insertions(+), 59 deletions(-) diff --git a/tcp_splice.c b/tcp_splice.c index 239f6d2..822d15a 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -438,29 +438,6 @@ static int tcp_splice_new(const struct ctx *c, struct tcp_splice_conn *conn, return tcp_splice_connect(c, conn, s, port); }
-/** - * tcp_splice_dir() - Set sockets/pipe pointers reflecting flow direction - * @conn: Connection pointers - * @ref_sock: Socket returned as reference from epoll - * @reverse: Reverse direction: @ref_sock is used as destination - * @from: Destination socket pointer to set - * @to: Source socket pointer to set - * @pipes: Pipe set, assigned on return - */ -static void tcp_splice_dir(struct tcp_splice_conn *conn, int ref_sock, - int reverse, int *from, int *to, int **pipes) -{ - if (!reverse) { - *from = ref_sock; - *to = (*from == conn->s[0]) ? conn->s[1] : conn->s[0]; - } else { - *to = ref_sock; - *from = (*to == conn->s[0]) ? conn->s[1] : conn->s[0]; - } - - *pipes = *from == conn->s[0] ? conn->pipe[0] : conn->pipe[1]; -} - /** * tcp_splice_conn_from_sock() - Attempt to init state for a spliced connection * @c: Execution context @@ -521,8 +498,7 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn, int s, uint32_t events) { uint8_t lowat_set_flag, lowat_act_flag; - int from, to, *pipes, eof, never_read; - uint32_t *seq_read, *seq_write; + int fromside, eof, never_read;
if (conn->events == SPLICE_CLOSED) return; @@ -538,14 +514,15 @@ void tcp_splice_sock_handler(struct ctx *c, struct tcp_splice_conn *conn, }
if (events & EPOLLOUT) { - if (s == conn->s[0]) + if (s == conn->s[0]) { conn_event(c, conn, ~OUT_WAIT_0); - else + fromside = 1; + } else { conn_event(c, conn, ~OUT_WAIT_1); - - tcp_splice_dir(conn, s, 1, &from, &to, &pipes); + fromside = 0; + } } else { - tcp_splice_dir(conn, s, 0, &from, &to, &pipes); + fromside = s == conn->s[0] ? 0 : 1; }
if (events & EPOLLRDHUP) { @@ -566,24 +543,16 @@ swap: eof = 0; never_read = 1;
- if (from == conn->s[0]) { - seq_read = &conn->read[0]; - seq_write = &conn->written[0]; - lowat_set_flag = RCVLOWAT_SET_0; - lowat_act_flag = RCVLOWAT_ACT_0; - } else { - seq_read = &conn->read[1]; - seq_write = &conn->written[1]; - lowat_set_flag = RCVLOWAT_SET_1; - lowat_act_flag = RCVLOWAT_ACT_1; - } + lowat_set_flag = fromside == 0 ? RCVLOWAT_SET_0 : RCVLOWAT_SET_1; + lowat_act_flag = fromside == 0 ? RCVLOWAT_ACT_0 : RCVLOWAT_ACT_1;
while (1) { ssize_t readlen, to_write = 0, written; int more = 0;
retry: - readlen = splice(from, NULL, pipes[1], NULL, c->tcp.pipe_size, + readlen = splice(conn->s[fromside], NULL, + conn->pipe[fromside][1], NULL, c->tcp.pipe_size, SPLICE_F_MOVE | SPLICE_F_NONBLOCK); trace("TCP (spliced): %li from read-side call", readlen); if (readlen < 0) { @@ -608,7 +577,8 @@ retry: }
eintr: - written = splice(pipes[0], NULL, to, NULL, to_write, + written = splice(conn->pipe[fromside][0], NULL, + conn->s[!fromside], NULL, to_write, SPLICE_F_MOVE | more | SPLICE_F_NONBLOCK); trace("TCP (spliced): %li from write-side call (passed %lu)", written, to_write); @@ -622,8 +592,8 @@ eintr: readlen > (long)c->tcp.pipe_size / 10) { int lowat = c->tcp.pipe_size / 4;
- setsockopt(from, SOL_SOCKET, SO_RCVLOWAT, - &lowat, sizeof(lowat)); + setsockopt(conn->s[fromside], SOL_SOCKET, + SO_RCVLOWAT, &lowat, sizeof(lowat));
conn_flag(c, conn, lowat_set_flag); conn_flag(c, conn, lowat_act_flag); @@ -632,8 +602,8 @@ eintr: break; }
- *seq_read += readlen > 0 ? readlen : 0; - *seq_write += written > 0 ? written : 0; + conn->read[fromside] += readlen > 0 ? readlen : 0; + conn->written[fromside] += written > 0 ? written : 0;
The extra whitespace had the function of making this look more "tabular", now you should add a further one (or drop it altogether depending on taste).
Ah, yes. I've restored the alignment. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
participants (2)
-
David Gibson
-
Stefano Brivio