On Wed, Jan 17, 2024 at 08:59:14PM +0100, Stefano Brivio wrote:On Thu, 21 Dec 2023 18:02:26 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:The simplification / consistency is even more important than it seems right here. One of the big aims of this (though I haven't implemented it yet) is to allow our NAT to be done generically, rather that per-protocol. That requires having the addressing information in the common structure regardless of flow type.Every flow in the flow table now has space for the the addresses as seen by both the host and guest side. We fill that information in for regular "tap" TCP connections, but not for spliced connections. Fill in that information for spliced connections too, so it's now uniformly available for all flow types (that are implemented so far).I wonder if carrying the address for spliced connections is in any way useful -- other than being obviously useful as a simplification (which justifies this of course).That is, for a spliced connection, addresses and ports are kind of meaningless to us once the connection is established: we operate exclusively above Layer 4. Also, conceptually, all that's there to represent for a spliced connection is that addresses are loopback. To be clear: I'm not suggesting any change to this -- I just want to raise the conceptual inconsistency if it didn't occur to you.Hmm.. I would say in this case it's conceptual consistency leading to redundancy of information in practice rather than a conceptual inconsistency.-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonSigned-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 35 +++++++++++++---------------- tcp_splice.c | 62 +++++++++++++++++++++++++++++++++++++--------------- tcp_splice.h | 3 +-- 3 files changed, 60 insertions(+), 40 deletions(-) diff --git a/tcp.c b/tcp.c index 18ab3ac..6d77cf6 100644 --- a/tcp.c +++ b/tcp.c @@ -2658,32 +2658,23 @@ static void tcp_snat_inbound(const struct ctx *c, union inany_addr *addr) * tcp_tap_conn_from_sock() - Initialize state for non-spliced connection * @c: Execution context * @ref: epoll reference of listening socket - * @conn: connection structure to initialize + * @conn: connection structure (with TAPFSIDE(@conn) completed) * @s: Accepted socket - * @sa: Peer socket address (from accept()) * @now: Current timestamp - * - * Return: true if able to create a tap connection, false otherwise */ -static bool tcp_tap_conn_from_sock(struct ctx *c, +static void tcp_tap_conn_from_sock(struct ctx *c, union tcp_listen_epoll_ref ref, struct tcp_tap_conn *conn, int s, - const struct sockaddr *sa, const struct timespec *now) { + ASSERT(flowside_complete(SOCKFSIDE(conn))); + conn->f.type = FLOW_TCP; conn->sock = s; conn->timer = -1; conn->ws_to_tap = conn->ws_from_tap = 0; conn_event(c, conn, SOCK_ACCEPTED); - if (flowside_from_sock(SOCKFSIDE(conn), PIF_HOST, s, NULL, sa) < 0) { - err("tcp: Failed to get local name, connection dropped"); - return false; - } - - ASSERT(flowside_complete(SOCKFSIDE(conn))); - TAPFSIDE(conn)->pif = PIF_TAP; TAPFSIDE(conn)->faddr = SOCKFSIDE(conn)->eaddr; TAPFSIDE(conn)->fport = SOCKFSIDE(conn)->eport; @@ -2712,8 +2703,6 @@ static bool tcp_tap_conn_from_sock(struct ctx *c, conn_flag(c, conn, ACK_FROM_TAP_DUE); tcp_get_sndbuf(conn); - - return true; } /** @@ -2737,15 +2726,21 @@ void tcp_listen_handler(struct ctx *c, union epoll_ref ref, if (s < 0) goto cancel; - if (c->mode == MODE_PASTA && - tcp_splice_conn_from_sock(c, ref.tcp_listen, &flow->tcp_splice, - s, (struct sockaddr *)&sa)) + if (flowside_from_sock(&flow->f.side[0], ref.tcp_listen.pif, s, + NULL, &sa) < 0) { + err("tcp: Failed to get local name, connection dropped"); + close(s); + flow_alloc_cancel(flow); return; + } - if (tcp_tap_conn_from_sock(c, ref.tcp_listen, &flow->tcp, s, - (struct sockaddr *)&sa, now)) + if (c->mode == MODE_PASTA && + tcp_splice_conn_from_sock(c, ref.tcp_listen, &flow->tcp_splice, s)) return; + tcp_tap_conn_from_sock(c, ref.tcp_listen, &flow->tcp, s, now); + return; + cancel: /* Failed to create the connection */ if (s >= 0) diff --git a/tcp_splice.c b/tcp_splice.c index eec02fe..0faeb1b 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -72,6 +72,9 @@ static int ns_sock_pool6 [TCP_SOCK_POOL_SIZE]; /* Pool of pre-opened pipes */ static int splice_pipe_pool [TCP_SPLICE_PIPE_POOL_SIZE][2]; +#define FSIDE0(conn) (&(conn)->f.side[0]) +#define FSIDE1(conn) (&(conn)->f.side[1]) + #define CONN_V6(x) (x->flags & SPLICE_V6) #define CONN_V4(x) (!CONN_V6(x)) #define CONN_HAS(conn, set) ((conn->events & (set)) == (set)) @@ -280,9 +283,21 @@ bool tcp_splice_flow_defer(union flow *flow) static int tcp_splice_connect_finish(const struct ctx *c, struct tcp_splice_conn *conn) { + struct sockaddr_storage sa; + socklen_t sl = sizeof(sa); unsigned side; int i = 0; + if (getsockname(conn->s[1], (struct sockaddr *)&sa, &sl) < 0) { + int ret = -errno; + conn_flag(c, conn, CLOSING); + return ret; + } + inany_from_sockaddr(&FSIDE1(conn)->faddr, &FSIDE1(conn)->fport, + (struct sockaddr *)&sa); + + ASSERT(flowside_complete(FSIDE1(conn))); + for (side = 0; side < SIDES; side++) { conn->pipe[side][0] = conn->pipe[side][1] = -1; @@ -352,13 +367,24 @@ static int tcp_splice_connect(const struct ctx *c, struct tcp_splice_conn *conn, conn->s[1]); } + /* It would be nicer if we could initialise FSIDE1 all at once with + * flowaddrs_from_af() or flowaddrs_from_sock(). However, we can't get + * the forwarding port until the connect() has finished and we don't + * want to block to wait for it. Meanwhile we have the endpoint address[...] endpoint address and port [...]. Or, if "address" includes the port too, then the comment should also say "forwarding address", not "forwarding port". It's confusing otherwise: why is there anything special with the endpoint *address* as opposed to the forwarding *port*?+ * here, but don't have a place to stash it other than in the flowaddrs + * itself. So, initialisation of FSIDE1 is split between here and + * tcp_splice_connect_finish(). Ugly but necessary. + */ if (CONN_V6(conn)) { sa = (struct sockaddr *)&addr6; sl = sizeof(addr6); + inany_from_af(&FSIDE1(conn)->eaddr, AF_INET6, &addr6.sin6_addr); } else { sa = (struct sockaddr *)&addr4; sl = sizeof(addr4); + inany_from_af(&FSIDE1(conn)->eaddr, AF_INET, &addr4.sin_addr); } + FSIDE1(conn)->eport = port;