[PATCH 00/16] More flow table preliminaries: address handling improvements
Here's another batch of cleanups and tweaks in preparation for the flow table. This set focuses on improved helpers for handling addresses, particularly in the TCP splice path. David Gibson (16): treewide: Use sa_family_t for address family variables tcp, udp: Don't precompute port remappings in epoll references flow: Add helper to determine a flow's protocol tcp_splice: Simplify clean up logic inany: Helper to test for IPv4 or IPv6 loopback address tcp, tcp_splice: Helpers for getting sockets from the pools tcp_splice: More specific variable names in new splice path tcp_splice: Fix incorrect parameter comment for tcp_splice_connect() tcp_splice: Merge tcp_splice_new() into its caller tcp_splice: Improve error reporting on connect path inany: Add inany_ntop() helper tcp_splice: Improve logic deciding when to splice util: Provide global constants for IPv4 loopback and unspecified address inany: Introduce union sockaddr_inany tcp, tcp_splice: Better construction of IPv4 or IPv6 sockaddrs inany: Extend inany_from_af to easily set unspecified addresses Makefile | 6 +- flow.c | 7 ++ flow.h | 4 + icmp.c | 24 +++--- icmp.h | 4 +- inany.c | 34 ++++++++ inany.h | 96 ++++++++++++++++++---- tcp.c | 106 ++++++++++++------------ tcp.h | 4 +- tcp_conn.h | 4 +- tcp_splice.c | 223 ++++++++++++++++++++++++++------------------------- tcp_splice.h | 5 +- udp.c | 21 ++--- udp.h | 2 +- util.c | 5 +- util.h | 4 +- 16 files changed, 335 insertions(+), 214 deletions(-) create mode 100644 inany.c -- 2.43.0
Sometimes we use sa_family_t for variables and parameters containing a
socket address family, other times we use a plain int. Since sa_family_t
is what's actually used in struct sockaddr and friends, standardise on
that.
Signed-off-by: David Gibson
The epoll references for both TCP listening sockets and UDP sockets
includes a port number. This gives the destination port that traffic to
that socket will be sent to on the other side. That will usually be the
same as the socket's bound port, but might not if the -t, -u, -T or -U
options are given with different original and forwarded port numbers.
As we move towards a more flexible forwarding model for passt, it's going
to become possible for that destination port to vary depending on more
things (for example the source or destination address). So, it will no
longer make sense to have a fixed value for a listening socket.
Change to simpler semantics where this field in the reference gives the
bound port of the socket. We apply the translations to the correct
destination port later on, when we're actually forwarding.
Signed-off-by: David Gibson
Each flow already has a type field. This implies the protocol the flow
represents, but also has more information: we have two ways to represent
TCP flows, "tap" and "spliced". In order to generalise some of the flow
mechanics, we'll need to determine a flow's protocol in terms of the IP
(L4) protocol number.
Introduce a constant table and helper macro to derive this from the flow
type.
Signed-off-by: David Gibson
Currently tcp_splice_flow_defer() contains specific logic to determine if
we're far enough initialised that we need to close pipes and/or sockets.
This is potentially fragile if we change something about the order in which
we do things. We can simplify this by initialising the pipe and socket
fields to -1 very early, then close()ing them if and only if they're non
negative.
This lets us remove a special case cleanup if our connect() fails. This
will already trigger a CLOSING event, and the socket fd in question is
populated in the connection structure. Thus we can let the new cleanup
logic handle it rather than requiring an explicit close().
Signed-off-by: David Gibson
tcp_splice_conn_from_sock() needs to check if an inany is either the IPv6
loopback address (::) or an IPv4 loopback address (127.0.0.0/8). We may
have other places that also want to check this in future, so simplify it
with an inany_is_loopback() helper.
Signed-off-by: David Gibson
We maintain pools of ready-to-connect sockets in both the original and
(for pasta) guest namespace to reduce latency when starting new TCP
connections. If we exhaust those pools we have to take a higher
latency path to get a new socket.
Currently we open-code that fallback in the places we need it. To
improve clarity encapsulate that into helper functions.
Signed-off-by: David Gibson
In tcp_splice_conn_from_sock(), the 'port' variable stores the source port
of the connection on the originating side. In tcp_splice_new(), called
directly from it, the 'port' parameter gives the _destination_ port of the
originating connection and is then updated to the destination port of the
connection on the other side.
Similarly, in tcp_splice_conn_from_sock(), 's' is the fd of the accetped
socket (on side 0), whereas in tcp_splice_new(), 's' is the fd of the
connecting socket (side 1).
I, for one, find having the same variable name with different meanings in
such close proximity in the flow of control pretty confusing. Alter the
names for greater specificity and clarity.
Signed-off-by: David Gibson
Both the name and description are wrong.
Signed-off-by: David Gibson
The only caller of tcp_splice_new() is tcp_splice_conn_from_sock(). Both
are quite short, and the division of responsibilities between the two isn't
particularly obvious. Simplify by merging the former into the latter.
Signed-off-by: David Gibson
This makes a number of changes to improve error reporting while connecting
a new spliced socket:
* We use flow_err() and similar functions so all messages include info
on which specific flow was affected
* We use strerror() to interpret raw error values
* We now report errors on connection (at "trace" level, since this would
allow spamming the logs)
* We also look up and report some details on EPOLLERR events, which can
include connection errors, since we use a non-blocking connect(). Again
we use "trace" level since this can spam the logs.
* Use the 'goto fail' idiom in tcp_splice_conn_from_sock to combine some
common handling
Signed-off-by: David Gibson
Add this helper to format an inany into either IPv4 or IPv6 text format as
appropriate.
Signed-off-by: David Gibson
This makes several tweaks to improve the logic which decides whether we're
able to use the splice method for a new connection.
* Rather than only calling tcp_splice_conn_from_sock() in pasta mode, we
check for pasta mode within it, better localising the checks.
* Previously if we got a connection from a non-loopback address we'd
always fall back to the "tap" path, even if the connection was on a
socket in the namespace. If we did get a non-loopback address on a
namespace socket, something has gone wrong and the "tap" path certainly
won't be able to handle it. Report the error and close, rather than
passing it along to tap.
Signed-off-by: David Gibson
For various reasons, the standard library doesn't provide IPv4 loopback and
unspecified addresses as struct in_addr constants, the way it does
in6addr_loopback and in6addr_any for IPv6. This lack means that in several
places we initialise a local with a macro just so that we can take the
address of an IPv4 loopback or unspecified address.
Provide our own in4addr_any and in4addr_loopback global constants to make
this a bit simpler.
Signed-off-by: David Gibson
There are a number of places where we want to handle either a sockaddr_in
or a sockaddr_in6. In some of those we use a void *, which works ok and
matches some standard library interfaces, but doesn't give a signature
level hint that we're dealing with only sockaddr_in or sockaddr_in6, not
(say) sockaddr_un or another type of socket address. Other places we use a
sockaddr_storage, which also works, but has the same problem in addition to
allocating more on the stack than we need to.
Introduce union sockaddr_inany to explictly handle this case: it has
variants for sockaddr_in and sockaddr_in6. Use it in a number of
places where it's easy to do so.
Signed-off-by: David Gibson
In both tcp_conn_from_tap() and tcp_splice_connect() we need to construct
a socket address for connect() which could be either IPv4 or IPv6. At the
moment we initialise both a sockaddr_in and a sockaddr_in6 as locals, then
set a pointer to one or the other. This is a little bit ugly.
More importantly, though, in the case of tcp_conn_from_tap() initialising
the sockaddr_in6 when we're actually passed an IPv4 address will access
memory beyond the implied (struct in_addr) we're passed as daddr. In
practice that will be a pointer into a packet buffer, so there will be
enough valid memory to get 16 bytes of (garbage) IPv6 address that are then
ignored. However, it's not a good look to access beyond what the
parameters seem to imply is passed.
We can clean up these cases using sockaddr_inany and a new helper
sockaddr_inany_init().
Signed-off-by: David Gibson
inany_from_af() can be used to initialise a union inany_addr from either
an IPv4 or IPv6 address. If we want to set it to the unspecified IPv4
or IPv6 address we can do that, but need to explicitly pass the correct
address version. Make this easier by allowing it to interpret a NULL
address as the unspecified address of the appropriate family.
We only have one trivial use for this at present, but it will have more
uses in future.
Signed-off-by: David Gibson
On Mon, Jan 29, 2024 at 03:35:41PM +1100, David Gibson wrote:
Here's another batch of cleanups and tweaks in preparation for the flow table. This set focuses on improved helpers for handling addresses, particularly in the TCP splice path.
Sorry Stefano, I jumped the gun a bit here. Feel free to do a review if you have the time, but don't apply yet. I have some changes that still need to be made here. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
participants (1)
-
David Gibson