It turns out a couple of places on the IPv4 specific inbound path
accidentally use control structures that are supposed to be for IPv6.
That could lead to weird behaviour in a rather complex set of
circumstances.
Path 1/4 here is the actual fix, the rest makes some clean ups to the
code that should make similar mistakes harder errors harder to commit
in future.
This is based on my earlier cleanup of the UDP splicing code, although
I think it will rebase trivially.
David Gibson (4):
udp: Fix inorrect use of IPv6 mh buffers in IPv4 path
udp: Better factor IPv4 and IPv6 paths in udp_sock_handler()
udp: Preadjust udp[46]_l2_iov_tap[].iov_base for pasta mode
udp: Factor out control structure management from
udp_sock_fill_data_v[46]
udp.c | 184 ++++++++++++++++++++++++++--------------------------------
1 file changed, 81 insertions(+), 103 deletions(-)
--
2.38.1
The UDP "splicing" (forwarding packets from one L4 socket to another,
rather than via the tuntap device) code assumes that any given UDP
port in the init namespace will only communicate with a single port on
the ns side at a time, and vice versa. This will often be the case,
but since UDP is a connectionless protocol, it need not be. In fact
it is not the case in our existing UDP bandwidth checks, although the
specific configuration there means it's not harmful in that case.
The failure mode in this case can be quite bad: we don't just fall
back to an unoptimized oath, or drop packets, we will misdirect
packets to the wrong destination.
This series make some substantial simplifications to how we handle the
splice forwarding, then corrects it to handle the case of multiple
source ports sending to a single destination.
This does come at a performance cost. It's not as large as I feared,
and shouldn't affect the most common case where there is a 1 to 1
mapping between source and destination ports. I haven't yet been able
to confirm the latter because the iperf3 bandwidth test we use *does*
have interleaved streams with a common destination port.
Based on the earlier series for dual stack TCP sockets.
Changes since v3:
* Changed interface of udp_splice_sendfrom() to slightly better
separate concerns and to make some future cleanups simpler
* Fixed a serious buffer overrun bug where we weren't bounds checking
as we scanned for additional datagrams with the same source
address.
Changes since v2:
* Minor style and comment revisions
Changes since v1:
* Added patches 12..16/16 fixing the delivery of packets, as well as
just simplifying the mechanics
David Gibson (16):
udp: Also bind() connected ports for "splice" forwarding
udp: Separate tracking of inbound and outbound packet flows
udp: Always use sendto() rather than send() for forwarding spliced
packets
udp: Don't connect "forward" sockets for spliced flows
udp: Remove the @bound field from union udp_epoll_ref
udp: Split splice field in udp_epoll_ref into (mostly) independent
bits
udp: Don't create double sockets for -U port
udp: Re-use fixed bound sockets for packet forwarding when possible
udp: Don't explicitly track originating socket for spliced
"connections"
udp: Update UDP "connection" timestamps in both directions
udp: Simplify udp_sock_handler_splice
udp: Make UDP_SPLICE_FRAMES and UDP_TAP_FRAMES_MEM the same thing
udp: Add helper to extract port from a sockaddr_in or sockaddr_in6
udp: Unify buffers for tap and splice paths
udp: Split send half of udp_sock_handler_splice() from the receive
half
udp: Correct splice forwarding when receiving from multiple sources
passt.h | 2 +
udp.c | 522 ++++++++++++++++++++++++++------------------------------
udp.h | 16 +-
3 files changed, 248 insertions(+), 292 deletions(-)
--
2.38.1
The UDP "splicing" (forwarding packets from one L4 socket to another,
rather than via the tuntap device) code assumes that any given UDP
port in the init namespace will only communicate with a single port on
the ns side at a time, and vice versa. This will often be the case,
but since UDP is a connectionless protocol, it need not be. In fact
it is not the case in our existing UDP bandwidth checks, although the
specific configuration there means it's not harmful in that case.
The failure mode in this case can be quite bad: we don't just fall
back to an unoptimized oath, or drop packets, we will misdirect
packets to the wrong destination.
This series make some substantial simplifications to how we handle the
splice forwarding, then corrects it to handle the case of multiple
source ports sending to a single destination.
This does come at a performance cost. It's not as large as I feared,
and shouldn't affect the most common case where there is a 1 to 1
mapping between source and destination ports. I haven't yet been able
to confirm the latter because the iperf3 bandwidth test we use *does*
have interleaved streams with a common destination port.
Based on the earlier series for dual stack TCP sockets.
Changes since v1:
* Added patches 12..16/16 fixing the delivery of packets, as well as
just simplifying the mechanics
David Gibson (16):
udp: Also bind() connected ports for "splice" forwarding
udp: Separate tracking of inbound and outbound packet flows
udp: Always use sendto() rather than send() for forwarding spliced
packets
udp: Don't connect "forward" sockets for spliced flows
udp: Remove the @bound field from union udp_epoll_ref
udp: Split splice field in udp_epoll_ref into (mostly) independent
bits
udp: Don't create double sockets for -U port
udp: Re-use fixed bound sockets for packet forwarding when possible
udp: Don't explicitly track originating socket for spliced
"connections"
udp: Update UDP "connection" timestamps in both directions
udp: Simplify udp_sock_handler_splice
udp: Make UDP_SPLICE_FRAMES and UDP_TAP_FRAMES_MEM the same thing
udp: Add helper to extract port from a sockaddr_in or sockaddr_in6
udp: Unify buffers for tap and splice paths
udp: Split send half of udp_sock_handler_splice() from the receive
half
udp: Correct splice forwarding when receiving from multiple sources
passt.h | 2 +
udp.c | 518 +++++++++++++++++++++++++-------------------------------
udp.h | 16 +-
3 files changed, 244 insertions(+), 292 deletions(-)
--
2.38.1
With this series, fuzzing actually works, albeit slowly. More on that
below.
Patches 1 & 2 are the same as before.
Patch 3 is Stefano Brivio's modified patch (with some changes that we
discussed together on IRC but otherwise unchanged).
Patch 4 is new, but discussed already upstream: It changes the order
in which EPOLLIN and EPOLLRDHUP events are processed, so that we don't
drop packets when the socket is closed.
Patches 5 & 6 are the hacks that were needed to get fuzzing to work.
Patch 6 removes all seccomp and other isolation stuff because for
unclear reasons that breaks AFL instrumentation. AFL appears to fork
off a second process, and somehow strace cannot follow that process,
but the second process fails, and that breaks AFL completely. Without
strace data it's rather hard to see what's going on so I didn't
investigate this further.
Patch 7 adds the fuzzing wrapper and is not greatly changed from
before. However I did have to disable the AFL "fork server"
optimization which somehow doesn't work with passt (it does work fine
with libnbd & nbdkit).
Speed is not great. I'm getting ~ 75-80 execs/second. Really we want
this to be much higher, since that ultimately governs how fast we can
explore new code paths and find bugs. Ideally well over 1000 execs/s
(per fuzzing process) would be a good target to aim for. (Of course
this depends on the hardware as well.)
We could try to find out why the fork server is not compatible with
passt, but I don't think we'd gain very much performance there. To
explore this I ran a dummy fuzzed process from the same wrapper, and
it was hardly any faster.
I think the real gains are going to come from reducing the overhead of
starting passt. There seem to be some netlink messages sent during
start up and maybe if those could be reduced or eliminated we might
see better performance.
The other factor is fuzzing stability, which hovers around 87-90%.
While this isn't necessarily bad, I wonder where the non-determinism
is coming from [ideal figures would be 95-100%]. Passt doesn't appear
to use threads. It does call getrandom (for TCP sequence numbers) so
it'd be good to have a way to bypass that. However I don't fully
understand what's going on here.
Rich.
The UDP "splicing" (forwarding packets from one L4 socket to another,
rather than via the tuntap device) code assumes that any given UDP
port in the init namespace will only communicate with a single port on
the ns side at a time, and vice versa. This will often be the case,
but since UDP is a connectionless protocol, it need not be. In fact
it is not the case in our existing UDP bandwidth checks, although the
specific configuration there means it's not harmful in that case.
The failure mode in this case can be quite bad: we don't just fall
back to an unoptimized oath, or drop packets, we will misdirect
packets to the wrong destination.
This series make some substantial simplifications to how we handle the
splice forwarding, then corrects it to handle the case of multiple
source ports sending to a single destination.
This does come at a performance cost. It's not as large as I feared,
and shouldn't affect the most common case where there is a 1 to 1
mapping between source and destination ports. I haven't yet been able
to confirm the latter because the iperf3 bandwidth test we use *does*
have interleaved streams with a common destination port.
Based on the earlier series for dual stack TCP sockets.
Changes since v2:
* Minor style and comment revisions
Changes since v1:
* Added patches 12..16/16 fixing the delivery of packets, as well as
just simplifying the mechanics
David Gibson (16):
udp: Also bind() connected ports for "splice" forwarding
udp: Separate tracking of inbound and outbound packet flows
udp: Always use sendto() rather than send() for forwarding spliced
packets
udp: Don't connect "forward" sockets for spliced flows
udp: Remove the @bound field from union udp_epoll_ref
udp: Split splice field in udp_epoll_ref into (mostly) independent
bits
udp: Don't create double sockets for -U port
udp: Re-use fixed bound sockets for packet forwarding when possible
udp: Don't explicitly track originating socket for spliced
"connections"
udp: Update UDP "connection" timestamps in both directions
udp: Simplify udp_sock_handler_splice
udp: Make UDP_SPLICE_FRAMES and UDP_TAP_FRAMES_MEM the same thing
udp: Add helper to extract port from a sockaddr_in or sockaddr_in6
udp: Unify buffers for tap and splice paths
udp: Split send half of udp_sock_handler_splice() from the receive
half
udp: Correct splice forwarding when receiving from multiple sources
passt.h | 2 +
udp.c | 519 +++++++++++++++++++++++++-------------------------------
udp.h | 16 +-
3 files changed, 244 insertions(+), 293 deletions(-)
--
2.38.1
When forwarding many ports, passt can consume a lot of kernel memory
because of the many listening sockets it opens. There are not a lot
of ways we can reduce that, but here's one.
Currently we create separate listening sockets for each port for both
IPv4 and IPv6. However in Linux (and probably other platforms), it's
possible to listen for both IPv4 and IPv6 connections on an IPv6
socket. This series uses such dual stack sockets to halve the number
of listening sockets needed for TCP. When forwarding all TCP and UDP
ports, this reduces the kernel memory used from around 677 MiB to
around 487 MiB (kernel 6.0.8 on an x86_64 Fedora 37 machine).
This should also be possible for UDP, but that will require a mostly
separate implementation.
Changes since v2:
* Assorted minor polishing based on Stefano's review
David Gibson (32):
clang-tidy: Suppress warning about assignments in if statements
style: Minor corrections to function comments
tcp_splice: #include tcp_splice.h in tcp_splice.c
tcp: Remove unused TCP_MAX_SOCKS constant
tcp: Better helpers for converting between connection pointer and
index
tcp_splice: Helpers for converting from index to/from tcp_splice_conn
tcp: Move connection state structures into a shared header
tcp: Add connection union type
tcp: Improved helpers to update connections after moving
tcp: Unify spliced and non-spliced connection tables
tcp: Unify tcp_defer_handler and tcp_splice_defer_handler()
tcp: Partially unify tcp_timer() and tcp_splice_timer()
tcp: Unify the IN_EPOLL flag
tcp: Separate helpers to create ns listening sockets
tcp: Unify part of spliced and non-spliced conn_from_sock path
tcp: Use the same sockets to listen for spliced and non-spliced
connections
tcp: Remove splice from tcp_epoll_ref
tcp: Don't store hash bucket in connection structures
inany: Helper functions for handling addresses which could be IPv4 or
IPv6
tcp: Hash IPv4 and IPv4-mapped-IPv6 addresses the same
tcp: Take tcp_hash_insert() address from struct tcp_conn
tcp: Simplify tcp_hash_match() to take an inany_addr
tcp: Unify initial sequence number calculation for IPv4 and IPv6
tcp: Have tcp_seq_init() take its parameters from struct tcp_conn
tcp: Fix small errors in tcp_seq_init() time handling
tcp: Remove v6 flag from tcp_epoll_ref
tcp: NAT IPv4-mapped IPv6 addresses like IPv4 addresses
tcp_splice: Allow splicing of connections from IPv4-mapped loopback
tcp: Consolidate tcp_sock_init[46]
util: Allow sock_l4() to open dual stack sockets
util: Always return -1 on error in sock_l4()
tcp: Use dual stack sockets for port forwarding when possible
Makefile | 15 +-
conf.c | 12 +-
inany.h | 94 +++++
siphash.c | 2 +
tap.c | 6 +-
tcp.c | 981 ++++++++++++++++++++++-----------------------------
tcp.h | 11 +-
tcp_conn.h | 192 ++++++++++
tcp_splice.c | 333 +++++++----------
tcp_splice.h | 12 +-
util.c | 19 +-
11 files changed, 891 insertions(+), 786 deletions(-)
create mode 100644 inany.h
create mode 100644 tcp_conn.h
--
2.38.1
This series make some substantial simplifications to how we handle the
forwarding of "spliced" UDP packets (those that don't go over the
tuntap device, but instead are forwarded from one L4 socket to
another).
This doesn't yet change the existing (arguably broken) assumption that
UDP communications are from one port to one port within the pasta
namespace, not one to many or many to one. However, the
simplifications made here will make it easier to correct that in
future.
Based on the earlier series for dual stack TCP sockets.
David Gibson (11):
udp: Also bind() connected ports for "splice" forwarding
udp: Separate tracking of inbound and outbound packet flows
udp: Always use sendto() rather than send() for forwarding spliced
packets
udp: Don't connect "forward" sockets for spliced flows
udp: Remove the @bound field from union udp_epoll_ref
udp: Split splice field in udp_epoll_ref into (mostly) independent
bits
udp: Don't create double sockets for -U port
udp: Re-use fixed bound sockets for packet forwarding when possible
udp: Don't explicitly track originating socket for spliced
"connections"
udp: Update UDP "connection" timestamps in both directions
udp: Simplify udp_sock_handler_splice
passt.h | 2 +
udp.c | 344 ++++++++++++++++++++++----------------------------------
udp.h | 16 ++-
3 files changed, 144 insertions(+), 218 deletions(-)
--
2.38.1
Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
---
README.md | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/README.md b/README.md
index 33e4dcf..46cbd9f 100644
--- a/README.md
+++ b/README.md
@@ -657,6 +657,10 @@ See also the [test logs](/builds/latest/test/).
### [Chat](/passt/chat)
* Somebody might be available on [IRC](https://irc.passt.top)
+### Weekly development [meeting](https://pad.passt.top/p/weekly)
+* Open to everybody! Feel free to join and propose a different time directly on
+ the agenda.
+
## Security and Vulnerability Reports
* Please send an email to [passt-sec](mailto:passt-sec@passt.top), private list,
--
2.35.1
In preparation for trying to implement dual stack sockets for UDP,
I've been getting my head around how the UDP splicing works. Alas,
I'm pretty sure that it's broken if there's not a one-to-one
correspondence between init side source ports and ns side destination
ports. That will typically be the case, but given its UDP there's no
guarantee.
In addition, UDP servers in the ns will not see the correct port
numbers with getpeername(). That's also true of spliced TCP
connections (see https://bugs.passt.top/show_bug.cgi?id=39), but it's
more likely to matter for UDP (I don't know of any TCP protocols that
care about source port number on the server side, but there are some
common UDP protocols that have at least port number conventions on
both sides).
I can expand on the details later, but pasta will do the wrong thing
in at least some circumstances for both a single init side socket
sendto()ing packets to multiple different ports in the ns/guest and
multiple init side sockets send()ing to the same port in the ns/guest.
I think I know how to fix it, but it's not a trivial job. So, the
question is do I embark on this now, or do I just remove UDP
"splicing" entirely for the time being (other than a minimum required
to make -U work)? That would unblock dual stack UDP sockets and we
can attempt to reoptimize this later.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson