Re: [PATCH 7/8] udp: Decide whether to "splice" per datagram rather than per socket

15 Dec 2022

On Wed, Dec 14, 2022 at 11:35:54AM +0100, Stefano Brivio wrote:
...
On Wed, 14 Dec 2022 12:47:25 +1100
David Gibson  wrote:
...
On Tue, Dec 13, 2022 at 11:49:18PM +0100, Stefano Brivio wrote:
...
On Mon,  5 Dec 2022 19:14:24 +1100
David Gibson  wrote:
...
Currently we have special sockets for receiving datagrams from locahost
which can use the optimized "splice" path rather than going across the tap
interface.
We want to loosen this so that sockets can receive sockets that will be
forwarded by both the spliced and non-spliced paths.  To do this, we alter
the meaning of the @splice bit in the reference to mean that packets
receieved on this socket *can* be spliced, not that they *will* be spliced.
They'll only actually be spliced if they come from 127.0.0.1 or ::1.
We can't (for now) remove the splice bit entirely, unlike with TCP.  Our
gateway mapping means that if the ns initiates communication to the gw
address, we'll translate that to target 127.0.0.1 on the host side.  Reply
packets will therefore have source address 127.0.0.1 when received on the
host, but these need to go via the tap path where that will be translated
back to the gateway address.  We need the @splice bit to distinguish that
case from packets going from localhost to a port mapped explicitly with
-u which should be spliced.
Signed-off-by: David Gibson 
---
 udp.c | 54 +++++++++++++++++++++++++++++++++++-------------------
 udp.h |  2 +-
 2 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/udp.c b/udp.c
index 6ccfe8c..011a157 100644
--- a/udp.c
+++ b/udp.c
@@ -513,16 +513,27 @@ static int udp_splice_new_ns(void *arg)
 }
/**
- * sa_port() - Determine port from a sockaddr_in or sockaddr_in6
+ * udp_mmh_splice_port() - Is source address of message suitable for splicing?
  * @v6:		Is @sa a sockaddr_in6 (otherwise sockaddr_in)?
- * @sa:		Pointer to either sockaddr_in or sockaddr_in6
+ * @mmh:	mmsghdr of incoming message
+ *
+ * Return: if @sa refers to localhost (127.0.0.1 or ::1) the port from
+ *         @sa, otherwise 0.
+ *
+ * NOTE: this relies on the fact that it's not valid to use UDP port 0
The port is reserved by IANA indeed, but... it can actually be used. On
Linux, you can bind() it and you can connect() to it. As far as I can
tell from the new version of udp_sock_handler() we would actually
misdirect packets in that case.
Hm, ok.  Given the IANA reservation, I think it would be acceptable to
simply drop such packets - but if we were to make that choice we
should do so explicitly, rather than misdirecting them.
Acceptable, sure, but... I don't know, it somehow doesn't look
desirable to me. The kernel doesn't enforce this, so I guess we
shouldn't either.
...
...
How bad would it be to use an int here?
Pretty straightforward.  Just means we have to use the somewhat
abtruse "if (port <= USHRT_MAX)" or "if (port >= 0)" or something
instead of just "if (port)".  Should I go ahead and make that change?
Eh, I don't like it either, but... I guess it's better than the
alternative, so yes, thanks. Or pass port as a pointer, set on return.
I'm fine with both.
I think the int is less ugly than the pointer.  Ok, I'll make that change.
...
...
...
By the way, I think the comment should also mention that the port is
returned in host order.
Ok, easily done.  Generally I try to keep the endianness associated
with the type, rather than attempting to document it for each variable
(or even worse, each point in time for each variable).
Yes, I see, and it's a more valid approach than mine, but still mine
comes almost for free.
Right.  Actually rereading the function in question, it specifically
says "port from @sa" and in the sockaddr, of course, it's network
endian, so it could particularly do with clarification in this case.
...
By the way, I got distracted by this and I forgot about two things:
...
+static in_port_t udp_mmh_splice_port(bool v6, const struct mmsghdr *mmh)
 {
-	const struct sockaddr_in6 *sa6 = sa;
-	const struct sockaddr_in *sa4 = sa;
+	const struct sockaddr_in6 *sa6 = mmh->msg_hdr.msg_name;
+	const struct sockaddr_in *sa4 = mmh->msg_hdr.msg_name;;
Stray semicolon here.
Fixed.
...
...
+
+	if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
+		return ntohs(sa6->sin6_port);
-	return v6 ? ntohs(sa6->sin6_port) : ntohs(sa4->sin_port);
+	if (ntohl(sa4->sin_addr.s_addr) == INADDR_LOOPBACK)
+		return ntohs(sa4->sin_port);
If it's IPv6, but not a loopback address, we'll check if
sa4->sin_addr.s_addr == INADDR_LOOPBACK -- which might actually be true for
an IPv6, non-loopback address.
Oops, yes, that's definitely wrong, and wrong enough to respin.
...
Also, I think we can happily "splice" for any loopback address, not
just 127.0.0.1. What about something like:
Well.. yes and no.  Yes, we can deliver packets in this way, but we'll
lost track of the original from address, so reply packets will be
misdirected.

Hrm.. ISTR you mentioned some cases that worked despite that, so I'll
make the change, and hope to fix it up better when I get to the NAT /
splice semantics rework.
...
if (v6 && IN6_IS_ADDR_LOOPBACK(&sa6->sin6_addr))
      return ntohs(sa6->sin6_port);
if (!v4 && IN4_IS_ADDR_LOOPBACK(&sa4->sin_addr))
      return ntohs(sa4->sin_port);
return -1;
?
-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson