On Thu, Feb 22, 2024 at 01:45:44PM +0100, Stefano Brivio wrote:On Tue, 6 Feb 2024 12:17:31 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:It's unclear to me from either rfc 6890 or rfc 1122 whether this is describing something meaningful on the wire, or only something meaningful in APIs.TCP connections should typically not have wildcard addresses (0.0.0.0 or ::) nor a zero port number for either endpoint.This is only true for non-local connections, see also: https://archives.passt.top/passt-dev/20240119105630.089c5d34@elisabeth/ Just looking at RFC 6890, which should be sufficient: +----------------------+----------------------------+ | Attribute | Value | +----------------------+----------------------------+ | Address Block | 0.0.0.0/8 | | Name | "This host on this network"| | RFC | [RFC1122], Section 3.2.1.3 | | Allocation Date | September 1981 | | Termination Date | N/A | | Source | True | | Destination | False | | Forwardable | False | | Global | False | | Reserved-by-Protocol | True | +----------------------+----------------------------+...it can be used as source, but surely not by a non-loopback interface, so not by the guest. About using it as destination: the table says it can't be done, but: $ nc -l -p 8818 & echo abcd | nc 0.0.0.0 8818 [2] 2624647 abcd and: # tcpdump -ni lo tcp port 8818 tcpdump: verbose output suppressed, use -v[v]... for full protocol decode listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes 13:24:51.379436 IP 127.0.0.1.58574 > 127.0.0.1.8818: Flags [S], seq 3285989360, win 65495, options [mss 65495,sackOK,TS val 1253719321 ecr 0,nop,wscale 7], length 0 13:24:51.379447 IP 127.0.0.1.8818 > 127.0.0.1.58574: Flags [S.], seq 3128422158, ack 3285989361, win 65483, options [mss 65495,sackOK,TS val 1253719321 ecr 1253719321,nop,wscale 7], length 0 13:24:51.379457 IP 127.0.0.1.58574 > 127.0.0.1.8818: Flags [.], ack 1, win 512, options [nop,nop,TS val 1253719321 ecr 1253719321], length 0 13:24:51.379508 IP 127.0.0.1.58574 > 127.0.0.1.8818: Flags [P.], seq 1:6, ack 1, win 512, options [nop,nop,TS val 1253719321 ecr 1253719321], length 5 13:24:51.379513 IP 127.0.0.1.8818 > 127.0.0.1.58574: Flags [.], ack 6, win 512, options [nop,nop,TS val 1253719321 ecr 1253719321], length 0 it's not effectively used, but the kernel understands what I mean by 0.0.0.0:Right: the 0.0.0.0 means something at the API level, but not AFAICT, at the wire level. At least for the "from tap" side of this change, it's the wire level that I'm checking.connect(3, {sa_family=AF_INET, sin_port=htons(8818), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EINPROGRESS (Operation now in progress) So I wouldn't exclude its usage in tcp_listen_handler() -- even though sure, the kernel translates that, so I don't think it can ever become a practical issue.Hmm. In listen_handler() it's not "on the wire", but reading from an API still seems different from sending to an API to me. At the very least 0.0.0.0 doesn't have the same meaning in all API contexts, which makes it, IMO, unsafe to carry around from one API to another.If it annoys you in the perspective of the flow table, maybe we can turn it into a loopback address, when we see it's used as source or destination?It appears the kernel already does that (IMO, correctly). If we do see a 0.0.0.0 here, I think that would be the kernel indicating some sort of special case that would need special handling. So, again, I don't think we should just blindly copy this address to other APIs.If that's problematic as well, I can totally live with this patch, though. About IPv6: +----------------------+---------------------+ | Attribute | Value | +----------------------+---------------------+ | Address Block | ::/128 | | Name | Unspecified Address | | RFC | [RFC4291] | | Allocation Date | February 2006 | | Termination Date | N/A | | Source | True | | Destination | False | | Forwardable | False | | Global | False | | Reserved-by-Protocol | True | +----------------------+---------------------+ ...this is more specific, and its usage as source address is exemplified in RFC 4291, 2.5.2: One example of its use is in the Source Address field of any IPv6 packets sent by an initializing host before it has learned its own address. ...which should never be the case for any flow passt has to handle, so I think it's fine to refuse it in tcp_listen_handler().Well, depends how broadly you define "flow". We may need to handle this for NDP and/or DHCPv6. But this is specifically for TCP.The rest of the series, up to 22/22, looks good to me.-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonIt's not entirely clear (at least to me) if it's strictly against the RFCs to do so, but at any rate the socket interfaces often treat those values specially[1], so it's not really possible to manipulate such connections. Likewise they should not have broadcast or multicast addresses for either endpoint. However, nothing prevents a guest from creating a SYN packet with such values, and it's not entirely clear what the effect on passt would be. To ensure sane behaviour, explicitly check for this case and drop such packets, logging a debug warning (we don't want a higher level, because that would allow a guest to spam the logs). We never expect such an address on an accept()ed socket either, but just in case, check for it as well. [1] Depending on context as "unknown", "match any" or "kernel, pick something for me" Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 67 insertions(+), 7 deletions(-) diff --git a/tcp.c b/tcp.c index 31d4e87b..236a8d23 100644 --- a/tcp.c +++ b/tcp.c @@ -284,6 +284,7 @@ #include <sys/types.h> #include <sys/uio.h> #include <time.h> +#include <arpa/inet.h> #include <linux/tcp.h> /* For struct tcp_info */ @@ -1930,27 +1931,59 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af, const struct tcphdr *th, const char *opts, size_t optlen, const struct timespec *now) { + in_port_t srcport = ntohs(th->source); + in_port_t dstport = ntohs(th->dest); struct sockaddr_in addr4 = { .sin_family = AF_INET, - .sin_port = th->dest, + .sin_port = htons(dstport), .sin_addr = *(struct in_addr *)daddr, }; struct sockaddr_in6 addr6 = { .sin6_family = AF_INET6, - .sin6_port = th->dest, + .sin6_port = htons(dstport), .sin6_addr = *(struct in6_addr *)daddr, }; const struct sockaddr *sa; struct tcp_tap_conn *conn; union flow *flow; + int s = -1, mss; socklen_t sl; - int s, mss; - - (void)saddr; if (!(flow = flow_alloc())) return; + if (af == AF_INET) { + if (IN4_IS_ADDR_UNSPECIFIED(saddr) || + IN4_IS_ADDR_BROADCAST(saddr) || + IN4_IS_ADDR_MULTICAST(saddr) || srcport == 0 || + IN4_IS_ADDR_UNSPECIFIED(daddr) || + IN4_IS_ADDR_BROADCAST(daddr) || + IN4_IS_ADDR_MULTICAST(daddr) || dstport == 0) { + char sstr[INET_ADDRSTRLEN], dstr[INET_ADDRSTRLEN]; + + debug("Invalid endpoint in TCP SYN: %s:%hu -> %s:%hu", + inet_ntop(AF_INET, saddr, sstr, sizeof(sstr)), + srcport, + inet_ntop(AF_INET, daddr, dstr, sizeof(dstr)), + dstport); + goto cancel; + } + } else if (af == AF_INET6) { + if (IN6_IS_ADDR_UNSPECIFIED(saddr) || + IN6_IS_ADDR_MULTICAST(saddr) || srcport == 0 || + IN6_IS_ADDR_UNSPECIFIED(daddr) || + IN6_IS_ADDR_MULTICAST(daddr) || dstport == 0) { + char sstr[INET6_ADDRSTRLEN], dstr[INET6_ADDRSTRLEN]; + + debug("Invalid endpoint in TCP SYN: %s:%hu -> %s:%hu", + inet_ntop(AF_INET6, saddr, sstr, sizeof(sstr)), + srcport, + inet_ntop(AF_INET6, daddr, dstr, sizeof(dstr)), + dstport); + goto cancel; + } + } + if ((s = tcp_conn_sock(c, af)) < 0) goto cancel; @@ -2001,8 +2034,8 @@ static void tcp_conn_from_tap(struct ctx *c, sa_family_t af, sl = sizeof(addr6); } - conn->fport = ntohs(th->dest); - conn->eport = ntohs(th->source); + conn->fport = dstport; + conn->eport = srcport; conn->seq_init_from_tap = ntohl(th->seq); conn->seq_from_tap = conn->seq_init_from_tap + 1; @@ -2732,6 +2765,33 @@ void tcp_listen_handler(struct ctx *c, union epoll_ref ref, if (s < 0) goto cancel; + if (sa.sa_family == AF_INET) { + const struct in_addr *addr = &sa.sa4.sin_addr; + in_port_t port = sa.sa4.sin_port; + + if (IN4_IS_ADDR_UNSPECIFIED(addr) || + IN4_IS_ADDR_BROADCAST(addr) || + IN4_IS_ADDR_MULTICAST(addr) || port == 0) { + char str[INET_ADDRSTRLEN]; + + err("Invalid endpoint from TCP accept(): %s:%hu", + inet_ntop(AF_INET, addr, str, sizeof(str)), port); + goto cancel; + } + } else if (sa.sa_family == AF_INET6) { + const struct in6_addr *addr = &sa.sa6.sin6_addr; + in_port_t port = sa.sa6.sin6_port; + + if (IN6_IS_ADDR_UNSPECIFIED(addr) || + IN6_IS_ADDR_MULTICAST(addr) || port == 0) { + char str[INET6_ADDRSTRLEN]; + + err("Invalid endpoint from TCP accept(): %s:%hu", + inet_ntop(AF_INET6, addr, str, sizeof(str)), port); + goto cancel; + } + } + if (tcp_splice_conn_from_sock(c, ref.tcp_listen.pif, ref.tcp_listen.port, flow, s, &sa)) return;