[PATCH v4 0/8] Add vhost-user support to passt (part 1)
v4: - rebase - see detailed v4 history log in each patch v3: - add a patch that has been extracted from: "tcp: extract buffer management from tcp_send_flag()" -> "tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()" - see detailed v3 history log in each patch - I didn't address the alignment problem when we provide a pointer to a sub-structure in the internal buffer structure. (for the last patches of the series). v2 comparing to vhost-user full part: - part 1 includes only preliminary patches (checksum, iovec, cleanup) - see detailed v2 history log in each patch. Full series v1 available at: [PATCH 00/24] Add vhost-user support to passt. https://url.corp.redhat.com/passt-vhost-user-v1 Thanks, Laurent Laurent Vivier (8): pcap: add pcap_iov() checksum: align buffers checksum: add csum_iov() util: move IP stuff from util.[ch] to ip.[ch] checksum: use csum_ip4_header() in udp.c and tcp.c checksum: introduce functions to compute the header part checksum for TCP/UDP tap: make tap_update_mac() generic tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers() Makefile | 10 +-- checksum.c | 173 ++++++++++++++++++++++++++++++----------- checksum.h | 9 ++- conf.c | 1 + dhcp.c | 1 + flow.c | 1 + fwd.c | 1 + icmp.c | 1 + inany.c | 1 + ip.c | 72 +++++++++++++++++ ip.h | 86 +++++++++++++++++++++ ndp.c | 1 + pcap.c | 26 ++++++- pcap.h | 1 + qrap.c | 1 + tap.c | 14 ++-- tap.h | 2 +- tcp.c | 214 +++++++++++++++++++++++++++++---------------------- tcp_splice.c | 1 + udp.c | 38 ++++----- util.c | 55 ------------- util.h | 76 ------------------ 22 files changed, 475 insertions(+), 310 deletions(-) create mode 100644 ip.c create mode 100644 ip.h -- 2.42.0
Introduce a new function pcap_iov() to capture packet desribed by an IO
vector.
Update pcap_frame() to manage iovcnt > 1.
Signed-off-by: Laurent Vivier
On Thu, Feb 29, 2024 at 05:59:48PM +0100, Laurent Vivier wrote:
Introduce a new function pcap_iov() to capture packet desribed by an IO vector.
Update pcap_frame() to manage iovcnt > 1.
Yikes. I hadn't actually realised my version only worked for iovcnt == 1.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson
---
Notes: v4: - use pcap_frame()
v3: - update rationale - update comment - use strerror(errno) - use size_t for io vector length
v2: - introduce pcap_header(), a common helper to write packet header - use writev() rather than write() in a loop - add functions comment
pcap.c | 26 ++++++++++++++++++++++---- pcap.h | 1 + 2 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/pcap.c b/pcap.c index a4057b5f9c6a..372f02045262 100644 --- a/pcap.c +++ b/pcap.c @@ -32,6 +32,7 @@ #include "passt.h" #include "log.h" #include "pcap.h" +#include "iov.h"
#define PCAP_VERSION_MINOR 4
@@ -78,7 +79,7 @@ struct pcap_pkthdr { static void pcap_frame(const struct iovec *iov, size_t iovcnt, size_t offset, const struct timeval *tv) { - size_t len = iov->iov_len - offset; + size_t len = iov_size(iov, iovcnt) - offset; struct pcap_pkthdr h = { .tv_sec = tv->tv_sec, .tv_usec = tv->tv_usec, @@ -87,10 +88,8 @@ static void pcap_frame(const struct iovec *iov, size_t iovcnt, }; struct iovec hiov = { &h, sizeof(h) };
- (void)iovcnt; - if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || - write_remainder(pcap_fd, iov, 1, offset) < 0) { + write_remainder(pcap_fd, iov, iovcnt, offset) < 0) { debug("Cannot log packet, length %zu: %s", len, strerror(errno)); } @@ -135,6 +134,25 @@ void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, pcap_frame(iov + i * frame_parts, frame_parts, offset, &tv); }
+/* + * pcap_iov - Write packet data described by an I/O vector + * to a pcap file descriptor. + * + * @iov: Pointer to the array of struct iovec describing the I/O vector + * containing packet data to write, including L2 header + * @iovcnt: Number of buffers (@iov entries) + */ +void pcap_iov(const struct iovec *iov, size_t iovcnt) +{ + struct timeval tv; + + if (pcap_fd == -1) + return; + + gettimeofday(&tv, NULL); + pcap_frame(iov, iovcnt, 0, &tv); +} + /** * pcap_init() - Initialise pcap file * @c: Execution context diff --git a/pcap.h b/pcap.h index 85fc58e57572..b1c4c909c109 100644 --- a/pcap.h +++ b/pcap.h @@ -9,6 +9,7 @@ void pcap(const char *pkt, size_t len); void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, size_t offset); +void pcap_iov(const struct iovec *iov, size_t n); void pcap_init(struct ctx *c);
#endif /* PCAP_H */
-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
If buffer is not aligned use sum_16b() only on the not aligned
part, and then use csum_avx2() on the remaining part
Remove unneeded now function csum_unaligned().
Signed-off-by: Laurent Vivier
Introduce the function csum_unfolded() that computes the unfolded
32-bit checksum of a data buffer, and call it from csum() that returns
the folded value.
Introduce csum_iov() that computes the checksum using csum_folded() on
all vectors of the iovec array and returns the folded result.
Signed-off-by: Laurent Vivier
Introduce ip.[ch] file to encapsulate IP protocol handling functions and
structures. Modify various files to include the new header ip.h when
it's needed.
Signed-off-by: Laurent Vivier
On Thu, Feb 29, 2024 at 05:59:51PM +0100, Laurent Vivier wrote:
Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson --- Notes: v4: - rebase
v3: - rewrap rationale - add David's R-b
v2: - update rationale and comments
Makefile | 10 +++--- conf.c | 1 + dhcp.c | 1 + flow.c | 1 + fwd.c | 1 + icmp.c | 1 + inany.c | 1 + ip.c | 72 +++++++++++++++++++++++++++++++++++++++++++ ip.h | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++
Noe that ip.h exists, in4addr_loopback and in4addr_any should probably go in there rather than inany.h. That can be a followup change though. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On 3/1/24 00:19, David Gibson wrote:
On Thu, Feb 29, 2024 at 05:59:51PM +0100, Laurent Vivier wrote:
Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson --- Notes: v4: - rebase
v3: - rewrap rationale - add David's R-b
v2: - update rationale and comments
Makefile | 10 +++--- conf.c | 1 + dhcp.c | 1 + flow.c | 1 + fwd.c | 1 + icmp.c | 1 + inany.c | 1 + ip.c | 72 +++++++++++++++++++++++++++++++++++++++++++ ip.h | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++
Noe that ip.h exists, in4addr_loopback and in4addr_any should probably go in there rather than inany.h. That can be a followup change though.
Not sure they should be moved to ip.h: they depend on inany_loopback4 and inany_any4 that are both defined in inany.c and the structure inany_addr is defined in inany.h I think it's better to let them where they are. Thanks, Laurent
We can find the same function to compute the IPv4 header
checksum in tcp.c, udp.c and tap.c
Use the function defined for tap.c, csum_ip4_header(), but
with the code used in tcp.c and udp.c as it doesn't need a fully
initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
Signed-off-by: Laurent Vivier
On Thu, Feb 29, 2024 at 05:59:52PM +0100, Laurent Vivier wrote:
We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c
Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson --- Notes: v4: - rebase
v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr
v2: - use csum_ip4_header() from checksum.c - use code from tcp.c and udp.c in csum_ip4_header() - use "const struct iphfr *", check is not updated by the function but by the caller.
checksum.c | 17 +++++++++++++---- checksum.h | 3 ++- tap.c | 3 ++- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 22 insertions(+), 45 deletions(-)
diff --git a/checksum.c b/checksum.c index 74e3742bc6f6..511b296a9a80 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include
#include "util.h" +#include "ip.h" #include "checksum.h"
/* Checksums are optional for UDP over IPv4, so we usually just set @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum) uint16_t csum(const void *buf, size_t len, uint32_t init);
/** - * csum_ip4_header() - Calculate and set IPv4 header checksum + * csum_ip4_header() - Calculate IPv4 header checksum * @ip4h: IPv4 header
Function comment needs to be updated for the new parameters. In particular it needs to note that tot_len, saddr and daddr are all passed in network order. As noted elsewhere, I kind of hate passing non-host-endian values in plain integer types, but I can see why doing otherwise here would be very awkward.
*/ -void csum_ip4_header(struct iphdr *ip4h) +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) { - ip4h->check = 0; - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); + uint32_t sum = L2_BUF_IP4_PSUM(protocol); + + sum += tot_len; + sum += (saddr >> 16) & 0xffff; + sum += saddr & 0xffff; + sum += (daddr >> 16) & 0xffff; + sum += daddr & 0xffff; + + return ~csum_fold(sum); }
/** diff --git a/checksum.h b/checksum.h index dfa705a04a24..92db73612b6e 100644 --- a/checksum.h +++ b/checksum.h @@ -13,7 +13,8 @@ struct icmp6hdr; uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); -void csum_ip4_header(struct iphdr *ip4h); +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); diff --git a/tap.c b/tap.c index d35d8944fc41..d4649f0167ab 100644 --- a/tap.c +++ b/tap.c @@ -161,7 +161,8 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst, ip4h->protocol = proto; ip4h->saddr = src.s_addr; ip4h->daddr = dst.s_addr; - csum_ip4_header(ip4h); + ip4h->check = csum_ip4_header(ip4h->tot_len, proto, + src.s_addr, dst.s_addr); return ip4h + 1; }
diff --git a/tcp.c b/tcp.c index e0588f92e65f..ea0802c6b102 100644 --- a/tcp.c +++ b/tcp.c @@ -935,23 +935,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) trace("TCP: failed to set SO_SNDBUF to %i", v); }
-/** - * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one - * @buf: L2 packet buffer with final IPv4 header - */ -static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf) -{ - uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP); - - sum += buf->iph.tot_len; - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - - buf->iph.check = (uint16_t)~csum_fold(sum); -} - /** * tcp_update_check_tcp4() - Update TCP checksum from stored one * @buf: L2 packet buffer with final IPv4 header @@ -1394,10 +1377,9 @@ do { \ b->iph.saddr = a4->s_addr; b->iph.daddr = c->ip4.addr_seen.s_addr;
- if (check) - b->iph.check = *check; - else - tcp_update_check_ip4(b); + b->iph.check = check ? *check : + csum_ip4_header(b->iph.tot_len, IPPROTO_TCP, + b->iph.saddr, b->iph.daddr);
SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
diff --git a/udp.c b/udp.c index 26774df7018c..d517c99dcc69 100644 --- a/udp.c +++ b/udp.c @@ -275,23 +275,6 @@ static void udp_invert_portmap(struct udp_fwd_ports *fwd) } }
-/** - * udp_update_check4() - Update checksum with variable parts from stored one - * @buf: L2 packet buffer with final IPv4 header - */ -static void udp_update_check4(struct udp4_l2_buf_t *buf) -{ - uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP); - - sum += buf->iph.tot_len; - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - - buf->iph.check = (uint16_t)~csum_fold(sum); -} - /** * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses * @eth_d: Ethernet destination address, NULL if unchanged @@ -619,7 +602,8 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport, b->iph.saddr = src->s_addr; }
- udp_update_check4(b); + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + b->iph.saddr, b->iph.daddr); b->uh.source = b->s_in.sin_port; b->uh.dest = htons(dstport); b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On 3/1/24 00:25, David Gibson wrote:
On Thu, Feb 29, 2024 at 05:59:52PM +0100, Laurent Vivier wrote:
We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c
Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr.
Signed-off-by: Laurent Vivier
Reviewed-by: David Gibson --- Notes: v4: - rebase
v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr
v2: - use csum_ip4_header() from checksum.c - use code from tcp.c and udp.c in csum_ip4_header() - use "const struct iphfr *", check is not updated by the function but by the caller.
checksum.c | 17 +++++++++++++---- checksum.h | 3 ++- tap.c | 3 ++- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 22 insertions(+), 45 deletions(-)
diff --git a/checksum.c b/checksum.c index 74e3742bc6f6..511b296a9a80 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include
#include "util.h" +#include "ip.h" #include "checksum.h"
/* Checksums are optional for UDP over IPv4, so we usually just set @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum) uint16_t csum(const void *buf, size_t len, uint32_t init);
/** - * csum_ip4_header() - Calculate and set IPv4 header checksum + * csum_ip4_header() - Calculate IPv4 header checksum * @ip4h: IPv4 header
Function comment needs to be updated for the new parameters. In particular it needs to note that tot_len, saddr and daddr are all passed in network order.
As noted elsewhere, I kind of hate passing non-host-endian values in plain integer types, but I can see why doing otherwise here would be very awkward.
Perhaps we can use __be16 and __be32 types to really show the endianness in the code? Thanks, Laurent
The TCP and UDP checksums are computed using the data in the TCP/UDP
payload but also some informations in the IP header (protocol,
length, source and destination addresses).
We add two functions, proto_ipv4_header_psum() and
proto_ipv6_header_psum(), to compute the checksum of the IP
header part.
Signed-off-by: Laurent Vivier
On Thu, Feb 29, 2024 at 05:59:53PM +0100, Laurent Vivier wrote:
The TCP and UDP checksums are computed using the data in the TCP/UDP payload but also some informations in the IP header (protocol, length, source and destination addresses).
We add two functions, proto_ipv4_header_psum() and proto_ipv6_header_psum(), to compute the checksum of the IP header part.
Signed-off-by: Laurent Vivier
--- Notes: v4: - fix payload length endianness
v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr/ipv6hdr
v2: - move new function to checksum.c - use _psum rather than _checksum in the name - replace csum_udp4() and csum_udp6() by the new function
checksum.c | 69 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 4 ++++ tcp.c | 45 ++++++++++++++++------------------- udp.c | 13 ++++++---- 4 files changed, 86 insertions(+), 45 deletions(-)
diff --git a/checksum.c b/checksum.c index 511b296a9a80..93c8d5205c2b 100644 --- a/checksum.c +++ b/checksum.c @@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, return ~csum_fold(sum); }
+/** + * proto_ipv4_header_psum() - Calculates the partial checksum of an + * IPv4 header for UDP or TCP + * @tot_len: IPv4 Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * @proto: proto Protocol number
Needs to note that tot_len is in host order, but saddr and daddr are in network order. Usually, I'd take host order as assumed for a plain integer type, but since it's mixed here, we should annotate them all. Alternatively, we could pass saddr and daddr as struct in_addr. In general I've tried to pass IPv4 addresses with that type, rather than in_addr_t or uint32_t. Looking at the callers, it seems like it's a mixed bag whether that's messier or cleaner in this case.
+ * Returns: Partial checksum of the IPv4 header + */ +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) +{ + uint32_t psum = htons(protocol); + + psum += (saddr >> 16) & 0xffff; + psum += saddr & 0xffff; + psum += (daddr >> 16) & 0xffff; + psum += daddr & 0xffff; + psum += htons(tot_len); + + return psum; +} + /** * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet * @udp4hr: UDP header, initialised apart from checksum @@ -150,14 +174,12 @@ void csum_udp4(struct udphdr *udp4hr, udp4hr->check = 0;
if (UDP4_REAL_CHECKSUMS) { - /* UNTESTED: if we did want real UDPv4 checksums, this - * is roughly what we'd need */ - uint32_t psum = csum_fold(saddr.s_addr) - + csum_fold(daddr.s_addr) - + htons(len + sizeof(*udp4hr)) - + htons(IPPROTO_UDP); - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp4hr, sizeof(*udp4hr)); + uint16_t tot_len = len + sizeof(struct udphdr); + uint32_t psum = proto_ipv4_header_psum(tot_len, + IPPROTO_UDP, + saddr.s_addr, + daddr.s_addr); + psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum); udp4hr->check = csum(payload, len, psum); } } @@ -180,6 +202,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) icmp4hr->checksum = csum(payload, len, psum); }
+/** + * proto_ipv6_header_psum() - Calculates the partial checksum of an + * IPv6 header for UDP or TCP + * @payload_len: IPv6 payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * Returns: Partial checksum of the IPv6 header + */ +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr)
I don't see any point to passing the addresses by value here. You take their address, so they must be written back to memory if passed in registers. At the call sites, you still have the dereference so it doesn't help with alignment.
+{ + uint32_t sum = htons(protocol) + htons(payload_len); + + sum += sum_16b(&saddr, sizeof(saddr)); + sum += sum_16b(&daddr, sizeof(daddr)); + + return sum; +} + /** * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet * @udp6hr: UDP header, initialised apart from checksum @@ -190,14 +232,11 @@ void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len) { - /* Partial checksum for the pseudo-IPv6 header */ - uint32_t psum = sum_16b(saddr, sizeof(*saddr)) + - sum_16b(daddr, sizeof(*daddr)) + - htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP); - + uint32_t psum = proto_ipv6_header_psum(len + sizeof(struct udphdr), + IPPROTO_UDP, *saddr, *daddr); udp6hr->check = 0; - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp6hr, sizeof(*udp6hr)); + + psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum); udp6hr->check = csum(payload, len, psum); }
diff --git a/checksum.h b/checksum.h index 92db73612b6e..b2b5b8e8b77e 100644 --- a/checksum.h +++ b/checksum.h @@ -15,10 +15,14 @@ uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, uint32_t saddr, uint32_t daddr); +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len); +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr); void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); diff --git a/tcp.c b/tcp.c index ea0802c6b102..d78efa5401bb 100644 --- a/tcp.c +++ b/tcp.c @@ -939,39 +939,30 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) * tcp_update_check_tcp4() - Update TCP checksum from stored one * @buf: L2 packet buffer with final IPv4 header
Function comment no longer matches the parameters.
*/ -static void tcp_update_check_tcp4(struct tcp4_l2_buf_t *buf) +static void tcp_update_check_tcp4(struct iphdr *iph)
Hmm... so this takes only a pointer to iph, but writes to the TCP header it assumes is beyond that, and reads from the payload it assumes is beyond that. That seems like a dangerous interface to me (not to mention that I fear it could trigger TBAA traps).
{ - uint16_t tlen = ntohs(buf->iph.tot_len) - 20; - uint32_t sum = htons(IPPROTO_TCP); + uint16_t tlen = ntohs(iph->tot_len) - sizeof(struct iphdr); + uint32_t sum = proto_ipv4_header_psum(tlen, IPPROTO_TCP, + iph->saddr, iph->daddr); + struct tcphdr *th = (struct tcphdr *)(iph + 1);
- sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - sum += htons(ntohs(buf->iph.tot_len) - 20); - - buf->th.check = 0; - buf->th.check = csum(&buf->th, tlen, sum); + th->check = 0; + th->check = csum(th, tlen, sum); }
/** * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6 * @buf: L2 packet buffer with final IPv6 header */ -static void tcp_update_check_tcp6(struct tcp6_l2_buf_t *buf) +static void tcp_update_check_tcp6(struct ipv6hdr *ip6h)
Same comments as for the IPv4 version.
{ - int len = ntohs(buf->ip6h.payload_len) + sizeof(struct ipv6hdr); - - buf->ip6h.hop_limit = IPPROTO_TCP; - buf->ip6h.version = 0; - buf->ip6h.nexthdr = 0; + struct tcphdr *th = (struct tcphdr *)(ip6h + 1); + uint16_t payload_len = ntohs(ip6h->payload_len); + uint32_t sum = proto_ipv6_header_psum(payload_len, IPPROTO_TCP, + ip6h->saddr, ip6h->daddr);
- buf->th.check = 0; - buf->th.check = csum(&buf->ip6h, len, 0); - - buf->ip6h.hop_limit = 255; - buf->ip6h.version = 6; - buf->ip6h.nexthdr = IPPROTO_TCP; + th->check = 0; + th->check = csum(th, payload_len, sum); }
/** @@ -1383,7 +1374,7 @@ do { \
SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
- tcp_update_check_tcp4(b); + tcp_update_check_tcp4(&b->iph);
tlen = tap_iov_len(c, &b->taph, ip_len); } else { @@ -1402,7 +1393,11 @@ do { \
SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq);
- tcp_update_check_tcp6(b); + tcp_update_check_tcp6(&b->ip6h); + + b->ip6h.hop_limit = 255; + b->ip6h.version = 6; + b->ip6h.nexthdr = IPPROTO_TCP;
b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf; b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff; diff --git a/udp.c b/udp.c index d517c99dcc69..410ace16a6a2 100644 --- a/udp.c +++ b/udp.c @@ -625,6 +625,7 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, { struct udp6_l2_buf_t *b = &udp6_l2_buf[n]; struct in6_addr *src; + uint16_t payload_len; in_port_t src_port; size_t ip_len;
@@ -633,7 +634,8 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
ip_len = udp6_l2_mh_sock[n].msg_len + sizeof(b->ip6h) + sizeof(b->uh);
- b->ip6h.payload_len = htons(udp6_l2_mh_sock[n].msg_len + sizeof(b->uh)); + payload_len = udp6_l2_mh_sock[n].msg_len + sizeof(b->uh); + b->ip6h.payload_len = htons(payload_len);
if (IN6_IS_ADDR_LINKLOCAL(src)) { b->ip6h.daddr = c->ip6.addr_ll_seen; @@ -675,10 +677,11 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, b->uh.source = b->s_in6.sin6_port; b->uh.dest = htons(dstport); b->uh.len = b->ip6h.payload_len; - - b->ip6h.hop_limit = IPPROTO_UDP; - b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0; - b->uh.check = csum(&b->ip6h, ip_len, 0); + b->uh.check = 0; + b->uh.check = csum(&b->uh, payload_len, + proto_ipv6_header_psum(payload_len, IPPROTO_UDP, + b->ip6h.saddr, + b->ip6h.daddr)); b->ip6h.version = 6; b->ip6h.nexthdr = IPPROTO_UDP; b->ip6h.hop_limit = 255;
-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Use ethhdr rather than tap_hdr.
Signed-off-by: Laurent Vivier
Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function
tcp_fill_header().
Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to
tcp_fill_ipv4_header() and tcp_fill_ipv6_header()
Signed-off-by: Laurent Vivier
On Thu, Feb 29, 2024 at 05:59:55PM +0100, Laurent Vivier wrote:
Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function tcp_fill_header().
Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to tcp_fill_ipv4_header() and tcp_fill_ipv6_header()
Signed-off-by: Laurent Vivier
--- Notes: v4: - group all the ip6g initialisations together and remove flow_lbl preset to 0 - add ASSERT(a4)
v3: - add to sub-series part 1
v2: - extract header filling functions from "tcp: extract buffer management from tcp_send_flag()" - rename them tcp_fill_flag_header()/tcp_fill_ipv4_header(), tcp_fill_ipv6_header() - use upside-down Christmas tree arguments order - replace (void *) by (struct tcphdr *)
tcp.c | 154 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 104 insertions(+), 50 deletions(-)
diff --git a/tcp.c b/tcp.c index 5b2fdf662a6c..ced22534a103 100644 --- a/tcp.c +++ b/tcp.c @@ -1326,6 +1326,108 @@ void tcp_defer_handler(struct ctx *c) tcp_l2_data_buf_flush(c); }
+/** + * tcp_fill_header() - Fill the TCP header fields for a given TCP segment. + * + * @th: Pointer to the TCP header structure + * @conn: Pointer to the TCP connection structure + * @seq: Sequence number + */ +static void tcp_fill_header(struct tcphdr *th, + const struct tcp_tap_conn *conn, uint32_t seq) +{ + th->source = htons(conn->fport); + th->dest = htons(conn->eport); + th->seq = htonl(seq); + th->ack_seq = htonl(conn->seq_ack_to_tap); + if (conn->events & ESTABLISHED) { + th->window = htons(conn->wnd_to_tap); + } else { + unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; + + th->window = htons(MIN(wnd, USHRT_MAX)); + } +} + +/** + * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers
I don't love the name, since it does also fill the TCP header. Maybe 'tcp_fill_headers4()'?
+ * @c: Execution context + * @conn: Connection pointer + * @iph: Pointer to IPv4 header, immediately followed by a TCP header
Again, really don't love accessing beyond a given pointer's type.
+ * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: IP frame length including L2 headers, host order
AFAICT the return value does *not* include the L2 headers..
+ */ +static size_t tcp_fill_ipv4_header(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct iphdr *iph, size_t plen, + const uint16_t *check, uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); + const struct in_addr *a4 = inany_v4(&conn->faddr); + struct tcphdr *th = (struct tcphdr *)(iph + 1); + + ASSERT(a4); + + iph->tot_len = htons(ip_len); + iph->saddr = a4->s_addr; + iph->daddr = c->ip4.addr_seen.s_addr; + + iph->check = check ? *check : + csum_ip4_header(iph->tot_len, IPPROTO_TCP, + iph->saddr, iph->daddr); + + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp4(iph);
It's a bit ugly that tcp_fill_header() fills the TCP header, but *not* the checksum. Could we handle this by passing the pseudo-header psum into tcp_fill_header()? Then the logic for that in tcp_update_check_tcp4() would become part of this function.
+ return ip_len; +} + +/** + * tcp_fill_ipv6_header() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers + * @c: Execution context + * @conn: Connection pointer + * @ip6h: Pointer to IPv6 header, immediately followed by a TCP header + * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: The total length of the IPv6 packet, host order + */ +static size_t tcp_fill_ipv6_header(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct ipv6hdr *ip6h, size_t plen, + uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); + struct tcphdr *th = (struct tcphdr *)(ip6h + 1); + + ip6h->payload_len = htons(plen + sizeof(struct tcphdr)); + ip6h->saddr = conn->faddr.a6; + if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr)) + ip6h->daddr = c->ip6.addr_ll_seen; + else + ip6h->daddr = c->ip6.addr_seen; + + ip6h->hop_limit = 255; + ip6h->version = 6; + ip6h->nexthdr = IPPROTO_TCP; + + ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf; + ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff; + ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff; + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp6(ip6h); + + return ip_len; +} + /** * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers * @c: Execution context @@ -1345,67 +1447,19 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c, const struct in_addr *a4 = inany_v4(&conn->faddr); size_t ip_len, tlen;
-#define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq) \ -do { \ - b->th.source = htons(conn->fport); \ - b->th.dest = htons(conn->eport); \ - b->th.seq = htonl(seq); \ - b->th.ack_seq = htonl(conn->seq_ack_to_tap); \ - if (conn->events & ESTABLISHED) { \ - b->th.window = htons(conn->wnd_to_tap); \ - } else { \ - unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; \ - \ - b->th.window = htons(MIN(wnd, USHRT_MAX)); \ - } \ -} while (0) - if (a4) { struct tcp4_l2_buf_t *b = (struct tcp4_l2_buf_t *)p;
- ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); - b->iph.tot_len = htons(ip_len); - b->iph.saddr = a4->s_addr; - b->iph.daddr = c->ip4.addr_seen.s_addr; - - b->iph.check = check ? *check : - csum_ip4_header(b->iph.tot_len, IPPROTO_TCP, - b->iph.saddr, b->iph.daddr); - - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - - tcp_update_check_tcp4(&b->iph); + ip_len = tcp_fill_ipv4_header(c, conn, &b->iph, plen, check, seq);
tlen = tap_iov_len(c, &b->taph, ip_len); } else { struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p;
- ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); - - b->ip6h.payload_len = htons(plen + sizeof(struct tcphdr)); - b->ip6h.saddr = conn->faddr.a6; - if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr)) - b->ip6h.daddr = c->ip6.addr_ll_seen; - else - b->ip6h.daddr = c->ip6.addr_seen; - - memset(b->ip6h.flow_lbl, 0, 3); - - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - - tcp_update_check_tcp6(&b->ip6h); - - b->ip6h.hop_limit = 255; - b->ip6h.version = 6; - b->ip6h.nexthdr = IPPROTO_TCP; - - b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf; - b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff; - b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff; + ip_len = tcp_fill_ipv6_header(c, conn, &b->ip6h, plen, seq);
tlen = tap_iov_len(c, &b->taph, ip_len); } -#undef SET_TCP_HEADER_COMMON_V4_V6
return tlen; }
-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On 3/1/24 00:54, David Gibson wrote:
On Thu, Feb 29, 2024 at 05:59:55PM +0100, Laurent Vivier wrote:
Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function tcp_fill_header().
Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to tcp_fill_ipv4_header() and tcp_fill_ipv6_header()
Signed-off-by: Laurent Vivier
--- Notes: v4: - group all the ip6g initialisations together and remove flow_lbl preset to 0 - add ASSERT(a4)
v3: - add to sub-series part 1
v2: - extract header filling functions from "tcp: extract buffer management from tcp_send_flag()" - rename them tcp_fill_flag_header()/tcp_fill_ipv4_header(), tcp_fill_ipv6_header() - use upside-down Christmas tree arguments order - replace (void *) by (struct tcphdr *)
tcp.c | 154 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 104 insertions(+), 50 deletions(-)
diff --git a/tcp.c b/tcp.c index 5b2fdf662a6c..ced22534a103 100644 --- a/tcp.c +++ b/tcp.c @@ -1326,6 +1326,108 @@ void tcp_defer_handler(struct ctx *c) tcp_l2_data_buf_flush(c); }
+/** + * tcp_fill_header() - Fill the TCP header fields for a given TCP segment. + * + * @th: Pointer to the TCP header structure + * @conn: Pointer to the TCP connection structure + * @seq: Sequence number + */ +static void tcp_fill_header(struct tcphdr *th, + const struct tcp_tap_conn *conn, uint32_t seq) +{ + th->source = htons(conn->fport); + th->dest = htons(conn->eport); + th->seq = htonl(seq); + th->ack_seq = htonl(conn->seq_ack_to_tap); + if (conn->events & ESTABLISHED) { + th->window = htons(conn->wnd_to_tap); + } else { + unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; + + th->window = htons(MIN(wnd, USHRT_MAX)); + } +} + +/** + * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers
I don't love the name, since it does also fill the TCP header. Maybe 'tcp_fill_headers4()'?
+ * @c: Execution context + * @conn: Connection pointer + * @iph: Pointer to IPv4 header, immediately followed by a TCP header
Again, really don't love accessing beyond a given pointer's type.
+ * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: IP frame length including L2 headers, host order
AFAICT the return value does *not* include the L2 headers..
+ */ +static size_t tcp_fill_ipv4_header(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct iphdr *iph, size_t plen, + const uint16_t *check, uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); + const struct in_addr *a4 = inany_v4(&conn->faddr); + struct tcphdr *th = (struct tcphdr *)(iph + 1); + + ASSERT(a4); + + iph->tot_len = htons(ip_len); + iph->saddr = a4->s_addr; + iph->daddr = c->ip4.addr_seen.s_addr; + + iph->check = check ? *check : + csum_ip4_header(iph->tot_len, IPPROTO_TCP, + iph->saddr, iph->daddr); + + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp4(iph);
It's a bit ugly that tcp_fill_header() fills the TCP header, but *not* the checksum. Could we handle this by passing the pseudo-header psum into tcp_fill_header()? Then the logic for that in tcp_update_check_tcp4() would become part of this function.
The problem with that is we must also pass the payload (that is after the TCP header) and the payload length. So we need to add two parameters to tcp_fill_header(); psum and payload_length (guessing also the payload is following "th"). Moreover in the vhost-user part tcp_update_check_tcp4() is now called conditionally, because the checksum is computed in the vhost-code as the payload is stripped along several iovecs. I'm going to send my v5 without updating this part. If you really think it should be done differently please give me more details (considering also the vhost-user part). Thanks, Laurent
participants (2)
-
David Gibson
-
Laurent Vivier