Stefano, you may have already done this, but I hit the same issue trying to rebase my own code on the vhost-user stuff. This is Laurent's most recent vhost-user part 1 series, but with fixes applied for the cppcheck and clang-tidy regressions introduced. Laurent Vivier (9): pcap: add pcap_iov() checksum: align buffers checksum: add csum_iov() util: move IP stuff from util.[ch] to ip.[ch] udp: little cleanup in udp_update_hdrX() to prepare future changes checksum: use csum_ip4_header() in udp.c and tcp.c checksum: introduce functions to compute the header part checksum for TCP/UDP tap: make tap_update_mac() generic tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers() Makefile | 10 +-- checksum.c | 173 ++++++++++++++++++++++++++++++---------- checksum.h | 12 ++- conf.c | 1 + dhcp.c | 1 + flow.c | 1 + fwd.c | 1 + icmp.c | 1 + inany.c | 1 + iov.c | 1 - ip.c | 72 +++++++++++++++++ ip.h | 86 ++++++++++++++++++++ ndp.c | 1 + pcap.c | 27 ++++++- pcap.h | 1 + qrap.c | 1 + tap.c | 13 +-- tap.h | 2 +- tcp.c | 221 +++++++++++++++++++++++++++++---------------------- tcp_splice.c | 1 + udp.c | 82 ++++++++----------- util.c | 55 ------------- util.h | 76 ------------------ 23 files changed, 505 insertions(+), 335 deletions(-) create mode 100644 ip.c create mode 100644 ip.h -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> Introduce a new function pcap_iov() to capture packet desribed by an IO vector. Update pcap_frame() to manage iovcnt > 1. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-2-lvivier(a)redhat.com> --- iov.c | 1 - pcap.c | 27 +++++++++++++++++++++++---- pcap.h | 1 + 3 files changed, 24 insertions(+), 5 deletions(-) diff --git a/iov.c b/iov.c index e3312628..e7b4eb75 100644 --- a/iov.c +++ b/iov.c @@ -146,7 +146,6 @@ size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, * * Returns: The total size in bytes. */ -/* cppcheck-suppress unusedFunction */ size_t iov_size(const struct iovec *iov, size_t iov_cnt) { unsigned int i; diff --git a/pcap.c b/pcap.c index a4057b5f..a0f01ad3 100644 --- a/pcap.c +++ b/pcap.c @@ -32,6 +32,7 @@ #include "passt.h" #include "log.h" #include "pcap.h" +#include "iov.h" #define PCAP_VERSION_MINOR 4 @@ -78,7 +79,7 @@ struct pcap_pkthdr { static void pcap_frame(const struct iovec *iov, size_t iovcnt, size_t offset, const struct timeval *tv) { - size_t len = iov->iov_len - offset; + size_t len = iov_size(iov, iovcnt) - offset; struct pcap_pkthdr h = { .tv_sec = tv->tv_sec, .tv_usec = tv->tv_usec, @@ -87,10 +88,8 @@ static void pcap_frame(const struct iovec *iov, size_t iovcnt, }; struct iovec hiov = { &h, sizeof(h) }; - (void)iovcnt; - if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || - write_remainder(pcap_fd, iov, 1, offset) < 0) { + write_remainder(pcap_fd, iov, iovcnt, offset) < 0) { debug("Cannot log packet, length %zu: %s", len, strerror(errno)); } @@ -135,6 +134,26 @@ void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, pcap_frame(iov + i * frame_parts, frame_parts, offset, &tv); } +/* + * pcap_iov - Write packet data described by an I/O vector + * to a pcap file descriptor. + * + * @iov: Pointer to the array of struct iovec describing the I/O vector + * containing packet data to write, including L2 header + * @iovcnt: Number of buffers (@iov entries) + */ +/* cppcheck-suppress unusedFunction */ +void pcap_iov(const struct iovec *iov, size_t iovcnt) +{ + struct timeval tv; + + if (pcap_fd == -1) + return; + + gettimeofday(&tv, NULL); + pcap_frame(iov, iovcnt, 0, &tv); +} + /** * pcap_init() - Initialise pcap file * @c: Execution context diff --git a/pcap.h b/pcap.h index 85fc58e5..15d46572 100644 --- a/pcap.h +++ b/pcap.h @@ -9,6 +9,7 @@ void pcap(const char *pkt, size_t len); void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, size_t offset); +void pcap_iov(const struct iovec *iov, size_t iovcnt); void pcap_init(struct ctx *c); #endif /* PCAP_H */ -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> If buffer is not aligned use sum_16b() only on the not aligned part, and then use csum_avx2() on the remaining part Remove unneeded now function csum_unaligned(). Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-3-lvivier(a)redhat.com> --- checksum.c | 47 ++++++++++++++++++++++++----------------------- 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/checksum.c b/checksum.c index f21c9b7a..65486b46 100644 --- a/checksum.c +++ b/checksum.c @@ -56,6 +56,8 @@ #include <linux/udp.h> #include <linux/icmpv6.h> +#include "util.h" + /* Checksums are optional for UDP over IPv4, so we usually just set * them to 0. Change this to 1 to calculate real UDP over IPv4 * checksums @@ -110,20 +112,7 @@ uint16_t csum_fold(uint32_t sum) return sum; } -/** - * csum_unaligned() - Compute TCP/IP-style checksum for not 32-byte aligned data - * @buf: Input data - * @len: Input length - * @init: Initial 32-bit checksum, 0 for no pre-computed checksum - * - * Return: 16-bit IPv4-style checksum - */ -/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ -__attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ -uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) -{ - return (uint16_t)~csum_fold(sum_16b(buf, len) + init); -} +uint16_t csum(const void *buf, size_t len, uint32_t init); /** * csum_ip4_header() - Calculate and set IPv4 header checksum @@ -132,7 +121,7 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) void csum_ip4_header(struct iphdr *ip4h) { ip4h->check = 0; - ip4h->check = csum_unaligned(ip4h, (size_t)ip4h->ihl * 4, 0); + ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); } /** @@ -159,7 +148,7 @@ void csum_udp4(struct udphdr *udp4hr, + htons(IPPROTO_UDP); /* Add in partial checksum for the UDP header alone */ psum += sum_16b(udp4hr, sizeof(*udp4hr)); - udp4hr->check = csum_unaligned(payload, len, psum); + udp4hr->check = csum(payload, len, psum); } } @@ -178,7 +167,7 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) /* Partial checksum for ICMP header alone */ psum = sum_16b(icmp4hr, sizeof(*icmp4hr)); - icmp4hr->checksum = csum_unaligned(payload, len, psum); + icmp4hr->checksum = csum(payload, len, psum); } /** @@ -199,7 +188,7 @@ void csum_udp6(struct udphdr *udp6hr, udp6hr->check = 0; /* Add in partial checksum for the UDP header alone */ psum += sum_16b(udp6hr, sizeof(*udp6hr)); - udp6hr->check = csum_unaligned(payload, len, psum); + udp6hr->check = csum(payload, len, psum); } /** @@ -222,7 +211,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr, icmp6hr->icmp6_cksum = 0; /* Add in partial checksum for the ICMPv6 header alone */ psum += sum_16b(icmp6hr, sizeof(*icmp6hr)); - icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum); + icmp6hr->icmp6_cksum = csum(payload, len, psum); } #ifdef __AVX2__ @@ -397,17 +386,29 @@ less_than_128_bytes: /** * csum() - Compute TCP/IP-style checksum - * @buf: Input buffer, must be aligned to 32-byte boundary + * @buf: Input buffer * @len: Input length * @init: Initial 32-bit checksum, 0 for no pre-computed checksum * - * Return: 16-bit folded, complemented checksum sum + * Return: 16-bit folded, complemented checksum */ /* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ uint16_t csum(const void *buf, size_t len, uint32_t init) { - return (uint16_t)~csum_fold(csum_avx2(buf, len, init)); + intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i)); + unsigned int pad = align - (intptr_t)buf; + + if (len < pad) + pad = len; + + if (pad) + init += sum_16b(buf, pad); + + if (len > pad) + init = csum_avx2((void *)align, len - pad, init); + + return (uint16_t)~csum_fold(init); } #else /* __AVX2__ */ @@ -424,7 +425,7 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ uint16_t csum(const void *buf, size_t len, uint32_t init) { - return csum_unaligned(buf, len, init); + return (uint16_t)~csum_fold(sum_16b(buf, len) + init); } #endif /* !__AVX2__ */ -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> Introduce the function csum_unfolded() that computes the unfolded 32-bit checksum of a data buffer, and call it from csum() that returns the folded value. Introduce csum_iov() that computes the checksum using csum_folded() on all vectors of the iovec array and returns the folded result. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-4-lvivier(a)redhat.com> --- checksum.c | 59 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 4 +++- 2 files changed, 49 insertions(+), 14 deletions(-) diff --git a/checksum.c b/checksum.c index 65486b46..643957b5 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set * them to 0. Change this to 1 to calculate real UDP over IPv4 @@ -112,8 +113,6 @@ uint16_t csum_fold(uint32_t sum) return sum; } -uint16_t csum(const void *buf, size_t len, uint32_t init); - /** * csum_ip4_header() - Calculate and set IPv4 header checksum * @ip4h: IPv4 header @@ -385,16 +384,17 @@ less_than_128_bytes: } /** - * csum() - Compute TCP/IP-style checksum - * @buf: Input buffer - * @len: Input length - * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * csum_unfolded - Calculate the unfolded checksum of a data buffer. * - * Return: 16-bit folded, complemented checksum + * @buf: Input buffer + * @len: Input length + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 32-bit unfolded */ /* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ -uint16_t csum(const void *buf, size_t len, uint32_t init) +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init) { intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i)); unsigned int pad = align - (intptr_t)buf; @@ -408,16 +408,31 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) if (len > pad) init = csum_avx2((void *)align, len - pad, init); - return (uint16_t)~csum_fold(init); + return init; } - #else /* __AVX2__ */ +/** + * csum_unfolded - Calculate the unfolded checksum of a data buffer. + * + * @buf: Input buffer + * @len: Input length + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 32-bit unfolded checksum + */ +/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ +__attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init) +{ + return sum_16b(buf, len) + init; +} +#endif /* !__AVX2__ */ /** * csum() - Compute TCP/IP-style checksum * @buf: Input buffer * @len: Input length - * @sum: Initial 32-bit checksum, 0 for no pre-computed checksum + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum * * Return: 16-bit folded, complemented checksum */ @@ -425,7 +440,25 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ uint16_t csum(const void *buf, size_t len, uint32_t init) { - return (uint16_t)~csum_fold(sum_16b(buf, len) + init); + return (uint16_t)~csum_fold(csum_unfolded(buf, len, init)); } -#endif /* !__AVX2__ */ +/** + * csum_iov() - Calculates the unfolded checksum over an array of IO vectors + * + * @iov Pointer to the array of IO vectors + * @n Length of the array + * @init Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 16-bit folded, complemented checksum + */ +/* cppcheck-suppress unusedFunction */ +uint16_t csum_iov(const struct iovec *iov, size_t n, uint32_t init) +{ + unsigned int i; + + for (i = 0; i < n; i++) + init = csum_unfolded(iov[i].iov_base, iov[i].iov_len, init); + + return (uint16_t)~csum_fold(init); +} diff --git a/checksum.h b/checksum.h index 21c0310d..cb80ab38 100644 --- a/checksum.h +++ b/checksum.h @@ -17,13 +17,15 @@ void csum_ip4_header(struct iphdr *ip4h); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); -void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len); +void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len); void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); void csum_icmp6(struct icmp6hdr *icmp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init); uint16_t csum(const void *buf, size_t len, uint32_t init); +uint16_t csum_iov(const struct iovec *iov, size_t n, uint32_t init); #endif /* CHECKSUM_H */ -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-5-lvivier(a)redhat.com> --- Makefile | 10 +++--- conf.c | 1 + dhcp.c | 1 + flow.c | 1 + fwd.c | 1 + icmp.c | 1 + inany.c | 1 + ip.c | 72 +++++++++++++++++++++++++++++++++++++++++++ ip.h | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++ ndp.c | 1 + qrap.c | 1 + tap.c | 1 + tcp.c | 1 + tcp_splice.c | 1 + udp.c | 1 + util.c | 55 --------------------------------- util.h | 76 ---------------------------------------------- 17 files changed, 175 insertions(+), 136 deletions(-) create mode 100644 ip.c create mode 100644 ip.h diff --git a/Makefile b/Makefile index 8f966941..2735797a 100644 --- a/Makefile +++ b/Makefile @@ -45,8 +45,8 @@ FLAGS += -DVERSION=\"$(VERSION)\" FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS) PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c fwd.c \ - icmp.c igmp.c inany.c iov.c isolation.c lineread.c log.c mld.c ndp.c \ - netlink.c packet.c passt.c pasta.c pcap.c pif.c tap.c tcp.c \ + icmp.c igmp.c inany.c iov.c ip.c isolation.c lineread.c log.c mld.c \ + ndp.c netlink.c packet.c passt.c pasta.c pcap.c pif.c tap.c tcp.c \ tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c SRCS = $(PASST_SRCS) $(QRAP_SRCS) @@ -54,9 +54,9 @@ SRCS = $(PASST_SRCS) $(QRAP_SRCS) MANPAGES = passt.1 pasta.1 qrap.1 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h fwd.h \ - flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \ - netlink.h packet.h passt.h pasta.h pcap.h pif.h siphash.h tap.h tcp.h \ - tcp_conn.h tcp_splice.h udp.h util.h + flow_table.h icmp.h inany.h iov.h ip.h isolation.h lineread.h log.h \ + ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h siphash.h tap.h \ + tcp.h tcp_conn.h tcp_splice.h udp.h util.h HEADERS = $(PASST_HEADERS) seccomp.h C := \#include <linux/tcp.h>\nstruct tcp_info x = { .tcpi_snd_wnd = 0 }; diff --git a/conf.c b/conf.c index e630140d..4a783b8b 100644 --- a/conf.c +++ b/conf.c @@ -35,6 +35,7 @@ #include <netinet/if_ether.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "netlink.h" #include "udp.h" diff --git a/dhcp.c b/dhcp.c index 11077286..ff4834a3 100644 --- a/dhcp.c +++ b/dhcp.c @@ -25,6 +25,7 @@ #include <limits.h> #include "util.h" +#include "ip.h" #include "checksum.h" #include "packet.h" #include "passt.h" diff --git a/flow.c b/flow.c index d7974d59..5bb24ccf 100644 --- a/flow.c +++ b/flow.c @@ -11,6 +11,7 @@ #include <string.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "siphash.h" #include "inany.h" diff --git a/fwd.c b/fwd.c index 09650b26..a235d131 100644 --- a/fwd.c +++ b/fwd.c @@ -21,6 +21,7 @@ #include <stdio.h> #include "util.h" +#include "ip.h" #include "fwd.h" #include "passt.h" #include "lineread.h" diff --git a/icmp.c b/icmp.c index fb2fcafc..49d6dd92 100644 --- a/icmp.c +++ b/icmp.c @@ -33,6 +33,7 @@ #include "packet.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "log.h" diff --git a/inany.c b/inany.c index 1c165b14..c8479a75 100644 --- a/inany.c +++ b/inany.c @@ -13,6 +13,7 @@ #include <arpa/inet.h> #include "util.h" +#include "ip.h" #include "siphash.h" #include "inany.h" diff --git a/ip.c b/ip.c new file mode 100644 index 00000000..2cc7f654 --- /dev/null +++ b/ip.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* PASST - Plug A Simple Socket Transport + * for qemu/UNIX domain socket mode + * + * PASTA - Pack A Subtle Tap Abstraction + * for network namespace/tap device mode + * + * ip.c - IP related functions + * + * Copyright (c) 2020-2021 Red Hat GmbH + * Author: Stefano Brivio <sbrivio(a)redhat.com> + */ + +#include <stddef.h> +#include "util.h" +#include "ip.h" + +#define IPV6_NH_OPT(nh) \ + ((nh) == 0 || (nh) == 43 || (nh) == 44 || (nh) == 50 || \ + (nh) == 51 || (nh) == 60 || (nh) == 135 || (nh) == 139 || \ + (nh) == 140 || (nh) == 253 || (nh) == 254) + +/** + * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol + * @p: Packet pool, packet number @idx has IPv6 header at @offset + * @idx: Index of packet in pool + * @offset: Pre-calculated IPv6 header offset + * @proto: Filled with L4 protocol number + * @dlen: Data length (payload excluding header extensions), set on return + * + * Return: pointer to L4 header, NULL if not found + */ +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, + size_t *dlen) +{ + const struct ipv6_opt_hdr *o; + const struct ipv6hdr *ip6h; + char *base; + int hdrlen; + uint8_t nh; + + base = packet_get(p, idx, 0, 0, NULL); + ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen); + if (!ip6h) + return NULL; + + offset += sizeof(*ip6h); + + nh = ip6h->nexthdr; + if (!IPV6_NH_OPT(nh)) + goto found; + + while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) { + nh = o->nexthdr; + hdrlen = (o->hdrlen + 1) * 8; + + if (IPV6_NH_OPT(nh)) + offset += hdrlen; + else + goto found; + } + + return NULL; + +found: + if (nh == 59) + return NULL; + + *proto = nh; + return base + offset; +} diff --git a/ip.h b/ip.h new file mode 100644 index 00000000..9be47783 --- /dev/null +++ b/ip.h @@ -0,0 +1,86 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * Copyright (c) 2021 Red Hat GmbH + * Author: Stefano Brivio <sbrivio(a)redhat.com> + */ + +#ifndef IP_H +#define IP_H + +#include <netinet/ip.h> +#include <netinet/ip6.h> + +#define IN4_IS_ADDR_UNSPECIFIED(a) \ + (((struct in_addr *)(a))->s_addr == htonl_constant(INADDR_ANY)) +#define IN4_IS_ADDR_BROADCAST(a) \ + (((struct in_addr *)(a))->s_addr == htonl_constant(INADDR_BROADCAST)) +#define IN4_IS_ADDR_LOOPBACK(a) \ + (ntohl(((struct in_addr *)(a))->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET) +#define IN4_IS_ADDR_MULTICAST(a) \ + (IN_MULTICAST(ntohl(((struct in_addr *)(a))->s_addr))) +#define IN4_ARE_ADDR_EQUAL(a, b) \ + (((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr) +#define IN4ADDR_LOOPBACK_INIT \ + { .s_addr = htonl_constant(INADDR_LOOPBACK) } +#define IN4ADDR_ANY_INIT \ + { .s_addr = htonl_constant(INADDR_ANY) } + +#define L2_BUF_IP4_INIT(proto) \ + { \ + .version = 4, \ + .ihl = 5, \ + .tos = 0, \ + .tot_len = 0, \ + .id = 0, \ + .frag_off = 0, \ + .ttl = 0xff, \ + .protocol = (proto), \ + .saddr = 0, \ + .daddr = 0, \ + } +#define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ + (uint32_t)htons_constant(0xff00 | (proto))) + +#define L2_BUF_IP6_INIT(proto) \ + { \ + .priority = 0, \ + .version = 6, \ + .flow_lbl = { 0 }, \ + .payload_len = 0, \ + .nexthdr = (proto), \ + .hop_limit = 255, \ + .saddr = IN6ADDR_ANY_INIT, \ + .daddr = IN6ADDR_ANY_INIT, \ + } + +struct ipv6hdr { +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wpedantic" +#if __BYTE_ORDER == __BIG_ENDIAN + uint8_t version:4, + priority:4; +#else + uint8_t priority:4, + version:4; +#endif +#pragma GCC diagnostic pop + uint8_t flow_lbl[3]; + + uint16_t payload_len; + uint8_t nexthdr; + uint8_t hop_limit; + + struct in6_addr saddr; + struct in6_addr daddr; +}; + +struct ipv6_opt_hdr { + uint8_t nexthdr; + uint8_t hdrlen; + /* + * TLV encoded option data follows. + */ +} __attribute__((packed)); /* required for some archs */ + +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, + size_t *dlen); +#endif /* IP_H */ diff --git a/ndp.c b/ndp.c index 4c85ab8b..c58f4b22 100644 --- a/ndp.c +++ b/ndp.c @@ -28,6 +28,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "log.h" diff --git a/qrap.c b/qrap.c index 97f350a4..d5967062 100644 --- a/qrap.c +++ b/qrap.c @@ -32,6 +32,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "arp.h" diff --git a/tap.c b/tap.c index 3a666212..d35d8944 100644 --- a/tap.c +++ b/tap.c @@ -45,6 +45,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "iov.h" #include "passt.h" #include "arp.h" diff --git a/tcp.c b/tcp.c index 560d1d49..e0588f92 100644 --- a/tcp.c +++ b/tcp.c @@ -290,6 +290,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "siphash.h" diff --git a/tcp_splice.c b/tcp_splice.c index 4957abb8..d066112c 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -49,6 +49,7 @@ #include <sys/socket.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "log.h" #include "tcp_splice.h" diff --git a/udp.c b/udp.c index cb7c31f7..26774df7 100644 --- a/udp.c +++ b/udp.c @@ -113,6 +113,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "siphash.h" #include "inany.h" #include "passt.h" diff --git a/util.c b/util.c index 81449b78..bac5a534 100644 --- a/util.c +++ b/util.c @@ -32,61 +32,6 @@ #include "packet.h" #include "log.h" -#define IPV6_NH_OPT(nh) \ - ((nh) == 0 || (nh) == 43 || (nh) == 44 || (nh) == 50 || \ - (nh) == 51 || (nh) == 60 || (nh) == 135 || (nh) == 139 || \ - (nh) == 140 || (nh) == 253 || (nh) == 254) - -/** - * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol - * @p: Packet pool, packet number @idx has IPv6 header at @offset - * @idx: Index of packet in pool - * @offset: Pre-calculated IPv6 header offset - * @proto: Filled with L4 protocol number - * @dlen: Data length (payload excluding header extensions), set on return - * - * Return: pointer to L4 header, NULL if not found - */ -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, - size_t *dlen) -{ - const struct ipv6_opt_hdr *o; - const struct ipv6hdr *ip6h; - char *base; - int hdrlen; - uint8_t nh; - - base = packet_get(p, idx, 0, 0, NULL); - ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen); - if (!ip6h) - return NULL; - - offset += sizeof(*ip6h); - - nh = ip6h->nexthdr; - if (!IPV6_NH_OPT(nh)) - goto found; - - while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) { - nh = o->nexthdr; - hdrlen = (o->hdrlen + 1) * 8; - - if (IPV6_NH_OPT(nh)) - offset += hdrlen; - else - goto found; - } - - return NULL; - -found: - if (nh == 59) - return NULL; - - *proto = nh; - return base + offset; -} - /** * sock_l4() - Create and bind socket for given L4, add to epoll list * @c: Execution context diff --git a/util.h b/util.h index 55513490..25e54a77 100644 --- a/util.h +++ b/util.h @@ -110,22 +110,6 @@ #define htonl_constant(x) (__bswap_constant_32(x)) #endif -#define IN4_IS_ADDR_UNSPECIFIED(a) \ - (((struct in_addr *)(a))->s_addr == htonl_constant(INADDR_ANY)) -#define IN4_IS_ADDR_BROADCAST(a) \ - (((struct in_addr *)(a))->s_addr == htonl_constant(INADDR_BROADCAST)) -#define IN4_IS_ADDR_LOOPBACK(a) \ - (ntohl(((struct in_addr *)(a))->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET) -#define IN4_IS_ADDR_MULTICAST(a) \ - (IN_MULTICAST(ntohl(((struct in_addr *)(a))->s_addr))) -#define IN4_ARE_ADDR_EQUAL(a, b) \ - (((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr) -#define IN4ADDR_LOOPBACK_INIT \ - { .s_addr = htonl_constant(INADDR_LOOPBACK) } -#define IN4ADDR_ANY_INIT \ - { .s_addr = htonl_constant(INADDR_ANY) } - - #define NS_FN_STACK_SIZE (RLIMIT_STACK_VAL * 1024 / 8) int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, void *arg); @@ -138,34 +122,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, (void *)(arg)); \ } while (0) -#define L2_BUF_IP4_INIT(proto) \ - { \ - .version = 4, \ - .ihl = 5, \ - .tos = 0, \ - .tot_len = 0, \ - .id = 0, \ - .frag_off = 0, \ - .ttl = 0xff, \ - .protocol = (proto), \ - .saddr = 0, \ - .daddr = 0, \ - } -#define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ - (uint32_t)htons_constant(0xff00 | (proto))) - -#define L2_BUF_IP6_INIT(proto) \ - { \ - .priority = 0, \ - .version = 6, \ - .flow_lbl = { 0 }, \ - .payload_len = 0, \ - .nexthdr = (proto), \ - .hop_limit = 255, \ - .saddr = IN6ADDR_ANY_INIT, \ - .daddr = IN6ADDR_ANY_INIT, \ - } - #define RCVBUF_BIG (2UL * 1024 * 1024) #define SNDBUF_BIG (4UL * 1024 * 1024) #define SNDBUF_SMALL (128UL * 1024) @@ -173,45 +129,13 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, #include <net/if.h> #include <limits.h> #include <stdint.h> -#include <netinet/ip6.h> #include "packet.h" struct ctx; -struct ipv6hdr { -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wpedantic" -#if __BYTE_ORDER == __BIG_ENDIAN - uint8_t version:4, - priority:4; -#else - uint8_t priority:4, - version:4; -#endif -#pragma GCC diagnostic pop - uint8_t flow_lbl[3]; - - uint16_t payload_len; - uint8_t nexthdr; - uint8_t hop_limit; - - struct in6_addr saddr; - struct in6_addr daddr; -}; - -struct ipv6_opt_hdr { - uint8_t nexthdr; - uint8_t hdrlen; - /* - * TLV encoded option data follows. - */ -} __attribute__((packed)); /* required for some archs */ - /* cppcheck-suppress funcArgNamesDifferent */ __attribute__ ((weak)) int ffsl(long int i) { return __builtin_ffsl(i); } -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, - size_t *dlen); int sock_l4(const struct ctx *c, sa_family_t af, uint8_t proto, const void *bind_addr, const char *ifname, uint16_t port, uint32_t data); -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> in udp_update_hdr4(): Assign the source address to src, either b->s_in.sin_addr, c->ip4.dns_match or c->ip4.gw and then set b->iph.saddr to src->s_addr. in udp_update_hdr6(): Assign the source address to src, either b->s_in6.sin6_addr, c->ip6.dns_match, c->ip6.gw or c->ip6.addr_ll. Assign the destination to dst, either c->ip6.addr_seen or &c->ip6.addr_ll_seen. Then set dst to b->ip6h.daddr and src to b->ip6h.saddr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Message-ID: <20240303135114.1023026-6-lvivier(a)redhat.com> --- udp.c | 39 +++++++++++++++++++-------------------- 1 file changed, 19 insertions(+), 20 deletions(-) diff --git a/udp.c b/udp.c index 26774df7..2b41a1bd 100644 --- a/udp.c +++ b/udp.c @@ -588,7 +588,7 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport, const struct timespec *now) { struct udp4_l2_buf_t *b = &udp4_l2_buf[n]; - struct in_addr *src; + const struct in_addr *src; in_port_t src_port; size_t ip_len; @@ -602,10 +602,9 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport, if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.dns_match) && IN4_ARE_ADDR_EQUAL(src, &c->ip4.dns_host) && src_port == 53) { - b->iph.saddr = c->ip4.dns_match.s_addr; + src = &c->ip4.dns_match; } else if (IN4_IS_ADDR_LOOPBACK(src) || IN4_ARE_ADDR_EQUAL(src, &c->ip4.addr_seen)) { - b->iph.saddr = c->ip4.gw.s_addr; udp_tap_map[V4][src_port].ts = now->tv_sec; udp_tap_map[V4][src_port].flags |= PORT_LOCAL; @@ -615,9 +614,10 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport, udp_tap_map[V4][src_port].flags &= ~PORT_LOOPBACK; bitmap_set(udp_act[V4][UDP_ACT_TAP], src_port); - } else { - b->iph.saddr = src->s_addr; + + src = &c->ip4.gw; } + b->iph.saddr = src->s_addr; udp_update_check4(b); b->uh.source = b->s_in.sin_port; @@ -640,10 +640,11 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, const struct timespec *now) { struct udp6_l2_buf_t *b = &udp6_l2_buf[n]; - struct in6_addr *src; + const struct in6_addr *src, *dst; in_port_t src_port; size_t ip_len; + dst = &c->ip6.addr_seen; src = &b->s_in6.sin6_addr; src_port = ntohs(b->s_in6.sin6_port); @@ -652,23 +653,14 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, b->ip6h.payload_len = htons(udp6_l2_mh_sock[n].msg_len + sizeof(b->uh)); if (IN6_IS_ADDR_LINKLOCAL(src)) { - b->ip6h.daddr = c->ip6.addr_ll_seen; - b->ip6h.saddr = b->s_in6.sin6_addr; + dst = &c->ip6.addr_ll_seen; } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.dns_match) && IN6_ARE_ADDR_EQUAL(src, &c->ip6.dns_host) && src_port == 53) { - b->ip6h.daddr = c->ip6.addr_seen; - b->ip6h.saddr = c->ip6.dns_match; + src = &c->ip6.dns_match; } else if (IN6_IS_ADDR_LOOPBACK(src) || IN6_ARE_ADDR_EQUAL(src, &c->ip6.addr_seen) || IN6_ARE_ADDR_EQUAL(src, &c->ip6.addr)) { - b->ip6h.daddr = c->ip6.addr_ll_seen; - - if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) - b->ip6h.saddr = c->ip6.gw; - else - b->ip6h.saddr = c->ip6.addr_ll; - udp_tap_map[V6][src_port].ts = now->tv_sec; udp_tap_map[V6][src_port].flags |= PORT_LOCAL; @@ -683,10 +675,17 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, udp_tap_map[V6][src_port].flags &= ~PORT_GUA; bitmap_set(udp_act[V6][UDP_ACT_TAP], src_port); - } else { - b->ip6h.daddr = c->ip6.addr_seen; - b->ip6h.saddr = b->s_in6.sin6_addr; + + dst = &c->ip6.addr_ll_seen; + + if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) + src = &c->ip6.gw; + else + src = &c->ip6.addr_ll; + } + b->ip6h.daddr = *dst; + b->ip6h.saddr = *src; b->uh.source = b->s_in6.sin6_port; b->uh.dest = htons(dstport); -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-7-lvivier(a)redhat.com> --- checksum.c | 24 +++++++++++++++++++----- checksum.h | 3 ++- tap.c | 2 +- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 27 insertions(+), 46 deletions(-) diff --git a/checksum.c b/checksum.c index 643957b5..1ac79758 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set @@ -114,13 +115,26 @@ uint16_t csum_fold(uint32_t sum) } /** - * csum_ip4_header() - Calculate and set IPv4 header checksum - * @ip4h: IPv4 header + * csum_ip4_header() - Calculate IPv4 header checksum + * @tot_len: IPv4 payload length (data + IP header, network order) + * @protocol: Protocol number (network order) + * @saddr: IPv4 source address (network order) + * @daddr: IPv4 destination address (network order) + * + * Return: 16-bit folded sum of the IPv4 header */ -void csum_ip4_header(struct iphdr *ip4h) +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + struct in_addr saddr, struct in_addr daddr) { - ip4h->check = 0; - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); + uint32_t sum = L2_BUF_IP4_PSUM(protocol); + + sum += tot_len; + sum += (saddr.s_addr >> 16) & 0xffff; + sum += saddr.s_addr & 0xffff; + sum += (daddr.s_addr >> 16) & 0xffff; + sum += daddr.s_addr & 0xffff; + + return ~csum_fold(sum); } /** diff --git a/checksum.h b/checksum.h index cb80ab38..996f456b 100644 --- a/checksum.h +++ b/checksum.h @@ -13,7 +13,8 @@ struct icmp6hdr; uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); -void csum_ip4_header(struct iphdr *ip4h); +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + struct in_addr saddr, struct in_addr daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); diff --git a/tap.c b/tap.c index d35d8944..f971461c 100644 --- a/tap.c +++ b/tap.c @@ -161,7 +161,7 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst, ip4h->protocol = proto; ip4h->saddr = src.s_addr; ip4h->daddr = dst.s_addr; - csum_ip4_header(ip4h); + ip4h->check = csum_ip4_header(ip4h->tot_len, proto, src, dst); return ip4h + 1; } diff --git a/tcp.c b/tcp.c index e0588f92..7fb9dba9 100644 --- a/tcp.c +++ b/tcp.c @@ -935,23 +935,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) trace("TCP: failed to set SO_SNDBUF to %i", v); } -/** - * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one - * @buf: L2 packet buffer with final IPv4 header - */ -static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf) -{ - uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP); - - sum += buf->iph.tot_len; - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - - buf->iph.check = (uint16_t)~csum_fold(sum); -} - /** * tcp_update_check_tcp4() - Update TCP checksum from stored one * @buf: L2 packet buffer with final IPv4 header @@ -1394,10 +1377,9 @@ do { \ b->iph.saddr = a4->s_addr; b->iph.daddr = c->ip4.addr_seen.s_addr; - if (check) - b->iph.check = *check; - else - tcp_update_check_ip4(b); + b->iph.check = check ? *check : + csum_ip4_header(b->iph.tot_len, IPPROTO_TCP, + *a4, c->ip4.addr_seen); SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); diff --git a/udp.c b/udp.c index 2b41a1bd..fb8373be 100644 --- a/udp.c +++ b/udp.c @@ -275,23 +275,6 @@ static void udp_invert_portmap(struct udp_fwd_ports *fwd) } } -/** - * udp_update_check4() - Update checksum with variable parts from stored one - * @buf: L2 packet buffer with final IPv4 header - */ -static void udp_update_check4(struct udp4_l2_buf_t *buf) -{ - uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP); - - sum += buf->iph.tot_len; - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - - buf->iph.check = (uint16_t)~csum_fold(sum); -} - /** * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses * @eth_d: Ethernet destination address, NULL if unchanged @@ -619,7 +602,8 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport, } b->iph.saddr = src->s_addr; - udp_update_check4(b); + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + *src, c->ip4.addr_seen); b->uh.source = b->s_in.sin_port; b->uh.dest = htons(dstport); b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh)); -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> The TCP and UDP checksums are computed using the data in the TCP/UDP payload but also some informations in the IP header (protocol, length, source and destination addresses). We add two functions, proto_ipv4_header_psum() and proto_ipv6_header_psum(), to compute the checksum of the IP header part. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Message-ID: <20240303135114.1023026-8-lvivier(a)redhat.com> --- checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 5 ++++ tcp.c | 50 +++++++++++++++++++--------------------- udp.c | 18 ++++++++------- 4 files changed, 90 insertions(+), 50 deletions(-) diff --git a/checksum.c b/checksum.c index 1ac79758..f8a7b539 100644 --- a/checksum.c +++ b/checksum.c @@ -137,6 +137,29 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, return ~csum_fold(sum); } +/** + * proto_ipv4_header_psum() - Calculates the partial checksum of an + * IPv4 header for UDP or TCP + * @tot_len: IPv4 Payload length (host order) + * @proto: Protocol number (host order) + * @saddr: Source address (network order) + * @daddr: Destination address (network order) + * Returns: Partial checksum of the IPv4 header + */ +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + struct in_addr saddr, struct in_addr daddr) +{ + uint32_t psum = htons(protocol); + + psum += (saddr.s_addr >> 16) & 0xffff; + psum += saddr.s_addr & 0xffff; + psum += (daddr.s_addr >> 16) & 0xffff; + psum += daddr.s_addr & 0xffff; + psum += htons(tot_len); + + return psum; +} + /** * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet * @udp4hr: UDP header, initialised apart from checksum @@ -153,14 +176,10 @@ void csum_udp4(struct udphdr *udp4hr, udp4hr->check = 0; if (UDP4_REAL_CHECKSUMS) { - /* UNTESTED: if we did want real UDPv4 checksums, this - * is roughly what we'd need */ - uint32_t psum = csum_fold(saddr.s_addr) - + csum_fold(daddr.s_addr) - + htons(len + sizeof(*udp4hr)) - + htons(IPPROTO_UDP); - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp4hr, sizeof(*udp4hr)); + uint16_t tot_len = len + sizeof(struct udphdr); + uint32_t psum = proto_ipv4_header_psum(tot_len, IPPROTO_UDP, + saddr, daddr); + psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum); udp4hr->check = csum(payload, len, psum); } } @@ -183,6 +202,27 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) icmp4hr->checksum = csum(payload, len, psum); } +/** + * proto_ipv6_header_psum() - Calculates the partial checksum of an + * IPv6 header for UDP or TCP + * @payload_len: IPv6 payload length (host order) + * @proto: Protocol number (host order) + * @saddr: Source address (network order) + * @daddr: Destination address (network order) + * Returns: Partial checksum of the IPv6 header + */ +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + const struct in6_addr *saddr, + const struct in6_addr *daddr) +{ + uint32_t sum = htons(protocol) + htons(payload_len); + + sum += sum_16b(saddr, sizeof(*saddr)); + sum += sum_16b(daddr, sizeof(*daddr)); + + return sum; +} + /** * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet * @udp6hr: UDP header, initialised apart from checksum @@ -193,14 +233,11 @@ void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len) { - /* Partial checksum for the pseudo-IPv6 header */ - uint32_t psum = sum_16b(saddr, sizeof(*saddr)) + - sum_16b(daddr, sizeof(*daddr)) + - htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP); - + uint32_t psum = proto_ipv6_header_psum(len + sizeof(struct udphdr), + IPPROTO_UDP, saddr, daddr); udp6hr->check = 0; - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp6hr, sizeof(*udp6hr)); + + psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum); udp6hr->check = csum(payload, len, psum); } diff --git a/checksum.h b/checksum.h index 996f456b..0f396767 100644 --- a/checksum.h +++ b/checksum.h @@ -15,10 +15,15 @@ uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, struct in_addr saddr, struct in_addr daddr); +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + struct in_addr saddr, struct in_addr daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len); +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + const struct in6_addr *saddr, + const struct in6_addr *daddr); void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); diff --git a/tcp.c b/tcp.c index 7fb9dba9..a7156ee1 100644 --- a/tcp.c +++ b/tcp.c @@ -937,41 +937,33 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) /** * tcp_update_check_tcp4() - Update TCP checksum from stored one - * @buf: L2 packet buffer with final IPv4 header + * @iph: IPv4 header + * @th: TCP header followed by TCP payload */ -static void tcp_update_check_tcp4(struct tcp4_l2_buf_t *buf) +static void tcp_update_check_tcp4(const struct iphdr *iph, struct tcphdr *th) { - uint16_t tlen = ntohs(buf->iph.tot_len) - 20; - uint32_t sum = htons(IPPROTO_TCP); + uint16_t tlen = ntohs(iph->tot_len) - sizeof(struct iphdr); + struct in_addr saddr = { .s_addr = iph->saddr }; + struct in_addr daddr = { .s_addr = iph->daddr }; + uint32_t sum = proto_ipv4_header_psum(tlen, IPPROTO_TCP, saddr, daddr); - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - sum += htons(ntohs(buf->iph.tot_len) - 20); - - buf->th.check = 0; - buf->th.check = csum(&buf->th, tlen, sum); + th->check = 0; + th->check = csum(th, tlen, sum); } /** * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6 - * @buf: L2 packet buffer with final IPv6 header + * @ip6h: IPv6 header + * @th: TCP header followed by TCP payload */ -static void tcp_update_check_tcp6(struct tcp6_l2_buf_t *buf) +static void tcp_update_check_tcp6(struct ipv6hdr *ip6h, struct tcphdr *th) { - int len = ntohs(buf->ip6h.payload_len) + sizeof(struct ipv6hdr); - - buf->ip6h.hop_limit = IPPROTO_TCP; - buf->ip6h.version = 0; - buf->ip6h.nexthdr = 0; + uint16_t payload_len = ntohs(ip6h->payload_len); + uint32_t sum = proto_ipv6_header_psum(payload_len, IPPROTO_TCP, + &ip6h->saddr, &ip6h->daddr); - buf->th.check = 0; - buf->th.check = csum(&buf->ip6h, len, 0); - - buf->ip6h.hop_limit = 255; - buf->ip6h.version = 6; - buf->ip6h.nexthdr = IPPROTO_TCP; + th->check = 0; + th->check = csum(th, payload_len, sum); } /** @@ -1383,7 +1375,7 @@ do { \ SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - tcp_update_check_tcp4(b); + tcp_update_check_tcp4(&b->iph, &b->th); tlen = tap_iov_len(c, &b->taph, ip_len); } else { @@ -1402,7 +1394,11 @@ do { \ SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - tcp_update_check_tcp6(b); + tcp_update_check_tcp6(&b->ip6h, &b->th); + + b->ip6h.hop_limit = 255; + b->ip6h.version = 6; + b->ip6h.nexthdr = IPPROTO_TCP; b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf; b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff; diff --git a/udp.c b/udp.c index fb8373be..2fd67925 100644 --- a/udp.c +++ b/udp.c @@ -625,6 +625,7 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, { struct udp6_l2_buf_t *b = &udp6_l2_buf[n]; const struct in6_addr *src, *dst; + uint16_t payload_len; in_port_t src_port; size_t ip_len; @@ -634,7 +635,8 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, ip_len = udp6_l2_mh_sock[n].msg_len + sizeof(b->ip6h) + sizeof(b->uh); - b->ip6h.payload_len = htons(udp6_l2_mh_sock[n].msg_len + sizeof(b->uh)); + payload_len = udp6_l2_mh_sock[n].msg_len + sizeof(b->uh); + b->ip6h.payload_len = htons(payload_len); if (IN6_IS_ADDR_LINKLOCAL(src)) { dst = &c->ip6.addr_ll_seen; @@ -670,17 +672,17 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, } b->ip6h.daddr = *dst; b->ip6h.saddr = *src; + b->ip6h.version = 6; + b->ip6h.nexthdr = IPPROTO_UDP; + b->ip6h.hop_limit = 255; b->uh.source = b->s_in6.sin6_port; b->uh.dest = htons(dstport); b->uh.len = b->ip6h.payload_len; - - b->ip6h.hop_limit = IPPROTO_UDP; - b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0; - b->uh.check = csum(&b->ip6h, ip_len, 0); - b->ip6h.version = 6; - b->ip6h.nexthdr = IPPROTO_UDP; - b->ip6h.hop_limit = 255; + b->uh.check = 0; + b->uh.check = csum(&b->uh, payload_len, + proto_ipv6_header_psum(payload_len, IPPROTO_UDP, + src, dst)); return tap_iov_len(c, &b->taph, ip_len); } -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> Use ethhdr rather than tap_hdr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> Message-ID: <20240303135114.1023026-9-lvivier(a)redhat.com> --- tap.c | 10 +++++----- tap.h | 2 +- tcp.c | 8 ++++---- udp.c | 4 ++-- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/tap.c b/tap.c index f971461c..c7b93726 100644 --- a/tap.c +++ b/tap.c @@ -419,18 +419,18 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n) } /** - * tap_update_mac() - Update tap L2 header with new Ethernet addresses - * @taph: Tap headers to update + * eth_update_mac() - Update tap L2 header with new Ethernet addresses + * @eh: Ethernet headers to update * @eth_d: Ethernet destination address, NULL if unchanged * @eth_s: Ethernet source address, NULL if unchanged */ -void tap_update_mac(struct tap_hdr *taph, +void eth_update_mac(struct ethhdr *eh, const unsigned char *eth_d, const unsigned char *eth_s) { if (eth_d) - memcpy(taph->eh.h_dest, eth_d, sizeof(taph->eh.h_dest)); + memcpy(eh->h_dest, eth_d, sizeof(eh->h_dest)); if (eth_s) - memcpy(taph->eh.h_source, eth_s, sizeof(taph->eh.h_source)); + memcpy(eh->h_source, eth_s, sizeof(eh->h_source)); } PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf); diff --git a/tap.h b/tap.h index 466d9146..437b9aa2 100644 --- a/tap.h +++ b/tap.h @@ -74,7 +74,7 @@ void tap_icmp6_send(const struct ctx *c, const void *in, size_t len); int tap_send(const struct ctx *c, const void *data, size_t len); size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n); -void tap_update_mac(struct tap_hdr *taph, +void eth_update_mac(struct ethhdr *eh, const unsigned char *eth_d, const unsigned char *eth_s); void tap_listen_handler(struct ctx *c, uint32_t events); void tap_handler_pasta(struct ctx *c, uint32_t events, diff --git a/tcp.c b/tcp.c index a7156ee1..db42d535 100644 --- a/tcp.c +++ b/tcp.c @@ -981,10 +981,10 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s) struct tcp4_l2_buf_t *b4 = &tcp4_l2_buf[i]; struct tcp6_l2_buf_t *b6 = &tcp6_l2_buf[i]; - tap_update_mac(&b4->taph, eth_d, eth_s); - tap_update_mac(&b6->taph, eth_d, eth_s); - tap_update_mac(&b4f->taph, eth_d, eth_s); - tap_update_mac(&b6f->taph, eth_d, eth_s); + eth_update_mac(&b4->taph.eh, eth_d, eth_s); + eth_update_mac(&b6->taph.eh, eth_d, eth_s); + eth_update_mac(&b4f->taph.eh, eth_d, eth_s); + eth_update_mac(&b6f->taph.eh, eth_d, eth_s); } } diff --git a/udp.c b/udp.c index 2fd67925..1f46afb2 100644 --- a/udp.c +++ b/udp.c @@ -288,8 +288,8 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s) struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i]; struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i]; - tap_update_mac(&b4->taph, eth_d, eth_s); - tap_update_mac(&b6->taph, eth_d, eth_s); + eth_update_mac(&b4->taph.eh, eth_d, eth_s); + eth_update_mac(&b6->taph.eh, eth_d, eth_s); } } -- 2.44.0
From: Laurent Vivier <lvivier(a)redhat.com> Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function tcp_fill_header(). Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to tcp_fill_ipv4_header() and tcp_fill_ipv6_header() Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Message-ID: <20240303135114.1023026-10-lvivier(a)redhat.com> --- tcp.c | 156 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 106 insertions(+), 50 deletions(-) diff --git a/tcp.c b/tcp.c index db42d535..d5eedf4d 100644 --- a/tcp.c +++ b/tcp.c @@ -1327,6 +1327,108 @@ void tcp_defer_handler(struct ctx *c) tcp_l2_data_buf_flush(c); } +/** + * tcp_fill_header() - Fill the TCP header fields for a given TCP segment. + * + * @th: Pointer to the TCP header structure + * @conn: Pointer to the TCP connection structure + * @seq: Sequence number + */ +static void tcp_fill_header(struct tcphdr *th, + const struct tcp_tap_conn *conn, uint32_t seq) +{ + th->source = htons(conn->fport); + th->dest = htons(conn->eport); + th->seq = htonl(seq); + th->ack_seq = htonl(conn->seq_ack_to_tap); + if (conn->events & ESTABLISHED) { + th->window = htons(conn->wnd_to_tap); + } else { + unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; + + th->window = htons(MIN(wnd, USHRT_MAX)); + } +} + +/** + * tcp_fill_headers4() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers + * @c: Execution context + * @conn: Connection pointer + * @iph: Pointer to IPv4 header + * @th: Pointer to TCP header + * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: The total length of the IPv4 packet, host order + */ +static size_t tcp_fill_headers4(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct iphdr *iph, struct tcphdr *th, + size_t plen, const uint16_t *check, + uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); + const struct in_addr *a4 = inany_v4(&conn->faddr); + + ASSERT(a4); + + iph->tot_len = htons(ip_len); + iph->saddr = a4->s_addr; + iph->daddr = c->ip4.addr_seen.s_addr; + + iph->check = check ? *check : + csum_ip4_header(iph->tot_len, IPPROTO_TCP, + *a4, c->ip4.addr_seen); + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp4(iph, th); + + return ip_len; +} + +/** + * tcp_fill_headers6() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers + * @c: Execution context + * @conn: Connection pointer + * @ip6h: Pointer to IPv6 header + * @th: Pointer to TCP header + * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: The total length of the IPv6 packet, host order + */ +static size_t tcp_fill_headers6(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct ipv6hdr *ip6h, struct tcphdr *th, + size_t plen, uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); + + ip6h->payload_len = htons(plen + sizeof(struct tcphdr)); + ip6h->saddr = conn->faddr.a6; + if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr)) + ip6h->daddr = c->ip6.addr_ll_seen; + else + ip6h->daddr = c->ip6.addr_seen; + + ip6h->hop_limit = 255; + ip6h->version = 6; + ip6h->nexthdr = IPPROTO_TCP; + + ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf; + ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff; + ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff; + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp6(ip6h, th); + + return ip_len; +} + /** * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers * @c: Execution context @@ -1346,67 +1448,21 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c, const struct in_addr *a4 = inany_v4(&conn->faddr); size_t ip_len, tlen; -#define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq) \ -do { \ - b->th.source = htons(conn->fport); \ - b->th.dest = htons(conn->eport); \ - b->th.seq = htonl(seq); \ - b->th.ack_seq = htonl(conn->seq_ack_to_tap); \ - if (conn->events & ESTABLISHED) { \ - b->th.window = htons(conn->wnd_to_tap); \ - } else { \ - unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; \ - \ - b->th.window = htons(MIN(wnd, USHRT_MAX)); \ - } \ -} while (0) - if (a4) { struct tcp4_l2_buf_t *b = (struct tcp4_l2_buf_t *)p; - ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); - b->iph.tot_len = htons(ip_len); - b->iph.saddr = a4->s_addr; - b->iph.daddr = c->ip4.addr_seen.s_addr; - - b->iph.check = check ? *check : - csum_ip4_header(b->iph.tot_len, IPPROTO_TCP, - *a4, c->ip4.addr_seen); - - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - - tcp_update_check_tcp4(&b->iph, &b->th); + ip_len = tcp_fill_headers4(c, conn, &b->iph, &b->th, plen, + check, seq); tlen = tap_iov_len(c, &b->taph, ip_len); } else { struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p; - ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); - - b->ip6h.payload_len = htons(plen + sizeof(struct tcphdr)); - b->ip6h.saddr = conn->faddr.a6; - if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr)) - b->ip6h.daddr = c->ip6.addr_ll_seen; - else - b->ip6h.daddr = c->ip6.addr_seen; - - memset(b->ip6h.flow_lbl, 0, 3); - - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - - tcp_update_check_tcp6(&b->ip6h, &b->th); - - b->ip6h.hop_limit = 255; - b->ip6h.version = 6; - b->ip6h.nexthdr = IPPROTO_TCP; - - b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf; - b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff; - b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff; + ip_len = tcp_fill_headers6(c, conn, &b->ip6h, &b->th, plen, + seq); tlen = tap_iov_len(c, &b->taph, ip_len); } -#undef SET_TCP_HEADER_COMMON_V4_V6 return tlen; } -- 2.44.0
On Wed, 6 Mar 2024 15:18:14 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Stefano, you may have already done this, but I hit the same issue trying to rebase my own code on the vhost-user stuff. This is Laurent's most recent vhost-user part 1 series, but with fixes applied for the cppcheck and clang-tidy regressions introduced.I didn't get to this point yet, so this would help, but I can't apply it like this as it introduces further changes with a Signed-off-by Laurent and not otherwise marked. -- Stefano
On Wed, Mar 06, 2024 at 06:28:51AM +0100, Stefano Brivio wrote:On Wed, 6 Mar 2024 15:18:14 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Fair point, reposted. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonStefano, you may have already done this, but I hit the same issue trying to rebase my own code on the vhost-user stuff. This is Laurent's most recent vhost-user part 1 series, but with fixes applied for the cppcheck and clang-tidy regressions introduced.I didn't get to this point yet, so this would help, but I can't apply it like this as it introduces further changes with a Signed-off-by Laurent and not otherwise marked.
On Wed, 6 Mar 2024 16:58:59 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Wed, Mar 06, 2024 at 06:28:51AM +0100, Stefano Brivio wrote:Thanks a lot, running tests now. -- StefanoOn Wed, 6 Mar 2024 15:18:14 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Fair point, reposted.Stefano, you may have already done this, but I hit the same issue trying to rebase my own code on the vhost-user stuff. This is Laurent's most recent vhost-user part 1 series, but with fixes applied for the cppcheck and clang-tidy regressions introduced.I didn't get to this point yet, so this would help, but I can't apply it like this as it introduces further changes with a Signed-off-by Laurent and not otherwise marked.