The main packet "fast paths" for UDP and TCP mostly just forward packets rather than generating them from scratch. However the control paths for ICMP and DHCP sometimes generate packets more or less from scratch. Because these are relatively rare, it's not performance critical. The paths for sending these packets have some duplication of the header generation. There's also some layering violation in tap_ip_send() which both generates IP headers and updates the L4 (UDP or UCMP) checksum. Finally that checksum generation is a little awkward: it temporarily generates the IP pseudo header (or something close enough to serve) in the place of the actual header, generates the checksum, then replaces it with the real IP header. This approach seems to be causing miscompiles with some LTO optimization, because the stores to the pseudo header are being moved or elided across the code calculating the checksum. This series addresses all of these. We consolidate and clarify the packet sending helpers, and use them in some places there was previously duplicated code. In the process we use new checksum generation helpers which take a different approach which should avoid the LTO problems (this aspect I haven't tested yet though). Changes since v1: * Numerous minor style changes * Rename header generation helpers to make their behaviour clearer * Added several missing function doc comments * Corrected some erroneous statements and terms in comments David Gibson (14): Add csum_icmp6() helper for calculating ICMPv6 checksums Add csum_icmp4() helper for calculating ICMP checksums Add csum_udp6() helper for calculating UDP over IPv6 checksums Add csum_udp4() helper for calculating UDP over IPv4 checksums Add csum_ip4_header() helper to calculate IPv4 header checksums Add helpers for normal inbound packet destination addresses Remove support for TCP packets from tap_ip_send() tap: Remove unhelpeful vnet_pre optimization from tap_send() Split tap_ip_send() into IPv4 and IPv6 specific functions tap: Split tap_ip6_send() into UDP and ICMP variants ndp: Remove unneeded eh_source parameter ndp: Use tap_icmp6_send() helper tap: Split tap_ip4_send() into UDP and ICMP variants dhcp: Use tap_udp4_send() helper in dhcp() arp.c | 2 +- checksum.c | 120 ++++++++++++++++----- checksum.h | 15 ++- dhcp.c | 19 +--- dhcpv6.c | 21 +--- icmp.c | 12 +-- ndp.c | 28 +---- ndp.h | 3 +- tap.c | 312 ++++++++++++++++++++++++++++++++++------------------- tap.h | 19 +++- 10 files changed, 345 insertions(+), 206 deletions(-) -- 2.37.3
At least two places in passt calculate ICMPv6 checksums, ndp() and tap_ip_send(). Add a helper to handle this calculation in both places. For future flexibility, the new helper takes parameters for the fields in the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to be explicitly constructed. It also allows the ICMPv6 header and payload to be in separate buffers, although we don't use this yet. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- checksum.c | 25 +++++++++++++++++++++++++ checksum.h | 5 +++++ ndp.c | 5 +---- tap.c | 6 ++---- 4 files changed, 33 insertions(+), 8 deletions(-) diff --git a/checksum.c b/checksum.c index 56ad01e..78c6960 100644 --- a/checksum.c +++ b/checksum.c @@ -52,6 +52,8 @@ #include <stddef.h> #include <stdint.h> +#include <linux/icmpv6.h> + /** * sum_16b() - Calculate sum of 16-bit words * @buf: Input buffer @@ -105,6 +107,29 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) return (uint16_t)~csum_fold(sum_16b(buf, len) + init); } +/** + * csum_icmp6() - Calculate and set checksum for an ICMPv6 packet + * @icmp6hr: ICMPv6 header, initialised apart from checksum + * @saddr: IPv6 source address + * @daddr: IPv6 destination address + * @payload: ICMP packet payload + * @len: Length of @payload (not including ICMPv6 header) + */ +void csum_icmp6(struct icmp6hdr *icmp6hr, + const struct in6_addr *saddr, const struct in6_addr *daddr, + const void *payload, size_t len) +{ + /* Partial checksum for the pseudo-IPv6 header */ + uint32_t psum = sum_16b(saddr, sizeof(*saddr)) + + sum_16b(daddr, sizeof(*daddr)) + + htons(len + sizeof(*icmp6hr)) + htons(IPPROTO_ICMPV6); + + icmp6hr->icmp6_cksum = 0; + /* Add in partial checksum for the ICMPv6 header alone */ + psum += sum_16b(icmp6hr, sizeof(*icmp6hr)); + icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum); +} + /** * csum_tcp4() - Calculate TCP checksum for IPv4 and set in place * @iph: Packet buffer, IP header diff --git a/checksum.h b/checksum.h index 5418406..d7daabf 100644 --- a/checksum.h +++ b/checksum.h @@ -6,9 +6,14 @@ #ifndef CHECKSUM_H #define CHECKSUM_H +struct icmp6hdr; + uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); +void csum_icmp6(struct icmp6hdr *icmp6hr, + const struct in6_addr *saddr, const struct in6_addr *daddr, + const void *payload, size_t len); void csum_tcp4(struct iphdr *iph); uint16_t csum(const void *buf, size_t len, uint32_t init); diff --git a/ndp.c b/ndp.c index dec36a9..03f1d06 100644 --- a/ndp.c +++ b/ndp.c @@ -189,10 +189,7 @@ dns_done: ip6hr->saddr = c->ip6.addr_ll; ip6hr->payload_len = htons(sizeof(*ihr) + len); - ip6hr->hop_limit = IPPROTO_ICMPV6; - ihr->icmp6_cksum = 0; - ihr->icmp6_cksum = csum_unaligned(ip6hr, sizeof(*ip6hr) + - sizeof(*ihr) + len, 0); + csum_icmp6(ihr, &ip6hr->saddr, &ip6hr->daddr, ihr + 1, len); ip6hr->version = 6; ip6hr->nexthdr = IPPROTO_ICMPV6; diff --git a/tap.c b/tap.c index 8b6d9bc..aafc92b 100644 --- a/tap.c +++ b/tap.c @@ -191,10 +191,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, } else if (proto == IPPROTO_ICMPV6) { struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1); - ih->icmp6_cksum = 0; - ih->icmp6_cksum = csum_unaligned(ip6h, - len + sizeof(*ip6h), - 0); + csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr, + ih + 1, len - sizeof(*ih)); } ip6h->version = 6; ip6h->nexthdr = proto; -- 2.37.3
Although tap_ip_send() is currently the only place calculating ICMP checksums, create a helper function for symmetry with ICMPv6. For future flexibility it allows the ICMPv6 header and payload to be in separate buffers. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- checksum.c | 16 ++++++++++++++++ checksum.h | 2 ++ tap.c | 4 +--- 3 files changed, 19 insertions(+), 3 deletions(-) diff --git a/checksum.c b/checksum.c index 78c6960..f35c948 100644 --- a/checksum.c +++ b/checksum.c @@ -52,6 +52,7 @@ #include <stddef.h> #include <stdint.h> +#include <linux/icmp.h> #include <linux/icmpv6.h> /** @@ -107,6 +108,21 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) return (uint16_t)~csum_fold(sum_16b(buf, len) + init); } +/** + * csum_icmp4() - Calculate and set checksum for an ICMP packet + * @icmp4hr: ICMP header, initialised apart from checksum + * @payload: ICMP packet payload + * @len: Length of @payload (not including ICMP header) + */ +void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) +{ + /* Partial checksum for ICMP header alone */ + uint32_t psum = sum_16b(icmp4hr, sizeof(*icmp4hr)); + + icmp4hr->checksum = 0; + icmp4hr->checksum = csum_unaligned(payload, len, psum); +} + /** * csum_icmp6() - Calculate and set checksum for an ICMPv6 packet * @icmp6hr: ICMPv6 header, initialised apart from checksum diff --git a/checksum.h b/checksum.h index d7daabf..bf0620f 100644 --- a/checksum.h +++ b/checksum.h @@ -6,11 +6,13 @@ #ifndef CHECKSUM_H #define CHECKSUM_H +struct icmphdr; struct icmp6hdr; uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); +void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len); void csum_icmp6(struct icmp6hdr *icmp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); diff --git a/tap.c b/tap.c index aafc92b..f082901 100644 --- a/tap.c +++ b/tap.c @@ -148,9 +148,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, uh->check = 0; } else if (iph->protocol == IPPROTO_ICMP) { struct icmphdr *ih = (struct icmphdr *)(iph + 1); - - ih->checksum = 0; - ih->checksum = csum_unaligned(ih, len, 0); + csum_icmp4(ih, ih + 1, len - sizeof(*ih)); } if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh), 1) < 0) -- 2.37.3
Add a helper for calculating UDP checksums when used over IPv6 For future flexibility, the new helper takes parameters for the fields in the IPv6 pseudo-header, so an IPv6 header or pseudo-header doesn't need to be explicitly constructed. It also allows the UDP header and payload to be in separate buffers, although we don't use this yet. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- checksum.c | 22 ++++++++++++++++++++++ checksum.h | 4 ++++ tap.c | 5 ++--- 3 files changed, 28 insertions(+), 3 deletions(-) diff --git a/checksum.c b/checksum.c index f35c948..175381d 100644 --- a/checksum.c +++ b/checksum.c @@ -52,6 +52,7 @@ #include <stddef.h> #include <stdint.h> +#include <linux/udp.h> #include <linux/icmp.h> #include <linux/icmpv6.h> @@ -123,6 +124,27 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) icmp4hr->checksum = csum_unaligned(payload, len, psum); } +/** + * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet + * @udp6hr: UDP header, initialised apart from checksum + * @payload: UDP packet payload + * @len: Length of @payload (not including UDP header) + */ +void csum_udp6(struct udphdr *udp6hr, + const struct in6_addr *saddr, const struct in6_addr *daddr, + const void *payload, size_t len) +{ + /* Partial checksum for the pseudo-IPv6 header */ + uint32_t psum = sum_16b(saddr, sizeof(*saddr)) + + sum_16b(daddr, sizeof(*daddr)) + + htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP); + + udp6hr->check = 0; + /* Add in partial checksum for the UDP header alone */ + psum += sum_16b(udp6hr, sizeof(*udp6hr)); + udp6hr->check = csum_unaligned(payload, len, psum); +} + /** * csum_icmp6() - Calculate and set checksum for an ICMPv6 packet * @icmp6hr: ICMPv6 header, initialised apart from checksum diff --git a/checksum.h b/checksum.h index bf0620f..2bb2ff9 100644 --- a/checksum.h +++ b/checksum.h @@ -6,6 +6,7 @@ #ifndef CHECKSUM_H #define CHECKSUM_H +struct udphdr; struct icmphdr; struct icmp6hdr; @@ -13,6 +14,9 @@ uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len); +void csum_udp6(struct udphdr *udp6hr, + const struct in6_addr *saddr, const struct in6_addr *daddr, + const void *payload, size_t len); void csum_icmp6(struct icmp6hdr *icmp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); diff --git a/tap.c b/tap.c index f082901..9c197cb 100644 --- a/tap.c +++ b/tap.c @@ -183,9 +183,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, } else if (proto == IPPROTO_UDP) { struct udphdr *uh = (struct udphdr *)(ip6h + 1); - uh->check = 0; - uh->check = csum_unaligned(ip6h, len + sizeof(*ip6h), - 0); + csum_udp6(uh, &ip6h->saddr, &ip6h->daddr, + uh + 1, len - sizeof(*uh)); } else if (proto == IPPROTO_ICMPV6) { struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1); -- 2.37.3
At least two places in passt fill in UDP over IPv4 checksums, although since UDP checksums are optional with IPv4 that just amounts to storing a 0 (in tap_ip_send()) or leaving a 0 from an earlier initialization (in dhcp()). For consistency, add a helper for this "calculation". Just for the heck of it, add the option (compile time disabled for now) to calculate real UDP checksums. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- checksum.c | 33 +++++++++++++++++++++++++++++++++ checksum.h | 2 ++ dhcp.c | 2 +- tap.c | 2 +- 4 files changed, 37 insertions(+), 2 deletions(-) diff --git a/checksum.c b/checksum.c index 175381d..cf6fc31 100644 --- a/checksum.c +++ b/checksum.c @@ -56,6 +56,12 @@ #include <linux/icmp.h> #include <linux/icmpv6.h> +/* Checksums are optional for UDP over IPv4, so we usually just set + * them to 0. Change this to 1 to calculate real UDP over IPv4 + * checksums + */ +#define UDP4_REAL_CHECKSUMS 0 + /** * sum_16b() - Calculate sum of 16-bit words * @buf: Input buffer @@ -109,6 +115,33 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) return (uint16_t)~csum_fold(sum_16b(buf, len) + init); } +/** + * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet + * @udp4hr: UDP header, initialised apart from checksum + * @saddr: IPv4 source address + * @daddr: IPv4 destination address + * @payload: ICMPv4 packet payload + * @len: Length of @payload (not including UDP) + */ +void csum_udp4(struct udphdr *udp4hr, in_addr_t saddr, in_addr_t daddr, + const void *payload, size_t len) +{ + /* UDP checksums are optional, so don't bother */ + udp4hr->check = 0; + + if (UDP4_REAL_CHECKSUMS) { + /* UNTESTED: if we did want real UDPv4 checksums, this + * is roughly what we'd need */ + uint32_t psum = csum_fold(htonl(saddr)) + + csum_fold(htonl(daddr)) + + htons(len + sizeof(*udp4hr)) + + htons(IPPROTO_UDP); + /* Add in partial checksum for the UDP header alone */ + psum += sum_16b(udp4hr, sizeof(*udp4hr)); + udp4hr->check = csum_unaligned(payload, len, psum); + } +} + /** * csum_icmp4() - Calculate and set checksum for an ICMP packet * @icmp4hr: ICMP header, initialised apart from checksum diff --git a/checksum.h b/checksum.h index 2bb2ff9..2a5e915 100644 --- a/checksum.h +++ b/checksum.h @@ -13,6 +13,8 @@ struct icmp6hdr; uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); +void csum_udp4(struct udphdr *udp4hr, in_addr_t saddr, in_addr_t daddr, + const void *payload, size_t len); void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len); void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, diff --git a/dhcp.c b/dhcp.c index 7f0cc0b..8dcf645 100644 --- a/dhcp.c +++ b/dhcp.c @@ -364,9 +364,9 @@ int dhcp(const struct ctx *c, const struct pool *p) opt_set_dns_search(c, sizeof(m->o)); uh->len = htons(len = offsetof(struct msg, o) + fill(m) + sizeof(*uh)); - uh->check = 0; uh->source = htons(67); uh->dest = htons(68); + csum_udp4(uh, c->ip4.gw, c->ip4.addr, uh + 1, len - sizeof(*uh)); iph->tot_len = htons(len += sizeof(*iph)); iph->daddr = c->ip4.addr; diff --git a/tap.c b/tap.c index 9c197cb..58fc1de 100644 --- a/tap.c +++ b/tap.c @@ -145,7 +145,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, } else if (iph->protocol == IPPROTO_UDP) { struct udphdr *uh = (struct udphdr *)(iph + 1); - uh->check = 0; + csum_udp4(uh, iph->saddr, iph->daddr, uh + 1, len - sizeof(*uh)); } else if (iph->protocol == IPPROTO_ICMP) { struct icmphdr *ih = (struct icmphdr *)(iph + 1); csum_icmp4(ih, ih + 1, len - sizeof(*ih)); -- 2.37.3
We calculate IPv4 header checksums in at least two places, in dhcp() and in tap_ip_send. Add a helper to handle this calculation in both places. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- checksum.c | 10 ++++++++++ checksum.h | 1 + dhcp.c | 3 +-- tap.c | 3 +-- 4 files changed, 13 insertions(+), 4 deletions(-) diff --git a/checksum.c b/checksum.c index cf6fc31..7b83196 100644 --- a/checksum.c +++ b/checksum.c @@ -115,6 +115,16 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) return (uint16_t)~csum_fold(sum_16b(buf, len) + init); } +/** + * csum_ip4_header() - Calculate and set IPv4 header checksum + * @ip4h: IPv4 header + */ +void csum_ip4_header(struct iphdr *ip4h) +{ + ip4h->check = 0; + ip4h->check = csum_unaligned(ip4h, (size_t)ip4h->ihl * 4, 0); +} + /** * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet * @udp4hr: UDP header, initialised apart from checksum diff --git a/checksum.h b/checksum.h index 2a5e915..91e9954 100644 --- a/checksum.h +++ b/checksum.h @@ -13,6 +13,7 @@ struct icmp6hdr; uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); +void csum_ip4_header(struct iphdr *ip4h); void csum_udp4(struct udphdr *udp4hr, in_addr_t saddr, in_addr_t daddr, const void *payload, size_t len); void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len); diff --git a/dhcp.c b/dhcp.c index 8dcf645..875e18b 100644 --- a/dhcp.c +++ b/dhcp.c @@ -371,8 +371,7 @@ int dhcp(const struct ctx *c, const struct pool *p) iph->tot_len = htons(len += sizeof(*iph)); iph->daddr = c->ip4.addr; iph->saddr = c->ip4.gw; - iph->check = 0; - iph->check = csum_unaligned(iph, (intptr_t)(iph->ihl * 4), 0); + csum_ip4_header(iph); len += sizeof(*eh); memcpy(eh->h_dest, eh->h_source, ETH_ALEN); diff --git a/tap.c b/tap.c index 58fc1de..de02c56 100644 --- a/tap.c +++ b/tap.c @@ -135,8 +135,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, iph->daddr = c->ip4.addr_seen; memcpy(&iph->saddr, &src->s6_addr[12], 4); - iph->check = 0; - iph->check = csum_unaligned(iph, (size_t)iph->ihl * 4, 0); + csum_ip4_header(iph); memcpy(data, in, len); -- 2.37.3
tap_ip_send() doesn't take a destination address, because it's specifically for inbound packets, and the IP addresses of the guest/namespace are already known to us. Rather than open-coding this destination address logic, make helper functions for it which will enable some later cleanups. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 33 ++++++++++++++++++++++++++++----- tap.h | 3 +++ 2 files changed, 31 insertions(+), 5 deletions(-) diff --git a/tap.c b/tap.c index de02c56..89be383 100644 --- a/tap.c +++ b/tap.c @@ -96,6 +96,32 @@ int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre) return write(c->fd_tap, (char *)data + (vnet_pre ? 4 : 0), len); } +/** + * tap_ip4_daddr() - Normal IPv4 destination address for inbound packets + * @c: Execution context + * + * Returns: IPv4 address, network order + */ +in_addr_t tap_ip4_daddr(const struct ctx *c) +{ + return c->ip4.addr_seen; +} + +/** + * tap_ip6_daddr() - Normal IPv4 destination address for inbound packets + * @c: Execution context + * @src: Source address + * + * Returns: pointer to IPv6 address + */ +const struct in6_addr *tap_ip6_daddr(const struct ctx *c, + const struct in6_addr *src) +{ + if (IN6_IS_ADDR_LINKLOCAL(src)) + return &c->ip6.addr_ll_seen; + return &c->ip6.addr_seen; +} + /** * tap_ip_send() - Send IP packet, with L2 headers, calculating L3/L4 checksums * @c: Execution context @@ -132,7 +158,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, iph->frag_off = 0; iph->ttl = 255; iph->protocol = proto; - iph->daddr = c->ip4.addr_seen; + iph->daddr = tap_ip4_daddr(c); memcpy(&iph->saddr, &src->s6_addr[12], 4); csum_ip4_header(iph); @@ -163,10 +189,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, ip6h->priority = 0; ip6h->saddr = *src; - if (IN6_IS_ADDR_LINKLOCAL(src)) - ip6h->daddr = c->ip6.addr_ll_seen; - else - ip6h->daddr = c->ip6.addr_seen; + ip6h->daddr = *tap_ip6_daddr(c, src); memcpy(data, in, len); diff --git a/tap.h b/tap.h index df3aec0..a6764b4 100644 --- a/tap.h +++ b/tap.h @@ -6,6 +6,9 @@ #ifndef TAP_H #define TAP_H +in_addr_t tap_ip4_daddr(const struct ctx *c); +const struct in6_addr *tap_ip6_daddr(const struct ctx *c, + const struct in6_addr *src); void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, const char *in, size_t len, uint32_t flow); int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre); -- 2.37.3
tap_ip_send() is never used for TCP packets, we're unlikely to use it for that in future, and the handling of TCP packets makes other cleanups unnecessarily awkward. Remove it. This is the only user of csum_tcp4(), so we can remove that as well. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- checksum.c | 34 ---------------------------------- checksum.h | 1 - tap.c | 11 ++--------- 3 files changed, 2 insertions(+), 44 deletions(-) diff --git a/checksum.c b/checksum.c index 7b83196..09d2c7c 100644 --- a/checksum.c +++ b/checksum.c @@ -211,40 +211,6 @@ void csum_icmp6(struct icmp6hdr *icmp6hr, icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum); } -/** - * csum_tcp4() - Calculate TCP checksum for IPv4 and set in place - * @iph: Packet buffer, IP header - */ -void csum_tcp4(struct iphdr *iph) -{ - uint16_t tlen = ntohs(iph->tot_len) - iph->ihl * 4, *p; - struct tcphdr *th; - uint32_t sum = 0; - - th = (struct tcphdr *)((char *)iph + (intptr_t)(iph->ihl * 4)); - p = (uint16_t *)th; - - sum += (iph->saddr >> 16) & 0xffff; - sum += iph->saddr & 0xffff; - sum += (iph->daddr >> 16) & 0xffff; - sum += iph->daddr & 0xffff; - - sum += htons(IPPROTO_TCP); - sum += htons(tlen); - - th->check = 0; - while (tlen > 1) { - sum += *p++; - tlen -= 2; - } - - if (tlen > 0) { - sum += *p & htons(0xff00); - } - - th->check = (uint16_t)~csum_fold(sum); -} - #ifdef __AVX2__ #include <immintrin.h> diff --git a/checksum.h b/checksum.h index 91e9954..b87b0d6 100644 --- a/checksum.h +++ b/checksum.h @@ -23,7 +23,6 @@ void csum_udp6(struct udphdr *udp6hr, void csum_icmp6(struct icmp6hdr *icmp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); -void csum_tcp4(struct iphdr *iph); uint16_t csum(const void *buf, size_t len, uint32_t init); #endif /* CHECKSUM_H */ diff --git a/tap.c b/tap.c index 89be383..844ee43 100644 --- a/tap.c +++ b/tap.c @@ -165,9 +165,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, memcpy(data, in, len); - if (iph->protocol == IPPROTO_TCP) { - csum_tcp4(iph); - } else if (iph->protocol == IPPROTO_UDP) { + if (iph->protocol == IPPROTO_UDP) { struct udphdr *uh = (struct udphdr *)(iph + 1); csum_udp4(uh, iph->saddr, iph->daddr, uh + 1, len - sizeof(*uh)); @@ -196,13 +194,8 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, ip6h->hop_limit = proto; ip6h->version = 0; ip6h->nexthdr = 0; - if (proto == IPPROTO_TCP) { - struct tcphdr *th = (struct tcphdr *)(ip6h + 1); - th->check = 0; - th->check = csum_unaligned(ip6h, len + sizeof(*ip6h), - 0); - } else if (proto == IPPROTO_UDP) { + if (proto == IPPROTO_UDP) { struct udphdr *uh = (struct udphdr *)(ip6h + 1); csum_udp6(uh, &ip6h->saddr, &ip6h->daddr, -- 2.37.3
Callers of tap_send() can optionally use a small optimization by adding extra space for the 4 byte length header used on the qemu socket interface. tap_ip_send() is currently the only user of this, but this is used only for "slow path" ICMP and DHCP packets, so there's not a lot of value to the optimization. Worse, having the two paths here complicates the interface and makes future cleanups difficult, so just remove it. I have some plans to bring back the optimization in a more general way in future, but for now it's just in the way. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- arp.c | 2 +- dhcp.c | 2 +- ndp.c | 2 +- tap.c | 29 +++++++++-------------------- tap.h | 2 +- 5 files changed, 13 insertions(+), 24 deletions(-) diff --git a/arp.c b/arp.c index 0ad97af..141d43f 100644 --- a/arp.c +++ b/arp.c @@ -81,7 +81,7 @@ int arp(const struct ctx *c, const struct pool *p) memcpy(eh->h_dest, eh->h_source, sizeof(eh->h_dest)); memcpy(eh->h_source, c->mac, sizeof(eh->h_source)); - if (tap_send(c, eh, len, 0) < 0) + if (tap_send(c, eh, len) < 0) perror("ARP: send"); return 1; diff --git a/dhcp.c b/dhcp.c index 875e18b..2b3af82 100644 --- a/dhcp.c +++ b/dhcp.c @@ -377,7 +377,7 @@ int dhcp(const struct ctx *c, const struct pool *p) memcpy(eh->h_dest, eh->h_source, ETH_ALEN); memcpy(eh->h_source, c->mac, ETH_ALEN); - if (tap_send(c, eh, len, 0) < 0) + if (tap_send(c, eh, len) < 0) perror("DHCP: send"); return 1; diff --git a/ndp.c b/ndp.c index 03f1d06..79be0cf 100644 --- a/ndp.c +++ b/ndp.c @@ -200,7 +200,7 @@ dns_done: memcpy(ehr->h_source, c->mac, ETH_ALEN); ehr->h_proto = htons(ETH_P_IPV6); - if (tap_send(c, ehr, len, 0) < 0) + if (tap_send(c, ehr, len) < 0) perror("NDP: send"); return 1; diff --git a/tap.c b/tap.c index 844ee43..07592dd 100644 --- a/tap.c +++ b/tap.c @@ -66,34 +66,24 @@ static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_buf); * @c: Execution context * @data: Packet buffer * @len: Total L2 packet length - * @vnet_pre: Buffer has four-byte headroom * * Return: return code from send() or write() */ -int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre) +int tap_send(const struct ctx *c, const void *data, size_t len) { - if (vnet_pre) - pcap((char *)data + 4, len); - else - pcap(data, len); + pcap(data, len); if (c->mode == MODE_PASST) { int flags = MSG_NOSIGNAL | MSG_DONTWAIT; + uint32_t vnet_len = htonl(len); - if (vnet_pre) { - *((uint32_t *)data) = htonl(len); - len += 4; - } else { - uint32_t vnet_len = htonl(len); - - if (send(c->fd_tap, &vnet_len, 4, flags) < 0) - return -1; - } + if (send(c->fd_tap, &vnet_len, 4, flags) < 0) + return -1; return send(c->fd_tap, data, len, flags); } - return write(c->fd_tap, (char *)data + (vnet_pre ? 4 : 0), len); + return write(c->fd_tap, (char *)data, len); } /** @@ -135,10 +125,9 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, const char *in, size_t len, uint32_t flow) { char buf[USHRT_MAX]; - char *pkt = buf + 4; struct ethhdr *eh; - eh = (struct ethhdr *)pkt; + eh = (struct ethhdr *)buf; /* TODO: ARP table lookup */ memcpy(eh->h_dest, c->mac_guest, ETH_ALEN); @@ -174,7 +163,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, csum_icmp4(ih, ih + 1, len - sizeof(*ih)); } - if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh), 1) < 0) + if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh)) < 0) debug("tap: failed to send %lu bytes (IPv4)", len); } else { struct ipv6hdr *ip6h = (struct ipv6hdr *)(eh + 1); @@ -215,7 +204,7 @@ void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, ip6h->flow_lbl[2] = (flow >> 0) & 0xff; } - if (tap_send(c, buf, len + sizeof(*ip6h) + sizeof(*eh), 1) < 1) + if (tap_send(c, buf, len + sizeof(*ip6h) + sizeof(*eh)) < 1) debug("tap: failed to send %lu bytes (IPv6)", len); } } diff --git a/tap.h b/tap.h index a6764b4..a8da8bb 100644 --- a/tap.h +++ b/tap.h @@ -11,7 +11,7 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c, const struct in6_addr *src); void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, const char *in, size_t len, uint32_t flow); -int tap_send(const struct ctx *c, const void *data, size_t len, int vnet_pre); +int tap_send(const struct ctx *c, const void *data, size_t len); void tap_handler(struct ctx *c, int fd, uint32_t events, const struct timespec *now); void tap_sock_init(struct ctx *c); -- 2.37.3
The IPv4 and IPv6 paths in tap_ip_send() have very little in common, and it turns out that every caller (statically) knows if it is using IPv4 or IPv6. So split into separate tap_ip4_send() and tap_ip6_send() functions. Use a new tap_l2_hdr() function for the very small common part. While we're there, make some minor cleanups: - We were double writing some fields in the IPv6 header, so that it temporary matched the pseudo-header for checksum calculation. With recent checksum reworks, this isn't neccessary any more. - We don't use any IPv4 header options, so use some sizeof() constructs instead of some open coded values for header length. - The comment used to say that the flow label was for TCP over IPv6, but in fact the only thing we used it for was DHCPv6 over UDP traffic Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- dhcpv6.c | 6 +- icmp.c | 10 +--- tap.c | 177 +++++++++++++++++++++++++++++-------------------------- tap.h | 6 +- 4 files changed, 103 insertions(+), 96 deletions(-) diff --git a/dhcpv6.c b/dhcpv6.c index e7640ce..7829968 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -531,8 +531,8 @@ int dhcpv6(struct ctx *c, const struct pool *p, resp_not_on_link.hdr.xid = mh->xid; - tap_ip_send(c, src, IPPROTO_UDP, - (char *)&resp_not_on_link, n, mh->xid); + tap_ip6_send(c, src, IPPROTO_UDP, + (char *)&resp_not_on_link, n, mh->xid); return 1; } @@ -580,7 +580,7 @@ int dhcpv6(struct ctx *c, const struct pool *p, resp.hdr.xid = mh->xid; - tap_ip_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid); + tap_ip6_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid); c->ip6.addr_seen = c->ip6.addr; return 1; diff --git a/icmp.c b/icmp.c index 21ea2d7..61c2d90 100644 --- a/icmp.c +++ b/icmp.c @@ -69,10 +69,6 @@ static uint8_t icmp_act[IP_VERSIONS][DIV_ROUND_UP(ICMP_NUM_IDS, 8)]; void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events, const struct timespec *now) { - struct in6_addr a6 = { .s6_addr = { 0, 0, 0, 0, - 0, 0, 0, 0, - 0, 0, 0xff, 0xff, - 0, 0, 0, 0 } }; union icmp_epoll_ref *iref = &ref.r.p.icmp; struct sockaddr_storage sr; socklen_t sl = sizeof(sr); @@ -109,7 +105,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, icmp_id_map[V6][id].seq = seq; } - tap_ip_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0); + tap_ip6_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0); } else { struct sockaddr_in *sr4 = (struct sockaddr_in *)&sr; struct icmphdr *ih = (struct icmphdr *)buf; @@ -127,9 +123,7 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, icmp_id_map[V4][id].seq = seq; } - memcpy(&a6.s6_addr[12], &sr4->sin_addr, sizeof(sr4->sin_addr)); - - tap_ip_send(c, &a6, IPPROTO_ICMP, buf, n, 0); + tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n); } } diff --git a/tap.c b/tap.c index 07592dd..0e8c99b 100644 --- a/tap.c +++ b/tap.c @@ -113,100 +113,111 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c, } /** - * tap_ip_send() - Send IP packet, with L2 headers, calculating L3/L4 checksums + * tap_push_l2h() - Build an L2 header for an inbound packet * @c: Execution context - * @src: IPv6 source address, IPv4-mapped for IPv4 sources - * @proto: L4 protocol number - * @in: Payload - * @len: L4 payload length - * @flow: Flow label for TCP over IPv6 + * @buf: Buffer address at which to generate header + * @proto: Ethernet protocol number for L3 + * + * Return: pointer at which to write the packet's payload */ -void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, - const char *in, size_t len, uint32_t flow) +static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto) { - char buf[USHRT_MAX]; - struct ethhdr *eh; - - eh = (struct ethhdr *)buf; + struct ethhdr *eh = (struct ethhdr *)buf; /* TODO: ARP table lookup */ memcpy(eh->h_dest, c->mac_guest, ETH_ALEN); memcpy(eh->h_source, c->mac, ETH_ALEN); + eh->h_proto = ntohs(proto); + return eh + 1; +} - if (IN6_IS_ADDR_V4MAPPED(src)) { - struct iphdr *iph = (struct iphdr *)(eh + 1); - char *data = (char *)(iph + 1); - - eh->h_proto = ntohs(ETH_P_IP); - - iph->version = 4; - iph->ihl = 5; - iph->tos = 0; - iph->tot_len = htons(len + 20); - iph->id = 0; - iph->frag_off = 0; - iph->ttl = 255; - iph->protocol = proto; - iph->daddr = tap_ip4_daddr(c); - memcpy(&iph->saddr, &src->s6_addr[12], 4); - - csum_ip4_header(iph); - - memcpy(data, in, len); - - if (iph->protocol == IPPROTO_UDP) { - struct udphdr *uh = (struct udphdr *)(iph + 1); - - csum_udp4(uh, iph->saddr, iph->daddr, uh + 1, len - sizeof(*uh)); - } else if (iph->protocol == IPPROTO_ICMP) { - struct icmphdr *ih = (struct icmphdr *)(iph + 1); - csum_icmp4(ih, ih + 1, len - sizeof(*ih)); - } - - if (tap_send(c, buf, len + sizeof(*iph) + sizeof(*eh)) < 0) - debug("tap: failed to send %lu bytes (IPv4)", len); - } else { - struct ipv6hdr *ip6h = (struct ipv6hdr *)(eh + 1); - char *data = (char *)(ip6h + 1); - - eh->h_proto = ntohs(ETH_P_IPV6); - - memset(ip6h->flow_lbl, 0, 3); - ip6h->payload_len = htons(len); - ip6h->priority = 0; - - ip6h->saddr = *src; - ip6h->daddr = *tap_ip6_daddr(c, src); - - memcpy(data, in, len); - - ip6h->hop_limit = proto; - ip6h->version = 0; - ip6h->nexthdr = 0; - - if (proto == IPPROTO_UDP) { - struct udphdr *uh = (struct udphdr *)(ip6h + 1); - - csum_udp6(uh, &ip6h->saddr, &ip6h->daddr, - uh + 1, len - sizeof(*uh)); - } else if (proto == IPPROTO_ICMPV6) { - struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1); +/** + * tap_ip4_send() - Send IPv4 packet, with L2 headers, calculating L3/L4 checksums + * @c: Execution context + * @src: IPv4 source address + * @proto: L4 protocol number + * @in: Payload + * @len: L4 payload length + */ +void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto, + const char *in, size_t len) +{ + char buf[USHRT_MAX]; + struct iphdr *ip4h = (struct iphdr *)tap_push_l2h(c, buf, ETH_P_IP); + char *data = (char *)(ip4h + 1); + + ip4h->version = 4; + ip4h->ihl = sizeof(struct iphdr) / 4; + ip4h->tos = 0; + ip4h->tot_len = htons(len + sizeof(*ip4h)); + ip4h->id = 0; + ip4h->frag_off = 0; + ip4h->ttl = 255; + ip4h->protocol = proto; + ip4h->saddr = src; + ip4h->daddr = tap_ip4_daddr(c); + csum_ip4_header(ip4h); + + memcpy(data, in, len); + + if (ip4h->protocol == IPPROTO_UDP) { + struct udphdr *uh = (struct udphdr *)(ip4h + 1); + + csum_udp4(uh, ip4h->saddr, ip4h->daddr, + uh + 1, len - sizeof(*uh)); + } else if (ip4h->protocol == IPPROTO_ICMP) { + struct icmphdr *ih = (struct icmphdr *)(ip4h + 1); + csum_icmp4(ih, ih + 1, len - sizeof(*ih)); + } - csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr, - ih + 1, len - sizeof(*ih)); - } - ip6h->version = 6; - ip6h->nexthdr = proto; - ip6h->hop_limit = 255; - if (flow) { - ip6h->flow_lbl[0] = (flow >> 16) & 0xf; - ip6h->flow_lbl[1] = (flow >> 8) & 0xff; - ip6h->flow_lbl[2] = (flow >> 0) & 0xff; - } + if (tap_send(c, buf, len + (data - buf)) < 0) + debug("tap: failed to send %lu bytes (IPv4)", len); +} - if (tap_send(c, buf, len + sizeof(*ip6h) + sizeof(*eh)) < 1) - debug("tap: failed to send %lu bytes (IPv6)", len); +/** + * tap_ip6_send() - Send IPv6 packet, with L2 headers, calculating L3/L4 checksums + * @c: Execution context + * @src: IPv6 source address + * @proto: L4 protocol number + * @in: Payload + * @len: L4 payload length + * @flow: Flow label + */ +void tap_ip6_send(const struct ctx *c, const struct in6_addr *src, + uint8_t proto, const char *in, size_t len, uint32_t flow) +{ + char buf[USHRT_MAX]; + struct ipv6hdr *ip6h = + (struct ipv6hdr *)tap_push_l2h(c, buf, ETH_P_IPV6); + char *data = (char *)(ip6h + 1); + + ip6h->payload_len = htons(len); + ip6h->priority = 0; + ip6h->version = 6; + ip6h->nexthdr = proto; + ip6h->hop_limit = 255; + ip6h->saddr = *src; + ip6h->daddr = *tap_ip6_daddr(c, src); + ip6h->flow_lbl[0] = (flow >> 16) & 0xf; + ip6h->flow_lbl[1] = (flow >> 8) & 0xff; + ip6h->flow_lbl[2] = (flow >> 0) & 0xff; + + memcpy(data, in, len); + + if (proto == IPPROTO_UDP) { + struct udphdr *uh = (struct udphdr *)(ip6h + 1); + + csum_udp6(uh, &ip6h->saddr, &ip6h->daddr, + uh + 1, len - sizeof(*uh)); + } else if (proto == IPPROTO_ICMPV6) { + struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1); + + csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr, + ih + 1, len - sizeof(*ih)); } + + if (tap_send(c, buf, len + (data - buf)) < 1) + debug("tap: failed to send %lu bytes (IPv6)", len); } PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf); diff --git a/tap.h b/tap.h index a8da8bb..011ba8e 100644 --- a/tap.h +++ b/tap.h @@ -9,8 +9,10 @@ in_addr_t tap_ip4_daddr(const struct ctx *c); const struct in6_addr *tap_ip6_daddr(const struct ctx *c, const struct in6_addr *src); -void tap_ip_send(const struct ctx *c, const struct in6_addr *src, uint8_t proto, - const char *in, size_t len, uint32_t flow); +void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto, + const char *in, size_t len); +void tap_ip6_send(const struct ctx *c, const struct in6_addr *src, + uint8_t proto, const char *in, size_t len, uint32_t flow); int tap_send(const struct ctx *c, const void *data, size_t len); void tap_handler(struct ctx *c, int fd, uint32_t events, const struct timespec *now); -- 2.37.3
tap_ip6_send() has special case logic to compute the checksums for UDP and ICMP packets, which is a mild layering violation. By using a suitable helper we can split it into tap_udp6_send() and tap_icmp6_send() functions without greatly increasing the code size, this removing that layering violation. We make some small changes to the interface while there. In both cases we make the destination IPv6 address a parameter, which will be useful later. For the UDP variant we make it take just the UDP payload, and it will generate the UDP header. For the ICMP variant we pass in the ICMP header as before. The inconsistency is because that's what seems to be the more natural way to invoke the function in the callers in each case. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- dhcpv6.c | 21 +++------------ icmp.c | 3 ++- tap.c | 82 ++++++++++++++++++++++++++++++++++++++++++-------------- tap.h | 9 +++++-- 4 files changed, 75 insertions(+), 40 deletions(-) diff --git a/dhcpv6.c b/dhcpv6.c index 7829968..e763aed 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -208,15 +208,8 @@ struct msg_hdr { uint32_t xid:24; } __attribute__((__packed__)); -#if __BYTE_ORDER == __BIG_ENDIAN -#define UH_RESP {{{ 547, 546, 0, 0, }}} -#else -#define UH_RESP {{{ __bswap_constant_16(547), __bswap_constant_16(546), 0, 0 }}} -#endif - /** * struct resp_t - Normal advertise and reply message - * @uh: UDP header * @hdr: DHCP message header * @server_id: Server Identifier option * @ia_na: Non-temporary Address option @@ -226,7 +219,6 @@ struct msg_hdr { * @dns_search: Domain Search List, here just for storage size */ static struct resp_t { - struct udphdr uh; struct msg_hdr hdr; struct opt_server_id server_id; @@ -236,7 +228,6 @@ static struct resp_t { struct opt_dns_servers dns_servers; struct opt_dns_search dns_search; } __attribute__((__packed__)) resp = { - UH_RESP, { 0 }, SERVER_ID, @@ -270,13 +261,11 @@ static const struct opt_status_code sc_not_on_link = { /** * struct resp_not_on_link_t - NotOnLink error (mandated by RFC 8415, 18.3.2.) - * @uh: UDP header * @hdr: DHCP message header * @server_id: Server Identifier option * @var: Payload: IA_NA from client, status code, client ID */ static struct resp_not_on_link_t { - struct udphdr uh; struct msg_hdr hdr; struct opt_server_id server_id; @@ -284,7 +273,6 @@ static struct resp_not_on_link_t { uint8_t var[sizeof(struct opt_ia_na) + sizeof(struct opt_status_code) + sizeof(struct opt_client_id)]; } __attribute__((__packed__)) resp_not_on_link = { - UH_RESP, { TYPE_REPLY, 0 }, SERVER_ID, { 0, }, @@ -527,12 +515,11 @@ int dhcpv6(struct ctx *c, const struct pool *p, n += sizeof(struct opt_hdr) + ntohs(client_id->l); n = offsetof(struct resp_not_on_link_t, var) + n; - resp_not_on_link.uh.len = htons(n); resp_not_on_link.hdr.xid = mh->xid; - tap_ip6_send(c, src, IPPROTO_UDP, - (char *)&resp_not_on_link, n, mh->xid); + tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546, + mh->xid, &resp_not_on_link, n); return 1; } @@ -576,11 +563,11 @@ int dhcpv6(struct ctx *c, const struct pool *p, n = offsetof(struct resp_t, client_id) + sizeof(struct opt_hdr) + ntohs(client_id->l); n = dhcpv6_dns_fill(c, (char *)&resp, n); - resp.uh.len = htons(n); resp.hdr.xid = mh->xid; - tap_ip6_send(c, src, IPPROTO_UDP, (char *)&resp, n, mh->xid); + tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546, + mh->xid, &resp, n); c->ip6.addr_seen = c->ip6.addr; return 1; diff --git a/icmp.c b/icmp.c index 61c2d90..6493ea9 100644 --- a/icmp.c +++ b/icmp.c @@ -105,7 +105,8 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, icmp_id_map[V6][id].seq = seq; } - tap_ip6_send(c, &sr6->sin6_addr, IPPROTO_ICMPV6, buf, n, 0); + tap_icmp6_send(c, &sr6->sin6_addr, + tap_ip6_daddr(c, &sr6->sin6_addr), buf, n); } else { struct sockaddr_in *sr4 = (struct sockaddr_in *)&sr; struct icmphdr *ih = (struct icmphdr *)buf; diff --git a/tap.c b/tap.c index 0e8c99b..135d799 100644 --- a/tap.c +++ b/tap.c @@ -175,21 +175,22 @@ void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto, } /** - * tap_ip6_send() - Send IPv6 packet, with L2 headers, calculating L3/L4 checksums + * tap_push_ip6h() - Build IPv6 header for inbound packet * @c: Execution context * @src: IPv6 source address - * @proto: L4 protocol number - * @in: Payload + * @dst: IPv6 destination address * @len: L4 payload length - * @flow: Flow label + * @proto: L4 protocol number + * @flow: IPv6 flow identifier + * + * Return: pointer at which to write the packet's payload */ -void tap_ip6_send(const struct ctx *c, const struct in6_addr *src, - uint8_t proto, const char *in, size_t len, uint32_t flow) +static void *tap_push_ip6h(char *buf, + const struct in6_addr *src, + const struct in6_addr *dst, + size_t len, uint8_t proto, uint32_t flow) { - char buf[USHRT_MAX]; - struct ipv6hdr *ip6h = - (struct ipv6hdr *)tap_push_l2h(c, buf, ETH_P_IPV6); - char *data = (char *)(ip6h + 1); + struct ipv6hdr *ip6h = (struct ipv6hdr *)buf; ip6h->payload_len = htons(len); ip6h->priority = 0; @@ -197,24 +198,65 @@ void tap_ip6_send(const struct ctx *c, const struct in6_addr *src, ip6h->nexthdr = proto; ip6h->hop_limit = 255; ip6h->saddr = *src; - ip6h->daddr = *tap_ip6_daddr(c, src); + ip6h->daddr = *dst; ip6h->flow_lbl[0] = (flow >> 16) & 0xf; ip6h->flow_lbl[1] = (flow >> 8) & 0xff; ip6h->flow_lbl[2] = (flow >> 0) & 0xff; + return ip6h + 1; +} +/** + * tap_udp6_send() - Send UDP over IPv6 packet + * @c: Execution context + * @src: IPv6 source address + * @sport: UDP source port + * @dst: IPv6 destination address + * @dport: UDP destination port + * @flow: Flow label + * @in: UDP payload contents (not including UDP header) + * @len: UDP payload length (not including UDP header) + */ +void tap_udp6_send(const struct ctx *c, + const struct in6_addr *src, in_port_t sport, + const struct in6_addr *dst, in_port_t dport, + uint32_t flow, const void *in, size_t len) +{ + size_t udplen = len + sizeof(struct udphdr); + char buf[USHRT_MAX]; + void *ip6h = tap_push_l2h(c, buf, ETH_P_IPV6); + void *uhp = tap_push_ip6h(ip6h, src, dst, udplen, IPPROTO_UDP, flow); + struct udphdr *uh = (struct udphdr *)uhp; + char *data = (char *)(uh + 1); + + uh->source = htons(sport); + uh->dest = htons(dport); + uh->len = htons(udplen); + csum_udp6(uh, src, dst, in, len); memcpy(data, in, len); - if (proto == IPPROTO_UDP) { - struct udphdr *uh = (struct udphdr *)(ip6h + 1); + if (tap_send(c, buf, len + (data - buf)) < 1) + debug("tap: failed to send %lu bytes (IPv6)", len); +} - csum_udp6(uh, &ip6h->saddr, &ip6h->daddr, - uh + 1, len - sizeof(*uh)); - } else if (proto == IPPROTO_ICMPV6) { - struct icmp6hdr *ih = (struct icmp6hdr *)(ip6h + 1); +/** + * tap_icmp6_send() - Send ICMPv6 packet + * @c: Execution context + * @src: IPv6 source address + * @dst: IPv6 destination address + * @in: ICMP packet, including ICMP header + * @len: ICMP packet length, including ICMP header + */ +void tap_icmp6_send(const struct ctx *c, + const struct in6_addr *src, const struct in6_addr *dst, + void *in, size_t len) +{ + char buf[USHRT_MAX]; + void *ip6h = tap_push_l2h(c, buf, ETH_P_IPV6); + char *data = tap_push_ip6h(ip6h, src, dst, len, IPPROTO_ICMPV6, 0); + struct icmp6hdr *icmp6h = (struct icmp6hdr *)data; - csum_icmp6(ih, &ip6h->saddr, &ip6h->daddr, - ih + 1, len - sizeof(*ih)); - } + memcpy(data, in, len); + csum_icmp6(icmp6h, src, dst, icmp6h + 1, len - sizeof(*icmp6h)); if (tap_send(c, buf, len + (data - buf)) < 1) debug("tap: failed to send %lu bytes (IPv6)", len); diff --git a/tap.h b/tap.h index 011ba8e..d43c7a0 100644 --- a/tap.h +++ b/tap.h @@ -11,8 +11,13 @@ const struct in6_addr *tap_ip6_daddr(const struct ctx *c, const struct in6_addr *src); void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto, const char *in, size_t len); -void tap_ip6_send(const struct ctx *c, const struct in6_addr *src, - uint8_t proto, const char *in, size_t len, uint32_t flow); +void tap_udp6_send(const struct ctx *c, + const struct in6_addr *src, in_port_t sport, + const struct in6_addr *dst, in_port_t dport, + uint32_t flow, const void *in, size_t len); +void tap_icmp6_send(const struct ctx *c, + const struct in6_addr *src, const struct in6_addr *dst, + void *in, size_t len); int tap_send(const struct ctx *c, const void *data, size_t len); void tap_handler(struct ctx *c, int fd, uint32_t events, const struct timespec *now); -- 2.37.3
ndp() takes a parameter giving the ethernet source address of the packet it is to respond to, which it uses to determine the destination address to send the reply packet to. This is not necessary, because the address will always be the guest's MAC address. Even if the guest has just changed MAC address, then either tap_handler_passt() or tap_handler_pasta() - which are the only call paths leading to ndp() will have updated c->mac_guest with the new value. So, remove the parameter, and just use c->mac_guest, making it more consistent with other paths where we construct packets to send inwards. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- ndp.c | 6 ++---- ndp.h | 3 +-- tap.c | 2 +- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git a/ndp.c b/ndp.c index 79be0cf..f96b4b7 100644 --- a/ndp.c +++ b/ndp.c @@ -41,13 +41,11 @@ * ndp() - Check for NDP solicitations, reply as needed * @c: Execution context * @ih: ICMPv6 header - * @eh_source: Source Ethernet address * @saddr Source IPv6 address * * Return: 0 if not handled here, 1 if handled, -1 on failure */ -int ndp(struct ctx *c, const struct icmp6hdr *ih, - const unsigned char *eh_source, const struct in6_addr *saddr) +int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr) { char buf[BUFSIZ] = { 0 }; struct ipv6hdr *ip6hr; @@ -196,7 +194,7 @@ dns_done: ip6hr->hop_limit = 255; len += sizeof(*ehr) + sizeof(*ip6hr) + sizeof(*ihr); - memcpy(ehr->h_dest, eh_source, ETH_ALEN); + memcpy(ehr->h_dest, c->mac_guest, ETH_ALEN); memcpy(ehr->h_source, c->mac, ETH_ALEN); ehr->h_proto = htons(ETH_P_IPV6); diff --git a/ndp.h b/ndp.h index d857425..b012747 100644 --- a/ndp.h +++ b/ndp.h @@ -6,7 +6,6 @@ #ifndef NDP_H #define NDP_H -int ndp(struct ctx *c, const struct icmp6hdr *ih, - const unsigned char *eh_source, const struct in6_addr *saddr); +int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr); #endif /* NDP_H */ diff --git a/tap.c b/tap.c index 135d799..0031d82 100644 --- a/tap.c +++ b/tap.c @@ -576,7 +576,7 @@ resume: if (l4_len < sizeof(struct icmp6hdr)) continue; - if (ndp(c, (struct icmp6hdr *)l4h, eh->h_source, saddr)) + if (ndp(c, (struct icmp6hdr *)l4h, saddr)) continue; tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1); -- 2.37.3
We send ICMPv6 packets to the guest from both icmp.c and from ndp.c. The case in ndp() manually constructs L2 and IPv6 headers, unlike the version in icmp.c which uses the tap_icmp6_send() helper from tap.c Now that we've broaded the parameters of tap_icmp6_send() we can use it in ndp() as well saving some duplicated logic. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- ndp.c | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-) diff --git a/ndp.c b/ndp.c index f96b4b7..80e1f19 100644 --- a/ndp.c +++ b/ndp.c @@ -47,6 +47,7 @@ */ int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr) { + const struct in6_addr *rsaddr; /* src addr for reply */ char buf[BUFSIZ] = { 0 }; struct ipv6hdr *ip6hr; struct icmp6hdr *ihr; @@ -180,26 +181,12 @@ dns_done: else c->ip6.addr_seen = *saddr; - ip6hr->daddr = *saddr; if (IN6_IS_ADDR_LINKLOCAL(&c->ip6.gw)) - ip6hr->saddr = c->ip6.gw; + rsaddr = &c->ip6.gw; else - ip6hr->saddr = c->ip6.addr_ll; + rsaddr = &c->ip6.addr_ll; - ip6hr->payload_len = htons(sizeof(*ihr) + len); - csum_icmp6(ihr, &ip6hr->saddr, &ip6hr->daddr, ihr + 1, len); - - ip6hr->version = 6; - ip6hr->nexthdr = IPPROTO_ICMPV6; - ip6hr->hop_limit = 255; - - len += sizeof(*ehr) + sizeof(*ip6hr) + sizeof(*ihr); - memcpy(ehr->h_dest, c->mac_guest, ETH_ALEN); - memcpy(ehr->h_source, c->mac, ETH_ALEN); - ehr->h_proto = htons(ETH_P_IPV6); - - if (tap_send(c, ehr, len) < 0) - perror("NDP: send"); + tap_icmp6_send(c, rsaddr, saddr, ihr, len + sizeof(*ihr)); return 1; } -- 2.37.3
tap_ip4_send() has special case logic to compute the checksums for UDP and ICMP packets, which is a mild layering violation. By using a suitable helper we can split it into tap_udp4_send() and tap_icmp4_send() functions without greatly increasing the code size, this removing that layering violation. We make some small changes to the interface while there. In both cases we make the destination IPv4 address a parameter, which will be useful later. For the UDP variant we make it take just the UDP payload, and it will generate the UDP header. For the ICMP variant we pass in the ICMP header as before. The inconsistency is because that's what seems to be the more natural way to invoke the function in the callers in each case. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- icmp.c | 3 ++- tap.c | 77 ++++++++++++++++++++++++++++++++++++++++++++-------------- tap.h | 7 ++++-- 3 files changed, 66 insertions(+), 21 deletions(-) diff --git a/icmp.c b/icmp.c index 6493ea9..233acf9 100644 --- a/icmp.c +++ b/icmp.c @@ -124,7 +124,8 @@ void icmp_sock_handler(const struct ctx *c, union epoll_ref ref, icmp_id_map[V4][id].seq = seq; } - tap_ip4_send(c, sr4->sin_addr.s_addr, IPPROTO_ICMP, buf, n); + tap_icmp4_send(c, sr4->sin_addr.s_addr, tap_ip4_daddr(c), + buf, n); } } diff --git a/tap.c b/tap.c index 0031d82..d250a0b 100644 --- a/tap.c +++ b/tap.c @@ -132,19 +132,19 @@ static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto) } /** - * tap_ip4_send() - Send IPv4 packet, with L2 headers, calculating L3/L4 checksums + * tap_push_ip4h() - Build IPv4 header for inbound packet, with checksum * @c: Execution context - * @src: IPv4 source address - * @proto: L4 protocol number - * @in: Payload + * @src: IPv4 source address, network order + * @dst: IPv4 destination address, network order * @len: L4 payload length + * @proto: L4 protocol number + * + * Return: pointer at which to write the packet's payload */ -void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto, - const char *in, size_t len) +static void *tap_push_ip4h(char *buf, in_addr_t src, in_addr_t dst, + size_t len, uint8_t proto) { - char buf[USHRT_MAX]; - struct iphdr *ip4h = (struct iphdr *)tap_push_l2h(c, buf, ETH_P_IP); - char *data = (char *)(ip4h + 1); + struct iphdr *ip4h = (struct iphdr *)buf; ip4h->version = 4; ip4h->ihl = sizeof(struct iphdr) / 4; @@ -155,20 +155,61 @@ void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto, ip4h->ttl = 255; ip4h->protocol = proto; ip4h->saddr = src; - ip4h->daddr = tap_ip4_daddr(c); + ip4h->daddr = dst; csum_ip4_header(ip4h); + return ip4h + 1; +} +/** + * tap_udp4_send() - Send UDP over IPv4 packet + * @c: Execution context + * @src: IPv4 source address + * @sport: UDP source port + * @dst: IPv4 destination address + * @dport: UDP destination port + * @in: UDP payload contents (not including UDP header) + * @len: UDP payload length (not including UDP header) + */ +/* cppcheck-suppress unusedFunction */ +void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport, + in_addr_t dst, in_port_t dport, + const void *in, size_t len) +{ + size_t udplen = len + sizeof(struct udphdr); + char buf[USHRT_MAX]; + void *ip4h = tap_push_l2h(c, buf, ETH_P_IP); + void *uhp = tap_push_ip4h(ip4h, src, dst, udplen, IPPROTO_UDP); + struct udphdr *uh = (struct udphdr *)uhp; + char *data = (char *)(uh + 1); + + uh->source = htons(sport); + uh->dest = htons(dport); + uh->len = htons(udplen); + csum_udp4(uh, src, dst, in, len); memcpy(data, in, len); - if (ip4h->protocol == IPPROTO_UDP) { - struct udphdr *uh = (struct udphdr *)(ip4h + 1); + if (tap_send(c, buf, len + (data - buf)) < 0) + debug("tap: failed to send %lu bytes (IPv4)", len); +} - csum_udp4(uh, ip4h->saddr, ip4h->daddr, - uh + 1, len - sizeof(*uh)); - } else if (ip4h->protocol == IPPROTO_ICMP) { - struct icmphdr *ih = (struct icmphdr *)(ip4h + 1); - csum_icmp4(ih, ih + 1, len - sizeof(*ih)); - } +/** + * tap_icmp4_send() - Send ICMPv4 packet + * @c: Execution context + * @src: IPv4 source address + * @dst: IPv4 destination address + * @in: ICMP packet, including ICMP header + * @len: ICMP packet length, including ICMP header + */ +void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst, + void *in, size_t len) +{ + char buf[USHRT_MAX]; + void *ip4h = tap_push_l2h(c, buf, ETH_P_IP); + char *data = tap_push_ip4h(ip4h, src, dst, len, IPPROTO_ICMP); + struct icmphdr *icmp4h = (struct icmphdr *)data; + + memcpy(data, in, len); + csum_icmp4(icmp4h, icmp4h + 1, len - sizeof(*icmp4h)); if (tap_send(c, buf, len + (data - buf)) < 0) debug("tap: failed to send %lu bytes (IPv4)", len); diff --git a/tap.h b/tap.h index d43c7a0..743bc58 100644 --- a/tap.h +++ b/tap.h @@ -7,10 +7,13 @@ #define TAP_H in_addr_t tap_ip4_daddr(const struct ctx *c); +void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport, + in_addr_t dst, in_port_t dport, + const void *in, size_t len); +void tap_icmp4_send(const struct ctx *c, in_addr_t src, in_addr_t dst, + void *in, size_t len); const struct in6_addr *tap_ip6_daddr(const struct ctx *c, const struct in6_addr *src); -void tap_ip4_send(const struct ctx *c, in_addr_t src, uint8_t proto, - const char *in, size_t len); void tap_udp6_send(const struct ctx *c, const struct in6_addr *src, in_port_t sport, const struct in6_addr *dst, in_port_t dport, -- 2.37.3
The IPv4 specific dhcp() manually constructs L2 and IP headers to send its DHCP reply packet, unlike its IPv6 equivalent in dhcpv6.c which uses the tap_udp6_send() helper. Now that we've broaded the parameters to tap_udp4_send() we can use it in dhcp() to avoid some duplicated logic. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- dhcp.c | 18 ++---------------- tap.c | 1 - 2 files changed, 2 insertions(+), 17 deletions(-) diff --git a/dhcp.c b/dhcp.c index 2b3af82..d22698a 100644 --- a/dhcp.c +++ b/dhcp.c @@ -363,22 +363,8 @@ int dhcp(const struct ctx *c, const struct pool *p) if (!c->no_dhcp_dns_search) opt_set_dns_search(c, sizeof(m->o)); - uh->len = htons(len = offsetof(struct msg, o) + fill(m) + sizeof(*uh)); - uh->source = htons(67); - uh->dest = htons(68); - csum_udp4(uh, c->ip4.gw, c->ip4.addr, uh + 1, len - sizeof(*uh)); - - iph->tot_len = htons(len += sizeof(*iph)); - iph->daddr = c->ip4.addr; - iph->saddr = c->ip4.gw; - csum_ip4_header(iph); - - len += sizeof(*eh); - memcpy(eh->h_dest, eh->h_source, ETH_ALEN); - memcpy(eh->h_source, c->mac, ETH_ALEN); - - if (tap_send(c, eh, len) < 0) - perror("DHCP: send"); + len = offsetof(struct msg, o) + fill(m); + tap_udp4_send(c, c->ip4.gw, 67, c->ip4.addr, 68, m, len); return 1; } diff --git a/tap.c b/tap.c index d250a0b..3f78c99 100644 --- a/tap.c +++ b/tap.c @@ -170,7 +170,6 @@ static void *tap_push_ip4h(char *buf, in_addr_t src, in_addr_t dst, * @in: UDP payload contents (not including UDP header) * @len: UDP payload length (not including UDP header) */ -/* cppcheck-suppress unusedFunction */ void tap_udp4_send(const struct ctx *c, in_addr_t src, in_port_t sport, in_addr_t dst, in_port_t dport, const void *in, size_t len) -- 2.37.3
On Wed, 19 Oct 2022 11:43:43 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:The main packet "fast paths" for UDP and TCP mostly just forward packets rather than generating them from scratch. However the control paths for ICMP and DHCP sometimes generate packets more or less from scratch. Because these are relatively rare, it's not performance critical. The paths for sending these packets have some duplication of the header generation. There's also some layering violation in tap_ip_send() which both generates IP headers and updates the L4 (UDP or UCMP) checksum. Finally that checksum generation is a little awkward: it temporarily generates the IP pseudo header (or something close enough to serve) in the place of the actual header, generates the checksum, then replaces it with the real IP header. This approach seems to be causing miscompiles with some LTO optimization, because the stores to the pseudo header are being moved or elided across the code calculating the checksum. This series addresses all of these. We consolidate and clarify the packet sending helpers, and use them in some places there was previously duplicated code. In the process we use new checksum generation helpers which take a different approach which should avoid the LTO problems (this aspect I haven't tested yet though). Changes since v1: * Numerous minor style changes * Rename header generation helpers to make their behaviour clearer * Added several missing function doc comments * Corrected some erroneous statements and terms in commentsThanks, it looks good to me! I'm travelling, I'll apply in a bit. -- Stefano
On Wed, 19 Oct 2022 11:43:43 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:The main packet "fast paths" for UDP and TCP mostly just forward packets rather than generating them from scratch. However the control paths for ICMP and DHCP sometimes generate packets more or less from scratch. Because these are relatively rare, it's not performance critical. The paths for sending these packets have some duplication of the header generation. There's also some layering violation in tap_ip_send() which both generates IP headers and updates the L4 (UDP or UCMP) checksum. Finally that checksum generation is a little awkward: it temporarily generates the IP pseudo header (or something close enough to serve) in the place of the actual header, generates the checksum, then replaces it with the real IP header. This approach seems to be causing miscompiles with some LTO optimization, because the stores to the pseudo header are being moved or elided across the code calculating the checksum. This series addresses all of these. We consolidate and clarify the packet sending helpers, and use them in some places there was previously duplicated code. In the process we use new checksum generation helpers which take a different approach which should avoid the LTO problems (this aspect I haven't tested yet though). Changes since v1: * Numerous minor style changes * Rename header generation helpers to make their behaviour clearer * Added several missing function doc comments * Corrected some erroneous statements and terms in commentsApplied now, thanks, and sorry for the delay. -- Stefano