Laurent's changes to split TCP buffers into various components with
IOVs are now merged. This series has a batch of small cleanups to make
the handling of this slightly nicer. These are preliminaries to doing
something similar with the UDP buffers.

Note that patch 10/10 might interfere with the experiments to work out
what is going wrong with the odd batching / performance issues we've
seen. We can leave it off for the time being if that's a problem.

Changes since v1:
 * Added new patch removing some unused structures
 * Added two new patches cleaning up some endianness confusion
 * Added a bunch of missing cases to standardisation of length
   variable names
 * Assorted minor revisions based on Stefano's review

David Gibson (10):
  checksum: Use proto_ipv6_header_psum() for ICMPv6 as well
  tap: Split tap specific and L2 (ethernet) headers
  tap: Remove unused structs tap_msg, tap_l4_msg
  treewide: Remove misleading and redundant endianness notes
  checksum: Make csum_ip4_header() take a host endian length
  treewide: Standardise variable names for various packet lengths
  tcp: Simplify packet length calculation when preparing headers
  tap, tcp: (Re-)abstract TAP specific header handling
  iov: Helper macro to construct iovs covering existing variables or
    fields
  tcp: Update tap specific header too in tcp_fill_headers[46]()

 arp.c      |   6 +-
 checksum.c |  68 ++++++++++-----------
 checksum.h |  12 ++--
 dhcp.c     |   6 +-
 icmp.c     |  12 ++--
 iov.h      |   3 +
 ndp.c      |   6 +-
 passt.h    |  28 ++-------
 pcap.c     |  14 ++---
 pcap.h     |   2 +-
 tap.c      | 146 ++++++++++++++++++++++----------------------
 tap.h      |  60 ++++++++++++------
 tcp.c      | 176 +++++++++++++++++++++++------------------------------
 udp.c      |  52 ++++++++--------
 14 files changed, 290 insertions(+), 301 deletions(-)

-- 
2.44.0
7df624e79 ("checksum: introduce functions to compute the header part
checksum for TCP/UDP") introduced a helper to compute the partial
checksum for the IPv6 pseudo-header used in L4 protocol checksums.
It used the helper in csum_udp6() for UDP packets, but not for the
identical calculation in csum_icmp6() for ICMPv6 packets. Use it there
as well.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 checksum.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/checksum.c b/checksum.c
index f8a7b539..9cbe0b29 100644
--- a/checksum.c
+++ b/checksum.c
@@ -253,10 +253,8 @@ void csum_icmp6(struct icmp6hdr *icmp6hr,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
 		const void *payload, size_t len)
 {
-	/* Partial checksum for the pseudo-IPv6 header */
-	uint32_t psum = sum_16b(saddr, sizeof(*saddr)) +
-			sum_16b(daddr, sizeof(*daddr)) +
-			htons(len + sizeof(*icmp6hr)) + htons(IPPROTO_ICMPV6);
+	uint32_t psum = proto_ipv6_header_psum(len + sizeof(*icmp6hr),
+					       IPPROTO_ICMPV6, saddr, daddr);
 
 	icmp6hr->icmp6_cksum = 0;
 	/* Add in partial checksum for the ICMPv6 header alone */
-- 
2.44.0
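
For illustration only, here is a minimal standalone sketch of the
pseudo-header sum that the helper centralises. The sketch_* names are
made up for this example, and the body is an assumption modelled on the
open-coded lines removed by the diff above; it is not passt's actual
proto_ipv6_header_psum(). The sum is unfolded and accumulates 16-bit
words in memory order, so the host-order length and protocol are
converted with htons() before being added.

```c
#include <arpa/inet.h>	/* htons() */
#include <assert.h>
#include <netinet/in.h>	/* struct in6_addr */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sum_16b(): add up a buffer as 16-bit words
 * in memory order. The length must be even for this sketch.
 */
static uint32_t sketch_sum_16b(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t sum = 0;
	size_t i;

	assert(len % 2 == 0);
	for (i = 0; i < len; i += 2) {
		uint16_t w;

		memcpy(&w, p + i, sizeof(w));	/* avoids unaligned loads */
		sum += w;
	}
	return sum;
}

/* Unfolded partial checksum of the IPv6 pseudo-header.
 * l4len and proto are host order; the addresses are network order.
 */
static uint32_t sketch_ipv6_header_psum(uint16_t l4len, uint8_t proto,
					const struct in6_addr *saddr,
					const struct in6_addr *daddr)
{
	return sketch_sum_16b(saddr, sizeof(*saddr)) +
	       sketch_sum_16b(daddr, sizeof(*daddr)) +
	       htons(l4len) + htons(proto);
}
```

With a single function like this, the UDP and ICMPv6 paths differ only
in the protocol number and header size they pass in, which is the point
of the patch.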
In some places (well, actually only UDP now) we use struct tap_hdr to
represent both tap backend specific and L2 ethernet headers. Handling
these together seemed like a good idea at the time, but Laurent's
changes in the TCP code working towards vhost-user support suggest that
treating them separately is more useful, more often.

Alter struct tap_hdr to represent only the TAP backend specific headers.
Update related helpers and the UDP code to match.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 tap.h | 21 +++++++++------------
 udp.c | 23 ++++++++++++++---------
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/tap.h b/tap.h
index 2adc4e2b..dbc23b31 100644
--- a/tap.h
+++ b/tap.h
@@ -6,30 +6,28 @@
 #ifndef TAP_H
 #define TAP_H
 
+#define ETH_HDR_INIT(proto) { .h_proto = htons_constant(proto) }
+
 /**
- * struct tap_hdr - L2 and tap specific headers
+ * struct tap_hdr - tap backend specific headers
  * @vnet_len:	Frame length (for qemu socket transport)
- * @eh:		Ethernet header
  */
 struct tap_hdr {
 	uint32_t vnet_len;
-	struct ethhdr eh;
 } __attribute__((packed));
 
-#define TAP_HDR_INIT(proto) { .eh.h_proto = htons_constant(proto) }
-
 static inline size_t tap_hdr_len_(const struct ctx *c)
 {
 	if (c->mode == MODE_PASST)
 		return sizeof(struct tap_hdr);
 	else
-		return sizeof(struct ethhdr);
+		return 0;
 }
 
 /**
  * tap_frame_base() - Find start of tap frame
  * @c:		Execution context
- * @taph:	Pointer to L2 and tap specific header buffer
+ * @taph:	Pointer to tap specific header buffer
  *
  * Returns: pointer to the start of tap frame - suitable for an
  *          iov_base to be passed to tap_send_frames())
@@ -43,17 +41,16 @@ static inline void *tap_frame_base(const struct ctx *c, struct tap_hdr *taph)
  * tap_frame_len() - Finalize tap frame and return total length
  * @c:		Execution context
  * @taph:	Tap header to finalize
- * @plen:	L3 packet length (excludes L2 and tap specific headers)
+ * @plen:	L2 packet length (includes L2, excludes tap specific headers)
  *
- * Returns: length of the tap frame including L2 and tap specific
- *          headers - suitable for an iov_len to be passed to
- *          tap_send_frames()
+ * Returns: length of the tap frame including tap specific headers - suitable
+ *          for an iov_len to be passed to tap_send_frames()
  */
 static inline size_t tap_frame_len(const struct ctx *c, struct tap_hdr *taph,
 				   size_t plen)
 {
 	if (c->mode == MODE_PASST)
-		taph->vnet_len = htonl(plen + sizeof(taph->eh));
+		taph->vnet_len = htonl(plen);
 
 	return plen + tap_hdr_len_(c);
 }
diff --git a/udp.c b/udp.c
index 594ea191..4bf90591 100644
--- a/udp.c
+++ b/udp.c
@@ -173,7 +173,8 @@ static uint8_t udp_act[IP_VERSIONS][UDP_ACT_TYPE_MAX][DIV_ROUND_UP(NUM_PORTS, 8)
 /**
  * udp4_l2_buf_t - Pre-cooked IPv4 packet buffers for tap connections
  * @s_in:	Source socket address, filled in by recvmmsg()
- * @taph:	Tap-level headers (partially pre-filled)
+ * @taph:	Tap backend specific header
+ * @eh:		Prefilled ethernet header
  * @iph:	Pre-filled IP header (except for tot_len and saddr)
  * @uh:		Headroom for UDP header
  * @data:	Storage for UDP payload
@@ -182,6 +183,7 @@ static struct udp4_l2_buf_t {
 	struct sockaddr_in s_in;
 
 	struct tap_hdr taph;
+	struct ethhdr eh;
 	struct iphdr iph;
 	struct udphdr uh;
 	uint8_t data[USHRT_MAX -
@@ -192,7 +194,8 @@ udp4_l2_buf[UDP_MAX_FRAMES];
 /**
  * udp6_l2_buf_t - Pre-cooked IPv6 packet buffers for tap connections
  * @s_in6:	Source socket address, filled in by recvmmsg()
- * @taph:	Tap-level headers (partially pre-filled)
+ * @taph:	Tap backend specific header
+ * @eh:		Pre-filled ethernet header
  * @ip6h:	Pre-filled IP header (except for payload_len and addresses)
  * @uh:		Headroom for UDP header
  * @data:	Storage for UDP payload
@@ -202,10 +205,11 @@ struct udp6_l2_buf_t {
 #ifdef __AVX2__
 	/* Align ip6h to 32-byte boundary. */
 	uint8_t pad[64 - (sizeof(struct sockaddr_in6) + sizeof(struct ethhdr) +
-			  sizeof(uint32_t))];
+			  sizeof(struct tap_hdr))];
 #endif
 
 	struct tap_hdr taph;
+	struct ethhdr eh;
 	struct ipv6hdr ip6h;
 	struct udphdr uh;
 	uint8_t data[USHRT_MAX -
@@ -289,8 +293,8 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s)
 		struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i];
 		struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i];
 
-		eth_update_mac(&b4->taph.eh, eth_d, eth_s);
-		eth_update_mac(&b6->taph.eh, eth_d, eth_s);
+		eth_update_mac(&b4->eh, eth_d, eth_s);
+		eth_update_mac(&b6->eh, eth_d, eth_s);
 	}
 }
 
@@ -307,7 +311,7 @@ static void udp_sock4_iov_init_one(const struct ctx *c, size_t i)
 	struct iovec *tiov = &udp4_l2_iov_tap[i];
 
 	*buf = (struct udp4_l2_buf_t) {
-		.taph = TAP_HDR_INIT(ETH_P_IP),
+		.eh = ETH_HDR_INIT(ETH_P_IP),
 		.iph = L2_BUF_IP4_INIT(IPPROTO_UDP)
 	};
 
@@ -335,7 +339,7 @@ static void udp_sock6_iov_init_one(const struct ctx *c, size_t i)
 	struct iovec *tiov = &udp6_l2_iov_tap[i];
 
 	*buf = (struct udp6_l2_buf_t) {
-		.taph = TAP_HDR_INIT(ETH_P_IPV6),
+		.eh = ETH_HDR_INIT(ETH_P_IPV6),
 		.ip6h = L2_BUF_IP6_INIT(IPPROTO_UDP)
 	};
 
@@ -608,7 +612,7 @@ static size_t udp_update_hdr4(const struct ctx *c, struct udp4_l2_buf_t *b,
 	b->uh.dest = htons(dstport);
 	b->uh.len = htons(datalen + sizeof(b->uh));
 
-	return tap_frame_len(c, &b->taph, ip_len);
+	return tap_frame_len(c, &b->taph, ip_len + sizeof(b->eh));
 }
 
 /**
@@ -676,7 +680,8 @@ static size_t udp_update_hdr6(const struct ctx *c, struct udp6_l2_buf_t *b,
 	b->uh.len = b->ip6h.payload_len;
 	csum_udp6(&b->uh, src, dst, b->data, datalen);
 
-	return tap_frame_len(c, &b->taph, payload_len + sizeof(b->ip6h));
+	return tap_frame_len(c, &b->taph, payload_len +
+			     sizeof(b->ip6h) + sizeof(b->eh));
 }
 
 /**
-- 
2.44.0
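
As a rough standalone sketch of the resulting split (not the passt
code itself): the tap backend header now carries only the 32-bit frame
length used by the qemu socket transport, and the caller passes in an
L2 length that already includes the ethernet header. The sketch_*
names and SKETCH_MODE_PASTA are placeholders invented for this example;
the latter stands in for any mode that has no backend header.

```c
#include <arpa/inet.h>	/* htonl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode enum; only the "passt" case has a backend header */
enum sketch_mode { SKETCH_MODE_PASST, SKETCH_MODE_PASTA };

/* Tap backend specific header only: no ethernet header in here */
struct sketch_tap_hdr {
	uint32_t vnet_len;
} __attribute__((packed));

static size_t sketch_tap_hdr_len(enum sketch_mode mode)
{
	return mode == SKETCH_MODE_PASST ? sizeof(struct sketch_tap_hdr) : 0;
}

/* Finalise the backend header and return the total frame length.
 * l2len covers the ethernet header and everything after it.
 */
static size_t sketch_tap_frame_len(enum sketch_mode mode,
				   struct sketch_tap_hdr *taph, size_t l2len)
{
	assert(taph);

	if (mode == SKETCH_MODE_PASST)
		taph->vnet_len = htonl((uint32_t)l2len);

	return l2len + sketch_tap_hdr_len(mode);
}
```

A caller with a 42-byte IP packet would pass 42 + 14 (ethernet header)
as l2len and get back 60 in the passt case, or 56 when there is no
backend header.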
Use of these structures was removed in bb708111833e ("treewide: Packet
abstraction with mandatory boundary checks"). Remove the stale
declarations.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 passt.h | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/passt.h b/passt.h
index 76026f09..89a7d094 100644
--- a/passt.h
+++ b/passt.h
@@ -9,26 +9,6 @@
 #define UNIX_SOCK_MAX		100
 #define UNIX_SOCK_PATH		"/tmp/passt_%i.socket"
 
-/**
- * struct tap_msg - Generic message descriptor for arrays of messages
- * @pkt_buf_offset:	Offset from @pkt_buf
- * @len:		Message length, with L2 headers
- */
-struct tap_msg {
-	uint32_t pkt_buf_offset;
-	uint16_t len;
-};
-
-/**
- * struct tap_l4_msg - Layer-4 message descriptor for protocol handlers
- * @pkt_buf_offset:	Offset of message from @pkt_buf
- * @l4_len:		Length of Layer-4 payload, host order
- */
-struct tap_l4_msg {
-	uint32_t pkt_buf_offset;
-	uint16_t l4_len;
-};
-
 union epoll_ref;
 
 #include <stdbool.h>
-- 
2.44.0
In general, it's much less error-prone to have the endianness of values
implied by the type, rather than just noting it in comments. We can't
always easily avoid it, because C, but we can do so when possible.

struct in_addr and in6_addr are always encoded network endian, so
noting it explicitly isn't useful. Remove those notes.

In some cases we also have endianness notes on uint8_t parameters, which
doesn't make sense: for a single byte endianness is irrelevant. Remove
those too.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 checksum.c | 18 +++++++++---------
 passt.h    |  8 ++++----
 tap.c      |  6 +++---
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/checksum.c b/checksum.c
index 9cbe0b29..a5a506c7 100644
--- a/checksum.c
+++ b/checksum.c
@@ -117,9 +117,9 @@ uint16_t csum_fold(uint32_t sum)
 /**
  * csum_ip4_header() - Calculate IPv4 header checksum
  * @tot_len:	IPv4 payload length (data + IP header, network order)
- * @protocol:	Protocol number (network order)
- * @saddr:	IPv4 source address (network order)
- * @daddr:	IPv4 destination address (network order)
+ * @protocol:	Protocol number
+ * @saddr:	IPv4 source address
+ * @daddr:	IPv4 destination address
  *
  * Return: 16-bit folded sum of the IPv4 header
  */
@@ -141,9 +141,9 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
  * proto_ipv4_header_psum() - Calculates the partial checksum of an
  *			      IPv4 header for UDP or TCP
  * @tot_len:	IPv4 Payload length (host order)
- * @proto:	Protocol number (host order)
- * @saddr:	Source address (network order)
- * @daddr:	Destination address (network order)
+ * @proto:	Protocol number
+ * @saddr:	Source address
+ * @daddr:	Destination address
  * Returns:	Partial checksum of the IPv4 header
  */
 uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
@@ -206,9 +206,9 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
  * proto_ipv6_header_psum() - Calculates the partial checksum of an
  *			      IPv6 header for UDP or TCP
  * @payload_len:	IPv6 payload length (host order)
- * @proto:		Protocol number (host order)
- * @saddr:		Source address (network order)
- * @daddr:		Destination address (network order)
+ * @proto:		Protocol number
+ * @saddr:		Source address
+ * @daddr:		Destination address
  * Returns:	Partial checksum of the IPv6 header
  */
 uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
diff --git a/passt.h b/passt.h
index 89a7d094..bc58d647 100644
--- a/passt.h
+++ b/passt.h
@@ -124,10 +124,10 @@ enum passt_modes {
  * @addr:		IPv4 address for external, routable interface
  * @addr_seen:		Latest IPv4 address seen as source from tap
  * @prefixlen:		IPv4 prefix length (netmask)
- * @gw:			Default IPv4 gateway, network order
- * @dns:		DNS addresses for DHCP, zero-terminated, network order
- * @dns_match:		Forward DNS query if sent to this address, network order
- * @dns_host:		Use this DNS on the host for forwarding, network order
+ * @gw:			Default IPv4 gateway
+ * @dns:		DNS addresses for DHCP, zero-terminated
+ * @dns_match:		Forward DNS query if sent to this address
+ * @dns_host:		Use this DNS on the host for forwarding
  * @addr_out:		Optional source address for outbound traffic
  * @ifname_out:		Optional interface name to bind outbound sockets to
  */
diff --git a/tap.c b/tap.c
index 13e4da79..d0ef6b5c 100644
--- a/tap.c
+++ b/tap.c
@@ -95,7 +95,7 @@ void tap_send_single(const struct ctx *c, const void *data, size_t len)
  * tap_ip4_daddr() - Normal IPv4 destination address for inbound packets
  * @c:		Execution context
  *
- * Return: IPv4 address, network order
+ * Return: IPv4 address
  */
 struct in_addr tap_ip4_daddr(const struct ctx *c)
 {
@@ -139,8 +139,8 @@ static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto)
 /**
  * tap_push_ip4h() - Build IPv4 header for inbound packet, with checksum
  * @c:		Execution context
- * @src:	IPv4 source address, network order
- * @dst:	IPv4 destination address, network order
+ * @src:	IPv4 source address
+ * @dst:	IPv4 destination address
  * @len:	L4 payload length
  * @proto:	L4 protocol number
  *
-- 
2.44.0
csum_ip4_header() takes the packet length as a network endian value. In
general it's very error-prone to pass non-native-endian values as a raw
integer. It's particularly bad here because this differs from other
checksum functions (e.g. proto_ipv4_header_psum()) which take host
native lengths.

It turns out all the callers have easy access to the native endian
value, so switch it to use host order like everything else.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 checksum.c | 4 ++--
 tap.c      | 6 ++++--
 tcp.c      | 2 +-
 udp.c      | 2 +-
 4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/checksum.c b/checksum.c
index a5a506c7..b330e1ef 100644
--- a/checksum.c
+++ b/checksum.c
@@ -116,7 +116,7 @@ uint16_t csum_fold(uint32_t sum)
 
 /**
  * csum_ip4_header() - Calculate IPv4 header checksum
- * @tot_len:	IPv4 payload length (data + IP header, network order)
+ * @tot_len:	IPv4 packet length (data + IP header, host order)
  * @protocol:	Protocol number
  * @saddr:	IPv4 source address
  * @daddr:	IPv4 destination address
@@ -128,7 +128,7 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
 {
 	uint32_t sum = L2_BUF_IP4_PSUM(protocol);
 
-	sum += tot_len;
+	sum += htons(tot_len);
 	sum += (saddr.s_addr >> 16) & 0xffff;
 	sum += saddr.s_addr & 0xffff;
 	sum += (daddr.s_addr >> 16) & 0xffff;
diff --git a/tap.c b/tap.c
index d0ef6b5c..230566ba 100644
--- a/tap.c
+++ b/tap.c
@@ -149,17 +149,19 @@ static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto)
 static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
 			   struct in_addr dst, size_t len, uint8_t proto)
 {
+	uint16_t tot_len = len + sizeof(*ip4h);
+
 	ip4h->version = 4;
 	ip4h->ihl = sizeof(struct iphdr) / 4;
 	ip4h->tos = 0;
-	ip4h->tot_len = htons(len + sizeof(*ip4h));
+	ip4h->tot_len = htons(tot_len);
 	ip4h->id = 0;
 	ip4h->frag_off = 0;
 	ip4h->ttl = 255;
 	ip4h->protocol = proto;
 	ip4h->saddr = src.s_addr;
 	ip4h->daddr = dst.s_addr;
-	ip4h->check = csum_ip4_header(ip4h->tot_len, proto, src, dst);
+	ip4h->check = csum_ip4_header(tot_len, proto, src, dst);
 	return ip4h + 1;
 }
 
diff --git a/tcp.c b/tcp.c
index 24f99cdf..3ba3aa4d 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1359,7 +1359,7 @@ static size_t tcp_fill_headers4(const struct ctx *c,
 	iph->daddr = c->ip4.addr_seen.s_addr;
 
 	iph->check = check ? *check :
-			     csum_ip4_header(iph->tot_len, IPPROTO_TCP,
+			     csum_ip4_header(ip_len, IPPROTO_TCP,
 					     *a4, c->ip4.addr_seen);
 
 	tcp_fill_header(th, conn, seq);
diff --git a/udp.c b/udp.c
index 4bf90591..09f98130 100644
--- a/udp.c
+++ b/udp.c
@@ -605,7 +605,7 @@ static size_t udp_update_hdr4(const struct ctx *c, struct udp4_l2_buf_t *b,
 	b->iph.tot_len = htons(ip_len);
 	b->iph.daddr = c->ip4.addr_seen.s_addr;
 	b->iph.saddr = src.s_addr;
-	b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP,
+	b->iph.check = csum_ip4_header(ip_len, IPPROTO_UDP,
 				       src, c->ip4.addr_seen);
 
 	b->uh.source = b->s_in.sin_port;
-- 
2.44.0
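
To make the convention concrete, here is a small self-contained sketch
of an IPv4 header checksum that takes the length in host order and does
the htons() conversion in exactly one place. It is an assumption-laden
illustration, not passt's csum_ip4_header(): L2_BUF_IP4_PSUM() is not
shown in this series, so the sketch open-codes the constant fields
(version 4, IHL 5, TOS 0, ID 0, no fragmentation, TTL 255, matching
tap_push_ip4h() above), and all sketch_* names are invented.

```c
#include <arpa/inet.h>	/* htons() */
#include <assert.h>
#include <netinet/in.h>	/* struct in_addr */
#include <stdint.h>

/* Fold a 32-bit ones' complement sum to 16 bits and invert it */
static uint16_t sketch_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* l3len is host order; struct in_addr is network order by definition.
 * The result is in memory (network) order, ready to store in the
 * header's checksum field.
 */
static uint16_t sketch_csum_ip4_header(uint16_t l3len, uint8_t protocol,
				       struct in_addr saddr,
				       struct in_addr daddr)
{
	uint32_t sum;

	assert(l3len >= 20);	/* at least the 20-byte header itself */

	sum = htons(0x4500);				/* version, IHL, TOS */
	sum += htons(l3len);				/* the only length conversion */
	sum += htons((uint16_t)(0xff00 | protocol));	/* TTL 255, protocol */
	sum += (saddr.s_addr >> 16) & 0xffff;
	sum += saddr.s_addr & 0xffff;
	sum += (daddr.s_addr >> 16) & 0xffff;
	sum += daddr.s_addr & 0xffff;

	return sketch_csum_fold(sum);
}
```

A caller then computes the host-order length once, stores htons() of
it in the header, and passes the same host-order value to the checksum
function, as the tap.c hunk above does with tot_len.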
At various points we need to track the lengths of a packet including or
excluding various different sets of headers. We don't always use the
same variable names for doing so. Worse, in some places we use the same
name for different things: e.g. tcp_fill_headers[46]() use ip_len for
the length including the IP headers, but then tcp_send_flag() which
calls it uses it to mean the IP payload length only.

To improve clarity, standardise on these names:
   dlen:	L4 protocol payload length ("data length")
   l4len:	dlen + length of L4 protocol header
   l3len:	l4len + length of IPv4/IPv6 header
   l2len:	l3len + length of L2 (ethernet) header

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 arp.c      |   6 +--
 checksum.c |  46 +++++++++---------
 checksum.h |  12 ++---
 dhcp.c     |   6 +--
 icmp.c     |  12 ++---
 ndp.c      |   6 +--
 pcap.c     |  14 +++---
 pcap.h     |   2 +-
 tap.c      | 137 +++++++++++++++++++++++++++--------------------------
 tap.h      |  18 +++----
 tcp.c      | 102 +++++++++++++++++++--------------------
 udp.c      |  26 +++++-----
 12 files changed, 194 insertions(+), 193 deletions(-)

diff --git a/arp.c b/arp.c
index 113cda2f..93b22c5d 100644
--- a/arp.c
+++ b/arp.c
@@ -43,7 +43,7 @@ int arp(const struct ctx *c, const struct pool *p)
 	struct ethhdr *eh;
 	struct arphdr *ah;
 	struct arpmsg *am;
-	size_t len;
+	size_t l2len;
 
 	eh = packet_get(p, 0, 0, sizeof(*eh), NULL);
 	ah = packet_get(p, 0, sizeof(*eh), sizeof(*ah), NULL);
@@ -78,11 +78,11 @@ int arp(const struct ctx *c, const struct pool *p)
 	memcpy(am->tip, am->sip, sizeof(am->tip));
 	memcpy(am->sip, swap, sizeof(am->sip));
 
-	len = sizeof(*eh) + sizeof(*ah) + sizeof(*am);
+	l2len = sizeof(*eh) + sizeof(*ah) + sizeof(*am);
 	memcpy(eh->h_dest, eh->h_source, sizeof(eh->h_dest));
 	memcpy(eh->h_source, c->mac, sizeof(eh->h_source));
 
-	tap_send_single(c, eh, len);
+	tap_send_single(c, eh, l2len);
 
 	return 1;
 }
diff --git a/checksum.c b/checksum.c
index b330e1ef..006614fc 100644
--- a/checksum.c
+++ b/checksum.c
@@ -116,19 +116,19 @@ uint16_t csum_fold(uint32_t sum)
 
 /**
  * csum_ip4_header() - Calculate IPv4 header checksum
- * @tot_len:	IPv4 packet length (data + IP header, host order)
+ * @l3len:	IPv4 packet length (host order)
  * @protocol:	Protocol number
  * @saddr:	IPv4 source address
  * @daddr:	IPv4 destination address
  *
  * Return: 16-bit folded sum of the IPv4 header
  */
-uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
+uint16_t csum_ip4_header(uint16_t l3len, uint8_t protocol,
 			 struct in_addr saddr, struct in_addr daddr)
 {
 	uint32_t sum = L2_BUF_IP4_PSUM(protocol);
 
-	sum += htons(tot_len);
+	sum += htons(l3len);
 	sum += (saddr.s_addr >> 16) & 0xffff;
 	sum += saddr.s_addr & 0xffff;
 	sum += (daddr.s_addr >> 16) & 0xffff;
@@ -140,13 +140,13 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
 /**
  * proto_ipv4_header_psum() - Calculates the partial checksum of an
  *			      IPv4 header for UDP or TCP
- * @tot_len:	IPv4 Payload length (host order)
+ * @l4len:	IPv4 Payload length (host order)
  * @proto:	Protocol number
  * @saddr:	Source address
  * @daddr:	Destination address
  * Returns:	Partial checksum of the IPv4 header
  */
-uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
+uint32_t proto_ipv4_header_psum(uint16_t l4len, uint8_t protocol,
 				struct in_addr saddr, struct in_addr daddr)
 {
 	uint32_t psum = htons(protocol);
@@ -155,7 +155,7 @@ uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
 	psum += saddr.s_addr & 0xffff;
 	psum += (daddr.s_addr >> 16) & 0xffff;
 	psum += daddr.s_addr & 0xffff;
-	psum += htons(tot_len);
+	psum += htons(l4len);
 
 	return psum;
 }
@@ -165,22 +165,22 @@ uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
  * @udp4hr:	UDP header, initialised apart from checksum
  * @saddr:	IPv4 source address
  * @daddr:	IPv4 destination address
- * @payload:	ICMPv4 packet payload
- * @len:	Length of @payload (not including UDP)
+ * @payload:	UDP packet payload
+ * @dlen:	Length of @payload (not including UDP header)
  */
 void csum_udp4(struct udphdr *udp4hr,
 	       struct in_addr saddr, struct in_addr daddr,
-	       const void *payload, size_t len)
+	       const void *payload, size_t dlen)
 {
 	/* UDP checksums are optional, so don't bother */
 	udp4hr->check = 0;
 
 	if (UDP4_REAL_CHECKSUMS) {
-		uint16_t tot_len = len + sizeof(struct udphdr);
-		uint32_t psum = proto_ipv4_header_psum(tot_len, IPPROTO_UDP,
+		uint16_t l4len = dlen + sizeof(struct udphdr);
+		uint32_t psum = proto_ipv4_header_psum(l4len, IPPROTO_UDP,
 						       saddr, daddr);
 		psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum);
-		udp4hr->check = csum(payload, len, psum);
+		udp4hr->check = csum(payload, dlen, psum);
 	}
 }
 
@@ -188,9 +188,9 @@ void csum_udp4(struct udphdr *udp4hr,
  * csum_icmp4() - Calculate and set checksum for an ICMP packet
  * @icmp4hr:	ICMP header, initialised apart from checksum
  * @payload:	ICMP packet payload
- * @len:	Length of @payload (not including ICMP header)
+ * @dlen:	Length of @payload (not including ICMP header)
  */
-void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
+void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t dlen)
 {
 	uint32_t psum;
 
@@ -199,7 +199,7 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len)
 	/* Partial checksum for ICMP header alone */
 	psum = sum_16b(icmp4hr, sizeof(*icmp4hr));
 
-	icmp4hr->checksum = csum(payload, len, psum);
+	icmp4hr->checksum = csum(payload, dlen, psum);
 }
 
 /**
@@ -227,18 +227,18 @@ uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
  * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet
  * @udp6hr:	UDP header, initialised apart from checksum
  * @payload:	UDP packet payload
- * @len:	Length of @payload (not including UDP header)
+ * @dlen:	Length of @payload (not including UDP header)
  */
 void csum_udp6(struct udphdr *udp6hr,
 	       const struct in6_addr *saddr, const struct in6_addr *daddr,
-	       const void *payload, size_t len)
+	       const void *payload, size_t dlen)
 {
-	uint32_t psum = proto_ipv6_header_psum(len + sizeof(struct udphdr),
+	uint32_t psum = proto_ipv6_header_psum(dlen + sizeof(struct udphdr),
 					       IPPROTO_UDP, saddr, daddr);
 	udp6hr->check = 0;
 
 	psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum);
-	udp6hr->check = csum(payload, len, psum);
+	udp6hr->check = csum(payload, dlen, psum);
 }
 
 /**
@@ -247,19 +247,19 @@ void csum_udp6(struct udphdr *udp6hr,
  * @saddr:	IPv6 source address
  * @daddr:	IPv6 destination address
  * @payload:	ICMP packet payload
- * @len:	Length of @payload (not including ICMPv6 header)
+ * @dlen:	Length of @payload (not including ICMPv6 header)
  */
 void csum_icmp6(struct icmp6hdr *icmp6hr,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
-		const void *payload, size_t len)
+		const void *payload, size_t dlen)
 {
-	uint32_t psum = proto_ipv6_header_psum(len + sizeof(*icmp6hr),
+	uint32_t psum = proto_ipv6_header_psum(dlen + sizeof(*icmp6hr),
 					       IPPROTO_ICMPV6, saddr, daddr);
 
 	icmp6hr->icmp6_cksum = 0;
 	/* Add in partial checksum for the ICMPv6 header alone */
 	psum += sum_16b(icmp6hr, sizeof(*icmp6hr));
-	icmp6hr->icmp6_cksum = csum(payload, len, psum);
+	icmp6hr->icmp6_cksum = csum(payload, dlen, psum);
 }
 
 #ifdef __AVX2__
diff --git a/checksum.h b/checksum.h
index 0f396767..c5964ac7 100644
--- a/checksum.h
+++ b/checksum.h
@@ -13,23 +13,23 @@ struct icmp6hdr;
 uint32_t sum_16b(const void *buf, size_t len);
 uint16_t csum_fold(uint32_t sum);
 uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init);
-uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol,
+uint16_t csum_ip4_header(uint16_t l3len, uint8_t protocol,
 			 struct in_addr saddr, struct in_addr daddr);
-uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol,
+uint32_t proto_ipv4_header_psum(uint16_t l4len, uint8_t protocol,
 				struct in_addr saddr, struct in_addr daddr);
 void csum_udp4(struct udphdr *udp4hr,
 	       struct in_addr saddr, struct in_addr daddr,
-	       const void *payload, size_t len);
-void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len);
+	       const void *payload, size_t dlen);
+void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t dlen);
 uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol,
 				const struct in6_addr *saddr,
 				const struct in6_addr *daddr);
 void csum_udp6(struct udphdr *udp6hr,
 	       const struct in6_addr *saddr, const struct in6_addr *daddr,
-	       const void *payload, size_t len);
+	       const void *payload, size_t dlen);
 void csum_icmp6(struct icmp6hdr *icmp6hr,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
-		const void *payload, size_t len);
+		const void *payload, size_t dlen);
 uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init);
 uint16_t csum(const void *buf, size_t len, uint32_t init);
 uint16_t csum_iov(const struct iovec *iov, size_t n, uint32_t init);
diff --git a/dhcp.c b/dhcp.c
index ff4834a3..aa9f59da 100644
--- a/dhcp.c
+++ b/dhcp.c
@@ -275,7 +275,7 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len)
  */
 int dhcp(const struct ctx *c, const struct pool *p)
 {
-	size_t mlen, len, offset = 0, opt_len, opt_off = 0;
+	size_t mlen, dlen, offset = 0, opt_len, opt_off = 0;
 	const struct ethhdr *eh;
 	const struct iphdr *iph;
 	const struct udphdr *uh;
@@ -377,8 +377,8 @@ int dhcp(const struct ctx *c, const struct pool *p)
 	if (!c->no_dhcp_dns_search)
 		opt_set_dns_search(c, sizeof(m->o));
 
-	len = offsetof(struct msg, o) + fill(m);
-	tap_udp4_send(c, c->ip4.gw, 67, c->ip4.addr, 68, m, len);
+	dlen = offsetof(struct msg, o) + fill(m);
+	tap_udp4_send(c, c->ip4.gw, 67, c->ip4.addr, 68, m, dlen);
 
 	return 1;
 }
diff --git a/icmp.c b/icmp.c
index 76bb9e9f..1c5cf84b 100644
--- a/icmp.c
+++ b/icmp.c
@@ -224,8 +224,8 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 	union sockaddr_inany sa = { .sa_family = af };
 	const socklen_t sl = af == AF_INET ? sizeof(sa.sa4) : sizeof(sa.sa6);
 	struct icmp_ping_flow *pingf, **id_sock;
+	size_t dlen, l4len;
 	uint16_t id, seq;
-	size_t plen;
 	void *pkt;
 
 	(void)saddr;
@@ -234,11 +234,11 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 	if (af == AF_INET) {
 		const struct icmphdr *ih;
 
-		if (!(pkt = packet_get(p, 0, 0, sizeof(*ih), &plen)))
+		if (!(pkt = packet_get(p, 0, 0, sizeof(*ih), &dlen)))
 			return 1;
 
 		ih = (struct icmphdr *)pkt;
-		plen += sizeof(*ih);
+		l4len = dlen + sizeof(*ih);
 
 		if (ih->type != ICMP_ECHO)
 			return 1;
@@ -250,11 +250,11 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 	} else if (af == AF_INET6) {
 		const struct icmp6hdr *ih;
 
-		if (!(pkt = packet_get(p, 0, 0, sizeof(*ih), &plen)))
+		if (!(pkt = packet_get(p, 0, 0, sizeof(*ih), &dlen)))
 			return 1;
 
 		ih = (struct icmp6hdr *)pkt;
-		plen += sizeof(*ih);
+		l4len = dlen + sizeof(*ih);
 
 		if (ih->icmp6_type != ICMPV6_ECHO_REQUEST)
 			return 1;
@@ -274,7 +274,7 @@ int icmp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af,
 
 	pingf->ts = now->tv_sec;
 
-	if (sendto(pingf->sock, pkt, plen, MSG_NOSIGNAL, &sa.sa, sl) < 0) {
+	if (sendto(pingf->sock, pkt, l4len, MSG_NOSIGNAL, &sa.sa, sl) < 0) {
 		flow_dbg(pingf, "failed to relay request to socket: %s",
 			 strerror(errno));
 	} else {
diff --git a/ndp.c b/ndp.c
index c58f4b22..cea3df5d 100644
--- a/ndp.c
+++ b/ndp.c
@@ -54,7 +54,7 @@ int ndp(struct ctx *c, const struct icmp6hdr *ih, const struct in6_addr *saddr)
 	struct icmp6hdr *ihr;
 	struct ethhdr *ehr;
 	unsigned char *p;
-	size_t len;
+	size_t dlen;
 
 	if (ih->icmp6_type < RS || ih->icmp6_type > NA)
 		return 0;
@@ -175,7 +175,7 @@ dns_done:
 		return 1;
 	}
 
-	len = (uintptr_t)p - (uintptr_t)ihr - sizeof(*ihr);
+	dlen = (uintptr_t)p - (uintptr_t)ihr - sizeof(*ihr);
 
 	if (IN6_IS_ADDR_LINKLOCAL(saddr))
 		c->ip6.addr_ll_seen = *saddr;
@@ -187,7 +187,7 @@ dns_done:
 	else
 		rsaddr = &c->ip6.addr_ll;
 
-	tap_icmp6_send(c, rsaddr, saddr, ihr, len + sizeof(*ihr));
+	tap_icmp6_send(c, rsaddr, saddr, ihr, dlen + sizeof(*ihr));
 
 	return 1;
 }
diff --git a/pcap.c b/pcap.c
index 45bbfcd4..507be2ac 100644
--- a/pcap.c
+++ b/pcap.c
@@ -79,30 +79,30 @@ struct pcap_pkthdr {
 static void pcap_frame(const struct iovec *iov, size_t iovcnt,
 		       size_t offset, const struct timespec *now)
 {
-	size_t len = iov_size(iov, iovcnt) - offset;
+	size_t l2len = iov_size(iov, iovcnt) - offset;
 	struct pcap_pkthdr h = {
 		.tv_sec = now->tv_sec,
 		.tv_usec = DIV_ROUND_CLOSEST(now->tv_nsec, 1000),
-		.caplen = len,
-		.len = len
+		.caplen = l2len,
+		.len = l2len
 	};
 	struct iovec hiov = { &h, sizeof(h) };
 
 	if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 ||
 	    write_remainder(pcap_fd, iov, iovcnt, offset) < 0) {
 		debug("Cannot log packet, length %zu: %s",
-		      len, strerror(errno));
+		      l2len, strerror(errno));
 	}
 }
 
 /**
  * pcap() - Capture a single frame to pcap file
  * @pkt:	Pointer to data buffer, including L2 headers
- * @len:	L2 packet length
+ * @l2len:	L2 frame length
  */
-void pcap(const char *pkt, size_t len)
+void pcap(const char *pkt, size_t l2len)
 {
-	struct iovec iov = { (char *)pkt, len };
+	struct iovec iov = { (char *)pkt, l2len };
 	struct timespec now;
 
 	if (pcap_fd == -1)
diff --git a/pcap.h b/pcap.h
index 15d46572..53392374 100644
--- a/pcap.h
+++ b/pcap.h
@@ -6,7 +6,7 @@
 #ifndef PCAP_H
 #define PCAP_H
 
-void pcap(const char *pkt, size_t len);
+void pcap(const char *pkt, size_t l2len);
 void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n,
 		   size_t offset);
 void pcap_iov(const struct iovec *iov, size_t iovcnt);
diff --git a/tap.c b/tap.c
index 230566ba..c6864ac6 100644
--- a/tap.c
+++ b/tap.c
@@ -70,11 +70,11 @@ static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_buf);
  * tap_send_single() - Send a single frame
  * @c:		Execution context
  * @data:	Packet buffer
- * @len:	Total L2 packet length
+ * @l2len:	Total L2 packet length
  */
-void tap_send_single(const struct ctx *c, const void *data, size_t len)
+void tap_send_single(const struct ctx *c, const void *data, size_t l2len)
 {
-	uint32_t vnet_len = htonl(len);
+	uint32_t vnet_len = htonl(l2len);
 	struct iovec iov[2];
 	size_t iovcnt = 0;
 
@@ -85,7 +85,7 @@ void tap_send_single(const struct ctx *c, const void *data, size_t len)
 	}
 
 	iov[iovcnt].iov_base = (void *)data;
-	iov[iovcnt].iov_len = len;
+	iov[iovcnt].iov_len = l2len;
 	iovcnt++;
 
 	tap_send_frames(c, iov, iovcnt, 1);
@@ -141,27 +141,27 @@ static void *tap_push_l2h(const struct ctx *c, void *buf, uint16_t proto)
  * @c:		Execution context
  * @src:	IPv4 source address
  * @dst:	IPv4 destination address
- * @len:	L4 payload length
+ * @l4len:	IPv4 payload length
  * @proto:	L4 protocol number
  *
  * Return: pointer at which to write the packet's payload
  */
 static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
-			   struct in_addr dst, size_t len, uint8_t proto)
+			   struct in_addr dst, size_t l4len, uint8_t proto)
 {
-	uint16_t tot_len = len + sizeof(*ip4h);
+	uint16_t l3len = l4len + sizeof(*ip4h);
 
 	ip4h->version = 4;
 	ip4h->ihl = sizeof(struct iphdr) / 4;
 	ip4h->tos = 0;
-	ip4h->tot_len = htons(tot_len);
+	ip4h->tot_len = htons(l3len);
 	ip4h->id = 0;
 	ip4h->frag_off = 0;
 	ip4h->ttl = 255;
 	ip4h->protocol = proto;
 	ip4h->saddr = src.s_addr;
 	ip4h->daddr = dst.s_addr;
-	ip4h->check = csum_ip4_header(tot_len, proto, src, dst);
+	ip4h->check = csum_ip4_header(l3len, proto, src, dst);
 	return ip4h + 1;
 }
 
@@ -173,25 +173,25 @@ static void *tap_push_ip4h(struct iphdr *ip4h, struct in_addr src,
  * @dst:	IPv4 destination address
  * @dport:	UDP destination port
  * @in:		UDP payload contents (not including UDP header)
- * @len:	UDP payload length (not including UDP header)
+ * @dlen:	UDP payload length (not including UDP header)
  */
 void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
 		   struct in_addr dst, in_port_t dport,
-		   const void *in, size_t len)
+		   const void *in, size_t dlen)
 {
-	size_t udplen = len + sizeof(struct udphdr);
+	size_t l4len = dlen + sizeof(struct udphdr);
 	char buf[USHRT_MAX];
 	struct iphdr *ip4h = tap_push_l2h(c, buf, ETH_P_IP);
-	struct udphdr *uh = tap_push_ip4h(ip4h, src, dst, udplen, IPPROTO_UDP);
+	struct udphdr *uh = tap_push_ip4h(ip4h, src, dst, l4len, IPPROTO_UDP);
 	char *data = (char *)(uh + 1);
 
 	uh->source = htons(sport);
 	uh->dest = htons(dport);
-	uh->len = htons(udplen);
-	csum_udp4(uh, src, dst, in, len);
-	memcpy(data, in, len);
+	uh->len = htons(l4len);
+	csum_udp4(uh, src, dst, in, dlen);
+	memcpy(data, in, dlen);
 
-	tap_send_single(c, buf, len + (data - buf));
+	tap_send_single(c, buf, dlen + (data - buf));
 }
 
 /**
@@ -200,20 +200,20 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
  * @src:	IPv4 source address
  * @dst:	IPv4 destination address
  * @in:		ICMP packet, including ICMP header
- * @len:	ICMP packet length, including ICMP header
+ * @l4len:	ICMP packet length, including ICMP header
  */
 void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst,
-		    const void *in, size_t len)
+		    const void *in, size_t l4len)
 {
 	char buf[USHRT_MAX];
 	struct iphdr *ip4h = tap_push_l2h(c, buf, ETH_P_IP);
 	struct icmphdr *icmp4h = tap_push_ip4h(ip4h, src, dst,
-					       len, IPPROTO_ICMP);
+					       l4len, IPPROTO_ICMP);
 
-	memcpy(icmp4h, in, len);
-	csum_icmp4(icmp4h, icmp4h + 1, len - sizeof(*icmp4h));
+	memcpy(icmp4h, in, l4len);
+	csum_icmp4(icmp4h, icmp4h + 1, l4len - sizeof(*icmp4h));
 
-	tap_send_single(c, buf, len + ((char *)icmp4h - buf));
+	tap_send_single(c, buf, l4len + ((char *)icmp4h - buf));
 }
 
 /**
@@ -221,7 +221,7 @@ void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst,
  * @c:		Execution context
  * @src:	IPv6 source address
  * @dst:	IPv6 destination address
- * @len:	L4 payload length
+ * @l4len:	L4 payload length
  * @proto:	L4 protocol number
  * @flow:	IPv6 flow identifier
  *
@@ -230,9 +230,9 @@ void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst,
 static void *tap_push_ip6h(struct ipv6hdr *ip6h,
 			   const struct in6_addr *src,
 			   const struct in6_addr *dst,
-			   size_t len, uint8_t proto, uint32_t flow)
+			   size_t l4len, uint8_t proto, uint32_t flow)
 {
-	ip6h->payload_len = htons(len);
+	ip6h->payload_len = htons(l4len);
 	ip6h->priority = 0;
 	ip6h->version = 6;
 	ip6h->nexthdr = proto;
@@ -254,27 +254,27 @@ static void *tap_push_ip6h(struct ipv6hdr *ip6h,
  * @dport:	UDP destination port
  * @flow:	Flow label
  * @in:		UDP payload contents (not including UDP header)
- * @len:	UDP payload length (not including UDP header)
+ * @dlen:	UDP payload length (not including UDP header)
  */
 void tap_udp6_send(const struct ctx *c,
 		   const struct in6_addr *src, in_port_t sport,
 		   const struct in6_addr *dst, in_port_t dport,
-		   uint32_t flow, const void *in, size_t len)
+		   uint32_t flow, const void *in, size_t dlen)
 {
-	size_t udplen = len + sizeof(struct udphdr);
+	size_t l4len = dlen + sizeof(struct udphdr);
 	char buf[USHRT_MAX];
 	struct ipv6hdr *ip6h = tap_push_l2h(c, buf, ETH_P_IPV6);
 	struct udphdr *uh = tap_push_ip6h(ip6h, src, dst,
-					  udplen, IPPROTO_UDP, flow);
+					  l4len, IPPROTO_UDP, flow);
 	char *data = (char *)(uh + 1);
 
 	uh->source = htons(sport);
 	uh->dest = htons(dport);
-	uh->len = htons(udplen);
-	csum_udp6(uh, src, dst, in, len);
-	memcpy(data, in, len);
+	uh->len = htons(l4len);
+	csum_udp6(uh, src, dst, in, dlen);
+	memcpy(data, in, dlen);
 
-	tap_send_single(c, buf, len + (data - buf));
+	tap_send_single(c, buf, dlen + (data - buf));
 }
 
 /**
@@ -283,21 +283,21 @@ void tap_udp6_send(const struct ctx *c,
  * @src:	IPv6 source address
  * @dst:	IPv6 destination address
  * @in:		ICMP packet, including ICMP header
- * @len:	ICMP packet length, including ICMP header
+ * @l4len:	ICMP packet length, including ICMP header
  */
 void tap_icmp6_send(const struct ctx *c,
 		    const struct in6_addr *src, const struct in6_addr *dst,
-		    const void *in, size_t len)
+		    const void *in, size_t l4len)
 {
 	char buf[USHRT_MAX];
 	struct ipv6hdr *ip6h = tap_push_l2h(c, buf, ETH_P_IPV6);
-	struct icmp6hdr *icmp6h = tap_push_ip6h(ip6h, src, dst, len,
+	struct icmp6hdr *icmp6h = tap_push_ip6h(ip6h, src, dst, l4len,
 						IPPROTO_ICMPV6, 0);
 
-	memcpy(icmp6h, in, len);
-
csum_icmp6(icmp6h, src, dst, icmp6h + 1, len - sizeof(*icmp6h)); + memcpy(icmp6h, in, l4len); + csum_icmp6(icmp6h, src, dst, icmp6h + 1, l4len - sizeof(*icmp6h)); - tap_send_single(c, buf, len + ((char *)icmp6h - buf)); + tap_send_single(c, buf, l4len + ((char *)icmp6h - buf)); } /** @@ -591,21 +591,21 @@ static int tap4_handler(struct ctx *c, const struct pool *in, i = 0; resume: for (seq_count = 0, seq = NULL; i < in->count; i++) { - size_t l2_len, l3_len, hlen, l4_len; + size_t l2len, l3len, hlen, l4len; const struct ethhdr *eh; const struct udphdr *uh; struct iphdr *iph; const char *l4h; - packet_get(in, i, 0, 0, &l2_len); + packet_get(in, i, 0, 0, &l2len); - eh = packet_get(in, i, 0, sizeof(*eh), &l3_len); + eh = packet_get(in, i, 0, sizeof(*eh), &l3len); if (!eh) continue; if (ntohs(eh->h_proto) == ETH_P_ARP) { PACKET_POOL_P(pkt, 1, in->buf, sizeof(pkt_buf)); - packet_add(pkt, l2_len, (char *)eh); + packet_add(pkt, l2len, (char *)eh); arp(c, pkt); continue; } @@ -615,15 +615,15 @@ resume: continue; hlen = iph->ihl * 4UL; - if (hlen < sizeof(*iph) || htons(iph->tot_len) > l3_len || - hlen > l3_len) + if (hlen < sizeof(*iph) || htons(iph->tot_len) > l3len || + hlen > l3len) continue; /* We don't handle IP fragments, drop them */ if (tap4_is_fragment(iph, now)) continue; - l4_len = htons(iph->tot_len) - hlen; + l4len = htons(iph->tot_len) - hlen; if (IN4_IS_ADDR_LOOPBACK(&iph->saddr) || IN4_IS_ADDR_LOOPBACK(&iph->daddr)) { @@ -638,7 +638,7 @@ resume: if (iph->saddr && c->ip4.addr_seen.s_addr != iph->saddr) c->ip4.addr_seen.s_addr = iph->saddr; - l4h = packet_get(in, i, sizeof(*eh) + hlen, l4_len, NULL); + l4h = packet_get(in, i, sizeof(*eh) + hlen, l4len, NULL); if (!l4h) continue; @@ -650,7 +650,7 @@ resume: tap_packet_debug(iph, NULL, NULL, 0, NULL, 1); - packet_add(pkt, l4_len, l4h); + packet_add(pkt, l4len, l4h); icmp_tap_handler(c, PIF_TAP, AF_INET, &iph->saddr, &iph->daddr, pkt, now); @@ -664,7 +664,7 @@ resume: if (iph->protocol == IPPROTO_UDP) { 
PACKET_POOL_P(pkt, 1, in->buf, sizeof(pkt_buf)); - packet_add(pkt, l2_len, (char *)eh); + packet_add(pkt, l2len, (char *)eh); if (dhcp(c, pkt)) continue; } @@ -713,7 +713,7 @@ resume: #undef L4_SET append: - packet_add((struct pool *)&seq->p, l4_len, l4h); + packet_add((struct pool *)&seq->p, l4len, l4h); } for (j = 0, seq = tap4_l4; j < seq_count; j++, seq++) { @@ -765,7 +765,7 @@ static int tap6_handler(struct ctx *c, const struct pool *in, i = 0; resume: for (seq_count = 0, seq = NULL; i < in->count; i++) { - size_t l4_len, plen, check; + size_t l4len, plen, check; struct in6_addr *saddr, *daddr; const struct ethhdr *eh; const struct udphdr *uh; @@ -788,7 +788,7 @@ resume: if (plen != check) continue; - if (!(l4h = ipv6_l4hdr(in, i, sizeof(*eh), &proto, &l4_len))) + if (!(l4h = ipv6_l4hdr(in, i, sizeof(*eh), &proto, &l4len))) continue; if (IN6_IS_ADDR_LOOPBACK(saddr) || IN6_IS_ADDR_LOOPBACK(daddr)) { @@ -816,7 +816,7 @@ resume: if (c->no_icmp) continue; - if (l4_len < sizeof(struct icmp6hdr)) + if (l4len < sizeof(struct icmp6hdr)) continue; if (ndp(c, (struct icmp6hdr *)l4h, saddr)) @@ -824,20 +824,20 @@ resume: tap_packet_debug(NULL, ip6h, NULL, proto, NULL, 1); - packet_add(pkt, l4_len, l4h); + packet_add(pkt, l4len, l4h); icmp_tap_handler(c, PIF_TAP, AF_INET6, saddr, daddr, pkt, now); continue; } - if (l4_len < sizeof(*uh)) + if (l4len < sizeof(*uh)) continue; uh = (struct udphdr *)l4h; if (proto == IPPROTO_UDP) { PACKET_POOL_P(pkt, 1, in->buf, sizeof(pkt_buf)); - packet_add(pkt, l4_len, l4h); + packet_add(pkt, l4len, l4h); if (dhcpv6(c, pkt, saddr, daddr)) continue; @@ -888,7 +888,7 @@ resume: #undef L4_SET append: - packet_add((struct pool *)&seq->p, l4_len, l4h); + packet_add((struct pool *)&seq->p, l4len, l4h); } for (j = 0, seq = tap6_l4; j < seq_count; j++, seq++) { @@ -971,7 +971,7 @@ redo: } while (n > (ssize_t)sizeof(uint32_t)) { - ssize_t len = ntohl(*(uint32_t *)p); + ssize_t l2len = ntohl(*(uint32_t *)p); p += sizeof(uint32_t); n -= 
sizeof(uint32_t); @@ -979,19 +979,20 @@ redo: /* At most one packet might not fit in a single read, and this * needs to be blocking. */ - if (len > n) { - rem = recv(c->fd_tap, p + n, len - n, 0); - if ((n += rem) != len) + if (l2len > n) { + rem = recv(c->fd_tap, p + n, l2len - n, 0); + if ((n += rem) != l2len) return; } /* Complete the partial read above before discarding a malformed * frame, otherwise the stream will be inconsistent. */ - if (len < (ssize_t)sizeof(*eh) || len > (ssize_t)ETH_MAX_MTU) + if (l2len < (ssize_t)sizeof(*eh) || + l2len > (ssize_t)ETH_MAX_MTU) goto next; - pcap(p, len); + pcap(p, l2len); eh = (struct ethhdr *)p; @@ -1003,18 +1004,18 @@ redo: switch (ntohs(eh->h_proto)) { case ETH_P_ARP: case ETH_P_IP: - packet_add(pool_tap4, len, p); + packet_add(pool_tap4, l2len, p); break; case ETH_P_IPV6: - packet_add(pool_tap6, len, p); + packet_add(pool_tap6, l2len, p); break; default: break; } next: - p += len; - n -= len; + p += l2len; + n -= l2len; } tap4_handler(c, pool_tap4, now); diff --git a/tap.h b/tap.h index dbc23b31..7c2e3917 100644 --- a/tap.h +++ b/tap.h @@ -41,35 +41,35 @@ static inline void *tap_frame_base(const struct ctx *c, struct tap_hdr *taph) * tap_frame_len() - Finalize tap frame and return total length * @c: Execution context * @taph: Tap header to finalize - * @plen: L2 packet length (includes L2, excludes tap specific headers) + * @l2len: L2 packet length (includes L2, excludes tap specific headers) * * Returns: length of the tap frame including tap specific headers - suitable * for an iov_len to be passed to tap_send_frames() */ static inline size_t tap_frame_len(const struct ctx *c, struct tap_hdr *taph, - size_t plen) + size_t l2len) { if (c->mode == MODE_PASST) - taph->vnet_len = htonl(plen); - return plen + tap_hdr_len_(c); + taph->vnet_len = htonl(l2len); + return l2len + tap_hdr_len_(c); } struct in_addr tap_ip4_daddr(const struct ctx *c); void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport, 
struct in_addr dst, in_port_t dport, - const void *in, size_t len); + const void *in, size_t dlen); void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst, - const void *in, size_t len); + const void *in, size_t l4len); const struct in6_addr *tap_ip6_daddr(const struct ctx *c, const struct in6_addr *src); void tap_udp6_send(const struct ctx *c, const struct in6_addr *src, in_port_t sport, const struct in6_addr *dst, in_port_t dport, - uint32_t flow, const void *in, size_t len); + uint32_t flow, const void *in, size_t dlen); void tap_icmp6_send(const struct ctx *c, const struct in6_addr *src, const struct in6_addr *dst, - const void *in, size_t len); -void tap_send_single(const struct ctx *c, const void *data, size_t len); + const void *in, size_t l4len); +void tap_send_single(const struct ctx *c, const void *data, size_t l2len); size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t bufs_per_frame, size_t nframes); void eth_update_mac(struct ethhdr *eh, diff --git a/tcp.c b/tcp.c index 3ba3aa4d..f8b31c17 100644 --- a/tcp.c +++ b/tcp.c @@ -891,13 +891,13 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) */ static void tcp_update_check_tcp4(const struct iphdr *iph, struct tcphdr *th) { - uint16_t tlen = ntohs(iph->tot_len) - sizeof(struct iphdr); + uint16_t l4len = ntohs(iph->tot_len) - sizeof(struct iphdr); struct in_addr saddr = { .s_addr = iph->saddr }; struct in_addr daddr = { .s_addr = iph->daddr }; - uint32_t sum = proto_ipv4_header_psum(tlen, IPPROTO_TCP, saddr, daddr); + uint32_t sum = proto_ipv4_header_psum(l4len, IPPROTO_TCP, saddr, daddr); th->check = 0; - th->check = csum(th, tlen, sum); + th->check = csum(th, l4len, sum); } /** @@ -907,12 +907,12 @@ static void tcp_update_check_tcp4(const struct iphdr *iph, struct tcphdr *th) */ static void tcp_update_check_tcp6(struct ipv6hdr *ip6h, struct tcphdr *th) { - uint16_t payload_len = ntohs(ip6h->payload_len); - uint32_t sum = 
proto_ipv6_header_psum(payload_len, IPPROTO_TCP, + uint16_t l4len = ntohs(ip6h->payload_len); + uint32_t sum = proto_ipv6_header_psum(l4len, IPPROTO_TCP, &ip6h->saddr, &ip6h->daddr); th->check = 0; - th->check = csum(th, payload_len, sum); + th->check = csum(th, l4len, sum); } /** @@ -1337,7 +1337,7 @@ static void tcp_fill_header(struct tcphdr *th, * @conn: Connection pointer * @iph: Pointer to IPv4 header * @th: Pointer to TCP header - * @plen: Payload length (including TCP header options) + * @dlen: TCP payload length * @check: Checksum, if already known * @seq: Sequence number for this segment * @@ -1346,27 +1346,28 @@ static void tcp_fill_header(struct tcphdr *th, static size_t tcp_fill_headers4(const struct ctx *c, const struct tcp_tap_conn *conn, struct iphdr *iph, struct tcphdr *th, - size_t plen, const uint16_t *check, + size_t dlen, const uint16_t *check, uint32_t seq) { - size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); const struct in_addr *a4 = inany_v4(&conn->faddr); + size_t l4len = dlen + sizeof(*th); + size_t l3len = l4len + sizeof(*iph); ASSERT(a4); - iph->tot_len = htons(ip_len); + iph->tot_len = htons(l3len); iph->saddr = a4->s_addr; iph->daddr = c->ip4.addr_seen.s_addr; iph->check = check ? 
*check : - csum_ip4_header(ip_len, IPPROTO_TCP, + csum_ip4_header(l3len, IPPROTO_TCP, *a4, c->ip4.addr_seen); tcp_fill_header(th, conn, seq); tcp_update_check_tcp4(iph, th); - return ip_len; + return l3len; } /** @@ -1375,7 +1376,7 @@ static size_t tcp_fill_headers4(const struct ctx *c, * @conn: Connection pointer * @ip6h: Pointer to IPv6 header * @th: Pointer to TCP header - * @plen: Payload length (including TCP header options) + * @dlen: TCP payload length * @check: Checksum, if already known * @seq: Sequence number for this segment * @@ -1384,11 +1385,12 @@ static size_t tcp_fill_headers4(const struct ctx *c, static size_t tcp_fill_headers6(const struct ctx *c, const struct tcp_tap_conn *conn, struct ipv6hdr *ip6h, struct tcphdr *th, - size_t plen, uint32_t seq) + size_t dlen, uint32_t seq) { - size_t ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); + size_t l4len = dlen + sizeof(*th); + size_t l3len = l4len + sizeof(*ip6h); - ip6h->payload_len = htons(plen + sizeof(struct tcphdr)); + ip6h->payload_len = htons(l4len); ip6h->saddr = conn->faddr.a6; if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr)) ip6h->daddr = c->ip6.addr_ll_seen; @@ -1407,7 +1409,7 @@ static size_t tcp_fill_headers6(const struct ctx *c, tcp_update_check_tcp6(ip6h, th); - return ip_len; + return l3len; } /** @@ -1415,7 +1417,7 @@ static size_t tcp_fill_headers6(const struct ctx *c, * @c: Execution context * @conn: Connection pointer * @iov: Pointer to an array of iovec of TCP pre-cooked buffers - * @plen: Payload length (including TCP header options) + * @dlen: TCP payload length * @check: Checksum, if already known * @seq: Sequence number for this segment * @@ -1423,25 +1425,25 @@ static size_t tcp_fill_headers6(const struct ctx *c, */ static size_t tcp_l2_buf_fill_headers(const struct ctx *c, const struct tcp_tap_conn *conn, - struct iovec *iov, size_t plen, + struct iovec *iov, size_t dlen, const uint16_t *check, uint32_t seq) { const struct in_addr *a4 = inany_v4(&conn->faddr); - 
size_t ip_len, tlen; + size_t l3len, l4len; if (a4) { - ip_len = tcp_fill_headers4(c, conn, iov[TCP_IOV_IP].iov_base, - iov[TCP_IOV_PAYLOAD].iov_base, plen, + l3len = tcp_fill_headers4(c, conn, iov[TCP_IOV_IP].iov_base, + iov[TCP_IOV_PAYLOAD].iov_base, dlen, check, seq); - tlen = ip_len - sizeof(struct iphdr); + l4len = l3len - sizeof(struct iphdr); } else { - ip_len = tcp_fill_headers6(c, conn, iov[TCP_IOV_IP].iov_base, - iov[TCP_IOV_PAYLOAD].iov_base, plen, + l3len = tcp_fill_headers6(c, conn, iov[TCP_IOV_IP].iov_base, + iov[TCP_IOV_PAYLOAD].iov_base, dlen, seq); - tlen = ip_len - sizeof(struct ipv6hdr); + l4len = l3len - sizeof(struct ipv6hdr); } - return tlen; + return l4len; } /** @@ -1578,7 +1580,7 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) size_t optlen = 0; struct tcphdr *th; struct iovec *iov; - size_t ip_len; + size_t l4len; char *data; if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap) && @@ -1658,11 +1660,11 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) th->syn = !!(flags & SYN); th->fin = !!(flags & FIN); - ip_len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, - conn->seq_to_tap); - iov[TCP_IOV_PAYLOAD].iov_len = ip_len; + l4len = tcp_l2_buf_fill_headers(c, conn, iov, optlen, NULL, + conn->seq_to_tap); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; - *(uint32_t *)iov[TCP_IOV_VLEN].iov_base = htonl(vnet_len + ip_len); + *(uint32_t *)iov[TCP_IOV_VLEN].iov_base = htonl(vnet_len + l4len); if (th->ack) { if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap)) @@ -2136,17 +2138,17 @@ static int tcp_sock_consume(const struct tcp_tap_conn *conn, uint32_t ack_seq) * tcp_data_to_tap() - Finalise (queue) highest-numbered scatter-gather buffer * @c: Execution context * @conn: Connection pointer - * @plen: Payload length at L4 + * @dlen: TCP payload length * @no_csum: Don't compute IPv4 checksum, use the one from previous buffer * @seq: Sequence number to be sent */ static void 
tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, - ssize_t plen, int no_csum, uint32_t seq) + ssize_t dlen, int no_csum, uint32_t seq) { uint32_t *seq_update = &conn->seq_to_tap; struct iovec *iov; - size_t ip_len; uint32_t vnet_len; + size_t l4len; if (CONN_V4(conn)) { struct iovec *iov_prev = tcp4_l2_iov[tcp4_payload_used - 1]; @@ -2158,26 +2160,24 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, } tcp4_seq_update[tcp4_payload_used].seq = seq_update; - tcp4_seq_update[tcp4_payload_used].len = plen; + tcp4_seq_update[tcp4_payload_used].len = dlen; iov = tcp4_l2_iov[tcp4_payload_used++]; - ip_len = tcp_l2_buf_fill_headers(c, conn, iov, plen, check, - seq); - iov[TCP_IOV_PAYLOAD].iov_len = ip_len; - vnet_len = sizeof(struct ethhdr) + sizeof(struct iphdr) + - ip_len; + l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; + vnet_len = sizeof(struct ethhdr) + sizeof(struct iphdr) + l4len; *(uint32_t *)iov[TCP_IOV_VLEN].iov_base = htonl(vnet_len); if (tcp4_payload_used > TCP_FRAMES_MEM - 1) tcp_payload_flush(c); } else if (CONN_V6(conn)) { tcp6_seq_update[tcp6_payload_used].seq = seq_update; - tcp6_seq_update[tcp6_payload_used].len = plen; + tcp6_seq_update[tcp6_payload_used].len = dlen; iov = tcp6_l2_iov[tcp6_payload_used++]; - ip_len = tcp_l2_buf_fill_headers(c, conn, iov, plen, NULL, seq); - iov[TCP_IOV_PAYLOAD].iov_len = ip_len; - vnet_len = sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + - ip_len; + l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq); + iov[TCP_IOV_PAYLOAD].iov_len = l4len; + vnet_len = sizeof(struct ethhdr) + sizeof(struct ipv6hdr) + + l4len; *(uint32_t *)iov[TCP_IOV_VLEN].iov_base = htonl(vnet_len); if (tcp6_payload_used > TCP_FRAMES_MEM - 1) tcp_payload_flush(c); @@ -2197,7 +2197,7 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn) { uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; int fill_bufs, 
send_bufs = 0, last_len, iov_rem = 0; - int sendlen, len, plen, v4 = CONN_V4(conn); + int sendlen, len, dlen, v4 = CONN_V4(conn); int s = conn->sock, i, ret = 0; struct msghdr mh_sock = { 0 }; uint16_t mss = MSS_GET(conn); @@ -2289,16 +2289,16 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn) tcp_update_seqack_wnd(c, conn, 0, NULL); /* Finally, queue to tap */ - plen = mss; + dlen = mss; seq = conn->seq_to_tap; for (i = 0; i < send_bufs; i++) { int no_csum = i && i != send_bufs - 1 && tcp4_payload_used; if (i == send_bufs - 1) - plen = last_len; + dlen = last_len; - tcp_data_to_tap(c, conn, plen, no_csum, seq); - seq += plen; + tcp_data_to_tap(c, conn, dlen, no_csum, seq); + seq += dlen; } conn_flag(c, conn, ACK_FROM_TAP_DUE); diff --git a/udp.c b/udp.c index 09f98130..2d27eaeb 100644 --- a/udp.c +++ b/udp.c @@ -570,16 +570,16 @@ static void udp_splice_sendfrom(const struct ctx *c, unsigned start, unsigned n, * @c: Execution context * @b: Pointer to udp4_l2_buf to update * @dstport: Destination port number - * @datalen: Length of UDP payload + * @dlen: Length of UDP payload * @now: Current timestamp * * Return: size of tap frame with headers */ static size_t udp_update_hdr4(const struct ctx *c, struct udp4_l2_buf_t *b, - in_port_t dstport, size_t datalen, + in_port_t dstport, size_t dlen, const struct timespec *now) { - size_t ip_len = datalen + sizeof(b->iph) + sizeof(b->uh); + size_t l3len = dlen + sizeof(b->iph) + sizeof(b->uh); in_port_t srcport = ntohs(b->s_in.sin_port); struct in_addr src = b->s_in.sin_addr; @@ -602,17 +602,17 @@ static size_t udp_update_hdr4(const struct ctx *c, struct udp4_l2_buf_t *b, src = c->ip4.gw; } - b->iph.tot_len = htons(ip_len); + b->iph.tot_len = htons(l3len); b->iph.daddr = c->ip4.addr_seen.s_addr; b->iph.saddr = src.s_addr; - b->iph.check = csum_ip4_header(ip_len, IPPROTO_UDP, + b->iph.check = csum_ip4_header(l3len, IPPROTO_UDP, src, c->ip4.addr_seen); b->uh.source = b->s_in.sin_port; b->uh.dest = 
htons(dstport); - b->uh.len = htons(datalen + sizeof(b->uh)); + b->uh.len = htons(dlen + sizeof(b->uh)); - return tap_frame_len(c, &b->taph, ip_len + sizeof(b->eh)); + return tap_frame_len(c, &b->taph, l3len + sizeof(b->eh)); } /** @@ -620,19 +620,19 @@ static size_t udp_update_hdr4(const struct ctx *c, struct udp4_l2_buf_t *b, * @c: Execution context * @b: Pointer to udp6_l2_buf to update * @dstport: Destination port number - * @datalen: Length of UDP payload + * @dlen: Length of UDP payload * @now: Current timestamp * * Return: size of tap frame with headers */ static size_t udp_update_hdr6(const struct ctx *c, struct udp6_l2_buf_t *b, - in_port_t dstport, size_t datalen, + in_port_t dstport, size_t dlen, const struct timespec *now) { const struct in6_addr *src = &b->s_in6.sin6_addr; const struct in6_addr *dst = &c->ip6.addr_seen; - uint16_t payload_len = datalen + sizeof(b->uh); in_port_t srcport = ntohs(b->s_in6.sin6_port); + uint16_t l4len = dlen + sizeof(b->uh); if (IN6_IS_ADDR_LINKLOCAL(src)) { dst = &c->ip6.addr_ll_seen; @@ -668,7 +668,7 @@ static size_t udp_update_hdr6(const struct ctx *c, struct udp6_l2_buf_t *b, } - b->ip6h.payload_len = htons(payload_len); + b->ip6h.payload_len = htons(l4len); b->ip6h.daddr = *dst; b->ip6h.saddr = *src; b->ip6h.version = 6; @@ -678,9 +678,9 @@ static size_t udp_update_hdr6(const struct ctx *c, struct udp6_l2_buf_t *b, b->uh.source = b->s_in6.sin6_port; b->uh.dest = htons(dstport); b->uh.len = b->ip6h.payload_len; - csum_udp6(&b->uh, src, dst, b->data, datalen); + csum_udp6(&b->uh, src, dst, b->data, dlen); - return tap_frame_len(c, &b->taph, payload_len + + return tap_frame_len(c, &b->taph, l4len + sizeof(b->ip6h) + sizeof(b->eh)); } -- 2.44.0
tcp_fill_headers[46]() compute the L3 packet length from the L4 packet length, then their caller tcp_l2_buf_fill_headers() converts it back to the L4 packet length. We can just use the L4 length throughout. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/tcp.c b/tcp.c index f8b31c17..43206d06 100644 --- a/tcp.c +++ b/tcp.c @@ -1341,7 +1341,7 @@ static void tcp_fill_header(struct tcphdr *th, * @check: Checksum, if already known * @seq: Sequence number for this segment * - * Return: The total length of the IPv4 packet, host order + * Return: The IPv4 payload length, host order */ static size_t tcp_fill_headers4(const struct ctx *c, const struct tcp_tap_conn *conn, @@ -1367,7 +1367,7 @@ static size_t tcp_fill_headers4(const struct ctx *c, tcp_update_check_tcp4(iph, th); - return l3len; + return l4len; } /** @@ -1380,7 +1380,7 @@ static size_t tcp_fill_headers4(const struct ctx *c, * @check: Checksum, if already known * @seq: Sequence number for this segment * - * Return: The total length of the IPv6 packet, host order + * Return: The IPv6 payload length, host order */ static size_t tcp_fill_headers6(const struct ctx *c, const struct tcp_tap_conn *conn, @@ -1388,7 +1388,6 @@ static size_t tcp_fill_headers6(const struct ctx *c, size_t dlen, uint32_t seq) { size_t l4len = dlen + sizeof(*th); - size_t l3len = l4len + sizeof(*ip6h); ip6h->payload_len = htons(l4len); ip6h->saddr = conn->faddr.a6; @@ -1409,7 +1408,7 @@ static size_t tcp_fill_headers6(const struct ctx *c, tcp_update_check_tcp6(ip6h, th); - return l3len; + return l4len; } /** @@ -1429,21 +1428,16 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c, const uint16_t *check, uint32_t seq) { const struct in_addr *a4 = inany_v4(&conn->faddr); - size_t l3len, l4len; if (a4) { - l3len = tcp_fill_headers4(c, conn, iov[TCP_IOV_IP].iov_base, - iov[TCP_IOV_PAYLOAD].iov_base, dlen, - check, 
seq); - l4len = l3len - sizeof(struct iphdr); - } else { - l3len = tcp_fill_headers6(c, conn, iov[TCP_IOV_IP].iov_base, - iov[TCP_IOV_PAYLOAD].iov_base, dlen, - seq); - l4len = l3len - sizeof(struct ipv6hdr); + return tcp_fill_headers4(c, conn, iov[TCP_IOV_IP].iov_base, + iov[TCP_IOV_PAYLOAD].iov_base, dlen, + check, seq); } - return l4len; + return tcp_fill_headers6(c, conn, iov[TCP_IOV_IP].iov_base, + iov[TCP_IOV_PAYLOAD].iov_base, dlen, + seq); } /** -- 2.44.0
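The simplification in this patch can be reduced to the following sketch, which contrasts the old round trip (helper returns the L3 length, caller subtracts the IP header again) with returning the L4 length directly. All function names here are hypothetical, and the fixed 20 byte header sizes are assumptions standing in for sizeof(struct iphdr) and sizeof(struct tcphdr).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for the sketch; the patch itself uses sizeof() */
enum {
	SKETCH_IP4_HLEN = 20,	/* IPv4 header without options */
	SKETCH_TCP_HLEN = 20,	/* TCP header without options */
};

/* Before: the helper returns the L3 length... */
static size_t sketch_fill_old(size_t dlen)
{
	size_t l4len = dlen + SKETCH_TCP_HLEN;

	return l4len + SKETCH_IP4_HLEN;
}

/* ...and the caller converts it back to the L4 length */
static size_t sketch_caller_old(size_t dlen)
{
	size_t l3len = sketch_fill_old(dlen);

	assert(l3len >= SKETCH_IP4_HLEN);	/* subtraction cannot wrap */
	return l3len - SKETCH_IP4_HLEN;
}

/* After: the helper returns the L4 length and the caller passes it on */
static size_t sketch_fill_new(size_t dlen)
{
	return dlen + SKETCH_TCP_HLEN;
}
```

Both paths yield the same value for any dlen, so the caller no longer needs to know which IP version's header size to subtract.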
Recent changes to the TCP code (reworking of the buffer handling) have meant that it now (again) deals explicitly with the MODE_PASST specific vnet_len field, instead of using the (partial) abstractions provided by the tap layer. The abstractions we had don't work for the new TCP structure, so make some new ones that do: tap_hdr_iov() which constructs an iovec suitable for containing (just) the TAP specific header and tap_hdr_update() which updates it as necessary per-packet. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.h | 27 +++++++++++++++++++++++++++ tcp.c | 40 +++++++++++++++------------------------- 2 files changed, 42 insertions(+), 25 deletions(-) diff --git a/tap.h b/tap.h index 7c2e3917..9216d5af 100644 --- a/tap.h +++ b/tap.h @@ -16,6 +16,33 @@ struct tap_hdr { uint32_t vnet_len; } __attribute__((packed)); +/** + * tap_hdr_iov() - struct iovec for a tap header + * @c: Execution context + * @taph: Pointer to tap specific header buffer + * + * Returns: A struct iovec covering the correct portion of @taph to use as the + * tap specific header in the current configuration. + */ +static inline struct iovec tap_hdr_iov(const struct ctx *c, + struct tap_hdr *thdr) +{ + return (struct iovec){ + .iov_base = thdr, + .iov_len = c->mode == MODE_PASST ? 
sizeof(*thdr) : 0, + }; +} + +/** + * tap_hdr_update() - Update the tap specific header for a frame + * @taph: Tap specific header buffer to update + * @l2len: Frame length (including L2 headers) + */ +static inline void tap_hdr_update(struct tap_hdr *thdr, size_t l2len) +{ + thdr->vnet_len = htonl(l2len); +} + static inline size_t tap_hdr_len_(const struct ctx *c) { if (c->mode == MODE_PASST) diff --git a/tcp.c b/tcp.c index 43206d06..d42c01d3 100644 --- a/tcp.c +++ b/tcp.c @@ -452,7 +452,7 @@ struct tcp_flags_t { /* Ethernet header for IPv4 frames */ static struct ethhdr tcp4_eth_src; -static uint32_t tcp4_payload_vnet_len[TCP_FRAMES_MEM]; +static struct tap_hdr tcp4_payload_tap_hdr[TCP_FRAMES_MEM]; /* IPv4 headers */ static struct iphdr tcp4_payload_ip[TCP_FRAMES_MEM]; /* TCP segments with payload for IPv4 frames */ @@ -463,7 +463,7 @@ static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516" static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM]; static unsigned int tcp4_payload_used; -static uint32_t tcp4_flags_vnet_len[TCP_FRAMES_MEM]; +static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM]; /* IPv4 headers for TCP segment without payload */ static struct iphdr tcp4_flags_ip[TCP_FRAMES_MEM]; /* TCP segments without payload for IPv4 frames */ @@ -474,7 +474,7 @@ static unsigned int tcp4_flags_used; /* Ethernet header for IPv6 frames */ static struct ethhdr tcp6_eth_src; -static uint32_t tcp6_payload_vnet_len[TCP_FRAMES_MEM]; +static struct tap_hdr tcp6_payload_tap_hdr[TCP_FRAMES_MEM]; /* IPv6 headers */ static struct ipv6hdr tcp6_payload_ip[TCP_FRAMES_MEM]; /* TCP headers and data for IPv6 frames */ @@ -485,7 +485,7 @@ static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516" static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM]; static unsigned int tcp6_payload_used; -static uint32_t tcp6_flags_vnet_len[TCP_FRAMES_MEM]; +static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM]; /* IPv6 
headers for TCP segment without payload */
 static struct ipv6hdr tcp6_flags_ip[TCP_FRAMES_MEM];
 /* TCP segment without payload for IPv6 frames */
@@ -499,14 +499,14 @@ static struct iovec iov_sock [TCP_FRAMES_MEM + 1];

 /*
  * enum tcp_iov_parts - I/O vector parts for one TCP frame
- * @TCP_IOV_VLEN virtio net header
+ * @TCP_IOV_TAP tap backend specific header
  * @TCP_IOV_ETH Ethernet header
  * @TCP_IOV_IP IP (v4/v6) header
  * @TCP_IOV_PAYLOAD IP payload (TCP header + data)
  * @TCP_NUM_IOVS the number of entries in the iovec array
  */
 enum tcp_iov_parts {
-	TCP_IOV_VLEN = 0,
+	TCP_IOV_TAP = 0,
 	TCP_IOV_ETH = 1,
 	TCP_IOV_IP = 2,
 	TCP_IOV_PAYLOAD = 3,
@@ -953,9 +953,7 @@ static void tcp_sock4_iov_init(const struct ctx *c)
 	for (i = 0; i < TCP_FRAMES_MEM; i++) {
 		iov = tcp4_l2_iov[i];

-		iov[TCP_IOV_VLEN].iov_base = &tcp4_payload_vnet_len[i];
-		iov[TCP_IOV_VLEN].iov_len = c->mode == MODE_PASTA ? 0 :
-					    sizeof(tcp4_payload_vnet_len[i]);
+		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp4_payload_tap_hdr[i]);
 		iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src;
 		iov[TCP_IOV_ETH].iov_len = sizeof(tcp4_eth_src);
 		iov[TCP_IOV_IP].iov_base = &tcp4_payload_ip[i];
@@ -966,9 +964,7 @@ static void tcp_sock4_iov_init(const struct ctx *c)
 	for (i = 0; i < TCP_FRAMES_MEM; i++) {
 		iov = tcp4_l2_flags_iov[i];

-		iov[TCP_IOV_VLEN].iov_base = &tcp4_flags_vnet_len[i];
-		iov[TCP_IOV_VLEN].iov_len = c->mode == MODE_PASTA ? 0 :
-					    sizeof(tcp4_flags_vnet_len[i]);
+		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp4_flags_tap_hdr[i]);
 		iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src;
 		iov[TCP_IOV_ETH].iov_len = sizeof(tcp4_eth_src);
 		iov[TCP_IOV_IP].iov_base = &tcp4_flags_ip[i];
@@ -1004,9 +1000,7 @@ static void tcp_sock6_iov_init(const struct ctx *c)
 	for (i = 0; i < TCP_FRAMES_MEM; i++) {
 		iov = tcp6_l2_iov[i];

-		iov[TCP_IOV_VLEN].iov_base = &tcp6_payload_vnet_len[i];
-		iov[TCP_IOV_VLEN].iov_len = c->mode == MODE_PASTA ? 0 :
-					    sizeof(tcp6_payload_vnet_len[i]);
+		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp6_payload_tap_hdr[i]);
 		iov[TCP_IOV_ETH].iov_base = &tcp6_eth_src;
 		iov[TCP_IOV_ETH].iov_len = sizeof(tcp6_eth_src);
 		iov[TCP_IOV_IP].iov_base = &tcp6_payload_ip[i];
@@ -1017,9 +1011,7 @@ static void tcp_sock6_iov_init(const struct ctx *c)
 	for (i = 0; i < TCP_FRAMES_MEM; i++) {
 		iov = tcp6_l2_flags_iov[i];

-		iov[TCP_IOV_VLEN].iov_base = &tcp6_flags_vnet_len[i];
-		iov[TCP_IOV_VLEN].iov_len = c->mode == MODE_PASTA ? 0 :
-					    sizeof(tcp6_flags_vnet_len[i]);
+		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp6_flags_tap_hdr[i]);
 		iov[TCP_IOV_ETH].iov_base = &tcp6_eth_src;
 		iov[TCP_IOV_ETH].iov_len = sizeof(tcp6_eth_src);
 		iov[TCP_IOV_IP].iov_base = &tcp6_flags_ip[i];
@@ -1658,7 +1650,7 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 					 conn->seq_to_tap);
 	iov[TCP_IOV_PAYLOAD].iov_len = l4len;

-	*(uint32_t *)iov[TCP_IOV_VLEN].iov_base = htonl(vnet_len + l4len);
+	tap_hdr_update(iov[TCP_IOV_TAP].iov_base, vnet_len + l4len);

 	if (th->ack) {
 		if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap))
@@ -2141,7 +2133,6 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 {
 	uint32_t *seq_update = &conn->seq_to_tap;
 	struct iovec *iov;
-	uint32_t vnet_len;
 	size_t l4len;

 	if (CONN_V4(conn)) {
@@ -2159,8 +2150,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		iov = tcp4_l2_iov[tcp4_payload_used++];
 		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq);
 		iov[TCP_IOV_PAYLOAD].iov_len = l4len;
-		vnet_len = sizeof(struct ethhdr) + sizeof(struct iphdr) + l4len;
-		*(uint32_t *)iov[TCP_IOV_VLEN].iov_base = htonl(vnet_len);
+		tap_hdr_update(iov[TCP_IOV_TAP].iov_base, l4len
+			       + sizeof(struct iphdr) + sizeof(struct ethhdr));
 		if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
 			tcp_payload_flush(c);
 	} else if (CONN_V6(conn)) {
@@ -2170,9 +2161,8 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		iov = tcp6_l2_iov[tcp6_payload_used++];
 		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq);
 		iov[TCP_IOV_PAYLOAD].iov_len = l4len;
-		vnet_len = sizeof(struct ethhdr) + sizeof(struct ipv6hdr)
-			+ l4len;
-		*(uint32_t *)iov[TCP_IOV_VLEN].iov_base = htonl(vnet_len);
+		tap_hdr_update(iov[TCP_IOV_TAP].iov_base, l4len
+			       + sizeof(struct ipv6hdr) + sizeof(struct ethhdr));
 		if (tcp6_payload_used > TCP_FRAMES_MEM - 1)
 			tcp_payload_flush(c);
 	}
-- 
2.44.0
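For readers without tap.h to hand, the two helpers this patch switches to can be pictured roughly as below. This is only a sketch inferred from the lines the diff removes (a zero-length entry in pasta mode, a 32-bit network-order frame length otherwise). The `sketch_` names, the context struct and the header layout are assumptions for illustration, not the definitions from the series.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical stand-ins for the real mode enum and execution context */
enum sketch_mode { SKETCH_MODE_PASST, SKETCH_MODE_PASTA };
struct sketch_ctx { enum sketch_mode mode; };

/* Assumed layout: just the 32-bit frame length sent ahead of each frame */
struct sketch_tap_hdr {
	uint32_t vnet_len;
} __attribute__((packed));

/* iovec entry for the tap header: empty in pasta mode, 4 bytes otherwise */
static inline struct iovec sketch_tap_hdr_iov(const struct sketch_ctx *c,
					      struct sketch_tap_hdr *thdr)
{
	return (struct iovec){
		.iov_base = thdr,
		.iov_len = c->mode == SKETCH_MODE_PASTA ? 0 : sizeof(*thdr),
	};
}

/* Store the L2 frame length in network byte order */
static inline void sketch_tap_hdr_update(struct sketch_tap_hdr *thdr,
					 size_t l2len)
{
	thdr->vnet_len = htonl((uint32_t)l2len);
}
```

Keeping both the mode check and the byte-order conversion behind helpers like these is what lets the callers in tcp.c drop the open-coded `c->mode == MODE_PASTA` test and the `*(uint32_t *)` cast.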
Laurent's recent changes mean we use IO vectors much more heavily in
the TCP code.  In many of those cases, and a few others around the code
base, individual iovs of these vectors are constructed to exactly cover
existing variables or fields.  We can make initializing such iovs
shorter and clearer with a macro for the purpose.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 iov.h |  3 +++
 tap.c |  3 +--
 tcp.c | 24 +++++++++---------------
 udp.c |  7 +++----
 4 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/iov.h b/iov.h
index 6058af77..5668ca5f 100644
--- a/iov.h
+++ b/iov.h
@@ -18,6 +18,9 @@
 #include <unistd.h>
 #include <string.h>

+#define IOV_OF_LVALUE(lval) \
+	(struct iovec){ .iov_base = &(lval), .iov_len = sizeof(lval) }
+
 size_t iov_skip_bytes(const struct iovec *iov, size_t n,
 		      size_t skip, size_t *offset);
 size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt,
diff --git a/tap.c b/tap.c
index c6864ac6..91fd2e2b 100644
--- a/tap.c
+++ b/tap.c
@@ -79,8 +79,7 @@ void tap_send_single(const struct ctx *c, const void *data, size_t l2len)
 	size_t iovcnt = 0;

 	if (c->mode == MODE_PASST) {
-		iov[iovcnt].iov_base = &vnet_len;
-		iov[iovcnt].iov_len = sizeof(vnet_len);
+		iov[iovcnt] = IOV_OF_LVALUE(vnet_len);
 		iovcnt++;
 	}

diff --git a/tcp.c b/tcp.c
index d42c01d3..845afcef 100644
--- a/tcp.c
+++ b/tcp.c
@@ -290,6 +290,7 @@

 #include "checksum.h"
 #include "util.h"
+#include "iov.h"
 #include "ip.h"
 #include "passt.h"
 #include "tap.h"
@@ -954,10 +955,8 @@ static void tcp_sock4_iov_init(const struct ctx *c)
 		iov = tcp4_l2_iov[i];

 		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp4_payload_tap_hdr[i]);
-		iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src;
-		iov[TCP_IOV_ETH].iov_len = sizeof(tcp4_eth_src);
-		iov[TCP_IOV_IP].iov_base = &tcp4_payload_ip[i];
-		iov[TCP_IOV_IP].iov_len = sizeof(tcp4_payload_ip[i]);
+		iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp4_eth_src);
+		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_payload_ip[i]);
 		iov[TCP_IOV_PAYLOAD].iov_base = &tcp4_payload[i];
 	}

@@ -966,9 +965,8 @@ static void tcp_sock4_iov_init(const struct ctx *c)

 		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp4_flags_tap_hdr[i]);
 		iov[TCP_IOV_ETH].iov_base = &tcp4_eth_src;
-		iov[TCP_IOV_ETH].iov_len = sizeof(tcp4_eth_src);
-		iov[TCP_IOV_IP].iov_base = &tcp4_flags_ip[i];
-		iov[TCP_IOV_IP].iov_len = sizeof(tcp4_flags_ip[i]);
+		iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp4_eth_src);
+		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp4_flags_ip[i]);
 		iov[TCP_IOV_PAYLOAD].iov_base = &tcp4_flags[i];
 	}
 }
@@ -1001,10 +999,8 @@ static void tcp_sock6_iov_init(const struct ctx *c)
 		iov = tcp6_l2_iov[i];

 		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp6_payload_tap_hdr[i]);
-		iov[TCP_IOV_ETH].iov_base = &tcp6_eth_src;
-		iov[TCP_IOV_ETH].iov_len = sizeof(tcp6_eth_src);
-		iov[TCP_IOV_IP].iov_base = &tcp6_payload_ip[i];
-		iov[TCP_IOV_IP].iov_len = sizeof(tcp6_payload_ip[i]);
+		iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp6_eth_src);
+		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_payload_ip[i]);
 		iov[TCP_IOV_PAYLOAD].iov_base = &tcp6_payload[i];
 	}

@@ -1012,10 +1008,8 @@ static void tcp_sock6_iov_init(const struct ctx *c)
 		iov = tcp6_l2_flags_iov[i];

 		iov[TCP_IOV_TAP] = tap_hdr_iov(c, &tcp6_flags_tap_hdr[i]);
-		iov[TCP_IOV_ETH].iov_base = &tcp6_eth_src;
-		iov[TCP_IOV_ETH].iov_len = sizeof(tcp6_eth_src);
-		iov[TCP_IOV_IP].iov_base = &tcp6_flags_ip[i];
-		iov[TCP_IOV_IP].iov_len = sizeof(tcp6_flags_ip[i]);
+		iov[TCP_IOV_ETH] = IOV_OF_LVALUE(tcp6_eth_src);
+		iov[TCP_IOV_IP] = IOV_OF_LVALUE(tcp6_flags_ip[i]);
 		iov[TCP_IOV_PAYLOAD].iov_base = &tcp6_flags[i];
 	}
 }
diff --git a/udp.c b/udp.c
index 2d27eaeb..7186fae9 100644
--- a/udp.c
+++ b/udp.c
@@ -113,6 +113,7 @@

 #include "checksum.h"
 #include "util.h"
+#include "iov.h"
 #include "ip.h"
 #include "siphash.h"
 #include "inany.h"
@@ -315,8 +316,7 @@ static void udp_sock4_iov_init_one(const struct ctx *c, size_t i)
 		.iph = L2_BUF_IP4_INIT(IPPROTO_UDP)
 	};

-	siov->iov_base = buf->data;
-	siov->iov_len = sizeof(buf->data);
+	*siov = IOV_OF_LVALUE(buf->data);

 	mh->msg_name = &buf->s_in;
 	mh->msg_namelen = sizeof(buf->s_in);
@@ -343,8 +343,7 @@ static void udp_sock6_iov_init_one(const struct ctx *c, size_t i)
 		.ip6h = L2_BUF_IP6_INIT(IPPROTO_UDP)
 	};

-	siov->iov_base = buf->data;
-	siov->iov_len = sizeof(buf->data);
+	*siov = IOV_OF_LVALUE(buf->data);

 	mh->msg_name = &buf->s_in6;
 	mh->msg_namelen = sizeof(buf->s_in6);
-- 
2.44.0
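As a standalone illustration of the macro added to iov.h, the sketch below builds a two-entry vector covering the fields of a structure. The macro body is copied from the patch; `struct demo_frame` and `demo_frame_iov()` are hypothetical names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>	/* struct iovec */

/* Same definition as the one the patch adds to iov.h */
#define IOV_OF_LVALUE(lval) \
	(struct iovec){ .iov_base = &(lval), .iov_len = sizeof(lval) }

/* Hypothetical frame pieces, sized like Ethernet and IPv4 headers */
struct demo_frame {
	uint8_t eth[14];
	uint8_t ip[20];
};

/* Point each iovec entry at one field of @f, return the total length */
static size_t demo_frame_iov(struct demo_frame *f, struct iovec iov[2])
{
	iov[0] = IOV_OF_LVALUE(f->eth);	/* covers the 14 bytes of f->eth */
	iov[1] = IOV_OF_LVALUE(f->ip);	/* covers the 20 bytes of f->ip */

	return iov[0].iov_len + iov[1].iov_len;
}
```

Because the length comes from `sizeof(lval)`, the argument has to be the object itself. Passing a pointer variable would produce an iovec covering the pointer, not the buffer it points to.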
tcp_fill_headers[46]() fill most of the headers, but the tap specific
header (the frame length for qemu sockets) is filled in afterwards.
Filling this in there as well:
 * Removes a little redundancy between the tcp_send_flag() and
   tcp_data_to_tap() paths
 * Makes calculation of the correct length a little easier
 * Removes the now misleadingly named 'vnet_len' variable in
   tcp_send_flag()

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 tcp.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/tcp.c b/tcp.c
index 845afcef..21d0af06 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1321,6 +1321,7 @@ static void tcp_fill_header(struct tcphdr *th,
  * tcp_fill_headers4() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers
  * @c: Execution context
  * @conn: Connection pointer
+ * @taph: tap backend specific header
  * @iph: Pointer to IPv4 header
  * @th: Pointer to TCP header
  * @dlen: TCP payload length
@@ -1331,6 +1332,7 @@ static void tcp_fill_header(struct tcphdr *th,
  */
 static size_t tcp_fill_headers4(const struct ctx *c,
 				const struct tcp_tap_conn *conn,
+				struct tap_hdr *taph,
 				struct iphdr *iph, struct tcphdr *th,
 				size_t dlen, const uint16_t *check,
 				uint32_t seq)
@@ -1353,6 +1355,8 @@ static size_t tcp_fill_headers4(const struct ctx *c,

 	tcp_update_check_tcp4(iph, th);

+	tap_hdr_update(taph, l3len + sizeof(struct ethhdr));
+
 	return l4len;
 }

@@ -1360,6 +1364,7 @@ static size_t tcp_fill_headers4(const struct ctx *c,
  * tcp_fill_headers6() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers
  * @c: Execution context
  * @conn: Connection pointer
+ * @taph: tap backend specific header
  * @ip6h: Pointer to IPv6 header
  * @th: Pointer to TCP header
  * @dlen: TCP payload length
@@ -1370,6 +1375,7 @@ static size_t tcp_fill_headers4(const struct ctx *c,
  */
 static size_t tcp_fill_headers6(const struct ctx *c,
 				const struct tcp_tap_conn *conn,
+				struct tap_hdr *taph,
 				struct ipv6hdr *ip6h, struct tcphdr *th,
 				size_t dlen, uint32_t seq)
 {
@@ -1394,6 +1400,8 @@ static size_t tcp_fill_headers6(const struct ctx *c,

 	tcp_update_check_tcp6(ip6h, th);

+	tap_hdr_update(taph, l4len + sizeof(*ip6h) + sizeof(struct ethhdr));
+
 	return l4len;
 }

@@ -1416,12 +1424,14 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c,
 	const struct in_addr *a4 = inany_v4(&conn->faddr);

 	if (a4) {
-		return tcp_fill_headers4(c, conn, iov[TCP_IOV_IP].iov_base,
+		return tcp_fill_headers4(c, conn, iov[TCP_IOV_TAP].iov_base,
+					 iov[TCP_IOV_IP].iov_base,
 					 iov[TCP_IOV_PAYLOAD].iov_base, dlen,
 					 check, seq);
 	}

-	return tcp_fill_headers6(c, conn, iov[TCP_IOV_IP].iov_base,
+	return tcp_fill_headers6(c, conn, iov[TCP_IOV_TAP].iov_base,
+				 iov[TCP_IOV_IP].iov_base,
 				 iov[TCP_IOV_PAYLOAD].iov_base, dlen,
 				 seq);
 }
@@ -1556,7 +1566,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	struct tcp_info tinfo = { 0 };
 	socklen_t sl = sizeof(tinfo);
 	int s = conn->sock;
-	uint32_t vnet_len;
 	size_t optlen = 0;
 	struct tcphdr *th;
 	struct iovec *iov;
@@ -1583,13 +1592,10 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 	if (!tcp_update_seqack_wnd(c, conn, flags, &tinfo) && !flags)
 		return 0;

-	if (CONN_V4(conn)) {
+	if (CONN_V4(conn))
 		iov = tcp4_l2_flags_iov[tcp4_flags_used++];
-		vnet_len = sizeof(struct ethhdr) + sizeof(struct iphdr);
-	} else {
+	else
 		iov = tcp6_l2_flags_iov[tcp6_flags_used++];
-		vnet_len = sizeof(struct ethhdr) + sizeof(struct ipv6hdr);
-	}

 	payload = iov[TCP_IOV_PAYLOAD].iov_base;
 	th = &payload->th;
@@ -1644,8 +1650,6 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags)
 					 conn->seq_to_tap);
 	iov[TCP_IOV_PAYLOAD].iov_len = l4len;

-	tap_hdr_update(iov[TCP_IOV_TAP].iov_base, vnet_len + l4len);
-
 	if (th->ack) {
 		if (SEQ_GE(conn->seq_ack_to_tap, conn->seq_from_tap))
 			conn_flag(c, conn, ~ACK_TO_TAP_DUE);
@@ -2144,8 +2148,6 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		iov = tcp4_l2_iov[tcp4_payload_used++];
 		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, check, seq);
 		iov[TCP_IOV_PAYLOAD].iov_len = l4len;
-		tap_hdr_update(iov[TCP_IOV_TAP].iov_base, l4len
-			       + sizeof(struct iphdr) + sizeof(struct ethhdr));
 		if (tcp4_payload_used > TCP_FRAMES_MEM - 1)
 			tcp_payload_flush(c);
 	} else if (CONN_V6(conn)) {
@@ -2155,8 +2157,6 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn,
 		iov = tcp6_l2_iov[tcp6_payload_used++];
 		l4len = tcp_l2_buf_fill_headers(c, conn, iov, dlen, NULL, seq);
 		iov[TCP_IOV_PAYLOAD].iov_len = l4len;
-		tap_hdr_update(iov[TCP_IOV_TAP].iov_base, l4len
-			       + sizeof(struct ipv6hdr) + sizeof(struct ethhdr));
 		if (tcp6_payload_used > TCP_FRAMES_MEM - 1)
 			tcp_payload_flush(c);
 	}
-- 
2.44.0
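The length written into the tap header is the full L2 frame length: the L4 length plus the IP and Ethernet headers. The arithmetic can be sketched as below, assuming option-less TCP and IPv4 headers. The constant and function names are hypothetical and the sizes are hard-coded for illustration, where the patch itself uses `sizeof()` on the real header structures.

```c
#include <stddef.h>

/* Header sizes assumed for the sketch (no IPv4 or TCP options) */
enum {
	SKETCH_ETH_HLEN = 14,	/* sizeof(struct ethhdr) */
	SKETCH_IP4_HLEN = 20,	/* sizeof(struct iphdr) */
	SKETCH_IP6_HLEN = 40,	/* sizeof(struct ipv6hdr) */
	SKETCH_TCP_HLEN = 20,	/* sizeof(struct tcphdr) */
};

/* L2 frame length for a TCP segment carrying @dlen bytes of payload */
static size_t sketch_l2len(int v6, size_t dlen)
{
	size_t l4len = SKETCH_TCP_HLEN + dlen;
	size_t l3len = l4len + (v6 ? SKETCH_IP6_HLEN : SKETCH_IP4_HLEN);

	return l3len + SKETCH_ETH_HLEN;
}
```

Under these assumptions a bare IPv4 segment with no payload gives 20 + 20 + 14 = 54 bytes, and an IPv6 segment with 100 bytes of payload gives 120 + 40 + 14 = 174 bytes.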
On Wed, 1 May 2024 16:53:43 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> Laurent's changes to split TCP buffers into various components with
> IOVs is now merged.  This series has a batch of small cleanups to make
> the handling of this slightly nicer.  These are preliminaries to doing
> something similar with the UDP buffers.
>
> Note that patch 10/10 might interfere with the experiments to work out
> what is going wrong with the odd batching / performance issues we've
> seen.  We can leave it off for the time being if that's a problem.

I think it's quick to locally revert for tests if needed.

> Changes since v1:
>  * Added new patch removing some unused structures
>  * Added two new patches cleaning up some endianness confusion
>  * Added a bunch of missing cases to standardisation of length
>    variable names
>  * Assorted minor revisions based on Stefano's review
>
> David Gibson (10):
>   checksum: Use proto_ipv6_header_psum() for ICMPv6 as well
>   tap: Split tap specific and L2 (ethernet) headers
>   tap: Remove unused structs tap_msg, tap_l4_msg
>   treewide: Remove misleading and redundant endianness notes
>   checksum: Make csum_ip4_header() take a host endian length
>   treewide: Standardise variable names for various packet lengths
>   tcp: Simplify packet length calculation when preparing headers
>   tap, tcp: (Re-)abstract TAP specific header handling
>   iov: Helper macro to construct iovs covering existing variables or fields
>   tcp: Update tap specific header too in tcp_fill_headers[46]()

Applied.

-- 
Stefano