Here's a current version of my IOV tail and some cleanups to TCP buffer handling based on it. Now rebased on top of v14 of the vhost-user patches. This was aimed at sharing more code between the "buffer" and vhost-user paths, but that turned out to be trickier than I anticipated, so it hasn't really been accomplished. Nonetheless I think these are reasonable cleanups on their own merits, and may yet make sharing some more code between the paths easier in future. David Gibson (7): iov: iov tail helpers iov, checksum: Replace csum_iov() with csum_iov_tail() tcp: Pass TCP header and payload separately to tcp_update_check_tcp[46]() tcp: Pass TCP header and payload separately to tcp_fill_headers[46]() tcp: Merge tcp_update_check_tcp[46]() tcp: Merge tcp_fill_headers[46]() with each other tcp_vu: Remove unnecessary tcp_vu_update_check() function checksum.c | 58 +++++-------- checksum.h | 8 +- iov.c | 91 ++++++++++++++++++++ iov.h | 76 ++++++++++++++++ tap.c | 6 +- tcp.c | 229 +++++++++++++------------------------------------ tcp_buf.c | 33 ++++--- tcp_internal.h | 21 ++--- tcp_vu.c | 120 ++++++++++---------------- udp.c | 7 +- udp_vu.c | 9 +- 11 files changed, 338 insertions(+), 320 deletions(-) -- 2.47.0
In the vhost-user code we have a number of places where we need to locate a particular header within the guest-supplied IO vector. We need to work out which buffer the header is in, and verify that it's contiguous and aligned as we need. At the moment this is open-coded, but introduce a helper to make this more straightforward. We add a new datatype 'struct iov_tail' representing an IO vector from which we've logically consumed some number of headers. The IOV_PULL_HEADER macro consumes a new header from the vector, returning a pointer and updating the iov_tail. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- iov.c | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ iov.h | 76 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 169 insertions(+) diff --git a/iov.c b/iov.c index 3741db2..4c6416c 100644 --- a/iov.c +++ b/iov.c @@ -155,3 +155,96 @@ size_t iov_size(const struct iovec *iov, size_t iov_cnt) return len; } + +/** + * iov_tail_prune() - Remove any unneeded buffers from an IOV tail + * @tail: IO vector tail (modified) + * + * If an IOV tail's offset is large enough, it may not include any bytes from + * the first (or first several) buffers in the underlying IO vector. Modify the + * tail's representation so it contains the same logical bytes, but only + * includes buffers that are actually needed. This will avoid stepping through + * unnecessary elements of the underlying IO vector on future operations. + * + * Return: true if the tail still contains any bytes, otherwise false + */ +bool iov_tail_prune(struct iov_tail *tail) +{ + size_t i; + + i = iov_skip_bytes(tail->iov, tail->cnt, tail->off, &tail->off); + tail->iov += i; + tail->cnt -= i; + + return !!tail->cnt; +} + +/** + * iov_tail_size - Calculate the total size of an IO vector tail + * @tail: IO vector tail + * + * Returns: The total size in bytes. + */ +/* cppcheck-suppress unusedFunction */ +size_t iov_tail_size(struct iov_tail *tail) +{ + iov_tail_prune(tail); + return iov_size(tail->iov, tail->cnt) - tail->off; +} + +/** + * iov_peek_header_() - Get pointer to a header from an IOV tail + * @tail: IOV tail to get header from + * @len: Length of header to get, in bytes + * @align: Required alignment of header, in bytes + * + * @tail may be pruned, but will represent the same bytes as before. + * + * Returns: Pointer to the first @len logical bytes of the tail, NULL if that + * overruns the IO vector, is not contiguous or doesn't have the + * requested alignment. + */ +void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align) +{ + char *p; + + if (!iov_tail_prune(tail)) + return NULL; /* Nothing left */ + + if (tail->off + len < tail->off) + return NULL; /* Overflow */ + + if (tail->off + len > tail->iov[0].iov_len) + return NULL; /* Not contiguous */ + + p = (char *)tail->iov[0].iov_base + tail->off; + if ((uintptr_t)p % align) + return NULL; /* not aligned */ + + return p; +} + +/** + * iov_remove_header_() - Remove a header from an IOV tail + * @tail: IOV tail to remove header from (modified) + * @len: Length of header to remove, in bytes + * @align: Required alignment of header, in bytes + * + * On success, @tail is updated so that it longer includes the bytes of the + * returned header. + * + * Returns: Pointer to the first @len logical bytes of the tail, NULL if that + * overruns the IO vector, is not contiguous or doesn't have the + * requested alignment. + */ +/* cppcheck-suppress unusedFunction */ +void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align) +{ + char *p = iov_peek_header_(tail, len, align); + + if (!p) + return NULL; + + tail->off = tail->off + len; + return p; +} diff --git a/iov.h b/iov.h index a9e1722..9855bf0 100644 --- a/iov.h +++ b/iov.h @@ -28,4 +28,80 @@ size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, size_t offset, void *buf, size_t bytes); size_t iov_size(const struct iovec *iov, size_t iov_cnt); + +/* + * DOC: Theory of Operation, struct iov_tail + * + * Sometimes a single logical network frame is split across multiple buffers, + * represented by an IO vector (struct iovec[]). We often want to process this + * one header / network layer at a time. So, it's useful to maintain a "tail" + * of the vector representing the parts we haven't yet extracted. + * + * The headers we extract need not line up with buffer boundaries (though we do + * assume they're contiguous within a single buffer for now). So, we could + * represent that tail as another struct iovec[], but that would mean copying + * the whole array of struct iovecs, just so we can adjust the offset and length + * on the first one. + * + * So, instead represent the tail as pointer into an existing struct iovec[], + * with an explicit offset for where the "tail" starts within it. If we extract + * enough headers that some buffers of the original vector no longer contain + * part of the tail, we (lazily) advance our struct iovec * to the first buffer + * we still need, and adjust the vector length and offset to match. + */ + +/** + * struct iov_tail - An IO vector which may have some headers logically removed + * @iov: IO vector + * @cnt: Number of entries in @iov + * @off: Current offset in @iov + */ +struct iov_tail { + const struct iovec *iov; + size_t cnt, off; +}; + +/** + * IOV_TAIL() - Create a new IOV tail + * @iov_: IO vector to create tail from + * @cnt_: Length of the IO vector at @iov_ + * @off_: Byte offset in the IO vector where the tail begins + */ +#define IOV_TAIL(iov_, cnt_, off_) \ + (struct iov_tail){ .iov = (iov_), .cnt = (cnt_), .off = (off_) } + +bool iov_tail_prune(struct iov_tail *tail); +size_t iov_tail_size(struct iov_tail *tail); +void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align); +void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align); + +/** + * IOV_PEEK_HEADER() - Get typed pointer to a header from an IOV tail + * @tail_: IOV tail to get header from + * @type_: Data type of the header + * + * @tail_ may be pruned, but will represent the same bytes as before. + * + * Returns: Pointer of type (@type_ *) located at the start of @tail_, NULL if + * we can't get a contiguous and aligned pointer. + */ +#define IOV_PEEK_HEADER(tail_, type_) \ + ((type_ *)(iov_peek_header_((tail_), \ + sizeof(type_), __alignof__(type_)))) + +/** + * IOV_REMOVE_HEADER() - Remove and return typed header from an IOV tail + * @tail_: IOV tail to remove header from (modified) + * @type_: Data type of the header to remove + * + * On success, @tail_ is updated so that it longer includes the bytes of the + * returned header. + * + * Returns: Pointer of type (@type_ *) located at the old start of @tail_, NULL + * if we can't get a contiguous and aligned pointer. + */ +#define IOV_REMOVE_HEADER(tail_, type_) \ + ((type_ *)(iov_remove_header_((tail_), \ + sizeof(type_), __alignof__(type_)))) + #endif /* IOVEC_H */ -- 2.47.0
We usually want to checksum only the tail part of a frame, excluding at least some headers. csum_iov() does that for a frame represented as an IO vector, not actually summing the entire IO vector. We now have struct iov_tail to explicitly represent this construct, so replace csum_iov() with csum_iov_tail() taking that representation rather than 3 parameters. We propagate the same change to csum_udp4() and csum_udp6() which take similar parameters. This slightly simplifies the code, and will allow some further simplifications as struct iov_tail is more widely used. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- checksum.c | 58 ++++++++++++++++++++++-------------------------------- checksum.h | 8 ++++---- iov.c | 1 - tap.c | 6 ++++-- tcp.c | 6 ++++-- udp.c | 7 ++++--- udp_vu.c | 9 +++++---- 7 files changed, 44 insertions(+), 51 deletions(-) diff --git a/checksum.c b/checksum.c index c673993..1c4354d 100644 --- a/checksum.c +++ b/checksum.c @@ -166,24 +166,22 @@ uint32_t proto_ipv4_header_psum(uint16_t l4len, uint8_t protocol, * @udp4hr: UDP header, initialised apart from checksum * @saddr: IPv4 source address * @daddr: IPv4 destination address - * @iov: Pointer to the array of IO vectors - * @iov_cnt: Length of the array - * @offset: UDP payload offset in the iovec array + * @data: UDP payload (as IO vector tail) */ void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, - const struct iovec *iov, int iov_cnt, size_t offset) + struct iov_tail *data) { /* UDP checksums are optional, so don't bother */ udp4hr->check = 0; if (UDP4_REAL_CHECKSUMS) { - uint16_t l4len = iov_size(iov, iov_cnt) - offset + - sizeof(struct udphdr); + uint16_t l4len = iov_tail_size(data) + sizeof(struct udphdr); uint32_t psum = proto_ipv4_header_psum(l4len, IPPROTO_UDP, saddr, daddr); + psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum); - udp4hr->check = csum_iov(iov, iov_cnt, offset, psum); + udp4hr->check = csum_iov_tail(data, psum); } } @@ -231,22 +229,20 @@ uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, * @udp6hr: UDP header, initialised apart from checksum * @saddr: Source address * @daddr: Destination address - * @iov: Pointer to the array of IO vectors - * @iov_cnt: Length of the array - * @offset: UDP payload offset in the iovec array + * @data: UDP payload (as IO vector tail) */ void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, - const struct iovec *iov, int iov_cnt, size_t offset) + struct iov_tail *data) { - uint16_t l4len = iov_size(iov, iov_cnt) - offset + - sizeof(struct udphdr); + uint16_t l4len = iov_tail_size(data) + sizeof(struct udphdr); uint32_t psum = proto_ipv6_header_psum(l4len, IPPROTO_UDP, saddr, daddr); + udp6hr->check = 0; psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum); - udp6hr->check = csum_iov(iov, iov_cnt, offset, psum); + udp6hr->check = csum_iov_tail(data, psum); } /** @@ -501,31 +497,23 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) } /** - * csum_iov() - Calculates the unfolded checksum over an array of IO vectors - * - * @iov Pointer to the array of IO vectors - * @n Length of the array - * @offset: Offset of the data to checksum within the full data length + * csum_iov_tail() - Calculate unfolded checksum for the tail of an IO vector + * @tail: IO vector tail to checksum * @init Initial 32-bit checksum, 0 for no pre-computed checksum * * Return: 16-bit folded, complemented checksum */ -uint16_t csum_iov(const struct iovec *iov, size_t n, size_t offset, - uint32_t init) +uint16_t csum_iov_tail(struct iov_tail *tail, uint32_t init) { - unsigned int i; - size_t first; - - i = iov_skip_bytes(iov, n, offset, &first); - if (i >= n) - return (uint16_t)~csum_fold(init); - - init = csum_unfolded((char *)iov[i].iov_base + first, - iov[i].iov_len - first, init); - i++; - - for (; i < n; i++) - init = csum_unfolded(iov[i].iov_base, iov[i].iov_len, init); - + if (iov_tail_prune(tail)) { + size_t i; + + init = csum_unfolded((char *)tail->iov[0].iov_base + tail->off, + tail->iov[0].iov_len - tail->off, init); + for (i = 1; i < tail->cnt; i++) { + const struct iovec *iov = &tail->iov[i]; + init = csum_unfolded(iov->iov_base, iov->iov_len, init); + } + } return (uint16_t)~csum_fold(init); } diff --git a/checksum.h b/checksum.h index 31ba322..e243c97 100644 --- a/checksum.h +++ b/checksum.h @@ -9,6 +9,7 @@ struct udphdr; struct icmphdr; struct icmp6hdr; +struct iov_tail; uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); @@ -19,20 +20,19 @@ uint32_t proto_ipv4_header_psum(uint16_t l4len, uint8_t protocol, struct in_addr saddr, struct in_addr daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, - const struct iovec *iov, int iov_cnt, size_t offset); + struct iov_tail *data); void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t dlen); uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, const struct in6_addr *saddr, const struct in6_addr *daddr); void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, - const struct iovec *iov, int iov_cnt, size_t offset); + struct iov_tail *data); void csum_icmp6(struct icmp6hdr *icmp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t dlen); uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init); uint16_t csum(const void *buf, size_t len, uint32_t init); -uint16_t csum_iov(const struct iovec *iov, size_t n, size_t offset, - uint32_t init); +uint16_t csum_iov_tail(struct iov_tail *tail, uint32_t init); #endif /* CHECKSUM_H */ diff --git a/iov.c b/iov.c index 4c6416c..2f7be15 100644 --- a/iov.c +++ b/iov.c @@ -185,7 +185,6 @@ bool iov_tail_prune(struct iov_tail *tail) * * Returns: The total size in bytes. */ -/* cppcheck-suppress unusedFunction */ size_t iov_tail_size(struct iov_tail *tail) { iov_tail_prune(tail); diff --git a/tap.c b/tap.c index 386f0bc..3035610 100644 --- a/tap.c +++ b/tap.c @@ -184,11 +184,12 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport, .iov_base = (void *)in, .iov_len = dlen }; + struct iov_tail payload = IOV_TAIL(&iov, 1, 0); uh->source = htons(sport); uh->dest = htons(dport); uh->len = htons(l4len); - csum_udp4(uh, src, dst, &iov, 1, 0); + csum_udp4(uh, src, dst, &payload); memcpy(data, in, dlen); tap_send_single(c, buf, dlen + (data - buf)); @@ -271,11 +272,12 @@ void tap_udp6_send(const struct ctx *c, .iov_base = in, .iov_len = dlen }; + struct iov_tail payload = IOV_TAIL(&iov, 1, 0); uh->source = htons(sport); uh->dest = htons(dport); uh->len = htons(l4len); - csum_udp6(uh, src, dst, &iov, 1, 0); + csum_udp6(uh, src, dst, &payload); memcpy(data, in, dlen); tap_send_single(c, buf, dlen + (data - buf)); diff --git a/tcp.c b/tcp.c index 61c12a5..f334ca5 100644 --- a/tcp.c +++ b/tcp.c @@ -764,6 +764,7 @@ void tcp_update_check_tcp4(const struct iphdr *iph, size_t l4offset) { uint16_t l4len = ntohs(iph->tot_len) - sizeof(struct iphdr); + struct iov_tail l4 = IOV_TAIL(iov, iov_cnt, l4offset); struct in_addr saddr = { .s_addr = iph->saddr }; struct in_addr daddr = { .s_addr = iph->daddr }; size_t check_ofs; @@ -801,7 +802,7 @@ void tcp_update_check_tcp4(const struct iphdr *iph, check = (uint16_t *)ptr; *check = 0; - *check = csum_iov(iov, iov_cnt, l4offset, sum); + *check = csum_iov_tail(&l4, sum); } /** @@ -815,6 +816,7 @@ void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, const struct iovec *iov, int iov_cnt, size_t l4offset) { + struct iov_tail l4 = IOV_TAIL(iov, iov_cnt, l4offset); uint16_t l4len = ntohs(ip6h->payload_len); size_t check_ofs; uint16_t *check; @@ -852,7 +854,7 @@ void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, check = (uint16_t *)ptr; *check = 0; - *check = csum_iov(iov, iov_cnt, l4offset, sum); + *check = csum_iov_tail(&l4, sum); } /** diff --git a/udp.c b/udp.c index 5b0093a..c89f031 100644 --- a/udp.c +++ b/udp.c @@ -316,7 +316,8 @@ size_t udp_update_hdr4(struct iphdr *ip4h, struct udp_payload_t *bp, .iov_base = bp->data, .iov_len = dlen }; - csum_udp4(&bp->uh, *src, *dst, &iov, 1, 0); + struct iov_tail data = IOV_TAIL(&iov, 1, 0); + csum_udp4(&bp->uh, *src, *dst, &data); } return l4len; @@ -360,8 +361,8 @@ size_t udp_update_hdr6(struct ipv6hdr *ip6h, struct udp_payload_t *bp, .iov_base = bp->data, .iov_len = dlen }; - csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, - &iov, 1, 0); + struct iov_tail data = IOV_TAIL(&iov, 1, 0); + csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data); } return l4len; diff --git a/udp_vu.c b/udp_vu.c index c911022..9c697f3 100644 --- a/udp_vu.c +++ b/udp_vu.c @@ -199,15 +199,16 @@ static void udp_vu_csum(const struct flowside *toside, int iov_used) const struct in_addr *dst4 = inany_v4(&toside->eaddr); char *base = iov_vu[0].iov_base; struct udp_payload_t *bp; + struct iov_tail data; if (src4 && dst4) { bp = vu_payloadv4(base); - csum_udp4(&bp->uh, *src4, *dst4, iov_vu, iov_used, - (char *)&bp->data - base); + data = IOV_TAIL(iov_vu, iov_used, (char *)&bp->data - base); + csum_udp4(&bp->uh, *src4, *dst4, &data); } else { bp = vu_payloadv6(base); - csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, - iov_vu, iov_used, (char *)&bp->data - base); + data = IOV_TAIL(iov_vu, iov_used, (char *)&bp->data - base); + csum_udp6(&bp->uh, &toside->oaddr.a6, &toside->eaddr.a6, &data); } } -- 2.47.0
Currently these expects both the TCP header and payload in a single IOV, and goes to some trouble to locate the checksum field within it. In the current caller we've already know where the TCP header is, so we might as well just pass it in. This will need to work a bit differently for vhost-user, but that code already needs to locate the TCP header for other reasons, so again we can just pass it in. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 106 ++++++++++--------------------------------------- tcp_internal.h | 10 ++--- tcp_vu.c | 12 ++++-- 3 files changed, 34 insertions(+), 94 deletions(-) diff --git a/tcp.c b/tcp.c index f334ca5..5c40e18 100644 --- a/tcp.c +++ b/tcp.c @@ -755,106 +755,42 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) /** * tcp_update_check_tcp4() - Calculate TCP checksum for IPv4 * @iph: IPv4 header - * @iov: Pointer to the array of IO vectors - * @iov_cnt: Length of the array - * @l4offset: IPv4 payload offset in the iovec array + * @th: TCP header (updated) + * @payload: TCP payload */ -void tcp_update_check_tcp4(const struct iphdr *iph, - const struct iovec *iov, int iov_cnt, - size_t l4offset) +void tcp_update_check_tcp4(const struct iphdr *iph, struct tcphdr *th, + struct iov_tail *payload) { uint16_t l4len = ntohs(iph->tot_len) - sizeof(struct iphdr); - struct iov_tail l4 = IOV_TAIL(iov, iov_cnt, l4offset); struct in_addr saddr = { .s_addr = iph->saddr }; struct in_addr daddr = { .s_addr = iph->daddr }; - size_t check_ofs; - uint16_t *check; - int check_idx; uint32_t sum; - char *ptr; sum = proto_ipv4_header_psum(l4len, IPPROTO_TCP, saddr, daddr); - check_idx = iov_skip_bytes(iov, iov_cnt, - l4offset + offsetof(struct tcphdr, check), - &check_ofs); - - if (check_idx >= iov_cnt) { - err("TCP4 buffer is too small, iov size %zd, check offset %zd", - iov_size(iov, iov_cnt), - l4offset + offsetof(struct tcphdr, check)); - return; - } - - if (check_ofs + sizeof(*check) > iov[check_idx].iov_len) { - err("TCP4 checksum field memory is not contiguous " - "check_ofs %zd check_idx %d iov_len %zd", - check_ofs, check_idx, iov[check_idx].iov_len); - return; - } - - ptr = (char *)iov[check_idx].iov_base + check_ofs; - if ((uintptr_t)ptr & (__alignof__(*check) - 1)) { - err("TCP4 checksum field is not correctly aligned in memory"); - return; - } - - check = (uint16_t *)ptr; - - *check = 0; - *check = csum_iov_tail(&l4, sum); + th->check = 0; + sum = csum_unfolded(th, sizeof(*th), sum); + th->check = csum_iov_tail(payload, sum); } /** * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6 * @ip6h: IPv6 header - * @iov: Pointer to the array of IO vectors - * @iov_cnt: Length of the array - * @l4offset: IPv6 payload offset in the iovec array + * @th: TCP header (updated) + * @payload: TCP payload */ -void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, - const struct iovec *iov, int iov_cnt, - size_t l4offset) +void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, struct tcphdr *th, + struct iov_tail *payload) { - struct iov_tail l4 = IOV_TAIL(iov, iov_cnt, l4offset); uint16_t l4len = ntohs(ip6h->payload_len); - size_t check_ofs; - uint16_t *check; - int check_idx; uint32_t sum; - char *ptr; sum = proto_ipv6_header_psum(l4len, IPPROTO_TCP, &ip6h->saddr, &ip6h->daddr); - check_idx = iov_skip_bytes(iov, iov_cnt, - l4offset + offsetof(struct tcphdr, check), - &check_ofs); - - if (check_idx >= iov_cnt) { - err("TCP6 buffer is too small, iov size %zd, check offset %zd", - iov_size(iov, iov_cnt), - l4offset + offsetof(struct tcphdr, check)); - return; - } - - if (check_ofs + sizeof(*check) > iov[check_idx].iov_len) { - err("TCP6 checksum field memory is not contiguous " - "check_ofs %zd check_idx %d iov_len %zd", - check_ofs, check_idx, iov[check_idx].iov_len); - return; - } - - ptr = (char *)iov[check_idx].iov_base + check_ofs; - if ((uintptr_t)ptr & (__alignof__(*check) - 1)) { - err("TCP6 checksum field is not correctly aligned in memory"); - return; - } - - check = (uint16_t *)ptr; - - *check = 0; - *check = csum_iov_tail(&l4, sum); + th->check = 0; + sum = csum_unfolded(th, sizeof(*th), sum); + th->check = csum_iov_tail(payload, sum); } /** @@ -1005,11 +941,12 @@ void tcp_fill_headers4(const struct tcp_tap_conn *conn, bp->th.check = 0; } else { const struct iovec iov = { - .iov_base = bp, - .iov_len = ntohs(iph->tot_len) - sizeof(struct iphdr), + .iov_base = bp->data, + .iov_len = dlen, }; + struct iov_tail payload = IOV_TAIL(&iov, 1, 0); - tcp_update_check_tcp4(iph, &iov, 1, 0); + tcp_update_check_tcp4(iph, &bp->th, &payload); } tap_hdr_update(taph, l3len + sizeof(struct ethhdr)); @@ -1052,11 +989,12 @@ void tcp_fill_headers6(const struct tcp_tap_conn *conn, bp->th.check = 0; } else { const struct iovec iov = { - .iov_base = bp, - .iov_len = ntohs(ip6h->payload_len) + .iov_base = bp->data, + .iov_len = dlen, }; + struct iov_tail payload = IOV_TAIL(&iov, 1, 0); - tcp_update_check_tcp6(ip6h, &iov, 1, 0); + tcp_update_check_tcp6(ip6h, &bp->th, &payload); } tap_hdr_update(taph, l4len + sizeof(*ip6h) + sizeof(struct ethhdr)); diff --git a/tcp_internal.h b/tcp_internal.h index d7b125f..744c5c0 100644 --- a/tcp_internal.h +++ b/tcp_internal.h @@ -162,12 +162,10 @@ void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn); struct tcp_info_linux; -void tcp_update_check_tcp4(const struct iphdr *iph, - const struct iovec *iov, int iov_cnt, - size_t l4offset); -void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, - const struct iovec *iov, int iov_cnt, - size_t l4offset); +void tcp_update_check_tcp4(const struct iphdr *iph, struct tcphdr *th, + struct iov_tail *payload); +void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, struct tcphdr *th, + struct iov_tail *payload); void tcp_fill_headers4(const struct tcp_tap_conn *conn, struct tap_hdr *taph, struct iphdr *iph, struct tcp_payload_t *bp, size_t dlen, diff --git a/tcp_vu.c b/tcp_vu.c index 05e2d1d..10d2464 100644 --- a/tcp_vu.c +++ b/tcp_vu.c @@ -72,15 +72,19 @@ static void tcp_vu_update_check(const struct flowside *tapside, char *base = iov[0].iov_base; if (inany_v4(&tapside->oaddr)) { + struct tcphdr *th = vu_payloadv4(base); const struct iphdr *iph = vu_ip(base); + struct iov_tail payload = IOV_TAIL(iov, iov_cnt, + (char *)(th + 1) - base); - tcp_update_check_tcp4(iph, iov, iov_cnt, - (char *)vu_payloadv4(base) - base); + tcp_update_check_tcp4(iph, th, &payload); } else { + struct tcphdr *th = vu_payloadv6(base); const struct ipv6hdr *ip6h = vu_ip(base); + struct iov_tail payload = IOV_TAIL(iov, iov_cnt, + (char *)(th + 1) - base); - tcp_update_check_tcp6(ip6h, iov, iov_cnt, - (char *)vu_payloadv6(base) - base); + tcp_update_check_tcp6(ip6h, th, &payload); } } -- 2.47.0
At the moment these take separate pointers to the tap specific and IP headers, but expect the TCP header and payload as a single tcp_payload_t. As well as being slightly inconsistent, this involves some slightly iffy pointer shenanigans when called on the flags path with a tcp_flags_t instead of a tcp_payload_t. More importantly, it's inconvenient for the upcoming vhost-user case, where the TCP header and payload might not be contiguous. Furthermore, the payload itself might not be contiguous. So, pass the TCP header as its own pointer, and the TCP payload as an IO vector. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- iov.c | 1 - tcp.c | 50 +++++++++++++++------------------------ tcp_buf.c | 22 ++++++++---------- tcp_internal.h | 4 ++-- tcp_vu.c | 63 ++++++++++++++++++++++++++++---------------------- 5 files changed, 65 insertions(+), 75 deletions(-) diff --git a/iov.c b/iov.c index 2f7be15..3b12272 100644 --- a/iov.c +++ b/iov.c @@ -236,7 +236,6 @@ void *iov_peek_header_(struct iov_tail *tail, size_t len, size_t align) * overruns the IO vector, is not contiguous or doesn't have the * requested alignment. */ -/* cppcheck-suppress unusedFunction */ void *iov_remove_header_(struct iov_tail *tail, size_t len, size_t align) { char *p = iov_peek_header_(tail, len, align); diff --git a/tcp.c b/tcp.c index 5c40e18..2f900fc 100644 --- a/tcp.c +++ b/tcp.c @@ -909,21 +909,21 @@ static void tcp_fill_header(struct tcphdr *th, * @conn: Connection pointer * @taph: tap backend specific header * @iph: Pointer to IPv4 header - * @bp: Pointer to TCP header followed by TCP payload - * @dlen: TCP payload length + * @th: Pointer to TCP header + * @payload: TCP payload * @check: Checksum, if already known * @seq: Sequence number for this segment * @no_tcp_csum: Do not set TCP checksum */ void tcp_fill_headers4(const struct tcp_tap_conn *conn, struct tap_hdr *taph, struct iphdr *iph, - struct tcp_payload_t *bp, size_t dlen, + struct tcphdr *th, struct iov_tail *payload, const uint16_t *check, uint32_t seq, bool no_tcp_csum) { const struct flowside *tapside = TAPFLOW(conn); const struct in_addr *src4 = inany_v4(&tapside->oaddr); const struct in_addr *dst4 = inany_v4(&tapside->eaddr); - size_t l4len = dlen + sizeof(bp->th); + size_t l4len = iov_tail_size(payload) + sizeof(*th); size_t l3len = l4len + sizeof(*iph); ASSERT(src4 && dst4); @@ -935,19 +935,12 @@ void tcp_fill_headers4(const struct tcp_tap_conn *conn, iph->check = check ? *check : csum_ip4_header(l3len, IPPROTO_TCP, *src4, *dst4); - tcp_fill_header(&bp->th, conn, seq); + tcp_fill_header(th, conn, seq); - if (no_tcp_csum) { - bp->th.check = 0; - } else { - const struct iovec iov = { - .iov_base = bp->data, - .iov_len = dlen, - }; - struct iov_tail payload = IOV_TAIL(&iov, 1, 0); - - tcp_update_check_tcp4(iph, &bp->th, &payload); - } + if (no_tcp_csum) + th->check = 0; + else + tcp_update_check_tcp4(iph, th, payload); tap_hdr_update(taph, l3len + sizeof(struct ethhdr)); } @@ -957,19 +950,19 @@ void tcp_fill_headers4(const struct tcp_tap_conn *conn, * @conn: Connection pointer * @taph: tap backend specific header * @ip6h: Pointer to IPv6 header - * @bp: Pointer to TCP header followed by TCP payload - * @dlen: TCP payload length + * @th: Pointer to TCP header + * @payload: TCP payload * @check: Checksum, if already known * @seq: Sequence number for this segment * @no_tcp_csum: Do not set TCP checksum */ void tcp_fill_headers6(const struct tcp_tap_conn *conn, struct tap_hdr *taph, struct ipv6hdr *ip6h, - struct tcp_payload_t *bp, size_t dlen, + struct tcphdr *th, struct iov_tail *payload, uint32_t seq, bool no_tcp_csum) { + size_t l4len = iov_tail_size(payload) + sizeof(*th); const struct flowside *tapside = TAPFLOW(conn); - size_t l4len = dlen + sizeof(bp->th); ip6h->payload_len = htons(l4len); ip6h->saddr = tapside->oaddr.a6; @@ -983,19 +976,12 @@ void tcp_fill_headers6(const struct tcp_tap_conn *conn, ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff; ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff; - tcp_fill_header(&bp->th, conn, seq); + tcp_fill_header(th, conn, seq); - if (no_tcp_csum) { - bp->th.check = 0; - } else { - const struct iovec iov = { - .iov_base = bp->data, - .iov_len = dlen, - }; - struct iov_tail payload = IOV_TAIL(&iov, 1, 0); - - tcp_update_check_tcp6(ip6h, &bp->th, &payload); - } + if (no_tcp_csum) + th->check = 0; + else + tcp_update_check_tcp6(ip6h, th, payload); tap_hdr_update(taph, l4len + sizeof(*ip6h) + sizeof(struct ethhdr)); } diff --git a/tcp_buf.c b/tcp_buf.c index 0946cd5..830c23d 100644 --- a/tcp_buf.c +++ b/tcp_buf.c @@ -151,29 +151,27 @@ void tcp_payload_flush(const struct ctx *c) * tcp_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers * @conn: Connection pointer * @iov: Pointer to an array of iovec of TCP pre-cooked buffers - * @dlen: TCP payload length * @check: Checksum, if already known * @seq: Sequence number for this segment * @no_tcp_csum: Do not set TCP checksum */ static void tcp_l2_buf_fill_headers(const struct tcp_tap_conn *conn, - struct iovec *iov, size_t dlen, - const uint16_t *check, uint32_t seq, - bool no_tcp_csum) + struct iovec *iov, const uint16_t *check, + uint32_t seq, bool no_tcp_csum) { + struct iov_tail tail = IOV_TAIL(&iov[TCP_IOV_PAYLOAD], 1, 0); + struct tcphdr *th = IOV_REMOVE_HEADER(&tail, struct tcphdr); const struct flowside *tapside = TAPFLOW(conn); const struct in_addr *a4 = inany_v4(&tapside->oaddr); if (a4) { tcp_fill_headers4(conn, iov[TCP_IOV_TAP].iov_base, - iov[TCP_IOV_IP].iov_base, - iov[TCP_IOV_PAYLOAD].iov_base, dlen, - check, seq, no_tcp_csum); + iov[TCP_IOV_IP].iov_base, th, + &tail, check, seq, no_tcp_csum); } else { tcp_fill_headers6(conn, iov[TCP_IOV_TAP].iov_base, - iov[TCP_IOV_IP].iov_base, - iov[TCP_IOV_PAYLOAD].iov_base, dlen, - seq, no_tcp_csum); + iov[TCP_IOV_IP].iov_base, th, + &tail, seq, no_tcp_csum); } } @@ -213,7 +211,7 @@ int tcp_buf_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) tcp_payload_used++; l4len = optlen + sizeof(struct tcphdr); iov[TCP_IOV_PAYLOAD].iov_len = l4len; - tcp_l2_buf_fill_headers(conn, iov, optlen, NULL, seq, false); + tcp_l2_buf_fill_headers(conn, iov, NULL, seq, false); if (flags & DUP_ACK) { struct iovec *dup_iov = tcp_l2_iov[tcp_payload_used++]; @@ -270,7 +268,7 @@ static void tcp_data_to_tap(const struct ctx *c, struct tcp_tap_conn *conn, payload->th.th_flags = 0; payload->th.ack = 1; iov[TCP_IOV_PAYLOAD].iov_len = dlen + sizeof(struct tcphdr); - tcp_l2_buf_fill_headers(conn, iov, dlen, check, seq, false); + tcp_l2_buf_fill_headers(conn, iov, check, seq, false); if (++tcp_payload_used > TCP_FRAMES_MEM - 1) tcp_payload_flush(c); } diff --git a/tcp_internal.h b/tcp_internal.h index 744c5c0..9732b5b 100644 --- a/tcp_internal.h +++ b/tcp_internal.h @@ -168,11 +168,11 @@ void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, struct tcphdr *th, struct iov_tail *payload); void tcp_fill_headers4(const struct tcp_tap_conn *conn, struct tap_hdr *taph, struct iphdr *iph, - struct tcp_payload_t *bp, size_t dlen, + struct tcphdr *th, struct iov_tail *payload, const uint16_t *check, uint32_t seq, bool no_tcp_csum); void tcp_fill_headers6(const struct tcp_tap_conn *conn, struct tap_hdr *taph, struct ipv6hdr *ip6h, - struct tcp_payload_t *bp, size_t dlen, + struct tcphdr *th, struct iov_tail *payload, uint32_t seq, bool no_tcp_csum); int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, diff --git a/tcp_vu.c b/tcp_vu.c index 10d2464..a4ec512 100644 --- a/tcp_vu.c +++ b/tcp_vu.c @@ -103,10 +103,12 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) const struct flowside *tapside = TAPFLOW(conn); size_t optlen, hdrlen; struct vu_virtq_element flags_elem[2]; - struct tcp_payload_t *payload; struct ipv6hdr *ip6h = NULL; struct iovec flags_iov[2]; + struct tcp_syn_opts *opts; struct iphdr *iph = NULL; + struct iov_tail payload; + struct tcphdr *th; struct ethhdr *eh; uint32_t seq; int elem_cnt; @@ -138,35 +140,35 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) iph = vu_ip(flags_elem[0].in_sg[0].iov_base); *iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_TCP); - payload = vu_payloadv4(flags_elem[0].in_sg[0].iov_base); + th = vu_payloadv4(flags_elem[0].in_sg[0].iov_base); } else { eh->h_proto = htons(ETH_P_IPV6); ip6h = vu_ip(flags_elem[0].in_sg[0].iov_base); *ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_TCP); - payload = vu_payloadv6(flags_elem[0].in_sg[0].iov_base); + th = vu_payloadv6(flags_elem[0].in_sg[0].iov_base); } - memset(&payload->th, 0, sizeof(payload->th)); - payload->th.doff = offsetof(struct tcp_payload_t, data) / 4; - payload->th.ack = 1; + memset(th, 0, sizeof(*th)); + th->doff = sizeof(*th) / 4; + th->ack = 1; seq = conn->seq_to_tap; - ret = tcp_prepare_flags(c, conn, flags, &payload->th, - (struct tcp_syn_opts *)payload->data, - &optlen); + opts = (struct tcp_syn_opts *)(th + 1); + ret = tcp_prepare_flags(c, conn, flags, th, opts, &optlen); if (ret <= 0) { vu_queue_rewind(vq, 1); return ret; } flags_elem[0].in_sg[0].iov_len = hdrlen + optlen; + payload = IOV_TAIL(flags_elem[0].in_sg, 1, hdrlen); if (CONN_V4(conn)) { - tcp_fill_headers4(conn, NULL, iph, payload, optlen, NULL, seq, - true); + tcp_fill_headers4(conn, NULL, iph, th, &payload, + NULL, seq, true); } else { - tcp_fill_headers6(conn, NULL, ip6h, payload, optlen, seq, true); + tcp_fill_headers6(conn, NULL, ip6h, th, &payload, seq, true); } if (*c->pcap) { @@ -316,23 +318,28 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, * tcp_vu_prepare() - Prepare the frame header * @c: Execution context * @conn: Connection pointer - * @first: Pointer to the array of IO vectors - * @dlen: Packet data length + * @iov: Pointer to the array of IO vectors + * @iov_cnt: Number of entries in @iov * @check: Checksum, if already known */ -static void tcp_vu_prepare(const struct ctx *c, - struct tcp_tap_conn *conn, char *base, - size_t dlen, const uint16_t **check) +static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn, + struct iovec *iov, size_t iov_cnt, + const uint16_t **check) { const struct flowside *toside = TAPFLOW(conn); - struct tcp_payload_t *payload; + bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)); + size_t hdrlen = tcp_vu_hdrlen(v6); + struct iov_tail payload = IOV_TAIL(iov, iov_cnt, hdrlen); + char *base = iov[0].iov_base; struct ipv6hdr *ip6h = NULL; struct iphdr *iph = NULL; + struct tcphdr *th; struct ethhdr *eh; /* we guess the first iovec provided by the guest can embed * all the headers needed by L2 frame */ + ASSERT(iov[0].iov_len >= hdrlen); eh = vu_eth(base); @@ -341,31 +348,31 @@ static void tcp_vu_prepare(const struct ctx *c, /* initialize header */ - if (inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)) { + if (!v6) { eh->h_proto = htons(ETH_P_IP); iph = vu_ip(base); *iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_TCP); - payload = vu_payloadv4(base); + th = vu_payloadv4(base); } else { eh->h_proto = htons(ETH_P_IPV6); ip6h = vu_ip(base); *ip6h = (struct ipv6hdr)L2_BUF_IP6_INIT(IPPROTO_TCP); - payload = vu_payloadv6(base); + th = vu_payloadv6(base); } - memset(&payload->th, 0, sizeof(payload->th)); - payload->th.doff = offsetof(struct tcp_payload_t, data) / 4; - payload->th.ack = 1; + memset(th, 0, sizeof(*th)); + th->doff = sizeof(*th) / 4; + th->ack = 1; - if (inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)) { - tcp_fill_headers4(conn, NULL, iph, payload, dlen, + if (!v6) { + tcp_fill_headers4(conn, NULL, iph, th, &payload, *check, conn->seq_to_tap, true); *check = &iph->check; } else { - tcp_fill_headers6(conn, NULL, ip6h, payload, dlen, + tcp_fill_headers6(conn, NULL, ip6h, th, &payload, conn->seq_to_tap, true); } } @@ -477,7 +484,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) if (i + 1 == head_cnt) check = NULL; - tcp_vu_prepare(c, conn, iov->iov_base, dlen, &check); + tcp_vu_prepare(c, conn, iov, buf_cnt, &check); if (*c->pcap) { tcp_vu_update_check(tapside, iov, buf_cnt); -- 2.47.0
The only reason we need separate functions for the IPv4 and IPv6 case is to calculate the checksum of the IP pseudo-header, which is different for the two cases. However, the caller already knows which path it's on and can access the values needed for the pseudo-header partial sum more easily than tcp_update_check_tcp[46]() can. So, merge these functions into a single tcp_update_csum() function that just takes the pseudo-header partial sum, calculated in the caller. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 59 +++++++++++++++++--------------------------------- tcp_internal.h | 6 ++--- tcp_vu.c | 22 ++++++++++++------- 3 files changed, 36 insertions(+), 51 deletions(-) diff --git a/tcp.c b/tcp.c index 2f900fc..482e460 100644 --- a/tcp.c +++ b/tcp.c @@ -753,44 +753,16 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) } /** - * tcp_update_check_tcp4() - Calculate TCP checksum for IPv4 - * @iph: IPv4 header + * tcp_update_csum() - Calculate TCP checksum + * @psum: Unfolded partial checksum of the IPv4 or IPv6 pseudo-header * @th: TCP header (updated) * @payload: TCP payload */ -void tcp_update_check_tcp4(const struct iphdr *iph, struct tcphdr *th, - struct iov_tail *payload) +void tcp_update_csum(uint32_t psum, struct tcphdr *th, struct iov_tail *payload) { - uint16_t l4len = ntohs(iph->tot_len) - sizeof(struct iphdr); - struct in_addr saddr = { .s_addr = iph->saddr }; - struct in_addr daddr = { .s_addr = iph->daddr }; - uint32_t sum; - - sum = proto_ipv4_header_psum(l4len, IPPROTO_TCP, saddr, daddr); - - th->check = 0; - sum = csum_unfolded(th, sizeof(*th), sum); - th->check = csum_iov_tail(payload, sum); -} - -/** - * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6 - * @ip6h: IPv6 header - * @th: TCP header (updated) - * @payload: TCP payload - */ -void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, struct tcphdr *th, - struct iov_tail *payload) -{ - uint16_t l4len = ntohs(ip6h->payload_len); - uint32_t sum; - - sum = proto_ipv6_header_psum(l4len, IPPROTO_TCP, &ip6h->saddr, - &ip6h->daddr); - th->check = 0; - sum = csum_unfolded(th, sizeof(*th), sum); - th->check = csum_iov_tail(payload, sum); + psum = csum_unfolded(th, sizeof(*th), psum); + th->check = csum_iov_tail(payload, psum); } /** @@ -937,10 +909,14 @@ void tcp_fill_headers4(const struct tcp_tap_conn *conn, tcp_fill_header(th, conn, seq); - if (no_tcp_csum) + if (no_tcp_csum) { th->check = 0; - else - tcp_update_check_tcp4(iph, th, payload); + } else { + uint32_t psum = proto_ipv4_header_psum(l4len, IPPROTO_TCP, + *src4, *dst4); + + tcp_update_csum(psum, th, payload); + } tap_hdr_update(taph, l3len + sizeof(struct ethhdr)); } @@ -978,10 +954,15 @@ void tcp_fill_headers6(const struct tcp_tap_conn *conn, tcp_fill_header(th, conn, seq); - if (no_tcp_csum) + if (no_tcp_csum) { th->check = 0; - else - tcp_update_check_tcp6(ip6h, th, payload); + } else { + uint32_t psum = proto_ipv6_header_psum(l4len, IPPROTO_TCP, + &ip6h->saddr, + &ip6h->daddr); + + tcp_update_csum(psum, th, payload); + } tap_hdr_update(taph, l4len + sizeof(*ip6h) + sizeof(struct ethhdr)); } diff --git a/tcp_internal.h b/tcp_internal.h index 9732b5b..cff06e0 100644 --- a/tcp_internal.h +++ b/tcp_internal.h @@ -162,10 +162,8 @@ void tcp_rst_do(const struct ctx *c, struct tcp_tap_conn *conn); struct tcp_info_linux; -void tcp_update_check_tcp4(const struct iphdr *iph, struct tcphdr *th, - struct iov_tail *payload); -void tcp_update_check_tcp6(const struct ipv6hdr *ip6h, struct tcphdr *th, - struct iov_tail *payload); +void tcp_update_csum(uint32_t psum, struct tcphdr *th, + struct iov_tail *payload); void tcp_fill_headers4(const struct tcp_tap_conn *conn, struct tap_hdr *taph, struct iphdr *iph, struct tcphdr *th, struct iov_tail *payload, diff --git a/tcp_vu.c b/tcp_vu.c index a4ec512..081c752 100644 --- a/tcp_vu.c +++ b/tcp_vu.c @@ -70,22 +70,28 @@ static void tcp_vu_update_check(const struct flowside *tapside, struct iovec *iov, int iov_cnt) { char *base = iov[0].iov_base; + struct iov_tail payload; + struct tcphdr *th; + uint32_t psum; if (inany_v4(&tapside->oaddr)) { - struct tcphdr *th = vu_payloadv4(base); + const struct in_addr *src4 = inany_v4(&tapside->oaddr); + const struct in_addr *dst4 = inany_v4(&tapside->eaddr); const struct iphdr *iph = vu_ip(base); - struct iov_tail payload = IOV_TAIL(iov, iov_cnt, - (char *)(th + 1) - base); + size_t l4len = ntohs(iph->tot_len) - sizeof(*iph); - tcp_update_check_tcp4(iph, th, &payload); + th = vu_payloadv4(base); + psum = proto_ipv4_header_psum(l4len, IPPROTO_TCP, *src4, *dst4); } else { - struct tcphdr *th = vu_payloadv6(base); const struct ipv6hdr *ip6h = vu_ip(base); - struct iov_tail payload = IOV_TAIL(iov, iov_cnt, - (char *)(th + 1) - base); + size_t l4len = ntohs(ip6h->payload_len); - tcp_update_check_tcp6(ip6h, th, &payload); + th = vu_payloadv6(base); + psum = proto_ipv6_header_psum(l4len, IPPROTO_TCP, + &ip6h->saddr, &ip6h->daddr); } + payload = IOV_TAIL(iov, iov_cnt, (char *)(th + 1) - base); + tcp_update_csum(psum, th, &payload); } /** -- 2.47.0
We have different versions of this function for IPv4 and IPv6, but the caller already requires some IP version specific code to get the right header pointers. Instead, have a common function that fills either an IPv4 or an IPv6 header based on which header pointer it is passed. This allows us to remove a small amount of code duplication and make a few slightly ugly conditionals. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 108 ++++++++++++++++++++++--------------------------- tcp_buf.c | 19 ++++----- tcp_internal.h | 13 +++--- tcp_vu.c | 32 ++++++--------- 4 files changed, 75 insertions(+), 97 deletions(-) diff --git a/tcp.c b/tcp.c index 482e460..1872ccb 100644 --- a/tcp.c +++ b/tcp.c @@ -877,94 +877,82 @@ static void tcp_fill_header(struct tcphdr *th, } /** - * tcp_fill_headers4() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers + * tcp_fill_headers() - Fill 802.3, IP, TCP headers * @conn: Connection pointer * @taph: tap backend specific header - * @iph: Pointer to IPv4 header + * @ip4h: Pointer to IPv4 header, or NULL + * @ip6h: Pointer to IPv6 header, or NULL * @th: Pointer to TCP header * @payload: TCP payload - * @check: Checksum, if already known + * @ip4_check: IPv4 checksum, if already known * @seq: Sequence number for this segment * @no_tcp_csum: Do not set TCP checksum */ -void tcp_fill_headers4(const struct tcp_tap_conn *conn, - struct tap_hdr *taph, struct iphdr *iph, - struct tcphdr *th, struct iov_tail *payload, - const uint16_t *check, uint32_t seq, bool no_tcp_csum) +void tcp_fill_headers(const struct tcp_tap_conn *conn, + struct tap_hdr *taph, + struct iphdr *ip4h, struct ipv6hdr *ip6h, + struct tcphdr *th, struct iov_tail *payload, + const uint16_t *ip4_check, uint32_t seq, bool no_tcp_csum) { const struct flowside *tapside = TAPFLOW(conn); - const struct in_addr *src4 = inany_v4(&tapside->oaddr); - const struct in_addr *dst4 = inany_v4(&tapside->eaddr); size_t l4len = iov_tail_size(payload) + sizeof(*th); - size_t l3len = l4len + sizeof(*iph); + size_t l3len = l4len; + uint32_t psum = 0; - ASSERT(src4 && dst4); + if (ip4h) { + const struct in_addr *src4 = inany_v4(&tapside->oaddr); + const struct in_addr *dst4 = inany_v4(&tapside->eaddr); - iph->tot_len = htons(l3len); - iph->saddr = src4->s_addr; - iph->daddr = dst4->s_addr; + ASSERT(src4 && dst4); - iph->check = check ? *check : - csum_ip4_header(l3len, IPPROTO_TCP, *src4, *dst4); + l3len += + sizeof(*ip4h); - tcp_fill_header(th, conn, seq); + ip4h->tot_len = htons(l3len); + ip4h->saddr = src4->s_addr; + ip4h->daddr = dst4->s_addr; - if (no_tcp_csum) { - th->check = 0; - } else { - uint32_t psum = proto_ipv4_header_psum(l4len, IPPROTO_TCP, - *src4, *dst4); + if (ip4_check) + ip4h->check = *ip4_check; + else + ip4h->check = csum_ip4_header(l3len, IPPROTO_TCP, + *src4, *dst4); - tcp_update_csum(psum, th, payload); + if (!no_tcp_csum) { + psum = proto_ipv4_header_psum(l4len, IPPROTO_TCP, + *src4, *dst4); + } } - tap_hdr_update(taph, l3len + sizeof(struct ethhdr)); -} + if (ip6h) { + l3len += sizeof(*ip6h); -/** - * tcp_fill_headers6() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers - * @conn: Connection pointer - * @taph: tap backend specific header - * @ip6h: Pointer to IPv6 header - * @th: Pointer to TCP header - * @payload: TCP payload - * @check: Checksum, if already known - * @seq: Sequence number for this segment - * @no_tcp_csum: Do not set TCP checksum - */ -void tcp_fill_headers6(const struct tcp_tap_conn *conn, - struct tap_hdr *taph, struct ipv6hdr *ip6h, - struct tcphdr *th, struct iov_tail *payload, - uint32_t seq, bool no_tcp_csum) -{ - size_t l4len = iov_tail_size(payload) + sizeof(*th); - const struct flowside *tapside = TAPFLOW(conn); + ip6h->payload_len = htons(l4len); + ip6h->saddr = tapside->oaddr.a6; + ip6h->daddr = tapside->eaddr.a6; - ip6h->payload_len = htons(l4len); - ip6h->saddr = tapside->oaddr.a6; - ip6h->daddr = tapside->eaddr.a6; + ip6h->hop_limit = 255; + ip6h->version = 6; + ip6h->nexthdr = IPPROTO_TCP; - ip6h->hop_limit = 255; - ip6h->version = 6; - ip6h->nexthdr = IPPROTO_TCP; + ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf; + ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff; + ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff; - ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf; - ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff; - ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff; + if (!no_tcp_csum) { + psum = proto_ipv6_header_psum(l4len, IPPROTO_TCP, + &ip6h->saddr, + &ip6h->daddr); + } + } tcp_fill_header(th, conn, seq); - if (no_tcp_csum) { + if (no_tcp_csum) th->check = 0; - } else { - uint32_t psum = proto_ipv6_header_psum(l4len, IPPROTO_TCP, - &ip6h->saddr, - &ip6h->daddr); - + else tcp_update_csum(psum, th, payload); - } - tap_hdr_update(taph, l4len + sizeof(*ip6h) + sizeof(struct ethhdr)); + tap_hdr_update(taph, l3len + sizeof(struct ethhdr)); } /** diff --git a/tcp_buf.c b/tcp_buf.c index 830c23d..a975a55 100644 --- a/tcp_buf.c +++ b/tcp_buf.c @@ -161,18 +161,19 @@ static void tcp_l2_buf_fill_headers(const struct tcp_tap_conn *conn, { struct iov_tail tail = IOV_TAIL(&iov[TCP_IOV_PAYLOAD], 1, 0); struct tcphdr *th = IOV_REMOVE_HEADER(&tail, struct tcphdr); + struct tap_hdr *taph = iov[TCP_IOV_TAP].iov_base; const struct flowside *tapside = TAPFLOW(conn); const struct in_addr *a4 = inany_v4(&tapside->oaddr); + struct ipv6hdr *ip6h = NULL; + struct iphdr *ip4h = NULL; - if (a4) { - tcp_fill_headers4(conn, iov[TCP_IOV_TAP].iov_base, - iov[TCP_IOV_IP].iov_base, th, - &tail, check, seq, no_tcp_csum); - } else { - tcp_fill_headers6(conn, iov[TCP_IOV_TAP].iov_base, - iov[TCP_IOV_IP].iov_base, th, - &tail, seq, no_tcp_csum); - } + if (a4) + ip4h = iov[TCP_IOV_IP].iov_base; + else + ip6h = iov[TCP_IOV_IP].iov_base; + + tcp_fill_headers(conn, taph, ip4h, ip6h, th, &tail, + check, seq, no_tcp_csum); } /** diff --git a/tcp_internal.h b/tcp_internal.h index cff06e0..94e5780 100644 --- a/tcp_internal.h +++ b/tcp_internal.h @@ -164,14 +164,11 @@ struct tcp_info_linux; void tcp_update_csum(uint32_t psum, struct tcphdr *th, struct iov_tail *payload); -void tcp_fill_headers4(const struct tcp_tap_conn *conn, - struct tap_hdr *taph, struct iphdr *iph, - struct tcphdr *th, struct iov_tail *payload, - const uint16_t *check, uint32_t seq, bool no_tcp_csum); -void tcp_fill_headers6(const struct tcp_tap_conn *conn, - struct tap_hdr *taph, struct ipv6hdr *ip6h, - struct tcphdr *th, struct iov_tail *payload, - uint32_t seq, bool no_tcp_csum); +void tcp_fill_headers(const struct tcp_tap_conn *conn, + struct tap_hdr *taph, + struct iphdr *ip4h, struct ipv6hdr *ip6h, + struct tcphdr *th, struct iov_tail *payload, + const uint16_t *ip4_check, uint32_t seq, bool no_tcp_csum); int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn, bool force_seq, struct tcp_info_linux *tinfo); diff --git a/tcp_vu.c b/tcp_vu.c index 081c752..a611a5b 100644 --- a/tcp_vu.c +++ b/tcp_vu.c @@ -110,9 +110,9 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) size_t optlen, hdrlen; struct vu_virtq_element flags_elem[2]; struct ipv6hdr *ip6h = NULL; + struct iphdr *ip4h = NULL; struct iovec flags_iov[2]; struct tcp_syn_opts *opts; - struct iphdr *iph = NULL; struct iov_tail payload; struct tcphdr *th; struct ethhdr *eh; @@ -143,8 +143,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) if (CONN_V4(conn)) { eh->h_proto = htons(ETH_P_IP); - iph = vu_ip(flags_elem[0].in_sg[0].iov_base); - *iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_TCP); + ip4h = vu_ip(flags_elem[0].in_sg[0].iov_base); + *ip4h = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_TCP); th = vu_payloadv4(flags_elem[0].in_sg[0].iov_base); } else { @@ -170,12 +170,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) flags_elem[0].in_sg[0].iov_len = hdrlen + optlen; payload = IOV_TAIL(flags_elem[0].in_sg, 1, hdrlen); - if (CONN_V4(conn)) { - tcp_fill_headers4(conn, NULL, iph, th, &payload, - NULL, seq, true); - } else { - tcp_fill_headers6(conn, NULL, ip6h, th, &payload, seq, true); - } + tcp_fill_headers(conn, NULL, ip4h, ip6h, th, &payload, + NULL, seq, true); if (*c->pcap) { tcp_vu_update_check(tapside, &flags_elem[0].in_sg[0], 1); @@ -338,7 +334,7 @@ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn, struct iov_tail payload = IOV_TAIL(iov, iov_cnt, hdrlen); char *base = iov[0].iov_base; struct ipv6hdr *ip6h = NULL; - struct iphdr *iph = NULL; + struct iphdr *ip4h = NULL; struct tcphdr *th; struct ethhdr *eh; @@ -357,8 +353,8 @@ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn, if (!v6) { eh->h_proto = htons(ETH_P_IP); - iph = vu_ip(base); - *iph = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_TCP); + ip4h = vu_ip(base); + *ip4h = (struct iphdr)L2_BUF_IP4_INIT(IPPROTO_TCP); th = vu_payloadv4(base); } else { eh->h_proto = htons(ETH_P_IPV6); @@ -373,14 +369,10 @@ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn, th->doff = sizeof(*th) / 4; th->ack = 1; - if (!v6) { - tcp_fill_headers4(conn, NULL, iph, th, &payload, - *check, conn->seq_to_tap, true); - *check = &iph->check; - } else { - tcp_fill_headers6(conn, NULL, ip6h, th, &payload, - conn->seq_to_tap, true); - } + tcp_fill_headers(conn, NULL, ip4h, ip6h, th, &payload, + *check, conn->seq_to_tap, true); + if (ip4h) + *check = &ip4h->check; } /** -- 2.47.0
Because the vhost-user <-> virtio-net path ignores checksums, we usually don't calculate them when sending packets to the guest. So, we always pass no_tcp_csum=true to tcp_fill_headers(). We do want accurate checksums when capturing packets though, so the captures don't show bogus values. Currently we handle this by updating the checksum field immediately before writing the packet to the capture file, using tcp_vu_update_check(). This is unnecessary, though: in each case tcp_fill_headers() is called not very long before, so we can alter its no_tcp_csum parameter pased on whether we're generating captures or not. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp_vu.c | 59 +++++++++++--------------------------------------------- 1 file changed, 11 insertions(+), 48 deletions(-) diff --git a/tcp_vu.c b/tcp_vu.c index a611a5b..4c60897 100644 --- a/tcp_vu.c +++ b/tcp_vu.c @@ -60,40 +60,6 @@ static size_t tcp_vu_hdrlen(bool v6) return hdrlen; } -/** - * tcp_vu_update_check() - Calculate TCP checksum - * @tapside: Address information for one side of the flow - * @iov: Pointer to the array of IO vectors - * @iov_cnt: Length of the array - */ -static void tcp_vu_update_check(const struct flowside *tapside, - struct iovec *iov, int iov_cnt) -{ - char *base = iov[0].iov_base; - struct iov_tail payload; - struct tcphdr *th; - uint32_t psum; - - if (inany_v4(&tapside->oaddr)) { - const struct in_addr *src4 = inany_v4(&tapside->oaddr); - const struct in_addr *dst4 = inany_v4(&tapside->eaddr); - const struct iphdr *iph = vu_ip(base); - size_t l4len = ntohs(iph->tot_len) - sizeof(*iph); - - th = vu_payloadv4(base); - psum = proto_ipv4_header_psum(l4len, IPPROTO_TCP, *src4, *dst4); - } else { - const struct ipv6hdr *ip6h = vu_ip(base); - size_t l4len = ntohs(ip6h->payload_len); - - th = vu_payloadv6(base); - psum = proto_ipv6_header_psum(l4len, IPPROTO_TCP, - &ip6h->saddr, &ip6h->daddr); - } - payload = IOV_TAIL(iov, iov_cnt, (char *)(th + 1) - base); - tcp_update_csum(psum, th, &payload); -} - /** * tcp_vu_send_flag() - Send segment with flags to vhost-user (no payload) * @c: Execution context @@ -106,7 +72,6 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) { struct vu_dev *vdev = c->vdev; struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; - const struct flowside *tapside = TAPFLOW(conn); size_t optlen, hdrlen; struct vu_virtq_element flags_elem[2]; struct ipv6hdr *ip6h = NULL; @@ -171,10 +136,9 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) payload = IOV_TAIL(flags_elem[0].in_sg, 1, hdrlen); tcp_fill_headers(conn, NULL, ip4h, ip6h, th, &payload, - NULL, seq, true); + NULL, seq, !*c->pcap); if (*c->pcap) { - tcp_vu_update_check(tapside, &flags_elem[0].in_sg[0], 1); pcap_iov(&flags_elem[0].in_sg[0], 1, sizeof(struct virtio_net_hdr_mrg_rxbuf)); } @@ -318,15 +282,16 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, /** * tcp_vu_prepare() - Prepare the frame header - * @c: Execution context - * @conn: Connection pointer - * @iov: Pointer to the array of IO vectors - * @iov_cnt: Number of entries in @iov - * @check: Checksum, if already known + * @c: Execution context + * @conn: Connection pointer + * @iov: Pointer to the array of IO vectors + * @iov_cnt: Number of entries in @iov + * @check: Checksum, if already known + * @no_tcp_csum: Do not set TCP checksum */ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn, struct iovec *iov, size_t iov_cnt, - const uint16_t **check) + const uint16_t **check, bool no_tcp_csum) { const struct flowside *toside = TAPFLOW(conn); bool v6 = !(inany_v4(&toside->eaddr) && inany_v4(&toside->oaddr)); @@ -370,7 +335,7 @@ static void tcp_vu_prepare(const struct ctx *c, struct tcp_tap_conn *conn, th->ack = 1; tcp_fill_headers(conn, NULL, ip4h, ip6h, th, &payload, - *check, conn->seq_to_tap, true); + *check, conn->seq_to_tap, no_tcp_csum); if (ip4h) *check = &ip4h->check; } @@ -388,8 +353,7 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) uint32_t wnd_scaled = conn->wnd_from_tap << conn->ws_from_tap; struct vu_dev *vdev = c->vdev; struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; - const struct flowside *tapside = TAPFLOW(conn); - size_t fillsize, hdrlen; + size_t hdrlen, fillsize; int v6 = CONN_V6(conn); uint32_t already_sent; const uint16_t *check; @@ -482,10 +446,9 @@ int tcp_vu_data_from_sock(const struct ctx *c, struct tcp_tap_conn *conn) if (i + 1 == head_cnt) check = NULL; - tcp_vu_prepare(c, conn, iov, buf_cnt, &check); + tcp_vu_prepare(c, conn, iov, buf_cnt, &check, !*c->pcap); if (*c->pcap) { - tcp_vu_update_check(tapside, iov, buf_cnt); pcap_iov(iov, buf_cnt, sizeof(struct virtio_net_hdr_mrg_rxbuf)); } -- 2.47.0
On Wed, 27 Nov 2024 14:54:03 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Here's a current version of my IOV tail and some cleanups to TCP buffer handling based on it. Now rebased on top of v14 of the vhost-user patches. This was aimed at sharing more code between the "buffer" and vhost-user paths, but that turned out to be trickier than I anticipated, so it hasn't really been accomplished. Nonetheless I think these are reasonable cleanups on their own merits, and may yet make sharing some more code between the paths easier in future. David Gibson (7): iov: iov tail helpers iov, checksum: Replace csum_iov() with csum_iov_tail() tcp: Pass TCP header and payload separately to tcp_update_check_tcp[46]() tcp: Pass TCP header and payload separately to tcp_fill_headers[46]() tcp: Merge tcp_update_check_tcp[46]() tcp: Merge tcp_fill_headers[46]() with each other tcp_vu: Remove unnecessary tcp_vu_update_check() functionApplied. -- Stefano