v3: - add a patch that has been extracted from: "tcp: extract buffer management from tcp_send_flag()" -> "tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()" - see detailed v3 history log in each patch - I didn't address the alignment problem when we provide a pointer to a sub-structure in the internal buffer structure. (for the last patches of the series). v2 comparing to vhost-user full part: - part 1 includes only preliminary patches (checksum, iovec, cleanup) - see detailed v2 history log in each patch. Full series v1 available at: [PATCH 00/24] Add vhost-user support to passt. https://url.corp.redhat.com/passt-vhost-user-v1 Thanks, Laurent Laurent Vivier (9): iov: add some functions to manage iovec pcap: add pcap_iov() checksum: align buffers checksum: add csum_iov() util: move IP stuff from util.[ch] to ip.[ch] checksum: use csum_ip4_header() in udp.c and tcp.c checksum: introduce functions to compute the header part checksum for TCP/UDP tap: make tap_update_mac() generic tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers() Makefile | 12 +-- checksum.c | 171 ++++++++++++++++++++++++++++++----------- checksum.h | 9 ++- conf.c | 1 + dhcp.c | 1 + flow.c | 1 + icmp.c | 1 + iov.c | 171 +++++++++++++++++++++++++++++++++++++++++ iov.h | 29 +++++++ ip.c | 72 +++++++++++++++++ ip.h | 86 +++++++++++++++++++++ ndp.c | 1 + pcap.c | 62 +++++++++++++-- pcap.h | 1 + port_fwd.c | 1 + qrap.c | 1 + tap.c | 14 ++-- tap.h | 2 +- tcp.c | 213 +++++++++++++++++++++++++++++---------------------- tcp_splice.c | 1 + udp.c | 35 +++------ util.c | 55 ------------- util.h | 76 ------------------ 23 files changed, 703 insertions(+), 313 deletions(-) create mode 100644 iov.c create mode 100644 iov.h create mode 100644 ip.c create mode 100644 ip.h -- 2.42.0
Introduce functions to copy to/from a buffer from/to an iovec array, to compute data length in in bytes of an iovec and to copy memory from an iovec to another. iov_from_buf(), iov_to_buf(), iov_size(), iov_copy(). Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - update copyright statement (but keep it as proposed by Stefano) - open code of iov_fill_from_buf()/iov_fill_to_buf() into iov_frombuf()/iov_to_buf() - coding style cleanup - use size_t for the length of the vectors v2: - reorder added files in alphanetical order in Makefile - update comments, cosmetic cleanup - rename iov_from_buf_full/iov_to_buf_full to iov_fill_from_buf/iov_fill_to_buf - split loops that manage offset and bytes copy. - move iov_from_buf()/iov_to_buf() to iov.c Makefile | 8 +-- iov.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ iov.h | 29 ++++++++++ 3 files changed, 204 insertions(+), 4 deletions(-) create mode 100644 iov.c create mode 100644 iov.h diff --git a/Makefile b/Makefile index af4fa87e7e13..156398b3844e 100644 --- a/Makefile +++ b/Makefile @@ -45,16 +45,16 @@ FLAGS += -DVERSION=\"$(VERSION)\" FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS) PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \ - igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c \ - passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c tcp_splice.c udp.c \ - util.c + igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \ + packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \ + tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c SRCS = $(PASST_SRCS) $(QRAP_SRCS) MANPAGES = passt.1 pasta.1 qrap.1 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \ - flow_table.h icmp.h inany.h isolation.h lineread.h log.h ndp.h \ + flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \ netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \ tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h HEADERS = $(PASST_HEADERS) seccomp.h diff --git a/iov.c b/iov.c new file mode 100644 index 000000000000..1135f87e2f45 --- /dev/null +++ b/iov.c @@ -0,0 +1,171 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * iov.h - helpers for using (partial) iovecs. + * + * Copyright Red Hat + * Author: Laurent Vivier <lvivier(a)redhat.com> + * + * This file also contains code originally from QEMU include/qemu/iov.h + * and licensed under the following terms: + * + * Copyright (C) 2010 Red Hat, Inc. + * + * Author(s): + * Amit Shah <amit.shah(a)redhat.com> + * Michael Tokarev <mjt(a)tls.msk.ru> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + */ +#include <sys/socket.h> + +#include "util.h" +#include "iov.h" + +/** + * iov_from_buf - Copy data from a buffer to an I/O vector (struct iovec) + * efficiently. + * + * @iov: Pointer to the array of struct iovec describing the + * scatter/gather I/O vector. + * @iov_cnt: Number of elements in the iov array. + * @offset: Byte offset in the iov array where copying should start. + * @buf: Pointer to the source buffer containing the data to copy. + * @bytes: Total number of bytes to copy from buf to iov. + * + * Returns: The number of bytes successfully copied. + */ +size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, const void *buf, size_t bytes) +{ + unsigned int i; + size_t copied; + + if (__builtin_constant_p(bytes) && iov_cnt && + offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) { + memcpy((char *)iov[0].iov_base + offset, buf, bytes); + return bytes; + } + + /* skipping offset bytes in the iovec */ + for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (copied = 0; copied < bytes && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, bytes - copied); + + memcpy((char *)iov[i].iov_base + offset, (char *)buf + copied, + len); + copied += len; + offset = 0; + } + + return copied; +} + +/** + * iov_to_buf - Copy data from a scatter/gather I/O vector (struct iovec) to + * a buffer efficiently. + * + * @iov: Pointer to the array of struct iovec describing the scatter/gather + * I/O vector. + * @iov_cnt: Number of elements in the iov array. + * @offset: Offset within the first element of iov from where copying should start. + * @buf: Pointer to the destination buffer where data will be copied. + * @bytes: Total number of bytes to copy from iov to buf. + * + * Returns: The number of bytes successfully copied. + */ +size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, void *buf, size_t bytes) +{ + unsigned int i; + size_t copied; + + if (__builtin_constant_p(bytes) && iov_cnt && + offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) { + memcpy(buf, (char *)iov[0].iov_base + offset, bytes); + return bytes; + } + + /* skipping offset bytes in the iovec */ + for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (copied = 0; copied < bytes && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, bytes - copied); + memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset, + len); + copied += len; + offset = 0; + } + + return copied; +} + +/** + * iov_size - Calculate the total size of a scatter/gather I/O vector + * (struct iovec). + * + * @iov: Pointer to the array of struct iovec describing the + * scatter/gather I/O vector. + * @iov_cnt: Number of elements in the iov array. + * + * Returns: The total size in bytes. + */ +size_t iov_size(const struct iovec *iov, size_t iov_cnt) +{ + unsigned int i; + size_t len; + + for (i = 0, len = 0; i < iov_cnt; i++) + len += iov[i].iov_len; + + return len; +} + +/** + * iov_copy - Copy data from one scatter/gather I/O vector (struct iovec) to + * another. + * + * @dst_iov: Pointer to the destination array of struct iovec describing + * the scatter/gather I/O vector to copy to. + * @dst_iov_cnt: Number of elements in the destination iov array. + * @iov: Pointer to the source array of struct iovec describing + * the scatter/gather I/O vector to copy from. + * @iov_cnt: Number of elements in the source iov array. + * @offset: Offset within the source iov from where copying should start. + * @bytes: Total number of bytes to copy from iov to dst_iov. + * + * Returns: The number of elements successfully copied to the destination + * iov array. + */ +unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt, + const struct iovec *iov, size_t iov_cnt, + size_t offset, size_t bytes) +{ + unsigned int i, j; + size_t len; + + /* skipping offset bytes in the iovec */ + for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (j = 0; i < iov_cnt && j < dst_iov_cnt && bytes; i++) { + len = MIN(bytes, iov[i].iov_len - offset); + + dst_iov[j].iov_base = (char *)iov[i].iov_base + offset; + dst_iov[j].iov_len = len; + j++; + bytes -= len; + offset = 0; + } + + return j; +} diff --git a/iov.h b/iov.h new file mode 100644 index 000000000000..ee35a75d0870 --- /dev/null +++ b/iov.h @@ -0,0 +1,29 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * iov.c - helpers for using (partial) iovecs. + * + * Copyrigh Red Hat + * Author: Laurent Vivier <lvivier(a)redhat.com> + * + * This file also contains code originally from QEMU include/qemu/iov.h: + * + * Author(s): + * Amit Shah <amit.shah(a)redhat.com> + * Michael Tokarev <mjt(a)tls.msk.ru> + */ + +#ifndef IOVEC_H +#define IOVEC_H + +#include <unistd.h> +#include <string.h> + +size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, const void *buf, size_t bytes); +size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, void *buf, size_t bytes); +size_t iov_size(const struct iovec *iov, size_t iov_cnt); +unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt, + const struct iovec *iov, size_t iov_cnt, + size_t offset, size_t bytes); +#endif /* IOVEC_H */ -- 2.42.0
On Sat, Feb 17, 2024 at 04:07:17PM +0100, Laurent Vivier wrote:Introduce functions to copy to/from a buffer from/to an iovec array, to compute data length in in bytes of an iovec and to copy memory from an iovec to another. iov_from_buf(), iov_to_buf(), iov_size(), iov_copy(). Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>--- Notes: v3: - update copyright statement (but keep it as proposed by Stefano) - open code of iov_fill_from_buf()/iov_fill_to_buf() into iov_frombuf()/iov_to_buf() - coding style cleanup - use size_t for the length of the vectors v2: - reorder added files in alphanetical order in Makefile - update comments, cosmetic cleanup - rename iov_from_buf_full/iov_to_buf_full to iov_fill_from_buf/iov_fill_to_buf - split loops that manage offset and bytes copy. - move iov_from_buf()/iov_to_buf() to iov.c Makefile | 8 +-- iov.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ iov.h | 29 ++++++++++ 3 files changed, 204 insertions(+), 4 deletions(-) create mode 100644 iov.c create mode 100644 iov.h diff --git a/Makefile b/Makefile index af4fa87e7e13..156398b3844e 100644 --- a/Makefile +++ b/Makefile @@ -45,16 +45,16 @@ FLAGS += -DVERSION=\"$(VERSION)\" FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS) PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \ - igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c \ - passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c tcp_splice.c udp.c \ - util.c + igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \ + packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \ + tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c SRCS = $(PASST_SRCS) $(QRAP_SRCS) MANPAGES = passt.1 pasta.1 qrap.1 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \ - flow_table.h icmp.h inany.h isolation.h lineread.h log.h ndp.h \ + flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \ netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \ tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h HEADERS = $(PASST_HEADERS) seccomp.h diff --git a/iov.c b/iov.c new file mode 100644 index 000000000000..1135f87e2f45 --- /dev/null +++ b/iov.c @@ -0,0 +1,171 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * iov.h - helpers for using (partial) iovecs. + * + * Copyright Red Hat + * Author: Laurent Vivier <lvivier(a)redhat.com> + * + * This file also contains code originally from QEMU include/qemu/iov.h + * and licensed under the following terms: + * + * Copyright (C) 2010 Red Hat, Inc. + * + * Author(s): + * Amit Shah <amit.shah(a)redhat.com> + * Michael Tokarev <mjt(a)tls.msk.ru> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + */ +#include <sys/socket.h> + +#include "util.h" +#include "iov.h" + +/** + * iov_from_buf - Copy data from a buffer to an I/O vector (struct iovec) + * efficiently. + * + * @iov: Pointer to the array of struct iovec describing the + * scatter/gather I/O vector. + * @iov_cnt: Number of elements in the iov array. + * @offset: Byte offset in the iov array where copying should start. + * @buf: Pointer to the source buffer containing the data to copy. + * @bytes: Total number of bytes to copy from buf to iov. + * + * Returns: The number of bytes successfully copied. + */ +size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, const void *buf, size_t bytes) +{ + unsigned int i; + size_t copied; + + if (__builtin_constant_p(bytes) && iov_cnt && + offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) { + memcpy((char *)iov[0].iov_base + offset, buf, bytes); + return bytes; + } + + /* skipping offset bytes in the iovec */ + for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (copied = 0; copied < bytes && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, bytes - copied); + + memcpy((char *)iov[i].iov_base + offset, (char *)buf + copied, + len); + copied += len; + offset = 0; + } + + return copied; +} + +/** + * iov_to_buf - Copy data from a scatter/gather I/O vector (struct iovec) to + * a buffer efficiently. + * + * @iov: Pointer to the array of struct iovec describing the scatter/gather + * I/O vector. + * @iov_cnt: Number of elements in the iov array. + * @offset: Offset within the first element of iov from where copying should start. + * @buf: Pointer to the destination buffer where data will be copied. + * @bytes: Total number of bytes to copy from iov to buf. + * + * Returns: The number of bytes successfully copied. + */ +size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, void *buf, size_t bytes) +{ + unsigned int i; + size_t copied; + + if (__builtin_constant_p(bytes) && iov_cnt && + offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) { + memcpy(buf, (char *)iov[0].iov_base + offset, bytes); + return bytes; + } + + /* skipping offset bytes in the iovec */ + for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (copied = 0; copied < bytes && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, bytes - copied); + memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset, + len); + copied += len; + offset = 0; + } + + return copied; +} + +/** + * iov_size - Calculate the total size of a scatter/gather I/O vector + * (struct iovec). + * + * @iov: Pointer to the array of struct iovec describing the + * scatter/gather I/O vector. + * @iov_cnt: Number of elements in the iov array. + * + * Returns: The total size in bytes. + */ +size_t iov_size(const struct iovec *iov, size_t iov_cnt) +{ + unsigned int i; + size_t len; + + for (i = 0, len = 0; i < iov_cnt; i++) + len += iov[i].iov_len; + + return len; +} + +/** + * iov_copy - Copy data from one scatter/gather I/O vector (struct iovec) to + * another. + * + * @dst_iov: Pointer to the destination array of struct iovec describing + * the scatter/gather I/O vector to copy to. + * @dst_iov_cnt: Number of elements in the destination iov array. + * @iov: Pointer to the source array of struct iovec describing + * the scatter/gather I/O vector to copy from. + * @iov_cnt: Number of elements in the source iov array. + * @offset: Offset within the source iov from where copying should start. + * @bytes: Total number of bytes to copy from iov to dst_iov. + * + * Returns: The number of elements successfully copied to the destination + * iov array. + */ +unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt, + const struct iovec *iov, size_t iov_cnt, + size_t offset, size_t bytes) +{ + unsigned int i, j; + size_t len; + + /* skipping offset bytes in the iovec */ + for (i = 0; offset >= iov[i].iov_len && i < iov_cnt; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (j = 0; i < iov_cnt && j < dst_iov_cnt && bytes; i++) { + len = MIN(bytes, iov[i].iov_len - offset); + + dst_iov[j].iov_base = (char *)iov[i].iov_base + offset; + dst_iov[j].iov_len = len; + j++; + bytes -= len; + offset = 0; + } + + return j; +} diff --git a/iov.h b/iov.h new file mode 100644 index 000000000000..ee35a75d0870 --- /dev/null +++ b/iov.h @@ -0,0 +1,29 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * iov.c - helpers for using (partial) iovecs. + * + * Copyrigh Red Hat + * Author: Laurent Vivier <lvivier(a)redhat.com> + * + * This file also contains code originally from QEMU include/qemu/iov.h: + * + * Author(s): + * Amit Shah <amit.shah(a)redhat.com> + * Michael Tokarev <mjt(a)tls.msk.ru> + */ + +#ifndef IOVEC_H +#define IOVEC_H + +#include <unistd.h> +#include <string.h> + +size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, const void *buf, size_t bytes); +size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, void *buf, size_t bytes); +size_t iov_size(const struct iovec *iov, size_t iov_cnt); +unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt, + const struct iovec *iov, size_t iov_cnt, + size_t offset, size_t bytes); +#endif /* IOVEC_H */-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Introduce a new function pcap_iov() to capture packet desribed by an IO vector. Move packet header writing to to pcap_header() and use it in pcap_frame() and pcap_iov(). Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - update rational - update comment - use strerror(errno) - use size_t for io vector length v2: - introduce pcap_header(), a common helper to write packet header - use writev() rather than write() in a loop - add functions comment pcap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++------- pcap.h | 1 + 2 files changed, 56 insertions(+), 7 deletions(-) diff --git a/pcap.c b/pcap.c index 501d52d4992b..4e213eea8113 100644 --- a/pcap.c +++ b/pcap.c @@ -20,6 +20,7 @@ #include <sys/time.h> #include <sys/types.h> #include <sys/stat.h> +#include <sys/uio.h> #include <fcntl.h> #include <time.h> #include <errno.h> @@ -31,6 +32,7 @@ #include "util.h" #include "passt.h" #include "log.h" +#include "iov.h" #define PCAP_VERSION_MINOR 4 @@ -65,6 +67,28 @@ struct pcap_pkthdr { uint32_t len; }; +/* + * pcap_header() - Write a pcap packet header to pcap file + * + * @len: Length of the packet data. + * @tv: Timestamp for the packet. + * + * Return: -1 in case of error, otherwise, 0 to indicate success. + */ +static int pcap_header(size_t len, const struct timeval *tv) +{ + struct pcap_pkthdr h; + + h.tv_sec = tv->tv_sec; + h.tv_usec = tv->tv_usec; + h.caplen = h.len = len; + + if (write(pcap_fd, &h, sizeof(h)) < 0) + return -1; + + return 0; +} + /** * pcap_frame() - Capture a single frame to pcap file with given timestamp * @pkt: Pointer to data buffer, including L2 headers @@ -75,13 +99,7 @@ struct pcap_pkthdr { */ static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv) { - struct pcap_pkthdr h; - - h.tv_sec = tv->tv_sec; - h.tv_usec = tv->tv_usec; - h.caplen = h.len = len; - - if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0) + if (pcap_header(len, tv) < 0 || write(pcap_fd, pkt, len) < 0) return -errno; return 0; @@ -130,6 +148,36 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset) } } +/* + * pcap_iov - Write packet data described by a scatter/gather I/O vector (iov) + * to a pcap file descriptor (pcap_fd). + * + * @iov: Pointer to the array of struct iovec describing the scatter/gather + * I/O vector containing packet data to write, including L2 header + * @n: Number of elements in the iov array. + */ +void pcap_iov(const struct iovec *iov, size_t n) +{ + struct timeval tv; + size_t len; + + if (pcap_fd == -1) + return; + + gettimeofday(&tv, NULL); + + len = iov_size(iov, n); + + if (pcap_header(len, &tv) < 0) { + debug("Cannot write pcap header"); + return; + } + + if (writev(pcap_fd, iov, n) < 0) + debug("Cannot log packet using writev(): %s\n", + strerror(errno)); +} + /** * pcap_init() - Initialise pcap file * @c: Execution context diff --git a/pcap.h b/pcap.h index da5a7e846b72..4950f617f4c8 100644 --- a/pcap.h +++ b/pcap.h @@ -8,6 +8,7 @@ void pcap(const char *pkt, size_t len); void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset); +void pcap_iov(const struct iovec *iov, size_t n); void pcap_init(struct ctx *c); #endif /* PCAP_H */ -- 2.42.0
On Sat, Feb 17, 2024 at 04:07:18PM +0100, Laurent Vivier wrote:Introduce a new function pcap_iov() to capture packet desribed by an IO vector. Move packet header writing to to pcap_header() and use it in pcap_frame() and pcap_iov(). Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>--- Notes: v3: - update rational - update comment - use strerror(errno) - use size_t for io vector length v2: - introduce pcap_header(), a common helper to write packet header - use writev() rather than write() in a loop - add functions comment pcap.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++------- pcap.h | 1 + 2 files changed, 56 insertions(+), 7 deletions(-) diff --git a/pcap.c b/pcap.c index 501d52d4992b..4e213eea8113 100644 --- a/pcap.c +++ b/pcap.c @@ -20,6 +20,7 @@ #include <sys/time.h> #include <sys/types.h> #include <sys/stat.h> +#include <sys/uio.h> #include <fcntl.h> #include <time.h> #include <errno.h> @@ -31,6 +32,7 @@ #include "util.h" #include "passt.h" #include "log.h" +#include "iov.h" #define PCAP_VERSION_MINOR 4 @@ -65,6 +67,28 @@ struct pcap_pkthdr { uint32_t len; }; +/* + * pcap_header() - Write a pcap packet header to pcap file + * + * @len: Length of the packet data. + * @tv: Timestamp for the packet. + * + * Return: -1 in case of error, otherwise, 0 to indicate success. + */ +static int pcap_header(size_t len, const struct timeval *tv) +{ + struct pcap_pkthdr h; + + h.tv_sec = tv->tv_sec; + h.tv_usec = tv->tv_usec; + h.caplen = h.len = len; + + if (write(pcap_fd, &h, sizeof(h)) < 0) + return -1; + + return 0; +} + /** * pcap_frame() - Capture a single frame to pcap file with given timestamp * @pkt: Pointer to data buffer, including L2 headers @@ -75,13 +99,7 @@ struct pcap_pkthdr { */ static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv) { - struct pcap_pkthdr h; - - h.tv_sec = tv->tv_sec; - h.tv_usec = tv->tv_usec; - h.caplen = h.len = len; - - if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0) + if (pcap_header(len, tv) < 0 || write(pcap_fd, pkt, len) < 0) return -errno; return 0; @@ -130,6 +148,36 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset) } } +/* + * pcap_iov - Write packet data described by a scatter/gather I/O vector (iov) + * to a pcap file descriptor (pcap_fd). + * + * @iov: Pointer to the array of struct iovec describing the scatter/gather + * I/O vector containing packet data to write, including L2 header + * @n: Number of elements in the iov array. + */ +void pcap_iov(const struct iovec *iov, size_t n) +{ + struct timeval tv; + size_t len; + + if (pcap_fd == -1) + return; + + gettimeofday(&tv, NULL); + + len = iov_size(iov, n); + + if (pcap_header(len, &tv) < 0) { + debug("Cannot write pcap header"); + return; + } + + if (writev(pcap_fd, iov, n) < 0) + debug("Cannot log packet using writev(): %s\n", + strerror(errno)); +} + /** * pcap_init() - Initialise pcap file * @c: Execution context diff --git a/pcap.h b/pcap.h index da5a7e846b72..4950f617f4c8 100644 --- a/pcap.h +++ b/pcap.h @@ -8,6 +8,7 @@ void pcap(const char *pkt, size_t len); void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset); +void pcap_iov(const struct iovec *iov, size_t n); void pcap_init(struct ctx *c); #endif /* PCAP_H */-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
If buffer is not aligned use sum_16b() only on the not aligned part, and then use csum_avx2() on the remaining part Remove unneeded now function csum_unaligned(). Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> --- Notes: v3: - Add David's R-b v2: - use ROUND_UP() and sizeof(__m256i) - fix function comment - remove csum_unaligned() and use csum() instead checksum.c | 47 ++++++++++++++++++++++++----------------------- 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/checksum.c b/checksum.c index f21c9b7a14d1..65486b4625ba 100644 --- a/checksum.c +++ b/checksum.c @@ -56,6 +56,8 @@ #include <linux/udp.h> #include <linux/icmpv6.h> +#include "util.h" + /* Checksums are optional for UDP over IPv4, so we usually just set * them to 0. Change this to 1 to calculate real UDP over IPv4 * checksums @@ -110,20 +112,7 @@ uint16_t csum_fold(uint32_t sum) return sum; } -/** - * csum_unaligned() - Compute TCP/IP-style checksum for not 32-byte aligned data - * @buf: Input data - * @len: Input length - * @init: Initial 32-bit checksum, 0 for no pre-computed checksum - * - * Return: 16-bit IPv4-style checksum - */ -/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ -__attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ -uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) -{ - return (uint16_t)~csum_fold(sum_16b(buf, len) + init); -} +uint16_t csum(const void *buf, size_t len, uint32_t init); /** * csum_ip4_header() - Calculate and set IPv4 header checksum @@ -132,7 +121,7 @@ uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init) void csum_ip4_header(struct iphdr *ip4h) { ip4h->check = 0; - ip4h->check = csum_unaligned(ip4h, (size_t)ip4h->ihl * 4, 0); + ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); } /** @@ -159,7 +148,7 @@ void csum_udp4(struct udphdr *udp4hr, + htons(IPPROTO_UDP); /* Add in partial checksum for the UDP header alone */ psum += sum_16b(udp4hr, sizeof(*udp4hr)); - udp4hr->check = csum_unaligned(payload, len, psum); + udp4hr->check = csum(payload, len, psum); } } @@ -178,7 +167,7 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) /* Partial checksum for ICMP header alone */ psum = sum_16b(icmp4hr, sizeof(*icmp4hr)); - icmp4hr->checksum = csum_unaligned(payload, len, psum); + icmp4hr->checksum = csum(payload, len, psum); } /** @@ -199,7 +188,7 @@ void csum_udp6(struct udphdr *udp6hr, udp6hr->check = 0; /* Add in partial checksum for the UDP header alone */ psum += sum_16b(udp6hr, sizeof(*udp6hr)); - udp6hr->check = csum_unaligned(payload, len, psum); + udp6hr->check = csum(payload, len, psum); } /** @@ -222,7 +211,7 @@ void csum_icmp6(struct icmp6hdr *icmp6hr, icmp6hr->icmp6_cksum = 0; /* Add in partial checksum for the ICMPv6 header alone */ psum += sum_16b(icmp6hr, sizeof(*icmp6hr)); - icmp6hr->icmp6_cksum = csum_unaligned(payload, len, psum); + icmp6hr->icmp6_cksum = csum(payload, len, psum); } #ifdef __AVX2__ @@ -397,17 +386,29 @@ less_than_128_bytes: /** * csum() - Compute TCP/IP-style checksum - * @buf: Input buffer, must be aligned to 32-byte boundary + * @buf: Input buffer * @len: Input length * @init: Initial 32-bit checksum, 0 for no pre-computed checksum * - * Return: 16-bit folded, complemented checksum sum + * Return: 16-bit folded, complemented checksum */ /* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ uint16_t csum(const void *buf, size_t len, uint32_t init) { - return (uint16_t)~csum_fold(csum_avx2(buf, len, init)); + intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i)); + unsigned int pad = align - (intptr_t)buf; + + if (len < pad) + pad = len; + + if (pad) + init += sum_16b(buf, pad); + + if (len > pad) + init = csum_avx2((void *)align, len - pad, init); + + return (uint16_t)~csum_fold(init); } #else /* __AVX2__ */ @@ -424,7 +425,7 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ uint16_t csum(const void *buf, size_t len, uint32_t init) { - return csum_unaligned(buf, len, init); + return (uint16_t)~csum_fold(sum_16b(buf, len) + init); } #endif /* !__AVX2__ */ -- 2.42.0
Introduce the function csum_unfolded() that computes the unfolded 32-bit checksum of a data buffer, and call it from csum() that returns the folded value. Introduce csum_iov() that computes the checksum using csum_folded() on all vectors of the iovec array and returns the folded result. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - update comments - use size_t for the IO vectors length - include checksum.h in checksum.c - export csum_unfolded() (for later) v2: - fix typo and superfluous space - update comments checksum.c | 56 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 2 ++ 2 files changed, 46 insertions(+), 12 deletions(-) diff --git a/checksum.c b/checksum.c index 65486b4625ba..74e3742bc6f6 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set * them to 0. Change this to 1 to calculate real UDP over IPv4 @@ -385,16 +386,16 @@ less_than_128_bytes: } /** - * csum() - Compute TCP/IP-style checksum - * @buf: Input buffer - * @len: Input length - * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * csum_unfolded - Calculate the unfolded checksum of a data buffer. * - * Return: 16-bit folded, complemented checksum + * @buf: Input buffer + * @len: Input length + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 32-bit unfolded */ -/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ -uint16_t csum(const void *buf, size_t len, uint32_t init) +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init) { intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i)); unsigned int pad = align - (intptr_t)buf; @@ -408,16 +409,30 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) if (len > pad) init = csum_avx2((void *)align, len - pad, init); - return (uint16_t)~csum_fold(init); + return init; } - #else /* __AVX2__ */ +/** + * csum_unfolded - Calculate the unfolded checksum of a data buffer. + * + * @buf: Input buffer + * @len: Input length + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 32-bit unfolded checksum + */ +__attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init) +{ + return sum_16b(buf, len) + init; +} +#endif /* !__AVX2__ */ /** * csum() - Compute TCP/IP-style checksum * @buf: Input buffer * @len: Input length - * @sum: Initial 32-bit checksum, 0 for no pre-computed checksum + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum * * Return: 16-bit folded, complemented checksum */ @@ -425,7 +440,24 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ uint16_t csum(const void *buf, size_t len, uint32_t init) { - return (uint16_t)~csum_fold(sum_16b(buf, len) + init); + return (uint16_t)~csum_fold(csum_unfolded(buf, len, init)); } -#endif /* !__AVX2__ */ +/** + * csum_iov() - Calculates the unfolded checksum over an array of IO vectors + * + * @iov Pointer to the array of IO vectors + * @n Length of the array + * @init Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 16-bit folded, complemented checksum + */ +uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init) +{ + unsigned int i; + + for (i = 0; i < n; i++) + init = csum_unfolded(iov[i].iov_base, iov[i].iov_len, init); + + return (uint16_t)~csum_fold(init); +} diff --git a/checksum.h b/checksum.h index 21c0310d3804..dfa705a04a24 100644 --- a/checksum.h +++ b/checksum.h @@ -24,6 +24,8 @@ void csum_udp6(struct udphdr *udp6hr, void csum_icmp6(struct icmp6hdr *icmp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init); uint16_t csum(const void *buf, size_t len, uint32_t init); +uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init); #endif /* CHECKSUM_H */ -- 2.42.0
On Sat, Feb 17, 2024 at 04:07:20PM +0100, Laurent Vivier wrote:Introduce the function csum_unfolded() that computes the unfolded 32-bit checksum of a data buffer, and call it from csum() that returns the folded value. Introduce csum_iov() that computes the checksum using csum_folded() on all vectors of the iovec array and returns the folded result. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>--- Notes: v3: - update comments - use size_t for the IO vectors length - include checksum.h in checksum.c - export csum_unfolded() (for later) v2: - fix typo and superfluous space - update comments checksum.c | 56 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 2 ++ 2 files changed, 46 insertions(+), 12 deletions(-) diff --git a/checksum.c b/checksum.c index 65486b4625ba..74e3742bc6f6 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set * them to 0. Change this to 1 to calculate real UDP over IPv4 @@ -385,16 +386,16 @@ less_than_128_bytes: } /** - * csum() - Compute TCP/IP-style checksum - * @buf: Input buffer - * @len: Input length - * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * csum_unfolded - Calculate the unfolded checksum of a data buffer. * - * Return: 16-bit folded, complemented checksum + * @buf: Input buffer + * @len: Input length + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 32-bit unfolded */ -/* NOLINTNEXTLINE(clang-diagnostic-unknown-attributes) */ __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ -uint16_t csum(const void *buf, size_t len, uint32_t init) +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init) { intptr_t align = ROUND_UP((intptr_t)buf, sizeof(__m256i)); unsigned int pad = align - (intptr_t)buf; @@ -408,16 +409,30 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) if (len > pad) init = csum_avx2((void *)align, len - pad, init); - return (uint16_t)~csum_fold(init); + return init; } - #else /* __AVX2__ */ +/** + * csum_unfolded - Calculate the unfolded checksum of a data buffer. + * + * @buf: Input buffer + * @len: Input length + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 32-bit unfolded checksum + */ +__attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init) +{ + return sum_16b(buf, len) + init; +} +#endif /* !__AVX2__ */ /** * csum() - Compute TCP/IP-style checksum * @buf: Input buffer * @len: Input length - * @sum: Initial 32-bit checksum, 0 for no pre-computed checksum + * @init: Initial 32-bit checksum, 0 for no pre-computed checksum * * Return: 16-bit folded, complemented checksum */ @@ -425,7 +440,24 @@ uint16_t csum(const void *buf, size_t len, uint32_t init) __attribute__((optimize("-fno-strict-aliasing"))) /* See csum_16b() */ uint16_t csum(const void *buf, size_t len, uint32_t init) { - return (uint16_t)~csum_fold(sum_16b(buf, len) + init); + return (uint16_t)~csum_fold(csum_unfolded(buf, len, init)); } -#endif /* !__AVX2__ */ +/** + * csum_iov() - Calculates the unfolded checksum over an array of IO vectors + * + * @iov Pointer to the array of IO vectors + * @n Length of the array + * @init Initial 32-bit checksum, 0 for no pre-computed checksum + * + * Return: 16-bit folded, complemented checksum + */ +uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init) +{ + unsigned int i; + + for (i = 0; i < n; i++) + init = csum_unfolded(iov[i].iov_base, iov[i].iov_len, init); + + return (uint16_t)~csum_fold(init); +} diff --git a/checksum.h b/checksum.h index 21c0310d3804..dfa705a04a24 100644 --- a/checksum.h +++ b/checksum.h @@ -24,6 +24,8 @@ void csum_udp6(struct udphdr *udp6hr, void csum_icmp6(struct icmp6hdr *icmp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); +uint32_t csum_unfolded(const void *buf, size_t len, uint32_t init); uint16_t csum(const void *buf, size_t len, uint32_t init); +uint16_t csum_iov(struct iovec *iov, size_t n, uint32_t init); #endif /* CHECKSUM_H */-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> --- Notes: v3: - rewrap rational - add David's R-b v2: - update rational and comments Makefile | 8 ++--- conf.c | 1 + dhcp.c | 1 + flow.c | 1 + icmp.c | 1 + ip.c | 72 +++++++++++++++++++++++++++++++++++++++++++ ip.h | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++ ndp.c | 1 + port_fwd.c | 1 + qrap.c | 1 + tap.c | 1 + tcp.c | 1 + tcp_splice.c | 1 + udp.c | 1 + util.c | 55 --------------------------------- util.h | 76 ---------------------------------------------- 16 files changed, 173 insertions(+), 135 deletions(-) create mode 100644 ip.c create mode 100644 ip.h diff --git a/Makefile b/Makefile index 156398b3844e..e1ebb454bc6b 100644 --- a/Makefile +++ b/Makefile @@ -45,7 +45,7 @@ FLAGS += -DVERSION=\"$(VERSION)\" FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS) PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \ - igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \ + igmp.c iov.c ip.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \ packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \ tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c @@ -54,9 +54,9 @@ SRCS = $(PASST_SRCS) $(QRAP_SRCS) MANPAGES = passt.1 pasta.1 qrap.1 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \ - flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \ - netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \ - tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h + flow_table.h icmp.h inany.h iov.h ip.h isolation.h lineread.h log.h \ + ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h \ + siphash.h tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h HEADERS = $(PASST_HEADERS) seccomp.h C := \#include <linux/tcp.h>\nstruct tcp_info x = { .tcpi_snd_wnd = 0 }; diff --git a/conf.c b/conf.c index 5e15b665be9c..93bfda331349 100644 --- a/conf.c +++ b/conf.c @@ -35,6 +35,7 @@ #include <netinet/if_ether.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "netlink.h" #include "udp.h" diff --git a/dhcp.c b/dhcp.c index 110772867632..ff4834a3dce9 100644 --- a/dhcp.c +++ b/dhcp.c @@ -25,6 +25,7 @@ #include <limits.h> #include "util.h" +#include "ip.h" #include "checksum.h" #include "packet.h" #include "passt.h" diff --git a/flow.c b/flow.c index 5e94a7a949e5..73d52bda8774 100644 --- a/flow.c +++ b/flow.c @@ -11,6 +11,7 @@ #include <string.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "siphash.h" #include "inany.h" diff --git a/icmp.c b/icmp.c index 9434fc5a7490..3b85a8578316 100644 --- a/icmp.c +++ b/icmp.c @@ -33,6 +33,7 @@ #include "packet.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "log.h" diff --git a/ip.c b/ip.c new file mode 100644 index 000000000000..2cc7f6548aff --- /dev/null +++ b/ip.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* PASST - Plug A Simple Socket Transport + * for qemu/UNIX domain socket mode + * + * PASTA - Pack A Subtle Tap Abstraction + * for network namespace/tap device mode + * + * ip.c - IP related functions + * + * Copyright (c) 2020-2021 Red Hat GmbH + * Author: Stefano Brivio <sbrivio(a)redhat.com> + */ + +#include <stddef.h> +#include "util.h" +#include "ip.h" + +#define IPV6_NH_OPT(nh) \ + ((nh) == 0 || (nh) == 43 || (nh) == 44 || (nh) == 50 || \ + (nh) == 51 || (nh) == 60 || (nh) == 135 || (nh) == 139 || \ + (nh) == 140 || (nh) == 253 || (nh) == 254) + +/** + * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol + * @p: Packet pool, packet number @idx has IPv6 header at @offset + * @idx: Index of packet in pool + * @offset: Pre-calculated IPv6 header offset + * @proto: Filled with L4 protocol number + * @dlen: Data length (payload excluding header extensions), set on return + * + * Return: pointer to L4 header, NULL if not found + */ +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, + size_t *dlen) +{ + const struct ipv6_opt_hdr *o; + const struct ipv6hdr *ip6h; + char *base; + int hdrlen; + uint8_t nh; + + base = packet_get(p, idx, 0, 0, NULL); + ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen); + if (!ip6h) + return NULL; + + offset += sizeof(*ip6h); + + nh = ip6h->nexthdr; + if (!IPV6_NH_OPT(nh)) + goto found; + + while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) { + nh = o->nexthdr; + hdrlen = (o->hdrlen + 1) * 8; + + if (IPV6_NH_OPT(nh)) + offset += hdrlen; + else + goto found; + } + + return NULL; + +found: + if (nh == 59) + return NULL; + + *proto = nh; + return base + offset; +} diff --git a/ip.h b/ip.h new file mode 100644 index 000000000000..b2e08bc049f3 --- /dev/null +++ b/ip.h @@ -0,0 +1,86 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * Copyright (c) 2021 Red Hat GmbH + * Author: Stefano Brivio <sbrivio(a)redhat.com> + */ + +#ifndef IP_H +#define IP_H + +#include <netinet/ip.h> +#include <netinet/ip6.h> + +#define IN4_IS_ADDR_UNSPECIFIED(a) \ + ((a)->s_addr == htonl_constant(INADDR_ANY)) +#define IN4_IS_ADDR_BROADCAST(a) \ + ((a)->s_addr == htonl_constant(INADDR_BROADCAST)) +#define IN4_IS_ADDR_LOOPBACK(a) \ + (ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET) +#define IN4_IS_ADDR_MULTICAST(a) \ + (IN_MULTICAST(ntohl((a)->s_addr))) +#define IN4_ARE_ADDR_EQUAL(a, b) \ + (((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr) +#define IN4ADDR_LOOPBACK_INIT \ + { .s_addr = htonl_constant(INADDR_LOOPBACK) } +#define IN4ADDR_ANY_INIT \ + { .s_addr = htonl_constant(INADDR_ANY) } + +#define L2_BUF_IP4_INIT(proto) \ + { \ + .version = 4, \ + .ihl = 5, \ + .tos = 0, \ + .tot_len = 0, \ + .id = 0, \ + .frag_off = 0, \ + .ttl = 0xff, \ + .protocol = (proto), \ + .saddr = 0, \ + .daddr = 0, \ + } +#define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ + (uint32_t)htons_constant(0xff00 | (proto))) + +#define L2_BUF_IP6_INIT(proto) \ + { \ + .priority = 0, \ + .version = 6, \ + .flow_lbl = { 0 }, \ + .payload_len = 0, \ + .nexthdr = (proto), \ + .hop_limit = 255, \ + .saddr = IN6ADDR_ANY_INIT, \ + .daddr = IN6ADDR_ANY_INIT, \ + } + +struct ipv6hdr { +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wpedantic" +#if __BYTE_ORDER == __BIG_ENDIAN + uint8_t version:4, + priority:4; +#else + uint8_t priority:4, + version:4; +#endif +#pragma GCC diagnostic pop + uint8_t flow_lbl[3]; + + uint16_t payload_len; + uint8_t nexthdr; + uint8_t hop_limit; + + struct in6_addr saddr; + struct in6_addr daddr; +}; + +struct ipv6_opt_hdr { + uint8_t nexthdr; + uint8_t hdrlen; + /* + * TLV encoded option data follows. + */ +} __attribute__((packed)); /* required for some archs */ + +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, + size_t *dlen); +#endif /* IP_H */ diff --git a/ndp.c b/ndp.c index 4c85ab8bcaee..c58f4b222b76 100644 --- a/ndp.c +++ b/ndp.c @@ -28,6 +28,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "log.h" diff --git a/port_fwd.c b/port_fwd.c index 6f6c836c57ad..e1ec31e2232c 100644 --- a/port_fwd.c +++ b/port_fwd.c @@ -21,6 +21,7 @@ #include <stdio.h> #include "util.h" +#include "ip.h" #include "port_fwd.h" #include "passt.h" #include "lineread.h" diff --git a/qrap.c b/qrap.c index 97f350a4bf0b..d59670621731 100644 --- a/qrap.c +++ b/qrap.c @@ -32,6 +32,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "arp.h" diff --git a/tap.c b/tap.c index 396dee7eef25..3ea03f720d6d 100644 --- a/tap.c +++ b/tap.c @@ -45,6 +45,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "arp.h" #include "dhcp.h" diff --git a/tcp.c b/tcp.c index 2ab443d5c3f2..45ef5146729a 100644 --- a/tcp.c +++ b/tcp.c @@ -289,6 +289,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "siphash.h" diff --git a/tcp_splice.c b/tcp_splice.c index 26d32065cd47..66575ca95a1e 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -49,6 +49,7 @@ #include <sys/socket.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "log.h" #include "tcp_splice.h" diff --git a/udp.c b/udp.c index 933f24b81616..56b58bd8b43a 100644 --- a/udp.c +++ b/udp.c @@ -112,6 +112,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "pcap.h" diff --git a/util.c b/util.c index 21b35ff94db1..f73ea1d98a09 100644 --- a/util.c +++ b/util.c @@ -30,61 +30,6 @@ #include "packet.h" #include "log.h" -#define IPV6_NH_OPT(nh) \ - ((nh) == 0 || (nh) == 43 || (nh) == 44 || (nh) == 50 || \ - (nh) == 51 || (nh) == 60 || (nh) == 135 || (nh) == 139 || \ - (nh) == 140 || (nh) == 253 || (nh) == 254) - -/** - * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol - * @p: Packet pool, packet number @idx has IPv6 header at @offset - * @idx: Index of packet in pool - * @offset: Pre-calculated IPv6 header offset - * @proto: Filled with L4 protocol number - * @dlen: Data length (payload excluding header extensions), set on return - * - * Return: pointer to L4 header, NULL if not found - */ -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, - size_t *dlen) -{ - const struct ipv6_opt_hdr *o; - const struct ipv6hdr *ip6h; - char *base; - int hdrlen; - uint8_t nh; - - base = packet_get(p, idx, 0, 0, NULL); - ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen); - if (!ip6h) - return NULL; - - offset += sizeof(*ip6h); - - nh = ip6h->nexthdr; - if (!IPV6_NH_OPT(nh)) - goto found; - - while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) { - nh = o->nexthdr; - hdrlen = (o->hdrlen + 1) * 8; - - if (IPV6_NH_OPT(nh)) - offset += hdrlen; - else - goto found; - } - - return NULL; - -found: - if (nh == 59) - return NULL; - - *proto = nh; - return base + offset; -} - /** * sock_l4() - Create and bind socket for given L4, add to epoll list * @c: Execution context diff --git a/util.h b/util.h index d2320f8cc99a..f7c3dfee9972 100644 --- a/util.h +++ b/util.h @@ -110,22 +110,6 @@ #define htonl_constant(x) (__bswap_constant_32(x)) #endif -#define IN4_IS_ADDR_UNSPECIFIED(a) \ - ((a)->s_addr == htonl_constant(INADDR_ANY)) -#define IN4_IS_ADDR_BROADCAST(a) \ - ((a)->s_addr == htonl_constant(INADDR_BROADCAST)) -#define IN4_IS_ADDR_LOOPBACK(a) \ - (ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET) -#define IN4_IS_ADDR_MULTICAST(a) \ - (IN_MULTICAST(ntohl((a)->s_addr))) -#define IN4_ARE_ADDR_EQUAL(a, b) \ - (((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr) -#define IN4ADDR_LOOPBACK_INIT \ - { .s_addr = htonl_constant(INADDR_LOOPBACK) } -#define IN4ADDR_ANY_INIT \ - { .s_addr = htonl_constant(INADDR_ANY) } - - #define NS_FN_STACK_SIZE (RLIMIT_STACK_VAL * 1024 / 8) int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, void *arg); @@ -138,34 +122,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, (void *)(arg)); \ } while (0) -#define L2_BUF_IP4_INIT(proto) \ - { \ - .version = 4, \ - .ihl = 5, \ - .tos = 0, \ - .tot_len = 0, \ - .id = 0, \ - .frag_off = 0, \ - .ttl = 0xff, \ - .protocol = (proto), \ - .saddr = 0, \ - .daddr = 0, \ - } -#define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ - (uint32_t)htons_constant(0xff00 | (proto))) - -#define L2_BUF_IP6_INIT(proto) \ - { \ - .priority = 0, \ - .version = 6, \ - .flow_lbl = { 0 }, \ - .payload_len = 0, \ - .nexthdr = (proto), \ - .hop_limit = 255, \ - .saddr = IN6ADDR_ANY_INIT, \ - .daddr = IN6ADDR_ANY_INIT, \ - } - #define RCVBUF_BIG (2UL * 1024 * 1024) #define SNDBUF_BIG (4UL * 1024 * 1024) #define SNDBUF_SMALL (128UL * 1024) @@ -173,45 +129,13 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, #include <net/if.h> #include <limits.h> #include <stdint.h> -#include <netinet/ip6.h> #include "packet.h" struct ctx; -struct ipv6hdr { -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wpedantic" -#if __BYTE_ORDER == __BIG_ENDIAN - uint8_t version:4, - priority:4; -#else - uint8_t priority:4, - version:4; -#endif -#pragma GCC diagnostic pop - uint8_t flow_lbl[3]; - - uint16_t payload_len; - uint8_t nexthdr; - uint8_t hop_limit; - - struct in6_addr saddr; - struct in6_addr daddr; -}; - -struct ipv6_opt_hdr { - uint8_t nexthdr; - uint8_t hdrlen; - /* - * TLV encoded option data follows. - */ -} __attribute__((packed)); /* required for some archs */ - /* cppcheck-suppress funcArgNamesDifferent */ __attribute__ ((weak)) int ffsl(long int i) { return __builtin_ffsl(i); } -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, - size_t *dlen); int sock_l4(const struct ctx *c, int af, uint8_t proto, const void *bind_addr, const char *ifname, uint16_t port, uint32_t data); -- 2.42.0
On Sat, Feb 17, 2024 at 04:07:21PM +0100, Laurent Vivier wrote:Introduce ip.[ch] file to encapsulate IP protocol handling functions and structures. Modify various files to include the new header ip.h when it's needed. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> --- Notes: v3: - rewrap rationalTangential English usage note. "rationale" is different from "rational", though they're related. "Rationale" is the noun - "an explanation of the reason for the thing", "rational" is an adjective - "sensible, logical, having a rationale". (Except of course that in maths "rational" is also a noun - "a number expressible as a ratio").- add David's R-b v2: - update rational and comments Makefile | 8 ++--- conf.c | 1 + dhcp.c | 1 + flow.c | 1 + icmp.c | 1 + ip.c | 72 +++++++++++++++++++++++++++++++++++++++++++ ip.h | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++ ndp.c | 1 + port_fwd.c | 1 + qrap.c | 1 + tap.c | 1 + tcp.c | 1 + tcp_splice.c | 1 + udp.c | 1 + util.c | 55 --------------------------------- util.h | 76 ---------------------------------------------- 16 files changed, 173 insertions(+), 135 deletions(-) create mode 100644 ip.c create mode 100644 ip.h diff --git a/Makefile b/Makefile index 156398b3844e..e1ebb454bc6b 100644 --- a/Makefile +++ b/Makefile @@ -45,7 +45,7 @@ FLAGS += -DVERSION=\"$(VERSION)\" FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS) PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \ - igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \ + igmp.c iov.c ip.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \ packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \ tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c @@ -54,9 +54,9 @@ SRCS = $(PASST_SRCS) $(QRAP_SRCS) MANPAGES = passt.1 pasta.1 qrap.1 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \ - flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \ - netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \ - tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h + flow_table.h icmp.h inany.h iov.h ip.h isolation.h lineread.h log.h \ + ndp.h netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h \ + siphash.h tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h HEADERS = $(PASST_HEADERS) seccomp.h C := \#include <linux/tcp.h>\nstruct tcp_info x = { .tcpi_snd_wnd = 0 }; diff --git a/conf.c b/conf.c index 5e15b665be9c..93bfda331349 100644 --- a/conf.c +++ b/conf.c @@ -35,6 +35,7 @@ #include <netinet/if_ether.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "netlink.h" #include "udp.h" diff --git a/dhcp.c b/dhcp.c index 110772867632..ff4834a3dce9 100644 --- a/dhcp.c +++ b/dhcp.c @@ -25,6 +25,7 @@ #include <limits.h> #include "util.h" +#include "ip.h" #include "checksum.h" #include "packet.h" #include "passt.h" diff --git a/flow.c b/flow.c index 5e94a7a949e5..73d52bda8774 100644 --- a/flow.c +++ b/flow.c @@ -11,6 +11,7 @@ #include <string.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "siphash.h" #include "inany.h" diff --git a/icmp.c b/icmp.c index 9434fc5a7490..3b85a8578316 100644 --- a/icmp.c +++ b/icmp.c @@ -33,6 +33,7 @@ #include "packet.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "log.h" diff --git a/ip.c b/ip.c new file mode 100644 index 000000000000..2cc7f6548aff --- /dev/null +++ b/ip.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0-or-later + +/* PASST - Plug A Simple Socket Transport + * for qemu/UNIX domain socket mode + * + * PASTA - Pack A Subtle Tap Abstraction + * for network namespace/tap device mode + * + * ip.c - IP related functions + * + * Copyright (c) 2020-2021 Red Hat GmbH + * Author: Stefano Brivio <sbrivio(a)redhat.com> + */ + +#include <stddef.h> +#include "util.h" +#include "ip.h" + +#define IPV6_NH_OPT(nh) \ + ((nh) == 0 || (nh) == 43 || (nh) == 44 || (nh) == 50 || \ + (nh) == 51 || (nh) == 60 || (nh) == 135 || (nh) == 139 || \ + (nh) == 140 || (nh) == 253 || (nh) == 254) + +/** + * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol + * @p: Packet pool, packet number @idx has IPv6 header at @offset + * @idx: Index of packet in pool + * @offset: Pre-calculated IPv6 header offset + * @proto: Filled with L4 protocol number + * @dlen: Data length (payload excluding header extensions), set on return + * + * Return: pointer to L4 header, NULL if not found + */ +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, + size_t *dlen) +{ + const struct ipv6_opt_hdr *o; + const struct ipv6hdr *ip6h; + char *base; + int hdrlen; + uint8_t nh; + + base = packet_get(p, idx, 0, 0, NULL); + ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen); + if (!ip6h) + return NULL; + + offset += sizeof(*ip6h); + + nh = ip6h->nexthdr; + if (!IPV6_NH_OPT(nh)) + goto found; + + while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) { + nh = o->nexthdr; + hdrlen = (o->hdrlen + 1) * 8; + + if (IPV6_NH_OPT(nh)) + offset += hdrlen; + else + goto found; + } + + return NULL; + +found: + if (nh == 59) + return NULL; + + *proto = nh; + return base + offset; +} diff --git a/ip.h b/ip.h new file mode 100644 index 000000000000..b2e08bc049f3 --- /dev/null +++ b/ip.h @@ -0,0 +1,86 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later + * Copyright (c) 2021 Red Hat GmbH + * Author: Stefano Brivio <sbrivio(a)redhat.com> + */ + +#ifndef IP_H +#define IP_H + +#include <netinet/ip.h> +#include <netinet/ip6.h> + +#define IN4_IS_ADDR_UNSPECIFIED(a) \ + ((a)->s_addr == htonl_constant(INADDR_ANY)) +#define IN4_IS_ADDR_BROADCAST(a) \ + ((a)->s_addr == htonl_constant(INADDR_BROADCAST)) +#define IN4_IS_ADDR_LOOPBACK(a) \ + (ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET) +#define IN4_IS_ADDR_MULTICAST(a) \ + (IN_MULTICAST(ntohl((a)->s_addr))) +#define IN4_ARE_ADDR_EQUAL(a, b) \ + (((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr) +#define IN4ADDR_LOOPBACK_INIT \ + { .s_addr = htonl_constant(INADDR_LOOPBACK) } +#define IN4ADDR_ANY_INIT \ + { .s_addr = htonl_constant(INADDR_ANY) } + +#define L2_BUF_IP4_INIT(proto) \ + { \ + .version = 4, \ + .ihl = 5, \ + .tos = 0, \ + .tot_len = 0, \ + .id = 0, \ + .frag_off = 0, \ + .ttl = 0xff, \ + .protocol = (proto), \ + .saddr = 0, \ + .daddr = 0, \ + } +#define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ + (uint32_t)htons_constant(0xff00 | (proto))) + +#define L2_BUF_IP6_INIT(proto) \ + { \ + .priority = 0, \ + .version = 6, \ + .flow_lbl = { 0 }, \ + .payload_len = 0, \ + .nexthdr = (proto), \ + .hop_limit = 255, \ + .saddr = IN6ADDR_ANY_INIT, \ + .daddr = IN6ADDR_ANY_INIT, \ + } + +struct ipv6hdr { +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wpedantic" +#if __BYTE_ORDER == __BIG_ENDIAN + uint8_t version:4, + priority:4; +#else + uint8_t priority:4, + version:4; +#endif +#pragma GCC diagnostic pop + uint8_t flow_lbl[3]; + + uint16_t payload_len; + uint8_t nexthdr; + uint8_t hop_limit; + + struct in6_addr saddr; + struct in6_addr daddr; +}; + +struct ipv6_opt_hdr { + uint8_t nexthdr; + uint8_t hdrlen; + /* + * TLV encoded option data follows. + */ +} __attribute__((packed)); /* required for some archs */ + +char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, + size_t *dlen); +#endif /* IP_H */ diff --git a/ndp.c b/ndp.c index 4c85ab8bcaee..c58f4b222b76 100644 --- a/ndp.c +++ b/ndp.c @@ -28,6 +28,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "log.h" diff --git a/port_fwd.c b/port_fwd.c index 6f6c836c57ad..e1ec31e2232c 100644 --- a/port_fwd.c +++ b/port_fwd.c @@ -21,6 +21,7 @@ #include <stdio.h> #include "util.h" +#include "ip.h" #include "port_fwd.h" #include "passt.h" #include "lineread.h" diff --git a/qrap.c b/qrap.c index 97f350a4bf0b..d59670621731 100644 --- a/qrap.c +++ b/qrap.c @@ -32,6 +32,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "arp.h" diff --git a/tap.c b/tap.c index 396dee7eef25..3ea03f720d6d 100644 --- a/tap.c +++ b/tap.c @@ -45,6 +45,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "arp.h" #include "dhcp.h" diff --git a/tcp.c b/tcp.c index 2ab443d5c3f2..45ef5146729a 100644 --- a/tcp.c +++ b/tcp.c @@ -289,6 +289,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "siphash.h" diff --git a/tcp_splice.c b/tcp_splice.c index 26d32065cd47..66575ca95a1e 100644 --- a/tcp_splice.c +++ b/tcp_splice.c @@ -49,6 +49,7 @@ #include <sys/socket.h> #include "util.h" +#include "ip.h" #include "passt.h" #include "log.h" #include "tcp_splice.h" diff --git a/udp.c b/udp.c index 933f24b81616..56b58bd8b43a 100644 --- a/udp.c +++ b/udp.c @@ -112,6 +112,7 @@ #include "checksum.h" #include "util.h" +#include "ip.h" #include "passt.h" #include "tap.h" #include "pcap.h" diff --git a/util.c b/util.c index 21b35ff94db1..f73ea1d98a09 100644 --- a/util.c +++ b/util.c @@ -30,61 +30,6 @@ #include "packet.h" #include "log.h" -#define IPV6_NH_OPT(nh) \ - ((nh) == 0 || (nh) == 43 || (nh) == 44 || (nh) == 50 || \ - (nh) == 51 || (nh) == 60 || (nh) == 135 || (nh) == 139 || \ - (nh) == 140 || (nh) == 253 || (nh) == 254) - -/** - * ipv6_l4hdr() - Find pointer to L4 header in IPv6 packet and extract protocol - * @p: Packet pool, packet number @idx has IPv6 header at @offset - * @idx: Index of packet in pool - * @offset: Pre-calculated IPv6 header offset - * @proto: Filled with L4 protocol number - * @dlen: Data length (payload excluding header extensions), set on return - * - * Return: pointer to L4 header, NULL if not found - */ -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, - size_t *dlen) -{ - const struct ipv6_opt_hdr *o; - const struct ipv6hdr *ip6h; - char *base; - int hdrlen; - uint8_t nh; - - base = packet_get(p, idx, 0, 0, NULL); - ip6h = packet_get(p, idx, offset, sizeof(*ip6h), dlen); - if (!ip6h) - return NULL; - - offset += sizeof(*ip6h); - - nh = ip6h->nexthdr; - if (!IPV6_NH_OPT(nh)) - goto found; - - while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) { - nh = o->nexthdr; - hdrlen = (o->hdrlen + 1) * 8; - - if (IPV6_NH_OPT(nh)) - offset += hdrlen; - else - goto found; - } - - return NULL; - -found: - if (nh == 59) - return NULL; - - *proto = nh; - return base + offset; -} - /** * sock_l4() - Create and bind socket for given L4, add to epoll list * @c: Execution context diff --git a/util.h b/util.h index d2320f8cc99a..f7c3dfee9972 100644 --- a/util.h +++ b/util.h @@ -110,22 +110,6 @@ #define htonl_constant(x) (__bswap_constant_32(x)) #endif -#define IN4_IS_ADDR_UNSPECIFIED(a) \ - ((a)->s_addr == htonl_constant(INADDR_ANY)) -#define IN4_IS_ADDR_BROADCAST(a) \ - ((a)->s_addr == htonl_constant(INADDR_BROADCAST)) -#define IN4_IS_ADDR_LOOPBACK(a) \ - (ntohl((a)->s_addr) >> IN_CLASSA_NSHIFT == IN_LOOPBACKNET) -#define IN4_IS_ADDR_MULTICAST(a) \ - (IN_MULTICAST(ntohl((a)->s_addr))) -#define IN4_ARE_ADDR_EQUAL(a, b) \ - (((struct in_addr *)(a))->s_addr == ((struct in_addr *)b)->s_addr) -#define IN4ADDR_LOOPBACK_INIT \ - { .s_addr = htonl_constant(INADDR_LOOPBACK) } -#define IN4ADDR_ANY_INIT \ - { .s_addr = htonl_constant(INADDR_ANY) } - - #define NS_FN_STACK_SIZE (RLIMIT_STACK_VAL * 1024 / 8) int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, void *arg); @@ -138,34 +122,6 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, (void *)(arg)); \ } while (0) -#define L2_BUF_IP4_INIT(proto) \ - { \ - .version = 4, \ - .ihl = 5, \ - .tos = 0, \ - .tot_len = 0, \ - .id = 0, \ - .frag_off = 0, \ - .ttl = 0xff, \ - .protocol = (proto), \ - .saddr = 0, \ - .daddr = 0, \ - } -#define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ - (uint32_t)htons_constant(0xff00 | (proto))) - -#define L2_BUF_IP6_INIT(proto) \ - { \ - .priority = 0, \ - .version = 6, \ - .flow_lbl = { 0 }, \ - .payload_len = 0, \ - .nexthdr = (proto), \ - .hop_limit = 255, \ - .saddr = IN6ADDR_ANY_INIT, \ - .daddr = IN6ADDR_ANY_INIT, \ - } - #define RCVBUF_BIG (2UL * 1024 * 1024) #define SNDBUF_BIG (4UL * 1024 * 1024) #define SNDBUF_SMALL (128UL * 1024) @@ -173,45 +129,13 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, #include <net/if.h> #include <limits.h> #include <stdint.h> -#include <netinet/ip6.h> #include "packet.h" struct ctx; -struct ipv6hdr { -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wpedantic" -#if __BYTE_ORDER == __BIG_ENDIAN - uint8_t version:4, - priority:4; -#else - uint8_t priority:4, - version:4; -#endif -#pragma GCC diagnostic pop - uint8_t flow_lbl[3]; - - uint16_t payload_len; - uint8_t nexthdr; - uint8_t hop_limit; - - struct in6_addr saddr; - struct in6_addr daddr; -}; - -struct ipv6_opt_hdr { - uint8_t nexthdr; - uint8_t hdrlen; - /* - * TLV encoded option data follows. - */ -} __attribute__((packed)); /* required for some archs */ - /* cppcheck-suppress funcArgNamesDifferent */ __attribute__ ((weak)) int ffsl(long int i) { return __builtin_ffsl(i); } -char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, - size_t *dlen); int sock_l4(const struct ctx *c, int af, uint8_t proto, const void *bind_addr, const char *ifname, uint16_t port, uint32_t data);-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr v2: - use csum_ip4_header() from checksum.c - use code from tcp.c and udp.c in csum_ip4_header() - use "const struct iphfr *", check is not updated by the function but by the caller. checksum.c | 17 +++++++++++++---- checksum.h | 3 ++- tap.c | 3 ++- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 22 insertions(+), 45 deletions(-) diff --git a/checksum.c b/checksum.c index 74e3742bc6f6..511b296a9a80 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum) uint16_t csum(const void *buf, size_t len, uint32_t init); /** - * csum_ip4_header() - Calculate and set IPv4 header checksum + * csum_ip4_header() - Calculate IPv4 header checksum * @ip4h: IPv4 header */ -void csum_ip4_header(struct iphdr *ip4h) +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) { - ip4h->check = 0; - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); + uint32_t sum = L2_BUF_IP4_PSUM(protocol); + + sum += tot_len; + sum += (saddr >> 16) & 0xffff; + sum += saddr & 0xffff; + sum += (daddr >> 16) & 0xffff; + sum += daddr & 0xffff; + + return ~csum_fold(sum); } /** diff --git a/checksum.h b/checksum.h index dfa705a04a24..92db73612b6e 100644 --- a/checksum.h +++ b/checksum.h @@ -13,7 +13,8 @@ struct icmp6hdr; uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); -void csum_ip4_header(struct iphdr *ip4h); +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); diff --git a/tap.c b/tap.c index 3ea03f720d6d..2d590e8525a0 100644 --- a/tap.c +++ b/tap.c @@ -160,7 +160,8 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst, ip4h->protocol = proto; ip4h->saddr = src.s_addr; ip4h->daddr = dst.s_addr; - csum_ip4_header(ip4h); + ip4h->check = csum_ip4_header(ip4h->tot_len, proto, + src.s_addr, dst.s_addr); return ip4h + 1; } diff --git a/tcp.c b/tcp.c index 45ef5146729a..8887656d3ee8 100644 --- a/tcp.c +++ b/tcp.c @@ -934,23 +934,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) trace("TCP: failed to set SO_SNDBUF to %i", v); } -/** - * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one - * @buf: L2 packet buffer with final IPv4 header - */ -static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf) -{ - uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP); - - sum += buf->iph.tot_len; - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - - buf->iph.check = (uint16_t)~csum_fold(sum); -} - /** * tcp_update_check_tcp4() - Update TCP checksum from stored one * @buf: L2 packet buffer with final IPv4 header @@ -1393,10 +1376,9 @@ do { \ b->iph.saddr = a4->s_addr; b->iph.daddr = c->ip4.addr_seen.s_addr; - if (check) - b->iph.check = *check; - else - tcp_update_check_ip4(b); + b->iph.check = check ? *check : + csum_ip4_header(b->iph.tot_len, IPPROTO_TCP, + b->iph.saddr, b->iph.daddr); SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); diff --git a/udp.c b/udp.c index 56b58bd8b43a..cd34f659210b 100644 --- a/udp.c +++ b/udp.c @@ -270,23 +270,6 @@ static void udp_invert_portmap(struct udp_port_fwd *fwd) } } -/** - * udp_update_check4() - Update checksum with variable parts from stored one - * @buf: L2 packet buffer with final IPv4 header - */ -static void udp_update_check4(struct udp4_l2_buf_t *buf) -{ - uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP); - - sum += buf->iph.tot_len; - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - - buf->iph.check = (uint16_t)~csum_fold(sum); -} - /** * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses * @eth_d: Ethernet destination address, NULL if unchanged @@ -614,7 +597,8 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport, b->iph.saddr = b->s_in.sin_addr.s_addr; } - udp_update_check4(b); + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + b->iph.saddr, b->iph.daddr); b->uh.source = b->s_in.sin_port; b->uh.dest = htons(dstport); b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh)); -- 2.42.0
On Sat, Feb 17, 2024 at 04:07:22PM +0100, Laurent Vivier wrote:We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com>Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>--- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr v2: - use csum_ip4_header() from checksum.c - use code from tcp.c and udp.c in csum_ip4_header() - use "const struct iphfr *", check is not updated by the function but by the caller. checksum.c | 17 +++++++++++++---- checksum.h | 3 ++- tap.c | 3 ++- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 22 insertions(+), 45 deletions(-) diff --git a/checksum.c b/checksum.c index 74e3742bc6f6..511b296a9a80 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum) uint16_t csum(const void *buf, size_t len, uint32_t init); /** - * csum_ip4_header() - Calculate and set IPv4 header checksum + * csum_ip4_header() - Calculate IPv4 header checksum * @ip4h: IPv4 header */ -void csum_ip4_header(struct iphdr *ip4h) +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) { - ip4h->check = 0; - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); + uint32_t sum = L2_BUF_IP4_PSUM(protocol); + + sum += tot_len; + sum += (saddr >> 16) & 0xffff; + sum += saddr & 0xffff; + sum += (daddr >> 16) & 0xffff; + sum += daddr & 0xffff; + + return ~csum_fold(sum); } /** diff --git a/checksum.h b/checksum.h index dfa705a04a24..92db73612b6e 100644 --- a/checksum.h +++ b/checksum.h @@ -13,7 +13,8 @@ struct icmp6hdr; uint32_t sum_16b(const void *buf, size_t len); uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); -void csum_ip4_header(struct iphdr *ip4h); +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); diff --git a/tap.c b/tap.c index 3ea03f720d6d..2d590e8525a0 100644 --- a/tap.c +++ b/tap.c @@ -160,7 +160,8 @@ static void *tap_push_ip4h(char *buf, struct in_addr src, struct in_addr dst, ip4h->protocol = proto; ip4h->saddr = src.s_addr; ip4h->daddr = dst.s_addr; - csum_ip4_header(ip4h); + ip4h->check = csum_ip4_header(ip4h->tot_len, proto, + src.s_addr, dst.s_addr); return ip4h + 1; } diff --git a/tcp.c b/tcp.c index 45ef5146729a..8887656d3ee8 100644 --- a/tcp.c +++ b/tcp.c @@ -934,23 +934,6 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) trace("TCP: failed to set SO_SNDBUF to %i", v); } -/** - * tcp_update_check_ip4() - Update IPv4 with variable parts from stored one - * @buf: L2 packet buffer with final IPv4 header - */ -static void tcp_update_check_ip4(struct tcp4_l2_buf_t *buf) -{ - uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_TCP); - - sum += buf->iph.tot_len; - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - - buf->iph.check = (uint16_t)~csum_fold(sum); -} - /** * tcp_update_check_tcp4() - Update TCP checksum from stored one * @buf: L2 packet buffer with final IPv4 header @@ -1393,10 +1376,9 @@ do { \ b->iph.saddr = a4->s_addr; b->iph.daddr = c->ip4.addr_seen.s_addr; - if (check) - b->iph.check = *check; - else - tcp_update_check_ip4(b); + b->iph.check = check ? *check : + csum_ip4_header(b->iph.tot_len, IPPROTO_TCP, + b->iph.saddr, b->iph.daddr); SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); diff --git a/udp.c b/udp.c index 56b58bd8b43a..cd34f659210b 100644 --- a/udp.c +++ b/udp.c @@ -270,23 +270,6 @@ static void udp_invert_portmap(struct udp_port_fwd *fwd) } } -/** - * udp_update_check4() - Update checksum with variable parts from stored one - * @buf: L2 packet buffer with final IPv4 header - */ -static void udp_update_check4(struct udp4_l2_buf_t *buf) -{ - uint32_t sum = L2_BUF_IP4_PSUM(IPPROTO_UDP); - - sum += buf->iph.tot_len; - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - - buf->iph.check = (uint16_t)~csum_fold(sum); -} - /** * udp_update_l2_buf() - Update L2 buffers with Ethernet and IPv4 addresses * @eth_d: Ethernet destination address, NULL if unchanged @@ -614,7 +597,8 @@ static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport, b->iph.saddr = b->s_in.sin_addr.s_addr; } - udp_update_check4(b); + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + b->iph.saddr, b->iph.daddr); b->uh.source = b->s_in.sin_port; b->uh.dest = htons(dstport); b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Sat, 17 Feb 2024 16:07:22 +0100 Laurent Vivier <lvivier(a)redhat.com> wrote:We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr v2: - use csum_ip4_header() from checksum.c - use code from tcp.c and udp.c in csum_ip4_header() - use "const struct iphfr *", check is not updated by the function but by the caller. checksum.c | 17 +++++++++++++---- checksum.h | 3 ++- tap.c | 3 ++- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 22 insertions(+), 45 deletions(-) diff --git a/checksum.c b/checksum.c index 74e3742bc6f6..511b296a9a80 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum) uint16_t csum(const void *buf, size_t len, uint32_t init); /** - * csum_ip4_header() - Calculate and set IPv4 header checksum + * csum_ip4_header() - Calculate IPv4 header checksum * @ip4h: IPv4 header */ -void csum_ip4_header(struct iphdr *ip4h) +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) { - ip4h->check = 0; - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); + uint32_t sum = L2_BUF_IP4_PSUM(protocol);Now that we use this macro, Coverity Scan realises that it's broken: #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ (uint32_t)htons_constant(0xff00 | (proto))) ...but proto is eight (lower) bits, so this actually ignores 'proto'. -- Stefano
On Thu, Feb 29, 2024 at 05:24:06PM +0100, Stefano Brivio wrote:On Sat, 17 Feb 2024 16:07:22 +0100 Laurent Vivier <lvivier(a)redhat.com> wrote:Uh... how so? -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonWe can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr v2: - use csum_ip4_header() from checksum.c - use code from tcp.c and udp.c in csum_ip4_header() - use "const struct iphfr *", check is not updated by the function but by the caller. checksum.c | 17 +++++++++++++---- checksum.h | 3 ++- tap.c | 3 ++- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 22 insertions(+), 45 deletions(-) diff --git a/checksum.c b/checksum.c index 74e3742bc6f6..511b296a9a80 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum) uint16_t csum(const void *buf, size_t len, uint32_t init); /** - * csum_ip4_header() - Calculate and set IPv4 header checksum + * csum_ip4_header() - Calculate IPv4 header checksum * @ip4h: IPv4 header */ -void csum_ip4_header(struct iphdr *ip4h) +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) { - ip4h->check = 0; - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); + uint32_t sum = L2_BUF_IP4_PSUM(protocol);Now that we use this macro, Coverity Scan realises that it's broken: #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ (uint32_t)htons_constant(0xff00 | (proto))) ...but proto is eight (lower) bits, so this actually ignores 'proto'.
On Fri, 1 Mar 2024 10:10:52 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Thu, Feb 29, 2024 at 05:24:06PM +0100, Stefano Brivio wrote:Oops, sorry, it's not broken, and this is a false positive due to the fact that __bswap_constant_16() (which htons_constant() resolves to, on little-endian) is defined, for example in glibc, as: #define __bswap_constant_16(x) \ ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8)) and in this case the first term of the | resolves to a constant value, 0xff, because 0xffxx >> 8 is 0xff for any value of xx. I couldn't think of a "solution", yet. -- StefanoOn Sat, 17 Feb 2024 16:07:22 +0100 Laurent Vivier <lvivier(a)redhat.com> wrote:Uh... how so?We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr v2: - use csum_ip4_header() from checksum.c - use code from tcp.c and udp.c in csum_ip4_header() - use "const struct iphfr *", check is not updated by the function but by the caller. checksum.c | 17 +++++++++++++---- checksum.h | 3 ++- tap.c | 3 ++- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 22 insertions(+), 45 deletions(-) diff --git a/checksum.c b/checksum.c index 74e3742bc6f6..511b296a9a80 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum) uint16_t csum(const void *buf, size_t len, uint32_t init); /** - * csum_ip4_header() - Calculate and set IPv4 header checksum + * csum_ip4_header() - Calculate IPv4 header checksum * @ip4h: IPv4 header */ -void csum_ip4_header(struct iphdr *ip4h) +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) { - ip4h->check = 0; - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); + uint32_t sum = L2_BUF_IP4_PSUM(protocol);Now that we use this macro, Coverity Scan realises that it's broken: #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ (uint32_t)htons_constant(0xff00 | (proto))) ...but proto is eight (lower) bits, so this actually ignores 'proto'.
On Fri, Mar 01, 2024 at 08:58:45AM +0100, Stefano Brivio wrote:On Fri, 1 Mar 2024 10:10:52 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Right. This really seems overzealous of coverity: it seems like any occasion where the compiler would constant fold could result in a similar warning.On Thu, Feb 29, 2024 at 05:24:06PM +0100, Stefano Brivio wrote:Oops, sorry, it's not broken, and this is a false positive due to the fact that __bswap_constant_16() (which htons_constant() resolves to, on little-endian) is defined, for example in glibc, as: #define __bswap_constant_16(x) \ ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8)) and in this case the first term of the | resolves to a constant value, 0xff, because 0xffxx >> 8 is 0xff for any value of xx.On Sat, 17 Feb 2024 16:07:22 +0100 Laurent Vivier <lvivier(a)redhat.com> wrote:Uh... how so?We can find the same function to compute the IPv4 header checksum in tcp.c, udp.c and tap.c Use the function defined for tap.c, csum_ip4_header(), but with the code used in tcp.c and udp.c as it doesn't need a fully initialiazed IPv4 header, only protocol, tot_len, saddr and daddr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr v2: - use csum_ip4_header() from checksum.c - use code from tcp.c and udp.c in csum_ip4_header() - use "const struct iphfr *", check is not updated by the function but by the caller. checksum.c | 17 +++++++++++++---- checksum.h | 3 ++- tap.c | 3 ++- tcp.c | 24 +++--------------------- udp.c | 20 ++------------------ 5 files changed, 22 insertions(+), 45 deletions(-) diff --git a/checksum.c b/checksum.c index 74e3742bc6f6..511b296a9a80 100644 --- a/checksum.c +++ b/checksum.c @@ -57,6 +57,7 @@ #include <linux/icmpv6.h> #include "util.h" +#include "ip.h" #include "checksum.h" /* Checksums are optional for UDP over IPv4, so we usually just set @@ -116,13 +117,21 @@ uint16_t csum_fold(uint32_t sum) uint16_t csum(const void *buf, size_t len, uint32_t init); /** - * csum_ip4_header() - Calculate and set IPv4 header checksum + * csum_ip4_header() - Calculate IPv4 header checksum * @ip4h: IPv4 header */ -void csum_ip4_header(struct iphdr *ip4h) +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) { - ip4h->check = 0; - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); + uint32_t sum = L2_BUF_IP4_PSUM(protocol);Now that we use this macro, Coverity Scan realises that it's broken: #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ (uint32_t)htons_constant(0xff00 | (proto))) ...but proto is eight (lower) bits, so this actually ignores 'proto'.I couldn't think of a "solution", yet.Making it an inline function rather than a macro might be enough to convince Coverity. Otherwise we could just mark it as a false positive in the Coverity web interface. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Fri, 1 Mar 2024 23:23:41 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Fri, Mar 01, 2024 at 08:58:45AM +0100, Stefano Brivio wrote:The inline function didn't help per se, but while trying it out (with a number of variations on it) I realised that, Coverity being overzealous or not... 'proto' isn't a constant, so we shouldn't use __bswap_constant_16(), unless we want to define three constants for the Layer-4 protocols we support. Switched to htons() as it obviously ought to be, problem solved. I'll post this as follow-up patch to Laurent's series. -- StefanoOn Fri, 1 Mar 2024 10:10:52 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Right. This really seems overzealous of coverity: it seems like any occasion where the compiler would constant fold could result in a similar warning.On Thu, Feb 29, 2024 at 05:24:06PM +0100, Stefano Brivio wrote:Oops, sorry, it's not broken, and this is a false positive due to the fact that __bswap_constant_16() (which htons_constant() resolves to, on little-endian) is defined, for example in glibc, as: #define __bswap_constant_16(x) \ ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8)) and in this case the first term of the | resolves to a constant value, 0xff, because 0xffxx >> 8 is 0xff for any value of xx.On Sat, 17 Feb 2024 16:07:22 +0100 Laurent Vivier <lvivier(a)redhat.com> wrote: > +uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, > + uint32_t saddr, uint32_t daddr) > { > - ip4h->check = 0; > - ip4h->check = csum(ip4h, (size_t)ip4h->ihl * 4, 0); > + uint32_t sum = L2_BUF_IP4_PSUM(protocol); Now that we use this macro, Coverity Scan realises that it's broken: #define L2_BUF_IP4_PSUM(proto) ((uint32_t)htons_constant(0x4500) + \ (uint32_t)htons_constant(0xff00 | (proto))) ...but proto is eight (lower) bits, so this actually ignores 'proto'.Uh... how so?I couldn't think of a "solution", yet.Making it an inline function rather than a macro might be enough to convince Coverity. Otherwise we could just mark it as a false positive in the Coverity web interface.
The TCP and UDP checksums are computed using the data in the TCP/UDP payload but also some informations in the IP header (protocol, length, source and destination addresses). We add two functions, proto_ipv4_header_psum() and proto_ipv6_header_psum(), to compute the checksum of the IP header part. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr/ipv6hdr v2: - move new function to checksum.c - use _psum rather than _checksum in the name - replace csum_udp4() and csum_udp6() by the new function checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 4 ++++ tcp.c | 44 ++++++++++++++++------------------- udp.c | 10 ++++---- 4 files changed, 81 insertions(+), 44 deletions(-) diff --git a/checksum.c b/checksum.c index 511b296a9a80..55bf1340a257 100644 --- a/checksum.c +++ b/checksum.c @@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, return ~csum_fold(sum); } +/** + * proto_ipv4_header_psum() - Calculates the partial checksum of an + * IPv4 header for UDP or TCP + * @tot_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * @proto: proto Protocol number + * Returns: Partial checksum of the IPv4 header + */ +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) +{ + uint32_t psum = htons(protocol); + + psum += (saddr >> 16) & 0xffff; + psum += saddr & 0xffff; + psum += (daddr >> 16) & 0xffff; + psum += daddr & 0xffff; + psum += htons(ntohs(tot_len) - 20); + + return psum; +} + /** * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet * @udp4hr: UDP header, initialised apart from checksum @@ -150,14 +174,10 @@ void csum_udp4(struct udphdr *udp4hr, udp4hr->check = 0; if (UDP4_REAL_CHECKSUMS) { - /* UNTESTED: if we did want real UDPv4 checksums, this - * is roughly what we'd need */ - uint32_t psum = csum_fold(saddr.s_addr) - + csum_fold(daddr.s_addr) - + htons(len + sizeof(*udp4hr)) - + htons(IPPROTO_UDP); - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp4hr, sizeof(*udp4hr)); + uint32_t psum = proto_ipv4_header_psum(len, IPPROTO_UDP, + saddr.s_addr, + daddr.s_addr); + psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum); udp4hr->check = csum(payload, len, psum); } } @@ -180,6 +200,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) icmp4hr->checksum = csum(payload, len, psum); } +/** + * proto_ipv6_header_psum() - Calculates the partial checksum of an + * IPv6 header for UDP or TCP + * @payload_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * Returns: Partial checksum of the IPv6 header + */ +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr) +{ + uint32_t sum = htons(protocol) + payload_len; + + sum += sum_16b(&saddr, sizeof(saddr)); + sum += sum_16b(&daddr, sizeof(daddr)); + + return sum; +} + /** * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet * @udp6hr: UDP header, initialised apart from checksum @@ -190,14 +230,11 @@ void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len) { - /* Partial checksum for the pseudo-IPv6 header */ - uint32_t psum = sum_16b(saddr, sizeof(*saddr)) + - sum_16b(daddr, sizeof(*daddr)) + - htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP); - + uint32_t psum = proto_ipv6_header_psum(len, IPPROTO_UDP, + *saddr, *daddr); udp6hr->check = 0; - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp6hr, sizeof(*udp6hr)); + + psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum); udp6hr->check = csum(payload, len, psum); } diff --git a/checksum.h b/checksum.h index 92db73612b6e..b2b5b8e8b77e 100644 --- a/checksum.h +++ b/checksum.h @@ -15,10 +15,14 @@ uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, uint32_t saddr, uint32_t daddr); +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len); +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr); void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); diff --git a/tcp.c b/tcp.c index 8887656d3ee8..8ee252131504 100644 --- a/tcp.c +++ b/tcp.c @@ -938,39 +938,29 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) * tcp_update_check_tcp4() - Update TCP checksum from stored one * @buf: L2 packet buffer with final IPv4 header */ -static void tcp_update_check_tcp4(struct tcp4_l2_buf_t *buf) +static void tcp_update_check_tcp4(struct iphdr *iph) { - uint16_t tlen = ntohs(buf->iph.tot_len) - 20; - uint32_t sum = htons(IPPROTO_TCP); + struct tcphdr *th = (struct tcphdr *)(iph + 1); + uint16_t tlen = ntohs(iph->tot_len) - 20; + uint32_t sum = proto_ipv4_header_psum(iph->tot_len, IPPROTO_TCP, + iph->saddr, iph->daddr); - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - sum += htons(ntohs(buf->iph.tot_len) - 20); - - buf->th.check = 0; - buf->th.check = csum(&buf->th, tlen, sum); + th->check = 0; + th->check = csum(th, tlen, sum); } /** * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6 * @buf: L2 packet buffer with final IPv6 header */ -static void tcp_update_check_tcp6(struct tcp6_l2_buf_t *buf) +static void tcp_update_check_tcp6(struct ipv6hdr *ip6h) { - int len = ntohs(buf->ip6h.payload_len) + sizeof(struct ipv6hdr); - - buf->ip6h.hop_limit = IPPROTO_TCP; - buf->ip6h.version = 0; - buf->ip6h.nexthdr = 0; + struct tcphdr *th = (struct tcphdr *)(ip6h + 1); + uint32_t sum = proto_ipv6_header_psum(ip6h->payload_len, IPPROTO_TCP, + ip6h->saddr, ip6h->daddr); - buf->th.check = 0; - buf->th.check = csum(&buf->ip6h, len, 0); - - buf->ip6h.hop_limit = 255; - buf->ip6h.version = 6; - buf->ip6h.nexthdr = IPPROTO_TCP; + th->check = 0; + th->check = csum(th, ntohs(ip6h->payload_len), sum); } /** @@ -1382,7 +1372,7 @@ do { \ SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - tcp_update_check_tcp4(b); + tcp_update_check_tcp4(&b->iph); tlen = tap_iov_len(c, &b->taph, ip_len); } else { @@ -1401,7 +1391,11 @@ do { \ SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - tcp_update_check_tcp6(b); + tcp_update_check_tcp6(&b->ip6h); + + b->ip6h.hop_limit = 255; + b->ip6h.version = 6; + b->ip6h.nexthdr = IPPROTO_TCP; b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf; b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff; diff --git a/udp.c b/udp.c index cd34f659210b..e123b35a955c 100644 --- a/udp.c +++ b/udp.c @@ -670,10 +670,12 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, b->uh.source = b->s_in6.sin6_port; b->uh.dest = htons(dstport); b->uh.len = b->ip6h.payload_len; - - b->ip6h.hop_limit = IPPROTO_UDP; - b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0; - b->uh.check = csum(&b->ip6h, ip_len, 0); + b->uh.check = 0; + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), + proto_ipv6_header_psum(b->ip6h.payload_len, + IPPROTO_UDP, + b->ip6h.saddr, + b->ip6h.daddr)); b->ip6h.version = 6; b->ip6h.nexthdr = IPPROTO_UDP; b->ip6h.hop_limit = 255; -- 2.42.0
On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:The TCP and UDP checksums are computed using the data in the TCP/UDP payload but also some informations in the IP header (protocol, length, source and destination addresses). We add two functions, proto_ipv4_header_psum() and proto_ipv6_header_psum(), to compute the checksum of the IP header part. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr/ipv6hdr v2: - move new function to checksum.c - use _psum rather than _checksum in the name - replace csum_udp4() and csum_udp6() by the new function checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 4 ++++ tcp.c | 44 ++++++++++++++++------------------- udp.c | 10 ++++---- 4 files changed, 81 insertions(+), 44 deletions(-) diff --git a/checksum.c b/checksum.c index 511b296a9a80..55bf1340a257 100644 --- a/checksum.c +++ b/checksum.c @@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, return ~csum_fold(sum); } +/** + * proto_ipv4_header_psum() - Calculates the partial checksum of an + * IPv4 header for UDP or TCP + * @tot_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * @proto: proto Protocol number + * Returns: Partial checksum of the IPv4 header + */ +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) +{ + uint32_t psum = htons(protocol); + + psum += (saddr >> 16) & 0xffff; + psum += saddr & 0xffff; + psum += (daddr >> 16) & 0xffff; + psum += daddr & 0xffff; + psum += htons(ntohs(tot_len) - 20); + + return psum; +} + /** * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet * @udp4hr: UDP header, initialised apart from checksum @@ -150,14 +174,10 @@ void csum_udp4(struct udphdr *udp4hr, udp4hr->check = 0; if (UDP4_REAL_CHECKSUMS) { - /* UNTESTED: if we did want real UDPv4 checksums, this - * is roughly what we'd need */ - uint32_t psum = csum_fold(saddr.s_addr) - + csum_fold(daddr.s_addr) - + htons(len + sizeof(*udp4hr)) - + htons(IPPROTO_UDP); - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp4hr, sizeof(*udp4hr)); + uint32_t psum = proto_ipv4_header_psum(len, IPPROTO_UDP, + saddr.s_addr, + daddr.s_addr); + psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum); udp4hr->check = csum(payload, len, psum); } } @@ -180,6 +200,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) icmp4hr->checksum = csum(payload, len, psum); } +/** + * proto_ipv6_header_psum() - Calculates the partial checksum of an + * IPv6 header for UDP or TCP + * @payload_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * Returns: Partial checksum of the IPv6 header + */ +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr)Hrm, this is passing 2 16-byte IPv6 addresses by value, which might not be what we want.+{ + uint32_t sum = htons(protocol) + payload_len;Uh.. doesn't that need to be htons(payload_len + sizeof(struct ipv6hdr)) rather than simply payload_len?+ + sum += sum_16b(&saddr, sizeof(saddr)); + sum += sum_16b(&daddr, sizeof(daddr)); + + return sum; +} + /** * csum_udp6() - Calculate and set checksum for a UDP over IPv6 packet * @udp6hr: UDP header, initialised apart from checksum @@ -190,14 +230,11 @@ void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len) { - /* Partial checksum for the pseudo-IPv6 header */ - uint32_t psum = sum_16b(saddr, sizeof(*saddr)) + - sum_16b(daddr, sizeof(*daddr)) + - htons(len + sizeof(*udp6hr)) + htons(IPPROTO_UDP); - + uint32_t psum = proto_ipv6_header_psum(len, IPPROTO_UDP, + *saddr, *daddr); udp6hr->check = 0; - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp6hr, sizeof(*udp6hr)); + + psum = csum_unfolded(udp6hr, sizeof(struct udphdr), psum); udp6hr->check = csum(payload, len, psum); } diff --git a/checksum.h b/checksum.h index 92db73612b6e..b2b5b8e8b77e 100644 --- a/checksum.h +++ b/checksum.h @@ -15,10 +15,14 @@ uint16_t csum_fold(uint32_t sum); uint16_t csum_unaligned(const void *buf, size_t len, uint32_t init); uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, uint32_t saddr, uint32_t daddr); +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr); void csum_udp4(struct udphdr *udp4hr, struct in_addr saddr, struct in_addr daddr, const void *payload, size_t len); void csum_icmp4(struct icmphdr *ih, const void *payload, size_t len); +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr); void csum_udp6(struct udphdr *udp6hr, const struct in6_addr *saddr, const struct in6_addr *daddr, const void *payload, size_t len); diff --git a/tcp.c b/tcp.c index 8887656d3ee8..8ee252131504 100644 --- a/tcp.c +++ b/tcp.c @@ -938,39 +938,29 @@ static void tcp_sock_set_bufsize(const struct ctx *c, int s) * tcp_update_check_tcp4() - Update TCP checksum from stored one * @buf: L2 packet buffer with final IPv4 header */ -static void tcp_update_check_tcp4(struct tcp4_l2_buf_t *buf) +static void tcp_update_check_tcp4(struct iphdr *iph) { - uint16_t tlen = ntohs(buf->iph.tot_len) - 20; - uint32_t sum = htons(IPPROTO_TCP); + struct tcphdr *th = (struct tcphdr *)(iph + 1); + uint16_t tlen = ntohs(iph->tot_len) - 20; + uint32_t sum = proto_ipv4_header_psum(iph->tot_len, IPPROTO_TCP, + iph->saddr, iph->daddr); - sum += (buf->iph.saddr >> 16) & 0xffff; - sum += buf->iph.saddr & 0xffff; - sum += (buf->iph.daddr >> 16) & 0xffff; - sum += buf->iph.daddr & 0xffff; - sum += htons(ntohs(buf->iph.tot_len) - 20); - - buf->th.check = 0; - buf->th.check = csum(&buf->th, tlen, sum); + th->check = 0; + th->check = csum(th, tlen, sum); } /** * tcp_update_check_tcp6() - Calculate TCP checksum for IPv6 * @buf: L2 packet buffer with final IPv6 header */ -static void tcp_update_check_tcp6(struct tcp6_l2_buf_t *buf) +static void tcp_update_check_tcp6(struct ipv6hdr *ip6h) { - int len = ntohs(buf->ip6h.payload_len) + sizeof(struct ipv6hdr); - - buf->ip6h.hop_limit = IPPROTO_TCP; - buf->ip6h.version = 0; - buf->ip6h.nexthdr = 0; + struct tcphdr *th = (struct tcphdr *)(ip6h + 1); + uint32_t sum = proto_ipv6_header_psum(ip6h->payload_len, IPPROTO_TCP, + ip6h->saddr, ip6h->daddr); - buf->th.check = 0; - buf->th.check = csum(&buf->ip6h, len, 0); - - buf->ip6h.hop_limit = 255; - buf->ip6h.version = 6; - buf->ip6h.nexthdr = IPPROTO_TCP; + th->check = 0; + th->check = csum(th, ntohs(ip6h->payload_len), sum); } /** @@ -1382,7 +1372,7 @@ do { \ SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - tcp_update_check_tcp4(b); + tcp_update_check_tcp4(&b->iph); tlen = tap_iov_len(c, &b->taph, ip_len); } else { @@ -1401,7 +1391,11 @@ do { \ SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - tcp_update_check_tcp6(b); + tcp_update_check_tcp6(&b->ip6h); + + b->ip6h.hop_limit = 255; + b->ip6h.version = 6; + b->ip6h.nexthdr = IPPROTO_TCP; b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf; b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff; diff --git a/udp.c b/udp.c index cd34f659210b..e123b35a955c 100644 --- a/udp.c +++ b/udp.c @@ -670,10 +670,12 @@ static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport, b->uh.source = b->s_in6.sin6_port; b->uh.dest = htons(dstport); b->uh.len = b->ip6h.payload_len; - - b->ip6h.hop_limit = IPPROTO_UDP; - b->ip6h.version = b->ip6h.nexthdr = b->uh.check = 0; - b->uh.check = csum(&b->ip6h, ip_len, 0); + b->uh.check = 0; + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), + proto_ipv6_header_psum(b->ip6h.payload_len, + IPPROTO_UDP, + b->ip6h.saddr, + b->ip6h.daddr)); b->ip6h.version = 6; b->ip6h.nexthdr = IPPROTO_UDP; b->ip6h.hop_limit = 255;-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On 2/19/24 04:08, David Gibson wrote:On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:The idea here is to avoid the pointer alignment problem (&ip6h->saddr and &ip6h->daddr can be misaligned). Is it a better solution to copy the content of ip6h->saddr and ip6h->daddr to some local variables and then provide the pointers of the local variables to proto_ipv6_header_psum()?The TCP and UDP checksums are computed using the data in the TCP/UDP payload but also some informations in the IP header (protocol, length, source and destination addresses). We add two functions, proto_ipv4_header_psum() and proto_ipv6_header_psum(), to compute the checksum of the IP header part. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr/ipv6hdr v2: - move new function to checksum.c - use _psum rather than _checksum in the name - replace csum_udp4() and csum_udp6() by the new function checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 4 ++++ tcp.c | 44 ++++++++++++++++------------------- udp.c | 10 ++++---- 4 files changed, 81 insertions(+), 44 deletions(-) diff --git a/checksum.c b/checksum.c index 511b296a9a80..55bf1340a257 100644 --- a/checksum.c +++ b/checksum.c @@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, return ~csum_fold(sum); } +/** + * proto_ipv4_header_psum() - Calculates the partial checksum of an + * IPv4 header for UDP or TCP + * @tot_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * @proto: proto Protocol number + * Returns: Partial checksum of the IPv4 header + */ +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) +{ + uint32_t psum = htons(protocol); + + psum += (saddr >> 16) & 0xffff; + psum += saddr & 0xffff; + psum += (daddr >> 16) & 0xffff; + psum += daddr & 0xffff; + psum += htons(ntohs(tot_len) - 20); + + return psum; +} + /** * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet * @udp4hr: UDP header, initialised apart from checksum @@ -150,14 +174,10 @@ void csum_udp4(struct udphdr *udp4hr, udp4hr->check = 0; if (UDP4_REAL_CHECKSUMS) { - /* UNTESTED: if we did want real UDPv4 checksums, this - * is roughly what we'd need */ - uint32_t psum = csum_fold(saddr.s_addr) - + csum_fold(daddr.s_addr) - + htons(len + sizeof(*udp4hr)) - + htons(IPPROTO_UDP); - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp4hr, sizeof(*udp4hr)); + uint32_t psum = proto_ipv4_header_psum(len, IPPROTO_UDP, + saddr.s_addr, + daddr.s_addr); + psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum); udp4hr->check = csum(payload, len, psum); } } @@ -180,6 +200,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) icmp4hr->checksum = csum(payload, len, psum); } +/** + * proto_ipv6_header_psum() - Calculates the partial checksum of an + * IPv6 header for UDP or TCP + * @payload_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * Returns: Partial checksum of the IPv6 header + */ +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr)Hrm, this is passing 2 16-byte IPv6 addresses by value, which might not be what we want.payload_len is: - b->ip6h.payload_len (from udp_update_hdr6()) - ip6h->payload_len (from tcp_update_check_tcp6()) and in ip6h payload_len is: - htons(udp6_l2_mh_sock[n].msg_len + sizeof(b->uh)) (from udp_update_hdr6()) - htons(plen + sizeof(struct tcphdr)) (from tcp_fill_ipv6_header()) So this is correct... but csum_udp6() uses "len" from tap_udp6_send(), so there is a bug here. but there is also a problem with proto_ipv4_header_psum() that need host endianness and tcp_update_check_tcp4() provides network endianness... The first idea was to use the value from ip6h payload_len as it is already computed. But mixing network endianness and host endianness appears to be a bad idea... Thanks, Laurent+{ + uint32_t sum = htons(protocol) + payload_len;Uh.. doesn't that need to be htons(payload_len + sizeof(struct ipv6hdr)) rather than simply payload_len?
On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:On 2/19/24 04:08, David Gibson wrote:Ah, right. That's a neat idea, but I'm not sure it really helps: I think it will just move the misaligned access from inside the function to the call site, where we try to marshal the parameter from something unaligned.On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote:The idea here is to avoid the pointer alignment problem (&ip6h->saddr and &ip6h->daddr can be misaligned).The TCP and UDP checksums are computed using the data in the TCP/UDP payload but also some informations in the IP header (protocol, length, source and destination addresses). We add two functions, proto_ipv4_header_psum() and proto_ipv6_header_psum(), to compute the checksum of the IP header part. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - function parameters provide tot_len, saddr, daddr and protocol rather than an iphdr/ipv6hdr v2: - move new function to checksum.c - use _psum rather than _checksum in the name - replace csum_udp4() and csum_udp6() by the new function checksum.c | 67 ++++++++++++++++++++++++++++++++++++++++++------------ checksum.h | 4 ++++ tcp.c | 44 ++++++++++++++++------------------- udp.c | 10 ++++---- 4 files changed, 81 insertions(+), 44 deletions(-) diff --git a/checksum.c b/checksum.c index 511b296a9a80..55bf1340a257 100644 --- a/checksum.c +++ b/checksum.c @@ -134,6 +134,30 @@ uint16_t csum_ip4_header(uint16_t tot_len, uint8_t protocol, return ~csum_fold(sum); } +/** + * proto_ipv4_header_psum() - Calculates the partial checksum of an + * IPv4 header for UDP or TCP + * @tot_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * @proto: proto Protocol number + * Returns: Partial checksum of the IPv4 header + */ +uint32_t proto_ipv4_header_psum(uint16_t tot_len, uint8_t protocol, + uint32_t saddr, uint32_t daddr) +{ + uint32_t psum = htons(protocol); + + psum += (saddr >> 16) & 0xffff; + psum += saddr & 0xffff; + psum += (daddr >> 16) & 0xffff; + psum += daddr & 0xffff; + psum += htons(ntohs(tot_len) - 20); + + return psum; +} + /** * csum_udp4() - Calculate and set checksum for a UDP over IPv4 packet * @udp4hr: UDP header, initialised apart from checksum @@ -150,14 +174,10 @@ void csum_udp4(struct udphdr *udp4hr, udp4hr->check = 0; if (UDP4_REAL_CHECKSUMS) { - /* UNTESTED: if we did want real UDPv4 checksums, this - * is roughly what we'd need */ - uint32_t psum = csum_fold(saddr.s_addr) - + csum_fold(daddr.s_addr) - + htons(len + sizeof(*udp4hr)) - + htons(IPPROTO_UDP); - /* Add in partial checksum for the UDP header alone */ - psum += sum_16b(udp4hr, sizeof(*udp4hr)); + uint32_t psum = proto_ipv4_header_psum(len, IPPROTO_UDP, + saddr.s_addr, + daddr.s_addr); + psum = csum_unfolded(udp4hr, sizeof(struct udphdr), psum); udp4hr->check = csum(payload, len, psum); } } @@ -180,6 +200,26 @@ void csum_icmp4(struct icmphdr *icmp4hr, const void *payload, size_t len) icmp4hr->checksum = csum(payload, len, psum); } +/** + * proto_ipv6_header_psum() - Calculates the partial checksum of an + * IPv6 header for UDP or TCP + * @payload_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * Returns: Partial checksum of the IPv6 header + */ +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr)Hrm, this is passing 2 16-byte IPv6 addresses by value, which might not be what we want.Is it a better solution to copy the content of ip6h->saddr and ip6h->daddr to some local variables and then provide the pointers of the local variables to proto_ipv6_header_psum()?Honestly, I'm not sure.Ah, right. Not sure why I thought the ip6h length needed to be included. As a rule htons(x) + y is always suspect, because you generally can't do math on values that aren't host endian. We get away with it in these csum functions because the way they're folded means the answers end up the same - as long as we're consistent about it, anyway.payload_len is: - b->ip6h.payload_len (from udp_update_hdr6()) - ip6h->payload_len (from tcp_update_check_tcp6()) and in ip6h payload_len is: - htons(udp6_l2_mh_sock[n].msg_len + sizeof(b->uh)) (from udp_update_hdr6()) - htons(plen + sizeof(struct tcphdr)) (from tcp_fill_ipv6_header()) So this is correct... but+{ + uint32_t sum = htons(protocol) + payload_len;Uh.. doesn't that need to be htons(payload_len + sizeof(struct ipv6hdr)) rather than simply payload_len?csum_udp6() uses "len" from tap_udp6_send(), so there is a bug here. but there is also a problem with proto_ipv4_header_psum() that need host endianness and tcp_update_check_tcp4() provides network endianness... The first idea was to use the value from ip6h payload_len as it is already computed. But mixing network endianness and host endianness appears to be a bad idea...Right. As a rule I really dislike putting non-host-endian values in a plain u16 local or parameter, because it's really easy to think it's just a number, rather than a funny encoding of a number. Likewise, I think it's a lot easier to keep track of things if every field of a struct has a strict endianness, which we never change in place, even temporarily. (Ideally, in fact, I'd prefer to see non-host-endian values always in an encapsulating type that won't let you do math on them, but that's not always practical in C). -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Thu, 29 Feb 2024 11:38:53 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:I haven't tested this yet, but note that this is generally okay: the problem is *dereferencing* an unaligned pointer. But if you load memory from an aligned pointer, and extract a value from this memory, it's all fine. Speaking MIPS, this is not safe on all CPU models: la $1, 1002 # s1 now contains the value 1002 lw $2, 0($1) # load word from memory at 1002 + 0 into s2 but this is: la $1, 1000 # s1 now contains the value 1000 la $2, 1004 # s3 now contains the value 1004 lw $3, 0($1) # load word from memory at 1000 + 0 into s3 lw $4, 0($3) # load word from memory at 1004 + 0 into s4 sll $5, $3, 16 # 16-bit shift left s3 into s5 srl $6, $4, 16 # 16-bit shift right s4 into s6 or $2, $5, $6 # OR s5 and s6 into s2 On x86, as far as I know, mov will digest the equivalent of the first version without issues.On 2/19/24 04:08, David Gibson wrote:Ah, right. That's a neat idea, but I'm not sure it really helps: I think it will just move the misaligned access from inside the function to the call site, where we try to marshal the parameter from something unaligned.On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: [...]The idea here is to avoid the pointer alignment problem (&ip6h->saddr and &ip6h->daddr can be misaligned).+/** + * proto_ipv6_header_psum() - Calculates the partial checksum of an + * IPv6 header for UDP or TCP + * @payload_len: Payload length + * @proto: Protocol number + * @saddr: Source address + * @daddr: Destination address + * Returns: Partial checksum of the IPv6 header + */ +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, + struct in6_addr saddr, struct in6_addr daddr)Hrm, this is passing 2 16-byte IPv6 addresses by value, which might not be what we want.I think it's pretty much the same. Let the compiler pass 16-byte variables by value, and it will generally do this for us, but only if needed. -- StefanoIs it a better solution to copy the content of ip6h->saddr and ip6h->daddr to some local variables and then provide the pointers of the local variables to proto_ipv6_header_psum()?Honestly, I'm not sure.
On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:On Thu, 29 Feb 2024 11:38:53 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Right, that's kind of what I'm getting at. Assuming this value starts in an unaligned buffer, then in order to pass this by value the caller will need to load from that unaligned pointer. AFAIK, the compiler will base the type of loads only on the pointed to type, which isn't changed whether we dereference in the caller or the callee.On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:I haven't tested this yet, but note that this is generally okay: the problem is *dereferencing* an unaligned pointer. But if you load memory from an aligned pointer, and extract a value from this memory, it's all fine.On 2/19/24 04:08, David Gibson wrote:Ah, right. That's a neat idea, but I'm not sure it really helps: I think it will just move the misaligned access from inside the function to the call site, where we try to marshal the parameter from something unaligned.On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: [...] > +/** > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > + * IPv6 header for UDP or TCP > + * @payload_len: Payload length > + * @proto: Protocol number > + * @saddr: Source address > + * @daddr: Destination address > + * Returns: Partial checksum of the IPv6 header > + */ > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > + struct in6_addr saddr, struct in6_addr daddr) Hrm, this is passing 2 16-byte IPv6 addresses by value, which might not be what we want.The idea here is to avoid the pointer alignment problem (&ip6h->saddr and &ip6h->daddr can be misaligned).Speaking MIPS, this is not safe on all CPU models: la $1, 1002 # s1 now contains the value 1002 lw $2, 0($1) # load word from memory at 1002 + 0 into s2 but this is: la $1, 1000 # s1 now contains the value 1000 la $2, 1004 # s3 now contains the value 1004 lw $3, 0($1) # load word from memory at 1000 + 0 into s3 lw $4, 0($3) # load word from memory at 1004 + 0 into s4 sll $5, $3, 16 # 16-bit shift left s3 into s5 srl $6, $4, 16 # 16-bit shift right s4 into s6 or $2, $5, $6 # OR s5 and s6 into s2Right, but I don't think merely moving the dereference to the caller will necessarily induce the compiler to generate this rather than the former.On x86, as far as I know, mov will digest the equivalent of the first version without issues.-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonI think it's pretty much the same. Let the compiler pass 16-byte variables by value, and it will generally do this for us, but only if needed.Is it a better solution to copy the content of ip6h->saddr and ip6h->daddr to some local variables and then provide the pointers of the local variables to proto_ipv6_header_psum()?Honestly, I'm not sure.
On Thu, 29 Feb 2024 19:49:09 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:Oh, oops, I didn't realise this was the case (I haven't reviewed the patch yet). -- StefanoOn Thu, 29 Feb 2024 11:38:53 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Right, that's kind of what I'm getting at. Assuming this value starts in an unaligned buffer, then in order to pass this by value the caller will need to load from that unaligned pointer. AFAIK, the compiler will base the type of loads only on the pointed to type, which isn't changed whether we dereference in the caller or the callee.On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote:I haven't tested this yet, but note that this is generally okay: the problem is *dereferencing* an unaligned pointer. But if you load memory from an aligned pointer, and extract a value from this memory, it's all fine.On 2/19/24 04:08, David Gibson wrote: > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > [...] > > > +/** > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > + * IPv6 header for UDP or TCP > > + * @payload_len: Payload length > > + * @proto: Protocol number > > + * @saddr: Source address > > + * @daddr: Destination address > > + * Returns: Partial checksum of the IPv6 header > > + */ > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > + struct in6_addr saddr, struct in6_addr daddr) > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > not be what we want. The idea here is to avoid the pointer alignment problem (&ip6h->saddr and &ip6h->daddr can be misaligned).Ah, right. That's a neat idea, but I'm not sure it really helps: I think it will just move the misaligned access from inside the function to the call site, where we try to marshal the parameter from something unaligned.Speaking MIPS, this is not safe on all CPU models: la $1, 1002 # s1 now contains the value 1002 lw $2, 0($1) # load word from memory at 1002 + 0 into s2 but this is: la $1, 1000 # s1 now contains the value 1000 la $2, 1004 # s3 now contains the value 1004 lw $3, 0($1) # load word from memory at 1000 + 0 into s3 lw $4, 0($3) # load word from memory at 1004 + 0 into s4 sll $5, $3, 16 # 16-bit shift left s3 into s5 srl $6, $4, 16 # 16-bit shift right s4 into s6 or $2, $5, $6 # OR s5 and s6 into s2Right, but I don't think merely moving the dereference to the caller will necessarily induce the compiler to generate this rather than the former.
On Thu, 29 Feb 2024 09:56:25 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:On Thu, 29 Feb 2024 19:49:09 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:...no, that's not the case. Dereferencing 'iph' from struct tcp[46]_l2_buf_t is fine: struct tcp4_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct iphdr iph; /* 20 20 */ [...] } __attribute__((__packed__)); struct tcp6_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct ipv6hdr ip6h; /* 20 40 */ [...] } __attribute__((__packed__)); The problematic structures are the UDP buffers: struct udp4_l2_buf_t { struct sockaddr_in s_in; /* 0 16 */ struct tap_hdr taph; /* 16 18 */ struct iphdr iph; /* 34 20 */ [...] } __attribute__((__aligned__(4))); and for UDP, this patch is dereferencing buffer pointers only, not pointers to headers. Inconsistent if you want, but it's quite simple and it works, plus if you had half a mind (at least for UDP) to split buffers into header and payloads iovec parts... this doesn't need to be exceedingly elegant. -- StefanoOn Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:Oh, oops, I didn't realise this was the case (I haven't reviewed the patch yet).On Thu, 29 Feb 2024 11:38:53 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Right, that's kind of what I'm getting at. Assuming this value starts in an unaligned buffer, then in order to pass this by value the caller will need to load from that unaligned pointer. AFAIK, the compiler will base the type of loads only on the pointed to type, which isn't changed whether we dereference in the caller or the callee.On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > On 2/19/24 04:08, David Gibson wrote: > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > [...] > > > > > +/** > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > + * IPv6 header for UDP or TCP > > > + * @payload_len: Payload length > > > + * @proto: Protocol number > > > + * @saddr: Source address > > > + * @daddr: Destination address > > > + * Returns: Partial checksum of the IPv6 header > > > + */ > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > not be what we want. > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > &ip6h->daddr can be misaligned). Ah, right. That's a neat idea, but I'm not sure it really helps: I think it will just move the misaligned access from inside the function to the call site, where we try to marshal the parameter from something unaligned.I haven't tested this yet, but note that this is generally okay: the problem is *dereferencing* an unaligned pointer. But if you load memory from an aligned pointer, and extract a value from this memory, it's all fine.Speaking MIPS, this is not safe on all CPU models: la $1, 1002 # s1 now contains the value 1002 lw $2, 0($1) # load word from memory at 1002 + 0 into s2 but this is: la $1, 1000 # s1 now contains the value 1000 la $2, 1004 # s3 now contains the value 1004 lw $3, 0($1) # load word from memory at 1000 + 0 into s3 lw $4, 0($3) # load word from memory at 1004 + 0 into s4 sll $5, $3, 16 # 16-bit shift left s3 into s5 srl $6, $4, 16 # 16-bit shift right s4 into s6 or $2, $5, $6 # OR s5 and s6 into s2Right, but I don't think merely moving the dereference to the caller will necessarily induce the compiler to generate this rather than the former.
On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:On Thu, 29 Feb 2024 09:56:25 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:Ok... but my point remains, I'm not seeing that passing the address by value actually helps - it just seems to change whether we need to handle the unaligned load in the caller or the callee.On Thu, 29 Feb 2024 19:49:09 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:...no, that's not the case. Dereferencing 'iph' from struct tcp[46]_l2_buf_t is fine: struct tcp4_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct iphdr iph; /* 20 20 */ [...] } __attribute__((__packed__)); struct tcp6_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct ipv6hdr ip6h; /* 20 40 */ [...] } __attribute__((__packed__)); The problematic structures are the UDP buffers: struct udp4_l2_buf_t { struct sockaddr_in s_in; /* 0 16 */ struct tap_hdr taph; /* 16 18 */ struct iphdr iph; /* 34 20 */ [...] } __attribute__((__aligned__(4))); and for UDP, this patch is dereferencing buffer pointers only, not pointers to headers.On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote:Oh, oops, I didn't realise this was the case (I haven't reviewed the patch yet).On Thu, 29 Feb 2024 11:38:53 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > On 2/19/24 04:08, David Gibson wrote: > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > [...] > > > > > > > +/** > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > + * IPv6 header for UDP or TCP > > > > + * @payload_len: Payload length > > > > + * @proto: Protocol number > > > > + * @saddr: Source address > > > > + * @daddr: Destination address > > > > + * Returns: Partial checksum of the IPv6 header > > > > + */ > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > not be what we want. > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > &ip6h->daddr can be misaligned). > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > think it will just move the misaligned access from inside the function > to the call site, where we try to marshal the parameter from something > unaligned. I haven't tested this yet, but note that this is generally okay: the problem is *dereferencing* an unaligned pointer. But if you load memory from an aligned pointer, and extract a value from this memory, it's all fine.Right, that's kind of what I'm getting at. Assuming this value starts in an unaligned buffer, then in order to pass this by value the caller will need to load from that unaligned pointer. AFAIK, the compiler will base the type of loads only on the pointed to type, which isn't changed whether we dereference in the caller or the callee.Speaking MIPS, this is not safe on all CPU models: la $1, 1002 # s1 now contains the value 1002 lw $2, 0($1) # load word from memory at 1002 + 0 into s2 but this is: la $1, 1000 # s1 now contains the value 1000 la $2, 1004 # s3 now contains the value 1004 lw $3, 0($1) # load word from memory at 1000 + 0 into s3 lw $4, 0($3) # load word from memory at 1004 + 0 into s4 sll $5, $3, 16 # 16-bit shift left s3 into s5 srl $6, $4, 16 # 16-bit shift right s4 into s6 or $2, $5, $6 # OR s5 and s6 into s2Right, but I don't think merely moving the dereference to the caller will necessarily induce the compiler to generate this rather than the former.Inconsistent if you want, but it's quite simple and it works, plus if you had half a mind (at least for UDP) to split buffers into header and payloads iovec parts... this doesn't need to be exceedingly elegant.That won't actually help this, since (for now at least) I intend to handle all the headers, including UDP, as one blob. The payload in the second iov will be the UDP payload, not the IP payload. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Fri, 1 Mar 2024 10:09:39 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:For UDP and IPv4 (from 6/9): + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + b->iph.saddr, b->iph.daddr); and for IPv6 (this patch): + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), + proto_ipv6_header_psum(b->ip6h.payload_len, + IPPROTO_UDP, + b->ip6h.saddr, + b->ip6h.daddr)); these cause loads starting from 'b', which is aligned, instead of passing 'iph' or 'ip6h', unaligned, and loading from there. This patch isn't just passing the address by value: it's also changing the load operation. It doesn't do iph->tot_len, it goes back to the load from 'b' with b->iph.tot_len.On Thu, 29 Feb 2024 09:56:25 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:Ok... but my point remains, I'm not seeing that passing the address by value actually helps - it just seems to change whether we need to handle the unaligned load in the caller or the callee.On Thu, 29 Feb 2024 19:49:09 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:...no, that's not the case. Dereferencing 'iph' from struct tcp[46]_l2_buf_t is fine: struct tcp4_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct iphdr iph; /* 20 20 */ [...] } __attribute__((__packed__)); struct tcp6_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct ipv6hdr ip6h; /* 20 40 */ [...] } __attribute__((__packed__)); The problematic structures are the UDP buffers: struct udp4_l2_buf_t { struct sockaddr_in s_in; /* 0 16 */ struct tap_hdr taph; /* 16 18 */ struct iphdr iph; /* 34 20 */ [...] } __attribute__((__aligned__(4))); and for UDP, this patch is dereferencing buffer pointers only, not pointers to headers.On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote: > On Thu, 29 Feb 2024 11:38:53 +1100 > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > > On 2/19/24 04:08, David Gibson wrote: > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > > > [...] > > > > > > > > > +/** > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > > + * IPv6 header for UDP or TCP > > > > > + * @payload_len: Payload length > > > > > + * @proto: Protocol number > > > > > + * @saddr: Source address > > > > > + * @daddr: Destination address > > > > > + * Returns: Partial checksum of the IPv6 header > > > > > + */ > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > > not be what we want. > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > > &ip6h->daddr can be misaligned). > > > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > > think it will just move the misaligned access from inside the function > > to the call site, where we try to marshal the parameter from something > > unaligned. > > I haven't tested this yet, but note that this is generally okay: the > problem is *dereferencing* an unaligned pointer. But if you load memory > from an aligned pointer, and extract a value from this memory, it's all > fine. Right, that's kind of what I'm getting at. Assuming this value starts in an unaligned buffer, then in order to pass this by value the caller will need to load from that unaligned pointer. AFAIK, the compiler will base the type of loads only on the pointed to type, which isn't changed whether we dereference in the caller or the callee. > > Speaking MIPS, this is not safe on all CPU models: > > la $1, 1002 # s1 now contains the value 1002 > lw $2, 0($1) # load word from memory at 1002 + 0 into s2 > > but this is: > > la $1, 1000 # s1 now contains the value 1000 > la $2, 1004 # s3 now contains the value 1004 > lw $3, 0($1) # load word from memory at 1000 + 0 into s3 > lw $4, 0($3) # load word from memory at 1004 + 0 into s4 > sll $5, $3, 16 # 16-bit shift left s3 into s5 > srl $6, $4, 16 # 16-bit shift right s4 into s6 > or $2, $5, $6 # OR s5 and s6 into s2 Right, but I don't think merely moving the dereference to the caller will necessarily induce the compiler to generate this rather than the former.Oh, oops, I didn't realise this was the case (I haven't reviewed the patch yet).Ah, right... and I guess you already told me. -- StefanoInconsistent if you want, but it's quite simple and it works, plus if you had half a mind (at least for UDP) to split buffers into header and payloads iovec parts... this doesn't need to be exceedingly elegant.That won't actually help this, since (for now at least) I intend to handle all the headers, including UDP, as one blob. The payload in the second iov will be the UDP payload, not the IP payload.
On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:On Fri, 1 Mar 2024 10:09:39 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and b->ip6h.payload_len. Just because we're computing the offset a bit differently doesn't change the load itself. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonOn Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:For UDP and IPv4 (from 6/9): + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + b->iph.saddr, b->iph.daddr); and for IPv6 (this patch): + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), + proto_ipv6_header_psum(b->ip6h.payload_len, + IPPROTO_UDP, + b->ip6h.saddr, + b->ip6h.daddr)); these cause loads starting from 'b', which is aligned, instead of passing 'iph' or 'ip6h', unaligned, and loading from there.On Thu, 29 Feb 2024 09:56:25 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:Ok... but my point remains, I'm not seeing that passing the address by value actually helps - it just seems to change whether we need to handle the unaligned load in the caller or the callee.On Thu, 29 Feb 2024 19:49:09 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote: > > On Thu, 29 Feb 2024 11:38:53 +1100 > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > > > On 2/19/24 04:08, David Gibson wrote: > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > > > > > [...] > > > > > > > > > > > +/** > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > > > + * IPv6 header for UDP or TCP > > > > > > + * @payload_len: Payload length > > > > > > + * @proto: Protocol number > > > > > > + * @saddr: Source address > > > > > > + * @daddr: Destination address > > > > > > + * Returns: Partial checksum of the IPv6 header > > > > > > + */ > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > > > not be what we want. > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > > > &ip6h->daddr can be misaligned). > > > > > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > > > think it will just move the misaligned access from inside the function > > > to the call site, where we try to marshal the parameter from something > > > unaligned. > > > > I haven't tested this yet, but note that this is generally okay: the > > problem is *dereferencing* an unaligned pointer. But if you load memory > > from an aligned pointer, and extract a value from this memory, it's all > > fine. > > Right, that's kind of what I'm getting at. Assuming this value starts > in an unaligned buffer, then in order to pass this by value the caller > will need to load from that unaligned pointer. AFAIK, the compiler > will base the type of loads only on the pointed to type, which isn't > changed whether we dereference in the caller or the callee. > > > > > Speaking MIPS, this is not safe on all CPU models: > > > > la $1, 1002 # s1 now contains the value 1002 > > lw $2, 0($1) # load word from memory at 1002 + 0 into s2 > > > > but this is: > > > > la $1, 1000 # s1 now contains the value 1000 > > la $2, 1004 # s3 now contains the value 1004 > > lw $3, 0($1) # load word from memory at 1000 + 0 into s3 > > lw $4, 0($3) # load word from memory at 1004 + 0 into s4 > > sll $5, $3, 16 # 16-bit shift left s3 into s5 > > srl $6, $4, 16 # 16-bit shift right s4 into s6 > > or $2, $5, $6 # OR s5 and s6 into s2 > > Right, but I don't think merely moving the dereference to the caller > will necessarily induce the compiler to generate this rather than the > former. Oh, oops, I didn't realise this was the case (I haven't reviewed the patch yet)....no, that's not the case. Dereferencing 'iph' from struct tcp[46]_l2_buf_t is fine: struct tcp4_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct iphdr iph; /* 20 20 */ [...] } __attribute__((__packed__)); struct tcp6_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct ipv6hdr ip6h; /* 20 40 */ [...] } __attribute__((__packed__)); The problematic structures are the UDP buffers: struct udp4_l2_buf_t { struct sockaddr_in s_in; /* 0 16 */ struct tap_hdr taph; /* 16 18 */ struct iphdr iph; /* 34 20 */ [...] } __attribute__((__aligned__(4))); and for UDP, this patch is dereferencing buffer pointers only, not pointers to headers.
On Mon, 4 Mar 2024 12:54:12 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:It depends how we define "loading from" -- the problem, in general, is not the memory location per se, the problem is dereferencing memory pointers. I plan to try an example on MIPS in a bit, but meanwhile, this is what I mean: lw $2, 0($1) and $1 needs to be aligned. Then, the compiler needs to know if this: lw $2, 333($1) is fine, or if there needs to be a load from another address. However, we need to give the chance to the compiler to use an aligned pointer (that is, 'b', not '&b->iph'). -- StefanoOn Fri, 1 Mar 2024 10:09:39 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and b->ip6h.payload_len.On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote:For UDP and IPv4 (from 6/9): + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + b->iph.saddr, b->iph.daddr); and for IPv6 (this patch): + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), + proto_ipv6_header_psum(b->ip6h.payload_len, + IPPROTO_UDP, + b->ip6h.saddr, + b->ip6h.daddr)); these cause loads starting from 'b', which is aligned, instead of passing 'iph' or 'ip6h', unaligned, and loading from there.On Thu, 29 Feb 2024 09:56:25 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote: > On Thu, 29 Feb 2024 19:49:09 +1100 > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote: > > > On Thu, 29 Feb 2024 11:38:53 +1100 > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > > > > On 2/19/24 04:08, David Gibson wrote: > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > > > > > > > [...] > > > > > > > > > > > > > +/** > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > > > > + * IPv6 header for UDP or TCP > > > > > > > + * @payload_len: Payload length > > > > > > > + * @proto: Protocol number > > > > > > > + * @saddr: Source address > > > > > > > + * @daddr: Destination address > > > > > > > + * Returns: Partial checksum of the IPv6 header > > > > > > > + */ > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > > > > not be what we want. > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > > > > &ip6h->daddr can be misaligned). > > > > > > > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > > > > think it will just move the misaligned access from inside the function > > > > to the call site, where we try to marshal the parameter from something > > > > unaligned. > > > > > > I haven't tested this yet, but note that this is generally okay: the > > > problem is *dereferencing* an unaligned pointer. But if you load memory > > > from an aligned pointer, and extract a value from this memory, it's all > > > fine. > > > > Right, that's kind of what I'm getting at. Assuming this value starts > > in an unaligned buffer, then in order to pass this by value the caller > > will need to load from that unaligned pointer. AFAIK, the compiler > > will base the type of loads only on the pointed to type, which isn't > > changed whether we dereference in the caller or the callee. > > > > > > > > Speaking MIPS, this is not safe on all CPU models: > > > > > > la $1, 1002 # s1 now contains the value 1002 > > > lw $2, 0($1) # load word from memory at 1002 + 0 into s2 > > > > > > but this is: > > > > > > la $1, 1000 # s1 now contains the value 1000 > > > la $2, 1004 # s3 now contains the value 1004 > > > lw $3, 0($1) # load word from memory at 1000 + 0 into s3 > > > lw $4, 0($3) # load word from memory at 1004 + 0 into s4 > > > sll $5, $3, 16 # 16-bit shift left s3 into s5 > > > srl $6, $4, 16 # 16-bit shift right s4 into s6 > > > or $2, $5, $6 # OR s5 and s6 into s2 > > > > Right, but I don't think merely moving the dereference to the caller > > will necessarily induce the compiler to generate this rather than the > > former. > > Oh, oops, I didn't realise this was the case (I haven't reviewed the > patch yet). ...no, that's not the case. Dereferencing 'iph' from struct tcp[46]_l2_buf_t is fine: struct tcp4_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct iphdr iph; /* 20 20 */ [...] } __attribute__((__packed__)); struct tcp6_l2_buf_t { uint8_t pad[2]; /* 0 2 */ struct tap_hdr taph; /* 2 18 */ struct ipv6hdr ip6h; /* 20 40 */ [...] } __attribute__((__packed__)); The problematic structures are the UDP buffers: struct udp4_l2_buf_t { struct sockaddr_in s_in; /* 0 16 */ struct tap_hdr taph; /* 16 18 */ struct iphdr iph; /* 34 20 */ [...] } __attribute__((__aligned__(4))); and for UDP, this patch is dereferencing buffer pointers only, not pointers to headers.Ok... but my point remains, I'm not seeing that passing the address by value actually helps - it just seems to change whether we need to handle the unaligned load in the caller or the callee.
On Mon, 4 Mar 2024 12:00:40 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:On Mon, 4 Mar 2024 12:54:12 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Actually, armhf first (for clarity): $ cat align.c #include <stdio.h> #include <stdint.h> struct disarray { uint8_t oops; uint32_t v1; uint32_t v2; } __attribute__((packed, aligned(__alignof__(unsigned int)))); void f1(uint32_t *v1) { *v1 += 42; } uint32_t f2(uint32_t v2) { return v2++; } int main() { struct disarray d = { 0x55, 0xaa, 0xaa }; f1(&d.v1); f2(d.v2); fprintf(stdout, "%08x %08x", d.v1, d.v2); } $ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align.c: In function ‘main’: align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 22 | f1(&d.v1); | ^~~~~ $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f1(&d.v1); 562: ab01 add r3, sp, #4 564: 3301 adds r3, #1 566: 4618 mov r0, r3 568: f7ff ffde bl 528 <f1> [...] before the call to f1(), the address in r3 is not aligned (we just added #1), despite -mno-unaligned-access. I guess gcc can only warn about that, but not fix it. This: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html says: -munaligned-access -mno-unaligned-access Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. Implying, I guess, that on those architectures unaligned accesses shouldn't be done. I think Thumb mode also has issues with this, by the way. And in f1() we just have a ldr from that address (passed on r0): void f1(uint32_t *v1) { 528: b082 sub sp, #8 52a: 9001 str r0, [sp, #4] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a $ arm-linux-gnueabihf-objdump -S --disassemble=f1 align [...] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a ...but the call to f2() is fine: we load with offset 8 from the stack pointer, shift word right, load from offset 12, shift word left, OR: $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f2(d.v2); 56c: 9b02 ldr r3, [sp, #8] 56e: 0a1b lsrs r3, r3, #8 570: f89d 200c ldrb.w r2, [sp, #12] 574: 0612 lsls r2, r2, #24 576: 4313 orrs r3, r2 578: 4618 mov r0, r3 57a: f7ff ffe0 bl 53e <f2> [...] Now on to MIPS (MIPS32): $ mips-linux-gnu-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align.c: In function ‘main’: align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 22 | f1(&d.v1); | ^~~~~ $ mips-linux-gnu-objdump -S --disassemble=main align [...] f1(&d.v1); 7bc: 27a20019 addiu v0,sp,25 7c0: 00402025 move a0,v0 7c4: 8f82802c lw v0,-32724(gp) 7c8: 0040c825 move t9,v0 7cc: 0411ffe0 bal 750 <f1> 7d0: 00000000 nop 7d4: 8fbc0010 lw gp,16(sp) [...] '&d.v1' is passed in a0, again unaligned (stack pointer plus 25). And f1() uses it just like that: $ mips-linux-gnu-objdump -S --disassemble=f1 align [...] void f1(uint32_t *v1) { 750: afa40000 sw a0,0(sp) *v1 += 42; 754: 8fa20000 lw v0,0(sp) 758: 8c420000 lw v0,0(v0) 75c: 2443002a addiu v1,v0,42 [...] while the call to f2() is, again, fine: $ mips-linux-gnu-objdump -S --disassemble=main align f2(d.v2); 7e0: 8ba2001d lwl v0,29(sp) 7e4: 9ba20020 lwr v0,32(sp) 7e8: 00402025 move a0,v0 7ec: 8f828030 lw v0,-32720(gp) 7f0: 0040c825 move t9,v0 7f4: 0411ffdf bal 774 <f2> 7f8: 00000000 nop 7fc: 8fbc0010 lw gp,16(sp) two loads, from stack pointer + 29 and stack pointer + 32. MIPS32 has lwl and lwr (the infamous US4814976A patent, now expired) to avoid load plus shift plus OR. Now, you might argue that what I'm describing here might simply be gcc's behaviour, and if gcc avoids unaligned loads as long as we don't pass unaligned pointers around, that's not any better for us -- other compilers might do things differently. And... yes, packed structures are actually a GNU extension: C standards don't say anything about loads like my f1(d.v2) call above, so all I'm showing here is that a particular compiler is fine with these accesses, but not unaligned pointers. On the other hand, this seems to be a well established behaviour, and I don't think we could realistically drop every load of unaligned *values*. Unaligned pointers, we currently don't dereference any, because gcc warns otherwise. So, practically speaking, I guess as long as we avoid dereferencing unaligned pointers, we should be fine? -- StefanoOn Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:It depends how we define "loading from" -- the problem, in general, is not the memory location per se, the problem is dereferencing memory pointers. I plan to try an example on MIPS in a bit [...]On Fri, 1 Mar 2024 10:09:39 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and b->ip6h.payload_len.On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote: > On Thu, 29 Feb 2024 09:56:25 +0100 > Stefano Brivio <sbrivio(a)redhat.com> wrote: > > > On Thu, 29 Feb 2024 19:49:09 +1100 > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote: > > > > On Thu, 29 Feb 2024 11:38:53 +1100 > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > > > > > On 2/19/24 04:08, David Gibson wrote: > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > +/** > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > > > > > + * IPv6 header for UDP or TCP > > > > > > > > + * @payload_len: Payload length > > > > > > > > + * @proto: Protocol number > > > > > > > > + * @saddr: Source address > > > > > > > > + * @daddr: Destination address > > > > > > > > + * Returns: Partial checksum of the IPv6 header > > > > > > > > + */ > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > > > > > not be what we want. > > > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > > > > > &ip6h->daddr can be misaligned). > > > > > > > > > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > > > > > think it will just move the misaligned access from inside the function > > > > > to the call site, where we try to marshal the parameter from something > > > > > unaligned. > > > > > > > > I haven't tested this yet, but note that this is generally okay: the > > > > problem is *dereferencing* an unaligned pointer. But if you load memory > > > > from an aligned pointer, and extract a value from this memory, it's all > > > > fine. > > > > > > Right, that's kind of what I'm getting at. Assuming this value starts > > > in an unaligned buffer, then in order to pass this by value the caller > > > will need to load from that unaligned pointer. AFAIK, the compiler > > > will base the type of loads only on the pointed to type, which isn't > > > changed whether we dereference in the caller or the callee. > > > > > > > > > > > Speaking MIPS, this is not safe on all CPU models: > > > > > > > > la $1, 1002 # s1 now contains the value 1002 > > > > lw $2, 0($1) # load word from memory at 1002 + 0 into s2 > > > > > > > > but this is: > > > > > > > > la $1, 1000 # s1 now contains the value 1000 > > > > la $2, 1004 # s3 now contains the value 1004 > > > > lw $3, 0($1) # load word from memory at 1000 + 0 into s3 > > > > lw $4, 0($3) # load word from memory at 1004 + 0 into s4 > > > > sll $5, $3, 16 # 16-bit shift left s3 into s5 > > > > srl $6, $4, 16 # 16-bit shift right s4 into s6 > > > > or $2, $5, $6 # OR s5 and s6 into s2 > > > > > > Right, but I don't think merely moving the dereference to the caller > > > will necessarily induce the compiler to generate this rather than the > > > former. > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the > > patch yet). > > ...no, that's not the case. Dereferencing 'iph' from > struct tcp[46]_l2_buf_t is fine: > > struct tcp4_l2_buf_t { > uint8_t pad[2]; /* 0 2 */ > struct tap_hdr taph; /* 2 18 */ > struct iphdr iph; /* 20 20 */ > [...] > } __attribute__((__packed__)); > > struct tcp6_l2_buf_t { > uint8_t pad[2]; /* 0 2 */ > struct tap_hdr taph; /* 2 18 */ > struct ipv6hdr ip6h; /* 20 40 */ > [...] > } __attribute__((__packed__)); > > The problematic structures are the UDP buffers: > > struct udp4_l2_buf_t { > struct sockaddr_in s_in; /* 0 16 */ > struct tap_hdr taph; /* 16 18 */ > struct iphdr iph; /* 34 20 */ > [...] > } __attribute__((__aligned__(4))); > > and for UDP, this patch is dereferencing buffer pointers only, not > pointers to headers. Ok... but my point remains, I'm not seeing that passing the address by value actually helps - it just seems to change whether we need to handle the unaligned load in the caller or the callee.For UDP and IPv4 (from 6/9): + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + b->iph.saddr, b->iph.daddr); and for IPv6 (this patch): + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), + proto_ipv6_header_psum(b->ip6h.payload_len, + IPPROTO_UDP, + b->ip6h.saddr, + b->ip6h.daddr)); these cause loads starting from 'b', which is aligned, instead of passing 'iph' or 'ip6h', unaligned, and loading from there.
On Mon, Mar 04, 2024 at 11:47:17PM +0100, Stefano Brivio wrote:On Mon, 4 Mar 2024 12:00:40 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:Huh. Ok, so I guess the compiler realises it's doing a load from a packed structure and generates the necessary fixup code. I thought it would only consider the type of the actually loaded value. Are you sure it still does this correctly when optimization is enabled?On Mon, 4 Mar 2024 12:54:12 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Actually, armhf first (for clarity): $ cat align.c #include <stdio.h> #include <stdint.h> struct disarray { uint8_t oops; uint32_t v1; uint32_t v2; } __attribute__((packed, aligned(__alignof__(unsigned int)))); void f1(uint32_t *v1) { *v1 += 42; } uint32_t f2(uint32_t v2) { return v2++; } int main() { struct disarray d = { 0x55, 0xaa, 0xaa }; f1(&d.v1); f2(d.v2); fprintf(stdout, "%08x %08x", d.v1, d.v2); } $ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align.c: In function ‘main’: align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 22 | f1(&d.v1); | ^~~~~ $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f1(&d.v1); 562: ab01 add r3, sp, #4 564: 3301 adds r3, #1 566: 4618 mov r0, r3 568: f7ff ffde bl 528 <f1> [...] before the call to f1(), the address in r3 is not aligned (we just added #1), despite -mno-unaligned-access. I guess gcc can only warn about that, but not fix it. This: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html says: -munaligned-access -mno-unaligned-access Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. Implying, I guess, that on those architectures unaligned accesses shouldn't be done. I think Thumb mode also has issues with this, by the way. And in f1() we just have a ldr from that address (passed on r0): void f1(uint32_t *v1) { 528: b082 sub sp, #8 52a: 9001 str r0, [sp, #4] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a $ arm-linux-gnueabihf-objdump -S --disassemble=f1 align [...] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a ...but the call to f2() is fine: we load with offset 8 from the stack pointer, shift word right, load from offset 12, shift word left, OR: $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f2(d.v2); 56c: 9b02 ldr r3, [sp, #8] 56e: 0a1b lsrs r3, r3, #8 570: f89d 200c ldrb.w r2, [sp, #12] 574: 0612 lsls r2, r2, #24 576: 4313 orrs r3, r2 578: 4618 mov r0, r3 57a: f7ff ffe0 bl 53e <f2> [...]On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote:It depends how we define "loading from" -- the problem, in general, is not the memory location per se, the problem is dereferencing memory pointers. I plan to try an example on MIPS in a bit [...]On Fri, 1 Mar 2024 10:09:39 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote: > > On Thu, 29 Feb 2024 09:56:25 +0100 > > Stefano Brivio <sbrivio(a)redhat.com> wrote: > > > > > On Thu, 29 Feb 2024 19:49:09 +1100 > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote: > > > > > On Thu, 29 Feb 2024 11:38:53 +1100 > > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > > > > > > On 2/19/24 04:08, David Gibson wrote: > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > +/** > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > > > > > > + * IPv6 header for UDP or TCP > > > > > > > > > + * @payload_len: Payload length > > > > > > > > > + * @proto: Protocol number > > > > > > > > > + * @saddr: Source address > > > > > > > > > + * @daddr: Destination address > > > > > > > > > + * Returns: Partial checksum of the IPv6 header > > > > > > > > > + */ > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > > > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > > > > > > not be what we want. > > > > > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > > > > > > &ip6h->daddr can be misaligned). > > > > > > > > > > > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > > > > > > think it will just move the misaligned access from inside the function > > > > > > to the call site, where we try to marshal the parameter from something > > > > > > unaligned. > > > > > > > > > > I haven't tested this yet, but note that this is generally okay: the > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory > > > > > from an aligned pointer, and extract a value from this memory, it's all > > > > > fine. > > > > > > > > Right, that's kind of what I'm getting at. Assuming this value starts > > > > in an unaligned buffer, then in order to pass this by value the caller > > > > will need to load from that unaligned pointer. AFAIK, the compiler > > > > will base the type of loads only on the pointed to type, which isn't > > > > changed whether we dereference in the caller or the callee. > > > > > > > > > > > > > > Speaking MIPS, this is not safe on all CPU models: > > > > > > > > > > la $1, 1002 # s1 now contains the value 1002 > > > > > lw $2, 0($1) # load word from memory at 1002 + 0 into s2 > > > > > > > > > > but this is: > > > > > > > > > > la $1, 1000 # s1 now contains the value 1000 > > > > > la $2, 1004 # s3 now contains the value 1004 > > > > > lw $3, 0($1) # load word from memory at 1000 + 0 into s3 > > > > > lw $4, 0($3) # load word from memory at 1004 + 0 into s4 > > > > > sll $5, $3, 16 # 16-bit shift left s3 into s5 > > > > > srl $6, $4, 16 # 16-bit shift right s4 into s6 > > > > > or $2, $5, $6 # OR s5 and s6 into s2 > > > > > > > > Right, but I don't think merely moving the dereference to the caller > > > > will necessarily induce the compiler to generate this rather than the > > > > former. > > > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the > > > patch yet). > > > > ...no, that's not the case. Dereferencing 'iph' from > > struct tcp[46]_l2_buf_t is fine: > > > > struct tcp4_l2_buf_t { > > uint8_t pad[2]; /* 0 2 */ > > struct tap_hdr taph; /* 2 18 */ > > struct iphdr iph; /* 20 20 */ > > [...] > > } __attribute__((__packed__)); > > > > struct tcp6_l2_buf_t { > > uint8_t pad[2]; /* 0 2 */ > > struct tap_hdr taph; /* 2 18 */ > > struct ipv6hdr ip6h; /* 20 40 */ > > [...] > > } __attribute__((__packed__)); > > > > The problematic structures are the UDP buffers: > > > > struct udp4_l2_buf_t { > > struct sockaddr_in s_in; /* 0 16 */ > > struct tap_hdr taph; /* 16 18 */ > > struct iphdr iph; /* 34 20 */ > > [...] > > } __attribute__((__aligned__(4))); > > > > and for UDP, this patch is dereferencing buffer pointers only, not > > pointers to headers. > > Ok... but my point remains, I'm not seeing that passing the address by > value actually helps - it just seems to change whether we need to > handle the unaligned load in the caller or the callee. For UDP and IPv4 (from 6/9): + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, + b->iph.saddr, b->iph.daddr); and for IPv6 (this patch): + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), + proto_ipv6_header_psum(b->ip6h.payload_len, + IPPROTO_UDP, + b->ip6h.saddr, + b->ip6h.daddr)); these cause loads starting from 'b', which is aligned, instead of passing 'iph' or 'ip6h', unaligned, and loading from there.No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and b->ip6h.payload_len.Now on to MIPS (MIPS32): $ mips-linux-gnu-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align.c: In function ‘main’: align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 22 | f1(&d.v1); | ^~~~~ $ mips-linux-gnu-objdump -S --disassemble=main align [...] f1(&d.v1); 7bc: 27a20019 addiu v0,sp,25 7c0: 00402025 move a0,v0 7c4: 8f82802c lw v0,-32724(gp) 7c8: 0040c825 move t9,v0 7cc: 0411ffe0 bal 750 <f1> 7d0: 00000000 nop 7d4: 8fbc0010 lw gp,16(sp) [...] '&d.v1' is passed in a0, again unaligned (stack pointer plus 25). And f1() uses it just like that: $ mips-linux-gnu-objdump -S --disassemble=f1 align [...] void f1(uint32_t *v1) { 750: afa40000 sw a0,0(sp) *v1 += 42; 754: 8fa20000 lw v0,0(sp) 758: 8c420000 lw v0,0(v0) 75c: 2443002a addiu v1,v0,42 [...] while the call to f2() is, again, fine: $ mips-linux-gnu-objdump -S --disassemble=main align f2(d.v2); 7e0: 8ba2001d lwl v0,29(sp) 7e4: 9ba20020 lwr v0,32(sp) 7e8: 00402025 move a0,v0 7ec: 8f828030 lw v0,-32720(gp) 7f0: 0040c825 move t9,v0 7f4: 0411ffdf bal 774 <f2> 7f8: 00000000 nop 7fc: 8fbc0010 lw gp,16(sp) two loads, from stack pointer + 29 and stack pointer + 32. MIPS32 has lwl and lwr (the infamous US4814976A patent, now expired) to avoid load plus shift plus OR. Now, you might argue that what I'm describing here might simply be gcc's behaviour, and if gcc avoids unaligned loads as long as we don't pass unaligned pointers around, that's not any better for us -- other compilers might do things differently. And... yes, packed structures are actually a GNU extension: C standards don't say anything about loads like my f1(d.v2) call above, so all I'm showing here is that a particular compiler is fine with these accesses, but not unaligned pointers. On the other hand, this seems to be a well established behaviour, and I don't think we could realistically drop every load of unaligned *values*. Unaligned pointers, we currently don't dereference any, because gcc warns otherwise. So, practically speaking, I guess as long as we avoid dereferencing unaligned pointers, we should be fine?-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Wed, 6 Mar 2024 16:09:23 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Mon, Mar 04, 2024 at 11:47:17PM +0100, Stefano Brivio wrote:Well, it can't just do that, because otherwise we couldn't use packed structures on any architecture that doesn't support unaligned accesses, right? Once you pass a pointer to an unaligned value, though, the fixup information is lost, and the compiler models a function as simply taking a given pointer with a given type: the model doesn't include information as to where the value is stored.On Mon, 4 Mar 2024 12:00:40 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:Huh. Ok, so I guess the compiler realises it's doing a load from a packed structure and generates the necessary fixup code. I thought it would only consider the type of the actually loaded value.On Mon, 4 Mar 2024 12:54:12 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Actually, armhf first (for clarity): $ cat align.c #include <stdio.h> #include <stdint.h> struct disarray { uint8_t oops; uint32_t v1; uint32_t v2; } __attribute__((packed, aligned(__alignof__(unsigned int)))); void f1(uint32_t *v1) { *v1 += 42; } uint32_t f2(uint32_t v2) { return v2++; } int main() { struct disarray d = { 0x55, 0xaa, 0xaa }; f1(&d.v1); f2(d.v2); fprintf(stdout, "%08x %08x", d.v1, d.v2); } $ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align.c: In function ‘main’: align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 22 | f1(&d.v1); | ^~~~~ $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f1(&d.v1); 562: ab01 add r3, sp, #4 564: 3301 adds r3, #1 566: 4618 mov r0, r3 568: f7ff ffde bl 528 <f1> [...] before the call to f1(), the address in r3 is not aligned (we just added #1), despite -mno-unaligned-access. I guess gcc can only warn about that, but not fix it. This: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html says: -munaligned-access -mno-unaligned-access Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. Implying, I guess, that on those architectures unaligned accesses shouldn't be done. I think Thumb mode also has issues with this, by the way. And in f1() we just have a ldr from that address (passed on r0): void f1(uint32_t *v1) { 528: b082 sub sp, #8 52a: 9001 str r0, [sp, #4] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a $ arm-linux-gnueabihf-objdump -S --disassemble=f1 align [...] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a ...but the call to f2() is fine: we load with offset 8 from the stack pointer, shift word right, load from offset 12, shift word left, OR: $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f2(d.v2); 56c: 9b02 ldr r3, [sp, #8] 56e: 0a1b lsrs r3, r3, #8 570: f89d 200c ldrb.w r2, [sp, #12] 574: 0612 lsls r2, r2, #24 576: 4313 orrs r3, r2 578: 4618 mov r0, r3 57a: f7ff ffe0 bl 53e <f2> [...]On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote: > On Fri, 1 Mar 2024 10:09:39 +1100 > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote: > > > On Thu, 29 Feb 2024 09:56:25 +0100 > > > Stefano Brivio <sbrivio(a)redhat.com> wrote: > > > > > > > On Thu, 29 Feb 2024 19:49:09 +1100 > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote: > > > > > > On Thu, 29 Feb 2024 11:38:53 +1100 > > > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > > > > > > > On 2/19/24 04:08, David Gibson wrote: > > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > +/** > > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > > > > > > > + * IPv6 header for UDP or TCP > > > > > > > > > > + * @payload_len: Payload length > > > > > > > > > > + * @proto: Protocol number > > > > > > > > > > + * @saddr: Source address > > > > > > > > > > + * @daddr: Destination address > > > > > > > > > > + * Returns: Partial checksum of the IPv6 header > > > > > > > > > > + */ > > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > > > > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > > > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > > > > > > > not be what we want. > > > > > > > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > > > > > > > &ip6h->daddr can be misaligned). > > > > > > > > > > > > > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > > > > > > > think it will just move the misaligned access from inside the function > > > > > > > to the call site, where we try to marshal the parameter from something > > > > > > > unaligned. > > > > > > > > > > > > I haven't tested this yet, but note that this is generally okay: the > > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory > > > > > > from an aligned pointer, and extract a value from this memory, it's all > > > > > > fine. > > > > > > > > > > Right, that's kind of what I'm getting at. Assuming this value starts > > > > > in an unaligned buffer, then in order to pass this by value the caller > > > > > will need to load from that unaligned pointer. AFAIK, the compiler > > > > > will base the type of loads only on the pointed to type, which isn't > > > > > changed whether we dereference in the caller or the callee. > > > > > > > > > > > > > > > > > Speaking MIPS, this is not safe on all CPU models: > > > > > > > > > > > > la $1, 1002 # s1 now contains the value 1002 > > > > > > lw $2, 0($1) # load word from memory at 1002 + 0 into s2 > > > > > > > > > > > > but this is: > > > > > > > > > > > > la $1, 1000 # s1 now contains the value 1000 > > > > > > la $2, 1004 # s3 now contains the value 1004 > > > > > > lw $3, 0($1) # load word from memory at 1000 + 0 into s3 > > > > > > lw $4, 0($3) # load word from memory at 1004 + 0 into s4 > > > > > > sll $5, $3, 16 # 16-bit shift left s3 into s5 > > > > > > srl $6, $4, 16 # 16-bit shift right s4 into s6 > > > > > > or $2, $5, $6 # OR s5 and s6 into s2 > > > > > > > > > > Right, but I don't think merely moving the dereference to the caller > > > > > will necessarily induce the compiler to generate this rather than the > > > > > former. > > > > > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the > > > > patch yet). > > > > > > ...no, that's not the case. Dereferencing 'iph' from > > > struct tcp[46]_l2_buf_t is fine: > > > > > > struct tcp4_l2_buf_t { > > > uint8_t pad[2]; /* 0 2 */ > > > struct tap_hdr taph; /* 2 18 */ > > > struct iphdr iph; /* 20 20 */ > > > [...] > > > } __attribute__((__packed__)); > > > > > > struct tcp6_l2_buf_t { > > > uint8_t pad[2]; /* 0 2 */ > > > struct tap_hdr taph; /* 2 18 */ > > > struct ipv6hdr ip6h; /* 20 40 */ > > > [...] > > > } __attribute__((__packed__)); > > > > > > The problematic structures are the UDP buffers: > > > > > > struct udp4_l2_buf_t { > > > struct sockaddr_in s_in; /* 0 16 */ > > > struct tap_hdr taph; /* 16 18 */ > > > struct iphdr iph; /* 34 20 */ > > > [...] > > > } __attribute__((__aligned__(4))); > > > > > > and for UDP, this patch is dereferencing buffer pointers only, not > > > pointers to headers. > > > > Ok... but my point remains, I'm not seeing that passing the address by > > value actually helps - it just seems to change whether we need to > > handle the unaligned load in the caller or the callee. > > For UDP and IPv4 (from 6/9): > > + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, > + b->iph.saddr, b->iph.daddr); > > and for IPv6 (this patch): > > + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), > + proto_ipv6_header_psum(b->ip6h.payload_len, > + IPPROTO_UDP, > + b->ip6h.saddr, > + b->ip6h.daddr)); > > these cause loads starting from 'b', which is aligned, instead of > passing 'iph' or 'ip6h', unaligned, and loading from there. No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and b->ip6h.payload_len.It depends how we define "loading from" -- the problem, in general, is not the memory location per se, the problem is dereferencing memory pointers. I plan to try an example on MIPS in a bit [...]Are you sure it still does this correctly when optimization is enabled?I'm fairly sure, even though I couldn't find a "normative" reference (at least as far as GNU extensions are concerned) guaranteeing that (but I didn't try hard). I had to move f1() and f2() to different compilation units, otherwise with -O2 they get merged into main(): $ cat align.c #include <stdio.h> #include <stdint.h> struct disarray { uint8_t oops; uint32_t v1; uint32_t v2; } __attribute__((packed, aligned(__alignof__(unsigned int)))); void f1(uint32_t *v1); uint32_t f2(uint32_t v2); int main() { struct disarray d = { 0x55, 0xaa, 0xaa }; f1(&d.v1); f2(d.v2); fprintf(stdout, "%08x %08x", d.v1, d.v2); } $ cat align_f1.c #include <stdint.h> void f1(uint32_t *v1) { *v1 += 42; } $ cat align_f2.c #include <stdint.h> uint32_t f2(uint32_t v2) { return v2++; } $ arm-linux-gnueabihf-gcc-12 -g -O2 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align_f1.c align_f2.c align.c: In function ‘main’: align.c:18:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 18 | f1(&d.v1); | ^~~~~ ...note the usual dance before f2() is called, but not before the call to f1(): $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] struct disarray d = { 0x55, 0xaa, 0xaa }; 43e: e903 0007 stmdb r3, {r0, r1, r2} f1(&d.v1); 442: f10d 0005 add.w r0, sp, #5 446: f000 f8a5 bl 594 <f1> f2(d.v2); 44a: f89d 300c ldrb.w r3, [sp, #12] 44e: 9802 ldr r0, [sp, #8] 450: 061b lsls r3, r3, #24 452: ea43 2010 orr.w r0, r3, r0, lsr #8 456: f000 f8a1 bl 59c <f2> [...] -- Stefano
On Fri, Mar 08, 2024 at 01:08:45AM +0100, Stefano Brivio wrote:On Wed, 6 Mar 2024 16:09:23 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Hm.. yeah, I guess not, I hadn't thought it through that way. I find it difficult to reason about what will happen with packed structures, since they explicitly break rules which are otherwise pretty much universal.On Mon, Mar 04, 2024 at 11:47:17PM +0100, Stefano Brivio wrote:Well, it can't just do that, because otherwise we couldn't use packed structures on any architecture that doesn't support unaligned accesses, right?On Mon, 4 Mar 2024 12:00:40 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:Huh. Ok, so I guess the compiler realises it's doing a load from a packed structure and generates the necessary fixup code. I thought it would only consider the type of the actually loaded value.On Mon, 4 Mar 2024 12:54:12 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: > On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote: > > On Fri, 1 Mar 2024 10:09:39 +1100 > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote: > > > > On Thu, 29 Feb 2024 09:56:25 +0100 > > > > Stefano Brivio <sbrivio(a)redhat.com> wrote: > > > > > > > > > On Thu, 29 Feb 2024 19:49:09 +1100 > > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote: > > > > > > > On Thu, 29 Feb 2024 11:38:53 +1100 > > > > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > > > > > > > > On 2/19/24 04:08, David Gibson wrote: > > > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > > > +/** > > > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > > > > > > > > + * IPv6 header for UDP or TCP > > > > > > > > > > > + * @payload_len: Payload length > > > > > > > > > > > + * @proto: Protocol number > > > > > > > > > > > + * @saddr: Source address > > > > > > > > > > > + * @daddr: Destination address > > > > > > > > > > > + * Returns: Partial checksum of the IPv6 header > > > > > > > > > > > + */ > > > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > > > > > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > > > > > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > > > > > > > > not be what we want. > > > > > > > > > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > > > > > > > > &ip6h->daddr can be misaligned). > > > > > > > > > > > > > > > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > > > > > > > > think it will just move the misaligned access from inside the function > > > > > > > > to the call site, where we try to marshal the parameter from something > > > > > > > > unaligned. > > > > > > > > > > > > > > I haven't tested this yet, but note that this is generally okay: the > > > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory > > > > > > > from an aligned pointer, and extract a value from this memory, it's all > > > > > > > fine. > > > > > > > > > > > > Right, that's kind of what I'm getting at. Assuming this value starts > > > > > > in an unaligned buffer, then in order to pass this by value the caller > > > > > > will need to load from that unaligned pointer. AFAIK, the compiler > > > > > > will base the type of loads only on the pointed to type, which isn't > > > > > > changed whether we dereference in the caller or the callee. > > > > > > > > > > > > > > > > > > > > Speaking MIPS, this is not safe on all CPU models: > > > > > > > > > > > > > > la $1, 1002 # s1 now contains the value 1002 > > > > > > > lw $2, 0($1) # load word from memory at 1002 + 0 into s2 > > > > > > > > > > > > > > but this is: > > > > > > > > > > > > > > la $1, 1000 # s1 now contains the value 1000 > > > > > > > la $2, 1004 # s3 now contains the value 1004 > > > > > > > lw $3, 0($1) # load word from memory at 1000 + 0 into s3 > > > > > > > lw $4, 0($3) # load word from memory at 1004 + 0 into s4 > > > > > > > sll $5, $3, 16 # 16-bit shift left s3 into s5 > > > > > > > srl $6, $4, 16 # 16-bit shift right s4 into s6 > > > > > > > or $2, $5, $6 # OR s5 and s6 into s2 > > > > > > > > > > > > Right, but I don't think merely moving the dereference to the caller > > > > > > will necessarily induce the compiler to generate this rather than the > > > > > > former. > > > > > > > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the > > > > > patch yet). > > > > > > > > ...no, that's not the case. Dereferencing 'iph' from > > > > struct tcp[46]_l2_buf_t is fine: > > > > > > > > struct tcp4_l2_buf_t { > > > > uint8_t pad[2]; /* 0 2 */ > > > > struct tap_hdr taph; /* 2 18 */ > > > > struct iphdr iph; /* 20 20 */ > > > > [...] > > > > } __attribute__((__packed__)); > > > > > > > > struct tcp6_l2_buf_t { > > > > uint8_t pad[2]; /* 0 2 */ > > > > struct tap_hdr taph; /* 2 18 */ > > > > struct ipv6hdr ip6h; /* 20 40 */ > > > > [...] > > > > } __attribute__((__packed__)); > > > > > > > > The problematic structures are the UDP buffers: > > > > > > > > struct udp4_l2_buf_t { > > > > struct sockaddr_in s_in; /* 0 16 */ > > > > struct tap_hdr taph; /* 16 18 */ > > > > struct iphdr iph; /* 34 20 */ > > > > [...] > > > > } __attribute__((__aligned__(4))); > > > > > > > > and for UDP, this patch is dereferencing buffer pointers only, not > > > > pointers to headers. > > > > > > Ok... but my point remains, I'm not seeing that passing the address by > > > value actually helps - it just seems to change whether we need to > > > handle the unaligned load in the caller or the callee. > > > > For UDP and IPv4 (from 6/9): > > > > + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, > > + b->iph.saddr, b->iph.daddr); > > > > and for IPv6 (this patch): > > > > + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), > > + proto_ipv6_header_psum(b->ip6h.payload_len, > > + IPPROTO_UDP, > > + b->ip6h.saddr, > > + b->ip6h.daddr)); > > > > these cause loads starting from 'b', which is aligned, instead of > > passing 'iph' or 'ip6h', unaligned, and loading from there. > > No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and > b->ip6h.payload_len. It depends how we define "loading from" -- the problem, in general, is not the memory location per se, the problem is dereferencing memory pointers. I plan to try an example on MIPS in a bit [...]Actually, armhf first (for clarity): $ cat align.c #include <stdio.h> #include <stdint.h> struct disarray { uint8_t oops; uint32_t v1; uint32_t v2; } __attribute__((packed, aligned(__alignof__(unsigned int)))); void f1(uint32_t *v1) { *v1 += 42; } uint32_t f2(uint32_t v2) { return v2++; } int main() { struct disarray d = { 0x55, 0xaa, 0xaa }; f1(&d.v1); f2(d.v2); fprintf(stdout, "%08x %08x", d.v1, d.v2); } $ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align.c: In function ‘main’: align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 22 | f1(&d.v1); | ^~~~~ $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f1(&d.v1); 562: ab01 add r3, sp, #4 564: 3301 adds r3, #1 566: 4618 mov r0, r3 568: f7ff ffde bl 528 <f1> [...] before the call to f1(), the address in r3 is not aligned (we just added #1), despite -mno-unaligned-access. I guess gcc can only warn about that, but not fix it. This: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html says: -munaligned-access -mno-unaligned-access Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. Implying, I guess, that on those architectures unaligned accesses shouldn't be done. I think Thumb mode also has issues with this, by the way. And in f1() we just have a ldr from that address (passed on r0): void f1(uint32_t *v1) { 528: b082 sub sp, #8 52a: 9001 str r0, [sp, #4] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a $ arm-linux-gnueabihf-objdump -S --disassemble=f1 align [...] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a ...but the call to f2() is fine: we load with offset 8 from the stack pointer, shift word right, load from offset 12, shift word left, OR: $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f2(d.v2); 56c: 9b02 ldr r3, [sp, #8] 56e: 0a1b lsrs r3, r3, #8 570: f89d 200c ldrb.w r2, [sp, #12] 574: 0612 lsls r2, r2, #24 576: 4313 orrs r3, r2 578: 4618 mov r0, r3 57a: f7ff ffe0 bl 53e <f2> [...]Once you pass a pointer to an unaligned value, though, the fixup information is lost, and the compiler models a function as simply taking a given pointer with a given type: the model doesn't include information as to where the value is stored.Right. Everything you've said about this now makes sense to me with this realisation. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Use ethhdr rather than tap_hdr. Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> --- Notes: v3: - add David's R-b v2: - update function comment - move the patch earlier in the series tap.c | 10 +++++----- tap.h | 2 +- tcp.c | 8 ++++---- udp.c | 4 ++-- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/tap.c b/tap.c index 2d590e8525a0..d555f59cb6f7 100644 --- a/tap.c +++ b/tap.c @@ -443,18 +443,18 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n) } /** - * tap_update_mac() - Update tap L2 header with new Ethernet addresses - * @taph: Tap headers to update + * eth_update_mac() - Update tap L2 header with new Ethernet addresses + * @eh: Ethernet headers to update * @eth_d: Ethernet destination address, NULL if unchanged * @eth_s: Ethernet source address, NULL if unchanged */ -void tap_update_mac(struct tap_hdr *taph, +void eth_update_mac(struct ethhdr *eh, const unsigned char *eth_d, const unsigned char *eth_s) { if (eth_d) - memcpy(taph->eh.h_dest, eth_d, sizeof(taph->eh.h_dest)); + memcpy(eh->h_dest, eth_d, sizeof(eh->h_dest)); if (eth_s) - memcpy(taph->eh.h_source, eth_s, sizeof(taph->eh.h_source)); + memcpy(eh->h_source, eth_s, sizeof(eh->h_source)); } PACKET_POOL_DECL(pool_l4, UIO_MAXIOV, pkt_buf); diff --git a/tap.h b/tap.h index 466d91466c3d..437b9aa2b43f 100644 --- a/tap.h +++ b/tap.h @@ -74,7 +74,7 @@ void tap_icmp6_send(const struct ctx *c, const void *in, size_t len); int tap_send(const struct ctx *c, const void *data, size_t len); size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n); -void tap_update_mac(struct tap_hdr *taph, +void eth_update_mac(struct ethhdr *eh, const unsigned char *eth_d, const unsigned char *eth_s); void tap_listen_handler(struct ctx *c, uint32_t events); void tap_handler_pasta(struct ctx *c, uint32_t events, diff --git a/tcp.c b/tcp.c index 8ee252131504..aa03c20712f6 100644 --- a/tcp.c +++ b/tcp.c @@ -978,10 +978,10 @@ void tcp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s) struct tcp4_l2_buf_t *b4 = &tcp4_l2_buf[i]; struct tcp6_l2_buf_t *b6 = &tcp6_l2_buf[i]; - tap_update_mac(&b4->taph, eth_d, eth_s); - tap_update_mac(&b6->taph, eth_d, eth_s); - tap_update_mac(&b4f->taph, eth_d, eth_s); - tap_update_mac(&b6f->taph, eth_d, eth_s); + eth_update_mac(&b4->taph.eh, eth_d, eth_s); + eth_update_mac(&b6->taph.eh, eth_d, eth_s); + eth_update_mac(&b4f->taph.eh, eth_d, eth_s); + eth_update_mac(&b6f->taph.eh, eth_d, eth_s); } } diff --git a/udp.c b/udp.c index e123b35a955c..dc7f4559b50d 100644 --- a/udp.c +++ b/udp.c @@ -283,8 +283,8 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s) struct udp4_l2_buf_t *b4 = &udp4_l2_buf[i]; struct udp6_l2_buf_t *b6 = &udp6_l2_buf[i]; - tap_update_mac(&b4->taph, eth_d, eth_s); - tap_update_mac(&b6->taph, eth_d, eth_s); + eth_update_mac(&b4->taph.eh, eth_d, eth_s); + eth_update_mac(&b6->taph.eh, eth_d, eth_s); } } -- 2.42.0
Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function tcp_fill_header(). Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to tcp_fill_ipv4_header() and tcp_fill_ipv6_header() Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - add to sub-series part 1 v2: - extract header filling functions from "tcp: extract buffer management from tcp_send_flag()" - rename them tcp_fill_flag_header()/tcp_fill_ipv4_header(), tcp_fill_ipv6_header() - use upside-down Christmas tree arguments order - replace (void *) by (struct tcphdr *) tcp.c | 154 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 104 insertions(+), 50 deletions(-) diff --git a/tcp.c b/tcp.c index aa03c20712f6..bc57a4f6e611 100644 --- a/tcp.c +++ b/tcp.c @@ -1324,6 +1324,108 @@ void tcp_defer_handler(struct ctx *c) tcp_l2_data_buf_flush(c); } +/** + * tcp_fill_header() - Fill the TCP header fields for a given TCP segment. + * + * @th: Pointer to the TCP header structure + * @conn: Pointer to the TCP connection structure + * @seq: Sequence number + */ +static void tcp_fill_header(struct tcphdr *th, + const struct tcp_tap_conn *conn, uint32_t seq) +{ + th->source = htons(conn->fport); + th->dest = htons(conn->eport); + th->seq = htonl(seq); + th->ack_seq = htonl(conn->seq_ack_to_tap); + if (conn->events & ESTABLISHED) { + th->window = htons(conn->wnd_to_tap); + } else { + unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; + + th->window = htons(MIN(wnd, USHRT_MAX)); + } +} + +/** + * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers + * @c: Execution context + * @conn: Connection pointer + * @iph: Pointer to IPv4 header, immediately followed by a TCP header + * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: IP frame length including L2 headers, host order + */ +static size_t tcp_fill_ipv4_header(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct iphdr *iph, size_t plen, + const uint16_t *check, uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); + const struct in_addr *a4 = inany_v4(&conn->faddr); + struct tcphdr *th = (struct tcphdr *)(iph + 1); + + iph->tot_len = htons(ip_len); + iph->saddr = a4->s_addr; + iph->daddr = c->ip4.addr_seen.s_addr; + + iph->check = check ? *check : + csum_ip4_header(iph->tot_len, IPPROTO_TCP, + iph->saddr, iph->daddr); + + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp4(iph); + + return ip_len; +} + +/** + * tcp_fill_ipv6_header() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers + * @c: Execution context + * @conn: Connection pointer + * @ip6h: Pointer to IPv6 header, immediately followed by a TCP header + * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: The total length of the IPv6 packet, host order + */ +static size_t tcp_fill_ipv6_header(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct ipv6hdr *ip6h, size_t plen, + uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); + struct tcphdr *th = (struct tcphdr *)(ip6h + 1); + + ip6h->payload_len = htons(plen + sizeof(struct tcphdr)); + ip6h->saddr = conn->faddr.a6; + if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr)) + ip6h->daddr = c->ip6.addr_ll_seen; + else + ip6h->daddr = c->ip6.addr_seen; + + memset(ip6h->flow_lbl, 0, 3); + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp6(ip6h); + + ip6h->hop_limit = 255; + ip6h->version = 6; + ip6h->nexthdr = IPPROTO_TCP; + + ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf; + ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff; + ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff; + + return ip_len; +} + /** * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers * @c: Execution context @@ -1343,67 +1445,19 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c, const struct in_addr *a4 = inany_v4(&conn->faddr); size_t ip_len, tlen; -#define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq) \ -do { \ - b->th.source = htons(conn->fport); \ - b->th.dest = htons(conn->eport); \ - b->th.seq = htonl(seq); \ - b->th.ack_seq = htonl(conn->seq_ack_to_tap); \ - if (conn->events & ESTABLISHED) { \ - b->th.window = htons(conn->wnd_to_tap); \ - } else { \ - unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; \ - \ - b->th.window = htons(MIN(wnd, USHRT_MAX)); \ - } \ -} while (0) - if (a4) { struct tcp4_l2_buf_t *b = (struct tcp4_l2_buf_t *)p; - ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); - b->iph.tot_len = htons(ip_len); - b->iph.saddr = a4->s_addr; - b->iph.daddr = c->ip4.addr_seen.s_addr; - - b->iph.check = check ? *check : - csum_ip4_header(b->iph.tot_len, IPPROTO_TCP, - b->iph.saddr, b->iph.daddr); - - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - - tcp_update_check_tcp4(&b->iph); + ip_len = tcp_fill_ipv4_header(c, conn, &b->iph, plen, check, seq); tlen = tap_iov_len(c, &b->taph, ip_len); } else { struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p; - ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); - - b->ip6h.payload_len = htons(plen + sizeof(struct tcphdr)); - b->ip6h.saddr = conn->faddr.a6; - if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr)) - b->ip6h.daddr = c->ip6.addr_ll_seen; - else - b->ip6h.daddr = c->ip6.addr_seen; - - memset(b->ip6h.flow_lbl, 0, 3); - - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - - tcp_update_check_tcp6(&b->ip6h); - - b->ip6h.hop_limit = 255; - b->ip6h.version = 6; - b->ip6h.nexthdr = IPPROTO_TCP; - - b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf; - b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff; - b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff; + ip_len = tcp_fill_ipv6_header(c, conn, &b->ip6h, plen, seq); tlen = tap_iov_len(c, &b->taph, ip_len); } -#undef SET_TCP_HEADER_COMMON_V4_V6 return tlen; } -- 2.42.0
On Sat, Feb 17, 2024 at 04:07:25PM +0100, Laurent Vivier wrote:Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function tcp_fill_header(). Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to tcp_fill_ipv4_header() and tcp_fill_ipv6_header() Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - add to sub-series part 1 v2: - extract header filling functions from "tcp: extract buffer management from tcp_send_flag()" - rename them tcp_fill_flag_header()/tcp_fill_ipv4_header(), tcp_fill_ipv6_header() - use upside-down Christmas tree arguments order - replace (void *) by (struct tcphdr *) tcp.c | 154 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 104 insertions(+), 50 deletions(-) diff --git a/tcp.c b/tcp.c index aa03c20712f6..bc57a4f6e611 100644 --- a/tcp.c +++ b/tcp.c @@ -1324,6 +1324,108 @@ void tcp_defer_handler(struct ctx *c) tcp_l2_data_buf_flush(c); } +/** + * tcp_fill_header() - Fill the TCP header fields for a given TCP segment. + * + * @th: Pointer to the TCP header structure + * @conn: Pointer to the TCP connection structure + * @seq: Sequence number + */ +static void tcp_fill_header(struct tcphdr *th, + const struct tcp_tap_conn *conn, uint32_t seq) +{ + th->source = htons(conn->fport); + th->dest = htons(conn->eport); + th->seq = htonl(seq); + th->ack_seq = htonl(conn->seq_ack_to_tap); + if (conn->events & ESTABLISHED) { + th->window = htons(conn->wnd_to_tap); + } else { + unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; + + th->window = htons(MIN(wnd, USHRT_MAX)); + } +} + +/** + * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers + * @c: Execution context + * @conn: Connection pointer + * @iph: Pointer to IPv4 header, immediately followed by a TCP header + * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: IP frame length including L2 headers, host order + */ +static size_t tcp_fill_ipv4_header(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct iphdr *iph, size_t plen, + const uint16_t *check, uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); + const struct in_addr *a4 = inany_v4(&conn->faddr); + struct tcphdr *th = (struct tcphdr *)(iph + 1); + + iph->tot_len = htons(ip_len); + iph->saddr = a4->s_addr; + iph->daddr = c->ip4.addr_seen.s_addr; + + iph->check = check ? *check : + csum_ip4_header(iph->tot_len, IPPROTO_TCP, + iph->saddr, iph->daddr); + + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp4(iph); + + return ip_len; +} + +/** + * tcp_fill_ipv6_header() - Fill 802.3, IPv6, TCP headers in pre-cooked buffers + * @c: Execution context + * @conn: Connection pointer + * @ip6h: Pointer to IPv6 header, immediately followed by a TCP header + * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: The total length of the IPv6 packet, host order + */ +static size_t tcp_fill_ipv6_header(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct ipv6hdr *ip6h, size_t plen, + uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); + struct tcphdr *th = (struct tcphdr *)(ip6h + 1); + + ip6h->payload_len = htons(plen + sizeof(struct tcphdr)); + ip6h->saddr = conn->faddr.a6; + if (IN6_IS_ADDR_LINKLOCAL(&ip6h->saddr)) + ip6h->daddr = c->ip6.addr_ll_seen; + else + ip6h->daddr = c->ip6.addr_seen; + + memset(ip6h->flow_lbl, 0, 3); + + tcp_fill_header(th, conn, seq); + + tcp_update_check_tcp6(ip6h); + + ip6h->hop_limit = 255; + ip6h->version = 6; + ip6h->nexthdr = IPPROTO_TCP; + + ip6h->flow_lbl[0] = (conn->sock >> 16) & 0xf; + ip6h->flow_lbl[1] = (conn->sock >> 8) & 0xff; + ip6h->flow_lbl[2] = (conn->sock >> 0) & 0xff;IIUC, the reason part of the ip6h update is done after the TCP header update, but part before was a consequence of how we did the checksumming: we computed the pseudo-header checksum by doing a full checksum operation over the partially filled header, meaning filling the fields not in the pseudo-header had to be done afterwards. Now that you've reworked the checksumming, that's no longer necessary, so you could group all the ip6g initialisations together. Oh.. and also avoid the pre-filling of the flow_lbl with 0s before filling the real values.+ return ip_len; +} + /** * tcp_l2_buf_fill_headers() - Fill 802.3, IP, TCP headers in pre-cooked buffers * @c: Execution context @@ -1343,67 +1445,19 @@ static size_t tcp_l2_buf_fill_headers(const struct ctx *c, const struct in_addr *a4 = inany_v4(&conn->faddr); size_t ip_len, tlen; -#define SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq) \ -do { \ - b->th.source = htons(conn->fport); \ - b->th.dest = htons(conn->eport); \ - b->th.seq = htonl(seq); \ - b->th.ack_seq = htonl(conn->seq_ack_to_tap); \ - if (conn->events & ESTABLISHED) { \ - b->th.window = htons(conn->wnd_to_tap); \ - } else { \ - unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; \ - \ - b->th.window = htons(MIN(wnd, USHRT_MAX)); \ - } \ -} while (0) - if (a4) { struct tcp4_l2_buf_t *b = (struct tcp4_l2_buf_t *)p; - ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); - b->iph.tot_len = htons(ip_len); - b->iph.saddr = a4->s_addr; - b->iph.daddr = c->ip4.addr_seen.s_addr; - - b->iph.check = check ? *check : - csum_ip4_header(b->iph.tot_len, IPPROTO_TCP, - b->iph.saddr, b->iph.daddr); - - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - - tcp_update_check_tcp4(&b->iph); + ip_len = tcp_fill_ipv4_header(c, conn, &b->iph, plen, check, seq); tlen = tap_iov_len(c, &b->taph, ip_len); } else { struct tcp6_l2_buf_t *b = (struct tcp6_l2_buf_t *)p; - ip_len = plen + sizeof(struct ipv6hdr) + sizeof(struct tcphdr); - - b->ip6h.payload_len = htons(plen + sizeof(struct tcphdr)); - b->ip6h.saddr = conn->faddr.a6; - if (IN6_IS_ADDR_LINKLOCAL(&b->ip6h.saddr)) - b->ip6h.daddr = c->ip6.addr_ll_seen; - else - b->ip6h.daddr = c->ip6.addr_seen; - - memset(b->ip6h.flow_lbl, 0, 3); - - SET_TCP_HEADER_COMMON_V4_V6(b, conn, seq); - - tcp_update_check_tcp6(&b->ip6h); - - b->ip6h.hop_limit = 255; - b->ip6h.version = 6; - b->ip6h.nexthdr = IPPROTO_TCP; - - b->ip6h.flow_lbl[0] = (conn->sock >> 16) & 0xf; - b->ip6h.flow_lbl[1] = (conn->sock >> 8) & 0xff; - b->ip6h.flow_lbl[2] = (conn->sock >> 0) & 0xff; + ip_len = tcp_fill_ipv6_header(c, conn, &b->ip6h, plen, seq); tlen = tap_iov_len(c, &b->taph, ip_len); } -#undef SET_TCP_HEADER_COMMON_V4_V6 return tlen; }-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Sat, 17 Feb 2024 16:07:25 +0100 Laurent Vivier <lvivier(a)redhat.com> wrote:Replace the macro SET_TCP_HEADER_COMMON_V4_V6() by a new function tcp_fill_header(). Move IPv4 and IPv6 code from tcp_l2_buf_fill_headers() to tcp_fill_ipv4_header() and tcp_fill_ipv6_header() Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> --- Notes: v3: - add to sub-series part 1 v2: - extract header filling functions from "tcp: extract buffer management from tcp_send_flag()" - rename them tcp_fill_flag_header()/tcp_fill_ipv4_header(), tcp_fill_ipv6_header() - use upside-down Christmas tree arguments order - replace (void *) by (struct tcphdr *) tcp.c | 154 +++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 104 insertions(+), 50 deletions(-) diff --git a/tcp.c b/tcp.c index aa03c20712f6..bc57a4f6e611 100644 --- a/tcp.c +++ b/tcp.c @@ -1324,6 +1324,108 @@ void tcp_defer_handler(struct ctx *c) tcp_l2_data_buf_flush(c); } +/** + * tcp_fill_header() - Fill the TCP header fields for a given TCP segment. + * + * @th: Pointer to the TCP header structure + * @conn: Pointer to the TCP connection structure + * @seq: Sequence number + */ +static void tcp_fill_header(struct tcphdr *th, + const struct tcp_tap_conn *conn, uint32_t seq) +{ + th->source = htons(conn->fport); + th->dest = htons(conn->eport); + th->seq = htonl(seq); + th->ack_seq = htonl(conn->seq_ack_to_tap); + if (conn->events & ESTABLISHED) { + th->window = htons(conn->wnd_to_tap); + } else { + unsigned wnd = conn->wnd_to_tap << conn->ws_to_tap; + + th->window = htons(MIN(wnd, USHRT_MAX)); + } +} + +/** + * tcp_fill_ipv4_header() - Fill 802.3, IPv4, TCP headers in pre-cooked buffers + * @c: Execution context + * @conn: Connection pointer + * @iph: Pointer to IPv4 header, immediately followed by a TCP header + * @plen: Payload length (including TCP header options) + * @check: Checksum, if already known + * @seq: Sequence number for this segment + * + * Return: IP frame length including L2 headers, host order + */ +static size_t tcp_fill_ipv4_header(const struct ctx *c, + const struct tcp_tap_conn *conn, + struct iphdr *iph, size_t plen, + const uint16_t *check, uint32_t seq) +{ + size_t ip_len = plen + sizeof(struct iphdr) + sizeof(struct tcphdr); + const struct in_addr *a4 = inany_v4(&conn->faddr); + struct tcphdr *th = (struct tcphdr *)(iph + 1); + + iph->tot_len = htons(ip_len); + iph->saddr = a4->s_addr;The reasoning behind the fact that a4 isn't NULL here is relatively simple to follow: you already check for inany_v4(&conn->faddr) in the caller, if it evaluates to true, you call this. Still, it's a bit too convoluted for Coverity's taste. Could you perhaps add an ASSERT(a4) before this block to make it obvious? It's a bit annoying that we extract the address twice, but I don't see a much better alternative compared to what you did. -- Stefano
On Sat, 17 Feb 2024 16:07:16 +0100 Laurent Vivier <lvivier(a)redhat.com> wrote:v3: - add a patch that has been extracted from: "tcp: extract buffer management from tcp_send_flag()" -> "tcp: Introduce ipv4_fill_headers()/ipv6_fill_headers()" - see detailed v3 history log in each patch - I didn't address the alignment problem when we provide a pointer to a sub-structure in the internal buffer structure. (for the last patches of the series).Excluding pending comments, the series looks good to me. -- Stefano