While working on other stuff I stumbled across some patches I wrote quite a while back (part of a larger series on indefinite hiatus). These make it easier to use the pcap and tap functions with frames that aren't in a single contiguous buffer. This overlaps with some of the work in Laurent's vhost-user series, so I've included the first of his patches and integrated it with my changes. There will The first two patches have some overlap with the preliminary iovec patches in the vhost-user series (and will certainly conflict). I do think the pcap interface here is slightly better than the one in the vhost-user series, although I did ack that. Stefano, if you've already applied / run tests for Laurent's series then go ahead with it; I'll rework this on top of those. Changes since v1: * Stefano correctly pointed out that my iov_offset() function was unclear in its naming and description. Rename it to iov_skip_bytes() and change the wording around it to improve this. * Incorporate Laurent's iov helper patch, and also use iov_skip_bytes() to slightly simplify its functions. * Fix commit message for write_remainder() so that it correctly describes handling "short writes", not "sort writes", whatever that might mean. * Adjust parameter names and descriptions for pcap_multiple() for clarity. David Gibson (6): iov: Add helper to find skip over first n bytes of an io vector pcap: Update pcap_frame() to take an iovec and offset util: Add write_remainder() helper pcap: Handle short writes in pcap_frame() pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer frames tap: Use write_remainder() in tap_send_frames_passt() Laurent Vivier (1): iov: add some functions to manage iovec Makefile | 8 +-- iov.c | 198 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ iov.h | 31 +++++++++ pcap.c | 56 ++++++++-------- pcap.h | 3 +- tap.c | 41 +++--------- util.c | 33 ++++++++++ util.h | 1 + 8 files changed, 307 insertions(+), 64 deletions(-) create mode 100644 iov.c create mode 100644 iov.h -- 2.43.2
From: Laurent Vivier <lvivier(a)redhat.com> Introduce functions to copy to/from a buffer from/to an iovec array, to compute data length in in bytes of an iovec and to copy memory from an iovec to another. iov_from_buf(), iov_to_buf(), iov_size(), iov_copy(). Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Message-ID: <20240217150725.661467-2-lvivier(a)redhat.com> [dwg: Small changes to suppress cppcheck warnings] Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- Makefile | 8 +-- iov.c | 174 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ iov.h | 29 ++++++++++ 3 files changed, 207 insertions(+), 4 deletions(-) create mode 100644 iov.c create mode 100644 iov.h diff --git a/Makefile b/Makefile index af4fa87e..156398b3 100644 --- a/Makefile +++ b/Makefile @@ -45,16 +45,16 @@ FLAGS += -DVERSION=\"$(VERSION)\" FLAGS += -DDUAL_STACK_SOCKETS=$(DUAL_STACK_SOCKETS) PASST_SRCS = arch.c arp.c checksum.c conf.c dhcp.c dhcpv6.c flow.c icmp.c \ - igmp.c isolation.c lineread.c log.c mld.c ndp.c netlink.c packet.c \ - passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c tcp_splice.c udp.c \ - util.c + igmp.c iov.c isolation.c lineread.c log.c mld.c ndp.c netlink.c \ + packet.c passt.c pasta.c pcap.c pif.c port_fwd.c tap.c tcp.c \ + tcp_splice.c udp.c util.c QRAP_SRCS = qrap.c SRCS = $(PASST_SRCS) $(QRAP_SRCS) MANPAGES = passt.1 pasta.1 qrap.1 PASST_HEADERS = arch.h arp.h checksum.h conf.h dhcp.h dhcpv6.h flow.h \ - flow_table.h icmp.h inany.h isolation.h lineread.h log.h ndp.h \ + flow_table.h icmp.h inany.h iov.h isolation.h lineread.h log.h ndp.h \ netlink.h packet.h passt.h pasta.h pcap.h pif.h port_fwd.h siphash.h \ tap.h tcp.h tcp_conn.h tcp_splice.h udp.h util.h HEADERS = $(PASST_HEADERS) seccomp.h diff --git a/iov.c b/iov.c new file mode 100644 index 00000000..8a48acb1 --- /dev/null +++ b/iov.c @@ -0,0 +1,174 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * iov.h - helpers for using (partial) iovecs. + * + * Copyright Red Hat + * Author: Laurent Vivier <lvivier(a)redhat.com> + * + * This file also contains code originally from QEMU include/qemu/iov.h + * and licensed under the following terms: + * + * Copyright (C) 2010 Red Hat, Inc. + * + * Author(s): + * Amit Shah <amit.shah(a)redhat.com> + * Michael Tokarev <mjt(a)tls.msk.ru> + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + * + * Contributions after 2012-01-13 are licensed under the terms of the + * GNU GPL, version 2 or (at your option) any later version. + */ +#include <sys/socket.h> + +#include "util.h" +#include "iov.h" + +/** + * iov_from_buf - Copy data from a buffer to an I/O vector (struct iovec) + * efficiently. + * + * @iov: Pointer to the array of struct iovec describing the + * scatter/gather I/O vector. + * @iov_cnt: Number of elements in the iov array. + * @offset: Byte offset in the iov array where copying should start. + * @buf: Pointer to the source buffer containing the data to copy. + * @bytes: Total number of bytes to copy from buf to iov. + * + * Returns: The number of bytes successfully copied. + */ +/* cppcheck-suppress unusedFunction */ +size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, const void *buf, size_t bytes) +{ + unsigned int i; + size_t copied; + + if (__builtin_constant_p(bytes) && iov_cnt && + offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) { + memcpy((char *)iov[0].iov_base + offset, buf, bytes); + return bytes; + } + + /* skipping offset bytes in the iovec */ + for (i = 0; i < iov_cnt && offset >= iov[i].iov_len; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (copied = 0; copied < bytes && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, bytes - copied); + + memcpy((char *)iov[i].iov_base + offset, (char *)buf + copied, + len); + copied += len; + offset = 0; + } + + return copied; +} + +/** + * iov_to_buf - Copy data from a scatter/gather I/O vector (struct iovec) to + * a buffer efficiently. + * + * @iov: Pointer to the array of struct iovec describing the scatter/gather + * I/O vector. + * @iov_cnt: Number of elements in the iov array. + * @offset: Offset within the first element of iov from where copying should start. + * @buf: Pointer to the destination buffer where data will be copied. + * @bytes: Total number of bytes to copy from iov to buf. + * + * Returns: The number of bytes successfully copied. + */ +/* cppcheck-suppress unusedFunction */ +size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, void *buf, size_t bytes) +{ + unsigned int i; + size_t copied; + + if (__builtin_constant_p(bytes) && iov_cnt && + offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) { + memcpy(buf, (char *)iov[0].iov_base + offset, bytes); + return bytes; + } + + /* skipping offset bytes in the iovec */ + for (i = 0; i < iov_cnt && offset >= iov[i].iov_len; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (copied = 0; copied < bytes && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, bytes - copied); + memcpy((char *)buf + copied, (char *)iov[i].iov_base + offset, + len); + copied += len; + offset = 0; + } + + return copied; +} + +/** + * iov_size - Calculate the total size of a scatter/gather I/O vector + * (struct iovec). + * + * @iov: Pointer to the array of struct iovec describing the + * scatter/gather I/O vector. + * @iov_cnt: Number of elements in the iov array. + * + * Returns: The total size in bytes. + */ +/* cppcheck-suppress unusedFunction */ +size_t iov_size(const struct iovec *iov, size_t iov_cnt) +{ + unsigned int i; + size_t len; + + for (i = 0, len = 0; i < iov_cnt; i++) + len += iov[i].iov_len; + + return len; +} + +/** + * iov_copy - Copy data from one scatter/gather I/O vector (struct iovec) to + * another. + * + * @dst_iov: Pointer to the destination array of struct iovec describing + * the scatter/gather I/O vector to copy to. + * @dst_iov_cnt: Number of elements in the destination iov array. + * @iov: Pointer to the source array of struct iovec describing + * the scatter/gather I/O vector to copy from. + * @iov_cnt: Number of elements in the source iov array. + * @offset: Offset within the source iov from where copying should start. + * @bytes: Total number of bytes to copy from iov to dst_iov. + * + * Returns: The number of elements successfully copied to the destination + * iov array. + */ +/* cppcheck-suppress unusedFunction */ +unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt, + const struct iovec *iov, size_t iov_cnt, + size_t offset, size_t bytes) +{ + unsigned int i, j; + + /* skipping offset bytes in the iovec */ + for (i = 0; i < iov_cnt && offset >= iov[i].iov_len; i++) + offset -= iov[i].iov_len; + + /* copying data */ + for (j = 0; i < iov_cnt && j < dst_iov_cnt && bytes; i++) { + size_t len = MIN(bytes, iov[i].iov_len - offset); + + dst_iov[j].iov_base = (char *)iov[i].iov_base + offset; + dst_iov[j].iov_len = len; + j++; + bytes -= len; + offset = 0; + } + + return j; +} diff --git a/iov.h b/iov.h new file mode 100644 index 00000000..ee35a75d --- /dev/null +++ b/iov.h @@ -0,0 +1,29 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * iov.c - helpers for using (partial) iovecs. + * + * Copyrigh Red Hat + * Author: Laurent Vivier <lvivier(a)redhat.com> + * + * This file also contains code originally from QEMU include/qemu/iov.h: + * + * Author(s): + * Amit Shah <amit.shah(a)redhat.com> + * Michael Tokarev <mjt(a)tls.msk.ru> + */ + +#ifndef IOVEC_H +#define IOVEC_H + +#include <unistd.h> +#include <string.h> + +size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, const void *buf, size_t bytes); +size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, + size_t offset, void *buf, size_t bytes); +size_t iov_size(const struct iovec *iov, size_t iov_cnt); +unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt, + const struct iovec *iov, size_t iov_cnt, + size_t offset, size_t bytes); +#endif /* IOVEC_H */ -- 2.43.2
On Wed, 28 Feb 2024 12:52:00 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:From: Laurent Vivier <lvivier(a)redhat.com> Introduce functions to copy to/from a buffer from/to an iovec array, to compute data length in in bytes of an iovec and to copy memory from an iovec to another. iov_from_buf(), iov_to_buf(), iov_size(), iov_copy(). Signed-off-by: Laurent Vivier <lvivier(a)redhat.com> Message-ID: <20240217150725.661467-2-lvivier(a)redhat.com> [dwg: Small changes to suppress cppcheck warnings] Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>I have a long list of style comments on this one, but nothing functional, so I'll go ahead and apply it to unblock this series, and post fix-up patches later. -- Stefano
Several of the IOV functions in iov.c, and also tap_send_frames_passt() needs to determine which buffer element a byte offset into an IO vector lies in. Split this out into a helper function iov_skip_bytes(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- iov.c | 42 +++++++++++++++++++++++++++++++++--------- iov.h | 2 ++ tap.c | 12 +++++------- 3 files changed, 40 insertions(+), 16 deletions(-) diff --git a/iov.c b/iov.c index 8a48acb1..e3312628 100644 --- a/iov.c +++ b/iov.c @@ -25,6 +25,36 @@ #include "util.h" #include "iov.h" + +/* iov_skip_bytes() - Skip the first n bytes into an IO vector + * @iov: IO vector + * @n: Number of entries in @iov + * @vec_offset: Total byte offset into the IO vector + * @buf_offset: Offset into a single buffer of the IO vector + * + * Return: index I of individual struct iovec which contains the byte at + * @vec_offset bytes into the vector (as though all its buffers were + * contiguous). If @buf_offset is non-NULL, update it to the offset of + * that byte within @iov[I] (guaranteed to be less than @iov[I].iov_len) + * If the whole vector has <= @vec_offset bytes, return @n. + */ +size_t iov_skip_bytes(const struct iovec *iov, size_t n, + size_t vec_offset, size_t *buf_offset) +{ + size_t offset = vec_offset, i; + + for (i = 0; i < n; i++) { + if (offset < iov[i].iov_len) + break; + offset -= iov[i].iov_len; + } + + if (buf_offset) + *buf_offset = offset; + + return i; +} + /** * iov_from_buf - Copy data from a buffer to an I/O vector (struct iovec) * efficiently. @@ -51,9 +81,7 @@ size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, return bytes; } - /* skipping offset bytes in the iovec */ - for (i = 0; i < iov_cnt && offset >= iov[i].iov_len; i++) - offset -= iov[i].iov_len; + i = iov_skip_bytes(iov, iov_cnt, offset, &offset); /* copying data */ for (copied = 0; copied < bytes && i < iov_cnt; i++) { @@ -94,9 +122,7 @@ size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, return bytes; } - /* skipping offset bytes in the iovec */ - for (i = 0; i < iov_cnt && offset >= iov[i].iov_len; i++) - offset -= iov[i].iov_len; + i = iov_skip_bytes(iov, iov_cnt, offset, &offset); /* copying data */ for (copied = 0; copied < bytes && i < iov_cnt; i++) { @@ -155,9 +181,7 @@ unsigned iov_copy(struct iovec *dst_iov, size_t dst_iov_cnt, { unsigned int i, j; - /* skipping offset bytes in the iovec */ - for (i = 0; i < iov_cnt && offset >= iov[i].iov_len; i++) - offset -= iov[i].iov_len; + i = iov_skip_bytes(iov, iov_cnt, offset, &offset); /* copying data */ for (j = 0; i < iov_cnt && j < dst_iov_cnt && bytes; i++) { diff --git a/iov.h b/iov.h index ee35a75d..e1becdea 100644 --- a/iov.h +++ b/iov.h @@ -18,6 +18,8 @@ #include <unistd.h> #include <string.h> +size_t iov_skip_bytes(const struct iovec *iov, size_t n, + size_t vec_offset, size_t *buf_offset); size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, size_t offset, const void *buf, size_t bytes); size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, diff --git a/tap.c b/tap.c index 396dee7e..dd11d1d4 100644 --- a/tap.c +++ b/tap.c @@ -45,6 +45,7 @@ #include "checksum.h" #include "util.h" +#include "iov.h" #include "passt.h" #include "arp.h" #include "dhcp.h" @@ -389,6 +390,7 @@ static size_t tap_send_frames_passt(const struct ctx *c, .msg_iov = (void *)iov, .msg_iovlen = n, }; + size_t buf_offset; unsigned int i; ssize_t sent; @@ -397,15 +399,11 @@ static size_t tap_send_frames_passt(const struct ctx *c, return 0; /* Check for any partial frames due to short send */ - for (i = 0; i < n; i++) { - if ((size_t)sent < iov[i].iov_len) - break; - sent -= iov[i].iov_len; - } + i = iov_skip_bytes(iov, n, sent, &buf_offset); - if (i < n && sent) { + if (i < n && buf_offset) { /* A partial frame was sent */ - tap_send_remainder(c, &iov[i], sent); + tap_send_remainder(c, &iov[i], buf_offset); i++; } -- 2.43.2
On Wed, 28 Feb 2024 12:52:01 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Several of the IOV functions in iov.c, and also tap_send_frames_passt() needs to determine which buffer element a byte offset into an IO vector lies in. Split this out into a helper function iov_skip_bytes(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- iov.c | 42 +++++++++++++++++++++++++++++++++--------- iov.h | 2 ++ tap.c | 12 +++++------- 3 files changed, 40 insertions(+), 16 deletions(-) diff --git a/iov.c b/iov.c index 8a48acb1..e3312628 100644 --- a/iov.c +++ b/iov.c @@ -25,6 +25,36 @@ #include "util.h" #include "iov.h" + +/* iov_skip_bytes() - Skip the first n bytes into an IO vector + * @iov: IO vector + * @n: Number of entries in @iov...which is a different 'n' compared to two lines above.+ * @vec_offset: Total byte offset into the IO vectorThis doesn't clearly correlate with the description of the function ("first n bytes"). Same here, I have no other comments about the series, so I'll apply this and try to improve this a bit as a follow-up.+ * @buf_offset: Offset into a single buffer of the IO vector + * + * Return: index I of individual struct iovec which contains the byte at + * @vec_offset bytes into the vector (as though all its buffers were + * contiguous). If @buf_offset is non-NULL, update it to the offset of + * that byte within @iov[I] (guaranteed to be less than @iov[I].iov_len) + * If the whole vector has <= @vec_offset bytes, return @n. + */ +size_t iov_skip_bytes(const struct iovec *iov, size_t n, + size_t vec_offset, size_t *buf_offset)-- Stefano
On Thu, Feb 29, 2024 at 09:10:02AM +0100, Stefano Brivio wrote:On Wed, 28 Feb 2024 12:52:01 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Ouch, good point.Several of the IOV functions in iov.c, and also tap_send_frames_passt() needs to determine which buffer element a byte offset into an IO vector lies in. Split this out into a helper function iov_skip_bytes(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- iov.c | 42 +++++++++++++++++++++++++++++++++--------- iov.h | 2 ++ tap.c | 12 +++++------- 3 files changed, 40 insertions(+), 16 deletions(-) diff --git a/iov.c b/iov.c index 8a48acb1..e3312628 100644 --- a/iov.c +++ b/iov.c @@ -25,6 +25,36 @@ #include "util.h" #include "iov.h" + +/* iov_skip_bytes() - Skip the first n bytes into an IO vector + * @iov: IO vector + * @n: Number of entries in @iov...which is a different 'n' compared to two lines above.Ok, thanks. I might see if I can improve this one too.+ * @vec_offset: Total byte offset into the IO vectorThis doesn't clearly correlate with the description of the function ("first n bytes"). Same here, I have no other comments about the series, so I'll apply this and try to improve this a bit as a follow-up.-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson+ * @buf_offset: Offset into a single buffer of the IO vector + * + * Return: index I of individual struct iovec which contains the byte at + * @vec_offset bytes into the vector (as though all its buffers were + * contiguous). If @buf_offset is non-NULL, update it to the offset of + * that byte within @iov[I] (guaranteed to be less than @iov[I].iov_len) + * If the whole vector has <= @vec_offset bytes, return @n. + */ +size_t iov_skip_bytes(const struct iovec *iov, size_t n, + size_t vec_offset, size_t *buf_offset)
Update the low-level helper pcap_frame() to take a struct iovec and offset within it, rather than an explicit pointer and length for the frame. This moves the handling of an offset (to skip vnet_len) from pcap_multiple() to pcap_frame(). This doesn't accomplish a great deal immediately, but will make subsequent changes easier. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/pcap.c b/pcap.c index 501d52d4..5cd7a8cc 100644 --- a/pcap.c +++ b/pcap.c @@ -67,24 +67,25 @@ struct pcap_pkthdr { /** * pcap_frame() - Capture a single frame to pcap file with given timestamp - * @pkt: Pointer to data buffer, including L2 headers - * @len: L2 packet length + * @iov: IO vector referencing buffer containing frame (with L2 headers) + * @offset: Offset of the frame from @iov->iov_base * @tv: Timestamp * * Returns: 0 on success, -errno on error writing to the file */ -static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv) +static void pcap_frame(const struct iovec *iov, size_t offset, + const struct timeval *tv) { + size_t len = iov->iov_len - offset; struct pcap_pkthdr h; h.tv_sec = tv->tv_sec; h.tv_usec = tv->tv_usec; h.caplen = h.len = len; - if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0) - return -errno; - - return 0; + if (write(pcap_fd, &h, sizeof(h)) < 0 || + write(pcap_fd, (char *)iov->iov_base + offset, len) < 0) + debug("Cannot log packet, length %zu", len); } /** @@ -94,14 +95,14 @@ static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv) */ void pcap(const char *pkt, size_t len) { + struct iovec iov = { (char *)pkt, len }; struct timeval tv; if (pcap_fd == -1) return; gettimeofday(&tv, NULL); - if (pcap_frame(pkt, len, &tv) != 0) - debug("Cannot log packet, length %zu", len); + pcap_frame(&iov, 0, &tv); } /** @@ -120,14 +121,8 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset) gettimeofday(&tv, NULL); - for (i = 0; i < n; i++) { - if (pcap_frame((char *)iov[i].iov_base + offset, - iov[i].iov_len - offset, &tv) != 0) { - debug("Cannot log packet, length %zu", - iov->iov_len - offset); - return; - } - } + for (i = 0; i < n; i++) + pcap_frame(iov + i, offset, &tv); } /** -- 2.43.2
We have several places where we want to write(2) a buffer or buffers and we handle short write()s by retrying until everything is successfully written. Add a helper for this in util.c. This version has some differences from the typical write_all() function. First, take an IO vector rather than a single buffer, because that will be useful for some of our cases. Second, allow it to take an parameter to skip the first n bytes of the given buffers. This will be useful for some of the cases we want, and also falls out quite naturally from the implementation. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 33 +++++++++++++++++++++++++++++++++ util.h | 1 + 2 files changed, 34 insertions(+) diff --git a/util.c b/util.c index 8acce233..03e393f6 100644 --- a/util.c +++ b/util.c @@ -19,6 +19,7 @@ #include <arpa/inet.h> #include <net/ethernet.h> #include <sys/epoll.h> +#include <sys/uio.h> #include <fcntl.h> #include <string.h> #include <time.h> @@ -26,6 +27,7 @@ #include <stdbool.h> #include "util.h" +#include "iov.h" #include "passt.h" #include "packet.h" #include "log.h" @@ -574,3 +576,34 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, return clone(fn, stack_area + stack_size / 2, flags, arg); #endif } + +/* write_remainder() - write the tail of an IO vector to an fd + * @fd: File descriptor + * @iov: IO vector + * @iovcnt: Number of entries in @iov + * @skip: Number of bytes of the vector to skip writing + * + * Return: 0 on success, -1 on error (with errno set) + * + * #syscalls write writev + */ +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip) +{ + int i; + + while ((i = iov_skip_bytes(iov, iovcnt, skip, &skip)) < iovcnt) { + ssize_t rc; + + if (skip) + rc = write(fd, (char *)iov[i].iov_base + skip, + iov[i].iov_len - skip); + else + rc = writev(fd, &iov[i], iovcnt - i); + + if (rc < 0) + return -1; + skip += rc; + } + + return 0; +} diff --git a/util.h b/util.h index e0df26c6..de6816af 100644 --- a/util.h +++ b/util.h @@ -229,6 +229,7 @@ void write_pidfile(int fd, pid_t pid); int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip); /** * mod_sub() - Modular arithmetic subtraction -- 2.43.2
Currently pcap_frame() assumes that if write() doesn't return an error, it has written everything we want. That's not necessarily true, because it could return a short write. That's not likely to happen on a regular file, but there's not a lot of reason not to be robust here; it's conceivable we might want to direct the pcap fd at a named pipe or similar. So, make pcap_frame() handle short frames by using the write_remainder() helper. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/pcap.c b/pcap.c index 5cd7a8cc..317fc844 100644 --- a/pcap.c +++ b/pcap.c @@ -77,15 +77,18 @@ static void pcap_frame(const struct iovec *iov, size_t offset, const struct timeval *tv) { size_t len = iov->iov_len - offset; - struct pcap_pkthdr h; - - h.tv_sec = tv->tv_sec; - h.tv_usec = tv->tv_usec; - h.caplen = h.len = len; - - if (write(pcap_fd, &h, sizeof(h)) < 0 || - write(pcap_fd, (char *)iov->iov_base + offset, len) < 0) - debug("Cannot log packet, length %zu", len); + struct pcap_pkthdr h = { + .tv_sec = tv->tv_sec, + .tv_usec = tv->tv_usec, + .caplen = len, + .len = len + }; + struct iovec hiov = { &h, sizeof(h) }; + + if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || + write_remainder(pcap_fd, iov, 1, offset) < 0) + debug("Cannot log packet, length %zu: %s", + len, strerror(errno)); } /** -- 2.43.2
pcap_frame() explicitly takes a single frame, and only allows a single buffer (iovec) to be passed. pcap_multiple() takes multiple buffers, but explicitly expects exactly one frame per buffer. Future changes are going to want to split single frames across multiple buffers in some circumstances, so extend the pcap functions to allow for that. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 26 +++++++++++++++----------- pcap.h | 3 ++- tap.c | 2 +- 3 files changed, 18 insertions(+), 13 deletions(-) diff --git a/pcap.c b/pcap.c index 317fc844..dcf1a741 100644 --- a/pcap.c +++ b/pcap.c @@ -31,6 +31,7 @@ #include "util.h" #include "passt.h" #include "log.h" +#include "pcap.h" #define PCAP_VERSION_MINOR 4 @@ -67,14 +68,15 @@ struct pcap_pkthdr { /** * pcap_frame() - Capture a single frame to pcap file with given timestamp - * @iov: IO vector referencing buffer containing frame (with L2 headers) - * @offset: Offset of the frame from @iov->iov_base + * @iov: IO vector containing frame (with L2 headers and tap headers) + * @iovcnt: Number of buffers (@iov entries) in frame + * @offset: Byte offset of the L2 headers within @iov * @tv: Timestamp * * Returns: 0 on success, -errno on error writing to the file */ -static void pcap_frame(const struct iovec *iov, size_t offset, - const struct timeval *tv) +static void pcap_frame(const struct iovec *iov, size_t iovcnt, + size_t offset, const struct timeval *tv) { size_t len = iov->iov_len - offset; struct pcap_pkthdr h = { @@ -86,7 +88,7 @@ static void pcap_frame(const struct iovec *iov, size_t offset, struct iovec hiov = { &h, sizeof(h) }; if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || - write_remainder(pcap_fd, iov, 1, offset) < 0) + write_remainder(pcap_fd, iov, iovcnt, offset) < 0) debug("Cannot log packet, length %zu: %s", len, strerror(errno)); } @@ -105,16 +107,18 @@ void pcap(const char *pkt, size_t len) return; gettimeofday(&tv, NULL); - pcap_frame(&iov, 0, &tv); + pcap_frame(&iov, 1, 0, &tv); } /** * pcap_multiple() - Capture multiple frames - * @iov: Array of iovecs, one entry per frame - * @n: Number of frames to capture - * @offset: Offset of the frame within each iovec buffer + * @iov: IO vector with @frame_parts * @n entries + * @frame_parts: Number of IO vector items for each frame + * @n: Number of frames to capture + * @offset: Offset of the L2 frame within each iovec buffer */ -void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset) +void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, + size_t offset) { struct timeval tv; unsigned int i; @@ -125,7 +129,7 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset) gettimeofday(&tv, NULL); for (i = 0; i < n; i++) - pcap_frame(iov + i, offset, &tv); + pcap_frame(iov + i * frame_parts, frame_parts, offset, &tv); } /** diff --git a/pcap.h b/pcap.h index da5a7e84..85fc58e5 100644 --- a/pcap.h +++ b/pcap.h @@ -7,7 +7,8 @@ #define PCAP_H void pcap(const char *pkt, size_t len); -void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset); +void pcap_multiple(const struct iovec *iov, size_t frame_parts, unsigned int n, + size_t offset); void pcap_init(struct ctx *c); #endif /* PCAP_H */ diff --git a/tap.c b/tap.c index dd11d1d4..87d176b7 100644 --- a/tap.c +++ b/tap.c @@ -433,7 +433,7 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n) if (m < n) debug("tap: failed to send %zu frames of %zu", n - m, n); - pcap_multiple(iov, m, c->mode == MODE_PASST ? sizeof(uint32_t) : 0); + pcap_multiple(iov, 1, n, c->mode == MODE_PASST ? sizeof(uint32_t) : 0); return m; } -- 2.43.2
When we determine we have sent a partial frame in tap_send_frames_passt(), we call tap_send_remainder() to send the remainder of it. The logic in that function is very similar to that in the more general write_remainder() except that it uses send() instead of write()/writev(). But we are dealing specifically with the qemu socket here, which is a connected stream socket. In that case write()s do the same thing as send() with the options we were using, so we can just reuse write_remainder(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 29 ++++------------------------- 1 file changed, 4 insertions(+), 25 deletions(-) diff --git a/tap.c b/tap.c index 87d176b7..8a9a68b7 100644 --- a/tap.c +++ b/tap.c @@ -349,30 +349,6 @@ static size_t tap_send_frames_pasta(const struct ctx *c, return i; } -/** - * tap_send_remainder() - Send remainder of a partially sent frame - * @c: Execution context - * @iov: Partially sent buffer - * @offset: Number of bytes already sent from @iov - */ -static void tap_send_remainder(const struct ctx *c, const struct iovec *iov, - size_t offset) -{ - const char *base = (char *)iov->iov_base; - size_t len = iov->iov_len; - - while (offset < len) { - ssize_t sent = send(c->fd_tap, base + offset, len - offset, - MSG_NOSIGNAL); - if (sent < 0) { - err("tap: partial frame send (missing %zu bytes): %s", - len - offset, strerror(errno)); - return; - } - offset += sent; - } -} - /** * tap_send_frames_passt() - Send multiple frames to the passt tap * @c: Execution context @@ -403,7 +379,10 @@ static size_t tap_send_frames_passt(const struct ctx *c, if (i < n && buf_offset) { /* A partial frame was sent */ - tap_send_remainder(c, &iov[i], buf_offset); + if (write_remainder(c->fd_tap, &iov[i], 1, buf_offset) < 0) { + err("tap: partial frame send: %s", strerror(errno)); + return i; + } i++; } -- 2.43.2
On Wed, 28 Feb 2024 12:51:59 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:While working on other stuff I stumbled across some patches I wrote quite a while back (part of a larger series on indefinite hiatus). These make it easier to use the pcap and tap functions with frames that aren't in a single contiguous buffer. This overlaps with some of the work in Laurent's vhost-user series, so I've included the first of his patches and integrated it with my changes. There will The first two patches have some overlap with the preliminary iovec patches in the vhost-user series (and will certainly conflict). I do think the pcap interface here is slightly better than the one in the vhost-user series, although I did ack that. Stefano, if you've already applied / run tests for Laurent's series then go ahead with it; I'll rework this on top of those.No, I hadn't. This one is applied now. -- Stefano