Laurent, sorry I didn't spot this earlier. While working on other stuff I stumbled across these patches I wrote quite a while back (part of a larger series on indefinite hiatus). The first two patches have some overlap with the preliminary iovec patches in the vhost-user series (and will certainly conflict). I do think the pcap interface here is slightly better than the one in the vhost-user series, although I did ack that. Stefano, if you've already applied / run tests for Laurent's series then go ahead with it; I'll rework this on top of those. David Gibson (6): util: Add helper to find offset into io vector pcap: Update pcap_frame() to take an iovec and offset util: Add write_remainder() helper pcap: Handle short writes in pcap_frame() pcap: Allow pcap_frame() and pcap_multiple() to take multi-buffer frames tap: Use write_remainder() in tap_send_frames_passt() pcap.c | 54 ++++++++++++++++++++++++++++-------------------------- pcap.h | 3 ++- tap.c | 42 +++++++++--------------------------------- util.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ util.h | 2 ++ 5 files changed, 96 insertions(+), 60 deletions(-) -- 2.43.2
tap_send_frames_passt() needs to determine which buffer element a byte offset into an IO vector lies in. We have some upcoming uses for similar logic, so split this out into a helper function iov_offset(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 13 +++++-------- util.c | 23 +++++++++++++++++++++++ util.h | 1 + 3 files changed, 29 insertions(+), 8 deletions(-) diff --git a/tap.c b/tap.c index 396dee7e..f15eba6e 100644 --- a/tap.c +++ b/tap.c @@ -390,22 +390,19 @@ static size_t tap_send_frames_passt(const struct ctx *c, .msg_iovlen = n, }; unsigned int i; + size_t offset; ssize_t sent; sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT); if (sent < 0) return 0; - /* Check for any partial frames due to short send */ - for (i = 0; i < n; i++) { - if ((size_t)sent < iov[i].iov_len) - break; - sent -= iov[i].iov_len; - } + offset = (size_t)sent; + i = iov_offset(iov, n, &offset); - if (i < n && sent) { + if (i < n && offset) { /* A partial frame was sent */ - tap_send_remainder(c, &iov[i], sent); + tap_send_remainder(c, &iov[i], offset); i++; } diff --git a/util.c b/util.c index 21b35ff9..fb6a0430 100644 --- a/util.c +++ b/util.c @@ -574,3 +574,26 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, return clone(fn, stack_area + stack_size / 2, flags, arg); #endif } + +/* iov_offset() - interpret offset into an IO vector + * @iov: IO vector + * @n: Number of entries in @iov + * @offset: Pointer to offset into IO vector + * + * Return: index I of iovec which contains the given offset, or @n if + * the given offset is >= the total # of bytes in the vector. + * *@offset is updated to be the byte offset into (@iov + I), + * and is guaranteed to be less than @iov[I].iov_len + */ +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset) +{ + size_t i; + + for (i = 0; i < n; i++) { + if (*offset < iov[i].iov_len) + break; + *offset -= iov[i].iov_len; + } + + return i; +} diff --git a/util.h b/util.h index d2320f8c..62fad6fe 100644 --- a/util.h +++ b/util.h @@ -229,6 +229,7 @@ void write_pidfile(int fd, pid_t pid); int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset); /** * mod_sub() - Modular arithmetic subtraction -- 2.43.2
On Thu, 22 Feb 2024 16:55:57 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:tap_send_frames_passt() needs to determine which buffer element a byte offset into an IO vector lies in. We have some upcoming uses for similar logic, so split this out into a helper function iov_offset(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 13 +++++-------- util.c | 23 +++++++++++++++++++++++ util.h | 1 +Laurent, I guess this will need to be moved to iov.h by your series, at the point where you introduce the new header.3 files changed, 29 insertions(+), 8 deletions(-) diff --git a/tap.c b/tap.c index 396dee7e..f15eba6e 100644 --- a/tap.c +++ b/tap.c @@ -390,22 +390,19 @@ static size_t tap_send_frames_passt(const struct ctx *c, .msg_iovlen = n, }; unsigned int i; + size_t offset; ssize_t sent; sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT); if (sent < 0) return 0; - /* Check for any partial frames due to short send */Why would you drop this comment, though? I feel it's needed even more as we don't open-code that any longer.- for (i = 0; i < n; i++) { - if ((size_t)sent < iov[i].iov_len) - break; - sent -= iov[i].iov_len; - } + offset = (size_t)sent; + i = iov_offset(iov, n, &offset);I think with these names and interface this becomes quite obscure: it sounds like 'i' should be an offset at this point... and 'offset', I have no idea (unless I read the comment to iov_offset()). Slightly different proposal below.- if (i < n && sent) { + if (i < n && offset) { /* A partial frame was sent */ - tap_send_remainder(c, &iov[i], sent); + tap_send_remainder(c, &iov[i], offset); i++; } diff --git a/util.c b/util.c index 21b35ff9..fb6a0430 100644 --- a/util.c +++ b/util.c @@ -574,3 +574,26 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, return clone(fn, stack_area + stack_size / 2, flags, arg); #endif } + +/* iov_offset() - interpret offset into an IO vector + * @iov: IO vector + * @n: Number of entries in @iov + * @offset: Pointer to offset into IO vector + * + * Return: index I of iovec which contains the given offset, or @n if + * the given offset is >= the total # of bytes in the vector. + * *@offset is updated to be the byte offset into (@iov + I), + * and is guaranteed to be less than @iov[I].iov_len + */ +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset)...what do you think of: /* iov_entry_index() - Index and optionally entry offset, given global offset * @iov: IO vector * @n: Number of entries in @iov * @global_offset: Global offset of byte in IO vector we're looking for * @entry_offset: If not NULL, set on return: entry offset * * Return: index of IO vector entry for given byte offset, @n if not found * * Note: @entry_offset is guaranteed to be less than @iov[i].iov_len, where i is * the return value */ and tap_send_frames_passt() could (more) happily do: i = iov_entry_index(iov, n, sent, &entry_offset); if (i < n) { /* A partial frame was sent */ tap_send_remainder(c, &iov[i], entry_offset); i++; }+{ + size_t i; + + for (i = 0; i < n; i++) { + if (*offset < iov[i].iov_len)Indentation.+ break; + *offset -= iov[i].iov_len; + } + + return i; +} diff --git a/util.h b/util.h index d2320f8c..62fad6fe 100644 --- a/util.h +++ b/util.h @@ -229,6 +229,7 @@ void write_pidfile(int fd, pid_t pid); int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset); /** * mod_sub() - Modular arithmetic subtraction-- Stefano
On Tue, Feb 27, 2024 at 03:23:35PM +0100, Stefano Brivio wrote:On Thu, 22 Feb 2024 16:55:57 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:To avoid that shuffle, I cherry picked Laurent's first patch into my series, and rebased putting this function directly into iov.c. Several of Laurent's functions can also be slightly simplified using this helper, so I've done that.tap_send_frames_passt() needs to determine which buffer element a byte offset into an IO vector lies in. We have some upcoming uses for similar logic, so split this out into a helper function iov_offset(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 13 +++++-------- util.c | 23 +++++++++++++++++++++++ util.h | 1 +Laurent, I guess this will need to be moved to iov.h by your series, at the point where you introduce the new header.I guess I thought it was redundant with the comment a few lines down? Anyway, I've put it back in.3 files changed, 29 insertions(+), 8 deletions(-) diff --git a/tap.c b/tap.c index 396dee7e..f15eba6e 100644 --- a/tap.c +++ b/tap.c @@ -390,22 +390,19 @@ static size_t tap_send_frames_passt(const struct ctx *c, .msg_iovlen = n, }; unsigned int i; + size_t offset; ssize_t sent; sent = sendmsg(c->fd_tap, &mh, MSG_NOSIGNAL | MSG_DONTWAIT); if (sent < 0) return 0; - /* Check for any partial frames due to short send */Why would you drop this comment, though? I feel it's needed even more as we don't open-code that any longer.Yeah, that's fair. I was think of "iov_offset" as "offset the IOV", but "offset *of* the IOV" is probably a more obvious interpretation.- for (i = 0; i < n; i++) { - if ((size_t)sent < iov[i].iov_len) - break; - sent -= iov[i].iov_len; - } + offset = (size_t)sent; + i = iov_offset(iov, n, &offset);I think with these names and interface this becomes quite obscure: it sounds like 'i' should be an offset at this point... and 'offset', I have no idea (unless I read the comment to iov_offset()). Slightly different proposal below.Hm, I find this name and and one line description confusing in a different way from mine. In my next spin, I've tried to synthesize your suggestion and mine, into a version under the name iov_skip_bytes().- if (i < n && sent) { + if (i < n && offset) { /* A partial frame was sent */ - tap_send_remainder(c, &iov[i], sent); + tap_send_remainder(c, &iov[i], offset); i++; } diff --git a/util.c b/util.c index 21b35ff9..fb6a0430 100644 --- a/util.c +++ b/util.c @@ -574,3 +574,26 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, return clone(fn, stack_area + stack_size / 2, flags, arg); #endif } + +/* iov_offset() - interpret offset into an IO vector + * @iov: IO vector + * @n: Number of entries in @iov + * @offset: Pointer to offset into IO vector + * + * Return: index I of iovec which contains the given offset, or @n if + * the given offset is >= the total # of bytes in the vector. + * *@offset is updated to be the byte offset into (@iov + I), + * and is guaranteed to be less than @iov[I].iov_len + */ +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset)...what do you think of: /* iov_entry_index() - Index and optionally entry offset, given global offset* @iov: IO vector * @n: Number of entries in @iov * @global_offset: Global offset of byte in IO vector we're looking for * @entry_offset: If not NULL, set on return: entry offset * * Return: index of IO vector entry for given byte offset, @n if not found * * Note: @entry_offset is guaranteed to be less than @iov[i].iov_len, where i is * the return value */ and tap_send_frames_passt() could (more) happily do: i = iov_entry_index(iov, n, sent, &entry_offset); if (i < n) { /* A partial frame was sent */ tap_send_remainder(c, &iov[i], entry_offset); i++; }-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson+{ + size_t i; + + for (i = 0; i < n; i++) { + if (*offset < iov[i].iov_len)Indentation.+ break; + *offset -= iov[i].iov_len; + } + + return i; +} diff --git a/util.h b/util.h index d2320f8c..62fad6fe 100644 --- a/util.h +++ b/util.h @@ -229,6 +229,7 @@ void write_pidfile(int fd, pid_t pid); int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); +size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset); /** * mod_sub() - Modular arithmetic subtraction
Update the low-level helper pcap_frame() to take a struct iovec and offset within it, rather than an explicit pointer and length for the frame. This moves the handling of an offset (to skip vnet_len) from pcap_multiple() to pcap_frame(). This doesn't accomplish a great deal immediately, but will make subsequent changes easier. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/pcap.c b/pcap.c index 501d52d4..8876a051 100644 --- a/pcap.c +++ b/pcap.c @@ -67,24 +67,25 @@ struct pcap_pkthdr { /** * pcap_frame() - Capture a single frame to pcap file with given timestamp - * @pkt: Pointer to data buffer, including L2 headers - * @len: L2 packet length + * @iov: iovec referencing buffer containing frame (with L2 headers) + * @offset: Offset of the frame from @iov->iov_base * @tv: Timestamp * * Returns: 0 on success, -errno on error writing to the file */ -static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv) +static void pcap_frame(const struct iovec *iov, size_t offset, + const struct timeval *tv) { + size_t len = iov->iov_len - offset; struct pcap_pkthdr h; h.tv_sec = tv->tv_sec; h.tv_usec = tv->tv_usec; h.caplen = h.len = len; - if (write(pcap_fd, &h, sizeof(h)) < 0 || write(pcap_fd, pkt, len) < 0) - return -errno; - - return 0; + if (write(pcap_fd, &h, sizeof(h)) < 0 || + write(pcap_fd, (char *)iov->iov_base + offset, len) < 0) + debug("Cannot log packet, length %zu", len); } /** @@ -94,14 +95,14 @@ static int pcap_frame(const char *pkt, size_t len, const struct timeval *tv) */ void pcap(const char *pkt, size_t len) { + struct iovec iov = { (char *)pkt, len }; struct timeval tv; if (pcap_fd == -1) return; gettimeofday(&tv, NULL); - if (pcap_frame(pkt, len, &tv) != 0) - debug("Cannot log packet, length %zu", len); + pcap_frame(&iov, 0, &tv); } /** @@ -120,14 +121,8 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset) gettimeofday(&tv, NULL); - for (i = 0; i < n; i++) { - if (pcap_frame((char *)iov[i].iov_base + offset, - iov[i].iov_len - offset, &tv) != 0) { - debug("Cannot log packet, length %zu", - iov->iov_len - offset); - return; - } - } + for (i = 0; i < n; i++) + pcap_frame(iov + i, offset, &tv); } /** -- 2.43.2
On Thu, 22 Feb 2024 16:55:58 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Update the low-level helper pcap_frame() to take a struct iovec and offset within it, rather than an explicit pointer and length for the frame. This moves the handling of an offset (to skip vnet_len) from pcap_multiple() to pcap_frame(). This doesn't accomplish a great deal immediately, but will make subsequent changes easier. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/pcap.c b/pcap.c index 501d52d4..8876a051 100644 --- a/pcap.c +++ b/pcap.c @@ -67,24 +67,25 @@ struct pcap_pkthdr { /** * pcap_frame() - Capture a single frame to pcap file with given timestamp - * @pkt: Pointer to data buffer, including L2 headers - * @len: L2 packet length + * @iov: iovec referencing buffer containing frame (with L2 headers)For consistency with, say, 1/6: "IO vector" -- Stefano
We have several places where we want to write(2) a buffer or buffers and we do (or should) handle sort write()s by retrying until everything is successfully written. Add a helper for this in util.c. This version has some differences from the typical write_all() function. First, take an IO vector rather than a single buffer, because that will be useful for some of our cases. Second, allow it to take an parameter to skip the first n bytes of the given buffers. This will be usefl for some of the cases we want, and also falls out quite naturally from the implementation. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 32 ++++++++++++++++++++++++++++++++ util.h | 1 + 2 files changed, 33 insertions(+) diff --git a/util.c b/util.c index fb6a0430..a475f2b5 100644 --- a/util.c +++ b/util.c @@ -19,6 +19,7 @@ #include <arpa/inet.h> #include <net/ethernet.h> #include <sys/epoll.h> +#include <sys/uio.h> #include <fcntl.h> #include <string.h> #include <time.h> @@ -597,3 +598,34 @@ size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset) return i; } + +/* write_remainder() - write the tail of an IO vector to an fd + * @fd: File descriptor + * @iov: IO vector + * @iovcnt: Number of entries in @iov + * @skip: Number of bytes of the vector to skip writing + * + * Return: 0 on success, -1 on error (with errno set) + * + * #syscalls write writev + */ +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip) +{ + int i; + + while ((i = iov_offset(iov, iovcnt, &skip)) < iovcnt) { + ssize_t rc; + + if (skip) + rc = write(fd, (char *)iov[i].iov_base + skip, + iov[i].iov_len - skip); + else + rc = writev(fd, &iov[i], iovcnt - i); + + if (rc < 0) + return -1; + skip += rc; + } + + return 0; +} diff --git a/util.h b/util.h index 62fad6fe..ee380f55 100644 --- a/util.h +++ b/util.h @@ -230,6 +230,7 @@ int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset); +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip); /** * mod_sub() - Modular arithmetic subtraction -- 2.43.2
On Thu, 22 Feb 2024 16:55:59 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:We have several places where we want to write(2) a buffer or buffers and we do (or should) handle sort write()s by retrying until everything isAfter making sure that "handle sorting" doesn't exist (yet): is one between "handle" and "sort" redundant, or is there another meaning to this?successfully written. Add a helper for this in util.c. This version has some differences from the typical write_all() function. First, take an IO vector rather than a single buffer, because that will be useful for some of our cases. Second, allow it to take an parameter to skip the first n bytes of the given buffers. This will be usefl for some of the cases we want, and also falls out quite naturally from the implementation. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 32 ++++++++++++++++++++++++++++++++ util.h | 1 + 2 files changed, 33 insertions(+) diff --git a/util.c b/util.c index fb6a0430..a475f2b5 100644 --- a/util.c +++ b/util.c @@ -19,6 +19,7 @@ #include <arpa/inet.h> #include <net/ethernet.h> #include <sys/epoll.h> +#include <sys/uio.h> #include <fcntl.h> #include <string.h> #include <time.h> @@ -597,3 +598,34 @@ size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset) return i; } + +/* write_remainder() - write the tail of an IO vector to an fd + * @fd: File descriptor + * @iov: IO vector + * @iovcnt: Number of entries in @iov + * @skip: Number of bytes of the vector to skip writing + * + * Return: 0 on success, -1 on error (with errno set) + * + * #syscalls write writev + */ +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip) +{ + int i; + + while ((i = iov_offset(iov, iovcnt, &skip)) < iovcnt) {With my proposal for 1/6 this becomes: while ((i = iov_entry_index(iov, iovcnt, skip, &offset)) < iovcnt) { if (offset) ... which I don't really find brilliant, but at least we don't do if (skip) where 'skip' used to be something completely different.+ ssize_t rc; + + if (skip)Curly brackets here for consistency (undecided about readability to be honest).+ rc = write(fd, (char *)iov[i].iov_base + skip, + iov[i].iov_len - skip); + else + rc = writev(fd, &iov[i], iovcnt - i); + + if (rc < 0) + return -1; + skip += rc; + } + + return 0; +} diff --git a/util.h b/util.h index 62fad6fe..ee380f55 100644 --- a/util.h +++ b/util.h @@ -230,6 +230,7 @@ int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset); +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip); /** * mod_sub() - Modular arithmetic subtraction-- Stefano
On Tue, Feb 27, 2024 at 03:25:51PM +0100, Stefano Brivio wrote:On Thu, 22 Feb 2024 16:55:59 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Oops. s/sort/short/ is what makes this make sense.We have several places where we want to write(2) a buffer or buffers and we do (or should) handle sort write()s by retrying until everything isAfter making sure that "handle sorting" doesn't exist (yet): is one between "handle" and "sort" redundant, or is there another meaning to this?Hrm, the easiest conversion is to use (..., skip, &skip). Using a new variable means we'd need to feed that back into skip somehow on the next loop.successfully written. Add a helper for this in util.c. This version has some differences from the typical write_all() function. First, take an IO vector rather than a single buffer, because that will be useful for some of our cases. Second, allow it to take an parameter to skip the first n bytes of the given buffers. This will be usefl for some of the cases we want, and also falls out quite naturally from the implementation. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 32 ++++++++++++++++++++++++++++++++ util.h | 1 + 2 files changed, 33 insertions(+) diff --git a/util.c b/util.c index fb6a0430..a475f2b5 100644 --- a/util.c +++ b/util.c @@ -19,6 +19,7 @@ #include <arpa/inet.h> #include <net/ethernet.h> #include <sys/epoll.h> +#include <sys/uio.h> #include <fcntl.h> #include <string.h> #include <time.h> @@ -597,3 +598,34 @@ size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset) return i; } + +/* write_remainder() - write the tail of an IO vector to an fd + * @fd: File descriptor + * @iov: IO vector + * @iovcnt: Number of entries in @iov + * @skip: Number of bytes of the vector to skip writing + * + * Return: 0 on success, -1 on error (with errno set) + * + * #syscalls write writev + */ +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip) +{ + int i; + + while ((i = iov_offset(iov, iovcnt, &skip)) < iovcnt) {With my proposal for 1/6 this becomes: while ((i = iov_entry_index(iov, iovcnt, skip, &offset)) < iovcnt) {if (offset) ... which I don't really find brilliant, but at least we don't do if (skip) where 'skip' used to be something completely different.So.. the version I have now you might not like based on this comment, because it still has 'skip' essentially become a local variable with a different meaning from the one it has at entry.Uh.. consistency with what? We don't typically brace single line clauses in passt.+ ssize_t rc; + + if (skip)Curly brackets here for consistency (undecided about readability to be honest).-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson+ rc = write(fd, (char *)iov[i].iov_base + skip, + iov[i].iov_len - skip); + else + rc = writev(fd, &iov[i], iovcnt - i); + + if (rc < 0) + return -1; + skip += rc; + } + + return 0; +} diff --git a/util.h b/util.h index 62fad6fe..ee380f55 100644 --- a/util.h +++ b/util.h @@ -230,6 +230,7 @@ int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); size_t iov_offset(const struct iovec *iov, size_t n, size_t *offset); +int write_remainder(int fd, const struct iovec *iov, int iovcnt, size_t skip); /** * mod_sub() - Modular arithmetic subtraction
On Wed, 28 Feb 2024 11:44:28 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Tue, Feb 27, 2024 at 03:25:51PM +0100, Stefano Brivio wrote:These are two lines though. I've been trying to keep this consistent with the Linux kernel's net/ and drivers/net/ style, where curly braces are used for multiple lines, even if it's a single statement. -- StefanoOn Thu, 22 Feb 2024 16:55:59 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: [...]Uh.. consistency with what? We don't typically brace single line clauses in passt. > > + rc = write(fd, (char *)iov[i].iov_base + skip, > > + iov[i].iov_len - skip);+ + if (skip)Curly brackets here for consistency (undecided about readability to be honest).
On Wed, Feb 28, 2024 at 07:24:10AM +0100, Stefano Brivio wrote:On Wed, 28 Feb 2024 11:44:28 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Ah, fair enough. It never occurred to me to consider physical lines rather than logical lines. Adjust and merge if you want, otherwise I'll respin with this change tomorrow. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonOn Tue, Feb 27, 2024 at 03:25:51PM +0100, Stefano Brivio wrote:These are two lines though. I've been trying to keep this consistent with the Linux kernel's net/ and drivers/net/ style, where curly braces are used for multiple lines, even if it's a single statement.On Thu, 22 Feb 2024 16:55:59 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: [...]Uh.. consistency with what? We don't typically brace single line clauses in passt. > > + rc = write(fd, (char *)iov[i].iov_base + skip, > > + iov[i].iov_len - skip);+ + if (skip)Curly brackets here for consistency (undecided about readability to be honest).
On Wed, 28 Feb 2024 20:04:55 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Wed, Feb 28, 2024 at 07:24:10AM +0100, Stefano Brivio wrote:Sure, let me adjust this on merge. I'm still reviewing the rest. -- StefanoOn Wed, 28 Feb 2024 11:44:28 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Ah, fair enough. It never occurred to me to consider physical lines rather than logical lines. Adjust and merge if you want, otherwise I'll respin with this change tomorrow.On Tue, Feb 27, 2024 at 03:25:51PM +0100, Stefano Brivio wrote:These are two lines though. I've been trying to keep this consistent with the Linux kernel's net/ and drivers/net/ style, where curly braces are used for multiple lines, even if it's a single statement.On Thu, 22 Feb 2024 16:55:59 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: [...] > + > + if (skip) Curly brackets here for consistency (undecided about readability to be honest).Uh.. consistency with what? We don't typically brace single line clauses in passt. > > + rc = write(fd, (char *)iov[i].iov_base + skip, > > + iov[i].iov_len - skip);
Currently pcap_frame() assumes that if write() doesn't return an error, it has written everything we want. That's not necessarily true, because it could return a short write. That's not likely to happen on a regular file, but there's not a lot of reason not to be robust here; it's conceivable we might want to direct the pcap fd at a named pipe or similar. So, make pcap_frame() handle short frames by using the write_remainder() helper. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 21 ++++++++++++--------- 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/pcap.c b/pcap.c index 8876a051..eeb71a3c 100644 --- a/pcap.c +++ b/pcap.c @@ -77,15 +77,18 @@ static void pcap_frame(const struct iovec *iov, size_t offset, const struct timeval *tv) { size_t len = iov->iov_len - offset; - struct pcap_pkthdr h; - - h.tv_sec = tv->tv_sec; - h.tv_usec = tv->tv_usec; - h.caplen = h.len = len; - - if (write(pcap_fd, &h, sizeof(h)) < 0 || - write(pcap_fd, (char *)iov->iov_base + offset, len) < 0) - debug("Cannot log packet, length %zu", len); + struct pcap_pkthdr h = { + .tv_sec = tv->tv_sec, + .tv_usec = tv->tv_usec, + .caplen = len, + .len = len + }; + struct iovec hiov = { &h, sizeof(h) }; + + if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || + write_remainder(pcap_fd, iov, 1, offset) < 0) + debug("Cannot log packet, length %zu: %s", + len, strerror(errno)); } /** -- 2.43.2
pcap_frame() explicitly takes a single frame, and only allows a single buffer (iovec) to be passed. pcap_multiple() takes multiple buffers, but explicitly expects exactly one frame per buffer. Future changes are going to want to split single frames across multiple buffers in some circumstances, so extend the pcap functions to allow for that. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 24 ++++++++++++++---------- pcap.h | 3 ++- tap.c | 2 +- 3 files changed, 17 insertions(+), 12 deletions(-) diff --git a/pcap.c b/pcap.c index eeb71a3c..9eb4f3d2 100644 --- a/pcap.c +++ b/pcap.c @@ -31,6 +31,7 @@ #include "util.h" #include "passt.h" #include "log.h" +#include "pcap.h" #define PCAP_VERSION_MINOR 4 @@ -67,14 +68,15 @@ struct pcap_pkthdr { /** * pcap_frame() - Capture a single frame to pcap file with given timestamp - * @iov: iovec referencing buffer containing frame (with L2 headers) - * @offset: Offset of the frame from @iov->iov_base + * @iov: IO vector containing frame (with L2 headers and tap headers) + * @iovcnt: Number of buffers (@iov entries) in frame + * @offset: Byte offset of the L2 headers within @iov * @tv: Timestamp * * Returns: 0 on success, -errno on error writing to the file */ -static void pcap_frame(const struct iovec *iov, size_t offset, - const struct timeval *tv) +static void pcap_frame(const struct iovec *iov, size_t iovcnt, + size_t offset, const struct timeval *tv) { size_t len = iov->iov_len - offset; struct pcap_pkthdr h = { @@ -86,7 +88,7 @@ static void pcap_frame(const struct iovec *iov, size_t offset, struct iovec hiov = { &h, sizeof(h) }; if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || - write_remainder(pcap_fd, iov, 1, offset) < 0) + write_remainder(pcap_fd, iov, iovcnt, offset) < 0) debug("Cannot log packet, length %zu: %s", len, strerror(errno)); } @@ -105,16 +107,18 @@ void pcap(const char *pkt, size_t len) return; gettimeofday(&tv, NULL); - pcap_frame(&iov, 0, &tv); + pcap_frame(&iov, 1, 0, &tv); } /** * pcap_multiple() - Capture multiple frames - * @iov: Array of iovecs, one entry per frame + * @iov: Array of iovecs, every @framebufs entries is one frame + * @framebufs: Number of buffers per frame * @n: Number of frames to capture - * @offset: Offset of the frame within each iovec buffer + * @offset: Offset of the L2 frame within each iovec buffer */ -void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset) +void pcap_multiple(const struct iovec *iov, size_t framebufs, unsigned int n, + size_t offset) { struct timeval tv; unsigned int i; @@ -125,7 +129,7 @@ void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset) gettimeofday(&tv, NULL); for (i = 0; i < n; i++) - pcap_frame(iov + i, offset, &tv); + pcap_frame(iov + i * framebufs, framebufs, offset, &tv); } /** diff --git a/pcap.h b/pcap.h index da5a7e84..b9a2257e 100644 --- a/pcap.h +++ b/pcap.h @@ -7,7 +7,8 @@ #define PCAP_H void pcap(const char *pkt, size_t len); -void pcap_multiple(const struct iovec *iov, unsigned int n, size_t offset); +void pcap_multiple(const struct iovec *iov, size_t framebufs, unsigned int n, + size_t offset); void pcap_init(struct ctx *c); #endif /* PCAP_H */ diff --git a/tap.c b/tap.c index f15eba6e..7f0389f3 100644 --- a/tap.c +++ b/tap.c @@ -432,7 +432,7 @@ size_t tap_send_frames(const struct ctx *c, const struct iovec *iov, size_t n) if (m < n) debug("tap: failed to send %zu frames of %zu", n - m, n); - pcap_multiple(iov, m, c->mode == MODE_PASST ? sizeof(uint32_t) : 0); + pcap_multiple(iov, 1, n, c->mode == MODE_PASST ? sizeof(uint32_t) : 0); return m; } -- 2.43.2
On Thu, 22 Feb 2024 16:56:01 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:pcap_frame() explicitly takes a single frame, and only allows a single buffer (iovec) to be passed. pcap_multiple() takes multiple buffers, but explicitly expects exactly one frame per buffer. Future changes are going to want to split single frames across multiple buffers in some circumstances, so extend the pcap functions to allow for that. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 24 ++++++++++++++---------- pcap.h | 3 ++- tap.c | 2 +- 3 files changed, 17 insertions(+), 12 deletions(-) diff --git a/pcap.c b/pcap.c index eeb71a3c..9eb4f3d2 100644 --- a/pcap.c +++ b/pcap.c @@ -31,6 +31,7 @@ #include "util.h" #include "passt.h" #include "log.h" +#include "pcap.h" #define PCAP_VERSION_MINOR 4 @@ -67,14 +68,15 @@ struct pcap_pkthdr { /** * pcap_frame() - Capture a single frame to pcap file with given timestamp - * @iov: iovec referencing buffer containing frame (with L2 headers) - * @offset: Offset of the frame from @iov->iov_base + * @iov: IO vector containing frame (with L2 headers and tap headers) + * @iovcnt: Number of buffers (@iov entries) in frame + * @offset: Byte offset of the L2 headers within @iovI swear I didn't read "Byte offset" from here before commenting on 1/6, which is rather promising.* @tv: Timestamp * * Returns: 0 on success, -errno on error writing to the file */ -static void pcap_frame(const struct iovec *iov, size_t offset, - const struct timeval *tv) +static void pcap_frame(const struct iovec *iov, size_t iovcnt, + size_t offset, const struct timeval *tv) { size_t len = iov->iov_len - offset; struct pcap_pkthdr h = { @@ -86,7 +88,7 @@ static void pcap_frame(const struct iovec *iov, size_t offset, struct iovec hiov = { &h, sizeof(h) }; if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || - write_remainder(pcap_fd, iov, 1, offset) < 0) + write_remainder(pcap_fd, iov, iovcnt, offset) < 0) debug("Cannot log packet, length %zu: %s", len, strerror(errno)); } @@ -105,16 +107,18 @@ void pcap(const char *pkt, size_t len) return; gettimeofday(&tv, NULL); - pcap_frame(&iov, 0, &tv); + pcap_frame(&iov, 1, 0, &tv); } /** * pcap_multiple() - Capture multiple frames - * @iov: Array of iovecs, one entry per frame + * @iov: Array of iovecs, every @framebufs entries is one frame + * @framebufs: Number of buffers per frameI found this a bit hard to understand. What about: * @iov: Array of IO vectors, with item count @frame_parts * @n * @frame_parts: Number of IO vector items for each frame ? The rest of the series looks good to me. -- Stefano
When we determine we have sent a partial frame in tap_send_frames_passt(), we call tap_send_remainder() to send the remainder of it. The logic in that function is very similar to that in the more general write_remainder() except that it uses send() instead of write()/writev(). But we are dealing specifically with the qemu socket here, which is a connected stream socket. In that case write()s do the same thing as send() with the options we were using, so we can just reuse write_remainder(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tap.c | 29 ++++------------------------- 1 file changed, 4 insertions(+), 25 deletions(-) diff --git a/tap.c b/tap.c index 7f0389f3..d2c66acb 100644 --- a/tap.c +++ b/tap.c @@ -348,30 +348,6 @@ static size_t tap_send_frames_pasta(const struct ctx *c, return i; } -/** - * tap_send_remainder() - Send remainder of a partially sent frame - * @c: Execution context - * @iov: Partially sent buffer - * @offset: Number of bytes already sent from @iov - */ -static void tap_send_remainder(const struct ctx *c, const struct iovec *iov, - size_t offset) -{ - const char *base = (char *)iov->iov_base; - size_t len = iov->iov_len; - - while (offset < len) { - ssize_t sent = send(c->fd_tap, base + offset, len - offset, - MSG_NOSIGNAL); - if (sent < 0) { - err("tap: partial frame send (missing %zu bytes): %s", - len - offset, strerror(errno)); - return; - } - offset += sent; - } -} - /** * tap_send_frames_passt() - Send multiple frames to the passt tap * @c: Execution context @@ -402,7 +378,10 @@ static size_t tap_send_frames_passt(const struct ctx *c, if (i < n && offset) { /* A partial frame was sent */ - tap_send_remainder(c, &iov[i], offset); + if (write_remainder(c->fd_tap, &iov[i], 1, offset) < 0) { + err("tap: partial frame send: %s", strerror(errno)); + return i; + } i++; } -- 2.43.2