With Markus Armbruster's help I re-examined the Coverity warning we've had in this function for a while. While I still don't understand some steps in the report it's giving, I did spot some real problems in the vicinity. It turns out fixing those also fixes the Coverity warning, so here we go. David Gibson (2): util: Add helper to write() all of a buffer util: Remove possible quadratic behaviour from write_remainder() pcap.c | 3 +-- util.c | 45 ++++++++++++++++++++++++++++++++++++--------- util.h | 1 + 3 files changed, 38 insertions(+), 11 deletions(-) -- 2.46.0
write(2) might not write all the data it is given. Add a write_all_buf() helper to keep calling it until all the given data is written, or we get an error. Currently we use write_remainder() to do this operation in pcap_frame(). That's a little awkward since it requires constructing an iovec, and future changes we want to make to write_remainder() will be easier in terms of this single buffer helper. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 3 +-- util.c | 25 +++++++++++++++++++++++++ util.h | 1 + 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/pcap.c b/pcap.c index 46cc4b0d..e6b5ced4 100644 --- a/pcap.c +++ b/pcap.c @@ -86,9 +86,8 @@ static void pcap_frame(const struct iovec *iov, size_t iovcnt, .caplen = l2len, .len = l2len }; - struct iovec hiov = { &h, sizeof(h) }; - if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || + if (write_all_buf(pcap_fd, &h, sizeof(h)) < 0 || write_remainder(pcap_fd, iov, iovcnt, offset) < 0) debug_perror("Cannot log packet, length %zu", l2len); } diff --git a/util.c b/util.c index eede4e58..7db7c2e7 100644 --- a/util.c +++ b/util.c @@ -582,6 +582,31 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, #endif } +/* write_all_buf() - write all of a buffer to an fd + * @fd: File descriptor + * @buf: Pointer to base of buffer + * @len: Length of buffer + * + * Return: 0 on success, -1 on error (with errno set) + * + * #syscalls write + */ +int write_all_buf(int fd, const void *buf, size_t len) +{ + const char *p = buf; + size_t left = len; + + while (left) { + ssize_t rc = write(fd, p, left); + + if (rc < 0) + return -1; + p += rc; + left -= rc; + } + return 0; +} + /* write_remainder() - write the tail of an IO vector to an fd * @fd: File descriptor * @iov: IO vector diff --git a/util.h b/util.h index c7a59d5d..5e67f1fa 100644 --- a/util.h +++ b/util.h @@ -200,6 +200,7 @@ void pidfile_write(int fd, pid_t pid); int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); +int write_all_buf(int fd, const void *buf, size_t len); int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip); void close_open_files(int argc, char **argv); -- 2.46.0
David Gibson <david(a)gibson.dropbear.id.au> writes:write(2) might not write all the data it is given. Add a write_all_buf() helper to keep calling it until all the given data is written, or we get an error. Currently we use write_remainder() to do this operation in pcap_frame(). That's a little awkward since it requires constructing an iovec, and future changes we want to make to write_remainder() will be easier in terms of this single buffer helper. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 3 +-- util.c | 25 +++++++++++++++++++++++++ util.h | 1 + 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/pcap.c b/pcap.c index 46cc4b0d..e6b5ced4 100644 --- a/pcap.c +++ b/pcap.c @@ -86,9 +86,8 @@ static void pcap_frame(const struct iovec *iov, size_t iovcnt, .caplen = l2len, .len = l2len }; - struct iovec hiov = { &h, sizeof(h) }; - if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || + if (write_all_buf(pcap_fd, &h, sizeof(h)) < 0 || write_remainder(pcap_fd, iov, iovcnt, offset) < 0) debug_perror("Cannot log packet, length %zu", l2len); } diff --git a/util.c b/util.c index eede4e58..7db7c2e7 100644 --- a/util.c +++ b/util.c @@ -582,6 +582,31 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, #endif } +/* write_all_buf() - write all of a buffer to an fd + * @fd: File descriptor + * @buf: Pointer to base of buffer + * @len: Length of buffer + * + * Return: 0 on success, -1 on error (with errno set) + * + * #syscalls write + */ +int write_all_buf(int fd, const void *buf, size_t len)I'd call it write_all(), but that's strictly a matter of taste.+{ + const char *p = buf; + size_t left = len; + + while (left) { + ssize_t rc = write(fd, p, left); + + if (rc < 0) + return -1;Loop when errno == EINTR?+ p += rc; + left -= rc; + } + return 0; +} + /* write_remainder() - write the tail of an IO vector to an fd * @fd: File descriptor * @iov: IO vector diff --git a/util.h b/util.h index c7a59d5d..5e67f1fa 100644 --- a/util.h +++ b/util.h @@ -200,6 +200,7 @@ void pidfile_write(int fd, pid_t pid); int __daemon(int pidfile_fd, int devnull_fd); int fls(unsigned long x); int write_file(const char *path, const char *buf); +int write_all_buf(int fd, const void *buf, size_t len); int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip); void close_open_files(int argc, char **argv);
On Wed, Sep 18, 2024 at 08:05:26AM +0200, Markus Armbruster wrote:David Gibson <david(a)gibson.dropbear.id.au> writes:I usually would too, but given we work pretty heavily with both plain buffers and iovs, I thought it was worth emphasising that this works on the former.write(2) might not write all the data it is given. Add a write_all_buf() helper to keep calling it until all the given data is written, or we get an error. Currently we use write_remainder() to do this operation in pcap_frame(). That's a little awkward since it requires constructing an iovec, and future changes we want to make to write_remainder() will be easier in terms of this single buffer helper. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- pcap.c | 3 +-- util.c | 25 +++++++++++++++++++++++++ util.h | 1 + 3 files changed, 27 insertions(+), 2 deletions(-) diff --git a/pcap.c b/pcap.c index 46cc4b0d..e6b5ced4 100644 --- a/pcap.c +++ b/pcap.c @@ -86,9 +86,8 @@ static void pcap_frame(const struct iovec *iov, size_t iovcnt, .caplen = l2len, .len = l2len }; - struct iovec hiov = { &h, sizeof(h) }; - if (write_remainder(pcap_fd, &hiov, 1, 0) < 0 || + if (write_all_buf(pcap_fd, &h, sizeof(h)) < 0 || write_remainder(pcap_fd, iov, iovcnt, offset) < 0) debug_perror("Cannot log packet, length %zu", l2len); } diff --git a/util.c b/util.c index eede4e58..7db7c2e7 100644 --- a/util.c +++ b/util.c @@ -582,6 +582,31 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags, #endif } +/* write_all_buf() - write all of a buffer to an fd + * @fd: File descriptor + * @buf: Pointer to base of buffer + * @len: Length of buffer + * + * Return: 0 on success, -1 on error (with errno set) + * + * #syscalls write + */ +int write_all_buf(int fd, const void *buf, size_t len)I'd call it write_all(), but that's strictly a matter of taste.Oh, good idea. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson+{ + const char *p = buf; + size_t left = len; + + while (left) { + ssize_t rc = write(fd, p, left); + + if (rc < 0) + return -1;Loop when errno == EINTR?
write_remainder() steps through the buffers in an IO vector writing out everything past a certain byte offset. However, on each iteration it rescans the buffer from the beginning to find out where we're up to. With an unfortunate set of write sizes this could lead to quadratic behaviour. In an even less likely set of circumstances (total vector length > maximum size_t) the 'skip' variable could overflow. This is one factor in a longstanding Coverity error we've seen (although I still can't figure out the remainder of its complaint). Rework write_remainder() to always work out our new position in the vector relative to our old/current position, rather than starting from the beginning each time. As a bonus this seems to fix the Coverity error. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/util.c b/util.c index 7db7c2e7..43e5f5ec 100644 --- a/util.c +++ b/util.c @@ -615,28 +615,30 @@ int write_all_buf(int fd, const void *buf, size_t len) * * Return: 0 on success, -1 on error (with errno set) * - * #syscalls write writev + * #syscalls writev */ int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip) { - size_t offset, i; + size_t i = 0, offset; - while ((i = iov_skip_bytes(iov, iovcnt, skip, &offset)) < iovcnt) { + while ((i += iov_skip_bytes(iov + i, iovcnt - i, skip, &offset)) < iovcnt) { ssize_t rc; if (offset) { - rc = write(fd, (char *)iov[i].iov_base + offset, - iov[i].iov_len - offset); - } else { - rc = writev(fd, &iov[i], iovcnt - i); + /* Write the remainder of the partially written buffer */ + if (write_all_buf(fd, (char *)iov[i].iov_base + offset, + iov[i].iov_len - offset) < 0) + return -1; + i++; } + /* Write as much of the remaining whole buffers as we can */ + rc = writev(fd, &iov[i], iovcnt - i); if (rc < 0) return -1; - skip += rc; + skip = rc; } - return 0; } -- 2.46.0
David Gibson <david(a)gibson.dropbear.id.au> writes:write_remainder() steps through the buffers in an IO vector writing out everything past a certain byte offset. However, on each iteration it rescans the buffer from the beginning to find out where we're up to. With an unfortunate set of write sizes this could lead to quadratic behaviour. In an even less likely set of circumstances (total vector length > maximum size_t) the 'skip' variable could overflow. This is one factor in a longstanding Coverity error we've seen (although I still can't figure out the remainder of its complaint). Rework write_remainder() to always work out our new position in the vector relative to our old/current position, rather than starting from the beginning each time. As a bonus this seems to fix the Coverity error. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/util.c b/util.c index 7db7c2e7..43e5f5ec 100644 --- a/util.c +++ b/util.c @@ -615,28 +615,30 @@ int write_all_buf(int fd, const void *buf, size_t len) * * Return: 0 on success, -1 on error (with errno set) * - * #syscalls write writev + * #syscalls writevI'm not familiar with this annotation. I guess it's system calls used in directly this function, not including the ones used in callees.*/ int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip) { - size_t offset, i; + size_t i = 0, offset; - while ((i = iov_skip_bytes(iov, iovcnt, skip, &offset)) < iovcnt) { + while ((i += iov_skip_bytes(iov + i, iovcnt - i, skip, &offset)) < iovcnt) { ssize_t rc; if (offset) { - rc = write(fd, (char *)iov[i].iov_base + offset, - iov[i].iov_len - offset); - } else { - rc = writev(fd, &iov[i], iovcnt - i); + /* Write the remainder of the partially written buffer */ + if (write_all_buf(fd, (char *)iov[i].iov_base + offset, + iov[i].iov_len - offset) < 0) + return -1; + i++; } + /* Write as much of the remaining whole buffers as we can */ + rc = writev(fd, &iov[i], iovcnt - i);Last argument can be zero now. Okay.if (rc < 0) return -1; - skip += rc; + skip = rc; } - return 0; }We could use try to save system calls by only using writev(), with a suitably adjusted copy of @iov. But non-zero @offset should be rare, so I guess it's not worth the bother. Reviewed-by: Markus Armbruster <armbru(a)redhat.com>
On Wed, Sep 18, 2024 at 08:20:25AM +0200, Markus Armbruster wrote:David Gibson <david(a)gibson.dropbear.id.au> writes:Basically, yes. They're just slurped up by a script to generate our seccomp() filter.write_remainder() steps through the buffers in an IO vector writing out everything past a certain byte offset. However, on each iteration it rescans the buffer from the beginning to find out where we're up to. With an unfortunate set of write sizes this could lead to quadratic behaviour. In an even less likely set of circumstances (total vector length > maximum size_t) the 'skip' variable could overflow. This is one factor in a longstanding Coverity error we've seen (although I still can't figure out the remainder of its complaint). Rework write_remainder() to always work out our new position in the vector relative to our old/current position, rather than starting from the beginning each time. As a bonus this seems to fix the Coverity error. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/util.c b/util.c index 7db7c2e7..43e5f5ec 100644 --- a/util.c +++ b/util.c @@ -615,28 +615,30 @@ int write_all_buf(int fd, const void *buf, size_t len) * * Return: 0 on success, -1 on error (with errno set) * - * #syscalls write writev + * #syscalls writevI'm not familiar with this annotation. I guess it's system calls used in directly this function, not including the ones used in callees.Yes.*/ int write_remainder(int fd, const struct iovec *iov, size_t iovcnt, size_t skip) { - size_t offset, i; + size_t i = 0, offset; - while ((i = iov_skip_bytes(iov, iovcnt, skip, &offset)) < iovcnt) { + while ((i += iov_skip_bytes(iov + i, iovcnt - i, skip, &offset)) < iovcnt) { ssize_t rc; if (offset) { - rc = write(fd, (char *)iov[i].iov_base + offset, - iov[i].iov_len - offset); - } else { - rc = writev(fd, &iov[i], iovcnt - i); + /* Write the remainder of the partially written buffer */ + if (write_all_buf(fd, (char *)iov[i].iov_base + offset, + iov[i].iov_len - offset) < 0) + return -1; + i++; } + /* Write as much of the remaining whole buffers as we can */ + rc = writev(fd, &iov[i], iovcnt - i);Last argument can be zero now. Okay.Actually, they're not rare - we nearly always pass a non-zero initial skip value. But write_remainder() itself is somewhat rare, and making a copy of the iov is kind of structurally awkward in passt (we don't allocate). -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibsonif (rc < 0) return -1; - skip += rc; + skip = rc; } - return 0; }We could use try to save system calls by only using writev(), with a suitably adjusted copy of @iov. But non-zero @offset should be rare, so I guess it's not worth the bother.