This... is not any of the things I said I would be working on. I can only say that a herd of very hairy yaks led me astray. Looking at bug 66 I spotted some problems with our handling of MTUs / maximum frame sizes. Looking at that I found some weirdness and some real, if minor, bugs in the sizing and handling of the packet pools. Changes in v2: * Stefano convinced me that packet_check_range() is still worthwhile. * So don't remove it... but in looking at it I spotted various flaws in the checks, so address those in a number of new patches. David Gibson (12): test focus hack: stop on fail, but not perf fail make passt dumpable packet: Use flexible array member in struct pool packet: Don't pass start and offset separately too packet_check_range() packet: Don't hard code maximum packet size to UINT16_MAX packet: Remove unhelpful packet_get_try() macro util: Add abort_with_msg() and ASSERT_WITH_MSG() helpers packet: Distinguish severities of different packet_{add,git}_do() errors packet: Move packet length checks into packet_check_range() tap: Don't size pool_tap[46] for the maximum number of packets packet: More cautious checks to avoid pointer arithmetic UB dhcpv6.c | 2 +- ip.c | 2 +- isolation.c | 2 +- packet.c | 106 ++++++++++++++++++++++---------------------------- packet.h | 19 ++++++--- passt.h | 2 - tap.c | 18 +++++++-- tap.h | 3 +- test/lib/term | 1 + test/lib/test | 4 +- test/run | 38 +++++++++--------- util.c | 19 +++++++++ util.h | 25 +++++------- vu_common.c | 34 ++++++++++------ 14 files changed, 153 insertions(+), 122 deletions(-) -- 2.47.1
--- test/run | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/test/run b/test/run index f188d8ea..fc710475 100755 --- a/test/run +++ b/test/run @@ -81,9 +81,9 @@ run() { test passt/shutdown teardown pasta - setup pasta_options - test pasta_options/log_to_file - teardown pasta_options + #setup pasta_options + #test pasta_options/log_to_file + #teardown pasta_options setup build test pasta_podman/bats @@ -132,24 +132,24 @@ run() { VALGRIND=0 VHOST_USER=0 - setup passt_in_ns - test passt/ndp - test passt_in_ns/dhcp - test perf/passt_tcp - test perf/passt_udp - test perf/pasta_tcp - test perf/pasta_udp - test passt_in_ns/shutdown - teardown passt_in_ns + #setup passt_in_ns + #test passt/ndp + #test passt_in_ns/dhcp + #test perf/passt_tcp + #test perf/passt_udp + #test perf/pasta_tcp + #test perf/pasta_udp + #test passt_in_ns/shutdown + #teardown passt_in_ns VHOST_USER=1 - setup passt_in_ns - test passt_vu/ndp - test passt_vu_in_ns/dhcp - test perf/passt_vu_tcp - test perf/passt_vu_udp - test passt_vu_in_ns/shutdown - teardown passt_in_ns + #setup passt_in_ns + #test passt_vu/ndp + #test passt_vu_in_ns/dhcp + #test perf/passt_vu_tcp + #test perf/passt_vu_udp + #test passt_vu_in_ns/shutdown + #teardown passt_in_ns # TODO: Make those faster by at least pre-installing gcc and make on # non-x86 images, then re-enable. -- 2.47.1
--- test/lib/term | 1 + test/lib/test | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/test/lib/term b/test/lib/term index ed690de8..81191714 100755 --- a/test/lib/term +++ b/test/lib/term @@ -413,6 +413,7 @@ info_failed() { [ ${FAST} -eq 1 ] || status_bar_blink + exit 1 pause_continue \ "Press any key to pause test session" \ "Resuming in " \ diff --git a/test/lib/test b/test/lib/test index e6726bea..91729af7 100755 --- a/test/lib/test +++ b/test/lib/test @@ -101,7 +101,7 @@ test_one_line() { IFS="${__ifs}" ;; "test") - [ ${TEST_ONE_perf_nok} -eq 0 ] || TEST_ONE_nok=1 + # [ ${TEST_ONE_perf_nok} -eq 0 ] || TEST_ONE_nok=1 [ ${TEST_ONE_nok} -eq 1 ] && status_test_fail [ ${TEST_ONE_nok} -eq 0 ] && status_test_ok @@ -374,7 +374,7 @@ test_one() { [ ${DEMO} -eq 1 ] && return [ ${TEST_ONE_skip} -eq 1 ] && status_test_skip && return - [ ${TEST_ONE_perf_nok} -eq 0 ] || TEST_ONE_nok=1 + # [ ${TEST_ONE_perf_nok} -eq 0 ] || TEST_ONE_nok=1 [ ${TEST_ONE_nok} -eq 0 ] && status_test_ok || status_test_fail } -- 2.47.1
--- isolation.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/isolation.c b/isolation.c index c944fb35..df58bb87 100644 --- a/isolation.c +++ b/isolation.c @@ -377,7 +377,7 @@ void isolate_postfork(const struct ctx *c) { struct sock_fprog prog; - prctl(PR_SET_DUMPABLE, 0); +// prctl(PR_SET_DUMPABLE, 0); switch (c->mode) { case MODE_PASST: -- 2.47.1
Currently we have a dummy pkt[1] array, which we alias with an array of a different size via various macros. However, we already require C11 which includes flexible array members, so we can do better. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- packet.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packet.h b/packet.h index 3f70e949..85ee5508 100644 --- a/packet.h +++ b/packet.h @@ -21,7 +21,7 @@ struct pool { size_t buf_size; size_t size; size_t count; - struct iovec pkt[1]; + struct iovec pkt[]; }; int vu_packet_check_range(void *buf, size_t offset, size_t len, -- 2.47.1
Fundamentally what packet_check_range() does is to check whether a given memory range is within the allowed / expected memory set aside for packets from a particular pool. That range could represent a whole packet (from packet_add_do()) or part of a packet (from packet_get_do()), but it doesn't really matter which. However, we pass the start of the range as two parameters: @start which is the start of the packet, and @offset which is the offset within the packet of the range we're interested in. We never use these separately, only as (start + offset). Simplify the interface of packet_check_range() and vu_packet_check_range() to directly take the start of the relevant range. This will allow some additional future improvements. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- packet.c | 36 +++++++++++++++++++----------------- packet.h | 3 +-- vu_common.c | 11 ++++------- 3 files changed, 24 insertions(+), 26 deletions(-) diff --git a/packet.c b/packet.c index 03a11e6a..0330b548 100644 --- a/packet.c +++ b/packet.c @@ -23,23 +23,22 @@ #include "log.h" /** - * packet_check_range() - Check if a packet memory range is valid + * packet_check_range() - Check if a memory range is valid for a pool * @p: Packet pool - * @offset: Offset of data range in packet descriptor + * @ptr: Start of desired data range * @len: Length of desired data range - * @start: Start of the packet descriptor * @func: For tracing: name of calling function * @line: For tracing: caller line of function call * * Return: 0 if the range is valid, -1 otherwise */ -static int packet_check_range(const struct pool *p, size_t offset, size_t len, - const char *start, const char *func, int line) +static int packet_check_range(const struct pool *p, const char *ptr, size_t len, + const char *func, int line) { if (p->buf_size == 0) { int ret; - ret = vu_packet_check_range((void *)p->buf, offset, len, start); + ret = vu_packet_check_range((void *)p->buf, ptr, len); if (ret == -1) trace("cannot find region, %s:%i", func, line); @@ -47,16 +46,16 @@ static int packet_check_range(const struct pool *p, size_t offset, size_t len, return ret; } - if (start < p->buf) { - trace("packet start %p before buffer start %p, " - "%s:%i", (void *)start, (void *)p->buf, func, line); + if (ptr < p->buf) { + trace("packet range start %p before buffer start %p, %s:%i", + (void *)ptr, (void *)p->buf, func, line); return -1; } - if (start + len + offset > p->buf + p->buf_size) { - trace("packet offset plus length %zu from size %zu, " - "%s:%i", start - p->buf + len + offset, - p->buf_size, func, line); + if (ptr + len > p->buf + p->buf_size) { + trace("packet range end %p after buffer end %p, %s:%i", + (void *)(ptr + len), (void *)(p->buf + p->buf_size), + func, line); return -1; } @@ -81,7 +80,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start, return; } - if (packet_check_range(p, 0, len, start, func, line)) + if (packet_check_range(p, start, len, func, line)) return; if (len > UINT16_MAX) { @@ -110,6 +109,8 @@ void packet_add_do(struct pool *p, size_t len, const char *start, void *packet_get_do(const struct pool *p, size_t idx, size_t offset, size_t len, size_t *left, const char *func, int line) { + char *ptr; + if (idx >= p->size || idx >= p->count) { if (func) { trace("packet %zu from pool size: %zu, count: %zu, " @@ -135,14 +136,15 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, return NULL; } - if (packet_check_range(p, offset, len, p->pkt[idx].iov_base, - func, line)) + ptr = (char *)p->pkt[idx].iov_base + offset; + + if (packet_check_range(p, ptr, len, func, line)) return NULL; if (left) *left = p->pkt[idx].iov_len - offset - len; - return (char *)p->pkt[idx].iov_base + offset; + return ptr; } /** diff --git a/packet.h b/packet.h index 85ee5508..bdc07fef 100644 --- a/packet.h +++ b/packet.h @@ -24,8 +24,7 @@ struct pool { struct iovec pkt[]; }; -int vu_packet_check_range(void *buf, size_t offset, size_t len, - const char *start); +int vu_packet_check_range(void *buf, const char *ptr, size_t len); void packet_add_do(struct pool *p, size_t len, const char *start, const char *func, int line); void *packet_get_do(const struct pool *p, const size_t idx, diff --git a/vu_common.c b/vu_common.c index 299b5a32..531f8786 100644 --- a/vu_common.c +++ b/vu_common.c @@ -22,14 +22,12 @@ * vu_packet_check_range() - Check if a given memory zone is contained in * a mapped guest memory region * @buf: Array of the available memory regions - * @offset: Offset of data range in packet descriptor + * @ptr: Start of desired data range * @size: Length of desired data range - * @start: Start of the packet descriptor * * Return: 0 if the zone is in a mapped memory region, -1 otherwise */ -int vu_packet_check_range(void *buf, size_t offset, size_t len, - const char *start) +int vu_packet_check_range(void *buf, const char *ptr, size_t len) { struct vu_dev_region *dev_region; @@ -37,9 +35,8 @@ int vu_packet_check_range(void *buf, size_t offset, size_t len, /* NOLINTNEXTLINE(performance-no-int-to-ptr) */ char *m = (char *)(uintptr_t)dev_region->mmap_addr; - if (m <= start && - start + offset + len <= m + dev_region->mmap_offset + - dev_region->size) + if (m <= ptr && + ptr + len <= m + dev_region->mmap_offset + dev_region->size) return 0; } -- 2.47.1
We verify that every packet we store in a pool - and every partial packet we retreive from it has a length no longer than UINT16_MAX. This originated in the older packet pool implementation which stored packet lengths in a uint16_t. Now, that packets are represented by a struct iovec with its size_t length, this check serves only as a sanity / security check that we don't have some wildly out of range length due to a bug elsewhere. However, UINT16_MAX (65535) isn't quite enough, because the "packet" as stored in the pool is in fact an entire frame including both L2 and any backend specific headers. We can exceed this in passt mode, even with the default MTU: 65520 bytes of IP datagram + 14 bytes of Ethernet header + 4 bytes of qemu stream length header = 65538 bytes. Introduce our own define for the maximum length of a packet in the pool and set it slightly larger, allowing 128 bytes for L2 and/or other backend specific headers. We'll use different amounts of that depending on the tap backend, but since this is just a sanity check, the bound doesn't need to be 100% tight. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- packet.c | 4 ++-- packet.h | 7 +++++++ 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/packet.c b/packet.c index 0330b548..bcac0375 100644 --- a/packet.c +++ b/packet.c @@ -83,7 +83,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start, if (packet_check_range(p, start, len, func, line)) return; - if (len > UINT16_MAX) { + if (len > PACKET_MAX_LEN) { trace("add packet length %zu, %s:%i", len, func, line); return; } @@ -119,7 +119,7 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, return NULL; } - if (len > UINT16_MAX) { + if (len > PACKET_MAX_LEN) { if (func) { trace("packet data length %zu, %s:%i", len, func, line); diff --git a/packet.h b/packet.h index bdc07fef..05d37695 100644 --- a/packet.h +++ b/packet.h @@ -6,6 +6,13 @@ #ifndef PACKET_H #define PACKET_H +/* Maximum size of a single packet in a pool, including all headers. Sized to + * allow a maximum size IP datagram (65535) plus some extra space or L2 and + * backend specific headers. This is just for sanity checking, so doesn't need + * to be a tight limit. + */ +#define PACKET_MAX_LEN ROUND_UP(IP_MAX_MTU + 128, sizeof(unsigned long)) + /** * struct pool - Generic pool of packets stored in a buffer * @buf: Buffer storing packet descriptors, -- 2.47.1
Two places in the code use the packet_get_try() variant on packet_get(). The difference is that packet_get_try() passes a NULL 'func' to packet_get_do(), which suppresses log messages. The places we use this are ones where we expect to sometimes have a failure retreiving the packet range, even in normal cases. So, suppressing the log messages seems like it makes sense, except: 1) It suppresses log messages on all errors. We (somewhat) expect to hit the case where the requested range is not within the received packet. However, it also suppresses message in cases where the requested packet index doesn't exist, the requested range has a nonsensical length or doesn't lie in even the right vague part of memory. None of those latter cases are expected, and the message would be helpful if we ever actually hit them. 2) The suppressed messages aren't actually that disruptive. For the case in ip.c, we'd log only if we run out of IPv6 packet before reaching a (non-option) L4 header. That shouldn't be the case in normal operation so getting a message (at trave level) is not unreasonable. For the case in dhcpv6.c we do suppress a message every time we look for but don't find a specific DHCPv6 option. That can happen in perfectly ok cases, but, again these are trace() level and DHCPv6 transactions aren't that common. Suppressing the message for this case doesn't seem worth the drawbacks of (1). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- dhcpv6.c | 2 +- ip.c | 2 +- packet.c | 18 +++++------------- packet.h | 3 --- 4 files changed, 7 insertions(+), 18 deletions(-) diff --git a/dhcpv6.c b/dhcpv6.c index 0523bba8..c0d05131 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -271,7 +271,7 @@ static struct opt_hdr *dhcpv6_opt(const struct pool *p, size_t *offset, if (!*offset) *offset = sizeof(struct udphdr) + sizeof(struct msg_hdr); - while ((o = packet_get_try(p, 0, *offset, sizeof(*o), &left))) { + while ((o = packet_get(p, 0, *offset, sizeof(*o), &left))) { unsigned int opt_len = ntohs(o->l) + sizeof(*o); if (ntohs(o->l) > left) diff --git a/ip.c b/ip.c index 2cc7f654..e391f81b 100644 --- a/ip.c +++ b/ip.c @@ -51,7 +51,7 @@ char *ipv6_l4hdr(const struct pool *p, int idx, size_t offset, uint8_t *proto, if (!IPV6_NH_OPT(nh)) goto found; - while ((o = packet_get_try(p, idx, offset, sizeof(*o), dlen))) { + while ((o = packet_get(p, idx, offset, sizeof(*o), dlen))) { nh = o->nexthdr; hdrlen = (o->hdrlen + 1) * 8; diff --git a/packet.c b/packet.c index bcac0375..c921aa15 100644 --- a/packet.c +++ b/packet.c @@ -112,27 +112,19 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, char *ptr; if (idx >= p->size || idx >= p->count) { - if (func) { - trace("packet %zu from pool size: %zu, count: %zu, " - "%s:%i", idx, p->size, p->count, func, line); - } + trace("packet %zu from pool size: %zu, count: %zu, %s:%i", + idx, p->size, p->count, func, line); return NULL; } if (len > PACKET_MAX_LEN) { - if (func) { - trace("packet data length %zu, %s:%i", - len, func, line); - } + trace("packet data length %zu, %s:%i", len, func, line); return NULL; } if (len + offset > p->pkt[idx].iov_len) { - if (func) { - trace("data length %zu, offset %zu from length %zu, " - "%s:%i", len, offset, p->pkt[idx].iov_len, - func, line); - } + trace("data length %zu, offset %zu from length %zu, %s:%i", + len, offset, p->pkt[idx].iov_len, func, line); return NULL; } diff --git a/packet.h b/packet.h index 05d37695..f95cda08 100644 --- a/packet.h +++ b/packet.h @@ -45,9 +45,6 @@ void pool_flush(struct pool *p); #define packet_get(p, idx, offset, len, left) \ packet_get_do(p, idx, offset, len, left, __func__, __LINE__) -#define packet_get_try(p, idx, offset, len, left) \ - packet_get_do(p, idx, offset, len, left, NULL, 0) - #define PACKET_POOL_DECL(_name, _size, _buf) \ struct _name ## _t { \ char *buf; \ -- 2.47.1
We already have the ASSERT() macro which will abort() passt based on a condition. It always has a fixed error message based on its location and the asserted expression. We have some upcoming cases where we want to customise the message when hitting an assert. Add abort_with_msg() and ASSERT_WITH_MSG() helpers to allow this. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- util.c | 19 +++++++++++++++++++ util.h | 25 ++++++++++--------------- 2 files changed, 29 insertions(+), 15 deletions(-) diff --git a/util.c b/util.c index 11973c44..a62c255b 100644 --- a/util.c +++ b/util.c @@ -837,3 +837,22 @@ void raw_random(void *buf, size_t buflen) if (random_read < buflen) die("Unexpected EOF on random data source"); } + +/** + * abort_with_msg() - Print error message and abort + * @fmt: Format string + * @...: Format parameters + */ +void abort_with_msg(const char *fmt, ...) +{ + va_list ap; + + va_start(ap, fmt); + vlogmsg(true, false, LOG_CRIT, fmt, ap); + va_end(ap); + + /* This may actually cause a SIGSYS instead of SIGABRT, due to seccomp, + * but that will still get the job done. + */ + abort(); +} diff --git a/util.h b/util.h index 3fa1d125..020e75a5 100644 --- a/util.h +++ b/util.h @@ -67,27 +67,22 @@ #define STRINGIFY(x) #x #define STR(x) STRINGIFY(x) -#ifdef CPPCHECK_6936 +void abort_with_msg(const char *fmt, ...) + __attribute__((format(printf, 1, 2), noreturn)); + /* Some cppcheck versions get confused by aborts inside a loop, causing * it to give false positive uninitialised variable warnings later in * the function, because it doesn't realise the non-initialising path * already exited. See https://trac.cppcheck.net/ticket/13227 + * + * Therefore, avoid using the usual do while wrapper we use to force the macro + * to act like a single statement requiring a ';'. */ -#define ASSERT(expr) \ - ((expr) ? (void)0 : abort()) -#else +#define ASSERT_WITH_MSG(expr, ...) \ + ((expr) ? (void)0 : abort_with_msg(__VA_ARGS__)) #define ASSERT(expr) \ - do { \ - if (!(expr)) { \ - err("ASSERTION FAILED in %s (%s:%d): %s", \ - __func__, __FILE__, __LINE__, STRINGIFY(expr)); \ - /* This may actually SIGSYS, due to seccomp, \ - * but that will still get the job done \ - */ \ - abort(); \ - } \ - } while (0) -#endif + ASSERT_WITH_MSG((expr), "ASSSERTION FAILED in %s (%s:%d): %s", \ + __func__, __FILE__, __LINE__, STRINGIFY(expr)) #ifdef P_tmpdir #define TMPDIR P_tmpdir -- 2.47.1
packet_add() and packet_get() can fail for a number of reasons, and we currently treat them all basically the same: we log a trace() level message and for packet_get() we return NULL. However the different causes of error are quite different and suggest different handling: 1) If we run out of space in the pool on add, that's (rightly) non-fatal, but is an unusual situation which we might want to know about. Promote it's message to debug() level. 2) All packet_check_range() errors and the checks on packet length indicate a serious problem. Due to either a bug in calling code, or some sort of memory-clobbering, we've been given addresses or sizes of packets that are nonsensical. If things are this bad, we shouldn't try to carry on replace these with asserts. 3) Requesting a packet index that doesn't exist in the pool in packet_get() indicates a bug in the caller - it should always check the index first. Replace this with an assert. 4) On packet_get() requesting a section of the packet beyond its bounds is correct already. This happens routinely from some callers when they probe to see if certain parts of the packet are present. At worst it indicates that the guest has generate a malformed packet, which we should quietly drop as we already do. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- packet.c | 71 ++++++++++++++++++++--------------------------------- packet.h | 3 ++- vu_common.c | 13 +++++++--- 3 files changed, 38 insertions(+), 49 deletions(-) diff --git a/packet.c b/packet.c index c921aa15..24f12448 100644 --- a/packet.c +++ b/packet.c @@ -30,37 +30,27 @@ * @func: For tracing: name of calling function * @line: For tracing: caller line of function call * - * Return: 0 if the range is valid, -1 otherwise + * ASSERT()s if the given range isn't valid (within the expected buffer(s) for + * the pool). */ -static int packet_check_range(const struct pool *p, const char *ptr, size_t len, - const char *func, int line) +static void packet_check_range(const struct pool *p, + const char *ptr, size_t len, + const char *func, int line) { if (p->buf_size == 0) { - int ret; - - ret = vu_packet_check_range((void *)p->buf, ptr, len); - - if (ret == -1) - trace("cannot find region, %s:%i", func, line); - - return ret; - } - - if (ptr < p->buf) { - trace("packet range start %p before buffer start %p, %s:%i", - (void *)ptr, (void *)p->buf, func, line); - return -1; - } - - if (ptr + len > p->buf + p->buf_size) { - trace("packet range end %p after buffer end %p, %s:%i", - (void *)(ptr + len), (void *)(p->buf + p->buf_size), - func, line); - return -1; + vu_packet_check_range((void *)p->buf, ptr, len, func, line); + return; } - return 0; + ASSERT_WITH_MSG(ptr >= p->buf, + "packet range start %p before buffer start %p, %s:%i", + (void *)ptr, (void *)p->buf, func, line); + ASSERT_WITH_MSG(ptr + len <= p->buf + p->buf_size, + "packet range end %p after buffer end %p, %s:%i", + (void *)(ptr + len), (void *)(p->buf + p->buf_size), + func, line); } + /** * packet_add_do() - Add data as packet descriptor to given pool * @p: Existing pool @@ -75,18 +65,16 @@ void packet_add_do(struct pool *p, size_t len, const char *start, size_t idx = p->count; if (idx >= p->size) { - trace("add packet index %zu to pool with size %zu, %s:%i", + debug("add packet index %zu to pool with size %zu, %s:%i", idx, p->size, func, line); return; } - if (packet_check_range(p, start, len, func, line)) - return; + packet_check_range(p, start, len, func, line); - if (len > PACKET_MAX_LEN) { - trace("add packet length %zu, %s:%i", len, func, line); - return; - } + ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, + "add packet length %zu (max %zu), %s:%i", + len, PACKET_MAX_LEN, func, line); p->pkt[idx].iov_base = (void *)start; p->pkt[idx].iov_len = len; @@ -111,16 +99,12 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, { char *ptr; - if (idx >= p->size || idx >= p->count) { - trace("packet %zu from pool size: %zu, count: %zu, %s:%i", - idx, p->size, p->count, func, line); - return NULL; - } - - if (len > PACKET_MAX_LEN) { - trace("packet data length %zu, %s:%i", len, func, line); - return NULL; - } + ASSERT_WITH_MSG(idx < p->size && idx < p->count, + "packet %zu from pool size: %zu, count: %zu, %s:%i", + idx, p->size, p->count, func, line); + ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, + "packet range length %zu (max %zu), %s:%i", + len, PACKET_MAX_LEN, func, line); if (len + offset > p->pkt[idx].iov_len) { trace("data length %zu, offset %zu from length %zu, %s:%i", @@ -130,8 +114,7 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, ptr = (char *)p->pkt[idx].iov_base + offset; - if (packet_check_range(p, ptr, len, func, line)) - return NULL; + packet_check_range(p, ptr, len, func, line); if (left) *left = p->pkt[idx].iov_len - offset - len; diff --git a/packet.h b/packet.h index f95cda08..b164f77e 100644 --- a/packet.h +++ b/packet.h @@ -31,7 +31,8 @@ struct pool { struct iovec pkt[]; }; -int vu_packet_check_range(void *buf, const char *ptr, size_t len); +void vu_packet_check_range(void *buf, const char *ptr, size_t len, + const char *func, int line); void packet_add_do(struct pool *p, size_t len, const char *start, const char *func, int line); void *packet_get_do(const struct pool *p, const size_t idx, diff --git a/vu_common.c b/vu_common.c index 531f8786..528b9b08 100644 --- a/vu_common.c +++ b/vu_common.c @@ -24,10 +24,13 @@ * @buf: Array of the available memory regions * @ptr: Start of desired data range * @size: Length of desired data range + * @func: For tracing: name of calling function + * @line: For tracing: caller line of function call * - * Return: 0 if the zone is in a mapped memory region, -1 otherwise + * Aborts if the zone isn't in any of the device regions */ -int vu_packet_check_range(void *buf, const char *ptr, size_t len) +void vu_packet_check_range(void *buf, const char *ptr, size_t len, + const char *func, int line) { struct vu_dev_region *dev_region; @@ -37,10 +40,12 @@ int vu_packet_check_range(void *buf, const char *ptr, size_t len) if (m <= ptr && ptr + len <= m + dev_region->mmap_offset + dev_region->size) - return 0; + return; } - return -1; + abort_with_msg( + "package range at %p, length %zd not within dev region %s:%i", + (void *)ptr, len, func, line); } /** -- 2.47.1
Both packet_add_do() and packet_get_do() have a check on the given length, essentially sanity checking it before validating that it's in an expected memory region. This can be folded into packet_check_range() which performs similar checks for both functions. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- packet.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/packet.c b/packet.c index 24f12448..9e0e6555 100644 --- a/packet.c +++ b/packet.c @@ -37,6 +37,10 @@ static void packet_check_range(const struct pool *p, const char *ptr, size_t len, const char *func, int line) { + ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, + "packet_check_range length %zu (max %zu), %s:%i", + len, PACKET_MAX_LEN, func, line); + if (p->buf_size == 0) { vu_packet_check_range((void *)p->buf, ptr, len, func, line); return; @@ -72,10 +76,6 @@ void packet_add_do(struct pool *p, size_t len, const char *start, packet_check_range(p, start, len, func, line); - ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, - "add packet length %zu (max %zu), %s:%i", - len, PACKET_MAX_LEN, func, line); - p->pkt[idx].iov_base = (void *)start; p->pkt[idx].iov_len = len; @@ -102,9 +102,6 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, ASSERT_WITH_MSG(idx < p->size && idx < p->count, "packet %zu from pool size: %zu, count: %zu, %s:%i", idx, p->size, p->count, func, line); - ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, - "packet range length %zu (max %zu), %s:%i", - len, PACKET_MAX_LEN, func, line); if (len + offset > p->pkt[idx].iov_len) { trace("data length %zu, offset %zu from length %zu, %s:%i", -- 2.47.1
Currently we attempt to size pool_tap[46] so they have room for the maximum possible number of packets that could fit in pkt_buf, TAP_MSGS. However, the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as the minimum possible L2 frame size. But, we don't enforce that L2 frames are at least ETH_ZLEN when we receive them from the tap backend, and since we're dealing with virtual interfaces we don't have the physical Ethernet limitations requiring that length. Indeed it is possible to generate a legitimate frame smaller than that (e.g. a zero-payload UDP/IPv4 frame on the 'pasta' backend is only 42 bytes long). It's also unclear if this limit is sufficient for vhost-user which isn't limited by the size of pkt_buf as the other modes are. We could attempt to correct the calculation, but that would leave us with even larger arrays, which in practice rarely accumulate more than a handful of packets. So, instead, put an arbitrary cap on the number of packets we can put in a batch, and if we run out of space, process and flush the batch. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- packet.c | 13 ++++++++++++- packet.h | 3 +++ passt.h | 2 -- tap.c | 18 +++++++++++++++--- tap.h | 3 ++- vu_common.c | 3 ++- 6 files changed, 34 insertions(+), 8 deletions(-) diff --git a/packet.c b/packet.c index 9e0e6555..c3b80924 100644 --- a/packet.c +++ b/packet.c @@ -55,6 +55,17 @@ static void packet_check_range(const struct pool *p, func, line); } +/** + * pool_full() - Is a packet pool full? + * @p: Pointer to packet pool + * + * Return: true if the pool is full, false if more packets can be added + */ +bool pool_full(const struct pool *p) +{ + return p->count >= p->size; +} + /** * packet_add_do() - Add data as packet descriptor to given pool * @p: Existing pool @@ -68,7 +79,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start, { size_t idx = p->count; - if (idx >= p->size) { + if (pool_full(p)) { debug("add packet index %zu to pool with size %zu, %s:%i", idx, p->size, func, line); return; diff --git a/packet.h b/packet.h index b164f77e..8d78841b 100644 --- a/packet.h +++ b/packet.h @@ -6,6 +6,8 @@ #ifndef PACKET_H #define PACKET_H +#include <stdbool.h> + /* Maximum size of a single packet in a pool, including all headers. Sized to * allow a maximum size IP datagram (65535) plus some extra space or L2 and * backend specific headers. This is just for sanity checking, so doesn't need @@ -38,6 +40,7 @@ void packet_add_do(struct pool *p, size_t len, const char *start, void *packet_get_do(const struct pool *p, const size_t idx, size_t offset, size_t len, size_t *left, const char *func, int line); +bool pool_full(const struct pool *p); void pool_flush(struct pool *p); #define packet_add(p, len, start) \ diff --git a/passt.h b/passt.h index 0dd4efa0..81b2787f 100644 --- a/passt.h +++ b/passt.h @@ -70,8 +70,6 @@ static_assert(sizeof(union epoll_ref) <= sizeof(union epoll_data), #define TAP_BUF_BYTES \ ROUND_DOWN(((ETH_MAX_MTU + sizeof(uint32_t)) * 128), PAGE_SIZE) -#define TAP_MSGS \ - DIV_ROUND_UP(TAP_BUF_BYTES, ETH_ZLEN - 2 * ETH_ALEN + sizeof(uint32_t)) #define PKT_BUF_BYTES MAX(TAP_BUF_BYTES, 0) extern char pkt_buf [PKT_BUF_BYTES]; diff --git a/tap.c b/tap.c index cd32a901..4421ce4d 100644 --- a/tap.c +++ b/tap.c @@ -61,6 +61,8 @@ #include "vhost_user.h" #include "vu_common.h" +#define TAP_MSGS 256 + /* IPv4 (plus ARP) and IPv6 message batches from tap/guest to IP handlers */ static PACKET_POOL_NOINIT(pool_tap4, TAP_MSGS, pkt_buf); static PACKET_POOL_NOINIT(pool_tap6, TAP_MSGS, pkt_buf); @@ -966,8 +968,10 @@ void tap_handler(struct ctx *c, const struct timespec *now) * @c: Execution context * @l2len: Total L2 packet length * @p: Packet buffer + * @now: Current timestamp */ -void tap_add_packet(struct ctx *c, ssize_t l2len, char *p) +void tap_add_packet(struct ctx *c, ssize_t l2len, char *p, + const struct timespec *now) { const struct ethhdr *eh; @@ -983,9 +987,17 @@ void tap_add_packet(struct ctx *c, ssize_t l2len, char *p) switch (ntohs(eh->h_proto)) { case ETH_P_ARP: case ETH_P_IP: + if (pool_full(pool_tap4)) { + tap4_handler(c, pool_tap4, now); + pool_flush(pool_tap4); + } packet_add(pool_tap4, l2len, p); break; case ETH_P_IPV6: + if (pool_full(pool_tap6)) { + tap6_handler(c, pool_tap6, now); + pool_flush(pool_tap6); + } packet_add(pool_tap6, l2len, p); break; default: @@ -1066,7 +1078,7 @@ static void tap_passt_input(struct ctx *c, const struct timespec *now) p += sizeof(uint32_t); n -= sizeof(uint32_t); - tap_add_packet(c, l2len, p); + tap_add_packet(c, l2len, p, now); p += l2len; n -= l2len; @@ -1129,7 +1141,7 @@ static void tap_pasta_input(struct ctx *c, const struct timespec *now) len > (ssize_t)ETH_MAX_MTU) continue; - tap_add_packet(c, len, pkt_buf + n); + tap_add_packet(c, len, pkt_buf + n, now); } tap_handler(c, now); diff --git a/tap.h b/tap.h index dfbd8b9e..a1b18430 100644 --- a/tap.h +++ b/tap.h @@ -74,6 +74,7 @@ void tap_sock_update_pool(void *base, size_t size); void tap_backend_init(struct ctx *c); void tap_flush_pools(void); void tap_handler(struct ctx *c, const struct timespec *now); -void tap_add_packet(struct ctx *c, ssize_t l2len, char *p); +void tap_add_packet(struct ctx *c, ssize_t l2len, char *p, + const struct timespec *now); #endif /* TAP_H */ diff --git a/vu_common.c b/vu_common.c index 528b9b08..6f49f873 100644 --- a/vu_common.c +++ b/vu_common.c @@ -187,7 +187,8 @@ static void vu_handle_tx(struct vu_dev *vdev, int index, tap_add_packet(vdev->context, elem[count].out_sg[0].iov_len - hdrlen, - (char *)elem[count].out_sg[0].iov_base + hdrlen); + (char *)elem[count].out_sg[0].iov_base + hdrlen, + now); count++; } tap_handler(vdev->context, now); -- 2.47.1
packet_check_range (and vu_packet_check_range()) verify that the packet or section of packet we're interested in lies in the packet buffer pool we expect it to. However, in doing so it doesn't avoid the possibility of an integer overflow while performing pointer arithmetic, with is UB. In fact, AFAICT it's UB even to use arbitrary pointer arithmetic to construct a pointer outside of a known valid buffer. To do this safely, we can't calculate the end of a memory region with pointer addition, at least if we're treating the length as untrusted. Instead we must work out the offset of one memory region within another using pointer subtraction, then do integer checks against the length of the outer region. We then need to be careful about the order of checks so that those integer checks can't themselves overflow. While addressing this, correct a flaw in vu_packet_check_range() where it only checked that the given packet piece was above the region's mmap address, not the base of the specific section of the mmap we expect to find it in (dev_region->mmap_addr + dev_region->mmap_offset). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- packet.c | 9 ++++++--- vu_common.c | 15 +++++++++++---- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/packet.c b/packet.c index c3b80924..04a7a829 100644 --- a/packet.c +++ b/packet.c @@ -49,9 +49,12 @@ static void packet_check_range(const struct pool *p, ASSERT_WITH_MSG(ptr >= p->buf, "packet range start %p before buffer start %p, %s:%i", (void *)ptr, (void *)p->buf, func, line); - ASSERT_WITH_MSG(ptr + len <= p->buf + p->buf_size, - "packet range end %p after buffer end %p, %s:%i", - (void *)(ptr + len), (void *)(p->buf + p->buf_size), + ASSERT_WITH_MSG(len <= p->buf_size, + "packet range length %zu larger than buffer %zu, %s:%i", + len, p->buf_size, func, line); + ASSERT_WITH_MSG((size_t)(ptr - p->buf) <= p->buf_size - len, +"packet range %p, len %zu extends beyond buffer %p, len %zu, %s:%i", + (void *)ptr, len, (void *)p->buf, p->buf_size, func, line); } diff --git a/vu_common.c b/vu_common.c index 6f49f873..3f7fb29d 100644 --- a/vu_common.c +++ b/vu_common.c @@ -35,16 +35,23 @@ void vu_packet_check_range(void *buf, const char *ptr, size_t len, struct vu_dev_region *dev_region; for (dev_region = buf; dev_region->mmap_addr; dev_region++) { + uintptr_t base_addr = dev_region->mmap_addr + + dev_region->mmap_offset; /* NOLINTNEXTLINE(performance-no-int-to-ptr) */ - char *m = (char *)(uintptr_t)dev_region->mmap_addr; + const char *base = (const char *)base_addr; - if (m <= ptr && - ptr + len <= m + dev_region->mmap_offset + dev_region->size) + ASSERT(base_addr >= dev_region->mmap_addr); + + if (len > dev_region->size) + continue; + + if (base <= ptr && + (size_t)(ptr - base) <= dev_region->size - len) return; } abort_with_msg( - "package range at %p, length %zd not within dev region %s:%i", + "packet range at %p, length %zd not within dev region, %s:%i", (void *)ptr, len, func, line); } -- 2.47.1
On Fri, Dec 20, 2024 at 07:35:23PM +1100, David Gibson wrote:This... is not any of the things I said I would be working on. I can only say that a herd of very hairy yaks led me astray. Looking at bug 66 I spotted some problems with our handling of MTUs / maximum frame sizes. Looking at that I found some weirdness and some real, if minor, bugs in the sizing and handling of the packet pools. Changes in v2: * Stefano convinced me that packet_check_range() is still worthwhile. * So don't remove it... but in looking at it I spotted various flaws in the checks, so address those in a number of new patches. David Gibson (12): test focus hack: stop on fail, but not perf fail make passt dumpableArgh, oops. Of course, 1..3 are test/debug patches that should not be included in the series.packet: Use flexible array member in struct pool packet: Don't pass start and offset separately too packet_check_range() packet: Don't hard code maximum packet size to UINT16_MAX packet: Remove unhelpful packet_get_try() macro util: Add abort_with_msg() and ASSERT_WITH_MSG() helpers packet: Distinguish severities of different packet_{add,git}_do() errors packet: Move packet length checks into packet_check_range() tap: Don't size pool_tap[46] for the maximum number of packets packet: More cautious checks to avoid pointer arithmetic UB dhcpv6.c | 2 +- ip.c | 2 +- isolation.c | 2 +- packet.c | 106 ++++++++++++++++++++++---------------------------- packet.h | 19 ++++++--- passt.h | 2 - tap.c | 18 +++++++-- tap.h | 3 +- test/lib/term | 1 + test/lib/test | 4 +- test/run | 38 +++++++++--------- util.c | 19 +++++++++ util.h | 25 +++++------- vu_common.c | 34 ++++++++++------ 14 files changed, 153 insertions(+), 122 deletions(-)-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri, 20 Dec 2024 20:00:13 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Fri, Dec 20, 2024 at 07:35:23PM +1100, David Gibson wrote:Yeah, it's fine, 4/12 to 12/12 apply cleanly anyway. -- StefanoThis... is not any of the things I said I would be working on. I can only say that a herd of very hairy yaks led me astray. Looking at bug 66 I spotted some problems with our handling of MTUs / maximum frame sizes. Looking at that I found some weirdness and some real, if minor, bugs in the sizing and handling of the packet pools. Changes in v2: * Stefano convinced me that packet_check_range() is still worthwhile. * So don't remove it... but in looking at it I spotted various flaws in the checks, so address those in a number of new patches. David Gibson (12): test focus hack: stop on fail, but not perf fail make passt dumpableArgh, oops. Of course, 1..3 are test/debug patches that should not be included in the series.