[PATCH v2 00/11] Improve robustness of calculations related to frame size limits
There are a number of places where we make calculations and checks around how large frames can be and where they sit in memory. Several of these are roughly correct, but can be wrong in certain edge cases. Improve robustness by clarifying what we're doing and being more careful about the edge cases. v2: * Added additional patches 5..11 * Patches 1..4 rebased but unchanged David Gibson (11): vu_common: Tighten vu_packet_check_range() packet: More cautious checks to avoid pointer arithmetic UB tap: Make size of pool_tap[46] purely a tuning parameter tap: Clarify calculation of TAP_MSGS packet: Correct type of PACKET_MAX_LEN packet: Avoid integer overflows in packet_get_do() packet: Move checks against PACKET_MAX_LEN to packet_check_range() packet: Rework packet_get() versus packet_get_try() util: Add abort_with_msg() and ASSERT_WITH_MSG() helpers packet: ASSERT on signs of pool corruption packet: Upgrade severity of most packet errors packet.c | 110 ++++++++++++++++++++++++++++++++++------------------ packet.h | 13 +++++-- passt.h | 2 - tap.c | 43 ++++++++++++++++---- tap.h | 3 +- util.c | 19 +++++++++ util.h | 25 +++++------- vu_common.c | 15 ++++--- 8 files changed, 158 insertions(+), 72 deletions(-) -- 2.48.1
This function verifies that the given packet is within the mmap()ed memory
region of the vhost-user device. We can do better, however. The packet
should be not only within the mmap()ed range, but specifically in the
subsection of that range set aside for shared buffers, which starts at
dev_region->mmap_offset within there.
Signed-off-by: David Gibson
packet_check_range and vu_packet_check_range() verify that the packet or
section of packet we're interested in lies in the packet buffer pool we
expect it to. However, in doing so it doesn't avoid the possibility of
an integer overflow while performing pointer arithmetic, with is UB. In
fact, AFAICT it's UB even to use arbitrary pointer arithmetic to construct
a pointer outside of a known valid buffer.
To do this safely, we can't calculate the end of a memory region with
pointer addition when then the length as untrusted. Instead we must work
out the offset of one memory region within another using pointer
subtraction, then do integer checks against the length of the outer region.
We then need to be careful about the order of checks so that those integer
checks can't themselves overflow.
Signed-off-by: David Gibson
Currently we attempt to size pool_tap[46] so they have room for the maximum
possible number of packets that could fit in pkt_buf (TAP_MSGS). However,
the calculation isn't quite correct: TAP_MSGS is based on ETH_ZLEN (60) as
the minimum possible L2 frame size. But ETH_ZLEN is based on physical
constraints of Ethernet, which don't apply to our virtual devices. It is
possible to generate a legitimate frame smaller than this, for example an
empty payload UDP/IPv4 frame on the 'pasta' backend is only 42 bytes long.
Further more, the same limit applies for vhost-user, which is not limited
by the size of pkt_buf like the other backends. In that case we don't even
have full control of the maximum buffer size, so we can't really calculate
how many packets could fit in there.
If we exceed do TAP_MSGS we'll drop packets, not just use more batches,
which is moderately bad. The fact that this needs to be sized just so for
correctness not merely for tuning is a fairly non-obvious coupling between
different parts of the code.
To make this more robust, alter the tap code so it doesn't rely on
everything fitting in a single batch of TAP_MSGS packets, instead breaking
into multiple batches as necessary. This leaves TAP_MSGS as purely a
tuning parameter, which we can freely adjust based on performance measures.
Signed-off-by: David Gibson
The rationale behind the calculation of TAP_MSGS isn't necessarily obvious.
It's supposed to be the maximum number of packets that can fit in pkt_buf.
However, the calculation is wrong in several ways:
* It's based on ETH_ZLEN which isn't meaningful for virtual devices
* It always includes the qemu socket header which isn't used for pasta
* The size of pkt_buf isn't relevant for vhost-user
We've already made sure this is just a tuning parameter, not a hard limit.
Clarify what we're calculating here and why.
Signed-off-by: David Gibson
PACKET_MAX_LEN is usually involved in calculations on size_t values - the
type of the iov_len field in struct iovec. However, being defined bare as
UINT16_MAX, the compiled is likely to assign it a shorter type. This can
lead to unexpected promotions (or lack thereof). Add a cast to force the
type to be what we expect.
Fixes: c43972ad6 ("packet: Give explicit name to maximum packet size")
Signed-off-by: David Gibson
In packet_get_do() both offset and len are essentially untrusted. We do
some validation of len (check it's < PACKET_MAX_LEN), but that's not enough
to ensure that (len + offset) doesn't overflow. Rearrange our calculation
to make sure it's safe regardless of the given offset & len values.
Signed-off-by: David Gibson
Both the callers of packet_check_range() separately verify that the given
length does not exceed PACKET_MAX_LEN. Fold that check into
packet_check_range() instead.
Signed-off-by: David Gibson
Most failures of packet_get() indicate a serious problem, and log messages
accordingly. However, a few callers expect failures here, because they're
probing for a certain range which might or might not be in a packet. They
use packet_get_try() which passes a NULL func to packet_get_do() to
suppress the logging which is unwanted in this case.
However, this doesn't just suppress the log when packet_get_do() finds the
requested region isn't in the packet. It suppresses logging for all other
errors too, which do indicate serious problems, even for the callers of
packet_get_try(). Worse it will pass the NULL func on to
packet_check_range() which doesn't expect it, meaning we'll get unhelpful
messages from there if there is a failure.
Fix this by making packet_get_try_do() the primary function which doesn't
log for the case of a range outside the packet. packet_get_do() becomes a
trivial wrapper around that which logs a message if packet_get_try_do()
returns NULL.
Signed-off-by: David Gibson
We already have the ASSERT() macro which will abort() passt based on a
condition. It always has a fixed error message based on its location and
the asserted expression. We have some upcoming cases where we want to
customise the message when hitting an assert.
Add abort_with_msg() and ASSERT_WITH_MSG() helpers to allow this.
Signed-off-by: David Gibson
If packet_check_range() fails in packet_get_try_do() we just return NULL.
But this check only takes places after we've already validated the given
range against the packet it's in. That means that if packet_check_range()
fails, the packet pool is already in a corrupted state (we should have
made strictly stronger checks when the packet was added). Simply returning
NULL and logging a trace() level message isn't really adequate for that
situation; ASSERT instead.
Similarly we check the given idx against both p->count and p->size. The
latter should be redundant, because count should always be <= size. If
that's not the case then, again, the pool is already in a corrupted state
and we may have overwritten unknown memory. Assert for this case too.
Signed-off-by: David Gibson
All errors from packet_range_check(), packet_add() and packet_get() are
trace level. However, these are for the most part actual error conditions.
They're states that should not happen, in many cases indicating a bug
in the caller or elswhere.
We don't promote these to err() or ASSERT() level, for fear of a localised
bug on very specific input crashing the entire program, or flooding the
logs, but we can at least upgrade them to debug level.
Signed-off-by: David Gibson
On Mon, 17 Mar 2025 20:24:13 +1100
David Gibson
There are a number of places where we make calculations and checks around how large frames can be and where they sit in memory. Several of these are roughly correct, but can be wrong in certain edge cases. Improve robustness by clarifying what we're doing and being more careful about the edge cases.
v2: * Added additional patches 5..11 * Patches 1..4 rebased but unchanged
Applied... it took me a while to convince myself that the refactored checks in 2/11 and 10/11 (functionally different in one case) are in fact equivalent (and functionally equivalent in the bigger picture for that idx >= p->size now gone from 10/11), but yes, of course, they weren't robust earlier. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio