On Mon, Jan 06, 2025 at 11:55:25AM +0100, Stefano Brivio wrote:On Fri, 3 Jan 2025 16:06:57 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Actually, I realised this borderline case is addressed by the end of the series. The "hard" length check is moved into packet_check_range() and is always fatal. However, *before* we do that we indirectly check the range length against the packet length with if (len + offset > p->pkt[idx].iov_len) { If that fails, we hit a "soft" error (returning NULL), before we call packet_check_range() and possibly trigger a "hard" error (assert). I'll attempt to get the order of operations right when I rework this, so we don't have interim patches where this breaks.On Thu, Jan 02, 2025 at 11:00:08PM +0100, Stefano Brivio wrote:...generally, yes, but I'm sure we don't want to crash on: - a malformed DHCP request plus subtle bug (not saying that there's one, just that I could see myself adding one), dhcp(): packet_get(p, 0, offset + opt_off + 2, *olen, NULL); - a malformed TCP header plus some kind of bug, tcp_data_from_tap(): data = packet_get(p, i, off, len, NULL); Yes, those would need "bad errors". But the bad error here would crash us with an assert, and cause us to throw away one packet without it.On Thu, 2 Jan 2025 13:58:19 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:There's one, maybe two of the tests in packet_get_do() that don't indicate a serious error: if (len + offset > p->pkt[idx].iov_len) { is definitely ok, and you've convinced me I should try harder to make sure this doesn't print any error (even trace()) in at least the "try" cases. if (len > UINT16_MAX) { is questionable. Technically it could be ok, but since that parameter is generally a constant, chances are it does indicate a bad error in the caller.On Wed, Jan 01, 2025 at 10:54:41PM +0100, Stefano Brivio wrote: > On Fri, 20 Dec 2024 19:35:32 +1100 > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > packet_add() and packet_get() can fail for a number of reasons, and we > > currently treat them all basically the same: we log a trace() level message > > and for packet_get() we return NULL. However the different causes of > > error are quite different and suggest different handling: > > > > 1) If we run out of space in the pool on add, that's (rightly) non-fatal, > > but is an unusual situation which we might want to know about. Promote > > it's message to debug() level. > > > > 2) All packet_check_range() errors and the checks on packet length indicate > > a serious problem. Due to either a bug in calling code, or some sort > > of memory-clobbering, we've been given addresses or sizes of packets > > that are nonsensical. If things are this bad, we shouldn't try to carry > > on replace these with asserts. > > > > 3) Requesting a packet index that doesn't exist in the pool in packet_get() > > indicates a bug in the caller - it should always check the index first. > > Replace this with an assert. > > The reasons for 2) and 3) being trace() messages (they should probably > be debug() following the same reasoning as 1)) are graceful degradation > and security (availability). Hmm.. > If we have a packet triggering some sort of "memory clobbering", or an > issue in the caller, that might very well be just one packet or a few > of them. I see the case for packet_add_do(). In that case it could be that this is just a bad packet triggering bugs in the initial processing code, and just dropping it will be sufficient to save us from damaging state. For packet_get_do(), however, this indicates that we have a packet that's out of bounds, despite the fact that we checked its bounds when it was added to the pool. That means we *already* have corrupted memory, and we have no way to assess the extent of the damage. It's entirely possible we're already badly compromised. Under those circumstances I think it is much better to terminate immediately, rather than risk doing any more hard to trace damage.It depends on which part of packet_get_do(). If we're exceeding the packet count or the size of a pool, then I guess yes, otherwise not so much.True, the later patches are for different reasons. I'll reexamine what lives in the series.All the other checks, from packet_check_range() must indicate something very bad.I see the relationship to that up to patch 6/12. After that, not so much.Ok, but I'm still not really seeing how to distinguish the cases without the benefit of hindsight.> I think it's a disservice to users to crash in that case, and it has > the potential to worsen a security flaw if this packet can be built on > purpose. It's also a disservice to package maintainers because it has > the potential to turn a harmless issue into an annoying situation with > associated time pressure and everything. Ok, that argument makes sense to me for packet_add_do(), but again, not for packet_get_do(). > Those are not warn() calls, by the way, because we don't want to make > it too easy, in that (perhaps unlikely) case, to flood logs, and > possibly hide information due to rotation. Right, I see the argument, but it also worries me that the message - which even if we are successfully containing the damage here definitely indicates a bug in the caller - could easily never be seen at trace or even debug level. > I suppose that the argument for asserts here is so that we discover > functional issues as soon as possible, which I understand of course, That's the main reason. The secondary reason is that it clarifies the meaning of a NULL return from packet_get_do(). With this change it means: you requested a range that's not in the packet, whereas previously it could mean either that or "something is horribly wrong in the pool state, and we don't know what". > but it relies on two assumptions that might not hold: > > 1. the crash will be reported, while a debug() message will go > unnoticed. > > Well, the crash is more noticeable, but that doesn't mean that it > will be reported. Users might/will try to hack something around it > or try to survive with slirp4netns and all its flaws. I suppose so. > On the other hand, a malfunction (say, failed connections) might be > equally likely to be reported, along with debug() messages. That only applies if this causes malfunctions in intended behaviour. If it's triggered due to an attack, rather than a flaw handling expected behaviour then there's no reason it would be reported. > 2. the check and the assertion themselves are correct. This was not the > case for the two most recent severe issues (commit 61c0b0d0f199, bug > #105) we discovered through assertions. That's fair, I can see why you'd be gun shy about asserts given that. > Issues such as https://github.com/containers/podman/issues/22925, on > the other hand, are good examples of how asserts saved us a lot > of time and effort later on (compared to, say, "connections stop > working after three days"). But I would argue that those are > substantially different cases where assertions are checking > something that's much more inherent to the implementation. Eh... I can't really see a way these assertions are different other than that one turned out to be wrong and the other didn't.Ah, sorry, I meant that there is a substantial difference between the ones related to Podman issue 22925 *and what might happen in packet_get_range()*. Not that there's a difference between those related to Podman issue 22925 and those from commit 61c0b0d0f199 and bug #105.I mean, I see your point. This is basically following on from the MTU stuff - I was tracing what enforces / relies on packet/datagram/frame size limits and making sure they're consistent through the entire chain. At the moment we allow the user to set --mtu 65535 (i.e. 65535 byte IP datagrams, not including L2), but the pool layer (amongst other things) doesn't allow us to actually deliver it. This does cause user visible (though minor) problems, such as bug 65 & bug 66.> In this case, I would argue that there's no need to crash, so we > shouldn't. We can presumably carry on after that, and we're not leaking > any resource or sneakily leave behind some condition that will cause > apparently unrelated issues at a later time. Again, true (plausibly) for packet_add_do(), but not for packet_get_do(). > > 4) On packet_get() requesting a section of the packet beyond its bounds is > > correct already. This happens routinely from some callers when they > > probe to see if certain parts of the packet are present. At worst it > > indicates that the guest has generate a malformed packet, which we > > should quietly drop as we already do. This patch doesn't change that. We check against the stated packet bounds first and return NULL *before* we check against the buffer bounds and possibly trigger an assert.Yes... I wasn't actually disputing that (quote is from yourself).Ok... here's a tentative proposal, let me know what you think:So, first off, I have to say that it's a bit hard to reason about something that is not giving users any problem at the moment (well, at least not without a part of this series), and which didn't really give us problems in the past. It's difficult to quantify with experience and examples (even letting alone the priority of the whole matter for a moment...). Anyway, let me try:I understand that there might be inconsistencies (at least in your interpretation) but you wouldn't be making them any worse.No.I hope merging the two series will make what's going on clearer.I thought you implemented something like that, by the way, but I can't find it right now. Yes, a "warn once" thing would fix all this. I think we could happily warn() or err() on anything like that if we had it. By the way, same here, I guess that wouldn't really need to be in the same series as patches up to 6/12 here.Hm, fair point. I feel like at some point we're probably going to have to bite the bullet and implement rate-limited messages and/or some sort of "warn once" thing at some point.* I promote the packet_check_range() messages to warn(), or even err(). Yes, this might allow for log flooding, but I would argue that if we have a bug allowing this path to be triggered and something is frequently triggering it, we *want* that to be noisy. We *think* we're containing the damage here, but we don't really know.I still think it's our responsibility to not flood the system log in *any* circumstance. Remember that this is not necessarily *our* log where the worst that can happen is that we hide information. Even with sane system log configurations, one message per packet can get close to a substantial denial of service (especially with systemd-syslogd, see also https://github.com/systemd/systemd/issues/13425). I guess you don't realise until you run passt without -f, for example with libvirt, for a large number of virtual machines. Or in KubeVirt. Or for a bunch of containers. It's a common and very much intended usage.No. We've already checked the requested range against the bounds of the packet (and returned a soft error if it's out of bounds). If the range is bad once we reach packet_check_range(), then the packet is bad too.That's not necessarily the case: it could be that we just use an offset that extends beyond the end of the pool, due to bad/missing handling of some malformed packet/request. Maybe the packet is fine, but we're asking for something that's not.It's not true in general that we don't know. We can quantify the difference in damage very clearly if the attacker manages to make packet_check_range() print one line for each packet (and nothing else).I really don't see the difference. Since we're talking packet_get_do() here, this means that we've *already* inserted a packet in the pool which extends outside the buffer(s) it's supposed to be in.* I make packet_check_range() failures in packet_add_do() drop the packet and continue, as they used to * packet_check_range() failures in packet_get_do() will still assert()I guess the (start < p->buf) one makes sense, but not so much the one where we check that offset plus length is within the pool, because that one could be very well triggered by a specific packet itself.I'm talking about this one: if (start + len + offset > p->buf + p->buf_size) {-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibsonBut we can't have inserted this packet with packet_add_do(), because it checked the same things non-fatally. Which means either a packet has been inserted bypassing packet_add_do() or the pool metadata has been corrupted since then. Either of those situations indicates something very, very bad. > > * I leave the assert()s for packet _index_ out of range > > Okay, yes, those make sense to me as well. > > > > > Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> > > > > --- > > > > packet.c | 71 ++++++++++++++++++++--------------------------------- > > > > packet.h | 3 ++- > > > > vu_common.c | 13 +++++++--- > > > > 3 files changed, 38 insertions(+), 49 deletions(-) > > > > > > > > diff --git a/packet.c b/packet.c > > > > index c921aa15..24f12448 100644 > > > > --- a/packet.c > > > > +++ b/packet.c > > > > @@ -30,37 +30,27 @@ > > > > * @func: For tracing: name of calling function > > > > * @line: For tracing: caller line of function call > > > > * > > > > - * Return: 0 if the range is valid, -1 otherwise > > > > + * ASSERT()s if the given range isn't valid (within the expected buffer(s) for > > > > + * the pool). > > > > */ > > > > -static int packet_check_range(const struct pool *p, const char *ptr, size_t len, > > > > - const char *func, int line) > > > > +static void packet_check_range(const struct pool *p, > > > > + const char *ptr, size_t len, > > > > + const char *func, int line) > > > > { > > > > if (p->buf_size == 0) { > > > > - int ret; > > > > - > > > > - ret = vu_packet_check_range((void *)p->buf, ptr, len); > > > > - > > > > - if (ret == -1) > > > > - trace("cannot find region, %s:%i", func, line); > > > > - > > > > - return ret; > > > > - } > > > > - > > > > - if (ptr < p->buf) { > > > > - trace("packet range start %p before buffer start %p, %s:%i", > > > > - (void *)ptr, (void *)p->buf, func, line); > > > > - return -1; > > > > - } > > > > - > > > > - if (ptr + len > p->buf + p->buf_size) { > > > > - trace("packet range end %p after buffer end %p, %s:%i", > > > > - (void *)(ptr + len), (void *)(p->buf + p->buf_size), > > > > - func, line); > > > > - return -1; > > > > + vu_packet_check_range((void *)p->buf, ptr, len, func, line); > > > > + return; > > > > } > > > > > > > > - return 0; > > > > + ASSERT_WITH_MSG(ptr >= p->buf, > > > > + "packet range start %p before buffer start %p, %s:%i", > > > > + (void *)ptr, (void *)p->buf, func, line); > > > > + ASSERT_WITH_MSG(ptr + len <= p->buf + p->buf_size, > > > > + "packet range end %p after buffer end %p, %s:%i", > > > > + (void *)(ptr + len), (void *)(p->buf + p->buf_size), > > > > + func, line); > > > > } > > > > + > > > > /** > > > > * packet_add_do() - Add data as packet descriptor to given pool > > > > * @p: Existing pool > > > > @@ -75,18 +65,16 @@ void packet_add_do(struct pool *p, size_t len, const char *start, > > > > size_t idx = p->count; > > > > > > > > if (idx >= p->size) { > > > > - trace("add packet index %zu to pool with size %zu, %s:%i", > > > > + debug("add packet index %zu to pool with size %zu, %s:%i", > > > > idx, p->size, func, line); > > > > return; > > > > } > > > > > > > > - if (packet_check_range(p, start, len, func, line)) > > > > - return; > > > > + packet_check_range(p, start, len, func, line); > > > > > > > > - if (len > PACKET_MAX_LEN) { > > > > - trace("add packet length %zu, %s:%i", len, func, line); > > > > - return; > > > > - } > > > > + ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, > > > > + "add packet length %zu (max %zu), %s:%i", > > > > + len, PACKET_MAX_LEN, func, line); > > > > > > > > p->pkt[idx].iov_base = (void *)start; > > > > p->pkt[idx].iov_len = len; > > > > @@ -111,16 +99,12 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, > > > > { > > > > char *ptr; > > > > > > > > - if (idx >= p->size || idx >= p->count) { > > > > - trace("packet %zu from pool size: %zu, count: %zu, %s:%i", > > > > - idx, p->size, p->count, func, line); > > > > - return NULL; > > > > - } > > > > - > > > > - if (len > PACKET_MAX_LEN) { > > > > - trace("packet data length %zu, %s:%i", len, func, line); > > > > - return NULL; > > > > - } > > > > + ASSERT_WITH_MSG(idx < p->size && idx < p->count, > > > > + "packet %zu from pool size: %zu, count: %zu, %s:%i", > > > > + idx, p->size, p->count, func, line); > > > > + ASSERT_WITH_MSG(len <= PACKET_MAX_LEN, > > > > + "packet range length %zu (max %zu), %s:%i", > > > > + len, PACKET_MAX_LEN, func, line); > > > > > > > > if (len + offset > p->pkt[idx].iov_len) { > > > > trace("data length %zu, offset %zu from length %zu, %s:%i", > > > > @@ -130,8 +114,7 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset, > > > > > > > > ptr = (char *)p->pkt[idx].iov_base + offset; > > > > > > > > - if (packet_check_range(p, ptr, len, func, line)) > > > > - return NULL; > > > > + packet_check_range(p, ptr, len, func, line); > > > > > > > > if (left) > > > > *left = p->pkt[idx].iov_len - offset - len; > > > > diff --git a/packet.h b/packet.h > > > > index f95cda08..b164f77e 100644 > > > > --- a/packet.h > > > > +++ b/packet.h > > > > @@ -31,7 +31,8 @@ struct pool { > > > > struct iovec pkt[]; > > > > }; > > > > > > > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len); > > > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len, > > > > + const char *func, int line); > > > > void packet_add_do(struct pool *p, size_t len, const char *start, > > > > const char *func, int line); > > > > void *packet_get_do(const struct pool *p, const size_t idx, > > > > diff --git a/vu_common.c b/vu_common.c > > > > index 531f8786..528b9b08 100644 > > > > --- a/vu_common.c > > > > +++ b/vu_common.c > > > > @@ -24,10 +24,13 @@ > > > > * @buf: Array of the available memory regions > > > > * @ptr: Start of desired data range > > > > * @size: Length of desired data range > > > > + * @func: For tracing: name of calling function > > > > + * @line: For tracing: caller line of function call > > > > > > If those arguments are mandatory, I would drop the "For tracing" > > > specification. > > > > > > > * > > > > - * Return: 0 if the zone is in a mapped memory region, -1 otherwise > > > > + * Aborts if the zone isn't in any of the device regions > > > > */ > > > > -int vu_packet_check_range(void *buf, const char *ptr, size_t len) > > > > +void vu_packet_check_range(void *buf, const char *ptr, size_t len, > > > > + const char *func, int line) > > > > { > > > > struct vu_dev_region *dev_region; > > > > > > > > @@ -37,10 +40,12 @@ int vu_packet_check_range(void *buf, const char *ptr, size_t len) > > > > > > > > if (m <= ptr && > > > > ptr + len <= m + dev_region->mmap_offset + dev_region->size) > > > > - return 0; > > > > + return; > > > > } > > > > > > > > - return -1; > > > > + abort_with_msg( > > > > + "package range at %p, length %zd not within dev region %s:%i", > > > > > > s/package/packet/ > > > > > > > + (void *)ptr, len, func, line); > > > > } > > > > > > > > /**