Re: [PATCH v2 09/12] packet: Distinguish severities of different packet_{add,git}_do() errors

8 Jan 2025

On Mon, Jan 06, 2025 at 11:55:25AM +0100, Stefano Brivio wrote:
...
On Fri, 3 Jan 2025 16:06:57 +1100
David Gibson  wrote:
...
On Thu, Jan 02, 2025 at 11:00:08PM +0100, Stefano Brivio wrote:
...
On Thu, 2 Jan 2025 13:58:19 +1100
David Gibson  wrote:
...
On Wed, Jan 01, 2025 at 10:54:41PM +0100, Stefano Brivio wrote:
...
On Fri, 20 Dec 2024 19:35:32 +1100
David Gibson  wrote:
...
packet_add() and packet_get() can fail for a number of reasons, and we
currently treat them all basically the same: we log a trace() level message
and for packet_get() we return NULL.  However the different causes of
error are quite different and suggest different handling:
1) If we run out of space in the pool on add, that's (rightly) non-fatal,
   but is an unusual situation which we might want to know about.  Promote
   it's message to debug() level.
2) All packet_check_range() errors and the checks on packet length indicate
   a serious problem.  Due to either a bug in calling code, or some sort
   of memory-clobbering, we've been given addresses or sizes of packets
   that are nonsensical.  If things are this bad, we shouldn't try to carry
   on replace these with asserts.
3) Requesting a packet index that doesn't exist in the pool in packet_get()
   indicates a bug in the caller - it should always check the index first.
   Replace this with an assert.
The reasons for 2) and 3) being trace() messages (they should probably
be debug() following the same reasoning as 1)) are graceful degradation
and security (availability).
Hmm..
...
If we have a packet triggering some sort of "memory clobbering", or an
issue in the caller, that might very well be just one packet or a few
of them.
I see the case for packet_add_do().  In that case it could be that
this is just a bad packet triggering bugs in the initial processing
code, and just dropping it will be sufficient to save us from damaging
state.
For packet_get_do(), however, this indicates that we have a packet
that's out of bounds, despite the fact that we checked its bounds when
it was added to the pool.  That means we *already* have corrupted
memory, and we have no way to assess the extent of the damage.  It's
entirely possible we're already badly compromised.  Under those
circumstances I think it is much better to terminate immediately,
rather than risk doing any more hard to trace damage.
It depends on which part of packet_get_do(). If we're exceeding the
packet count or the size of a pool, then I guess yes, otherwise not so
much.
There's one, maybe two of the tests in packet_get_do() that don't
indicate a serious error:
  if (len + offset > p->pkt[idx].iov_len) {
is definitely ok, and you've convinced me I should try harder to make
sure this doesn't print any error (even trace()) in at least the "try"
cases.
  if (len > UINT16_MAX) {
is questionable.  Technically it could be ok, but since that parameter
is generally a constant, chances are it does indicate a bad error in
the caller.
...generally, yes, but I'm sure we don't want to crash on:
- a malformed DHCP request plus subtle bug (not saying that there's
  one, just that I could see myself adding one), dhcp():
packet_get(p, 0, offset + opt_off + 2,   *olen,  NULL);
- a malformed TCP header plus some kind of bug, tcp_data_from_tap():
data = packet_get(p, i, off, len, NULL);
Yes, those would need "bad errors". But the bad error here would crash
us with an assert, and cause us to throw away one packet without it.
Actually, I realised this borderline case is addressed by the end of
the series.  The "hard" length check is moved into
packet_check_range() and is always fatal.  However, *before* we do
that we indirectly check the range length against the packet length
with
        if (len + offset > p->pkt[idx].iov_len) {

If that fails, we hit a "soft" error (returning NULL), before we call
packet_check_range() and possibly trigger a "hard" error (assert).

I'll attempt to get the order of operations right when I rework this,
so we don't have interim patches where this breaks.
...
...
All the other checks, from packet_check_range() must indicate
something very bad.
...
...
...
I think it's a disservice to users to crash in that case, and it has
the potential to worsen a security flaw if this packet can be built on
purpose. It's also a disservice to package maintainers because it has
the potential to turn a harmless issue into an annoying situation with
associated time pressure and everything.
Ok, that argument makes sense to me for packet_add_do(), but again,
not for packet_get_do().
...
Those are not warn() calls, by the way, because we don't want to make
it too easy, in that (perhaps unlikely) case, to flood logs, and
possibly hide information due to rotation.
Right, I see the argument, but it also worries me that the message -
which even if we are successfully containing the damage here
definitely indicates a bug in the caller - could easily never be seen
at trace or even debug level.
...
I suppose that the argument for asserts here is so that we discover
functional issues as soon as possible, which I understand of course,
That's the main reason.  The secondary reason is that it clarifies the
meaning of a NULL return from packet_get_do().  With this change it
means: you requested a range that's not in the packet, whereas
previously it could mean either that or "something is horribly wrong
in the pool state, and we don't know what".
...
but it relies on two assumptions that might not hold:
1. the crash will be reported, while a debug() message will go
   unnoticed.
Well, the crash is more noticeable, but that doesn't mean that it
   will be reported. Users might/will try to hack something around it
   or try to survive with slirp4netns and all its flaws.
I suppose so.
...
On the other hand, a malfunction (say, failed connections) might be
   equally likely to be reported, along with debug() messages.
That only applies if this causes malfunctions in intended behaviour.
If it's triggered due to an attack, rather than a flaw handling
expected behaviour then there's no reason it would be reported.
...
2. the check and the assertion themselves are correct. This was not the
   case for the two most recent severe issues (commit 61c0b0d0f199, bug
   #105) we discovered through assertions.
That's fair, I can see why you'd be gun shy about asserts given that.
...
Issues such as https://github.com/containers/podman/issues/22925, on
   the other hand, are good examples of how asserts saved us a lot
   of time and effort later on (compared to, say, "connections stop
   working after three days"). But I would argue that those are
   substantially different cases where assertions are checking
   something that's much more inherent to the implementation.
Eh... I can't really see a way these assertions are different other
than that one turned out to be wrong and the other didn't.
Ah, sorry, I meant that there is a substantial difference between the
ones related to Podman issue 22925 *and what might happen in
packet_get_range()*.
Not that there's a difference between those related to Podman issue
22925 and those from commit 61c0b0d0f199 and bug #105.
Ok, but I'm still not really seeing how to distinguish the cases
without the benefit of hindsight.
...
...
...
In this case, I would argue that there's no need to crash, so we
shouldn't. We can presumably carry on after that, and we're not leaking
any resource or sneakily leave behind some condition that will cause
apparently unrelated issues at a later time.
Again, true (plausibly) for packet_add_do(), but not for
packet_get_do().
...
...
4) On packet_get() requesting a section of the packet beyond its bounds is
   correct already.  This happens routinely from some callers when they
   probe to see if certain parts of the packet are present.  At worst it
   indicates that the guest has generate a malformed packet, which we
   should quietly drop as we already do.
This patch doesn't change that.  We check against the stated packet
bounds first and return NULL *before* we check against the buffer
bounds and possibly trigger an assert.
Yes... I wasn't actually disputing that (quote is from yourself).
...
Ok... here's a tentative proposal, let me know what you think:
So, first off, I have to say that it's a bit hard to reason about
something that is not giving users any problem at the moment (well, at
least not without a part of this series), and which didn't really give
us problems in the past.
It's difficult to quantify with experience and examples (even letting
alone the priority of the whole matter for a moment...). Anyway, let me
try:
I mean, I see your point.  This is basically following on from the MTU
stuff - I was tracing what enforces / relies on packet/datagram/frame
size limits and making sure they're consistent through the entire
chain.  At the moment we allow the user to set --mtu 65535 (i.e. 65535
byte IP datagrams, not including L2), but the pool layer (amongst
other things) doesn't allow us to actually deliver it.  This does
cause user visible (though minor) problems, such as bug 65 & bug 66.
I see the relationship to that up to patch 6/12. After that, not so
much.
True, the later patches are for different reasons.  I'll reexamine
what lives in the series.
...
I understand that there might be inconsistencies (at least in your
interpretation) but you wouldn't be making them any worse.
...
I hope merging the two series will make what's going on clearer.
...
...
* I promote the packet_check_range() messages to warn(), or even
    err().  Yes, this might allow for log flooding, but I would argue
    that if we have a bug allowing this path to be triggered and
    something is frequently triggering it, we *want* that to be noisy.
    We *think* we're containing the damage here, but we don't really
    know.
I still think it's our responsibility to not flood the system log in
*any* circumstance.
Remember that this is not necessarily *our* log where the worst that
can happen is that we hide information. Even with sane system log
configurations, one message per packet can get close to a substantial
denial of service (especially with systemd-syslogd, see also
https://github.com/systemd/systemd/issues/13425).
I guess you don't realise until you run passt without -f, for example
with libvirt, for a large number of virtual machines. Or in KubeVirt.
Or for a bunch of containers. It's a common and very much intended
usage.
Hm, fair point.  I feel like at some point we're probably going to
have to bite the bullet and implement rate-limited messages and/or
some sort of "warn once" thing at some point.
I thought you implemented something like that, by the way, but I can't
find it right now.
Yes, a "warn once" thing would fix all this. I think we could happily
warn() or err() on anything like that if we had it.
By the way, same here, I guess that wouldn't really need to be in the
same series as patches up to 6/12 here.
No.
...
...
...
It's not true in general that we don't know. We can quantify the
difference in damage very clearly if the attacker manages to make
packet_check_range() print one line for each packet (and nothing
else).
...
* I make packet_check_range() failures in packet_add_do() drop the
    packet and continue, as they used to
  * packet_check_range() failures in packet_get_do() will still
    assert()
I guess the (start < p->buf) one makes sense, but not so much the one
where we check that offset plus length is within the pool, because that
one could be very well triggered by a specific packet itself.
I really don't see the difference.  Since we're talking
packet_get_do() here, this means that we've *already* inserted a
packet in the pool which extends outside the buffer(s) it's supposed
to be in.
That's not necessarily the case: it could be that we just use an offset
that extends beyond the end of the pool, due to bad/missing handling of
some malformed packet/request.
Maybe the packet is fine, but we're asking for something that's not.
No.  We've already checked the requested range against the bounds of
the packet (and returned a soft error if it's out of bounds).  If the
range is bad once we reach packet_check_range(), then the packet is
bad too.
...
I'm talking about this one:
if (start + len + offset > p->buf + p->buf_size) {
...
But we can't have inserted this packet with
packet_add_do(), because it checked the same things non-fatally.
Which means either a packet has been inserted bypassing
packet_add_do() or the pool metadata has been corrupted since then.
Either of those situations indicates something very, very bad.
...
...
* I leave the assert()s for packet _index_ out of range
Okay, yes, those make sense to me as well.
...
...
...
Signed-off-by: David Gibson 
---
 packet.c    | 71 ++++++++++++++++++++---------------------------------
 packet.h    |  3 ++-
 vu_common.c | 13 +++++++---
 3 files changed, 38 insertions(+), 49 deletions(-)

diff --git a/packet.c b/packet.c
index c921aa15..24f12448 100644
--- a/packet.c
+++ b/packet.c
@@ -30,37 +30,27 @@
  * @func:	For tracing: name of calling function
  * @line:	For tracing: caller line of function call
  *
- * Return: 0 if the range is valid, -1 otherwise
+ * ASSERT()s if the given range isn't valid (within the expected buffer(s) for
+ * the pool).
  */
-static int packet_check_range(const struct pool *p, const char *ptr, size_t len,
-			      const char *func, int line)
+static void packet_check_range(const struct pool *p,
+			       const char *ptr, size_t len,
+			       const char *func, int line)
 {
  if (p->buf_size == 0) {
-		int ret;
-
-		ret = vu_packet_check_range((void *)p->buf, ptr, len);
-
-		if (ret == -1)
-			trace("cannot find region, %s:%i", func, line);
-
-		return ret;
-	}
-
-	if (ptr < p->buf) {
-		trace("packet range start %p before buffer start %p, %s:%i",
-		      (void *)ptr, (void *)p->buf, func, line);
-		return -1;
-	}
-
-	if (ptr + len > p->buf + p->buf_size) {
-		trace("packet range end %p after buffer end %p, %s:%i",
-		      (void *)(ptr + len), (void *)(p->buf + p->buf_size),
-		      func, line);
-		return -1;
+		vu_packet_check_range((void *)p->buf, ptr, len, func, line);
+		return;
  }
-	return 0;
+	ASSERT_WITH_MSG(ptr >= p->buf,
+			"packet range start %p before buffer start %p, %s:%i",
+			(void *)ptr, (void *)p->buf, func, line);
+	ASSERT_WITH_MSG(ptr + len <= p->buf + p->buf_size,
+			"packet range end %p after buffer end %p, %s:%i",
+			(void *)(ptr + len), (void *)(p->buf + p->buf_size),
+			func, line);
 }
+
 /**
  * packet_add_do() - Add data as packet descriptor to given pool
  * @p:		Existing pool
@@ -75,18 +65,16 @@ void packet_add_do(struct pool *p, size_t len, const char *start,
  size_t idx = p->count;
if (idx >= p->size) {
-		trace("add packet index %zu to pool with size %zu, %s:%i",
+		debug("add packet index %zu to pool with size %zu, %s:%i",
            idx, p->size, func, line);
      return;
  }
-	if (packet_check_range(p, start, len, func, line))
-		return;
+	packet_check_range(p, start, len, func, line);
-	if (len > PACKET_MAX_LEN) {
-		trace("add packet length %zu, %s:%i", len, func, line);
-		return;
-	}
+	ASSERT_WITH_MSG(len <= PACKET_MAX_LEN,
+			"add packet length %zu (max %zu), %s:%i",
+			len, PACKET_MAX_LEN, func, line);
p->pkt[idx].iov_base = (void *)start;
  p->pkt[idx].iov_len = len;
@@ -111,16 +99,12 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset,
 {
  char *ptr;
-	if (idx >= p->size || idx >= p->count) {
-		trace("packet %zu from pool size: %zu, count: %zu, %s:%i",
-		      idx, p->size, p->count, func, line);
-		return NULL;
-	}
-
-	if (len > PACKET_MAX_LEN) {
-		trace("packet data length %zu, %s:%i", len, func, line);
-		return NULL;
-	}
+	ASSERT_WITH_MSG(idx < p->size && idx < p->count,
+			"packet %zu from pool size: %zu, count: %zu, %s:%i",
+			idx, p->size, p->count, func, line);
+	ASSERT_WITH_MSG(len <= PACKET_MAX_LEN,
+			"packet range length %zu (max %zu), %s:%i",
+			len, PACKET_MAX_LEN, func, line);
if (len + offset > p->pkt[idx].iov_len) {
      trace("data length %zu, offset %zu from length %zu, %s:%i",
@@ -130,8 +114,7 @@ void *packet_get_do(const struct pool *p, size_t idx, size_t offset,
ptr = (char *)p->pkt[idx].iov_base + offset;
-	if (packet_check_range(p, ptr, len, func, line))
-		return NULL;
+	packet_check_range(p, ptr, len, func, line);
if (left)
      *left = p->pkt[idx].iov_len - offset - len;
diff --git a/packet.h b/packet.h
index f95cda08..b164f77e 100644
--- a/packet.h
+++ b/packet.h
@@ -31,7 +31,8 @@ struct pool {
  struct iovec pkt[];
 };
-int vu_packet_check_range(void *buf, const char *ptr, size_t len);
+void vu_packet_check_range(void *buf, const char *ptr, size_t len,
+			   const char *func, int line);
 void packet_add_do(struct pool *p, size_t len, const char *start,
         const char *func, int line);
 void *packet_get_do(const struct pool *p, const size_t idx,
diff --git a/vu_common.c b/vu_common.c
index 531f8786..528b9b08 100644
--- a/vu_common.c
+++ b/vu_common.c
@@ -24,10 +24,13 @@
  * @buf:	Array of the available memory regions
  * @ptr:	Start of desired data range
  * @size:	Length of desired data range
+ * @func:	For tracing: name of calling function
+ * @line:	For tracing: caller line of function call
If those arguments are mandatory, I would drop the "For tracing"
specification.
...
*
- * Return: 0 if the zone is in a mapped memory region, -1 otherwise
+ * Aborts if the zone isn't in any of the device regions
  */
-int vu_packet_check_range(void *buf, const char *ptr, size_t len)
+void vu_packet_check_range(void *buf, const char *ptr, size_t len,
+			   const char *func, int line)
 {
  struct vu_dev_region *dev_region;
@@ -37,10 +40,12 @@ int vu_packet_check_range(void *buf, const char *ptr, size_t len)
if (m <= ptr &&
          ptr + len <= m + dev_region->mmap_offset + dev_region->size)
-			return 0;
+			return;
  }
-	return -1;
+	abort_with_msg(
+		"package range at %p, length %zd not within dev region %s:%i",
s/package/packet/
...
+		(void *)ptr, len, func, line);
 }
/**
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson