Re: [PATCH v2 07/12] packet: Remove unhelpful packet_get_try() macro

6 Jan 2025


      On Fri, 3 Jan 2025 15:48:47 +1100
David Gibson  wrote:
...
On Thu, Jan 02, 2025 at 11:00:04PM +0100, Stefano Brivio wrote:
...
On Thu, 2 Jan 2025 13:15:40 +1100
David Gibson  wrote:
...
On Wed, Jan 01, 2025 at 10:54:37PM +0100, Stefano Brivio wrote:
...
On Fri, 20 Dec 2024 19:35:30 +1100
David Gibson  wrote:
...
Two places in the code use the packet_get_try() variant on packet_get().
The difference is that packet_get_try() passes a NULL 'func' to
packet_get_do(), which suppresses log messages.  The places we use this
are ones where we expect to sometimes have a failure retreiving the packet
range, even in normal cases.  So, suppressing the log messages seems like
it makes sense, except:
1) It suppresses log messages on all errors.  We (somewhat) expect to hit
   the case where the requested range is not within the received packet.
   However, it also suppresses message in cases where the requested packet
   index doesn't exist, the requested range has a nonsensical length or
   doesn't lie in even the right vague part of memory.  None of those
   latter cases are expected, and the message would be helpful if we ever
   actually hit them.
2) The suppressed messages aren't actually that disruptive.  For the case
   in ip.c, we'd log only if we run out of IPv6 packet before reaching a
   (non-option) L4 header.  That shouldn't be the case in normal operation
   so getting a message (at trave level) is not unreasonable.
   For the case in dhcpv6.c we do suppress a message every time we look for
   but don't find a specific DHCPv6 option.  That can happen in perfectly
   ok cases, but, again these are trace() level and DHCPv6 transactions
   aren't that common.  Suppressing the message for this case doesn't
   seem worth the drawbacks of (1).
The reason why I implemented packet_get_try() is that the messages from
packet_get_do() indicate serious issues, and if I'm debugging something
by looking at traces it's not great to have messages indicating that we
hit a serious issue while we're simply validating identity associations.
I'm not following your argument here.  It's exactly because (most of)
the message indicate serious issues that I don't want to suppress
them.  I don't know what you mean by "validating identity
associations".
But dhcpv6_opt() trying to get data that doesn't exist is *not* an
issue, including not a serious one, so if I'm debugging something with
--trace and I see one of these messages I'll shout at "memory" or
"packet" badness and waste time thinking it's an actual issue.
Oh.. I think I see the confusion.  dhcpv6_opt() trying to get data
that's not in the packet is not an issue.  dhcpv6_opt() trying to get
data that is (theoretically) within the packet, but *not* in the
buffer indicates something very bad has happened.  The former is
exactly one check, every other one is the second class - trying to
separate those cases is the purpose of the later "different
severities" patch.
The difficulty is that passing func==NULL to indicate the "try" case
doesn't work if we want to still give useful errors for the serious
cases: we need the function name for those too.
I had been considering printing occasional trace level messages for
the ok case an acceptable tradeoff for not suppressing the messages
which are serious.  But I see your case or that being too confusing
when debugging.  I did have a draft where I used an explicit boolean
flag to enable/disable the non-serious errors, but gave up on it for
simplicity.
I'll look into a way to continue suppressing the non-serious error
here.  Maybe moving the (single) non-serious error case message into
the caller with a wrapper.
Okay, yes, thanks, that would be helpful.
...
...
Validating identity associations (IA_NA, IA_TA, RFC 3315) is what
dhcpv6_ia_notonlink() does. That's the most common case where we'll
routinely call dhcpv6_opt() to fetch data which isn't there.
Ok.
...
...
...
It's not about the amount of logged messages, it's about the type of
message being logged and the distracting noise possibly resulting in a
substantial time waste.
About 1): dhcpv6_opt() always picks pool index 0, and the base offset
was already checked by the caller.
Right, but dhcpv6_opt() is called in a loop, that only stops when it
returns NULL.  So, by construction the last call to dhcpv6_opt(),
which terminates the loop, _will_ have a failing call to packet_get().
At this point - at least assuming a correctly constructed packet - the
offset will point to just past the last option, which should be
exactly at the end of the packet.
Yes, I get that, and:
- I would be happy if that one were *not* reported as failure
Right, that's also my preference, but as above I compromised on this
to simplify preserving the error cases that do matter.
...
- the calls before that one should always be enough to check if we have
  an actual issue with the packet
Yes, in this case I think that's correct.
...
...
...
In ipv6_l4hdr(), the index was
already validated by a previous call to packet_get(), and the starting
offset as well.
Not AFAICT, the initial packet_get just validates the basic IPv6
header.  The calls to packet_get_try() in the loop validate additional
IP options.  I don't think it will ever fail on a well-constructed
packet, but it could on a bogus (or truncated) packet, where the
nexthdr field indicates an option that's actually missing.
This is kind of my point: it will only trip on a badly constructed
packet, in which case I don't think we want to suppress messages.
There, I used packet_get_try() because a missing option or payload
doesn't indicate a bad packet at the _data_ level.
Not really sure what you mean by the data level, here.
I meant that in the sense of Layer-2: the packet is fine at Layer 2 in
the sense that there's no missing data, but it's bad at Layer-3 level
because IPv6 doesn't allow a missing next-option.
...
...
On the other hand, it's bad at the network level anyway, because option
59 *must* be there otherwise (I just realised), so while I'd still
prefer another wording of the warning (not mentioning packet/buffer
ranges... something more network-y), I would be fine with it.
That sounds like another argument for moving the message for the
"requested range is outside packet" case into the caller.
-- 
Stefano