On Mon, Jan 06, 2025 at 11:55:22AM +0100, Stefano Brivio wrote:
On Fri, 3 Jan 2025 15:48:47 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:
On Thu, Jan 02, 2025 at 11:00:04PM +0100, Stefano
Brivio wrote:
On Thu, 2 Jan 2025 13:15:40 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:
On Wed, Jan 01, 2025 at 10:54:37PM +0100, Stefano
Brivio wrote:
> On Fri, 20 Dec 2024 19:35:30 +1100
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
>
> > Two places in the code use the packet_get_try() variant on packet_get().
> > The difference is that packet_get_try() passes a NULL 'func' to
> > packet_get_do(), which suppresses log messages. The places we use this
> > are ones where we expect to sometimes have a failure retreiving the packet
> > range, even in normal cases. So, suppressing the log messages seems like
> > it makes sense, except:
> >
> > 1) It suppresses log messages on all errors. We (somewhat) expect to hit
> > the case where the requested range is not within the received packet.
> > However, it also suppresses message in cases where the requested packet
> > index doesn't exist, the requested range has a nonsensical length or
> > doesn't lie in even the right vague part of memory. None of those
> > latter cases are expected, and the message would be helpful if we ever
> > actually hit them.
> >
> > 2) The suppressed messages aren't actually that disruptive. For the case
> > in ip.c, we'd log only if we run out of IPv6 packet before reaching a
> > (non-option) L4 header. That shouldn't be the case in normal
operation
> > so getting a message (at trave level) is not unreasonable.
> > For the case in dhcpv6.c we do suppress a message every time we look for
> > but don't find a specific DHCPv6 option. That can happen in perfectly
> > ok cases, but, again these are trace() level and DHCPv6 transactions
> > aren't that common. Suppressing the message for this case
doesn't
> > seem worth the drawbacks of (1).
>
> The reason why I implemented packet_get_try() is that the messages from
> packet_get_do() indicate serious issues, and if I'm debugging something
> by looking at traces it's not great to have messages indicating that we
> hit a serious issue while we're simply validating identity associations.
I'm not following your argument here. It's exactly because (most of)
the message indicate serious issues that I don't want to suppress
them. I don't know what you mean by "validating identity
associations".
But dhcpv6_opt() trying to get data that doesn't exist is *not* an
issue, including not a serious one, so if I'm debugging something with
--trace and I see one of these messages I'll shout at "memory" or
"packet" badness and waste time thinking it's an actual issue.
Oh.. I think I see the confusion. dhcpv6_opt() trying to get data
that's not in the packet is not an issue. dhcpv6_opt() trying to get
data that is (theoretically) within the packet, but *not* in the
buffer indicates something very bad has happened. The former is
exactly one check, every other one is the second class - trying to
separate those cases is the purpose of the later "different
severities" patch.
The difficulty is that passing func==NULL to indicate the "try" case
doesn't work if we want to still give useful errors for the serious
cases: we need the function name for those too.
I had been considering printing occasional trace level messages for
the ok case an acceptable tradeoff for not suppressing the messages
which are serious. But I see your case or that being too confusing
when debugging. I did have a draft where I used an explicit boolean
flag to enable/disable the non-serious errors, but gave up on it for
simplicity.
I'll look into a way to continue suppressing the non-serious error
here. Maybe moving the (single) non-serious error case message into
the caller with a wrapper.
Okay, yes, thanks, that would be helpful.
Validating identity associations (IA_NA, IA_TA, RFC 3315) is what
dhcpv6_ia_notonlink() does. That's the most common case where we'll
routinely call dhcpv6_opt() to fetch data which isn't there.
Ok.
>
It's not about the amount of logged messages, it's about the type of
> message being logged and the distracting noise possibly resulting in a
> substantial time waste.
>
> About 1): dhcpv6_opt() always picks pool index 0, and the base offset
> was already checked by the caller.
Right, but dhcpv6_opt() is called in a loop, that only stops when it
returns NULL. So, by construction the last call to dhcpv6_opt(),
which terminates the loop, _will_ have a failing call to packet_get().
At this point - at least assuming a correctly constructed packet - the
offset will point to just past the last option, which should be
exactly at the end of the packet.
Yes, I get that, and:
- I would be happy if that one were *not* reported as failure
Right, that's also my preference, but as above I compromised on this
to simplify preserving the error cases that do matter.
- the calls before that one should always be
enough to check if we have
an actual issue with the packet
Yes, in this case I think that's correct.
> In
ipv6_l4hdr(), the index was
> already validated by a previous call to packet_get(), and the starting
> offset as well.
Not AFAICT, the initial packet_get just validates the basic IPv6
header. The calls to packet_get_try() in the loop validate additional
IP options. I don't think it will ever fail on a well-constructed
packet, but it could on a bogus (or truncated) packet, where the
nexthdr field indicates an option that's actually missing.
This is kind of my point: it will only trip on a badly constructed
packet, in which case I don't think we want to suppress messages.
There, I used packet_get_try() because a missing option or payload
doesn't indicate a bad packet at the _data_ level.
Not really sure what you mean by the data level, here.
I meant that in the sense of Layer-2: the packet is fine at Layer 2 in
the sense that there's no missing data, but it's bad at Layer-3 level
because IPv6 doesn't allow a missing next-option.
Oh, right. That's true, but in this case the caller is code dealing
with L3 structure, so it seems reasonable a failure should give an
error.
On the other hand, it's bad at the network level
anyway, because option
59 *must* be there otherwise (I just realised), so while I'd still
prefer another wording of the warning (not mentioning packet/buffer
ranges... something more network-y), I would be fine with it.
That sounds like another argument for moving the message for the
"requested range is outside packet" case into the caller.
--
David Gibson (he or they) | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you, not the other way
| around.
http://www.ozlabs.org/~dgibson