On Fri, 3 Jan 2025 15:48:47 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Thu, Jan 02, 2025 at 11:00:04PM +0100, Stefano Brivio wrote:Okay, yes, thanks, that would be helpful.On Thu, 2 Jan 2025 13:15:40 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Oh.. I think I see the confusion. dhcpv6_opt() trying to get data that's not in the packet is not an issue. dhcpv6_opt() trying to get data that is (theoretically) within the packet, but *not* in the buffer indicates something very bad has happened. The former is exactly one check, every other one is the second class - trying to separate those cases is the purpose of the later "different severities" patch. The difficulty is that passing func==NULL to indicate the "try" case doesn't work if we want to still give useful errors for the serious cases: we need the function name for those too. I had been considering printing occasional trace level messages for the ok case an acceptable tradeoff for not suppressing the messages which are serious. But I see your case or that being too confusing when debugging. I did have a draft where I used an explicit boolean flag to enable/disable the non-serious errors, but gave up on it for simplicity. I'll look into a way to continue suppressing the non-serious error here. Maybe moving the (single) non-serious error case message into the caller with a wrapper.On Wed, Jan 01, 2025 at 10:54:37PM +0100, Stefano Brivio wrote:But dhcpv6_opt() trying to get data that doesn't exist is *not* an issue, including not a serious one, so if I'm debugging something with --trace and I see one of these messages I'll shout at "memory" or "packet" badness and waste time thinking it's an actual issue.On Fri, 20 Dec 2024 19:35:30 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: > Two places in the code use the packet_get_try() variant on packet_get(). > The difference is that packet_get_try() passes a NULL 'func' to > packet_get_do(), which suppresses log messages. The places we use this > are ones where we expect to sometimes have a failure retreiving the packet > range, even in normal cases. So, suppressing the log messages seems like > it makes sense, except: > > 1) It suppresses log messages on all errors. We (somewhat) expect to hit > the case where the requested range is not within the received packet. > However, it also suppresses message in cases where the requested packet > index doesn't exist, the requested range has a nonsensical length or > doesn't lie in even the right vague part of memory. None of those > latter cases are expected, and the message would be helpful if we ever > actually hit them. > > 2) The suppressed messages aren't actually that disruptive. For the case > in ip.c, we'd log only if we run out of IPv6 packet before reaching a > (non-option) L4 header. That shouldn't be the case in normal operation > so getting a message (at trave level) is not unreasonable. > For the case in dhcpv6.c we do suppress a message every time we look for > but don't find a specific DHCPv6 option. That can happen in perfectly > ok cases, but, again these are trace() level and DHCPv6 transactions > aren't that common. Suppressing the message for this case doesn't > seem worth the drawbacks of (1). The reason why I implemented packet_get_try() is that the messages from packet_get_do() indicate serious issues, and if I'm debugging something by looking at traces it's not great to have messages indicating that we hit a serious issue while we're simply validating identity associations.I'm not following your argument here. It's exactly because (most of) the message indicate serious issues that I don't want to suppress them. I don't know what you mean by "validating identity associations".I meant that in the sense of Layer-2: the packet is fine at Layer 2 in the sense that there's no missing data, but it's bad at Layer-3 level because IPv6 doesn't allow a missing next-option.Validating identity associations (IA_NA, IA_TA, RFC 3315) is what dhcpv6_ia_notonlink() does. That's the most common case where we'll routinely call dhcpv6_opt() to fetch data which isn't there.Ok.Right, that's also my preference, but as above I compromised on this to simplify preserving the error cases that do matter.Yes, I get that, and: - I would be happy if that one were *not* reported as failureIt's not about the amount of logged messages, it's about the type of message being logged and the distracting noise possibly resulting in a substantial time waste. About 1): dhcpv6_opt() always picks pool index 0, and the base offset was already checked by the caller.Right, but dhcpv6_opt() is called in a loop, that only stops when it returns NULL. So, by construction the last call to dhcpv6_opt(), which terminates the loop, _will_ have a failing call to packet_get(). At this point - at least assuming a correctly constructed packet - the offset will point to just past the last option, which should be exactly at the end of the packet.- the calls before that one should always be enough to check if we have an actual issue with the packetYes, in this case I think that's correct.Not really sure what you mean by the data level, here.There, I used packet_get_try() because a missing option or payload doesn't indicate a bad packet at the _data_ level.In ipv6_l4hdr(), the index was already validated by a previous call to packet_get(), and the starting offset as well.Not AFAICT, the initial packet_get just validates the basic IPv6 header. The calls to packet_get_try() in the loop validate additional IP options. I don't think it will ever fail on a well-constructed packet, but it could on a bogus (or truncated) packet, where the nexthdr field indicates an option that's actually missing. This is kind of my point: it will only trip on a badly constructed packet, in which case I don't think we want to suppress messages.-- StefanoOn the other hand, it's bad at the network level anyway, because option 59 *must* be there otherwise (I just realised), so while I'd still prefer another wording of the warning (not mentioning packet/buffer ranges... something more network-y), I would be fine with it.That sounds like another argument for moving the message for the "requested range is outside packet" case into the caller.