On Fri, 21 Feb 2025 17:37:18 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Fri, Feb 21, 2025 at 06:59:12AM +0100, Stefano Brivio wrote:Not in the LSMs we ship policies for, because we don't run as root, so we can't magically relabel (setfilecon()) sockets or change subprofile (aa_change_hat()) before/after creating some, even if we wanted. And from the little I know of TOMOYO that should be impossible as well.On Fri, 21 Feb 2025 13:40:12 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Well.. probably. I suspect something sufficiently insane in an LSM could break that rule.On Thu, Feb 20, 2025 at 11:38:00AM +0100, Stefano Brivio wrote:By the way, after having another look at the kernel interface and implementation: this makes no sense. Either we're able to set repair mode for all the sockets, or for none of them (EPERM).On Thu, 20 Feb 2025 21:18:06 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote: > This sort of thing is, incidentally, why I did way back suggest the > possibility of passt-repair reporting failures per-fd, rather than > just per-batch. Sorry, I somehow missed that proposal, and I can't find any trace of it.It may have just been on IRC somewhere.But anyway, the problem is that if we fail to read a batch for any reason (invalid ancillary data... maybe always implying a kernel issue, but I'm not sure), you can't _reliably_ report per-fd failures. *Usually*, you can. Worth it?Ah, I see. We could handle that by being able to report both per-fd and "whole batch" failure (equivalent to failure on every fd), but that would complexify the protocol, of course."It shouldn't happen in practice" == "It doesn't happen". And again, in this impossible case, we would either get a socket that's stuck, for two hours, or close it because we get some other error on it.And if there's any invalid file descriptor in the set, we'll get EBADF for the whole sendmsg(). The stuff I'm proposing below is well beyond my threshold of things that make no sense to implement, but at least it limits damage in terms of complexity (and hence of potential impact on the actual usage, because that's what we're talking about here: a die() that makes no sense but now proves to be actually harmful). I wouldn't go any further than that.Again, it shouldn't happen in practice, but you will get a mess if you ever managed to set repair mode, but failed to clear it.Not for what I'm proposing: 1. passt sends 1 (TCP_REPAIR_ON) for sockets 2, 3 2. passt-repair sets repair mode for 2 3. passt-repair fails to set repair mode for 3 4. passt-repair clears repair mode for 2In any case, if it's simple, we can still do it, because passt and passt-repair are distributed together. You can't pass back the file descriptors via SCM_RIGHTS though, because we want to close() them before we reply. Another alternative could be that passt-repair reverts back the state of the file descriptors that were already switched, on failure.That might help a bit, we'd still need to rework the passt-side interface to know what needs reverting at the right stage.There's a similar question in the other direction, if passt is trying to REPAIR_OFF, and we fail part way through. Should passt-repair REPAIR_ON again? I'm not sure it has the information to make a sane choice here.Yes, I think it should. The information it has is that it just set REPAIR_OFF, so repair mode was presumably on to begin with.I guess it conflicts with my priority which is to fix actual bugs without introducing new ones, but other than that, I'm not necessarily against it. -- Stefano5. passt-repair closes the connection to signal failure 6. passt knows that repair mode is *not* set for any socket in the batch The interface remains the same (per-batch error only), but you can rely on the whole batch to have failed. You already can, even with no changes in passt-repair (see above), by the way.Well, I just finished implementing a simple way of reporting partial failures. I guess see what you think of it.