On Fri, 21 Feb 2025 13:40:12 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Thu, Feb 20, 2025 at 11:38:00AM +0100, Stefano Brivio wrote:By the way, after having another look at the kernel interface and implementation: this makes no sense. Either we're able to set repair mode for all the sockets, or for none of them (EPERM). And if there's any invalid file descriptor in the set, we'll get EBADF for the whole sendmsg(). The stuff I'm proposing below is well beyond my threshold of things that make no sense to implement, but at least it limits damage in terms of complexity (and hence of potential impact on the actual usage, because that's what we're talking about here: a die() that makes no sense but now proves to be actually harmful). I wouldn't go any further than that.On Thu, 20 Feb 2025 21:18:06 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:It may have just been on IRC somewhere.This sort of thing is, incidentally, why I did way back suggest the possibility of passt-repair reporting failures per-fd, rather than just per-batch.Sorry, I somehow missed that proposal, and I can't find any trace of it.But anyway, the problem is that if we fail to read a batch for any reason (invalid ancillary data... maybe always implying a kernel issue, but I'm not sure), you can't _reliably_ report per-fd failures. *Usually*, you can. Worth it?Ah, I see. We could handle that by being able to report both per-fd and "whole batch" failure (equivalent to failure on every fd), but that would complexify the protocol, of course.Not for what I'm proposing: 1. passt sends 1 (TCP_REPAIR_ON) for sockets 2, 3 2. passt-repair sets repair mode for 2 3. passt-repair fails to set repair mode for 3 4. passt-repair clears repair mode for 2 5. passt-repair closes the connection to signal failure 6. passt knows that repair mode is *not* set for any socket in the batch The interface remains the same (per-batch error only), but you can rely on the whole batch to have failed. You already can, even with no changes in passt-repair (see above), by the way. -- StefanoIn any case, if it's simple, we can still do it, because passt and passt-repair are distributed together. You can't pass back the file descriptors via SCM_RIGHTS though, because we want to close() them before we reply. Another alternative could be that passt-repair reverts back the state of the file descriptors that were already switched, on failure.That might help a bit, we'd still need to rework the passt-side interface to know what needs reverting at the right stage.