On Fri, 26 Apr 2024 13:27:11 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Wed, Apr 24, 2024 at 08:30:44PM +0200, Stefano Brivio wrote:Definitely.On Wed, 24 Apr 2024 10:48:05 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:That makes sense, but I think it would need a per-connection flag.On Tue, Apr 23, 2024 at 07:50:10PM +0200, Stefano Brivio wrote:By the way of which, this is not documented at this point -- a man page patch (linux-man and linux-api lists) would be nice.On Sat, 20 Apr 2024 15:19:19 -0400 Jon Maloy <jmaloy(a)redhat.com> wrote:[snip]> + set_peek_offset(s, 0); Do we really need to initialise it to zero on a new connection? Extra system calls on this path matter for latency of connection establishment.Sort of, yes: we need to enable the SO_PEEK_OFF behaviour by setting it to 0, rather than the default -1.We could lazily enable it, but we'd need either to a) do it later in the handshake (maybe when we set ESTABLISHED), but we'd need to be careful it is always set before the first MSG_PEEKI was actually thinking that we could set it only as we receive data (not every connection will receive data), and keep this out of the handshake (which we want to keep "faster", I think).I forgot to mention this part: what I wanted to propose was actually still a global probe, so that we don't waste one system call per connection on kernels not supporting this (a substantial use case for a couple of years from now?), which probably outweighs the advantage of the weird, purely theoretical kernel not supporting the feature for some sockets only. And then something like PEEK_OFFSET_SET (SO_PEEK_OFF_SET sounds awkward to me) on top. Another advantage is avoiding the tristate you described. -- StefanoAnd setting it as we mark a connection as ESTABLISHED should have the same effect on latency as setting it on a new connection -- that's not really lazy. So, actually:Good point.I just realised that folding the feature detection into this is a bit costlier than I thought. If we globally probe the feature we just need one bit per connection: is SO_PEEK_OFF set yet or not. If we tried to probe per-connection we'd need a tristate: haven't tried / SO_PEEK_OFF enabled / tried and failed.or b) keep track of whether it's set on a per-socket basis (this would have the advantage of robustness if we ever encountered a kernel that weirdly allows it for some but not all TCP sockets)....this could be done as we receive data in tcp_data_from_sock(), with a new flag in tcp_tap_conn::flags, to avoid adding latency to the handshake. It also looks more robust to me, and done/checked in a single place where we need it. We have just three bits left there which isn't great, but if we need to save one at a later point, we can drop this new flag easily.