On Thu, Oct 05, 2023 at 08:19:00AM +0200, Stefano Brivio wrote:
> On Tue, 3 Oct 2023 14:22:59 +1100
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
> > On Fri, Sep 29, 2023 at 05:19:50PM +0200, Stefano Brivio wrote:
> > > On Thu, 28 Sep 2023 11:58:45 +1000
> > > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> > > > On Wed, Sep 27, 2023 at 07:06:03PM +0200, Stefano Brivio wrote:
> > > > > On Mon, 25 Sep 2023 14:47:52 +1000
> > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote:
> > > > > > On Sat, Sep 23, 2023 at 12:06:09AM +0200, Stefano Brivio wrote:
> > > > > > > ...so that we'll retry sending them, instead of more-or-less
> > > > > > > silently dropping them. This happens quite frequently if our
> > > > > > > sending buffer on the UNIX domain socket is heavily constrained
> > > > > > > (for instance, by the 208 KiB default memory limit).
> > > > > > >
> > > > > > > It might be argued that dropping frames is part of the expected
> > > > > > > TCP flow: we don't dequeue those from the socket anyway, so
> > > > > > > we'll eventually retransmit them.
> > > > > > >
> > > > > > > But we don't need the receiver to tell us (by the way of
> > > > > > > duplicate or missing ACKs) that we couldn't send them: we
> > > > > > > already know as sendmsg() reports that. This seems to
> > > > > > > considerably increase throughput stability and throughput
> > > > > > > itself for TCP connections with default wmem_max values.
> > > > > > >
> > > > > > > Unfortunately, the 16 bits left as padding in the frame
> > > > > > > descriptors
> > > > > >
> > > > > > I assume you're referring to the 'pad' fields in tcp[46]_l2_buf_t,
> > > > > > yes?
> > > > >
> > > > > Right, that.
> > > > >
> > > > > > For AVX2 we have substantially more space here. Couldn't we put
> > > > > > a conn (or seq) pointer in here at the cost of a few bytes MSS
> > > > > > for non-AVX2 and zero cost for AVX2 (which is probably the
> > > > > > majority case)?
> > > > >
> > > > > Yes, true. On the other hand, having this parallel array only
> > > > > affects readability I guess, whereas inserting pointers and lengths
> > > > > in tcp[46]_l2_buf_t actually decreases the usable MSS (not just on
> > > > > non-AVX2 x86, but also on other architectures). So I'd rather stick
> > > > > to this.
> > > >
> > > > Yeah, I guess so. Actually.. I did just think of one other option.
> > > > It avoids both any extra padding and a parallel array, but at the
> > > > cost of additional work when frames are dropped. We could use that
> > > > 16-bits of padding to store the TCP payload length. Then when we
> > > > don't manage to send all our frames, we do another loop through and
> > > > add up how many stream bytes we actually sent to update the seq
> > > > pointer.
> > >
> > > Hmm, yes. It's slightly more memory efficient, but the complexity
> > > seems a bit overkill to me.
> >
> > More importantly, I forgot the fact that by the time we're sending the
> > frames, we don't know what connection they're associated with any more.
>
> Oh, I thought you wanted to rebuild the information about the connection
> by looking into the hash table or something like that.

No, I guess we could do that, but I think it would be substantially
messier than the parallel array approach.
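A minimal sketch of the parallel-array bookkeeping discussed above, with
invented names and sizes rather than the actual passt code: each queued
frame gets a companion entry saying whose sequence counter to advance and
by how many payload bytes, and after sendmsg() only the frames that
actually went out advance it.

/* Illustrative sketch only -- invented names, not the actual passt code.
 * One entry per queued frame, kept in a table parallel to the frame
 * buffers, so the frames themselves don't lose any MSS to bookkeeping.
 */
#include <stdint.h>
#include <stddef.h>
#include <sys/uio.h>

#define FRAMES_MAX 128

struct seq_update {
	uint32_t *seq;		/* connection's seq_to_tap counter */
	uint32_t len;		/* TCP payload bytes in this frame */
};

static struct iovec frame_iov[FRAMES_MAX];	 /* queued frames */
static struct seq_update seq_update[FRAMES_MAX]; /* parallel to frame_iov[] */
static size_t frame_cnt;

/* Queue one frame: remember, next to the frame itself, whose sequence
 * counter it should advance and by how much.
 */
void frame_queue(void *buf, size_t len, uint32_t *seq_to_tap)
{
	if (frame_cnt >= FRAMES_MAX)
		return;

	frame_iov[frame_cnt].iov_base = buf;
	frame_iov[frame_cnt].iov_len = len;
	seq_update[frame_cnt].seq = seq_to_tap;
	seq_update[frame_cnt].len = (uint32_t)len;
	frame_cnt++;
}

/* After sendmsg(), 'sent' is how many whole frames actually went out
 * (derived from its return value).  Only those advance the counters:
 * anything past 'sent' keeps its old sequence and will simply be
 * rebuilt and sent again later.
 */
void frame_flush_done(size_t sent)
{
	size_t i;

	for (i = 0; i < sent && i < frame_cnt; i++)
		*seq_update[i].seq += seq_update[i].len;

	frame_cnt = 0;
}

Keeping this in a separate table rather than in the frame buffers
themselves is exactly the trade-off argued for above: it costs nothing in
the frames, so the usable MSS is untouched.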
[snip]

> > > > > > > @@ -2282,14 +2310,15 @@ static int tcp_data_from_sock(struct ctx *c, struct tcp_tap_conn *conn)
> > > > > > >
> > > > > > >  	/* Finally, queue to tap */
> > > > > > >  	plen = mss;
> > > > > > > +	seq = conn->seq_to_tap;
> > > > > >
> > > > > > This will only be correct if tcp_l2_data_buf_flush() is *always*
> > > > > > called between tcp_data_from_sock() calls for the same socket.
> > > > > > That should be true for the normal course of things. However,
> > > > > > couldn't it happen that we get a normal socket EPOLLIN event for
> > > > > > a particular connection - calling tcp_data_from_sock() - but in
> > > > > > the same epoll() round we also get a tap ack for the same
> > > > > > connection which causes another call to tcp_data_from_sock()
> > > > > > (with the change from patch 2/5). IIRC those would both happen
> > > > > > before the deferred handling and therefore the data_buf_flush().
> > > > >
> > > > > Ah, yes, I actually wrote this before 2/5 and concluded it was
> > > > > okay :/ but with that change, it's not. Unless we drop that change
> > > > > from 2/5.
> > > >
> > > > Even if we drop the change, it's a worryingly subtle constraint.
> > >
> > > Another option to avoid this...
> > >
> > > > > > Not sure how to deal with that short of separate 'seq_queued'
> > > > > > and 'seq_sent' counters in the connection structure, which is a
> > > > > > bit unfortunate.
> > > > >
> > > > > I wonder how bad it is if we call tcp_l2_data_buf_flush()
> > > > > unconditionally before calling tcp_data_from_sock() from
> > > > > tcp_tap_handler(). But again, maybe this is not needed at all, we
> > > > > should check that epoll detail from 2/5 first...
> > >
> > > other than this one, would be to use that external table to update
> > > sequence numbers *in the frames* as we send stuff out.
> >
> > Not really sure what you're proposing there.
>
> That tcp_l2_buf_fill_headers() calculates the sequence from
> conn->seq_to_tap plus a cumulative count from that table, instead of
> passing it from the caller.

Ah, right, that could work.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
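As for the suggestion the thread ends on, here is a rough sketch, again
with invented names and not the real tcp_l2_buf_fill_headers() signature:
the header-filling step computes the sequence itself from the connection's
counter plus whatever is already queued for that connection in the table.

/* Illustrative sketch only -- invented names, not the real
 * tcp_l2_buf_fill_headers() interface.  The idea: derive the sequence
 * for a new frame from the connection's counter plus whatever is
 * already queued (but not yet flushed) for that same connection,
 * instead of having the caller pass a sequence in.
 */
#include <stdint.h>
#include <stddef.h>

struct seq_update {
	uint32_t *seq;		/* connection's seq_to_tap counter */
	uint32_t len;		/* TCP payload bytes in the queued frame */
};

uint32_t frame_next_seq(const struct seq_update *table, size_t queued,
			uint32_t *seq_to_tap)
{
	uint32_t seq = *seq_to_tap;
	size_t i;

	/* Frames already sitting in the queue for this connection have
	 * claimed part of the sequence space: account for them, so a
	 * second queuing pass before a flush can't reuse their numbers.
	 */
	for (i = 0; i < queued; i++) {
		if (table[i].seq == seq_to_tap)
			seq += table[i].len;
	}

	return seq;
}

If the sequence is derived this way at queue time, the implicit
requirement that tcp_l2_data_buf_flush() runs between any two
tcp_data_from_sock() calls for the same socket goes away, which is the
subtle constraint worried about above.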