On Tue, Jan 02, 2024 at 07:13:41PM +0100, Stefano Brivio wrote:On Mon, 1 Jan 2024 23:01:17 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:No. Not allowing deletion of any entry at any time is what I'm trading off to get both O(1) allocation and (effectively) O(1) deletion.On Sat, Dec 30, 2023 at 11:33:04AM +0100, Stefano Brivio wrote:Oh, I thought that was only the case for this series and you would use that as actual deletion in another pending series (which I haven't finished reviewing yet).On Thu, 28 Dec 2023 19:25:25 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:Remember this is not a general deletion, only a "cancel" of the most recent allocation.> On Thu, 21 Dec 2023 17:15:49 +1100 > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > [...] [...] I wonder if we really have to keep track of the number of (non-)entries in the free "block", and if we have to be explicit about the two cases. I'm trying to find out if we can simplify the whole thing with slightly different variants, for example:So... I think the version with (explicit) blocks has this fundamental advantage, on deletion:> + flow->f.type = FLOW_TYPE_NONE; > + /* Put it back in a length 1 free block, don't attempt to fully reverse > + * flow_alloc()s steps. This will get folded together the next time > + * flow_defer_handler runs anyway() */ > + flow->free.n = 1; > + flow->free.next = flow_first_free; > + flow_first_free = FLOW_IDX(flow);which is doable even without explicit blocks, but much harder to follow.But now I'm not sure anymore why I was thinking this... Anyway... do we really need it, then? Can't we just mark the "failed" flows as whatever means "closed" for a specific protocol, and clean them up later, instead of calling cancel() right away?We could, but I'm not sure we want to. For starters, that requires protocol-specific behaviour whenever we need to back out an allocation like this. Not a big deal, since that's in protocol specific code already, but I think it's uglier than calling cancel. It also requires that the protocol specific deferred cleanup functions (e.g. tcp_flow_defer()) handle partially initialised entries. With 'cancel' we can back out just the initialisation steps we've already done (because we know where we've failed during init), then remove the entry. The deferred cleanup function only needs to deal with "complete" entries. Again, certainly possible, but IMO uglier than having 'cancel'. Finally, leaving the "cancelled" entry there until the next deferred pass means if we allocate other flows before that defer pass they'll avoid this slot. That works against trying to keep the entries reasonably compact.I'll see what I can in terms of making that clearer in the description.To reduce fragmentation we are keeping the linked list of free clusters in strictly ascending order. The logic above is only correct if the entry we're freeing is before any other free entry in the table. That will be the case for the most recent allocation, because we always allocatte the first free entry in the table.I see now. This isn't entirely clear from the "Theory of Operation", on the other hand it's probably a bad idea to overload that description with all these details.I'll see what I can come up with. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonRight, yes, I see now. This idea is also not terribly clear from the description, by the way, but I don't have a good proposal right now.Other than the comments I have, I would go ahead with your approach. My attempt below in case you're interested. The comment at the end of flow_del() shows the issue I'm facing: struct flow_free_block { /* Must be first element */ struct flow_common f; unsigned next_free; unsigned next_used; }; static union flow *flow_alloc(void) { union flow *flow; if (flow_first_free >= FLOW_MAX) return NULL; flow = flowtab + flow_first_free; flow_first_free = flow->free.next_free; memset(flow, 0, sizeof(*flow)); /* TODO: select region */ return flow; } static void flow_del(union flow *del) { union flow *left, *right, *next_used = NULL, *first_free;fgdel->f.type = FLOW_TYPE_NONE; left = (FLOW_IDX(del) > 0) ? FLOW(FLOW_IDX(del) - 1) : NULL; right = (FLOW_IDX(del) < FLOW_MAX - 1) ? FLOW(FLOW_IDX(del) + 1) : NULL; first_free = flow_first_free < FLOW_MAX ? FLOW(flow_first_free) : NULL; if (right) { if (right->f.type == FLOW_TYPE_NONE) del->free.next_used = right->free.next_used; else del->free.next_used = right; } else { del->free.next_used = FLOW_MAX; } if (left && left->f.type == FLOW_TYPE_NONE) { left->free.next_free = FLOW_IDX(del); left->free.next_used = del->free.next_used; return; } if (flow_first_free == FLOW_MAX) { flow_first_free = FLOW_IDX(del); } else if (flow_first_free > FLOW_IDX(del)) { flow_first_free = FLOW_IDX(del); del->free.next_free = flow_first_free; } else if (flow_first_free < FLOW_IDX(del)) { ; /* Walk free slots from flow_first_free until FLOW_IDX(del) to * find insertion point... but that makes deletion at most O(n), * perhaps O(log(n)), certainly not O(1).Exactly. I can't see any way (short of far more complex data structures) around having a linear scan somewhere, although you can kind of choose whether it happens on alloc or free. The idea of my implementation is to have it at free time - but to merge it with an existing linear scan so it won't have an additional cost.