On Fri, Oct 03, 2025 at 05:27:14PM +0200, Laurent Vivier wrote:
Move epoll_ctl() calls from sock_l4_sa() to the protocol-specific code (icmp.c, pif.c, udp_flow.c) to give callers more control over epoll registration. This allows sock_l4_sa() to focus solely on socket creation and binding, while epoll management happens at a higher level.
Remove the data parameter from sock_l4_sa() and flowside_sock_l4() as it's no longer needed - callers now construct the full epoll_ref and register the socket themselves after creation.
I like this idea in principle. I've previously thought that doing the epoll registration in sock_l4_sa() was verging on a layering violation. However... [snip]
diff --git a/icmp.c b/icmp.c index bd3108a21675..d6f0abe68269 100644 --- a/icmp.c +++ b/icmp.c @@ -172,10 +172,11 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, { uint8_t proto = af == AF_INET ? IPPROTO_ICMP : IPPROTO_ICMPV6; uint8_t flowtype = af == AF_INET ? FLOW_PING4 : FLOW_PING6; - union epoll_ref ref = { .type = EPOLL_TYPE_PING }; union flow *flow = flow_alloc(); struct icmp_ping_flow *pingf; const struct flowside *tgt; + struct epoll_event ev; + union epoll_ref ref;
if (!flow) return NULL; @@ -196,9 +197,7 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c,
pingf->seq = -1;
- ref.flowside = FLOW_SIDX(flow, TGTSIDE); - pingf->sock = flowside_sock_l4(c, EPOLL_TYPE_PING, PIF_HOST, - tgt, ref.data); + pingf->sock = flowside_sock_l4(c, EPOLL_TYPE_PING, PIF_HOST, tgt);
if (pingf->sock < 0) { warn("Cannot open \"ping\" socket. You might need to:"); @@ -210,6 +209,18 @@ static struct icmp_ping_flow *icmp_ping_new(const struct ctx *c, if (pingf->sock > FD_REF_MAX) goto cancel;
+ ref.type = EPOLL_TYPE_PING; + ref.flowside = FLOW_SIDX(flow, TGTSIDE); + ref.fd = pingf->sock; + + ev.events = EPOLLIN; + ev.data.u64 = ref.u64; + if (epoll_ctl(c->epollfd, EPOLL_CTL_ADD, pingf->sock, &ev) == -1) { + warn_perror("L4 epoll_ctl"); + close(pingf->sock); + goto cancel; + } +
... this is uncomfortably bulky to have in every protocol. Could we maybe mitigate this with an epoll_add() helper of some sort that does at least some of the bit shuffling to construct the epoll data? -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson