On Fri, 22 May 2026 10:02:27 +1000
David Gibson wrote:
On Thu, May 21, 2026 at 08:01:45PM +0200, Stefano Brivio wrote:
...instead of the one dedicated to the neighbour monitor, because if neighbour notifications start arriving before or while we send the initial request to read out the neighbour tables, messages and sequence numbers will collide.
For example, if nl_neigh_sync() sends an RTM_GETNEIGH request with sequence number 20, we expect a corresponding reply with sequence number 20. But since we already used the same socket to subscribe to notifications, and notifications don't correspond to any specific request we sent, we might instead get a message with sequence number 0.
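A minimal C sketch of the two measures implied above, assuming the Linux rtnetlink API: send the dump request on a socket that is not subscribed to any multicast group, and only accept messages whose nlmsg_seq matches the request. This is not the actual passt code. The struct neigh_dump_req layout and the helpers nl_open(), neigh_dump_req_init() and neigh_is_dump_reply() are hypothetical names introduced for illustration; only the netlink/rtnetlink structures and constants are real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/neighbour.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header followed by an ndmsg */
struct neigh_dump_req {
	struct nlmsghdr nlh;
	struct ndmsg ndm;
};

/* Hypothetical helper: open a NETLINK_ROUTE socket. A non-zero @groups
 * (legacy bitmask, e.g. RTMGRP_NEIGH) subscribes the socket to multicast
 * notifications at bind() time. Returns the descriptor, or -1 on error. */
static int nl_open(uint32_t groups)
{
	struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
				    .nl_groups = groups };
	int s = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);

	if (s < 0)
		return -1;
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(s);
		return -1;
	}
	return s;
}

/* Fill in an RTM_GETNEIGH dump request for @family with sequence @seq */
static void neigh_dump_req_init(struct neigh_dump_req *req, uint32_t seq,
				unsigned char family)
{
	memset(req, 0, sizeof(*req));
	req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->ndm));
	req->nlh.nlmsg_type = RTM_GETNEIGH;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nlh.nlmsg_seq = seq;
	req->ndm.ndm_family = family;
}

/* True if @nh belongs to the dump requested with @seq. Kernel-originated
 * neighbour notifications normally carry sequence 0, so if one shows up
 * on this socket it fails the check. Sequence 0 must therefore never be
 * used for requests. */
static bool neigh_is_dump_reply(const struct nlmsghdr *nh, uint32_t seq)
{
	assert(seq != 0);
	return nh->nlmsg_seq == seq;
}
```

In this sketch the sync path would send its request on a socket from nl_open(0) and the monitor would read from a separate nl_open(RTMGRP_NEIGH) socket. With the sockets split, the sequence check is a second line of defence rather than the fix itself.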
Heh. Called it, kinda. Nice job tracking this down.
Your suggestion that we were probably mixing up messages helped as a starting point. Then I think we concluded that _of course_ they're different sockets and they can't influence each other. So I started looking into how two netlink sockets belonging to the same process could possibly interfere with each other, until I tried skipping the creation of nl_neigh_sock and realised that nl_neigh_sync() stopped working altogether... oops.

--
Stefano