On Thu, Jan 16, 2025 at 09:32:49PM +0100, Stefano Brivio wrote:Before SO_PEEK_OFF support was introduced by commit e63d281871ef ("tcp: leverage support of SO_PEEK_OFF socket option when available"), we would peek data from sockets using a "discard" buffer as first iovec element, so that, unless we had no pending data at all, we would always get a positive return code from recvmsg() (except for closing connections or errors). If we couldn't send more data to the guest, in the window, we would set the STALLED flag (causing the epoll descriptor to switch to edge-triggered mode), and return early from tcp_data_from_sock(). With SO_PEEK_OFF, we don't have a discard buffer, and if there's data on the socket, but nothing beyond our current peeking offset, we'll get EAGAIN instead of our current "discard" length. In that case, we return even earlier, and we don't set EPOLLET on the socket as a result. As reported by Asahi Lina, this causes event loops where the kernel is signalling socket readiness, because there's data we didn't dequeue yet (waiting for the guest to acknowledge it), but we won't actually peek anything new, and return early without setting EPOLLET. This is the original report, mentioning the originally proposed fix:Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au> -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson