On Thu, 17 Nov 2022 16:02:43 +0000 "Richard W.M. Jones" <rjones(a)redhat.com> wrote:On Thu, Nov 17, 2022 at 04:49:31PM +0100, Stefano Brivio wrote:Ah, yes, we ignore EPOLLRDHUP, and there's a one-second timeout there, so it's actually expected.On Thu, 17 Nov 2022 15:33:06 +0000 "Richard W.M. Jones" <rjones(a)redhat.com> wrote:With EPOLLRDHUP removed it's a bit deadlock-y. One case I commonly see is: child (qemu): 1295637 write(5, "\0\0\0\0", 4 <unfinished ...> 1295637 <... write resumed>) = 4 1295637 poll([{fd=5, events=POLLIN|POLLOUT}], 1, -1 <unfinished ...> 1295637 <... poll resumed>) = 1 ([{fd=5, revents=POLLOUT}]) 1295637 read(3, <unfinished ...> 1295637 <... read resumed>"", 512) = 0 1295637 shutdown(5, SHUT_WR <unfinished ...> 1295637 <... shutdown resumed>) = 0 1295637 poll([{fd=5, events=POLLIN}], 1, -1 <unfinished ...> then later the parent (passt): 1295636 epoll_wait(5, 0x7ffe93a894e0, 8, 1000) = 1 1295636 recvfrom(4, 0x5576383cf000, 8323069, MSG_DONTWAIT, NULL, NULL) = 4 1295636 epoll_wait(5, [], 8, 1000) = 0 1295636 epoll_wait(5, [], 8, 1000) = 0 1295636 epoll_wait(5, [], 8, 1000) = 0 1295636 epoll_wait(5, [], 8, 1000) = 0 1295636 epoll_wait(5, [], 8, 1000) = 0 1295636 epoll_wait(5, [], 8, 1000) = 0 1295636 epoll_wait(5, [], 8, 1000) = 0 1295636 epoll_wait(5, [], 8, 1000) = 0 1295636 epoll_wait(5, [], 8, 1000) = 0 (forever)On Thu, Nov 17, 2022 at 04:26:40PM +0100, Stefano Brivio wrote:Out-of-band, so to speak: we won't even recv() if we get EPOLLRDHUP (that's handled in tap_handler()). If I do this on top of this patch: --- a/tap.c +++ b/tap.c @@ -1073,7 +1073,7 @@ void tap_sock_init(struct ctx *c) struct epoll_event ev = { 0 }; ev.data.fd = c->fd_tap; - ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP; + ev.events = EPOLLIN | EPOLLET; Then it gets those four bytes: [pid 2538704] epoll_wait(5, 0x7ffedc4a6320, 8, 1000) = 1 [pid 2538704] recvfrom(4, 0x560797677000, 8323069, MSG_DONTWAIT, NULL, NULL) = 4 [pid 2538704] epoll_wait(5, [], 8, 1000) = 0 [pid 2538704] epoll_wait(5, 0x7ffedc4a6320, 8, 1000) = -1 EINTR (Interrupted system call) and does nothing with them, as expected. Two epoll_wait() calls later, the syscall is interrupted, I'm not sure why and how we should react (in main(), passt.c) in that case.From: "Richard W.M. Jones" <rjones(a)redhat.com> This passes a fully connected stream socket to passt. Signed-off-by: Richard W.M. Jones <rjones(a)redhat.com> [sbrivio: reuse fd_tap instead of adding a new descriptor, imply --one-off on --fd, add to optstring and usage()] Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- v2: - reuse fd_tap, we don't need a separate file descriptor - add F to optstring and usage(), for both passt and pasta - imply --one-off, we can't do much once the socket is closed With this, the trick from 5/5 goes a bit further: passt reads from the file descriptor passed by the wrapper.Thanks for the v2 .. I'll add it to my series and play with it.However, we get EPOLLRDHUP right away, from the close() on one end of the socket pair I guess. Should we just ignore all EPOLLRDHUP events, just the first one...?Does it see the event out-of-band or does it get an in-band read(2) == 0 after finishing reading the data?Removing EPOLLET (edge triggered) delivers the EPOLLIN event over and over again to passt as expected: 1299436 recvfrom(4, "", 8323069, MSG_DONTWAIT, NULL, NULL) = 0 1299436 epoll_wait(5, 0x7ffdd8a65640, 8, 1000) = 1 1299436 recvfrom(4, "", 8323069, MSG_DONTWAIT, NULL, NULL) = 0 1299436 epoll_wait(5, 0x7ffdd8a65640, 8, 1000) = 1 1299436 recvfrom(4, "", 8323069, MSG_DONTWAIT, NULL, NULL) = 0 (forever)I'm not sure that's EPOLLIN, it could be anything (we don't check explicitly).but I expected that passt would exit the first time it reads 0 from the socket. tap_handler_passt() seems like it only considers the n<0 (error) and n>0 (data) cases. What do you think about checking for n == 0 and exiting immediately if c->one_off is true?I think it's reasonable in any case. At the same time, we don't necessarily need to ignore EPOLLRDHUP, we could also simply try to read from the socket first, that is, move this: if ((c->mode == MODE_PASST && tap_handler_passt(c, now)) || (c->mode == MODE_PASTA && tap_handler_pasta(c, now))) goto reinit; before this: if (events & (EPOLLRDHUP | EPOLLHUP | EPOLLERR)) goto reinit; and we'll just get EBADF on recv(). If we have data to read, we won't necessarily get a EPOLLRDHUP (or EPOLLHUP) and EPOLLIN separately, so trying to read first makes sense for the case you described. An explicit check on EPOLLIN is probably a good idea too: read if we have anything to read, first, then quit if we have a reason to do so. -- Stefano