Hi, Thank you so much for this project! I'm running a rootless container using pasta. Everything works great but after a while (anything from minutes to hours) I see the error "Failed to open tun socket in namespace" in my systemd logs, and my container loses access to the internet. Do you happen to have any idea how I would go about debugging this? Thanks a lot in advance :) PS: I'm running pasta from podman with arguments --map-gw,--dns,<my-dns>,-i,wlan0,--outbound-if4,wireguard0,--outbound-if6,wireguard0,--config-net
Hi Gianluca, On Sun, 26 Mar 2023 13:42:19 +0100 "Gianluca Stivan" <me(a)yawnt.com> wrote:Hi, Thank you so much for this project!And thanks for testing it!I'm running a rootless container using pasta. Everything works great but after a while (anything from minutes to hours) I see the error "Failed to open tun socket in namespace" in my systemd logs, and my container loses access to the internet.That happens when pasta detects an issue with the tap file descriptor (error or closure reported by the kernel, or failed read), so it closes it and tries to open it again. There are two issues actually: 1. the problem with the tap device causing this and 2. the fact that pasta fails to close and reopen the file descriptor.Do you happen to have any idea how I would go about debugging this? Thanks a lot in advance :) PS: I'm running pasta from podman with arguments --map-gw,--dns,<my-dns>,-i,wlan0,--outbound-if4,wireguard0,--outbound-if6,wireguard0,--config-netLet's try to debug issue 1. first: can you pass -d (--debug) and -l (--log-file) here, then check the lines at the end, when pasta terminates? You might want to play with --log-size too if there's a lot of activity. -- Stefano
Thanks for the reply! Here is the ending of the log file: ... <a lot a lot of STALLED and dropped...> 3944.3562: TCP: index 19, timer expires in 2.000s 3944.3563: TCP: index 19, timer expires in 2.000s 3944.3566: TCP: index 19: STALLED 3944.3578: TCP: index 7: STALLED dropped 3944.3579: TCP: index 7, timer expires in 2.000s 3944.3584: TCP: index 7: STALLED 3944.3586: TCP: index 7: STALLED dropped 3944.3587: TCP: index 7: ACK_FROM_TAP_DUE dropped 3944.3588: TCP: index 7: ACK_FROM_TAP_DUE 3944.3588: TCP: index 7, timer expires in 2.000s 3944.3589: TCP: index 12: STALLED dropped 3944.3590: TCP: index 12, timer expires in 2.000s 3944.3620: tap write: No buffer space available 3944.3620: TCP: index 17: STALLED dropped 3944.3620: TCP: index 17: ACK_FROM_TAP_DUE dropped 3944.4420: ERROR: Failed to open tun socket in namespace No buffer space available seems interesting? Thanks! PS: let me know if you'd like the full file! On Mon, Mar 27, 2023, at 10:47, Stefano Brivio wrote:Hi Gianluca, On Sun, 26 Mar 2023 13:42:19 +0100 "Gianluca Stivan" <me(a)yawnt.com> wrote:Hi, Thank you so much for this project!And thanks for testing it!I'm running a rootless container using pasta. Everything works great but after a while (anything from minutes to hours) I see the error "Failed to open tun socket in namespace" in my systemd logs, and my container loses access to the internet.That happens when pasta detects an issue with the tap file descriptor (error or closure reported by the kernel, or failed read), so it closes it and tries to open it again. There are two issues actually: 1. the problem with the tap device causing this and 2. the fact that pasta fails to close and reopen the file descriptor.Do you happen to have any idea how I would go about debugging this? Thanks a lot in advance :) PS: I'm running pasta from podman with arguments --map-gw,--dns,<my-dns>,-i,wlan0,--outbound-if4,wireguard0,--outbound-if6,wireguard0,--config-netLet's try to debug issue 1. first: can you pass -d (--debug) and -l (--log-file) here, then check the lines at the end, when pasta terminates? You might want to play with --log-size too if there's a lot of activity. -- Stefano
On Mon, 27 Mar 2023 13:22:12 +0100 "Gianluca Stivan" <me(a)yawnt.com> wrote:Thanks for the reply! Here is the ending of the log file: ... <a lot a lot of STALLED and dropped...> 3944.3562: TCP: index 19, timer expires in 2.000s 3944.3563: TCP: index 19, timer expires in 2.000s 3944.3566: TCP: index 19: STALLED 3944.3578: TCP: index 7: STALLED dropped 3944.3579: TCP: index 7, timer expires in 2.000s 3944.3584: TCP: index 7: STALLED 3944.3586: TCP: index 7: STALLED dropped 3944.3587: TCP: index 7: ACK_FROM_TAP_DUE dropped 3944.3588: TCP: index 7: ACK_FROM_TAP_DUE 3944.3588: TCP: index 7, timer expires in 2.000s 3944.3589: TCP: index 12: STALLED dropped 3944.3590: TCP: index 12, timer expires in 2.000s 3944.3620: tap write: No buffer space available 3944.3620: TCP: index 17: STALLED dropped 3944.3620: TCP: index 17: ACK_FROM_TAP_DUE dropped 3944.4420: ERROR: Failed to open tun socket in namespace No buffer space available seems interesting?Interesting, I had never seen ENOBUFS from a tap file descriptor. Let me find out where it might come from (and how pasta should react to it), it might take a bit. What kernel version are you using by the way? I mean, maybe we just have to try again (as if we received EAGAIN): diff --git a/tap.c b/tap.c index 68ef480..21a786d 100644 --- a/tap.c +++ b/tap.c @@ -323,7 +323,8 @@ static size_t tap_send_frames_pasta(struct ctx *c, if (write(c->fd_tap, (char *)iov[i].iov_base, iov[i].iov_len) < 0) { debug("tap write: %s", strerror(errno)); - if (errno != EAGAIN && errno != EWOULDBLOCK) + if (errno != EAGAIN && errno != EWOULDBLOCK && + errno != ENOBUFS) tap_handler(c, c->fd_tap, EPOLLERR, NULL); i--; } but I'm not sure it's correct. You could probably try that as a workaround meanwhile.PS: let me know if you'd like the full file!No no, thank you, that was enough! About issue number 2. (the fact that we can't actually close and reopen the tap file descriptor after this failure), I'll check if I can easily reproduce that in a Podman setting. -- Stefano
Thanks for the reply, I'll give that a go and report back! Apologies, it seems my unusual use case is causing some headaches :) Kernel version: 5.19.0-35-generic (just a boring Ubuntu 22.04). Cheers, On Mon, Mar 27, 2023, at 14:02, Stefano Brivio wrote:On Mon, 27 Mar 2023 13:22:12 +0100 "Gianluca Stivan" <me(a)yawnt.com> wrote:Thanks for the reply! Here is the ending of the log file: ... <a lot a lot of STALLED and dropped...> 3944.3562: TCP: index 19, timer expires in 2.000s 3944.3563: TCP: index 19, timer expires in 2.000s 3944.3566: TCP: index 19: STALLED 3944.3578: TCP: index 7: STALLED dropped 3944.3579: TCP: index 7, timer expires in 2.000s 3944.3584: TCP: index 7: STALLED 3944.3586: TCP: index 7: STALLED dropped 3944.3587: TCP: index 7: ACK_FROM_TAP_DUE dropped 3944.3588: TCP: index 7: ACK_FROM_TAP_DUE 3944.3588: TCP: index 7, timer expires in 2.000s 3944.3589: TCP: index 12: STALLED dropped 3944.3590: TCP: index 12, timer expires in 2.000s 3944.3620: tap write: No buffer space available 3944.3620: TCP: index 17: STALLED dropped 3944.3620: TCP: index 17: ACK_FROM_TAP_DUE dropped 3944.4420: ERROR: Failed to open tun socket in namespace No buffer space available seems interesting?Interesting, I had never seen ENOBUFS from a tap file descriptor. Let me find out where it might come from (and how pasta should react to it), it might take a bit. What kernel version are you using by the way? I mean, maybe we just have to try again (as if we received EAGAIN): diff --git a/tap.c b/tap.c index 68ef480..21a786d 100644 --- a/tap.c +++ b/tap.c @@ -323,7 +323,8 @@ static size_t tap_send_frames_pasta(struct ctx *c, if (write(c->fd_tap, (char *)iov[i].iov_base, iov[i].iov_len) < 0) { debug("tap write: %s", strerror(errno)); - if (errno != EAGAIN && errno != EWOULDBLOCK) + if (errno != EAGAIN && errno != EWOULDBLOCK && + errno != ENOBUFS) tap_handler(c, c->fd_tap, EPOLLERR, NULL); i--; } but I'm not sure it's correct. You could probably try that as a workaround meanwhile.PS: let me know if you'd like the full file!No no, thank you, that was enough! About issue number 2. (the fact that we can't actually close and reopen the tap file descriptor after this failure), I'll check if I can easily reproduce that in a Podman setting. -- Stefano