Hi,

I'm testing a DNS server in a rootless container using pasta, and I have seen that IPv6 UDP packets are not reaching the service:

$ dig www.google.com @fddc:f797:78ef:70::5 +short
;; communications error to fddc:f797:78ef:70::5#53: timed out
;; communications error to fddc:f797:78ef:70::5#53: timed out
;; communications error to fddc:f797:78ef:70::5#53: timed out

; <<>> DiG 9.18.15 <<>> www.google.com @fddc:f797:78ef:70::5 +short
;; global options: +cmd
;; no servers could be reached

TCP over IPv6 works fine, as do both UDP and TCP over IPv4:

$ dig www.google.com @fddc:f797:78ef:70::5 +short +tcp
216.239.38.120
$ dig www.google.com @192.168.7.5 +short
216.239.38.120
$ dig www.google.com @192.168.7.5 +short +tcp
216.239.38.120

The pasta process is running with these arguments:

/usr/bin/pasta --config-net -u 53-53:53-53 -t 53-53:53-53 -t 3003-3003:3003-3003 -T none -U none --no-map-gw --netns /run/user/1002/netns/netns-378b62b8-bf27-3b51-1fb1-e2ebb7119647

I'm using passt-0^20230509.g96f8d55-1.fc38.x86_64 from Fedora CoreOS 38.

Is this a known bug, or am I doing something wrong?

Thank you.
On Sat, May 27, 2023 at 02:22:47PM +0000, Juan Orti wrote:
> I'm testing a DNS server in a rootless container using pasta, and I
> have seen that the IPv6 UDP packets are not reaching the service:
> [...]
> Is this a known bug? or am I doing something wrong?

So, we have some special cases related to port 53, aimed at allowing the container to contact a nameserver outside. I don't think we thought much about the case of a DNS server inside the container. So my first guess would be that those special cases have an error that's interfering with your use case.

If it's possible to temporarily run your server on a port other than 53, that would be interesting to try.

We also attempt to auto-configure those cases from the host's resolv.conf, so if you could share that, it might shed some extra light.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
On Sunday, 28 May 2023 at 07:23, David Gibson <david(a)gibson.dropbear.id.au> wrote:
> So, we have some special cases related to port 53 - aimed at allowing
> the container to contact a nameserver outside. I don't think we
> thought much about the case of a DNS server inside the container. So
> my first guess would be that those special cases have an error that's
> interfering with your use case. If it's possible to try running your
> server on a port other than 53 temporarily that would be interesting
> to try.

Thanks for the suggestion. I've tried moving the listening port of this service (AdGuardHome) to 54 and, surprisingly, it still doesn't work over UDPv6.

Testing with a different DNS service (unbound) works fine, even on port 53. So this is a problem with this specific service.

I don't understand why it's not working, as the service listens on the :: address. Maybe it's using a socket option that's causing this? I need to investigate this further.

# netstat -putan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 :::3003                 :::*                    LISTEN      2/AdGuardHome
tcp        0      0 :::54                   :::*                    LISTEN      2/AdGuardHome
udp        0      0 :::54                   :::*                                2/AdGuardHome

# cat /proc/net/udp6
  sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
 4718: 00000000000000000000000000000000:0036 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 38510801 2 0000000073476783 0

> We also attempt to auto-configure those cases from the host's
> resolv.conf, so if you could share that it might shed some extra
> light.

resolv.conf looks correct:

# Host:
nameserver 192.168.7.1
nameserver fddc:f797:78ef:70::1
search lan

# Container:
search lan
nameserver 192.168.7.1

Thank you.
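As a side note for readers of the /proc/net/udp6 output above: the local_address field is hexadecimal, so 0036 is port 54 and the all-zero address is ::, which agrees with the netstat listing. A minimal sketch of decoding such a field follows. The function name is made up for illustration, and it assumes a little-endian host (such as x86_64), where each 32-bit word of the address is printed byte-swapped.

```python
import ipaddress
import struct


def parse_proc_udp6_endpoint(field: str) -> tuple[str, int]:
    """Decode an 'ADDRESS:PORT' field from /proc/net/udp6.

    Assumption: little-endian host. The kernel prints the address as four
    32-bit words in host byte order, and the port as a plain hex number.
    """
    addr_hex, port_hex = field.split(":")
    # Split the 32 hex digits into four 32-bit words.
    words = [int(addr_hex[i:i + 8], 16) for i in range(0, 32, 8)]
    # Re-packing each word as little-endian restores network byte order.
    raw = b"".join(struct.pack("<I", word) for word in words)
    return str(ipaddress.IPv6Address(raw)), int(port_hex, 16)


if __name__ == "__main__":
    # The socket from the thread: wildcard address, port 54.
    print(parse_proc_udp6_endpoint("00000000000000000000000000000000:0036"))
```

This only confirms that the socket is bound to the wildcard address on the expected port; it says nothing about which socket options the service sets.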
On Sunday, 28 May 2023 at 12:12, Juan Orti <jorti(a)pm.me> wrote:
> I don't understand why it's not working, as the service listens on
> the :: address. Maybe it's using a socket option that it's causing
> this? I need to investigate this further.
> [...]

After stracing the AdGuardHome process, I can see that the UDP packet is indeed reaching the service, but it's the reply that fails to be sent:

11 10:25:04.445902 recvmsg(25<UDPv6:[38993134]>, <unfinished ...>
11 10:25:04.446238 <... recvmsg resumed>{msg_name={sa_family=AF_INET6, sin6_port=htons(33308), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fddc:f797:78ef:10::b46", &sin6_addr), sin6_scope_id=0}, msg_namelen=112 => 28, msg_iov=[{iov_base="\246\245\1 \0\1\0\0\0\0\0\1\3www\6google\3com\0\0\1\0\1\0\0)\4\320\0\0\0\0\0\f\0\n\0\10\207a\315\224\245\253\v\37", iov_len=65535}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32}], msg_controllen=40, msg_flags=0}, 0) = 55 <0.000059>
11 10:25:04.446371 futex(0xc000064548, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
11 10:25:04.446415 <... futex resumed>) = 1 <0.000032>
11 10:25:04.446461 recvmsg(25<UDPv6:[38993134]>, <unfinished ...>
11 10:25:04.446658 <... recvmsg resumed>{msg_namelen=112}, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000097>
11 10:25:04.447130 sendmsg(25<UDPv6:[38993134]>, {msg_name={sa_family=AF_INET6, sin6_port=htons(33308), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "fddc:f797:78ef:10::b46", &sin6_addr), sin6_scope_id=0}, msg_namelen=28, msg_iov=[{iov_base="\246\245\201\200\0\1\0\1\0\0\0\0\3www\6google\3com\0\0\1\0\1\300\f\0\1\0\1\0\0\0\n\0\4\330\357&x", iov_len=48}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_IPV6, cmsg_type=0x32}], msg_controllen=40, msg_flags=0}, 0) = -1 EINVAL (Invalid argument) <0.000020>

It's not clear to me what's wrong with the sendmsg syscall. Any ideas?

Thanks.
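To double-check that the failing sendmsg() really carries a well-formed answer, the iov_base string from the strace output above can be decoded by hand. The sketch below is not part of any tool mentioned in the thread: the helper names are invented for illustration, and it only handles the simple case of A records with plain or pointer-compressed names. The bytes are copied from the trace, using the same octal escapes.

```python
import ipaddress
import struct

# Payload of the failing sendmsg(), copied from the strace output (48 bytes).
REPLY = (b"\246\245\201\200\0\1\0\1\0\0\0\0\3www\6google\3com\0\0\1\0\1"
         b"\300\f\0\1\0\1\0\0\0\n\0\4\330\357&x")


def skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) DNS name."""
    while True:
        length = msg[off]
        if length & 0xC0 == 0xC0:   # compression pointer: two bytes, ends the name
            return off + 2
        off += 1
        if length == 0:             # root label: end of name
            return off
        off += length               # ordinary label: skip its bytes


def first_a_record(msg: bytes):
    """Return (address, ttl) of the first A record in the answer section."""
    _ident, _flags, qdcount, ancount = struct.unpack_from("!HHHH", msg, 0)
    off = 12                                   # fixed DNS header size
    for _ in range(qdcount):
        off = skip_name(msg, off) + 4          # name + QTYPE + QCLASS
    for _ in range(ancount):
        off = skip_name(msg, off)
        rtype, rclass, ttl, rdlen = struct.unpack_from("!HHIH", msg, off)
        off += 10
        if rtype == 1 and rclass == 1 and rdlen == 4:
            return str(ipaddress.IPv4Address(msg[off:off + 4])), ttl
        off += rdlen
    return None


if __name__ == "__main__":
    print(len(REPLY), first_a_record(REPLY))
```

The decoded address matches what dig returns over TCP and IPv4, which suggests the DNS payload itself is fine and the EINVAL comes from elsewhere in the sendmsg() arguments.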
On Sun, 28 May 2023 10:50:20 +0000 Juan Orti <jorti(a)pm.me> wrote:
> After stracing the AdGuardHome process, I can see that the UDP packet
> is indeed reaching the service but it's the reply that fails to be
> sent:
> [...]
> It's not clear to me what's wrong with the sendmsg syscall. Any ideas?

I guess that might come from the IPV6_PKTINFO ancillary data (cmsg_type 0x32). I'm not sure how and why it's used here, as strace doesn't dump the CMSG_DATA content, but, having a look at ip6_datagram_send_ctl() (net/ipv6/datagram.c), EINVAL might come from:

1. a link-local address being passed along... I doubt that's the case

2. a non-local address (or one we can't bind to anyway) being used.

To check if we're in this case, it would be helpful if you could share the addressing information from the container (ip -6 address show), and if you could try 'sysctl -w net.ipv6.ip_nonlocal_bind=1', again from the container.

-- 
Stefano
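For reference, the numbers in the strace output are consistent with an IPV6_PKTINFO control message: on Linux IPV6_PKTINFO is 50 (0x32), and a struct in6_pktinfo (16-byte address plus interface index) behind a 64-bit cmsghdr gives cmsg_len=36. The sketch below only models that layout with Python's struct module; it does not reproduce what AdGuardHome actually sends, since the CMSG_DATA content is not visible in the trace. The helper names, the example address and the interface index are assumptions for illustration.

```python
import ipaddress
import struct

# Linux value of IPV6_PKTINFO; matches cmsg_type=0x32 in the strace output.
IPV6_PKTINFO_LINUX = 50

# struct in6_pktinfo { struct in6_addr ipi6_addr; unsigned int ipi6_ifindex; }
IN6_PKTINFO = struct.Struct("=16sI")

# struct cmsghdr on 64-bit Linux: size_t cmsg_len; int cmsg_level; int cmsg_type;
CMSGHDR_LINUX_64 = struct.Struct("=QII")


def pack_in6_pktinfo(addr: str, ifindex: int) -> bytes:
    """Build the CMSG_DATA bytes for an IPV6_PKTINFO control message."""
    return IN6_PKTINFO.pack(ipaddress.IPv6Address(addr).packed, ifindex)


def unpack_in6_pktinfo(data: bytes) -> tuple[str, int]:
    """Decode CMSG_DATA bytes back into (address, interface index)."""
    raw, ifindex = IN6_PKTINFO.unpack(data)
    return str(ipaddress.IPv6Address(raw)), ifindex


def expected_cmsg_len() -> int:
    """cmsg_len for IPV6_PKTINFO on 64-bit Linux: header plus payload."""
    return CMSGHDR_LINUX_64.size + IN6_PKTINFO.size


if __name__ == "__main__":
    # Hypothetical content: the container's link-local address on ifindex 2.
    data = pack_in6_pktinfo("fe80::5cef:4eff:fe6c:551f", 2)
    print(expected_cmsg_len(), unpack_in6_pktinfo(data))
```

A server that echoes the received IPV6_PKTINFO back in sendmsg() is asking the kernel to use that address as the source of the reply, which is why the address inside the control message matters for the EINVAL question.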
On Sunday, 28 May 2023 at 16:38, Stefano Brivio <sbrivio(a)redhat.com> wrote:
> To check if we're in this case, it would be helpful if you could share
> the addressing information from the container (ip -6 address show),
> and if you could try 'sysctl -w net.ipv6.ip_nonlocal_bind=1', again
> from the container.

net.ipv6.ip_nonlocal_bind=1 is not helping. This is the container network config:

# ip -6 address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN qlen 1000
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp88s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65520 state UNKNOWN qlen 1000
    inet6 fddc:f797:78ef:70::5/64 scope global flags 02
       valid_lft forever preferred_lft forever
    inet6 fe80::5cef:4eff:fe6c:551f/64 scope link
       valid_lft forever preferred_lft forever

# ip -6 r show table all
fddc:f797:78ef:70::/64 dev enp88s0 metric 256
fe80::/64 dev enp88s0 metric 256
default via fe80::ea9f:80ff:fe5d:3d6e dev enp88s0 metric 1024
local ::1 dev lo table local metric 0
local fddc:f797:78ef:70::5 dev enp88s0 table local metric 0
local fe80::5cef:4eff:fe6c:551f dev enp88s0 table local metric 0
multicast ff00::/8 dev enp88s0 table local metric 256

With a tcpdump inside the container I can see that the incoming packets are actually arriving with the link-local address as the destination (is this expected?).

16:18:26.248659 IP6 (hlim 255, next-header UDP (17) payload length: 63) fddc:f797:78ef:10::b46.42091 > fe80::5cef:4eff:fe6c:551f.53: [udp sum ok] 6215+ [1au] A? www.google.com. (55)
16:18:31.253942 IP6 (hlim 255, next-header UDP (17) payload length: 63) fddc:f797:78ef:10::b46.34965 > fe80::5cef:4eff:fe6c:551f.53: [udp sum ok] 6215+ [1au] A? www.google.com. (55)
16:18:36.257294 IP6 (hlim 255, next-header UDP (17) payload length: 63) fddc:f797:78ef:10::b46.55302 > fe80::5cef:4eff:fe6c:551f.53: [udp sum ok] 6215+ [1au] A? www.google.com. (55)

TCP also uses the link-local address, however it works:

16:20:50.933652 IP6 (flowlabel 0x000e0, hlim 255, next-header TCP (6) payload length: 28) fddc:f797:78ef:10::b46.36213 > fe80::5cef:4eff:fe6c:551f.53: Flags [S], cksum 0x8f00 (correct), seq 3612741141, win 65535, options [mss 4096,nop,wscale 7], length 0
16:20:50.933670 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5cef:4eff:fe6c:551f > ff02::1:ff5d:3d6e: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::ea9f:80ff:fe5d:3d6e source link-address option (1), length 8 (1): 5e:ef:4e:6c:55:1f
16:20:50.933675 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::5cef:4eff:fe6c:551f > ff02::1:ff5d:3d6e: [icmp6 sum ok] ICMP6, neighbor solicitation, length 32, who has fe80::ea9f:80ff:fe5d:3d6e source link-address option (1), length 8 (1): 5e:ef:4e:6c:55:1f
16:20:50.933910 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::ea9f:80ff:fe5d:3d6e > fe80::5cef:4eff:fe6c:551f: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::ea9f:80ff:fe5d:3d6e, Flags [router, solicited, override] destination link-address option (2), length 8 (1): 48:21:0b:32:59:8e
16:20:50.933915 IP6 (flowlabel 0x57ab1, hlim 64, next-header TCP (6) payload length: 28) fe80::5cef:4eff:fe6c:551f.53 > fddc:f797:78ef:10::b46.36213: Flags [S.], cksum 0x6abe (correct), seq 3302060021, ack 3612741142, win 65460, options [mss 65460,nop,wscale 7], length 0
16:20:50.933921 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 32) fe80::ea9f:80ff:fe5d:3d6e > fe80::5cef:4eff:fe6c:551f: [icmp6 sum ok] ICMP6, neighbor advertisement, length 32, tgt is fe80::ea9f:80ff:fe5d:3d6e, Flags [router, solicited, override] destination link-address option (2), length 8 (1): 48:21:0b:32:59:8e
16:20:50.933934 IP6 (flowlabel 0x000e0, hlim 255, next-header TCP (6) payload length: 77) fddc:f797:78ef:10::b46.36213 > fe80::5cef:4eff:fe6c:551f.53: Flags [.], cksum 0x4e3d (correct), seq 1:58, ack 1, win 46080, length 57 29365+ [1au] A? www.google.com. (55)
16:20:50.933943 IP6 (flowlabel 0x57ab1, hlim 64, next-header TCP (6) payload length: 20) fe80::5cef:4eff:fe6c:551f.53 > fddc:f797:78ef:10::b46.36213: Flags [.], cksum 0x8e07 (correct), ack 58, win 511, length 0
16:20:50.934239 IP6 (flowlabel 0x57ab1, hlim 64, next-header TCP (6) payload length: 70) fe80::5cef:4eff:fe6c:551f.53 > fddc:f797:78ef:10::b46.36213: Flags [P.], cksum 0x4c39 (correct), seq 1:51, ack 58, win 512, length 50 29365 1/0/0 www.google.com. A 216.239.38.120 (48)
16:20:50.934253 IP6 (flowlabel 0x000e0, hlim 255, next-header TCP (6) payload length: 20) fddc:f797:78ef:10::b46.36213 > fe80::5cef:4eff:fe6c:551f.53: Flags [.], cksum 0x8e6c (correct), ack 51, win 360, length 0
16:20:50.934795 IP6 (flowlabel 0x000e0, hlim 255, next-header TCP (6) payload length: 20) fddc:f797:78ef:10::b46.36213 > fe80::5cef:4eff:fe6c:551f.53: Flags [F.], cksum 0x8e6b (correct), seq 58, ack 51, win 360, length 0
16:20:50.934874 IP6 (flowlabel 0x57ab1, hlim 64, next-header TCP (6) payload length: 20) fe80::5cef:4eff:fe6c:551f.53 > fddc:f797:78ef:10::b46.36213: Flags [F.], cksum 0x8dd2 (correct), seq 51, ack 59, win 512, length 0
16:20:50.934888 IP6 (flowlabel 0x000e0, hlim 255, next-header TCP (6) payload length: 20) fddc:f797:78ef:10::b46.36213 > fe80::5cef:4eff:fe6c:551f.53: Flags [.], cksum 0x8e6a (correct), ack 52, win 360, length 0
On Sun, 28 May 2023 16:27:13 +0000 Juan Orti <jorti(a)pm.me> wrote:
> With a tcpdump inside the container I can see that the incoming
> packets are actually arriving with the link-local address as the
> destination (is this expected?).

Hmm, it depends:

  https://passt.top/passt/tree/udp.c?id=e3b19530e4a689f9f8e417ebf737dfca23403…

I'm not sure what the original source address of our DNS query is (you can find that out with tcpdump in the parent namespace). For example, if it's a loopback address, we go ahead and try to convert both source and destination address to our notion of (observed) link-local addresses, because we can't use a loopback address on a non-loopback interface (non-lo in the container).

But I guess in this case it's not a loopback address: the default gateway address, copied to the container, is fe80::ea9f:80ff:fe5d:3d6e, which is a link-local address, but we don't use it, so I assume we end up either in the IN6_IS_ADDR_LINKLOCAL(src) condition, or in the final 'else' clause.

At that point, the address we've seen the guest using becomes our destination address. It can even be a link-local address if we haven't observed a unicast address being used yet.

It would be interesting to see what happens if you generate traffic, from the container, coming from fddc:f797:78ef:70::5, before a DNS query is sent (a TCP request via IPv6 should be enough).

I wouldn't swear to the correctness of this logic: it's the result of handling several corner cases, it's rather ugly at the moment, and David is currently considering how to clean that up.

By the way, this might also happen to be "fixed" on HEAD, as there we copy all the addresses and all the routes, by default, from the parent namespace to the container namespace.

> TCP also uses the link-local address, however it works:

...yes, as far as I know there are no normative references preventing a non-link-local address from contacting a link-local one.

This just happens to be a problem because AdGuardHome uses IPV6_PKTINFO, with that same address I guess, in its sendmsg(), and, for some reason I didn't really investigate, that leads to EINVAL on Linux. It looks like an implementation detail (specific to UDP) to me.

-- 
Stefano
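The address categories that this discussion turns on (loopback, link-local, and the unique-local prefix configured in the container) can be told apart with Python's ipaddress module. The sketch below is only an illustration of those categories using the addresses from the thread; the classify() helper is invented here and is not a model of passt's actual translation logic in udp.c.

```python
import ipaddress

# Unique local addresses (RFC 4193) live in fc00::/7.
ULA = ipaddress.IPv6Network("fc00::/7")


def classify(addr: str) -> str:
    """Return a coarse scope label for an IPv6 address."""
    ip = ipaddress.IPv6Address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:        # fe80::/10, what IN6_IS_ADDR_LINKLOCAL tests
        return "link-local"
    if ip in ULA:
        return "unique-local"
    return "other"


if __name__ == "__main__":
    for label, addr in [
        ("query source", "fddc:f797:78ef:10::b46"),
        ("configured container address", "fddc:f797:78ef:70::5"),
        ("destination seen by tcpdump", "fe80::5cef:4eff:fe6c:551f"),
        ("default gateway", "fe80::ea9f:80ff:fe5d:3d6e"),
    ]:
        print(f"{label}: {addr} -> {classify(addr)}")
```

Run against the thread's addresses, this shows the mismatch under discussion: the client queries a unique-local address, while the packets observed inside the container are addressed to the interface's link-local address.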