If net.ipv4.ip_nonlocal_bind is enabled, the following code in tcp_conn_from_tap gets very confused: if (!bind(s, sa, sl)) { tcp_rst(c, conn); /* Nobody is listening then */ return; } if (errno != EADDRNOTAVAIL && errno != EACCES) conn_flag(c, conn, LOCAL); This is especially visible if net.ipv4.ip_unprivileged_port_start is set to a value lower than the default. For example, if net.ipv4.ip_unprivileged_port_start=443 and net.ipv4.ip_nonlocal_bind=1, the bind()==0 branch will be hit for all outgoing connections going to port 443 because bind() succeeds even when "sa" contains the remote address, and pretty much nothing will work. It might the best to skip the check and marking connections as LOCAL if net.ipv4.ip_nonlocal_bind is enabled? If that doesn't seem reasonable, then maybe show a warning at start and/or just document that the ip_nonlocal_bind setting shouldn't be used with passt? -Valtteri
On Tue, 18 Jul 2023 11:14:14 +0300 Valtteri Vuorikoski <vuori(a)notcom.org> wrote:If net.ipv4.ip_nonlocal_bind is enabled, the following code in tcp_conn_from_tap gets very confused: if (!bind(s, sa, sl)) { tcp_rst(c, conn); /* Nobody is listening then */ return; } if (errno != EADDRNOTAVAIL && errno != EACCES) conn_flag(c, conn, LOCAL); This is especially visible if net.ipv4.ip_unprivileged_port_start is set to a value lower than the default. For example, if net.ipv4.ip_unprivileged_port_start=443 and net.ipv4.ip_nonlocal_bind=1, the bind()==0 branch will be hit for all outgoing connections going to port 443 because bind() succeeds even when "sa" contains the remote address, and pretty much nothing will work.Ouch, I didn't think about that.It might the best to skip the check and marking connections as LOCAL if net.ipv4.ip_nonlocal_bind is enabled?This would mean that if ip_nonlocal_bind is set, we'll always override the MSS, which would break essentially any non-local connection.If that doesn't seem reasonable, then maybe show a warning at start and/or just document that the ip_nonlocal_bind setting shouldn't be used with passt?That's not really friendly, nor future-proof: https://bugs.passt.top/show_bug.cgi?id=48 I think we should go the relatively hard way of extracting the relevant logic from procfs_scan_listen(), and understand from there if there's a local bind for the port at hand. I'm not sure, then, if we should always use this mechanism, even if ip_nonlocal_bind isn't set, because bind() gives us a lightweight way to check for three conditions in one, and we're on a latency-critical path here, so if this results in more syscalls, I would read from procfs just in case we really need to. Feel free to send a patch, or file a bug, or both, or none. :) -- Stefano
On Wed, Jul 19, 2023 at 04:10:52PM +0200, Stefano Brivio wrote:Thanks for checking this out. But yeah, I looked at the alternatives a bit and none seemed really appealing. Maybe go for the proc route if nonlocal binds were enabled at startup? Luckily for me, it turned out that ip_nonlocal_bind was enabled on some servers due to a service that had since been removed, so this time we could solve the problem by just turning the sysctl off. I'll try to get something into bugzilla for this issue anyway. -ValtteriIf that doesn't seem reasonable, then maybe show a warning at start and/or just document that the ip_nonlocal_bind setting shouldn't be used with passt?That's not really friendly, nor future-proof: https://bugs.passt.top/show_bug.cgi?id=48 I think we should go the relatively hard way of extracting the relevant logic from procfs_scan_listen(), and understand from there if there's a local bind for the port at hand. I'm not sure, then, if we should always use this mechanism, even if ip_nonlocal_bind isn't set, because bind() gives us a lightweight way to check for three conditions in one, and we're on a latency-critical path here, so if this results in more syscalls, I would read from procfs just in case we really need to. Feel free to send a patch, or file a bug, or both, or none. :)