A recently reported podman bug shows transfer failures in podman custom rootless networks connected with pasta. Analysis suggests this is triggered by pasta generating a TCP packet without the ACK flag when it should have one. The exact symptoms seem to arise because of some odd kernel behaviour - rather than simply ignoring the packet, an RST is observed killing the connection. However, there are also packets seen after the RST which don't seem to make sense. While there are some mysteries which we still hope to track down here, in the meantime it definitely seems like pasta's ACK behaviour isn't correct, and appears to trigger the other problems. So, fix it. Link: https://github.com/containers/podman/issues/22146 Link: https://bugs.passt.top/show_bug.cgi?id=84 David Gibson (4): tcp: Split handling of DUP_ACK from ACK tcp: Rearrange logic for setting ACK flag in tcp_send_flag() tcp: Never automatically add the ACK flag to RST packets tcp: Unconditionally force ACK for all !SYN, !RST packets tcp.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) -- 2.44.0
The DUP_ACK flag to tcp_send_flag() has two effects: first it forces the setting of the ACK flag in the packet, even if we otherwise wouldn't. Secondly, it causes a duplicate of the flags packet to be sent immediately after the first. Setting the ACK flag to tcp_send_flag() also has the first effect, so instead of having DUP_ACK also do that, pass both flags when we need both operations. This slightly simplifies the logic of tcp_send_flag() in a way that makes some future changes easier. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tcp.c b/tcp.c index a1860d10..edd3d899 100644 --- a/tcp.c +++ b/tcp.c @@ -1677,7 +1677,7 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) th->ack = !!(flags & ACK); } else { - th->ack = !!(flags & (ACK | DUP_ACK)) || + th->ack = !!(flags & ACK)) || conn->seq_ack_to_tap != prev_ack_to_tap || !prev_wnd_to_tap; } @@ -2503,7 +2503,7 @@ out: */ if (conn->seq_dup_ack_approx != (conn->seq_from_tap & 0xff)) { conn->seq_dup_ack_approx = conn->seq_from_tap & 0xff; - tcp_send_flag(c, conn, DUP_ACK); + tcp_send_flag(c, conn, ACK | DUP_ACK); } return p->count - idx; } -- 2.44.0
We have different paths for controlling the ACK flag for the SYN and !SYN paths. This amounts to sometimes forcing on the ACK flag in the !SYN path regardless of options. We can rearrange things to explicitly be that which will make things neater for some future changes. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/tcp.c b/tcp.c index edd3d899..47954d11 100644 --- a/tcp.c +++ b/tcp.c @@ -1674,16 +1674,15 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) *data++ = OPT_WS; *data++ = OPT_WS_LEN; *data++ = conn->ws_to_tap; - - th->ack = !!(flags & ACK); } else { - th->ack = !!(flags & ACK)) || - conn->seq_ack_to_tap != prev_ack_to_tap || - !prev_wnd_to_tap; + if (conn->seq_ack_to_tap != prev_ack_to_tap || + !prev_wnd_to_tap) + flags |= ACK; } th->doff = (sizeof(*th) + optlen) / 4; + th->ack = !!(flags & ACK); th->rst = !!(flags & RST); th->syn = !!(flags & SYN); th->fin = !!(flags & FIN); -- 2.44.0
tcp_send_flag() will sometimes force on the ACK flag for all !SYN packets. This doesn't make sense for RST packets, where plain RST and RST+ACK have somewhat different meanings. AIUI, RST+ACK indicates an abrupt end to a connection, but acknowledges data already sent. Plain RST indicates an abort, when one end receives a packet that doesn't seem to make sense in the context of what it knows about the connection. All of the cases where we send RSTs are the second, so we don't want an ACK flag, but we currently could add one anyway. Change that, so we won't add an ACK to an RST unless the caller explicitly requests it. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcp.c b/tcp.c index 47954d11..b65ddeb5 100644 --- a/tcp.c +++ b/tcp.c @@ -1674,7 +1674,7 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) *data++ = OPT_WS; *data++ = OPT_WS_LEN; *data++ = conn->ws_to_tap; - } else { + } else if (!(flags & RST)) { if (conn->seq_ack_to_tap != prev_ack_to_tap || !prev_wnd_to_tap) flags |= ACK; -- 2.44.0
Currently we set ACK on flags packets only when the acknowledged byte pointer has advanced, or we hadn't previously set a window. This means in particular that we can send a window update with no ACK flag, which doesn't appear to be correct. RFC 9293 requires a receiver to ignore such a packet [0], and indeed it appears that every non-SYN, non-RST packet should have the ACK flag. The reason for the existing logic, rather than always forcing an ACK seems to be to avoid having the packet mistaken as a duplicate ACK which might trigger a fast retransmit. However, earlier tests in the function mean we won't reach here if we don't have either an advance in the ack pointer - which will already set the ACK flag, or a window update - which shouldn't trigger a fast retransmit. [0] https://www.ietf.org/rfc/rfc9293.html#section-3.10.7.4-2.5.2.1 Link: https://github.com/containers/podman/issues/22146 Link: https://bugs.passt.top/show_bug.cgi?id=84 Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- tcp.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/tcp.c b/tcp.c index b65ddeb5..28562b7f 100644 --- a/tcp.c +++ b/tcp.c @@ -1593,8 +1593,6 @@ static void tcp_update_seqack_from_tap(const struct ctx *c, */ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) { - uint32_t prev_ack_to_tap = conn->seq_ack_to_tap; - uint32_t prev_wnd_to_tap = conn->wnd_to_tap; struct tcp4_l2_flags_buf_t *b4 = NULL; struct tcp6_l2_flags_buf_t *b6 = NULL; struct tcp_info tinfo = { 0 }; @@ -1675,9 +1673,7 @@ static int tcp_send_flag(struct ctx *c, struct tcp_tap_conn *conn, int flags) *data++ = OPT_WS_LEN; *data++ = conn->ws_to_tap; } else if (!(flags & RST)) { - if (conn->seq_ack_to_tap != prev_ack_to_tap || - !prev_wnd_to_tap) - flags |= ACK; + flags |= ACK; } th->doff = (sizeof(*th) + optlen) / 4; -- 2.44.0
On Tue, 26 Mar 2024 16:42:20 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:A recently reported podman bug shows transfer failures in podman custom rootless networks connected with pasta. Analysis suggests this is triggered by pasta generating a TCP packet without the ACK flag when it should have one. The exact symptoms seem to arise because of some odd kernel behaviour - rather than simply ignoring the packet, an RST is observed killing the connection. However, there are also packets seen after the RST which don't seem to make sense. While there are some mysteries which we still hope to track down here, in the meantime it definitely seems like pasta's ACK behaviour isn't correct, and appears to trigger the other problems. So, fix it. Link: https://github.com/containers/podman/issues/22146 Link: https://bugs.passt.top/show_bug.cgi?id=84 David Gibson (4): tcp: Split handling of DUP_ACK from ACK tcp: Rearrange logic for setting ACK flag in tcp_send_flag() tcp: Never automatically add the ACK flag to RST packets tcp: Unconditionally force ACK for all !SYN, !RST packetsApplied. -- Stefano