September 2023 - dev

[PATCH] arp: only send ARP replies for --gateway address
by Nikolay Edigaryev 12 Oct '23

12 Oct '23

Problem: when passt/pasta are working in a broadcast domain with more than one host machine, it will answer for all of these machines, except for the one having --address. This is akin to ARP spoofing and breaks connection with these machines if passt/pasta ARP reply arrives before the original one. Solution: only be responsible and send ARP replies for the --gateway's address. --- arp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arp.c b/arp.c index a35c1b6..f873491 100644 --- a/arp.c +++ b/arp.c @@ -67,8 +67,8 @@ int arp(const struct ctx *c, const struct pool *p) !memcmp(am->sip, am->tip, sizeof(am->sip))) return 1; - /* Don't resolve our own address, either. */ - if (!memcmp(am->tip, &c->ip4.addr, sizeof(am->tip))) + /* Don't resolve anything but gateway address. */ + if (memcmp(am->tip, &c->ip4.gw, sizeof(am->tip)) != 0) return 1; ah->ar_op = htons(ARPOP_REPLY); -- 2.39.2 (Apple Git-144)

3 5

[PATCH RFT 0/5] Fixes and a workaround for TCP stalls with small buffers
by Stefano Brivio 05 Oct '23

05 Oct '23

The fundamental patch here is 3/5, which is a workaround for a rather surprising kernel behaviour we seem to be hitting. This all comes from the investigation around https://bugs.passt.top/show_bug.cgi?id=74. I can't hit stalls anymore and throughput looks finally good to me (~3.5gbps with 208 KiB rmem_max and wmem_max), but... please test. Stefano Brivio (5): tcp: Fix comment to tcp_sock_consume() tcp: Reset STALLED flag on ACK only, check for pending socket data tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag tcp, tap: Don't increase tap-side sequence counter for dropped frames passt.1: Add note about tuning rmem_max and wmem_max for throughput passt.1 | 33 +++++++++++++++++++++++++ tap.c | 10 +++++--- tap.h | 2 +- tcp.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++---------- 4 files changed, 102 insertions(+), 17 deletions(-) -- 2.39.2

2 31

[PATCH 0/4] RFC: Updates for cppcheck-2.12 warnings
by David Gibson 05 Oct '23

05 Oct '23

cppcheck 2.12 (which Fedora 38 has updated, for one) introduces a number of new warnings. Unfortunately, at least one of these is a clear bug in cppcheck. This series fixes a number of the new warnings reported in passt (patches 1..3) and works around the remaining cppcheck bug (patch 4). I'm pretty confident that patches 1 & 2 are safe and beneficial to apply regardless of which cppcheck we're using. Patch 3 is a little more dubious, because it potentially increases the cppcheck runtime. On my system it doesn't seem to make a significant difference, but that might not always stay true. Patch 4 is a tricky one. It applies a specific suppression to work around the cppcheck bug. That's necessary to get a pass with the currently available cppcheck. However, it's ugly and we'd like to remove it once the bug is fixed, but have no obvious way to remind us to do that. What we want to do here kind of depends how long it takes the bug to be fixed, which isn't clear at the moment. David Gibson (4): cppcheck: Make many pointers const conf: Remove overly cryptic selection of forward table cppcheck: Use "exhaustive" level checking when available cppcheck: Work around bug in cppcheck 2.12.0 Makefile | 6 ++++++ conf.c | 25 +++++++++++-------------- isolation.c | 2 +- isolation.h | 2 +- log.c | 6 +++--- netlink.c | 6 +++--- netlink.h | 6 +++--- pasta.c | 4 ++-- pasta.h | 2 +- tap.c | 14 +++++++------- tap.h | 6 +++--- tcp.c | 25 ++++++++++++++++--------- tcp_conn.h | 2 +- tcp_splice.c | 5 +++-- tcp_splice.h | 3 ++- udp.c | 4 ++-- udp.h | 2 +- util.c | 5 +++-- util.h | 4 ++-- 19 files changed, 71 insertions(+), 58 deletions(-) -- 2.41.0

2 8

[PATCH v2 0/3] Fixes and a workaround for TCP stalls with small buffers
by Stefano Brivio 03 Oct '23

03 Oct '23

The fundamental patch here is 2/3, which is a workaround for a rather surprising kernel behaviour we seem to be hitting. This all comes from the investigation around https://bugs.passt.top/show_bug.cgi?id=74. I can't hit stalls anymore and throughput looks finally good to me (~3.5gbps with 208 KiB rmem_max and wmem_max), but... please test (again). v2: - Drop 1/5 (checking for ACK before resetting STALLED and calling tcp_data_from_sock() directly from tcp_tap_handler()) - Moving reset of STALLED flag is now done in 2/3 (was 3/5) - Don't pass unnecessary argument to tcp_data_to_tap() in 3/3 (was 4/5) - Drop 5/5 as long as we're not sure what kind of buffer clamping is actually beneficial Stefano Brivio (3): tcp: Fix comment to tcp_sock_consume() tcp: Force TCP_WINDOW_CLAMP before resetting STALLED flag tcp, tap: Don't increase tap-side sequence counter for dropped frames tap.c | 10 ++++++--- tap.h | 2 +- tcp.c | 71 +++++++++++++++++++++++++++++++++++++++++++++++++---------- 3 files changed, 67 insertions(+), 16 deletions(-) -- 2.39.2

2 6

[PATCH v2 00/10] siphash: cleanups and fixes
by David Gibson 30 Sep '23

30 Sep '23

While working on unifying the hashing for the flow table, I noticed some awkwardness in the siphash functions. While looking into that I noticed some bugs. So.. here we are. Changes since v1: * Don't accidentally increase the alignment of union inany_addr David Gibson (10): siphash: Make siphash functions consistently return 64-bit results siphash: Make sip round calculations an inline function rather than macro siphash: Add siphash_feed() helper siphash: Clean up hash finalisation with posthash_final() function siphash: Fix bug in state initialisation siphash: Use more hygienic state initialiser siphash: Use specific structure for internal state siphash: Make internal helpers public siphash, checksum: Move TBAA explanation to checksum.c siphash: Use incremental rather than all-at-once siphash functions Makefile | 2 +- checksum.c | 19 ++-- inany.h | 17 +++- siphash.c | 243 --------------------------------------------------- siphash.h | 119 +++++++++++++++++++++++-- tcp.c | 37 +++----- tcp_splice.c | 1 + 7 files changed, 159 insertions(+), 279 deletions(-) delete mode 100644 siphash.c -- 2.41.0

2 11

[PATCH 00/10] siphash: cleanups and fixes
by David Gibson 29 Sep '23

29 Sep '23

While working on unifying the hashing for the flow table, I noticed some awkwardness in the siphash functions. While looking into that I noticed some bugs. So.. here we are. David Gibson (10): siphash: Make siphash functions consistently return 64-bit results siphash: Make sip round calculations an inline function rather than macro siphash: Add siphash_feed() helper siphash: Clean up hash finalisation with posthash_final() function siphash: Fix bug in state initialisation siphash: Use more hygienic state initialiser siphash: Use specific structure for internal state siphash: Make internal helpers public siphash, checksum: Move TBAA explanation to checksum.c siphash: Use incremental rather than all-at-once siphash functions Makefile | 2 +- checksum.c | 19 ++-- inany.h | 16 +++- siphash.c | 243 --------------------------------------------------- siphash.h | 119 +++++++++++++++++++++++-- tcp.c | 37 +++----- tcp_splice.c | 1 + 7 files changed, 158 insertions(+), 279 deletions(-) delete mode 100644 siphash.c -- 2.41.0

2 16

[PATCH v2 0/2] Some static checker fixes
by David Gibson 27 Sep '23

27 Sep '23

We already had a couple of places we were working around clang-tidy issue 58992, and the flow table series adds more. I got sick of ugly inlines every time we used a syscall which returns a socket address, so wrote a patch to consolidate the workarounds in one place. However, that patch added an include of <string.h> to util.h which exposed a classic C library gotcha in packet.c, so I fixed that too. Changes since v1: * Updated missed comment to match code changes in 1/2 * Fixed more places which shadowed index(3) David Gibson (2): Avoid shadowing index(3) util: Consolidate and improve workarounds for clang-tidy issue 58992 Makefile | 2 +- icmp.c | 5 ----- packet.c | 30 +++++++++++++++--------------- packet.h | 10 +++++----- tcp.c | 22 ++++++++-------------- tcp_splice.c | 2 +- util.c | 12 ++++++------ util.h | 43 ++++++++++++++++++++++++++++++++++++++++++- 8 files changed, 78 insertions(+), 48 deletions(-) -- 2.41.0

2 3

[PATCH] tcp: Always send an ACK segment once the handshake is completed
by Stefano Brivio 25 Sep '23

25 Sep '23

The reporter is running a SMTP server behind pasta, and the client waits for the server's banner before sending any data. In turn, the server waits for our ACK after sending SYN,ACK, which never comes. If we use the ACK_IF_NEEDED indication to tcp_send_flag(), given that there's no pending data, we delay sending the ACK segment at the end of the three-way handshake until we have some data to send to the server. This was actually intended, as I thought we would lower the latency for new connections, but we can't assume that the client will start sending data first (SMTP is the typical example where this doesn't happen). And, trying out this patch with SSH (where the client starts sending data first), the reporter actually noticed we have a lower latency by forcing an ACK right away. Comparing a capture before the patch: 13:07:14.007704 IP 10.1.2.1.42056 > 10.1.2.140.1234: Flags [S], seq 1797034836, win 65535, options [mss 4096,nop,wscale 7], length 0 13:07:14.007769 IP 10.1.2.140.1234 > 10.1.2.1.42056: Flags [S.], seq 2297052481, ack 1797034837, win 65480, options [mss 65480,nop,wscale 7], length 0 13:07:14.008462 IP 10.1.2.1.42056 > 10.1.2.140.1234: Flags [.], seq 1:22, ack 1, win 65535, length 21 13:07:14.008496 IP 10.1.2.140.1234 > 10.1.2.1.42056: Flags [.], ack 22, win 512, length 0 13:07:14.011799 IP 10.1.2.140.1234 > 10.1.2.1.42056: Flags [P.], seq 1:515, ack 22, win 512, length 514 and after: 13:10:26.165364 IP 10.1.2.1.59508 > 10.1.2.140.1234: Flags [S], seq 4165939595, win 65535, options [mss 4096,nop,wscale 7], length 0 13:10:26.165391 IP 10.1.2.140.1234 > 10.1.2.1.59508: Flags [S.], seq 985607380, ack 4165939596, win 65480, options [mss 65480,nop,wscale 7], length 0 13:10:26.165418 IP 10.1.2.1.59508 > 10.1.2.140.1234: Flags [.], ack 1, win 512, length 0 13:10:26.165683 IP 10.1.2.1.59508 > 10.1.2.140.1234: Flags [.], seq 1:22, ack 1, win 512, length 21 13:10:26.165698 IP 10.1.2.140.1234 > 10.1.2.1.59508: Flags [.], ack 22, win 512, length 0 13:10:26.167107 IP 10.1.2.140.1234 > 10.1.2.1.59508: Flags [P.], seq 1:515, ack 22, win 512, length 514 the latency between the initial SYN segment and the first data transmission actually decreases from 792µs to 334µs. This is not statistically relevant as we have a single measurement, but it can't be that bad, either. Reported-by: cr3bs (from IRC) Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcp.c b/tcp.c index 76b7b8d..39844eb 100644 --- a/tcp.c +++ b/tcp.c @@ -2562,7 +2562,7 @@ static void tcp_conn_from_sock_finish(struct ctx *c, struct tcp_tap_conn *conn, * dequeue waiting for SYN,ACK from tap -- check now. */ tcp_data_from_sock(c, conn); - tcp_send_flag(c, conn, ACK_IF_NEEDED); + tcp_send_flag(c, conn, ACK); } /** -- 2.39.2

2 1

[PATCH] dhcpv6: Properly separate domain names in search list
by Stefano Brivio 23 Sep '23

23 Sep '23

If we go over the flattened list of search domains and just replace dots and zero bytes with the length of the next label to implement the encoding specified by section 3.1 of RFC 1035, if there are multiple domains in the search list, we'll also replace separators between two domain names with the length of the first label of the second domain, plus one. Those should remain as zero bytes to separate domains, though. To distinguish between label separators and domain names separators, for simplicity, introduce a dot before the first label of every domain we copy to form the list. All dots are then replaced by label lengths, and separators (zero bytes) remain as they are. As we do this, we need to make sure we don't replace the trailing dot, if present: that's already a separator. Skip copying it, and just add separators as needed. Now that we don't copy those, though, we might end up with zero-length domains: skip them, as they're meaningless anyway. And as we might skip domains, we can't use the index 'i' to check if we're at the beginning of the option -- use 'srch' instead. This is very similar to how we prepare the list for NDP option 31, except that we don't need padding (RFC 8106, 5.2) here, and we should refactor this into common functions, but it probably makes sense to rework the NDP responder (https://bugs.passt.top/show_bug.cgi?id=21) first. Reported-by: Sebastian Mitterle <smitterl(a)redhat.com> Link: https://bugs.passt.top/show_bug.cgi?id=75 Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- dhcpv6.c | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/dhcpv6.c b/dhcpv6.c index fc42a84..58171bb 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -376,24 +376,34 @@ search: return offset; for (i = 0; *c->dns_search[i].n; i++) { - if (!i) { + size_t name_len = strlen(c->dns_search[i].n); + + /* We already append separators, don't duplicate if present */ + if (c->dns_search[i].n[name_len - 1] == '.') + name_len--; + + /* Skip root-only search domains */ + if (!name_len) + continue; + + if (!srch) { srch = (struct opt_dns_search *)(buf + offset); offset += sizeof(struct opt_hdr); srch->hdr.t = OPT_DNS_SEARCH; srch->hdr.l = 0; p = srch->list; - *p = 0; } - p = stpcpy(p + 1, c->dns_search[i].n); - *(p++) = 0; - srch->hdr.l += strlen(c->dns_search[i].n) + 2; - offset += strlen(c->dns_search[i].n) + 2; + *p = '.'; + p = stpncpy(p + 1, c->dns_search[i].n, name_len); + p++; + srch->hdr.l += name_len + 2; + offset += name_len + 2; } if (srch) { for (i = 0; i < srch->hdr.l; i++) { - if (srch->list[i] == '.' || !srch->list[i]) { + if (srch->list[i] == '.') { srch->list[i] = strcspn(srch->list + i + 1, "."); } -- 2.39.2

2 3

[PATCH] dhcp: Actually note down the length of options received by the client
by Stefano Brivio 23 Sep '23

23 Sep '23

It turns out we never used 'clen' until commit 1f24d3efb499 ("dhcp: support BOOTP clients"), and we always ignored option 55 (Parameter Request List), while, according to RFC 2132, we MUST try to insert the requested options in the order requested by the client. The commit mentioned above made this visible because now every client is reported as sending a DHCPREQUEST as an old BOOTP client, based on the lack of option 53 (that is, zero length). Fixes: b439984641ed ("merd: ARP and DHCP handlers, connection tracking fixes") Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com> --- dhcp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/dhcp.c b/dhcp.c index c1ac95e..46c258e 100644 --- a/dhcp.c +++ b/dhcp.c @@ -317,6 +317,7 @@ int dhcp(const struct ctx *c, const struct pool *p) return -1; memcpy(&opts[*type].c, val, *olen); + opts[*type].clen = *olen; opt_off += *olen + 2; } -- 2.39.2

2 1