On Tue, 22 Aug 2023 15:29:56 +1000
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> We partially prepopulate IP and TCP header structures including, amongst
> other things the destination address, which for IPv4 is always the known
> address of the guest/namespace. We partially precompute both the IPv4
> header checksum and the TCP checksum based on this.
>
> In future we're going to want more flexibility with controlling the
> destination for IPv4 (as we already do for IPv6), so this precomputed value
> gets in the way. Therefore remove the IPv4 destination from the
> precomputed checksum and fold it into the checksum update when we actually
> send a packet.
>
> Doing this means we no longer need to recompute those partial sums when
> the destination address changes ({tcp,udp}_update_l2_buf()) and instead
> the computation can be moved to compile time. This means while we perform
> slightly more computations on each packet, we slightly reduce the amount of
> memory we need to access.
>
> Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> ---
>  tcp.c  | 61 ++++++++++++++++++++-------------------------------------
>  udp.c  | 14 +++-----------
>  util.h |  4 +++-
>  3 files changed, 27 insertions(+), 52 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 56634c9..c52ea2b 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -323,10 +323,8 @@
>  #define MSS_DEFAULT			536
>
>  struct tcp4_l2_head {	/* For MSS4 macro: keep in sync with tcp4_l2_buf_t */
> -	uint32_t psum;
> -	uint32_t tsum;
>  #ifdef __AVX2__
> -	uint8_t pad[18];
> +	uint8_t pad[26];
>  #else
>  	uint8_t pad[2];
>  #endif
> @@ -443,8 +441,6 @@ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>
>  /**
>   * tcp4_l2_buf_t - Pre-cooked IPv4 packet buffers for tap connections
> - * @psum:	Partial IP header checksum (excluding tot_len and saddr)
> - * @tsum:	Partial TCP header checksum (excluding length and saddr)
>   * @pad:	Align TCP header to 32 bytes, for AVX2 checksum calculation only
>   * @taph:	Tap-level headers (partially pre-filled)
>   * @iph:	Pre-filled IP header (except for tot_len and saddr)
> @@ -452,17 +448,15 @@ static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
>   * @data:	Storage for TCP payload
>   */
>  static struct tcp4_l2_buf_t {
> -	uint32_t psum;		/* 0 */
> -	uint32_t tsum;		/* 4 */
>  #ifdef __AVX2__
> -	uint8_t pad[18];	/* 8, align th to 32 bytes */
> +	uint8_t pad[26];	/* 0, align th to 32 bytes */
>  #else
> -	uint8_t pad[2];		/*	align iph to 4 bytes	8 */
> +	uint8_t pad[2];		/*	align iph to 4 bytes	0 */
>  #endif
> -	struct tap_hdr taph;	/* 26				10 */
> -	struct iphdr iph;	/* 44				28 */
> -	struct tcphdr th;	/* 64				48 */
> -	uint8_t data[MSS4];	/* 84				68 */
> +	struct tap_hdr taph;	/* 26				2 */
> +	struct iphdr iph;	/* 44				20 */
> +	struct tcphdr th;	/* 64				40 */
> +	uint8_t data[MSS4];	/* 84				60 */
>  				/* 65536			65532 */

Pre-existing, but I spotted this only now: the non-AVX2 version ends at
65532 because MSS4 is 65535 - 68 rounded down to a multiple of 4.

But USHRT_MAX is used in MSS4() just to make sure we don't exceed it as
frame size (right?) -- after https://bugs.passt.top/show_bug.cgi?id=55
I'm kind of wondering if we shouldn't start from 65536 instead and gain
those four bytes back. Unrelated, anyway.

The series looks good to me, thanks, applying in a bit.

-- 
Stefano
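
For readers following the checksum reasoning in the commit message, here is a minimal sketch of the general technique, not passt's actual code: the fields of the IPv4 header that never change are summed once, and the per-packet fields (tot_len, saddr and, after this patch, daddr) are added when the packet is sent. All function names below (sum16(), csum_fold(), ipv4_psum_const(), ipv4_csum_update(), ipv4_csum_full()) are hypothetical, and the header is handled as a raw 20-byte array with no options, which is an assumption made only to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add the 16-bit big-endian words of buf to a running
 * 32-bit sum, without folding. An odd trailing byte is padded with zero. */
static uint32_t sum16(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;

	return sum;
}

/* Fold carries back into 16 bits and take the ones' complement */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Reference: checksum over the whole header (checksum field must be zero) */
static uint16_t ipv4_csum_full(const uint8_t hdr[20])
{
	return csum_fold(sum16(hdr, 20, 0));
}

/* Constant part: version/IHL and TOS (bytes 0-1), then id, frag_off, TTL
 * and protocol (bytes 4-9). None of these depend on the addresses, so in
 * this sketch the sum only needs to be computed once. */
static uint32_t ipv4_psum_const(const uint8_t hdr[20])
{
	uint32_t sum = sum16(hdr, 2, 0);

	return sum16(hdr + 4, 6, sum);
}

/* Per-packet part: fold in tot_len (bytes 2-3), saddr (bytes 12-15) and
 * daddr (bytes 16-19) on top of the constant partial sum. */
static uint16_t ipv4_csum_update(uint32_t psum, const uint8_t hdr[20])
{
	uint32_t sum = sum16(hdr + 2, 2, psum);

	sum = sum16(hdr + 12, 8, sum);

	return csum_fold(sum);
}
```

Because ones' complement addition is commutative and associative, ipv4_csum_update(ipv4_psum_const(hdr), hdr) gives the same result as ipv4_csum_full(hdr), and changing the destination address does not invalidate the constant partial sum. The per-packet cost is two extra 16-bit additions for daddr, which matches the trade-off described in the commit message.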