On Wed, 6 Mar 2024 16:09:23 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:On Mon, Mar 04, 2024 at 11:47:17PM +0100, Stefano Brivio wrote:Well, it can't just do that, because otherwise we couldn't use packed structures on any architecture that doesn't support unaligned accesses, right? Once you pass a pointer to an unaligned value, though, the fixup information is lost, and the compiler models a function as simply taking a given pointer with a given type: the model doesn't include information as to where the value is stored.On Mon, 4 Mar 2024 12:00:40 +0100 Stefano Brivio <sbrivio(a)redhat.com> wrote:Huh. Ok, so I guess the compiler realises it's doing a load from a packed structure and generates the necessary fixup code. I thought it would only consider the type of the actually loaded value.On Mon, 4 Mar 2024 12:54:12 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Actually, armhf first (for clarity): $ cat align.c #include <stdio.h> #include <stdint.h> struct disarray { uint8_t oops; uint32_t v1; uint32_t v2; } __attribute__((packed, aligned(__alignof__(unsigned int)))); void f1(uint32_t *v1) { *v1 += 42; } uint32_t f2(uint32_t v2) { return v2++; } int main() { struct disarray d = { 0x55, 0xaa, 0xaa }; f1(&d.v1); f2(d.v2); fprintf(stdout, "%08x %08x", d.v1, d.v2); } $ arm-linux-gnueabihf-gcc-12 -g -O0 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align.c: In function ‘main’: align.c:22:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 22 | f1(&d.v1); | ^~~~~ $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f1(&d.v1); 562: ab01 add r3, sp, #4 564: 3301 adds r3, #1 566: 4618 mov r0, r3 568: f7ff ffde bl 528 <f1> [...] before the call to f1(), the address in r3 is not aligned (we just added #1), despite -mno-unaligned-access. I guess gcc can only warn about that, but not fix it. This: https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html says: -munaligned-access -mno-unaligned-access Enables (or disables) reading and writing of 16- and 32- bit values from addresses that are not 16- or 32- bit aligned. By default unaligned access is disabled for all pre-ARMv6, all ARMv6-M and for ARMv8-M Baseline architectures, and enabled for all other architectures. If unaligned access is not enabled then words in packed data structures are accessed a byte at a time. Implying, I guess, that on those architectures unaligned accesses shouldn't be done. I think Thumb mode also has issues with this, by the way. And in f1() we just have a ldr from that address (passed on r0): void f1(uint32_t *v1) { 528: b082 sub sp, #8 52a: 9001 str r0, [sp, #4] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a $ arm-linux-gnueabihf-objdump -S --disassemble=f1 align [...] *v1 += 42; 52c: 9b01 ldr r3, [sp, #4] 52e: 681b ldr r3, [r3, #0] 530: f103 022a add.w r2, r3, #42 @ 0x2a ...but the call to f2() is fine: we load with offset 8 from the stack pointer, shift word right, load from offset 12, shift word left, OR: $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] f2(d.v2); 56c: 9b02 ldr r3, [sp, #8] 56e: 0a1b lsrs r3, r3, #8 570: f89d 200c ldrb.w r2, [sp, #12] 574: 0612 lsls r2, r2, #24 576: 4313 orrs r3, r2 578: 4618 mov r0, r3 57a: f7ff ffe0 bl 53e <f2> [...]On Fri, Mar 01, 2024 at 07:56:51AM +0100, Stefano Brivio wrote: > On Fri, 1 Mar 2024 10:09:39 +1100 > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > On Thu, Feb 29, 2024 at 03:15:53PM +0100, Stefano Brivio wrote: > > > On Thu, 29 Feb 2024 09:56:25 +0100 > > > Stefano Brivio <sbrivio(a)redhat.com> wrote: > > > > > > > On Thu, 29 Feb 2024 19:49:09 +1100 > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > > > On Thu, Feb 29, 2024 at 08:05:09AM +0100, Stefano Brivio wrote: > > > > > > On Thu, 29 Feb 2024 11:38:53 +1100 > > > > > > David Gibson <david(a)gibson.dropbear.id.au> wrote: > > > > > > > > > > > > > On Wed, Feb 28, 2024 at 02:26:18PM +0100, Laurent Vivier wrote: > > > > > > > > On 2/19/24 04:08, David Gibson wrote: > > > > > > > > > On Sat, Feb 17, 2024 at 04:07:23PM +0100, Laurent Vivier wrote: > > > > > > > > > > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > > +/** > > > > > > > > > > + * proto_ipv6_header_psum() - Calculates the partial checksum of an > > > > > > > > > > + * IPv6 header for UDP or TCP > > > > > > > > > > + * @payload_len: Payload length > > > > > > > > > > + * @proto: Protocol number > > > > > > > > > > + * @saddr: Source address > > > > > > > > > > + * @daddr: Destination address > > > > > > > > > > + * Returns: Partial checksum of the IPv6 header > > > > > > > > > > + */ > > > > > > > > > > +uint32_t proto_ipv6_header_psum(uint16_t payload_len, uint8_t protocol, > > > > > > > > > > + struct in6_addr saddr, struct in6_addr daddr) > > > > > > > > > > > > > > > > > > Hrm, this is passing 2 16-byte IPv6 addresses by value, which might > > > > > > > > > not be what we want. > > > > > > > > > > > > > > > > The idea here is to avoid the pointer alignment problem (&ip6h->saddr and > > > > > > > > &ip6h->daddr can be misaligned). > > > > > > > > > > > > > > Ah, right. That's a neat idea, but I'm not sure it really helps: I > > > > > > > think it will just move the misaligned access from inside the function > > > > > > > to the call site, where we try to marshal the parameter from something > > > > > > > unaligned. > > > > > > > > > > > > I haven't tested this yet, but note that this is generally okay: the > > > > > > problem is *dereferencing* an unaligned pointer. But if you load memory > > > > > > from an aligned pointer, and extract a value from this memory, it's all > > > > > > fine. > > > > > > > > > > Right, that's kind of what I'm getting at. Assuming this value starts > > > > > in an unaligned buffer, then in order to pass this by value the caller > > > > > will need to load from that unaligned pointer. AFAIK, the compiler > > > > > will base the type of loads only on the pointed to type, which isn't > > > > > changed whether we dereference in the caller or the callee. > > > > > > > > > > > > > > > > > Speaking MIPS, this is not safe on all CPU models: > > > > > > > > > > > > la $1, 1002 # s1 now contains the value 1002 > > > > > > lw $2, 0($1) # load word from memory at 1002 + 0 into s2 > > > > > > > > > > > > but this is: > > > > > > > > > > > > la $1, 1000 # s1 now contains the value 1000 > > > > > > la $2, 1004 # s3 now contains the value 1004 > > > > > > lw $3, 0($1) # load word from memory at 1000 + 0 into s3 > > > > > > lw $4, 0($3) # load word from memory at 1004 + 0 into s4 > > > > > > sll $5, $3, 16 # 16-bit shift left s3 into s5 > > > > > > srl $6, $4, 16 # 16-bit shift right s4 into s6 > > > > > > or $2, $5, $6 # OR s5 and s6 into s2 > > > > > > > > > > Right, but I don't think merely moving the dereference to the caller > > > > > will necessarily induce the compiler to generate this rather than the > > > > > former. > > > > > > > > Oh, oops, I didn't realise this was the case (I haven't reviewed the > > > > patch yet). > > > > > > ...no, that's not the case. Dereferencing 'iph' from > > > struct tcp[46]_l2_buf_t is fine: > > > > > > struct tcp4_l2_buf_t { > > > uint8_t pad[2]; /* 0 2 */ > > > struct tap_hdr taph; /* 2 18 */ > > > struct iphdr iph; /* 20 20 */ > > > [...] > > > } __attribute__((__packed__)); > > > > > > struct tcp6_l2_buf_t { > > > uint8_t pad[2]; /* 0 2 */ > > > struct tap_hdr taph; /* 2 18 */ > > > struct ipv6hdr ip6h; /* 20 40 */ > > > [...] > > > } __attribute__((__packed__)); > > > > > > The problematic structures are the UDP buffers: > > > > > > struct udp4_l2_buf_t { > > > struct sockaddr_in s_in; /* 0 16 */ > > > struct tap_hdr taph; /* 16 18 */ > > > struct iphdr iph; /* 34 20 */ > > > [...] > > > } __attribute__((__aligned__(4))); > > > > > > and for UDP, this patch is dereferencing buffer pointers only, not > > > pointers to headers. > > > > Ok... but my point remains, I'm not seeing that passing the address by > > value actually helps - it just seems to change whether we need to > > handle the unaligned load in the caller or the callee. > > For UDP and IPv4 (from 6/9): > > + b->iph.check = csum_ip4_header(b->iph.tot_len, IPPROTO_UDP, > + b->iph.saddr, b->iph.daddr); > > and for IPv6 (this patch): > > + b->uh.check = csum(&b->uh, ntohs(b->ip6h.payload_len), > + proto_ipv6_header_psum(b->ip6h.payload_len, > + IPPROTO_UDP, > + b->ip6h.saddr, > + b->ip6h.daddr)); > > these cause loads starting from 'b', which is aligned, instead of > passing 'iph' or 'ip6h', unaligned, and loading from there. No... the loads are still from b->ip6h.saddr, b->ip6h.daddr and b->ip6h.payload_len.It depends how we define "loading from" -- the problem, in general, is not the memory location per se, the problem is dereferencing memory pointers. I plan to try an example on MIPS in a bit [...]Are you sure it still does this correctly when optimization is enabled?I'm fairly sure, even though I couldn't find a "normative" reference (at least as far as GNU extensions are concerned) guaranteeing that (but I didn't try hard). I had to move f1() and f2() to different compilation units, otherwise with -O2 they get merged into main(): $ cat align.c #include <stdio.h> #include <stdint.h> struct disarray { uint8_t oops; uint32_t v1; uint32_t v2; } __attribute__((packed, aligned(__alignof__(unsigned int)))); void f1(uint32_t *v1); uint32_t f2(uint32_t v2); int main() { struct disarray d = { 0x55, 0xaa, 0xaa }; f1(&d.v1); f2(d.v2); fprintf(stdout, "%08x %08x", d.v1, d.v2); } $ cat align_f1.c #include <stdint.h> void f1(uint32_t *v1) { *v1 += 42; } $ cat align_f2.c #include <stdint.h> uint32_t f2(uint32_t v2) { return v2++; } $ arm-linux-gnueabihf-gcc-12 -g -O2 -fno-stack-protector -fomit-frame-pointer -mno-unaligned-access -o align align.c align_f1.c align_f2.c align.c: In function ‘main’: align.c:18:8: warning: taking address of packed member of ‘struct disarray’ may result in an unaligned pointer value [-Waddress-of-packed-member] 18 | f1(&d.v1); | ^~~~~ ...note the usual dance before f2() is called, but not before the call to f1(): $ arm-linux-gnueabihf-objdump -S --disassemble=main align [...] struct disarray d = { 0x55, 0xaa, 0xaa }; 43e: e903 0007 stmdb r3, {r0, r1, r2} f1(&d.v1); 442: f10d 0005 add.w r0, sp, #5 446: f000 f8a5 bl 594 <f1> f2(d.v2); 44a: f89d 300c ldrb.w r3, [sp, #12] 44e: 9802 ldr r0, [sp, #8] 450: 061b lsls r3, r3, #24 452: ea43 2010 orr.w r0, r3, r0, lsr #8 456: f000 f8a1 bl 59c <f2> [...] -- Stefano