gcc, strict-aliasing, and casting through a union

Do you have any horror stories to tell? The GCC Manual recently added a warning regarding -fstrict-aliasing and casting a pointer through a union:

[…] Taking the address, casting the resulting pointer and dereferencing the result has undefined behavior [emphasis added], even if the cast uses a union type, e.g.:

    union a_union {
        int i;
        double d;
    };

    int f() {
        double d = 3.0;
        return ((union a_union *)&d)->i;
    }

Does anyone have an example to illustrate this undefined behavior?

Note this question is not about what the C99 standard says, or does not say. It is about the actual functioning of gcc, and other existing compilers, today.

I am only guessing, but one potential problem may lie in the setting of d to 3.0. Because d is a local variable that is never read directly, and never read through a pointer of a compatible type, the compiler may not bother to store the value. f() would then return whatever garbage happens to be on the stack.

My simple, naive attempt to provoke the problem fails. For example:

#include <stdio.h>

union a_union {
    int i;
    double d;
};

int f1(void) {
    union a_union t;
    t.d = 3333333.0;
    return t.i; // gcc manual: 'type-punning is allowed, provided...' (C90 6.3.2.3)
}

int f2(void) {
    double d = 3333333.0;
    return ((union a_union *)&d)->i; // gcc manual: 'undefined behavior' 
}

int main(void) {
    printf("%d\n", f1());
    printf("%d\n", f2());
    return 0;
}

works fine, giving on CYGWIN:

-2147483648
-2147483648

Looking at the assembler, we see that gcc completely optimizes t away: f1() simply stores the pre-calculated answer:

movl    $-2147483648, %eax

while f2() pushes 3333333.0 onto the floating-point stack, and then extracts the return value:

flds   LC0                 # LC0: 1246458708 (= 3333333.0) (--> 80 bits)
fstpl  -8(%ebp)            # save in d (64 bits)
movl   -8(%ebp), %eax      # return value (32 bits)

And the functions are also inlined (which seems to be the cause of some subtle strict-aliasing bugs) but that is not relevant here. (And this assembler is not that relevant, but it adds corroborative detail.)

Also note that taking addresses is obviously wrong (or right, if you are trying to illustrate undefined behavior). For example, just as we know this is wrong:

extern void foo(int *, double *);
union a_union t;
t.d = 3.0;
foo(&t.i, &t.d); // undefined behavior

we likewise know this is wrong:

extern void foo(int *, double *);
double d = 3.0;
foo(&((union a_union *)&d)->i, &d); // undefined behavior

For background discussion about this, see for example:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1422.pdf
http://gcc.gnu.org/ml/gcc/2010-01/msg00013.html
http://davmac.wordpress.com/2010/02/26/c99-revisited/
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule
http://stackoverflow.com/questions/2771023/c99-strict-aliasing-rules-in-c-gcc/2771041#2771041

In the first link, draft minutes of an ISO meeting seven months ago, one participant notes in section 4.16:

Is there anybody that thinks the rules are clear enough? No one is really able to interpret them.

Other notes: My test was with gcc 4.3.4, with -O2; options -O2 and -O3 imply -fstrict-aliasing. The example from the GCC Manual assumes sizeof(double) >= sizeof(int); it doesn’t matter if they are unequal.

Also, as noted by Mike Acton in the cellperformance link, -Wstrict-aliasing=2, but not =3, produces the warning "dereferencing type-punned pointer might break strict-aliasing rules" for the example here.

  1. Aliasing occurs when two different pointers refer to the same piece of memory. By typecasting a pointer, you’re generating a new temporary pointer to the same memory with a different type. Under strict aliasing, the compiler assumes that pointers to incompatible types never point to the same memory, so if the optimizer reorders instructions, accesses through the two pointers might give two totally different results: for example, it might move a read before a write to the same address. This is why it is undefined behavior.

    You are unlikely to see the problem in very simple test code, but it will appear when there’s a lot going on.

    I think the warning is to make clear that unions are not a special case, even though you might expect them to be.

    See this Wikipedia article for more information about aliasing: http://en.wikipedia.org/wiki/Aliasing_(computing)#Conflicts_with_optimization
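
    The usual textbook illustration of that assumption looks roughly like the sketch below. The function name store_both is hypothetical. Because int and float are incompatible types, a compiler using -fstrict-aliasing may assume the two stores never touch the same memory and may return the constant 1 without rereading *ip. The call in main passes two distinct objects, which is well defined; passing pointers to the same memory (for example via a cast) would be the undefined case, and the result could then differ between optimisation levels.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* The compiler may assume ip and fp never refer to the same memory. */
    int store_both(int *ip, float *fp) {
        *ip = 1;
        *fp = 0.0f;
        return *ip; /* may be folded to the constant 1 */
    }

    int main(void) {
        int i = 0;
        float f = 1.0f;
        /* Distinct objects: well defined, the result is 1. */
        int r = store_both(&i, &f);
        assert(r == 1);
        printf("%d\n", r);
        return 0;
    }
    ```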

  2. The fact that GCC is warning about unions doesn’t necessarily mean that unions don’t currently work. But here’s a slightly less simple example than yours:

    #include <stdio.h>
    
    struct B {
        int i1;
        int i2;
    };
    
    union A {
        struct B b;
        double d;
    };
    
    int main() {
        double d = 3.0;
        #ifdef USE_UNION
            ((union A*)&d)->b.i2 += 0x80000000;
        #else
            ((int*)&d)[1] += 0x80000000;
        #endif
        printf("%g\n", d);
    }
    

    Output:

    $ gcc --version
    gcc (GCC) 4.3.4 20090804 (release) 1
    Copyright (C) 2008 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    $ gcc -oalias alias.c -O1 -std=c99 && ./alias
    -3
    
    $ gcc -oalias alias.c -O3 -std=c99 && ./alias
    3
    
    $ gcc -oalias alias.c -O1 -std=c99 -DUSE_UNION && ./alias
    -3
    
    $ gcc -oalias alias.c -O3 -std=c99 -DUSE_UNION && ./alias
    -3
    

    So on GCC 4.3.4, the union “saves the day” (assuming I want the output “-3”). It disables the optimisation that relies on strict aliasing and that results in the output “3” in the second case (only). With -Wall, USE_UNION also disables the type-pun warning.

    I don’t have gcc 4.4 to test, but please give this code a go. Your code in effect tests whether the memory for d is initialised before reading back through a union: mine tests whether it is modified.

    Btw, the safe way to read half of a double as an int is:

    double d = 3;
    int i;
    memcpy(&i, &d, sizeof i);
    return i;
    

    With optimisation on GCC, this results in:

        int thing() {
    401130:       55                      push   %ebp
    401131:       89 e5                   mov    %esp,%ebp
    401133:       83 ec 10                sub    $0x10,%esp
            double d = 3;
    401136:       d9 05 a8 20 40 00       flds   0x4020a8
    40113c:       dd 5d f0                fstpl  -0x10(%ebp)
            int i;
            memcpy(&i, &d, sizeof i);
    40113f:       8b 45 f0                mov    -0x10(%ebp),%eax
            return i;
        }
    401142:       c9                      leave
    401143:       c3                      ret
    

    So there’s no actual call to memcpy. If you aren’t doing this, you deserve what you get if union casts stop working in GCC 😉
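
    Applied to the sign-flipping test above, the memcpy approach might look like the following sketch. The helper names are made up for illustration, and it assumes an IEEE 754 binary64 double and a 64-bit uint64_t with the same byte order as double (true on mainstream platforms). Since all 64 bits are copied, it does not depend on which half of the double holds the sign.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Copy the object representation of a double into a 64-bit integer. */
    uint64_t double_to_bits(double d) {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);
        return bits;
    }

    /* Copy a 64-bit pattern back into a double. */
    double bits_to_double(uint64_t bits) {
        double d;
        memcpy(&d, &bits, sizeof d);
        return d;
    }

    /* Toggle the sign bit (bit 63 in IEEE 754 binary64). */
    double flip_sign(double d) {
        return bits_to_double(double_to_bits(d) ^ (UINT64_C(1) << 63));
    }

    int main(void) {
        assert(sizeof(double) == sizeof(uint64_t));
        printf("%g\n", flip_sign(3.0)); /* -3 under the IEEE 754 assumption */
        return 0;
    }
    ```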

  3. Well it’s a bit of necro-posting, but here is a horror story. I’m porting a program that was written with the assumption that the native byte order is big endian. Now I need it to work on little endian too. Unfortunately, I can’t just use native byte order everywhere, as data could be accessed in many ways. For example, a 64-bit integer could be treated as two 32-bit integers or as 4 16-bit integers, or even as 16 4-bit integers. To make things worse, there is no way to figure out what exactly is stored in memory, because the software is an interpreter for some sort of byte code, and the data is formed by that byte code. For example, the byte code may contain instructions to write an array of 16-bit integers, and then access a pair of them as a 32-bit float. And there is no way to predict it or alter the byte code.

    Therefore, I had to create a set of wrapper classes to work with values stored in the big endian order regardless of the native endianness. Worked perfectly in Visual Studio and in GCC on Linux with no optimizations. But with gcc -O2, hell broke loose. After a lot of debugging I figured out that the reason was here:

    double D;
    float F; 
    Ul *pF=(Ul*)&F; // Ul is unsigned long
    *pF=pop0->lu.r(); // r() returns Ul
    D=(double)F; 
    

    This code was used to convert a 32-bit representation of a float stored in a 32-bit integer to double. It seems that the compiler decided to do the assignment to *pF after the assignment to D. The result was that the first time the code was executed, the value of D was garbage, and the subsequent values were “late” by 1 iteration.
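
    A conversion of this kind can be written without the pointer cast. The sketch below is in C rather than the C++ of the original program, the name float_from_bits is hypothetical, and it assumes an IEEE 754 binary32 float. It also uses uint32_t instead of unsigned long, since unsigned long is wider than 32 bits on some platforms.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Reinterpret a 32-bit pattern as a float by copying its bytes. */
    float float_from_bits(uint32_t bits) {
        float f;
        memcpy(&f, &bits, sizeof f);
        return f;
    }

    int main(void) {
        uint32_t raw = UINT32_C(0x40400000); /* binary32 pattern for 3.0f */
        double D = (double)float_from_bits(raw);
        assert(D == 3.0);
        printf("%g\n", D);
        return 0;
    }
    ```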

    Miraculously, there were no other problems at that point. So I decided to move on and test my new code on the original platform, HP-UX on a RISC processor with native big endian order. Now it broke again, this time in my new class:

    typedef unsigned long long Ur; // 64-bit uint
    typedef unsigned char Uc;
    class BEDoubleRef {
            double *p;
    public:
            inline BEDoubleRef(double *p): p(p) {}
            inline operator double() {
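                    Uc *pu = reinterpret_cast<Uc*>(p);
                    Ur n = (pu[7] & 0xFFULL) | ((pu[6] & 0xFFULL) << 8)
                            | ((pu[5] & 0xFFULL) << 16) | ((pu[4] & 0xFFULL) << 24)
                            | ((pu[3] & 0xFFULL) << 32) | ((pu[2] & 0xFFULL) << 40)
                            | ((pu[1] & 0xFFULL) << 48) | ((pu[0] & 0xFFULL) << 56);
                    return *reinterpret_cast<double*>(&n);
            }
            inline BEDoubleRef &operator=(const double &d) {
                    Uc *pc = reinterpret_cast<Uc*>(p);
                    const Ur *pu = reinterpret_cast<const Ur*>(&d);
                    pc[0] = (*pu >> 56) & 0xFFu;
                    pc[1] = (*pu >> 48) & 0xFFu;
                    pc[2] = (*pu >> 40) & 0xFFu;
                    pc[3] = (*pu >> 32) & 0xFFu;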
                    pc[4] = (*pu >> 24) & 0xFFu;
                    pc[5] = (*pu >> 16) & 0xFFu;
                    pc[6] = (*pu >> 8) & 0xFFu;
                    pc[7] = *pu & 0xFFu;
                    return *this;
            }
            inline BEDoubleRef &operator=(const BEDoubleRef &d) {
                    *p = *d.p;
                    return *this;
            }
    };
    

    For some really weird reason, the first assignment operator only assigned bytes 1 through 7 correctly. Byte 0 always had some nonsense in it, which broke everything, as that byte holds the sign bit and part of the exponent.

    I have tried to use unions as a workaround:

    union {
        double d;
        Uc c[8];
    } un;
    Uc *pc = un.c;
    const Ur *pu = reinterpret_cast<const Ur*>(&d);
    pc[0] = (*pu >> 56) & 0xFFu;
    pc[1] = (*pu >> 48) & 0xFFu;
    pc[2] = (*pu >> 40) & 0xFFu;
    pc[3] = (*pu >> 32) & 0xFFu;
    pc[4] = (*pu >> 24) & 0xFFu;
    pc[5] = (*pu >> 16) & 0xFFu;
    pc[6] = (*pu >> 8) & 0xFFu;
    pc[7] = *pu & 0xFFu;
    *p = un.d;
    

    but it didn’t work either. In fact, it was a bit better – it only failed for negative numbers.

    At this point I’m thinking about adding a simple test for native endianness, then doing everything via char* pointers with if (LITTLE_ENDIAN) checks around. To make things worse, the program makes heavy use of unions all around, which seems to work ok for now, but after all this mess I won’t be surprised if it suddenly breaks for no apparent reason.
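
    One way to do the char-pointer approach without any endianness test is to build the 64-bit value with shifts and move it into the double with memcpy. The sketch below is in C rather than the C++ of the original program, and the function names are hypothetical. It assumes an IEEE 754 binary64 double whose byte order in memory matches that of uint64_t on the host.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Read a double stored as 8 big-endian bytes, on any host byte order. */
    double load_be_double(const unsigned char *p) {
        uint64_t n = 0;
        for (int k = 0; k < 8; ++k) {
            n = (n << 8) | p[k]; /* most significant byte first */
        }
        double d;
        memcpy(&d, &n, sizeof d);
        return d;
    }

    /* Write a double as 8 big-endian bytes, on any host byte order. */
    void store_be_double(unsigned char *p, double d) {
        uint64_t n;
        memcpy(&n, &d, sizeof n);
        for (int k = 0; k < 8; ++k) {
            p[k] = (unsigned char)(n >> (56 - 8 * k));
        }
    }

    int main(void) {
        unsigned char buf[8];
        store_be_double(buf, -3.0);
        /* Byte 0 carries the sign bit and the top exponent bits. */
        assert(buf[0] == 0xC0 && buf[1] == 0x08);
        assert(load_be_double(buf) == -3.0);
        printf("%g\n", load_be_double(buf));
        return 0;
    }
    ```

    The shifts operate on values, not on memory, so the same code runs unchanged on big-endian and little-endian hosts, and only character-type and memcpy accesses touch the double.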

  4. I don’t really understand your problem. The compiler did exactly what it was supposed to do in your example. The union conversion is what you did in f1. In f2 it is a normal pointer cast; the fact that you cast to a union type is irrelevant, it is still a pointer cast.

  5. Your assertion that the following code is “wrong”:

    extern void foo(int *, double *);
    union a_union t;
    t.d = 3.0;
    foo(&t.i, &t.d); // undefined behavior
    

    … is wrong. Merely taking the addresses of the two union members and passing them to an external function doesn’t result in undefined behaviour; you only get that from dereferencing one of those pointers in an invalid way. For instance, if the function foo returns immediately without dereferencing the pointers you passed it, then the behaviour is not undefined. With a strict reading of the C99 standard, there are even some cases where the pointers can be dereferenced without invoking undefined behaviour; for instance, foo could read the value referenced by the second pointer, and then store a value through the first pointer.

Originally posted 2013-11-10 00:10:05.