Undefined behavior at its best: is it a boundary break, bad pointer arithmetic, or just ignoring aliasing?
I have been working with C99 for some weeks now, focusing on undefined behavior. I wanted to test some strange code while trying to respect the rules. The result was this code:
(Please forgive the variable names; I had eaten a clown.)
#include <stdio.h>
#include <stdlib.h>
int main(int arg, char** argv)
{
unsigned int uiDiffOfVars;
int LegalPointerCast1, LegalPointerCast2, signedIntToRespectTheRules;
char StartVar;//Only used to have an address from which we can move on
char *TheAccesingPointer;
int iTargetOfPointeracces;
iTargetOfPointeracces= 0x55555555;
TheAccesingPointer = (char *) &StartVar;
LegalPointerCast2 = (int) &StartVar;
LegalPointerCast1 = (int) &iTargetOfPointeracces;
if ((0x80000000 & LegalPointerCast2) != (0x80000000 & LegalPointerCast1))
{
//As I'm not sure how
//"A pointer is converted to other than an integer or pointer type (6.5.4)." treats unsigned integers,
//I'm checking this way.
printf ("try it on next machine!\r\n");
return 1;
}
if ((0x80000000 & LegalPointerCast2) == 0)
uiDiffOfVars = abs (LegalPointerCast1) - abs (LegalPointerCast2);
else
uiDiffOfVars = abs (LegalPointerCast2) - abs (LegalPointerCast1);
LegalPointerCast2 = (int) TheAccesingPointer;
signedIntToRespectTheRules = abs ((int) uiDiffOfVars);
TheAccesingPointer = (char *)(LegalPointerCast2 + signedIntToRespectTheRules);
printf ("%c\r\n", *TheAccesingPointer);//Will the output be an 'U' ?
return 0;
}
So this code is undefined behavior at its best. I get different results, although I'm neither accessing any memory area that I don't own nor accessing any uninitialized memory (AFAIK).
The first critical rule was that I'm not allowed to add to or subtract from a pointer in a way that lets it leave its array bounds. But I'm allowed to cast a pointer to an integer, and with that integer I can calculate as I want, can't I?
My second assumption was that, since I'm allowed to assign a valid address to a pointer, it is a valid operation to assign this calculated address to a pointer. Since I'm working with a char pointer, there is also no violation of the strict aliasing rules, as a char* is allowed to alias anything.
So which rule is broken so that this causes UB?
Are single variables also to be understood as "arrays", and am I breaking this rule?
— Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that does not point into, or just beyond, the same array object (6.5.6).
If so, am I also allowed to do this?
int var;
int *ptr;
ptr = &var;
ptr = ptr + 1;
Because the result is almost certainly undefined behavior. Compiled with MSVC2010, it prints the expected "U", but on FreeBSD with clang and gcc I get rather funny and different results each time, depending on the optimization level (which in my eyes shouldn't happen as long as the behavior is defined).
So any ideas what is causing these nasal dragons?
-
You are basically running into 6.3.2.3 (Pointers), paragraph 5, in the conversion from int to char* in the assignment to TheAccesingPointer:
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
The use of all the abs calls makes what happens very dependent on the actual implementation. Basically it will only work if iTargetOfPointeracces has a higher address than StartVar. If you drop all occurrences of abs, I think you will get 'U' on most if not all architectures and with most if not all compilers.
Ironically, this is not undefined behavior but implementation-defined behavior. But when you don't get 'U', TheAccesingPointer is not pointing to an entity of the referenced type; most likely it is not pointing to an entity at all.
If it is not pointing to an entity then (of course) you will run into undefined behavior when dereferencing it in the printf, following 6.5.3.2, paragraph 4:
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
Let's elaborate two scenarios where all addresses on the stack have bit 31 set, which is quite common under Linux. Both casts are then negative as int, so the else branch computes abs(LegalPointerCast2) - abs(LegalPointerCast1).
Scenario A: Suppose &StartVar < &iTargetOfPointeracces. Then
abs(LegalPointerCast2) - abs(LegalPointerCast1)
= LegalPointerCast1 - LegalPointerCast2 (by both < 0)
= (char*)(&iTargetOfPointeracces) - (char*)(&StartVar) > 0 (by &StartVar < &iTargetOfPointeracces)
So uiDiffOfVars = (char*)(&iTargetOfPointeracces) - (char*)(&StartVar)
and signedIntToRespectTheRules = uiDiffOfVars (by (int)uiDiffOfVars > 0)
thus TheAccesingPointer = (char *)(&StartVar + (char*)(&iTargetOfPointeracces) - (char*)(&StartVar)) = (char*)(&iTargetOfPointeracces)
So in this scenario you will get 'U'.
Scenario B: Suppose &StartVar > &iTargetOfPointeracces. Then
abs(LegalPointerCast2) - abs(LegalPointerCast1)
= LegalPointerCast1 - LegalPointerCast2 (by both < 0)
= (char*)(&iTargetOfPointeracces) - (char*)(&StartVar) < 0 (by &StartVar > &iTargetOfPointeracces)
So uiDiffOfVars = (char*)(&iTargetOfPointeracces) - (char*)(&StartVar)
and signedIntToRespectTheRules = -uiDiffOfVars (by (int)uiDiffOfVars < 0)
thus TheAccesingPointer = (char *)(&StartVar + (char*)(&StartVar) - (char*)(&iTargetOfPointeracces)) = (char *)(2*(char*)&StartVar - (char*)(&iTargetOfPointeracces))
In this scenario it is very unlikely that TheAccesingPointer is pointing to some entity, so undefined behavior is triggered in dereferencing this pointer.
So my point is that the calculation of TheAccesingPointer is implementation-defined, where the above calculations are very common. If the computed pointer is not pointing to iTargetOfPointeracces, as in scenario B, undefined behavior is triggered.
Different optimization levels may result in a different order of StartVar and iTargetOfPointeracces on the stack, and that may explain the different results for different optimization levels.
I don't think single variables count as an array in general, although for pointer arithmetic 6.5.6 paragraph 7 treats a pointer to a single object like a pointer to the first element of an array of length one.
Originally posted 2013-11-10 00:10:39.