Is there an advantage to knowing at compile time the number of iterations of a for loop?-Collection of common programming errors


  • Henry Gomersall

    Is there some advantage to knowing at compile time how many iterations should be done of a for loop?

    For example, in some situations, can the compiler produce an executable that will run faster given this:

    #define ITERATIONS 10
    int foo()
    {
        for (int i=0; i < ITERATIONS; i++){
            do_something();
        }
    }
    

    than given this:

    int foo(int iterations)
    {
        for (int i=0; i < iterations; i++){
            do_something();
        }
    }
    

    If this is not universally the case, what are those situations?

    My concern is in the specific case of OpenCL, so I’m also interested to know if this is different to C.


  • David Schwartz

    I tested in a fairly realistic situation using GCC. When the number of loops is known at compile time, I get this:

    .L2:
        call    do_something
        subl    $1, %ebx
        jne .L2
    

    When it’s not, I get this:

    .L6:
        call    do_something
        addl    $1, %ebx
        cmpl    %ebp, %ebx
        jne .L6
    

    So it was able to optimize the fixed number of iterations slightly better by changing to a count down to zero loop rather than a count up loop. If nothing else, this uses a bit less code cache.

    With higher optimizations levels, it fully unrolls a loop that invokes an external function ten times. Presumably, it wouldn’t do that unless it thought it was better. And it surely can’t do that if the number of iterations is unknown.

    Short answer: Fixed numbers of iterations give compilers more options. That should result in very slightly better code at least some of the time.


  • Alex Placet

    Indeed, it depends on the compiler. But effectively it allows the unrolling of loops. Here is an Intel example and a AMD example.

    With NVIDIA you need to had in your kernel: #pragma OPENCL EXTENSION cl_nv_pragma_unroll : enable.

    So you can use a flag when you compiling, like: -DITERATIONS=10


  • Daniel Fischer

    You need to initialise i,

    for (int i; i < ITERATIONS; i++){
    

    is undefined behaviour and allows the compiler to entirely skip the loop 😉

    Apart from that, if the number of iterations is known at compile time, the compiler can completely unroll the loop, for example, which can make a big difference (if the loop body is cheap).


  • Minion91

    This will depend a lot on the compiler and optimization settings you are using. Most likely, knowing the amount of loops will allow the compiler to unroll the loop and get rid of the branches and parameter i.

    This is however completely compiler dependent, so don’t count on it. But if you have a modern compiler, chances are big it will improve speed.

    edit:

    I read over the obvious uninitialized i, but you probably did too…


  • gcbenison

    In the case of gcc-4.5.3, with do_something() {printf("hello world")}:

    • Without optimization or at -O2, the compiler does not unroll either loop.
    • At -O4, for the pre-known number of iterations only, the compiler unrolls the loop and inlines do_something():

    .

    foo:
    ...
    .L4:
            addl    $1, %ebx
            movl    $.LC0, 4(%esp)
            movl    $1, (%esp)
            call    __printf_chk
            cmpl    %ebx, %esi
            jg      .L4
    
    main:
    ...
            pushl   %ebp
            movl    %esp, %ebp
            andl    $-16, %esp
            subl    $16, %esp
            movl    $.LC0, 4(%esp)
            movl    $1, (%esp)
            call    __printf_chk
            movl    $.LC0, 4(%esp)
            movl    $1, (%esp)
            call    __printf_chk
            movl    $.LC0, 4(%esp)
            movl    $1, (%esp)
            call    __printf_chk
    ...