Loop distribution

  • OpenMP can distribute the workload of a for-loop over the available threads using a special #pragma (compiler directive)

  • #pragma omp for:

    • instructs the compiler to distribute the loop iterations among the threads in the team

  • Syntax:

    #pragma omp parallel
    {
        ...
        #pragma omp for
        for-loop
        ...
    }
    

    Effect:

    • The workload of the for-loop is distributed among the threads in the parallel region

Example loop distribution

  • Example OpenMP program with #pragma omp for:

    #include <stdio.h>
    #include <omp.h>
    
    int main(int argc, char *argv[])
    {
        int N = ...;
    
        #pragma omp parallel
        {
            int id = omp_get_thread_num();   // ID of this thread within the team
    
            #pragma omp for
            for (int i = 0; i < N; i = i + 1)
            {
                printf("Thread ID = %d, i = %d\n", id, i);
            }
        }
    }
    

    Output for N=10 using 4 threads (the order of the lines can vary from run to run):

    Thread ID = 1, i = 3
    Thread ID = 1, i = 4
    Thread ID = 1, i = 5
    Thread ID = 0, i = 0
    Thread ID = 0, i = 1
    Thread ID = 0, i = 2
    Thread ID = 3, i = 8
    Thread ID = 3, i = 9
    Thread ID = 2, i = 6
    Thread ID = 2, i = 7
    

DEMO: demo/OpenMP/openMP-for1.c

Review:   approximating π using the Rectangle Rule

  • The single-threaded Rectangle Rule algorithm in C used to approximate π:

    #include <stdio.h>
    #include <math.h>
    
    double f(double x)
    {
        return( 2.0 / sqrt(1 - x*x) ); // Used to approximate pi: the integral of f over [0, 1] equals pi
    }
    
    int main(int argc, char *argv[])
    {
        int    N;
        double w, sum;
    
        N = ...;     // N will determine the accuracy of the approximation
        w = 1.0/N;   // a = 0, b = 1, so: (b-a) = 1.0
    
        sum = 0.0;
    
        double x;
    
        for (int i = 0; i < N; i++)
        {
            x = (i + 0.5) * w;     // midpoint of rectangle i
    
            sum = sum + w*f(x);    // add the area of rectangle i
        }
    
        printf("Pi = %lf\n", sum);
    }

 

Parallelizing the Rectangle Rule Algorithm

  • We create a parallel region to execute the loop (at this step every thread in the team still executes all N iterations):

    #include <stdio.h>
    #include <math.h>
    
    double f(double x)
    {
        return( 2.0 / sqrt(1 - x*x) ); // Used to approximate pi
    }
    
    int main(int argc, char *argv[])
    {
        int    N;
        double w, sum;
    
        N = ...;     // N will determine the accuracy of the approximation
        w = 1.0/N;   // a = 0, b = 1, so: (b-a) = 1.0
    
        sum = 0.0;
    
        #pragma omp parallel
        {
            double x;    // declared inside the region: each thread has its own x
    
            for (int i = 0; i < N; i++)
            {
                x = (i + 0.5) * w;
    
                sum = sum + w*f(x);
            }
        }
    
        printf("Pi = %lf\n", sum);
    }

 

Parallelizing the Rectangle Rule Algorithm

  • Distribute the load of the for-loop among the threads (the updates to the shared variable sum are still unsynchronized at this step):

    #include <stdio.h>
    #include <math.h>
    
    double f(double x)
    {
        return( 2.0 / sqrt(1 - x*x) ); // Used to approximate pi
    }
    
    int main(int argc, char *argv[])
    {
        int    N;
        double w, sum;
    
        N = ...;     // N will determine the accuracy of the approximation
        w = 1.0/N;   // a = 0, b = 1, so: (b-a) = 1.0
    
        sum = 0.0;
    
        #pragma omp parallel
        {
            double x;    // declared inside the region: each thread has its own x
    
            #pragma omp for
            for (int i = 0; i < N; i++)
            {
                x = (i + 0.5) * w;
    
                sum = sum + w*f(x);    // sum is shared by all threads
            }
        }
    
        printf("Pi = %lf\n", sum);
    }

 

Parallelizing the Rectangle Rule Algorithm

  • Synchronize the updates to the shared variable with a critical section:

    #include <stdio.h>
    #include <math.h>
    
    double f(double x)
    {
        return( 2.0 / sqrt(1 - x*x) ); // Used to approximate pi
    }
    
    int main(int argc, char *argv[])
    {
        int    N;
        double w, sum;
    
        N = ...;     // N will determine the accuracy of the approximation
        w = 1.0/N;   // a = 0, b = 1, so: (b-a) = 1.0
    
        sum = 0.0;
    
        #pragma omp parallel
        {
            double x;    // declared inside the region: each thread has its own x
    
            #pragma omp for
            for (int i = 0; i < N; i++)
            {
                x = (i + 0.5) * w;
    
                #pragma omp critical   // only one thread at a time executes the next statement
                sum = sum + w*f(x);
            }
        }
    
        printf("Pi = %lf\n", sum);
    }

DEMO: demo/OpenMP/openMP-compute-pi5.c