Loop distribution

OpenMP can distribute the workload of a for-loop over the available threads using a special #pragma (compiler directive):
#pragma omp for:
instructs the compiler to distribute the loop iterations among the threads in the team.

Syntax:

#pragma omp parallel
{
    ...
    #pragma omp for
    for-loop
    ...
}

Effect:

The workload of the for-loop is distributed among the threads in the parallel region: each iteration is executed exactly once, by one of the threads in the team.

Example loop distribution

Example OpenMP program with #pragma omp for:

int main(int argc, char *argv[])
{
    int N = ...;

    #pragma omp parallel
    {
        int id = omp_get_thread_num();

        #pragma omp for
        for (int i = 0; i < N; i = i + 1)
        {
            printf("Thread ID = %d, i = %d\n", id, i);
        }
    }
}

Sample output for N=10 using 4 threads (the order of the lines can differ from run to run):

Thread ID = 1, i = 3
Thread ID = 1, i = 4
Thread ID = 1, i = 5
Thread ID = 0, i = 0
Thread ID = 0, i = 1
Thread ID = 0, i = 2
Thread ID = 3, i = 8
Thread ID = 3, i = 9
Thread ID = 2, i = 6
Thread ID = 2, i = 7

DEMO: demo/OpenMP/openMP-for1.c

Review: approximating π using the Rectangle Rule

The single-threaded Rectangle Rule algorithm in C used to approximate π:

double f(double x)
{
    return( 2.0 / sqrt(1 - x*x) );   // Used to approximate pi
}

int main(int argc, char *argv[])
{
    int N;
    double w, sum;

    N = ...;       // N will determine the accuracy of the approximation
    w = 1.0/N;     // a = 0, b = 1, so: (b-a) = 1.0
    sum = 0.0;

    double x;
    for (int i = 0; i < N; i++)
    {
        x = (i + 0.5) * w;
        sum = sum + w*f(x);
    }

    printf("Pi = %lf", sum);
}
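As a short worked derivation of why this program approximates π: the symbols a, b, w, N and f are those used in the code, while x_i is notation introduced here for the midpoint of the i-th rectangle.

```latex
\int_0^1 \frac{2}{\sqrt{1-x^2}}\,dx
  = 2\,\bigl[\arcsin x\bigr]_0^1
  = 2\cdot\frac{\pi}{2}
  = \pi,
\qquad
\pi \approx \sum_{i=0}^{N-1} w\,f(x_i),
\quad x_i = (i + 0.5)\,w,
\quad w = \frac{b-a}{N} = \frac{1}{N}.
```

The integrand grows without bound as x approaches 1, but every midpoint x_i lies strictly below 1, so f(x_i) is always finite.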

Parallelizing the Rectangle Rule Algorithm

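First, we create a parallel region to execute the loop. At this point every thread in the team executes all N iterations, so the work is duplicated rather than divided, and the unsynchronized updates to the shared variable sum are a data race: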

double f(double x)
{
    return( 2.0 / sqrt(1 - x*x) );   // Used to approximate pi
}

int main(int argc, char *argv[])
{
    int N;
    double w, sum;

    N = ...;       // N will determine the accuracy of the approximation
    w = 1.0/N;     // a = 0, b = 1, so: (b-a) = 1.0
    sum = 0.0;

    #pragma omp parallel
    {
        double x;
        for (int i = 0; i < N; i++)
        {
            x = (i + 0.5) * w;
            sum = sum + w*f(x);
        }
    }

    printf("Pi = %lf", sum);
}

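Next, distribute the load of the for-loop among the threads with #pragma omp for. Each iteration now runs exactly once, but the threads still update the shared variable sum without synchronization, so the result can be wrong: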

double f(double x)
{
    return( 2.0 / sqrt(1 - x*x) );   // Used to approximate pi
}

int main(int argc, char *argv[])
{
    int N;
    double w, sum;

    N = ...;       // N will determine the accuracy of the approximation
    w = 1.0/N;     // a = 0, b = 1, so: (b-a) = 1.0
    sum = 0.0;

    #pragma omp parallel
    {
        double x;

        #pragma omp for
        for (int i = 0; i < N; i++)
        {
            x = (i + 0.5) * w;
            sum = sum + w*f(x);
        }
    }

    printf("Pi = %lf", sum);
}

Finally, synchronize the updates to the shared variable sum with a critical section, so that only one thread at a time executes the update:

double f(double x)
{
    return( 2.0 / sqrt(1 - x*x) );   // Used to approximate pi
}

int main(int argc, char *argv[])
{
    int N;
    double w, sum;

    N = ...;       // N will determine the accuracy of the approximation
    w = 1.0/N;     // a = 0, b = 1, so: (b-a) = 1.0
    sum = 0.0;

    #pragma omp parallel
    {
        double x;

        #pragma omp for
        for (int i = 0; i < N; i++)
        {
            x = (i + 0.5) * w;

            #pragma omp critical
            sum = sum + w*f(x);
        }
    }

    printf("Pi = %lf", sum);
}

DEMO: demo/OpenMP/openMP-compute-pi5.c