#pragma omp critical
{
...
... Update shared variables
...
}
|
double f(double a)
{
return( 2.0 / sqrt(1 - a*a) );
}
// Sequential midpoint-rule approximation of pi:
// the integral of f(x) = 2/sqrt(1 - x^2) over [0, 1] equals pi.
// NOTE(review): `N = ...` is a pedagogical placeholder, and `cout`
// is C++ (needs <iostream>) — this snippet is illustrative, not compilable as-is.
int main(int argc, char *argv[])
{
int i;
int N;
double sum; // accumulates the pi approximation
double x, w;
N = ...; // accuracy of the approximation
w = 1.0/N; // width of each of the N subintervals
sum = 0.0;
// Midpoint rule: sample f at the center of each subinterval.
for (i = 1; i <= N; i = i + 1)
{
x = w*(i - 0.5); // midpoint of the i-th subinterval
sum = sum + w*f(x);
}
cout << sum;
}
|
Compile with:
Run the program with:
(We have seen this program before, so it is not explained again here.)
When we parallelize, it is important to know which UPDATES must be SYNCHRONIZED:
double f(double a)
{
return( 2.0 / sqrt(1 - a*a) );
}
// Parallel version 1: every iteration updates the shared `sum`
// inside an omp critical section, so the threads serialize on
// each update — correct, but slow.
// NOTE(review): `N = ...` is a placeholder and `cout` is C++;
// illustrative snippet, not compilable as-is.
int main(int argc, char *argv[])
{
int N;
double sum; // Shared variable, updated !
double x, w; // NOTE(review): this outer x is shadowed by the x inside the parallel region
N = ...; // accuracy of the approximation
w = 1.0/N; // subinterval width
sum = 0.0;
#pragma omp parallel
{
int i, num_threads; // Non-shared variables !!!
double x; // per-thread copy; shadows the shared x above
num_threads = omp_get_num_threads() ;
// Cyclic (round-robin) distribution: thread t handles iterations
// t, t + num_threads, t + 2*num_threads, ...
for (i = omp_get_thread_num(); i < N; i = i + num_threads)
{
x = w*(i + 0.5); // midpoint of subinterval i (0-based)
// Critical section on EVERY iteration: the synchronization
// cost dominates; see the next version for the fix.
#pragma omp critical
{
sum = sum + w*f(x);
}
}
}
cout << sum;
}
|
export OMP_NUM_THREADS=8
a.out 50000000
Change OMP_NUM_THREADS and see the difference in performance
double f(double a)
{
return( 2.0 / sqrt(1 - a*a) );
}
// Parallel version 2: each thread accumulates into its own private
// partial sum (`mypi`) and enters the critical section only ONCE,
// at the end — synchronization cost drops from O(N) to O(num_threads).
// NOTE(review): `N = ...` is a placeholder and `cout` is C++;
// illustrative snippet, not compilable as-is.
int main(int argc, char *argv[])
{
int N;
double sum; // Shared variable, updated !
double x, w; // NOTE(review): this outer x is shadowed inside the parallel region
N = ...; // accuracy of the approximation
w = 1.0/N; // subinterval width
sum = 0.0;
#pragma omp parallel
{
int i, num_threads;
double x; // per-thread copy
double mypi; // Private variable to reduce synchronization
num_threads = omp_get_num_threads() ;
mypi = 0.0;
// Cyclic distribution of iterations across threads, as before.
for (i = omp_get_thread_num(); i < N; i = i + num_threads)
{
x = w*(i + 0.5); // midpoint of subinterval i (0-based)
mypi = mypi + w*f(x); // no synchronization needed here
}
// One shared update per thread instead of one per iteration.
#pragma omp critical
{
sum = sum + mypi;
}
}
cout << sum;
}
|
export OMP_NUM_THREADS=8
a.out 50000000
Change OMP_NUM_THREADS and see the difference in performance