Dividing the work of a for-loop over a number of threads is a simple, mechanical process: each thread executes the loop body for a different subset of the index values.
In OpenMP, this division of labor (splitting the work of a for-loop among threads) is done through a special Parallel LOOP construct.
#pragma omp parallel
{
   ....
   #pragma omp for [parameters]
   for-statement          // Parallel Loop
   ....
}
Each iteration of the for-loop is executed exactly once, by one of the threads (the iterations are divided among the threads).
The loop variable used in the Parallel LOOP construct is PRIVATE by default (other variables are still SHARED by default).
#include <iostream>
#include <cmath>

double f(double a)
{
    return( 2.0 / sqrt(1.0 - a*a) );   // the integral of f over [0,1) is pi
}

int main(int argc, char *argv[])
{
    int N;
    double sum;      // Shared variable, updated by all threads!
    double w;

    N = ...;         // accuracy of the approximation
    w = 1.0/N;
    sum = 0.0;

    #pragma omp parallel
    {
        int i;
        double mypi, x;      // private to each thread

        mypi = 0.0;

        #pragma omp for
        for (i = 0; i < N; i = i + 1)
        {
            x = w*(i + 0.5);        // Save us the trouble of dividing
            mypi = mypi + w*f(x);   // the work up ourselves...
        }

        #pragma omp critical
        {
            sum = sum + mypi;       // combine partial sums one thread at a time
        }
    }
    std::cout << sum << "\n";
}
The C/C++ compiler will insert instructions that distribute the iterations of the for-loop over the threads - it is no longer your problem to "skip" index values to accomplish load distribution!
export OMP_NUM_THREADS=8
./a.out 50000000

Change OMP_NUM_THREADS and see the difference in performance.
setenv STACKSIZE nBytes    (csh syntax; increases the per-thread stack size. The standard OpenMP environment variable for this is OMP_STACKSIZE.)