The workload of the for-loop is distributed among the threads in the parallel region.
Example: loop distribution
Example OpenMP program with #pragma omp for:
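#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int N = ...;

    #pragma omp parallel
    {
        int id = omp_get_thread_num();   // ID of the thread running this block

        #pragma omp for                  // split the N iterations among the threads
        for (int i = 0; i < N; i = i + 1)
        {
            printf("Thread ID = %d, i = %d\n", id, i);
        }
    }
}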
Output for N = 10 using 4 threads (the order of the lines can differ between runs):
Thread ID = 1, i = 3
Thread ID = 1, i = 4
Thread ID = 1, i = 5
Thread ID = 0, i = 0
Thread ID = 0, i = 1
Thread ID = 0, i = 2
Thread ID = 3, i = 8
Thread ID = 3, i = 9
Thread ID = 2, i = 6
Thread ID = 2, i = 7
DEMO:
demo/OpenMP/openMP-for1.c
Review: approximating π using the Rectangle Rule
The single-threaded Rectangle Rule algorithm in C used to approximate π:
#include <stdio.h>
#include <math.h>

double f(double x)
{
    return( 2.0 / sqrt(1 - x*x) );  // Used to approximate pi
}

int main(int argc, char *argv[])
{
    int N;
    double w, sum;

    N = ...;     // N will determine the accuracy of the approximation
    w = 1.0/N;   // a = 0, b = 1, so: (b-a) = 1.0
    sum = 0.0;

    double x;
    for (int i = 0; i < N; i++)
    {
        x = (i + 0.5) * w;          // midpoint of rectangle i
        sum = sum + w*f(x);         // add the area of rectangle i
    }
    printf("Pi = %lf", sum);
}
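Why this sum approximates π: a short derivation, using only f, w and N from the program above. The integral of f over [0, 1] is exactly π, and the loop adds up N rectangles of width w whose heights are taken at the midpoints (i + 0.5)·w.

```latex
\int_0^1 \frac{2}{\sqrt{1-x^2}}\,dx
  = 2\,\bigl[\arcsin x\bigr]_0^1
  = 2\cdot\frac{\pi}{2}
  = \pi,
\qquad
\pi \approx \sum_{i=0}^{N-1} w\, f\bigl((i+0.5)\,w\bigr),
\quad w = \frac{1}{N}.
```

The largest midpoint is (N - 0.5)/N < 1, so f is never evaluated at x = 1, where it is unbounded.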
Parallelizing the Rectangle Rule Algorithm
We create a parallel region to execute the loop:
#include <stdio.h>
#include <math.h>

double f(double x)
{
    return( 2.0 / sqrt(1 - x*x) );  // Used to approximate pi
}

int main(int argc, char *argv[])
{
    int N;
    double w, sum;

    N = ...;     // N will determine the accuracy of the approximation
    w = 1.0/N;   // a = 0, b = 1, so: (b-a) = 1.0
    sum = 0.0;

    #pragma omp parallel
    {
        double x;                   // declared inside the region: private to each thread
        for (int i = 0; i < N; i++) // at this step every thread runs all N iterations
        {
            x = (i + 0.5) * w;
            sum = sum + w*f(x);
        }
    }
    printf("Pi = %lf", sum);
}
Parallelizing the Rectangle Rule Algorithm
Distribute the load of the for-loop among the threads:
#include <stdio.h>
#include <math.h>

double f(double x)
{
    return( 2.0 / sqrt(1 - x*x) );  // Used to approximate pi
}

int main(int argc, char *argv[])
{
    int N;
    double w, sum;

    N = ...;     // N will determine the accuracy of the approximation
    w = 1.0/N;   // a = 0, b = 1, so: (b-a) = 1.0
    sum = 0.0;

    #pragma omp parallel
    {
        double x;

        #pragma omp for             // each iteration is now run by exactly one thread
        for (int i = 0; i < N; i++)
        {
            x = (i + 0.5) * w;
            sum = sum + w*f(x);     // sum is shared: this update is not yet synchronized
        }
    }
    printf("Pi = %lf", sum);
}
Parallelizing the Rectangle Rule Algorithm
Synchronize the updates to the shared variable with a critical section:
#include <stdio.h>
#include <math.h>

double f(double x)
{
    return( 2.0 / sqrt(1 - x*x) );  // Used to approximate pi
}

int main(int argc, char *argv[])
{
    int N;
    double w, sum;

    N = ...;     // N will determine the accuracy of the approximation
    w = 1.0/N;   // a = 0, b = 1, so: (b-a) = 1.0
    sum = 0.0;

    #pragma omp parallel
    {
        double x;

        #pragma omp for
        for (int i = 0; i < N; i++)
        {
            x = (i + 0.5) * w;

            #pragma omp critical    // only one thread at a time updates sum
            sum = sum + w*f(x);
        }
    }
    printf("Pi = %lf", sum);
}