Computing π with numerical integration

One of the many ways to compute the value of π is:

1 π = ∫ (2.0/sqrt(1 - x*x)) dx 0

You can prove this equality by compute the integral using Matlab as follows:
>> syms x >> int( 2/sqrt(1-x*x), 0, 1 ) ans = pi
We will use this integral to compute an approximation for π using:
Numerical integration

DEMO: matlab -nodesktop

Numerical integration

The integral:
1 ∫ f(x) dx 0
is equal to the area under the function f(x) between the points x = 0 and x = 1:

Numerical integration

The integral:
1 ∫ f(x) dx 0
can be approximated by the sum of the area of N rectangles:

The width of each rectangle w = 1/N.

Numerical integration

The area of the first rectangle:
1 ∫ f(x) dx ~= w*f(0.5w) + ... 0
is equal to w × f(0.5w):

The width = w and the height = f(0.5w)

Numerical integration

The area of the 2nd rectangle:
1 ∫ f(x) dx ~= w*f(0.5w) + w*f(1.5w) + ... 0
is equal to w × f(1.5w):

The width = w and the height = f(1.5w)

Numerical integration

The sum of the area of N rectangles is:
1 ∫ f(x) dx ~= w*f(0.5w) + w*f(1.5w) + ... + w*f( (N-1)+0.5)*w ) 0
which can be computed numerically

Note: the approximation is better when N is larger....

Numerical integration

The integral:

1 ∫ f(x) dx ~= w*f(0.5w) + w*f(1.5w) + ... + w*f( (N-1)+0.5)*w ) 0

can be approximated numerically as follows:

Integral = 0; w = (1 - 0)/N; // The width of each rectangle for (int i = 0; i < N; i++) { x = (i+0.5)*w; Integral = Integral + w * f(x); // w * f(x) is area of rectangle }

This algorithm is called the "Rectangle Rule" in Numerical Analysis...

Approximating π using the Rectangle Rule

The Rectangle Rule algorithm in C used to approximate π:

double f(double x) { return( 2.0 / sqrt(1 - x*x) ); // Used to approximate pi } int main(int argc, char *argv[]) { int i; int N; double sum; double x, w; N = ...; // N will determine the accuracy of the approximation w = 1.0/N; // a = 0, b = 1, so: (b-a) = 1.0 sum = 0.0; for (i = 0; i < N; i++) { x = (i + 0.5) * w; sum = sum + w*f(x); } printf("Pi = %lf", sum); }

DEMO: demo/pthread/compute-pi.c

Common technique to parallelize program with multi-threaded program

How to parallelize a single-threaded program:
Consider program segments where work can be performed concurrently
Good place to look:
Loops !!!

Example of a loop that can be parallelized:

for (i = 0; i < N; i = i + 1) { x = w*(i + 0.5); sum = sum + w*f(x); } Loop computes: w*f(0.5w) + w*f(1.5w) + w*f(2.5w) + ... + w*f( (N-1)+0.5)*w )

Parallelizing loops with multi-threaded programming

Example of a loop that can be parallelized with multi-threaded programming:

for (i = 0; i < N; i = i + 1) { x = w*(i + 0.5); sum = sum + w*f(x); } Loop computes: w*f(0.5w) + w*f(1.5w) + w*f(2.5w) + ... + w*f( (N-1)+0.5)*w )

Traditional work load distribution between 2 threads:

values added by thread 0 values added by thread 1 |<----------------------------->|<----------------------------->| Thread 0 computes: w*f(0.5w) + w*f(1.5w) + .... w*f(0.5-0.5w) Thread 1 computes: w*f(0.5+0.5w) + w*f(0.5+1.5w) + ... + w*f(1.0+0.5w)

Parallelizing loops with multi-threaded programming

Example of a loop that can be parallelized with multi-threaded programming:

for (i = 0; i < N; i = i + 1) { x = w*(i + 0.5); sum = sum + w*f(x); } Loop computes: w*f(0.5w) + w*f(1.5w) + w*f(2.5w) + ... + w*f( (N-1)+0.5)*w )

Alternate way to distribute the work load between 2 threads ( easier to code):

values added by thread 0 | | | | | | | | | | | | | | V V V V V V V V V V V V V V |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-| ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ | | | | | | | | | | | | | | values added by thread 1 Thread 0 computes: w*f( (0+0.5)w ) + w*f( (2+0.5)w ) + .... Thread 1 computes: w*f( (1+0.5)w ) + w*f( (3+0.5)w ) + ....

Comment: paging and multi-threaded programming

Multi-threaded program that scans of memory data can be affected by paging

Example:

values accessed by thread 0 | | | | | | | | | | | | | | V V V V V V V V V V V V V V |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-| ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ | | | | | | | | | | | | | | values accessed by thread 1 Thread 0 accesses: a[0] a[2] ... // Access memory variables Thread 1 accesses: a[1] a[3] ... // Access memory variables

However: computation that do not access memory is not affected by paging:

Thread 0 computes: w*f( (0+0.5)w ) + w*f( (2+0.5)w ) + .... Thread 1 computes: w*f( (1+0.5)w ) + w*f( (3+0.5)w ) + ....

Multi-threaded program to compute the approximation π using the Rectangle Rule

Recall the single-threaded program in C to approximate π:

int main(int argc, char *argv[]) { int i; int N; // <----- Share this variable double sum; // <----- Share this variable double x, w; N = ...; // N will determine the accuracy of the approximation w = 1.0/N; // a = 0, b = 1, so: (b-a) = 1.0 sum = 0.0; for (i = 0; i < N; i++) { x = (i + 0.5) * w; sum = sum + w*f(x); } }

Multi-threaded program to compute the approximation π using the Rectangle Rule

We must make N and sum shared variables because they are used by all threads:

int N; // # intervalues Shared variables double sum; // Sum of rectangles int main(int argc, char *argv[]) { N = ...; // N will determine the accuracy of the approximation sum = 0.0; Create threads to do the work... Wait for threads to finish... }

Multi-threaded program to compute the approximation π using the Rectangle Rule

Create num_threads threads to do the work:

int N; // # intervalues Shared variables double sum; // Sum of rectangles int num_threads; // # Threads used int main(int argc, char *argv[]) { N = ...; // N will determine the accuracy of the approximation sum = 0.0; num_threads = ...; // # threads used pthread_t tid[100]; // Used for pthread_join() int id[100]; // Start index values for each thread for (int i = 0; i < num_threads; i++) { id[i] = i; pthread_create(&tid[i], NULL, PIworker, &id[i]); } }

Multi-threaded program to compute the approximation π using the Rectangle Rule

Wait for all threads to complete:

The PIworker( ) function

The PIworker(id) function will compute a part of the total sum that starts at index id:

int N; // # intervalues Shared variables double sum; // Sum of rectangles int num_threads; // # Threads used void *PIworker(int *id) { int i, myStart; double x; double w = 1.0/N; myStart = *id; Computes: for (i = myStart; i < N; i = i + num_threads) { x = w * ((double) i + 0.5); // next x sum = sum + w*f(x); // Add to sum } }

The PIworker( ) function

The PIworker(id) function will compute a part of the total sum that starts at index id:

int N; // # intervalues Shared variables double sum; // Sum of rectangles int num_threads; // # Threads used void *PIworker(int *id) { int i, myStart; double x; double w = 1.0/N; /*** Get the parameter (which is my starting index) ***/ myStart = *id; /*** Compute sum, skipping every "num_threads" items ***/ for (i = myStart; i < N; i = i + num_threads) { x = w * ((double) i + 0.5); // next x sum = sum + w*f(x); // Add to sum } pthread_exit(NULL); }

DEMO: demo/pthread/compute-pi-mt0.c --- Synchronization error...

Synchronizing the updates in the PIworker( ) function

We must use a mutex variable to synchronize the updates to the shared variable sum:

int N; // # intervalues Shared variables double sum; // Sum of rectangles int num_threads; // # Threads used pthread_mutex_t sum_mutex; void *PIworker(int *id) { int i, myStart; double x; double w = 1.0/N; /*** Get the parameter (which is my starting index) ***/ myStart = *id; /*** Compute sum, skipping every "num_threads" items ***/ for (i = myStart; i < N; i = i + num_threads) { x = w * ((double) i + 0.5); // next x pthread_mutex_lock(&sum_mutex); sum = sum + w*f(x); // Add to sum pthread_mutex_unlock(&sum_mutex); } pthread_exit(NULL); }

Synchronizing the updates in the PIworker( ) function

We must initialize the mutex variable in main( ) before creating the threads:

pthread_mutex_t sum_mutex; int main(int argc, char *argv[]) { N = ...; // N will determine the accuracy of the approximation w = 1.0/N; // a = 0, b = 1, so: (b-a) = 1.0 sum = 0.0; pthread_mutex_int(&sum_mutex, NULL); num_threads = ...; // # threads used pthread_t tid[100]; // Used for pthread_join() int id[100]; // Start index values for each thread for (i = 0; i < num_threads; i++) { id[i] = i; pthread_create(&tid[i], NULL, PIworker, &id[i]); } for (i = 0; i < num_threads; i = i + 1) // Wait for threads pthread_join(tid[i], NULL); }

DEMO: demo/pthread/compute-pi-mt1.c --- Synchronization bottleneck...

Synchronization Bottleneck

Simultaneous updates to shared variables can cause:
Synchronization bottlenecks in parallel programs

Synchronization bottleneck:

When parallel executions must wait for each other
The parallel executions will behave like a sequential execution

Example: only one (= 1) thread will be able to execute sum = sum + + w*f(x)

for (i = myStart; i < N; i = i + num_threads) { x = w * ((double) i + 0.5); // next x pthread_mutex_lock(&sum_mutex); sum = sum + w*f(x); // Serial execution !! pthread_mutex_unlock(&sum_mutex); }

Avoiding synchronized updates

Synchronization is required to update shared variable:

int N, num_threads; double sum; // <---- Shared variable pthread_mutex_t sum_mutex; void *PIworker(int *id) { int i, myStart; double x; double w = 1.0/N; /*** Get the parameter (which is my starting index) ***/ myStart = *id; /*** Compute sum, skipping every "num_threads" items ***/ for (i = myStart; i < N; i = i + num_threads) { x = w * ((double) i + 0.5); // next x pthread_mutex_lock(&sum_mutex); sum = sum + w*f(x); // Synchronize access to shared var pthread_mutex_unlock(&sum_mutex); } pthread_exit(NULL); }

Avoiding synchronized updates

Use a local variable to avoid synchronization:

int N, num_threads; double sum; // <---- Shared variable pthread_mutex_t sum_mutex; void *PIworker(int *id) { int i, myStart; double x; double w = 1.0/N; /*** Get the parameter (which is my starting index) ***/ myStart = *id; int mySum = 0.0; // <---- Not shared for (i = myStart; i < N; i = i + num_threads) { x = w * ((double) i + 0.5); // next x mySum = mySum + w*f(x); // Use local var to avoid synchr } pthread_exit(NULL); }

Avoiding synchronized updates

Add the final total using a synchronized update:

int N, num_threads; double sum; // <---- Shared variable pthread_mutex_t sum_mutex; void *PIworker(int *id) { int i, myStart; double x; double w = 1.0/N; /*** Get the parameter (which is my starting index) ***/ myStart = *id; int mySum = 0.0; // <---- Not shared for (i = myStart; i < N; i = i + num_threads) { x = w * ((double) i + 0.5); // next x mySum = mySum + w*f(x); // Use local var to avoid synchr } pthread_mutex_lock(&sum_mutex); sum = sum + mySum; // Synchronize access to shared var pthread_mutex_unlock(&sum_mutex); pthread_exit(NULL); }

DEMO: demo/pthread/compute-pi-mt2.c

❮ ❯