A common parallel program structure

A common structure for computational parallel programs

The following program fragment depicts a common motif of parallel programs:

    void *worker (void *arg)
    {
        .... (You will need to convert arg to the correct type
              before you can use the value passed) ....
    }

    int main(int argc, char *argv[])
    {
        // SEQUENTIAL Section
        .... Prepare problem (setup shared variables) ....

        // PARALLEL Section
        // Start the worker threads
        for (i = 0; i < NUM_PROCESSORS; i = i + 1)
        {
            param[i] = ....;
            pthread_create(&tid[i], &attr, worker, &param[i]);
        }

        // Wait for all workers to finish
        for (i = 0; i < NUM_PROCESSORS; i = i + 1)
            pthread_join(tid[i], NULL);

        // SEQUENTIAL Section
        .... Post process results from workers....
        (Or start another prepare section followed by
         a parallel + join section)
    }

Note:

You can pass a pointer to a value of any type to the worker() function (its parameter is declared as void *)
But inside the worker() function, you will have to cast the pointer back to the correct type before you can use the value
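
As an illustration of the note above, here is a minimal sketch of passing a structure to a worker and casting it back. The type WorkerParam and the functions scale_worker and run_scale_workers are hypothetical names made up for this example; they do not come from the demo programs.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical parameter type, used only for this illustration.
struct WorkerParam {
    int    id;      // which worker this is
    double scale;   // some extra per-thread input
    double result;  // slot that the worker fills in
};

void *scale_worker(void *arg)
{
    // arg arrives as a void *, so cast it back to the type that was passed
    WorkerParam *p = static_cast<WorkerParam *>(arg);
    p->result = p->id * p->scale;
    return NULL;
}

// Starts n workers (at most 16), waits for them, and adds up their results.
double run_scale_workers(int n, double scale)
{
    pthread_t   tid[16];
    WorkerParam param[16];
    assert(n >= 0 && n <= 16);

    for (int i = 0; i < n; i++) {
        param[i].id = i;
        param[i].scale = scale;
        param[i].result = 0.0;
        if (pthread_create(&tid[i], NULL, scale_worker, &param[i]) != 0) {
            n = i;      // only join the threads that were actually created
            break;
        }
    }

    double total = 0.0;
    for (int i = 0; i < n; i++) {
        pthread_join(tid[i], NULL);
        total += param[i].result;
    }
    return total;
}
```

Each worker gets the address of its own param[i], so no two threads write to the same structure.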

Use of global variables in multi-threaded programs

Fact:

The main( ) function often needs to pass a huge amount of data to the worker( ) functions to be processed

Passing a large amount of data between the main( ) function and the worker( ) function (which is executed by multiple threads) is usually done through:

Shared (memory) variable(s) !!!

because:

Global variables are shared by all threads of a program
You will see in the demo programs that the data to be processed is stored in a global variable

You do this, so that:

The main( ) function can initialize the variables (usually an array)
The worker( ) function can access the variables !!!
(Otherwise, the worker( ) function cannot work !!!)

Note:

The worker( ) function sometimes will produce some output
It is easier to use global variables to pass the output value to the main( ) function
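
A small sketch of this idea, assuming a made-up task (summing 8 numbers with 2 threads). All names here (data, partial, ids, sum_worker, parallel_sum) are hypothetical and chosen only for this illustration: main's side fills the shared input array, and each worker writes its output into its own slot of a shared output array.

```cpp
#include <pthread.h>

const int N = 8;             // hypothetical problem size
const int NUM_WORKERS = 2;   // hypothetical number of threads (divides N)

double data[N];               // shared input: written by main, read by workers
double partial[NUM_WORKERS];  // shared output: one slot per worker
int    ids[NUM_WORKERS];      // one id variable per worker

void *sum_worker(void *arg)
{
    int s = *static_cast<int *>(arg);    // convert arg to an integer id
    int chunk = N / NUM_WORKERS;
    double sum = 0.0;

    for (int i = s * chunk; i < (s + 1) * chunk; i++)
        sum += data[i];

    partial[s] = sum;    // each worker writes only its own slot
    return NULL;
}

// Returns the total, or -1.0 if a thread could not be created.
double parallel_sum()
{
    pthread_t tid[NUM_WORKERS];
    int created = 0;

    for (int i = 0; i < N; i++)          // sequential section: prepare input
        data[i] = i + 1;                 // 1, 2, ..., 8

    for (int i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        if (pthread_create(&tid[i], NULL, sum_worker, &ids[i]) != 0)
            break;
        created++;
    }

    double total = 0.0;
    for (int i = 0; i < created; i++) {  // wait, then read the shared output
        pthread_join(tid[i], NULL);
        total += partial[i];
    }
    return (created == NUM_WORKERS) ? total : -1.0;
}
```

The main side reads partial[i] only after pthread_join has returned for thread i, so the value is complete by then.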

Example: Find the Minimum value in an array

When writing a parallel program, you must divide the work into (preferably) non-overlapping subtasks
Usually there are many different ways to divide a task into subtasks
(And sometimes, it is not possible to divide the task into non-overlapping subtasks and you may have to repeat some steps - a necessary evil in parallel programming...)
You may think that dividing the task of "Finding the minimum value" in an array is pretty straightforward - and yet, there are many ways to do so.
Some ways are better than others.
Let's do a simple example: split the task of "Finding the minimum value" in an array into 2 subtasks (each subtask performed by one thread).

Solution 1: (2 threads)

Split the array into 2 (approximately) equal halves
Thread 1 finds the minimum in the first half of the array
Thread 2 finds the minimum in the second half of the array
Main thread waits for the results and finds the actual minimum.

Pictorially:

In general, the division of the data in the array is as follows
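
One way to write this division down is as a small helper that computes the index range of thread s. This is only a sketch: the names Range and block_range are hypothetical, and the rule (equal chunks, with the last thread taking the leftover elements) mirrors the one used by the worker( ) code later in these notes.

```cpp
#include <cassert>

// Half-open index range [start, stop) handled by one thread.
struct Range {
    int start;
    int stop;
};

// Hypothetical helper: range of thread s when n_elems elements are
// divided among num_threads threads in contiguous blocks.
Range block_range(int s, int num_threads, int n_elems)
{
    assert(num_threads > 0 && s >= 0 && s < num_threads);

    int n = n_elems / num_threads;   // elements per thread (rounded down)
    Range r;
    r.start = s * n;
    // The last thread also takes the elements left over by the rounding.
    r.stop = (s == num_threads - 1) ? n_elems : r.start + n;
    return r;
}
```

For example, with 10 elements and 3 threads, thread 0 gets [0, 3), thread 1 gets [3, 6) and thread 2 gets [6, 10).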

Main Thread:

Create the worker( ) threads (run the function worker( ))
The main( ) function passes an id to each worker( ) thread

Wait for each worker( ) thread to find the minimum in its range

thread 0 will return the minimum value in variable min[0]
thread 1 will return the minimum value in variable min[1]
thread 2 will return the minimum value in variable min[2]
And so on

Find the overall minimum among the minimum values min[0], min[1], min[2], ....

C++ code for the main( ) function:

    /* Shared Variables */
    double x[1000000];    // Must be SHARED (accessed by worker threads !!)
    int    start[100];    // Contain starting array index of each thread
    double min[100];      // Contain the minimum found by each thread
    int    num_threads;

    // -----------------------------------
    // Create worker threads....
    // -----------------------------------
    for (i = 0; i < num_threads; i = i + 1)
    {
        start[i] = i;     // Pass ID to thread in a private variable

        if ( pthread_create(&tid[i], NULL, worker, (void *)&start[i]) )
        {
            cout << "Cannot create thread" << endl;
            exit(1);
        }
    }

    // -----------------------------------
    // Wait for worker threads to end....
    // -----------------------------------
    for (i = 0; i < num_threads; i = i + 1)
        pthread_join(tid[i], NULL);

    // ----------------------------------------
    // Post processing: Find actual minimum
    // ----------------------------------------
    my_min = min[0];

    for (i = 1; i < num_threads; i++)
        if ( min[i] < my_min )
            my_min = min[i];

Worker Thread:

Finds the minimum value in its portion of the array
Recall how thread s determines its portion of the work:

The C++ code for worker( ):

    void *worker(void *arg)
    {
        int i, s;
        int n, start, stop;
        double my_min;

        n = MAX/num_threads;       // number of elements to handle

        s = * (int *) arg;         // Convert arg to an integer

        start = s * n;             // Starting index

        if ( s != (num_threads-1) )
        {
            stop = start + n;      // Ending index
        }
        else
        {
            stop = MAX;
        }

        my_min = x[start];

        for (i = start+1; i < stop; i++ )   // Find min in my range
        {
            if ( x[i] < my_min )
                my_min = x[i];
        }

        min[s] = my_min;           // Store min in private slot

        return(NULL);              /* Thread exits (dies) */
    }

Example Program: (Demo above code)
- Prog file: min-mt1.C
On Solaris compile with: CC -mt min-mt1.C
On Linux compile with: g++ -pthread min-mt1.C
Changes that you need to make to compile on Linux:
```
    #include <iostream.h>   ===>   #include <iostream>

    Add line:  using namespace std;
```

Find the Minimum value in an array - take 2

Let's do the "Find min" example again, now splitting the task of "Finding the minimum value" in an array in a different manner

Solution 2:

Split the array elements into 2 (approximately) equal groups according to their index
Thread 1 finds the minimum in the even-indexed elements of the array
(I.e.: x[0], x[2], x[4], etc)
Thread 2 finds the minimum in the odd-indexed elements of the array
(I.e.: x[1], x[3], x[5], etc)
Main thread waits for the results and finds the actual minimum.

Pictorially:

    values handled by thread 0
     |   |   |   |   |   |   |   |   |   |   |   |   |   |
     V   V   V   V   V   V   V   V   V   V   V   V   V   V
    |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
       ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^
       |   |   |   |   |   |   |   |   |   |   |   |   |   |
    values handled by thread 1

        Thread 0                Thread 1
           |                       |
           V                       V
         min[0]                  min[1]
             \                   /
              \                 /
               \               /
                \             /
                 \           /
                  main thread
                       |
                       V
                 Actual minimum

The division of labor in general is:

Main Thread: (UNCHANGED)

    // -----------------------------------
    // Create worker threads....
    // -----------------------------------
    for (i = 0; i < num_threads; i = i + 1)
    {
        start[i] = i;     // Pass ID to thread in a private variable

        if ( pthread_create(&tid[i], NULL, worker, (void *)&start[i]) )
        {
            cout << "Cannot create thread" << endl;
            exit(1);
        }
    }

    // -----------------------------------
    // Wait for worker threads to end....
    // -----------------------------------
    for (i = 0; i < num_threads; i = i + 1)
        pthread_join(tid[i], NULL);

    // ----------------------------------------
    // Post processing: Find actual minimum
    // ----------------------------------------
    my_min = min[0];

    for (i = 1; i < num_threads; i++)
        if ( min[i] < my_min )
            my_min = min[i];

Worker Thread: (CHANGED !!!)

    void *worker(void *arg)
    {
        int i, s;
        double my_min;

        s = * (int *) arg;         // Convert arg to an integer

        // --------------------------------------
        // Find min in my range
        // --------------------------------------
        my_min = x[s];

        for (i = s+num_threads; i < MAX; i += num_threads)
        {
            if ( x[i] < my_min )
                my_min = x[i];
        }

        min[s] = my_min;           // Store min in private slot

        return(NULL);              /* Thread exits (dies) */
    }

See the elements processed by the thread s:
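
To make the pattern concrete, the sketch below lists the indices that thread s visits, using the same loop bounds as the worker( ) code above (start at s, step by num_threads). The function name cyclic_indices is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: indices visited by thread s when n_elems elements
// are handed out to num_threads threads in round-robin (interleaved) order.
std::vector<int> cyclic_indices(int s, int num_threads, int n_elems)
{
    assert(num_threads > 0 && s >= 0 && s < num_threads);

    std::vector<int> idx;
    for (int i = s; i < n_elems; i += num_threads)
        idx.push_back(i);    // s, s + num_threads, s + 2*num_threads, ...
    return idx;
}
```

With 10 elements and 3 threads, thread 0 visits 0, 3, 6, 9, thread 1 visits 1, 4, 7 and thread 2 visits 2, 5, 8. Every index is visited by exactly one thread.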

It's much easier to code the worker thread !!!

Example Program: (Demo above code)
- Prog file: min-mt2.C
Compile with: g++ -pthread min-mt2.C

Speed up...
- Try running the programs using different numbers of threads (the program prints the elapsed time)
- Notice that the first version has drastically improved times on multi-processors (e.g. on compute
  But the second version... not so much...
- $60,000 question:
  - Why is the second version not doing so well?
- Answer: paging...
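
A rough way to see the difference in memory access behind this answer is to count how many distinct memory pages each thread touches under the two divisions. The sketch below is an estimate under stated assumptions, not a measurement: it assumes 4096-byte pages, 8-byte doubles, and an array that starts on a page boundary. The names PAGE_BYTES, pages_touched_block and pages_touched_cyclic are hypothetical.

```cpp
#include <cstddef>

const std::size_t PAGE_BYTES = 4096;   // assumed page size

// Pages touched by thread s in the first version (contiguous block).
std::size_t pages_touched_block(int s, int num_threads, std::size_t n_elems)
{
    std::size_t n     = n_elems / num_threads;
    std::size_t start = static_cast<std::size_t>(s) * n;
    std::size_t stop  = (s == num_threads - 1) ? n_elems : start + n;
    if (start >= stop)
        return 0;

    std::size_t first = start * sizeof(double) / PAGE_BYTES;
    std::size_t last  = (stop - 1) * sizeof(double) / PAGE_BYTES;
    return last - first + 1;           // a block covers consecutive pages
}

// Pages touched by thread s in the second version (interleaved indices).
std::size_t pages_touched_cyclic(int s, int num_threads, std::size_t n_elems)
{
    std::size_t count = 0;
    std::size_t prev  = static_cast<std::size_t>(-1);   // "no page yet"

    for (std::size_t i = s; i < n_elems; i += num_threads) {
        std::size_t page = i * sizeof(double) / PAGE_BYTES;
        if (page != prev) {            // indices increase, so pages do too
            count++;
            prev = page;
        }
    }
    return count;
}
```

Under these assumptions, for the 1,000,000-element array and 2 threads, thread 0 touches 977 pages in the first version but all 1954 pages of the array in the second version. In the second version every thread walks through the whole array, so the threads do not each get a smaller region of memory to work in.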