Not all access operations are conflicting, however.
      Thread 1 on CPU 1      Thread 2 on CPU 2      Memory
      =================      =================      ========
                                                    N = 1234
      Read N  --> 1234
      Add 1   --> 1235       Read N  --> 1234
      Write N                Add 1   --> 1235       N = 1235
                             Write N                N = 1235
      #include <pthread.h>
      #include <cstdlib>
      #include <iostream>
      using namespace std;

      int N;                 // <=== Shared variable N
      pthread_t tid[100];

      // Each thread executes the following function:
      void *worker(void *arg)
      {
         int i;

         for (i = 0; i < 10000; i = i + 1)
         {
            N = N + 1;       // <=== Multiple threads access shared variable N
         }

         cout << "Added 10000 to N" << endl;
         return(NULL);       /* Thread exits (dies) */
      }

      /* =======================
         MAIN
         ======================= */
      int main(int argc, char *argv[])
      {
         int i, num_threads;

         if (argc < 2)
         {
            cout << "Usage: " << argv[0] << " num_threads" << endl;
            exit(1);
         }
         num_threads = atoi(argv[1]);   // At most 100 (the size of tid[])

         N = 0;              // Initialize N BEFORE the threads start updating it

         /* ------ Create threads ------ */
         for (i = 0; i < num_threads; i = i + 1)
         {
            if ( pthread_create(&tid[i], NULL, worker, NULL) )
            {
               cout << "Cannot create thread" << endl;
               exit(1);
            }
         }

         // Wait for all threads to terminate
         for (i = 0; i < num_threads; i = i + 1)
            pthread_join(tid[i], NULL);

         cout << "N = " << N << endl << endl;
         exit(0);
      }
How to run the program:
      pthread_mutex_t x;
After defining the mutex lock variable, you must initialize it using the following function:
      int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);
The most common mutex lock is one that is initially in the unlocked state.
This kind of mutex lock is created using the (default) attribute NULL:
Example: initialize a mutex variable
      pthread_mutex_t x;              /* Define a mutex lock "x" */
      pthread_mutex_init(&x, NULL);   /* Initialize "x" */
      int pthread_mutex_lock(pthread_mutex_t *mutex);
Example:
      pthread_mutex_t x;
      pthread_mutex_init(&x, NULL);
      ...
      pthread_mutex_lock(&x);
      int pthread_mutex_unlock(pthread_mutex_t *mutex);
Example:
      pthread_mutex_unlock(&x);
Whenever a thread wants to update a shared variable, it must enclose the operation between a "lock - unlock" pair.
Example:
      int N;                      // SHARED variable
      pthread_mutex_t N_mutex;    // Mutex controlling access to N
                                  // (initialize it with pthread_mutex_init(&N_mutex, NULL)
                                  //  before any thread is created)

      void *worker(void *arg)
      {
         int i;

         for (i = 0; i < 10000; i = i + 1)
         {
            pthread_mutex_lock(&N_mutex);
            N = N + 1;
            pthread_mutex_unlock(&N_mutex);
         }

         return(NULL);            /* Thread exits (dies) */
      }
The thread that holds the lock is then the only thread that can update the variable N, which ensures that N is updated sequentially (one thread after another).
Compare the behavior of this program with the earlier one that does not use a mutex to control access to N.
A common error in parallel programs is to forget the unlock call (especially if the call is made after many statements). The result is deadlock: your program hangs (makes no progress).
      Integrate( f(x) = 2.0 / sqrt(1 - x*x) , x = 0 to x = 1 )
Maple:
      > integrate(2.0 / sqrt(1 - x*x), x=0..1);
                     3.141592654
How to approximate:
(See the figure above !!!)
We can see that:
      Integral ~= w * f(0.5w) + w * f(1.5w) + w * f(2.5w) + ......
In pseudo code:
      Integral = 0;
      for (i = 0; i < N; i++)
      {
         x = (i+0.5)*w;
         Integral = Integral + w * f(x);   // w * f(x) is area of rectangle
      }
The entire program in C/C++:
      #include <cmath>
      #include <iostream>
      using namespace std;

      double f(double a)
      {
         return( 2.0 / sqrt(1 - a*a) );
      }

      int main(int argc, char *argv[])
      {
         int i;
         int N;
         double sum;
         double x, w;

         N = ...;            // Will determine the accuracy of the approximation
         w = 1.0/N;          // Width of one interval
         sum = 0.0;

         for (i = 0; i < N; i = i + 1)
         {
            x = (i + 0.5) * w;     // Midpoint of interval i
            sum = sum + w*f(x);    // Add area of rectangle i
         }

         cout << sum;
      }
Compile with:
Run the program with:
Often, a small amount of (shared) information is updated within every execution of the loop.
The program can be sped up when non-conflicting operations are performed concurrently (in parallel), while conflicting operations on the shared information (variable) are performed sequentially.
      #include <cmath>
      #include <iostream>
      using namespace std;

      double f(double a)
      {
         return( 2.0 / sqrt(1 - a*a) );
      }

      int main(int argc, char *argv[])
      {
         int i;
         int N;
         double sum;
         double x, w;

         N = ...;            // Will determine the accuracy of approximation
         w = 1.0/N;
         sum = 0.0;

         for (i = 0; i < N; i = i + 1)
         {
            x = w*(i + 0.5);       // We can make x non-shared..
            sum = sum + w*f(x);    // sum is SHARED !!!
         }

         cout << sum;
      }
With 2 threads, thread 1 can compute the "first half" of the partial sum
      w*(f(0.5w) + f(1.5w) + f(2.5w) + ... + f(0.5-0.5w) )
and thread 2 computes the "second half" of the partial sum
      w*(f(0.5+0.5w) + f(0.5+1.5w) + f(0.5+2.5w) + ... + f(1-0.5w) )
Pictorially:
          values added by         values added by
             thread 1                thread 2
      |<--------------------->|<--------------------->|
Alternatively, thread 1 can compute the "even stepped" partial sum
      w*(f(0.5w) + f((2+0.5)w) + f((4+0.5)w) + ... )
and thread 2 computes the "odd stepped" partial sum
      w*(f((1+0.5)w) + f((3+0.5)w) + f((5+0.5)w) + ... )
Pictorially:
                       values added by thread 1
       |   |   |   |   |   |   |   |   |   |   |   |   |   |
       V   V   V   V   V   V   V   V   V   V   V   V   V   V
      |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
         ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^
         |   |   |   |   |   |   |   |   |   |   |   |   |   |
                       values added by thread 2
It turns out that the "even stepped" and "odd stepped" approach to partitioning is easier to program in many instances !!!
NOTE: We don't access any array, so the paging problem is not applicable !!!
The division of labor is as follows for N threads:
      #include <pthread.h>
      #include <cstdlib>
      #include <iostream>
      using namespace std;

      /*** Shared variables, but not updated.... ***/
      int N;                       // # intervals
      double w;                    // width of one interval
      int num_threads;             // # threads

      /*** Shared variables, updated !!! ***/
      double sum;
      pthread_mutex_t sum_mutex;   // Mutex to control access to sum

      void *PIworker(void *arg);   // Worker thread function

      int main(int argc, char *argv[])
      {
         int Start[100];           // Start index values for each thread
         pthread_t tid[100];       // Used for pthread_join()
         int i;

         N = ...;                  // Read N in from keyboard...
         w = 1.0/N;                // "Broadcast" w
         num_threads = ...;        // # threads = skip distance for each thread (at most 100)
         sum = 0.0;                // Initialize shared variable
         pthread_mutex_init(&sum_mutex, NULL);   // Initialize the mutex before it is used

         /**** Make worker threads... ****/
         for (i = 0; i < num_threads; i = i + 1)
         {
            Start[i] = i;          // Start index for thread i
            if ( pthread_create(&tid[i], NULL, PIworker, &Start[i]) )
            {
               cout << "Cannot create thread" << endl;
               exit(1);
            }
         }

         /**** Wait for worker threads to finish... ****/
         for (i = 0; i < num_threads; i = i + 1)
            pthread_join(tid[i], NULL);

         cout << sum;
      }
Worker thread:
      void *PIworker(void *arg)
      {
         int i, myStart;
         double x;

         /*** Get the parameter (which is my starting index) ***/
         myStart = * (int *) arg;

         /*** Compute sum, skipping every "num_threads" items ***/
         for (i = myStart; i < N; i = i + num_threads)
         {
            x = w * ((double) i + 0.5);    // next x

            pthread_mutex_lock(&sum_mutex);
            sum = sum + w*f(x);            // Add to sum
            pthread_mutex_unlock(&sum_mutex);
         }

         return(NULL);     /* Thread exits (dies) */
      }
Are you surprised ???
Worker thread:
      void *PIworker(void *arg)
      {
         int i, myStart;
         double x;
         double tmp = 0.0;     // local non-shared variable (must start at 0)

         /*** Get the parameter (which is my starting index) ***/
         myStart = * (int *) arg;

         /*** Compute sum, skipping every "num_threads" items ***/
         for (i = myStart; i < N; i = i + num_threads)
         {
            x = w * ((double) i + 0.5);    // next x
            tmp = tmp + w*f(x);            // No synchr. needed
         }

         pthread_mutex_lock(&sum_mutex);
         sum = sum + tmp;                  // Synch only ONCE !!!
         pthread_mutex_unlock(&sum_mutex);

         return(NULL);     /* Thread exits (dies) */
      }
What a difference it can make where you put the synchronization points in a parallel program....
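A read lock operation on a read/write lock behaves as follows, depending on the state of the lock:
      Lock is unlocked: The value (state) of the read/write lock is changed to read locked.
      Lock is read locked: The value (state) of the read/write lock remains read locked, but a count is increased (so we know how many times a read lock operation has been performed).
      Lock is write locked: The thread blocks. (When the state of the read/write lock does become unlocked, the read lock command will complete and change the state of the read/write lock to read locked.)
A write lock operation on a read/write lock behaves as follows, depending on the state of the lock:
      Lock is unlocked: The value (state) of the read/write lock is changed to write locked.
      Lock is read locked: The thread blocks. (When the state of the read/write lock does become unlocked, the write lock command will complete and change the state of the read/write lock to write locked.)
      Lock is write locked: The thread blocks. (When the state of the read/write lock does become unlocked, the write lock command will complete and change the state of the read/write lock to write locked.)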
      pthread_rwlock_t x;
After defining the read/write lock variable, you must initialize it using the following function:
      int pthread_rwlock_init(pthread_rwlock_t *rwlock, const pthread_rwlockattr_t *attr);
The most common read/write lock is one that is initially in the unlocked state.
This kind of read/write lock is created using the (default) attribute NULL:
Example: a read/write lock with an initial unlocked state
      pthread_rwlock_init(&x, NULL);   /* Default initialization */
      int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
Example:
      pthread_rwlock_t x;
      pthread_rwlock_init(&x, NULL);
      ...
      pthread_rwlock_rdlock(&x);
      int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
Example:
      pthread_rwlock_t x;
      pthread_rwlock_init(&x, NULL);
      ...
      pthread_rwlock_wrlock(&x);