Thread of execution in a program

A running program is commonly known as:
A process
Each process has at least one (1) thread of execution:

Executing a C/Java program:

Initially, there is one thread of execution
The thread of execution starts with the main( ) function
The thread of execution follows the program flow in the C/Java program

Review: Threads

A thread:
is a unit of execution
implemented using an execution context
Thread context:
contains all the information a thread needs to resume execution
consists of: (information in registers)
The Program Counter
The values in all of the registers
The stack (or: the stack pointer)

Difference between thread context and process context:

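Context switching between processes is done by the operating system and takes more time, because the address space must be switched as well
Context switching between threads of the same process takes less time, because the threads share one address space (user-level thread libraries can even switch threads without an operating system call)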

Single-threaded execution

A process that has one thread of execution has a:
Single-threaded execution
A Shared Memory MIMD computer can execute multiple single-threaded processes:
As we have discussed before:
This is not parallel programming

Multi-threaded execution and parallel programming

A process that has more than one thread of execution has a:
Multi-threaded execution
In parallel programming the different threads cooperate in the computation:
In order to cooperate:
Threads must share information with each other

Intro to multi-threaded programming using Posix Threads

POSIX Threads:

POSIX Threads (a.k.a. pthreads) is a multi-threaded execution model
POSIX Threads allows a program to control multiple different flows of work that overlap in time.

The Pthreads API:
The POSIX Threads standard defines a standardized API

This API has been implemented in many platforms:

Linux
MacOS
We will study multi-threaded programming using the PThread API
(Windows has its own "Windows threads" API, which is similar)

Your first multi-threaded C program

Let's write a multi-threaded C program:

#include <stdio.h>
#include <pthread.h>

void *worker(void *arg)
{
    for (int i = 0; i < 10000000; i++)
    {
        for (int j = 0; j < 10000000; j++);   // Slow thread down...
        printf("*");
        fflush(stdout);
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;

    // This program just prints a lot of dots....
    for (int i = 0; i < 10000000; i++)
    {
        for (int j = 0; j < 20000000; j++);   // Slow main down...
        printf(".");
        fflush(stdout);
    }
}

Your first multi-threaded C program

A pthread C program must include the pthread.h header file

#include <stdio.h>
#include <pthread.h>

void *worker(void *arg)
{
    for (int i = 0; i < 10000000; i++)
    {
        for (int j = 0; j < 10000000; j++);   // Slow thread down...
        printf("*");
        fflush(stdout);
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;

    pthread_create(&tid, NULL, worker, NULL);  // Create thread

    for (int i = 0; i < 10000000; i++)
    {
        for (int j = 0; j < 20000000; j++);   // Slow main down...
        printf(".");
        fflush(stdout);
    }
}

Your first multi-threaded C program

A thread is created using the pthread_create( ) function:

#include <stdio.h>
#include <pthread.h>

void *worker(void *arg)
{
    for (int i = 0; i < 10000000; i++)
    {
        for (int j = 0; j < 10000000; j++);   // Slow thread down...
        printf("*");
        fflush(stdout);
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;                             // ID of the thread

    pthread_create(&tid, NULL, worker, NULL);  // Create thread

    for (int i = 0; i < 10000000; i++)
    {
        for (int j = 0; j < 20000000; j++);   // Slow main down...
        printf(".");
        fflush(stdout);
    }
}

The thread will start executing in the worker( ) function

Your first multi-threaded C program

The worker( ) function will be executed independently from the main( ) function:

#include <stdio.h>
#include <pthread.h>

void *worker(void *arg)
{
    for (int i = 0; i < 10000000; i++)
    {
        for (int j = 0; j < 10000000; j++);   // Slow thread down...
        printf("*");                           // Print *
        fflush(stdout);
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;                             // ID of the thread

    pthread_create(&tid, NULL, worker, NULL);  // Create thread

    for (int i = 0; i < 10000000; i++)
    {
        for (int j = 0; j < 20000000; j++);   // Slow main down...
        printf(".");
        fflush(stdout);
    }
}

DEMO: demo/pthread/intro.c --- gcc intro.c -lpthread

Compiling a multi-threaded C program

In order to compile pthread programs, you must:
Link the PThread library in the compile process
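The name of the PThread library is:
libpthread (libpthread.a or libpthread.so)
To include the PThread library in the compilation, we specify the -lpthread option:
gcc ...... -lpthread
(gcc also accepts the -pthread option, which sets the compile and link flags needed for Pthreads)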
Example: to compile the intro.c demo program:
gcc intro.c -lpthread

DEMO: demo/pthread/intro.c

Visualizing the multi-threaded program

When the program starts execution:

There is only 1 thread in the process

Visualizing the multi-threaded program

Then the execution reaches the pthread_create( ) call:

There is still 1 thread in the process

Visualizing the multi-threaded program

After the pthread_create( ) call finishes:

There are 2 threads executing simultaneously in the process

Difference between PThreads and CUDA threads

PThreads:
A PThread is created to execute some function

Different threads can execute different functions....

Therefore: the threads are independent of each other

Instructions executed by different threads can be "unrelated"

CUDA threads:

A grid of threads is launched to execute one kernel function.
Therefore: every thread in the grid will:
execute the same sequence of instructions

Thread ID of a thread

Each thread has a unique
thread id
The function pthread_self( ) returns the thread id of a thread

The main thread is created when a program starts running

The main thread is also a thread and has a thread id

Example:

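#include <stdio.h>
#include <pthread.h>

void *worker(void *arg)
{
    // pthread_t is an opaque type; the cast to unsigned long works on Linux
    printf("Worker: my thread id = %lu\n", (unsigned long) pthread_self());
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;

    pthread_create(&tid, NULL, worker, NULL);  // Create thread

    printf("Main: my thread id = %lu\n", (unsigned long) pthread_self());
}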

DEMO: demo/pthread/thread-id.c

Variable sharing among the threads

Global variables are shared among all threads
All threads use the same copy of the global variable

Local variables defined in functions executed by a thread are private:

Each thread has its own copy of the local variable

Example:

#include <stdio.h>
#include <pthread.h>

int N = 4444;       // Shared among all threads

void *worker(void *arg)
{
    int i = 99;     // Each thread has its own copy

    printf("Worker: N = %d, i = %d\n", N, i);
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;
    int i = 86;     // Each thread has its own copy

    pthread_create(&tid, NULL, worker, NULL);

    printf("Main: N = %d, i = %d\n", N, i);
}

DEMO: demo/pthread/sharing-vars.c

Background info: concurrent updates to shared variables

Recall from CS255:

A statement in a high level programming language is translated into multiple assembler/machine instructions

Example:

Statement in High Level Prog Lang      Assembler code
-----------------------------------------------------------------
N = N + 1                              movw r0, #:lower16:N
                                       movt r0, #:upper16:N
                                       ldr  r1, [r0]
                                       add  r1, r1, #1
                                       str  r1, [r0]

Gist:
The execution of a program statement is not "atomic"
I.e.: N = N + 1 is not executed "uninterrupted"
Next, we will see the effect of concurrent updates to a shared variable

The concurrent updates to shared variables

Consider the concurrent updates to a shared variable N by 2 threads where each thread increments N 100,000 times:

#include <stdio.h>
#include <unistd.h>      // sleep( )
#include <pthread.h>

int N = 0;

void *worker(void *arg)
{
    for (int i = 0; i < 100000; i++)
        N++;
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid;

    pthread_create(&tid, NULL, worker, NULL);

    for (int i = 0; i < 100000; i++)
        N++;

    sleep(1);            // Give the worker thread time to finish
    printf("Final value: N = %d\n", N);
}

Result: you can get a different final value for N on different runs, and the value can be less than the expected 200000

DEMO: demo/pthread/sharing-vars2.c

Explaining the results of the concurrent updates to shared variables

Recall that the statement N++ (≡ N=N+1) is not executed atomically:

N++   ---->   movw r0, #:lower16:N
              movt r0, #:upper16:N
              ldr  r1, [r0]
              add  r1, r1, #1
              str  r1, [r0]

In the multi-threaded example program, 2 threads are executing these machine codes simultaneously:

thread 1 ->  movw r0, #:lower16:N                       thread 2 ->  movw r0, #:lower16:N
    |        movt r0, #:upper16:N                           |        movt r0, #:upper16:N
    |        ldr  r1, [r0]                                  V        ldr  r1, [r0]
    V        add  r1, r1, #1        unsynchronized                   add  r1, r1, #1
             str  r1, [r0]                                           str  r1, [r0]

The unsynchronized execution of the ldr and str machine instructions can cause incorrect updates to a shared variable

Example on next slide...

Explaining the results of the concurrent updates to shared variables

Example of the effect of an unsynchronized execution of parallel threads:

Time   Memory      Thread 1           Thread 2
----   ---------   --------------     --------------
 |     N = 12      ldr N  --> 12
 |                 add 1  --> 13      ldr N  --> 12
 |     N = 13      str N              add 1  --> 13
 |     N = 13                         str N
 V
       Missing increment of N (because Thread 2 used an "old" value)

Explanation:

Initially, the variable N contains the value 12 (for example)
Thread 1 obtained the value 12 when it executed the ldr instruction, because the variable N had the value 12
Thread 1 adds 1 to its copy (in its register r1) while....
Thread 2 obtained the value 12 when it executed the ldr instruction, because the variable N still had the value 12
Thread 1 updates the variable N to 13 while....
Thread 2 adds 1 to its copy (in its register r1)
Thread 2 updates the variable N to 13
Result: N was incremented twice, but its value is 13 instead of 14