Thread of execution in a program

  • A running program is commonly known as:

    • A process

  • Each process has at least one (1) thread of execution:

  • Executing a C/Java program:

    • Initially, there is one thread of execution

    • The thread of execution starts with the main( ) function

    • The thread of execution follows the program flow in the C/Java program

Review:   Threads

  • A thread:

    • is a unit of execution
    • implemented using an execution context

  • Thread context:

    • contains all the information a thread needs to resume execution

    • consists of (information in registers):

    • The Program Counter
    • The values in all of the registers
    • The stack (or: the stack pointer)

  • Difference between thread context and process context:

    • Context switching between processes also switches the address space (memory map) and takes more time

    • Context switching between threads of the same process keeps the same address space and takes less time

Single-threaded execution

  • A process that has one thread of execution has a:

    • Single-threaded execution

  • A Shared Memory MIMD computer can execute multiple single-threaded processes:

  • As we have discussed before:

    • This is not parallel programming

Multi-threaded execution and parallel programming

  • A process that has more than one thread of execution has a:

    • Multi-threaded execution

  • In parallel programming the different threads cooperate in the computation:

  • In order to cooperate:

    • Threads must share information with each other

Intro to multi-threaded programming using POSIX Threads

  • POSIX Threads:

    • POSIX Threads (a.k.a. pthreads) is a multi-threaded execution model

    • POSIX Threads allows a program to control multiple different flows of work that overlap in time.

  • The Pthreads API:

    • POSIX Threads defines a standardized API

    • This API has been implemented on many platforms:

      • Linux
      • macOS

  • We will study multi-threaded programming using the Pthreads API

    (Windows has its own "Windows threads" API, which is similar)

Your first multi-threaded C program

  • Let's write a multi-threaded C program:

    #include <stdio.h>
    #include <pthread.h>
    
    void *worker(void *arg)
    {
        for (int i = 0; i < 10000000; i++)
        {
            for (int j = 0; j < 10000000; j++);   // Slow thread down...
            printf("*");
            fflush(stdout);
        }
        return NULL;   // A thread function must return a void *
    }
    
    int main(int argc, char *argv[])
    {
        pthread_t tid;
    
        // This program just prints a lot of dots....
    
        for (int i = 0; i < 10000000; i++)
        {
            for (int j = 0; j < 20000000; j++);   // Slow main down...
            printf(".");
            fflush(stdout);
        }
    }

     

Your first multi-threaded C program

  • A pthread C program must include the pthread.h header file

    #include <stdio.h>
    #include <pthread.h>
    
    void *worker(void *arg)
    {
        for (int i = 0; i < 10000000; i++)
        {
            for (int j = 0; j < 10000000; j++);   // Slow thread down...
            printf("*");
            fflush(stdout);
        }
        return NULL;   // A thread function must return a void *
    }
    
    int main(int argc, char *argv[])
    {
        pthread_t tid;
    
        pthread_create(&tid, NULL, worker, NULL);  // Create thread
    
        for (int i = 0; i < 10000000; i++)
        {
            for (int j = 0; j < 20000000; j++);   // Slow main down...
            printf(".");
            fflush(stdout);
        }
    }

     

Your first multi-threaded C program

  • A thread is created using the pthread_create( ) function:

    #include <stdio.h>
    #include <pthread.h>
    
    void *worker(void *arg)
    {
        for (int i = 0; i < 10000000; i++)
        {
            for (int j = 0; j < 10000000; j++);   // Slow thread down...
            printf("*");
            fflush(stdout);
        }
        return NULL;   // A thread function must return a void *
    }
    
    int main(int argc, char *argv[])
    {
        pthread_t tid;  // ID of the thread
    
        pthread_create(&tid, NULL, worker, NULL);  // Create thread
    
        for (int i = 0; i < 10000000; i++)
        {
            for (int j = 0; j < 20000000; j++);   // Slow main down...
            printf(".");
            fflush(stdout);
        }
    }

    The thread will start executing in the worker( ) function

Your first multi-threaded C program

  • The worker( ) function will be executed independently from the main( ) function:

    #include <stdio.h>
    #include <pthread.h>
    
    void *worker(void *arg)
    {
        for (int i = 0; i < 10000000; i++)
        {
            for (int j = 0; j < 10000000; j++);   // Slow thread down...
            printf("*");                          // Print *
            fflush(stdout);
        }
        return NULL;   // A thread function must return a void *
    }
    
    int main(int argc, char *argv[])
    {
        pthread_t tid;  // ID of the thread
    
        pthread_create(&tid, NULL, worker, NULL);  // Create thread
    
        for (int i = 0; i < 10000000; i++)
        {
            for (int j = 0; j < 20000000; j++);   // Slow main down...
            printf(".");
            fflush(stdout);
        }
    }

    DEMO: demo/pthread/intro.c --- gcc intro.c -lpthread

Compiling a multi-threaded C program

  • In order to compile pthread programs, you must:

    • Link the PThread library in the compile process

  • The name of the PThread library is:

       libpthread.a
    

  • To include the PThread library in the compilation, we specify the -lpthread option:

        gcc  ......     -lpthread
    

    Example: to compile the intro.c demo program:

        gcc  intro.c  -lpthread
    

DEMO: demo/pthread/intro.c

Visualizing the multi-threaded program

  • When the program starts execution:

    There is only 1 thread in the process

Visualizing the multi-threaded program

  • Then the execution reaches the pthread_create( ) call:

    There is still 1 thread in the process

Visualizing the multi-threaded program

  • After the pthread_create( ) call finishes:

    There are 2 threads executing simultaneously in the process

Difference between PThreads and CUDA threads

  • PThreads:

    • A PThread is created to execute some function

      • Different threads can execute different functions....

    • Therefore: the threads are independent of each other

      • Instructions executed by different threads are "unrelated"

  • CUDA threads:

    • A grid of threads is launched to execute one kernel function.

    • Therefore: every thread in the grid will:

      • execute the same sequence of instructions

Thread ID of a thread

  • Each thread has a unique thread id

    • The function pthread_self( ) returns the thread id of a thread

  • The main thread is created when a program starts running

    • The main thread is also a thread and has a thread id

    Example:

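    #include <stdio.h>
    #include <pthread.h>

    void *worker(void *arg)
    {
        // pthread_t is an opaque type: the cast to unsigned long works on Linux
        printf("Worker: my thread id = %lu\n", (unsigned long) pthread_self());
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        pthread_t tid;

        if ( pthread_create(&tid, NULL, worker, NULL) != 0 )  // Create thread
        {
            printf("Error: cannot create thread\n");
            return 1;
        }

        printf("Main: my thread id = %lu\n", (unsigned long) pthread_self());
        pthread_join(tid, NULL);   // Wait for the worker before main exits
        return 0;
    }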

DEMO: demo/pthread/thread-id.c

Variable sharing among the threads

  • Global variables are shared among all threads

    • All threads use the same copy of the global variable

  • Local variables defined in functions executed by a thread are private:

    • Each thread has its own copy of the local variable

    Example:

    #include <stdio.h>
    #include <pthread.h>

    int N = 4444;               // Shared among all threads

    void *worker(void *arg)
    {
        int i = 99;             // Each thread has its own copy
        printf("Worker: N = %d, i = %d\n", N, i);
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        pthread_t tid;
        int i = 86;             // Each thread has its own copy
        pthread_create(&tid, NULL, worker, NULL);
        printf("Main: N = %d, i = %d\n", N, i);
        pthread_join(tid, NULL);   // Wait for the worker before main exits
        return 0;
    }

DEMO: demo/pthread/sharing-vars.c

Background info:   concurrent updates to shared variables

  • Recall from CS255:

    • A statement in a high level programming language is translated into multiple assembler/machine instructions

    Example:

         Statement in High Level Prog Lang         Assembler code
        -----------------------------------------------------------------
                 N = N + 1                         movw  r0, #:lower16:N
                                                   movt  r0, #:upper16:N
                                                   ldr   r1, [r0]
                                                   add   r1, r1, #1
                                                   str   r1, [r0]
    

  • Gist:

    • The execution of a program statement is not "atomic"

      I.e.: N = N + 1 is not executed "uninterrupted"

    Next, we will see the effect of concurrent updates to a shared variable

The concurrent updates to shared variables

  • Consider the concurrent updates to a shared variable N by 2 threads where each thread increments N 100,000 times:

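    #include <stdio.h>
    #include <unistd.h>         // sleep( )
    #include <pthread.h>

    int N = 0;                  // Shared: updated by both threads

    void *worker(void *arg)
    {
        for ( int i = 0; i < 100000; i++)
            N++;                // Unsynchronized update
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        pthread_t tid;

        pthread_create(&tid, NULL, worker, NULL);

        for ( int i = 0; i < 100000; i++)
            N++;                // Unsynchronized update

        sleep(1);               // Give the worker time to finish
        printf("Final value: N = %d\n", N);
    }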

  • Result: you can get a different final value for N in different runs, often less than the expected 200000...

DEMO: demo/pthread/sharing-vars2.c

Explaining the results of the concurrent updates to shared variables

  • Recall that the statement N++ (≡ N=N+1) is not executed atomically:

      N++   ---->  movw  r0, #:lower16:N
                   movt  r0, #:upper16:N
                   ldr   r1, [r0]
                   add   r1, r1, #1
                   str   r1, [r0]
    

  • In the multi-threaded example program, 2 threads are executing these machine instructions simultaneously:

     thread 1 -> movw  r0, #:lower16:N        thread 2 -> movw  r0, #:lower16:N
        |        movt  r0, #:upper16:N           |        movt  r0, #:upper16:N
        |        ldr   r1, [r0]                  |        ldr   r1, [r0]
        |        add   r1, r1, #1                |        add   r1, r1, #1
        V        str   r1, [r0]                  V        str   r1, [r0]

                     (the two threads run unsynchronized)
    

    The unsynchronized execution of the ldr and str machine instructions can cause incorrect updates to a shared variable

    Example on next slide...

Explaining the results of the concurrent updates to shared variables

  • Example of the effect of an unsynchronized execution of parallel threads:

      Time        Memory         Thread 1           Thread 2
      ----       ---------    --------------     --------------
        |         N = 12       ldr N --> 12
        |                      add 1 --> 13       ldr N --> 12
        |         N = 13       str N              add 1 --> 13
        |         N = 13                          str N
        V     One increment of N is lost (because Thread 2 used an "old" value)
    

  • Explanation:

    • Initially, the variable N contains the value 12 (for example)

    • Thread 1 obtains the value 12 when it executes the ldr instruction, because the variable N has the value 12

    • Thread 1 adds 1 to its copy (in its register r1) while....
      Thread 2 obtains the value 12 when it executes the ldr instruction, because the variable N still has the value 12

    • Thread 1 updates the variable N to 13 while....
      Thread 2 adds 1 to its copy (in its register r1)

    • Thread 2 updates the variable N to 13

    • Two increments were executed, but N is 13 instead of 14