Compiling CUDA programs
 

  • We provide CUDA-capable machines for CS355 students to learn CUDA programming:

      • lab1a, lab1b, ... lab1h

  • How to access the CUDA-capable machines:

      • Log in to lab0z.mathcs.emory.edu

      • On lab0z, use this command to log in to one of the CUDA machines:

             ssh -X lab1a    or   ssh -X lab1b      etc... 
          

Compiling CUDA programs

To compile a single-file CUDA program:

  nvcc -o prog  prog.cu          

To compile a multi-file CUDA program:

  nvcc -c  prog1.cu         
  nvcc -c  prog2.cu   
  ...
  nvcc -c  progN.cu

  nvcc -o  prog  prog1.o prog2.o ... progN.o   
 

To run the compiled program:

  ./prog [optionally with arguments]

(CUDA programs have the file extension .cu)

The CUDA header file
 

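CUDA programs need the declarations in the CUDA header files, for example:

  #include <cuda.h>
  
 

However: the CUDA compiler (nvcc) automatically includes the CUDA runtime header (cuda_runtime.h) when it compiles a .cu file

Therefore: you do not have to include a CUDA header yourself in simple programs...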

My "Hello World" programs in CUDA

Compile and run the following hello1.cu CUDA program:

#include <stdio.h>   // C programming header file
#include <unistd.h>  // C programming header file
                     // CUDA headers are automatically included by nvcc...

/* ------------------------------------
   Your first kernel (= GPU function)
   ------------------------------------ */
__global__ void hello( )
{
   printf("Hello World !\n");  // You don't see this msg...
}

int main()
{
   hello<<< 1, 4 >>>( ); // launch kernel

   printf("I am the CPU: Hello World ! \n");
  
} 

DEMO: /home/cs355001/demo/CUDA/1-intro/hello1.cu -- nvcc hello1.cu

My "Hello World" programs in CUDA

We can see the message when the C main program waits for 1 sec before exiting:

#include <stdio.h>   // C programming header file
#include <unistd.h>  // C programming header file
                     // CUDA headers are automatically included by nvcc...

/* ------------------------------------
   Your first kernel (= GPU function)
   ------------------------------------ */
__global__ void hello( )
{
   printf("Hello World !\n");  // Now you see this msg (once per thread)
}

int main()
{
   hello<<< 1, 4 >>>( ); // launch kernel

   printf("I am the CPU: Hello World ! \n");
   sleep(1);
} 

From this experiment, you can see that the CPU and the GPU run at the same time: the CPU continues executing main while the GPU executes the kernel !
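
A fixed sleep only works if the kernel happens to finish within that time. A minimal sketch of a more dependable variant is given below. It assumes the standard CUDA runtime calls cudaGetLastError( ) and cudaDeviceSynchronize( ); the latter blocks the CPU until all previously launched GPU work has finished. The helper name launch_hello_and_wait is made up for this illustration and is not part of the course demo files.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>  // explicit here; nvcc would also include it for a .cu file

__global__ void hello( )
{
   printf("Hello World !\n");
}

/* Hypothetical helper: launch the kernel, then wait for the GPU.
   Returns 0 on success and 1 if CUDA reports an error.          */
int launch_hello_and_wait( )
{
   hello<<< 1, 4 >>>( );                    // asynchronous launch

   cudaError_t err = cudaGetLastError( );   // was the launch itself rejected ?
   if ( err == cudaSuccess )
      err = cudaDeviceSynchronize( );       // block until the kernel has finished

   if ( err != cudaSuccess )
   {
      fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
      return 1;
   }
   return 0;
}

int main( )
{
   int status = launch_hello_and_wait( );

   printf("I am the CPU: Hello World ! \n");
   return status;
}
```

Compile it the same way as hello1.cu. With the synchronization in place, the GPU messages should appear without any sleep( ) call, because the CPU no longer exits while the kernel is still running.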

CUDA C programs
 

  • A CUDA C program contains:

      • "Ordinary" C functions        
      • Kernel functions

  • Kernel function:

      • Kernel function = a function that is (stored and) run on the Graphics Processing Unit (GPU)

  • A CUDA C program starts its execution in the (ordinary) C function main

Kernel functions

  • Syntax used to define kernel functions (return type must be void):

        __global__  void  C-function-definition-syntax   
      
       Example: 
      
        __global__ void hello( )
        {
           printf("Hello World !\n");
        } 

  • How to run a kernel function on the GPU (a.k.a. launching a kernel):

         /* ------------------------------------------------
            Call the kernel "hello( )" using a "1,4  grid"
            ------------------------------------------------ */ 
         hello<<< 1, 4 >>>( );             
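
The two kinds of functions can be seen side by side in the sketch below. It is only an illustration: the names announce, greet and launch_greet are hypothetical, and the kernel parameter is an addition to show that a kernel can take arguments like an ordinary C function. cudaDeviceSynchronize( ) is assumed here so that the CPU waits for the GPU output.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* "Ordinary" C function: runs on the CPU */
void announce(int nThreads)
{
   printf("CPU: launching %d GPU threads\n", nThreads);
}

/* Kernel function: runs on the GPU, return type must be void */
__global__ void greet(int course)
{
   printf("GPU: Hello from CS%d !\n", course);
}

/* Hypothetical helper: launch greet( ) with 1 block of nThreads threads.
   Returns 0 on success and 1 if CUDA reports an error.                  */
int launch_greet(int course, int nThreads)
{
   announce(nThreads);                 // ordinary function call
   greet<<< 1, nThreads >>>(course);   // kernel launch with an argument

   return (cudaDeviceSynchronize( ) == cudaSuccess) ? 0 : 1;
}

int main( )
{
   return launch_greet(355, 4);
}
```

Note the difference in call syntax: the ordinary function is called as announce(4), while the kernel needs the <<< ... >>> launch configuration between its name and its argument list.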
      

Behavior of a kernel launch operation
 

 

  CPU:                      GPU:
 ------------------        -------------------
 hello<<< 1, 4 >>>( ) ---> runs the kernel function "hello" 
      |                    using a grid consisting of:
      |                           1 thread block  and
      |                           4 threads in block
      V
 CPU continues
 execution with 
 next statement ---> exit....
  
 

That is why the first version of hello1.cu does not print the messages from the GPU threads: a kernel launch does not wait for the kernel to finish, so the main program terminates before the GPU's buffered print output is delivered to the CPU !!

Behavior of a kernel launch operation
 

QUIZ: Why did we see 4 lines of messages from the kernel function ???

  CPU:                      GPU:
 ------------------        -------------------
 hello<<< 1, 4 >>>( ) ---> runs the kernel function "hello" 
      |                    using a grid consisting of:
      |                           1 thread block  and
      |                           4 threads in block
      V
 sleep(1);

 We see 4 lines of messages printed by the kernel function
  
 

 

Because a total of 4 threads are executing the same kernel function !!!

 

Experiment: change the launch code to: <<< 2, 4 >>>
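
One way to try this experiment is sketched below. It assumes the built-in CUDA variables blockIdx.x and threadIdx.x (not introduced above), which give each thread its block number and its thread number within the block, and it uses cudaDeviceSynchronize( ) in place of sleep(1). The helper name launch_grid is hypothetical.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void hello( )
{
   // Every thread in the grid runs this same statement
   printf("Hello World ! I am block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

/* Hypothetical helper: launch hello( ) on a grid of numBlocks thread blocks
   with threadsPerBlock threads each. Returns 0 on success, 1 on error.     */
int launch_grid(int numBlocks, int threadsPerBlock)
{
   hello<<< numBlocks, threadsPerBlock >>>( );

   return (cudaDeviceSynchronize( ) == cudaSuccess) ? 0 : 1;
}

int main( )
{
   return launch_grid(2, 4);   // same as  hello<<< 2, 4 >>>( )
}
```

With <<< 2, 4 >>> the grid has 2 x 4 = 8 threads, so you should see 8 lines. The order in which the two blocks print is not guaranteed.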