Synchronous vs. Asynchronous CUDA function calls
 

There are 2 kinds of CUDA function calls (a short code sketch contrasting the two follows this list):

  • Synchronous CUDA function call:

      • The CPU will launch (= start) an operation on the GPU and then

      • The CPU's program execution will wait (= pause) until that GPU operation completes before executing the next CPU program statement

  • Asynchronous CUDA function call:

      • The CPU will launch (= start the execution of) a kernel function to run on the GPU and then
      • The CPU will continue right away with the execution of the next CPU program statement
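
The difference is easiest to see side by side. The sketch below is my own illustration (not one of the demo programs): cudaMemcpy() in its default form is synchronous with respect to the CPU, while the kernel launch is asynchronous. The kernel name addOne() and the array size are made up for this example.

#include <stdio.h>

__global__ void addOne( int *a )        // hypothetical kernel: add 1 to each element
{
   a[ threadIdx.x ] = a[ threadIdx.x ] + 1;
}

int main()
{
   int  h[4] = { 0, 0, 0, 0 };
   int *d;

   cudaMalloc( (void **) &d, sizeof(h) );

   /* Synchronous CUDA call:
      the CPU waits here until the copy to the GPU has completed */
   cudaMemcpy( d, h, sizeof(h), cudaMemcpyHostToDevice );

   /* Asynchronous CUDA call (kernel launch):
      the CPU continues right away with the next statement */
   addOne<<< 1, 4 >>>( d );

   /* Synchronous CUDA call:
      this copy first waits for the kernel to finish (same stream),
      and the CPU waits here until the copy back has completed */
   cudaMemcpy( h, d, sizeof(h), cudaMemcpyDeviceToHost );

   printf("h[0] = %d\n", h[0]);   // prints: h[0] = 1

   cudaFree( d );
}
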

The launching of user-defined kernels is asynchronous

Important fact in CUDA programming:

  • The launching of a (user-defined) kernel function is always asynchronous !!!         

Example:

#include <stdio.h>

__global__ void hello( )     // kernel: runs on the GPU
{
   printf("Hello World !\n");
}

int main()
{
   /* ------------------------------------
      Call the hello( ) kernel function
      ------------------------------------ */
   hello<<< 1, 4 >>>( );  // Asynchronous !!!

   // Exec next statement without any waiting:
   printf("I am the CPU: Hello World ! \n");
}
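
Because the kernel launch is asynchronous, main() can finish (and the program can exit) before the hello() kernel has produced any output, so the GPU's "Hello World !" lines may never appear on the screen.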

Forcing the CPU execution to wait for work on the GPU
 

To force the CPU execution to wait for the termination of all activities on the GPU, use:

   cudaDeviceSynchronize( );

       When the CPU executes

          cudaDeviceSynchronize( );

       the CPU execution will wait until all kernels 
       running on the GPU have completed
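
       Note: cudaDeviceSynchronize( ) also returns an error code (a cudaError_t),
       so it is a convenient place to detect that an earlier, asynchronously
       launched kernel has failed. A minimal sketch (this error-handling style is
       just one possibility, not taken from the demo programs):

          cudaError_t err;

          hello<<< 1, 4 >>>( );            // asynchronous kernel launch

          err = cudaDeviceSynchronize();   // wait for the kernel to finish
          if ( err != cudaSuccess )
             printf("CUDA error: %s\n", cudaGetErrorString(err));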
  
 

 

I will re-write the hello2.cu program properly using cudaDeviceSynchronize() next

The correct way to write the Hello World CUDA program

The Hello World program in CUDA:

#include <stdio.h>
#include <unistd.h>

__global__ void hello( )
{
   printf("Hello World !\n");
}

int main()
{
   hello<<< 1, 4 >>>( );  // Launch 1 block of 4 threads (asynchronous)

   printf("I am the CPU: Hello World ! \n");
      // Try moving the "printf" statement        
      // after cudaDeviceSynchronize()

   cudaDeviceSynchronize();   // Wait until the hello() kernel has finished
}
  

DEMO: /home/cs355001/demo/CUDA/1-intro/hello-sync.cu
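
To compile and run the program (assuming the source file is named hello-sync.cu and nvcc is on your PATH):

   nvcc -o hello-sync hello-sync.cu
   ./hello-sync

Typical output: the CPU's line is printed first (its printf runs before cudaDeviceSynchronize()), and the 4 GPU lines appear when the synchronization completes:

   I am the CPU: Hello World !
   Hello World !
   Hello World !
   Hello World !
   Hello World !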