There are 2 kinds of kernel calls (launches):
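Important fact in CUDA programming:
A kernel launch is asynchronous: the CPU does not wait for the kernel to finish, but continues with its next statement right away.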
Example:
int main()
{
    /* ------------------------------------
       Call the hello( ) kernel function
       ------------------------------------ */
    hello<<< 1, 4 >>>( );   // Asynchronous !!!

    // The CPU executes the next statement without waiting for the kernel:
    printf("I am the CPU: Hello World ! \n");

    // main( ) returns here without waiting for the GPU,
    // so the kernel's output may never be printed
}
To force the CPU to wait until all activity on the GPU has finished, use:
cudaDeviceSynchronize( );
When the CPU executes
cudaDeviceSynchronize( );
the CPU blocks until all previously launched kernels
on the GPU have completed
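Because the launch itself returns immediately, errors can only be observed afterwards. The following is a minimal sketch (not part of the original demo files) of how the return values could be checked; it assumes the same hello( ) kernel and uses the standard runtime calls cudaGetLastError( ), cudaDeviceSynchronize( ), and cudaGetErrorString( ). The messages printed are illustrative choices.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void hello( )
{
    printf("Hello World !\n");
}

int main()
{
    hello<<< 1, 4 >>>( );   // Asynchronous launch: returns immediately

    // Errors in the launch itself (e.g., an invalid configuration)
    // can be picked up right after the launch
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        fprintf(stderr, "Launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Block the CPU until the kernel has finished; errors that occur
    // while the kernel is running are reported by this call
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
    {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("I am the CPU: the kernel has completed\n");
    return 0;
}
```

Checking both calls separates launch problems from problems that occur while the kernel runs.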
Next, I will rewrite the hello2.cu program properly using cudaDeviceSynchronize():
The Hello World program in CUDA:
#include <stdio.h>
#include <unistd.h>

__global__ void hello( )
{
    printf("Hello World !\n");
}

int main()
{
    hello<<< 1, 4 >>>( );   // Launch 1 block of 4 threads (asynchronous)

    printf("I am the CPU: Hello World ! \n");
    // Try moving the "printf" statement
    // after cudaDeviceSynchronize()

    cudaDeviceSynchronize();   // Wait until the kernel has finished
}
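As a sketch of the experiment suggested in the comment, here is the same program with the CPU's printf moved after cudaDeviceSynchronize( ). This variant is an illustration and is not claimed to be one of the demo files. Since the CPU now waits for the kernel before printing, the four "Hello World !" lines from the GPU threads should appear before the CPU's line.

```cuda
#include <stdio.h>

__global__ void hello( )
{
    printf("Hello World !\n");
}

int main()
{
    hello<<< 1, 4 >>>( );      // Launch 1 block of 4 threads (asynchronous)

    cudaDeviceSynchronize();   // CPU waits here until the kernel has finished

    // Executed only after all 4 GPU threads are done
    printf("I am the CPU: Hello World ! \n");
}
```

In the original ordering, the CPU's printf is executed while the kernel may still be starting up, so the CPU's line typically appears first.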
DEMO: /home/cs355001/demo/CUDA/1-intro/hello-sync.cu