There are 2 kinds of kernel calls (launches):
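Important fact in CUDA programming: a kernel launch is asynchronous. The CPU does not wait for the kernel to finish; it continues immediately with the next statement.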
Example:
int main()
{
   /* ------------------------------------
      Call the hello( ) kernel function
      ------------------------------------ */
   hello<<< 1, 4 >>>( );   // Asynchronous !!!

   // Exec next statement without any waiting:
   printf("I am the CPU: Hello World ! \n");
}
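The asynchrony can also be observed directly instead of through the order of printed lines. The sketch below is a minimal illustration, assuming a CUDA-capable GPU and nvcc; the names spin and query_around_sync are hypothetical and not part of the course demo. It launches a kernel that busy-waits on the GPU clock, then asks the runtime (with cudaStreamQuery on the default stream) whether work is still pending, once right after the launch and once after cudaDeviceSynchronize().

```cuda
#include <cuda_runtime.h>

// Hypothetical busy-wait kernel: one thread spins for roughly `cycles`
// GPU clock ticks so the kernel is still running when the CPU checks on it.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { /* keep the GPU busy */ }
}

// Hypothetical helper: records the state of the default stream right after
// the launch (*before) and after cudaDeviceSynchronize() (*after).
void query_around_sync(long long cycles, cudaError_t *before, cudaError_t *after)
{
    spin<<< 1, 1 >>>(cycles);      // returns to the CPU immediately
    *before = cudaStreamQuery(0);  // expected: cudaErrorNotReady (kernel still spinning)
    cudaDeviceSynchronize();       // the CPU blocks here until the kernel ends
    *after  = cudaStreamQuery(0);  // expected: cudaSuccess (no pending GPU work)
}
```

With a cycle count in the hundreds of millions the kernel runs for a noticeable fraction of a second, far longer than the launch itself takes, so the first query should report cudaErrorNotReady while the second reports cudaSuccess.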
To force the CPU to wait until all activity on the GPU has finished, use:
cudaDeviceSynchronize( );
When the CPU executes cudaDeviceSynchronize( ), it blocks until all kernels previously launched on the GPU have completed.
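cudaDeviceSynchronize() also returns a cudaError_t, which is where errors from an earlier asynchronous launch usually surface. The sketch below, assuming nvcc and a CUDA-capable GPU, shows one common checking pattern; the helper name launch_hello_and_wait is hypothetical. cudaGetLastError() catches problems with the launch itself (for example, an invalid block size), and the return value of cudaDeviceSynchronize() reports errors that occur while the kernel runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello()
{
    printf("Hello World !\n");
}

// Hypothetical helper: launch hello<<<blocks, threads>>>, wait for it,
// and return the first error encountered (cudaSuccess if none).
cudaError_t launch_hello_and_wait(int blocks, int threads)
{
    hello<<< blocks, threads >>>( );

    cudaError_t err = cudaGetLastError();   // errors in the launch configuration
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    err = cudaDeviceSynchronize();          // wait; errors raised during execution
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

For instance, launch_hello_and_wait(1, 4) should return cudaSuccess, while asking for 2048 threads in one block exceeds the 1024-thread-per-block limit of current GPUs and is reported by cudaGetLastError() before anything runs.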
Next, I will rewrite the hello2.cu program properly using cudaDeviceSynchronize().
The Hello World program in CUDA:
#include <stdio.h>
#include <unistd.h>
__global__ void hello( )
{
   printf("Hello World !\n");
}

int main()
{
   hello<<< 1, 4 >>>( );   // Launch

   printf("I am the CPU: Hello World ! \n");

   // Try moving the "printf" statement
   // after cudaDeviceSynchronize()
   cudaDeviceSynchronize();
}
DEMO: /home/cs355001/demo/CUDA/1-intro/hello-sync.cu
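Following the comment in the program, here is a sketch of the variant with the CPU's printf moved after cudaDeviceSynchronize(). It assumes nvcc and a CUDA-capable GPU; the function name hello_cpu_prints_last is hypothetical and not from the demo file. Because the CPU now prints only after the kernel has finished (and the runtime flushes device-side printf output during the synchronization), the four GPU lines typically appear before the CPU line.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void hello()
{
    printf("Hello World !\n");
}

// Hypothetical variant of main(): the CPU prints only after the kernel is done.
cudaError_t hello_cpu_prints_last()
{
    hello<<< 1, 4 >>>( );                       // asynchronous launch of 4 threads
    cudaError_t err = cudaDeviceSynchronize();  // wait for all 4 GPU threads
    printf("I am the CPU: Hello World ! \n");   // runs after the kernel has completed
    return err;
}
```

To run it as a complete program, call it from main, e.g. `int main() { return hello_cpu_prints_last() == cudaSuccess ? 0 : 1; }`.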