To compile a single-file CUDA program:

    nvcc -o prog prog.cu
To compile a multi-file CUDA program:

    nvcc -c prog1.cu
    nvcc -c prog2.cu
    ...
    nvcc -c progN.cu
    nvcc -o prog prog1.o prog2.o ... progN.o
To run the compiled program:
    prog [optionally with arguments]

(CUDA programs have the file extension .cu)
All CUDA programs must include the cuda.h header file:
    #include <cuda.h>

However, the CUDA compiler (nvcc) includes this header file automatically, so you do not have to include it yourself.
Compile and run the following hello1.cu CUDA program:
    #include <stdio.h>     // C programming header file
    #include <unistd.h>    // C programming header file
    // cuda.h is automatically included by nvcc...

    /* ------------------------------------
       Your first kernel (= GPU function)
       ------------------------------------ */
    __global__ void hello( )
    {
        printf("Hello World !\n");   // You don't see this msg...
    }

    int main()
    {
        hello<<< 1, 4 >>>( );        // launch kernel

        printf("I am the CPU: Hello World ! \n");
    }
DEMO: /home/cs355001/demo/CUDA/1-intro/hello1.cu -- nvcc hello1.cu
We can see the message when the C main program waits for 1 sec before exiting:
    #include <stdio.h>     // C programming header file
    #include <unistd.h>    // C programming header file (declares sleep)
    // cuda.h is automatically included by nvcc...

    /* ------------------------------------
       Your first kernel (= GPU function)
       ------------------------------------ */
    __global__ void hello( )
    {
        printf("Hello World !\n");   // Now you see this msg
    }

    int main()
    {
        hello<<< 1, 4 >>>( );        // launch kernel

        printf("I am the CPU: Hello World ! \n");

        sleep(1);                    // wait 1 sec before exiting
    }
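From this experiment, you can see that the CPU and the GPU run at the same time: the kernel launch returns immediately, and the CPU continues with its next statement while the GPU executes the kernel!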
    CPU:                              GPU:
    ------------------                -------------------
    hello<<< 1, 4 >>>( )    --->      runs the kernel function "hello"
           |                          using a grid consisting of:
           |                             1 thread block and
           |                             4 threads in block
           V
    CPU continues
    execution with
    next statement          --->      exit....
That is why the first version of hello1.cu does not print the messages from the GPU threads: the main program terminates before the messages printed by the kernel are received!
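Waiting a fixed 1 second works for this demo, but it is only a guess at how long the kernel needs. A minimal sketch of an alternative, assuming the standard CUDA runtime call cudaDeviceSynchronize() (not used in the original demo), makes the CPU wait until the GPU has actually finished:

```cuda
#include <stdio.h>          // C programming header file
#include <cuda_runtime.h>   // CUDA runtime API (nvcc includes it for .cu files anyway)

__global__ void hello( )
{
    printf("Hello World !\n");
}

int main()
{
    hello<<< 1, 4 >>>( );   // launch kernel; the CPU does not wait here

    printf("I am the CPU: Hello World ! \n");

    // Assumption: replaces sleep(1) from the demo.
    // Blocks the CPU until all previously launched GPU work is done.
    cudaError_t err = cudaDeviceSynchronize();
    if ( err != cudaSuccess )
    {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    return 0;
}
```

Compile it the same way as hello1.cu. With this version the main program cannot exit before the kernel's messages are delivered, no matter how long the kernel runs.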
QUIZ: Why did we see 4 lines of messages from the kernel function ???
    CPU:                              GPU:
    ------------------                -------------------
    hello<<< 1, 4 >>>( )    --->      runs the kernel function "hello"
           |                          using a grid consisting of:
           |                             1 thread block and
           |                             4 threads in block
           V
    sleep(1);

We see 4 lines of messages printed by the kernel function.
Because a total of 4 threads are executing the same kernel function !!!
Experiment: change the launch code to: <<< 2, 4 >>>
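A sketch of this experiment is shown below. To make the result easier to read, it prints the built-in CUDA variables blockIdx.x and threadIdx.x, and it waits with cudaDeviceSynchronize() instead of sleep(1); both are additions for illustration and are not part of the original hello1.cu.

```cuda
#include <stdio.h>          // C programming header file
#include <cuda_runtime.h>   // CUDA runtime API (nvcc includes it for .cu files anyway)

__global__ void hello( )
{
    // Every thread runs this same function;
    // blockIdx.x and threadIdx.x tell the threads apart.
    printf("Hello World ! (block %d, thread %d)\n", blockIdx.x, threadIdx.x);
}

int main()
{
    hello<<< 2, 4 >>>( );      // 2 thread blocks, 4 threads in each block

    printf("I am the CPU: Hello World ! \n");

    cudaDeviceSynchronize();   // wait until the GPU has finished
    return 0;
}
```

With 2 thread blocks of 4 threads each, 2 x 4 = 8 threads execute the kernel, so you should see 8 lines from the GPU. Do not rely on the order in which the two blocks print: that order is not guaranteed.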