Compiling CUDA programs

We provide CUDA-capable machines for CS355 students to learn CUDA programming.
How to access the CUDA-capable machines:

Compiling CUDA programs

To compile a single file CUDA program:

nvcc -o prog prog.cu

To compile a multi-file CUDA program (compile each file into an object file, then link the object files):

nvcc -c prog1.cu
nvcc -c prog2.cu
...
nvcc -c progN.cu
nvcc -o prog prog1.o prog2.o ... progN.o

To run the compiled program:

./prog [optionally with arguments]

(CUDA program source files have the file extension .cu)

The CUDA header file

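CUDA programs need the declarations in the CUDA header files, for example the CUDA runtime header file:

#include <cuda_runtime.h>

However: the CUDA compiler (nvcc) automatically includes cuda_runtime.h when it compiles a .cu file

Therefore: you do not have to include it yourself (the driver API header cuda.h is only needed if you call CUDA driver API functions)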
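To check that nvcc supplies the CUDA runtime declarations on its own, the small program below calls the runtime function cudaGetDeviceCount( ) without any CUDA #include line. It is a sketch: the helper name count_devices( ) and a file name such as devcount.cu are hypothetical, and it assumes it is compiled with nvcc as a .cu file (nvcc -o devcount devcount.cu, then ./devcount).

```cuda
// devcount.cu (hypothetical file name): note there is NO CUDA #include line
#include <stdio.h>

// Hypothetical helper: returns the number of CUDA devices, or -1 on failure.
// cudaGetDeviceCount() is declared in cuda_runtime.h, which nvcc
// includes automatically when it compiles a .cu file.
int count_devices(void)
{
    int n = 0;
    cudaError_t err = cudaGetDeviceCount(&n);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    return n;
}

int main(void)
{
    printf("Number of CUDA devices: %d\n", count_devices());
    return 0;
}
```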

My "Hello World" programs in CUDA

Compile and run the following hello1.cu CUDA program:

#include <stdio.h>    // C programming header file
#include <unistd.h>   // C programming header file
// CUDA header files are automatically included by nvcc...

/* ------------------------------------
   Your first kernel (= GPU function)
   ------------------------------------ */
__global__ void hello( )
{
   printf("Hello World !\n");   // You don't see this msg...
}

int main()
{
   hello<<< 1, 4 >>>( );        // launch kernel

   printf("I am the CPU: Hello World ! \n");
}

DEMO: /home/cs355001/demo/CUDA/1-intro/hello1.cu (compile with: nvcc hello1.cu)

We can see the messages from the GPU if the C main program waits 1 second (sleep(1)) before exiting:

#include <stdio.h>    // C programming header file
#include <unistd.h>   // C programming header file (declares sleep)
// CUDA header files are automatically included by nvcc...

/* ------------------------------------
   Your first kernel (= GPU function)
   ------------------------------------ */
__global__ void hello( )
{
   printf("Hello World !\n");   // Now you will see this msg
}

int main()
{
   hello<<< 1, 4 >>>( );        // launch kernel

   printf("I am the CPU: Hello World ! \n");

   sleep(1);                    // wait 1 sec before exiting
}

From this experiment, you can see that multiple "things" are happening at the same time: the CPU keeps running main while the GPU runs the kernel !
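sleep(1) only works if the GPU happens to finish within one second. The CUDA runtime function cudaDeviceSynchronize( ) makes the CPU wait until all previously launched GPU work has finished, which is a more reliable way to see the kernel's output. A minimal sketch (the helper name hello_and_wait( ) is hypothetical):

```cuda
#include <stdio.h>

__global__ void hello( )
{
    printf("Hello World !\n");
}

// Hypothetical helper: launch the kernel, then block the CPU until the
// GPU has finished all work. Returns the status reported by the wait.
cudaError_t hello_and_wait(void)
{
    hello<<< 1, 4 >>>( );            // returns to the CPU immediately
    return cudaDeviceSynchronize();  // waits for the kernel to finish;
                                     // device printf output is flushed here
}

int main()
{
    cudaError_t err = hello_and_wait();
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    printf("I am the CPU: Hello World ! \n");
    return 0;
}
```

No sleep( ) call (and no unistd.h) is needed: the CPU waits exactly as long as the GPU needs.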

CUDA C programs

A CUDA C program contains:

Kernel functions: functions that run on the GPU (= GPU functions)
The main function: a CUDA C program starts its execution in the (ordinary) C function main, which runs on the CPU

Kernel functions

Syntax used to define kernel functions (return type must be void):

__global__ void C-function-definition-syntax

Example:

__global__ void hello( )
{
   printf("Hello World !\n");
}
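Because a kernel's return type must be void, a kernel hands results back by writing them into GPU memory through a pointer parameter. The sketch below illustrates this; the kernel square( ) and the helper square_on_gpu( ) are hypothetical names, not part of the course code, and error checking is omitted for brevity. It uses the CUDA runtime functions cudaMalloc( ), cudaMemcpy( ) and cudaFree( ).

```cuda
#include <stdio.h>

// Kernels must return void, so this kernel "returns" its answer by
// writing it into GPU (device) memory through the pointer 'result'.
__global__ void square(int x, int *result)
{
    *result = x * x;
}

// Hypothetical helper: runs square() with 1 thread and copies the
// answer back to the CPU. Error checking omitted for brevity.
int square_on_gpu(int x)
{
    int *d_result;          // pointer to GPU memory
    int h_result = 0;       // CPU copy of the result

    cudaMalloc(&d_result, sizeof(int));
    square<<< 1, 1 >>>(x, d_result);
    // This blocking cudaMemcpy waits for the kernel to finish first
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    return h_result;
}

int main(void)
{
    printf("7 squared on the GPU = %d\n", square_on_gpu(7));
    return 0;
}
```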

How to run a kernel function on the GPU (a.k.a. launching a kernel):

/* ------------------------------------------------
   Call the kernel "hello( )" using a "1,4 grid"
   ------------------------------------------------ */
hello<<< 1, 4 >>>( );
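A kernel launch does not return a status value, so a bad launch fails silently unless you ask for the error. A common pattern, sketched below, is to call cudaGetLastError( ) right after the launch (launch errors) and then check the result of cudaDeviceSynchronize( ) (errors while the kernel ran). The helper launch_hello( ) is a hypothetical name; the failing case relies on the limit of 1024 threads per block on current NVIDIA GPUs.

```cuda
#include <stdio.h>

__global__ void hello( )
{
    printf("Hello World !\n");
}

// Hypothetical helper: launch hello with the given grid and report
// whether the launch, and then the kernel's execution, succeeded.
cudaError_t launch_hello(int blocks, int threadsPerBlock)
{
    hello<<< blocks, threadsPerBlock >>>( );
    cudaError_t err = cudaGetLastError();   // error in the launch itself?
    if (err != cudaSuccess)
        return err;
    return cudaDeviceSynchronize();         // error while the kernel ran?
}

int main()
{
    cudaError_t err = launch_hello(1, 4);
    printf("<<< 1, 4 >>>    : %s\n", cudaGetErrorString(err));

    err = launch_hello(1, 2048);   // more threads per block than allowed
    printf("<<< 1, 2048 >>> : %s\n", cudaGetErrorString(err));
    return 0;
}
```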

Behavior of a kernel launch operation

CPU:                              GPU:
------------------                -------------------
hello<<< 1, 4 >>>( )      --->    runs the kernel function "hello"
     |                            using a grid consisting of:
     |                               1 thread block and
     |                               4 threads in block
     V
CPU continues execution with next statement ---> exit....

That is why the first version of hello1.cu does not print the messages from the GPU threads: the main program has terminated before the GPU threads' messages are displayed !!

QUIZ: Why did we see 4 lines of messages from the kernel function ???

CPU:                              GPU:
------------------                -------------------
hello<<< 1, 4 >>>( )      --->    runs the kernel function "hello"
     |                            using a grid consisting of:
     |                               1 thread block and
     |                               4 threads in block
     V
sleep(1);                         We see 4 lines of messages
                                  printed by the kernel function

Because the grid <<< 1, 4 >>> has 1 block of 4 threads: a total of 4 threads are executing the same kernel function !!!
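One way to confirm that 4 separate threads run the kernel is to let each thread print its own index threadIdx.x and bump a shared counter with atomicAdd( ). This is a sketch; the helper count_threads( ) is a hypothetical name, and the order of the GPU lines is not guaranteed.

```cuda
#include <stdio.h>

// Every launched thread runs this same function; threadIdx.x tells
// each thread its position inside its block (0, 1, 2, 3 for <<< 1, 4 >>>).
__global__ void hello(int *counter)
{
    printf("Hello World ! from thread %d\n", (int)threadIdx.x);
    atomicAdd(counter, 1);    // safely count how many threads ran
}

// Hypothetical helper: launch hello<<< blocks, threads >>> and return
// how many threads executed the kernel.
int count_threads(int blocks, int threads)
{
    int *d_counter;
    int h_counter = 0;

    cudaMalloc(&d_counter, sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));     // counter starts at 0
    hello<<< blocks, threads >>>(d_counter);
    // Blocking copy: waits for the kernel, then reads the counter
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    return h_counter;
}

int main()
{
    printf("%d threads ran the kernel\n", count_threads(1, 4));
    return 0;
}
```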

Experiment: change the launch code to: <<< 2, 4 >>>
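With <<< 2, 4 >>> the grid has 2 blocks of 4 threads each, so 8 threads run the kernel. Each thread can tell where it is from blockIdx.x (its block), threadIdx.x (its position in the block) and blockDim.x (threads per block); the common formula blockIdx.x * blockDim.x + threadIdx.x gives every thread a unique global id 0..7. A sketch, with the helper all_ids_present( ) as a hypothetical name:

```cuda
#include <stdio.h>

// blockIdx.x  : which block this thread belongs to (0 or 1 for <<< 2, 4 >>>)
// threadIdx.x : position of the thread inside its block (0..3)
// blockDim.x  : number of threads per block (4)
__global__ void hello(int *ids)
{
    int global = blockIdx.x * blockDim.x + threadIdx.x;   // 0..7
    printf("Hello World ! block %d, thread %d, global id %d\n",
           (int)blockIdx.x, (int)threadIdx.x, global);
    ids[global] = global;    // each thread records its own global id
}

// Hypothetical helper: returns 1 if every global id 0..n-1 was written
int all_ids_present(int blocks, int threads)
{
    int n = blocks * threads;
    int h_ids[64];                 // enough for the small grids used here
    int *d_ids;

    if (n > 64)
        return 0;
    cudaMalloc(&d_ids, n * sizeof(int));
    hello<<< blocks, threads >>>(d_ids);
    cudaMemcpy(h_ids, d_ids, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_ids);

    for (int i = 0; i < n; i++)
        if (h_ids[i] != i)
            return 0;
    return 1;
}

int main()
{
    int ok = all_ids_present(2, 4);
    printf("all 8 global ids present: %s\n", ok ? "yes" : "no");
    return 0;
}
```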