Brief history of GPU computing

  • Before 2006:

      • Graphics cards were designed to perform specialized graphics rendering functions

        • "Graphics Acceleration": the hardware performs the rendering algorithm

      • Not programmable (i.e.: can only perform image rendering)

  • 2006: introduction of the general-purpose Graphics Processing Unit (GPGPU) to save on hardware cost:

      • The GPU can now be programmed using a high-level programming language (C)

        • The rendering algorithm is executed using software

      • Library functions were provided to support GPU programming


The CUDA Architecture
 

  • The General Purpose GPU computing platform of NVidia is called:

    • CUDA

  • CUDA = acronym for Compute Unified Device Architecture

  • CUDA is:

    • a (vector) parallel computing platform (= architecture), and

    • an application programming interface (API)

    created by Nvidia (a graphics card company).

  • This architecture is still evolving....

    • 2020: Ampere
    • 2022: Hopper
    • 2024: Blackwell

The CUDA architecture

The CUDA architecture makes the GPU a subsystem of a CPU (host) computer.

The CPU system and the GPU (sub)system have their own (independent) memories.

Consequently: data must be transferred between RAM and device memory (using DMA), as sketched in the example below.
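
The following is a minimal sketch (not from the original notes) of how the CPU part of a program moves data to and from device memory with the CUDA runtime functions cudaMalloc( ), cudaMemcpy( ) and cudaFree( ); the array h_data and its size are illustrative only:

        #include <cuda_runtime.h>

        int main( )
        {
           int h_data[4] = {1, 2, 3, 4};   // h_data lives in RAM (host memory)
           int *d_data;                    // will point into device memory

           cudaMalloc((void **) &d_data, sizeof(h_data));  // allocate device memory
           cudaMemcpy(d_data, h_data, sizeof(h_data),
                      cudaMemcpyHostToDevice);             // RAM --> device memory
           /* ... a GPU kernel would process d_data here ... */
           cudaMemcpy(h_data, d_data, sizeof(h_data),
                      cudaMemcpyDeviceToHost);             // device memory --> RAM
           cudaFree(d_data);
           return 0;
        }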

CUDA programming

A CUDA (computer) program consists of 2 parts:

(1)   a CPU part of the CUDA program stored in RAM (and executed by the CPU)

(2)   a GPU part of the CUDA program stored in the device memory (and executed by the GPU)

How to write the CPU program part of a CUDA program
 

  • The CPU program part of a CUDA program is typically written in the C programming language and can use functions in the C library

    Example:

        #include <stdio.h>

        int main( )
        {
           printf("Hello World\n");
                // Standard C programming statements
                // Can use functions in the C library
           return 0;
        }
      

How to write the GPU program part of a CUDA program
 

  • The GPU program part of a CUDA program is written in the "CUDA C" programming language and can use functions in the "CUDA C" library

    Example:

        #include <stdio.h>    // needed for printf( ) in GPU code

        __global__ void hello( )
        {
           printf("Hello World\n");
                     // CUDA C code
                     // Uses printf( ) in the CUDA C library !
        }
      

Note: CUDA C is designed to be similar to the C language to make it easy to learn !!!
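
In practice, the two parts are placed in a single source file (conventionally with a .cu extension) and compiled with Nvidia's nvcc compiler, which separates the CPU code from the GPU code. A typical invocation (the file name hello.cu is illustrative):

        nvcc hello.cu -o hello
        ./hello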

How does the GPU execute a CUDA function?

  • A CUDA function is executed by multiple threads (of execution):

    • All threads will execute the same series of instructions ("SIMT": Single Instruction, Multiple Threads)

            

  • A thread is executed on a "core" (a.k.a. CUDA core) --- explained next

  • There can be more threads than cores.
    A core will then alternate between executing different threads (see the sketch below).
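
Below is a minimal sketch (not from the original notes) that combines the two program parts shown earlier: the CPU part launches the hello( ) kernel using the <<<...>>> execution configuration (here: 1 block of 4 threads, chosen only for illustration), so the same printf( ) statement is executed by 4 threads:

        #include <stdio.h>

        __global__ void hello( )
        {
           // Every thread executes this same code (SIMT);
           // threadIdx.x identifies the thread within its block
           printf("Hello World from thread %d\n", threadIdx.x);
        }

        int main( )
        {
           hello<<<1, 4>>>( );        // launch the kernel with 1 block of 4 threads
           cudaDeviceSynchronize( );  // wait for the GPU to finish printing
           return 0;
        }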

Threads

  • A thread:

    • is a unit of execution
    • is implemented using an execution context

  • Thread context:

    • contains all the information a thread needs to resume execution

    • consists of (information kept in registers):

    • The Program Counter
    • The values in all of the registers
    • The stack (or: the stack pointer)

  • Difference between thread context and process context:

    • Context switching between processes is done by the operating system and takes more time

    • Context switching between threads does not require an operating system call and takes less time