Review: the general shape of an execution configuration

The dim3 data type

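CUDA provides the dim3 data type to allow the programmer to define the shape of the execution configuration.
Syntax:

dim3 varName = dim3( xSize, ySize, zSize );

A dimension that is not specified defaults to 1.
The general usage of dim3 covers up to 3 dimensions. However, to keep things simple, the examples below use 2-dimensional shapes.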
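As a small illustration of how dim3 treats dimensions that are left out, the sketch below builds three shapes on the CPU and prints their x, y and z members. The variable names a, b and c are only for illustration.

```cuda
#include <stdio.h>

int main()
{
    dim3 a = dim3( 3 );          // 1-dimensional: y and z default to 1
    dim3 b = dim3( 3, 2 );       // 2-dimensional: z defaults to 1
    dim3 c = dim3( 3, 2, 4 );    // 3-dimensional: all sizes given

    printf("a = (%u,%u,%u)\n", a.x, a.y, a.z);   // prints a = (3,1,1)
    printf("b = (%u,%u,%u)\n", b.x, b.y, b.z);   // prints b = (3,2,1)
    printf("c = (%u,%u,%u)\n", c.x, c.y, c.z);   // prints c = (3,2,4)
    return 0;
}
```

This is why a 2-dimensional shape such as dim3( 3, 2 ) can be passed wherever CUDA expects a full 3-dimensional shape.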

A 2-dimensional grid shape

An example of a 2-dimensional grid shape is a 3×2 grid: 3 thread blocks in the x-dimension and 2 thread blocks in the y-dimension.

How to define a 3×2 grid shape in CUDA:

dim3 gridShape = dim3( 3, 2 );

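The values of the identifying variables of each thread block in the 3×2 grid shape dim3 gridShape = dim3( 3, 2 ):

gridDim.x = 3, gridDim.y = 2 (the same for every thread block)
blockIdx.x = 0, 1 or 2
blockIdx.y = 0 or 1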

A 2-dimensional (thread) block shape

An example of a 2-dimensional (thread) block shape is a 2×3 thread block: 2 threads in the x-dimension and 3 threads in the y-dimension.

How to define a 2×3 thread block shape in CUDA:

dim3 blockShape = dim3( 2, 3 );

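The values of the identifying variables of each thread in the 2×3 thread block shape dim3 blockShape = dim3( 2, 3 ):

blockDim.x = 2, blockDim.y = 3 (the same for every thread)
threadIdx.x = 0 or 1
threadIdx.y = 0, 1 or 2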

A specific thread in a 3×2 grid and 2×3 thread block - Example 1

The values of the identifying variables of a (specific) thread in the 2×3 block inside a 3×2 grid:

dim3 blockShape = dim3( 2, 3 );

A specific thread in a 3×2 grid and 2×3 thread block - Example 2

The values of the identifying variables of a (specific) thread in the 2×3 block inside a 3×2 grid:

dim3 blockShape = dim3( 2, 3 );

Demo to show the identifying variables of a 2-dim grid + thread block

DEMO program that shows the identifying variables of threads in a 2-dim grid and thread block:

#include <stdio.h>
#include <unistd.h>

__global__ void hello( )
{
   // Each thread prints its own identifying variables
   printf("grid coord: (%d,%d), thread coord: (%d,%d), grid dim: (%d,%d), block dim: (%d,%d)\n",
          blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y,
          gridDim.x, gridDim.y, blockDim.x, blockDim.y);
}

int main()
{
   dim3 gridShape  = dim3( 3, 2 );   // 3×2 grid of thread blocks
   dim3 blockShape = dim3( 2, 3 );   // 2×3 threads in each thread block

   hello<<< gridShape, blockShape>>>( );
   printf("I am the CPU: Hello World ! \n");
   cudaDeviceSynchronize();          // wait until the GPU threads have finished
}

DEMO: /home/cs355001/demo/CUDA/1-intro/hello-2dim.cu

How to obtain a unique ID from the 2-dim identifying variables

Review: how to compute a unique ID for the x-dimension

We have computed a unique identifier for each thread in the x-dimension using:

uniqueID(x) = blockIdx.x × blockDim.x + threadIdx.x

gridDim.x = # blocks in the grid (in the first dimension)
blockIdx.x = block index of the (current) thread in the grid
blockDim.x = # threads in each (thread) block (in the first dimension)
threadIdx.x = thread index of the (current) thread in its block
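A typical use of uniqueID(x) is to let each thread work on its own array element. The sketch below is a minimal illustration, not part of the course demos: the kernel name fillIds, the sizes (3 blocks of 2 threads) and the use of unified memory through cudaMallocManaged are all assumptions made for the example.

```cuda
#include <stdio.h>

// Each thread stores its unique x-ID in the array slot with that same index.
__global__ void fillIds( int *out, int n )
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;   // uniqueID(x)

    if ( id < n )          // guard in case more threads than elements are launched
        out[id] = id;
}

int main()
{
    const int numBlocks = 3, threadsPerBlock = 2;     // illustrative sizes
    const int n = numBlocks * threadsPerBlock;        // 6 threads in total
    int *out;

    cudaMallocManaged( &out, n * sizeof(int) );       // memory usable by CPU and GPU
    fillIds<<< numBlocks, threadsPerBlock >>>( out, n );
    cudaDeviceSynchronize();                          // wait for the kernel to finish

    for ( int i = 0; i < n; i++ )
        printf("out[%d] = %d\n", i, out[i]);

    cudaFree( out );
    return 0;
}
```

Because no two threads compute the same id, every slot of the array is written by exactly one thread.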

The expression blockIdx.x × blockDim.x + threadIdx.x can be used to compute the uniqueID for the x-dimension in a 2-dimensional execution configuration (here blockDim.x = 2):

Row 1 (for all columns): 0 × 2 + 0 = 0
Row 2 (for all columns): 0 × 2 + 1 = 1
Row 3 (for all columns): 1 × 2 + 0 = 2 and so on

We can use a similar procedure to compute a unique identifier for each thread in the y-dimension:

uniqueID(y) = blockIdx.y × blockDim.y + threadIdx.y

gridDim.y = # blocks in the grid (in the 2nd dimension)
blockIdx.y = block index of the (current) thread in the grid
blockDim.y = # threads in each (thread) block (in the 2nd dimension)
threadIdx.y = thread index of the (current) thread in its block

The expression blockIdx.y × blockDim.y + threadIdx.y can be used to compute the uniqueID for the y-dimension in a 2-dimensional execution configuration (here blockDim.y = 3):

Column 1 (for all rows): 0 × 3 + 0 = 0
Column 2 (for all rows): 0 × 3 + 1 = 1
Column 3 (for all rows): 0 × 3 + 2 = 2
Column 4 (for all rows): 1 × 3 + 0 = 3 and so on

Demo to show the unique IDs of threads in a 2-dim grid + thread block

DEMO program that shows how to compute the unique ID of each thread in a 2-dim grid and thread block:

#include <stdio.h>
#include <unistd.h>

__global__ void hello( )
{
   printf("blockIdx:(%d,%d), threadIdx:(%d,%d) -> Row,Col=(%d,%d)\n",
          blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y,
          blockIdx.x * blockDim.x + threadIdx.x,    // rowID
          blockIdx.y * blockDim.y + threadIdx.y);   // columnID
}

int main()
{
   dim3 blockShape = dim3( 2, 3 );
   dim3 gridShape  = dim3( 3, 2 );

   hello<<< gridShape, blockShape>>>( );
   printf("I am the CPU: Hello World ! \n");
   cudaDeviceSynchronize();
}

DEMO: /home/cs355001/demo/CUDA/1-intro/hello-2dim-ID.cu
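One way to convince yourself that the (row, column) pairs are really unique is to let every thread increment a counter selected by its pair and then check that every counter ends up at 1. The sketch below does this for the 3×2 grid and 2×3 thread block. The kernel name markCell, the flattened counter array and the use of cudaMallocManaged are assumptions made for this illustration only.

```cuda
#include <stdio.h>

// Each thread adds 1 to the counter selected by its (row, col) unique ID pair.
__global__ void markCell( int *count )
{
    int row     = blockIdx.x * blockDim.x + threadIdx.x;   // uniqueID(x)
    int col     = blockIdx.y * blockDim.y + threadIdx.y;   // uniqueID(y)
    int numCols = gridDim.y * blockDim.y;                  // # threads in the y-dimension

    atomicAdd( &count[ row * numCols + col ], 1 );
}

int main()
{
    dim3 gridShape  = dim3( 3, 2 );
    dim3 blockShape = dim3( 2, 3 );
    const int numRows = 3 * 2, numCols = 2 * 3;            // 6 x 6 = 36 threads
    int *count;

    cudaMallocManaged( &count, numRows * numCols * sizeof(int) );
    for ( int i = 0; i < numRows * numCols; i++ )
        count[i] = 0;                                      // start all counters at 0

    markCell<<< gridShape, blockShape >>>( count );
    cudaDeviceSynchronize();

    int ok = 1;
    for ( int i = 0; i < numRows * numCols; i++ )
        if ( count[i] != 1 )                               // 0 = unused, >1 = duplicate
            ok = 0;

    printf("%s\n", ok ? "every (row,col) pair was used exactly once"
                      : "duplicate or missing IDs found");
    cudaFree( count );
    return 0;
}
```

atomicAdd is used so that the check would still count correctly if two threads ever did compute the same pair.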

3-dimensional grid and thread blocks

You can also use 3-dimensional (cube) shaped grids and thread blocks.
To save time, I will not discuss the details of a 3-dimensional (cube) shaped execution configuration.

It is similar to the 2-D example.

Example:

dim3 gridShape = dim3( 2, 3, 4 );
dim3 threadShape = dim3( 3, 2, 4 );

uniqueID(thread) = (uniqueID(x), uniqueID(y), uniqueID(z))

uniqueID(x) = blockIdx.x × blockDim.x + threadIdx.x
uniqueID(y) = blockIdx.y × blockDim.y + threadIdx.y
uniqueID(z) = blockIdx.z × blockDim.z + threadIdx.z
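To make the 3-dimensional case concrete, the sketch below launches a 2×3×4 grid of 3×2×4 thread blocks and lets only the thread with the largest index in every dimension report its unique ID. The kernel name lastThreadId and the output array are hypothetical names chosen for this illustration, and cudaMallocManaged is assumed to be available.

```cuda
#include <stdio.h>

// The thread with the largest unique ID in every dimension stores that ID.
__global__ void lastThreadId( unsigned int *out )
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;   // uniqueID(x)
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;   // uniqueID(y)
    unsigned int z = blockIdx.z * blockDim.z + threadIdx.z;   // uniqueID(z)

    if ( x == gridDim.x * blockDim.x - 1 &&
         y == gridDim.y * blockDim.y - 1 &&
         z == gridDim.z * blockDim.z - 1 )
    {
        out[0] = x;
        out[1] = y;
        out[2] = z;
    }
}

int main()
{
    dim3 gridShape   = dim3( 2, 3, 4 );
    dim3 threadShape = dim3( 3, 2, 4 );
    unsigned int *out;

    cudaMallocManaged( &out, 3 * sizeof(unsigned int) );
    lastThreadId<<< gridShape, threadShape >>>( out );
    cudaDeviceSynchronize();

    // 1*3+2 = 5, 2*2+1 = 5, 3*4+3 = 15
    printf("largest uniqueID = (%u,%u,%u)\n", out[0], out[1], out[2]);   // (5,5,15)

    cudaFree( out );
    return 0;
}
```

The unique IDs therefore range over 0..5 in x, 0..5 in y and 0..15 in z, one triple per thread.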