Review: the general shape of an execution configuration
 

The dim3 data type

  • CUDA provides the dim3 data type to allow the programmer to define the shape of the execution configuration

    Syntax:

         dim3 myShape = dim3( xDim, yDim, zDim );   
      

  • I will now discuss the general usage of dim3...

    However, to keep things simple:

      • I will use a 2 dimensional shape (the most general shape is 3 dimensional...)
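
A minimal sketch of how dim3 values behave on the host side. Components that are not given default to 1, which is why a 2-dimensional shape can be written with only two arguments. The variable names (oneDim, twoDim, threeDim) are made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Components that are not specified default to 1
    dim3 oneDim(4);            // same as dim3(4, 1, 1)
    dim3 twoDim(3, 2);         // same as dim3(3, 2, 1)
    dim3 threeDim(2, 3, 4);    // all three components given

    // The members x, y and z are unsigned integers
    printf("oneDim   = (%u,%u,%u)\n", oneDim.x,   oneDim.y,   oneDim.z);
    printf("twoDim   = (%u,%u,%u)\n", twoDim.x,   twoDim.y,   twoDim.z);
    printf("threeDim = (%u,%u,%u)\n", threeDim.x, threeDim.y, threeDim.z);

    return 0;
}
```

Compiled with nvcc, this should print (4,1,1), (3,2,1) and (2,3,4). No kernel is launched, so the program only inspects the shapes.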

A 2-dimensional grid shape
 

An example of a 2-dimensional grid shape:

  • A 3×2 grid:

           

 

A 2-dimensional grid shape
 

How to define a 3×2 grid shape in CUDA:

  • dim3 gridShape = dim3( 3, 2 );

           

 

A 2-dimensional grid shape
 

The values of the identifying variables of each thread block in the 3×2 grid shape:

  • dim3 gridShape = dim3( 3, 2 );

           

 

A 2-dimensional (thread) block shape
 

An example of a 2-dimensional (thread) block shape:

  • A 2×3 thread block:

           

 

A 2-dimensional (thread) block shape
 

How to define a 2×3 thread block shape in CUDA:

  • dim3 blockShape = dim3( 2, 3 );

           

 

A 2-dimensional (thread) block shape
 

The values of the identifying variables of each thread in the 2×3 thread block shape:

  • dim3 blockShape = dim3( 2, 3 );

           

 

A specific thread in a 3×2 grid and 2×3 thread block - Example 1
 

The values of the identifying variables of a (specific) thread in the 2×3 block inside a 3×2 grid:

  • dim3 blockShape = dim3( 2, 3 );

           

 

A specific thread in a 3×2 grid and 2×3 thread block - Example 2
 

The values of the identifying variables of a (specific) thread in the 2×3 block inside a 3×2 grid:

  • dim3 blockShape = dim3( 2, 3 );

           

 

Demo to show the identifying variables of a 2-dim grid + thread block

DEMO program that shows the identifying variables of threads in a 2-dim grid and thread block:

#include <stdio.h>
#include <unistd.h>

__global__ void hello( )
{
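   // A string literal cannot span source lines, so it is split in two
   // adjacent literals that the compiler joins into one format string
   printf("grid coord: (%d,%d), thread coord: (%d,%d), "
          "grid dim:   (%d,%d), block dim:    (%d,%d)\n",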
           blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y,
           gridDim.x, gridDim.y, blockDim.x, blockDim.y);
}

int main()
{
   dim3 gridShape  = dim3( 3, 2 );
   dim3 blockShape = dim3( 2, 3 );

   hello<<< gridShape, blockShape>>>( );

   printf("I am the CPU: Hello World ! \n");
   cudaDeviceSynchronize();
} 

DEMO: /home/cs355001/demo/CUDA/1-intro/hello-2dim.cu

How to obtain a unique ID from the 2-dim identifying variables

Review: how to compute a unique ID for the x-dimension

  • We have computed a unique identifier for each thread in the x-dimension using:

       uniqueID(x) = blockIdx.x × blockDim.x + threadIdx.x
      
                 
      
       gridDim.x   = # blocks in the grid (in the first dimension)
       blockIdx.x  = block index of the (current) thread in grid
      
       blockDim.x  = # threads in (each of) the (thread) block
                     (in the first dimension)
       threadIdx.x = thread index of the (current) thread in block
      

How to obtain a unique ID from the 2-dim identifying variables

The expression blockIdx.x × blockDim.x + threadIdx.x computes the uniqueID for the x-dimension in a 2-dimensional execution configuration (here blockDim.x = 2):

Row 1 (for all columns): 0 × 2 + 0 = 0
Row 2 (for all columns): 0 × 2 + 1 = 1
Row 3 (for all columns): 1 × 2 + 0 = 2       and so on

How to obtain a unique ID from the 2-dim identifying variables

We can use a similar procedure to compute a unique ID for the y-dimension:

  • We can compute a unique identifier for each thread in the y-dimension as follows:

       uniqueID(y) = blockIdx.y × blockDim.y + threadIdx.y
      
                 
      
       gridDim.y   = # blocks in the grid (in the 2nd dimension)
       blockIdx.y  = block index of the (current) thread in grid
      
       blockDim.y  = # threads in (each of) the (thread) block
                     (in the 2nd dimension)
       threadIdx.y = thread index of the (current) thread in block
      

How to obtain a unique ID from the 2-dim identifying variables

The expression blockIdx.y × blockDim.y + threadIdx.y computes the uniqueID for the y-dimension in a 2-dimensional execution configuration (here blockDim.y = 3):

Column 1 (for all rows): 0 × 3 + 0 = 0
Column 2 (for all rows): 0 × 3 + 1 = 1
Column 3 (for all rows): 0 × 3 + 2 = 2      
Column 4 (for all rows): 1 × 3 + 0 = 3       and so on

Demo to show the unique IDs of threads in a 2-dim grid + thread block

DEMO program that shows how to compute a unique ID for each thread in a 2-dim grid and thread block:

#include <stdio.h>
#include <unistd.h>

__global__ void hello( )
{
   printf("blockIdx:(%d,%d), threadIdx:(%d,%d) -> Row,Col=(%d,%d)\n",
           blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y,
           blockIdx.x * blockDim.x + threadIdx.x,   // rowID    
           blockIdx.y * blockDim.y + threadIdx.y);  // columnID 
}

int main()
{
   dim3 blockShape = dim3( 2, 3 );
   dim3 gridShape  = dim3( 3, 2 );

   hello<<< gridShape, blockShape>>>( );

   printf("I am the CPU: Hello World ! \n");
   cudaDeviceSynchronize();
} 

DEMO: /home/cs355001/demo/CUDA/1-intro/hello-2dim-ID.cu
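
One way to check that the (row, col) pairs really are unique is to let every thread write into "its own" cell of a small matrix. The sketch below assumes the same 3×2 grid and 2×3 block as the demo, which gives 3×2 = 6 rows and 2×3 = 6 columns. The kernel name fill, the constants NROWS and NCOLS and the value 10*row + col are made up for illustration, and error checking of the CUDA calls is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define NROWS 6   // gridDim.x * blockDim.x = 3 * 2
#define NCOLS 6   // gridDim.y * blockDim.y = 2 * 3

// Each thread stores a value derived from its own (row, col) pair
__global__ void fill( int *m )
{
   int row = blockIdx.x * blockDim.x + threadIdx.x;   // rowID
   int col = blockIdx.y * blockDim.y + threadIdx.y;   // columnID

   if ( row < NROWS && col < NCOLS )       // guard against extra threads
      m[row * NCOLS + col] = 10 * row + col;
}

int main()
{
   int  h[NROWS * NCOLS];
   int *d = NULL;

   cudaMalloc( (void **) &d, sizeof(h) );
   cudaMemset( d, 0xFF, sizeof(h) );       // every cell starts out as -1

   dim3 gridShape  = dim3( 3, 2 );
   dim3 blockShape = dim3( 2, 3 );
   fill<<< gridShape, blockShape >>>( d );

   // cudaMemcpy waits for the kernel to finish before copying
   cudaMemcpy( h, d, sizeof(h), cudaMemcpyDeviceToHost );
   cudaFree( d );

   int wrong = 0;
   for ( int r = 0; r < NROWS; r++ )
   {
      for ( int c = 0; c < NCOLS; c++ )
      {
         printf( "%3d", h[r * NCOLS + c] );
         if ( h[r * NCOLS + c] != 10 * r + c )
            wrong++;                       // cell missed or written by the wrong thread
      }
      printf( "\n" );
   }
   printf( "cells with an unexpected value: %d\n", wrong );
   return 0;
}
```

If every one of the 36 threads has a distinct (row, col) pair, each cell is written exactly once and no cell keeps its initial value of -1.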

3-dimensional grid and thread blocks

  • You can use 3 dimensional (cube) shaped grids and thread blocks

  • I will not discuss the details of a 3 dimensional (cube) shaped execution configuration to save time

  • It is similar to the 2-D example.

    Example:

        dim3  gridShape   = dim3(2,3,4);
        dim3  threadShape = dim3(3,2,4);
      
        uniqueID(thread) = (uniqueID(x),uniqueID(y),uniqueID(z))
       
        uniqueID(x) = blockIdx.x × blockDim.x + threadIdx.x
        uniqueID(y) = blockIdx.y × blockDim.y + threadIdx.y
        uniqueID(z) = blockIdx.z × blockDim.z + threadIdx.z
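
The 3-dimensional formulas above can be tried out with a short sketch like the following. It uses the shapes from the example, which launch 24 blocks of 24 threads each, so only the last thread of the last block prints its ID to keep the output short. The kernel name show3d is made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void show3d( )
{
   unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
   unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
   unsigned int z = blockIdx.z * blockDim.z + threadIdx.z;

   // Largest possible ID in each dimension = (# blocks * # threads) - 1
   if ( x == gridDim.x * blockDim.x - 1 &&
        y == gridDim.y * blockDim.y - 1 &&
        z == gridDim.z * blockDim.z - 1 )
   {
      printf( "last thread: uniqueID = (%u,%u,%u)\n", x, y, z );
   }
}

int main()
{
   dim3 gridShape   = dim3( 2, 3, 4 );
   dim3 threadShape = dim3( 3, 2, 4 );

   show3d<<< gridShape, threadShape >>>( );
   cudaDeviceSynchronize();     // wait for the kernel so its output appears
   return 0;
}
```

With these shapes the largest IDs are 2×3 − 1 = 5, 3×2 − 1 = 5 and 4×4 − 1 = 15, so the single line printed should read (5,5,15).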