Review: the basic pipelined CPU

Important difference between a pipelined CPU and a "normal" CPU:

The pipelined CPU executes multiple instructions simultaneously while the "normal" CPU executes one instruction at a time.

Problems with the basic pipelined CPU
 

  • Problems when executing multiple instructions simultaneously:

      • Some instructions depend on other instructions !!!      

        Example: the instruction add r4, r4, r1 uses the result computed by instruction add r1, r2, r3

             add r1, r2, r3
             add r4, r4, r1         
          

  • Dependencies between execution of different instructions will require additional hardware support (circuits)

  • We will first examine this data dependency problem with an example
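The dependency in the example above can be checked mechanically. The following is a minimal sketch, not part of the course's CPU; the names `Instr` and `depends_on` are hypothetical and chosen only for illustration. It treats every instruction as a three-register `add` and asks whether a later instruction reads the register that an earlier one writes.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    """A three-register instruction: dest = src1 op src2 (illustrative model)."""
    op: str
    dest: str
    src1: str
    src2: str

def depends_on(later, earlier):
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dest in (later.src1, later.src2)

first = Instr("add", "r1", "r2", "r3")   # add r1, r2, r3
second = Instr("add", "r4", "r4", "r1")  # add r4, r4, r1

print(depends_on(second, first))  # True: second reads r1, which first writes
print(depends_on(first, second))  # False: first reads only r2 and r3
```

The check only looks at register names, so it says nothing yet about timing. Whether the dependency actually causes an error depends on how far apart the two instructions are in the pipeline.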

The Read-after-write data hazard problem
 

Consider the following assembler program code fragment:

   add  r1, r2, r3  // Instruction writes register r1
   add  r4, r1, r4  // Instruction reads  register r1
   add  r5, r1, r5
   add  r6, r1, r6 

The first instruction writes (= updates) the register r1

The next 3 instructions read (= use) the register r1

We call this instruction sequence read-after-write (RAW), because a register is read shortly after it is written

The read-after-write construct will cause instruction execution errors in the basic pipelined CPU (that we must solve !!!)
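To see what "correct" means here, it helps to have the results of a one-instruction-at-a-time CPU as a reference. The sketch below assumes the first instruction is add r1,r2,r3 (the form used in the step-by-step slides) and borrows the register values from the walk-through and the demo program (R1=12, R2=192, R3=48, R4=1, R5=2, R6=3). The function name `run_sequentially` is hypothetical.

```python
def run_sequentially(program, regs):
    """Execute each add completely before starting the next one."""
    regs = dict(regs)  # work on a copy so the caller's values are untouched
    for dest, src1, src2 in program:
        regs[dest] = regs[src1] + regs[src2]
    return regs

# (dest, src1, src2) for each add in the fragment
program = [
    ("r1", "r2", "r3"),  # add r1, r2, r3  (writes r1)
    ("r4", "r1", "r4"),  # add r4, r1, r4  (reads r1)
    ("r5", "r1", "r5"),  # add r5, r1, r5  (reads r1)
    ("r6", "r1", "r6"),  # add r6, r1, r6  (reads r1)
]
initial = {"r1": 12, "r2": 192, "r3": 48, "r4": 1, "r5": 2, "r6": 3}

result = run_sequentially(program, initial)
print(result["r1"], result["r4"], result["r5"], result["r6"])  # 240 241 242 243
```

Any pipelined execution of the same fragment has to end with these values; a different outcome means an instruction used a stale R1.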

Important fact: the latest time to obtain operands for an instruction

The first moment that the pipelined CPU uses the operands of an instruction is in the EX stage.

Therefore: the correct operands must be available when an instruction is inside the EX stage !!!
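The timing can be made concrete with a small stage-occupancy table. This is a sketch under simplifying assumptions: the five stages IF, ID, EX, MEM, WB, one new instruction entering IF per cycle, and no stalls. The helper name `stage_of` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_of(instr_index, cycle):
    """Stage holding instruction `instr_index` (0-based) in `cycle` (1-based).

    Returns None if the instruction has not entered or has already left
    the pipeline.
    """
    offset = cycle - 1 - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None

program = ["add r1,r2,r3", "add r4,r1,r4", "add r5,r1,r5", "add r6,r1,r6"]

for cycle in range(1, 6):
    cells = []
    for stage in STAGES:
        for i, text in enumerate(program):
            if stage_of(i, cycle) == stage:
                cells.append(f"{stage}={text}")
    print(f"cycle {cycle}: " + "  ".join(cells))
```

In this model add r1,r2,r3 reaches WB only in cycle 5, while add r4,r1,r4 is already in EX in cycle 4 with operands it fetched in cycle 3.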

The Read-after-write data hazard problem - step-by-step

I will explain the read-after-write data hazard using a series of diagrams:

This is the initial state: the pipelined CPU has fetched the instruction add r1,r2,r3.

The Read-after-write data hazard problem - step-by-step

Start of cycle 2: ID fetches operands for add r1,r2,r3 and IF fetches add r4,r1,r4

 

The Read-after-write data hazard problem - step-by-step

End of cycle 2: R2, R3 for add r1,r2,r3 fetched and next instruction add r4,r1,r4 fetched

Note: the correct execution of instruction add r4,r1,r4 must use r1=192+48=240 !!!

The Read-after-write data hazard problem - step-by-step

Start of cycle 3: EX computes R2+R3=240, ID fetches R4=1,R1=12 and IF fetches add r5,r1,r5

Notice the result of add r1,r2,r3 (= 240) is not yet available in register R1 !!!

The Read-after-write data hazard problem - step-by-step

End of cycle 3: R2+R3 stored in ALUo, R4, (old)R1 fetched and add r5,r1,r5 fetched

Note: the EX stage will execute add r4,r1,r4 using a wrong value (12) for R1 !!!

The Read-after-write data hazard problem - step-by-step

Start of cycle 4: MEM does no op, EX computes R4+R1=13 !!, ID fetches R5=2,R1=12 and IF fetches the next instruction

Notice the result of add r1,r2,r3 (= 240) is still not available in register R1 !!!

The Read-after-write data hazard problem - step-by-step

End of cycle 4: R2+R3 → ALUo1, R4+(old)R1 → ALUo, R5, (old)R1 fetched and add r6,.. fetched

Note: the EX stage will also execute add r5,r1,r5 using a wrong value (12) for R1 !!!

Before we continue....

  • Recall that a register (made with D-flipflops) can be written (=updated):

      • When the clock rises from 0 → 1         or      

      • When the clock falls from 1 → 0


  • The pipelined CPU will:

      • Update the general purpose registers during the rising clock edge
      • Update the special purpose registers in the pipeline stages during the falling clock edge

    This shortens the read-after-write error condition by 1 instruction (from 3 following instructions to 2) !!!
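The effect of writing the general purpose registers half a cycle before the pipeline registers latch can be counted directly. The sketch below assumes the timing used in the walk-through: add r1,r2,r3 is in WB in cycle 5, and the k-th instruction after it is in ID in cycle k+2. The function `stale_readers` and its parameter are hypothetical names for illustration.

```python
WB_CYCLE = 5  # cycle in which add r1,r2,r3 (the writing instruction) is in WB

def stale_readers(write_visible_same_cycle, n_following=4):
    """Return which following instructions (1 = the next one) fetch the old R1."""
    stale = []
    for k in range(1, n_following + 1):
        id_cycle = k + 2  # the k-th following instruction is in ID in cycle k+2
        if id_cycle > WB_CYCLE:
            sees_new = True                       # write finished in an earlier cycle
        elif id_cycle == WB_CYCLE:
            sees_new = write_visible_same_cycle   # depends on the clock-edge scheme
        else:
            sees_new = False                      # write has not happened yet
        if not sees_new:
            stale.append(k)
    return stale

# Register file and pipeline registers updated on the same clock edge:
print(stale_readers(write_visible_same_cycle=False))  # [1, 2, 3]
# Register file updated half a cycle before the pipeline registers latch:
print(stale_readers(write_visible_same_cycle=True))   # [1, 2]
```

With the split clock edges, only the instruction that is in ID during the write cycle changes category, which is the one instruction of improvement stated above.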

The Read-after-write data hazard problem - step-by-step

Start of cycle 5: WB writes R1, MEM does no op, EX computes R5+R1=14 !!, ID fetches R6,R1 and IF fetches the next instruction

Notice the result of add r1,r2,r3 (= 240) is not yet available in register R1... but this is about to change !

The Read-after-write data hazard problem - step-by-step

Mid cycle 5: WB completes the writing of R1:

Notice the result of add r1,r2,r3 (= 240) is now available in register R1... and is being fetched !

The Read-after-write data hazard problem - step-by-step

Later in cycle 5: the new value in register R1 arrives at the input of the special purpose register

The ID stage can fetch the correct value for source register R1 !!!

The Read-after-write data hazard problem - step-by-step

End of cycle 5: R2+R3 → R1, R4+(old)R1 → ALUo1, R5+(old)R1 → ALUo, R6,(new)R1 fetched

Conclusion: the 2 instructions following add r1,r2,r3 are unable to fetch the updated R1 value
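The whole walk-through can be replayed with a small cycle-by-cycle model. This is a simplified sketch, not Aaron's pipelined CPU: it assumes add-only instructions, no stalls, operands captured in ID, and a WB write that becomes visible before the ID stage of the same cycle latches its operands. The name `run_pipelined` is hypothetical.

```python
def run_pipelined(program, regs):
    """Simulate a 5-stage pipeline with no hazard handling (add-only model)."""
    regs = dict(regs)
    latched = {}  # instruction index -> operand values captured in ID
    last_cycle = len(program) + 4  # cycle in which the last instruction is in WB
    for cycle in range(1, last_cycle + 1):
        # WB: instruction `cycle - 5` writes its result (first half of the cycle)
        wb = cycle - 5
        if 0 <= wb < len(program):
            a, b = latched[wb]
            regs[program[wb][0]] = a + b
        # ID: instruction `cycle - 2` captures its operands (end of the cycle),
        # so it already sees a value written by WB in this same cycle
        idx = cycle - 2
        if 0 <= idx < len(program):
            _, src1, src2 = program[idx]
            latched[idx] = (regs[src1], regs[src2])
    return regs

program = [
    ("r1", "r2", "r3"),  # add r1, r2, r3
    ("r4", "r1", "r4"),  # add r4, r1, r4  (fetches old r1 = 12)
    ("r5", "r1", "r5"),  # add r5, r1, r5  (fetches old r1 = 12)
    ("r6", "r1", "r6"),  # add r6, r1, r6  (fetches new r1 = 240)
]
initial = {"r1": 12, "r2": 192, "r3": 48, "r4": 1, "r5": 2, "r6": 3}

final = run_pipelined(program, initial)
print(final["r1"], final["r4"], final["r5"], final["r6"])  # 240 13 14 243
```

R4 and R5 end up as 13 and 14, the wrong values computed in the EX stage in cycles 4 and 5, while R6 is correct because its operands were fetched in cycle 5 after the write to R1.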

DEMO (using Aaron's pipelined CPU)

  • Execute this command on a lab machine:

        /home/cs355001/demo/pipeline/4a-ALU-hazard    
      

    Program being executed:

              10  12    // mov r1,#12
              18  192   // mov r2,#192
              26  48    // mov r3,#48
              34  1     // mov r4,#1
              42  2     // mov r5,#2
              50  3     // mov r6,#3
              58  4     // mov r7,#4
              0   0     // nop
              0   0     // nop
              0   0     // nop
              0   0     // nop
              8   19    // add r1,r2,r3  (R1=R2+R3)
              32  33    // add r4,r1,r4  (R4 = R1 + R4)   ** These instructions
              40  41    // add r5,r1,r5  (R5 = R1 + R5)   ** will use an old value
              48  49    // add r6,r1,r6  (R6 = R1 + R6)
              56  57    // add r7,r1,r7  (R7 = R1 + R7)