Notice that:
Conclusion:
We need to solve this problem first !!!
We have just discovered a stall condition (when we studied the LDR instruction):
   if ( MEM stage contains a LDR instruction
        &&
        EX stage contains an ALU/LDR/STR instruction
        &&
        (    Destination register of MEM stage == src1 reg of EX stage
          || Destination register of MEM stage == src2 reg of EX stage ) )
   {
      STALL: EX stage, ID stage and IF stage
   }
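The stall condition above can be sketched as a small predicate. This is only an illustration, not the actual detection circuit: the `Instr` record, its field names and the kind labels "alu", "ldr" and "str" are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction (field names are assumptions)."""
    kind: str                 # "alu", "ldr" or "str"
    dest: Optional[str]       # destination register, None if there is none
    src1: Optional[str]       # first source register
    src2: Optional[str]       # second source register


def must_stall(mem: Optional[Instr], ex: Optional[Instr]) -> bool:
    """Return True when the EX, ID and IF stages must be stalled."""
    if mem is None or ex is None:
        return False                      # an empty stage cannot cause a stall
    if mem.kind != "ldr":
        return False                      # only a LDR in the MEM stage matters
    if ex.kind not in ("alu", "ldr", "str"):
        return False                      # EX must hold an ALU/LDR/STR instruction
    # The LDR destination must match one of the EX stage's source registers
    return mem.dest is not None and mem.dest in (ex.src1, ex.src2)


ldr = Instr("ldr", "r1", "r2", "r3")      # ldr r1, [r2+r3]
add = Instr("alu", "r4", "r1", "r4")      # add r4, r1, r4
other = Instr("alu", "r4", "r5", "r6")    # does not read r1

print(must_stall(ldr, add))     # True:  add reads r1, which the LDR will write
print(must_stall(ldr, other))   # False: no source register matches r1
```

The predicate looks only at the MEM and EX stages, exactly as the condition does; the ID and IF stages are stalled as a consequence, not inspected.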
The circuitry used to detect the Read-after-Write stall condition is:
Note:
Now let us see how the improved pipelined CPU correctly handles the read-after-write data hazard caused by the LOAD instruction.
Slideshow:
Program:

   ldr r1, [r2+r3]
   add r4, r1, r4
   add r5, r1, r5
   add r6, r1, r6
   add r7, r1, r7
   ...

Initial state:

   R1=240, R2=4, R3=24, R4=1, R5=2, R6=3, R7=4
   Memory[28] = 11111111 00000000
   // Originally: R1 = 00001111 00000000

   ldr r1, [r2+r3]    // Instr will update R1 := memory[R2+R3]
                      // or R1 := 11110000 00000000

   Then subsequent instructions will use R1 = 11110000 00000000:

   add r4, r1, r4     // R4 = 11110000 00000001
   add r5, r1, r5     // R5 = 11110000 00000010
   add r6, r1, r6     // R6 = 11110000 00000011
   add r7, r1, r7     // R7 = 11110000 00000100
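As a quick check of the expected results, the short sketch below executes the program sequentially, with no pipeline. It takes the old and the loaded bit patterns of R1 from the comments above and R4..R7 from the initial state; the `regs` dictionary and the variable names are assumptions made for this example. It also shows the wrong result that the first ADD would produce if it used the old value of R1.

```python
# Bit patterns taken from the comments in the notes
OLD_R1 = 0b00001111_00000000      # R1 before the ldr
LOADED = 0b11110000_00000000      # value that the ldr writes into R1

regs = {"r1": OLD_R1, "r4": 1, "r5": 2, "r6": 3, "r7": 4}

# What "add r4, r1, r4" computes if it reads the OLD value of R1 (the hazard)
stale_r4 = regs["r1"] + regs["r4"]

# Correct sequential execution: the ldr completes before any add reads R1
regs["r1"] = LOADED
for r in ("r4", "r5", "r6", "r7"):
    regs[r] = regs["r1"] + regs[r]        # add rX, r1, rX
    print(r, format(regs[r], "016b"))

print("stale r4", format(stale_r4, "016b"))
```

The values printed for r4 to r7 are the ones that the pipelined CPU must also produce; the stale value is what the stall is meant to prevent.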
State at the end of CPU cycle 1:
State at the end of CPU cycle 2:
Also, at the start of the CPU cycle, the ID stage selects R4 (= 1) and R1 (= 240, an old value !!) to be fetched into the "A" and "B" registers.
Also, at the end of the CPU cycle, the instruction (ldr r1, [r2+r3]) is moved into IR(MEM), "add r4, r1, r4" is moved into IR(EX) and instruction "add r5, r1, r5" is fetched into IR(ID)
So far, the execution is the same as we have seen (when we execute a LDR instruction).
The change will happen right now, because the LOAD instruction is in the MEM stage and it is being executed !!!
The STALL detection hardware will issue a STALL signal to the EX and ID stages !!!
The ID stage and the EX stage are stopped (= will do nothing)
Also, at the end of the CPU cycle, the instruction (ldr r1, [r2+r3]) is moved into IR(WB), but the add r4, r1, r4 instruction will remain in IR(EX), and the add r5, r1, r5 instruction will remain in IR(ID)
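The movement of the instructions through the IR latches, including the stall, can be reproduced with a small simulation. This is a minimal sketch under stated assumptions, not the real hardware: the `Instr` record, the `must_stall` and `step` functions and the latch dictionary are hypothetical names, and the sketch assumes that IR(MEM) is left empty (a bubble) when the stall occurs. It tracks only which instruction sits in which latch, not the data values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction (field names are assumptions)."""
    text: str
    kind: str                 # "alu", "ldr" or "str"
    dest: Optional[str]
    src1: Optional[str]
    src2: Optional[str]


Latches = Dict[str, Optional[Instr]]


def must_stall(mem: Optional[Instr], ex: Optional[Instr]) -> bool:
    """Stall condition: LDR in MEM writes a source register of EX."""
    if mem is None or ex is None or mem.kind != "ldr":
        return False
    if ex.kind not in ("alu", "ldr", "str"):
        return False
    return mem.dest is not None and mem.dest in (ex.src1, ex.src2)


def step(latches: Latches, program: List[Instr], pc: int) -> Tuple[Latches, int]:
    """Run one CPU cycle and return the latch contents at the end of it."""
    if must_stall(latches["MEM"], latches["EX"]):
        # Stall: only the LDR moves on; EX and ID keep their instructions
        # and nothing new is fetched. IR(MEM) gets a bubble (assumption).
        return {"ID": latches["ID"], "EX": latches["EX"],
                "MEM": None, "WB": latches["MEM"]}, pc
    fetched = program[pc] if pc < len(program) else None
    return {"ID": fetched, "EX": latches["ID"],
            "MEM": latches["EX"], "WB": latches["MEM"]}, pc + 1


program = [
    Instr("ldr r1, [r2+r3]", "ldr", "r1", "r2", "r3"),
    Instr("add r4, r1, r4", "alu", "r4", "r1", "r4"),
    Instr("add r5, r1, r5", "alu", "r5", "r1", "r5"),
    Instr("add r6, r1, r6", "alu", "r6", "r1", "r6"),
    Instr("add r7, r1, r7", "alu", "r7", "r1", "r7"),
]

latches: Latches = {"ID": None, "EX": None, "MEM": None, "WB": None}
pc = 0
history = []                              # history[i] = state at end of cycle i+1
for cycle in range(1, 6):
    latches, pc = step(latches, program, pc)
    history.append(latches)
    shown = {name: (ins.text if ins else "-") for name, ins in latches.items()}
    print("end of cycle", cycle, shown)
```

At the end of cycle 4 the sketch shows the LDR in IR(WB) while the two ADD instructions are still in IR(EX) and IR(ID), which is the stalled state described above. In cycle 5 the pipeline advances normally again.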
State at the end of CPU cycle 4:
So the correct value of R1 will be used by the first ADD instruction add r4, r1, r4
That is due to the way the general purpose registers are updated:
If you want to know the details of this update technique, you can re-read the earlier discussion of the timing of the register updates.
So the correct value of r1 will be fetched (and used) by the add r5, r1, r5 instruction.
State at the start of the CPU cycle:
State at the middle of the CPU cycle:
State at the end of CPU cycle 5: