We need circuits to perform these functions:
|
|
The multiplexor that selects the first input of the ALU:
|
The 2nd multiplexor that selects the second input of the ALU:
|
Let us first look at an example and convince ourselves that the solution works. Later, I will show you how the two multiplexors in the EX stage are constructed (it's pretty straightforward).
add r1, r2, r3      // Initially: R2=192, R3=48, R4=1, R5=2, R6=3, R7=4
add r4, r1, r4
add r5, r1, r5
add r6, r1, r6
add r7, r1, r7
...
Slideshow:
State at the end of CPU cycle 1:
State at the end of CPU cycle 2:
The difference now is that the result (240) will also be written into the Forwarding Register FR1, along with the register tag = 001 (indicating register R1).
Also, at the start of the CPU cycle, the ID stage selects R4 and R1 to be copied into the "A" and "B" registers.
Notice that an OLD value of R1 (= 12) will still be fetched into B. (That is not a problem because we will find a way to obtain the more recent value from the forwarding registers - see next CPU cycle).
State at the end of CPU cycle #3:
|
For the first operand, the value from the "A" register (R4) is selected.
But for the second operand, the value of the Forwarding register FR1 is selected -- because (tag1 (001) == Src2 (001)) !!!
So the ALU will add R4 with the new value (= 240) of R1 (which has not arrived at R1 yet !!!)
State at the start of CPU cycle 4:
State at the end of CPU cycle #4:
For the first operand, the value from the A register (R5) is selected.
For the second operand, the value of the Forwarding register FR2 is selected -- because (tag2 (001) == Src2 (001)).
So the ALU will add R5 with the new value (= 240) of R1 (which has still not arrived at R1 !!!)
State at the start of CPU cycle #5:
NOTE: That is not a problem because we have previously determined that the 3rd instruction following "add r1,r2,r3" can obtain the correct value of R1 using the "A"/"B"-registers.
State at the end of CPU cycle #5:
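The cycle-by-cycle walk-through above can be replayed with a small simulation. This is only a sketch and not the course's simulator: the function name `simulate`, the `(dest, src1, src2)` tuples and the rule "a result reaches the register file in time for the 3rd instruction after its producer" are my assumptions, taken from the NOTE above. I follow the slides in treating R4 as Src1 and R1 as Src2 for "add r4, r1, r4", and I use the old value R1 = 12 mentioned earlier.

```python
def simulate(program, regs):
    """Replay ADD instructions through a simplified EX stage with FR1/FR2.

    program: list of (dest, src1, src2) register numbers (hypothetical format).
    regs:    dict mapping register number -> initial value.
    Returns (final register file, list of (source of input 1, source of input 2)).
    """
    regs = dict(regs)
    fr1 = (None, 0)   # (tag, value) produced by the previous instruction
    fr2 = (None, 0)   # (tag, value) produced by the instruction before that
    results = []      # (dest, value) of every instruction, in program order
    sources = []

    def pick(src, latched, latch_name):
        # Same priority as the selection logic: FR1, then FR2, then A/B.
        if fr1[0] == src:
            return fr1[1], "FR1"
        if fr2[0] == src:
            return fr2[1], "FR2"
        return latched, latch_name

    for k, (dest, src1, src2) in enumerate(program):
        # Assumption: the result of instruction k-3 has reached the register
        # file by the time instruction k reads its operands in the ID stage.
        if k >= 3:
            d, v = results[k - 3]
            regs[d] = v
        a, b = regs[src1], regs[src2]      # may be OLD values
        in1, from1 = pick(src1, a, "A")
        in2, from2 = pick(src2, b, "B")
        value = in1 + in2
        results.append((dest, value))
        sources.append((from1, from2))
        fr1, fr2 = (dest, value), fr1      # shift the forwarding registers

    # Drain the pipeline: write back the results that are still in flight.
    for d, v in results[max(0, len(program) - 3):]:
        regs[d] = v
    return regs, sources


program = [(1, 2, 3), (4, 4, 1), (5, 5, 1), (6, 6, 1), (7, 7, 1)]
initial = {1: 12, 2: 192, 3: 48, 4: 1, 5: 2, 6: 3, 7: 4}
final, sources = simulate(program, initial)
for (dest, _, _), src in zip(program, sources):
    print(f"R{dest} = {final[dest]}  (inputs taken from {src[0]} and {src[1]})")
```

In this model the second instruction takes R1 from FR1, the third takes it from FR2, and the fourth and fifth read it from the "B" register, which matches the slideshow.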
The first input of the ALU can be one of the following:
|
The selection logic (algorithm) can be formulated as follows:
    if ( instruction is a BRANCH instruction )
        select PC1;
    else if ( Tag1 in ForwReg1 == Src1 in instruction )
        select ForwReg1;     // Because this reg. has the most recent value
    else if ( Tag2 in ForwReg2 == Src1 in instruction )
        select ForwReg2;     // Because this reg. has the next recent value
    else
        select A-register;
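The selection logic above translates almost word for word into software. Here is a minimal sketch, assuming each forwarding register is modelled as a (tag, value) pair; the names `ForwReg` and `select_alu_input1` are made up for this illustration and do not come from the course material.

```python
from typing import NamedTuple, Optional


class ForwReg(NamedTuple):
    tag: Optional[int]   # number of the register whose value is held (None = empty)
    value: int


def select_alu_input1(is_branch, src1, pc1, forw_reg1, forw_reg2, a_reg):
    """Return the value that the first ALU input should receive."""
    if is_branch:
        return pc1
    if forw_reg1.tag == src1:
        return forw_reg1.value   # most recent value
    if forw_reg2.tag == src1:
        return forw_reg2.value   # next most recent value
    return a_reg


# Both forwarding registers hold a value for R1; the one in ForwReg1 is newer and wins.
newest = select_alu_input1(False, 1, 100, ForwReg(1, 240), ForwReg(1, 99), 12)
print(newest)
```

The order of the tests matters: checking ForwReg1 before ForwReg2 is what guarantees that the most recently computed value is used when both tags match.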
(I call this a "multiplexor", but in reality, it is a hardware if-statement !!!)
This circuit (hardware) implements the following if-statement:
    if ( instruction is a BRANCH instruction )
        select PC1;
    else if ( Tag1 in ForwReg1 == Src1 in instruction )
        select ForwReg1;     // Because this reg. has the most recent value
    else if ( Tag2 in ForwReg2 == Src1 in instruction )
        select ForwReg2;     // Because this reg. has the next recent value
    else
        select A-register;
(The red circle represents the "compare equal" circuit above).
The PC1 value is selected when the instruction is a BRANCH instruction; otherwise, the value from the second-to-last multiplexor is selected.
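One way to picture the circuit is as a chain of 2-to-1 multiplexors, each controlled by a "compare equal" circuit. The sketch below is my own model of such a chain, not a transcription of the actual diagram: `mux2`, `compare_equal` and `alu_input1` are hypothetical names, tags are assumed to be 3 bits wide (as in tag = 001), and I ignore the case of an empty forwarding register.

```python
def mux2(select, if_one, if_zero):
    """2-to-1 multiplexor: passes if_one when select is true, else if_zero."""
    return if_one if select else if_zero


def compare_equal(tag, src, width=3):
    """'Compare equal' circuit: true when all `width` bits of tag and src agree."""
    return all(((tag >> i) & 1) == ((src >> i) & 1) for i in range(width))


def alu_input1(is_branch, src1, pc1, tag1, forw_reg1, tag2, forw_reg2, a_reg):
    # First multiplexor: ForwReg2 or the A-register.
    stage1 = mux2(compare_equal(tag2, src1), forw_reg2, a_reg)
    # Second-to-last multiplexor: ForwReg1 overrides whatever stage1 chose.
    stage2 = mux2(compare_equal(tag1, src1), forw_reg1, stage1)
    # Last multiplexor: PC1 for a BRANCH, otherwise the output of stage2.
    return mux2(is_branch, pc1, stage2)


# Src1 = R1, ForwReg1 holds R4 (241), ForwReg2 holds R1 (240), A holds the old R1 (12).
chosen = alu_input1(False, 0b001, 100, 0b100, 241, 0b001, 240, 12)
print(chosen)
```

Because the multiplexor closest to the output has the final say, the priority runs from the last multiplexor backwards: BRANCH first, then ForwReg1, then ForwReg2, then the A-register. That is the same order as the if-statement.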
/home/cs355001/demo/pipeline/4a-ALU-hazard-sol

Executes:

    10  12     // mov r1,#12
    18 192     // mov r2,#192
    26  48     // mov r3,#48
    34   1     // mov r4,#1
    42   2     // mov r5,#2
    50   3     // mov r6,#3
    58   4     // mov r7,#4
     0   0     // nop
     0   0     // nop
     0   0     // nop
     0   0     // nop
     8  19     // add r1,r2,r3   (R1 = R2 + R3)
    32  33     // add r4,r1,r4   (R4 = R1 + R4) (R1 forwarded)
    40  41     // add r5,r1,r5   (R5 = R1 + R5) (R1 forwarded)
    48  49     // add r6,r1,r6   (R6 = R1 + R6)
    56  57     // add r7,r1,r7   (R7 = R1 + R7)