Executing a conditional branch instruction using the modified ID stage
 

  • A conditional branch instruction uses the flag values to update the PC with one of the following values:

      1. PC + 4 (no branch)
      2. PC + branch offset (branch)       

  • I will use the following program to show you the branch delay using the modified ID stage:

         cmp r1,r2
         bcc Label     (bcc can be any conditional branch instr) 
         add r2, r1, r2
         add r3, r1, r3
         add r4, r1, r4       
         ...
      

    (The execution does not use data forwarding and I will omit the data forwarding circuits in the diagrams to keep the material simple)

Executing a conditional branch instruction using the modified ID stage - Example

Start of cycle 1: IF stage is fetching the cmp instruction:

 


End of cycle 1: cmp instruction is fetched in IR(ID)

 


Start of cycle 2: the ID stage fetches (all) operands for cmp, and the IF stage fetches bcc Label:

 


End of cycle 2: the operands for cmp have been fetched, and bcc Label has been fetched into IR(ID):

Can we execute the bcc instruction correctly? (I.e.: can we select the correct new address for the PC?)


Start of cycle 3: EX: compare R1, R2; ID: execute bcc; IF: fetch add r2,r1,r2:

Note: the ALU computes R1−R2 and also computes the values of the N,Z,V,C flags !


Middle of cycle 3: EX: updates the N,Z,V,C flags in the PSR:

Note: the branch selection circuit can obtain the correct N,Z,V,C flag values !


End of cycle 3: PC is correctly updated to address Label , add r2,r1,r2 is fetched in IR(ID)

The CPU can correctly execute a conditional branch instruction (with one-instruction branch delay)
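
The cycle-by-cycle walkthrough above can be replayed with a small simulation. This is only a sketch under stated assumptions: the function name `simulate`, the word-indexed PC (so "PC + 4" becomes `pc + 1`), and the placeholder instruction at `Label` are all made up for illustration, and the branch outcome is passed in by the caller instead of being computed from the flags.

```python
def simulate(program, labels, taken, cycles):
    """Return one (cycle, IF, ID, EX) tuple per clock cycle.

    Assumptions of this sketch: the PC is an index into `program`
    (so "PC + 4" becomes pc + 1), and the branch outcome `taken`
    is supplied by the caller instead of being computed from flags.
    """
    pc = 0
    in_id = None   # instruction held in IR(ID)
    in_ex = None   # instruction held in the EX stage
    rows = []
    for cycle in range(1, cycles + 1):
        in_if = program[pc] if pc < len(program) else None
        rows.append((cycle, in_if, in_id, in_ex))
        # End of cycle: a taken branch in the ID stage redirects the PC.
        if in_id is not None and in_id.startswith("bcc") and taken:
            pc = labels[in_id.split()[1]]
        else:
            pc += 1
        # Every instruction advances one stage.
        in_ex, in_id = in_id, in_if
    return rows


program = [
    "cmp r1,r2",
    "bcc Label",
    "add r2,r1,r2",
    "add r3,r1,r3",
    "add r4,r1,r4",
    "(instruction at Label)",   # hypothetical branch target
]
labels = {"Label": 5}

for row in simulate(program, labels, taken=True, cycles=4):
    print(row)
```

In cycle 3 the bcc instruction is in the ID stage while add r2,r1,r2 is already being fetched, so that add is fetched whether or not the branch is taken. In cycle 4 the IF stage fetches from Label when the branch is taken and add r3,r1,r3 otherwise.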

The branch selection circuitry

There is one circuit that remains to be discussed:

How does the branch selection circuitry work ?

The branch selection circuitry - background info
 

Without going into the details of how the flags are set (that would require me to explain too many details), the branch conditions used in the assembler programming taught in CS255 correspond to the following flag settings:

E.g.: the equal condition is true when the Z flag = 1,
         the not equal condition is true when the Z flag = 0, etc.

Sample branch condition: blt, i.e.: x < y

Why is x < y represented by the flag condition N != V ?

  • Normally (= no overflow), if x < y then x − y < 0 (i.e.: N = 1 and V=0):

        x = 3 and y = 5:
      
               00000011
             - 00000101
            -----------
               11111110  (= -2)   N=1  (No overflow: V=0)  

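  • However, when x < 0 and y > 0, the subtraction x − y can overflow and produce a positive result (i.e.: N = 0 and V = 1), e.g.:

        x = -128 and y = 1:  (x < y)
      
               10000000  (=-128, most negative byte value)
             - 00000001 
            -----------
               01111111  (=+127)  N=0  (Overflow: V=1)   

  • In both cases N != V, which is why the blt condition tests N != V.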
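
The two subtractions above can be checked with a few lines of code. This is a minimal sketch, assuming 8-bit two's complement values as in the examples; the function name `sub8_flags` is made up for illustration, and the C flag is left out because only N, Z and V matter for the blt condition.

```python
def sub8_flags(x, y):
    """Compute x - y in 8-bit two's complement.

    Returns (signed_result, N, Z, V).
    """
    ux, uy = x & 0xFF, y & 0xFF          # 8-bit patterns of the operands
    raw = (ux - uy) & 0xFF               # 8-bit pattern of the result
    n = raw >> 7                         # N = sign bit of the result
    z = int(raw == 0)                    # Z = 1 when the result is zero
    # Signed overflow on subtraction: the operands have different signs
    # and the sign of the result differs from the sign of x.
    v = int(((ux ^ uy) & (ux ^ raw) & 0x80) != 0)
    signed = raw - 256 if raw & 0x80 else raw
    return signed, n, z, v


print(sub8_flags(3, 5))      # (-2, 1, 0, 0): N=1, V=0
print(sub8_flags(-128, 1))   # (127, 0, 0, 1): N=0, V=1
```

In both runs N differs from V, matching x < y. Looping over every pair of byte values shows that N != V holds exactly when x < y.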

 

Recall how the branch conditions are encoded in the instruction code (by Aaron):

            Branch
            condition  <-------------- branch offset  ----------->
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
  | 1 | 1 | B | B | B |   |   |   |   |   |   |   |   |   |   |   |
  +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

    B B B        Branch condition
   ---------------------------------------------------------------
    0 0 0        Branch always
    0 0 1        BEQ
    0 1 0        BNE
    0 1 1        BLT
    1 0 0        BLE
    1 0 1        BGT
    1 1 0        BGE
    1 1 1        not used
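
The field layout above can be exercised with a short decoding sketch. The names `decode_branch` and `CONDITIONS` are made up for illustration. Two assumptions go beyond the diagram: the instruction is given as two bytes with the high byte first, and the 11-bit offset is returned as an unsigned number because the diagram does not say whether it is sign-extended.

```python
# Condition names in the order of the 3-bit BBB field (None = not used).
CONDITIONS = ["branch always", "BEQ", "BNE", "BLT", "BLE", "BGT", "BGE", None]


def decode_branch(byte_hi, byte_lo):
    """Split a 16-bit branch instruction into (condition, offset).

    Returns None when the two leading bits are not 1 1, i.e. when the
    word is not a branch instruction in this encoding.
    """
    word = (byte_hi << 8) | byte_lo
    if word >> 14 != 0b11:
        return None
    cond = (word >> 11) & 0b111      # the B B B field
    offset = word & 0x7FF            # the remaining 11 bits
    return CONDITIONS[cond], offset


print(decode_branch(216, 43))   # ('BLT', 43)
```

The byte pair 216 43 has the bit pattern 11 011 00000101011, i.e. condition code 011 (BLT) with offset 43.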
  

The branch selection circuitry

The branch selection circuitry will output 1 when the branch condition is true and 0 otherwise:

 


We use a decoder to determine the branch condition:

The conditional branch code determines which decoder output = 1


When the decoder detects a branch always instruction, the branch selection = 1:

The conditional branch code 000 will cause the branch selection circuit to output 1


For the beq condition (condition code 001) to be true, the Z flag must be equal to 1:

 


For the bne condition (condition code 010) to be true, the Z flag must be equal to 0:

 


For the blt condition (condition code 011) to be true, the N flag must not be equal to the V flag:

 


And so on:

The branch selection circuit must output 1 whenever the flag values fulfill the branch condition !


We must combine the individual branch cases using an OR-gate:

The resulting circuit is the branch selection circuit, which determines whether the condition for a branch is met !
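
The decoder, the per-condition gates and the final OR-gate can be mimicked in software. This is a sketch, not the circuit itself, and the names `decoder3` and `branch_select` are made up for illustration. The text above only spells out the flag tests for beq, bne and blt; the tests used here for ble, bgt and bge are the usual signed-comparison conditions (ble: Z = 1 or N != V, bgt: Z = 0 and N = V, bge: N = V) and are an assumption about this CPU.

```python
def decoder3(code):
    """3-to-8 decoder: output i is 1 only when the input code equals i."""
    return [int(code == i) for i in range(8)]


def branch_select(code, n, z, v):
    """Return 1 when the branch with condition code `code` must be taken."""
    d = decoder3(code)
    lt = n ^ v                        # 1 when N != V
    terms = [
        d[0],                         # 000: branch always
        d[1] & z,                     # 001: beq, Z = 1
        d[2] & (1 - z),               # 010: bne, Z = 0
        d[3] & lt,                    # 011: blt, N != V
        d[4] & (z | lt),              # 100: ble (assumed), Z = 1 or N != V
        d[5] & (1 - z) & (1 - lt),    # 101: bgt (assumed), Z = 0 and N = V
        d[6] & (1 - lt),              # 110: bge (assumed), N = V
    ]                                 # 111: not used, never selects a branch
    out = 0
    for term in terms:                # the final OR-gate
        out |= term
    return out


print(branch_select(0b011, n=1, z=0, v=0))   # blt after 3 - 5
print(branch_select(0b011, n=0, z=0, v=0))   # blt after 5 - 3
```

Only one decoder output is 1 at a time, so at most one AND term can be 1, and the OR-gate simply passes that term through.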

DEMO (using Aaron's pipelined CPU) - branch taken
 

  • Execute this command on a lab machine:

       /home/cs355001/demo/pipeline/6-speedup-cond-bra1     
      

    Program being executed:

         0:   10  62    // mov r1,#62
              18  1     // mov r2,#1
              26  1     // mov r3,#1
              34  1     // mov r4,#1
              42  1     // mov r5,#1
              50  1     // mov r6,#1
              58  1     // mov r7,#1
              0   0     // nop
              0   0     // nop
              0   0     // nop
              0   0     // nop
              0   0     // nop
       12:    4 145    // cmp r2,r1 (1-62 < 0)
       13:    216 43   // blt +43  (branch taken)
              16  10    // add r2,r1,r2  (R2=R1+R2) executed and then branch !  
              24  11    // add r3,r1,r3  (R3=R1+R3) // Not executed
              32  12    // add r4,r1,r4  (R4=R1+R4) 
              40  13    // add r5,r1,r5  (R5=R1+R5)
              48  14    // add r6,r1,r6  (R6=R1+R6)
      
       56:     0   1    // <---- bra target   (56 = 111000)
               0   2
      

DEMO (using Aaron's pipelined CPU) - branch not taken
 

  • Execute this command on a lab machine:

       /home/cs355001/demo/pipeline/6-speedup-cond-bra2     
      

    Program being executed:

         0:   10  62    // mov r1,#62
              18  1     // mov r2,#1
              26  1     // mov r3,#1
              34  1     // mov r4,#1
              42  1     // mov r5,#1
              50  1     // mov r6,#1
              58  1     // mov r7,#1
              0   0     // nop
              0   0     // nop
              0   0     // nop
              0   0     // nop
              0   0     // nop
       12:    4 138    // cmp r1,r2 (62-1 >= 0)
       13:    216 43   // blt +43  (branch NOT taken)
              16  10    // add r2,r1,r2  (R2=R1+R2) - program just continues...
              24  11    // add r3,r1,r3  (R3=R1+R3) 
              32  12    // add r4,r1,r4  (R4=R1+R4) 
              40  13    // add r5,r1,r5  (R5=R1+R5)
              48  14    // add r6,r1,r6  (R6=R1+R6)
      
       56:     0   1    // <---- bra target   (56 = 111000)
               0   2
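
The register values to expect from the two demo runs can be estimated with a short model. This is a sketch under stated assumptions, not a trace of Aaron's CPU: the function name `run_demo` is made up, the five add instructions are assumed to update r2 through r6 (as their encodings suggest), and the instructions at the branch target are ignored.

```python
def run_demo(cmp_operands):
    """Model one demo program; return (branch_taken, registers).

    `cmp_operands` is the register pair compared just before blt,
    e.g. ("r2", "r1") for cmp r2,r1.
    """
    regs = {"r1": 62}
    for i in range(2, 8):
        regs[f"r{i}"] = 1                 # mov r2..r7,#1
    a, b = (regs[name] for name in cmp_operands)
    taken = a < b                         # blt branches when a < b
    adds = ["r2", "r3", "r4", "r5", "r6"]
    # The first add sits in the branch delay slot, so it always executes;
    # the remaining adds execute only when the branch is not taken.
    executed = adds[:1] if taken else adds
    for reg in executed:
        regs[reg] += regs["r1"]
    return taken, regs


print(run_demo(("r2", "r1")))   # first demo: cmp r2,r1
print(run_demo(("r1", "r2")))   # second demo: cmp r1,r2
```

In the first demo only r2 changes (to 63) before execution continues at the branch target. In the second demo r2 through r6 all become 63.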