What we have learned: hardware to execute one machine instruction
 

What we have learned so far:

  1. Circuits used to forward data to the places where it is needed (multiplexor and decoder)

  2. Circuits used to perform computations on binary numbers (adder, multiplier, etc.)

  3. How a (simple) CPU is constructed

  4. (How the CPU communicates with the memory)

  5. (How a computer program performs IO operations (that take a relatively long time to complete))

Parallel execution: executing multiple machine instructions at the same time
 

In the remainder of the course, we will study techniques to execute multiple instructions at the same time:

  1. Pipelined processing

      • A pipelined CPU will execute multiple instructions at the same time

  2. Parallel processing

      1. SIMD parallel processing uses multiple ALUs
      2. MIMD parallel processing uses multiple CPUs

Review: program instruction execution
 

The CPU that we have studied operates as follows:

  1. Fetch the next instruction from memory into the CPU

  2. Decode the instruction (in the simple CPU that we studied, this takes a negligible amount of time)

  3. Fetch the operands for the instruction (from the registers)

  4. Perform the operation (it takes time to perform complex arithmetic operations)

  5. Write (= save) the result back in a register

All these steps (= stages of processing) are performed completely for one instruction before the next instruction is fetched from memory...

Program instruction processing depicted graphically
 

The "normal" CPU processes program instructions as follows

Initial state:

The stages represent the various steps of the instruction execution cycle

(1) Fetch Instruction-1 from memory:

The diagram depicts that instruction-1 has now completed its stage 1 of processing

(2) Fetch the operands for Instruction-1 from the registers:

The diagram depicts that instruction-1 has now completed its stage 2 of processing

(3) Perform the computation on the operands of Instruction-1:

The diagram depicts that instruction-1 has now completed its stage 3 of processing

(4) Write (= store) the result of Instruction-1 back into a register:

The diagram depicts that instruction-1 has now completed its stage 4 of processing

The execution of Instruction-1 is now complete; only then is Instruction-2 fetched and processed !

(5) Fetch Instruction-2 from memory:

The diagram depicts that instruction-2 has now completed its stage 1 of processing

And so on...
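The step-by-step walk-through above can be sketched as a small simulation. This is only an illustrative sketch: the stage names and the function `sequential_schedule` are hypothetical, and it assumes that every stage takes exactly one clock cycle.

```python
# Hypothetical stage names, following the four steps of the walk-through above.
STAGES = ["fetch instruction", "fetch operands", "compute", "write result"]


def sequential_schedule(instructions):
    """Return a list of (cycle, instruction, stage) for a non-pipelined CPU.

    Assumption: each stage takes exactly one clock cycle.
    """
    schedule = []
    cycle = 1
    for instr in instructions:
        # All stages of one instruction finish before the next one is fetched.
        for stage in STAGES:
            schedule.append((cycle, instr, stage))
            cycle += 1
    return schedule


if __name__ == "__main__":
    for cycle, instr, stage in sequential_schedule(["Instruction-1", "Instruction-2"]):
        print(f"step {cycle}: {instr} -> {stage}")
```

In this sketch, Instruction-2 is not fetched until step 5, which matches step (5) of the walk-through.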

Pipelined processing

  • Pipelined processing = execution of program instructions in which different instructions are in different stages of completion

    Multiple instructions are executed at the same time

    Each of these instructions is in a different stage of completion

  • Analogy: assembly line (items are in different stages of completion)

                        

Pipelined instruction processing depicted graphically
 

The pipelined CPU processes program instructions as follows

Initial state:

The stages represent the various steps of the instruction execution cycle

(1) Fetch Instruction-1 from memory:

The diagram depicts that instruction-1 has now completed its stage 1 of processing

(2) (A) Fetch the operands for Instruction-1 and (B) fetch Instruction-2 from memory:

Instruction-1 completed its stage 2 of processing and instruction-2 completed its stage 1

(3) (A) Perform the computation on the operands of Instruction-1, (B) Fetch operands for Instruction-2 and (C) fetch Instruction-3 from memory:

Instruction-1 completed its stage 3 of processing, Instruction-2 completed its stage 2 and Instruction-3 completed its stage 1
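The overlap described in steps (1) to (3) can be sketched as follows. This is a minimal illustration under idealized assumptions: the stage names and the function `pipelined_schedule` are hypothetical, every stage takes one clock cycle, and the instructions never interfere with each other.

```python
# Hypothetical stage names, matching the four stages used in the walk-through.
STAGES = ["fetch instruction", "fetch operands", "compute", "write result"]


def pipelined_schedule(instructions, stages=STAGES):
    """Return {cycle: [(instruction, stage), ...]} for an ideal pipeline.

    Assumption: instruction i (0-based) enters stage s (0-based) in
    clock cycle i + s + 1, i.e. a new instruction is fetched every cycle.
    """
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(stages):
            cycle = i + s + 1
            schedule.setdefault(cycle, []).append((instr, stage))
    return schedule


if __name__ == "__main__":
    program = ["Instruction-1", "Instruction-2", "Instruction-3"]
    for cycle, work in sorted(pipelined_schedule(program).items()):
        print(f"cycle {cycle}: " + ", ".join(f"{i} -> {s}" for i, s in work))
```

In cycle 3 of this sketch, the three instructions occupy three different stages at once, as in step (3) above.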

Advantage of a pipelined CPU: speed

A non-pipelined CPU executes 1 instruction in N (> 1) CPU clock cycles:

A pipelined CPU completes 1 instruction per CPU clock cycle once it is in its steady state (each individual instruction still needs N clock cycles to pass through all the stages):
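To make the speed advantage concrete, the cycle counts can be worked out with a short calculation. This is a sketch under idealized assumptions (N stages of one clock cycle each, no stalls); the function names are hypothetical.

```python
def cycles_non_pipelined(num_instructions, num_stages):
    """Each instruction uses all stages before the next instruction starts."""
    return num_instructions * num_stages


def cycles_pipelined(num_instructions, num_stages):
    """Ideal pipeline: the first instruction needs num_stages cycles to
    fill the pipeline; after that, one instruction completes every cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def speedup(num_instructions, num_stages):
    """Ratio of non-pipelined to pipelined cycle counts."""
    return cycles_non_pipelined(num_instructions, num_stages) / cycles_pipelined(
        num_instructions, num_stages
    )


if __name__ == "__main__":
    for n in (1, 10, 100, 10000):
        print(n, cycles_non_pipelined(n, 4), cycles_pipelined(n, 4), speedup(n, 4))
```

For a long program the speedup in this idealized model approaches the number of stages N; a real pipeline achieves less because of the complications discussed later.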

How a pipelined CPU differs from a "normal" CPU

A "normal" CPU (the one that we have studied so far) operates as follows (simplified):

 

Let's see how the "normal" CPU executes instructions:

Phase 1 of the clock will update the IR with the next instruction.

The instruction add r1,r2,r3 will fetch the operands R2 and R3:

Phase 2 of the clock will update the A-buffer and B-buffer with R2 and R3.

The instruction add r1,r2,r3 will then write the result into R1:

Phase 3 of the clock will update R1 with R2+R3.

Then the cycle repeats:

Phase 1 of the clock will update the IR with the next instruction... And so on
Notice: one instruction is executed completely before the next instruction starts its execution

A pipelined CPU (the one that we will be studying) operates as follows (simplified):

Notice: there are multiple Instruction Registers inside the pipelined CPU !!!
Different IRs control different parts (= execution phases) of the CPU

Let's see how the pipelined CPU executes instructions:

The clock will update IR1 with the next instruction (and other registers too, but let's ignore them for now: assume that IR2 contains a "NOP" (= No Operation) instruction)

The instruction add r1,r2,r3 in IR1 will fetch the operands R2 and R3:

The clock will update the A-buffer and B-buffer with R2 and R3 and simultaneously (1) copy IR1 to IR2 and (2) fetch the next instruction into IR1 !!!

The instruction add r4,r5,r6 in IR1 will fetch the operands R5 and R6 and simultaneously the instruction add r1,r2,r3 in IR2 will write the result R2+R3 into R1:

The clock will update R1 with R2+R3 and simultaneously (1) update the A-buffer and B-buffer with R5 and R6, (2) copy IR1 to IR2 and (3) fetch the next instruction into IR1 !!!
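The clock-by-clock behavior of the two Instruction Registers can be sketched as a tiny simulation. This is a simplified model, not the actual datapath: it supports only add, it reads add rd,ra,rb as rd = ra + rb (the ARM convention), it ignores conflicts between instructions, and the names `clock_tick`, `run`, and the state dictionary are hypothetical.

```python
NOP = None  # stands for the "No Operation" instruction


def clock_tick(state, program):
    """Compute the next CPU state. All updates use the OLD state,
    which models registers that are updated simultaneously by the clock."""
    regs = state["regs"]
    ir1, ir2 = state["IR1"], state["IR2"]

    # Write-back phase, controlled by IR2: store A + B in the destination.
    new_regs = dict(regs)
    if ir2 is not NOP:
        _op, dest, _src1, _src2 = ir2
        new_regs[dest] = state["A"] + state["B"]

    # Operand-fetch phase, controlled by IR1: load the A and B buffers.
    new_a, new_b = state["A"], state["B"]
    if ir1 is not NOP:
        _op, _dest, src1, src2 = ir1
        new_a, new_b = regs[src1], regs[src2]

    # Instruction fetch: IR1 moves to IR2, the next instruction enters IR1.
    pc = state["PC"]
    next_instr = program[pc] if pc < len(program) else NOP
    return {"regs": new_regs, "IR1": next_instr, "IR2": ir1,
            "A": new_a, "B": new_b, "PC": pc + 1}


def run(program, regs, ticks):
    """Start with NOPs in both IRs and apply the given number of clock ticks."""
    state = {"regs": dict(regs), "IR1": NOP, "IR2": NOP, "A": 0, "B": 0, "PC": 0}
    for _ in range(ticks):
        state = clock_tick(state, program)
    return state


if __name__ == "__main__":
    program = [("add", "r1", "r2", "r3"), ("add", "r4", "r5", "r6")]
    regs = {"r1": 0, "r2": 2, "r3": 3, "r4": 0, "r5": 5, "r6": 6}
    print(run(program, regs, 4)["regs"])
```

On the third clock tick of this sketch, r1 is written while the operands of the second add are loaded into the buffers, which is the simultaneous activity described above.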

Making a pipelined CPU (Wikipedia)

  • In a pipelined CPU, instructions flow through the central processing unit (CPU) in stages.

  • A pipelined CPU has "pipeline registers" after (in) each stage.

    These registers store information from the instruction and calculations so that the logic gates of the next stage can do the next step:

                 

Designing an "easier to make" pipelined CPU

Simplifications introduced to make it easier to build a pipelined CPU:

  • Use fixed-size machine instructions

      • E.g.: every ARM instruction is encoded in 4 bytes

    Traditionally, machine instructions could have a variable number of bytes

  • Limit (= reduce) the number of instructions that access memory

      • E.g.: an ARM CPU accesses memory only through its load and store instructions (ldr, str and their variants).

        (The reason is that executing an instruction that accesses the memory will conflict with the instruction fetching operation done by the CPU).

Important consideration when making a pipelined CPU: resource requirement to execute instructions

Machine instructions fall into 3 categories with similar processing requirements (i.e.: the instructions in a category require similar resources to execute):

  • Arithmetic and Logic instructions

       add  r0, r1, r2     // Updates a general purpose register
       cmp  r0, r1         // Updates the status flags (no general purpose register is written)
      

  • Memory access instructions

       ldr  r0, [r1, r2]   // Access the memory    
       strh r0, [r1, r2]
      

  • Branching instructions

       ble    whileEnd   // Updates the PC !       
       bl     func
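The three categories above can be sketched as a small classifier on the instruction mnemonic. This is a rough illustration only: the function `classify`, the mnemonic prefixes, and the condition-code list are assumptions chosen to cover the examples in these notes, not a complete decoder for the ARM instruction set.

```python
# Assumed subset of ARM condition-code suffixes (not the full list).
CONDITIONS = {"eq", "ne", "lt", "le", "gt", "ge", "hi", "ls", "cs", "cc", "mi", "pl"}
BRANCH_BASES = ("bl", "bx", "b")           # branch mnemonics without a condition
MEMORY_PREFIXES = ("ldr", "str", "ldm", "stm")


def classify(mnemonic):
    """Return the category of an instruction, given its mnemonic."""
    m = mnemonic.lower()
    if m.startswith(MEMORY_PREFIXES):
        return "memory access"
    for base in BRANCH_BASES:
        # A branch is a base mnemonic, optionally followed by a condition code.
        if m == base or (m.startswith(base) and m[len(base):] in CONDITIONS):
            return "branching"
    return "arithmetic and logic"


if __name__ == "__main__":
    for name in ("add", "cmp", "ldr", "strh", "ble", "bl"):
        print(name, "->", classify(name))
```

Checking the suffix against a condition list, rather than testing only for a leading "b", keeps a logic instruction such as bic out of the branching category.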
      

Making a pipelined CPU
 

  • A pipelined CPU is more complex than the multicycle ("normal") CPU.

  • The pipelined CPU also has to deal with additional complexity:

      • The pipelined CPU must also make sure that the instruction in each stage does not harm the operation of the instructions in other stages !!

    For example, if instructions in two different stages use the same piece of data (e.g.: add r10, r1, r2 and add r3,r4,r10), the pipelined CPU must ensure that the uses (of R10) are done in the correct sequence.

  • We will study a pipelined CPU designed by Hennessy (Stanford) and Patterson (Berkeley)
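The R10 example above (a later instruction reads a register that an earlier instruction writes) can be detected with a simple check. This is a minimal sketch: the helpers `parse` and `reads_result_of` are hypothetical, and the sketch assumes three-operand instructions of the form op dest, src1, src2.

```python
def parse(instr):
    """Split 'add r10, r1, r2' into (op, dest, [sources]).

    Assumption: the first register is the destination, the rest are sources.
    """
    op, operands = instr.split(None, 1)
    regs = [r.strip().lower() for r in operands.split(",")]
    return op.lower(), regs[0], regs[1:]


def reads_result_of(earlier, later):
    """True if `later` uses a register that `earlier` writes, so the CPU
    must make sure the write happens before the read."""
    _op, dest, _sources = parse(earlier)
    _op, _dest, sources = parse(later)
    return dest in sources


if __name__ == "__main__":
    print(reads_result_of("add r10, r1, r2", "add r3,r4,r10"))
    print(reads_result_of("add r1,r2,r3", "add r4,r5,r6"))
```

A real pipelined CPU performs this kind of comparison in hardware on the register numbers held in its pipeline registers; the string parsing here is only for illustration.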