
Sample variable length instruction encoding: M68000 Instruction Encoding

What is the M68000 processor?

The M68000 was a microprocessor (CPU) designed by Motorola around 1980.

The length of a Motorola M68000 instruction can vary between:

2 bytes:

   +------------------+
   |     2 bytes      |
   +------------------+

To a maximum of 10 bytes:

   +---------------------------------------------------------+
   |                        10 bytes                         |
   +---------------------------------------------------------+

The length of an instruction grows with the amount of information that it needs to encode.

M68000 instruction formats
- The M68000 processor is a Complex Instruction Set Computer (CISC) and, as such, it has many instruction formats.
- The various instruction formats of the M68000 processor are summarized in Section 8 (starting on page 557) of the linked reference document.
  That document is only meant to give you an idea of how various computer instructions are encoded - you don't need to know the details in this huge document.
  I will only discuss the encoding of one specific instruction next.

The "MOVE" instruction and its encoding in M68000

The MOVE instruction copies a value from a source operand to a destination operand.
The syntax of the form of the MOVE instruction that copies a constant value x to a data register dn (one of d0 ... d7) is:

   move.l #x, dn

The M68000 processor has 2 variants of this move instruction:

There is a shorter (and quicker) variant used to move smaller values into a register of the M68000 processor.
There is a longer (and slower) variant used to move larger values into a register of the M68000 processor.

Because programs often use small values (e.g.: i = 0; or i = 1;), the M68000 provides a quick move variant to help optimize the running time of the program.

The longer (and slower) variant is for general use.

The encoding formats of the M68000's move instructions are given on pages 222 and 238 of the M68000 Programmer Manual.

I have replicated the formats here for our convenience:

Short format MOVEQ: (page 238) - for "small values"

   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 | 1 |   |   |   | 0 |   |   |   |   |   |   |   |   |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
                    <--------->     <----------------------------->
                     dest reg#          value between -128..127

Long format MOVE: (page 222) - for "larger values"

   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   | 0 | 0 |   |   |   |   |   | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |  + 32 bit value
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
            <-----> <--------->
             size    dest reg#

   size:       01 = 1 byte
               11 = 2 bytes
               10 = 4 bytes

   dest reg#:  000 = d0
               001 = d1
               010 = d2
               011 = d3
               100 = d4
               101 = d5
               110 = d6
               111 = d7

The long format can accommodate a 32-bit value !!!

Here are a few MOVE instructions of the M68000 processor that copy different values to the register d0, with their corresponding instruction codes:

   M68000 instruction            Binary machine code (given in Hexadecimal)
   -----------------------       ---------------------------------------------
   move.l #1, d0                 * Instruction code: 7001
   move.l #127, d0               * Instruction code: 707F
   move.l #-128, d0              * Instruction code: 7080
   move.l #128, d0               * Instruction code: 203C 0000 0080
   move.l #129, d0               * Instruction code: 203C 0000 0081
   move.l #2147483648, d0        * Instruction code: 203C 8000 0000
                                   (Note: 2147483648 = 2^31)

Explanation:

Take a look at the instruction "move.l #1, d0".
Its instruction code, given as a hexadecimal number, is: 7001

Expressed in binary, the instruction code is:

    <----- 7 -----> <----- 0 -----> <----- 0 -----> <----- 1 ----->   (Hex)
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
                    <--------->     <----------------------------->
                     dest reg#          value between -128..127

You can see that the "dest reg#" field represents the register "d0".
You can also see that the "value" field represents the (binary) number 1.

The short format is also used to encode the 2nd and the 3rd instruction in the example: "move.l #127, d0" and "move.l #-128, d0".
When you express their hexadecimal instruction codes in binary, you will find the values 127 and -128 in the value part of the instruction code.
Let's take a look at the instruction "move.l #128, d0".
The first 16 bits of its instruction code, given as a hexadecimal number, are: 203C

Expressed in binary, these 16 bits are:

    <----- 2 -----> <----- 0 -----> <----- 3 -----> <----- C ----->   (Hex)
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
   | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
   +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
            <-----> <--------->
             size    dest reg#

You can see that this is a long format "MOVE" instruction code where:

   size      = 10  (4 bytes)
   dest reg# = 000 = "d0"

This instruction code is followed by the value, which is given by the next 32 bits.
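The field extraction done by hand above can also be sketched in a few lines of Python. This is only an illustration: the function name decode_move_word and the dictionary keys are my own choices, and the function only matches the fixed bits of the two formats shown in these notes (it does not validate the size field or handle any other M68000 instruction).

```python
def decode_move_word(word):
    """Split a 16-bit M68000 instruction word into the fields shown above.

    Only the two formats from these notes are recognised; any other
    word returns None. Names are illustrative, not from any manual.
    """
    # Short format MOVEQ: 0111 rrr 0 vvvvvvvv
    if (word & 0xF100) == 0x7000:
        value = word & 0xFF
        if value >= 0x80:            # bit 7 set: negative in 8-bit 2's complement
            value -= 0x100
        return {"format": "MOVEQ", "reg": (word >> 9) & 0b111, "value": value}

    # Long format MOVE #value,dN: 00 ss rrr 000 111100
    if (word & 0xC1FF) == 0x003C:
        return {"format": "MOVE",
                "size_bits": (word >> 12) & 0b11,
                "reg": (word >> 9) & 0b111}

    return None


if __name__ == "__main__":
    for w in (0x7001, 0x707F, 0x7080, 0x203C):
        print(f"{w:04X} -> {decode_move_word(w)}")
```

For 203C the size bits come out as 2 (binary 10, i.e. 4 bytes) and the register number as 0 (d0), matching the diagram.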

I want to show you how the value x is encoded in the M68000 instruction - specifically, how M68000's variable length instructions are used to accommodate more information.

Notice that the value x must be encoded inside the instruction

When the value is small (between -128 and 127), the constant is encoded using the last 8 bits of the instruction:

   M68000 instruction       Binary machine code (given in Hexadecimal and Binary)
   -----------------------  ------------------------------------------------------
   move.l #1, d0            * Instruction code: 7001 = 0111000000000001
   move.l #127, d0          * Instruction code: 707F = 0111000001111111
   move.l #-128, d0         * Instruction code: 7080 = 0111000010000000

The first half of the instruction code (= 01110000) encodes "copy a value ..., to register d0".
The 2nd half of the instruction code encodes the value (in 8 bits binary):

   00000001 = 2's compl code for the value 1
   01111111 = 2's compl code for the value 127
   10000000 = 2's compl code for the value -128 !!
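If you want to check the three 8-bit codes yourself, here is a minimal Python sketch of the 8-bit 2's complement conversion in both directions. The function names are hypothetical helpers made up for this illustration.

```python
def to_twos_complement_8(value):
    """Return the 8-bit 2's complement code of value as a bit string."""
    if not -128 <= value <= 127:
        raise ValueError("value does not fit in 8 bits (2's complement)")
    # Masking with 0xFF keeps the low 8 bits; for negative values this
    # yields 256 + value, which is exactly the 2's complement code.
    return format(value & 0xFF, "08b")


def from_twos_complement_8(bits):
    """Return the signed value represented by an 8-bit string."""
    raw = int(bits, 2)
    return raw - 256 if raw >= 128 else raw


if __name__ == "__main__":
    for v in (1, 127, -128):
        print(v, "->", to_twos_complement_8(v))
```

Trying to convert 128 raises an error, which is the reason the short format cannot be used for that value.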

Notice that when the value is larger than 127 (or smaller than -128), it no longer fits in 8 bits and we need to use more bits to encode the value.

The M68000 assembler knows this - so when we use a value that is outside the range -128..127, the M68000 assembler uses a different machine code:

   M68000 instruction       Binary machine code (given in Hexadecimal and Binary)
   -----------------------  ------------------------------------------------------
   move.l #128, d0          * Instruction code: 203C 0000 0080
                              = 0010000000111100 0000000000000000 0000000010000000
   move.l #129, d0          * Instruction code: 203C 0000 0081
                              = 0010000000111100 0000000000000000 0000000010000001
   move.l #2147483648, d0   * Instruction code: 203C 8000 0000
                              = 0010000000111100 1000000000000000 0000000000000000
                              (Note: 2147483648 = 2^31)

As you can see, the instruction is logically the same, but the format is different:

The first 16 bits of the instruction code (= 0010000000111100) encode "copy a value ..., to register d0".
The last 32 bits of the instruction code encode the value (in 32 bits binary):

   00000000000000000000000010000000 = 2's compl code for the value 128
   00000000000000000000000010000001 = 2's compl code for the value 129
   10000000000000000000000000000000 = binary code for the value 2147483648 = 2^31 !!
                                      (read as a signed 2's compl code, the same bits represent -2^31)
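The choice the assembler makes between the two formats can be sketched as follows. This is a minimal Python illustration under my own assumptions (the function name encode_move_to_dreg and the accepted value range are made up for this sketch); it is not the code of the real assembler, and it only handles "move.l #value, dN".

```python
def encode_move_to_dreg(value, reg=0):
    """Return the instruction code (as a hex string) for move.l #value,d<reg>.

    Small values use the 2-byte short format, everything else the
    6-byte long format. Illustrative sketch only.
    """
    if not 0 <= reg <= 7:
        raise ValueError("data registers are d0 .. d7")

    if -128 <= value <= 127:
        # Short format: 0111 rrr 0 vvvvvvvv
        word = 0x7000 | (reg << 9) | (value & 0xFF)
        return f"{word:04X}"

    if not -2**31 <= value < 2**32:
        raise ValueError("value does not fit in 32 bits")

    # Long format: 0010 rrr 000 111100, followed by the 32-bit value
    opcode = 0x203C | (reg << 9)
    imm = value & 0xFFFFFFFF
    high = imm >> 16          # upper 16 bits come first in the instruction
    low = imm & 0xFFFF
    return f"{opcode:04X} {high:04X} {low:04X}"


if __name__ == "__main__":
    for v in (1, 127, -128, 128, 129, 2147483648):
        print(f"move.l #{v}, d0   * Instruction code: {encode_move_to_dreg(v)}")
```

Running it prints the same six instruction codes that are listed in the tables above.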

You can see that M68000 uses fewer bits (= shorter instruction code) when the value is small (and requires fewer bits to encode)

When we use a larger value, M68000 uses a longer instruction code to represent the value.

So:

Variable length instruction codes can accommodate more complex information (when you need more bits to represent the information)

But variable length instructions also complicate decoding:

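There are 2 different instruction codes for the same instruction that means "move a value ... to register d0" (a short form 01110000 and a long form 0010000000111100), so the CPU must examine the first 16 bits of an instruction before it knows how many more bytes belong to that instruction.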
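To see what this means for decoding, here is a small Python sketch that walks through the machine code of the six example instructions and finds where each instruction starts. The function name instruction_boundaries is hypothetical, and the sketch only knows the two formats discussed on this page; a real M68000 decoder has to handle many more formats.

```python
def instruction_boundaries(code):
    """Return a list of (address, length) pairs for a byte sequence that
    contains only MOVEQ and 'MOVE.L #value,dN' instructions."""
    result = []
    pos = 0
    while pos < len(code):
        # The M68000 stores the most significant byte first (big endian).
        word = int.from_bytes(code[pos:pos + 2], "big")
        if (word & 0xF100) == 0x7000:        # short format: 2 bytes
            length = 2
        elif (word & 0xF1FF) == 0x203C:      # long format: 2 + 4 bytes
            length = 6
        else:
            raise ValueError(f"unknown instruction word {word:04X}")
        result.append((pos, length))
        pos += length                        # next instruction starts here
    return result


if __name__ == "__main__":
    demo = bytes.fromhex("7001" "707F" "7080"
                         "203C00000080" "203C00000081" "203C80000000")
    for address, length in instruction_boundaries(demo):
        print(f"{address:04x}: {length} bytes")
```

Notice that the start address of an instruction is only known after all earlier instructions have been examined; with fixed length instructions, the start addresses could be computed directly.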

Example Program: (demo of the above code)

Prog file: /home/cs255001/demo/instruction-encoding/m68000/asm-prog.s

How to run the demo:

Make an m68000 directory in your cs255 folder: mkdir ~/cs255/m68000
Copy my demo: cp -r /home/cs255001/demo/instruction-encoding/m68000/* ~/cs255/m68000
Go to the copied directory: cd ~/cs255/m68000
Assemble the assembler program: ./as-m68000 asm-prog
Take a look at the listing file: gedit asm-prog.list

You will see this listing content:

    2                    * Intro to assembler programming
    3
    4                            xdef Start
    5
    6                    Start:
    7
    8 0000 7001                  move.l #1,d0            * Instruction code: 7001
    9 0002 707F                  move.l #127,d0          * Instruction code: 707F
   10 0004 7080                  move.l #-128,d0         * Instruction code: 7080
   11 0006 203C 0000             move.l #128,d0          * Instruction code: 203C 0000 0080
   11      0080
   12 000c 203C 0000             move.l #129,d0          * Instruction code: 203C 0000 0081
   12      0081
   13 0012 203C 8000             move.l #2147483648,d0   * Instruction code: 203C 8000 0000
   13      0000
   14                    * (2147483648 = 2^31)
   15
   16
   17                            end

The hexadecimal numbers that follow the address at the start of each line are the instruction codes generated for the assembler program (I entered them as comments on the right for your convenience).

You can see that the instruction codes I showed you are indeed the real ones, and I did not "make them up"...