Removing the dummy "NOP" instructions from a SPARC assembler program

Techniques for removing NOP instructions

Fact:

There are 2 ways to put a "useful" (non-NOP) instruction in the "delay slot" of a branch instruction:

- Try to move the instruction immediately prior to the branch instruction into the delay slot.
  This technique is generally used with an unconditional branch.
- Try to move the instruction at the branch target location of the branch instruction into the delay slot.
  This technique is generally used with a conditional branch, but it requires the annulling version of the branch instruction.

Removing the NOP instruction after an UNCONDITIONAL branch instruction

Consider the following assembler program:

 	sethi  %hi(x), %l0
	ld     [%l0 + %lo(x)], %l0  // l0 = x
 	sethi  %hi(y), %l1
	ld     [%l1 + %lo(y)], %l1  // l1 = y

	cmp    %l0, %l1
	bl     L1
	nop

	sethi  %hi(y), %l2
	st     %l0, [%l2 + %lo(y)]  // y = l0 (x)

	ba     L2 		// unconditional branch  
	nop

    L1: sethi  %hi(x), %l2
	st     %l1, [%l2 + %lo(x)]  // x = l1 (y)

    L2:

Since the instruction in the delay slot is always executed, we can:

replace the "nop" with the instruction immediately prior to the branch "ba L2"
(i.e., the instruction "st %l0, [%l2 + %lo(y)]").
The st still executes before control reaches L2, and "ba L2" does not depend on it, so the instructions that are executed are not affected by this change !!!

Improved program:

 	sethi  %hi(x), %l0
	ld     [%l0 + %lo(x)], %l0  // l0 = x
 	sethi  %hi(y), %l1
	ld     [%l1 + %lo(y)], %l1  // l1 = y

	cmp    %l0, %l1
	bl     L1
	nop

	sethi  %hi(y), %l2 

	ba     L2 	
	st     %l0, [%l2 + %lo(y)]  // Will be executed before branch takes place

    L1: sethi  %hi(x), %l2
	st     %l1, [%l2 + %lo(x)]  // x = l1 (y)

    L2:
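The rewrite above is mechanical enough to sketch in code. The Python below is only an illustration, not part of any real assembler: it assumes a program is a plain list of instruction strings with labels left out, and `fill_from_above` is a hypothetical helper name. It moves the instruction just before a branch into that branch's `nop` delay slot, and refuses when the instruction is a `cmp` feeding a conditional branch.

```python
def fill_from_above(program, branch_index):
    """Move the instruction just before a branch into its "nop" delay slot.

    Hypothetical helper: `program` is a list of instruction strings (no labels),
    program[branch_index] is a branch and program[branch_index + 1] must be "nop".
    Returns a new list; the input list is left unchanged.
    """
    branch = program[branch_index]
    if branch_index == 0:
        raise ValueError("no instruction before the branch")
    if program[branch_index + 1] != "nop":
        raise ValueError("delay slot is already filled")

    candidate = program[branch_index - 1]
    opcode = candidate.split()[0]
    branch_op = branch.split()[0].split(",")[0]      # "bl,a" -> "bl"

    # Branches and nops are not useful candidates for a delay slot here.
    if opcode == "nop" or opcode.split(",")[0] in ("ba", "bl"):
        raise ValueError("candidate is not a movable instruction")
    # A conditional branch reads the flags set by cmp, so cmp must stay in front.
    if opcode == "cmp" and branch_op != "ba":
        raise ValueError("cmp sets the flags the conditional branch depends on")

    # ..., candidate, branch, nop, ...  ->  ..., branch, candidate, ...
    return (program[:branch_index - 1]
            + [branch, candidate]
            + program[branch_index + 2:])


then_part = [
    "sethi %hi(y), %l2",
    "st %l0, [%l2 + %lo(y)]",   # y = l0 (x)
    "ba L2",
    "nop",
]
print(fill_from_above(then_part, 2))
```

Applied to the `cmp %l0, %l1` / `bl L1` pair, the same helper raises `ValueError`; that is exactly the situation covered by the conditional-branch case that follows.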

Removing the NOP instruction after a CONDITIONAL branch instruction

Next, consider the "nop" instruction following "bl L1":

 	sethi  %hi(x), %l0
	ld     [%l0 + %lo(x)], %l0  // l0 = x
 	sethi  %hi(y), %l1
	ld     [%l1 + %lo(y)], %l1  // l1 = y

	cmp    %l0, %l1

	bl     L1
	nop

	sethi  %hi(y), %l2 
	ba     L2 		// Skip over else part !
	st     %l0, [%l2 + %lo(y)]  // y = l0 (x)

    L1: sethi  %hi(x), %l2
	st     %l1, [%l2 + %lo(x)]  // x = l1 (y)

    L2:

The technique of

"move the instruction prior to the branch instruction into the delay slot"

does NOT work because this instruction is a compare instruction that sets the flags for the conditional branch instruction.

If we move the compare instruction into the delay slot of the branch instruction, like this:

	sethi  %hi(x), %l0
	ld     [%l0 + %lo(x)], %l0  // l0 = x
	sethi  %hi(y), %l1
	ld     [%l1 + %lo(y)], %l1  // l1 = y

	bl     L1
	cmp    %l0, %l1

	sethi  %hi(y), %l2
	ba     L2 		// Skip over else part !
	st     %l0, [%l2 + %lo(y)]  // y = l0 (x)

    L1: sethi  %hi(x), %l2
	st     %l1, [%l2 + %lo(x)]  // x = l1 (y)

    L2:

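Then the flags will NOT be set correctly when the bl L1 instruction is executed: bl L1 decides whether to branch before the cmp in its delay slot runs, so it tests whatever flags were left over from earlier instructions.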
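A toy model makes this ordering problem concrete. The sketch below is only an illustration: it reduces the SPARC condition codes to a single hypothetical "less than" bit `lt`, and `branch_taken` is a made-up helper that walks a straight-line sequence of `cmp` and `bl` in program order. The point it encodes is that `bl` decides from the flags as they are when `bl` itself executes; a `cmp` sitting in the delay slot only runs afterwards.

```python
def branch_taken(sequence, x, y, stale_lt=False):
    """Return True if "bl L1" branches, for a straight-line sequence.

    `sequence` lists "cmp" and "bl" in program order. `lt` stands in for the
    condition codes; `stale_lt` is whatever earlier code left in them.
    """
    lt = stale_lt
    for op in sequence:
        if op == "cmp":
            lt = x < y          # cmp %l0, %l1 with l0 = x, l1 = y (signed)
        elif op == "bl":
            return lt           # bl uses the flags it sees right now
    raise ValueError("no bl in sequence")


print(branch_taken(["cmp", "bl"], 1, 2))   # cmp before bl: correct decision
print(branch_taken(["bl", "cmp"], 1, 2))   # cmp in the delay slot: stale flag
```

With `cmp` first, the decision follows x < y; with `cmp` in the delay slot, the decision depends only on `stale_lt`, whatever the values of x and y.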

Therefore, we must use the second technique in this case:

First, consider the instructions that belong in the then and the else parts of the if statement:

	sethi  %hi(x), %l0
	ld     [%l0 + %lo(x)], %l0  // l0 = x
	sethi  %hi(y), %l1
	ld     [%l1 + %lo(y)], %l1  // l1 = y

	cmp    %l0, %l1
	bl     L1
	nop

	sethi  %hi(y), %l2          // Instructions in THEN part
	ba     L2
	st     %l0, [%l2 + %lo(y)]

    L1: sethi  %hi(x), %l2
	st     %l1, [%l2 + %lo(x)]  // Instructions in ELSE part

    L2:

We will transform the program by shuffling instructions
We cannot shuffle instructions arbitrarily; the most important goal is to keep the program correct

The program will remain correct if it executes the same instructions in the same sequence for both the TRUE case (THEN part) and the FALSE case (ELSE part)

The THEN part consists of the three instructions from "sethi %hi(y), %l2" through "st %l0, [%l2 + %lo(y)]"; the ELSE part consists of the two instructions at label L1.

OK, consider the program when we have moved the instruction "sethi %hi(x), %l2" (at label L1) into the delay slot:

	sethi  %hi(x), %l0
	ld     [%l0 + %lo(x)], %l0  // l0 = x
	sethi  %hi(y), %l1
	ld     [%l1 + %lo(y)], %l1  // l1 = y

	cmp    %l0, %l1
	bl     L1
	sethi  %hi(x), %l2          // <------ !!!

	sethi  %hi(y), %l2
	ba     L2
	st     %l0, [%l2 + %lo(y)]

    L1: st     %l1, [%l2 + %lo(x)]

    L2:

If x < y (in the ELSE case), these instructions will be executed:

	bl     L1                   // if (x < y), "bl" will branch to L1....
	sethi  %hi(x), %l2          // ......(1)
    L1: st     %l1, [%l2 + %lo(x)]  // ......(2)

If x >= y (in the THEN case), these instructions will be executed:

	bl     L1                   // if (x >= y), "bl" will NOT branch to L1....
	sethi  %hi(x), %l2          // **** Also executed !!!
	sethi  %hi(y), %l2
	ba     L2
	st     %l0, [%l2 + %lo(y)]
    L2:

We see that in the THEN case, ONE instruction from the ELSE part ("sethi %hi(x), %l2") is also executed.

(Note that in THIS example, the extra instruction from the ELSE part does not cause an error, because the very next instruction, "sethi %hi(y), %l2", overwrites %l2 anyway; but that is NOT true in general.)
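The two paths above can be listed mechanically. In this sketch (illustrative only: instructions are plain strings and `path_after_bl` is a hypothetical helper), the non-annulling `bl L1` always executes its delay slot, so the moved `sethi %hi(x), %l2` appears at the front of both paths.

```python
# Transformed program from above: "sethi %hi(x), %l2" moved from L1 into
# the delay slot of a NON-annulling "bl L1". Instructions are plain strings.
DELAY_SLOT = "sethi %hi(x), %l2"
THEN_PART = ["sethi %hi(y), %l2", "ba L2", "st %l0, [%l2 + %lo(y)]"]
ELSE_PART = ["st %l1, [%l2 + %lo(x)]"]          # what is left at label L1


def path_after_bl(x_less_than_y):
    """Instructions executed after "bl L1" (hypothetical helper).

    Without ",a" the delay slot runs whether or not the branch is taken;
    only what comes after it depends on the branch outcome.
    """
    rest = ELSE_PART if x_less_than_y else THEN_PART
    return [DELAY_SLOT] + rest


print(path_after_bl(False))   # THEN case: starts with the ELSE-part sethi
```

For x < y the path is exactly the original ELSE part; for x >= y the THEN part is preceded by the leaked ELSE instruction.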

The annulling version of a branch instruction

Each SPARC branch instruction comes in 2 flavors:
- A non-annulling version (the one that we have used so far; this is the "ordinary" kind of branch instruction, which works like an M68000 branch except that the instruction in the delay slot is always executed)
- An annulling version.
  The annulling branch instruction is denoted by adding ",a" to the branch mnemonic (e.g., "bl,a")

In an annulling conditional branch instruction (such as "bl,a"), the instruction in the delay slot is voided (annulled, i.e., not executed) when the branch instruction FAILS to branch (i.e., does not branch)
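The annul rule can be written as a small predicate. This is a sketch, and `delay_slot_executes` is a hypothetical helper name; it follows the SPARC rule for the annul bit, which also covers the unconditional form: for `ba,a` the delay slot instruction is always annulled, even though the branch is always taken.

```python
def delay_slot_executes(opcode, taken, annul):
    """Return True if the delay slot instruction of a SPARC branch executes.

    opcode -- branch mnemonic without ",a", e.g. "bl" or "ba"
    taken  -- whether the branch transfers control
    annul  -- True for the ",a" (annulling) form
    """
    if not annul:
        return True          # ordinary branch: delay slot always runs
    if opcode == "ba":
        return False         # ba,a: delay slot always annulled
    return taken             # conditional ",a": runs only if the branch is taken


for taken in (True, False):
    print("bl,a", "taken" if taken else "not taken",
          delay_slot_executes("bl", taken, True))
```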

The program with "sethi %hi(x), %l2" moved into the delay slot can be written using an annulling branch instruction as follows:

	sethi  %hi(x), %l0
	ld     [%l0 + %lo(x)], %l0  // l0 = x
	sethi  %hi(y), %l1
	ld     [%l1 + %lo(y)], %l1  // l1 = y

	cmp    %l0, %l1
	bl,a   L1                   // Will void next instruction if no branch
	sethi  %hi(x), %l2

	sethi  %hi(y), %l2
	ba     L2
	st     %l0, [%l2 + %lo(y)]

    L1: st     %l1, [%l2 + %lo(x)]

    L2:

If x >= y (in the THEN case), these instructions will be executed:

	bl,a   L1                   // if (x >= y), "bl,a" will NOT branch to L1....
	sethi  %hi(x), %l2          // ANNULLED (because "bl,a" did NOT branch !!)
	sethi  %hi(y), %l2
	ba     L2
	st     %l0, [%l2 + %lo(y)]
    L2:

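Now the program executes the same instructions, in the same sequence, as the original program in BOTH cases: if x < y, "bl,a" branches, so "sethi %hi(x), %l2" in the delay slot runs, followed by the store at L1; if x >= y, the delay slot is annulled and only the THEN part runs.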

Hence, the transformed program is equivalent to the original program, and it is correct.
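The equivalence claim can also be checked by brute force. The sketch below is a tiny, hypothetical simulator for just the instructions used here (`sethi`, `ld`, `st`, `cmp`, `bl`, `ba`, `nop`), not a real SPARC emulator: it assumes addresses for `x` and `y`, reduces the condition codes to one "less than" bit, and models the delay slot with a PC/nPC pair. `run` returns the final values of `x` and `y` plus the sequence of non-branch, non-nop instructions executed, which is what the argument above compares. `LEAKY` is the variant with a plain (non-annulling) `bl`.

```python
# Hypothetical addresses for the variables x and y (assumption).
ADDR = {"x": 0x20000, "y": 0x30004}


def hi(addr):
    return addr & ~0x3FF            # bits kept by "sethi %hi(addr), rd"


def lo(addr):
    return addr & 0x3FF             # low 10 bits supplied by "%lo(addr)"


ORIGINAL = [
    ("sethi", "x", "l0"), ("ld", "l0", "x", "l0"),          # l0 = x
    ("sethi", "y", "l1"), ("ld", "l1", "y", "l1"),          # l1 = y
    ("cmp", "l0", "l1"),
    ("bl", "L1", False), ("nop",),
    ("sethi", "y", "l2"), ("st", "l0", "l2", "y"),          # y = l0 (x)
    ("ba", "L2", False), ("nop",),
    "L1:", ("sethi", "x", "l2"), ("st", "l1", "l2", "x"),   # x = l1 (y)
    "L2:",
]

ANNULLED = [
    ("sethi", "x", "l0"), ("ld", "l0", "x", "l0"),
    ("sethi", "y", "l1"), ("ld", "l1", "y", "l1"),
    ("cmp", "l0", "l1"),
    ("bl", "L1", True), ("sethi", "x", "l2"),               # bl,a L1
    ("sethi", "y", "l2"), ("ba", "L2", False), ("st", "l0", "l2", "y"),
    "L1:", ("st", "l1", "l2", "x"),
    "L2:",
]

# Same as ANNULLED but with a plain (non-annulling) bl.
LEAKY = [("bl", "L1", False) if item == ("bl", "L1", True) else item
         for item in ANNULLED]


def run(lines, x, y):
    """Execute `lines` with the given x and y; return (x, y, executed work)."""
    code, labels = [], {}
    for item in lines:              # strings are labels, tuples are instructions
        if isinstance(item, str):
            labels[item.rstrip(":")] = len(code)
        else:
            code.append(item)

    mem = {ADDR["x"]: x, ADDR["y"]: y}
    reg, lt, work = {}, False, []   # lt: one-bit "less than" condition code
    pc, npc = 0, 1                  # SPARC-style program counter pair
    while pc < len(code):
        op, *args = code[pc]
        target, annul_next = npc + 1, False
        if op == "sethi":
            reg[args[1]] = hi(ADDR[args[0]])
        elif op == "ld":
            rs, sym, rd = args
            reg[rd] = mem[reg[rs] + lo(ADDR[sym])]
        elif op == "st":
            rsrc, rbase, sym = args
            mem[reg[rbase] + lo(ADDR[sym])] = reg[rsrc]
        elif op == "cmp":
            lt = reg[args[0]] < reg[args[1]]
        elif op in ("ba", "bl"):
            label, annul = args
            taken = op == "ba" or lt
            if taken:
                target = labels[label]
            # Annul bit: skip the delay slot if not taken, or always for ba,a.
            annul_next = annul and (op == "ba" or not taken)
        if op not in ("ba", "bl", "nop"):
            work.append(code[pc])
        if annul_next:
            pc, npc = target, target + 1    # delay slot instruction is skipped
        else:
            pc, npc = npc, target           # go to nPC; `target` follows it
    return mem[ADDR["x"]], mem[ADDR["y"]], work
```

Over a range of inputs, the original and the `bl,a` version should agree on both the final values and the executed instruction sequence, while `LEAKY` reaches the same final values but executes the extra `sethi %hi(x), %l2` when x >= y.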
