80x86 Assembler, Part 7

Atrevida Game Programming Tutorial #18
Copyright 1998, Kevin Matz, All Rights Reserved.

Prerequisites:

Chapter 17, 80x86 Assembler, Part 6

In this chapter, we'll examine string instructions. We'll learn how to manipulate strings, and we'll also discover how certain string instructions can be used to manipulate graphics in video memory.

Is it safe to ignore this chapter?

String instructions are simply faster and more convenient methods of doing certain tasks. By that, I mean that you can write assembler code that does the same thing as a string instruction. The string instructions will generally be shorter, faster, and more convenient to use, but you can always write chunks of assembler code that perform the same tasks. Therefore, it is possible to "get by" in assembler without learning the string instructions. But these instructions are not difficult to learn, and they offer many advantages, so let's take a look...

A brief overview of string instructions

In 80x86 assembler, a string is considered to be a consecutive sequence of bytes or words in memory. On the 386 and higher processors, strings can also be composed of doublewords, but in this tutorial we'll only deal with bytes and words. When you use a string instruction, you have to specify the size of the "basic units" of the string: byte or word (or doubleword).

The string instructions are STOSx, MOVSx, LODSx, CMPSx, and SCASx. On the 80186 and up, there are the additional instructions INSx and OUTSx. What do the "x"'s stand for? This is where you specify the size of the basic units of the string. If you want to deal with a string composed of bytes, replace the "x" with "B". If you want to deal with strings composed of words, replace the "x" with "W". (For doublewords, on the 386 and up, replace the "x" with "D".)

For example, if we wanted to use MOVSx to process a string composed of bytes, we'd use the instruction MOVSB. Or, if we wanted to use CMPSx to operate on strings composed of words, we'd use CMPSW.

String instructions can use conditional-repeat mnemonics, also called repeat prefixes. These are REP, REPE, REPNE, REPZ, and REPNZ, and they are convenient because they let you avoid writing lots of little loops when you're doing string processing.

In just a moment, we'll look at each string instruction, and we'll see how the string instructions work with the repeat prefixes.

The direction flag (DF)

String instructions can work either forward or backward in memory. Before you use a string instruction, you must be sure to specify the direction that you want, and to do that, you can either set or clear the direction flag (DF).

                      Addresses increase this way
                    ------------------------------->

Address:  n-5  n-4  n-3  n-2  n-1   n   n+1  n+2  n+3  n+4  n+5
       --+----+----+----+----+----+----+----+----+----+----+----+--
         |    |    |    |    |    |    |    |    |    |    |    |
       --+----+----+----+----+----+----+----+----+----+----+----+--

                                     Forward (increasing addresses): CLD
                                    *-------------------->
               <--------------------*
Backward (decreasing addresses): STD

To tell string instructions to go forward in memory (that is, with increasing addresses) you can use the Clear Direction Flag instruction, CLD, like this:

    CLD                                ; DF = 0; String op.'s go forward

To tell string instructions to go backward in memory (that is, with decreasing addresses), use the Set Direction Flag instruction, STD:

    STD                                ; DF = 1; String op.'s go backward

The direction flag stays set or cleared until you set or clear it again. If you only use one direction throughout your entire program, you only have to set or clear the direction flag once. You could put a CLD or STD instruction somewhere at the beginning of your CODESEG section.
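
One caution: the direction flag is shared by all of the code in your program, so a subroutine that needs to go backward shouldn't leave DF set behind its caller's back. Here's a minimal sketch of one way to handle that, by saving and restoring the flags around the backward operation (the string instructions in the middle are only a placeholder):

    PUSHF                              ; Save the flags, including DF
    STD                                ; DF = 1; String op.'s go backward

    ; ... backward string instructions would go here ...

    POPF                               ; Restore the flags, so DF is
                                       ;  whatever it was before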

STOSx and REP STOSx

STOSx, for Store String, basically does one thing: it stores a single value at a memory location. STOSx doesn't take any operands. Here are the specifics:

If the instruction being used is STOSB, it takes the value in AL and stores it at the byte pointed to by ES:DI. Then, if DF is 0, DI is incremented by 1, so that ES:DI points to the next byte. Otherwise, if DF is 1, DI is decremented by 1, so that ES:DI points to the previous byte in memory.
If the instruction being used is STOSW, it takes the value in AX and stores it at the word pointed to by ES:DI. Then, if DF is 0, DI is incremented by 2, so that ES:DI points to the next word. Otherwise, if DF is 1, DI is decremented by 2, so that ES:DI points to the previous word in memory.

What good is this? Can't we just use this...

   MOV [ES:DI], AL
   INC DI

...in place of STOSB, and this...

   MOV [ES:DI], AX
   INC DI
   INC DI

...in place of STOSW? Certainly, but STOSx is probably faster. (I say "probably" because on some newer processors, it is sometimes possible for a series of short, simple operations to do a task faster than an equivalent "complex" instruction. But this isn't always the case and it isn't true for every 80x86 processor. And later, we'll see how using STOSx with a conditional-repeat mnemonic can provide more speed.)

Let's use STOSB for what it is usually used for: storing the same character in some number of consecutive locations. Let's say I have this string declared in the data segment:

    DATASEG

Greeting                          DB   "Friends, Romans, Countrymen!", "$"

Now I want to overwrite the string with "X"'s, up to the "$" character. I counted 28 characters, from the "F" to the "!". Let's write a loop that repeatedly invokes the STOSB instruction:

    ; Setup before the loop:
    MOV AX, DS                         ; (intermediate)
    MOV ES, AX                         ; Let ES = DS (remember, you can't
                                       ;  copy one segment register
                                       ;  directly to another)

    MOV DI, OFFSET Greeting            ; Now ES:DI points to the start of
                                       ;  the Greeting string

    MOV AL, 'X'                        ; Here's the character that we're
                                       ;  going to store

    MOV CX, 28                         ; Set up our counter to iterate
                                       ;  28 times

    CLD                                ; Make sure our string instructions
                                       ;  go forward in memory

    ; Now run the loop to store the "X" character 28 times...
@@StoreCharacterLoop:
    STOSB                              ; Store AL at [BYTE ES:DI], then
                                       ;  increment DI

    LOOP @@StoreCharacterLoop          ; Decrement CX and jump back to the
                                       ;  label if CX is not zero

; Now we're done.  ES:DI points at the character right after the sequence
;  that we stored, so in this case, ES:DI currently points at the "$"
;  character.

Before we run this section of code, the string Greeting consists of:

 00000000011111111112222222222
 12345678901234567890123456789
 |||||||||||||||||||||||||||||
"Friends, Romans, Countrymen!$"

And after the loop finishes, the string Greeting will contain:

 00000000011111111112222222222
 12345678901234567890123456789
 |||||||||||||||||||||||||||||
"XXXXXXXXXXXXXXXXXXXXXXXXXXXX$"

If you have a copy of Turbo Debugger or some other debugger, you can use it to watch the string get overwritten with "X"'s. If you don't have a debugger, simply use INT 21h Service 9 ("$"-Terminated String Print), before and after to see how the string changes. (You could even write a DisplayString macro to save time if you end up displaying a lot of strings.)

Now, this section of code works fine, but we can make it a little faster. Let's use the REP repeat prefix. (REP is the only repeat prefix that works with STOSx.) We write REP and STOSx together, like this:

    REP STOSB

    REP STOSW

So what does REP do? It works very much like LOOP. It first checks CX, the "counter register", and if CX is greater than zero, it executes the STOSB or STOSW instruction. Then it decrements CX by one, and it repeats. So basically, REP STOSB is almost the same as...

@@StoreCharacterLoop:
    STOSB                              ; Store AL at [BYTE ES:DI], then
                                       ;  increment DI

    LOOP @@StoreCharacterLoop          ; Decrement CX and jump back to the
                                       ;  label if CX is not zero

...except for the way in which the "CX equals 0" situation is handled. Using REP, if CX equals 0, no characters are stored. Using LOOP, as in the above loop, STOSB runs once before LOOP ever looks at CX, and then LOOP decrements CX from 0 to 0FFFFh and keeps going, so 65,536 characters end up being stored.

Likewise, REP STOSW would be similar to...

@@StoreWordLoop:
    STOSW                              ; Store AX at [WORD ES:DI], then
                                       ;  increment DI by two

    LOOP @@StoreWordLoop               ; Decrement CX and jump back to the
                                       ;  label if CX is not zero

...with the same exception regarding the "CX equals 0" situation.
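
If you ever do need a hand-written loop that treats "CX equals 0" the same way REP does, one possible sketch is to test CX first with JCXZ (Jump if CX is Zero). The label names here are made up for the example:

    JCXZ @@SkipStore                   ; If CX = 0, store nothing at all

@@GuardedStoreLoop:
    STOSB                              ; Store AL at [BYTE ES:DI], then
                                       ;  update DI
    LOOP @@GuardedStoreLoop            ; Decrement CX and jump back to the
                                       ;  label if CX is not zero

@@SkipStore:
    ; Execution continues here in both cases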

Let's use REP STOSB to replace the loop in the string-filling loop we saw earlier. Here's the new version:

    ; Setup for the REP STOSB instruction:
    MOV AX, DS                         ; (intermediate)
    MOV ES, AX                         ; Let ES = DS (remember, you can't
                                       ;  copy one segment register
                                       ;  directly to another)

    MOV DI, OFFSET Greeting            ; Now ES:DI points to the start of
                                       ;  the Greeting string

    MOV AL, 'X'                        ; Here's the character that we're
                                       ;  going to store

    MOV CX, 28                         ; Set up our counter to iterate
                                       ;  28 times

    CLD                                ; Make sure our string instructions
                                       ;  go forward in memory

    ; Now use REP STOSB to repeatedly store the AL character...
    REP STOSB                          ; Store the character in AL to 28
                                       ;  consecutive bytes, starting at
                                       ;  ES:DI.  (ES:DI is modified.)

Again, you can use a debugger or a string-print routine to test that it works.

Now, when you use REP, this is definitely faster than the old loop method we used earlier. I'm going to explain why, but it's rather long, so you can skip the next section if you're in a hurry.

Here's the main reason why the REP method is faster: when the processor is executing a program, the processor needs to read in each instruction from memory so it knows what to do next. If it sees the machine-language code for REP STOSB or REP STOSW, then it has read in the instruction and it doesn't need to read any more instructions to finish this task. But if we use a loop, it has to read in STOSB, and then it executes that task, then it has to read in LOOP, and then it executes that task. Then if necessary, it jumps back to the label and reads in STOSB again, and then LOOP again, and on and on, for 28 times in the above examples. Each memory access takes a little bit of time.

Now, all the 80x86 processors, right from the original 8086 and 8088, have something called an instruction prefetch queue. To put it simply, the processor reads in more instructions whenever it has a little "spare time" (sometimes at the same time another instruction is executing), and it stores these "pre-fetched" instructions in a special buffer inside the processor. Then the processor can read the instructions out of the buffer, which is faster than a memory access. On the newer processors, the instruction prefetch queue works alongside a pipeline scheme, which lets the processor work on several instructions at once, as long as the instructions aren't competing for "resources", such as memory accesses, at the same time. (It's almost like multitasking, although on a very small scale.)

To make a long story short (or is it too late?), when the processor takes a jump, whether through a jump instruction or through an instruction like LOOP that performs a jump, it has to empty out the instruction prefetch queue. This is called flushing the queue. When this happens, the processor has to stop everything and immediately access memory, so that it can read in the instruction at the new CS:IP location. The pipeline is also flushed so that execution can begin on the newly fetched instruction. In our loop version of our string-filling program, we're doing all this 27 times. (27 times, because the last LOOP doesn't require a jump, so no flushing is required.)

In contrast, REP STOSB is basically one single instruction. While the REP STOSB instruction is being executed, the processor can suck more instructions into its instruction prefetch queue (at least whenever the REP STOSB instruction isn't accessing memory), and it can start working on those new instructions by putting them into the pipeline.

Let's take the string-filling code we just looked at, and let's modify it so that it uses REP STOSW. REP STOSW lets us store two bytes at once, so it's a little faster than REP STOSB. Our Greeting string was 28 bytes long; that's 14 words.

    ; Setup for the REP STOSW instruction:
    MOV AX, DS                         ; (intermediate)
    MOV ES, AX                         ; Let ES = DS (remember, you can't
                                       ;  copy one segment register
                                       ;  directly to another)

    MOV DI, OFFSET Greeting            ; Now ES:DI points to the start of
                                       ;  the Greeting string

    MOV AL, 'X'                        ; Here's one byte of the word we're
                                       ;  going to store
    MOV AH, 'X'                        ; Here's the other byte

    MOV CX, 14                         ; Set up our counter to iterate
                                       ;  14 times

    CLD                                ; Make sure our string instructions
                                       ;  go forward in memory

    ; Now use REP STOSW to repeatedly store the word in AX...
    REP STOSW                          ; Store the word in AX to 14
                                       ;  consecutive words, starting at
                                       ;  ES:DI.  (ES:DI is modified.)

That will have the same effect as the REP STOSB version.
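
REP STOSW only works out evenly when the number of bytes is even. As a sketch, assuming a hypothetical constant called FillLength that holds the number of bytes to fill, and assuming ES:DI, AL, and AH have been set up as in the code above, one common trick is to store as many whole words as possible and then store the leftover byte, if there is one:

    MOV CX, FillLength                 ; Number of bytes to fill (assumed
                                       ;  to be defined elsewhere with EQU)
    SHR CX, 1                          ; CX = number of whole words; the
                                       ;  leftover bit goes into CF
    CLD                                ; Go forward (CF is not affected)
    REP STOSW                          ; Store the words (STOSW leaves the
                                       ;  flags alone, so CF survives)
    ADC CX, CX                         ; CX is 0 here, so CX = 0 + 0 + CF
    REP STOSB                          ; Store the odd byte, if any

If FillLength is even, CX ends up as 0 and the final REP STOSB stores nothing.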

Remember little-endian storage? That's the name for storing binary numbers in a sort of backwards order in memory. For example, if we did this:

    ; Let ES:DI equal 1234:5678...
    MOV AX, 01234h
    MOV ES, AX
    MOV DI, 5678h

    ; And then store 0CAFEh at 1234:5678:
    MOV [WORD ES:DI], 0CAFEh

Because of little-endian storage, the FE hex byte will get stored at 1234:5678, and the CA hex byte will get stored at 1234:5679.

REP STOSW also obeys the little-endian storage rule. So if you use REP STOSW with AX set to 3344 hex, the 44 hex will get stored at the byte pointed to by ES:DI, and 33 hex will get stored at [ES:DI + 1]; then 44 hex will get stored at [ES:DI + 2], and 33 hex will get stored at [ES:DI + 3], and so on.
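
Here's a small sketch you could step through in a debugger to see the byte order for yourself. PatternBuffer is a name invented for this example:

    DATASEG

PatternBuffer                     DB   4 DUP (?)

    CODESEG
.
.
.
    MOV AX, DS
    MOV ES, AX                         ; Let ES equal DS
    MOV DI, OFFSET PatternBuffer       ; ES:DI points to the buffer

    MOV AX, 3344h                      ; AH = 33h, AL = 44h
    MOV CX, 2                          ; Store two words (four bytes)
    CLD                                ; Go forward in memory
    REP STOSW

    ; PatternBuffer now contains the bytes 44h, 33h, 44h, 33h, in that
    ;  order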

MOVSx and REP MOVSx

MOVSx (Move String) is similar to STOSx. STOSx stores a single byte or word in a set of sequential memory locations, but MOVSx can be used to copy a "string" of bytes or words from one set of sequential memory locations to another set of sequential memory locations.

STOSx takes the value in AL or AX and stores it at the location specified by ES:DI, and then updates DI (incremented or decremented by the appropriate number of bytes). But MOVSx reads the value at the location specified by DS:SI, and copies that value to the location specified by ES:DI. Then SI and DI are both updated appropriately.

More specifically, for the single instruction MOVSB, here is what happens: The byte at the address pointed to by DS:SI is retrieved. This byte is then stored at the address pointed to by ES:DI. Then, if the direction flag (DF) is 0, both SI and DI are incremented by 1. Otherwise, if the direction flag is 1, both SI and DI are decremented by 1.

And for the single instruction MOVSW, here is what happens: The word at the address pointed to by DS:SI is retrieved. This word is then stored at the address pointed to by ES:DI. Then, if the direction flag (DF) is 0, both SI and DI are incremented by 2 (two, because a word equals two bytes). Otherwise, if the direction flag is 1, both SI and DI are decremented by 2.

MOVSx can be used as a single instruction, but it is more useful when used with the REP prefix. REP works the same as it did with STOSx -- it acts as a loop, using CX as a counter. You use REP MOVSx like this:

1. Ensure ES:DI and DS:SI point to your desired locations.
2. Let the counter, CX, equal the number of times you want the MOVSx instruction to be repeated. (For example, if you want to copy 30 bytes with MOVSB, place 30 in CX, or if you want to copy 20 words with MOVSW, place 20 in CX.)
3. If necessary, use CLD or STD to set the direction flag appropriately. (If DF is already set appropriately, there's no need to change it again.)
4. Specify either REP MOVSB or REP MOVSW.

Of course, steps 1 through 3 could be done in any order.

For example, let's copy a string from one location to another:

    DATASEG

StringA                           DB   "Every good boy deserves favor.$"
StringB                           DB   31 DUP (?)

    CODESEG
.
.
.
    ; Let DS:SI point to the "source string", StringA:
    ; (Since StringA is in the data segment, DS is already correct)
    MOV SI, OFFSET StringA

    ; Let ES:DI point to the "destination string", StringB:
    MOV AX, DS
    MOV ES, AX                         ; Let ES equal DS
    MOV DI, OFFSET StringB

    ; We want to copy 31 characters (the length of StringA is 31
    ;  characters):
    MOV CX, 31

    ; Ensure that string operations go forward in memory:
    CLD

    ; Copy StringA to StringB, byte by byte:
    REP MOVSB

That section of code will copy the contents of the string StringA to the space reserved by StringB. The 31 bytes reserved at StringB get overwritten with the 31 characters from StringA (the "$" terminator included).

Just as with REP STOSx, if CX is zero, nothing happens.
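
The direction flag really matters when the source and destination overlap. As a sketch (ShiftBuffer is a name made up for this example), suppose we want to shift five characters up by one byte to open a gap at the front. If we copied forward, the first byte copied would overwrite the second character before it had been read, so we start at the end and go backward instead:

    DATASEG

ShiftBuffer                       DB   "ABCDE", ?  ; 5 chars + 1 spare byte

    CODESEG
.
.
.
    MOV AX, DS
    MOV ES, AX                         ; Let ES equal DS

    MOV SI, OFFSET ShiftBuffer + 4     ; DS:SI points to the last source
                                       ;  character, the "E"
    MOV DI, OFFSET ShiftBuffer + 5     ; ES:DI points to the spare byte

    MOV CX, 5                          ; Copy 5 characters
    STD                                ; Go backward in memory
    REP MOVSB
    CLD                                ; Go forward again afterwards

    ; ShiftBuffer now contains "AABCDE"; the first byte can be
    ;  overwritten with a new character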

LODSx

LODSx, for Load String, can be used to repeatedly load elements of a set of data into the accumulator. In other words, LODSB can be used to load a series of bytes into AL, and LODSW can be used to load a series of words into AX. Because of the way they work, LODSx instructions are usually used within "normal" loops (that is, loops constructed with LOOP or conditional-jump instructions). REP LODSB and REP LODSW are legal, but they are rarely useful: each repetition overwrites the accumulator, so when the instruction finishes, only the last element loaded is left in AL or AX (and SI has been moved past it).

Here's what the LODSB instruction does:

The byte at the location pointed to by DS:SI is retrieved; this byte is stored in AL.
If the direction flag (DF) is 0, SI is incremented by 1; otherwise, if the direction flag is 1, SI is decremented by 1.

And here's what the LODSW instruction does:

The word at the location pointed to by DS:SI is retrieved; this word is stored in AX.
If the direction flag (DF) is 0, SI is incremented by 2; otherwise, if the direction flag is 1, SI is decremented by 2.

LODSB and LODSW are rather simple. They can be used in a loop so that some kind of processing can be performed on each of the bytes or words in a string.

As an example, let's write a complete program that encrypts a string using the Caesar cipher. In the Caesar cipher, letters are "shifted" by some constant. If the constant is 3, then the character "A" will become "D" (A -> B -> C -> D), the character "B" will become "E" (B -> C -> D -> E), the character "C" will become "F" (C -> D -> E -> F), and so on. Of course, this is easy to implement: we just take the ASCII code of a character, and add the constant, 3, to get the encrypted character.

Technically, the Caesar cipher states that only the capital letters A through Z exist in the alphabet, and wrap-around is expected to occur on this alphabet (e.g., the character "Y" would become "B": Y -> Z -> A -> B) -- but let's be lazy and ignore this technicality. (Well, it will wrap around past ASCII 255 anyway.)

Here we go:

------ TEST14.ASM begins ------

%TITLE "Assembler Test Program 14 -- Caesar Cipher, using LODSB"

    IDEAL

    MODEL small
    STACK 256
    LOCALS

    DATASEG

PlainText                         DB   "Meet me at the park at midnight."
                                  DB   13, 10, "$" ; CR, LF, DOS end-of-str.
CipherText                        DB   32 DUP (?)
                                  DB   13, 10, "$" ; CR, LF, DOS end-of-str.
CaesarShiftConstant               EQU  3


    CODESEG
Start:
    ; Let DS point to the data segment, to make variables addressable:
    MOV AX, @data
    MOV DS, AX


    ; Let DS:SI point to the start of the PlainText string:
    ; (Since PlainText is in the data segment, DS is already correct)
    MOV SI, OFFSET PlainText

    ; Let ES:DI point to the start of CipherText:
    MOV AX, DS
    MOV ES, AX                         ; Let ES = DS
    MOV DI, OFFSET CipherText

    ; Ensure that string operations go forward in memory:
    CLD

    ; PlainText is 32 characters long (we don't want to convert the
    ;  control codes), and we want to operate on each character, so
    ;  let's use a loop that iterates 32 times:
    MOV CX, 32

@@CaesarLoop:
    ; Read in the next character in the string, and place it in AL:
    LODSB

    ; Operate on this character by adding the constant:
    ADD AL, CaesarShiftConstant

    ; Store the resulting character in the destination string
    ;  (CipherText), at the position specified by ES:DI:
    MOV [ES:DI], AL

    ; And increment DI... (note that SI was already incremented by LODSB)
    INC DI

    LOOP @@CaesarLoop

    ; Print out the CipherText string, using INT 21h, Service 9:
    MOV DX, OFFSET CipherText
    MOV AH, 9
    INT 21h

TerminateProgram:
    MOV AX, 04C00h
    INT 21h
END Start

------ TEST14.ASM ends ------
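
Since ES:DI already points into CipherText, the MOV and INC DI pair inside the loop could also be written with STOSB. Here's how the loop might look with that change (a sketch of an alternative, with everything else in the program left as it is):

@@CaesarLoop:
    LODSB                              ; AL = [BYTE DS:SI], then SI is
                                       ;  incremented
    ADD AL, CaesarShiftConstant        ; Shift the character
    STOSB                              ; [BYTE ES:DI] = AL, then DI is
                                       ;  incremented
    LOOP @@CaesarLoop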

To decrypt an encrypted string, just change the program to subtract instead of add (or, even easier: use a negative constant, like -3). Also, for a decrypting version of the program, the names PlainText and CipherText should be swapped.

(Needless to say, don't use the Caesar cipher to protect any really important messages. It's easy to break the code using pencil and paper!)

Note that there is no requirement to use ES:DI in the loop containing LODSx. For example, if you wanted to add a list of word-sized values, using a loop with LODSW in it, there would be no need to ever use ES:DI.
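
For instance, here's a minimal sketch of that word-adding idea. WordList and WordCount are names invented for this example, and DS is assumed to already point to the data segment:

    DATASEG

WordList                          DW   100, 200, 300, 400
WordCount                         EQU  4

    CODESEG
.
.
.
    MOV SI, OFFSET WordList            ; DS:SI points to the first word
    MOV CX, WordCount                  ; Number of words to add up
    XOR BX, BX                         ; BX holds the running total
    CLD                                ; Go forward in memory

@@SumLoop:
    LODSW                              ; AX = [WORD DS:SI], then SI is
                                       ;  incremented by two
    ADD BX, AX                         ; Add this word to the total
    LOOP @@SumLoop

    ; BX now contains 1000 (03E8h)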

CMPSx and REPE/REPZ CMPSx and REPNE/REPNZ CMPSx

CMPSx, or Compare String, can be used to compare two strings in memory. A CMPSB or CMPSW instruction can be used alone, or it can be used with one of the prefixes REPE, REPZ, REPNE, or REPNZ.

The individual instruction CMPSB does this:

The byte at DS:SI is retrieved -- this is the first byte.
The byte at ES:DI is retrieved -- this is the second byte.
The second byte is subtracted from the first byte; the result is discarded, but the flags resulting from the subtraction are saved.
If the direction flag (DF) is 0, SI and DI are incremented by 1; otherwise, if the direction flag is 1, SI and DI are decremented by 1.

The individual instruction CMPSW does this:

The word at DS:SI is retrieved -- this is the first word.
The word at ES:DI is retrieved -- this is the second word.
The second word is subtracted from the first word; the result is discarded, but the flags resulting from the subtraction are saved.
If the direction flag (DF) is 0, SI and DI are incremented by 2; otherwise, if the direction flag is 1, SI and DI are decremented by 2.

Remember how instructions such as SUB set or clear certain flags depending on what happens during the operation -- for example, if the result turns out to be zero, then the zero flag (ZF) is set, and if the result is non-zero, then ZF is cleared. Remember also that the CMP instruction was basically the same as SUB, except CMP "throws away" the result and only modifies the flags. Well, CMPSB and CMPSW are just like CMP! In fact, CMPSB is basically the same as:

    MOV AL, [ES:DI]
    CMP [DS:SI], AL
    (then update SI and DI)

And CMPSW is essentially the same as:

    MOV AX, [ES:DI]
    CMP [DS:SI], AX
    (then update SI and DI)

But the CMPSx instructions are more useful when combined with one of the repeat prefixes REPE, REPZ, REPNE, or REPNZ. Actually, REPE and REPZ are identical, and REPNE and REPNZ are identical. The duplicate names are provided for your convenience (or frustration!).

REPE and REPZ are like REP in that they use CX as a counter to determine the number of times the loop should iterate. But REPE and REPZ consider the initial value in CX to be the maximum number of iterations. That's because REPE and REPZ also perform another test during each iteration: If the zero flag (ZF) equals zero, the loop is immediately stopped.

REPNE and REPNZ are somewhat opposite to REPE and REPZ. REPNE and REPNZ both use the initial value of CX as the maximum number of iterations for the loop, just as REPE and REPZ do. But REPNE and REPNZ do the following second test instead: If the zero flag (ZF) equals 1, then the loop is immediately stopped.

I know this all sounds totally useless, but it is indeed useful for something! These can be used to compare two strings of equal lengths. We can determine if the two strings are equal, or if one is greater or less than the other. How do we do this? Well, we can use a repeat-prefix with a CMPSx instruction, and then afterwards, we can test the zero flag (or check CX) to see which condition terminated the loop.

REPE CMPSB (and REPZ CMPSB) will keep going until it finds a mismatch between two corresponding characters in the two strings (or until CX runs out). So you can use REPE CMPSB or REPZ CMPSB to scan two strings until they are found to be unequal:

Test the zero flag (using JZ or JNZ) immediately after the REPE/REPZ CMPSB instruction.
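If the zero flag is clear, then that means that the result of the last subtraction was non-zero, and that means that a mismatch was found. DS:SI and ES:DI will point to the characters immediately following (or preceding, depending on the direction flag) the mismatched characters. (Note that only the first mismatch is spotted.) The other flags still describe that last comparison, so you can also tell which string is greater. For plain 7-bit ASCII text, you can check the sign flag (using JS or JNS): if the sign flag is set, then the ES:DI string is greater than the DS:SI string; if the sign flag is clear, then the DS:SI string is greater than the ES:DI string. If the data might contain byte values above 127, test the carry flag instead (using JB or JA), because the sign flag can give the wrong answer when the two bytes differ by more than 127.
If the zero flag is set, then the two strings must be equal. (Test the zero flag rather than CX: CX also reaches zero when the mismatch is in the very last pair of characters. And if CX was zero to begin with, nothing is compared and the flags are left unchanged.)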

Of course, REPE CMPSW and REPZ CMPSW will do the same, except using words instead of characters.
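
Because SI and DI end up one position past the mismatch, you have to back up by one to look at the characters that actually differed. Here's a sketch of that, assuming DS:SI, ES:DI, and CX were set up beforehand and that DF is 0 (the label name is made up for the example):

    REPE CMPSB                         ; Compare until a mismatch is found
                                       ;  or CX runs out
    JE @@NoMismatch                    ; ZF = 1: no mismatch was found

    DEC SI                             ; Back up to the mismatched
    DEC DI                             ;  characters (use INC if DF is 1)
    MOV AL, [SI]                       ; AL = character from the DS:SI
                                       ;  string
    MOV AH, [ES:DI]                    ; AH = character from the ES:DI
                                       ;  string

@@NoMismatch:
    ; Execution continues here in both cases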

REPNE CMPSB (and REPNZ CMPSB) will keep going until it finds a pair of characters in the two strings that are equal. So these instructions will search two strings (or lists) until a matching pair is found:

Test the zero flag (using JZ or JNZ) immediately after the REPNE/REPNZ CMPSB instruction.
If the zero flag is set, the first matching pair was found. DS:SI and ES:DI will point to the characters following (or preceding, depending on the direction flag) the matches.
If the zero flag is clear, both strings were scanned and no matching characters were found. (Again, test the zero flag rather than CX: CX also reaches zero when the match is in the very last pair of characters.)

Again, REPNE CMPSW and REPNZ CMPSW will do the same thing, except using words instead of characters.
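
Here's a short sketch of REPNE CMPSB at work. Remember that REPNE stops when ZF equals 1, so a set zero flag afterwards means a matching pair was found. ListA, ListB, and the label name are invented for this example, and DS is assumed to already point to the data segment:

    DATASEG

ListA                             DB   1, 2, 3, 4
ListB                             DB   9, 8, 3, 7

    CODESEG
.
.
.
    MOV SI, OFFSET ListA               ; DS:SI points to the first list
    MOV AX, DS
    MOV ES, AX                         ; Let ES equal DS
    MOV DI, OFFSET ListB               ; ES:DI points to the second list

    MOV CX, 4                          ; Check at most 4 pairs
    CLD                                ; Go forward in memory
    REPNE CMPSB                        ; Compare until a pair matches or
                                       ;  CX runs out
    JNE @@NoMatchingPair               ; ZF = 0: no pair matched

    MOV BX, SI                         ; SI is one past the matching byte
    SUB BX, OFFSET ListA + 1           ; BX = position of the match,
                                       ;  counting from 0 (here, BX = 2)

@@NoMatchingPair:
    ; Execution continues here in both cases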

Let's try an example.

------ TEST15.ASM begins ------

%TITLE "Assembler Test Program 15: Comparing two equal-length strings"

    IDEAL

    MODEL small
    STACK 256
    LOCALS

    DATASEG

; Change these strings to test the different cases:
StringA                           DB   "May the forks be with you."
StringB                           DB   "May the forts be with you."

; Messages:
Msg_StringsAreEqual               DB   "The strings are equal.", 13, 10, '$'
Msg_StringAIsGreater              DB   "String A is greater.", 13, 10, '$'
Msg_StringBIsGreater              DB   "String B is greater.", 13, 10, '$'


    CODESEG
Start:
    ; Let DS point to the data segment, so that variables are addressable:
    MOV AX, @data
    MOV DS, AX
    

    ; Let DS:SI point to the first string:
    ; (StringA is in the data segment, so DS is already correct.)
    MOV SI, OFFSET StringA

    ; Let ES:DI point to the second string:
    MOV AX, DS
    MOV ES, AX                         ; Let ES equal DS
    MOV DI, OFFSET StringB

    ; Ensure that string operations go forward in memory:
    CLD

    ; The maximum number of characters to check is 26 (the maximum length
    ;  of the two strings):
    MOV CX, 26

    ; Compare the two strings:
    REPE CMPSB

    ; Now, what happened?
    JZ @@StringsAreEqual               ; If ZF = 1, then strings are equal
    JMP @@StringsAreInequal            ; Otherwise, jump...

@@StringsAreEqual:
    ; Display a message using INT 21h, Service 9:
    MOV DX, OFFSET Msg_StringsAreEqual
    MOV AH, 9
    INT 21h
    JMP @@Bypass

@@StringsAreInequal:
    ; Which is greater, StringA or StringB?
    JB @@BIsGreater                    ; Unsigned test (CF = 1): StringA's
                                       ;  byte is below StringB's byte

    ; Say that StringA is greater, using INT 21h, Service 9:
    MOV DX, OFFSET Msg_StringAIsGreater
    MOV AH, 9
    INT 21h
    JMP @@Bypass

@@BIsGreater:
    MOV DX, OFFSET Msg_StringBIsGreater
    MOV AH, 9
    INT 21h

@@Bypass:
    ; Nothing else to do...

@@TerminateProgram:
    MOV AX, 04C00h
    INT 21h
END Start

------ TEST15.ASM ends ------

SCASx and REPE/REPZ SCASx and REPNE/REPNZ SCASx

SCASx, for Scan String, can be used to search for a particular value in a list of bytes or words. You can use SCASB to search for a particular character in a string, for example.

SCASB will search the string at ES:DI, looking for the character stored in AL. SCASW will search the string of words starting at ES:DI, looking for the word in AX. (The choice of ES:DI is puzzling, as the other string instructions generally read from DS:SI instead...)

The single instruction SCASB does this:

The byte at the location specified by ES:DI is retrieved.
This byte is subtracted from AL, but the result is discarded. The flags are saved, however.
If the direction flag (DF) is 0, DI is incremented by 1; otherwise, if the direction flag is 1, DI is decremented by 1.

And the single instruction SCASW does this:

The word at the location specified by ES:DI is retrieved.
This word is subtracted from AX, but the result is discarded. The flags are saved, however.
If the direction flag (DF) is 0, DI is incremented by 2; otherwise, if the direction flag is 1, DI is decremented by 2.

As you might expect, SCASB and SCASW are more useful when combined with repeat prefixes. Just as with CMPSx, we can use REPE, REPZ, REPNE, and REPNZ with SCASx. (Recall again that REPE and REPZ are identical, and REPNE and REPNZ are identical.)

You can use REPNE SCASB or REPNZ SCASB to search a string to find the first occurrence of the character specified in AL. We can test the zero flag (ZF) after the REPNE/REPNZ SCASB instruction to determine whether a matching character was found, or whether the string was searched and no matching character was found.

Test the zero flag (using JZ or JNZ) immediately after the REPNE/REPNZ SCASB instruction.
If the zero flag is set, then the character has been found in the string. ES:DI will point to the character immediately following (or preceding, depending on the direction flag) the matching character.
If the zero flag is clear, then the entire string has been searched and the character in AL was not found. (Test the zero flag rather than CX: CX also reaches zero when the match is the very last character of the string.)

As you would expect, REPNE SCASW and REPNZ SCASW are the same as the above, except that they operate on words instead of bytes.
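
One handy use of REPNE SCASB is measuring a terminated string. Here's a sketch that finds the length of the Greeting string from earlier in this chapter by searching for its "$" terminator. Loading CX with 0FFFFh is just an assumed way of saying "search as far as necessary", and the label name is made up:

    MOV AX, DS
    MOV ES, AX                         ; Let ES equal DS
    MOV DI, OFFSET Greeting            ; ES:DI points to the string

    MOV AL, '$'                        ; The character to look for
    MOV CX, 0FFFFh                     ; Maximum number of bytes to scan
    CLD                                ; Go forward in memory
    REPNE SCASB                        ; Scan until "$" is found or CX
                                       ;  runs out
    JNE @@TerminatorNotFound           ; ZF = 0: there was no "$"

    NOT CX                             ; CX = number of bytes scanned,
                                       ;  including the "$"
    DEC CX                             ; CX = length without the "$"
                                       ;  (28 for Greeting)

@@TerminatorNotFound:
    ; Execution continues here in both cases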

You can use REPE SCASB or REPZ SCASB to search a string to find the first character that doesn't match the character specified in AL:

Test the zero flag (using JZ or JNZ) immediately after the REPE/REPZ SCASB instruction.
If the zero flag is clear, then a character has been found in the string that doesn't match the character in AL. ES:DI will point to the character immediately following (or preceding, depending on the direction flag) the character that caused the loop to stop.
If the zero flag is set, then the entire string has been searched and the string consists entirely of the character in AL. (Test ZF rather than CX here: CX is also zero when the non-matching character was the very last one in the string.)

And again, REPE SCASW and REPZ SCASW are the same as the above, except that they operate on words instead of bytes.
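
A typical use for REPE SCASB is skipping over leading spaces. The fragment below is a sketch under the same assumptions as before (ES:DI points to the string, CX holds its length, the direction flag is clear); the label @@AllSpaces is invented for the example:

    MOV AL, ' '                        ; The character to skip over
    JCXZ @@AllSpaces                   ; Empty string: nothing to scan
    REPE SCASB                         ; Scan while the bytes equal AL
    JE @@AllSpaces                     ; ZF = 1: nothing but spaces
    DEC DI                             ; ES:DI -> first non-space character
    INC CX                             ; CX = characters left, counting
                                       ;  the one at ES:DI
    ; (Process the rest of the string here, then JMP past @@AllSpaces)
@@AllSpaces:

The DEC DI and INC CX undo the one extra step that SCASB took past the character that stopped the loop.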

Let's write a quick program that searches a string and reports whether or not a particular letter appears in that string:

------ TEST16.ASM begins ------

%TITLE "Assembler Test Program 16: Searching a string for a character"

    IDEAL

    MODEL small
    STACK 256
    LOCALS

    DATASEG

SearchString                      DB   "London Paris Rome Berlin Madrid"
CharacterToFind                   EQU  'W'

; Messages:
Msg_CharExistsInString            DB   "Yes, that character exists in the "
                                  DB   "string.", 13, 10, '$'
Msg_CharNotFound                  DB   "No, that character was not found "
                                  DB   "in the string.", 13, 10, '$'


    CODESEG
Start:
    ; Let DS point to the data segment, so that variables are addressable:
    MOV AX, @data
    MOV DS, AX
    

    ; Let ES:DI point to the string to search:
    MOV AX, DS
    MOV ES, AX                         ; Let ES equal DS
    MOV DI, OFFSET SearchString

    ; Ensure that string operations go forward in memory:
    CLD

    ; The maximum number of characters to check is 31 (the maximum length
    ;  of the string):
    MOV CX, 31

    ; The character we're looking for is...
    MOV AL, CharacterToFind

    ; Scan the string for the character in AL:
    REPNE SCASB

    ; Now, what happened?
    JZ @@CharFound                  ; If ZF = 1, the character was found
                                    ; (Otherwise, fall through)

@@CharNotFound:
    ; Display the not-found message using INT 21h, Service 9:
    MOV DX, OFFSET Msg_CharNotFound
    MOV AH, 9
    INT 21h
    JMP @@Bypass

@@CharFound:
    ; Display the character-was-found message using INT 21h, Service 9:
    MOV DX, OFFSET Msg_CharExistsInString
    MOV AH, 9
    INT 21h

@@Bypass:
    ; Nothing else to do...

@@TerminateProgram:
    MOV AX, 04C00h
    INT 21h
END Start

------ TEST16.ASM ends ------

Experiment by changing the SearchString and CharacterToFind and verifying the results.
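
If you change SearchString, remember that the count loaded into CX has to change too. One way around that, sketched here on the assumption that your assembler accepts the = directive and the $ location counter as Turbo Assembler does, is to let the assembler work out the length. SearchStringLength is a name invented for this example, and the = line must come directly after the string's DB line:

SearchString                      DB   "London Paris Rome Berlin Madrid"
SearchStringLength                =    $ - SearchString

    ; ...and later, in the code segment:
    MOV CX, SearchStringLength         ; 31 for the string above

Here $ stands for the current offset in the data segment, so subtracting the offset of SearchString gives the number of bytes the string occupies.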

Using string instructions with Mode 13h

I congratulate you for having the patience to read through all this incredibly boring text. Let's try using one of the string instructions with Mode 13h programming.

To clear the screen, we would traditionally set up a loop that would go through all the pixels on the display and set them to a single color. Since there are 64000 pixels in the Mode 13h display, (320 * 200 = 64000), we might set CX to 64000 and use the LOOP construct, so that 64000 consecutive pixels are plotted.

Then we could optimize that a little by using words instead of bytes. Instead of writing 64000 bytes, we could write 32000 words. The word would consist of two bytes, each of which would equal the color to clear the screen with.
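
For comparison, the traditional byte-at-a-time loop might look something like this. It is only a sketch, assuming we're already in Mode 13h; the label @@NextPixel is made up for the example:

    MOV AX, 0A000h
    MOV ES, AX                         ; ES = video segment
    XOR DI, DI                         ; DI = 0000h
    MOV AL, 2                          ; The color to clear with
    MOV CX, 64000                      ; One iteration per pixel
@@NextPixel:
    MOV [ES:DI], AL                    ; Plot one pixel
    INC DI                             ; Move to the next pixel
    LOOP @@NextPixel

The word version would load the color into both AL and AH, store AX instead of AL, advance DI by 2, and count to 32000.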

Now, we can use the REP STOSB or REP STOSW instructions to do the same task, but faster. So let's use REP STOSW to clear the screen:

Imagine we're in Mode 13h already. Then we would want to set up ES:DI to point to the start of video memory (that is, A000:0000), which we could do like this:

    MOV AX, 0A000h                     ; Or use a constant like VideoSegment
    MOV ES, AX                         ; ES = 0A000h
    XOR DI, DI                         ; DI = 0000h

Now, let's clear the screen using color number 2, which is normally green (unless you've played with the palette). If we were using REP STOSB, we'd simply put 2 in AL. But we're using REP STOSW, so we'll put the number 2 in AL and AH:

    MOV AX, 0202h                      ; AH = 02h, AL = 02h

(Note that putting different values in AL and AH will produce a vertical stripe effect -- try it!)
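
Here is a quick sketch of that stripe effect, assuming we're in Mode 13h with the default palette (where color 4 is red). STOSW writes AL to the lower address and AH to the higher one, and since the screen is an even number of pixels wide, the two colors line up in alternating columns:

    MOV AX, 0A000h
    MOV ES, AX                         ; ES = video segment
    XOR DI, DI                         ; DI = 0000h
    MOV AX, 0402h                      ; AL = 2 (even columns, green),
                                       ;  AH = 4 (odd columns, red)
    MOV CX, 32000                      ; 32000 words = 64000 pixels
    CLD                                ; Go forward in memory
    REP STOSW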

Then we should set up the counter using the CX register. We're using words to fill the screen, so, as described previously, we'll need to count to 32000:

    MOV CX, 32000

Now we're ready to go:

    REP STOSW

There! Assuming you're in Mode 13h, that should clear the screen in green. And it should be a good deal faster than using LOOP.

Here's that code in an actual program. I stole the Mode 13h program from the previous tutorial and added a ClearScreen procedure, and then I put in some new main-program code to test the new procedure (the program will clear the screen using 16 different colors, waiting for a keypress between each one):

------- TEST17.ASM begins -------

%TITLE "Assembler Test Program 17 -- Clearing Mode 13h screen w/ REP STOSW"

    IDEAL

    MODEL small
    STACK 256
    LOCALS
    
    DATASEG

VideoSegment                      EQU  0A000h
Mode13h_ScreenWidth               EQU  320
Mode13h_ScreenHeight              EQU  200


    CODESEG

; (PutPixel macro removed -- it's not used in this program)

; -------------------------------------------------------------------------
; Quick utility macro:

MACRO WaitForKeypress
    PUSH AX

    ; Wait for a keypress (and discard it), using INT 21h, Service 8:
    MOV AH, 8
    INT 21h

    POP AX
ENDM
; -------------------------------------------------------------------------


Start:
    ; Make data segment variables addressable:
    MOV AX, @data
    MOV DS, AX

    CALL SetMode13h


    ; Count down from 15:
    MOV CX, 15

@@BigColorLoop:
    ; Use the current value of CL as the color number (but we can't push
    ;  a byte on the stack, so we'll use the whole word, CX):
    PUSH CX                            ; Parameter: color number
    CALL ClearScreen
    WaitForKeypress

    LOOP @@BigColorLoop


    ; Clear the screen for the last time; CX should be 0, which is black:
    PUSH CX
    CALL ClearScreen
    WaitForKeypress

    CALL SetTextMode

    ; Terminate program:
    MOV AX, 04C00h
    INT 21h

; -------------------------------------------------------------------------


PROC SetMode13h
    PUSH AX

    ; Use INT 10h, Service 0 to set the screen mode to Mode 13h:
    MOV AH, 0
    MOV AL, 13h
    INT 10h

    POP AX
    RET
ENDP


PROC SetTextMode
    PUSH AX

    ; Use INT 10h, Service 0 to set the screen mode to text mode (Mode 3):
    MOV AH, 0
    MOV AL, 3
    INT 10h

    POP AX
    RET
ENDP


; -------------------------------------------------------------------------
; ClearScreen
; -------------------------------------------------------------------------
; Desc: Clears the Mode 13h screen with a specified color.
;  Pre: Before calling this procedure, push onto the stack the
;       byte-sized color number to use.
; Post: The Mode 13h screen is cleared using the color parameter.
; -------------------------------------------------------------------------
PROC ClearScreen
    ARG @@Color:BYTE = @@ArgBytesUsed

    PUSH BP                            ; Save BP
    MOV BP, SP                         ; Allow parameters to be addressed

    ; Save affected registers and flags:
    PUSH AX
    PUSH CX
    PUSH ES
    PUSH DI
    PUSHF

    ; Ensure that the string operation goes forward in memory:
    CLD

    ; Let ES:DI point to the start of the video segment (ie. A000:0000):
    MOV AX, VideoSegment
    MOV ES, AX                         ; ES = VideoSegment
    XOR DI, DI                         ; DI = 0000h

    ; Load the color parameter into AL and AH:
    MOV AL, [@@Color]
    MOV AH, [@@Color]

    ; Let CX equal what should be 32000:
    MOV CX, (Mode13h_ScreenWidth * Mode13h_ScreenHeight / 2)
    ; (Yes, that's basically a constant on the right (it always
    ;  evaluates to the same thing), so it's permitted)

    ; Fill the screen with the specified color:
    REP STOSW
    
    ; Restore affected registers and flags:
    POPF
    POP DI
    POP ES
    POP CX
    POP AX

    POP BP                             ; Restore BP
    RET @@ArgBytesUsed
ENDP
; -------------------------------------------------------------------------

END Start

------- TEST17.ASM ends -------

That's one really good use of REP STOSW. Can any of the other string instructions be used for graphics programming?

Well, the REP MOVSx instruction can be used to copy a chunk of graphics data from one location to another -- we could use REP MOVSB or REP MOVSW in a sprite-drawing routine. And when we learn about Mode X later, we'll see how REP MOVSx can be used effectively for page flipping (a great animation technique). In fact, you can simulate page flipping in Mode 13h, and REP MOVSx would be a great way to handle it -- but I'm not going to bother dealing with it here (Mode X is better!).
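
As a rough sketch of the idea, here is how REP MOVSW might copy a full-screen, 64000-byte off-screen buffer into video memory. It assumes we're in Mode 13h and that the buffer starts at offset 0 of its own segment; BufferSegment is a hypothetical word variable in the data segment holding that segment's address:

    PUSH DS                            ; We're about to change DS

    MOV AX, 0A000h
    MOV ES, AX                         ; ES:DI = destination (the screen)
    XOR DI, DI

    MOV AX, [BufferSegment]            ; Read the variable while DS still
    MOV DS, AX                         ;  points to the data segment
    XOR SI, SI                         ; DS:SI = source (the buffer)

    MOV CX, 32000                      ; 32000 words = 64000 bytes
    CLD                                ; Go forward in memory
    REP MOVSW

    POP DS                             ; Data segment addressable again

Note that the variable has to be read before DS is changed, since after that point DS no longer points to the data segment.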

How about the other string instructions? Admittedly, they're perhaps not as useful. I could see myself perhaps using LODSx to handle transparent pixels when drawing sprites -- LODSx could be used to retrieve more pixel data, and once a pixel's color is in the accumulator (AL or AX), it could be compared easily... And I'm sure CMPSx and SCASx could be used for some clever graphics manipulation, somehow.
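
To make the LODSx idea a little more concrete, here is one way a single row of a sprite could be drawn with transparency. This is only a sketch, not a finished routine: it assumes DS:SI points to the row of sprite data, ES:DI points to the destination in video memory, CX holds the (non-zero) width of the row, the direction flag is clear, and color 0 means "transparent". The labels are invented for the example:

@@NextSpritePixel:
    LODSB                              ; AL = byte at DS:SI, SI = SI + 1
    OR AL, AL                          ; Is the color 0 (transparent)?
    JZ @@SkipPixel                     ; If so, leave the screen alone
    MOV [ES:DI], AL                    ; Otherwise, plot the pixel
@@SkipPixel:
    INC DI                             ; Next screen position either way
    LOOP @@NextSpritePixel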

Summary

We've finally completed the dullest tutorial yet! In this chapter, we learned about the string instructions STOSx, MOVSx, LODSx, CMPSx, and SCASx, and the repeat prefixes REP, REPE, REPZ, REPNE, and REPNZ. Then we saw an example of how string instructions can be used in Mode 13h graphics programming.

The next tutorial will cover all the painful details involved in combining assembler code with C/C++ code.