[Chapter Fifteen][Previous]
[Next] [Art of
Assembly][Randall Hyde]
Art of Assembly: Chapter Fifteen
- 15.2 - Character Strings
- 15.2.1 - Types of Strings
- 15.2.2 - String Assignment
- 15.2.3 - String Comparison
15.2 Character Strings
Since you'll encounter character strings more often than other types
of strings, they deserve special attention. The following sections describe
character strings and various types of string operations.
15.2.1 Types of Strings
At the most basic level, the 80x86's string instructions operate only
upon arrays of characters. However, since most string data types contain
an array of characters as a component, the 80x86's string instructions are
handy for manipulating that portion of the string.
Probably the biggest difference between a character string and an array
of characters is the length attribute. An array of characters contains a
fixed number of characters. Never any more, never any less. A character
string, however, has a dynamic run-time length, that is, the number of characters
contained in the string at some point in the program. Character strings,
unlike arrays of characters, have the ability to change their size during
execution (within certain limits, of course).
To complicate things even more, there are two generic types of strings:
statically allocated strings and dynamically allocated strings. Statically
allocated strings are given a fixed, maximum length at program creation
time. The length of the string may vary at run-time, but only between zero
and this maximum length. Dynamically allocated strings, by contrast, get
their storage at run time; most systems allocate them from, and return them
to, a memory pool as the program uses them. Such strings may be any length
(up to some reasonable maximum value). Accessing such strings is usually
less efficient than accessing statically allocated strings. Furthermore,
garbage collection[5] may take additional time.
Nevertheless, dynamically allocated strings are much more space efficient
than statically allocated strings and, in some instances, accessing dynamically
allocated strings is faster as well. Most of the examples in this chapter
will use statically allocated strings.
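As a rough illustration of the statically allocated case, the following declaration reserves room for a string of up to 80 characters that starts out empty. It keeps the current length in a leading byte, one of the representations described below. The names MaxLen and UserStr, and the 80-character limit, are assumptions made up for this sketch; they do not come from the text.

```asm
MaxLen          =       80              ;Hypothetical maximum length, fixed at assembly time.

UserStr         byte    0               ;Current (run-time) length; the string starts out empty.
                byte    MaxLen dup (?)  ;Storage for up to MaxLen characters.
```

The storage never grows beyond MaxLen characters; only the length byte changes as the program runs.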
A string with a dynamic length needs some way of keeping track of this length.
While there are several possible ways to represent string lengths, the two
most popular are length-prefixed strings and zero-terminated strings. A
length-prefixed string begins with a single byte or word that contains the
length of that string. Immediately following this length value are the
characters that make up the string. Assuming the use of byte prefix lengths,
you could define the string "HELLO" as follows:
HelloStr byte 5,"HELLO"
Length-prefixed strings are often called Pascal strings since this is the
type of string variable supported by most versions of Pascal[6].
Another popular way to specify string lengths is to use zero-terminated
strings. A zero-terminated string consists of a string of characters terminated
with a zero byte. These types of strings are often called C-strings since
they are the type used by the C/C++ programming language. The UCR Standard
Library, since it mimics the C standard library, also uses zero-terminated
strings.
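For comparison with the length-prefixed HelloStr declaration above, here is a sketch of the same text stored as a zero-terminated string, along with a variant that uses a word-sized length prefix. The labels HelloZStr and HelloWStr are hypothetical names chosen for this illustration.

```asm
HelloZStr       byte    "HELLO",0       ;Five characters followed by a zero byte.

HelloWStr       word    5               ;Word-sized length prefix (two bytes).
                byte    "HELLO"         ;The characters follow the length.
```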
Pascal strings are much better than C/C++ strings for several reasons. First,
computing the length of a Pascal string is trivial. You need only fetch
the first byte (or word) of the string and you've got the length of the
string. Computing the length of a C/C++ string is considerably less efficient.
You must scan the entire string (e.g., using the scasb instruction)
for a zero byte. If the C/C++ string is long, this can take a long time.
Furthermore, C/C++ strings cannot contain the NUL character (a zero byte),
since that byte marks the end of the string. On the other
hand, C/C++ strings can be any length, yet require only a single extra byte
of overhead. Pascal strings, however, can be no longer than 255 characters
when using only a single length byte. For strings longer than 255 bytes,
you'll need two bytes to hold the length for a Pascal string. Since most
strings are less than 256 characters in length, this isn't much of a disadvantage.
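The difference in cost between the two length computations can be sketched as follows. This fragment assumes that DS and ES both address the segment holding the data, that the zero-terminated string really does contain a zero byte, and that the names PasStr and CStr (invented for this example) refer to the two kinds of strings.

```asm
; Length of a Pascal string: fetch the prefix byte.

                mov     bl, PasStr      ;BL = length byte.
                mov     bh, 0           ;BX = length as a 16-bit value.

; Length of a C string: scan for the zero byte.

                lea     di, CStr        ;ES:DI points at the first character.
                mov     cx, 0ffffh      ;Scan for as long as it takes.
                mov     al, 0           ;Scan for a zero.
                cld                     ;Scan forward through memory.
        repne   scasb
                not     cx              ;CX = bytes scanned, including the zero.
                dec     cx              ;CX = length, excluding the zero.
                 .
                 .
                 .
PasStr          byte    5,"HELLO"
CStr            byte    "HELLO",0
```

The Pascal version takes two instructions regardless of the string's length, while the scasb loop touches every byte of the C string.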
An advantage of zero-terminated strings is that they are easy to declare in
an assembly language program, because you never have to supply a length. This
is particularly true of strings that are so long they require multiple source
code lines in your assembly language programs. Counting every character in a
string by hand to fill in a length prefix is so tedious that it's not even
worth considering. However, you can write a macro which will easily
build Pascal strings for you:
PString         macro   String
                local   StringLength, StringStart
                byte    StringLength
StringStart     byte    String
StringLength    =       $-StringStart
                endm
                 .
                 .
                 .
                PString "This string has a length prefix"
As long as the string fits entirely on one source line, you can use this
macro to generate Pascal style strings.
Common string functions like concatenation, length, substring, index, and
others are much easier to write when using length-prefixed strings. So we'll
use Pascal strings unless otherwise noted. Furthermore, the UCR Standard
Library provides a large number of C/C++ string functions, so there is no
need to replicate those functions here.
15.2.2 String Assignment
You can easily assign one string to another using the movsb instruction.
For example, if you want to assign the length-prefixed string String1
to String2, use the following:
; Presumably, ES and DS are set up already

                lea     si, String1
                lea     di, String2
                mov     ch, 0           ;Extend len to 16 bits.
                mov     cl, String1     ;Get string length.
                inc     cx              ;Include length byte.
                cld                     ;Copy in the forward direction.
        rep     movsb
This code increments cx by one before executing movsb
because the length byte contains the length of the string exclusive
of the length byte itself.
Generally, string variables can be initialized to constants by using the
PString macro described earlier. However, if you need to set
a string variable to some constant value at run time, you can write a StrAssign
subroutine which assigns the string immediately following the call instruction.
The following procedure does exactly that:
                include         stdlib.a
                includelib      stdlib.lib

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, es:dseg, ss:sseg

; String assignment procedure

MainPgm         proc    far
                mov     ax, seg dseg
                mov     ds, ax
                mov     es, ax

                lea     di, ToString
                call    StrAssign
                byte    "This is an example of how the "
                byte    "StrAssign routine is used",0
                nop
                ExitPgm
MainPgm         endp

StrAssign       proc    near
                push    bp
                mov     bp, sp
                pushf
                push    ds
                push    si
                push    di
                push    cx
                push    ax
                push    di              ;Save again for use later.
                push    es
                cld

; Get the address of the source string

                mov     ax, cs
                mov     es, ax
                mov     di, 2[bp]       ;Get return address.
                mov     cx, 0ffffh      ;Scan for as long as it takes.
                mov     al, 0           ;Scan for a zero.
        repne   scasb                   ;Compute the length of string.
                neg     cx              ;Convert length to a positive #.
                dec     cx              ;Because we started with -1, not 0.
                dec     cx              ;skip zero terminating byte.

; Now copy the strings

                pop     es              ;Get destination segment.
                pop     di              ;Get destination address.
                mov     al, cl          ;Store length byte.
                stosb

; Now copy the source string.

                mov     ax, cs
                mov     ds, ax
                mov     si, 2[bp]
        rep     movsb

; Update the return address and leave:

                inc     si              ;Skip over zero byte.
                mov     2[bp], si

                pop     ax
                pop     cx
                pop     di
                pop     si
                pop     ds
                popf
                pop     bp
                ret
StrAssign       endp

cseg            ends

dseg            segment para public 'data'
ToString        byte    255 dup (0)
dseg            ends

sseg            segment para stack 'stack'
                word    256 dup (?)
sseg            ends
                end     MainPgm
This code uses the scas instruction to determine the length
of the string immediately following the call instruction. Once
the code determines the length, it stores this length into the first byte
of the destination string and then copies the text following the call
to the string variable. After copying the string, this code adjusts
the return address so that it points just beyond the zero terminating byte.
Then the procedure returns control to the caller.
Of course, this string assignment procedure isn't very efficient, but it's
very easy to use. Setting up es:di is all that you need to
do to use this procedure. If you need fast string assignment, simply use
the movs instruction as follows:
; Presumably, DS and ES have already been set up.

                lea     si, SourceString
                lea     di, DestString
                mov     cx, LengthSource
                cld                     ;Copy in the forward direction.
        rep     movsb
                 .
                 .
                 .
SourceString    byte    LengthSource-1
                byte    "This is an example of how the "
                byte    "StrAssign routine is used"
LengthSource    =       $-SourceString

DestString      byte    256 dup (?)
Using in-line instructions requires considerably more setup (and typing!),
but it is much faster than the StrAssign procedure. If you
don't like the typing, you can always write a macro to do the string assignment
for you.
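One possible shape for such a macro is sketched below. PStrCopy is a hypothetical name, not something the UCR Standard Library provides. The sketch assumes that both operands are byte variables holding length-prefixed strings, that DS and ES are already set up, and that the destination has room for the source. It overwrites SI, DI, and CX.

```asm
; PStrCopy (hypothetical macro): copy the length-prefixed string
; Source, including its length byte, to Dest.

PStrCopy        macro   Dest, Source
                lea     si, Source
                lea     di, Dest
                mov     cl, Source      ;Get string length.
                mov     ch, 0           ;Extend len to 16 bits.
                inc     cx              ;Include length byte.
                cld                     ;Copy in the forward direction.
        rep     movsb
                endm
                 .
                 .
                 .
                PStrCopy DestString, SourceString
```

Because the macro reads the length from the string at run time, it also works for strings whose length changes during execution, at the cost of a few extra instructions per use.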
15.2.3 String Comparison
Comparing two character strings was already beaten to death in the section
on the cmps instruction. Other than providing some concrete
examples, there is no reason to consider this subject any further.
Note: all the following examples assume that es and ds
are pointing at the proper segments containing the destination and
source strings.
Comparing Str1 to Str2:
                lea     si, Str1+1      ;Point at the first character,
                lea     di, Str2+1      ; skipping the length bytes.

; Get the minimum length of the two strings.

                mov     al, Str1
                mov     cl, al
                cmp     al, Str2
                jb      CmpStrs
                mov     cl, Str2

; Compare the two strings.

CmpStrs:        mov     ch, 0
                cld
        repe    cmpsb
                jne     StrsNotEqual

; If CMPS thinks they're equal, compare their lengths
; just to be sure.

                cmp     al, Str2
StrsNotEqual:
At label StrsNotEqual, the flags will contain all the pertinent
information about the ranking of these two strings. You can use the conditional
jump instructions to test the result of this comparison.
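As a sketch of how that test might look, the following fragment branches three ways at StrsNotEqual. The target labels Str1IsLess and Str1IsGreater are hypothetical and would be defined elsewhere in your program. The unsigned jumps (jb and ja) are used because both the character codes and the length bytes are unsigned values.

```asm
StrsNotEqual:   jb      Str1IsLess      ;Str1 < Str2
                ja      Str1IsGreater   ;Str1 > Str2

; Falling through to here means the two strings are equal.
```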
[5] Reclaiming unused storage.
[6] At least those versions of Pascal which support strings.
Art of Assembly: Chapter Fifteen - 28 SEP 1996