Assembly Language Reference

Compiled by Dr. ME!

LDS

LDS Load Pointer using DS
LDS dest-reg, source
Logic: DS <- (source + 2)
       dest-reg <- (source)

LDS loads into two registers the 32-bit pointer variable found in memory at source.
LDS stores the segment value (the higher-order word of source) in DS and the offset
value (the lower-order word of source) in the destination register. The destination
register may be any 16-bit general register (that is, all registers except segment
registers). LES, Load Pointer Using ES, is a comparable instruction that loads the
ES register rather than the DS register.

Example:

var1 db 25h,00h,40h,20h      ; offset 0025h, then segment 2040h (low byte first)
..
..

 Before LDS 

     DX = 0000
     DS = 11F5

LDS DX,var1

 After LDS 

     DX = 0025
     DS = 2040
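DX ends up as 0025h and DS as 2040h because of the little-endian byte order LDS relies on. The following is a minimal C sketch that models only what LDS (or LES) reads from memory; the names far_ptr_t, read_word_le and load_far_ptr are hypothetical and not part of any assembler or library:

```c
#include <stdint.h>

/* Hypothetical helper type: the two words LDS/LES load. */
typedef struct {
    uint16_t seg;  /* higher-order word: goes to DS (LDS) or ES (LES) */
    uint16_t off;  /* lower-order word: goes to the destination register */
} far_ptr_t;

/* Reads a little-endian word: low byte at p[0], high byte at p[1]. */
uint16_t read_word_le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical model of LDS/LES:
   dest-reg <- (source), segment register <- (source + 2). */
far_ptr_t load_far_ptr(const uint8_t *source) {
    far_ptr_t fp;
    fp.off = read_word_le(source);
    fp.seg = read_word_le(source + 2);
    return fp;
}
```

With the four bytes of var1 (25h, 00h, 40h, 20h), load_far_ptr gives off = 0025h and seg = 2040h, matching the register values after LDS.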

LES

LES Load Pointer using ES
LES dest-reg, source
Logic: ES <- (source + 2)
       dest-reg <- (source)

LES loads into two registers the 32-bit pointer variable found in memory at source.
LES stores the segment value (the higher-order word of source) in ES and the offset
value (the lower-order word of source) in the destination register. The destination
register may be any 16-bit general register (that is, all registers except segment
registers). LDS, Load Pointer Using DS, is a comparable instruction that loads the
DS register rather than the ES register.

LODS

LODS source_string
Logic:   Accumulator <- (ds:si)
         if df = 0  si <- si+n       ; n = 1 for byte
         else       si <- si-n       ; n = 2 for word

LODS (load from string) moves a byte or word from DS:[si] to AL or AX, and 
increments (or decrements) SI depending on the setting of DF, the direction flag
(by 1 for bytes and by 2 for words).

You may override the DS segment with CS:[si], SS:[si] or ES:[si]. Apart from changing
SI, this performs the same action as:

                 mov  ax, DS:[SI]              ; or AL for bytes

The allowable forms are:

                 lodsb
                 lodsw
                 lods BYTE PTR SS:[si]         ; or CS:[si], DS:[si], ES:[si]
                 lods WORD PTR SS:[si]         ; or CS:[si], DS:[si], ES:[si]


Note this instruction is always translated by the assembler into LODSB, 
Load String Byte, or LODSW, Load String Word, depending on whether source_string
refers to a string of bytes or words. In either case, however, you must explicitly
load the SI register with the offset of the string.

LODSB

Load String Byte
LODSB
Logic:   al <- (ds:si)
         if df = 0  si <- si+1
         else       si <- si-1

LODSB transfers the byte pointed to by DS:SI into the AL register and increments or 
decrements SI (depending on the state of the Direction Flag) to point to the next
byte of the string.

LODSW

Load String Word
LODSW
Logic:   ax <- (ds:si)
         if df = 0  si <- si+2
         else       si <- si-2

LODSW transfers the word pointed to by DS:SI into the AX register and increments or 
decrements SI (depending on the state of the Direction Flag) to point to the next
word of the string.

Example:

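MYSTR DB 'ALA'        ; three bytes: 41h 4Ch 41h
      CLD             ; DF = 0, so SI moves forward
      LEA SI,MYSTR
      LODSW

The first word of MYSTR is transferred to register AX: AL = 41h ('A') and
AH = 4Ch ('L'). SI is then advanced by 2 to point to the next word.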
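A small C model of the same step may help. It assumes little-endian memory as on the 8086, and lodsw_ax and lods_next_si are hypothetical names used only for illustration:

```c
#include <stdint.h>

/* Hypothetical model of LODSW: AX <- word at DS:[SI], low byte first. */
uint16_t lodsw_ax(const uint8_t *ds, uint16_t si) {
    return (uint16_t)(ds[si] | (ds[si + 1] << 8));
}

/* New SI after LODSB (size 1) or LODSW (size 2): up if DF = 0, down if
   DF = 1. The uint16_t cast wraps within 64K, as the 16-bit SI does. */
uint16_t lods_next_si(uint16_t si, int df, int size) {
    return (uint16_t)(df ? si - size : si + size);
}
```

For the string 'ALA' at SI = 0 with DF = 0, this gives AX = 4C41h and a new SI of 2.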

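A REP prefix may be placed before LODS, LODSB or LODSW, but it is rarely useful: each
repetition overwrites AL or AX, so only the last element loaded survives. REPE/REPNE
(REPZ/REPNZ) are meaningful only with CMPS and SCAS, because LODS does not change the flags.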

STOS

STOS (store to string) moves a byte (or a word) from AL (or AX) to ES:[di], and 
increments (or decrements) DI depending on the setting of DF, the direction flag
(by 1 for bytes and by 2 for words). NO OVERRIDES ARE ALLOWED. This performs the 
same action (except for changing DI) as:

                 mov  ES:[DI], ax              ; or AL for bytes

The allowable forms are:

                 stosb
                 stosw
                 stos BYTE PTR ES:[di]         ; no override allowed
                 stos WORD PTR ES:[di]         ; no override allowed
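STOS is most often combined with the REP prefix (see REP/REPE/REPNE) to fill a block of memory. The sketch below models REP STOSB in C under the assumption that `es` points at the start of the ES segment; rep_stosb is a hypothetical name, and it returns the final DI (CX always ends at 0):

```c
#include <stdint.h>

/* Hypothetical model of REP STOSB: while CX != 0, store AL at ES:[DI],
   step DI by 1 (up if DF = 0, down if DF = 1) and decrement CX. */
uint16_t rep_stosb(uint8_t *es, uint16_t di, uint16_t cx, uint8_t al, int df) {
    while (cx != 0) {
        es[di] = al;                            /* STOSB: ES:[DI] <- AL    */
        di = (uint16_t)(df ? di - 1 : di + 1);  /* DI wraps within 64K     */
        cx--;                                   /* REP: CX <- CX - 1       */
    }
    return di;
}
```

Filling 5 bytes from DI = 0 with DF = 0 leaves DI = 5; with DF = 1, filling 3 bytes downward from DI = 7 writes offsets 7, 6 and 5 and leaves DI = 4.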

SCAS

SCAS compares AL (or AX) to the byte (or word) pointed to by ES:[di], and 
increments (or decrements) DI depending on the setting of DF, the direction flag
(by 1 for bytes and by 2 for words). NO OVERRIDES ARE ALLOWED. This sets the flags 
the same way as:

                 cmp  ax, ES:[DI]              ; or AL for bytes

The allowable forms are:

                 scasb
                 scasw
                 scas BYTE PTR ES:[di]         ; no override allowed
                 scas WORD PTR ES:[di]         ; no override allowed
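Combined with REPNE (see REP/REPE/REPNE), SCASB searches ES:[DI] for the byte in AL. This C sketch models REPNE SCASB with DF = 0; the struct and function names are hypothetical, and only ZF is tracked:

```c
#include <stdint.h>

/* Hypothetical result type: registers and ZF after the scan. */
typedef struct { uint16_t di, cx; int zf; } scas_result_t;

/* Hypothetical model of REPNE SCASB with DF = 0. */
scas_result_t repne_scasb(const uint8_t *es, uint16_t di, uint16_t cx, uint8_t al) {
    scas_result_t r = { di, cx, 0 };
    while (r.cx != 0) {
        r.zf = (es[r.di] == al);  /* flags as from CMP AL, ES:[DI] (ZF only) */
        r.di++;                   /* DF = 0: DI moves forward               */
        r.cx--;                   /* REPNE: CX <- CX - 1                    */
        if (r.zf) break;          /* REPNE stops once ZF = 1                */
    }
    return r;
}
```

Scanning "hello" for a 0 byte with CX = FFFFh stops with ZF = 1, DI = 6 and CX = FFF9h, so the string length is FFFFh - FFF9h - 1 = 5; this is a common way to compute a string length with REPNE SCASB. If the byte is not found, the scan ends with CX = 0 and ZF = 0.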

SET

SET destination
Logic: If condition, then destination <- 1
       else destination <- 0

The SET instructions set the destination byte to 1 if the specified condition is true;
0 otherwise. Here are the SET instructions and the condition they use:

SET Instruction           Flags             Explanation

SETB/SETNAE               CF = 1            Set if Below/Not Above or Equal

SETAE/SETNB               CF = 0            Set if Above or Equal/Not Below

SETBE/SETNA               CF = 1 or         Set if Below or Equal/Not Above
                          ZF = 1

SETA/SETNBE               CF = 0 and        Set if Above/Not Below or Equal
                          ZF = 0

SETE/SETZ                 ZF = 1            Set if Equal/Zero

SETNE/SETNZ               ZF = 0            Set if Not Equal/Not Zero

SETL/SETNGE               SF <> OF          Set if Less/Not Greater or Equal

SETGE/SETNL               SF = OF           Set if Greater or Equal/Not Less

SETLE/SETNG               ZF = 1 or         Set if Less or Equal/Not Greater
                          SF <> OF

SETG/SETNLE               ZF = 0 and        Set if Greater/Not Less or Equal
                          SF = OF

SETS                      SF = 1            Set if Sign

SETNS                     SF = 0            Set if No Sign    

SETC                      CF = 1            Set if Carry

SETNC                     CF = 0            Set if No Carry

SETO                      OF = 1            Set if Overflow

SETNO                     OF = 0            Set if No Overflow

SETP/SETPE                PF = 1            Set if Parity/Parity Even

SETNP/SETPO               PF = 0            Set if No Parity/Parity Odd

The destination can be either a byte-sized register or a byte in memory.
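The table is easiest to check against a concrete comparison. The following C sketch (hypothetical names; only CF, ZF, SF and OF are modeled) computes the flags of a byte CMP and evaluates four of the SET conditions from them:

```c
#include <stdint.h>

/* Hypothetical type: the flags CMP a,b leaves that the SETs below read. */
typedef struct { int cf, zf, sf, of; } cmp_flags_t;

/* Flags of CMP a,b on bytes, i.e. of the subtraction a - b. */
cmp_flags_t cmp8_flags(uint8_t a, uint8_t b) {
    uint8_t r = (uint8_t)(a - b);
    cmp_flags_t f;
    f.cf = a < b;                            /* unsigned borrow      */
    f.zf = r == 0;
    f.sf = (r & 0x80u) != 0;
    f.of = ((a ^ b) & (a ^ r) & 0x80u) != 0; /* signed overflow      */
    return f;
}

uint8_t set_b(cmp_flags_t f) { return (uint8_t)f.cf; }                    /* CF = 1           */
uint8_t set_a(cmp_flags_t f) { return (uint8_t)(!f.cf && !f.zf); }        /* CF = 0 and ZF = 0 */
uint8_t set_l(cmp_flags_t f) { return (uint8_t)(f.sf != f.of); }          /* SF <> OF         */
uint8_t set_g(cmp_flags_t f) { return (uint8_t)(!f.zf && f.sf == f.of); } /* ZF = 0 and SF = OF */
```

Comparing 80h with 01h shows the difference between the unsigned and signed tests: as unsigned numbers 128 > 1, so SETA gives 1 and SETB gives 0, but as signed numbers -128 < 1, so SETL gives 1 and SETG gives 0.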



MOVS
MOVS moves a byte (or a word) from DS:[si] to ES:[di], and increments 
(or decrements) SI and DI, depending on the setting of DF, the direction flag
(by 1 for bytes and by 2 for words). You may override the DS segment with CS:[si],
SS:[si] or ES:[si], but you MAY NOT OVERRIDE ES:[di]. Though the following is not a
legal instruction, it signifies the equivalent action to MOVS (not including changing DI and SI):

                 mov  WORD PTR ES:[DI], DS:[SI]     ; or BYTE PTR for bytes

The allowable forms are:

                 movsb
                 movsw
                 movs BYTE PTR ES:[di], SS:[si]     ;or CS, DS, ES:[si]
                 movs WORD PTR ES:[di], SS:[si]     ;or CS, DS, ES:[si]



CMPS
CMPS Compare String (Byte or Word)
CMPS source-string, destination-string
Logic: CMP (DS:SI),(ES:DI)  ; sets flags only

   if DF=0
     SI <- SI + n   ; n = 1 for byte, 2 for word.
     DI <- DI + n
   else
     SI <- SI - n
     DI <- DI - n

This instruction compares two values by subtracting the byte or word pointed to by
ES:DI from the byte or word pointed to by DS:SI, and sets the flags according to
the result of comparison. The operands themselves are not altered. After the 
comparison, SI and DI are incremented (if the Direction Flag is cleared) or 
decremented (if the Direction Flag is set), in preparation for comparing the next
element of the string.

This instruction is always translated by the assembler into CMPSB, Compare String
Byte, or CMPSW, Compare String Word, depending on whether source refers to a string
of bytes or words. In either case, you must explicitly load the SI and DI registers
with the offset of the source and destination strings.

You may override the DS segment with CS:[si], SS:[si] or ES:[si], but you MAY NOT
OVERRIDE ES:[di]. Although the following is not a legal instruction, it signifies the
equivalent action to CMPS (not including changing DI and SI):

                 cmp  WORD PTR DS:[SI], ES:[DI]     ; or BYTE PTR for bytes

The allowable forms are:

                 cmpsb
                 cmpsw
                 cmps BYTE PTR SS:[si], ES:[di]     ;or CS, DS, ES:[si]
                 cmps WORD PTR SS:[si], ES:[di]     ;or CS, DS, ES:[si]
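CMPS is normally used with REPE to compare two strings up to the first difference. A minimal C sketch of REPE CMPSB with DF = 0 follows; the names are hypothetical, and ZF is reported as 1 when CX starts at 0 (the real instruction simply leaves the flags unchanged in that case):

```c
#include <stdint.h>

/* Hypothetical result type: registers and ZF after the comparison. */
typedef struct { uint16_t si, di, cx; int zf; } cmps_result_t;

/* Hypothetical model of REPE CMPSB with DF = 0. */
cmps_result_t repe_cmpsb(const uint8_t *ds, uint16_t si,
                         const uint8_t *es, uint16_t di, uint16_t cx) {
    cmps_result_t r = { si, di, cx, 1 };  /* ZF = 1 assumed if CX starts at 0 */
    while (r.cx != 0) {
        r.zf = (ds[r.si] == es[r.di]);    /* CMP DS:[SI], ES:[DI] (ZF only)   */
        r.si++;
        r.di++;
        r.cx--;
        if (!r.zf) break;                 /* REPE stops at the first difference */
    }
    return r;
}
```

Comparing "abcd" with "abXd" for CX = 4 stops after the third byte with ZF = 0, SI = DI = 3 and CX = 1; identical strings finish with ZF = 1 and CX = 0.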




CMP
CMP Compare
CMP destination, source

Logic:  Flags set according to result of (destination - source)

CMP compares two numbers by subtracting the source from the destination and updates
the flags. CMP does not change the source or destination. The operands may be bytes
or words.

Compare in Key Generating Routines

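Registers can be accessed in parts. The 32-bit EAX register holds the 16-bit AX 
register in its lower half, and AX is divided into AH (h=high) and AL (l=low). The 
upper 16 bits of EAX have no names of their own (there is no "EAH" or "EAL"); they 
can only be reached by shifting or rotating EAX. Each box below is one byte 
(total: 4 bytes = 32 bits): 

        +--------+--------+--------+--------+
 bits:  | 31..24 | 23..16 | 15..8  |  7..0  |
        +--------+--------+--------+--------+
 name:  |   (no name)     |   ah   |   al   |
        +--------+--------+--------+--------+
                          |<----- ax ------>|
        |<-------------- eax -------------->|

So an instruction such as cmp ah,byte ptr [ecx] compares a single byte, bits 8..15 
of EAX, with the single byte stored in memory at the address held in ECX. 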

Let's look at the numbers to understand the whole thing a bit better. Take a 
fictional input of 123456 and a real serial of 987654. ECX points to the input, 
EDX points to the real serial, and EAX holds four characters of the real serial. 

eax:   3938 3736   (9876; al = 36, ah = 37, the upper word holds 39 and 38) 
[ecx]: 31 32 33 34 (the first four input bytes, '1234') 
cmp al,byte ptr [ecx]    ;compares 36 with 31 
cmp ah,byte ptr [ecx+01] ;compares 37 with 32 
shr eax,10               ;10h = 16 bits: brings the upper word down into ah,al 
                         ;shr 39383736,10 ------> 0000 3938 
cmp al,byte ptr [ecx+02] ;compares now (after the shift right) 38 with 33 
cmp ah,byte ptr [ecx+03] ;compares now (after the shift right) 39 with 34 
..
..
add ecx, 00000004         ;get next 4 characters from input 
add edx, 00000004         ;get next 4 characters from real serial 

;"4" is added to both registers: after comparing 4 characters, both pointers 
;must move past them to reach the next ones. Why add 4 and not 10? The shift 
;count above is hex (10h = 16 bits = 2 characters), but a pointer advances in 
;bytes. One character needs 1 byte and one register holds 4 bytes, so one 
;register compares 4 characters and the pointers move on by 4. 
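The byte views and the four compares can be modeled in C. This is a sketch under two assumptions: EAX already holds the four characters as shown, and each mismatching CMP is followed by a jump that rejects the serial (the snippet does not show the jumps). reg_al, reg_ah and chunk_matches are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical helpers: the named byte views of EAX. */
uint8_t reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }        /* bits 0..7  */
uint8_t reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); } /* bits 8..15 */

/* Hypothetical model of the four CMPs: returns 1 if the characters in eax
   match input[0..3], 0 at the first mismatch (rejecting jumps assumed). */
int chunk_matches(uint32_t eax, const uint8_t *input) {
    if (reg_al(eax) != input[0]) return 0;   /* cmp al,byte ptr [ecx]    */
    if (reg_ah(eax) != input[1]) return 0;   /* cmp ah,byte ptr [ecx+01] */
    eax >>= 16;                              /* shr eax,10 (10h = 16)    */
    if (reg_al(eax) != input[2]) return 0;   /* cmp al,byte ptr [ecx+02] */
    if (reg_ah(eax) != input[3]) return 0;   /* cmp ah,byte ptr [ecx+03] */
    return 1;
}
```

With EAX = 39383736h, the fictional input "1234" fails at the first compare, while an input whose first four bytes are 36h, 37h, 38h, 39h ("6789") passes all four.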



REP/REPE/REPNE
The string instructions may be prefixed by REP/REPE/REPNE which will repeat the 
instructions according to the following conditions:

                 rep       decrement cx ; repeat if cx is not zero
                 repe      decrement cx ; repeat if cx not zero AND zf = 1
                 repz      decrement cx ; repeat if cx not zero AND zf = 1
                 repne     decrement cx ; repeat if cx not zero AND zf = 0
                 repnz     decrement cx ; repeat if cx not zero AND zf = 0

Here, 'e' stands for equal, 'z' is zero and 'n' is not. These repeat instructions 
should NEVER be used with a segment override, since the 8086 will forget the 
override if a hardware interrupt occurs in the middle of the REP loop. 



FLAGS
The flag display used in this walk-through shows SF as '+' for a positive number and 
PF as 'O' for odd parity or 'E' for even parity. Parity is whether the number 
contains an even or odd number of 1 bits; if a number contains 3 '1' bits, the 
parity is odd. Every time you perform an arithmetic or logical operation, the 8086 
updates PF (from the low 8 bits of the result only), and SAL updates it too.

Start with (0111 0000), which is +112, and shift it left once with SAL. The result 
is (1110 0000), so SF is now '-'. OF, the overflow flag, is set because you changed 
the number from positive to negative (from +112 to -32); for a one-bit shift, OF is 
set exactly when the high (sign) bit changes. What is the unsigned number now? 224. 
CF receives the bit that moves off the left end of the register; that bit was a 
'0', so CF is cleared. PF is 'O' because there are three '1' bits. 

Shift again and the number becomes (1100 0000). OF is cleared because you didn't 
change signs. (Remember, the leftmost bit is the sign bit for a signed number.) PF 
is now 'E' because you have two '1' bits, and two is even. CF is set because you 
shifted a '1' bit off the left end; CF always signals when a '1' bit has been 
shifted off the end. In short, OF is set whenever the shift changes the leftmost 
bit and is cleared whenever the leftmost bit stays the same. 
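The two shifts in this walk-through can be replayed with a short C model of a one-bit SAL on a byte. sal1_byte and its result struct are hypothetical names, and only CF, OF, SF and PF are computed:

```c
#include <stdint.h>

/* Hypothetical type: the result byte and the flags discussed above. */
typedef struct { uint8_t result; int cf, of, sf, pf; } sal_result_t;

/* Hypothetical model of SAL r/m8,1. */
sal_result_t sal1_byte(uint8_t v) {
    sal_result_t r;
    int ones = 0;
    r.result = (uint8_t)(v << 1);
    r.cf = (v & 0x80u) != 0;          /* the bit shifted off the left end */
    r.sf = (r.result & 0x80u) != 0;   /* new sign bit: 1 shows as '-'     */
    r.of = r.sf != r.cf;              /* 1 when the sign bit changed      */
    for (int i = 0; i < 8; i++)
        ones += (r.result >> i) & 1;
    r.pf = (ones % 2) == 0;           /* 1 ('E') for an even count of 1s  */
    return r;
}
```

sal1_byte(0x70) gives E0h with OF = 1, CF = 0 and odd parity; feeding E0h back in gives C0h with OF = 0, CF = 1 and even parity.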

'HARD' FLAGS

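IF (the interrupt enable flag), TF and DF are 'hard' flags: arithmetic and logical 
instructions never change them, so once they are set or cleared they keep that 
setting until you change them explicitly (for example with STI/CLI for IF or 
STD/CLD for DF). If you use DF, the direction flag, in a subroutine, you must save 
the flags upon entry and restore the flags on exiting to make sure that DF has not 
been altered.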



MOVSX
MOVSX destination, source
Logic:  destination <- sign extend(source)

This instruction copies a source operand to a destination operand and extends its 
sign. This is particularly useful to preserve the sign when copying an 8-bit value
into a 16-bit register, or a 16-bit value into a 32-bit register.



MOVZX
MOVZX destination, source
Logic: destination <- zero extend(source)

This instruction copies a source operand to a destination operand and zero-extends
it. This is particularly useful for unsigned values when copying an 8-bit value
into a 16-bit register, or a 16-bit value into a 32-bit register.
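The difference between MOVSX and MOVZX shows up only when the top bit of the source is set. A minimal C sketch of the byte-to-doubleword forms (hypothetical names):

```c
#include <stdint.h>

/* Hypothetical model of MOVZX r32, r/m8: the upper 24 bits become 0. */
uint32_t movzx_8_32(uint8_t src) { return (uint32_t)src; }

/* Hypothetical model of MOVSX r32, r/m8: the upper 24 bits become
   copies of bit 7 of the source. */
uint32_t movsx_8_32(uint8_t src) {
    return (src & 0x80u) ? ((uint32_t)src | 0xFFFFFF00u) : (uint32_t)src;
}
```

For the byte 80h, MOVZX yields 00000080h (128) while MOVSX yields FFFFFF80h (-128); for 7Fh both yield 0000007Fh.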

On the 586 (Pentium), MOVZX takes four cycles to execute because of the 
zero-extension overhead. A faster way to load a byte into a register is:

     xor eax,eax
     mov al,memory

Because the XOR only clears the upper parts of EAX, it may be placed OUTSIDE a 
loop that loads just byte values, as long as nothing inside the loop writes to 
those upper bits. The 586 benefits most from this.

It is recommended that 16 bit data be accessed with the MOVSX and MOVZX if you 
cannot place the XOR on the outside of the loop.

N.B. Do this replacement only for MOVZX inside loops: XOR followed by MOV does not
sign-extend, so it cannot stand in for MOVSX.



SBB
SBB Subtract with Borrow
SBB destination, source

Logic: destination <- destination - source - CF

SBB subtracts the source from the destination; subtracts 1 from that result if the
Carry Flag is set, and stores the result in destination. The operands may be bytes
or words, and both may be signed or unsigned binary numbers.

SBB is useful for subtracting numbers that are larger than 16 bits, since it 
subtracts a borrow (in the Carry Flag) from a previous operation.

You may subtract a byte-length immediate value from a destination that is a word;
in this case, the byte is sign-extended to 16 bits before the subtraction.
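To show how the borrow links two subtractions, the following C sketch models a 32-bit subtraction done on 16-bit halves, as in SUB AX,BX followed by SBB DX,CX for DX:AX minus CX:BX (the register assignment is assumed for illustration; sub32_via_sbb is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical model: DX:AX - CX:BX using SUB on the low words and
   SBB on the high words. */
uint32_t sub32_via_sbb(uint32_t a, uint32_t b) {
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);

    /* SUB AX,BX: low words; CF = 1 if a borrow was needed */
    uint16_t lo = (uint16_t)(a_lo - b_lo);
    unsigned cf = a_lo < b_lo;

    /* SBB DX,CX: high words minus the borrow from the low words */
    uint16_t hi = (uint16_t)(a_hi - b_hi - cf);

    return ((uint32_t)hi << 16) | lo;
}
```

For example, 00010000h - 1 gives 0000FFFFh: the low-word SUB borrows, and SBB subtracts that borrow from the high word.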

sbb eax, eax
Consider the following code snippet:

:0040D437 E8740A0000       call 0040DEB0           ;compares serials. sets eax=1 if 
                                                   ;bad; 0 if good 
:0040D43C F7D8             neg eax                 ;CF = 1 if eax was not 0 (bad) 
:0040D43E 59               pop ecx                 ;POP leaves the flags alone 
:0040D43F 1BC0             sbb eax, eax            ;sets eax = -1 if bad serial else 
                                                   ;(eax = 0) 
:0040D441 59               pop ecx 
:0040D442 40               inc eax                 ;sets eax = 0  if bad serial 
                                                   ;(-1+ 1 = 0), else eax = 1 

As a second example, consider the following code snippet:

:004271DA sbb  eax, eax              ;eax = -1 if CF = 1, else 0 (CF is unchanged)
:004271DC sbb  eax, FFFFFFFF         ;FFFFFFFF = -1: eax = -1 if CF = 1, else +1
:004271DF test eax, eax              ;is eax = 0?
:004271E1 jnz 00427228               ;jump if eax is not 0

For the third example, study the following code snippet, which turns CF into
eax = -1 (CF = 1) or eax = +1 (CF = 0):

:0040DEF4 1BC0              sbb eax, eax 
:0040DEF6 D1E0              shl eax, 1 
:0040DEF8 40                inc eax 
:0040DEF9 C3                ret 
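Both the second and the third snippet turn the Carry Flag into -1 or +1. The C sketch below models them with CF passed in explicitly as 0 or 1; the function names are hypothetical:

```c
#include <stdint.h>

/* SBB EAX,EAX computes EAX - EAX - CF: 0 if CF = 0, FFFFFFFFh (-1) if
   CF = 1, and leaves CF as it was. */
uint32_t sbb_self(unsigned cf) { return 0u - cf; }

/* Second snippet: SBB EAX,EAX ; SBB EAX,FFFFFFFF */
uint32_t sbb_then_sbb_minus1(unsigned cf) {
    uint32_t eax = sbb_self(cf);
    return eax - 0xFFFFFFFFu - cf;   /* EAX - (-1) - CF */
}

/* Third snippet: SBB EAX,EAX ; SHL EAX,1 ; INC EAX */
uint32_t sbb_shl_inc(unsigned cf) {
    uint32_t eax = sbb_self(cf);
    eax <<= 1;                       /* SHL EAX,1 */
    return eax + 1u;                 /* INC EAX   */
}
```

With CF = 1 both sequences return FFFFFFFFh (-1); with CF = 0 both return 1, so neither ever leaves EAX = 0.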


Also see how eax, used as a flag register (a "Reg Flag"), is set equal to 1 in the
following code snippet:

1000243E   mov al,byte ptr[esi]
10002441   pop edi
10002442   sub al,37 ; if al is 37h (ASCII '7'), the result = 0
10002444   pop esi
10002445   pop ebx
10002446   cmp al,01 ; if at this point al is less than 1 (that is, 0), the Carry Flag is set
                     ; To end up with Reg Flag (eax = 1), al must be 0 here
10002448   sbb eax,eax ; eax = -1 if CF = 1, else 0
1000244A   neg eax     ; eax = 1 if CF was 1, else 0
1000244C   ret

Note that the byte loaded into al at address :1000243E must be 37h (ASCII '7') to
make eax = 1 at :1000244A.

But what is the meaning of the following three code pieces? 
1):

Segment: _TEXT  DWORD USE32  00000018 bytes

 0000  8b 44 24 04       example1        mov     eax,+4H[esp]
 0004  23 c0                             and     eax,eax
 0006  0f 94 c1                          sete    cl
 0009  0f be c9                          movsx   ecx,cl
 000c  0f 95 c0                          setne   al
 000f  0f be c0                          movsx   eax,al
 0012  03 c1                             add     eax,ecx
 0014  c3                                ret
 0015  90                                nop
 0016  90                                nop
 0017  90                                nop


2):

Segment: _TEXT  DWORD USE32  0000001c bytes

 0000  55                _example2       push    ebp
 0001  8b ec                             mov     ebp,esp
 0003  53                                push    ebx
 0004  8b 55 08                          mov     edx,+8H[ebp]
 0007  f7 da                             neg     edx
 0009  19 d2                             sbb     edx,edx
 000b  42                                inc     edx
 000c  8b 5d 08                          mov     ebx,+8H[ebp]
 000f  f7 db                             neg     ebx
 0011  19 db                             sbb     ebx,ebx
 0013  f7 db                             neg     ebx
 0015  89 d0                             mov     eax,edx
 0017  03 c3                             add     eax,ebx
 0019  5b                                pop     ebx
 001a  5d                                pop     ebp
 001b  c3                                ret


3):

Segment: _TEXT  DWORD USE32  00000016 bytes

 0000  8b 44 24 04       _example3       mov     eax,+4H[esp]
 0004  f7 d8                             neg     eax
 0006  19 c0                             sbb     eax,eax
 0008  40                                inc     eax
 0009  8b 4c 24 04                       mov     ecx,+4H[esp]
 000d  f7 d9                             neg     ecx
 000f  19 c9                             sbb     ecx,ecx
 0011  f7 d9                             neg     ecx
 0013  03 c1                             add     eax,ecx
 0015  c3                                ret


Well, they mean the SAME - the following simple function: 
int example( int g ) {

    int x,y;
    x = !g;
    y = !!g;
    return x+y;
}


First code is made by HighC. It IS OPTIMIZED, as you see. Second piece is by 
Zortech C. Not so well optimized, but shows interesting NON-obvious 
calculations (NEG sets CF exactly when reg is not 0, and SBB reg,reg then turns 
CF into 0 or -1): 
NEG reg; SBB reg,reg; INC reg;  means: if (reg==0) reg=1; else reg=0; 
NEG reg; SBB reg,reg; NEG reg;  means: if (reg==0) reg=0; else reg=1; 

And it is WITHOUT any JUMPS or special instructions (like SETE/SETNE from 1st 
example)! Only pure logics and arithmetics! Now one could figure out many 
similar uses of the flags, sign-bit-place-in-a-register, 
flag-dependent/influencing instructions etc... 
(as you see, HighC names functions exactly as they are stated by the 
programmer; Zortech adds an underscore at start; Watcom adds underscore 
afterwards; etc..) 
The third example is again by Zortech C, but for the (same-optimized-by-hand) 
function: 
   int example( int g ) {  return !g + !!g; }

I put it here to show the difference between compilers - HighC just does not 
care if you optimize the source yourself or not - it always produces the same 
most optimized code (it is because the optimization is pure logical; but it will 
NOT figure out that the function will always return 1, for example ;)... well, 
sometimes it does!); while Zortech cannot understand that x and y are not needed, 
and makes a new stack frame, etc... Of course, it could even be optimized more 
(but by hand in assembly!): e.g. MOV ECX,EAX (2bytes) after taking EAX from 
stack, instead of taking ECX from stack again (4bytes)... but hell, you're 
better off to replace it with the constant value 1! 
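The two NEG/SBB idioms can be checked in C. The sketch below relies on NEG setting CF exactly when its operand is nonzero; all names are hypothetical, and example_branchless is simply the example function rebuilt from the two idioms, as the Zortech code does:

```c
#include <stdint.h>

/* NEG reg sets CF to 1 exactly when reg was nonzero. */
unsigned neg_sets_cf(uint32_t reg) { return reg != 0; }

/* NEG reg ; SBB reg,reg ; INC reg  ->  1 if reg was 0, else 0 (like !g) */
uint32_t not_via_neg_sbb(uint32_t reg) {
    uint32_t r = 0u - neg_sets_cf(reg);  /* SBB reg,reg: 0 or FFFFFFFFh */
    return r + 1u;                       /* INC reg */
}

/* NEG reg ; SBB reg,reg ; NEG reg  ->  0 if reg was 0, else 1 (like !!g) */
uint32_t notnot_via_neg_sbb(uint32_t reg) {
    uint32_t r = 0u - neg_sets_cf(reg);  /* SBB reg,reg */
    return 0u - r;                       /* NEG reg */
}

/* Hypothetical rebuild of example(): !g + !!g without any jumps. */
int example_branchless(int g) {
    return (int)(not_via_neg_sbb((uint32_t)g) + notnot_via_neg_sbb((uint32_t)g));
}
```

example_branchless returns 1 for any argument, which is the constant the text suggests replacing the whole function with.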

Other similar "strange" arithmetics result from the compiler's way of 
optimizing calculations. Multiplications by numbers near to powers of 2 are 
substituted with combinations of logical shifts and arithmetics. For example: 

reg*3 could be (2*reg+reg): MOV eax,reg; SHL eax,1; add eax,reg; (instead of 
IMUL eax,reg,3); but it can be even done in ONE instruction (see the LEA section 
below): LEA eax,[reg+reg*2] 
reg*7 could be (8*reg-reg): MOV eax,reg; SHL eax,3; sub eax,reg 



SUB
SUB Subtract
SUB destination,source

Logic: destination <- destination - source

SUB subtracts the source operand from the destination operand and stores the
result in destination. Both operands may be bytes or words, and both may be 
signed or unsigned binary numbers.

You may wish to use SBB if you need to subtract numbers that are larger than
16 bits, since SBB subtracts a borrow from a previous operation.

You may subtract a byte-length immediate value from a destination that is a word;
in this case, the byte is sign-extended to 16 bits before the subtraction.



CBW
Convert Byte to Word
Logic:   if (AL < 80h) then
             AH <- 0
         else      
             AH <- FFh

CBW extends the sign bit of the byte in the AL register into the AH register. In 
other words, this instruction extends a signed byte value into the equivalent word 
value. This means that the instruction gives value to AH according to the sign bit 
of AL. If the sign bit of AL is 1, then all bits in AH will become 1 too (negative 
number). If the sign bit of AL is 0, then all bits of AH will also become 0.

Note: This instruction will set AH to 0FFh if the sign bit (bit 7) of AL is 
set; if bit 7 of AL is not set, AH will be set to 0. The instruction is useful for 
generating a word from a byte before a signed division by a byte (IDIV takes its 
dividend from AX) or a signed multiplication of a byte by a word.

 


CWD
Convert Word to Doubleword
Logic:   if (AX < 8000h) then
             DX <- 0
         else
             DX <- FFFFh

If the sign bit in AX is 1, then this instruction will set all bits in DX, making
them all 1 (negative number); and if the sign bit in AX is 0, it will clear all bits
in DX, making them all 0.

In other words, CWD extends the sign bit of the AX register into the DX register. 
This instruction generates the double-word equivalent of the signed number in the AX 
register.

Note: This instruction will set DX to 0FFFFh if the sign bit (bit 15) of AX is set;
if bit 15 of AX is not set, DX will be set to 0.



CDQ
Convert Doubleword to Quadword
Logic:  EDX:EAX  <- Sign extend(EAX)

This instruction converts a signed double word in EAX to a quad word, also signed,
in EDX:EAX. It extends the sign bit.
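The three conversions differ only in width. Below is a C sketch of each (hypothetical names), plus the copy-and-shift form (MOV EDX,EAX then SAR EDX,31) recommended under INTEGER DIVIDE, which produces the same EDX as CDQ:

```c
#include <stdint.h>

/* Hypothetical model of CBW: AH <- 00h or FFh from bit 7 of AL; returns AX. */
uint16_t model_cbw(uint8_t al) {
    return (uint16_t)((al & 0x80u) ? (0xFF00u | al) : al);
}

/* Hypothetical model of CWD: DX <- 0000h or FFFFh from bit 15 of AX. */
uint16_t model_cwd_dx(uint16_t ax) { return (ax & 0x8000u) ? 0xFFFFu : 0x0000u; }

/* Hypothetical model of CDQ: EDX <- 0 or FFFFFFFFh from bit 31 of EAX. */
uint32_t model_cdq_edx(uint32_t eax) { return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u; }

/* MOV EDX,EAX ; SAR EDX,31: the sign bit copied into all 32 bits. */
uint32_t sar31_edx(uint32_t eax) { return 0u - (eax >> 31); }
```

model_cbw(80h) gives FF80h and model_cbw(7Fh) gives 007Fh; sar31_edx and model_cdq_edx agree for every input.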



IMUL, MUL
MUL     Integer Multiply, Unsigned
        Multiplies two unsigned integers (always positive)

IMUL    Integer Multiply, Signed
        Multiplies two signed integers (either positive or negative)

Syntax:
        MUL  source   ; (register or variable)
        IMUL source   ; (register or variable)

Logic:  
        AX     <-  AL * source       ;if source is a byte
        DX:AX  <-  AX * source       ;if source is a word
         
MUL and IMUL multiply AL or AX, depending on the size of the operand, by the
register or variable given. With a byte operand, the 16-bit product is stored in
AX. With a word operand, the 32-bit product is always stored in DX:AX (the high
16 bits in DX and the low 16 bits in AX), even if it would fit in 16 bits.

On a 386, 486 or Pentium a doubleword operand can also be used: EAX is multiplied
by it and the 64-bit product is stored in EDX:EAX.   (See also Multiplication.)

IMUL has two additional uses that allow for 16-bit results:

1) IMUL register16, immediate16

In this form, register16 is multiplied by immediate16, and the result is placed
in register16. 

2) IMUL register16, memory16, immediate16

Here, memory16 is multiplied by immediate16 and the result is placed in register16.

In both of these forms, the carry and overflow flags are set if the result
is too large to fit into 16 bits.

INTEGER MULTIPLY
An integer multiply by an immediate can usually be replaced with a faster
and simpler series of shifts, SUBs, ADDs and LEAs.
As a rule of thumb, when 6 or fewer bits are set in the binary representation
of the constant, it is better to look at other ways of multiplying than to use
INTEGER MULTIPLY (the threshold is 8 on a 586).
A simple way to do it is to shift and add for each bit set, or use LEA.

Here the LEA instruction can be a major CPU booster, for example:

      LEA ECX,[EDX*2]       ; multiply EDX by 2 and store result into ECX
      LEA ECX,[EDX+EDX*2]   ; multiply EDX by 3 and store result into ECX
      LEA ECX,[EDX*4]       ; multiply EDX by 4 and store result into ECX
      LEA ECX,[EDX+EDX*4]   ; multiply EDX by 5 and store result into ECX
      LEA ECX,[EDX*8]       ; multiply EDX by 8 and store result into ECX
      LEA ECX,[EDX+EDX*8]   ; multiply EDX by 9 and store result into ECX

And you can combine LEAs too!

      lea ecx,[edx+edx*2]   ;
      lea ecx,[ecx+ecx*8]   ;  ecx <--  edx*27

(Of course, if you can, put three instructions between the two LEAs so that even
on Pentiums no AGIs are produced.)
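Each LEA above computes base + index*scale with a scale of 1, 2, 4 or 8, so every multiplication can be written as a shift and an add. The C sketch below (hypothetical names) mirrors them, including the two-LEA multiply by 27 and the reg*7 = 8*reg - reg example given earlier; like the registers, the arithmetic wraps modulo 2^32:

```c
#include <stdint.h>

/* Hypothetical helpers, each mirroring one LEA form. */
uint32_t lea_mul3(uint32_t x)  { return x + (x << 1); }          /* LEA r,[x+x*2] */
uint32_t lea_mul5(uint32_t x)  { return x + (x << 2); }          /* LEA r,[x+x*4] */
uint32_t lea_mul9(uint32_t x)  { return x + (x << 3); }          /* LEA r,[x+x*8] */
uint32_t lea_mul27(uint32_t x) { return lea_mul9(lea_mul3(x)); } /* *3, then *9   */

/* SHL r,3 ; SUB r,x */
uint32_t shl_sub_mul7(uint32_t x) { return (x << 3) - x; }
```

lea_mul27(5) gives 135, the same as 5 * 27, and the results stay equal to the plain product even when it overflows 32 bits.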

For examples of multiplication, consider the following code snippets:

BYTE1 DB 80h
BYTE2 DB 40h
WORD1 DW 8000h
WORD2 DW 2000h
MAIN PROC NEAR
     CALL C10MUL
     CALL D10IMUL
     RET
MAIN ENDP

C10MUL PROC              ; Multiplication of unsigned numbers  
       MOV AL, BYTE1
       MUL BYTE2         ; two bytes; result in AX

       MOV AX,WORD1      ; two words; result in DX:AX 
       MUL WORD2

       MOV AL, BYTE1     ; one byte and one word; result in DX:AX
       SUB AH, AH
       MUL WORD1
       RET

C10MUL ENDP

D10IMUL PROC              ; Multiplication of signed numbers

        MOV   AL, BYTE1   ; one byte by another byte; result in AX
        IMUL  BYTE2

        MOV   AX, WORD1   ; one word by another word; result in DX:AX
        IMUL  WORD2

        MOV   AL, BYTE1   ; one byte by one word; result in DX:AX
        CBW
        IMUL  WORD1
        RET
D10IMUL ENDP
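The results of these procedures can be worked out with a small C model of MUL and IMUL. It is a sketch using the data values BYTE1 = 80h, BYTE2 = 40h, WORD1 = 8000h and WORD2 = 2000h; all function names are hypothetical, and sign extension is written out by hand to avoid implementation-defined casts:

```c
#include <stdint.h>

/* Hypothetical type: a 32-bit product split into DX (high) and AX (low). */
typedef struct { uint16_t dx, ax; } dxax_t;

int32_t sext8(uint8_t v)   { return (v & 0x80u) ? (int32_t)v - 0x100 : (int32_t)v; }
int32_t sext16(uint16_t v) { return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v; }

/* MUL r/m8 and IMUL r/m8: AX <- AL * source */
uint16_t mul8(uint8_t al, uint8_t src)  { return (uint16_t)(al * src); }
uint16_t imul8(uint8_t al, uint8_t src) { return (uint16_t)(sext8(al) * sext8(src)); }

/* MUL r/m16: DX:AX <- AX * source (unsigned) */
dxax_t mul16(uint16_t ax, uint16_t src) {
    uint32_t p = (uint32_t)ax * src;
    dxax_t r = { (uint16_t)(p >> 16), (uint16_t)p };
    return r;
}

/* IMUL r/m16: DX:AX <- AX * source (signed) */
dxax_t imul16(uint16_t ax, uint16_t src) {
    uint32_t p = (uint32_t)(sext16(ax) * sext16(src));
    dxax_t r = { (uint16_t)(p >> 16), (uint16_t)p };
    return r;
}
```

So MUL gives AX = 2000h for the two bytes (128 * 64) while IMUL gives AX = E000h (-128 * 64 = -8192); for the two words, MUL gives DX:AX = 1000h:0000h and IMUL gives F000h:0000h. For the byte-by-word case, AH is cleared (MUL) or CBW gives FF80h (IMUL), and both end with DX:AX = 0040h:0000h.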



IDIV, DIV
DIV     Divides two unsigned integers(always positive)
IDIV    Divides two signed integers (either positive or negative)

Syntax:
        DIV  source                ;(register or variable)
        IDIV source                ;(register or variable)

Logic:
        AL <- AX/source            ; Byte source
        AH <- remainder
or

        AX <- DX:AX/source         ; Word source
        DX <- remainder 

DIV and IDIV work like MUL and IMUL in reverse, dividing the number in AX (or DX:AX)
by the register or variable given. The answer is stored in two places: for a byte
operand, AL receives the quotient and AH the remainder. If the operand is a 16-bit
register or variable, the number in DX:AX is divided by the operand, the quotient
is stored in AX and the remainder in DX. If the divisor is 0, or the quotient is too
large for AL (or AX), the processor raises a divide error (interrupt 0).  (See also Division.)

INTEGER DIVIDE
In most cases, a signed Integer Divide (IDIV) is preceded by a CDQ instruction.
IDIV uses EDX:EAX as the dividend, and CDQ sets up EDX from the sign of EAX.
It is better to copy EAX into EDX, then arithmetic-right-shift EDX 31 places to sign 
extend (MOV EDX,EAX followed by SAR EDX,31).

The copy/shift instructions take the same number of clocks as CDQ; however, on 586's
they allow two other instructions to execute at the same time. If you know the value is 
positive, use XOR EDX,EDX.

For examples of Division, consider the following code snippets:

BYTE1   DB    80h
BYTE2   DB    16h
WORD1   DW    2000h
WORD2   DW    0010h
WORD3   DW    1000h
MAIN    PROC  NEAR
        CALL  D10DIV
        CALL  E10IDIV
        RET
MAIN    ENDP
..
..
D10DIV  PROC                ;Division of unsigned numbers

        MOV AX,WORD1        ;division of one word by one byte
        DIV BYTE1           ;quotient in AL, and the remainder in AH

        MOV AL, BYTE1       ;division of one byte by one byte
        SUB AH,AH           ;quotient in AL, and remainder in AH
        DIV BYTE2

        MOV DX, WORD2       ;division of a doubleword by one word
        MOV AX, WORD3          
        DIV WORD1

        MOV AX, WORD1       ;division of one word by another word
        SUB DX, DX
        DIV WORD3
        RET
D10DIV  ENDP
..
..

E10IDIV PROC                ;Division of signed numbers


        MOV   AX, WORD1     ;division of one word by a byte
        IDIV  BYTE1

        MOV   AL, BYTE1     ;division of one byte by another byte
        CBW
        IDIV  BYTE2

        MOV   DX, WORD2     ;division of a doubleword by another word
        MOV   AX, WORD3
        IDIV  WORD1

        MOV   AX, WORD1     ;division of one word by another word
        CWD
        IDIV  WORD3
        RET
E10IDIV ENDP
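The byte divisions above can be checked with a C model of DIV and IDIV on a byte operand. This is a sketch with hypothetical names; it returns the new AX (remainder in the high byte, quotient in the low byte) or -1 where the processor would raise a divide error. It relies on C's / and % truncating toward zero, which matches IDIV:

```c
#include <stdint.h>

/* Sign extension written out to avoid implementation-defined casts. */
int32_t div_sext8(uint8_t v)   { return (v & 0x80u) ? (int32_t)v - 0x100 : (int32_t)v; }
int32_t div_sext16(uint16_t v) { return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v; }

/* Hypothetical model of DIV r/m8: AL <- AX / src, AH <- AX % src. */
int32_t div8(uint16_t ax, uint8_t src) {
    if (src == 0) return -1;                   /* divide error */
    uint32_t q = ax / src, r = ax % src;
    if (q > 0xFFu) return -1;                  /* quotient does not fit in AL */
    return (int32_t)((r << 8) | q);
}

/* Hypothetical model of IDIV r/m8: the same, but signed. */
int32_t idiv8(uint16_t ax, uint8_t src) {
    int32_t n = div_sext16(ax), d = div_sext8(src);
    if (d == 0) return -1;                     /* divide error */
    int32_t q = n / d, r = n % d;              /* both truncate toward zero */
    if (q < -128 || q > 127) return -1;        /* quotient does not fit in AL */
    return (int32_t)(((uint32_t)(uint8_t)r << 8) | (uint8_t)q);
}
```

For BYTE1 = 80h and BYTE2 = 16h, the signed division -128 / 22 (after CBW) gives AL = FBh (-5) and AH = EEh (-18), because the remainder takes the sign of the dividend.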



LEA
Intel's i80x86 has an instruction called LEA (Load Effective Address). It calculates the 
address through the processor's usual addressing logic, but afterwards does not use it for a 
memory access; instead it stores it into a target register. So, if you write LEA AX,[SI]+7, you 
will have AX=SI+7 afterwards. In i386, you could have LEA EDI, [EAX*4][EBX]+37. In one 
instruction! But if the multiplier is not 1, 2, 4 or 8 you cannot use it, because it is not an 
addressing mode. 

LEA means Load Effective Address.

Syntax:
LEA destination,source

Destination can be any 16-bit general register (or a 32-bit one on a 386 or later) and
the source must be a memory operand (bit of data in memory). It puts the offset address
of the source in the destination.

The way we usually enter the address of a message we want to print out is a bit
cumbersome. It takes three lines and it isn’t the easiest thing to remember:

        mov dx,OFFSET MyMessage
        mov ax,SEG MyMessage
        mov ds,ax

We can replace all this with just one line. This makes the code easier to read
and easier to remember. This only works if the data is in only one segment, i.e. the
small memory model, so that DS already points to it.

        lea dx,MyMessage
or      mov dx,OFFSET MyMessage

Using lea is slightly slower and results in code which is larger. Note that with 
LEA, we use only the name of the variable, while with:

        mov  si, offset variable4

we need to use the word 'offset'.

LEAs generally increase the chance of AGIs (address generation interlocks). However, 
LEAs can be advantageous because:

    *  In many cases an LEA instruction may be used to replace constant
       multiply instructions. (a sequence of LEA, add and shift for example)
       (See also INTEGER MULTIPLY.)
    *  LEA may be used as a three/four operand addition instruction.
       LEA ECX, [EAX+EBX*4+ARRAY_NAME]
    *  Can be advantageous to avoid copying a register when both operands to
       an ADD are being used after the ADD as LEA need not overwrite its
       operands.

    The general rule is that the "generic"

    LEA A,[B+C*INDEX+DISPLACEMENT]

        where A must be a register and B,C are registers
        and INDEX=1,2,4,8
        and DISPLACEMENT = 0 ... (4*1024*1024*1024 - 1)
                           or (if performing signed int operations)
                           -2*1024*1024*1024 ... + (2*1024*1024*1024 -1 )

    replaces the "generic" worst-case sequence

    MOV X,C    ; X is a "dummy" register
    MOV A,B
    MUL X,INDEX    ;actually  SHL X, (log2(INDEX))
    ADD A,DISPLACEMENT
    ADD A,X

    So using LEA you can actually "pack" up to FIVE instructions into one.
    Even counting a "worst case" of TWO OR THREE AGIs caused by the LEA,
    this is very fast compared to "normal" code.
    What's more, CPU registers are precious, and using LEA
    you don't need a dummy "X" register to preserve the value of B and C.
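The equivalence claimed above can be checked in C: the sketch below (hypothetical names) computes the generic LEA and the five-instruction sequence it replaces, both with 32-bit wraparound and with INDEX assumed to be 1, 2, 4 or 8:

```c
#include <stdint.h>

/* LEA A,[B+C*INDEX+DISPLACEMENT]: pure arithmetic, no memory access. */
uint32_t lea_generic(uint32_t b, uint32_t c, unsigned index, uint32_t disp) {
    return b + c * index + disp;
}

/* The worst-case sequence LEA replaces; X is the dummy register. */
uint32_t five_step_sequence(uint32_t b, uint32_t c, unsigned index, uint32_t disp) {
    uint32_t x = c;   /* MOV X,C                              */
    uint32_t a = b;   /* MOV A,B                              */
    x *= index;       /* MUL X,INDEX (SHL X,log2(INDEX))      */
    a += disp;        /* ADD A,DISPLACEMENT                   */
    a += x;           /* ADD A,X                              */
    return a;
}
```

For any B, C and DISPLACEMENT the two functions return the same value; LEA does all of it in one instruction without touching the extra X register.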




LOGIC
             There are a number of operations which work on individual bits of
             a byte or word. Before we start working on them, it is necessary
             for you to learn the Intel method of numbering bits. Intel starts
             with the low order bit, which is #0, and numbers to the left. If
             you look at a byte:

                 7 6 5 4 3 2 1 0

             that will be the ordering. If you look at a word:

                 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

             that is the ordering. The overwhelming advantage of this is that
             if you extend a number, the numbering system stays the same. That
             means that if you take the number 45 :

                 7 6 5 4 3 2 1 0
                 0 0 1 0 1 1 0 1  (45d)

             and sign extend it:

                 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
                  0  0  0  0  0  0  0  0  0  0  1  0  1  1  0  1 

             each of the bits keeps its previous numbering. The same is true
             for negative numbers. Here's -73:

                 7 6 5 4 3 2 1 0
                 1 0 1 1 0 1 1 1 (-73d)

                 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
                  1  1  1  1  1  1  1  1  1  0  1  1  0  1  1  1  (-73d)

             In addition, the bit-position number denotes the power of 2 that
             it represents. Bit 7 = 2 ** 7 = 128, bit 5 = 2 ** 5 = 32, 
             bit 0 = 2 ** 0 = 1.{1}

             Whenever a bit is mentioned by number, e.g. bit 5, this is what
             is being talked about.
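             A short C sketch of the numbering rules above (the helper names
             are hypothetical). `sign_extend8` does by hand what the 8086 CBW
             instruction does to AL, and the tests confirm that bits 0-7 keep
             their numbers after the extension:

```c
#include <stdint.h>

/* Bit n represents 2**n. */
static unsigned bit_value(unsigned n) { return 1u << n; }

/* 1 if bit n of x is set, using Intel numbering (bit 0 = low-order bit). */
static int bit_is_set(uint16_t x, unsigned n) { return (x >> n) & 1u; }

/* Sign-extend a byte to a word: copy bit 7 into bits 8-15.
   Bits 0-7 are untouched, so they keep their numbering. */
static uint16_t sign_extend8(uint8_t b) {
    return (b & 0x80u) ? (uint16_t)(0xFF00u | b) : (uint16_t)b;
}
```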

             
             AND  

             AND destination, source
             Logic: destination <- destination AND source

             AND performs bit-by-bit logical AND operation on its operands and
             stores the result in destination.

             There are five different ways you can AND two numbers:

                 1.   AND two registers
                 2.   AND a register with a variable
                 3.   AND a variable with a register
                 4.   AND a register with a constant
                 5.   AND a variable with a constant

             That is:

                 variable1 db   ?
                 variable2 dw   ?

                 and  cl, dh
                 and  al, variable1
                 and  variable2, si
                 and  dl, 0C2h
                 and  variable1, 01001011b

             You will notice that this time the constants are expressed in hex
             and binary. These are the only two reasonable alternatives. These
             instructions work bit by bit, and hex and binary are the only two
             ways of displaying a number bitwise (bit by bit). Of course, with
             hex you must still convert a hex digit into four binary digits.

             The table of bitwise actions for AND is:

                 1    1    ->   1
                 1    0    ->   0
                 0    1    ->   0
                 0    0    ->   0

             That is, a bit in the result will be set if and only if that bit
             is set in both the source and the destination. What is this used
             for? Several things. First, if you AND a register with itself,
             you can check for zero.

                 and  cx, cx

             (This is also a quick way to set the flags from a register's
             current value, for example before a conditional jump.)

             If any bit is set, then there will be a bit set in the result and
             the zero flag will be cleared. If no bit is set, there will be no
             bit set in the result, and the zero flag will be set. No bit will
             be altered, and CX will be unchanged. This is the standard way of
             checking for zero. You can't AND a variable that way:

                 and  variable1, variable1

             is an illegal instruction. But you can AND it with a constant
             with all the bits set:

                 and  variable1, 11111111b

             If the bit is set in variable1, then it will be set in the
             result. If it is not set in variable1, then it won't be set in
             the result. This also sets the zero flag without changing the
             variable.


             AND with a single-bit mask checks one bit. Suppose ecx is 0:

                 and  ecx, 00000001b

                 00000000   ecx, the destination (low 8 bits shown)
                 00000001   the source, simply the value 1, with which ecx
                            is ANDed
                 --------
                 00000000

             The result is 0 because there is no bit position where both
             operands have a 1. The result would be 1 only if bit 0 of ecx
             were set (that is, if ecx held an odd number).

             AND is also used in masks.
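             The AND behaviour described above can be modelled in C as
             follows. This is an illustration assuming 8-bit operands; `and8`
             and `and8_out` are hypothetical names, and only the zero flag is
             modelled (the real instruction also clears CF and OF and updates
             SF and PF):

```c
#include <stdint.h>
#include <stdbool.h>

/* Result of an 8-bit AND plus the zero flag it would produce. */
typedef struct { uint8_t result; bool zf; } and8_out;

static and8_out and8(uint8_t dest, uint8_t src) {
    and8_out r;
    r.result = (uint8_t)(dest & src);  /* bit set only if set in both */
    r.zf = (r.result == 0);            /* ZF = 1 when no bit survives  */
    return r;
}
```

             `and8(x, x)` models `and cx, cx`, `and8(x, 0xFF)` models
             `and variable1, 11111111b`, and `and8(x, 1)` models the bit-0
             check on ecx.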

             
             TEST  

             TEST destination, source

             Logic:    (destination AND source)   ; result discarded
                       CF <- 0
                       OF <- 0
             It sets the flags only.

             There is a variant of AND called TEST. TEST does exactly
             the same thing as AND but throws away the results when it is
             done. It does not change the destination. This means that it can
             check for specific things without altering the data. In other words,
             TEST performs a logical AND on its two operands and updates the flags.
             Neither destination nor source is changed.

             test ebx, ebx       ; Is ebx zero?
             jz ----             ; If yes, then jump


             For speed optimization, when comparing a value in a register with 0,
             use the TEST instruction (for example test eax,eax) instead of CMP.

             TEST operates by ANDing the operands together without spending any
             internal time writing a result to a destination register. Use TEST
             when comparing the result of a boolean AND with an immediate
             constant for equality or inequality, particularly when the register
             is EAX. You can also use it for zero testing
             (i.e. test ebx,ebx sets the zero flag if ebx is zero).

             TEST is useful for examining the status of individual bits. For 
             example, the following code snippet will transfer control to 
             ONE_FIVE_ARE_OFF if both bits 1 and 5 of register AL are 
             cleared. The status of all other bits will be ignored.

             test al,00100010b    ; mask out all bits except for 1 and 5
             jz ONE_FIVE_ARE_OFF  ; if either bit was set, the result will
                                  ; not be zero and no jump is taken

             NOT_BOTH_ARE_OFF:
             ..
             ..
             ONE_FIVE_ARE_OFF:
             ..
             ..
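             In C terms (a sketch; the function name is hypothetical), the
             test/jz pair above asks whether AL AND 00100010b is zero:

```c
#include <stdint.h>
#include <stdbool.h>

/* Models:   test al,00100010b
             jz   ONE_FIVE_ARE_OFF
   TEST only sets the flags; AL itself is never modified. */
static bool one_five_are_off(uint8_t al) {
    return (al & 0x22u) == 0;   /* 0x22 == 00100010b: bits 5 and 1 */
}
```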

             TEST has the same possibilities as AND:

                 variable1 db   ?
                 variable2 dw   ?

                 test cl, dh
                 test al, variable1
                 test variable2, si
                 test dl, 0C2h
                 test variable1, 01001011b

             will set the flags exactly the same as the similar AND
             instructions but will not change the destination. We need another
             concrete example, and for that we'll turn to your video card. In
             text mode, your screen is 80 X 25. That is 2000 cells. Each cell
             has a character byte and an attribute byte. The character byte
             has the actual ASCII code of the character. The attribute byte
             says what color the character is, what color the background is,
             whether the character is high or low intensity and whether it
             blinks. An attribute byte looks like this:

                 7 6 5 4 3 2 1 0
                 X R G B I R G B

             Bits 0,1 and 2 are the foreground (character) color. 0 is blue, 1
             is green, and 2 is red. Bits 4, 5, and 6 are the background
             color. 4 is blue, 5 is green, and 6 is red. Bit 3 is high
             intensity, and bit 7 is blinking. If the bit is set (1) that
             particular component is activated, if the bit is cleared (0),
             that component is deactivated. 

             The first thing to notice is how much memory we have saved by
             putting all this information together. It would have been
             possible to use a byte for each one of these characteristics, but
             that would have required 8 X 2000 bytes = 16000 bytes. If you add
             the 2000 bytes for the characters themselves, that would be 18000
             bytes. As it is, we get away with 4000 bytes, a savings of over
             75%. Since there are four different screens (pages) on a color
             card, that is 18000 X 4 = 72000 bytes compared to 4000 X 4 =
             16000. That is a huge savings.

             We don't have the tools to access these bytes yet, but let's
             pretend that we have moved an attribute byte into dl. We can find
             out if any particular bit is set by TESTing dl with a pattern
             that has only that bit set. If the zero flag is cleared, the
             result is not zero, so the bit was on. If the zero flag is set,
             the result is zero, so that bit was off.


                 test dl, 10000000b       ; is it blinking?
                 test dl, 00010000b       ; is there blue in the background?
                 test dl, 00000100b       ; is there red in the foreground?

             If we look at the zero flag, this will tell us if that component
             is on. It won't tell us if the background is blue, because maybe
             the green or the red is on too. Remember, test alters neither the
             source nor the destination. Its purpose is to set the flags, and
             the results go into the Great Bit Bucket in the Sky.
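             A C sketch of the same checks, using hypothetical names for the
             attribute bits. `attr_has` mirrors `test dl, mask`, while
             `bg_is_blue` shows what you would need to answer "is the
             background exactly blue?", which a single TEST cannot do:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical names for the text-mode attribute bits described above. */
enum {
    ATTR_FG_BLUE   = 0x01,  /* bit 0 */
    ATTR_FG_GREEN  = 0x02,  /* bit 1 */
    ATTR_FG_RED    = 0x04,  /* bit 2 */
    ATTR_INTENSITY = 0x08,  /* bit 3 */
    ATTR_BG_BLUE   = 0x10,  /* bit 4 */
    ATTR_BG_GREEN  = 0x20,  /* bit 5 */
    ATTR_BG_RED    = 0x40,  /* bit 6 */
    ATTR_BLINK     = 0x80   /* bit 7 */
};

/* Like "test dl, mask": true when ZF would be clear. */
static bool attr_has(uint8_t attr, uint8_t mask) {
    return (attr & mask) != 0;
}

/* Exactly blue: bit 4 on AND bits 5 and 6 off (an AND plus a compare). */
static bool bg_is_blue(uint8_t attr) {
    return (attr & (ATTR_BG_RED | ATTR_BG_GREEN | ATTR_BG_BLUE)) == ATTR_BG_BLUE;
}
```

             For example, 1Fh (bright white on blue) has blue in the
             background and is exactly blue, while 3Fh (bright white on cyan)
             has blue in the background but is not exactly blue.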

             
             OR  
 
             The table for OR is:

                 1    1    ->   1
                 1    0    ->   1
                 0    1    ->   1
                 0    0    ->   0

             If either the source or the destination bit is set, then the
             result bit is set. If both are zero then the result is zero.
             OR is used to turn on a specific bit.

                 or   dl, 10000000b  ; turn on blinking
                 or   dl, 00000001b  ; turn on blue foreground
             
             After this operation, those bits will be on whether or not they
             were on before. It changes none of the bits where there is a 0.
             They stay the same as before.
   
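             ORing a register with itself leaves it unchanged and sets the
             flags, so it can also test for zero:

             or ebx, ebx       ; Is ebx zero?
             jz ----           ; If yes, then jump

             To turn on bit 0 of ecx (ecx becomes exactly 1 only if it was 0
             beforehand):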
             
             or ecx, 00000001            
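             A minimal C sketch of OR as a bit-setter (the function name is
             hypothetical; 8-bit values are used, but the same holds for a
             32-bit register). The last test shows why or ecx, 00000001 only
             produces 1 when ecx was already 0:

```c
#include <stdint.h>

/* OR turns on every bit that is 1 in the mask and leaves the rest alone. */
static uint8_t set_bits(uint8_t value, uint8_t mask) {
    return (uint8_t)(value | mask);
}
```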


             
             XOR  
           
             The table for XOR is:

                 1    1    ->   0
                 1    0    ->   1
                 0    1    ->   1
                 0    0    ->   0

             That is, if both are on or if both are off, then the result is
             zero. If only one bit is on, then the result is 1. This is used
             to toggle a bit off and on.

                 xor  dl, 10000000b  ; toggle blinking
                 xor  dl, 00000001b  ; toggle blue foreground

             Where there is a 1, it will reverse the setting. Where there is a
             0, the setting will stay the same. This leads to one of the
             favorite pieces of code for programmers.

                 xor  ax, ax

             zeros the ax register. There are three ways to zero the ax
             register:

                 mov  ax, 0
                 sub  ax, ax
                 xor  ax, ax

             The first one is very clear, but slightly slower. For the second
             one, if you subtract a number from itself, you always get zero.
             This is slightly faster and fairly clear.{2}  For the third one,
             any bit that is 1 will become 0, and any bit that is 0 will stay
             0. It zeros the register as a side effect of the XOR instruction.
             You'll never guess which one many programmers prefer. That's
             right, XOR. Many programmers prefer the third because it helps
             make the code more obscure and unreadable. That gives a certain
             aura of technical complexity to the code.
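             A C sketch of the two XOR uses just described, toggling with a
             mask and zeroing by XORing a value with itself (function names
             are hypothetical):

```c
#include <stdint.h>

/* XOR with a mask flips exactly the bits that are 1 in the mask. */
static uint8_t toggle_bits(uint8_t value, uint8_t mask) {
    return (uint8_t)(value ^ mask);
}

/* A value XORed with itself is always 0, as in "xor ax, ax". */
static uint16_t xor_self(uint16_t ax) {
    return (uint16_t)(ax ^ ax);
}
```

             Toggling twice with the same mask restores the original value.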

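             Exchanging A and B without a temporary variable can be done with
             the sequence xor A,B; xor B,A; xor A,B (i.e. A=A^B; B=A^B; A=A^B),
             and it works on any processor or language that supports XOR, with
             two caveats: A and B must not be the same storage location (the
             first XOR would zero it), and on the 8086 at most one of them may
             be a memory operand.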
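             The swap sequence in C, as a sketch (xor_swap is a hypothetical
             name). The pointer comparison guards the case where A and B are
             the same storage location:

```c
#include <stdint.h>

/* A=A^B; B=A^B; A=A^B.  If a and b point to the same object, the first
   step would zero it, so that case is skipped. */
static void xor_swap(uint32_t *a, uint32_t *b) {
    if (a == b) return;   /* swapping a value with itself: nothing to do */
    *a ^= *b;             /* a = A^B                 */
    *b ^= *a;             /* b = B^(A^B) = A         */
    *a ^= *b;             /* a = (A^B)^A = B         */
}
```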



             
             NEG and NOT  
            
             NOT is a logical operation and NEG is an arithmetical operation.
             We'll do both here so you can see the difference. NOT toggles the
             value of each individual bit:

                 1    ->   0
                 0    ->   1

             NOT destination
             Logic: destination <- NOT(destination)   ; One's complement

             NOT inverts each bit of its operand (that is, forms the one's
             complement). The operand can be a byte or a word (or a doubleword
             on the 386 and later). NOT does not change any flags.

 
             NEG destination
             Logic: destination  <-  -destination     ;  Two's complement
               
             NEG subtracts the destination operand from 0, and returns the result
             in the destination. This effectively produces the two's complement
             of the operand. The operand may be a byte or a word.   
             NEG negates the value of the register or variable (a signed
             operation). NEG performs (0 - number) so:

                 neg  ax
                 neg  variable1

             are equivalent to (0 - AX) and (0 - variable1) respectively. NEG
             sets the flags in the same way as (0 - number).

             Note: If the operand is zero, the Carry Flag is cleared; in all
                   other cases, the Carry Flag is set.
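             A C model of the two operations on 16-bit values (a sketch; the
             names are hypothetical), showing that NEG equals NOT plus 1 and
             reproducing the carry-flag rule from the note:

```c
#include <stdint.h>
#include <stdbool.h>

/* NOT: one's complement, every bit inverted. */
static uint16_t not16(uint16_t x) { return (uint16_t)~x; }

/* NEG: 0 - x, the two's complement.  CF is cleared only for a zero operand. */
static uint16_t neg16(uint16_t x, bool *cf) {
    *cf = (x != 0);
    return (uint16_t)(0u - x);
}
```

             Note that negating 8000h (the most negative 16-bit value) gives
             8000h back.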
             

             
             MASKS  
           
             To explain masks, we'll need some data, and we'll use the
             attribute byte for the monitor. Here it is again:

                 7 6 5 4 3 2 1 0
                 X R G B I R G B

             Bits 0,1 and 2 are the foreground (character) color. 0 is blue, 1
             is green, and 2 is red. Bits 4, 5, and 6 are the background
             color. 4 is blue, 5 is green, and 6 is red. Bit 3 is high
             intensity, and bit 7 is blinking.

             What we want to do is turn certain bits on and off without
             affecting other bits. What if we want to make the background
             black without changing anything else? We use an AND mask.

                 and  video_byte, 10001111b

             Bits 0, 1, 2, 3 and 7 will remain unchanged, while bits 4, 5 and
             6 will be zeroed. This will make the background black. What if we
             wanted to make the background blue? This is a two step process.
             First we make the background black, then set the blue background
             bit. This involves first the AND mask, then an OR mask.

                 and  video_byte, 10001111b
                 or   video_byte, 00010000b

             The first instruction shuts off certain bits without changing
             others. The second turns on certain bits without affecting
             others. The binary constant that we are using is called a mask.
             You may write this constant as a binary or a hex number. You
             should never write it as a signed or unsigned decimal number (unless you
             are one of those people who just adores making code unreadable).

             If you want to turn off certain bits in a piece of data, use an
             AND mask. The bits that you want left alone should be set to 1,
             the bits that you want zeroed should be set to 0. Then AND the
             mask with the data.

             If you want to turn on certain bits in a piece of data, use an OR
             mask. The bits that you want left alone should be set to 0. The
             bits that you want turned on should be set to 1. Then OR the mask
             with the data.

             Go back to AND and OR to make sure you believe that this is what
             will happen.  
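             The two-step mask can be sketched in C (the function name is
             hypothetical); the comments repeat the assembly it models:

```c
#include <stdint.h>

/*  and  video_byte, 10001111b   ; clear bits 4-6 (background)
    or   video_byte, 00010000b   ; set bit 4 (blue background)   */
static uint8_t set_background_blue(uint8_t video_byte) {
    video_byte &= 0x8F;   /* AND mask: keep bits 7 and 0-3, zero bits 4-6 */
    video_byte |= 0x10;   /* OR mask: turn on bit 4, leave the rest alone */
    return video_byte;
}
```

             For example, 4Eh (yellow on red) becomes 1Eh (yellow on blue).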




JUMPS
                 
Hex:            Asm:             Description:

75 or   0F85    jne              jump if not equal
74 or   0F84    je               jump if equal
77 or   0F87    ja               jump if above
76 or   0F86    jna              jump if not above
73 or   0F83    jae              jump if above or equal
72 or   0F82    jnae             jump if not above or equal
72 or   0F82    jb               jump if below
73 or   0F83    jnb              jump if not below
76 or   0F86    jbe              jump if below or equal
77 or   0F87    jnbe             jump if not below or equal
7F or   0F8F    jg               jump if greater
7E or   0F8E    jng              jump if not greater
7D or   0F8D    jge              jump if greater or equal
7C or   0F8C    jnge             jump if not greater or equal
7C or   0F8C    jl               jump if less
7D or   0F8D    jnl              jump if not less
7E or   0F8E    jle              jump if less or equal
7F or   0F8F    jnle             jump if not less or equal
EB              jmp or   jmps    jump unconditionally (short)
84              test             test
90              nop              no operation

The one-byte opcodes (7x, EB) take an 8-bit signed displacement; the
two-byte 0F8x forms take a 16- or 32-bit displacement (386 and later).
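Which flags each conditional jump reads can be sketched in C. This is an
illustration with a hypothetical flags_t record, not a decoder for the
opcodes above: "above/below" jumps look at CF and ZF (unsigned comparisons),
while "greater/less" jumps look at ZF, SF and OF (signed comparisons).

```c
#include <stdbool.h>

/* Hypothetical flag record: only the flags these jumps read. */
typedef struct { bool cf, zf, sf, of; } flags_t;

/* Unsigned conditions */
static bool cond_ja(flags_t f)  { return !f.cf && !f.zf; }          /* = jnbe */
static bool cond_jae(flags_t f) { return !f.cf; }                   /* = jnb  */
static bool cond_jb(flags_t f)  { return f.cf; }                    /* = jnae */
static bool cond_jbe(flags_t f) { return f.cf || f.zf; }            /* = jna  */

/* Signed conditions */
static bool cond_jg(flags_t f)  { return !f.zf && (f.sf == f.of); } /* = jnle */
static bool cond_jge(flags_t f) { return f.sf == f.of; }            /* = jnl  */
static bool cond_jl(flags_t f)  { return f.sf != f.of; }            /* = jnge */
static bool cond_jle(flags_t f) { return f.zf || (f.sf != f.of); }  /* = jng  */
```

After cmp ax, bx with AX = 0FFFFh and BX = 1, CF = 0, ZF = 0, SF = 1 and
OF = 0, so ja is taken (65535 is above 1) and jl is also taken (-1 is less
than 1): the same flags, read two different ways.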



NUMBERS AND ARITHMETIC
                 
             You don't habitually use the base two system to balance your
             checkbook, so it would be counterproductive to teach you machine
             arithmetic on a base two system. What number systems have you had
             a lot of experience with? The base 10 system springs to mind. I'm
             going to show you what happens on a base 10 system so you will
             understand the structure of what happens with computer
             arithmetic.

             BASE 10 MACHINE

             Each place inside the microprocessor that can hold a number is
             called a REGISTER. Normally there are a dozen or so of these. Our
             base 10 machine has 4 digit registers.  They can represent any
             number from 0000 to 9999. They are exactly like industrial
             counters or the counters on your tape machines.{1} If you add 27
             to a register, the microprocessor counts forward 27; if you
             subtract 153 from a register, the microprocessor counts backwards
             153.   Every time you add 1 to a register, it increments by 1 -
             that is 0245, 0246, 0247, 0248. Every time you subtract 1 from a
             register, it decrements by 1 - that is 3480, 3479, 3478, 3477.

             Let's do some more incrementing.  9997, 9998, 9999, 0000, 0001,
             0002. Whoops! That's a problem. When the register reaches 9999
             and we add 1, it changes to 0000, not 10,000. How can we tell the
             difference between 0000 and 10,000? We can't without a little
             help from the CPU.{2}  Immediately after an arithmetical
             operation, the CPU knows whether you have gone through 10,000
             (9999->0000). The CPU has something called a carry flag. It is
             internal to the CPU and can have the value 0 or 1. After each
             arithmetical operation, the CPU sets the CARRY FLAG to 1 if you
             went through the 9999/0000 boundary, and sets the carry flag to 0
             if you didn't.{3}

             Here are some examples, showing addition, the result, and the
             carry flag. The carry flag is normally abbreviated by CF.

                    number 1       number 2        result     CF

                      0289           4782           5071      0 
                      4398           2964           7362      0
                      8177           5826           4003      1
                      6744           4208           0952      1

             Note that you must check the carry flag immediately after the
             arithmetical operation. If you wait, the CPU will reset it after
             the next arithmetical operation.
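             The base 10 machine is easy to simulate. The following C sketch
             (hypothetical name add10k; registers are modelled as ints from 0
             to 9999) reproduces the addition table:

```c
#include <stdbool.h>

/* 4-digit base 10 register: add, drop any fifth digit, report CF. */
static int add10k(int a, int b, bool *cf) {
    int sum = a + b;
    *cf = (sum > 9999);     /* went through the 9999/0000 boundary */
    return sum % 10000;     /* the left digit is dropped */
}
```

             The tests write 0289 as 289, because in C a leading 0 makes a
             literal octal.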

             Now let's do some decrementing. 0003, 0002, 0001, 0000, 9999,
             9998. Golly gosh! Another problem. When we got to 0000, rather
             than getting -1, -2, we got 9999, 9998. Apparently 9999 stands
             for -1, 9998 stands for -2. Yes, that's the system on this
             machine, on the 8086, and on virtually all modern computers. (Back
             to that in a moment.) How do we tell that the number went through
             0, i.e. 0000->9999? The
             carry flag comes to the rescue again. If the number goes through
             the 9999/0000 boundary in either direction, the CPU sets the CF
             to 1; if it doesn't, the CPU sets the CF to 0. Here's some
             subtraction, with the result and the carry flag.

                    number 1       number 2       result     CF

                      8473           2752           5721      0
                      2836           4583           8253      1
                      0654           9281           1373      1
                      9281           0654           8627      0

             Look at examples 3 and 4. The numbers are reversed. The machine
             results are 1373 and 8627, and 1373 is how the machine writes
             -8627 (10,000 - 8627 = 1373). So the results have the same
             absolute value but different signs. But that is as it should be.
             When you reverse the order in a subtraction, you get the same
             absolute value, only a different sign (15 - 7 = 8 but
             7 - 15 = -8). Remember, the CF is reliable only immediately after
             the operation.


             NEGATIVE NUMBERS

             The negative numbers go 9999=-1, 9998=-2, 9997=-3, 9996=-4,
             9995=-5 etc. A more negative number is denoted by a smaller
             number in the register; -5 = 10,000 -5 = 9995; -498 = 10,000 -498
             = 9502, and in general, -x = 10,000 -x. Here are some negative
             numbers and their representations on our machine.

                     number     machine no              number     machine no

                        -27          9973                -4652          5348
                      -8916          1084                -6155          3845

             As you will notice, these numbers look exactly the same as the
             unsigned numbers. They ARE exactly the same as the unsigned
             numbers. The machine has no way of knowing whether a number in a
             register is signed or unsigned. Unlike BASIC or PASCAL which will
             complain whenever you try to use a number in an incorrect way,
             the machine will let you do it. This is the power and the curse
             of machine language. You are in complete control. It is your
             responsibility to keep track of whether a number is signed or
             unsigned. 

             Which signed numbers should be positive and which negative? This
             has already been decided for you by the computer, but let's think
             out what a reasonable solution might be. We could have from 0000
             to 8000 positive and from 9999 to 8001 negative, but that would
             give us 8001 positive numbers and 1999 negative numbers. That
             seems unbalanced. More importantly, if we take -(3279) the
             machine will give us 6721, which is a POSITIVE number. We don't
             want that. For reasons of symmetry, the positive numbers are
             0000-4999 and the negative numbers are 9999-5000.{4} Our most
             negative number is -5000 = 10,000 -5000 = 5000.


             10'S COMPLEMENT

             It's time for a digression. If we are going to be using negative
             numbers like -(473), changing from an external number to an
             internal number is going to be a bother: i.e. -473 -> 9527. Going
             the other way is going to be a pain too: i.e. 9527 -> -473. Well,
             it would be a problem except that we have some help.

                 0000 =    10,000    =     9999     +1
                                          - 473
             result                        9526     +1   = 9527

             Let's work this through carefully. On our machine, 0000  and
             10000 (9999+1) are the same thing, so 0 - 473 is the same as
             9999+1-473 which is the same as 9999-473+1. But when we have all
             9s, this is a cinch. We never have to borrow - all we have to do
             is subtract each digit from 9 and then add 1 to the total. We may
             have to carry at the end, but that is a lot better than all those
             borrows. We'll do a few examples: 

             (-4276)
                 0000 =    10,000    =     9999     +1
                                          -4276
             result                        5723     +1   = 5724


             (-3982)
                 0000 =    10,000    =     9999     +1
                                          -3982
             result                        6017     +1   = 6018

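             =================================================================
             4. That way, if we tell the machine that we are working with
             signed numbers, all it has to do is look at the left digit. If
             the digit is 5-9, we have a negative number, if it is 0-4, we
             have a positive number. Note that 0000 is considered to be
             positive. This is true on all computers.
             =================================================================

             (-1989)
                 0000 =    10,000    =     9999     +1
                                          -1989
             result                        8010     +1   = 8011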

             This is called 10s complement. Subtract each digit from 9, then
             add 1 to the total. One thing we should check is whether we get
             the same number back if we negate the negative result; i.e. does
             -(-1989) = 1989?  From the last example, we see that -1989 =
             8011, so:

             (-8011)
                 0000 =    10,000    =     9999     +1
                                          -8011
             result                        1988     +1   = 1989

             It seems to work. In fact, it always works. See the footnote for
             the proof.{5} You are going to use this from time to time, so you
             might as well practice some. Here are 10 numbers to put into 10s
             complement form. The answers are in the footnote. (1) -628, (2)
             -4194, (3) -9983, (4) -1288, (5) -4058, (6) -6952, (7) -162, (8)
             -9, (9) -2744, (10) -5000.{6}
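             The subtract-from-9-then-add-1 recipe can be checked
             mechanically. A C sketch for the 4-digit machine (hypothetical
             names; inputs assumed to be 0..9999), with as_signed applying
             the rule that a left digit of 5-9 means negative:

```c
/* 10's complement: subtract each digit from 9 (never a borrow), add 1.
   The final % 10000 drops the carry when x is 0000. */
static int tens_complement(int x) {
    int nines = 9999 - x;
    return (nines + 1) % 10000;
}

/* Signed reading of a machine number: 5000-9999 are negative. */
static int as_signed(int machine) {
    return machine >= 5000 ? machine - 10000 : machine;
}
```

             The tests check a few of the exercise answers, the -(-1989) round
             trip, and the odd case of 5000.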

             The computer keeps track of whether a number is positive or
             negative. After an arithmetical operation, it sets a flag to tell
             whether the result is positive or negative. This flag has no
             meaning if you are using unsigned numbers. The computer is
             saying, "If the last arithmetical operation was with signed
             numbers, then this is the sign of the result." The flag is called
             the sign flag (SF). It is 0 if the number is positive and 1 if
             the number is negative. Let's decrement again and look at both
             the sign flag and carry flag.

                        NUMBER    SIGN     CARRY

                           3         0         0
                           2         0         0
                           1         0         0
                           0         0         0
                        9999         1         1
                        9998         1         0
                        9997         1         0
                        9996         1         0

             That worked pretty well. The sign flag changed from 0 to 1 when
             we went from 0 to 9999 and the carry flag was set to 1 for that
             one operation so we could see that we had gone through the
             9999/0000 boundary.

             =================================================================
             5. Let x be any number. Then:
             -x     = ( 10,000 - x)     = ( 9999 + 1 - x ) ;

             -(-x)  = ( 10,000 - (-x) ) = ( 9999 + 1 - (-x) )
                                       = ( 9999 + 1 - ( 9999 + 1 - x ) )
                                       = ( 9999 + 1 - 9999 - 1 + x )
                                       = x

             6.      (1) -628 = 9372 , (2) -4194 = 5806 , (3) -9983 = 0017,
                     (4) -1288 = 8712 , (5) -4058 = 5942 , (6) -6952 = 3048
                     (7) -162 = 9838 , (8) -9 = 9991 , (9) -2744 = 7256,
                     (10) -5000 = 5000.

             This last one is a little strange. It changes 5000 into itself.
             In our system, 5000 is a negative number and it winds up as a
             negative number. This happens on all computers. If you take the
             maximum negative number and take its negative, you get the same
             number back.
             =================================================================

             Let's do some more decrementing.

                        NUMBER    SIGN     CARRY

                        5003         1         0
                        5002         1         0
                        5001         1         0
                        5000         1         0
                        4999         0         0
                        4998         0         0
                        4997         0         0
                        4996         0         0

             This one didn't work too well. 5000 is our most negative number
             (-5000) and 4999 is our most positive number; when we crossed the
             4999/5000 boundary, the sign changed but there was nothing to
             tell us that the sign had changed. We need to make another flag.
             This one is called the overflow flag. We check the carry flag
             (CF) for the 0000/9999 boundary and we check the overflow flag
             for the 5000/4999 boundary. The last decrementing example with
             the overflow flag:

                        NUMBER    SIGN     CARRY     OVERFLOW

                        5003         1         0         0
                        5002         1         0         0
                        5001         1         0         0
                        5000         1         0         0
                        4999         0         0         1
                        4998         0         0         0
                        4997         0         0         0
                        4996         0         0         0

             This time we can find out that we have gone through the boundary.
             We'll come back to how the computer sets the overflow flag later,
             but let's do some addition and subtraction now.
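             The three flags on the base 10 machine can be modelled for a
             subtraction of 1. This C sketch (hypothetical names) reproduces
             the tables above. On the real 8086, SUB sets the carry flag this
             way, but DEC leaves CF unchanged, so the model corresponds to
             subtracting 1 rather than to DEC:

```c
#include <stdbool.h>

/* Result of subtracting 1 on the 4-digit machine, with its flags. */
typedef struct { int value; bool sf, cf, of; } dec_result;

static dec_result dec10k(int x) {
    dec_result r;
    r.cf = (x == 0);                 /* crossed 0000 -> 9999            */
    r.of = (x == 5000);              /* crossed 5000 (-5000) -> 4999    */
    r.value = (x + 9999) % 10000;    /* x - 1, modulo 10000             */
    r.sf = (r.value >= 5000);        /* left digit 5-9 means negative   */
    return r;
}
```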


             UNSIGNED ADDITION AND SUBTRACTION

             Unsigned addition is done the same way as ordinary addition. The computer
             adds the two numbers. If the result is over 9999, it sets the
             carry flag and drops the left digit (i.e. 14625 -> 4625, CF = 1,
             19137 -> 9137 CF = 1, 10000 -> 0000 CF = 1). The largest possible
             addition is 9999 + 9999 = 19998. This still has a 1 in the left
             digit. If the carry flag is set after an addition, the result
             must be between 10000 and 19998.

             Since this is unsigned addition, we won't worry about the sign
             flag or the overflow flag for the moment. Here are some examples
             of unsigned addition.

                      NUMBER 1       NUMBER 2       RESULT         CF

                        5147           2834          7981           0
                        6421           8888          5309           1
                        2910           6544          9454           0
                        6200           6321          2521           1

             Directly after the addition, the computer has complete
             information about the number. If the carry flag is set, that
             means that there is an extra 10,000, so the result of the second
             example is 15309 and the result of the fourth example is 12521. 
             There is no way to store all that information in 4 digits in
             memory so that extra information will be lost if it is not used
             immediately. 

             Subtraction is similar. The machine subtracts, and if the answer
             is below 0000, it sets the carry flag, borrows 10000 and adds it
             to the result. -3158 -> -3158 + 10000 -> 6842 CF = 1 ; -8197 ->
             -8197 + 10000 -> 1803  CF = 1. After a subtraction, if the carry
             flag is set, you know the number is 10000 too big. Once again,
             the carry flag information must be used immediately or it will be
             lost. Here are some examples:

                      NUMBER 1       NUMBER 2       RESULT         CF

                        3872           2655          1217           0
                        9826           5967          3859           0
                        4561           7143          7418           1
                        2341           4907          7434           1

             If the carry flag is set, the computer borrowed 10000, so example
             3 is 7418 - 10000 = -2582 and example 4 is 7434 - 10000 = -2566.
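             The rules above fit in a few lines. This is a minimal Python
             sketch of the pretend 4 digit machine, not of real hardware; the
             names add4 and sub4 are hypothetical. Each returns the 4 digit
             result together with the carry flag, because CF is the only
             place the extra 10000 survives.

```python
MOD = 10_000  # the pretend machine holds 4 decimal digits


def add4(a: int, b: int) -> tuple[int, int]:
    """Unsigned ADD: keep the right 4 digits, set CF if 9999 was passed."""
    total = a + b
    return total % MOD, 1 if total >= MOD else 0


def sub4(a: int, b: int) -> tuple[int, int]:
    """Unsigned SUB: if the answer is below 0000, borrow 10000 and set CF."""
    diff = a - b
    return diff % MOD, 1 if diff < 0 else 0


print(add4(6421, 8888))  # (5309, 1): really 15309
print(sub4(4561, 7143))  # (7418, 1): really 7418 - 10000 = -2582
```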


             MODULAR ARITHMETIC

             What the computer is doing is modular arithmetic. Modular
             arithmetic is like a clock. If it is 11 o'clock and you go
             forward 1 hour it's now 12 o'clock; if it's 11 and you go
             backwards 1 hour it's now 10. If it's 11 and you go forward 4
             hours it's not 15, it's 3. If it's 11 and you go backward 15
             hours it's not -4, it's 8. 

             The clock is doing  mod 12  arithmetic.{7} 

                 (A+B) mod 12
                 (A-B) mod 12

             From the clock's viewpoint, 11 o'clock today, 11 o'clock
             yesterday and 11 o'clock, June 8, 1754 are all the same thing. If
             you go forward 200 hours (that's 12X16 + 8) you will have the
             same result as going forward 8 hours. If you go backwards 200
             hours (that's -(12X16 + 8) = -(12X16) -8) you get the same result
             as going backwards 8 hours. If you go forward 4 hours from 11
             (11+4) mod 12 = 3 you get the same result as going backwards 8
             hours (11-8) mod 12 = 3. In fact, these come in pairs. If A + B =
             12, then going forward A hours gives the same result as going
             backwards B hours. Forwards 9 = backwards 3; forwards 7 =
             backwards 5; forwards 11 = backwards 1.

             In the mod 12 system, the following things are equivalent:

                 (+72 + 4)      (+72 - 8)
                 (+60 + 4)      (+60 - 8)
                 (+48 + 4)      (+48 - 8)
                 (+36 + 4)      (+36 - 8)
                 (+24 + 4)      (+24 - 8)
                 (+12 + 4)      (+12 - 8)
                 (  0 + 4)      (  0 - 8)
                 (-12 + 4)      (-12 - 8)
                 (-24 + 4)      (-24 - 8)
                 (-36 + 4)      (-36 - 8)
                 (-48 + 4)      (-48 - 8)
                 (-60 + 4)      (-60 - 8)

             They form what is known as an equivalence class mod 12. If you
             use any one of them for addition or subtraction, you will get the
             same result (mod 12) as with any other one. Here's some
             addition:{8}

                 (+48 + 4) + 7 = (48 + 11) mod 12 = 11
                 (-48 - 8) + 7 = (-48 - 1) mod 12 = 11
                 (  0 - 8) + 7 = ( 0 - 1 ) mod 12 = 11
                 (-60 + 4) + 7 = (-60 +11) mod 12 = 11

             And some subtraction:

                 (+48 + 4) - 2 = (48 + 2 ) mod 12 = 2
                 (-48 - 8) - 2 = (-48 - 10) mod 12 = 2
                 (  0 - 8) - 2 = ( 0 - 10) mod 12 = 2
                 (-60 + 4) - 2 = (-60 + 2) mod 12 = 2
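             Python's % operator, given a positive modulus, always returns a
             value from 0 to 11 for mod 12, so it can stand in for the clock.
             A short sketch (the helper name clock is hypothetical) that
             checks that every member of the equivalence class gives the same
             result:

```python
def clock(hours: int) -> int:
    """Reduce an hour count mod 12; Python's % never returns a negative here."""
    return hours % 12


# The left column of the table above: -60 + 4, -48 + 4, ..., +72 + 4.
members = [k * 12 + 4 for k in range(-5, 7)]
assert all(clock(m + 7) == 11 for m in members)  # the addition examples
assert all(clock(m - 2) == 2 for m in members)   # the subtraction examples

print(clock(11 + 4), clock(11 - 15), clock(-10), clock(-11))  # 3 8 2 1
```

             This relies on Python's definition of %; in C, for example,
             (-10) % 12 is -10, not 2.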


             Our pretend computer doesn't cycle every 12 numbers, it cycles
             every 10,000 numbers - it is a mod 10,000 machine. On our
             machine, the number 6453 has the following equivalence class:

                 (+30000 + 6453)               (+30000 - 3547)
                 (+20000 + 6453)               (+20000 - 3547)
                 (+10000 + 6453)               (+10000 - 3547)
                 (     0 + 6453)               (     0 - 3547)
                 (-10000 + 6453)               (-10000 - 3547)
                 (-20000 + 6453)               (-20000 - 3547)
                 (-30000 + 6453)               (-30000 - 3547)
             =================================================================
                8. (-10) mod 12 = 2 ;   (-11) mod 12 = 1
             =================================================================

             Any one of these will act the same as any other one. Notice that
             10000 - 3547 is the subtraction that we did to get the
             representation of -3547 on the machine. 

             -3547    = 9999 + 1
                       -3547
                        6452 + 1 = 6453

             6453 and -3547 act EXACTLY the same on this machine. What this
             means is that there is no difference in adding signed or unsigned
             numbers on the machine. The result will be correct if interpreted
             as an unsigned number; it will also be correct if interpreted as
             a signed number.

                 6821 + 3179 = 10000  so  -3179 = 6821   and  3179 = -6821
                 5429 + 4571 = 10000  so  -4571 = 5429   and  4571 = -5429

             Since -3179 and 6821 act the same on our machine and since -4571
             and 5429 act the same, let's do some addition. Take your time so
             you understand why the signed and unsigned numbers are giving the
             same results mod 10000:
             ================================================================= 
                  6821 + 497 = 7318
                 -3179 + 497 = (10000 - 3179) + 497 = 10000 -2682  = -2682

                  7318 + 2682 = 10000      so    -2682 = 7318
             ==================================================================
                  5429 + 876 = 6305
                 -4571 + 876 = (10000 - 4571) + 876 = 10000 - 3695 = -3695

                  6305 + 3695 = 10000      so    -3695 = 6305
             ==================================================================
             Here's some subtraction:
             
                  6821 - 507 = 6314
                 -3179 - 507 = (10000 - 3179) - 507 = 10000 - 3686 = -3686
                  6314 + 3686 = 10000     so     -3686 = 6314
                  5429 - 178 = 5251
                 -4571 - 178 = (10000 - 4571) - 178 = 10000 - 4749 = -4749
                  5251 + 4749 = 10000    so      -4749 = 5251

             It is the same addition or subtraction. Interpreted one way it is
             signed addition or subtraction; interpreted another way it is
             unsigned addition or subtraction.
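             To make "one operation, two readings" concrete, here is a small
             Python sketch under the pretend machine's assumptions (4 digits,
             0000-4999 positive, 5000-9999 negative). All helper names are
             hypothetical. Note that add and sub never look at signs; only
             as_signed decides how to read the result.

```python
MOD = 10_000  # pretend 4 digit machine


def to_machine(value: int) -> int:
    """Store a signed value from -5000 to +4999 in 4 digit form."""
    return value % MOD


def as_signed(word: int) -> int:
    """Read a 4 digit word as signed: 5000-9999 stand for -5000 to -1."""
    return word - MOD if word >= 5000 else word


def add(a: int, b: int) -> int:
    """The machine's one ADD; it never needs to know about signs."""
    return (a + b) % MOD


def sub(a: int, b: int) -> int:
    """The machine's one SUB."""
    return (a - b) % MOD


r = add(to_machine(-3179), 497)
print(r, as_signed(r))  # 7318 -2682
r = sub(to_machine(-4571), 178)
print(r, as_signed(r))  # 5251 -4749
```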

             The machine could have one operation for signed addition and
             another operation for unsigned addition, but this would be a
             waste of computer resources. These operations are exactly the
             same. This machine, like all computers, has only one integer
             addition operation and one integer subtraction operation. For
             each operation, it sets the flags of importance for both signed
             and unsigned arithmetic.

             For unsigned addition and subtraction, CF, the carry flag,
             tells whether the 0000/9999 boundary has been crossed.

             For signed addition and subtraction, SF, the sign flag, tells
             the sign of the result and OF, the overflow flag, tells whether
             the result was too negative or too positive.


             SIGN EXTENSION

             Although our base 10 machine is set up for 4 digit numbers, it is
             possible to use it for numbers of any size by writing the
             appropriate software. We'll use 12 digit numbers as an example,
             though they could be of any length. The first problem is
             converting 4 digit numbers into 12 digit numbers. If the number
             is an unsigned number, this is no problem (we'll write the number
             in groups of 4 digits to keep it readable):

                 4816      ->   0000 0000 4816
                 9842      ->   0000 0000 9842
                  127      ->   0000 0000 0127

             What if it is a signed number? The first thing we need to know
             about signed numbers is, what is positive and what is negative?
             Once again, for reasons of symmetry, we choose positive to be 
             0000 0000 0000  to  4999 9999 9999 and negative to be 5000 0000
             0000 to 9999 9999 9999.{9}  This longer number system cycles from
             9999 9999 9999 to 0000 0000 0000. Therefore, for longer numbers,
             0000 0000 0000 = 1 0000 0000 0000. They are equivalent. 
             0000 0000 0000 = 9999 9999 9999 + 1.

             If it is a positive signed number, it is still no problem (recall
             that in our 4 digit system, a positive number is between 0000 and
             4999, a negative signed number is between 5000 and 9999). Here
             are some positive signed numbers and their conversions:

                 1974      ->   0000 0000 1974
                    1      ->   0000 0000 0001
                 3909      ->   0000 0000 3909

             =================================================================
             9. Once again, the sign will be decided by the left hand
             digit. If it is 0-4 it is a positive number; if it is 5-9 it is a
             negative number.
             ==================================================================

             If it is a negative number, where did its representation come
             from in our 4 digit system? -x -> 9999 + 1 -x = 9999 - x + 1.
             This time it won't be 9999 + 1 but 9999 9999 9999 + 1. Let's have
             some examples.

                 4 DIGIT SYSTEM       12 DIGIT SYSTEM

             -1964
                  9999     + 1        9999 9999 9999 + 1
                 -1964                         -1964
                  8035   -> 8036      9999 9999 8035 + 1 -> 9999 9999 8036

             -2867
                  9999     + 1        9999 9999 9999 + 1
                 -2867                         -2867
                  7132   -> 7133      9999 9999 7132 + 1 -> 9999 9999 7133

             -182
                  9999     + 1        9999 9999 9999 + 1
                  -182                          -182
                  9817   -> 9818      9999 9999 9817 + 1 -> 9999 9999 9818

             As you can see, all you need to do to sign extend a negative
             number is to put 9s to the left. 

             Can't those 9s on the left become 0s when we add that 1 at the
             end?  No. In order for that to happen, the right four digits must
             be 9999. But that can only happen if the number to be negated is
             0000:

                  9999 9999 9999 + 1
                           -0000
                  9999 9999 9999 + 1 -> 0000 0000 0000

             In all other cases, adding 1 does not carry anything out of the
             right four digits.
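             The "put 9s to the left" rule can be checked against the
             definition 9999 9999 9999 - x + 1. This Python sketch only
             models the pretend machine; sign_extend_4_to_12 and negate12
             are hypothetical names.

```python
def sign_extend_4_to_12(word: int) -> int:
    """Widen a 4 digit word: pad with 0s if the left digit is 0-4,
    and with 9s if it is 5-9 (a negative number)."""
    if word >= 5000:
        return 9999_9999_0000 + word
    return word


def negate12(x: int) -> int:
    """12 digit negation: 9999 9999 9999 - x + 1, kept to 12 digits."""
    return (9999_9999_9999 - x + 1) % 10**12


print(f"{sign_extend_4_to_12(8036):012d}")  # 999999998036
print(f"{negate12(1964):012d}")             # 999999998036
print(f"{negate12(0):012d}")                # 000000000000
```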


             In general, you cannot truncate one of these 12 digit numbers to
             a 4 digit number and get a reliable result; it works only when
             the value already fits in the 4 digit range (-5000 to +4999).
             Here are two examples where it fails:

             (number)      0000 0168 7451 ->   7451  (now a negative number)
             (actual value)     +168 7451     -2549 

             (number)      9999 9643 2170 ->   2170  (now a positive number)
             (actual value)     -356 7830     +2170


             We now have 12 digit numbers. Is it possible to add them and
             subtract them? Yes, but only 4 digits at a time. When you add with
             pencil and paper you carry left from each digit. The computer can
             carry left from each group of 4 digits. We'll do the following
             addition:

                           0138 6715 6037
                         + 2514 2759 7784

             Do this with pencil and paper and write down all the carries. The
             computer is going to do this in 3 parts:

                 1) 6037 + 7784
                 2) 6715 + 2759 + carry (if any)
                 3) 0138 + 2514 + carry (if any)

             The first addition is our regular addition. It will set the carry
             flag if the 0000/9999 boundary was crossed (i.e. the result was
             larger than 9999). In our case CF = 1 since the result is 13821.
             The register holds 3821. We store 3821. Next, we need to add
             three things: 6715 + 2759 + CF (=1). There is an instruction like
             this on all computers. It adds two numbers plus the value of the
             carry flag. Our first addition was ADD (add two numbers). This
             time the machine instruction is ADC (add two numbers and the
             carry). The result of our second addition is 9475. The register
             holds 9475 and CF = 0. We store 9475. Finally, we need to add
             three more things: 0138 + 2514 + CF (=0). Once again we use ADC.
             The result is 2652, CF = 0. We store the 2652. That is the whole
             result:

                 2652 9475 3821

             If CF = 1 at this point, the number has crossed the
             9999,9999,9999/0000,0000,0000 boundary. This will work for signed
             numbers also. The only difference is that at the very end we
             don't check CF, we check OF to see if the
             4999,9999,9999/5000,0000,0000 boundary has been crossed. 
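             A minimal Python sketch of this ADD/ADC chain, assuming a number
             is stored as a list of 4 digit groups with the most significant
             group first (a layout chosen here for readability). The names
             adc and add_multi are hypothetical; the first step is written as
             ADC with CF = 0, which is the same as a plain ADD.

```python
MOD = 10_000  # pretend 4 digit machine


def adc(a: int, b: int, cf: int) -> tuple[int, int]:
    """Add with carry: a + b + cf, returning (4 digit result, new CF)."""
    total = a + b + cf
    return total % MOD, total // MOD


def add_multi(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """Add two numbers group by group, rightmost group first,
    feeding each carry into the next group to the left."""
    result = [0] * len(x)
    cf = 0
    for i in reversed(range(len(x))):
        result[i], cf = adc(x[i], y[i], cf)
    return result, cf


print(add_multi([138, 6715, 6037], [2514, 2759, 7784]))
# ([2652, 9475, 3821], 0)
```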


             Just to give you one more example we'll do a subtraction using
             the same numbers:

                           0138 6715 6037
                         - 2514 2759 7784

             Notice that in order for you to do this with pencil and paper
             you'll have to put the larger number on top before you subtract.
             With the machine this is unnecessary. Go ahead and do the
             subtraction with pencil and paper.

             The machine can do this 4 digits at a time, so this is a three
             step process:

                 1) 6037 - 7784 
                 2) 6715 - 2759 - borrow (if any)
                 3) 0138 - 2514 - borrow (if any)

             The first one is a regular subtraction and since the bottom
             number is larger, the result is 8253, CF = 1. (Perhaps you are
             puzzled because that's not the result that you got. Don't worry,
             it all comes out in the wash). Step two subtracts but also
             subtracts any borrow (We had a borrow because CF = 1). There is a
             special instruction called SBB (subtract with borrow) that does
             just that. 6715 - 2759 - 1 = 3955, CF = 0. We store the 3955 and
             go on to the third part. This also is SBB, but since we had no
             borrow, we have 0138 - 2514 - 0 = 7624, CF = 1. We store 7624.
             This is the end result, and since CF = 1, we have crossed the
             9999,9999,9999/0000,0000,0000 boundary. This is going to be the
             representation of a negative number mod 1,0000,0000,0000. With
             pencil and paper, your result was:

                 -2375 6044 1747

             The machine result was:

                  7624 3955 8253

             But CF was 1 at the end, so this represents a negative number.
             What number does it represent? Let's take its negative to get a
             positive number with the same absolute value:

                 9999 9999 9999  + 1
                 7624 3955 8253
                 2375 6044 1746  + 1  = 2375 6044 1747

             This is the same thing you got with pencil and paper. The reason
             it looked weird is that a negative number is always stored as its
             modular equivalent. If you want to read a negative number, you
             need to take its negative to get a positive number with the same
             absolute value.

             If we had been working with signed numbers, we wouldn't have
             checked CF at the very end, we would have checked OF to see if
             the 4999,9999,9999/5000,0000,0000 boundary had been crossed. If
             OF = 1 at the end, then the result was either too negative or too
             positive.
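             The subtraction works the same way with SUB and SBB. A Python
             sketch under the same assumptions as before (lists of 4 digit
             groups, most significant first); sbb, sub_multi and negate_multi
             are hypothetical names. negate_multi reads a negative result back
             by taking its negative, as done by hand above.

```python
MOD = 10_000  # pretend 4 digit machine


def sbb(a: int, b: int, borrow: int) -> tuple[int, int]:
    """Subtract with borrow: a - b - borrow, returning (4 digit result, CF)."""
    diff = a - b - borrow
    return diff % MOD, 1 if diff < 0 else 0


def sub_multi(x: list[int], y: list[int]) -> tuple[list[int], int]:
    """Subtract group by group, rightmost first. The first step is a plain
    SUB (SBB with no borrow); every later step is SBB."""
    result = [0] * len(x)
    cf = 0
    for i in reversed(range(len(x))):
        result[i], cf = sbb(x[i], y[i], cf)
    return result, cf


def negate_multi(groups: list[int]) -> list[int]:
    """Negative of a multi-group number: all 9s minus it, plus 1."""
    n = len(groups)
    value = int("".join(f"{g:04d}" for g in groups))
    neg = (10 ** (4 * n) - value) % 10 ** (4 * n)
    return [neg // MOD ** (n - 1 - i) % MOD for i in range(n)]


diff, cf = sub_multi([138, 6715, 6037], [2514, 2759, 7784])
print(diff, cf)            # [7624, 3955, 8253] 1
print(negate_multi(diff))  # [2375, 6044, 1747]
```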


             OVERFLOW

             How does the machine decide that overflow has occurred? First,
             what exactly is overflow and when is it possible for overflow to
             occur?

             Overflow is when the result of a signed addition or subtraction
             is either larger than the largest positive number or more
             negative than the most negative number. In the case of the 4
             digit machine, larger than +4999 or more negative than -5000.

             If one number is negative and the other is positive, overflow
             cannot occur in addition. Take +32 and -4791 as examples.
             If we start with the positive number (+32) and add the negative
             number (-4791), the result can't possibly be too positive.
             Similarly, if we start with the negative number (-4791) and add
             the positive number (+32), the result can't be too negative.
             Therefore, the result can be neither too positive nor too
             negative. Make sure you understand this before going on. 

             What if both are positive? Then overflow is possible. Here are
             some examples:

                 (+3500) + (+4500) = 8000 = -2000
                 (+2872) + (+2872) = 5744 = -4256
                 (+1799) + (+4157) = 5956 = -4044

             In each case, two positive numbers give a negative result. How
             about two negative numbers?

                                (7154) + (6000) = 3154 = +3154
             (actual value)     -2846    -4000

                                (5387) + (5826) = 1213 = +1213
             (actual value)     -4613    -4174

                                (8053) + (6191) = 4244 = +4244
             (actual value)     -1947    -3809

             The numbers underneath are the negative numbers that the numbers
             above them represent. In these cases, adding two negative numbers
             gives a positive result.

             This is what the machine checks for. Before the addition, it
             checks the signs of the numbers. If the signs are the same, then
             the result must also be the same sign or overflow has
             occurred.{10}  Thus + and + must have a + result; - and - must
             have a - result. If not, OF (the overflow flag) is set (OF = 1).
             Otherwise OF is cleared (OF = 0).
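             The rule can be written as a short check. This Python sketch of
             the pretend machine covers addition only; add_flags and
             is_negative are hypothetical names, and the function returns the
             result together with CF and OF so both readings are available.

```python
MOD = 10_000  # pretend 4 digit machine


def is_negative(word: int) -> bool:
    """Left digit 5-9 means negative."""
    return word >= 5000


def add_flags(a: int, b: int) -> tuple[int, int, int]:
    """Return (result, CF, OF) for a 4 digit ADD.

    OF = 1 only when both operands have the same sign and the result
    has the other sign; operands of opposite sign never overflow.
    """
    total = a + b
    result = total % MOD
    cf = 1 if total >= MOD else 0
    same_sign = is_negative(a) == is_negative(b)
    of = 1 if same_sign and is_negative(result) != is_negative(a) else 0
    return result, cf, of


print(add_flags(3500, 4500))  # (8000, 0, 1): +3500 + +4500 looks like -2000
print(add_flags(7154, 6000))  # (3154, 1, 1): -2846 + -4000 looks like +3154
print(add_flags(32, 5209))    # (5241, 0, 0): +32 + -4791, no overflow
```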
            

             MULTIPLICATION

             Unsigned multiplication is easy. The machine simply multiplies
             the two numbers. Since the result can be up to 8 digits (the
             maximum result is 9999 X 9999 = 9998 0001) the machine uses two
             registers to hold the result. We'll call them R1 and R2.

                 5436 X 174     R1   0094
                                R2   5864

                 2641 X 2003    R1   0528
                                R2   9923

             You need to know which register holds which half of the result,
             but besides that, everything is straightforward. On this machine
             R1 holds the left four digits and R2 holds the right four digits.

             Notice that our machine has changed the modular base from N to
             N*N (from 1 0000 to 1 0000 0000). What this means is that two
             things which are modularly equivalent under addition and
             subtraction are not necessarily equivalent under multiplication
             and division.  6281 and -3719 will not work the same.
             
             The machine can't do signed multiplication directly. What it does
             is convert the numbers to positive numbers (if necessary),
             perform unsigned multiplication, and then do sign adjustment of
             the results (if necessary). It uses 2 registers for the result.

                           SIGNED MULTIPLICATION      REGS         RESULT

             (number)           (5372) X (3195)     R1   8521  =  -1478 6460
             (actual value)     -4628  X +3195      R2   3540

             (number)           (9164) X (8746)     R1   0104  =   +104 8344
             (actual value)      -836  X -1254      R2   8344

             (number)           (9927) X (0013)     R1   9999  =        -949
             (actual value)      -73  X   +13       R2   9051

             Looking at the last example, if we performed unsigned
             multiplication on those two numbers, we would have
             9927 X 0013 = 0012 9051, a completely different answer from the
             one we got. Therefore, whenever you do multiplication, you have
             to tell the machine whether you want unsigned or signed
             multiplication.
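             A Python sketch of the two multiplications on the pretend
             machine, following the method the text describes (make the
             operands positive, multiply unsigned, fix the sign). The names
             mul_unsigned and mul_signed are hypothetical; both return
             (R1, R2), the left and right four digits.

```python
MOD = 10_000  # pretend 4 digit machine


def as_signed(word: int) -> int:
    """0000-4999 are positive, 5000-9999 are negative."""
    return word - MOD if word >= 5000 else word


def mul_unsigned(a: int, b: int) -> tuple[int, int]:
    """Unsigned multiply; the product can need all 8 digits."""
    product = a * b
    return product // MOD, product % MOD


def mul_signed(a: int, b: int) -> tuple[int, int]:
    """Multiply the absolute values, then negate the 8 digit product
    (mod 1 0000 0000) if exactly one operand was negative."""
    x, y = as_signed(a), as_signed(b)
    product = abs(x) * abs(y)
    if (x < 0) != (y < 0):
        product = (MOD * MOD - product) % (MOD * MOD)
    return product // MOD, product % MOD


print(mul_unsigned(9927, 13))  # (12, 9051)
print(mul_signed(9927, 13))    # (9999, 9051): -73 x +13 = -949
print(mul_signed(5372, 3195))  # (8521, 3540): -1478 6460
```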


             DIVISION

             Unsigned division is easy too. The machine divides one number by
             the other, puts the quotient in one register and the remainder in
             another. Once again, the only problem is remembering which
             register has the quotient and which register has the remainder.
             For us, the quotient is R1 and the remainder is R2.

                 6190 / 372          R1   0016           16  remainder 238
                                     R2   0238

                 9845 / 11           R1   0895           895 remainder 0
                                     R2   0000

             As with multiplication, signed division is handled by the machine
             changing all numbers to positive numbers, performing unsigned
             division, then putting back the appropriate signs.


                         SIGNED DIVISION         REGS            RESULT

             (number)      (7192) / (9164)     R1   0003      +3  rem. -300
             (actual value)-2808  /  -836      R2   9700

             (number)      (3753) / (9115)     R1   9996      -4  rem. +213
             (actual value)+3753  /  -885      R2   0213

             Looking at the last example, 3753 / 9115, if that were unsigned
             division, the answer would be 0 remainder 3753, a completely
             different answer from the signed division. Every time you do a
             division, you have to state whether you want unsigned or signed
             division.
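             A matching Python sketch for signed division on the pretend
             machine. The name div_signed is hypothetical. It gives the
             quotient a minus sign when the operand signs differ and gives
             the remainder the sign of the dividend, which is what the two
             examples above show; division by zero is not handled.

```python
MOD = 10_000  # pretend 4 digit machine


def as_signed(word: int) -> int:
    """0000-4999 are positive, 5000-9999 are negative."""
    return word - MOD if word >= 5000 else word


def div_signed(a: int, b: int) -> tuple[int, int]:
    """Return (R1, R2) = (quotient, remainder) in 4 digit form."""
    x, y = as_signed(a), as_signed(b)
    q, r = divmod(abs(x), abs(y))  # unsigned division of the sizes
    if (x < 0) != (y < 0):
        q = -q                     # signs differ: negative quotient
    if x < 0:
        r = -r                     # remainder takes the dividend's sign
    return q % MOD, r % MOD


print(div_signed(7192, 9164))  # (3, 9700): +3 rem. -300
print(div_signed(3753, 9115))  # (9996, 213): -4 rem. +213
print(divmod(3753, 9115))      # (0, 3753): the unsigned answer
```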


             BASES 2 AND 16

             I'm making the assumption that if you are along for the ride you
             already know something about binary and hex numbers. This is a
             review only.


             BASE 2 AND BASE 16

             Base 2 (binary) allows only 0s and 1s. Base 16 (hexadecimal)
             allows 0 - 9, and then makes up the next six numbers by using the
             letters A - F. A = 10, B=11, C=12, D=13, E=14 and F=15. You can
             directly translate a hex number to a binary number and a binary
             number to a hex number. A group of four digits in binary is the
             same as a single digit in hex. We'll get to that in a moment.

             The place values of the binary digits (BITS) are the powers of 2.
             In increasing order they are 1, 2, 4, 8, 16, 32, 64, 128, 256
             and so on. 1 + 2 + 4 + 8 = 15, so a group of four binary digits
             can represent exactly one hex digit (0 - F). This repeats itself
             every four binary digits. Here are some numbers in binary, hex,
             and decimal:

                 BINARY         HEX      DECIMAL

                 0100            4          4
                 1111            F         15
                 1010            A         10 
                 0011            3          3

             Let's go from binary to hex. Here's a binary number.

                 0110011010101101

             To go from binary to hex, first divide the binary number up into
             groups of four starting from the right.

                 0110 0110 1010 1101

             Now simply change each group into a hex number. 

                 0110 ->   4 + 2     ->   6
                 0110 ->   4 + 2     ->   6
                 1010 ->   8 + 2     ->   A
                 1101 ->   8 + 4 + 1 ->   D

             and we have 66AD as the result. Similarly, to go from hex to
             binary:

                 D39F

             change each hex digit into a set of four binary digits:

                 D = 13    ->   8 + 4 + 1 ->   1101
                 3         ->   2 + 1     ->   0011
                 9         ->   8 + 1     ->   1001
                 F = 15    ->   8+4+2+1   ->   1111

             and then put them all together:

                 1101001110011111 
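             Both conversions are purely mechanical, four bits per hex digit.
             A small Python sketch (bin_to_hex and hex_to_bin are
             hypothetical names) that follows the steps above, including
             grouping from the right:

```python
def bin_to_hex(bits: str) -> str:
    """Group bits in fours from the right; each group is one hex digit."""
    bits = bits.replace(" ", "")
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # pad the leftmost group
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join("0123456789ABCDEF"[int(g, 2)] for g in groups)


def hex_to_bin(digits: str) -> str:
    """Replace each hex digit with its four bits, separated for readability."""
    return " ".join(format(int(d, 16), "04b") for d in digits)


print(bin_to_hex("0110011010101101"))  # 66AD
print(hex_to_bin("D39F"))              # 1101 0011 1001 1111
```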

             Of course, having 16 digits strung out like that makes it totally
             unreadable, so in this book, if we are talking about a binary
             number, it will always be separated every 4 digits for
             clarity.{1} 

             All computers operate on binary data, so why do we use hex
             numbers? Take a test. Copy these two binary numbers:

                 1011 1000 0110 1010 1001 0101 0111 1010
                 0111 1100 0100 1100 0101 0110 1111 0011

             Now copy these two hex numbers:

                 B86A957A
                 7C4C56F3

             As you can see, you recognize hex numbers faster and you make
             fewer mistakes in transcription with hex numbers. 


             ADDITION AND SUBTRACTION

             The rules for binary addition are easy:

                 0 + 0 = 0
                 0 + 1 = 1
                 1 + 0 = 1
                 1 + 1 = 0  (carry 1 to the next digit left)

             similarly for binary subtraction:

                 0 - 0 = 0
                 0 - 1 = 1  (borrow 1 from the next digit left)
                 1 - 0 = 1
                 1 - 1 = 0
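             Applying the addition rules right to left, with the carry moved
             one digit left each time, is all a binary adder does. A Python
             sketch working on bit strings (add_bits is a hypothetical name);
             the final carry is returned as an extra leading bit:

```python
def add_bits(a: str, b: str) -> str:
    """Add two equal-length bit strings using the four rules above."""
    result = []
    carry = 0
    for x, y in zip(reversed(a), reversed(b)):  # rightmost digit first
        s = int(x) + int(y) + carry
        result.append(str(s % 2))   # digit that stays in this position
        carry = s // 2              # 1 is carried to the next digit left
    return str(carry) + "".join(reversed(result))


print(add_bits("0110", "0011"))  # 01001  (6 + 3 = 9)
print(add_bits("1111", "0001"))  # 10000  (15 + 1 = 16)
```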

             On the 8086, you can have a 16 bit (binary digit) number
             represent a number from 0 - 65535. 65535 + 1 = 0 (65536). For
             16 bit numbers, the boundary is 65535/0. You count up or down
             through that boundary. The 8086 is a mod 65536 machine. That
             means the things that are equivalent to 35631 mod 65536 are:{2}

             ================================================================
             1. This will not be true of the actual assembler code, since
             the assembler demands an unseparated number.

             2. 35631 + 29905 = 65536.  -29905 = 35631 (mod 65536)
             ================================================================

                 (3*65536 + 35631)        (3*65536 - 29905)
                 (2*65536 + 35631)        (2*65536 - 29905)
                 (1*65536 + 35631)        (1*65536 - 29905)
                 (      0 + 35631)        (      0 - 29905)
                 (-1*65536 + 35631)       (-1*65536 - 29905)
                 (-2*65536 + 35631)       (-2*65536 - 29905)
                 (-3*65536 + 35631)       (-3*65536 - 29905)

             The unsigned number 35631 and the signed number -29905 look the
             same. They ARE the same. In addition and subtraction, they will
             operate in the same fashion. The unsigned number will use CF (the
             carry flag) and the signed number will use OF (the overflow flag).

             On the 8086, signed 16 bit numbers from 0 - 32767 are positive
             and 32768 - 65535 are negative. Here are 32767 and 32768:

                 32767     0111 1111 1111 1111
                 32768     1000 0000 0000 0000

             32768 and all numbers above it have the left bit 1. 32767 and all
             numbers below it have the left bit 0. This is how to tell the
             sign of a signed number. If the left bit is 0 it's positive and
             if the left bit is 1 it's negative.
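             In code, testing the left bit is a single AND with 8000h. A
             Python sketch (as_signed16 is a hypothetical name) that reads a
             16 bit word as a signed number:

```python
def as_signed16(word: int) -> int:
    """Read a 16 bit word as signed: if the left bit is 1, it's negative."""
    return word - 65536 if word & 0x8000 else word


print(as_signed16(32767), as_signed16(32768), as_signed16(35631))
# 32767 -32768 -29905
```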


             TWO'S COMPLEMENT

             In base 10 we had 10's complement to help us with negative
             numbers. In base 2, we have 2's complement.

                 0 = 65536 = 65535 + 1

             so we have:

                 1 0000 0000 0000 0000 =  1111 1111 1111 1111 + 1

             To get the negative of a number, we subtract:

                 -49 = 0 - 49 = 65536 - 49 = 65535 - 49 + 1

             (65536)  1111 1111 1111 1111 + 1
                (49)  0000 0000 0011 0001
             result   1111 1111 1100 1110 + 1 -> 1111 1111 1100 1111  (-49)
             ; - - - - -

             -21847
             (65536)  1111 1111 1111 1111 + 1
             (21847)  0101 0101 0101 0111
             result   1010 1010 1010 1000 + 1 -> 1010 1010 1010 1001 (-21847)
             ; - - - - -

             -11628
             (65536)  1111 1111 1111 1111 + 1
             (11628)  0010 1101 0110 1100
             result   1101 0010 1001 0011 + 1 -> 1101 0010 1001 0100 (-11628)
             ; - - - - -

             -1764
             (65536)  1111 1111 1111 1111 + 1
              (1764)  0000 0110 1110 0100 
             result   1111 1001 0001 1011 + 1 -> 1111 1001 0001 1100 (-1764)
             ; - - - - -

             Notice that since:

                 1 - 0 = 1
                 1 - 1 = 0

             when you subtract from 1, you are simply switching the value of
             the subtrahend (that's the number that you subtract).

                 1    ->   0
                 0    ->   1

             1 becomes 0 and 0 becomes 1. You don't even have to think about
             it. Just switch the 1s to 0s and switch the 0s to 1s, and then
             add 1 at the end. We'll do one more:

             -348
             (65536) 1111 1111 1111 1111 + 1
              (348)  0000 0001 0101 1100
             result  1111 1110 1010 0011 + 1 ->  1111 1110 1010 0100 (-348)

             Now two more, this time without the crutch of having the top
             number visible. Remember, even though you are subtracting, all
             you really need to do is switch 1s to 0s and switch 0s to 1s, and
             then add 1 at the end.

             -658

              (658)  0000 0010 1001 0010
             result  1111 1101 0110 1101 + 1 -> 1111 1101 0110 1110 (-658)
             ; - - - - -

             -31303

             (31303) 0111 1010 0100 0111
             result  1000 0101 1011 1000 + 1 -> 1000 0101 1011 1001 (-31303)
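             "Switch the 1s and 0s, then add 1" is the same as an exclusive
             OR with FFFFh followed by adding 1, keeping only 16 bits. A
             Python sketch of that shortcut (negate16 and show are
             hypothetical helpers):

```python
def negate16(word: int) -> int:
    """Two's complement: switch every bit (XOR with FFFFh), then add 1.
    The final & FFFFh keeps 16 bits, so negating 0 gives 0."""
    return ((word ^ 0xFFFF) + 1) & 0xFFFF


def show(word: int) -> str:
    """Format a 16 bit word as four groups of four bits."""
    bits = format(word, "016b")
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


print(show(negate16(49)))    # 1111 1111 1100 1111
print(show(negate16(1764)))  # 1111 1001 0001 1100
print(show(negate16(658)))   # 1111 1101 0110 1110
```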

             SIGN EXTENSION

             If you want to use larger numbers, it is possible to use multiple
             words to represent them.{3}  The arithmetic will be done 16 bits
             at a time, but by using the method described in Chapter 0.1, it
             is possible to add and subtract numbers of any length. One common
             length is 32 bits. How do you convert a 16 bit number to a 32 bit
             number? If it is unsigned, simply put 0s to the left:

               0100 1100 1010 0111 ->  0000 0000 0000 0000 0100 1100 1010 0111

             What if it is a signed number? The first thing we need to know
             about signed numbers is what is positive and what is negative.
             Once again, for reasons of symmetry, we choose positive to be

                 from 0000 0000 0000 0000 0000 0000 0000 0000
                 to   0111 1111 1111 1111 1111 1111 1111 1111
                 (hex 00000000 to 7FFFFFFF)

             and we choose negative to be

                 from 1000 0000 0000 0000 0000 0000 0000 0000
                 to   1111 1111 1111 1111 1111 1111 1111 1111
                 (hex 80000000 to FFFFFFFF).{4}

             This longer number system cycles

                 from 1111 1111 1111 1111 1111 1111 1111 1111
                 to   0000 0000 0000 0000 0000 0000 0000 0000
                 (hex FFFFFFFF to 00000000).

             Notice that by using binary numbers we are inundating ourselves
             with 1s and 0s.

             If it is a positive signed number, it is still no problem (recall
             that in our 16 bit system, a positive number is between 0000 0000
             0000 0000 and 0111 1111 1111 1111, and a negative signed number
             is between 1000 0000 0000 0000 and 1111 1111 1111 1111). Just put
             0s to the left. Here are some positive signed numbers and their
             conversions:

                (1974)
                0000 0111 1011 0110 -> 0000 0000 0000 0000 0000 0111 1011 0110 
                (1)
                0000 0000 0000 0001 -> 0000 0000 0000 0000 0000 0000 0000 0001
                (3909)
                0000 1111 0100 0101 -> 0000 0000 0000 0000 0000 1111 0100 0101

             If it is a negative number, where did its representation come
             from in our 16 bit system? We wrote -x as
             (1111 1111 1111 1111 - x) + 1, that is, FFFFh - x + 1. In the 32
             bit system it is FFFFFFFFh - x + 1 instead. Let's have some
             examples. (Here we have 8 bits to the group because there is not
             enough space on the line to accommodate 4 bits to the group.)


               16 BIT SYSTEM                  32 BIT SYSTEM

              -1964
             11111111 11111111 + 1     11111111 11111111 11111111 11111111 + 1
            -00000111 10101100        -00000000 00000000 00000111 10101100

             11111000 01010011 + 1     11111111 11111111 11111000 01010011 + 1

             11111000 01010100         11111111 11111111 11111000 01010100  

             =================================================================
             4. Once again, the sign will be decided by the left hand
             digit. If it is 0 it is a positive number; if it is 1 it is a
             negative number.
             =================================================================

             -2867
             11111111 11111111 + 1     11111111 11111111 11111111 11111111 + 1
            -00001011 00110011        -00000000 00000000 00001011 00110011

             11110100 11001100 + 1     11111111 11111111 11110100 11001100 + 1

             11110100 11001101         11111111 11111111 11110100 11001101

             -182
             11111111 11111111 + 1     11111111 11111111 11111111 11111111 + 1
            -00000000 10110110        -00000000 00000000 00000000 10110110

             11111111 01001001 + 1     11111111 11111111 11111111 01001001 + 1

             11111111 01001010         11111111 11111111 11111111 01001010

             As you can see, all you need to do to sign extend a negative
             number is to put 1s to the left. 
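             The negative-number examples above can be checked mechanically.
             This is a minimal Python sketch. Python serves only as a
             calculator here, and `negate` and `sign_extend_16_to_32` are
             names made up for the illustration. `negate` forms
             (all 1s - x) + 1 for a given width. `sign_extend_16_to_32` copies
             the left-hand bit of a word into the new left 16 bits.

```python
def negate(x, bits):
    """Two's complement negation: (all 1s - x) + 1, kept to `bits` bits."""
    all_ones = (1 << bits) - 1          # FFFFh or FFFFFFFFh
    return (all_ones - x + 1) & all_ones

def sign_extend_16_to_32(word):
    """Copy the left-hand bit of a 16 bit word into the new left 16 bits."""
    word &= 0xFFFF
    if word & 0x8000:                   # left-hand bit 1: negative, put 1s
        return word | 0xFFFF0000
    return word                         # left-hand bit 0: positive, put 0s

for value in (1964, 2867, 182):
    w16 = negate(value, 16)
    w32 = negate(value, 32)
    assert sign_extend_16_to_32(w16) == w32
    print(f"-{value}: {w16:016b} -> {w32:032b}")
```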

             Can't those 1s on the left become 0s when we add that 1 at the
             end?  No. In order for that to happen, the right 16 bits must be
             1111 1111 1111 1111. But that can only happen if the number to be
             negated is 0:

                  1111 1111 1111 1111 1111 1111 1111 1111 + 1
                                     -0000 0000 0000 0000
                  1111 1111 1111 1111 1111 1111 1111 1111 + 1 -> 

                                     0000 0000 0000 0000 0000 0000 0000 0000

             In all other cases, adding 1 does not carry anything out of the
             right 16 bits.
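             The claim that only negating 0 carries out of the right 16 bits
             can be checked by brute force. This is a small Python sketch that
             tries every 16 bit value of x (Python is only a calculator here):

```python
# Negating x is (FFFFh - x) + 1.  Check every 16 bit x to see when the
# final "+ 1" carries out of the right 16 bits into the sign-extended 1s.
carry_cases = [x for x in range(0x10000) if (0xFFFF - x) + 1 > 0xFFFF]
print(carry_cases)      # [0]: only negating 0 carries out
```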

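             In general, you cannot truncate one of these 32 bit numbers to a
             16 bit number without making the result unreliable. Truncating
             is safe only when the number is in the 16 bit signed range
             (-32768 to +32767), that is, when the left 16 bits are nothing
             but the sign extension of the right 16 bits. Here are two
             examples where this is not the case: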

             +1,687,451     
             00000000 00011001 10111111 10011011 -> 10111111 10011011 (-16485) 

             -3,524,830            
             11111111 11001010 00110111 00100010 -> 00110111 00100010 (+14114)

             Truncating has changed both the sign and the absolute value of
             the number.
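             The two truncation examples can be reproduced in Python, used
             here only as a calculator. The helper names are hypothetical.
             `truncate_32_to_16` keeps the right 16 bits and reads them as a
             signed word. `fits_in_16` tests whether that is safe. -182 is
             included as a value that does fit.

```python
def to_signed(value, bits):
    """Read a `bits`-wide bit pattern as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value

def truncate_32_to_16(n):
    """Keep only the right 16 bits of a signed number."""
    return to_signed(n & 0xFFFF, 16)

def fits_in_16(n):
    # Truncating is safe only inside -32768..+32767, where the left 16 bits
    # are just the sign extension of the right 16 bits.
    return -32768 <= n <= 32767

for n in (1_687_451, -3_524_830, -182):
    print(n, "->", truncate_32_to_16(n), "safe" if fits_in_16(n) else "unreliable")
```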




ADDRESSING MODES AND POINTERS
                  
             In this section we are going to cover all possible ways of
             getting data to and from memory with the different addressing
             modes. Read this carefully, since it is likely this is the only
             time you will ever see ALL addressing possibilities covered. 

             The easiest way to move data is if the data has a name and the
             data is one or two bytes long. Take the following data:

             ; -----
             variable1 dw  2000
             variable2 db  -26
             variable3 dw  -589
             ; -----

             We can write:

                 mov  variable1, ax
                 mov  cl, variable2
                 mov  si, variable3

             and the assembler will write the appropriate machine code for
             moving the data. What can we do if the data is more than two
             bytes long? Here is some more data:

             ; -----
             variable4 db  "This is a string of ascii data."
             variable5 dd  -291578
             variable6 dw  600 dup (-11000)
             ; -----

             Variable4 is the address of the first byte of a string of ascii
             data. Variable5 is a single piece of data, but it won't fit into
             an 8086 register since it is 4 bytes long. Variable6 is a 600
             element long array, with each element having the value -11000. In
             order to deal with these, we need pointers.

             Some of you will be flummoxed at this point, while those who are
             used to the C language will feel right at home. A pointer is
             simply the address of a variable. We use one of the 8086
             registers to hold the address of a variable, and then tell the
             8086 that the register contains the address of the variable, not
             the variable itself. It "points" to a place in memory to send the
             data to or retrieve the data from. If this seems a little
             confusing, don't worry; you'll get the hang of it quickly. 

             As I have said before, the 8086 does not have general purpose
             registers. Many instructions (such as LOOP, MUL, IDIV, ROL) work
             only with specific registers. The same is true of pointers. You
             may use only BX, SI, DI, and BP as pointers. The assembler will
             give you an error if you try using a different register as a
             pointer.

             There are two ways to put an address in a pointer. For variable4,
             we could write either:

                 lea  si, variable4

             or:

                 mov  si, offset variable4

             Both instructions will put the offset address of variable4 in
             SI.{1} SI now 'points' to the first byte (the letter 'T') of
             variable4. If we wanted to move the third byte of that array
             (the letter 'i') to CL, how would we do it? First, we need to
             have SI point to the third byte, not the first. That's easy:

                 add  si, 2

             But if we now write:

                 mov  cl, si

             we will generate an assembler error because the assembler will
             think that we want to move the data in SI (a two byte number) to
             CL (one byte). How do we tell the assembler that we are using SI
             as a pointer? By enclosing SI in square brackets:

                 mov  cl, [si]

             Since CL is one byte, the assembler assumes you want to move one
             byte. If you write:

                 mov  cx, [si]

             then the assembler assumes that you want to move a word (two
             bytes). The whole thing now is:

                 lea  si, variable4
                 add  si, 2
                 mov  cl, [si]

             This puts the third byte of the string in CL. Remember, if a
             register is in square brackets, then it is holding the ADDRESS of
             a variable, and the 8086 will use the register to calculate where
             the data is in memory.
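             If the idea of a pointer is still fuzzy, it can help to model
             memory as a row of numbered bytes. In this Python sketch a
             bytearray plays the part of memory and `si` is simply an index
             into it. The offset 0 for variable4 is an assumption made for the
             illustration; in a real program the assembler and linker decide
             it.

```python
memory = bytearray(64)                 # a tiny stand-in for memory
VARIABLE4 = 0                          # hypothetical offset of variable4
text = b"This is a string of ascii data."
memory[VARIABLE4:VARIABLE4 + len(text)] = text

si = VARIABLE4          # lea  si, variable4   SI holds an ADDRESS
si += 2                 # add  si, 2           point at the third byte
cl = memory[si]         # mov  cl, [si]        fetch the byte SI points to
print(chr(cl))          # i
```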

             What if we want to put 0s in all the elements of variable6?
            =================================================================
             1 LEA stands for load effective address. Note that with LEA,
             we use only the name of the variable, while with:

                 mov  si, offset variable4

             we need to use the word 'offset'. The exact difference between
             the two will be explained later.
             ===============================================================

             Here's the code:

                      mov  bx, offset variable6
                      mov  ax, 0
                      mov  cx, 600
                 zero_loop:
                      mov  [bx], ax
                      add  bx, 2
                      loop zero_loop

             We add 2 to BX each time since each element of variable6 is a
             word (two bytes) long. There is another way of writing this:

                      mov  bx, offset variable6
                      mov  cx, 600
                 zero_loop:
                      mov  [bx], 0
                      add  bx, 2
                      loop zero_loop

             Unfortunately, this will generate an assembler error. Why? If the
             assembler sees:

                      mov  [bx], ax

             it knows that you want to move what is in AX to the address in
             BX, and AX is one word (two bytes) long so it generates the
             machine code for a word move. If the assembler sees:

                      mov  [bx], al

             it knows that you want to move what is in AL to the address in
             BX, and AL is one byte long, so it generates the machine code for
             a byte move. If the assembler sees:

                      mov  [bx], 0

             it doesn't know whether you want a byte move or a word move. The
             8086 assembler has implicit sizing. It is the assembler's job to
             look at each instruction and decide whether you want to operate
             on a byte or a word. Other microprocessors do things differently.

             Back to the 8086. If the 8086 assembler looks at an instruction
             and it can't tell whether you want to move a byte or a word, it
             generates an error. When you use pointers with constants, you
             should explicitly state whether you want a byte or a word. The
             proper way to do this is to use the reserved words BYTE PTR or
             WORD PTR.

                      mov  [bx], BYTE PTR 213
                      mov  [bx], WORD PTR 213

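             These stand for byte pointer and word pointer respectively. I
             find this terminology exceptionally clumsy, but that's life.
             Whenever you are moving a constant with a pointer, you should
             specify either BYTE PTR or WORD PTR. (You will more often see the
             operator attached to the memory operand, as in
             mov BYTE PTR [bx], 213; it tells the assembler the same thing.)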

             The Microsoft assembler makes some assumptions about the size of
             a constant. If the number is 255 or below (either positive or
             negative), you MUST explicitly state whether it is a byte or a
             word operation. If the number is 256 or above (either positive or
             negative), it cannot fit in a byte, so the assembler assumes that
             you want a word operation.

             Here's the previous code rewritten correctly:

                      mov  bx, offset variable6
                      mov  cx, 600
                 zero_loop:
                      mov  [bx], WORD PTR 0
                      add  bx, 2
                      loop zero_loop

             Let's add 435 to every element in the variable6 array:

                      mov  bx, offset variable6
                      mov  cx, 600
                 add_loop:
                      add  [bx], WORD PTR 435
                      add  bx, 2
                      loop add_loop

             How about multiplying every element in the array by 12?

                      mov  di, offset variable6
                      mov  cx, 600
                      mov  si, 12
                 mult_loop:
                      mov  ax, [di]
                      imul si
                      mov  [di], ax
                      add  di, 2
                      loop mult_loop
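             IMUL leaves the full 32 bit product in DX:AX, but the loop above
             stores only AX. Whatever spills into DX is lost. This is a Python
             sketch of what happens to the original element value -11000. The
             helpers `imul16` and `as_signed16` are made up for the
             illustration:

```python
def imul16(a, b):
    """16 x 16 signed multiply; returns (dx, ax) as unsigned 16 bit words."""
    product = (a * b) & 0xFFFFFFFF     # 32 bit result, as IMUL leaves it in DX:AX
    return product >> 16, product & 0xFFFF

def as_signed16(w):
    return w - 0x10000 if w & 0x8000 else w

variable6 = [-11000] * 600
for i, element in enumerate(variable6):
    dx, ax = imul16(element, 12)
    variable6[i] = as_signed16(ax)     # mov [di], ax keeps only AX; DX is dropped
print(variable6[0])
```

             The true product, -132000, does not fit in 16 bits, so each
             element ends up as -928 instead.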

             None of these examples did any error checking, so if the result
             was too large, the overflow was ignored. This time we used DI for
             a change of pace. Remember, we may use BX, SI, DI or BP, but no
             others. You will notice that in all these examples, we started at
             the beginning of the array and went step by step through the
             array. That's fine, and that's what we normally would do, but
             what if we wanted to look at individual elements? Here's a sample
             program:

             ;  START DATA BELOW THIS LINE
             ; 
             poem_array  db "She walks in Beauty, like the night"
                         db "Of cloudless climes and starry skies;"
                         db "And all that's best of dark and bright"
                         db "Meet in the aspect ratio of 1 to 3.14159"
             character_count  db  149
             ;  END DATA ABOVE THIS LINE

             ;  START CODE BELOW THIS LINE

                 mov  bx, offset poem_array
                 mov  dl, character_count

             character_loop:
                 sub  ax, ax              ; clear ax
                 call get_unsigned_byte
                 dec  al                  ; character #1 = array[0]
                 cmp  al, dl              ; out of range?
                 ja   character_loop      ; then try again
                 mov  si, ax              ; move char # to pointer register
                 mov  al, [bx+si]         ; character to al
                 call print_ascii_byte
                 jmp  character_loop

             ; + + + + + END CODE ABOVE THIS LINE

             You enter a number and the program prints the corresponding
             character. Before starting, we put the array address in BX and
             the maximum character count in DL. After getting the number from
             get_unsigned_byte, we decrement AL since the first character is
             actually poem_array[0]. The poem has 150 characters, so the
             largest legal index is 149; that is why character_count is 149
             rather than 150. Decrementing also makes 0 an illegal entry: 0
             minus 1 wraps around to 255, and JA (an unsigned comparison)
             rejects it. Notice that the program checks to make sure you don't
             go past the end of the poem. This time we use BX to mark the
             beginning of the array and SI to count the number of the
             character.
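             The range check in character_loop can be modelled in a few lines
             of Python. This is a sketch: `lookup` is a hypothetical stand-in
             for the DEC / CMP / JA / MOV sequence, and get_unsigned_byte and
             print_ascii_byte are left out. AL is kept to 8 bits so that 0
             wraps around to 255, as it does on the 8086.

```python
POEM = ("She walks in Beauty, like the night"
        "Of cloudless climes and starry skies;"
        "And all that's best of dark and bright"
        "Meet in the aspect ratio of 1 to 3.14159")
CHARACTER_COUNT = 149                  # character_count db 149

def lookup(entry):
    """Mimic: dec al / cmp al, dl / ja character_loop / mov al, [bx+si]."""
    al = (entry - 1) & 0xFF            # DEC AL: 0 wraps around to 255
    if al > CHARACTER_COUNT:           # JA treats both sides as unsigned
        return None                    # out of range: ask again
    return POEM[al]

assert len(POEM) == CHARACTER_COUNT + 1   # 150 characters, indices 0..149
print(lookup(1), lookup(150), lookup(0), lookup(151))
```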

             Once again, there are only specific combinations of pointers that
             can be used. They are:

                 BX with either SI or DI (but not both)
                 BP with either SI or DI (but not both)

             My version of the Microsoft assembler (v5.1) recognizes the forms
             [bx+si], [si+bx], [bx][si], [si][bx], [si]+[bx] and [bx]+[si] as
             the same thing and produces the same machine code for all six.

             We can get even more complicated, but to show that, we need
             structures. In databases they are called records. In C they are
             called structures; in any case they are the same thing - a group
             of different types of data in some standard order. After the
             group is defined, we usually make an array with the identical
             structure for each element of the array.{4} Let's make a
             structure for an address book.

                 last_name  db  15 dup (?)
                 first_name db  15 dup (?)
                 age        db  ?
                 tel_no     db  10 dup (?)

             In this case, all the data is bytes, but that is not necessary.
             It can be anything. Each separate piece of data is called a
             FIELD. We have the last_name field, the first_name field, the age
             field, and the tel_no field. Four fields in all. The structure is
             41 bytes long. What if we want to have a list of 100 names in our
             telephone book? We can allocate memory space with the following
             definition:

                 address_book   db  100 dup ( 41 dup (' ')) {5}

             Well, that allocates room in memory, but how do we get to
             anything? First, we need the array itself:

                 mov  bx, offset address_book

             Then we need one specific entry. Let's take entry 29 (which is
             address_book[28]). Each entry is 41 bytes long, so:

                 mov  ax, 28    ; entry (less 1)
                 mov  cx, 41    ; entry length
                 mul  cx
                 mov  di, ax    ; move to pointer

             That gives us the entry, but if we want to get the age, that's
             not the first byte of the structure, it's the 31st byte (actually
             address_book[28] + 30 since the first byte is at +0). We get it
             by writing:

                 mov  dl, [bx+di+30]

             This is the most complex thing we have - two pointers plus a
             constant. The total code is then:

                 mov  bx, offset address_book
                 mov  ax, 28    ; entry (less 1)
                 mov  cx, 41    ; entry length

                 mul  cx        ; entry offset from array[0]
                 mov  di, ax    ; move entry offset to pointer
                 mov  dl, [bx+di+30]  ; total address

             Though the machine code has only one constant in the code, the
             assembler will allow you to put a number of constants in the
             assembler instruction. It will add them together for you and
             resolve them into one number.

             Once again, there are a limited number of registers - they are
             the same registers as before:

                 BX with either SI or DI (but not both) plus constant
                 BP with either SI or DI (but not both) plus constant

             We can work with structures on the machine level, but it looks
             like it's going to be hard to keep track of where each field is.
             Actually, it isn't so bad because of:

                               OUR FRIEND, THE EQU STATEMENT

             The assembler allows you to do substitution. If you write:

                 somestuff EQU  37 * 44

             then every place that the assembler finds the word "somestuff",
             it will substitute what is on the right side of the EQU. Is that
             a number or text? Sometimes it's a number, sometimes it's text.
             Here are some statements which are defined totally in terms of
             numbers. This is from the assembler listing. (To the left of each
             EQU statement, after the equal sign, the listing shows how the
             assembler has evaluated it.)

              = 0023               statement1 EQU  5 * 7
              = 0025               statement2 EQU  statement1 + 6 - 4
              = 000F               statement3 EQU  statement2 - 22
             and the assembler thinks of these as numbers (these numbers are
             in hex). Now in the next set, with only a minor change:

              = [bp + 3]                    statement1 EQU  [bp + 3]
              = [bp + 3] + 6 - 4            statement2 EQU  statement1 + 6 - 4
              = [bp + 3] + 6 - 4 - 22       statement3 EQU  statement2 - 22
          
             the assembler thinks of it as text. Obviously, the fact that it
             can be either may cause you some problems along the way. Consult
             the assembler manual for ways to avoid the problem.

             Now we have a tool to deal with structures. Let's look at that
             structure again.

                 last_name  db  15 dup (?)
                 first_name db  15 dup (?)
                 age        db  ?
                 tel_no     db  10 dup (?)

             We don't actually need a data definition to make the structure,
             we need equates:

                 LAST_NAME      EQU  0
                 FIRST_NAME     EQU  15
                 AGE            EQU  30
                 TEL_NO         EQU  31

             this gives us the offset from the beginning of each record. If we
             again define:

                 address_book   db  100 dup ( 41 dup (' '))

             then to get the age field of entry 87, we write:

                 mov  bx, offset address_book
                 mov  ax, 86    ; entry (less 1)
                 mov  cx, 41    ; entry length
                 mul  cx        ; entry offset from array[0]
                 mov  di, ax    ; move entry offset to pointer
                 mov  dl, [bx+di+AGE]  ; total address

             This is a lot of work for the 8086, but that is normal with
             complex structures. The only thing that takes a lot of time is
             the multiplication, but if you need it, you need it.
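             The address arithmetic for a field in an array of structures can
             be written out in Python as a sketch. The field offsets mirror
             the EQU statements. `field_address` is a hypothetical helper.
             Placing address_book at offset 0 and storing an age of 42 are
             both assumptions made only for the illustration.

```python
# Field offsets, as in the EQU statements above
LAST_NAME, FIRST_NAME, AGE, TEL_NO = 0, 15, 30, 31
ENTRY_LENGTH = 41

def field_address(base, entry, field):
    """Offset of `field` in entry number `entry` (counting from 1)."""
    di = (entry - 1) * ENTRY_LENGTH    # mov ax, entry-1 / mov cx, 41 / mul cx
    return base + di + field           # [bx+di+FIELD]

# address_book db 100 dup (41 dup (' ')), placed at offset 0 for the sketch
book = bytearray(b" " * (100 * ENTRY_LENGTH))
book[field_address(0, 87, AGE)] = 42   # a made-up age for entry 87
print(field_address(0, 29, AGE), field_address(0, 87, AGE))
```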

             How about a two-dimensional array of integers, 60 X 40?

                 int_array  dw  40 dup  ( 60 dup ( 0 ))

             These are initialized to 0. For our purposes, we'll assume that
             the first number is the row number and the second number is the
             column number; i.e. array [6,13] is row 6, column 13. We will
             have 40 rows of 60 columns. For ease of calculation, the first
             array element is int_array [0,0]. (If it is your array, you can
             set it up any way you want {8}). Each row is 60 words (120 bytes)
             long. To get to int_array [23, 45] we have:

                 mov  ax, 120   ; length of one row in bytes
                 mov  cx, 23    ; row number
                 mul  cx
                 mov  bx, ax    ; row offset to bx
                 mov  si, 45    ; column offset
                 sal  si, 1     ; multiply column offset by 2 (for word size)
                 mov  dx, int_array[bx+si]   ; integer to dx (int_array adds the base)

             Using SAL instead of MUL is about 50 times faster. Since most
             arrays you will be working with are either byte, word, or double
             word (4 bytes) arrays, you can save a lot of time. Let
             ELEMENT_NUMBER be the index (starting at 0) of the desired
             element in a one-dimensional array. For byte arrays, no
             multiplication is needed. For a word:

                 mov  di, ELEMENT_NUMBER
                 sal  di,1      ; multiply by 2

             and for a double word (4 bytes):

                 mov  di, ELEMENT_NUMBER
                 sal  di, 1
                 sal  di, 1     ; multiply by 4

             This means that a one-dimensional array can be accessed very
             quickly as long as the element length is a power of 2 - either 1,
             2, 4 or 8. Since the common 8086 data types (DB, DW, DD and DQ)
             are 1, 2, 4 and 8 bytes long, one-dimensional arrays of them are
             fast. Others are not so fast.
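             The row-and-column arithmetic for int_array, together with the
             shift-for-multiply trick, looks like this as a Python sketch. The
             helper names are hypothetical. The offsets are measured from
             int_array[0,0], so int_array's own offset still has to be added,
             as the assembler does when you name the array in the operand.

```python
ROWS, COLS, WORD = 40, 60, 2
ROW_BYTES = COLS * WORD                # 120 bytes per row

def element_offset(row, col):
    """Byte offset of int_array[row, col] from int_array[0,0]."""
    bx = row * ROW_BYTES               # mov ax, 120 / mov cx, row / mul cx
    si = col << 1                      # sal si, 1: multiply by 2 for words
    return bx + si                     # [bx+si]; int_array's offset goes on top

def dword_offset(element_number):
    """sal di, 1 twice: multiply by 4 for double words."""
    return (element_number << 1) << 1

print(element_offset(23, 45), dword_offset(10))
```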

             As a quick review before going on, these are the legal ways to
             address a variable on the 8086:

                 (1) by name.

                           mov  dx, variable1

                 It is also possible to have name + constant.

                           mov  dx, variable1 + 27

                 The assembler will resolve this into a single offset number
                 and will give the appropriate information to the linker.

                 (2) with the single pointers BX, SI, DI and BP (which are
                 enclosed in square brackets).

                           mov  cx, [si]
                           xor  al, [bx]
                           add  [di], cx
                           sub  [bp], dh

                 (3) with the single pointers BX, SI, DI and BP (which are
                 enclosed in square brackets) plus a constant.

                           mov  cx, [si+421]
                           xor  al, 18+[bx]
                           add  93+[di]-7, cx
                           sub  (54/7)+81-3+[bp]-19, dh

                 (4) with the double pointers [bx+si], [bx+di], [bp+si],
                 [bp+di]  (which are enclosed in square brackets).

                           mov  cx, [bx][si]
                           xor  al, [di][bx]
                           add  [bp]+[di], cx
                           sub  [di+bp], dh

                 (5) with the double pointers [bx+si], [bx+di], [bp+si],
                 [bp+di]  (which are enclosed in square brackets) plus a
                 constant.

                           mov  cx, [bx][si+57]
                           xor  al, 45+[di+23][bx+15]-94
                           add  [bp]+[di]-444, cx
                           sub  [6+di+bp]-5, dh

             These are ALL the addressing modes allowed on the 8086. As for
             the constants, it is the ASSEMBLER'S job to resolve all numbers
             in the expression into a single constant. If your expression
             won't resolve into a constant, it is between you and the
             assembler. It has nothing to do with the 8086 chip. 

             We can consolidate all this information into the following list:

                 All the following addressing modes can be used with or
                 without a constant:

                 variable_name  (+constant)
                 [bx]     (+constant)
                 [si]     (+constant)
                 [di]     (+constant)
                 [bp]     (+constant)
                 [bx+si]  (+constant)
                 [bx+di]  (+constant)
                 [bp+si]  (+constant)
                 [bp+di]  (+constant)

                 This is a complete list.

             Thus, you can access a variable by name or with one of the eight
             pointer combinations. There are no other possibilities.
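             The rule behind the list can be stated as: at most one of BX/BP,
             at most one of SI/DI, and at least one register. This Python
             sketch (`is_legal_pointer` is a made-up name) applies that rule
             to every combination of one, two or three 16 bit registers and
             confirms that exactly eight combinations survive.

```python
from itertools import combinations

BASES = {"bx", "bp"}
INDEXES = {"si", "di"}

def is_legal_pointer(regs):
    """True if the registers inside [...] form a legal 8086 address.

    Rule: at least one register, at most one base (BX or BP) and at most
    one index (SI or DI). A constant may always be added on top.
    """
    regs = [r.lower() for r in regs]
    if not regs or any(r not in BASES | INDEXES for r in regs):
        return False
    bases = sum(r in BASES for r in regs)
    indexes = sum(r in INDEXES for r in regs)
    return bases <= 1 and indexes <= 1

ALL16 = ["ax", "bx", "cx", "dx", "si", "di", "bp", "sp"]
legal = [c for n in (1, 2, 3) for c in combinations(ALL16, n)
         if is_legal_pointer(c)]
print(len(legal), legal)
```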

             One thing that may confuse you about an addressing statement is
             all the plusses and minuses. As an example:

                 mov  cx, -45+27[bx+22]+[-195+di]+23-44

             the total address is:

                 -45+27[bx+22]+[-195+di]+23-44

             When the 8086 performs this instruction, it will ADD (1) BX, (2)
             DI and (3) a single constant. That single constant can be a
             positive or a negative number; the 8086 will ADD all three
             elements. The '+' in front of 'di' is for the convenience of the
             assembler only; [-195-di] is illegal and the assembler will
             generate an error. If you actually want the negative of what is
             in one of the registers, you must negate it before executing the
             addressing instruction:

                 neg  di
                 mov  cx, -45+27[bx+22]+[-195+di]+23-44

             Once again, the only allowable forms are +[di], [di] or [+di].
             Either -[di] or [-di] will generate an assembler error.
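             Here is the same instruction done by hand in Python, used only as
             a calculator. The assembler folds all the numbers into one
             constant, and the 8086 adds BX, DI and that constant, wrapping
             around at 16 bits. The register contents are hypothetical, and
             `effective_address` is a made-up helper.

```python
# Constants in  mov cx, -45+27[bx+22]+[-195+di]+23-44
terms = [-45, 27, 22, -195, 23, -44]
disp = sum(terms)                      # the one constant the assembler emits

def effective_address(base, index, disp):
    """BX + DI + constant, wrapped around to 16 bits as the 8086 does."""
    return (base + index + disp) & 0xFFFF

bx, di = 0x1000, 0x0100               # hypothetical register contents
print(disp, hex(disp & 0xFFFF))       # -212 0xff2c
print(hex(effective_address(bx, di, disp)))
di = (-di) & 0xFFFF                    # neg di
print(hex(effective_address(bx, di, disp)))   # now BX - old DI - 212
```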


             If you ever see a technical description of the addressing modes,
             you will find a list of 24 different machine codes. The reason
             for this is that:

                      [bx]
                      [bx] + byte constant
                      [bx] + word constant

             are three different machine codes. Here is a listing of the same
             machine instruction with the three different styles:

                 MACHINE CODE             ASSEMBLER INSTRUCTION

                  03 04                     add   ax, [si] 
                  03 44 1B                  add   ax, [si+27] 
                  03 44 E5                  add   ax, [si-27] 
                  03 84 5BA7                add   ax, [si+23463] 
                  03 84 A459                add   ax, [si-23463] 


             (27d = 1Bh, 23463d = 5BA7h). The first byte of code (03) is the
             add (word) instruction. The second byte is the addressing code,
             and the third and fourth bytes (if any) are the constant (in
             hex). Addressing code 04 is: (ax, [si]). Addressing code 44 is:
             (ax, [si] + byte constant). Addressing code 84 is: (ax, [si] +
             word constant). The fact that there are three different machine
             codes is of concern to the assembler, not to you. It is the
             assembler's job to make the machine code as efficient as
             possible. It is your job to write quality, robust code.
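             To see how an assembler might choose among the three forms, here
             is a Python sketch that builds the bytes for `add reg, [pointer +
             constant]` (opcode 03h). It picks no constant, a signed byte, or a
             word, whichever is shortest. The function name is made up. Two
             details are worth noticing. First, the listing prints a word
             constant such as 5BA7 as a word, but the 8086 stores it low byte
             first (A7 5B). Second, [bp] with no constant has no short form,
             because that slot encodes a direct address, so it gets a byte
             constant of 0.

```python
# r/m field values for the eight pointer combinations (16 bit addressing)
R_M = {"bx+si": 0b000, "bx+di": 0b001, "bp+si": 0b010, "bp+di": 0b011,
       "si": 0b100, "di": 0b101, "bp": 0b110, "bx": 0b111}
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_add_reg_mem(reg, pointer, disp=0):
    """Bytes for 'add reg, [pointer+disp]' (opcode 03h) in the shortest form."""
    rm = R_M[pointer]
    if disp == 0 and pointer != "bp":      # mod 00: no constant at all
        mod, tail = 0b00, b""              # ([bp] has no mod 00 form)
    elif -128 <= disp <= 127:              # mod 01: signed byte constant
        mod, tail = 0b01, (disp & 0xFF).to_bytes(1, "little")
    else:                                  # mod 10: word constant, low byte first
        mod, tail = 0b10, (disp & 0xFFFF).to_bytes(2, "little")
    modrm = (mod << 6) | (REG16[reg] << 3) | rm
    return bytes([0x03, modrm]) + tail

for d in (0, 27, -27, 23463, -23463):
    print(f"add ax, [si{d:+d}]", encode_add_reg_mem("ax", "si", d).hex(" "))
```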

             SEGMENT OVERRIDES

             So far, we haven't talked about segment registers. You will
             remember from the last chapter that the 8086 assumes that a named
             variable is in the DS segment:

                 mov  ax, variable1

             If it isn't, the Microsoft assembler puts the correct segment
             override in the machine code. The segment overrides are:

                 SEGMENT OVERRIDE         MACHINE CODE (hex)
                      CS                       2E
                      DS                       3E
                      ES                       26
                      SS                       36

             As an example:

                 MACHINE CODE        ASSEMBLER  INSTRUCTIONS

                 2E: 03 06 0000 R      add   ax, variable3 
                 26: 2B 1E 0000 R      sub   bx, variable2 
                 31 36 0000 R          xor   variable1, si ; no override
                 36: 21 3E 00C8 R      and   variable4, di 

             These were assembled with the different variables in segments
             that had different ASSUME statements. If you don't remember this,
             you should reread the section on overrides in the last chapter.
             Remember, the colon is in the listing only to tell you that we
             have a segment override. The colon is not in the machine code.

             What about pointers? The natural segment for anything with [bp]
             is SS, the stack segment.{1}  Everything else has DS as its
             natural segment. The natural segments are:

                 (1) DS

                      variable + (constant)
                      [bx] + (constant)
                      [si] + (constant)
                      [di] + (constant)
                      [bx+si] + (constant)
                      [bx+di] + (constant)

                 (2) SS

                      [bp] + (constant)
                      [bp+si] + (constant)
                      [bp+di] + (constant)

             where the constant is always optional. Can you use segment
             overrides? Yes, in all cases.{2}  Here is some assembler code
             along with the machine code which was generated.


                 MACHINE CODE             ASSEMBLER INSTRUCTIONS
                                      
                  26: 03 07                 add   ax, es:[bx] 
                  2E: 01 05                 add   cs:[di], ax 
                  36: 2B 44 11              sub   ax, ss:[si+17] 
                  2E: 29 46 00              sub   cs:[bp], ax 
                  3E: 33 03                 xor   ax, ds:[bp+di] 
                  26: 31 02                 xor   es:[bp+si], ax 
                  26: 89 43 16              mov   es:[bp+di+22], ax 
              
              
                  03 04                     add   ax, [si] 
                  03 44 1B                  add   ax, [si+27] 
                  03 84 A459                add   ax, [si-23463] 
                  26: 03 04                 add   ax, es:[si] 
                  26: 03 44 1B              add   ax, es:[si+27] 
                  26: 03 84 A459            add   ax, es:[si-23463] 


             (17d = 11h, 22d = 16h, 27d = 1Bh, -23463d = 0A459h). The first
             number (which is followed by a colon) is the segment override
             that the assembler has inserted in the machine code. Remember,
             the colon is in the listing to inform you that an override is           
             involved; it is not in the machine code itself.
                                       
             Unfortunately, when you use pointers you must put the override
             into the assembler instructions yourself. The assembler has no
             way of knowing that you want an override. This can cause some
             truly gigantic errors (if you reference a pointer seven times and
             forget the override once, the 8086 will access the wrong segment
             that one time), and those errors are extremely difficult to
             detect.

             As you can see from above, you put the override in the
             instructions by writing the appropriate segment (CS, DS, ES or
             SS) followed by a colon. As always, it is your responsibility to
             make sure that the segment register holds the address of the
             appropriate segment before using an override. 

             We have talked about two different types of constants in this
             chapter, a constant which is part of the address:

                 mov  ax, [bx+17]
                 add  [si+2190], dx
                 and  [di-8179], cx

             and a constant which is a number to be used for an arithmetical or
             logical operation:

                 add  ax, 17
                 sub  dl, 45
                 add  dx, 22187

             They are both part of the machine instruction, and are
             unchangeable (true constants). This machine code is going to be
             difficult to read, so just look for (1) the constant DATA and (2)
             the constant in the ADDRESS. All constants in the assembler
             instructions are in hex so that they look the same as in the
             listing of the machine code. Here's a listing of different
             combinations.

             1. Pointer + constant as an address:

                 MACHINE CODE             ASSEMBLER INSTRUCTIONS
                  01 44 1B                  add   [si+1Bh], ax 
                  29 85 0A04                sub   [di+0A04h], ax 
                  30 5C 1F                  xor   [si+1Fh], bl 
                  20 9E 1FAB                and   [bp+1FABh], bl 
              
             2. Arithmetic instruction with a constant:

                 MACHINE CODE             ASSEMBLER INSTRUCTIONS
                  05 1065                   add   ax, 1065h 
                  2D 6771                   sub   ax, 6771h 
                  80 F3 37                  xor   bl, 37h 
                  80 E3 82                  and   bl, 82h 
              
             3. Pointer + constant as an address; arithmetic with a constant

                 MACHINE CODE             ASSEMBLER INSTRUCTIONS
                  81 44 1B 1065             add   WORD PTR [si+1Bh], 1065h 
                  81 AD 0A04 6771           sub   WORD PTR [di+0A04h], 6771h 
                  80 74 1F 37               xor   BYTE PTR [si+1Fh], 37h 
                  80 A6 1FAB 82             and   BYTE PTR [bp+1FABh], 82h 
              

             You will notice that the ADD instruction (as well as the other
             instructions) changes machine code depending on the complete
             format of the instruction (byte or word? to a register or from a
             register? what addressing mode? is AX one of the registers?).
             That's part of the 8086 machine language encoding, and it makes
             the 8086 machine code extremely difficult to decipher without a
             table listing all the options.

             OFFSET AND SEG

             The assembler has two special operators, offset and seg. For
             any variable or label, offset gives the offset from the
             beginning of its segment, and seg gives the segment address.
             If you write:

                 mov  ax, offset variable1

             the assembler will calculate the offset of variable1 and put it
             in the machine code. It also marks that number for the linker
             and loader; if the offset changes during linking, the linker
             adjusts this number as well. If you write:

                 mov  dx, seg variable1

             the assembler will signal to the linker and the loader that you
             want the address of the segment that variable1 is in. The linker
             and loader will put it in the machine code at that spot. You
             don't need to know the name of the segment. The linker takes care
             of that. We will use the seg operator later. 
             

                                    Addressing Modes                     

                                        SUMMARY

             These are the natural (default) segments of all addressing modes:

                 (1) DS

                      variable + (constant)
                      [bx] + (constant)
                      [si] + (constant)
                      [di] + (constant)
                      [bx+si] + (constant)
                      [bx+di] + (constant)


                 (2) SS

                      [bp] + (constant)
                      [bp+si] + (constant)
                      [bp+di] + (constant)

             The constant is always optional, and segment overrides may be
             used. The segment overrides are:

                 SEGMENT OVERRIDE         MACHINE CODE (hex)
                      CS:                      2E
                      DS:                      3E
                      ES:                      26
                      SS:                      36
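             The default-segment rule in the summary above can be expressed
             as a tiny Python sketch. The helper names are hypothetical, and
             an addressing mode is modeled simply as the set of registers it
             uses. Any mode that involves BP defaults to SS; every other mode,
             including a plain variable, defaults to DS; an explicit override
             replaces the natural segment.

```python
# Override prefix bytes, copied from the table above.
OVERRIDE_PREFIX = {"CS": 0x2E, "DS": 0x3E, "ES": 0x26, "SS": 0x36}

def default_segment(regs):
    """Natural segment for an addressing mode given as a set of register names."""
    return "SS" if "bp" in regs else "DS"

def segment_used(regs, override=None):
    """An explicit override (CS, DS, ES or SS) replaces the natural segment."""
    return override if override is not None else default_segment(regs)

print(default_segment({"bx", "si"}))                             # DS
print(default_segment({"bp", "di"}))                             # SS
print(segment_used({"bx"}, "ES"), hex(OVERRIDE_PREFIX["ES"]))    # ES 0x26
```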


             OFFSET

             The reserved word 'offset' tells the assembler to calculate the
             offset of the variable from the beginning of the segment.

                      mov  ax, offset variable2

             SEG

             The reserved word 'seg' tells the assembler, linker and loader to
             get the segment address of the segment that the variable is in.

                      mov  ax, seg variable2

             LEA

             LEA calculates an offset (effective address) using any of the
             8086 memory addressing modes, then puts that offset in a 16-bit
             register. It does not read memory.

                      lea  cx, [bp+di+27] 
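             To make the LEA example concrete, here is a Python sketch of the
             offset it computes. The register values are hypothetical, and the
             sketch assumes the 8086 behavior of keeping only the low 16 bits
             of the sum. No memory is read and no segment register is used.

```python
def effective_address(base=0, index=0, disp=0):
    """Sum base, index and displacement, wrapping to 16 bits."""
    return (base + index + disp) & 0xFFFF

# lea cx, [bp+di+27] with hypothetical BP = 0100h, DI = 0020h
cx = effective_address(0x0100, 0x0020, 27)
print(hex(cx))   # 0x13b
```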



SHIFT AND ROTATE

             There are seven instructions that move the individual bits of a
             byte or word either left or right. Each instruction works
             slightly differently. We'll make a standard program and then
             substitute each instruction into that program.
    
             SHL - SAL

             SHL destination,count

             CF <-- destination <-- 0

             SHL is the same instruction as SAL, Shift Arithmetic Left.
             SHL shifts the word or byte at the destination to the left by
             the number of bit positions specified in the second operand,
             COUNT. As bits are transferred out the left (high-order) end of
             the destination, zeros are shifted in the right (low-order) end.
             The Carry flag is updated to match the last bit shifted out of
             the left end. SHL is used for multiplying an unsigned number by
             powers of 2.

             There are two (and only two) forms of this instruction. All other
             shift and rotate instructions have these two (and only these two)
             forms as well. The first form is:

                 shl  al, 1

             which shifts each bit one position to the left. On the 8086 the
             number MUST be 1; no other number is possible. The other form is:

                 shl  al, cl

             which shifts the bits in AL to the left by the number in CL. If
             CL = 3, it shifts left by 3; if CL = 7, it shifts left by 7. The
             count register MUST be CL (not CX). The bits on the left are
             shifted out of the register into the bit bucket, and zeros are
             inserted on the right.

             For a register, it is faster to use a series of 1-bit shifts
             than to load CL. For a variable in memory, anything over one
             shift is faster if you load CL. In either case, CF holds the
             last bit shifted off the end.
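             The behavior of SHL can be modeled in a few lines of Python. This
             is a sketch for a 16-bit operand with a count of at least 1
             (shl16 is a hypothetical name, not an 8086 mnemonic): zeros enter
             on the right, bits beyond bit 15 are dropped, and CF keeps the
             last bit shifted out. Shifting left by 3 multiplies by 2^3 = 8 as
             long as no 1 bits fall off the left end.

```python
def shl16(value, count):
    """Shift a 16-bit value left count times (count >= 1); return (result, CF)."""
    cf = 0
    for _ in range(count):
        cf = (value >> 15) & 1          # bit about to leave the left end
        value = (value << 1) & 0xFFFF   # shift; zero enters on the right
    return value, cf

result, cf = shl16(25, 3)   # like: mov cl, 3 / shl ax, cl with AX = 25
print(result, cf)           # 200 0
```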

             Summary

             SHL (shift logical left) and SAL (shift arithmetic left) are
             exactly the same instruction. They move bits left. 0s are
             placed in the low bit. Bits are shoved off the register (or
             memory data) on the left side, and CF indicates whether the
             last bit shoved was a 1 or a 0. It is used for multiplying
             an unsigned number by powers of 2.

             All shift and rotate instructions operate on either a register or
             on memory. They can be either 1 bit shifts:

                 sal  cx, 1
                 ror  variable1, 1
                 shr  bl, 1

             or shifts whose count is in CL (it must be CL):
 
                 rcl  variable2, cl
                 sar  si, cl
                 rol  ah, cl




             SHR and SAR

             SHR destination,count
             0 -> destination -> CF
             SHR shifts the bits in destination to the right by the number of
             positions specified by the count operand (1 or CL). 0s are
             shifted in on the left. For a single-bit shift, the Overflow flag
             is cleared if the sign bit keeps its original value and set if
             the sign changes. The Carry flag is updated to reflect the last
             bit shifted out.

             Unlike the left shift instruction, there are two completely
             different right shift instructions. SHR (shift logical right)
             shifts the bits to the right, puts 0s in the leftmost bit, and
             leaves the last bit pushed off the right end in CF. A one-bit
             SHR divides an unsigned number by two and is once again MUCH
             faster than division. For a single shift, the remainder is in
             CF. For a shift of more than one bit, you lose the remainder, but
             there is a way around this which we will discuss in a moment.

             If you want to divide by 16, you will shift right four times, so
             you'll lose those 4 bits. But those bits are exactly the value of
             the remainder. All we need to do is:

                 mov  dx, ax    ; copy of number to dx
                 and  dx, 0000000000001111b ; remainder in dx
                 mov  cl, 4     ; shift right 4 bits
                 shr  ax, cl    ; quotient in ax

             Using a mask, we keep only the right four bits, which is the
             remainder.
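             The same trick can be checked in Python. This sketch (divmod16
             is a hypothetical name) mirrors the four instructions above for
             an unsigned 16-bit AX and can be compared with ordinary integer
             division.

```python
def divmod16(ax):
    """Mirror of: mov dx,ax / and dx,000Fh / mov cl,4 / shr ax,cl."""
    ax &= 0xFFFF
    dx = ax & 0b0000000000001111   # remainder: the four bits shifted out
    ax = ax >> 4                   # quotient: unsigned shift right by 4
    return ax, dx

print(divmod16(1000))   # (62, 8)
```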

             SAR

             SAR destination,count
             SF -> destination -> CF
             SAR (shift arithmetic right) is different. It shifts right like
             SHR, but the leftmost (sign) bit always stays the same. Because
             the sign never changes, a single-bit SAR always clears the
             overflow flag.

             SAR shifts the word or byte in destination to the right by the number
             of bit positions specified in the second operand, COUNT. As bits are
             transferred out the right (low-order) end of the destination, bits
             equal to the original sign bit are shifted into the left (high-order)
             end, thereby preserving the sign bit. The Carry flag is set equal to
             the last bit shifted out of the right end.

             SAR is an instruction for doing signed division by 2 (sort of).
             It is, however, an incomplete instruction. Compared with IDIV,
             the rule for SAR is: SAR gives the correct answer if the number
             is positive. It gives the correct answer if the number is
             negative and the remainder is zero. If the number is negative
             but there is a remainder, then the answer is one too negative,
             because SAR rounds toward minus infinity while IDIV rounds
             toward zero.
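             The "one too negative" rule shows up in a Python sketch that
             compares a one-bit SAR with signed division rounding toward zero,
             the way IDIV rounds. The helper names are hypothetical, and values
             are treated as 16-bit two's complement.

```python
def to_signed16(v):
    """Interpret the low 16 bits of v as a two's-complement number."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def sar16(value, count):
    """Arithmetic right shift: copies of the sign bit enter on the left."""
    v = to_signed16(value)
    return (v >> count) & 0xFFFF   # Python's >> on ints is already arithmetic

def trunc_div(a, b):
    """Signed division rounding toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

for n in (7, -8, -7):
    print(n, to_signed16(sar16(n, 1)), trunc_div(n, 2))
# 7 3 3
# -8 -4 -4
# -7 -4 -3   <- negative with a remainder: SAR is one too negative
```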

             You will never or almost never use SAR for signed division, 
             while you will find lots of opportunity to use SHR and SHL 
             for unsigned multiplication and division.

             Summary
             
             SHR (shift logical right) does the same thing as SHL but in
             the opposite direction. Bits are shifted right. 0s are
             placed in the high bit. Bits are shoved off the register (or
             memory data) on the right side and CF indicates whether the
             last bit shoved off was a 0 or a 1. It is used for dividing
             an unsigned number by powers of 2.
   
             SAR (shift arithmetic right) shifts bits right. The high
             (sign) bit stays the same throughout the operation. Bits are
             shoved off the register (or memory data) on the right side.
             CF indicates whether the last bit shoved off was a 1 or a 0.
             It is used (with difficulty) for dividing a signed number by
             powers of 2.



             ROR and ROL
             
             ROR destination,count
             ROR shifts the word or byte at the destination to the right by
             the number of bit positions specified in the second operand, COUNT.

              --------<------     
             |               |
             -> destination ---> CF 
   
             As bits are transferred out the right (low-order) end of the 
             destination, they re-enter on the left (high-order) end. The Carry
             flag is updated to match the last bit shifted out of the right end.
              
             ROL destination,count

             CF <--- destination <--
                  |                 |
                   ------->----------

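             ROL shifts the word or byte at the destination to the left by
             the number of bit positions specified in the second operand,
             COUNT. As bits are transferred out of the left (high-order) end
             of the destination, they re-enter on the right (low-order) end.
             The Carry flag is updated to match the last bit shifted out of
             the left end.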

             ROR (rotate right) and ROL (rotate left) rotate the bits around
             the register. The only flags that are defined are OF and CF. OF
             is defined only for single-bit rotates, where it is set if the
             high bit changes. CF receives the last bit that moved off one
             end of the register to the other side.

             Summary

             ROR and ROL

             ROR (rotate right) and ROL (rotate left) rotate the bits of
             a register (or memory data) right and left respectively. The
             bit which is shoved off one end is moved to the other end.
             CF indicates whether the last bit moved from one end to the
             other was a 1 or a 0.
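             Here is a Python sketch of ROL and ROR on a 16-bit operand with a
             count of at least 1 (rol16 and ror16 are hypothetical names).
             Each step moves the bit that falls off one end around to the
             other end and also copies it into CF, so after the last step CF
             matches the last bit moved.

```python
def rol16(value, count):
    """Rotate left: the high bit re-enters on the right and is copied to CF."""
    cf = 0
    for _ in range(count):
        cf = (value >> 15) & 1
        value = ((value << 1) | cf) & 0xFFFF
    return value, cf

def ror16(value, count):
    """Rotate right: the low bit re-enters on the left and is copied to CF."""
    cf = 0
    for _ in range(count):
        cf = value & 1
        value = (value >> 1) | (cf << 15)
    return value, cf

print([hex(x) for x in rol16(0x1234, 4)])   # ['0x2341', '0x1']
print([hex(x) for x in ror16(0x1234, 4)])   # ['0x4123', '0x0']
```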



             RCR and RCL

             RCR destination,count
          
              --------<----------
             |                   | 
              -> destination -> CF

             RCR shifts the word or byte at the destination to the right by
             the number of bit positions specified in the second operand, COUNT.
             A bit shifted out of the right (low-order) end of the destination
             enters the Carry flag, and the displaced Carry flag rotates around
             to enter the vacated left-most bit position of the destination. This
             "bit rotation" continues the number of times specified in COUNT.
             Another way of looking at this is to consider the Carry flag as the
             lowest order bit of the word being rotated.

             RCL destination,count

              ---------->----------             
             |                     |
             CF  <- destination <-

             Another way of looking at this instruction is to consider the Carry 
             flag as the highest order bit of the word being rotated.              

             RCR (rotate through carry right) and RCL (rotate through carry
             left) rotate the same way as the above instructions except that
             the carry flag is involved. Rotating right, the low bit moves to
             CF (the carry flag) and the old CF moves to the high bit.
             Rotating left, the high bit moves to CF and the old CF moves to
             the low bit. There are 9 bits (or 17 bits for a word) involved
             in the rotation. Only two flags are defined, OF and CF. CF holds
             the last bit rotated out of the register. OF is defined only for
             single-bit rotates, and in both RCL and RCR it signals a change
             in the top (sign) bit, just as it does for ROL and ROR. You
             really have no need for the OF flag here anyway, so this is
             unimportant.

             Summary  

             RCR and RCL

             RCR (rotate through carry right) and RCL (rotate through
             carry left) rotate the bits of a register (or of memory
             data) right and left respectively. The bit which is shoved
             off the register (or data) is placed in CF and the old CF is
             placed on the other side of the register (or data).
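             The 17-bit picture of RCR and RCL can be modeled in Python. In
             this sketch (hypothetical names, 16-bit operand) CF is passed in
             and returned explicitly; for RCL it acts as a bit above bit 15,
             and for RCR as a bit below bit 0. Rotating 17 times brings both
             the value and CF back to where they started.

```python
def rcl16(value, cf, count):
    """Rotate left through carry: 17 bits, CF sits above bit 15."""
    for _ in range(count):
        new_cf = (value >> 15) & 1             # high bit moves into CF
        value = ((value << 1) | cf) & 0xFFFF   # old CF moves into the low bit
        cf = new_cf
    return value, cf

def rcr16(value, cf, count):
    """Rotate right through carry: 17 bits, CF sits below bit 0."""
    for _ in range(count):
        new_cf = value & 1                     # low bit moves into CF
        value = (value >> 1) | (cf << 15)      # old CF moves into the high bit
        cf = new_cf
    return value, cf

print(rcl16(0x8000, 0, 1))   # (0, 1): the 1 bit is now in CF
print(rcl16(0x0000, 1, 1))   # (1, 0): CF came around to bit 0
```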

             Well, those are the seven instructions, but what can you do with
             them besides multiply and divide?

             First, you can work with multiple bit data. The 8087 has a word
             length register called the status register.  Looking at the upper
             byte:

                 15 14 13 12 11 10  9  8
                        X  X  X

             bits 11, 12 and 13 contain a number from 0 to 7. The data in this
             register is not directly accessible. You need to move the
             register into memory, then into an 8086 register. If you want to
             find what this number is, what do you do?

                 mov  bx, status_register_data
                 mov  cl, 3
                 ror  bx, cl
                 and  bh, 00000111b

             We rotate right by 3 and then mask off everything else. The
             number is now in BH. We could have used SHR if we wanted. Another
             8087 register is the control register. In the upper byte it has:

                 15 14 13 12 11 10  9  8
                              X  X

             a number from 0 to 3 in bits 10 and 11. If we want the
             information, we do the same thing:

                 mov  bx, control_register_data
                 mov  cl, 2
                 ror  bx, cl
                 and  bh, 00000011b

             and the number is in BH. 
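             Both extractions follow the same pattern: rotate the field down
             to the bottom of BH, then mask. Here is a Python sketch of that
             pattern (function names and register values are hypothetical),
             which also lets you check the remark that SHR would work just as
             well.

```python
def ror16(value, count):
    """Rotate a 16-bit value right by count bits (CF is ignored here)."""
    count %= 16
    return ((value >> count) | (value << (16 - count))) & 0xFFFF

def field_via_ror(reg, low_bit, width):
    """Mirror of: mov cl,(low_bit-8) / ror bx,cl / and bh,mask."""
    bx = ror16(reg, low_bit - 8)     # field now starts at bit 8, the low bit of BH
    bh = bx >> 8                     # upper byte of BX
    return bh & ((1 << width) - 1)   # e.g. 00000111b for a 3-bit field

status = 0x2800    # hypothetical status word: bits 13..11 = 101b
control = 0x0C00   # hypothetical control word: bits 11..10 = 11b
print(field_via_ror(status, 11, 3))    # 5
print(field_via_ror(control, 10, 2))   # 3
```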
      
             One thing to know is that inside a loop we must push CX before
             loading CL. That is because we use CL for the ROL instruction,
             while the LOOP instruction uses CX as its counter. CX is then
             POPped just before the LOOP instruction. This is typical. CX is
             the only register that LOOP (and the REP prefixes) can use for
             counting. Loops are commonly nested, so you temporarily store the
             old value of CX while you are using CX for something different.

                 push cx        ; typical code for a shift
                 mov  cl, 7
                 shr  si, cl
                 pop  cx
                


             INC
                 INC increments a register or a variable by 1.

                      inc  ax
                      inc  variable1


             DEC
                 DEC decrements a register or a variable by 1.

                      dec  ax
                      dec  variable1