Learn Microsoft Assembler in a Day:Appendix I REAL TIME PROGRAMMING

Appendix I
REAL TIME PROGRAMMING

Many people program in Assembly language to speed up program routines that take too much time in a high level language. All good high level language development systems should have an option to produce an Assembly language level source code listing of the compiled high level language source code. In most cases, this Assembly language level source code listing can be modified to improve performance of the software. This appendix discusses some methods commonly used.

One simple method is to check the code and remove all unnecessary NOP instructions. This makes the resulting code smaller and faster. Note that some NOP instructions may be needed for the code to execute correctly, for example as a timing delay or to align a jump target, so check what each NOP is for before removing it.

Another method is to look for unneeded instructions and remove them from the program code stream. Look for ways to take advantage of registers as storage area for temporary data.
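As a rough sketch of using a register for temporary data, the two fragments below add up a list of words. The names total, sum_mem and sum_reg are made up for this illustration. The fragments assume SI points to the list, CX holds a word count that is not zero, the direction flag is clear, and DX is free to use.

```asm
; Running total kept in memory
sum_mem:
   lodsw              ; AX = next word from DS:SI
   add  total,AX      ; reads and writes the memory variable on every pass
   loop sum_mem

; Running total kept in a register
   mov  DX,total      ; load the running total once
sum_reg:
   lodsw              ; AX = next word from DS:SI
   add  DX,AX         ; register to register add
   loop sum_reg
   mov  total,DX      ; store the result once
```

The second version accesses the variable total only twice, no matter how many words are added.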

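When possible, replace LEA instructions with MOV instructions using the OFFSET operator. This works only when the address is a simple label known at assembly time. An LEA that computes an address from registers, such as lea SI,[BX+4], cannot be replaced this way. On the 8086, the MOV form is faster than the LEA and one byte shorter.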

Example:

   lea  SI,data_string

changes to

   mov  SI,offset data_string

Sometimes, you can convert multiply instructions into bit shifts with adds to speed up the code. This method is discussed in the computer math section of this book.
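As a minimal sketch of the idea, the fragment below multiplies the unsigned value in AX by 10 using only single bit shifts and one add. The factor of 10 and the use of BX as a scratch register are assumptions made for this illustration, and the result is assumed to fit in 16 bits.

```asm
; AX = AX * 10, computed as (AX * 8) + (AX * 2)
   shl  AX,1          ; AX = value * 2
   mov  BX,AX         ; save value * 2 in BX
   shl  AX,1          ; AX = value * 4
   shl  AX,1          ; AX = value * 8
   add  AX,BX         ; AX = (value * 8) + (value * 2) = value * 10
```

On the 8086, a MUL with a 16 bit register operand takes well over 100 clock cycles, so this short sequence is much faster. Unlike MUL, it does not change DX, but it does destroy BX.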

When a shift instruction takes its shift count from register CL, the timing can be very slow compared to a single bit shift, particularly on the standard 8086 processor. The number of clock cycles required to shift a register is:

   Processor   Single bit shift   Shift count in CL
   8086        2                  8 + (4 * CL)
   80286       2                  5 + CL
   80386       3                  3
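Using the 8086 cycle counts given above, the fragment below compares the two ways of shifting AX left by two bits. The shift count of two is chosen only for illustration.

```asm
; Shift AX left by two bits using a count in CL
   mov  CL,2          ; load the shift count
   shl  AX,CL         ; 8 + (4 * 2) = 16 clock cycles on the 8086

; Same result using two single bit shifts
   shl  AX,1          ; 2 clock cycles
   shl  AX,1          ; 2 clock cycles
```

On the 8086, the second form takes 4 clock cycles in place of 16 and does not need CL. On the 80386, the CL form costs the same as one single bit shift, so the replacement no longer helps there.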

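The XOR instruction can be used to set a register to zero faster than zero can be moved into the register with the MOV instruction. The XOR form is also shorter. Unlike MOV, XOR changes the flags, so do not use it where the flags must be preserved.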
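The two forms look like this for register AX. The byte counts in the comments are for the 16 bit encodings.

```asm
   mov  AX,0          ; 3 bytes, AX = 0, flags unchanged
   xor  AX,AX         ; 2 bytes, AX = 0, flags changed (zero flag set, carry flag cleared)
```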

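The 80X86 processors prefetch instruction bytes into a queue to speed up execution. A jump that is taken discards the prefetched bytes, and the processor must refill the queue from the jump target. A conditional jump that is not taken leaves the queue intact. Sometimes, reversing a jump condition and rearranging the code so that the most common case falls through without jumping will increase speed.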
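The two fragments below sketch this rearrangement. They assume AX is usually not zero and that the code is inside a procedure. The labels and the handle_zero routine are made up for this illustration, and the INC instruction stands in for the common path.

```asm
; Before: the jump is taken in the common case (AX not zero)
   cmp  AX,0
   jne  skip_zero     ; usually taken
   call handle_zero   ; rare case, handled inline
skip_zero:
   inc  CX            ; common path continues here

; After: the common case falls through without a taken jump
   cmp  AX,0
   je   do_zero       ; rarely taken
resume:
   inc  CX            ; common path continues here
   ret
do_zero:
   call handle_zero   ; rare case, moved out of line
   jmp  resume
```

On the 8086, a conditional jump takes 16 clock cycles when taken and 4 when not taken, so the second form saves time whenever AX is not zero. The rare case pays for one extra JMP.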

Many language compilers offer the option of macro expansion, which can be used in place of subroutine calls to speed up execution. The trade-off is larger code, because the instructions are repeated at every place they are used.
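The same trade-off can be sketched at the Assembly language level with a MASM macro. The names times4 and TIMES4M are made up for this illustration.

```asm
; Subroutine version: each use executes a CALL and a RET
times4  proc near
   shl  AX,1
   shl  AX,1
   ret
times4  endp

; Macro version: the two shifts are assembled inline at each use
TIMES4M macro
   shl  AX,1
   shl  AX,1
        endm

; Using each version
   call times4        ; subroutine call
   TIMES4M            ; macro expansion, no CALL or RET is executed
```

Each use of the macro adds two more shift instructions to the program, while each use of the subroutine adds only one CALL.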

HARDWARE INTERRUPT TIMING CONSIDERATIONS

There are special timing problems that can occur when programming code is to be executed during a hardware interrupt. The primary concern is that the interrupt code must finish executing before the next interrupt from the hardware device occurs. To make sure the code can finish in time, the programmer may have to count clock timing cycles.

The following example demonstrates how these clock cycles add up. A processor running at 4.77 MHz executes approximately 4,770,000 timing cycles per second. In this example, assume there is a hardware device that interrupts at a rate of about 960 times per second (like a communications port running at 4800 baud with 480 receive interrupts per second and 480 transmit interrupts per second). For simplicity, round 960 up to 1,000 and divide this into 4,770,000 timing cycles per second. The result is about 4,770 timing cycles between hardware interrupts. If the average instruction takes about 10 timing cycles, then only about 477 instructions can execute between one hardware interrupt and the next. If the interrupt routine requires the execution of more than 477 instructions per interrupt, real time execution problems will occur. To be safe, the interrupt routine should leave extra free cycles for the other interrupting hardware devices (such as the keyboard and the disk drives) to use.

If you tie into the system clock interrupt that ticks at a rate of about 18 times a second, you get the following figures:

(total cycles per second) / (18 ticks per second)

4,770,000 / 18 = 265,000

This is the number of timing cycles available between each tick.

Assume 12 cycles per instruction on average.

265,000 / 12 = 22,083

This is the number of instructions available per tick.

Some systems use this clock tick interrupt to draw an item such as a mouse pointer array on the video screen. If the system is a 4.77 MHz PC and each dot in the video array takes 100 instructions to update, then 22,083 / 100 limits the array to about 220 dots, or roughly a 10 by 20 dot array.
