CLASS 3

The Machine Level and CPU organization.

Here's a block diagram of how a typical CPU is organized:
        +--------------+
        |    CPU       |           +---------------+
        |              |--->MAR--->|               |
        | ++>ALU--+    |           |               |
        | ||      |    |           |  MEMORY       |
        | ||      |    |           |               |
        | ||      V    |           |               |
        | |reg.file    |<-->MDR<-->|               |
        | |    ^       |           |               |
        | |    |       |           |               |
        | +cont.unit   |           +---------------+
        |      |  ^    |
        |      V  |    |
        |     PC  V    |
        |        IR    |
        +--------------+
The von Neumann machine is divided into two main parts (as shown), the Central Processing Unit (CPU) and the Memory. The CPU runs programs while Memory is used to store inputs, intermediate results, and outputs. Memory is organized like P.O. boxes in a post office: each one has an address (which is a number) and contents.

The CPU is divided into two main parts: the control unit and the arithmetic/logic unit (ALU). The job of the control unit is to run the instruction cycle described in the previous lecture: it fetches instructions from memory, decodes each instruction, fetches operands, and stores results. However, when the time comes to actually execute the instruction (e.g., to perform the addition or subtraction or whatever the instruction requires), the control unit hands the job off to the ALU. The ALU's job is to make the addition (or whatever) actually happen.

The ALU cannot operate on memory items directly. It must have data brought from memory into locations within the CPU, called registers, in order to perform any operations on the data. A register is a location in the computer, with a name, that holds a single data item.

Memory is accessed by the CPU by means of the Memory Address Register (MAR) and the Memory Data Register (MDR). To perform a load operation (reading memory), the CPU puts the address of the item to be read into the MAR; seeing this, the memory unit looks up the specified address, retrieves the data stored there, and puts the data into the MDR. To perform a store operation (writing to memory), the CPU puts the data to be written into the MDR, then puts the address to which it's to be written into the MAR. The memory unit then places the data into the specified location. This process that the memory unit goes through in satisfying a CPU request is called the memory cycle.

Since data can't be accessed directly in memory, it must be loaded first, and stored when done. To add two numbers, the CPU must load both numbers into registers, add the registers, then store the result. Machines that function in this way are called LOAD/STORE machines.

Performance Measurement in Computers

Processors
CPU has two main parts: DATA PATH and CONTROL UNIT

Main functions of the CPU are:

Instruction Execution

  1. Fetch the next instruction from memory
  2. Change the program counter to point to the next instruction
  3. Determine the type of the instruction fetched
  4. Find where the data being used by the instruction is kept
  5. Fetch the data, if required
  6. Execute the instruction
  7. Store the results in the appropriate place
  8. Go to step 1 and start all over again

This process is called the FETCH-DECODE-EXECUTE cycle.

Instruction cycle. Machine-level programs are executed by an interpreter built out of digital logic, the L1 level of the machine. Below it is written as if it were software, but remember, it is really hardware. This particular interpreter is called the von Neumann cycle:

        pc = 0;
        do {
                instruction = memory[pc++];   /* fetch the instruction */
                decode(instruction);          /* decode the instruction */
                fetch(operands);              /* fetch the operands */
                execute(instruction);         /* execute the instruction */
                store(results);               /* store the results */
        } while (instruction != halt);

From this interpreter, we can get an idea of what a machine language program looks like. It is a sequence of numbers, stored in memory. So if you "looked" at a machine language program stored in memory, it would look like a sequence of numbers. That is, it is indistinguishable from data! So, each instruction at the machine language level could be viewed as a number.

This is another important first example, namely of the notion of representation. Since a computer can't manipulate real-world quantities directly, it must use a representation of those real-world things internally - it must use a representation for numbers, characters, programs, etc. As a result, the same data can be interpreted in different ways depending on the use to which it is being put. In this case we are representing programs as sequences of numbers. However, those numbers could be used for another purpose in a different representation. Choosing good representations is one of the principal topics in this course.

PARALLEL COMPUTERS

What are they? Why is there a need for them? How many types are there?
SISD - Single instruction stream, Single data stream
SIMD - Single instruction stream, Multiple data streams
MIMD - Multiple instruction streams, Multiple data streams
MISD - Multiple instruction streams, Single data stream (essentially not built in practice, as too complex)

Example of SIMD - Vector ALU

Example of MIMD - Distributed computers

PIPELINING
What is it? Why is it used? Is it still SISD? Yes.... why?

The pipeline in a modern processor

        -----    ------    -------------    -------    -----
        Fetch -> Decode -> Operand Fetch -> Execute -> Store
        -----    ------    -------------    -------    -----
This is a breakdown of the functions of the control unit -- separate electronics inside the control unit perform each task (review this diagram).

Typical Example of Pipelining

     P1                 P2            P3              P4          P5
Instruction fetch    Instruction    Address         Data      Instruction
  unit                analyzer     calculation     fetch       execution


        time ---->
P1:  1  2  3  4  5  6
P2:     1  2  3  4  5
P3:        1  2  3  4
P4:           1  2  3
P5:              1  2

(Each column is one clock cycle; the numbers are instructions. Instruction 1 enters P1 in the first cycle, moves to P2 in the second while instruction 2 enters P1, and so on.)

The clock drives the pipeline. When you hear that a processor is a "300 MHz processor", that means that instructions in the pipeline move from one stage to the next 300 million times per second.

Why all this functional division of the control logic? The thing to observe is that each instruction only uses one stage at a time. The other stages are available to work on other instructions. So in the SPARC, 5 instructions are in the pipeline at a time, each one in a different stage.

As a result, instead of completing one instruction every 5 cycles, we complete an instruction on every cycle. What kind of problem can be encountered while using this method???
Answer: What happens when a branch is executed (or a call to another subroutine)? All the instructions in the pipeline behind the current one are no good! One solution is to detect a branch really early, in the Decode stage, and load the branch target into the pipe right after the instruction currently being fetched. So the instruction being fetched is always executed, even though it comes after the branch. More on this topic later.

Performance Characteristics of a pipeline.

The latency of a pipeline is the time from when an instruction enters the pipeline to when it leaves. The throughput of a pipeline is the rate at which instructions leave it; it is determined by the time between successive instructions completing - the shorter that interval, the higher the throughput.

A pipeline typically has longer latency than a single-unit design, because there is some inefficiency in handing data from stage to stage. However, the big win is that a pipeline has much better throughput - instructions complete at a much higher rate.


Homework

READ ABOUT THE GENERATIONS OF COMPUTERS FOR NEXT CLASS.


For more information, contact me at tvohra@mtu.edu