Chapter 1: Introduction
Architecture and Organization:
Definitions
Computer hardware can be seen from three different points of view,
depending on the interest of the investigator: computer
organization, computer design, and computer architecture.
Computer architecture (conceptual view)
Describes major components of the computer
User perspective of computer structure and behavior
Architectural design deals with component specifications (e.g.,
processor speed, memory capacity) and combining them to build a
computer
E.g. 1: Instruction sets, addressing techniques
E.g. 2: Is there a multiply instruction?
Definitions …
Computer organization
Describes how the computer works
How hardware components operate and the way they are connected
together to form the computer system
E.g. 1: Control signals
E.g. 2: Is there a hardware multiply unit, or is multiplication done by repeated addition?
Computer design (computer implementation)
Describes how the computer is built
Once the computer specifications are formulated this step
is performed
Determines which hardware should be used and how parts are connected
E.g. Choosing the ECL (emitter-coupled logic) circuit family for supercomputers over other
types such as TTL (transistor-transistor logic)
Definition, Evolution and Types of
Microprocessors
Processors vary in their speed and capacity of memory, registers, and data bus.
An internal clock synchronizes and controls all the processor’s operations. The
basic time unit, the clock cycle, is rated in terms of megahertz (millions of cycles
per second).
A microprocessor is a clock-driven semiconductor device with tens or hundreds
of thousands of transistors, resistors, capacitors, switches and other digital
circuit elements that are miniaturized on a single silicon chip.
It can be either an LSI (large-scale integration) or a VLSI (very-large-scale integration) circuit
It is considered to be the heart of a computer system as it performs all the tasks.
The microprocessor you are using might be a Pentium, an AMD processor, a PowerPC, a
Sun SPARC, or any of the many other brands and types of microprocessors
A µ-processor can be divided into 3 segments for the sake of clarity: ALU,
register array, and control unit.
ALU (Arithmetic/Logic Unit):
This unit performs such arithmetic operations as addition and subtraction,
and such logic operations as AND, OR, and exclusive-OR.
Register Array:
This area of the µ-processor consists of various registers identified by
letters such as B, C, D, E, H, and L, plus the accumulator.
Control Unit
It controls the Flow of data between the µ-processor and memory &
peripherals.
A µ-processor also includes:
An address bus (that may be 8, 16 or 32 bits wide) that sends an address to
memory.
A data bus (that may be 8, 16 or 32 bits wide) that can send data to memory or
receive data from memory.
An RD (read) and a WR (write) line that tell the memory whether the processor
wants to write to (set) or read from (get) the addressed location.
A clock line that lets a clock pulse sequence the processor
A reset line that resets the program counter to zero (or whatever) and restarts
execution
The system bus is a communication path between the microprocessor and
peripherals; it is nothing but a group of wires that carry bits.
I/O devices are also known as peripherals. For example, a keyboard, switches,
and an analog-to-digital converter are input devices; LEDs, printers, X-Y plotters,
digital-to-analog converters, and video screens are output devices.
I/O devices are connected to the computer through I/O circuits
Each I/O circuit consists of several registers called I/O ports, each with
an address
I/O ports are used to store data and commands for the I/O devices.
They are like a transfer point between the CPU and I/O devices
Through the addresses, I/O ports are connected to the bus system
Two types of data transfer exist between I/O ports and I/O devices:
serial and parallel
Serial port: a slower connection that transmits one bit at a time
Simple configuration but slow data transfer
Parallel port: a byte or word is transferred concurrently, one bit per wire
Fast data transfer, but more wiring is required
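The serial/parallel trade-off can be sketched as a toy model in Python (illustration only, not real port I/O; the function names are invented for this example):

```python
# Toy model of serial vs. parallel transfer (illustration only,
# not real port I/O).

def serial_send(byte):
    """Transmit one byte as 8 individual bits, LSB first: 8 bus cycles."""
    return [(byte >> i) & 1 for i in range(8)]

def parallel_send(byte):
    """Transmit one byte on 8 wires at once: a single bus cycle."""
    return byte

bits = serial_send(0xA5)     # [1, 0, 1, 0, 0, 1, 0, 1]
byte = parallel_send(0xA5)   # 0xA5 in one step

# Reassemble the serial bits to verify the transfer arrived intact.
assert sum(b << i for i, b in enumerate(bits)) == byte
```

The serial version needs one wire but eight cycles; the parallel version needs eight wires but one cycle, which is the wiring-versus-speed trade-off described above.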
Intel 4004
The Intel 4004 was a 4-bit central processing unit (CPU) released by
Intel Corporation in 1971.
It was the first complete CPU on one chip, and also the first
commercially available microprocessor.
It had 2,300 transistors, 640 bytes of addressable memory, and a 740
kHz clock speed.
The 4001 was a ROM (read-only memory) with four lines of output; the
4002 was a RAM (random-access memory) with four lines of input/
output; the 4003 was a static shift register used for expanding the
I/O lines, for example, for keyboard scanning or for controlling a printer.
The 4004 included control functions for memory and I/O, which are not
normally handled by the microprocessor.
Intel 8080
The first microprocessor to make it into a home computer was the
Intel 8080.
It was introduced in 1974; it was a complete 8-bit computer on one
chip.
It was an extended and enhanced variant of the earlier 8008 design,
although without binary compatibility.
The initial specified clock frequency limit was 2 MHz, and with
common instructions having execution times of 4, 5, 7, 10, or 11 cycles,
this meant a few hundred thousand instructions per second. It had
6,000 transistors, 64 KB of addressable memory, and a 2 MHz clock rate.
The 8080 has sometimes been labeled the first truly usable
microprocessor.
Intel 8085
The Intel 8085 is an 8-bit data bus and 16-bit address bus
microprocessor introduced by Intel in 1977.
It was binary-compatible with the more-famous Intel 8080 but
required less supporting hardware, thus allowing simpler and less
expensive microcomputer systems to be built.
The 8085 had a long life as a controller.
The processor has seven 8-bit registers named A, B, C, D, E, H, and L,
where A is the 8-bit accumulator and the other six can be used as
independent byte-registers or as three 16-bit register pairs, BC, DE,
and HL.
It can address 64 KB of memory and has a clock rate of up to 8 MHz.
Intel 8086
It has a 16-bit data bus and 16-bit registers, with 29,000 transistors and
a 20-bit address bus, and runs faster than its predecessors.
Intel 8088
Has 16-bit registers and an 8-bit data bus, and can address up to 1
million bytes of memory.
Although the registers can process two bytes at a time, the data
bus can transfer only one byte at a time.
This processor runs in what is known as real mode, that is, one
program at a time, with actual ("real") addresses in the segment
registers.
Intel 80286
Runs faster than the preceding processors, has additional
capabilities, and can address up to 16 million bytes.
This processor and its successors can operate in real mode or in
protected mode, which enables an operating system like Windows
to perform multitasking (running more than one job concurrently)
and to protect them from each other.
Intel 80386
Has 32-bit registers and a 32-bit data bus, and can address up to 4
billion bytes of memory.
As well as protected mode, the processor supports virtual mode,
whereby it can swap portions of memory onto disk; in this way,
programs running concurrently have space to operate.
It contains 275,000 transistors.
Intel 80486
Also has 32-bit registers and a 32-bit data bus.
In addition, high-speed cache memory connected to the processor bus
enables the processor to store copies of the most recently used
instructions and data.
The processor can operate faster when using the cache directly
without having to access the slower main memory.
It was the first processor in the line with a built-in math co-processor.
Pentium
Pentium is a registered trademark that is included in the brand names of many of
Intel's x86-compatible microprocessors
The name Pentium was derived from the Greek pente (πέντε), meaning 'five'.
Has 32-bit registers, a 64-bit data bus, and separate caches for data and for
instructions.
Its superscalar design enables the processor to decode and execute more than
one instruction per clock cycle
Intel's fifth-generation microarchitecture, the P5, was first released under the
Pentium brand on March 22, 1993.
The Pentium 4 brand refers to Intel's line of single-core desktop and laptop
central processing units (CPUs) introduced on November 20, 2000.
The initial 32-bit x86 instruction set of the Pentium 4 microprocessors was
extended by the 64-bit x86-64 set; the first models were clocked from 1.3 GHz to 2 GHz.
Pentium II and III
Have a dual Independent Bus design that provides separate paths to the system
cache and to memory.
Where the previous processors’ connection to a storage cache on the system
board caused delays, these processors are connected to a built-in storage cache
by a 64-bit wide bus.
Processors up through the 80486 have what is known as a single-stage pipeline,
which restricts them to completing one instruction before starting the next.
Pipelining involves the way a processor divides an instruction into sequential
steps using different resources.
The Pentium has a five-stage pipelined structure, and the Pentium II has a 12-
stage super-pipelined structure.
This feature enables them to run many operations in parallel.
Next …
Assembly-, Machine-, and High-Level Languages
Assembly Language Programming Tools
Programmer’s View of a Computer System
Basic Computer Organization
Some Important Questions to Ask
What is Assembly Language?
Why Learn Assembly Language?
What is Machine Language?
How is Assembly related to Machine Language?
What is an Assembler?
How is Assembly related to High-Level Language?
Is Assembly Language portable?
A Hierarchy of Languages
Assembly and Machine Language
Machine language
Native to a processor: executed directly by hardware
Instructions consist of binary code: 1s and 0s
Assembly language
A programming language that uses symbolic names to represent
operations, registers and memory locations.
Slightly higher-level language
Readability of instructions is better than machine language
One-to-one correspondence with machine language instructions
Assemblers translate assembly to machine code
Compilers translate high-level programs to machine code
Either directly, or
Indirectly via an assembler
Compiler and Assembler
Instructions and Machine Language
Each command of a program is called an instruction (it instructs
the computer what to do).
Computers only deal with binary data, hence the instructions must
be in binary format (0s and 1s) .
The set of all instructions (in binary form) makes up the computer's
machine language. This is also referred to as the instruction set.
Instruction Fields
Machine language instructions usually are made up of several
fields. Each field specifies different information for the computer.
The major two fields are:
Opcode field which stands for operation code and it specifies the
particular operation that is to be performed.
Each operation has its unique opcode.
Operand fields, which specify where to get the source and
destination operands for the operation specified by the opcode.
The source/destination of operands can be a constant, the
memory or one of the general-purpose registers.
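As a sketch of how these fields are extracted, the Python snippet below decodes a hypothetical 16-bit instruction with a 4-bit opcode and two 6-bit operand fields. This format is invented for illustration; real instruction sets define their own field layouts.

```python
# Decode a hypothetical 16-bit instruction: 4-bit opcode followed by
# two 6-bit operand fields (an invented format, for illustration only).

def decode(instr):
    opcode = (instr >> 12) & 0xF    # bits 15..12: operation code
    src    = (instr >> 6)  & 0x3F   # bits 11..6:  source operand
    dst    = instr         & 0x3F   # bits 5..0:   destination operand
    return opcode, src, dst

# 0b0001_000011_000101: opcode 1, source register 3, destination register 5
opcode, src, dst = decode(0b0001000011000101)
assert (opcode, src, dst) == (1, 3, 5)
```

Shifting and masking like this is how hardware (and disassemblers) pull the opcode and operand fields out of a binary instruction word.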
Assembly vs. Machine Code
Translating Languages
English: D is assigned the sum of A times B plus 10.
High-Level Language: D = A * B + 10
A statement in a high-level language is translated
typically into several machine-level instructions
Intel assembly language and the corresponding Intel machine language (hex):
mov eax, A      →  A1 00404000
mul B           →  F7 25 00404004
add eax, 10     →  83 C0 0A
mov D, eax      →  A3 00404008
Mapping Between Assembly Language and HLL
Translating HLL programs to machine language programs is not a
one-to-one mapping
A HLL instruction (usually called a statement) will be translated to
one or more machine language instructions
Advantages of High-Level Languages
Program development is faster
High-level statements: fewer instructions to code
Program maintenance is easier
For the same reasons as above
Programs are portable
Contain few machine-dependent details
Can be used with little or no modifications on different machines
Compiler translates to the target machine language
However, Assembly language programs are not portable
Why Learn Assembly Language?
Accessibility to system hardware
Assembly Language is useful for implementing system software
Also useful for small embedded system applications
Space and Time efficiency
Understanding sources of program inefficiency
Tuning program performance
Writing compact code
Writing assembly programs gives the computer designer the needed deep
understanding of the instruction set and how to design one
To be able to write compilers for HLLs, we need to be experts in the
machine language. Assembly programming provides this experience
Assembly vs. High-Level Languages
Some representative types of applications:
Next …
Assembly-, Machine-, and High-Level Languages
Assembly Language Programming Tools
Programmer’s View of a Computer System
Basic Computer Organization
Assembler
Software tools are needed for editing, assembling, linking, and
debugging assembly language programs
An assembler is a program that converts source-code programs
written in assembly language into object files in machine language
Popular assemblers have emerged over the years for the Intel
family of processors. These include …
MASM (Microsoft Macro Assembler)
TASM (Turbo Assembler, from Borland)
NASM (the Netwide Assembler, for both Windows and Linux), and
GAS (the GNU Assembler, distributed by the Free Software Foundation)
Linker and Link Libraries
You need a linker program to produce executable files
It combines your program's object file created by the
assembler with other object files and link libraries, and
produces a single executable program
LINK32.EXE is the linker program provided with the
MASM distribution for linking 32-bit programs
We will also use a link library for input and output
Called Irvine32.lib developed by Kip Irvine
Works in Win32 console mode under MS-Windows
Assemble and Link Process
Source File ──Assembler──► Object File ─┐
Source File ──Assembler──► Object File ─┼──Linker──► Executable File
Source File ──Assembler──► Object File ─┘
                        Link Libraries ─┘
A project may consist of multiple source files
Assembler translates each source file separately into an object file
Linker links all object files together with link libraries
Debugger
Allows you to trace the execution of a program
Allows you to view code, memory, registers, etc.
Example: 32-bit Windows debugger
Editor
Allows you to create assembly language source files
Some editors provide syntax highlighting features and can be
customized as a programming environment
Next …
Assembly-, Machine-, and High-Level Languages
Assembly Language Programming Tools
Programmer’s View of a Computer System
Basic Computer Organization
Programmer’s View of a Computer System
Increasing level of abstraction from bottom to top; each level hides the details of the level below it:
Level 5: Application Programs (High-Level Language)
Level 4: Assembly Language
Level 3: Operating System
Level 2: Instruction Set Architecture
Level 1: Microarchitecture
Level 0: Digital Logic
Programmer's View – 2
Application Programs (Level 5)
Written in high-level programming languages
Such as Java, C++, Pascal, Visual Basic . . .
Programs compile into assembly language level (Level 4)
Assembly Language (Level 4)
Instruction mnemonics are used
Have one-to-one correspondence to machine language
Calls functions written at the operating system level (Level 3)
Programs are translated into machine language (Level 2)
Operating System (Level 3)
Provides services to level 4 and 5 programs
Translated to run at the machine instruction level (Level 2)
Programmer's View – 3
Instruction Set Architecture (Level 2)
Specifies how a processor functions
Machine instructions, registers, and memory are
exposed
Machine language is executed by Level 1
(microarchitecture)
Microarchitecture (Level 1)
Controls the execution of machine instructions (Level 2)
Implemented by digital logic (Level 0)
Digital Logic (Level 0)
Implements the microarchitecture
Uses digital logic gates
Logic gates are implemented using transistors
Instruction Set Architecture (ISA)
In computer science, an instruction set architecture (ISA) is an
abstract model of a computer. It is also referred to as architecture
or computer architecture.
A realization of an ISA, such as a central processing unit (CPU), is
called an implementation.
The collection of assembly/machine instructions of the machine
Machine resources that can be managed with these instructions
Memory
Programmer-accessible registers.
Provides a hardware/software interface
Cont’d…
The instruction set provides commands to the processor, to tell it what it
needs to do.
The instruction set consists of addressing modes, instructions, native
data types, registers, memory architecture, interrupt and exception
handling, and external I/O.
An example of an instruction set is the x86 instruction set, which is
commonly found on computers today. Different computer processors can
use almost the same instruction set while still having very different
internal designs.
Both the Intel Pentium and AMD Athlon processors use nearly the same
x86 instruction set.
An instruction set can be built into the hardware of the processor, or it
can be emulated in software, using an interpreter. The hardware design is
more efficient and faster for running programs than the emulated
software version.
Examples of instruction set
ADD - Add two numbers together.
COMPARE - Compare numbers.
IN - Input information from a device, e.g., keyboard.
JUMP - Jump to designated RAM address.
JUMP IF - Conditional statement that jumps to a designated RAM
address.
LOAD - Load information from RAM to the CPU.
OUT - Output information to device, e.g., monitor.
STORE - Store information to RAM.
Next …
Assembly-, Machine-, and High-Level Languages
Assembly Language Programming Tools
Programmer’s View of a Computer System
Basic Computer Organization
Basic Computer Organization
Since the 1940s, computers have had three classic components:
Processor, also called the CPU (Central Processing Unit)
Memory and storage devices
I/O devices
These components are interconnected by one or more buses: the data bus, the address bus, and the control bus
[Diagram: the CPU (containing the ALU, control unit, and clock), memory, and I/O devices #1 and #2, connected by the data bus, address bus, and control bus]
Processor (CPU)
Processor consists of
Datapath
ALU
Registers
Control unit
ALU
Performs arithmetic
and logic instructions
Control unit (CU)
Generates the control signals required to execute instructions
Implementation varies from one processor to another
Clock
Synchronizes Processor and Bus operations
Clock cycle = Clock period = 1 / Clock rate
Clock rate = Clock frequency = Cycles per second
1 Hz = 1 cycle/sec; 1 KHz = 10^3 cycles/sec
1 MHz = 10^6 cycles/sec; 1 GHz = 10^9 cycles/sec
A 2 GHz clock has a cycle time = 1/(2×10^9) = 0.5
nanosecond (ns)
Clock cycles measure the execution of instructions
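The reciprocal relationship above can be checked with a couple of lines of Python (the function name is chosen for this sketch):

```python
# Clock cycle time is the reciprocal of the clock rate.
def cycle_time_ns(clock_rate_hz):
    return 1e9 / clock_rate_hz   # convert seconds to nanoseconds

assert cycle_time_ns(2e9) == 0.5     # 2 GHz clock -> 0.5 ns, as above
assert cycle_time_ns(1e6) == 1000.0  # 1 MHz clock -> 1000 ns
```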
Memory
Ordered sequence of bytes
The sequence number is called the memory address
Byte addressable memory
Each byte has a unique address
Supported by almost all processors
Physical address space
Determined by the address bus width
Pentium has a 32-bit address bus
Physical address space = 2^32 bytes = 4 GB
Itanium, with a 64-bit address bus, can support
up to 2^64 bytes of physical address space
Address Space
Address space is the set of memory locations (bytes) that can be addressed
CPU Memory Interface
Address Bus
Memory address is put on address bus
If the memory address is m bits wide, then 2^m locations can be addressed
Data Bus: a b-bit bidirectional bus
Data can be transferred in both directions on the data bus
Note that b is not necessarily equal to the word size w, so a data transfer might
take more than a single bus cycle (if w > b).
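The number of bus cycles needed is the word size divided by the bus width, rounded up; a minimal Python sketch:

```python
import math

# Transferring a w-bit word over a b-bit data bus takes ceil(w / b) bus cycles.
def bus_cycles(word_bits, bus_bits):
    return math.ceil(word_bits / bus_bits)

assert bus_cycles(16, 16) == 1   # bus width matches word size
assert bus_cycles(16, 8)  == 2   # e.g. the 8088: 16-bit word, 8-bit bus
assert bus_cycles(32, 16) == 2
```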
Control Bus
Signals control
transfer of data
Read request
Write request
Complete transfer
Memory Devices
Random-Access Memory (RAM)
Usually called the main memory
It can be read and written to
It does not store information permanently (it is volatile: when power is turned
off, the stored information is lost)
Information stored in it can be accessed in any order at equal time
periods (hence the name random access)
Information is accessed by an address that specifies the exact
location of the piece of information in the RAM.
DRAM = Dynamic RAM
1-Transistor cell + trench capacitor
Dense but slow, must be refreshed
Typical choice for main memory
SRAM: Static RAM
6-Transistor cell, faster but less dense than DRAM
Typical choice for cache memory
Memory Devices
ROM (Read-Only Memory)
Non-volatile, i.e., it stores information permanently
Has random access to stored information
Used to store the information required to start up the computer
Many types: Masked ROM, PROM, EPROM, EEPROM, and FLASH
FLASH memory can be erased electrically in blocks
Cache
A very fast type of RAM that is used to store information that is most
frequently or recently used by the computer
Recent computers have up to three levels of cache; the first level (often called
internal cache) is faster but smaller, while later levels (external cache) are
slower but larger.
Processor-Memory Performance Gap
[Chart, 1980 – 2000, performance on a logarithmic scale from 1 to 1000: CPU performance ("Moore's Law") grows about 55% per year, while DRAM performance grows about 7% per year, so the processor-memory performance gap grows about 50% per year.]
1980 – No cache in microprocessor
1995 – Two-level cache on microprocessor
The Need for a Memory Hierarchy
Widening speed gap between CPU and main memory
Processor operation takes less than 1 ns
Main memory requires more than 50 ns to access
Each instruction involves at least one memory access
One memory access to fetch the instruction
Additional memory accesses for instructions involving memory data
access
Memory bandwidth limits the instruction execution rate
Cache memory can help bridge the CPU-memory gap
Cache memory is small in size but fast
Typical Memory Hierarchy
Registers are at the top of the hierarchy
Typical size < 1 KB, access time < 0.5 ns
Level 1 cache (8 – 64 KB)
Access time: 0.5 – 1 ns
Level 2 cache (512 KB – 8 MB)
Access time: 2 – 10 ns
Main memory (1 – 2 GB, reached over the memory bus)
Access time: 50 – 70 ns
Disk storage (> 200 GB, reached over the I/O bus)
Access time: milliseconds
Moving down the hierarchy, each level is bigger; moving up, each level is faster.
Magnetic Disk Storage
Disk Access Time = Seek Time + Rotational Latency + Transfer Time
Seek Time: time for the read/write head to move to the desired track (milliseconds)
Rotational Latency: time for the disk to rotate until the desired sector arrives under the head
Transfer Time: time to transfer the data
[Diagram: a platter with tracks 0 – 2 divided into sectors, the read/write head on an actuator arm, the spindle, the recording area, and the direction of rotation]
Example on Disk Access Time
Given a magnetic disk with the following properties
Rotation speed = 7200 RPM (rotations per minute)
Average seek = 8 ms, Sector = 512 bytes, Track = 200 sectors
Calculate
Time of one rotation (in milliseconds)
Average time to access a block of 32 consecutive sectors
Answer
Rotations per second = 7200/60 = 120 RPS
Time of one rotation = 1000/120 = 8.33 ms
Average rotational latency = time of half a rotation = 8.33/2 = 4.17 ms
Time to transfer 32 sectors = (32/200) × 8.33 = 1.33 ms
Average access time = 8 + 4.17 + 1.33 = 13.5 ms
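The same calculation can be checked in Python, using the disk parameters given in the example:

```python
# Worked example: 7200 RPM disk, 8 ms average seek, 512-byte sectors,
# 200 sectors per track, reading a block of 32 consecutive sectors.
rotation_ms = 1000 / (7200 / 60)         # one full rotation: 8.33 ms
latency_ms  = rotation_ms / 2            # average latency: half a rotation
transfer_ms = (32 / 200) * rotation_ms   # 32 of 200 sectors = that fraction
                                         # of one rotation
access_ms   = 8 + latency_ms + transfer_ms

assert round(rotation_ms, 2) == 8.33
assert round(access_ms, 1) == 13.5       # matches the answer above
```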