William Stallings
Computer Organization
and Architecture
9th Edition
Chapter 2
Computer Evolution and
Performance
2.1. A brief history of computers
2.2. Designing for performance
2.3. Multicore, MICs and GPGPUs
2.4. The evolution of the Intel x86 architecture
2.5. Embedded Systems and the ARM
2.6. Performance assessment
2.1. History of Computers
a. First Generation: Vacuum Tubes
■ ENIAC
■ Electronic Numerical Integrator And Computer
■ Designed and constructed at the University of Pennsylvania
■ Started in 1943 – completed in 1946
■ By John Mauchly and John Eckert
■ World’s first general purpose electronic digital computer
■ Army’s Ballistics Research Laboratory (BRL) needed a way to supply trajectory
tables for new weapons accurately and within a reasonable time frame
■ Was not finished in time to be used in the war effort
■ Its first task was to perform a series of calculations that were used to help
determine the feasibility of the hydrogen bomb
■ Continued to operate under BRL management until 1955 when it was
disassembled
2.1. History of Computers
a. First Generation: Vacuum Tubes ENIAC
■ Weighed 30 tons
■ Occupied 1500 square feet of floor space
■ Contained more than 18,000 vacuum tubes
■ 140 kW power consumption
■ Capable of 5000 additions per second
■ Decimal rather than binary machine
■ Memory consisted of 20 accumulators, each capable of holding a 10 digit number
■ Major drawback was the need for manual programming by setting switches and plugging/unplugging cables
2.1. History of Computers
a. First Generation: Vacuum Tubes
John von Neumann
EDVAC (Electronic Discrete Variable Computer)
■First publication of the idea was in 1945
■Stored program concept
■ Attributed to ENIAC designers, most notably the
mathematician John von Neumann
■ Program represented in a form suitable for storing in
memory alongside the data
■IAS computer
■ Princeton Institute for Advanced Studies
■ Prototype of all subsequent general-purpose computers
■ Completed in 1952
2.1. History of Computers
a. First Generation: Vacuum Tubes
Structure of von Neumann
Machine
2.1. History of Computers
a. First Generation: Vacuum Tubes
IAS Memory Formats
■ The memory of the IAS consists of 1000 storage locations (called words) of 40 bits each
■ Both data and instructions are stored there
■ Numbers are represented in binary form and each instruction is a binary code
Structure of IAS Computer
2.1. History of Computers
a. First Generation: Vacuum Tubes
Registers
■ Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
■ Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR
■ Instruction register (IR): Contains the 8-bit opcode of the instruction being executed
■ Instruction buffer register (IBR): Employed to temporarily hold the right-hand instruction from a word in memory
■ Program counter (PC): Contains the address of the next instruction pair to be fetched from memory
■ Accumulator (AC) and multiplier quotient (MQ): Employed to temporarily hold operands and results of ALU operations
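The register descriptions above imply that one 40-bit word can hold a pair of instructions (a left-hand and a right-hand one), each with an 8-bit opcode. The following is a minimal sketch of splitting such a word. It assumes two 20-bit halves, each made of an 8-bit opcode followed by a 12-bit address. The 12-bit address width and the function names are assumptions for illustration and are not stated on these slides.

```python
WORD_BITS = 40   # word size given on the slides
HALF_BITS = 20   # assumed: two instructions per word
ADDR_BITS = 12   # assumed address width (not stated on the slides)
OPCODE_MASK = 0xFF
ADDR_MASK = (1 << ADDR_BITS) - 1
HALF_MASK = (1 << HALF_BITS) - 1


def split_instruction(instr):
    """Split one 20-bit instruction into (opcode, address)."""
    return (instr >> ADDR_BITS) & OPCODE_MASK, instr & ADDR_MASK


def decode_word(word):
    """Return ((left_opcode, left_addr), (right_opcode, right_addr))."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 40 bits")
    left = (word >> HALF_BITS) & HALF_MASK   # would go to IR/MAR first
    right = word & HALF_MASK                 # would be parked in the IBR
    return split_instruction(left), split_instruction(right)


def encode_word(left, right):
    """Inverse of decode_word, handy for building test words."""
    (lop, laddr), (rop, raddr) = left, right
    left_half = ((lop & OPCODE_MASK) << ADDR_BITS) | (laddr & ADDR_MASK)
    right_half = ((rop & OPCODE_MASK) << ADDR_BITS) | (raddr & ADDR_MASK)
    return (left_half << HALF_BITS) | right_half


word = encode_word((0x01, 250), (0x05, 251))
print(decode_word(word))
```

Encoding and then decoding a word returns the original pair of (opcode, address) tuples, which is a quick way to check that the field boundaries are consistent.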
2.1. History of Computers
a. First Generation: Vacuum Tubes
IAS Operations
Table 2.1 The IAS Instruction Set
2.1. History of Computers
a. First Generation: Vacuum Tubes
Commercial Computers: UNIVAC
■ 1947 – Eckert and Mauchly formed the Eckert-Mauchly
Computer Corporation to manufacture computers commercially
■ UNIVAC I (Universal Automatic Computer)
■ First successful commercial computer
■ Was intended for both scientific and commercial applications
■ Commissioned by the US Bureau of Census for 1950 calculations
■ The Eckert-Mauchly Computer Corporation became part of the
UNIVAC division of the Sperry-Rand Corporation
■ UNIVAC II – delivered in the late 1950’s
■ Had greater memory capacity and higher performance
■ Backward compatible
IBM
■ Was the major manufacturer of punched-card processing equipment
■ Delivered its first electronic stored-program computer (701) in 1953
  ■ Intended primarily for scientific applications
■ Introduced 702 product in 1955
  ■ Hardware features made it suitable to business applications
■ Series of 700/7000 computers established IBM as the overwhelmingly dominant computer manufacturer
2.1. History of Computers
b. Second Generation: Transistors
■Smaller
■Cheaper
■Dissipates less heat than a vacuum tube
■Is a solid state device made from silicon
■Was invented at Bell Labs in 1947
■It was not until the late 1950’s that fully transistorized
computers were commercially available
Table 2.2 Computer Generations
2.1. History of Computers
b. Second Generation: Transistors
Second Generation Computers
■ Introduced:
  ■ More complex arithmetic and logic units and control units
  ■ The use of high-level programming languages
  ■ Provision of system software which provided the ability to:
    ■ load programs
    ■ move data to peripherals and libraries
    ■ perform common computations
■ Appearance of the Digital Equipment Corporation (DEC) in 1957
■ PDP-1 was DEC’s first computer
■ This began the mini-computer phenomenon that would become so prominent in the third generation
2.1. History of Computers
b. Second Generation: Transistors
Table 2.3 Example Members of the IBM 700/7000 Series
2.1. History of Computers
b. Second Generation: Transistors
IBM 7094 Configuration
2.1. History of Computers
c. Third Generation: Integrated Circuits
■1958 – the invention of the integrated circuit
■Discrete component
■ Single, self-contained transistor
■ Manufactured separately, packaged in their own containers,
and soldered or wired together onto masonite-like circuit
boards
■ Manufacturing process was expensive and cumbersome
■The two most important members of the third
generation were the IBM System/360 and the DEC PDP-8
2.1. History of Computers
c. Third Generation: Integrated Circuits
Microelectronics
2.1. History of Computers
c. Third Generation: Integrated Circuits
Integrated Circuits
■ Data storage – provided by memory cells
■ Data processing – provided by gates
■ Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
■ Control – the paths among components can carry control signals
■ A computer consists of gates, memory cells, and interconnections among these elements
■ The gates and memory cells are constructed of simple digital electronic components
■ Exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
■ Many transistors can be produced at the same time on a single wafer of silicon
■ Transistors can be connected with a process of metallization to form circuits
2.1. History of Computers
c. Third Generation: Integrated Circuits
Wafer, Chip, and Gate Relationship
2.1. History of Computers
c. Third Generation: Integrated Circuits
Chip Growth
2.1. History of Computers
c. Third Generation: Integrated Circuits
Moore’s Law
■ 1965; Gordon Moore – co-founder of Intel
■ Observed number of transistors that could be put on a single chip was doubling every year
■ The pace slowed to a doubling every 18 months in the 1970’s but has sustained that rate ever since
■ Consequences of Moore’s law:
  ■ The cost of computer logic and memory circuitry has fallen at a dramatic rate
  ■ The electrical path length is shortened, increasing operating speed
  ■ Computer becomes smaller and is more convenient to use in a variety of environments
  ■ Reduction in power and cooling requirements
  ■ Fewer interchip connections
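The difference between doubling every year and doubling every 18 months compounds quickly. A minimal sketch of the arithmetic, assuming idealized exponential growth from a starting count of 1 (the function name and the ten-year horizon are chosen only for illustration):

```python
def projected_count(initial_count, years, doubling_period_years):
    """Idealized transistor count after `years` of steady doubling."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return initial_count * 2 ** (years / doubling_period_years)


# Compare the two doubling periods mentioned on the slide.
for period in (1.0, 1.5):
    growth = projected_count(1, 10, period)
    print(f"doubling every {period} years -> x{growth:.0f} after 10 years")
```

Over ten years, yearly doubling gives a factor of 1024, while doubling every 18 months gives a factor of about 100.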
2.1. History of Computers
c. Third Generation: Integrated Circuits
Table 2.4 Characteristics of the System/360 Family
2.1. History of Computers
c. Third Generation: Integrated Circuits
Table 2.5 Evolution of the PDP-8
2.1. History of Computers
c. Third Generation: Integrated Circuits
DEC - PDP-8 Bus Structure
d. Later Generations
■ LSI: Large Scale Integration
■ VLSI: Very Large Scale Integration
■ ULSI: Ultra Large Scale Integration
■ Semiconductor Memory
■ Microprocessors
2.1. History of Computers
d. Later Generations
Semiconductor Memory
■ In 1970 Fairchild produced the first relatively capacious semiconductor memory
  ■ Chip was about the size of a single core
  ■ Could hold 256 bits of memory
  ■ Non-destructive
  ■ Much faster than core
■ In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
  ■ There has been a continuing and rapid decline in memory cost accompanied by a corresponding increase in physical memory density
  ■ Developments in memory and processor technologies changed the nature of computers in less than a decade
■ Since 1970 semiconductor memory has been through 13 generations
  ■ Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
2.1. History of Computers
d. Later Generations
Microprocessors
■The density of elements on processor chips continued to
rise
■ More and more elements were placed on each chip so that
fewer and fewer chips were needed to construct a single
computer processor
■1971 Intel developed 4004
■ First chip to contain all of the components of a CPU on a single
chip
■ Birth of microprocessor
■1972 Intel developed 8008
■ First 8-bit microprocessor
■1974 Intel developed 8080
■ First general purpose microprocessor
■ Faster, has a richer instruction set, has a large addressing capability
2.1. History of Computers
d. Later Generations
Evolution of Intel Microprocessors
a. 1970s Processors
b. 1980s Processors
2.1. History of Computers
d. Later Generations
Evolution of Intel Microprocessors
c. 1990s Processors
d. Recent Processors
2.2. Designing for Performance
a. Microprocessor Speed
Techniques built into contemporary processors include:
■ Pipelining: Processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously
■ Branch prediction: Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next
■ Data flow analysis: Processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions
■ Speculative execution: Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
2.2. Designing for Performance
b. Performance Balance
■ Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components
■ Architectural examples include:
  ■ Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
  ■ Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  ■ Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  ■ Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow
2.2. Designing for Performance
b. Performance Balance
Typical I/O Device Data Rates
2.2. Designing for Performance
c. Improvements in Chip Organization and
Architecture
■Increase hardware speed of processor
■ Fundamentally due to shrinking logic gate size
■ More gates, packed more tightly, increasing clock rate
■ Propagation time for signals reduced
■Increase size and speed of caches
■ Dedicating part of processor chip
■ Cache access times drop significantly
■Change processor organization and architecture
■ Increase effective speed of instruction execution
■ Parallelism
2.2. Designing for Performance
c. Improvements in Chip Organization and
Architecture
Problems with Clock Speed and Logic Density
■ Power
■ Power density increases with density of logic and clock
speed
■ Dissipating heat
■ RC delay
■ Speed at which electrons flow limited by resistance and
capacitance of metal wires connecting them
■ Delay increases as RC product increases
■ Wire interconnects thinner, increasing resistance
■ Wires closer together, increasing capacitance
■ Memory latency
■ Memory speeds lag processor speeds
2.2. Designing for Performance
c. Improvements in Chip Organization and Architecture
Processor Trends
2.3. Multicore, MICs and GPGPUs
a. Multicore
■ The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
■ Strategy is to use two simpler processors on the chip rather than one more complex processor
■ With two processors larger caches are justified
■ As caches became larger it made performance sense to create two and then three levels of cache on a chip
2.3. Multicore, MICs and GPGPUs
b. Many Integrated Core (MIC)
c. Graphics Processing Unit (GPU)
MIC
■ Leap in performance as well as the challenges in developing software to exploit such a large number of cores
■ The multicore and MIC strategy involves a homogeneous collection of general purpose processors on a single chip
GPU
■ Core designed to perform parallel operations on graphics data
■ Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
■ Used as vector processors for a variety of applications that require repetitive computations
2.4. Intel x86 architecture
Overview
Intel x86 Architecture (CISC)
■ Results of decades of design effort on complex instruction set computers (CISCs)
■ Excellent example of CISC design
■ Incorporates the sophisticated design principles once found only on mainframes and supercomputers
■ In terms of market share Intel is ranked as the number one maker of microprocessors for non-embedded systems
ARM (RISC)
■ An alternative approach to processor design is the reduced instruction set computer (RISC)
■ The ARM architecture is used in a wide variety of embedded systems and is one of the most powerful and best designed RISC based systems on the market
2.4. Intel x86 architecture
x86 Evolution
■ 8080
  ■ First general purpose microprocessor
  ■ 8-bit machine with an 8-bit data path to memory
  ■ Used in the first personal computer (Altair)
■ 8086
  ■ 16-bit machine
  ■ Used an instruction cache, or queue
  ■ First appearance of the x86 architecture
■ 8088
  ■ Used in IBM’s first personal computer
■ 80286
  ■ Enabled addressing a 16-MByte memory instead of just 1 MByte
■ 80386
  ■ Intel’s first 32-bit machine
  ■ First Intel processor to support multitasking
■ 80486
  ■ More sophisticated cache technology and instruction pipelining
2.4. Intel x86 architecture
x86 Evolution - Pentium
■ Pentium
  ■ Superscalar
  ■ Multiple instructions executed in parallel
■ Pentium Pro
  ■ Increased superscalar organization
  ■ Aggressive register renaming
  ■ Branch prediction
  ■ Data flow analysis
  ■ Speculative execution
■ Pentium II
  ■ MMX technology
  ■ Designed specifically to process video, audio, and graphics data
■ Pentium III
  ■ Additional floating-point instructions to support 3D graphics software
■ Pentium 4
  ■ Includes additional floating-point and other enhancements for multimedia
2.4. Intel x86 architecture
x86 Evolution (continued)
■ Core
  ■ First Intel x86 microprocessor with a dual core, referring to the implementation of two processors on a single chip
■ Core 2
  ■ Extends the architecture to 64 bits
  ■ Recent Core offerings have up to 10 processors per chip
■ Instruction set architecture is backward compatible with earlier versions
■ x86 architecture continues to dominate the processor market outside of embedded systems
2.5. Embedded systems and the ARM
a. Embedded Systems
General definition:
“A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.”
Table 2.7 Examples of Embedded Systems and Their Markets
2.5. Embedded systems and the ARM
a. Embedded Systems
Requirements and Constraints
■ Small to large systems, implying different cost constraints and different needs for optimization and reuse
■ Relaxed to very strict requirements and combinations of different quality requirements with respect to safety, reliability, real-time and flexibility
■ Short to long life times
■ Different environmental conditions in terms of radiation, vibrations, and humidity
■ Different application characteristics resulting in static versus dynamic loads, slow to fast speed, compute versus interface intensive tasks, and/or combinations thereof
■ Different models of computation ranging from discrete event systems to hybrid systems
2.5. Embedded systems and the ARM
a. Embedded Systems
Figure 2.12 Possible Organization of an Embedded System
2.5. Embedded systems and the ARM
b. Acorn RISC Machine (ARM)
■ Family of RISC-based microprocessors and microcontrollers
■ Designs microprocessor and multicore architectures and licenses them to manufacturers
■ Chips are high-speed processors that are known for their small die size and low power requirements
■ Widely used in PDAs and other handheld devices
■ Chips are the processors in iPod and iPhone devices
■ Most widely used embedded processor architecture
■ Most widely used processor architecture of any kind
ARM Evolution
DSP = digital signal processor; SoC = system on a chip
2.5. Embedded systems and the ARM
b. Acorn RISC Machine (ARM)
ARM Design Categories
■ ARM processors are designed to meet the needs of three system categories:
  ▪ Embedded real-time systems
    ▪ Systems for storage, automotive body and power-train, industrial, and networking applications
  ▪ Application platforms
    ▪ Devices running open operating systems including Linux, Palm OS, Symbian OS, and Windows CE in wireless, consumer entertainment and digital imaging applications
  ▪ Secure applications
    ▪ Smart cards, SIM cards, and payment terminals
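2.6. Performance Assessment
a. Clock Speed and Instructions per Second
System Clock
2.6. Performance Assessment
a. Clock Speed and Instructions per Second
Instruction execution rate
Cycles per instruction (CPI)
The processor time T needed to execute a given program can be
expressed as
$T = I_c \times CPI \times \tau$
where $I_c$ is the number of instructions executed, CPI is the average number of cycles per instruction, and $\tau = 1/f$ is the cycle time for a clock frequency $f$
Expressed as millions of instructions per second (MIPS), referred
to as the MIPS rate
$\text{MIPS rate} = \dfrac{I_c}{T \times 10^6} = \dfrac{f}{CPI \times 10^6}$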
2.6. Performance Assessment
a. Clock Speed and Instructions per Second
Instruction execution rate
Example:
Consider the execution of a program that results in the execution
of 2 million instructions on a 400-MHz processor. The program
consists of four major types of instructions. The instruction mix
and the CPI for each instruction type are given below based on the
result of a program trace experiment
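The calculation for this example can be sketched in a few lines. The instruction mix table is not reproduced in this text, so the four (CPI, fraction) pairs below are illustrative assumptions, not the slide's data. The formulas used are the usual ones: average CPI is the mix-weighted sum of per-type CPIs, MIPS rate is f / (CPI × 10^6), and T = Ic × CPI / f. All function names are hypothetical.

```python
def average_cpi(mix):
    """mix: iterable of (cpi, fraction) pairs whose fractions sum to 1."""
    mix = list(mix)
    total = sum(fraction for _, fraction in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(cpi * fraction for cpi, fraction in mix)


def mips_rate(clock_hz, cpi):
    """Millions of instructions per second for a given clock and CPI."""
    return clock_hz / (cpi * 1e6)


def execution_time(instruction_count, cpi, clock_hz):
    """Processor time in seconds: T = Ic * CPI / f."""
    return instruction_count * cpi / clock_hz


# ASSUMED mix for illustration only: (CPI, fraction of instructions).
ASSUMED_MIX = [(1, 0.60), (2, 0.18), (4, 0.12), (8, 0.10)]

CLOCK_HZ = 400e6               # 400-MHz processor, from the example
INSTRUCTION_COUNT = 2_000_000  # 2 million instructions, from the example

cpi = average_cpi(ASSUMED_MIX)
rate = mips_rate(CLOCK_HZ, cpi)
seconds = execution_time(INSTRUCTION_COUNT, cpi, CLOCK_HZ)
print(f"CPI={cpi:.2f}  MIPS={rate:.1f}  T={seconds * 1e3:.1f} ms")
```

With the assumed mix, the average CPI works out to 2.24, the MIPS rate to roughly 178, and the processor time to 11.2 ms. A different mix changes only the `ASSUMED_MIX` list.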
Table 2.9 Performance Factors and System Attributes
2.6. Performance Assessment
b. Benchmarks
For example, consider this high-level language statement:
A = B + C /* assume all quantities in main memory */
With a traditional instruction set architecture, referred to as a
complex instruction set computer (CISC), this instruction can be
compiled into one processor instruction:
add mem(B), mem(C), mem(A)
On a typical RISC machine, the compilation would look
something like this:
load mem(B), reg(1);
load mem(C), reg(2);
add reg(1), reg(2), reg(3);
store reg(3), mem(A)
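To make the RISC sequence concrete, here is a minimal sketch of a toy load/store machine that executes those four instructions. The tuple encoding, the `run` function, and the sample values for B and C are assumptions made for illustration. Only data memory accesses are counted, not instruction fetches.

```python
def run(program, mem):
    """Execute (op, *operands) tuples; return the number of data memory accesses."""
    reg = {}
    mem_accesses = 0
    for op, *args in program:
        if op == "load":       # load mem(src), reg(dst)
            src, dst = args
            reg[dst] = mem[src]
            mem_accesses += 1
        elif op == "add":      # add reg(a), reg(b), reg(dst): registers only
            a, b, dst = args
            reg[dst] = reg[a] + reg[b]
        elif op == "store":    # store reg(src), mem(dst)
            src, dst = args
            mem[dst] = reg[src]
            mem_accesses += 1
        else:
            raise ValueError(f"unknown op: {op}")
    return mem_accesses


RISC_PROGRAM = [
    ("load", "B", 1),
    ("load", "C", 2),
    ("add", 1, 2, 3),
    ("store", 3, "A"),
]

memory = {"A": 0, "B": 7, "C": 5}   # sample values, chosen arbitrarily
accesses = run(RISC_PROGRAM, memory)
print(memory["A"], accesses)
```

The four-instruction version touches data memory three times, the same number of operand accesses the single CISC instruction needs. This is one reason instruction counts alone are a poor basis for comparing machines.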
2.6. Performance Assessment
b. Benchmarks
Desirable Benchmark
Characteristics
Written in a high-level language, making it portable across
different machines
Representative of a particular kind of programming style,
such as system programming, numerical programming, or
commercial programming
Can be measured easily
Has wide distribution
2.6. Performance Assessment
b. Benchmarks
Standard Performance Evaluation
Corporation (SPEC)
■Benchmark suite
■ A collection of programs, defined in a high-level language
■ Attempts to provide a representative test of a computer in
a particular application or system programming area
■SPEC
■ An industry consortium
■ Defines and maintains the best known collection of
benchmark suites
■ Performance measurements are widely used for comparison
and research purposes
SPEC CPU2006
■ Best known SPEC benchmark suite
■ Industry standard suite for processor intensive applications
■ Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
■ Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
■ Suite contains over 3 million lines of code
■ Fifth generation of processor intensive suites from SPEC
2.6. Performance Assessment
c. Amdahl’s Law
■ Gene Amdahl [AMDA67]
■ Deals with the potential speedup of a program using multiple processors compared to a single processor
■ Illustrates the problems facing industry in the development of multi-core machines
  ■ Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing
■ Can be generalized to evaluate and design technical improvement in a computer system
2.6. Performance Assessment
c. Amdahl’s Law
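The formula for this slide did not survive in the text. The sketch below assumes the usual statement of Amdahl's law: if a fraction f of the execution time can be spread perfectly across N processors and the remaining 1 − f stays serial, then speedup = 1 / ((1 − f) + f / N). The function name is hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup over one processor when `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# How speedup saturates for a program that is 90% parallelizable.
for n in (1, 2, 4, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

However many processors are added, the speedup for f = 0.9 stays below 1 / (1 − f) = 10. This is why the serial part of the software limits what multicore hardware can deliver.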
2.6. Performance Assessment
d. Little’s Law
■Fundamental and simple relation with broad applications
■Can be applied to almost any system that is statistically
in steady state, and in which there is no leakage
■Queuing system
■ If server is idle an item is served immediately, otherwise an
arriving item joins a queue
■ There can be a single queue for a single server or for multiple
servers, or multiple queues with one being for each of
multiple servers
■Average number of items in a queuing system equals
the average rate at which items arrive multiplied by the
time that an item spends in the system
■ Relationship requires very few assumptions
■ Because of its simplicity and generality it is extremely useful
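The relation stated above is commonly written L = λW, where L is the average number of items in the system, λ the average arrival rate, and W the average time an item spends in the system. A minimal sketch, with hypothetical function names and made-up sample numbers:

```python
def items_in_system(arrival_rate, time_in_system):
    """Little's Law: L = lambda * W (steady state, no leakage)."""
    if arrival_rate < 0 or time_in_system < 0:
        raise ValueError("rate and time must be non-negative")
    return arrival_rate * time_in_system


def time_in_system(items, arrival_rate):
    """Rearranged form: W = L / lambda."""
    if arrival_rate <= 0:
        raise ValueError("arrival rate must be positive")
    return items / arrival_rate


# Sample numbers: 50 requests arrive per second, each stays 0.2 s on average.
L = items_in_system(50.0, 0.2)
print(f"average items in system: {L:.1f}")
print(f"time in system for L=10 at 50/s: {time_in_system(10.0, 50.0):.2f} s")
```

The same two lines apply whether the "items" are I/O requests at a disk, packets at a router, or instructions in flight, provided the units of rate and time agree.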
Summary
Chapter 2: Computer Evolution and Performance
■ First generation computers
  ■ Vacuum tubes
■ Second generation computers
  ■ Transistors
■ Third generation computers
  ■ Integrated circuits
■ Performance designs
  ■ Microprocessor speed
  ■ Performance balance
  ■ Chip organization and architecture
■ Multi-core
■ MICs
■ GPGPUs
■ Evolution of the Intel x86
■ Embedded systems
  ■ ARM evolution
■ Performance assessment
  ■ Clock speed and instructions per second
  ■ Benchmarks
  ■ Amdahl’s Law