CH02 COA9e

Chapter 2 of 'Computer Organization and Architecture' discusses the evolution of computers from the first generation using vacuum tubes to modern multicore and microprocessor designs. It covers key developments such as the ENIAC, the introduction of transistors, integrated circuits, and the impact of Moore's Law on chip design. The chapter also highlights performance assessment techniques and the significance of architectures like Intel's x86 and ARM in contemporary computing.

Uploaded by

trungidapp

William Stallings
Computer Organization
and Architecture
9th Edition
Chapter 2
Computer Evolution and
Performance

2.1. A brief history of computers
2.2. Designing for performance
2.3. Multicore, MICs and GPGPUs
2.4. The evolution of the Intel x86 architecture
2.5. Embedded systems and the ARM
2.6. Performance assessment

2.1. History of Computers
a. First Generation: Vacuum Tubes

■ ENIAC
■ Electronic Numerical Integrator And Computer
■ Designed and constructed at the University of Pennsylvania
■ Started in 1943 – completed in 1946
■ By John Mauchly and John Eckert

■ World’s first general purpose electronic digital computer


■ Army’s Ballistics Research Laboratory (BRL) needed a way to supply trajectory
tables for new weapons accurately and within a reasonable time frame
■ Was not finished in time to be used in the war effort

■ Its first task was to perform a series of calculations that were used to help
determine the feasibility of the hydrogen bomb

■ Continued to operate under BRL management until 1955 when it was disassembled

ENIAC characteristics
■ Weighed 30 tons
■ Occupied 1500 square feet of floor space
■ Contained more than 18,000 vacuum tubes
■ 140 kW power consumption
■ Capable of 5000 additions per second
■ Decimal rather than binary machine
■ Memory consisted of 20 accumulators, each capable of holding a 10 digit number
■ Major drawback was the need for manual programming by setting switches and plugging/unplugging cables

John von Neumann


EDVAC (Electronic Discrete Variable Computer)
■First publication of the idea was in 1945

■Stored program concept


■ Attributed to ENIAC designers, most notably the
mathematician John von Neumann
■ Program represented in a form suitable for storing in
memory alongside the data

■IAS computer
■ Princeton Institute for Advanced Studies
■ Prototype of all subsequent general-purpose computers
■ Completed in 1952

[Figure: Structure of von Neumann Machine]

IAS Memory Formats
■ The memory of the IAS consists of 1000 storage locations (called words) of 40 bits each
■ Both data and instructions are stored there
■ Numbers are represented in binary form and each instruction is a binary code
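
The IAS word layout can be made concrete with a short sketch. It assumes the instruction-word format given in the textbook, which the slide text itself only partly states: each 40-bit word holds a left and a right instruction of 20 bits each, and each instruction is an 8-bit opcode followed by a 12-bit address. The function names are hypothetical and the opcode values in the usage note are arbitrary.

```python
# Sketch of the IAS instruction-word layout (assumed: two 20-bit
# instructions per 40-bit word, each an 8-bit opcode + 12-bit address).
WORD_BITS = 40
OPCODE_BITS = 8
ADDRESS_BITS = 12
INSTR_BITS = OPCODE_BITS + ADDRESS_BITS  # 20 bits per instruction


def split_instruction(instr):
    """Split a 20-bit instruction into (opcode, address)."""
    opcode = (instr >> ADDRESS_BITS) & ((1 << OPCODE_BITS) - 1)
    address = instr & ((1 << ADDRESS_BITS) - 1)
    return opcode, address


def decode_word(word):
    """Return ((left_opcode, left_addr), (right_opcode, right_addr))."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("an IAS word must fit in 40 bits")
    left = (word >> INSTR_BITS) & ((1 << INSTR_BITS) - 1)
    right = word & ((1 << INSTR_BITS) - 1)
    return split_instruction(left), split_instruction(right)


def encode_word(left, right):
    """Pack two (opcode, address) pairs into one 40-bit word."""
    (left_op, left_addr), (right_op, right_addr) = left, right
    left_instr = (left_op << ADDRESS_BITS) | left_addr
    right_instr = (right_op << ADDRESS_BITS) | right_addr
    return (left_instr << INSTR_BITS) | right_instr
```

For instance, `encode_word((0x01, 500), (0x05, 501))` packs two instructions that refer to locations 500 and 501, and `decode_word` recovers the same pairs. A 12-bit address field is enough to reach all 1000 storage locations.
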

[Figure: Structure of IAS Computer]

Registers
■ Memory buffer register (MBR): contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
■ Memory address register (MAR): specifies the address in memory of the word to be written from or read into the MBR
■ Instruction register (IR): contains the 8-bit opcode instruction being executed
■ Instruction buffer register (IBR): employed to temporarily hold the right-hand instruction from a word in memory
■ Program counter (PC): contains the address of the next instruction pair to be fetched from memory
■ Accumulator (AC) and multiplier quotient (MQ): employed to temporarily hold operands and results of ALU operations

[Figure: IAS Operations]

Table 2.1 The IAS Instruction Set

Commercial Computers: UNIVAC
■ 1947 – Eckert and Mauchly formed the Eckert-Mauchly Computer Corporation to manufacture computers commercially
■ UNIVAC I (Universal Automatic Computer)
  ■ First successful commercial computer
  ■ Was intended for both scientific and commercial applications
  ■ Commissioned by the US Bureau of Census for 1950 calculations
■ The Eckert-Mauchly Computer Corporation became part of the UNIVAC division of the Sperry-Rand Corporation
■ UNIVAC II – delivered in the late 1950s
  ■ Had greater memory capacity and higher performance
  ■ Backward compatible

IBM
■ Was the major manufacturer of punched-card processing equipment
■ Delivered its first electronic stored-program computer (701) in 1953
  ■ Intended primarily for scientific applications
■ Introduced 702 product in 1955
  ■ Hardware features made it suitable to business applications
■ Series of 700/7000 computers established IBM as the overwhelmingly dominant computer manufacturer

b. Second Generation: Transistors

■ Smaller
■ Cheaper
■ Dissipates less heat than a vacuum tube
■ Is a solid state device made from silicon
■ Was invented at Bell Labs in 1947
■ It was not until the late 1950s that fully transistorized computers were commercially available

Table 2.2 Computer Generations

Second Generation Computers
■ Introduced:
  ■ More complex arithmetic and logic units and control units
  ■ The use of high-level programming languages
  ■ Provision of system software which provided the ability to:
    ■ load programs
    ■ move data to peripherals and libraries
    ■ perform common computations
■ Appearance of the Digital Equipment Corporation (DEC) in 1957
■ PDP-1 was DEC’s first computer
■ This began the mini-computer phenomenon that would become so prominent in the third generation

Table 2.3 Example Members of the IBM 700/7000 Series

[Figure: IBM 7094 Configuration]

c. Third Generation: Integrated Circuits

■ 1958 – the invention of the integrated circuit
■ Discrete component
  ■ Single, self-contained transistor
  ■ Manufactured separately, packaged in their own containers, and soldered or wired together onto masonite-like circuit boards
  ■ Manufacturing process was expensive and cumbersome
■ The two most important members of the third generation were the IBM System/360 and the DEC PDP-8

Microelectronics

Integrated Circuits
■ Data storage – provided by memory cells
■ Data processing – provided by gates
■ Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
■ Control – the paths among components can carry control signals
■ A computer consists of gates, memory cells, and interconnections among these elements
■ The gates and memory cells are constructed of simple digital electronic components
■ Exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
■ Many transistors can be produced at the same time on a single wafer of silicon
■ Transistors can be connected with a process of metallization to form circuits

[Figure: Wafer, Chip, and Gate Relationship]

[Figure: Chip Growth]

Moore’s Law
■ 1965; Gordon Moore – co-founder of Intel
■ Observed number of transistors that could be put on a single chip was doubling every year
■ The pace slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since
■ Consequences of Moore’s law:
  ■ The cost of computer logic and memory circuitry has fallen at a dramatic rate
  ■ The electrical path length is shortened, increasing operating speed
  ■ Computer becomes smaller and is more convenient to use in a variety of environments
  ■ Reduction in power and cooling requirements
  ■ Fewer interchip connections
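
The doubling rule can be turned into a rough projection. The sketch below assumes idealized exponential growth with a fixed doubling period (18 months by default, as stated above); the function names are hypothetical and real chips do not follow the curve exactly.

```python
# Idealized Moore's-law projection: the count doubles once per period.
def growth_factor(years, doubling_period_years=1.5):
    """Factor by which the transistor count grows after `years`."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2 ** (years / doubling_period_years)


def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Projected count, starting from `initial_count` transistors."""
    return initial_count * growth_factor(years, doubling_period_years)
```

With an 18-month period, `growth_factor(3)` is 4 (two doublings) and `growth_factor(15)` is 1024 (ten doublings). With the original one-year period, ten years would already give a factor of 1024.
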
Table 2.4 Characteristics of the System/360 Family

Table 2.5 Evolution of the PDP-8

[Figure: DEC PDP-8 Bus Structure]


d. Later Generations
■ LSI: Large Scale Integration
■ VLSI: Very Large Scale Integration
■ ULSI: Ultra Large Scale Integration
■ Semiconductor Memory
■ Microprocessors

Semiconductor Memory
■ In 1970 Fairchild produced the first relatively capacious semiconductor memory
  ■ Chip was about the size of a single core
  ■ Could hold 256 bits of memory
  ■ Non-destructive
  ■ Much faster than core
■ In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
  ■ There has been a continuing and rapid decline in memory cost accompanied by a corresponding increase in physical memory density
  ■ Developments in memory and processor technologies changed the nature of computers in less than a decade
■ Since 1970 semiconductor memory has been through 13 generations
  ■ Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
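
The "four times per generation" rule compounds quickly. The sketch below assumes, for illustration only, that the first generation held 1 Kbit per chip; that starting value is not stated in the slides, and the function name is hypothetical.

```python
# Density after n generations, each 4x denser than the previous one.
# first_generation_bits = 1024 (1 Kbit) is an assumed starting point.
def density_bits(generation, first_generation_bits=1024):
    """Bits per chip in the given generation (1-based)."""
    if generation < 1:
        raise ValueError("generation numbering starts at 1")
    return first_generation_bits * 4 ** (generation - 1)
```

Under that assumption, the 13th generation holds `density_bits(13)` = 2^34 bits, that is 16 Gbit per chip, a factor of 4^12 (about 16.8 million) over the first generation.
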
Microprocessors
■ The density of elements on processor chips continued to rise
  ■ More and more elements were placed on each chip so that fewer and fewer chips were needed to construct a single computer processor
■ 1971 Intel developed 4004
  ■ First chip to contain all of the components of a CPU on a single chip
  ■ Birth of microprocessor
■ 1972 Intel developed 8008
  ■ First 8-bit microprocessor
■ 1974 Intel developed 8080
  ■ First general purpose microprocessor
  ■ Faster, has a richer instruction set, has a large addressing capability

Evolution of Intel Microprocessors
a. 1970s Processors
b. 1980s Processors
c. 1990s Processors
d. Recent Processors

2.2. Designing for Performance
a. Microprocessor Speed
Techniques built into contemporary processors include:
■ Pipelining: processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously
■ Branch prediction: processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next
■ Data flow analysis: processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions
■ Speculative execution: using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
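
The benefit of pipelining can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes one clock cycle and there are no stalls or branch penalties; this model and the function names are illustrative assumptions, not taken from the slides.

```python
# Idealized pipeline timing: one cycle per stage, no stalls.
def unpipelined_cycles(stages, instructions):
    """Each instruction passes through every stage before the next starts."""
    return stages * instructions


def pipelined_cycles(stages, instructions):
    """The first instruction fills the pipe; each later one adds one cycle."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


def pipeline_speedup(stages, instructions):
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(stages, instructions) / pipelined_cycles(stages, instructions)
```

For 100 instructions on a 5-stage pipe this gives 104 cycles instead of 500. The speedup approaches the number of stages as the instruction count grows, which is why branch prediction and speculative execution matter: they keep the pipe full.
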
b. Performance Balance
■ Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components
■ Architectural examples include:
  ■ Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
  ■ Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  ■ Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  ■ Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow

[Figure: Typical I/O Device Data Rates]

c. Improvements in Chip Organization and Architecture
■ Increase hardware speed of processor
  ■ Fundamentally due to shrinking logic gate size
  ■ More gates, packed more tightly, increasing clock rate
  ■ Propagation time for signals reduced
■ Increase size and speed of caches
  ■ Dedicating part of processor chip
  ■ Cache access times drop significantly
■ Change processor organization and architecture
  ■ Increase effective speed of instruction execution
  ■ Parallelism

Problems with Clock Speed and Logic Density
■ Power
  ■ Power density increases with density of logic and clock speed
  ■ Dissipating heat
■ RC delay
  ■ Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them
  ■ Delay increases as RC product increases
  ■ Wire interconnects thinner, increasing resistance
  ■ Wires closer together, increasing capacitance
■ Memory latency
  ■ Memory speeds lag processor speeds

[Figure: Processor Trends]

2.3. Multicore, MICs and GPGPUs
a. Multicore
■ The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
■ Strategy is to use two simpler processors on the chip rather than one more complex processor
■ With two processors larger caches are justified
■ As caches became larger it made performance sense to create two and then three levels of cache on a chip

b. Many Integrated Core (MIC)
■ Leap in performance as well as the challenges in developing software to exploit such a large number of cores
■ The multicore and MIC strategy involves a homogeneous collection of general purpose processors on a single chip

c. Graphics Processing Unit (GPU)
■ Core designed to perform parallel operations on graphics data
■ Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
■ Used as vector processors for a variety of applications that require repetitive computations

2.4. Intel x86 architecture
Overview

Intel x86 Architecture (CISC)
■ Results of decades of design effort on complex instruction set computers (CISCs)
■ Excellent example of CISC design
■ Incorporates the sophisticated design principles once found only on mainframes and supercomputers

ARM (RISC)
■ An alternative approach to processor design is the reduced instruction set computer (RISC)
■ The ARM architecture is used in a wide variety of embedded systems and is one of the most powerful and best designed RISC based systems on the market

■ In terms of market share Intel is ranked as the number one maker of microprocessors for non-embedded systems

x86 Evolution
■ 8080
  ■ First general purpose microprocessor
  ■ 8-bit machine with an 8-bit data path to memory
  ■ Used in the first personal computer (Altair)
■ 8086
  ■ 16-bit machine
  ■ Used an instruction cache, or queue
  ■ First appearance of the x86 architecture
■ 8088
  ■ Used in IBM’s first personal computer
■ 80286
  ■ Enabled addressing a 16-MByte memory instead of just 1 MByte
■ 80386
  ■ Intel’s first 32-bit machine
  ■ First Intel processor to support multitasking
■ 80486
  ■ More sophisticated cache technology and instruction pipelining
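
The jump from 1 MByte to 16 MBytes on the 80286 follows from the width of the address. A minimal sketch, assuming byte-addressable memory and the usual address widths of 20 bits for the 8086 and 8088 and 24 bits for the 80286 (the widths are not given in the slides; the function name is hypothetical):

```python
# Addressable memory as a function of address width (byte-addressable).
MBYTE = 1 << 20  # 2**20 bytes


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for the given address width."""
    if address_bits < 0:
        raise ValueError("address width cannot be negative")
    return 1 << address_bits
```

Here `addressable_bytes(20)` is 1 MByte and `addressable_bytes(24)` is 16 MBytes: each extra address bit doubles the reachable memory, so four more bits give a factor of 16.
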
x86 Evolution - Pentium
■ Pentium
  ■ Superscalar
  ■ Multiple instructions executed in parallel
■ Pentium Pro
  ■ Increased superscalar organization
  ■ Aggressive register renaming
  ■ Branch prediction
  ■ Data flow analysis
  ■ Speculative execution
■ Pentium II
  ■ MMX technology
  ■ Designed specifically to process video, audio, and graphics data
■ Pentium III
  ■ Additional floating-point instructions to support 3D graphics software
■ Pentium 4
  ■ Includes additional floating-point and other enhancements for multimedia

x86 Evolution (continued)
■ Core
  ■ First Intel x86 microprocessor with a dual core, referring to the implementation of two processors on a single chip
■ Core 2
  ■ Extends the architecture to 64 bits
  ■ Recent Core offerings have up to 10 processors per chip
■ Instruction set architecture is backward compatible with earlier versions
■ x86 architecture continues to dominate the processor market outside of embedded systems

2.5. Embedded Systems and the ARM
a. Embedded Systems
General definition:
“A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car.”

Table 2.7 Examples of Embedded Systems and Their Markets

Requirements and Constraints
■ Small to large systems, implying different cost constraints and different needs for optimization and reuse
■ Relaxed to very strict requirements and combinations of different quality requirements with respect to safety, reliability, real-time and flexibility
■ Short to long life times
■ Different environmental conditions in terms of radiation, vibrations, and humidity
■ Different application characteristics resulting in static versus dynamic loads, slow to fast speed, compute versus interface intensive tasks, and/or combinations thereof
■ Different models of computation ranging from discrete event systems to hybrid systems

Figure 2.12 Possible Organization of an Embedded System

b. Acorn RISC Machine (ARM)
■ Family of RISC-based microprocessors and microcontrollers
■ Designs microprocessor and multicore architectures and licenses them to manufacturers
■ Chips are high-speed processors that are known for their small die size and low power requirements
■ Widely used in PDAs and other handheld devices
■ Chips are the processors in iPod and iPhone devices
■ Most widely used embedded processor architecture
■ Most widely used processor architecture of any kind

[Table: ARM Evolution]
DSP = digital signal processor; SoC = system on a chip

ARM Design Categories
■ ARM processors are designed to meet the needs of three system categories:
  ▪ Embedded real-time systems
    ▪ Systems for storage, automotive body and power-train, industrial, and networking applications
  ▪ Application platforms
    ▪ Devices running open operating systems including Linux, Palm OS, Symbian OS, and Windows CE in wireless, consumer entertainment and digital imaging applications
  ▪ Secure applications
    ▪ Smart cards, SIM cards, and payment terminals

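2.6. Performance Assessment
a. Clock Speed and Instructions per Second

[Figure: System Clock]

Instruction execution rate
■ Cycles per instruction (CPI): for a program of $I_c$ instructions in which instruction type $i$ is executed $I_i$ times and takes $CPI_i$ cycles,
  $CPI = \frac{\sum_{i=1}^{n} (CPI_i \times I_i)}{I_c}$
■ The processor time $T$ needed to execute a given program can be expressed as
  $T = I_c \times CPI \times \tau$
  where $\tau = 1/f$ is the cycle time of a processor with clock frequency $f$
■ Expressed as millions of instructions per second (MIPS), referred to as the MIPS rate
  $\text{MIPS rate} = \frac{I_c}{T \times 10^6} = \frac{f}{CPI \times 10^6}$
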
Example:
Consider the execution of a program that results in the execution of 2 million instructions on a 400-MHz processor. The program consists of four major types of instructions. The instruction mix and the CPI for each instruction type are given below based on the result of a program trace experiment
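
The calculation this example calls for can be sketched as follows. The instruction-mix table itself is not reproduced in this text, so the four (CPI, fraction) pairs below are assumed values chosen for illustration; only the 2 million instructions and the 400-MHz clock come from the example. The function names are hypothetical.

```python
# Average CPI, MIPS rate and run time from an instruction mix.
# ASSUMED_MIX is illustrative: (cycles per instruction, fraction of mix).
ASSUMED_MIX = [
    (1, 0.60),  # e.g. arithmetic and logic
    (2, 0.18),  # e.g. load/store with cache hit
    (4, 0.12),  # e.g. branch
    (8, 0.10),  # e.g. memory reference with cache miss
]


def average_cpi(mix):
    """Weighted average CPI; the fractions must sum to 1."""
    total = sum(fraction for _, fraction in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    return sum(cpi * fraction for cpi, fraction in mix)


def mips_rate(clock_hz, cpi):
    """MIPS rate = f / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


def execution_time(instruction_count, cpi, clock_hz):
    """T = I_c * CPI * tau, with tau = 1 / f."""
    return instruction_count * cpi / clock_hz
```

With the assumed mix the average CPI is 0.60 + 0.36 + 0.48 + 0.80 = 2.24, so the 400-MHz processor achieves roughly 178 MIPS and runs the 2 million instructions in about 11.2 ms.
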
Table 2.9 Performance Factors and System Attributes

b. Benchmarks
For example, consider this high-level language statement:

A = B + C /* assume all quantities in main memory */

With a traditional instruction set architecture, referred to as a complex instruction set computer (CISC), this instruction can be compiled into one processor instruction:

add mem(B), mem(C), mem(A)

On a typical RISC machine, the compilation would look something like this:

load mem(B), reg(1);
load mem(C), reg(2);
add reg(1), reg(2), reg(3);
store reg(3), mem(A)
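
One way to read this example is that a raw instruction rate depends on the instruction set as much as on useful work done. The sketch below assumes, purely for illustration, that both machines finish one million such statements in one second; the timing and the function name are hypothetical.

```python
# Same high-level work, different instruction counts, different MIPS.
def mips_rating(instructions_executed, seconds):
    """Millions of instructions executed per second."""
    return instructions_executed / (seconds * 1e6)


STATEMENTS = 1_000_000   # assumed number of A = B + C statements
SECONDS = 1.0            # assumed run time on both machines

cisc_mips = mips_rating(STATEMENTS * 1, SECONDS)  # 1 instruction each
risc_mips = mips_rating(STATEMENTS * 4, SECONDS)  # 4 instructions each
```

Under these assumptions the CISC machine rates 1 MIPS and the RISC machine 4 MIPS, although both complete the same high-level work in the same time. This is a reason to compare machines with benchmark programs rather than instruction rates alone.
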
Desirable Benchmark Characteristics
■ Written in a high-level language, making it portable across different machines
■ Representative of a particular kind of programming style, such as system programming, numerical programming, or commercial programming
■ Can be measured easily
■ Has wide distribution

Standard Performance Evaluation Corporation (SPEC)
■ Benchmark suite
  ■ A collection of programs, defined in a high-level language
  ■ Attempts to provide a representative test of a computer in a particular application or system programming area
■ SPEC
  ■ An industry consortium
  ■ Defines and maintains the best known collection of benchmark suites
  ■ Performance measurements are widely used for comparison and research purposes

SPEC CPU2006
■ Best known SPEC benchmark suite
■ Industry standard suite for processor intensive applications
■ Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
■ Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
■ Suite contains over 3 million lines of code
■ Fifth generation of processor intensive suites from SPEC

c. Amdahl’s Law
■ Gene Amdahl [AMDA67]
■ Deals with the potential speedup of a program using multiple processors compared to a single processor
■ Illustrates the problems facing industry in the development of multi-core machines
  ■ Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing
■ Can be generalized to evaluate and design technical improvement in a computer system
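
Amdahl’s law is commonly written as speedup = 1 / ((1 − f) + f / N), where f is the fraction of execution time that can be parallelized and N is the number of processors. The sketch below uses that standard form; the function and parameter names are hypothetical, and scheduling overhead is assumed to be zero.

```python
# Amdahl's law: speedup of a program on N processors when only a
# fraction f of its single-processor run time can be parallelized.
def amdahl_speedup(parallel_fraction, processors):
    """Speedup relative to one processor, ignoring all overhead."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def speedup_limit(parallel_fraction):
    """Upper bound on speedup as the processor count grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)
```

With f = 0.9, eight processors give a speedup of about 4.7 rather than 8, and no number of processors can exceed a speedup of 10. The serial part of the software, not the core count, sets the ceiling.
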
d. Little’s Law
■ Fundamental and simple relation with broad applications
■ Can be applied to almost any system that is statistically in steady state, and in which there is no leakage
■ Queuing system
  ■ If server is idle an item is served immediately, otherwise an arriving item joins a queue
  ■ There can be a single queue for a single server or for multiple servers, or multiple queues with one being for each of multiple servers
■ Average number of items in a queuing system equals the average rate at which items arrive multiplied by the time that an item spends in the system
  ■ Relationship requires very few assumptions
  ■ Because of its simplicity and generality it is extremely useful
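
The relation above is usually written L = λW, with L the average number of items in the system, λ the average arrival rate and W the average time an item spends in the system. A minimal sketch, with hypothetical function names and example numbers:

```python
# Little's law: L = lambda * W for a system in steady state.
def average_items(arrival_rate, time_in_system):
    """Average number of items in the system (L)."""
    if arrival_rate < 0 or time_in_system < 0:
        raise ValueError("rate and time must be non-negative")
    return arrival_rate * time_in_system


def average_time_in_system(items_in_system, arrival_rate):
    """Average time per item (W), from the same relation rearranged."""
    if arrival_rate <= 0:
        raise ValueError("arrival rate must be positive")
    return items_in_system / arrival_rate
```

If requests arrive at 200 per second and each spends 0.05 s in the system, then on average 10 requests are in the system at any moment. Conversely, observing 10 items in flight at 200 arrivals per second implies 0.05 s per item.
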
Summary
Chapter 2: Computer Evolution and Performance
■ First generation computers
  ■ Vacuum tubes
■ Second generation computers
  ■ Transistors
■ Third generation computers
  ■ Integrated circuits
■ Performance designs
  ■ Microprocessor speed
  ■ Performance balance
  ■ Chip organization and architecture
■ Multi-core
■ MICs
■ GPGPUs
■ Evolution of the Intel x86
■ Embedded systems
■ ARM evolution
■ Performance assessment
  ■ Clock speed and instructions per second
  ■ Benchmarks
  ■ Amdahl’s Law
