Computer architecture refers to the design and structure of computer systems, focusing on hardware and software integration, including aspects like instruction sets and memory hierarchy. Performance metrics are crucial, with execution time being inversely proportional to performance, and Amdahl's Law illustrating the limits of speedup based on system components. Trends in technology and cost significantly influence design choices and overall system performance, necessitating careful consideration of benchmarks and workload normalization.

Computer Architecture


“Architecture”
 The art and science of designing and
constructing buildings
 A style and method of design and construction
 Design, the way components fit together

Computer Architecture
 The overall design or structure of a computer
system, including the hardware and the software
required to run it, especially the internal structure
of the microprocessor
Computer Architecture

Design aspects:
 Instruction set
 Cache and memory hierarchy
 I/O, storage, disk
 Multi-processors, networked-systems

Criteria: performance, cost, end-applications,
complexity
Technology Trends

Since the 1970s: microprocessor-based systems

Several PCs/Workstations put together can
buy more cycles for the same cost
 The Berkeley NOW project

Transistor density: grows about 50% per year

DRAM density: grows about 60% per year

Magnetic disk density: grows about 50% per year
Technology Trends (continued)

Software:
 More memory usage
 High-level language

Growth rate in CPU speed: 50% per year
 Architectural ideas: pipelining, caching, out-of-
order execution, sophisticated compilers

Trends are important:
 Product cycle is 4 years!
 Also beware of technology thresholds
Cost Trends

Cost depends on various factors:
 Time, volume, competition

Cost of IC:
 Cost of die + Testing + Packaging

Cost of die: Wafer-cost/Dies-per-wafer

Yield is an important factor

Cost of die is roughly proportional to (die area)^4
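
The cost relations on this slide can be sketched numerically. This is a minimal sketch: the slide does not say how yield enters the formula, so dividing by the fraction of good dies is an assumption, and all dollar figures are hypothetical.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield=1.0):
    """Cost per good die: wafer cost spread over the dies that work.

    Assumption: yield is applied as the fraction of good dies per wafer.
    """
    return wafer_cost / (dies_per_wafer * die_yield)


def ic_cost(wafer_cost, dies_per_wafer, die_yield, testing_cost, packaging_cost):
    """Cost of IC = cost of die + testing + packaging (as on the slide)."""
    return die_cost(wafer_cost, dies_per_wafer, die_yield) + testing_cost + packaging_cost


# Hypothetical numbers, for illustration only.
example_cost = ic_cost(wafer_cost=5000.0, dies_per_wafer=100, die_yield=0.5,
                       testing_cost=10.0, packaging_cost=15.0)
print(example_cost)
```

Halving the yield doubles the die cost in this model, which is why yield matters so much for large dies.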
Performance Comparison

What performance metric to use?
 User cares about response time
 Performance is inversely proportional to execution time

What is execution time?
 Response time
 CPU time: User time + System time

System performance vs. CPU performance
 Throughput vs. response-time

We will focus on CPU performance
Which Program's Execution Time?

Real “workload” is ideal

Practical options:
 Real programs: compilers, office-suite, scientific...
 Kernels: key pieces of programs
  Example: Livermore loops
 Toy benchmarks: small programs
  Examples: Quick-sort, tower of Hanoi...
 Synthetic benchmarks: try to capture “average” frequency of instructions in real programs
  Examples: Whetstone, Dhrystone
More on Performance Comparisons...

Caveat of benchmarks
 They are needed
 But manufacturers tend to optimize for benchmarks
 Need to be updated periodically

Benchmark suite: collection of programs
 E.g. SPEC92

Reporting performance
 Reproducibility: program version, compiler, flags
 SPEC specifies compiler flags for baseline
comparison
Some Numerics...
                   Computer A  Computer B  Computer C
Program P1 (secs)           1          10          20
Program P2 (secs)        1000         100          20
Total (secs)             1001         110          40


Total (or average) execution time is a
possible metric

Weighted execution time is better: $\sum_i W_i \times T_i$
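
A small sketch of the weighted metric, using the execution times from the table on this slide. The two weight sets are hypothetical and only show how the ranking of machines can change with the workload mix.

```python
# Execution times in seconds, taken from the table on the slide.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def weighted_time(machine_times, weights):
    """Sum of W_i * T_i over all programs i."""
    return sum(weights[p] * t for p, t in machine_times.items())


# Hypothetical weights: equal use of both programs vs. a workload dominated by P1.
equal = {"P1": 0.5, "P2": 0.5}
mostly_p1 = {"P1": 0.99, "P2": 0.01}

for label, weights in (("equal", equal), ("mostly P1", mostly_p1)):
    scores = {m: weighted_time(t, weights) for m, t in times.items()}
    print(label, scores)
```

With equal weights machine C has the lowest weighted time; with the P1-heavy weights C becomes the slowest of the three.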
Normalizing the Performance
Execution times normalized to machine A, B, or C (AM = arithmetic mean)

      Normalized to A      Normalized to B      Normalized to C
      A     B     C        A     B     C        A      B     C
P1    1     10    20       0.1   1     2        0.05   0.5   1
P2    1     0.1   0.02     10    1     0.2      50     5     1
AM    1     5.05  10.01    5.05  1     1.1      25.03  2.75  1


Normalize such that all programs take the
same time, on some machine

Arithmetic mean predicts performance

Geometric mean?
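
The normalized table can be recomputed with a short sketch, which also evaluates the geometric mean raised as a question above. The helper name `normalized_means` is hypothetical. In this data the arithmetic mean of normalized times changes its verdict with the reference machine, while the ratio between two machines' geometric means stays the same. The geometric mean, however, does not track total execution time.

```python
import math

# Execution times in seconds, from the earlier "Some Numerics..." table.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def normalized_means(times, ref):
    """Return {machine: (arithmetic mean, geometric mean)} of times normalized to `ref`."""
    result = {}
    for machine, t in times.items():
        ratios = [t[p] / times[ref][p] for p in t]
        am = sum(ratios) / len(ratios)
        gm = math.prod(ratios) ** (1.0 / len(ratios))
        result[machine] = (am, gm)
    return result


for ref in times:
    print("normalized to", ref, normalized_means(times, ref))
```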
Summary

Performance inversely proportional to
execution-time
 We are concerned with CPU time of unloaded
machine

Weighted execution time with weights from
real workload is ideal

Else, normalize w.r.t one machine
Amdahl's Law

Amdahl's law:
 Diminishing returns
 Limit on overall speedup

$\text{Overall speedup} = \frac{1}{(1 - F) + \frac{F}{\text{Speedup}}}$

[Figure: execution time before the enhancement is split into (1 - F) and F; after the enhancement it is (1 - F) and F/Speedup]

Corollary: make the common case fast
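
A minimal sketch of the formula, where F is the fraction of execution time that is enhanced and Speedup is the speedup of that fraction alone. The function name and the sample values are illustrative only.

```python
def overall_speedup(f, speedup):
    """Amdahl's law: overall speedup when a fraction f is sped up by `speedup`."""
    return 1.0 / ((1.0 - f) + f / speedup)


# Diminishing returns: with F = 0.5 the overall speedup never reaches 2,
# no matter how large the speedup of the enhanced part becomes.
for s in (2, 10, 100, 1_000_000):
    print(s, round(overall_speedup(0.5, s), 4))
```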
Illustrating Amdahl's Law

Example: implement cache, or faster ALU?
 Cache improves performance by 10x
 ALU improves performance by 3x

Depends on fraction of instructions
 Suppose $F_{mem} = 0.2$, $F_{alu} = 0.5$, $F_{other} = 0.3$
$\text{Speedup with cache} = \frac{1}{0.8 + \frac{0.2}{10}} \approx 1.22$
$\text{Speedup with faster ALU} = \frac{1}{0.5 + \frac{0.5}{3}} = 1.5$
Example continued...

Fixing $F_{alu} = 0.5$, for what value of $F_{mem}$ is adding a cache better?
$\frac{1}{(1 - F_{mem}) + \frac{F_{mem}}{10}} > 1.5$

$F_{mem} > \frac{10}{27} \approx 0.37$
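
The two speedups and the break-even fraction can be checked with a short sketch. `break_even_fraction` is a hypothetical helper obtained by solving Amdahl's law for F.

```python
def overall_speedup(f, speedup):
    """Amdahl's law: overall speedup when a fraction f is sped up by `speedup`."""
    return 1.0 / ((1.0 - f) + f / speedup)


def break_even_fraction(target, speedup):
    """Fraction F at which enhancing F by `speedup` yields exactly `target` overall.

    Solves 1 / ((1 - F) + F / speedup) = target for F.
    """
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / speedup)


cache = overall_speedup(0.2, 10)      # F_mem = 0.2, memory accesses 10x faster
alu = overall_speedup(0.5, 3)         # F_alu = 0.5, ALU operations 3x faster
f_mem = break_even_fraction(alu, 10)  # F_mem above which the cache wins

print(round(cache, 2), round(alu, 2), round(f_mem, 2))
```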
The CPU Performance Equation
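$\text{CPU time} = \text{Num. of clock cycles} \times \text{Clock cycle time}$
OR
$\text{CPU time} = \frac{\text{Num. of clock cycles}}{\text{Clock rate}}$
For a program,
$\text{Num. of clock cycles} = \text{Instruction Count} \times \text{Cycles Per Instruction} = IC \times CPI$

Putting these together:
$\text{CPU time} = IC \times CPI \times \text{Cycle time}$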
More on the Equation

This form is convenient
 Involves many relevant parameters

Remembering is easy
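$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Seconds}}{\text{Clock cycle}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Instructions}}{\text{Program}}$

With CPI as the independent variable:
$CPI = \frac{\text{CPU time}}{\text{Clock cycle time} \times IC}$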
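
A minimal sketch of the equation in both directions. The instruction count, CPI, and cycle time below are hypothetical values chosen for easy arithmetic.

```python
def cpu_time(ic, cpi, cycle_time):
    """CPU time (seconds) = instruction count x cycles per instruction x seconds per cycle."""
    return ic * cpi * cycle_time


def cpi_from_time(cpu_time_s, ic, cycle_time):
    """Rearranged form: CPI = CPU time / (clock cycle time x IC)."""
    return cpu_time_s / (cycle_time * ic)


# Hypothetical program: 1e9 instructions, CPI of 1.5, 2 ns cycle (500 MHz clock).
t = cpu_time(ic=1e9, cpi=1.5, cycle_time=2e-9)
print(t, cpi_from_time(t, ic=1e9, cycle_time=2e-9))
```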
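Other Convenient Forms of the Equation

Number of clock cycles can be counted as:
$\text{CPU clock cycles} = \sum_{i=1}^{n} CPI_i \times IC_i$
Hence, $\text{CPU time} = \left( \sum_{i=1}^{n} CPI_i \times IC_i \right) \times \text{Clock cycle time}$

Calculating CPI in terms of $CPI_i$:
$CPI = \frac{\text{CPU time}}{\text{Clock cycle time} \times IC} = \sum_{i=1}^{n} CPI_i \times \frac{IC_i}{IC}$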
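
The per-class form can be sketched as follows. The instruction mix is hypothetical: each tuple is (CPI_i, IC_i) for one instruction class.

```python
def total_cycles(classes):
    """CPU clock cycles = sum over classes of CPI_i * IC_i."""
    return sum(cpi * ic for cpi, ic in classes)


def average_cpi(classes):
    """CPI = sum over classes of CPI_i * (IC_i / IC), with IC the total count."""
    ic_total = sum(ic for _, ic in classes)
    return sum(cpi * ic / ic_total for cpi, ic in classes)


# Hypothetical mix: 600 one-cycle instructions and 400 two-cycle instructions.
mix = [(1, 600), (2, 400)]
print(total_cycles(mix), average_cpi(mix))
```

Both forms agree: the total cycle count divided by the total instruction count equals the weighted average CPI.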
Usefulness of the Equation
 $IC_i$ is easier to measure than $F_i$
 Equivalently, $F_i$ is measured through $IC_i$

Equation includes relevant parameters such
as the cycle time
Measuring the Parameters for the Equation

Clock cycle time:
 Easy for existing architectures
 Needs to be estimated in the design process

Instruction Count:
 Requires a compiler
 And, simulator/interpreter, or instrumentation code

CPI for each instruction type:
 Easy for simple architectures
 Pipelines, caches introduce complications
 Need to simulate and measure average CPI
A Design Example

A design choice for conditional branch instructions:
 Choice 1: condition code is set by a compare instruction, checked by the next (branch) instruction
  20% of instructions are branches, and another 20% are compares
  2 cycles per branch, 1 cycle for all others
  Clock rate is 25% faster
 Choice 2: single instruction for compare and branch

Which choice is better?
Solution for Design Example
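Let $IC_1$ be the instruction count under Choice 1 and $C$ the clock rate under Choice 2.

$\text{CPU time}_1 = \frac{IC_1 \times (0.8 \times 1 + 0.2 \times 2)}{1.25 \times C} = \frac{IC_1 \times 1.2}{C \times 1.25} = 0.96 \times \frac{IC_1}{C}$

$\text{CPU time}_2 = \frac{IC_1 \times (0.6 \times 1 + 0.2 \times 2)}{C} = \frac{IC_1}{C}$

Choice 1 is slightly better: its CPU time is 0.96 times that of Choice 2.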
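
The comparison can be checked with a short sketch. Two normalizations are assumed here for convenience: the instruction count of Choice 1 is set to 1.0 and the clock rate of Choice 2 is set to 1.0. Branches are assumed to still take 2 cycles under Choice 2.

```python
def relative_cpu_time(ic, cpi, clock_rate):
    """CPU time = IC x CPI / clock rate, in normalized units."""
    return ic * cpi / clock_rate


# Choice 1: 20% branches at 2 cycles, 80% other instructions at 1 cycle,
# and a clock rate that is 25% faster.
t1 = relative_cpu_time(ic=1.0, cpi=0.8 * 1 + 0.2 * 2, clock_rate=1.25)

# Choice 2: the compares (20% of Choice 1's instructions) disappear, so
# IC drops to 0.8 and branches make up 0.2 / 0.8 = 25% of what remains.
ic2 = 0.8
cpi2 = (0.6 * 1 + 0.2 * 2) / ic2
t2 = relative_cpu_time(ic=ic2, cpi=cpi2, clock_rate=1.0)

print(round(t1, 2), round(t2, 2))
```

Choice 2 executes fewer instructions, but the faster clock of Choice 1 more than makes up for the extra compares.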
Detailed Example: Using Caches

Rule of thumb in hardware design:
 Smaller is faster
  Signal propagation delay is lower
  More power per memory cell

Observation w.r.t. software:
 Locality of reference
 Spatial as well as temporal
The Memory Hierarchy

             Registers (CPU)  Cache   Memory  I/O Devices
Access time  5ns              10ns    100ns   O(10ms)
Size         200B             512KB   512MB   O(10-100GB)

Moving away from the CPU, each level is slower, larger, and cheaper.
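Modifying the CPU Performance Equation

Caches involve hits and misses

Cache miss ==> memory stalls
$\text{CPU time} = (\text{CPU clock cycles} + \text{Memory stall cycles}) \times \text{Clock cycle time}$
$\text{Memory stall cycles} = \text{Num. misses} \times \text{Miss penalty}$
$= IC \times \text{Misses per instruction} \times \text{Miss penalty}$
$= IC \times \text{Mem. refs. per instruction} \times \text{Miss rate} \times \text{Miss penalty}$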

Equation in the final form is useful: parameters
can be measured
Some Numerics...
Fraction of memory access instructions = 0.4
CPI for memory instructions (hits) = 2
CPI for other instructions = 1
Choice 1: 0.04 miss rate, 25 cycle penalty
Choice 2: 0.02 miss rate, 50 cycle penalty

Which is a better choice?

What is the overall average CPI?
$CPI_{avg} = 0.6 \times 1 + 0.4 \times 2 + (1 + 0.4) \times 0.02 \times 50 = 2.8$
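
Both choices can be evaluated with a short sketch. The factor (1 + 0.4) is read here as memory references per instruction, assuming one instruction fetch plus 0.4 data accesses. That reading is an assumption about the slide. Parameter names are illustrative.

```python
def average_cpi(mem_frac, cpi_mem_hit, cpi_other, refs_per_instr, miss_rate, miss_penalty):
    """Average CPI = CPI with all hits + memory stall cycles per instruction."""
    base = (1.0 - mem_frac) * cpi_other + mem_frac * cpi_mem_hit
    stalls = refs_per_instr * miss_rate * miss_penalty
    return base + stalls


# Assumption: 1 instruction fetch + 0.4 data references per instruction.
refs = 1 + 0.4

choice1 = average_cpi(0.4, 2, 1, refs, miss_rate=0.04, miss_penalty=25)
choice2 = average_cpi(0.4, 2, 1, refs, miss_rate=0.02, miss_penalty=50)
print(round(choice1, 2), round(choice2, 2))
```

Under these numbers the two choices tie, because miss rate times miss penalty is 1 cycle per memory reference in both cases.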
