3310
“Architecture”
The art and science of designing and
constructing buildings
A style and method of design and construction
Design, the way components fit together
Computer Architecture
The overall design or structure of a computer
system, including the hardware and the software
required to run it, especially the internal structure
of the microprocessor
Computer Architecture
Design aspects:
Instruction set
Cache and memory hierarchy
I/O, storage, disk
Multi-processors, networked-systems
Criteria: performance, cost, end-applications,
complexity
Technology Trends
Since 1970s: Microprocessor-based
Several PCs/Workstations put together can
buy more cycles for the same cost
The Berkeley NOW project
Transistor density: 50% per year
DRAM density: 60% per year
Magnetic disk density: 50% per year
Technology Trends (continued)
Software:
More memory usage
High-level language
Growth rate in CPU speed: 50% per year
Architectural ideas: pipelining, caching, out-of-
order execution, sophisticated compilers
Trends are important:
Product cycle is 4 years!
Also beware of technology thresholds
Cost Trends
Cost depends on various factors:
Time, volume, competition
Cost of IC:
Cost of die + Testing + Packaging
Cost of die: Wafer-cost/Dies-per-wafer
Yield is an important factor
Cost proportional to Die-area^4
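The cost relations above can be sketched in Python. This is a minimal illustration, not a costing model from the course: the function names, parameter names, and sample numbers are assumptions, and dividing by yield is one common way to reflect that only working dies can be sold.

```python
def die_cost(wafer_cost, dies_per_wafer, die_yield):
    """Cost per good die: wafer cost spread over the dies that work.

    die_yield is the fraction of dies on the wafer that are good (assumed input).
    """
    if not 0.0 < die_yield <= 1.0:
        raise ValueError("die_yield must be in (0, 1]")
    return wafer_cost / (dies_per_wafer * die_yield)


def ic_cost(wafer_cost, dies_per_wafer, die_yield, test_cost, package_cost):
    """Cost of IC = cost of die + testing + packaging."""
    return die_cost(wafer_cost, dies_per_wafer, die_yield) + test_cost + package_cost


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example = ic_cost(wafer_cost=1000.0, dies_per_wafer=100, die_yield=0.5,
                  test_cost=2.0, package_cost=3.0)
print(example)  # 25.0
```

Halving the yield doubles the die cost, which is why yield matters so much for large dies.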
Performance Comparison
What performance metric to use?
User cares about response time
Performance is inversely proportional to execution time
What is execution time?
Response time
CPU time: User time + System time
System performance vs. CPU performance
Throughput vs. response-time
We will focus on CPU performance
Which Program's Execution Time?
Real “workload” is ideal
Practical options:
Real programs: compilers, office-suite, scientific...
Kernels: key pieces of programs
Example: Livermore loops
Toy benchmarks: small programs
Examples: Quick-sort, tower of Hanoi...
Synthetic benchmarks: try to capture “average”
frequency of instructions in real programs
Example: Whetstone, Dhrystone
More on Performance Comparisons...
Caveat of benchmarks
They are needed
But manufacturers tend to optimize for benchmarks
Need to be updated periodically
Benchmark suite: collection of programs
E.g. SPEC92
Reporting performance
Reproducibility: program version, compiler, flags
SPEC specifies compiler flags for baseline
comparison
Some Numerics...
                    Computer A   Computer B   Computer C
Program P1 (secs)            1           10           20
Program P2 (secs)         1000          100           20
Total (secs)              1001          110           40
Total (or average) execution time is a
possible metric
Weighted execution time is better: $\sum_i W_i \times T_i$
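A short Python sketch of the weighted metric, using the execution times from the table above. The `mostly_p1` weights are a hypothetical workload mix, introduced only to show that the ranking depends on the weights.

```python
# Execution times in seconds, taken from the table above.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def weighted_time(machine_times, weights):
    """Return sum_i W_i * T_i for one machine."""
    return sum(weights[p] * t for p, t in machine_times.items())


equal = {"P1": 1, "P2": 1}          # equal weights give the plain total
mostly_p1 = {"P1": 0.9, "P2": 0.1}  # hypothetical mix: P1 runs 90% of the time

totals = {m: weighted_time(t, equal) for m, t in times.items()}
mixed = {m: weighted_time(t, mostly_p1) for m, t in times.items()}
print(totals)
print(mixed)
```

With equal weights C has the lowest total, but under the hypothetical 90/10 mix B comes out slightly ahead of C, so the choice of weights can change which machine looks best.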
Normalizing the Performance
      Norm(A)                Norm(B)                Norm(C)
      A      B      C        A      B      C        A      B      C
P1    1      10     20       0.1    1      2        0.05   0.5    1
P2    1      0.1    0.02     10     1      0.2      50     5      1
AM    1      5.05   10.01    5.05   1      1.1      25.03  2.75   1
Normalize such that all programs take the
same time, on some machine
Arithmetic mean predicts performance
Geometric mean?
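To compare the two means, the Python sketch below recomputes the normalized table for each reference machine and reports both the arithmetic mean (AM) and the geometric mean (GM). The helper names are illustrative assumptions; the times are the ones from the earlier table.

```python
import math

# Execution times in seconds for [P1, P2], from the earlier table.
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}


def normalize(ref):
    """Divide every machine's times by the reference machine's times."""
    return {m: [t / r for t, r in zip(ts, times[ref])] for m, ts in times.items()}


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    return math.prod(xs) ** (1.0 / len(xs))


for ref in times:
    norm = normalize(ref)
    am = {m: round(arithmetic_mean(v), 2) for m, v in norm.items()}
    gm = {m: round(geometric_mean(v), 2) for m, v in norm.items()}
    print(f"Norm({ref}): AM={am}  GM={gm}")
```

The arithmetic mean of normalized times always favors the reference machine, so its ranking changes with the choice of reference. The geometric mean ranks the machines the same way for every reference (A and B tie, C is best), but it does not track total execution time.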
Summary
Performance inversely proportional to
execution-time
We are concerned with CPU time of unloaded
machine
Weighted execution time with weights from
real workload is ideal
Else, normalize w.r.t one machine
Amdahl's Law
Amdahl's law:
$\text{Overall speedup} = \dfrac{1}{(1 - F) + \dfrac{F}{\text{Speedup}}}$
$F$ is the fraction of execution time that is enhanced; Speedup is the speedup of that fraction
[Figure: execution time split into $1 - F$ and $F$; after the enhancement the $F$ part shrinks to $F/\text{Speedup}$]
Diminishing returns
Limit on overall speedup
Corollary: make the common case fast
Illustrating Amdahl's Law
Example: implement cache, or faster ALU?
Cache speeds up memory instructions by 10x
Faster ALU speeds up ALU instructions by 3x
Depends on fraction of instructions
Suppose $F_{mem} = 0.2$, $F_{alu} = 0.5$, $F_{other} = 0.3$
Speedup with cache $= \dfrac{1}{0.8 + \frac{0.2}{10}} \approx 1.22$
Speedup with faster ALU $= \dfrac{1}{0.5 + \frac{0.5}{3}} = 1.5$
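The cache-versus-ALU comparison above can be checked with a few lines of Python. This is a minimal sketch; the function name `amdahl_speedup` and its argument names are assumptions made for illustration.

```python
def amdahl_speedup(fraction, speedup):
    """Overall speedup when `fraction` of execution time is sped up by `speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


cache = amdahl_speedup(0.2, 10)  # F_mem = 0.2, memory instructions 10x faster
alu = amdahl_speedup(0.5, 3)     # F_alu = 0.5, ALU instructions 3x faster
print(round(cache, 2), round(alu, 2))  # 1.22 1.5

# Diminishing returns: however large the memory speedup gets, the overall
# speedup stays below 1 / (1 - F_mem) = 1.25.
for s in (10, 100, 1000):
    print(s, round(amdahl_speedup(0.2, s), 3))
```

Even though the cache offers the larger local speedup, the faster ALU wins here because it applies to a larger fraction of the instructions.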
Example continued...
Fixing $F_{alu} = 0.5$, for what value of $F_{mem}$ is
adding a cache better?
$\dfrac{1}{(1 - F_{mem}) + \frac{F_{mem}}{10}} > 1.5$
$F_{mem} > \dfrac{10}{27} \approx 0.37$
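The threshold follows step by step from Amdahl's law, assuming as in the example that the cache makes memory instructions 10x faster and that the faster ALU gives an overall speedup of 1.5:

```latex
\begin{align*}
\frac{1}{(1 - F_{mem}) + \frac{F_{mem}}{10}} &> 1.5 \\
1 - \frac{9}{10} F_{mem} &< \frac{2}{3} \\
\frac{9}{10} F_{mem} &> \frac{1}{3} \\
F_{mem} &> \frac{10}{27} \approx 0.37
\end{align*}
```

So the cache is the better investment only when more than about 37% of the instructions are memory instructions.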
The CPU Performance Equation
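$\text{CPU time} = \text{Num. of clock cycles} \times \text{Clock cycle time}$
OR
$\text{CPU time} = \dfrac{\text{Num. of clock cycles}}{\text{Clock rate}}$
For a program,
$\text{Num. of clock cycles} = \text{Instruction Count} \times \text{Cycles Per Instruction} = IC \times CPI$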
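As a quick illustration of the CPU performance equation, here is a minimal Python sketch. The function name and the sample program (instruction count, CPI, and clock rate) are hypothetical values chosen to keep the arithmetic simple.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = IC * CPI / clock rate."""
    clock_cycles = instruction_count * cpi
    return clock_cycles / clock_rate_hz


# Hypothetical program: 1e9 instructions, CPI of 2, 500 MHz clock.
t = cpu_time(1_000_000_000, 2.0, 500_000_000)
print(t)  # 4.0
```

Halving the CPI or doubling the clock rate would each halve the CPU time, which is why both the instruction set design and the implementation technology matter.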
[Figure: memory hierarchy; levels get slower, larger, and cheaper from left to right]
Access time: 5ns, 10ns, 100ns, O(10ms)
Capacity: 200B, 512KB, 512MB, O(10-100GB)
Modifying the CPU Performance Equation
Caches involve hits and misses
Cache miss ==> memory stalls
$\text{CPU time} = (\text{CPU clock cycles} + \text{Memory stall cycles}) \times \text{Clock cycle time}$
$\text{Memory stall cycles} = \text{Num. misses} \times \text{Miss penalty}$
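The modified equation can be sketched in Python as follows. The function names and all sample numbers (cycle counts, miss count, miss penalty, cycle time) are hypothetical and serve only to show how memory stalls add to CPU time.

```python
def memory_stall_cycles(num_misses, miss_penalty_cycles):
    """Memory stall cycles = number of misses * miss penalty (in cycles)."""
    return num_misses * miss_penalty_cycles


def cpu_time_with_stalls(cpu_cycles, num_misses, miss_penalty_cycles, cycle_time_s):
    """CPU time = (CPU clock cycles + memory stall cycles) * clock cycle time."""
    stalls = memory_stall_cycles(num_misses, miss_penalty_cycles)
    return (cpu_cycles + stalls) * cycle_time_s


# Hypothetical run: 2e9 CPU cycles, 1e7 cache misses, 50-cycle miss penalty,
# 2 ns clock cycle.
stalls = memory_stall_cycles(10_000_000, 50)
total = cpu_time_with_stalls(2_000_000_000, 10_000_000, 50, 2e-9)
ideal = cpu_time_with_stalls(2_000_000_000, 0, 50, 2e-9)
print(stalls, total, ideal)
```

In this made-up case the stalls add 5e8 cycles, so the run takes about 5 s instead of the roughly 4 s it would take with a perfect cache.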