Computer Performance
Execution time
Time can be defined in different ways, depending on what
we are measuring:
Response time : Total time to complete a task,
including time spent executing on the CPU, accessing
disk and memory, waiting for I/O and other processes,
and operating system overhead.
CPU execution time : Total time a CPU spends
computing on a given task (excludes time for I/O or
running other programs). This is also referred to as
simply CPU time.
Advanced Computer Architecture
Clock
Computer designers use clock ticks to measure how fast
the hardware can perform basic functions.
Clock rate (frequency) = cycles per second.
Measured in Hertz (1 Hz = 1 cycle/s).
Clock period is the time between ticks of the clock and is
measured in seconds per cycle.
Period = 1/frequency
Example: A 200 MHz (megahertz) clock has a clock period
of 5 nanoseconds (1 / (200 X 10^6 Hz) = 5 X 10^-9 s).
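
The period/frequency relation above can be checked with a small sketch. The function name `clock_period_seconds` is a hypothetical helper chosen for illustration, not something the slides define.

```python
def clock_period_seconds(clock_rate_hz: float) -> float:
    """Return the clock period (seconds per cycle) for a clock rate in Hz."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / clock_rate_hz


# The 200 MHz clock from the example above.
period = clock_period_seconds(200e6)
print(f"{period * 1e9:.1f} ns")
```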
Time & Clock Metrics
Determine effect of design change on performance
CPU execution time = CPU clock cycles X clock cycle time
CPU execution time = CPU clock cycles/clock rate
For some program running on machine X,
Performance_X = 1 / Execution time_X
"X is n times faster than Y" means
Performance_X / Performance_Y = n
Problem:
machine A runs a program in 20 seconds
machine B runs the same program in 25 seconds
How much faster is A than B?
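
As a rough sketch, the ratio for this problem follows directly from the definition Performance = 1 / Execution time. The helper name `times_faster` is an assumption made for this illustration.

```python
def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n such that 'X is n times faster than Y'.

    Performance_X / Performance_Y = (1 / time_x) / (1 / time_y) = time_y / time_x.
    """
    return time_y_s / time_x_s


# Machine A: 20 seconds, machine B: 25 seconds.
n = times_faster(20.0, 25.0)
print(f"A is {n:.2f} times faster than B")
```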
How to improve performance?
seconds / program = (cycles / program) x (seconds / cycle)
So, to improve performance (everything else being equal)
you can either
reduce the number of required cycles for a program, or
decrease the clock cycle time or, said another way,
increase the clock rate.
execution time = number of cycles x clock cycle time
Example
Our favorite program runs in 10 seconds on computer A,
which has a 400 MHz clock. We are trying to help a
computer designer build a new machine B, that will run
this program in 6 seconds. The designer can use new (or
perhaps more expensive) technology to substantially
increase the clock rate, but has informed us that this
increase will affect the rest of the CPU design, causing
machine B to require 1.2 times as many clock cycles as
machine A for the same program. What clock rate
should we tell the designer to target?
Example
Let C = number of cycles
Execution time = C X clock cycle time = C/ clock rate
On computer A,
C / 400 MHz = C / (400 X 10^6) = 10 seconds => C = 400 X 10^7 cycles
On computer B, number of cycles = 1.2 X C
What should B's clock rate be so that our favorite program runs
in 6 seconds?
1.2 X C / clock rate = 6 seconds => clock rate = 1.2 X 400 X 10^7 / 6 = 800 X 10^6
I.e. clock rate = 800 MHz
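
The same calculation can be sketched in a few lines. The function and parameter names below are hypothetical; only the numbers come from the example.

```python
def required_clock_rate_hz(time_a_s: float, clock_rate_a_hz: float,
                           cycle_factor: float, target_time_s: float) -> float:
    """Clock rate machine B needs to hit target_time_s.

    cycles_A = time_A x clock_rate_A
    cycles_B = cycle_factor x cycles_A
    clock_rate_B = cycles_B / target_time
    """
    cycles_a = time_a_s * clock_rate_a_hz
    cycles_b = cycle_factor * cycles_a
    return cycles_b / target_time_s


# Machine A: 10 s at 400 MHz; machine B: 1.2x the cycles, target 6 s.
rate_b = required_clock_rate_hz(10.0, 400e6, 1.2, 6.0)
print(f"{rate_b / 1e6:.0f} MHz")
```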
CPU performance equation
An alternative to "number of clock cycles" is "number of
instructions executed" or Instruction Count ( IC ).
Given both the "number of clock cycles" and IC of a
program, the average Clocks Per Instruction ( CPI ) is given
by:
CPI = CPU clock cycles of a program / IC
CPU time = IC x CPI x Clock cycle time = (IC x CPI) / Clock rate
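
A minimal sketch of the CPU performance equation follows. The function names and the input values are illustrative assumptions, not figures from the slides.

```python
def average_cpi(cpu_clock_cycles: float, instruction_count: float) -> float:
    """CPI = CPU clock cycles of a program / IC."""
    return cpu_clock_cycles / instruction_count


def cpu_time_seconds(instruction_count: float, cpi: float,
                     clock_rate_hz: float) -> float:
    """CPU time = IC x CPI / clock rate (equivalently IC x CPI x cycle time)."""
    return instruction_count * cpi / clock_rate_hz


# Illustrative values only: 1 million instructions, 2.5 million cycles, 500 MHz.
ic = 1_000_000
cycles = 2_500_000
cpi = average_cpi(cycles, ic)
t = cpu_time_seconds(ic, cpi, 500e6)
print(f"CPI = {cpi}, CPU time = {t * 1e3:.1f} ms")
```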
Example
Suppose we have two implementations of the same
instruction set architecture. Machine A has a clock cycle
time of 1 ns (nanoseconds) and a CPI of 2.0 for some
program, and machine B has a clock cycle time of 2 ns and
a CPI of 1.2 for the same program.
Which machine is faster for this program, and how much
faster is it?
Solution
We know that each machine executes the same number of
instructions for the same program; let’s call this number I
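CPU clock cycles_A = I x 2.0
CPU clock cycles_B = I x 1.2
CPU time_A = CPU clock cycles_A x Clock cycle time_A
= I x 2.0 x 1 ns = 2 x I ns
CPU time_B = CPU clock cycles_B x Clock cycle time_B
= I x 1.2 x 2 ns = 2.4 x I ns
Machine A is faster. How much?
Performance_A / Performance_B = CPU time_B / CPU time_A = (2.4 x I ns) / (2 x I ns) = 1.2
Machine A is 1.2 times faster than machine B.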
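
Because both machines execute the same instruction count, it cancels in the ratio, so the comparison can be sketched using time per instruction alone. The helper name `time_per_instruction_ns` is an assumption for this illustration.

```python
def time_per_instruction_ns(cpi: float, cycle_time_ns: float) -> float:
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ns


# Machine A: CPI 2.0, 1 ns cycle. Machine B: CPI 1.2, 2 ns cycle.
t_a = time_per_instruction_ns(2.0, 1.0)
t_b = time_per_instruction_ns(1.2, 2.0)

# The shared instruction count I cancels: time_B / time_A.
ratio = t_b / t_a
faster = "A" if t_a < t_b else "B"
print(f"Machine {faster} is faster; ratio = {ratio:.2f}")
```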
Example
A compiler designer is trying to decide between two code
sequences for a particular machine. Based on the hardware
implementation, there are three different classes of
instructions: Class A, Class B, and Class C, and they require
one, two, and three cycles (respectively).
The first code sequence has 5 instructions: 2 of A, 1 of B,
and 2 of C
The second sequence has 6 instructions: 4 of A, 1 of B, and
1 of C.
Which sequence will be faster? How much?
What is the CPI for each sequence?
Solution
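Sequence 1: 2 x 1 + 1 x 2 + 2 x 3 = 10 cycles, so CPI = 10 / 5 = 2.0
Sequence 2: 4 x 1 + 1 x 2 + 1 x 3 = 9 cycles, so CPI = 9 / 6 = 1.5
Sequence 2 is faster even though it executes more instructions:
10 / 9 = 1.11, i.e. sequence 2 is about 11% faster than sequence 1.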
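
The cycle counts and CPIs for the two code sequences can be reproduced with a short sketch. The dictionary layout and function names are assumptions for illustration; the class costs and instruction mixes are those given in the problem statement.

```python
# Cycles required by each instruction class, as stated in the problem.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(mix: dict) -> int:
    """Sum of (instruction count x cycles) over all classes in the mix."""
    return sum(CYCLES_PER_CLASS[cls] * count for cls, count in mix.items())


def average_cpi(mix: dict) -> float:
    """Average CPI = total cycles / total instructions."""
    return total_cycles(mix) / sum(mix.values())


seq1 = {"A": 2, "B": 1, "C": 2}  # 5 instructions
seq2 = {"A": 4, "B": 1, "C": 1}  # 6 instructions

for name, mix in (("Sequence 1", seq1), ("Sequence 2", seq2)):
    print(f"{name}: {total_cycles(mix)} cycles, CPI = {average_cpi(mix):.2f}")

# With the same clock, execution time is proportional to the cycle count.
print(f"Sequence 2 speedup: {total_cycles(seq1) / total_cycles(seq2):.2f}")
```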
Remember
A given program will require
– some number of instructions (machine instructions)
– some number of cycles
– some number of seconds
We have a vocabulary that relates these quantities:
– cycle time (seconds per cycle)
– clock rate (cycles per second)
– CPI (cycles per instruction)
CPU Performance
For a given instruction set architecture, increases in CPU
performance can come from three sources:
1. Lowering the instruction count, or generating instructions with a
lower average CPI
2. Lowering the CPI
3. Increasing the clock rate
Does doubling the clock rate
double the performance?
Amdahl’s law
The performance improvement to be gained from using
some faster mode of execution is limited by the fraction of
the time the faster mode can be used.
This implies that the time consumed by events whose
performance is not improved limits the effect of any
improvement.
Lowest performer restricts all others.
Execution Time After Improvement = Execution Time
Unaffected + ( Execution Time Affected / Amount of
Improvement )
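
The formula above translates directly into code. This is a minimal sketch; the function name and the input values are illustrative assumptions, not figures from the slides.

```python
def time_after_improvement(time_unaffected: float, time_affected: float,
                           improvement: float) -> float:
    """Execution time after improvement, per Amdahl's law.

    new time = unaffected time + (affected time / amount of improvement)
    """
    if improvement <= 0:
        raise ValueError("improvement factor must be positive")
    return time_unaffected + time_affected / improvement


# Illustrative values only: 30 s unaffected, 70 s affected, sped up 7x.
old_time = 30.0 + 70.0
new_time = time_after_improvement(30.0, 70.0, 7.0)
print(f"new time = {new_time:.1f} s, overall speedup = {old_time / new_time:.2f}")
```

Note that as the improvement factor grows, the new time approaches the unaffected time but never drops below it.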
Example
Trip from point A to point B in two parts: A to C, then C to B.
The A-C part always takes 20; the C-B part takes 50, 20, 4, 1.7, or 0.3
as it is made faster.

A-C Trip   C-B Trip   Total Time   C-B Speedup   Overall Speedup
20         50         70           1             1
20         20         40           2.5           1.75
20         4          24           12.5          2.9
20         1.7        21.7         29.4          3.2
20         0.3        20.3         166.67        3.4
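
The table can be regenerated with a short sketch, which makes the diminishing returns easy to see: the overall speedup can never exceed 70 / 20 = 3.5, however fast the C-B part becomes. The variable and function names are assumptions for illustration.

```python
A_TO_C = 20.0                            # fixed part of the trip
C_TO_B_TIMES = [50.0, 20.0, 4.0, 1.7, 0.3]  # progressively faster second part

BASELINE_LEG = C_TO_B_TIMES[0]
BASELINE_TOTAL = A_TO_C + BASELINE_LEG


def trip_rows():
    """Return (C-B time, total time, C-B speedup, overall speedup) per row."""
    rows = []
    for leg in C_TO_B_TIMES:
        total = A_TO_C + leg
        rows.append((leg, total, BASELINE_LEG / leg, BASELINE_TOTAL / total))
    return rows


for leg, total, leg_speedup, overall in trip_rows():
    print(f"{leg:>5} {total:>6.1f} {leg_speedup:>8.2f} {overall:>6.2f}")
```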
Example
Example: "Suppose a program runs in 100 seconds on
a machine, with multiply responsible for 80 seconds
of this time. How much do we have to improve the
speed of multiplication if we want the program to run
4 times faster?"
Answer
Execution time after improvement = (100 - 80) seconds +
(80 seconds / n)
Since we want the program to run 4 times faster, the
new execution time should be 100 / 4 = 25 seconds, giving:
25 seconds = 20 seconds + (80 seconds / n)
5 seconds = 80 seconds / n
n = 80 / 5 = 16
So multiplication must become 16 times faster.
What about a 5 times improvement?
The new execution time would have to be 100 / 5 = 20 seconds, which
requires 80 seconds / n = 0.
There is no amount by which we can enhance multiply to
achieve a fivefold increase in performance if multiply
accounts for only 80% of the workload.
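
Solving the Amdahl's law equation for n can be sketched as follows. The function name `required_improvement` and the choice to return `None` when the target is unreachable are assumptions made for this illustration.

```python
from typing import Optional


def required_improvement(total_time: float, affected_time: float,
                         target_speedup: float) -> Optional[float]:
    """Factor n by which the affected part must speed up to reach target_speedup.

    Solves: total_time / target_speedup = unaffected + affected_time / n.
    Returns None when no finite n can reach the target.
    """
    unaffected = total_time - affected_time
    target_time = total_time / target_speedup
    budget = target_time - unaffected  # time left for the improved part
    if budget <= 0:
        return None
    return affected_time / budget


# Program runs in 100 s, multiply accounts for 80 s.
print(required_improvement(100.0, 80.0, 4.0))  # 4x overall target
print(required_improvement(100.0, 80.0, 5.0))  # 5x overall target
```

The second call returns `None` because the 20 seconds of unaffected time already equals the whole 20-second budget.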