William Stallings Computer Organization and Architecture 8 Edition Computer Evolution and Performance
Computer Organization
and Architecture
8th Edition
Chapter 2
Computer Evolution and
Performance
Performance
• Why is some hardware better than others
for different programs?
• What factors of system performance are
hardware related? (e.g., Do we need a
new machine, or a new operating
system?)
• How does the machine's instruction set
affect performance?
What determines performance of a
machine?
Performance evaluation of the airplane?
• Processor frequency?
—Which is better: an Intel Pentium IV at 3.2 GHz or an
AMD at 2.2 GHz? And how much better is the better
one?
• Memory speed? Cache efficiency?
—Is it better to have 8 MB of cache or 16 MB of cache?
Is it better to have one large single-level cache, or a
very large, slower L2 cache paired with a small, very
fast L1 cache?
Time Time Time Time!!
• Time can be defined in different ways, depending
on what we are measuring:
—Response time : Total time to complete a
task, including time spent executing on the
CPU, accessing disk and memory, waiting for
I/O and other processes, and operating
system overhead.
—CPU execution time : Total time a CPU
spends computing on a given task (excludes
time for I/O or running other programs). This
is also referred to as simply CPU time.
—User CPU time : Total time the CPU spends executing
the program's own code.
—System CPU time : Total time the operating system
spends executing tasks on behalf of the program.
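The difference between response time and CPU time can be observed directly. The following is a minimal sketch, assuming Python's standard `time.perf_counter()` for elapsed (response) time and `os.times()` for user and system CPU time; the helper names `measure` and `busy_then_sleep` are made up for illustration.

```python
import os
import time


def measure(task):
    """Run task() and return (response, user_cpu, system_cpu) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (
        wall_end - wall_start,              # response time (elapsed)
        cpu_end.user - cpu_start.user,      # user CPU time
        cpu_end.system - cpu_start.system,  # system CPU time
    )


def busy_then_sleep():
    """Hypothetical task: some computation, then a wait that uses no CPU."""
    sum(i * i for i in range(200_000))  # counted in CPU time
    time.sleep(0.2)                     # counted in response time only


response, user_cpu, system_cpu = measure(busy_then_sleep)
cpu_time = user_cpu + system_cpu  # CPU execution time = user + system
print(f"response={response:.3f}s user={user_cpu:.3f}s system={system_cpu:.3f}s")
```

The sleep stands in for I/O or waiting on other processes: it lengthens the response time but adds almost nothing to the CPU time.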
Performance Metrics
• Computer Designers use clock ticks to
measure how fast the hardware can
perform basic functions
• The clock ticks at a constant rate
—Determines when events take place
—Synchronizes activities
—The discrete time intervals are called clock ticks,
ticks, clock periods, clocks, or cycles
Clock Cycles
• Instead of using seconds to measure execution
time, often we use clock cycles, clock ticks, clock
periods, clocks, or cycles.
—Clock rate (frequency) = cycles per second.
—Measured in Hertz (1 Hz = 1 cycle/s).
• Clock period is the time between ticks of the
clock and is measured in seconds per cycle.
—Period = 1/frequency
• Example: A 200 MHz (megahertz) clock has a
clock period of 1/(200 × 10^6 Hz) = 5 nanoseconds
• Warning: Some people refer to the clock period
as the clock rate; they are not the same thing!
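The relation Period = 1/frequency can be checked with a few lines of code. This is a small sketch; the function name `clock_period_seconds` is chosen here for illustration.

```python
def clock_period_seconds(frequency_hz):
    """Clock period (seconds per cycle) for a clock rate in Hz (cycles per second)."""
    return 1.0 / frequency_hz


period = clock_period_seconds(200e6)  # the 200 MHz clock from the example
print(f"{period * 1e9:.1f} ns")       # prints "5.0 ns"
```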
Relating the Time & Clock Metrics
• Determine effect of design change on
performance
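$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
(X is n times faster than Y)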
Problem:
—machine A runs a program in 20 seconds
—machine B runs the same program in 25
seconds
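The question for this problem is not stated in the text; assuming it asks how many times faster machine A is than machine B, the ratio can be worked out as below. Performance is taken as the reciprocal of execution time, and the function name `relative_performance` is an assumption made for this sketch.

```python
def relative_performance(time_x, time_y):
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so
    performance_x / performance_y = time_y / time_x.
    """
    return time_y / time_x


n = relative_performance(time_x=20.0, time_y=25.0)  # A: 20 s, B: 25 s
print(f"A is {n:.2f} times faster than B")          # prints "A is 1.25 times faster than B"
```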
How to Improve Performance
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
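The relation between seconds, cycles, and seconds per cycle suggests two ways to improve performance: reduce the number of cycles a program needs, or shorten the clock period (raise the clock rate). The sketch below uses hypothetical numbers (10 billion cycles, a 2 GHz clock) that do not come from the text.

```python
def cpu_time_seconds(cycles_per_program, clock_rate_hz):
    """seconds/program = cycles/program * seconds/cycle, with seconds/cycle = 1/clock rate."""
    return cycles_per_program * (1.0 / clock_rate_hz)


# Hypothetical baseline: 10e9 cycles on a 2 GHz clock.
baseline = cpu_time_seconds(10e9, 2e9)
fewer_cycles = cpu_time_seconds(8e9, 2e9)    # same clock, fewer cycles
faster_clock = cpu_time_seconds(10e9, 4e9)   # same cycles, faster clock

print(f"{baseline:.1f} s, {fewer_cycles:.1f} s, {faster_clock:.1f} s")  # prints "5.0 s, 4.0 s, 2.5 s"
```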
[Figure: timeline of instructions (1st, 2nd, 3rd, 4th, 5th, 6th, ...) executing one after another along the time axis]
Assuming that every instruction takes the same amount of time is
incorrect: different instructions take different amounts of time on
different machines.
Why? Hint: remember that these are machine instructions, not lines of C code.
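To see why instruction count alone does not determine cycle count, consider the sketch below. The instruction classes and their cycle costs are invented for illustration and do not describe any real machine.

```python
# Hypothetical cycle cost per instruction class (not from any real machine).
CYCLES_PER_INSTRUCTION = {"add": 1, "multiply": 3, "load": 5}


def total_cycles(instruction_counts):
    """Sum count * cycles over every instruction class in the program."""
    return sum(
        count * CYCLES_PER_INSTRUCTION[kind]
        for kind, count in instruction_counts.items()
    )


# Two hypothetical programs with the same number of instructions (1000 each).
program_a = {"add": 800, "multiply": 100, "load": 100}
program_b = {"add": 400, "multiply": 200, "load": 400}

print(total_cycles(program_a))  # prints 1600
print(total_cycles(program_b))  # prints 3000
```

Both programs execute 1000 instructions, yet under these assumed costs the second needs almost twice as many cycles because of its instruction mix.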
Example
[Figure: trip from A to B via C; the A-C leg takes 20, the C-B leg takes 50, 20, 4, 1.7, or 0.3]

A-C Trip   C-B Trip   Total Time   C-B Speedup   Overall Speedup
20         50         70           1             1
20         20         40           2.5           1.75
20         4          24           12.5          2.9
20         1.7        21.7         29.4          3.2
20         0.3        20.3         166.66        3.4
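The table can be reproduced with a short calculation. This sketch takes the first row (C-B leg = 50) as the baseline; the variable names are chosen here for illustration.

```python
A_TO_C = 20                            # unchanged leg of the trip
C_TO_B_OPTIONS = [50, 20, 4, 1.7, 0.3]  # the C-B leg under each option

baseline_cb = C_TO_B_OPTIONS[0]
baseline_total = A_TO_C + baseline_cb   # 70

rows = []
for cb in C_TO_B_OPTIONS:
    total = A_TO_C + cb
    cb_speedup = baseline_cb / cb               # speedup of the improved leg only
    overall_speedup = baseline_total / total    # speedup of the whole trip
    rows.append((cb, total, cb_speedup, overall_speedup))

for cb, total, cb_speedup, overall_speedup in rows:
    print(f"{A_TO_C:>4} {cb:>5} {total:>6.1f} {cb_speedup:>8.2f} {overall_speedup:>6.2f}")
```

However fast the C-B leg becomes, the overall speedup stays below 70/20 = 3.5, because the A-C leg still takes 20.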
Amdahl’s law
$\text{Exec time}_{\text{new}}$ = execution time after some enhancement
$\text{Exec time}_{\text{old}}$ = execution time before any enhancement
$\text{Fraction}_{\text{enhanced}}$ = fraction of work using the enhancement
$\text{Speedup}_{\text{enhanced}}$ = speedup of enhanced mode
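The formula that ties these four quantities together does not appear in the text. The standard statement of Amdahl's law, written with the same names, is sketched below; the name $\text{Speedup}_{\text{overall}}$ is introduced here for the resulting ratio.

```latex
\[
\text{Exec time}_{\text{new}} = \text{Exec time}_{\text{old}} \times
\left[ \left(1 - \text{Fraction}_{\text{enhanced}}\right)
+ \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]
\]
\[
\text{Speedup}_{\text{overall}}
= \frac{\text{Exec time}_{\text{old}}}{\text{Exec time}_{\text{new}}}
= \frac{1}{\left(1 - \text{Fraction}_{\text{enhanced}}\right)
+ \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
\]
```

The first term in the brackets is the part of the work that cannot use the enhancement; the second is the part that can, shortened by the enhanced-mode speedup.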
Example
• Suppose that we are considering an
enhancement to the processor of a server system
used for web serving. The new CPU is 10 times
faster on computation in the web serving
application than the original processor. Assuming
that the original CPU is busy with computation
40% of the time and is waiting for I/O 60% of
the time, what is the overall speedup gained by
incorporating the enhancement?
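One way to work this example is to apply Amdahl's law with the computation share (40%) as the enhanced fraction and 10 as the enhanced-mode speedup. The function name `amdahl_speedup` is an assumption of this sketch.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Computation is 40% of the time and becomes 10 times faster; I/O (60%) is unchanged.
overall = amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=10.0)
print(f"{overall:.2f}")  # prints "1.56"
```

Although the CPU is 10 times faster, the overall speedup is only about 1.56, because 60% of the time is spent waiting for I/O.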
Amdahl’s law Example
• Consider an operation that takes 20 ns
on a machine with an enhancement and
100 ns on a machine without it. Assume the
enhancement can be used only 30% of
the time.
• What is the overall speedup?
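A possible worked solution is sketched below. It assumes that "30% of the time" refers to the fraction of the original execution time, and it takes the enhanced-mode speedup as 100 ns / 20 ns = 5; the function name is chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


speedup_enhanced = 100.0 / 20.0  # time without / time with the enhancement = 5
overall = amdahl_speedup(fraction_enhanced=0.3, speedup_enhanced=speedup_enhanced)
print(f"{overall:.2f}")  # prints "1.32"
```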
Example
A common transformation required in graphics processors is
square root. Implementations of floating-point (FP) square root
vary significantly in performance, especially among processors
designed for graphics. Suppose FP square root (FPSQR) is
responsible for 20% of the execution time of a critical graphics
benchmark. One proposal is to enhance the FPSQR hardware and
speed up this operation by a factor of 10. The other alternative is
just to try to make all FP instructions in the graphics processor run
faster by a factor of 1.6; FP instructions are responsible for half of
the execution time for the application. The design team believes
that they can make all FP instructions run 1.6 times faster with the
same effort as required for the fast square root. Compare these two
design alternatives.
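The two alternatives can be compared by applying Amdahl's law to each, as in this sketch. The fractions and speedups are taken from the problem statement; the function and variable names are chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Alternative 1: FPSQR is 20% of execution time and becomes 10 times faster.
speedup_fpsqr = amdahl_speedup(fraction_enhanced=0.2, speedup_enhanced=10.0)

# Alternative 2: all FP instructions are 50% of execution time and become 1.6 times faster.
speedup_fp = amdahl_speedup(fraction_enhanced=0.5, speedup_enhanced=1.6)

print(f"FPSQR only: {speedup_fpsqr:.2f}")  # prints "FPSQR only: 1.22"
print(f"All FP:     {speedup_fp:.2f}")     # prints "All FP:     1.23"
```

Under these figures, speeding up all FP instructions gives a slightly better overall result, because the smaller speedup applies to a larger share of the execution time.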