
EE204 Computer Architecture: Lecture 31-Cache Performance

This document discusses cache performance and how to measure and improve it. It covers reducing cache miss rates by decreasing the probability of conflicts and adding additional cache levels. Cache performance is affected by memory stall clock cycles due to reads and writes. Read stall cycles depend on read miss rates and penalties while write stall cycles depend on write miss rates, penalties, and buffer stalls. The document provides an example to calculate performance improvements from a perfect cache versus one with misses.

EE204

Computer Architecture

Lecture 31 - Cache Performance
14th Apr, 2011

EE204 L31 Humaira, Spring 11 1


Measuring & Improving Cache
Performance
• Reducing Miss Rate
– Reducing the probability that two different memory
addresses map to the same Cache location
– Adding an Additional Level of Cache
• CPU time
– Clock cycles spent executing instructions
– Clock cycles spent waiting for the memory system
(Memory-Stall Clock Cycles)
– CPU time = (execution cycles + Memory-Stall Clock
Cycles) x clock cycle time
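
The CPU time breakdown above can be sketched in a few lines of Python. The function name and the sample numbers are hypothetical, chosen only for illustration; the sketch multiplies the total clock cycles (execution plus memory stalls) by the clock cycle time.

```python
def cpu_time(exec_cycles, memory_stall_cycles, clock_cycle_time):
    """Return CPU time in seconds: (execution cycles + stall cycles) x cycle time."""
    return (exec_cycles + memory_stall_cycles) * clock_cycle_time

# Hypothetical workload: 2e9 execution cycles, 1e9 stall cycles, 1 ns clock cycle.
total = cpu_time(2e9, 1e9, 1e-9)
stall_share = 1e9 / (2e9 + 1e9)  # fraction of all cycles spent waiting on memory
print(f"CPU time = {total:.2f} s, stall share = {stall_share:.0%}")
```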



Cache Performance
• Memory-Stall Clock cycles
– Clock cycles spent on Cache Misses
= Read-Stall Cycles + Write-Stall Cycles
• Read-Stall Cycles
= Read accesses per program
x Read Miss Rate
x Read Miss Penalty (Clock Cycles)
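
As a rough illustration of the read-stall product, assuming a hypothetical program with 1,000,000 reads, a 3% read miss rate and a 40-cycle read miss penalty (none of these figures come from the lecture):

```python
def read_stall_cycles(reads_per_program, read_miss_rate, read_miss_penalty):
    """Read-stall cycles = reads x read miss rate x read miss penalty (in cycles)."""
    return reads_per_program * read_miss_rate * read_miss_penalty

# Hypothetical inputs, for illustration only.
stalls = read_stall_cycles(1_000_000, 0.03, 40)
print(round(stalls))  # cycles lost to read misses
```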



Cache Performance
• Write-Stall Cycles
• Write-through Scheme
– Write Misses (requires fetching of block)
– Write Buffer Stalls (write buffer is full)
– Write Buffer stalls depend on the timing of writes,
not just their frequency; with sufficient write-buffer
depth they are insignificant and can be ignored



Cache Performance
• Write-Stall Cycles (Write-through scheme)
= (Write accesses per program
x Write Miss Rate
x Write Miss Penalty (Clock Cycles))
+ Write Buffer Stalls
• Write-Back scheme
– Additional stalls arise when a replaced Cache block
must be written back to memory
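
A minimal sketch of the write-through case, assuming write buffer stalls are supplied as a separately measured cycle count (they depend on write timing, so there is no simple closed form). The function name, the parameter names and the numbers are hypothetical.

```python
def write_stall_cycles(writes_per_program, write_miss_rate,
                       write_miss_penalty, write_buffer_stalls=0):
    """Write-stall cycles for a write-through cache.

    The miss term is writes x miss rate x penalty; write_buffer_stalls is an
    assumed, separately measured cycle count and defaults to 0 (deep buffer).
    """
    miss_stalls = writes_per_program * write_miss_rate * write_miss_penalty
    return miss_stalls + write_buffer_stalls

# Hypothetical program: 400,000 writes, 5% write miss rate, 40-cycle penalty.
deep_buffer = write_stall_cycles(400_000, 0.05, 40)
shallow_buffer = write_stall_cycles(400_000, 0.05, 40, write_buffer_stalls=10_000)
print(round(deep_buffer), round(shallow_buffer))
```

The default of zero mirrors the earlier point that a sufficiently deep write buffer makes buffer stalls negligible.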



Cache Performance
• Write-through Cache Scheme
– Read Miss Penalty = Write Miss Penalty, so with
negligible write buffer stalls reads and writes
share one Miss Rate and one Miss Penalty
• Memory-Stall Clock cycles
= Memory accesses per program
x Miss Rate
x Miss Penalty
• Equivalently, Memory-Stall Clock cycles
= Instructions per program
x Misses per Instruction
x Miss Penalty
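
The two forms give the same result because misses per instruction equals memory accesses per instruction times the miss rate. The sketch below checks this numerically; the function names and all input values are hypothetical.

```python
import math

def stalls_from_accesses(memory_accesses, miss_rate, miss_penalty):
    """Memory-stall cycles from total memory accesses."""
    return memory_accesses * miss_rate * miss_penalty

def stalls_from_instructions(instructions, misses_per_instruction, miss_penalty):
    """Memory-stall cycles from instruction count and misses per instruction."""
    return instructions * misses_per_instruction * miss_penalty

# Hypothetical program, for illustration only.
instructions = 1_000_000
accesses_per_instruction = 1.5   # one fetch plus 0.5 data accesses on average
miss_rate = 0.03
miss_penalty = 40

a = stalls_from_accesses(instructions * accesses_per_instruction,
                         miss_rate, miss_penalty)
b = stalls_from_instructions(instructions,
                             accesses_per_instruction * miss_rate, miss_penalty)
print(math.isclose(a, b))
```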



Cache Performance Example
• Instruction Cache Miss Rate = 2%
• Data Cache Miss Rate = 4%
• Machine CPI = 2 without memory stalls
• Miss Penalty = 40 cycles
• 36% of Instructions are Data Access instructions
• How much faster will the machine run with a perfect
Cache that never misses?



Cache Performance Example
• Let I = instruction count
• Instruction Miss cycles = I x 2% x 40 = 0.80I
• Data Miss cycles = I x 36% x 4% x 40 = 0.576I
• Total Memory Stall cycles = 0.80I + 0.576I = 1.376I
• CPI with Memory stalls = 2 + 1.376 = 3.376
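
The arithmetic of this example can be reproduced with a short script. The variable names are illustrative; the input values are the ones given in the example. Stall cycles are expressed per instruction, so the instruction count I drops out. Note that the unrounded data-miss term is 0.576 cycles per instruction.

```python
# Inputs taken from the example.
inst_miss_rate = 0.02        # instruction cache miss rate
data_miss_rate = 0.04        # data cache miss rate
data_access_fraction = 0.36  # share of instructions that access data
base_cpi = 2.0               # CPI without memory stalls
miss_penalty = 40            # cycles per miss

# Stall cycles per instruction.
inst_miss_cycles = inst_miss_rate * miss_penalty
data_miss_cycles = data_access_fraction * data_miss_rate * miss_penalty
stall_cpi = inst_miss_cycles + data_miss_cycles
cpi_with_stalls = base_cpi + stall_cpi

print(f"instruction miss cycles per instruction: {inst_miss_cycles:.3f}")
print(f"data miss cycles per instruction:        {data_miss_cycles:.3f}")
print(f"CPI with memory stalls:                  {cpi_with_stalls:.3f}")
```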



Cache Performance Example
• Performance with perfect Cache/Performance with stalls
= CPU time with stalls/CPU time with perfect Cache
= (I x CPIstall x clock cycle)/(I x CPIperfect x clock cycle)
= CPIstall/CPIperfect
= (2 + 1.376)/2
= 1.69 times faster
• Suppose the processor is made faster: CPI = 1 without
memory stalls
• Performance ratio = (1 + 1.376)/1 = 2.38 times faster
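
A small sketch of the same ratio, assuming the instruction count and clock cycle are identical for both machines so that only the CPI values matter. The function and constant names are hypothetical; the stall cycles per instruction are recomputed from the example's inputs.

```python
def perfect_cache_speedup(base_cpi, stall_cycles_per_instruction):
    """Speedup of a perfect cache over a real one: CPI_stall / CPI_perfect."""
    return (base_cpi + stall_cycles_per_instruction) / base_cpi

# 2% instruction misses plus 36% x 4% data misses, 40-cycle penalty.
STALL_PER_INSTRUCTION = 0.02 * 40 + 0.36 * 0.04 * 40

speedup_cpi2 = perfect_cache_speedup(2.0, STALL_PER_INSTRUCTION)
speedup_cpi1 = perfect_cache_speedup(1.0, STALL_PER_INSTRUCTION)
print(f"base CPI 2: {speedup_cpi2:.2f}x, base CPI 1: {speedup_cpi1:.2f}x")
```

The stall term is the same in both cases; lowering the base CPI only makes it a larger share of the total.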



Cache Performance Example
• Clock Rate is doubled while memory speed is unchanged,
so Miss Penalty = 80 cycles
• Total miss cycles/instruction
= (2% x 80) + 36% x (4% x 80) = 2.75
• CPI = 2 + 2.75 = 4.75
• Performance with fast clock/Performance with slow
clock
= exec. time with slow clock/exec. time with fast
clock
= (I x CPIslow x clock cycle)/(I x CPIfast x clock cycle/2)
= (2 + 1.376)/(4.75 x 1/2)
= 1.42
• The machine with the faster clock is about 1.42 times
faster, not 2.00 times faster
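
The clock-doubling comparison can be checked numerically. This sketch assumes memory latency is fixed in absolute time, so the miss penalty measured in cycles doubles along with the clock rate; the function and variable names are hypothetical. Execution time per instruction is expressed in units of the slow clock cycle.

```python
def cpi_with_stalls(base_cpi, miss_penalty):
    """CPI including stalls for the example's rates:
    2% instruction misses, 4% data misses, 36% data-access instructions."""
    return base_cpi + 0.02 * miss_penalty + 0.36 * 0.04 * miss_penalty

# Time per instruction, measured in slow-clock cycles.
slow_time = cpi_with_stalls(2, 40) * 1.0   # slow clock: cycle time 1
fast_time = cpi_with_stalls(2, 80) * 0.5   # fast clock: cycle time 1/2

speedup = slow_time / fast_time
print(f"speedup from doubling the clock: {speedup:.2f}x")
```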

Cache Performance Example
• Relative Cache penalties increase as the machine becomes
faster
• If Clock rate & CPI both improve, the machine takes a
double hit from memory stalls
– The lower the CPI, the more pronounced the effect of
stall cycles
– A higher CPU clock rate leads to a larger miss
penalty in clock cycles if main memory does not
speed up to match
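
To see both effects together, the sketch below tabulates the fraction of execution time spent in memory stalls for the example's miss rates. It assumes, as in the clock-doubling case, that the miss penalty in cycles scales linearly with the clock rate; the function name and the grid of values are hypothetical.

```python
def stall_fraction(base_cpi, clock_multiplier, base_penalty=40):
    """Fraction of all cycles spent in memory stalls.

    Assumes the miss penalty in cycles grows in proportion to the clock rate
    (fixed memory latency in absolute time).
    """
    penalty = base_penalty * clock_multiplier
    stall = 0.02 * penalty + 0.36 * 0.04 * penalty  # stall cycles per instruction
    return stall / (base_cpi + stall)

for base_cpi in (2, 1):
    for clock_multiplier in (1, 2):
        share = stall_fraction(base_cpi, clock_multiplier)
        print(f"base CPI {base_cpi}, clock x{clock_multiplier}: "
              f"{share:.0%} of cycles are memory stalls")
```

Halving the CPI and doubling the clock each raise the stall share on their own, and applying both raises it further.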

