This document discusses cache performance and how to measure and improve it. It covers reducing the cache miss rate by decreasing the probability of conflicts and reducing the miss penalty by adding an additional cache level. Cache performance is affected by memory-stall clock cycles due to reads and writes. Read-stall cycles depend on the read miss rate and read miss penalty, while write-stall cycles depend on the write miss rate, write miss penalty, and write buffer stalls. The document works through an example comparing a cache with misses against a perfect cache that never misses.
Measuring & Improving Cache Performance
• Improving cache performance
  – Reducing the miss rate: lower the probability that two different memory addresses contend for the same cache location
  – Reducing the miss penalty: add an additional level of cache
• CPU time
  – Clock cycles spent executing instructions
  – Clock cycles spent waiting for the memory system (memory-stall clock cycles)
EE204 L31 Humaira, Spring 11 2
Cache Performance
• Memory-stall clock cycles
  – Clock cycles spent on cache misses
  – Read-stall cycles + write-stall cycles
• Read-stall cycles = Read accesses per program × Read miss rate × Read miss penalty (clock cycles)
Cache Performance
• Write-stall cycles
• Write-through scheme
  – Write misses (the block must be fetched from memory)
  – Write buffer stalls (the write buffer is full when a write occurs)
  – Write buffer stalls depend on the timing of writes; with a sufficiently deep write buffer they are insignificant and can be ignored
Cache Performance
• Write-stall cycles = Write accesses per program × Write miss rate × Write miss penalty (clock cycles), plus any write buffer stalls
• Write-back scheme
  – Additional stalls can occur when a replaced cache block has to be written back to memory
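The read-stall and write-stall products above can be sketched in a few lines of Python. This is only an illustration for a write-through cache: the function names and the workload numbers are hypothetical, and write buffer stalls default to zero on the assumption that the write buffer is deep enough.

```python
def read_stall_cycles(reads, read_miss_rate, read_miss_penalty):
    """Read-stall cycles = reads x read miss rate x read miss penalty."""
    return reads * read_miss_rate * read_miss_penalty


def write_stall_cycles(writes, write_miss_rate, write_miss_penalty,
                       write_buffer_stalls=0.0):
    """Write-stall cycles for a write-through cache.

    write_buffer_stalls defaults to 0, i.e. the write buffer is assumed
    deep enough that buffer stalls can be ignored.
    """
    return writes * write_miss_rate * write_miss_penalty + write_buffer_stalls


def memory_stall_cycles(read_stalls, write_stalls):
    """Memory-stall cycles = read-stall cycles + write-stall cycles."""
    return read_stalls + write_stalls


if __name__ == "__main__":
    # Hypothetical workload, not taken from the slides.
    r = read_stall_cycles(reads=1000, read_miss_rate=0.05, read_miss_penalty=40)
    w = write_stall_cycles(writes=400, write_miss_rate=0.10, write_miss_penalty=40)
    print("read stalls:", round(r))
    print("write stalls:", round(w))
    print("memory stalls:", round(memory_stall_cycles(r, w)))
```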
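Cache Performance
• Write-through cache scheme
  – Read miss penalty = write miss penalty (both are the time to fetch the block from memory)
  – With negligible write buffer stalls, reads and writes can share one miss rate and one miss penalty
• Memory-stall clock cycles = Memory accesses per program × Miss rate × Miss penalty
• Memory-stall clock cycles = Instructions per program × Misses per instruction × Miss penalty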
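Written out as equations, the two combined forms above are the same quantity, assuming a single miss rate and miss penalty for reads and writes and negligible write buffer stalls. The step connecting them is that misses per instruction equal memory accesses per instruction times the miss rate.

```latex
\begin{align*}
\text{Memory-stall cycles}
  &= \text{Read-stall cycles} + \text{Write-stall cycles} \\
  &= \frac{\text{Memory accesses}}{\text{Program}} \times \text{Miss rate} \times \text{Miss penalty} \\
  &= \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{Miss penalty},
\qquad
\frac{\text{Misses}}{\text{Instruction}}
  = \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss rate}
\end{align*}
```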
Cache Performance Example
• Instruction cache miss rate = 2%
• Data cache miss rate = 4%
• Machine CPI = 2 without memory stalls
• Miss penalty = 40 cycles
• 36% of instructions are data access instructions
• How much faster would the machine run with a perfect cache that never misses?
Cache Performance Example (I = instruction count)
• Instruction miss cycles = I × 2% × 40 = 0.80I
• Data miss cycles = I × 36% × 4% × 40 ≈ 0.57I
• Total memory-stall cycles ≈ 1.37I
• CPI with memory stalls = 2 + 1.37 = 3.37
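The arithmetic on this slide can be checked with a short Python sketch. The function name is hypothetical; the input values are the ones given in the example. Note that the unrounded total is 1.376 stall cycles per instruction, which the slide carries as 1.37.

```python
def stall_cycles_per_instruction(instr_miss_rate, data_miss_rate,
                                 data_access_fraction, miss_penalty):
    """Memory-stall cycles per instruction for split instruction/data caches."""
    # Every instruction is fetched, so every instruction can miss in the I-cache.
    instr_stalls = instr_miss_rate * miss_penalty
    # Only the data-access instructions touch the D-cache.
    data_stalls = data_access_fraction * data_miss_rate * miss_penalty
    return instr_stalls + data_stalls


if __name__ == "__main__":
    base_cpi = 2.0  # CPI without memory stalls, from the example
    stalls = stall_cycles_per_instruction(
        instr_miss_rate=0.02,
        data_miss_rate=0.04,
        data_access_fraction=0.36,
        miss_penalty=40,
    )
    print(f"stall cycles per instruction: {stalls:.3f}")
    print(f"CPI with memory stalls: {base_cpi + stalls:.3f}")
```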
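Cache Performance Example
• Performance ratio = CPU time with stalls / CPU time with perfect cache = (I × CPI_stall × clock cycle) / (I × CPI_perfect × clock cycle) = CPI_stall / CPI_perfect = 3.37 / 2 ≈ 1.69, so the machine with a perfect cache is about 1.69 times faster
• Suppose the processor is made faster (base CPI = 1) with the same memory system: CPI with memory stalls = 1 + 1.37 = 2.37
• Performance ratio = 2.37 / 1 = 2.37, so a perfect cache would now make the machine 2.37 times faster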
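To see why lowering the base CPI makes stalls matter more, the sketch below compares the two cases, using the 1.37 stall cycles per instruction from the slides. The function names are hypothetical, and the stall-fraction helper is an added illustration rather than something the slides compute.

```python
def perfect_cache_speedup(base_cpi, stall_cpi):
    """How many times faster a perfect cache would be (same clock, same I)."""
    return (base_cpi + stall_cpi) / base_cpi


def stall_fraction(base_cpi, stall_cpi):
    """Fraction of execution time spent waiting on memory."""
    return stall_cpi / (base_cpi + stall_cpi)


if __name__ == "__main__":
    STALL_CPI = 1.37  # stall cycles per instruction, as used on the slides
    for base_cpi in (2.0, 1.0):
        speedup = perfect_cache_speedup(base_cpi, STALL_CPI)
        fraction = stall_fraction(base_cpi, STALL_CPI)
        print(f"base CPI {base_cpi}: perfect cache is {speedup:.3f}x faster, "
              f"{fraction:.1%} of time spent in memory stalls")
```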
Cache Performance Example
• Clock rate is doubled; main memory speed is unchanged, so the miss penalty doubles to 80 cycles (base CPI = 2)
• Total miss cycles per instruction = (2% × 80) + 36% × (4% × 80) ≈ 2.75
• CPI with memory stalls = 2 + 2.75 = 4.75
• Performance with fast clock / performance with slow clock = execution time with slow clock / execution time with fast clock = (IC × CPI_slow × clock cycle) / (IC × CPI_fast × clock cycle / 2) = 3.37 / (4.75 × 1/2) ≈ 1.42
• The machine with the faster clock is about 1.42 times faster instead of 2 times faster
Cache Performance Example
• Relative cache penalties increase as the machine becomes faster
• If both clock rate and CPI improve, memory stalls hurt performance in two ways
  – The lower the CPI, the more pronounced the effect of stall cycles
  – A higher CPU clock rate leads to a larger miss penalty measured in clock cycles
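The doubled-clock case can be reproduced with the sketch below. It assumes, as the example does, that memory latency stays fixed in absolute time, so the miss penalty in cycles doubles when the clock period halves. The function names are hypothetical, and the code uses unrounded intermediate values.

```python
def cpi_with_stalls(base_cpi, instr_miss_rate, data_miss_rate,
                    data_access_fraction, miss_penalty):
    """Base CPI plus memory-stall cycles per instruction."""
    stalls = (instr_miss_rate * miss_penalty
              + data_access_fraction * data_miss_rate * miss_penalty)
    return base_cpi + stalls


def clock_doubling_speedup(base_cpi, instr_miss_rate, data_miss_rate,
                           data_access_fraction, slow_miss_penalty):
    """Speedup from doubling the clock rate with an unchanged memory system."""
    slow_cpi = cpi_with_stalls(base_cpi, instr_miss_rate, data_miss_rate,
                               data_access_fraction, slow_miss_penalty)
    # Same memory latency in seconds means twice as many (shorter) cycles.
    fast_cpi = cpi_with_stalls(base_cpi, instr_miss_rate, data_miss_rate,
                               data_access_fraction, 2 * slow_miss_penalty)
    slow_time = slow_cpi * 1.0  # time per instruction, slow clock period = 1
    fast_time = fast_cpi * 0.5  # fast clock period is half as long
    return slow_time / fast_time


if __name__ == "__main__":
    speedup = clock_doubling_speedup(
        base_cpi=2.0,
        instr_miss_rate=0.02,
        data_miss_rate=0.04,
        data_access_fraction=0.36,
        slow_miss_penalty=40,
    )
    print(f"speedup from doubling the clock: {speedup:.2f}x (ideal would be 2x)")
```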