Shared Memory Architecture Concepts and Performance Issues: Outline
Lecture 1: Multicore Architecture Concepts
Lecture 2: Parallel programming with threads and tasks
Lecture 3: Shared memory architecture concepts and performance issues
  Memory hierarchy
  Consistency issues and coherence protocols
  Performance issues, e.g. false sharing
  Optimizations for data locality (briefly, more in Lecture 7)
Lecture 4: Design and analysis of parallel algorithms
Lecture 5: Parallel Sorting Algorithms
2012
performance bottleneck
Often implemented with caches to leverage access locality
Cache
Cache = small, fast memory (SRAM) between processor and main memory, today typically on-chip
  contains copies of main memory words
  cache hit = accessed word already in cache, get it fast
  cache miss = not in cache, load from main memory (slower)
Cache line holds a copy of a block of adjacent memory words
  size: from 16 bytes upwards, can differ for cache levels
Cache-based systems profit from
  spatial access locality: access also other data in same cache line
  temporal access locality: access same location multiple times
HW-controlled cache line replacement
  dynamic adaptivity of cache contents
  suitable for applications with high (also dynamic) data locality
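The hit/miss behaviour and the benefit of spatial locality described above can be sketched with a toy direct-mapped cache model. This is only an illustrative sketch: the names `toy_cache`, `toy_cache_access` and `toy_cache_scan`, the 64-byte line size, the 256-line capacity and the 8-byte word size are assumptions made for the example, not parameters taken from the lecture or from any particular processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u   /* bytes per cache line (assumed for this sketch) */
#define NUM_LINES 256u  /* number of lines in the toy cache (assumed) */

typedef struct {
    uint64_t tag[NUM_LINES];    /* which memory block each line holds */
    int valid[NUM_LINES];       /* 0 = line is empty */
    unsigned long hits, misses; /* access statistics */
} toy_cache;

/* Empty the cache and clear the counters. */
static void toy_cache_reset(toy_cache *c) {
    assert(c != NULL);
    for (size_t k = 0; k < NUM_LINES; k++) c->valid[k] = 0;
    c->hits = 0;
    c->misses = 0;
}

/* Simulate one access to byte address addr. Returns 1 on a hit, 0 on a miss. */
static int toy_cache_access(toy_cache *c, uint64_t addr) {
    assert(c != NULL);
    uint64_t block = addr / LINE_SIZE;           /* memory block number */
    size_t index = (size_t)(block % NUM_LINES);  /* direct-mapped slot */
    uint64_t tag = block / NUM_LINES;            /* identifies the block in that slot */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return 1;
    }
    /* Miss: load the whole line, replacing whatever was in the slot. */
    c->valid[index] = 1;
    c->tag[index] = tag;
    c->misses++;
    return 0;
}

/* Touch n 8-byte words starting at address 0, stride_words words apart. */
static void toy_cache_scan(toy_cache *c, size_t n, size_t stride_words) {
    for (size_t k = 0; k < n; k++)
        toy_cache_access(c, (uint64_t)k * stride_words * 8u);
}
```

With these assumed parameters, scanning 1024 consecutive words misses only once per 64-byte line (128 misses, 896 hits), while a stride of 8 words lands on a new line at every access and misses every time.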
Cache (cont.)
Loop Interchange
for (j = 0; j < M; j++)
    for (i = 0; i < N; i++)
        a[i][j] = 0.0;
[Figure: array a with row index i and column index j, showing the old iteration order (a[0][0] down to a[N-1][0]) and the new iteration order after interchange]
Can improve spatial locality of memory accesses (fewer cache misses / page faults)
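A minimal sketch of the interchange for the loop nest above, assuming C's row-major array layout. The function names `clear_column_order` and `clear_row_order` and the array dimensions are chosen for illustration only. Both functions write the same values; only the traversal order differs.

```c
#include <stddef.h>

#define N 512  /* number of rows (assumed size) */
#define M 512  /* number of columns (assumed size) */

static double a[N][M];

/* Old iteration order: the inner loop walks down a column, so consecutive
   accesses are M doubles apart and usually fall in different cache lines. */
static void clear_column_order(void) {
    for (size_t j = 0; j < M; j++)
        for (size_t i = 0; i < N; i++)
            a[i][j] = 0.0;
}

/* New iteration order after loop interchange: the inner loop walks along a
   row, so consecutive accesses are adjacent in memory. */
static void clear_row_order(void) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < M; j++)
            a[i][j] = 0.0;
}
```

The interchange is legal here because the iterations are independent of each other. For loop nests with dependences between iterations, this has to be checked before reordering.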
Write-Update Protocol
Bus-Snooping
Example
[Figure: cache lines / states of Core i and Core j, and memory blocks / states, for accesses to x]
Core i: Load x, with its cache line for x in state I, issues BusRd x.
Processor read / bus read observation: state for other copies remains S.
BusRdX x: must be exclusive owner before writing to local copy, so first invalidate the others and update my copy, by BusRdX x.
Another scenario:
[Figure: cache lines / states of Core i and Core j, and memory blocks / states]
Core i: Store x issues BusRdX x.
Observation of BusRdX x: the cache controller invalidates its own copy.
A subsequent store requires a flush/update of the cache line in memory and in the requesting cache before overwriting.
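The state changes in these scenarios can be traced with a small simulation. The sketch below assumes a three-state invalidation protocol (M, S, I), which is what the states and the BusRd / BusRdX operations in the example suggest. All type and function names (`msi_system`, `msi_load`, `msi_store`, `msi_snoop`) are hypothetical, and the model tracks a single cache line x in two cores with no data values.

```c
#include <assert.h>

typedef enum { MSI_I, MSI_S, MSI_M } msi_state;     /* Invalid, Shared, Modified */
typedef enum { BUS_NONE, BUS_RD, BUS_RDX } bus_op;  /* bus transactions */

#define NUM_CORES 2

typedef struct {
    msi_state line[NUM_CORES]; /* state of the line holding x in each cache */
    int flushes;               /* how often a modified copy had to be flushed */
} msi_system;

/* Snooping side: the cache of 'core' observes a bus transaction from another core. */
static void msi_snoop(msi_system *s, int core, bus_op op) {
    if (s->line[core] == MSI_M)
        s->flushes++;                 /* dirty copy is flushed to memory / requester */
    if (op == BUS_RDX)
        s->line[core] = MSI_I;        /* another core wants exclusive ownership */
    else if (op == BUS_RD && s->line[core] == MSI_M)
        s->line[core] = MSI_S;        /* another core reads: downgrade to shared */
    /* BusRd observed in state S or I: state is unchanged. */
}

/* Processor load of x on 'core'. Returns the bus operation it caused. */
static bus_op msi_load(msi_system *s, int core) {
    assert(core >= 0 && core < NUM_CORES);
    if (s->line[core] != MSI_I)
        return BUS_NONE;              /* read hit, no bus traffic */
    for (int k = 0; k < NUM_CORES; k++)
        if (k != core) msi_snoop(s, k, BUS_RD);
    s->line[core] = MSI_S;
    return BUS_RD;
}

/* Processor store to x on 'core'. Returns the bus operation it caused. */
static bus_op msi_store(msi_system *s, int core) {
    assert(core >= 0 && core < NUM_CORES);
    if (s->line[core] == MSI_M)
        return BUS_NONE;              /* already exclusive owner */
    for (int k = 0; k < NUM_CORES; k++)
        if (k != core) msi_snoop(s, k, BUS_RDX);
    s->line[core] = MSI_M;
    return BUS_RDX;
}
```

Starting with both lines in state I, a load on each core leaves both copies in S. A store on core 0 then issues BusRdX, which invalidates the copy in core 1. A later store on core 1 forces core 0 to flush its modified copy before core 1 overwrites the line.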
MESI-Protocol
Questions?
Further Reading
D. Culler et al. Parallel Computer Architecture, a