
Shared Memory Architecture Concepts and Performance Issues: Outline

The document outlines a lecture on shared memory architecture concepts and performance issues. It discusses topics like memory hierarchy, consistency issues and cache coherence protocols. It provides examples of how caches work and consistency is maintained. It also covers optimizations for data locality, different cache coherence protocols (e.g. write-invalidate) and issues like false sharing that can hurt performance.
Copyright
© Attribution Non-Commercial (BY-NC)

Outline

Lecture 1: Multicore Architecture Concepts
Lecture 2: Parallel programming with threads and tasks
Lecture 3: Shared memory architecture concepts and performance issues
  Memory hierarchy
  Consistency issues and coherence protocols
  Performance issues, e.g. false sharing
  Optimizations for data locality (briefly, more in Lecture 7)

Shared Memory Architecture Concepts and Performance Issues


TDDD56 Lecture 3
Christoph Kessler
PELAB / IDA, Linköping University, Sweden

Lecture 4: Design and analysis of parallel algorithms
Lecture 5: Parallel Sorting Algorithms

2012

Shared Memory vs. Distributed Memory

Shared Memory Variants


Single shared memory module (UMA) quickly becomes a performance bottleneck
Often implemented with caches to leverage access locality
  As done for single-processor systems, too
Can even be realized on top of distributed memory system (NUMA, non-uniform memory access)

Cache
Cache = small, fast memory (SRAM) between processor and main memory, today typically on-chip
  contains copies of main memory words
  cache hit = accessed word already in cache, get it fast
  cache miss = not in cache, load from main memory (slower)
Cache line holds a copy of a block of adjacent memory words
  size: from 16 bytes upwards, can differ for cache levels
Cache-based systems profit from
  spatial access locality: access also other data in same cache line
  temporal access locality: access same location multiple times
HW-controlled cache line replacement
  dynamic adaptivity of cache contents
  suitable for applications with high (also dynamic) data locality

Cache (cont.)

Optimizing Programs for Improved Access Locality


Example:

Caches: Memory Update Strategies

Loop Interchange

Old iteration order:
for (j=0; j<M; j++)
    for (i=0; i<N; i++)
        a[i][j] = 0.0;

New iteration order:
for (i=0; i<N; i++)
    for (j=0; j<M; j++)
        a[i][j] = 0.0;

Row-wise storage of 2D-arrays in C, Java
[Figure: array a with rows i (a[0][0] ... a[N-1][0]) and columns j (a[0][0] ... a[0][M-1]), showing the old and the new iteration order]

Can improve spatial locality of memory accesses (fewer cache misses / page faults)

Memory Hierarchy Example

Cache Coherence and Memory Consistency


Cache Coherence Formal Definition

Cache Coherence Protocols


Details: Write-Invalidate Protocol

Write-Invalidate Protocol (cont.)


Write-Update Protocol

Bus-Snooping


Write-back invalidation protocol (MSI-protocol)

MSI-Protocol: State Transitions


Example
[Figure, two panels: Core i and Core j, each with cache lines / state, and memory blocks / state containing x. Second panel: Core i: Load x; cache line state I; bus transaction BusRd x.]

[Figure: Core i and Core j, each with cache lines / state, and memory blocks / state containing x. Core j: Load x; cache line state S; bus transaction BusRd x.]
Processor read / bus read observation: state for other copies remains S

Write requires invalidation
[Figure: Core i and Core j, each with cache lines / state, and memory blocks / state containing x. Core j: Store x; cache line state M; bus transaction BusRdX x.]
Must be exclusive owner before writing to local copy: first invalidate the others and update my copy, by BusRdX x

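[Figure: Core i and Core j, each with cache lines / state, and memory blocks containing x. Core j: Store x; cache line state M; bus transaction BusRdX x.]
[Figure: Core i and Core j, each with cache lines / state, and memory blocks containing x. Core i: Load x; cache line state S; bus transaction BusRd x.]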

Another scenario:
[Figure, two panels: Core i and Core j, each with cache lines / state, and memory blocks / state containing x. Labels: Core i: Store x; cache line states M and I; bus transaction BusRdX x.]
Observation of BusRdX x: Cache controller invalidates own copy
Subsequent store requires flush/update of the cache line in memory and requesting cache before overwriting

MSI-Protocol: State Transitions

MESI-Protocol


CC-NUMA: Directory based Protocol for non-bus-based Architectures

Performance Issue: False Sharing


How to Avoid False Sharing?

Shared Memory Consistency Models


Consistency Models: Strict Consistency

Consistency Models: Sequential Consistency


Consistency Models: Weak Consistency

Consistency Models: Weak Consistency in OpenMP


Consistency Models: Weak Consistency in OpenMP (cont.)

Questions?


Further Reading
D. Culler et al.: Parallel Computer Architecture, a Hardware/Software Approach. Morgan Kaufmann, 1998.
J. Hennessy, D. Patterson: Computer Architecture, a Quantitative Approach. Second edition (1996) or later. Morgan Kaufmann.
S. Adve, K. Gharachorloo: Shared memory consistency models: a tutorial. IEEE Computer, 1996.
