Shared Memory Architecture Concepts and Performance Issues: Outline

The document outlines a lecture on shared memory architecture concepts and performance issues. It discusses topics like memory hierarchy, consistency issues and cache coherence protocols. It provides examples of how caches work and consistency is maintained. It also covers optimizations for data locality, different cache coherence protocols (e.g. write-invalidate) and issues like false sharing that can hurt performance.

Uploaded by

Mansouri Abdelkhalek


Shared Memory Architecture Concepts and Performance Issues

TDDD56 Lecture 3
Christoph Kessler
PELAB / IDA, Linköping University, Sweden
2012

Outline

Lecture 1: Multicore Architecture Concepts
Lecture 2: Parallel programming with threads and tasks
Lecture 3: Shared memory architecture concepts and performance issues
  Memory hierarchy
  Consistency issues and coherence protocols
  Performance issues, e.g. false sharing
  Optimizations for data locality (briefly, more in Lecture 7)
Lecture 4: Design and analysis of parallel algorithms
Lecture 5: Parallel Sorting Algorithms

Shared Memory vs. Distributed Memory

Shared Memory Variants

Single shared memory module (UMA) quickly becomes a performance bottleneck

Often implemented with caches to leverage access locality
  As done for single-processor systems, too

Can even be realized on top of a distributed memory system (NUMA, non-uniform memory access)

Cache
Cache = small, fast memory (SRAM) between processor and main memory, today typically on-chip
  contains copies of main memory words
  cache hit = accessed word already in cache, get it fast
  cache miss = not in cache, load from main memory (slower)

Cache line holds a copy of a block of adjacent memory words
  size: from 16 bytes upwards, can differ for cache levels

Cache-based systems profit from
  spatial access locality: access also other data in same cache line
  temporal access locality: access same location multiple times

HW-controlled cache line replacement
  dynamic adaptivity of cache contents
  suitable for applications with high (also dynamic) data locality

Cache (cont.)

Caches: Memory Update Strategies

Optimizing Programs for Improved Access Locality

Example: Loop Interchange

Old iteration order (inner loop walks down a column):
  for (j=0; j<M; j++)
    for (i=0; i<N; i++)
      a[i][j] = 0.0;

New iteration order (inner loop walks along a row):
  for (i=0; i<N; i++)
    for (j=0; j<M; j++)
      a[i][j] = 0.0;

[Figure: N x M array with row index i and column index j, corners a[0][0], a[0][M-1] and a[N-1][0], showing the old (column-wise) and new (row-wise) iteration order.]

Row-wise storage of 2D arrays in C, Java

Can improve spatial locality of memory accesses (fewer cache misses / page faults)

Memory Hierarchy Example

Cache Coherence and Memory Consistency

Cache Coherence Formal Definition

Cache Coherence Protocols

Details: Write-Invalidate Protocol

Write-Invalidate Protocol (cont.)

Write-Update Protocol

Bus-Snooping

Write-back invalidation protocol (MSI-protocol)

MSI-Protocol: State Transitions

Example

[Figure sequence: cores i and j, each with its cache (cache lines / state), attached to a bus and to memory (memory blocks / state) holding block x. Only the labels of the figures survive; the steps below list them in slide order.]

Initial situation: block x in memory.

Core i: Load x
  Bus transaction: BusRd x
  State label shown in the figure: I

Core j: Load x
  Bus transaction: BusRd x
  State: S
  Processor read / bus read observation: state for other copies remains S

Write requires invalidation

Core j: Store x
  Bus transaction: BusRdX x
  State: M
  Must be exclusive owner before writing to local copy: first invalidate the others and update my copy, by BusRdX x

Core i: Load x
  Bus transaction: BusRd x
  State: S

Another scenario:

Core i: Store x
  Bus transaction: BusRdX x
  States shown: M and I
  Observation of BusRdX x: Cache controller invalidates own copy
  Subsequent store requires flush/update of the cache line in memory and requesting cache before overwriting

MSI-Protocol: State Transitions

MESI-Protocol

CC-NUMA: Directory based Protocol for non-bus-based Architectures

Performance Issue: False Sharing

How to Avoid False Sharing?

Shared Memory Consistency Models

Consistency Models: Strict Consistency

Consistency Models: Sequential Consistency

Consistency Models: Weak Consistency

Consistency Models: Weak Consistency in OpenMP

Consistency Models: Weak Consistency in OpenMP (cont.)

Questions?

Further Reading
D. Culler et al.: Parallel Computer Architecture, a Hardware/Software Approach. Morgan Kaufmann, 1998.

J. Hennessy, D. Patterson: Computer Architecture, a Quantitative Approach, Second edition (1996) or later. Morgan Kaufmann.

S. Adve, K. Gharachorloo: Shared memory consistency models: a tutorial. IEEE Computer, 1996.
