Transactional Memory: Companion Slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Concurrent programming is still too hard. Here we explore why this is, and what we can do about it.

Uploaded by

Tran Nguyen

Transactional Memory

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Our Vision for the Future

In this course, we covered:
Best practices
New and clever ideas
And common-sense observations

Art of Multiprocessor Programming

Nevertheless:
Concurrent programming is still too hard
Here we explore why this is, and what we can do about it.

A FIFO Queue
[Figure: a queue with Head and Tail pointers; Enqueue(d) adds d at the tail, Dequeue() => a removes a from the head]

A Concurrent FIFO Queue

Simple Code, easy to prove correct

[Figure: queue a b c d with Head and Tail, protected by a single object lock; Q: Enqueue(d), P: Dequeue() => a]

Contention and sequential bottleneck
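A minimal sketch of the single-object-lock queue described on this slide, assuming Python's standard `threading.Lock`. The class name `CoarseLockedQueue` is hypothetical and not from the slides. Every operation takes the same lock, which is what makes the code easy to reason about and also what creates the sequential bottleneck.

```python
import threading
from collections import deque


class CoarseLockedQueue:
    """FIFO queue guarded by one lock for the whole object (illustrative name)."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()   # the single "object lock"

    def enqueue(self, item):
        with self._lock:                # every enqueuer serializes here
            self._items.append(item)

    def dequeue(self):
        with self._lock:                # ... and so does every dequeuer
            if not self._items:
                raise IndexError("dequeue from empty queue")
            return self._items.popleft()


q = CoarseLockedQueue()
for item in "abc":
    q.enqueue(item)
q.enqueue("d")          # Q: Enqueue(d)
first = q.dequeue()     # P: Dequeue() => a
```

Enqueue and dequeue touch opposite ends of the queue, yet they still exclude each other because they share one lock.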

Fine Grain Locks

Finer Granularity, More Complex Code

[Figure: queue with separately locked Head and Tail; Q: Enqueue(d), P: Dequeue() => a]

Verification nightmare: worry about deadlock, livelock

Fine Grain Locks

Complex boundary cases: empty queue, last item

[Figure: short queue with Head and Tail (items a, b, c, d); Q: Enqueue(b), P: Dequeue() => a]

Worry how to acquire multiple locks

Locking Relies on Conventions

Relation between lock bit and object bits exists only in programmer's mind
Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul):

/*
 * When a locked buffer is visible to the I/O layer
 * BH_Launder is set. This means before unlocking
 * we must clear BH_Launder,mb() on alpha and then
 * clear BH_Lock, so no reader can see BH_Launder set
 * on an unlocked buffer and then risk to deadlock.
 */

2006 Herlihy & Shavit

Lock-Free (JDK6.0)
Even Finer Granularity, Even More Complex Code

[Figure: queue with Head and Tail; Q: Enqueue(d), P: Dequeue() => a]

Worry about starvation, subtle bugs, difficulty of modification

Real Applications
Complex: Move data atomically between structures
[Figure: two queues Q1 and Q2, each with Head and Tail; P: Dequeue(Q1,a) then Enqueue(Q2,a)]

More than twice the worry

Transactional Memory
[HerlihyMoss93]

Promise of Transactional Memory

Great Performance, Simple Code

[Figure: queue with Head and Tail; Q: Enqueue(d), P: Dequeue() => a]

Don't worry about deadlock, livelock, subtle bugs, etc.

Promise of Transactional Memory

Don't worry about which locks need to cover which variables, or when
[Figure: short queue with Head and Tail (items a, b, c, d); Q: Enqueue(d), P: Dequeue() => a]

TM deals with boundary cases under the hood

For Real Applications

Will be easy to modify multiple structures atomically
[Figure: two queues Q1 and Q2, each with Head and Tail; P: Dequeue(Q1,a) then Enqueue(Q2,a)]

Provide Serializability

Using Transactional Memory

enqueue(Q, newnode) {
  Q.tail->next = newnode;
  Q.tail = newnode;
}

Using Transactional Memory

enqueue(Q, newnode) {
  atomic {
    Q.tail->next = newnode;
    Q.tail = newnode;
  }
}

Transactions Will Solve Many of Locks' Problems

No need to think about what needs to be locked, what not, and at what granularity
No worry about deadlocks and livelocks

No need to think about read-sharing

Can compose concurrent objects in a way that is safe and scalable
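A rough sketch of how the `atomic` block and the composition claim might look in code. Python has no transactional memory, so the hypothetical `atomic()` helper below is only a stand-in: one re-entrant global lock. It gives the same "all or nothing" view to other threads, but none of the scalability a real TM aims for. The `Queue`, `Node`, and `move` names are also assumptions made for illustration.

```python
import threading
from contextlib import contextmanager

_global_lock = threading.RLock()   # stand-in for a TM runtime (assumption)


@contextmanager
def atomic():
    """Run the enclosed block as one indivisible step (nestable)."""
    with _global_lock:
        yield


class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class Queue:
    def __init__(self):
        self.head = self.tail = Node(None)   # sentinel node


def enqueue(q, value):
    node = Node(value)
    with atomic():
        q.tail.next = node
        q.tail = node


def dequeue(q):
    with atomic():
        first = q.head.next
        if first is None:
            raise IndexError("dequeue from empty queue")
        q.head = first                       # first becomes the new sentinel
        return first.value


def move(src, dst):
    """Compose two operations; no other thread sees the item in neither queue."""
    with atomic():
        enqueue(dst, dequeue(src))


q1, q2 = Queue(), Queue()
enqueue(q1, "a")
move(q1, q2)            # P: Dequeue(Q1,a) Enqueue(Q2,a)
moved = dequeue(q2)
```

Note that `move` reuses `enqueue` and `dequeue` unchanged. With per-queue locks the caller would have to know about, and correctly order, both locks.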

Hardware Transactional Memory

Exploit cache coherence
Already almost does it: invalidation, consistency checking

Speculative execution
Branch prediction = optimistic synch!

HW Transactional Memory
[Figure: processor caches connected through an interconnect to memory; an active transaction reads a location and its cache entry is tagged T]

Transactional Memory
[Figure: a second active transaction reads; both cache entries are tagged T]

Transactional Memory
[Figure: one transaction is committed, the other is still active; cache entries tagged T]

Transactional Memory
[Figure: one transaction committed; the active transaction writes; cache entries tagged T and D]

Rewind
[Figure: labels aborted, active, write; cache entries tagged T, T, D]

Transaction Commit
At commit point
If no cache conflicts, we win.

Mark transactional entries

Read-only: valid
Modified: dirty (eventually written back)

That's all, folks!

Except for a few details

Not all Skittles and Beer

Limits to
Transactional cache size
Scheduling quantum

Transaction cannot commit if it is
Too big
Too slow
Actual limits platform-dependent

HTM Strengths & Weaknesses

Ideal for lock-free data structures
Practical proposals have limits on
Transaction size and length
Bounded HW resources
Guarantees vs best-effort

On fail
Diagnostics essential
Retry in software?

Software Transactional Memory

[ShavitTouitou94]
The semantics of hardware transactions, today

Tomorrow: serve as a standard interface to hardware

Allows hardware features to be extended when they arrive
Still, we need to have reasonable performance (today's focus)

The Brief History of STM

2007-9: New lock-based STMs from IBM, Intel, Sun, Microsoft
[Figure: timeline of STM designs labeled Lock-free, Obstruction-free, Lock-based]

As Good As Fine Grained Locking

Postulate (i.e. take it or leave it): If we could implement fine-grained locking with the same simplicity as coarse-grained locking, we would never think of building a transactional memory.
Implication: Let's try to provide STMs that get as close as possible to hand-crafted fine-grained locking.

Transactional Consistency
Memory Transactions are collections of reads and writes executed atomically
Transactions should maintain internal and external consistency
External: with respect to the interleavings of other transactions.
Internal: the transaction itself should operate on a consistent state.

External Consistency
Invariant: x = 2y
[Figure: application memory; X changes from 4 to 8 and Y from 2 to 4]

Transaction A: Write x, Write y
Transaction B: Read x, Read y, Compute z = 1/(x-y) = 1/4

Locking STM Design Choices

[Figure: application memory mapped to an array of versioned write-locks (V#)]

PS = Lock per Stripe (separate array of locks)

PO = Lock per Object (embedded in object)
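The two lock placements can be sketched as follows. This is an illustration under assumptions, not the layout of any particular STM: the table size `NUM_STRIPES`, the modulo hash in `stripe_for`, and the `[version, locked]` list used as a lock word are all made up for the example.

```python
NUM_STRIPES = 8   # assumed table size; chosen small so collisions are visible

# PS: a separate array of versioned write-locks, one per stripe of memory.
# Each lock word is modeled as [version, locked].
stripe_locks = [[0, False] for _ in range(NUM_STRIPES)]


def stripe_for(address):
    """Map a memory address to its lock word (here: plain modulo hashing)."""
    return stripe_locks[address % NUM_STRIPES]


# PO: the versioned write-lock is a field embedded in the object itself.
class Node:
    def __init__(self, value):
        self.value = value
        self.lock = [0, False]   # [version, locked]


# Two different addresses can share one stripe lock (false conflicts) ...
shares_stripe = stripe_for(3) is stripe_for(3 + NUM_STRIPES)

# ... while per-object locks are never shared between objects.
a, b = Node(1), Node(2)
separate_locks = a.lock is not b.lock
```

The trade-off this hints at: PS needs no change to object layout but can make unrelated locations conflict, while PO avoids that at the cost of a lock field in every object.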

Encounter Order Locking (Undo Log)

[Figure: memory locations X and Y beside an array of versioned locks (V# and lock bit); locks are taken for X and Y and released with V#+1]
Blue code does not change memory, red does

1. To Read: load lock + location
2. Check unlocked, add to Read-Set
3. To Write: lock location, store value
4. Add old value to undo-set
5. Validate read-set v#s unchanged
6. Release each lock with v#+1

Quick read of values freshly written by the reading transaction
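The six steps above can be sketched as a small single-threaded model. It is a simplification under assumptions: `Cell`, `EncounterTx`, and `Abort` are hypothetical names, the lock is modeled as an `owner` field, and interleavings are driven by hand instead of by real threads.

```python
class Abort(Exception):
    """Raised when a transaction must roll back."""


class Cell:
    def __init__(self, value):
        self.value = value
        self.version = 0     # V#
        self.owner = None    # transaction holding the write lock, if any


class EncounterTx:
    def __init__(self):
        self.read_set = {}   # cell -> version seen at first read
        self.undo_log = []   # (cell, old value), in write order

    def read(self, cell):
        if cell.owner is self:               # quick read of our own fresh write
            return cell.value
        if cell.owner is not None:           # locked by another transaction
            self.abort()
        if self.read_set.setdefault(cell, cell.version) != cell.version:
            self.abort()
        return cell.value

    def write(self, cell, value):
        if cell.owner is None:
            cell.owner = self                # lock when first encountered
        elif cell.owner is not self:
            self.abort()
        self.undo_log.append((cell, cell.value))
        cell.value = value                   # memory is updated in place

    def commit(self):
        for cell, seen in self.read_set.items():
            if cell.version != seen or cell.owner not in (None, self):
                self.abort()
        self._release(bump=True)

    def abort(self):
        for cell, old in reversed(self.undo_log):
            cell.value = old                 # undo writes in reverse order
        self._release(bump=False)
        raise Abort()

    def _release(self, bump):
        for cell, _ in self.undo_log:
            if cell.owner is self:
                if bump:
                    cell.version += 1        # release with v#+1 on commit
                cell.owner = None


x, y = Cell(4), Cell(2)
t = EncounterTx()
t.write(x, 8)
t.write(y, 4)
in_place = (x.value, y.value)   # new values are already in memory
t.commit()

reader, writer = EncounterTx(), EncounterTx()
reader.read(x)                  # records x at version 1
writer.write(x, 16)
writer.commit()                 # x moves to version 2
try:
    reader.commit()
    reader_committed = True
except Abort:
    reader_committed = False
```

Because writes go straight to memory, an abort has to restore old values from the undo log before releasing the locks.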

Commit Time Locking (Write Log)

[Figure: memory locations X and Y beside an array of versioned locks (V# and lock bit); locks are taken only at commit and released with V#+1]

1. To Read: load lock + location
2. Location in write-set? (Bloom Filter)
3. Check unlocked, add to Read-Set
4. To Write: add value to write set
5. Acquire Locks
6. Validate read/write v#s unchanged
7. Release each lock with v#+1

Hold locks for very short duration
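A comparable sketch for the commit-time variant, again a hand-driven single-threaded model with hypothetical names (`Cell`, `CommitTimeTx`, `Abort`). Two simplifications are assumed: a plain dictionary lookup stands in for the Bloom filter, and only read-set versions are validated.

```python
class Abort(Exception):
    """Raised when a transaction must be retried."""


class Cell:
    def __init__(self, value):
        self.value = value
        self.version = 0      # V#
        self.locked = False   # lock bit


class CommitTimeTx:
    def __init__(self):
        self.read_set = {}    # cell -> version seen at first read
        self.write_set = {}   # cell -> pending value (the write log)

    def read(self, cell):
        if cell in self.write_set:           # location in write-set?
            return self.write_set[cell]
        if cell.locked:
            raise Abort()
        if self.read_set.setdefault(cell, cell.version) != cell.version:
            raise Abort()
        return cell.value

    def write(self, cell, value):
        self.write_set[cell] = value         # memory untouched until commit

    def commit(self):
        acquired = []
        try:
            for cell in self.write_set:      # acquire locks
                if cell.locked:
                    raise Abort()
                cell.locked = True
                acquired.append(cell)
            for cell, seen in self.read_set.items():   # validate
                foreign_lock = cell.locked and cell not in self.write_set
                if foreign_lock or cell.version != seen:
                    raise Abort()
            for cell, value in self.write_set.items():
                cell.value = value           # write back
                cell.version += 1            # new version, v#+1
        finally:
            for cell in acquired:            # locks held only during commit
                cell.locked = False


x, y = Cell(4), Cell(2)
t = CommitTimeTx()
t.write(x, 8)
t.write(y, 4)
before = (x.value, y.value)   # memory still holds the old values
own = t.read(x)               # served from the write log
t.commit()
after = (x.value, y.value)
```

An abort here needs no undo: the pending values are simply dropped along with the write log.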

COM vs. ENC High Load

Red-Black Tree 20% Delete 20% Update 60% Lookup
[Figure: plot with curves labeled Hand, COM, ENC, Lock]

COM vs. ENC Low Load

Red-Black Tree 5% Delete 5% Update 90% Lookup
[Figure: plot with curves labeled Hand, COM, ENC, Lock]

Problem: Internal Inconsistency

A Zombie is a currently active transaction that is destined to abort because it saw an inconsistent state.
If Zombies see inconsistent states, errors can occur, and the fact that the transaction will eventually abort does not save us.

Internal Inconsistency
Invariant: x = 2y
[Figure: application memory; X changes from 4 to 8 and Y from 2 to 4]

Transaction B: Read x = 4
Transaction A: Write x, Write y
Transaction B: Read y = 4 {trans is zombie}, Compute z = 1/(x-y): DIV by 0 ERROR
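The interleaving on this slide can be replayed deterministically. The sketch below uses a plain dictionary as "application memory" and runs the three steps in the order shown; the variable names are assumptions made for illustration.

```python
memory = {"x": 4, "y": 2}    # invariant: x == 2 * y

# Transaction B starts and reads x.
b_x = memory["x"]            # old value of x

# Transaction A runs to completion in between, preserving the invariant.
memory["x"], memory["y"] = 8, 4

# Transaction B continues: stale x, fresh y. B is now a zombie.
b_y = memory["y"]
try:
    z = 1 / (b_x - b_y)
    outcome = "ok"
except ZeroDivisionError:
    outcome = "zombie divided by zero before it could abort"
```

Both states of memory satisfy x = 2y. The error comes only from B mixing a value from the old state with a value from the new one.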

Managed Environment Approaches

1. Design STMs that allow internal inconsistency.
2. To detect zombies, introduce validation into user code at fixed intervals or loops, use traps, OS support.
3. Still, there are cases where zombies cannot be detected: infinite loops in user code.

TL2 STM: Use a Global Clock

Have a shared global version clock
Incremented by writing transactions (as infrequently as possible)
Read by all transactions
Used to validate that the state viewed by a transaction is always consistent

TL2 Version Clock: Read-Only Trans

[Figure: memory and lock array with version numbers (87, 34, 88, 99, 44, 50); shared VClock = 100; private RV = 100]

1. RV ← VClock
2. To Read: read lock, read mem, read lock; check unlocked, unchanged, and v# <= RV
3. Commit.

Reads form a snapshot of memory. No read set!
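A sketch of the read-only path under the same kind of simplified model. `Cell`, `vclock`, `tl2_read`, and `read_only_tx` are hypothetical names, and the "read lock, read mem, read lock" sequence is written out literally even though nothing can interleave in this single-threaded demo.

```python
class Abort(Exception):
    """Raised when a read cannot be part of a consistent snapshot."""


class Cell:
    def __init__(self, value, version):
        self.value = value
        self.version = version   # V#
        self.locked = False      # lock bit


vclock = {"now": 100}            # shared global version clock


def tl2_read(cell, rv):
    pre = (cell.version, cell.locked)     # read lock
    value = cell.value                    # read mem
    post = (cell.version, cell.locked)    # read lock again
    # Must be unlocked, unchanged across the read, and no newer than RV.
    if pre != post or post[1] or post[0] > rv:
        raise Abort()
    return value


def read_only_tx(cells):
    rv = vclock["now"]                          # 1. RV <- VClock
    return [tl2_read(c, rv) for c in cells]     # 2. validated reads; 3. commit


x, y = Cell(4, version=87), Cell(2, version=34)
snapshot = read_only_tx([x, y])
```

Each read is checked against RV on the spot, so the transaction keeps no read set and commit has nothing left to do.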

TL2 Version Clock: Writing Trans

[Figure: memory locations X and Y with lock array; VClock advances from 100 to 120 to 121; written locations receive version 121; RV = 100; label: Commit]

1. RV ← VClock
2. To Read/Write: check unlocked and v# <= RV, then add to Read/Write-Set
3. Acquire Locks
4. WV = F&I(VClock)
5. Validate each v# <= RV
6. Release locks with v# ← WV

Reads+Inc+Writes = serializable
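The writing path can be sketched the same way. This is a single-threaded illustration with hypothetical names (`Cell`, `vclock`, `TL2Tx`). Assumptions: the clock increment is a plain `+= 1` where a real implementation needs an atomic fetch-and-increment, WV is taken to be the clock value after the increment, and the v# <= RV check is applied to reads only.

```python
class Abort(Exception):
    """Raised when a transaction must be retried."""


class Cell:
    def __init__(self, value, version):
        self.value = value
        self.version = version   # V#
        self.locked = False      # lock bit


vclock = {"now": 100}            # shared global version clock


class TL2Tx:
    def __init__(self):
        self.rv = vclock["now"]  # 1. RV <- VClock
        self.read_set = []
        self.write_set = {}      # cell -> pending value

    def read(self, cell):
        if cell in self.write_set:
            return self.write_set[cell]
        if cell.locked or cell.version > self.rv:    # 2. check, then record
            raise Abort()
        self.read_set.append(cell)
        return cell.value

    def write(self, cell, value):
        self.write_set[cell] = value

    def commit(self):
        acquired = []
        try:
            for cell in self.write_set:              # 3. acquire locks
                if cell.locked:
                    raise Abort()
                cell.locked = True
                acquired.append(cell)
            vclock["now"] += 1                       # 4. WV from the clock
            wv = vclock["now"]                       #    (not atomic here)
            for cell in self.read_set:               # 5. validate v# <= RV
                foreign_lock = cell.locked and cell not in self.write_set
                if foreign_lock or cell.version > self.rv:
                    raise Abort()
            for cell, value in self.write_set.items():
                cell.value = value
                cell.version = wv                    # 6. release with v# <- WV
        finally:
            for cell in acquired:
                cell.locked = False


x, y = Cell(4, version=87), Cell(2, version=34)
old = TL2Tx()        # samples RV = 100 before the writer commits
w = TL2Tx()
w.write(x, 8)
w.write(y, 4)
w.commit()
try:
    old.read(x)
    stale_read_ok = True
except Abort:
    stale_read_ok = False
```

The older transaction aborts on its first read of `x` because the version stamped by the writer is newer than its RV. This is how the clock prevents a transaction from ever observing a mixed state.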

How we learned to stop worrying and love the clock

Version clock rate is a progress concern, not a safety concern, so...
(GV4) if failed to increment VClock using CAS, use VClock set by winner
(GV5) use WV = VClock + 2; inc VClock on abort
(GV7) localized clocks [AvniShavit08]

Uncontended Large Red-Black Tree
5% Delete 5% Update 90% Lookup

[Figure: plot with curves labeled Hand-crafted, TL/PO, TL2/PO encounter, TL/PS, TL2/PS, Lock-free]

Contended Small RB-Tree

30% Delete 30% Update 40% Lookup

[Figure: plot on 16 processors with curves labeled TL/PO, TL2/PO encounter, TL/PS, Lock-free]

Locking performs well > #cores

Implicit Privatization [Menon et al]

In real apps: often want to privatize data
Then operate on it non-transactionally
Many STMs (like TL2) are based on Invisible Readers
Invisible Readers/Writers are a problem if we want implicit privatization

Privatization Pathology
P privatizes node b then modifies it non-transactionally
[Figure: list a, b; P unlinks b and sets its value to 0]

P:
atomically {
  a.next = c;
}
// b is private
b.value = 0;

Privatization Pathology
Invisible reader Q cannot detect non-transactional modification to node b
[Figure: list a, b, c, d; P has set b's value to 0 while Q still reads b]

Q:
atomically {
  tmp = a.next;
  foo = (1/tmp.value)
}

Q: divide by 0 error
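The pathology can be replayed step by step. The sketch below is a hand-ordered, single-threaded illustration: the `Node` class and the initial node values are assumptions, and the `atomically` blocks are shown only as comments because the point is the order of events, not the STM itself.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


# List a -> b -> c -> d, with made-up non-zero values.
d = Node(4)
c = Node(3, d)
b = Node(2, c)
a = Node(1, b)

# Q: atomically { tmp = a.next; ...     (an invisible read: leaves no trace)
q_tmp = a.next

# P: atomically { a.next = c; }         (commits; b is now private to P)
a.next = c
# P, outside any transaction:
b.value = 0

# Q: ... foo = 1/tmp.value }            (Q is doomed, but has not aborted yet)
try:
    foo = 1 / q_tmp.value
    outcome = "ok"
except ZeroDivisionError:
    outcome = "divide by 0"
```

P's plain store to `b.value` touches no lock or version, so nothing tells Q that the node it is holding has changed underneath it.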

Solving the Privatization Problem

Visible Writers
Reads are made aware of overlapping writes

Visible Readers
Writes are made aware of overlapping reads

Where we are heading

A lot more work on STM performance
Think GC, game just begun
Improve single threaded performance
Amazing possibilities for compiler optimization
OS support

Explosion of new STMs

~100 TM papers in last couple of years

A bit further down the road

Transactional Languages
No Implicit Privatization Problem
Composability

And when hardware TM arrives

Contention management
New possibilities for extending and interfacing

Remember 1993?

TM Today

93,300

Second Opinion

2,210,000

Hatin' on TM

STM is too inefficient
Requires radical change in programming style
Erlang-style shared nothing only true path to salvation
There is nothing wrong with what we do today.

Gartner Hype Cycle

Hat tip: Jeremy Kemp

Multicores are here

Toda, Thanks!
