Stable Multithreading for Developers

Parrot is a practical runtime for deterministic, stable, and reliable threads. It is fast and easy to deploy thanks to a small number of performance hints. Parrot improves reliability over traditional multithreading by reducing the number of possible thread interleavings. It was evaluated on 108 popular programs, with an average overhead of 6.9% on the 55 real-world programs (12.7% across all 108), and it substantially improved model checking coverage.

Uploaded by

Sagnik Ghosh

Parrot: A Practical Runtime for Deterministic, Stable, and Reliable Threads

Heming Cui, Jiri Simsa, Yi-Hong Lin, Hao Li, Ben Blum, Xinan Xu, Junfeng Yang, Garth Gibson, Randal Bryant
Columbia University, Carnegie Mellon University

Parrot Preview
• Multithreading: hard to get right
– Key reason: too many thread interleavings, or schedules

• Techniques to reduce the number of schedules
– Deterministic Multithreading (DMT)
– Stable Multithreading (StableMT)
– Challenges: too slow or too complicated to deploy

• Parrot: a practical StableMT runtime
– Fast and deployable: effective performance hints
– Greatly improves reliability

• http://github.com/columbia/smt-mc

Too Many Schedules in Multithreading
• Schedule: a total order of synchronizations
• # of schedules: exponential in both N and K
• All inputs: many more schedules

// thread 1      ...   // thread N
...;                   ...;
lock(m);               lock(m);
...;                   ...;
unlock(m);             unlock(m);
   .                      .
   .          ...         .
   .                      .
lock(m);               lock(m);
...;                   ...;
unlock(m);             unlock(m);

Each thread does K steps. Each step admits N! schedules, so there are at least (N!)^K schedules in total (a lower bound!).
[Figure: checked schedules shown as a subset of all schedules]

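The (N!)^K lower bound can be checked with a few lines of C. This is a minimal sketch: the names `factorial`, `schedule_lower_bound`, and `print_schedule_bounds` are illustrative and not part of Parrot, and the count is returned as a double because it overflows 64-bit integers very quickly.

```c
#include <assert.h>
#include <stdio.h>

/* n! as a double (illustrative helper, not part of Parrot). */
double factorial(int n) {
    double f = 1.0;
    for (int i = 2; i <= n; ++i)
        f *= i;
    return f;
}

/* Lower bound from the slide: N threads each take the same lock K times.
 * Each of the K rounds can be ordered in N! ways, giving (N!)^K. */
double schedule_lower_bound(int n_threads, int k_steps) {
    double per_round = factorial(n_threads);
    double total = 1.0;
    for (int i = 0; i < k_steps; ++i)
        total *= per_round;
    return total;
}

/* Print the bound for a few small configurations. */
void print_schedule_bounds(void) {
    for (int n = 2; n <= 4; ++n)
        for (int k = 1; k <= 3; ++k)
            printf("N=%d K=%d -> at least %.0f schedules\n",
                   n, k, schedule_lower_bound(n, k));
}
```

Even 4 threads with 3 critical sections each admit at least 24^3 = 13,824 schedules, which is why exhaustive checking does not scale.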
Stable Multithreading (StableMT): Reducing the number of schedules for all inputs [HotPar 13] [CACM 14]
– Benefits pretty much all reliability techniques
• E.g., improve precision of static analysis [Wu PLDI 12]

// thread 1      ...   // thread N
...;                   ...;
lock(m);               lock(m);
...;                   ...;
unlock(m);             unlock(m);
   .                      .
   .          ...         .
   .                      .
lock(m);               lock(m);
...;                   ...;
unlock(m);             unlock(m);

[Figure: all schedules vs. checked schedules]

Conceptual View
• Traditional multithreading
– Hard to understand, test, analyze, etc.

• Stable Multithreading (StableMT)
– E.g., [Tern OSDI 10] [Determinator OSDI 10] [Peregrine SOSP 11] [Dthreads SOSP 11]

• Deterministic Multithreading (DMT)
– E.g., [Dmp ASPLOS 09] [Kendo ASPLOS 09] [CoreDet ASPLOS 10] [dOS OSDI 10]

• StableMT is better! [HotPar 13] [CACM 14]

Challenges of StableMT
• Performance challenge: slow
– Ignore load balance (e.g., [Dthreads SOSP 11]): serialize parallelism (5x slowdown on 30% of programs)

• Deployment challenge: too complicated
– Reuse schedules (e.g., [Tern OSDI 10] [Peregrine SOSP 11] [Ics OOPSLA 13]): sophisticated program analysis

// thread 1      ...   // thread N
...;
compute();             ...;
lock(m);               lock(m);
...;                   ...;
unlock(m);             unlock(m);
   .                   compute();
   .          ...         .
   .                      .
lock(m);               lock(m);
...;                   ...;
unlock(m);             unlock(m);

Parrot Key Insight
• The 80-20 rule
– Most threads spend the majority of their time in a small number of core computations

• Solution for good performance
– The StableMT schedules only need to balance these core computations

Parrot: A Practical StableMT Runtime
• Simple: a runtime system in user space
– Enforces a round-robin schedule for Pthreads synchronizations

• Flexible: performance hints
– Soft barrier: co-schedule threads at core computations
– Performance critical section: get through the section fast

• Practical: evaluated on 108 popular programs
– Easy to use: 1.2 lines of hints, 0.5~2 hours per program
– Fast: 6.9% overhead on 55 real-world programs, 12.7% on all 108
– Scalable: 24-core machine, different input sizes
– Reliable: improves coverage of [Dbug SPIN 11] by 10^6 ~ 10^19734

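The round-robin idea can be illustrated with a toy turn-passing scheme: a thread may perform a synchronization only while it holds the turn, and then hands the turn to the next thread. The sketch below is not Parrot's implementation (the slides do not show one). The names `rr_sync`, `rr_worker`, and `rr_demo` are hypothetical, and the "synchronization" is simply appending the thread id to a log so that the enforced order is visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define RR_THREADS 3
#define RR_STEPS   4

static pthread_mutex_t rr_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  rr_cv = PTHREAD_COND_INITIALIZER;
static int rr_turn = 0;                    /* id of the thread holding the turn */
static int rr_log[RR_THREADS * RR_STEPS];  /* order in which sync ops ran */
static int rr_len = 0;

/* Wait for our turn, perform one "synchronization", pass the turn on. */
void rr_sync(int id) {
    pthread_mutex_lock(&rr_mu);
    while (rr_turn != id)
        pthread_cond_wait(&rr_cv, &rr_mu);
    rr_log[rr_len++] = id;
    rr_turn = (id + 1) % RR_THREADS;
    pthread_cond_broadcast(&rr_cv);
    pthread_mutex_unlock(&rr_mu);
}

void *rr_worker(void *arg) {
    int id = (int)(intptr_t)arg;
    for (int i = 0; i < RR_STEPS; ++i)
        rr_sync(id);
    return NULL;
}

/* Runs the threads; returns 1 if the log is exactly 0,1,2,0,1,2,... */
int rr_demo(void) {
    pthread_t t[RR_THREADS];
    rr_turn = 0;
    rr_len = 0;
    for (int i = 0; i < RR_THREADS; ++i)
        pthread_create(&t[i], NULL, rr_worker, (void *)(intptr_t)i);
    for (int i = 0; i < RR_THREADS; ++i)
        pthread_join(t[i], NULL);
    for (int i = 0; i < rr_len; ++i)
        if (rr_log[i] != i % RR_THREADS)
            return 0;
    return rr_len == RR_THREADS * RR_STEPS;
}
```

Whatever order the OS schedules the threads in, the log comes out the same on every run. The cost is also visible: a thread that is ready early still waits for its turn, which is the serialization that the performance hints are meant to relieve.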
Outline
• Example

• Evaluation

• Conclusion

An Example based on PBZip2
int main(int argc, char *argv[]) {
  for (i=0; i<atoi(argv[1]); ++i)     // argv[1]: # of threads
    pthread_create(…, consumer, 0);
  for (i=0; i<atoi(argv[2]); ++i) {   // argv[2]: # of file blocks
    block = block_read(i, argv[3]);   // argv[3]: file name
    add(queue, block);
  }
}
void *consumer(void *arg) {
  for(;;) {                           // termination logic elided for clarity
    block = get(queue);               // blocking call
    compress(block);                  // core computation
  }
}

add(queue, block) expands to:
  pthread_mutex_lock(&mu);
  enqueue(queue, block);
  pthread_cond_signal(&cv);
  pthread_mutex_unlock(&mu);

get(queue) expands to:
  pthread_mutex_lock(&mu);
  while (empty(q))
    pthread_cond_wait(&cv, &mu);
  char *block = dequeue(q);
  pthread_mutex_unlock(&mu);

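The slide elides the queue, the block I/O, and the termination logic. A compilable version of the same producer/consumer pattern might look like the sketch below. Everything with a `pz_` prefix is a hypothetical stand-in: blocks are plain integers, "compression" just counts the block, and a negative block id serves as a stop sentinel, which is one possible termination scheme and not necessarily what PBZip2 does.

```c
#include <assert.h>
#include <pthread.h>

#define PZ_CAP 64          /* toy fixed-size queue, never wraps around */
#define PZ_MAX_THREADS 16

static pthread_mutex_t pz_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pz_cv = PTHREAD_COND_INITIALIZER;
static int pz_queue[PZ_CAP];
static int pz_head = 0, pz_tail = 0;
static int pz_done = 0;    /* number of blocks "compressed" */

/* add(queue, block) from the slide. */
void pz_add(int block) {
    pthread_mutex_lock(&pz_mu);
    pz_queue[pz_tail++] = block;
    pthread_cond_signal(&pz_cv);
    pthread_mutex_unlock(&pz_mu);
}

/* get(queue) from the slide: blocks until a block is available. */
int pz_get(void) {
    pthread_mutex_lock(&pz_mu);
    while (pz_head == pz_tail)
        pthread_cond_wait(&pz_cv, &pz_mu);
    int block = pz_queue[pz_head++];
    pthread_mutex_unlock(&pz_mu);
    return block;
}

void *pz_consumer(void *arg) {
    (void)arg;
    for (;;) {
        int block = pz_get();
        if (block < 0)             /* sentinel: no more blocks */
            break;
        pthread_mutex_lock(&pz_mu);
        pz_done++;                 /* stand-in for compress(block) */
        pthread_mutex_unlock(&pz_mu);
    }
    return NULL;
}

/* Mirrors main() on the slide; returns how many blocks were processed. */
int pz_run(int nthreads, int nblocks) {
    pthread_t t[PZ_MAX_THREADS];
    assert(nthreads <= PZ_MAX_THREADS && nblocks + nthreads <= PZ_CAP);
    pz_head = pz_tail = pz_done = 0;
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&t[i], NULL, pz_consumer, NULL);
    for (int i = 0; i < nblocks; ++i)
        pz_add(i);                 /* stand-in for block_read + add */
    for (int i = 0; i < nthreads; ++i)
        pz_add(-1);                /* one stop sentinel per consumer */
    for (int i = 0; i < nthreads; ++i)
        pthread_join(t[i], NULL);
    return pz_done;
}
```

Calling `pz_run(2, 2)` corresponds to the `2 2` arguments used on the following slides: two consumer threads and two file blocks.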
The Serialization Problem
LD_PRELOAD=parrot.so pbzip 2 2 a.txt

int main(int argc, char *argv[]) {
  for (i=0; i<atoi(argv[1]); ++i)
    pthread_create(…, consumer, 0);
  for (i=0; i<atoi(argv[2]); ++i) {
    block = block_read(i, argv[3]);
    add(queue, block);
  }
}
void *consumer(void *arg) {
  for(;;) {
    block = get(queue);
    compress(block);
  }
}

Timeline (time flows downward):
main thread   consumer1     consumer2
              get() wait
                            get() wait
add()
              runnable
              get() ret
              compress()
add()
                            runnable
                            get()
                            compress()

Callout: 7.7x slowdown, observed with 16 threads when execution was serialized in a previous system.

Adding Soft Barrier Hints
LD_PRELOAD=parrot.so pbzip 2 2 a.txt

int main(int argc, char *argv[]) {
  soba_init(atoi(argv[1]));
  for (i=0; i<atoi(argv[1]); ++i)
    pthread_create(…, consumer, 0);
  for (i=0; i<atoi(argv[2]); ++i) {
    block = block_read(i, argv[3]);
    add(queue, block);
  }
}
void *consumer(void *arg) {
  for(;;) {
    block = get(queue);
    soba_wait();
    compress(block);
  }
}

Timeline (time flows downward):
main thread   consumer1     consumer2
              get() wait
                            get() wait
add()
              get() ret
              soba_wait()
add()
                            get() ret
                            soba_wait()
              compress()    compress()

Callout: Only 0.8% overhead!

Performance Hint: Soft Barrier
• Usage
– Co-schedule threads at core computations

• Interface
– void soba_init(int size, void *id = 0, int timeout = 20);
– void soba_wait(void *id = 0);

• Can also benefit

– Other similar systems, and traditional OS schedulers
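To make the "soft" in soft barrier concrete, here is a toy barrier that releases its waiters once `size` threads have arrived, but lets a waiter give up and proceed alone after a timeout. This is a sketch under stated assumptions, not Parrot's code: `toy_soba_t`, `toy_soba_init`, and `toy_soba_wait` are hypothetical names, and the timeout here is wall-clock milliseconds, whereas the slide does not say what unit the `timeout = 20` default uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical toy soft barrier (not Parrot's implementation). */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int      size;        /* number of threads to co-schedule */
    int      waiting;     /* threads currently parked at the barrier */
    unsigned gen;         /* bumped each time a full group is released */
    long     timeout_ms;  /* assumption: wall-clock timeout in ms */
} toy_soba_t;

void toy_soba_init(toy_soba_t *b, int size, long timeout_ms) {
    pthread_mutex_init(&b->mu, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->size = size;
    b->waiting = 0;
    b->gen = 0;
    b->timeout_ms = timeout_ms;
}

/* Returns 1 if released together with a full group,
 * 0 if this thread timed out and proceeds on its own. */
int toy_soba_wait(toy_soba_t *b) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += b->timeout_ms / 1000;
    deadline.tv_nsec += (b->timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&b->mu);
    unsigned my_gen = b->gen;
    int released = 1;
    if (++b->waiting == b->size) {
        /* Last arrival: release the whole group. */
        b->waiting = 0;
        b->gen++;
        pthread_cond_broadcast(&b->cv);
    } else {
        int rc = 0;
        while (b->gen == my_gen && rc == 0)
            rc = pthread_cond_timedwait(&b->cv, &b->mu, &deadline);
        if (b->gen == my_gen) {
            /* Timed out before the group filled: leave the barrier. */
            b->waiting--;
            released = 0;
        }
    }
    pthread_mutex_unlock(&b->mu);
    return released;
}
```

The timeout is what distinguishes this from a classic barrier: if fewer than `size` threads ever reach the core computation, the ones that did arrive are delayed but never deadlocked, so a misplaced hint can cost performance without affecting correctness.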
Performance Hint: Performance Critical Section (PCS)
• Motivation
– Optimize low-level synchronizations
– E.g., {lock(); x++; unlock();}

• Usage
– Get through these sections fast by ignoring round-robin

• Interface
– void pcs_enter();
– void pcs_exit();

• Can still be checked
– Use model checking tools to completely check the schedules in a PCS

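A short sketch of where the two calls would go around the `{lock(); x++; unlock();}` pattern. The `pcs_enter` and `pcs_exit` definitions below are no-op stand-ins so that the example compiles without Parrot; only their signatures come from the slide. The names `pcs_worker` and `pcs_demo` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* No-op stand-ins (assumption): under Parrot these would be provided by
 * the runtime. Only the signatures are taken from the slide. */
void pcs_enter(void) {}
void pcs_exit(void) {}

#define PCS_MAX_THREADS 16

static pthread_mutex_t pcs_mu = PTHREAD_MUTEX_INITIALIZER;
static long pcs_x = 0;

void *pcs_worker(void *arg) {
    long iters = (long)(intptr_t)arg;
    for (long i = 0; i < iters; ++i) {
        pcs_enter();                    /* hint: short, hot section ahead */
        pthread_mutex_lock(&pcs_mu);
        pcs_x++;
        pthread_mutex_unlock(&pcs_mu);
        pcs_exit();                     /* back to normal scheduling */
    }
    return NULL;
}

/* Runs nthreads workers and returns the final counter value. */
long pcs_demo(int nthreads, long iters) {
    pthread_t t[PCS_MAX_THREADS];
    assert(nthreads <= PCS_MAX_THREADS);
    pcs_x = 0;
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&t[i], NULL, pcs_worker, (void *)(intptr_t)iters);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(t[i], NULL);
    return pcs_x;
}
```

The mutex still provides mutual exclusion, so the hint affects only how quickly threads get through the section, not the final value of the counter.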
Evaluation Questions
• Performance of Parrot

• Effectiveness of performance hints

• Improvement on model checking coverage

Evaluation Setup
• A wide range of 108 programs: roughly 10x more than previous evaluations, and complete
– 55 real-world programs: BerkeleyDB, OpenLDAP, MPlayer, etc.
– 53 benchmark programs: Parsec, Splash2x, Phoenix, NPB.
– Rich thread idioms: Pthreads, OpenMP, data partition, fork-join, pipeline, map-reduce, and workpile.

• Concurrency setup
– Machine: 24 cores with Linux 3.2.0
– # of threads: 16 or 24

• Inputs
– At least 3 input sizes (small, medium, large) per program

Performance of Parrot
[Figure: chart of normalized execution time (y-axis, 0 to 4) per program. Groups: ImageMagick, GNU C++ Parallel STL, Parsec, Splash2-x, Phoenix, NPB. Individually labeled programs: pfscan, openldap, aget, mencoder, redis, berkeley db, pbzip2_compress, pbzip2_decompress.]

Effectiveness of Performance Hints

Hint                          # programs        # lines    Overhead    Overhead
                              requiring hints   of hints   w/o hints   w/ hints
Soft barrier                  81                87         484%        9.0%
Performance critical section  9                 22         830%        42.1%
Total                         90                109        510%        11.9%

Time: 0.5~2 hours per program, mostly by inexperienced students.
# Lines: on average, 1.2 lines per program (109 lines across 90 programs).
How: deterministic performance debugging + idiom patterns.

Improving Dbug's Coverage
• Model checking: systematically explore schedules
– E.g., [Dpor POPL 05] [Explode OSDI 06] [MaceMC NSDI 07] [Chess OSDI 08] [Modist NSDI 09] [Demeter SOSP 11] [Dbug SPIN 11]
– Challenge: state-space explosion → poor coverage

• Parrot+Dbug Integration
– Verified 99 of 108 programs under test setup (1 day)
• Dbug alone verified only 43
– Reduced the number of schedules for 56 programs by 10^6 ~ 10^19734 (not a typo!)

Conclusion and Future Work
• Multithreading: too many schedules

• Parrot: a practical StableMT runtime system
– Well-defined round-robin synchronization schedules
– Performance hints: flexibly optimize performance

• Thorough evaluation
– Easy to use, fast, and scalable
– Greatly improves model checking coverage

• Broad application
– Current: static analysis, model checking
– Future: replication for distributed systems

Thank you! Questions?

Parrot: http://github.com/columbia/smt-mc
Lab: http://systems.cs.columbia.edu
