Stable Multithreading for Developers

Parrot is a practical runtime for deterministic, stable, and reliable threads. It is fast and easy to deploy thanks to a small number of performance hints. Parrot improves reliability over traditional multithreading by reducing the number of possible thread interleavings (schedules). It was evaluated on 108 popular programs, with an average overhead of 6.9% on the 55 real-world programs and 12.7% across all 108, and it significantly improved model checking coverage.


Parrot: A Practical Runtime for Deterministic, Stable, and Reliable Threads

Heming Cui, Jiri Simsa, Yi-Hong Lin, Hao Li, Ben Blum,
Xinan Xu, Junfeng Yang, Garth Gibson, Randal Bryant
Columbia University, Carnegie Mellon University

Parrot Preview
• Multithreading: hard to get right
– Key reason: too many thread interleavings, or schedules

• Techniques to reduce the number of schedules
– Deterministic Multithreading (DMT)
– Stable Multithreading (StableMT)
– Challenges: too slow or too complicated to deploy

• Parrot: a practical StableMT runtime
– Fast and deployable: effective performance hints
– Greatly improves reliability

• http://github.com/columbia/smt-mc

Too Many Schedules in Multithreading
• Schedule: a total order of synchronizations
• # of schedules: exponential in both N and K
• All inputs: many more schedules

// thread 1      ...      // thread N
...;                      ...;
lock(m);                  lock(m);
...;                      ...;
unlock(m);                unlock(m);
.                         .
.                ...      .
.                         .
lock(m);                  lock(m);
...;                      ...;
unlock(m);                unlock(m);

Each thread does K steps. One step allows N! schedules, so K steps allow (N!)^K schedules. This is a lower bound!

[Figure: the set of checked schedules shown inside the much larger set of all schedules.]

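To make the lower bound concrete, the following C sketch evaluates (N!)^K exactly for small values. The function names `factorial` and `schedule_lower_bound` are illustrative and do not come from the talk; the sketch reports 0 once the count no longer fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* N! for small N; 20! is the largest factorial that fits in 64 bits. */
uint64_t factorial(unsigned n) {
    assert(n <= 20);
    uint64_t f = 1;
    for (unsigned i = 2; i <= n; ++i)
        f *= i;
    return f;
}

/* (N!)^K: N threads, each taking the lock once per step, for K steps.
 * Returns 0 if the result would overflow 64 bits. */
uint64_t schedule_lower_bound(unsigned n, unsigned k) {
    uint64_t base = factorial(n);
    uint64_t total = 1;
    for (unsigned step = 0; step < k; ++step) {
        if (total > UINT64_MAX / base)
            return 0;               /* too many schedules to represent */
        total *= base;
    }
    return total;
}
```

With only 4 threads and 3 steps the bound is already 24^3 = 13824 schedules, and 20 threads with 2 steps exceed what a 64-bit counter can hold.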
Stable Multithreading (StableMT):
Reducing the number of schedules for all inputs [HotPar 13] [CACM 14]

– Benefits pretty much all reliability techniques
• E.g., improves precision of static analysis [Wu PLDI 12]

[Figure: the same N-thread lock(m)/unlock(m) example, with the "All schedules" and "Checked schedules" sets.]

Conceptual View
• Traditional multithreading
– Hard to understand, test, analyze, etc.

• Stable Multithreading (StableMT)
– E.g., [Tern OSDI 10] [Determinator OSDI 10] [Peregrine SOSP 11] [Dthreads SOSP 11]

• Deterministic Multithreading (DMT)
– E.g., [Dmp ASPLOS 09] [Kendo ASPLOS 09] [CoreDet ASPLOS 10] [dOS OSDI 10]

• StableMT is better! [HotPar 13] [CACM 14]

Challenges of StableMT
• Performance challenge: slow
– Ignore load balance (e.g., [Dthreads SOSP 11]): serialize parallelism (5x slowdown on 30% of programs)

• Deployment challenge: too complicated
– Reuse schedules (e.g., [Tern OSDI 10] [Peregrine SOSP 11] [Ics OOPSLA 13]): sophisticated program analysis

// thread 1      ...      // thread N
...;                      ...;
compute();                lock(m);
lock(m);                  ...;
...;                      unlock(m);
unlock(m);                compute();
.                         .
.                ...      .
.                         .
lock(m);                  lock(m);
...;                      ...;
unlock(m);                unlock(m);

Parrot Key Insight
• The 80-20 rule
– Most threads spend the majority of their time in a small number of core computations

• Solution for good performance
– The StableMT schedules only need to balance these core computations

Parrot: A Practical StableMT Runtime
• Simple: a runtime system in user space
– Enforces a round-robin schedule for Pthreads synchronizations

• Flexible: performance hints
– Soft barrier: co-schedule threads at core computations
– Performance critical section: get through the section fast

• Practical: evaluated on 108 popular programs
– Easy to use: 1.2 lines of hints on average, 0.5~2 hours per program
– Fast: 6.9% overhead with 55 real-world programs, 12.7% for all
– Scalable: 24-core machine, different input sizes
– Reliable: improves coverage of [Dbug SPIN 11] by 10^6 ~ 10^19734

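The idea of a round-robin synchronization schedule can be sketched with a single "turn" that threads pass around. This is a deliberately simplified illustration, not Parrot's implementation: the names `get_turn`, `put_turn`, and `run_round_robin` are hypothetical, every thread performs the same number of operations, and blocking operations are not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

enum { NTHREADS = 3, ROUNDS = 4 };

static pthread_mutex_t turn_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  turn_cv = PTHREAD_COND_INITIALIZER;
static int turn;                        /* id of the thread that may synchronize next */
static int order[NTHREADS * ROUNDS];    /* log of which thread synchronized when */
static int nlogged;

/* Block until it is thread `id`'s turn; returns with turn_mu held. */
static void get_turn(int id) {
    pthread_mutex_lock(&turn_mu);
    while (turn != id)
        pthread_cond_wait(&turn_cv, &turn_mu);
}

/* Pass the turn to the next thread in round-robin order. */
static void put_turn(void) {
    turn = (turn + 1) % NTHREADS;
    pthread_cond_broadcast(&turn_cv);
    pthread_mutex_unlock(&turn_mu);
}

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    for (int r = 0; r < ROUNDS; ++r) {
        get_turn(id);
        order[nlogged++] = id;          /* stand-in for one synchronization operation */
        put_turn();
    }
    return NULL;
}

/* Run the workers once; returns 1 if the log is exactly 0,1,2,0,1,2,... */
int run_round_robin(void) {
    pthread_t tids[NTHREADS];
    turn = 0;
    nlogged = 0;
    for (int i = 0; i < NTHREADS; ++i) {
        int rc = pthread_create(&tids[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(tids[i], NULL);
    if (nlogged != NTHREADS * ROUNDS)
        return 0;
    for (int i = 0; i < nlogged; ++i)
        if (order[i] != i % NTHREADS)
            return 0;
    return 1;
}
```

Because each synchronization waits for its turn, the order of operations is the same on every run regardless of how the OS schedules the threads. The cost is visible too: a thread that is ready early still waits for the others, which is the performance problem the hints address.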
Outline
• Example
• Evaluation
• Conclusion

An Example based on PBZip2
int main(int argc, char *argv[]) {
  for (i=0; i<atoi(argv[1]); ++i)      // argv[1]: # of threads
    pthread_create(…, consumer, 0);
  for (i=0; i<atoi(argv[2]); ++i) {    // argv[2]: # of file blocks
    block = block_read(i, argv[3]);    // argv[3]: file name
    add(queue, block);
  }
}
void *consumer(void *arg) {
  for(;;) {                            // exit logic elided for clarity
    block = get(queue);                // blocking call
    compress(block);                   // core computation
  }
}

// add(queue, block) expands to:
pthread_mutex_lock(&mu);
enqueue(queue, block);
pthread_cond_signal(&cv);
pthread_mutex_unlock(&mu);

// get(queue) expands to (termination logic elided):
pthread_mutex_lock(&mu);
while (empty(q))
  pthread_cond_wait(&cv, &mu);
char *block = dequeue(q);
pthread_mutex_unlock(&mu);

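The producer/consumer structure of the example can be tried out with the self-contained sketch below. It is not PBZip2 code: blocks are plain integers, `compress` is a stand-in that doubles its input, the fixed-size queue and the `q_done` termination flag are assumptions added here because the slide elides the exit logic, and `run_pipeline` is a hypothetical driver.

```c
#include <assert.h>
#include <pthread.h>

enum { QCAP = 64, MAXTHREADS = 16 };

static int q[QCAP];                     /* queue of "blocks" (just integers here) */
static int q_head, q_tail, q_done;      /* q_done: producer has finished */
static long total;                      /* sum of compressed blocks, guarded by mu */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

static void add(int block) {
    pthread_mutex_lock(&mu);
    assert(q_tail < QCAP);
    q[q_tail++] = block;                /* enqueue */
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mu);
}

/* Blocking get: returns 1 with a block, or 0 once the queue is drained
 * and the producer is done. */
static int get(int *block) {
    pthread_mutex_lock(&mu);
    while (q_head == q_tail && !q_done)
        pthread_cond_wait(&cv, &mu);
    int ok = q_head < q_tail;
    if (ok)
        *block = q[q_head++];           /* dequeue */
    pthread_mutex_unlock(&mu);
    return ok;
}

static int compress(int block) { return block * 2; }   /* stand-in core computation */

static void *consumer(void *arg) {
    (void)arg;
    int block;
    while (get(&block)) {
        int out = compress(block);
        pthread_mutex_lock(&mu);
        total += out;
        pthread_mutex_unlock(&mu);
    }
    return NULL;
}

/* Start nthreads consumers, feed blocks 1..nblocks, and return the total. */
long run_pipeline(int nthreads, int nblocks) {
    pthread_t tids[MAXTHREADS];
    assert(nthreads <= MAXTHREADS && nblocks <= QCAP);
    q_head = q_tail = q_done = 0;
    total = 0;
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&tids[i], NULL, consumer, NULL);
    for (int i = 0; i < nblocks; ++i)
        add(i + 1);
    pthread_mutex_lock(&mu);
    q_done = 1;                         /* wake every consumer so it can exit */
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mu);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tids[i], NULL);
    return total;
}
```

Calling `run_pipeline(2, 10)` feeds blocks 1 through 10 to two consumers and returns 110, twice the sum of the inputs, whichever consumer handles each block.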
The Serialization Problem
LD_PRELOAD=parrot.so pbzip 2 2 a.txt
int main(int argc, char *argv[]) {
  for (i=0; i<atoi(argv[1]); ++i)
    pthread_create(…, consumer, 0);
  for (i=0; i<atoi(argv[2]); ++i) {
    block = block_read(i, argv[3]);
    add(queue, block);
  }
}
void *consumer(void *arg) {
  for(;;) {
    block = get(queue);
    compress(block);
  }
}

[Figure: timeline with columns for the main thread, consumer1, and consumer2. Events from top to bottom: get() wait; get() wait; add(); runnable; get() ret; compress(); add(); runnable; get(); compress().]

Serialized! Observed a 7.7x slowdown with 16 threads in a previous system.

Adding Soft Barrier Hints
LD_PRELOAD=parrot.so pbzip 2 2 a.txt
int main(int argc, char *argv[]) {
  soba_init(atoi(argv[1]));
  for (i=0; i<atoi(argv[1]); ++i)
    pthread_create(…, consumer, 0);
  for (i=0; i<atoi(argv[2]); ++i) {
    block = block_read(i, argv[3]);
    add(queue, block);
  }
}
void *consumer(void *arg) {
  for(;;) {
    block = get(queue);
    soba_wait();
    compress(block);
  }
}

[Figure: timeline with columns for the main thread, consumer1, and consumer2. Events from top to bottom: get() wait; get() wait; add(); get() ret; soba_wait(); add(); get() ret; soba_wait(); then compress() and compress() side by side.]

Only 0.8% overhead!

Performance Hint: Soft Barrier
• Usage
– Co-schedule threads at core computations

• Interface
– void soba_init(int size, void *id = 0, int timeout = 20);
– void soba_wait(void *id = 0);

• Can also benefit
– Other similar systems, and traditional OS schedulers

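One way to picture a soft barrier is as a barrier that gives up after a timeout instead of blocking forever. The sketch below is an illustration under stated assumptions, not Parrot's implementation: the names `soft_barrier_t`, `sb_init`, `sb_wait`, and `demo_pair` are hypothetical, the timeout is a wall-clock value in whole seconds (the slide does not state the unit of `timeout`), and a deterministic runtime would not rely on physical time.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int size;               /* number of threads to co-schedule */
    int arrived;            /* threads currently waiting in this generation */
    unsigned generation;    /* bumped each time a full group is released */
    int timeout_s;          /* give up after roughly this many seconds */
} soft_barrier_t;

void sb_init(soft_barrier_t *b, int size, int timeout_s) {
    assert(size > 0);
    pthread_mutex_init(&b->mu, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->size = size;
    b->arrived = 0;
    b->generation = 0;
    b->timeout_s = timeout_s;
}

/* Returns 1 if a full group of `size` threads arrived, 0 on timeout. */
int sb_wait(soft_barrier_t *b) {
    struct timespec deadline = { time(NULL) + b->timeout_s, 0 };
    int full = 1;
    pthread_mutex_lock(&b->mu);
    unsigned gen = b->generation;
    if (++b->arrived == b->size) {
        b->arrived = 0;                 /* last arrival releases the group */
        b->generation++;
        pthread_cond_broadcast(&b->cv);
    } else {
        while (gen == b->generation) {
            int rc = pthread_cond_timedwait(&b->cv, &b->mu, &deadline);
            if (rc == ETIMEDOUT) {
                if (gen == b->generation) {
                    b->arrived--;       /* leave alone: the barrier is only "soft" */
                    full = 0;
                }
                break;
            }
        }
    }
    pthread_mutex_unlock(&b->mu);
    return full;
}

static void *arrive(void *arg) {
    return (void *)(intptr_t)sb_wait((soft_barrier_t *)arg);
}

/* Two threads meet at a barrier of size 2; returns how many saw a full group. */
int demo_pair(void) {
    soft_barrier_t b;
    pthread_t t1, t2;
    void *r1, *r2;
    sb_init(&b, 2, 10);
    pthread_create(&t1, NULL, arrive, &b);
    pthread_create(&t2, NULL, arrive, &b);
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    return (int)(intptr_t)r1 + (int)(intptr_t)r2;
}
```

The timeout is what distinguishes this from `pthread_barrier_wait`: if fewer than `size` threads ever reach the hint, the waiters proceed anyway, so a misplaced hint costs time but cannot deadlock the program.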
Performance Hint: Performance Critical Section (PCS)
• Motivation
– Optimize low-level synchronizations
– E.g., {lock(); x++; unlock();}

• Usage
– Get through these sections fast by ignoring round-robin

• Interface
– void pcs_enter();
– void pcs_exit();

• And can check
– Use model checking tools to completely check schedules in PCS

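A usage sketch for the {lock(); x++; unlock();} case follows. It assumes the two hints bracket the whole lock/unlock sequence. The `pcs_enter` and `pcs_exit` defined here are no-op stand-ins so that the example builds without Parrot; under Parrot the runtime would supply them. The helper `run_counter` is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* No-op stand-ins so this sketch compiles without the Parrot runtime. */
void pcs_enter(void) {}
void pcs_exit(void) {}

enum { NWORKERS = 4, NINCR = 1000 };

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static long x;                          /* shared counter guarded by m */

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < NINCR; ++i) {
        pcs_enter();                    /* hint: short section, skip the round-robin wait */
        pthread_mutex_lock(&m);
        x++;
        pthread_mutex_unlock(&m);
        pcs_exit();
    }
    return NULL;
}

/* Run NWORKERS threads that each increment x NINCR times; returns x. */
long run_counter(void) {
    pthread_t tids[NWORKERS];
    x = 0;
    for (int i = 0; i < NWORKERS; ++i) {
        int rc = pthread_create(&tids[i], NULL, bump, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < NWORKERS; ++i)
        pthread_join(tids[i], NULL);
    return x;
}
```

The mutex still guarantees the final count; the hint only affects how the synchronizations inside the section are scheduled, which is why the slide pairs it with model checking of the schedules inside each PCS.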
Evaluation Questions
• Performance of Parrot
• Effectiveness of performance hints
• Improvement on model checking coverage

Evaluation Setup
• A wide range of 108 programs: 10x more, and complete
– 55 real-world programs: BerkeleyDB, OpenLDAP, MPlayer, etc.
– 53 benchmark programs: Parsec, Splash2x, Phoenix, NPB.
– Rich thread idioms: Pthreads, OpenMP, data partition, fork-join, pipeline, map-reduce, and workpile.

• Concurrency setup
– Machine: 24 cores with Linux 3.2.0
– # of threads: 16 or 24

• Inputs
– At least 3 input sizes (small, medium, large) per program

Performance of Parrot
[Figure: bar chart of normalized execution time (y-axis, 0 to 4) per program. Group labels: ImageMagick, GNU C++ Parallel STL, Parsec, Splash2-x, Phoenix, NPB. Labeled programs: pfscan, openldap, aget, mencoder, redis, berkeley db, pbzip2_compress, pbzip2_decompress.]

Effectiveness of Performance Hints
                               # programs        # lines    Overhead    Overhead
                               requiring hints   of hints   w/o hints   w/ hints
Soft barrier                   81                87         484%        9.0%
Performance critical section   9                 22         830%        42.1%
Total                          90                109        510%        11.9%

Time: 0.5~2 hours per program, mostly by inexperienced students.
# Lines: on average, 1.2 lines per program.
How: deterministic performance debugging + idiom patterns.

Improving Dbug’s Coverage
• Model checking: systematically explore schedules
– E.g., [Dpor POPL 05] [Explode OSDI 06] [MaceMC NSDI 07] [Chess OSDI 08] [Modist NSDI 09] [Demeter SOSP 11] [Dbug SPIN 11]
– Challenge: state-space explosion → poor coverage

• Parrot+Dbug integration
– Verified 99 of 108 programs under test setup (1 day)
• Dbug alone verified only 43
– Reduced the number of schedules for 56 programs by 10^6 ~ 10^19734 (not a typo!)

Conclusion and Future Work
• Multithreading: too many schedules

• Parrot: a practical StableMT runtime system
– Well-defined round-robin synchronization schedules
– Performance hints: flexibly optimize performance

• Thorough evaluation
– Easy to use, fast, and scalable
– Greatly improves model checking coverage

• Broad application
– Current: static analysis, model checking
– Future: replication for distributed systems

Thank you! Questions?

Parrot: http://github.com/columbia/smt-mc
Lab: http://systems.cs.columbia.edu
