A Study of The Internal and External Effects of Concurrency Bugs
concurrency bugs that were found in MySQL [5], a mature, widely-used database server application.

Our study produced several interesting findings. First, we found a non-negligible number of latent concurrency bugs. Latent concurrency bugs, when triggered, do not become immediately visible to users. Instead, these concurrency bugs first silently corrupt internal data structures, and only potentially much later cause an application failure to become externally visible¹. Latent concurrency bugs have been anecdotally reported [13], but we are the first to study their extent, and their internal and external effects in detail.

A second finding is related to bugs that cause the application to fail in ways other than silently crashing. We characterize Byzantine failures that are caused by concurrency bugs. Some of our findings were surprising, like the fact that these bugs cause subtle changes in the output that would be difficult to find using existing run-time monitoring tools, or the fact that there exists a strong correlation between bugs that cause Byzantine failures and latent bugs.

Our findings have implications for the design of tools and methodologies that address concurrency bugs. For the convenience of the reader we present a summary of our main findings together with their implications in Table 1.

The remainder of the paper is organized as follows. In Section 2 we describe our methodology. We then present an overview of the MySQL application in Section 3. The results of our study are presented in Section 4 and in Section 5 we discuss their implications. We survey related work in Section 6 and we conclude in Section 7.

2 Methodology

In this section we present the methodology that we adopted to find and analyze concurrency bugs. Our methodology is similar to one used in previous work [22].

2.1 Choice of concurrent application

We selected MySQL as the target of our study for three main reasons. First, it is a widely deployed database. Databases are a critical component of the IT infrastructure of many corporations, and MySQL represents a substantial share of that market (about 1/3 of deployed database systems [4]). This implies that there is market pressure for a quality development and maintenance process, so this is an instance of well-maintained software where finding and eliminating bugs matters. Second, it is an open source application with a well-maintained bug report database. Having access to the source code and the bug logs is necessary for an in-depth analysis. Finally, it is a highly concurrent application with rich semantics, and it has a large code base. These characteristics make MySQL representative of some of the biggest challenges that we will be facing as complex applications become more and more concurrent.

In Section 3 we provide some brief background on MySQL, which will help in better understanding our results.

2.2 Concurrency bug selection

The MySQL versions that are affected by the bugs that were reported in the bug report database range from version 3.x to 6.x and the oldest bug reports date back to 2003.

The MySQL bug report database contains a very large number of bugs. Therefore, to make the task feasible, we automatically filtered bugs that are not likely to be relevant by performing a search query on the bug report database. Our search query filtered bugs based on (1) the keywords contained in the bug description, (2) the status of the bug and (3) the bug category.

We searched the MySQL bug report database for bugs that contained keywords commonly associated with concurrency bugs. Such keywords included the following terms: lock, acquire, compete, atomic, concurrency, synchronization, etc. In addition to this we searched for bugs whose status was closed (i.e., bugs that are no longer under analysis by the developers/debuggers). It would have been interesting to also consider bugs with other status (such as won't fix and can't repeat) but these bug reports are not likely to have detailed discussions and more importantly, in general, they won't contain patches. Without reasonably complete bug reports it would not be possible to thoroughly understand the bugs they report.

Next, to exclude bugs from stand-alone utilities that are unrelated to the multi-threaded server, our search query also limited the search to bugs that were related to MySQL Server, including those that were within the Storage Engines category [26].

Finally, we randomly sampled a subset of the bugs that matched our search query and manually analyzed them. The manual inspection revealed that some of the bugs that matched the search query were not concurrency bugs (defined in Section 3) and so we also excluded them. In addition, we excluded bugs for which the bug log did not contain enough information to analyze them. After filtering, we obtained a final set with 80 concurrency bugs that were analyzed, a number that is very close (or even superior) to the number of bugs analyzed in previous studies [11,22].

Table 2 shows the bug count across the different stages of the bug selection process.

Note that this selection process has two main limitations. First, the search query can miss some actual concurrency bugs. However, a concurrency bug report that does not contain any of the main keywords associated with concurrency is also more likely to be incomplete and therefore more dif-

¹The term latent bug is used in other papers [8,18,20] with an unrelated meaning - that of a bug that went undetected by the programmer.
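The three-part search query of Section 2.2 (keywords, status, category), followed by random sampling, can be sketched as below. This is only an illustration in Python: the record fields (`description`, `status`, `category`), the category labels and the function names are assumptions, not the actual schema of the MySQL bug database or the query that was run for the study.

```python
import random

# Keywords named in the text; the original list ends with "etc.", so it is incomplete.
KEYWORDS = ("lock", "acquire", "compete", "atomic", "concurrency", "synchronization")

# Hypothetical labels standing in for the server and storage-engine categories.
SERVER_CATEGORIES = {"MySQL Server", "MySQL Server: Storage Engines"}


def matches_query(report):
    """Return True if a report passes the keyword, status and category filters."""
    text = report["description"].lower()
    has_keyword = any(keyword in text for keyword in KEYWORDS)
    is_closed = report["status"] == "closed"
    is_server = report["category"] in SERVER_CATEGORIES
    return has_keyword and is_closed and is_server


def sample_matches(reports, sample_size, seed=0):
    """Randomly sample up to sample_size matching reports for manual analysis."""
    matched = [r for r in reports if matches_query(r)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(matched, min(sample_size, len(matched)))


# Made-up reports, for illustration only.
reports = [
    {"id": 1, "description": "Deadlock between INSERT and ALTER",
     "status": "closed", "category": "MySQL Server"},
    {"id": 2, "description": "Typo in manual",
     "status": "closed", "category": "MySQL Server"},
    {"id": 3, "description": "Race: mutex not acquired before update",
     "status": "open", "category": "MySQL Server"},
    {"id": 4, "description": "Atomic counter wraps",
     "status": "closed", "category": "Client utilities"},
]
selected = sample_matches(reports, sample_size=10)
print([r["id"] for r in selected])
```

Plain substring matching over-matches (for example, "lock" also matches "block"), which is one reason a manual inspection step after the query remains necessary.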
Evolution of concurrency bugs
  Finding: According to the opening dates of our sampled bugs, the proportion of fixed bugs that involved concurrency more than doubled over the last 6 years.
  Implication: This shows the increasing need for new tools and methodologies to handle concurrency bugs.

External effects of concurrency bugs
  Finding: We found slightly more non-deadlock bugs (63%) than deadlock bugs (40%).
  Implication: Having good tools to handle deadlock bugs is not enough - we also need to handle non-deadlock bugs.
  Finding: We found a significant fraction of semantic/Byzantine bugs (15%).
  Implication: Techniques for Byzantine fault tolerance can potentially handle a considerable fraction of concurrency bugs.

Immediacy of effects
  Finding: Latent concurrency bugs were also found in significant numbers (15%).
  Implication: Tools and methodologies such as proactive recovery can be leveraged to mask errors caused by a significant number of concurrency bugs.
  Finding: Of the latent concurrency bugs analyzed, 92% were semantic bugs and conversely 92% of the semantic bugs were also latent bugs.
  Implication: Given the high correlation between these classes of bugs, techniques that handle one class should also handle the other.

Semantic concurrency bugs
  Finding: The vast majority of semantic bugs (92%) generated subtle violations of application semantics.
  Implication: Run-time monitoring tools will have to devise complex application-specific checks to detect the presence of semantic bugs.

Internal data structures
  Finding: Most of the examined latent bugs (92%) corrupted multiple data structures.
  Implication: Techniques that detect inconsistencies among data structures could be used to detect latent bugs. Analyzing data structures individually might not suffice.

Severity and fixing complexity of bugs
  Finding: Latent bugs were found to be slightly more severe than non-latent bugs.
  Implication: Latent bugs are an important threat to software reliability and, therefore, latent bugs should also be addressed.
  Finding: Latent bugs were found to be easier to fix than non-latent bugs.
  Implication: Further studies should be performed to analyze the reasons for this difference.

Table 1. Main findings of this study and their implications. The methodology for collecting the data presented here is described in Section 2 and the results are explained in detail in Section 4.
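The percentages in Table 1 can be cross-checked against the sample of 80 analyzed bugs. The sketch below is a worked consistency check, not data from the study: it assumes that each percentage is computed over the relevant total and rounded half-up to a whole number, and the bug counts it derives are inferred rather than reported by the authors. The function names are hypothetical.

```python
def rounded_percent(count, total):
    """Percentage of count in total, rounded half-up, using integer arithmetic."""
    return (200 * count + total) // (2 * total)


def counts_matching(percent, total):
    """All integer counts out of total that round to the given percentage."""
    return [k for k in range(total + 1) if rounded_percent(k, total) == percent]


TOTAL = 80  # concurrency bugs analyzed, as stated in the text

latent = counts_matching(15, TOTAL)        # latent bugs: 15%
both = counts_matching(92, latent[0])      # latent bugs that are also semantic: 92%
non_deadlock = counts_matching(63, TOTAL)  # non-deadlock bugs: 63%
deadlock = counts_matching(40, TOTAL)      # deadlock bugs: 40%

print(latent, both, non_deadlock, deadlock)
# The two external-effect classes add up to more than the sample size,
# so under these assumptions some bugs must be counted in both.
print(non_deadlock[0] + deadlock[0] - TOTAL)
```

Under these assumptions, 15% of 80 corresponds to 12 bugs and 92% of 12 to 11 bugs, which is consistent with the symmetric 92% figures reported for latent and semantic bugs.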
ficult to successfully analyze. Second, concurrency bugs are likely to be underreported, which would explain why out of a total of about 12.5k bugs in the bug database we only found 80 concurrency bugs.

Phase                                  Number of bugs
Total MySQL server closed bugs         12.5k
Concurrency related keyword matches    583
Sampled bugs                           347
Concurrency bugs analyzed              80

Table 2. Bug counts for different stages of the analysis.

2.3 Manual analysis of bug reports

We manually analyzed the bug reports of the sampled list of bugs, focusing on trying to understand the effects of the bugs. We analyzed the bugs using information contained in the bug reports (including the patches), as well as the source code of the application.

Bug reports contain several types of information that are useful for filtering out non-concurrency bugs, and for understanding their characteristics. In particular, bug reports contain not only the description of the bug, but also discussion among the developers and debuggers about how to diagnose and solve the problem. The information contained in these discussions is often important to understand the bugs, in particular to determine whether they are concurrency bugs, and to understand their effects. Typically the bug report will also include the patch, and even the method to reproduce the bug; sometimes more than one patch attempt is made before developers agree on a definitive patch. Bug reports also include additional fields such as the perceived severity, the status, and the software version affected.

We used all these types of information contained in bug reports to gain an understanding of how bugs are triggered and, when they are, what their effects are². In addition, some of this information was also used to estimate the complexity of fixing concurrency bugs and their severity.

3 MySQL

In this section we provide a brief overview of the characteristics of MySQL that are relevant for this study.

3.1 Internal structure

Despite the existence of recent proposals for other types of synchronization primitives such as transactional memory [17], there is value in studying and improving the methods that address the problems with lock-based synchronization. This is not only because we still run many applications that use locks, which will benefit from being made more robust for years to come, but also because the vision behind such proposals is not to entirely replace locks, but instead to use these new primitives in smaller sections of the code where the possible performance impact would be lower.

3.3 Request vs. transaction concurrency
Figure 1. Evolution of bugs (by open date). Series: # Concurrency bugs; x-axis: Time (year), 2003 to 2009.
Figure 2. Evolution of bugs (by close date). Series: # Concurrency bugs; x-axis: Time (year), 2003 to 2009.
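Figures 1 and 2 show, per year, the number of concurrency bugs and their proportion relative to all bugs, once by opening date and once by closing date. A minimal sketch of that tabulation follows; the years used are made up for illustration and the function name is an assumption.

```python
from collections import Counter


def yearly_series(concurrency_years, all_years):
    """Return {year: (count, proportion)} for concurrency bugs among all bugs."""
    concurrency = Counter(concurrency_years)
    everything = Counter(all_years)
    series = {}
    for year in sorted(everything):
        count = concurrency.get(year, 0)
        series[year] = (count, count / everything[year])
    return series


# Made-up opening years, not the study's data.
all_opened = [2003] * 4 + [2004] * 5 + [2005] * 10
concurrency_opened = [2003, 2004, 2005, 2005]

opened_series = yearly_series(concurrency_opened, all_opened)
for year, (count, share) in opened_series.items():
    print(year, count, f"{share:.0%}")
```

Running the same function on closing years would give the second series. Restricting `all_years` to the date range covered by the sampled concurrency bugs keeps the proportions comparable.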
MySQL server category. To obtain the set containing all bugs we excluded the keyword part of the search together with the sampling phase explained in Section 2. For each year we counted the number of concurrency bugs and their proportion (compared with generic bugs). We looked at both the opening date and closing date because programmers typically require a significant amount of time (i.e., many months) to solve the bugs under analysis. The results are presented in Figures 1 and 2. From these results we can see that there has been a trend of increasing number and proportion of concurrency bugs over the years. However, this trend does not seem to be very prominent.

The data that we collected does not allow us to determine the causes underlying this finding; however, we can think of two possible reasons for this slight increase. One possible explanation is that the advent of multi-core hardware causes users and developers to stumble upon these bugs more often than they used to in the past. Another explanation that we cannot rule out is that developers, while trying to further parallelize the code, actually increase the number of concurrency bugs that they introduce.

Of the concurrency bugs that we sampled, the oldest concurrency bug was opened on March 2nd, 2003, while the most recent was closed on September 16th, 2009. Therefore, to make the comparison fair, we excluded the bugs that were outside this range from the list of generic bugs used to compute the proportions.

To interpret these results it should also be taken into consideration that, as we show in Section 4.7, the time it takes to close a concurrency bug can be quite long (e.g., some bugs took more than a year to fix). This explains why the absolute number of bugs opened in the last year is low: many concurrency bugs potentially discovered in 2009 have not yet been fixed, which means they are not yet closed and were therefore not accounted for in this study.

4.2 External effects

We analyzed the concurrency bugs with respect to the external effects that are exposed to the clients, and divided these effects into six categories. The results are presented in Table 3. Note that the sum of all occurrences is larger than the total number of bugs because some bugs fit into more than one category.

We can see that there are slightly more bugs that cause non-deadlock conditions (63%) than deadlock conditions (40%), and among the non-deadlock bugs the most prevalent consequences are either causing the server to crash (28%) or providing the wrong results to the user, which we term semantic bugs (15%).

Semantic bugs are Byzantine failures, where the application provides the user with a result that violates the intended semantics of the application. This is an interesting class of bugs since masking their effects requires sophisticated (and possibly expensive) techniques such as Byzantine fault-tolerant replication [10] or run-time verification of the behavior of the application against a specification of the system [30]. We discuss these bugs in more detail in Section 4.4.

The high percentage of deadlock bugs that we encountered leads us to believe that, despite significant research to address deadlock bugs, in practice this class of bugs still constitutes a significant problem for the robustness of software. The percentage of deadlock bugs that our study found is in line with results from other studies [22].

The remaining three classes of external effects were slightly less prevalent. These are error messages (9%), which we distinguish from the class of semantic bugs, despite the fact that when error messages are provided to the user an unexpected result is also returned. We distinguish error bugs from semantic bugs by the fact that an error is detected by the server and therefore is explicitly flagged in the reply to the client request, and can be handled by the client application appropriately. For instance, in one bug (bug #42519) when a restore operation is performed concurrently with an insert operation a generic error message is returned to the user. We also found a number of bugs (8%) in which client requests hang (the client does not receive a reply), which differs from a deadlock situation where one thread or a series of threads are waiting in a circular dependency. Typically these are caused by a thread that fails to release a certain lock, causing another thread that tries to
Applications can be very different (e.g., some have graphical user interfaces while others do not, some applications use the client-server model while others do not). As an example, from the data collected in another study [22] that compared different applications, about half of the deadlocks found in MySQL involved only 1 resource while almost all of the deadlocks found in Mozilla involved 2 or more resources. Given the very different characteristics of applications, we believe that the conclusions that we present here are unlikely to be generalizable to arbitrary multi-threaded applications.

The number of bugs analyzed in this study is comparable to the number of bugs analyzed in other related studies [11,22,32]. However, it is worth noting that our results could potentially suffer from two sources of bias. First, our sample, in absolute terms, is small. Obviously, this limits the confidence in the results, but at the same time it is a limitation that is difficult to overcome due to the time required to gather the data and the amount of data available. (This is a limitation shared by previous studies.) Second, we only analyzed bugs that were documented and fixed. This means we did not account for bugs that were not fixed (or even found), nor bugs that were fixed but not documented. We believe that these biases are very difficult to overcome given the nature of bugs in general but specifically given the nature of concurrency bugs. Nevertheless, more studies are desirable to improve our understanding of concurrency bugs.

6 Related Work

Given the importance of software reliability and the prevalence of bugs in software in general, many studies about bugs have previously been undertaken.

There is a large body of literature about the propagation [33] and even prediction [24] of bugs in source code. Some of these studies use the revision control system to understand the behavior of programmers and its effects on software reliability (e.g., which components or source code files are most prone to errors). This work is complementary to the work presented in this paper, which is focused on a specific class of bugs (i.e., concurrency bugs) and on understanding their consequences.

In a previous paper, researchers analyzed the consequences of bugs for three different database systems [32]. However the authors did not distinguish between concurrency and non-concurrency bugs, and only evaluated whether they caused crash or Byzantine faults (since that paper was focused on presenting a replication architecture, instead of being focused on studying bugs). In contrast, we provide a detailed analysis of the effects of the bugs and we focus on concurrency bugs.

Chandra et al. [11] looked at bug databases of three open-source applications (including MySQL) but the focus of their work was quite different from ours. They analyzed all bugs (among which only 12 were concurrent) and focused exclusively on determining whether generic recovery techniques such as process pairs would be effective in tolerating them. In their case, concurrency bugs were only one possible type of bug that fell into the category for which such techniques are effective. In contrast, we focus on a more narrow class of bugs by limiting ourselves to concurrency bugs, but provide a broader analysis taking into consideration several characteristics of these bugs.

Farchi et al. analyzed concurrency bugs, but by artificially creating them [14]. The methodology adopted by the study was to ask programmers to write programs containing concurrency bugs, which arguably may not lead to bugs that are representative of real world problems. In contrast, we analyze a database of bugs in a widely used, well-maintained application.

Recently Lu et al. [22] studied real concurrency bugs that were found in four open source applications. Using the respective bug report databases, the authors analyzed a total of 105 concurrency bugs. Their study focused on several aspects of the causes of concurrency bugs, and the study of their effects was limited to determining whether they caused deadlocks or not. We build on this study, in particular by using a very similar methodology for deciding which bugs to analyze, but provide a complementary angle by studying the effects of concurrency bugs (e.g., whether concurrency bugs are latent or not, or what type of failures they cause).

There also exist various studies of bug characteristics in software systems focusing on several aspects of generic bugs [12, 16, 21, 25, 31]. In contrast, our study focuses specifically on concurrency bugs, which are more challenging to analyze.

Recently Sahoo et al. have been trying to understand the reproducibility of bugs [29]. While the main focus of their study was not concurrency bugs, the authors distinguished concurrency bugs from non-concurrency bugs when trying to characterize their reproducibility.

Finally, there exist many proposals for handling concurrency bugs. These represent not only different techniques, but also very different approaches to improving software reliability. They include approaches to avoid bugs [17], to find bugs [13], to mask bugs [32] and even to recover from bugs [9]. Because concurrency bugs, in addition to being dependent on the input, are also dependent on the interleaving chosen by the operating system, there are approaches that specifically handle concurrency bugs by artificially disturbing [6], controlling [23] or limiting [7] thread interleavings. Our work is complementary in that it has the potential to guide and motivate the development of these kinds of techniques and approaches.
7 Conclusion

Concurrency bugs pose a challenge in the development of reliable applications. Concurrency bugs are a type of bug that is likely to become more and more prevalent in the development life cycle as applications become more concurrent to take advantage of parallelism in the hardware.

To gain a better understanding of this problem, we presented a study of concurrency bugs in MySQL. In contrast to previous studies, we focused on the effects of concurrency bugs rather than on their causes.

Studying how bugs manifest enabled us to produce some interesting findings, such as a high prevalence of latent bugs that silently corrupt data structures but may take longer to become externally visible, and a strong correlation between latent bugs and bugs that cause Byzantine failures.

We hope that our study can open interesting avenues for future research. In particular, we intend to develop tools that address the issue of latent bugs from two different angles. First, we need to develop better ways to find these bugs during the course of testing. We intend to develop better tools for catching the subtle corruption of internal state caused by the kinds of bugs we analyzed. Second, latent bugs provide an interesting opportunity to develop techniques that detect them and heal the service state before the buggy output is seen by clients.

Acknowledgments

We are grateful for the feedback provided by the anonymous reviewers. Pedro Fonseca was supported by a grant provided by FCT.

References

[1] Azul Systems - Industry's Leading Azul Compute Appliances. http://www.azulsystems.com/products/compute-appliance.htm.
[2] GeForce GTX 295. http://www.nvidia.com/object/product_geforce_gtx_295_us.html.
[3] Intel Previews Intel Xeon 'Nehalem-EX' Processor. http://www.intel.com/pressroom/archive/releases/2009/20090526comp.htm.
[4] MySQL :: Market Share. http://www.mysql.com/why-mysql/marketshare/.
[5] MySQL :: The world's most popular open source database. http://www.mysql.com.
[6] Y. Ben-Asher, Y. Eytani, E. Farchi, and S. Ur. Producing scheduling that causes concurrent programs to fail. In Proc. of Parallel and Distributed Systems: Testing and Debugging (PADTAD), 2006.
[7] R. L. Bocchino, V. S. Adve, S. V. Adve, and M. Snir. Parallel programming must be deterministic by default. In Proc. of Workshop on Hot Topics in Parallelism (HotPar), 2009.
[8] Y. Brun and M. D. Ernst. Finding latent code errors via machine learning over program executions. In Proc. of International Conference on Software Engineering (ICSE), 2004.
[9] G. Candea, S. Kawamoto, Y. Fujiki, G. Friedman, and A. Fox. Microreboot - A technique for cheap recovery. In Proc. of Operating System Design and Implementation (OSDI), 2004.
[10] M. Castro and B. Liskov. Practical byzantine fault tolerance. In Proc. of Operating System Design and Implementation (OSDI), 1999.
[11] S. Chandra and P. M. Chen. Whither generic recovery from application faults? A fault study using open-source software. In Proc. of International Conference on Dependable Systems and Networks, 2000.
[12] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler. An empirical study of operating systems errors. In Proc. of Symposium on Operating System Principles (SOSP), 2001.
[13] D. Engler and K. Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. SIGOPS Operating Systems Review, 37(5):237-252, 2003.
[14] E. Farchi, Y. Nir, and S. Ur. Concurrent bug patterns and how to test them. In International Parallel and Distributed Processing Symposium (IPDPS), 2003.
[15] J. Gray. Why do computers stop and what can be done about it? In Proceedings of Reliability in Distributed Software and Database Systems, 1986.
[16] W. Gu, Z. Kalbarczyk, Ravishankar K. Iyer, and Z. Yang. Characterization of linux kernel behavior under errors. In Proc. of International Conference on Dependable Systems and Networks, 2003.
[17] M. Herlihy and J. E. B. Moss. Transactional memory: Architectural support for lock-free data structures. SIGARCH Computer Architecture News, 21(2):289-300, 1993.
[18] D. Hovemeyer and W. Pugh. Finding bugs is easy. SIGPLAN Notices, 39(12):92-106, 2004.
[19] H. Jula, D. Tralamazza, C. Zamfir, and G. Candea. Deadlock immunity: Enabling systems to defend against deadlocks. In Proc. of Operating System Design and Implementation (OSDI), 2008.
[20] T. Kelly, Y. Wang, S. Lafortune, and S. Mahlke. Eliminating concurrency bugs with control engineering. IEEE Computer, 99(1), 2009.
[21] Z. Li, L. Tan, X. Wang, S. Lu, Y. Zhou, and C. Zhai. Have things changed now?: An empirical study of bug characteristics in modern open source software. In Proc. of Architectural and System Support for Improving Software Dependability (ASID), 2006.
[22] S. Lu, S. Park, E. Seo, and Y. Zhou. Learning from mistakes: A comprehensive study on real world concurrency bug characteristics. SIGARCH Computer Architecture News, 36(1):329-339, 2008.
[23] M. Musuvathi, S. Qadeer, T. Ball, G. Basler, P. A. Nainar, and I. Neamtiu. Finding and reproducing heisenbugs in concurrent programs. In Proc. of Operating System Design and Implementation (OSDI), 2008.
[24] S. Neuhaus, T. Zimmermann, C. Holler, and A. Zeller. Predicting vulnerable software components. In Proc. of Conference on Computer and communications security (CCS), 2007.
[25] T. Ostrand, E. Weyuker, and R. Bell. Predicting the location and number of faults in large software systems. IEEE Transactions on Software Engineering (TSE), 31(4):340-355, April 2005.
[26] S. Pachev. Understanding MySQL internals. O'Reilly Media, Inc., 2007.
[27] S. Park, S. Lu, and Y. Zhou. CTrigger: Exposing atomicity violation bugs from their hiding places. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2009.
[28] F. Qin, J. Tucek, J. Sundaresan, and Y. Zhou. Rx: Treating bugs as allergies - A safe method to survive software failures. In Proc. of Symposium on Operating System Principles (SOSP), 2005.
[29] S. K. Sahoo, J. Criswell, and V. S. Adve. An empirical study of reported bugs in server software with implications for automated bug diagnosis. Tech. Report 2142/13697, University of Illinois, 2009.
[30] B. Schroeder. On-line monitoring: A tutorial. IEEE Computer, 28(6):72-78, Jun 1995.
[31] M. Sullivan and R. Chillarege. A comparison of software defects in database management systems and operating systems. In Proc. of International Symposium on Fault-Tolerant Computing (FTCS), 1992.
[32] B. Vandiver, H. Balakrishnan, B. Liskov, and S. Madden. Tolerating byzantine faults in transaction processing systems using commit barrier scheduling. In Proc. of Symposium on Operating System Principles (SOSP), 2007.
[33] L. Voinea and A. Telea. How do changes in buggy mozilla files propagate? In Proc. of Symposium on Software Visualization (SoftVis), 2006.