Counterexample Explanation in Divine Model-Checker
MASTER'S
THESIS
Brno, 2011
Declaration
Hereby I declare, that this paper is my original authorial work, which I have worked out by my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.
Acknowledgement
I would like to thank my supervisor prof. RNDr. Luboš Brim, CSc. for guidance, discussions and helpful advice. I am also grateful to the authors of the paper which was implemented in this work for providing further explanations of how the algorithm and JPF itself work. Last, but not least, I thank my family and closest friends for their support.
Abstract
The thesis focuses on counterexample explanation in model checking, which aims to provide useful information about the cause of an error so that the system designer can find and fix the error faster. We summarize some of the existing methods for dealing with this problem and implement the method of Alex Groce and Willem Visser. This method describes three ways of comparing failing and non-failing runs to extract the cause of the problem. The original method was used in Java PathFinder; we implemented it in the parallel model checker DiVinE 2. The implemented method is evaluated on various models and a summary of the results is presented.
Contents
1. Introduction
2. Notions in Model Checking
    2.1. Explicit model checking with safety properties
3. A survey of approaches to counterexample explanation
4. Our implementation in detail
5. Algorithm Evaluation
    5.1. Outputs
    5.2. Summary
    5.3. Performance
1. Introduction
In the past century, computers have changed our society in a significant way and now they are practically ubiquitous. They are used in a wide variety of areas, ranging from leisure activities to business and even to controlling critical systems such as airplanes. Requirements for reliability and safety in the latter group are naturally very high and thus we need formal methods of verifying our systems. One such method is called model checking.

In model checking, the system is verified by an automatic program against a specification given either as a safety or a liveness property. A safety property states that something bad never happens and is usually expressed using propositional logic. A liveness property states that something good keeps happening and is expressed using a temporal logic such as CTL or LTL. When the model checker finishes its work, it either states that the property is fulfilled or reports a particular run of the system which violates the specification. This run is called a counterexample and in the case of safety properties it has the form of a sequence of states of the system which leads to the state violating the safety specification.

When an error is found, it is up to the developer to find its cause and correct it. In this thesis, we call this problem counterexample explanation, even though not every published approach works specifically with the counterexample. The task of finding the specific cause of the error is not always easy for the system designer, particularly in larger systems, where they are confronted with long counterexample traces consisting of many states and actions. And while model checking helps significantly with finding counterexamples, there is still room for automated methods which would help with localizing the error.

As opposed to model checking, counterexample explanation is not a clearly defined problem: what the designer needs is help with finding the specific cause, but it is not clear what form the help should take or what a specific cause of an error actually is. There are a few points of view in publications on counterexample explanation and they are briefly summarized in Chapter 3. In this work, we implemented the method of A. Groce and W. Visser [11] in the parallel explicit model checker DiVinE 2.
2. Notions in Model Checking

- $T$ is a finite set of transitions ($\forall \tau \in T : \tau \subseteq S \times S$)
- $S_0 \in S$ is an initial state
- $L : S \to 2^{AP}$ is a labelling function where $AP$ is a set of atomic propositions

$(s, t) \in \tau$ for some $\tau \in T$ can also be written as $s \xrightarrow{\tau} t$. When the transition name is omitted, as in $s \to t$, the formula $\exists \tau \in T : s \xrightarrow{\tau} t$ is meant.

Counterexample explanation needs more information about the internal structure of a state than the model checker does. Some of the information can be provided by the labelling function $L$, but often the set $S$ itself needs to be structured further. In practical systems, each state of a system comprises one control location for every process, thread or HW component and a data valuation of all variables in the system. This means that the set $S$ of states actually has a structure:

$S = C \times D$

To access the internals of a state $s$ we can use the two projection functions $c : S \to C$ and $d : S \to D$.

To illustrate, consider the following picture of a Kripke structure. It consists of four states, $s_1, \dots, s_4$, and four transitions. Each state has three components: the first is the control location of the first process and it can be either a or b, the second component is the control location of the second process with possible values x or y, and the last is an integer variable which we will refer to as z.
[Figure: example Kripke structure with states s1 = (a, x, 0), s2 = (a, y, 0), s3 = (a, y, 1) and s4 = (b, y, 1).]
The labelling function usually contains atomic propositions stating the values of variables, so in our example $(z = 0) \in L(s_1)$ and $(z = 1) \in L(s_3)$. Now we define the notion of a finite path, which is central to most of our discussion.
Definition 2. A finite path of Kripke structure $M = (S, T, S_0, L)$ is a finite sequence $s_0 \xrightarrow{\tau_1} s_1 \xrightarrow{\tau_2} \dots \xrightarrow{\tau_n} s_n$ such that $n \ge 0$, $s_i \in S$ and $\forall\, 0 < i \le n : s_{i-1} \xrightarrow{\tau_i} s_i$.

Definition 3. A finite path $t' = s'_0 \xrightarrow{\tau'_1} s'_1 \xrightarrow{\tau'_2} \dots \xrightarrow{\tau'_{k'}} s'_{k'}$ is a prefix of finite path $t = s_0 \xrightarrow{\tau_1} s_1 \xrightarrow{\tau_2} \dots \xrightarrow{\tau_k} s_k$ when $0 < k' < k$ and $\forall i < k' : (i \ge 0 \Rightarrow s'_i = s_i) \wedge (i > 0 \Rightarrow \tau'_i = \tau_i)$.

A safety property $\varphi$ holds in a Kripke structure $M$ if and only if $\varphi$ holds in every state reachable from $S_0$ [2]. $\varphi$ holds in a state $s \in S$, written as $s \models \varphi$, when $\varphi$ holds under the valuation corresponding to $L(s)$. In our example, the formula $z < 3$ holds but $z = 1$ does not.

Given the Kripke structure and a formula, the model checker can verify it by visiting all states reachable from the initial state using DFS or BFS and checking the validity of the formula at every state. When a state is found in which the formula does not hold, the search is terminated and the path from the initial state to the failing state is presented to the user as a counterexample. This algorithm may also be called reachability, because it enumerates the reachable states and additionally performs the formula validity check.
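The reachability check described above can be illustrated with a short sequential sketch. It is not the DiVinE implementation: the function name `check_safety`, the tuple encoding of states and the edges of the toy structure are assumptions made only for this illustration.

```python
from collections import deque

def check_safety(initial, successors, holds):
    """BFS reachability: return None when `holds` is true in every reachable
    state, otherwise the path from `initial` to the first violating state."""
    parent = {initial: None}          # parent pointers double as the visited set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not holds(state):
            path = []
            while state is not None:  # walk parent pointers back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Toy structure: states are (location of process 1, location of process 2, z).
# The edges below are hypothetical; only the four states come from the example.
EDGES = {
    ("a", "x", 0): [("a", "y", 0)],
    ("a", "y", 0): [("a", "y", 1)],
    ("a", "y", 1): [("b", "y", 1)],
    ("b", "y", 1): [],
}

def successors(state):
    return EDGES[state]

start = ("a", "x", 0)
print(check_safety(start, successors, lambda s: s[2] < 3))   # property z < 3
print(check_safety(start, successors, lambda s: s[2] == 1))  # property z = 1
```

Because the search is breadth-first, a reported counterexample is a shortest path to a violating state under this encoding.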
3. A survey of approaches to counterexample explanation

2. $s'_{k'} = s_k$

Further, let neg(t) be the set of all negatives with respect to a counterexample t.

Definition 5. A positive (with respect to t) is a finite path $t' = s'_0 \xrightarrow{\tau'_1} s'_1 \xrightarrow{\tau'_2} \dots \xrightarrow{\tau'_{k'}} s'_{k'}$ such that:
1. $c(s_{k-1}) = c(s'_{k'-1}) \wedge \tau_k = \tau'_{k'}$
2. $s'_{k'}$ is not in the set of virtual error states
3. $\forall t'' \in neg(t) : t'$ is not a prefix of $t''$

Again, let pos(t) be the set of all positives with respect to a counterexample t. Note that the last state of the counterexample, $s_k$, belongs to the set of virtual error states, so we are in fact not comparing states of the system, only the kind of error. The same applies to $\tau_k$, the action leading to the virtual state: the action corresponds to checking an assertion, so $\tau_k = \tau'_{k'}$ means that we are only interested in negatives which fail on the same assertion. As the method works on a single counterexample, we will omit the parameter of neg and pos.

We find only a subset of neg and a subset of pos. No attempt is made to enumerate the sets neg or pos entirely, as that could be computationally prohibitive and is not necessary. However, it also means that we are unable to fully check the condition that no positive is a prefix of a negative. It is sufficient for our purposes to check only the negatives that we generate.

The algorithm assumes that it can run a reachability procedure, which explores the state space from a given starting state and returns sets of positives and negatives. These are found by checking a safety property corresponding to the definitions; specifically, we are searching for a state which has the same control location as the state preceding the error state in the counterexample and which leads to the same error state (for negatives) or does not lead to an error state (for positives). Please note that the error state is actually a virtual state. The algorithm also requires that the reachability procedure can be limited to search only to a maximum depth and that it does not visit states that were already visited in preceding calls.
Finally, the procedure must be able to find multiple states satisfying the condition given above and return a path to each of them.

The following pseudocode shows the basic version of the algorithm searching for negatives and positives. The intuitive idea is that it starts a reachability procedure called MC from every state on the counterexample with a limited search depth and collects the reported negatives and positives. In the pseudocode, v denotes the set of visited states; it is used and also filled by MC to avoid visiting states repeatedly. n is the set of negatives found so far, p the set of positives and d is the maximum search depth parameter.

Algorithm 1: BasicNegPos
Input: counterexample t, maximum search depth d
    v := neg := pos := ∅;
    i := k − 1;
    while i ≥ 0 do
        (n, p, v) := MC(s_i, t, d, v);
        neg := neg ∪ n;
        pos := pos ∪ p;
        i := i − 1;
    end
    for t′ ∈ pos do
        for t″ ∈ neg do
            if t′ is a prefix of t″ then
                pos := pos \ t′
            end
        end
    end
    return (neg, pos)

The publication proposes one more algorithm which, when integrated into the MC procedure, can find some more negatives and positives. It is triggered in state s when the depth limit is reached. The algorithm, named Extension, tries to extend the current search in such a way that the control locations and actions match some suffix of t, the original counterexample. This means that d works like a kind of edit distance where the negative/positive is allowed to diverge from t for d steps, but then it must match t again.

In more detail, the algorithm first tries to find j such that $c(s_j) = c(s)$, i.e. some state $s_j$ on the counterexample whose control location matches that of s (the state where the depth limit is reached). When this j is found, it further tries to find a path from s such that the actions and control locations along this path match those on the counterexample starting from $s_j$.
Algorithm 2: Extension
Input: counterexample t, starting state s, set of already visited states v
Result: adds negatives and positives to the global sets
 1  j := i;
 2  while j < k do
 3      if c(s_j) = c(s) then
 4          s′ := s;
 5          l := j + 1;
 6          broken := false;
 7          while l < k ∧ ¬broken do
 8              if ∃s″ : s′ →(τ_l) s″ ∧ c(s″) = c(s_l) ∧ s″ ∉ v then
 9                  s′ := s″;
10              else
11                  broken := true;
12              end
13              l := l + 1;
14          end
15          if ¬broken then
16              if ∃s″ : s′ →(τ_k) s″ then
17                  if s″ is an error state then
18                      add transition sequence to s″ to negatives;
19                  else
20                      add transition sequence to s″ to positives;
21                  end
22              end
23          end
24      end
25      j := j + 1;
26  end
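As a rough illustration of Algorithm 1 (BasicNegPos), the sketch below collects negatives and positives and then filters the positives. It makes several simplifying assumptions: paths are tuples of states with actions omitted, `mc` is a hypothetical callable standing in for the depth-limited reachability procedure MC, and `toy_mc` returns made-up results purely to exercise the filtering step.

```python
def is_prefix(short, long):
    """True when path `short` is a proper prefix of path `long` (states only)."""
    return len(short) < len(long) and tuple(long[:len(short)]) == tuple(short)

def basic_neg_pos(trace, mc, depth):
    """Run `mc` from every counterexample state s_{k-1}, ..., s_0 and
    return the collected (negatives, positives)."""
    visited, neg, pos = set(), set(), set()
    for i in range(len(trace) - 2, -1, -1):          # i = k-1 down to 0
        n, p = mc(trace[i], trace, depth, visited)   # mc also fills `visited`
        neg |= n
        pos |= p
    # Drop every positive that is a prefix of some generated negative.
    pos = {p for p in pos if not any(is_prefix(p, n) for n in neg)}
    return neg, pos

def toy_mc(start, trace, depth, visited):
    """Hypothetical stand-in for MC: reports fixed paths when started in 's0'."""
    visited.add(start)
    if start == "s0":
        return {("s0", "u", "err")}, {("s0", "u"), ("s0", "w", "ok")}
    return set(), set()

negatives, positives = basic_neg_pos(("s0", "s1", "err"), toy_mc, depth=2)
print(negatives, positives)
```

In this toy run the candidate positive `("s0", "u")` is discarded because it is a prefix of the negative `("s0", "u", "err")`, which mirrors the final loop of Algorithm 1.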
Finally, we can try to convert a positive run into a negative one to find a change which could be the cause of the error.
Transition analysis
When some action appears in all negatives, but is missing from positives, it is very likely the cause of the error. Other properties, such as an action appearing in all positives, can be useful too. That is the intuition behind transition analysis. We build several sets of transitions and present them to the user as a possible indication of the error. In transition comparison, we ignore the data component. We say that a finite path contains $\langle c, \tau \rangle$ iff $\exists n < k : c(s_n) = c \wedge \tau_{n+1} = \tau$. The sets we calculate are given in the following table:

Set           Definition
trans(neg)    $\{\langle c, \tau \rangle \mid \exists t \in neg : t \text{ contains } \langle c, \tau \rangle\}$
trans(pos)    $\{\langle c, \tau \rangle \mid \exists t \in pos : t \text{ contains } \langle c, \tau \rangle\}$
all(neg)      $\{\langle c, \tau \rangle \mid \forall t \in neg : t \text{ contains } \langle c, \tau \rangle\}$
all(pos)      $\{\langle c, \tau \rangle \mid \forall t \in pos : t \text{ contains } \langle c, \tau \rangle\}$
only(neg)     $trans(neg) \setminus trans(pos)$
only(pos)     $trans(pos) \setminus trans(neg)$
cause(neg)    $all(neg) \cap only(neg)$
cause(pos)    $all(pos) \cap only(pos)$
The sets trans(neg) and trans(pos) are only intermediate results and are too large, so they are not reported to the user. The remaining sets are of greater interest. all(. . . ) contains the transitions which occur in all negatives or in all positives, respectively. only(. . . ) contains the transitions appearing only in negatives or only in positives. When this information is still too general, cause(. . . ) provides a further reduction and, when it is not empty, it can pinpoint the common behaviour which differentiates the positive and negative sets. See [11] for an example of transition analysis.
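The set definitions above translate almost directly into set operations. The following sketch assumes that each run is given as a list of (control location, action) pairs, one pair per step; the locations `l0`..`l2` and actions `t0`..`t3` are hypothetical, and returning the empty set for all(. . . ) over an empty collection of runs is a convention chosen here, not something the source specifies.

```python
def transition_analysis(neg, pos):
    """Compute the transition-analysis sets for two lists of runs.

    Each run is a list of (control_location, action) pairs, i.e. the pair
    <c(s_n), tau_{n+1}> for every step n of the run (data is ignored).
    """
    def trans(runs):
        # Pairs occurring on at least one run.
        return set().union(*map(set, runs))

    def in_all(runs):
        # Pairs occurring on every run; assumption: no runs gives the empty set.
        return set.intersection(*map(set, runs)) if runs else set()

    only_neg = trans(neg) - trans(pos)
    only_pos = trans(pos) - trans(neg)
    return {
        "all(neg)": in_all(neg),
        "all(pos)": in_all(pos),
        "only(neg)": only_neg,
        "only(pos)": only_pos,
        "cause(neg)": in_all(neg) & only_neg,
        "cause(pos)": in_all(pos) & only_pos,
    }

# Hypothetical runs, made up for this illustration.
negatives = [
    [("l0", "t0"), ("l1", "t1"), ("l2", "t3")],
    [("l0", "t0"), ("l1", "t1")],
]
positives = [
    [("l0", "t0"), ("l1", "t2"), ("l2", "t3")],
]
result = transition_analysis(negatives, positives)
print(result["cause(neg)"])
```

Here the pair `("l1", "t1")` occurs in every negative and in no positive, so it is the only element of cause(neg).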
Invariant analysis
For some programs, transition analysis based on the control flow of the program is unsuitable. Transition analysis ignores the data components of the runs, but in some cases the data may be the discriminant between negatives and positives. According to the authors, applying transition analysis to d(s) instead of c(s) faces the problem that only some variables may be relevant rather than the full valuation. Instead, data invariants in the negative runs are compared to invariants in the positive runs. The user chooses certain points in the program where the data invariants over negatives and over positives are calculated. These are then compared and presented to the user. As before, please refer to the original paper for an example of this approach.
model check device drivers written in C and has been successful in finding errors in real drivers. It uses abstraction to reduce the state space to a finite and manageable size and Counterexample Guided Abstraction Refinement (CEGAR) to refine the abstraction when the counterexample produced is invalid. Nevertheless, the method for error localization from the paper is described on explicit state spaces. The authors also describe how procedure calls are handled in generating the state space and how this affects their algorithm, but we will not go into such detail in this work.

The high-level overview of the method is as follows. When a safety violation is found, a counterexample is generated as usual, then a set of so-called correct transitions is found, and finally the cause is reported as the set of transitions on the counterexample minus the correct transitions. Then the control flow graph of the verified model is modified so as to remove the incorrect transitions. The whole process is repeated until no more causes are found.

The set of correct transitions is found by an algorithm which is in principle the same as reverse reachability. If we denote the erroneous state as v, the process is started from the states which have the same location as v but do not have an error. More specifically, a working set is established and initially it contains the correct states with the same location as v. Then, for each state from the working set, the states that have a transition into the current one are added to the working set and the transitions are collected as the correct transitions. The algorithm ends when the working set is empty, that is, when all backward-reachable states have been enumerated. The cause of the error is identified as
project(T ) \ project(C)
where T is the set of transitions on the counterexample, C is the set of correct transitions and the function project removes the variable valuations of the states of the transition and returns just a pair of locations. This is very similar to only(neg) from the previous paper.

The authors evaluated the method on Windows device drivers and checked for two properties. They received a total of 15 error traces in 8 drivers. In 11 of the traces the error was correctly localized, in 3 cases a change to the set difference formula was required and the last error was not localized because of the use of abstraction.
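The localization step described above (reverse reachability followed by project(T) \ project(C)) can be sketched on an explicit graph as follows. The state encoding as (location, valuation) pairs, the function names and the toy predecessor relation are assumptions made for this illustration and are not taken from the original tool.

```python
def correct_transitions(seeds, predecessors):
    """Reverse reachability from the error-free states `seeds` that share the
    location of the erroneous state; returns the set of visited transitions."""
    work, seen, correct = list(seeds), set(seeds), set()
    while work:
        state = work.pop()
        for pred in predecessors(state):
            correct.add((pred, state))
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return correct

def project(transitions):
    """Drop variable valuations and keep only (source, target) location pairs."""
    return {(src[0], dst[0]) for src, dst in transitions}

def localize(counterexample, correct):
    """Cause of the error: project(T) minus project(C)."""
    return project(counterexample) - project(correct)

# Toy model: states are (location, value of a single variable).
# Counterexample: (l0,0) -> (l1,0) -> (l2,0) -> (l4,0), where (l4,0) is the error.
T = {(("l0", 0), ("l1", 0)), (("l1", 0), ("l2", 0)), (("l2", 0), ("l4", 0))}
# Hypothetical predecessor relation of the error-free part of the state space.
PREDS = {("l4", 1): [("l3", 1)], ("l3", 1): [("l1", 0)],
         ("l1", 0): [("l0", 0)], ("l0", 0): []}

C = correct_transitions([("l4", 1)], lambda s: PREDS.get(s, []))
print(localize(T, C))
```

In this toy model the shared step from `l0` to `l1` is recognized as correct, so only the two location pairs unique to the failing run remain in the reported cause.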
is that we get the true most similar correct execution.

There is one more step before the differences are presented to the user. This step is called Δ-slicing and it removes the differences which are not relevant to the error. A typical example is filtering out input values, as they are always reflected in the program variables. Δ-slicing is a variant of static or dynamic slicing, an approach which tries to find which variables influenced a certain program point. Slicing is already extensively described in the literature. In this instance slicing can benefit from the use of a model checker, making the list of causes even smaller and more focused. To find the slice, the pseudo-Boolean solver is used to solve an optimization problem where the distance metric is the same, but the formula is modified so that a smaller slice is found. The new solution may not correspond to a possible run of the program; it is only used to reduce the size of the explanation. Please see the original paper for a more detailed description.

There was also an attempt to perform the search for a most similar correct run and the slicing in a single invocation of the pseudo-Boolean solver, but that turned out to be counterproductive. The main problem was that without already having a correct run, the slicing part cannot optimize against one specific run.

As for evaluation, the authors modelled the Traffic Collision Avoidance System (TCAS) from [8]. In some cases the explanation algorithm worked well, but they also encountered a problem where simply a different input was chosen to produce a correct run, which was unhelpful in resolving the error. After applying slicing to the explanation, almost nothing was left, which means that slicing was useful in that it removed the unhelpful differences.
The scores table in the article also contains a comparison with explain implemented in JPF and in this work, but due to differences between the two systems the comparison is of somewhat dubious value according to the authors.
found using other means. Furthermore, Static Single Assignment is not used, which means that we need to align the (abstract) states, but on the other hand there are no values from paths which were not executed. Using predicates instead of concrete values has the advantage that the users work with high-level concepts which hopefully match their intuition about the program, as opposed to concrete values for which they have to build an abstraction first.

To define a distance metric, we first need to align the two executions. The alignment is expressed using a logic formula and requires that only states with matching control locations are aligned. The other properties of the alignment are intuitive: a state has at most one aligned state, ordering is preserved and gaps are allowed. The formulas for the distance metric are rather involved, but the idea is that it adds the number of mismatched predicates and actions in aligned states to the number of unaligned states. The distance between two runs is then the minimum of this sum over all possible alignments.

To find the closest correct execution, again a formula is generated from the program under verification, then the constraints on alignment and the variables expressing the difference between the original counterexample and the searched correct execution are added to it. This is then processed by a pseudo-Boolean solver which minimizes the distance metric, which means that it finds a correct run with as few differences as possible together with an alignment. Furthermore, the execution found must be checked to ensure that it is not spurious, as the search operates in an abstracted environment. If it is spurious, a clause is added to the formula which forces generating a different trace and the process is repeated.

Results of comparing this newer method with the previous one are mixed. On one hand, abstract model checking has the advantage that it allows larger systems to be processed and that, at least intuitively, predicates provide more information to the user than concrete values. There was success on some programs and on some kinds of errors. On the other hand, tests of the new method on TCAS or µC/OS-II 2.0 failed to show an improvement.
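To make the alignment-based distance more concrete, the sketch below computes it for two given runs by dynamic programming. This is only an illustration of the metric as summarized above: the published method encodes the search for the closest correct run as a pseudo-Boolean optimization problem, whereas this sketch compares two fixed runs, ignores actions, and uses a hypothetical encoding of abstract states as (location, tuple of predicate values).

```python
def distance(run_a, run_b):
    """Minimum, over all order-preserving alignments, of
    (mismatched predicates in aligned states) + (number of unaligned states).
    Only states with equal control locations may be aligned."""
    n, m = len(run_a), len(run_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i                      # all of run_a unaligned so far
    for j in range(1, m + 1):
        dp[0][j] = j                      # all of run_b unaligned so far
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (loc_a, preds_a), (loc_b, preds_b) = run_a[i - 1], run_b[j - 1]
            best = min(dp[i - 1][j], dp[i][j - 1]) + 1   # leave one state unaligned
            if loc_a == loc_b:                           # alignment is allowed
                mismatch = sum(x != y for x, y in zip(preds_a, preds_b))
                best = min(best, dp[i - 1][j - 1] + mismatch)
            dp[i][j] = best
    return dp[n][m]

# Hypothetical abstract runs over two predicates.
failing = [("l1", (True, False)), ("l2", (True, True)), ("l3", (False, True))]
correct = [("l1", (True, False)), ("l3", (True, True))]
print(distance(failing, correct))
```

For these two runs the best alignment pairs the `l1` states and the `l3` states, leaving `l2` unaligned and one predicate mismatched.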
group contains the data signals. In verification of parallel software, the decisions of the process scheduler can be considered as controlling signals. The run of the system can be thought of as a two-player game where the environment is trying to steer the system into an error. At every state, the environment chooses values of the controlling variables and then the system chooses values of the non-controlling variables. The system then moves according to its transition relation. The model checker finds sets $L_i$ of states, where the states in $L_i$ are exactly those in which the system must make i mistakes to be forced into an error. Using these sets, a counterexample containing the fewest possible free segments can be found and presented to the user for analysis.
In this example, written in the syntax of DVE, we have two processes, A and B, one global 32-bit variable x and a variable count local to process A. Process A has two locations, q0 and q1, and similarly for B. The system is asynchronous. A state in DVE consists of a valuation of global variables, contents of buffered channels, locations of all processes and values of all local variables in all processes. Variables can be either integers (8 or 32 bits wide) or one-dimensional arrays. Channels in DVE can transfer multiple values at the same time, but we only consider buffered channels carrying one value at a time, because Daikon, the invariant detector used in invariant analysis, supports only one-dimensional arrays, which correspond to single-value buffered channels. A state in the example above can look like this:
x = 1
location of A = q1
A.count = 3
location of B = z0
Unlike Java, DVE is not a real programming language, so working with our implementation of the explanation may not be as accessible to regular programmers as JPF, which operates over a well-known language. That is a disadvantage, because the primary aim of counterexample explanation is to make using a model checker even easier. There have been attempts to build a new language for DiVinE, such as [13] (written in Czech), but as of the time of writing, none have been completed.
deadlock: The state has no successors. This happens when there is no enabled transition in any of the processes.

expression: A DVE expression given to DiVinE on the command line. The expression is evaluated in each state and the state is erroneous when it evaluates to false.

error: Occurs when the DVE generator encounters an error such as two synchronized transitions accessing the same variable.

DiVinE allows selecting which kinds of goals we are interested in using command line switches.

There are no virtual states (see 3.1) in DiVinE, so the definition of negatives and positives is changed to express the same meaning using different concepts. For a negative, we require the following:

$c(s_k) = c(s'_{k'})$ ∧ type of error in $s_k$ = type of error in $s'_{k'}$ ∧ (type of error in $s_k$ = assertion violation ⇒ assertion violated in $s_k$ = assertion violated in $s'_{k'}$).

In these definitions, unprimed symbols refer to the counterexample and primed symbols to the negative or positive, as usual. In our setting, $s_k$ is the last real state of the system and it corresponds to $s_{k-1}$ in JPF. We do not consider the action $\tau_k$ as in JPF, because, again, this action leads to the virtual state. Instead, we check that we end up with the same type of error and, in the case of assertion violation, that we deal with the same assertion in both cases.

A positive must satisfy the following conditions:

$c(s_k) = c(s'_{k'})$ ∧ $s'_{k'}$ has no error ∧ $\exists s'_{k'+1}$ such that $s'_{k'} \to s'_{k'+1}$, $s'_{k'+1}$ has no error and $s'_{k'+1}$ does not lie on a negative.
4. Our implementation in detail

Here we also omit the requirement that the last actions be the same. In JPF, the actions mean checking deadlock or assertion violation, but there is no counterpart for that in DiVinE. The requirement that the positive is not a prefix of a negative is strengthened and helps filter out more positives that may end in an error. However, as we do not always find all negatives, we may still end up with a positive that can continue to an erroneous state. That may or may not be a problem, depending on the kind of model and the kind of error. For example in ex1.dve, which models the first example from [11], the second positive may continue and end up in an erroneous state. This has not been detected, because no negative passes through the last state of the positive.

The following trails illustrate our requirements on positives and negatives. For a system with one process (whose control location is the first component) and one variable x (whose value is written in the second component) with an assertion that $x \neq 0$ in control location c, we may have the following counterexample:

(a, 0) → (a, 1) → (b, 1) → (c, 0)

a negative may look like this:

(a, 0) → (b, 0) → (c, 0)

a positive may be:

(a, 0) → (a, 1) → (b, 1) → (c, 1) → (d, 1)

On the other hand, the following run, which ends in a deadlock, cannot be a negative, because only runs with the same type of error as in the counterexample (assertion violation) are considered:

(a, 0) → (a, 1) → (b, 1) → (c, 2)
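The conditions on negatives and positives in the DiVinE setting can be checked mechanically on the example trails above. The sketch below is an illustration only: the function `classify`, the explicit `ERRORS` table and the representation of an error as a (kind, assertion) pair are assumptions, and a candidate positive is assumed to include its error-free successor state as the last element, as in the example.

```python
# Hypothetical error table for the toy system: (location, x) -> (kind, assertion).
ERRORS = {("c", 0): ("assertion", "x != 0"), ("c", 2): ("deadlock", None)}

def error_of(state):
    """Return the error of a state as (kind, assertion) or None."""
    return ERRORS.get(state)

def classify(trail, counterexample, negative_states=frozenset()):
    """Return 'negative', 'positive' or None for a candidate trail."""
    target_loc = counterexample[-1][0]
    cex_error = error_of(counterexample[-1])
    last = trail[-1]
    # Negative: same final location and same error; comparing the
    # (kind, assertion) pairs covers the "same assertion" requirement.
    if last[0] == target_loc and error_of(last) == cex_error:
        return "negative"
    # Positive: the second-to-last state matches the location and has no error,
    # and its successor has no error and does not lie on a known negative.
    if len(trail) >= 2:
        end, nxt = trail[-2], trail[-1]
        if (end[0] == target_loc and error_of(end) is None
                and error_of(nxt) is None and nxt not in negative_states):
            return "positive"
    return None

cex = [("a", 0), ("a", 1), ("b", 1), ("c", 0)]
print(classify([("a", 0), ("b", 0), ("c", 0)], cex))
print(classify([("a", 0), ("a", 1), ("b", 1), ("c", 1), ("d", 1)], cex))
print(classify([("a", 0), ("a", 1), ("b", 1), ("c", 2)], cex))
```

The third trail is rejected because its final state carries a different kind of error (deadlock) than the counterexample.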
t := find_counterexample();
ps := ns := ∅;
i := k − 1;
while i ≥ 0 do
    (p, n) := explain_step(s_i);
    ps := ps ∪ p;
    ns := ns ∪ n;
    (p, n) := extension_step();
    ps := ps ∪ p;
    ns := ns ∪ n;
    i := i − 1;
end
pos := extract_traces(ps);
neg := extract_traces(ns);
transition_analysis(pos, neg);
invariant_analysis(pos, neg);
transformation_analysis(pos, neg);
The basic algorithm remains in the same form, but the extension algorithm is changed. As opposed to the original approach of calling the extension algorithm whenever the basic algorithm crosses the maximum depth, we call it only after one step of the basic algorithm is done. The reason for this is the parallel infrastructure of DiVinE, which is based on BFS and does not easily allow a nested DFS search. Instead, we remember all the visited states behind the depth barrier and run the extension for these states in a batch. Both explain_step() and extension_step() return only the seed of a negative or positive, that is, the last edge of its trail. We find and store the entire trail in extract_traces() using parent pointers after the entire search is done. This is a requirement of the parallel BFS algorithm, in which the trail is not immediately available as it is in the case of DFS. It also means that we get trails which are close to being the shortest paths from the initial state to the seed.
explain_step()
This is a parallel BFS reachability starting from the state $s_i$ given as an argument. It checks the requirements for a negative in each state and the requirements for a positive on each edge. It marks states as visited to avoid seeing any state more than once. Thanks to this, all the calls to this function combined take time linear in the size of the graph, which is divided among the available processors on the machine (with some overhead). All states whose distance from $s_i$ exceeds the depth limit are stored in an array for use in the next step.
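A sequential sketch may help to picture the bookkeeping described above: a depth-limited BFS with a shared visited set, parent pointers for later trail extraction, and a collection of states behind the depth barrier. It is not the parallel DiVinE code. The names and signatures are hypothetical, seeds are simplified to states rather than edges, and leaving barrier states unmarked in the visited set is an assumption of this sketch.

```python
from collections import deque

def explain_step(start, successors, is_seed, depth, visited, parent):
    """Depth-limited BFS from `start`; returns (seeds, barrier states).
    `visited` and `parent` are shared across calls."""
    seeds, barrier = [], set()
    visited.add(start)
    parent.setdefault(start, None)
    queue = deque([(start, 0)])
    while queue:
        state, dist = queue.popleft()
        if is_seed(state):
            seeds.append(state)
        for nxt in successors(state):
            if nxt in visited:
                continue
            parent.setdefault(nxt, state)   # remember how we first got here
            if dist + 1 > depth:
                barrier.add(nxt)            # behind the depth barrier
            else:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return seeds, barrier

def extract_trace(parent, seed):
    """Rebuild the trail to `seed` by following parent pointers."""
    trace, state = [], seed
    while state is not None:
        trace.append(state)
        state = parent[state]
    return trace[::-1]

# Hypothetical graph: 0 -> 1 -> 2 -> 3 -> 4 and 1 -> 5; state 5 is a seed.
GRAPH = {0: [1], 1: [2, 5], 2: [3], 3: [4], 4: [], 5: []}
visited, parent = set(), {}
seeds, barrier = explain_step(0, GRAPH.__getitem__, lambda s: s == 5, 2, visited, parent)
print(seeds, barrier, extract_trace(parent, 5))
```

With a depth limit of 2 the search stops at state 2, records state 3 as lying behind the barrier, and the trail to the seed is recovered only afterwards from the parent pointers.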
extension_step()
Algorithm 4: extension_step
Input: the set E of states behind the depth barrier, counterexample t
Input: maximum depth d
Result: adds negatives and positives to the global sets
    foreach i ∈ E do
        Extension(i, d);
    end

This function runs the pseudocode of Algorithm 2 for each state that was remembered in explain_step(). Our implementation of Extension follows the pseudocode, but it uses the parallel BFS infrastructure available in DiVinE. In each state, it selects only one successor, using the CAS synchronization primitive. When it is no longer possible to follow the chosen path, no attempt is made to backtrack; the current search ends and we try the next alignment (algorithm Extension proceeds to the next iteration of the while loop on line 2).

The time complexity of this algorithm depends on the number of states in the set E, on the length of the counterexample t, on the number of matches between the input state and the counterexample and on the number of outgoing edges of each state visited. Let $m_i$, $i \in E$, be the number of matches of state i with the counterexample t. Next, let $D_{i,k}$, $i \in E$, $k \in \{1, 2, \dots, m_i\}$, be the sum of outdegrees of the states visited when processing state i and match number k. Then the time complexity of all calls to the function is

$\sum_{i \in E} \sum_{k \in \{1, 2, \dots, m_i\}} D_{i,k} \cdot |t|$

A simpler upper bound is

$|t| \cdot |t| \cdot d_m$

which uses the fact that the number of steps made by Extension for one starting state is never greater than the length of the counterexample times the maximal degree of states along the path ($d_m$ is the maximal degree of the graph). This part runs sequentially.
This causes the global and local copies of the variable to get out of sync and that is where an assertion is violated. The output of transition analysis restricted to the writer process looks like this:
ALL: (pos)
----------------------------------
<0.0> / q0 -> wait { guard writer==0; effect writer = 1; }
<0.2> / wait -> CS { guard X==0; effect X = 1, X_copy = 1; }
<0.3> / CS -> q0 { effect writer = 0; }
ALL: (neg)
----------------------------------
<0.0> / q0 -> wait { guard writer==0; effect writer = 1; }
<0.1> / wait -> CS { guard X==1; effect X_copy = 0; }
<0.2> / wait -> CS { guard X==0; effect X = 1, X_copy = 1; }
<0.3> / CS -> q0 { effect writer = 0; }
ONLY: (pos)
----------------------------------
ONLY: (neg)
----------------------------------
<0.1> / wait -> CS { guard X==1; effect X_copy = 0; }
CAUSE: (pos)
----------------------------------
CAUSE: (neg)
----------------------------------
<0.1> / wait -> CS { guard X==1; effect X_copy = 0; }
We can see that the transition wait -> CS which forgot to set X to the new value is present in all(neg), which means that it was found in all the negatives; it is also in only(neg), which means that it was not found in any of the positives. Together, these two facts place the transition in cause(neg).
specify relations between the variables to avoid generating invariants comparing variables which have no semantic relation. As DiVinE does not have this information, we do not use this feature of Daikon. According to an email from the authors of JPF, their tool let the user specify these relations by hand.

The following is the output of invariant analysis on our example model. Some lines were removed for brevity. We can see that the invariant stating that the local and global copies of X are equal is present in the positives, but not in the negatives.
instrument0.neg:::POINT
X == 1
readers one of { 0, 1 }
writer one of { 0, 1 }
r one of { 0, 1 }
===========================================
instrument0.pos:::POINT
X == W_0.X_copy
X == 1
readers one of { 0, 1 }
writer one of { 0, 1 }
r one of { 0, 1 }
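The comparison of the two invariant sets can be automated with a few lines of code. The sketch below parses the excerpt shown above and reports the invariants that hold only in the positives or only in the negatives. The parser `parse_points` is a hypothetical helper that handles just this excerpt; it is not a parser for the full Daikon output format.

```python
def parse_points(report):
    """Split the report into {program point name: set of invariant lines}."""
    points, current = {}, None
    for line in report.splitlines():
        line = line.strip()
        if not line or set(line) == {"="}:
            continue                      # skip blank lines and separators
        if ":::" in line:
            current = points.setdefault(line, set())
        elif current is not None:
            current.add(line)
    return points

REPORT = """\
instrument0.neg:::POINT
X == 1
readers one of { 0, 1 }
writer one of { 0, 1 }
r one of { 0, 1 }
===========================================
instrument0.pos:::POINT
X == W_0.X_copy
X == 1
readers one of { 0, 1 }
writer one of { 0, 1 }
r one of { 0, 1 }
"""

points = parse_points(REPORT)
neg_invariants = points["instrument0.neg:::POINT"]
pos_invariants = points["instrument0.pos:::POINT"]
print("only in positives:", pos_invariants - neg_invariants)
print("only in negatives:", neg_invariants - pos_invariants)
```

On this excerpt the only distinguishing invariant is `X == W_0.X_copy`, which holds in the positives and is absent from the negatives.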
The positive part is empty, which means that the transformation suggests removing the subtrace presented as the negative part, which contains the erroneous transition.
can be considered as sequential. The analyses of the traces are sequential as they are not computationally intensive. See 5.3 for actual run times on different kinds of models. As for space complexity, no part of the program uses more than a constant amount of space per state, which gives space complexity linear in the number of states.
5. Algorithm Evaluation
We ran the algorithm on several models with several kinds of errors to evaluate its ability to help with locating the errors. Some of the models were taken from the BEEM [5] database, some of them are models from the Software-artifact Infrastructure Repository [8] rewritten in DVE and the rest of them were created during the evaluation. This section contains the list of tested model instances along with a short description of the model, of the error and of the output of the explain algorithm. At the end of each evaluation, there is a one-word summary. It can be one of the words useless, touches, notbad and great, which represent four levels of usefulness for localising the error. The complete information about all the runs, including the exact input and output, command line parameters and run time, can be found on the attached CD. See Appendix B for information about how to use the data. Only the runs where just one worker is used are repeatable; with multiple parallel workers the code is no longer deterministic and may find a different counterexample, positives and negatives. At the end of the section, we provide a summary of the results.
5.1. Outputs
5.1.1. lamport_47.244.dve
This is a DVE model of Lamport's Mutual Exclusion Protocol [15] taken from BEEM.
Error description:
After several steps, the model ends up in a deadlocked state because of the error. The problem lies in the transition q2 -> q22 {guard y != 255;effect b[x]=0;} of every process. The correct effect is b[i] = 0, where i is the index of the process. The error occurs when x is different from i, which happens when another process tries to enter the critical section (CS).
Evaluation
Transition analysis The analysis outputs many transitions and the known offender is buried among them.
Invariant analysis Instrumentation points were chosen based on knowledge of the error in the model, but were not specified overly accurately. There is a large number of invariants, but none of them provides any useful insight.
Transformation analysis The three smallest transformations are all symmetric and propose removing certain transitions from the negative. Unfortunately, this analysis does not help in finding the problem either.
Summary: touches
5.1.2. lamport_47.244.dve
This is another run of explain on the same model with the same error. This time we found a different counterexample and the analysis for it is more successful.
Evaluation
Transition analysis This analysis outputs the one problematic transition in only(neg), along with another transition. This is a good result, since only(neg) is an important group and the signal-to-noise ratio in the group is good.
Invariant analysis As before, there are too many invariants and the output is not helpful.
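The group names used throughout the evaluations (any, all, only, cause) refer to sets of transitions computed from the negatives and the positives. As a reminder of how the groups relate to each other, here is a hedged sketch in Python. It assumes the set-based definitions of transition analysis from Groce and Visser [11]: only(neg) is what occurs in some negative but in no positive, and cause(neg) is the part of only(neg) that occurs in every negative. The function name and the transition labels are hypothetical.

```python
def transition_groups(negatives, positives):
    """Compute transition-analysis groups from lists of traces.

    Each trace is a list of transition labels. The definitions are an
    assumption based on the usual set formulation, not DiVinE code.
    """
    neg_sets = [set(trace) for trace in negatives]
    pos_sets = [set(trace) for trace in positives]
    any_neg = set().union(*neg_sets)
    any_pos = set().union(*pos_sets)
    # set.intersection needs at least one argument, hence the guards.
    all_neg = set.intersection(*neg_sets) if neg_sets else set()
    all_pos = set.intersection(*pos_sets) if pos_sets else set()
    only_neg = any_neg - any_pos
    only_pos = any_pos - any_neg
    return {
        "any(neg)": any_neg, "any(pos)": any_pos,
        "all(neg)": all_neg, "all(pos)": all_pos,
        "only(neg)": only_neg, "only(pos)": only_pos,
        "cause(neg)": all_neg & only_neg,
        "cause(pos)": all_pos & only_pos,
    }


# Hypothetical traces: every negative takes "bad", every positive takes "good".
negatives = [["t1", "bad", "t3"], ["t2", "bad"]]
positives = [["t1", "good", "t3"], ["t2", "good"]]
groups = transition_groups(negatives, positives)
print(sorted(groups["only(neg)"]), sorted(groups["cause(neg)"]))
```

Under these definitions, a small only(neg) or cause(neg) group is the most useful output, which is why the evaluations single those groups out.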
5.1.3. msmie_13.65.dve
This is a model of a protocol for communication between processors in a real-time control system. The model for this protocol in BEEM has an error, which causes it to deadlock.
Error description:
When studying the model itself and the location of the deadlock, it is rather easy to see where the error lies. The master process template is waiting in state change for the first element of b to become MASTER where clearly the intention is to wait for any element of the array to become MASTER. A similar error occurs in the transition from no_readers to change.
Evaluation
The algorithm finds no positives, likely because all candidates are filtered out as they can continue into a negative. With no positives, none of the three analyses can be used, but at least we know that there must be a fundamental error which does not allow the existence of a positive.
Summary: useless
5.1.4. train-gate_53.277.dve
This is a train gate controller which controls which of the several trains can cross a bridge. The model comes from BEEM.
Error description:
There is a deadlock caused by incorrect translation of the system into DVE. This model uses a global variable e together with messages sent via global channels. The recipient of the message is stored in the variable e. However, a race condition causes the variable to be out of sync.
Invariant analysis We placed an instrumentation point at the end of all negatives. As usual, Daikon outputs a large number of invariants, which is hard for a user to process. However, we can see that in all negatives e = 7 (e is incorrectly represented as an array from which only the first element is used) and in all positives e = 1. This may be useful, because the error is a race condition on this variable.
5.1.5. rw_1.dve
An implementation of the readers/writers lock with readers' preference. In this model variant, we have only one reader, to make the negatives, and consequently the error analysis, simpler.
Error description:
The error lies in the reader not releasing the writer lock when it is done, causing a deadlock on the next operation.
Invariant analysis This type of analysis is not suitable for this problem, so we skipped it.
5.1.6. pushparent_1.dve
We created a model for a function which adds an element to a set represented by a fixed-size array. The C code for the function looks like this:
1 for ( size_t ix = 0; ix < NUM_PARENTS; ix ++ ) {
2     if ( parents[ ix ] == n ) return false;
3     if ( parents[ ix ].valid() ) continue;
4     parents[ ix ] = n;
5     return true;
6 }
The array should contain all elements inserted into it and no element should be repeated. This code is correct, but there is an artificial error in the model.
Error description:
The first error is omitting the return statement on line 5. It causes the element to be inserted multiple times into the set.
Evaluation
Transition analysis Although the erroneous transition is printed in the group all(neg), that group contains almost all transitions of the model, so this is not very helpful. The more specific groups are either empty or unhelpful.
Invariant analysis This type of analysis is not suitable for this problem, so we skipped it.
Transformation analysis All three transformations suggest replacing the sequence of actions which adds a 1 (P:{1|1|0} is the incorrect state in our counterexample) with a sequence which starts to add a 2 and thus avoids the problem. Unfortunately, it does not lead any closer to the problematic transition.
Summary: useless
5.1.7. pushparent_2.dve
Error description:
The second error is exchanging lines 2 and 3, so an element already present in the set can be overlooked and inserted at the end of the list.
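To make the two seeded errors of pushparent_1.dve and pushparent_2.dve concrete, the following Python sketch ports the C function and both faulty variants. It is an illustration only, not the DVE model: an empty slot is represented by None, NUM_PARENTS is set to 3, and the return value after the loop is an assumption, since the text does not say what happens when the loop ends.

```python
NUM_PARENTS = 3
EMPTY = None  # stands in for a slot whose valid() is false


def push_correct(parents, n):
    for ix in range(NUM_PARENTS):
        if parents[ix] == n:
            return False          # already present
        if parents[ix] is not EMPTY:
            continue              # occupied by another element
        parents[ix] = n
        return True
    return False                  # assumed behaviour when the array is full


def push_missing_return(parents, n):
    """First error: the return on line 5 is omitted."""
    for ix in range(NUM_PARENTS):
        if parents[ix] == n:
            return False
        if parents[ix] is not EMPTY:
            continue
        parents[ix] = n           # the loop continues and fills further slots
    return False                  # assumed; the result is not described


def push_swapped_checks(parents, n):
    """Second error: lines 2 and 3 are exchanged."""
    for ix in range(NUM_PARENTS):
        if parents[ix] is not EMPTY:
            continue              # skips occupied slots before comparing
        if parents[ix] == n:
            return False          # can no longer trigger for a stored element
        parents[ix] = n
        return True
    return False


ok = [EMPTY] * NUM_PARENTS
push_correct(ok, 1)
push_correct(ok, 1)               # second insert is rejected

dup1 = [EMPTY] * NUM_PARENTS
push_missing_return(dup1, 1)      # 1 is written into every empty slot

dup2 = [1, EMPTY, EMPTY]
push_swapped_checks(dup2, 1)      # the existing 1 is overlooked
print(ok, dup1, dup2)
```

In both faulty variants the set ends up containing a repeated element, which is the property the model's assertion checks.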
Evaluation
Transition analysis This error is rather complex, as it changes three transitions, so it is not surprising that transition analysis does not output anything useful.
Transformation analysis The transformations only suggest changing the initial state of the set so that adding the element does not result in an assertion violation.
Summary: useless
5.1.8. rw_2.dve
Error description:
The error lies in the reader not releasing the writer lock when it is done, causing a deadlock on the next operation.
Evaluation
Transition analysis At first glance, the first interesting group (only(neg)) contains many transitions, but they are just two copies of the reader's actions, one for the first reader and one for the second. It provides the same information as rw_1.dve.
Transformation analysis As with rw_1.dve, the transformation analysis suggests removing the reader's actions altogether.
Summary: touches
5.1.9. bakery_err1.dve
The bakery lock for 2 participants, coming from BEEM.
Error description:
The error is located in choose -> choose { guard j<2 and number[j]>max; effect max = number[j], j = j + 2;}, where we should increment just by 1.
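The effect of the wrong increment can be seen in a small Python sketch of the loop that computes the maximum ticket number. This is an illustration, not the BEEM model: the starting values max = 0 and j = 0 and the second branch, which advances j by 1 when number[j] is not larger than max, are assumptions about the surrounding transitions.

```python
def choose_max(number, step_on_update):
    """Scan number[] for its maximum.

    step_on_update is the increment applied to j in the branch that
    updates max: 1 in the correct model, 2 in the faulty one.
    """
    max_seen = 0
    j = 0
    while j < len(number):
        if number[j] > max_seen:
            max_seen = number[j]
            j += step_on_update   # the faulty model uses j = j + 2 here
        else:
            j += 1                # assumed companion transition
    return max_seen


number = [1, 3]
correct = choose_max(number, 1)   # examines both entries
faulty = choose_max(number, 2)    # jumps over number[1]
print(correct, faulty)
```

With the faulty step the process can pick a ticket that is not larger than every ticket already taken, which breaks the ordering the bakery lock relies on.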
Evaluation
Transition analysis Here the transition analysis found the culprit and put it right into the cause(neg) group. Excellent.
Transformation analysis All the transformations suggest changing the action where j is incremented by 2 to incrementing just by 1 by using different values in the number[] array. The error is rather severe and the system in location choose can choose (deterministically) between two transitions, one correct and one incorrect, thus making transition analysis easier.
Summary: great
5.1.10. airline_5_2.dve
This is the airline model from Software-artifact Infrastructure Repository recreated in the DVE language.
Evaluation
This is an example of a model which cannot be used with the explanation algorithm. The definitions of both negatives and positives require the last locations to be the same, but in this model, having the same locations means that we have reached the error. Therefore no positives can be found. The same problem occurs in models of mutual exclusion protocols where we are looking for the goal P1.CS && P2.CS. No positives can exist with the same locations as the counterexample, whose locations are P1.CS and P2.CS.
Summary: useless
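The reason why no positives exist can be shown with a short Python sketch. It assumes, as stated above, that a run is a candidate only if it ends in the same process locations as the counterexample, and that it counts as a negative if its last state satisfies the error goal and as a positive otherwise. The state representation and all names are hypothetical.

```python
def split_runs(end_states, counterexample_locations, violates):
    """Classify the final states of runs into negatives and positives."""
    negatives, positives = [], []
    for state in end_states:
        if state["locations"] != counterexample_locations:
            continue  # only runs ending in the same locations are candidates
        if violates(state):
            negatives.append(state)
        else:
            positives.append(state)
    return negatives, positives


end_states = [
    {"locations": ("CS", "CS"), "x": 0},
    {"locations": ("CS", "CS"), "x": 1},
    {"locations": ("CS", "wait"), "x": 0},
]
target = ("CS", "CS")


def location_goal(state):
    # Goal defined purely by locations, as in P1.CS && P2.CS.
    return state["locations"] == ("CS", "CS")


def data_goal(state):
    # Goal that also depends on data, so equal locations do not imply an error.
    return state["x"] == 1


neg_loc, pos_loc = split_runs(end_states, target, location_goal)
neg_data, pos_data = split_runs(end_states, target, data_goal)
print(len(pos_loc), len(pos_data))
```

When the goal is a condition on locations only, every candidate is a negative by construction. A goal that refers to variables leaves room for positives.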
5.1.11. nested_monitor_5.dve
This is the nested_monitor model from Software-artifact Infrastructure Repository recreated in the DVE language.
Error description:
The locking of buf_lock is superfluous and it causes the system to deadlock when a consumer tries to read before anything is written.
Evaluation
Transition analysis There is just one negative, the original counterexample, which has only one action, so there is not much to compare.
Invariant analysis There is no data in this model, except the synchronization primitives, so invariant analysis is unsuitable for this task.
Transformation analysis All the transformations suggest first inserting some elements into the buffer, because then the consumer does not deadlock when trying to get a value. The only information this provides is that it is the consumer that blocks.
Summary: useless
5.1.12. array_partition_4.dve
This is the array_partition model from Software-artifact Infrastructure Repository recreated in the DVE language.
Error description:
One of the transitions, go_up -> go_up { guard lo <= hi && a[lo] <= pivot; effect lo = lo + 2 }, is incorrect: it should add just 1 to lo. This way it skips some elements. Specifically, in this run it skips the 1 at position 1, ending up with an array that is not divided by the pivot (0) into two halves.
Evaluation
All the positives have such initial values of the array that the error does not occur.
Transition analysis The only nonempty groups are all(neg) and all(pos); both contain, among others, the problematic transition. The other groups are empty, which follows from the kinds of positives that were found. No useful information here.
5.1.13. array_partition_4b.dve
Error description:
The error is located in the transition swap -> ini2 { guard lo > hi; effect tmp = a[hi],a[hi] = a[lo], a[lo] = tmp; } where there is < instead of a >. This causes a deadlock in cases where lo < hi, because no transition is defined for that case.
Evaluation
Transition analysis The group only(pos) contains just two transitions, one of them being the incorrect one. This may imply that the positives managed to avoid the deadlock by taking this transition, and one might infer that the condition has a problem. But it is certainly not a direct line of thought.
Invariant analysis We placed one instrumentation point before the algorithm starts. As in the previous run, the positives differ from the negatives in selecting an initial configuration which does not cause the error to happen. Unfortunately, the invariants are unable to capture this or provide any insight.
5.1.14. ConcurrentLinkedQueue.dve
This is a model of the offer method from the class java.util.concurrent.ConcurrentLinkedQueue from OpenJDK, obtained from
http://hg.openjdk.java.net/jdk7/jdk7/jdk/raw-file/00cd9dc3c2b5/src/share/classes/java/util/concurrent/ConcurrentLinkedQueue.java
The method is faithfully translated into DVE, using some m4 macros to make the translation process faster.
Error description:
This version of the model has an error where we use a non-atomic compare-and-set instead of p.casNext(null, n), causing a race condition. The model checker finds a counterexample where this race condition occurs and the system ends up in an incorrect memory state.
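The race can be reproduced deterministically by writing out one bad interleaving by hand. The Python sketch below is an illustration under simplifying assumptions, not a translation of the DVE model: it uses a bare linked node, two enqueuing processes, and helper names (racy_offer, cas_next) that are hypothetical. The compare half and the set half of the operation are separate steps in the faulty variant.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.next = None


def racy_offer(tail, a, b):
    """Two processes run a split compare-and-set on tail.next, interleaved."""
    # Step 1, both processes: the compare half, separated from the set half.
    a_saw_null = tail.next is None
    b_saw_null = tail.next is None
    # Step 2: each process acts on its now possibly stale comparison.
    if a_saw_null:
        tail.next = a
    if b_saw_null:
        tail.next = b  # overwrites the link to a, so node a is lost
    return a_saw_null, b_saw_null


def cas_next(node, expected, new):
    """Atomic variant: compare and set happen as one indivisible step."""
    if node.next is expected:
        node.next = new
        return True
    return False


tail = Node("head")
a, b = Node("a"), Node("b")
both = racy_offer(tail, a, b)

tail2 = Node("head")
first = cas_next(tail2, None, a)   # succeeds
second = cas_next(tail2, None, b)  # fails, the caller would have to retry
print(both, tail.next.item, first, second, tail2.next.item)
```

In the faulty interleaving both processes believe their insert succeeded, yet only one node is reachable from the tail, which matches the incorrect memory state described above.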
Evaluation
Transition analysis This analysis outputs many transitions in the any groups and some transitions in only(pos). From there we can see that the negatives never take the first or third branch in the innermost if statement (the compound one, with three branches and two ifs). So the negatives always take the middle branch, because p.next == null, for all processes. This might be a hint at the race condition, but probably not a very strong one.
Invariant analysis Not used.
Transformation analysis The transformations suggest a different interleaving of parts of negatives and positives, correctly indicating the race condition.
Summary: notbad
5.1.15. ConcurrentLinkedQueue_2.dve
Error description:
There is an error when accessing memory: instead of looking at location x, we look at x + 2 in the action p = succ(p);
Evaluation
Transition analysis This analysis shows 6 transitions in only(neg), 3 for the second and the same 3 transitions for the third process, so the user only has to analyze 3 transitions if they notice this symmetry. The three transitions represent one branch of the if statement, the one where the problem is located, and indeed, one of them is the guilty transition. We can conclude that this analysis was successful.
Invariant analysis Not done.
Transformation analysis Corresponds to transition analysis: the transformations suggest replacing a trace containing the erroneous transition with one which does not contain it.
Summary: notbad
5.1.16. ConcurrentLinkedQueue_3.dve
Error description:
This version of the model also contains the iterator, with an error where nextItem is not set before returning an element.
Evaluation
Transition analysis The transitions given in only(pos) and only(neg) are not relevant to the error.
Invariant analysis The list of invariants is longer than the model itself, so it is not helpful in any way. We can notice, however, that I_0.count is always smaller than 3 in the positives, which means that the assertion is true, because the left side of the implication is never true. In this case, this means that the positive is still an incorrect run of the system.
5.1.17. rw_3.dve
Error description:
In this version, we do not lock the r mutex atomically, which gives an opportunity for a race condition. This results in two processes being in the CS at the same time.
Transformation analysis The first two transformations use positives that also contain the error, and they do not contain the problematic place at all. The last one uses a good positive, but it does not show the race condition that occurs.
Summary: useless
5.1.18. rw_4.dve
Error description:
The writer does not acquire the writer lock atomically, leaving an opportunity for a race condition.
Evaluation
Transition analysis Only(neg) contains two transitions, but they are not related to the error. All(neg) contains the two erroneous transitions, but they are not immediately noticeable among the total of 5 transitions.
5.1.19. rw_5.dve
Error description:
Here two lines are exchanged: readers = readers + 1 and if (readers==1) writer.lock().
Evaluation
Transition analysis There is one transition in cause(neg), but it is not relevant to the problem. Only(neg) contains some more transitions, one of which is the exchanged action, but it is not obvious from the listing that the problem is there.
5.1.20. rw_6.dve
Error description:
The error in this model was designed specifically to be detected by transition analysis. In the writer process, there are two transitions that are almost equal, except for the error. The system can take either of them, making it possible to find a positive with the correct transition and a negative with the incorrect transition.
Evaluation
Transition analysis The algorithm correctly identifies the erroneous transition and shows it in cause(neg).
Transformation analysis All the transformations suggest removing the sequence of actions of the writer where it uses the incorrect transition.
Summary: great
5.1.21. rw_7.dve
Error description:
In this version, we have omitted the if (readers == 0) condition before releasing the writer lock. This causes a violation of the mutual exclusion property.
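The consequence of dropping the condition can be traced with a sequential Python sketch of the reader protocol. It is a simplification and not the DVE model: locks are plain flags, the scenario is one fixed schedule with two readers, and the entry protocol (increment readers, then lock the writer lock when readers == 1) follows the lines quoted for rw_5.dve. All names are hypothetical.

```python
class ReadersWriterState:
    def __init__(self):
        self.readers = 0
        self.writer_lock_held = False


def reader_enter(state):
    state.readers += 1
    if state.readers == 1:
        state.writer_lock_held = True   # the first reader locks out writers


def reader_exit(state, check_last_reader=True):
    state.readers -= 1
    # The correct model releases the lock only when the last reader leaves.
    if not check_last_reader or state.readers == 0:
        state.writer_lock_held = False


def writer_may_enter(state):
    return not state.writer_lock_held


def scenario(check_last_reader):
    """Two readers enter, one leaves; report readers left and writer access."""
    state = ReadersWriterState()
    reader_enter(state)
    reader_enter(state)
    reader_exit(state, check_last_reader)
    return state.readers, writer_may_enter(state)


correct = scenario(check_last_reader=True)
faulty = scenario(check_last_reader=False)
print(correct, faulty)
```

In the faulty schedule one reader is still inside while the writer lock is already free, so a writer can enter alongside it.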
Evaluation
Transition analysis The group only(pos) contains one transition, but it is not the one with the problem. However, it works with the writer mutex, which may indicate that there is something wrong with that mutex, which is true.
5.1.22. train_gate_corrected.dve
A corrected version of the train_gate model where we fixed the problem with the global variable e.
Error description:
There is a race condition caused by incorrect modelling of the system. When the gate is in state S6, where it got an appr signal and is about to send a stop signal, the train continues to move and crosses the bridge. The gate is then unable to send the stop signal. In the Uppaal model, the state S6 is committed, which does not allow the train to move until the gate leaves S6. DVE also has committed states, but the semantics is different and it cannot be used.
Evaluation
This run was examined without prior knowledge of the error.
Transition analysis The analysis shows that in positives train 5 arrives before train 1, whereas in negatives it happens in the opposite order.
5.1.23. termination.dve
An incorrect attempt at constructing a termination detection algorithm for a situation where two processes communicate using shared memory. The goal of the processes is to set all values in their portion of the memory to 0 while the processes walk through the memory sequentially. At any time, one of the processes can put a 1 in the other process's memory. Termination is detected by remembering whether this happened during the last sweep.
Error description:
When nothing is changed in the partner process's memory during a round, we detect termination. However, this is not enough, as we may still need to act upon the 1s our process received during the current round. This error is rather hard to detect, so we do not expect explain to be very successful.
Evaluation
The explain algorithm finds only two positives, so it does not have much information for analysis.
5.1.24. nested_monitor_5a.dve
Error description:
Producer handles the queue incorrectly, decreasing both the empty and full counters.
Evaluation
Transition analysis Almost all groups except all(neg) and all(pos) are empty, which makes this analysis useless.
Transformation analysis The transformations suggest removing a part of the execution where the variable e_mpty reaches 0, which causes the deadlock. That sequence of actions is inevitable though, so transformation analysis is not helpful either.
Summary: useless
5.1.25. nested_monitor_5b.dve
Error description:
In this version, the consumer does not use an atomic increment of the variable e_mpty. This naturally leads to incorrect values and eventually to a deadlock.
Evaluation
Transition analysis Almost all groups except all(neg) and all(pos) are empty, which makes this analysis useless.
Transformation analysis Even though this is a race condition, all transformations just suggest removing a part of the execution, making the analysis unhelpful.
Summary: useless
5.2. Summary
The complete statistics for the one-word summaries are given in the table. In most cases, explanation is unsuccessful.

evaluation   count   %
useless      12      48 %
touches       8      32 %
notbad        3      12 %
great         2       8 %
total        25     100 %
All of the successful runs except ConcurrentLinkedQueue.dve found the error using transition analysis. That was possible because the models contained an alternative to the erroneous transition and therefore the error could be found by comparing transitions from positives with those from negatives. In rw_6.dve this kind of error was inserted on purpose; in the other cases the errors just happened to be this way. In ConcurrentLinkedQueue.dve the error is a race condition. In this case, transformation analysis was able to find it. Similarly in rw_4.dve. In the remaining runs with a race condition, explain was not able to find the cause. In train_gate_corrected.dve the error lies in incorrect modelling of the system and the transformations given by the analysis do not point to the real error. The remaining runs containing a race simply did not show any useful transformation. Either the positives used were actually also wrong, or they were not similar enough to show the problem. Maybe a more specific condition for selecting the positives could help.
Failing positives Positives may pass through the error location and still be useful for diagnostics, but in some cases it does not work very well. This is the case of rw_4.
Positive misses the problem Positives may also completely circumvent the problem and thus be useless for any comparison with negatives. This is a very subtle distinction and it depends highly on the model and the kind of error it has. Occurs in pushparent_1, pushparent_2, ConcurrentLinkedQueue_3, rw_3, rw_7, termination, nested_monitor_5b.
Too many invariants The tool outputs so many invariants that even though some of them may be insightful, it takes a lot of work for the user to consider them all. This happens in practically all models where invariant analysis was used.
Too many transitions Transition analysis outputs too many transitions. This is similar to the previous problem except that it does not happen so often. Occurs only in lamport_47.244.
Transition and transformation analysis say the same thing This is not a problem per se, it just means that transformation analysis brings no new information. Occurs in train-gate_53.277, rw_1, ConcurrentLinkedQueue_2, train_gate_corrected.
Symmetric transition analysis Transition analysis prints very similar transitions, usually belonging to multiple instances of the same process. If the algorithm had information about multiple instantiations of a process template, this could be handled automatically. Occurs in rw_2, termination, ConcurrentLinkedQueue_2.
5.3. Performance
We measured the run time of the program on a multicore machine using three different models and different numbers of CPU cores. Table 5.1 contains all the results. It contains run times of parts of the entire algorithm in seconds. The parts not included in the results, such as Extension, Transformation and Transition analyses, were skipped because their run time was negligible. See the complete run results on the CD for complete information. The lamport model ran out of memory when run in a two-core configuration, possibly due to a different order of exploration. The time in the column explain is the time taken to find negatives and positives by the basic algorithm. The time in the column inv. a. is the duration of invariant analysis, dominated by invoking Daikon. We used a machine with a quad-core Intel Xeon 5130 CPU at 2.00 GHz and with 16 GB of RAM to perform the measurements. Search depth was set to 100 and up to 50 positives and 50 negatives were found.

Table 5.1
                        1 core             2 cores            4 cores
model                   explain  inv. a.   explain  inv. a.   explain  inv. a.
train-gate_53.277.dve   26.16    2.12      14.11    2.42      9.85     2.63
cambridge_18.92.dve     22.18    2.3       8.61     2.58      6.67     2.25
lamport_47.247.dve      61.87    1.81      -        -         25.1     1.92
Table 5.2 gives an idea of the sizes of the models. We can see that simple reachability is noticeably faster than the search for positives and negatives.

Table 5.2
model                   states    1 core reachability
train-gate_53.277.dve   5904140   5.14
cambridge_18.92.dve     3354295   0.12
lamport_47.247.dve      8717688   2.7
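To read the explain times of Table 5.1 as parallel speedups, each multicore time can be divided into the single-core time of the same model. The short Python sketch below does only this arithmetic. The numbers are the explain times in seconds from Table 5.1, the two-core entry for the lamport model is left out because that run ran out of memory, and the function name is hypothetical.

```python
# Explain times in seconds, keyed by model and number of cores (Table 5.1).
explain_seconds = {
    "train-gate_53.277.dve": {1: 26.16, 2: 14.11, 4: 9.85},
    "cambridge_18.92.dve": {1: 22.18, 2: 8.61, 4: 6.67},
    "lamport_47.247.dve": {1: 61.87, 4: 25.1},  # no result for 2 cores
}


def speedups(times):
    """Return {cores: single-core time / time on that many cores}."""
    base = times[1]
    return {cores: base / t for cores, t in times.items() if cores != 1}


for model, times in explain_seconds.items():
    for cores, factor in sorted(speedups(times).items()):
        print(f"{model}: {cores} cores, speedup {factor:.2f}")
```

These ratios cover only the search for positives and negatives. The invariant analysis column stays roughly constant across core counts, since it is dominated by the external Daikon invocation.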
Specific arrangements were necessary to obtain repeatable results. It was not sufficient to simply run the algorithm on the same input, because when multiple cores are used, states are processed in a nondeterministic order, which is likely to cause the program to find a different counterexample and make the explanation completely different. To mitigate this problem, we stored the counterexample found in the single-core case and used it for subsequent parallel runs. For this reason, reachability times for multicore runs have no informative value and we only include the time for Explain. This approach required a change to the program code, which can be found in a patch called multicore_provisions on the CD.
Bibliography
[1] Fadi A. Aloul, Arathi Ramani, Igor L. Markov, and Karem A. Sakallah. PBS: A backtrack search pseudo boolean solver. In Symposium on the Theory and Applications of Satisfiability Testing (SAT), pages 346–353, 2002.
[2] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. The MIT Press, 2008.
[3] Thomas Ball, Mayur Naik, and Sriram K. Rajamani. From symptom to cause: Localizing errors in counterexample traces. In Principles of Programming Languages, pages 97–105, 2003.
[4] Thomas Ball and Sriram K. Rajamani. The SLAM project: debugging system software via static analysis. SIGPLAN Not., 37:1–3, January 2002.
[5] Benchmarks for explicit model checkers. http://anna.fi.muni.cz/models/. Accessed: 28th Sep 2010.
[6] Sagar Chaki, Alex Groce, and Ofer Strichman. Explaining abstract counterexamples. SIGSOFT Softw. Eng. Notes, 29:73–82, October 2004.
[7] DiVinE homepage. http://divine.fi.muni.cz. Accessed: 28th Sep 2010.
[8] Hyunsook Do, Sebastian G. Elbaum, and Gregg Rothermel. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering: An International Journal, 10(4):405–435, 2005.
[9] Michael D. Ernst. Dynamically Discovering Likely Program Invariants. Ph.D., University of Washington Department of Computer Science and Engineering, Seattle, Washington, August 2000.
[10] Alex Groce. Error explanation with distance metrics. In Tools and Algorithms for the Construction and Analysis of Systems, pages 108–122, 2004.
[11] Alex Groce and Willem Visser. What went wrong: Explaining counterexamples. In SPIN Workshop on Model Checking of Software, pages 121–135, 2003.
[12] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann, March 2008.
[13] Tomáš Janoušek. Návrh modelovacího jazyka nástroje DiVinE. Bachelor's thesis, Faculty of Informatics, Masaryk University, May 2010.
[14] HoonSang Jin, Kavita Ravi, and Fabio Somenzi. Fate and free will in error traces. In Proceedings of the 8th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS '02, pages 445–459, London, UK, 2002. Springer-Verlag.
[15] Leslie Lamport. A fast mutual exclusion algorithm. ACM Transactions on Computer Systems, 5(1):1–11, 1987.
[16] Quick guide through the DVE specification language. http://divine.fi.muni.cz/page.php?page=language. Accessed: 28th Sep 2010.
A. Contents of the CD
mainstore Contains complete records in Python source format of all the per-
where <x> is substituted with the number of the data file to read. The variable data then contains a list of dictionaries which can be further processed.
Setting the working directory correctly is important, because the script expects the data to be found in ../mainstore.