Structured Software Testing
(PSE)
Máster Universitario en Ingeniería y Tecnología de Sistemas Software
2021-2022
Authors
T. E. J. Vos
N. van Vugt-Hage
External contributors
J. M. Bach (Chapters 3 and 4)
prof. dr. G. Fraser (Chapter 10)
A. Gambi (Chapter 10)
prof. dr. E. Miranda (Chapter 11)
dr. T. Ruys (Chapter 13)
Contents
1 Why do we test? 15
2 Software quality 17
3 Errors, faults and failures 19
4 Vocabulary and terminology in software testing 23
5 Some famous cases of software failure 24
6 What do we test? We cannot test everything! 26
7 What do we test? Test case design techniques 28
8 What is the best set of test cases? 30
9 How can we evaluate the quality of a test suite? 31
9.1 Coverage criteria 31
9.2 Mutation scores 33
10 Levels of testing 33
10.1 Unit testing 34
10.2 Integration testing 34
10.3 System testing 37
10.4 Acceptance testing 39
11 And after we change something: regression testing 40
12 Test project management 41
13 The philosophy of this course 41
2 Testing is model-based 46
Overview 47
1 Introduction 53
2 Testing is not A science; testing IS science. 54
3 Formality vs. informality 55
4 Agency vs. algorithm 56
5 Deliberation vs. spontaneity 56
6 Tacit vs. explicit 57
7 How exploration typically works in software testing 58
8 Testing is not proving; verification is not testing 59
Universidad Politécnica de Valencia Structured Software testing
1 Introduction 63
2 Recording of an exploratory testing process 66
2.1 What is the nature of this testing? 70
2.2 How did I choose that test string? 71
2.3 How did I choose to use a loop? 71
2.4 How did I know what the Reflow function is supposed to do? 71
2.5 What should this kind of testing be called? 71
2.6 What about the oracle? 72
2.7 What is the next step? 72
2.8 How about a de Bruijn sequence? 74
2.9 Progressively smaller words 74
2.10 Exactly sized lines 76
2.11 Try the interesting characters 77
2.12 Consecutive special characters 77
2.13 Have we tested enough? 77
2.14 Final bug list 78
1 Introduction 81
2 Defect taxonomies 83
2.1 Boris Beizer’s taxonomy 83
2.2 Kaner, Falk, and Nguyen’s taxonomy 84
2.3 Whittaker’s “How to Break Software” taxonomy 84
3 Catalogs and checklists 85
3.1 Catalogs 85
3.2 Checklists 87
4 Your taxonomy, catalog or checklist 90
1 Equivalence classes 93
2 Partitioning the inputs 94
3 History and related work 97
4 Making a model 97
4.1 Example: testing at GUI level 98
4.2 Example: testing at code level 102
4.3 Example: testing at class interface level 104
5 Exercises 106
6 Coverage criteria 107
6.1 All Combinations coverage 108
6.2 Each Choice coverage 109
6.3 Other ways of combining 110
6.4 Invalid Part coverage 111
7 Exercises 112
8 Faults that can be found 113
1 Introduction 115
2 Boundaries: ON and OFF points 116
3 Finding ON and OFF points for different types of values 117
3.1 Numerical types 117
3.2 Non-numerical types 118
3.3 User-defined types 118
3.4 n-dimensional types 120
4 1x1 boundary coverage 122
5 The domain test matrix 122
5.1 Domain test matrices for generate-grade 123
5.2 Exercises 128
6 Faults that can be found 130
1 Introduction 135
2 Decision tables 136
2.1 Extended/limited entry decision tables 138
2.2 Implicit variant: do not care (DNC) 140
2.3 Implicit variant: cannot happen (CNH) 140
2.4 Implicit variant: do not know (DNK) 141
2.5 Summary 142
3 Checking the decision table 144
3.1 Check implicit variants 144
3.2 Check decision table properties 144
3.3 Check testability 145
4 Coverage criteria for decision tables 145
5 Exercises 145
1 Introduction 149
2 Faults due to interactions of conditions 150
3 Combinatorics 151
4 Orthogonal and covering arrays 153
4.1 Orthogonal arrays 153
4.2 Covering arrays 156
5 Challenges for practical application of combinatorial testing 157
1 Introduction 161
2 How tests find faults 162
3 Fault-based testing 164
4 Mutation testing 165
4.1 Central hypotheses 168
4.2 Equivalent mutants 169
5 Scalability 171
5.1 Skipping unreachable mutants 171
5.2 Mutant schemata 171
5.3 Mutant sampling 173
6 Mutation testing in practice 173
6.1 Mutation testing at Google 174
6.2 Mutation testing for fun 174
7 Summarising the mutation testing method 175
1 Introduction 178
2 Classification trees 178
3 Modelling with a classification tree 179
4 Test relevant aspects 181
5 An example: The Find command 182
6 Combinatorial coverage criteria 184
7 Designing the test cases 185
8 Summarising the classification tree method 185
9 Tool support 187
10 More examples 187
10.1 The flexible manufacturing system 187
10.2 The audio amplifier 193
10.3 The password diagnoser 195
11 Exercise 201
1 Introduction 205
2 Graphs 206
3 Paths in graphs 207
1 Introduction 229
2 Labelled transition systems 231
2.1 Conformance 234
2.2 Coverage 236
3 Test case generation from LTS 237
4 Axini Modelling Language 240
4.1 Model 240
4.2 Communication: labels 241
4.3 Non-deterministic choice 241
4.4 Loop: repeat 243
4.5 States and goto 243
4.6 Data: parameters 245
4.7 State variables 246
4.8 Advanced features 248
1 Introduction 252
2 Planning: the Master Test Plan (MTP) 252
2.1 The risk analysis: why do we test? 254
2.2 Test strategy: what and how will we test? 256
2.3 Organisation: who will test and where? 256
2.4 Time schedule and budget 257
3 Monitoring: progress through defect tracking 257
4 Reporting: about the bugs 257
Chapter 1
OVERVIEW
This chapter, as the title implies, introduces the why, what and how of
software testing. It introduces some of the terminology used in software
testing, and presents some famous software failures due to insufficient
testing. It discusses different phases of testing and types of
testing. Moreover, it describes how testing needs to be done at different
levels and how all test processes need to be managed.
The chapter will also discuss the philosophy of this course and how we
consider test case design as being both model-based and exploratory.
Some might find this chapter long and somewhat boring. We understand.
However, this chapter is necessary to ensure that we all use the
same terminology and are able to communicate with the rest of the field
of software testing. Moreover, it provides an overview of software testing.
LEARNING GOALS
After studying this chapter, you are expected to:
– understand why testing is important and what the purpose of testing is
– understand the concepts and terminology of software testing as
used in this course
– have an overview of various aspects that comprise the field of software
testing: test case design, test levels, test quality, test management.
CONTENTS
However, due to a mistake, the comma was forgotten, and so the tele-
gram read:
One forgotten comma cost the sender of the telegram lots of money!
Boris Beizer, in one of his seminal books on software testing [11], discusses
the purpose of testing in terms of five phases of maturity a test
process can have. This maturity is characterised by the goals and thoughts
of the testers:
PHASE 1 - Thinking that the purpose of testing is to show that the software
works.
PHASE 2 - Thinking that the purpose of testing is to show that the software
does not work.
Although looking for failures will make us better testers, these negative
goals also have side-effects. One important side-effect is team morale:
testers want to find the mistakes that programmers are trying hard not
to make. Consequently, Phase 2 testing goals put testers and developers
in an adversarial relationship, and in practice this does not create an
ideal working atmosphere. Another problem in this phase is knowing
what to do when no failures are found. Is the software very good? Or
is the testing very bad? When do we know that we can stop testing?
Many share the view that Phase 2 is where most of the software industry
currently is [4].
This phase’s thinking is nothing more than accepting the limits of testing.
Edsger Dijkstra, the famous Dutch computer pioneer, already said
it in 1969:
Chapter 1 Introduction to software testing
In this phase, we know what we can and cannot do with testing. Testability
of software becomes the new goal, meaning that software will
be constructed in such a way that it makes testing easier. First, this
reduces the effort of testing; second, and more importantly,
testable code has fewer bugs than code that is hard to test.
EXERCISE 1.1
Describe in your own words what the difference is between testing and
debugging.
EXERCISE 1.2
Describe in your own words what “higher quality software” (as stated
in PHASE 4) would mean.
It depends . . .
DEFINITION 1.1 Quality is a subjective term for which each person or sector has its own
definition. In technical usage, quality can have two meanings:
There exist many quality models that define all sorts of quality characteristics
for software products that can be used to define these two meanings.
Famous computer scientists like Barry W. Boehm [19] already
started defining characteristics of software quality in 1978. The most
recent and detailed definition of a quality model for software products
can be found in the series of standards ISO/IEC 25000, also known as
SQuaRE (System and Software Quality Requirements and Evaluation)
[3]. Specifically, part 25010 describes the model, consisting of characteristics
and subcharacteristics, for software product quality (cf. item 1
of Definition 1.1: satisfy needs) and for software quality in use (cf.
item 2 of Definition 1.1: quality in use). Figure 1.1 depicts the quality
characteristics from this model.
The field of software quality does not only discuss the quality of products;
it also looks at processes. Software development processes are,
for example, requirements, design, implementation,
testing, maintenance, et cetera. More about this can be found in Section 1.12 and
Chapter 14.
DEFINITION 1.2 Error A human action that produces an incorrect result, for example a
mistake, a misunderstanding, a misconception, et cetera.
3 http://www.istqb.org/
DEFINITION 1.3 Fault or defect A flaw in a component or system (e.g. an incorrect
statement or data definition) that can cause the component or system to enter
an incorrect state (e.g. variable gets assigned the wrong value). A fault,
if encountered during execution, may cause a failure of the component
or system but it can also go unnoticed.
DEFINITION 1.4 Failure A deviation of the component or system from its expected
delivery, service or result.
Error
The programmer has made a mistake writing this code. Maybe the programmer
made a typo: typing 1 instead of 0. Maybe the programmer
did not know that in Java the first element of an array resides at index
0. Maybe the programmer re-used some code and forgot to adjust the
index.
Fault
As a result of this error, the first element in the array is never checked
to be zero and so is not counted if it happens to be zero.
Failure
The fault only propagates to a failure that is visible to the user when
numZero is called with an array that has a zero in the first element:
input [0, 4, 6, 8]
expected result 1
actual result 0
verdict FAILURE
If there is no zero in the first element, the fault will be executed but does
not result in a failure.
input [1, 4, 0, 8]
expected result 1
actual result 1
verdict PASS
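The listing of numZero that this example discusses is not reproduced in the text above. The following is a minimal reconstruction, based only on the description given here (a counting loop that starts at index 1 instead of 0); the class name NumZeroExample is hypothetical and the listing used in the course may differ in detail.

```java
class NumZeroExample {
    // Assumed reconstruction: counts the zeros in x.
    // FAULT (deliberate): the loop starts at index 1, so x[0] is never inspected.
    static int numZero(int[] x) {
        int count = 0;
        for (int i = 1; i < x.length; i++) {
            if (x[i] == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // First test case from the text: the fault propagates to a failure.
        System.out.println(numZero(new int[] {0, 4, 6, 8})); // expected 1, actual 0
        // Second test case from the text: the fault is executed, but there is no failure.
        System.out.println(numZero(new int[] {1, 4, 0, 8})); // expected 1, actual 1
    }
}
```

Both test cases execute the faulty loop header; only the first one has a zero at index 0, so only the first one turns the fault into a visible failure.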
/**
* Find last index of element.
*
* @param x array to search
* @param y value to look for
* @return last index of y in x; -1 if absent
* @throws NullPointerException if x is null
**/
public int findLast(int[] x, int y) {
    for (int i=x.length-1; i>0; i--) {
        if (x[i] == y) {
            return i;
        }
    }
    return -1;
}
/**
* Find LAST index of zero.
*
* @param x array to search
* @return index of last 0 in x; -1 if absent
* @throws NullPointerException if x is null
**/
FIGURE 1.2 Faulty programs for Exercise 1.3 adapted from [4].
EXERCISE 1.3
In Figures 1.2 and 1.3 there are four faulty programs. For each of the
programs:
a Explain what is wrong with the code. Describe the fault precisely
and propose a modification to the code.
b If possible, give a test case that does not execute the fault. Briefly
explain why.
/**
* Count positive elements in array.
* @param x array to search
* @return number of positive elements in x
* @throws NullPointerException if x is null
**/
public int cntPositive(int[] x) {
    int count = 0;
    for (int i=0; i<x.length; i++) {
        if (x[i] >= 0) {
            count++;
        }
    }
    return count;
}
/**
* Count odd or positive elements in an array.
* @param x array to search
* @return number of odd or positive elements in x
* @throws NullPointerException if x is null
**/
public int oddOrPos(int[] x) {
    int count = 0;
    for (int i=0; i<x.length; i++) {
        if (x[i]%2 == 1 || x[i] > 0) {
            count++;
        }
    }
    return count;
}
FIGURE 1.3 More faulty programs for Exercise 1.3 adapted from [4].
c If possible, give a test case that does execute the faulty code but does
not result in failure. Briefly explain why.
d If possible, give a test case that does execute the faulty code and
results in failure. Briefly explain why.
We end this section by explaining a few more words used in this context:
bug, issue and incident. Perhaps you have already discovered
on the Internet that the software engineering community has not yet
reached a consensus on which words to use for what.
Bug
If you look up the word bug on the Internet, you will find definitions
that include all the words we have defined above. Some will define it
as an error, others will define it as a fault, and yet other definitions relate
bugs only to failures. On Wikipedia (footnote 4) they play it safe by mentioning them
all:
Also, you will find many people having all kinds of opinions about
whether the word bug may still be used. We will not go into that
in this course. The only thing we do want to tell you about the word
bug is related to Grace Hopper, just because this is a nice historical computer
science story. Whether or not Grace Hopper was the first to
coin the term “computer bug” is another matter, but there is an actual
logbook of the Mark II Aiken Relay Calculator, kept while it was being tested
at Harvard University on 9 September 1947. In this logbook you
can find the first actual case of a bug being found (see Figure 1.4).
Incident
The word incident is often used when something suspicious has happened,
but it is not yet clear what it is. It is a symptom that something
is wrong, one that alerts the tester or user that a failure may follow.
Issue
The word issue is used in an even broader sense to state that something
is going on, without making claims about where it comes from, whether it
is a failure due to some fault, or whether it should be fixed. This terminology
is sometimes used so that the customer cannot claim that
things should be fixed during the warranty period of a software program,
since they are not recognised as real failures. The word is also
sometimes used to avoid offending programmers and to prevent harm to
team morale.
In this section we want to look a bit more into the history of the software
testing community and why it has not yet reached a consensus on the
vocabulary and terminology in software testing.
Organisations like ISTQB (footnote 5), IEEE (footnote 6) and ISO (footnote 7)
have tried to define vocabulary for software testing, to come to a consensus
and reach standardisation. Standards do exist and are being worked on to deal with this,
but there is still a long way to go.
In May 2007, the ISO Software Testing Group (WG26) of the ISO/IEC
JTC1/SC7 Software and Systems Engineering committee started the development
of ISO/IEC/IEEE 29119 [1], a series of five international
standards for software testing. The goal of the standard: to define vocabulary,
processes, documentation, techniques, and a process assessment
model for testing that can be used within any software development
life-cycle.
The first release in 2013 triggered a large number of blog posts from testers
around the world opposing the standards. In August 2014 an online petition,
STOP 29119, was created [2] to suspend publication of parts 4
and 5, and to withdraw parts 1, 2 and 3. The standard was considered
“dangerous” and was said to “put focus on the wrong things”. A “war” against
29119 was declared and it became evident that the standard had failed
to bring consensus and agreement.
It is outside the scope of this section to go into all the arguments and
reasons that were given for the petition. A nice overview of opinions
and viewpoints can be found in [103]. The response of the ISO Software
Testing Group (WG26) can be found in [1].
In this course we will not adhere to the standards, but we will also not
refrain from using any of their definitions if we think that these are useful to
explain certain concepts (as we have done in the previous section on
errors, faults and failures).
There are many examples of software failures that have reached the
news. We mention some of them below.
5 http://www.istqb.org/
6 https://www.ieee.org/
7 https://www.iso.org/
FIGURE 1.5 Wrong calculations (top left Daily Mail [38], bottom left Technica [117],
right The Telegraph [115])
Taking into account the 5 phases of Beizer from Section 1.1, it seems
reasonable to say that the purpose of testing should be to provide as
much confidence as possible concerning the (good) quality of the software.
In that same section, we recalled Dijkstra’s famous saying that
testing cannot prove the absence of faults. Why is that? Let us look at
some examples to show how impossible it is to test software completely.
There are too many possible input values. Imagine a System Under Test
(SUT) that takes a simple date as input (day-month-year) and based on
the date makes a specific calculation. We need to take into account
that some months have 30 days, others 31, we need to think about leap
years, and so on. But if we want to test it completely, we have to enter
all the possible dates that exist one by one to check there are no failures.
Moreover, maybe we should also enter wrong dates to see if the
program handles those adequately. That is an almost impossible calculation
and would take a lot of time.
[FIGURE 1.8: flowchart from START to END with the steps "PIN OK?", "Withdraw which amount?", "Enough saldo?", "Give money", "End transaction" and "Card back", and with two loops that are limited to at most 4 and at most 5 iterations]
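To get a feeling for the size of the input space in the date example, the sketch below counts the valid dates in a range of years and the day-month-year combinations a tester could type in. The range 1900 to 2100 and the names DateInputSpace, validDates and candidateTriples are assumptions made for this illustration only; the text does not say which dates the SUT accepts.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class DateInputSpace {
    // Number of valid calendar dates from 1 January firstYear
    // up to and including 31 December lastYear.
    static long validDates(int firstYear, int lastYear) {
        LocalDate start = LocalDate.of(firstYear, 1, 1);
        LocalDate endExclusive = LocalDate.of(lastYear + 1, 1, 1);
        return ChronoUnit.DAYS.between(start, endExclusive);
    }

    // Number of day-month-year combinations with day 1..31 and month 1..12,
    // whether they denote an existing date or not.
    static long candidateTriples(int firstYear, int lastYear) {
        return 31L * 12L * (lastYear - firstYear + 1);
    }

    public static void main(String[] args) {
        long valid = validDates(1900, 2100);       // assumed range of years
        long all = candidateTriples(1900, 2100);
        System.out.println("valid dates: " + valid);
        System.out.println("day-month-year combinations: " + all);
        System.out.println("combinations that are not a date: " + (all - valid));
    }
}
```

Even for this modest range there are tens of thousands of inputs, and this count still ignores malformed input such as month 13, negative numbers or text.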
EXERCISE 1.4
Above we indicate that there are more than 25 million possible paths
from START to END in Figure 1.8. Can you come to the same estimation?
The examples above show that it is impossible even for very simple applications
to capture all cases through testing, let alone for real applications
that are many times more complex than the examples given above. And
these examples were only about testing some functionality! What about
usability, portability, performance, etc.? There are many other quality
characteristics that need to be tested (we will list more in Section 1.10.3).
And we are not there yet: software applications are becoming increasingly
complex and the number of devices and systems on which we can
use them is growing at lightning speed. Testing is therefore becoming
increasingly complex.
That does not sound easy and, indeed, it is not. So, how can we handle
that?
Test cases are important artefacts in testing, and, as you will expect by
now, there are no two sources that use the same definition of “test case”.
We use the following definition.
DEFINITION 1.5 A test case contains all information necessary to guide the execution of
a particular test.
To guide the execution of a test we need to (1) prepare for the test, (2)
execute the test and (3) verify whether the outcome constitutes a failure
or not. The third and last part is related to something that in the
test world is called an oracle. This term was coined in 1978 by
William Howden [53] and comes from mythology: "the one that knows
all the answers". An oracle in testing is the mechanism you use to decide
whether the test case output is correct or not. Later in this course,
we will talk about the oracle problem.
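The three parts of a test case can be made visible in code. The sketch below is only an illustration: the unit under test (sortedCopy), the oracle method and the class name TestCaseAnatomy are hypothetical and do not come from the course material. The oracle used here is the simplest possible one, a comparison with an expected value that the tester worked out in advance.

```java
import java.util.Arrays;

class TestCaseAnatomy {
    // Hypothetical unit under test: returns a sorted copy of its input.
    static int[] sortedCopy(int[] input) {
        int[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy);
        return copy;
    }

    // Oracle: decides whether the observed output is correct.
    static boolean oracle(int[] expected, int[] actual) {
        return Arrays.equals(expected, actual);
    }

    static String runTestCase() {
        // (1) Prepare: choose the input and determine the expected result.
        int[] input = {3, 1, 2};
        int[] expected = {1, 2, 3};
        // (2) Execute: run the unit under test.
        int[] actual = sortedCopy(input);
        // (3) Verify: let the oracle give the verdict.
        return oracle(expected, actual) ? "PASS" : "FAILURE";
    }

    public static void main(String[] args) {
        System.out.println("verdict " + runTestCase());
    }
}
```

In practice a unit testing framework would take care of running the test and reporting the verdict; plain Java is used here to keep the sketch self-contained.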
How do we come up with suitable test cases? Test case design requires
information about:
• what the SUT should do (for example what functionalities it should
have);
• how the SUT implements those functionalities;
• how people will (or should) use the SUT;
• et cetera.
EXERCISE 1.5
Imagine we want to test a permanent marker of the brand edding 400.
On the website of the manufacturer (http://www.edding.com/), the
characteristics of this marker are described as follows.
Design test cases for testing the “ready to use” property and the
“quick-drying” property of this marker.
What is the best set of test cases? It depends on the circumstances, and
there is no single right answer.
There are no perfect solutions to test case design, just test suites that are
better or worse depending on the context in which a SUT is used and
the trade-off between the risks and the costs.
The program reads three integer values from an input dialogue. The three
values represent the lengths of the sides of a triangle. The program displays a
message that states whether the triangle is scalene, isosceles, or equilateral.
Since then the triangle problem has been used by many to make a point
about testing in general or a technique in particular. Black even compares
it to Rorschach inkblot tests for test professionals [18]. Others
[65, 34] joke that a book on software testing is not complete without a
discussion of the triangle problem. We do not want this course text to
omit this classic example. Here, however, we use it to show that the
experts all differ on how many test cases are needed to test this problem
adequately.
Myers [84] indicates about 20 test cases, stating that these do not guarantee
that all possible errors would be found. Jorgensen [57] lists about
125 test cases for this problem when applying boundary value analysis
(a test case design technique we discuss later in this course), plus
an additional 11 after applying decision table analysis (also explained
later in this course). Binder [14] listed 65 tests for the triangle problem,
addressing several new dimensions of risk, such as potential errors arising
if you try to repeat the test more than once. Ammann and Offutt [4]
come up with 64 test cases resulting from all combinations of equivalence
classes. Black [18] lists 28 test cases but he indicates in a footnote
that his reviewers came up with many additional things that could be
tested.
Collard [35] states that, given how the exercise is specified and the lack
of context, he can argue that 4 test cases are adequate. However, if
the context changed to a program used within a NASA space shuttle, it
would be an entirely different situation. Space shuttle software, for example,
needs to compute the shape of triangles as part of its orbital
navigation. A NASA mission is life-critical: if the orbital navigation is
wrong, the consequences could be disastrous. However, the triangles
computed on the space shuttle are curved because they are based on
the shape of the Earth’s surface. Most testers of the triangle program
have assumed a flat surface without really thinking about it.
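To make the discussion concrete, here is a minimal sketch of a program for the triangle problem, together with a handful of test inputs. It is one possible reading of the specification, not a reference solution: the specification above says nothing about input that cannot form a triangle, so the NOT_A_TRIANGLE outcome, the flat-surface assumption and all names (Triangle, Kind, classify) are assumptions of this sketch.

```java
class Triangle {
    enum Kind { SCALENE, ISOSCELES, EQUILATERAL, NOT_A_TRIANGLE }

    // Classifies a triangle on a flat surface from the lengths of its sides.
    static Kind classify(int a, int b, int c) {
        // Assumption: non-positive lengths are rejected.
        if (a <= 0 || b <= 0 || c <= 0) {
            return Kind.NOT_A_TRIANGLE;
        }
        // Triangle inequality; long arithmetic avoids int overflow for large sides.
        if ((long) a + b <= c || (long) a + c <= b || (long) b + c <= a) {
            return Kind.NOT_A_TRIANGLE;
        }
        if (a == b && b == c) {
            return Kind.EQUILATERAL;
        }
        if (a == b || b == c || a == c) {
            return Kind.ISOSCELES;
        }
        return Kind.SCALENE;
    }

    public static void main(String[] args) {
        System.out.println(classify(3, 4, 5)); // one test per kind of triangle ...
        System.out.println(classify(2, 2, 3));
        System.out.println(classify(2, 2, 2));
        System.out.println(classify(1, 2, 3)); // ... and one degenerate input
    }
}
```

Whether a few such inputs are enough, or whether overflow, permutations of the sides, non-numeric input and so on also deserve test cases, is exactly the kind of context-dependent decision on which the authors cited above differ.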
So, we never know how many errors there are in a system, or which ones. (If
we did, testing would no longer be necessary.) Moreover, knowing
which of these errors would be the most important to find depends on
the context in which the SUT is executed. However, we still need to be
able to evaluate the test suite somehow. Since there is no way to have
the real measure of quality, surrogate measures will have to suffice, i.e.
measures that we know, think or hope correlate with the
real measure. In testing, these measures are related to different kinds of
coverage criteria or mutation scores.
Code coverage criteria give an idea of the percentage of code that has
been executed by our test suite. Different criteria are defined based on
whether specific code constructs are executed or not during the tests.
For example:
Statement coverage: The percentage of statements that have been executed
by our tests. (This is also sometimes called instruction coverage
or block coverage.)
Branch coverage: The percentage of branches that have been executed
by our tests. For example, given an if statement, have both the True and
False branches been executed? This is also sometimes called decision
coverage.
Condition coverage: The percentage of Boolean sub-expressions present
in the guards of a branch that have been evaluated to both True and
False during our tests. (This is also sometimes called predicate coverage.)
Consider the program oddOrPos from Figure 1.3. And consider the
following test case:
sub-expr1   sub-expr2
True        not evaluated
False       True
False       False
The test case executes two of these three (i.e. 66.6% of multiple
condition coverage):
• 100% branch coverage. Now the True as well as the False branch of
the if statement is executed.
• 100% condition coverage. Now sub-expr2 is False when x[3] is investigated.
• 100% multiple condition coverage.
Using the code to define coverage criteria is easy. Moreover, tools exist
to automatically determine coverage for different programming languages.
However, care should be taken when interpreting code coverage.
100% code coverage does not mean there are no errors left. The
last examples show that with 100% statement, branch and (multiple)
condition coverage, the error that resides in the code (from Exercise 1.3)
has not been found.
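The effect can be reproduced with a small experiment. The sketch below copies oddOrPos from Figure 1.3 (fault left in place) and adds hand-written logging of the outcomes of the two sub-expressions; the logging, the class name CoverageExample and the test input [1, 2, -2] are assumptions made for this illustration and are not the test cases used in the text. A real project would use a coverage tool instead of manual logging.

```java
import java.util.Set;
import java.util.TreeSet;

class CoverageExample {
    // Records which combinations of (sub-expr1, sub-expr2) were observed.
    static final Set<String> OBSERVED = new TreeSet<>();

    // oddOrPos from Figure 1.3 with its fault left in place;
    // only the logging has been added.
    static int oddOrPos(int[] x) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            boolean subExpr1 = x[i] % 2 == 1;
            if (subExpr1) {
                // Short-circuit evaluation: sub-expr2 is not evaluated.
                OBSERVED.add("True, not evaluated");
            } else {
                OBSERVED.add("False, " + (x[i] > 0 ? "True" : "False"));
            }
            if (subExpr1 || x[i] > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Expected result for this input: 2 (1 is odd, 2 is positive, -2 is neither).
        int actual = oddOrPos(new int[] {1, 2, -2});
        System.out.println("actual result " + actual);
        System.out.println("observed combinations " + OBSERVED);
    }
}
```

This single test passes and observes all three combinations from the table, so it also executes every statement and both branches of the if statement. The fault in oddOrPos is nevertheless still there: coverage tells us what was executed, not whether the right inputs were chosen.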
The general principle underlying mutation testing is that faults are
deliberately inserted (this is called seeding) into the original program.
This can be done, for example, by changing a simple syntactic construct
(e.g. replacing an == with an =, or an && with an ||). The faulty programs
created this way (one deliberately inserted fault per program)
are called mutants. To assess the quality of a given test suite, the mutated
programs are tested with the test suite to see if the seeded faults
are detected. The mutation score is the percentage of mutants that are
detected by the test suite.
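The computation of a mutation score can be sketched in a few lines. Everything in this example is hypothetical: the original program (the maximum of two integers), the three hand-written mutants and the two test suites were chosen for illustration, and real mutation testing tools generate and run mutants automatically.

```java
import java.util.function.IntBinaryOperator;

class MutationScoreExample {
    // Original program: the maximum of two integers.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a > b ? a : b;

    // Mutants: each contains exactly one seeded fault.
    static final IntBinaryOperator[] MUTANTS = {
        (a, b) -> a < b ? a : b,   // '>' replaced by '<'
        (a, b) -> a != b ? a : b,  // '>' replaced by '!='
        (a, b) -> a > b ? a : a    // last 'b' replaced by 'a'
    };

    // A test case is an array {a, b, expected}.
    // Returns true when at least one test case of the suite fails on the program.
    static boolean fails(IntBinaryOperator program, int[][] suite) {
        for (int[] t : suite) {
            if (program.applyAsInt(t[0], t[1]) != t[2]) {
                return true;
            }
        }
        return false;
    }

    // A mutant is detected ("killed") when the suite fails on it.
    static int killedMutants(int[][] suite) {
        int killed = 0;
        for (IntBinaryOperator mutant : MUTANTS) {
            if (fails(mutant, suite)) {
                killed++;
            }
        }
        return killed;
    }

    // Mutation score as a whole percentage (rounded down).
    static int mutationScorePercent(int[][] suite) {
        return 100 * killedMutants(suite) / MUTANTS.length;
    }

    public static void main(String[] args) {
        int[][] weakSuite = { {3, 1, 3} };
        int[][] strongerSuite = { {3, 1, 3}, {1, 3, 3} };
        System.out.println("weak suite: " + mutationScorePercent(weakSuite) + "%");
        System.out.println("stronger suite: " + mutationScorePercent(strongerSuite) + "%");
    }
}
```

Both suites pass on the original program, yet the first one detects only one of the three mutants while the second detects all of them, which is the kind of difference a mutation score is meant to expose.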
Programmers should not only implement their programs. They are also
responsible for testing their code extensively. This type of testing at the
code level is called unit testing or component testing. Programming and
testing in this phase are closely intertwined. Some programmers feel
they do not have time to write unit tests, but actually, you do not have
the time not to write them. Not writing unit tests will ultimately lead to
errors in your code that will be difficult to find and then you will have
to spend many long hours in the debugger trying to find out where they
came from. Writing unit tests will significantly reduce your debugging
time.
The way we write the tests can be roughly divided into test-first, test-last
or test-whenever. Test-first development, also known as Test-Driven Development
(TDD), is a software development style in which you write
the unit tests before you write the code to test. Test-last means you
write the code first and then you write tests for it. Test-whenever means
sometimes you write the tests first and sometimes you write them last.
There are pros and cons for each of them, and people in favour of and
against each of them. However, this is beyond the scope of this course,
since we concentrate on how to make test cases (not when).
After each unit has been tested, units can be integrated into subsystems
that will eventually form the entire system. When two or more units
are integrated into a subsystem we need to test that they work together
or communicate with each other in the desired way: this is called integration
testing. During integration testing, we mainly look for defects in
the use and implementation of the interfaces that the units offer.
Which units and subsystems we consider, what their interfaces are and
how they can communicate with each other is described in the design.
For example, consider the units (or components) and their dependencies
on other units in a software system as depicted in Figure 1.9. How
can we integrate these and test the interfaces? Roughly there are three
ways: bottom-up, top-down and big bang. We describe each of these
below.
Bottom-up integration
The first integration strategy is bottom-up integration. We test the units
M1 to M7 independently with unit tests and then start with integrations
A or C (see Figure 1.10). To test M1 to M7 independently we need
to write a driver for each one of the components since we do not want to
include components higher up in the tree already in the testing. Continuing
with integration A, for instance, we need to write another driver
for M8, because we do not want to include M9 in the test yet (the integration
M8 - M9 still needs to be tested). In other words, integration A
is tested in isolation from M9. This is depicted in Figure 1.11.
[FIGURE 1.9: dependency diagram of the units M1 to M10 and the system M11, connected by "uses" relations]
[FIGURE 1.10: the same diagram with integrations A, B and C marked]
A driver can be defined as a piece of code that calls other code modules,
units or components. For any component you want to test, it is
important to have a program that calls it. The test drivers functionally
simulate the behaviour of upper-level components, which are not part
of the integration yet. Such a driver acts as a temporary replacement
for the calling component: it supplies the same input to and receives
the output of the lower level component.
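A driver can be very small. In the sketch below the lower-level component (PriceCalculator) and its method are hypothetical stand-ins for a unit such as M8; the driver takes the place of the caller that has not been integrated yet, supplies the input and checks the output.

```java
class DriverExample {
    // Hypothetical lower-level component (think of a unit such as M8).
    static class PriceCalculator {
        int totalInCents(int unitPriceInCents, int quantity) {
            return unitPriceInCents * quantity;
        }
    }

    // Test driver: plays the role of the upper-level component that is not
    // part of the integration yet. It calls the component with a given input
    // and compares the output with the expected result.
    static boolean drive(PriceCalculator component,
                         int unitPriceInCents, int quantity, int expectedTotal) {
        int actualTotal = component.totalInCents(unitPriceInCents, quantity);
        return actualTotal == expectedTotal;
    }

    public static void main(String[] args) {
        PriceCalculator component = new PriceCalculator();
        System.out.println(drive(component, 250, 4, 1000) ? "PASS" : "FAILURE");
        System.out.println(drive(component, 0, 10, 0) ? "PASS" : "FAILURE");
    }
}
```

In practice, unit testing frameworks provide most of this driver infrastructure, so that only the calls and the expected results need to be written by hand.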
advantages
• it is easy to do
• you can start testing early
• you can test integrations in parallel (like integrations A and C)
disadvantages
• the visibility of the system (M11) comes late for the client
• writing drivers can be costly (imagine for example the kind of driver
we would need for M9 and M10)
[FIGURE 1.11: a driver calling integration A, which consists of M8 together with M1 and M2]
Top-down integration
The second integration strategy is top-down integration. We start from
the top with integration A from Figure 1.12.
[Figure 1.12: the system tree with integrations A to D marked for top-down integration.]
[Figure: M10 integrated with test doubles in place of M6 and M7.]
Test double [79] is a generic term for any pretend object used in place of
a real object for testing purposes. The name comes from the notion of a
stunt double in movies. There are different names and kinds of doubles:
dummy objects, fake objects, stubs and mocks.
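A small Python sketch of a stub, one of the kinds of double listed above. It assumes that M10 asks M6 for a price and M7 for a stock level; these roles and all class and method names are invented for illustration only.

```python
class StubM6:
    """Stub for M6: returns a canned price instead of doing real work."""

    def price_of(self, item):
        return 10.0


class StubM7:
    """Stub for M7: pretends that every item has three pieces in stock."""

    def in_stock(self, item):
        return 3


class M10:
    """Component under test; it receives its dependencies through the constructor."""

    def __init__(self, pricing, inventory):
        self.pricing = pricing
        self.inventory = inventory

    def order_total(self, item, quantity):
        if quantity > self.inventory.in_stock(item):
            raise ValueError("not enough stock")
        return quantity * self.pricing.price_of(item)


# Top-down: M10 is exercised before the real M6 and M7 are integrated.
m10 = M10(StubM6(), StubM7())
print(m10.order_total("pen", 2))
```

Because the doubles return fixed answers, a failure in this test points at M10 itself or at the way it calls M6 and M7.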
advantages
• early visibility of the system (M11) to the client
disadvantages
• it is difficult to do
• writing test doubles can be very costly
Big bang
The third and last integration strategy is big bang: we just put it all to-
gether and test at M11 level. Although this is not very systematic, there
are contexts in which this approach is justified. For example when the
SUT is small, when the SUT is stable and we only added some com-
ponents, or when the SUT is monolithic, meaning that the dependency
between the components is so strong that testing them separately will
be almost impossible (e.g. drivers or doubles will be almost a copy of
the real component).
advantages
• it is easy to do
• it is fast (no drivers, no doubles)
disadvantages
• it is difficult to localise the defects that caused failures
When all components have been integrated and the system is complete,
we can start with system testing. System tests are not restricted to
functionality testing: at this stage, all kinds of non-functional
properties need to be tested as well, as do the different types of
configurations. Here we can test some of the quality characteristics we
have seen in Section 1.2 and Figure 1.1. For example:
• performance;
• security;
• usability;
• accessibility.
Performance testing
During performance tests, you can collect data about the number of
virtual users, hits per second, errors per second, response time, latency
and bytes per second (throughput), as well as the correlations between
these. Through the reports you can identify bottlenecks, bugs and er-
rors, and decide what needs to be done. Some specific examples of
performance testing are:
• Load testing is a specific type of performance testing that consists
of constantly and steadily increasing the load on the system to
determine whether it meets the specified requirements concerning the
threshold limit of load it can take.
• Stress testing is also a specific type of performance testing, which ex-
amines how the system behaves under intense loads (i.e. stress), and
whether (and how) it recovers when going back to normal usage.
Note that these characteristics should also be part of the specified
requirements.
• Spike testing tests an application with extreme increments and decre-
ments in the load (i.e. spikes of load).
• Soak testing consists of testing the performance and the stability of the
system over a long period of time.
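The four kinds of test above differ mainly in the shape of the load over time. The Python sketch below generates such shapes as a number of virtual users per time step. The function names, step counts and user numbers are illustrative assumptions, not values prescribed by any tool or by this course.

```python
def load_profile(steps, max_users):
    """Load test: increase the load steadily up to the specified limit."""
    return [max_users * (i + 1) // steps for i in range(steps)]


def stress_profile(steps, normal_users, factor):
    """Stress test: go well beyond normal load, then return to normal usage."""
    half = steps // 2
    return [normal_users * factor] * half + [normal_users] * (steps - half)


def spike_profile(steps, base_users, spike_users, every):
    """Spike test: short, extreme jumps in load on top of a base level."""
    return [spike_users if i % every == every - 1 else base_users
            for i in range(steps)]


def soak_profile(steps, users):
    """Soak test: a constant load held for a long period."""
    return [users] * steps


print(load_profile(5, 100))
print(stress_profile(4, 50, 4))
print(spike_profile(6, 10, 200, 3))
print(soak_profile(3, 40))
```

A real performance test would feed such a profile to a load generator and record response times and errors at every step.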
Security testing
Usability testing
Since usability is about users, testing needs to be done with real users
performing real tasks with the system. While they use the software,
performing an assigned task, they are observed (and sometimes also
recorded and eye-tracked) to detect any usability problem. A myriad of
books have been written on usability testing, for example [101, 68, 87].
Since this type of testing with real users is expensive and time-consuming,
we could also opt for a usability inspection where checklists and heuris-
tics are used by experts to give an opinion about the usability of a prod-
uct. The most well-known heuristics are from Jakob Nielsen [86].
Accessibility testing
Screen reader software reads out the text that is displayed on the screen.
We need to test whether the graphical user interface of our SUT reveals
enough information for this assistive technology to work properly and
be able to read out sufficiently detailed information to its user such that
the latter can understand and act.
Configuration testing
Acceptance testing comes after system testing, and is the level of testing
that determines whether it is acceptable to release or ship the software.
The definition of acceptable depends on the previously defined acceptance
criteria. Usually, these are the tests that are executed by the users,
customers or the project managers to determine whether or not to accept
the system.
Beta testing is acceptance testing done by the end users of the software.
They can be the customers themselves or the customers of the cus-
tomers. This is also known as User Acceptance Testing (UAT).
Which tests to choose for regression testing? There are typically three
different approaches for choosing the regression tests:
Retest All: retest all the tests in all test suites. Even if the majority of
these tests are automated, this is expensive because it can take a consid-
erable amount of time. Moreover, it is not always necessary.
Test Case Selection (TCS): Rather than taking all tests, this method
proposes choosing a representative selection of the test suites. Repre-
sentative meaning that it gives the desired coverage. This selection can
be based on, for example, only the riskiest use cases, only tests that be-
long to the changed code, only tests that test the most complex code, et
cetera.
Test Case Prioritisation (TCP): With a limited set of test cases, it is
ideal to prioritise those tests. Test case prioritisation aims to order a
set of test cases to achieve an early optimisation based on preferred
properties (risk, coverage, etc.). It gives the ability to execute highly
significant test cases first according to some measure, and produce the
desired outcome, such as revealing faults earlier and providing feed-
back to the testers. It can also help to run the crucial tests first if time is
running out.
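Test case selection and prioritisation can be sketched in a few lines of Python. The example assumes that every test carries a risk score, a running time and the set of source files it exercises; these fields, the scales and the test names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass
class RegressionTest:
    name: str
    risk: int          # assumed scale: higher means riskier
    minutes: int       # assumed running time
    covers: frozenset  # source files exercised by the test (assumed known)


def select(tests, changed_files):
    """TCS: keep only the tests that touch changed code."""
    return [t for t in tests if t.covers & changed_files]


def prioritise(tests):
    """TCP: run the riskiest tests first; shorter tests break ties."""
    return sorted(tests, key=lambda t: (-t.risk, t.minutes))


def fit_budget(ordered, budget_minutes):
    """Walk through the ordered tests and keep each one that still fits in the budget."""
    chosen, used = [], 0
    for t in ordered:
        if used + t.minutes <= budget_minutes:
            chosen.append(t)
            used += t.minutes
    return chosen


suite = [
    RegressionTest("login", 5, 2, frozenset({"auth.py"})),
    RegressionTest("report", 2, 10, frozenset({"report.py"})),
    RegressionTest("payment", 5, 4, frozenset({"pay.py", "auth.py"})),
]
print([t.name for t in select(suite, {"auth.py"})])
print([t.name for t in fit_budget(prioritise(suite), 7)])
```

Retest All corresponds to running the whole suite unchanged; the two functions above trade some of that safety for time.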
This is a difficult journey for which only designing and executing test
cases is not enough. Testing software is an engineering activity, and like
software development, it should be managed using well-established
test project management processes. Test project management, or just test
management, is only indirectly visible to the rest of the development
team. We are talking about processes like:
• test planning;
• defect tracking and reporting;
• controlling and monitoring the test status;
• setting-up and controlling the test environment (tools, databases, plat-
forms, configurations, et cetera);
• test organisation: roles and staff;
• continuous improvement.
In this course we put the emphasis on test case design. We consider this
to be one of the most challenging and important parts of testing, since
the test cases you execute determine the quality of your testing. We con-
centrate mainly on functionality testing: we focus on the functionalities
and features that the software should offer to its users. We leave testing
the non-functional properties to another course.
Looking for a course book that fits our goals, we were confronted with
the fact that two schools of software testing exist that have contradic-
tory views on how testing is done best:
• The analytical school, where the emphasis is on better testing by
improved precision of specifications and many types of models, i.e.
model-based testing. This school has many proponents in academia.
• The context-driven school, where emphasis is on better testing by
adapting to the circumstances under which the product is developed
and used, by exploring, learning and questioning, i.e. exploratory test-
ing.
Agreeing with two different schools and combining their views, while
they have been arguing with each other for years, means that none of the
existing books on software testing fits our philosophy and the goals of
this course perfectly.
And this brings us to the second element of this course: practice. The
only way to become a good tester is to apply the theory and your own
knowledge, intellect, experience and creativity to lots of examples. So
even if you have read and understood a particular technique, and then
encounter an exercise that seems to be very simple, or just another case
of the same problem, we strongly advise you to do the exercise. You may
be surprised by the fact that you did not yet know everything there was
to know about that particular technique.
Theo Ruys (The Netherlands) Dr. Theo Ruys studied Computer Sci-
ence at the University of Twente in Enschede. In 2001 he obtained his
PhD within the Formal Methods and Tools (FMT) group at the same
university. For ten years, he worked within the FMT group as an assis-
tant professor. His main research topics were the effective use of model
checkers in general and the architecture and construction of software
model checkers. In 2015 he joined Axini, a small software company
specialized in model-based testing, founded in 2008 and located in
Amsterdam. Axini has developed the Axini Modeling Suite (ATM), an
advanced tool for automatically testing reactive systems. ATM has been
built on the theory of Jan Tretmans and Ed Brinksma, both from the
FMT group of the University of Twente, and is thus a good example of
academic test theory being put into practice. At Axini, Theo Ruys has
the roles of test architect and software engineer, and he is concerned
with the effective use of model-based testing.
Chapter contents 2
Testing is model-based
Overview
Chapter 2
Testing is model-based
OVERVIEW
LEARNING GOALS
After studying this chapter, you are expected to:
– realise that models are all around us
– be able to explain that models can have different levels of details,
abstraction and formality
– be able to explain why we do not speak about black-box or white-
box testing in this course
– be able to explain the “model – coverage criterion – test cases”
structure of test case design techniques in this course.
CONTENTS
In software testing, this works in the same way. We use models, with
different levels of abstraction, different levels of details, and different
levels of formality. Because they lack detail and formality and abstract
away from aspects that may be difficult, all models are wrong [23].
Modelling something implies making simplifications about the real world
which we
know are false but which we believe may be useful anyway.
“All models are wrong, but some are useful” is a famous quote often
attributed to the British statistician George E. P. Box [25].
[Figure: detailed flowchart of a cash withdrawal, from START to END, with the decisions “PIN OK?”, “Withdraw amount?”, “Enough saldo?”, “New amount?” and “Again?”, the steps “ERROR”, “Give money”, “End transaction” and “Card back”, and retry loops limited to at most 4 and at most 5 times.]
However, we can still use the simple model from Figure 2.2 to make
some test cases.
In the informal model from Figure 2.2 there are two very different paths
possible: in one, the client gets his/her money, in the other, he/she
doesn’t get the money. So this would result in only two test cases. One
test case where we get money because our bank account balance is
enough, and another test case that results in no money because we do
not have enough balance.
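Written down as executable checks, these two test cases could look as follows in Python. The function withdraw is a hypothetical stand-in for the cash machine, invented for this sketch; it returns the money handed out and the new balance.

```python
def withdraw(balance, amount):
    """Hypothetical stand-in for the SUT: returns (money_given, new_balance)."""
    if amount <= balance:
        return amount, balance - amount
    return 0, balance


# Path 1: the balance is sufficient, so the client gets the money.
assert withdraw(balance=100, amount=60) == (60, 40)

# Path 2: the balance is insufficient, so no money is given.
assert withdraw(balance=100, amount=150) == (0, 100)
```

Both paths of the simple model are covered, yet nothing is said about wrong PINs, retries or boundary amounts.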
In Chapter 1, we have seen that the model from Figure 2.3 has over 25
million possible paths. Since it is impossible to cover those entirely, we
need to choose test cases. Both paths from Figure 2.2 are also present
in Figure 2.3, but due to the many details there, they are much harder
to spot. So in this case, the simpler model helps us to see the bigger
picture better. However, there is evidently a lot more to test: simple
models typically result in simple test suites.
[Figure 2.4: make a model, then design test cases.]
In this course, you will see all kinds of models: equivalence classes,
decision tables, classification trees, graphs, and state machine models.
Each model gives rise to a different test case design technique. Which
model you choose to describe a specific test situation depends on many
things: on how you look at the test problem, on the information that is
available, on the characteristics that you want to test, on the time you
have for testing, on your experience, on the risks, et cetera. Most of
the time different models are possible, some of which may turn out to
work better than others. However, some kind of model is always needed to
create at least a semblance of order out of the chaos that a SUT is with
all of its possible input values and all its possible use cases.
Then, when we have a model, we will try to make test cases that will
cover it according to some coverage criterion.
All test techniques that we will see in this course basically come down
to (see Figure 2.4):
1 make a model;
2 pick a coverage criterion;
3 generate test cases based on applying this criterion to the model.
In this course, you will see a technique called equivalence class testing.
The model underlying this technique consists of equivalence classes of
the domains of the relevant input variables. These equivalence classes
can be derived from specifications of the SUT’s functionalities, but these
classes can also be determined at the program code level for each func-
tion or procedure.
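The three steps can be sketched in Python for equivalence class testing. The input variable "age", its four classes and the function ticket_price are all invented for this illustration; a real model would be derived from the specification or the code of the SUT.

```python
# Step 1: the model. Equivalence classes for a hypothetical input "age".
CLASSES = {
    "invalid_negative": range(-100, 0),
    "minor": range(0, 18),
    "adult": range(18, 65),
    "senior": range(65, 130),
}


def ticket_price(age):
    """Hypothetical SUT: the price depends only on the age class."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 18:
        return 5
    if age < 65:
        return 10
    return 7


# Step 2: the coverage criterion: every class is exercised at least once.
# Step 3: generate test cases by picking one representative per class.
def representatives(classes):
    """Pick the middle value of each class as its representative."""
    return {name: values[len(values) // 2] for name, values in classes.items()}


test_inputs = representatives(CLASSES)
print(test_inputs)
```

A stronger criterion, for example one that also requires the boundaries of each class, would yield more test cases from the same model.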
While studying test design techniques in this course, you will encounter
a lot of examples explaining some technique. It is highly probable that
you will often think, after reading such an example: “Is that all?! I
could have easily thought of that myself!”. This is perfectly normal and
exactly what we expect. However, we also expect you to have an
entirely different opinion after completing the exercises we propose. The
exercises will force you to start exploring yourself to come to a good
model; there will be modelling decisions that you need to make that will
have consequences for your testing. Solving the exercises will make you
aware of how difficult testing is and will give you the skills to become
a better tester.
The aims of teaching you test case design techniques in this course are:
• to make you conscious of the enormous set of well-known techniques
that exist;
• to let you experience that it is usually not as simple as in the
explanatory examples;
• to let you explore in order to build models;
• to make you appreciate that testing is a craft, and that a lot of experi-
ence and common sense is required to do it well.
Chapter contents 3
Overview
1 Introduction
2 Testing is not A science; testing IS science.
3 Formality vs. informality
4 Agency vs. algorithm
5 Deliberation vs. spontaneity
6 Tacit vs. explicit
7 How exploration typically works in software testing
8 Testing is not proving; verification is not testing
Chapter 3
The exploratory nature of software testing
OVERVIEW
This chapter and the next have been written by James Bach, a well-
known advocate of the context-driven school and of exploratory test-
ing. Since, in this course, we want to make sure that we convey ex-
ploratory testing well, what better option than including explanations
written by the experts themselves? James Bach is convinced that
software testing should belong more to social sciences than it does to
computer science. Since this is a computer science course, we want to
put his use of the word “formal” in context. In a broad sense, the word
“formal” means standardised, having a recognised set of rules, rigid or
dictated by other people. Think of a formal dinner where there are rules
for the clothes you have to wear and the way and order in which you
have to use your silverware. However, in computer science, the word
formal is often associated only with formal methods, i.e. using
mathematical models to build software and hardware systems. Use of the
word “formal” in this chapter is much broader. This is important to
realise because, for us computer scientists, this means that when we read
in this chapter that exploratory testing is informal testing this does not nec-
essarily mean that we cannot use formal methods and models to design
test cases. We can and we should because testing is both model-based
and exploratory.
LEARNING GOALS
After studying this chapter, you are expected to:
– be able to explain what the term exploratory testing means
– understand that exploratory testing is structured testing and can
be model-based
– be able to explain how we can look at software testing as a scientific
process.
CONTENTS
3.1 Introduction
you may be doing when testing at a professional level, you are also ac-
tively learning and making new choices about what to do next based
on what you learn. You are never merely following a pre-established
procedure.
But while that means the phrase “exploratory testing” is largely
redundant, some testing is especially exploratory (i.e. informal, decided
moment by moment by the tester), while some testing is especially scripted
(i.e. formal, determined by someone else or at some earlier time). The
testing process is always some mix of the two approaches. To do test-
ing well, we need to know how/when/why to emphasize choice and
how/when/why to emphasize procedure.
The learning that scientists do is disruptive to their work; and they love
that. The people who run the Large Hadron Collider would like nothing
more than to invalidate the Standard Model of particle physics by
learning something unexpected that would require new theories and
new experimental designs. But there is a major difference between a
physicist and a software tester: the things we testers experiment upon
are far less known and far less stable. No physicist worries about
needing to “regression test” physical laws because they might have been
changed the previous night. Yet, that sort of thing does happen to
testers, so our learning curve never flattens out, and our test designs
must be allowed to change.
Software testing is exactly like that. Software tests are not just similar
to scientific experiments that test hypotheses or discover new things,
they are experiments (the word “test” is right there in the definition of
experiment). Software tests are experiments and the professional tester
is a scientist who studies the product under test. If you want to learn
how to test very well, study the design of experiments.
The way we know for sure that machines don’t have agency is by look-
ing at what happens when machines misbehave: they are not punished.
Machines don’t get sued and don’t go to jail. Machines are, at best, like
children. If they misbehave, the blame falls upon the nearest source of
agency: their creators or operators.
But it is equally exploratory to sit and think about what would be useful
to do next, and to think through the reasons why it would be useful.
You are encouraged to deliberate, in other words. Now, if you plan
multiple steps that you want to take, and carefully stick to that plan,
then that would be a scripted test. If you plan those steps and don’t
force yourself to stick to them, that is less scripted.
If you have some deep-seated habit, or if, by not being aware of your
options, you always make the same choices, then your “spontaneity”
is low. For instance, imagine only knowing about one food: pizza.
When asked what you want to have for dinner, you always say “pizza.”
Well, that might feel spontaneous, but it is actually pre-determined. You
might as well have a written contract agreeing to pizza. Or perhaps
you play the piano, and you practice a piece of music so much that you
eventually can play it in a flowing spontaneous way, without reading
any musical notation. Finally, we all have little phrases we use that are
formalized and yet uttered spontaneously, such as “thank you” or “how
are you” or “I am fine.”
Many of the processes of exploratory testing are not made explicit. In-
stead they are tacit.
and software testers are not competent to say how they themselves do
their own work! They are experts in talking about software, but not nec-
essarily in identifying the thought processes by which they think about
software. This has led to a huge emphasis on algorithmic accounts of
software testing.
To learn how to test, you must watch testers work; and you must watch
yourself work. You must respect that there is a deep structure even
to seemingly trivial tasks such as deciding when to interrupt a
test process and when to stick to it. Tacit knowledge is developed
not by reading and memorizing instructions. It is developed through
the internal theorizing you do when you watch someone else work or
when you engage in an interactive conversation, and by the automatic
model-building that happens in your mind when you struggle to solve
a problem such as how to make a program produce a certain output.
In other words, tacit knowledge is founded on the experience of living
and working in a stimulating world.
As you test, you develop and use a mental model of the product under
test. You can make formal models, too, but no formal model will cap-
ture the full extent of your mental model. Your mental model includes
not only details of what the product is, but also how it works, what its
purposes and uses might be, what it influences and connects to, its past
and future, its similarities and relationships to other products, patterns
of consistency and inconsistency within it, its potential problems and
prospects for improvement, et cetera. Your mental model is automati-
cally created and maintained through the processes of interacting with
the product and reflecting on what you know. This knowledge is then
crucial to your ability to evaluate the product and report on the status
of your testing.
When you first encounter a product and you don’t know much about
it, you explore to learn what it can be. I call this “survey testing,” which
means any testing that has the primary goal of building a mental model
of the product (i.e. learning about it). Although its primary goal is
learning, it is real testing, because you may encounter and report bugs
while doing it. Survey testing is a very exploratory process.
You can also explore before you ever see a physical product. You could
do this by reading about it, reviewing similar products, or having con-
versations with the developers and designers. You can ask questions,
either out loud or silently as you perform research. This helps you build
that all-important understanding of what you will eventually test in
physical form, and might also lead to finding bugs in the specifications
and ideas that you encounter.
During this early phase of the test process, specific ideas about how to
test it will emerge. Resist the urge to write down those ideas as
formalized “test cases.” It’s too early for that. The worst time to plan testing is
at the beginning of a project. Getting a general idea of what you want
to do, or writing down those ideas is not the problem. The problem is
making premature and binding decisions about specifically what and
how to test. Formalization is premature unless you have already in-
formally experimented with possible test procedures and gained tacit
knowledge from that process. You will learn a great deal very quickly
when you are actually in front of the product, and if you have already
formalized the testing you will simply have to throw away most of that
scripting and start again, or else ignore what you learned and follow
procedures that you know to be bad.
After you have a strong idea of what the product ought to be and how
you want to test it, you still explore to find hidden problems and un-
known risks. Even in the midst of a formalized test procedure, when
you find any indication of a possible problem, you must explore to dis-
cover the extent of that problem. All along, throughout testing, you
keep your eyes and mind open for new surprises.
No matter how formal your testing gets, you cannot use testing to “prove”
that software “works.”
Yet new testers often speak casually about proving that their product
works. This is a seductive but poisonous way of thinking. Here’s why
it’s wrong in one phrase: can is not the same as will. If someone drinks
a lot of alcohol and then drives home safely in a car, that only proves it
is possible not to be injured or injure someone else while drunk driving.
It does not verify that everyone, always, will be safe. It doesn’t prove
it’s a good thing to do. It is only one experience. Similarly, when you
experience that your product doesn’t fail in the five minutes that you
look at it, that is no “proof” that it “works” because it may have been
failing in a way you didn’t see, or it may fail five minutes from now.
You can’t know.
Here’s another key phrase: Seeing no problem is not the same as seeing
that there are no problems. Just because you fail to detect a bug doesn’t
mean it wasn’t there in front of you, fully able to be detected. Or it may
have been very good at hiding from you, yet not so good at hiding from
your users. Therefore, it is wrong to make sweeping statements about
what is true about a product based on the assumption that what you
see is all there is.
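A tiny Python illustration of this point, using an invented and deliberately faulty function: the checks that were run all pass, while a fault is present the whole time.

```python
def is_leap_year(year):
    """Deliberately incomplete: ignores the century rules of the Gregorian calendar."""
    return year % 4 == 0


# These checks pass, so we have seen no problem...
assert is_leap_year(2024) is True
assert is_leap_year(2023) is False

# ...but a problem was there all along: 1900 was not a leap year.
print(is_leap_year(1900))  # True, which is wrong
```

The passing checks show that the function can give correct answers, not that it will.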
Chapter contents 4
Overview
1 Introduction
2 Recording of an exploratory testing process
2.1 What is the nature of this testing?
2.2 How did I choose that test string?
2.3 How did I choose to use a loop?
2.4 How did I know what the Reflow function is supposed to do?
2.5 What should this kind of testing be called?
2.6 What about the oracle?
2.7 What is the next step?
2.8 How about a de Bruijn sequence?
2.9 Progressively smaller words
2.10 Exactly sized lines
2.11 Try the interesting characters
2.12 Consecutive special characters
2.13 Have we tested enough?
2.14 Final bug list
Chapter 4
Let us do some testing
OVERVIEW
This is James Bach’s other chapter that has been written as an exam-
ple of how an experienced exploratory tester would set about testing
a given problem. It is meant to put you in the right state of mind to
continue with the rest of this workbook.
LEARNING GOALS
After studying this chapter, you are expected to:
– be able to explain the practice of exploratory testing
– be able to explain how exploring leads to models
– be able to explain how the need for a model leads to more explor-
ing.
CONTENTS
4.1 Introduction
The specification and algorithm, due to Naur, are given in Figures 4.1
and 4.2. These were presented to James Bach in this way to start testing
and record the process. Naur’s algorithm uses an array buffer to grad-
ually collect the next word (sequence of characters without blanks or
newlines) from the input text; that word occupies positions 1 ... bufpos.
Moreover, it uses a variable fill for the number of characters that were
sent to the output since the last newline.
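The roles of the buffer and of fill can be made concrete with a Python sketch. It is a reconstruction based only on the description above and on the behaviour reported later in this chapter, and it is neither the pseudo-code of Figure 4.2 nor the Perl or Java code from the course site. The initial newline, the treatment of the end of the text as a final blank, the returned status value and the name reflow are assumptions made for this sketch.

```python
def reflow(text, maxpos):
    """Return (output, status); status is "Alarm!" if a word does not fit."""
    output = ["\n"]   # assumption: the output starts with a newline
    buffer = []       # characters of the word collected so far (bufpos = len(buffer))
    fill = 0          # characters written since the last newline
    for ch in text + " ":   # assumption: the end of the text acts as a final blank
        if ch in (" ", "\n"):
            # The word is complete: does it still fit on the current line?
            if fill + len(buffer) < maxpos:
                output.append(" ")
                fill += 1
            else:
                output.append("\n")
                fill = 0
            output.extend(buffer)
            fill += len(buffer)
            buffer = []
        elif len(buffer) == maxpos:
            # The word has outgrown the line width: give up.
            return "".join(output), "Alarm!"
        else:
            buffer.append(ch)
    return "".join(output), None


out, status = reflow("this is a simple test string.", 16)
print(out)
print(status)
```

The sketch keeps the behaviour it reconstructs rather than improving on it: every blank or newline in the input ends a word, even an empty one, and the very first word is preceded by a blank.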
EXERCISE 4.1
Study the specification and algorithm from Figures 4.1 and 4.2 well. Try
to understand how the algorithm intends to solve the problem stated in
the specification. You can disregard all remarks about action clusters.
Do you see some errors already?
In both approaches, there is a risk we will introduce errors that are not
caused by the algorithm itself, since we are, so to speak, working with
an interpretation of the algorithm. In other words, maybe we are not
really testing the algorithm. However, there is nothing we can do about
that, except pay attention and try not to make mistakes.
Since I (James Bach) was not given running code, I first implemented a
version of the example program in Perl to serve as the SUT, doing my
best to preserve any bugs that might be in the provided pseudo-code. A
few obvious problems immediately surfaced, and they have to do with
underspecification of the pseudo-code:
• The pseudo-code calls an exception of some kind (“Alarm”) when
the length of a single word is equal to or greater than the width of
the viewport. This means a long string of text without blanks is an
actual error condition. However, the error handling has not been
clearly defined.
• The pseudo-code never exits, because the code does not recognise
the end-of-string condition. Does this mean that an infinite stream
processing function needs to be implemented that never terminates?
Or is this again underspecification?
I resolved the second to have a version of the function that does ter-
minate and on which it would be worth doing more than the simplest
testing.
The running function is called Reflow; it is written in Perl and you can
find it on the course site¹.
reflow_function($input,\$output,$max);
where:
• $input contains the text to be re-wrapped
• $output receives the re-wrapped text (the "\" indicates it is passed
by reference)
• $max contains the width of the viewport.
1 There is also an implementation in Java that can be found on the course site.
A simple and quick way to test this is to provide a simple test string that
includes some words of varying lengths, then progressively reducing
the size of the viewport from large to small. For example, this code can
do that:
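# Wrap the same test string at every viewport width from 40 down to 0.
for (my $max = 40; $max >= 0; $max--)
{
    my $output = "";
    my $result = reflow_function("this is a simple test string.",
                                 \$output, $max);
    # Print the wrapped text, a ruler of $max stars, and the returned status.
    print "$output\n", "*" x $max, "\t$result", "\n";
}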
This produces the following output (we added +++ to mark beginning
and end of the output):
+++
this is a simple
test string.
*********************
this is a simple
test string.
********************
this is a simple
test string.
*******************
this is a simple
test string.
******************
this is a simple
test string.
*****************
this is a
simple test
string.
****************
this is a
simple test
string.
***************
this is a
simple test
string.
**************
this is a
simple test
string.
*************
this is a
simple test
string.
************
this is a
simple test
string.
***********
this is a
simple
test
string.
**********
this is
a simple
test
string.
*********
this is
a simple
test
string.
********
this
is a
simple
test
string.
*******
this
is a
simple
test
****** Alarm!
this
is a
***** Alarm!
this
is a
**** Alarm!
*** Alarm!
** Alarm!
* Alarm!
Alarm!
+++
Merely glancing at this immediately reveals that the reflow function al-
ways adds a new line and a space to the beginning of the string, which
seems inconsistent with its purpose, the desires of a hypothetical user,
and equivalent functions in other products. Otherwise, reflow appears
to work as advertised, except that words larger than the viewport are
treated as error conditions instead of becoming wrapped, while the ac-
tual error condition of a zero-width viewport is not explicitly handled.
How did it arise? It might seem too obvious even to discuss, but in fact
I have made many interesting choices, here, that could have been made
differently. At this point, these choices are “tacit,” meaning unspoken.
I have used my tacit knowledge of word wrapping (also known as line
breaking) and testing to guide me. Tacit knowledge often feels like “just
doing the obvious thing” to people who already have the particular
skills and knowledge involved. But if you do not have these, the work
can look mystical or ritualistic.
I wanted an inexpensive test, but I would still like some good test cov-
erage. A loop is a very simple way to cover a lot of ground. I can “try
everything” by looping through all the sizes within a range.
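The same sweep can be sketched in Java. This is only an illustration: the wrap method below is a hypothetical stand-in for the Reflow function (a plain greedy word wrapper written for this sketch), and the names ViewportSweep, wrap and sweep are assumptions, not part of the product under test. Its output will therefore not match the recorded output above, and it has no "Alarm!" result.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: wrap() is a hypothetical stand-in for the function under test.
class ViewportSweep {

    // Greedy word wrap: put as many words on a line as fit within 'width'.
    // A word longer than 'width' is placed on a line of its own (not split).
    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            // Start a new line when the next word would not fit.
            if (line.length() > 0 && line.length() + 1 + word.length() > width) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    // Wrap the same text at every width from 'from' down to 'to' and print
    // each result above a ruler of '*' that marks the viewport size.
    static void sweep(String text, int from, int to) {
        for (int width = from; width >= to; width--) {
            for (String l : wrap(text, width)) {
                System.out.println(l);
            }
            System.out.println("*".repeat(width));
        }
    }

    public static void main(String[] args) {
        sweep("this is a simple test string.", 21, 7);
    }
}
```

The point of the loop is not the wrapper itself but the shape of the test: one fixed input, one parameter varied in small steps, and all results printed one below the other so that a human can scan them.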
4.2.4 How did I know what the Reflow function is supposed to do?
I used a blink oracle. That is a heuristic based on the assumption that our
brains are very good at detecting incorrect patterns in a huge amount
of data in a blink.
Here I used a progressive mismatch oracle, which is a kind of blink
oracle, because the code results in a sequence of outputs based on small
changes to the input. This helps the tester’s eye pick out incorrect
patterns in that output.
I must stop and think at this point. I need to consider three major ques-
tions:
In this case, functional and data seem like large enough surfaces for me to
explore at this moment. The other surfaces are either trivial or obscure
to me. In the case of the functional surface, I think I already have a
good start on that. I just need to question whether Reflow has all the
capabilities that a good word wrapper ought to have. Since this is an
exercise, I will put that aside and focus instead on the surface data.
I need a rich set of test data. I can put this together by hand or write
code to generate it. Either way, I first need to model the input data space
so that I can have a clear idea of what I want to create. The primary
data of interest is the text to be wrapped. Based on my background
knowledge, a little Googling, and the desire to keep things simple, the
elements of that data include:
• Text encoding. ASCII, ANSI, UTF-8, UNICODE... what encodings
are supported? (I’m going to assume ANSI).
• Words. A word is a string of ANSI characters other than newline,
whitespace, or hyphens.
• Whitespace. I will include the following character codes in my con-
cept of whitespace: Horizontal Tab (0x09) and Space (0x20)
• Non-breaking space. I will consider a non-breaking space (0xA0) to
be something that should not allow a line break.
• New lines. I will consider line feeds (0x0A) and carriage returns
(0x0D) to be new line characters, and I will consider a carriage re-
turn to be a single new line character.
• Hyphens. I will consider the following character codes to be hy-
phens: Dash (0x2D), En Dash (0x96), Em Dash (0x97), Soft Hyphen
(0xAD).
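To make this model of the input data explicit, it can be written down as a small classifier. The following Java sketch assumes single-byte ANSI character codes as listed above; the names CharClasses, Kind and classify are illustrative assumptions, and treating the non-breaking space as a class of its own is a modelling choice made for this sketch.

```java
// Sketch only: names and the separate NON_BREAKING_SPACE class are assumptions.
class CharClasses {

    enum Kind { WHITESPACE, NON_BREAKING_SPACE, NEW_LINE, HYPHEN, WORD }

    // Classify a single-byte character code (0x00..0xFF) according to
    // the elements of the input model listed above.
    static Kind classify(int code) {
        switch (code) {
            case 0x09: // horizontal tab
            case 0x20: // space
                return Kind.WHITESPACE;
            case 0xA0: // non-breaking space: must not allow a line break
                return Kind.NON_BREAKING_SPACE;
            case 0x0A: // line feed
            case 0x0D: // carriage return
                return Kind.NEW_LINE;
            case 0x2D: // dash
            case 0x96: // en dash
            case 0x97: // em dash
            case 0xAD: // soft hyphen
                return Kind.HYPHEN;
            default:   // everything else is part of a word
                return Kind.WORD;
        }
    }

    public static void main(String[] args) {
        for (int code : new int[] {0x09, 0x20, 0xA0, 0x0A, 0x0D, 0x2D, 0x96, 0x97, 0xAD, 'x'}) {
            System.out.printf("0x%02X -> %s%n", code, classify(code));
        }
    }
}
```

Such a classifier is useful when generating test data, because it lets me check that a generated text contains at least one character of every kind.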
• 1
• <longest word
• =longest word
• >longest word
• maxint
Timing does not seem important in this case, and there are probably no
persistent state variables to worry about. There is no Graphical User
Interface (GUI). So, what I need to do is construct one or more texts
which explore every important factor that we want to cover.
This turned out to be a pretty bad idea. The sequence I wanted was too
big for the applet, but with Googling, I found a Python script that gave
me what I needed. However, when I transformed that into a usable
input string, it was a big mess. I could run it through Reflow, but it
was too difficult to interpret the result. So, I abandoned the de Bruijn
sequence idea and returned to something simpler.
*********************************************************
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
2 http://www.hakank.org/comb/debruijn.cgi
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxx xxxxxxxxxxxx
xxxxxxxxxxx xxxxxxxxxx xxxxxxxxx xxxxxxxx xxxxxxx xxxxxx
xxxxx xxxx xxx xx x
*********************************************************
However, when I narrowed the viewport to the size of the largest word
I saw this:
****************************************
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx
xxxxxxxxxxxxxx xxxxxxxxxxxxx
xxxxxxxxxxxx xxxxxxxxxxx xxxxxxxxxx
xxxxxxxxx xxxxxxxx xxxxxxx xxxxxx xxxxx
xxxx xxx xx x
****************************************
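Test data like this is easier to generate than to type. The sketch below assumes, as the output above suggests, that the input consists of words of 40 down to 1 'x' characters separated by single spaces; the names ShrinkingWords and build are hypothetical.

```java
// Sketch only: generates "progressively smaller words" test data.
class ShrinkingWords {

    // Build a text of words made of 'x' with lengths longest, longest-1,
    // ..., 1, separated by single spaces (for example "xxx xx x").
    static String build(int longest) {
        StringBuilder sb = new StringBuilder();
        for (int len = longest; len >= 1; len--) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("x".repeat(len));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(40));
    }
}
```

Because every word length occurs exactly once, any viewport width between 1 and the longest word meets a word that fits exactly, one that is one character too long, and one that is one character too short.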
In this text, we will make sure that each line has exactly the same size
(49 characters). When it is word-wrapped with a viewport size of 49 I
should see something like this (i.e. our oracle):
When I created a new version of the previous test data which had three
consecutive special characters in each input text, I discovered that new
lines were being replaced by spaces, which seems like a bug to me. Con-
secutive whitespace is handled by passing it through. Although the
spec is silent about that, I think it is reasonable behaviour.
I have found a few bugs and now is a good time to move on to other
testing. When I get the new – fixed – version, I can easily repeat this
testing, and I might add more at that time. For instance, I might use
naturalistic text.
3 http://www.satisfice.com/articles/good_enough_testing.pdf
Chapter contents
Overview
1 Introduction
2 Defect taxonomies
2.1 Boris Beizer’s taxonomy
2.2 Kaner, Falk, and Nguyen’s taxonomy
2.3 Whittaker’s “How to Break Software” taxonomy
3 Catalogs and checklists
3.1 Catalogs
3.2 Checklists
4 Your taxonomy, catalog or checklist
Chapter 5
OVERVIEW
LEARNING GOALS
After studying this chapter, you are expected to:
– know of different taxonomies, catalogs and checklists used in soft-
ware testing
– see how useful taxonomies, catalogs and checklists can be during
testing
– understand that the most useful ones are those that you make
yourself
– be able to draw a Product Coverage Outline for a testing problem
using taxonomies, catalogs and/or checklists.
CONTENTS
5.1 Introduction
While generating test ideas and test-relevant aspects, it often helps to
make an overview using a picture or a table. Some call this a Product
Coverage Outline (PCO) [60]. A PCO is an artefact (a picture, mind map,
list, diagram, sketch, table, et cetera) that identifies the dimensions
or elements of a product that might be relevant to testing.
Mind maps turn out to be very useful when creating a PCO. Mind map-
ping is a technique that can help people to learn more effectively. It
was invented, together with the term mind map, by popular psychology
author Tony Buzan [30].
For example, the small mind map in Figure 5.1 gives an overview of
this course.
Mind maps are very flexible. For example, another way to structure the
content of this course is shown in Figure 5.2. This flexibility is one of the
strengths of mind maps, but at the same time can be a weakness when
more precise semantics are needed.
To generate a mind map, or PCO, with test ideas in the early stages of
testing, we can use taxonomies, catalogs or checklists to get started. We
look at examples of these in the next sections.
Chapter 5 Taxonomies, catalogs and checklists
One of the first and most used defect taxonomies was defined by Boris
Beizer in Software Testing Techniques [11] and later revised in [21]. It is
based on nine top-level categories:
1 requirements defects
2 feature and functionality defects
3 structure defects
4 data defects
5 implementation and coding defects
6 integration defects
7 system and software architecture defects
8 test definition and execution defects
9 unclassified defects
Even considering only the top two levels it is quite extensive. All four
levels of the taxonomy constitute a fine-grained framework with which
to categorise defects.
You do not have to read this carefully word for word now; you can just
browse through it and be impressed by the huge number of possible
defects. For now, let this taxonomy serve as a reminder of how many
defects can occur in software and how difficult a task testing can be.
Later you can use the taxonomy to generate test ideas. We will also see
how such a defect taxonomy can form the basis of a catalog or a
checklist, as explained in subsequent sections.
Whittaker bases his taxonomy (or fault model, as he calls it) on the four
basic tasks that any software system performs:
1 software accepts inputs from its environment;
2 software produces outputs and transmits these to its environment;
3 software stores data internally in one or more data structures;
4 software performs computations using input and stored data.
This fault model can guide the tester: if the software does any of these
four things wrong, we have found a failure. In the book [126] he then
describes different attacks that can be performed from different
environments: the Graphical User Interface (GUI) for the human user, the
file system user, the operating system user and the external software user.
5.3.1 Catalogs
In the book by Pezze and Young [94], test catalogs are used to provide a
list of test parameters according to the input type of each variable. Also
Marick [74] uses catalogs in his book.1,2 These catalogs can be used to
give you more test ideas.
Appending elements
• The collection is initially empty; add at least one new element.
• The collection is not initially empty; add at least one new element.
• Add the last element that can fit (the collection is now full).
• Attempt to add one more element to a full collection.
• Add zero new elements.
• Add multiple elements, almost filling up the collection
• Add multiple elements, filling up the collection.
• The collection is not full. Try to add one too many elements (adding
more than one).
Deleting elements
• The collection has one element.
• The collection has no elements (nothing to delete).
1 www.exampler.com/testing-com/writings/short-catalog.pdf
2 www.exampler.com/testing-com/writings/catalog.pdf
How can we use this? Recall that we have seen some examples of pro-
grams that were about arrays in Figure 1.3. For example, let us look
again at:
/**
* Find LAST index of zero.
* @param x array to search
* @return index of last 0 in x;
* -1 if absent
* @throws NullPointerException
* if x is null
*/
input x = [2, 3, 5]
expected result -1
2 Exactly one match: a test case where there is only one occurrence of
0 in the array x:
input x = [2, 0, 3, 5]
expected result 1
3 More than one match: a test case where there is more than one oc-
currence of 0 in the array x:
input x = [7, 2, 0, 3, 0, 5, 6]
expected result 4
4 Single match found in the first position: the 0 is in the first position
of x:
input x = [0, 2, 3, 5]
expected result 0
5 Single match found in the last position: the 0 is in the last position
of x:
input x = [1, 2, 3, 5, 0]
expected result 4
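The catalog-derived test cases above can be turned into executable checks. The program from Figure 1.3 is not reproduced here, so the lastZero method below is only a plausible implementation written to match the Javadoc; the class name LastZero and the helper check are assumptions made for this sketch.

```java
import java.util.Arrays;

// Sketch only: lastZero() is an assumed implementation matching the Javadoc.
class LastZero {

    // Returns the index of the last 0 in x, or -1 if absent.
    // Throws NullPointerException if x is null (via x.length).
    static int lastZero(int[] x) {
        for (int i = x.length - 1; i >= 0; i--) {
            if (x[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    // Run one test case and report whether the actual result matches.
    static void check(int[] input, int expected) {
        int actual = lastZero(input);
        String verdict = (actual == expected) ? "pass" : "FAIL";
        System.out.println(verdict + " " + Arrays.toString(input)
                + " expected " + expected + " got " + actual);
    }

    public static void main(String[] args) {
        check(new int[] {2, 3, 5}, -1);            // no match
        check(new int[] {2, 0, 3, 5}, 1);          // exactly one match
        check(new int[] {7, 2, 0, 3, 0, 5, 6}, 4); // more than one match
        check(new int[] {0, 2, 3, 5}, 0);          // match in first position
        check(new int[] {1, 2, 3, 5, 0}, 4);       // match in last position
    }
}
```

Each call to check corresponds to one catalog entry, so the catalog also documents why each test case exists.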
5.3.2 Checklists
Let us look at the last part: dates. Dates are often a source of defects,
as we have seen in Section 1.5. The checklist in [54] gives the items
depicted in Figure 5.3.
FIGURE 5.3 Part of the checklist from [54] about testing dates and Y2K
(year 2000 issues).
as aspects of the product that you consider testing. This includes as-
pects intrinsic to the product and relationships between the product
and things outside it. Elements that are considered are remembered by
the mnemonic SFDIPOT [20, 7]: Structure, Function, Data, Interface,
Platform, Operations and Time.
Let us look into each one briefly and see what type of questions we can
ask ourselves to start getting these test ideas. Note that we have tried
to include many elements of the taxonomies, catalogs and checklists that
we have seen so far in this chapter.
Structure What is the structure of the application? Does it use files?
Can we talk to the developers? Can we build it? Do we have access to
the code? What are the technologies that were used to make the soft-
ware? Are there any updates or patches? What type of documentation
is there (user, developers)?
Function What is the functionality of the application? What are the
individual features/functions that it provides? Do any of the functions
interact? Think about arithmetic/logic functions, estimations,
transformations, multimedia, et cetera. This can be an (organised)
listing of all of the actions that can be done using the application,
perhaps as simple user stories such as You can add/delete to chart and
User can join mailing list. Is any error handling done?
Data What inputs does the product need and process? What are the
types, cardinalities, volumes and properties of these data? Do they need
formatting? Do they have boundaries? What is the precision? Are
there any dates? What does its output look like? What kinds of modes
or states can it be in? Does it come with pre-set data? Is any of its
input sensitive to timing or sequencing? Is there data that is expected
to persist over multiple operations?
Interface How is data being exchanged with the user (e.g. displays,
buttons, fields, whether physical or virtual)? Any other system inter-
faces, with other programs, hard disk, network, wireless, bluetooth,
data base servers, printers, et cetera? Are there any programmatic in-
terfaces (APIs) or tools intended to allow the development of new ap-
plications using this product? Any export or import of data to or from
external applications?
Platform (or the whole ecosystem) What does the application depend
on outside of the software? What operating systems does the applica-
tion run on? Are there any third-party components (hardware or soft-
ware) needed to run the application? Does the environment have to be
configured in any special way? Does it run on or have connection with
the internet? Does it need to comply with any standards (e.g. related to
security, accessibility, money, et cetera)?
Operations How will it be used? Who will use the product? Does it
have authentication (identification of the users)? Does it do
authorization (handling of what an authenticated user can see and do)? Does it
have different types of users? Are there certain things that users are
more likely to do? What about ignorant, rogue and careless users? Any
privacy issues (i.e. to not disclose sensitive data to unauthorized users)?
Secrecy (i.e. to not disclose information about the underlying systems)?
In Figure 5.4 you can see a partial PCO for the Naur algorithm from
Chapter 4. We only expanded the Data part of SFDIPOT. It gives a
first impression of the things to take into account when testing.
Now that we have seen several different tools for generating test ideas,
the question arises: which is the best one for you? The one that is most use-
ful is the one you create from your experience when testing. Often the
place to start is with an existing one. Then modify it to more accurately
reflect your particular test experience in terms of defects you find, gen-
eral test situations you encounter, data you normally work with, their
frequency of occurrence, characteristics, et cetera.
The first step in creating your own tool is compiling a list of key con-
cepts. Do not worry if your list becomes long. That may be just fine.
Make sure the items are short, descriptive phrases. Keep your users
(that is you and maybe other testers you know) in mind. Use terms
that are common for them. Later, look for natural hierarchical relation-
ships between items. Combine these into a major category with sub-
categories underneath. Try not to duplicate or overlap categories and
subcategories. Continue to add new categories as they are discovered.
Revise the categories and subcategories when new items do not seem
to fit well. Share your tools with others and solicit their feedback.
Chapter contents
Overview
1 Equivalence classes
2 Partitioning the inputs
3 History and related work
4 Making a model
4.1 Example: testing at GUI level
4.2 Example: testing at code level
4.3 Example: testing at class interface level
5 Exercises
6 Coverage criteria
6.1 All Combinations coverage
6.2 Each Choice coverage
6.3 Other ways of combining
6.4 Invalid Part coverage
7 Exercises
8 Faults that can be found
Chapter 6
OVERVIEW
LEARNING GOALS
After studying this chapter, you are expected to:
– understand and be able to describe partition testing techniques and
equivalence classes
– be able to model input domains of a SUT at different levels
– be able to apply different coverage criteria to generate test cases
– be able to apply the techniques of this chapter in a relatively small
practical exercise.
CONTENTS
We leave the subject of partitions for a while and discuss relations and
equivalence classes instead. But in the end, you will see that it all comes
together again.
Make a model For each input, model the domain with equivalence classes
N × N = {(n, m) | n, m ∈ N}
Chapter 6 Input domain modelling with equivalence classes
DEFINITION 6.1 The set of all possible input values of one input variable is called an
input domain.
[Figure 6.3: an input domain D partitioned into the parts D1, D2, D3, D4 and D5]
Since it is impossible to test a SUT with every possible value for every
input variable, the partition testing method described in this chapter
explores the input domains of the SUT with the intention of reducing the
number of tests. We divide (partition) the input domains into subsets
assuming that all elements of the same subset should result in similar
behaviour from the test object. That is, each subset is assumed to be an
equivalence class concerning that behaviour. Then we only test with a
representative value from each equivalence class.
Consider, for example, Figure 6.3. Suppose we have a SUT, some be-
havioural characteristic C and an input domain D. If we partition D
into D1 , D2 , . . . , D5 , then we assume that for all i (1 ≤ i ≤ 5), input
x ∈ Di and input y ∈ Di are considered to be equivalent for testing, i.e.
executing the SUT with input x gives rise to the same behaviour concerning
characteristic C as executing the SUT with input y. Consequently, we only
test the SUT with 5 representatives, one from each part Di.
For example, instead of testing with every single model of printer, the
tester might treat all Hewlett-Packard inkjet printers as roughly equiv-
alent. Consequently, we have a test with only one HP printer as a rep-
resentative of that entire set.
Partition testing has its roots in papers by Ostrand and Balcer in the
1980s [92]. In other books and articles describing similar techniques for
test case specification based on dividing the input space, you can find
them under the names input space partitioning [4], equivalence class testing
[57, 65], equivalence class partitioning [28, 102], and category partitioning
[14].
As with all testing techniques, making the model is both the most
critical and the most challenging part. Critical because bad domain
models give rise to bad test cases. Challenging because there is no
one-size-fits-all solution for coming up with good partitions for testing.
[Figure 6.4: GUI of the generate-grade app, with the input fields exam and
coursework, a Calculate Grade button, and the message "Mark is outside
expected range"]
To master the craft of partition testing you need to practice it a lot. Let
us start with some examples and then some exercises.
EXAMPLE 6.1 Consider an app on your telephone that is called generate-grade. The
GUI is shown in Figure 6.4. It has the following specification:
To test this characteristic, we have two input variables: exam and course-
work. A first attempt to partition the domains of these variables could
be as shown in Table 6.1.
Note that the parts are given an identifier (ID) so that we can reference
these more easily later on. These identifiers also mark parts of the par-
titions as valid or invalid parts:
• valid parts (vPi ) are composed of values that are valid according to
the specification, with respect to both their type and their values.
• invalid parts (iPi ) are composed of values that are invalid according
to the specification, either because the specification indicates this or
because the specification does not mention it. In generate-grade there
are two types of invalid parts:
– values of the right type but invalid according to the described func-
tionality. For example, exam marks are passed as integers, so that
is a valid type; however, they cannot be negative. Therefore nega-
tive integers are considered invalid.
– values of the wrong type. For example, exam marks are expected
to be integers, and anything that is not an integer is an invalid
value. Although in principle, these are not to be given to the pro-
gram, we still need to consider them for testing the behaviour of
the SUT when faced with these invalid inputs.
Note that we might not always have all the details about the specific
fault messages that can arise. For example, the description above only
says that a fault message is generated when a mark is outside of the
range. It may or may not contain more information about why the mark was
not accepted. Also, the specification is unclear about what happens when
non-integer inputs are passed. Maybe they are not accepted? Or maybe
the same fault message ("FM") is raised?
It could therefore be argued that we do not know whether iP1 and iP2
are different equivalence classes: if they both result in the same
fault message ("FM"), the elements of iP1 and iP2 are equivalent
concerning that characteristic. Following that reasoning, we might as
well put all the invalid inputs in one equivalence class! However, from
a testing point of view, that is not what we want. Although it is not
explicitly specified what should happen when invalid inputs are given
to the system, something will definitely happen, and we need to test
whether this something is acceptable.
Consequently, for invalid partitions, our goal is not only to test whether
for invalid inputs the SUT behaves according to the specification, but
also whether it does correct error handling and has considered all in-
valid inputs. For this, we should test any possible sort of invalid input,
and partition the invalid parts of the domains as much as possible us-
ing a very detailed Ci . For example, we can update Table 6.1 as shown
in Table 6.2.
The valid parts we have now are the equivalence classes defined by the
following characteristic:
However, now we are clearly not testing whether different grades are
calculated in the right way.
This means there is a dependency between the inputs that together
determine the outcome. In such a case we can introduce an intermediate
variable and use that to define the partitions. Here the intermediate
variable represents the sum of exam and coursework.
This seems a better domain model for the generate-grade app, as you
can see in Table 6.3.
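The role of the intermediate variable becomes visible when the behaviour is written as code. The sketch below is not the specification of generate-grade: the limits (exam at most 75, coursework at most 25) and the thresholds (30, 50 and 70, each assumed to belong to the higher grade) are assumptions read off the accompanying figure, and the names GenerateGrade and grade are hypothetical.

```java
// Sketch only: limits and thresholds are assumptions, not a specification.
class GenerateGrade {

    static final int MAX_EXAM = 75;        // assumed upper bound for exam
    static final int MAX_COURSEWORK = 25;  // assumed upper bound for coursework

    // Returns "A".."D" for valid marks, or "FM" (fault message) otherwise.
    static String grade(int exam, int coursework) {
        if (exam < 0 || exam > MAX_EXAM
                || coursework < 0 || coursework > MAX_COURSEWORK) {
            return "FM";
        }
        // The intermediate variable: the grade depends only on this sum.
        int sum = exam + coursework;
        if (sum >= 70) return "A";
        if (sum >= 50) return "B";
        if (sum >= 30) return "C";
        return "D";
    }

    public static void main(String[] args) {
        // One representative per valid part of the sum, plus two invalid inputs.
        System.out.println(grade(60, 20));
        System.out.println(grade(40, 15));
        System.out.println(grade(25, 10));
        System.out.println(grade(10, 5));
        System.out.println(grade(-1, 10));
        System.out.println(grade(76, 10));
    }
}
```

Partitioning exam and coursework separately would never force a test into each of the four branches on sum; partitioning the sum does.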
There are various ways you can describe these equivalence classes in
your test model; you can pick the one that you like most or that fits the
test problem at hand best.
[Figure: grade regions A, B, C and D plotted for exam (e) against
coursework (c), delimited by e = 75, c = 25 and the lines e + c = 100,
e + c = 70, e + c = 50 and e + c = 30]
The following code is supposed to calculate the price (any faults are
deliberate, see Exercise 6.7). This is the method that has to be tested.
public double calculate_price (double baseprice,
                               int discount,
                               double specialprice,
                               int colour,
                               double extraprice,
                               int extras)
{
    double specialprice_with_discount;

    if (colour == 0)
        specialprice_with_discount = specialprice * 0.86;
    else if (colour == 1)
        specialprice_with_discount = specialprice * 0.93;
    else specialprice = extraprice;

    double extraprice_with_discount;

    if (extras >= 2)
        extraprice_with_discount = extraprice * 0.90;
    else if (extras >= 6)
        extraprice_with_discount = extraprice * 0.85;
    else extraprice_with_discount = 0;

    return baseprice * (1-discount/100.0)
           + specialprice
           + extraprice_with_discount;
}
One obvious behavioural characteristic for valid inputs for this simple
example is:
Note that we included two parts containing invalid integers for discount:
]−∞, 0[ and ]100, +∞[. Instead, we could have taken the union of these
two intervals to form one larger part containing all invalid integers for
discount. That would certainly be all right in many situations. However,
negative percentages are probably treated differently from percentages
larger than 100, hence our choice for two separate parts.
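As a small illustration, the three parts of the discount domain can be written as a function that maps a value to its part. This is a sketch; the names DiscountParts, Part and partOf are assumptions, and the valid part [0, 100] is taken to be the complement of the two invalid intervals.

```java
// Sketch only: maps a discount value to the part of the domain it belongs to.
class DiscountParts {

    enum Part { INVALID_NEGATIVE, VALID, INVALID_TOO_LARGE }

    static Part partOf(int discount) {
        if (discount < 0) {
            return Part.INVALID_NEGATIVE;   // ]-inf, 0[
        }
        if (discount > 100) {
            return Part.INVALID_TOO_LARGE;  // ]100, +inf[
        }
        return Part.VALID;                  // [0, 100]
    }

    public static void main(String[] args) {
        for (int d : new int[] {-5, 0, 50, 100, 150}) {
            System.out.println(d + " -> " + partOf(d));
        }
    }
}
```

Keeping the two invalid parts separate means a test suite needs at least one negative discount and at least one discount above 100.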
EXAMPLE 6.3 Let us consider a class LinkedList that represents a singly linked list. It
offers in its interface a method getNextElement (see Figure 6.7) that
will return the next element, provided the list is not empty. It will wrap
from last to first element when reaching the end of the list. When called
on an empty list, it will throw an EmptyListException.
What are the input variables that we can use to test the functionality of
the method getNextElement? We need an object list (abbreviated
l) of type LinkedList that represents the state of the list at a specific
moment. Furthermore, we need a way to distinguish specific, though
abstract, test cases. For instance, we want to ascertain that all goes well
when the list has size 1, or when we ask for the next element of the
last element in the list. To be able to make these distinctions, we will
use a virtual variable lastVisited (abbreviated lV), indicating the last
element visited by getNextElement. We come up with the partitions
for list and lastVisited shown in Table 6.5.
Note that we chose to indicate the first position in the list with index 1,
i.e., lastVisited ranges from 1 to l.size().
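To make the described behaviour concrete, here is a minimal Java sketch. It is not the class from Figure 6.7: for brevity it stores elements in an ArrayList instead of a singly linked structure, it is called CyclicList to avoid a clash with java.util.LinkedList, and the initial value 0 for lastVisited (nothing visited yet) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Thrown when getNextElement is called on an empty list.
class EmptyListException extends RuntimeException {
    EmptyListException() {
        super("list is empty");
    }
}

// Sketch only: array-backed stand-in for the LinkedList of the example.
class CyclicList<T> {
    private final List<T> elements = new ArrayList<>();
    private int lastVisited = 0; // 1-based position; 0 means nothing visited yet

    void add(T element) {
        elements.add(element);
    }

    // Returns the element after the last visited one, wrapping from the
    // last element back to the first.
    T getNextElement() {
        if (elements.isEmpty()) {
            throw new EmptyListException();
        }
        lastVisited = (lastVisited % elements.size()) + 1;
        return elements.get(lastVisited - 1);
    }

    public static void main(String[] args) {
        CyclicList<String> list = new CyclicList<>();
        list.add("a");
        list.add("b");
        for (int i = 0; i < 3; i++) {
            System.out.println(list.getNextElement());
        }
    }
}
```

The field lastVisited in this sketch plays the role of the virtual variable: a test sets up a list of a given size, advances to a chosen position, and then checks the next call.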
6.5 Exercises
We strongly encourage you to really put some effort into solving the ex-
ercises below (and in later chapters as well). You will notice that it takes
some time, some exploring and some trial and error (or experience), to
really comprehend the problem at hand and its most important pecu-
liarities.
Moreover, when comparing your own answer to our answer, you will
see some very elaborate answers in this workbook. We do not expect
you to elaborate this much; we just try to show you a possible process
to reach a satisfactory solution, a process that is both model-based and
exploratory. As always there are more answers possible.
EXERCISE 6.1
Partition the input domain of the following application description adapted
from [111].
A company orders an application that needs to calculate the annual
bonus of its employees. This bonus is a percentage of their monthly
salary, and depends on how long they have worked for the company.
In the requirements the following rules are found:
• more than three years at the company yields a bonus of 50%
• more than five years yields a bonus of 75%
• more than eight years yields a bonus of 100%
EXERCISE 6.2
Partition the input domains of the following function Z
\[
\forall x, y \in \mathbb{N} :: Z(x, y) =
\begin{cases}
x + y & 10 < x \le 20 \land 12 < y \le 30 \\
x - y & 0 \le x \le 10 \land 0 \le y \le 12 \\
0 & \text{otherwise}
\end{cases}
\]
EXERCISE 6.3
A hardware store sells hammers (5 euros) and screwdrivers (10 euros).
Over time however, their discount system has grown a bit complex.
They have asked the nephew of the boss (who is studying computer
science) to develop a little application that can calculate the price a cus-
tomer needs to pay when buying these products. They have the follow-
ing discount rules:
• If the total is more than 200 euros, then the client obtains a discount
of 5% over the total;
• If the total is more than 1000 euros, then the client obtains a discount
of 20% over the total;
• If the client buys more than 30 screwdrivers, then there is an addi-
tional discount of 10%.
Now that we have our model (in this case: domain partitions), we can
go to the next two steps from Figure 6.1: making test cases by selecting
test values (Step 3) that cover the model (Step 2). Before we can pick
specific values for our test cases, we have to decide where those values
should come from (i.e., from which part of the domain) and how values
for different input variables should be combined.
The reason behind Rule 2 is easy to see. When using a value from an
invalid part, the goal of the test is to find out how the SUT behaves for
this invalid part. If we were to combine it with other invalid parts, we
would not know to which invalid part the outcome of the SUT is related.
Actually, Rule 1 should be "you should also combine values from valid
parts in the same test case" (so not just "you can"), because there is no
guarantee that failures are caused by just one input. Of course this cre-
ates the risk of not satisfying Rule 3, since taking all kinds of combi-
nations of inputs into account means more test cases. This is where
coverage criteria for combining the parts come into play.
We always have to choose both a coverage criterion for the valid parts
of the input domains and a coverage criterion for the invalid parts. We
start with descriptions of two coverage criteria for valid parts: All Com-
binations and Each Choice.
The most obvious choice would be to take, for each input variable, all
valid parts of its input domain, and then to combine (the parts of) these
input variables in every possible way. This is known as All Combinations
coverage (abbreviated as ACC or AC coverage).
A test suite consisting of only test cases 1, 2 and 3, for example, would
result in 75% AC coverage.
For Example 6.3 we would need 2 × 3 = 6 test cases to cover all com-
binations (see Table 6.5). However, vP4 is not valid when vP1 holds so
these two cannot be combined. Therefore we need only 5 test cases to
achieve maximal AC coverage (see Table 6.8).
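The counting behind All Combinations coverage can be sketched in Java. This is a minimal illustration, not the course's own tooling: the class name `AllCombinations`, its method names, and the assignment of vP1 and vP2 to one variable and vP3 to vP5 to the other are assumptions; only the numbers (6 combinations, 5 feasible ones, 75% coverage for 3 out of 4) come from the text.

```java
import java.util.ArrayList;
import java.util.List;

final class AllCombinations {
    // Cartesian product of the valid parts of all variables
    // (one inner list of part names per input variable).
    static List<List<String>> product(List<List<String>> partsPerVariable) {
        List<List<String>> combos = new ArrayList<>();
        combos.add(new ArrayList<>());
        for (List<String> parts : partsPerVariable) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> combo : combos) {
                for (String part : parts) {
                    List<String> next = new ArrayList<>(combo);
                    next.add(part);
                    extended.add(next);
                }
            }
            combos = extended;
        }
        return combos;
    }

    // Drops every combination that contains both parts of an infeasible pair.
    static List<List<String>> feasible(List<List<String>> combos, String partA, String partB) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> combo : combos) {
            if (!(combo.contains(partA) && combo.contains(partB))) {
                kept.add(combo);
            }
        }
        return kept;
    }

    // AC coverage as a percentage: covered combinations / required combinations.
    static double coverage(int covered, int required) {
        return 100.0 * covered / required;
    }
}
```

Under the assumed assignment of parts, `product` yields 2 × 3 = 6 combinations and `feasible(..., "vP1", "vP4")` leaves 5 of them, which mirrors the reasoning for Example 6.3.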
TABLE 6.7 Test suite for dishwasher discount with 100% AC coverage
Another way to reduce the number of test cases is not to take all com-
binations but instead to ensure that (a value from) each part occurs in
at least one test case. This is known as Each Choice coverage (abbreviated
as ECC or EC coverage).
For Example 6.2, we have 11 valid parts that can be combined into 4
test cases to reach 100% EC coverage. For example, we can just take test
cases 1, 2, 7 and 12 from Table 6.7 and renumber them: see Table 6.9.
Example 6.3 has 5 valid parts in total. All of them are covered by test
cases 1, 4 and 5 from Table 6.8.
TABLE 6.9 Test suite for dishwasher discount with 100% EC coverage
ECC is the simplest strategy. For these small and easy examples, it might
even appear that you do not need to know about ECC and could have
come up with these test cases immediately without that much thought.
However, if the number of input variables grows, and accordingly the
number of different parts, then ECC will provide you with a tool that
allows you to work in a structured manner.
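To make the structured way of working concrete, here is a hedged Java sketch that builds an Each Choice suite mechanically. It assumes that all parts can be combined freely (no infeasible pairs) and that every variable has at least one valid part; the class name `EachChoice` and the rotation scheme are illustrative choices, not taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

final class EachChoice {
    // Builds a smallest Each Choice suite: test case i uses part (i mod n)
    // of each variable, where n is the number of valid parts of that variable.
    // The suite size equals the largest number of parts of any variable.
    static List<List<String>> suite(List<List<String>> partsPerVariable) {
        int size = 0;
        for (List<String> parts : partsPerVariable) {
            size = Math.max(size, parts.size());
        }
        List<List<String>> tests = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            List<String> test = new ArrayList<>();
            for (List<String> parts : partsPerVariable) {
                test.add(parts.get(i % parts.size()));
            }
            tests.add(test);
        }
        return tests;
    }
}
```

For two variables with 2 and 3 valid parts this produces 3 test cases in which every part occurs at least once, whereas All Combinations would need 6.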
ECC is the simplest strategy and also the least effective for obtaining
good test cases, since it does not consider combinations. However, if
you have many different parts, it might be the only strategy that pro-
vides a feasible number of test cases that can be executed in a given
time with a given budget.
So, ACC seems like a good idea for covering many possibilities, but can
easily blow up and become impractical. ECC is the simplest way of
combining, but is less effective in obtaining good test cases.
In terms of t-wise coverage, ACC corresponds to taking t equal to the number
of input variables: every combination of valid parts of all these variables
must be covered in at least one test case.
ECC comes down to t = 1: each valid part of the domain of every variable
must be covered in at least one test case.
The higher the number t, the more test cases there will be, and the more
effort the testing will take. t-wise coverage for t = 2 is also called pair-
wise coverage or All Pairs Testing.
Constructing all t-wise combinations and at the same time keeping the
number of test cases as low as possible can be very tedious to do by
hand. Techniques that are frequently used to create t-wise test sets in-
clude orthogonal arrays and covering arrays. However, many other tech-
niques and tools have been investigated and discussed. For example,
[51] contains a survey.
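Checking whether a given suite reaches pairwise coverage is much easier than constructing one. The Java sketch below assumes the usual reading of 2-wise coverage (every combination of parts of any two variables occurs in at least one test case); the class `PairwiseCheck` and the encoding of parts as integer indices are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class PairwiseCheck {
    // Counts the distinct pairs of parts covered by a suite.
    // suite[k][v] is the index of the part chosen for variable v in test case k.
    static int coveredPairs(int[][] suite) {
        Set<String> pairs = new HashSet<>();
        for (int[] test : suite) {
            for (int v = 0; v < test.length; v++) {
                for (int w = v + 1; w < test.length; w++) {
                    pairs.add(v + ":" + test[v] + "," + w + ":" + test[w]);
                }
            }
        }
        return pairs.size();
    }

    // Number of pairs that 2-wise coverage requires;
    // partCounts[v] is the number of valid parts of variable v.
    static int requiredPairs(int[] partCounts) {
        int total = 0;
        for (int v = 0; v < partCounts.length; v++) {
            for (int w = v + 1; w < partCounts.length; w++) {
                total += partCounts[v] * partCounts[w];
            }
        }
        return total;
    }
}
```

For three variables with two parts each, the four test cases (0,0,0), (0,1,1), (1,0,1) and (1,1,0) cover all 12 required pairs, while All Combinations would need 2 × 2 × 2 = 8 test cases.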
As said earlier, you cannot combine values from invalid parts in the
same test case. You will need one test case for each invalid part (see
Section 6.6). This is known as Invalid Part coverage (abbreviated as IPC
or IP coverage).
For Example 6.1, 100% IP coverage comes down to 6 test cases; for in-
stance, the ones shown in Table 6.10. The expected outcome as stated in
the specification is that a specific fault message (FM) is generated.
For Example 6.2, we have 12 invalid parts and so need 12 test cases for
IP coverage. The expected outcome of the test cases corresponding to
iP2, iP5, iP7, iP8, iP10 and iP12 is that the SUT (the Java method) or the
compiler will give a type error. However, the expected outcome for iP1,
iP3, iP4, iP6, iP9 and iP11 is not explicitly described in the specification.
We probably found an error here already due to this incomplete
specification, as the method calculate_price does not check whether the
variables are within the desired domains.
For Example 6.3, there is only one invalid part, containing the empty
list.
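The rule "one invalid part per test case, everything else valid" is easy to mechanise. The following Java sketch is only an illustration under stated assumptions: the class `InvalidPartSuite` is hypothetical, and inputs are represented as strings so that values of the wrong type can be expressed as well.

```java
import java.util.ArrayList;
import java.util.List;

final class InvalidPartSuite {
    // validValues[v] is a typical valid value for variable v.
    // invalidValues[v] holds one representative value per invalid part of variable v.
    // Every generated test case contains exactly one invalid value.
    static List<String[]> build(String[] validValues, String[][] invalidValues) {
        List<String[]> tests = new ArrayList<>();
        for (int v = 0; v < validValues.length; v++) {
            for (String invalid : invalidValues[v]) {
                String[] test = validValues.clone();
                test[v] = invalid;
                tests.add(test);
            }
        }
        return tests;
    }
}
```

With two variables that each have three invalid parts, for instance the hypothetical representatives "-1", "76", "abc" for exam and "-1", "26", "abc" for coursework, the sketch yields 3 + 3 = 6 test cases. The number of IP test cases is always the sum, not the product, of the numbers of invalid parts.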
6.7 Exercises
We conclude the subject of coverage criteria (for now) with some exer-
cises. The first three are quick and easy, the fourth is not! We recom-
mend that you definitely take the time to do Exercise 6.7.
EXERCISE 6.4
For your domain model from Exercise 6.1 (annual bonus), design test
suites using EC coverage and AC coverage. How many test cases do
you need to obtain 100% IP coverage?
EXERCISE 6.5
For your domain model from Exercise 6.2 (Z-function), design test
suites using EC coverage and AC coverage. How many test cases do
you need to obtain 100% IP coverage?
EXERCISE 6.6
For your domain model from Exercise 6.3 (hardware store), design test
suites using EC coverage and AC coverage. How many test cases do you
need to obtain 100% IP coverage?
EXERCISE 6.7
a Act as if you are the programmer of the code in Example 6.2 (dish-
washer) and you want to test the code using JUnit. You want to use the
test case values from this chapter, repair any faults and test again until
you are convinced the code is correct. Which set of test cases do you
use (i.e., the one with AC coverage, or the one with EC coverage)? Why?
b What about the tests for IP coverage? We already indicated that
test cases corresponding to iP2, iP5, iP7, iP8, iP10 and iP12 will give a
type error for the Java method, so we do not have to test these with
JUnit. However, the expected outcome for iP1, iP3, iP4, iP6, iP9 and
iP11 requires completing the specification. Let us suppose that method
calculate_price should throw an InvalidInputException when
the variables are of the correct type but not within the desired domains.
Define that exception, use it to complete the implementation, and test
iP1, iP3, iP4, iP6, iP9 and iP11 with JUnit.
Chapter contents 7
Overview
1 Introduction
2 Boundaries: ON and OFF points
3 Finding ON and OFF points for different types of values
3.1 Numerical types
3.2 Non-numerical types
3.3 User-defined types
3.4 n-dimensional types
4 1x1 boundary coverage
5 The domain test matrix
5.1 Domain test matrices for generate-grade
5.2 Exercises
6 Faults that can be found
Chapter 7
OVERVIEW
LEARNING GOALS
After studying this chapter, you are expected to:
– understand and be able to describe domain boundary testing
– be able to apply boundary value analysis using the 1×1 coverage
criterion.
CONTENTS
7.1 Introduction
[Figure: the boundaries (min and max) of the domain 5 <= X < 10, and the parts P1 to P4 separated by boundary a and boundary b]
Boundary testing has its roots in papers by White and Cohen in the late
1970s and early 1980s [125]. Jeng and Weyuker [55] simplified it in an article
in the mid-1990s. Since then, it has been described in several books, for
instance [14] and [11].
Boundary testing is not difficult, but possibly a bit tedious [11]. First we
need some more definitions.
DEFINITION 7.1 An IN point belongs to the domain, an OUT point does not.
An ON point is on a boundary, an OFF point is near a boundary.
Chapter 7 Input domain boundaries
[Figure: a domain on the number line with the regions OUT, IN and OUT marked, and the ON and OFF points at its boundaries]
The definition states that OFF points should be near the boundary. The
interpretation of this depends on the type of the domain. For integers
“near” means +1 or -1. For decimals, it means the smallest supported
decimal fraction.
Moreover, in theory, each ON point has two OFF points: one to the left
of the boundary and one to the right (or: one below the boundary and
one above). However, in some situations, only one of those OFF points
is necessary, as described below.
So for inequalities, we choose only one of the two available OFF points
for an ON point, as shown in Figure 7.3. There, the leftmost boundary
point (ON) is IN the domain (since it is a closed boundary), so we choose
its OFF point OUT of the domain. Similarly, the rightmost boundary
point (ON) is OUT, so we choose its OFF point IN the domain.
Note that in the case of x = 23 above, these rules apply as well: the ON
point is IN, both OFF points are OUT.
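For integer boundaries defined by an inequality, the choice of the single OFF point can be written down in a few lines of Java. This is a sketch of the rule described above, with "near" taken as a distance of 1; the class `IntegerBoundary` and its `Op` enum are hypothetical names.

```java
final class IntegerBoundary {
    // Relational operators that can define a boundary condition "x op bound".
    enum Op { LT, LE, GT, GE }

    // The ON point lies on the boundary itself.
    static int onPoint(int bound) {
        return bound;
    }

    // True when the ON point satisfies the condition (closed boundary).
    static boolean onPointIsIn(Op op) {
        return op == Op.LE || op == Op.GE;
    }

    // OFF point at distance 1: OUT of the domain for a closed boundary,
    // IN the domain for an open boundary.
    static int offPoint(Op op, int bound) {
        switch (op) {
            case LE: return bound + 1; // ON is IN, so OFF is just above (OUT)
            case LT: return bound - 1; // ON is OUT, so OFF is just below (IN)
            case GE: return bound - 1; // ON is IN, so OFF is just below (OUT)
            case GT: return bound + 1; // ON is OUT, so OFF is just above (IN)
            default: throw new IllegalArgumentException("unknown operator");
        }
    }
}
```

For example, `offPoint(Op.LE, 6)` gives 7 and `offPoint(Op.LT, 6)` gives 5. Equalities such as x = 23 are not covered by this sketch, since they need both OFF points.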
EXERCISE 7.1
Look closely at the examples in the table and make sure you understand
these in the light of the previous definitions.
     Variable   Boundary   ON point   OFF point
1    y ∈ N      y ≤ 6      6          7
2    y ∈ N      y < 6      6          5
3    x ∈ R      x > 10     10         10.00001
4    x ∈ R      x ≤ 20     20         20.00001
However, if the input has an enumerated type that uses identifiers that
behave as constants, this is not possible. Imagine we have an enumer-
ated type
class Money {
private int fAmount;
private String fCurrency;
State abstractions
Other classes can be less simple. Sometimes we can use state abstrac-
tions to define a suitable partition [14, p. 408]. For example, for any im-
plementation of a stack or a queue we can define three abstract states:
empty, loaded and full. We can use these state abstractions to define par-
tition conditions, for example:
In this case, there is only one boundary: between vP1 and vP2, and this
boundary itself does not have a value. Therefore, we cannot choose an
ON point exactly as it was meant: on the boundary. Instead, we use
the following rules for choosing an ON and an OFF point when crossing
a boundary from part (state) P into part (state) Q, both defined using
state abstractions:
• an ON point is an object that satisfies the partition condition of P (so it
is IN P), but for which the smallest possible change causes a state change,
yielding an object that no longer satisfies the partition condition (and
hence is OUT of P, and IN Q).
• an OFF point is an object that does not satisfy the partition condition
of P (so it is OUT of P, and IN Q), but for which the smallest possible
change causes a state change, yielding an object that does satisfy it.
For example, let us look at the stack s, and suppose M > 1 is the maxi-
mum number of possible values in the stack. If we look at the partition
of the input domain into vP1 (i.e., empty) and vP2 (i.e., not empty) de-
fined above, we can make the following choices. The empty stack has
s.size = 0, and the smallest possible change that causes the stack to be
not empty any more is pushing an element onto s, resulting in s.size = 1.
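The choice described here can be written as a small Java sketch. It uses `java.util.ArrayDeque` as a stand-in for the stack s; the class `StackStates`, its method names and the pushed value are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackStates {
    enum State { EMPTY, LOADED, FULL }

    // State abstraction for a stack holding `size` elements with capacity `max` (max > 1).
    static State classify(int size, int max) {
        if (size == 0) return State.EMPTY;
        if (size == max) return State.FULL;
        return State.LOADED;
    }

    // Partition condition of vP1 (empty); vP2 (not empty) is its negation.
    static boolean isEmptyPart(Deque<Integer> stack) {
        return stack.size() == 0;
    }

    // ON point when crossing from vP1 into vP2: the empty stack (IN vP1),
    // which a single push would move out of vP1.
    static Deque<Integer> onPoint() {
        return new ArrayDeque<>();
    }

    // OFF point: a stack with one element (OUT of vP1, IN vP2),
    // which a single pop would move back into vP1.
    static Deque<Integer> offPoint() {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        return stack;
    }
}
```

The ON point satisfies the condition of vP1 and the OFF point does not, and the two differ by exactly one push, which is the smallest possible change for a stack.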
In Figure 7.4 we determine ON and OFF points for all borders between
the three states empty, loaded and full of a stack. Note that we do not con-
sider the transition of state loaded to state loaded. This transition happens
for instance when pushing an element onto a stack that is not empty and
not full, resulting in a stack that is still not full. We skip this transition on
purpose because for the discussion in this chapter only transitions that
cross a border are of interest, while the transition from loaded to loaded
happens “in the middle” of the area satisfying the state condition loaded.
EXERCISE 7.2
What are the ON, OFF, IN and OUT points for the boundaries resulting
from partitioning according to the state conditions full and not full? An-
swer the same question for loaded and not loaded.
How do we find ON and OFF points for two of the boundaries
of vP3, which are defined by 70 ≤ exam + coursework ≤ 100? First,
we pick a value for one of the variables, let us say coursework. The value
we pick should not violate any of the other relevant parts concerning
coursework in Table 6.3; in this case vP2 , which states that 0 ≤ coursework
≤ 25. A rule of thumb here is to pick, if possible, a mid-point of the
valid part for the variable. In this case coursework = 12 is somewhere in
the middle of vP2 .
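[Figure: graph of the generate-grade input domain with exam (e) on the vertical axis (0 to 100), the boundary lines c = 25, e = 75, e + c = 100, e + c = 70, e + c = 50 and e + c = 30, and the regions A, B, C and D]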
The upper bound of vP3 is exam + coursework ≤ 100. Here filling in the
number 12 for coursework results in exam ≤ 88. So if coursework is 12,
then to end up on the upper bound of vP3 we have to choose exam = 88.
However, this violates vP1 (0 ≤ exam ≤ 75), so this cannot be used as an
ON point. Here, Figure 7.5 comes in handy: there we see immediately
that the only point on the upper bound of vP3 that satisfies both vP1 and
vP2 is (25, 75). Hence for the upper bound of vP3 we get:
Note that there are other possibilities. One of these is to pick a value
for exam first and then solve the equation with coursework; then we get
different ON and OFF points.
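The arithmetic of fixing coursework and solving for exam can be replayed in Java. This is a sketch under the conditions quoted in the text (vP1: 0 ≤ exam ≤ 75, vP2: 0 ≤ coursework ≤ 25, vP3: 70 ≤ exam + coursework ≤ 100); the class `GradeBoundary` and its method names are hypothetical.

```java
final class GradeBoundary {
    // Valid ranges from the text: vP1 is 0 <= exam <= 75, vP2 is 0 <= coursework <= 25.
    static boolean examValid(int exam) {
        return 0 <= exam && exam <= 75;
    }

    static boolean courseworkValid(int coursework) {
        return 0 <= coursework && coursework <= 25;
    }

    // ON point (exam value) on the upper bound exam + coursework <= 100 of vP3.
    static int onExamUpper(int coursework) {
        return 100 - coursework;
    }

    // ON point (exam value) on the lower bound 70 <= exam + coursework of vP3.
    static int onExamLower(int coursework) {
        return 70 - coursework;
    }

    // Both bounds are closed, so the ON point is IN vP3 and the OFF point
    // lies one step OUT of vP3.
    static int offExamUpper(int coursework) {
        return onExamUpper(coursework) + 1;
    }

    static int offExamLower(int coursework) {
        return onExamLower(coursework) - 1;
    }
}
```

With coursework = 12, `onExamUpper` gives 88, which `examValid` rejects, so this candidate cannot serve as an ON point. With coursework = 25 the ON point is exam = 75 and the OFF point is exam = 76. For the lower bound, coursework = 12 gives exam = 58 as the ON point and exam = 57 as the OFF point.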
EXERCISE 7.3
Find ON and OFF points for vP3 by picking a value for exam first.
From Exercise 7.3 it is clear that in theory, it does not matter whether we
fix the value of exam or the value of coursework. In practice, however, it
does matter: starting from a fixed value of coursework is much less work.
Moreover, it may also matter which representation of the partition we
use: from Figure 7.5 it is immediately clear that fixing a value for course-
work (let us say 12) ensures that we encounter all boundaries involved
in the generate-grade problem. If, however, we choose to start with
exam, we have to find a new starting value for each boundary.
There is one coverage criterion for boundary value analysis that has
proven to be sufficient for all possible scenarios ([55], [14]):
DEFINITION 7.2 The 1×1 (“one-by-one”) coverage criterion calls for 1 ON point and 1 OFF
point for each domain boundary. The OFF point should be as close as
possible to the ON point.
You might wonder why the OFF point should be as close as possible to
the ON point. This will be explained in Section 7.6.
The domain test matrix (DTM) from Robert Binder [14] provides a convenient
representation to design a test suite for boundary value analysis
with the 1×1 coverage criterion. Figure 7.6 shows what an empty matrix
looks like and how it is built up of variables, boundary conditions,
ON/OFF/IN points and test cases.
Each matrix contains the conditions that describe exactly one domain
of which we want to analyse the boundaries (for example an equiva-
lence class). Each column is a test case. Since each test case is meant
to test one aspect of one boundary of the domain, only one ON or OFF
point appears in each test. These values appear on the diagonal of the
matrix. IN points are generated for all other variables in each test case:
these should not be boundary points. The IN points need to be chosen
before or after the ON and OFF points, depending on the conditions and
what you need. IN points are chosen by guessing, (pseudo)random se-
lection algorithms or analysing the situation [14], making sure that the
ON point is as close to the OFF point as possible [55].
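The column structure of a DTM can be generated mechanically in simple cases. The Java sketch below is deliberately simplified: it assumes integer variables whose boundaries are independent closed bounds (low ≤ x ≤ high), so it does not handle conditions that relate two variables, such as e + c ≤ 100. The class `DomainTestMatrix` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class DomainTestMatrix {
    // Builds the test cases (columns) of a DTM for a domain in which every
    // variable v has closed integer bounds low[v] <= x_v <= high[v].
    // typical[v] is an IN point for variable v that is not a boundary value.
    // Each column contains exactly one ON or OFF point.
    static List<int[]> columns(int[] low, int[] high, int[] typical) {
        List<int[]> tests = new ArrayList<>();
        for (int v = 0; v < typical.length; v++) {
            // ON and OFF point for the lower bound, then for the upper bound.
            int[] onOff = { low[v], low[v] - 1, high[v], high[v] + 1 };
            for (int value : onOff) {
                int[] test = typical.clone(); // IN points for all other variables
                test[v] = value;
                tests.add(test);
            }
        }
        return tests;
    }
}
```

For the rectangle 0 ≤ exam ≤ 75 and 0 ≤ coursework ≤ 25 with typical values 37 and 12, this gives 8 test cases: (0, 12), (-1, 12), (75, 12), (76, 12), (37, 0), (37, -1), (37, 25) and (37, 26). Four boundaries with one ON and one OFF point each is exactly what the 1×1 criterion asks for.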
EXERCISE 7.4
For each of the following examples of situations with two adjacent in-
teger domains (hence with a common boundary), determine the ON
and OFF points needed for that boundary, when looking at it from both
sides. The boundary itself is y = 23 in all cases.
a Which ON and OFF points are needed for y > 23? And for y < 23?
b The same question for y ≥ 23 and y < 23.
c The same question for y > 23 and y ≤ 23.
From Exercise 7.4 we conclude that, in situations where there are two
adjacent domains:
• in all cases, the ON point can be reused for both adjacent domains;
• if the boundary is IN exactly one of the two domains, the OFF points
for both domains are the same;
• otherwise, two different OFF points are needed, one for each domain.
We now need to make a matrix for each equivalence class. This way we
can analyse its boundaries. There are two variables, so the skeleton of
the DTM will look like this:
Now we have to look at the different conditions that define the bound-
aries of the equivalence classes. Again (like Exercise 7.3) we can choose
to express the conditions on e using c or the other way around. Let us
express the conditions on e in c, which means that we can analyse the
boundary conditions of the domain induced by vP2 and vP1/vP3 using
the following DTM:
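[Figure: graph of the generate-grade input domain with exam (e) on the vertical axis (0 to 100), the boundary lines c = 25, e = 75, e + c = 100, e + c = 70, e + c = 50 and e + c = 30, and the regions A, B, C, D, H, I, J, K, L and P]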
Note that for test cases 9 and 10, picking typical value c = 12 causes
the ON point for e ≤ 100 − c to contradict e ≤ 75. Here we can make
a test case design decision. If we also choose to test the boundary be-
tween domains K and P we need to violate e ≤ 75. If we have informa-
tion that indicates that this boundary is not at risk of being violated we
can choose another typical value for c (this can only be 25, a boundary
value) and then test cases 9 and 10 coincide with the points (25,75) and
(25, 76). The latter always seems to be a good point to test and if we do
not have it, we could add it to the test suite.
Also, note that boundary value analysis as described above does not
provide test cases for the “corners” of a domain (i.e., the points where
two or more boundaries coincide, like (0,75) for instance). If you want to
include the corners as well, you will be doing robust worst-case boundary
testing, as Jorgensen calls it [57]. There is a trade-off between being
thorough and keeping the number of test cases to a minimum.
Let us continue with the DTM for the domain induced by vP2 and vP1/vP4.
The domain matrix is below:
This adds 6 test cases, indicated with green dots in the graph:
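[Figure: graph of the generate-grade input domain with exam (e) on the vertical axis (0 to 100), the boundary lines c = 25, e = 75, e + c = 100, e + c = 70, e + c = 50 and e + c = 30, the regions A, B, C, D, H, I, J, K, L and P, and the added test cases marked as dots]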
As you can see, the domain matrix for vP4 generates some test cases that
we already added when analysing vP3 , so we do not add these again.
This is indicated by the lack of a test case number in the top row. The
reason for not adding the test case is shown in the bottom row.
For the domains induced by vP5 and vP6 the domain matrices are:
The resulting graph with depicted test cases can be found in Figure 7.7.
7.5.2 Exercises
The objective of the following exercise is not really to come up with the
test cases, but to practise making DTMs, so that when we are doing it
for problems with dimensions higher than 2 –for which graphs are not
easily drawn– you will be fluent and confident at it. Then you will be
able to use DTMs as a way to grasp higher dimensions easily.
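[Figure: graph of the generate-grade input domain with exam (e) on the vertical axis (0 to 100), the boundary lines c = 25, e = 75, e + c = 100, e + c = 70, e + c = 50 and e + c = 30, and the regions A, B, C, D, H, I, J, K, L and P]
FIGURE 7.7 Test cases for vP5 and vP6 from Example 6.1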
EXERCISE 7.5
Based on your domain model for the Z function from Exercise 6.2, con-
struct the DTMs needed to get test cases that give 100% 1×1 boundary
coverage. Draw a graph that illustrates the domain partition you mod-
elled and how the test cases from the DTMs cover the parts and bound-
aries (similar to that in Figure 7.5).
Hint: Use Excel or another spreadsheet program for the DTMs. You can
draw the graph on paper.
After finishing this exercise and really going through the effort of mak-
ing all of the DTMs, you will have realised that (although sometimes a
bit tedious) DTMs are a useful tool to analyse the boundaries. You will
have acquired a good feeling of how the boundaries are related to the
functionality you are testing. You will also have noticed that using Ex-
cel or a similar program can help a lot here. For the two-dimensional
case (i.e. two variables) you can easily check whether all boundaries
have been analysed: you need to have an ON and OFF point for each of
the boundaries induced by the boundary conditions.
EXERCISE 7.6
Based on the domain model from Exercise 6.3 (hardware store) design
a test suite that gives 100% 1×1 boundary coverage. Draw a figure that
illustrates the domain partitions we modelled and how the test cases
cover the partitions and boundaries.
EXERCISE 7.7
a Design test cases for Example 6.2 (dishwasher, see Table 6.4)
using the DTM and the 1×1 boundary strategy.
Hint: There are a lot of variables here, so at first sight, it looks like our
first exercise with more than 2 dimensions. But is it? How can we make
it simpler? On which variables does the functionality depend? If we re-
duce the number of variables in the DTM, what consequences does this
have for our test cases?
b Draw your test cases in a graph representing the input domain par-
titions.
c Add JUnit tests for each of the designed test cases to your code from
Exercise 6.7.
If we then inspect the code to look for the cause of the failure, we can
find the following two types of faults:
• computation faults: the wrong function is applied to values of Pi in the
implementation.
• domain faults: the boundary between two parts in the implementa-
tion is wrong. General examples of domain faults for open bound-
aries are depicted in Figure 7.8, and similarly for closed boundaries
in Figure 7.9.
In Figure 7.8, the situation is as follows. Suppose that our model de-
scribes a part A of some input domain as an interval of which the left
boundary is open. If we apply the 1×1 strategy, we have to indicate 1
ON and 1 OFF point for this boundary. The ON point (that happens to be
OUT in this case) should have some computation related to part B, while
the OFF point (that should then be IN) should have some computation
related to A.
Figure 7.8 shows four kinds of faults that will be detected with this
strategy. We discuss them all:
• The closure fault occurs when a closed rather than an open boundary
is implemented. It will be found with this strategy because the ON
point should give rise to behaviour different from A (since it is OUT),
but we get behaviour corresponding to A instead.
FIGURE 7.8 Possible domain faults for open boundaries based on [11]
For closed domains the cases are similar, except now the strategy gives
us an ON point (that is IN) that should have computation related to A,
and an OFF point (that is OUT) that should have computation related to
B (see Figure 7.9).
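A small Java sketch can show how one ON and one OFF point expose domain faults on an open boundary. The boundary x > 10, the two faulty variants and the class `DomainFaultDemo` are hypothetical examples chosen for illustration; they are not the cases drawn in the figures.

```java
import java.util.function.IntPredicate;

final class DomainFaultDemo {
    // Intended part A: an interval whose left boundary x > 10 is open.
    static final IntPredicate INTENDED = x -> x > 10;
    // Closure fault: a closed boundary was implemented instead of an open one.
    static final IntPredicate CLOSURE_FAULT = x -> x >= 10;
    // Shift fault: the boundary was implemented at the wrong position.
    static final IntPredicate SHIFT_FAULT = x -> x > 11;

    // A pair of points reveals a fault when the implementation disagrees with
    // the intended domain on the ON point or on the OFF point.
    static boolean detectedBy(IntPredicate implementation, int on, int off) {
        return implementation.test(on) != INTENDED.test(on)
            || implementation.test(off) != INTENDED.test(off);
    }
}
```

With ON = 10 (OUT of A) and OFF = 11 (IN A), both faulty variants are detected: the closure fault on the ON point and the shift fault on the OFF point. With an OFF point further away, say 15, the shift fault would go unnoticed, which illustrates why the OFF point should be as close as possible to the ON point.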
EXERCISE 7.8
Figure 7.9 shows domain boundary faults for closed one-dimensional
domains. Make similar pictures for the closed two-dimensional case
and reason why the 1×1 strategy finds the faults. Imagine that the in-
tended closed domain A to be implemented is like this:
FIGURE 7.9 Possible domain faults for closed boundaries based on [11]
Draw and reason how the ON and OFF point can find the following bugs
in this 2-dimensional boundary: closure bug, shift up, and incorrect
inclination. Also, think of a scenario when the 1×1 strategy might miss
an incorrectly implemented boundary.
Chapter contents 8
Decision tables
Overview
1 Introduction
2 Decision tables
2.1 Extended/limited entry decision tables
2.2 Implicit variant: do not care (DNC)
2.3 Implicit variant: cannot happen (CNH)
2.4 Implicit variant: do not know (DNK)
2.5 Summary
3 Checking the decision table
3.1 Check implicit variants
3.2 Check decision table properties
3.3 Check testability
4 Coverage criteria for decision tables
5 Exercises
Chapter 8
Decision tables
OVERVIEW
In this chapter we use decision tables as the model to design test cases.
It fits with the way we present the techniques in this course as follows:
Design testcases
LEARNING GOALS
After studying this chapter, you are expected to:
– understand the goals of decision table testing
– be able to use decision tables to design test cases.
CONTENTS
8.1 Introduction
Imagine you are trying to test a product that makes many different de-
cisions based on combinations of input variables, or you are reading
a specification that lists a series of conditions under which different
events will occur, or you need to create test data for a system that pro-
cesses these data in complex ways. Then making a decision table as
Decision tables have been around for a long time [64]. Their use in
testing was first described in [50] where they were also called condition
tables.
Decision tables have a strong logical basis and so they are also called
a logic-based testing technique. Decision table testing consists of mak-
ing a decision table or finding one in the SUT’s specification and then
covering it. Decision table testing can help to test combinations of input
conditions that might not be thoroughly tested with input domain
partitioning and boundary value analysis.
Section 8.2 will introduce the concepts and terminology behind deci-
sion tables. Section 8.3 discusses how we can check whether the deci-
sion table is suitable for testing. Subsequently, Section 8.4 explains how
to cover decision tables. Finally there is a section containing several
exercises. You will learn that the process of making and checking a de-
cision table for a SUT is again an exploratory process of deciding which
conditions interact, how they interact, and which conditions to include.
                                    Rules
                      Rule 1   Rule 2   Rule 3   Rule 4
Conditions section
  Condition 1
  Condition 2              (condition entries)
  Condition 3
  Condition 4
Actions section
  Action 1                 (action entries)
  Action 2
Each rule specifies the conditions that should be met to take the indi-
cated actions. The implicit logical operator between conditions is and
(∧). The order in which inputs arrive and conditions are evaluated is
irrelevant.
EXAMPLE 8.1 Table 8.3 shows (the column-based version of) a decision table describ-
ing an insurance renewal specification adapted from [123, 14]. Let us
look at some of the properties of Table 8.3.
• There are two decision variables: num_claims and age.
• There are three possible actions: increase the premium amount of the
insurance, send a warning letter, cancel the policy.
• There are 4 conditions for the variable num_claims: num_claims =
0, num_claims = 1, num_claims ∈ [2, 4] and num_claims ≥ 5. They
coincide with the partitions of num_claims from Chapter 6.
• Similarly, age has 2 conditions/valid parts: 16 ≤ age ≤ 25 and 25 <
age ≤ 85.
• Consequently, we can combine these conditions in 4 × 2 = 8 rules.
However, for num_claims ≥ 5, the two possible conditions for age are
combined into one condition: (16 ≤ age ≤ 25 ∨ 25 < age ≤ 85) ⇔
16 ≤ age ≤ 85, resulting in the 7 rules in Table 8.3.
Table 8.4 shows the row-based version of Table 8.3, here the rules corre-
spond to the rows of the table.
                                        Rule
                     1      2      3      4      5         6         7
Conditions section
  num_claims         = 0    = 0    = 1    = 1    ∈ [2,4]   ∈ [2,4]   ≥ 5
  age                ≥ 16   > 25   ≥ 16   > 25   ≥ 16      > 25      ≥ 16
                     ≤ 25   ≤ 85   ≤ 25   ≤ 85   ≤ 25      ≤ 85      ≤ 85
Actions section
  Warning            no     no     yes    no     yes       yes       no
  Cancel             no     no     no     no     no        no        yes
TABLE 8.3 Column-based decision table for the car insurance example
TABLE 8.4 Row-based decision table for the car insurance example
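A decision table can also be read as an executable oracle. The Java sketch below encodes the seven rules of the car insurance example: the Warning and Cancel entries are those of Table 8.3 and the premium increments are the values listed in Table 8.5. The class `InsuranceRenewal`, its nested `Actions` type and the use of an empty `Optional` for inputs that no rule covers are assumptions of this sketch, not part of the specification.

```java
import java.util.Optional;

final class InsuranceRenewal {
    // The actions selected by one rule of the decision table.
    static final class Actions {
        final int premiumIncrement;
        final boolean warning;
        final boolean cancel;

        Actions(int premiumIncrement, boolean warning, boolean cancel) {
            this.premiumIncrement = premiumIncrement;
            this.warning = warning;
            this.cancel = cancel;
        }
    }

    // Returns the actions of the rule that matches the inputs, or an empty
    // Optional when no rule of the table applies (for example age 90).
    static Optional<Actions> decide(int numClaims, int age) {
        if (numClaims < 0 || age < 16 || age > 85) {
            return Optional.empty();
        }
        if (numClaims >= 5) {
            return Optional.of(new Actions(0, false, true));                 // rule 7
        }
        boolean young = age <= 25;                                           // 16 <= age <= 25
        if (numClaims == 0) {
            return Optional.of(new Actions(young ? 50 : 25, false, false));  // rules 1 and 2
        }
        if (numClaims == 1) {
            return Optional.of(new Actions(young ? 100 : 50, young, false)); // rules 3 and 4
        }
        return Optional.of(new Actions(young ? 400 : 200, true, false));     // rules 5 and 6
    }
}
```

Covering the table then amounts to calling `decide` with at least one input per rule and comparing the result with the rule's action entries, for instance `decide(1, 20)` for rule 3.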
Up until now, we have been using extended condition entry decision tables
[57].
The advantage of decision tables for testing lies in the use of limited
condition entry decision tables [57] or decision tables in a truth table format
[14]. Here, what we can write in the condition entries is limited because
they can contain only True or False (or yes/no, or 0/1, et cetera), which
makes it much like a truth table from logic. Extended entry decision
tables can always be transformed into limited entry decision tables: the
variable num_claims with extended condition entry “[2, 4]” is translated
into the condition “num_claims ∈ [2, 4]” with limited condition entry
True. Table 8.3 can be converted to the truth table depicted in Table 8.5.
                                           Rule
                                 1     2     3     4     5     6     7
Conditions section
  num_claims   = 0         C1    T     T     F     F     F     F     F
               = 1         C2    F     F     T     T     F     F     F
               ∈ [2,4]     C3    F     F     F     F     T     T     F
               ≥ 5         C4    F     F     F     F     F     F     T
  age          ∈ [16,25]   C5    T     F     T     F     T     F     DNC
               ∈ ]25,85]   C6    F     T     F     T     F     T     DNC
Actions section
  increment premium              50    25    100   50    400   200   0
TABLE 8.5 Column-based, limited entry decision table (or truth table)
We will now describe the qualifications DNC, CNH and DNK in more
detail.
                                           Rule
                                 1     2     3     4     5     6     7 (two variants)
Conditions section
  num_claims   = 0         C1    T     T     F     F     F     F     F     F
               = 1         C2    F     F     T     T     F     F     F     F
               ∈ [2,4]     C3    F     F     F     F     T     T     F     F
               ≥ 5         C4    F     F     F     F     F     F     T     T
  age          ∈ [16,25]   C5    T     F     T     F     T     F     T     F
               ∈ ]25,85]   C6    F     T     F     T     F     T     F     T
Actions section
  increment premium              50    25    100   50    400   200   0     0
Do Not Care (DNC) indicates that the value of this condition has no
influence on the resulting actions. In Table 8.3, when 5 or more claims
have been filed, the age of the insured person does not matter anymore;
this is a DNC entry. We do not need to specify more than one explicit
variant because the actions are the same in either case. The truth table
in Table 8.5 makes the DNC variants a bit more visible.
Rule 7 actually counts for two variants, as depicted in Table 8.6. This
means rule 7 has rule complexity 2 because it hides 1 implicit variant.
Knowing the complexity of a rule means knowing how many implicit
variants it is hiding.
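The expansion of a rule with DNC entries into its explicit variants can be sketched in Java. The encoding of a rule as a string over 'T', 'F' and 'D' (for DNC) in the order C1 to C6 is an assumption of this sketch, as are the class `DncExpansion` and the filter that keeps only variants in which exactly one of the two age conditions holds; that filter reflects the two variants shown for rule 7, it is not a general property of decision tables.

```java
import java.util.ArrayList;
import java.util.List;

final class DncExpansion {
    // Expands a rule whose entries are 'T', 'F' or 'D' (do not care)
    // into all explicit variants that contain only 'T' and 'F'.
    static List<String> expand(String rule) {
        List<String> variants = new ArrayList<>();
        variants.add("");
        for (char entry : rule.toCharArray()) {
            List<String> next = new ArrayList<>();
            for (String prefix : variants) {
                if (entry == 'D') {
                    next.add(prefix + 'T');
                    next.add(prefix + 'F');
                } else {
                    next.add(prefix + entry);
                }
            }
            variants = next;
        }
        return variants;
    }

    // Keeps the variants in which exactly one of the age conditions
    // C5 (index 4) and C6 (index 5) is true.
    static List<String> exactlyOneAgePart(List<String> variants) {
        List<String> kept = new ArrayList<>();
        for (String variant : variants) {
            boolean c5 = variant.charAt(4) == 'T';
            boolean c6 = variant.charAt(5) == 'T';
            if (c5 != c6) {
                kept.add(variant);
            }
        }
        return kept;
    }
}
```

Rule 7 is encoded as "FFFTDD". Plain expansion gives four variants; after filtering, the two variants "FFFTTF" and "FFFTFT" remain, which matches the rule complexity of 2 stated above.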
Can Not Happen (CNH) variants often lead to faults that arise from
unwarranted assumptions about so-called impossible situations.
----------------------------------------------------
50-50 Program Specification
----------------------------------------------------
This is a routine for use on a Field Programmable
Gate Array. It’s intended to supply the sum of two
numbers submitted as input. The numbers must be
integers in the range of 0 to 50. For the benefits
of performance, there is no error checking. Calling
clients are responsible for their own data checking.
EXERCISE 8.1
To get an idea of how dangerous CNH variants are, let us consider the
program specification in Figure 8.1.
a How would you test such a program?
b There are three different implementations of that specification avail-
able on the course site. If you do not look into these implementations
but just run the tests on them, do you get the same results on all three
implementations?
c If you now take a look at the implementations, can you say anything
about the assumptions you made about things that cannot happen?
Implicit variants DNK imply that the specification is not complete. For
example, in Tables 8.3 and 8.4 we do not know what will happen if
the age of the insured person is younger than 16 years or older than
85 years. From a testing point of view these implicit variants have to
be taken into account because, although we do not know what will
happen if we try to renew the insurance of somebody aged 90, we
do know that something will happen.
When faced with DNK variants, we may actually have found an issue:
the specification is not complete. We might then need to go back and
explore why the specification is not complete (maybe it was deliberately
underspecified), what additional information the specification needs
and how the SUT reacts in these cases.
8.2.5 Summary
Testing can be seen as the art of managing these assumptions and learn-
ing to fill up the lack of information. Checking the decision table for
the purpose of testing is all about investigating these implicit variants,
checking the underlying assumptions and finding incomplete informa-
tion. We already mentioned most of these checks in the previous sub-
sections, while we were constructing the table. In Section 8.3 we will
summarise the checks. First, in the following exercise you can see for
yourself how many and what assumptions we are actually making.
EXERCISE 8.2
Construct just the conditions section of the entire limited entry decision
table corresponding to Table 8.3, for instance using Excel. Determine
the implicit variants and their causes (DNC, CNH or DNK).
N.B. We will deal with the actions later.
From Exercise 8.2 we see that, for the car insurance example, all CNH
implicit variants are due to the assumption that variables can only have
one value at a time. If we cannot be sure of the validity of this assump-
tion, we should add explicit rules for (some of) these, with as action a
specific error message like “num_claims and age can each only have one
value”.
In fact, taking a closer look at the DNKs makes us ask some additional
questions: what does it mean that num_claims does not have a value, or,
more precisely, does not satisfy any of its 4 conditions? Is it undefined?
Is it a negative integer? Is it not an integer? If we want to distinguish
between these possibilities in our decision table, we need to add more
conditions, as shown in Table 8.8.
Now the question is, should we do that? Our answer is: probably not,
because:
                                   1    2    3    4    5    6    7    8    9
Conditions section
  num_claims  = 0         C1       T    T    F    F    F    F    F    DNC  F
              = 1         C2       F    F    T    T    F    F    F    DNC  F
              ∈ [2,4]     C3       F    F    F    F    T    T    F    DNC  F
              ≥ 5         C4       F    F    F    F    F    F    T    DNC  F
  age         ∈ [16,25]   C5       T    F    T    F    T    F    DNC  F    DNC
              ∈ ]25,85]   C6       F    T    F    T    F    T    DNC  F    DNC
Actions section
  increment premium                50   25   100  50   400  200  0    no   no

                                   1    2    3    4    5    6    7    8    9    10   11
Conditions section
  num_claims  = 0         C1       T    T    F    F    F    F    F    DNC  F    F    F
              = 1         C2       F    F    T    T    F    F    F    DNC  F    F    F
              ∈ [2,4]     C3       F    F    F    F    T    T    F    DNC  F    F    F
              ≥ 5         C4       F    F    F    F    F    F    T    DNC  F    F    F
              is undef    C7       F    F    F    F    F    F    F    F    F    T    F
              is NaN      C8       F    F    F    F    F    F    F    F    F    F    T
  age         ∈ [16,25]   C5       T    F    T    F    T    F    DNC  F    DNC  DNC  DNC
              ∈ ]25,85]   C6       F    T    F    T    F    T    DNC  F    DNC  DNC  DNC
Actions section

TABLE 8.8 Adding conditions to deal with inputs of the wrong type.
Should we do that?
• it would make our table very large, complex and unmanageable (see
for example Table 8.8 where we only started to add two more condi-
tions for invalid inputs for num_claims);
• moreover, adding the checks for validity of input values to our table
would distract us from the core business rules we are trying to test.
Part of testing using decision tables consists of checking the tables con-
cerning:
1 implicit variants that should be explicit for testing;
2 the information that was used to create the table;
3 the SUT for which test cases need to be designed.
!1 Check that all implicit variants CNH indeed cannot happen. If they
can happen they should be replaced with an explicit variant such that
the case is forced during testing.
!2 For all implicit variants DNK, check what is unknown/missing
from the information used to design the decision table. It should be
replaced with an explicit variant.
!3 For all implicit variants DNC we need to check how to deal with
the values of the input variables we do not care about. If values for
these inputs:
• are necessary, one test case for the DNC variant is enough. The test
case can contain IN values for these DNC input variables, and checks
whether these inputs indeed have no effect.
• cannot be provided, also one test case for each DNC variant is enough
and we evidently do not need test values for these input variables.
• may be omitted, two test cases are needed per DNC, one with values
and another without values, to check whether they indeed have no
effect. Here we could add an explicit variant such that both cases are
covered.
!4 All variants are mutually exclusive: if the conditions for one vari-
ant are met, no other variant is applicable.
!5 Each rule is unique: if several actions are to result from one variant,
multiple actions are defined; the variant is not repeated.
!6 The actions specified for a variant with DNCs are acceptable for all
possible truth values of the DNC conditions.
!4 and !5 are essentially the same. They differ only in their goals: !4
is about the conditions, while !5 is about the actions.
Actually, these last three checks should be satisfied for all test suites,
not just the ones that result from a decision table approach!
For a decision table that has been checked, all-explicit variant coverage is a
sufficient testing strategy, assuming that adequate testing of the invalid
inputs and domain boundaries is performed.
8.5 Exercises
As always in this course, there is not just one right way to tackle a test-
ing problem. In other words, if your solution differs from ours, this
does not necessarily mean that one of them is not correct.
EXERCISE 8.3
Let us return to the specification of Example 6.2 (price of a dishwasher)
and the accompanying code fragment on page 103.
a Infer the conditions and the actions you think are necessary for mak-
ing a decision table for testing this software. What are the business
rules? What inputs do we need to apply them?
b Make both an extended and a limited entry decision table.
c Check the 9 properties mentioned in Section 8.3, if you did not al-
ready consider these while making the table.
d Which test cases do we obtain from this decision table? Add them
to the JUnit tests that we made in previous chapters (i.e. Exercises 6.7
and 7.7).
EXERCISE 8.4
The function in this exercise comes from Jorgensen [57]. We use it here
because it illustrates the problem of dependencies in the input domain.
Jorgensen explains that this makes it a perfect example for decision-
table testing, because decision tables can highlight such dependencies.
NextDate is a function that takes three variables (month, day and year),
and returns the date of the day after the input date. The input variables
are of type Integer, and it should work for years starting from 1800.
a Make a limited entry decision table.
b Check the 9 properties mentioned in Section 8.3, if you did not al-
ready consider these while making the table.
c What other ways can you think of to test the NextDate function?
What would you use for an oracle?
Chapter contents 9
Combinatorial testing
Overview 149
1 Introduction 149
2 Faults due to interactions of conditions 150
3 Combinatorics 151
4 Orthogonal and covering arrays 153
4.1 Orthogonal arrays 153
4.2 Covering arrays 156
5 Challenges for practical application of combinatorial testing 157
6 The oracle problem 157
7 Configurations and other test relevant aspects 158
Chapter 9
Combinatorial testing
OVERVIEW
Throughout the previous chapters, we have already seen that test case
design involves combining. For example, truth values of conditions
may have to be combined, or values of different input parameters. Also,
when doing configuration testing (see Section 1.10.3), we need to test
combinations of configuration parameters, e.g. operating systems (ver-
sions of Windows, Linux, macOS, et cetera), browsers (Internet Ex-
plorer, Safari, Chrome, et cetera), different versions of compilers, pro-
cessors, peripherals (e.g. printers, modems), varying amounts of mem-
ory available, et cetera.
This chapter contains reading assignments that involve reading the fol-
lowing paper, which you can find on the course site:
LEARNING GOALS
After studying this chapter, you are expected to:
– understand the goals and challenges of testing combinations
– be able to find and use precalculated orthogonal arrays and cover-
ing arrays, both fixed-level and mixed-level
– know of the existence of other techniques and tools to calculate
combinatorial test suites.
CONTENTS
9.1 Introduction
We saw that the easiest one, giving the fewest test cases, was
1-wise (or ECC), where each valid part of every variable was covered in
at least one test case. The higher the number t, the more test cases there
will be, and the more effort the testing will take. We also referred to
t = 2 as pairwise coverage, since then every possible pair of values of
any two parameters is supposed to be covered by at least one test.
EXERCISE 9.1
The second paragraph of Section 1 from [69] talks about the growing
complexity of software, mentioning the increasing numbers of Lines of
Code (LOC):
• the Pathfinder software with 155,000 LOC
• a Boeing 777 with 6.5 million lines
• the Windows XP operating system with 40 million lines
• an average new car with around 100 million LOC
Although the paper is from 2010, these numbers are still up to date.
How many LOC do you think are in the Android Operating system,
Facebook and all Google services?
EXERCISE 9.2
The text in [69] mentions a couple of studies carried out since the 1990s.
Which three causes of system failures were found in these studies?
Studies have shown that with 2-way coverage you will, in general, find
more than half of the faults. However, you will have to go up to 6-way
to be reasonably sure about finding all faults.
9.3 Combinatorics
We start with recalling the mathematics behind this. Then we will show
what it is all about in a small example and, finally, we will briefly return
to the manufacturing automation system.
Mathematics
The number of ways of choosing t things from a set of n things, in no
particular order, is given by the binomial coefficient $\binom{n}{t}$, for
0 ≤ t ≤ n. In other words, $\binom{n}{t}$ denotes the number of possible
t-way combinations of n parameters. The binomial coefficient can be
calculated as follows:
$$\binom{n}{t} = \frac{n!}{t!\,(n - t)!}$$
Here n! = 1 · 2 · … · (n − 1) · n is the number of different permutations
of n parameters.
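As a quick check of the formula, the following is a minimal sketch that computes the binomial coefficient. The class and method names (BinomialSketch, choose) are hypothetical and only serve this illustration. Instead of computing three factorials, it multiplies and divides step by step, which keeps the intermediate values small.

```java
import java.math.BigInteger;

class BinomialSketch {
    // Number of ways to choose t items out of n, for 0 <= t <= n.
    static BigInteger choose(int n, int t) {
        if (t < 0 || t > n) {
            throw new IllegalArgumentException("require 0 <= t <= n");
        }
        BigInteger result = BigInteger.ONE;
        // After step i the value equals C(n - t + i, i), which is an integer,
        // so every division below is exact.
        for (int i = 1; i <= t; i++) {
            result = result.multiply(BigInteger.valueOf(n - t + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(choose(6, 2)); // 15
        System.out.println(choose(6, 3)); // 20
    }
}
```

For 6 parameters this gives 15 parameter pairs and 20 parameter triples.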
If this is too expensive, we can opt for 2-way coverage, for instance.
There are $\binom{6}{2} = 15$ different 2-way parameter combinations: ab, ac, ad,
ae, af, bc, bd, be, bf, cd, ce, cf, de, df, ef (in combinations, order is not
important, so ab denotes the same combination as ba).
Since each parameter has 3 values, the number of 2-way value combinations
to cover is $\binom{6}{2} \times 3^2 = 15 \times 9 = 135$. Even if we make one
test case for each of these 135 combinations, the number of test cases is
already significantly less than 729.
However, we can reduce the number of test cases even further by realising
that one test case covers (much) more than one 2-way value combination.
For example, 012012 contains one pair of values for each pair of
parameters: 01 for ab, ae, db and de; 00 for ad and so on. Every test case
with concrete values for each of the 6 parameters covers 15 different
2-way value combinations, one for each of the 15 parameter pairs.
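The counting argument can be made concrete with a small sketch that lists the 2-way value combinations covered by a single test case. The class and method names are hypothetical, and the encoding (one digit per parameter, parameters named a, b, c, ... by position) is an assumption that mirrors the notation used in the text.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class PairsCoveredSketch {
    // Lists the 2-way value combinations covered by one test case, where
    // position i of the string holds the value of parameter 'a' + i.
    static Set<String> coveredPairs(String testCase) {
        Set<String> pairs = new LinkedHashSet<>();
        for (int i = 0; i < testCase.length(); i++) {
            for (int j = i + 1; j < testCase.length(); j++) {
                char p = (char) ('a' + i);
                char q = (char) ('a' + j);
                pairs.add("" + p + q + "=" + testCase.charAt(i) + testCase.charAt(j));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        Set<String> pairs = coveredPairs("012012");
        System.out.println(pairs.size());            // 15
        System.out.println(pairs.contains("ab=01")); // true
        System.out.println(pairs.contains("ad=00")); // true
    }
}
```

Since each test covers 15 of the 135 value combinations, at least 9 tests are needed for 2-way coverage; the sketch says nothing about how to choose them.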
If we carefully construct the test cases, we can cover all 135 different
value combinations in remarkably few tests. In fact, 15 test cases are
sufficient, as shown below. Note that the – here means DNC (Do Not
Care).
Alternatively, we could opt for 3-way coverage. There are $\binom{6}{3} = 20$
different 3-way parameter combinations: abc, abd, abe, abf, acd, ace, acf, ade,
adf, aef, bcd, bce, bcf, bde, bdf, bef, cde, cdf, cef, def. So the number of 3-way
value combinations that must be covered is $\binom{6}{3} \times 3^3 = 20 \times 27 = 540$.
Again, if we carefully construct the tests, we can cover these in 49 tests:
Note the use of the word “Alternatively” when introducing the 3-way
option above: you never choose both 2-way and 3-way coverage, since
covering all 3-way combinations inevitably means covering all 2-way
combinations as well.
The question now is: how do we carefully construct these 200 tests?
run f1 f2 f3
1 0 0 0
2 0 1 1
3 1 0 1
4 1 1 0
1 https://en.wikipedia.org/wiki/Leonhard_Euler
Imagine a SUT with 4 toggle buttons that can be on, off or disabled.
That means we have 4 factors (k = 4) and |S| = 3. If we want to do
2-way testing we need to find an orthogonal array OA(N, $3^4$, 2). Note
that as a tester you do not create orthogonal arrays yourself; you just
locate the right one for your purpose. For example, N. J. A. Sloane has a
library² of over 200 precalculated orthogonal arrays. The only thing you
as a tester need to do is find the right array and map your own values
onto the values used in the array you found.
For example, we can find³ the following OA(9, $3^4$, 2) (or OA(9, 4, 3, 2)):
run f1 f2 f3 f4
1 0 0 0 0
2 0 1 1 2
3 0 2 2 1
4 1 0 1 1
5 1 1 2 0
6 1 2 0 2
7 2 0 2 2
8 2 1 0 1
9 2 2 1 0
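For our test problem we can turn that into the following tests when we
map 0 ↦ on, 1 ↦ off and 2 ↦ disabled:
test f1        f2        f3        f4
1    on        on        on        on
2    on        off       off       disabled
3    on        disabled  disabled  off
4    off       on        off       off
5    off       off       disabled  on
6    off       disabled  on        disabled
7    disabled  on        disabled  disabled
8    disabled  off       on        off
9    disabled  disabled  off       on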
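The mapping step, and the claim that the array above has strength 2, can be checked mechanically. The following is a minimal sketch; the class and method names are hypothetical, and the array literal is copied from the OA(9, 3^4, 2) shown above.

```java
import java.util.HashSet;
import java.util.Set;

class OrthogonalArraySketch {
    // The OA(9, 3^4, 2) from the text; rows are runs, columns are factors f1..f4.
    static final int[][] OA = {
        {0, 0, 0, 0}, {0, 1, 1, 2}, {0, 2, 2, 1},
        {1, 0, 1, 1}, {1, 1, 2, 0}, {1, 2, 0, 2},
        {2, 0, 2, 2}, {2, 1, 0, 1}, {2, 2, 1, 0}
    };
    static final String[] LEVELS = {"on", "off", "disabled"};

    // Translates one run of the array into concrete button states.
    static String[] toTest(int[] run) {
        String[] test = new String[run.length];
        for (int f = 0; f < run.length; f++) {
            test[f] = LEVELS[run[f]];
        }
        return test;
    }

    // True if every pair of columns contains all levels * levels value pairs.
    static boolean coversAllPairs(int[][] array, int levels) {
        int factors = array[0].length;
        for (int c1 = 0; c1 < factors; c1++) {
            for (int c2 = c1 + 1; c2 < factors; c2++) {
                Set<Integer> seen = new HashSet<>();
                for (int[] run : array) {
                    seen.add(run[c1] * levels + run[c2]);
                }
                if (seen.size() != levels * levels) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int[] run : OA) {
            System.out.println(String.join(" ", toTest(run)));
        }
        System.out.println(coversAllPairs(OA, 3)); // true
    }
}
```

With 9 runs and 9 value pairs per pair of columns, every pair occurs exactly once here, which is what makes this array orthogonal rather than merely covering.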
The orthogonal arrays above are also known as fixed-level orthogonal
arrays, because we assume that all k factors draw their values from the
same set S. However, in practice this is often not the case. For this we
need mixed-level orthogonal arrays.
2 http://neilsloane.com/oadir/
3 http://neilsloane.com/oadir/oa.9.4.3.2.txt
A mixed-level orthogonal array is denoted by
MA(N, $s_1^{k_1} s_2^{k_2} \ldots s_m^{k_m}$, t), where:
• N is again the number of runs
• the number of columns (i.e. factors) is $\sum_{i=1}^{m} k_i$
• m is the number of different sets of levels $s_1 \ldots s_m$
• each of the $k_i$ factors in group i (1 ≤ i ≤ m) has $s_i$ levels (i.e. can
take $s_i$ different values)
• t is the strength
If we want to test this 2-way, we can find⁴ the following MA(18, $3^6 6^1$, 2):
We should map the values onto each column to obtain a 2-way test
suite. For example:
• operating systems: 0 ↦ MacOS, 1 ↦ Linux, 2 ↦ Windows
• server systems: 0 ↦ Windows, 1 ↦ Linux, 2 ↦ Unix
• user: 0 ↦ admin, 1 ↦ guest, 2 ↦ registered
• time: 0 ↦ day, 1 ↦ evening, 2 ↦ night
• battery: 0 ↦ low, 1 ↦ medium, 2 ↦ full
• connection: 0 ↦ Wi-Fi, 1 ↦ cable, 2 ↦ data
• browsers: 0 ↦ IE, 1 ↦ Firefox, 2 ↦ Mozilla, 3 ↦ Safari, 4 ↦ Chrome,
5 ↦ Opera
4 http://neilsloane.com/oadir/MA.18.3.6.6.1.txt
Covering arrays differ from orthogonal arrays in only one key respect.
While an orthogonal array OA(N, $|S|^k$, t) covers each possible t-tuple
exactly λ times in any N × t subarray, a covering array CA(N, $|S|^k$, t)
covers each possible t-tuple at least λ times in any N × t subarray. The
covering array relaxes the restriction that each combination is covered
exactly the same number of times. Thus covering arrays may result
in some test duplication, but they offer the advantage that they can be
computed for much larger problems than is possible for orthogonal arrays.
Mixed-level covering arrays are denoted by
MCA(N, $s_1^{k_1} s_2^{k_2} \ldots s_m^{k_m}$, t).
Again, as a tester you do not create covering arrays yourself. There are
tools (see Section 9.5) and also websites⁵,⁶ where they can be found.
Note that in the above examples, we just found an array that precisely
fits our needs in terms of number of factors and corresponding levels.
However, this is not always possible. If this happens we can simply
take the next larger array [36]. In [18] simple rules can be found.
Imagine we are looking for an array
M(C)A(N, $s_1^{k_1} s_2^{k_2} \ldots s_m^{k_m}$, t).
We can use another M(C)A(M, $r_1^{l_1} r_2^{l_2} \ldots r_p^{l_p}$, t) when:
1 the chosen array has at least as many factors (columns) as the problem
we are testing. That is: $\sum_{i=1}^{m} k_i \le \sum_{i=1}^{p} l_i$. If there
are too many columns we can just drop the ones we do not need, because
they map to factors that do not exist.
2 the chosen array has enough unique levels in its columns to hold all
the options for each factor. We can replace any unused number with a
valid option for the factor (this option is a DNC). Evidently, we
cannot just delete arbitrary rows in the chosen array: we can only
delete rows when they contain only DNC entries or t-tuples that are
covered in other rows.
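The two rules above can be sketched as a simple feasibility check. This is only an illustration under stated assumptions: the class and method names are hypothetical, both arrays are described just by the number of levels per column, the strength t is assumed to be the same for both, and the sketch does not perform the actual dropping of columns or replacing of unused levels.

```java
import java.util.Arrays;

class ArrayFitSketch {
    // Returns true if every factor of the problem can be assigned its own
    // column of the candidate array with at least as many levels.
    static boolean fits(int[] problemLevels, int[] arrayLevels) {
        if (problemLevels.length > arrayLevels.length) {
            return false; // rule 1: not enough columns
        }
        int[] need = problemLevels.clone();
        int[] have = arrayLevels.clone();
        Arrays.sort(need);
        Arrays.sort(have);
        // Rule 2: pair the largest factor with the largest column, and so on.
        for (int i = 1; i <= need.length; i++) {
            if (need[need.length - i] > have[have.length - i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] ma18 = {3, 3, 3, 3, 3, 3, 6}; // level counts of MA(18, 3^6 6^1, 2)
        System.out.println(fits(new int[]{3, 3, 2, 5}, ma18)); // true
        System.out.println(fits(new int[]{4, 4, 3}, ma18));    // false
    }
}
```

In the second call the array has only one column with more than 3 levels, so two factors with 4 options each cannot both be accommodated.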
5 https://math.nist.gov/coveringarrays/
6 http://www.public.asu.edu/~ccolbou/src/tabby/catable.html
7 https://math.nist.gov/coveringarrays/
The second issue is common to all of software testing and has been
discussed in Section 1.6.
We will take a brief look at tool support for combinatorial testing. There
are many tools; below we list three that are related to this course.
ACTS The authors of the paper from the reader promote a tool called
ACTS (Advanced Combinatorial Testing System). The tool can be obtained
by sending an e-mail to the first author, Rick Kuhn⁸. ACTS supports
t-way test set generation for 1 ≤ t ≤ 6.
In testing, an oracle is the mechanism you use to decide whether the test
case output is correct or not. Or as Section 5 of [69] defines it: a test
component that determines the expected result for each set of inputs.
8 [email protected]
9 http://www.satisfice.com/tools/pairs.zip
So, the oracles we have seen in this course were all crafted manually by
the tester. In Section 5 of [69], however, automated generation of oracles
based on models is mentioned. This means that we need to make mod-
els that automatically decide if observed program behaviour is correct
with respect to its specification. Evidently, this can get very expensive.
The better and more accurate the oracle, the closer the model will come
to yet another implementation of the SUT. And such a complicated au-
tomated oracle might need to be tested itself...
Crash testing is based on implicit oracles, i.e. oracles that rely on implied
information and assumptions about any type of software system. For
example, a system should not crash, hang or spit out ugly error mes-
sages. Note that these oracles are cheap because they hold for almost
all software systems. However, they can only give verdicts about crash-
related behaviour, nothing about the domain-specific functionality of
the SUT.
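An implicit oracle of this kind can be sketched in a few lines. The names below are hypothetical, an uncaught exception or error is taken as a stand-in for a crash, and Integer.parseInt merely plays the role of the SUT; detecting hangs would additionally need a timeout, which is left out here.

```java
import java.util.function.Function;

class ImplicitOracleSketch {
    // Implicit oracle: the verdict is "pass" as long as the call returns
    // normally; the returned value itself is never inspected.
    static <I, O> boolean survives(Function<I, O> sut, I input) {
        try {
            sut.apply(input);
            return true;
        } catch (RuntimeException | Error e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Function<String, Integer> parse = Integer::parseInt; // stand-in SUT
        System.out.println(survives(parse, "42"));  // true
        System.out.println(survives(parse, "4 2")); // false: NumberFormatException
    }
}
```

Note that the oracle accepts any returned value, which is exactly why it cannot judge domain-specific functionality.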
The human oracle [9] is the most expensive but also the most sophisti-
cated oracle. Humans can be guided by heuristics (e.g. blink oracles).
EXERCISE 9.3
The options mentioned for the different configurations of the telecom-
munications software are:
• call = {local, long distance, international}
• billing = {caller, phone card, 800}
• access = {ISDN, VOIP, PBX}
• server for billing = {Windows Server, Linux/MySQL, Oracle}
Design a 2-wise and a 3-wise test suite for this system.
Chapter contents 10
Mutation testing
Overview 161
1 Introduction 161
2 How tests find faults 162
3 Fault-based testing 164
4 Mutation testing 165
4.1 Central hypotheses 168
4.2 Equivalent mutants 169
5 Scalability 171
5.1 Skipping unreachable mutants 171
5.2 Mutant schemata 171
5.3 Mutant sampling 173
6 Mutation testing in practice 173
6.1 Mutation testing at Google 174
6.2 Mutation testing for fun 174
7 Summarising the mutation testing method 175
Chapter 10
Mutation testing
OVERVIEW
LEARNING GOALS
After studying this chapter, you are expected to:
– be able to explain the necessary conditions for a test to reveal a
fault
– be able to create artificial faults using mutation testing
– be able to distinguish equivalent from non-equivalent mutants, us-
ing tests.
CONTENTS
10.1 Introduction
@Test
public void testSumWithAssertions(){
Calculator c = new Calculator();
int result = c.sum(1, 1);
int expected = 2;
assertEquals(expected, result);
}
Both tests execute the same code in the Calculator class, and thus
achieve identical coverage. Yet, testSum is not as effective as
testSumWithAssertions, since it does not actually check the result
of the invocation of the method under test. Thus, if the sum is wrong,
testSum will not fail. Code coverage is oblivious to this problem. In
this chapter, we will take a look at mutation testing, which is a tech-
nique that helps us to overcome this limitation of code coverage. As
we will see in this chapter, mutation testing would immediately point
out that the test case testSumWithAssertions is clearly better than
testSum. To understand why, let us start by taking a closer look at
what is required for a fault to be detected.
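The difference between the two tests can be sketched without a test framework. Everything below is an assumption made for illustration: the body of testSum is reconstructed from its description (it calls the method but checks nothing), faultySum is a hypothetical broken stand-in for Calculator.sum, and each test returns true when it would pass.

```java
class CoverageVersusAssertionsSketch {
    // Deliberately faulty stand-in for Calculator.sum: it subtracts.
    static int faultySum(int a, int b) {
        return a - b;
    }

    // Mirrors testSum: executes the method but checks nothing.
    static boolean testSum() {
        faultySum(1, 1);
        return true; // nothing can make this test fail
    }

    // Mirrors testSumWithAssertions: compares the result with the expectation.
    static boolean testSumWithAssertions() {
        return faultySum(1, 1) == 2;
    }

    public static void main(String[] args) {
        System.out.println(testSum());               // true: the fault goes unnoticed
        System.out.println(testSumWithAssertions()); // false: the fault is detected
    }
}
```

Both methods execute exactly the same line of faultySum, so a coverage tool would report the same figure for each.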
The aim of testing is to find faults. Let us revisit the example function
counting the number of zeros occurring in an array of integers (Chap-
ter 1):
public static int numZero (int[] x) {
int count = 0;
for (int i = 1; i < x.length; i++) {
if (x[i] == 0) count++;
}
return count;
}
Let us assume we have the following test case for this function:
@Test
public void testZeros() {
int[] values = {1,0,2};
int numZeros = numZero(values);
assertEquals(1, numZeros);
}
This test will pass: our implementation returns a count of 1, which is the
correct expected value. Yet the implementation is faulty: the for-loop
starts at index 1 instead of 0, so the first element of the array is never
inspected. Clearly our test is not that good if it misses such a blatant
bug! However, if you did not already know that there was a fault in the
for-loop, how would you know whether to trust the test or not? The usual
answer to this would be code coverage: our example test input has three
elements, so it enters the for-loop; it also contains both zero-valued and
non-zero elements, so the if-condition evaluates to true and false in the
test execution. Thus, our example test achieves not only 100% statement
coverage, but also 100% branch coverage. And yet, it does not spot the
bug. Why not?
The underlying reason is that there are four prerequisites for a fault
to manifest itself as a test failure: reachability, infection, propagation and
expectation.
Reachability
According to reachability, faulty code can be detected only if it is
executed. Reachability is what code coverage measures. Since our
testZeros test achieves 100% code coverage, we know that if there is a
fault, then that fault will be reached. However, reaching and executing
a fault does not guarantee that the fault also manifests in some change
of the execution state.
Infection
The execution state of a program is defined by its variables; for example,
the execution state of numZero is defined by the variables count and
i. If the fault does not cause any of the variables to take on a different
value than it would have in the correct program, then there is no way
to detect the presence of a fault. Whether or not the fault causes a state
change depends on the test input (i.e., the array of integers x in our
example). For some inputs, the state is not affected (i.e., does not differ
from the state the correct program would be in), for others it may be
affected. If the state is affected, then we say that the fault infects the
program state, and the test fulfils the second prerequisite for detecting
faults. Consider our example test case – does it infect the state? It does,
since the correct program would initialise i with the value 0 on the
first execution, while our faulty program sets i to 1 initially. So our test
reaches the fault, the execution infects the state, but the test nevertheless
fails to detect that the program state has changed.
Propagation
An infected state differs in some of its variable values from the expected
state. Not all the variables are observable by a test, as some variables
are internal to the code under test. In our example, the variable i is a
local variable inside the function with a scope that ends when the for-
loop ends. Thus, regardless of how wrong the value of i is, a test case
can never check that value (a developer instead would be able to look
inside the execution of the for-loop using a debugger.) Since we cannot
directly observe the value of i in our test, the only way we have to
detect that the program reaches an infected state is to check if the fault
somehow affects other variables that we can observe, such as variable
count in our example. Throughout the course of execution an infected
state is likely to affect other variables as more computations and value
assignments are performed, and infect their state too. Sooner or later a
wrong internal variable value may therefore propagate to an observable
variable. This propagation of the infected program state is the third
prerequisite for detecting a fault. In our example, this prerequisite is
not satisfied, since the value of count is not wrong at any point of the
test execution and, therefore, the test cannot detect the fault.
Expectation
A failure is the manifestation of an infected state that is propagated to
an observable difference. If the failure is a program crash, then that is
easy to spot. However, if the failure consists of a wrong value, then the
test needs to compare the observed output with the expected output.
This expectation is the final prerequisite for the test to be able to mani-
fest the fault as a failure. In principle, our example test case satisfies this
part, since it has an assertion that compares the return value with the
expected value. However, since the state infection does not propagate
in this test, the assertion holds and the test passes.
Now that we know all the prerequisites for finding the fault in our ex-
ample function, it is easy to create a test that satisfies all prerequisites.
The test needs:
1 to use a non-empty array to satisfy reachability, i.e., to execute the
wrong initialisation of i;
2 to infect the state (which is given by the wrong assignment);
3 to propagate the state infection, which requires the value 0 at the
first position in the array;
4 to check the propagated value of count in an assertion.
Below you can find a test that satisfies all of these prerequisites.
@Test
public void testZeros() {
int[] values = {0,1,2};
int numZeros = numZero(values);
assertEquals(1, numZeros);
}
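To see propagation at work, the faulty function can be run side by side with a corrected one on both test inputs. This is a minimal sketch: numZeroCorrect is our assumption of what the intended implementation looks like (the loop starting at index 0), and the class name is hypothetical.

```java
class NumZeroSketch {
    // Faulty version from the text: the loop starts at index 1.
    static int numZeroFaulty(int[] x) {
        int count = 0;
        for (int i = 1; i < x.length; i++) {
            if (x[i] == 0) count++;
        }
        return count;
    }

    // Assumed correct version: the loop starts at index 0.
    static int numZeroCorrect(int[] x) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] noPropagation = {1, 0, 2};
        int[] propagation = {0, 1, 2};
        // Same result: the infected value of i never reaches count.
        System.out.println(numZeroFaulty(noPropagation) + " vs " + numZeroCorrect(noPropagation)); // 1 vs 1
        // Different result: the zero at index 0 is skipped by the faulty loop.
        System.out.println(numZeroFaulty(propagation) + " vs " + numZeroCorrect(propagation)); // 0 vs 1
    }
}
```

Only the second input turns the wrong start index into an observable difference in the return value, which an assertion on count can then catch.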
EXERCISE 10.1
Consider the following function, which returns the number of occur-
rences of y in the array x. If x is null, then the function throws a
NullPointerException.
public int countNum(int[] x, int y) {
int num = 0;
for (int i=x.length-1; i>0; i--) {
if (x[i] == y) {
num++;
}
}
return num;
}
We may not know where the faults in our program are, but we do know
which faults we have made in the past. So, the idea of fault-based testing
is to build on this experience and check if our tests would be able to find
bugs that are similar to the ones we have made in the past. The basic
scenario is this: given a program under test and a test suite where all
the tests pass (usually called a green test suite), would this test suite be
able to find past faults? To find out, we artificially create faulty versions
of the program that represent our past bugs. Then we run the tests on
these faulty versions and observe whether the tests fail or not. If a test
fails on a program with an artificial fault, then that means that the test
satisfies all four conditions necessary to reveal that type of fault. If an
artificial fault does not make any of the tests in our test suite fail, then
at least one of the conditions is not satisfied for that fault, and it shows
that our test suite needs improvement. After all, if the test suite cannot
detect the artificial fault, it is also unlikely to detect whether we have
made a similar bug in the current program.
Thus, fault-based testing has two applications. On the one hand, it can
be used to assess whether a test suite is good enough to detect certain
types of faults; if it is, then we can be more confident that this type
of bug does not exist in our program. On the other hand, fault-based
testing guides us in improving a test suite, since every inserted fault
that is not detected reveals a deficiency in our test suite. As we have
complete knowledge of the inserted faults, we can create new test cases
which fulfil reachability, infection and propagation, as well as declaring
proper assertions as we have done in the previous section.
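The basic scenario can be sketched as a small loop: run every test on every faulty version and report the faults that no test detects. All names here are hypothetical, the program under test is reduced to a single integer operation, and a test is modelled as a predicate that returns true when it passes on a given implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

class FaultBasedTestingSketch {
    // Returns the names of the faulty versions that no test in the suite detects.
    static List<String> undetected(Map<String, IntBinaryOperator> faultyVersions,
                                   List<Predicate<IntBinaryOperator>> tests) {
        List<String> survivors = new ArrayList<>();
        for (Map.Entry<String, IntBinaryOperator> version : faultyVersions.entrySet()) {
            boolean detected = false;
            for (Predicate<IntBinaryOperator> test : tests) {
                if (!test.test(version.getValue())) { // a failing test reveals the fault
                    detected = true;
                    break;
                }
            }
            if (!detected) {
                survivors.add(version.getKey());
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        Map<String, IntBinaryOperator> faulty = new LinkedHashMap<>();
        faulty.put("minus instead of plus", (a, b) -> a - b);
        faulty.put("times instead of plus", (a, b) -> a * b);

        // Each test returns true when it passes on the given implementation.
        List<Predicate<IntBinaryOperator>> suite = new ArrayList<>();
        suite.add(sum -> sum.applyAsInt(2, 2) == 4);

        System.out.println(undetected(faulty, suite)); // [times instead of plus]
    }
}
```

The surviving fault points at a concrete improvement: a test with inputs such as 2 and 3, for which addition and multiplication give different results.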
EXERCISE 10.2
Given the following code, which sorts an array of integers using the
widely known bubble sort algorithm, and a test which tests it, create a
mutant which survives the test, and a mutant which is killed by it.
public class Sorter {
    public static void bubbleSort(int[] array) {
        boolean swapped = true;
        int j = 0;
        int tmp;
        while (swapped) {
            swapped = false;
            j++;
            for (int i = 0; i < array.length - j; i++) {
                if (array[i] > array[i + 1]) {
                    tmp = array[i];
                    array[i] = array[i + 1];
                    array[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }
}
@Test
public void testBubbleSort() {
    /* setup */
    int[] inputArray = new int[]{1, 3, 2};
    /* exercise: bubbleSort returns void and sorts inputArray in place */
    Sorter.bubbleSort(inputArray);
    /* assert */
    int[] expectedArray = new int[]{1, 2, 3};
    Assert.assertArrayEquals(expectedArray, inputArray);
}
A mutant differs from the original program in exactly one very sim-
ple and very small syntactic change. If you look closely, all the muta-
tion operators we listed produce mutants that are really more like sim-
ple programming glitches, where the programmer hit the wrong key,
or mixed up two different variables. Real faults are not always that
simple: real faults can be the result of substantially misunderstanding
the software requirements, or of fundamentally breaking an algorithm.
So how can examining these simple mutants be sufficient? Since real
faults may be arbitrarily large, a tempting and obvious thought would be
to produce more complex artificial faults by combining multiple mutations
into larger mutants. We call such a combination of multiple mutations a
higher order mutant. However, it is generally sufficient to look at our
simple mutants, if we make two fundamental assumptions.
Consider the following code example, which sums the first four ele-
ments of an array:
public static int sumFirstFour(int[] b) {
int sum = 0;
for (int i = 0; i < b.length; i++) {
sum = sum + b[i];
if (i == 3) {
break;
}
}
return sum;
}
Now let us apply the ROR mutation operator on the if-statement, which
replaces if (i == 3) with if (i >= 3):
public static int sumFirstFour(int[] b) {
int sum = 0;
for (int i = 0; i < b.length; i++) {
sum = sum + b[i];
if (i >= 3) {
break;
}
}
return sum;
}
Any test with a non-empty array will reach the mutant. However, no
matter what test we use, we cannot get the mutant to infect the state:
for the mutant to produce a different execution state than the original
program, the variable i would need to be larger than 3. However, as
soon as i equals 3 the program exits the loop and so the if-condition can
never be evaluated for values of i larger than 3. The problem is that,
even though the mutant and the original program differ syntactically,
they do not differ semantically. In mutation testing, this type of mutant
is known as an equivalent mutant.
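The argument can be backed up empirically by running both versions on many random inputs. The sketch below uses hypothetical names and an arbitrary fixed seed. Agreement on random inputs is only evidence, not a proof of equivalence; here the proof is the reasoning above about the values i can take.

```java
import java.util.Random;

class EquivalentMutantSketch {
    static int original(int[] b) {
        int sum = 0;
        for (int i = 0; i < b.length; i++) {
            sum = sum + b[i];
            if (i == 3) break;
        }
        return sum;
    }

    static int mutant(int[] b) {
        int sum = 0;
        for (int i = 0; i < b.length; i++) {
            sum = sum + b[i];
            if (i >= 3) break; // ROR mutation
        }
        return sum;
    }

    // Counts the random inputs on which mutant and original disagree.
    static int disagreements(int trials, long seed) {
        Random random = new Random(seed);
        int count = 0;
        for (int t = 0; t < trials; t++) {
            // Arrays of length 0..9 with values in [-100, 100).
            int[] input = random.ints(random.nextInt(10), -100, 100).toArray();
            if (original(input) != mutant(input)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(disagreements(10_000, 42L)); // 0
    }
}
```

Because no input can distinguish the two versions, no test can ever kill this mutant, however good the test suite is.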
EXERCISE 10.3
Given the following program, which finds the maximum value in the
int array values, and some of its mutants, decide for each mutant if
it is equivalent or not. In the latter case, prove that the mutant is non-
equivalent by showing a test which kills the mutant.
public static int max(int[] values) {
int r, i;
r = 0;
for (i = 1; i<values.length; i++) {
if (values[i] > values[r])
r = i;
}
return values[r];
}
Mutant 1:
public static int max(int[] values) {
int r, i;
r = 0;
for(i = 0; i<values.length; i++) {
if (values[i] > values[r])
r = i;
}
return values[r];
}
Mutant 2:
public static int max(int[] values) {
int r, i;
r = 0;
for(i = 1; i<=values.length; i++) {
if (values[i] > values[r])
r = i;
}
return values[r];
}
Mutant 3:
public static int max(int[] values) {
int r, i;
r = 0;
for(i = 1; i<values.length; i++) {
if (values[i] >= values[r])
r = i;
}
return values[r];
}
Mutant 4:
public static int max(int[] values) {
int r, i;
r = 1;
for(i = 1; i<values.length; i++) {
if (values[i] > values[r])
r = i;
}
return values[r];
}
Mutant 5:
public static int max(int[] values) {
    int r, i;
    r = 0;
    for (i = 1; i < values.length; i++) {
        if (values[r] > values[r])
            r = i;
    }
    return values[r];
}
10.5 Scalability
A primary factor in the high cost of mutation testing is the large number
of test executions it potentially requires. However, many test executions can
be avoided: first, and rather obviously, once a mutant has been killed,
no further tests need to be executed on that mutant. However, there is a
further simple yet effective optimisation: recall that tests need to reach
the mutant, infect the state, and propagate the state infection. Whether
or not a mutant is reached is something we can know without actually
executing the test on the mutant. It is sufficient to know whether the
mutated location in the source code is covered by the tests; if a mutant
is not covered, then by definition it cannot be killed, and we do not need
to execute any tests. Typically, all tests are executed on the non-mutated
version of the program as a first step in the mutation analysis (to check
whether all tests pass), and coverage information can be generated dur-
ing this initial execution.
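The coverage-based filter can be sketched as follows. The data layout (a map from test name to covered line numbers and a map from mutant name to mutated line number), the class name and the example data are assumptions made for this illustration, not the format of any particular tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class CoverageFilter {
    // For each mutant, select only the tests that cover the mutated line.
    // A mutant with an empty list is not reached by any test and needs no execution.
    static Map<String, List<String>> testsToRun(
            Map<String, Set<Integer>> coverage, Map<String, Integer> mutantLine) {
        Map<String, List<String>> plan = new TreeMap<>();
        for (Map.Entry<String, Integer> m : mutantLine.entrySet()) {
            List<String> tests = new ArrayList<>();
            for (Map.Entry<String, Set<Integer>> t : coverage.entrySet()) {
                if (t.getValue().contains(m.getValue())) {
                    tests.add(t.getKey());
                }
            }
            plan.put(m.getKey(), tests);
        }
        return plan;
    }

    // Hypothetical data: two tests, three mutants.
    static Map<String, List<String>> example() {
        Map<String, Set<Integer>> coverage = new TreeMap<>();
        coverage.put("testA", Set.of(1, 2, 3));
        coverage.put("testB", Set.of(1, 2, 5));
        Map<String, Integer> mutantLine = new TreeMap<>();
        mutantLine.put("m1", 3);
        mutantLine.put("m2", 5);
        mutantLine.put("m3", 7);
        return testsToRun(coverage, mutantLine);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In this made-up example, running every test on every mutant would take six executions, whereas the filtered plan needs only two, and mutant m3 needs none because no test covers line 7.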
EXERCISE 10.4
Why is such a simple solution effective in practice?
The mutation now simply replaces the + operator with the aor meta-
operator:
public static int sum(int x, int y) {
    return aor(x, y, 1);
}
This way, the compilation process only needs to be done once and the
overhead of repeatedly restarting virtual machines and other execution
infrastructure is removed. For each mutant, the same program is exe-
cuted, and simply setting activeMutation to the appropriate value
will activate the corresponding mutant.
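A minimal sketch of how such a meta-operator might look is given below. It assumes that activeMutation is a static field, that the value 0 selects the original program, and that the third argument of aor is the identifier of the first mutant at that location; these conventions are assumptions made for this illustration.

```java
class MutantSchema {
    // 0 runs the original program; any other value activates one mutant (assumed convention).
    static int activeMutation = 0;

    // Meta-operator for the arithmetic operator replacement (AOR) of a + expression.
    // firstId is the identifier of the first mutant at this location.
    static int aor(int x, int y, int firstId) {
        if (activeMutation == firstId) {
            return x - y;
        }
        if (activeMutation == firstId + 1) {
            return x * y;
        }
        if (activeMutation == firstId + 2) {
            return x / y; // throws ArithmeticException when y == 0
        }
        if (activeMutation == firstId + 3) {
            return x % y; // throws ArithmeticException when y == 0
        }
        return x + y; // original behaviour
    }

    public static int sum(int x, int y) {
        return aor(x, y, 1);
    }
}
```

With this sketch, a mutation analysis can loop over the mutant identifiers, assign each to activeMutation and call the same compiled sum method.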
EXERCISE 10.5
With mutant schemata the compilation process is done just once, but
what about test execution? Does test execution need to be repeated or
do we need to run each test only once?
EXERCISE 10.6
Given the following code snippet, create code for a mutant schema us-
ing the ROR mutation operator (assume that a getVariant function
exists).
public static boolean isAplusBgreaterThanC(int a, int b, int c) {
    return (a + b > c);
}
In the past, several research tools that implement mutation testing were
proposed. For example, for the Java programming language, notewor-
thy mutation testing tools are µJava [72], Javalanche [106], and Ma-
jor [58].
Usually, research prototypes have limitations and are only meant to il-
lustrate or evaluate the application of advanced mutation testing tech-
niques. For Java, a well-known mutation testing framework commonly
applied in practice and actively maintained is PIT (on the course site
there is a small laboratory exercise for using PIT).
tests using a code editor. The goal of the attackers is to introduce arti-
ficial bugs into the classes under test that reveal weaknesses in the test
suites, hence they gain points if their mutants survive the tests. Con-
versely, defenders aim to improve the test suites by adding tests which
deflect the attacks, hence they gain points by killing mutants.
A test can only reveal a fault if (1) the test reaches the fault in the
code, (2) the execution of the fault infects the execution state, (3) the
infected execution state propagates to an observable output, and (4) the
test checks this output against an expected value. While code coverage
focuses on reachability, mutation testing aims to optimise tests for
all of these conditions. To achieve this, artificial faults (mutants) are
seeded in the program, and the test suite is evaluated based on how
many of these mutants it can distinguish. Mutants that are not detected
point out where tests are missing in the test suite. In contrast to basic
coverage analysis, mutation testing is computationally expensive, for
example because the number of mutants that can be generated for any
non-trivial program tends to be huge. In this chapter, we looked at var-
ious optimisations to overcome these issues.
1 https://github.com/CodeDefenders/CodeDefenders
Chapter contents 11
Classification trees
Overview
1 Introduction
2 Classification trees
3 Modelling with a classification tree
4 Test relevant aspects
5 An example: The Find command
6 Combinatorial coverage criteria
7 Designing the test cases
8 Summarising the classification tree method
9 Tool support
10 More examples
10.1 The flexible manufacturing system
10.2 The audio amplifier
10.3 The password diagnoser
11 Exercise
Chapter 11
Classification trees
This chapter has been written together with Eduardo Miranda based on
his material in [70].
OVERVIEW
Note that classification trees are not something totally new and differ-
ent from what we have seen in the previous chapters. They provide just
yet another way of making a test model, with its own advantages and
disadvantages, as we will describe in this chapter. Neither are we say-
ing that classification trees are the thing to use rather than other more
informal tree structures. Not at all! A mind map can be a very good
technique to clarify your testing strategy. It can even be a first draft for
a more formal classification tree if you want to make one.
LEARNING GOALS
After studying this chapter, you are expected to:
– understand the concepts of classification trees
– be able to create classification trees as test models
– apply t-wise coverage criteria to generate test cases from these
models.
CONTENTS
11.1 Introduction
Once we have the tree, test cases are composed by combining leaf classes
(using a t-wise combinatorial coverage criterion from Chapter 9).
The test cases are composed by combining the leaf classes. Recall from
Chapter 6 that we cannot combine invalid and valid classes. We need to
apply different combination coverage criteria for them. We should pick
Each Choice Coverage (ECC) for the invalid classes:
For the valid classes, we can choose All Combinations Coverage (ACC),
giving the same results as before:
A test relevant aspect is any factor the tester thinks can have implica-
tions on the outcome of a test run. A non-exclusive list includes:
• The morphology of the inputs. That means any characteristic of the
input variables, other than their apparent values, that might lead to
the execution of an unspecified path and result in failure. Examples are
the possible attributes of a data type. When considering data types
like lists, arrays, sets, sequences, strings, et cetera, think of:
– the cardinality (e.g. the minimum and maximum permitted length
of a password);
– whether its elements are ordered or not;
– the position of a particular character within a sequence;
– multiple occurrences of a value in a sequence.
• The state of the SUT itself. The state of the SUT refers to the internal in-
formation a SUT needs to maintain between successive executions so
that it performs as designed. A SUT that does not need information
from its first execution to respond correctly to the second is called
stateless. This means that after the SUT is executed, all local vari-
ables and objects that were created or assigned a value as part of the
execution are discarded. A stateful SUT, however, remembers infor-
mation from execution to execution. Consequently, when presented
with the same inputs it might produce a different result depending
on whether it is the first, the second or any subsequent invocation.
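The difference between a stateless and a stateful SUT can be illustrated with a small sketch. Both the add function and the LoginGuard class, including its PIN and its limit of three failed attempts, are hypothetical examples invented for this illustration.

```java
class StateExample {
    // Stateless: the result depends only on the arguments of this call.
    static int add(int x, int y) {
        return x + y;
    }

    // Stateful: remembers the number of failed attempts between calls.
    static class LoginGuard {
        private int failedAttempts = 0;

        // Returns true if the PIN is accepted.
        // After three failed attempts even the correct PIN is rejected.
        boolean tryPin(String pin) {
            if (failedAttempts >= 3) {
                return false;
            }
            if (pin.equals("1234")) {
                failedAttempts = 0;
                return true;
            }
            failedAttempts++;
            return false;
        }
    }
}
```

The same input "1234" is accepted by a fresh LoginGuard but rejected after three wrong attempts, so the invocation history is a test relevant aspect here, whereas for add it is not.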
EXERCISE 11.1
For each of the test relevant aspects mentioned in this section, find one
or two examples/exercises/SUTs from earlier in this course for which
that aspect is indeed relevant.
Imagine we need to test the Find command from a text editing program.
When invoking this command while editing some text, we will get a
dialogue window as depicted in Figure 11.2.
This dialogue enables us to specify the string we want to find in our text
together with some options.
EXERCISE 11.2
In Figure 11.2 you can see the dialogue window that starts up when
executing the Find command. Make a list or mind map of the test rel-
evant aspects concentrating on the finding functionality (i.e. ignoring
the ? (help), Cancel and X (close) buttons). Then look at the example CT
in Figure 11.3 and try to understand the different parts and how they
correspond to your test relevant aspects.
The root of the tree in Figure 11.3 defines the scope of the test design. By
stating the scope as FindCommand we have purposefully excluded from
the scope other commands embedded in the dialogue window, such as
? (help), Cancel and X (close). This way our tree stays manageable.
The second level nodes are compositions and highlight the three top-
level test relevant aspects, i.e. test parameters:
1 the command options (node Options);
2 the search string we input in the text field to the right of Find what:
(node FindWhat);
3 the environment in which the command is to execute, i.e. the text in
which the string has to be found (node SearchedText).
FindWhat models the string searched for in the text. Instead of enumerat-
ing actual values for this input field, the test parameter is decomposed
into three lower level test parameters, corresponding to characteristics
(morphology) of the search string the test designer judged important to
test for (i.e. the length, the number of spaces and the presence of capital
letters). Note that these aspects might or might not have been described
in the specification document, but in the experience of the test designer,
they are aspects relevant to include in the test, for example because
he or she has seen developers using spaces as delimiters and programs
crashing when presented with an empty string.
Another way of writing these test cases could be the following combi-
nation table:
Test Case#   #hammers (h)     #screwdrivers (s)   5h+10s
1            h<0              0<= s <= 30         ?
2            not an integer   0<= s <= 30         ?
3            h>=0             s<0                 ?
4            h>=0             not an integer      ?
5            h>=0             0<= s <= 30         in [0, 200[
6            h>=0             0<= s <= 30         in [200, 1000[
7            h>=0             0<= s <= 30         in [1000, ->[
8            h>=0             s>30                in [200, 1000[
9            h>=0             s>30                in [1000, ->[
These are all different ways of representing the same thing. Note that the
combination table only describes which classes should be combined according
to the chosen coverage criterion, hence the name. There are no specific
values for the test cases yet. This will be discussed in the next section.
This is the combination table for the Find command, picking the 2-wise
coverage criterion, that we saw when doing the Problems exercise on
YouLearn related to Chapter 9:
2 www.testona.net
As you can see from the example above, the test parameter values of
classifications (i.e. the leaves of a CT) are given in abstract form: they are
abstract test cases. To make them executable they need to be concretised
into concrete test cases like those in the tables on page 180 where you
can see explicit values for the input variables. This is also known as
sensitisation [13], finding input values that will cause a selected test case
to be executed.
For the hardware store example this process is simple; we just pick val-
ues that satisfy the equivalence classes. However, for the Find com-
mand dialogue window, it is less trivial. For example, to be executable,
the parameter value ExistMoreThanOnce would need to be transformed
into an actual sequence of strings of which at least two must be identical
to the one used as a value for the FindWhat test parameter.
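For the hardware store example, this concretisation can be sketched as a check that chosen values fall into the classes of an abstract test case. The sketch covers only abstract test cases 5 to 9 of the combination table with columns #hammers (h), #screwdrivers (s) and 5h+10s; the class name, the method names and the concrete values in the usage note are assumptions made for this illustration.

```java
class HardwareStoreSensitisation {
    // The expression in the third column of the combination table.
    static int total(int h, int s) {
        return 5 * h + 10 * s;
    }

    // Checks whether concrete values (h, s) satisfy all classes
    // of abstract test case 5, 6, 7, 8 or 9.
    static boolean fits(int testCase, int h, int s) {
        int t = total(h, s);
        boolean sInRange = 0 <= s && s <= 30;
        switch (testCase) {
            case 5: return h >= 0 && sInRange && 0 <= t && t < 200;
            case 6: return h >= 0 && sInRange && 200 <= t && t < 1000;
            case 7: return h >= 0 && sInRange && t >= 1000;
            case 8: return h >= 0 && s > 30 && 200 <= t && t < 1000;
            case 9: return h >= 0 && s > 30 && t >= 1000;
            default:
                throw new IllegalArgumentException("only test cases 5 to 9 are modelled");
        }
    }
}
```

Possible concrete values are, for instance, (h, s) = (10, 5) for test case 5, (20, 20) for test case 6, (200, 10) for test case 7, (0, 31) for test case 8 and (200, 31) for test case 9.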
The CTM partitions test relevant aspects into equivalence classes. Talk-
ing in general terms about test relevant aspects encourages testers to
think beyond the input domain. The graphical representation as trees
is descriptive and easy to learn. Trees are also more scalable than tables
and hence allow for modelling bigger problems. Moreover, linking the
trees to the combination tables makes test coverage visible.
As with all the test models we have seen in previous chapters, the re-
sults of CTM models are not unique. As the creators of the method ex-
plain, the CTM provides a structured and systematised approach to test
case design, making the test cases understandable and documentable.
However, the results will be highly dependent on the tester’s experi-
ence, knowledge and creativity. We will discuss the advantages and
disadvantages of alternative partitions as we go through the examples
in subsequent sections.
The tool that supports the CTM is TESTONA³. This is a commercial tool
offered by a German company. For this course, students can
obtain a free license to draw classification trees and create combination
tables. On the course site you can find details on how to obtain the
license.
We illustrate the use of the CTM by way of a series of examples. The first
one describes the flexible manufacturing system, which is an extension
of the machine vision system used by Grochtmann in his presentation
at the STAR conference [52]. Its purpose is to explain the main mod-
elling concepts and how to deal with valid and invalid values to pre-
vent the introduction of ineffectual combinations in a test suite. The
second example corresponds to a sound amplifier and is used to illus-
trate the handling of ineffectual combinations. The third is a password
diagnoser which addresses the problems of missing and infeasible com-
binations.
3 http://www.testona.net/
The first step in the process is to select the scope of testing. Are we
going to test the entire system, the machine vision subsystem or the
robotic arm subsystem? If we choose the vision subsystem, our concern
will be that the system is able to correctly classify the sheets and issue
commands. If we choose the robotic arm portion, the starting point will
be the reception of a well-formed command, the ability of the system
to interpret and execute the command. If we choose the entire system
it will be a combination of the two previous scenarios plus some other
cases such as powering-up, shutting down, emergency stop and so on.
For the sake of this example we have selected the machine vision sub-
system as the SUT.
Once this has been decided, the test designer might start by considering
the figures to be recognised as a test parameter. This is an obvious choice,
since the purpose of the subsystem is to classify them according to their
shape, colour and size. He or she might also consider environmental
conditions such as the room illumination, the conveyor background and the
speed at which the figures move in front of the camera to be relevant
test parameters, either because a specification called for it or because
the test designer suspects the performance of the subsystem might be
affected by them. This first cut at the problem is documented by the
classification tree in Figure 11.5. We can make the following remarks
about this CT:
• Because each figure is defined by three non-exclusive attributes, namely
shape, colour and size, we model Figure as a composition node. Each
of the attributes is necessary to define the figure.
• Contrast this with the room Lighting, which is modelled as a classifica-
tion node. If the Lighting is Dim, whatever the definition of Dim is, it
cannot be Bright. Classifications define exclusive choices.
• The Conveyor aspect is also modelled as a composition since its effect
on the correctness of the classification depends on two coexisting at-
tributes: the conveyor’s Speed and its Background.
FIGURE 11.5 First cut decomposition for the machine vision subsystem
Now that we have identified the first test parameters, we need to de-
cide whether to assign terminal values to them or to further refine or
decompose the parameters into sub-parameters. Notice that given the
hierarchical approach taken, sub-parameters might be employed as pa-
rameters in their own right or as values of the parent parameter. In
general, a test parameter or a value will be decomposed when the en-
tity it represents:
• is made up of sub-entities that might affect the output of a test; or
For example the node element Shape might take the values: Triangle,
Circle and Square. Since a shape can only have one of these values at
a time (i.e. it cannot be a circle and a square at the same time), we can
model Shape as a classification of classes Triangle, Circle and Square.
At this point, the test designer might ask himself/herself about differ-
ent types of triangles. Is the machine vision system capable of recognis-
ing equilateral, scalene and isosceles triangles? This might be specified
or not, but it is a legitimate question. If the specification called for the
recognition of equilateral triangles, then the other two types must not
be classified as triangles, so we need to test that this is indeed the case.
However, if the specification assumed all kinds of triangles, then the
tester must ensure that the system works not only with equilateral
triangles but with the other types of triangles as well. Consequently, the
kind of triangle is a test relevant aspect that applies to the value Triangle, which
now becomes a sub-test parameter. This can be modelled as shown by
the tree below:
Before continuing with the refinement of the model, this is a good point
to illustrate the relationship between the model constructed so far and
combinatorial coverage criteria for generating test cases (see Chapter 9).
The obvious choice would be to consider Shape, Color, Size, Lighting, Speed
This gives the following test cases for the first part:
And for the second part the following remaining test cases:
This can be a very important decision to reduce the number of test cases,
since we know that, for a given test strength, the number of test cases
generated grows with the number of test parameters.
Up until now, we have only considered valid values for the different test
parameters, but of course invalid values have to be taken into account
too. For example, what happens if the figure does not have one of the
defined shapes or colours? Does the system stop? Does it assign the
same classification as to the last one processed? Does it classify it as
unrecognisable and send it to a trash bin?
The challenge with negative testing in the CTM is that the inclusion of
an invalid value in a combined test case might result in the premature
termination of the processing – for example with the display of an error
message – and the discarding of valuable combinations of valid values
included in the combined test case.
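The masking effect can be illustrated with a small sketch. The classify function, its validation order, the colour names and the error messages are hypothetical; only the shape names are taken from the example.

```java
import java.util.List;

class MaskingDemo {
    static final List<String> SHAPES = List.of("Triangle", "Circle", "Square");
    static final List<String> COLOURS = List.of("Red", "Green", "Blue"); // assumed values

    // Validates the shape first and stops at the first invalid value it finds.
    static String classify(String shape, String colour) {
        if (!SHAPES.contains(shape)) {
            return "error: unknown shape";
        }
        if (!COLOURS.contains(colour)) {
            return "error: unknown colour";
        }
        return colour + " " + shape;
    }
}
```

A test case that combines the invalid shape Hexagon with the invalid colour Purple only exercises the shape check, because processing stops there. The colour check is reached only when the invalid colour is combined with a valid shape.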
Of course, we choose ECC since every test case should contain only
one invalid value, as we explained in Section 6.6.
• all t-way combinations for all valid values like we have seen before.
The amplifier has two input jacks, two volume controls, two toggle
switches and one three-way selection switch. The main purpose of this
example is to continue the discussion about ineffectual combinations
briefly addressed in the previous example when discussing how to deal
with invalid values. Here we are specifically addressing the problem of
ineffectual combinations caused by dependent parameters.
While there is nothing wrong with the tree as drawn, if we were to me-
chanically map its nodes onto the parameters and values of a combina-
tion table, the resulting test suite would provide a coverage lower than
the table’s nominal strength. A look at the following table will show
how this is possible.
Test Case# J1 J2 Control1 Control2 RMS OnStdby PowerOffTweed Comment
1 Unplugged Unplugged 0 0 60Position Stdby Off Ineffectual
2 Plugged Plugged 0 1 100Position On Power
3 Unplugged Plugged 0 5 60Position On Tweed
4 Unplugged Unplugged 0 9 100Position Stdby Power Ineffectual
5 Plugged Unplugged 0 10 60Position Stdby Tweed Ineffectual
6 Plugged Plugged 1 0 100Position Stdby Tweed Ineffectual
7 Unplugged Unplugged 1 1 60Position On Off Ineffectual
8 Plugged Unplugged 1 5 60Position Stdby Power Ineffectual
9 Plugged Plugged 1 9 100Position On Off Ineffectual
10 Unplugged Plugged 1 10 100Position On Power
11 Plugged Unplugged 5 0 60Position On Power
12 Unplugged Plugged 5 1 100Position Stdby Tweed Ineffectual
13 Unplugged Unplugged 5 5 100Position On Off Ineffectual
14 Unplugged Plugged 5 9 60Position On Tweed
15 Plugged Unplugged 5 10 60Position Stdby Off Ineffectual
16 Unplugged Plugged 9 0 60Position Stdby Tweed Ineffectual
17 Plugged Unplugged 9 1 100Position On Off Ineffectual
18 Unplugged Plugged 9 5 60Position Stdby Power Ineffectual
19 Unplugged Plugged 9 9 60Position On Power
20 Unplugged Unplugged 9 10 60Position On Off Ineffectual
21 Unplugged Plugged 10 0 60Position Stdby Power Ineffectual
22 Plugged Unplugged 10 1 100Position On Off Ineffectual
23 Plugged Unplugged 10 5 100Position Stdby Tweed Ineffectual
24 Unplugged Plugged 10 9 100Position Stdby Off Ineffectual
25 Plugged Plugged 10 10 100Position Stdby Off Ineffectual
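The Comment column of this table can be reproduced with a simple predicate over the OnStdby and PowerOffTweed columns. The rule used in the sketch below (a row is effectual only if the amplifier is On and the selector is not Off) is inferred from the Comment column and is an assumption; the class and method names are illustrative.

```java
class IneffectualCount {
    // The (OnStdby, PowerOffTweed) values of test cases 1 to 25, in table order.
    static final String[][] ROWS = {
        {"Stdby", "Off"}, {"On", "Power"}, {"On", "Tweed"}, {"Stdby", "Power"},
        {"Stdby", "Tweed"}, {"Stdby", "Tweed"}, {"On", "Off"}, {"Stdby", "Power"},
        {"On", "Off"}, {"On", "Power"}, {"On", "Power"}, {"Stdby", "Tweed"},
        {"On", "Off"}, {"On", "Tweed"}, {"Stdby", "Off"}, {"Stdby", "Tweed"},
        {"On", "Off"}, {"Stdby", "Power"}, {"On", "Power"}, {"On", "Off"},
        {"Stdby", "Power"}, {"On", "Off"}, {"Stdby", "Tweed"}, {"Stdby", "Off"},
        {"Stdby", "Off"}
    };

    // Assumed rule: the other parameters only matter when the amplifier is On
    // and the selector switch is not in the Off position.
    static boolean isEffectual(String onStdby, String selector) {
        return onStdby.equals("On") && !selector.equals("Off");
    }

    static int countEffectual() {
        int count = 0;
        for (String[] row : ROWS) {
            if (isEffectual(row[0], row[1])) {
                count++;
            }
        }
        return count;
    }
}
```

Under this rule only 6 of the 25 test cases are effectual, which matches the rows without a comment (test cases 2, 3, 10, 11, 14 and 19).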
Now we can split the test suite into three test groups, one for each mode,
as shown in the following table:
Mode Test Case# J1 J2 Control1 Control2 RMS OnStdby PowerOffTweed
Off 1 Off
Stdby 2 Stdby Power
Stdby 3 Stdby Tweed
On 4 Unplugged Unplugged 0 0 60Position On Tweed
On 5 Plugged Plugged 0 1 100Position On Power
On 6 Unplugged Unplugged 0 5 100Position On Power
On 7 Plugged Unplugged 0 9 100Position On Tweed
On 8 Unplugged Plugged 0 10 60Position On Tweed
On 9 Plugged Plugged 1 0 60Position On Power
On 10 Unplugged Unplugged 1 1 100Position On Tweed
On 11 Plugged Plugged 1 5 60Position On Tweed
On 12 Unplugged Plugged 1 9 60Position On Power
On 13 Plugged Unplugged 1 10 100Position On Power
On 14 Unplugged Plugged 5 0 100Position On Tweed
On 15 Plugged Unplugged 5 1 60Position On Power
On 16 Plugged Unplugged 5 5 60Position On Tweed
On 17 Plugged Unplugged 5 9 100Position On Tweed
On 18 Unplugged Unplugged 5 10 60Position On Power
On 19 Plugged Unplugged 9 0 60Position On Tweed
On 20 Unplugged Plugged 9 1 100Position On Power
On 21 Plugged Unplugged 9 5 60Position On Power
On 22 Plugged Unplugged 9 9 100Position On Tweed
On 23 Plugged Plugged 9 10 100Position On Power
On 24 Plugged Unplugged 10 0 60Position On Power
On 25 Unplugged Plugged 10 1 100Position On Tweed
On 26 Plugged Unplugged 10 5 60Position On Power
On 27 Plugged Unplugged 10 9 60Position On Power
Characteristic Requirement
Length Shall be 8 characters or more
Composition Shall include at least one upper case character,
one numeric character and one special character
Predictability Shall not have any QWERTY keyboard or
ASCII sequence of length greater than 3
In the previous example (the audio amplifier), each class consists of one
value of a test parameter. We could make such a value-based tree for the
password diagnoser as well, in such a way that actual test passwords
can be generated by concatenating all combinations of the parameter
values, like in the tree below:
So, for this specific example, the value-based approach is not the best
choice.
A quick look at the entries in the table shows that no valid password
was generated. Without a doubt, such a password must be part of any
test suite that we could call adequate. A valid password must comply
with five requirements:
1 its length has to be 8 characters or more (Length ≥ 8);
2 it shall include at least one upper case letter (UpperCase = Yes);
3 it shall include a number (Number = Yes);
4 it shall include a special character (SpecialCharacter = Yes); and
5 it shall not include a sequence longer than 3 characters (Predictability
= No).
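The five requirements can be turned into an executable oracle, sketched below. The interpretation of the predictability requirement is an assumption: a sequence is taken to be four or more consecutive characters that are adjacent in ASCII order or on one row of a QWERTY keyboard, in ascending or descending direction. A special character is assumed to be any character that is neither a letter nor a digit. All class and method names are illustrative.

```java
import java.util.Locale;

class PasswordDiagnoserSketch {
    static final String[] QWERTY_ROWS = {"qwertyuiop", "asdfghjkl", "zxcvbnm"};

    // True if pos contains at least minLen consecutive entries that step by +1
    // throughout or by -1 throughout. Negative entries mean "not in this ordering".
    static boolean hasRun(int[] pos, int minLen) {
        int run = 1;
        int dir = 0;
        for (int k = 1; k < pos.length; k++) {
            int d = (pos[k - 1] >= 0 && pos[k] >= 0) ? pos[k] - pos[k - 1] : 0;
            if (d == 1 || d == -1) {
                run = (d == dir) ? run + 1 : 2;
                dir = d;
                if (run >= minLen) {
                    return true;
                }
            } else {
                run = 1;
                dir = 0;
            }
        }
        return false;
    }

    // Requirement 5 (assumed interpretation): no ASCII or QWERTY sequence longer than 3.
    static boolean isPredictable(String s) {
        int n = s.length();
        int[] ascii = new int[n];
        for (int k = 0; k < n; k++) {
            ascii[k] = s.charAt(k);
        }
        if (hasRun(ascii, 4)) {
            return true;
        }
        String lower = s.toLowerCase(Locale.ROOT);
        for (String row : QWERTY_ROWS) {
            int[] pos = new int[n];
            for (int k = 0; k < n; k++) {
                pos[k] = row.indexOf(lower.charAt(k));
            }
            if (hasRun(pos, 4)) {
                return true;
            }
        }
        return false;
    }

    // Requirements 1 to 4 plus the predictability check.
    static boolean isValid(String s) {
        boolean upper = false, digit = false, special = false;
        for (char c : s.toCharArray()) {
            if (Character.isUpperCase(c)) {
                upper = true;
            } else if (Character.isDigit(c)) {
                digit = true;
            } else if (!Character.isLetterOrDigit(c)) {
                special = true;
            }
        }
        return s.length() >= 8 && upper && digit && special && !isPredictable(s);
    }
}
```

Under these assumptions a password such as Edu#ardo9x satisfies all five requirements, whereas E9#eduardolkjh fails the fifth because it contains the QWERTY sequence lkjh.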
Did you notice that the test designer seems to have a tendency to cap-
italise the first letter of the passwords and to write the sequences in
ascending or left to right order? One cannot but wonder whether the
developer based his design on the same thought patterns:
Did the developer test for the presence of an uppercase letter in any position
other than the beginning of the string?
To prevent these biases, the test designer might want to include other
test relevant aspects such as the position where the characters appear
and the sequence order.
Note that these new test parameters do not come from the diagnoser
specification but from the experience or creativity of the test designer or
from organisational assets such as test catalogues or bug taxonomies.
If we were to combine all the leaves of the previous tree we would pro-
duce a large number of test cases including many infeasible combinations.
Look at the following test cases, for example:
Test case  Length  UpperCase  Number     SpecialCharacter  SequenceType  SequencePosition  SequenceOrder  Test value
number
1          <8      No         No         No                No            Beginning         Ascending      Ineffectual
2          >= 8    No         No         Beginning         QWERTY        InBetween         Descending     @lkjheduardo
...
21         <8      Beginning  Beginning  No                No            InBetween         Descending     Infeasible
22         <8      Beginning  Beginning  Beginning         QWERTY        Beginning         Ascending      Infeasible
...
69         >= 8    End        End        No                No            InBetween         Descending     Infeasible
70         >= 8    Beginning  InBetween  InBetween         QWERTY        End               Descending     E9#eduardolkjh
In test cases 21, 22 and 69, an UpperCase letter and a Number occupy
the same position in a string, a situation which is physically impossible.
Test case 70 is valid since the InBetween position is not a single position
but any position between the beginning and the end of the string.
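The infeasibility rule described here can be written as a predicate over the position values. The sketch considers only the UpperCase, Number and SpecialCharacter parameters and ignores the sequence parameters and the password length, which is a simplification; the class and method names are illustrative.

```java
class FeasibilityCheck {
    // Each argument is one of "No", "Beginning", "InBetween" or "End".
    // Two characters cannot both occupy the single Beginning or the single End position;
    // InBetween covers several positions and can therefore be shared.
    static boolean isFeasible(String upperCase, String number, String specialCharacter) {
        String[] p = {upperCase, number, specialCharacter};
        for (int a = 0; a < p.length; a++) {
            for (int b = a + 1; b < p.length; b++) {
                boolean singlePosition = p[a].equals("Beginning") || p[a].equals("End");
                if (singlePosition && p[a].equals(p[b])) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Applied to the rows shown, the predicate rejects test cases 21, 22 and 69 and accepts test case 70.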
Notice that, from the point of view of the intellectual effort required to
create a tree, it is better to start by treating every test parameter
independently of the others, as we did for the previous version of the tree
(on page 199), and then rearrange it, than to try to work out the optimum
tree from the beginning.
The first test group tests for Sequences with different characteristics as-
suming all other requirements are met. The second group tests for inter-
action effects between all requirements, assuming that all variations of
Sequence form an equivalence class. The third group defines a seeded
test case that consists of a password that satisfies all requirements since
there is no guarantee that such a test case would be generated using
2-way interactions.
11.11 Exercise
EXERCISE 11.3
Look at the following Replace dialogue. It contains a FindNext com-
mand (similar to what we have seen in Section 11.5), but also Replace,
ReplaceAll, Help (?) and Close/×.
In Section 11.5, we focussed on just one command (i.e. Find), and ex-
cluded the other commands in the dialogue window. In this exercise
we do not want to do that. We want the scope of testing to be the en-
tire dialogue and all the functionalities it encompasses. This apparently
simple difference seems to be taken for granted in the literature but it
has a large effect on the complexity of the resulting models and the ex-
tent to which we will experience combination anomalies (i.e. ineffectual
or infeasible combinations). We want you to find out why while mod-
elling the Replace dialogue as a classification tree. While modelling,
think what combinations your tree would generate, and if there are a
lot of anomalies, think of different ways you can model.
Chapter contents 12
Graph models
Overview
1 Introduction
2 Graphs
3 Paths in graphs
4 General graph-based coverage criteria
4.1 Vertex coverage
4.2 Edge coverage
4.3 Path coverage
5 Procedures for making test suites in the case of cycles
5.1 Transition trees
5.2 Prime paths
6 Graph coverage for source code
6.1 Control flow graph from source code
6.2 Statement and branch coverage
6.3 Condition coverage
6.4 Multiple condition coverage
6.5 Other code coverage criteria
6.6 McCabe’s cyclomatic complexity
7 Graph coverage for state machines
8 Flowcharts
Chapter 12
Graph models
OVERVIEW
In this chapter, graphs are the models for test case generation and cov-
erage. Graphs are used in many ways in software engineering and they
come in all kinds of flavours and styles. But in the end they all consist
of a collection of objects (called vertices and drawn as circles,
ellipses or boxes) and relations between them (called edges and drawn as
lines or arrows). Graphs can model flow dynamics in software applica-
tions. Consequently, the test cases that we can derive from graphs are
sequences describing steps through the SUT. Up until now the test cases
we have seen were testing input-output behaviour by inserting single
input values, not sequences.
Again, graphs fit in the way we present all testing techniques in this
course:
So, the graph is the model, and we will need to make test cases with
the intention of covering it. Beizer’s [13] general principles for using
graphs in testing fit very well in this course. His motto is:
LEARNING GOALS
After studying this chapter, you are expected to:
– understand the use of graphs in testing
– know different types of graphs that are used in testing (control
flow, state machines and other flowcharts)
– understand and be able to apply different general coverage criteria
for graphs.
CONTENTS
12.1 Introduction
In this chapter, we will first give an overview of the concepts and defi-
nitions related to graphs. This is not an extensive treatment of graphs,
but just the concepts we will use here to explain the different test coverage
criteria for general types of graphs. Subsequently we will look
at examples of more concrete graphs that are used in software engineer-
ing and show how the definitions on the general graphs translate to the
concrete examples (e.g., control flow graphs and state machines).
12.2 Graphs
V = {1, 2, 3, 4, 5, 6}
E = {(1, 3), (1, 4), (1, 6),
     (2, 3), (2, 4), (3, 2),
     (3, 6), (4, 5), (5, 6),
     (6, 3)}
Vinit = {1}
Vfin = {6}
[Figure 12.1: drawing of graph G1 with vertices 1 to 6]
We can also represent graphs with adjacency lists. Typically we will have
a list of size |V | where each list entry consists of a list of edges that go
out of the vertex that is represented by the list entry. The adjacency list
representation of graph G1 is below:
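As a sketch of this representation in code, the adjacency list can be built from the edge set. The sketch assumes that G1 is the graph given by the sets V and E listed in Section 12.2, and it stores for each vertex the target vertices of its outgoing edges; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AdjacencyListG1 {
    // Builds one list entry per vertex, holding the targets of its outgoing edges.
    static Map<Integer, List<Integer>> build() {
        int[][] edges = {
            {1, 3}, {1, 4}, {1, 6}, {2, 3}, {2, 4},
            {3, 2}, {3, 6}, {4, 5}, {5, 6}, {6, 3}
        };
        Map<Integer, List<Integer>> adjacency = new TreeMap<>();
        for (int v = 1; v <= 6; v++) {
            adjacency.put(v, new ArrayList<>());
        }
        for (int[] e : edges) {
            adjacency.get(e[0]).add(e[1]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        build().forEach((v, out) -> System.out.println(v + " -> " + out));
    }
}
```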
• The number of edges in the path is the length of the path. We denote
this with length( p).
• paths( G ) denotes the set of all paths in graph G = (V, E).
• A simple path is a path in which no vertex appears more than once.
• A path segment or a subpath of path p is a consecutive subsequence
of p.
• A vertex v is reachable from a vertex u (and u can reach v) if there
exists a path that starts with u and ends with v.
• A cycle or a loop is a path that starts and ends in the same vertex.
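These definitions can be made concrete with a few small functions over the edge set listed in Section 12.2. The sketch assumes that a path is a sequence of vertices in which every consecutive pair is an edge, and that a vertex can reach itself through a path of length 0; the class and method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PathBasics {
    static final int[][] EDGES = {
        {1, 3}, {1, 4}, {1, 6}, {2, 3}, {2, 4},
        {3, 2}, {3, 6}, {4, 5}, {5, 6}, {6, 3}
    };

    static boolean hasEdge(int u, int v) {
        for (int[] e : EDGES) {
            if (e[0] == u && e[1] == v) {
                return true;
            }
        }
        return false;
    }

    // Every consecutive pair of vertices must be an edge.
    static boolean isPath(int... p) {
        for (int k = 0; k + 1 < p.length; k++) {
            if (!hasEdge(p[k], p[k + 1])) {
                return false;
            }
        }
        return p.length > 0;
    }

    // The length of a path is its number of edges.
    static int length(int... p) {
        return p.length - 1;
    }

    // A simple path visits no vertex more than once.
    static boolean isSimple(int... p) {
        Set<Integer> seen = new HashSet<>();
        for (int v : p) {
            if (!seen.add(v)) {
                return false;
            }
        }
        return isPath(p);
    }

    // A cycle starts and ends in the same vertex.
    static boolean isCycle(int... p) {
        return isPath(p) && p.length > 1 && p[0] == p[p.length - 1];
    }

    // Breadth-first search from u.
    static boolean reachable(int u, int v) {
        Deque<Integer> queue = new ArrayDeque<>(List.of(u));
        Set<Integer> seen = new HashSet<>(List.of(u));
        while (!queue.isEmpty()) {
            int x = queue.poll();
            if (x == v) {
                return true;
            }
            for (int[] e : EDGES) {
                if (e[0] == x && seen.add(e[1])) {
                    queue.add(e[1]);
                }
            }
        }
        return false;
    }
}
```

For example, 1, 3, 2, 4, 5, 6 is a simple path of length 5, 1, 3, 2, 3, 6 is a path that is not simple, and 3, 2, 3 is a cycle.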
[Figure: graph-based testing workflow with the labels: Make a graph; Edge coverage; Round-trip coverage; Execute the test cases; Determine c coverage of the test run (a posteriori coverage)]
Now we get to the first test-related definitions. Recall that a test case
contains all information necessary to guide the execution of a particular
test (e.g. input, recipe, oracle). Also remember the way we present test
techniques in this course:
Once we have our concrete test cases, we need to execute them to test
how our system responds to them, to see whether the intended test path
has been executed (i.e. the intended coverage has been reached) and to
detect failures if any.
1 Note that there are also techniques where a test case is not a path, but a subgraph or
a subtree. In this section we only look at paths.
Note that a set of test paths and a set of traces can both be represented
as a set of subpaths in a graph G, but they are different things. To sum-
marise how they relate and how graph-based testing works, we refer to
Figure 12.2.
p visits v ≡ v ∈ p
p tours q ≡ q is a subpath of p
p visits e ⇔ p tours e
• The set of subpaths that are toured by the paths from a set of paths P
is defined as:
toured_by_paths(P) = {q | ∃ p : p ∈ P : (p tours q)}
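The relations visits and tours, and the set toured_by_paths(P), can be sketched as follows. Paths are represented as tuples of vertices and an edge as a path of two vertices; the helper subpaths and the function names visits_vertex and visits_edge are assumptions made for this illustration.

```python
def subpaths(p):
    """All non-empty consecutive subsequences of path p, as tuples."""
    p = tuple(p)
    return {p[i:j] for i in range(len(p)) for j in range(i + 1, len(p) + 1)}

def visits_vertex(p, v):
    """p visits v  ==  v occurs in p."""
    return v in p

def tours(p, q):
    """p tours q  ==  q is a subpath of p."""
    return tuple(q) in subpaths(p)

def visits_edge(p, e):
    """p visits edge e  ==  p tours e (an edge is a path of length 1)."""
    return tours(p, e)

def toured_by_paths(P):
    """The set of all subpaths toured by at least one path in P."""
    toured = set()
    for p in P:
        toured |= subpaths(p)
    return toured

print(sorted(toured_by_paths([[1, 4, 5, 6]]), key=lambda q: (len(q), q)))
```

For the single path 1, 4, 5, 6 the toured set contains four subpaths of length 0, three of length 1, two of length 2 and the path itself.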
Since both are sets of paths, we will define the coverage criteria in terms
of sets of paths in a graph. You will not be surprised to find that there
are several coverage criteria, such as vertex coverage, edge coverage
and path coverage.
DEFINITION 12.1 Given a graph G = (V, E) and a set of paths P in G, the vertex coverage
of P is the percentage of all vertices in G that are visited by the paths in
P:
$$\frac{|\{v \in V \mid \exists p : p \in P : (p \text{ visits } v)\}|}{|V|} \times 100\%$$
For graph G1 from Figure 12.1, the following test path gives 100% vertex coverage:
1, 3, 2, 4, 5, 6
Note that this test path starts in a vertex from Vinit and ends in a vertex
from Vfin.
DEFINITION 12.2 Given a graph G = (V, E) and a set of paths P in G, the edge coverage
of P is the percentage of all edges in G that are visited by the paths in P:
$$\frac{|\{e \in E \mid \exists p : p \in P : (p \text{ visits } e)\}|}{|E|} \times 100\%$$
For graph G1 from Figure 12.1 the following set of 4 test paths gives
100% edge coverage:
1, 4, 5, 6
1, 6, 3, 6
1, 3, 2, 3, 6
1, 3, 2, 4, 5, 6
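The two definitions can be checked mechanically. The sketch below computes vertex coverage (Definition 12.1) and edge coverage (Definition 12.2) for G1; the function names are assumptions for this illustration, and the graph is assumed to have at least one vertex and one edge.

```python
V = {1, 2, 3, 4, 5, 6}
E = {(1, 3), (1, 4), (1, 6), (2, 3), (2, 4), (3, 2),
     (3, 6), (4, 5), (5, 6), (6, 3)}

def vertex_coverage(vertices, P):
    """Percentage of vertices visited by at least one path in P."""
    visited = {v for v in vertices if any(v in p for p in P)}
    return 100.0 * len(visited) / len(vertices)

def edge_coverage(edges, P):
    """Percentage of edges visited by at least one path in P."""
    toured = {(u, v) for p in P for u, v in zip(p, p[1:])}
    return 100.0 * len(edges & toured) / len(edges)

suite = [[1, 4, 5, 6], [1, 6, 3, 6], [1, 3, 2, 3, 6], [1, 3, 2, 4, 5, 6]]

print(vertex_coverage(V, [[1, 3, 2, 4, 5, 6]]))  # 100.0
print(edge_coverage(E, suite))                   # 100.0
print(edge_coverage(E, suite[:1]))               # 30.0
```

The last line shows that the first test path alone visits 3 of the 10 edges.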
Note that Definition 12.2 of edge coverage from above (which is ba-
sically considered the standard definition by many authors) does not
include vertex coverage. In [4] it is indicated that intuitively it might be
a good idea to include vertex coverage in edge coverage. The example
given there is a graph with a node that has no edges. Our definition of
edge coverage does not cover that node.
For that purpose, Ammann and Offutt define edge coverage as follows
[4]:
DEFINITION 12.3 Given a graph G = (V, E) and a set of paths P in G, the Ammann and
Offutt edge coverage of P is the percentage of all paths in G with length
up to and including 1 that are toured by the paths in P:
$$\frac{|\{p \mid p \in \text{toured\_by\_paths}(P) \wedge \text{length}(p) \le 1\}|}{|\{p \mid p \in \text{paths}(G) \wedge \text{length}(p) \le 1\}|} \times 100\%$$
So we reach 100% Ammann and Offutt edge coverage if each node and
each edge are visited at least once by a test run of a test suite.
In 1976, Pimont and Rault [95] introduced a criterion for covering pairs
of edges. They called it switch cover. In [4] this is called edge-pair coverage.
In 1978, Chow [32] generalized this and defined n-switch coverage for a
specific graph (i.e. state machines, which we will see in Section 12.7).
The n-switch coverage criterion is about the percentage of paths with
length n + 1 that are covered. This way 0-switch coverage is edge cover-
age and 1-switch coverage is Pimont and Rault’s switch cover.
DEFINITION 12.4 Given a graph G and a set of paths P in G, the n-switch coverage of P
is the percentage of all paths with length n + 1 in G that are toured by
the paths in P:
$$\frac{|\{p \mid p \in \text{toured\_by\_paths}(P) \wedge \text{length}(p) = n + 1\}|}{|\{p \mid p \in \text{paths}(G) \wedge \text{length}(p) = n + 1\}|} \times 100\%$$
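A small sketch of Definition 12.4, applied to graph G1 and the four test paths that give 100% edge coverage. The function names are assumptions for this illustration, and the graph is assumed to contain at least one path of length n + 1.

```python
# Graph G1 as an adjacency list (vertex -> successors).
G1 = {1: [3, 4, 6], 2: [3, 4], 3: [2, 6], 4: [5], 5: [6], 6: [3]}

def paths_of_length(adj, k):
    """All paths in the graph with exactly k edges, as tuples of vertices."""
    paths = [(v,) for v in adj]
    for _ in range(k):
        paths = [p + (w,) for p in paths for w in adj[p[-1]]]
    return set(paths)

def n_switch_coverage(adj, P, n):
    """Percentage of paths of length n + 1 toured by the paths in P."""
    required = paths_of_length(adj, n + 1)
    # A path with n + 1 edges has n + 2 vertices: slide a window of that size.
    toured = {tuple(p[i:i + n + 2]) for p in P for i in range(len(p) - n - 1)}
    return 100.0 * len(required & toured) / len(required)

suite = [[1, 4, 5, 6], [1, 6, 3, 6], [1, 3, 2, 3, 6], [1, 3, 2, 4, 5, 6]]
print(n_switch_coverage(G1, suite, 0))            # 0-switch: edge coverage
print(round(n_switch_coverage(G1, suite, 1), 1))  # 1-switch: pairs of edges
```

G1 contains 14 paths of length 2, of which this suite tours 9. Full edge coverage therefore does not give full 1-switch coverage here.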
DEFINITION 12.5 Given a graph G and a set of paths P in G, the path coverage of P is the
percentage of all paths in G that are toured by the paths in P:
$$\frac{|\{p \mid p \in \text{toured\_by\_paths}(P)\}|}{|\text{paths}(G)|} \times 100\%$$
The problem with this coverage definition is that if the graph contains a
cycle, the total number of paths in G is infinite and hence it is impossible
to reach 100% coverage. In the next section, we discuss several attempts
to solve this problem and construct finite test suites.
Selecting test paths from an infinite set of possible paths to create a test
suite is not easy. In this section, we will describe two different proce-
dures from testing literature that deal with the infinite number of paths
resulting from a cycle in the graph and construct finite test suites:
• transition trees defined in Binder [14] and
• prime paths from Offutt [4].
In [14], Robert Binder defines a test strategy for making a test suite to
cover a graph. It is an adaptation of an "automata-theoretic" test strategy
defined by Chow [32] in 1978, called the W-method for Finite State
Machines (FSM).
Binder’s strategy only uses the set P of Chow’s method. Chow defines
this set, which we will call set PChow , as follows:
• Set PChow is any set of paths that contains, for every edge e = (v_i, v_j),
both a path v_0, ..., v_i from the initial vertex to v_i and the extension of
this path with v_j, that is v_0, ..., v_i, v_j.
Binder [14] gives a recipe for constructing such a set PChow by building
a transition tree (TT) of G. All subpaths of length ≥ 0 that start at the
root of the transition tree then constitute the set PChow . A procedure for
constructing such a transition tree for G = (V, E) is in Figure 12.3.
The procedure in Figure 12.3 always terminates, since there are only a
finite number of vertices in G. Also, depending on the left to right order
in which we place the successor nodes, a different tree may result.
Whichever tree results, Binder states [14] that covering all subpaths
starting in the root of a TT is a good test strategy for graphs in the pres-
ence of loops. We will define this as transition tree coverage.
2 This recipe is a mixture from the descriptions in Chow [32] and Binder[14]. In the
original paper of Chow [32], in step 2, the condition j ≤ k is stated for j. However, as you
can see later from Example 12.1 this does not always give the desired result and does not
define the complete set PChow . So we adapted that to "j < k or if a node with this label
has already been examined at level k". Also in [62] it is stated that a node is final if it is
already encountered higher up in the tree.
V = {1, 2, 3, 4, 5}
E = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (2, 5), (3, 1), (3, 5), (4, 5), (5, 3)}
Vinit = {1}
Vfin = {5}
[Figure 12.4: graph G2]
DEFINITION 12.6 Given a graph G, a set PChow for G and a set of paths P in G, the transition
tree coverage of P for the given PChow is the percentage of all paths
in PChow that are toured by the paths in P:
$$\frac{|\{p \in P_{Chow} \mid p \in \text{toured\_by\_paths}(P)\}|}{|P_{Chow}|} \times 100\%$$
Level 3
• The node with label 1 is terminated at level 2 because it is
already in the tree at level 1.
• Vertex 2 has four outgoing edges, to vertices 1, 2, 5 and 4, so
the tree will become:
Level 4
• All nodes with label 1 or 2 at level 3 are terminated because
they are already in the tree (at levels 1 and 2).
• Vertex 5 has one outgoing edge, to vertex 3, so the leftmost
node 5 in the tree gets a successor node 3.
• Vertex 4 has one outgoing edge, to vertex 5.
• The rightmost node 5 in the tree is now terminated because
we already added its successor earlier in this step.
Note that the TT contains each edge of G2 exactly once, and does not
contain anything else. Consequently, transition tree coverage implies
edge coverage, but not necessarily the other way around. Moreover,
note that each edge is reachable from the root (i.e. the initial vertex of
G2 ). Consequently, the set of subpaths of length ≥ 0 starting in the root
of the TT indeed satisfies the definition of PChow . For our TT of G2 , this
set consists of the following paths:
1
1, 1
1, 2
1, 2, 1
1, 2, 2
1, 2, 5
1, 2, 5, 3
1, 2, 4
1, 2, 4, 5
1, 3
1, 3, 1
1, 3, 5
A set of test paths achieving 100% transition tree path coverage for this
PChow is, for instance:
1, 2, 5, 3, 1, 3, 5
1, 2, 4, 5
1, 2, 1, 1, 2, 2, 5
1, 3, 1, 2, 5
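The breadth-first construction of the transition tree and the coverage of Definition 12.6 can be sketched as below. This is one reading of the termination rule described in the example (a node is not expanded when its label has already been expanded at a higher level, or earlier at the same level), not necessarily the exact procedure of Figure 12.3. The successor order of vertex 2 follows the example (1, 2, 5, 4), and all function names are assumptions.

```python
from collections import deque

# Graph G2; successors are listed in the left-to-right order used in the example.
G2 = {1: [1, 2, 3], 2: [1, 2, 5, 4], 3: [1, 5], 4: [5], 5: [3]}

def transition_tree_paths(adj, root):
    """Root-to-node paths of a breadth-first transition tree (the set PChow)."""
    paths = [(root,)]
    expanded = set()              # labels whose successors were already added
    queue = deque(paths)
    while queue:
        path = queue.popleft()
        label = path[-1]
        if label in expanded:     # terminal node: label seen before
            continue
        expanded.add(label)
        for successor in adj[label]:
            child = path + (successor,)
            paths.append(child)
            queue.append(child)
    return paths

def tours(p, q):
    """True if q is a consecutive subsequence of p."""
    p, q = tuple(p), tuple(q)
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

def transition_tree_coverage(p_chow, P):
    """Percentage of paths in PChow toured by the paths in P."""
    toured = [q for q in p_chow if any(tours(p, q) for p in P)]
    return 100.0 * len(toured) / len(p_chow)

p_chow = transition_tree_paths(G2, 1)
suite = [[1, 2, 5, 3, 1, 3, 5], [1, 2, 4, 5], [1, 2, 1, 1, 2, 2, 5], [1, 3, 1, 2, 5]]
print(len(p_chow), transition_tree_coverage(p_chow, suite))
```

With this successor order the sketch yields the 12 paths listed above, and the four test paths tour all of them. A different successor order, or a depth-first traversal, may produce a different tree.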
DEFINITION 12.7 Given a graph G and a set of paths P in G, the round-trip path coverage
of P is the percentage of all round-trip paths in G that are toured by the
paths in P:
$$\frac{|\{p \mid p \in \text{toured\_by\_paths}(P) \wedge p \text{ is a round-trip path of } G\}|}{|\{p \mid p \text{ is a round-trip path of } G\}|} \times 100\%$$
Graph G2 has the following 17 round-trip paths:
1, 1
2, 2
1, 2, 1
1, 3, 1
2, 1, 2
3, 1, 3
3, 5, 3
5, 3, 5
1, 2, 5, 3, 1
2, 5, 3, 1, 2
5, 3, 1, 2, 5
3, 1, 2, 5, 3
1, 2, 4, 5, 3, 1
2, 4, 5, 3, 1, 2
4, 5, 3, 1, 2, 4
5, 3, 1, 2, 4, 5
3, 1, 2, 4, 5, 3
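The round-trip paths of G2 can be enumerated with a short depth-first search. The sketch assumes that a round-trip path is a path that starts and ends in the same vertex and repeats no other vertex, and that the graph contains at least one cycle; the function names are chosen for this illustration.

```python
G2 = {1: [1, 2, 3], 2: [1, 2, 4, 5], 3: [1, 5], 4: [5], 5: [3]}

def round_trip_paths(adj):
    """All paths that return to their first vertex without repeating any other vertex."""
    found = set()

    def extend(path):
        for w in adj[path[-1]]:
            if w == path[0]:
                found.add(path + (w,))   # the cycle is closed
            elif w not in path:
                extend(path + (w,))      # keep the path simple

    for v in adj:
        extend((v,))
    return found

def tours(p, q):
    """True if q is a consecutive subsequence of p."""
    p, q = tuple(p), tuple(q)
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

def round_trip_coverage(adj, P):
    """Percentage of round-trip paths toured by the paths in P (Definition 12.7)."""
    required = round_trip_paths(adj)
    toured = {q for q in required if any(tours(p, q) for p in P)}
    return 100.0 * len(toured) / len(required)

tt_suite = [[1, 2, 5, 3, 1, 3, 5], [1, 2, 4, 5], [1, 2, 1, 1, 2, 2, 5], [1, 3, 1, 2, 5]]
print(len(round_trip_paths(G2)), round(round_trip_coverage(G2, tt_suite), 1))
```

Under these assumptions the search finds 17 round-trip paths. The four test paths given earlier for 100% transition tree coverage tour 6 of them, so that particular suite does not reach 100% round-trip path coverage.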
As mentioned before, Binder claims that a test suite that covers the TT
for G2 is as good as one that covers all round-trip paths of G2.
At this stage of the research, there is no clear, precise reason that
explains why covering the paths in the TT (i.e. transition tree coverage) is
better than covering all edges. Moreover, Binder's conjecture that
transition tree coverage is as strong as round-trip coverage is still
unproven.
[Figure 12.6: graph G3 (figure not reproduced)]
EXERCISE 12.1
a Make a TT of G2 using a DFS approach, which defines your PChow for
this exercise. Let us call this PChow^dfs.
b Construct (by hand) a set of test paths P that gives 100% transition
tree coverage for your PChow^dfs from part a of this exercise.
c Determine the transition tree path coverage of your set of test paths
P^dfs for the set PChow resulting from the BFS approach in Example 12.1.
EXERCISE 12.2
a What are the round-trip paths in graph G3 from Figure 12.6?
b Make a TT for G3 using the DFS approach.
c Which set of paths would result in 100% transition tree coverage of
G3 ?
d Do these paths also result in 100% round-trip coverage of G3 ?
In [4], prime paths are introduced to help create test suites in the
presence of loops.
• A prime path in a graph G = (V, E) is a simple path from v_i ∈ V to
v_j ∈ V that
– may have v_i = v_j;
– does not appear as a proper subpath of any other simple path.
EXAMPLE 12.2 Graph G2 from Figure 12.4 contains the following prime paths:
Note that these are exactly all 17 round-trip paths and one prime path
that is not a round-trip path: 2,1,3,5. So there is a very subtle difference
between round-trip paths and prime paths, which can be expressed by
the following alternative definition of round-trip path: a round-trip
path is a prime path that starts and ends at the same vertex.
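The count of prime paths in G2 can be reproduced with a brute-force sketch: enumerate every simple path (allowing the first and last vertex to coincide) and keep those that are not a proper subpath of another one. This is an illustration with assumed function names, not the algorithm used by any particular tool, and it is only practical for small graphs.

```python
G2 = {1: [1, 2, 3], 2: [1, 2, 4, 5], 3: [1, 5], 4: [5], 5: [3]}

def simple_paths(adj):
    """All simple paths; a path may close into a cycle (first vertex == last vertex)."""
    found = {(v,) for v in adj}

    def extend(path):
        for w in adj[path[-1]]:
            if w == path[0]:
                found.add(path + (w,))   # closed cycle, cannot grow further
            elif w not in path:
                found.add(path + (w,))
                extend(path + (w,))

    for v in adj:
        extend((v,))
    return found

def is_proper_subpath(q, p):
    """True if q is a consecutive subsequence of p and shorter than p."""
    return len(q) < len(p) and any(
        p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

def prime_paths(adj):
    """Simple paths that are not a proper subpath of any other simple path."""
    candidates = simple_paths(adj)
    return {q for q in candidates
            if not any(is_proper_subpath(q, p) for p in candidates)}

primes = prime_paths(G2)
print(len(primes), [p for p in primes if p[0] != p[-1]])
```

For G2 this gives 18 prime paths: the 17 round-trip paths plus the single path 2, 1, 3, 5 that does not return to its first vertex.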
Using the website
https://cs.gmu.edu:8443/offutt/coverage/GraphCoverage
we can find that the following 12 test paths cover all prime paths:
Note that the last two test paths could be merged into one path 1, 1, 2,
2, 5 that then covers the prime paths 1, 1 and 2, 2.
DEFINITION 12.8 Given a graph G and a set of paths P in G, the prime path coverage of
P is the percentage of all prime paths in G that are toured by the paths
in P:
$$\frac{|\{p \mid p \in \text{toured\_by\_paths}(P) \wedge p \text{ is a prime path of } G\}|}{|\{p \mid p \text{ is a prime path of } G\}|} \times 100\%$$
EXERCISE 12.3
How many test paths do we need for 100% prime path coverage of
graph G3 from Figure 12.6? Do this at first by hand. When finished,
use the aforementioned website to compare your answer to theirs.
In Section 1.9.1.1, we have briefly seen code coverage criteria, that give
an idea of the percentage of code that has been executed by a test suite.
Different criteria are defined based on whether specific parts of the code
are executed or not during the tests.
1 statement_a;
2 while (condition)
3 statement_b;
4 statement_c;
All these criteria we have seen are graph-based coverage criteria of the
Control Flow Graph (CFG). Control flow graphs are graphical represen-
tations of the source code of a program: each vertex corresponds to a
statement or a condition, and the edges correspond to branches. In the
following section we will show how to construct a CFG.
1 if (condition)
2 statement_a;
3 else statement_b;
4 statement_c;
The vertex with label 1 is called a decision vertex, representing the condi-
tion from line 1 whose value decides whether we continue to statement
2 or 3.
The vertex with label 4 is called a junction vertex since it has more than
1 incoming edge.
The CFG for a while-loop construct can be found in Figure 12.8. Decision
vertex 2 represents the execution of the condition that decides whether
we continue with the loop or not.
For the for-loop in Figure 12.9 we have written the initialisation, the con-
dition and the increment parts of the for on three different lines so it
is easier to see which vertex it corresponds with. The initialisation is
only executed once, and then the loop is executed until the condition in
vertex 3 no longer evaluates to True.
1 statement_a;
2 for ( init ;
3 condition;
4 increment )
5 statement_b;
6 statement_c;
In Figure 12.10 you can see the CFG of a do-while-loop. The loop body is
always executed at least once.
1 statement_a;
2 do statement_b;
3 while (condition);
4 statement_c;
The case or switch statement has the CFG shown in Figure 12.11. Here we put
different statements together in one vertex. We can do this because these
statements are executed as a basic block, i.e. all statements in the block
are executed sequentially. We could make separate vertices
for each of the statements, but that would make the CFG larger.
EXERCISE 12.4
Make a CFG for the following program, which might look familiar. The
methods incharacter, outcharacter and nextcharacter can throw
an IOException.
1 get (ch);
2 switch (ch){
3 case 'A':
4 statement_a;
5 break;
6 case 'B':
7 statement_b;
8 break;
9 case 'C':
10 statement_c;
11 break;
12 default:
13 statement_d;
14 break;
15 }
16 statement_e;
The coverage criteria from the previous section can now be applied to
the CFGs and we can see that we get the coverage criteria explained in
Section 1.9.1.1. The notion of vertex coverage is statement coverage; edge
coverage is branch coverage or decision coverage.
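To make the correspondence concrete, the sketch below measures statement and branch coverage on the CFG of the if-else construct. The edge set is an assumption derived from the description of that CFG (decision vertex 1 leads to 2 or 3, and both lead to junction vertex 4); the function names are chosen for this illustration.

```python
# Assumed CFG of:  1 if (condition)  2 statement_a;  3 else statement_b;  4 statement_c;
CFG_VERTICES = {1, 2, 3, 4}
CFG_EDGES = {(1, 2), (1, 3), (2, 4), (3, 4)}

def statement_coverage(vertices, test_paths):
    """Vertex coverage of the CFG: percentage of statements executed."""
    executed = {v for p in test_paths for v in p}
    return 100.0 * len(vertices & executed) / len(vertices)

def branch_coverage(edges, test_paths):
    """Edge coverage of the CFG: percentage of branches taken."""
    taken = {(u, v) for p in test_paths for u, v in zip(p, p[1:])}
    return 100.0 * len(edges & taken) / len(edges)

only_true = [(1, 2, 4)]            # condition evaluates to True
both = [(1, 2, 4), (1, 3, 4)]      # condition evaluates to True and to False

print(statement_coverage(CFG_VERTICES, only_true), branch_coverage(CFG_EDGES, only_true))
print(statement_coverage(CFG_VERTICES, both), branch_coverage(CFG_EDGES, both))
```

A single test that takes the true branch executes three of the four statements but only two of the four branches. Adding a test for the false branch brings both measures to 100%.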
Recall that condition coverage (or predicate coverage) was defined as:
the percentage of Boolean sub-expressions present in the guards of a
branch that have been evaluated to both True and False during our tests.
If a decision vertex in the CFG represents a guard as a whole, then edge
coverage is not enough for condition coverage.
However, if we draw the CFG a bit differently, then edge coverage on that
graph does imply condition coverage.
For example, recall the program oddOrPos from Figure 1.3, repeated
below for convenience:
sub-expr1 = (x[i] % 2 == 1)
sub-expr2 = (x[i] > 0)
Note that if you use tools for code coverage, it is important to make sure
you know how the CFG is constructed to make sure that edge coverage
indeed subsumes condition coverage.
FIGURE 12.12 CFG for the program oddOrPos from Figure 1.3 (surviving vertex labels: i=0, i < x.length, x[i]%2 == 1, x[i] > 0, i++)
This means that each sub-condition must be executed twice, with the
results True and False, but with no difference in the truth values of all
other conditions in the decision. In addition, it needs to be shown that
each condition independently affects the decision. With this coverage
criterion, some combinations of condition results turn out to be redun-
dant and are not counted in the coverage result.
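McCabe was one of the first to use CFGs, in 1976 [76], when he defined
his cyclomatic complexity metric, a software metric used to indicate
the complexity of a program. For a graph G with p connected components,
the cyclomatic complexity is:
CG = |E| − |V| + 2p
For a CFG that consists of a single connected component (p = 1), this becomes:
CG = |E| − |V| + 2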
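As a small worked example of the formula, the sketch below computes the cyclomatic complexity of two tiny graphs. The edge set of the if-else CFG is an assumption based on the description of that construct earlier in this chapter, and the function name is chosen for this illustration.

```python
def cyclomatic_complexity(vertices, edges, components=1):
    """CG = |E| - |V| + 2p, with p the number of connected components."""
    return len(edges) - len(vertices) + 2 * components

# Assumed CFG of the if-else construct: decision vertex 1, junction vertex 4.
if_else_vertices = {1, 2, 3, 4}
if_else_edges = {(1, 2), (1, 3), (2, 4), (3, 4)}

# A straight-line program of three statements, without any decision.
sequence_vertices = {1, 2, 3}
sequence_edges = {(1, 2), (2, 3)}

print(cyclomatic_complexity(if_else_vertices, if_else_edges))    # 4 - 4 + 2 = 2
print(cyclomatic_complexity(sequence_vertices, sequence_edges))  # 2 - 3 + 2 = 1
```

Each binary decision adds one to the metric: the straight-line program has complexity 1 and the single if-else has complexity 2.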
In 1989, McCabe et al. [77] proposed the basis path testing coverage
criterion, which states that it is enough to test CG distinct simple paths from
the initial to the final vertex in a single entry-single exit CFG.
Note that the number of nodes and edges depends on how we draw the
CFG: one vertex per guard, or one vertex per sub-condition in the guard.
However, there is more to be careful about. The basis path testing crite-
rion turns out to be problematic and unreliable. In the next exercise we
ask you to think about why that could be.
EXERCISE 12.5
Try –for a few minutes, then read the solution– to construct a single
entry-single exit CFG for which it is possible to select CG distinct simple
paths from the initial to the final vertex without reaching 100% state-
ment or branch coverage.
A myriad of articles and textbooks exist that write about state machines.
Many different ways exist to represent them as a graph, define what can
be in a state, what a transition represents and what can be annotated
to the vertices and the edges. Just to give you an impression, people
have written about FSMs (Finite State Machines), EFSMs (Extended Fi-
nite State Machines), Petri Nets (PN), I/O Automata, Timed I/O Au-
tomata, Probabilistic I/O Automata, Labelled Transition Systems (LTS),
Labelled Terminal Transition Systems (LTTS), Timed Transition Systems
(TTS), and many other variants. In this chapter, we will look at a gen-
eral form of state machines and their representation as graphs. In Chap-
ter 13 we will look in more detail at Labelled Transition Systems (LTS)
as a test model, together with the techniques and accompanying tools
that can automatically generate and execute tests from LTSs.
[Figure 12.13: state machine of a washing machine. Surviving labels: states OFF, IDLE, OPERATING, PAUSED; transitions press on/on sound, press off/off sound, /finish]
In Figure 12.13 you can see a simple state machine of a typical washing
machine. The vertices represent the states in which the machine can
be: off, idle, programmed, operating, paused and ready. The edges are
annotated with expressions in the format event[guard]/actions:
• the events (inputs) that a user can generate by interacting with the
machine: push the on button, push the off button, select a program,
reset, start, stop, continue.
• the actions (outputs) of the machine: finish, notify with ready sound,
off sound, on sound, message on display
• the guards mean that this transition can only happen when the guard
evaluates to True. You can only put the washing machine into oper-
ation when the door is closed.
The representation from above is also referred to as the Mealy [78] rep-
resentation.
Let us look again at the state machine in Figure 12.13. Vertex coverage
from Definition 12.1 means that all states have been visited once and
hence is also called state coverage. Edge coverage from Definition 12.2
means that all transitions in the state machine have been visited once
and hence is called transition coverage.
EXERCISE 12.6
a Define test cases that establish 100% transition tree path coverage
(Definition 12.6) for the state machine in Figure 12.13. Assume that
Vinit = Vfin = {OFF} and use a BFS approach.
b Do you need to add more tests for 100% round trip path coverage
(Definition 12.7)? You do not have to actually find new test paths, but
The most difficult part of using state machines for testing is not the
coverage criteria, it is making the state machine. However, it is also the
most entertaining and creative part. Furthermore, it can help you as a
tester learn a lot about the SUT and even find errors while still modelling
[102, 4].
EXAMPLE 12.3 We consider the following specification for the cash dispenser.
The user of the cash dispenser can insert a bank card. After inserting the card,
the user is asked for the card’s PIN. If the PIN is correct, the user is asked for
the amount of money that is required. If the user has enough credit, the money
is given, otherwise a message is displayed that there is not enough money in
the account. In both cases, the cash dispenser will return the card. If the PIN
is wrong, the machine will permit one other attempt. If the PIN is incorrect
again, the cash dispenser will issue a message that the card will be kept.
Reading the description we can detect the following states, events, guards
and actions:
• The machine is waiting (state) for a customer to insert a card. So we
can call this state waiting for card .
• When the card is inserted (event), the machine asks for the PIN (action)
and starts waiting for PIN (state).
• If the PIN is entered incorrectly (event), since it has not been tried two
times the machine will stay in the same state waiting for PIN .
• If the PIN is entered incorrectly (event) for the second time (guard), the
card will be swallowed (action) and the machine will return to the state
waiting for card .
• When the PIN is entered correctly (event), the machine asks for the amount
(action) and then starts waiting for amount (state) of money the user
wants.
• When the user enters the amount it will be checked (event) and if there
is enough money in the account (guard), the money will be given (ac-
tion), the card will be returned (action) and the machine will return to
the state waiting for card .
• When there is not enough money (guard), a message is displayed (ac-
tion) and the machine will return to the state waiting for card .
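The states, events, guards and actions identified above can be written down as an executable sketch. The event names, the attempts counter used for the guards and the function names step and run are assumptions made for this illustration. The card is returned in both amount cases, following the sentence "In both cases, the cash dispenser will return the card" of the specification.

```python
def step(state, attempts, event, enough_money=None):
    """One transition of the cash dispenser: returns (next_state, attempts, actions)."""
    if state == "waiting for card" and event == "insert card":
        return "waiting for PIN", 0, ["ask for PIN"]
    if state == "waiting for PIN" and event == "enter correct PIN":
        return "waiting for amount", 0, ["ask for amount"]
    if state == "waiting for PIN" and event == "enter incorrect PIN":
        if attempts + 1 < 2:                       # guard: fewer than 2 wrong attempts
            return "waiting for PIN", attempts + 1, ["ask for PIN"]
        return "waiting for card", 0, ["swallow card", "message"]
    if state == "waiting for amount" and event == "enter amount":
        if enough_money:                           # guard: enough money in the account
            return "waiting for card", 0, ["give money", "return card"]
        return "waiting for card", 0, ["message", "return card"]
    raise ValueError(f"event {event!r} is not defined in state {state!r}")

def run(events):
    """Execute a list of (event, enough_money) pairs from the initial state."""
    state, attempts, log = "waiting for card", 0, []
    for event, enough_money in events:
        state, attempts, actions = step(state, attempts, event, enough_money)
        log.append(actions)
    return state, log

print(run([("insert card", None), ("enter incorrect PIN", None),
           ("enter incorrect PIN", None)]))
```

Writing the model down this way tends to expose open questions in the specification, for example what should happen when a card is inserted while the machine is waiting for a PIN.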
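[Figure: state machine for the cash dispenser of Example 12.3. States: waiting for card, waiting for PIN, waiting for amount. Surviving transition labels: insert card/ask for PIN; enter incorrect PIN [< 2 times]/ask for PIN; enter incorrect PIN [2 times]/swallow card, message]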
EXERCISE 12.7
a Adapt the state model of the cash dispenser such that it also checks
whether the inserted card is valid or not. When invalid it should eject
the card.
b Design a test suite that achieves 100% transition (edge) coverage
of the model.
12.8 Flowcharts
Flowcharts are a general purpose type of graphs that can be used for
testing. A CFG is a flowchart, showing the steps of a piece of code.
However, we can make all kinds of flowcharts at different levels of
abstraction. Like any type of diagram, flowcharts can help visualise what
is going on and thereby help us understand the system we are testing, and
perhaps also find errors, bottlenecks, and other less obvious features
within it.
There are many different types of flowcharts, and each type has its own
repertoire of different styles of boxes to draw vertices and other nota-
tional conventions.
3 https://www.uml.org/
Chapter contents 13
State-transition models
Overview
1 Introduction
2 Labelled transition systems
2.1 Conformance
2.2 Coverage
3 Test case generation from LTS
4 Axini Modelling Language
4.1 Model
4.2 Communication: labels
4.3 Non-deterministic choice
4.4 Loop: repeat
4.5 States and goto
4.6 Data: parameters
4.7 State variables
4.8 Advanced features
Chapter 13
State-transition models
This chapter has been written by Theo Ruys from Axini.
OVERVIEW
LEARNING GOALS
After studying this chapter, you are expected to:
– understand how labelled transition systems work
– understand the concepts of ATCG and its advantages and disad-
vantages
– know the basic constructs of AML
– understand how AML constructs are mapped upon labelled transi-
tion systems
– be able to construct AML models for small reactive systems.
CONTENTS
13.1 Introduction
For the reactive systems in this chapter, the SUT is treated as a black box
that exhibits behaviour and interacts with its environment, but about whose
internal structure nothing is known. The only way a tester can
control and observe an implementation is via its interfaces. The aim of
testing is to check the correctness of the behaviour of the SUT on its
interfaces [116]. These interfaces are not limited to GUIs, but can be any
communication interface, e.g., I/O interfaces (standard input/output,
file systems, middleware, et cetera), API calls, et cetera.
The type of testing we will see in this chapter does not require any
knowledge about the implementation of the SUT. However, if the source
code of the SUT is available, other test techniques from this course can
of course be used to (unit) test parts of the implementation before testing
the integration of the different parts which constitute the complete SUT.
Alternatively, one could measure the code coverage of the source code to
assess the quality of the tests.
Again, the testing techniques discussed in this chapter fit with the way
we present the test techniques in this book:
The ATCG approach is the result of more than three decades of scientific
research. The ATCG approach has a strong mathematical basis; many
theoretical papers and descriptions are available which formally define
the test derivation algorithms and prove their correctness. In this chap-
ter, we introduce ATCG by example and only touch upon the formal
foundations of ATCG, and we are (sometimes very) informal to ease the
presentation. For a more formal introduction the interested reader is
referred to, for example, [116].
The labels on the transitions represent the observable actions of the
system; they model the system's interactions with the environment.
Besides observable actions, the model may also contain internal actions
(often denoted by ι or τ) which are unobservable for the system's
environment. We will not use internal actions in this chapter.
[Figure 13.1: examples of LTSs (a), (b) and (c), built from the states s0 to s3 and the labels ?button, ?kick, !tea and !coffee]
Some of the advantages of labelled transition systems are that they are
easy to draw and it is intuitively clear what the semantics of the system
are. Figure 13.1 shows some examples of LTSs. The bullets represent the
states and the arrows between the bullets are the transitions. Each label
is either prefixed with a question mark (?) or an exclamation mark (!):
a question mark ? represents an input to the system, an exclamation
mark ! represents an output from the system. The start state of the
system is the state with an ingoing arrow which has neither a source
state nor a label. A state without an outgoing arrow is called a final
state.
Let us look at the examples of Figure 13.1 in more detail. Figure 13.1 (a)
describes a system that can deliver a cup of tea. The system has three
states. From the start state s0 , a ?button can be pressed and the system
goes to state s1 . From state s1 the system can do a !tea action. After the
!tea action, the system goes to the final state s2 , and nothing can happen
anymore. Figure 13.1 (b) describes a system which can deliver coffee
or tea, non-deterministically. After receiving the ?button, the system can
either deliver !coffee or !tea. Figure 13.1 (c) is a continuous version of
Figure 13.1 (a): after delivering the tea, the button can be pressed again to
deliver tea. If the system is ?kick-ed, the system goes out of order and
does nothing anymore.
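The LTSs of Figure 13.1 (a) and (c) can be sketched as Python dictionaries that map each state to its outgoing transitions. The transition structure is read from the description above; the target state s2 of ?kick and the function names are assumptions made for this illustration.

```python
# Each state maps to a list of (label, target state) pairs.
LTS_A = {"s0": [("?button", "s1")], "s1": [("!tea", "s2")], "s2": []}
LTS_C = {"s0": [("?button", "s1"), ("?kick", "s2")],
         "s1": [("!tea", "s0")],
         "s2": []}

def final_states(lts):
    """States without outgoing transitions."""
    return {state for state, outgoing in lts.items() if not outgoing}

def traces(lts, start, depth):
    """All label sequences of at most `depth` labels that the LTS can perform from `start`."""
    result = {()}
    frontier = {((), start)}
    for _ in range(depth):
        frontier = {(trace + (label,), target)
                    for trace, state in frontier
                    for label, target in lts[state]}
        result |= {trace for trace, _ in frontier}
    return result

print(sorted(traces(LTS_A, "s0", 5)))
print(len(traces(LTS_C, "s0", 3)), final_states(LTS_C))
```

System (a) has only three traces, however deep we look. The set of traces of the continuous system (c) keeps growing with the depth, which is why such systems are explored up to a bounded number of labels.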
1 Tretmans [116] takes a more formal approach and distinguishes between labelled
transition systems (where the transitions do not have a direction) and input-output tran-
sition systems (where the transitions are divided into input- and output transitions).
[Figure 13.2: LTS of a cash dispenser. Surviving labels: states s_idle and s_give_money; transitions ?Card, !AskPincode, ?Pincode, !Wrong, !KeepCard, !AskAmount, ?Amount, !Money, !NotEnough, !Card]
Figure 13.2 shows an LTS of a cash dispenser. The user of the cash dis-
penser can insert a bank card (input ?Card). After inserting the card, the
user is asked for the card’s PIN (output !AskPIN). If the PIN entered (in-
put ?PIN) is correct, then the user is asked how much money is required
(output !AskAmount). If the user has enough credit, the money is dis-
pensed (output !Money), otherwise a message is displayed that there is
not enough money in the account (output !NotEnough). In both cases, the
cash dispenser will return the card (output !Card). If the PIN is wrong,
the machine will permit one other attempt to enter the correct PIN (in-
put ?PIN). If the PIN is incorrect again, the cash dispenser will issue a
message about the PIN entered (output !Wrong) and the card will be kept
(output !KeepCard).
EXERCISE 13.1
In this exercise you are asked to define an LTS for an abstract Stack ma-
chine. The Stack can hold a maximum of three elements. The Stack
accepts two stimuli: push and pop. We abstract from the actual values
that are being pushed to and popped from the Stack. The Stack ma-
chine has two responses: value and error. After a pop stimulus, the Stack
responds with a value label. If the Stack machine receives a pop stim-
ulus when there are no more elements, then it outputs an error label.
Similarly, when the Stack is full and the machine receives a push la-
bel, the Stack also outputs an error label. Draw an LTS for this Stack
machine.
13.2.1 Conformance
We assume that the SUT can be modelled as an LTS and that the input
and output actions of the SUT are the same as specified in L_I and L_O of
the LTS. In practice, though, the physical labels of the SUT have to be
translated to the logical labels of the model, and vice versa.
Executing a test case results in an execution trace of the SUT, which either
corresponds to a trace of the LTS or not. Test execution may be success-
ful, meaning that the observed responses of the SUT correspond to the
expected responses of the test case. We say that the test passes and the
execution trace will correspond to a trace in the LTS. Test execution may
be unsuccessful: when executing the stimuli from the test case, we ob-
serve a response from the SUT that is not an expected response from the
test case. We say that the test fails. In this case the execution trace will
not correspond entirely to a trace in the LTS: the last observed response
does not correspond to a transition of the LTS.
Given an LTS, a test case can be represented by a tree, where each edge
is either a stimulus or a response, and ends with a verdict: pass or fail.
For example, recall the LTS in Figure 13.3 (a). Figure 13.3 (b) (taken
from [40]) is a test case which describes all traces for this LTS. The pass-
ing traces all start with a ?button stimulus, followed by either a !coffee
or !tea response, and then nothing more. Observing nothing – or the
absence of a response – is named quiescence. This is represented by θ:
it means that we do not observe any response from the SUT. The tran-
sition θ will be taken if none of the output responses can be observed.
Testing for quiescence is usually implemented as a timeout: if we do
not observe any response after a certain amount of time, we conclude
that no responses will be generated.
Figure 13.3 (c) shows a possible execution of the test case at the SUT:
⟨?button, !tea, !tea⟩. After a ?button stimulus, the SUT responds with !tea
and then !tea once again. The test execution is a trace in the test case of
Figure 13.3 (b), leading to a fail verdict, because the second !tea output
is not allowed in the LTS. In other words, the observed execution trace
is not a trace of the LTS, and the test fails. Note that if we executed
the test case again, we might observe another test execution, e.g.
⟨?button, !coffee, θ⟩, which represents a passing test execution.
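The verdict for an observed execution trace can be sketched as a walk through the LTS. The model below is an assumed encoding of the coffee and tea machine described above (the state names are hypothetical), and quiescence θ is taken to be allowed only in states without output transitions. This is a simplification for illustration, not the formal test derivation of [116].

```python
THETA = "θ"   # observation of quiescence: no response from the SUT

# Assumed LTS: after ?button the machine delivers either !coffee or !tea.
MACHINE = {"s0": [("?button", "s1")],
           "s1": [("!coffee", "s2"), ("!tea", "s3")],
           "s2": [],
           "s3": []}

def verdict(lts, start, observed):
    """Return 'pass' if the observed labels form a trace of the LTS, else 'fail'."""
    states = {start}                  # a set, to cope with non-determinism
    for label in observed:
        if label == THETA:
            # Quiescence is only expected where no output can be produced.
            states = {s for s in states
                      if not any(l.startswith("!") for l, _ in lts[s])}
        else:
            states = {target for s in states
                      for l, target in lts[s] if l == label}
        if not states:
            return "fail"
    return "pass"

print(verdict(MACHINE, "s0", ["?button", "!tea", "!tea"]))
print(verdict(MACHINE, "s0", ["?button", "!coffee", THETA]))
```

The first execution fails at the second !tea, which no transition of the model allows. The second execution passes because the model allows quiescence after the coffee has been delivered.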
Most reactive systems are supposed to work uninterrupted: they are
designed to never stop. The set of traces of such systems is clearly infi-
nite and we have already seen that it is impossible to define a test suite
of test cases containing all possible traces. We can still test the machine
up to a predefined depth n, though: we limit the test cases to that depth
of n labels. If a test execution still conforms to the test case after n steps,
we consider the test case passed up to depth n.
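The depth limit can be illustrated with a small sketch that enumerates all traces of an LTS up to n labels. The dictionary encoding of the LTS and the function name `traces_up_to` are assumptions of this example; the LTS itself is the tea machine that can be kicked, which has infinitely many traces because of its ?button/!tea loop.

```python
from typing import Dict, List, Tuple

LTS = Dict[str, List[Tuple[str, str]]]  # state -> [(label, target state)]

# Assumed encoding: s0 -?button-> s1, s1 -!tea-> s0, s0 -?kick-> s2.
TEA_OR_KICK: LTS = {
    "s0": [("?button", "s1"), ("?kick", "s2")],
    "s1": [("!tea", "s0")],
    "s2": [],
}


def traces_up_to(lts: LTS, start: str, depth: int) -> List[Tuple[str, ...]]:
    """Collect every trace of at most `depth` labels by bounded depth-first search."""
    result: List[Tuple[str, ...]] = []

    def walk(state: str, prefix: Tuple[str, ...]) -> None:
        result.append(prefix)
        if len(prefix) == depth:
            return  # vertical limit reached: do not explore further
        for label, target in lts[state]:
            walk(target, prefix + (label,))

    walk(start, ())
    return result


if __name__ == "__main__":
    for trace in traces_up_to(TEA_OR_KICK, "s0", 4):
        print(trace)
```

Without the depth check the search would never terminate on this LTS, which is exactly why a predefined depth is needed.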
For industrial-sized systems the test case will often be too large due to
the large number of possible non-deterministic stimuli, even if we limit
the test case vertically to a predefined depth of n steps. To limit the test
case horizontally we can leave out certain stimuli in (parts of) the
LTS. The choice of stimuli to be included in the test case is driven by the
intended coverage of the model.
EXERCISE 13.2
Consider the LTS of the Stack machine of Exercise 13.1. Because the
behaviour of the Stack machine is infinite, we cannot define a complete
test case. Draw a test case for the Stack machine up to depth 4, i.e. 4
labels deep.
EXERCISE 13.3
The test case of Exercise 13.2 includes all traces of depth 4. Consequently,
it does not include the interesting trace ⟨?push, ?push, ?push,
?push, !error⟩ of length 5. Draw a test case for the Stack machine up to
depth 5, where a ?pop stimulus is only allowed after two ?push stimuli.
13.2.2 Coverage
within the LTS can be of interest. That is, for example n-switch cov-
erage (see Definition 12.4) of a test suite, which tries to include all
(sub)paths of length n (from the start state or from each state) in the
generated test suite.
• Data coverage: in the examples that we have seen so far, the labels
of the LTS did not carry any data: they were just abstract names for
actions of the system. In practice, though, we will usually add more
details to the labels, including data values. In Section 13.4 on AML
we will see examples of this. For such labels, we can use the same
approaches as we discussed in Chapters 6, 7 and 8 of this book, e.g.,
input domain modelling with equivalence classes or input domain
boundaries, to obtain and increase the data coverage.
Consider Figure 13.1 (c). If we never ?kick this tea machine, and only
push the ?button to get tea, both the state coverage and the transition
coverage of the LTS will be 66% as the transition labelled with ?kick and
state s2 will not be covered.
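The calculation above can be reproduced with a short sketch. The dictionary encoding of the LTS and the function name `coverage` are assumptions of this example, and the code assumes that each label leads to at most one target state from any given state.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

LTS = Dict[str, List[Tuple[str, str]]]  # state -> [(label, target state)]

# Assumed encoding of the tea machine of Figure 13.1 (c).
TEA_OR_KICK: LTS = {
    "s0": [("?button", "s1"), ("?kick", "s2")],
    "s1": [("!tea", "s0")],
    "s2": [],
}


def coverage(lts: LTS, start: str,
             traces: Iterable[Sequence[str]]) -> Tuple[float, float]:
    """Return (state coverage, transition coverage) as fractions between 0 and 1."""
    seen_states = {start}
    seen_transitions = set()
    for trace in traces:
        state = start
        for label in trace:
            targets = [t for (l, t) in lts[state] if l == label]
            if not targets:
                raise ValueError(f"{label} is not enabled in state {state}")
            target = targets[0]  # assumption: one target per label and state
            seen_transitions.add((state, label, target))
            seen_states.add(target)
            state = target
    total_transitions = sum(len(edges) for edges in lts.values())
    return len(seen_states) / len(lts), len(seen_transitions) / total_transitions


if __name__ == "__main__":
    states, transitions = coverage(TEA_OR_KICK, "s0", [["?button", "!tea"]])
    print(states, transitions)
```

For the single trace ⟨?button, !tea⟩ both fractions are 2/3, which is the 66% mentioned in the text (rounded down).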
Both state and transition coverage of the LTS model might not say much
about the coverage of the complete SUT. The model may be incomplete
in the sense that functionality of the SUT has been left out or certain
details have been abstracted from. From a dynamic point of view, cov-
erage of (sub)paths might be much more interesting than the static cov-
erage of states and transitions. Still, even with these reservations, state
and transition coverage should be as high as possible.
EXERCISE 13.4
Consider the LTS of the Stack machine of Exercise 13.1 again. Suppose
we execute the following three tests:
• ⟨?push, ?pop, !value⟩
• ⟨?push, ?push, ?pop, !value, ?push⟩
• ⟨?push, ?push, ?push⟩
What is the state coverage of these three tests? What is the transition
coverage of these tests?
Model-Based Testing (MBT) or Conformance Testing. However, in this book we consider that
all software testing should be model-based (see Chapter 2), and not just the approach
using state-transition models. Therefore we deviate from literature here and use the term
ATCG in the context of testing state-transition models.
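[Figure 13.4: off-line ATCG. A Test Case Generator takes the model and the coverage criteria as input and produces test cases. A Test Case Executor runs these test cases: it exchanges stimuli and responses with the SUT through the adapter, and each test case execution yields a pass / fail verdict.]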
• If the label in the trace is a stimulus, the adapter transforms the stim-
ulus to a physical label and offers the physical label to the SUT.
• If the label in the trace is a response, the test executor waits until it
receives the logical response from the adapter.
If the SUT returns a response which is not expected in the test case, the
test executor concludes that the test has failed. If the test executor can
execute a complete trace of the test case against the SUT, it concludes that
the test has passed. The approach of Figure 13.4 is regarded as an off-line
method: the set of test cases is generated before the actual testing takes
place at the SUT.
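The behaviour of the test executor can be sketched as a walk over a test case tree. Everything in this fragment is an assumption made for illustration: the nested-dictionary encoding of the test case, the `FakeAdapter` class that replays scripted responses instead of talking to a real SUT, and the function name `execute`.

```python
PASS, FAIL, THETA = "pass", "fail", "θ"

# Assumed encoding of a test case: each node maps allowed labels to subtrees,
# and the leaves are verdicts.
TEST_CASE = {
    "?button": {
        "!tea": {THETA: PASS},
        "!coffee": {THETA: PASS},
    }
}


class FakeAdapter:
    """Hypothetical stand-in for a real adapter: replays scripted responses."""

    def __init__(self, scripted_responses):
        self._responses = list(scripted_responses)
        self.sent = []

    def send(self, stimulus):
        self.sent.append(stimulus)  # a real adapter would press the button here

    def receive(self):
        # An exhausted script stands for quiescence.
        return self._responses.pop(0) if self._responses else THETA


def execute(test_case, adapter):
    """Run one test case against the adapter and return the verdict."""
    node = test_case
    while node not in (PASS, FAIL):
        stimuli = [label for label in node if label.startswith("?")]
        if stimuli:
            label = stimuli[0]
            adapter.send(label)
        else:
            label = adapter.receive()
            if label not in node:
                return FAIL  # unexpected response
        node = node[label]
    return node


if __name__ == "__main__":
    print(execute(TEST_CASE, FakeAdapter(["!tea", "!tea"])))
    print(execute(TEST_CASE, FakeAdapter(["!coffee"])))
```

The first run corresponds to the failing execution ⟨?button, !tea, !tea⟩ discussed earlier, the second to the passing execution ⟨?button, !coffee, θ⟩.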
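[Figure: on-the-fly ATCG. An On-the-fly Tester takes the model and the coverage criteria as input, exchanges stimuli and responses with the SUT through the adapter, and each test case execution yields a pass / fail verdict.]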
Crucial to test automation – and thus the ATCG approach – is the adapter,
which translates logical labels of the model to physical labels of the SUT
and vice versa. For the coffee machine of our previous example,
the adapter has to translate the stimulus ?button to an actual press
on the button. Furthermore, the adapter has to observe the delivery
of !tea and !coffee (in a cup). Fortunately, even without ATCG, the
developers of a SUT need to test their system themselves. So usually the
manufacturer has provided a testing interface to the system to analyse
and diagnose the system. Such an interface can be used to connect the
adapter as well. Still, the development of an adapter can be expensive
and can be a considerable part of the ATCG effort.
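The translation task of the adapter can be pictured as a two-way lookup. In this sketch the physical labels are invented byte messages of a hypothetical testing interface; a real coffee machine would define its own protocol, and a real adapter would also handle the connection and timing.

```python
# Hypothetical physical messages of a testing interface (assumed for illustration).
LOGICAL_TO_PHYSICAL = {"?button": b"PRESS\n"}
PHYSICAL_TO_LOGICAL = {b"TEA\n": "!tea", b"COFFEE\n": "!coffee"}


def to_physical(stimulus: str) -> bytes:
    """Translate a logical stimulus of the model to a message for the SUT."""
    return LOGICAL_TO_PHYSICAL[stimulus]


def to_logical(message: bytes) -> str:
    """Translate a message observed at the SUT to a logical response of the model."""
    return PHYSICAL_TO_LOGICAL[message]


if __name__ == "__main__":
    print(to_physical("?button"))
    print(to_logical(b"TEA\n"))
```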
So although coverage criteria are input for the ATCG approach, most
bugs are found during modelling and random testing, which do
not use these coverage criteria.
13.4.1 Model
The process 'main' has no behaviour defined; the semantics of this pro-
cess in terms of an LTS is a single state with no outgoing transitions.
3 https://www.axini.com/
The generic name for a stimulus or a response is label. This name reflects
the fact that the names of the stimuli and responses are used to label
transitions in the underlying transition system.
external 'extern'
process('button-tea') {
  # declarations of labels, variables
  timeout 10.0
  stimulus 'button', on: 'extern'
  response 'tea', on: 'extern'

  # behaviour of the process
  receive 'button'
  send 'tea'
}
[Figure: LTS of 'button-tea': s0 --?button--> s1 --!tea--> s2]
Labels like 'button' and 'tea' have to be declared before they can
be used in the behavioural part of the process. The semantics of this
model coincides with the LTS of Figure 13.1 (a). As can be seen in
this example, a label has to be assigned to a specific channel.
Also note the timeout declaration at the start of the process. The test
tool needs to know how long it has to wait for responses to arrive. This
timeout 10.0 declaration specifies that the default waiting time for re-
sponses is 10.0 seconds in process 'button-tea'. If – when waiting
for the response 'tea' – the response 'tea' does not arrive in 10.0
seconds, the test tool will exit with the verdict fail: quiescence is ob-
served instead of 'tea'.
choice {
  o { <alternative_1> }
  o { <alternative_2> }
  ...
  o { <alternative_n> }
}
Below is an example for the LTS we have seen before in Figure 13.1
(repeated below for convenience). The process 'tea-or-coffee' first
waits for 'button' to be pressed. Then it non-deterministically sends a
'tea' or a 'coffee'. The behaviour of the process 'tea-or-coffee'
is captured by the LTS.
external 'extern'
process('tea-or-coffee') {
  # declarations of labels, variables
  timeout 10.0

  channel('extern') {
    stimulus 'button'
    responses 'tea', 'coffee'
  }

  # behaviour of the process
  receive 'button'
  choice {
    o { send 'tea' }
    o { send 'coffee' }
  }
}
[Figure: LTS of 'tea-or-coffee': s0 --?button--> s1, followed by either !coffee or !tea from s1 to the end states s2 and s3]
mixing of stimuli and responses in a choice, the tester will always give
precedence to the responses: it will wait to observe the responses, and
thus ignore the stimuli. The reason for this is subtle: if we were to
choose the stimulus and later observe the response, we do not know
whether the response is caused by the stimulus or whether the response
is just late.
The syntax and semantics of the repeat construct are similar to those
of the choice construct. The difference with the choice is that after an
alternative has been executed, the repeat construct is executed again.
The stop_repetition statement can be used to break out of the loop.
external 'extern'
process('tea-or-kick') {
  timeout 10.0
  channel('extern') {
    stimuli 'button', 'kick'
    response 'tea'
  }
  repeat {
    o { receive 'kick' ; stop_repetition }
    o { receive 'button'; send 'tea' }
  }
}
[Figure: LTS of 'tea-or-kick': s0 --?button--> s1, s1 --!tea--> s0, s0 --?kick--> s2]
Again, apart from the repeat construct itself, we introduced some short-
hand notation. The actions in the alternatives of the repeat are written
on a single line. With this syntax the actions have to be separated by a
semi-colon (;).
After the stimulus 'kick', there is no need to explicitly jump to the end
of the process: after the choice, the process has ended anyway.
For the larger example of the ATM, or cash dispenser, from Figure 13.2,
and given the AML constructs that we have seen so far, we can now
make the following AML model:
external 'extern'
process('cash-dispenser') {
  timeout 10.0
  channel('extern') {
    stimuli 'Card', 'Pincode', 'Amount'
    responses 'AskPincode', 'AskAmount', 'Wrong', 'Money',
              'NotEnough', 'Card', 'KeepCard'
  }

  state 'idle'
  receive 'Card'
  send 'AskPincode'
  receive 'Pincode'

  choice {
    o { send 'AskAmount'; goto 'give money' }
    o {
      send 'Wrong'
      receive 'Pincode'
      choice {
        o { send 'AskAmount'; goto 'give money' }
        o { send 'Wrong'; send 'KeepCard'; goto 'idle' }
      }
    }
  }

  state 'give money'
  receive 'Amount'
  choice {
    o { send 'Money' }
    o { send 'NotEnough' }
  }
  send 'Card'; goto 'idle'
}
EXERCISE 13.5
Consider the Stack machine of Exercise 13.1 again. Write an AML model
for this Stack machine which uses named states and goto statements.
So far, we have only seen models where the observable labels were ab-
stract names. In practice, however, stimuli and responses usually carry
data. AML supports data through parameters on labels. The names
of parameters are strings and the types can be simple types (:integer,
:string, :boolean, :decimal) or structured types (lists, structs, hashes).
In this text, we will only use simple types. The names and types of label
parameters are specified in a list enclosed by curly brackets and have to
be specified with the definition of the label.
For example, consider the following AML model where we have to in-
sert coins to get tea.
 1 external 'extern'
 2 process('coin-tea-parameters') {
 3   timeout 10.0
 4   channel('extern') {
 5     stimulus 'coin', {'value' => :integer}
 6     response 'tea', {'volume' => :integer}
 7   }
 8
 9   repeat {
10     o {
11       receive 'coin', constraint: 'value == 50'
12       send 'tea', constraint: 'volume == 200'
13     }
14     o {
15       receive 'coin', constraint: 'value == 100'
16       send 'tea', constraint: 'volume == 300'
17     }
18   }
19 }
test case will check the value of the parameters as offered by the SUT.
In the example above, we check that the 'volume' is 200 (line 12) or
300 (line 16). Besides equality, AML also supports relational and logical
operators to constrain the parameters, e.g.,
receive 'coin', constraint: 'value >= 50 && value < 100'
Here we specify that the 'value' of the 'coin' has to be at least 50,
but smaller than 100.
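To make the role of such constraints concrete, the following Python sketch treats a constraint as a predicate over the parameters of a label. This is only an analogy and not how AML is implemented; the names `coin_in_range`, `small_tea` and `check` are assumptions of this example.

```python
from typing import Callable, Dict

Params = Dict[str, int]
Constraint = Callable[[Params], bool]


def coin_in_range(params: Params) -> bool:
    """Counterpart of the constraint 'value >= 50 && value < 100'."""
    return params["value"] >= 50 and params["value"] < 100


def small_tea(params: Params) -> bool:
    """Counterpart of the constraint 'volume == 200'."""
    return params["volume"] == 200


def check(params: Params, constraint: Constraint) -> str:
    """Verdict for one observed label: pass if its parameters satisfy the constraint."""
    return "pass" if constraint(params) else "fail"


if __name__ == "__main__":
    print(check({"value": 50}, coin_in_range))
    print(check({"value": 100}, coin_in_range))
    print(check({"volume": 200}, small_tea))
```

The values 50 and 100 sit exactly on the boundaries of the constraint, which makes them natural candidates for the boundary-based data coverage mentioned in the section on coverage.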
Figure 13.7 shows the LTS for the process 'coin-tea-total'. Figure 13.8
shows a sample (passing) trace of the process 'coin-tea-total' by
running AMS’s EXPLORER on the model: three coins have been inserted
to deliver one small and two medium cups of tea.
EXERCISE 13.6
Consider the Stack machine of Exercise 13.1 again. In Exercise 13.5 you
developed an AML model which uses named states and explicit goto
statements to model the Stack machine. Rewrite your AML model such
that it no longer uses named states and goto statements but instead
uses a repeat statement and a state variable to count the number of
values on the Stack.
Note that the underlying LTSs of the two AML models will differ, and
hence so will their numbers of states and transitions. This
also means that, given a SUT and the same test execution, the state and
transition coverage for the two AML models could be different.
Chapter contents 14
Overview
1 Introduction
2 Planning: the Master Test Plan (MTP)
2.1 The risk analysis: why do we test?
2.2 Test strategy: what and how will we test?
2.3 Organisation: who will test and where?
2.4 Time schedule and budget
3 Monitoring: progress through defect tracking
4 Reporting: about the bugs
5 Controlling and adjusting: Test Process Improvement (TPI)
Chapter 14
OVERVIEW
There is much more to software testing than designing test cases. Test-
ing software is an engineering activity, and like any engineering project,
it should be managed using well-established test project management
processes. Test project management, or just test management, is all
about these processes.
LEARNING GOALS
After studying this chapter, you are expected to:
– know that test management consists of many different activities
that are needed for the planning, monitoring, controlling, report-
ing and adjusting of test activities
– know the high-level contents of a test plan.
– understand how simple defect data can be used to monitor the
progress of testing
– explain why the bug reporting process is critical
– be able to list some properties of good bug reports
– name some of the most famous Test Process Improvement (TPI)
models.
This chapter contains reading assignments that involve reading the fol-
lowing papers, the links to which you can find on the course site:
[16] R. Black.
Charting the progress of system development using defect data.
12th International Software Quality Week, 24-28 May 1999.
[17] R. Black.
The bug reporting processes.
In Journal of Software Testing Professionals, 2000.
CONTENTS
14.1 Introduction
We will not treat each one of those here in detail; that is beyond the
scope of this course. In this chapter, we give an overview of the tasks
and processes that make up the planning, monitoring, controlling,
reporting and adjusting. As a vehicle, we have made the mind map in
Figure 14.1.
For two specific subjects (monitoring progress through defect tracking and
reporting defects and bugs) we have included two reading assignments of
the aforementioned articles by Rex Black. Although these were written
at the turn of the century, they describe insights and good practices that
are still needed and used widely today.
The Master Test Plan (MTP) is a document that describes in detail how
the testing is planned and how it will be managed across the entire
test project. Although test plans in agile contexts contain less detail,
it is still important to have an overview of the why, what, how, who,
where, when and how much.
Many templates can be found on the internet. The most well-known are
offered by:
1 https://www.softwaretestingstandard.org/
2 TMAP® is a registered trademark of Sogeti.
3 TMAP® Next is a registered trademark of Sogeti.
Chapter 14 Test management and test process improvement
The why section is probably the most important of the MTP. It describes
the reasons why we are planning all these test activities the way we are.
Basically, there is no reason to test, and no need to make an MTP, if
there is no risk [119, 67, 104]:
no risk ⇒ no test
4 https://www.softwaretestingstandard.org/
5 https://www.istqb.org/
6 For more examples, [96] contains an extensive checklist for project risks.
FIGURE 14.2 Example of MoSCoW priority chart for testing from [96]
During a risk analysis we need to identify the risks and for each risk
consider two aspects:
• likelihood: the probability that the circumstances or events occur;
• impact: the relative importance of the consequences.
Several of the above-mentioned methodologies [96, 67] apply the MoSCoW
method for testing. MoSCoW was developed by Dai Clegg of Ora-
cle UK in 1994 to explain prioritisation. The term MoSCoW itself is
an acronym derived from the first letter of each of four prioritisation
categories (Must, Should, Could, and Won’t). For testing it has been
adapted in [96] to the example given in Figure 14.2. This model can be
adapted to the different types of software and clients.
Again, numerous books discuss how to do risk analysis for testing specif-
ically [37, 18, 67, 104] and project management in general. Moreover,
tools exist to help. Basically, the process comes down to doing brain-
storming sessions with as many different types of stakeholders involved
as possible. Books about brainstorming can also fill up an entire book-
case.
The test strategy describes what we need to test and how. Evidently,
our goal is to test the system that is being developed in some
software development project. But what are the properties and
characteristics of this system, both functional and non-functional
(usability, accessibility, security, performance, et cetera)? At what levels
do we need to test (unit, integration, system, acceptance) and do we
need to test the same for all subsystems? Do we only need to test the
software? Or also other accompanying artefacts (user documentation,
installation guides, et cetera)? And what about the hardware?
The test organisation is about the people, their tasks, their skills, experi-
ence, education, knowledge, et cetera. Do they have the right technical
as well as business knowledge? Do we need additional training?
Again, this is not different from any other project. To learn more about
scheduling and budgeting, just pick up any one of your favourite project
management books.
The article [16] shows how simple means can provide insight into the
testing and development process. These “simple means” are charts that
can be drawn from basic logging information gathered during the testing
process. The first two charts that are described are not immediately
obvious, while the last two are simple but useful. The author also
describes conclusions that may be drawn from these charts, with the
caveat that you always have to be careful: sometimes things are not
what they seem. A chart that looks good does not guarantee that all is
well. However, a chart that looks strange is definitely a reason to take a
closer look.
Reading assignment: Read the article [16]. You may skip the para-
graphs that describe how to get it to work in Excel.
The second article, [17], uses the same case study as the first, and actu-
ally refers to the first article at some point (page 12, line -18; at least, we
think that [16] fits the description of that “last article” fairly well).
If you have no experience with large test projects, it may be just too
much information when reading this article for the first time. In that
case, we suggest leaving it for a couple of days (or weeks) and then
reading it again.
The article [17] contains some typos. We mention one that may cause
confusion:
page 3, line -17 should read ‘... e.g., I could disambiguate this article
by using the alternative phrase “be clear” instead of disambiguate.’ By
“this article” the author means the current item in the numbered list.
You may also have noticed that the tester in the case study is doing
exploratory testing on the fly, much like James Bach does in Chapter 4.
For testing, there are also improvement models that make it easier to
Plan, Do, Check and Act on specific test processes. We mention the two
best-known ones. There is TMMi (Test Maturity Model integration), which is
described in [28] and online [43]. And there is TPI⁷ (Test Process
Improvement) [66], which has more recently been upgraded to TPI
Next⁸ [120].
7 TPI® is a registered trademark of Sogeti.
8 TPI® Next is a registered trademark of Sogeti.
TMMi is a staged model for test process improvement (see Figure 14.3).
That means it contains levels of maturity through which an organisation
passes while its testing processes evolve from ad hoc and unmanaged,
to managed, defined, measured, and in optimization mode. The model
describes the different maturity levels and how you can advance from
one to another.
Bibliography
[15] R. Black, L. Van Der Aalst, and J.L. Rommens. The Expert Test
Manager: Guide to the ISTQB Expert Level Certification. Rocky
Nook, 2017.
[17] Rex Black. The bug reporting processes. Journal of Software Testing
Professionals, 2000.
[21] Boris Beizer and Otto Vinter. Bug taxonomy and statistics. Technical
report? Software Engineering Mentor, p. 2630,
http://ottovinter.dk/Finalrp3.doc, 2001. [Online; accessed
27-2-2018].
[30] T. Buzan and B. Buzan. The Mind Map Book. Mind set. BBC Active,
2006.
[31] Cem Kaner, Jack Falk, and Hung Q. Nguyen. Testing Computer
Software. Wiley, 1999.
[32] T. S. Chow. Testing software design modeled by finite-state ma-
chines. IEEE Transactions on Software Engineering, SE-4(3):178–187,
May 1978.
[33] M.B. Chrissis, M. Konrad, and S. Shrum. CMMI for Development:
Guidelines for Process Integration and Product Improvement. SEI Se-
ries in Software Engineering. Pearson Education, 2011.
[34] Ross Collard. Appendix 1: Analyzing the triangle problem.
http://www.testingeducation.org/conference/wtst3_collard5.pdf,
2004. [Online; accessed 4-8-2018].
[35] Ross Collard. Exercise: Analyzing the triangle problem.
http://www.testingeducation.org/conference/wtst3_collard4.pdf,
2004. [Online; accessed 4-8-2018].
[36] Lee Copeland. A Practitioner’s Guide to Software Test Design. Soft-
ware Testing. Artech House, 2004.
[37] Rick D. Craig and Stefan P. Jaskiel. Systematic Software
Testing. Artech House, Inc., Norwood, MA, USA, 2002. Available
online: https://flylib.com/books/en/2.174.1/.
[38] Daily Mail UK. Up to 300,000 heart patients may have been
given wrong drugs or advice due to major NHS IT blunder.
http://www.dailymail.co.uk/health/article-3585149/Up-300-000-heart-patients-given-wrong-drugs-advice-major-NHS-blunder.html,
2016. [Online; accessed 4-2-2017].
[39] G. de Vries and E. Roodenrijs. Template master test plan.
http://www.tmap.net/sites/default/files/Template_Master_Test_Plan_TMap_NEXT_v2_1%20%281%29.doc,
2019. [Online; accessed 03-06-2019].
[40] René G. de Vries and Jan Tretmans. On-the-fly conformance test-
ing using SPIN. STTT, 2(4):382–393, 2000.
[41] Edsger W. Dijkstra. Letters to the editor: go to statement consid-
ered harmful. Communications of the ACM, 11(3):147–148, 1968.
[42] A. G. Duncan and J. S. Hutchison. Using attributed grammars to
test designs and implementations. In Proceedings of the 5th Interna-
tional Conference on Software Engineering, ICSE ’81, pages 170–178,
Piscataway, NJ, USA, 1981. IEEE Press.
[43] TMMi Foundation. TMMi. https://www.tmmi.org/, 2019.
[Online; accessed 03-06-2019].
[44] Lars Frantzen, Jan Tretmans, and Tim A. C. Willemse. Test gen-
eration based on symbolic specifications. In Jens Grabowski and
Brian Nielsen, editors, Formal Approaches to Software Testing, 4th
International Workshop, FATES 2004, Linz, Austria, September 21,
2004, Revised Selected Papers, volume 3395 of Lecture Notes in Com-
puter Science, pages 1–15. Springer, 2005.
[45] R. S. Freedman. Testability of software components. IEEE Trans-
actions on Software Engineering, 17(6):553–564, June 1991.
[46] De Gelderlander. Foutje: 104-jarige Zweedse mag naar kleuterschool.
http://www.gelderlander.nl/bizar/foutje-104-jarige-zweedse-mag-naar-kleuterschool~ab175865/,
2016. [Online; accessed 4-2-2017].
[47] Gerald M. Weinberg. Perfect Software and other illusions about testing.
Dorset House, 2008.
[48] Gerald M. Weinberg. Errors Bugs, Boo-boos, Blunders. Leanpub,
2015.
[49] John B. Goodenough and Susan L. Gerhart. Toward a theory of
test data selection. SIGPLAN Not., 10(6):493–510, April 1975.
[50] John B. Goodenough and Susan L. Gerhart. Toward a theory of
test data selection. In Proceedings of the International Conference on
Reliable Software, pages 493–510, New York, NY, USA, 1975. ACM.
[51] Mats Grindal, Jeff Offutt, and Sten F. Andler. Combination testing
strategies: a survey. Software Testing, Verification and Reliability,
15(3):167–199, 2005.
[52] Matthias Grochtmann and Klaus Grimm. Classification trees
for partition testing. Software Testing, Verification and Reliability,
3(2):63–82.
[53] William E. Howden. Theoretical and empirical studies of pro-
gram testing. In Proceedings of the 3rd International Conference on
Software Engineering, ICSE ’78, pages 305–311, Piscataway, NJ,
USA, 1978. IEEE Press.
[54] Michael Hunter. You are not done yet-checklist.
https://www.thebraidytester.com/downloads/YouAreNotDoneYet.pdf,
2010. [Online; accessed 03-06-2019].
[55] Bingchiang Jeng and Elaine J. Weyuker. A simplified domain-
testing strategy. ACM Trans. Softw. Eng. Methodol., 3(3):254–270,
July 1994.
[56] Jean-Marc Jézéquel and Bertrand Meyer. Design by contract: The
lessons of ariane. Computer, 30(1):129–130, January 1997.
[57] Paul C. Jorgensen. Software Testing: A Craftsman’s Approach. CRC
Press, Inc., Boca Raton, FL, USA, 4th edition, 2014.
[58] René Just. The major mutation framework: Efficient and scal-
able mutation analysis for java. In Proceedings of the 2014 Inter-
national Symposium on Software Testing and Analysis, ISSTA 2014,
pages 433–436, New York, NY, USA, 2014. ACM.
[59] René Just, Darioush Jalali, Laura Inozemtseva, Michael D Ernst,
Reid Holmes, and Gordon Fraser. Are mutants a valid substitute
for real faults in software testing? In Proceedings of the 22nd ACM
SIGSOFT International Symposium on Foundations of Software Engi-
neering, pages 654–665. ACM, 2014.
[61] Cem Kaner, James Bach, and Bret Pettichord. Lessons Learned in
Software Testing. John Wiley & Sons, Inc., New York, NY, USA,
2001.
[65] E. Kit and S. Finzi. Software Testing in the Real World: Improving the
Process. ACM Press books. Addison-Wesley, 1995.
[67] Tim Koomen, Leo van der Aalst, Bart Broekman, and Michiel
Vroon. Tmap Next - voor resultaatgericht testen. Tutein Nolthenius,
Den Bosch, The Netherlands, 1st edition, 2006.
[71] D.R. Kuhn, D.R. Wallace, and A.M. Gallo. Software fault interac-
tions and implications for software testing. IEEE Transactions on
Software Engineering, 30(6):418–421, jun 2004.
[72] Yu-Seung Ma, Jeff Offutt, and Yong-Rae Kwon. Mujava: A muta-
tion system for java. In Proceedings of the 28th International Confer-
ence on Software Engineering, ICSE ’06, pages 827–830, New York,
NY, USA, 2006. ACM.
[74] Brian Marick. The Craft of Software Testing: Subsystem Testing In-
cluding Object-based and Object-oriented Testing. Prentice-Hall, Inc.,
Upper Saddle River, NJ, USA, 1995.
[75] R. Marselis, J. van Rooyen, C. Schotanus, and I. Pinkster. TestGrip:
Gaining Control on IT Quality and Processes Through Test Policy and
Test Organisation. LogicaCMG, 2007.
[83] Glenford J. Myers. The Art of Software Testing. John Wiley & Sons,
Inc., New York, NY, USA, 1st edition, 1979.
[84] Glenford J. Myers, Corey Sandler, and Tom Badgett. The Art of
Software Testing. John Wiley & Sons, Inc., New York, NY, USA,
3rd edition, 2011.
[105] Roger G Schroeder, Kevin Linderman, Srilata Zaheer, and
Adrian S Choo. Six sigma: a goal-theoretic perspective. Qual-
ity control and applied statistics, 49(1):49–50, 2004.
[107] Richard W Selby, Victor R Basili, Jerry Page, and Frank E Mc-
Garry. Evaluating software testing strategies. In Proc. Ninth Annu.
Software Eng. Workshop, 1984.
[116] Jan Tretmans. Model based testing with labelled transition sys-
tems. In Robert M. Hierons, Jonathan P. Bowen, and Mark
Harman, editors, Formal Methods and Testing, An Outcome of the
FORTEST Network, Revised Selected Papers, volume 4949 of Lecture
Notes in Computer Science, pages 1–38. Springer, 2008.
[118] Roland H Untch, A Jefferson Offutt, and Mary Jean Harrold. Mu-
tation analysis using mutant schemata. In ACM SIGSOFT Soft-
ware Engineering Notes, volume 18, pages 139–148. ACM, 1993.