
Universidad Politécnica de Valencia

Structured Software Testing

Textbook for the course
Structured Software Testing (PSE)
Máster Universitario en Ingeniería y Tecnología de Sistemas Software

2021-2022

Authors
T. E. J. Vos
N. van Vugt-Hage

External contributors
J. M. Bach (Chapters 3 and 4)
prof. dr. G. Fraser (Chapter 10)
A. Gambi (Chapter 10)
prof. dr. E. Miranda (Chapter 11)
dr. T. Ruys (Chapter 13)

Contents

1 Introduction to software testing 14


Overview 15

1 Why do we test? 15
2 Software quality 17
3 Errors, faults and failures 19
4 Vocabulary and terminology in software testing 23
5 Some famous cases of software failure 24
6 What do we test? We cannot test everything! 26
7 What do we test? Test case design techniques 28
8 What is the best set of test cases? 30
9 How can we evaluate the quality of a test suite? 31
9.1 Coverage criteria 31
9.2 Mutation scores 33
10 Levels of testing 33
10.1 Unit testing 34
10.2 Integration testing 34
10.3 System testing 37
10.4 Acceptance testing 39
11 And after we change something: regression testing 40
12 Test project management 41
13 The philosophy of this course 41

2 Testing is model-based 46
Overview 47

1 Different kinds of models 47


2 Model – coverage criterion – test cases 50
3 Black-box or white-box? Model-based! 50
4 Exploring to get useful models 51

3 The exploratory nature of software testing 52


Overview 53

1 Introduction 53
2 Testing is not A science; testing IS science. 54
3 Formality vs. informality 55
4 Agency vs. algorithm 56
5 Deliberation vs. spontaneity 56
6 Tacit vs. explicit 57
7 How exploration typically works in software testing 58
8 Testing is not proving; verification is not testing 59

4 Let us do some testing 62


Overview 63


1 Introduction 63
2 Recording of an exploratory testing process 66
2.1 What is the nature of this testing? 70
2.2 How did I choose that test string? 71
2.3 How did I choose to use a loop? 71
2.4 How did I know what the Reflow function is supposed to do? 71
2.5 What should this kind of testing be called? 71
2.6 What about the oracle? 72
2.7 What is the next step? 72
2.8 How about a de Bruijn sequence? 74
2.9 Progressively smaller words 74
2.10 Exactly sized lines 76
2.11 Try the interesting characters 77
2.12 Consecutive special characters 77
2.13 Have we tested enough? 77
2.14 Final bug list 78

5 Taxonomies, catalogs and checklists 80


Overview 81

1 Introduction 81
2 Defect taxonomies 83
2.1 Boris Beizer’s taxonomy 83
2.2 Kaner, Falk, and Nguyen’s taxonomy 84
2.3 Whittaker’s ”How to Break Software” taxonomy 84
3 Catalogs and checklists 85
3.1 Catalogs 85
3.2 Checklists 87
4 Your taxonomy, catalog or checklist 90

6 Input domain modelling with equivalence classes 92


Overview 93

1 Equivalence classes 93
2 Partitioning the inputs 94
3 History and related work 97
4 Making a model 97
4.1 Example: testing at GUI level 98
4.2 Example: testing at code level 102
4.3 Example: testing at class interface level 104
5 Exercises 106
6 Coverage criteria 107
6.1 All Combinations coverage 108
6.2 Each Choice coverage 109
6.3 Other ways of combining 110
6.4 Invalid Part coverage 111


7 Exercises 112
8 Faults that can be found 113

7 Input domain boundaries 114


Overview 115

1 Introduction 115
2 Boundaries: ON and OFF points 116
3 Finding ON and OFF points for different types of values 117
3.1 Numerical types 117
3.2 Non-numerical types 118
3.3 User-defined types 118
3.4 n-dimensional types 120
4 1x1 boundary coverage 122
5 The domain test matrix 122
5.1 Domain test matrices for generate-grade 123
5.2 Exercises 128
6 Faults that can be found 130

8 Decision tables 134


Overview 135

1 Introduction 135
2 Decision tables 136
2.1 Extended/limited entry decision tables 138
2.2 Implicit variant: do not care (DNC) 140
2.3 Implicit variant: cannot happen (CNH) 140
2.4 Implicit variant: do not know (DNK) 141
2.5 Summary 142
3 Checking the decision table 144
3.1 Check implicit variants 144
3.2 Check decision table properties 144
3.3 Check testability 145
4 Coverage criteria for decision tables 145
5 Exercises 145

9 Combinatorial testing 148


Overview 149

1 Introduction 149
2 Faults due to interactions of conditions 150
3 Combinatorics 151
4 Orthogonal and covering arrays 153
4.1 Orthogonal arrays 153
4.2 Covering arrays 156
5 Challenges for practical application of combinatorial testing 157


6 The oracle problem 157


7 Configurations and other test relevant aspects 158

10 Mutation testing 160


Overview 161

1 Introduction 161
2 How tests find faults 162
3 Fault-based testing 164
4 Mutation testing 165
4.1 Central hypotheses 168
4.2 Equivalent mutants 169
5 Scalability 171
5.1 Skipping unreachable mutants 171
5.2 Mutant schemata 171
5.3 Mutant sampling 173
6 Mutation testing in practice 173
6.1 Mutation testing at Google 174
6.2 Mutation testing for fun 174
7 Summarising the mutation testing method 175

11 Classification trees 176


Overview 177

1 Introduction 178
2 Classification trees 178
3 Modelling with a classification tree 179
4 Test relevant aspects 181
5 An example: The Find command 182
6 Combinatorial coverage criteria 184
7 Designing the test cases 185
8 Summarising the classification tree method 185
9 Tool support 187
10 More examples 187
10.1 The flexible manufacturing system 187
10.2 The audio amplifier 193
10.3 The password diagnoser 195
11 Exercise 201

12 Graph models 204


Overview 205

1 Introduction 205
2 Graphs 206
3 Paths in graphs 207


4 General graph-based coverage criteria 209


4.1 Vertex coverage 210
4.2 Edge coverage 210
4.3 Path coverage 211
5 Procedures for making test suites in the case of cycles 211
5.1 Transition trees 211
5.2 Prime paths 216
6 Graph coverage for source code 217
6.1 Control flow graph from source code 218
6.2 Statement and branch coverage 220
6.3 Condition coverage 221
6.4 Multiple condition coverage 221
6.5 Other code coverage criteria 222
6.6 McCabe’s cyclomatic complexity 222
7 Graph coverage for state machines 223
8 Flowcharts 226

13 State-transition models 228


Overview 229

1 Introduction 229
2 Labelled transition systems 231
2.1 Conformance 234
2.2 Coverage 236
3 Test case generation from LTS 237
4 Axini Modelling Language 240
4.1 Model 240
4.2 Communication: labels 241
4.3 Non-deterministic choice 241
4.4 Loop: repeat 243
4.5 States and goto 243
4.6 Data: parameters 245
4.7 State variables 246
4.8 Advanced features 248

14 Test management and test process improvement 250


Overview 251

1 Introduction 252
2 Planning: the Master Test Plan (MTP) 252
2.1 The risk analysis: why do we test? 254
2.2 Test strategy: what and how will we test? 256
2.3 Organisation: who will test and where? 256
2.4 Time schedule and budget 257
3 Monitoring: progress through defect tracking 257
4 Reporting: about the bugs 257


5 Controlling and adjusting: Test Process Improvement (TPI) 258


Chapter 1

Introduction to software testing

OVERVIEW

This chapter, as the title implies, introduces the why, what and how of
software testing. It introduces some of the terminology used in software
testing, and presents some famous software failures due to insufficient
testing. It discusses different phases of testing and types of testing.
Moreover, it describes how testing needs to be done at different levels
and how all test processes need to be managed.

The chapter will also discuss the philosophy of this course and how we
consider test case design as being both model-based and exploratory.

Some might find this chapter long and somewhat boring. We understand.
However, this chapter is necessary to ensure that we all use the same
terminology and are able to communicate with the rest of the field of
software testing. Moreover, it provides an overview of software testing.

LEARNING GOALS
After studying this chapter, you are expected to:
– understand why testing is important and what the purpose of testing is
– understand the concepts and terminology of software testing as used
in this course
– have an overview of various aspects that comprise the field of software
testing: test case design, test levels, test quality, test management.

CONTENTS

1.1 Why do we test?

We need to test because we are fallible: we make mistakes. And little
mistakes can have major consequences. In one of his books, Gerald
Weinberg [48] demonstrates this with an example of coding an
instruction in natural language. Somebody needed to send a telegram
that said:

NO, PRICE IS TOO HIGH.

However, due to a mistake, the comma was forgotten, and so the
telegram read:

NO PRICE IS TOO HIGH.


One forgotten comma cost the sender of the telegram lots of money!

When engineering software, we are even more fallible. Moreover, the
impact of software problems is growing in our society, which increasingly
depends on software for everyday things (communication, banking,
transport, security, etc.).

Boris Beizer, in one of his seminal books on software testing [11],
discusses the purpose of testing in terms of five phases of maturity that
a test process can have. This maturity is characterised by the goals and
thoughts of the testers:

PHASE 0 - Thinking that testing and debugging are the same.

Testers do not draw a distinction between testing and debugging. This
type of thinking was true in the early 1970s, when testing emerged as
a discipline. It is also what students do in many programming classes:
they get their programs to compile, then debug the program with a few
inputs that are either chosen arbitrarily or provided by the teacher.

PHASE 1 - Thinking that the purpose of testing is to show that the
software works.

Compared to Phase 0, this way of thinking is more mature. However,
it is self-corrupting, as was shown by Glenford Myers in his seminal
work on software testing [83]1. Myers discusses the psychology of
testing. He describes how human beings tend to be highly goal-oriented,
and establishing the proper goal has an important psychological effect.
If our goal is to demonstrate that software does not fail, then we will
subconsciously be steered towards this goal; that is, we tend to select
test data that have a low probability of causing the software to fail. On
the other hand, if our goal is to demonstrate that software can fail, our
test data will have a higher probability of finding failures. The latter
approach will add more value to the software than the former.

PHASE 2 - Thinking that the purpose of testing is to show that the
software does not work.

Although looking for failures will make us better testers, these negative
goals also have side-effects. One important side-effect is team morale:
testers want to find the mistakes that programmers are trying hard not
to make. Consequently, Phase 2 testing goals put testers and developers
in an adversarial relationship, and in practice this does not create an
ideal working atmosphere. Another problem in this phase is knowing
what to do when no failures are found. Is the software very good? Or
is the testing very bad? When do we know that we can stop testing?
Many share the view that Phase 2 is where most of the software industry
currently is [4].

PHASE 3 - Thinking that the purpose of testing is to reduce risk.

This phase's thinking is nothing more than accepting the limits of
testing. Edsger Dijkstra, the famous Dutch computer pioneer, already
said it in 1969:

1 The 3rd and latest edition is from 2011 [84].


Program testing can be used to show the presence of bugs, but
never to show their absence!
— Dijkstra (1969), "Notes On Structured Programming" (EWD249),
Section 3 ("On The Reliability of Mechanisms"), corollary at the end2

If we accept this limitation of testing, we accept that there is always a
risk of remaining failures. The goal of our testing is therefore to reduce
that risk. The more we test, the more confidence we have in our product.
We will risk releasing the software to our clients when that confidence
is high enough. More and more companies are moving towards this
phase [4] and call this risk-based testing.

PHASE 4 - Thinking that testing is a state of mind, the purpose of which
is to develop higher quality software.

In this phase, we know what we can and cannot do with testing.
Testability of software becomes the new goal, meaning that software will
be constructed in such a way that it makes testing easier. First of all,
this reduces the effort of testing; second, and more importantly, testable
code has fewer bugs than code that is hard to test.

EXERCISE 1.1
Describe in your own words what the difference is between testing and
debugging.

EXERCISE 1.2
Describe in your own words what ”higher quality software” (as stated
in PHASE 4) would mean.

1.2 Software quality

Beizer's last phase indicates that the purpose of testing is to develop
higher quality software. But what is quality? And what is quality of
software? Finding a good definition for quality is not that easy. This
is because quality is a subjective characteristic that depends on who is
looking (people) and what they are looking at (context).

Imagine a piece of software with a very unattractive user interface that
crashes when you try to minimize it and gives strange error messages
when you click some OK buttons. However, to you, the software offers
some functionalities that are really helpful for the tasks you have to do
at your work and increase your productivity a lot. What is the quality
of this piece of software for you? What about other people?

It depends . . .

The American Society for Quality, in its glossary [88], tries to embrace
the subjectivity of quality and proposes the following definition:
2 http://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF


FIGURE 1.1 ISO/IEC 25010 quality model [3].


DEFINITION 1.1 Quality is a subjective term for which each person or sector has its own
definition. In technical usage, quality can have two meanings:

1) the characteristics of a product or service that bear on its ability to
satisfy stated or implied needs;

2) a product or service free of deficiencies.

According to Joseph Juranᵃ, quality means "fitness for use"; according
to Philip Crosbyᵇ, it means "conformance to requirements."

ᵃ Joseph Moses Juran (December 24, 1904 - February 28, 2008) was a famous author of
quality and quality management books.

ᵇ Philip Bayard "Phil" Crosby (June 18, 1926 - August 18, 2001) was a well-known writer
who contributed to quality management practices.

There exist many quality models that define all sorts of quality
characteristics for software products that can be used to define these two
meanings. Famous computer scientists like Barry W. Boehm [19] already
started defining characteristics of software quality in 1978. The most
recent and detailed definition of a quality model for software products
can be found in the series of standards ISO/IEC 25000, also known as
SQuaRE (System and Software Quality Requirements and Evaluation)
[3]. Specifically, part 25010 describes the model, consisting of
characteristics and subcharacteristics, for software product quality (cf.
item 1 of Definition 1.1: satisfy needs) and software quality in use (cf.
item 2 of Definition 1.1: free of deficiencies). Figure 1.1 depicts the
quality characteristics from this model.

In this course we will mainly concentrate on testing functional suitability
(or functionality) and freedom from risk, in the sense of freedom from
failures. In Section 1.10.3 we will briefly look at some of the other
quality characteristics and how to test them.

The field of software quality does not only discuss the quality of
products; it also looks at processes, for example the software development
processes of requirements, design, implementation, testing, maintenance,
et cetera. More about this can be found in Section 1.12 and Chapter 14.

1.3 Errors, faults and failures

In testing, there is no consensus on the terminology for the concepts of
error, fault, failure, incident and bug. For this course, we will stick
to the terminology used by ISTQB, the International Software Testing
Qualification Board3, as described in [111]. But do not assume that
everybody uses this terminology. Therefore, when you discuss a problem
with someone, check that you have the same understanding of the
vocabulary being used.

DEFINITION 1.2 Error A human action that produces an incorrect result, for example a
mistake, a misunderstanding, a misconception, et cetera.

3 http://www.istqb.org/


DEFINITION 1.3 Fault or defect A flaw in a component or system (e.g. an incorrect state-
ment or data definition) that can cause the component or system to enter
an incorrect state (e.g. variable gets assigned the wrong value). A fault,
if encountered during execution, may cause a failure of the component
or system but it can also go unnoticed.

DEFINITION 1.4 Failure A deviation of the component or system from its expected deliv-
ery, service or result.

To gain a better understanding of the relation between the three
concepts, we will adapt an analogy from [4].

Consider a medical doctor diagnosing a patient. The patient presents
the doctor with a list of symptoms (failures). The doctor must then
discover the error, that is, the root cause of the symptoms. To help make
the diagnosis, the doctor might want to do some tests that look for
anomalous internal conditions such as high blood pressure, irregular
heartbeat, high cholesterol, et cetera. These conditions would correspond
to the faults.

For examples of errors, faults and failures in software, let us look at a
code example, also borrowed from [4].
/**
* Counts the amount of zeros in an array
* @param x: the array to count zeros in
* @return: the number of occurrences of 0 in array x
* @throws: NullPointerException if array x is null
*/
public static int numZero (int[] x) {
int count = 0;
for (int i=1; i < x.length; i++) {
if (x[i] == 0) count++;
}
return count;
}

Error
The programmer has made a mistake writing this code. Maybe the
programmer made a typo: typing 1 instead of 0. Maybe the programmer
did not know that in Java the first element of an array resides at index
0. Maybe the programmer re-used some code and forgot to adjust the
index.

Fault
As a result of this error, the first element in the array is never checked
for zero, and so it is not counted if it happens to be zero.

Failure
The fault only propagates to a failure that is visible to the user when
numZero is called with an array that has a zero in the first element:

input [0, 4, 6, 8]
expected result 1
actual result 0
verdict FAILURE


If there is no zero in the first element, the fault will be executed but does
not result in a failure.
input [1, 4, 0, 8]
expected result 1
actual result 1
verdict PASS
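Both verdicts above can be reproduced with a small driver. The sketch below copies the faulty numZero from the listing and exercises it with the two inputs; the class name and driver are ours, not part of the original program.

```java
public class NumZeroDemo {
    // Faulty version copied from the listing above: the loop starts at
    // index 1, so the first element of the array is never inspected.
    public static int numZero(int[] x) {
        int count = 0;
        for (int i = 1; i < x.length; i++) {
            if (x[i] == 0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Fault propagates to a failure: the zero in the first element is missed.
        System.out.println(numZero(new int[]{0, 4, 6, 8})); // expected 1, actual 0
        // Fault is executed but causes no failure: the zero is not in the first element.
        System.out.println(numZero(new int[]{1, 4, 0, 8})); // expected 1, actual 1
    }
}
```

Running the driver shows exactly the FAILURE and PASS verdicts of the two test tables.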

It is clear what a failure is: some incorrect behaviour of the software
that is visible to the user. However, the distinction between error and
fault might be more obscure. Therefore, we do not want to spend too
much time on distinguishing errors and faults. Most of the time these
two definitions can be used interchangeably anyway.

/**
* Find last index of element.
*
* @param x array to search
* @param y value to look for
* @return last index of y in x; -1 if absent
* @throws NullPointerException if x is null
**/
public int findLast(int[] x, int y) {
for (int i=x.length-1; i>0; i--) {
if (x[i] == y) {
return i;
}
}
return -1;
}

/**
* Find LAST index of zero.
*
* @param x array to search
* @return index of last 0 in x; -1 if absent
* @throws NullPointerException if x is null
**/

public int lastZero(int[] x) {
for (int i=0; i<x.length; i++) {
if (x[i] == 0) {
return i;
}
}
return -1;
}

FIGURE 1.2 Faulty programs for Exercise 1.3 adapted from [4].

EXERCISE 1.3
In Figures 1.2 and 1.3 there are four faulty programs. For each of the
programs:
a Explain what is wrong with the code. Describe the fault precisely
and propose a modification to the code.
b If possible, give a test case that does not execute the fault. Briefly
explain why.


/**
* Count positive elements in array.
* @param x array to search
* @return number of positive elements in x
* @throws NullPointerException if x is null
**/
public int cntPositive(int[] x){
int count = 0;
for (int i=0; i<x.length; i++) {
if (x[i] >= 0) {
count++;
}
}
return count;
}

/**
* Count odd or positive elements in an array.
* @param x array to search
* @return number of odd or positive elements in x
* @throws NullPointerException if x is null
**/
public int oddOrPos(int[] x) {
int count = 0;
for (int i=0; i<x.length; i++) {
if (x[i]%2 == 1 || x[i] > 0){
count++;
}
}
return count;
}

FIGURE 1.3 More faulty programs for Exercise 1.3 adapted from [4].

c If possible, give a test case that does execute the faulty code but does
not result in failure. Briefly explain why.
d If possible, give a test case that does execute the faulty code and re-
sults in failure. Briefly explain why.

We end this section by explaining a few more words used in this
context: bug, issue and incident. Perhaps you have already discovered
on the Internet that the software engineering community has not yet
reached a consensus on which words to use for what.

Bug
If you look up the word bug on the Internet, you will find definitions
that include all the words we have defined above. Some will define it
as an error, others will define it as a fault, and yet other definitions relate
bugs only to failures. On Wikipedia4 they play it safe by mentioning
them all:

A software bug is an error, flaw, defect, failure or fault in a computer program
or system that causes it to produce an incorrect or unexpected result, or to
behave in unintended ways.

4 https://en.wikipedia.org/wiki/Software_bug (last visited September 2019)


Also, you will find many people having all kinds of opinions about
whether the word bug may still be used. We will not go into that here
in this course. The only thing we do want to tell you about the word
bug is related to Grace Hopper, just because it is a nice historical
computer science story. Whether or not Grace Hopper was the first to
coin the term "computer bug" is debatable, but there is an actual
logbook of the Mark II Aiken Relay Calculator from when it was being
tested at Harvard University, on 9 September 1947. In this logbook you
can find the first actual recorded case of a bug being found (see Figure 1.4).

FIGURE 1.4 The first real computer bug

Incident
The word incident is often used when something suspicious has hap-
pened, but it is not yet clear what it is. It is a symptom that something
is wrong and that alerts the tester or user that a failure might come.

Issue
The word issue is used in an even broader sense to state that something
is going on but without making claims about where it comes from, if it
is a failure due to some fault, or whether it should be fixed. This
terminology is sometimes used so that the customer cannot claim that
things must be fixed during the warranty period of a software program,
since they are not recognised as real failures. Also, the word is sometimes
used to avoid offending programmers and to prevent harm to team
morale.

1.4 Vocabulary and terminology in software testing

In this section we want to look a bit more into the history of the software
testing community and why it has not yet reached a consensus on the
vocabulary and terminology in software testing.


Organisations like ISTQB5, IEEE6 and ISO7 have tried to define a
vocabulary for software testing, to come to a consensus and reach
standardisation. Standards do exist and are being worked on, but there
is still a long way to go.

Let us give a very brief historical overview of standardisation in testing,
just to give you a flavour of what is going on and where we are now.

In May 2007, the ISO Software Testing Group (WG26) of the ISO/IEC
JTC1/SC7 Software and Systems Engineering committee started the de-
velopment of ISO/IEC/IEEE 29119 [1], a series of five international
standards for software testing. The goal of the standard: to define vo-
cabulary, processes, documentation, techniques, and a process assess-
ment model for testing that can be used within any software develop-
ment life-cycle.

In September 2013, parts 1, 2 and 3 were published and became official
International Standards. Part 4 was published in 2015 and part 5 at the
end of 2016. As of June 2018, no major revisions have occurred to the
five parts of the standard.

The first release in 2013 kicked off a whole lot of blog posts from testers
around the world against the standards. In August 2014 an online
petition, STOP 29119, was created [2] to suspend publication of parts 4
and 5, and to withdraw parts 1, 2 and 3. The standard was considered
"dangerous" and to "put focus on the wrong things". A "war" against
29119 was declared, and it became evident that the standard had failed
to bring consensus and agreement.

It is outside the scope of this section to go into all the arguments and
reasons that were given for the petition. A nice overview of opinions
and viewpoints can be found in [103]. The answer of the ISO Software
Testing Group (WG26) can be found in [1].

In this course we will not adhere to the standards, but neither will we
oppose using any of the definitions if we think they are useful to explain
certain concepts (as we have done in the previous section on errors,
faults and failures).

1.5 Some famous cases of software failure

There are many examples of software failures that have reached the
news. We mention some of them below.

Wrong heart medicine prescriptions (2016)

Due to a software fault, at least 300,000 heart patients have been given
the wrong drugs or advice (see Figure 1.5). An IT system used in general
practitioners' surgeries had been miscalculating patients' risk of heart
attack since 2009! This could have been a typing mistake or the incorrect
use of some comparison operators.

5 http://www.istqb.org/
6 https://www.ieee.org/
7 https://www.iso.org/


FIGURE 1.5 Wrong calculations (top left Daily Mail [38], bottom left
Technica [117], right The Telegraph [115])

ESA Ariane 5 flight 501 (1996)

The European Space Agency's Ariane 5, with a budget of $7 billion, was
only about 40 seconds into flight when, at an altitude of about 3700 m,
the guidance system shut down and the rocket self-destructed [98, 128].
The software had tried to force a 64-bit floating-point number into a
16-bit signed integer, leading to a conversion (type mismatch) error.
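The Ariane flight software was written in Ada, where this conversion raised an exception; Java, by contrast, wraps silently. Still, a small Java sketch (the variable name is ours) shows how much information a 64-bit to 16-bit narrowing destroys:

```java
public class Narrowing {
    public static void main(String[] args) {
        double horizontalBias = 100_000.0;        // 64-bit value, far too big for 16 bits
        short converted = (short) horizontalBias; // keeps only the low 16 bits
        System.out.println(converted);            // prints -31072, not 100000
    }
}
```

The value 100,000 does not fit in a signed 16-bit range (-32,768 to 32,767), so only its low 16 bits survive and the result is even negative.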

FIGURE 1.6 Examples of a millennium bug (right: De Gelderlander [46],
left: RTL Nieuws [100])

Swedish pensioner, aged 104, allowed to go to kindergarten (2016)

A Swedish woman born in 1912 got a message that she could go to
kindergarten, just like the rest of the children who were born in 2012
[114]! Mistakes like these are due to the incorrect representation of data,
such as a date. This was known as the millennium bug, or Y2K problem,
in the 1990s. However, this kind of error has recently hit the headlines
once again (see Figure 1.6), and it will not be the last case. For example,
in the year 2038, Unix systems will run into a problem [112] with the
signed 32-bit integer representation of time, which is sufficient only until:

03:14:07 UTC on Tuesday, 19 January 2038.
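You can verify that limit yourself: interpreting the largest signed 32-bit second count as a Unix timestamp lands exactly on the date above. A small sketch using java.time:

```java
import java.time.Instant;

public class Year2038 {
    public static void main(String[] args) {
        long maxSeconds = Integer.MAX_VALUE;        // 2^31 - 1 = 2,147,483,647 seconds
        Instant limit = Instant.ofEpochSecond(maxSeconds);
        System.out.println(limit);                  // 2038-01-19T03:14:07Z
    }
}
```

One second later the 32-bit counter overflows, and a naive implementation jumps back to 13 December 1901.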


Gangnam Style breaks YouTube (2014)

YouTube developers built their platform on a 32-bit register, meaning
that YouTube could track a range of -2,147,483,648 to 2,147,483,647
values for its view counter (see Figure 1.7). "We never thought a video
would be watched in numbers greater than a 32-bit integer," they
commented in a blog post [129]. Gangnam Style made them upgrade to
64-bit integers.
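What happens when such a counter passes its maximum can be demonstrated in a couple of lines of Java, where signed 32-bit arithmetic wraps around silently:

```java
public class ViewCounter {
    public static void main(String[] args) {
        int views = Integer.MAX_VALUE; // 2,147,483,647
        views++;                       // one more view...
        System.out.println(views);     // -2147483648: the counter wraps around
    }
}
```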

FIGURE 1.7 Gangnam style breaks YouTube [113].

1.6 What do we test? We cannot test everything!

Taking into account the 5 phases of Beizer from Section 1.1, it seems
reasonable to say that the purpose of testing should be to provide as
much confidence as possible concerning the (good) quality of the
software. In that same section, we recalled Dijkstra's famous saying that
testing cannot prove the absence of faults. Why is that? Let us look at
some examples to show how impossible it is to test software completely.

There are too many possible input values. Imagine a System Under Test
(SUT) that takes a simple date as input (day-month-year) and based on
the date makes a specific calculation. We need to take into account
that some months have 30 days, others 31, we need to think about leap
years, and so on. But if we want to test it completely, we have to enter
all the possible dates that exist, one by one, to check that there are no
failures. Moreover, maybe we should also enter wrong dates to see if the
program handles those adequately. That is an almost impossible task and
would take a lot of time.

FIGURE 1.8 Flow-diagram illustrating a possible cash dispenser
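Even just deciding which dates are valid inputs is non-trivial: the Gregorian leap-year rule alone needs three conditions. A small sketch of the standard rule (the method name is ours):

```java
public class LeapYear {
    // Gregorian rule: divisible by 4, except century years,
    // unless the year is also divisible by 400.
    public static boolean isLeap(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isLeap(2024)); // true
        System.out.println(isLeap(1900)); // false: century, not divisible by 400
        System.out.println(isLeap(2000)); // true: divisible by 400
    }
}
```

A tester who forgets the century exception, or the 400-year exception to the exception, already has faulty date handling before any calculation starts.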

There are too many sequences through a piece of software. A sequence,
or path, through a program is the series of steps (clicks, scrolls, typed
text, inserted data, decisions, events, etc.) that you take until you are
done with a task or a program. For example, Figure 1.8 shows a
flow-diagram of a cash dispenser. There are over 25 million paths from
START to END in this diagram. Imagine it costs 5 minutes to think of
a test case for such a path, execute it and evaluate it. Then this comes
down to 2,164,655 hours, or 54,116 working weeks of 40 hours. Counting
42 working weeks in a year, that means we need 1,288 years! Obviously,
that is way too long. Besides, flow diagrams as simple as the one shown
here only appear in instruction books. In reality, the flow-diagram for a
cash dispenser is far more extensive, with even more paths.

EXERCISE 1.4
Above we indicate that there are more than 25 million possible paths
from START to END in Figure 1.8. Can you come to the same estimation?

Universidad Politécnica de Valencia Structured Software testing

There are too many combinations of inputs, paths and/or platforms.


Think for example about:
• browsers (Firefox, Chrome, Internet Explorer, Safari, etc.)
• operating systems (Windows, Linux, macOS, Android, iOS, etc.)
• and their versions (Windows XP, Vista, Windows 7, Windows 8, Win-
dows 10, etc.)
• or distributions (Suse, Redhat, Ubuntu, Debian, etc.)
• and patches
• and updates
Many applications need to work on any platform, with any browser.
Even if we forget about the “etc.” in the enumeration above, and even
if we only consider Windows, there are still 4 × 5 combinations of a
browser and a version of Windows.

The examples above show that it is impossible even for very simple ap-
plications to capture all cases through testing, let alone real applications
that are many times more complex than the examples given above. And
this example was only about testing some functionality! What about
usability, portability, performance, et cetera? There are many other quality
characteristics that need to be tested (we will list more in Section 1.10.3).
And we are not there yet: software applications are becoming increas-
ingly complex and the number of devices and systems on which we can
use them is growing at lightning speed. Testing is therefore becoming
increasingly complex.

So the problem is clear. Software contains errors. We can prevent some


errors by working in a structured and neat manner. But software is
too complex and people are not perfect thinkers, so it will never be er-
rorless [47]. Testing is therefore always necessary. However, there is
too much to test everything completely: different aspects of software,
at different levels, with many test values and test sequences and their
combinations lead to infinitely many possible test cases. In addition,
we can never know what kind and how many errors there can be in a
software application.

Testing is the search for an unknown number of unknown errors in a software


application, by choosing a limited number of test cases from an infinite number
of possibilities.

That does not sound easy and, indeed, it is not. So, how can we handle
that?

1.7 What do we test? Test case design techniques

Test cases are important artefacts in testing, and, as you will expect by
now, there are no two sources that use the same definition of “test case”.
We use the following definition.

DEFINITION 1.5 A test case contains all information necessary to guide the execution of
a particular test.


To guide the execution of a test we need (1) to prepare for the test, (2)
execute the test and (3) verify whether the outcome encountered a fail-
ure or not. The third and last part is related to something that in the
test world is called an oracle. This terminology was coined in 1978 by
William Howden [53] and comes from mythology: "the one that knows
all the answers". An oracle in testing is the mechanism you use to de-
cide whether the test case output is correct or not. Later in this course,
we will talk about the oracle problem.

How do we come up with suitable test cases? Test case design requires
information about:
• what the SUT should do (for example what functionalities it should
have);
• how the SUT implements those functionalities;
• how people will (or should) use the SUT;
• et cetera.

How do we get this information? We explore. This includes questioning, studying, observing and inferring.

Based on this information we make some sort of model. These models


can be simple informal mental models to comprehend the system and
think about test cases. However, they can also be more mathematical, or
formal, models. The degree of formality can vary and increment from
simple intervals or domains to decision tables, flow diagrams, up to
state transition graphs.

Exploratory skills are needed to make good models. In other words,


making good test models needs creativity; it is not just about applying
the techniques [4]. From this perspective, all testing can be considered
to be both model-based and exploratory.

Techniques or methods used to specify, design or identify test cases go


under all possible combinations of these words, for example, test case
specification techniques, test case design methods, test case identifica-
tion methods, et cetera. Basically, we are talking about ways to make
test cases.

EXERCISE 1.5
Imagine we want to test a permanent marker of the brand edding 400.
On the website of the manufacturer (http://www.edding.com/), the
characteristics of this marker are described as follows.

For permanent labelling of almost all materials, e.g. paper, cardboard,


metal, glass and plastic. It is waterproof, quick-drying, low-odour,
lightfast and wear-resistant. Other product characteristics:
• Use: Ready to use
• Stroke width: Approx. 1 mm
• Refillable: Yes
• Replacement nibs: Yes

Design test cases for testing the “ready to use” property and the “quick-
drying” property of this marker.


1.8 What is the best set of test cases?

What is the best set of test cases? It depends on the circumstances, and
there is no single right answer.

Whether a set of test cases, or a test suite, is good or bad depends on


many contextual circumstances that boil down to:
• the risks involved;
• the costs to avoid those risks.

There are no perfect solutions to test case design, just test suites that are
better or worse depending on the context in which a SUT is used and
the trade-off between the risks and the costs.

This context-dependency and the inherent exploratory nature of testing, together with incomplete specifications, lead to different testers coming
up with different models and different test suites for the same problem.

To illustrate this let us consider the classical triangle problem. In one


of the first books on software testing from Glenford Myers [83] (now
already in 3rd edition [84]), a relatively simple triangle program is used
to show that even the testing of a seemingly simple program is not an
easy task. He starts his book with a self-assessment test for the reader
asking them to write a set of test cases – specific sets of data – to test a
relatively simple program properly. The description of the program is
as follows: [84, p2]:

The program reads three integer values from an input dialogue. The three
values represent the lengths of the sides of a triangle. The program displays a
message that states whether the triangle is scalene, isosceles, or equilateral.

Since then the triangle problem has been used by many to make a point
about testing in general or a technique in particular. Black even com-
pares it to Rorschach inkblots tests for test professionals [18]. Others
[65, 34] joke that a book on software testing is not complete without a
discussion of the triangle problem. We do not want this course text to
omit this classic example. Here, however, we use it to show that the ex-
perts all differ on how many test cases are needed to test this problem
adequately.

Myers [84] indicates about 20 test cases stating that these do not guar-
antee that all possible errors would be found. Jorgensen [57] lists about
125 test cases for this problem when applying boundary value analy-
sis (a test case design technique we discuss later in this course), plus
an additional 11 after applying decision table analysis (also explained
later on this course). Binder [14] listed 65 tests for the triangle problem,
addressing several new dimensions of risk, such as potential errors aris-
ing if you try to repeat the test more than once. Ammann and Offutt [4]
come up with 64 test cases resulting from all combinations of equiva-
lence classes. Black [18] lists 28 test cases but he indicates in a footnote
that his reviewers came up with many additional things that could be
tested.


Collard [35] states that due to how the exercise is specified and the lack
of context, he can argue that 4 test cases are adequate. However, would
the context change to a program used within a NASA space shuttle it
would be an entirely different situation. Space shuttle software, for ex-
ample, needs to compute the shape of triangles as part of its orbital
navigation. A NASA mission is life-critical: if the orbital navigation is
wrong, the consequences could be disastrous. However, the triangles
computed on the space shuttle are curved because they are based on
the shape of the Earth’s surface. Most testers of the triangle program
have assumed a flat surface without really thinking about it.

1.9 How can we evaluate the quality of a test suite?

So, we never know how many and what errors there are in a system. (If
we did, testing would no longer be necessary.) Moreover, knowing
which of these errors would be the most important to find depends on
the context in which the SUT is executed. However, we still need to be
able to evaluate the test suite somehow. Since there is no way to have
the real measure of quality, surrogate measures will have to suffice, i.e.
measures of which we know, think or hope that they correlate to the
real measure. In testing these measures are related to different kinds of
coverage criteria or mutation scores.

1.9.1 Coverage criteria

Coverage criteria indicate how much of the specific parts, behaviour


or characteristics of a SUT have been exercised (i.e. covered) during a
test. To define a coverage criterion we need a model of the software. In
this course, we consider different test models (e.g. equivalence classes,
decision tables, classification trees, graphs, state machines, etc.) and
their corresponding coverage criteria.

Here we discuss a very commonly used and easy to understand criterion that uses the source code to define coverage.

1.9.1.1 Code coverage criteria

Code coverage criteria give an idea of the percentage of code that has
been executed by our test suite. Different criteria are defined based on
whether specific code constructs are executed or not during the tests.
For example:
Statement coverage: The percentage of statements that have been exe-
cuted by our tests. (This is also sometimes called instruction coverage
or block coverage.)
Branch coverage: The percentage of branches that have been executed
by our tests. For example, given an if statement, have both the True and
False branches been executed? This is also sometimes called decision
coverage.
Condition coverage: The percentage of Boolean sub-expressions present
in the guards of a branch that have been evaluated to both True and
False during our tests. (This is also sometimes called predicate cover-
age.)


Multiple condition coverage: The percentage of all possible True-False


combinations of sub-expressions present in the guards of a branch that
have occurred during our tests.

Consider the program oddOrPos from Figure 1.3, and consider the
following test case:

TC1 = (call: oddOrPos([1,2,3]), expected result: 3)

This gives rise to:


• 100% statement coverage.
• 50% branch coverage. Only the True branch of the if statement is
executed.
• 75% condition coverage. There are two sub-expressions:
sub-expr1 = (x [i ]%2 == 1)
sub-expr2 = (x [i ] > 0)
For condition coverage, each should be evaluated to both True and
False. For TC1 we have:

            value   obtained
sub-expr1   True    when x [0] and x [2] are investigated
sub-expr1   False   when x [1] is investigated
sub-expr2   True    when x [1] is investigated
sub-expr2   False   never for TC1

Note that due to short-circuiting in Java, sub-expr2 will not be executed when sub-expr1 is True. So the True value for sub-expr2 is only
obtained when x [1] is investigated and not when x [0] and x [2] are
investigated (although sub-expr2 would evaluate to True for these
cases).
• 66.6% multiple condition coverage. Of all the possible 4 combina-
tions of truth values of these two expressions, only 3 tests are needed
for 100% multiple condition coverage due to short-circuiting in Java:

sub-expr1 sub-expr2
True not evaluated
False True
False False

The test case executes two of these three (i.e. 66.6% of multiple con-
dition coverage):

sub-expr1 sub-expr2 obtained


True not evaluated when x [0] and x [2] are investigated
False True when x [1] is investigated
False False never for TC1

To increase the code coverage we add a non-positive number to our array:

TC2 = (call: oddOrPos([1,2,3,-4]), expected result: 3)

This test case gives us:


• 100% statement coverage.


• 100% branch coverage. Now the True as well as the False branch of
the if statement is executed.
• 100% condition coverage. Now sub-expr2 is False when x [3] is inves-
tigated.
• 100% multiple condition coverage.

sub-expr1 sub-expr2 obtained


True not evaluated when x [0] and x [2] are investigated
False True when x [1] is investigated
False False when x [3] is investigated

Using the code to define coverage criteria is easy. Moreover, tools ex-
ist to automatically determine coverage for different programming lan-
guages. However, care should be taken when interpreting code cover-
age. A 100% code coverage does not mean there are no errors left. The
last examples show that with 100% statement, branch and (multiple)
condition coverage, the error that resides in the code (from Exercise 1.3)
has not been found.
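Figure 1.3 is not reproduced in this excerpt, so the following sketch is a plausible reconstruction of oddOrPos, consistent with the guard (sub-expr1 || sub-expr2) discussed above, including a fault that the fully covering suite {TC1, TC2} does not reveal:

```java
// Hypothetical reconstruction of oddOrPos (Figure 1.3 is not shown here):
// it is meant to count the elements of x that are odd or positive.
public class OddOrPos {
    public static int oddOrPos(int[] x) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            // guard: sub-expr1 || sub-expr2
            if (x[i] % 2 == 1 || x[i] > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(oddOrPos(new int[]{1, 2, 3}));     // TC1: prints 3
        System.out.println(oddOrPos(new int[]{1, 2, 3, -4})); // TC2: prints 3
        // The surviving fault: in Java, -3 % 2 evaluates to -1 (not 1),
        // so negative odd numbers are never counted.
        System.out.println(oddOrPos(new int[]{-3}));          // prints 0, expected 1
    }
}
```

Even with 100% statement, branch, condition and multiple condition coverage achieved by TC1 and TC2, a test case containing a negative odd number is still needed to expose the fault.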

1.9.2 Mutation scores

Another surrogate measure to determine test suite effectiveness originates in mutation testing. Mutation testing is a fault-based testing technique which provides a criterion called the mutation score. The mutation
score can be used to measure the effectiveness of a given test suite in
terms of its ability to detect faults.

The general principle underlying mutation testing is that the faults are
deliberately inserted (this is called seeding) into the original program.
This can be done for example by changing a simple syntactic construct
(e.g. replacing an == with an =, or an && with an ||). The faulty programs created this way (one deliberately inserted fault per program)
are called mutants. To assess the quality of a given test suite, the mu-
tated programs are tested with the test suite to see if the seeded faults
are detected. The mutation score is the percentage of mutants that are
detected by the test suite.
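The principle can be sketched as follows. The "original" below is a hypothetical reconstruction of the oddOrPos example; each mutant seeds exactly one syntactic fault, and the suite {TC1, TC2} from the previous section is used to compute a mutation score:

```java
// Illustrative mutation-testing sketch (the original program and the
// mutants are hypothetical reconstructions of the oddOrPos example).
public class MutationSketch {
    interface Prog { int run(int[] x); }

    public static int original(int[] x) {
        int count = 0;
        for (int v : x) if (v % 2 == 1 || v > 0) count++;
        return count;
    }

    // Mutant 1: || replaced by &&
    public static int mutant1(int[] x) {
        int count = 0;
        for (int v : x) if (v % 2 == 1 && v > 0) count++;
        return count;
    }

    // Mutant 2: > replaced by >=
    public static int mutant2(int[] x) {
        int count = 0;
        for (int v : x) if (v % 2 == 1 || v >= 0) count++;
        return count;
    }

    // A mutant is killed when some test case makes it behave differently
    // from the original program.
    public static boolean killed(Prog mutant, int[][] suite) {
        for (int[] tc : suite)
            if (original(tc) != mutant.run(tc)) return true;
        return false;
    }

    public static void main(String[] args) {
        int[][] suite = { {1, 2, 3}, {1, 2, 3, -4} };         // TC1 and TC2
        int score = 0;
        if (killed(MutationSketch::mutant1, suite)) score++;  // killed by TC1
        if (killed(MutationSketch::mutant2, suite)) score++;  // survives both
        System.out.println("mutation score: " + score + "/2"); // prints 1/2
    }
}
```

The surviving mutant points at a weakness of the suite: no test case contains a zero, which is the only input on which > and >= differ here; adding such a test would kill mutant 2.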

Mutation testing will be discussed in Chapter 10.

1.10 Levels of testing

A basic concept of the software development process is the Software Development Life Cycle (SDLC) model. The SDLC is a continuous process, which starts from the moment the decision to launch the project
is made, and ends at the moment of its full removal from exploitation.
There is not one single SDLC model. They are divided into main groups,
each with its own features and weaknesses. You probably know the wa-
terfall model, the V-model, the iterative model, the spiral model and the
agile model. Testing is obviously part of all these models and should be
done at various levels during software development. Below we discuss
levels at which testing should take place.


1.10.1 Unit testing

Programmers should not only implement their programs. They are also
responsible for testing their code extensively. This type of testing at the
code level is called unit testing or component testing. Programming and
testing in this phase are closely intertwined. Some programmers feel
they do not have time to write unit tests, but actually, you do not have
the time not to write them. Not writing unit tests will ultimately lead to
errors in your code that will be difficult to find and then you will have
to spend many long hours in the debugger trying to find out where they
came from. Writing unit tests will significantly reduce your debugging
time.

The way we write the tests can be roughly divided into test-first, test-last
or test-whenever. Test-first development, also known as Test-Driven De-
velopment (TDD) is a software development style in which you write
the unit tests before you write the code to test. Test-last means you
write the code first and then you write tests for it. Test-whenever means
sometimes you write the code first and sometimes you write it last.
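As a small sketch of the test-first style, the checks below are written against a hypothetical isLeapYear unit (reusing the date example from Section 1.6) before the method exists; the implementation is then written to make them pass. In practice the checks would live in a framework such as JUnit; plain Java is used here to keep the sketch self-contained:

```java
// Test-first sketch. isLeapYear and its test values are illustrative.
public class LeapYearTest {
    // Implementation written after the tests, to make them pass.
    public static boolean isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    static void check(boolean ok, String name) {
        if (!ok) throw new AssertionError("failed: " + name);
        System.out.println("passed: " + name);
    }

    public static void main(String[] args) {
        // The unit tests, written before the implementation existed.
        check(isLeapYear(2020), "ordinary leap year");
        check(!isLeapYear(2021), "ordinary non-leap year");
        check(!isLeapYear(1900), "divisible by 100 but not by 400");
        check(isLeapYear(2000), "divisible by 400");
    }
}
```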

There are pros and cons for each of them, and people in favour of and
against each of them. However, this is beyond the scope of this course,
since we concentrate on how to make test cases (not when).

1.10.2 Integration testing

After each unit has been tested, units can be integrated into subsystems
that will eventually form the entire system. When two or more units
are integrated into a subsystem we need to test that they work together
or communicate with each other in the desired way: this is called inte-
gration testing. During integration testing, we mainly look for defects in
the use and implementation of the interfaces that the units offer.

Which units and subsystems we consider, what their interfaces are and
how they can communicate with each other is described in the design.
For example, consider the units (or components) and their dependen-
cies on other units in a software system as depicted in Figure 1.9. How
can we integrate these and test the interfaces? Roughly there are three
ways: bottom-up, top-down and big bang. We describe each of these
below.

Bottom-up integration
The first integration strategy is bottom-up integration. We test the units
M1 to M7 independently with unit tests and then start with integra-
tions A or C (see Figure 1.10). To test M1 to M7 independently we need
to write a driver for each one of the components since we do not want to
include components higher up in the tree already in the testing. Con-
tinuing with integration A, for instance, we need to write another driver
for M8, because we do not want to include M9 in the test yet (the inte-
gration M8 - M9 still needs to be tested). In other words, integration A
is tested in isolation from M9. This is depicted in Figure 1.11.


[Tree diagram: the System (M11) uses M9 and M10; M9 uses M8, M3, M4 and M5; M10 uses M6 and M7; M8 uses M1 and M2.]

FIGURE 1.9 Units and their dependencies in a tree

[Same tree as Figure 1.9, with integration A grouping M8 with M1 and M2, integration B grouping M9 with integration A and M3, M4, M5, and integration C grouping M10 with M6 and M7.]
FIGURE 1.10 Bottom-up integration

A driver can be defined as a piece of code that calls other code mod-
ules, units or components. For any component you want to test, it is
important to have a program that calls it. The test drivers functionally
simulate the behaviour of upper-level components, which are not part
of the integration yet. Such a driver acts as a temporary replacement
for the calling component: it supplies the same input to and receives
the output of the lower level component.
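A driver for integration A could look like the sketch below. The component names follow Figure 1.9, but their interfaces and behaviour are invented here, since the real components are not specified:

```java
// Illustrative driver for integration A: M8 integrated with M1 and M2.
// The real interfaces are unknown, so minimal ones are invented.
class M1 {
    int half(int v) { return v / 2; }
}

class M2 {
    int square(int v) { return v * v; }
}

class M8 {
    private final M1 m1 = new M1();
    private final M2 m2 = new M2();
    int compute(int v) { return m2.square(m1.half(v)); }  // M8 uses M1 and M2
}

public class M8Driver {
    public static void main(String[] args) {
        // The driver temporarily replaces the calling component (M9):
        // it supplies input to M8 and checks M8's output.
        M8 m8 = new M8();
        int result = m8.compute(10);  // half(10) = 5, square(5) = 25
        if (result != 25) throw new AssertionError("integration A failed");
        System.out.println("integration A ok, result = " + result);
    }
}
```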

When integration A has finished, we can proceed with integration B.


Finally when integration C has finished, we can test the entire system
from M11 (we reached the top, starting from the bottom).

Bottom-up integration has advantages and disadvantages:

advantages
• it is easy to do
• you can start testing early
• you can test integrations in parallel (like integrations A and C)

disadvantages
• the visibility of the system (M11) comes late for the client
• writing drivers can be costly (imagine for example the kind of driver
we would need for M9 and M10)


[Diagram: a Driver on top of integration A, which consists of M8 using M1 and M2.]
FIGURE 1.11 Driving the functionality of M8 to do integration A

Top-down integration
The second integration strategy is top-down integration. We start from
the top with integration A from Figure 1.12.

[Same tree as Figure 1.9, with nested integrations growing from the top: integration A covers M11 with M9 and M10, and integrations B, C and D successively include the lower-level components.]
FIGURE 1.12 Top-down integration

Note that to do this integration, we have to start testing M11 in isolation. To do this, we need test doubles for M9 and M10. Subsequently, to
test integration A (the integration of M11 with the real implementations
of M9 and M10), we need test doubles for {M8, M3, M4, M5} (for M9)
and test doubles for {M6, M7} (for M10, see Figure 1.13).

[Diagram: M10 using a double for M6 and a double for M7.]

FIGURE 1.13 To test M10 we need test doubles pretending to be M6 and M7.


Test double [79] is a generic term for any pretend object used in place of
a real object for testing purposes. The name comes from the notion of a
stunt double in movies. There are different names and kinds of doubles:
dummy objects, fake objects, stubs and mocks.
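The idea can be sketched as follows. The interfaces of M6, M7 and M10 are invented for this illustration: a stub stands in for M6 with a canned answer, and a recording (mock-like) double stands in for M7 so the test can verify the interaction:

```java
// Illustrative test doubles for testing M10 in isolation.
interface M6 { int read(); }
interface M7 { void write(int value); }

class M10 {
    private final M6 m6;
    private final M7 m7;

    M10(M6 m6, M7 m7) { this.m6 = m6; this.m7 = m7; }

    // M10's (invented) behaviour: read from M6, double it, write to M7.
    void copyDoubled() { m7.write(2 * m6.read()); }
}

// Stub: replaces the real M6 with a canned answer.
class StubM6 implements M6 {
    public int read() { return 21; }
}

// Mock-like double: records the call so the test can inspect it.
class RecordingM7 implements M7 {
    int lastWritten;
    boolean called = false;
    public void write(int value) { lastWritten = value; called = true; }
}

public class M10Test {
    public static void main(String[] args) {
        RecordingM7 m7 = new RecordingM7();
        new M10(new StubM6(), m7).copyDoubled();
        if (!m7.called || m7.lastWritten != 42)
            throw new AssertionError("M10 did not use M6/M7 as expected");
        System.out.println("M10 tested in isolation, wrote " + m7.lastWritten);
    }
}
```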

Top-down integration also has advantages and disadvantages:

advantages
• early visibility of the system (M11) to the client

disadvantages
• it is difficult to do
• writing test doubles can be very costly

Big bang
The third and last integration strategy is big bang: we just put it all to-
gether and test at M11 level. Although this is not very systematic, there
are contexts in which this approach is justified. For example when the
SUT is small, when the SUT is stable and we only added some com-
ponents, or when the SUT is monolithic, meaning that the dependency
between the components is so strong that testing them separately will
be almost impossible (e.g. drivers or doubles will be almost a copy of
the real component).

Advantages and disadvantages of big-bang integration are:

advantages
• it is easy to do
• it is fast (no drivers, no doubles)

disadvantages
• it is difficult to localise the defects that caused failures

1.10.3 System testing

When all components have been integrated and the system is complete,
we can start with system testing. System tests are not restricted to func-
tionality testing: at this stage of testing all kinds of non-functional prop-
erties also need to be tested and also all types of different configura-
tions. Here we can test some of the quality characteristics we have seen
in Section 1.2 and Figure 1.1. For example:
• performance;
• security;
• usability;
• accessibility;

On each of those an entire separate course can be written. In the next


sections we will give short overviews.

Performance testing


Performance testing determines whether the system meets the specified requirements concerning performance, for example, the responsiveness, the throughput, the usage of resources and the availability
of the software and the capacity of the infrastructure.

During performance tests, you can collect data about the number of
virtual users, hits per second, errors per second, response time, latency
and bytes per second (throughput), as well as the correlations between
these. Through the reports you can identify bottlenecks, bugs and er-
rors, and decide what needs to be done. Some specific examples of
performance testing are:
• Load testing is a specific type of performance testing, that consists
of constantly and steadily increasing the load on the system to de-
termine whether it meets the specified requirements concerning the
threshold limit of load it can take.
• Stress testing is also a specific type of performance testing, which ex-
amines how the system behaves under intense loads (i.e. stress), and
whether (and how) it recovers when going back to normal usage.
Note that these characteristics should also be part of the specified
requirements.
• Spike testing tests an application with extreme increments and decre-
ments in the load (i.e. spikes of load).
• Soak testing consists of testing the performance and the stability of the
system over a long period of time.
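A load test in the sense described above can be sketched as follows. The operation under test and the load levels are invented for illustration; a real load test would target the deployed system (for example over HTTP) with a dedicated tool rather than a local method:

```java
// Minimal load-test sketch (illustrative): steadily increase the number of
// concurrent virtual users and report the average response time per level.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LoadSketch {
    // Stand-in for the operation under test; returns its own latency in ns.
    public static long operation() {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;   // simulated work
        if (sum < 0) System.out.println(sum);         // keep the loop live
        return System.nanoTime() - start;
    }

    public static long avgLatencyNanos(int users, int requests) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < requests; i++)
            results.add(pool.submit(LoadSketch::operation));
        long total = 0;
        for (Future<Long> f : results) total += f.get();
        pool.shutdown();
        return total / requests;
    }

    public static void main(String[] args) throws Exception {
        for (int users = 1; users <= 8; users *= 2)   // increasing load
            System.out.printf("users=%d avg=%d ns%n",
                              users, avgLatencyNanos(users, 20 * users));
    }
}
```

Comparing the averages across load levels against the specified threshold is what turns such measurements into a pass/fail load test.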

Security testing

Security testing intends to uncover vulnerabilities of the system that


could compromise security (typically divided into confidentiality, in-
tegrity and availability of data and resources) and privacy (typically:
information about subjects) that should be protected from possible in-
truders. Mechanisms that implement security, like authentication, au-
thorization, non-repudiation et cetera, should be tested.

An example of security testers are ethical hackers. The difference with


malicious hackers is that ethical hackers ensure minimal impact on the
hacked systems and disclose their findings to the affected party, while
malicious hackers intrude systems for other reasons, e.g. for profit or
self-serving ends. Ethical hackers are sometimes hired by companies to
test their security. Such a test is called a penetration test or pentest.

Usability testing

Usability testing is about testing the user experience related to understandability (How easy or hard is it to understand how to use the software?), learnability (Are there any obstacles when learning the software
system?) and operability (How easy or hard is it to use the software for
executing the tasks we need to?).

Since usability is about users, testing needs to be done with real users
performing real tasks with the system. While they use the software,
performing an assigned task, they are observed (and sometimes also
recorded and eye-tracked) to detect any usability problem. A myriad of
books have been written on usability testing, for example [101, 68, 87].


Since this type of testing with real users is expensive and time-consuming,
we could also opt for a usability inspection where checklists and heuris-
tics are used by experts to give an opinion about the usability of a prod-
uct. The most well-known heuristics are from Jakob Nielsen [86].

Accessibility testing

Accessibility testing is a subset of usability testing that is about checking whether the SUT can be used well by persons that have, for instance,
hearing loss, are colour-blind or of old age. It has much clearer require-
ments than general usability testing, related to the type of disability and
the assistive technologies that are used.

For example, speech recognition software is an assistive technology that


converts spoken word to text. We need to test whether our SUT cor-
rectly processes this as input. The same holds for special keyboards
with Braille or for other motor control difficulties.

Screen reader software reads out the text that is displayed on the screen.
We need to test whether the graphical user interface of our SUT reveals
enough information for this assistive technology to work properly and
be able to read out sufficiently detailed information to its user such that
the latter can understand and act.

Configuration testing

Configuration testing is testing the SUT on various combinations of software and hardware (i.e. configurations). For example, operating systems (versions of Windows, Linux, macOS, et cetera), browsers (Internet Explorer, Safari, Chrome, et cetera), different versions of compilers,
processors, peripherals (e.g. printers, modems), varying amounts of
memory available, et cetera.

When testing a specific configuration we can test the functionality, but


also the performance, security, usability and accessibility.

Related to configuration testing is portability testing, where we test whether we can effectively and efficiently port (transfer) the SUT from one configured usage environment to another.

Also related is installability testing, where we test the installation process of our SUT on different configurations.

Also, localisation testing is a form of configuration testing, where we test our SUT with different local cultures or settings per language and country. Think of the decimal point/comma, 24 hours versus AM/PM time, different alphabets, et cetera.

1.10.4 Acceptance testing

Acceptance testing comes after system testing, and is the level of testing
whether it is acceptable to release or ship the software. The definition of
acceptable depends on the previously defined acceptance criteria. Usually,
these are the tests that are executed by the users, customers or the
project managers to determine whether or not to accept the system.


Other terminology used around acceptance testing differs mostly on


who is doing the testing:

Alpha testing is acceptance testing done internally in the organisation


that develops the software (i.e. by product management, sales and/or
customer support).

Beta testing is acceptance testing done by the end users of the software.
They can be the customers themselves or the customers of the cus-
tomers. This is also known as User Acceptance Testing (UAT).

1.11 And after we change something: regression testing

Software systems are continuously modified. We change existing functionality, we add new features, we correct errors, et cetera. Regression
testing consists of re-testing the software after it has been modified to
make sure that the change did not break any existing functionality. Its
purpose is to find errors that may have been introduced as a result of
the modifications.

As the software evolves, a software test suite tends to increase in size,


which frequently makes it expensive to execute. Research shows regres-
sion testing is an expensive process that may require more than 33% of
the cumulative expenses of the software [127, 63].

Consequently, regression testing should be automated. There are too


many tests to be rerun for a manual approach to be effective. Testers
will get bored and lose attention if they have to enter the same tests
over and over again. This will ultimately result in no regression testing.
Test automation frameworks need to be used for this.

Which tests to choose for regression testing? There are typically three
different approaches for choosing the regression tests:

Retest All: retest all the tests in all test suites. Even if the majority of
these tests are automated, this is expensive because it can take a consid-
erable amount of time. Moreover, it is not always necessary.
Test Case Selection (TCS): Rather than taking all tests, this method
proposes choosing a representative selection of the test suites. Repre-
sentative meaning that it gives the desired coverage. This selection can
be based on, for example, only the riskiest use cases, only tests that be-
long to the changed code, only tests that test the most complex code, et
cetera.
Test Case Prioritisation (TCP): With a limited set of test cases, it is
ideal to prioritise those tests. Test case prioritisation aims to order a
set of test cases to achieve an early optimisation based on preferred
properties (risk, coverage, etc.). It gives the ability to execute highly
significant test cases first according to some measure, and produce the
desired outcome, such as revealing faults earlier and providing feed-
back to the testers. It can also help to run the crucial tests first if time is
running out.
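A minimal prioritisation could order the test cases by a per-test score; the scoring scheme (risk times coverage) and the test names below are invented for illustration:

```java
// Sketch of test case prioritisation: run the highest-scoring tests first.
// The score here is risk * coverage, both invented numbers.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Prioritiser {
    public static List<String> prioritise(Map<String, Double> scores) {
        List<String> order = new ArrayList<>(scores.keySet());
        order.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return order;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("testLogin",    0.9 * 0.5);  // high risk, medium coverage
        scores.put("testCheckout", 0.8 * 0.9);  // high risk, high coverage
        scores.put("testProfile",  0.2 * 0.4);  // low risk, low coverage
        // Highest score first, so the crucial tests run even if time runs out.
        System.out.println(prioritise(scores));
        // prints [testCheckout, testLogin, testProfile]
    }
}
```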


1.12 Test project management

So testing is the search for an unknown number of unknown errors in a


software application, by choosing a limited number of test cases from
an infinite number of possibilities. Moreover, errors can occur at different
levels of the software (in units, during the integration of units, or at the
system level) and at the system level many different types of character-
istics can cause these errors.

This is a difficult journey for which only designing and executing test
cases is not enough. Testing software is an engineering activity, and like
software development, it should be managed using well-established
test project management processes. Test project management, or just test
management, is only indirectly visible to the rest of the development
team. We are talking about processes like:
• test planning;
• defect tracking and reporting;
• controlling and monitoring the test status;
• setting-up and controlling the test environment (tools, databases, plat-
forms, configurations, et cetera);
• test organisation: roles and staff;
• continuous improvement.

We will consider these things later on in this course in Chapter 14.

1.13 The philosophy of this course

A myriad of books, articles and opinions about software testing exist.
As you have seen in this first introductory chapter, there is a lot to write
and learn about testing. We can put emphasis on processes, frame-
works, documentation, different levels of testing, test automation, soft-
ware quality, test assessment, et cetera.

In this course we put the emphasis on test case design. We consider this
to be one of the most challenging and important parts of testing, since
the test cases you execute determine the quality of your testing. We con-
centrate mainly on functionality testing: we focus on the functionalities
and features that the software should offer to its users. We leave testing
the non-functional properties to another course.

Looking for a course book that fits our goals, we were confronted with
the fact that two schools of software testing exist that have contradic-
tory views on how testing is done best:
• The analytical school, where the emphasis is on better testing by
improved precision of specifications and many types of models, i.e.
model-based testing. This school has many proponents in academia.
• The context-driven school, where emphasis is on better testing by
adapting to the circumstances under which the product is developed
and used, by exploring, learning and questioning, i.e. exploratory test-
ing.

We believe that both schools are right:

Testing should be both model-based and exploratory.

Universidad Politécnica de Valencia Structured Software testing

As we explained before, test case design requires information about the
SUT that we get through exploring, questioning, studying, observing
and inferring. Based on this information we make some sort of model
that helps us design a good set of test cases.

Agreeing with two different schools and combining their views, while
they have been debating each other for years, means that none of the
existing books on software testing fits our philosophy and the goals of
this course perfectly.

Therefore, we have written our own course book on software testing/test
case design. In this course book, we frequently refer to other existing
books and articles. This is the common way of referencing the people
from whom we borrow terminology, techniques, definitions, examples
and exercises. You do not need to look up and read these books and
articles while studying this course. In the few cases when we do want
you to read an article, we say so explicitly.

This course is intended to be a mix of theory and practice. The theory
is explained in the chapters, containing mathematical background for
model-based testing techniques, coverage criteria, overviews of testing
practices, and so on. Our focus is not on the mathematics but on pro-
viding good starting points from which a good set of test cases can be
made. The theory provides some well-known techniques, mathemati-
cal or otherwise, to help us with that. However, these techniques are by
no means the only ones we can use: everything you can think of that
can help you make better test cases is allowed.

And this brings us to the second element of this course: practice. The
only way to become a good tester is to apply the theory and your own
knowledge, intellect, experience and creativity to lots of examples. So
even if you have read and understood a particular technique, and then
encounter an exercise that seems to be very simple, or just another case
of the same problem, we strongly advise you to do the exercise. You may
be surprised by the fact that you did not yet know everything there was
to know about that particular technique.

Apart from practising on relatively small exercises, for which it is clear
beforehand what technique to apply, we also provide a few larger prac-
tical exercises, which are an essential part of this course. Here, you can
experience the exploratory nature of software testing, as well as apply
everything you have learned about the model-based side of it (if you
deem it to be applicable to the problem at hand, that is).

To make sure that the textbook contains a good mixture of techniques
from all schools and also from industrial practitioners, we have worked
together with an international and diverse group of people with exten-
sive experience in testing:

James Bach (United States of America) James Bach is a software tester,
author, trainer and consultant. He has been testing software and con-
sulting for 32 years. He is a proponent of exploratory testing and the
context-driven school of software testing [61, 60]. By many people he
is considered a real guru in software testing. In his autobiography [8]


he reports that he worked as a software testing manager for Apple and
Borland after dropping out of high school. He also programmed Apple
II and Commodore 64 ports of various titles for Spinnaker Software.
Since 1999, he has worked as an independent consultant at Satisfice8 .

Eduardo Miranda (United States of America) Dr. Eduardo Miranda has
an extensive industry experience as a software developer, project leader,
and manager. Before joining Carnegie Mellon University he worked
for companies such as Ericsson (1996–2005) and Lockheed Martin (1991
–1996) and taught courses at universities in Argentina and Canada,
where he lived for nearly 20 years. He is the author of “Running the
Hi-Tech Project Office” [81], a handbook for setting up project manage-
ment offices based on his experience in this area while at Ericsson, and
has contributed a chapter to the book “Introduction to Combinatorial
Testing” [70]. He has authored numerous articles on the use of Petri
Nets in software development, requirements analysis, release planning,
the use of reliability growth models in project management, estimation
techniques, and the calculation of contingency funds for projects.

Gordon Fraser (Germany) Prof. Dr. Gordon Fraser is a Full Professor
in Computer Science at the University of Passau. He received his PhD
from Graz University of Technology, Austria, in 2007, then worked as a
post-doc researcher at Saarland University, Germany, and as a (Senior)
Lecturer at the University of Sheffield until 2017. He has published on
improving software quality and programmer productivity at all major
software engineering venues and has received six ACM SIGSOFT Dis-
tinguished Paper Awards.

Theo Ruys (The Netherlands) Dr. Theo Ruys studied Computer Sci-
ence at the University of Twente in Enschede. In 2001 he obtained his
PhD within the Formal Methods and Tools (FMT) group at the same
university. For ten years, he worked within the FMT group as an assis-
tant professor. His main research topics were the effective use of model
checkers in general and the architecture and construction of software
model checkers. In 2015 he joined Axini, a small software company
specialized in model-based testing, founded in 2008 and located in Am-
sterdam. Axini has developed the Axini Modeling Suite (ATM), an ad-
vanced tool for automatically testing reactive systems. ATM has been
built on the theory of Jan Tretmans and Ed Brinksma, both from the
FMT group of the University of Twente, a good example of academic
test theory being carried into practice. At Axini, Theo Ruys has
the roles of test architect and software engineer, and he is concerned
with the effective use of model-based testing.

Jan Tretmans (The Netherlands) Dr. Jan Tretmans holds a degree in
Electrotechnical Engineering and a PhD in Computer Science, both from
the University of Twente. He is a researcher at TNO ESI (Embedded
Systems Innovation), and also a part-time associate professor at the
Radboud University, Nijmegen. His work includes modelling, verifi-
cation, and testing of embedded systems and software. He particularly
worked to combine modelling and testing to automatically generate test
8 https://www.satisfice.com/ (Last visited 19-09-2019)


cases from a model of system behaviour, also referred to as model-based
testing. He is the inventor of the ioco-testing theory for labelled transi-
tion systems. This theory combines concepts from process algebra, the
theory of testing equivalences, symbolic transition systems, algebraic
data types, satisfaction-modulo-theories tools, and equational reason-
ing. He has many publications in this field, and gave numerous presen-
tations at scientific conferences as well as to industrial audiences.

Chapter contents 2

Testing is model-based

Overview 47

1 Different kinds of models 47
2 Model – coverage criterion – test cases 50
3 Black-box or white-box? Model-based! 50
4 Exploring to get useful models 51

Chapter 2

Testing is model-based

OVERVIEW

As we explained in Chapter 1, in this course we consider all testing to
be both model-based and exploratory. To design test cases we need in-
formation about what the SUT should do, or how the SUT implements
it, or how people (or machines) will (or should) use the SUT, et cetera.
We get this information by exploring, as explained in Chapter 3. Based
on this information we make a model. In this chapter, we will explain
what a model is and how models can range from simple, informal men-
tal models to formal mathematical models.

LEARNING GOALS
After studying this chapter, you are expected to:
– realise that models are all around us
– be able to explain that models can have different levels of details,
abstraction and formality
– be able to explain why we do not speak about black-box or white-
box testing in this course
– be able to explain the “model – coverage criterion – test cases”
structure of test case design techniques in this course.

CONTENTS

2.1 Different kinds of models

In our daily lives and in science we continually use models to reflect
the complex reality [122, 102]. A model of something is a representation
that shows what it looks like or how it works. In Figure 2.1 you see two
examples. Both models lack a lot of details, yet they are useful to reason
about day and night, how we travel from one country to another, or
different temperatures on different planets. A model shows the entities
that are relevant to the problem at hand and the relations between them.
It abstracts away from everything that is not deemed necessary and is
therefore almost always a simplification of reality.

In software testing, this works in the same way. We use models with
different levels of abstraction, different levels of detail, and different
levels of formality. Lacking detail and formality, and abstracting away
from possibly difficult aspects, makes all models wrong [23]. Modelling
something implies making simplifications about the real world which we
know are false but which we believe may be useful anyway.


FIGURE 2.1 Models: useful simplifications of reality.

FIGURE 2.2 Writing an informal model on a napkin.

All models are wrong, but some are useful, is a famous quote often at-
tributed to the British statistician George E. P. Box [25].

In this textbook, we will consider a model to be useful for testing,
provided that it gives you insight into the SUT or helps you structure
your mind so that you can choose good test values.

Let us reconsider the case of getting money from a cash dispenser. On
a napkin, we can quickly make a diagram like the one in Figure 2.2
that illustrates the process. This is a very simplistic model. It is not
formal, since we do not have a precise semantics for the boxes and the
arrows. The model abstracts away a lot of details about what should
happen if the pin code is incorrect, or how many tries the user gets
in typing another amount he or she wants to withdraw. You can, for
example, compare this model to the one we saw in Chapter 1, which for
convenience is shown again in Figure 2.3. You can see the difference in
details, abstraction and formality.


[Figure 2.3 (flow diagram): START; card check with error loop (max 3
times, otherwise the card is swallowed); PIN check with error loop;
withdrawal amount entry (a new amount may be tried, max 4 times);
balance check; money dispensing (the transaction may be repeated,
max 5 times); card return; END.]
FIGURE 2.3 Flow-diagram illustrating a cash dispenser from Chapter 1

However, we can still use the simple model from Figure 2.2 to make
some test cases.

In the informal model from Figure 2.2 there are two very different paths
possible: in one, the client gets his/her money, in the other, he/she
doesn’t get the money. So this would result in only two test cases. One
test case where we get money because our bank account balance is
enough, and another test case that results in no money because we do
not have enough balance.

In Chapter 1, we have seen that the model from Figure 2.3 has over 25
million possible paths. Since it is impossible to cover those entirely, we
need to choose test cases. Both paths from Figure 2.2 are also present
in Figure 2.3, but due to the many details there, they are much harder
to spot. So in this case, the simpler model helps us to see the bigger
picture better. However, there is evidently a lot more to test: simple
models typically result in simple test suites.


Make a model

Pick a coverage criterion

Design testcases

FIGURE 2.4 Test case design is based on a model

2.2 Model – coverage criterion – test cases

In this course, you will see all kinds of models: equivalence classes,
decision tables, classification trees, graphs, and state machine models.
Each model gives rise to a different test case design technique. And
which model you choose to describe a specific test situation depends...
It depends on how you look at the test problem, on the information that
is available, on the characteristics that you want to test, on the time you
have for testing, on your experience, on the risks, et cetera. Most of the
time, different models are possible, some of which may turn out to work
better than others. However, some kind of model is always needed to
create at least a semblance of order out of the chaos that a SUT, with all
of its possible input values and all its possible use cases, presents.

Then, when we have a model, we will try to make test cases that will
cover it according to some coverage criterion.

All test techniques that we will see in this course basically come down
to (see Figure 2.4):
1 make a model;
2 pick a coverage criterion;
3 generate test cases based on applying this criterion to the model.
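The three steps can be sketched in a few lines of code. The following example is our own hedged illustration: the states and actions loosely follow the napkin-style cash dispenser model, the coverage criterion chosen is "cover every transition", and all names are invented.

```python
# Sketch of the recipe: (1) make a model, (2) pick a coverage criterion,
# (3) generate test cases by applying the criterion to the model.
# The tiny napkin-style ATM model below is invented for illustration.

MODEL = {  # state -> list of (action, next state)
    "start":  [("insert card", "pin")],
    "pin":    [("enter pin", "amount")],
    "amount": [("enough balance", "money"), ("not enough balance", "end")],
    "money":  [("take money", "end")],
    "end":    [],
}

def all_paths(model, state="start"):
    """Enumerate every action sequence from the start state to a final state."""
    if not model[state]:
        return [[]]
    return [[action] + rest
            for action, nxt in model[state]
            for rest in all_paths(model, nxt)]

def edge_coverage_suite(model):
    """Criterion: keep adding paths until every transition (edge) is covered."""
    uncovered = {(s, a) for s, outs in model.items() for a, _ in outs}
    suite = []
    for path in all_paths(model):
        edges, state = set(), "start"
        for action in path:
            edges.add((state, action))
            state = dict(model[state])[action]
        if edges & uncovered:          # path contributes new coverage
            suite.append(path)
            uncovered -= edges
    return suite

for test in edge_coverage_suite(MODEL):
    print(" -> ".join(test))
```

For this simple model, the criterion yields exactly the two test cases discussed above: one where money is dispensed and one where the balance is insufficient. Choosing a richer model or a stronger criterion changes the generated suite, which is exactly the point of the recipe.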

2.3 Black-box or white-box? Model-based!

A confusing thing in software testing literature about test case design
techniques is that people try to categorise these into specification-based
versus code-based, or similarly into black-box versus white-box testing.

Whether a technique is specification-based (or black-box), or code-based (or
white-box) only depends on where the information used to make the
model came from.

In this course, you will see a technique called equivalence class testing.
The model underlying this technique consists of equivalence classes of
the domains of the relevant input variables. These equivalence classes
can be derived from specifications of the SUT’s functionalities, but these
classes can also be determined at the program code level for each func-
tion or procedure.
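To give a flavour of what such an equivalence-class model can look like at either level, here is a small sketch. The withdrawal-amount domain and its 10..500 limits are our own invention for illustration, not part of any specification discussed so far.

```python
# Hedged sketch of equivalence class testing for a single input variable.
# The model is a partition of the input domain into classes; one
# representative value per class then yields the test cases.
# Domain and limits (an ATM withdrawal amount, 10..500) are invented.

# Model: (lower bound, upper bound, expected outcome); None = open end.
classes = [
    (None, 9,    "rejected: below minimum"),
    (10,   500,  "accepted"),
    (501,  None, "rejected: above maximum"),
]

def representative(lo, hi):
    """Pick one test value from a class (midpoint, or near the open end)."""
    if lo is None:
        return hi - 1
    if hi is None:
        return lo + 1
    return (lo + hi) // 2

test_cases = [(representative(lo, hi), expected) for lo, hi, expected in classes]
print(test_cases)
# [(8, 'rejected: below minimum'), (255, 'accepted'), (502, 'rejected: above maximum')]
```

Whether the classes were derived from a specification or read off the code does not change this model, which is why we prefer to categorise the technique by its model rather than as black-box or white-box.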


So how should we classify this technique within the white-box or black-
box categorisation of techniques? It is both white-box and black-box, since
we can use this technique at the specification level as well as at the code
level. Consequently, we prefer to categorise techniques by the models
we use to generate the test cases.

In this course, we will consider different kinds of models that we can
make from the information available about the SUT and that can subse-
quently be used for testing. Since testing this way begins with exploring
and modelling, the choice of an appropriate technique and the usefulness
of the resulting model determine the ultimate success of the associated
testing. These are three things that you will learn in this course: how to
explore, how to know which models to pick, and how to make useful
models.

2.4 Exploring to get useful models

While studying test design techniques in this course, you will encounter
a lot of examples explaining some technique. It is highly probable that
you will often think, after reading such an example: “Is that all?! I
could have easily thought of that myself!”. This is perfectly normal and
exactly what we expect. However, we also expect you to have an en-
tirely different opinion after completing the exercises we propose. The
exercises will force you to start exploring yourself to come to a good
model, there will be modelling decisions that you need to make that will
have consequences for your testing. Solving the exercises will make you
aware of how difficult testing is and will give you the skills to become
a better tester.

The aims of teaching you test case design techniques in this course are:
• to make you conscious of the enormous set of well-known techniques
that exist;
• to let you experience that it mostly is not as simple as in the explain-
ing examples;
• to let you explore to make models;
• to make you appreciate that testing is a craft, and that a lot of experi-
ence and common sense is required to do it well.

Chapter contents 3

The exploratory nature of software testing

Overview 53

1 Introduction 53
2 Testing is not A science; testing IS science. 54
3 Formality vs. informality 55
4 Agency vs. algorithm 56
5 Deliberation vs. spontaneity 56
6 Tacit vs. explicit 57
7 How exploration typically works in software testing 58
8 Testing is not proving; verification is not testing 59

Chapter 3

The exploratory nature of software testing


This chapter has been written by James Bach.

OVERVIEW

This chapter and the next have been written by James Bach, a well-
known advocate of the context-driven school and of exploratory test-
ing. Since, in this course, we want to make sure that we convey ex-
ploratory testing well, what better option than including explanations
written by the experts themselves? James Bach is convinced that soft-
ware testing should belong more to social sciences than it does to com-
puter science. Since this is a course from computer science, we want to
put his use of the word ”formal” in context. In a broad sense, the word
“formal” means standardised, having a recognised set of rules, rigid or
dictated by other people. Think of a formal dinner where there are rules
for the clothes you have to wear and the way and order in which you
have to use your silverware. However, in computer science, the word
formal is often associated only with formal methods, i.e. using math-
ematical models to build software and hardware systems. Use of the
word “formal” in this chapter is much broader. This is important to re-
alise because, for us computer scientists, this means that when we read
in this chapter that exploratory testing is informal testing, this does not
necessarily mean that we cannot use formal methods and models to design
test cases. We can and we should because testing is both model-based
and exploratory.

LEARNING GOALS
After studying this chapter, you are expected to:
– be able to explain what the term exploratory testing means
– understand that exploratory testing is structured testing and can
be model-based
– be able to explain how we can look at software testing as a scientific
process.

CONTENTS

3.1 Introduction

What I’m going to do in this chapter is summarize the basic concepts
that comprise the essence of software testing. In doing so, I aim to con-
vince you that testing is an exploratory process. It’s not just sometimes
exploratory; it is inherently exploratory. In other words, whatever else


you may be doing when testing at a professional level, you are also ac-
tively learning and making new choices about what to do next based
on what you learn. You are never merely following a pre-established
procedure.

But while that means the phrase “exploratory testing” is largely redundant,
some testing is especially exploratory (i.e. informal, decided mo-
ment by moment by the tester), while some testing is especially scripted
(i.e. formal, determined by someone else or at some earlier time). The
testing process is always some mix of the two approaches. To do test-
ing well, we need to know how/when/why to emphasize choice and
how/when/why to emphasize procedure.

3.2 Testing is not A science; testing IS science.

To understand the nature of the testing process, we should start with
science. Imagine a science teacher in grammar school who does a chem-
istry demonstration. First, the teacher describes the setup and the equip-
ment. Then he explains the chemicals and describes the reaction that
will occur. He mixes two transparent liquids together, and you see the
resulting solution slowly turns dark blue.

This is commonly called a “science experiment”, but only by people who
are non-scientists. The teacher knew exactly what would happen; there-
fore, it is more properly called a demonstration. The Oxford-English
dictionary defines an experiment as “a scientific procedure undertaken
to make a discovery, test a hypothesis, or demonstrate a known fact.”
A scientific procedure is defined as “principles and procedures for the
systematic pursuit of knowledge (i.e. discovery, validation, demonstra-
tion) involving (1) the recognition and formulation of a problem, (2)
the collection of data through observation, (3) the formulation of hy-
potheses, and (4) validation of hypotheses”. So, the science teacher has
technically performed an experiment, but not one that any real scientist
cares about. Why not? Because scientists are in the business of learning
new things.

What professional scientists call an experiment is a systematic process
of investigating some phenomenon; specifically, by controlling a system
in various ways while methodically observing its behavior in relation
to that control. Such investigating is only necessary in a situation where
there is something you don’t already know about the phenomenon or
system in question. An experiment can be highly controlled or lightly
controlled; more formal or less formal. But one thing is true about a
real-life experiment: it is uncertain. Any scientist who knows for sure
what will happen in an experiment is merely performing a demonstra-
tion; not learning.

The learning that scientists do is disruptive to their work; and they love
that. The people who run the Large Hadron Collider would like noth-
ing more than to invalidate the Standard Model of quantum physics by
learning something unexpected that would require new theories and
new experimental designs. But there is a major difference between a
physicist and a software tester – the things we testers experiment upon


are far less known and far less stable. No physicist worries about need-
ing to “regression test” physical laws because they might have been
changed the previous night. Yet, that sort of thing does happen to
testers, so our learning curve never flattens out, and our test designs
must be allowed to change.

When people do real experiments, even rather formalized ones, unexpected
events can occur and unexpected data may be obtained. Scientists
use reason, insight, and curiosity to react to that new information.
This is called exploration. There are grand examples of scientific explo-
ration, such as building a probe to chase a comet. But even in small
everyday experiments, scientists scratch their heads over the behavior
of two bats hunting together, or wonder why a bumblebee suddenly
charges off in a random direction while collecting nectar. They hypoth-
esize and construct new protocols to observe these phenomena. That is
exploration, too.

Software testing is exactly like that. Software tests are not just similar
to scientific experiments that test hypotheses or discover new things,
they are experiments (the word ”test” is right there in the definition of
experiment). Software tests are experiments and the professional tester
is a scientist who studies the product under test. If you want to learn
how to test very well, study the design of experiments.

Exploratory testing means performing tests while learning things that
may influence the testing.

3.3 Formality vs. informality

Formality is a major issue in both science and software testing. It means
“having a conventionally recognized form, structure, or set of rules.” A
formal situation is one that is dictated by choices of some other people
or by you from some previous time. Formality is followed; it is adhered
to. Conversely, informality means something not constrained to any
particular form, structure, or set of rules; in other words, the
experimenter is free to choose here and now.

Formality is important because it allows us to make strong statements
about what we have and have not done; what we have and have not
proven. Formality makes mathematics possible. Formality facilitates
reliable, dependable results. It also makes certain kinds of collabora-
tion possible, whereby the work of many people must remain consis-
tent over time and space. But formality has a cost. Setting up a formal
arrangement of any kind means making design decisions that include
some things and exclude other things. Fixing upon a good solution,
now, may mean giving up a better solution, later. As an example, IPv4
protocol is so widely adopted that the switch to IPv6 has been painfully
slow. Twenty years after the IPv6 draft standard was settled, more than
90% of traffic on the Internet is still using the older standard. IPv4 is an
example of both the incredible value of standardization and the incred-
ible difficulties standardization may impose.


Informality is important because it allows new information and new
ideas, even ones that are inconsistent with all that have come before,
to inform our work. Informality allows us to benefit from tacit knowl-
edge, which is knowledge that is unspoken and unwritten and therefore
invisible and unjustified within any formal framework. Informality is
often associated with spontaneity, too. Play is informal, by definition.
Yet play is vital for creative problem-solving. Apart from helping us
discover new solutions, playfulness gives us motivation. It liberates
energy we need to do difficult things.

Exploratory testing is informal testing.

3.4 Agency vs. algorithm

Formality is tied up with another key idea: agency. Agency is about
who is in control; who is making the choices that guide the work; who
is accountable. A test tube sitting in a laboratory has no agency. Even in
the most formalized and mechanized of experiments, which minimize
human agency, we would never say that a test tube is “doing science.”
We would not say that a thermal probe is “experimenting” on the liquid
it measures. There are robots on Mars right now, but no one would call
them “automated scientists.” They are tools. Science is always done by
people, usually with the aid of tools, and the minds of good scientists
are ever curious and not robotic or algorithmic.

Algorithmic processes are immensely useful, of course. But algorithms
are pure processes without agency. Any “free” choices are made not
by algorithms but by designers of the algorithm. And there is no algo-
rithm for testing itself. Algorithms are applied in testing, but always by
people. The people who apply them must be accountable for them.

The way we know for sure that machines don’t have agency is by look-
ing at what happens when machines misbehave: they are not punished.
Machines don’t get sued and don’t go to jail. Machines are, at best, like
children. If they misbehave, the blame falls upon the nearest source of
agency: their creators or operators.

Exploratory testing is testing by someone who possesses agency and is therefore
accountable for that work.

3.5 Deliberation vs. spontaneity

Some testing is spontaneous, in that it emerges in the moment. Other
testing is deliberative, in that it is carefully planned in advance. This is
not the same as informal vs. formal. Spontaneous testing can be formal
or informal; deliberative testing can be formal or informal.

Exploratory testing is commonly associated with spontaneous testing,
because that’s an easy case to talk about: you step up to a product and
pound on the keyboard. Type anything that comes to mind. Click on
whatever you feel like clicking. React to whatever happens.


But it is equally exploratory to sit and think about what would be useful
to do next, and to think through the reasons why it would be useful.
You are encouraged to deliberate, in other words. Now, if you plan
multiple steps that you want to take, and carefully stick to that plan,
then that would be a scripted test. If you plan those steps and don’t
force yourself to stick to them, that is less scripted.

If you have some deep-seated habit, or if, by not being aware of your
options, you always make the same choices, then your “spontaneity”
is low. For instance, imagine only knowing about one food: pizza.
When asked what you want to have for dinner, you always say “pizza.”
Well, that might feel spontaneous, but it is actually pre-determined. You
might as well have a written contract agreeing to pizza. Or perhaps
you play the piano, and you practice a piece of music so much that you
eventually can play it in a flowing spontaneous way, without reading
any musical notation. Finally, we all have little phrases we use that are
formalized and yet uttered spontaneously, such as “thank you” or “how
are you” or “I am fine.”

Exploratory testing can be spontaneous, or highly deliberative.

3.6 Tacit vs. explicit

Many of the processes of exploratory testing are not made explicit. In-
stead they are tacit.

Explicit knowledge is any knowledge that is represented in a form that
can be reduced to a string of bits. Anything spoken, written, or pictured
is explicit. There is also knowledge in our minds that is not explicit,
but could be made explicit if we tried hard enough (e.g. writing down
everything you know about France). Written scripts are explicit knowl-
edge, of course. However, there are also a great many tacit scripts by
which we humans live.

In addition to explicit knowledge, we have knowledge that is inherently
tacit. That means knowledge not put into a form that can be reduced
to a string of bits. This includes knowledge built into our minds at a
low level (e.g. “I feel surprised when I see flashing colors”), knowledge
that we may not consciously be aware of (e.g. “I didn’t realize that I
believed a window should not flash strange colors when I scroll it un-
til the moment the product started doing that”), as well as knowledge
we develop in the moment in response to social situations (e.g. “Man-
agement was annoyed by my bug report about flashing colors, because
they said it is ‘merely cosmetic’, but I can edit it to make them see how
important it is.”)

Sociologists, anthropologists, psychologists, and other scientists who study human behavior have developed elaborate protocols to identify tacit knowledge. They do not allow themselves to assume that the processes and knowledge that a person puts into words represent the processes and knowledge by which that person actually works. Unfortunately, few computer scientists are also sociologists. The people who write testing textbooks are rarely conversant with the protocols and discipline of social research. It might even be said that computer scientists and software testers are not competent to say how they themselves do their own work! They are experts in talking about software, but not necessarily in identifying the thought processes by which they think about software. This has led to a huge emphasis on algorithmic accounts of software testing.

To learn how to test, you must watch testers work; and you must watch
yourself work. You must respect that there is a deep structure even
to seemingly trivial tasks, such as deciding when to interrupt a
test process and when to stick to it. Tacit knowledge is developed
not by reading and memorizing instructions. It is developed through
the internal theorizing you do when you watch someone else work or
when you engage in an interactive conversation, and by the automatic
model-building that happens in your mind when you struggle to solve
a problem such as how to make a program produce a certain output.
In other words, tacit knowledge is founded on the experience of living
and working in a stimulating world.

As you test, you develop and use a mental model of the product under
test. You can make formal models, too, but no formal model will cap-
ture the full extent of your mental model. Your mental model includes
not only details of what the product is, but also how it works, what its
purposes and uses might be, what it influences and connects to, its past
and future, its similarities and relationships to other products, patterns
of consistency and inconsistency within it, its potential problems and
prospects for improvement, et cetera. Your mental model is automati-
cally created and maintained through the processes of interacting with
the product and reflecting on what you know. This knowledge is then
crucial to your ability to evaluate the product and report on the status
of your testing.

Part of how we mediate the development of new knowledge about complex systems is through the processes of curiosity and play. The spontaneous as well as deliberative ideas for new interactions with a product occur in response to our internal explorations of our emerging understanding of that very product. Therefore, we cannot expect to formalize the processes of this learning, since we cannot predict or control those moments of curiosity. Exploration is crucial to learning about anything complicated.

Exploratory testing is not only reliant on tacit knowledge, it is a process for developing tacit knowledge about a product that in turn allows you to test better.

3.7 How exploration typically works in software testing

When you first encounter a product and you don’t know much about
it, you explore to learn what it can be. I call this “survey testing,” which
means any testing that has the primary goal of building a mental model
of the product (i.e. learning about it). Although its primary goal is
learning, it is real testing, because you may encounter and report bugs
while doing it. Survey testing is a very exploratory process.


You can also explore before you ever see a physical product. You could
do this by reading about it, reviewing similar products, or having con-
versations with the developers and designers. You can ask questions,
either out loud or silently as you perform research. This helps you build
that all important understanding of what you will eventually test in
physical form, and might also lead to finding bugs in the specifications
and ideas that you encounter.

During this early phase of the test process, specific ideas about how to
test it will emerge. Resist the urge to write down those ideas as
formalized “test cases.” It’s too early for that. The worst time to plan testing is
at the beginning of a project. Getting a general idea of what you want
to do, or writing down those ideas is not the problem. The problem is
making premature and binding decisions about specifically what and
how to test. Formalization is premature unless you have already in-
formally experimented with possible test procedures and gained tacit
knowledge from that process. You will learn a great deal very quickly
when you are actually in front of the product, and if you have already
formalized the testing you will simply have to throw away most of that
scripting and start again, or else ignore what you learned and follow
procedures that you know to be bad.

As you test informally, you will see good opportunities to formalize your testing to some degree. You will use specific techniques to construct test data and perform specific experiments. Some of these you will encode into a form that affords automated output checking. But even as you formalize, that is not the end of exploration.

After you have a strong idea of what the product ought to be and how
you want to test it, you still explore to find hidden problems and un-
known risks. Even in the midst of a formalized test procedure, when
you find any indication of a possible problem, you must explore to dis-
cover the extent of that problem. All along, throughout testing, you
keep your eyes and mind open for new surprises.

Formality in testing generally increases the integrity of the test process: that means it increases the probability that you have tested what you think you tested. However, testing must remain exploratory to some degree. All good testing needs a thinking, socially competent human in the loop, capable of steering the process and either suspending or applying formality as needed.

Always remember, testing is not a discipline of pushing buttons or invoking code. Testing is a discipline of evaluating a product through rapid learning that always happens in the presence of complexity, uncertainty, and change.

3.8 Testing is not proving; verification is not testing

No matter how formal your testing gets, you cannot use testing to “prove” that software “works.”


If you follow a specific procedure and the product behaves exactly as you were certain that it would behave, that is a demonstration. This might be entertaining, and it might help you explain what the product is supposed to do, but please don’t call it a test. I suggest calling that an output check. Performing checks (sometimes called “verifications”) is certainly part of testing; it’s a fundamental tactic of testing. But the full process of testing involves modeling, sensemaking, and reasoning about product risks and making social judgements about them, all of which are beyond the scope of mere checking.

Testing is an empirical process; it gathers evidence about the world. This process is subject to all the limitations of evidence that philosophers have cited since Socrates and Sextus Empiricus. Proof, on the other hand, belongs to the world of mathematics; to formal systems. Real life systems, even computer systems, are far more complex and far less certain than are formal systems. Real life doesn’t necessarily behave according to mathematical models, because such models might be wrong or the data about the system incomplete. Even if meteorologists had a perfect model of how a hurricane works, they would still need a vast, impossible amount of data and computing time to provide a completely accurate prediction of its behavior.

Yet new testers often speak casually about proving that their product
works. This is a seductive but poisonous way of thinking. Here’s why
it’s wrong in one phrase: can is not the same as will. If someone drinks
a lot of alcohol and then drives home safely in a car, that only proves it
is possible not to be injured or injure someone else while drunk driving.
It does not verify that everyone, always, will be safe. It doesn’t prove
it’s a good thing to do. It is only one experience. Similarly, when you
experience that your product doesn’t fail in the five minutes that you
look at it, that is no “proof” that it “works” because it may have been
failing in a way you didn’t see, or it may fail five minutes from now.
You can’t know.

Here’s another key phrase: Seeing no problem is not the same as seeing
that there are no problems. Just because you fail to detect a bug doesn’t
mean it wasn’t there in front of you, fully able to be detected. Or it may
have been very good at hiding from you, yet not so good at hiding from
your users. Therefore, it is wrong to make sweeping statements about
what is true about a product based on the assumption that what you
see is all there is.

Formal verification of properties in computer programs does exist, but it applies only in relatively narrow situations: to certain kinds of software, under a long list of assumptions, and relying on the correct operation of verification tools that could themselves have bugs. Formal verification is a nice tool to augment testing, or to limit the need for certain kinds of testing. It cannot replace testing entirely.

Testing is a systematic evidence gathering and assessment process, but never proves that a product works.

Chapter contents 4

Let us do some testing

Overview 63

1 Introduction 63
2 Recording of an exploratory testing process 66
2.1 What is the nature of this testing? 70
2.2 How did I choose that test string? 71
2.3 How did I choose to use a loop? 71
2.4 How did I know what the Reflow function is supposed to do? 71
2.5 What should this kind of testing be called? 71
2.6 What about the oracle? 72
2.7 What is the next step? 72
2.8 How about a de Bruijn sequence? 74
2.9 Progressively smaller words 74
2.10 Exactly sized lines 76
2.11 Try the interesting characters 77
2.12 Consecutive special characters 77
2.13 Have we tested enough? 77
2.14 Final bug list 78

Chapter 4

Let us do some testing


This chapter has been written together with James Bach.

OVERVIEW

This is James Bach’s other chapter that has been written as an exam-
ple of how an experienced exploratory tester would set about testing
a given problem. It is meant to put you in the right state of mind to
continue with the rest of this workbook.

LEARNING GOALS
After studying this chapter, you are expected to:
– be able to explain the practice of exploratory testing
– be able to explain how exploring leads to models
– be able to explain how the need for a model leads to more explor-
ing.

CONTENTS

4.1 Introduction

In this chapter, a renowned exploratory tester, James Bach, tackles a given testing problem. You will see the exploratory and model-based character of testing.

The problem at hand is Naur’s text formatting program from a paper from 1969 [85]. This paper presents a technique called action clusters, and uses that text formatting program as an example to demonstrate programming with action clusters. Goodenough and Gerhart’s paper on test data selection [49] shows that this program contains at least seven errors that could have been prevented by testing the program. Since then, Naur’s text formatting algorithm has become popular for illustrating different test methods [107, 42, 45] and formal specification pitfalls [80].

The specification and algorithm, due to Naur, are given in Figures 4.1
and 4.2. These were presented to James Bach in this way to start testing
and record the process. Naur’s algorithm uses an array buffer to grad-
ually collect the next word (sequence of characters without blanks or
newlines) from the input text; that word occupies positions 1 . . . bufpos.
Moreover, it uses a variable fill for the number of characters that were
sent to the output since the last newline.


FIGURE 4.1 The specification taken from [85]


FIGURE 4.2 The solution taken from [85]


EXERCISE 4.1
Study the specification and algorithm from Figures 4.1 and 4.2 well. Try
to understand how the algorithm intends to solve the problem stated in
the specification. You can disregard all remarks about action clusters.
Do you see some errors already?

The algorithm in Figure 4.2 is just that: an algorithm, not a program. That means that testing it can be done in two ways:
• by executing it by hand, as we did in Exercise 4.1;
• by implementing it and then running it, as James Bach does in the next section.

In both approaches, there is a risk we will introduce errors that are not
caused by the algorithm itself, since we are, so to speak, working with
an interpretation of the algorithm. In other words, maybe we are not
really testing the algorithm. However, there is nothing we can do about
that, except pay attention and try not to make mistakes.

4.2 Recording of an exploratory testing process

Since I (James Bach) was not given running code, I first implemented a
version of the example program in Perl to serve as the SUT, doing my
best to preserve any bugs that might be in the provided pseudo-code. A
few obvious problems immediately surfaced, and they have to do with
underspecification of the pseudo-code:
• The pseudo-code calls an exception of some kind (“Alarm”) when
the length of a single word is equal to or greater than the width of
the viewport. This means a long string of text without blanks is an
actual error condition. However, the error handling has not been
clearly defined.
• The pseudo-code never exits, because the code does not recognise
the end-of-string condition. Does this mean that an infinite stream
processing function needs to be implemented that never terminates?
Or is this again underspecification?

I resolved the second issue by implementing a version of the function that does terminate, and on which it would be worth doing more than the simplest testing.

The running function is called Reflow; it is written in Perl and you can
find it on the course site.

Reflow is accessible via this call:

reflow_function($input,\$output,$max);

where:
• $input contains the text to be re-wrapped
• $output receives the re-wrapped text (the "\" indicates it is passed
by reference)
• $max contains the width of the viewport.

(There is also an implementation in Java that can be found on the course site.)
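For a concrete reference point, the sketch below is a rough Python analogue of a greedy word wrapper with an Alarm-style result. It is my own illustration, not the Perl or Java code from the course site: the name reflow and the (output, ok) return shape are assumptions, and it deliberately reproduces none of the bugs discussed later in this chapter.

```python
def reflow(text, max_width):
    """Greedy word wrap: returns (output, ok). ok is False ("Alarm")
    when some word cannot fit in the viewport at all."""
    out_lines, line = [], ""
    for word in text.split():
        if len(word) > max_width:            # the Alarm condition
            if line:
                out_lines.append(line)
            return "\n".join(out_lines), False
        if not line:
            line = word
        elif len(line) + 1 + len(word) <= max_width:
            line += " " + word               # word still fits on this line
        else:
            out_lines.append(line)           # start a new output line
            line = word
    if line:
        out_lines.append(line)
    return "\n".join(out_lines), True
```

For example, reflow("this is a simple test string.", 21) yields ("this is a simple test\nstring.", True), while any viewport narrower than the longest word ends in the Alarm result.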


A simple and quick way to test this is to provide a simple test string that includes some words of varying lengths, then progressively reduce the size of the viewport from large to small. For example, this code can do that:
for (my $max = 40; $max >= 0; $max--)
{
    my $output = "";
    my $result = reflow_function("this is a simple test string.", \$output, $max);
    print "$output\n", "*" x $max, "\t$result", "\n";
}

This produces the following output (we added +++ to mark beginning
and end of the output):

+++

this is a simple test string.


****************************************

this is a simple test string.


***************************************

this is a simple test string.


**************************************

this is a simple test string.


*************************************

this is a simple test string.


************************************

this is a simple test string.


***********************************

this is a simple test string.


**********************************

this is a simple test string.


*********************************

this is a simple test string.


********************************

this is a simple test string.


*******************************

this is a simple test string.


******************************

this is a simple test


string.
*****************************

this is a simple test


string.
****************************


this is a simple test


string.
***************************

this is a simple test


string.
**************************

this is a simple test


string.
*************************

this is a simple test


string.
************************

this is a simple test


string.
***********************

this is a simple test


string.
**********************

this is a simple
test string.
*********************

this is a simple
test string.
********************

this is a simple
test string.
*******************

this is a simple
test string.
******************

this is a simple
test string.
*****************

this is a
simple test
string.
****************

this is a
simple test
string.
***************

this is a
simple test


string.
**************

this is a
simple test
string.
*************

this is a
simple test
string.
************

this is a
simple test
string.
***********

this is a
simple
test
string.
**********

this is
a simple
test
string.
*********

this is
a simple
test
string.
********

this
is a
simple
test
string.
*******

this
is a
simple
test
****** Alarm!

this
is a
***** Alarm!

this


is a
**** Alarm!

*** Alarm!

** Alarm!

* Alarm!

Alarm!

+++

Merely glancing at this immediately reveals that the reflow function al-
ways adds a new line and a space to the beginning of the string, which
seems inconsistent with its purpose, the desires of a hypothetical user,
and equivalent functions in other products. Otherwise, reflow appears
to work as advertised, except that words larger than the viewport are
treated as error conditions instead of becoming wrapped, while the ac-
tual error condition of a zero-width viewport is not explicitly handled.

4.2.1 What is the nature of this testing?

How did it arise? It might seem too obvious even to discuss, but in fact
I have made many interesting choices here that could have been made
differently. At this point, these choices are “tacit,” meaning unspoken.
I have used my tacit knowledge of word wrapping (also known as line
breaking) and testing to guide me. Tacit knowledge often feels like “just
doing the obvious thing” to people who already have the particular
skills and knowledge involved. But if you do not have these, the work
can look mystical or ritualistic.

A danger with tacit knowledge is that the student can be tempted to copy the mysterious choices of a teacher without understanding their basis or knowing the world of plausible alternatives from which those choices were drawn. That is why good teachers encourage critical thinking and discussion to pick apart even seemingly innocuous situations so that deeper learning will happen.


4.2.2 How did I choose that test string?

As my goal is an inexpensive sanity test, I chose something easy to type, but that still had several words of different sizes. I wanted the total length of the text to be smaller than 40 characters to reduce the size of the output. I did not include any “weird” characters, because I do not yet know what the character encoding is supposed to be and, besides, that can wait for deeper testing. I did not include any line feeds because I did not think of it at the time.

4.2.3 How did I choose to use a loop?

I wanted an inexpensive test, but I still wanted some good test coverage. A loop is a very simple way to cover a lot of ground. I can “try everything” by looping through all the sizes within a range.
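The same tactic might look like this in Python, with textwrap.fill standing in for the SUT (it is not the Reflow function, so details such as the Alarm behaviour differ):

```python
import textwrap

# Sweep a whole range of viewport widths in one cheap loop, and attach
# one equally cheap check: no output line may exceed the viewport.
text = "this is a simple test string."
for width in range(40, 0, -1):     # textwrap requires width >= 1
    wrapped = textwrap.fill(text, width=width)
    assert all(len(line) <= width for line in wrapped.split("\n"))
```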

4.2.4 How did I know what the Reflow function is supposed to do?

I am applying a tacit, or generic, specification. In other words, I already know what “word wrapping” means, because I have experience and education in the world of word processing. If all I know is that there is an input string, an output string, and a viewport width, I can already guess most of what the system is supposed to do. If, however, I were unfamiliar with the nature of this functionality, I would have to Google it or interview or pair up with someone who did know. No matter what my personal knowledge is, I would probably benefit from doing some study. For instance, the Wikipedia article on word wrapping includes a lot of interesting ideas, including how to handle hyphens.

4.2.5 What should this kind of testing be called?

In a project, I would call this a “functional sanity test.” But to be more specific:
• It is exploratory testing, because I am in control of this testing, mak-
ing all my own choices (as far as I am aware of them). It would be
scripted testing if I were working only within a predefined plan of
steps that need to be executed and if I did not control that plan. All
testing is exploratory to some degree (unless you are merely push-
ing the buttons according to a predefined script). All testing is also
scripted to some degree, at the very least because we are not aware
of all the factors that unconsciously limit our choices. In this case
the testing is highly exploratory, but notice something: I embedded
my choices into a program. If someone else runs the script, their
work would be more scripted, because they would be accepting all
my embedded choices (unless they rewrote the program).
• It is sanity testing, because it is a quick and inexpensive way to dis-
cover if the Reflow function can work at all (i.e. is it sane or insane?).
• It is functional testing, because it focuses on what the product can
do, and because the scope of this assignment is to specifically test a
particular function.
• It is also a form of input domain testing, because it is a systematic
exploration of an input space.


• Some would call it black-box testing because it is designed without consideration for how the code works.
• It can be considered unit testing because it focuses on a specific, un-
integrated portion (unit) of a larger system.
• It is identical to API testing, but we would not call it that unless this
was part of a published API.
• It is not risk-based testing, because I have not attempted to model
important problems and then look for them.
• It is not stress testing because I have not yet attempted to overwhelm
the product.
• It is shallow testing, and not deep testing, because deep testing re-
quires significant time, energy, and preparation to perform. To call
anything deep testing, I must be prepared to make a specific argu-
ment to that effect. You do not just accidentally do deep testing.

4.2.6 What about the oracle?

In Chapter 1 you learned that an oracle in testing is the mechanism you use to decide whether the test case output is correct or not. Basically, an oracle is a way to recognise a problem that appears during testing. Which oracle did I use here to decide whether the output of the line-editing program is correct or not for our test inputs?

I used a blink oracle. That is a heuristic based on the assumption that our
brains are very good at detecting incorrect patterns in a huge amount
of data in a blink.

Here I used a progressive mismatch oracle, a kind of blink oracle: the code produces a sequence of outputs based on small changes to the input, which helps the tester’s eye pick out incorrect patterns in that output.
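Part of this judgement can also be made explicit as a property-style oracle. The sketch below assumes greedy wrapping with single-space joins (my assumption, not taken from the course code):

```python
def looks_greedily_wrapped(output, width):
    """Check two properties of greedy word wrap: every line fits the
    viewport, and no line could have absorbed the first word of the
    line that follows it (otherwise the wrap was not greedy)."""
    lines = output.split("\n")
    for i, line in enumerate(lines):
        if len(line) > width:
            return False
        if i + 1 < len(lines):
            next_word = lines[i + 1].split(" ")[0]
            if len(line) + 1 + len(next_word) <= width:
                return False
    return True
```

Such a property check complements, rather than replaces, the blink oracle: it mechanically catches overlong and underfull lines, while the eye still catches surprises the property does not express.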

4.2.7 What is the next step?

I must stop and think at this point. I need to consider three major ques-
tions:

What is the status of my testing?

How important is it to find problems in this product?

What other things are worthwhile testing?

To consider these questions I must know how to reason about my mission and my strategy. The mission part is pretty easy, since this is outside of any actual project. So I will simply declare that my goal is to discover any bug that I believe a reasonable person could find in a couple of hours of trying. By that standard I am definitely not ready to stop. As for my test strategy, let us begin by systematically analysing (or “factoring”) the product to identify testable elements which then get worked into tests (when I cover a product element in a test I call that a test factor). I begin with surfaces. What are the surfaces of this product? In other words, what kind of models can I make of this product that would be complex enough to be worth exploring? Each kind of model and each big section of it can be viewed as a “surface” or facet of a crystal. When I test, I want to examine (or “cover”) these surfaces. Here is a broad classification of surfaces we could model:
• Structural. What is it physically made of?
• Functional. What does it do?
• Data. What does it operate upon? What is it controlled by?
• Interfaces. How do I interact with it?
• Platform. What does it depend upon?
• Operations. What patterns of use will it encounter?
• Time. How does the element of time affect its behaviour?

In this case, functional and data seem like large enough surfaces for me to explore at this moment. The other surfaces are either trivial or obscure to me. In the case of the functional surface, I think I already have a good start on that. I just need to question whether Reflow has all the capabilities that a good word wrapper ought to have. Since this is an exercise, I will put that aside and focus instead on the data surface.

I need a rich set of test data. I can put this together by hand or write
code to generate it. Either way, I first need to model the input data space
so that I can have a clear idea of what I want to create. The primary
data of interest is the text to be wrapped. Based on my background
knowledge, a little Googling, and the desire to keep things simple, the
elements of that data include:
• Text encoding. ASCII, ANSI, UTF-8, UNICODE... what encodings
are supported? (I’m going to assume ANSI).
• Words. A word is a string of ANSI characters other than newline,
whitespace, or hyphens.
• Whitespace. I will include the following character codes in my con-
cept of whitespace: Horizontal Tab (0x09) and Space (0x20)
• Non-breaking space. I will consider a non-breaking space (0xA0) to
be something that should not allow a line break.
• New lines. I will consider line feeds (0x0A) and carriage returns
(0x0D) to be new line characters, and I will consider a carriage re-
turn to be a single new line character.
• Hyphens. I will consider the following character codes to be hy-
phens: Dash (0x2D), En Dash (0x96), Em Dash (0x97), Soft Hyphen
(0xAD).

The input text to be wrapped has attributes such as:
• Total length in characters (0, maxint, <viewport width, =width, >width)
• Number of words (0, 1, many)
• Number of lines (0, 1, many)
• Length of lines (0, >0)
• Length of words (1, >1)
• Number of consecutive whitespace characters (1, 2, viewport width)
• Number of consecutive new line characters (1, 2)
• Specific characters in words (all ANSI word characters)
• Sequences of words

The only other input is the viewport width:
• 0


• 1
• <longest word
• =longest word
• >longest word
• maxint

Additional conditions that seem interesting include:
• Words longer than the maximum line length
• Lines consisting entirely of spaces
• Zero-length string

Timing does not seem important in this case, and there are probably no
persistent state variables to worry about. There is no Graphical User
Interface (GUI). So, what I need to do is construct one or more texts
which explore every important factor that we want to cover.
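The viewport partition above can be turned into concrete values with a few lines of code. A sketch (the function name, the exact off-by-one neighbours of the longest word, and the choice of maxint are my own assumptions):

```python
def viewport_widths(text, maxint=2**31 - 1):
    """Boundary widths derived from the longest word in the input text:
    0, 1, just below / exactly / just above the longest word, and maxint."""
    longest = max((len(w) for w in text.split()), default=0)
    return sorted({0, 1, longest - 1, longest, longest + 1, maxint})
```

For the sample string used earlier, the longest word is "string." (7 characters), so this yields 0, 1, 6, 7, 8 and maxint.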

4.2.8 How about a de Bruijn sequence?

We are dealing with testing a problem of constructing a text, i.e. a sequence of characters. So I thought I would use a de Bruijn sequence (see http://www.hakank.org/comb/debruijn.cgi), which is a string that packs all combinations of a set of tokens into the smallest space. The tokens I care about are: whitespace, hyphen, new line, non-breaking space, and word character. If I create all sequences of these up to seven tokens long, wherein I replace each character class with a random representative from that class, I get a very long string to use in testing. The applet at Hakank.org can provide the base sequence, using the values 0-4. Then I could write a little program to transform that into a usable string.
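For the record, a standard de Bruijn generator is only a few lines of Python. This is the textbook recursive construction, not the script mentioned below; the token representatives at the end are fixed stand-ins for the five classes named above.

```python
def de_bruijn(k, n):
    """Return the de Bruijn sequence B(k, n): a list of symbols 0..k-1 in
    which every length-n combination appears exactly once (cyclically)."""
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq

# Replace each symbol with a representative of its token class.
tokens = [" ", "-", "\n", "\xa0", "x"]   # whitespace, hyphen, newline, nbsp, word char
test_input = "".join(tokens[s] for s in de_bruijn(len(tokens), 3))
```

B(5, 3) already gives a 125-character input; for length-7 combinations, B(5, 7) has 5^7 = 78125 characters, which hints at why the resulting string can become unmanageable.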

This turned out to be a pretty bad idea. The sequence I wanted was too
big for the applet, but with Googling, I found a Python script that gave
me what I needed. However, when I transformed that into a usable
input string, it was a big mess. I could run it through Reflow, but it
was too difficult to interpret the result. So, I abandoned the de Bruijn
sequence idea and returned to something simpler.

4.2.9 Progressively smaller words

Then I created a string with progressively smaller words, as follows:

for (1..40){print "x" x (41 - $_), " ";}
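An equivalent construction in Python (modulo the trailing space the Perl one-liner prints; the variable name text is my own) would be:

```python
# Forty words of decreasing length: 40 x's, then 39, ..., down to 1,
# separated by single spaces.
text = " ".join("x" * n for n in range(40, 0, -1))
```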

This gave me a text with a distinctive and easily understood pattern of wrapping. The result of Reflow on this string, with a viewport size of 57, is shown below. The two lines of 57 asterisks mark beginning and end of the output.

*********************************************************

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxx xxxxxxxxxxxxxx xxxxxxxxxxxxx xxxxxxxxxxxx
xxxxxxxxxxx xxxxxxxxxx xxxxxxxxx xxxxxxxx xxxxxxx xxxxxx
xxxxx xxxx xxx xx x
*********************************************************

However, when I narrowed the viewport to the size of the largest word
I saw this:

****************************************

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx
xxxxxxxxxxxxxx xxxxxxxxxxxxx
xxxxxxxxxxxx xxxxxxxxxxx xxxxxxxxxx
xxxxxxxxx xxxxxxxx xxxxxxx xxxxxx xxxxx
xxxx xxx xx x
****************************************


Reflow put an extra line feed at the top of the string.

4.2.10 Exactly sized lines

In this test, we make sure that each line of the input text has exactly the same size (49 characters). When it is word-wrapped with a viewport size of 49, I should see something like this (i.e. our oracle):

x xxxxxxxxx xx xxxxxxxx xxxxxxx x xxxxxxxxxxxxxxx


xxx xxxxxx xxx xxxxxx xxxx xxxxxxxxxxxxxxxxxxxxxx
xx xxxxxxxxxxxxxx xxxxxxx xxxxxx xxx xxxxxxx xxxx
x xxxxxxxxx xx xxxxxxxx xxxxxxx x xxxxxxxxxxxxxxx
xxx xxxxxx xxx xxxxxx xxxx xxxxxxxxxxxxxxxxxxxxxx
xx xxxxxxxxxxxxxx xxxxxxx xxxxxx xxx xxxxxxx xxxx
x xxxxxxxxx xx xxxxxxxx xxxxxxx x xxxxxxxxxxxxxxx
xxx xxxxxx xxx xxxxxx xxxx xxxxxxxxxxxxxxxxxxxxxx
xx xxxxxxxxxxxxxx xxxxxxx xxxxxx xxx xxxxxxx xxxx

Instead I see this:

x xxxxxxxxx xx xxxxxxxx xxxxxxx x


xxxxxxxxxxxxxxx xxx xxxxxx xxx xxxxxx xxxx
xxxxxxxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxx xxxxxxx
xxxxxx xxx xxxxxxx xxxx x xxxxxxxxx xx
xxxxxxxx xxxxxxx x xxxxxxxxxxxxxxx xxx xxxxxx xxx
xxxxxx xxxx xxxxxxxxxxxxxxxxxxxxxx xx
xxxxxxxxxxxxxx xxxxxxx xxxxxx xxx xxxxxxx xxxx
x xxxxxxxxx xx xxxxxxxx xxxxxxx x
xxxxxxxxxxxxxxx xxx xxxxxx xxx xxxxxx xxxx
xxxxxxxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxx xxxxxxx
xxxxxx xxx xxxxxxx

This is explained by the known bug where a space is always added to
the beginning of the output. That first space throws everything else off
until the text spontaneously synchronises toward the bottom.

We can prevent this effect by shortening the second "word" by one x. If
we then run Reflow again, we indeed get:

x xxxxxxxx xx xxxxxxxx xxxxxxx x xxxxxxxxxxxxxxx


xxx xxxxxx xxx xxxxxx xxxx xxxxxxxxxxxxxxxxxxxxxx
xx xxxxxxxxxxxxxx xxxxxxx xxxxxx xxx xxxxxxx xxxx
x xxxxxxxxx xx xxxxxxxx xxxxxxx x xxxxxxxxxxxxxxx
xxx xxxxxx xxx xxxxxx xxxx xxxxxxxxxxxxxxxxxxxxxx
xx xxxxxxxxxxxxxx xxxxxxx xxxxxx xxx xxxxxxx xxxx
x xxxxxxxxx xx xxxxxxxx xxxxxxx x xxxxxxxxxxxxxxx
xxx xxxxxx xxx xxxxxx xxxx xxxxxxxxxxxxxxxxxxxxxx
xx xxxxxxxxxxxxxx xxxxxxx xxxxxx xxx xxxxxxx xxxx
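Checks like the ones in this log can be partly automated with a wrap oracle that verifies two invariants: no output line is wider than the viewport, and the output contains exactly the words of the input, in order. A sketch (the class and method names are ours; it checks a wrap result, it does not implement Reflow itself):

```java
import java.util.Arrays;
import java.util.List;

public class WrapOracle {
    // Partial oracle: does `output` respect viewport width w, and does it
    // contain exactly the words of `input`, in the same order?
    static boolean check(String input, String output, int w) {
        for (String line : output.split("\n", -1)) {
            if (line.length() > w) return false;    // a line overflows the viewport
        }
        List<String> in  = Arrays.asList(input.trim().split("\\s+"));
        List<String> out = Arrays.asList(output.trim().split("\\s+"));
        return in.equals(out);                      // same words, same order
    }

    public static void main(String[] args) {
        System.out.println(check("aa bb cc", "aa bb\ncc", 5));   // a correct wrap: true
        System.out.println(check("aa bb cc", " aa bb\ncc", 5));  // prepended space: false
    }
}
```

The second call mimics the prepended-space bug: the extra space makes the first output line one character too wide, so the oracle rejects it.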

Chapter 4 Let us do some testing

FIGURE 4.3 Considering special characters

4.2.11 Try the interesting characters

I created a series of texts by inserting special characters from Table 4.3
into the middle of the string “xxxxxxxx”. Then I called Reflow with a
viewport width of 7 to see which texts would break and which would
not. Note that “Alarm!” is a known bug that simply indicates that the
word is too long for the line. Also note that Reflow adds a new line and
a space to the beginning of the output, which is also a known bug.

4.2.12 Consecutive special characters

When I created a new version of the previous test data which had three
consecutive special characters in each input text, I discovered that new
lines were being replaced by spaces, which seems like a bug to me.
Consecutive whitespace is handled by passing it through. Although the
spec is silent about that, I think it is reasonable behaviour.

4.2.13 Have we tested enough?

Much of the original outline of testable elements identified in Section 4.2.7
has now been covered. Walking through it:
• I have not covered maxint for viewport width, because that is a value
too large to test in this case.
• I have not dealt with different character encodings, different dashes,
non-breaking spaces, and such, because I already know from the
code that they are not handled. If I were testing a completely black
box, I would try those, too.
• I do not think I need to do stress testing, for the moment, because
functionality is the main concern at present.


I have found a few bugs and now is a good time to move on to other
testing. When I get the new – fixed – version, I can easily repeat this
testing, and I might add more at that time. For instance, I might use
naturalistic text.

Part of judging the enoughness3 of testing is to decide if you have
provided enough useful information to the developers for them to make
significant updates based on your work. You will need to retest any
changes they make and so you do not need to do all testing at once.
At the same time, be aware of other things that require your attention.
You will also find that switching between different testing activities and
areas will help you when you return to test any given area further.

4.2.14 Final bug list

1 If the first word is of length MAXPOS, an extra line feed is prepended
to the output.
2 Prepends space to output.
3 The following characters do not allow a line break but probably
should: tab, em dash, en dash, hyphen, soft hyphen.
4 Carriage return does not cause a new line.
5 New line characters and form feeds are stripped and replaced with
spaces.

3 http://www.satisfice.com/articles/good_enough_testing.pdf

Chapter contents 5

Taxonomies, catalogs and checklists

Overview 81

1 Introduction 81
2 Defect taxonomies 83
2.1 Boris Beizer’s taxonomy 83
2.2 Kaner, Falk, and Nguyen’s taxonomy 84
2.3 Whittaker’s “How to Break Software” taxonomy 84
3 Catalogs and checklists 85
3.1 Catalogs 85
3.2 Checklists 87
4 Your taxonomy, catalog or checklist 90

Chapter 5

Taxonomies, catalogs and checklists

OVERVIEW

In this chapter we look at taxonomies, catalogs and checklists that
collect and organise information about software products and projects.
This information can be valuable for generating ideas about how you
can start your testing. More importantly, it can provide backup for
the test design, help you find a suitable model, support decisions for
allocating testing resources and improve the review of requirements.
Furthermore, taxonomies, catalogs and checklists are suitable for
measuring product quality.

LEARNING GOALS
After studying this chapter, you are expected to:
– know of different taxonomies, catalogs and checklists used in
software testing
– see how useful taxonomies, catalogs and checklists can be during
testing
– understand that the most useful ones are those that you make
yourself
– be able to draw a Product Coverage Outline for a testing problem
using taxonomies, catalogs and/or checklists.

CONTENTS

5.1 Introduction

When we are faced with an assignment to test something, we will
immediately start generating ideas about what is important to test.
These test ideas will help us identify clearer test relevant aspects; they
will guide our testing and determine the models that we will make.

While generating test ideas and test relevant aspects, it often helps
to make an overview using a picture or a table. Some call this
a Product Coverage Outline (PCO) [60]. A PCO is an artefact (a picture,
mind map, list, diagram, sketch, table, et cetera) that identifies the
dimensions or elements of a product that might be relevant to testing.

Mind maps turn out to be very useful when creating a PCO. Mind
mapping is a technique that can help people to learn more effectively. It
was invented, together with the term mind map, by popular psychology
author Tony Buzan [30].


FIGURE 5.1 Partial mind map for PSE.

FIGURE 5.2 Looking at PSE from another angle

For example, the small mind map in Figure 5.1 gives an overview of
this course.

Mind maps are used nowadays in many disciplines, including testing.
They are useful during brainstorming and to record things to think
about when designing a test model.

Mind maps are very flexible. For example, another way to structure the
content of this course is shown in Figure 5.2. This flexibility is one of the
strengths of mind maps, but at the same time can be a weakness when
more precise semantics are needed.

To generate a mind map, or PCO, with test ideas in the early stages of
testing, we can use taxonomies, catalogs or checklists to get started. We
look at examples of these in the next sections.


5.2 Defect taxonomies

A taxonomy is a classification of things into ordered groups or
categories that indicate natural, hierarchical relationships. The word
taxonomy is derived from two Greek roots: "taxis" meaning arrangement
and "onoma" meaning name. Taxonomies not only facilitate the orderly
storage of information, but they also facilitate its retrieval and the
discovery of new ideas.

Various taxonomies are used in testing. In this section, we briefly
examine taxonomies of faults or defects to give you an idea of the kinds
of faults software can have. Part of the content in this section has been
taken and adapted from [36] with the consent of the author, Lee Copeland.

Taxonomies in testing, besides helping to guide your testing by
generating ideas for test design, can help you:
• audit your test plans to determine the coverage your test cases are
providing concerning the different types of defects that are in the
taxonomy.
• understand your defects, their types and severities;
• understand the process you currently use to produce those defects;
• improve your development process;
• improve your testing process;
• train new testers regarding important areas that deserve testing;
• explain to management the complexities of software testing.

Let us look at some existing taxonomies in testing.

5.2.1 Boris Beizer’s taxonomy

One of the first and most used defect taxonomies was defined by Boris
Beizer in Software Testing Techniques [11] and later revised in [21]. It is
based on nine top-level categories:
1 requirements defects
2 feature and functionality defects
3 structure defects
4 data defects
5 implementation and coding defects
6 integration defects
7 system and software architecture defects
8 test definition and execution defects
9 unclassified defects

Subsequently, each category defines a four-level classification of
software defects. The entire taxonomy can be found in an appendix to
this chapter.

Even considering only the top two levels it is quite extensive. All four
levels of the taxonomy constitute a fine-grained framework with which
to categorise defects.


You do not have to read it carefully word for word now; just browse
through it to get an impression of the huge number of possible defects.
For now, let this taxonomy serve as a reminder of how many defects
can occur in software and how difficult a task testing can be.

Later you can use the taxonomy to generate test ideas. We will also see
how such a defect taxonomy can form the basis of a catalog or a
checklist, which are explained in subsequent sections.

For companies, the taxonomy can be used as a framework to record
defect data. Subsequent analysis of this data can help a company to
understand the types of defects it creates, how many (in terms of raw
numbers and percentages), and how and why these defects occur. Then,
when faced with too many things to test and not enough time, there
is data available to make risk-based, rather than random, test design
decisions.

5.2.2 Kaner, Falk, and Nguyen’s taxonomy

The book Testing Computer Software [31] contains a detailed taxonomy
consisting of over 400 types of defects that are categorised in 13
categories:
1 user interface errors
2 error handling
3 boundary-related errors
4 calculation errors
5 initial and later states
6 control-flow errors
7 errors in handling or interpreting data
8 race conditions
9 load conditions
10 hardware
11 source and version control
12 documentation
13 testing errors

5.2.3 Whittaker’s “How to Break Software” taxonomy

James Whittaker’s book How to Break Software [126] gives a taxonomy
that guides us to where specifically we should look for failures. Not
only does he identify areas in which faults tend to occur, but he also
defines specific testing attacks to locate these faults.

He bases his taxonomy (or fault model as he calls it) on the four basic
tasks that any software system performs:
1 software accepts inputs from its environment;
2 software produces outputs and transmits these to its environment;
3 software stores data internally in one or more data structures;
4 software performs computations using input and stored data.


This fault model can guide the tester: if the software does any of these
four things wrong, we have found a failure. In the book [126] he then
describes different attacks that can be performed from different
environments: the Graphical User Interface (GUI) for the human user,
the file system user, the operating system user and the external
software user.

5.3 Catalogs and checklists

5.3.1 Catalogs

Catalogs capture the experience of test designers by listing important
cases for all types of variables or situations that they have encountered
in the past and are important to test. This way, lessons learned from
testing one system can improve the testing of new systems.

In the book by Pezze and Young [94], test catalogs are used to provide a
list of test parameters according to the input type of each variable. Also
Marick [74] uses catalogs in his book.1,2 These catalogs can be used to
give you more test ideas.

As an example of how these catalogs can be used, let us look at
collections, fundamental data types in programming such as lists or
arrays. Imagine we are testing a method that changes the contents of
such a collection. The catalog from [74] then provides us with the
following ideas about what to test when we deal with collections
(there is a lot more in this catalog, this is just a small excerpt):

Appending elements
• The collection is initially empty; add at least one new element.
• The collection is not initially empty; add at least one new element.
• Add the last element that can fit (the collection is now full).
• Attempt to add one more element to a full collection.
• Add zero new elements.
• Add multiple elements, almost filling up the collection.
• Add multiple elements, filling up the collection.
• The collection is not full; try to add more elements than will fit
(adding more than one).
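The "Appending elements" entries above can be exercised directly against a bounded collection from the standard library, for instance java.util.concurrent.ArrayBlockingQueue, whose offer method returns false when the queue is full. A small sketch (the capacity of 3 is our own choice for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;

public class AppendCatalogTests {
    public static void main(String[] args) {
        ArrayBlockingQueue<Integer> q = new ArrayBlockingQueue<>(3);

        if (!q.offer(1)) throw new AssertionError(); // initially empty; add one element
        if (!q.offer(2)) throw new AssertionError(); // not empty; add another element
        if (!q.offer(3)) throw new AssertionError(); // add the last element that fits (now full)
        if (q.offer(4))  throw new AssertionError(); // adding to a full collection must fail
        System.out.println("size after the append cases: " + q.size());
    }
}
```

For an unbounded collection such as java.util.ArrayList, the capacity-related entries do not apply, but the remaining ones translate just as directly.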

Deleting elements
• The collection has one element.
• The collection has no elements (nothing to delete).

Finding or searching for elements
• Match not found.
• Exactly one match.
• More than one match.
• Single match found in the first position.
• Single match found in the last position.

1 www.exampler.com/testing-com/writings/short-catalog.pdf
2 www.exampler.com/testing-com/writings/catalog.pdf


How can we use this? Recall that we have seen some examples of
programs that were about arrays in Figure 1.3. For example, let us look
again at:
/**
* Find LAST index of zero.
* @param x array to search
* @return index of last 0 in x;
* -1 if absent
* @throws NullPointerException
* if x is null
*/

public int lastZero(int[] x) {
    for (int i = 0; i < x.length; i++) {
        if (x[i] == 0) {
            return i;
        }
    }
    return -1;
}

Using the catalog, we should have a test case for:

1 Match not found: a test case where there is no occurrence of 0 in the
array x:

input x = [2, 3, 5]
expected result -1

2 Exactly one match: a test case where there is only one occurrence of
0 in the array x:

input x = [2, 0, 3, 5]
expected result 1

3 More than one match: a test case where there is more than one oc-
currence of 0 in the array x:

input x = [7, 2, 0, 3, 0, 5, 6]
expected result 4

4 Single match found in the first position: the 0 is in the first position
of x:

input x = [0, 2, 3, 5]
expected result 0

5 Single match found in the last position: the 0 is in the last position
of x:

input x = [1, 2, 3, 5, 0]
expected result 4

Evidently, test case 3 detects the fault in this program.
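The five cases above can be bundled into a small driver. The sketch below inlines the lastZero listing as given (so it still returns the index of the first zero) and reports, per catalog entry, whether the observed result matches the expected one (the driver and its names are ours):

```java
public class LastZeroCatalogTests {
    // The listing from the text, inlined unchanged: it is faulty on purpose.
    static int lastZero(int[] x) {
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 0) {
                return i;
            }
        }
        return -1;
    }

    static void check(String name, int[] x, int expected) {
        int actual = lastZero(x);
        System.out.println(name + ": " + (actual == expected
                ? "pass"
                : "FAIL (expected " + expected + ", got " + actual + ")"));
    }

    public static void main(String[] args) {
        check("match not found",     new int[]{2, 3, 5},            -1);
        check("exactly one match",   new int[]{2, 0, 3, 5},          1);
        check("more than one match", new int[]{7, 2, 0, 3, 0, 5, 6}, 4);
        check("first position",      new int[]{0, 2, 3, 5},          0);
        check("last position",       new int[]{1, 2, 3, 5, 0},       4);
    }
}
```

Only the third case fails (the method returns 2, the index of the first zero); the first-position and last-position cases happen to pass because their arrays contain a single zero.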


5.3.2 Checklists

Besides these types of catalogs, all kinds of checklists exist. An
impressive one is the You are not done yet-checklist [54], which is a
compendium of ideas for testing software. It gives many test ideas for
all kinds of aspects, both functional and non-functional. For example:
• input methods;
• file names;
• file operations;
• network connectivity;
• dialogue boxes;
• undo and redo;
• printing;
• international locales;
• platforms;
• security;
• dates.

Let us look at the last part: dates. Dates are often a source of defects,
as we have seen in Section 1.5. The checklist in [54] gives the items
depicted in Figure 5.3.

FIGURE 5.3 Part of the checklist from [54] about testing dates and Y2K
(year 2000 issues).
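Several items on such a date checklist (leap-year rules, end-of-month rollover) translate directly into small unit tests; a sketch using java.time (the chosen dates are our own examples):

```java
import java.time.LocalDate;

public class DateChecklistTests {
    public static void main(String[] args) {
        // 2000 was a leap year (divisible by 400), the classic Y2K item.
        if (!LocalDate.of(2000, 1, 1).isLeapYear()) throw new AssertionError();
        // 1900 was not a leap year (divisible by 100 but not by 400).
        if (LocalDate.of(1900, 1, 1).isLeapYear()) throw new AssertionError();
        // End-of-month rollover: 28 February 2001 plus one day is 1 March.
        if (!LocalDate.of(2001, 2, 28).plusDays(1).equals(LocalDate.of(2001, 3, 1)))
            throw new AssertionError();
        System.out.println("date checklist cases pass");
    }
}
```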

Another checklist, or “set of guideword heuristics”, as it is called, is
James Bach’s HTSM, The Heuristic Test Strategy Model [7]. The HTSM
lists a set of patterns for designing and choosing tests to perform.
Especially the Product Elements part of the HTSM can help getting test
ideas for test relevant aspects. The Product Elements are defined in HTSM


as aspects of the product that you consider testing. This includes
aspects intrinsic to the product and relationships between the product
and things outside it. Elements that are considered are remembered by
the mnemonic SFDIPOT [20, 7]: Structure, Function, Data, Interface,
Platform, Operations and Time.

Let us look into each one briefly and see what type of questions we can
ask ourselves to start getting these test ideas. Note that we have tried to
include many elements of the taxonomies, catalogs and checklists that
we have seen so far in this chapter.
Structure What is the structure of the application? Does it use files?
Can we talk to the developers? Can we build it? Do we have access to
the code? What are the technologies that were used to make the
software? Are there any updates or patches? What type of documentation
is there (user, developers)?
Function What is the functionality of the application? What are the
individual features/functions that it offers? Do any of the functions
interact? Think about arithmetic/logic functions, estimations,
transformations, multimedia, et cetera. This can be an (organised)
listing of all of the actions that can be done using the application.
Maybe like simple user stories, like You can add/delete to cart and
User can join mailing list. Is any error handling done?
Data What inputs does the product need and process? What are the
types, cardinalities, volumes and properties of these data? Do they need
formatting? Do they have boundaries? What is the precision? Are
there any dates? What does its output look like? What kinds of modes
or states can it be in? Does it come with pre-set data? Is any of its
input sensitive to timing or sequencing? Is there data that is expected
to persist over multiple operations?
Interface How is data being exchanged with the user (e.g. displays,
buttons, fields, whether physical or virtual)? Any other system
interfaces, with other programs, hard disk, network, wireless, bluetooth,
database servers, printers, et cetera? Are there any programmatic
interfaces (APIs) or tools intended to allow the development of new
applications using this product? Any export or import of data to or
from external applications?
Platform (or the whole ecosystem) What does the application depend
on outside of the software? What operating systems does the
application run on? Are there any third-party components (hardware or
software) needed to run the application? Does the environment have to
be configured in any special way? Does it run on or have a connection
with the internet? Does it need to comply with any standards (e.g.
related to security, accessibility, money, et cetera)?
Operations How will it be used? Who will use the product? Does it
have authentication (identification of the users)? Does it do
authorization (handling of what an authenticated user can see and do)?
Does it have different types of users? Are there certain things that users
are more likely to do? What about ignorant, rogue and careless users?
Any privacy issues (i.e. to not disclose sensitive data to unauthorized
users)? Secrecy (i.e. to not disclose information about the underlying
systems)?

Should it be able to withstand penetration attempts? Any other
malicious use? Is it piracy resistant (i.e. no possibility to illegally copy
and distribute the software or code)? Does it need to comply with
security standards?

FIGURE 5.4 Partial Product Coverage Outline for Naur

Time Any time limits on the product, for different circumstances (e.g.
slow network)? Any requirements on appropriate usage of memory,
storage and other resources? Any constraints on availability?
Responsiveness? Throughput? Does it need to handle load for a long time?

In Figure 5.4 you can see a partial PCO for the Naur algorithm from
Chapter 4. We only expanded the Data part of SFDIPOT. It gives a
first impression of the things to take into account when testing.

5.4 Your taxonomy, catalog or checklist

Now that we have seen several different tools for generating test ideas,
the question arises: which is the best one for you? The one that is most
useful is the one you create from your experience when testing. Often the
place to start is with an existing one. Then modify it to more accurately
reflect your particular test experience in terms of defects you find,
general test situations you encounter, data you normally work with,
their frequency of occurrence, characteristics, et cetera.

Just as in other disciplines, like biology, psychology and medicine, there
is no one, single, right way to categorise, taxonomise or catalogise.
Categories may be fuzzy and overlapping. Defects or data or inputs may
not correspond to just one category. Our list may not be complete,
correct or consistent. That matters very little. What matters is that we
are collecting, analysing, and categorising our past experience and
feeding it forward to improve our ability to test. Taxonomies, catalogs
and checklists in a way are also models, and as George Box said: “All
models are wrong; some models are useful” [24].

The first step in creating your own tool is compiling a list of key
concepts. Do not worry if your list becomes long. That may be just fine.
Make sure the items are short, descriptive phrases. Keep your users
(that is, you and maybe other testers you know) in mind. Use terms
that are common for them. Later, look for natural hierarchical
relationships between items. Combine these into a major category with
subcategories underneath. Try not to duplicate or overlap categories and
subcategories. Continue to add new categories as they are discovered.
Revise the categories and subcategories when new items do not seem
to fit well. Share your tools with others and solicit their feedback.

Chapter contents 6

Input domain modelling with equivalence classes

Overview 93

1 Equivalence classes 93
2 Partitioning the inputs 94
3 History and related work 97
4 Making a model 97
4.1 Example: testing at GUI level 98
4.2 Example: testing at code level 102
4.3 Example: testing at class interface level 104
5 Exercises 106
6 Coverage criteria 107
6.1 All Combinations coverage 108
6.2 Each Choice coverage 109
6.3 Other ways of combining 110
6.4 Invalid Part coverage 111
7 Exercises 112
8 Faults that can be found 113

Chapter 6

Input domain modelling with equivalence classes

OVERVIEW

In this course, we approach test case design according to the following
strategy: make a model, pick a coverage criterion, and design test
cases. In this chapter, we illustrate this strategy with one particular
kind of model and a few very specific coverage criteria. In later
chapters, you will see different models and a more general discussion of
coverage criteria. The concrete model here is made by partitioning the
input domains into equivalence classes.

LEARNING GOALS
After studying this chapter, you are expected to:
– understand and be able to describe partition testing techniques and
equivalence classes
– be able to model input domains of a SUT at different levels
– be able to apply different coverage criteria to generate test cases
– be able to apply the techniques of this chapter in a relatively small
practical exercise.

CONTENTS

6.1 Equivalence classes

This chapter, and applying the technique of equivalence class testing in
practice, are not meant to be very mathematical: we only use the idea
of equivalence classes and partitions. However, to understand that
correctly, you should be familiar with the underlying mathematical
concepts; if you are not, you can read the short summary and example
below.

A partition of a set S is obtained by dividing S into disjoint subsets, such
that the union of all subsets is S again. In other words, each element of S
is put in exactly one subset. We use the term parts to refer to the subsets
of a partition. For example, some partitions of the set N of natural
numbers are
• P1 = {1} ∪ {2} ∪ {3} ∪ . . . with parts {1}, {2},{3}, . . .
• P10 = {1, . . . , 10} ∪ {11, . . . , 20} ∪ . . . with parts {1, . . . , 10}, . . .
• Peo = {n ∈ N | n is even} ∪ {n ∈ N | n is odd}.
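The defining property of a partition (every element lands in exactly one part) can be checked mechanically on a finite sample. A sketch for the parity partition Peo (class and variable names are ours):

```java
import java.util.function.IntPredicate;

public class PartitionCheck {
    public static void main(String[] args) {
        // The parity partition Peo, given as one membership predicate per part.
        IntPredicate[] parts = { n -> n % 2 == 0, n -> n % 2 != 0 };

        // Every sampled element of the domain must belong to exactly one part.
        for (int n = 1; n <= 1000; n++) {
            int count = 0;
            for (IntPredicate p : parts) {
                if (p.test(n)) count++;
            }
            if (count != 1) throw new AssertionError(n + " is in " + count + " parts");
        }
        System.out.println("Peo is a partition of the sampled range");
    }
}
```

If an element fell in no part (the parts do not cover the domain) or in two parts (they are not disjoint), the check would fail.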

We leave the subject of partitions for a while and discuss relations and
equivalence classes instead. But in the end, you will see that it all comes
together again.


Make a model: for each input, model the domain with equivalence classes
Pick a coverage criterion: Each Choice coverage (ECC), All Combinations
coverage (ACC), t-Wise Combinations coverage (t-WCC), IP coverage
Design test cases

FIGURE 6.1 Model-based testing in this chapter

A relation is a subset of a cartesian product. For instance, the cartesian
product N × N contains all pairs of natural numbers:

N × N = {(n, m) | n, m ∈ N}

We can define a relation R ⊆ N × N by specifying, for instance, nRm ⇔
n mod 2 = m mod 2. In other words, a pair (x, y) ∈ N × N is in R if and
only if x and y have the same parity (both even, or both odd). Instead of
(x, y) ∈ R we often write xRy. Since R links elements of N to elements
of N, it is called a relation in N; we call the set N the domain of R. Since
R contains only pairs, it is called a binary relation.

This relation R has three useful properties: it is
• reflexive: for all n ∈ N, nRn holds;
• symmetric: for all n, m ∈ N, if nRm then mRn as well;
• transitive: for all n, m, k ∈ N, if nRm and mRk, then nRk as well.
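The three properties can likewise be checked on a finite sample of the domain; a brute-force sketch for the parity relation R (names are ours):

```java
import java.util.function.BiPredicate;

public class EquivalenceCheck {
    public static void main(String[] args) {
        // R: n and m are related iff they have the same parity.
        BiPredicate<Integer, Integer> r = (n, m) -> n % 2 == m % 2;
        int limit = 30;  // sample 1..limit; this is a check, not a proof
        for (int n = 1; n <= limit; n++) {
            if (!r.test(n, n)) throw new AssertionError("not reflexive at " + n);
            for (int m = 1; m <= limit; m++) {
                if (r.test(n, m) != r.test(m, n)) throw new AssertionError("not symmetric");
                for (int k = 1; k <= limit; k++) {
                    if (r.test(n, m) && r.test(m, k) && !r.test(n, k))
                        throw new AssertionError("not transitive");
                }
            }
        }
        System.out.println("R is an equivalence relation on the sampled range");
    }
}
```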

Any binary relation with these three properties is called an equivalence
relation. Every equivalence relation divides its domain into equivalence
classes (subsets with specific properties): each element of the domain
belongs to exactly one equivalence class, and all elements in one
equivalence class exhibit the same behaviour concerning R. Our sample
relation R has two equivalence classes:

NE = {n ∈ N | n is even} and NO = {n ∈ N | n is odd}

Now, back to partitions. Every equivalence relation in a domain D
defines a partition of D, in which the equivalence classes are the parts.
The converse is also true: every partition defines an equivalence relation.

Figure 6.2 contains a visual representation of the connections between
sets, cartesian products, (equivalence) relations and partitions.

6.2 Partitioning the inputs

Software, in a very abstract way, can be seen as doing three different
things: (a) classifying the input values, (b) doing processing and
computation appropriate to this classification and (c) generating outputs.
The test case design technique in this chapter focuses on testing the
classification part of the inputs.


FIGURE 6.2 Equivalence relation and equivalence classes

The word "input" may be used in two different ways. Sometimes it
means "input variable", similar to a formal parameter of a method. At
other times "input" means "input value", similar to an actual parameter
of a method. When the need arises we will use the terms "input
variable" or "input value" to indicate which use of "input" we mean.
On the other hand, sometimes we do not need to make this distinction,
either because it is not important or because both input variables and
input values have to be considered. In those cases we use just the term
"input", without further classification.

The kind of inputs we need to consider depends on which SUT we are
testing, and at what level.
• If we are testing a piece of software at the Graphical User Interface
(GUI) level (testing at the system level), then the inputs are related to
what can be entered into the text fields, what can be selected from
menus, which radio buttons can be checked, et cetera.
• If we are unit testing a method m in class C at the code level (i.e., we
have access to the code of m), each of the parameters of the method
is an input, but also the attributes of class C can be considered inputs
to the method m.
• If we are testing a class C as a whole at the interface level (i.e., we do
not necessarily have the code, but we do have the API-description for
C and we need to test the integration with another class), methods
m are invoked for objects o of type C that represent the current state
(that is, for example, o.m(x, y)). These methods can have input
parameters (like x and y), but may also depend on the state of the object o
for which they are invoked.

DEFINITION 6.1 The set of all possible input values of one input variable is called an
input domain.

This chapter describes a test design method called partition testing or
equivalence class testing of the input domain.


FIGURE 6.3 Partitioning the domain D into parts D1, . . . , D5

Since it is impossible to test a SUT with every possible value for every
input variable, the partition testing method described in this chapter
explores the input domains of the SUT with the intention of reducing the
number of tests. We divide (partition) the input domains into subsets
assuming that all elements of the same subset should result in similar
behaviour from the test object. That is, each subset is assumed to be an
equivalence class concerning that behaviour. Then we only test with a
representative value from each equivalence class.

Consider, for example, Figure 6.3. Suppose we have a SUT, some
behavioural characteristic C and an input domain D. If we partition D
into D1, D2, . . . , D5, then we assume that for all i (1 ≤ i ≤ 5), input
x ∈ Di and input y ∈ Di are considered to be equivalent for testing, i.e.
executing the SUT with input x gives rise to the same behaviour concerning
characteristic C as executing the SUT with input y. Consequently, we only
test the SUT with 5 representatives, one of each part.
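As a concrete illustration, consider a hypothetical SUT that classifies an exam grade, with a valid range of 0..100 and a pass mark of 55 (all of this, including the names, is our own assumption). One representative per equivalence class is enough to exercise every class once:

```java
public class GradePartition {
    // Hypothetical SUT: classify an exam grade.
    static String classify(int grade) {
        if (grade < 0 || grade > 100) return "invalid";
        return grade >= 55 ? "pass" : "fail";
    }

    public static void main(String[] args) {
        // One representative per part: invalid-low, fail, pass, invalid-high.
        int[]    reps     = { -5,        30,     70,     140 };
        String[] expected = { "invalid", "fail", "pass", "invalid" };
        for (int i = 0; i < reps.length; i++) {
            String actual = classify(reps[i]);
            if (!actual.equals(expected[i])) {
                throw new AssertionError(reps[i] + ": expected " + expected[i] + ", got " + actual);
            }
        }
        System.out.println("each representative behaves as its equivalence class predicts");
    }
}
```

Any value in a part should do as its representative; picking -5 rather than, say, -1000 is a free choice under the equivalence assumption.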

Note that our assumption on what is equivalent can be wrong. We
might not know precisely what the partitions should really be according
to the implementation or the failures that we are searching for. We
are just making this model of the input domain for a testing purpose,
based on some notion of equivalence that we can deduce from the
information that we have available about the behavioural characteristic
C and the input domain D.

For example, instead of testing with every single model of printer, the
tester might treat all Hewlett-Packard inkjet printers as roughly
equivalent. Consequently, we test with only one HP printer as a
representative of that entire set.

Partition testing might lead to a simplification of the test problem. To
some it might even sound nonsensical: instead of making an assumption
about what is equivalent that can be wrong (and hence fails to find
some errors), why do we not just assume there are no errors in the
software, so that we do not need to perform any tests at all? However,
partition testing has some important properties [4, 51].
• It lets the tester identify test suites of manageable size by selecting
a test case for each equivalence class. It gives us the opportunity to
test widely across the entire surface before we delve deeper into the
parts.

96
Chapter 6 Input domain modelling with equivalence classes

• It allows test effectiveness to be measured in terms of coverage concerning the partition model created.
• The process of partitioning forces the tester to start thinking systematically about what matters. Often, this process alone leads to errors being found in, for example, the specification, the documentation or even the SUT itself.
• It is a relatively easy technique, so it is possible to get started quickly.

6.3 History and related work

Partition testing has its roots in papers by Ostrand and Balcer in the
1980s [92]. In other books and articles describing similar techniques for
test case specification based on dividing the input space, you can find
them under the names input space partitioning [4], equivalence class testing
[57, 65], equivalence class partitioning [28, 102], and category partitioning
[14].

6.4 Making a model

Since equivalence class partitioning is done based on the input domains
that are relevant to the characteristic that we want to test, we call the test
model for this technique an input domain model. It basically consists of
appropriate equivalence classes of each identified input domain.

As with all testing techniques, making the model is both the most crit-
ical and a challenging part. Critical because bad domain models give
rise to bad test cases. Challenging because there is no one fit-for-all
solution to come up with good partitions for testing.

Defining equivalence classes is a subjective process based on exploration [10], heuristics [84, p54], intuition [82, p117], experience [18], creativity [4, p78], craftsmanship [57, p101], guessing [57, p101] and trial and error [82, p110].

In [31] it is nicely stated that "equivalence is in the eye of the tester". Two people analysing the same program will typically come up with different domain models of equivalence classes.

Although there is no clear guide on how to make a good domain model
for a SUT, because it all depends on the system you have to test and the
information that you have, we can present some high-level steps for
input domain modelling.
1 Identify which behavioural characteristics of the SUT you want to
test (for example some functionality, some processing, some com-
ponents, et cetera).
2 For each behavioural characteristic C and each possible outcome ex-
amine the relevant input variables and their input domains D (for
example the parameters of a class method, the attributes of a class,
the signature or interface of a class, the text fields in a GUI, et cetera).
Note that the input domain D of an input variable is made up of all
possible values that can be entered for this input variable in the SUT.
3 For each domain D, find a partition that yields equivalence classes
with respect to C.

97
Universidad Politécnica de Valencia Structured Software testing

[Figure: two screenshots of the generate-grade app. Left: exam 50 and coursework 20 produce "Final Grade is A". Right: exam 100 and coursework 20 produce the fault message "Mark is outside expected range".]

FIGURE 6.4 Graphical User Interface of the generate-grade app.

4 Refine or adjust the characteristic C if necessary.

To master the craft of partition testing you need to practice it a lot. Let
us start with some examples and then some exercises.

6.4.1 Example: testing at GUI level

Let us look at an example system taken from [26]. This is an example where domain models can be defined based on a description of the functionality of the SUT.

EXAMPLE 6.1 Consider an app on your telephone that is called generate-grade. The
GUI looks like in Figure 6.4. It has the following specification:

The component is passed an exam mark (out of 75) and a coursework
mark (out of 25), from which it generates a grade for the course in the
range "A" to "D". The grade is calculated from the overall mark, which
is the sum of the exam and coursework marks, as follows:
• greater than or equal to 70: "A"
• greater than or equal to 50, but less than 70: "B"
• greater than or equal to 30, but less than 50: "C"
• less than 30: "D"

If a mark is outside its expected range then a fault message ("FM") is generated. All inputs are passed as integers.


First attempt at partitioning


One obvious behavioural characteristic to test for this system is:

C1 = calculate a final grade given an exam mark and a coursework mark.

To test this characteristic, we have two input variables: exam and coursework. A first attempt to partition the domains of these variables could be as shown in Table 6.1.

input   part ID   values   comment
exam vP1 [0, 75] valid input according to spec
iP1 ]−∞, 0[ invalid
iP2 ]75, +∞[ invalid
iP3 not an integer value invalid
coursework vP2 [0, 25] valid input according to spec
iP4 ]−∞, 0[ invalid
iP5 ]25, +∞[ invalid
iP6 not an integer value invalid
TABLE 6.1 Generate-grade: first attempt

Note that the parts are given an identifier (ID) so that we can reference these more easily later on. These identifiers also mark parts of the partitions as valid or invalid parts:
• valid parts (vPi) are composed of values that are valid according to the specification, with respect to both their type and their values.
• invalid parts (iPi) are composed of values that are invalid according to the specification, either because the specification indicates this or because the specification does not mention it. In generate-grade there are two types of invalid parts:
– values of the right type but invalid according to the described functionality. For example, exam marks are passed as integers, so that is a valid type; however, they cannot be negative. Therefore negative integers are considered invalid.
– values of the wrong type. For example, exam marks are expected to be integers, and anything that is not an integer is an invalid value. Although in principle these are not to be given to the program, we still need to consider them for testing the behaviour of the SUT when faced with these invalid inputs.

Partitioning invalid parts


Partitioning the invalid parts of a domain is done with respect to the
following characteristic, which states the idealistic behaviour of a SUT
when given invalid values:

Cinvalid = indicate in a fault message FM why this value cannot be accepted.


Note that we might not always have all the details about the specific fault messages that can arise. For example, the description above just says that a fault message is generated when a mark is outside its range; the message may or may not contain more information about why the value was not accepted. Also, the specification is unclear about what happens when non-integer inputs are passed. Maybe they are not accepted? Or maybe the same fault message ("FM") is raised?

It could therefore be argued that we do not know whether iP1 and iP2 are different equivalence classes, because if they both result in the same fault message ("FM") then the elements of iP1 and iP2 are equivalent concerning that characteristic. Following that reasoning, we might as well put all the invalid inputs in one equivalence class! However, from a testing point of view, that is not what we want. Although it is not explicitly specified what should happen when invalid inputs are given to the system, something will definitely happen, and we need to test whether this something is acceptable.

Consequently, for invalid partitions, our goal is not only to test whether
for invalid inputs the SUT behaves according to the specification, but
also whether it does correct error handling and has considered all in-
valid inputs. For this, we should test any possible sort of invalid input,
and partition the invalid parts of the domains as much as possible us-
ing a very detailed Ci . For example, we can update Table 6.1 as shown
in Table 6.2.

input   part ID   values   comment
exam vP1 [0, 75] valid input according to spec
iP1 ]−∞, 0[ FM exam mark cannot be negative
iP2 ]75, +∞[ FM exam mark cannot be more than 75
iP3 not an integer value FM exam mark should be integer
coursework vP2 [0, 25] valid input according to spec
iP4 ]−∞, 0[ FM coursework mark cannot be negative
iP5 ]25, +∞[ FM coursework mark cannot be more than 25
iP6 not an integer value FM coursework mark should be integer

TABLE 6.2 Generate-grade: better description of invalid parts

According to the different fault messages mentioned in the comments, all iPs are now different equivalence classes with respect to Cinvalid.

A closer look at partitioning the valid parts


Let us now have a look at the valid parts of the input domains of exam
and coursework. They actually do not correspond to the equivalence
classes of C1 when we take C1 to be:

C1 = calculate a final grade given an exam mark and a coursework mark.

The valid parts we have now are the equivalence classes defined by the
following characteristic:


C0 = input value does not result in a fault message.

However, now we are clearly not testing whether different grades are
calculated in the right way.

We need to refine our partition to make it correspond to equivalence classes with respect to C1. That way we take into account the business rules defined for calculating a final grade. We need to be more
creative and partition the input values in such a way that we can distin-
guish different ways of processing the inputs according to these busi-
ness rules. From the description we know that based on the sum of
exam and coursework we have four different outcomes:

0 ≤ exam + coursework < 30 gives rise to a D

30 ≤ exam + coursework < 50 gives rise to a C

50 ≤ exam + coursework < 70 gives rise to a B

70 ≤ exam + coursework ≤ 100 gives rise to an A

This means there is a dependency between the inputs that together de-
termine the outcome. In this case we could introduce an intermediate
variable and use that to define the partitions. In this case the intermedi-
ate variable would represent the sum of exam and coursework.
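A hypothetical implementation that follows the specification (the app's real code is not given, and all naming here is our own) makes the role of the intermediate variable explicit:

```java
// Hypothetical implementation of the generate-grade rules from the
// specification; "FM" stands for the fault message. Names are ours.
public class GenerateGrade {
    static String grade(int exam, int coursework) {
        if (exam < 0 || exam > 75) return "FM";             // iP1, iP2
        if (coursework < 0 || coursework > 25) return "FM"; // iP4, iP5
        int mark = exam + coursework;  // the intermediate variable
        if (mark >= 70) return "A";
        if (mark >= 50) return "B";
        if (mark >= 30) return "C";
        return "D";
    }
    public static void main(String[] args) {
        System.out.println(grade(60, 20));  // 80 -> prints A
        System.out.println(grade(12, 5));   // 17 -> prints D
    }
}
```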

This seems a better domain model for the generate-grade app as you
can see in Table 6.3.

If we draw these inequalities in a diagram with coursework on the x-axis and exam on the y-axis, the equivalence classes become visible, see Figure 6.5.

There are various ways you can describe these equivalence classes in
your test model; you can pick the one that you like most or that fits the
test problem at hand best.

input   part ID   values   comment
exam vP1 [0, 75] valid
iP1 ]−∞, 0[ FM exam mark cannot be negative
iP2 ]75, +∞[ FM exam mark cannot be more than 75
iP3 not an integer value FM exam mark should be integer
coursework vP2 [0, 25] valid
iP4 ]−∞, 0[ FM coursework cannot be negative
iP5 ]25, +∞[ FM coursework cannot be more than 25
iP6 not an integer value FM coursework should be integer
exam + coursework vP3 [70, 100] outcome A
vP4 [50, 70[ outcome B
vP5 [30, 50[ outcome C
vP6 [0, 30[ outcome D

TABLE 6.3 Generate-grade: introduction of intermediate variable


[Figure: the plane with coursework (c) on the x-axis and exam (e) on the y-axis, both running from 0 to 100. The lines e + c = 30, e + c = 50, e + c = 70 and e + c = 100 separate the regions D, C, B and A; the valid region is bounded by e = 75 and c = 25.]

FIGURE 6.5 Equivalence classes of the generate-grade app

6.4.2 Example: testing at code level

Testing at code level is typically done by the programmer while writing the code. It will probably go like this: the programmer designs test cases, executes them with some unit testing tool (e.g. JUnit when programming in Java), and when one of the tests fails, starts looking for the fault in the code, or in the test case (either with the use of a debugger, or simply by trying out the code by hand). When found, the fault is repaired –hopefully– and this should be tested again. It is therefore a cyclic process. To demonstrate this kind of testing –here combined with input partitioning– we use an example inspired by a component from [111].
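This cycle can be mimicked even without a test framework. Below is a plain-Java sketch with a made-up SUT and fault; with JUnit, each check would be a separate @Test method using an assertion:

```java
// Plain-Java sketch of the design-execute-diagnose cycle described above.
// The SUT (square) is invented for illustration; with JUnit each check
// below would be an @Test method with assertEquals.
public class MiniTestCycle {
    static int square(int x) { return x * x; }  // the SUT

    static void check(String name, int actual, int expected) {
        if (actual == expected) System.out.println(name + ": pass");
        else System.out.println(name + ": FAIL (got " + actual
                                + ", expected " + expected + ")");
    }

    public static void main(String[] args) {
        check("zero",     square(0),  0);
        check("positive", square(5),  25);
        check("negative", square(-3), 9);
        // On a FAIL we would debug, repair the fault, and re-run all checks.
    }
}
```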

EXAMPLE 6.2 A software application of a kitchen appliance store contains a component that is responsible for calculating the price of a dishwasher. The
store has the crazy discount weeks!
• The starting point is baseprice, the base price of the dishwasher.
• There is a discount over the baseprice that was granted by the
vendor.
• On top of this there is a specialprice for a special model of the
machine.
• If the colour of the special model is white then the WHITEdiscount
(14%) is applied to the specialprice. If the colour is platinum then the
PLATINUMdiscount is applied (7%). Other colours (red, pink and
black) have no colour discount.


• There is an extraprice for extras (rack, timer delay, et cetera)


• If two to five extras are chosen (that are not part of the special
model), there is an extra discount of 10% on the extraprice. If
six or more extras are chosen this extra discount is 15%.

The following code is supposed to calculate the price (any faults are
deliberate, see Exercise 6.7). This is the method that has to be tested.
1 public double calculate_price (double baseprice,
2 int discount,
3 double specialprice,
4 int colour,
5 double extraprice,
6 int extras)
7 {
8 double specialprice_with_discount;
9
10 if (colour == 0)
11 specialprice_with_discount = specialprice * 0.86;
12 else if (colour == 1)
13 specialprice_with_discount = specialprice * 0.93;
14 else specialprice = extraprice;
15
16 double extraprice_with_discount;
17
18 if (extras >= 2)
19 extraprice_with_discount = extraprice * 0.90;
20 else if (extras >= 6)
21 extraprice_with_discount = extraprice * 0.85;
22 else extraprice_with_discount = 0;
23
24 return baseprice * (1-discount/100.0)
25 + specialprice
26 + extraprice_with_discount;
27 }

One obvious behavioural characteristic for valid inputs for this simple
example is:

C = calculate the discounts that have to be applied.

To test this characteristic, we have six input variables, as we can see from the method’s signature. Actually, at this point, we could just as well be testing at the interface level (see also Section 6.4.3), since until now we have only used information that is public.

Looking at just the specifications, we can partition the domains of most variables, as shown in Table 6.4. Only for the colours does the specification fail to provide enough information, but now we can use the
code to conclude that colour == 0 apparently means "white", and
colour == 1 means "platinum" (assuming that the code is correct in
that aspect, of course).


input   part ID   values   comment
baseprice vP1 [0, +∞[ valid
iP1 ]−∞, 0[ cannot be negative
iP2 not a double should be a double
discount vP2 [0, 100] valid
iP3 ]−∞, 0[ % cannot be negative
iP4 ]100, +∞[ % cannot be more than 100
iP5 not an integer % needs to be integer
specialprice vP3 [0, +∞[ valid
iP6 ]−∞, 0[ cannot be negative
iP7 not a double should be a double
colour vP4 0 valid, WHITEdiscount 14%
vP5 1 valid, PLATINUMdiscount 7%
vP6 ]1, +∞[ valid, no colour discount
vP7 ]−∞, 0[ valid, no colour discount
iP8 not an integer should be an integer
extraprice vP8 [0, +∞[ valid
iP9 ]−∞, 0[ cannot be negative
iP10 not a double should be a double
extras vP9 [0, 1] valid, extra price discount 0%
vP10 [2, 5] valid, extra price discount 10%
vP11 [6, +∞[ valid, extra price discount 15%
iP11 ]−∞, 0[ cannot be negative
iP12 not an integer should be an integer
TABLE 6.4 Discounts: domain model

Note that we included two parts containing invalid integers for discount:
]−∞, 0[ and ]100, +∞[. Instead, we could have taken the union of these
two intervals to form one larger part containing all invalid integers for
discount. That would certainly be alright in many situations. How-
ever, probably negative percentages are treated differently from per-
centages larger than 100, hence our choice for two separate parts.

Note as well that we have marked colour ∈ ]−∞, 0[ as a valid part, since the textual specification does not mention anything about the domain of colour, but from the code we can see it has type int. The
specification does mention that there are only three other colours (red,
pink and black), but we have no way of knowing what the correspond-
ing int values are.

6.4.3 Example: testing at class interface level

Finally, let us look at an example where we need testing at the class interface level. As indicated above, here the input domain can contain
objects that represent the current state of the program.


FIGURE 6.6 The classes Element and LinkedList

FIGURE 6.7 The interface of getNextElement

EXAMPLE 6.3 Let us consider a class LinkedList that represents a singly linked list. It
offers in its interface a method getNextElement (see Figure 6.7) that
will return the next element, provided the list is not empty. It will wrap
from last to first element when reaching the end of the list. When called
on an empty list, it will throw an EmptyListException.
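Since Figures 6.6 and 6.7 only show the interface, here is a hypothetical minimal implementation of the described behaviour. Only getNextElement and EmptyListException are named in the text; the rest of the naming is ours, and for brevity we back the list with an ArrayList instead of the Element nodes of Figure 6.6:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the described behaviour. A real singly linked
// list would use Element nodes (Figure 6.6); an ArrayList keeps it short.
class EmptyListException extends RuntimeException {}

class LinkedList<E> {
    private final List<E> elems = new ArrayList<>();
    private int lastVisited = 0;                     // 0 models "lV undefined"

    void add(E e) { elems.add(e); }

    E getNextElement() {
        if (elems.isEmpty()) throw new EmptyListException();
        lastVisited = (lastVisited % elems.size()) + 1;  // wraps last -> first
        return elems.get(lastVisited - 1);               // positions are 1-based
    }
}

public class LinkedListSketch {
    public static void main(String[] args) {
        LinkedList<String> l = new LinkedList<>();
        l.add("a"); l.add("b"); l.add("c");
        System.out.println(l.getNextElement());  // prints a
        System.out.println(l.getNextElement());  // prints b
        System.out.println(l.getNextElement());  // prints c
        System.out.println(l.getNextElement());  // prints a (wrapped around)
    }
}
```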

Again there is one obvious behavioural characteristic C in this system that we can test:

C = get the next element in the list.

What are the input variables that we can use to test the functionality of
the method getNextElement? We need an object list (abbreviated
l) of type LinkedList that represents the state of the list at a specific
moment. Furthermore, we need a way to distinguish specific, though
abstract, test cases. For instance, we want to ascertain that all goes well
when the list has size 1, or when we ask for the next element of the
last element in the list. To be able to make these distinctions, we will
use a virtual variable lastVisited (abbreviated lV), indicating the last
element visited by getNextElement. We come up with the partitions
for list and lastVisited shown in Table 6.5.

Note that we chose to indicate the first position in the list with index 1,
i.e., lastVisited ranges from 1 to l.size().

Moreover, note that we will never use the variable lastVisited in dealing with the program, i.e., we will never use it when calling the
method getNextElement. Its sole purpose is to make it easier to pick
appropriate lists from vP2 to test.


input   part ID   values   comment
list iP1 l.size() = 0 return EmptyListException
(l) vP1 l.size() = 1 valid, return this element
vP2 l.size() > 1 valid, return next element
lastVisited vP3 lV undefined valid when first call of getNextElement()
(lV) vP4 1 ≤ lV < l.size() valid when l.size() > 1
vP5 lV = l.size() at end of the list, need to wrap
TABLE 6.5 Linked list: domain model

6.5 Exercises

We strongly encourage you to really put some effort into solving the exercises below (and in later chapters as well). You will notice that it takes some time, some exploring and some trial and error (or experience), to really comprehend the problem at hand and its most important peculiarities.

Moreover, when comparing your own answer to our answer, you will
see some very elaborate answers in this workbook. We do not expect
you to elaborate this much; we just try to show you a possible process
to reach a satisfactory solution, a process that is both model-based and
exploratory. As always, more than one answer is possible.

EXERCISE 6.1
Partition the input domain of the following application description adapted
from [111].
A company orders an application that needs to calculate the annual
bonus of its employees. This bonus is a percentage of their monthly
salary, and depends on how long they have worked for the company.
In the requirements the following rules are found:
• more than three years at the company yields a bonus of 50%
• more than five years yields a bonus of 75%
• more than eight years yields a bonus of 100%

EXERCISE 6.2
Partition the input domains of the following function Z:

∀ x, y ∈ N :: Z(x, y) =  x + y   if 10 < x ≤ 20 ∧ 12 < y ≤ 30
                         x − y   if 0 ≤ x ≤ 10 ∧ 0 ≤ y ≤ 12
                         0       otherwise


EXERCISE 6.3
A hardware store sells hammers (5 euros) and screwdrivers (10 euros).
Over time however, their discount system has grown a bit complex.
They have asked the nephew of the boss (who is studying computer
science) to develop a little application that can calculate the price a cus-
tomer needs to pay when buying these products. They have the follow-
ing discount rules:
• If the total is more than 200 euros, then the client obtains a discount
of 5% over the total;
• If the total is more than 1000 euros, then the client obtains a discount
of 20% over the total;
• If the client buys more than 30 screwdrivers, then there is an addi-
tional discount of 10%.

Partition the input domains of this application.

6.6 Coverage criteria

Now that we have our model –in this case: domain partitions– we can
go to the next two steps from Figure 6.1: making test cases by selecting
test values (Step 3) that cover the model (Step 2). Before we can pick
specific values for our test cases, we have to decide where those values
should come from (i.e., from which part of the domain) and how values
for different input variables should be combined.

In general, there are three basic rules:


1 You can combine values from valid parts in the same test case.
2 You cannot combine values from invalid parts in the same test case.
You will need one test case for each invalid part.
3 You should try to reduce the number of different test cases. Time is
money!

The reason behind Rule 2 is easy to see. When using a value from an
invalid part, the goal of the test is to find out how the SUT behaves for
this invalid part. If we were to combine it with other invalid parts, we
would not know to which invalid part the outcome of the SUT is related.

Actually, Rule 1 should be "you should also combine values from valid
parts in the same test case" (so not just "you can"), because there is no
guarantee that failures are caused by just one input. Of course this cre-
ates the risk of not satisfying Rule 3, since taking all kinds of combi-
nations of inputs into account means more test cases. This is where
coverage criteria for combining the parts come into play.

We always have to choose both a coverage criterion for the valid parts
of the input domains and a coverage criterion for the invalid parts. We
start with descriptions of two coverage criteria for valid parts: All Combinations and Each Choice.


6.6.1 All Combinations coverage

The most obvious choice would be to take, for each input variable, all
valid parts of its input domain, and then to combine (the parts of) these
input variables in every possible way. This is known as All Combinations
coverage (abbreviated as ACC or AC coverage).

The number of test cases we need for AC coverage can be calculated by multiplying the number of valid parts for each input variable. So for
Example 6.1 (see Table 6.3) we need: 1 × 1 × 4 = 4 test cases to reach
100% AC coverage (1 valid part for exam, 1 valid part for coursework and
4 valid parts for the intermediate input variable we considered). Within
the boundaries of the combined parts, we are then free to choose values
for the input variables, as shown in Table 6.6.

test case   exam   coursework   expected outcome   covers parts
1 60 20 A vP1 , vP2 , vP3
2 44 22 B vP1 , vP2 , vP4
3 32 13 C vP1 , vP2 , vP5
4 12 5 D vP1 , vP2 , vP6
TABLE 6.6 Test suite for generate-grade with 100% AC coverage

A test suite consisting of only test cases 1, 2 and 3, for example, would result in 75% AC coverage.

For Example 6.2 (see Table 6.4) we need at least 1 × 1 × 1 × 4 × 1 × 3 = 12 test cases to reach 100% AC coverage. An example of such a test
suite can be found in Table 6.7. From this we can see how much extra
work a little bit of complexity for an input variable can cause. Note
that we have chosen one value for baseprice to use in every test case,
and similarly for discount, specialprice and extraprice. This
makes it easier to spot the values from different domains in the other
columns. However, from a testing quality perspective, it would be better to vary the values in the first, second, third and fifth columns as well.
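The AC test cases are exactly the Cartesian product of one representative per valid part. For the dishwasher model, only colour (4 valid parts) and extras (3 valid parts) contribute more than one factor, which a small sketch can enumerate:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: AC coverage = Cartesian product of valid-part representatives.
// Representative values follow Table 6.4 (colour: vP4..vP7, extras:
// vP9..vP11); the other four variables each contribute one valid part.
public class AllCombinationsCount {
    public static void main(String[] args) {
        int[] colourReps = {0, 1, 5, -5};   // one value per valid part of colour
        int[] extrasReps = {1, 4, 8};       // one value per valid part of extras
        List<int[]> testCases = new ArrayList<>();
        for (int c : colourReps)
            for (int e : extrasReps)
                testCases.add(new int[]{c, e});
        System.out.println(testCases.size()); // prints 12 (= 4 * 3)
    }
}
```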

For Example 6.3 we would need 2 × 3 = 6 test cases to cover all combinations (see Table 6.5). However, vP4 is not valid when vP1 holds, so
these two cannot be combined. Therefore we need only 5 test cases to
achieve maximal AC coverage (see Table 6.8).

So AC coverage seems like a good idea for covering many possibilities. However, since the number of tests needed is the product of the number of valid parts per input variable, it can easily blow up. Imagine that the dishwasher store introduces another discount that gives rise to another 5 parts; then we are already at 60 test cases. And these examples are simple toy problems for illustrating the techniques. You can imagine that real systems have a lot more input variables and corresponding valid parts.


baseprice   discount   specialprice   colour   extraprice   extras   expected discounts   expected outcome   covers parts
1 199.99 15 % 34.50 0 60.00 1 14% colour 259.66 vP1 , vP2 , vP3 ,
0% extra vP4 , vP8 , vP9
2 199.99 15% 34.50 1 60.00 1 7% colour 262.08 vP1 , vP2 , vP3 ,
0% extra vP5 , vP8 , vP9
3 199.99 15% 34.50 5 60.00 1 0% colour 264.49 vP1 , vP2 , vP3 ,
0% extra vP6 , vP8 , vP9
4 199.99 15% 34.50 -5 60.00 1 0% colour 264.49 vP1 , vP2 , vP3 ,
0% extra vP7 , vP8 , vP9
5 199.99 15% 34.50 0 60.00 4 14% colour 253.66 vP1 , vP2 , vP3 ,
10% extra vP4 , vP8 , vP10
6 199.99 15% 34.50 1 60.00 4 7% colour 256.08 vP1 , vP2 , vP3 ,
10% extra vP5 , vP8 , vP10
7 199.99 15% 34.50 5 60.00 4 0% colour 258.49 vP1 , vP2 , vP3 ,
10% extra vP6 , vP8 , vP10
8 199.99 15% 34.50 -5 60.00 4 0% colour 258.49 vP1 , vP2 , vP3 ,
10% extra vP7 , vP8 , vP10
9 199.99 15% 34.50 0 60.00 8 14% colour 250.66 vP1 , vP2 , vP3 ,
15% extra vP4 , vP8 , vP11
10 199.99 15% 34.50 1 60.00 8 7% colour 253.08 vP1 , vP2 , vP3 ,
15% extra vP5 , vP8 , vP11
11 199.99 15% 34.50 5 60.00 8 0% colour 255.49 vP1 , vP2 , vP3 ,
15% extra vP6 , vP8 , vP11
12 199.99 15% 34.50 -5 60.00 8 0% colour 255.49 vP1 , vP2 , vP3 ,
15% extra vP7 , vP8 , vP11

TABLE 6.7 Test suite for dishwasher discount with 100% AC coverage
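The expected outcomes in Table 6.7 follow from the specification, not from the deliberately faulty listing of Example 6.2. They can be reproduced with a hypothetical spec-conformant calculation (note the order of the extras checks):

```java
import java.util.Locale;

// Hypothetical spec-conformant price calculation, used only to reproduce
// the expected outcomes of Table 6.7. It is NOT the (deliberately faulty)
// listing from Example 6.2: the >= 6 case must be checked before >= 2.
public class ExpectedPrice {
    static double price(double baseprice, int discount, double specialprice,
                        int colour, double extraprice, int extras) {
        double special = specialprice;
        if (colour == 0) special *= 0.86;       // white: 14% discount
        else if (colour == 1) special *= 0.93;  // platinum: 7% discount
        double extra = extraprice;
        if (extras >= 6) extra *= 0.85;         // 15% (checked first!)
        else if (extras >= 2) extra *= 0.90;    // 10%
        return baseprice * (1 - discount / 100.0) + special + extra;
    }
    public static void main(String[] args) {
        // Test case 1 of Table 6.7:
        System.out.printf(Locale.ROOT, "%.2f%n",
                          price(199.99, 15, 34.50, 0, 60.00, 1)); // 259.66
    }
}
```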

6.6.2 Each Choice coverage

Another way to reduce the number of test cases is not to take all combinations but instead to ensure that (a value from) each part occurs in
at least one test case. This is known as Each Choice coverage (abbreviated
as ECC or EC coverage).

Looking at Example 6.1 we have 6 valid parts that can be combined in 4 test cases to reach 100% EC coverage. In this case, that happens to be the
same number of test cases as for AC coverage. It is even possible to use
precisely the test cases from Table 6.6 since 100% AC coverage implies
100% EC coverage and the test suite in Table 6.6 does not contain more
test cases than we need for 100% EC coverage.

For Example 6.2, we have 11 valid parts that can be combined into 4
test cases to reach 100% EC coverage. For example, we can just take test
cases 1, 2, 7 and 12 from Table 6.7 and renumber them: see Table 6.9.

Example 6.3 has 5 valid parts in total. All of them are covered by test
cases 1, 4 and 5 from Table 6.8.


test case   list   lastVisited   expected outcome   covers parts
1 [(a,1)] undefined (a,1) vP1 , vP3
2 [(a,1)] 1 (a,1) vP1 , vP5
3 [(b,2),(c,3),(d,4)] undefined (b,2) vP2 , vP3
4 [(b,2),(c,3),(d,4)] 2 (d,4) vP2 , vP4
5 [(b,2),(c,3),(d,4)] 3 (b,2) vP2 , vP5
TABLE 6.8 Test suite for LinkedList with maximal AC coverage

baseprice   discount   specialprice   colour   extraprice   extras   expected discount   expected outcome   covers parts
1 199.99 15% 34.50 0 60.00 1 14% colour 259.66 vP1 , vP2 , vP3 ,
0% extra vP4 , vP8 , vP9
2 199.99 15% 34.50 1 60.00 1 7% colour 262.08 vP1 , vP2 , vP3 ,
0% extra vP5 , vP8 , vP9
3 199.99 15% 34.50 5 60.00 4 0% colour 258.49 vP1 , vP2 , vP3 ,
10% extra vP6 , vP8 , vP10
4 199.99 15% 34.50 -5 60.00 8 0% colour 255.49 vP1 , vP2 , vP3 ,
15% extra vP7 , vP8 , vP11

TABLE 6.9 Test suite for dishwasher discount with 100% EC coverage

ECC is the simplest strategy. For these small and easy examples, it might
even appear that you do not need to know about ECC and could have
come up with these test cases immediately without that much thought.
However, if the number of input variables grows, and accordingly the
number of different parts, then ECC will provide you with a tool that
allows you to work in a structured manner.

ECC is the simplest strategy and also the least effective for obtaining
good test cases, since it does not consider combinations. However, if
you have many different parts, it might be the only strategy that pro-
vides a feasible number of test cases that can be executed in a given
time with a given budget.
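The minimum number of test cases for 100% EC coverage is simply the largest number of valid parts over all input variables; variables with fewer parts just reuse parts. A sketch (producing one valid pairing, not necessarily the one of Table 6.9):

```java
// Sketch: EC coverage needs max(#parts per variable) test cases; variables
// with fewer parts reuse their parts cyclically. Part IDs follow Table 6.4.
public class EachChoicePairing {
    public static void main(String[] args) {
        String[] colour = {"vP4", "vP5", "vP6", "vP7"};
        String[] extras = {"vP9", "vP10", "vP11"};
        int n = Math.max(colour.length, extras.length);  // 4 test cases
        for (int i = 0; i < n; i++)
            System.out.println("test " + (i + 1) + ": "
                + colour[i % colour.length] + " + " + extras[i % extras.length]);
    }
}
```

After four iterations every part of both variables has appeared at least once, which is all that EC coverage demands.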

6.6.3 Other ways of combining

So, ACC seems like a good idea for covering many possibilities, but can
easily blow up and become impractical. ECC is the simplest way of
combining, but is less effective in obtaining good test cases.

In between ACC and ECC, a whole range of other combination coverage criteria exist. These are called t-wise (or t-way) coverage (abbreviated
as TWC), meaning that all possible t-tuples of valid parts of the input
variables must be covered in at least one test case.

The insights underlying t-wise coverage are that:


• not every input variable contributes to every fault, and
• many faults are caused by interactions between a relatively small
number of input variables [70]: typically, t ∈ {2, . . . , 6} suffices.


ACC basically means t is the number of input variables: given all input
variables, every combination of valid parts of all these variables must
be covered in at least one test case.

ECC comes down to t =1: each valid part of the domain of every variable
must be covered in at least one test case.

The higher the number t, the more test cases there will be, and the more
effort the testing will take. t-wise coverage for t = 2 is also called pair-
wise coverage or All Pairs Testing.
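Whether a given test suite achieves 2-wise coverage can be checked by brute force: every pair of parts of two different variables must co-occur in at least one test case. The sketch below uses three abstract two-part variables (not one of the chapter's examples); four test cases suffice where AC would need eight:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: brute-force check of pairwise (t = 2) coverage. Each test case
// is an array of part indices, one per variable. The three two-part
// variables and the suite are abstract stand-ins.
public class PairwiseCheck {
    public static void main(String[] args) {
        int[] parts = {2, 2, 2};          // #valid parts per variable
        int[][] suite = {                 // 4 tests instead of 2*2*2 = 8
            {0, 0, 0}, {0, 1, 1}, {1, 0, 1}, {1, 1, 0}
        };
        Set<String> covered = new HashSet<>();
        for (int[] test : suite)
            for (int v = 0; v < parts.length; v++)
                for (int w = v + 1; w < parts.length; w++)
                    covered.add(v + ":" + test[v] + "," + w + ":" + test[w]);
        int needed = 0;                   // all variable-pair part-pairs
        for (int v = 0; v < parts.length; v++)
            for (int w = v + 1; w < parts.length; w++)
                needed += parts[v] * parts[w];
        System.out.println(covered.size() + " of " + needed + " pairs covered");
    }
}
```

This particular four-test suite covers all 12 pairs; finding such small suites in general is what orthogonal and covering arrays are for.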

Constructing all t-wise combinations and at the same time keeping the
number of test cases as low as possible can be very tedious to do by
hand. Techniques that are frequently used to create t-wise test sets include orthogonal arrays and covering arrays. However, many other techniques and tools have been investigated and discussed. For example,
[51] contains a survey.

Later on, in Chapter 9 we will take a more detailed look at combinatorial or combinations testing. We postpone this to a later chapter since
these criteria do not only apply to domain testing, but they are useful
coverage criteria for many testing techniques and deserve their own
chapter. Combinatorial testing makes a trade-off between cost and ef-
fectiveness.

6.6.4 Invalid Part coverage

As said earlier, you cannot combine values from invalid parts in the
same test case. You will need one test case for each invalid part (see
Section 6.6). This is known as Invalid Part coverage (abbreviated as IPC
or IP coverage).

For Example 6.1, 100% IP coverage comes down to 6 test cases; for instance, the ones shown in Table 6.10. The expected outcome as stated in
the specification is that a specific fault message (FM) is generated.

test case   exam   coursework   expected outcome   covers parts
1 -10 13 FM exam mark cannot be negative iP1
2 83 5 FM exam mark cannot be more than 75 iP2
3 7.5 23 FM exam mark should be an integer iP3
4 23 -10 FM coursework cannot be negative iP4
5 45 50 FM coursework cannot be more than 25 iP5
6 67 7.5 FM coursework should be an integer iP6

TABLE 6.10 Test suite for generate-grade with 100% IP coverage
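As an illustration of the checks that this specification implies, the sketch below is one assumed shape of the input validation (not the course's actual generate-grade code; the method name and message strings are ours, following Table 6.10):

```java
public class GradeInput {
    // Returns a fault message (FM) for an invalid input, or null when both
    // inputs are valid. Doubles are used so non-integer values can be checked.
    static String validate(double exam, double coursework) {
        if (exam != Math.floor(exam)) return "exam mark should be an integer";
        if (exam < 0)  return "exam mark cannot be negative";
        if (exam > 75) return "exam mark cannot be more than 75";
        if (coursework != Math.floor(coursework)) return "coursework should be an integer";
        if (coursework < 0)  return "coursework cannot be negative";
        if (coursework > 25) return "coursework cannot be more than 25";
        return null; // both inputs in their valid parts
    }

    public static void main(String[] args) {
        // Test case 1 of Table 6.10: covers iP1.
        System.out.println(validate(-10, 13)); // prints "exam mark cannot be negative"
    }
}
```

Note that each of the six IP test cases triggers exactly one of the checks, which is why invalid parts cannot be combined in a single test case: the first fault message would mask the second.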

For Example 6.2, we have 12 invalid parts and so need 12 test cases for
IP coverage. The expected outcome of the test cases corresponding to
iP2, iP5, iP7, iP8, iP10 and iP12 is that the SUT (the Java method) or the
compiler will give a type error. However, the expected outcome for iP1,
iP3, iP4, iP6, iP9 and iP11 is not explicitly described in the specification.
We probably already found an error here due to this incomplete
specification, as the method calculate_price does not check whether
the variables are within the desired domains.

For Example 6.3, there is only one invalid part, containing the empty
list.

6.7 Exercises

We conclude the subject of coverage criteria (for now) with some
exercises. The first three are quick and easy, the fourth is not! We
recommend that you definitely take the time to do Exercise 6.7.

EXERCISE 6.4
For your domain model from Exercise 6.1 (annual bonus), design test
suites using EC coverage and AC coverage. How many test cases do
you need to obtain 100% IP coverage?

EXERCISE 6.5
From your domain model from Exercise 6.2 (Z-function), design test
suites using EC coverage and AC coverage. How many test cases do
you need to obtain 100% IP coverage?

EXERCISE 6.6
From your domain model from Exercise 6.3 (hardware store) design test
suites using EC coverage and AC coverage. How many test cases do you
need to obtain 100% IP coverage?

EXERCISE 6.7
a Act as if you are the programmer of the code in Example 6.2
(dishwasher) and you want to test the code using JUnit. You want to use
the test case values from this chapter, repair any faults and test again
until you are convinced the code is correct. Which set of test cases do
you use (i.e., the one with AC coverage, or the one with EC coverage)? Why?
b What about the tests for IP coverage? We already indicated that
test cases corresponding to iP2, iP5, iP7, iP8, iP10 and iP12 will give a
type error for the Java method, so we do not have to test these with
JUnit. However, the expected outcome for iP1, iP3, iP4, iP6, iP9 and
iP11 requires completing the specification. Let us suppose that method
calculate_price should throw an InvalidInputException when
the variables are of the correct type but not within the desired domains.
Define that exception, use it to complete the implementation, and test
iP1, iP3, iP4, iP6, iP9 and iP11 with JUnit.


6.8 Faults that can be found

If a model consists of partitions of input domains, faults will be found
when inputting a value of some part Pi in the SUT results in anything
that is not the functionality fi belonging to the values of this part. These
are basically computation faults: the wrong function is applied to some
part Pi in the implementation.

Moreover, depending on the coverage criterion that has been selected,
we will find faults due to interactions between input variables. A t-way
interaction fault is a fault triggered by a particular combination of
t input variables. A simple fault is a t-way fault where t = 1; a pairwise
fault is a t-way fault where t = 2.

Chapter contents 7

Input domain boundaries

Overview 115

1 Introduction 115
2 Boundaries: ON and OFF points 116
3 Finding ON and OFF points for different types of values 117
3.1 Numerical types 117
3.2 Non-numerical types 118
3.3 User-defined types 118
3.4 n-dimensional types 120
4 1x1 boundary coverage 122
5 The domain test matrix 122
5.1 Domain test matrices for generate-grade 123
5.2 Exercises 128
6 Faults that can be found 130

Chapter 7

Input domain boundaries

OVERVIEW

In the previous chapter, we modelled the relevant input domains using
equivalence classes. We divided (partitioned) the input domains into
subsets that result in similar behaviour from the test object, and then
tested with only a representative value from each equivalence class. In
this chapter we will analyse the boundaries of these input domains to
make sure we test them well.

The boundary testing that we will do in this chapter is going to be very
detailed. Some might argue that it is way too detailed for boundary
testing, and that other courses simply explain this in one page. They
might also argue that this elaborate technique is not used in practice, so
why do it? However, it is our belief that to learn testing and obtain the
testing mindset that we are aiming at, you should do this thoroughly at
least once in your life. This chapter is your chance to do that.

LEARNING GOALS
After studying this chapter, you are expected to:
– understand and be able to describe domain boundary testing
– be able to apply boundary value analysis using the 1×1 coverage
criterion.

CONTENTS

7.1 Introduction

This chapter is organised according to the same format as the previous
chapter, i.e., make a model – choose a coverage criterion – design test
cases. For the boundary analysis technique discussed in this chapter, it
turns out that, in practice, one coverage criterion is sufficient, so we do
not really have to choose.

In the previous chapter, we discussed a domain testing technique that
partitions the domain of each input variable into parts (equivalence
classes), such that all values in one part are considered equivalent for
testing some behavioural characteristic. In the present chapter, we
introduce another domain testing technique, called boundary testing or
boundary value analysis. It is often – but not only, and not always – used
after partitioning, since common sense indicates that points near or on
a boundary are more sensitive to program faults than other points.


[Figure: a number line with a closed boundary (•) at min = 5 and an open
boundary (◦) at max = 10, depicting the domain 5 ≤ X < 10]

FIGURE 7.1 Closed and open boundaries in a one-dimensional domain

[Figure: a two-dimensional plane divided into parts P1, P2, P3 and P4 by
a horizontal boundary a and a vertical boundary b; the dashes along each
boundary line point towards the parts that include the boundary points]
FIGURE 7.2 Open and closed boundaries in a two-dimensional domain

Boundary testing has its roots in papers by White and Cohen in the late
1970s, early 1980s [125]. Jeng and Weyuker [55] simplified it in an article
in the mid-1990s. Since then, it has been described in several books, for
instance [14] and [11].

In the sequel we use the terms “domain” and “part” interchangeably,
i.e., with “domain” we may also mean a “part” of an input domain
after partitioning.

7.2 Boundaries: ON and OFF points

The boundaries of a numerical domain can be open or closed. The nature
of a boundary of a one-dimensional numerical domain can be made
visible in the interval notation we have been using, for instance [5, 10[.
One-dimensional domains can also be depicted by lines, using • to
denote a closed boundary and ◦ to denote an open boundary (for an
example see Figure 7.1).

In two-dimensional scenarios, we use little dashes that point towards
the area that includes the boundary values. For example, in Figure 7.2,
points on the boundary line a belong to domain P3 or P4, because the
dashes under boundary a point down to these domains. Similarly, points
on the boundary line b belong to P1 or P4 because the dashes of
boundary b point up towards these domains.

Boundary testing is not difficult, but possibly a bit tedious [11]. First we
need some more definitions.

We need a way to specify the essential properties of values on and
around the boundaries of a specific domain. We describe these in
Definition 7.1.

DEFINITION 7.1 An IN point belongs to the domain, an OUT point does not.
An ON point is on a boundary, an OFF point is near a boundary.


[Figure: an interval labelled IN with OUT regions on either side; ON
points lie on both boundaries, the OFF point of the closed left boundary
lies just OUT of the domain, and the OFF point of the open right
boundary lies just IN it]

FIGURE 7.3 An illustration of IN, OUT, ON and OFF points

The definition states that OFF points should be near the boundary. The
interpretation of this depends on the type of the domain. For integers
“near” means +1 or -1. For decimals, it means the smallest supported
decimal fraction.

Moreover, in theory, each ON point has two OFF points: one to the left
of the boundary and one to the right (or: one below the boundary and
one above). However, in some situations, only one of those OFF points
is necessary, as described below.

For boundaries that are formalised by an equality (e.g., x = 23, in an
integer domain), we indeed choose both OFF points, i.e., one to the right
of the boundary (in this case x = 24) and one to the left (x = 22).

For boundaries that are formalised by an inequality (e.g., x ≥ 0 or y < 0),
we use the following rules to cover all four kinds of points:
• if the ON point is IN, choose the OFF point OUT, and
• if the ON point is OUT, choose the OFF point IN.

So for inequalities, we choose only one of the two available OFF points
for an ON point, as shown in Figure 7.3. There, the leftmost boundary
point (ON) is IN the domain (since it is a closed boundary), so we choose
its OFF point OUT of the domain. Similarly, the rightmost boundary
point (ON) is OUT, so we choose its OFF point IN the domain.

Note that in the case of x = 23 above, these rules apply as well: the ON
point is IN, both OFF points are OUT.
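For integer boundaries these rules can be written down mechanically. The sketch below (a hypothetical helper, not part of the book's code) derives the 1 ON and 1 OFF point for each kind of inequality:

```java
public class OnOffPoints {
    // For an integer boundary "x REL k", return {onPoint, offPoint}.
    // The ON point always sits on the boundary value k; the OFF point sits
    // one unit away, on the opposite side of IN/OUT from the ON point.
    static int[] onOff(String rel, int k) {
        switch (rel) {
            case ">=": return new int[]{k, k - 1}; // ON is IN,  OFF is OUT
            case ">":  return new int[]{k, k + 1}; // ON is OUT, OFF is IN
            case "<=": return new int[]{k, k + 1}; // ON is IN,  OFF is OUT
            case "<":  return new int[]{k, k - 1}; // ON is OUT, OFF is IN
            default: throw new IllegalArgumentException("unknown relation " + rel);
        }
    }

    public static void main(String[] args) {
        int[] p = onOff(">=", 0); // for x >= 0: ON = 0 (IN), OFF = -1 (OUT)
        System.out.println("ON=" + p[0] + " OFF=" + p[1]); // prints ON=0 OFF=-1
    }
}
```

You can check the helper against the integer rows of the table in Exercise 7.1 below: for y ≤ 6 it yields ON = 6 and OFF = 7, and for y < 6 it yields ON = 6 and OFF = 5.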

7.3 Finding ON and OFF points for different types of values

Depending on the type of the values in a domain, finding appropriate
ON and OFF points is either easy or requires a bit of thinking. Here we
describe a few cases of interest.

Moreover, domains can have dimensions higher than 1, i.e., they can be
n-dimensional. How ON and OFF points are chosen for these is discussed
in Section 7.3.4.

7.3.1 Numerical types

Finding ON and OFF points for numerical types is straightforward. In
Exercise 7.1 you can find some examples.


EXERCISE 7.1
Look closely at the examples in the table and make sure you understand
these in the light of the previous definitions.
   Variable  Boundary  ON point  OFF point
1  y ∈ N     y ≤ 6     6         7
2  y ∈ N     y < 6     6         5
3  x ∈ R     x > 10    10        10.00001
4  x ∈ R     x ≤ 20    20        20.00001

7.3.2 Non-numerical types

Partitions of non-numerical types like strings, booleans, or
enumerations are typically listed or described by a property. ON and OFF
points here are the same as IN and OUT points, respectively. If possible,
you can try to minimise the difference between ON and OFF points.

For example, for an input of type String and a domain like
{“red”, “orange”, “yellow”, “green”},
pick as ON point “orange” and as OFF point “oronge”, changing just a
single character.
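A minimal sketch of this ON/OFF choice for the string domain above (the class and helper names are ours):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StringOnOff {
    // The listed (enumerated) string domain from the example above.
    static final Set<String> DOMAIN =
            new HashSet<>(Arrays.asList("red", "orange", "yellow", "green"));

    static boolean isIn(String s) { return DOMAIN.contains(s); }

    public static void main(String[] args) {
        // ON point: a member of the domain; OFF point: a near-miss obtained
        // by changing a single character, so it falls just OUT of the domain.
        System.out.println(isIn("orange")); // prints true
        System.out.println(isIn("oronge")); // prints false
    }
}
```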

However, if the input has an enumerated type that uses identifiers that
behave as constants, this is not possible. Imagine we have an
enumerated type

enum Colour {red, orange, yellow, green, blue, pink}

Then for a variable c and a part {red, orange, green}, the ON
point is IN the part (e.g. red) and the OFF point is OUT of the part
(e.g. pink).

The boolean type is often a pre-defined enumeration of the values False
and True.

7.3.3 User-defined types

Input domains for object-oriented programs typically include objects
that are instances of complex user-defined data types [14]. So how can
we define ON and OFF points for these?

Some user-defined types are essentially the same as primitive types,
and their domain boundaries can be modelled the same way as for
primitive types. We have actually already seen this in Example 6.3: there
we modelled a list using the size of the list and the lastVisited
position. Both of these can be modelled with numerical partitions.

Below we discuss two other kinds of user-defined types: classes with
attributes, and state abstractions.

Classes with attributes

As another example, consider the class Money, a frequently used
example in JUnit (http://junit.sourceforge.net/):

118
Chapter 7 Input domain boundaries

class Money {
    private int fAmount;
    private String fCurrency;

    public Money(int amount, String currency) {
        fAmount = amount;
        fCurrency = currency;
    }

    public int amount() { return fAmount; }

    public String currency() { return fCurrency; }
}

Each instance of class Money can be represented by an amount and a
type of currency. Both of these can be modelled by types for which we
know how to pick ON and OFF points.

State abstractions
Other classes can be less simple. Sometimes we can use state
abstractions to define a suitable partition [14, p. 408]. For example, for
any implementation of a stack or a queue we can define three abstract
states: empty, loaded and full. We can use these state abstractions to
define partition conditions, for example:

vP1 = empty vP2 = not empty

In this case, there is only one boundary: between vP1 and vP2 , and this
boundary itself does not have a value. Therefore, we cannot choose an
ON point exactly as it was meant: on the boundary. Instead, we use
the following rules for choosing an ON and an OFF point when crossing
a boundary from part (state) P into part (state) Q, both defined using
state abstractions:
• an ON point is an object that satisfies the partition condition of P (so it
is IN P), but the smallest possible change would cause a state change,
and hence an object that does not satisfy the partition condition (and
hence is OUT of P, and IN Q).
• an OFF point is an object that does not satisfy the partition condition
of P (so it is OUT of P, and IN Q), but the smallest possible change
would cause a state change, and hence an object that does satisfy it.

Note that, following these rules, crossing the boundary from P to Q
yields an ON point IN P (say p), and an OFF point OUT of P (hence IN
Q, say q). However, crossing the same boundary in the opposite
direction will yield an ON point IN Q (hence OUT of P, for instance q
again), and an OFF point OUT of Q (hence IN P, so for instance p again).
In both cases, point p is supposed to exhibit behaviour corresponding to P,
and q is supposed to be treated according to Q’s specification. In other
words, we do not have to cross each border in both directions. One ON
point and one OFF point per boundary are sufficient. Alternatively, we
could say that in the case of state transitions, it is not relevant to get the
ON/OFF/IN/OUT terminology exactly right, as what is labelled as an
ON and IN point when crossing a boundary in one direction becomes
an OFF and OUT point when crossing it in the other direction. However,
in both cases, it is the same point, with the same expected behaviour.


FIGURE 7.4 Examples of ON and OFF points in case of state abstractions

For example, let us look at the stack s, and suppose M > 1 is the
maximum number of possible values in the stack. If we look at the
partition of the input domain into vP1 (i.e., empty) and vP2 (i.e., not
empty) defined above, we can make the following choices. The empty
stack has s.size = 0, and the smallest possible change that causes the
stack to be not empty any more is pushing an element onto s, resulting
in s.size = 1.

In Figure 7.4 we determine ON and OFF points for all borders between
the three states empty, loaded and full of a stack. Note that we do not
consider the transition from state loaded to state loaded. This transition
happens, for instance, when pushing an element onto a stack that is not
empty and not full, resulting in a stack that is still not full. We skip this
transition on purpose because for the discussion in this chapter only
transitions that cross a border are of interest, while the transition from
loaded to loaded happens “in the middle” of the area satisfying the state
condition loaded.
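The state-abstraction predicates and the points around the empty/loaded boundary can be sketched for a bounded stack as follows (the capacity M = 5 and the use of java.util.ArrayDeque are assumptions for illustration):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StackStates {
    static final int M = 5; // assumed maximum capacity, M > 1

    // State-abstraction predicates for a bounded stack.
    static boolean isEmpty(Deque<Integer> s)  { return s.size() == 0; }
    static boolean isFull(Deque<Integer> s)   { return s.size() == M; }
    static boolean isLoaded(Deque<Integer> s) { return !isEmpty(s) && !isFull(s); }

    // Helper: build a stack holding n elements.
    static Deque<Integer> stackOfSize(int n) {
        Deque<Integer> s = new ArrayDeque<>();
        for (int i = 0; i < n; i++) s.push(i);
        return s;
    }

    public static void main(String[] args) {
        // empty/loaded boundary: the ON point is the empty stack (size 0);
        // the smallest possible change (one push) crosses the boundary,
        // giving the OFF point: a stack of size 1.
        System.out.println(isEmpty(stackOfSize(0)) && isLoaded(stackOfSize(1)));
    }
}
```

The loaded/full boundary can be sketched the same way: a stack of size M − 1 on one side and a stack of size M on the other.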

EXERCISE 7.2
What are the ON, OFF, IN and OUT points for the boundaries resulting
from partitioning according to the state conditions full and not full?
Answer the same question for loaded and not loaded.

7.3.4 n-dimensional types

If boundary conditions involve more than one variable, ON and OFF
points can be found by solving equations. In Example 6.1 (see Table 6.3,
and Figure 6.5, which we repeat here for your convenience in Figure 7.5)
we have boundary conditions with 2 variables, exam and coursework,
so the parts vP3 through vP6 are 2-dimensional.

How can we proceed to find ON and OFF points for two of the
boundaries of vP3, which are defined by 70 ≤ exam + coursework ≤ 100?
First, we pick a value for one of the variables, let us say coursework. The
value we pick should not violate any of the other relevant parts
concerning coursework in Table 6.3; in this case vP2, which states that
0 ≤ coursework ≤ 25. A rule of thumb here is to pick, if possible, a
mid-point of the valid part for the variable. In this case coursework = 12
is somewhere in the middle of vP2.


[Figure: exam (e) on the vertical axis (0–100) against coursework (c) on
the horizontal axis; the lines e = 75, c = 25 and e + c = 100, 70, 50, 30
delimit the regions A, B, C and D]

FIGURE 7.5 Equivalence classes of the generate-grade app

Look at the lower bound of part vP3, which is 70 ≤ exam + coursework.
With our chosen value of 12 for coursework, we get 70 ≤ exam + 12,
which yields 58 ≤ exam. The value 58 satisfies the boundaries for exam
(i.e., vP1), so we can pick:

ON point: (coursework: 12, exam: 58), and

OFF point: (coursework: 12, exam: 57).

The upper bound of vP3 is exam + coursework ≤ 100. Here filling in the
number 12 for coursework results in exam ≤ 88. So if coursework is 12,
then to end up on the upper bound of vP3 we have to choose exam = 88.
However, this violates vP1 (0 ≤ exam ≤ 75), so this cannot be used as an
ON point. Here, Figure 7.5 comes in handy: there we see immediately
that the only point on the upper bound of vP3 that satisfies both vP1 and
vP2 is (25, 75). Hence for the upper bound of vP3 we get:

ON point: (coursework: 25, exam: 75), and

OFF point: (coursework: 25, exam: 76).

Note that there are other possibilities. One of these is to pick a value
for exam first and then solve the equation for coursework; then we get
different ON and OFF points.

EXERCISE 7.3
Find ON and OFF points for vP3 by picking a value for exam first.


From Exercise 7.3 it is clear that, in theory, it does not matter whether we
fix the value of exam or the value of coursework. In practice, however, it
does matter: starting from a fixed value of coursework is much less work.
Moreover, it may also matter which representation of the partition we
use: from Figure 7.5 it is immediately clear that fixing a value for
coursework (let us say 12) ensures that we encounter all boundaries
involved in the generate-grade problem. If, however, we choose to start
with exam, we have to find a new starting value for each boundary.

7.4 1x1 boundary coverage

There is one coverage criterion for boundary value analysis that has
proven to be sufficient for all possible scenarios ([55], [14]):

DEFINITION 7.2 The 1×1 (“one-by-one”) coverage criterion calls for 1 ON point and 1 OFF
point for each domain boundary. The OFF point should be as close as
possible to the ON point.

As explained in Section 7.2, for boundaries defined by an equality, we
choose two OFF points. And, as explained in the previous section, for
boundaries defined by state conditions, we simply use two points, one
on each side of the boundary, and are not concerned about the exact
terminology.

You might wonder why the OFF point should be as close as possible to
the ON point. This will be explained in Section 7.6.

7.5 The domain test matrix

The domain test matrix (DTM) from Robert Binder [14] provides a
convenient representation to design a test suite for boundary value
analysis with the 1×1 coverage criterion. Figure 7.6 shows what an
empty matrix looks like and how it is built up of variables, boundary
conditions, ON/OFF/IN points and test cases.

Each matrix contains the conditions that describe exactly one domain
whose boundaries we want to analyse (for example an equivalence
class). Each column is a test case. Since each test case is meant
to test one aspect of one boundary of the domain, only one ON or OFF
point appears in each test. These values appear on the diagonal of the
matrix. IN points are generated for all other variables in each test case;
these should not be boundary points. The IN points need to be chosen
before or after the ON and OFF points, depending on the conditions and
what you need. IN points are chosen by guessing, by (pseudo)random
selection algorithms, or by analysing the situation [14], making sure that
the ON point is as close to the OFF point as possible [55].

In the remainder of this section, we demonstrate the use of the domain
test matrix by designing a 1×1 test suite for a boundary value analysis
of Example 6.1. The other two examples from Chapter 6 are left as an
exercise. But first, we want to make an observation about the number
of ON/OFF points needed for a boundary between two adjacent parts.


FIGURE 7.6 Domain test matrix

EXERCISE 7.4
For each of the following examples of situations with two adjacent
integer domains (hence with a common boundary), determine the ON
and OFF points needed for that boundary, when looking at it from both
sides. The boundary itself is y = 23 in all cases.
a Which ON and OFF points are needed for y > 23? And for y < 23?
b The same question for y ≥ 23 and y < 23.
c The same question for y > 23 and y ≤ 23.

From Exercise 7.4 we conclude that, in situations where there are two
adjacent domains:
• in all cases, the ON point can be reused for both adjacent domains;
• if the boundary is IN exactly one of the two domains, the OFF points
for both domains are the same;
• otherwise, two different OFF points are needed, one for each domain.

7.5.1 Domain test matrices for generate-grade

In Example 6.1 we distinguished 2 inputs: exam (from now on
abbreviated as e) and coursework (c). The equivalence classes are
described by means of conditions on the intermediate input variable
e + c. The valid ones from Table 6.3 are:

123
Universidad Politécnica de Valencia Structured Software testing

input  part ID  values     comment
e      vP1      [0, 75]    valid
c      vP2      [0, 25]    valid
e+c    vP3      [70, 100]  outcome A
e+c    vP4      [50, 70[   outcome B
e+c    vP5      [30, 50[   outcome C
e+c    vP6      [0, 30[    outcome D

We now need to make a matrix for each equivalence class. This way we
can analyse its boundaries. There are two variables, so the skeleton of
the DTM will look like this:

Now we have to look at the different conditions that define the
boundaries of the equivalence classes. Again (as in Exercise 7.3) we can
choose to express the conditions on e using c, or the other way around.
Let us express the conditions on e in terms of c, which means that we can
analyse the boundary conditions of the domain induced by vP2 and
vP1/vP3 using the following DTM:


To analyse the boundaries induced by these conditions (i.e. domain A
in Figure 7.5), we start filling the diagonal of the matrix with one ON
and one OFF point per boundary, as the 1×1 strategy dictates.

• Test case 1 is column 1 of the matrix and corresponds to the ON point
for partition condition c ≥ 0 and some typical IN point for e. Note
that when c = 0, the conditions that e should satisfy are: e ≥ 0,
e ≤ 75, e ≤ 100 and e ≥ 70. This means 70 ≤ e ≤ 75, and choosing 73
puts us somewhere in the middle.
somewhere in the middle.
• Test case 2 is column 2 of the matrix and corresponds to the OFF point
for partition condition c ≥ 0 and some typical IN point for e. We
chose the same IN as for test case 1.
• Similarly, test cases 3 and 4 correspond to the ON and OFF points for
partition condition c ≤ 25 and some typical IN point for e.
• Test cases 5 to 12 are for the boundaries concerning e + c. Based on
earlier experience, we pick some typical value for c, solve the resulting
equation to get e, and then determine the appropriate ON and OFF
points.
Note here that the typical c-values for test cases 5-6 need to be the
same because the ON and OFF points of e should be close together.
However, for test cases 7-8 (and 9-10, and 11-12), we can choose a
different c-value than 12. Trying to avoid repeating the point values
might increase the chance of revealing unexpected bugs [14]. We did
not do this in the table, you can try out yourself how this changes the
test cases.

So we have a total of 12 test cases for domain A. If we draw these on a
graph representing the input domain partition, it becomes even clearer
what the idea behind the 1×1 strategy is. The red dots below depict the
test cases from the domain matrix above.


[Figure: the partition graph of Figure 7.5, extended with the outer
regions labelled P, K, L, H, I, J, M and N; red dots mark the 12 test cases
for domain A on and around its boundaries]

Note that for test cases 9 and 10, picking the typical value c = 12 causes
the ON point for e ≤ 100 − c to contradict e ≤ 75. Here we can make
a test case design decision. If we also choose to test the boundary
between domains K and P, we need to violate e ≤ 75. If we have
information indicating that this boundary is not at risk of being violated,
we can choose another typical value for c (this can only be 25, a boundary
value), and then test cases 9 and 10 coincide with the points (25, 75) and
(25, 76). The latter always seems to be a good point to test, and if we do
not have it, we could add it to the test suite.

Also, note that boundary value analysis as described above does not
provide test cases for the “corners” of a domain (i.e., the points where
two or more boundaries coincide, like (0,75) for instance). If you want to
include the corners as well, you will be doing robust worst-case boundary
testing, as Jorgensen calls it [57]. There is a trade-off between being
thorough and keeping the number of test cases to a minimum.

Let us continue with the DTM for the domain induced by vP2 and
vP1/vP4. The domain matrix is below:


This adds 6 test cases, indicated with green dots in the graph:

[Figure: the same partition graph, now with green dots added for the 6
new test cases around the boundaries of domain B]

As you can see, the domain matrix for vP4 generates some test cases that
we already added when analysing vP3 , so we do not add these again.
This is indicated by the lack of a test case number in the top row. The
reason for not adding the test case is shown in the bottom row.

For the domains induced by vP5 and vP6, the domain matrices are:


The resulting graph with depicted test cases can be found in Figure 7.7.

7.5.2 Exercises

The objective of the following exercise is not really to come up with the
test cases, but to practise making DTMs, so that when we are doing it
for problems with dimensions higher than 2 – for which graphs are not
easily drawn – you will be fluent and confident at it. Then you will be
able to use DTMs as a way to grasp higher dimensions easily.


[Figure: the same partition graph, with the test cases for vP5 and vP6
added]

FIGURE 7.7 Test cases for vP5 and vP6 from Example 6.1

Determining the structure of your DTMs requires practice and
experience. It may take some trial and error to get it right. And of course,
“right” just means that you end up with a good test suite, not that you
have the perfect DTM (whatever that may be).

EXERCISE 7.5
Based on your domain model for the Z function from Exercise 6.2,
construct the DTMs needed to get test cases that give 100% 1×1 boundary
coverage. Draw a graph that illustrates the domain partition you
modelled and how the test cases from the DTMs cover the parts and
boundaries (similar to that in Figure 7.5).
Hint: Use Excel or another spreadsheet program for the DTMs. You can
draw the graph on paper.

After finishing this exercise and really going through the effort of
making all of the DTMs, you will have realised that (although sometimes
a bit tedious) DTMs are a useful tool to analyse boundaries. You will
have acquired a good feeling for how the boundaries are related to the
functionality you are testing. You will also have noticed that using Excel
or a similar program can help a lot here. For the two-dimensional
case (i.e. two variables) you can easily check whether all boundaries
have been analysed: you need an ON and an OFF point for each of
the boundaries induced by the boundary conditions.


EXERCISE 7.6
Based on the domain model from Exercise 6.3 (hardware store) design
a test suite that gives 100% 1×1 boundary coverage. Draw a figure that
illustrates the domain partitions we modelled and how the test cases
cover the partitions and boundaries.

EXERCISE 7.7
a Design test cases for Example 6.2 (dishwasher, see Table 6.4 on
page 104) using the DTM and the 1×1 boundary strategy.
Hint: There are a lot of variables here, so at first sight, it looks like our
first exercise with more than 2 dimensions. But is it? How can we make
it simpler? On which variables does the functionality depend? If we
reduce the number of variables in the DTM, what consequences does
this have for our test cases?
b Draw your test cases in a graph representing the input domain
partitions.
c Add JUnit tests for each of the designed test cases to your code from
Exercise 6.7.

7.6 Faults that can be found

With a model that consists of input domain partitions, we encounter
a failure when inputting a value of some part Pi in the SUT results in
something other than the functionality fi that belongs to the values of
this part.

If we then inspect the code to look for the cause of the failure, we can
find the following two types of faults:
• computation faults: the wrong function is applied to values of Pi in the
implementation.
• domain faults: the boundary between two parts in the implementation
is wrong. General examples of domain faults for open boundaries are
depicted in Figure 7.8, and similarly for closed boundaries in Figure 7.9.

In Figure 7.8, the situation is as follows. Suppose that our model
describes a part A of some input domain as an interval of which the left
boundary is open. If we apply the 1×1 strategy, we have to indicate 1
ON and 1 OFF point for this boundary. The ON point (which happens to
be OUT in this case) should have some computation related to part B,
while the OFF point (which should then be IN) should have some
computation related to A.

Figure 7.8 shows four kinds of faults that will be detected with this
strategy. We discuss them all:
• The closure fault occurs when a closed rather than an open boundary
is implemented. It will be found with this strategy because the ON
point should give rise to behaviour different from A (since it is OUT),
but we get behaviour corresponding to A instead.


FIGURE 7.8 Possible domain faults for open boundaries based on [11]

• A boundary shifted right cannot be detected by the ON point only, since
the ON point gives rise to B as expected. However, the OFF point
gives rise to B too, and this results in the failure. In this example, we
see that to detect the slightest shift to the right, we need to pick the
OFF point as close as possible to the ON point.
• A boundary shifted left can be detected by the ON point, which now gets
the A computation but was expected to get B. Note that we cannot
distinguish this fault from the closure fault, but we do know there is a
fault.
• For the missing boundary, the ON point and the OFF point get the same
computation, which results in the failure.

For closed domains the cases are similar, except now the strategy gives
us an ON point (that is IN) that should have computation related to A,
and an OFF point (that is OUT) that should have computation related to
B (see Figure 7.9).
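To make this reasoning concrete, here is a small sketch (hypothetical, not from the text): we take part A to be the open interval x > 10 and part B to be x ≤ 10, and show how the two test points of the 1×1 strategy expose each fault for the open boundary of Figure 7.8. The boundary value 10 and all function names are our own invention.

```python
# Sketch with hypothetical parts: A is the open interval x > 10 and
# B is x <= 10. The 1x1 strategy picks one ON point (exactly on the
# boundary, hence OUT of A) and one OFF point just inside A.

def correct(x):                # oracle: the intended partition
    return "A" if x > 10 else "B"

def closure_fault(x):          # closed instead of open boundary
    return "A" if x >= 10 else "B"

def shifted_right(x):          # boundary implemented too far right
    return "A" if x > 12 else "B"

def shifted_left(x):           # boundary implemented too far left
    return "A" if x > 8 else "B"

ON, OFF = 10, 10.01            # pick OFF as close to ON as practical

def failing_points(impl):
    """Return the test points on which impl disagrees with the oracle."""
    return [p for p in (ON, OFF) if impl(p) != correct(p)]
```

Running failing_points on each faulty variant shows that the ON point exposes the closure and left-shift faults, while only the OFF point exposes the right shift, matching the reasoning above.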

EXERCISE 7.8
Figure 7.9 shows domain boundary faults for closed one-dimensional
domains. Make similar pictures for the closed two-dimensional case
and reason why the 1×1 strategy finds the faults. Imagine that the in-
tended closed domain A to be implemented is like this:

Universidad Politécnica de Valencia Structured Software testing

FIGURE 7.9 Possible domain faults for closed boundaries based on [11]

Draw and reason how the ON and OFF points can find the following bugs
in this 2-dimensional boundary: closure bug, shift up, and incorrect
inclination. Also, think of a scenario in which the 1×1 strategy might
miss an incorrectly implemented boundary.


Chapter contents 8

Decision tables

Overview 135

1 Introduction 135
2 Decision tables 136
2.1 Extended/limited entry decision tables 138
2.2 Implicit variant: do not care (DNC) 140
2.3 Implicit variant: cannot happen (CNH) 140
2.4 Implicit variant: do not know (DNK) 141
2.5 Summary 142
3 Checking the decision table 144
3.1 Check implicit variants 144
3.2 Check decision table properties 144
3.3 Check testability 145
4 Coverage criteria for decision tables 145
5 Exercises 145

Chapter 8

Decision tables

OVERVIEW

In this chapter we use decision tables as the model to design test cases.
It fits with the way we present the techniques in this course as follows:

Make a model               Model the SUT using a decision table

Pick a coverage criterion  All explicit variant coverage

Design test cases

Decision tables are essentially just another point of view or representation
of some information. Constructing and checking the decision
table makes it possible to detect combinations of conditions that otherwise
might not have been considered and therefore not tested or developed.
It also makes it possible to detect incompleteness or ambiguities
in the information. While making a decision table, the conditions become
much clearer. This makes it easier to observe irregularities in the
conditions than when they are expressed only in text or code.

Decision tables should not be seen as a technique that is completely
different from equivalence classes and boundary value analysis. All
these techniques have overlap but are also complementary. They all are
merely ways to structure the testing problem.

LEARNING GOALS
After studying this chapter, you are expected to:
– understand the goals of decision table testing
– be able to use decision tables to design test cases.

CONTENTS

8.1 Introduction

Imagine you are trying to test a product that makes many different
decisions based on combinations of input variables, or you are reading
a specification that lists a series of conditions under which different
events will occur, or you need to create test data for a system that
processes these data in complex ways. Then making a decision table as
your test model and covering it might be a useful approach. Sometimes
you might even find a decision table in a specification. Decision tables
are extensively used in all kinds of applications in business data
processing, embedded real-time and knowledge-based applications.

Number of claims      =0         =0         =1         =1         ∈ [2,4]    ∈ [2,4]    ≥5
Age                   ∈ [16,25]  ∈ ]25,85]  ∈ [16,25]  ∈ ]25,85]  ∈ [16,25]  ∈ ]25,85]  ∈ [16,85]
Increment premium     50         25         100        50         400        200        0
Send warning letter   no         no         yes        no         yes        yes        no
Cancel policy         no         no         no         no         no         no         yes

TABLE 8.1 Decision table for car insurance example

For example, imagine a car insurance company. Their business rules to
decide which action to take when a client of a particular age has filed a
particular number of claims may, for example, be formalised using the
decision table in Table 8.1 (example adapted from [123, 14]).

Decision tables are a convenient representation of software and system
requirements where a particular response is to be selected by evaluating
many related conditions. They are a precise yet compact way to model
complicated logic. Almost every software application has functionalities
that can be modelled this way.

Decision tables have been around for a long time [64]. Their use in
testing was first described in [50] where they were also called condition
tables.

Decision tables have a strong logical basis and so they are also called
a logic-based testing technique. Decision table testing consists of making
a decision table or finding one in the SUT’s specification and then
covering it. Decision table testing can help to test combinations of input
conditions that might not be thoroughly tested with input domain
partition and boundary value analysis.

Section 8.2 will introduce the concepts and terminology behind deci-
sion tables. Section 8.3 discusses how we can check whether the deci-
sion table is suitable for testing. Subsequently, Section 8.4 explains how
to cover decision tables. Finally there is a section containing several
exercises. You will learn that the process of making and checking a de-
cision table for a SUT is again an exploratory process of deciding which
conditions interact, how they interact, and which conditions to include.

8.2 Decision tables

Decision tables consist of five parts as shown in Table 8.2: conditions
section, condition entries, actions section, action entries and rules. Table
8.2 is a column-based decision table, the rules being specified in
columns. We can also turn it around and specify the rules in rows,
resulting in a row-based decision table.


                                  Rules
                       Rule 1   Rule 2   Rule 3   Rule 4
Conditions  Condition 1
section     Condition 2         condition entries
            Condition 3
            Condition 4
Actions     Action 1            action entries
section     Action 2

TABLE 8.2 Structure of a column-based decision table

The conditions section lists the conditions. Inputs or environmental
factors that are referenced in the conditions section are called decision
variables. So, a condition expresses a relationship between decision
variables and hence is a predicate. To say that a condition is satisfied, or
met, just means that the predicate is true.

The actions section lists responses to be produced when corresponding
combinations of conditions are true or false.

Each rule specifies the conditions that should be met to take the indi-
cated actions. The implicit logical operator between conditions is and
(∧). The order in which inputs arrive and conditions are evaluated is
irrelevant.

EXAMPLE 8.1 Table 8.3 shows (the column-based version of) a decision table
describing an insurance renewal specification adapted from [123, 14]. Let us
look at some of the properties of Table 8.3.
• There are two decision variables: num_claims and age.
• There are three possible actions: increase the premium amount of the
insurance, send a warning letter, cancel the policy.
• There are 4 conditions for the variable num_claims: num_claims =
0, num_claims = 1, num_claims ∈ [2, 4] and num_claims ≥ 5. They
coincide with the partitions of num_claims from Chapter 6.
• Similarly, age has 2 conditions/valid parts: 16 ≤ age ≤ 25 and 25 <
age ≤ 85.
• Consequently, we could combine these conditions into 4 × 2 = 8 rules.
However, for num_claims ≥ 5, the two possible conditions for age are
combined into one condition: (16 ≤ age ≤ 25 ∨ 25 < age ≤ 85) ⇔
16 ≤ age ≤ 85, resulting in the 7 rules of Table 8.3.

Table 8.4 shows the row-based version of Table 8.3; here the rules
correspond to the rows of the table.
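To see what an implementation of these rules might look like, here is a minimal Python sketch of the business rules in Tables 8.3 and 8.4. The function name renew and the behaviour outside the specified domain (raising an error) are our own assumptions, not part of the specification.

```python
def renew(num_claims, age):
    """Illustrative sketch of Tables 8.3/8.4; returns the actions
    (premium_increment, send_warning_letter, cancel_policy)."""
    if num_claims < 0 or not 16 <= age <= 85:
        raise ValueError("outside the specified domain (a DNK variant)")
    if num_claims >= 5:
        return (0, False, True)              # rule 7; age is DNC here
    young = age <= 25                        # i.e. 16 <= age <= 25
    if num_claims == 0:
        return (50, False, False) if young else (25, False, False)   # rules 1-2
    if num_claims == 1:
        return (100, True, False) if young else (50, False, False)   # rules 3-4
    return (400, True, False) if young else (200, True, False)       # rules 5-6
```

For example, renew(1, 18) returns (100, True, False), the actions of rule 3.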


                         1          2          3          4          5          6          7
Conditions  num_claims   =0         =0         =1         =1         ∈ [2,4]    ∈ [2,4]    ≥5
section     age          ∈ [16,25]  ∈ ]25,85]  ∈ [16,25]  ∈ ]25,85]  ∈ [16,25]  ∈ ]25,85]  ∈ [16,85]
Actions     Incr         50         25         100        50         400        200        0
section     Warning      no         no         yes        no         yes        yes        no
            Cancel       no         no         no         no         no         no         yes

TABLE 8.3 Column-based decision table for the car insurance example

     Conditions section         Actions section
     num_claims  age            Increment premium  Send warning  Cancel insurance
1    =0          ∈ [16, 25]     50                 no            no
2    =0          ∈ ]25, 85]     25                 no            no
3    =1          ∈ [16, 25]     100                yes           no
4    =1          ∈ ]25, 85]     50                 no            no
5    ∈ [2,4]     ∈ [16, 25]     400                yes           no
6    ∈ [2,4]     ∈ ]25, 85]     200                yes           no
7    ≥5          ∈ [16, 85]     0                  no            yes

TABLE 8.4 Row-based decision table for the car insurance example

8.2.1 Extended/limited entry decision tables

Up until now, we have been using extended condition entry decision tables
[57].

In an extended condition entry decision table, a condition entry can
have any kind of format (e.g. values, intervals, expressions, et cetera).
The types of the condition entries are extended to any type, hence the
name extended condition entry decision table. For example, in the case of the
variable num_claims, condition entries are values (i.e. “0” and “1”), the
interval “[2, 4]” and the expression “≥ 5”.

The advantage of decision tables for testing lies in the use of limited
condition entry decision tables [57] or decision tables in a truth table format
[14]. Here, what we can write in the condition entries is limited because
they can contain only True or False (or yes/no, or 0/1, et cetera), which
makes it much like a truth table from logic. Extended entry decision
tables can always be transformed into limited entry decision tables: the
variable num_claims with extended condition entry “[2, 4]” is translated
into the condition “num_claims ∈ [2, 4]” with limited condition entry
True. Table 8.3 can be converted to the truth table depicted in Table 8.5.
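The translation can be sketched in code (an illustrative Python fragment; the tuple layout is our choice): each extended entry becomes a boolean condition, so a concrete input maps to exactly one column of truth values.

```python
# Sketch of the extended-to-limited translation (names C1..C6 as in
# Table 8.5): each extended entry becomes a boolean condition, and a
# concrete input selects exactly one column of truth values.

TABLE_COMPLEXITY = 2 ** 6   # six conditions -> 64 possible variants

def variant(num_claims, age):
    return (num_claims == 0,          # C1
            num_claims == 1,          # C2
            2 <= num_claims <= 4,     # C3
            num_claims >= 5,          # C4
            16 <= age <= 25,          # C5
            25 < age <= 85)           # C6
```

For instance, the input (num_claims = 1, age = 18) yields the truth-value column of rule 3 in Table 8.5.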

In Table 8.5 we observe the following:


                                     1    2    3    4    5    6    7
Conditions  num_claims  =0       C1  T    T    F    F    F    F    F
section                 =1       C2  F    F    T    T    F    F    F
                        ∈ [2,4]  C3  F    F    F    F    T    T    F
                        ≥5       C4  F    F    F    F    F    F    T
            age  ∈ [16,25]       C5  T    F    T    F    T    F    DNC
                 ∈ ]25,85]       C6  F    T    F    T    F    T    DNC
Actions     increment premium        50   25   100  50   400  200  0
section     send warning letter      no   no   yes  no   yes  yes  no
            cancel policy            no   no   no   no   no   no   yes

TABLE 8.5 Column-based, limited entry decision table (or truth table)

• There are still 6 different conditions. However, now it has become
evident that there are 2⁶ = 64 possible combinations of True/False
for these conditions. This is called the table complexity. Each of the
possible combinations for the conditions is called a variant. So when
talking about variants, we only look at the conditions part of a rule
and disregard the actions part.
• In general, many of these possible combinations do not end up in the
decision table, because we Do Not Care about them (DNC), they Can
Not Happen (CNH), or we Do Not Know (DNK) what will happen
in these cases [14]. The difference between these qualifications will
be explained in Sections 8.2.2 to 8.2.4. Variants that do not show up
in the decision table are called implicit variants.
• The remaining variants are called explicit variants. Table 8.5 has 7
explicit variants for business rules, just like Table 8.3. With a table
complexity of 64 this means there are 64 − 7 = 57 implicit variants.

Here we immediately see some advantages of both representations, which
might make it useful to start with an extended entry table and, for testing,
to convert to a truth table:
• Tables with extended entries are more compact and hence easier to
interpret and use for communication purposes (with designers or
analysts, for example);
• Tables with limited entries provide clearer information about the cause
of implicit variants. The reason for this is that we force ourselves
to take a good look at every possible combination and to decide
whether it is explicit or implicit and why.

Looking at the implicit variants is the essence of using decision tables
for testing purposes. When testing, it is essential to know which implicit
variants are present in a decision table, because although something
can be hidden, we need to ask ourselves the following questions:
• Can we test it?
• Do we need to test it?
• And if yes, how do we test it?

We will now describe the qualifications DNC, CNH and DNK in more
detail.


                                     1    2    3    4    5    6    7a   7b
Conditions  num_claims  =0       C1  T    T    F    F    F    F    F    F
section                 =1       C2  F    F    T    T    F    F    F    F
                        ∈ [2,4]  C3  F    F    F    F    T    T    F    F
                        ≥5       C4  F    F    F    F    F    F    T    T
            age  ∈ [16,25]       C5  T    F    T    F    T    F    T    F
                 ∈ ]25,85]       C6  F    T    F    T    F    T    F    T
Actions     increment premium        50   25   100  50   400  200  0    0
section     send warning letter      no   no   yes  no   yes  yes  no   no
            cancel policy            no   no   no   no   no   no   yes  yes

TABLE 8.6 Making DNC into explicit variants

8.2.2 Implicit variant: do not care (DNC)

Do Not Care (DNC) indicates that the value of this condition has no
influence on the resulting actions. In Table 8.3, when 5 or more claims
have been filed, the age of the insured person does not matter anymore;
this is a DNC entry. We do not need to specify more than one explicit
variant because the actions are the same in either case. The truth table
in Table 8.5 makes the DNC variants a bit more visible.

Rule 7 actually counts for two variants, as depicted in Table 8.6. This
means rule 7 has rule complexity 2 because it hides 1 implicit variant.
Knowing the complexity of a rule means knowing how many implicit
variants it is hiding.

From a testing point of view we need to distinguish between the following
possible situations concerning implicit DNC variants for a specific
SUT (adapted from [14]):
1 The inputs are necessary, but have no effect on the outcome.
2 The inputs may be omitted, but if we supply them they have no
effect on the outcome. We might need to test what happens when
we provide values and what happens when we do not.
3 The inputs cannot be entered into the application. For example,
when the number of claims reaches 5, the possibility to enter values
for age is disabled.

We need to be clear about these things for every DNC variant.
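A check for the first two situations can be sketched as follows (an illustrative Python fragment; the stub sut and the sampled ages are our assumptions): probe the supposedly irrelevant input at several values and verify that the observable actions never change. If they do change, the DNC qualification was wrong.

```python
# Illustrative sketch: probe a supposedly irrelevant (DNC) input at
# several values and check that the observable actions never change.
# The stub below stands in for the real SUT's rule-7 behaviour.

def sut(num_claims, age):
    """Hypothetical SUT fragment for rule 7 (num_claims >= 5)."""
    assert num_claims >= 5
    return (0, False, True)    # increment, warning letter, cancel

def age_is_dnc(num_claims, ages=(16, 25, 40, 85)):
    """True iff varying the DNC input leaves the actions unchanged."""
    return len({sut(num_claims, age) for age in ages}) == 1
```

One test case per DNC variant suffices only when this kind of probe confirms the input really has no effect.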

8.2.3 Implicit variant: cannot happen (CNH)

Can Not Happen (CNH) variants often lead to faults that arise from
unwarranted assumptions about so-called impossible situations.

For example, it cannot happen that the number of claims is simulta-


neously 0 and 3. Similarly, it cannot happen that somebody’s age is
simultaneously younger than 25 years and older than 25 years. So all
these variants are not included and hence implicit.


----------------------------------------------------
50-50 Program Specification
----------------------------------------------------
This is a routine for use on a Field Programmable
Gate Array. It’s intended to supply the sum of two
numbers submitted as input. The numbers must be
integers in the range of 0 to 50. For the benefits
of performance, there is no error checking. Calling
clients are responsible for their own data checking.

The actual program runs on the FPGA,
but we have a version simulated on
Windows too, for testing purposes.
---------------------------------------------------
FIGURE 8.1 50-50 Program Specification exercise provided by James
Bach

However, is that really true? Would it be absolutely impossible? If the
implementation has one variable of type Integer for the number of claims,
then indeed it would be impossible. Binder [14] calls this a type-safe
exclusion. However, imagine that somebody implemented the number of
claims not with 1 variable of type Integer, but with a separate boolean
variable for each of the 4 conditions. Then the CNH condition depends
on the consistency of the implementation. And this is precisely what
we need to test! So, in this case we cannot exclude these variants and
need to make them explicit for testing.
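A sketch of this hazard (hypothetical Python, not from the book): with one boolean per condition, nothing in the types prevents a "cannot happen" state, so its absence has to be tested rather than assumed.

```python
# Sketch: if the implementation keeps one boolean per condition instead
# of a single integer, the "cannot happen" variants become reachable
# states whose consistency must be tested.

class ClaimRecord:
    def __init__(self):
        self.is_zero = self.is_one = self.in_2_4 = self.at_least_5 = False

    def consistent(self):
        # With a single integer variable, exactly one condition holds
        # (a type-safe exclusion); with four booleans nothing enforces it.
        return [self.is_zero, self.is_one, self.in_2_4,
                self.at_least_5].count(True) == 1

r = ClaimRecord()
r.is_zero = True
r.in_2_4 = True        # a "cannot happen" variant, yet representable
```

Here r is now in a state the decision table says cannot occur, which is exactly why such variants may need explicit test cases.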

Cannot happen is a chronic source of bugs [14]. The Ariane 5 rocket
failure could be described as this type of error [56].

EXERCISE 8.1
To get an idea of how dangerous CNH variants are, let us consider the
program specification in Figure 8.1.
a How would you test such a program?
b There are three different implementations of that specification avail-
able on the course site. If you do not look into these implementations
but just run the tests on them, do you get the same results on all three
implementations?
c If you now take a look at the implementations, can you say anything
about the assumptions you made about things that cannot happen?

8.2.4 Implicit variant: do not know (DNK)

Implicit DNK variants imply that the specification is not complete. For
example, in Tables 8.3 and 8.4 we do not know what will happen if
the age of the insured person is younger than 16 years or older than
85 years. From a testing point of view these implicit variants have to
be taken into account: although we do not know what will happen if
we try to renew the insurance of somebody aged 90, we do know that
something will happen.


When faced with DNK variants, we may have actually found an issue:
the specification is not complete. Facing this, we might need
to go back and explore to find out why the specification is incomplete
(maybe it was deliberately underspecified), what additional information
the specification needs, and how the SUT reacts in these cases.

8.2.5 Summary

As we already indicated, the insurance tables have 6 conditions and
hence 2⁶ = 64 possible combinations of True-False for all the conditions.
The tables only have 7 explicit variants for business rules, so there are
64 − 7 = 57 implicit variants that can come from:
• assumptions: do not care (DNC), cannot happen (CNH)
• lack of information: do not know (DNK).

Testing can be seen as the art of managing these assumptions and
learning to fill in the missing information. Checking the decision table for
the purpose of testing is all about investigating these implicit variants,
checking the underlying assumptions and finding incomplete information.
We already mentioned most of these checks in the previous subsections,
while we were constructing the table. In Section 8.3 we will
summarise the checks. First, in the following exercise you can see for
yourself how many and what assumptions we are actually making.

EXERCISE 8.2
Construct just the conditions section of the entire limited entry decision
table corresponding to Table 8.3, for instance using Excel. Determine
the implicit variants and their causes (DNC, CNH or DNK).
N . B . We will deal with the actions later.

From Exercise 8.2 we see that, for the car insurance example, all CNH
implicit variants are due to the assumption that variables can only have
one value at a time. If we cannot be sure of the validity of this assumption,
we should add explicit rules for (some of) these, with a specific
error message as the action, like “num_claims and age can each only have
one value”.

The DNK implicit variants are all due to an incomplete specification, in
which either num_claims or age or both are unspecified. Table 8.7 shows
how we can augment the decision table with explicit variants for age
not being in the range [16, 85] (column 8) or num_claims not in the range
[0, ∞[ (column 9). We should add a designated error message for each
case to the actions section.

In fact, taking a closer look at the DNKs makes us ask some additional
questions: what does it mean that num_claims does not have a value, or,
more precisely, does not satisfy any of its 4 conditions? Is it undefined?
Is it a negative integer? Is it not an integer? If we want to distinguish
between these possibilities in our decision table, we need to add more
conditions, as shown in Table 8.8.

Now the question is, should we do that? Our answer is: probably not,
because:


                                     1    2    3    4    5    6    7    8    9
Conditions  num_claims  =0       C1  T    T    F    F    F    F    F    DNC  F
section                 =1       C2  F    F    T    T    F    F    F    DNC  F
                        ∈ [2,4]  C3  F    F    F    F    T    T    F    DNC  F
                        ≥5       C4  F    F    F    F    F    F    T    DNC  F
            age  ∈ [16,25]       C5  T    F    T    F    T    F    DNC  F    DNC
                 ∈ ]25,85]       C6  F    T    F    T    F    T    DNC  F    DNC
Actions     increment premium        50   25   100  50   400  200  0    no   no
section     send warning letter      no   no   yes  no   yes  yes  no   no   no
            cancel policy            no   no   no   no   no   no   yes  no   no
            default error message    no   no   no   no   no   no   no   yes  yes

TABLE 8.7 Adding columns to deal with DNK implicit variants

                                     1    2    3    4    5    6    7    8    9    10   11
Conditions  num_claims  =0       C1  T    T    F    F    F    F    F    DNC  F    F    F
section                 =1       C2  F    F    T    T    F    F    F    DNC  F    F    F
                        ∈ [2,4]  C3  F    F    F    F    T    T    F    DNC  F    F    F
                        ≥5       C4  F    F    F    F    F    F    T    DNC  F    F    F
                   is undef      C7  F    F    F    F    F    F    F    F    F    T    F
                   is NaN        C8  F    F    F    F    F    F    F    F    F    F    T
            age  ∈ [16,25]       C5  T    F    T    F    T    F    DNC  F    DNC  DNC  DNC
                 ∈ ]25,85]       C6  F    T    F    T    F    T    DNC  F    DNC  DNC  DNC
Actions     increment premium        50   25   100  50   400  200  0    no   no   no   no
section     send warning letter      no   no   yes  no   yes  yes  no   no   no   no   no
            cancel policy            no   no   no   no   no   no   yes  no   no   no   no
            default error message    no   no   no   no   no   no   no   yes  yes  yes  yes

TABLE 8.8 Adding conditions to deal with inputs of the wrong type.
Should we do that?
Should we do that?

• it would make our table very large, complex and unmanageable (see
for example Table 8.8 where we only started to add two more condi-
tions for invalid inputs for num_claims);
• moreover, adding the checks for validity of input values to our table
would distract us from the core business rules we are trying to test.

However, we obviously need to test these scenarios. This is best done
by using equivalence classes, where all these invalid inputs will be
tested by covering the invalid parts.

This illustrates an important characteristic of the testing techniques we
have seen up to now: they have overlap but are also complementary.
We should pick the one that best suits our needs. Invalid inputs, for
example, are best handled with equivalence classes. Business rules for valid
inputs are made up of the valid equivalence classes that are used as the
conditions in the decision table that combines them. The boundaries
of the equivalence class conditions (e.g. num_claims ∈ [2, 4] in the
decision table of Table 8.7) need to be analysed and tested using the
boundary value analysis techniques.


8.3 Checking the decision table

Part of testing using decision tables consists of checking the tables con-
cerning:
1 implicit variants that should be explicit for testing;
2 the information that was used to create the table;
3 the SUT for which test cases need to be designed.

Checking these requires domain knowledge. This process of checking
leads to test cases that can be used to test the SUT, but it can also suggest
additions to the decision table. Moreover, this process also tests the
information that is used to make the decision table. (Note that we deliberately
say information here in its general form: it can be a specification,
but it can also be code.)

The checklist described in the following subsections (adapted from [14,
p. 150]) is not meant to be followed in its entirety after every decision
table you make. Instead, you can see it as a reminder of all the things
that you have explicitly considered and made a decision about while
exploring the problem using a decision table.

8.3.1 Check implicit variants

!1 Check that all implicit variants CNH indeed cannot happen. If they
can happen they should be replaced with an explicit variant such that
the case is forced during testing.
!2 For all implicit variants DNK, check what is unknown/missing
from the information used to design the decision table. It should be
replaced with an explicit variant.
!3 For all implicit variants DNC we need to check how to deal with
the values of the input variables we do not care about. If values for
these inputs:
• are necessary, one test case for the DNC variant is enough. The test
case can contain IN values for these DNC input variables, and checks
whether these inputs indeed have no effect.
• cannot be provided, also one test case for each DNC variant is enough
and we evidently do not need test values for these input variables.
• may be omitted, two test cases are needed per DNC, one with values
and another without values, to check whether they indeed have no
effect. Here we could add an explicit variant such that both cases are
covered.

8.3.2 Check decision table properties

!4 All variants are mutually exclusive: if the conditions for one vari-
ant are met, no other variant is applicable.
!5 Each rule is unique: if several actions are to result from one variant,
multiple actions are defined; the variant is not repeated.
!6 The actions specified for a variant with DNCs are acceptable for all
possible truth values of the DNC conditions.


!4 and !5 are essentially the same. They differ only in their goals: !4
is about the conditions, while !5 is about the actions.
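For small tables, check !4 can even be automated by brute force (an illustrative Python sketch using the rules of Table 8.4; the sampled input ranges are our own choice): enumerate representative inputs and count how many rules match each one. More than one match means the variants are not mutually exclusive.

```python
# Sketch: brute-force check of mutual exclusivity (!4) for the rules of
# Table 8.4, sampled over representative input ranges.

RULES = [  # (rule name, predicate over (num_claims, age))
    ("1", lambda c, a: c == 0 and 16 <= a <= 25),
    ("2", lambda c, a: c == 0 and 25 < a <= 85),
    ("3", lambda c, a: c == 1 and 16 <= a <= 25),
    ("4", lambda c, a: c == 1 and 25 < a <= 85),
    ("5", lambda c, a: 2 <= c <= 4 and 16 <= a <= 25),
    ("6", lambda c, a: 2 <= c <= 4 and 25 < a <= 85),
    ("7", lambda c, a: c >= 5 and 16 <= a <= 85),
]

def overlaps():
    """Inputs matched by more than one rule (should be none)."""
    bad = []
    for c in range(0, 8):              # representative claim counts
        for a in range(16, 86):        # whole specified age range
            hits = [name for name, pred in RULES if pred(c, a)]
            if len(hits) > 1:
                bad.append((c, a, hits))
    return bad
```

An empty result gives evidence for !4 on the sampled inputs; it is not a proof, since the sampling is finite.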

8.3.3 Check testability

Testability of a decision table is how easy it is to use it when testing a
given SUT by a particular tester and test process, in a given context.
Does the decision table really serve to test the SUT? For this we can
check three things:
!7 Each action has an understandable specification such that a tester
can productively investigate them.
!8 All actions are sufficiently observable such that a tester can assess
them or detect their presence or absence during testing.
!9 All specified decision variables are reasonably controllable by the
tester. A tester can enter any combination of variables for which an
explicit variant exists.

Actually, these last three checks should be satisfied for all test suites,
not just the ones that result from a decision table approach!

8.4 Coverage criteria for decision tables

For a decision table that has been checked, all-explicit variant coverage is a
sufficient testing strategy, assuming that adequate testing of the invalid
inputs and domain boundaries is performed.

Additional heuristics that can be applied are [12]:
• In decision tables, action selection is independent of the order in
which the conditions in an explicit variant are evaluated. It is usually
not possible to change the order in which input values are fed to
the SUT or in which conditions are evaluated, because the order can
be built into the SUT. However, if we can change the order, we can
make test cases for different combinations of condition evaluation.
• The order in which different variants are tested should have no effect
on the resulting actions. However, if possible we could try different
orders of variant evaluations.

8.5 Exercises

As always in this course, there is not just one right way to tackle a test-
ing problem. In other words, if your solution differs from ours, this
does not necessarily mean that one of them is not correct.

EXERCISE 8.3
Let us return to the specification of Example 6.2 (price of a dishwasher)
and the accompanying code fragment on page 103.
a Infer the conditions and the actions you think are necessary for mak-
ing a decision table for testing this software. What are the business
rules? What inputs do we need to apply them?
b Make both an extended and a limited entry decision table.


c Check the 9 properties mentioned in Section 8.3, if you did not
already consider these while making the table.
d Which test cases do we obtain from this decision table? Add them
to the JUnit tests that we made in previous chapters (i.e. Exercises 6.7
and 7.7).

EXERCISE 8.4
The function in this exercise comes from Jorgensen [57]. We use it here
because it illustrates the problem of dependencies in the input domain.
Jorgensen explains that this makes it a perfect example for decision-
table testing, because decision tables can highlight such dependencies.
NextDate is a function that takes three variables (month, day and year),
and returns the date of the day after the input date. The input variables
are of type Integer, and it should work for years starting from 1800.
a Make a limited entry decision table.
b Check the 9 properties mentioned in Section 8.3, if you did not
already consider these while making the table.
c What other ways can you think of to test the NextDate function?
What would you use for an oracle?


Chapter contents 9

Combinatorial testing

Overview 149

1 Introduction 149
2 Faults due to interactions of conditions 150
3 Combinatorics 151
4 Orthogonal and covering arrays 153
4.1 Orthogonal arrays 153
4.2 Covering arrays 156
5 Challenges for practical application of combinatorial testing 157
6 The oracle problem 157
7 Configurations and other test relevant aspects 158

Chapter 9

Combinatorial testing

OVERVIEW

Throughout the previous chapters, we have already seen that test case
design involves combining. For example, truth values of conditions
may have to be combined, or values of different input parameters. Also,
when doing configuration testing (see Section 1.10.3), we need to test
combinations of configuration parameters, e.g. operating systems (ver-
sions of Windows, Linux, macOS, et cetera), browsers (Internet Ex-
plorer, Safari, Chrome, et cetera), different versions of compilers, pro-
cessors, peripherals (e.g. printers, modems), varying amounts of mem-
ory available, et cetera.

Combinations are important to consider when testing, since the
interaction of conditions or parameters might lead to failures. That is why
we dedicate an entire chapter to testing combinations, or combinatorial
testing.

This chapter contains reading assignments that involve reading the fol-
lowing paper, which you can find on the course site:

[69] D. Richard Kuhn, Raghu N. Kacker, and Yu Lei.
Advanced combinatorial test methods for system reliability.
In IEEE Reliability Society 2010 Annual Technical Report,
pages 1–6, 2010.

LEARNING GOALS
After studying this chapter, you are expected to:
– understand the goals and challenges of testing combinations
– be able to find and use precalculated orthogonal arrays and cover-
ing arrays, both fixed-level and mixed-level
– know of the existence of other techniques and tools to calculate
combinatorial test suites.

CONTENTS

9.1 Introduction

Testing involves combining, as we have seen in previous chapters.

In Chapter 1 (Section 1.9.1.1), we discussed multiple condition coverage,
a code coverage criterion that demands all possible combinations of
truth values of the conditions in a guard to be obtained during testing.


FIGURE 9.1 Combinatorial testing as a horizontal topic

In Chapter 6, we needed to combine values of different input parameters
from different valid equivalence classes. In that chapter, we defined that
t-wise (or t-way) coverage (abbreviated as TWC) means that all possible
t-tuples of valid parts of the input variables must be covered in at least
one test case.

We saw that the easiest one, giving the least number of test cases, was
1-wise (or ECC), where each valid part of every variable was covered in
at least one test case. The higher the number t, the more test cases there
will be, and the more effort the testing will take. We also referred to
t = 2 as pairwise coverage since then all possible pairs of parameters are
supposed to be covered by at least one test.
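As a sketch of what pairwise coverage demands, the following Python fragment checks whether a candidate test suite covers every pair of values of three hypothetical configuration parameters (the parameter names and values are invented for illustration):

```python
from itertools import combinations, product

# Hypothetical configuration parameters and their values.
params = {"os": ["win", "linux"],
          "browser": ["ff", "chrome"],
          "db": ["pg", "mysql"]}

def uncovered_pairs(suite):
    """Return the (parameter, value) pairs not covered by any test."""
    names = list(params)
    missing = []
    for p1, p2 in combinations(names, 2):
        for v1, v2 in product(params[p1], params[p2]):
            if not any(t[p1] == v1 and t[p2] == v2 for t in suite):
                missing.append((p1, v1, p2, v2))
    return missing

# Exhaustive testing needs 2 * 2 * 2 = 8 tests ...
exhaustive = [dict(zip(params, vs)) for vs in product(*params.values())]

# ... but these 4 tests already cover every pair of values.
pairwise_suite = [
    {"os": "win",   "browser": "ff",     "db": "pg"},
    {"os": "win",   "browser": "chrome", "db": "mysql"},
    {"os": "linux", "browser": "ff",     "db": "mysql"},
    {"os": "linux", "browser": "chrome", "db": "pg"},
]
```

The 4-test suite achieves 2-wise coverage with half the tests of the exhaustive suite; removing any one of its tests leaves some pair uncovered.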

In Chapter 8, we again needed to check all possible combinations of truth
values of the conditions in a decision table, so that we could reason
about the implicit variants and decide whether they could stay implicit
or needed to be made explicit for testing purposes.

In subsequent chapters, we will see more combinatorics when testing.
In this chapter on testing combinations, or combinatorial testing, we
will stop for a moment and think about this testing technique as a
horizontal topic (see Figure 9.1).

9.2 Faults due to interactions of conditions

Reading assignment: Read the first section of [69].

EXERCISE 9.1
The second paragraph of Section 1 from [69] talks about the growing
complexity of software, mentioning the increasing numbers of Lines of
Code (LOC):
• the Pathfinder software with 155,000 LOC
• a Boeing 777 with 6.5 million lines
• the Windows XP operating system with 40 million lines
• an average new car with around 100 million LOC
Although the paper is from 2010, these numbers are still up to date.
How many LOC do you think are in the Android Operating system,
Facebook and all Google services?

EXERCISE 9.2
The text in [69] mentions a couple of studies carried out since the 1990s.
Which three causes of system failures were found in these studies?

Chapter 9 Combinatorial testing

Reading assignment: Read Section 2 of [69].

Studies have shown that with 2-way coverage you will, in general, find
more than half of the faults. However, you will have to go up to 6-way
to be reasonably sure about finding all faults.

In reality, of course, we can never know in advance how many faults there are and what degree of interaction is required to trigger all these faults in a system. The investigations mentioned here have taken a more practical alternative: collect empirical data on faults that occur among similar systems in various application domains. Such data is obtained from failure reports and bug-tracking database systems, for example.

Subsequently, the studies investigated the Failure-Triggering Fault Interaction number (FTFI) [71] for each of these known failures. The FTFI is the number of conditions required to trigger a failure. For example, if a microwave oven control module fails when power is set to "High" and time is set to 20 minutes, then the FTFI is 2.

If a long history of failure data shows that a particular type of application has never required the interaction of more than four parameters to reveal a failure, then an appropriate testing goal for that class of applications might be to test all 5-way or fewer interactions.

9.3 Combinatorics

The following example is given in the last paragraph of Section 1 of the paper [69]. A manufacturing automation system has 20 controls (i.e. parameters), each with 10 possible settings (i.e. values), and hence a total of 10^20 combinations. If we want to test these 2-way, the paper indicates that we need fewer than 200 tests, if the tests are carefully constructed.

We start by recalling the mathematics behind this. Then we will show what it is all about in a small example and, finally, we will briefly return to the manufacturing automation system.

Mathematics
The number of ways of choosing t things from a set of n things, in no particular order, is given by the binomial coefficient C(n, t), for 0 ≤ t ≤ n. In other words, C(n, t) denotes the number of possible t-way combinations of n parameters. The binomial coefficient can be calculated as follows:

    C(n, t) = n! / (t! · (n − t)!)

Here n! = 1 · 2 · . . . · (n − 1) · n is the number of different permutations of n parameters.
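To make the formula concrete, here is a small sketch that computes C(n, t). The class name Binomial and the method name binomial are ours, not from any testing library; the multiplicative form is used because it keeps intermediate values small by dividing at each step, instead of computing the huge factorials directly.

```java
public class Binomial {
    // Computes C(n, t) = n! / (t! * (n - t)!) using the multiplicative
    // formula: at step i the running product equals C(n - t + i, i),
    // so the division by i is always exact.
    public static long binomial(int n, int t) {
        if (t < 0 || t > n) return 0;
        long result = 1;
        for (int i = 1; i <= t; i++) {
            result = result * (n - t + i) / i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(binomial(6, 2));  // 15 pairs of 6 parameters
        System.out.println(binomial(6, 3));  // 20 triples of 6 parameters
        System.out.println(binomial(20, 2)); // 190 pairs in the manufacturing example
    }
}
```

The three calls in main reproduce the counts used in the examples of this chapter.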

EXAMPLE 9.1 Consider 6 parameters a, b, c, d, e, f. Suppose that these 6 parameters can each take 3 different test values (0, 1 and 2). To cover all 3^6 = 729 combinations of values –i.e. AC coverage or 6-way coverage– we would need 729 test cases.


If this is too expensive, we can opt for 2-way coverage, for instance. There are C(6, 2) = 15 different 2-way parameter combinations: ab, ac, ad, ae, af, bc, bd, be, bf, cd, ce, cf, de, df, ef (in combinations, order is not important, so ab denotes the same combination as ba).

Since each parameter has 3 values, the number of 2-way value combinations to cover is C(6, 2) × 3^2 = 15 × 9 = 135. Even if we make one test case for each of these 135 combinations, the number of test cases is already significantly less than 729.

However, we can reduce the number of test cases even further by realis-
ing that one test case covers (much) more than one 2-way value combi-
nation. For example, 012012 contains one pair of values for each pair of
parameters: 01 for ab, ae, db and de; 00 for ad and so on. Every test case
with concrete values for each of the 6 parameters covers 15 different
2-way parameter combinations.
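The claim that a single test case covers C(6, 2) = 15 two-way value combinations can be checked mechanically. The sketch below (the class and method names are ours) enumerates, for one concrete test case, every pair of parameter positions together with the values they take:

```java
import java.util.HashSet;
import java.util.Set;

public class PairCoverage {
    // Collects every 2-way value combination covered by a single test
    // case written as a string of values, e.g. "012012" for the six
    // parameters a..f. Each entry records the two parameter positions
    // and the two values they take.
    public static Set<String> pairsCoveredBy(String testCase) {
        Set<String> covered = new HashSet<>();
        for (int i = 0; i < testCase.length(); i++) {
            for (int j = i + 1; j < testCase.length(); j++) {
                // e.g. "0-1:0,1" = parameters 0 and 1 take values 0 and 1
                covered.add(i + "-" + j + ":" + testCase.charAt(i) + "," + testCase.charAt(j));
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        // One test over 6 parameters covers C(6,2) = 15 pairs at once.
        System.out.println(pairsCoveredBy("012012").size()); // prints 15
    }
}
```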

If we carefully construct the test cases, we can cover all 135 different value combinations in remarkably few tests. In fact, 15 test cases are sufficient, as shown below. Note that the – here means DNC (Do Not Care).

abcdef abcdef abcdef abcdef abcdef
001221 011010 020002 102000 110120
121212 200112 212201 221100 022121
21–022 1–0211 ––2–12 –––2–0 –––0–1

Alternatively, we could opt for 3-way coverage. There are C(6, 3) = 20 different 3-way parameter combinations: abc, abd, abe, abf, acd, ace, acf, ade, adf, aef, bcd, bce, bcf, bde, bdf, bef, cde, cdf, cef, def. So the number of 3-way value combinations that must be covered is C(6, 3) × 3^3 = 20 × 27 = 540. Again, if we carefully construct the tests, we can cover these in 49 tests:

abcdef abcdef abcdef abcdef abcdef
000121 001211 002020 010010 011102
012221 020222 021000 022111 100211
101020 102102 110120 111202 112001
120100 121121 122210 200201 201110
202011 210111 211220 212122 220021
221112 222200 211002 12001– 000002
122022 110212 121211 202222 002200
011021 210100 –01–02 212210 11111–
100220 001012 221101 220102 0–2–12
22–020 0–21–0 0–––01 ––1–22


Note the use of the word “Alternatively” when introducing the 3-way option above: you never choose both 2-way and 3-way coverage, since covering all 3-way combinations inevitably covers all 2-way combinations as well.

Returning to the example in [69] about the manufacturing automation system: we need to cover C(20, 2) × 10^2 = 190 × 100 = 19,000 different 2-way combinations, which according to the paper can be done in fewer than 200 tests, if the tests are carefully constructed.

The question now is: how do we carefully construct these 200 tests?

9.4 Orthogonal and covering arrays

Reading assignment: Read Section 3 of [69].

Orthogonal and covering arrays are referred to as mathematical objects in the paper. They form two possible ways to calculate a test suite that covers a given combination criterion.

There are many other so-called combination strategies; [51] gives an overview of the possible methods. There are non-deterministic strategies that depend on some degree of randomness (e.g. using simulated annealing, genetic algorithms or other bio-inspired algorithms). Orthogonal and covering arrays are deterministic approaches; they will always produce the same result given a specific input.

9.4.1 Orthogonal arrays

Orthogonal arrays can be traced back to the famous mathematician Euler1 in the context of Latin squares. The application of orthogonal arrays to testing was first introduced in 1985 by Mandl [73].

An orthogonal array OA(N, |S|^k, t) is an N × k matrix in which the entries are from a finite set S of symbols such that any N × t subarray contains each t-tuple exactly the same number of times:
• N is the number of rows in the matrix (called the runs);
• k is the number of columns (called the factors);
• |S| is the cardinality of the set S and indicates the number of different values (also called levels) of the factors;
• t is the strength.

For example an OA(4, 2^3, 2) (or OA(4, 3, 2, 2)) where S = {0, 1} is:

run f1 f2 f3
1 0 0 0
2 0 1 1
3 1 0 1
4 1 1 0
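The defining property is easy to verify mechanically. The following sketch (all names are ours, and it checks strength t = 2 only) confirms that every pair of columns of an array contains each 2-tuple exactly λ = N / |S|^2 times:

```java
import java.util.HashMap;
import java.util.Map;

public class OrthogonalityCheck {
    // Checks that every pair of columns of the array contains each
    // 2-tuple of symbols exactly lambda = N / numLevels^2 times,
    // i.e. that the array is an orthogonal array of strength 2.
    public static boolean isOrthogonalStrength2(int[][] a, int numLevels) {
        int n = a.length;     // runs
        int k = a[0].length;  // factors
        int lambda = n / (numLevels * numLevels);
        for (int c1 = 0; c1 < k; c1++) {
            for (int c2 = c1 + 1; c2 < k; c2++) {
                // Count how often each value pair occurs in columns c1, c2.
                Map<String, Integer> counts = new HashMap<>();
                for (int[] row : a) {
                    counts.merge(row[c1] + "," + row[c2], 1, Integer::sum);
                }
                if (counts.size() != numLevels * numLevels) return false;
                for (int count : counts.values()) {
                    if (count != lambda) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] oa = { {0,0,0}, {0,1,1}, {1,0,1}, {1,1,0} }; // the OA(4, 2^3, 2) above
        System.out.println(isOrthogonalStrength2(oa, 2)); // true: lambda = 1
    }
}
```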

1 https://en.wikipedia.org/wiki/Leonhard_Euler


Different notations are used in many works on orthogonal arrays. For example, OA(N, |S|^k, t) can also be found as OA(N, k, |S|, t). Sometimes the N is left out: OA(k, |S|, t), and sometimes the order of the parameters is different: OA(t, k, |S|). There is also the L-notation: L_N(|S|^k), where it is implicitly assumed that t = 2.

The index of an orthogonal array, denoted by λ, is equal to N / |S|^t. The index indicates how many times a t-tuple appears in any of the N × t subarrays. In the example above: λ = 4 / 2^2 = 1, i.e. each 2-tuple (= pair) appears exactly once, which is easy to check by looking at the array.

Imagine a SUT with 4 toggle buttons that can be on, off or disabled. That means we have 4 factors (k = 4) and |S| = 3. If we want to do 2-way testing we need to find an orthogonal array OA(N, 3^4, 2). Note that as a tester you do not create orthogonal arrays yourself; just locate the right one for your purpose. For example, N. J. A. Sloane has a library2 of over 200 precalculated orthogonal arrays. The only thing you as a tester need to do is find the right array and map your own values onto the values used in the array you found.

For example, we can find3 the following OA(9, 3^4, 2) (or OA(9, 4, 3, 2)):

run f1 f2 f3 f4
1 0 0 0 0
2 0 1 1 2
3 0 2 2 1
4 1 0 1 1
5 1 1 2 0
6 1 2 0 2
7 2 0 2 2
8 2 1 0 1
9 2 2 1 0

For our test problem we can turn that into the following tests when we map 0 ↦ on, 1 ↦ off and 2 ↦ disabled:

run toggle1 toggle2 toggle3 toggle4
1 on on on on
2 on off off disabled
3 on disabled disabled off
4 off on off off
5 off off disabled on
6 off disabled on disabled
7 disabled on disabled disabled
8 disabled off on off
9 disabled disabled off on

The orthogonal arrays above are also known as fixed-level orthogonal arrays, because we assume that all k factors draw their values from the same set S. In practice, however, this is often not the case. For that we need mixed-level orthogonal arrays.

2 http://neilsloane.com/oadir/
3 http://neilsloane.com/oadir/oa.9.4.3.2.txt


A mixed-level orthogonal array is denoted by MA(N, s1^k1 s2^k2 . . . sm^km, t), where:
• N is again the number of runs;
• the number of columns (i.e. factors) is k1 + k2 + . . . + km;
• m is the number of different sets of levels s1 . . . sm;
• there are ki factors with si levels each (1 ≤ i ≤ m), i.e. each of those factors can take si different values;
• t is the strength.

For example, imagine we have a SUT that needs to be tested on different configurations. We distinguish 6 variables with 3 options each and 1 variable with 6 possible values:
• operating systems: { MacOS, Linux, Windows }
• server systems: { Windows, Linux, Unix }
• user: { admin, guest, registered }
• time: { day, evening, night }
• battery: { low, medium, full }
• connection: { Wi-Fi, cable, data }
• browsers: { IE, Firefox, Mozilla, Safari, Chrome, Opera }

If we want to test this 2-way, we can find4 the following MA(18, 3^6 6^1, 2):

run OS server user time battery connection browser
1 0 0 0 0 0 0 0
2 0 1 2 2 0 1 1
3 0 2 1 2 1 0 2
4 0 1 1 0 2 2 3
5 0 2 0 1 2 1 4
6 0 0 2 1 1 2 5
7 1 1 1 1 1 1 0
8 1 2 0 0 1 2 1
9 1 0 2 0 2 1 2
10 1 2 2 1 0 0 3
11 1 0 1 2 0 2 4
12 1 1 0 2 2 0 5
13 2 2 2 2 2 2 0
14 2 0 1 1 2 0 1
15 2 1 0 1 0 2 2
16 2 0 0 2 1 1 3
17 2 1 2 0 1 0 4
18 2 2 1 0 0 1 5

We should map the values onto each column to obtain a 2-way test suite. For example:
• operating systems: 0 ↦ MacOS, 1 ↦ Linux, 2 ↦ Windows
• server systems: 0 ↦ Windows, 1 ↦ Linux, 2 ↦ Unix
• user: 0 ↦ admin, 1 ↦ guest, 2 ↦ registered
• time: 0 ↦ day, 1 ↦ evening, 2 ↦ night
• battery: 0 ↦ low, 1 ↦ medium, 2 ↦ full
• connection: 0 ↦ Wi-Fi, 1 ↦ cable, 2 ↦ data
• browsers: 0 ↦ IE, 1 ↦ Firefox, 2 ↦ Mozilla, 3 ↦ Safari, 4 ↦ Chrome, 5 ↦ Opera

4 http://neilsloane.com/oadir/MA.18.3.6.6.1.txt


9.4.2 Covering arrays

Covering arrays have only one key difference with orthogonal arrays. While an orthogonal array OA(N, |S|^k, t) covers each possible t-tuple exactly λ times in any N × t subarray, a covering array CA(N, |S|^k, t) covers each possible t-tuple at least λ times in any N × t subarray. The covering array relaxes the restriction that each combination is covered exactly the same number of times. Thus covering arrays may result in some test duplication, but they offer the advantage that they can be computed for much larger problems than is possible for orthogonal arrays.

Mixed-level covering arrays are denoted by MCA(N, s1^k1 s2^k2 . . . sm^km, t).

Again, as a tester you do not create covering arrays yourself. There are
tools (see Section 9.5) and also websites5,6 where they can be found.

On page 4 of [69], in Figure 2, you can find CA(13, 2^10, 3).

Note that in the above examples, we just found an array that precisely fits our needs in terms of the number of factors and corresponding levels. However, this is not always possible. If no exactly fitting array exists, we can simply take the next larger array [36]; in [18] simple rules for this can be found.
Imagine we are looking for an array M(C)A(N, s1^k1 s2^k2 . . . sm^km, t). We can use another M(C)A(M, r1^l1 r2^l2 . . . rp^lp, t) when:

1 the chosen array has at least as many factors (columns) as the problem we are testing, that is: k1 + . . . + km ≤ l1 + . . . + lp. If there are too many columns, we can just drop the ones we do not need, because they map factors that do not exist.
2 the chosen array holds at least enough unique levels in the columns to hold all the options for each factor. We can replace any unused number with a valid option for the factor (this option is a DNC). Evidently, we cannot just delete rows in the chosen array. We can only delete rows when they contain only DNC entries or t-tuples that are covered in other rows.

On the NIST website7 only fixed-level CAs can be found. However, if we are looking for an MCA(N, s1^k1 s2^k2 . . . sm^km, t), we can:

1 calculate the maximum level smax (such that si ≤ smax for all 1 ≤ i ≤ m);
2 take the sum of all factors ktotal = k1 + . . . + km;
3 take a CA(N, smax^ktotal, t), or in the notation of the website, CA(t, ktotal, smax).

5 https://math.nist.gov/coveringarrays/
6 http://www.public.asu.edu/~ccolbou/src/tabby/catable.html
7 https://math.nist.gov/coveringarrays/


9.5 Challenges for practical application of combinatorial testing

Reading assignment: Read Section 4 of [69].

In this section two issues are discussed:
1 We need tool support for calculating combination strategies for given t-wise criteria.
2 There is an enormous number of possible tests for a SUT.

The second issue is common to all of software testing and has been
discussed in Section 1.6.

We will take a brief look at tool support for combinatorial testing. There
are many tools; below we list three that are related to this course.

ACTS The authors of the paper from the reader promote a tool called ACTS (Advanced Combinatorial Testing System). The tool can be obtained by sending an e-mail to the first author, Rick Kuhn8. ACTS supports t-way test set generation for 1 ≤ t ≤ 6.

AllPairs Allpairs.pl is a Perl script written by James Bach that constructs a reasonably small set of test cases for 2-way coverage of a set of parameters. “Reasonably small” means that the tool does not necessarily produce an optimal solution: smaller test suites might be possible. The tool can be downloaded from the webpage9 of James Bach.
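To get a feeling for what such a tool does internally, here is a deliberately naive greedy sketch of 2-way suite construction (all names are ours; real tools such as ACTS or allpairs.pl use far more scalable algorithms): enumerate every full combination, then repeatedly pick the candidate that covers the most still-uncovered pairs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GreedyPairwise {
    // Greedily builds a 2-way test suite: levels[i] is the number of
    // values of parameter i. Only feasible for small problems, since it
    // enumerates the full Cartesian product as the candidate pool.
    public static List<int[]> generate(int[] levels) {
        List<int[]> candidates = new ArrayList<>();
        cartesian(levels, 0, new int[levels.length], candidates);
        Set<String> uncovered = new HashSet<>();
        for (int[] c : candidates) uncovered.addAll(pairs(c));
        List<int[]> suite = new ArrayList<>();
        while (!uncovered.isEmpty()) {
            int[] best = null;
            int bestGain = -1;
            for (int[] c : candidates) {
                int gain = 0;
                for (String p : pairs(c)) if (uncovered.contains(p)) gain++;
                if (gain > bestGain) { bestGain = gain; best = c; }
            }
            suite.add(best);
            uncovered.removeAll(pairs(best));
        }
        return suite;
    }

    private static void cartesian(int[] levels, int i, int[] cur, List<int[]> out) {
        if (i == levels.length) { out.add(cur.clone()); return; }
        for (int v = 0; v < levels[i]; v++) { cur[i] = v; cartesian(levels, i + 1, cur, out); }
    }

    private static Set<String> pairs(int[] test) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i < test.length; i++)
            for (int j = i + 1; j < test.length; j++)
                result.add(i + "-" + j + ":" + test[i] + "," + test[j]);
        return result;
    }

    public static void main(String[] args) {
        // Four parameters with 3 levels each: 81 full combinations, but
        // far fewer tests cover all pairs (the optimum here is 9).
        List<int[]> suite = generate(new int[]{3, 3, 3, 3});
        System.out.println(suite.size());
    }
}
```

Like allpairs.pl, this greedy approach does not guarantee an optimal suite; it only guarantees that all pairs end up covered.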

TESTONA is a commercial tool. It calculates combinatorial test suites from classification trees (see Chapter 11). Follow the instructions on the course site to obtain a student copy of the tool.

9.6 The oracle problem

Reading assignment: Read Section 5 of [69].

The oracle problem is also a problem common to all of software testing. We mentioned it in Section 1.7 and promised to get back to it. We will do that here.

In testing, an oracle is the mechanism you use to decide whether the test
case output is correct or not. Or as Section 5 of [69] defines it: a test
component that determines the expected result for each set of inputs.

The oracle problem is a trade-off between cost and precision. Producing a very precise test oracle can be as complex as producing the original SUT and hence costly. Reducing the costs implies losing precision.

8 [email protected]
9 http://www.satisfice.com/tools/pairs.zip


In Chapters 6, 7 and 8 the oracles were present in our domain models, our DTMs and our decision tables, respectively. We specified the expected result ourselves, based on the chosen test values (that in turn were chosen from our model) and our knowledge of the subject. Consequently, we can call them model-based test oracles, since they are based on some model. In other literature (e.g. [9]) these are also known as specified oracles.

So, the oracles we have seen in this course were all crafted manually by
the tester. In Section 5 of [69], however, automated generation of oracles
based on models is mentioned. This means that we need to make mod-
els that automatically decide if observed program behaviour is correct
with respect to its specification. Evidently, this can get very expensive.
The better and more accurate the oracle, the closer the model will come
to yet another implementation of the SUT. And such a complicated au-
tomated oracle might need to be tested itself...

Crash testing is based on implicit oracles, i.e. oracles that rely on implied
information and assumptions about any type of software system. For
example, a system should not crash, hang or spit out ugly error mes-
sages. Note that these oracles are cheap because they hold for almost
all software systems. However, they can only give verdicts about crash-
related behaviour, nothing about the domain-specific functionality of
the SUT.

A derived oracle [9] decides whether behaviour is correct or incorrect by using information derived from the SUT or similar systems. This information may include comparing system execution results for similarity, or comparing the characteristics of versions of the SUT (during regression testing). The parallel oracles that we have mentioned before, where the results of another software program are used to compare our outcomes with, are a type of derived oracle. A pseudo-oracle [124] is another type of derived oracle. It consists of a program that is explicitly and separately written to serve as an oracle.

The human oracle [9] is the most expensive but also the most sophisti-
cated oracle. Humans can be guided by heuristics (e.g. blink oracles).

Embedded assertions, as mentioned in Section 5 of [69], are like oracles built into the software. These can do checks while the software is being executed.

9.7 Configurations and other test relevant aspects

Reading assignment: Read Section 6 of [69].

The paper describes telecommunications software that may be configured to work with different types of calls, billing systems, access methods and servers for billing. The software must work correctly with all combinations of these four major configuration items.


EXERCISE 9.3
The options mentioned for the different configurations of the telecom-
munications software are:
• call = {local, long distance, international}
• billing = {caller, phone card, 800}
• access = {ISDN, VOIP, PBX}
• server for billing = {Windows Server, Linux/MySQL, Oracle}
Design a 2-wise and a 3-wise test suite for this system.

Chapter contents 10

Mutation testing

Overview 161

1 Introduction 161
2 How tests find faults 162
3 Fault-based testing 164
4 Mutation testing 165
4.1 Central hypotheses 168
4.2 Equivalent mutants 169
5 Scalability 171
5.1 Skipping unreachable mutants 171
5.2 Mutant schemata 171
5.3 Mutant sampling 173
6 Mutation testing in practice 173
6.1 Mutation testing at Google 174
6.2 Mutation testing for fun 174
7 Summarising the mutation testing method 175

Chapter 10

Mutation testing

This chapter was written by Alessio Gambi and Gordon Fraser.

OVERVIEW

Mutation testing is the process of estimating the quality of a test suite in terms of how many seeded artificial faults (i.e., mutants) it can find. Mutants not detected by the test suite can guide the tester in adding new tests to improve his or her test suite.

LEARNING GOALS
After studying this chapter, you are expected to:
– be able to explain the necessary conditions for a test to reveal a
fault
– be able to create artificial faults using mutation testing
– be able to distinguish equivalent from non-equivalent mutants, us-
ing tests.

CONTENTS

10.1 Introduction

As we have discussed in Section 1.9.1.1, code coverage gives us an intuitive way of establishing an estimate of test adequacy: test suites which do not cover enough cannot be deemed to be adequate. Unfortunately, high code coverage does not guarantee that tests are actually effective at finding faults. Consider the following two tests, which check the implementation of the sum method of a hypothetical Calculator class.
@Test
public void testSum() {
    Calculator c = new Calculator();
    int result = c.sum(1, 1);
    System.out.println("The sum is " + result);
}

@Test
public void testSumWithAssertions() {
    Calculator c = new Calculator();
    int result = c.sum(1, 1);
    int expected = 2;
    assertEquals(expected, result);
}


Both tests execute the same code in the Calculator class, and thus
achieve identical coverage. Yet, testSum is not as effective as
testSumWithAssertions, since it does not actually check the result
of the invocation of the method under test. Thus, if the sum is wrong,
testSum will not fail. Code coverage is oblivious to this problem. In
this chapter, we will take a look at mutation testing, which is a tech-
nique that helps us to overcome this limitation of code coverage. As
we will see in this chapter, mutation testing would immediately point
out that the test case testSumWithAssertions is clearly better than
testSum. To understand why, let us start by taking a closer look at
what is required for a fault to be detected.

10.2 How tests find faults

The aim of testing is to find faults. Let us revisit the example function
counting the number of zeros occurring in an array of integers (Chap-
ter 1):
public static int numZero(int[] x) {
    int count = 0;
    for (int i = 1; i < x.length; i++) {
        if (x[i] == 0) count++;
    }
    return count;
}

As discussed in Chapter 1, there is a fault in this function: the for-loop starts iterating at index 1. This means that if the very first element of the input array, i.e., x[0], contains the value 0, it would not be counted.

Let us assume we have the following test case for this function:
@Test
public void testZeros() {
    int[] values = {1,0,2};
    int numZeros = numZero(values);
    assertEquals(1, numZeros);
}

This test will pass: our implementation returns a count of 1, which is the
correct expected value. Clearly our test is not that good if it misses such
a blatant bug! However, if you did not already know that there was the
fault in the for-loop, how would you know whether to trust the test or
not? The usual answer to this would be code coverage: our example
test has more than 0 elements, so it enters the for-loop; it also contains
both zero-valued and non-zero elements, so the if-condition evaluates
to true and false in the test execution. Thus, our example test achieves
not only 100% statement coverage, but also 100% branch coverage. And
yet, it does not spot the bug. Why not?

The underlying reason is that there are four prerequisites for a fault
to manifest itself as a test failure: reachability, infection, propagation and
expectation.


Reachability
According to reachability, faulty code can be detected if, and only if,
it is executed. Reachability is what code coverage measures. Since our
testZeros test achieves 100% code coverage, we know that if there is a
fault, then that fault will be reached. However, reaching and executing
a fault does not guarantee that the fault also manifests in some change
of the execution state.

Infection
The execution state of a program is defined by its variables; for example,
the execution state of numZero is defined by the variables count and
i. If the fault does not cause any of the variables to take on a different
value than it would have in the correct program, then there is no way
to detect the presence of a fault. Whether or not the fault causes a state
change depends on the test input (i.e., the array of integers x in our
example). For some inputs, the state is not affected (i.e., does not differ
from the state the correct program would be in), for others it may be
affected. If the state is affected, then we say that the fault infects the
program state, and the test fulfils the second prerequisite for detecting
faults. Consider our example test case – does it infect the state? It does,
since the correct program would initialise i with the value 0 on the
first execution, while our faulty program sets i to 1 initially. So our test
reaches the fault, the execution infects the state, but the test nevertheless
fails to detect that the program state has changed.

Propagation
An infected state differs in some of its variable values from the expected
state. Not all the variables are observable by a test, as some variables
are internal to the code under test. In our example, the variable i is a
local variable inside the function with a scope that ends when the for-
loop ends. Thus, regardless of how wrong the value of i is, a test case
can never check that value (a developer instead would be able to look
inside the execution of the for-loop using a debugger.) Since we cannot
directly observe the value of i in our test, the only way we have to
detect that the program reaches an infected state is to check if the fault
somehow affects other variables that we can observe, such as variable
count in our example. Throughout the course of execution an infected
state is likely to affect other variables as more computations and value
assignments are performed, and infect their state too. Sooner or later a
wrong internal variable value may therefore propagate to an observable
variable. This propagation of the infected program state is the third
prerequisite for detecting a fault. In our example, this prerequisite is
not satisfied, since the value of count is not wrong at any point of the
test execution and, therefore, the test cannot detect the fault.

Expectation
A failure is the manifestation of an infected state that is propagated to
an observable difference. If the failure is a program crash, then that is
easy to spot. However, if the failure consists of a wrong value, then the
test needs to compare the observed output with the expected output.


This expectation is the final prerequisite for the test to be able to mani-
fest the fault as a failure. In principle, our example test case satisfies this
part, since it has an assertion that compares the return value with the
expected value. However, since the state infection does not propagate
in this test, the assertion holds and the test passes.

Now that we know all the prerequisites for finding the fault in our ex-
ample function, it is easy to create a test that satisfies all prerequisites.
The test needs:
1 to use a non-empty array to satisfy reachability, i.e., the wrong ini-
tialisation of i;
2 to infect the state (which is given by the wrong assignment);
3 to propagate the state infection, which requires the value 0 at the
first position in the array;
4 to check the propagated value of count in an assertion.

Below you can find a test that satisfies all of these prerequisites.
@Test
public void testZeros() {
    int[] values = {0,1,2};
    int numZeros = numZero(values);
    assertEquals(1, numZeros);
}

EXERCISE 10.1
Consider the following function, which returns the number of occur-
rences of y in the array x. If x is null, then the function throws a
NullPointerException.
public int countNum(int[] x, int y) {
    int num = 0;
    for (int i = x.length - 1; i > 0; i--) {
        if (x[i] == y) {
            num++;
        }
    }
    return num;
}

a The function contains a fault. What is this fault?
b Find a test case that does not reach the fault.
c Find a test case that reaches the fault, but does not infect the state.
d Find a test case that reaches the fault and infects the state, but does
not propagate the state infection.
e Find a test case that reaches the fault, infects the state, and propagates
the infection so that the test can assert a failure.

10.3 Fault-based testing

Fault-based testing is an approach that aims to result in tests that satisfy all of the necessary conditions for finding faults, unlike code coverage, which is mainly concerned with reachability. However, there is one crucial problem with this: fault-based testing requires knowing where the faults are. In the example above we already knew what the fault was, so we were able to check if our test fulfilled all the necessary conditions or not. This is not a realistic scenario – in practice, if we know where a fault is, we would just fix it straightaway. When we test a program, we generally do not know where the faults are – finding these is the whole point of testing! However, if we do not know where the faults are, then how can we ensure that our tests achieve infection and propagation?

We may not know where the faults in our program are, but we do know
which faults we have made in the past. So, the idea of fault-based testing
is to build on this experience and check if our tests would be able to find
bugs that are similar to the ones we have made in the past. The basic
scenario is this: given a program under test and a test suite where all
the tests pass (usually called a green test suite), would this test suite be
able to find past faults? To find out, we artificially create faulty versions
of the program that represent our past bugs. Then we run the tests on
these faulty versions and observe whether the tests fail or not. If a test
fails on a program with an artificial fault, then that means that the test
satisfies all four conditions necessary to reveal that type of fault. If an
artificial fault does not make any of the tests in our test suite fail, then
at least one of the conditions is not satisfied for that fault, and it shows
that our test suite needs improvement. After all, if the test suite cannot
detect the artificial fault, it is also unlikely to detect whether we have
made a similar bug in the current program.

Thus, fault-based testing has two applications. On the one hand, it can
be used to assess whether a test suite is good enough to detect certain
types of faults; if it is, then we can be more confident that this type
of bug does not exist in our program. On the other hand, fault-based
testing guides us in improving a test suite, since every inserted fault
that is not detected reveals a deficiency in our test suite. As we have
complete knowledge of the inserted faults, we can create new test cases
which fulfil reachability, infection and propagation, as well as declaring
proper assertions as we have done in the previous section.

One challenge with this approach is that we require detailed knowledge about our past faults. If we use a modern issue-tracking system and appropriately label our commits to the code repository, then we might be able to extract some of our past faults. But then, where and how would we apply these faults to a new program? And what if we do not have a collection of past faults? Then the answer is mutation testing.

10.4 Mutation testing

The idea of mutation testing is to simulate plausible faults with artificial


faults called mutants. Mutation operators are a core concept in muta-
tion testing. A mutation operator is a rule for systematically producing
faulty versions of a program (also called mutants) which captures the
essence of a specific type of fault. For example, the “replacement of re-
lational operators” mutator generates faults related to the use of wrong
relational operators. The mutation operator takes a program as input,
and for each relational operator (e.g., <) in the program it produces a

165
Universidad Politécnica de Valencia Structured Software testing

number of mutants, each of which replaces the original relational operator
with a different relational operator (e.g., ≥). The following example
shows the code which recursively computes the greatest common divi-
sor (gcd), and one mutant generated by replacing the relational opera-
tor == on line 2 with != below.
int gcd(int a, int b) {
if( b == 0)
return a;
return gcd(b, a % b);
}

int gcd(int a, int b) {
if( b != 0)
return a;
return gcd(b, a % b);
}
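To see why this mutant is useful, consider a small stand-alone sketch that compares both versions on the same input (the class and method names below are ours, not from the chapter):

```java
public class GcdMutantDemo {

    // Original implementation from the chapter.
    static int gcdOriginal(int a, int b) {
        if (b == 0) return a;
        return gcdOriginal(b, a % b);
    }

    // ROR mutant: == replaced by != in the base-case condition.
    static int gcdMutant(int a, int b) {
        if (b != 0) return a;          // returns the first argument whenever b != 0
        return gcdMutant(b, a % b);    // only reached when b == 0
    }

    public static void main(String[] args) {
        int original = gcdOriginal(12, 8);  // 4
        int mutated = gcdMutant(12, 8);     // 12: the mutant short-circuits
        if (original == mutated)
            throw new AssertionError("mutant survived");
        System.out.println("killed: " + original + " vs " + mutated);
    }
}
```

Any input with b ≠ 0 and gcd(a, b) ≠ a distinguishes the two versions, so a test asserting gcd(12, 8) == 4 kills this mutant.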

Many different mutation operators have been proposed in the literature
for various programming languages, and a convention is to give
these operators three-letter acronyms. According to this convention the
replacement of relational operators is identified as ROR.

Mutation operators are not restricted to relational operators; they can
act on different elements of the code. For example, they can change the
sign of values, the value of some variables, or they can replace arith-
metic operators with different ones, variables with other variables in
the same scope, and so on.

Mutation operators are not usually created arbitrarily, but based on a
general fault model for the programming language and programming
domain the mutation testing is applied in. The following list reports
some of the most commonly used mutation operators for modern lan-
guages, but many more mutation operators can be defined.
Absolute Value Insertion (ABS) replaces a variable with its absolute,
possibly negated, value; for example:
x=y −→ x=abs(y), x=-abs(y), et cetera.
Arithmetic Operator Replacement (AOR) changes arithmetic operators
into different ones; for example:
tmp=x*y; −→ tmp=x+y, tmp=x/y, et cetera.
Relational Operator Replacement (ROR) transforms relational opera-
tors or defaults these to true or false; for example:
y>0 −→ y<0, y!=0, et cetera.
Conditional Operator Replacement (COR) transforms or removes con-
ditionals, or defaults to true or false; for example:
(a&&b) −→ (a||b), (true), (a), et cetera.
Unary Operator Insertion (UOI) applies unary operators to variables
and values, including increments and decrements; for example:
y −→ -y, !y, y++, et cetera.
Scalar Variable Replacement (SVR) replaces scalar variables with other
scalar variables in the scope of the mutated instruction; for example:
tmp=x%y −→ tmp=y%y, tmp=tmp%y, et cetera.

Chapter 10 Mutation testing

The mutation operators listed above can be applied to any programming
language. However, other mutation operators rely on specific
features of the programming language which is used to write the pro-
gram. For example, programs written in an object-oriented program-
ming language, such as Java, can be mutated by replacing method re-
turn values with other values (e.g., return obj; becomes return
null;), constructor calls with default initialisations (e.g., Object obj
= new Object(); becomes Object obj = null;), and by alter-
ing method signatures. For example, we can change the type of method
arguments and return values (e.g., gcd(Integer a, Integer b)
becomes gcd(Number a, Number b)), or the visibility of the method
(e.g., private void foo becomes public void foo).

Mutation operators are applied to the program under test to produce
mutants. Each mutant represents a distinct faulty program version on
which tests are executed. If a test, which passes on the original program,
fails on a mutant, then that mutant is detected. The common mutation
testing lingo is that mutants that have been detected are dead, and mu-
tants that have not yet been detected are alive. Thus, we informally say
that a test that detects a mutant kills this mutant. After executing the
tests against all the mutants, each mutant can either be dead, because
at least one test killed it, or alive, because no test killed it. Dead mu-
tants represent the potential faults which the test suite is able to spot;
conversely, live mutants likely reveal weaknesses in the test suite.

To measure the effectiveness of the test suite, we compute the mutation
score. The mutation score is the ratio of the mutants killed by at least
one test to the total number of mutants; hence, the mutation score
indicates how good the test suite is at detecting potential faults. As
a consequence, the mutation score is believed to correlate well with
the ability of the test suite to detect real faults. While
a mutant that is killed increases the mutation score, and may increase
confidence in the test suite, the true value of mutation testing lies in the
live mutants, because live mutants can guide the developers towards
parts of the system which are insufficiently tested.
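The computation itself is simple: the score is the fraction of mutants killed by at least one test. The class name and the kill matrix below are hypothetical illustrations, not output of any particular tool:

```java
public class MutationScoreSketch {

    // killMatrix[t][m] is true if test t kills mutant m (hypothetical results).
    static double mutationScore(boolean[][] killMatrix, int numMutants) {
        int killed = 0;
        for (int m = 0; m < numMutants; m++) {
            for (boolean[] test : killMatrix) {
                if (test[m]) {            // killed by at least one test
                    killed++;
                    break;
                }
            }
        }
        return 100.0 * killed / numMutants;
    }

    public static void main(String[] args) {
        boolean[][] killMatrix = {
                { true,  false, false },  // test 1 kills mutant 1 only
                { true,  true,  false },  // test 2 kills mutants 1 and 2
        };
        // Mutant 3 stays alive: 2 of 3 mutants killed, roughly 66.7%.
        System.out.println(mutationScore(killMatrix, 3));
    }
}
```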

EXERCISE 10.2
Given the following code, which sorts an array of integers using the
widely known bubble sort algorithm, and a test which tests it, create a
mutant which survives the test, and a mutant which is killed by it.
public class Sorter{
public static void bubbleSort(int[] array) {
boolean swapped = true;
int j = 0;
int tmp;
while (swapped) {
swapped = false;
j++;
for (int i = 0; i < array.length - j; i++) {
if (array[i] > array[i + 1]) {
tmp = array[i];
array[i] = array[i + 1];
array[i + 1] = tmp;
swapped = true;
}


}
}
}
}

@Test
public void testBubbleSort() {
/* setup */
int[] inputArray = new int[]{1, 3, 2};
/* exercise */
Sorter.bubbleSort(inputArray);
/* assert */
int[] expectedArray = new int[]{1, 2, 3};
Assert.assertArrayEquals(expectedArray, inputArray);
}

10.4.1 Central hypotheses

A mutant differs from the original program in exactly one very sim-
ple and very small syntactic change. If you look closely, all the muta-
tion operators we listed produce mutants that are really more like sim-
ple programming glitches, where the programmer hit the wrong key,
or mixed up two different variables. Real faults are not always that
simple: real faults can be the result of substantially misunderstanding
the software requirements, or of fundamentally breaking an algorithm.
So how can examining these simple mutants be sufficient? Since real
faults may be arbitrarily large, a tempting and obvious thought would
be to produce more complex artificial faults by
combining multiple mutations into larger mutants. We call a combina-
tion of multiple mutants a higher order mutant. However, it is generally
sufficient to look at our simple mutants, if we make two fundamental
assumptions.

The first assumption is called the competent programmer hypothesis: this
hypothesis asserts that programmers tend to be at least somewhat com-
petent, and thus they will usually produce programs that are some-
where in the vicinity of the correct program. That is, programs tend to
differ from the correct version only by smaller mistakes.

Of course, not all mistakes are small; even competent programmers
may produce large and complex faults every now and then. However,
there is a further assumption that justifies why mutation testing never-
theless works. This second fundamental assumption in mutation test-
ing is the coupling effect. Two faults are coupled if a test that reveals one
fault will also reveal the other fault. The coupling effect states that small
faults are coupled with complex ones; that is, tests that find small faults will
also be able to find complex faults. Intuitively, this makes sense: the
larger a fault, the more aspects of the program behaviour it will break,
and the easier it will be to find a test that reveals the fault. The smaller
a fault, the subtler it will be, and the more difficult it will be to find a
test that finds the fault. This intuition has been supported by empirical
evidence [89, 5, 59], and therefore we can generally assume that tests
that find mutants will also be good at finding real faults.


10.4.2 Equivalent mutants

Consider the following code example, which sums the first four ele-
ments of an array:
public static int sumFirstFour(int[] b) {
int sum = 0;
for (int i = 0; i < b.length; i++) {
sum = sum + b[i];
if (i == 3) {
break;
}
}
return sum;
}

Now let us apply the ROR mutation operator on the if-statement, which
replaces if (i == 3) with if (i >= 3):
public static int sumFirstFour(int[] b) {
int sum = 0;
for (int i = 0; i < b.length; i++) {
sum = sum + b[i];
if (i >= 3) {
break;
}
}
return sum;
}

Any test with a non-empty array will reach the mutant. However, no
matter what test we use, we cannot get the mutant to infect the state:
for the mutant to produce a different execution state than the original
program, the variable i would need to be larger than 3. However, as
soon as i equals 3 the program exits the loop and so the if-condition can
never be evaluated for values of i larger than 3. The problem is that,
even though the mutant and the original program differ syntactically,
they do not differ semantically. In mutation testing, this type of mutant
is known as an equivalent mutant.
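We can convince ourselves of the equivalence by running both versions side by side on a range of inputs (a hypothetical sketch; note that no finite set of runs proves equivalence, it only fails to find a counterexample):

```java
import java.util.Arrays;

public class EquivalentMutantDemo {

    // Original version from the chapter.
    static int sumOriginal(int[] b) {
        int sum = 0;
        for (int i = 0; i < b.length; i++) {
            sum = sum + b[i];
            if (i == 3) break;
        }
        return sum;
    }

    // ROR mutant: i == 3 replaced by i >= 3.
    static int sumMutant(int[] b) {
        int sum = 0;
        for (int i = 0; i < b.length; i++) {
            sum = sum + b[i];
            if (i >= 3) break;   // i never exceeds 3 here, so behaviour is identical
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] inputs = { {}, {1}, {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5, 6} };
        for (int[] in : inputs) {
            if (sumOriginal(in) != sumMutant(in))
                throw new AssertionError("not equivalent for " + Arrays.toString(in));
        }
        System.out.println("no distinguishing input found");
    }
}
```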

Since equivalent mutants by definition behave exactly like the original
program, they cannot possibly be detected by any tests. Consequently,
the mutation score may be skewed by equivalent mutants. If we mea-
sure a mutation score of 90% on a test suite, how do we know whether
this means that the test suite is weak and we need to add more tests to
kill the remaining 10% of mutants, or whether it means that the remain-
ing 10% of mutants are equivalent and our test suite is already as good
as it gets?

Unfortunately, in general terms, the problem of determining whether
a mutant is equivalent is the same as deciding the equivalence of two
programs, and this is an undecidable problem. While there are certain
equivalent mutants that can be automatically detected (e.g., a mutation
in dead code can be detected if a compiler optimisation removes the
mutated code [90]), in general, determining whether a mutant is equiv-
alent or not has to be done manually.


EXERCISE 10.3
Given the following program, which finds the maximum value in the
int array values, and some of its mutants, decide for each mutant if
it is equivalent or not. In the latter case, prove that the mutant is non-
equivalent by showing a test which kills the mutant.
public static int max(int[] values) {
int r, i;
r = 0;
for (i = 1; i<values.length; i++) {
if (values[i] > values[r])
r = i;
}
return values[r];
}

Mutant 1:
public static int max(int[] values) {
int r, i;
r = 0;
for(i = 0; i<values.length; i++) {
if (values[i] > values[r])
r = i;
}
return values[r];
}

Mutant 2:
public static int max(int[] values) {
int r, i;
r = 0;
for(i = 1; i<=values.length; i++) {
if (values[i] > values[r])
r = i;
}
return values[r];
}

Mutant 3:
public static int max(int[] values) {
int r, i;
r = 0;
for(i = 1; i<values.length; i++) {
if (values[i] >= values[r])
r = i;
}
return values[r];
}

Mutant 4:
public static int max(int[] values) {
int r, i;
r = 1;
for(i = 1; i<values.length; i++) {
if (values[i] > values[r])
r = i;
}
return values[r];
}


Mutant 5:
public static int max(int[] values) {
int r, i;
r = 0;
for(i = 1; i<values.length; i++) {
if (values[r] > values[r])
r = i;
}
return values[r];
}

10.5 Scalability

Depending on the mutation operators applied and the program under
test, the number of mutants generated can be substantial. This usually
makes mutation testing a computationally expensive process, because,
in the worst case, we need to compile each of the mutants, and on each
mutant execute all tests in order to calculate a mutation score and de-
termine which mutants are still alive. These computational costs can be
prohibitive in practice. However, various strategies exist to reduce these
costs, ranging from parallelising the evaluation of mutants to limiting
the number of mutants evaluated. In this section, we will consider some
of the most widely used and effective strategies.

10.5.1 Skipping unreachable mutants

A primary factor in the high costs is the many test executions that
mutation testing potentially requires. However, many test executions can
be avoided: first, and rather obviously, once a mutant has been killed,
no further tests need to be executed on that mutant. However, there is a
further simple yet effective optimisation: recall that tests need to reach
the mutant, infect the state, and propagate the state infection. Whether
or not a mutant is reached is something we can know without actually
executing the test on the mutant. It is sufficient to know whether the
mutated location in the source code is covered by the tests; if a mutant
is not covered, then by definition it cannot be killed, and we do not need
to execute any tests. Typically, all tests are executed on the non-mutated
version of the program as a first step in the mutation analysis (to check
whether all tests pass), and coverage information can be generated dur-
ing this initial execution.
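A sketch of this filtering step might look as follows (the data structures are hypothetical; real tools map mutants to covered code locations in a similar way):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CoverageFilter {

    // mutatedLine[m] is the source line changed by mutant m (hypothetical data).
    // coveredLines are the lines executed by at least one test, collected while
    // running the full suite once on the unmutated program.
    static List<Integer> mutantsWorthExecuting(int[] mutatedLine, Set<Integer> coveredLines) {
        List<Integer> result = new ArrayList<>();
        for (int m = 0; m < mutatedLine.length; m++) {
            if (coveredLines.contains(mutatedLine[m]))  // an unreached mutant cannot be killed
                result.add(m);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] mutatedLine = { 10, 42, 10, 99 };
        Set<Integer> covered = new HashSet<>(Arrays.asList(10, 11, 12));
        // Only mutants 0 and 2 sit on covered lines; no tests run on the others.
        System.out.println(mutantsWorthExecuting(mutatedLine, covered)); // [0, 2]
    }
}
```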

EXERCISE 10.4
Why is such a simple solution effective in practice?

10.5.2 Mutant schemata

A mutant is a modified version of the original code; hence we need to
recompile the (mutated) code to execute and test it. Execution may in-
volve computationally expensive aspects such as starting a virtual ma-
chine for bytecode-based languages like Java. If each mutant is com-
piled individually, the test execution process has to be repeated from
scratch for each and every mutant. Rather than compiling each mutant


individually to a distinct binary, we can create meta-mutants. A
meta-mutant (also known as mutant schema [118]) is a version of the program
that combines all the mutations, and allows individual mutants to be
programmatically activated and deactivated. In a meta-mutant, basic
operators are replaced with parameterised function calls where the dif-
ferent options (e.g., operators) are determined by the parameter. For
example, the following function summarises all possible arithmetic op-
erators:
public static int aor(int op1, int op2, int aorID) {
switch(getVariant(aorID)) {
case aoADD: return op1 + op2;
case aoSUB: return op1 - op2;
case aoMULT: return op1 * op2;
case aoDIV: return op1 / op2;
case aoMOD: return op1 % op2;
case RIGHT: return op2;
case LEFT: return op1;
// ...
}
}

Such a function is called a metaoperator. The mutation of a program
now consists of replacing actual arithmetic operators with calls to this
metaoperator. For example, assume we are mutating a simple function
that sums up its parameters:
public static int sum(int x, int y) {
return x + y;
}

The mutation now simply replaces the + operator with the aor meta-
operator:
public static int sum(int x, int y) {
return aor(x, y, 1);
}

The number 1 in this example refers to the ID of the operator within
the program. The runtime environment of the meta-mutant now needs
to keep track of which actual operator the metaoperator should return;
this is determined in the getVariant function. Conceptually, we can
think of this function as follows, where activeMutation is some vari-
able that determines which mutant should be activated:
public static int getVariant(int opID) {
if (opID == 1) {
switch(activeMutation) {
case 1: return aoSUB;
case 2: return aoMULT;
case 3: return aoDIV;
case 4: return aoMOD;
case 5: return RIGHT;
case 6: return LEFT;
default: return aoADD;
}
}
}


This way, the compilation process only needs to be done once and the
overhead of repeatedly restarting virtual machines and other execution
infrastructure is removed. For each mutant, the same program is exe-
cuted, and simply setting activeMutation to the appropriate value
will activate the corresponding mutant.
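The whole scheme can be illustrated with a miniature, runnable meta-mutant for the sum example (a simplified sketch: we inline the variant selection into the metaoperator and use only two AOR mutants):

```java
public class MetaMutantDriver {

    static int activeMutation = 0;   // 0 activates the original operator

    // Metaoperator for the '+' in sum(): each value of activeMutation
    // selects one AOR mutant of that single operator.
    static int aorAdd(int op1, int op2) {
        switch (activeMutation) {
            case 1:  return op1 - op2;   // AOR: + replaced by -
            case 2:  return op1 * op2;   // AOR: + replaced by *
            default: return op1 + op2;   // original
        }
    }

    static int sum(int x, int y) {
        return aorAdd(x, y);             // mutated version of 'return x + y;'
    }

    public static void main(String[] args) {
        activeMutation = 0;
        int expected = sum(2, 3);        // oracle: the result of the original
        for (int m = 1; m <= 2; m++) {   // one compiled program, many mutants
            activeMutation = m;
            boolean killed = sum(2, 3) != expected;
            System.out.println("mutant " + m + ": " + (killed ? "killed" : "alive"));
        }
    }
}
```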

EXERCISE 10.5
With mutant schemata the compilation process is done just once, but
what about test execution? Does test execution need to be repeated or
do we need to run each test only once?

EXERCISE 10.6
Given the following code snippet, create code for a mutant schema us-
ing the ROR mutation operator (assume that a getVariant function
exists).
public static boolean isAplusBgreaterThanC(int a, int b, int c){
return (a + b > c);
}

10.5.3 Mutant sampling

The systematic application of mutation operators not only leads to many
mutants, but many of these mutants are semantically very similar, and
thus detected by the same tests. Instead of running tests on all mu-
tants, it is therefore feasible to first generate all mutants, but then to
sample just a subset and only use that subset for mutation analysis in-
stead of the full set of mutants. As long as the mutants are uniformly
sampled (i.e., use any standard random number generator), a mutation
score calculated on the sample approximates the mutation score on the
full set of mutants. In particular, it has been shown that sampling just
5% to 10% of mutants approximates the full mutation score with high
accuracy [130].
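A uniform sample can be drawn with any standard random number generator, for instance as follows (a sketch; the class name is ours):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class MutantSampling {

    // Uniformly sample roughly `percent` % of the generated mutants.
    static List<Integer> sample(List<Integer> mutantIds, double percent, Random random) {
        List<Integer> shuffled = new ArrayList<>(mutantIds);
        Collections.shuffle(shuffled, random);          // uniform random permutation
        int keep = (int) Math.ceil(shuffled.size() * percent / 100.0);
        return shuffled.subList(0, keep);
    }

    public static void main(String[] args) {
        List<Integer> mutantIds = new ArrayList<>();
        for (int id = 0; id < 1000; id++) mutantIds.add(id);
        // Run the tests only on this 10% sample instead of on all 1000 mutants.
        System.out.println(sample(mutantIds, 10, new Random()).size()); // 100
    }
}
```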

An alternative approach to sampling from the entire set of mutants is
to restrict the mutants generated in the first place. This is usually done
by applying only a few representative mutation operators instead of all
available mutation operators. Indeed most modern mutation tools im-
plement only a subset of mutation operators which has been shown to
be sufficient; that is, if all the resulting mutants are killed, then (almost)
all mutants of the remaining operators are killed too. Sufficient sets of
operators have been defined for different languages, and standard suf-
ficient sets can be found in the literature [91, 108].

10.6 Mutation testing in practice

In the past, several research tools that implement mutation testing have
been proposed. For example, for the Java programming language,
noteworthy mutation testing tools are µJava [72], Javalanche [106], and
Major [58].


Usually, research prototypes have limitations and are only meant to il-
lustrate or evaluate the application of advanced mutation testing tech-
niques. For Java, a well-known mutation testing framework commonly
applied in practice and actively maintained is PIT (on the course site
there is a small laboratory exercise for using PIT).

10.6.1 Mutation testing at Google

Although mutation testing is by far not as widespread in practice as
code coverage, it is gaining momentum. As a case in point, mutation
testing is even used at Google, where the sheer scale of software
would make mutation analysis seem like a daunting task. However,
some of the main challenges are not even those of computational com-
plexity, but related to making developers value mutants and use them
to improve their test suites. Here we describe some of the adaptations
of mutation analysis to make it usable at Google. This is described in
more detail by Petrovic and Ivankovic [93].

At Google, mutation analysis is integrated into the code review process.
When developers submit a change request for review, some mutants
that survive the tests are shown to the developers. A conscious
choice at Google was to show at most one mutant per line, but also
to skip mutants that are likely irrelevant for developers. For example,
mutations of logging statements are rarely considered helpful. During
code review, the reviewers can decide, for each mutant shown, whether
the mutant is important and should be addressed by the developer, or
whether the mutant is not helpful. Over time, Google has collected mas-
sive amounts of data on which mutants developers think are useful and
which ones are less useful, and has built heuristics to select mutants
that are more likely to be useful.

An important consequence of this particular adaptation of mutation
analysis is that an overall quality measurement in terms of a mutation
score is no longer possible, since only a few selected mutants are
produced in the first place. However, at Google the perception is that
individual mutants that lead to improvements in the test suite are the
most useful aspect of mutation analysis [93].

10.6.2 Mutation testing for fun

Learning mutation testing, understanding which mutants are useful,
and writing tests to kill mutants, can be challenging tasks and require
practical experience. The aim of the Code-Defenders game is to make
the learning of mutation testing more entertaining and fun.

Code-Defenders [99] is a web-based game in which players compete
by mutating Java object-oriented classes under test (i.e., attacking) and
creating tests (i.e., defending). Attackers create mutants by directly edit-
ing the source code of the Java class under test, while defenders write


tests using a code editor. The goal of the attackers is to introduce arti-
ficial bugs into the classes under test that reveal weaknesses in the test
suites, hence they gain points if their mutants survive the tests. Con-
versely, defenders aim to improve the test suites by adding tests which
deflect the attacks, hence they gain points by killing mutants.

Attackers can create equivalent mutants either by accident or
intentionally, as part of an attacker's strategy. Code-Defenders cannot automatically
identify equivalent mutants, so defenders must flag the mutants that
they suspect might be equivalent. This triggers an equivalence duel and
the attacker who created the flagged mutant must provide a test which
detects the mutant or accept that the mutant is equivalent.

There are three types of Code-Defenders games:
• battlegrounds, in which two teams compete over the class under test;
• duels, where two players compete like in a chess game;
• puzzles, where a player has to solve a predefined set of challenges.

Code-Defenders can be downloaded for free from GitHub1 or it can be
played online at: http://code-defenders.org/

10.7 Summarising the mutation testing method

A test can only reveal a fault if the test (1) reaches the fault in the
code, (2) the execution of the fault infects the execution state, (3) the
infected execution state propagates to an observable output, and (4) the
test checks this output against an expected value. While code cover-
age focuses on reachability, mutation testing aims to optimise tests for
all of these conditions. To achieve this, artificial faults (mutants) are
seeded in the program, and the test suite is evaluated based on how
many of these mutants it can distinguish. Mutants that are not detected
point out where tests are missing in the test suite. In contrast to basic
coverage analysis, mutation testing is computationally expensive, for
example because the number of mutants that can be generated for any
non-trivial program tends to be huge. In this chapter, we looked at var-
ious optimisations to overcome these issues.

1 https://github.com/CodeDefenders/CodeDefenders

Chapter contents 11

Classification trees

Overview 177

1 Introduction 178
2 Classification trees 178
3 Modelling with a classification tree 179
4 Test relevant aspects 181
5 An example: The Find command 182
6 Combinatorial coverage criteria 184
7 Designing the test cases 185
8 Summarising the classification tree method 185
9 Tool support 187
10 More examples 187
10.1 The flexible manufacturing system 187
10.2 The audio amplifier 193
10.3 The password diagnoser 195
11 Exercise 201

Chapter 11

Classification trees
This chapter has been written together with Eduardo Miranda based on
his material in [70].

OVERVIEW

In this chapter, we look at tree-based structures to make test models.


Trees are a good way to classify different types of information hierar-
chically. Mind maps, for example, are tree-shaped and commonly used
as informal diagrams to visualise information and show relationships
between ideas, concepts or information.

In this chapter, we will concentrate on more formally defined tree
models to design test cases: classification trees. Having more precise tree
semantics enables us to generate test suites using chosen coverage crite-
ria. Consequently, applying the classification tree method fits precisely
into our way of presenting test design techniques – modelling the test
problem with a classification tree and defining test cases according to
some selected coverage criterion (here t-wise combination).

Note that classification trees are not something totally new and differ-
ent from what we have seen in the previous chapters. They provide just
yet another way of making a test model, with its own advantages and
disadvantages, as we will describe in this chapter. Neither are we say-
ing that classification trees are the thing to use rather than other more
informal tree structures. Not at all! A mind map can be a very good
technique to clarify your testing strategy. It can even be a first draft for
a more formal classification tree if you want to make one.

LEARNING GOALS
After studying this chapter, you are expected to:
– understand the concepts of classification trees
– be able to create classification trees as test models
– apply t-wise coverage criteria to generate test cases from these
models.


CONTENTS

11.1 Introduction

Tree-based structures are useful for classifying different types of
information hierarchically. Mind maps, for example, are like trees and have
been used for some time now as an efficient way to take notes, plan
a project or presentation, or brainstorm. Mind maps are very flexible:
different nodes can take on many different kinds of information. Al-
though this flexibility is one of the strengths of mind maps, if we want
to use them to design test suites according to some coverage criterion,
we might want to have tree structures with slightly more precise se-
mantics. We will look at an example of those in this chapter.

These more formal tree models are called classification trees.

The Classification Tree Method (CTM) was introduced in 1993 by
Grochtmann [52]. The CTM is also a partition testing method, like the method
we discussed in Chapter 6. The CTM not only concentrates on the input
domain (i.e. input parameters), but encourages testers to think about
the environment and all other test relevant aspects too.

The advantage of the CTM is that it provides a graphical representation
that is descriptive and easy to learn. Moreover, it comes with tool sup-
port (the TESTONA1 we already mentioned in Chapter 9 and will discuss
a bit later in this chapter).

11.2 Classification trees

A classification tree (CT) is a tree in the sense of graph theory. It is an
undirected acyclic graph with a root node, inner nodes and leaf nodes.
The root node of a classification tree represents the test object, which
is (part of) the SUT. The nodes of a classification tree are divided into
three types.
• Classification nodes, drawn with a thin border as in Figure 11.1, rep-
resent so-called test relevant aspects or test parameters (these will be
described in more detail in Section 11.4). Classifications are used to
model a test parameter of which the values are partitioned into of-
ten collectively exhaustive and always mutually exclusive sets (the
equivalence classes). Therefore, children (direct descendants) of clas-
sifications are always classes. Classifications define an is-a relation-
ship.
• Class nodes, drawn without a border, correspond to the choices avail-
able for a given classification. They model the equivalence classes
and exemplar values. Classes are allowed to have compositions and
classifications as children. Leaf nodes of the tree must be classes.
• Composition nodes, drawn with a thick border, are nodes for struc-
turing test relevant aspects. They model a test parameter that consists
of several other relevant factors. Compositions can have other
compositions and classifications as children. Children of compositions
do not have to be disjoint. As you can see in Figure 11.1, the root is
also a composition.
1 http://www.testona.net/


FIGURE 11.1 Elements in a classification tree

Once we have the tree, test cases are composed by combining leaf classes
(using a t-wise combinatorial coverage criterion from Chapter 9).

11.3 Modelling with a classification tree

A classification tree can be constructed by following these steps:
1 As always, we start with the test object. This can be a complete SUT
or a component of it.
2 Define the relevant aspects to test (i.e. the classifications). Test rele-
vant aspects are explained in Section 11.4.
3 Classify the domains of each aspect (classification) separately (de-
fine the classes).
4 If necessary, further refine the classes with additional aspects/clas-
sifications.
5 Introduce dependency rules if necessary.

Before we look at these steps in detail, let us look at an example we
already know: the hardware store. This way we can show how CTs are
just another way of representing your test problem. Recall the domain
model we made on page ??:

input     part ID  values          comment
h         vP1      [0, +∞[         valid
          iP1      ]−∞, 0[         invalid, cannot be negative
          iP2      not an integer  invalid, should be an integer
s         vP2      [0, 30]         valid
          vP3      ]30, +∞[        valid, 10% additional discount
          iP3      ]−∞, 0[         invalid, cannot be negative
          iP4      not an integer  invalid, should be an integer
5h + 10s  vP4      [0, 200]        valid, no total discount
          vP5      ]200, 1000]     valid, 5% total discount
          vP6      ]1000, +∞[      valid, 20% total discount
          iP5      ]−∞, 0[         invalid, cannot be negative


The test relevant aspects that we considered in Chapter 6 were related
to the number of hammers (h), the number of screwdrivers (s) and the
subtotal of buying these for 5 euros and 10 euros respectively (5h + 10s).
So we can add these as classifications. The domains for each of those
(valid and invalid) can then be defined as classes. The CT could look as
follows:

The test cases are composed by combining the leaf classes. Recall from
Chapter 6 that we cannot combine invalid and valid classes. We need to
apply different combination coverage criteria for them. We should pick
Each Choice Coverage (ECC) for the invalid classes:

test case  h    s    5h + 10s  expected outcome  covers
1          -20  10   *         fault message     h invalid, h < 0
2          aaa  20   *         fault message     h invalid, not an integer
3          50   -50  *         fault message     s invalid, s < 0
4          50   bbb  *         fault message     s invalid, not an integer

For the valid classes, we can choose All Combinations Coverage (ACC),
giving the same results as before:

test case  h    s    5h + 10s  expected outcome  covers combinations
5          10   10   150       no discount       h ≥ 0, 0 ≤ s ≤ 30, 5h + 10s ∈ [0, 200]
6          10   20   250       5% discount       h ≥ 0, 0 ≤ s ≤ 30, 5h + 10s ∈ ]200, 1000]
7          150  30   1050      20% discount      h ≥ 0, 0 ≤ s ≤ 30, 5h + 10s ∈ ]1000, +∞[
8          50   50   750       15% discount      h ≥ 0, s > 30, 5h + 10s ∈ ]200, 1000]
9          100  100  1500      30% discount      h ≥ 0, s > 30, 5h + 10s ∈ ]1000, +∞[
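The five valid test cases are the Cartesian product of the valid leaf classes with one infeasible combination removed: s > 30 forces 5h + 10s > 300, so it cannot be combined with the [0, 200] subtotal class. A hypothetical sketch of computing the raw product (class name and string labels are ours):

```java
import java.util.ArrayList;
import java.util.List;

public class AllCombinations {

    // All Combinations Coverage: the Cartesian product of the class lists.
    static List<List<String>> acc(List<List<String>> classifications) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());                 // one empty partial test case
        for (List<String> classes : classifications) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> partial : result) {
                for (String choice : classes) {
                    List<String> testCase = new ArrayList<>(partial);
                    testCase.add(choice);
                    extended.add(testCase);
                }
            }
            result = extended;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> validClasses = List.of(
                List.of("h >= 0"),
                List.of("0 <= s <= 30", "s > 30"),
                List.of("[0,200]", "]200,1000]", "]1000,+inf["));
        // 1 x 2 x 3 = 6 raw combinations; removing the infeasible one
        // leaves the five test cases in the table above.
        System.out.println(acc(validClasses).size()); // 6
    }
}
```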

In the following sections, we will go through CT design using
examples. Test relevant aspects are described in Section 11.4. Section 11.5
will show an example of how to classify the domains of each aspect
(through the Find command). Section 11.6 discusses the coverage crite-
ria and the combination table. Section 11.7 is about designing the test
cases. Section 11.8 summarises the classification tree method. Section
11.10 will show more examples of how to refine a tree (through a flexi-
ble manufacturing system), how to take care of dependencies (through
an audio amplifier example) and how to deal with missing and infeasi-
ble combinations (through a password diagnoser example).


11.4 Test relevant aspects

A test relevant aspect is any factor the tester thinks can have implica-
tions on the outcome of a test run. A non-exclusive list includes:

• The SUT input variables. This is what we considered in Chapter 6 as
we can see from the hardware store example before.

• The morphology of the inputs. That means any characteristic of the input
variables other than their apparent values that might lead to the
execution of an unspecified path and result in failure. For example,
the possible attributes of a data type. When considering data types
like lists, arrays, sets, sequences, strings, et cetera, think of:
– the cardinality (e.g. the minimum and maximum permitted length
of a password);
– whether its elements are ordered or not;
– the position of a particular character within a sequence;
– multiple occurrences of a value in a sequence.

• The environment in which it executes. We can think of the different
interfaces (with users, software, hardware, databases, APIs, et cetera),
physical factors (light, noise, distractions, temperature), resources (CPU,
memory, connections) and the platforms on which it runs. Let us look
at some examples:
– The result of a query depends on the data stored in the queried
repository. If there are two records that satisfy a query, the SUT
might provide a correct answer for the first but fail to find the sec-
ond. What happens if a repository contains zero, one or two or
more occurrences of the same value?
– In the case of a speech recognition system, the background noise
might not be an explicit input to the recognition algorithm, but its
level will certainly affect the ability of the system to perform an
accurate transcription of the words spoken. In consequence, noise
level is a very important test parameter.
– The amount of memory or other resources available to the SUT
might have some influence on the outcomes of a test.

• The state of the SUT itself. The state of the SUT refers to the internal in-
formation a SUT needs to maintain between successive executions so
that it performs as designed. A SUT that does not need information
from its first execution to respond correctly to the second is called
stateless. This means that after the SUT is executed, all local vari-
ables and objects that were created or assigned a value as part of the
execution are discarded. A stateful SUT, however, remembers infor-
mation from execution to execution. Consequently, when presented
with the same inputs it might produce a different result depending
on whether it is the first, the second or any subsequent invocation.

• The configuration of the SUT. For example, particular software or
hardware configurations on which the SUT is installed, or different
localisation data, can change the behaviour during a test run.
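The stateless/stateful distinction described above can be illustrated with a minimal sketch (hypothetical SUTs, not taken from the course material):

```python
def stateless_double(x):
    # A stateless SUT: the same input always yields the same output,
    # regardless of how often it has been invoked before.
    return 2 * x

class StatefulAccumulator:
    # A stateful SUT: it remembers information between invocations,
    # so presenting the same input twice can produce different results.
    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x
        return self.total
```

For the stateful SUT, a test case must therefore specify not only the input but also the state (here: the sequence of earlier invocations) in which it is executed.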

181
Universidad Politécnica de Valencia Structured Software testing

FIGURE 11.2 The dialogue window of the Find command

Also the taxonomies, catalogs and PCOs introduced in Chapter 5 could
help here.

EXERCISE 11.1
For each of the test relevant aspects mentioned in this section, find one
or two examples/exercises/SUTs from earlier in this course for which
that aspect is indeed relevant.

11.5 An example: The Find command

Imagine we need to test the Find command from a text editing program.
When invoking this command while editing some text, we will get a
dialogue window as depicted in Figure 11.2.

This dialogue enables us to specify the string we want to find in our text
together with some options.

EXERCISE 11.2
In Figure 11.2 you can see the dialogue window that starts up when
executing the Find command. Make a list or mind map of the test rel-
evant aspects concentrating on the finding functionality (i.e. ignoring
the ? (help), Cancel and X (close) buttons). Then look at the example CT
in Figure 11.3 and try to understand the different parts and how they
coincide with your test relevant aspects.

The root of the tree in Figure 11.3 defines the scope of the test design. By
stating the scope as FindCommand we have purposefully excluded from
the scope other commands embedded in the dialogue window, such as
? (help), Cancel and X (close). This way our tree stays manageable.

The second level nodes are compositions and highlight the three top-
level test relevant aspects, i.e. test parameters:
1 the command options (node Options);
2 the search string we input in the text field right of Find what:
(node FindWhat);
3 the environment in which the command is to execute, i.e. the text in
which the string has to be found (node SearchedText).


FIGURE 11.3 A classification tree for the Find command

Options models the different options of the Find dialogue (i.e. Match
case and Direction) as a composition, since the use of Match case
does not exclude the use of Direction. Consequently, Options consists-of
both classification MatchCase and classification Direction. Classification
MatchCase is-a Yes or a No. Classification Direction is-an Up or a Down.

FindWhat models the string searched for in the text. Instead of enumerat-
ing actual values for this input field, the test parameter is decomposed
into three lower level test parameters, corresponding to characteristics
(morphology) of the search string the test designer judged important to
test for (i.e. the length, the number of spaces and the presence of capital
letters). Note that these aspects might or might not have been described
in the specification document, but in the experience of the test designer,
they are aspects relevant to include in the test. For example, because
he or she has seen developers using spaces as delimiters and programs
crashing when presented with an empty string.

The StringLength test parameter is subsequently modelled as a classification with classes: 0, 1, 2+, MaxLength. This is a classification because
the choice of a particular length precludes the others from being chosen.
Classes might model specific values, like in the case of the Null string,
or they might denote a set of values deemed equivalent, like in the case
of the 2+ class, which comprises all strings of length 2 to MaxLength -
1. Although this is not depicted in Figure 11.3, the test designer might
choose for example to refine the 2+ class using boundary value analysis
(from Chapter 7) to verify that indeed the processing is equivalent in
the range [2, MaxLength-1].

SearchedText models the environment in which the command is to execute. This parameter is not an explicit input to the command. Here
you can see how this approach goes beyond partitioning of the input
domain. However, this environment is a very important part of the test


design since the outcomes of executing the command will depend on


the content of the text searched and on the position of the cursor at the
time the command is executed. This is why the node SearchedText is a
composition that consists of two classifications: Content and CursorPosition.

11.6 Combinatorial coverage criteria

To visualise coverage, the tree is mapped onto a so-called combination
table. In the table, a column is set up for each leaf node (i.e. a class
which is not further subdivided), and a row corresponds to a test case.
In each row of the table, those classes constituting the test case are
marked. Making this type of combination table with the TESTONA tool2
(see Section 11.9) would look like this:

Another way of writing these test cases could be the following combi-
nation table:
Test Case#   #hammers (h)     #screwdrivers (s)   5h+10s
1            h < 0            0 <= s <= 30        ?
2            not an integer   0 <= s <= 30        ?
3            h >= 0           s < 0               ?
4            h >= 0           not an integer      ?
5            h >= 0           0 <= s <= 30        in [0, 200[
6            h >= 0           0 <= s <= 30        in [200, 1000[
7            h >= 0           0 <= s <= 30        in [1000, ->[
8            h >= 0           s > 30              in [200, 1000[
9            h >= 0           s > 30              in [1000, ->[

All different ways of representing the same thing. Note that the combi-
nation table only describes which classes should be combined according
to the chosen coverage criterion, hence the name. There are no specific
values for the test cases yet. This will be discussed in the next section.
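As a side note, the two combination coverage criteria used above can be sketched in a few lines. This is a naive illustration: it models neither the one-invalid-value-per-test-case refinement of Section 6.6 nor TESTONA's actual algorithm.

```python
from itertools import product

def all_combinations(params):
    # ACC: every combination of classes, i.e. the Cartesian product.
    # params maps each test parameter to its list of classes.
    return list(product(*params.values()))

def each_choice(params):
    # ECC: every class appears in at least one test case; classes are
    # cycled so the suite is as long as the largest classification.
    longest = max(len(classes) for classes in params.values())
    return [tuple(classes[i % len(classes)] for classes in params.values())
            for i in range(longest)]
```

For three parameters with 2, 2 and 4 classes, ACC yields 16 test cases while ECC yields only 4, which is why ECC is attractive for the invalid classes.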

This is the combination table for the Find command, picking the 2-wise coverage criterion, which we saw when doing the Problems exercise on YouLearn related to Chapter 9:

2 www.testona.net



Case# MatchCase Direction StrLength Spaces Capitals Content Cursor
1 Yes Up MaxLength 2+Cons All Empty AfterFirst
2 Yes Down 1 2+Cons Mixed Occurrences = 0 -
3 Yes - 2+ 1 No Occurrences > 1 -
4 Yes - MaxLength 2+Nocons - Occurrences = 1 AfterFirst
5 No Up 0 2+Nocons No Occurrences > 1 AfterFirst
6 No Down MaxLength 1 Mixed Empty BeforeFirst
7 No - 1 No All Occurrences = 1 BeforeFirst
8 No - 2+ 2+Cons - Occurrences = 0 BeforeFirst
9 - Up 1 1 - Empty -
10 - Down 2+ 2+Nocons All Occurrences = 1 -
11 - - MaxLength No No Occurrences = 0 AfterFirst
12 - - 0 2+Cons Mixed Occurrences > 1 BeforeFirst
13 - Up 2+ No Mixed Occurrences = 1 -
14 - Down 0 No - Occurrences > 1 -
15 - - MaxLength 2+Cons No Occurrences = 1 -
16 - - 1 2+Nocons No Empty BeforeFirst
17 Yes - 0 1 All Occurrences = 0 -
18 Yes - 0 No - Empty -
19 - - - 1 All Occurrences = 0 AfterFirst
20 - - 2+ 2+Nocons Mixed Empty AfterFirst
21 No Down 0 1 No Occurrences = 1 -
22 No - MaxLength 2+Nocons - Occurrences > 1 -
23 Yes Up - 2+Nocons - Occurrences = 0 BeforeFirst
24 - Down 1 - All Occurrences > 1 AfterFirst

11.7 Designing the test cases

As you can see from the example above, the test parameter values of
classifications (i.e. the leaves of a CT) are given in abstract form: they are
abstract test cases. To make them executable, they need to be concretised
into concrete test cases like those in the tables on page 180, where you
can see explicit values for the input variables. This is also known as
sensitisation [13]: finding input values that will cause a selected test case
to be executed.

For the hardware store example this process is simple; we just pick val-
ues that satisfy the equivalence classes. However, for the Find com-
mand dialogue window, it is less trivial. For example, to be executable,
the parameter value ExistMoreThanOnce would need to be transformed
into an actual sequence of strings of which at least two must be identical
to the one used as a value for the FindWhat test parameter.
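As a sketch of such a concretisation step, the abstract value could be turned into a concrete searched text like this (the function name and the filler text are made up for illustration):

```python
def concretise_occurrences_gt_1(find_what, filler="lorem ipsum"):
    # Build a searched text in which the FindWhat string occurs more
    # than once, sensitising the abstract class "Occurrences > 1".
    return f"{filler} {find_what} {filler} {find_what} {filler}"
```

The same idea applies to the other abstract classes: each one becomes a small generator that produces a concrete text, cursor position or option setting satisfying the class.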

11.8 Summarising the classification tree method

The CTM partitions test relevant aspects into equivalence classes. Talk-
ing in general terms about test relevant aspects encourages testers to
think beyond the input domain. The graphical representation as trees
is descriptive and easy to learn. Trees are also more scalable than tables
and hence allow for modelling bigger problems. Moreover, linking the
trees to the combination tables makes test coverage visible.

Figure 11.4 gives a pictorial description of the classification tree method.
The “impossible combinations” and “dependencies” that are mentioned
in the figure are explained in the next sections, which contain more examples.


FIGURE 11.4 The classification tree method as depicted by Eduardo Miranda in [70]

Node            Use                                    Allowed parents           Relationship with descendants   Allowed descendants
Root            Scope of the testing                   None                      Consists-of                     Composition, Classification
Composition     Models a test parameter that           Root, Composition, Class  Consists-of                     Composition, Classification
                integrates several other relevant
                factors
Classification  Models a test parameter that is        Root, Composition, Class  Can be partitioned into         Class
                partitioned into collectively
                exhaustive and mutually exclusive
                sets of values (equivalence classes)
Class           Models the equivalence classes and     Classification            Can be refined into             Composition, Classification
                exemplar values

TABLE 11.1 Classification tree semantics


As with all the test models we have seen in previous chapters, the re-
sults of CTM models are not unique. As the creators of the method ex-
plain, the CTM provides a structured and systematised approach to test
case design, making the test cases understandable and documentable.
However, the results will be highly dependent on the tester’s experi-
ence, knowledge and creativity. We will discuss the advantages and
disadvantages of alternative partitions as we go through the examples
in subsequent sections.

11.9 Tool support

The tool that supports the CTM is TESTONA3. It is a commercial tool
developed by a German company. For this course, students can
obtain a free license to draw classification trees and create combination
tables. On the course site you can find details on how to obtain the
license.

11.10 More examples

We illustrate the use of the CTM by way of a series of examples. The first
one describes the flexible manufacturing system, which is an extension
of the machine vision system used by Grochtmann in his presentation
at the STAR conference [52]. Its purpose is to explain the main mod-
elling concepts and how to deal with valid and invalid values to pre-
vent the introduction of ineffectual combinations in a test suite. The
second example corresponds to a sound amplifier and is used to illus-
trate the handling of ineffectual combinations. The third is a password
diagnoser which addresses the problems of missing and infeasible com-
binations.

For each of these three examples, we strongly encourage you to think
about the problem and try to make a start with a classification tree
before reading the solution.

11.10.1 The flexible manufacturing system

The purpose of the flexible manufacturing system is to pick up metal
sheets of different shape, colour and size and put them into appropriate
bins. The sheets are transported by a conveyor that positions them in
front of a camera, which feeds the images to a machine vision system
that classifies them and issues a command to the robotic arm subsystem
to pick up the sheet and move it to the assigned bin:

3 http://www.testona.net/


The system is designed to recognise any combination of three types of
shapes (triangles, circles and squares), three colours (green, blue and
red) and two sizes (small and large).

The first step in the process is to select the scope of testing. Are we
going to test the entire system, the machine vision subsystem or the
robotic arm subsystem? If we choose the vision subsystem, our concern
will be that the system is able to correctly classify the sheets and issue
commands. If we choose the robotic arm portion, the starting point will
be the reception of a well-formed command, the ability of the system
to interpret and execute the command. If we choose the entire system
it will be a combination of the two previous scenarios plus some other
cases such as powering-up, shutting down, emergency stop and so on.
For the sake of this example we have selected the machine vision sub-
system as the SUT.

Once this has been decided, the test designer might start by considering
the figures to be recognised as a test parameter. This is an obvious choice,
since the purpose of the subsystem is to classify them according to their
shape, colour and size. He or she might also consider environmental
conditions such as the room illumination, the conveyor background and
the speed at which the figures move in front of the camera to be relevant
test parameters, either because a specification called for it or because
the test designer suspects the performance of the subsystem might be
affected by them. This first cut at the problem is documented by the
classification tree in Figure 11.5. We can make the following remarks
about this CT:
• Because each figure is defined by three non-exclusive attributes, namely
shape, colour and size, we model Figure as a composition node. Each
of the attributes is necessary to define the figure.
• Contrast this with the room Lighting, which is modelled as a classifica-
tion node. If the Lighting is Dim, whatever the definition of Dim is, it
cannot be Bright. Classifications define exclusive choices.
• The Conveyor aspect is also modelled as a composition since its effect
on the correctness of the classification depends on two coexisting at-
tributes: the conveyor’s Speed and its Background.

From a purely semantic perspective, we did not need to introduce the
nodes Figure and Conveyor in the tree, since the root of the tree is also a
composition. We could also have done it like this, which is semantically
equivalent:


FIGURE 11.5 First cut decomposition for the machine vision subsystem

However, the representation in Figure 11.5 is more descriptive and better communicates the nature of the problem and the test parameters
being considered.

Now that we have identified the first test parameters, we need to de-
cide whether to assign terminal values to them or to further refine or
decompose the parameters into sub-parameters. Notice that given the
hierarchical approach taken, sub-parameters might be employed as pa-
rameters in their own right or as values of the parent parameter. In
general, a test parameter or a value will be decomposed when the en-
tity it represents:
• is made up of sub-entities that might affect the output of a test; or


• can be categorised along several dimensions the test designer would
like to explore; or
• has structural features that are known to be error prone from a programming point of view.

For example, the node Shape might take the values: Triangle,
Circle and Square. Since a shape can only have one of these values at
a time (i.e. it cannot be a circle and a square at the same time), we can
model Shape as a classification of classes Triangle, Circle and Square.

Likewise, Color, Size, Speed and Background are modelled as classifications
with their respective classes as shown below:

At this point, the test designer might ask himself/herself about differ-
ent types of triangles. Is the machine vision system capable of recognis-
ing equilateral, scalene and isosceles triangles? This might be specified
or not, but it is a legitimate question. If the specification called for the
recognition of equilateral triangles, then the other two types must not
be classified as triangles, so we need to test that this is indeed the case.
However, if the specification assumed all kinds of triangles, then the
tester must ensure that the system works not only with equilateral but
also with the other types of triangles.
angle is a test relevant aspect that applies to the value Triangle, which
now becomes a sub-test parameter. This can be modelled as shown by
the tree below:

Before continuing with the refinement of the model, this is a good point
to illustrate the relationship between the model constructed so far and
combinatorial coverage criteria for generating test cases (see Chapter 9).
The obvious choice would be to consider Shape, Color, Size, Lighting, Speed


and Background as a combination table’s test parameters and their respec-


tive leaf classes as their values. Assuming that the desired strength is
to cover all 3-way interactions, this can result in the test suite shown by
the following table of 29 test cases:

Test Case# Shape Color Size Lighting Speed Background


1 Equilateral Red Small Dim Fast Light
2 Equilateral Red Large Bright Slow Dark
3 Equilateral Green Small Bright Fast Dark
4 Equilateral Green Large Dim Slow Light
5 Equilateral Blue Small Dim Slow Dark
6 Equilateral Blue Large Bright Fast Light
7 Scalene Red Small Bright Slow Light
8 Scalene Red Large Dim Fast Dark
9 Scalene Green Small Dim Fast Light
10 Scalene Green Large Bright Slow Dark
11 Scalene Blue Small Bright Fast Dark
12 Scalene Blue Large Dim Slow Light
13 Isosceles Red Small Dim Slow Dark
14 Isosceles Red Large Bright Fast Light
15 Isosceles Green Small Bright Slow Light
16 Isosceles Green Large Dim Fast Dark
17 Isosceles Blue Small Dim Fast Light
18 Isosceles Blue Large Bright Slow Dark
19 Circle Red Small Dim Fast Light
20 Circle Red Large Bright Slow Dark
21 Circle Green Small Bright Fast Dark
22 Circle Green Large Dim Slow Light
23 Circle Blue Small Dim Slow Dark
24 Circle Blue Large Bright Fast Light
25 Square Red Small Dim Fast Light
26 Square Red Large Bright Slow Dark
27 Square Green Small Bright Fast Dark
28 Square Green Large Dim Slow Light
29 Square Blue Small Dim Slow Dark

However, we could take another strategy to reduce the number of test
cases slightly. This strategy would be a hybrid one:
• First we establish the equivalence of processing for the different kinds
of triangles, i.e. we test whether the system is capable of recognising
the three types of triangles under fixed conditions.
• Second, we test for interaction effects using 3-way testing using Triangle
as an abstract parameter value that will be replaced with one of the
actual types (isosceles, equilateral or scalene) during testing.

This gives the following test cases for the first part:

Test Case# TriangleType Color Size Lighting Speed Background


1 Equilateral Red Small Dim Fast Light
2 Scalene Green Large Bright Slow Dark
3 Isosceles Blue - - - -

And for the second part the following remaining test cases:


Test Case# Shape Color Size Lighting Speed Background


4 Circle Red Small Dim Fast Light
5 Circle Red Large Bright Slow Dark
6 Circle Green Small Bright Fast Dark
7 Circle Green Large Dim Slow Light
8 Circle Blue Small Dim Slow Dark
9 Circle Blue Large Bright Fast Light
10 Square Red Small Bright Slow Light
11 Square Red Large Dim Fast Dark
12 Square Green Small Dim Fast Light
13 Square Green Large Bright Slow Dark
14 Square Blue Small Bright Fast Dark
15 Square Blue Large Dim Slow Light
16 Triangle Red Small Dim Slow Dark
17 Triangle Red Large Bright Fast Light
18 Triangle Green Small Bright Slow Light
19 Triangle Green Large Dim Fast Dark
20 Triangle Blue Small Dim Fast Light
21 Triangle Blue Large Bright Slow Dark

This can be a very important decision to reduce the number of test cases,
since we know that, for a given test strength, the number of test cases
generated grows with the number of test parameters.
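The growth argument can be checked mechanically: a small helper can count the distinct t-way interactions a suite covers. This is a simplified checker that assumes fully specified rows, i.e. no “-” wildcards.

```python
from itertools import combinations

def covered_interactions(rows, t):
    # Collect every distinct t-way interaction (a choice of t parameter
    # positions together with their values) that the suite exercises.
    covered = set()
    for row in rows:
        for idx in combinations(range(len(row)), t):
            covered.add((idx, tuple(row[i] for i in idx)))
    return covered
```

Comparing the count against the total number of required t-way interactions shows how quickly the testing effort grows when parameters are added.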

Up until now, we have only considered valid values for the different test
parameters, but of course invalid values have to be taken into account
too. For example, what happens if the figure does not have one of the
defined shapes or colours? Does the system stop? Does it assign the
same classification as to the last one processed? Does it classify it as
unrecognisable and send it to a trash bin?

The challenge with negative testing in the CTM is that the inclusion of
an invalid value in a combined test case might result in the premature
termination of the processing – for example with the display of an error
message – and the discarding of valuable combinations of valid values
included in the combined test case.

This is called the problem of ineffectual combinations. In practical terms,
this means that although the generated test suite was of strength t, not
all the combinations of that order have been processed.

A solution to this problem consists of creating two separate branches
for each test parameter, as depicted in Figure 11.6: one for valid values
and the other for the invalid ones (the same as we have seen earlier for
the hardware store).

By separating the two branches, it is very simple to generate test cases
using a test strategy which consists of:
• each choice combinations (ECC) of invalid values for negative testing:


FIGURE 11.6 CT for flexible manufacturing system including invalid values

Test Case# Figure Lighting Speed Background


1 UndefinedShape Dim Fast Light
2 UndefinedColor Bright Slow Dark
3 UndefinedSize - - -
4 Triangle+Red+Small Dark - -
5 Circle+Green+Large Saturated - -
6 Square+Blue+* - TooSlow -
7 - - TooFast -

Of course, we choose ECC since every test case should contain only
one invalid value, as we explained in Section 6.6.
• all t-way combinations for all valid values like we have seen before.

11.10.2 The audio amplifier

In this example the SUT is the audio amplifier shown below.

The amplifier has two input jacks, two volume controls, two toggle
switches and one three-way selection switch. The main purpose of this
example is to continue the discussion about ineffectual combinations
briefly addressed in the previous example when discussing how to deal
with invalid values. Here we are specifically addressing the problem of
ineffectual combinations caused by dependent parameters.


The problem of dependent parameters arises when the relevance of one
or more test parameters depends on the value of another parameter,
called the dominant parameter. Let us look at an example. Below is a
straightforward tree for the audio amplifier:

The test relevant aspects are:


• whether the two input Jacks are plugged or not;
• the positions of the two Volume controls;
• the three Switches.

While there is nothing wrong with the tree as drawn, if we were to me-
chanically map its nodes onto the parameters and values of a combina-
tion table, the resulting test suite would provide a coverage lower than
the table’s nominal strength. A look at the following table will show
how this is possible.
Test Case# J1 J2 Control1 Control2 RMS OnStdby PowerOffTweed Comment
1 Unplugged Unplugged 0 0 60Position Stdby Off Ineffectual
2 Plugged Plugged 0 1 100Position On Power
3 Unplugged Plugged 0 5 60Position On Tweed
4 Unplugged Unplugged 0 9 100Position Stdby Power Ineffectual
5 Plugged Unplugged 0 10 60Position Stdby Tweed Ineffectual
6 Plugged Plugged 1 0 100Position Stdby Tweed Ineffectual
7 Unplugged Unplugged 1 1 60Position On Off Ineffectual
8 Plugged Unplugged 1 5 60Position Stdby Power Ineffectual
9 Plugged Plugged 1 9 100Position On Off Ineffectual
10 Unplugged Plugged 1 10 100Position On Power
11 Plugged Unplugged 5 0 60Position On Power
12 Unplugged Plugged 5 1 100Position Stdby Tweed Ineffectual
13 Unplugged Unplugged 5 5 100Position On Off Ineffectual
14 Unplugged Plugged 5 9 60Position On Tweed
15 Plugged Unplugged 5 10 60Position Stdby Off Ineffectual
16 Unplugged Plugged 9 0 60Position Stdby Tweed Ineffectual
17 Plugged Unplugged 9 1 100Position On Off Ineffectual
18 Unplugged Plugged 9 5 60Position Stdby Power Ineffectual
19 Unplugged Plugged 9 9 60Position On Power
20 Unplugged Unplugged 9 10 60Position On Off Ineffectual
21 Unplugged Plugged 10 0 60Position Stdby Power Ineffectual
22 Plugged Unplugged 10 1 100Position On Off Ineffectual
23 Plugged Unplugged 10 5 100Position Stdby Tweed Ineffectual
24 Unplugged Plugged 10 9 100Position Stdby Off Ineffectual
25 Plugged Plugged 10 10 100Position Stdby Off Ineffectual

When the amplifier is Off or in Stdby it is not amplifying and so nothing
else is testable. In terms of the previous table this means that all value
combinations included in a row that contains an Off or a Stdby are lost.


In this particular instance, the problem could be easily resolved by
physically removing the offending parameter values from the table.
However, we can also solve this by changing the tree such that we
highlight the different modes the amplifier can be in and the corresponding
switch positions. The adapted tree will look something like this:

Now we can split the test suite into three test groups, one for each mode,
as shown in the following table:

Mode Test Case# J1 J2 Control1 Control2 RMS OnStdby PowerOffTweed
Off 1 Off
Stdby 2 Stdby Power
Stdby 3 Stdby Tweed
On 4 Unplugged Unplugged 0 0 60Position On Tweed
On 5 Plugged Plugged 0 1 100Position On Power
On 6 Unplugged Unplugged 0 5 100Position On Power
On 7 Plugged Unplugged 0 9 100Position On Tweed
On 8 Unplugged Plugged 0 10 60Position On Tweed
On 9 Plugged Plugged 1 0 60Position On Power
On 10 Unplugged Unplugged 1 1 100Position On Tweed
On 11 Plugged Plugged 1 5 60Position On Tweed
On 12 Unplugged Plugged 1 9 60Position On Power
On 13 Plugged Unplugged 1 10 100Position On Power
On 14 Unplugged Plugged 5 0 100Position On Tweed
On 15 Plugged Unplugged 5 1 60Position On Power
On 16 Plugged Unplugged 5 5 60Position On Tweed
On 17 Plugged Unplugged 5 9 100Position On Tweed
On 18 Unplugged Unplugged 5 10 60Position On Power
On 19 Plugged Unplugged 9 0 60Position On Tweed
On 20 Unplugged Plugged 9 1 100Position On Power
On 21 Plugged Unplugged 9 5 60Position On Power
On 22 Plugged Unplugged 9 9 100Position On Tweed
On 23 Plugged Plugged 9 10 100Position On Power
On 24 Plugged Unplugged 10 0 60Position On Power
On 25 Unplugged Plugged 10 1 100Position On Tweed
On 26 Plugged Unplugged 10 5 60Position On Power
On 27 Plugged Unplugged 10 9 60Position On Power
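The mode-splitting idea behind this table can be sketched as follows. The parameter names are simplified, the On rows use exhaustive combinations for brevity instead of the 2-wise suite shown above, and the Off/Stdby rows are reduced to the mode alone (in the table above, Stdby still varies the PowerOffTweed switch):

```python
from itertools import product

def amplifier_suite(modes, other_params):
    # Only when the dominant parameter is "On" are the remaining
    # parameters relevant; otherwise nothing else is testable, so the
    # Off/Stdby test cases fix (here: omit) the dependent parameters.
    suite = []
    for mode in modes:
        if mode == "On":
            for combo in product(*other_params.values()):
                suite.append((mode,) + combo)
        else:
            suite.append((mode,))
    return suite
```

Splitting by the dominant parameter keeps the nominal strength of the On group intact while spending only a handful of test cases on the other modes.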

11.10.3 The password diagnoser

The password diagnoser is a software module executed as part of the
registration process for an online banking system. The purpose of the
diagnoser is to verify that user passwords conform to good security
practices as defined by the following requirements:


Characteristic Requirement
Length Shall be 8 characters or more
Composition Shall include at least one upper case character,
one numeric character and one special character
Predictability Shall not have any QWERTY keyboard or
ASCII sequence of length greater than 3

The software module analyses the password submitted by a registrant
and issues a diagnostic message for each requirement not met.

This example highlights the distinction between the use of combinatorial testing to combine actual values, as we did in the preceding example (see also below), versus the combination of characteristics from
which the actual test values are to be derived. This example also introduces the problems of missing combinations and infeasible combinations,
and continues the discussion about ineffectual combinations.

In the previous example (the audio amplifier), each class consists of one
value of a test parameter. We could make such a value-based tree for the
password diagnoser as well, in such a way that actual test passwords
can be generated by concatenating all combinations of the parameter
values, like in the tree below:

However, this is not an elegant solution. First of all, it is not general
enough; we only choose one possible value for each of the characteristics
(e.g. “A” for a password that has an uppercase letter, and lowercase “a”
for a password that does not). If we want to make it more general, then
it does not really scale, as we can see in the tree below:

So, for this specific example, the value-based approach is not the best
choice.


A better solution would be a characteristic-based tree that makes the
intentions of the test designer more explicit:

In this second tree, instead of using tokens corresponding to actual
sequences of characters to be included in the password, we used the
requirements the password must conform to as the values of the test
parameters.

We use combinatorial testing to generate specifications with which the
actual test values must comply, and then use these specifications to
manually or automatically generate the necessary test cases. The result
of the 3-way combination of all test parameters is shown in Table 11.2.

A quick look at the entries in the table shows that no valid password
was generated. Without a doubt, such a password must be part of any
test suite that we could call adequate. A valid password must comply
with five requirements:
1 its length has to be 8 characters or more (Length ≥ 8);
2 it shall include at least one upper case letter (UpperCase = Yes);
3 it shall include a number (Number = Yes);
4 it shall include a special character (SpecialCharacter = Yes); and
5 it shall not include a sequence longer than 3 characters (Predictability
= No).
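The five requirements above translate directly into an executable check. The sketch below is ours, not the book’s: the function name, the exact messages, and the simplified predictability check (only ascending runs of four consecutive ASCII codes; the real diagnoser also knows QWERTY keyboard rows) are all assumptions.

```python
import re

def diagnose(password):
    """Return the list of diagnostic messages for a candidate password.
    Illustrative sketch only; predictability is simplified to ascending
    runs of 4 consecutive ASCII codes."""
    msgs = []
    if len(password) < 8:
        msgs.append("Password must have at least 8 characters")
    if not any(c.isupper() for c in password):
        msgs.append("Password must include at least one uppercase letter")
    if not any(c.isdigit() for c in password):
        msgs.append("Password must include at least a number")
    if not re.search(r"[^A-Za-z0-9]", password):
        msgs.append("Password must include at least one special character")
    for i in range(len(password) - 3):
        codes = [ord(c) for c in password[i:i + 4]]
        if all(b - a == 1 for a, b in zip(codes, codes[1:])):
            msgs.append("Password must not contain a sequence "
                        "longer than 3 characters")
            break
    return msgs

print(diagnose("Rick$1"))    # only the length requirement is violated
print(diagnose("Xk7#pq2z"))  # satisfies all five requirements -> []
```

A valid password produces an empty list of diagnostics; every other input yields one message per violated requirement, which is exactly what the test table below checks.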

However, because we chose to generate a test suite consisting of all 3-way combinations, there is no guarantee that such a combination will be included. This is what we call the problem of missing combinations. This problem can be addressed by:
problem can be addressed by:
• seeding [27], i.e. manually introducing handcrafted test cases to cover
specific combinations, such as a valid password; or
• using higher-strength combinatorial testing (e.g. 5-way instead of 3-
way) to make sure that mandatory combinations are not missed.

The second solution is only preferable to seeding when the required combinations are made up of a small subset of test parameters since, in that case, the higher-strength combinatorial testing can be applied to just this small subset instead of to the entire set of test parameters. Otherwise we will be forcing a large number of test cases just to cover a few required combinations.
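The missing-combinations problem can be checked mechanically. The sketch below is our own illustration (the parameter domains follow the characteristic-based tree above; the suite, the function name and the one-dict-per-test representation are assumptions): it lists every t-way value combination that no test in a suite covers.

```python
from itertools import combinations, product

# Parameter domains taken from the characteristic-based tree in the text.
params = {
    "Length":           [">=8", "<8"],
    "UpperCase":        ["Yes", "No"],
    "Number":           ["Yes", "No"],
    "SpecialCharacter": ["Yes", "No"],
    "Predictability":   ["No", "Qwerty", "ASCII"],
}

def missing_t_way(suite, t):
    """All t-way value combinations not covered by any test in the suite."""
    names = list(params)
    missing = []
    for group in combinations(names, t):
        for values in product(*(params[n] for n in group)):
            if not any(all(test[n] == v for n, v in zip(group, values))
                       for test in suite):
                missing.append(dict(zip(group, values)))
    return missing

# A small, hand-made suite (illustrative, not from the book).
suite = [
    {"Length": ">=8", "UpperCase": "Yes", "Number": "Yes",
     "SpecialCharacter": "No", "Predictability": "Qwerty"},
    {"Length": "<8", "UpperCase": "No", "Number": "No",
     "SpecialCharacter": "Yes", "Predictability": "ASCII"},
    {"Length": ">=8", "UpperCase": "Yes", "Number": "No",
     "SpecialCharacter": "Yes", "Predictability": "No"},
    {"Length": "<8", "UpperCase": "No", "Number": "Yes",
     "SpecialCharacter": "No", "Predictability": "No"},
]

print(len(missing_t_way(suite, 1)))  # every single value is covered: 0
print(len(missing_t_way(suite, 2)))  # but some pairs are already missed
```

In particular, the 5-way combination corresponding to a valid password is among the missing ones for this suite, which is precisely the situation that seeding repairs.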

Did you notice that the test designer seems to have a tendency to capitalise the first letter of the passwords and to write the sequences in ascending or left-to-right order? One cannot but wonder whether the developer based his design on the same thought patterns:

Universidad Politécnica de Valencia Structured Software testing

Test case 1
- Test specification: Length: >= 8; UpperCase: Yes; Number: Yes; SpecialCharacter: No; Predictability: Qwerty
- Test value†: Eduardo7890
- Expected result (diagnostic messages): Password must include at least one special character; Password must not contain more than 3 consecutive keyboard keys

Test case 2
- Test specification: Length: >= 8; UpperCase: No; Number: No; SpecialCharacter: No; Predictability: No
- Test value†: eduardo974
- Expected result: Password must include at least one uppercase letter; Password must include at least a number; Password must include at least one special character

Test case 3
- Test specification: Length: >= 8; UpperCase: Yes; Number: No; SpecialCharacter: Yes; Predictability: ASCII
- Test value†: Eduardo#abcd
- Expected result: Password must include at least a number; Password must not contain more than 3 consecutive letters or numbers

Test case 4
- Test specification: Length: < 8; UpperCase: Yes; Number: Yes; SpecialCharacter: Yes; Predictability: No
- Test value†: Rick$1
- Expected result: Password must have at least 8 characters

Test case 5
- Test specification: Length: < 8; UpperCase: No; Number: No; SpecialCharacter: Yes; Predictability: Qwerty
- Test value†: xcvb$a
- Expected result: Password must have at least 8 characters; Password must include at least one uppercase letter; Password must include at least a number; Password must not contain more than 3 consecutive keyboard keys

Test case 6
- Test specification: Length: < 8; UpperCase: No; Number: Yes; SpecialCharacter: No; Predictability: ASCII
- Test value†: arstu4
- Expected result: Password must have at least 8 characters; Password must include at least one uppercase letter; Password must include at least one special character; Password must not contain more than 3 consecutive letters or numbers

Test case 7
- Test specification: Length: < 8; UpperCase: Yes; Number: No; SpecialCharacter: No; Predictability: ASCII
- Test value†: Peopqr
- Expected result: Password must have at least 8 characters; Password must include at least a number; Password must include at least one special character; Password must not contain more than 3 consecutive letters or numbers

Test case 8
- Test specification: Length: >= 8; UpperCase: No; Number: Yes; SpecialCharacter: Yes; Predictability: No
- Test value†: hotspot2*
- Expected result: Password must include at least one uppercase letter

Test case 9
- Test specification: Length: < 8; UpperCase: Yes; Number: No; SpecialCharacter: No; Predictability: Qwerty
- Test value†: Asdf
- Expected result: Password must have at least 8 characters; Password must include at least a number; Password must include at least one special character; Password must not contain more than 3 consecutive keyboard keys

...

Test case 18
- Test specification: Length: >= 8; UpperCase: No; Number: No; SpecialCharacter: No; Predictability: ASCII
- Test value†: eduardocdef
- Expected result: Password must include at least one upper case letter; Password must include at least a number; Password must include at least one special character; Password must not contain more than 3 consecutive letters or numbers

† These values are written by the test designer following the specification

TABLE 11.2 Test table for password diagnoser


Did the developer test for the presence of an uppercase letter in any position
other than the beginning of the string?

Did he or she consider descending or right to left sequences?

To prevent these biases, the test designer might want to include other
test relevant aspects such as the position where the characters appear
and the sequence order.

Note that these new test parameters do not come from the diagnoser
specification but from the experience or creativity of the test designer or
from organisational assets such as test catalogues or bug taxonomies.

If we were to combine all the leaves of the previous tree, we would produce a large number of test cases, including many infeasible combinations. Look at the following test cases, for example:

Test case | Length | UpperCase | Number | SpecialCharacter | SequenceType | SequencePosition | SequenceOrder | Test value
1 | < 8 | No | No | No | No | Beginning | Ascending | Ineffectual
2 | >= 8 | No | No | Beginning | QWERTY | InBetween | Descending | @lkjheduardo
...
21 | < 8 | Beginning | Beginning | No | No | InBetween | Descending | Infeasible
22 | < 8 | Beginning | Beginning | Beginning | QWERTY | Beginning | Ascending | Infeasible
...
69 | >= 8 | End | End | No | No | InBetween | Descending | Infeasible
70 | >= 8 | Beginning | InBetween | InBetween | QWERTY | End | Descending | E9#eduardolkjh


In test cases 21, 22 and 69, an UpperCase letter and a Number occupy
the same position in a string, a situation which is physically impossible.
Test case 70 is valid since the InBetween position is not a single position
but any position between the beginning and the end of the string.

We can deal with the infeasible combinations in different ways.

We can try to prevent the generation of infeasible test cases by modelling the problem in a different way. We could, for example, model Position as a classification to which we subordinate the character types: LowerCase, UpperCase, Number, SpecialCharacter. The tree now looks like this:

Notice that, from the point of view of the intellectual effort required to create a tree, it is better to start by treating every test parameter independently, like we did for the previous version of the tree (on page 199), and then rearrange it, than to try to figure out the optimum tree from the beginning.

With regards to the Predictability test parameter, if we were to flatten out the tree and include all the test parameters in the same combination table, we would be generating a lot of ineffectual combinations due to the fact that the No value of the Predictability parameter would become a dominant parameter controlling whether the Type, Position and Order parameters are relevant or not. This is a problem akin to what we face in the design of a relational database, in which we must break down a hierarchical structure into several tables to avoid update anomalies. To solve the problem we will split the test suite into three test groups as shown below (using 2-way coverage).


Test | Length | Beginning | InBetween | End | Predictability (Type / Position / Order) | Comment
1 | >= 8 | UpperCase | Number | SpecialCharacter | QWERTY / Beginning / Descending | Predictability testing
2 | >= 8 | UpperCase | Number | SpecialCharacter | ASCII / Beginning / Ascending | Predictability testing
3 | >= 8 | UpperCase | Number | SpecialCharacter | QWERTY / InBetween / Ascending | Predictability testing
4 | >= 8 | UpperCase | Number | SpecialCharacter | ASCII / InBetween / Descending | Predictability testing
5 | >= 8 | UpperCase | Number | SpecialCharacter | QWERTY / End / Ascending | Predictability testing
6 | >= 8 | UpperCase | Number | SpecialCharacter | ASCII / End / Descending | Predictability testing
7 | >= 8 | LowerCase | LowerCase | UpperCase | Yes† | Diagnostic testing
8 | < 8 | LowerCase | UpperCase | SpecialCharacter | No | Diagnostic testing
9 | >= 8 | LowerCase | Number | Number | No | Diagnostic testing
10 | >= 8 | LowerCase | SpecialCharacter | LowerCase | No | Diagnostic testing
11 | < 8 | UpperCase | LowerCase | Number | Yes | Diagnostic testing
12 | >= 8 | UpperCase | UpperCase | LowerCase | Yes | Diagnostic testing
13 | < 8 | UpperCase | Number | SpecialCharacter | Yes | Diagnostic testing
14 | < 8 | UpperCase | SpecialCharacter | UpperCase | No | Diagnostic testing
15 | >= 8 | Number | LowerCase | SpecialCharacter | No | Diagnostic testing
16 | < 8 | Number | UpperCase | UpperCase | Yes | Diagnostic testing
17 | < 8 | Number | Number | LowerCase | Yes | Diagnostic testing
18 | < 8 | Number | SpecialCharacter | Number | Yes | Diagnostic testing
19 | >= 8 | SpecialCharacter | LowerCase | LowerCase | No | Diagnostic testing
20 | < 8 | SpecialCharacter | UpperCase | Number | Yes | Diagnostic testing
21 | >= 8 | SpecialCharacter | Number | UpperCase | No | Diagnostic testing
22 | >= 8 | SpecialCharacter | SpecialCharacter | SpecialCharacter | Yes | Diagnostic testing
23* | >= 8 | Number | SpecialCharacter | Number | No | Seeded test case
† Any sequence will suffice since their equivalence was verified by the Predictability test suite (test cases 1 to 6)
* This test case was seeded

The first test group tests for Sequences with different characteristics, assuming all other requirements are met. The second group tests for interaction effects between all requirements, assuming that all variations of Sequence form an equivalence class. The third group defines a seeded test case that consists of a password that satisfies all requirements, since there is no guarantee that such a test case would be generated using 2-way interactions.

11.11 Exercise

EXERCISE 11.3
Look at the following Replace dialogue. It contains a FindNext command (similar to what we have seen in Section 11.5), but also Replace, ReplaceAll, Help (?) and Close/×.

In Section 11.5, we focussed on just one command (i.e. Find), and excluded the other commands in the dialogue window. In this exercise we do not want to do that. We want the scope of testing to be the entire dialogue and all the functionalities it encompasses. This apparently simple difference seems to be taken for granted in the literature, but it has a large effect on the complexity of the resulting models and the extent to which we will experience combination anomalies (i.e. ineffectual or infeasible combinations). We want you to find out why while modelling the Replace dialogue as a classification tree. While modelling, think about what combinations your tree would generate, and if there are a lot of anomalies, think of different ways you can model it.


Chapter contents 12

Graph models

Overview 205

1 Introduction 205
2 Graphs 206
3 Paths in graphs 207
4 General graph-based coverage criteria 209
4.1 Vertex coverage 210
4.2 Edge coverage 210
4.3 Path coverage 211
5 Procedures for making test suites in the case of cycles 211
5.1 Transition trees 211
5.2 Prime paths 216
6 Graph coverage for source code 217
6.1 Control flow graph from source code 218
6.2 Statement and branch coverage 220
6.3 Condition coverage 221
6.4 Multiple condition coverage 221
6.5 Other code coverage criteria 222
6.6 McCabe’s cyclomatic complexity 222
7 Graph coverage for state machines 223
8 Flowcharts 226

Chapter 12

Graph models

OVERVIEW

In this chapter, graphs are the models for test case generation and coverage. Graphs are used in many ways in software engineering and they come in all kinds of flavours and styles. But in the end they all consist of a collection of objects (called vertices and drawn as circles, ellipses or boxes) and relations between them (called edges and drawn as lines or arrows). Graphs can model flow dynamics in software applications. Consequently, the test cases that we can derive from graphs are sequences describing steps through the SUT. Up until now, the test cases we have seen were testing input-output behaviour by inserting single input values, not sequences.

Again, graphs fit in the way we present all testing techniques in this
course:

make a model – choose a coverage criterion – make test cases

So, the graph is the model, and we will need to make test cases with
the intention of covering it. Beizer’s [13] general principles for using
graphs in testing fit very well in this course. His motto is:

QUESTION: What do you do when you see a graph?

ANSWER: Cover it!

LEARNING GOALS
After studying this chapter, you are expected to:
– understand the use of graphs in testing
– know different types of graphs that are used in testing (control
flow, state machines and other flowcharts)
– understand and be able to apply different general coverage criteria
for graphs.

CONTENTS

12.1 Introduction

Graphs play a very important role in software engineering and testing. Many different types of graphs are used: control flow graphs, data flow graphs, transaction flow graphs, “whatever can” flow graphs, call graphs, state machines, statecharts, or Petri nets.


In this chapter, we will first give an overview of the concepts and definitions related to graphs. This is not an extensive treatment of graphs, but just the concepts we will use here to explain the different coverage criteria for general types of graphs. Subsequently, we will look at examples of more concrete graphs that are used in software engineering and show how the definitions on the general graphs translate to the concrete examples (e.g., control flow graphs and state machines).

12.2 Graphs

A (directed) graph G = (V, E) is defined by a collection V of objects and a specification E of which objects are related and how.
• V is a finite and non-empty set of objects that are called the vertices or the nodes.
• E ⊆ V × V is a set of ordered pairs of nodes, called the edges, arcs or links between nodes.
• Edge (vi, vj) ∈ E goes from vi to vj.
• Graphs used in testing will often have:
– a set of initial vertices: Vinit ⊆ V
– a set of final vertices: Vfin ⊆ V

If |Vinit| = 1, then the graph is called a single entry graph. Similarly, if |Vfin| = 1 then the graph is called a single exit graph. In testing we will mostly start from single entry graphs, meaning that we will have one designated initial vertex.

Our definition is about directed graphs or digraphs since, in testing, most graphs are directed. Should we need an undirected graph, i.e. a graph in which the edges have no direction, we can interpret E as a set of unordered pairs of nodes.

Graphs can be represented in different forms. The best known is the pictorial representation, where the vertices are depicted by squares, circles or bubbles, and the edges are depicted by arrows. For example, look at the pictorial representation of graph G1 in Figure 12.1.

V = {1, 2, 3, 4, 5, 6}
E = {(1, 3), (1, 4), (1, 6), (2, 3), (2, 4), (3, 2), (3, 6), (4, 5), (5, 6), (6, 3)}
Vinit = {1}
Vfin = {6}
(drawing omitted)

FIGURE 12.1 Graph G1


Another representation of graphs is the adjacency matrix: a matrix representation of exactly which vertices in a graph have edges between them. The matrix m is a kind of lookup table: we can query whether edge (vi, vj) is in the graph by looking at m[i][j]. The adjacency matrix representation of graph G1 from Figure 12.1 is below:

We can also represent graphs with adjacency lists. Typically we will have
a list of size |V | where each list entry consists of a list of edges that go
out of the vertex that is represented by the list entry. The adjacency list
representation of graph G1 is below:
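Both representations can be built mechanically from the vertex and edge sets of G1. The sketch below is illustrative (variable names are ours):

```python
# Graph G1 from Figure 12.1.
V = [1, 2, 3, 4, 5, 6]
E = [(1, 3), (1, 4), (1, 6), (2, 3), (2, 4),
     (3, 2), (3, 6), (4, 5), (5, 6), (6, 3)]

# Adjacency matrix: m[i][j] is True iff edge (i+1, j+1) is in E
# (indices are shifted because the vertices are numbered from 1).
m = [[False] * len(V) for _ in V]
for (u, w) in E:
    m[u - 1][w - 1] = True

# Adjacency list: for every vertex, the list of its successors.
adj = {v: [w for (u, w) in E if u == v] for v in V}

print(m[0][2])  # is (1, 3) an edge?  True
print(adj[3])   # successors of vertex 3: [2, 6]
```

The matrix answers "is (u, w) an edge?" in constant time; the list makes iterating over the successors of a vertex cheap, which is what most of the path-based algorithms in this chapter need.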

12.3 Paths in graphs

We need some general graph-related definitions and also some graph-related definitions specific to testing. We start with the general definitions.
• A path p from v0 to vk (k ≥ 0) in a graph G = (V, E) is a sequence of vertices v0, v1, ..., vk where (vi, vi+1) is an edge in E, for 0 ≤ i < k. Note:
– according to this definition, a path of length 0 is a sequence that just contains one vertex and has no edges.
– a path may contain more than one occurrence of the same vertex.
• The number of edges in the path is the length of the path. We denote this with length(p).
• paths(G) denotes the set of all paths in graph G = (V, E).
• A simple path is a path in which no vertex appears more than once.
• A path segment or a subpath of path p is a consecutive subsequence of p.
• A vertex v is reachable from a vertex u (and u can reach v) if there exists a path that starts with u and ends with v.
• A cycle or a loop is a path that starts and ends in the same vertex.
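These definitions translate almost literally into code. A minimal sketch (function names are ours), using graph G1 from Figure 12.1:

```python
# The edge set of G1 from Figure 12.1.
E1 = {(1, 3), (1, 4), (1, 6), (2, 3), (2, 4),
      (3, 2), (3, 6), (4, 5), (5, 6), (6, 3)}

def is_path(edges, p):
    """Every consecutive pair of vertices must be an edge."""
    return all((p[i], p[i + 1]) in edges for i in range(len(p) - 1))

def length(p):
    """The length of a path is its number of edges."""
    return len(p) - 1

def is_simple(p):
    """No vertex appears more than once."""
    return len(set(p)) == len(p)

p = [1, 3, 2, 4, 5, 6]
print(is_path(E1, p), length(p), is_simple(p))  # True 5 True
print(is_path(E1, [6, 3, 6]))                   # a cycle is also a path: True
```

Note that [6, 3, 6] is a valid path but not a simple one: the definition of path allows repeated vertices, which is exactly what makes cycles possible.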


1 Make a graph.
2 Pick a coverage criterion c (vertex coverage, edge coverage, (prime) path coverage, transition tree coverage or round-trip coverage).
3 Make a set of test paths (abstract test cases) to reach high c coverage (a priori coverage).
4 Sensitise these into a suite T of concrete test cases.
5 Execute the test cases.
6 Obtain the set of traces(T).
7 Determine the c coverage of the test run (a posteriori coverage).

FIGURE 12.2 Graph-based testing as explained in this chapter

Now we get to the first test-related definitions. Recall that a test case
contains all information necessary to guide the execution of a particular
test (e.g. input, recipe, oracle). Also remember the way we present test
techniques in this course:

make a model – choose a coverage criterion – make test cases

So, in this section, when we have a graph as a model, we will make a test case with the intention of making the system go through a certain path¹ in the graph: a path we want to cover with the test case. We will call such a path a test path:
call such a path a test path:
• A test path is an abstract test case or part of it. Test paths will start in the vertex from Vinit and end in a vertex from Vfin. We can think of test paths as the abstract test cases we define with a graph as a test model. We can make these concrete by finding the input values that should cause the test path to be covered, together with the expected outcomes (oracle) (in Section 11.7 this was called sensitisation).

Once we have our concrete test cases, we need to execute them to test
how our system responds to them, to see whether the intended test path
has been executed (i.e. the intended coverage has been reached) and to
detect failures if any.
1 Note that there are also techniques where a test case is not a path, but a subgraph or
a subtree. In this section we only look at paths.


• A trace represents an execution of a test case. Note that, in case of non-determinism, two executions of the same test case may yield two different traces. Also note that a trace does not necessarily end in a vertex from Vfin because a failure during the execution might prevent that. In the case of a failure, it can happen that only a subpath of the trace is a path in the graph, since the failure may lead us to places that are not defined in our graph.
• A test run is an execution of a test suite, and hence will result in a set of test traces. For a test suite T, we will denote this set by traces(T).

Note that a set of test paths and a set of traces can both be represented as a set of subpaths in a graph G, but they are different things. To summarise how they relate and how graph-based testing works, we refer to Figure 12.2.

To define coverage criteria, we need terminology to express the presence of vertices, edges and subpaths in paths. We will use the terminology familiar from travelling introduced in [4].

• A path p is said to visit vertex v if v is in p:

p visits v ≡ v ∈ p

Note that we overloaded the set operator ∈ for sequences here.

• A path p = v0 v1 . . . vk (k ≥ 1) is said to visit edge e = (u, w) if e appears in p:

p visits e ≡ ∃i : 0 ≤ i ≤ k − 1 : (vi = u) ∧ (vi+1 = w)

• A path p is said to tour a path q, if q is a subpath of p:

p tours q ≡ q is a subpath of p

Note that for edges e ∈ E the following holds:

p visits e ⇔ p tours e

• The set of subpaths that are toured by the paths from a set of paths P is defined as:

toured_by_paths(P) = {q | ∃ p : p ∈ P : (p tours q)}
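The visits and tours predicates can be written down directly from the definitions. A sketch (function names are ours):

```python
def visits_vertex(p, v):
    """p visits v iff v occurs in the sequence p."""
    return v in p

def visits_edge(p, e):
    """p visits edge (u, w) iff u, w occur as a consecutive pair in p."""
    u, w = e
    return any(p[i] == u and p[i + 1] == w for i in range(len(p) - 1))

def tours(p, q):
    """p tours q iff q is a consecutive subsequence (subpath) of p."""
    k = len(q)
    return any(p[i:i + k] == q for i in range(len(p) - k + 1))

p = [1, 3, 2, 4, 5, 6]
print(visits_vertex(p, 2), visits_edge(p, (3, 2)), tours(p, [2, 4, 5]))
```

Note that visiting is direction-sensitive: the path above visits edge (3, 2) but not (2, 3), and tours requires consecutiveness, so it does not tour [3, 4] even though both vertices occur in p.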

12.4 General graph-based coverage criteria

Following Figure 12.2, coverage criteria are used in two ways:
• To design a set of test paths (abstract test cases) to meet the criteria;
• To measure the coverage of the resulting test traces.

Since both are sets of paths, we will define the coverage criteria in terms
of sets of paths in a graph. You will not be surprised to find that there
are several coverage criteria, such as vertex coverage, edge coverage
and path coverage.


12.4.1 Vertex coverage

DEFINITION 12.1 Given a graph G = (V, E) and a set of paths P in G. The vertex coverage of P is the percentage of all vertices in G that are visited by the paths in P:

|{v ∈ V | ∃ p : p ∈ P : (p visits v)}| / |V| × 100%

For graph G1 from Figure 12.1 the test path

1, 3, 2, 4, 5, 6

gives 100% vertex coverage.

Note that this test path starts in a vertex from Vinit and ends in a vertex from Vfin.
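Definition 12.1 can be computed mechanically. A sketch (names are ours):

```python
def vertex_coverage(V, P):
    """Percentage of vertices of the graph visited by the paths in P."""
    visited = {v for p in P for v in p}
    return 100.0 * len(visited & set(V)) / len(V)

V1 = [1, 2, 3, 4, 5, 6]
print(vertex_coverage(V1, [[1, 3, 2, 4, 5, 6]]))  # 100.0
print(vertex_coverage(V1, [[1, 4, 5, 6]]))        # 4 of 6 vertices
```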

12.4.2 Edge coverage

DEFINITION 12.2 Given a graph G = (V, E) and a set of paths P in G, the edge coverage of P is the percentage of all edges in G that are visited by the paths in P:

|{e ∈ E | ∃ p : p ∈ P : (p visits e)}| / |E| × 100%

For graph G1 from Figure 12.1 the following set of 4 test paths gives
100% edge coverage:

1, 4, 5, 6
1, 6, 3, 6
1, 3, 2, 3, 6
1, 3, 2, 4, 5, 6
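Edge coverage can be computed the same way. The sketch below (names are ours) checks that the four test paths above indeed reach 100%:

```python
def edge_coverage(E, P):
    """Percentage of edges of the graph visited by the paths in P."""
    visited = {(p[i], p[i + 1]) for p in P for i in range(len(p) - 1)}
    return 100.0 * len(visited & set(E)) / len(E)

# Graph G1 and the four test paths from the text.
E1 = {(1, 3), (1, 4), (1, 6), (2, 3), (2, 4),
      (3, 2), (3, 6), (4, 5), (5, 6), (6, 3)}
suite = [[1, 4, 5, 6],
         [1, 6, 3, 6],
         [1, 3, 2, 3, 6],
         [1, 3, 2, 4, 5, 6]]
print(edge_coverage(E1, suite))       # 100.0
print(edge_coverage(E1, [suite[0]]))  # one path alone covers 3 of 10 edges
```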

Note that Definition 12.2 of edge coverage from above (which is basically considered the standard definition by many authors) does not include vertex coverage. In [4] it is indicated that intuitively it might be a good idea to include vertex coverage in edge coverage. The example given there is a graph with a node that has no edges. Our definition of edge coverage does not cover that node.

For that purpose, Ammann and Offutt define edge coverage as follows [4]:

DEFINITION 12.3 Given a graph G = (V, E) and a set of paths P in G, the Ammann and Offutt edge coverage of P is the percentage of all paths in G with length up to and including 1 that are toured by the paths in P:

|{p | p ∈ toured_by_paths(P) ∧ length(p) ≤ 1}| / |{p | p ∈ paths(G) ∧ length(p) ≤ 1}| × 100%

So we reach 100% Ammann and Offutt edge coverage if each node and
each edge are visited at least once by a test run of a test suite.


In 1976, Pimont and Rault [95] introduced a criterion for covering pairs
of edges. They called it switch cover. In [4] this is called edge-pair coverage.
In 1978, Chow [32] generalized this and defined n-switch coverage for a
specific graph (i.e. state machines, which we will see in Section 12.7).
The n-switch coverage criterion is about the percentage of paths with
length n + 1 that are covered. This way 0-switch coverage is edge cover-
age and 1-switch coverage is Pimont and Rault’s switch cover.

In the general coverage criteria definitions from above, n-switch coverage would be defined as:

DEFINITION 12.4 Given a graph G and a set of paths P in G, the n-switch coverage of P is the percentage of all paths with length n + 1 in G that are toured by the paths in P:

|{p | p ∈ toured_by_paths(P) ∧ length(p) = n + 1}| / |{p | p ∈ paths(G) ∧ length(p) = n + 1}| × 100%

12.4.3 Path coverage

From coverage of paths of length up to and including 1, we can go directly to covering all paths in a graph. This seems to be the strongest and most complete coverage criterion.

DEFINITION 12.5 Given a graph G and a set of paths P in G, the path coverage of P is the percentage of all paths in G that are toured by the paths in P:

|{p | p ∈ toured_by_paths(P)}| / |paths(G)| × 100%

The problem with this coverage definition is that if the graph contains a
cycle, the total number of paths in G is infinite and hence it is impossible
to reach 100% coverage. In the next section, we discuss several attempts
to solve this problem and construct finite test suites.

12.5 Procedures for making test suites in the case of cycles

Selecting test paths from an infinite set of possible paths to create a test suite is not easy. In this section, we will describe two different procedures from the testing literature that deal with the infinite number of paths resulting from a cycle in the graph and construct finite test suites:
• transition trees, defined in Binder [14], and
• prime paths, from Offutt [4].

We will discuss each of them in the following subsections.

12.5.1 Transition trees

In [14], Robert Binder defines a test strategy for making a test suite to cover a graph. It is an adaptation of an ”automata-theoretic” test strategy defined by Chow [32] in 1978, called the W-method for Finite State Machines (FSM).


Constructing a Transition Tree (TT) for G = (V, E)

1 Label the root of the TT with the initial vertex of G. This is level 1 of the TT.

2 Suppose we have already built the TT to a level k. Then the (k + 1)th level is built by examining the nodes in the kth level from left to right:
• A node at the kth level is terminated if its label is the same as the label of some node that is already in the TT at some level j, with j < k, or if a node with this label has already been examined at level k.
• A node at the kth level is also terminated if it has no outgoing edges in E.
• Otherwise, if the node at level k represents vertex vi ∈ V and there is an edge (vi, vj) ∈ E, then we attach a branch and a successor node to the node representing vi in the TT. The successor node is labelled with vj and is at level k + 1.

FIGURE 12.3 Recipe for making a Transition Tree [14, 32]²

Binder’s strategy only uses the set P of Chow’s method. Chow defines this set, which we will call PChow, as follows:

• The set PChow is any set of paths that contains, for every edge e = (vi, vj), both a path v0, ..., vi from the initial vertex to vi and the extension of this path with vj, that is v0, ..., vi, vj.

Binder [14] gives a recipe for constructing such a set PChow by building a transition tree (TT) of G. All subpaths of length ≥ 0 that start at the root of the transition tree then constitute the set PChow. A procedure for constructing such a transition tree for G = (V, E) is given in Figure 12.3.

The procedure in Figure 12.3 always terminates, since there are only a finite number of vertices in G. Also, depending on the left-to-right order in which we place the successor nodes, a different tree may result.
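The recipe of Figure 12.3 can be sketched as a level-by-level (BFS) construction. The function below is our own illustration: instead of an explicit tree it directly returns the set PChow (every path from the root to a tree node), which is what we need for coverage anyway.

```python
def transition_tree_paths(E, init):
    """Level-by-level (BFS) sketch of the recipe in Figure 12.3.
    Returns PChow: every path from the root to a node of the TT."""
    succ = {}
    for (u, w) in E:
        succ.setdefault(u, []).append(w)
    all_paths = [(init,)]          # root-to-node paths, i.e. PChow
    level = [(init,)]              # paths ending at the current level
    earlier = set()                # labels occurring at earlier levels
    while level:
        nxt, examined = [], set()
        for p in level:            # left to right
            v = p[-1]
            if v in earlier or v in examined or v not in succ:
                continue           # this node is terminated
            examined.add(v)
            for w in succ[v]:      # successors in edge-list order
                child = p + (w,)
                all_paths.append(child)
                nxt.append(child)
        earlier |= {p[-1] for p in level}
        level = nxt
    return all_paths

# Graph G2 from Figure 12.4:
E2 = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4),
      (2, 5), (3, 1), (3, 5), (4, 5), (5, 3)]
for p in transition_tree_paths(E2, 1):
    print(p)   # the 12 paths of the set PChow in Example 12.1
```

For G2 this reproduces exactly the twelve paths derived by hand in Example 12.1; a different left-to-right edge order may yield a different tree shape but, in this example, the same path set.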

Whichever tree results, Binder states [14] that covering all subpaths starting in the root of a TT is a good test strategy for graphs in the presence of loops. We will define this as transition tree coverage.

2 This recipe is a mixture of the descriptions in Chow [32] and Binder [14]. In the original paper of Chow [32], in step 2, the condition j ≤ k is stated for j. However, as you can see later from Example 12.1, this does not always give the desired result and does not define the complete set PChow. So we adapted it to "j < k, or if a node with this label has already been examined at level k". Also in [62] it is stated that a node is final if it is already encountered higher up in the tree.


V = {1, 2, 3, 4, 5}
E = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (2, 5), (3, 1), (3, 5), (4, 5), (5, 3)}
Vinit = {1}
Vfin = {5}
(drawing omitted)

FIGURE 12.4 Graph G2

DEFINITION 12.6 Given a graph G, a set PChow for G and a set of paths P in G, the transition tree coverage of P for the given PChow is the percentage of all paths in PChow that are toured by the paths in P:

|{p ∈ PChow | p ∈ toured_by_paths(P)}| / |PChow| × 100%

Let us look at an example of how to construct a TT and what test cases would result for 100% transition tree coverage.

EXAMPLE 12.1 Consider graph G2 in Figure 12.4. We start by constructing a TT for G2 following the recipe from Figure 12.3:

Level 1 We start with initial vertex 1 as the root of the tree.

Level 2 Vertex 1 from level 1 has three outgoing edges, to vertices 1, 2 and 3. So for level 2 the tree will become:

Level 3
• The node with label 1 is terminated at level 2 because it is already in the tree at level 1.
• Vertex 2 has four outgoing edges, to vertices 1, 2, 5 and 4, so the tree will become:
• Vertex 3 has two outgoing edges, to vertices 1 and 5, so we need to add more nodes:


Level 4
• All nodes with label 1 or 2 at level 3 are terminated because
they are already in the tree (at levels 1 and 2).
• Vertex 5 has one outgoing edge, to vertex 3, so the leftmost
node 5 in the tree gets a successor node 3.
• Vertex 4 has one outgoing edge, to vertex 5.
• The rightmost node 5 in the tree is now terminated because
we already added its successor earlier in this step.

This results in the TT from Figure 12.5.

FIGURE 12.5 TT of graph G2 from Figure 12.4

Note that the TT contains each edge of G2 exactly once, and does not contain anything else. Consequently, transition tree coverage implies edge coverage, but not necessarily the other way around. Moreover, note that each edge is reachable from the root (i.e. the initial vertex of G2). Consequently, the set of subpaths of length ≥ 0 starting in the root of the TT indeed satisfies the definition of PChow. For our TT of G2, this set consists of the following paths:

1;  1, 1;  1, 2;  1, 2, 1;  1, 2, 2;  1, 2, 5;  1, 2, 5, 3;  1, 2, 4;  1, 2, 4, 5;  1, 3;  1, 3, 1;  1, 3, 5

A set of test paths achieving 100% transition tree path coverage for this
PChow is, for instance:

1, 2, 5, 3, 1, 3, 5
1, 2, 4, 5
1, 2, 1, 1, 2, 2, 5
1, 3, 1, 2, 5


Binder [14] defines another test strategy based on round-trips.
• A round-trip path is a path that begins and ends with the same vertex, with no repetitions of vertices other than the start and end vertex.

DEFINITION 12.7 Given a graph G and a set of paths P in G. The round-trip path coverage of P is the percentage of all round-trip paths in G that are toured by the paths in P:

|{p | p ∈ toured_by_paths(P) ∧ p is a round-trip path of G}| / |{p | p is a round-trip path of G}| × 100%

Binder’s conjecture is that testing all subpaths starting in the root of a TT is as good as testing all the round-trip paths in the graph. He claims that, although not all round-trip paths are necessarily in the TT in their entirety, they are covered in pieces [62][6]. We will explain this using graph G2 from Figure 12.4 again. The round-trip paths in G2 are:

1, 1;  2, 2;  1, 2, 1;  1, 3, 1;  2, 1, 2;  3, 1, 3;  3, 5, 3;  5, 3, 5;
1, 2, 5, 3, 1;  2, 5, 3, 1, 2;  5, 3, 1, 2, 5;  3, 1, 2, 5, 3;
1, 2, 4, 5, 3, 1;  2, 4, 5, 3, 1, 2;  4, 5, 3, 1, 2, 4;  5, 3, 1, 2, 4, 5;  3, 1, 2, 4, 5, 3

As mentioned before, Binder claims that the test suite that covers the TT for G2 is as good as covering all round-trip paths of G2.

Note that for the round-trip paths 1, 2, 5, 3, 1 and 2, 4, 5, 3, 1, 2, for instance, there is no path in the TT that tours these round-trips directly. However, the round-trips are exercised in pieces: the tree contains 1, 2, 5, 3 and (1,) 3, 1, together making 1, 2, 5, 3, 1. Similarly, the tree exercises (1,) 2, 4, 5 and (1, 2,) 5, 3 and (1,) 3, 1 and 1, 2, together making 2, 4, 5, 3, 1, 2.

At this stage of the research, there is no clear, precise reason that explains why covering the paths in the TT (i.e. transition tree coverage) is better than covering all edges. Moreover, Binder’s conjecture that transition tree coverage is as strong as round-trip coverage is still unproven.

The procedure from Figure 12.3 constructs the TT in a Breadth First Search (BFS) approach: the tree is constructed level by level. We could, however, also choose a Depth First Search (DFS) approach: each time we add a new node to the tree, we wait with adding its siblings until we have added all of its descendants. In fact, in [62], experiments show that DFS seems to give better mutation scores (recall Section 1.9.2) for testing. Note that, similar to choosing BFS, choosing DFS allows for multiple possible trees since the order in which the edges of one vertex are traced may vary.

215
Universidad Politécnica de Valencia Structured Software testing

[figure: graph G3]

FIGURE 12.6 Graph G3 with Vinit = {0} and Vfin = {6}

EXERCISE 12.1
a Make a TT of G2 using a DFS approach, which defines your PChow for this exercise. Let us call this PChow^dfs.
b Construct (by hand) a set of test paths P that gives 100% transition tree coverage for your PChow^dfs from part a of this exercise.
c Determine the transition tree path coverage of your set of test paths P for the set PChow resulting from the BFS approach in Example 12.1.

EXERCISE 12.2
a What are the round-trip paths in graph G3 from Figure 12.6?
b Make a TT for G3 using the DFS approach.
c Which set of paths would result in 100% transition tree coverage of
G3 ?
d Do these paths also result in 100% round-trip coverage of G3 ?

12.5.2 Prime paths

In [4], prime paths are introduced to help create test suites in the presence of loops.
• A prime path in a graph G = (V, E) is a simple path from vi ∈ V to
v j ∈ V that
– may have vi = v j ;
– does not appear as a proper subpath of any other simple path of G.

EXAMPLE 12.2 Graph G2 from Figure 12.4 contains the following prime paths:

2,4,5,3,1,2 1,2,5,3,1 1,2,1


1,2,4,5,3,1 3,1,2,5,3 5,3,5
3,1,2,4,5,3 5,3,1,2,5 3,5,3
5,3,1,2,4,5 2,1,3,5 3,1,3
4,5,3,1,2,4 2,1,2 2,2
2,5,3,1,2 1,3,1 1,1

216
Chapter 12 Graph models

Note that these are exactly all 17 round-trip paths and one prime path
that is not a round-trip path: 2,1,3,5. So there is a very subtle difference
between round-trip paths and prime paths, which can be expressed by
the following alternative definition of round-trip path: a round-trip
path is a prime path that starts and ends at the same vertex.

On the website accompanying [4]:

https://cs.gmu.edu:8443/offutt/coverage/GraphCoverage

we can find that the following 12 test paths cover all prime paths:

test paths prime paths that are covered


1, 2, 4, 5, 3, 1, 2, 5 2, 4, 5, 3, 1, 2 − 1, 2, 4, 5, 3, 1 − 5, 3, 1, 2, 5
1, 3, 1, 2, 4, 5, 3, 5 3, 1, 2, 4, 5, 3 − 1, 3, 1 − 5, 3, 5
1, 2, 5, 3, 1, 2, 4, 5 5, 3, 1, 2, 4, 5 − 2, 5, 3, 1, 2 − 1, 2, 5, 3, 1
1, 2, 4, 5, 3, 1, 2, 4, 5 2, 4, 5, 3, 1, 2 − 1, 2, 4, 5, 3, 1 − 5, 3, 1, 2, 4, 5 − 4, 5, 3, 1, 2, 4
1, 2, 5, 3, 1, 2, 5 2, 5, 3, 1, 2 − 1, 2, 5, 3, 1 − 5, 3, 1, 2, 5
1, 3, 1, 2, 5, 3, 5 3, 1, 2, 5, 3 − 1, 3, 1 − 5, 3, 5
1, 2, 1, 3, 5 2, 1, 3, 5 − 1, 2, 1
1, 2, 1, 2, 5 2, 1, 2 − 1, 2, 1
1, 3, 5, 3, 5 5, 3, 5 − 3, 5, 3
1, 3, 1, 3, 5 1, 3, 1 − 3, 1, 3
1, 2, 2, 5 2, 2
1, 1, 2, 5 1, 1

Note that the last two test paths could be merged into one path 1, 1, 2, 2, 5 that then covers prime paths 1, 1 and 2, 2.

We define prime path coverage as follows:

DEFINITION 12.8 Given a graph G and a set of paths P in G. The prime path coverage of
P is the percentage of all prime paths in G that are toured by the paths
in P:

|{ p| p ∈ toured_by_paths( P) ∧ p is a prime path of G }|
× 100%
|{ p| p is a prime path of G }|
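The toured-by check and this percentage can be computed mechanically. The following sketch is our own illustration (the method names isSubpath and coverage are not from the text); it treats "toured by" as "occurs as a contiguous subpath of some test path", which matches the examples above:

```java
import java.util.List;

public class PathCoverage {

    // A path p is toured by a test path t if p occurs as a contiguous subpath of t.
    public static boolean isSubpath(List<Integer> p, List<Integer> t) {
        for (int i = 0; i + p.size() <= t.size(); i++) {
            if (t.subList(i, i + p.size()).equals(p)) return true;
        }
        return false;
    }

    // Percentage of target paths (e.g. all prime paths of G) toured by the test paths.
    public static double coverage(List<List<Integer>> targets, List<List<Integer>> tests) {
        long covered = targets.stream()
                .filter(p -> tests.stream().anyMatch(t -> isSubpath(p, t)))
                .count();
        return 100.0 * covered / targets.size();
    }

    public static void main(String[] args) {
        // Three of the prime paths of G2 and one test path touring two of them.
        List<List<Integer>> targets =
                List.of(List.of(1, 2, 1), List.of(2, 1, 2), List.of(2, 2));
        List<List<Integer>> tests = List.of(List.of(1, 2, 1, 2, 5));
        System.out.println(coverage(targets, tests)); // 1,2,1 and 2,1,2 are toured; 2,2 is not
    }
}
```

Running main on these three prime paths against the single test path 1, 2, 1, 2, 5 reports two of the three paths as covered.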

EXERCISE 12.3
How many test paths do we need for 100% prime path coverage of
graph G3 from Figure 12.6? Do this at first by hand. When finished,
use the aforementioned website to compare your answer to theirs.

12.6 Graph coverage for source code

In Section 1.9.1.1, we briefly saw code coverage criteria, which give an idea of the percentage of the code that has been executed by a test suite.
Different criteria are defined based on whether specific parts of the code
are executed or not during the tests.


1 statement_a;
2 while (condition)
3 statement_b;
4 statement_c;

FIGURE 12.8 CFG for a while-loop structure

All these criteria we have seen are graph-based coverage criteria of the
Control Flow Graph (CFG). Control flow graphs are graphical represen-
tations of the source code of a program: each vertex corresponds to a
statement or a condition, and the edges correspond to branches. In the
following section we will show how to construct a CFG.

12.6.1 Control flow graph from source code

In Figure 12.7 we see the CFG for an if-then-else programming construct.


The labels of the vertices correspond to the line numbers where the
statements can be found.

1 if (condition)
2 statement_a;
3 else statement_b;
4 statement_c;

FIGURE 12.7 CFG for an if-then-else structure

The vertex with label 1 is called a decision vertex, representing the condi-
tion from line 1 whose value decides whether we continue to statement
2 or 3.

The vertex with label 4 is called a junction vertex since it has more than
1 incoming edge.

The CFG for a while-loop construct can be found in Figure 12.8. Decision
vertex 2 represents the execution of the condition that decides whether
we continue with the loop or not.

For the for-loop in Figure 12.9 we have written the initialisation, the con-
dition and the increment parts of the for on three different lines so it
is easier to see which vertex it corresponds with. The initialisation is
only executed once, and then the loop is executed until the condition in
vertex 3 no longer evaluates to True.


1 statement_a;
2 for ( init ;
3 condition;
4 increment )
5 statement_b;
6 statement_c;

FIGURE 12.9 CFG for a for-loop structure

In Figure 12.10 you can see the CFG of a do-while-loop. The loop body is
always executed at least once.

1 statement_a;
2 do statement_b;
3 while (condition);
4 statement_c;

FIGURE 12.10 CFG for a do-while-loop structure

The case or switch statement has the CFG as in Figure 12.11. Here we put
different statements together in a vertex. We can do this because these
statements are executed as a basic block, i.e. all statements in the block
are executed, in a sequential manner. We could make separated vertices
for each of the statements, but it would make the CFG larger.

EXERCISE 12.4
Make a CFG for the following program, which might look familiar. The
methods incharacter, outcharacter and nextcharacter can throw
an IOException.


1 get (ch);
2 switch (ch){
3 case 'A':
4 statement_a;
5 break;
6 case 'B':
7 statement_b;
8 break;
9 case 'C':
10 statement_c;
11 break;
12 default:
13 statement_d;
14 break;
15 }
16 statement_e;

FIGURE 12.11 CFG for a switch or case structure

1 public static void nextcharacter()
2 throws IOException {
3
4 CW = incharacter();
5
6 if ((CW == BL) || (CW == LF)) {
7 if (fill + 1 + bufpos <= MAXPOS) {
8 outcharacter(BL);
9 fill = fill + 1;
10 } else {
11 outcharacter(LF);
12 fill = 0;
13 }
14
15 for (int k=1; k<=bufpos; k++)
16 outcharacter(buffer[k]);
17
18 fill = fill + bufpos;
19 bufpos = 0;
20 } else {
21 if (bufpos == MAXPOS) {
22 Alarm = true;
23 } else {
24 bufpos = bufpos + 1;
25 buffer[bufpos] = CW;
26 }
27 }
28 nextcharacter();
29 }

12.6.2 Statement and branch coverage

The coverage criteria from the previous section can now be applied to
the CFGs and we can see that we get the coverage criteria explained in
Section 1.9.1.1. The notion of vertex coverage is statement coverage; edge
coverage is branch coverage or decision coverage.


12.6.3 Condition coverage

Recall that condition coverage (or predicate coverage) was defined as:
the percentage of Boolean sub-expressions present in the guards of a
branch that have been evaluated to both True and False during our tests.
If a decision vertex in the CFG represents a guard as a whole, then edge
coverage is not enough for condition coverage.

However, if we draw the CFG a bit differently, then edge coverage on that graph does imply condition coverage.

If a condition that constitutes the guard of a branch contains several sub-conditions (also called sub-expressions or clauses), we need to make sure that we draw a decision vertex for each sub-expression. If we do that, then condition coverage is subsumed by edge coverage on the resulting graph.

For example, recall the program oddOrPos from Figure 1.3, repeated
below for convenience:

1 public int oddOrPos(int[] x) {
2 int count = 0;
3 for (int i=0; i<x.length; i++) {
4 if (x[i]%2 == 1 || x[i] > 0){
5 count++;
6 }
7 }
8 return count;
9 }

The if-then statement in line 4 has a guard with two sub-expressions:

sub-expr1 = (x [i ]%2 == 1)

sub-expr2 = (x [i ] > 0)

If we want to make sure that edge coverage subsumes condition coverage, we need to draw two vertices, one for each of these sub-expressions,
like the CFG in Figure 12.12. Note that in this CFG we have taken into
account the short-circuiting or lazy evaluation of Java.
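As an illustration of condition coverage on this guard (the test values below are our own, not from the text), each sub-expression can be made to evaluate to both True and False. Note that in Java -3 % 2 equals -1, so sub-expr1 is False for negative odd numbers:

```java
public class OddOrPosConditionTest {

    // Program under test, copied from Figure 1.3.
    public static int oddOrPos(int[] x) {
        int count = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] % 2 == 1 || x[i] > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // sub-expr1 True (sub-expr2 is not evaluated, due to short-circuiting):
        check(oddOrPos(new int[]{3}) == 1);
        // sub-expr1 False, sub-expr2 True:
        check(oddOrPos(new int[]{2}) == 1);
        // sub-expr1 False, sub-expr2 False:
        check(oddOrPos(new int[]{-4}) == 0);
        // In Java -3 % 2 == -1, so a negative odd number is not counted:
        check(oddOrPos(new int[]{-3}) == 0);
        System.out.println("all condition-coverage checks passed");
    }

    static void check(boolean ok) {
        if (!ok) throw new AssertionError();
    }
}
```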

Note that if you use tools for code coverage, it is important to know how the CFG is constructed, so that you can be sure that edge coverage indeed subsumes condition coverage.

12.6.4 Multiple condition coverage

Multiple condition coverage –the percentage of all possible True-False combinations of sub-expressions present in the guards of a branch– is
not subsumed in any graph coverage criterion, since it is a combina-
torial coverage criterion. Covering each combination of possible truth
values might be impractical for predicates with more than a few clauses.
For those, we can use combinatorial techniques like those described in
Chapter 9 or we can make a decision table as described in Chapter 8.


[figure: CFG of oddOrPos with a vertex for i=0, a decision vertex for i < x.length, separate decision vertices for the sub-expressions x[i]%2 == 1 and x[i] > 0, and vertices for line 5 (count++) and for i++]

FIGURE 12.12 CFG for the program oddOrPos from Figure 1.3

12.6.5 Other code coverage criteria

The modified condition/decision coverage (MC/DC) subsumes condition coverage, but additionally requires that every sub-condition in a decision must be executed independently to reach full coverage.

This means that each sub-condition must be executed twice, with the
results True and False, but with no difference in the truth values of all
other conditions in the decision. In addition, it needs to be shown that
each condition independently affects the decision. With this coverage
criterion, some combinations of condition results turn out to be redun-
dant and are not counted in the coverage result.
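To make this concrete, consider a decision with two sub-conditions, a || b (a minimal example of our own, not from the text). Multiple condition coverage needs all four True-False combinations, while MC/DC needs only three tests, because for each sub-condition there is a pair of tests in which only that sub-condition changes and the decision outcome changes with it:

```java
public class McDcExample {

    // The decision under test, with two sub-conditions a and b.
    public static boolean decision(boolean a, boolean b) {
        return a || b;
    }

    public static void main(String[] args) {
        // MC/DC test set for (a || b): {(T,F), (F,T), (F,F)}.
        // (T,F) vs (F,F): only a changes and the decision flips,
        //                 so a independently affects the decision.
        // (F,T) vs (F,F): only b changes and the decision flips,
        //                 so b independently affects the decision.
        check(decision(true, false) == true);
        check(decision(false, true) == true);
        check(decision(false, false) == false);
        // The fourth combination (T,T) is redundant for MC/DC.
        System.out.println("MC/DC pairs demonstrated");
    }

    static void check(boolean ok) {
        if (!ok) throw new AssertionError();
    }
}
```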

Again, this is not a graph coverage criterion. In [4] it is classified under the logical coverage criteria.

12.6.6 McCabe’s cyclomatic complexity

Finally, we want to mention McCabe’s cyclomatic complexity: first because of its history, and second because many existing code analysis tools still report this metric.


McCabe was one of the first to use CFGs when, in 1976 [76], he defined his cyclomatic complexity metric, a software metric used to indicate the complexity of a program.

His metric is defined as follows, for a graph G = (V, E):

CG = | E | − | V | + 2 p

where p is the number of connected components of graph G. For most programs, CFGs are single entry-single exit graphs, so it comes down to:

CG = | E | − | V | + 2
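As a quick check of the formula: the single entry-single exit CFG of the if-then-else structure in Figure 12.7 has 4 vertices (lines 1-4) and 4 edges (1→2, 1→3, 2→4 and 3→4), so CG = 4 − 4 + 2 = 2, matching the two paths through the construct. A one-method sketch (our own illustration):

```java
public class Cyclomatic {

    // Cyclomatic complexity of a single entry-single exit CFG
    // with the given numbers of edges and vertices.
    public static int complexity(int edges, int vertices) {
        return edges - vertices + 2;
    }

    public static void main(String[] args) {
        // if-then-else CFG of Figure 12.7: 4 vertices, 4 edges.
        System.out.println(complexity(4, 4)); // prints 2
        // while-loop CFG of Figure 12.8: vertices 1..4 and
        // edges 1->2, 2->3, 3->2, 2->4, so again complexity 2.
        System.out.println(complexity(4, 4)); // prints 2
    }
}
```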

In 1989, McCabe et al. [77] proposed the basis path testing coverage criterion, which states that it is enough to test CG distinct simple paths from the initial to the final vertex in a single entry-single exit CFG.

Note that the number of nodes and edges depends on how we draw the
CFG : one vertex per guard, or one vertex per sub-condition in the guard.
However, there is more to be careful about. The basis path testing crite-
rion turns out to be problematic and unreliable. In the next exercise we
ask you to think about why that could be.

EXERCISE 12.5
Try –for a few minutes, then read the solution– to construct a single
entry-single exit CFG for which it is possible to select CG distinct simple
paths from the initial to the final vertex without reaching 100% state-
ment or branch coverage.

12.7 Graph coverage for state machines

A state machine model is an abstraction composed of states and transitions between these states, which are caused by events (inputs) and can result in actions (outputs). All state machines can basically be represented by graphs, and the coverage criteria defined above can be applied when we use such a graph as a test model of our system.

Myriad articles and textbooks have been written about state machines.
Many different ways exist to represent them as a graph, define what can
be in a state, what a transition represents and what can be annotated
to the vertices and the edges. Just to give you an impression, people
have written about FSMs (Finite State Machines), EFSMs (Extended Fi-
nite State Machines), Petri Nets (PN), I/O Automata, Timed I/O Au-
tomata, Probabilistic I/O Automata, Labelled Transition Systems (LTS),
Labelled Terminal Transition Systems (LTTS), Timed Transition Systems
(TTS), and many other variants. In this chapter, we will look at a gen-
eral form of state machines and their representation as graphs. In Chap-
ter 13 we will look in more detail at Labelled Transition Systems (LTS)
as a test model, together with the techniques and accompanying tools
that can automatically generate and execute tests from LTSs.

State machines are, in their most general form, represented by graphs as follows:
• vertices represent states


[figure: Mealy state machine with states OFF, IDLE, PROGRAMMED, READY, OPERATING and PAUSED; the edges are annotated with events such as press on, press off, select program, press reset, press start [door open/closed], press stop and press continue, and with actions such as on sound, off sound, message on display, finish and notify with ready sound]

FIGURE 12.13 Simple state machine to model a washing machine

• edges represent transitions
• actions, events or both are represented by annotating the edges.

In Figure 12.13 you can see a simple state machine of a typical washing
machine. The vertices represent the states in which the machine can
be: off, idle, programmed, operating, paused and ready. The edges are
annotated with expressions in the format event[guard]/actions:
• the events (inputs) that a user can generate by interacting with the
machine: push the on button, push the off button, select a program,
reset, start, stop, continue.
• the actions (outputs) of the machine: finish, notify with ready sound,
off sound, on sound, message on display
• the guards mean that a transition can only happen when the guard evaluates to True: you can only put the washing machine into operation when the door is closed.

The representation from above is also referred to as the Mealy [78] rep-
resentation.

Let us look again at the state machine in Figure 12.13. Vertex coverage
from Definition 12.1 means that all states have been visited once and
hence is also called state coverage. Edge coverage from Definition 12.2
means that all transitions in the state machine have been visited once
and hence is called transition coverage.
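Measuring transition coverage can be mechanised by replaying test sequences against a transition table. The sketch below is our own simplified encoding of part of Figure 12.13 (only a few transitions; guards and actions are omitted):

```java
import java.util.*;

public class TransitionCoverage {

    // A small transition table: state -> (event -> next state).
    // Our own simplified encoding of part of Figure 12.13.
    static final Map<String, Map<String, String>> FSM = Map.of(
        "OFF", Map.of("press on", "IDLE"),
        "IDLE", Map.of("press off", "OFF", "select program", "PROGRAMMED"),
        "PROGRAMMED", Map.of("press reset", "IDLE", "press start", "OPERATING"));

    // Replay each test (a sequence of events) from the initial state OFF
    // and report the percentage of distinct transitions exercised.
    public static double transitionCoverage(List<List<String>> tests) {
        Set<String> visited = new HashSet<>();
        int total = FSM.values().stream().mapToInt(Map::size).sum();
        for (List<String> test : tests) {
            String state = "OFF";
            for (String event : test) {
                String next = FSM.get(state).get(event);
                visited.add(state + " --" + event + "--> " + next);
                state = next;
            }
        }
        return 100.0 * visited.size() / total;
    }

    public static void main(String[] args) {
        List<List<String>> tests = List.of(
            List.of("press on", "select program", "press reset", "press off"),
            List.of("press on", "select program", "press start"));
        System.out.println(transitionCoverage(tests)); // all 5 transitions covered: prints 100.0
    }
}
```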

EXERCISE 12.6
a Define test cases that establish 100% transition tree path coverage
(Definition 12.6) for the state machine in Figure 12.13. Assume that
Vinit = Vf in = {OFF} and use a BFS approach.
b Do you need to add more tests for 100% round trip path coverage
(Definition 12.7)? You do not have to actually find new test paths, but


merely answer yes/no and why.


c Do you need to add more tests for 100% of prime path coverage
(Definition 12.8)? You do not have to actually find new test paths, but
merely answer yes/no and why.

The most difficult part of using state machines for testing is not the coverage criteria; it is making the state machine. However, it is also the
most entertaining and creative part. Furthermore, it can help you as a
tester learn a lot about the SUT and even find errors while still modelling
[102, 4].

Let us look again at an example of a cash dispenser. In Chapter 2, in Figure 2.2, we have seen an informal model for that, where the vertices were actions. That kind of model is known as an informal Moore machine, as opposed to the Mealy machines we use here. Let us look here at a more detailed example and make a Mealy machine.

EXAMPLE 12.3 We consider the following specification for the cash dispenser.

The user of the cash dispenser can insert a bank card. After inserting the card,
the user is asked for the card’s PIN. If the PIN is correct, the user is asked for
the amount of money that is required. If the user has enough credit, the money
is given, otherwise a message is displayed that there is not enough money in
the account. In both cases, the cash dispenser will return the card. If the PIN
is wrong, the machine will permit one other attempt. If the PIN is incorrect
again, the cash dispenser will issue a message that the card will be kept.

Reading the description we can detect the following states, events, guards
and actions:
• The machine is waiting (state) for a customer to insert a card. So we
can call this state waiting for card .
• When the card is inserted (event), the machine asks for the PIN (action)
and starts waiting for PIN (state).
• If the PIN is entered incorrectly (event) and it has not yet been tried two times (guard), the machine will stay in the same state waiting for PIN .
• If the PIN is entered incorrectly (event) for the second time (guard), the
card will be swallowed (action) and the machine will return to the state
waiting for card .
• When the PIN is entered correctly (event), the machine asks for the amount
(action) and then starts waiting for amount (state) of money the user
wants.
• When the user enters the amount it will be checked (event) and if there
is enough money in the account (guard), the money will be given (ac-
tion), the card will be returned (action) and the machine will return to
the state waiting for card .
• When there is not enough money (guard), a message is displayed (ac-
tion) and the machine will return to the state waiting for card .

In a state machine that will look like:


[figure: Mealy machine with states waiting for card, waiting for PIN and waiting for amount, and transitions:
waiting for card → waiting for PIN: insert card / ask for PIN
waiting for PIN → waiting for PIN: enter incorrect PIN [< 2 times] / ask for PIN
waiting for PIN → waiting for card: enter incorrect PIN [2 times] / swallow card, message
waiting for PIN → waiting for amount: enter correct PIN / ask for amount
waiting for amount → waiting for card: enter amount [enough money] / give money, return card
waiting for amount → waiting for card: enter amount [not enough money] / message, return card]

EXERCISE 12.7
a Adapt the state model of the cash dispenser such that it also checks
whether the inserted card is valid or not. When invalid it should eject
the card.
b Design a test suite for using the model with a 100% of transition
(edge) coverage.

12.8 Flowcharts

Flowcharts are a general-purpose type of graph that can be used for testing. A CFG is a flowchart, showing the steps of a piece of code. However, we can make all kinds of flowcharts, at different levels of abstraction. Like any type of diagram, flowcharts can help visualise what is going on and thereby help understand the system we are testing, and perhaps also find errors, bottlenecks, and other less-obvious features within it.

There are many different types of flowcharts, and each type has its own
repertoire of different styles of boxes to draw vertices and other nota-
tional conventions.

The two most common types of boxes in a flowchart are:
• a processing step, usually called activity, denoted as a rectangular box;
• a decision, usually denoted as a diamond.

For example, we have seen a flowchart for a cash dispenser in Figure 1.8. Also UML3 –a concept-modelling notation used in software development– has many types of flowcharts (e.g. the activity diagram). All coverage criteria seen in this chapter can be applied to these charts.

3 https://www.uml.org/


Chapter contents 13

State-transition models

Overview 229

1 Introduction 229
2 Labelled transition systems 231
2.1 Conformance 234
2.2 Coverage 236
3 Test case generation from LTS 237
4 Axini Modelling Language 240
4.1 Model 240
4.2 Communication: labels 241
4.3 Non-deterministic choice 241
4.4 Loop: repeat 243
4.5 States and goto 243
4.6 Data: parameters 245
4.7 State variables 246
4.8 Advanced features 248

Chapter 13

State-transition models
This chapter has been written by Theo Ruys from Axini.

OVERVIEW

As we have seen, a state-transition model is an instance of the graph models of Chapter 12. Here, we will particularly look at labelled transition systems, a well-known state-transition formalism. Labelled transition systems have certain characteristics that make it possible to use them for Automated Test Case Generation (ATCG). A considerable part of the chapter is devoted to the Axini Modelling Language (AML). AML is a modelling language to specify control-oriented systems.

LEARNING GOALS
After studying this chapter, you are expected to:
– understand how labelled transition systems work
– understand the concepts of ATCG and its advantages and disad-
vantages
– know the basic constructs of AML
– understand how AML constructs are mapped upon labelled transi-
tion systems
– be able to construct AML models for small reactive systems.

CONTENTS

13.1 Introduction

The software systems discussed in Chapters 6, 7 and 8 were input/output systems: given a set of inputs, the system would respond with a set of corresponding output values; the behaviour of the SUT could thus be described as a function. In these chapters, we discussed several techniques to systematically test such input/output systems.

Reactive or control-oriented systems are systems that react to input and evolve: outputs do not only depend on the current input but also on earlier inputs; in control-oriented systems the set of available operations depends on the current state of the system. Typically, such systems are meant to be executed non-stop and will never terminate. Examples of control-oriented systems are communication protocols and multi-threaded systems with several interfaces. Control-oriented systems are often specified using state-transition based notations.


For the reactive systems in this chapter, the SUT is treated as a black box exhibiting behaviour and interacting with its environment, but without knowledge about its internal structure. The only way a tester can control and observe an implementation is via its interfaces. The aim of testing is to check the correctness of the behaviour of the SUT on its interfaces [116]. These interfaces are not limited to GUIs, but can be any communication interface, e.g. I/O interfaces (standard input/output, file systems, middleware, et cetera) and API calls.

The type of testing we will see in this chapter does not require any knowledge about the implementation of the SUT. However, if the source code of the SUT is available, other test techniques from this course can of course be used to (unit) test parts of the implementation before testing the integration of the different parts which constitute the complete SUT. Or one could apply code coverage measurements of the source code to assess the quality of the tests.

A state machine model, or a state diagram, is used in computer science and related fields to describe the behaviour of (reactive) systems. State models describe the states that a system can have and the actions under which the system changes state: the transitions. This is why they are sometimes also called state-transition models. We will use all these terms interchangeably.

Recall that a state-transition model is an instance of a graph model as defined in Chapter 12: the states are the vertices and the transitions are the edges. Consequently, as we have seen, the concepts and machinery of graph models all carry over to state-transition models.

Again, the testing techniques discussed in this chapter fit with the way
we present the test techniques in this book:

make a model – choose a coverage criterion – make test cases

However, when our test model consists of the state-transition models that will be explained in this chapter (i.e. labelled transition systems), we do not have to manually design test cases based on applying this criterion to the model. The state-transition models allow for Automated Test Case Generation (ATCG). For state-transition models we will therefore use a slightly different approach: given a (formal) model and a coverage criterion, we use a tool to automatically generate test cases:

make a model (labelled transition system) – pick a coverage criterion (state coverage, transition coverage) – generate test cases

The ATCG approach is the result of more than three decades of scientific research. It has a strong mathematical basis; many theoretical papers and descriptions are available which formally define the test derivation algorithms and prove their correctness. In this chapter, we introduce ATCG by example and only touch upon the formal foundations of ATCG, and we are (sometimes very) informal to ease the presentation. For a more formal introduction the interested reader is referred to, for example, [116].

The fundamental assumption of ATCG is that the implementation of the SUT is based on a specification, which explicitly describes what the SUT should do. A specification is typically a collection of documents which states in detail how the SUT should behave; such a specification is the blueprint for the development and implementation of the SUT. This specification is also the source of information for the formal model of the system: the model is thus an abstract description of the SUT. Testing then amounts to checking whether the behaviour of the SUT is allowed by the behaviour as specified by the abstract model.

13.2 Labelled transition systems

This section introduces labelled transition systems, a well-known formalism for state-transition models. Several simple examples of such systems will be presented. We will discuss traces, test cases and what it means for a SUT to conform to a model. Coverage criteria for models will also be presented.

A labelled transition system (LTS) is a structure consisting of states and labelled transitions. The states model the states of the system. The labelled transitions can model:
• outputs, i.e. the actions that the system can perform, and
• inputs, i.e. the actions under which the system changes state.

The labels on the transitions represent the observable actions of the system; they model the system’s interactions with the environment. Besides observable actions, the model may also contain internal actions (often denoted by ι or τ) which are unobservable for the system’s environment. We will not use internal actions in this chapter.


[figure: three LTSs: (a) a tea machine with states s0, s1, s2 and transitions ?button and !tea; (b) a machine that after ?button non-deterministically delivers !coffee or !tea; (c) a looping tea machine that can also be ?kick-ed into a final state]

FIGURE 13.1 Some labelled transition systems

Some of the advantages of labelled transition systems are that they are
easy to draw and it is intuitively clear what the semantics of the system
are. Figure 13.1 shows some examples of LTSs. The bullets represent the
states and the arrows between the bullets are the transitions. Each label
is either prefixed with a question mark (?) or an exclamation mark (!):
a question mark ? represents an input to the system, an exclamation
mark ! represents an output from the system. The start state of the
system is the state with an ingoing arrow which has neither a source
state nor a label. A state without an outgoing arrow is called a final
state.

Let us look at the examples of Figure 13.1 in more detail. Figure 13.1 (a)
describes a system that can deliver a cup of tea. The system has three
states. From the start state s0 , a ?button can be pressed and the system
goes to state s1 . From state s1 the system can do a !tea action. After the
!tea action, the system goes to the final state s2 , and nothing can happen
anymore. Figure 13.1 (b) describes a system which can deliver coffee
or tea, non-deterministically. After receiving the ?button, the system can
either deliver !coffee or !tea. Figure 13.1 (c) is a continuous version of
Figure 13.1 (a): after delivering the tea, the button can be pressed again to
deliver tea. If the system is ?kick-ed, the system goes out-of-order and
does nothing anymore.

Formally, we can define an LTS as follows:1

DEFINITION 13.1 A labelled transition system (LTS) is a tuple ⟨S, L, T, s0⟩, where
• S is the non-empty set of states,
• L is the set of labels, with L = LI ∪ LO and LI ∩ LO = ∅
- LI is the set of input labels
- LO is the set of output labels
• T is the transition relation, and
• s0 is the initial (or start) state.

The labels in LI and LO represent the observable actions of a system. A synonym for input is stimulus. Likewise, a synonym for output is response. The transition relation T defines the structure of the LTS: it connects the states using transitions: T ⊆ S × L × S.

1 Tretmans [116] takes a more formal approach and distinguishes between labelled transition systems (where the transitions do not have a direction) and input-output transition systems (where the transitions are divided into input- and output transitions).


If we look at Figure 13.1 (a), the LTS can be formally defined as

⟨{s0, s1, s2}, {?button} ∪ {!tea}, {(s0, ?button, s1), (s1, !tea, s2)}, s0⟩.

[figure: LTS with states including sidle and sgive_money, and transitions labelled ?Card, !AskPincode, ?Pincode, !Wrong, !KeepCard, !AskAmount, ?Amount, !Money, !NotEnough and !Card]

FIGURE 13.2 LTS of a cash dispenser.

Now let us look at a larger example: the cash dispenser. In Chapter 2 we presented two models of a cash dispenser. Figure 2.2 showed a very informal model showing the actions (as states) and the relative order of these actions. Figure 2.3 showed a flow graph of a cash dispenser which can be used for the implementation of a cash dispenser. In Chapter 12 (Example 12.3) we showed a (Mealy type) state machine, with informal specifications of the events, guards and actions expressed in natural language.

Figure 13.2 shows an LTS of a cash dispenser. The user of the cash dis-
penser can insert a bank card (input ?Card). After inserting the card, the
user is asked for the card’s PIN (output !AskPIN). If the PIN entered (in-
put ?PIN) is correct, then the user is asked how much money is required
(output !AskAmount). If the user has enough credit, the money is dis-
pensed (output !Money), otherwise a message is displayed that there is
not enough money in the account (output !NotEnough). In both cases, the
cash dispenser will return the card (output !Card). If the PIN is wrong,
the machine will permit one other attempt to enter the correct PIN (in-
put ?PIN). If the PIN is incorrect again, the cash dispenser will issue a
message about the PIN entered (output !Wrong) and the card will be kept
(output !KeepCard).


In Chapter 7 we discussed state abstractions to define the behaviour of a component. There, the states were explicit (e.g., empty/loaded/full for a Stack) but the actions were implicit. For LTSs this is the other way around: the actions are explicit and the states are often implicit.

EXERCISE 13.1
In this exercise you are asked to define an LTS for an abstract Stack ma-
chine. The Stack can hold a maximum of three elements. The Stack
accepts two stimuli: push and pop. We abstract from the actual values
that are being pushed to and popped from the Stack. The Stack ma-
chine has two responses: value and error. After a pop stimulus, the Stack
responds with a value label. If the Stack machine receives a pop stim-
ulus when there are no more elements, then it outputs an error label.
Similarly, when the Stack is full and the machine receives a push la-
bel, the Stack also outputs an error label. Draw an LTS for this Stack
machine.

Labelled transition systems constitute a powerful semantic model to
reason about the behaviour of reactive systems. However, except for
the most trivial systems, a representation by means of a state-transition
graph is usually not feasible [116]. In Section 13.4, we will introduce
AML, a modelling language whose semantics is defined in terms of LTSs.
AML allows us to model processes with hundreds of states and transitions.

13.2.1 Conformance

Having a formal semantics of a modelling language has an important
benefit: we can build a tool which translates a model to its state-transition
semantics. This is similar to the way a compiler of a programming
language translates a computer program to machine code which can be
executed. Subsequently, we can use the generated state-transition
system to automatically generate test cases.

A trace in an LTS is a sequence of labels representing the transitions that
the LTS can do, starting from the start state of the LTS. We denote a trace
by the list of labels enclosed in angle brackets (⟨⟩). Given an LTS L, the
set traces(L) is the set of all possible traces of L.

Consider Figure 13.1 again.

• In (a) there are three possible traces:
  ⟨⟩ (i.e. the empty trace),
  ⟨?button⟩, and
  ⟨?button, !tea⟩.
  So: traces(L) = {⟨⟩, ⟨?button⟩, ⟨?button, !tea⟩}.

• For (b), we have:
  traces(L) = {⟨⟩, ⟨?button⟩, ⟨?button, !coffee⟩, ⟨?button, !tea⟩}.

• For (c), the set traces(L) is infinite:
  traces(L) = {⟨⟩, ⟨?kick⟩, ⟨?button⟩, ⟨?button, !tea⟩, ⟨?button, !tea,
  ?kick⟩, ⟨?button, !tea, ?button⟩, ⟨?button, !tea, ?button, !tea⟩, . . . }.
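These trace sets can also be computed mechanically. Below is a small Python sketch (illustrative only; not part of AML or AMS) that enumerates the traces of an LTS up to a given depth, where the LTS is encoded as a set of (source, label, target) transitions and the state names follow Figure 13.1:

```python
def traces_up_to(transitions, start, depth):
    """Return all traces (tuples of labels) of an LTS up to the given depth."""
    result = {()}  # the empty trace is always a trace

    def explore(state, prefix):
        if len(prefix) == depth:
            return
        for (source, label, target) in transitions:
            if source == state:
                result.add(prefix + (label,))
                explore(target, prefix + (label,))

    explore(start, ())
    return result

# LTS (a) of Figure 13.1: s0 --?button--> s1 --!tea--> s2
lts_a = {("s0", "?button", "s1"), ("s1", "!tea", "s2")}
assert traces_up_to(lts_a, "s0", 5) == {(), ("?button",), ("?button", "!tea")}
```

For an infinite LTS such as Figure 13.1 (c), the depth bound is what keeps the enumeration finite, mirroring the depth-n test cases discussed later in this chapter.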

Chapter 13 State-transition models

We assume that the SUT can be modelled as an LTS and that the input
and output actions of the SUT are the same as specified in the L_I and L_O of
the LTS. In practice, though, the physical labels of the SUT have to be
translated to the logical labels of the model, and vice versa.

A test of the SUT is an experiment consisting of supplying stimuli to the
SUT and observing its responses. The specification of such an experiment,
including both the stimuli and the expected responses, is called
a test case. When we use an LTS model to generate the test cases, this is
frequently called conformance testing, since it involves assessing whether
a SUT conforms to the model. The process of applying test cases to the
implementation of the SUT is called test execution [116].

[Diagram: (a) an LTS; (b) a test case tree with !coffee, !tea and θ branches ending in pass/fail verdicts; (c) a test execution.]

FIGURE 13.3 LTS, a test case and a test execution.

Executing a test case results in an execution trace of the SUT, which either
corresponds to a trace of the LTS or not. Test execution may be success-
ful, meaning that the observed responses of the SUT correspond to the
expected responses of the test case. We say that the test passes and the
execution trace will correspond to a trace in the LTS. Test execution may
be unsuccessful: when executing the stimuli from the test case, we ob-
serve a response from the SUT that is not an expected response from the
test case. We say that the test fails. In this case the execution trace will
not correspond entirely to a trace in the LTS: the last observed response
does not correspond to a transition of the LTS.
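The pass/fail decision can thus be stated compactly: an execution trace passes exactly when it is a trace of the LTS. A minimal Python sketch (an illustration, assuming a deterministic LTS; the transition encoding and state names are ours, following Figure 13.3 (a)):

```python
def verdict(transitions, start, execution_trace):
    """Return 'pass' if the execution trace is a trace of the LTS, else 'fail'."""
    state = start
    for label in execution_trace:
        targets = [t for (s, l, t) in transitions if s == state and l == label]
        if not targets:
            return "fail"   # the observed label has no matching transition
        state = targets[0]  # deterministic LTS assumed for simplicity
    return "pass"

# LTS of Figure 13.3 (a): s0 --?button--> s1, s1 --!coffee--> s2, s1 --!tea--> s3
lts = {("s0", "?button", "s1"), ("s1", "!coffee", "s2"), ("s1", "!tea", "s3")}
assert verdict(lts, "s0", ["?button", "!tea"]) == "pass"
assert verdict(lts, "s0", ["?button", "!tea", "!tea"]) == "fail"  # second !tea
```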

Given an LTS, a test case can be represented by a tree, where each edge
is either a stimulus or a response, and ends with a verdict: pass or fail.
For example, recall the LTS in Figure 13.3 (a). Figure 13.3 (b) (taken
from [40]) is a test case which describes all traces for this LTS. The pass-
ing traces all start with a ?button stimulus, followed by either a !coffee
or !tea response, and then nothing more. Observing nothing – or the
absence of a response – is named quiescence. This is represented by θ:
it means that we do not observe any response from the SUT. The tran-
sition θ will be taken if none of the output responses can be observed.
Testing for quiescence is usually implemented as a timeout: if we do
not observe any response after a certain amount of time, we conclude
that no responses will be generated.
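The timeout implementation of quiescence can be sketched in a few lines of Python (an illustration; the queue stands in for the adapter's response channel, and the names are ours):

```python
import queue

THETA = "θ"  # quiescence: the absence of any response

def observe(responses, timeout_seconds):
    """Wait for a response from the SUT; report quiescence on timeout."""
    try:
        return responses.get(timeout=timeout_seconds)
    except queue.Empty:
        return THETA  # no response arrived in time: conclude quiescence

channel = queue.Queue()
assert observe(channel, 0.01) == THETA   # nothing was sent: quiescence
channel.put("!tea")
assert observe(channel, 0.01) == "!tea"  # a response arrived in time
```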


Figure 13.3 (c) shows a possible execution of the test case at the SUT:
⟨?button, !tea, !tea⟩. After a ?button stimulus, the SUT responds with !tea
and then !tea once again. The test execution is a trace in the test case of
Figure 13.3 (b), leading to a fail verdict, because the second !tea output
is not allowed in the LTS. In other words, the observed execution trace
is not a trace of the LTS, and the test fails. Note that if we were to execute
the test case again, we might observe another test execution, e.g.
⟨?button, !coffee, θ⟩, which represents a passing test execution.
Most reactive systems are supposed to work uninterrupted: they are
designed to never stop. The set of traces of such systems is clearly infi-
nite and we have already seen that it is impossible to define a test suite
of test cases containing all possible traces. We can still test the machine
up to a predefined depth n, though: we limit the test cases to that depth
of n labels. If a test execution still conforms to the test case after n steps,
we consider the test case passed up to depth n.

For industrial-sized systems the test case will often be too large due to
the large number of possible non-deterministic stimuli, even if we limit
the test case vertically to a predefined depth of n steps. To limit the test
case horizontally, we can leave out certain stimuli in (parts of) the
LTS. The choice of stimuli to be included in the test case is driven by the
intended coverage of the model.

EXERCISE 13.2
Consider the LTS of the Stack machine of Exercise 13.1. Because the
behaviour of Stack machine is infinite, we cannot define a complete
test case. Draw a test case for the Stack machine up to depth 4, i.e. 4
labels deep.

EXERCISE 13.3
The test case of Exercise 13.2 includes all traces of depth 4. Consequently,
it does not include the interesting trace ⟨?push, ?push, ?push,
?push, !error⟩ of length 5. Draw a test case for the Stack machine up to
depth 5, where a ?pop stimulus is only allowed after two ?push stimuli.

13.2.2 Coverage

As an LTS is essentially a graph consisting of vertices (states) and edges
(transitions), we can use the coverage criteria as defined in Chapter 12.
In the realm of ATCG we often use different names, though.
• State coverage (= vertex coverage from Definition 12.1): how many
states of the model are covered by the test suite?
• Transition coverage (= edge coverage from Definition 12.2): how many
transitions of the model have been covered by the test suite?
• Trace coverage: comparing the number of traces executed by the test
suite to the complete set of traces is not very informative: the total
number of different traces far exceeds the number of traces that can
be generated and executed in practice. On the other hand, (sub)paths
within the LTS can be of interest, for example via the n-switch coverage
(see Definition 12.4) of a test suite, which tries to include all
(sub)paths of length n (from the start state or from each state) in the
generated test suite.
• Data coverage: in the examples that we have seen so far, the labels
of the LTS did not carry any data: they were just abstract names for
actions of the system. In practice, though, we will usually add more
details to the labels, including data values. In Section 13.4 on AML
we will see examples of this. For such labels, we can use the same
approaches as we discussed in Chapters 6, 7 and 8 of this book, e.g.,
input domain modelling with equivalence classes or input domain
boundaries, to obtain and increase the data coverage.

Consider Figure 13.1 (c). If we never ?kick this tea machine, and only
push the ?button to get tea, both the state coverage and the transition
coverage of the LTS will be 66% as the transition labelled with ?kick and
state s2 will not be covered.
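The 66% figures can be checked with a short Python sketch (illustrative only; the LTS encoding and state names are assumptions based on Figure 13.1 (c)):

```python
def coverage(transitions, start, executed_traces):
    """Return (state coverage, transition coverage) for a set of executed traces."""
    states = {start}
    for (source, _, target) in transitions:
        states.update((source, target))
    visited_states, visited_transitions = {start}, set()
    for trace in executed_traces:
        state = start
        for label in trace:
            # assumes every executed trace is a valid trace of the LTS
            target = next(t for (s, l, t) in transitions
                          if s == state and l == label)
            visited_transitions.add((state, label, target))
            visited_states.add(target)
            state = target
    return (len(visited_states) / len(states),
            len(visited_transitions) / len(transitions))

# Figure 13.1 (c): s0 --?button--> s1, s1 --!tea--> s0, s0 --?kick--> s2
lts_c = {("s0", "?button", "s1"), ("s1", "!tea", "s0"), ("s0", "?kick", "s2")}
assert coverage(lts_c, "s0", [("?button", "!tea")]) == (2/3, 2/3)
```

Only pressing ?button covers two of the three states and two of the three transitions, i.e. roughly 66% each.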

Both state and transition coverage of the LTS model might not say much
about the coverage of the complete SUT. The model may be incomplete
in the sense that functionality of the SUT has been left out or certain
details have been abstracted away. From a dynamic point of view, coverage
of (sub)paths might be much more interesting than the static coverage
of states and transitions. Still, even with these reservations, state
and transition coverage should be as high as possible.

EXERCISE 13.4
Consider the LTS of the Stack machine of Exercise 13.1 again. Suppose
we would execute the following three tests:
• ⟨?push, ?pop, !value⟩
• ⟨?push, ?push, ?pop, !value, ?push⟩
• ⟨?push, ?push, ?push⟩
What is the state coverage of these three tests? What is the transition
coverage of these tests?

13.3 Test case generation from LTS

Automated Test Case Generation (ATCG)² is an automated testing technique
where we use a program to generate the test cases. The ATCG tool
accepts a model in a formal notation with a formal semantics, e.g. labelled
transition systems. In the tool, one or more coverage criteria can
typically be selected. The tool then automatically generates test cases
given the model and the selected coverage criteria.

Figure 13.4 gives an overview of an ATCG approach. Given the model,
the test generator generates a test suite of test cases. The test executor
then attempts to execute each test case on the SUT:
² In literature, the ATCG approach based on state-transition models is usually called
Model-Based Testing (MBT) or Conformance Testing. However, in this book we consider that
all software testing should be model-based (see Chapter 2), and not just the approach
using state-transition models. Therefore we deviate from literature here and use the term
ATCG in the context of testing state-transition models.


[Diagram: the model and coverage criteria are input to the Test Case Generator, which produces a suite of test cases; the Test Case Executor runs each test case through an adapter that exchanges stimuli and responses with the SUT, producing test case executions with pass/fail verdicts.]
FIGURE 13.4 Off-line Automated Test Case Generation.

• If the label in the trace is a stimulus, the adapter transforms the stim-
ulus to a physical label and offers the physical label to the SUT.
• If the label in the trace is a response, the test executor waits until it
receives the logical response from the adapter.

If the SUT returns a response which is not expected in the test case, it
concludes that the test has failed. If the test executor can execute a com-
plete trace of the test case against the SUT, it concludes that the test has
passed. Figure 13.4 is regarded as an off-line method: the set of test cases
is generated before the actual testing takes place at the SUT.
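The executor loop of Figure 13.4 can be sketched as follows in Python (illustrative only; the test case format and the FakeTeaMachine adapter/SUT stand-in are assumptions for this sketch, not AMS's actual interfaces):

```python
def execute_test_case(test_case, adapter):
    """Run one test case: a list of ('stimulus', label) / ('response', label) steps."""
    for kind, label in test_case:
        if kind == "stimulus":
            adapter.send(label)            # adapter turns this into a physical action
        else:
            observed = adapter.receive()   # wait for the SUT's (logical) response
            if observed != label:
                return "fail"              # unexpected response: the test has failed
    return "pass"                          # complete trace executed: the test passed

class FakeTeaMachine:
    """Stand-in for adapter + SUT: serves tea after a button press."""
    pressed = False
    def send(self, label):
        self.pressed = (label == "?button")
    def receive(self):
        return "!tea" if self.pressed else "θ"

assert execute_test_case([("stimulus", "?button"), ("response", "!tea")],
                         FakeTeaMachine()) == "pass"
assert execute_test_case([("stimulus", "?button"), ("response", "!coffee")],
                         FakeTeaMachine()) == "fail"
```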

[Diagram: the model and coverage criteria are input to the On-the-fly Tester, which exchanges stimuli and responses with the SUT through an adapter, directly producing test case executions with pass/fail verdicts.]
FIGURE 13.5 Online Automated Test Case Generation.

Figure 13.5 shows an alternative ATCG approach: online testing. Test
cases are generated on the fly, step by step, while testing the SUT. The
tester walks over the LTSs of the processes of the model, offering stimuli
to the adapter and observing the responses of the SUT. The tester
decides upon the next test-step after a stimulus or response (as opposed

to generating all test-steps before testing). Typically, state and transition
information is recorded during a test execution such that – e.g., for
a next test execution – the tester can decide what stimulus to choose to
increase either the state or transition coverage of the model, or both.
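An on-the-fly tester can be sketched as a guided walk over the model (an illustrative Python fragment; `sut_step` stands in for the adapter plus SUT, and the '?'/'!' prefixes distinguish stimuli from responses as in this chapter's notation):

```python
import random

def online_random_test(transitions, start, sut_step, n_steps, rng):
    """Pick enabled stimuli at random, offer them to the SUT and check
    each observed response against the model. Returns 'pass' or 'fail'."""
    state = start
    for _ in range(n_steps):
        stimuli = [(l, t) for (s, l, t) in transitions
                   if s == state and l.startswith("?")]
        if not stimuli:
            break                      # no stimulus enabled: stop this walk
        stimulus, state = rng.choice(stimuli)
        response = sut_step(stimulus)
        allowed = [t for (s, l, t) in transitions
                   if s == state and l == response]
        if not allowed:
            return "fail"              # response not allowed by the model here
        state = allowed[0]
    return "pass"

# model: press ?button, observe !tea, repeat (cf. Figure 13.1)
model = {("s0", "?button", "s1"), ("s1", "!tea", "s0")}
assert online_random_test(model, "s0", lambda s: "!tea", 10,
                          random.Random(1)) == "pass"
assert online_random_test(model, "s0", lambda s: "!coffee", 10,
                          random.Random(1)) == "fail"
```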

The biggest advantage of ATCG is its automated nature. After developing
a model of the SUT and the adapter to connect to the SUT, the actual
testing process is fully automated. Given coverage criteria, the ATCG
tool automatically generates test cases for the SUT. ATCG therefore al-
lows the automatic production of large and provably sound test suites
[116]. The smart coverage strategies (state, transition, n-switch, data)
of the ATCG-tool ensure that we can stop testing when a certain level of
coverage has been reached. Another important benefit of ATCG is that it
supports the development cycle of the SUT more naturally than manu-
ally constructed test suites. In the case of (evolutionary) changes to the
SUT , retesting the SUT with ATCG is relatively easy. In most cases only
the model has to be changed in accordance with the changes to the SUT,
and new test cases can be generated by the tool. No test scripts have to
be manually changed. This makes ATCG well-suited for iterative devel-
opment approaches like Agile and Scrum.

Crucial to test automation – and thus the ATCG approach – is the adapter,
which translates logical labels of the model to physical labels of the SUT
and vice versa. If we consider the coffee machine of our previous example,
the adapter has to translate the stimulus ?button to an actual press
on the button. Furthermore, the adapter has to observe the delivery
of !tea and !coffee (in a cup). Fortunately, even without ATCG, the
developers of a SUT need to test their system themselves. So usually the
manufacturer has provided a testing interface to the system to analyse
and diagnose the system. Such an interface can be used to connect the
adapter as well. Still, the development of an adapter can be expensive
and can be a considerable part of the ATCG effort.

ATCG in practice. The most important activity of software testing is
the development of the models which will be used for testing. This is
not different for the ATCG approach. Typically, several errors are al-
ready found during this modelling phase. The specification on which
the model is based (and which is used for the development of the SUT) is
often ambiguous and/or incomplete. To develop a precise model such
specification issues have to be fixed. Furthermore – for the online ATCG
approach using the Axini Modeling Suite – we have witnessed that al-
most all bugs in the SUT are found by random testing. That is, simply
walk over the model, randomly select the stimuli, and check whether
the observed responses conform to the model. When no more bugs are
found with random testing, we will switch to a testing strategy in which
the various coverage criteria are exploited to steer the test cases in order
to improve the test coverage. Furthermore, longer and deeper test cases
are generated. Remarkably, in practice, usually only a few more bugs
are found in this last phase.

So although coverage criteria are input for the ATCG approach, most
bugs are being found during modelling and random testing, which do
not use these coverage criteria.


13.4 Axini Modelling Language

Pure state-transition models are tedious to construct and maintain
by hand. Therefore, we typically use a modelling language which
has a higher abstraction level. The semantics of this high-level
modelling language is then defined in terms of a state-transition system. The
high-level modelling language thus has a formal semantics: for every
construct of the language it is formally defined to what state-transition
model it is mapped.

As an example of a state-transition modelling language, this section
describes the basics of the Axini Modelling Language (AML), the
modelling language of the Axini Modelling Suite (AMS)³, a state-of-the-art,
online ATCG tool. The semantics of AML models is defined upon Symbolic
Transition Systems (STS) [44], a data-extension of LTS. In this
book, however, we will mostly use the LTS part of AML.

AML can be used to describe the behaviour of reactive systems. In this
context, a reactive system is a (part of a) message-oriented system that
reacts with observable outputs to inputs received from the environment.
A model of the system describes the inputs that it should accept
and the outputs it may, or should, send. For each language construct of
AML a formal translation to a (part of an) LTS is defined. Each process
defined in an AML model is translated to a complete LTS, embodying
the behaviour of the process.

13.4.1 Model

A model is an abstract, high-level description of the SUT (recall Chapter 2).
A model describes the behaviour of the system, and specifies
what the system should do. Conversely, a computer program (i.e. the
software) prescribes the system, and dictates how the system works. The
model is a genuine description of the SUT; the test tool is responsible for
generating test cases from the model.

An AML model consists of the declaration of the external interfaces of
the system and the processes which together define the behaviour of
the SUT. Each process should be named with a unique string, so they
can be distinguished from each other.
1 process('main') {
2 # declarations of labels, variables
3 # behaviour of the process
4 }

In the example above (lines 2-3) we have used comments as placeholders
for the declaration of the labels and (optionally) variables, and for the
behaviour of the process. AML supports line comments which start with
a hash (#): everything after # until the end of the line will be ignored.

The process 'main' has no behaviour defined; the semantics of this pro-
cess in terms of an LTS is a single state with no outgoing transitions.

3 https://www.axini.com/


13.4.2 Communication: labels

A process should specify the external behaviour: the interactions of the
system with its environment. These interactions can be split into:
• stimuli: the inputs that the system can process, and
• responses: the outputs that the system can or should send.

The generic name for a stimulus or a response is label. This name reflects
the fact that the names of the stimuli and responses are used to label
transitions in the underlying transition system.

A channel represents the communication interface via which the action
is to be executed. A channel must be declared explicitly before actions
can be associated with it. We distinguish internal and external channels.
External channels are used for communication with the SUT. Internal
channels are used for communication between processes. We will not
use internal channels in this text.

Below is an example of the definition of an external channel 'extern'
and a process 'button-tea':

1 external 'extern'
2 process('button-tea') {
3 # declarations of labels, variables
4 timeout 10.0
5 stimulus 'button', on: 'extern'
6 response 'tea', on: 'extern'
7
8 # behaviour of the process
9 receive 'button'
10 send 'tea'
11 }

The process 'button-tea' can receive a 'button' stimulus (a press on
the button) through the external channel and subsequently sends a
'tea' response. The model is a description of the SUT, so the stimuli are the
inputs for the SUT and the responses are the outputs of the SUT.

Labels like 'button' and 'tea' have to be declared before they can
be used in the behavioural part of the process. The semantics of this
model coincides with the LTS of Figure 13.1 (a). As can be seen in
this example, a label has to be assigned to a specific channel.

Also note the timeout declaration at the start of the process. The test
tool needs to know how long it has to wait for responses to arrive. This
timeout 10.0 declaration specifies that the default waiting time for re-
sponses is 10.0 seconds in process 'button-tea'. If – when waiting
for the response 'tea' – the response 'tea' does not arrive in 10.0
seconds, the test tool will exit with the verdict fail: quiescence is ob-
served instead of 'tea'.

13.4.3 Non-deterministic choice

The choice construct allows for the specification of a (non-deterministic)
choice between multiple actions.


1 choice {
2 o { <alternative_1> }
3 o { <alternative_2> }
4 ...
5 o { <alternative_n> }
6 }

The alternatives of the choice construct are non-empty sequences of
other AML constructs (typically stimuli or responses). The options of
the choice construct are enclosed by curly brackets. A single option is
specified by a lower case o (from option) followed by a sequence of actions,
also enclosed by curly brackets. The placeholder <alternative_i>
stands for a sequence of AML constructs. In general, the choice construct
is used to model exclusive alternatives: when multiple alternatives are
possible, any one of them may be taken.

Note that a non-deterministic choice is a high-level construct that does
not have a counterpart in a programming language: a computer program
dictates precisely what instructions have to be executed, whereas
the choice construct specifies that multiple options are possible.

Below is an example for the LTS we have seen before in Figure 13.1 (b).
The process 'tea-or-coffee' first waits for 'button' to be pressed.
Then it non-deterministically sends a 'tea' or a 'coffee'. The behaviour
of the process 'tea-or-coffee' is captured by this LTS.

1 external 'extern'
2 process('tea-or-coffee') {
3 # declarations of labels, variables
4 timeout 10.0
5 channel('extern') {
6 stimulus 'button'
7 responses 'tea', 'coffee'
8 }
9
10 # behaviour of the process
11 receive 'button'
12 choice {
13 o { send 'tea' }
14 o { send 'coffee' }
15 }
16 }

We used some shorthand notation in this example. Instead of annotating
each label with an explicit channel using the on: annotation, we
defined the stimulus and responses using a channel declaration. Also
observe that two or more responses can be declared with the keyword
responses. Similarly, two or more stimuli can be declared with the
keyword stimuli.

Typically, the alternatives of a choice are all stimuli or all responses.
Mixing stimuli and responses in a choice seems natural from a modelling
point of view. A system might be able to accept a stimulus or
do some output. However, with respect to testing it is not clear what
this would mean. When should we do the stimulus? How long should
we wait for the response? And although the syntax of AML allows the


mixing of stimuli and responses in a choice, the tester will always give
precedence to the responses: it will wait to observe the responses, and
thus ignore the stimuli. The reason for this is subtle: if we were to
choose the stimulus and later observe the response, we do not know
whether the response is caused by the stimulus or whether the response
is just late.

13.4.4 Loop: repeat

The repeat statement can be used to specify that a certain sequence of
transitions can be repeated.
1 repeat {
2 o { <alternative_1> }
3 o { <alternative_2> }
4 ...
5 o { <alternative_n> }
6 }

The syntax and semantics of the repeat construct are similar to those
of the choice construct. The difference with the choice is that after an
alternative has been executed, the repeat construct is executed again.
The stop_repetition statement can be used to break out of the loop.

Below, the behaviour of the process 'tea-or-kick' is captured by the
LTS of Figure 13.1 (c).

1 external 'extern'
2 process('tea-or-kick') {
3 timeout 10.0
4 channel('extern') {
5 stimuli 'button', 'kick'
6 response 'tea'
7 }
8 repeat {
9 o { receive 'kick' ; stop_repetition }
10 o { receive 'button'; send 'tea' }
11 }
12 }

Again, apart from the repeat construct itself, we introduced some short-
hand notation. The actions in the alternatives of the repeat are written
on a single line. With this syntax the actions have to be separated by a
semi-colon (;).

13.4.5 States and goto

Within programming languages, the goto statement is often considered
harmful [41]. When modelling reactive systems, though, the use of
states and goto statements is natural: the behaviour of the SUT is
often specified as a state-transition system in the documentation of the
SUT.


AML supports named states and a goto construct to transfer control to
a named state. States are identified by a string. The following AML
snippet, for example, defines an infinite loop of 'ping' and 'pong'
messages.
1 state 'loop'
2 receive 'ping'
3 send 'pong'
4 goto 'loop'

The previous 'tea-or-kick' process can be rewritten using a named
state, a choice and a goto statement:
1 external 'extern'
2 process('tea-or-kick-goto') {
3 timeout 10.0
4 channel('extern') {
5 stimuli 'button', 'kick'
6 response 'tea'
7 }
8
9 state 'loop'
10 choice {
11 o { receive 'kick' }
12 o { receive 'button'; send 'tea'; goto 'loop' }
13 }
14 }

After the stimulus 'kick', there is no need to explicitly jump to the end
of the process: after the choice, the process has ended anyway.

For the larger example of the ATM, or cash dispenser, from Figure 13.2,
and given the AML constructs that we have seen so far, we can now
make the following AML model:
1 external 'extern'
2 process('cash-dispenser') {
3 timeout 10.0
4 channel('extern') {
5 stimuli 'Card', 'Pincode', 'Amount'
6 responses 'AskPincode', 'AskAmount', 'Wrong', 'Money',
7 'NotEnough', 'Card', 'KeepCard'
8 }
9
10 state 'idle'
11 receive 'Card'
12 send 'AskPincode'
13 receive 'Pincode'
14
15 choice {
16 o { send 'AskAmount'; goto 'give money' }
17 o {
18 send 'Wrong'
19 receive 'Pincode'
20 choice {
21 o { send 'AskAmount'; goto 'give money' }
22 o { send 'Wrong'; send 'KeepCard'; goto 'idle' }
23 }
24 }
25 }
26
27 state 'give money'


28 receive 'Amount'
29 choice {
30 o { send 'Money' }
31 o { send 'NotEnough' }
32 }
33 send 'Card'; goto 'idle'
34 }

EXERCISE 13.5
Consider the Stack machine of Exercise 13.1 again. Write an AML model
for this Stack machine which uses named states and goto statements.

13.4.6 Data: parameters

So far, we have only seen models where the observable labels were ab-
stract names. In practice, however, stimuli and responses usually carry
data. AML supports data through parameters on labels. The names
of parameters are strings and the types can be simple types (:integer,
:string, :boolean, :decimal) or structured types (lists, structs, hashes).
In this text, we will only use simple types. The names and types of label
parameters are specified in a list enclosed by curly brackets and have to
be specified with the definition of the label.

For example, consider the following AML model where we have to in-
sert coins to get tea.
1 external 'extern'
2 process('coin-tea-parameters') {
3 timeout 10.0
4 channel('extern') {
5 stimulus 'coin', {'value' => :integer}
6 response 'tea', {'volume' => :integer}
7 }
8
9 repeat {
10 o {
11 receive 'coin', constraint: 'value == 50'
12 send 'tea', constraint: 'volume == 200'
13 }
14 o {
15 receive 'coin', constraint: 'value == 100'
16 send 'tea', constraint: 'volume == 300'
17 }
18 }
19 }

The stimulus 'coin' (line 5) has a parameter 'value' of type :integer,
denoting the worth of the coin. The response 'tea' (line 6) has a
parameter 'volume' of type :integer, expressing the volume of the tea
delivered to the user. For a stimulus (like 'coin'), we now should specify
the values of the parameters. In the example above, in lines 11 and 15
we constrain the 'value' to an exact amount, i.e., 50 and 100 respectively. For
a response (like 'tea'), we can constrain the value of the parameters
that the SUT should return (lines 12 and 16). As a result, the generated


test case will check the value of the parameters as offered by the SUT.
In the example above, we check that the 'volume' is 200 (line 12) or
300 (line 16). Besides equality, AML also supports relational and logical
operators to constrain the parameters, e.g.,
1 receive 'coin', constraint: 'value >= 50 && value < 100'

Here we specify that the 'value' of the 'coin' has to be at least 50,
but smaller than 100.
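Such constraints behave like ordinary boolean expressions over the label's parameters. The following hypothetical Python checker is an illustration of that semantics only (AML/AMS evaluate constraints internally); it merely rewrites '&&' and '||' to Python's operators:

```python
def check_constraint(constraint, parameters):
    """Evaluate an AML-style constraint string against parameter values."""
    expression = constraint.replace("&&", " and ").replace("||", " or ")
    return bool(eval(expression, {}, dict(parameters)))

assert check_constraint("value == 50", {"value": 50})
assert check_constraint("value >= 50 && value < 100", {"value": 99})
assert not check_constraint("value >= 50 && value < 100", {"value": 100})
```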

FIGURE 13.6 LTS of the AML process 'coin-tea-parameters'.

Figure 13.6 shows the LTS for the process 'coin-tea-parameters', as
generated by AMS, the tool suite for modelling in AML.

13.4.7 State variables

We take the previous example one step further. We add a 'stop' button
stimulus to the machine which stops the coin-tea loop. After pressing
'stop', the following response will be shown on the 'display' of the
machine: the total value of the coins inserted.

To model this, we introduce another feature of AML: state variables. State
variables can be used to store information. In the following example,
we will use the state variable 'total' of type :integer to store the
total value of the coins inserted.
1 external 'extern'
2 process('coin-tea-total') {
3 timeout 10.0
4 channel('extern') {
5 stimulus 'coin', {'value' => :integer}
6 stimulus 'stop'
7 response 'tea', {'volume' => :integer}
8 response 'display', {'number' => :integer}
9 }
10 var 'total', :integer, 0
11
12 repeat {
13 o {
14 receive 'coin', constraint: 'value == 50',
15 update: 'total = total + value'
16 send 'tea', constraint: 'volume == 200'
17 }
18 o {

19 receive 'coin', constraint: 'value == 100',
20 update: 'total = total + value'
21 send 'tea', constraint: 'volume == 300'
22 }
23 o { receive 'stop'; stop_repetition }
24 }
25 send 'display', constraint: 'number == total'
26 }

FIGURE 13.7 LTS of the AML process 'coin-tea-total'.

First, stimulus 'stop' and response 'display' are added on lines 6
and 8 respectively. The state variable 'total' is initialised to 0 on
line 10. State variables can be updated in an update: clause of a label
(lines 15 and 20). Note that for the response 'display' the parameter
'number' is checked against the state variable 'total', which has been
updated during the test run. If the SUT were to send a response with an
incorrect number, the test case would fail.
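The expected behaviour encoded by this model can be simulated directly. Below is an illustrative Python sketch (not generated by AMS); the coin values and tea volumes come from the model above, and pressing 'stop' is taken to be implicit at the end of the coin list:

```python
def run_coin_tea_total(coins):
    """Simulate 'coin-tea-total': tea per coin, then display the running total."""
    total, responses = 0, []
    for value in coins:
        assert value in (50, 100), "the model only accepts coins of 50 and 100"
        total += value                     # update: 'total = total + value'
        volume = 200 if value == 50 else 300
        responses.append(("tea", volume))  # constraint on 'volume'
    responses.append(("display", total))   # constraint: 'number == total'
    return responses

# three coins: one small and two medium cups of tea, total 250
assert run_coin_tea_total([50, 100, 100]) == [
    ("tea", 200), ("tea", 300), ("tea", 300), ("display", 250)]
```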

Figure 13.7 shows the LTS for the process 'coin-tea-total'. Figure 13.8
shows a sample (passing) trace of the process 'coin-tea-total' by
running AMS’s EXPLORER on the model: three coins have been inserted
to deliver one small and two medium cups of tea.

EXERCISE 13.6
Consider the Stack machine of Exercise 13.1 again. In Exercise 13.5 you
developed an AML model which uses named states and explicit goto
statements to model the Stack machine. Rewrite your AML model such
that it no longer uses named states and goto statements but instead
uses a repeat statement and a state variable to count the number of
values on the Stack.

Note that the underlying LTSs of the two AML models will be different,
and hence the number of states and transitions will be different as well.
This also means that, given a SUT and the same test execution, the state
and transition coverage for the two AML models could be different.


FIGURE 13.8 Explored trace of the AML process 'coin-tea-total'.

13.4.8 Advanced features

In the previous sections, we have seen an introduction to AML. The
concepts that we have seen are enough to start making simple AML models,
with which you can start practising on the practical problem at the
end of this chapter. But there is a lot more to explore for which we do
not have time in this course. Several advanced and useful aspects of
AML have not been discussed. Just to give you a taste, we list some of
them below:
• structured data types for parameters and variables: lists, structures,
and hashes;
• the inclusion of model parts to structure the model;
• behaviour definitions to encapsulate common behaviour;
• internal communication between processes to pass information be-
tween processes;
• internal transitions to update state variables;
• urgent transitions to enforce internal communication;
• function definitions for complex computations with variables;
• the use of the Ruby programming language as pre-processor to struc-
ture the model.

The interested reader is referred to the AML tutorial, which is available
from within AMS.

Chapter contents 14

Test management and test process improvement

Overview 251

1 Introduction 252
2 Planning: the Master Test Plan (MTP) 252
2.1 The risk analysis: why do we test? 254
2.2 Test strategy: what and how will we test? 256
2.3 Organisation: who will test and where? 256
2.4 Time schedule and budget 257
3 Monitoring: progress through defect tracking 257
4 Reporting: about the bugs 257
5 Controlling and adjusting: Test Process Improvement (TPI) 258

Chapter 14

Test management and test process improvement

OVERVIEW

There is much more to software testing than designing test cases. Testing
software is an engineering activity, and like any engineering project,
it should be managed using well-established test project management
processes. Test project management, or just test management, is all
about these processes.

In this chapter, we look into some aspects of test management processes,
just to give you a flavour. Moreover, since any type of management
activity goes hand in hand with continuous improvement processes, we
will briefly touch upon these too. We refer you to more extensive
books and texts for a complete overview.

Many of the things mentioned in this chapter might sound logical, or
even obvious if you already know about project management. Still, you
will probably only really understand and appreciate these after
participating in several software testing projects in practice. However,
that is an experience we cannot offer you; we have to settle for merely
bringing these aspects to your attention.

LEARNING GOALS
After studying this chapter, you are expected to:
– know that test management consists of many different activities
that are needed for the planning, monitoring, controlling, reporting
and adjusting of test activities
– know the high-level contents of a test plan.
– understand how simple defect data can be used to monitor the
progress of testing
– explain why the bug reporting process is critical
– be able to list some properties of good bug reports
– name some of the most famous Test Process Improvement (TPI)
models.

This chapter contains reading assignments that involve reading the
following papers, the links to which you can find on the course site:


[16] R. Black.
Charting the progress of system development using defect data.
12th International Software Quality Week, 24-28 May 1999.

[17] R. Black.
The bug reporting processes.
In Journal of Software Testing Professionals, 2000.

CONTENTS

14.1 Introduction

Test management consists of the planning, monitoring, controlling,
reporting and adjusting of test activities (ISO/IEC/IEEE 29119 series of
software testing standards1).

When we talk about or google test management methodologies, we almost
immediately find as many different methods as there are testing
consultancy companies. Many books elaborate at length on various
aspects of test management processes. For example, TMAP2 and TMAP
Next3 [97, 67, 109], TestFrame [29, 104], TestGrip [75], SmarTest [22] and
STEP [37].

We will not treat each one of those here in detail; that is beyond the
scope of this course. In this chapter, we give an overview of the tasks
and processes that make up the planning, monitoring, controlling,
reporting and adjusting. As a vehicle we have made the mind map in
Figure 14.1.

For two specific subjects (monitoring progress through defect tracking and
reporting defects and bugs) we have included reading assignments covering
the two aforementioned articles by Rex Black. Although these were written
at the turn of the century, they describe insights and good practices that
are still widely needed and used today.

14.2 Planning: the Master Test Plan (MTP)

The Master Test Plan (MTP) is a document that describes in detail how
the testing is planned and how it will be managed across the entire
test project. Although agile contexts write test plans with less detail,
it is still important to have an overview of the why, what, how, who,
where, when and how much.

Many templates can be found on the internet. The most well-known are
offered by:

1 https://www.softwaretestingstandard.org/
2 TMAP ®is a registered trademark of Sogeti.
3 TMAP ®Next is a registered trademark of Sogeti.


FIGURE 14.1 Mind map about test management


• Standards like ISO/IEC/IEEE 29119-34. This is a new standard that
supersedes the well-known IEEE 829 Standard for Software and System
Test Documentation. These standards are used, for example, in
the training syllabus of ISTQB5 (International Software Testing
Qualifications Board) [15], a non-profit software testing qualification
organisation that operates internationally. STEP [37] is also based on
this standard.
• SOGETI. They offer a template [39] that lays the foundation of the
TMAP Next [67] approach.

14.2.1 The risk analysis: why do we test?

The why section is probably the most important one of the MTP. It describes
the reasons why we are planning all these test activities the way we are.
Basically, there is no reason for testing, nor the need to make an MTP, if
there is no risk [119, 67, 104]:

no risk ⇒ no test

The importance of a test plan increases exponentially with the
increasing complexity (and hence risk) of the test project (i.e. size of
the SUT, number of stakeholders, number of interfaces, communication,
availability of information, et cetera) [37].

To clarify the risks we need to do a risk analysis, to determine:

project risks: these are circumstances or events that could potentially
harm the time-line, performance or budget of the project. These risks
originate from the environment in which the project is executed. The
environment consists of:
• The clients, users and other stakeholders. For example6 :
– Do we know them well?
– Is communication frequent and going well?
– Do they have expectations regarding the testing?
– Is it clear what they need and want?
– Will they change the requirements often?
– How many different types of users are there?

• Information we have about the project:
– Are the requirements of the system clear? (functional and
non-functional)
– Do we know the scope of testing?
– Are there any test acceptance criteria?
– What other information or documentation do we have to guide
testing?
– Do tests already exist? Can we re-use?
– How do we get informed about new information?

4 https://www.softwaretestingstandard.org/
5 https://www.istqb.org/
6 For more examples, [96] contains an extensive checklist for project risks.


FIGURE 14.2 Example of MoSCoW priority chart for testing from [96]

• The planning and the budget:
– Do we have tight deadlines?
– Do we have limited budget?
– Do we get all information on time?
– Are the necessary resources available (people, hardware, tools, et
cetera)?

product risks: these are circumstances or events that could potentially
cause the system or software to fail to satisfy or fulfil some reasonable
expectation of the customer, user, or stakeholder. Possible things we
can think about are:
• Necessary integration with third-party components and the environment;
• Number of interfaces with other systems (internal or external);
• The use of new technologies that nobody in the team was familiar
with;
• Parts of the system that have been optimised, for example to improve
performance;
• Parts of the system that were made under time pressure;
• Parts of the system that were made by new or junior developers;
• New requirements added at the last moment.

During a risk analysis we need to identify the risks and for each risk
consider two aspects:
• likelihood: the probability that the circumstances or events occur;
• impact: the relative importance of the consequences.
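Once both aspects have been estimated, risks can be ranked by their exposure (likelihood × impact) so that testing effort flows to the riskiest items first. The sketch below is our own illustration, with made-up risks and 1–5 scales that are not taken from any of the cited methods:

```python
# Illustrative sketch: rank identified risks by exposure = likelihood x
# impact, both estimated on a 1-5 scale. Risks and scales are hypothetical.

risks = [
    ("requirements change often",       4, 3),  # (name, likelihood, impact)
    ("third-party payment integration", 2, 5),
    ("part built under time pressure",  3, 4),
    ("junior developer wrote module X", 3, 2),
]

# Sort descending by exposure so the riskiest items are tested first.
ranked = sorted(risks, key=lambda r: r[1] * r[2], reverse=True)
for name, likelihood, impact in ranked:
    print(f"exposure {likelihood * impact:2d}  {name}")
```

A real risk analysis would of course derive these numbers from the brainstorming sessions with stakeholders described below, not hard-code them.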

Several of the above-mentioned methodologies [96, 67] apply the MoSCoW
method for testing. MoSCoW was developed by Dai Clegg of Oracle UK
in 1994 to explain prioritisation. The term MoSCoW itself is an acronym
derived from the first letters of the four prioritisation categories
(Must, Should, Could, and Won’t). For testing it has been adapted
in [96] to the example given in Figure 14.2. This model can be adapted
to different types of software and clients.
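The idea of such an adaptation can be sketched as a simple mapping from risk exposure to a MoSCoW-style test priority. The thresholds below are our own illustration, not the chart from [96]:

```python
# Hypothetical sketch: map risk exposure (likelihood x impact, each on a
# 1-5 scale) onto MoSCoW-style test priorities. Thresholds are invented
# for illustration and should be calibrated per project and client.

def moscow_priority(likelihood, impact):
    exposure = likelihood * impact          # 1..25
    if exposure >= 16:
        return "Must test"
    if exposure >= 9:
        return "Should test"
    if exposure >= 4:
        return "Could test"
    return "Won't test (this time)"

print(moscow_priority(5, 4))  # Must test
print(moscow_priority(2, 2))  # Could test
```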


Again, numerous books discuss how to do risk analysis specifically for
testing [37, 18, 67, 104] and for project management in general. Moreover,
tools exist to help. Basically, the process comes down to doing
brainstorming sessions with as many different types of stakeholders
involved as possible. Books about brainstorming can also fill an entire
bookcase.

14.2.2 Test strategy: what and how will we test?

The test strategy describes what we need to test and how. Evidently,
our goal is to test the system that is being developed in some
software development project. But what are the properties and
characteristics of this system, both functional and non-functional
(usability, accessibility, security, performance, et cetera)? At what
levels do we need to test (unit, integration, system, acceptance), and
do we need to test the same for all subsystems? Do we only need to test
the software? Or also other accompanying artefacts (user documentation,
installation guides, et cetera)? And what about the hardware?

A test strategy depends on the software development methods and/or
methodologies used. Think of agile methodologies like Scrum, Adaptive
Software Development and the Dynamic Systems Development Method
(DSDM). Or more traditional structured methods like Waterfall,
Incremental, Spiral, et cetera. Or even Test Driven Development
(TDD), eXtreme Programming, et cetera. Each of those can influence
the best test strategy.

14.2.3 Organisation: who will test and where?

The test organisation is about the people, their tasks, their skills,
experience, education, knowledge, et cetera. Do they have the right
technical as well as business knowledge? Do we need additional training?

Organisation is also about the environment in which they work; the
office environment as well as the testing environment. While planning, we
might need to think of questions like: Do we have the right hardware to
execute the tests? Can we execute several virtual machines at the same
time to test in parallel? Do we have a stable enough internet connection
to do the stress tests? Do we need any peripherals for the testing? Do
we need to simulate the system?

It is also about the team structure and the responsibilities and tasks
of the different roles in a test team: test manager, test analyst, test
consultant, team leader, et cetera. Every company has its own roles and
naming for a test team. This is again related to the team organisation.
For example, we can have a line organisation, a functional organisation,
or a project organisation, each with their own advantages and
disadvantages, some of which are described specifically for testing in
the TMAP book [97].


14.2.4 Time schedule and budget

Again, this is not different from any other project. To learn more about
scheduling and budgeting, just pick up any one of your favourite project
management books.

14.3 Monitoring: progress through defect tracking

The article [16] shows how simple means may provide insight into the
testing and development process. These “simple means” are charts that
can be drawn from basic logging information gathered during the testing
process. The first two charts that are described are not immediately
obvious, while the last two are simple but useful. The author also
describes conclusions that may be drawn from these charts, with the
precaution that you always have to be careful: sometimes things are not
what they seem. A chart that looks good does not guarantee that all is
well. However, a chart that looks strange is definitely a reason to take
a closer look.
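To give an impression of where such charts come from, the sketch below derives an "open defects per day" curve from a tiny hypothetical defect log. The log format and dates are our own illustration, not the case study in [16]:

```python
# Sketch (hypothetical defect log): compute the number of open defects
# on each day, the raw data behind typical defect-tracking charts.
from datetime import date

# (bug id, date opened, date closed or None if still open)
log = [
    (1, date(2021, 3, 1), date(2021, 3, 4)),
    (2, date(2021, 3, 2), None),
    (3, date(2021, 3, 3), date(2021, 3, 5)),
    (4, date(2021, 3, 5), None),
]

def open_bugs_on(day):
    """Defects opened on or before `day` and not yet closed on `day`."""
    return sum(1 for _, opened, closed in log
               if opened <= day and (closed is None or closed > day))

for d in range(1, 6):
    day = date(2021, 3, d)
    print(day, open_bugs_on(day))
```

Plotting this count over time, next to the cumulative numbers of opened and closed defects, yields the kind of trend charts the article discusses.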
Reading assignment: Read the article [16]. You may skip the para-
graphs that describe how to get it to work in Excel.

14.4 Reporting: about the bugs

The second article, [17], uses the same case study as the first, and
actually refers to the first article at some point (page 12, line -18; at
least, we think that [16] fits the description of that “last article”
fairly well).

If you have no experience with large test projects, it may be just too
much information when reading this article for the first time. In that
case, we suggest leaving it for a couple of days (or weeks) and then
reading it again.

The article [17] contains some typos. We mention one that may cause
confusion:
page 3, line -17 should read ‘... e.g., I could disambiguate this article
by using the alternative phrase “be clear” instead of disambiguate.’ By
“this article” the author means the current item in the numbered list.

Reading assignment: Read the article [17].

We expect you to have spotted all kinds of references to things
described in earlier chapters of this course. We mention a few: scripted
testing versus exploratory testing, Beizer’s phases of maturity, and the
error – fault – failure terminology (is it consistent with Chapter 1?).

You may also have noticed that the tester in the case study is doing
exploratory testing on the fly, much like James Bach does in Chapter 4.


FIGURE 14.3 TMMi model taken from [43].

14.5 Controlling and adjusting: Test Process Improvement (TPI)

Any type of management activity goes hand in hand with continuous
improvement. At a very abstract and high level there is the Deming
Cycle, or PDCA Cycle, a continuous improvement model consisting of
a logical sequence of four repetitive steps for continuous improvement
and learning: Plan, Do, Check and Act. The cycle is also known as the
Deming wheel or the spiral of continuous improvement.

An early version of the PDCA cycle was suggested by Walter A. Shewhart
in the 1920s. He introduced the steps PLAN, DO and SEE. W. Edwards
Deming modified this into the cycle we know today: PLAN, DO, CHECK
and ACT.

In software engineering, there are various Software Process Improvement
(SPI) models. You may have heard of Six Sigma [105], Software
Process Improvement and Capability Determination (SPICE) [121] or
Capability Maturity Model Integration (CMMI) [33]. All of these are
defined as a sequence of tasks, tools, and techniques to Plan, Do, Check
and Act on improvement activities to achieve specific goals such as
increasing development speed, achieving higher product quality, reducing
costs, et cetera.

For testing, there are also improvement models that make it easier to
Plan, Do, Check and Act on specific test processes. We mention the two
that are the most well known. There is TMMi (Test Maturity Model
integration), which is described in [28] and online [43]. And there is
TPI7 (Test Process Improvement) [66], which has more recently been
upgraded to TPI Next8 [120].
7 TPI ®is a registered trademark of Sogeti.
8 TPI ®Next is a registered trademark of Sogeti.


TMMi is a staged model for test process improvement (see Figure 14.3).
That means it contains levels of maturity through which an organisation
passes while its testing processes evolve from ad hoc and unmanaged
to managed, defined, measured, and optimising. The model describes
the different maturity levels and how you can advance from one to the
next.

TPI is a continuous model for test process improvement. This means
that an organisation does not pass from one maturity level to another.
Instead, the model contains 16 key areas relevant for testing. Maturity
is defined per key area and is expressed by means of the levels
Controlled, Efficient and Optimising. Each level builds on the previous
one. The model provides a series of checkpoints that describe when a
practice reaches a particular level of maturity. The model comes with a
maturity matrix that helps in visualising the progress and summarising
the key areas. You can download the matrix as an Excel file from the
website [110].


Bibliography

[1] ISO/IEC/IEEE 29119 software testing - international standards for software testing. https://softwaretestingstandard.org/. [Online; accessed 19-09-2019].

[2] Stop 29119. https://www.ipetitions.com/petition/stop29119. [Online; accessed 19-09-2019].

[3] ISO/IEC 25000. SQuaRE (system and software quality requirements and evaluation). https://iso25000.com/index.php/en/iso-25000-standards and https://www.iso.org/standard/64764.html, 2014. [Online; accessed 24-10-2019].

[4] Paul Ammann and Jeff Offutt. Introduction to software testing. Cambridge University Press, 2017.

[5] James H. Andrews, Lionel C. Briand, and Yvan Labiche. Is mutation an appropriate tool for testing experiments? In Proceedings of the 27th International Conference on Software Engineering, pages 402–411. ACM, 2005.

[6] G. Antoniol, L. C. Briand, M. Di Penta, and Y. Labiche. A case study using the round-trip strategy for state-based class testing. In Proceedings of the 13th International Symposium on Software Reliability Engineering, ISSRE ’02, pages 269–, Washington, DC, USA, 2002. IEEE Computer Society.

[7] James Bach. Heuristic test strategy model, v5.7.2. http://www.satisfice.com/tools/htsm.pdf, 2019. [Online; accessed 22-9-2019].

[8] J.M. Bach. Secrets of a Buccaneer-Scholar: How Self-Education and the Pursuit of Passion Can Lead to a Lifetime of Success. Scribner, 2009.

[9] E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo. The oracle problem in software testing: A survey. IEEE Transactions on Software Engineering, 41(5):507–525, May 2015.

[10] BBC future. Rethinking Equivalence Class Partitioning. https://www.satisfice.com/blog/archives/1669, 2016. [Online; accessed 1-2-2019].

[11] Boris Beizer. Software testing techniques. Van Nostrand Reinhold, 2nd edition, 1990.

[12] Boris Beizer. Software testing techniques. Van Nostrand Reinhold, 2nd edition, 1990.

[13] Boris Beizer. Black-box testing - techniques for functional testing of software and systems. Wiley, 1995.

[14] R.V. Binder. Testing Object-oriented Systems: Models, Patterns, and Tools. Addison-Wesley object technology series. Addison-Wesley, 2000.

[15] R. Black, L. Van Der Aalst, and J.L. Rommens. The Expert Test
Manager: Guide to the ISTQB Expert Level Certification. Rocky
Nook, 2017.

[16] Rex Black. Charting the progress of system development using defect data. 1999.

[17] Rex Black. The bug reporting processes. Journal of Software Testing Professionals, 2000.

[18] Rex Black. Pragmatic Software Testing: Becoming an Effective and Efficient Test Professional. Wiley, 2007.

[19] B.W. Boehm. Characteristics of software quality. TRW series of software technology. North-Holland Pub. Co., 1978.

[20] Michael Bolton. Elemental models. Better Software, 2005.

[21] Boris Beizer and Otto Vinter. Bug taxonomy and statistics. Technical report, Software Engineering Mentor, 2630, http://ottovinter.dk/Finalrp3.doc, 2001. [Online; accessed 27-2-2018].

[22] E. Bouman. SmarTEST, Slim testen van Informatiesystemen. Academic Service, 2008.

[23] George E. P. Box. Science and statistics. Journal of the American Statistical Association, 71(356):791–799, 1976.

[24] George E. P. Box and Norman R. Draper. Empirical Model-building and Response Surface. John Wiley & Sons, Inc., New York, NY, USA, 1986.

[25] G.E.P. Box. Robustness in the strategy of scientific model building. In Robert L. Launer and Graham N. Wilkinson, editors, Robustness in Statistics, pages 201–236. Academic Press, 1979.

[26] British Computer Society Specialist Interest Group in Software Testing (BCS SIGIST). Standard for software component testing, working draft 3.4. http://www.testingstandards.co.uk/ComponentTesting.pdf, 2001. [Online; accessed 12-12-2017].

[27] Rene Bryce and Charles J. Colbourn. Prioritized interaction testing for pairwise coverage with seeding and avoids. 48:960–970, October 2006.

[28] I. Burnstein. Practical Software Testing: A Process-Oriented Approach. Springer Professional Computing. Springer New York, 2006.

[29] H. Buwalda, D. Janssen, I. Pinkster, and P. Watters. Integrated Test Design and Automation: Using the Test Frame Method. Addison-Wesley, 2002.

[30] T. Buzan and B. Buzan. The Mind Map Book. Mind set. BBC Active,
2006.


[31] Cem Kaner, Jack Falk, and Hung Q. Nguyen. Testing Computer Software. Wiley, 1999.
[32] T. S. Chow. Testing software design modeled by finite-state ma-
chines. IEEE Transactions on Software Engineering, SE-4(3):178–187,
May 1978.
[33] M.B. Chrissis, M. Konrad, and S. Shrum. CMMI for Development:
Guidelines for Process Integration and Product Improvement. SEI Se-
ries in Software Engineering. Pearson Education, 2011.
[34] Ross Collard. Appendix 1: Analyzing the triangle prob-
lem. http://www.testingeducation.org/conference/
wtst3_collard5.pdf, 2004. [Online; accessed 4-8-2018].
[35] Ross Collard. Exercise: Analyzing the triangle prob-
lem. http://www.testingeducation.org/conference/
wtst3_collard4.pdf, 2004. [Online; accessed 4-8-2018].
[36] Lee Copeland. A Practitioner’s Guide to Software Test Design. Soft-
ware Testing. Artech House, 2004.
[37] Rick D. Craig and Stefan P. Jaskiel. Systematic Soft-
ware Testing. Artech House, Inc. online available here:
https://flylib.com/books/en/2.174.1/, Norwood, MA, USA,
2002.
[38] Daily Mail UK. Up to 300,000 heart patients may have been
given wrong drugs or advice due to major NHS IT blun-
der. http://www.dailymail.co.uk/health/article-
3585149/Up-300-000-heart-patients-given-wrong-
drugs-advice-major-NHS-blunder.html, 2016. [Online;
accessed 4-2-2017].
[39] G. de Vries and E. Roodenrijs. Template master test plan. http:
//www.tmap.net/sites/default/files/Template_
Master_Test_Plan_TMap_NEXT_v2_1%20%281%29.doc,
2019. [Online; accessed 03-06-2019].
[40] René G. de Vries and Jan Tretmans. On-the-fly conformance test-
ing using SPIN. STTT, 2(4):382–393, 2000.
[41] Edsger W. Dijkstra. Letters to the editor: go to statement consid-
ered harmful. Communications of the ACM, 11(3):147–148, 1968.
[42] A. G. Duncan and J. S. Hutchison. Using attributed grammars to
test designs and implementations. In Proceedings of the 5th Interna-
tional Conference on Software Engineering, ICSE ’81, pages 170–178,
Piscataway, NJ, USA, 1981. IEEE Press.
[43] TMMi foundation. Tmmi. https://www.tmmi.org/, 2019.
[Online; accessed 03-06-2019].
[44] Lars Frantzen, Jan Tretmans, and Tim A. C. Willemse. Test gen-
eration based on symbolic specifications. In Jens Grabowski and
Brian Nielsen, editors, Formal Approaches to Software Testing, 4th
International Workshop, FATES 2004, Linz, Austria, September 21,
2004, Revised Selected Papers, volume 3395 of Lecture Notes in Com-
puter Science, pages 1–15. Springer, 2005.

[45] R. S. Freedman. Testability of software components. IEEE Trans-
actions on Software Engineering, 17(6):553–564, June 1991.
[46] De Gelderlander. Foutje: 104-jarige Zweedse mag naar kleuter-
school. http://www.gelderlander.nl/bizar/foutje-
104-jarige-zweedse-mag-naar-kleuterschool~
ab175865/, 2016. [Online; accessed 4-2-2017].
[47] Gerald M. Weinberg. Perfect Software and other illusions about testing. Dorset House, 2008.
[48] Gerald M. Weinberg. Errors: Bugs, Boo-boos, Blunders. Leanpub, 2015.
[49] John B. Goodenough and Susan L. Gerhart. Toward a theory of
test data selection. SIGPLAN Not., 10(6):493–510, April 1975.
[50] John B. Goodenough and Susan L. Gerhart. Toward a theory of
test data selection. In Proceedings of the International Conference on
Reliable Software, pages 493–510, New York, NY, USA, 1975. ACM.
[51] Mats Grindal, Jeff Offutt, and Sten F. Andler. Combination testing
strategies: a survey. Software Testing, Verification and Reliability,
15(3):167–199, 2005.
[52] Matthias Grochtmann and Klaus Grimm. Classification trees
for partition testing. Software Testing, Verification and Reliability,
3(2):63–82.
[53] William E. Howden. Theoretical and empirical studies of pro-
gram testing. In Proceedings of the 3rd International Conference on
Software Engineering, ICSE ’78, pages 305–311, Piscataway, NJ,
USA, 1978. IEEE Press.
[54] Michael Hunter. You are not done yet-checklist.
https://www.thebraidytester.com/downloads/
YouAreNotDoneYet.pdf, 2010. [Online; accessed 03-06-
2019].
[55] Bingchiang Jeng and Elaine J. Weyuker. A simplified domain-
testing strategy. ACM Trans. Softw. Eng. Methodol., 3(3):254–270,
July 1994.
[56] Jean-Marc Jézéquel and Bertrand Meyer. Design by contract: The
lessons of ariane. Computer, 30(1):129–130, January 1997.
[57] Paul C. Jorgensen. Software Testing: A Craftsman’s Approach. CRC
Press, Inc., Boca Raton, FL, USA, 4th edition, 2014.
[58] René Just. The major mutation framework: Efficient and scal-
able mutation analysis for java. In Proceedings of the 2014 Inter-
national Symposium on Software Testing and Analysis, ISSTA 2014,
pages 433–436, New York, NY, USA, 2014. ACM.
[59] René Just, Darioush Jalali, Laura Inozemtseva, Michael D Ernst,
Reid Holmes, and Gordon Fraser. Are mutants a valid substitute
for real faults in software testing? In Proceedings of the 22nd ACM
SIGSOFT International Symposium on Foundations of Software Engi-
neering, pages 654–665. ACM, 2014.


[60] C. Kaner, J. Bach, and B. Pettichord. Lessons Learned in Software Testing: A Context-Driven Approach. Wiley, 2011.

[61] Cem Kaner, James Bach, and Bret Pettichord. Lessons Learned in Software Testing. John Wiley & Sons, Inc., New York, NY, USA, 2001.

[62] H. Khalil and Y. Labiche. On FSM-based testing: An empirical study: Complete round-trip versus transition trees. In 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE), pages 305–315, Oct 2017.

[63] Muhammad Khatibsyarbini, Mohd Adham Isa, Dayang N.A. Jawawi, and Rooster Tumeng. Test case prioritization approaches in regression testing: A systematic literature review. Information and Software Technology, 93:74–93, 2018.

[64] P. J. H. King. Decision tables. The Computer Journal, 10(2):135–142, 1967.

[65] E. Kit and S. Finzi. Software Testing in the Real World: Improving the Process. ACM Press books. Addison-Wesley, 1995.

[66] T. Koomen and M. Pol. Test Process Improvement: A Practical Step-by-step Guide to Structured Testing. ACM Press books. Addison-Wesley, 1999.

[67] Tim Koomen, Leo van der Aalst, Bart Broekman, and Michiel Vroon. TMap Next - voor resultaatgericht testen. Tutein Nolthenius, Den Bosch, The Netherlands, 1st edition, 2006.

[68] Steve Krug. Don’t Make Me Think: A Common Sense Approach to the Web (2nd Edition). New Riders Publishing, Thousand Oaks, CA, USA, 2005.

[69] D. Richard Kuhn, Raghu N. Kacker, and Yu Lei. Advanced combinatorial test methods for system reliability. In IEEE Reliability Society 2010 Annual Technical Report, pages 1–6, 2010.

[70] D. Richard Kuhn, Raghu N. Kacker, and Yu Lei. Introduction to Combinatorial Testing. Chapman & Hall/CRC, 1st edition, 2013.

[71] D.R. Kuhn, D.R. Wallace, and A.M. Gallo. Software fault interactions and implications for software testing. IEEE Transactions on Software Engineering, 30(6):418–421, June 2004.

[72] Yu-Seung Ma, Jeff Offutt, and Yong-Rae Kwon. MuJava: A mutation system for Java. In Proceedings of the 28th International Conference on Software Engineering, ICSE ’06, pages 827–830, New York, NY, USA, 2006. ACM.

[73] Robert Mandl. Orthogonal Latin squares: An application of experiment design to compiler testing. Commun. ACM, 28(10):1054–1058, October 1985.

[74] Brian Marick. The Craft of Software Testing: Subsystem Testing Including Object-based and Object-oriented Testing. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1995.

[75] R. Marselis, J. van Rooyen, C. Schotanus, and I. Pinkster. TestGrip: Gaining Control on IT Quality and Processes Through Test Policy and Test Organisation. LogicaCMG, 2007.

[76] T. J. McCabe. A complexity measure. IEEE Transactions on Software Engineering, SE-2(4):308–320, December 1976.

[77] Thomas J. McCabe and Charles W. Butler. Design complexity measurement and testing. Commun. ACM, 32(12):1415–1425, December 1989.

[78] George H. Mealy. A method for synthesizing sequential circuits. Bell System Technical Journal, 34(5):1045–1079, 1955.

[79] Gerard Meszaros. xUnit Test Patterns: Refactoring Test Code. Addison-Wesley Signature Series. Addison-Wesley, Upper Saddle River, NJ, 2007.

[80] Bertrand Meyer. On Formalism in Specifications, pages 155–189. Springer Netherlands, Dordrecht, 1993.

[81] E. Miranda. Running the Successful Hi-tech Project Office. Artech House Technology Management Library. Artech House, 2003.

[82] D.J. Mosley. The Handbook of MIS Application Software Testing: Methods, Techniques, and Tools for Assuring Quality Through Testing. Yourdon Press Computing Series. Prentice Hall, 1993.

[83] Glenford J. Myers. The Art of Software Testing. John Wiley & Sons, Inc., New York, NY, USA, 1st edition, 1979.

[84] Glenford J. Myers, Corey Sandler, and Tom Badgett. The Art of Software Testing. John Wiley & Sons, Inc., New York, NY, USA, 3rd edition, 2011.

[85] Peter Naur. Programming by action clusters. BIT Numerical Mathematics, 9(3):250–258, September 1969.

[86] Jakob Nielsen and Robert L. Mack, editors. Usability Inspection Methods. John Wiley & Sons, Inc., New York, NY, USA, 1994.

[87] Jakob Nielsen and Marie Tahir. Homepage Usability: 50 Websites Deconstructed. New Riders Publishing, Thousand Oaks, CA, USA, 2001.

[88] American Society for Quality. Quality glossary. https://asq.org/quality-resources/quality-glossary/q, 2019. [Online; accessed 24-10-2019].

[89] A. Jefferson Offutt. Investigations of the software testing coupling effect. ACM Transactions on Software Engineering and Methodology (TOSEM), 1(1):5–20, 1992.

[90] A. Jefferson Offutt and W. Michael Craft. Using compiler optimization techniques to detect equivalent mutants. Software Testing, Verification and Reliability, 4(3):131–154, 1994.


[91] A. Jefferson Offutt, Gregg Rothermel, and Christian Zapf. An experimental evaluation of selective mutation. In Proceedings of the 15th International Conference on Software Engineering, ICSE ’93, pages 100–107, Los Alamitos, CA, USA, 1993. IEEE Computer Society Press.
[92] T. J. Ostrand and M. J. Balcer. The category-partition method
for specifying and generating functional tests. Commun. ACM,
31(6):676–686, June 1988.
[93] Goran Petrović and Marko Ivanković. State of mutation testing
at Google. In Proceedings of the 40th International Conference on
Software Engineering: Software Engineering in Practice, pages
163–171. ACM, 2018.
[94] M. Pezze and M. Young. Software Testing and Analysis: Process,
Principles and Techniques. John Wiley & Sons, Inc., Hoboken, N.J.,
2007.
[95] Simone Pimont and Jean-Claude Rault. A software reliability as-
sessment based on a structural and behavioral analysis of pro-
grams. In Proceedings of the 2nd International Conference on Software
Engineering, ICSE ’76, pages 486–491, Los Alamitos, CA, USA,
1976. IEEE Computer Society Press.
[96] Iris Pinkster, Bob van de Burgt, Dennis Janssen, and Erik van
Veenendaal. Successful Test Management: An Integral Approach.
Springer Publishing Company, Incorporated, 2010.
[97] Martin Pol, Ruud Teunissen, and Erik VanVeenendaal. Software
Testing: A Guide to the TMap Approach. Addison-Wesley, 2002.
[98] J. L. Lions. ARIANE 5 Flight 501 Failure - Report by the Inquiry
Board. http://www-users.math.umn.edu/~arnold/disasters/ariane5rep.html,
1996. [Online; accessed 20-2-2018].
[99] José Miguel Rojas and Gordon Fraser. Code defenders: A muta-
tion testing game. In Proc. of the 11th International Workshop on
Mutation Analysis. IEEE, 2016. To appear.
[100] RTL Nieuws. 104-jarige krijgt brief van gemeente: je mag naar de
kleuterschool! [104-year-old gets a letter from the municipality:
you may go to kindergarten!]
http://www.rtlnieuws.nl/nieuws/opmerkelijk/104-jarige-krijgt-brief-van-gemeente-je-mag-naar-de-kleuterschool,
2016. [Online; accessed 4-2-2017].
[101] J. Rubin, D. Chisnell, and J. Spool. Handbook of Usability Testing:
How to Plan, Design, and Conduct Effective Tests. Wiley, 2011.
[102] Torbjörn Ryber. Essential Software Test Design. Fearless Consult-
ing, 2007.
[103] Huib Schoots. 29119.
http://www.huibschoots.nl/wordpress/?page_id=1771. [Online;
accessed 19-09-2019].
[104] C.C. Schotanus. TestFrame: An Approach to Structured Testing.
Springer Berlin Heidelberg, 2009.

[105] Roger G Schroeder, Kevin Linderman, Srilata Zaheer, and
Adrian S Choo. Six sigma: a goal-theoretic perspective. Qual-
ity control and applied statistics, 49(1):49–50, 2004.

[106] David Schuler, Valentin Dallmeier, and Andreas Zeller. Efficient
mutation testing by checking invariant violations. In Proceedings
of the Eighteenth International Symposium on Software Testing and
Analysis, ISSTA ’09, pages 69–80, New York, NY, USA, 2009. ACM.

[107] Richard W Selby, Victor R Basili, Jerry Page, and Frank E Mc-
Garry. Evaluating software testing strategies. In Proc. Ninth Annu.
Software Eng. Workshop, 1984.

[108] Akbar Siami Namin, James H. Andrews, and Duncan J. Murdoch.
Sufficient mutation operators for measuring test effectiveness. In
Proceedings of the 30th International Conference on Software
Engineering, pages 351–360. ACM, 2008.

[109] SOGETI. TMap. http://www.tmap.net/, 2019. [Online; accessed
03-06-2019].

[110] SOGETI. TPI Next. http://www.tmap.net/tpi-downloads, 2019.
[Online; accessed 03-06-2019].

[111] A. Spillner, T. Linz, and H. Schaefer. Software Testing Foundations:
A Study Guide for the Certified Tester Exam. Rocky Nook, 2014.

[112] D. Spinellis. Code Quality: The Open Source Perspective. Effective
Software Development Series. Pearson Education, 2006.

[113] The Atlantic. How ’Gangnam Style’ Broke YouTube.
https://www.theatlantic.com/technology/archive/2014/12/how-gangnam-style-broke-youtube/383389/,
2014. [Online; accessed 20-2-2018].

[114] The Independent. Swedish pensioner, aged 104, offered
kindergarten place due to computer glitch.
http://www.independent.co.uk/news/world/europe/swedish-pensioner-aged-104-offered-kindergarten-place-due-to-computer-glitch-a6967786.html,
2016. [Online; accessed 20-2-2018].

[115] The Telegraph. Statins glitch means thousands may have been
incorrectly prescribed.
http://www.telegraph.co.uk/news/2016/05/11/statins-glitch-means-thousands-may-have-been-incorrectly-prescri/,
2017. [Online; accessed 4-2-2017].

[116] Jan Tretmans. Model based testing with labelled transition sys-
tems. In Robert M. Hierons, Jonathan P. Bowen, and Mark
Harman, editors, Formal Methods and Testing, An Outcome of the
FORTEST Network, Revised Selected Papers, volume 4949 of Lecture
Notes in Computer Science, pages 1–38. Springer, 2008.


[117] Ars Technica UK. Bug in GP software may have coughed up wrong
data on heart disease risk.
http://arstechnica.co.uk/security/2016/05/bug-in-gp-heart-risk-calculator-tool-tpp/,
2016. [Online; accessed 4-2-2017].

[118] Roland H. Untch, A. Jefferson Offutt, and Mary Jean Harrold.
Mutation analysis using mutant schemata. In ACM SIGSOFT Software
Engineering Notes, volume 18, pages 139–148. ACM, 1993.

[119] A. van Ewijk, B. Linker, M. van Oosterwijk, and B. Visser. TPI
NEXT: Business Driven Test Process Improvement. Kleine Uil,
Uitgeverij, 2013.

[120] A. van Ewijk, B. Linker, M. van Oosterwijk, and B. Visser. TPI
NEXT: Business Driven Test Process Improvement. Kleine Uil,
Uitgeverij, 2013.

[121] Han van Loon. Process Assessment and Improvement. Springer,
2007.

[122] G.M. Weinberg. An Introduction to General Systems Thinking.
Dorset House, 2001.

[123] Victor Weinberg. Structured Analysis. Prentice Hall PTR, Upper
Saddle River, NJ, USA, 1978.

[124] Elaine J. Weyuker. On testing non-testable programs. The
Computer Journal, 25(4):465, 1982.

[125] L. J. White and E. I. Cohen. A domain strategy for computer
program testing. IEEE Transactions on Software Engineering,
SE-6(3):247–257, May 1980.

[126] James A. Whittaker. How to Break Software: A Practical Guide to
Testing. Addison-Wesley, 2003.

[127] S. Yoo and M. Harman. Regression testing minimization, selection
and prioritization: A survey. Softw. Test. Verif. Reliab.,
22(2):67–120, March 2012.

[128] YouTube. Longer video of ’Ariane 5’ Rocket first launch
failure/explosion. https://www.youtube.com/watch?v=gp_D8r-2hwk,
1996. [Online; accessed 20-2-2018].

[129] YouTube Google+ blogpost.
https://plus.google.com/+YouTube/posts/BUXfdWqu86Q, 2014.
[Online; accessed 20-2-2018].

[130] Lingming Zhang, Milos Gligoric, Darko Marinov, and Sarfraz
Khurshid. Operator-based and random mutant selection: Better
together. In Proceedings of the 28th IEEE/ACM International
Conference on Automated Software Engineering, ASE’13, pages 92–102,
Piscataway, NJ, USA, 2013. IEEE Press.
