Software Debugging, Testing, and Verification
Abstract
1. Introduction:
While software has increased tremendously in complexity and scope over the past decades, advances in the state of the practice of the software engineering techniques used to produce that software have been moderate at best. Software development remains primarily a labor-intensive effort and is thus subject to the limitations of human endeavor. As Frederick Brooks explained over a quarter century ago [1], there is a big difference between an isolated program hacked together by a programmer and a programming systems product. A programming systems product “can be run, tested, repaired, and extended by anybody ... in many operating environments, for many sets of data” and forms part of “a collection of interacting programs, coordinated in function and disciplined in format, so that the assemblage constitutes an entire facility for large tasks”. Brooks asserted a nine-fold increase in the cost of developing a programming systems product over that of an isolated program.
Figure 1: Evolution of a programming systems product: an isolated program becomes a programming product (x3: generalization, testing, documentation, maintenance), becomes part of a programming system (x3: interfaces, system integration), and together these yield a programming systems product (x9). Reused with permission from Reference [1].
With the advent of the Internet and the World Wide Web, the problems that were recognized a quarter century ago as having no “silver bullet” solution [1] have been magnified. The challenges and problems of architecting and testing distributed computing systems with distributed data, and Web services that must coexist with heterogeneous platforms, unpredictable runtime environments, and so on, make the already difficult problem harder.
A key ingredient that contributes to a reliable programming systems product is the assurance that
the program will perform satisfactorily in terms of its functional and nonfunctional specifications under
the expected deployment environments. In a typical commercial development organization, the cost of
providing this assurance via appropriate debugging, verification and testing activities can easily range
from 50% to 75% of the total development cost. Thus it is important to understand what is involved in these activities that makes them so challenging and so expensive.
Since one of the goals of this special issue of the Systems Journal is to be accessible to students of software engineering at large, we want to spend some time defining the relevant terminology and its implications (we include formal notation for this terminology, but it is not essential to a basic understanding of the problem definitions). We begin with a software program written in a programming language (let P be the program, written in language L). The program is expected to satisfy a set of specifications, where those specifications are written in a specification language (call the set of specifications F(v₁, v₂, …, vₙ) and the specification language L′). In most real-world cases, the specification language L′ is the natural language of the development team (i.e., English, Spanish, etc.).
Debugging: The process of debugging involves analyzing and possibly extending (with debugging statements) the given program, which does not meet the specifications, in order to find a new program that is close to the original and does satisfy the specifications (given specifications F and a program P not satisfying some vₖ ∈ F, find a program P′ “close” to P that does satisfy vₖ). Thus it is the process of “diagnosing the precise nature of a known error and then correcting it” [2].
Verification: Given a program and a set of specifications, show that the program satisfies those specifications (given P and a set of specifications F(v₁, v₂, …, vₙ), show that P satisfies F). Thus, verification is the process of proving or demonstrating that the program correctly satisfies the specifications [2]. Notice that we use the term verification in the sense of “functional correctness”, which is different from the verification activities discussed in some software engineering literature [3,4], where the term applies to ensuring that “each step of the development process correctly echoes the intentions of the immediately preceding step”.
Testing: Whereas verification proves conformance with a specification, testing finds cases where a program does not meet its specification (given specifications F and a program P, find as many specifications (v₁, v₂, …, vₚ) ⊆ F as possible that are not satisfied by P). Based on this definition, any activity that exposes program behavior violating a specification can be called testing. In this context, activities such as design reviews, code inspections, and static analysis of source code can all be called testing, even though code is not being executed in the process of finding the error or unexpected behavior; these are sometimes referred to as “Static Testing” [5]. Of course, execution of code by invoking specific test cases targeting specific functionality (using, for example, regression test suites) is a major part of testing.
Validation: The process of evaluating software at the end of the software development process to ensure compliance with its requirements. Note that the verification community also uses the term “validation” to differentiate formal functional verification from extensive testing of a program against its specifications.
Defect (Bug): The result of any occurrence of the program design or the program code not meeting a specification [6].
Figure 2: The activities that involve debugging, verification, and testing in a typical software development process (from system and software requirements through analysis, design, coding, static testing, testing by execution, and debugging, to production/deployment).
2.1 Debugging:
The purpose of debugging is to locate and fix the offending code responsible for a symptom that violates a known specification. Debugging typically happens during three activities in software development, and the level of granularity of the analysis required to locate the defect differs among the three. The first is the coding process, when the programmer translates the design into executable code. Errors made by the programmer at this stage lead to defects that need to be detected and fixed quickly, before the code goes to the next stages of development. Most often, the developer also performs unit tests to expose any defects at the module or component level. The second place for debugging is during the later stages of testing, involving multiple components or a complete system, when unexpected behavior such as wrong return codes or abnormal program termination (abends) may be found. A certain amount of debugging of the test execution is necessary to conclude that the program under test, and not a “bad” test case (caused by an incorrect specification, inappropriate data, or changes in functional specification between versions of the system), is responsible for the unexpected behavior. Once the defect is confirmed, debugging of the program follows, and the misbehaving component and the required fix are determined. The third place for debugging is in production or deployment, when the software faces real operational conditions. Some undesirable aspects of software behavior, such as inadequate performance under a severe workload or unsatisfactory recovery from a failure, are exposed only at this stage, and the offending code must be found and fixed before full deployment. This process may also be called “problem determination” due to the enlarged scope of the analysis required before the defect can be localized.
2.2 Verification:
In order to verify the “functional correctness” of a program, one needs to capture a model of the behavior of the program in a formal language or to use the program itself. In most commercial software
development organizations, there is often no formal specification of the program being built. Formal
verification [8] is routinely used by only small pockets of the industrial software community, particularly
in the areas of protocol verification and embedded systems.
2.3 Testing:
Testing is clearly a necessary area for software validation. Typically, prior to the program being coded, one can do design reviews and code inspections as part of the static testing effort [9]. Once the code is written, various other static analysis methods based on the source code can be applied [5]. The various kinds and stages of testing targeting the different levels of integration and the various modes of software failure are discussed in a wide body of literature [2,10,11]. The testing done at later stages (e.g., external function tests, system tests, etc.) is black-box testing based on external specifications and hence does not involve an understanding of the detailed code implementation. Typical system testing targets key aspects of the product such as recovery, security, stress, performance, hardware configurations, software configurations, and so on. Testing during production/deployment typically involves some level of customer acceptance criteria; many software companies have defined pre-release beta programs with customers to accomplish this.
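As a small illustration (our example, using a hypothetical square-root specification, not one from the article), black-box test cases are derived purely from the external specification and know nothing about the implementation:

    # Illustrative sketch only: black-box tests derived from an external
    # specification of a hypothetical square-root routine; math.sqrt merely
    # stands in for the implementation under test.
    import math

    def satisfies_spec(sqrt_impl, x):
        # Assumed specification: for x >= 0, the result r is non-negative and
        # r*r is within a small relative tolerance of x.
        r = sqrt_impl(x)
        return r >= 0 and math.isclose(r * r, x, rel_tol=1e-9, abs_tol=1e-12)

    # Test inputs chosen from the specified input domain, including boundaries.
    cases = [0.0, 1.0, 2.0, 1e-8, 1e12]
    failures = [x for x in cases if not satisfies_spec(math.sqrt, x)]
    print("failing cases:", failures)   # expected: []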
Figure 3: The levels of testing: unit test and code inspection at the module level, function/component test, system/integration (product) test, and solution integration test of many products.
3. Current State of Technology and Practice:
Should one wish to go beyond fuzzy specifications written in a natural language, there is a long history of intellectually interesting formal models and techniques [12,13,14] that have been devised to formally describe and prove the correctness of software: Hoare-style assertions, Petri Nets, Communicating Sequential Processes, Temporal Logic, Algebraic Systems, Finite State Specifications, Model Checking, and Interval Logics. A key aspect of formal modeling is that the level of detail needed to capture the relevant aspects of the underlying software program can be overwhelming. If all the details contained in the program are necessary to produce a specification or test cases, then the model may well be at least as large as the program itself, making it far less attractive to software engineers. For example, there have been several attempts to model software programs as finite state machines (FSMs) [12]. While these have been successful in the context of embedded systems and protocol verification, a state-based representation of software very quickly leads to an explosion in the number of states [12]. This explosion is a direct result of software constructs such as unbounded data structures, unbounded message queues, the asynchronous nature of different software processes (without a global synchronizing clock), and so on. In order to be relevant and manageable, software models have to use techniques such as symbolic algorithms, partial-order reduction, compositional reasoning, abstraction, symmetry, and induction [12].
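To make the state-explosion point concrete, here is a minimal sketch (a toy model of our own, not an example from the cited work) that enumerates the reachable states of n asynchronous processes, each of which independently advances through k steps; the count grows as (k+1)^n:

    # Illustrative sketch: explicit-state exploration of a toy asynchronous model.
    # Each state is a tuple of per-process program counters; at every step one
    # process (chosen nondeterministically) advances, so interleavings multiply.
    from collections import deque

    def count_reachable_states(n_processes, steps_per_process):
        start = (0,) * n_processes
        seen = {start}
        frontier = deque([start])
        while frontier:
            state = frontier.popleft()
            for i in range(n_processes):
                if state[i] < steps_per_process:
                    nxt = state[:i] + (state[i] + 1,) + state[i + 1:]
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append(nxt)
        return len(seen)

    for n in range(1, 6):
        print(n, "processes:", count_reachable_states(n, 5), "reachable states")
    # 1 process: 6 states ... 5 processes: 7776 states; growth is exponential in n.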
There are many formal languages, such as Z, SDL, and Promela, that can be used to capture
specifications, but they are consistently used by only small pockets of the industrial software community.
Unlike other areas of computer science and engineering, software testing, debugging, and verification have evolved more as a practitioner's collection of tricks than as a well-accepted set of theories or practices.
3.1. Debugging:
As is well known among software engineers, locating a defect takes the most effort in debugging. Debugging during the coding process occurs at the smallest granularity. In the early years of software development, defects that escaped code reviews were found by compilation and execution. Through a painful process (such as inserting print statements for the known outputs at the appropriate places in the program), a programmer could pinpoint the exact location of the error and find a suitable fix. Even today, debugging remains very much an art, and much of the computer science community has largely ignored the debugging problem [15]. Eisenstadt [16] studied 59 anecdotal debugging experiences and reached the following conclusions: just over 50% of the problems resulted from a time/space chasm between symptom and root cause or from inadequate debugging tools; the predominant techniques for finding bugs remained data gathering (e.g., print statements) and hand simulation; and the two biggest causes of bugs were memory overwrites and defects in vendor-supplied hardware or software. To help software engineers debug programs during the coding process, many new approaches have been proposed and many commercial debugging environments are available. Integrated Development Environments (IDEs) provide a way to catch some language-specific, predetermined errors (e.g., missing end-of-statement characters, undefined variables, and so on) without requiring compilation. One area that has caught the imagination of the industry is the visualization of the underlying programming constructs as a means of analyzing a program [17, 18]. There is also considerable work on automating the debugging process through program slicing [19].
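As a toy illustration of the idea behind slicing (a simplified sketch of our own, not the technique of Reference [19]), the code below computes a backward slice over straight-line assignments using only data dependences: starting from the variable whose value looks wrong, it keeps just the statements that could have influenced it:

    # Illustrative sketch of a backward static slice over straight-line code.
    # Each statement is (defined_variable, set_of_used_variables), in program order.
    def backward_slice(statements, criterion_var):
        needed = {criterion_var}          # variables whose values we must explain
        kept = []
        for lineno, (defined, used) in reversed(list(enumerate(statements, 1))):
            if defined in needed:
                needed.discard(defined)
                needed.update(used)       # the defining statement's inputs now matter
                kept.append(lineno)
        return sorted(kept)

    program = [
        ("a", set()),      # 1: a = read()
        ("b", set()),      # 2: b = read()
        ("c", {"a"}),      # 3: c = a * 2
        ("d", {"b"}),      # 4: d = b + 1
        ("e", {"c"}),      # 5: e = c - 3   <- symptom observed in e
    ]
    print(backward_slice(program, "e"))    # -> [1, 3, 5]; lines 2 and 4 are irrelevant

A real slicer must, of course, also handle control dependences, pointers, and procedure calls, which is where the engineering difficulty lies.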
When the testing of software results in a failure and analysis shows that the test case is not the source of the problem, debugging of the program follows and the required fix is determined. Debugging during testing still remains largely manual, despite advances in test execution technology. There is a clear need for a stronger link between the test creation and test execution processes in order to minimize the pain involved here. Debugging in production or deployment is very complicated. Short of using some advanced problem determination techniques for locating the specific defect or deficiency that led to the unexpected behavior, this debugging can be painful, time-consuming, and very expensive. When it involves multiple software products and critical interdependencies, with no easy way to narrow down the location of the offending code, it becomes a real nightmare. As the debugging moves away from the source code, it becomes even more manual and time-consuming.
3.2. Verification:
In order to verify the “functional correctness” of a program, one needs to capture the specifications for the program in a formal manner. One reaction to the difficulties in capturing a full formal specification is to formalize only some properties of a program (such as the correctness of its synchronization skeleton) and verify those by abstracting away details of the program. In this category, one finds network protocols, reactive systems, and microcontroller systems. In each case the specification of the problem is relatively small (either because the protocols are layered with well-defined assumptions, inputs, and outputs, or because the size of the program or the generality of the implementation is restricted) and hence tractable by automatic or semiautomatic systems. There is also a community that builds a model representing the software requirements and design [12, 13] and verifies that the model satisfies the program requirements. However, this does not assure that the implemented code satisfies the property, since there is no formal link between the model and the implementation (that is, the program is not derived or created from the model).
Historically, software verification has had little impact on the real world. Despite the plethora of
specification and verification technologies (described above), the problem has been applying these
techniques and theories to full-scale, real-world programs. Any fully detailed specification must, by its very nature, be as complex as the actual program, and any simplification or abstraction may hide details that are critical to the correct operation of the program. Similarly, any proof system that can automatically verify a real program must be able to handle very complex logical analyses, some of which are formally undecidable. The use of complex theorem-proving systems also requires considerable skill and does not scale to large programs. The human factor also enters into the equation: crafting a correct
specification (especially one using an obscure formal system) is often much more difficult than writing
the program being proved (even one written in an obscure programming language) [20]. To date, success
in program verification has come in restricted domains where either the state space of the problem is
constrained or only a portion of the program is actually verified. General theorem provers, model
checkers, state machine analyzers, and tools customized to particular applications have all been used to
prove such systems.
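As a toy illustration of what such a formal specification looks like at even the smallest scale (our example, not one taken from the literature above), a Hoare-style triple for a three-statement swap already has to introduce auxiliary names X and Y for the initial values:

    \{\, x = X \wedge y = Y \,\}\quad t := x;\ x := y;\ y := t \quad \{\, x = Y \wedge y = X \,\}

Annotating every procedure, loop, and shared data structure of a full-scale product at this level of precision is exactly where the skill and scalability limits described above take hold.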
3.3 Software Testing:
Dijkstra's criticism [21], “Program testing can be used to show the presence of bugs, but never to show their absence,” is well known. From his point of view, any amount of testing represents only a small sampling of all possible computations and is therefore never adequate to assure the expected behavior of the program under all possible conditions. He asserted that “the extent to which the program correctness can be established is not purely a function of the program's external specifications and behavior but it depends critically upon its internal structure”. However, testing has become the preferred process by which software is “shown” to satisfy its requirements, primarily because no other approach based on more formal methods comes close to providing the scalability or satisfying the intuitive ‘coverage’ needs of a software engineer. Hamlet [22] linked good testing to the measurement of the dependability of the tested software in some statistical sense. The absence or presence of failures exposed by testing alone does not measure the dependability of the software unless there is a way to “quantify” the properties of the testing, so as to be certain that adequate dimensions of testing (including the testability of the target software) were covered. Test planning [10, 11] based on partitioning of the functionality, the data, end-user operational profiles [23], and so on is very useful and popular in testing research and among practitioners, and many of the current technologies in testing build on these ideas.
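As a minimal sketch of how these ideas look in a test plan (the function, partitions, and usage shares below are hypothetical assumptions, not data from the article), the input domain is split into equivalence classes taken from the functional specification, and the test budget is allocated according to an assumed end-user operational profile:

    # Illustrative sketch: partition-based test planning with an operational profile.
    # The partitions and usage shares are assumptions for the example, not data.
    test_budget = 200
    partitions = {
        # class name: (representative/boundary inputs, estimated share of field usage)
        "invalid_negative_total":  ([-100, -1],       0.01),
        "small_order_0_to_99":     ([0, 50, 99],      0.60),
        "medium_order_100_to_999": ([100, 500, 999],  0.30),
        "large_order_1000_up":     ([1000, 10_000],   0.09),
    }

    for name, (boundary_inputs, usage_share) in partitions.items():
        # Never allocate fewer tests than the boundary cases the class requires.
        allocated = max(len(boundary_inputs), round(test_budget * usage_share))
        print(f"{name:26s} boundaries: {boundary_inputs}  tests allocated: {allocated}")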
Automation of test execution assumes that the test cases are already manually defined and written (or captured via a tool) and that they can be executed in a test execution environment that handles scheduling, logging of results (success or failure), capturing details of the failing environment, and so on. In the case of testing that requires explicit use of graphical user interfaces, the automation of test execution has already produced major productivity gains across the industry. A number of commercial and research test tools are available in the industry.
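The following sketch (a hypothetical harness of our own, not any particular commercial tool) illustrates the basic services such an environment provides: running already-defined test cases, logging pass/fail results, and capturing details of the failing environment for later debugging:

    # Illustrative sketch of a tiny test execution harness.
    import platform
    import time
    import traceback

    def run_suite(test_cases):
        results = []
        for name, test_fn in test_cases:
            started = time.time()
            try:
                test_fn()
                results.append((name, "PASS", time.time() - started, None))
            except Exception:
                # Capture the failing environment so debugging can start from the log.
                context = {
                    "platform": platform.platform(),
                    "traceback": traceback.format_exc(),
                }
                results.append((name, "FAIL", time.time() - started, context))
        for name, status, elapsed, context in results:
            print(f"{status} {name} ({elapsed:.3f}s)")
            if context is not None:
                print("    environment:", context["platform"])
        return results

    def test_addition_is_commutative():
        assert 2 + 3 == 3 + 2

    def test_deliberate_failure():
        assert "abc".upper() == "abc"    # fails on purpose to show FAIL logging

    run_suite([
        ("addition_is_commutative", test_addition_is_commutative),
        ("deliberate_failure", test_deliberate_failure),
    ])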
Automation of test design (and hence test creation) is another matter [11, 27]. In order to automate functional test design, we need a formal description of the specifications of the software behavior, resulting in a model of that behavior. As discussed earlier, such a formal description is not captured in typical commercial organizations. While the use of finite state machine technology [28] is beginning to take hold in the model checking community, its use in the broader industrial software testing community is limited at best. The increasing adoption of UML (Unified Modeling Language) as the design language by software developers may provide the first opportunity for a widely used technique to capture a more formal description of software specifications. However, UML still lacks the constructs to be an effective language for capturing realistic test specifications [29, 30].
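As a hedged illustration of model-based test design (a toy session model of our own, not an example from Reference [28]), the sketch below derives test sequences that cover every transition of a small finite-state specification at least once; each abstract input would then be mapped onto a concrete test step:

    # Illustrative sketch: derive transition-covering test sequences from a small
    # finite-state model of the specified behavior (a toy login/query/logout session).
    fsm = {
        # (state, input) -> next state
        ("logged_out", "login"):  "logged_in",
        ("logged_in",  "query"):  "logged_in",
        ("logged_in",  "logout"): "logged_out",
    }
    start_state = "logged_out"

    def transition_covering_sequences(fsm, start_state):
        """Greedily walk the model from the start state until every transition is used."""
        uncovered = set(fsm)
        sequences = []
        while uncovered:
            state, sequence = start_state, []
            progressed = True
            while progressed:
                progressed = False
                for (s, inp), nxt in fsm.items():
                    if s == state and (s, inp) in uncovered:
                        uncovered.discard((s, inp))
                        sequence.append(inp)
                        state = nxt
                        progressed = True
                        break
            if not sequence:      # remaining transitions unreachable from the start
                break
            sequences.append(sequence)
        return sequences

    print(transition_covering_sequences(fsm, start_state))
    # -> [['login', 'query', 'logout']]: one sequence covers all three transitions.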
4. Conclusions
We have only scratched the surface of debugging, verification and testing in this article. Other
significant work can be found on automated debugging [33], coverage-based testing techniques [34],
performance testing and analysis [35], and concurrent and distributed testing [36].
There is also a need to teach software engineering as a holistic discipline and not as a collection of tricks and tools that experienced programmers know how to exploit.
References
[24] S. H. Kan, J. Parrish, and D. Manlove, “In-process Metrics for Software Testing”, IBM Systems Journal, 40(1), pp. 220-241 (2001).
[25] K. Bassin, T. Kratschmer, and P. Santhanam, “Evaluating Software Development Objectively”, IEEE Software, 15(6), pp. 66-74 (1998).
[26] D. Brand, “A Software Falsifier”, Proceedings of the Eleventh IEEE International Symposium on Software Reliability Engineering, San Jose, CA, pp. 174-185 (2000).
[27] R. M. Poston, Automating Specification-Based Software Testing, IEEE Computer Society Press, Los Alamitos, CA (1996).
[28] D. Lee and M. Yannakakis, “Principles and Methods of Testing Finite State Machines - A Survey”, Proceedings of the IEEE, 84(8), pp. 1090-1123 (1996).
[29] A. Paradkar, “SALT - An Integrated Environment to Automate Generation of Function Tests for APIs”, Proceedings of the Eleventh IEEE International Symposium on Software Reliability Engineering, San Jose, CA, pp. 304-316 (2000).
[30] C. Williams, “Toward a Test-Ready Meta-model for Use Cases”, Proceedings of the Workshop on Practical UML-Based Rigorous Development Methods (Evans, France, Moreira, and Rumpe, eds.), Toronto, Canada, October 1, 2001, pp. 270-287.
[31] J. A. Whittaker, “What is Software Testing? And Why is it so Hard?”, IEEE Software, 17(1), pp. 70-79 (2000).
[32] M. J. Harrold, J. Jones, T. Li, D. Liang, A. Orso, M. Pennings, S. Sinha, S. Spoon, and A. Gujarathi, “Regression Test Selection for Java Software”, Proceedings of the ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2001), to appear.
[33] A. Zeller, “Yesterday, My Program Worked. Today, It Does Not. Why?”, Proceedings of the 7th European Software Engineering Conference held jointly with the 7th ACM SIGSOFT Symposium on Foundations of Software Engineering, pp. 253-267 (1999).
[34] M. Benjamin, D. Geist, A. Hartman, G. Mas, R. Smeets, and Y. Wolfsthal, “A Study in Coverage-Driven Test Generation”, Proceedings of the 36th Design Automation Conference (DAC), June 1999.
[35] F. I. Vokolos and E. J. Weyuker, “Performance Testing of Software Systems”, Proceedings of the First ACM SIGSOFT International Workshop on Software and Performance, pp. 80-87 (1998).
[36] R. H. Carver and K.-C. Tai, “Replay and Testing for Concurrent Programs”, IEEE Software, pp. 66-74, March 1991.
Authors’ Biographies:
Dr. Brent Hailpern received his B.S. degree, summa cum laude, in Mathematics from the University of
Denver in 1976, and his M.S. and Ph.D. degrees in Computer Science from Stanford University in 1978
and 1980 respectively. His thesis was titled, "Verifying Concurrent Processes Using Temporal Logic".
Dr. Hailpern joined the IBM T. J. Watson Research Center as a Research Staff Member in 1980 after
receiving his PhD from Stanford University. He worked on and managed various projects relating to
issues of concurrency and programming languages. In 1990, Dr. Hailpern joined the Technical Strategy
Development Staff in IBM Corporate Headquarters, returning to the Research Division in 1991. Since
then he has managed IBM Research departments covering operating systems, multimedia servers,
Internet technology, and pervasive computing. He was also the client product manager for the IBM
NetVista education software product, for which he received IBM's Outstanding Innovation Award. Since
1999, he has been the Associate Director of Computer Science for IBM Research. Dr. Hailpern has
authored twelve journal publications and thirteen United States patents, along with numerous conference
papers and book chapters. He is a past Secretary of the ACM, a past Chair of the ACM Special Interest
Group on Programming Languages (SIGPLAN) and a Fellow of the IEEE. He was the chair of the
SIGPLAN '91 Conference on Programming Language Design and Implementation and was chair of
SIGPLAN's OOPSLA '99 Conference. In 1998, he received SIGPLAN's Distinguished Service Award.
He is currently Chair of the OOPSLA Conference Steering Committee and an Associate Editor for
ACM's Transactions on Programming Languages and Systems (TOPLAS).
Dr. Padmanabhan Santhanam holds a B.Sc. from the University of Madras, India, an M.Sc. from the Indian Institute of Technology, Madras, an M.A. from Hunter College of CUNY, and a Ph.D. in Applied
Physics from Yale University. He joined IBM Research in 1985 and has been with the Center of
Software Engineering since 1993, which he currently manages. He has worked on deploying Orthogonal
Defect Classification across IBM software labs and with external customers. His interests include
software metrics, structure based testing algorithms, automation of test generation, and realistic
modeling of processes in software development and service. Dr. Santhanam is a member of the ACM
and a senior member of the IEEE.