Software Metrics
Measurement is used not only by professional technologists, but also by all of us in
everyday life. In a shop, the price acts as a measure of the value of an item. Similarly,
height and size measurements tell us whether a piece of clothing will fit properly. Thus,
measurement helps us compare one item with another.
Measurement captures information about the attributes of entities. An entity is an
object, such as a person, or an event, such as a journey, in the real world. An attribute is a
feature or property of an entity, such as the height of a person or the cost of a journey. In
the real world, even though we think of ourselves as measuring things, we are actually
measuring the attributes of those things.
Attributes are mostly defined by numbers or symbols. For example, price can be
specified in rupees or dollars, and clothing size can be specified as small, medium, or large.
Thus, for controlling software products, measuring their attributes is necessary. Every
measurement action must be motivated by a particular goal or need that is clearly
defined and easily understandable. The measurement objectives must be specific, tied to
what managers, developers, and users need to know. Measurement is required to assess
the status of projects, products, processes, and resources.
In software engineering, measurement is essential for three basic activities: understanding what is happening during development and maintenance, controlling what is happening in the project, and improving the processes and products.
Measurement theory lays the groundwork for developing and reasoning about all kinds of
measurement. It defines measurement as a mapping from the empirical world to the formal,
relational world. Consequently, a measure is the number or symbol assigned to an entity
by this mapping in order to characterize the entity.
Empirical Relations
In the real world, we understand things by comparing them, not by assigning
numbers to them.
For example, to compare height, we use the terms ‘taller than’ and ‘higher than’. Thus,
‘taller than’ and ‘higher than’ are empirical relations for height.
We can define more than one empirical relation on the same set.
Empirical relations in the real world can be mapped to a formal mathematical world.
Mostly, these relations reflect personal preferences.
Some of the mapping or rating techniques used to map these empirical relations to the
mathematical world are as follows −
Likert Scale
Here, the users will be given a statement upon which they have to agree or disagree.
Forced Ranking
For example: Rank the following 5 software modules according to their performance.
Module A
Module B
Module C
Module D
Module E
Ordinal Scale
Here, the users will be given a list of alternatives and they have to select one.
Hourly
Daily
Weekly
Monthly
Several times a year
Once or twice a year
Never
Comparative Scale
Here, the user has to give a number by comparing the different options.
1   2   3   4   5   6   7   8   9   10
Numerical Scale
Unimportant   1   2   3   4   5   6   7   8   9   10   Important
To perform the mapping, we have to specify domain, range as well as the rules to
perform the mapping.
The representational condition asserts that a measurement mapping (M) must map
entities into numbers, and empirical relations into numerical relations in such a way that
the empirical relations preserve and are preserved by numerical relations.
For example: The empirical relation ‘taller than’ is mapped to the numerical relation
‘>’, i.e., X is taller than Y, if and only if M(X) > M(Y).
Since, there can be many relations on a given set, the representational condition also has
implications for each of these relations.
For the unary relation ‘is tall’, we might have the numerical relation
M(X) > 50
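As an illustration (not part of the original text), the following Python sketch applies such a mapping M for height and checks the representation condition; the sample people, their heights, and the ‘is tall’ threshold are invented.

# Hypothetical sketch of the representation condition for the attribute "height".
# M maps each person (entity) to a number; "taller than" must correspond to ">".
heights_cm = {"anna": 182, "ben": 175, "carl": 190}   # assumed sample data

def M(person):
    # Measurement mapping: entity -> number (height in centimetres).
    return heights_cm[person]

def taller_than(x, y):
    # Empirical relation, simulated here from the same observations.
    return heights_cm[x] > heights_cm[y]

# Representation condition: X taller than Y  <=>  M(X) > M(Y)
for x in heights_cm:
    for y in heights_cm:
        assert taller_than(x, y) == (M(x) > M(y))

# Unary relation "is tall" mapped to a numerical relation such as M(X) > 180 (cm),
# playing the same role as the threshold in the text.
print([p for p in heights_cm if M(p) > 180])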
Models are useful for interpreting the behaviour of the numerical elements of the real-
world entities as well as for measuring them. To help the measurement process, the model
of the mapping should be supplemented with a model of the mapping domain. A
model should also specify how the entities are related to the attributes and how the
characteristics relate to one another. Measurement is of two types −
Direct measurement
Indirect measurement
Direct Measurement
These are measurements that can be made without involving any other entity or
attribute.
Indirect Measurement
These are measurements that can only be made in terms of other entities or attributes.
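For example, lines of code and the number of recorded defects are direct measures, while defect density is an indirect measure derived from them. A small Python sketch with assumed figures:

# Direct measures: counted on the entity itself (assumed figures).
lines_of_code = 12_000          # direct: size of the module
defects_found = 30              # direct: count of recorded defects

# Indirect measure: derived from other measures.
defect_density = defects_found / (lines_of_code / 1000)   # defects per KLOC
print(f"Defect density: {defect_density:.2f} defects/KLOC")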
For allocating the appropriate resources to the project, we need to predict the effort,
time, and cost for developing the project. The measurement for prediction always
requires a mathematical model that relates the attributes to be predicted to some other
attribute that we can measure now. Hence, a prediction system consists of a
mathematical model together with a set of prediction procedures for determining the
unknown parameters and interpreting the results.
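As a rough sketch of such a prediction system, the following Python code assumes a simple power-law model, effort = a × size^b, and determines the unknown parameters a and b from hypothetical past-project data by regression on the logarithms; the model form and all numbers are illustrative assumptions, not prescribed by the text.

import numpy as np

# Hypothetical historical data: size in KLOC, effort in person-months.
size = np.array([10, 25, 40, 80, 120], dtype=float)
effort = np.array([24, 65, 110, 230, 360], dtype=float)

# Prediction procedure: fit log(effort) = log(a) + b * log(size).
b, log_a = np.polyfit(np.log(size), np.log(effort), 1)
a = np.exp(log_a)

def predict_effort(kloc):
    # Mathematical model with fitted parameters a and b.
    return a * kloc ** b

print(f"effort is approximately {a:.2f} * size^{b:.2f}")
print(f"Predicted effort for 60 KLOC: {predict_effort(60):.0f} person-months")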
Measurement Scales
Measurement scales are the mappings used for representing the empirical relation
system. It is mainly of 5 types −
Nominal Scale
Ordinal Scale
Interval Scale
Ratio Scale
Absolute Scale
Nominal Scale
It places the elements in a classification scheme. The classes will not be ordered. Each
and every entity should be placed in a particular class or category based on the value of
the attribute.
The empirical relation system consists only of different classes; there is no notion of
ordering among the classes.
Any distinct numbering or symbolic representation of the classes is an acceptable
measure, but there is no notion of magnitude associated with the numbers or
symbols.
Ordinal Scale
The empirical relation system consists of classes that are ordered with respect to
the attribute.
Any mapping that preserves the ordering is acceptable.
The numbers represent ranking only. Hence, addition, subtraction, and other
arithmetic operations have no meaning.
Interval Scale
This scale captures information about the size of the intervals that separate the
classes. Hence, it is more powerful than the nominal and ordinal scales. Addition and
subtraction are meaningful, but ratios are not, because the zero point is arbitrary. Any
acceptable transformation between two interval-scale measures M and M' has the form

M = aM' + b, with a > 0
Ratio Scale
This is the most useful scale of measurement. Here, an empirical relation exists to
capture ratios: there is a meaningful zero element, and ordering, interval sizes, and ratios
between entities are all preserved. Any acceptable transformation between ratio-scale
measures is a change of units of the form

M = aM', with a > 0
Absolute Scale
On this scale, there will be only one possible measure for an attribute. Hence, the only
possible transformation will be the identity transformation.
The measurement is made by counting the number of elements in the entity set.
The attribute always takes the form “number of occurrences of x in the entity”.
There is only one possible measurement mapping, namely the actual count.
All arithmetic operations can be performed on the resulting count.
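The admissible transformations for the interval and ratio scales can be illustrated with a small Python sketch; the temperature and length examples are my own illustrations, not taken from the text.

# Interval scale: Celsius -> Fahrenheit is M = aM' + b (a = 9/5, b = 32).
def celsius_to_fahrenheit(c):
    return 9 / 5 * c + 32

# Differences are preserved up to the factor a, but ratios are not meaningful.
print(celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10))   # 18.0 (= 9/5 * 10)

# Ratio scale: metres -> feet is M = aM' (a = 3.2808); ratios are preserved.
def metres_to_feet(m):
    return 3.2808 * m

print(metres_to_feet(10) / metres_to_feet(5))                  # 2.0, same ratio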
Empirical Investigations
There are three main types of empirical investigation −
Survey
Case study
Formal experiment
Survey
In this case, we have no control over the situation at hand. We can record a situation and
compare it with a similar one.
Case Study
It is a research technique where you identify the key factors that may affect the outcome
of an activity and then document the activity: its inputs, constraints, resources, and
outputs.
Formal Experiment
It is a rigorous controlled investigation of an activity, where the key factors are identified
and manipulated to document their effects on the outcome.
If the activity has already occurred, we can perform survey or case study. If it is yet
to occur, then case study or formal experiment may be chosen.
If we have a high level of control over the variables that can affect the outcome,
then we can use an experiment. If we have no control over the variable, then case
study will be a preferred technique.
If the activity cannot be replicated, then a formal experiment is not possible.
If the cost of replication is low, then we can consider an experiment.
To support the choice of a particular investigation technique, the goal of the research
should be expressed as a hypothesis we want to test. The hypothesis is the tentative
theory or supposition that the investigator believes explains the behaviour they want to
explore.
After stating the hypothesis, next we have to decide the different variables that affect its
truth as well as how much control we have over it. This is essential because the key
discriminator between the experiment and the case studies is the degree of control over
the variable that affects the behaviour.
In a formal experiment, a state variable, that is, a factor that characterizes the project and
can influence the evaluation results, is used to distinguish the control situation from the
experimental one. If we cannot differentiate the control situation from the experimental
one, the case study technique is the preferred one.
The results of an experiment are usually more generalizable than case study or survey.
The results of a case study or survey can normally be applied only to a particular
organization. The following points illustrate how these techniques can be used to answer a
variety of questions.
Case studies or surveys can be used to confirm the effectiveness and utility of the
conventional wisdom and many other standards, methods, or tools in a single
organization. However, formal experiment can investigate the situations in which the
claims are generally true.
Exploring relationships
The relationship among various attributes of resources and software products can be
suggested by a case study or survey.
For example, a survey of completed projects can reveal that software written in a
particular language has fewer faults than software written in other languages.
Understanding and verifying these relationships is essential to the success of any future
projects. Each of these relationships can be expressed as a hypothesis and a formal
experiment can be designed to test the degree to which the relationships hold. Usually,
the value of one particular attribute is observed by keeping other attributes constant or
under control.
Models are usually used to predict the outcome of an activity or to guide the use of a
method or tool. They present a particularly difficult problem when designing an experiment
or case study, because their predictions often affect the outcome. Project managers
often turn the predictions into targets for completion. This effect is common when cost
and schedule models are used.
Some models such as reliability models do not influence the outcome, since reliability
measured as mean time to failure cannot be evaluated until the software is ready for use
in the field.
Validating measures
There are many software measures to capture the value of an attribute. Hence, a study
must be conducted to test whether a given measure reflects the changes in the attribute
it is supposed to capture. Validation is performed by correlating one measure with
another. A second measure which is also a direct and valid measure of the affecting
factor should be used to validate. Such measures are not always available or easy to
measure. Also, the measures used must conform to human notions of the factor being
measured.
Software Measurement
The entities whose attributes we measure in software engineering fall into three classes −
Processes
Products
Resources
Internal attributes are those that can be measured purely in terms of the
process, product, or resources itself. For example: Size, complexity, dependency
among modules.
External attributes are those that can be measured only with respect to its
relation with the environment. For example: The total number of failures
experienced by a user, the length of time it takes to search the database and
retrieve information.
The different attributes that can be measured for each of the entities are as follows −
Processes
Processes are collections of software-related activities. Some internal process attributes,
such as the duration of the process and the effort associated with it, can be measured directly.
Products
Products are not only the items that the management is committed to deliver but also
any artifact or document produced during the software life cycle.
The different internal product attributes are size, effort, cost, specification, length,
functionality, modularity, reuse, redundancy, and syntactic correctness. Among these,
size, effort, and cost are relatively easier to measure than the others.
The different external product attributes are usability, integrity, efficiency, testability,
reusability, portability, and interoperability. These attributes describe not only the code
but also the other documents that support the development effort.
Resources
These are entities required by a process activity. It can be any input for the software
production. It includes personnel, materials, tools and methods.
The different internal attributes for the resources are age, price, size, speed, memory
size, temperature, etc. The different external attributes are productivity, experience,
quality, usability, reliability, comfort etc.
A particular measurement will be useful only if it helps to understand the process or one
of its resultant products. The improvement in the process or products can be performed
only when the project has clearly defined goals for processes and products. A clear
understanding of goals can be used to generate suggested metrics for a given project in
the context of a process maturity framework.
The GQM (Goal-Question-Metric) approach provides a framework involving the following three steps −
Listing the major goals of the development or maintenance project
Deriving from each goal the questions that must be answered to determine whether the goal is being met
Deciding what must be measured in order to answer each question adequately
To use the GQM paradigm, we first express the overall goals of the organization. Then, we
generate questions whose answers will tell us whether the goals are being met. Finally, we
analyze each question to determine what must be measured in order to answer it.
Typical goals are expressed in terms of productivity, quality, risk, customer satisfaction,
etc. Goals and questions are to be constructed in terms of their audience.
To help generate the goals, questions, and metrics, Basili & Rombach provided a series of
templates.
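As a rough sketch (the goal, questions, and metrics below are invented examples, not taken from the Basili & Rombach templates), a GQM plan can be represented as a simple goal-to-questions-to-metrics mapping:

# Hypothetical GQM plan: one goal, its questions, and candidate metrics.
gqm_plan = {
    "goal": "Improve the reliability of the released product",
    "questions": {
        "How many failures are reported after release?": [
            "number of failure reports per month",
            "mean time to failure",
        ],
        "Where do the underlying faults originate?": [
            "faults per module",
            "faults per life-cycle phase",
        ],
    },
}

for question, metrics in gqm_plan["questions"].items():
    print(question)
    for metric in metrics:
        print("  metric:", metric)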
According to the maturity level of the process given by SEI, the type of measurement
and the measurement program will be different. Following are the different measurement
programs that can be applied at each of the maturity level.
Level 1: Ad hoc
At this level, the inputs are ill-defined, while the outputs are expected. The transition
from inputs to outputs is undefined and uncontrolled. For this level of process maturity,
baseline measurements are needed to provide a starting point for measuring.
Level 2: Repeatable
At this level, the inputs and outputs of the process, the constraints, and the resources are
identifiable, so the process can be described as a simple input-output transformation.
The input measures can be the size and volatility of the requirements. The output may be
measured in terms of system size, the resources in terms of staff effort, and the
constraints in terms of cost and schedule.
Level 3: Defined
At this level, intermediate activities are defined, and their inputs and outputs are known
and understood.
The input to and the output from the intermediate activities can be examined, measured,
and assessed.
Level 4: Managed
At this level, the feedback from the early project activities can be used to set priorities
for the current activities and later for the project activities. We can measure the
effectiveness of the process activities. The measurement reflects the characteristics of
the overall process and of the interaction among and across major activities.
Level 5: Optimizing
At this level, the measures from activities are used to improve the process by removing
and adding process activities and changing the process structure dynamically in response
to measurement feedback. Thus, the process change can affect the organization and the
project as well as the process. The measurements act as sensors and monitors, and we can
change the process significantly in response to warning signs.
At a given maturity level, we can collect the measurements for that level and all levels
below it.
Process maturity suggests measuring only what is visible. Thus, the combination of
process maturity with GQM will provide the most useful measures.
At level 1, the project is likely to have ill-defined requirements. At this level, the
measurement of requirement characteristics is difficult.
At level 2, the requirements are well-defined and the additional information such
as the type of each requirement and the number of changes to each type can be
collected.
At level 3, intermediate activities are defined with entry and exit criteria for each
activity.
The goal and question analysis will be the same, but the metric will vary with maturity.
The more mature the process, the richer will be the measurements. The GQM paradigm,
in concert with the process maturity, has been used as the basis for several tools that
assist managers in designing measurement programs.
GQM helps to understand the need for measuring the attribute, and process maturity
suggests whether we are capable of measuring it in a meaningful way. Together they
provide a context for measurement.
Validating the measurement of a software system involves two steps −
Validation of the measurement systems
Validation of the prediction systems
Validating a software measurement system is the process of ensuring that the measure is
a proper numerical characterization of the claimed attribute by showing that the
representation condition is satisfied.
For validating a measurement system, we need both a formal model that describes the
entities and a numerical mapping that preserves the attribute that we are measuring. For
example, if there are two programs P1 and P2, and we want to concatenate those
programs, then we expect any measure m of length to satisfy

m(P1; P2) = m(P1) + m(P2)

where P1; P2 denotes the concatenated program. If a program P1 is longer than program
P2, then any measure m should also satisfy

m(P1) > m(P2)
The length of the program can be measured by counting the lines of code. If this count
satisfies the above relationships, we can say that the lines of code are a valid measure of
the length.
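A small Python sketch of this check, assuming LOC is simply the count of non-blank lines (comment handling is ignored for brevity):

def loc(program_text):
    # Naive lines-of-code count: every non-blank line is counted.
    return sum(1 for line in program_text.splitlines() if line.strip())

p1 = "x = 1\ny = 2\nprint(x + y)\n"
p2 = "print('hello')\n"

# Concatenation: the length measure should be additive.
assert loc(p1 + p2) == loc(p1) + loc(p2)

# Representation condition for 'longer than': P1 longer than P2 => m(P1) > m(P2).
assert loc(p1) > loc(p2)
print(loc(p1), loc(p2), loc(p1 + p2))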
Prediction systems are used to predict some attribute of a future entity involving a
mathematical model with associated prediction procedures.
The degree of accuracy acceptable for validation depends upon whether the prediction
system is deterministic or stochastic as well as the person doing the assessment. Some
stochastic prediction systems are more stochastic than others.
Examples of stochastic prediction systems are systems such as software cost estimation,
effort estimation, schedule estimation, etc. Hence, to validate a prediction system
formally, we must decide how stochastic it is, then compare the performance of the
prediction system with known data.
Software Measurement Metrics
Software metrics is a standard of measure that contains many activities which involve
some degree of measurement. It can be classified into three categories: product metrics,
process metrics, and project metrics.
Some metrics belong to multiple categories. For example, the in-process quality metrics
of a project are both process metrics and project metrics.
Software measurement is a diverse collection of these activities that range from models
predicting software project costs at a specific stage to measures of program structure.
Effort is expressed as a function of one or more variables such as the size of the
program, the capability of the developers, and the level of reuse. Cost and effort
estimation models have been proposed to predict the project cost during the early phases
of the software life cycle. Well-known examples of such models include Boehm's COCOMO
model, Putnam's SLIM model, and Albrecht's function point model.
Productivity can be considered as a function of the value produced and the cost of
producing it. Each can be decomposed into different measurable components such as size,
functionality, time, and money.
Data Collection
The quality of any measurement program is clearly dependent on careful data collection.
Data collected can be distilled into simple charts and graphs so that the managers can
understand the progress and problem of the development. Data collection is also
essential for scientific investigation of relationships and trends.
Quality models have been developed to measure the quality of the product, without which
productivity is meaningless. These quality models can be combined with a productivity
model for measuring the correct productivity. These models are usually constructed in a
tree-like fashion. The upper branches hold important high-level quality factors such as
reliability and usability.
The notion of divide and conquer approach has been implemented as a standard
approach to measuring software quality.
Reliability Models
Most quality models include reliability as a component factor; however, the need to
predict and measure reliability has led to a separate specialization in reliability modeling
and prediction. The basic problem in reliability theory is to predict when a system will
eventually fail.
For structural metrics, we measure the structural attributes of representations of the
software, which are available in advance of execution. We then try to establish empirically
predictive theories to support quality assurance, quality control, and quality prediction.
Management by Metrics
For managing the software project, measurement has a vital role. For checking whether
the project is on track, users and developers can rely on the measurement-based chart
and graph. The standard set of measurements and reporting methods are especially
important when the software is embedded in a product where the customers are not
usually well-versed in software terminology.
The evaluation of methods and tools depends on the experimental design, proper
identification of the factors likely to affect the outcome, and appropriate measurement of
the factor attributes.
Data Manipulation
The success of software measurement lies in the quality of the data collected and
analyzed.
The data collected can be considered as a good data, if it can produce the answers for
the following questions −
Are they correct? − Data can be considered correct if it was collected according
to the exact rules of the definition of the metric.
Are they accurate? − Accuracy refers to the difference between the data and the
actual value.
Are they appropriately precise? − Precision deals with the number of decimal
places needed to express the data.
Are they consistent? − Data can be considered as consistent, if it doesn’t show a
major difference from one measuring device to another.
Are they associated with a particular activity or time period? − If the data is
associated with a particular activity or time period, then it should be clearly
specified in the data.
Can they be replicated? − Normally, the investigations such as surveys, case
studies, and experiments are frequently repeated under different circumstances.
Hence, the data should also be possible to replicate easily.
Raw data − Raw data results from the initial measurement of process, products, or
resources. For example: Weekly timesheet of the employees in an organization.
Refined data − Refined data results from extracting essential data elements from
the raw data for deriving values for attributes.
When recording data about software faults and failures, the following details are typically captured −
Location
Timing
Symptoms
End result
Mechanism
Cause
Severity
Cost
Collection of data requires human observation and reporting. Managers, system analysts,
programmers, testers, and users must record raw data on forms. To collect accurate and
complete data, the collection procedures must be carefully planned.
Data collection planning must begin when project planning begins. Actual data collection
takes place during many phases of development.
For example − Some data related to project personnel can be collected at the start of
the project, while other data collection, such as effort data, begins when the project starts
and continues through operation and maintenance.
Once the database is designed and populated with data, we can make use of the data
manipulation languages to extract the data for analysis.
After collecting relevant data, we have to analyze it in an appropriate way. There are
three major items to consider when choosing the analysis technique: the nature of the
data, the purpose of the investigation, and the design of the study.
To analyze the data, we must also look at the larger population represented by the data
as well as the distribution of that data.
Sampling is the process of selecting a set of data from a large population. Sample
statistics describe and summarize the measures obtained from a group of experimental
subjects.
Population parameters represent the values that would be obtained if all possible subjects
were measured.
The population or sample can be described by measures of central tendency, such as the
mean, median, and mode, and by measures of dispersion, such as the variance and standard
deviation. Many sets of data are normally distributed. In a normal distribution, the data is
evenly distributed about the mean, which is its most significant characteristic.
Other distributions also exist, where the data is skewed so that there are more data
points on one side of the mean than the other. For example, if most of the data lies to the
right of the mean and the long tail of the distribution extends to the left, the distribution
is said to be skewed to the left.
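A brief Python sketch of these summary statistics for a hypothetical sample (the data values are invented):

import statistics

sample = [12, 15, 15, 18, 20, 22, 22, 22, 30]   # assumed measurements

print("mean     :", statistics.mean(sample))
print("median   :", statistics.median(sample))
print("mode     :", statistics.mode(sample))
print("variance :", statistics.variance(sample))   # sample variance
print("std dev  :", statistics.stdev(sample))      # sample standard deviation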
Investigations are usually carried out with one of two objectives −
To confirm a theory
To explore a relationship
To achieve either of these, the objective should be expressed formally in terms of the
hypothesis, and the analysis must address the hypothesis directly.
To confirm a theory
The investigation must be designed to explore the truth of a theory. The theory usually
states that the use of a certain method, tool, or technique has a particular effect on the
subjects, making it better in some way than another.
There are two cases of data to be considered: normal data and non-normal data.
If the data is from a normal distribution and there are two groups to compare, Student's
t-test can be used for the analysis. If there are more than two groups to compare, a
general analysis of variance test based on the F-statistic can be used.
If the data is non-normal, it can be ranked and analyzed using the Kruskal-Wallis test.
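A Python sketch of these tests using SciPy, with invented measurements for two or three groups; the data, the grouping, and any interpretation thresholds are assumptions made for illustration.

from scipy import stats

# Hypothetical effort measurements (hours) for projects using different tools.
group_a = [12.1, 13.4, 11.8, 12.9, 14.0]
group_b = [15.2, 14.8, 16.1, 15.5, 14.9]
group_c = [13.0, 13.7, 12.5, 14.1, 13.3]

# Two groups, approximately normal data: Student's t-test.
t_stat, p_val = stats.ttest_ind(group_a, group_b)
print("t-test        :", t_stat, p_val)

# More than two groups, normal data: one-way analysis of variance (F-statistic).
f_stat, p_val = stats.f_oneway(group_a, group_b, group_c)
print("ANOVA         :", f_stat, p_val)

# Non-normal data: rank-based Kruskal-Wallis test.
h_stat, p_val = stats.kruskal(group_a, group_b, group_c)
print("Kruskal-Wallis:", h_stat, p_val)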
To explore a relationship
Investigations are designed to determine the relationship among data points describing
one variable or multiple variables.
There are three techniques to answer the questions about a relationship: box plots,
scatter plots, and correlation analysis.
A box plot can represent the summary of the range of a set of data.
A scatter plot represents the relationship between two variables.
Correlation analysis uses statistical methods to confirm whether there is a true
relationship between two attributes.
For normally distributed values, use the Pearson correlation coefficient to check
whether or not the two variables are highly correlated.
For non-normal data, rank the data and use the Spearman rank correlation
coefficient as a measure of association. Another measure for non-normal data is the
Kendall robust correlation coefficient, which investigates the relationship among
pairs of data points and can identify a partial correlation.
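A corresponding Python sketch using SciPy; the paired module-size and fault-count data are invented.

from scipy import stats

module_size = [120, 300, 450, 600, 780, 900]   # hypothetical LOC per module
fault_count = [2, 5, 6, 9, 11, 14]             # hypothetical faults per module

# Normally distributed values: Pearson correlation coefficient.
r, p = stats.pearsonr(module_size, fault_count)
print("Pearson :", r, p)

# Non-normal data: Spearman rank correlation coefficient.
rho, p = stats.spearmanr(module_size, fault_count)
print("Spearman:", rho, p)

# Robust alternative based on pairs of data points: Kendall's tau.
tau, p = stats.kendalltau(module_size, fault_count)
print("Kendall :", tau, p)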
Design Considerations
The investigation’s design must be considered while choosing the analysis techniques, and,
conversely, the complexity of the analysis can influence the design chosen. For example,
comparing multiple groups requires F-statistics rather than the Student's t-test used for
two groups.
For complex factorial designs with more than two factors, more sophisticated tests of
association and significance are needed.
Statistical techniques can be used to account for the effect of one set of variables on
others, or to compensate for the timing or learning effects.
Internal product attributes describe the software products in a way that depends
only on the product itself. The major reason for measuring internal product attributes is
that it helps monitor and control the products during development.
The main internal product attributes include size and structure. Size can be measured
statically, without executing the product. The size of the product tells us about the effort
needed to create it. Similarly, the structure of the product plays an important role in the
maintenance of the product.
Length
There are three development products whose size measurement is useful for predicting
the effort needed to produce them. They are the specification, the design, and the code.
The diagrams in these documents have a uniform syntax, such as labelled digraphs, data-flow
diagrams, or Z schemas. Since specification and design documents consist of text and
diagrams, their length can be measured in terms of a pair of numbers representing the text
length and the diagram length.
For these measurements, the atomic objects are to be defined for different types of
diagrams and symbols.
The atomic objects for data flow diagrams are processes, external entities, data stores,
and data flows. The atomic entities for algebraic specifications are sorts, functions,
operations, and axioms. The atomic entities for Z schemas are the various lines
appearing in the specification.
Code
Code can be produced in different ways, such as with procedural languages, object
orientation, or visual programming. The most commonly used traditional measure of source
code program length is the number of lines of code (LOC), i.e.,

LOC = NCLOC + CLOC

where NCLOC is the number of non-commented lines of code and CLOC is the number of
commented lines of code.
Apart from lines of code, the size and complexity measures suggested by Maurice Halstead
can also be used for measuring the length.

Halstead treats a program P as a collection of tokens, classified as either operators or
operands. Let μ1 be the number of unique operators, μ2 the number of unique operands,
N1 the total occurrences of operators, and N2 the total occurrences of operands.

The length of P is

N = N1 + N2

The vocabulary of P is

μ = μ1 + μ2

The volume of the program, i.e., the number of mental comparisons needed to write a
program of length N, is

V = N × log2(μ)

The program level is

L = V* / V

where V* is the potential volume, i.e., the volume of the minimal-size implementation of P.

The difficulty is

D = 1 / L

The estimated program level is

L' = 1 / D = (2 × μ2) / (μ1 × N2)

The effort required to generate P is

E = V / L' = (μ1 × N2 × N × log2(μ)) / (2 × μ2)
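A small Python sketch computing these measures from token counts; the counts below are invented, and a real tool would obtain μ1, μ2, N1, and N2 by lexing the source code.

import math

# Hypothetical token counts for a program P.
mu1, mu2 = 12, 20     # unique operators, unique operands
N1, N2 = 80, 120      # total operator and operand occurrences

N = N1 + N2                       # program length
mu = mu1 + mu2                    # vocabulary
V = N * math.log2(mu)             # volume
L_hat = (2 * mu2) / (mu1 * N2)    # estimated program level L'
D = 1 / L_hat                     # difficulty
E = V / L_hat                     # effort

print(f"N={N}, mu={mu}, V={V:.1f}, L'={L_hat:.4f}, D={D:.1f}, E={E:.0f}")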
Alternatively, program length can be expressed −
In terms of the number of bytes of computer storage required for the program text
In terms of the number of characters in the program text
Object-oriented development suggests new ways to measure length. Pfleeger et al. found
that a count of objects and methods led to more accurate productivity estimates than
those using lines of code.
Functionality
The amount of functionality inherent in a product gives the measure of product size.
There are many different methods to measure the functionality of software products.
We will discuss one such method, Albrecht’s Function Point method, below.
FP (Function Point) is the most widespread functional type metrics suitable for
quantifying a software application. It is based on five users identifiable logical "functions",
which are divided into two data function types and three transactional function types. For
a given software application, each of these elements is quantified and weighted, counting
its characteristic elements, such as file references or logical fields.
The resulting numbers (Unadjusted FP) are grouped into Added, Changed, or Deleted
functions sets, and combined with the Value Adjustment Factor (VAF) to obtain the final
number of FP. A distinct final formula is used for each count type: Application,
Development Project, or Enhancement Project.
Let us now understand how to apply the Albrecht’s Function Point method. Its procedure
is as follows −
Determine the number of components (EI, EO, EQ, ILF, and ELF)
For transactions (EI, EO, and EQ), the rating is based on the FTRs (file types referenced) and DETs (data element types).
For files (ILF and ELF), the rating is based on the RET and DET.
RET − The number of user-recognizable data element subgroups (record element types) within an ILF or ELF.
DET − The number of user-recognizable, non-repeated fields (data element types).
Based on the complexity ranking table, an ILF that contains 10 data elements and 5
fields would be ranked as high.
Rating Values

Rating      EO    EQ    EI    ILF    ELF
Low         4     3     3     7      5
Average     5     4     4     10     7
High        6     5     6     15     10
GSC 2 − Distributed data processing − How are distributed data and processing functions handled?
GSC 3 − Performance − Was response time or throughput required by the user?
GSC 4 − Heavily used configuration − How heavily used is the current hardware platform where the application will be executed?
GSC 6 − On-line data entry − What percentage of the information is entered online?
GSC 7 − End-user efficiency − Was the application designed for end-user efficiency?
GSC 8 − On-line update − How many ILFs are updated by online transactions?
GSC 9 − Complex processing − Does the application have extensive logical or mathematical processing?
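A Python sketch of the unadjusted count and the adjustment, using the weight table above; the component counts and GSC ratings are invented, and the commonly used formula VAF = 0.65 + 0.01 × (sum of the GSC ratings) is assumed here since the text does not spell it out.

# Complexity weights from the table above (Low / Average / High).
weights = {
    "EO":  {"Low": 4, "Average": 5, "High": 6},
    "EQ":  {"Low": 3, "Average": 4, "High": 5},
    "EI":  {"Low": 3, "Average": 4, "High": 6},
    "ILF": {"Low": 7, "Average": 10, "High": 15},
    "ELF": {"Low": 5, "Average": 7, "High": 10},
}

# Hypothetical component counts for an application, grouped by complexity.
counts = {
    "EI":  {"Low": 5, "Average": 4, "High": 2},
    "EO":  {"Low": 3, "Average": 3, "High": 1},
    "EQ":  {"Low": 4, "Average": 2, "High": 0},
    "ILF": {"Low": 2, "Average": 1, "High": 1},
    "ELF": {"Low": 1, "Average": 1, "High": 0},
}

# Unadjusted function points: weighted sum over all components.
ufp = sum(weights[c][rating] * n
          for c, by_rating in counts.items()
          for rating, n in by_rating.items())

# Hypothetical ratings (0-5) for the 14 general system characteristics.
gsc_ratings = [3, 2, 4, 3, 1, 5, 4, 3, 2, 1, 0, 2, 3, 1]
vaf = 0.65 + 0.01 * sum(gsc_ratings)   # assumed standard VAF formula

fp = ufp * vaf
print(f"UFP = {ufp}, VAF = {vaf:.2f}, FP = {fp:.1f}")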
Complexity
Measuring Complexity
One aspect of complexity is efficiency, which can be measured for any software product
that can be modeled as an algorithm.
Control flow is usually modeled with a directed graph, where each node or point
corresponds to a program statement, and each arc or directed edge indicates the flow of
control from one statement to another. These graphs are called control-flow graphs.
If ‘m’ is a structural measure defined in terms of the flow graph model, and if
program A is structurally more complex than program B, then the measure m(A) should
be greater than m(B).
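A Python sketch of a control-flow graph represented as a directed graph, together with one simple structural measure defined over it; the graph and the choice of measure are illustrative assumptions.

# Control-flow graph as an adjacency list: node -> statements control can flow to.
cfg = {
    "entry": ["s1"],
    "s1":    ["if1"],
    "if1":   ["s2", "s3"],   # a branch creates two outgoing edges
    "s2":    ["exit"],
    "s3":    ["exit"],
    "exit":  [],
}

nodes = len(cfg)
edges = sum(len(targets) for targets in cfg.values())

# One simple structural measure m: the number of decision points (nodes with
# more than one outgoing edge); a structurally more complex program scores higher.
m = sum(1 for targets in cfg.values() if len(targets) > 1)
print(f"nodes={nodes}, edges={edges}, decision points m={m}")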
Data flow or information flow can be intra-modular (flow of information within a module)
or inter-modular (flow of information between individual modules and the rest of the
system).
According to the way in which data is moving through the system, it can be classified into
the following −
Local direct flow − If either a module invokes a second module and passes
information to it or the invoked module returns a result to the caller.
Local indirect flow − If the invoked module returns information that is
subsequently passed to a second invoked module.
Global flow − If information flows from one module to another through a global
data structure.
Information flow complexity can be expressed, according to Henry and Kafura, as

Information flow complexity (M) = length (M) × (fan-in (M) × fan-out (M))^2
Where,
Fan-in (M) − The number of local flows that terminate at M + the number of data
structures from which the information is retrieved by M.
Fan–out (M) − The number of local flows that emanate from M + the number of
data structures that are updated by M.
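A Python sketch of this computation for a hypothetical module, assuming its length, local flows, and data-structure accesses are already known; the figures are invented.

def information_flow_complexity(length, local_flows_in, reads, local_flows_out, writes):
    # Henry-Kafura style complexity: length * (fan_in * fan_out)^2, where
    # fan_in  = local flows terminating at M + data structures M retrieves from
    # fan_out = local flows emanating from M + data structures M updates
    fan_in = local_flows_in + reads
    fan_out = local_flows_out + writes
    return length * (fan_in * fan_out) ** 2

# Hypothetical module: 120 lines, 3 incoming flows, reads 2 structures,
# 4 outgoing flows, updates 1 structure.
print(information_flow_complexity(120, 3, 2, 4, 1))   # 120 * (5 * 5)^2 = 75000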
Locally, the amount of structure in each data item can be measured. A graph-theoretic
approach can be used to analyze and measure the properties of individual data
structures. In this approach, simple data types such as integers, characters, and Booleans
are viewed as primes, and the various operations that enable us to build more complex data
structures are considered. Data structure measures can then be defined hierarchically in
terms of values for the primes and values associated with the various operations.