
14

Reliability Data Analysis

14.1 Introduction
This chapter gives an introduction to reliability data analysis, also known as
survival analysis and lifetime analysis. The dataset to be analyzed consists of
lifetimes that are measured from a starting time to an endpoint of interest. The
starting time is usually the time when the item is put into operation for the first
time, but may also be the time when we start observing the item. The endpoint of
interest is usually a failure event, sometimes restricted to a specific failure mode.
For many datasets, the data collection is stopped before all the items fail. This
means that some items are still in a functioning state when the data collection
stops. The recorded times for such items are therefore not times-to-failure, but
the time measured from the starting time until the data collection stopped. These
times are said to be censored times, and a dataset with one or more censored times
is called a censored dataset.
Some datasets include one or more explanatory variables such as pressure, temperature, flow rate, and vibration. These variables are called covariates and help explain why there are differences between the times-to-failure of the same type of items.
Reliability data analysis is a loosely defined term that encompasses a variety of statistical methods for analyzing positive-valued datasets. The methods presented in this chapter are also extensively used in biostatistics and medical research under the heading survival analysis.
Many books have been published on this topic, but to recommend one of them for further study is difficult and depends on your particular application.
To analyze reliability data, we need to use a suitable computer program. Many
programs are available, and it is difficult to claim that one is better than all the
others. In this book, we have chosen the program R because it covers most of the

techniques dealt with in this chapter, because it is used by many universities, and
because it is free software that can be run on all major computer platforms.
By searching the Internet for “survival analysis,” “survival models,” and similar terms, you find an almost endless number of presentations, lecture notes, and slides. Most of these are made for medical applications, but they are in most cases still relevant for the content of this chapter.

14.1.1 Purpose of the Chapter


The purpose of this chapter is to give an introduction to reliability data analysis
that can be understood based on the material presented in the previous chapters.
We focus on explaining the basic concepts and how the various methods can be
used and do not dig deeply into the theoretical problems. We do, however, give
references to where interested readers may find more extensive information. We
assume that the reader has installed the program R on an available computer and
has become a bit familiar with how it is used. We illustrate how R may be used
in the analyses and present simple R scripts, such that the reader may repeat the
analyses for other datasets. More complete R scripts may be found on the book
companion site.

14.2 Some Basic Concepts


Before discussing the various analysis methods, we need to introduce the main
terminology to be used.
Population. A population is a set of similar items or events that are of interest for some question or experiment. The population may, for example, be
● All the valves of the same type in a plant.
● All the mobile phones of a particular brand.
● All the brakes of the same type used in the railway rolling stock within a country.
Model. To study an aspect of a population, we define a random variable X that may
give us information about this aspect. To be able to use statistical methods in our
study, we establish a probabilistic model, M, related to the random variable X.
The model may be parametric, nonparametric, or semiparametric. As a starting
point, we assume that the model M is parametric with some parameter 𝜃. The
parameter is fixed but unknown, applies for the population, and is sometimes
called a population parameter.
If X is a discrete variable, the model is formulated by a conditional probability
mass function Pr(X = x ∣ 𝜃), where 𝜃 is fixed, but unknown. If X is a continuous
variable, the model is formulated by a probability density function f (x ∣ 𝜃).

Sample. To study the entire population is usually too time-consuming and expensive, and we therefore suffice by studying a sample from the population. A sample is a subset of a population, collected or selected by a defined sampling procedure. When the sampling is random, the sample is said to be a random sample. In reliability studies, the sample is not always random, and we have to make do with the sample that it is possible to get.
Experiment. To get information about the random variable X, we carry out independent and identical experiments on the n items in the sample. When the n experiments are completed, we have the dataset x1, x2, …, xn. Because of independence, the joint distribution of obtaining this dataset is, for a continuous variable,
$$f(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta). \tag{14.1}$$
The expression for a discrete variable is left to the reader.
Inference. Inference is a procedure to use information gathered from a sample to
make statements about the population from which the sample was taken. The
main concepts involved in statistical inference are shown in Figure 14.1.

[Figure: Population → (sampling) → Sample → (tests/measurements) → Data = {x1, x2, …, xn} → (analysis) → Results from sample → (inference) → Conclusions on the population.]
Figure 14.1 Main concepts of statistical inference.

14.2.1 Datasets
The starting point in this chapter is a dataset containing a random sample from a population of independent items. Throughout this chapter, we assume that the time-to-failure of an item is a nonnegative random variable T. In most applications, we assume that we observe n identical items with random times-to-failure T1, T2, …, Tn that are independent and identically distributed with distribution function FT(t) and probability density function fT(t). The corresponding observed sample survival times are denoted t1, t2, …, tn.



For many datasets, the observation of an item is stopped before the item has
failed, and we say that the time-to-failure is censored. There are many reasons for
censoring, including that the test equipment breaks down, the item is taken out of
service due to operational causes, or that the allocated test or observation period
is over. For each item, we assume that censoring occurs at time C that may be
deterministic or random.

14.2.2 Survival Times


When censoring is present, we cannot always observe the true time-to-failure T.
We only observe the survival time, the time until a failure or a censoring occurs, as
shown in Figure 14.2.
We may assume that two independent processes are competing to terminate item i, a failure process and a censoring process. With no censoring, the time-to-failure Ti would be observed, and with no failure, the censoring time Ci would be observed. With both processes active, we observe the minimum of Ti and Ci, that is min{Ti, Ci}, for i = 1, 2, …, n.
We still denote the dataset t1, t2, …, tn, but to each observation ti, we associate an indicator 𝛿i, defined by
$$\delta_i = \begin{cases} 1 & \text{if } t_i \text{ ends with a failure (i.e. } T_i < C_i) \\ 0 & \text{if } t_i \text{ ends with censoring (i.e. } T_i > C_i) \end{cases} \qquad \text{for } i = 1, 2, \ldots, n.$$
We call the indicator 𝛿i the status of survival time ti. The dataset therefore consists of n duplets (ti, 𝛿i), for i = 1, 2, …, n, telling how long the item survived and whether the observation stopped with a failure (F) or a censoring (C).
In this chapter, we assume that survival time ti is measured from when item i was new. In many practical applications, item i has a certain age ti(0) when the observation starts. Here, we assume that ti(0) = 0 for all i = 1, 2, …, n, and we further assume that all survival times can be shifted to a common starting point, without loss of information. This is shown in Figure 14.3.

[Figure: an item's true time-to-failure compared with its observed survival time, which ends at censoring.]
Figure 14.2 Time-to-failure and observed survival time.



[Figure: panel (a) shows n items with staggered starting times, each observation ending in F or C; panel (b) shows the same survival times shifted to a common starting point at time 0.]

Figure 14.3 An observed dataset (a), and the same dataset shifted to time 0 (b). F
denotes a failure and C denotes censoring.

Entering Survival Times into R


The dataset may be entered into R in several ways. The most common are: (i) as a spreadsheet file, (ii) as a comma-separated values (CSV) file, or (iii) manually as one or more vectors. For the last option, we enter the ordered survival times, for example

ti    17.88   28.92   33.00   41.52   42.12   45.60
𝛿i        1       0       1       1       1       0

We denote the vector of survival times survtime and the status vector status
and enter the following in the R script

survtime <- c(17.88,28.92,33.00,41.52,42.12,45.60)


status <- c(1,0,1,1,1,0)

For a dataset with many survival times, it may be wise to enter the data into a
spreadsheet program and save the file either as a CSV or as an Excel® file.1
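As a minimal sketch, a CSV file may be imported with the base R function read.csv; the file name mydata.csv and the column headers survtime and status below are only illustrative and not part of the book's examples:

mydata <- read.csv("mydata.csv",header=TRUE) # Hypothetical file name
survtime <- mydata$survtime # Column names assumed for illustration
status <- mydata$status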

The Survival R Package


The R package survival contains many of the survival analysis functions used in this chapter. How the package is loaded into your R session is illustrated by the following R script, which is based on the data in the R script above. Before running the script, you must have installed the package survival by the command install.packages("survival"). The new dataset is called my.surv and is prepared for further analysis by the function Surv.

library(survival) # Activate the package survival


survtime <- c(17.88,28.92,33.00,41.52,42.12,45.60)
status <- c(1,0,1,1,1,0)
# Arrange and give the dataset a name
my.surv <- Surv(survtime,status)
# Display the dataset my.surv
print(my.surv)

Running this script in R gives the output

> print(my.surv)
[1] 17.88 28.92+ 33.00 41.52 42.12 45.60+

Observe that + is added to the censored survival times. The + indicates that the time-to-failure would have been longer than the recorded value if the survival time had not been censored.

14.2.3 Categories of Censored Datasets


This section describes four main types of censoring and two subtypes.

Censoring of Type I
A life test of n numbered and identical items is carried out to gain information about the probability distribution of the time-to-failure T of the items. A specific time interval [0, 𝜏] has been allocated for the test. After the test, only the times-to-failure of those items that failed before 𝜏 are known.

1 Commands to import data files into R may be found by searching the Internet for “import data
into R.”

This type of censoring is called censoring of type I, and the information in the
dataset consists of s (≤ n) observed, ordered survival times

t(1) ≤ t(2) ≤ · · · ≤ t(s) .

In addition, we know that (n − s) items have survived the time 𝜏, and this information should also be used.
Because the number of items that fail before time 𝜏 obviously is random, there
is a chance that none or relatively few of the items will fail before 𝜏. This may be
a weakness of the test design.
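To see how such a dataset may arise, the following sketch simulates a type I censored test in R; the exponential model, the rate 0.01, and the test period 𝜏 = 100 are chosen only for illustration:

set.seed(123) # For reproducibility
n <- 20; tau <- 100
ttf <- rexp(n,rate=0.01) # True times-to-failure
survtime <- pmin(ttf,tau) # Observed survival times
status <- as.numeric(ttf <= tau) # 1 = failure before tau, 0 = censored

Because sum(status) is random, re-running the script with another seed may give few, or even zero, failures — the weakness mentioned above.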

Censoring of Type II
Consider the same life test as for censoring of type I, but assume that it has been
decided to continue the test until exactly r (< n) failures have occurred. The test
is therefore stopped when the rth failure occurs. This censoring is called censoring
of type II, and the dataset obtained from the test consists of

t(1) ≤ t(2) ≤ · · · ≤ t(r) ,

together with the fact that (n − r) items have survived the time t(r) .
In this case, the number r of recorded failures is not random. The price for obtaining this is that the time t(r) to complete the test is random. A weakness of this design is therefore that we cannot know beforehand how long the test will last.

Censoring of Type III


Type III censoring is a combination of the first two types. The test terminates at whichever occurs first: the time 𝜏 or the rth failure (both 𝜏 and r must be fixed before the test starts).

Censoring of Type IV
Consider a life test of n numbered identical items. Each item may either run to failure or be censored at a random time C. The time-to-failure T is, as before, assumed to have distribution function FT(t) and probability density function fT(t), whereas the censoring time C has distribution function FC(c) and probability density function fC(c). The two random variables T and C are assumed to be independent. The survival time we observe is therefore the minimum of T and C.
This censoring is called censoring of type IV and is sometimes also called random
censoring. Many of the datasets that are relevant for reliability studies have random
censoring, especially when the datasets originate from systems in operation.
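A similar sketch for random (type IV) censoring draws independent censoring times and records the minimum; both rates below are chosen only for illustration:

set.seed(123)
n <- 20
ttf <- rexp(n,rate=0.01) # Times-to-failure
cens <- rexp(n,rate=0.005) # Independent random censoring times
survtime <- pmin(ttf,cens) # We observe the minimum of T and C
status <- as.numeric(ttf <= cens) # 1 if the failure occurred first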

Right Censoring
Right censoring means that the item is removed from the study before a failure
occurs, or that the study of the item ends before the item fails. For all the examples
in this chapter, the censoring is right censoring.

Example 14.1 (Censoring caused by other failures)


Consider a plant where two independent items are located close to each other in a location that is difficult to access. When one of the items fails, both items are replaced or totally refurbished. In this situation, failure of one item leads to censoring of the other item. Because failures occur at random, this is an example of random censoring (i.e. of type IV).
The same censoring applies to a data collection where only a particular failure
mode A is of interest. If another failure mode occurs, and is repaired, before failure
mode A occurs, the time-to-failure of failure mode A is censored. In this case, we
often say that we have competing failure modes. ◻

Informative Censoring
All the examples discussed in this chapter assume that the censoring is noninformative. This means that the time-to-failure T is independent of the censoring mechanism. The censoring may also be informative, for example when an item is taken out of service because its level of performance is less than adequate, but without failing.

14.2.4 Field Data Collection Exercises


In field data collection exercises, such as for the OREDA project, survival times are collected from a certain time window (t1, t2). We may, for example, collect data for failure events that occurred between 1 January 2015 and 31 December 2019. At the beginning of the time window, at time t1, the items may have different ages ti(0), whereas some items may be installed during the time window, often as replacements for failed items. In many field data collection exercises, it is assumed that repair of a failed item brings it back to an as-good-as-new condition.
The resulting dataset is sometimes complicated and is best entered into a spreadsheet program, with the following columns:

Number   Item number (i)
Age      Age of item at the start of the data collection (ti(0))
Start    Starting time of observation (ti^start)
Stop     Time when observation terminated (ti^stop)
Status   Status at stop (1 = failed, 0 = censored)

An example of such a dataset is shown in Figure 14.4.



[Figure: ten items observed within the window (t1, t2); some are already in operation at t1, some are installed during the window, and each observation ends with F or C.]

Figure 14.4 Typical dataset for field data.
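A dataset of this form may, for example, be entered as a data frame, as in the sketch below. All numbers are made up for illustration, and the call Surv(start,stop,status) uses the counting-process notation of the survival package; whether this representation is appropriate depends on the assumptions made about repair and item ages:

library(survival)
field <- data.frame(
  number = 1:4, # Item number
  age    = c(120,0,350,40), # Age at start of data collection (made-up)
  start  = c(0,55,0,0), # Observation start within the window
  stop   = c(210,300,95,365), # Observation end within the window
  status = c(1,0,1,0)) # 1 = failed, 0 = censored
Surv(field$start,field$stop,field$status)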

14.2.5 At-Risk-Set
The at-risk-set at time t is the set of items that have not failed or been censored before time t, that is, the set of items that are at risk of failing at time t. When a (single) failure or censoring occurs, one item is removed from the at-risk-set, and when a new item enters the study, the at-risk-set is increased by one item. The number of items in the at-risk-set at time t is an important variable in several of the survival analysis methods presented in this chapter.
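With the survival package, the size of the at-risk-set just before each failure time can be read from a survfit object. A minimal sketch, reusing the six survival times from Section 14.2.2:

library(survival)
survtime <- c(17.88,28.92,33.00,41.52,42.12,45.60)
status <- c(1,0,1,1,1,0)
fit <- survfit(Surv(survtime,status)~1)
summary(fit)$n.risk # Number of items at risk at each failure time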

14.3 Exploratory Data Analysis

An exploratory data analysis (EDA) is an essential first step in any data analysis. The EDA gives a “first look at the data” before any modeling effort is done. An EDA has two main parts: (i) calculation of a selection of sample statistics, such as the mean, median, and standard deviation, and (ii) data visualization in the form of histograms, empirical distribution functions, Q–Q plots, and so on.
EDA helps the analyst to understand the underlying structure of the data, to identify anomalies and outliers in the dataset, and to assess the assumptions made about the data. The examination of the data helps us see what the data can tell us. EDA gained increased importance following the publication of John W. Tukey’s seminal book Exploratory Data Analysis (Tukey 1977).

14.3.1 A Complete Dataset


The starting point of an EDA is a specific dataset. This section assumes that we have a complete dataset t1, t2, …, tn where all survival times are times-to-failure. This means that the status is 𝛿i = 1 for all items i = 1, 2, …, n, and that we do not need to enter the status into R. All the n entries in the dataset are assumed to be correct observations of a common variable. Many analytical methods require the dataset to be sorted in ascending order. A sorted dataset is also called an ordered dataset and is written as t(1), t(2), …, t(n), such that t(1) ≤ t(2) ≤ · · · ≤ t(n).
As an illustration, we use the complete and ordered dataset of 22 observed values in Table 14.1. We call the dataset survtime, and Table 14.1 shows the most direct way of entering the dataset into R by using the terminal.2
How the data is recorded in R is seen by launching the command print
(survtime) in the terminal. The result is:

[1] 17.88 28.92 33.00 41.52 42.12 45.60 48.40 51.84

[9] 51.96 54.12 55.56 67.80 68.64 68.64 68.88 84.12

[17] 93.12 98.64 105.12 105.84 127.92 138.04

If the dataset survtime were entered into R as an unordered set, it may be


ordered by the function sort(survtime).

Remark 14.1 (Advice)


We will illustrate several methods by using the survtime dataset. If you want to test our examples or play with R, it may be wise to set up a textfile containing the data. The simplest way is to write one datapoint per line (with a “full stop” as decimal point) and save the file as, for example, dataset.txt in your R working directory.3 You may read the textfile into R and activate the dataset survtime by the command survtime<-read.table("dataset.txt",header=F,dec=".")

Table 14.1 A complete and ordered dataset of survival times.

survtime <- c(17.88,28.92,33,41.52,42.12,45.6,48.4,51.84,


51.96,54.12,55.56,67.8,68.64,68.64,68.88,84.12,
93.12,98.64,105.12,105.84,127.92,138.04)

2 The terminal is called the Console in RStudio.


3 After having created the working directory, you may check the path by the command
getwd().

Instead of a textfile, you may alternatively create an Excel file or a CSV file (but
this requires another command to activate the data). ◻

Ties
Two or more continuously distributed survival times may sometimes be recorded
with the same value. This is called a tie and may be caused by common-cause
failures or by rounding-off. The dataset in Table 14.1 has a tie for 68.64, because
two survival times are recorded with the same value. The number of failures that
occur at time t(i) is called the multiplicity of the tie and is denoted by di . The dataset
may therefore be recorded in two different ways:
(1) The ordered dataset may be recorded as t(1) ≤ t(2) ≤ · · · ≤ t(n) with a survival time for each item, thus realizing that some of the survival times may be equal.
(2) The ordered dataset may be recorded as nt ≤ n distinct survival times t(1) < t(2) < · · · < t(nt), associated with a vector giving the multiplicities of failures d1, d2, …, dnt (see the sketch below).
Most of the following sections use option 1.
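A minimal sketch of option 2 in R, assuming the vector survtime from Table 14.1 has been entered; the base R function table counts how many times each distinct value occurs:

d <- table(survtime) # Multiplicities of the distinct values
distinct.times <- as.numeric(names(d)) # The nt distinct survival times
multiplicities <- as.vector(d) # d1, d2, ..., dnt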

14.3.2 Sample Metrics


Valuable information about the dataset can be obtained by applying sample metrics to the dataset. This section defines and shows how to calculate a number of these metrics.

Mean
The mean of a dataset is a measure of the central location of the data values and is calculated as the sum of the data values divided by the number n of data values,
$$\bar{t} = \frac{1}{n} \sum_{i=1}^{n} t_i. \tag{14.2}$$
The R command to obtain the mean of the dataset is mean(survtime), and for the data in Table 14.1, we obtain t̄ = 68.08.

Median
The median tm of a dataset is the value at the middle of the ordered data. For an odd number of values (i.e. n = 2k + 1), the median is the (k + 1)th smallest value in the ordered dataset, that is t(k+1). For an even number of values (i.e. n = 2k), the median is the average of the two values in the middle of the ordered dataset, that is, the average of t(k) and t(k+1). If, for example, the sorted dataset has the six values 2, 4, 5, 7, 8, 10, the median is (5 + 7)/2 = 6. A more formal definition is given in (14.3).
$$t_m = \begin{cases} t_{(k+1)} & \text{for } n = 2k + 1 \\ \dfrac{t_{(k)} + t_{(k+1)}}{2} & \text{for } n = 2k \end{cases}. \tag{14.3}$$

The ordered dataset in Table 14.1 has n = 22 values, and the median is therefore the average of t(11) and t(12),
$$\text{Median} = \frac{t_{(11)} + t_{(12)}}{2} = 61.68.$$
The same result is obtained by the R function median(survtime). Observe that the mean value is larger than the median for this dataset.
A simple summary, including the mean and the median, of the dataset survtime is obtained by the command summary(survtime). If you have created the textfile dataset.txt as recommended in Remark 14.1, you may use the script

survtime <-read.table("dataset.txt",header=F,dec=".")
summary(survtime)

and obtain
Min. : 17.88
1st Qu.: 46.30
Median : 61.68
Mean : 68.08
3rd Qu.: 90.87
Max. :138.04
Quartiles (Qu.) are introduced below.

Variance and Standard Deviation


The variance is a measure of how the data values are dispersed around the mean and is calculated as
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (t_i - \bar{t})^2. \tag{14.4}$$
The standard deviation is the square root of the variance
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (t_i - \bar{t})^2}. \tag{14.5}$$

The R commands to obtain the variance and the standard deviation of the dataset are var(survtime) and sd(survtime), respectively. Observe that the standard deviation is measured with the same unit as the data values, whereas the variance is measured with “squared units.” For the dataset in Table 14.1, the (sample) standard deviation obtained by sd(survtime) is 32.01.

Quantiles
For p ∈ (0, 1), the quantile of order p of the distribution FT(t) is the value tp such that FT(tp) = p, which means that Pr(T ≤ tp) = p. For realistic life distributions, tp is unique.
Now, consider an ordered dataset t(1) ≤ t(2) ≤ · · · ≤ t(n). The (sample) quantile of order p may approximately be calculated as t([np]+1), where [np] is the largest integer < np. The dataset in Table 14.1 has n = 22 values. To determine the (sample) quantile of, say, order p = 0.15, we first calculate np = 22 ⋅ 0.15 = 3.3. The largest integer less than np is 3, and the quantile of order 0.15 is therefore t(4) = 41.52.
The (sample) quantile of order p is available in R by the function quantile(survtime,p), which gives 41.61. This is not exactly the same result as we got by hand calculation, because R applies a more elaborate and “correct” formula based on interpolation of the ordered survival times. Interested readers may check the help file in R, help(quantile).
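The quantile function supports several calculation types (see help(quantile)). Type 1, the inverse of the empirical distribution function, reproduces the hand calculation above:

quantile(survtime,0.15) # Default (type 7, interpolated): 41.61
quantile(survtime,0.15,type=1) # Inverse empirical cdf: 41.52 = t(4)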

Quartiles
The quantiles of order 0.25 and 0.75 are called the lower and upper quartiles,
respectively. The lower quartile (or 1st quartile), t0.25 is the value that cuts off the
first 25% of the ordered dataset, and the upper quartile (or 3rd quartile) t0.75 is the
value that cuts off the first 75% of the ordered dataset. Both are provided by the
command summary(survtime).

Interquartile Range
The distance between the upper and the lower quartile, t0.75 − t0.25, called the interquartile range, is a common measure for the dispersion of the dataset around its mean or median. The interquartile range for the dataset in Table 14.1 is determined by quantile(survtime,0.75)-quantile(survtime,0.25), and the result is 44.57.

Sample Moments and Central Moments


The kth (noncentral) sample moment for the dataset t1, t2, …, tn is defined as
$$m_{k,nc} = \frac{1}{n} \sum_{i=1}^{n} t_i^k. \tag{14.6}$$
We observe that the first sample moment $m_{1,nc} = \frac{1}{n}\sum_{i=1}^{n} t_i = \bar{t}$ is the average value (i.e. mean) of the dataset.
The kth (k ≥ 2) central sample moment is centered around the average value of the dataset and is defined as
$$m_{k,c} = \frac{1}{n} \sum_{i=1}^{n} (t_i - \bar{t})^k. \tag{14.7}$$

Moments are available in the R package moments, which must be installed before you can use it.

library(moments)
survtime <- read.table("dataset.txt",header=F,dec=".")$V1
k <- 3 # Choose the order of the moment
moment(survtime,order=k,central=F) # Noncentral moment
moment(survtime,order=k,central=T) # Central moment

The result for the dataset in Table 14.1 is

Order (k)   Noncentral moment   Central moment
2           5612.752            978.3606
3           534022.7            18720.53

Skewness
Skewness is a measure of the asymmetry of the dataset. The skewness value can be positive or negative. When the distribution of the values of the dataset is symmetric, the skewness is zero. When the values are predominantly large (but with some small values), the skewness is negative, and when the values are predominantly small (with some large values), the skewness is positive.
The skewness 𝛾1 is defined by
$$\gamma_1 = \frac{m_{3,c}}{m_{2,c}^{3/2}}, \tag{14.8}$$
where mk,c is the kth central sample moment of the dataset. The skewness 𝛾1 is available in the R package moments by the command skewness(survtime). The result is 0.6117442, which indicates that the dataset is slightly skewed to the right (toward large values).

Kurtosis
The kurtosis describes the shape of the tails of the distribution of the values in the dataset. The normal distribution has zero excess kurtosis. Negative kurtosis indicates a thin tail of the distribution, and positive kurtosis indicates a thicker tail.
The kurtosis 𝛾2 is defined as
$$\gamma_2 = \frac{m_{4,c}}{m_{2,c}^{2}} - 3, \tag{14.9}$$
where mk,c is the kth central sample moment of the dataset. A kurtosis value is available in the R package moments by the command kurtosis(survtime), and the result for the dataset in Table 14.1 is 2.555003. Observe that this function does not subtract 3, so the normal distribution corresponds to the value 3, and the excess kurtosis 𝛾2 of the dataset is approximately −0.445.

14.3.3 Histogram
A histogram consists of parallel bars that graphically show the frequency distribution of a variable. As a default, all the bars have the same width. We may choose the number of bars to display. We may also choose whether to (i) show the number of values in the dataset that fall into the interval corresponding to the width of the bar, or (ii) show the relative number (or percentage) of the values that fall into the interval. With option (ii), the histogram is said to show the relative frequency distribution or the distribution density of the values in the dataset.
The information obtained from the histogram depends on the resolution, that is, how many intervals we choose. Figure 14.5 shows three different histograms of the data in Table 14.1, with different numbers of columns.
[Figure: three density histograms of the times-to-failure at different resolutions.]
Figure 14.5 Histogram of the dataset in Table 14.1 with different numbers of columns: (a) 3 columns, (b) 7 columns, and (c) 26 columns.

It is not always true that a higher resolution makes it easier to understand the distribution of the data.
Histograms are established by the R script

survtime <- read.table("dataset.txt",header=F,dec=".")
hist(survtime$V1,breaks=3,freq=F) # Plot the histogram

This script, with freq=F, provides a histogram according to option (ii), showing the distribution density. A frequency histogram [option (i)], showing counts, is obtained by replacing freq=F with freq=T, where F is an abbreviation for false and T is an abbreviation for true.
The reader is encouraged to run the script with different values for breaks.

14.3.4 Density Plot


The distribution of the dataset can also be illustrated by a sample density plot, by using the R script:

survtime <- read.table("dataset.txt",header=F,dec=".")
d <- density(survtime$V1) # Estimate the density from the data
plot(d) # Plot the result

The resulting plot is shown in Figure 14.6. The plot is made by an averaging technique and is based on a set of input parameters. The current plot is made with the default parameters of the density command. Other parameters and other averaging techniques may be chosen. Interested readers may consult the help file, by the command help(density) in the R terminal (console).

[Figure: kernel density estimate of the times-to-failure; vertical axis shows sample density.]
Figure 14.6 A sample density plot of the dataset in Table 14.1.

14.3.5 Empirical Survivor Function


The survivor function R(t) = Pr(T > t) is the probability that an item from the population will still be functioning at time t. When a complete dataset is available, the survivor function may be estimated by the empirical survivor function Rn(t)
$$R_n(t) = \frac{\text{Number of items with survival time} > t}{n}. \tag{14.10}$$
Rn(t) is from (14.10) seen to be the relative frequency of items that survive time t and is therefore an obvious estimate for R(t).
The estimate Rn(t) changes value only at the failure times t(i). Between two failure times, such as in an interval t(i) ≤ t < t(i+1), the number of failures does not change, and Rn(t) remains constant. Observe that Rn(t) is reduced by 1/n each time a failure occurs. If more than one failure occurs at the same failure time t and the tie has multiplicity d, Rn(t) is reduced by d/n.
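For a complete dataset entered as a vector, such as the one in Table 14.1, the empirical survivor function (14.10) may be computed directly in base R; a minimal sketch:

Rn <- function(t) mean(survtime > t) # Fraction of items surviving time t
sapply(c(40,60,100),Rn) # Gives 0.864 0.500 0.182 for Table 14.1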
Consider a sample of n items from a population and let N(t) be the number of these items that survive time t. We may consider this as a binomial experiment with n independent trials and probability R(t) of survival, and write the estimator for R(t) as4
$$\hat{R}(t) = \frac{N(t)}{n}. \tag{14.11}$$
The random variable N(t) has a binomial distribution with probability mass function
$$\Pr(N(t) = m) = \binom{n}{m} R(t)^m \left[1 - R(t)\right]^{n-m} \quad \text{for } m = 0, 1, \ldots, n,$$
with mean and variance
$$E[N(t)] = nR(t), \qquad \text{var}[N(t)] = nR(t)[1 - R(t)].$$
The mean of the estimator is $E[\hat{R}(t)] = nR(t)/n = R(t)$, and the estimator is therefore unbiased. The variance of the estimator is
$$\text{var}[\hat{R}(t)] = \frac{\text{var}[N(t)]}{n^2} = \frac{R(t)[1 - R(t)]}{n} \longrightarrow 0 \quad \text{as } n \to \infty.$$

The plot of Rn (t), as shown in Figure 14.7, for the dataset in Table 14.1 is also
called a survival curve and can be made with R using several different packages.

4 Estimators are discussed in Section 14.4.



[Figure: step plot of the empirical survivor function, decreasing from survival probability 1.0 at time 0.]

Figure 14.7 Empirical survivor function (survival curve) for the dataset in Table 14.1.

The authors prefer the package survival, and the survival curve is obtained by the script:

library(survival)
survtime <- read.table("dataset.txt",header=F,dec=".")$V1
# Prepare the data and calculate required values
survfunct <- survfit(Surv(survtime)~1,conf.type="none")
plot(survfunct,xlab="Time t",ylab="Survival probability")

A 95% pointwise confidence interval is obtained by replacing conf.type="none" with conf.type="plain" in the script above. The plot obtained is shown in Figure 14.8.

[Figure: the same survival curve with stepwise 95% pointwise confidence limits.]

Figure 14.8 Empirical survivor function (survival curve) for the dataset in Table 14.1
with 95% confidence intervals.

14.3.6 Q–Q Plot


A Q–Q plot compares the quantiles of the dataset with the quantiles of a specified probability distribution F(t). The plot is constructed by plotting the kth smallest observation out of n against the expected value of the kth smallest observation out of n from a random sample from F(t). To construct such a plot by hand calculation is time-consuming, and we need a computer program. A Q–Q plot for the normal distribution N(0, 1) is available in R by the function qqnorm.
If the observations are approximately normally distributed, a normal Q–Q plot of the observations results in an approximately straight line. The R script to produce the Q–Q plot for the dataset survtime in Table 14.1 is

survtime<-read.table("dataset.txt",header=F,dec=".")
x<-survtime$V1
qqnorm(x)
qqline(x)

The resulting plot for the dataset in Table 14.1 is shown in Figure 14.9. The Q–Q
plot shows a fairly good fit to the normal distribution for this particular dataset.
If we consider the fit to the normal distribution to be acceptable, the parameters
of the normal distribution can be estimated from the slope and intercept of the
straight line. This is not pursued any further here.5
Q–Q plots for general distributions may be obtained in R by using the function qqplot. To use this function, we need to compare our dataset with a simulated dataset from the distribution we want to compare our data with. Assume that we want to compare the data in Table 14.1 with the exponential distribution with rate 𝜆 = 1. A random sample of size, say, 300 from the exponential distribution is generated in R by the function rexp(300,rate=1). The Q–Q plot comparing the data in Table 14.1 with the exponential distribution is obtained by the script

survtime <-read.table("dataset.txt",header=F,dec=".")
y<- survtime$V1
qqplot(rexp(300,rate=1),y)

The Q–Q plot produced by this script is shown in Figure 14.10.

Because the data from the exponential distribution are simulated, you do not get exactly the same figure when re-running the script. The exponential Q–Q plot in Figure 14.10 is rather far from a straight line, and we may therefore conclude that the data probably do not come from the exponential distribution.
5 See, for example, https://en.wikipedia.org/wiki/Q-Q_plot.

[Figure: theoretical quantiles versus survtime quantiles; the points follow the reference line fairly closely.]
Figure 14.9 Normal Q–Q plot for the dataset in Table 14.1, made with the R function qqnorm.

[Figure: simulated exponential quantiles versus y; the points bend away from a straight line.]
Figure 14.10 Exponential Q–Q plot for the dataset in Table 14.1, made with the R function qqplot.

14.4 Parameter Estimation

Probability distributions usually have one or more quantities that we call parameters. Examples of parameters include 𝜆 in the exponential distribution exp(𝜆) and the mean 𝜇 and the standard deviation 𝜎 in the normal distribution N(𝜇, 𝜎²).

Parameters are generally – at least partly – unknown and cannot be measured directly.
This chapter deals with a population of similar items, and we establish a probabilistic model for a typical item in the population. The parameters of this model are therefore called population parameters. To get information about population parameters, we take a random sample of a certain number (n) of elements from the population and measure some properties of each element. We then obtain a dataset {t1, t2, …, tn}. Each measurement may be a scalar or a vector of values. This process is shown in Figure 14.1.
Parameter estimation is the process of obtaining information about the parameter(s), based on the dataset. As part of this process, we have to answer questions such as: (i) Which measurable properties of the sample elements shall be measured? (ii) How shall we combine these measurements to provide information about the population parameters? (iii) How accurate is this information? This section sheds some light on the parameter estimation process, but first, we need some terminology.

14.4.1 Estimators and Estimates


An estimator of a parameter 𝜃 is a statistic (i.e. a random variable) that is often denoted 𝜃̂. An estimator 𝜃̂ may be considered a metric into which observed data can be input to calculate an estimate (i.e. a numeric value) for 𝜃. This estimator is sometimes called a point estimator, and the corresponding estimate is called a point estimate for 𝜃.
We may also talk about interval estimators and interval estimates for 𝜃. For interval estimators, a probability, often called a confidence level, is specified as the probability that the interval contains the “true” value of the parameter. The interval is also called a confidence interval for 𝜃.

14.4.2 Properties of Estimators


An estimator 𝜃̂ may be judged by the following features:

Unbiased
An estimator 𝜃̂ is said to be an unbiased (point) estimator for 𝜃 if its expected value is equal to the parameter, that is, if E(𝜃̂) = 𝜃. An unbiased estimator will not systematically overestimate or underestimate the “true” parameter.
An estimator that is not unbiased is said to be biased. The bias is calculated as
$$b_n(\hat{\theta}) = E(\hat{\theta}) - \theta.$$
An estimator 𝜃̂ is said to be asymptotically unbiased if $\lim_{n\to\infty} b_n(\hat{\theta}) = 0$.

Small Variance
The estimator 𝜃̂ should preferably have a small spread or variability, that is, a small
variance and standard deviation.

Mean Squared Error
The mean squared error (MSE) of the estimator 𝜃̂ for the parameter 𝜃 is defined as
$$\text{MSE}(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = [b_n(\hat{\theta})]^2 + \text{var}(\hat{\theta}). \tag{14.12}$$
The estimator 𝜃̂ is said to be efficient if it has the smallest MSE among all competing estimators.

Consistency
An estimator 𝜃̂ is said to be a consistent (point) estimator for 𝜃 if 𝜃̂ → 𝜃 when the sample size n increases. More formally, we say that the estimator 𝜃̂ is consistent if we for all 𝜀 > 0 have that
$$\Pr(|\hat{\theta} - \theta| > \varepsilon) \to 0 \quad \text{when } n \to \infty. \tag{14.13}$$
This means that the distribution of 𝜃̂ becomes more and more concentrated around the “true” value of 𝜃 as the sample size increases.

Chebyshev’s Inequality
Chebyshev6 showed that for all 𝜀 > 0
$$\Pr(|\hat{\theta} - \theta| \geq \varepsilon) \leq \frac{E(\hat{\theta} - \theta)^2}{\varepsilon^2} = \frac{\text{MSE}(\hat{\theta})}{\varepsilon^2}. \tag{14.14}$$
If we can prove that the MSE of 𝜃̂ tends to 0 when n → ∞, then 𝜃̂ is consistent.
Estimator properties are illustrated in the following example.

Example 14.2 (Binomial model)


Consider a sequence of n independent and identically distributed Bernoulli trials with probability p for a specific outcome A. Let X be the number of trials that result in the outcome A. The random variable X is then binomially distributed, binom(n, p),
$$\Pr(X = x \mid p) = \binom{n}{x} p^x (1-p)^{n-x} \quad \text{for } x = 0, 1, \ldots, n.$$
The mean and variance of X are
$$E(X) = np, \qquad \text{var}(X) = np(1-p).$$

6 Named after the Russian mathematician Pafnuty Lvovich Chebyshev (1821–1894).



It may be natural to estimate p as the relative frequency of the outcomes that result in A, and a natural estimator is therefore
$$\hat{p} = \frac{X}{n}. \tag{14.15}$$
This estimator is seen to be unbiased, because
$$E(\hat{p}) = \frac{E(X)}{n} = p.$$
The estimator p̂ is consistent because it is unbiased and
$$\text{var}(\hat{p}) = \frac{\text{var}(X)}{n^2} = \frac{np(1-p)}{n^2} \to 0 \quad \text{when } n \to \infty.$$
If we, for example, carry out n = 50 independent Bernoulli trials and get x = 3 outcomes A, we may put this dataset into the estimator and obtain the point estimate p̂ = 3/50 = 0.06. Again, observe that the estimator p̂ is a random variable, whereas the estimate is a numerical value. ◻

Remark 14.2 (Confusing symbols)


Observe that it may be confusing to use the same symbol (here p̂) for both the estimator and the estimate. The same confusion is found in almost all relevant textbooks and papers. ◻

To find adequate parameter estimators, we may use some general approaches or


methods. In this book, we suffice by describing three popular methods for point
estimation:
(1) Method of moments estimation (MME)
(2) Maximum likelihood estimation (MLE)
(3) Bayesian estimation, which is treated in Chapter 15.

14.4.3 Method of Moments Estimation


Consider a random variable T. The first population moment of T is the same as the mean value E(T), and the kth (noncentral) population moment is E(T^k) (if this mean value exists).
MME is based on the assumption that the sample moments are good estimates of the corresponding population moments. Assume that we have a sample T1, T2, …, Tn from a distribution F(t ∣ 𝜽), where the parameter vector is 𝜽 = (𝜃1, 𝜃2, …, 𝜃k). The procedure to determine the MME of the parameters has three steps.
(1) Find the k first noncentral population moments 𝜇1,nc, 𝜇2,nc, …, 𝜇k,nc. Each moment will contain one or more of the parameters 𝜃1, 𝜃2, …, 𝜃k.

(2) Find the k first noncentral sample moments m1,nc, m2,nc, …, mk,nc.
(3) From the system of equations 𝜇i,nc = mi,nc, for i = 1, 2, …, k, solve for the parameters 𝜽 = (𝜃1, 𝜃2, …, 𝜃k). The solution is the MME 𝜽̂ = (𝜃̂1, 𝜃̂2, …, 𝜃̂k).
Recall that we – from the law of large numbers – know that the first sample moment converges to the first population moment (i.e. the population mean),
$$m_{1,nc} \to \mu_{1,nc}, \quad \text{that is,} \quad \frac{1}{n}\sum_{i=1}^{n} T_i \to E(T) \quad \text{for } n \to \infty, \tag{14.16}$$
but we do not know much about the higher moments (i.e. for k ≥ 2).
We illustrate the MME procedure by two examples.

Example 14.3 (Exponential distribution)


We observe the times-to-failure of n similar items. The times-to-failure are denoted by T1, T2, …, Tn, and we assume that they are independent and identically distributed with constant failure rate 𝜆, such that Ti ∼ exp(𝜆), for i = 1, 2, …, n. In this case, we have only one unknown parameter to estimate, and it suffices to consider only the first (population) moment E(T) = 1/𝜆. The first sample moment is given by the metric $\bar{T} = \frac{1}{n}\sum_{i=1}^{n} T_i$. The method of moments estimator for the parameter 𝜆 is therefore determined from $E(T) = \bar{T}$,
$$\frac{1}{\lambda} = \frac{1}{n}\sum_{i=1}^{n} T_i.$$
Solving for 𝜆, we obtain the MME
$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} T_i}.$$
Assume that we have a complete dataset with n = 8 items that have run to failure and with a total time in operation $\sum_{i=1}^{8} t_i = 25\,800$ hours. The MME (estimate) of the failure rate 𝜆 with this dataset is then
$$\hat{\lambda} = \frac{8}{25\,800} \text{ h}^{-1} \approx 3.10 \times 10^{-4} \text{ h}^{-1}. \qquad ◻$$

Example 14.4 (Gamma distribution)


Let T1, T2, …, Tn be a random sample of n independent gamma distributed random variables, such that Ti ∼ gamma(𝛼, 𝜆), for i = 1, 2, …, n. The first population moment (i.e. the mean) is from Chapter 5, 𝜇1 = E(Ti) = 𝛼/𝜆. The variance of Ti is
$$\text{var}(T_i) = E(T_i^2) - [E(T_i)]^2 = \frac{\alpha}{\lambda^2}.$$
The second population moment is therefore
$$\mu_2 = E(T_i^2) = \text{var}(T_i) + [E(T_i)]^2 = \frac{\alpha}{\lambda^2} + \left(\frac{\alpha}{\lambda}\right)^2 = \frac{\alpha(\alpha+1)}{\lambda^2}.$$
Setting the first two population moments equal to the first two sample moments yields
$$\frac{\alpha}{\lambda} = \frac{1}{n}\sum_{i=1}^{n} T_i, \qquad \frac{\alpha(\alpha+1)}{\lambda^2} = \frac{1}{n}\sum_{i=1}^{n} T_i^2.$$
We may now solve the two equations to obtain
$$\hat{\lambda} = \frac{\frac{1}{n}\sum_{i=1}^{n} T_i}{\frac{1}{n}\sum_{i=1}^{n} T_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} T_i\right)^2}$$
and
$$\hat{\alpha} = \hat{\lambda}\, \frac{1}{n}\sum_{i=1}^{n} T_i = \frac{\left(\frac{1}{n}\sum_{i=1}^{n} T_i\right)^2}{\frac{1}{n}\sum_{i=1}^{n} T_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} T_i\right)^2}.$$

By using the dataset in Table 14.1, we may use the following R script to find the estimates of 𝛼 and 𝜆.

survtime <- read.table("dataset.txt",header=F,dec=".")$V1
a <- mean(survtime) # First sample moment
b <- mean(survtime^2) # Second sample moment
lambda <- a/(b-a^2)
print(lambda)
alpha <- lambda*a
print(alpha)

This gives the estimates 𝛼̂ ≈ 4.737 and 𝜆̂ ≈ 0.0696. ◻

General Properties of the MME


MMEs have a number of positive and negative properties. We suffice by listing
some of these properties, without any proofs:
(1) The MMEs are easy to compute and will always work. The method provides
estimators when other methods fail to do so or when estimators are hard to
obtain.
(2) The MMEs are consistent.
(3) The MMEs may not be unique.
(4) MMEs are usually not the “best estimators” (i.e. most efficient).

(5) The minimum number of moments we need equals the number of unknown
parameters.
(6) Sometimes, the MMEs may be meaningless.

14.4.4 Maximum Likelihood Estimation


The method of maximum likelihood was first introduced in 1922 by the British
statistician and geneticist Ronald Aylmer Fischer (1890–1962) and has since been a
commonly used method for estimating parameters. With this method, the param-
eters are estimated by the values that maximize the likelihood function. Before
going into further detail, we need to introduce the likelihood function.

Likelihood Function
We start with the likelihood function for a discrete, binomial model. This model is based on a random variable X with probability mass function
$$\Pr(X = x \mid p) = \binom{n}{x} p^x (1-p)^{n-x} \quad \text{for } x = 0, 1, 2, \ldots, n. \tag{14.17}$$
In the classical setup, the parameter p has a deterministic but unknown value. In (14.17), the unknown parameter p is made visible in the probability mass function Pr(X = x ∣ p) to highlight that the probability is also a function of p. The number n of trials is considered to be a known number and therefore not a parameter.
Assume that the experiment has been carried out and the data have been recorded. The data may, for example, be n = 10 and x = 3. We may now wonder which value of p produced this particular result. To shed light on this problem, we calculate the probability of obtaining X = 3 for different values of p. The probabilities may be calculated by using the function dbinom(3,size=10,prob=p) in R.

p               0.1      0.2     0.3     0.4     0.5     0.6      0.7
Pr(X = 3 ∣ p)   0.0574   0.201   0.267   0.215   0.117   0.0425   0.009

The probabilities for p = 0.8 and p = 0.9 are very small and not included in this table. Observe that with these p-values, the probability Pr(X = x ∣ p) is largest for p = 0.3, which means that p = 0.3 is the most likely probability to have produced X = 3.
Consider the probability Pr(X = 3 ∣ p) as a function of p,
$$L(p \mid 3) = \binom{10}{3} p^3 (1-p)^7 \quad \text{for } 0 \leq p \leq 1. \tag{14.18}$$
We use the symbol L(p ∣ 3), because it seems natural to call this function the likelihood function of p for the observed data. It tells how likely it is that a particular value of p has produced the observed result.

[Figure: the likelihood L(p ∣ 3) plotted against p, peaking at approximately 0.267 for p = 0.3.]
Figure 14.11 Likelihood function for the binomial distribution (n = 10 and x = 3).

The likelihood function for the observed values n = 10 and x = 3 is shown in Figure 14.11 as a function of p, and we observe that the most likely p-value to have produced x = 3 is p = 0.3.
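The curve in Figure 14.11 may be reproduced with a few lines of base R; the grid step of 0.001 is arbitrary:

p <- seq(0,1,by=0.001)
L <- dbinom(3,size=10,prob=p) # Likelihood of each p for x = 3
plot(p,L,type="l",xlab="p",ylab="Likelihood")
abline(v=0.3,lty=2) # Maximum at p = x/n = 0.3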

Remark 14.3 (The likelihood function is not a probability distribution)


We should observe that L(p ∣ 3) is not a probability distribution for p, because
$$\int_0^1 L(p \mid 3)\, dp = \int_0^1 \binom{10}{3} p^3 (1-p)^7\, dp = \binom{10}{3} B(4, 8) = \frac{1}{11} \approx 0.09 \neq 1,$$
where B(a, b) is the beta function that can be written as
$$B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.$$
The beta function B(a, b) is available in R by the function beta(a,b). A factorial, such as 7!, is calculated in R by the function factorial(7). ◻

Maximum Likelihood Estimate


As indicated above, the parameter value that maximizes the likelihood function
for some observed data should be a good estimate for that parameter. This value is
called the maximum likelihood estimate of the parameter.
To provide a general definition of the maximum likelihood estimate, we have to
start with a model for the observed data, f (data ∣ 𝜃). This model can be a probability
density function or a probability mass function depending on whether the model
is continuous or discrete. The parameter 𝜃 may be one-dimensional or a vector of
parameters. On this background, the maximum likelihood estimate is defined as
follows.

Definition 14.1 (Maximum likelihood estimate, MLE)


̂ is the value of the parameter 𝜃 that maximizes the likelihood function
The MLE, 𝜃,
with respect to 𝜃. That is,

L(𝜃̂ ∣ data) = max L(𝜃 ∣ data),


𝜃

where the maximum is taken over all possible values of the parameter 𝜃. ◻

More formally, the MLE 𝜃̂ may be written as
$$\hat{\theta} = \arg\max_{\theta} L(\theta \mid \text{data}), \tag{14.19}$$

which means that 𝜃̂ is the value (the argument) 𝜃 that maximizes L(𝜃 ∣ data). The
MLE hence is the answer to the question: What value of the parameter 𝜃 makes
the data most likely to occur?
In many applications, the natural logarithm of the likelihood function is more convenient to work with. Because the logarithm log(⋅) is a monotonically increasing function, the logarithm of L(𝜃 ∣ data) attains its maximum value at the same point as L(𝜃 ∣ data), and therefore the log-likelihood function can be used instead of the likelihood function to obtain the MLE.
The log-likelihood function is written as

𝓁(𝜃 ∣ data) = log L(𝜃 ∣ data). (14.20)

When plotting the log-likelihood function 𝓁(𝜃 ∣ data), it is most common to plot the negative log-likelihood function, −𝓁(𝜃 ∣ data), such that the MLE 𝜃̂ is determined from the minimum value of this function. The negative log-likelihood function for the binomial distribution (14.18) is shown in Figure 14.12.
We now illustrate the maximum likelihood principle by some simple examples.

[Figure: the negative log-likelihood function plotted against p, minimized at p = 0.3.]

Figure 14.12 The negative log-likelihood function for the binomial distribution.

Example 14.5 (Binomial distribution)


Let X ∼ binom(n, p). The probability mass function is given by (14.17), the likelihood function is
$$L(p \mid x, n) = \binom{n}{x} p^x (1-p)^{n-x},$$
and the log-likelihood function is
$$\ell(p \mid x, n) = \log \binom{n}{x} + x \log p + (n-x) \log(1-p).$$
The MLE is found by taking the derivative of 𝓁(p ∣ x, n) and setting this derivative equal to zero,
$$\frac{d}{dp}\ell(p \mid x, n) = \frac{x}{p} - \frac{n-x}{1-p} = 0.$$
An extreme point is found for p = x/n. We should then check that this extreme point really is a maximum. The ML estimate for the parameter p is therefore
$$\hat{p} = \frac{x}{n}.$$
The above calculation may be done before any experiment is carried out, and it applies for any possible values of X and n. We may therefore establish the metric for finding the maximum likelihood estimate as
$$\hat{p} = \frac{X}{n}.$$
Observe the difference between the estimate and the estimator. The estimate is a
number that is determined by the observed data and is a numerical estimate that
is specific for the data. The estimator is a random variable that gives a metric for
determining the estimate when the data becomes available. This random variable
is called a maximum likelihood estimator and, unfortunately, the same symbol
and the same abbreviation, MLE, is commonly used for both the estimator and
the estimate.
Assume now that the experiment is carried out and that we have observed x = 5 in a total of n = 40 independent Bernoulli trials. With this data, the ML estimate of p is therefore
$$\hat{p} = \frac{x}{n} = \frac{5}{40} = 0.125. \qquad ◻$$

Example 14.6 (Homogeneous Poisson Process)


A homogeneous Poisson process (HPP) with unknown rate 𝜆 is observed during a time period (0, 𝜏). Let N(𝜏) be the number of observed events. The probability mass function is
$$\Pr(N(\tau) = n) = \frac{(\lambda\tau)^n}{n!}\, e^{-\lambda\tau} \quad \text{for } n = 0, 1, 2, \ldots$$
Assume that we have observed n = 8 events during a time period of length 𝜏 = 10 560 hours. The likelihood function is
$$L(\lambda \mid n, \tau) = \frac{(\lambda\tau)^n}{n!}\, e^{-\lambda\tau} \quad \text{for } \lambda > 0.$$
The log-likelihood function is
$$\ell(\lambda \mid n, \tau) = n \log(\lambda\tau) - \log n! - \lambda\tau.$$
The MLE is found by taking the derivative of 𝓁(𝜆 ∣ n, 𝜏) and setting this derivative equal to zero,
$$\frac{d}{d\lambda}\ell(\lambda \mid n, \tau) = \frac{n\tau}{\lambda\tau} - \tau = 0.$$
The extreme (i.e. maximum) point is found for 𝜆 = n/𝜏. As always, we should check that this is really a maximum. With the given data, the MLE (estimate) of 𝜆 is
$$\hat{\lambda} = \frac{n}{\tau} = \frac{8}{10\,560 \text{ h}} \approx 7.58 \times 10^{-4} \text{ h}^{-1}. \qquad ◻$$

Example 14.7 (Exponential distribution)


Let T1, T2, …, Tn be n independent and identically distributed random variables with distribution exp(𝜆). Because the variables are independent and identically distributed, the joint probability density is
$$f(t_1, t_2, \ldots, t_n \mid \lambda) = \prod_{i=1}^{n} f(t_i \mid \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda t_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} t_i} \quad \text{for } t_i \geq 0.$$
Assume that we have observed n = 5 variables during an accumulated time period $\tau = \sum_{i=1}^{5} t_i = 15\,600$ hours. The likelihood function is
$$L(\lambda \mid n, \tau) = \lambda^n e^{-\lambda\tau} \quad \text{for } \lambda > 0.$$
The log-likelihood function is
$$\ell(\lambda \mid n, \tau) = n \log \lambda - \lambda\tau.$$
The MLE is found by taking the derivative of 𝓁(𝜆 ∣ n, 𝜏) and setting this derivative equal to zero,
$$\frac{d}{d\lambda}\ell(\lambda \mid n, \tau) = \frac{n}{\lambda} - \tau = 0.$$
The extreme (i.e. maximum) point is found for 𝜆 = n/𝜏. Again, we should check that this is really a maximum. With the given data, the MLE (estimate) of 𝜆 is
$$\hat{\lambda} = \frac{n}{\tau} = \frac{5}{15\,600 \text{ h}} \approx 3.2 \times 10^{-4} \text{ h}^{-1}. \qquad ◻$$

Remark 14.4 (Factors not depending on the parameter can be deleted)


As seen from the above examples, the likelihood function can usually be written as a product of two functions, such that L(𝜃 ∣ x) = h(x)g(𝜃, x). The log-likelihood function is then 𝓁(𝜃 ∣ x) = log h(x) + log g(𝜃, x). When taking the derivative of 𝓁(𝜃 ∣ x) with respect to the parameter 𝜃, we get d log h(x)/d𝜃 = 0. We may therefore remove additive terms not containing unknown parameters from the log-likelihood function. For the binomial distribution in Example 14.5, the likelihood function is a product of $h(x) = \binom{n}{x}$ and $g(p, x) = p^x (1-p)^{n-x}$. The likelihood function may be simplified to L(p ∣ x, n) ∝ p^x (1−p)^{n−x}. ◻

General Properties of the MLE


The MLE has many valuable properties. Here, we suffice by listing some of these properties without proofs.

• Assume that we have found the MLE 𝜃̂ of 𝜃 and that g(𝜃) is a one-to-one function. The MLE of g(𝜃) is then g(𝜃̂).
• An MLE is asymptotically unbiased: E(𝜃̂n) → 𝜃 when the sample size n increases.
• Under relatively mild conditions, the MLE is consistent.
• Under certain regularity conditions, the ML estimator has an asymptotically normal distribution.

Interested readers may consult almost any good book on estimation theory to find
proofs and further properties.

MLE with R
In most cases, ML estimation results in explicit formulas, and R may therefore not be needed to compute the MLEs. If computer support is required, MLE is available through the R packages stats4, bbmle, and maxLik. If you want to use one of these packages, please read carefully the package manuals that are found on the Internet (e.g. by searching for “CRAN package bbmle”).
To illustrate the analysis, a brief R script using the package bbmle to calculate the MLE for p in the binomial distribution is shown. To calculate the maximum likelihood estimate, bbmle uses the function mle2, which is based on the negative log-likelihood function.
ML estimation in the binomial model was illustrated in Example 14.5. With R and the dataset size = 40 and mydata = 5, we can use the following R script.

library(bbmle) # Activate the package bbmle
options(digits=3) # Set the precision of the output
size <- 40
mydata <- c(5)
myfunc <- function(size,prob) -sum(dbinom(mydata,size,prob,log=T))
mle2(myfunc,start=list(prob=0.5),data=list(size=40))

As in Example 14.5, the output is prob=0.125.

Likelihood Function for Censored Datasets


Consider a sample of n independent and identical items. If item i failed at time ti, its contribution to the likelihood function is
$$L_i(\theta \mid t_i) = f(t_i \mid \theta) = z(t_i \mid \theta)\, R(t_i \mid \theta),$$
because to fail at time ti, the item needs to be functioning just before time ti [with probability R(ti ∣ 𝜃)], and then it must fail in a very short interval at ti. Recall the definition of the failure rate function. Here, f(⋅) and R(⋅) are regarded as functions of the parameter 𝜃, and ti is a specific and known time.
If, on the other hand, item i is still functioning at time ti, all we know is that its time-to-failure exceeds ti. The contribution to the likelihood function is then
$$L_i(\theta \mid t_i) = R(t_i \mid \theta).$$
Let, as before, 𝛿i be a failure indicator for item i, such that 𝛿i = 1 if item i fails and 𝛿i = 0 if item i is (right) censored, for i = 1, 2, …, n. The likelihood function may now be written as
$$L(\theta \mid t_1, t_2, \ldots, t_n) = \prod_{i=1}^{n} L_i(\theta \mid t_i) = \prod_{i=1}^{n} [z(t_i)]^{\delta_i}\, R(t_i). \tag{14.21}$$
When the failure rate is a constant 𝜆, the likelihood function is
$$L(\lambda \mid t_1, t_2, \ldots, t_n) = \prod_{i=1}^{n} \lambda^{\delta_i} e^{-\lambda t_i}.$$
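Setting the derivative of the corresponding log-likelihood $\ell(\lambda) = \sum_i \delta_i \log \lambda - \lambda \sum_i t_i$ equal to zero gives $\hat{\lambda} = \sum_i \delta_i / \sum_i t_i$, the number of failures divided by the total observed time. A minimal sketch with the six survival times from Section 14.2.2:

survtime <- c(17.88,28.92,33.00,41.52,42.12,45.60)
status <- c(1,0,1,1,1,0)
lambda.hat <- sum(status)/sum(survtime) # Failures / total observed time
print(lambda.hat) # 4/209.04, approximately 0.0191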

14.4.5 Exponentially Distributed Lifetimes


The exponential distribution plays an important role in system reliability anal-
ysis, and we therefore treat estimation in this distribution separately. Let T be
the time-to-failure of an item and assume that T is exponentially distributed with
failure rate 𝜆, such that T ∼ exp(𝜆). Further, assume that the survival times of n
identical and independent items are observed. The times-to-failure of the n items,
T1, T2, … , Tn, are therefore independent and identically distributed, exp(𝜆). The
dataset of observed survival times is t = (t1, t2, … , tn). The dataset may be complete
or censored.

Exponential Distribution: Complete Sample


The joint probability density function of T1, T2, … , Tn is
$$f(t_1, t_2, \dots, t_n \mid \lambda) = \prod_{i=1}^{n} \lambda \exp(-\lambda t_i) = \lambda^n \exp\Bigl(-\lambda \sum_{i=1}^{n} t_i\Bigr).$$

The corresponding likelihood function is
$$L(\lambda \mid \mathbf{t}) = \lambda^n \exp\Bigl(-\lambda \sum_{i=1}^{n} t_i\Bigr),$$

and the log-likelihood function becomes
$$\ell(\lambda \mid \mathbf{t}) = n \log \lambda - \lambda \sum_{i=1}^{n} t_i. \tag{14.22}$$
The MLE is found by setting the derivative of the log-likelihood function equal to
zero:
$$\frac{d}{d\lambda}\,\ell(\lambda \mid \mathbf{t}) = \frac{n}{\lambda} - \sum_{i=1}^{n} t_i = 0.$$
Solving for 𝜆 gives the ML estimate
$$\hat\lambda = \frac{n}{\sum_{i=1}^{n} t_i}.$$
The corresponding ML estimator is
$$\hat\lambda = \frac{n}{\sum_{i=1}^{n} T_i}. \tag{14.23}$$
The ML estimate can hence be expressed by the sample average $\bar t = \frac{1}{n}\sum_{i=1}^{n} t_i$ as
$$\hat\lambda = \frac{1}{\bar t}.$$
When a complete dataset D = {t1, t2, … , tn} is available, the ML estimate can be
calculated in R as 1/mean(D), and a special R package is not required.
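As a quick illustration, the following base R sketch computes the ML estimate for a small set of hypothetical failure times:

# Hypothetical complete dataset of n = 5 times-to-failure (hours)
D <- c(10.2, 89.6, 54.0, 96.8, 882.3)
1 / mean(D)   # ML estimate, equal to n / sum(t_i)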

Example 14.8 (Exponential distribution, complete sample)


Assume that we have n = 10 observed values and that $\sum_{i=1}^{10} t_i = 68\,450$ hours. With
these data, the likelihood function is shown in Figure 14.13 as a function of 𝜆.
The ML estimate in this case is
$$\hat\lambda = \frac{n}{\sum_{i=1}^{10} t_i} = \frac{10}{68\,450 \text{ h}} \approx 1.461 \times 10^{-4} \text{ h}^{-1},$$

a value that corresponds to the maximum of the likelihood curve in
Figure 14.13. ◻

Figure 14.13 Likelihood function for the exponential distribution in Example 14.8.
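The likelihood curve in Figure 14.13 may be reproduced with a few lines of base R. The following sketch uses the values stated in the example; the plotting range is chosen for illustration only.

# Likelihood L(lambda) = lambda^n * exp(-lambda * sum(t_i)) for Example 14.8
n <- 10
sum_t <- 68450                                # sum of the observed times (hours)
lambda <- seq(1e-6, 6e-4, length.out = 500)
lik <- lambda^n * exp(-lambda * sum_t)
plot(lambda, lik, type = "l", xlab = "lambda", ylab = "Likelihood")
abline(v = n / sum_t, lty = 2)                # ML estimate approx. 1.461e-4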

We now study the properties of the ML estimator and first find out whether or
not it is unbiased.
Because Ti ∼ exp(𝜆), 2𝜆Ti is 𝜒² distributed with two degrees of freedom for
i = 1, 2, … , n (e.g. see Ross 2014). Because the Ti's are independent, $2\lambda \sum_{i=1}^{n} T_i$ is 𝜒²
distributed with 2n degrees of freedom.
The ML estimator can be written as
$$\hat\lambda = \frac{n}{\sum_{i=1}^{n} T_i} = \frac{2n\lambda}{2\lambda \sum_{i=1}^{n} T_i},$$

and has the same distribution as 2n𝜆∕Z, where Z is 𝜒² distributed with 2n degrees
of freedom. Accordingly,
$$E(\hat\lambda) = 2n\lambda\, E\Bigl(\frac{1}{Z}\Bigr).$$
Here,
$$E\Bigl(\frac{1}{Z}\Bigr) = \int_0^\infty \frac{1}{z}\, \frac{1}{2^n \Gamma(n)}\, z^{n-1} e^{-z/2}\, dz
= \frac{1}{2(n-1)} \int_0^\infty \frac{1}{2^{n-1}\Gamma(n-1)}\, z^{n-2} e^{-z/2}\, dz = \frac{1}{2(n-1)}.$$
Therefore,
$$E(\hat\lambda) = 2n\lambda\, \frac{1}{2(n-1)} = \frac{n}{n-1}\,\lambda.$$
The estimator 𝜆̂ is accordingly not unbiased, but the estimator 𝜆*, given by
$$\lambda^* = \frac{n-1}{n}\,\hat\lambda = \frac{n-1}{\sum_{i=1}^{n} T_i}$$

is seen to be unbiased. Let us determine var(𝜆*):
$$\mathrm{var}(\lambda^*) = \mathrm{var}\Bigl(\frac{n-1}{n}\,\hat\lambda\Bigr) = 4(n-1)^2 \lambda^2\, \mathrm{var}\Bigl(\frac{1}{Z}\Bigr),$$
where Z has the same meaning as above. Now,
$$\mathrm{var}\Bigl(\frac{1}{Z}\Bigr) = E\Bigl(\frac{1}{Z^2}\Bigr) - \Bigl[E\Bigl(\frac{1}{Z}\Bigr)\Bigr]^2$$
and
$$E\Bigl(\frac{1}{Z^2}\Bigr) = \int_0^\infty \frac{1}{z^2}\,\frac{1}{2^n \Gamma(n)}\, z^{n-1} e^{-z/2}\, dz = \frac{1}{4(n-1)(n-2)}.$$
Hence,
$$\mathrm{var}(\lambda^*) = 4(n-1)^2 \lambda^2 \Bigl(\frac{1}{4(n-1)(n-2)} - \frac{1}{4(n-1)^2}\Bigr)
= (n-1)\lambda^2 \Bigl(\frac{1}{n-2} - \frac{1}{n-1}\Bigr) = \frac{\lambda^2}{n-2}.$$
The estimator
$$\lambda^* = \frac{n-1}{\sum_{i=1}^{n} T_i} \tag{14.24}$$
is therefore unbiased and has variance
$$\mathrm{var}(\lambda^*) = \frac{\lambda^2}{n-2}. \tag{14.25}$$
n−2
To establish a 1 − 𝜀 confidence interval for 𝜆, we use the fact that $2\lambda \sum_{i=1}^{n} T_i$ is
𝜒² distributed with 2n degrees of freedom. Hence,
$$\Pr\Bigl(z_{1-\varepsilon/2,\,2n} \le 2\lambda \sum_{i=1}^{n} T_i \le z_{\varepsilon/2,\,2n}\Bigr) = 1 - \varepsilon$$
and
$$\Pr\Bigl(\frac{z_{1-\varepsilon/2,\,2n}}{2\sum_{i=1}^{n} T_i} \le \lambda \le \frac{z_{\varepsilon/2,\,2n}}{2\sum_{i=1}^{n} T_i}\Bigr) = 1 - \varepsilon.$$
Thus, a 1 − 𝜀 confidence interval for 𝜆 is
$$\Bigl(\frac{z_{1-\varepsilon/2,\,2n}}{2\sum_{i=1}^{n} T_i},\ \frac{z_{\varepsilon/2,\,2n}}{2\sum_{i=1}^{n} T_i}\Bigr). \tag{14.26}$$
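The interval (14.26) is easily computed with base R. Note that $z_{\alpha,2n}$ here denotes the upper 𝛼 fractile of the 𝜒² distribution, which corresponds to qchisq(1 − 𝛼, 2n) in R. A sketch using the numbers from Example 14.8:

# 90% confidence interval for lambda from (14.26)
n <- 10
sum_t <- 68450                   # sum of the n times-to-failure (hours)
eps <- 0.10
lower <- qchisq(eps / 2, df = 2 * n) / (2 * sum_t)
upper <- qchisq(1 - eps / 2, df = 2 * n) / (2 * sum_t)
c(lower, upper)                  # approx. (7.9e-05, 2.3e-04) per hour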

Total-Time-on-Test
Let T(1) ≤ T(2) ≤ · · · ≤ T(n) be the order statistics for the variables T1 , T2 , … , Tn ,
and similarly, let t(1) ≤ t(2) ≤ · · · ≤ t(n) be the ordered dataset that is obtained from
the experiment. Assume that all the n items are put into operation at the same
time t = 0.

We introduce the symbol 𝒯(t) for the accumulated time in operation in the interval
(0, t), and call 𝒯(t) the total-time-on-test (TTT) at time t. At time t(1), the n items
have accumulated a time in operation $\mathcal{T}(t_{(1)}) = n t_{(1)}$. Just after time t(1), there are
n − 1 items left in operation. The accumulated time in operation at time t(2) is
therefore $\mathcal{T}(t_{(2)}) = n t_{(1)} + (n-1)(t_{(2)} - t_{(1)})$.
Let di = t(i) − t(i−1) be the time interval between the termination of the operation
of the (i − 1)th entry and the termination of the ith entry, such that
t(1) = d1
t(2) = d1 + d2
⋮
t(r) = d1 + d2 + ⋯ + dr.
The TTT at time t(r) has two parts:
(1) The time on test of the items that have failed in the interval (0, t(r)], which is
    $\sum_{i=1}^{r} t_{(i)} = r d_1 + (r-1)d_2 + \cdots + d_r$.
(2) The time on test of the n − r items that are still in operation at time t(r), which
    is $(n-r)\,t_{(r)} = (n-r)\sum_{i=1}^{r} d_i$.
The TTT at time t(r) is therefore
$$\mathcal{T}(t_{(r)}) = \sum_{i=1}^{r} t_{(i)} + (n-r)\,t_{(r)} = r d_1 + (r-1)d_2 + \cdots + d_r + (n-r)\sum_{i=1}^{r} d_i.$$
Tidying up this expression yields
$$\mathcal{T}(t_{(r)}) = \sum_{i=1}^{r} [n-(i-1)]\, d_i. \tag{14.27}$$
By introducing the corresponding random variables, we obtain
$$\mathcal{T}(T_{(r)}) = \sum_{i=1}^{r} [n-(i-1)]\, D_i. \tag{14.28}$$
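A small helper function may clarify the computation in (14.27). The sketch below assumes a complete, tie-free dataset where all items start at time 0; the example values are hypothetical.

# Total-time-on-test at the r-th ordered time, following (14.27)
ttt <- function(t, r) {
  ts <- sort(t)
  n <- length(ts)
  sum(ts[1:r]) + (n - r) * ts[r]
}
ttt(c(6.3, 11.0, 21.5, 48.4, 90.1), r = 3)   # 38.8 + 2*21.5 = 81.8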

Exponential Distribution: Censored Data


Assume that n independent and identical items with constant failure rate 𝜆 have
been observed until either failure or censoring. We assume that there are no ties
in the dataset {t1, t2, … , tn}. As before, let 𝛿j = 1 when survival time tj is a failure
time, and 𝛿j = 0 when tj is a censored time, for j = 1, 2, … , n. From (14.21), the
likelihood function may then be written as
$$L(\lambda \mid t_1, t_2, \dots, t_n) = \prod_{j=1}^{n} \lambda^{\delta_j} e^{-\lambda t_j}. \tag{14.29}$$

Censoring of Type II
For censoring of type II, the life test is terminated as soon as r failures have been
observed. The ordered dataset may be written as t(1) < t(2) < ⋯ < t(r) < t(r+1) <
⋯ < t(n), for r < n. The dataset contains r times-to-failure and n − r censored
times. This means that t(r) is the longest time-to-failure. The likelihood function
for this situation is (see Remark 14.4)
$$L(\lambda \mid t_{(1)}, \dots, t_{(r)}) \propto \lambda^r \exp\Bigl(-\lambda \Bigl[\sum_{j=1}^{r} t_{(j)} + (n-r)\,t_{(r)}\Bigr]\Bigr)
= \lambda^r \exp[-\lambda\, \mathcal{T}(t_{(r)})] \quad \text{for } 0 < t_{(1)} < \cdots < t_{(r)}.$$


The log-likelihood function is
$$\ell(\lambda \mid \mathbf{t}) \propto r \log \lambda - \lambda\, \mathcal{T}(t_{(r)}),$$
where t = (t(1), t(2), … , t(r)). The MLE is found by setting the derivative of the
log-likelihood function equal to zero:
$$\frac{d}{d\lambda}\,\ell(\lambda \mid \mathbf{t}) = \frac{r}{\lambda} - \mathcal{T}(t_{(r)}) = 0.$$
The ML estimate $\hat\lambda_{II}$ of 𝜆 is, therefore,
$$\hat\lambda_{II} = \frac{r}{\mathcal{T}(t_{(r)})}.$$
The corresponding ML estimator is
$$\hat\lambda_{II} = \frac{r}{\mathcal{T}(T_{(r)})}. \tag{14.30}$$
The TTT at time T(r) is
$$\mathcal{T}(T_{(r)}) = n D_1 + (n-1)D_2 + \cdots + [n-(r-1)]D_r = \sum_{j=1}^{r} [n-(j-1)]\, D_j.$$
Introducing
$$D_j^* = [n-(j-1)]\, D_j \quad \text{for } j = 1, 2, \dots, r,$$
we know that $2\lambda D_1^*, 2\lambda D_2^*, \dots, 2\lambda D_r^*$ are independent and 𝜒² distributed, each with
2 degrees of freedom. Hence, $2\lambda\, \mathcal{T}(T_{(r)})$ is 𝜒² distributed with 2r degrees of free-
dom, and we can utilize this to find $E(\hat\lambda_{II})$:
$$E(\hat\lambda_{II}) = E\Bigl(\frac{r}{\mathcal{T}(T_{(r)})}\Bigr) = 2\lambda r\, E\Bigl(\frac{1}{2\lambda\, \mathcal{T}(T_{(r)})}\Bigr) = 2\lambda r\, E\Bigl(\frac{1}{Z}\Bigr),$$
where Z is 𝜒² distributed with 2r degrees of freedom. This implies that
$$E\Bigl(\frac{1}{Z}\Bigr) = \frac{1}{2(r-1)}.$$

Hence,
$$E(\hat\lambda_{II}) = 2\lambda r\, \frac{1}{2(r-1)} = \lambda\, \frac{r}{r-1}.$$
The estimator $\hat\lambda_{II}$ is accordingly not unbiased, but
$$\lambda^*_{II} = \frac{r-1}{\mathcal{T}(T_{(r)})} \tag{14.31}$$
is seen to be unbiased. By the method used for a complete dataset, we find that
$$\mathrm{var}(\lambda^*_{II}) = \frac{\lambda^2}{r-2}.$$
Confidence intervals, as well as tests for standard hypotheses about 𝜆, may now be
derived from the fact that 2𝜆 (T(r) ) is 𝜒 2 distributed with 2r degrees of freedom.
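The estimates and a confidence interval for type II censoring may be computed as in the following sketch, where the failure times are hypothetical:

# Type II censoring: n = 10 items, test stopped at the r = 5th failure
n <- 10; r <- 5
t_fail <- c(120, 340, 560, 780, 950)          # hypothetical failure times
TTT_r <- sum(t_fail) + (n - r) * t_fail[r]    # total-time-on-test (14.27)
r / TTT_r                                     # ML estimate (14.30)
(r - 1) / TTT_r                               # unbiased estimate (14.31)
# 90% confidence interval based on 2*lambda*TTT_r ~ chi-square(2r)
qchisq(c(0.05, 0.95), df = 2 * r) / (2 * TTT_r)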

Censoring of Type I
The fact that the number (S) of items failing before time t0 is random makes this
situation more difficult to deal with from a probabilistic point of view. We therefore
confine ourselves to suggesting an intuitive estimator for 𝜆.
First, observe that the estimators for 𝜆, derived in the case of complete
datasets and of type II censored data, both could be written as a fraction with
numerator equal to “number of recorded failures −1” and denominator equal to
“total-time-on-test at the termination of the test.” It seems intuitively reasonable
to use the same fraction when we have type I censoring.
In this case, the number of failures is S and the TTT is
$$\mathcal{T}(t_0) = \sum_{j=1}^{S} T_{(j)} + (n-S)\, t_0. \tag{14.32}$$
Hence,
$$\hat\lambda_{I} = \frac{S-1}{\mathcal{T}(t_0)}$$
seems to be a reasonable estimator for 𝜆.
It can be shown that this estimator is biased for small samples, but asymptoti-
cally, it has the same properties as 𝜆̂II (see Mann et al. 1974, p. 173).
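A corresponding sketch for type I censoring, with hypothetical data:

# Type I censoring: n = 10 items observed until t0 = 1000 hours
n <- 10; t0 <- 1000
t_fail <- c(210, 350, 480, 690, 820)      # hypothetical failure times before t0
S <- length(t_fail)                       # number of observed failures
TTT_t0 <- sum(t_fail) + (n - S) * t0      # total-time-on-test (14.32)
(S - 1) / TTT_t0                          # intuitive estimator for lambda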

14.4.6 Weibull Distributed Lifetimes


Another important distribution in system reliability analyses is the Weibull distri-
bution. To find the MLEs for the parameters of the Weibull distribution is more
complicated than for the exponential distribution. We mainly treat complete
datasets; type II censoring is considered briefly at the end of the section.

Complete Sample
Let T1, T2, … , Tn be a complete sample of independent and identically Weibull
distributed lifetimes with probability density
$$f_T(t) = \frac{\alpha}{\theta}\Bigl(\frac{t}{\theta}\Bigr)^{\alpha-1} \exp\Bigl[-\Bigl(\frac{t}{\theta}\Bigr)^{\alpha}\Bigr] \quad \text{for } t > 0,\ \alpha > 0,\ \theta > 0.$$
The likelihood function is
$$L(\alpha, \theta \mid t_1, t_2, \dots, t_n) = \prod_{j=1}^{n} \frac{\alpha}{\theta}\Bigl(\frac{t_j}{\theta}\Bigr)^{\alpha-1} \exp\Bigl[-\Bigl(\frac{t_j}{\theta}\Bigr)^{\alpha}\Bigr], \tag{14.33}$$

and the log-likelihood is
$$\ell(\alpha, \theta \mid t_1, \dots, t_n) = \sum_{j=1}^{n} \Bigl[\log\alpha - \alpha\log\theta + (\alpha-1)\log t_j - \Bigl(\frac{t_j}{\theta}\Bigr)^{\alpha}\Bigr]$$
$$= n\log\alpha - n\alpha\log\theta + (\alpha-1)\sum_{j=1}^{n}\log t_j - \sum_{j=1}^{n}\Bigl(\frac{t_j}{\theta}\Bigr)^{\alpha}.$$

The likelihood equations become
$$\frac{\partial \ell}{\partial \theta} = -\frac{n\alpha}{\theta} + \frac{\alpha}{\theta^{\alpha+1}}\sum_{j=1}^{n} t_j^{\alpha}
= \frac{n\alpha}{\theta^{\alpha+1}}\Bigl(\frac{1}{n}\sum_{j=1}^{n} t_j^{\alpha} - \theta^{\alpha}\Bigr) = 0.$$
Solving this equation yields
$$\theta = \Bigl(\frac{1}{n}\sum_{j=1}^{n} t_j^{\alpha}\Bigr)^{1/\alpha}. \tag{14.34}$$

The derivative with respect to 𝛼 is
$$\frac{\partial \ell}{\partial \alpha} = \frac{n}{\alpha} - n\log\theta + \sum_{j=1}^{n}\log t_j - \sum_{j=1}^{n}\Bigl(\frac{t_j}{\theta}\Bigr)^{\alpha}\log\Bigl(\frac{t_j}{\theta}\Bigr)$$
$$= \frac{n}{\alpha} - n\log\theta + \sum_{j=1}^{n}\log t_j - \frac{1}{\theta^{\alpha}}\sum_{j=1}^{n} t_j^{\alpha}(\log t_j - \log\theta).$$
Inserting (14.34) gives the MLE equation
$$\frac{1}{n}\sum_{j=1}^{n}\log t_j + \frac{1}{\alpha} - \frac{\sum_{j=1}^{n} t_j^{\alpha}\log t_j}{\sum_{j=1}^{n} t_j^{\alpha}} = 0.$$

This is an equation with a single unknown parameter 𝛼. We may therefore solve
for 𝛼 to obtain the MLE 𝛼̂. It can be proved that there is a unique solution for 𝛼.
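Because the equation has a unique solution, it may be solved numerically, for example with uniroot() in base R. The following sketch uses the eight failure times of Example 14.9, treated here as a complete dataset for illustration, and then computes 𝜃̂ from (14.34):

# MLE for the two-parameter Weibull distribution, complete dataset
t <- c(31.7, 39.2, 57.5, 65.8, 70.0, 101.7, 109.2, 130.0)
score <- function(a) {
  mean(log(t)) + 1 / a - sum(t^a * log(t)) / sum(t^a)
}
alpha_hat <- uniroot(score, interval = c(0.1, 20))$root
theta_hat <- mean(t^alpha_hat)^(1 / alpha_hat)    # from (14.34)
c(alpha_hat, theta_hat)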

Weibull Analysis with R


Several R packages can be used to determine ML estimates for the Weibull distri-
bution. Among these are bbmle, stats4, and survival. If you want to use one
of these, you should read the package documentation carefully and also search the
Internet for example scripts.
A dedicated R package for Weibull analysis, called WeibullR, is further
available, but is still under development. The package can be used to find the
ML estimates for both two-parameter and three-parameter Weibull distributions.
Examples of R scripts may be found in the package documentation. The package
can be used for both complete and censored datasets. WeibullR provides several
approaches to estimating the parameters 𝛼 and 𝜃 for a two-parameter Weibull
distribution. Here, we illustrate the simplest approach. In the basic setup, we
enter the times-to-failure and the censoring times as separate vectors, as shown
in the following R script

library(WeibullR)
failtime<-c(31.7,39.2,57.5,65.8,70.0,101.7,109.2,130.0)
censored<-c(65.0,75.0,75.2,87.5,88.3,94.2,105.8,110.0)
# Prepare the data for analysis
data<-wblr.conf(wblr.fit(wblr(failtime,censored)),lwd=1)
plot(data)

The function wblr is used to prepare the dataset for usage in WeibullR. The
resulting plot is shown in Figure 14.14.
Observe that the plot in Figure 14.14 is obtained by a simplified procedure in
WeibullR using only default settings. The default names of parameters are “beta”
for 𝛼 and “eta” for 𝜃. The estimates obtained are 𝛼 = 2.35 and 𝜃 = 115.2. Con-
fidence bounds are supplied. To choose a more advanced estimation procedure
and to adjust the settings, the reader should read the WeibullR documentation
carefully.

Censoring of Type II
With censoring of type II, the dataset contains r times-to-failure and n − r cen-
sored times, and the censoring takes place at time t(r). Analogous with (14.33), the
likelihood function is proportional to
$$L(\alpha, \theta \mid \mathbf{t}) \propto \prod_{j=1}^{r} \frac{\alpha}{\theta}\Bigl(\frac{t_{(j)}}{\theta}\Bigr)^{\alpha-1} \exp\Bigl[-\Bigl(\frac{t_{(j)}}{\theta}\Bigr)^{\alpha}\Bigr] \cdot \Bigl(\exp\Bigl[-\Bigl(\frac{t_{(r)}}{\theta}\Bigr)^{\alpha}\Bigr]\Bigr)^{n-r}$$
$$= \alpha^r \theta^{-\alpha r} \prod_{j=1}^{r} t_{(j)}^{\alpha-1}\, \exp\Bigl[-\sum_{j=1}^{r}\Bigl(\frac{t_{(j)}}{\theta}\Bigr)^{\alpha} - (n-r)\Bigl(\frac{t_{(r)}}{\theta}\Bigr)^{\alpha}\Bigr],$$

[Figure 14.14 is a Weibull probability plot (unreliability in % versus time to failure, log–log scale) with annotations: ranks = median; n (fail | cens.) = 16 (8 | 8); weibull (rr−xony) fit with beta = 2.35, eta = 115.2; r² = 0.9715; prr = 88.15 by corr.; 90% "pivotal−rr" confidence bounds (S = 10 000, B-life ssCL = 95%); B10 = 22.15 | 44.21 | 65.46, B5 = 12.72 | 32.55 | 53.53, B1 = 3.607 | 16.27 | 34.67.]
Figure 14.14 Output from a simple script using WeibullR.

where t is the ordered dataset, that is, the r times-to-failure and the n − r censoring
times that are all equal to t(r). The log-likelihood is
$$\ell(\alpha, \theta \mid \mathbf{t}) = r\log\alpha - r\alpha\log\theta + (\alpha-1)\sum_{j=1}^{r}\log t_{(j)}
- \sum_{j=1}^{r}\Bigl(\frac{t_{(j)}}{\theta}\Bigr)^{\alpha} - (n-r)\Bigl(\frac{t_{(r)}}{\theta}\Bigr)^{\alpha}.$$

Analogous with the complete data situation, we can determine the MLE estimates
𝛼* and 𝜆*, where 𝜆 = 1∕𝜃, from
$$\lambda^* = \Bigl(\frac{r}{\sum_{j=1}^{r} t_{(j)}^{\alpha^*} + (n-r)\, t_{(r)}^{\alpha^*}}\Bigr)^{1/\alpha^*} \tag{14.35}$$
and
$$\frac{r}{\alpha^*} + \sum_{j=1}^{r}\log t_{(j)} - r\, \frac{\sum_{j=1}^{r} t_{(j)}^{\alpha^*}\log t_{(j)} + (n-r)\, t_{(r)}^{\alpha^*}\log t_{(r)}}{\sum_{j=1}^{r} t_{(j)}^{\alpha^*} + (n-r)\, t_{(r)}^{\alpha^*}} = 0. \tag{14.36}$$
For further details on ML estimation in the Weibull distribution, see, for example, Meeker
and Escobar (1998) and McCool (2012).
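Equations (14.35) and (14.36) may also be solved numerically with uniroot(), as in the earlier complete-sample sketch. The data below are hypothetical:

# Type II censored Weibull data: n = 10 items, stopped at the r = 6th failure
n <- 10; r <- 6
tf <- c(25, 48, 84, 121, 159, 204)     # hypothetical ordered failure times
score <- function(a) {
  w <- sum(tf^a) + (n - r) * tf[r]^a   # denominator of (14.36)
  r / a + sum(log(tf)) -
    r * (sum(tf^a * log(tf)) + (n - r) * tf[r]^a * log(tf[r])) / w
}
alpha_star <- uniroot(score, interval = c(0.1, 20))$root
# Scale estimate theta* = 1/lambda*, with lambda* from (14.35)
theta_star <- ((sum(tf^alpha_star) + (n - r) * tf[r]^alpha_star) / r)^(1 / alpha_star)
c(alpha_star, theta_star)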

14.5 The Kaplan–Meier Estimate

A nonparametric estimate for the survivor function R(t) = Pr(T > t) was intro-
duced by Kaplan and Meier (1958) and is called the Kaplan–Meier estimate.7 A
valuable feature of the Kaplan–Meier estimate is that it provides an intuitive graph-
ical representation. We first introduce the estimate for a complete dataset.

14.5.1 Motivation for the Kaplan–Meier Estimate Based on a Complete
Dataset
Consider a complete dataset without ties. For this dataset, the obvious estimate for
R(t) is the empirical survivor function, which is presented in Section 14.3.5. The
empirical survivor function is based on binomial reasoning for each failure time t.
As a motivation for the Kaplan–Meier estimate, we now develop the empirical
survivor function by a different approach based on the ordered (complete) dataset
0 = t(0) < t(1) < t(2) < · · · < t(n) .
Consider a particular survival time, say t(i) , in this dataset. To survive t(i) , the
item has to survive the first interval (0, t(1) ). Given that this interval is survived,
the item has to survive the next interval (t(1) , t(2) ), and so on, until it must survive
the interval (t(i−1) , t(i) ). Let t(0) = 0. The probability of surviving the first interval is

R(t(1) ) = Pr(T > t(1) ) = Pr(T > t(1) ∣ T > t(0) ) = R(t(1) ∣ t(0) ).

The probability of surviving the next interval (when it is known that it has survived
the first interval) is

R(t(2) ∣ t(1) ) = Pr(T > t(2) ∣ T > t(1) ),

and so on. This means that the survivor function at time t(i) can be expressed by
using the multiplication rule for conditional probabilities as

i
R(t(i) ) = R(t(j) ∣ t(j−1) ), (14.37)
j=1

where R(t(0) ) = R(0) = 1.


Each factor in (14.37) can be estimated with the same binomial approach we
used to obtain the empirical distribution function. Just before time t(1), n1 = n
items are in the at-risk-set and may fail; just before time t(2), n2 = n − 1 items are in
the at-risk-set and may fail, and so on. Because we have a complete dataset with-
out censoring and ties, the number of items that fail at time t(j) is dj = 1. The
number of items that survive t(j) is therefore nj − dj = n − j.

7 Named after the authors: Edward Lynn Kaplan (1920–2006) and Paul Meier (1924–2011).

Based on the binomial model, we may then estimate the factors of (14.37) as
$$\hat R(t_{(1)} \mid t_{(0)}) = \frac{n_1 - d_1}{n_1} = 1 - \frac{d_1}{n_1} = 1 - \frac{1}{n}$$
$$\hat R(t_{(2)} \mid t_{(1)}) = \frac{n_2 - d_2}{n_2} = 1 - \frac{d_2}{n_2} = 1 - \frac{1}{n-1},$$
and so on.
If we use this result in (14.37), we obtain a reformulated estimate for the empir-
ical survivor function
$$\hat R(t) = \prod_{j;\, t_{(j)} < t} \hat R(t_{(j)} \mid t_{(j-1)}) = \prod_{j;\, t_{(j)} < t} \Bigl(1 - \frac{d_j}{n_j}\Bigr) = \prod_{j;\, t_{(j)} < t} \Bigl(1 - \frac{1}{n - j + 1}\Bigr). \tag{14.38}$$
For t > t(n), all the n items have failed and R̂(t) = 0. The reason why we have written
the empirical survivor function in such a complicated way is to pave the way for
the introduction of the Kaplan–Meier estimate.

14.5.2 The Kaplan–Meier Estimator for a Censored Dataset


Kaplan and Meier (1958) extend the empirical survivor function to a randomly
censored dataset that may also include ties. Their approach is very similar to our
derivation of the empirical survivor function, and their estimate is given as
$$\hat R(t) = \prod_{j;\, t_{(j)} < t} \Bigl(1 - \frac{d_j}{n_j}\Bigr).$$
The only difference from (14.38) is the values for dj and nj. If t(j) is a censoring
time, dj = 0, the factor (1 − dj∕nj) = 1 and does not directly influence the estimate
R̂(t), but the censoring influences the at-risk-set before the next event (failure or
censoring).
We may rewrite the definition of the Kaplan–Meier estimate to include the infor-
mation of whether a survival time is a failure or a censoring time by including the
status 𝛿j in the formula
$$\hat R(t) = \prod_{j;\, t_{(j)} < t,\ \delta_j = 1} \Bigl(1 - \frac{d_j}{n_j}\Bigr), \tag{14.39}$$
where the product includes all items j that have a failure time (i.e. 𝛿j = 1) such that
t(j) < t. This formula clearly shows that the factors are only included for survival
times that represent failures. Survival times that represent censoring give a factor
equal to one and hence do not influence the estimate directly.

With p̂j = 1 − dj∕nj = (nj − dj)∕nj, we may write (14.39) as
$$\hat R(t) = \prod_{j;\, t_{(j)} < t,\ \delta_j = 1} \hat p_j. \tag{14.40}$$
The estimate R̂(t) in (14.39) and (14.40) is known as the Kaplan–Meier estimate
and is also called the product limit (PL) estimate. The procedure to calculate the
Kaplan–Meier estimate is illustrated in Example 14.9.

Example 14.9 (Kaplan–Meier estimate)


Consider the ordered dataset in Table 14.2. The dataset has 16 survival times, of
which 8 are censored times (status 𝛿 = 0) and 8 are failure times (status 𝛿 = 1). The
dataset has no ties. With no ties, the number of items at risk just before survival
time t(j) is nj = n − j + 1, as listed in the second column of Table 14.2.
Immediately before t(1), n = 16 items were at risk. After the failure at t(1),
n2 = n − 2 + 1 = 15 items are at risk just before t(2), and similarly for the other
survival times. The Kaplan–Meier estimate R̂(t) is found from (14.40) by
multiplying the p̂j's for all survival times ≤ t.
In Table 14.3, the Kaplan–Meier estimate is presented as a function of time. In
the time interval (0, 31.7) until the first failure, it is reasonable to set R̂(t) = 1. The
estimate may be displayed graphically as a Kaplan–Meier plot. ◻

Kaplan–Meier Estimate with R


The Kaplan–Meier estimate is available in the R package survival and a
Kaplan–Meier plot is generated by the script

library(survival)
survtime <- c(31.7,39.2,57.5,65.0,65.8,70.0,75.0,75.2,
              87.5,88.3,94.2,101.7,105.8,109.2,110.0,130.0)
status <- c(1,1,1,0,1,1,0,0,0,0,0,1,0,1,0,1)
km <- survfit(Surv(survtime, status==1) ~ 1, conf.type="none")
plot(km, xlab="Time t", ylab="Survival probability")

The additional command print(summary(km)) gives a summary of the


results.
time n.risk n.event survival std.err
31.7 16 1 0.938 0.0605
39.2 15 1 0.875 0.0827

57.5 14 1 0.812 0.0976


65.8 12 1 0.745 0.1105
70.0 11 1 0.677 0.1194
101.7 5 1 0.542 0.1542
109.2 3 1 0.361 0.1797
130.0 1 1 0.000 NaN

Observe that these results are the same as we found by hand-calculation in


Table 14.2, but the estimates are only presented for failure times.

Table 14.2 Computation of the Kaplan–Meier estimate (censored times are marked with
0 in column "Status").

Rank j   Number at risk (n − j + 1)   Ordered survival time t(j)   Status 𝛿j   p̂j      R̂(t(j))
0        —                            —                            —           1        1.000
1        16                           31.7                         1           15∕16    0.938
2        15                           39.2                         1           14∕15    0.875
3        14                           57.5                         1           13∕14    0.813
4        13                           65.0                         0           1        0.813
5        12                           65.8                         1           11∕12    0.745
6        11                           70.0                         1           10∕11    0.677
7        10                           75.0                         0           1        0.677
8        9                            75.2                         0           1        0.677
9        8                            87.5                         0           1        0.677
10       7                            88.3                         0           1        0.677
11       6                            94.2                         0           1        0.677
12       5                            101.7                        1           4∕5      0.542
13       4                            105.8                        0           1        0.542
14       3                            109.2                        1           2∕3      0.361
15       2                            110.0                        0           1        0.361
16       1                            130.0                        1           0        0.000

Table 14.3 The Kaplan–Meier estimate as a function of time.

t                     R̂(t)
0 ≤ t < 31.7          1.000
31.7 ≤ t < 39.2       15∕16 = 0.938
39.2 ≤ t < 57.5       (15∕16)(14∕15) = 0.875
57.5 ≤ t < 65.8       (15∕16)(14∕15)(13∕14) = 0.813
65.8 ≤ t < 70.0       (15∕16)(14∕15)(13∕14)(11∕12) = 0.745
70.0 ≤ t < 101.7      (15∕16)(14∕15)(13∕14)(11∕12)(10∕11) = 0.677
101.7 ≤ t < 109.2     (15∕16)(14∕15)(13∕14)(11∕12)(10∕11)(4∕5) = 0.542
109.2 ≤ t < 130.0     (15∕16)(14∕15)(13∕14)(11∕12)(10∕11)(4∕5)(2∕3) = 0.361
130.0 ≤ t             (15∕16)(14∕15)(13∕14)(11∕12)(10∕11)(4∕5)(2∕3)(0∕1) = 0.000

We see from (14.39) that R̂(t) is a step function, continuous from the right, that
equals 1 at t = 0. R̂(t) drops by a factor of (nj − dj)∕nj at each failure time t(j). The
estimate R̂(t) does not change at the censored times, but the censored times
influence the values of nj (i.e. the at-risk-set) and hence the size of the steps in R̂(t).
A slightly problematic point is that R̂(t) never reduces to zero when the longest
recorded survival time t(n) is a censored time. For this reason, R̂(t) is usually taken
to be undefined for t > t(n). This issue is further discussed by Kalbfleisch and Pren-
tice (1980).

Some Properties of the Kaplan–Meier Estimator


A thorough discussion of the properties of the Kaplan–Meier estimator R̂(t) may
be found in Kalbfleisch and Prentice (1980), Lawless (1982), Cox and Oakes
(1984), and Aalen et al. (2008). Here, we confine ourselves to summarizing a few
properties without proofs:
(1) The Kaplan–Meier estimator R̂(t) can be derived as a nonparametric MLE.
    This derivation was originally given by Kaplan and Meier (1958).
(2) R̂(t) is a consistent estimator of R(t) under quite general conditions, with esti-
    mated asymptotic variance (e.g. see Kalbfleisch and Prentice 1980, p. 14):
$$\widehat{\mathrm{var}}(\hat R(t)) = [\hat R(t)]^2 \sum_{j \in J_t} \frac{d_j}{n_j (n_j - d_j)}, \tag{14.41}$$
    where Jt is the set of ranks j with t(j) ≤ t and 𝛿j = 1.


Figure 14.15 Kaplan–Meier plot for the data in Example 14.9. Made with R.


Figure 14.16 Kaplan–Meier plot of the dataset in Example 14.9 with 90% confidence
limits, made with R.

Expression (14.41) is known as Greenwood’s formula.


Confidence limits based on Greenwood’s formula are available in R and
are obtained by the option conf.type=’plain’ in the R script in
Example 14.9. The Kaplan–Meier plot in Figure 14.15 with 90% confidence
limits is shown in Figure 14.16. Because the plot is based on only eight failure
times, the confidence band is rather wide.
(3) Because it is a maximum likelihood estimator, the Kaplan–Meier estimator
has an asymptotic normal distribution. Confidence limits for R(t) can hence
be determined using normal approximation. For details see Cox and Oakes
(1984).
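For example, a Kaplan–Meier plot with 90% confidence limits based on Greenwood's formula may be obtained by modifying the earlier script as sketched below (survtime and status as defined in Example 14.9):

# Kaplan-Meier plot with 90% "plain" (Greenwood-based) confidence limits
km <- survfit(Surv(survtime, status == 1) ~ 1,
              conf.type = "plain", conf.int = 0.90)
plot(km, xlab = "Time t", ylab = "Survival probability")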

14.6 Cumulative Failure Rate Plots

Let R(t) be the survivor function for a certain type of items, and assume that the
distribution is continuous with probability density f(t) = −R′(t), where f(t) > 0 for
t > 0. No further assumptions are made about the distribution (i.e. a nonparamet-
ric model).
The failure rate function was defined in Section 5.3.2 as
$$z(t) = \frac{f(t)}{R(t)} = -\frac{d}{dt}\log R(t).$$
The cumulative failure rate function is
$$Z(t) = \int_0^t z(u)\, du = -\log R(t), \tag{14.42}$$
and the survivor function may therefore be written as
$$R(t) = e^{-Z(t)}.$$

Plotting Z(t) as a function of t gives a cumulative failure rate plot. If the plot is
convex when plotted on a linear scale, the failure rate function is increasing, and
if the plot is concave, the failure rate function is decreasing.

Example 14.10 (Exponential distribution)


The cumulative failure rate function for the exponential distribution, exp(𝜆), is
$$Z(t) = \lambda t \quad \text{for } t \ge 0,\ \lambda > 0.$$
Plotted as a function of t on a linear scale, the plot of Z(t) is a straight line with
slope 𝜆. If we are able to determine an estimate Ẑ(t), the plotted values should
follow a reasonably straight line. ◻

Example 14.11 (Weibull distribution)


The cumulative failure rate function for the Weibull distribution with shape 𝛼 and
scale 𝜃 is
$$Z(t) = \Bigl(\frac{t}{\theta}\Bigr)^{\alpha} \quad \text{for } t \ge 0,\ \alpha > 0,\ \theta > 0.$$
Taking logarithms yields
$$\log Z(t) = \alpha \log t - \alpha \log \theta.$$
If Z(t) is plotted versus t on a log–log scale, the plot is a straight line with slope
𝛼. If we are able to determine an estimate Ẑ(t), the plotted values should follow a
reasonably straight line on a log–log scale. ◻

The rest of this section is concerned with a particular type of cumulative failure
rate plots: the Nelson–Aalen plot.

14.6.1 The Nelson–Aalen Estimate of the Cumulative Failure Rate


An obvious estimate of the cumulative failure rate Z(t), based on the Kaplan–Meier
estimator R̂(t), is
$$\hat Z(t) = -\log \hat R(t). \tag{14.43}$$
An alternative estimate of Z(t) was proposed by Nelson (1972) and elaborated by
Aalen (1978). This estimate is now known as the Nelson–Aalen estimate. Assume
that we have a stochastically censored (type IV) dataset. As before, let
$$0 = t_{(0)} < t_{(1)} < t_{(2)} < \cdots < t_{(n)}$$
be the recorded ordered survival times until either failure or censoring, and let 𝛿j
be the status of survival time t(j), for j = 1, 2, … , n.
The Nelson–Aalen estimate of the cumulative failure rate is then
$$\hat Z(t) = \sum_{j;\, t_{(j)} < t,\ \delta_j = 1} \frac{d_j}{n_j}, \tag{14.44}$$
where dj, as before, is the number of items that fail at time t(j) and nj is the number
of items at risk just before t(j). The Nelson–Aalen estimator of the survivor function
at time t is
$$R^*(t) = \exp[-\hat Z(t)]. \tag{14.45}$$

Before we give a justification for these estimators, we illustrate how they are
calculated in Example 14.12.

Example 14.12 (Nelson–Aalen estimate for a censored dataset)


Reconsider the censored (type IV) dataset in Table 14.2. The Nelson–Aalen
estimate Ẑ(t) may be calculated from (14.44) for the eight failure times t(1), t(2),
t(3), t(5), t(6), t(12), t(14), and t(16). Next, R*(t) is determined from (14.45). The results
are shown in Table 14.4. In the last column of Table 14.4, the corresponding
Kaplan–Meier estimate R̂(t) is shown.
As seen, there is good "agreement" between the Kaplan–Meier estimates and
the Nelson–Aalen estimates for the survivor function in this dataset, especially for
the shortest failure times. For the longest failure times, the discrepancy becomes
more significant.
By using the results in Table 14.4, we can now plot the survival times on the
x-axis and the corresponding Nelson–Aalen estimates Ẑ(t) on the y-axis and obtain
the Nelson–Aalen plot. ◻

Making the Nelson–Aalen plot manually by the procedure in Example 14.12


may be tedious, but luckily, we may use the R survival package.

Table 14.4 Nelson–Aalen estimate for the censored dataset in Example 14.12,
compared with the Kaplan–Meier estimate.

j    Survival time   Status 𝛿j   Nelson–Aalen estimate Ẑ(t(j))        R*(t(j))   Kaplan–Meier R̂(t(j))
—    —               —           0.0000                               1.000      1.000
1    31.7            1           1∕16 = 0.0625                        0.939      0.938
2    39.2            1           1∕16 + 1∕15 = 0.1292                 0.879      0.875
3    57.5            1           1∕16 + 1∕15 + 1∕14 = 0.2006          0.818      0.813
4    65.0            0
5    65.8            1           1∕16 + 1∕15 + 1∕14 + 1∕12 = 0.2839   0.753      0.745
6    70.0            1           1∕16 + 1∕15 + ⋯ + 1∕11 = 0.3748      0.687      0.677
7    75.0            0
8    75.2            0
9    87.5            0
10   88.3            0
11   94.2            0
12   101.7           1           1∕16 + ⋯ + 1∕11 + 1∕5 = 0.5748       0.563      0.542
13   105.8           0
14   109.2           1           1∕16 + ⋯ + 1∕5 + 1∕3 = 0.9082        0.403      0.361
15   110.0           0
16   130.0           1           1∕16 + ⋯ + 1∕3 + 1∕1 = 1.9082        0.148      0.000

Nelson–Aalen Plot with R


There is no dedicated package in R for making the Nelson–Aalen plot, but we may
use the procedure in the following R script, which illustrates the plot by using the
same dataset as in Example 14.12.

library(survival)
# Data to be analyzed (must be sorted in ascending order)
survtime <- c(31.7,39.2,57.5,65.0,65.8,70.0,75.0,75.2,
              87.5,88.3,94.2,101.7,105.8,109.2,110.0,130.0)
status <- c(1,1,1,0,1,1,0,0,0,0,0,1,0,1,0,1)
# Number at risk just before each ordered time (n, n-1, ..., 1)
revrank <- order(survtime, decreasing=TRUE)
haz <- status/revrank          # d_j/n_j for each survival time
cumhaz <- cumsum(haz)          # Nelson-Aalen estimate (14.44)
# Select only failures for plotting
df <- data.frame(survtime, status, cumhaz)
z <- subset(df, status==1)
# Generate cumulative failure rate plot (linear scale)
plot(z$survtime, z$cumhaz, type="o", pch=19, xlab="Time",
     ylab="Cumulative failure rate")

The plot obtained from this script is made with a linear scale on both axes.
This means that if the data come from an exponential distribution, the plot
should be approximately a straight line (see Example 14.10). The plot is shown
in Figure 14.17. The plot is rather far from linear, and we may conclude that the
underlying distribution is probably not exponential. To inspect the data used, you
may use the commands print(df) and print(z).


Figure 14.17 Nelson–Aalen plot (linear scale).




Figure 14.18 Nelson–Aalen plot (log 10 scale).

We may also make the Nelson–Aalen plot with log 10 scale on both axes. As
shown in Example 14.11, an approximately straight line would indicate that the
underlying distribution may be a Weibull distribution. The plot is obtained by
adding the option log="xy" to the plot( ) command in the R script above.
The resulting Nelson–Aalen plot is shown in Figure 14.18. The plot is not too far
from a straight line, so the Weibull distribution might be an adequate model.

Justification for the Nelson–Aalen Estimate


Some steps in the following justification are approximative and far from rigorous,
but we hope the reader may get an understanding of how the Nelson–Aalen esti-
mate is developed. For a more rigorous development of the estimate, see Aalen
et al. (2008).
To justify the Nelson–Aalen estimate, we start with arguments similar to those
used when introducing the Kaplan–Meier estimate. An ordered dataset 0 = t(0) <
t(1) < t(2) < ⋯ < t(n) is available. The dataset may be censored and include ties. As
before, let nj be the number of items at risk just before survival time t(j) and let dj
be the number of items that fail at time t(j). We again use (14.37)
$$R(t_{(i)}) = \prod_{j=1}^{i} R(t_{(j)} \mid t_{(j-1)}),$$
and assume that the failure rate function in the interval (t(j−1), t(j)) may be approx-
imated by a constant failure rate 𝜆j, for j = 1, 2, ….
For a time t, such that t(m) < t < t(m+1), we get
$$R(t) = \Pr(T > t_{(1)} \mid T > t_{(0)}) \cdots \Pr(T > t \mid T > t_{(m)}). \tag{14.46}$$

As for the Kaplan–Meier estimate, the idea is to estimate each single factor on the
right-hand side of (14.46) and use the product of these estimates as an estimate of
R(t). What is now a reasonable estimate of pj = Pr(T > t(j) ∣ T > t(j−1))? With the
same approach we used to justify the Kaplan–Meier estimate, the only survival
times for which it is natural to estimate pj with something other than 1 are the
survival times t(j) where a failure occurs. Because we only consider the times where
something happens (failure or censoring), nj items are at risk in the whole interval
(t(j−1), t(j)).
The total functioning time in (t(j−1), t(j)) is nj(t(j) − t(j−1)). Because we assume
a constant failure rate 𝜆j in (t(j−1), t(j)), a natural estimate of 𝜆j is
$$\hat\lambda_j = \frac{\text{No. of failures}}{\text{Total functioning time}} = \frac{d_j}{n_j (t_{(j)} - t_{(j-1)})}. \tag{14.47}$$
A natural estimate of pj, when dj failures occur at t(j), is therefore
$$\hat p_j = \exp[-\hat\lambda_j (t_{(j)} - t_{(j-1)})] = \exp\Bigl(-\frac{d_j}{n_j}\Bigr). \tag{14.48}$$
Inserting these estimates in (14.46) gives
$$\hat R(t) = \prod_{t_{(j)} < t,\ \delta_j = 1} \exp\Bigl(-\frac{d_j}{n_j}\Bigr) = \exp\Bigl[-\sum_{t_{(j)} < t,\ \delta_j = 1} \frac{d_j}{n_j}\Bigr]. \tag{14.49}$$
Because R(t) = exp[−Z(t)], a natural estimate for the cumulative failure rate func-
tion is
$$\hat Z(t) = \sum_{t_{(j)} < t,\ \delta_j = 1} \frac{d_j}{n_j}, \tag{14.50}$$
which is the Nelson–Aalen estimate.

Uncertainty of the Nelson–Aalen Estimator


The variance of the Nelson–Aalen estimator may be estimated by (e.g. see Aalen
et al. 2008)
$$\widehat{\mathrm{var}}[\hat Z(t)] = \hat\sigma^2(t) = \sum_{t_{(j)} < t,\ \delta_j = 1} \frac{(n_j - d_j)\, d_j}{(n_j - 1)\, n_j^2}. \tag{14.51}$$
It may be shown that both the Nelson–Aalen estimator and the variance estima-
tor are close to unbiased. For large samples, it may further be shown that the
Nelson–Aalen estimator at time t is approximately normally distributed. We may
therefore find a (1 − 𝜖) confidence interval for Z(t) as
$$\hat Z(t) \pm u_{1-\epsilon/2}\, \hat\sigma(t), \tag{14.52}$$
where u1−𝜖∕2 is the 1 − 𝜖∕2 fractile of the standard normal distribution. More prop-
erties of the estimator may be found in Aalen et al. (2008).
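A sketch of how (14.50)–(14.52) may be computed with base R for the dataset in Example 14.12 (sorted data without ties); the variance term mirrors (14.51), guarded against division by zero for the last at-risk count:

survtime <- c(31.7, 39.2, 57.5, 65.0, 65.8, 70.0, 75.0, 75.2,
              87.5, 88.3, 94.2, 101.7, 105.8, 109.2, 110.0, 130.0)
status <- c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1)
n <- length(survtime)
nj <- n:1                          # at-risk counts for sorted data, no ties
Z <- cumsum(status / nj)           # Nelson-Aalen estimate (14.50)
v <- ifelse(nj > 1, status * (nj - status) / ((nj - 1) * nj^2), 0)
s2 <- cumsum(v)                    # variance estimate (14.51)
cbind(Z, lower = Z - qnorm(0.95) * sqrt(s2),
         upper = Z + qnorm(0.95) * sqrt(s2))   # 90% interval (14.52)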

14.7 Total-Time-on-Test Plotting

A TTT plot is an alternative – but also a supplement – to the Kaplan–Meier and
Nelson–Aalen plots.

14.7.1 Total-Time-on-Test Plot for Complete Datasets


Assume that we have a complete and ordered dataset t(1) < t(2) < ⋯ < t(n) of
independent lifetimes with continuous distribution function F(t) that is strictly
increasing for F⁻¹(0) = 0 < t < F⁻¹(1). Further, it is assumed that the distribution
has finite mean 𝜇.
The TTT at time t, 𝒯(t), has earlier been defined as
$$\mathcal{T}(t) = \sum_{j=1}^{i} t_{(j)} + (n-i)\, t \quad \text{for } i = 0, 1, \dots, n \text{ and } t_{(i)} \le t < t_{(i+1)}, \tag{14.53}$$
where t(0) is defined to be equal to 0 and t(n+1) = +∞.
𝒯(t) is the total observed lifetime of the n items at time t. We assume that all the
n items are put into operation at time t = 0 and that the observation is terminated
at time t. In the time interval (0, t], a number, i, of the items have failed. The total
functioning time of these i items is $\sum_{j=1}^{i} t_{(j)}$. The remaining n − i items survive the
time interval (0, t]. The total functioning time of these n − i items is thus (n − i)t.
The TTT at the ith failure is
$$\mathcal{T}(t_{(i)}) = \sum_{j=1}^{i} t_{(j)} + (n-i)\, t_{(i)} \quad \text{for } i = 1, 2, \dots, n. \tag{14.54}$$

In particular,
$$\mathcal{T}(t_{(n)}) = \sum_{j=1}^{n} t_{(j)} = \sum_{j=1}^{n} t_j.$$
The TTT at the ith failure, 𝒯(t(i)), may be scaled by dividing by 𝒯(t(n)). The scaled
TTT at time t is defined as 𝒯(t)∕𝒯(t(n)).
If we plot the points
$$\Bigl(\frac{i}{n},\ \frac{\mathcal{T}(t_{(i)})}{\mathcal{T}(t_{(n)})}\Bigr) \quad \text{for } i = 1, 2, \dots, n, \tag{14.55}$$
we obtain the TTT plot of the dataset.

Example 14.13 Suppose that we have activated 10 identical items and observed
their lifetimes (in hours):

6.3 11.0 21.5 48.4 90.1


120.2 163.0 182.5 198.0 219.0

To construct the TTT plot for this (complete) dataset, calculate the necessary
quantities and put them in a table as done in Table 14.5. The TTT plot for this
(complete) dataset is shown in Figure 14.19. ◻
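The quantities in Table 14.5 and the TTT plot itself may be computed with a few lines of base R, as in the following sketch:

t <- c(6.3, 11.0, 21.5, 48.4, 90.1, 120.2, 163.0, 182.5, 198.0, 219.0)
n <- length(t)
ts <- sort(t)
TTT <- cumsum(ts) + (n - seq_len(n)) * ts   # T(t_(i)) from (14.54)
plot(seq_len(n) / n, TTT / TTT[n], type = "b",
     xlab = "i/n", ylab = "Scaled TTT")
abline(0, 1, lty = 2)   # diagonal: expected shape for exponential data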

To be able to interpret the shape of the TTT plot, we need the following results,
which we state without proofs.

(1) Let U1, U2, … , Un−1 be independent random variables with a uniform distri-
    bution over (0, 1] (i.e. Ui ∼ unif(0, 1)). If the underlying life distribution is
    exponential, the random variables
$$\frac{\mathcal{T}(T_{(1)})}{\mathcal{T}(T_{(n)})},\ \frac{\mathcal{T}(T_{(2)})}{\mathcal{T}(T_{(n)})},\ \dots,\ \frac{\mathcal{T}(T_{(n-1)})}{\mathcal{T}(T_{(n)})} \tag{14.56}$$
    have the same joint distribution as the (n − 1) ordered variables U(1), U(2), … ,
    U(n−1). For a proof, see Barlow and Campo (1975).
(2) If the underlying life distribution F(t) is exponential, then
    (a) var[𝒯(T(i))∕𝒯(T(n))] is finite
    (b) E[𝒯(T(i))∕𝒯(T(n))] = i∕n for i = 1, 2, … , n

Table 14.5 TTT estimates for the dataset in Example 14.13.

i    t(i)     Σt(j)     𝒯(t(i)) = Σt(j) + (n − i)t(i)      i∕n    𝒯(t(i))∕𝒯(t(n))
1    6.3      6.3       6.3 + 9⋅6.3 = 63.0                 0.1    0.06
2    11.0     17.3      17.3 + 8⋅11.0 = 105.3              0.2    0.10
3    21.5     38.8      38.8 + 7⋅21.5 = 189.3              0.3    0.18
4    48.4     87.2      87.2 + 6⋅48.4 = 377.6              0.4    0.36
5    90.1     177.3     177.3 + 5⋅90.1 = 627.8             0.5    0.59
6    120.2    297.5     297.5 + 4⋅120.2 = 778.3            0.6    0.73
7    163.0    460.5     460.5 + 3⋅163.0 = 949.5            0.7    0.90
8    182.5    643.0     643.0 + 2⋅182.5 = 1008.0           0.8    0.95
9    198.0    841.0     841.0 + 1⋅198.0 = 1039.0           0.9    0.98
10   219.0    1060.0    1060.0 + 0 = 1060.0                1.0    1.00


Figure 14.19 TTT plot of the data in Example 14.13.

If the underlying life distribution is exponential, we should, from (14.56), expect
that for large n
$$\frac{\mathcal{T}(T_{(i)})}{\mathcal{T}(T_{(n)})} \approx \frac{i}{n} \quad \text{for } i = 1, 2, \dots, (n-1).$$
As this is not the case for the TTT plot in Figure 14.19, we conclude that the under-
lying life distribution for the data in Example 14.13 is probably not exponential.
To decide from a TTT plot whether or not the corresponding life distribution is
increasing failure rate (IFR) or decreasing failure rate (DFR), we need a little more
theory. We will be content with a heuristic argument.8
We claim that
$$\mathcal{T}(t_{(i)}) = n \int_0^{t_{(i)}} [1 - F_n(u)]\, du, \tag{14.57}$$
where Fn(t) is the empirical distribution function. Assertion (14.57) can be proved
in the following way (remember that per definition t(0) = 0):
$$n \int_0^{t_{(i)}} [1 - F_n(u)]\, du$$

8 A rigorous treatment is found, for example, in Barlow and Campo (1975).



$$= n \sum_{j=1}^{i} \int_{t_{(j-1)}}^{t_{(j)}} \Bigl(1 - \frac{j-1}{n}\Bigr) du = \sum_{j=1}^{i} (n - j + 1)(t_{(j)} - t_{(j-1)})$$
$$= n t_{(1)} + (n-1)(t_{(2)} - t_{(1)}) + \cdots + (n - i + 1)(t_{(i)} - t_{(i-1)})
= \sum_{j=1}^{i} t_{(j)} + (n-i)\, t_{(i)} = \mathcal{T}(t_{(i)}).$$

We now come to the heuristic part of the argument. First, let n equal 2m + 1, where
m is an integer. Then t(m+1) is the median of the dataset. What happens to the
integral
$$\int_0^{t_{(m+1)}} [1 - F_n(u)]\, du \quad \text{when } m \to \infty?$$
When m → ∞, we can expect that Fn(u) → F(u), and that t(m+1) → {median of F} =
F⁻¹(1∕2), and therefore that
$$\frac{1}{n}\,\mathcal{T}(t_{(m+1)}) \to \int_0^{F^{-1}(1/2)} [1 - F(u)]\, du. \tag{14.58}$$
Next, let n = 4m + 3. In this case, t(2m+2) is the median of the data, and t(m+1) and
t(3m+3) are the lower and upper quartiles, respectively.
When m → ∞, by arguing as we did above, we can expect the following:
$$\frac{1}{n}\,\mathcal{T}(t_{(m+1)}) \to \int_0^{F^{-1}(1/4)} [1 - F(u)]\, du$$
$$\frac{1}{n}\,\mathcal{T}(t_{(2m+2)}) \to \int_0^{F^{-1}(1/2)} [1 - F(u)]\, du \tag{14.59}$$
$$\frac{1}{n}\,\mathcal{T}(t_{(3m+3)}) \to \int_0^{F^{-1}(3/4)} [1 - F(u)]\, du.$$
In addition, we have that
$$E(T) = \mu = \int_0^{\infty} [1 - F(u)]\, du = \int_0^{F^{-1}(1)} [1 - F(u)]\, du. \tag{14.60}$$
When n → ∞, we can therefore expect that
$$\frac{1}{n}\sum_{i=1}^{n} t_i = \frac{1}{n}\,\mathcal{T}(t_{(n)}) \to \int_0^{F^{-1}(1)} [1 - F(u)]\, du. \tag{14.61}$$


Figure 14.20 The TTT transform of the distribution F.

The integrals that we obtain as limits by this approach seem to be of interest, and
we will look at them more closely. They are all of the type
$$\int_0^{F^{-1}(v)} [1 - F(u)]\, du \quad \text{for } 0 \le v \le 1.$$

The Total-Time-on-Test Transform


We now introduce the TTT transform of the distribution F(t) as
$$H_F^{-1}(v) = \int_0^{F^{-1}(v)} [1 - F(u)]\, du \quad \text{for } 0 \le v \le 1. \tag{14.62}$$
The TTT transform of the distribution F(t) is shown in Figure 14.20. Observe that
HF⁻¹(v) is the "area" under the survivor function R(t) between t = 0 and t = F⁻¹(v).
It can be shown under assumptions of general nature that there is a one-to-one
correspondence between a distribution F(t) and its TTT transform HF⁻¹(v) (see Bar-
low and Campo 1975).
Observe from (14.62) that
$$H_F^{-1}(1) = \int_0^{F^{-1}(1)} [1 - F(u)]\, du = \mu. \tag{14.63}$$
The scaled TTT transform of F(t) is defined as
$$\varphi_F(v) = \frac{H_F^{-1}(v)}{H_F^{-1}(1)} = \frac{1}{\mu}\, H_F^{-1}(v) \quad \text{for } 0 \le v \le 1. \tag{14.64}$$

Example 14.14 (Exponential distribution)


The distribution function of the exponential distribution is
$$F(t) = 1 - e^{-\lambda t} \quad \text{for } t \ge 0,\ \lambda > 0,$$
and hence
$$F^{-1}(v) = -\frac{1}{\lambda}\log(1 - v) \quad \text{for } 0 \le v \le 1.$$
Thus, the TTT transform of the exponential distribution is
$$H_F^{-1}(v) = \int_0^{-\log(1-v)/\lambda} e^{-\lambda u}\, du = \Bigl[-\frac{1}{\lambda} e^{-\lambda u}\Bigr]_0^{-\log(1-v)/\lambda}
= \frac{1}{\lambda} - \frac{1}{\lambda}(1 - v) = \frac{v}{\lambda} \quad \text{for } 0 \le v \le 1.$$
Further,
$$H_F^{-1}(1) = \frac{1}{\lambda}.$$
The scaled TTT transform for the exponential distribution is therefore
$$\varphi_F(v) = \frac{v/\lambda}{1/\lambda} = v \quad \text{for } 0 \le v \le 1. \tag{14.65}$$
The scaled TTT transform of the exponential distribution is thus a straight line
from (0, 0) to (1, 1), as shown in Figure 14.21. ◻


Figure 14.21 Scaled TTT transform of the exponential distribution (Example 14.14).

Example 14.15 (Weibull distribution)


It is usually not straightforward to determine the TTT transform of a life distribu-
tion. We illustrate this by trying to determine the TTT transform of the Weibull
distribution
$$F(t) = 1 - \exp\Bigl[-\Bigl(\frac{t}{\theta}\Bigr)^{\alpha}\Bigr] \quad \text{for } t \ge 0,\ \theta > 0,\ \alpha > 0.$$
The inverse function of F is
$$F^{-1}(v) = \theta\, [-\log(1-v)]^{1/\alpha} \quad \text{for } 0 \le v \le 1.$$
The TTT transform of the Weibull distribution is
$$H_F^{-1}(v) = \int_0^{F^{-1}(v)} [1 - F(u)]\, du = \int_0^{\theta[-\log(1-v)]^{1/\alpha}} e^{-(u/\theta)^{\alpha}}\, du.$$
By substituting x = (u∕𝜃)^𝛼, we obtain
$$H_F^{-1}(v) = \frac{\theta}{\alpha} \int_0^{-\log(1-v)} x^{1/\alpha - 1} e^{-x}\, dx, \tag{14.66}$$
which shows that the TTT transform of the Weibull distribution may be expressed
by the incomplete gamma function. However, several approximation formulas are
available.
The mean time-to-failure (MTTF) is obtained by inserting v = 1 in HF⁻¹(v):
$$H_F^{-1}(1) = \frac{\theta}{\alpha} \int_0^{\infty} x^{1/\alpha - 1} e^{-x}\, dx = \frac{\theta}{\alpha}\, \Gamma\Bigl(\frac{1}{\alpha}\Bigr) = \theta\, \Gamma\Bigl(\frac{1}{\alpha} + 1\Bigr),$$
which coincides with the result we obtained in (5.67). Observe that the scaled
TTT transform of the Weibull distribution depends only on the shape parame-
ter 𝛼 and is independent of the scale parameter 𝜃. Scaled TTT transforms of the
Weibull distribution for some selected values of the shape parameter 𝛼 are shown
in Figure 14.22. ◻
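Because the factor (𝜃∕𝛼)Γ(1∕𝛼) appears in both HF⁻¹(v) and HF⁻¹(1), the scaled TTT transform reduces to the regularized incomplete gamma function, which is available in R as pgamma(). A sketch (the curve for 𝛼 = 2 should resemble the corresponding curve in Figure 14.22):

# Scaled TTT transform of the Weibull distribution, cf. (14.66)
phi_weibull <- function(v, alpha) pgamma(-log(1 - v), shape = 1 / alpha)
curve(phi_weibull(x, alpha = 2), from = 0, to = 0.999,
      xlab = "v", ylab = "Scaled TTT transform")
abline(0, 1, lty = 2)   # alpha = 1 (exponential) gives the diagonal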

Three Useful Results


We now list three useful results and indicate a proof.
(1) If F(t) is a continuous life distribution that is strictly increasing for F⁻¹(0) =
    0 < t < F⁻¹(1), then
$$\frac{d}{dv}\, H_F^{-1}(v)\Big|_{v = F(t)} = \frac{1}{z(t)}, \tag{14.67}$$
    where z(t) is the failure rate of the distribution F(t).

Figure 14.22 Scaled TTT transforms of the Weibull distribution for some selected
values of 𝛼 (𝛼 = 0.3, 0.5, 0.8, 2, 3, and 5).

Proof:
Because
$$\frac{d}{dv}\, H_F^{-1}(v) = \frac{d}{dv} \int_0^{F^{-1}(v)} [1 - F(u)]\, du
= (1 - F[F^{-1}(v)])\, \frac{d}{dv} F^{-1}(v) = (1 - v)\, \frac{1}{f[F^{-1}(v)]},$$
then
$$\frac{d}{dv}\, H_F^{-1}(v)\Big|_{v = F(t)} = [1 - F(t)]\, \frac{1}{f(t)} = \frac{1}{z(t)}.$$
From (14.67) we obtain:
(2) If F(t) is a continuous life distribution, strictly increasing for F⁻¹(0) = 0 < t <
    F⁻¹(1), then
    (a) F ∼ IFR ⟺ HF⁻¹(v) is concave for 0 ≤ v ≤ 1
    (b) F ∼ DFR ⟺ HF⁻¹(v) is convex for 0 ≤ v ≤ 1

The arguments used to prove (a) and (b) are completely analogous. We
therefore prove only (a).
Proof:
F ∼ IFR ⟺ z(t) is nondecreasing in t
⟺ 1∕z(t) is nonincreasing in t
⟺ (d∕dv) HF⁻¹(v)|v=F(t) is nonincreasing in t
⟺ (d∕dv) HF⁻¹(v) is nonincreasing in v
   (because F(t) is strictly increasing)
⟺ HF⁻¹(v) is concave for 0 ≤ v ≤ 1.

To estimate the scaled TTT transform of F(t) for different v values on the basis
of the observed lifetimes, it is natural to use the estimator
$$\frac{\int_0^{F_n^{-1}(v)} [1 - F_n(u)]\, du}{\int_0^{F_n^{-1}(1)} [1 - F_n(u)]\, du} \quad \text{for } v = \frac{i}{n},\ i = 1, 2, \dots, n. \tag{14.68}$$
Introducing the notation
$$H_n^{-1}(v) = \int_0^{F_n^{-1}(v)} [1 - F_n(u)]\, du \quad \text{for } v = \frac{i}{n},\ i = 1, 2, \dots, n, \tag{14.69}$$
this estimator can be written as
$$\frac{H_n^{-1}(v)}{H_n^{-1}(1)} \quad \text{for } v = \frac{i}{n},\ i = 1, 2, \dots, n. \tag{14.70}$$
By comparing (14.70) with (14.64), it seems natural to call Hn⁻¹(v)∕Hn⁻¹(1) the
empirical, scaled TTT transform of the distribution F(t).
The following result is useful when we wish to exploit the TTT plot to provide
information about the life distribution F(t):
(3) If F(t) is a continuous life distribution function, strictly increasing for F⁻¹(0) =
    0 < t < F⁻¹(1), then
$$\frac{H_n^{-1}(i/n)}{H_n^{-1}(1)} = \frac{\mathcal{T}(t_{(i)})}{\mathcal{T}(t_{(n)})} \quad \text{for } i = 1, 2, \dots, n, \tag{14.71}$$
    where 𝒯(t(i)), as before, is the TTT at time t(i).



Proof:
According to (14.69), for i = 1, 2, … , n,
$$H_n^{-1}\Bigl(\frac{i}{n}\Bigr) = \int_0^{F_n^{-1}(i/n)} [1 - F_n(u)]\, du = \int_0^{t_{(i)}} [1 - F_n(u)]\, du = \frac{1}{n}\,\mathcal{T}(t_{(i)}),$$
whereas
$$H_n^{-1}(1) = \int_0^{F_n^{-1}(1)} [1 - F_n(u)]\, du = \frac{1}{n}\,\mathcal{T}(t_{(n)}) = \frac{1}{n}\sum_{i=1}^{n} t_i = \bar t.$$
By introducing these results in (14.70), we get (14.71).

Therefore, the scaled TTT at time t(i) seems to be a natural estimate of the scaled
TTT transform of F(t) for v = i∕n, for i = 1, 2, … , n. One way of obtaining an esti-
mate of the scaled TTT transform for (i − 1)∕n < v < i∕n is to apply linear
interpolation between the estimates for v = (i − 1)∕n and v = i∕n. In the following,
we use this procedure.
Now suppose that we have access to a survival dataset. We first determine
𝒯(t(i))∕𝒯(t(n)) for i = 1, 2, … , n as we did in Example 14.13, plot the points
[i∕n, 𝒯(t(i))∕𝒯(t(n))], and join pairs of neighboring points with straight lines. The
curve obtained is an estimate of HF⁻¹(v)∕HF⁻¹(1) = (1∕𝜇)HF⁻¹(v), for 0 ≤ v ≤ 1.
We may now assess the shape of the curve (the estimate of the scaled TTT
transform) in the light of the result in (14.67) and its proof, and in this way obtain
information about the underlying distribution F(t).
A plot, such as the one shown in Figure 14.23a, shows that HF−1 (𝑣) is concave.
The plot therefore indicates that the corresponding life distribution F(t) is IFR.
Using the same type of argument, the plot in Figure 14.23b shows that HF−1 (𝑣) is
convex, so that the corresponding life distribution F(t) is DFR. Similarly, the plot
in Figure 14.23c indicates that HF−1 (𝑣) “is first convex” and “thereafter concave.”
In other words, the failure rate of the corresponding lifetime distribution has a
bathtub shape.
The TTT plot obtained in Example 14.13 therefore indicates that these data orig-
inate from a life distribution with a bathtub-shaped failure rate.

Example 14.16 (Ball bearing failures)


Lieblein and Zelen (1956) provide the numbers of millions of revolutions to failure
for each of 23 ball bearings. Below, the original data are put in numerical order for
convenience.


Figure 14.23 TTT plots indicating (a) increasing failure rate (IFR), (b) decreasing failure
rate (DFR), and (c) bathtub-shaped failure rate.

17.88 28.92 33.00 41.52 42.12 45.60 48.40


51.84 51.96 54.12 55.56 67.80 68.64 68.64
68.88 84.12 93.12 98.64 105.12 105.84 127.92
128.04 173.40

The TTT plot of the ball bearing data is presented in Figure 14.24. The TTT plot
indicates an IFR. We may try to fit a Weibull distribution to the data. The Weibull
parameters 𝛼 and 𝜆 are estimated to be 𝛼 ̂ = 2.10 and 𝜆̂ = 1.22 × 10−2 . The TTT
transform of the Weibull distribution with these parameters is plotted as an overlay
curve to the TTT plot in Figure 14.24. ◻

TTT Plotting with R


The scaled TTT-plot is available in the package AdequacyModel. We illus-
trate its use by the data from Example 14.16. A simple script for Figure 14.24
is:


Figure 14.24 TTT plot of the ball bearing data in Example 14.16 together with an
overlay curve of the TTT transform of the Weibull distribution with shape parameter
𝛼 = 2.10.

library(AdequacyModel)
# Enter the dataset
data <- c(17.88,28.92,33.00,41.52,42.12,45.60,48.40,
          51.84,51.96,54.12,55.56,67.80,68.64,68.64,
          68.88,84.12,93.12,98.64,105.12,105.84,127.92,
          128.04,173.40)
# Make the TTT plot
TTT(data,lwd=1.5,grid=F,lty=3)

If you want to establish the scaled TTT-transform for a particular distribution,


say a 2-parameter Weibull distribution with shape parameter 𝛼 = 3, this can be
obtained by a similar script where the data is a random sample from this distribu-
tion. To get a smooth curve, we need a rather high number of simulated values. A
script to obtain the TTT-transform is

library(AdequacyModel)
# Generate a random sample from a Weibull distribution
data <- rweibull(8000,3,scale=1)
# Make the TTT transform
TTT(data, lwd=1.5,grid=F,lty=3)

Example 14.17 (Age replacement)


A well-known application of the TTT transform and the TTT plot is the age
replacement problem that is discussed in Section 12.3.1. Here, an item is replaced
at a cost c + k at failure, or at a cost c at a planned replacement when the item has
reached a certain age t0.
The average replacement cost per time unit of this policy was found to be
$$C(t_0) = \frac{c + k F(t_0)}{\int_0^{t_0} [1 - F(t)]\, dt}. \tag{14.72}$$
The objective is now to determine the value of t0 that minimizes C(t0). If the distri-
bution function F(t) and all its parameters are known, it is a straightforward task
to determine the optimal value of t0. One way to solve this problem is to apply the
TTT transform.
Because $\int_0^{t_0} [1 - F(t)]\, dt = H_F^{-1}[F(t_0)]$, introducing the TTT transform (14.62) yields
$$C(t_0) = \frac{c + k F(t_0)}{H_F^{-1}[F(t_0)]} = \frac{1}{H_F^{-1}(1)}\, \frac{c + k F(t_0)}{\varphi_F[F(t_0)]},$$
where HF⁻¹(1) is the MTTF of the item, and 𝜑F(v) = HF⁻¹(v)∕HF⁻¹(1) is the scaled
TTT transform of the distribution function F(t).
The optimal value of t0 may be determined by first finding the value v0 = F(t0)
that minimizes
$$C_1(v_0) = \frac{c + k v_0}{\varphi_F(v_0)},$$
and thereafter determining t0 such that v0 = F(t0). The minimizing value of v0 may
be found by setting the derivative of C1(v0) with respect to v0 equal to zero, and
solving the equation for v0:
$$\frac{d}{dv_0}\, C_1(v_0) = \frac{k\, \varphi_F(v_0) - \varphi_F'(v_0)(c + k v_0)}{\varphi_F(v_0)^2} = 0.$$
This implies that
$$\varphi_F'(v_0) = \frac{\varphi_F(v_0)}{c/k + v_0}. \tag{14.73}$$
The optimal value of 𝑣0 , and hence t0 , may now be determined by the following
simple graphical method.


Figure 14.25 Determination of the optimal replacement age from the scaled TTT
transform.

(1) Draw the scaled TTT transform in a 1 × 1 –coordinate system.


(2) Identify the point (−c∕k, 0) on the abscissa axis.
(3) Draw a tangent from (−c∕k, 0) to the TTT transform.

The optimal value of 𝑣0 can now be read as the abscissa of the point where
the tangent touches the TTT transform. If 𝑣0 = 1, then t0 = ∞, and no preventive
replacements should be performed. The procedure is shown in Figure 14.25.
When a set of times-to-failure of the actual type of item has been recorded, we
may use this dataset to obtain the empirical, scaled TTT transform of the underly-
ing distribution function F(t), and draw a TTT plot. The optimal replacement age t0
may now be determined by the same procedure as described above. This is shown
in Figure 14.26. The procedure is further discussed, for example, by Bergman and
Klefsjö (1982, 1984). ◻
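The graphical method may also be mimicked numerically. The following sketch finds the optimal v0 (and t0) for a Weibull distribution, using the scaled TTT transform from the earlier pgamma() sketch and hypothetical parameter and cost values:

# Age replacement: minimize C1(v) = (c + k*v) / phi(v), cf. (14.73)
alpha <- 2; theta <- 1          # hypothetical Weibull parameters
c_cost <- 1; k_cost <- 10       # hypothetical costs, c/k = 0.1
phi <- function(v) pgamma(-log(1 - v), shape = 1 / alpha)
C1 <- function(v) (c_cost + k_cost * v) / phi(v)
v0 <- optimize(C1, interval = c(1e-6, 1 - 1e-6))$minimum
t0 <- theta * (-log(1 - v0))^(1 / alpha)   # t0 = F^{-1}(v0)
c(v0, t0)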

14.7.2 Total-Time-on-Test Plot for Censored Datasets


When the dataset is incomplete with random censoring (type IV), we may argue as
follows to obtain a TTT plot: The TTT transform, as defined in (14.62), is valid for a
wide range of distribution functions F(t), also for step functions. Instead of estimat-
ing the TTT transform HF⁻¹(v) by introducing the empirical distribution function
Fn(t) as we did in (14.69), we may estimate F(t) by [1 − R̂(t)], where R̂(t) is the
Kaplan–Meier estimator of R(t).


Figure 14.26 Determination of the optimal replacement age from a TTT plot.

Technically, the plot is obtained as follows: Let t(1), t(2), … , t(k) denote the k
ordered failure times among t1, t2, … , tn, and let
$$v_{(i)} = 1 - \hat R(t_{(i)}) \quad \text{for } i = 1, 2, \dots, k.$$
Define
$$\hat H^{-1}(v_{(i)}) = \int_0^{t_{(i)}} \hat R(u)\, du = \sum_{j=0}^{i-1} (t_{(j+1)} - t_{(j)})\, \hat R(t_{(j)}),$$
where t(0) = 0.
The TTT plot is now obtained by plotting the points
$$\Bigl(\frac{v_{(i)}}{v_{(k)}},\ \frac{\hat H^{-1}(v_{(i)})}{\hat H^{-1}(v_{(k)})}\Bigr) \quad \text{for } i = 1, 2, \dots, k.$$
Observe that when k = n, that is, when the dataset is complete, then
$$v_{(i)} = \frac{i}{n}, \qquad \hat H^{-1}(v_{(i)}) = \frac{1}{n}\,\mathcal{T}(t_{(i)}),$$
and we get the same TTT plot as we got for complete datasets.

14.7.3 A Brief Comparison


Sections 14.5–14.7 present three nonparametric estimation and plotting tech-
niques that may be applied to both complete and censored data. (The empirical

survivor function is equal to the Kaplan–Meier estimate when the dataset is


complete, and is therefore considered as a special case of the Kaplan–Meier
approach.) The estimates obtained by using the Kaplan–Meier and the
Nelson–Aalen approaches are rather similar, so it is not important which of
these is chosen. The nature of the estimate based on the TTT transform is different
from the other two estimates and may provide supplementary information.
The plots may also be used as a basis for selection of an adequate parametric
distribution F(t). In this respect, the three plots provide somewhat different infor-
mation. The Kaplan–Meier plot is very sensitive to variations in the early and
middle phases of an item’s lifetime, but is not very sensitive in the right tail of
the distribution. The Nelson–Aalen plot is not at all sensitive in the early part of
the life distribution, because the plot is “forced” to start in (0, 0). The TTT plot is
very sensitive in the middle phase of the life distribution, but less sensitive in the
early phase and in the right tail, because the plot is “forced” to start in (0, 0) and
end up in (1, 1). To get adequate information about the whole distribution, all the
three plots should be studied.

14.8 Survival Analysis with Covariates


The reliability of items is often found to be influenced by one or more covariates.
Covariates and various models applying covariates were introduced in Section 5.5.
This section sheds some light on how to analyze data with different covariate lev-
els. This is a huge and complicated area, so we only scratch the surface of this
topic.
We assume that all covariates are measurable, either on a continuous scale, a
discrete scale, or simply as “yes” or “no.” We further assume that all covariates
remain constant during the data collection exercise.

14.8.1 Proportional Hazards Model


In a proportional hazards (PH) model, the failure rate function z(t) is modified by a
factor g(s), where s is the covariate vector. The term "hazard" is here used with the
same meaning as failure rate. We could therefore talk about proportional failure
rates instead of PH, but PH is the standard term used in most other application
areas, such as biostatistics and medical research.
The PH model assumes that the failure rate function related to a specific covari-
ate vector s may be written as
$$z(t \mid \mathbf{s}) = z_0(t)\, g(\mathbf{s}). \tag{14.74}$$
The failure rate function z(t ∣ s) is seen to be the product of two factors:

(1) A time-dependent factor z0 (t), which is called the baseline failure rate and does
not depend on s. The baseline failure rate is usually not specified in the PH
model.
(2) A proportionality factor g(s), which is a function of the covariate vector s, and
not of time t.

Hazard Ratio
We may compare the effects of two covariate vectors s1 and s0 by the ratio
$$\mathrm{HR}(\mathbf{s}_1, \mathbf{s}_0) = \frac{z(t \mid \mathbf{s}_1)}{z(t \mid \mathbf{s}_0)} = \frac{g(\mathbf{s}_1)}{g(\mathbf{s}_0)}. \tag{14.75}$$
This expression is called the hazard ratio (HR) for the covariate vectors s1 and s0.
The covariate vector s0 often refers to a basic and known application of the item,
called the baseline application, whereas the covariate vector s1 refers to the use of a
similar item in a new environment. The hazard ratio shows that the two failure rate
functions are proportional for any value of t. This proportionality is the reason for
calling the model a PH model. The factor of interest is how large g(s1) is compared
to g(s0), and not the value of each of them. Therefore, we often set g(s0) = 1, such
that g(s1) = HR(s1, s0).
In cases where g(s0) = 1 and we study a single alternative covariate vector, this
vector is usually denoted s (i.e. without an index), and the hazard ratio is HR(s) =
g(s).
The effect of the covariate vector s is, therefore, determined by g(s), which
scales the baseline failure rate function z0 (t). Figure 14.27 shows a baseline failure
rate function z0 (t) for a Weibull distribution with shape parameter 𝛼 = 1.65 (fully
drawn line) together with the failure rate function (dotted line) for an item with
covariate vector s and hazard ratio g(s) = 2 based on a PH model. The failure rate
function for s is obtained by multiplying z0 (t) with HR = 2 for each point of time t.


Figure 14.27 Failure rate function for the PH model. The baseline failure rate function
(fully drawn line) and for another condition with hazard ratio (HR)= 2 (dotted line). The
baseline is a Weibull distribution with shape parameter 𝛼 = 1.65.
14.8 Survival Analysis with Covariates 725

Cumulative Failure Rate


In the PH model, the cumulative failure rate $Z(t) = \int_0^t z(u)\, du$ is
$$Z(t \mid \mathbf{s}) = Z_0(t)\, g(\mathbf{s}). \tag{14.76}$$

Survivor Function
Let R0 (t) be the survivor function for the baseline application. The survivor func-
tion for a new application with covariate vector s is from (14.74)

R(t ∣ s) = exp[−Z(t ∣ s)] = exp[−Z0 (t)g(s)] = [R0 (t)]g(s) . (14.77)

This result implies that if we know the survivor function in the baseline applica-
tion and if we are able to determine the hazard ratio, g(s), it is easy to find the sur-
vivor function – and all the related reliability measures – in the new environment s.
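As a small illustration, assuming a Weibull baseline with shape parameter 1.65 as in Figure 14.27, relation (14.77) may be coded directly:

# Survivor function under a PH model with hazard ratio g(s) = 2
R0 <- function(t) exp(-t^1.65)       # assumed Weibull baseline, scale = 1
Rs <- function(t, hr = 2) R0(t)^hr   # relation (14.77)
curve(R0(x), from = 0, to = 4, xlab = "Time t", ylab = "R(t)")
curve(Rs(x), add = TRUE, lty = 2)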

Example 14.18 (Exponential distribution)


Consider at item with constant failure rate. During normal operation in a base-
line environment, the failure rate is 𝜆0 . The assumption of constant failure rate is
considered realistic also for an alternative environment with covariate vector s. A
simple model for describing the failure rate in this environment is
𝜆(s) = (∏_{i=1}^{m} k_i^{s_i}) 𝜆0,

where ki is a constant that determines the effect of si on the failure rate, for i =
1, 2, … , m. To comply with our knowledge about the effect of the various influ-
ences, the covariates used may be transformed values of the physical variables.
The square of the voltage may, for example, be used as a covariate. ◻
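A small numerical sketch of this model in R, with purely illustrative values for 𝜆0, ki, and si (none of them from the book):

```r
# Sketch: constant failure rate in a new environment,
# lambda(s) = (product of k_i^(s_i)) * lambda0.
lambda0 <- 2e-6          # assumed baseline failure rate (per hour)
k <- c(1.5, 3.0, 1.2)    # hypothetical constants k_i
s <- c(1, 0, 2)          # hypothetical covariate vector (0 = baseline level)
lambda_s <- prod(k^s) * lambda0
lambda_s                 # equals exp(sum(s * log(k))) * lambda0
```

Note that prod(k^s) can also be written as exp(∑ si log ki), which is exactly the exponential form used in the Cox model below.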

Example 14.19 (The MIL-HDBK-217 prediction method)


The MIL-HDBK-217F (1995) has for a long time been the state-of-the-art approach
for predicting the constant failure rate of an electronic item that is used under
non-baseline conditions. Let 𝜆0 be the constant failure rate when the item is used
under baseline conditions. For these conditions, 𝜆0 can be estimated from data
obtained from laboratory testing or from field data. The MIL-HDBK-217F suggests
that the failure rate 𝜆 under the non-baseline conditions is determined as follows:

𝜆 = 𝜆0 ⋅ 𝜋 S ⋅ 𝜋 T ⋅ 𝜋 E ⋅ 𝜋 Q ⋅ 𝜋 A , (14.78)

where
𝜋S is the stress factor,
𝜋T is the temperature factor,
𝜋E is the environment factor,
𝜋Q is the quality factor, and
𝜋A is the adjustment factor.
These factors may be found in the handbook when we know the conditions the
item is used in. The MIL-HDBK-217F therefore applies a simple PH approach.
The MIL-HDBK-217F is discussed further in Chapter 16. ◻
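A minimal sketch of (14.78) in R, with hypothetical values for the 𝜋 factors (real values must be looked up in the handbook for the actual conditions of use):

```r
# Sketch: MIL-HDBK-217F-type failure rate prediction, eq. (14.78).
lambda0 <- 0.5e-6                        # assumed baseline failure rate
pi_S <- 1.2; pi_T <- 2.0; pi_E <- 4.0    # stress, temperature, environment
pi_Q <- 1.0; pi_A <- 1.5                 # quality, adjustment
lambda <- lambda0 * pi_S * pi_T * pi_E * pi_Q * pi_A
lambda                                   # predicted failure rate
```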

14.8.2 Cox Models


The Cox model was introduced by the British statistician Sir David Roxbee Cox in
his famous paper “Regression models and life tables” (Cox 1972). The Cox model
is a special case of a PH model, where the failure rate function is written as
z(t ∣ s) = z0(t) e^{𝜷s}. (14.79)
The hazard ratio g(s) of this model is
g(s) = e^{𝜷s} = exp(∑_{i=1}^{k} 𝛽i si),

where 𝜷 = (𝛽1, 𝛽2, …, 𝛽k) is a vector of unknown parameters that need to be estimated from observed data.
Consider two different stress levels: a baseline application with s0 and a new
application with s. It is common practice to set s0 = 0 for the baseline application
and to measure the covariates s as the difference from the baseline application. It
is further common to scale the function g(⋅) such that g(s0 ) = 1. The hazard ratio
of this Cox model is
z(t ∣ s)/z(t ∣ 0) = exp(𝜷s) = exp(∑_{j=1}^{k} 𝛽j sj).

For the Cox model, the log-failure rate function is a linear function
log z(t ∣ s) = log z0(t) + 𝛽1 s1 + 𝛽2 s2 + · · · + 𝛽k sk. (14.80)
This indicates that (14.80) may be a suitable basis for some sort of regression anal-
ysis.
The Cox model is said to be a semiparametric model. It is not a parametric model
because the baseline failure rate z0(t) is unspecified, and it is not nonparametric because the model specifies how the failure rate function varies with the values of the covariates. If we make special assumptions about the baseline failure rate function z0(t), the Cox model becomes a parametric model. The baseline distribution
may, for example, be assumed to be an exponential or a Weibull distribution. The
advantage of the Cox model is that such assumptions can be avoided. Even though
z0(t) is unspecified, our objective is to estimate the parameters 𝜷. One of the biggest advantages of the Cox model is that we can estimate the parameters 𝜷, which reflect the effects of the covariates, without having to make any assumptions about the form of z0(t).

14.8.3 Estimating the Parameters of the Cox Model


A thorough introduction to the theory required to estimate the parameters (𝜷) of
the Cox model would involve several new concepts and is considered to be outside
the scope of this book. The theory described by Cox (1972) is elaborated in several
books (e.g. see Cox and Oakes 1984, Ansell and Phillips 1994, Crowder et al. 1991,
Kalbfleisch and Prentice 1980, Lawless 1982). Theoretical introductions and sur-
veys may also be found in a high number of presentations and lecture notes that
are available on the Internet.
Here, we give only a simple introduction, where we highlight some of
the main concepts. We start with a right-censored dataset of survival times
t = (t1 , t2 , … , tn ) from n independent and identical items used in different envi-
ronments. All the survival times are measured from time t = 0. As before, we use
the indicator 𝛿i to tell whether the survival time ended with a failure (𝛿i = 1) or
with a censoring (𝛿i = 0), for i = 1, 2, … , n.
From (14.21), the likelihood function may be written as

L(𝜷 ∣ data) = ∏_{i=1}^{n} [z(ti ∣ 𝜷, si)]^{𝛿i} R(ti ∣ 𝜷, si),

where z(⋅) and R(⋅) are seen as functions of 𝜷 and “data” includes all the data avail-
able in the dataset, including ti , 𝛿i , and si for all the items. For the Cox model, the
likelihood function may be written as

L(𝜷 ∣ data) = ∏_{i=1}^{n} [z0(ti) exp(𝜷si)]^{𝛿i} [R0(ti)]^{exp(𝜷si)}.

The corresponding log-likelihood function is



𝓁(𝜷 ∣ data) = ∑_{i=1}^{n} (𝛿i [log z0(ti) + 𝜷si] + exp(𝜷si) log R0(ti)).

It is not possible to find the 𝜷 that maximizes this log-likelihood function unless
we specify the baseline failure rate function z0 (t). A detailed discussion of this
problem is given in Cox and Oakes (1984, chapter 7). Instead, Cox (1972) intro-
duced a partial likelihood function that does not depend on z0 (t). This function
uses the at-risk-set RS(t) at time t, that is, the set of all items that are functioning
and exposed to failure just before time t. Items that have failed or have been cen-
sored before time t are not members of RS(t). In this simplified introduction, we
assume that there are no ties in the dataset.

Consider a dataset with n distinct survival times t1 , t2 , … , tn . To each survival


time ti is connected an indicator 𝛿i and a covariate vector si , for i = 1, 2, … , n. Each
covariate vector may be regarded as an observation of a general covariate vector s.
This means that the same covariates are measured for each and every survival time.
Next, the survival times are ordered, such that t1 < t2 < · · · < tn . To establish the
partial likelihood function, Cox (1972) starts by considering the conditional prob-
ability that a specific item, say i [∈ RS(ti )] fails at time ti given that one individual
item from the at-risk-set RS(ti ) fails at time ti .9 If the dataset were complete, this
probability would be
Lp(𝜷 ∣ ti, si) = z(ti ∣ 𝜷, si) / ∑_{j∈RS(ti)} z(ti ∣ 𝜷, sj),

and is the contribution to the partial likelihood [Lp (⋅)] from survival time ti . The
arguments used to arrive at the above result may be summarized as follows:

Pr(Item i fails at time ti ∣ One item from RS(ti) fails at time ti)
= Pr(Item i fails at ti) / Pr(One failure in RS(ti) at ti)
= Pr(Item i fails at ti) / ∑_{j∈RS(ti)} Pr(Item j fails at ti)
≈ [Pr(Item i fails in (ti, ti + Δt))/Δt] / [∑_{j∈RS(ti)} Pr(Item j fails in (ti, ti + Δt))/Δt],
which, when we let Δt → 0, converges to
z(ti ∣ 𝜷, si) / ∑_{j∈RS(ti)} z(ti ∣ 𝜷, sj).

To simplify the notation, we introduce

𝜓i = exp(𝜷si ) for i = 1, 2, … , n,

which is the factor we must multiply the baseline failure rate z0 (t) with to obtain
the failure rate for the covariate vector si, that is, z(t ∣ 𝜷, si) = 𝜓i z0(t). The con-
tribution to the total partial likelihood function from failure time ti can hence be

9 Any item in RS(ti ) would do. We assume that item i corresponds to the ordered survival time
ti , so it is obviously a member of RS(ti ). We focus on item i to simplify the notation.

written as
Lp(𝜷 ∣ ti, si) = 𝜓i / ∑_{j∈RS(ti)} 𝜓j. (14.81)

The total partial likelihood for the complete dataset is then



Lp(𝜷 ∣ data) = ∏_{i=1}^{n} (𝜓i / ∑_{j∈RS(ti)} 𝜓j).

For a right censored dataset, the partial likelihood can be shown to be


Lp(𝜷 ∣ data) = ∏_{i=1}^{n} [𝜓i / ∑_{j∈RS(ti)} 𝜓j]^{𝛿i}, (14.82)

where censored times are excluded by the indicator 𝛿i = 0 (remember that x^0 = 1). The partial likelihood function is obtained by multiplying the contributions (14.81)
from the actual failure times, but the censoring times are still important because
they enter into the at-risk-sets RS(t).
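To make the construction concrete, the following R sketch (not from the book) evaluates the log of the partial likelihood (14.82) for a small hypothetical dataset without ties, with a single covariate, and maximizes it numerically over 𝛽:

```r
# Sketch: log partial likelihood for the Cox model (no ties assumed).
partial_loglik <- function(beta, time, status, s) {
  psi <- as.vector(exp(s %*% beta))   # psi_i = exp(beta s_i)
  ord <- order(time)                  # order the survival times
  status <- status[ord]; psi <- psi[ord]
  n <- length(psi); ll <- 0
  for (i in 1:n) {
    if (status[i] == 1) {             # censored times contribute only
      risk <- sum(psi[i:n])           # through the at-risk set RS(t_i)
      ll <- ll + log(psi[i] / risk)
    }
  }
  ll
}

# Hypothetical data: six items, one 0/1 covariate
time   <- c(3.1, 5.4, 6.2, 8.0, 9.5, 12.3)
status <- c(1, 1, 0, 1, 0, 1)                  # 1 = failure, 0 = censored
s      <- matrix(c(0, 1, 0, 1, 1, 0), ncol = 1)
opt <- optimize(function(b) -partial_loglik(b, time, status, s),
                interval = c(-5, 5))
opt$minimum        # maximum partial-likelihood estimate of beta
```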
There are at least two reasons why Lp (𝜷 ∣ data) is called a partial likelihood:
• It is not a complete likelihood function for all parameters of the density function
(because the baseline failure rate function is not covered).
• Not all the information in the dataset is used, because the actual survival times play no part in (14.81); only their ranking matters, that is, when they enter into the at-risk sets.
More thorough treatments may be found in Cox and Oakes (1984), Lawless
(1982), and Kalbfleisch and Prentice (1980).
When there are many ties in the dataset, computation of maximum partial-
likelihood estimates is still possible but may become time-consuming. For this reason, the partial likelihood function is often approximated. Two commonly
employed approximations are due to Norman E. Breslow and to Bradley Efron.
Both approximations are available in the R survival package.
The procedures to find estimates for the various parameters are rather techni-
cal and are not presented in the current book. Readers who plan to use the Cox
model on a practical dataset are advised to consult a more specialized book and to
carefully read the documentation of the relevant R packages.

Cox Model Analysis with R


The Cox model is available in R by the function coxph in the survival pack-
age. Related aspects are also treated in several other R packages. Among these are
simPH, coxme, Coxnet, coxphw, and several more.
We suggest that you start by learning the function coxph in the survival
package. You have many options when using this package, and it is therefore
important that you read carefully the package documentation.
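The following sketch fits a Cox model to simulated PH data; all variable names and parameter values are hypothetical and chosen only for illustration:

```r
# Sketch: Cox model analysis with coxph() on simulated PH data.
library(survival)

set.seed(123)
n <- 200
temp <- rnorm(n, mean = 50, sd = 10)   # covariate s1 (e.g. temperature)
load <- rbinom(n, 1, 0.5)              # covariate s2 (e.g. high load, 0/1)

# Weibull baseline (alpha = 1.65) with hazard ratio
# psi = exp(0.02 * temp + 0.7 * load), simulated by inverse transform
psi <- exp(0.02 * temp + 0.7 * load)
lifetime <- (-log(runif(n)) / (0.001 * psi))^(1 / 1.65)
cens <- runif(n, 0, quantile(lifetime, 0.9))   # random right-censoring
time <- pmin(lifetime, cens)
status <- as.numeric(lifetime <= cens)         # delta_i: 1 = failure

fit <- coxph(Surv(time, status) ~ temp + load, ties = "efron")
summary(fit)    # estimates of the beta's and hazard ratios exp(beta)
```

The ties = "efron" argument selects Efron's approximation mentioned above; "breslow" is the other common choice.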
Additional theory and several worked examples with R may be found in Moore
(2016) and Fox and Weisberg (2019).

14.9 Problems

14.1 Assume that you have determined the lifetimes for a total of 12 identical
items and obtained the following results (given in hours): 10.2, 89.6, 54.0,
96.0, 23.3, 30.4, 41.2, 0.8, 73.2, 3.6, 28.0, 31.6
The dataset can be downloaded from the book companion site.
(a) Find the sample mean and the sample standard deviation for the
dataset. Can you draw any conclusions about the underlying distri-
bution F(t) by comparing the sample mean and the sample standard
deviation?
(b) Construct the empirical survivor function for the dataset.
(c) Plot the data on a Weibull paper.10 What conclusions can you draw
from the plot?
(d) Construct the TTT plot for the dataset. What conclusion can you draw
from the TTT plot about the corresponding life distribution?

14.2 Failure time data from a compressor were discussed in Example 10.2. All
compressor failures at a certain process plant in the time period from
1968 until 1989 have been recorded. In this period, a total of 90 critical
failures occurred. In this context, a critical failure is defined to be a fail-
ure causing compressor downtime. The compressor is very important for
the operation of the process plant, and every effort is taken to restart a
failed compressor as soon as possible. The 90 repair times (in hours) are
presented chronologically in Table 14.6. The repair time associated with
the first failure was 1.25 hours, the second repair time was 135.00 hours,
and so on. The dataset can be downloaded from the book companion
site.
(a) Plot the repair times in chronological order to check whether or not
there is a trend in the repair times. Is there any reason to claim that
the repair times increase with the age of the compressor?
(b) Assume now that the repair times are independent and identically
distributed. Construct the empirical distribution function for the
repair times.
(c) Plot the repair times on lognormal plotting paper. Is there reason to
believe that the repair times are lognormally distributed?

14.3 Consider the set of material strength data presented by Crowder et al.
(1991, p. 46) and given in Table 14.7. An experiment has been carried

10 Weibull paper may be downloaded from https://www.weibull.com/GPaper/ or you can use


the R package WeibullR.

Table 14.6 Dataset for Problem 14.2.

1.25 135.00 0.08 5.33 154.00 0.50 1.25 2.50 15.00


6.00 4.50 32.50 9.50 0.25 81.00 12.00 0.25 1.66
5.00 7.00 39.00 106.00 6.00 5.00 17.00 5.00 2.00
2.00 0.33 0.17 0.50 18.00 2.50 0.33 0.50 2.00
0.33 4.00 20.00 6.00 6.30 15.00 23.00 4.00 5.00
28.00 16.00 11.50 0.42 38.33 10.50 9.50 8.50 17.00
34.00 0.17 0.83 0.75 1.00 0.25 0.25 2.25 13.50
0.50 0.25 0.17 1.75 0.50 1.00 2.00 2.00 38.00
0.33 2.00 40.50 4.28 1.62 1.33 3.00 5.00 120.00
0.50 3.00 3.00 11.58 8.50 13.50 29.50 29.50 112.00

Table 14.7 Dataset for Problem 14.3.

26.8∗ 29.6∗ 33.4∗ 35.0∗ 36.3 40.0∗ 41.7 41.9∗ 42.5∗


43.9 49.9 50.1 50.8 51.9 52.1 52.3 52.3 52.4
52.6 52.7 53.1 53.6 53.6 53.9 53.9 54.1 54.6
54.8 54.8 55.1 55.4 55.9 56.0 56.1 56.5 56.9
57.1 57.1 57.3 57.7 57.8 58.1 58.9 59.0 59.1
59.6 60.4 60.7

*Censored data points.

out to gain information about the strength of a certain type of braided


cord. A total of 48 pieces of cord were investigated. Seven cords were dam-
aged during the experiment, implying right-censored strength values. The
dataset can be downloaded from the book companion site.
(a) Establish a Kaplan–Meier plot of the material strength data.
(b) Establish a TTT plot of the material strength data.
(c) Discuss the effect of this type of censoring.
(d) Describe the form of the related failure rate function.

14.4 Establish a graph paper such that the Nelson–Aalen plot of Weibull
distributed life data is close to a straight line. Describe how the Weibull
parameters 𝛼 and 𝜆 can be estimated from the plot.

14.5 The Pareto distribution has cumulative distribution function F(x) = Pr(X ≤ x) = 1 − x^{−𝜃} for x > 1. Let x1, x2, …, xn be n independent
observations of X.

(a) Find the method of moments estimator (MME) for 𝜃.


(b) Find the mean and the standard deviation of this estimator.

14.6 Let X1 , X2 , … , Xn be independent and identically distributed variables


with uniform distribution unif(0, 𝜃). Assume that x = (x1 , x2 , … , xn ) has
been observed.
(a) Find the likelihood function L(𝜃 ∣ x).
(b) Find the MLE for 𝜃 and derive its mean value.
(c) Find an unbiased estimator for 𝜃.

14.7 Let X1, X2, …, Xn be independent and identically distributed Po(𝜆), where 𝜆


is unknown.
(a) Find an MLE for e−𝜆 .
(b) Find an unbiased estimator for e−𝜆 .

14.8 Consider a homogeneous Poisson process (HPP) with rate 𝜆. Let N(t)
be the number of failures (events) in a time interval of length t. N(t) is
hence Poisson distributed with parameter 𝜆t. Assume that the process is
observed in a time interval of length t = 2 years. In this time period, a
total of seven failures have been observed.
(a) Find an estimate for 𝜆.
(b) Determine a 90% confidence interval for 𝜆.

14.9 Let X ∼ Po(𝜆).


(a) Determine an exact 90% confidence interval for 𝜆 when X is observed and found equal to 6. For comparison, also determine an approximate 90% confidence interval for 𝜆, using the approximation of the Poisson distribution to 𝒩(𝜆, 𝜆).
(b) Solve the same problem as stated in (a) when X is observed and found
equal to 14.

14.10 Denote the distribution function of the Poisson distribution with parameter 𝜆 by 𝒫o(x; 𝜆), and the distribution function of the 𝜒² distribution with 𝜈 degrees of freedom by Γ𝜈(z).
(a) Show that 𝒫o(x ∣ 𝜆) = 1 − Γ_{2(x+1)}(2𝜆). (Hint: First show that 1 − Γ_{2(x+1)}(2𝜆) = ∫_𝜆^∞ (u^x∕x!) e^{−u} du, and next apply repeated partial integrations to the integral.)

Table 14.8 Dataset for Problem 14.11.

12 373 107 318 9 739 13 000 12 207 63 589 31 893


98 474 5 784 9 662 61 731 15 269 4 730 11 269
26 947 27 838 90 682 8 086 7 905 48 162

(b) Let 𝜆1(X) and 𝜆2(X) be defined by
𝒫o(x ∣ 𝜆1(x)) = 𝛼∕2,
𝒫o(x − 1 ∣ 𝜆2(x)) = 1 − 𝛼∕2.
Use the result of (a) to show that
𝜆1(x) = (1∕2) z_{𝛼∕2, 2(x+1)} and 𝜆2(x) = (1∕2) z_{1−𝛼∕2, 2x},
where z𝜀,𝜈 is the upper 100𝜀% percentile of the 𝜒² distribution with 𝜈 degrees of freedom.

14.11 Historical data with a record of 20 times-to-failure (in hours) of a pressure


transmitter (PT) are available in Table 14.8. The dataset can be down-
loaded from the book companion site.
(a) Explain why it is reasonable to assume a constant failure rate for
the PT.
(b) Determine the empirical cumulative distribution corresponding to
this dataset and plot it.
(c) Estimate the failure rate of the PT.
(d) Find the survivor function obtained with the estimated failure rate
and compare to the one obtained with the empirical distribution.
Comment and explain how to improve the result.

14.12 Reconsider the situation in Example 14.16, but assume that the
times-to-failure are those that are not starred. They are given in
Table 14.9. The dataset can be downloaded from the book companion
site.
(a) Determine the Kaplan–Meier estimate R̂(t) and display it graphically.
(b) Determine the Nelson–Aalen estimate R∗ (t) for the survivor function
and display it graphically.

14.13 Table 14.10 shows the intervals in operating hours between successive
failures of air-conditioning equipment in a Boeing 720 aircraft. The first

Table 14.9 Dataset for Problem 14.12.

31.7 39.2∗ 57.5 65.5 65.8∗ 70.0 75.0∗ 75.2∗

87.5∗ 88.3∗ 94.2 101.7∗ 105.8∗ 109.2 110.0 130.0∗

*Censored data points.

Table 14.10 Dataset for Problem 14.13.

413 14 58 37 100 65 9 169


447 184 36 201 118 34 31 18
18 67 57 62 7 22 34

Source: Proschan (1963).

interval is 413, the second is 14, and so on. The data are from Proschan
(1963). The dataset can be downloaded from the book companion
site.
(a) Establish the Nelson–Aalen plot (N(t) plot) of the dataset. Describe
(with words) the shape of the rate of occurrence of failures (ROCOF).

14.14 Suppose that the dataset in Problem 14.11 was obtained by simultane-
ously activating 20 identical items, but that the test was terminated at the
12th failure.
(a) What type of censoring is this?
(b) Estimate 𝜆 in this situation.
(c) Calculate a 95% confidence interval for 𝜆.
(d) Compare the results with those derived in Problem 14.11.

14.15 Establish a graph paper such that the Nelson–Aalen plot of normally dis-
tributed ( (𝜇, 𝜎 2 )) life data is close to a straight line. Describe how the
parameters 𝜇 and 𝜎 may be estimated from the plot.

14.16 Table 14.11 shows the intervals in days between successive failures of a
piece of software developed as part of a large data system. The first interval
is 9, the second is 12, and so on. The data are from Jelinski and Moranda
(1972). The dataset can be downloaded from the book companion site.
(a) Establish the Nelson–Aalen plot (N(t) plot) of the dataset. Is the
ROCOF increasing or decreasing?
(b) Assume that the ROCOF follows a log-linear model, and find the max-
imum likelihood estimates (MLE) for the parameters of this model.

Table 14.11 Dataset for Problem 14.16.

9 12 11 4 7 2 5 8 5 7
1 6 1 9 4 1 3 3 6 1
11 33 7 91 2 1 87 47 12 9
135 258 16 35

Source: Jelinski and Moranda (1972).

Table 14.12 Dataset for Problem 14.17.

31.7 39.2 57.5 65.0 65.8 70.0 75.0 75.2


87.7 88.3 94.2 101.7 105.8 109.2 110.0 130.0

(c) Draw the estimated cumulative ROCOF in the same diagram as the
Nelson–Aalen plot. Is the fit acceptable?
(d) Use the Laplace test to test whether the ROCOF is decreasing or not
(use a 5% level of significance).

14.17 Independent lifetimes (given in months in Table 14.12) have been


observed with no censoring. The dataset can be downloaded from the
book companion site.
(a) Give the analytical expression of the empirical distribution function and
explain your method.
(b) Write a script to compute this function.
(c) Plot this function.
(d) Assuming that the lifetimes are exponentially distributed with parameter 𝜆, find the value of 𝜆 that best fits the given dataset.
(e) Is it reasonable to assume that such a unit has an exponential lifetime distribution? Why?

14.18 A record of the times-to-failure (given in hours) of a sensor give the fol-
lowing historical dataset in Table 14.13. The dataset can be downloaded
from the book companion site.

Table 14.13 Dataset for Problem 14.18.

1.2 × 10⁴ 9.3 × 10⁴ 0.5 × 10⁴ 0.2 × 10⁴ 1.1 × 10⁴
2.6 × 10⁴ 9.4 × 10⁴ 1.2 × 10⁴ 4.9 × 10⁴ 9.6 × 10⁴
0.9 × 10⁴ 8.6 × 10⁴ 6.5 × 10⁴ 0.5 × 10⁴ 1.0 × 10⁴
0.1 × 10⁴ 0.8 × 10⁴ 3.6 × 10⁴ 3.2 × 10⁴

(a) Demonstrate that it is reasonable to assume a constant failure rate for


the sensor.
(b) Determine the empirical cumulative distribution corresponding to
this dataset and plot it.
(c) Propose two methods to estimate the failure rate.
(d) Determine the survivor function obtained with the estimated failure
rate and compare with the one you obtain with the empirical distri-
bution. Comment and explain how you could improve your results.
(e) For all the units, calculate the MTTF and the probability that a unit survives its own MTTF. Comment on the results.
(f) Plot the survivor functions and identify the time hori-
zons tk for which the survivor function of k items (k = 0, 1, 2, …) is
higher than 0.9.

References

Aalen, O.O. (1978). Nonparametric inference for a family of counting processes.


Annals of Statistics 6: 701–726.
Aalen, O.O., Borgan, Ø., and Gjessing, H.K. (2008). Survival and Event History
Analysis; A Process Point of View. New York: Springer.
Ansell, J.I. and Phillips, M.J. (1994). Practical Methods for Reliability Data Analysis.
Oxford: Oxford University Press.
Barlow, R.E. and Campo, R. (1975). Total time on test processes and applications to failure analysis. In: Reliability and Fault Tree Analysis (ed. R.E. Barlow, J.B. Fussell, and N.D. Singpurwalla), 451–481. Philadelphia, PA: SIAM.
Bergman, B. and Klefsjö, B. (1982). A graphical method applicable to age-replacement
problems. IEEE Transactions on Reliability R-31 (5): 478–481.
Bergman, B. and Klefsjö, B. (1984). The total time on test concept and its use in
reliability theory. Operations Research 32 (3): 596–606.
Cox, D.R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society B 34: 187–220.
Cox, D.R. and Oakes, D. (1984). Analysis of Survival Data. London: Chapman and
Hall.
Crowder, M.J., Kimber, A.C., Sweeting, T.J., and Smith, R.L. (1991). Statistical
Analysis of Reliability Data. Boca Raton, FL: Chapman and Hall.
Fox, J. and Weisberg, S. (2019). An R Companion to Applied Regression. Los Angeles,
CA: Sage Publications.
Jelinski, Z. and Moranda, P.B. (1972). Software reliability research. In: Statistical
Computer Performance Evaluation (ed. W. Freiberger), 465–484. New York:
Academic Press.
