Studies in Big Data 51
Philippe Fournier-Viger
Jerry Chun-Wei Lin
Roger Nkambou
Bay Vo
Vincent S. Tseng Editors
High-Utility Pattern Mining
Theory, Algorithms and Applications
Studies in Big Data
Volume 51
Series editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Big Data” (SBD) publishes new developments and advances
in the various areas of Big Data, quickly and with a high quality. The intent is to
cover the theory, research, development, and applications of Big Data, as embedded
in the fields of engineering, computer science, physics, economics and life sciences.
The books of the series refer to the analysis and understanding of large, complex,
and/or distributed data sets generated from recent digital sources coming from
sensors or other physical instruments as well as simulations, crowd sourcing, social
networks or other internet transactions, such as emails or video click streams.
The series contains monographs, lecture notes and edited volumes in Big
Data spanning the areas of computational intelligence including neural networks,
evolutionary computation, soft computing, fuzzy systems, as well as artificial
intelligence, data mining, modern statistics and operations research, as well as
self-organizing systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
Indexing: The books of this series are submitted to ISI Web of Science,
DBLP, Ulrichs, MathSciNet, Current Mathematical Publications, Mathematical
Reviews, Zentralblatt Math, MetaPress and SpringerLink.
Editors
Philippe Fournier-Viger, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Jerry Chun-Wei Lin, Western Norway University of Applied Sciences, Bergen, Norway
Roger Nkambou, Université du Québec à Montréal, Montreal, QC, Canada
Bay Vo, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Vincent S. Tseng, National Chiao Tung University, Hsinchu, Taiwan
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
More and more data are being collected and stored in databases. As a result,
analyzing data by hand is often difficult and time-consuming. Hence, a key problem
emerged in the 1990s: designing automated techniques for identifying
interesting patterns in data. Research in this area was initially motivated by the
analysis of goods purchased by customers in retail stores. The main focus of these
studies was to identify frequent patterns, that is, values that frequently appear
together in a database. For example, a frequent pattern may be that many customers
buy bread with cheese. Other examples of frequent patterns are words that
frequently co-occur in a text, or sequences of actions that often lead to failures in a
complex system. Discovering such patterns can be used to understand the data (e.g.,
the behavior of customers) and to support decision-making (e.g., to develop
marketing strategies to co-promote products).
Although identifying frequent patterns is useful for many applications,
frequency is not always the best measure to find interesting patterns. For instance,
although some products may be frequently purchased by customers in a retail store,
they may yield a low profit, while less frequent items may yield a high profit.
Based on this observation, a new measure called utility was introduced to select
interesting patterns. The utility is a mathematical function that measures the
importance of patterns by considering quantities and weights, indicating the relative
importance of data values to users. Discovering high utility patterns in data has
many applications since the task is defined in a general way, and has been extended
to consider various types of data such as transaction databases and sequences. For
example, it can be used to discover sets of products that yield a high profit in retail
stores and sets of Web pages where users spend a lot of time on a Web site.
From a research perspective, the discovery of high utility patterns has attracted
the attention of more and more researchers in recent years because it generalizes the
problem of frequent pattern mining, and it is also more challenging. The key reason
is that the powerful “anti-monotonic property” of the frequency does not hold for
the utility measure and thus cannot be used to reduce the search space. Thus,
traditional techniques to discover frequent patterns cannot be directly used for
discovering high utility patterns. In the last decade, this has led to the proposal of
many novel data structures, algorithms, and optimizations for discovering high
utility patterns.
The motivation for writing this book is that the research on utility mining has
become quite mature. There is thus a need to provide an up-to-date introduction and
overview of current techniques and recent advances for discovering high utility
patterns.
The book is a collection of chapters, written by experienced researchers in the
field. The chapters were selected to ensure that the key topics and techniques in
utility mining are discussed. Several of the chapters are written as survey papers to
give a broad overview of current work in utility mining, while other chapters
present techniques and applications in more detail. The book is designed so that it
can be used both by researchers and people who are new to the field. Selected
chapters from this book could be used to teach an advanced undergraduate or
graduate course on pattern mining. Besides, the book provides enough details about
state-of-the-art algorithms so that it could be used by industry practitioners who
want to implement high utility pattern mining techniques in commercial software,
to analyze transaction databases. Several of the algorithms discussed in this book are
implemented in the SPMF open-source data mining software (http://www.philippe-fournier-viger.com/spmf/).
Abstract High utility pattern mining is an emerging data science task, which
consists of discovering patterns having a high importance in databases. The utility of
a pattern can be measured in terms of various objective criteria such as its profit,
frequency, and weight. Among the various kinds of high utility patterns that can be
discovered in databases, high utility itemsets are the most studied. A high utility
itemset is a set of values that appears in a database and has a high importance to
the user, as measured by a utility function. High utility itemset mining generalizes
the problem of frequent itemset mining by considering item quantities and weights.
A popular application of high utility itemset mining is to discover all sets of items
purchased together by customers that yield a high profit. This chapter provides an
introduction to high utility itemset mining, reviews the state-of-the-art algorithms,
their extensions, applications, and discusses research opportunities. This chapter is
aimed both at those who are new to the field of high utility itemset mining, as well
as researchers working in the field.
P. Fournier-Viger
Harbin Institute of Technology (Shenzhen), Shenzhen, China
J. Chun-Wei Lin
Department of Computing Mathematics and Physics, Western Norway University of Applied
Sciences (HVL), Bergen, Norway
T. Truong-Chi
University of Dalat, Dalat, Vietnam
R. Nkambou
University of Quebec, Montreal, Canada
1 Introduction
The goal of data mining is to extract patterns or train models from databases to
understand the past or predict the future. Various types of data mining algorithms
have been proposed to analyze data [1, 38]. Several algorithms produce models that
operate as black boxes. For example, several types of neural networks are designed
to perform predictions very accurately but cannot be easily interpreted by humans.
To extract knowledge from data that can be understood by humans, pattern mining
algorithms are designed [27, 28]. The goal is to discover patterns in data that are
interesting, useful, and/or unexpected. An advantage of pattern mining over several
other data mining approaches is that discovering patterns is a type of unsupervised
learning as it does not require labeled data. Patterns can be directly extracted from
raw data, and then be used to understand data and support decision-making. Pattern
mining algorithms have been designed to extract various types of patterns, each
providing different information to the user, and for extracting patterns from different
types of data. Popular types of patterns are sequential patterns [27], itemsets [28],
clusters, trends, outliers, and graph structures [38].
Research on pattern mining algorithms started in the 1990s with algorithms to
discover frequent patterns in databases [2]. The first algorithm for frequent pattern
mining is Apriori [2]. It is designed to discover frequent itemsets in customer
transaction databases. A transaction database is a set of records (transactions) indicating the
items purchased by customers at different times. A frequent itemset is a group of
values (items) that is frequently purchased by customers, that is, that appears in many
transactions of a transaction database. For example, a frequent itemset in a database may be that
many customers buy the item noodles with the item spicy sauce. Such patterns are
easily understandable by humans and can be used to support decision-making. For
instance, the pattern {noodles, spicy sauce} can be used to make marketing decisions
such as co-promoting noodles with spicy sauce. The discovery of frequent itemsets
is a well-studied data mining task, and has applications in numerous domains. It can
be viewed as the general task of analyzing a database to find co-occurring values
(items) in a set of database records (transactions) [10, 16, 20, 37, 61, 64–66].
Although frequent pattern mining is useful, it relies on the assumption that
frequent patterns are interesting. However, this assumption does not hold for numerous
applications. For example, in a transaction database, the pattern {milk, bread} may be
highly frequent but may be uninteresting as it represents a purchase behavior that
is common, and may yield a low profit. On the other hand, several patterns such as
{caviar, champagne} may not be frequent but may yield a higher profit. Hence, to
find interesting patterns in data, other aspects can be considered such as the profit or
utility.
To address this limitation of frequent itemset mining, an emerging research area
is the discovery of high utility patterns in databases [31, 52, 56, 58, 59, 83, 87,
94]. The goal of utility mining is to discover patterns that have a high utility (a
high importance to the user), where the utility of a pattern is expressed in terms
of a utility function. A utility function can be defined in terms of criteria such as
A Survey of High Utility Itemset Mining 3
the profit generated by the sale of an item or the time spent on webpages. Various
types of high utility patterns have been studied. This chapter surveys research on the
most popular type, which is high utility itemsets [83]. Mining high utility itemsets
can be seen as a generalization of the problem of frequent itemset mining where
the input is a transaction database where each item has a weight representing its
importance, and where items can have non-binary quantities in transactions. This
general problem formulation makes it possible to model various tasks such as discovering all
itemsets (sets of items) that yield a high profit in a transaction database, finding sets of
webpages where users spend a large amount of time, or finding all frequent patterns
as in traditional frequent pattern mining. High utility itemset mining is a very active
research area. This chapter provides a comprehensive survey of the field that is both
an introduction and a guide to recent advances and research opportunities.
The rest of this chapter is organized as follows. Section 2 introduces the problem of
high utility itemset mining, its key properties, and how it generalizes frequent itemset
mining. Section 3 surveys popular techniques for efficiently discovering high utility
itemsets in databases. Section 4 presents the main extensions of high utility itemset
mining. Section 5 discusses research opportunities. Section 6 presents open-source
implementations. Finally, Sect. 7 draws a conclusion.
2 Problem Definition
This section first introduces the problem of frequent itemset mining [2], and then
explains how it is generalized as high utility itemset mining [31, 52, 56, 58, 59,
83, 87, 94]. Then, key properties of the problem of high utility itemset mining are
presented and contrasted with those of frequent itemset mining.
The problem of frequent itemset mining consists of extracting patterns from a
transaction database. In a transaction database, each record (called transaction) is a
set of items (symbols). Formally, a transaction database $D$ is defined as follows.
Let there be the set $I$ of all items (symbols) $I = \{i_1, i_2, \ldots, i_m\}$ that occur in the
database. A transaction database $D$ is a set of records, called transactions, denoted
as $D = \{T_0, T_1, \ldots, T_n\}$, where each transaction $T_q$ is a set of items (i.e. $T_q \subseteq I$),
and has a unique identifier $q$ called its TID (Transaction IDentifier). For example,
consider the customer transaction database shown in Table 1. This database
contains five transactions denoted as $T_0$, $T_1$, $T_2$, $T_3$ and $T_4$. The transaction $T_2$ indicates
that the items a, c and d were purchased together by a customer in that transaction.
The goal of frequent itemset mining is to discover itemsets (sets of items) that
have a high support (that appear frequently). Formally, an itemset $X$ is a finite set
of items such that $X \subseteq I$. Let the notation $|X|$ denote the set cardinality or, in other
words, the number of items in $X$.
4 P. Fournier-Viger et al.
For example, the support of the itemset {a, c} in the database of Table 1 is 3, since
this itemset appears in three transactions ($T_0$, $T_2$ and $T_3$). This definition of the support
measure is called absolute support. Another equivalent definition is to express the
support as a percentage of the total number of transactions (called relative support).
For example, the relative support of {a, c} is 60% since it appears in 3 out of 5
transactions. The problem of frequent itemset mining is defined as follows:
For example, consider the database of Table 1 and minsup = 3. There are 11
frequent itemsets, listed in Table 2.
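The support measure and the example above can be sketched in Python. Table 1 is not reproduced in this excerpt, so the transactions below are an assumption: they are the item sets of the chapter's running example (Table 3 with quantities dropped), and they reproduce the values quoted in the text (support 3 for {a, c}, 11 frequent itemsets for minsup = 3). The brute-force search is for illustration only, and `support` and `frequent_itemsets` are hypothetical helper names.

```python
from itertools import combinations

# Assumed Table 1: running-example transactions T0..T4 (indexes 0..4), no quantities.
DATABASE = [
    {"a", "b", "c", "d", "e"},  # T0
    {"b", "c", "d", "e"},       # T1
    {"a", "c", "d"},            # T2
    {"a", "c", "e"},            # T3
    {"b", "c", "e"},            # T4
]


def support(itemset, database):
    """Absolute support: number of transactions containing every item of the itemset."""
    return sum(1 for transaction in database if itemset <= transaction)


def frequent_itemsets(database, minsup):
    """Brute force: test each of the 2^m - 1 non-empty itemsets against minsup."""
    items = sorted(set().union(*database))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(set(combo), database)
            if s >= minsup:
                result[frozenset(combo)] = s
    return result


ac = support({"a", "c"}, DATABASE)
print(ac, f"{ac / len(DATABASE):.0%}")             # 3 60%
print(len(frequent_itemsets(DATABASE, minsup=3)))  # 11
```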
The problem of frequent itemset mining has been studied for more than two
decades. Numerous algorithms have been proposed to discover frequent patterns
efficiently, including Apriori [2], FP-Growth [39], Eclat [91], LCM [81] and
H-Mine [69]. Although frequent itemset mining has many applications, a strong
assumption of frequent itemset mining is that frequent patterns are useful or
interesting to the user, which is not always true. To address this important limitation of
traditional frequent pattern mining, it has been generalized as high utility itemset
mining, where items are annotated with numerical values and patterns are selected
based on a user-defined utility function.
The task of high utility itemset mining [31, 52, 56, 58, 59, 87, 94] consists of
discovering patterns in a generalized type of transaction database called quantitative
transaction database, where additional information is provided, that is, the quantities
of items in transactions, and weights indicating the relative importance of each item
to the user.
Formally, a quantitative transaction database $D$ is defined as follows. Let there
be the set $I$ of all items $I = \{i_1, i_2, \ldots, i_m\}$. A quantitative transaction database $D$ is
a set of transactions, denoted as $D = \{T_0, T_1, \ldots, T_n\}$, where each transaction $T_q$ is
a set of items (i.e. $T_q \subseteq I$), and has a unique identifier $q$ called its TID (Transaction
IDentifier). Each item $i \in I$ is associated with a positive number $p(i)$, called its
external utility. The external utility of an item is a positive number representing its
relative importance to the user. Furthermore, every item $i$ appearing in a transaction
$T_c$ has a positive number $q(i, T_c)$, called its internal utility, which represents the
quantity of $i$ in the transaction $T_c$.
To illustrate these definitions, consider an example customer transaction database
depicted in Table 3, which will be used as the running example. In this example, the set of
items is $I = \{a, b, c, d, e\}$. It can be considered as representing different products sold
in a retail store such as apple, bread, cereal, duck and egg. The database in Table 3
contains five transactions ($T_0, T_1, \ldots, T_4$). The transaction $T_3$ indicates that items a,
c, and e were bought with purchase quantities (internal utilities) of respectively 2, 6,
and 2. Table 4 provides the external utilities of the items, which represent their unit
profits. Assume that the dollar is used as currency. The sale of a unit of items a,
b, c, d, and e yields a profit of 5, 2, 1, 2 and 3 dollars, respectively.
The goal of high utility itemset mining is to discover itemsets (sets of items) that
appear in a quantitative database and have a high utility (e.g. yield a high profit).
The utility of an itemset is a measure of its importance in the database, which is
computed using a utility function. The utility measure is generally defined by the
following definition, although alternative measures have been proposed [83] (which
will be reviewed in Sect. 4). In the running example, the utility measure is interpreted
as the amount of profit generated by each set of items.
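The utility definition itself is not reproduced in this excerpt. The sketch below assumes the definition commonly used in the literature, which is consistent with the values quoted later in the chapter (for example, a utility of 28 for {b, c}): the utility of item $i$ in $T_c$ is $p(i) \times q(i, T_c)$, the utility of $X$ in $T_c$ is the sum over its items when $X \subseteq T_c$, and $u(X)$ sums this over all transactions. The quantities of Table 3 are also assumed from the chapter's running example; the row for $T_3$ and the unit profits match the text above. Function names are hypothetical.

```python
# Assumed running example: Table 3 gives q(i, Tc), Table 4 gives p(i) in dollars.
QUANTITIES = [
    {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},  # T0
    {"b": 4, "c": 3, "d": 3, "e": 1},          # T1
    {"a": 1, "c": 1, "d": 1},                  # T2
    {"a": 2, "c": 6, "e": 2},                  # T3
    {"b": 2, "c": 2, "e": 1},                  # T4
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def utility_in(itemset, transaction, profit):
    """u(X, Tc): sum of p(i) * q(i, Tc) over i in X; 0 if X is not contained in Tc."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(profit[item] * transaction[item] for item in itemset)


def utility(itemset, database, profit):
    """u(X): sum of u(X, Tc) over every transaction Tc of the database."""
    return sum(utility_in(itemset, t, profit) for t in database)


print(utility_in({"a", "c", "e"}, QUANTITIES[3], PROFIT))  # 22 (= 2*5 + 6*1 + 2*3)
print(utility({"b", "c"}, QUANTITIES, PROFIT))             # 28
```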
discovering high utility itemsets can also be used to discover frequent itemsets in a
transaction database. To do that, the following steps can be applied:
1. The transaction database is converted to a quantitative transaction database. For
each item i ∈ I , the external utility value of i is set to 1, that is p(i) = 1 (to indicate
that all items are equally important). Moreover, for each item i and transaction
Tc , if i ∈ Tc , set q(i, Tc ) = 1. Otherwise, set q(i, Tc ) = 0.
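2. Then a high utility itemset mining algorithm is applied on the resulting quantitative
transaction database with minutil set to minsup. Since every item then contributes 1 to
the utility of an itemset in each transaction containing it, $u(X) = |X| \times sup(X)$, so
all frequent itemsets are found, and they are exactly the output itemsets $X$ such that
$u(X)/|X| \geq minsup$.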
For example, the database of Table 1 can be transformed into a quantitative database.
The result is the transaction database of Tables 6 and 7. Then, frequent itemsets can
be mined from this database using a high utility itemset mining algorithm. However,
although a high utility itemset mining algorithm can be used to mine frequent
itemsets, it may be preferable to use frequent itemset mining algorithms when
performance is important, as the latter are optimized for this task.
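The conversion in step 1 can be sketched as follows, again assuming the Table 1 transactions of the running example (Tables 1, 6 and 7 are not reproduced in this excerpt) and the standard utility definition. One detail is worth making explicit: after the conversion, every item contributes 1 to the utility of an itemset in each transaction containing it, so $u(X) = |X| \times sup(X)$ rather than $sup(X)$. The sketch therefore treats the output of a miner run with minutil = minsup (simulated here by brute force) as a superset, and keeps the itemsets whose $u(X)/|X|$ reaches minsup. The helper names are hypothetical.

```python
from itertools import combinations

# Assumed Table 1: the running example's transactions without quantities.
BINARY_DB = [
    {"a", "b", "c", "d", "e"},  # T0
    {"b", "c", "d", "e"},       # T1
    {"a", "c", "d"},            # T2
    {"a", "c", "e"},            # T3
    {"b", "c", "e"},            # T4
]


def to_quantitative(database):
    """Step 1: p(i) = 1 for every item; q(i, Tc) = 1 if i is in Tc (absent means 0)."""
    profit = {item: 1 for item in set().union(*database)}
    quantities = [{item: 1 for item in t} for t in database]
    return quantities, profit


def utility(itemset, quantities, profit):
    """Standard utility: sum of p(i) * q(i, Tc) over the transactions containing X."""
    return sum(
        sum(profit[i] * t[i] for i in itemset)
        for t in quantities
        if all(i in t for i in itemset)
    )


def frequent_via_utility(database, minsup):
    """Step 2: mine with minutil = minsup, then keep X with u(X) / |X| >= minsup."""
    quantities, profit = to_quantitative(database)
    items = sorted(profit)
    # Brute-force stand-in for a high utility itemset mining algorithm.
    huis = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = utility(combo, quantities, profit)
            if u >= minsup:
                huis[frozenset(combo)] = u
    # Here u(X) = |X| * sup(X), so u(X) // |X| is exactly the support of X.
    return {x for x, u in huis.items() if u // len(x) >= minsup}


print(len(frequent_via_utility(BINARY_DB, minsup=3)))  # 11
```

For instance, {a, d} appears in two transactions, so its utility after the conversion is 4; it reaches minutil = 3 without being frequent, which is why the final filter is needed.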
For a given quantitative database and minimum utility threshold, the problem of high
utility itemset mining always has a single solution. It is to enumerate all patterns that
have a utility greater than or equal to the user-specified minimum utility threshold.
The problem of high utility itemset mining is difficult for two main reasons. The
first reason is that the number of itemsets that must be considered to find
those that have a high utility can be very large. Generally, if a database contains $m$ distinct items, there
are $2^m - 1$ possible itemsets (excluding the empty set). For example, if $I = \{a, b, c\}$,
the possible itemsets are {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, and {a, b, c}. Thus, there
are $2^3 - 1 = 7$ itemsets, which can be formed with $I = \{a, b, c\}$. A naive approach
to solve the problem of high utility itemset mining is to count the utilities of all
possible itemsets by scanning the database, to then keep the high utility itemsets.
Although this approach produces the correct result, it is inefficient. The reason is
that the number of possible itemsets can be very large. For example, if a retail store
has 10,000 items on its shelves ($m = 10{,}000$), the utilities of $2^{10{,}000} - 1$ possible
itemsets would have to be calculated, which is unmanageable using the naive approach. Note
that the problem of high utility itemset mining can be very difficult even
for small databases. For example, a database containing a single transaction of 100
items can produce $2^{100} - 1$ possible itemsets. Thus, the size of the search space (the
number of possible itemsets) can be very large even if there are few transactions in
a database. In fact, the size of the search space does not only depend on the size of
the database, but also on how similar the transactions are in the database, how large
the utility values are, and also on how low the minutil threshold is set by the user.
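The naive approach described above fits in a few lines, which also makes its cost visible: every one of the $2^m - 1$ itemsets triggers a full pass over the database. The sketch assumes the running-example quantities of Table 3 (not reproduced in this excerpt) and the standard utility definition; `naive_huim` is a hypothetical name. With minutil = 25, the threshold used in Fig. 1, it examines $2^5 - 1 = 31$ itemsets.

```python
from itertools import combinations

# Assumed running example: Table 3 quantities and Table 4 unit profits.
QUANTITIES = [
    {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},  # T0
    {"b": 4, "c": 3, "d": 3, "e": 1},          # T1
    {"a": 1, "c": 1, "d": 1},                  # T2
    {"a": 2, "c": 6, "e": 2},                  # T3
    {"b": 2, "c": 2, "e": 1},                  # T4
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def utility(itemset, database, profit):
    """u(X): one scan of the database, summing p(i) * q(i, Tc) wherever X is in Tc."""
    return sum(
        sum(profit[i] * t[i] for i in itemset)
        for t in database
        if all(i in t for i in itemset)
    )


def naive_huim(database, profit, minutil):
    """Examine all 2^m - 1 non-empty itemsets and keep those with u(X) >= minutil."""
    items = sorted(profit)
    high_utility = {}
    examined = 0
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            examined += 1
            u = utility(combo, database, profit)
            if u >= minutil:
                high_utility[frozenset(combo)] = u
    return high_utility, examined


huis, examined = naive_huim(QUANTITIES, PROFIT, minutil=25)
print(examined)  # 31
for itemset, u in sorted(huis.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("{" + ", ".join(sorted(itemset)) + "}", u)
```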
A second reason why the problem of high utility itemset mining is difficult is
that high utility itemsets are often scattered in the search space. Thus, many itemsets
must be considered by an algorithm before it can find the actual high utility itemsets.
To illustrate this, Fig. 1 provides a visual representation of the search space for the
running example, as a Hasse diagram. A Hasse diagram is a graph where each
possible itemset is represented as a node, and an arrow is drawn from an itemset
$X$ to another itemset $Y$ if and only if $X \subseteq Y$ and $|X| + 1 = |Y|$. In Fig. 1, high
utility itemsets are depicted using light gray nodes, while low utility itemsets are
represented using white nodes. The utility value of each itemset is also indicated.
An important observation that can be made from that figure is that the utility of an
itemset can be lower than, higher than, or equal to the utility of its supersets or subsets.
For example, the utility of the itemset {b, c} is 28, while the utilities of its supersets
{b, c, d} and {a, b, c, d, e} are 34 and 25, respectively. Formally, it is thus said that
the utility measure is neither monotone nor anti-monotone.
Property 1 (The utility measure is neither monotone nor anti-monotone) Let there
be two itemsets $X$ and $Y$ such that $X \subset Y$. The relationship between the utilities of
$X$ and $Y$ is either $u(X) < u(Y)$, $u(X) > u(Y)$, or $u(X) = u(Y)$ [83].
Fig. 1 The search space of high utility itemset mining for the running example and minutil = 25
Because of this property, the high utility itemsets appear scattered in the search
space, as can be observed in Fig. 1. This is the main reason why the problem of
high utility itemset mining is more difficult than the problem of frequent itemset
mining [2]. In frequent itemset mining, the support measure has the nice property of
being monotone [2], that is, the support of an itemset is always greater than or equal
to the support of any of its supersets.
Property 2 (The support measure is monotone) Let there be two itemsets $X$ and $Y$
such that $X \subset Y$. It follows that $sup(X) \geq sup(Y)$ [2].
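Properties 1 and 2 can be checked mechanically on the running example by walking every edge $X \to Y$ of the Hasse diagram (with $|Y| = |X| + 1$) and comparing the two measures along it. The sketch assumes the Table 3 quantities of the running example (not reproduced in this excerpt) and the standard utility definition; support is computed on the same transactions with quantities ignored. Helper names are hypothetical.

```python
from itertools import combinations

# Assumed running example: Table 3 quantities and Table 4 unit profits.
QUANTITIES = [
    {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},  # T0
    {"b": 4, "c": 3, "d": 3, "e": 1},          # T1
    {"a": 1, "c": 1, "d": 1},                  # T2
    {"a": 2, "c": 6, "e": 2},                  # T3
    {"b": 2, "c": 2, "e": 1},                  # T4
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def contains(transaction, itemset):
    """True if every item of the itemset occurs in the transaction."""
    return all(i in transaction for i in itemset)


def support(itemset, database):
    """sup(X): number of transactions containing X (quantities ignored)."""
    return sum(1 for t in database if contains(t, itemset))


def utility(itemset, database, profit):
    """u(X): sum of p(i) * q(i, Tc) over the transactions containing X."""
    return sum(
        sum(profit[i] * t[i] for i in itemset)
        for t in database
        if contains(t, itemset)
    )


def hasse_edges(items):
    """Yield (X, Y) with X non-empty, X a subset of Y, and |Y| = |X| + 1."""
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            x = frozenset(combo)
            for extra in items:
                if extra not in x:
                    yield x, x | {extra}


edges = list(hasse_edges(sorted(PROFIT)))
# Property 2: support never increases when an item is added.
support_monotone = all(
    support(y, QUANTITIES) <= support(x, QUANTITIES) for x, y in edges
)
# Property 1: utility goes up on some edges and down on others.
ups = sum(
    utility(y, QUANTITIES, PROFIT) > utility(x, QUANTITIES, PROFIT) for x, y in edges
)
downs = sum(
    utility(y, QUANTITIES, PROFIT) < utility(x, QUANTITIES, PROFIT) for x, y in edges
)
print(support_monotone, ups > 0, downs > 0)  # True True True
```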
For example, in the database of Table 1, the support of {b, c} is 3, while the
support of its supersets {b, c, d} and {a, b, c, d, e} are 2 and 1, respectively. The
monotonicity of the support measure makes it easy to find frequent patterns as it
guarantees that all supersets of an infrequent itemset are also infrequent [2]. Thus, a
frequent itemset mining algorithm can discard all supersets of an infrequent itemset
from the search space. For example, if an algorithm finds that the itemset {a, d} is
infrequent, it can directly eliminate all supersets of {a, d} from further exploration,
thus greatly reducing the search space. The search space for the example database
of Table 1 is illustrated in Fig. 2. The anti-monotonicity of the support can be
observed in this picture, as a line can be drawn that clearly separates frequent itemsets
from infrequent itemsets. Property 2 is also called the downward-closure property,
anti-monotonicity property or Apriori property [2]. Although it holds for the support
measure, it does not hold for the utility measure used in high utility itemset mining.
As a result, in Fig. 1, it is not possible to draw a clear line to separate low utility
itemsets from high utility itemsets.
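The pruning enabled by Property 2 is what Apriori [2] exploits. A compact level-wise sketch follows, assuming the Table 1 transactions of the running example (not reproduced in this excerpt). Here, candidates of size $k$ are formed by uniting two frequent $(k-1)$-itemsets whose union has exactly $k$ items, a simplification of Apriori's usual prefix-based join that yields the same candidates once the subset check is applied; the function name is hypothetical.

```python
from itertools import combinations

# Assumed Table 1: running-example transactions without quantities.
DATABASE = [
    {"a", "b", "c", "d", "e"},  # T0
    {"b", "c", "d", "e"},       # T1
    {"a", "c", "d"},            # T2
    {"a", "c", "e"},            # T3
    {"b", "c", "e"},            # T4
]


def support(itemset, database):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in database if itemset <= t)


def apriori(database, minsup):
    """Level-wise search that never counts a superset of an infrequent itemset."""
    items = sorted(set().union(*database))
    frequent = {}
    level = []
    for i in items:
        s = support({i}, database)
        if s >= minsup:
            frequent[frozenset([i])] = s
            level.append(frozenset([i]))
    k = 2
    while level:
        # Join: unite pairs of frequent (k-1)-itemsets that differ by one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune (downward closure): drop X if some (k-1)-subset is infrequent.
        candidates = [c for c in candidates if all(c - {i} in frequent for i in c)]
        level = []
        for c in candidates:  # only surviving candidates are counted
            s = support(c, database)
            if s >= minsup:
                frequent[c] = s
                level.append(c)
        k += 1
    return frequent


result = apriori(DATABASE, minsup=3)
print(len(result))  # 11
```

Because {a, d} is infrequent, no candidate containing it (such as {a, c, d}) is ever counted.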
Due to the large search space in high utility itemset mining, it is thus important to
design fast algorithms that can avoid considering all possible itemsets in the search
space and that process each itemset in the search space as efficiently as possible,
while still finding all high utility itemsets. Moreover, because the utility measure is
neither monotone nor anti-monotone, efficient strategies for reducing the search space
used in frequent itemset mining cannot be directly used to solve the problem of high
utility itemset mining. The next section explains the key ideas used by the
state-of-the-art high utility itemset mining algorithms to solve the problem efficiently.
Fig. 2 The search space of frequent itemset mining for the database of Table 1 and minsup = 3
3 Algorithms
Several high utility itemset mining algorithms have been proposed such as
UMining [82], Two-Phase [59], IHUP [5], UP-Growth [79], HUP-Growth [52],
MU-Growth [87], HUI-Miner [58], FHM [31], ULB-Miner [17], HUI-Miner* [71] and
EFIM [94]. All of these algorithms have the same input and the same output. The
differences between these algorithms lie in the data structures and strategies that are
employed for searching high utility itemsets. More specifically, algorithms differ in
(1) whether they use a depth-first or breadth-first search, (2) the type of database
representation that they use internally or externally, (3) how they generate or determine
the next itemsets to be explored in the search space, and (4) how they compute the
utility of itemsets to determine if they satisfy the minimum utility constraint. These
design choices influence the performance of these algorithms in terms of execution
time, memory usage and scalability, and also how easily these algorithms can be
implemented and extended for other data mining tasks. Generally, all high utility
itemset mining algorithms are inspired by classical frequent itemset mining
algorithms, although they also introduce novel ideas to cope with the fact that the utility
measure is neither monotone nor anti-monotone.
Early algorithms for the problem of high utility itemset mining were incomplete
algorithms that could not find the complete set of high utility itemsets due to the use
of heuristic strategies to reduce the search space. For example, this is the case of
the UMining and UMining_H algorithms [82]. In the rest of this section, complete
algorithms are reviewed, which are guaranteed to find all high utility itemsets. It is also
interesting to note that the term high utility itemset mining was first used in
2003 [11], although the problem definition used by most researchers nowadays, and
used in this chapter, was proposed in 2005 [83].
The first complete algorithms to find high utility itemsets perform two phases, and
are thus said to be two phase algorithms. This includes algorithms such as
Two-Phase [59], IHUP [5], UP-Growth [79], HUP-Growth [52], and MU-Growth [87].
The breakthrough idea that has inspired all these algorithms was introduced in
Two-Phase [59]: it is possible to define a monotone measure that is an upper bound
on the utility measure, and to use that measure to safely reduce the search space
without missing any high utility itemsets. The measure proposed in the Two-Phase
algorithm is the TWU (Transaction Weighted Utilization) measure, which is defined
as follows:
The TWU measure is interesting because it can be used to reduce the search space.
For this purpose, the following property was proposed.
Property 4 (Pruning the search space using the TWU) For any itemset $X$, if
$TWU(X) < minutil$, then $X$ is a low-utility itemset as well as all its supersets.
This directly follows from Property 3.
For example, the utility of the itemset {a, b, c, d} is 22, and $TWU(\{a, b, c, d\}) = 25$.
Thus, by Property 4, it is known that any superset of {a, b, c, d} cannot have
a TWU or a utility greater than 25. As a result, if the user sets the minutil threshold
to a value greater than 25, all supersets of {a, b, c, d} can be eliminated from the
search space, as it is known by Property 4 that their utilities cannot be greater than
25.
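The TWU definition and Property 3 are not reproduced in this excerpt. The sketch below assumes the standard definition introduced with Two-Phase [59]: the transaction utility $TU(T_c)$ is the sum of the utilities of all items in $T_c$, and $TWU(X)$ is the sum of $TU(T_c)$ over the transactions containing $X$. It also assumes the Table 3 quantities of the running example. With these data, only $T_0$ contains {a, b, c, d}, so $u(\{a, b, c, d\}) = 5 + 10 + 1 + 6 = 22$ and $TWU(\{a, b, c, d\}) = TU(T_0) = 25$. Function names are hypothetical.

```python
# Assumed running example: Table 3 quantities and Table 4 unit profits.
QUANTITIES = [
    {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},  # T0
    {"b": 4, "c": 3, "d": 3, "e": 1},          # T1
    {"a": 1, "c": 1, "d": 1},                  # T2
    {"a": 2, "c": 6, "e": 2},                  # T3
    {"b": 2, "c": 2, "e": 1},                  # T4
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def contains(transaction, itemset):
    """True if every item of the itemset occurs in the transaction."""
    return all(i in transaction for i in itemset)


def transaction_utility(transaction, profit):
    """TU(Tc): utility of all items of Tc, whether or not they belong to X."""
    return sum(profit[i] * q for i, q in transaction.items())


def twu(itemset, database, profit):
    """TWU(X): sum of TU(Tc) over the transactions Tc that contain X."""
    return sum(
        transaction_utility(t, profit) for t in database if contains(t, itemset)
    )


def utility(itemset, database, profit):
    """u(X): sum of p(i) * q(i, Tc) over the transactions containing X."""
    return sum(
        sum(profit[i] * t[i] for i in itemset)
        for t in database
        if contains(t, itemset)
    )


x = {"a", "b", "c", "d"}
print(utility(x, QUANTITIES, PROFIT), twu(x, QUANTITIES, PROFIT))  # 22 25
```

Since $TWU(X) \geq u(X)$ and the TWU can only shrink as items are added, a threshold above 25 lets an algorithm skip every superset of {a, b, c, d} without computing its utility.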
Algorithms such as IHUP [5], PB [47], Two-Phase [59], UP-Growth [79],
HUP-Growth [52] and MU-Growth [87] use Property 4 as their main property to prune the
search space. They operate in two phases:
1. In the first phase, these algorithms calculate the TWU of itemsets in the search
space. For an itemset $X$, if $TWU(X) < minutil$, then $X$ and its supersets cannot be
high utility itemsets. Thus, they can be eliminated from the search space and their
TWUs do not need to be calculated. Otherwise, $X$ and its supersets may be high
utility itemsets. Thus, $X$ is kept in memory as a candidate high utility itemset and
its supersets may be explored.
2. In the second phase, the exact utility of each candidate high utility itemset $X$
found in phase 1 is calculated by scanning the database. If $u(X) \geq minutil$, then
$X$ is output since it is a high utility itemset.
This two phase process ensures that only low utility itemsets are pruned from
the search space. Thus, two phase algorithms can find all high utility itemsets while
reducing the search space to improve their performance. A representative two phase
algorithm is Two-Phase [59]. It is described next, and then its limitations are discussed.
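The two phases can be sketched compactly in Python. This is not the authors' Algorithm 1 but a simplified illustration under stated assumptions: the running-example quantities of Table 3 (not reproduced in this excerpt), the standard utility and TWU definitions, and scans of an in-memory list wherever the algorithm would scan the database. Phase 1 is a breadth-first search that keeps itemsets with $TWU(X) \geq minutil$; phase 2 computes the exact utilities of those candidates only. `two_phase` is a hypothetical name.

```python
from itertools import combinations

# Assumed running example: Table 3 quantities and Table 4 unit profits.
QUANTITIES = [
    {"a": 1, "b": 5, "c": 1, "d": 3, "e": 1},  # T0
    {"b": 4, "c": 3, "d": 3, "e": 1},          # T1
    {"a": 1, "c": 1, "d": 1},                  # T2
    {"a": 2, "c": 6, "e": 2},                  # T3
    {"b": 2, "c": 2, "e": 1},                  # T4
]
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}


def contains(transaction, itemset):
    """True if every item of the itemset occurs in the transaction."""
    return all(i in transaction for i in itemset)


def utility(itemset, database, profit):
    """u(X): sum of p(i) * q(i, Tc) over the transactions containing X."""
    return sum(
        sum(profit[i] * t[i] for i in itemset)
        for t in database
        if contains(t, itemset)
    )


def two_phase(database, profit, minutil):
    """Return (high utility itemsets with their utilities, number of candidates)."""
    tu = [sum(profit[i] * q for i, q in t.items()) for t in database]  # TU(Tc)

    def twu(itemset):
        return sum(u for t, u in zip(database, tu) if contains(t, itemset))

    # Phase 1: breadth-first search keeping itemsets with TWU >= minutil.
    level = [frozenset([i]) for i in sorted(profit) if twu([i]) >= minutil]
    candidates = list(level)
    k = 2
    while level:
        previous = set(level)
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # A k-itemset with a (k-1)-subset outside P_{k-1} cannot pass (Property 4).
        joined = [c for c in joined if all(c - {i} in previous for i in c)]
        level = [c for c in joined if twu(c) >= minutil]
        candidates.extend(level)
        k += 1

    # Phase 2: exact utilities of the candidates only.
    exact = {c: utility(c, database, profit) for c in candidates}
    return {c: u for c, u in exact.items() if u >= minutil}, len(candidates)


huis, n_candidates = two_phase(QUANTITIES, PROFIT, minutil=25)
print(len(huis), n_candidates)  # 11 31
```

Because $T_0$ contains all five items, every itemset here has a TWU of at least $TU(T_0) = 25$, so with minutil = 25 phase 1 prunes nothing and all 31 itemsets become candidates; the upper bound typically pays off on larger, sparser databases.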
The Two-Phase algorithm generalizes the Apriori algorithm, which was proposed for
frequent itemset mining [2]. Two-Phase explores the search space of itemsets using
a breadth-first search. A breadth-first search algorithm first considers single items
(1-itemsets). In the running example, those are {a}, {b}, {c}, {d} and {e}. Then,
Two-Phase generates 2-itemsets such as {a, b}, {a, c}, {a, d}, and then 3-itemsets,
and so on, until it generates the largest itemset {a, b, c, d, e} containing all items.
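A breadth-first search of this kind advances one level at a time. A common way to produce level k from level k − 1 in Apriori-style algorithms is to join two sorted (k − 1)-itemsets that share their first k − 2 items, and to discard a joined k-itemset if one of its (k − 1)-subsets was not kept at the previous level. The following is a minimal sketch, assuming itemsets are represented as sorted tuples; the function name `apriori_gen` is illustrative.

```python
from itertools import combinations


def apriori_gen(prev_level):
    """Generate the k-itemsets of the next breadth-first level.

    prev_level holds the (k-1)-itemsets kept at the previous level,
    each represented as a sorted tuple of items.
    """
    prev = sorted(prev_level)
    kept = set(prev)
    result = []
    for i, x in enumerate(prev):
        for y in prev[i + 1:]:
            if x[:-1] != y[:-1]:
                continue  # join only itemsets sharing their first k-2 items
            cand = x + (y[-1],)
            # Discard the candidate if one of its (k-1)-subsets was not kept.
            if all(sub in kept for sub in combinations(cand, len(cand) - 1)):
                result.append(cand)
    return result


print(apriori_gen([("a",), ("b",), ("c",), ("e",)]))
# [('a', 'b'), ('a', 'c'), ('a', 'e'), ('b', 'c'), ('b', 'e'), ('c', 'e')]
print(apriori_gen([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]))
# [('a', 'b', 'c')]
```

In the second call, {a, b, d} and {a, c, d} are generated by the join but discarded because {b, d} and {c, d} were not kept at level 2.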
Two-Phase [59] takes a quantitative transaction database and the minutil threshold
as input. It uses a standard database representation, also called a horizontal
database, as shown in Table 3. The pseudocode of Two-Phase is given in Algorithm 1.
In phase 1, Two-Phase scans the database to calculate the TWU of each 1-itemset
(line 1). Then, Two-Phase uses this information to identify the set of all candidate
high utility items, denoted as P1 (line 2). An itemset X is said to be a candidate high
utility itemset if TWU(X) ≥ minutil. Then, Two-Phase performs a breadth-first search
to find larger candidate high utility itemsets (lines 4–10). During the search, Two-Phase
uses the candidate high utility itemsets of a given length k − 1 (denoted as Pk−1) to
generate itemsets of length k (denoted as Pk). This is done by combining pairs of
candidate high utility itemsets of length k − 1 that share all but one item (line 5). For
example, if the candidate high utility 1-itemsets are {a}, {b}, {c} and {e}, Two-Phase
combines pairs of these itemsets to obtain the following 2-itemsets: {a, b}, {a, c},
{a, e}, {b, c}, {b, e}, and {c, e}. After generating itemsets of length k, Two-Phase
checks whether the (k − 1)-subsets of each itemset are candidate high utility itemsets.
If an itemset X has a (k − 1)-subset that is not a candidate high utility itemset, X
cannot be a high utility itemset (it would violate Property 4), and it is thus removed
from the set of k-itemsets. Then, Two-Phase scans the database to calculate the TWU
of all remaining itemsets in Pk (line 7). Each itemset having a TWU not less than
minutil is added to the set Pk of candidate high utility k-itemsets (line 8). This process
is repeated until no more candidate high utility itemsets can be generated. Then, the
second phase is performed (lines 12–13). Two-Phase scans the database to calculate
the exact utility of each candidate high utility itemset. The candidate high utility
itemsets that have a utility not less than minutil are the high utility itemsets. They
are returned to the user (line 13).
A Survey of High Utility Itemset Mining 13
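The two phases can be condensed into a short sketch. It assumes a hypothetical toy database (not Table 3), represents itemsets as sorted tuples, and, for brevity, computes each TWU with its own pass over the data rather than one database scan per level as described above. The function names are illustrative and are not taken from Algorithm 1.

```python
from itertools import combinations

# Hypothetical toy database (item -> quantity) and unit profits.
DB = [
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "c": 2, "d": 1},
    {"b": 2, "c": 1, "d": 2},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
PROFIT = {"a": 4, "b": 3, "c": 1, "d": 2}


def containing(itemset):
    """Transactions that contain every item of `itemset`."""
    return [t for t in DB if all(i in t for i in itemset)]


def twu(itemset):
    return sum(sum(q * PROFIT[i] for i, q in t.items()) for t in containing(itemset))


def utility(itemset):
    return sum(sum(t[i] * PROFIT[i] for i in itemset) for t in containing(itemset))


def apriori_gen(prev_level):
    """Join sorted (k-1)-itemsets sharing all but their last item, then drop
    candidates having a (k-1)-subset outside prev_level (Property 4)."""
    prev = sorted(prev_level)
    kept = set(prev)
    out = []
    for i, x in enumerate(prev):
        for y in prev[i + 1:]:
            if x[:-1] == y[:-1]:
                cand = x + (y[-1],)
                if all(s in kept for s in combinations(cand, len(cand) - 1)):
                    out.append(cand)
    return out


def two_phase(minutil):
    # Phase 1: breadth-first search keeping itemsets with TWU >= minutil.
    items = sorted({i for t in DB for i in t})
    level = [(i,) for i in items if twu((i,)) >= minutil]
    candidates = list(level)
    while level:
        level = [x for x in apriori_gen(level) if twu(x) >= minutil]
        candidates.extend(level)
    # Phase 2: exact utility of every candidate.
    exact = {x: utility(x) for x in candidates}
    return candidates, {x: u for x, u in exact.items() if u >= minutil}


candidates, huis = two_phase(20)
print(len(candidates))  # 11
print(huis)             # {('a', 'c'): 22, ('a', 'b', 'c'): 22}
```

Phase 1 keeps 11 candidates here, but only 2 of them survive phase 2. This imbalance is what makes the second phase expensive on real data.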
The main difference between Apriori and Two-Phase is that the latter performs a
second phase where the exact utility of each candidate is calculated by scanning the
database. Various optimizations can reduce the cost of the second phase, such as
storing itemsets in a hash-tree to avoid comparing each itemset with each transaction
[2]. However, the second phase remains very costly [79, 90].
However, it can be observed that not all items in D can be appended to X ∪ {z}
to generate larger itemsets. In fact, the itemset X ∪ {z} may not even appear in all
transactions of the database D. For this reason, a pattern-growth algorithm creates
the projected database of the itemset X ∪ {z} (line 5) and uses this database to
perform the depth-first search (line 6). This reduces the cost of scanning
the database. After the depth-first search has been recursively performed for all items,
the set of all candidate high utility itemsets P has been generated.
Then, Algorithm 2 performs a second phase in the same way as the previously
described Two-Phase algorithm. The database is scanned to calculate the exact utility
of each candidate high utility itemset (line 4). Those having a utility not less than
the minutil threshold are returned to the user (line 5).
Now, let us illustrate these steps in more detail with an example. Consider the
database of Tables 3 and 4 and assume that minutil = 25. In phase 1, the algorithm
scans the database and finds that the 1-itemsets {a}, {b}, {c}, {d} and {e} have TWU
values of 55, 54, 84, 53, and 76, respectively. These itemsets are thus candidate high
utility itemsets. The algorithm first considers the item a to try to find larger candidate
itemsets starting with the prefix {a}. The algorithm then builds the projected database
of {a}, as shown in Table 8. The projected database of an item i is defined as the set of
transactions where i appears, but where the item i and the items preceding i according
to the ≺ order have been removed. Then, to find candidate itemsets that extend
{a} with one more item, the algorithm scans the projected database of {a} and
counts the TWU of all items appearing in that database. For example, the TWUs of
items in the projected database of {a} are: {b}: 25, {c}: 55 and {e}: 47. This means
that the TWU of {a, b} is 25, the TWU of {a, c} is 55, and the TWU of
{a, e} is 47. Since these three itemsets have a TWU no less than minutil, they
are candidate high utility itemsets and are next used to try to generate larger
itemsets by performing the depth-first search starting from each of them. The itemset
{a, c} is first considered. The algorithm builds the projected database of {a, c} from
the projected database of {a}. The projected database of {a, c} is shown in Table 9.
Then, the algorithm scans the projected database of {a, c} to find items having a TWU
no less than minutil in that database. This process continues until all candidate
high utility itemsets have been found by the depth-first search. Then, in phase 2, the
database is scanned to calculate the exact utilities of all candidates found in phase
1. Itemsets having a utility less than minutil are then eliminated. The remaining
itemsets are output as the high utility itemsets. The result is shown in Table 5.
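A pattern-growth version of phase 1 can be sketched with explicit (copied) projected databases. The sketch assumes that ≺ is the alphabetical order and uses a hypothetical toy database (not Tables 3 and 4). Each projected transaction keeps the transaction utility of its original transaction, so the TWU of X ∪ {i} can be read directly from the projected database of X. The function names are hypothetical.

```python
# Hypothetical toy database (item -> quantity) and unit profits; ≺ is alphabetical.
DB = [
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "c": 2, "d": 1},
    {"b": 2, "c": 1, "d": 2},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
PROFIT = {"a": 4, "b": 3, "c": 1, "d": 2}


def tu(t):
    return sum(q * PROFIT[i] for i, q in t.items())


def utility(itemset):
    return sum(sum(t[i] * PROFIT[i] for i in itemset)
               for t in DB if all(i in t for i in itemset))


def project(pdb, item):
    """Projected database of prefix + (item,): keep transactions containing
    `item`, then drop `item` and every item preceding it in ≺. Each entry
    keeps the TU of its original transaction, which TWU needs."""
    return [(t_u, {i: q for i, q in rest.items() if i > item})
            for t_u, rest in pdb if item in rest]


def local_twu(pdb):
    """TWU of prefix + (i,) for every item i of the projected database."""
    acc = {}
    for t_u, rest in pdb:
        for i in rest:
            acc[i] = acc.get(i, 0) + t_u
    return acc


def grow(prefix, pdb, minutil, out):
    for item, w in sorted(local_twu(pdb).items()):
        if w >= minutil:  # prefix + (item,) is a candidate high utility itemset
            cand = prefix + (item,)
            out.append(cand)
            grow(cand, project(pdb, item), minutil, out)


def pattern_growth(minutil):
    candidates = []
    grow((), [(tu(t), dict(t)) for t in DB], minutil, candidates)  # phase 1
    exact = {x: utility(x) for x in candidates}                     # phase 2
    return candidates, {x: u for x, u in exact.items() if u >= minutil}


candidates, huis = pattern_growth(20)
print(len(candidates))  # 11
print(huis)             # {('a', 'b', 'c'): 22, ('a', 'c'): 22}
```

Only items that actually occur in a projected database are tried as extensions, so no itemset absent from the database is ever generated. Copying the projected transactions, as done here, is the simplest option but not the most memory-efficient one.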
A major advantage of pattern-growth algorithms is that they only explore itemsets
that actually appear at least once in the input database, unlike Apriori-based
algorithms, which may generate patterns that do not appear in the database. Besides,
the concept of projected database is also useful to reduce the cost of database scans,
since projected databases are smaller than the original database. A common question
about the concept of projected database is: is it costly to create all these copies of
the original database? The answer is no if an optimization called pseudo-projection
is used, which consists of implementing a projected database as a set of pointers on
the original database rather than as a copy [69, 81]. For example, Fig. 3 shows the
pseudo-projected database of {a, c}, which is equivalent to the projected database of
Table 9, except that it is implemented using three pointers on the original database,
to avoid creating a copy of the original database. Note that many other optimizations
can also be integrated in pattern-growth algorithms. For example, the IHUP
[5], UP-Growth [79], HUP-Growth [52], and MU-Growth [87] algorithms use
prefix-tree structures to represent projected databases and reduce memory usage.
These structures extend the FP-tree structure used in frequent itemset mining by the
FPGrowth algorithm [39]. The main differences between these algorithms lie in the
strategies they use to decrease the TWU upper-bounds on the utility. Among two
phase algorithms, UP-Growth is one of the fastest. It was shown to be up to 1,000
times faster than Two-Phase and IHUP. More recent two phase algorithms such as PB
and MU-Growth have introduced various optimizations and different designs but only
provide a small speed improvement over Two-Phase or UP-Growth (MU-Growth is
reported to be only up to 15 times faster than UP-Growth).
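Pseudo-projection can be sketched by representing each projected transaction as a pointer pair (transaction index, position) into the original database, where each transaction is a list of (item, quantity) pairs sorted by ≺. The database below is hypothetical, and the `pseudo_project` and `view` helpers are illustrative names.

```python
# Hypothetical toy database: each transaction is a list of (item, quantity)
# pairs sorted by ≺ (alphabetical here).
DB = [
    [("a", 2), ("b", 1), ("c", 3)],
    [("a", 1), ("c", 2), ("d", 1)],
    [("b", 2), ("c", 1), ("d", 2)],
    [("a", 1), ("b", 1), ("c", 1), ("d", 1)],
]


def pseudo_project(pointers, item):
    """Extend a pseudo-projected database by `item`.

    A pointer (tid, pos) stands for the suffix DB[tid][pos:]. No transaction
    data is copied; only new (tid, pos) pairs are created.
    """
    out = []
    for tid, pos in pointers:
        t = DB[tid]
        for k in range(pos, len(t)):
            name = t[k][0]
            if name == item:
                out.append((tid, k + 1))  # suffix just after `item`
                break
            if name > item:  # sorted by ≺, so `item` cannot appear later
                break
    return out


def view(pointers):
    """Materialize a pseudo-projected database (for display only)."""
    return [DB[tid][pos:] for tid, pos in pointers]


root = [(tid, 0) for tid in range(len(DB))]
ac = pseudo_project(pseudo_project(root, "a"), "c")
print(ac)        # [(0, 3), (1, 2), (3, 3)]
print(view(ac))  # [[], [('d', 1)], [('d', 1)]]
```

Projecting on a and then on c yields three pointers and copies no transaction data; `view` builds the suffixes only to show what the pointers denote.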
Although two phase algorithms have been well-studied and introduced many
key ideas, they remain inefficient. As explained, two phase algorithms mine high
utility itemsets in two phases. In phase 1, a set of candidates is found. Then, in
phase 2, the utility of these candidates is calculated by scanning the database. Then,
low-utility itemsets are filtered out and the high utility itemsets are returned to the user.
This approach is inefficient because the set of candidate itemsets found in phase 1
can be very large and performing the second phase to evaluate these candidates is
very costly [79, 90]. In the worst case, all candidate itemsets are compared to all
transactions of the database during the second phase. Thus, the performance of two
phase algorithms is highly influenced by the number of candidates generated to find
the actual high utility itemsets. To reduce the number of candidates, various strategies
have been designed to decrease the TWU upper-bound, and thus prune more candidates
[5, 52, 79, 87]. But to address the fundamental problem of two phase algorithms,
which is that they generate candidates, one phase algorithms have been designed,
which are described next.
The second major breakthrough in high utility itemset mining has been the design
of algorithms that do not generate candidates. These one phase algorithms immediately
calculate the utility of each pattern considered in the search space. Thus, an
itemset can be immediately identified as a low utility or high utility itemset, and
candidates do not need to be stored in memory. The one phase approach
was first published in HUI-Miner [58, 71], and then in the d²HUP [60] algorithm.
Then, improved and more efficient one phase algorithms have been designed, such
as FHM [31], mHUIMiner [70], ULB-Miner [17], HUI-Miner* [71] and EFIM [94].
Besides the novelty of discovering high utility itemsets in one phase, one phase
algorithms have also introduced novel upper-bounds on the utility of itemsets that
are based on the exact utility of each itemset, and can thus prune a larger part of
the search space compared to the TWU measure. These upper-bounds include the
remaining utility [58, 60], and newer measures such as the local-utility and sub-tree
utility [94]. The next subsections give an overview of one phase algorithms.
Some of the most popular high utility itemset mining algorithms are those
based on the utility-list structure. This structure was introduced in the HUI-Miner
algorithm [58] by generalizing the tid-list structure [91] used in frequent itemset mining.
Then, faster utility-list based algorithms have been proposed, such as FHM [31],
mHUIMiner [70] and ULB-Miner [17], and extensions have been proposed for several
variations of the high utility itemset mining problem. The reason for the popularity
of utility-list based algorithms is that they are fast and easy to implement. This
subsection describes the FHM algorithm [31] as a representative utility-list based
algorithm. FHM was shown to be up to seven times faster than HUI-Miner, and has
been used and extended by many researchers.
FHM is a one-phase algorithm that performs a depth-first search to explore the
search space of itemsets. During the search, the FHM algorithm creates a utility-list
for each visited itemset in the search space. The utility-list of an itemset stores
information about the utility of the itemset in transactions where it appears, and
information about the utilities of remaining items in these transactions. Utility-lists
allow the utility of an itemset and upper-bounds on the utility of its supersets to be
calculated quickly, without scanning the database. Moreover, utility-lists of k-itemsets
(k > 1) can be quickly created by joining utility-lists of shorter patterns. The
utility-list structure is defined as follows.
For example, assume that ≺ is the alphabetical order. The utility-lists of {a}, {d}
and {a, d} are shown in Fig. 4. Consider the utility-list of {a}. It contains three rows
(tuples) corresponding to transactions T0, T2 and T3 since {a} appears in these three
transactions. The second column of the utility-list (iutil values) of {a} indicates that
the utility of {a} in T0, T2 and T3 is 5, 5, and 10, respectively. The third column of
the utility-list of {a} indicates that the rutil values of {a} for transactions T0, T2 and
T3 are 20, 3, and 10, respectively.
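The utility-lists of single items can be built in a single database scan: for each transaction, items are visited in ≺ order, and the rutil of an item is the utility that remains after it. The following is a minimal sketch, assuming a hypothetical toy database (not the one behind Fig. 4) and an illustrative `Entry` record with the tid, iutil and rutil fields described above.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row (tuple) of a utility-list."""
    tid: int
    iutil: int  # utility of the itemset in transaction tid
    rutil: int  # utility of the items after the itemset (in ≺ order) in tid


# Hypothetical toy database (item -> quantity) and unit profits; ≺ is alphabetical.
DB = [
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "c": 2, "d": 1},
    {"b": 2, "c": 1, "d": 2},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
PROFIT = {"a": 4, "b": 3, "c": 1, "d": 2}


def single_item_utility_lists(db, profit):
    """Build ul({i}) for every item i with one pass over the database."""
    lists = {}
    for tid, t in enumerate(db):
        utils = [(i, q * profit[i]) for i, q in sorted(t.items())]
        remaining = sum(u for _, u in utils)
        for item, u in utils:
            remaining -= u  # what is left belongs to items after `item`
            lists.setdefault(item, []).append(Entry(tid, u, remaining))
    return lists


ul = single_item_utility_lists(DB, PROFIT)
print(ul["a"])
# [Entry(tid=0, iutil=8, rutil=6), Entry(tid=1, iutil=4, rutil=4), Entry(tid=3, iutil=4, rutil=6)]
```

Each utility-list comes out sorted by tid because transactions are scanned in order. The join operation relies on this property.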
The FHM algorithm scans the database once to create the utility-lists of 1-itemsets
(single items). Then, the utility-lists of larger itemsets are constructed by
joining the utility-lists of smaller itemsets. The join operation for single items is
performed as follows. Consider two items x, y such that x ≺ y, and their utility-lists
ul({x}) and ul({y}). The utility-list of {x, y} is obtained by creating a tuple
(ex.tid, ex.iutil + ey.iutil, ey.rutil) for each pair of tuples ex ∈ ul({x}) and
ey ∈ ul({y}) such that ex.tid = ey.tid. The join operation for two itemsets P ∪ {x}
and P ∪ {y} such that x ≺ y is performed as follows. Let ul(P), ul(P ∪ {x}) and
ul(P ∪ {y}) be the utility-lists of P, P ∪ {x} and P ∪ {y}. The utility-list of P ∪ {x, y}
is obtained by creating a tuple (ex.tid, ex.iutil + ey.iutil − ep.iutil, ey.rutil) for each
set of tuples ex ∈ ul(P ∪ {x}), ey ∈ ul(P ∪ {y}), ep ∈ ul(P) such that
ex.tid = ey.tid = ep.tid. The utility of P is subtracted once because it is counted in
both ex.iutil and ey.iutil. For example, the utility-list of {a, d} can be obtained by
joining the utility-lists of {a} and {d} (depicted in Fig. 4), without scanning the database.
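Both join cases fit in one function if ul(P) is optional. The sketch below assumes hard-coded utility-lists of {a}, {b} and {c} for a small hypothetical database (alphabetical ≺) and matches tuples through a dict keyed by tid. Because utility-lists are sorted by tid, a binary search or a merge would work as well. The names `Entry` and `join` are illustrative.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    tid: int
    iutil: int
    rutil: int


def join(ul_px, ul_py, ul_p=None):
    """Build ul(P ∪ {x, y}) from ul(P ∪ {x}) and ul(P ∪ {y}), where x ≺ y.

    ul_p is ul(P), or None when P is empty (joining two single items).
    The utility of P is counted in both ex.iutil and ey.iutil, so it is
    subtracted once.
    """
    ey_by_tid = {e.tid: e for e in ul_py}
    ep_by_tid = {e.tid: e for e in ul_p} if ul_p is not None else {}
    joined = []
    for ex in ul_px:
        ey = ey_by_tid.get(ex.tid)
        if ey is None:
            continue  # P ∪ {x, y} does not occur in this transaction
        iutil = ex.iutil + ey.iutil
        if ul_p is not None:
            iutil -= ep_by_tid[ex.tid].iutil
        joined.append(Entry(ex.tid, iutil, ey.rutil))
    return joined


# Hypothetical utility-lists of single items for a toy database.
UL_A = [Entry(0, 8, 6), Entry(1, 4, 4), Entry(3, 4, 6)]
UL_B = [Entry(0, 3, 3), Entry(2, 6, 5), Entry(3, 3, 3)]
UL_C = [Entry(0, 3, 0), Entry(1, 2, 2), Entry(2, 1, 4), Entry(3, 1, 2)]

ul_ab = join(UL_A, UL_B)           # single items: P is empty
ul_ac = join(UL_A, UL_C)
ul_abc = join(ul_ab, ul_ac, UL_A)  # P = {a}, x = b, y = c
print(ul_abc)  # [Entry(tid=0, iutil=14, rutil=0), Entry(tid=3, iutil=8, rutil=2)]
```

In transaction 0, for instance, {a, b} and {a, c} each contribute 11, and subtracting the 8 of {a} gives the 14 of {a, b, c}.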
The utility-list structure of an itemset is very useful, as it allows the utility of an
itemset to be obtained directly, without scanning the database.
Property 5 (Calculating the utility of an itemset using its utility-list) Let there be
an itemset X. The sum of the iutil values in its utility-list ul(X) is equal to the utility
of X [58]. In other words, $u(X) = \sum_{e \in ul(X)} e.iutil$.
For example, the utility of the itemset {a, d} is equal to the sum of the values
in the iutil column of its utility-list (depicted in Fig. 4). Hence, by looking at the
utility-list of {a, d}, it is found that its utility is u({a, d}) = 11 + 7 = 18.
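Property 5 can be sanity-checked by comparing the sum of iutil values with a full database scan. The sketch assumes a hypothetical toy database and a hard-coded utility-list of {a, c} for it (alphabetical ≺). All names are illustrative.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    tid: int
    iutil: int
    rutil: int


# Hypothetical toy database (item -> quantity) and unit profits.
DB = [
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "c": 2, "d": 1},
    {"b": 2, "c": 1, "d": 2},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
PROFIT = {"a": 4, "b": 3, "c": 1, "d": 2}

# Hypothetical utility-list of {a, c} in that database.
UL_AC = [Entry(0, 11, 0), Entry(1, 6, 2), Entry(3, 5, 2)]


def utility_from_list(ul):
    """Property 5: u(X) is the sum of the iutil values of ul(X)."""
    return sum(e.iutil for e in ul)


def utility_by_scan(itemset):
    """Reference value obtained by scanning the whole database."""
    return sum(sum(t[i] * PROFIT[i] for i in itemset)
               for t in DB if all(i in t for i in itemset))


print(utility_from_list(UL_AC), utility_by_scan(("a", "c")))  # 22 22
```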
The utility-list of an itemset is also used to prune the search space based on the
following definition and property.
For example, consider calculating the remaining utility upper-bound of the itemset
{a, d} using its utility-list (depicted in Fig. 4). The upper-bound reu({a, d}) is the sum
of all the values in the iutil and rutil columns of its utility-list: the iutil values
contribute 11 + 7 = 18, to which the rutil values of {a, d} are added. It is thus
known that the itemset {a, d} and all its extensions such as {a, d, e} cannot have a
utility greater than reu({a, d}). If we assume that minutil = 25, as in the running
example, and reu({a, d}) is below this threshold, these itemsets can be pruned from
the search space, as they will be low-utility itemsets. This is formalized by the
following property.
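The utility-list join, Property 5 and the remaining utility bound can be combined into a depth-first search in the spirit of HUI-Miner and FHM. This is a simplified sketch under stated assumptions. It omits FHM's co-occurrence-based pruning (EUCS), does not first remove items whose TWU is below minutil, and uses a hypothetical toy database with alphabetical ≺. The function names are illustrative.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    tid: int
    iutil: int
    rutil: int


# Hypothetical toy database (item -> quantity) and unit profits; ≺ is alphabetical.
DB = [
    {"a": 2, "b": 1, "c": 3},
    {"a": 1, "c": 2, "d": 1},
    {"b": 2, "c": 1, "d": 2},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
PROFIT = {"a": 4, "b": 3, "c": 1, "d": 2}


def single_item_lists():
    """One database scan: ul({i}) for every item i."""
    lists = {}
    for tid, t in enumerate(DB):
        utils = [(i, q * PROFIT[i]) for i, q in sorted(t.items())]
        remaining = sum(u for _, u in utils)
        for item, u in utils:
            remaining -= u
            lists.setdefault(item, []).append(Entry(tid, u, remaining))
    return lists


def join(ul_px, ul_py, ul_p):
    """ul(P ∪ {x, y}) from ul(P ∪ {x}), ul(P ∪ {y}) and ul(P) (None if P is empty)."""
    ey_by_tid = {e.tid: e for e in ul_py}
    ep_by_tid = {e.tid: e for e in ul_p} if ul_p is not None else {}
    out = []
    for ex in ul_px:
        ey = ey_by_tid.get(ex.tid)
        if ey is not None:
            p_util = ep_by_tid[ex.tid].iutil if ul_p is not None else 0
            out.append(Entry(ex.tid, ex.iutil + ey.iutil - p_util, ey.rutil))
    return out


def search(prefix, prefix_ul, extensions, minutil, found):
    """extensions: (item, ul(prefix ∪ {item})) pairs sorted by ≺."""
    for k, (x, ul_x) in enumerate(extensions):
        itemset = prefix + (x,)
        u = sum(e.iutil for e in ul_x)              # Property 5
        if u >= minutil:
            found[itemset] = u
        reu = sum(e.iutil + e.rutil for e in ul_x)  # remaining utility bound
        if reu >= minutil:  # otherwise no extension of itemset can reach minutil
            next_ext = [(y, join(ul_x, ul_y, prefix_ul))
                        for y, ul_y in extensions[k + 1:]]
            search(itemset, ul_x, next_ext, minutil, found)


def mine(minutil):
    found = {}
    search((), None, sorted(single_item_lists().items()), minutil, found)
    return found


print(mine(20))  # {('a', 'b', 'c'): 22, ('a', 'c'): 22}
```

In this toy database, reu({c}) = 3 + 4 + 5 + 3 = 15 < 20. As a result, no itemset having {c} as its prefix is ever built by a join.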
II
Of the events over which Miss Nightingale cried alas! in this letter,
the one which came first was the loss of Mr. Villiers's Poor Law Bill.
The loss, however, as she rightly surmised in writing to Miss
Martineau, was only temporary. The whole subject is connected with
a distinct branch of Miss Nightingale's work, of which a description
must be reserved for the next chapter. She was in large measure, as
we shall hear, the founder of Sick Nursing among the Indigent Poor,
and a pioneer in Poor Law Reform.
III
So, then, she had been too late. “I am furious to that degree,” she
wrote to Captain Galton (June 23), “at having lost Lord de Grey's
five months at the India Office that I am fit to blow you all to pieces
with an infernal machine of my own invention.” She threw some of
the blame upon Dr. Sutherland, whose mission to the Mediterranean
she had not been able to cancel, and who, for weeks at a time
during this year, was absent at Malta and Gibraltar or in Algiers.
Algiers, indeed, she wrote tauntingly, “why not Astley's?” That would
be quite as good a change for him. Sometimes she varied the figure,
and Dr. Sutherland and his party figured in her letters as Wombwell's
Menagerie. “The Menagerie, I hear,” she wrote (Jan. 26), “including
three ladies, H.M. Commissioners, and two ladies' maids, has gone
after a column in the interior.” Had he stayed at home, he might
have been able to find the missing dispatch; and in any case they
could have written at leisure, from the hints in Sir John Lawrence's
letter to her, the Memorandum which they ultimately had to write in
haste. The truant seems to have foreseen what a rod in pickle was
awaiting him on his return. “I have been thinking,” he wrote to her
from Algiers (Jan. 28), “Will she be glad to hear from me? or Will
she swear? I don't know, but nevertheless I will tell her a bit of my
mind about our visit to Astley's.” And he goes on to write an
admirable account of his experiences, in which he ingeniously
emphasizes the vast importance of his inquiries in connection with
their Indian work. Nor was this only an excuse; Dr. Sutherland's
Report on Algeria, and the French sanitary service there, was a most
valuable piece of work. It is impossible to read his writings—whether
in published reports or in his manuscripts among Miss Nightingale's
papers—without perceiving how well based was the reliance which
she placed upon his collaboration. His wife stayed at home and saw
much of Miss Nightingale. Mrs. Sutherland must have reported the
state of things in South Street; for a month later Dr. Sutherland
wrote thus to Miss Nightingale (Feb. 20): “The mail which ought to
have arrived yesterday came in to-day, and I am trying to save the
out mail, which leaves the harbour at 12, without much prospect of
success. I have had a letter to-day from home about you, and if it
had come yesterday, Ellis and I would certainly have been
embarking to-day for England. After the account of your suffering,
and of the pressure of business under which you are sinking, I feel
wild to get away from this. To-night we leave Algeria, and by the
time you get this we will be on our way home. God bless you and
keep you to us. Amen.” Well, I can only hope that Dr. Sutherland
enjoyed his trip while it lasted; for I fear that he may have had a
bad quarter-of-an-hour when he reported himself at South Street on
his return. She had complained of his absence to another of her
close allies, Dr. Farr. “I have all Dr. Sutherland's business to do,” she
wrote (Jan. 19), “besides my own. If it could be done, I should not
mind. I had just as soon wear out in two months as in two years, so
the work be done. But it can't. It is just like two men going into
business with a million each. The one suddenly withdraws. The other
may wear himself to the bone, but he can't meet the engagements
with one million which he made with two. Add to this, I have been
so ill since the beginning of the year as to be often unable to have
my position moved from pain for 48 hours at a time. But to
business.…”
One good stroke of business, however, Miss Nightingale had been
able to do during Dr. Sutherland's absence. She reported it to
Dr. Farr: “The compensation to my disturbed state of mind has been
a convert to the sanitary cause I have made for Madras—no less a
person than Lord Napier. I managed to scramble up to see him
before he sailed.” The “conversion” means not necessarily that Lord
Napier needed to find salvation, but refers rather to the fact that his
predecessor in the governorship of Madras had been unsympathetic.
Lord Napier, on receiving the appointment, had expressed a desire to
learn Miss Nightingale's views. He had been secretary to the British
Embassy at Constantinople during the Crimean War, and had there
formed a high opinion of her ability and devotion. She now wrote to
him about Indian sanitary reform, and he at once replied:—
(Lord Napier to Miss Nightingale.) 24 Princes Gate, Feb. 16 [1866]. I beg
you to believe that I am far from being impatient of your communication
or indifferent to your wishes. I have read your letter with great interest,
and I regret that you had not time and strength to make it longer. You will
confer a great favour on me by sending me the 8vo volume of which you
speak, and I would not stumble at the two folio blue books.… The
Sanitary question like the railway question or the irrigation question will
probably remain subordinated in some degree to financial requirements,
to the necessity of shewing a surplus at the end of the year; but within
the limits of my available resources I promise you a zealous intervention
on behalf of the cause you have so much at heart. You say that you do
not know me well; but you cannot deprive me of the happiness and honor
of having seen you at the greatest moment of your life in the little parlour
of the hospital at Scutari. I was a spectator, and I would have been a
fellow-labourer if any one would have employed my services. I remain at
your orders for any day and hour.—Very sincerely yours, Napier.
Their interview took place three days later. Lord Napier, during his
governorship of Madras, which lasted six years, tried hard to fulfil his
promise. To other matters he attended also; but it was to questions
connected with the public health that he devoted his most particular
attention, and throughout his residence in India he kept up a
correspondence with Miss Nightingale about them.
IV
Captain Galton replied that he had it from Mr. Lowe himself that
he would not join the Tories; that of the actual appointments he had
not as yet heard; but that as the Secretary of State's was an
impersonal office, Dr. Sutherland's commission to visit the
Mediterranean would still hold good—or bad. “You say the S. of S. is
an impersonal creature,” replied Miss Nightingale (July 3); “I wish he
wuz!” When the names of the new Ministers were announced,
Captain Galton threw out a suggestion tentatively that Lord
Cranborne[71] (India Office) might be approachable through Lady
Cranborne. “I have a much better recommendation to him than
that,” wrote Miss Nightingale in some triumph (July 7), “and have
already been put into ‘direct communication’ with him, not at my
own request.” The letters tell the story of her introduction to new
masters at the India Office and the Poor Law Board:—
(Lord Stanley to Miss Nightingale.) St. James's Square, July 6. I shall see
Lord Cranborne to-day (we go down to be sworn in) and will tell him the
whole sanitary story, and also say that I have advised you to write to him
as you have always done to me to my great advantage. You will find him
shrewd, industrious, and a good man of business.
(Mr. Gathorne Hardy to Miss Nightingale.) Poor Law Board, July 25. You
owe me no apology for calling my attention to material points connected
with the subject in the consideration of which you are so much engaged. I
should say this to any one who wrote in the same spirit as yourself, but I
am really indebted to you who have earned no common title to advise and
suggest upon anything which affects the treatment of the sick. Your note
arrived at the very instant when a gentleman was urging me to lay before
you questions relating to Workhouse Infirmaries, and I should not have
hesitated to do so if needful even without the cordial invitation which you
give me to ask your assistance. At present I have not advanced very far
from want of time, as while Parliament is sitting I am necessarily very
much occupied with other business, and I am anxious to remedy, if
possible, present and urgent grievances before I enter thoroughly upon
legislation for the future. I shall bear in mind the offer which you have
made and in all probability avail myself of it to the full.
V
Meanwhile Miss Nightingale had been very busily engaged with
the correspondence and other tasks thrown upon her by the
outbreak of war in Europe. “Saw Florence for half an hour this
morning,” reported her father (June); “over-fatigued certainly, but
speaking with a voice only too loud and strong. Princess [Alice of]
Hesse writes to her to ask for instructions for the hospitals there,
and Sutherland's joke is ‘There's nothing left for you, all is gone to
Garibaldi.’” She had been applied to by representatives of all three
combatants. Prussia, as usual, was the better prepared, and the
Crown Princess had written to Miss Nightingale in March (three
months before hostilities actually began) asking for her assistance
and advice about hospital and nursing arrangements. A Prussian
manufacturer communicated with her about the best form of
hospital tents for field-service. The two sisters of the British Royal
House were on opposite sides in this war, for Hesse-Darmstadt had
thrown in its lot with Austria; but it was not till after the outbreak of
hostilities that the Princess Alice wrote to Miss Nightingale through
Lady Ely[72] for advice about war hospitals. Miss Nightingale at once
sent it. Her Memorandum, she was told (July 3), had been
forwarded to Prince Louis for use at Headquarters, and the Princess
begged her to send further information for use by the hospital
authorities in Darmstadt. The Italians had been earlier in “going to
Miss Nightingale.” The Secretary of the “Florence Committee for
helping the Sick and Wounded” had written to her for advice in May.
Her reply caused great delight, as an English correspondent at
Florence recorded. “I have read the letter,” he wrote, “which will be
translated and inserted in the Nazione. Miss Nightingale gives, with
her accustomed clearness and precision, excellent advice to the
Committee, which some of them very much need. At the same time
she expresses her cordial sympathy with the Italian cause. She
recalls the admirable condition in which the Sardinian army was
landed in the Crimea, and the praise which its appearance extorted
from Lord Clyde. And she concludes her letter by saying that if the
sacrifice of her poor life would hasten their cause by one half-hour,
she would gladly give it them. But she is a miserable invalid.”[73] The
Committee had asked whether she would not come to Italy “were it
but for one day” in order to inspire them by her presence. Her piece
of “froth” (as she called it) was widely printed in the Italian press.
She had deplored the outbreak of the war, but when it resulted in an
extension of the boundaries of free Italy she felt that there were
compensations. Miss Nightingale also joined the Committee of the
“Ladies' Association” formed in this country “for the Relief of the Sick
and Wounded of all nations engaged.” She advised the Committee
on the form of aid most requisite, and at the end of the war, in
thanking the Crown Princess of Prussia for a letter, she gave Her
Royal Highness an account of what had been done by the English
Committee. The correspondence with the Princess was long, and it
formed a new tie between Miss Nightingale and Mr. Jowett, who was
a great favourite with the Crown Princess and who entertained a
very high opinion of her abilities. The answering letter from the
Princess covers eighteen pages, containing (as Dr. Sutherland said of
it) “just the kind of practical information which a person who has
had experience in these matters desires to obtain.” A characteristic
extract or two from the correspondence on each side must here
suffice:—
(Miss Nightingale to the Crown Princess of Prussia.) 35 South Street,
Sept. 22 [1866].… I think your Royal Highness may be pleased to hear
even the humble opinion of an old campaigner like myself about how well
the Army Hospital Service was managed in the late terrible war.
Information reached me through my old friends and trainers of
Kaiserswerth. The Knights of St. John of Jerusalem took charge of all the
Deaconesses and all the offers of houses and rooms made to them. The
system seems to me to have been admirably managed—especially the
sending away the wounded in hundreds to towns where rooms and
houses and nursing were offered. The overcrowding[118] and massing
together of large numbers of wounded is always more disastrous than
battle itself. From many different quarters I have heard of the great
devotion, skill and generous kindness of the Prussian surgeons—to all
sides alike.… On this, the day of Manin's death nine years ago, the exiled
Dictator of Venice and one of the purest and most far-seeing of
statesmen, who fought so good a battle for the freedom of Venice, but
who did not live to see its accomplishment, I cannot but congratulate your
Royal Highness, at the risk of impertinence, at seeing the fulfilment of that
liberation brought about by Prussian arms.
VI
The year 1866 was, then, one of great activity with Miss
Nightingale; but by the middle of August her work was not at such
high pressure as in the preceding months. Parliament was up, and
the new Ministers, with whom she had established friendly relations,
were turning round. At this time a home call came to Miss
Nightingale. Her mother was reported to be ailing. She was
disinclined to make the usual move with her husband from
Hampshire to Derbyshire; so, while the father went to Lea Hurst,
Miss Nightingale decided to stay with her mother at Embley. It was
an event in the family circle, for Florence had not been to either of
the homes for ten years. There was much correspondence and many
preparations. Father and mother were equally delighted, and the
journey in an invalid carriage did the daughter no serious harm. She
stayed at Embley from the middle of August till the end of
November. It was the first holiday she had taken, for ten years also;
but it was not much of a holiday either. She set to work on the
health of Romsey, the nearest town, and of Winchester, the county
town. She wrote up to her friend Dr. Farr at the Registrar-General's
Office for the mortality tables, found the figures for those towns
above the average, and bade the citizens look to their drains. Then
she commanded Dr. Sutherland to Embley for the transaction of
business in view of next year's session. She found her mother happy
and cheerful. “I don't think my dear mother was ever more touching
or interesting to me,” she wrote to Madame Mohl (Aug. 21), “than
she is now in her state of dilapidation. She is so much gentler,
calmer, more thoughtful.” She was a little critical, however, of her
mother still, and thought her habits self-indulgent. Poor lady! she
was 78; she had been shaken and bruised in a carriage accident,
and was threatened with the loss of her eye-sight. Certainly,
Florence was not always able to make due allowances for other
people. But if she was critical of others, she was yet more severe
with herself. During this holiday at Embley, she resumed those
written self-examinations and meditations for which, frequent in her
earlier years, she seems to have found little time during the
strenuous decade 1856–66. “I never failed in energy,” she said once
in later years; “but to do everything from the best motive—that is
quite another thing.” In reviewing her past life on October 21, 1866,
the anniversary of her departure for the Crimea, and on subsequent
days, she seems to have had a like thought. Her meditations were
not so much of what she had done as of what she had done amiss;
her resolutions were of greater purity of motive, and greater peace,
through a more entire trust in God: “Called to be the ‘handmaid of
the Lord,’ and I have complained of my suffering life! What return
does God expect from me—with what purity of heart and intention
should I make an offering of myself to Him! The word of the Lord
unto thee: He was oppressed and he was afflicted, yet he opened
not his mouth.… But, when we are ill, how can we be like God? I
look up and see the drops of dew, blue, golden, green, and red,
glittering in the sun on the top of the deciduous cypress—that is like
God. We see Him for a moment—we perceive His beauty. It lights
us, even when we lie here prostrate.… Blessed are the pure in heart:
for they shall see God—in all temptation, trials, and aridities, in the
agony and bloody sweat, in the Cross and Passion: this is not the
prerogative of the future life, but of the present.”
Footnotes:
[2] Miss Nightingale related this incident in two letters—to Dr. Farr
(Sept. 10), and to Harriet Martineau (Sept. 24).
[7] The reference here is to the Aunt who, in earlier years, had
been in close companionship with her. At this time there was some
misunderstanding between them. Mrs. Smith's advancing age and
home claims brought a cessation of her constant activity in Miss
Nightingale's service; but in later years aunt and niece took much
counsel together in a resumed study of the religious subjects upon
which they had formerly held intimate converse: see below, pp. 353,
387.
[12] A true prediction: see Sir Bartle Frere's saying, below, p. 158.
[19] Her nominations were, in the end, all approved. The Indian
representatives were Sir Proby Cautley and Sir James Ranald Martin.
[20] “On the 5th February 1864, the Government of India informed
the Secretary of State that, in consequence of the non-arrival of the
Report of the Royal Commission, it had not been possible to carry
out the measures indicated in the despatch of the 15th August, but
that having just received a few copies, &c., &c.” (Memorandum on
Measures adopted for Sanitary Improvements in India up to the end
of 1867, p. 2).
[24] This was not designedly a practical joke. The Clerk to the
Commission held a post in the Board.
[33] This was no idle taunt. The Government of India had already
put in force some of the recommendations of the Royal Commission
before it had officially received copies of the Report: see above, p. 34
and n.
[38] If any reader should desire to follow up the criticisms and the
replies, he will find the Reply to Dr. Leith in Parliamentary Papers,
1865, No. 329; and the Government of India's dispatch with the
Reply, in Nos. 108 and 324. Dr. Leith's Report does not appear to
have been reprinted as a Parliamentary Paper. A copy of it, printed at
Bombay, 1864, is among Miss Nightingale's papers.
[55] Attention was called to it, and the moral was pointed, by a
leading article in the Daily News (July 8), doubtless written by Harriet
Martineau.
[60] Her places of residence in 1862 and 1863 have been given
above, p. 24. In 1864 she lived at 32 (now No. 4) South Street, the
Verneys' house (Jan.); at 115 Park Street (Feb.–July); at 7 Oak Hill
Park, Hampstead (Aug.–Oct.). She was at 27 Norfolk Street from Nov.
1864 to May 1, 1865. During May and June 1865 and again in Oct.,
she was at 34 (now No. 8) South Street; in July–Sept., she was at
Hampstead.
[61] She was still so beset by begging letters, that Mr. Smith had a
notice inserted in the Times of April 29, 1864, to the effect that she
could not answer them or return any papers enclosed to her.
[63] See the Times, April 18, 1864. The interview took place on
Sunday afternoon April 17. On the day before, Garibaldi had been at
Bedford.
MANY THREADS
(1867–1872)
I beg of you and pray you to look back upon the past with thankfulness
and upon the future with hope—when there has been so much done and
there is so much to do … many beginnings and ravelled threads to be
woven in and completed.—Benjamin Jowett (Letter to Miss Nightingale,
1867).
CHAPTER I
WORKHOUSE REFORM
(1864–1867)
From the first I had a sort of fixed faith that Florence Nightingale could
do anything, and that faith is still fresh in me; and so it came to pass that
the instant that name entered the lists I felt the fight was virtually won,
and I feel this still.—H. B. Farnall, Poor Law Inspector (Dec. 1866).
Fifty years ago the state of things which Miss Nightingale had
seen, and cured, in the military hospitals during the Crimean War
was almost equalled, and was in some respects surpassed in
scandal, by the condition of the peace hospitals for the sick poor at
home. Those hospitals were the sick wards or infirmaries of
workhouses, for the hospitals usually so-called skim only the surface
of sickness in any great town. The state of the Metropolitan
workhouses, as reported upon by the Poor Law Board in 1866,
showed that the sick wards were for the most part insanitary and
overcrowded; that the beds were insufficient and admirably
contrived to induce sores; that the eating and drinking vessels were
unclean; that there was a deficiency of basins, towels, brushes and
combs; that the food for the patients was cooked by paupers and
frequently served cold; that although the medical officers did their
duty to the best of their ability, the attendance given and the salaries
paid were inadequate to the needs of the sick. As for the nursing, it
was done by paupers, many of whom could neither read nor write,
whose love of drink often drove them to rob the sick of stimulants,
and whose treatment of the poor was characterized neither by
judgment nor by gentleness. This is the restrained euphemism of an
official report.[74] Sometimes a patient would miss the ministration
of a nurse for days because the pauper charged to give it was
herself bed-ridden. The rule of one nurse was to give medicine three
times a day to the very ill and once to the rather ill. It was
administered in a gallipot; the nurse “poured out the medicine and
judged according.” Cases were reported in which a patient's bed was
not made for five days and nights; in which patients had no food
from 4 o'clock in the afternoon of one day to 8 o'clock in the
morning of the next; in which patients died, or, to speak more
correctly, were killed, by the most wanton neglect.