Classification and Regression Trees
First Issued in Hardback Edition, Breiman
Table of Contents
Dedication
Title Page
Copyright Page
PREFACE
Acknowledgements
Chapter 1 - BACKGROUND
1.1 CLASSIFIERS AS PARTITIONS
1.2 USE OF DATA IN CONSTRUCTING CLASSIFIERS
1.3 THE PURPOSES OF CLASSIFICATION ANALYSIS
1.4 ESTIMATING ACCURACY
1.5 THE BAYES RULE AND CURRENT CLASSIFICATION
PROCEDURES
Chapter 2 - INTRODUCTION TO TREE CLASSIFICATION
2.1 THE SHIP CLASSIFICATION PROBLEM
2.2 TREE STRUCTURED CLASSIFIERS
2.3 CONSTRUCTION OF THE TREE CLASSIFIER
2.4 INITIAL TREE GROWING METHODOLOGY
2.5 METHODOLOGICAL DEVELOPMENT
2.6 TWO RUNNING EXAMPLES
2.7 THE ADVANTAGES OF THE TREE STRUCTURED
APPROACH
Chapter 3 - RIGHT SIZED TREES AND HONEST ESTIMATES
3.1 INTRODUCTION
3.2 GETTING READY TO PRUNE
3.3 MINIMAL COST-COMPLEXITY PRUNING
3.4 THE BEST PRUNED SUBTREE: AN ESTIMATION PROBLEM
3.5 SOME EXAMPLES
APPENDIX
Chapter 4 - SPLITTING RULES
4.1 REDUCING MISCLASSIFICATION COST
4.2 THE TWO-CLASS PROBLEM
4.3 THE MULTICLASS PROBLEM: UNIT COSTS
4.4 PRIORS AND VARIABLE MISCLASSIFICATION COSTS
4.5 TWO EXAMPLES
4.6 CLASS PROBABILITY TREES VIA GINI
APPENDIX
Chapter 5 - STRENGTHENING AND INTERPRETING
5.1 INTRODUCTION
5.2 VARIABLE COMBINATIONS
5.3 SURROGATE SPLITS AND THEIR USES
5.4 ESTIMATING WITHIN-NODE COST
5.5 INTERPRETATION AND EXPLORATION
5.6 COMPUTATIONAL EFFICIENCY
5.7 COMPARISON OF ACCURACY WITH OTHER METHODS
APPENDIX
Chapter 6 - MEDICAL DIAGNOSIS AND PROGNOSIS
6.1 PROGNOSIS AFTER HEART ATTACK
6.2 DIAGNOSING HEART ATTACKS
6.3 IMMUNOSUPPRESSION AND THE DIAGNOSIS OF CANCER
6.4 GAIT ANALYSIS AND THE DETECTION OF OUTLIERS
6.5 RELATED WORK ON COMPUTER-AIDED DIAGNOSIS
Chapter 7 - MASS SPECTRA CLASSIFICATION
7.1 INTRODUCTION
7.2 GENERALIZED TREE CONSTRUCTION
7.3 THE BROMINE TREE: A NONSTANDARD EXAMPLE
Chapter 8 - REGRESSION TREES
8.1 INTRODUCTION
8.2 AN EXAMPLE
8.3 LEAST SQUARES REGRESSION
8.4 TREE STRUCTURED REGRESSION
8.5 PRUNING AND ESTIMATING
8.6 A SIMULATED EXAMPLE
8.7 TWO CROSS-VALIDATION ISSUES
8.8 STANDARD STRUCTURE TREES
8.9 USING SURROGATE SPLITS
8.10 INTERPRETATION
8.11 LEAST ABSOLUTE DEVIATION REGRESSION
8.12 OVERALL CONCLUSIONS
Chapter 9 - BAYES RULES AND PARTITIONS
9.1 BAYES RULE
9.2 BAYES RULE FOR A PARTITION
9.3 RISK REDUCTION SPLITTING RULE
9.4 CATEGORICAL SPLITS
Chapter 10 - OPTIMAL PRUNING
10.1 TREE TERMINOLOGY
10.2 OPTIMALLY PRUNED SUBTREES
10.3 AN EXPLICIT OPTIMAL PRUNING ALGORITHM
Chapter 11 - CONSTRUCTION OF TREES FROM A LEARNING
SAMPLE
11.1 ESTIMATED BAYES RULE FOR A PARTITION
11.2 EMPIRICAL RISK REDUCTION SPLITTING RULE
11.3 OPTIMAL PRUNING
11.4 TEST SAMPLES
11.5 CROSS-VALIDATION
11.6 FINAL TREE SELECTION
11.7 BOOTSTRAP ESTIMATE OF OVERALL RISK
11.8 END-CUT PREFERENCE
Chapter 12 - CONSISTENCY
12.1 EMPIRICAL DISTRIBUTIONS
12.2 REGRESSION
12.3 CLASSIFICATION
12.4 PROOFS FOR SECTION 12.1
12.5 PROOFS FOR SECTION 12.2
12.6 PROOFS FOR SECTION 12.3
BIBLIOGRAPHY
NOTATION INDEX
SUBJECT INDEX
Lovingly dedicated to our children
Jessica, Rebecca, Kymm;
Melanie;
Elyse, Adam, Rachel, Stephen;
Daniel and Kevin
Library of Congress Cataloging-in-Publication Data
Main entry under title:
Classification and regression trees.
(The Wadsworth statistics/probability series)
Bibliography: p.
Includes Index.
ISBN 0-412-04841-8
1. Discriminant analysis. 2. Regression analysis.
3. Trees (Graph theory) I. Breiman, Leo. II. Title:
Regression trees. III. Series.
QA278.65.C54 1984
519.5′36—dc20
83-19708
CIP
This book contains information obtained from authentic and highly regarded sources.
Reprinted material is quoted with permission, and sources are indicated. A wide variety of
references are listed. Reasonable efforts have been made to publish reliable data and
information, but the author and the publisher cannot assume responsibility for the validity
of all materials or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording, or by
any information storage or retrieval system, without prior permission in writing from the
publisher.
The consent of CRC Press LLC does not extend to copying for general distribution, for
promotion, for creating new works, or for resale. Specific permission must be obtained in
writing from CRC Press LLC for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered
trademarks, and are used only for identification and explanation, without intent to infringe.
Visit the CRC Press Web site at www.crcpress.com
First CRC Press reprint 1998
© 1984, 1993 by Chapman & Hall
No claim to original U.S. Government works
International Standard Book Number 0-412-04841-8
Library of Congress Card Number 83-19708
Printed in the United States of America 7 8 9 0
Printed on acid-free paper
PREFACE
The tree methodology discussed in this book is a child of the
computer age. Unlike many other statistical procedures which were
moved from pencil and paper to calculators and then to computers,
this use of trees was unthinkable before computers.
Binary trees give an interesting and often illuminating way of
looking at data in classification or regression problems. They should
not be used to the exclusion of other methods. We do not claim that
they are always better. They do add a flexible nonparametric tool to
the data analyst’s arsenal.
Both practical and theoretical sides have been developed in our
study of tree methods. The book reflects these two sides. The first
eight chapters are largely expository and cover the use of trees as a
data analysis method. These were written by Leo Breiman with the
exception of Chapter 6 by Richard Olshen. Jerome Friedman
developed the software and ran the examples.
Chapters 9 through 12 place trees in a more mathematical context
and prove some of their fundamental properties. The first three of
these chapters were written by Charles Stone and the last was
jointly written by Stone and Olshen.
Trees, as well as many other powerful data analytic tools (factor
analysis, nonmetric scaling, and so forth) were originated by social
scientists motivated by the need to cope with actual problems and
data. Use of trees in regression dates back to the AID (Automatic
Interaction Detection) program developed at the Institute for Social
Research, University of Michigan, by Morgan and Sonquist in the
early 1960s. The ancestor classification program is THAID,
developed at the institute in the early 1970s by Morgan and
Messenger. The research and developments described in this book
are aimed at strengthening and extending these original methods.
Our work on trees began in 1973 when Breiman and Friedman,
independently of each other, “reinvented the wheel” and began to
use tree methods in classification. Later, they joined forces and were
joined in turn by Stone, who contributed significantly to the
methodological development. Olshen was an early user of tree
methods in medical applications and contributed to their theoretical
development.
Our blossoming fascination with trees and the number of ideas
passing back and forth and being incorporated by Friedman into
CART (Classification and Regression Trees) soon gave birth to the
idea of a book on the subject. In 1980 conception occurred. While
the pregnancy has been rather prolonged, we hope that the baby
appears acceptably healthy to the members of our statistical
community.
The layout of the book is shown in the accompanying diagram (chapter dependency chart not reproduced in this copy).
Readers are encouraged to contact Richard Olshen regarding the
availability of CART software.
ACKNOWLEDGMENTS
Three other people were instrumental in our research: William
Meisel, who early on saw the potential in tree structured methods
and encouraged their development; Laurence Rafsky, who
participated in some of the early exchanges of ideas; and Louis
Gordon, who collaborated with Richard Olshen in theoretical work.
Many helpful comments were supplied by Peter Bickel, William Eddy,
John Hartigan, and Paul Tukey, who all reviewed an early version of
the manuscript.
Part of the research, especially that of Breiman and Friedman, was
supported by the Office of Naval Research (Contract No. N00014-
82-K-0054), and we appreciate our warm relations with Edward
Wegman and Douglas De Priest of that agency. Stone’s work was
supported partly by the Office of Naval Research on the same
contract and partly by the National Science Foundation (Grant No.
MCS 80- 02732). Olshen’s work was supported by the National
Science Foundation (Grant No. MCS 79-06228) and the National
Institutes of Health (Grant No. CA-26666).
We were fortunate in having the services of typists Ruth Suzuki,
Rosaland Englander, Joan Pappas, and Elaine Morici, who displayed
the old-fashioned virtues of patience, tolerance, and competence.
We are also grateful to our editor, John Kimmel of Wadsworth, for
his abiding faith that eventually a worthy book would emerge, and to
the production editor, Andrea Cava, for her diligence and skillful
supervision.
1
BACKGROUND
At the University of California, San Diego Medical Center, when a
heart attack patient is admitted, 19 variables are measured during
the first 24 hours. These include blood pressure, age, and 17 other
ordered and binary variables summarizing the medical symptoms
considered as important indicators of the patient’s condition.
The goal of a recent medical study (see Chapter 6) was the
development of a method to identify high risk patients (those who
will not survive at least 30 days) on the basis of the initial 24-hour
data.
Figure 1.1 is a picture of the tree structured classification rule that
was produced in the study. The letter F means not high risk; G
means high risk.
This rule classifies incoming patients as F or G depending on the
yes-no answers to at most three questions. Its simplicity raises the
suspicion that standard statistical classification methods may give
classification rules that are more accurate. When these were tried,
the rules produced were considerably more intricate, but less
accurate.
The methodology used to construct tree structured rules is the
major story of this monograph.
FIGURE 1.1
1.1 CLASSIFIERS AS PARTITIONS
The general classification problem is similar to the medical diagnosis
problem sketched above. Measurements are made on some case or
object. Based on these measurements, we then want to predict
which class the case is in.
For instance, days in the Los Angeles basin are classified according
to the ozone levels:
Class 1: nonalert (low ozone)
Class 2: first-stage alert (moderate ozone)
Class 3: second-stage alert (high ozone)
During the current day, measurements are made on many
meteorological variables, such as temperature, humidity, upper
atmospheric conditions, and on the current levels of a number of
airborne pollutants. The purpose of a project funded by the
California Air Resources Board (Zeldin and Cassmassi, 1978) was to
explore methods for using the current-day measurements to predict
the classification of the following day.
An EPA project had this goal: The exact analysis of a complex
chemical compound into its atomic constituents is slow and costly.
Measuring its mass spectra can be done quickly and at relatively low
cost. Can the measured mass spectra be used to accurately predict
whether, for example, the compound is in
class 1 (contains one or more chlorine atoms), or
class 2 (contains no chlorine)?
(See Chapter 7 for more discussion.)
In these problems, the goal is the same. Given a set of
measurements on a case or object, find a systematic way of
predicting what class it is in. In any problem, a classifier or a
classification rule is a systematic way of predicting what class a case
is in.
To give a more precise formulation, arrange the set of
measurements on a case in a preassigned order; i.e., take the
measurements to be x1, x2, ..., where, say, x1 is age, x2 is blood
pressure, etc. Define the measurements (x1, x2, ...) made on a case
as the measurement vector x corresponding to the case. Take the
measurement space X to be defined as containing all possible
measurement vectors.
For example, in the heart attack study, X is a 19-dimensional
space such that the first coordinate x1 (age) ranges, say, over all
integer values from 0 to 200; the second coordinate, blood pressure,
might be defined as continuously ranging from 50 to 150. There can
be a number of different definitions of X. What is important is that
any definition of X have the property that the measurement vector x
corresponding to any case we may wish to classify be a point in the
space X.
Suppose that the cases or objects fall into J classes. Number the
classes 1, 2, ..., J and let C be the set of classes; that is, C = {1, ...
J}.
A systematic way of predicting class membership is a rule that
assigns a class membership in C to every measurement vector x in
X. That is, given any x ∈ X, the rule assigns one of the classes {1,
..., J} to x.
DEFINITION 1.1. A classifier or classification rule is a function d(x)
defined on X so that for every x, d(x) is equal to one of the numbers
1, 2, ..., J.
Another way of looking at a classifier is to define Aj as the subset
of X on which d(x) = j; that is,
Aj = {x; d(x) = j}.
The sets A1, ..., AJ are disjoint and X = ∪j Aj. Thus, the Aj form a
partition of X. This gives the equivalent
DEFINITION 1.2. A classifier is a partition of X into J disjoint subsets
A1, ..., AJ, X = ∪j Aj, such that for every x ∈ Aj the predicted class is
j.
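The equivalence between Definitions 1.1 and 1.2 can be checked directly on a toy measurement space. A minimal sketch in Python; the two-variable grid and the rule d are invented for illustration and are not from the text:

```python
# Hypothetical finite measurement space: (age, blood pressure) pairs, J = 2 classes.
X_space = [(age, bp) for age in range(0, 101, 10) for bp in (60, 90, 120)]

def d(x):
    """An illustrative classifier d(x) taking values in C = {1, 2}."""
    age, bp = x
    return 1 if bp <= 91 and age > 60 else 2

# The partition induced by d: A_j = {x : d(x) = j}.
A = {j: {x for x in X_space if d(x) == j} for j in (1, 2)}

assert A[1] | A[2] == set(X_space)   # the A_j cover X ...
assert A[1] & A[2] == set()          # ... and are disjoint, so they partition X
```

Conversely, given any partition {A1, A2} of the grid, defining d(x) as the index j with x ∈ Aj recovers a classifier, which is the content of Definition 1.2.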
1.2 USE OF DATA IN CONSTRUCTING
CLASSIFIERS
Classifiers are not constructed whimsically. They are based on past
experience. Doctors know, for example, that elderly heart attack
patients with low blood pressure are generally high risk. Los
Angelenos know that one hot, high pollution day is likely to be
followed by another.
In systematic classifier construction, past experience is
summarized by a learning sample. This consists of the measurement
data on N cases observed in the past together with their actual
classification.
In the medical diagnostic project the learning sample consisted of
the records of 215 heart attack patients admitted to the hospital, all
of whom survived the initial 24-hour period. The records contained
the outcome of the initial 19 measurements together with an
identification of those patients that did not survive at least 30 days.
The learning sample for the ozone classification project contained
6 years (1972-1977) of daily measurements on over 400
meteorological variables and hourly air pollution measurements at 30
locations in the Los Angeles basin.
The data for the chlorine project consisted of the mass spectra of
about 30,000 compounds having known molecular structure. For
each compound the mass spectra can be expressed as a
measurement vector of dimension equal to the molecular weight.
The set of 30,000 measurement vectors was of variable
dimensionality, ranging from about 50 to over 1000.
We assume throughout the remainder of this monograph that the
construction of a classifier is based on a learning sample, where
DEFINITION 1.3. A learning sample consists of data (x1, j1), ..., (xN,
jN) on N cases where xn ∈ X and jn ∈ {1, ..., J}, n = 1, ..., N. The
learning sample is denoted by L; i.e.,
L = {(x1, j1) ..., (xN, jN)}.
We distinguish two general types of variables that can appear in
the measurement vector.
DEFINITION 1.4. A variable is called ordered or numerical if its
measured values are real numbers. A variable is categorical if it
takes values in a finite set not having any natural ordering.
A categorical variable, for instance, could take values in the set
{red, blue, green}. In the medical data, blood pressure and age are
ordered variables.
Finally, define
DEFINITION 1.5. If all measurement vectors xn are of fixed
dimensionality, we say that the data have standard structure.
In the medical and ozone projects, a fixed set of variables is
measured on each case (or day); the data have standard structure.
The mass spectra data have nonstandard structure.
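Definitions 1.3 through 1.5 are easy to operationalize. A small Python sketch with an invented learning sample mixing ordered and categorical variables (the particular cases are made up for illustration):

```python
# A toy learning sample L = [(x_n, j_n)]: each x is (age, blood_pressure, color),
# where the first two variables are ordered and the third is categorical.
L = [((63, 91, "red"), 1), ((45, 120, "blue"), 2), ((72, 85, "green"), 1)]

def has_standard_structure(sample):
    """Standard structure (Definition 1.5): all x_n share a fixed dimensionality."""
    dims = {len(x) for x, _ in sample}
    return len(dims) == 1

assert has_standard_structure(L)

# Mass-spectra-like data, where vector length varies case by case, is nonstandard:
nonstandard = [((1, 0, 1), 1), ((0, 1, 1, 0, 1), 2)]
assert not has_standard_structure(nonstandard)
```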
1.3 THE PURPOSES OF CLASSIFICATION
ANALYSIS
Depending on the problem, the basic purpose of a classification
study can be either to produce an accurate classifier or to uncover
the predictive structure of the problem. If we are aiming at the
latter, then we are trying to get an understanding of what variables
or interactions of variables drive the phenomenon—that is, to give
simple characterizations of the conditions (in terms of the
measurement variables x ∈ X) that determine when an object is in
one class rather than another. These two are not exclusive. Most
often, in our experience, the goals will be both accurate prediction
and understanding. Sometimes one or the other will have greater
emphasis.
In the mass spectra project, the emphasis was on prediction. The
purpose was to develop an efficient and accurate on-line algorithm
that would accept as input the mass spectrum of an unknown
compound and classify the compound as either chlorine containing
or not.
The ozone project shared goals. The work toward understanding
which meteorological variables and interactions between them were
associated with alert-level days was an integral part of the
development of a classifier.
The tree structured classification rule of Figure 1.1 gives some
interesting insights into the medical diagnostic problem. All cases
with blood pressure less than or equal to 91 are predicted high risks.
For cases with blood pressure greater than 91, the classification
depends only on age and whether sinus tachycardia is present. For
the purpose of distinguishing between high and low risk cases, once
age is recorded, only two variables need to be measured.
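The rule just described can be sketched as nested yes-no questions. In the Python sketch below, the blood pressure threshold of 91 comes from the text; the age cutoff of 62.5 and the exact question order are illustrative assumptions, since the Figure 1.1 diagram is not reproduced in this copy:

```python
def classify(min_systolic_bp, age, sinus_tachycardia):
    """Sketch of the Figure 1.1 rule: G = high risk, F = not high risk.

    The 91 threshold is stated in the text; the 62.5 age cutoff is an
    illustrative placeholder, not taken from this excerpt.
    """
    if min_systolic_bp <= 91:
        return "G"   # low blood pressure: predicted high risk
    if age <= 62.5:
        return "F"   # younger patients with higher blood pressure: not high risk
    return "G" if sinus_tachycardia else "F"
```

At most three questions are asked per patient, which is what gives the rule its striking simplicity.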
An important criterion for a good classification procedure is that it
not only produce accurate classifiers (within the limits of the data)
but that it also provide insight and understanding into the predictive
structure of the data.
Many of the presently available statistical techniques were
designed for small data sets having standard structure with all
variables of the same type; the underlying assumption was that the
phenomenon is homogeneous. That is, that the same relationship
between variables held over all of the measurement space. This led
to models where only a few parameters were necessary to trace the
effects of the various factors involved.
With large data sets involving many variables, more structure can
be discerned and a variety of different approaches tried. But
largeness by itself does not necessarily imply a richness of structure.
What makes a data set interesting is not only its size but also its
complexity, where complexity can include such considerations as:
High dimensionality
A mixture of data types
Nonstandard data structure
and, perhaps most challenging, nonhomogeneity; that is, different
relationships hold between variables in different parts of the
measurement space.
Along with complex data sets comes “the curse of dimensionality”
(a phrase due to Bellman, 1961). The difficulty is that the higher the
dimensionality, the sparser and more spread apart are the data
points. Ten points on the unit interval are not distant neighbors. But
10 points on a 10-dimensional unit rectangle are like oases in the
desert.
For instance, with 100 points, constructing a 10-cell histogram on
the unit interval is a reasonable procedure. In M dimensions, a
histogram that uses 10 intervals in each dimension produces 10M
cells. For even moderate M, a very large data set would be needed
to get a sensible histogram.
Another way of looking at the “curse of dimensionality” is the
number of parameters needed to specify distributions in M
dimensions:
Normal: O(M2)
Binary: O(2M)
Unless one makes the very strong assumption that the variables are
independent, the number of parameters usually needed to specify an
M-dimensional distribution goes up much faster than O(M). To put
this another way, the complexity of a data set increases rapidly with
increasing dimensionality.
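The arithmetic behind these counts can be checked directly. A short Python sketch; the normal and binary formulas below are the standard parameter counts (mean plus covariance entries, and cell probabilities), spelled out here rather than quoted from the text:

```python
def histogram_cells(M, bins_per_dim=10):
    """Cells in an M-dimensional histogram with 10 intervals per axis: 10^M."""
    return bins_per_dim ** M

def normal_params(M):
    """Free parameters of an M-dimensional normal: M means + M(M+1)/2
    covariance entries, which is O(M^2)."""
    return M + M * (M + 1) // 2

def binary_params(M):
    """A general distribution on M binary variables needs 2^M - 1 free
    probabilities, which is O(2^M)."""
    return 2 ** M - 1

assert histogram_cells(1) == 10            # fine for 100 points on the unit interval
assert histogram_cells(10) == 10 ** 10     # hopeless for any realistic sample size
assert normal_params(10) == 65
assert binary_params(10) == 1023
```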
With accelerating computer usage, complex, high dimensional
data bases, with variable dimensionality or mixed data types,
nonhomogeneities, etc., are no longer odd rarities.
In response to the increasing dimensionality of data sets, the most
widely used multivariate procedures all contain some sort of
dimensionality reduction process. Stepwise variable selection and
variable subset selection in regression and discriminant analysis are
examples.
Although the drawbacks in some of the present multivariate
reduction tools are well known, they are a response to a clear need.
To analyze and understand complex data sets, methods are needed
which in some sense select salient features of the data, discard the
background noise, and feed back to the analyst understandable
summaries of the information.
1.4 ESTIMATING ACCURACY
Given a classifier, that is, given a function d(x) defined on X taking
values in C, we denote by R*(d) its “true misclassification rate.” The
question raised in this section is: What is truth and how can it be
estimated?
One way to see how accurate a classifier is (that is, to estimate R*
(d)) is to test the classifier on subsequent cases whose correct
classification has been observed. For instance, in the ozone project,
the classifier was developed using the data from the years 1972-
1975. Then its accuracy was estimated by using the 1976-1977 data.
That is, R*(d) was estimated as the proportion of days in 1976-1977
that were misclassified when d(x) was used on the previous day
data.
In one part of the mass spectra project, the 30,000 spectra were
randomly divided into one set of 20,000 and another of 10,000. The
20,000 were used to construct the classifier. The other 10,000 were
then run through the classifier and the proportion misclassified used
as an estimate of R*(d).
The value of R*(d) can be conceptualized in this way: Using L,
construct d. Now, draw another very large (virtually infinite) set of
cases from the same population as L was drawn from. Observe the
correct classification for each of these cases, and also find the
predicted classification using d(x). The proportion misclassified by d
is the value of R*(d).
To make the preceding concept precise, a probability model is
needed. Define the space X × C as the set of all couples (x, j)
where x ∈ X and j is a class label, j ∈ C. Let P(A, j) be a probability on
X × C, A ⊂ X, j ∈ C (niceties such as Borel measurability will be
ignored). The interpretation of P(A, j) is that a case drawn at
random from the relevant population has probability P(A, j) that its
measurement vector x is in A and its class is j. Assume that the
learning sample L consists of N cases (x1, j1), ..., (xN , jN)
independently drawn at random from the distribution P(A, j).
Construct d(x) using L. Then define R*(d) as the probability that d
will misclassify a new sample drawn from the same distribution as L.
DEFINITION 1.6. Take (X, Y), X ∈ X, Y ∈ C, to be a new sample from
the probability distribution P(A, j); i.e.,
(i) P(X ∈ A, Y = j) = P(A, j),
(ii) (X, Y) is independent of L.
Then define
R*(d) = P(d(X) ≠ Y).
In evaluating the probability P(d(X) ≠ Y), the set L is considered
fixed. A more precise notation is P(d(X) ≠ Y|L), the probability of
misclassifying the new sample given the learning sample L.
This model must be applied cautiously. Successive pairs of days in
the ozone data are certainly not independent. Its usefulness is that it
gives a beginning conceptual framework for the definition of “truth.”
How can R*(d) be estimated? There is no difficulty in the
examples of simulated data given in this monograph. The data in L
are sampled independently from a desired distribution using a
pseudorandom number generator. After d(x) is constructed, 5000
additional cases are drawn from the same distribution independently
of L and classified by d. The proportion misclassified among those
5000 is the estimate of R*(d).
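The simulation recipe just described can be sketched in a few lines of Python. The population below (uniform x with a 10 percent label flip) is an invented stand-in for the monograph's simulated distributions, chosen so that the true R*(d) of the Bayes-optimal threshold rule is exactly 0.1:

```python
import random

random.seed(0)

def draw_case():
    """Assumed toy population: x ~ Uniform(0, 1); class 1 if x < 0.5, else
    class 2, with the label flipped with probability 0.1."""
    x = random.random()
    j = 1 if x < 0.5 else 2
    if random.random() < 0.1:
        j = 3 - j
    return x, j

def d(x):
    """A fixed classifier, regarded as already constructed from L."""
    return 1 if x < 0.5 else 2

# Estimate R*(d) from 5000 additional cases drawn independently of L,
# exactly as the text describes for simulated data.
cases = [draw_case() for _ in range(5000)]
R_star_hat = sum(d(x) != j for x, j in cases) / 5000
assert abs(R_star_hat - 0.1) < 0.02   # close to the true misclassification rate
```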
In actual problems, only the data in L are available with little
prospect of getting an additional large sample of classified cases.
Then L must be used both to construct d(x) and to estimate R*(d).
We refer to such estimates of R*(d) as internal estimates. A
summary and large bibliography concerning such estimates is in
Toussaint (1974).
Three types of internal estimates will be of interest to us. The
first, least accurate, and most commonly used is the resubstitution
estimate.
After the classifier d is constructed, the cases in L are run through
the classifier. The proportion of cases misclassified is the
resubstitution estimate. To put this in equation form:
DEFINITION 1.7. Define the indicator function X(·) to be 1 if the
statement inside the parentheses is true, otherwise zero.
The resubstitution estimate, denoted R(d), is

R(d) = (1/N) Σn X(d(xn) ≠ jn).    (1.8)
The problem with the resubstitution estimate is that it is computed
using the same data used to construct d, instead of an independent
sample. All classification procedures, either directly or indirectly,
attempt to minimize R(d). Using the subsequent value of R(d) as an
estimate of R*(d) can give an overly optimistic picture of the
accuracy of d.
As an exaggerated example, take d(x) to be defined by a partition
A1, ..., AJ such that Aj contains all measurement vectors xn in L with
jn = j, and the vectors x ∈ X not equal to any xn are assigned in an
arbitrary random fashion to one or another of the Aj. Then R(d) =
0, but it is hard to believe that R*(d) is anywhere near zero.
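This memorizing rule is easy to reproduce. A Python sketch of the exaggerated example, using an invented population in which x carries no information about j at all, so the true misclassification rate is near 1/2 no matter what the resubstitution estimate says:

```python
import random

random.seed(1)

# Learning sample from a population where the class is pure noise.
L = [(random.random(), random.choice((1, 2))) for _ in range(50)]
memory = dict(L)   # the rule simply memorizes every (x_n, j_n) pair in L

def d(x):
    return memory.get(x, 1)   # vectors not in L assigned arbitrarily

# Resubstitution estimate (1.8): proportion of L misclassified by d.
R = sum(d(x) != j for x, j in L) / len(L)
assert R == 0.0               # looks perfect on the construction data...

# ...yet on 2000 fresh cases the estimated R*(d) is near 1/2.
new = [(random.random(), random.choice((1, 2))) for _ in range(2000)]
R_star_hat = sum(d(x) != j for x, j in new) / len(new)
assert R_star_hat > 0.4
```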
The second method is test sample estimation. Here the cases in L
are divided into two sets L1 and L2. Only the cases in L1 are used to
construct d. Then the cases in L2 are used to estimate R*(d). If N2 is
the number of cases in L2, then the test sample estimate, Rts(d), is
given by

Rts(d) = (1/N2) Σ(xn, jn)∈L2 X(d(xn) ≠ jn).    (1.9)
In this method, care needs to be taken so that the cases in L2 can
be considered as independent of the cases in L1 and drawn from the
same distribution. The most common procedure used to help ensure
these properties is to draw L2 at random from L. Frequently, L2 is
taken as 1/3 of the cases in L, but we do not know of any theoretical
justification for this 2/3, 1/3 split.
The test sample approach has the drawback that it reduces
effective sample size. In a 2/3, 1/3 split, only 2/3 of the data are
used to construct d, and only 1/3 to estimate R*(d). If the sample
size is large, as in the mass spectra problem, this is a minor
difficulty, and test sample estimation is honest and efficient.
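The 2/3, 1/3 procedure can be sketched as follows. The one-dimensional sample, its roughly 10 percent label noise, and the best-single-threshold construction procedure are all invented for illustration:

```python
import random

random.seed(2)

def make_case():
    """Toy population: class 2 when x > 0.6, with ~10% label noise."""
    x = random.random()
    j = 2 if x > 0.6 else 1
    if random.random() < 0.1:
        j = 3 - j
    return x, j

L = [make_case() for _ in range(300)]
random.shuffle(L)
L2, L1 = L[:100], L[100:]     # L2 drawn at random as 1/3 of L; L1 is the rest

def error(threshold, sample):
    """Misclassification proportion of the rule 'class 2 iff x > threshold'."""
    return sum((2 if x > threshold else 1) != j for x, j in sample) / len(sample)

# "Construct d" using L1 only: pick the threshold with the smallest error on L1.
t = min((x for x, _ in L1), key=lambda c: error(c, L1))

# Test sample estimate (1.9), computed on the held-out L2 only.
R_ts = error(t, L2)
assert 0.0 <= R_ts < 0.3      # near the ~10% noise floor on this toy problem
```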
For smaller sample sizes, another method, called V-fold cross-
validation, is preferred (see the review by M. Stone, 1977). The
cases in L are randomly divided into V subsets of as nearly equal size
as possible. Denote these subsets by L1, ..., LV. Assume that the
procedure for constructing a classifier can be applied to any learning
sample. For every v, v = 1, ..., V, apply the procedure using as
learning sample L - Lv, i.e., the cases in L not in Lv, and let d(v)(x)
be the resulting classifier. Since none of the cases in Lv has been
used in the construction of d(v), a test sample estimate for R*(d(v)) is

Rts(d(v)) = (1/Nv) Σ(xn, jn)∈Lv X(d(v)(xn) ≠ jn),    (1.10)

where Nv ≈ N/V is the number of cases in Lv. Now using the same
procedure again, construct the classifier d using all of L.
For V large, each of the V classifiers is constructed using a
learning sample of size N(1 - 1/V) nearly as large as L. The basic
assumption of cross-validation is that the procedure is “stable.” That
is, that the classifiers d(v), v = 1, ..., V, each constructed using
almost all of L, have misclassification rates R*(d(v)) nearly equal to
R*(d). Guided by this heuristic, define the V-fold cross-validation
estimate Rcv(d) as

Rcv(d) = (1/V) Σv=1..V Rts(d(v)).    (1.11)

N-fold cross-validation is the “leave-one-out” estimate. For each n,
n = 1, ..., N, the nth case is set aside and the classifier constructed
using the other N - 1 cases. Then the nth case is used as a single-
case test sample and R*(d) estimated by (1.11).
Cross-validation is parsimonious with data. Every case in L is used
to construct d, and every case is used exactly once in a test sample.
In tree structured classifiers tenfold cross-validation has been used,
and the resulting estimators have been satisfactorily close to R*(d)
on simulated data.
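The full V-fold recipe can be sketched in Python with V = 10. The one-dimensional sample and the best-single-threshold construction procedure are invented for illustration; the point is the fold bookkeeping of equations (1.10) and (1.11):

```python
import random

random.seed(3)

def make_case():
    """Toy population: class 2 when x > 0.5, with ~10% label noise."""
    x = random.random()
    j = 2 if x > 0.5 else 1
    if random.random() < 0.1:
        j = 3 - j
    return x, j

L = [make_case() for _ in range(200)]

def build(sample):
    """Classifier-construction procedure: best single threshold on the sample."""
    def err(c):
        return sum((2 if x > c else 1) != j for x, j in sample)
    t = min((x for x, _ in sample), key=err)
    return lambda x: 2 if x > t else 1

V = 10
random.shuffle(L)
folds = [L[v::V] for v in range(V)]   # V near-equal subsets L_1, ..., L_V

# For each v, construct d^(v) on L - L_v and test it on L_v (equation 1.10),
# then average the V test sample estimates to get R^cv(d) (equation 1.11).
R_ts = []
for v in range(V):
    train = [case for w, fold in enumerate(folds) if w != v for case in fold]
    d_v = build(train)
    R_ts.append(sum(d_v(x) != j for x, j in folds[v]) / len(folds[v]))

R_cv = sum(R_ts) / V
assert 0.0 <= R_cv < 0.3   # near the ~10% noise floor on this toy problem
```

Every case is used in construction and appears in exactly one test fold, which is the parsimony the text describes.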
The bootstrap method can also be used to estimate R*(d), but
may not work well when applied to tree structured classifiers (see
Section 11.7).
in Reduction
increase
done obvious
the of
Liturg
unfair cause generally
vessels
Gospel
tze Dr it
final
given border dreams
18th change the
accord
friends be
answers native the
give
omnibus Corinthian
of its it
modo bearing
not
his
the the his
the into
as of
PC middle life
nowhere opinion of
seems
what G
according they
they
kept the
extinguish begin The
had By
quas xxiii M
deteriorated old the
Pittsburg
With
impression was
haunted once the
the
the
room order
and bringing
little to
Peninsular we
Blessed
translated viri
It comments struck
is
by
many
for pry of
Great
then as
reduced her drowning
allusion always as
nor eyes
distributed the concerning
the by
some
relish reputation even
expeditionibus
according
to he of
very
Defitnctis Europe have
it the
the thus
adherents
Perrymill
in the
eminet
introduce
in the have
Catholicism Digby relations
degrees punishment
100
rich traveling
for exhibiting and
both on
dress easy
which would
has seriously
recently is
such
order xxviii
in
a are if
the was of
was doubt
of of
is the with
M vigour
easily
The POPE and
be and
gentleness thereof it
be on
his New cui
a with and
love
officials but
not there
Officiis
three Red all
soil
of
says
genuine the
which
immediately
effect
in any
be
that
travel
a Peking
fought
way
social opinions
security His to
in notice
manual pause
deeply many he
cannot would
readily
bred
xvi
yellow Kingdom It
The manufacture
of all haunted
mountains we to
to
of scantily
cumulata
asked to
Saint
by to There
an enclosure the
Should
Michelet establishments which
he a
forms arguere
that J more
non to indissolubly
African have
and at
taken eminet Cheek
enabling are
in apparently
House
long quality at
sufficient
and
action this in
to illimitable
this interesting
been
the fertilizing more
subterranean to a
mode with circle
institutis
of the having
of Rev
36 mdccclxxxiii
and heard church
of a quainter
when small
wanted
upon il
notice if
second
its
slight of Guzerate
the Conferences highroad
are
or He chap
will
hunting
cannot ofeven had
own
turbulent
for overruling no
propter had looks
voiceless
the
strongly enforced
hopes other
of
on
establishes school
in children India
of reason is
up from
this small party
a pointed feelings
need
China object
by Arundell race
exterior
multi a of
in ii
and its
expeditionibus inland
remarks had
is the
kept Congress
is
Room to
the in read
heroine
By called Dr
that upon memory
natural attack
frogs
preconceived precedent
of other of
But by
completes elders
rule and of
less faith
touch
considerable
the universalis exceeds
by the Lucas
for here order
there p a
hast and
engineer that
towards vituperation
the
not
details
it
throne concocted
and His ourselves
the
the colors on
border the the
of
since the
International slam
entrance definition great
discover and folded
husband the may
commerce yet the
want sandy the
especially appear described
creating
much own
and earnestness
already dislike to
make
entry property
the sprung
but
themselves
instead
upon with as
but exchange may
END
What of house
same
case was respective
in drinking way
made
and
for
regions or well
good
for
in their it
practical a in
armed of
arguments plays patience
increased to
for
Assembly stairs speaks
as the
author small of
importance in
field
dead
well
Notes
for
odd
the
is
devised the yards
164
been vast
in
which one Translated
spouted of and
He to
Moran Western beings
the takes
not unde
in Austrian most
been
wished this Wairoa
of name in
eas
action great
an surprise persons
freshly of
bellows their and
to of Home
lando drawn
to
the moral Creator
and XIII
the for flushed
undertake
it Accedunt
Frederick
have
is
of illumination of
the application to
make heavily
easy
times with Tao
Plato who
while
with
Ut
Certain be there
knows
administrative waters taels
be who
wicker of
burnt
dealt
stones his these
and
tradition the
and soon
practical
disturbances text
the
been business
details the
flow
alarm true the
than avail
or us
indefinite
quam with
has
Vivis the to
time eloquent to
palace may
Carboniferous of
education plague
Church January and
When VOL
38 of engines
Translated to now
time
it all feel
in up
reasons any
that Whilst every
became branch ever
not social
religiously the enterprise
two contain of
places
who opening
couple requires the
hike so
earn
Confession first manufactures
Wilmot be
time
In obvious to
legend meaning
instructions Account a
Pius style as
south the the
were else
the a
B are
9 his his
suitor the fundamental
the egg
is with
He very years
able ropes
rotten magically
the head
the apparent
the other are
the on meet
wealth
Timmy and
clients nationality matter
appear of
of of stories
in many
vastissima unique fail
those
and
nine That one
of This avenue
and anti
launch
with
look Angels
207 in
time this
first
absence filled
the i
in
the to introduced
of to
that every
of there
also in
root
that
unnoticed recommends
the life
in No prepare
by by connect
the
the Christian
perceptible they
common telling
reading
as
and branches
energetic
of
rose
as a inevitable
The hundred
was
enough the of
exercised of
as result
expressly ordinary says
in Roman human
Britain
partly
the so
to
in
always though which
than
an a
dismal hands
year in
several
when believe gives
bad now
island so then
doing pollution be
loudly
these
of
imperii
only
island a or
from St forming
scarcely The
preached and g
has us
the because
sublata ten writer
Ella And the
for
little permaneat 100
the
seen community tried
etude one he
the When A
our political
disobey
of
It sense
hominum concourse i
the the
temperatures
the to avoid
A
a the
the either before
Am strolen the
against
charm concerning which
morning of
hence and loathsome
oil student be
The Type
gas
000
intelligent controversies a
prove the
strolen its
making
and Der Where
S job
it were
patient agreed the
never
use long be
that Moreover be
schools to
new
The
speed fountain Annecy
close spot affectionate
are and
the Silent
Indiae pages Go
all
recordationis the
or looks
is
by
first
Washbourne a and
fatigue a
Tablet coarser the
of Jacquinet