Hands-On Machine
Learning with R
Chapman & Hall/CRC
The R Series
Series Editors
John M. Chambers, Department of Statistics, Stanford University, California, USA
Torsten Hothorn, Division of Biostatistics, University of Zurich, Switzerland
Duncan Temple Lang, Department of Statistics, University of California, Davis, USA
Hadley Wickham, RStudio, Boston, Massachusetts, USA
Recently Published Titles
Spatial Microsimulation with R
Robin Lovelace, Morgane Dumont
Extending R
John M. Chambers
Using the R Commander: A Point-and-Click Interface for R
John Fox
Computational Actuarial Science with R
Arthur Charpentier
bookdown: Authoring Books and Technical Documents with R Markdown
Yihui Xie
Testing R Code
Richard Cotton
R Primer, Second Edition
Claus Thorn Ekstrøm
Flexible Regression and Smoothing: Using GAMLSS in R
Mikis D. Stasinopoulos, Robert A. Rigby, Gillian Z. Heller, Vlasios Voudouris, and
Fernanda De Bastiani
The Essentials of Data Science: Knowledge Discovery Using R
Graham J. Williams
blogdown: Creating Websites with R Markdown
Yihui Xie, Alison Presmanes Hill, Amber Thomas
Handbook of Educational Measurement and Psychometrics Using R
Christopher D. Desjardins, Okan Bulut
Displaying Time Series, Spatial, and Space-Time Data with R, Second Edition
Oscar Perpinan Lamigueiro
Reproducible Finance with R
Jonathan K. Regenstein, Jr
R Markdown
The Definitive Guide
Yihui Xie, J.J. Allaire, Garrett Grolemund
Practical R for Mass Communication and Journalism
Sharon Machlis
Analyzing Baseball Data with R, Second Edition
Max Marchi, Jim Albert, Benjamin S. Baumer
Spatio-Temporal Statistics with R
Christopher K. Wikle, Andrew Zammit-Mangion, and Noel Cressie
Statistical Computing with R, Second Edition
Maria L. Rizzo
Geocomputation with R
Robin Lovelace, Jakub Nowosad, Jannes Muenchow
Distributions for Modelling Location, Scale, and Shape
Using GAMLSS in R
Robert A. Rigby, Mikis D. Stasinopoulos, Gillian Z. Heller, and Fernanda De Bastiani
Advanced Business Analytics in R: Descriptive, Predictive, and Prescriptive
Bradley Boehmke and Brandon Greenwell
For more information about this series, please visit: https://www.crcpress.com/go/the-r-series
Hands-On Machine
Learning with R
Brad Boehmke
Brandon Greenwell
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2020 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
International Standard Book Number-13: 978-1-138-49568-5 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Brad:
To Kate, Alivia, and Jules for making sure I have a life outside of
programming and to my mother who, undoubtedly, will try to read the pages
that follow.
Brandon:
To my parents for encouragement, to Thaddeus Tarpey for inspiration,
and to Julia, Lilly, and Jen for putting up with me while writing this book.
Contents
Preface xix
I Fundamentals 1
1 Introduction to Machine Learning 3
1.1 Supervised learning . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Regression problems . . . . . . . . . . . . . . . . . . . 4
1.1.2 Classification problems . . . . . . . . . . . . . . . . . . 5
1.2 Unsupervised learning . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 The data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Modeling Process 13
2.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Data splitting . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 Simple random sampling . . . . . . . . . . . . . . . . . 16
2.2.2 Stratified sampling . . . . . . . . . . . . . . . . . . . . 18
2.2.3 Class imbalances . . . . . . . . . . . . . . . . . . . . . 19
2.3 Creating models in R . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 Many formula interfaces . . . . . . . . . . . . . . . . . 20
2.3.2 Many engines . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 Resampling methods . . . . . . . . . . . . . . . . . . . . . . 23
2.4.1 k-fold cross validation . . . . . . . . . . . . . . . . . . 23
2.4.2 Bootstrapping . . . . . . . . . . . . . . . . . . . . . . 26
2.4.3 Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Bias variance trade-off . . . . . . . . . . . . . . . . . . . . . . 28
2.5.1 Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.3 Hyperparameter tuning . . . . . . . . . . . . . . . . . 30
2.6 Model evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6.1 Regression models . . . . . . . . . . . . . . . . . . . . 32
2.6.2 Classification models . . . . . . . . . . . . . . . . . . . 33
2.7 Putting the processes together . . . . . . . . . . . . . . . . . 36
3 Feature & Target Engineering 41
3.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Target engineering . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Dealing with missingness . . . . . . . . . . . . . . . . . . . . 45
3.3.1 Visualizing missing values . . . . . . . . . . . . . . . . 46
3.3.2 Imputation . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Feature filtering . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Numeric feature engineering . . . . . . . . . . . . . . . . . . 56
3.5.1 Skewness . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5.2 Standardization . . . . . . . . . . . . . . . . . . . . . . . 57
3.6 Categorical feature engineering . . . . . . . . . . . . . . . . . 58
3.6.1 Lumping . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.6.2 One-hot & dummy encoding . . . . . . . . . . . . . . . 61
3.6.3 Label encoding . . . . . . . . . . . . . . . . . . . . . . 62
3.6.4 Alternatives . . . . . . . . . . . . . . . . . . . . . . . . 65
3.7 Dimension reduction . . . . . . . . . . . . . . . . . . . . . . . 66
3.8 Proper implementation . . . . . . . . . . . . . . . . . . . . . . 67
3.8.1 Sequential steps . . . . . . . . . . . . . . . . . . . . . . 67
3.8.2 Data leakage . . . . . . . . . . . . . . . . . . . . . . . 68
3.8.3 Putting the process together . . . . . . . . . . . . . . 69
II Supervised Learning 77
4 Linear Regression 79
4.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2 Simple linear regression . . . . . . . . . . . . . . . . . . . . . 80
4.2.1 Estimation . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.3 Multiple linear regression . . . . . . . . . . . . . . . . . . . . 84
4.4 Assessing model accuracy . . . . . . . . . . . . . . . . . . . . 88
4.5 Model concerns . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.6 Principal component regression . . . . . . . . . . . . . . . . . 96
4.7 Partial least squares . . . . . . . . . . . . . . . . . . . . . . . 99
4.8 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . . 101
4.9 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5 Logistic Regression 105
5.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.2 Why logistic regression . . . . . . . . . . . . . . . . . . . . . 106
5.3 Simple logistic regression . . . . . . . . . . . . . . . . . . . . . 107
5.4 Multiple logistic regression . . . . . . . . . . . . . . . . . . . 110
5.5 Assessing model accuracy . . . . . . . . . . . . . . . . . . . . . 111
5.6 Model concerns . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.7 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . 116
5.8 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6 Regularized Regression 121
6.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.2 Why regularize? . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.2.1 Ridge penalty . . . . . . . . . . . . . . . . . . . . . . . 124
6.2.2 Lasso penalty . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.3 Elastic nets . . . . . . . . . . . . . . . . . . . . . . . . 126
6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.4 Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.5 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . 136
6.6 Attrition data . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.7 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7 Multivariate Adaptive Regression Splines 141
7.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
7.2 The basic idea . . . . . . . . . . . . . . . . . . . . . . . . . . 142
7.2.1 Multivariate adaptive regression splines . . . . . . . . 143
7.3 Fitting a basic MARS model . . . . . . . . . . . . . . . . . . 145
7.4 Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.5 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . . 151
7.6 Attrition data . . . . . . . . . . . . . . . . . . . . . . . . . . 154
7.7 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 156
8 K-Nearest Neighbors 157
8.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.2 Measuring similarity . . . . . . . . . . . . . . . . . . . . . . . 158
8.2.1 Distance measures . . . . . . . . . . . . . . . . . . . . 158
8.2.2 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . 161
8.3 Choosing 𝑘 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.4 MNIST example . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.5 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 172
9 Decision Trees 175
9.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.2 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
9.3 Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.4 How deep? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.4.1 Early stopping . . . . . . . . . . . . . . . . . . . . . . 180
9.4.2 Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.5 Ames housing example . . . . . . . . . . . . . . . . . . . . . 182
9.6 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . . 187
9.7 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 189
10 Bagging 191
10.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
10.2 Why and when bagging works . . . . . . . . . . . . . . . . . 192
10.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.4 Easily parallelize . . . . . . . . . . . . . . . . . . . . . . . . . 196
10.5 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . 198
10.6 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 200
11 Random Forests 203
11.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.2 Extending bagging . . . . . . . . . . . . . . . . . . . . . . . . 204
11.3 Out-of-the-box performance . . . . . . . . . . . . . . . . . . 205
11.4 Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . 206
11.4.1 Number of trees . . . . . . . . . . . . . . . . . . . . . 206
11.4.2 mtry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.4.3 Tree complexity . . . . . . . . . . . . . . . . . . . . . . . 207
11.4.4 Sampling scheme . . . . . . . . . . . . . . . . . . . . . 208
11.4.5 Split rule . . . . . . . . . . . . . . . . . . . . . . . . . 209
11.5 Tuning strategies . . . . . . . . . . . . . . . . . . . . . . . . . 211
11.6 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . 216
11.7 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 218
12 Gradient Boosting 221
12.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
12.2 How boosting works . . . . . . . . . . . . . . . . . . . . . . . 222
12.2.1 A sequential ensemble approach . . . . . . . . . . . . . 222
12.2.2 Gradient descent . . . . . . . . . . . . . . . . . . . . . 223
12.3 Basic GBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
12.3.1 Hyperparameters . . . . . . . . . . . . . . . . . . . . . . 227
12.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . 228
12.3.3 General tuning strategy . . . . . . . . . . . . . . . . . 230
12.4 Stochastic GBMs . . . . . . . . . . . . . . . . . . . . . . . . 233
12.4.1 Stochastic hyperparameters . . . . . . . . . . . . . . . 233
12.4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . 234
12.5 XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
12.5.1 XGBoost hyperparameters . . . . . . . . . . . . . . . 238
12.5.2 Tuning strategy . . . . . . . . . . . . . . . . . . . . . . 239
12.6 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . 243
12.7 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 244
13 Deep Learning 247
13.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
13.2 Why deep learning . . . . . . . . . . . . . . . . . . . . . . . . 249
13.3 Feedforward DNNs . . . . . . . . . . . . . . . . . . . . . . . . 251
13.4 Network architecture . . . . . . . . . . . . . . . . . . . . . . 252
13.4.1 Layers and nodes . . . . . . . . . . . . . . . . . . . . . 252
13.4.2 Activation . . . . . . . . . . . . . . . . . . . . . . . . . 254
13.5 Backpropagation . . . . . . . . . . . . . . . . . . . . . . . . . 255
13.6 Model training . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
13.7 Model tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
13.7.1 Model capacity . . . . . . . . . . . . . . . . . . . . . . 259
13.7.2 Batch normalization . . . . . . . . . . . . . . . . . . . 260
13.7.3 Regularization . . . . . . . . . . . . . . . . . . . . . . . 261
13.7.4 Adjust learning rate . . . . . . . . . . . . . . . . . . . 264
13.8 Grid search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
13.9 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 270
14 Support Vector Machines 271
14.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
14.2 Optimal separating hyperplanes . . . . . . . . . . . . . . . . 272
14.2.1 The hard margin classifier . . . . . . . . . . . . . . . . 273
14.2.2 The soft margin classifier . . . . . . . . . . . . . . . . 276
14.3 The support vector machine . . . . . . . . . . . . . . . . . . . 277
14.3.1 More than two classes . . . . . . . . . . . . . . . . . . 280
14.3.2 Support vector regression . . . . . . . . . . . . . . . . 280
14.4 Job attrition example . . . . . . . . . . . . . . . . . . . . . . 283
14.4.1 Class weights . . . . . . . . . . . . . . . . . . . . . . . 284
14.4.2 Class probabilities . . . . . . . . . . . . . . . . . . . . 285
14.5 Feature interpretation . . . . . . . . . . . . . . . . . . . . . . . 287
14.6 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 289
15 Stacked Models 291
15.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
15.2 The Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
15.2.1 Common ensemble methods . . . . . . . . . . . . . . . 293
15.2.2 Super learner algorithm . . . . . . . . . . . . . . . . . 293
15.2.3 Available packages . . . . . . . . . . . . . . . . . . . . 294
15.3 Stacking existing models . . . . . . . . . . . . . . . . . . . . 295
15.4 Stacking a grid search . . . . . . . . . . . . . . . . . . . . . . 298
15.5 Automated machine learning . . . . . . . . . . . . . . . . . . . 301
16 Interpretable Machine Learning 305
16.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
16.2 The idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
16.2.1 Global interpretation . . . . . . . . . . . . . . . . . . . . 307
16.2.2 Local interpretation . . . . . . . . . . . . . . . . . . . . 307
16.2.3 Model-specific vs. model-agnostic . . . . . . . . . . . . 308
16.3 Permutation-based feature importance . . . . . . . . . . . . . . 311
16.3.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
16.3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . 312
16.4 Partial dependence . . . . . . . . . . . . . . . . . . . . . . . 313
16.4.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . 314
16.4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . 315
16.4.3 Alternative uses . . . . . . . . . . . . . . . . . . . . . 316
16.5 Individual conditional expectation . . . . . . . . . . . . . . . . 317
16.5.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . 317
16.5.2 Implementation . . . . . . . . . . . . . . . . . . . . . . 318
16.6 Feature interactions . . . . . . . . . . . . . . . . . . . . . . . 320
16.6.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . 320
16.6.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . 321
16.6.3 Alternatives . . . . . . . . . . . . . . . . . . . . . . . . 325
16.7 Local interpretable model-agnostic explanations . . . . . . . 325
16.7.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . 325
16.7.2 Implementation . . . . . . . . . . . . . . . . . . . . . . 326
16.7.3 Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
16.7.4 Alternative uses . . . . . . . . . . . . . . . . . . . . . 330
16.8 Shapley values . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
16.8.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . 332
16.8.2 Implementation . . . . . . . . . . . . . . . . . . . . . . 334
16.8.3 XGBoost and built-in Shapley values . . . . . . . . . . 336
16.9 Localized step-wise procedure . . . . . . . . . . . . . . . . . 340
16.9.1 Concept . . . . . . . . . . . . . . . . . . . . . . . . . . 340
16.9.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . 341
16.10 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . 342
III Dimension Reduction 343
17 Principal Components Analysis 345
17.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
17.2 The idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
17.3 Finding principal components . . . . . . . . . . . . . . . . . 348
17.4 Performing PCA in R . . . . . . . . . . . . . . . . . . . . . . 350
17.5 Selecting the number of principal components . . . . . . . . 354
17.5.1 Eigenvalue criterion . . . . . . . . . . . . . . . . . . . 354
17.5.2 Proportion of variance explained criterion . . . . . . . 355
17.5.3 Scree plot criterion . . . . . . . . . . . . . . . . . . . . 356
17.6 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
18 Generalized Low Rank Models 359
18.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
18.2 The idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
18.3 Finding the lower ranks . . . . . . . . . . . . . . . . . . . . . 362
18.3.1 Alternating minimization . . . . . . . . . . . . . . . . 362
18.3.2 Loss functions . . . . . . . . . . . . . . . . . . . . . . . 362
18.3.3 Regularization . . . . . . . . . . . . . . . . . . . . . . 363
18.3.4 Selecting k . . . . . . . . . . . . . . . . . . . . . . . . 364
18.4 Fitting GLRMs in R . . . . . . . . . . . . . . . . . . . . . . . 365
18.4.1 Basic GLRM model . . . . . . . . . . . . . . . . . . . 365
18.4.2 Tuning to optimize for unseen data . . . . . . . . . . . . 371
18.5 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 375
19 Autoencoders 377
19.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
19.2 Undercomplete autoencoders . . . . . . . . . . . . . . . . . . 378
19.2.1 Comparing PCA to an autoencoder . . . . . . . . . . 378
19.2.2 Stacked autoencoders . . . . . . . . . . . . . . . . . . 380
19.2.3 Visualizing the reconstruction . . . . . . . . . . . . . . 383
19.3 Sparse autoencoders . . . . . . . . . . . . . . . . . . . . . . . 384
19.4 Denoising autoencoders . . . . . . . . . . . . . . . . . . . . . 390
19.5 Anomaly detection . . . . . . . . . . . . . . . . . . . . . . . . . 391
19.6 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 394
IV Clustering 397
20 K-means Clustering 399
20.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
20.2 Distance measures . . . . . . . . . . . . . . . . . . . . . . . . 400
20.3 Defining clusters . . . . . . . . . . . . . . . . . . . . . . . . . . 401
20.4 k-means algorithm . . . . . . . . . . . . . . . . . . . . . . . . 403
20.5 Clustering digits . . . . . . . . . . . . . . . . . . . . . . . . . 405
20.6 How many clusters? . . . . . . . . . . . . . . . . . . . . . . . 408
20.7 Clustering with mixed data . . . . . . . . . . . . . . . . . . . 410
20.8 Alternative partitioning methods . . . . . . . . . . . . . . . . 413
20.9 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 415
21 Hierarchical Clustering 417
21.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
21.2 Hierarchical clustering algorithms . . . . . . . . . . . . . . . 418
21.3 Hierarchical clustering in R . . . . . . . . . . . . . . . . . . . 420
21.3.1 Agglomerative hierarchical clustering . . . . . . . . . . 420
21.3.2 Divisive hierarchical clustering . . . . . . . . . . . . . 423
21.4 Determining optimal clusters . . . . . . . . . . . . . . . . . . 423
21.5 Working with dendrograms . . . . . . . . . . . . . . . . . . . 424
21.6 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . 428
22 Model-based Clustering 429
22.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
22.2 Measuring probability and uncertainty . . . . . . . . . . . . 430
22.3 Covariance types . . . . . . . . . . . . . . . . . . . . . . . . . 432
22.4 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . 434
22.5 My basket example . . . . . . . . . . . . . . . . . . . . . . . 436
22.6 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
Bibliography 443
Index 457
Preface
Welcome to Hands-On Machine Learning with R. This book provides hands-on
modules for many of the most common machine learning methods, including:
• Generalized low rank models
• Clustering algorithms
• Autoencoders
• Regularized models
• Random forests
• Gradient boosting machines
• Deep neural networks
• Stacking / super learners
• and more!
You will learn how to build and tune these various models with R packages
that have been tested and approved due to their ability to scale well. However,
our motivation in almost every case is to describe each technique in a way that
helps develop intuition for its strengths and weaknesses. For the most part, we
minimize mathematical complexity where possible but also provide resources
for digging deeper into the details if desired.
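To give a feel for the workflow the chapters build on, here is a minimal sketch of our own (not reproduced from the book) that splits the Ames housing data and fits a default random forest; it assumes the AmesHousing, rsample, and ranger packages are installed:

library(rsample)   # data splitting
library(ranger)    # random forest engine

# Ames, Iowa housing data, one of the book's example data sets
ames <- AmesHousing::make_ames()

# Stratified 70/30 train/test split on the response
set.seed(123)
split      <- initial_split(ames, prop = 0.7, strata = "Sale_Price")
ames_train <- training(split)
ames_test  <- testing(split)

# Fit a default random forest and estimate the test RMSE
fit  <- ranger(Sale_Price ~ ., data = ames_train)
pred <- predict(fit, data = ames_test)$predictions
sqrt(mean((pred - ames_test$Sale_Price)^2))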
Who should read this
We intend this work to be a practitioner’s guide to the machine learning
process and a place where one can come to learn about the approach and to
gain intuition about the many commonly used, modern, and powerful methods
accepted in the machine learning community. If you are familiar with the
analytic methodologies, this book may still serve as a reference for how to
work with the various R packages for implementation. While an abundance of
videos, blog posts, and tutorials exist online, we have long been frustrated by
their lack of consistency and completeness, and by their bias toward singular
packages for implementation. This is what inspired this book.
This book is not meant to be an introduction to R or to programming in general.