Practical Statistics for Data Scientists

The document is the second edition of 'Practical Statistics for Data Scientists' by Peter Bruce, Andrew Bruce, and Peter Gedeck, published by O'Reilly Media. It covers over 50 essential statistical concepts using R and Python, aimed at data scientists. The book includes topics such as exploratory data analysis, statistical experiments, regression, classification, and machine learning techniques.


SECOND EDITION

Practical Statistics for Data Scientists
50+ Essential Concepts Using R and Python

Peter Bruce, Andrew Bruce, and Peter Gedeck

Beijing Boston Farnham Sebastopol Tokyo

Practical Statistics for Data Scientists
by Peter Bruce, Andrew Bruce, and Peter Gedeck
Copyright © 2020 Peter Bruce, Andrew Bruce, and Peter Gedeck. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional
sales department: 800-998-9938 or [email protected].

Editor: Nicole Tache
Production Editor: Kristen Brown
Copyeditor: Piper Editorial
Proofreader: Arthur Johnson
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

May 2017: First Edition

May 2020: Second Edition

Revision History for the Second Edition

2020-04-10: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492072942 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Practical Statistics for Data Scientists,
the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher’s views.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.

978-1-492-07294-2
[LSI]
Peter Bruce and Andrew Bruce would like to dedicate this book to the memories of our
parents, Victor G. Bruce and Nancy C. Bruce, who cultivated a passion for math and
science; and to our early mentors John W. Tukey and Julian Simon and our lifelong
friend Geoff Watson, who helped inspire us to pursue a career in statistics.
Peter Gedeck would like to dedicate this book to Tim Clark and Christian Kramer, with
deep thanks for their scientific collaboration and friendship.
Table of Contents

Preface xiii

1. Exploratory Data Analysis 1

Elements of Structured Data 2
Further Reading 4
Rectangular Data 4
Data Frames and Indexes 6
Nonrectangular Data Structures 6
Further Reading 7
Estimates of Location 7
Mean 9
Median and Robust Estimates 10
Example: Location Estimates of Population and Murder Rates 12
Further Reading 13
Estimates of Variability 13
Standard Deviation and Related Estimates 14
Estimates Based on Percentiles 16
Example: Variability Estimates of State Population 18
Further Reading 19
Exploring the Data Distribution 19
Percentiles and Boxplots 20
Frequency Tables and Histograms 22
Density Plots and Estimates 24
Further Reading 26
Exploring Binary and Categorical Data 27
Mode 29
Expected Value 29
Probability 30

Further Reading 30
Correlation 30
Scatterplots 34
Further Reading 36
Exploring Two or More Variables 36
Hexagonal Binning and Contours (Plotting Numeric Versus Numeric Data) 36
Two Categorical Variables 39
Categorical and Numeric Data 41
Visualizing Multiple Variables 43
Further Reading 46
Summary 46

2. Data and Sampling Distributions 47

Random Sampling and Sample Bias 48
Bias 50
Random Selection 51
Size Versus Quality: When Does Size Matter? 52
Sample Mean Versus Population Mean 53
Further Reading 53
Selection Bias 54
Regression to the Mean 55
Further Reading 57
Sampling Distribution of a Statistic 57
Central Limit Theorem 60
Standard Error 60
Further Reading 61
The Bootstrap 61
Resampling Versus Bootstrapping 65
Further Reading 65
Confidence Intervals 65
Further Reading 68
Normal Distribution 69
Standard Normal and QQ-Plots 71
Long-Tailed Distributions 73
Further Reading 75
Student’s t-Distribution 75
Further Reading 78
Binomial Distribution 78
Further Reading 80
Chi-Square Distribution 80
Further Reading 81
F-Distribution 82

Further Reading 82
Poisson and Related Distributions 82
Poisson Distributions 83
Exponential Distribution 84
Estimating the Failure Rate 84
Weibull Distribution 85
Further Reading 86
Summary 86

3. Statistical Experiments and Significance Testing 87

A/B Testing 88
Why Have a Control Group? 90
Why Just A/B? Why Not C, D,…? 91
Further Reading 92
Hypothesis Tests 93
The Null Hypothesis 94
Alternative Hypothesis 95
One-Way Versus Two-Way Hypothesis Tests 95
Further Reading 96
Resampling 96
Permutation Test 97
Example: Web Stickiness 98
Exhaustive and Bootstrap Permutation Tests 102
Permutation Tests: The Bottom Line for Data Science 102
Further Reading 103
Statistical Significance and p-Values 103
p-Value 106
Alpha 107
Type 1 and Type 2 Errors 109
Data Science and p-Values 109
Further Reading 110
t-Tests 110
Further Reading 112
Multiple Testing 112
Further Reading 116
Degrees of Freedom 116
Further Reading 118
ANOVA 118
F-Statistic 121
Two-Way ANOVA 123
Further Reading 124
Chi-Square Test 124

Chi-Square Test: A Resampling Approach 124
Chi-Square Test: Statistical Theory 127
Fisher’s Exact Test 128
Relevance for Data Science 130
Further Reading 131
Multi-Arm Bandit Algorithm 131
Further Reading 134
Power and Sample Size 135
Sample Size 136
Further Reading 138
Summary 139

4. Regression and Prediction 141

Simple Linear Regression 141
The Regression Equation 143
Fitted Values and Residuals 146
Least Squares 148
Prediction Versus Explanation (Profiling) 149
Further Reading 150
Multiple Linear Regression 150
Example: King County Housing Data 151
Assessing the Model 153
Cross-Validation 155
Model Selection and Stepwise Regression 156
Weighted Regression 159
Further Reading 161
Prediction Using Regression 161
The Dangers of Extrapolation 161
Confidence and Prediction Intervals 161
Factor Variables in Regression 163
Dummy Variables Representation 164
Factor Variables with Many Levels 167
Ordered Factor Variables 169
Interpreting the Regression Equation 169
Correlated Predictors 170
Multicollinearity 172
Confounding Variables 172
Interactions and Main Effects 174
Regression Diagnostics 176
Outliers 177
Influential Values 179
Heteroskedasticity, Non-Normality, and Correlated Errors 182

Partial Residual Plots and Nonlinearity 185
Polynomial and Spline Regression 187
Polynomial 188
Splines 189
Generalized Additive Models 192
Further Reading 193
Summary 194

5. Classification 195

Naive Bayes 196
Why Exact Bayesian Classification Is Impractical 197
The Naive Solution 198
Numeric Predictor Variables 200
Further Reading 201
Discriminant Analysis 201
Covariance Matrix 202
Fisher’s Linear Discriminant 203
A Simple Example 204
Further Reading 207
Logistic Regression 208
Logistic Response Function and Logit 208
Logistic Regression and the GLM 210
Generalized Linear Models 212
Predicted Values from Logistic Regression 212
Interpreting the Coefficients and Odds Ratios 213
Linear and Logistic Regression: Similarities and Differences 214
Assessing the Model 216
Further Reading 219
Evaluating Classification Models 219
Confusion Matrix 221
The Rare Class Problem 223
Precision, Recall, and Specificity 223
ROC Curve 224
AUC 226
Lift 228
Further Reading 229
Strategies for Imbalanced Data 230
Undersampling 231
Oversampling and Up/Down Weighting 232
Data Generation 233
Cost-Based Classification 234
Exploring the Predictions 234

Further Reading 236
Summary 236

6. Statistical Machine Learning 237

K-Nearest Neighbors 238
A Small Example: Predicting Loan Default 239
Distance Metrics 241
One Hot Encoder 242
Standardization (Normalization, z-Scores) 243
Choosing K 246
KNN as a Feature Engine 247
Tree Models 249
A Simple Example 250
The Recursive Partitioning Algorithm 252
Measuring Homogeneity or Impurity 254
Stopping the Tree from Growing 256
Predicting a Continuous Value 257
How Trees Are Used 258
Further Reading 259
Bagging and the Random Forest 259
Bagging 260
Random Forest 261
Variable Importance 265
Hyperparameters 269
Boosting 270
The Boosting Algorithm 271
XGBoost 272
Regularization: Avoiding Overfitting 274
Hyperparameters and Cross-Validation 279
Summary 282

7. Unsupervised Learning 283

Principal Components Analysis 284
A Simple Example 285
Computing the Principal Components 288
Interpreting Principal Components 289
Correspondence Analysis 292
Further Reading 294
K-Means Clustering 294
A Simple Example 295
K-Means Algorithm 298
Interpreting the Clusters 299

Selecting the Number of Clusters 302
Hierarchical Clustering 304
A Simple Example 305
The Dendrogram 306
The Agglomerative Algorithm 308
Measures of Dissimilarity 309
Model-Based Clustering 311
Multivariate Normal Distribution 311
Mixtures of Normals 312
Selecting the Number of Clusters 315
Further Reading 318
Scaling and Categorical Variables 318
Scaling the Variables 319
Dominant Variables 321
Categorical Data and Gower’s Distance 322
Problems with Clustering Mixed Data 325
Summary 326

Bibliography 327

Index 329
