DEEP LEARNING: From Basics to Practice, Volume 1
Andrew Glassner
Deep Learning: From Basics to Practice, Volume 1. Copyright (c) 2018 by Andrew Glassner. www.glassner.com / @AndrewGlassner
All rights reserved. No part of this book, except as noted below, may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the author, except in the case of brief quotations embedded in critical articles or reviews. The above reservation of rights does not apply to the program files associated with this book (available on GitHub), or to the images and figures (also available on GitHub), which are released under the MIT license. Any images or figures that are not original to the author retain their original copyrights and protections, as noted in the book and on the web pages where the images are provided. All software in this book, or in its associated repositories, is provided "as is," without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort, or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.
First published February 20, 2018. Version 1.0.1, March 3, 2018. Version 1.1, March 22, 2018.
Published by The Imaginary Institute, Seattle, WA. http://www.imaginary-institute.com
Contact: [email protected]
For Niko, who's always there with a smile and a wag.
Contents of Both Volumes Volume 1 Preface i Chapter 1: An
Introduction 1 1.1 Why This Chapter Is Here 3 1.1.1 Extracting
Meaning from Data 4 1.1.2 Expert Systems 6 1.2 Learning from
Labeled Data 9 1.2.1 A Learning Strategy 10 1.2.2 A Computerized
Learning Strategy 12 1.2.3 Generalization 16 1.2.4 A Closer Look at
Learning 18 1.3 Supervised Learning 21 1.3.1 Classification 21 1.3.2
Regression 22 1.4 Unsupervised Learning 25 1.4.1 Clustering 25
1.4.2 Noise Reduction 26 1.4.3 Dimensionality Reduction 28 1.5
Generators 32 1.6 Reinforcement Learning 34 1.7 Deep Learning 37
1.8 What's Coming Next 43 References 44 Image credits 45
Chapter 2: Randomness and Basic Statistics 46 2.1 Why This
Chapter Is Here 48 2.2 Random Variables 49 2.2.1 Random Numbers
in Practice 57 2.3 Some Common Distributions 59 2.3.1 The Uniform
Distribution 60 2.3.2 The Normal Distribution 61 2.3.3 The Bernoulli
Distribution 67 2.3.4 The Multinoulli Distribution 69 2.3.5 Expected
Value 70 2.4 Dependence 70 2.4.1 i.i.d. Variables 71 2.5 Sampling
and Replacement 71 2.5.1 Selection With Replacement 73 2.5.2
Selection Without Replacement 74 2.5.3 Making Selections 75 2.6
Bootstrapping 76 2.7 High-Dimensional Spaces 82 2.8 Covariance
and Correlation 85 2.8.1 Covariance 86 2.8.2 Correlation 88 2.9
Anscombe's Quartet 93 References 95
Chapter 3: Probability 97 3.1 Why This Chapter Is Here 99 3.2 Dart
Throwing 100 3.3 Simple Probability 103 3.4 Conditional Probability
104 3.5 Joint Probability 109 3.6 Marginal Probability 114 3.7
Measuring Correctness 115 3.7.1 Classifying Samples 116 3.7.2 The
Confusion Matrix 119 3.7.3 Interpreting the Confusion Matrix 121
3.7.4 When Misclassification Is Okay 126 3.7.5 Accuracy 129 3.7.6
Precision 130 3.7.7 Recall 132 3.7.8 About Precision and Recall 134
3.7.9 Other Measures 137 3.7.10 Using Precision and Recall Together
141 3.7.11 F1 Score 143 3.8 Applying the Confusion Matrix 144
References 151
Chapter 4: Bayes Rule 153 4.1 Why This Chapter Is Here 155 4.2
Frequentist and Bayesian Probability 156 4.2.1 The Frequentist
Approach 156 4.2.2 The Bayesian Approach 157 4.2.3 Discussion
158 4.3 Coin Flipping 159 4.4 Is This a Fair Coin? 161 4.4.1
Bayes' Rule 173 4.4.2 Notes on Bayes' Rule 175 4.5 Finding Life Out
There 178 4.6 Repeating Bayes' Rule 183 4.6.1 The Posterior-Prior
Loop 184 4.6.2 Example: Which Coin Do We Have? 186 4.7 Multiple
Hypotheses 194 References 203 Chapter 5: Curves and Surfaces 205
5.1 Why This Chapter Is Here 207 5.2 Introduction 207 5.3 The
Derivative 210 5.4 The Gradient 222 References 229
Chapter 6: Information Theory 231 6.1 Why This Chapter Is Here
233 6.1.1 Information: One Word, Two Meanings 233 6.2 Surprise
and Context 234 6.2.1 Surprise 234 6.2.2 Context 236 6.3 The Bit as
Unit 237 6.4 Measuring Information 238 6.5 The Size of an Event
240 6.6 Adaptive Codes 241 6.7 Entropy 250 6.8 Cross-Entropy 253
6.8.1 Two Adaptive Codes 253 6.8.2 Mixing Up the Codes 257 6.9 KL
Divergence 260 References 262 Chapter 7: Classification 265 7.1
Why This Chapter Is Here 267 7.2 2D Classification 268 7.2.1 2D
Binary Classification 269 7.3 2D Multi-class classification 275 7.4
Multiclass Binary Categorizing 277 7.4.1 One-Versus-Rest 278 7.4.2
One-Versus-One 280 7.5 Clustering 286
7.6 The Curse of Dimensionality 290 7.6.1 High Dimensional
Weirdness 299 References 307 Chapter 8: Training and Testing 309
8.1 Why This Chapter Is Here 311 8.2 Training 312 8.2.1 Testing the
Performance 314 8.3 Test Data 318 8.4 Validation Data 323 8.5
Cross-Validation 328 8.5.1 k-Fold Cross-Validation 331 8.6 Using the
Results of Testing 334 References 335 Image Credits 336 Chapter 9:
Overfitting and Underfitting 337 9.1 Why This Chapter Is Here 339
9.2 Overfitting and Underfitting 340 9.2.1 Overfitting 340 9.2.2
Underfitting 342 9.3 Overfitting Data 342 9.4 Early Stopping 348 9.5
Regularization 350
9.6 Bias and Variance 352 9.6.1 Matching the Underlying Data 353
9.6.2 High Bias, Low Variance 357 9.6.3 Low Bias, High Variance 359
9.6.4 Comparing Curves 360 9.7 Fitting a Line with Bayes' Rule 363
References 372 Chapter 10: Neurons 374 10.1 Why This Chapter Is
Here 376 10.2 Real Neurons 376 10.3 Artificial Neurons 379 10.3.1
The Perceptron 379 10.3.2 Perceptron History 381 10.3.3 Modern
Artificial Neurons 382 10.4 Summing Up 390 References 390 Chapter
11: Learning and Reasoning 393 11.1 Why This Chapter Is Here 395
11.2 The Steps of Learning 396 11.2.1 Representation 396 11.2.2
Evaluation 400 11.2.3 Optimization 400 11.3 Deduction and
Induction 402 11.4 Deduction 403 11.4.1 Categorical Syllogistic
Fallacies 410
11.5 Induction 415 11.5.1 Inductive Terms in Machine Learning 419
11.5.2 Inductive Fallacies 420 11.6 Combined Reasoning 422 11.6.1
Sherlock Holmes, "Master of Deduction" 424 11.7 Operant
Conditioning 425 References 428 Chapter 12: Data Preparation 431
12.1 Why This Chapter Is Here 433 12.2 Transforming Data 433 12.3
Types of Data 436 12.3.1 One-Hot Encoding 438 12.4 Basic Data
Cleaning 440 12.4.1 Data Cleaning 441 12.4.2 Data Cleaning in
Practice 442 12.5 Normalizing and Standardizing 443 12.5.1
Normalization 444 12.5.2 Standardization 446 12.5.3 Remembering
the Transformation 447 12.5.4 Types of Transformations 448 12.6
Feature Selection 450 12.7 Dimensionality Reduction 451 12.7.1
Principal Component Analysis (PCA) 452 12.7.2 Standardization and
PCA for Images 459 12.8 Transformations 468
12.9 Slice Processing 475 12.9.1 Samplewise Processing 476 12.9.2
Featurewise Processing 477 12.9.3 Elementwise Processing 479
12.10 Cross-Validation Transforms 480 References 486 Image Credits
486 Chapter 13: Classifiers 488 13.1 Why This Chapter Is Here 490
13.2 Types of Classifiers 491 13.3 k-Nearest Neighbors (KNN) 493
13.4 Support Vector Machines (SVMs) 502 13.5 Decision Trees 512
13.5.1 Building Trees 519 13.5.2 Splitting Nodes 525 13.5.3
Controlling Overfitting 528 13.6 Naive Bayes 529 13.7 Discussion
536 References 538 Chapter 14: Ensembles 539 14.1 Why This
Chapter Is Here 541 14.2 Ensembles 542 14.3 Voting 543 14.4
Bagging 544
14.5 Random Forests 547 14.6 ExtraTrees 549 14.7 Boosting 549
References 561 Chapter 15: Scikit-learn 563 15.1 Why This Chapter
Is Here 566 15.2 Introduction 567 15.3 Python Conventions 569
15.4 Estimators 574 15.4.1 Creation 575 15.4.2 Learning with fit()
576 15.4.3 Predicting with predict() 578 15.4.4 decision_function(),
predict_proba() 581 15.5 Clustering 582 15.6 Transformations 587
15.6.1 Inverse Transformations 594 15.7 Data Refinement 598 15.8
Ensembles 601 15.9 Automation 605 15.9.1 Cross-validation 606
15.9.2 Hyperparameter Searching 610 15.9.3 Exhaustive Grid Search
614 15.9.4 Random Grid Search 625 15.9.5 Pipelines 626 15.9.6 The
Decision Boundary 641 15.9.7 Pipelined Transformations 643
15.10 Datasets 647 15.11 Utilities 650 15.12 Wrapping Up 652
References 653 Chapter 16: Feed-Forward Networks 655 16.1 Why
This Chapter Is Here 657 16.2 Neural Network Graphs 658 16.3
Synchronous and Asynchronous Flow 661 16.3.1 The Graph in
Practice 664 16.4 Weight Initialization 664 16.4.1 Initialization 667
References 670 Chapter 17: Activation Functions 672 17.1 Why This
Chapter Is Here 674 17.2 What Activation Functions Do 674 17.2.1
The Form of Activation Functions 679 17.3 Basic Activation Functions
679 17.3.1 Linear Functions 680 17.3.2 The Stair-Step Function 681
17.4 Step Functions 682 17.5 Piecewise Linear Functions 685 17.6
Smooth Functions 690 17.7 Activation Function Gallery 698
17.8 Softmax 699 References 702 Chapter 18: Backpropagation 703
18.1 Why This Chapter Is Here 706 18.1.1 A Word On Subtlety 708
18.2 A Very Slow Way to Learn 709 18.2.1 A Slow Way to Learn 712
18.2.2 A Faster Way to Learn 716 18.3 No Activation Functions for
Now 718 18.4 Neuron Outputs and Network Error 719 18.4.1 Errors
Change Proportionally 720 18.5 A Tiny Neural Network 726 18.6
Step 1: Deltas for the Output Neurons 732 18.7 Step 2: Using Deltas
to Change Weights 745 18.8 Step 3: Other Neuron Deltas 750 18.9
Backprop in Action 758 18.10 Using Activation Functions 765 18.11
The Learning Rate 774 18.11.1 Exploring the Learning Rate 777
18.12 Discussion 787 18.12.1 Backprop In One Place 787 18.12.2
What Backprop Doesn't Do 789 18.12.3 What Backprop Does Do 789
18.12.4 Keeping Neurons Happy 790 18.12.5 Mini-Batches 795
18.12.6 Parallel Updates 796 18.12.7 Why Backprop Is Attractive 797
18.12.8 Backprop Is Not Guaranteed 797 18.12.9 A Little History 798
18.12.10 Digging into the Math 800 References 802 Chapter 19:
Optimizers 805 19.1 Why This Chapter Is Here 807 19.2 Error as
Geometry 807 19.2.1 Minima, Maxima, Plateaus, and Saddles 808
19.2.2 Error as A 2D Curve 814 19.3 Adjusting the Learning Rate
817 19.3.1 Constant-Sized Updates 819 19.3.2 Changing the
Learning Rate Over Time 829 19.3.3 Decay Schedules 832 19.4
Updating Strategies 836 19.4.1 Batch Gradient Descent 837 19.4.2
Stochastic Gradient Descent (SGD) 841 19.4.3 Mini-Batch Gradient
Descent 844 19.5 Gradient Descent Variations 846 19.5.1
Momentum 847 19.5.2 Nesterov Momentum 856 19.5.3 Adagrad 862
19.5.4 Adadelta and RMSprop 864 19.5.5 Adam 866
19.6 Choosing An Optimizer 868 References 870 Volume 2 Chapter
20: Deep Learning 872 20.1 Why This Chapter Is Here 874 20.2
Deep Learning Overview 874 20.2.1 Tensors 878 20.3 Input and
Output Layers 879 20.3.1 Input Layer 879 20.3.2 Output Layer 880
20.4 Deep Learning Layer Survey 881 20.4.1 Fully-Connected Layer
882 20.4.2 Activation Functions 883 20.4.3 Dropout 884 20.4.4
Batch Normalization 887 20.4.5 Convolution 890 20.4.6 Pooling
Layers 892 20.4.7 Recurrent Layers 894 20.4.8 Other Utility Layers
896 20.5 Layer and Symbol Summary 898 20.6 Some Examples 899
20.7 Building A Deep Learner 910 20.7.1 Getting Started 912 20.8
Interpreting Results 913 20.8.1 Satisfactory Explainability 920
References 923 Image credits: 925 Chapter 21: Convolutional Neural
Networks 927 21.1 Why This Chapter Is Here 930 21.2 Introduction
931 21.2.1 The Two Meanings of "Depth" 932 21.2.2 Sum of Scaled
Values 933 21.2.3 Weight Sharing 938 21.2.4 Local Receptive Field
940 21.2.5 The Kernel 943 21.3 Convolution 944 21.3.1 Filters 948
21.3.2 A Fly's-Eye View 953 21.3.3 Hierarchies of Filters 955 21.3.4
Padding 963 21.3.5 Stride 966 21.4 High-Dimensional Convolution
971 21.4.1 Filters with Multiple Channels 975 21.4.2 Striding for
Hierarchies 977 21.5 1D Convolution 979 21.6 1x1 Convolutions 980
21.7 A Convolution Layer 983 21.7.1 Initializing the Filter Weights
984 21.8 Transposed Convolution 985 21.9 An Example Convnet 991
21.9.1 VGG16 996 21.9.2 Looking at the Filters, Part 1 1001 21.9.3
Looking at the Filters, Part 2 1008
21.10 Adversaries 1012 References 1017 Image credits 1022
Chapter 22: Recurrent Neural Networks 1023 22.1 Why This Chapter
Is Here 1025 22.2 Introduction 1027 22.3 State 1030 22.3.1 Using
State 1032 22.4 Structure of an RNN Cell 1037 22.4.1 A Cell with
More State 1042 22.4.2 Interpreting the State Values 1045 22.5
Organizing Inputs 1046 22.6 Training an RNN 1051 22.7 LSTM and
GRU 1054 22.7.1 Gates 1055 22.7.2 LSTM 1060 22.8 RNN Structures
1066 22.8.1 Single or Many Inputs and Outputs 1066 22.8.2 Deep
RNN 1070 22.8.3 Bidirectional RNN 1072 22.8.4 Deep Bidirectional
RNN 1074 22.9 An Example 1076 References 1084
Chapter 23: Keras Part 1 1090 23.1 Why This Chapter Is Here 1093
23.1.1 The Structure of This Chapter 1094 23.1.2 Notebooks 1094
23.1.3 Python Warnings 1094 23.2 Libraries and Debugging 1095
23.2.1 Versions and Programming Style 1097 23.2.2 Python
Programming and Debugging 1098 23.3 Overview 1100 23.3.1
What's a Model? 1101 23.3.2 Tensors and Arrays 1102 23.3.3 Setting
Up Keras 1102 23.3.4 Shapes of Tensors Holding Images 1104
23.3.5 GPUs and Other Accelerators 1108 23.4 Getting Started 1109
23.4.1 Hello, World 1110 23.5 Preparing the Data 1114 23.5.1
Reshaping 1115 23.5.2 Loading the Data 1126 23.5.3 Looking at the
Data 1129 23.5.4 Train-test Splitting 1136 23.5.5 Fixing the Data
Type 1138 23.5.6 Normalizing the Data 1139 23.5.7 Fixing the
Labels 1142 23.5.8 Pre-Processing All in One Place 1148 23.6 Making
the Model 1150 23.6.1 Turning Grids into Lists 1152 23.6.2 Creating
the Model 1154 23.6.3 Compiling the Model 1163 23.6.4 Model
Creation Summary 1167
23.7 Training The Model 1169 23.8 Training and Using A Model 1172
23.8.1 Looking at the Output 1174 23.8.2 Prediction 1180 23.8.3
Analysis of Training History 1186 23.9 Saving and Loading 1190
23.9.1 Saving Everything in One File 1190 23.9.2 Saving Just the
Weights 1191 23.9.3 Saving Just the Architecture 1192 23.9.4 Using
Pre-Trained Models 1193 23.9.5 Saving the Pre-Processing Steps
1194 23.10 Callbacks 1195 23.10.1 Checkpoints 1196 23.10.2
Learning Rate 1200 23.10.3 Early Stopping 1201 References 1205
Image Credits 1208 Chapter 24: Keras Part 2 1209 24.1 Why This
Chapter Is Here 1212 24.2 Improving the Model 1212 24.2.1
Counting Up Hyperparameters 1213 24.2.2 Changing One
Hyperparameter 1214 24.2.3 Other Ways to Improve 1218 24.2.4
Adding Another Dense Layer 1219 24.2.5 Less Is More 1221 24.2.6
Adding Dropout 1224 24.2.7 Observations 1230
24.3 Using Scikit-Learn 1231 24.3.1 Keras Wrappers 1232 24.3.2
Cross-Validation 1237 24.3.3 Cross-Validation with Normalization
1243 24.3.4 Hyperparameter Searching 1247 24.4 Convolution
Networks 1259 24.4.1 Utility Layers 1260 24.4.2 Preparing the Data
for A CNN 1263 24.4.3 Convolution Layers 1268 24.4.4 Using
Convolution for MNIST 1276 24.4.5 Patterns 1290 24.4.6 Image
Data Augmentation 1293 24.4.7 Synthetic Data 1298 24.4.8
Parameter Searching for Convnets 1300 24.5 RNNs 1301 24.5.1
Generating Sequence Data 1302 24.5.2 RNN Data Preparation 1306
24.5.3 Building and Training an RNN 1314 24.5.4 Analyzing RNN
Performance 1320 24.5.5 A More Complex Dataset 1330 24.5.6 Deep
RNNs 1334 24.5.7 The Value of More Data 1338 24.5.8 Returning
Sequences 1343 24.5.9 Stateful RNNs 1349 24.5.10 Time-Distributed
Layers 1352 24.5.11 Generating Text 1357 24.6 The Functional API
1366 24.6.1 Input Layers 1370 24.6.2 Making A Functional Model
1371 References 1378 Image Credits 1379
Chapter 25: Autoencoders 1380 25.1 Why This Chapter Is Here 1382
25.2 Introduction 1383 25.2.1 Lossless and Lossy Encoding 1384
25.2.2 Domain Encoding 1386 25.2.3 Blending Representations 1388
25.3 The Simplest Autoencoder 1393 25.4 A Better Autoencoder
1400 25.5 Exploring the Autoencoder 1405 25.5.1 A Closer Look at
the Latent Variables 1405 25.5.2 The Parameter Space 1409 25.5.3
Blending Latent Variables 1415 25.5.4 Predicting from Novel Input
1418 25.6 Discussion 1419 25.7 Convolutional Autoencoders 1420
25.7.1 Blending Latent Variables 1424 25.7.2 Predicting from Novel
Input 1426 25.8 Denoising 1427 25.9 Variational Autoencoders 1430
25.9.1 Distribution of Latent Variables 1432 25.9.2 Variational
Autoencoder Structure 1433 25.10 Exploring the VAE 1442
References 1455 Image credits 1457
Chapter 26: Reinforcement Learning 1458 26.1 Why This Chapter Is
Here 1461 26.2 Goals 1462 26.2.1 Learning A New Game 1463 26.3
The Structure of RL 1469 26.3.1 Step 1: The Agent Selects an Action
1471 26.3.2 Step 2: The Environment Responds 1473 26.3.3 Step 3:
The Agent Updates Itself 1475 26.3.4 Variations on The Simple
Version 1476 26.3.5 Back to the Big Picture 1478 26.3.6 Saving
Experience 1480 26.3.7 Rewards 1481 26.4 Flippers 1490 26.5 L-
learning 1492 26.5.1 Handling Unpredictability 1505 26.6 Q-learning
1509 26.6.1 Q-values and Updates 1510 26.6.2 Q-Learning Policy
1514 26.6.3 Putting It All Together 1518 26.6.4 The Elephant in the
Room 1519 26.6.5 Q-learning in Action 1521 26.7 SARSA 1532
26.7.1 SARSA in Action 1535 26.7.2 Comparing Q-learning and
SARSA 1543 26.8 The Big Picture 1548 26.9 Experience Replay 1550
26.10 Two Applications 1551 References 1554 Chapter 27:
Generative Adversarial Networks... 1558 27.1 Why This Chapter Is
Here 1560 27.2 A Metaphor: Forging Money 1562 27.2.1 Learning
from Experience 1566 27.2.2 Forging with Neural Networks 1569
27.2.3 A Learning Round 1572 27.3 Why Antagonistic? 1574 27.4
Implementing GANs 1575 27.4.1 The Discriminator 1576 27.4.2 The
Generator 1577 27.4.3 Training the GAN 1578 27.4.4 Playing the
Game 1581 27.5 GANs in Action 1582 27.6 DCGANs 1591 27.6.1
Rules of Thumb 1595 27.7 Challenges 1596 27.7.1 Using Big Samples
1597 27.7.2 Modal Collapse 1598 References 1600
Chapter 28: Creative Applications 1603 28.1 Why This Chapter Is
Here 1605 28.2 Visualizing Filters 1605 28.2.1 Picking A Network
1605 28.2.2 Visualizing One Filter 1607 28.2.3 Visualizing One Layer
1610 28.3 Deep Dreaming 1613 28.4 Neural Style Transfer 1620
28.4.1 Capturing Style in a Matrix 1621 28.4.2 The Big Picture 1623
28.4.3 Content Loss 1624 28.4.4 Style Loss 1628 28.4.5 Performing
Style Transfer 1633 28.4.6 Discussion 1640 28.5 Generating More of
This Book 1642 References 1644 Image Credits 1646 Chapter 29:
Datasets 1648 29.1 Public Datasets 1650 29.2 MNIST and Fashion-
MNIST 1651 29.3 Built-in Library Datasets 1652 29.3.1 scikit-learn
1652 29.3.2 Keras 1653 29.4 Curated Dataset Collections 1654 29.5
Some Newer Datasets 1655
Chapter 30: Glossary 1658 About The Glossary 1660 0-9 1660 Greek
Letters 1661 A 1662 B 1665 C 1670 D 1678 E 1685 F 1688 G 1694 H
1697 I 1699 J 1703 K 1703 L 1704 M 1708 N 1713 O 1717 P 1719 Q
1725 R 1725 S 1729 T 1738 U 1741 V 1743 W 1745 X 1746 Z 1746
Preface
Welcome! A few quick words of introduction to the book,
how to get the notebooks and figures, and thanks to the people who
helped me.
What You'll Get from This Book
Hello! If you're interested in
deep learning (DL) and machine learning (ML), then there's good
stuff for you in this book. My goal in this book is to give you the
broad skills to be an effective practitioner of machine learning and
deep learning. When you've read this book, you will be able to: •
Design and train your own deep networks. • Use your networks to
understand your data, or make new data. • Assign descriptive
categories to text, images, and other types of data. • Predict the
next value for a sequence of data. • Investigate the structure of your
data. • Process your data for maximum efficiency. • Use any
programming language and DL library you like. • Understand new
papers and ideas, and put them into practice. • Enjoy talking about
deep learning with other people. We'll take a serious but friendly
approach, supported by tons of illustrations. And we'll do it all
without any code, and without any math beyond multiplication. If
that sounds good to you, welcome aboard!
Who This Book Is For
This book is designed for people who
want to use machine learning and deep learning in their own work.
This includes programmers, artists, engineers, scientists, executives,
musicians, doctors, and anyone else who wants to work with large
amounts of information to extract meaning from it, or generate new
data. Many of the tools of machine learning, and deep learning in
particular, are embodied in multiple free, open-source libraries that
anyone can immediately download and use. Even though these tools
are free and easy to install, they still require significant technical
knowledge to use them properly. It's easy to ask the computer to do
something nonsensical, and it will happily do it, giving us back more
nonsense as output. This kind of thing happens all the time. Though
machine learning and deep learning libraries are powerful, they're
not yet user friendly. Choosing the right algorithms, and then
applying them properly, still requires a stream of technically
informed decisions. When things don't go as planned, as they often don't, we need
to use our knowledge of what's going on inside the system in order
to fix it. There are multiple approaches to learning and mastering
this essential information, depending on how you like to learn. Some
people like hardcore, detailed algorithm analysis, supported by
extensive mathematics. If that's how you like to learn, there are
great books out there that offer this style of presentation [Bishop06]
[Goodfellow17]. This approach requires intensive effort, but pays off
with a thorough understanding of how and why the machinery
works. If you start this way, then you have to put in another chunk
of work to translate that theoretical knowledge into contemporary
practice.
At the other extreme, some people just want to know how
to do some particular task. There are great books that take this
cookbook approach for various machine-learning libraries [Chollet17]
[Müller-Guido16] [Raschka15] [VanderPlas16]. This approach is easier
than the mathematically intensive route, but you can feel like you're
missing the structural information that explains why things work as
they do. Without that information, and its vocabulary, it can be hard
to work out why something that you think ought to work doesn't
work, or why something doesn't work as well as you thought it
should. It can also be challenging to read the literature describing
new ideas and results, because those discussions usually assume a
shared body of underlying knowledge that an approach based on a
single library or language doesn't provide. This book takes a middle
road. My purpose is practical: to give you the tools to practice deep
learning with confidence. I want you to make wise choices as you do
your work, and be able to follow the flood of exciting new ideas
appearing almost every day. My goal here is to cover the
fundamentals just deeply enough to give you a broad base of
support. I want you to have enough background not just for the
topics in this book, but also for the materials you're likely to need to
consult and read as you actually do deep learning work. This is not a
book about programming. Programming is important, but it
inevitably involves all kinds of details that are irrelevant to our larger
subject. And programming examples lock us into one library, or one
language. While such details are necessary to building final systems,
they can be distracting when we're trying to focus on the big ideas.
Rather than get waylaid by discussions of loops and indices and data
structures, we discuss everything here in a language and library
independent way. Once you have the ideas firmly in place, reading
the documentation for any library will be a straightforward affair. We
do put our feet on the ground in Chapters 15, 23, and 24, when we
discuss the scikit-learn library for machine learning, and the Keras
library for deep learning. These libraries are both Python based. In
those chapters we dive into the details of those Python
libraries, and include plenty of example code. Even if you're not into
Python, these programs will give you a sense for typical workflows
and program structures, which can help show how to attack a new
problem. The code in those programming chapters is available in
Python notebooks. These are for use with the browser-based Jupyter
programming environment [Jupyter16]. Alternatively, you can use a
more classical Python development environment, such as PyCharm
[JetBrains17]. Most of the other chapters also have supporting,
optional Python notebooks. These give the code for every computer-
generated figure in the book, often using the techniques discussed
in that chapter. Because we're not really focusing on Python and
programming (except for the chapters mentioned above), these
notebooks are meant as a "behind the scenes" look, and are only
lightly commented. Machine learning, deep learning, and big data
are having an unexpectedly rapid and profound influence on
societies around the world. What this means for people and cultures
is a complicated and important subject. Some interesting books and
articles tackle the topic head-on, often coming to subtle mixtures of
positive and negative conclusions [Agüera y Arcas 17] [Barrat15]
[Domingos15] [Kaplan16].
Almost No Math
Lots of smart people are not fans of
complicated equations. If that's you, then you're right at home!
There's just about no math in this book. If you're comfortable with
multiplication, you're set, because that's as mathematical as we get.
Many of the algorithms we'll discuss are based on rich sources of
theory and are the result of careful analysis and development. It's
important to know that stuff if you're modifying the algorithm for
some new purpose, or writing your own implementation. But in
practice, just about everyone uses highly optimized implementations
written by experts, available in free and open-source libraries. Our
goals are to understand the principles of these techniques, how to
apply them properly, and how to interpret the results. None of that
requires us to get into the mathematical structure that's under the
hood. If you love math, or you want to see the theory, follow the
references in each chapter. Much of this material is elegant and
intellectually stimulating, and provides details that I have
deliberately omitted from this book. But if math isn't your thing,
there's no need to get into it.
Lots of Figures
Some ideas are more
clearly communicated with pictures than with words. And even when
words do the job, a picture can help cement the ideas. So this book
is profusely illustrated with original figures. All of the figures in this
book are available for free download (see below).
Downloads
You can download the Jupyter/Python notebooks
for this book, all the figures, and other files related to this book, all
for free.
All the Notebooks
All of the Jupyter/Python notebooks for
this book are available on GitHub. The notebooks for Chapter 15
(scikit-learn) and Chapters 23 and 24 (Keras) contain all the code
that's presented in those chapters. The other notebooks are
available as a kind of "behind the scenes" look at how the book's
figures were made. They're lightly documented, and meant to serve
more as references than tutorials. The notebooks are released under
the MIT license, which basically means that you're free to use them
for any purpose. There are no promises of any kind that the code is
free of bugs, that it will run properly, that it won't crash, and so on.
Feel free to grab the code and adapt it as you see fit, though as the
license says, keep the copyright notice around (it's in the file named
simply LICENSE).
https://github.com/blueberrymusic/DeepLearningBookCode-Volume1
https://github.com/blueberrymusic/DeepLearningBookCode-Volume2
All the Figures
All of the figures in this book are available
on GitHub as high-resolution PNG files. You're free to use them in
classes, talks, lectures, reports, papers, even other books. Like the
code, the figures are provided under the MIT license, so you can use
them as you like as long as you keep the copyright notice around.
You don't have to credit me as their creator when you use these
figures, but I'd appreciate it if you would.
The filenames match the figure numbers in the book, so
they're easy to find. When you're looking for something visually, it
may be helpful to look at the thumbnail pages. These hold 20
images each:
https://github.com/blueberrymusic/DeepLearningBookFigures-Thumbnails
The figures themselves are grouped into the two volumes:
https://github.com/blueberrymusic/DeepLearningBookFigures-Volume1
https://github.com/blueberrymusic/DeepLearningBookFigures-Volume2
Resources
The resources directory contains other files, such as a template for the deep learning icons we use later in the book.
https://github.com/blueberrymusic/DeepLearningBook-Resources
Errata
Despite my best efforts, no book of this size is
going to be free of errors. If you spot something that seems wrong,
please let me know at
[email protected]. I'll keep a list of errata
on the book's website at https://dlbasics.com.
Two Volumes
This
ended up as a large book, so I've organized it into two volumes of
roughly equal size. Because the book is cumulative, the second
volume picks up where the first leaves off. If you're reading the
second volume now, you should have already read the first volume,
or feel confident that you understand the material presented there.
Thank You!
Authors like to say that nobody writes a book
alone. We say that because it's true. For their consistent and
enthusiastic support of this project, and helping me feel good about
it all the way through, I am enormously grateful to Eric Braun, Eric
Haines, Steven Drucker, and Tom Reike. Thank you for your
friendship and encouragement. Huge thanks are due to my
reviewers, whose generous and insightful comments greatly
improved this book: Adam Finkelstein, Alex Colburn, Alexander
Keller, Alyn Rockwood, Angelo Pesce, Barbara Mones, Brian Wyvill,
Craig Kaplan, Doug Roble, Eric Braun, Eric Haines, Greg Turk, Jeff
Hultquist, Jessica Hodgins, Kristi Morton, Lesley Istead, Luis
Avarado, Matt Pharr, Mike Tyka, Morgan McGuire, Paul Beardsley,
Paul Strauss, Peter Shirley, Philipp Slusallek, Serban Porumbescu,
Stefanus Du Toit, Steven Drucker, Wenhao Yu, and Zackory Erickson.
Special thanks to super reviewers Alexander Keller, Eric Haines,
Jessica Hodgins, and Luis Avarado, who read all or most of the
manuscript and offered terrific feedback on both presentation and
structure. Thanks to Morgan McGuire for Markdeep, which enabled
me to focus on what I was saying, rather than the mechanics of how
to format it. It made writing this book a remarkably smooth and fluid
process. Thanks to Todd Szymanski for insightful advice on the
design and layout of the book's contents and covers, and for
catching layout errors. Thanks to early readers who caught typos
and other problems: Christian Forfang, David Pol, Eric Haines, Gopi
Meenakshisundaram, Kostya Smolenskiy, Mauricio Vives, Mike Wong,
and Mrinal Mohit. All of these people improved the book, but the
final decisions were my own. Any problems that remain are my
responsibility.
References
This section appears in every chapter. It contains
references to all the documents that are referred to in the body of
the chapter. There may also be other useful papers, websites,
documentation, blogs, and other resources. Whenever possible, I've
preferred to use references that are available online, so you can
immediately access them using the provided link. The exceptions are
usually books, but occasionally I'll include an important online
reference even if it's behind a paywall.
[Agüera y Arcas 17] Blaise Agüera y Arcas, Margaret Mitchell and Alexander Todorov, "Physiognomy's New Clothes", Medium, 2017. https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a
[Barrat15] James Barrat, "Our Final Invention: Artificial Intelligence and the End of the Human Era", St. Martin's Griffin, 2015.
[Bishop06] Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer-Verlag, pp. 149-152, 2006.
[Chollet17] François Chollet, "Deep Learning with Python", Manning Publications, 2017.
[Domingos15] Pedro Domingos, "The Master Algorithm", Basic Books, 2015.
[Goodfellow17] Ian Goodfellow, Yoshua Bengio, Aaron Courville, "Deep Learning", MIT Press, 2017. http://www.deeplearningbook.org/
[JetBrains17] JetBrains, "PyCharm Community Edition IDE", 2017. https://www.jetbrains.com/pycharm/
[Jupyter16] The Jupyter team, 2016. http://jupyter.org/
[Kaplan16] Jerry Kaplan, "Artificial Intelligence: What Everyone Needs to Know", Oxford University Press, 2016.
[Müller-Guido16] Andreas C. Müller and Sarah Guido, "Introduction to Machine Learning with Python", O'Reilly Media, 2016.
[Raschka15] Sebastian Raschka, "Python Machine Learning", Packt Publishing, 2015.
[VanderPlas16] Jake VanderPlas, "Python Data Science Handbook", O'Reilly Media, 2016.
Chapter 1 An Introduction to Machine Learning and Deep Learning A
quick overview of the ideas, language, and techniques that we'll be
using throughout the book.
Chapter 1: An Introduction to Machine Learning and Deep Learning
Contents 1.1 Why This Chapter Is Here 3 1.1.1 Extracting Meaning
from Data 4 1.1.2 Expert Systems 6 1.2 Learning from Labeled Data
9 1.2.1 A Learning Strategy 10 1.2.2 A Computerized Learning
Strategy 12 1.2.3 Generalization 16 1.2.4 A Closer Look at Learning
18 1.3 Supervised Learning 21 1.3.1 Classification 21 1.3.2
Regression 22 1.4 Unsupervised Learning 25 1.4.1 Clustering 25
1.4.2 Noise Reduction 26 1.4.3 Dimensionality Reduction 28 1.5
Generators 32 1.6 Reinforcement Learning 34 1.7 Deep Learning 37
1.8 What's Coming Next 43 References 44 Image credits 45
1.1 Why This Chapter Is Here
This chapter is here to help us get familiar
with the big ideas and basic terminology of machine learning. The
phrase machine learning describes a growing body of techniques
that all have one goal: discover meaningful information from data.
Here, "data" refers to anything that can be recorded and measured.
Data can be raw numbers (like stock prices on successive days, or
the mass of different planets, or the heights of people visiting a
county fair), but it can also be sounds (the words someone speaks
into their cell phone), pictures (photographs of flowers or cats),
words (the text of a newspaper article or a novel), or anything else
that we want to investigate. "Meaningful information" is whatever
we can extract from the data that will be useful to us in some way.
We get to decide what's meaningful to us, and then we design an
algorithm to find as much of it as possible from our data. The phrase
"machine learning" describes a wide diversity of algorithms and
techniques. It would be nice to nail down a specific definition for the
phrase, but it's used by so many people in so many different ways
that it's best to consider it the name for a big, expanding collection
of algorithms and principles that analyze vast quantities of training
data in order to extract meaning from it. More recently, the phrase
deep learning was coined to refer to approaches to machine learning
that use specialized layers of computation, stacked up one after the
next. This makes a "deep" structure, like a stack of pancakes. Since
"deep learning" refers to the nature of the system we create, rather
than any particular algorithm, it really refers to this particular style
or approach to machine learning. It's an approach that has paid off
enormously well in recent years.
Let's look at some example applications that use machine learning to
extract meaning from data.
1.1.1 Extracting Meaning from Data
The
post office needs to sort an enormous number of letters and
packages every day, based on hand-written zip codes. They have
taught computers to read those codes and route the mail
automatically, as in the left illustration in Figure 1.1.
Figure 1.1: Extracting meaning from data sets. Left: Getting a zip
code from an envelope. Middle: Reading the numbers and letters on
a check. Right: Recognizing faces from photos. Banks process vast
quantities of hand-written checks. A valid check requires that the
numerical amount hand-written into the total field (e.g., "$25.10")
matches the written-out amount on the text line (e.g., "twenty-five
dollars and ten cents"). Computers can read both the numbers and
words and confirm a match, as in the middle of Figure 1.1. Social
media sites want to identify the people in their members'
photographs. This means not only detecting if there are any faces in
a given photo, but identifying where those faces are located, and
then matching up each face with previously seen faces. This is made
even more difficult when we realize that virtually every photo of a
person is unique: the lighting, angle, expression, clothing, and many
other qualities will
be different from any previous photo. They'd like to be able to take
any photo of any person, and correctly identify who it is, as in the
right of Figure 1.1. Digital assistant providers listen to what people
say into their gadgets so that they can respond intelligently. The
signal from the microphone is a series of numbers that describe the
sound pressure striking the microphone's membrane. The providers
want to analyze those numbers in order to understand the sounds
that caused them, the words that these sounds were part of, the
sentences that the words were part of, and ultimately the meaning
of those sentences, as suggested by the left illustration in Figure 1.2.
Figure 1.2: Extracting
meaning from data. Left: Turning a recording into sounds, then
words, and ultimately a complete utterance. Middle: Finding one
unusual event in a particle smasher's output full of similar-looking
trails. Right: Predicting the population of the northern resident Orca
whale population off of Canada's west coast [Towers15]. Scientists
create vast amounts of data from flying drones, high-energy physics
experiments, and observations of deep space. From these
overwhelming floods of data they often need to pick out the few
instances of events that are almost like all the others, but are slightly
different. Looking through all the data manually would be an
effectively impossible task even for huge numbers of highly-trained
specialists. It's much better to automate this process with computers
that can comb the data exhaustively, never missing any details, as in
the middle of Figure 1.2.
Conservationists track the populations of species over time to see
how well they're doing. If the population declines over a long period
of time, they may need to take action to intervene. If the population
is stable, or growing, then they may be able to rest more easily.
Predicting the next value of a sequence of values is something that
we can train a computer to do well. The recorded annual size of a
population of Orca whales off the Canadian coast, along with a
possible predicted value, is shown in the right of Figure 1.2 (adapted
from [Towers15]). These six examples illustrate applications that
many of us are already familiar with, but there are many more.
Because of their ability to extract meaningful information quickly,
machine learning algorithms are finding their way into an expanding
range of fields. The common threads here are the sheer volume of
the work involved, and its painstaking detail. We might have millions
of pieces of data to examine, and we're trying to extract some
meaning from every one of them. Humans get tired, bored, and
distracted, but computers just plow on steadily and reliably.
1.1.2 Expert Systems
A popular, early approach to finding the meaning
that's hiding inside of data involved creating expert systems. The
essence of the idea was that we would study what human experts
know, what they do, and how they do it, and automate that. In
essence, we'd make a computer system that could mimic the human
experts it was based on. This often meant creating a rule-based
system, where we'd amass a large number of rules for the computer
to follow in order to imitate the human expert. For instance, if we're
trying to recognize digits in zip codes, we might have a rule that
says that 7's are shapes that have a mostly horizontal line near the
top of the figure, and then a mostly diagonal line that starts at the
right edge of the horizontal line and moves left and down, as in
Figure 1.3.
Figure 1.3: Devising a set of rules to recognize a hand-written
digit 7. Top: A typical 7 we'd like to identify. Bottom: The three rules
that make up a 7. A shape would be classified as a 7 if it satisfies all
three rules. We'd have similar rules for every digit. This might work
well enough until we get a digit like Figure 1.4. Figure 1.4: This 7 is
a valid way to write a 7, but it would not be recognized by the rules
of Figure 1.3 because of the extra line. We hadn't thought about
how some people put a bar through the middle of their 7's. So now
we add another rule for that special case. A small sketch of what these hand-made rules might look like in code appears below.
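To make this concrete, here is one way such hand-written rules might look as a tiny Python sketch. The way a digit is described here (a list of strokes with rough orientations and positions) and the function name are invented purely for illustration; a real system would need far more detail than this.

```python
# Hand-written rules for recognizing a 7, in the spirit of Figure 1.3.
def looks_like_seven(strokes):
    top_bar = any(s["kind"] == "horizontal" and s["where"] == "top"
                  for s in strokes)               # rule 1: a bar near the top
    diagonal = any(s["kind"] == "diagonal-down-left"
                   for s in strokes)               # rule 2: a line running down and left
    # Rule 3 (the two lines meet at the upper right) is waved away here.
    # The special case added later: an extra bar through the middle
    # (Figure 1.4) is simply ignored rather than disqualifying the shape.
    return top_bar and diagonal

barred_seven = [
    {"kind": "horizontal", "where": "top"},
    {"kind": "diagonal-down-left", "where": "middle"},
    {"kind": "horizontal", "where": "middle"},     # the extra bar some people add
]
print(looks_like_seven(barred_seven))              # True
```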
This process of hand-crafting the rules to understand data is sometimes called feature engineering (the term is also used to describe when we use the computer to find these features for us [VanderPlas16]).
The phrase describes our desire to engineer (or design) all of the
features (or qualities) that a human expert uses and combines to do
their job. In general, it's a very tough job. As we saw, it's easy to
overlook one rule, or even lots of them. Imagine trying to find a set
of rules that summarize how a radiologist determines whether a
smudge on an X-ray image is benign or not, or how an air-traffic
controller handles heavily scheduled air traffic, or how someone
drives a car safely in extreme weather conditions. Rule-based expert
systems are able to manage some jobs, but the difficulty of manually
crafting the right set of rules, and making sure they work properly
across a wide variety of data, has spelled their doom as a general
solution. Articulating every step in a complicated process is
extremely difficult, and when we have to factor in human judgment
based on experience and hunches, it becomes nearly impossible for
any but the simplest scenarios. The beauty of machine learning
systems is that (on a conceptual level) they learn a dataset's
relevant characteristics automatically. So we don't have to tell an
algorithm how to recognize a 2 or a 7, because the system figures
that out for itself. But to do that well, the system often needs a lot of
data. Enormous amounts of data. That's a big reason why machine
learning has exploded in popularity and applications in the last few
years. The flood of raw data provided by the Internet has let these
tools extract a lot of meaning from a lot of data. Online companies
are able to make use of every interaction with every customer to
accumulate more data, which they can then turn around and use as
input to their machine learning algorithms, providing them with even
more information about their customers.
1.2 Learning from Labeled Data
There are lots of machine learning
algorithms, and we'll look at many of them in this book. Many are
conceptually straightforward (though their underlying math or
programming could be complex). For instance, suppose we want to
find the best straight line through a bunch of data points, as in
Figure 1.5.
Figure 1.5:
Given a set of data points (in blue), we can imagine a
straightforward algorithm that computes the best straight line (in
red) through those points. Conceptually, we can imagine an
algorithm that represents any straight line with just a few numbers.
It then uses some formulas to compute those numbers, given the
numbers that represent the input data points. This is a familiar kind
of algorithm, which uses a carefully thought-out analysis to find the
best way to solve a problem, and is then implemented in a program
that performs that analysis. This is a strategy used by many machine learning algorithms; a small sketch of this kind of direct, formula-based calculation appears below.
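As one concrete illustration of that direct style, here is a minimal Python sketch that fits a line with a one-step least-squares calculation from NumPy. The data points are made up for illustration; this is not code from the book's notebooks.

```python
import numpy as np

# Direct computation: given (x, y) points, a closed-form least-squares
# formula returns the slope and intercept of the best-fitting line.
rng = np.random.default_rng(0)
x = np.linspace(-20, 20, 50)
y = 0.8 * x + 3.0 + rng.normal(scale=4.0, size=x.shape)   # noisy, made-up line

slope, intercept = np.polyfit(x, y, deg=1)                # one-step formula
print(f"best line: y = {slope:.2f} * x + {intercept:.2f}")
```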
By contrast, the strategy used by many deep learning algorithms is less familiar. It involves slowly learning from
examples, a little bit at a time, over and over. Each time the program
sees a new piece of data to be learned from, it improves its own
parameters, ultimately finding a set of values that will do a good job
of computing what we want. While
we're still carrying out an algorithm, it's much more open-ended
than the one that fits a straight line. The idea here is that we don't
know how to directly calculate the right answer, so we build a
system that can figure out how to do that itself. Our analysis and
programming effort goes into creating an algorithm that can work out its own
answers, rather than implementing a known process that directly
yields an answer. If that sounds pretty wild, it is. Programs that can
find their own answers in this way are at the heart of the recent
enormous success of deep learning algorithms. In the next few
sections, we'll look more closely at this technique to get a feeling for
it, since it's probably less familiar than the more traditional machine
learning algorithms. Ultimately we'd like to be able to use this for
tasks like showing the system a photograph, and getting back the
names of everyone in the picture. That's a tall order, so let's start
with something much simpler and look at learning a few facts.
1.2.1 A Learning Strategy
Let's consider an awful way to teach a new
subject to children, summarized in Figure 1.6. This is not how most
children are actually taught, but it is one of the ways that we teach
computers.
Figure 1.6: A truly terrible way to try to teach people.
First, recite a list of facts. Then test each student on those facts, and
on other facts they haven't been exposed to, but which we believe
could be derived if the first set was well-enough understood. If the
student gets great scores on the tests (particularly the second one),
he or she graduates. Otherwise they go through the loop again,
starting with a repeated recitation of the very same facts. In this
scenario (hopefully imaginary), a teacher stands in front of a class
and recites a series of facts that the students are supposed to
memorize. Every Friday afternoon they are given two tests. The first
test grills them on those specific facts, to test their retention. The
second test, given immediately after the first, asks new questions
the students have never seen before, in order to test their overall
understanding of the material. Of course, it's very unlikely that
anyone would "understand" anything if they've only been given a list
of facts, which is one reason this approach would be terrible. If a
student does well on the second test, the teacher declares that
they've learned the subject matter and they immediately graduate. If
a given student doesn't do well on the second test, they repeat the
same process again the next week: the teacher recites the very
same facts as before in exactly the same way, then gives the
students the same first test to measure their retention, and a new
second test to
measure their understanding, or ability to generalize. Over and over
every student repeats this process until they do well enough on the
second test to graduate. This would be a terrible way to teach
children, but it turns out to be a great way to teach computers. In
this book we'll see many other approaches to teaching computers,
but let's stick with this one for now, and look at it more closely. We'll
see that, unlike most people, each time we expose the computer to
the identical information, it learns a little bit more.
1.2.2 A Computerized Learning Strategy
We start by collecting the facts
we're going to teach. We do this by collecting as much data as we
can get. Each piece of observed data (say, the weather at a given
moment) is called a sample, and the names of the measurements
that make it up (the temperature, wind speed, humidity, etc.) are
called its features [Bishop06]. Each named measurement, or feature,
has an associated value, typically stored as a number. To prepare our
data for the computer, we hand each sample (that is, each piece of
data, with a value for each feature) to a human expert, who
examines its features and provides a label for that sample. For
instance, if our sample is a photo, the label might be the name of
the person in the photo, or the type of animal it shows, or whether
or not the traffic in the photo is flowing smoothly or is stuck in
gridlock. Let's use weather measurements on a mountain for our
example. The expert's opinion, using a score from 0 to 100, tells how confident the expert is that the day's weather would make for good hiking. The idea is shown in Figure 1.7.
Figure 1.7: To label a dataset, we start with a list of samples, or data items. Each sample is made up of a list of features that describe it. We give the dataset to a human expert, who examines the features of each sample one by one, and assigns a label for that sample.
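To make the shape of such labeled data concrete, here is a tiny Python sketch for the hiking example. The feature names, values, and expert labels are all invented for illustration; a real dataset would hold far more samples.

```python
# Each sample is a set of named features measured on one day; the expert
# supplies a label from 0 to 100 saying how good a hiking day it was.
samples = [
    {"temperature": 18.0, "wind_speed": 5.0,  "humidity": 40.0},
    {"temperature": -3.0, "wind_speed": 30.0, "humidity": 85.0},
    {"temperature": 24.0, "wind_speed": 12.0, "humidity": 55.0},
]
expert_labels = [90, 10, 70]   # one expert label per sample

for features, label in zip(samples, expert_labels):
    print(features, "->", label)
```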
We'll usually take some of these labeled samples and set them aside for a moment. We'll return to them soon. Once we have our labeled data, we give it to our
computer, and we tell it to find a way to come up with the right label
for each input. We do not tell it how to do this. Instead, we give it
an algorithm with a large number of parameters it can adjust
(perhaps even millions of them). Different types of learning will use
different algorithms, and much of this book will be devoted to
looking at them and how to use them well. But once we've selected
an algorithm, we can run an input through it to produce an output,
which is the computer's prediction of what it thinks the expert's label
is for that sample. When the computer's prediction matches the
expert's label, we won't change anything. But when the computer
gets it wrong, we ask the computer to modify the internal
parameters for the algorithm it's using, so that it's more likely to
predict the right answer if we show it this piece of data again. The
process is basically trial and error. The computer does its best to
give us the right answer, and when it fails, there's a procedure it can
follow to help it change and improve.
We only check the computer's prediction against the expert's label
once the prediction has been made. When they don't match, we
calculate an error, also called a cost, or loss. This is a number that
tells the algorithm how far off it was. The system uses the current
values of its internal parameters, the expert's prediction (which it
now knows), and its own incorrect prediction, to adjust the
parameters in the algorithm so that it's more likely to predict the
correct label if it sees this sample again. Later on we'll look closely at
how these steps are performed. Figure 1.8 shows the idea.
Figure 1.8: One step of
training, or learning. We split the sample's features and its label.
From the features, the algorithm predicts a label. We compare the
prediction with the real label. If the predicted label matches the label
we want, we don't do a thing. Otherwise, we tell the algorithm to
modify, or update, itself so it's less likely to make this mistake again.
We say that we train the system to learn how to predict label data
by analyzing the samples in the training set and updating its
algorithm in response to incorrect predictions. We'll get into different
choices for the algorithm and update step in detail in later chapters.
For now, it's worth knowing that each algorithm learns by changing
the internal parameters it uses to create its predictions. It can
change them by a lot after each incorrect sample, but then it runs
the risk of changing them so much that it makes other predictions
worse. It could change them by a small amount, but that could
cause learning to run slower than it otherwise might. Finding
the right trade-off between these extremes is something we have to
find by trial and error for each type of algorithm and each dataset
we're training it on. We call the amount of updating the learning
rate, so a small learning rate is cautious and slow, while a large
learning rate speeds things up but could backfire. As an analogy,
suppose we're out in a desert and using a metal detector to find
a buried metal box full of supplies. We'd wave the metal detector
around, and if we got a response in some direction, we'd move in
that direction. If we're being careful, we'd take a small step so that
we don't either walk right past the box or lose the signal. If we're
being aggressive, we'd take a big step so we can get to the box as
soon as possible. Just as we'd probably start out with big steps but
then take smaller and smaller ones the nearer we got to the box, so
too we usually adjust the learning rate so that the network changes
a lot during the start of training, but we reduce the size of those
changes as we go.
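As a toy illustration of that trial-and-error loop (and not the update rule any particular library uses), here is a Python sketch with a single adjustable parameter that gets nudged a little after every sample, with the size of each nudge set by the learning rate. All the numbers are made up.

```python
# A toy trial-and-error loop. The "model" is one parameter w, the
# prediction for input x is w * x, and after each sample we nudge w so
# the prediction would be a little better next time.
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]   # (x, expert label)
w = 0.0
learning_rate = 0.05      # small: cautious and slow; large: fast but risky

for epoch in range(20):   # show the computer the same data many times
    for x, label in samples:
        prediction = w * x
        error = prediction - label          # how far off we were
        w -= learning_rate * error * x      # nudge w to shrink that error

print(f"learned w = {w:.2f}")   # ends up near 2, the slope hidden in the data
```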
There's an interesting way for the computer to get a great score without learning how to do anything but remember
the inputs. To get a perfect score, all the algorithm has to do is
memorize the expert's label for each sample, and then return that
label. In other words, it doesn't need to learn how to compute a
label given the sample, it only needs to look up the right answer in a
table. In our imaginary learning scenario above, this is equivalent to
the students memorizing the answers to the questions in the tests.
Sometimes this is a great strategy, and we'll see later that there are
useful algorithms that follow just this approach. But if our goal is to
get the computer to learn something about the data that will enable
it to generalize to new data, this method will usually backfire. The
problem is that we already have the labels that the system is
memorizing, so all of our work in training isn't getting us anything
new. And since the computer has learned nothing about the data
itself, instead just getting its answers from a look-up table, it would
have no idea how to create predictions for new data that it hasn't
already seen and
memorized. The whole point is to get the system to be able to
predict labels for new data it's never seen before, so that we can
confidently use it in a deployed system where new data will be
arriving all the time. If the algorithm does well on the training set,
but poorly on new data, we say it doesn't generalize well. Let's see
how to encourage the computer to generalize, or to learn about the
data so that it can accurately predict the labels for new data.
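Before moving on, here is a tiny Python illustration of this memorization trap. The Memorizer class is invented for this sketch: it stores the expert's label for every training sample in a look-up table, so it scores perfectly on data it has already seen but has no principled answer for anything new.

class Memorizer:
    def __init__(self):
        self.table = {}                        # maps features to the expert's label

    def train(self, features, label):
        self.table[tuple(features)] = label    # just remember the answer

    def predict(self, features):
        key = tuple(features)
        if key in self.table:
            return self.table[key]             # perfect on memorized training samples
        return None                            # no idea what to do with unseen data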
1.2.3 Generalization
Now we'll return to the labeled data that we set aside
in the last section. We'll evaluate how well the system can generalize
what it's learned by showing it these samples that it's never seen
before. This test set shows us how well the system does on new
data. Let's look at a classifier (or categorizer). This kind of system
assigns a label to each sample that describes which of several
categories, or classes, that sample belongs to. If the input is a song,
the label might be the genre (e.g., rock or classical). If it's a photo of
an animal, the label might be which animal is shown (e.g., a tiger or
an elephant). In our running example, we could break up each day's
anticipated hiking experience into 3 categories: Lousy, Good, and
Great. We'll ask the computer to predict the label for each sample in
the test set (the samples it has not seen before), and then we'll
compare the computer's predictions with the expert's labels, as in
Figure 1.9. 16
Chapter 1: An Introduction to Machine Learning and Deep Learning
f.O <D Samp Labels only I I I I ► Alnorithm —► r > ► Com pare
Features only Percentage Right Predicted Labels Figure 1.9: The
overall process for evaluating a classifier (also called a categorizer).
In Figure 1.9, we've split the test data into features and labels. The
algorithm assigns, or predicts, a label for each set of features. We
then compare the predictions with the real labels to get a
measurement of accuracy. If it's good enough, we can deploy the
system. If the results aren't good enough, we can go back and train
some more. Note that unlike training, in this process there is no
feedback and no learning. Until we return to explicit training, the
algorithm doesn't change its parameters, regardless of the quality of
its predictions. If the computer's predictions on these brand-new
samples (well, brand-new to the algorithm) are not a sufficiently
close match to the expert's labels, then we return to the training
step in Figure 1.8. We show the computer every sample in the
original training set again, letting it learn along the way again. Note
that these are the same samples, so we're asking the computer to
learn over and over again from the very same data. We usually
shuffle the data first so that each sample arrives in a different order
each time, but we're not giving the algorithm any new information.
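The comparison in Figure 1.9 boils down to counting how many predicted labels match the expert's labels. Here is a minimal sketch, assuming the test set is a list of (features, label) pairs, with labels such as Lousy, Good, and Great from our hiking example, and a hypothetical model with a predict() method:

def accuracy(model, test_set):
    # Fraction of test samples whose predicted label matches the expert's label.
    correct = sum(1 for features, label in test_set
                  if model.predict(features) == label)
    return correct / len(test_set)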
Then we ask the algorithm to predict labels for the test set again. If
the performance isn't good enough, we learn from the original
training set again, and then test again. Around and around we go,
repeating this process often hundreds of times, showing the
computer the same data over and over and over again, letting it
learn just a little more each time. As we noted, this would be a
terrible way to teach a student, but the computer doesn't get bored
or cranky seeing the same data over and over. It just learns what it
can and gets a little bit better each time it gets another shot at
learning from the data.
1.2.4 A Closer Look at Learning
We usually
believe that there are relationships in our data. After all, if it was
purely random we wouldn't be trying to extract information from it.
The hope of our process in the previous section is that by exposing
the computer to the training set over and over, and having it learn a
little bit from every sample every time, the algorithm will eventually
find these relationships between the features in each sample and the
label the expert assigned. Then it can apply that relationship to the
new data in the test set. If it gets mostly correct answers, we say
that it has high accuracy, or low generalization error. But if the
computer consistently fails to improve its predictions for the labels
for the test set, we'll stop training, since we're not making any
progress. At that point we'll typically modify our algorithm in hopes
of getting better performance, and then start the training process
over again. But we're just hoping. There's no guarantee that there's
a successful learning algorithm for every set of data, and no
guarantee that if there is one, we'll find it. The good news is that
even without a mathematical guarantee, in practice we can often
find solutions that generalize very well, sometimes doing even better
than human experts.
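Putting these pieces together, here is a hedged sketch of the whole loop: shuffle the training set, learn from every sample, measure accuracy on the held-out test set, and stop either when the score is good enough or when it has stopped improving for a while. It reuses the hypothetical train_on_sample() and accuracy() helpers from the earlier sketches, and the pass count and thresholds are arbitrary illustrations, not recommendations.

import random

def fit(model, training_set, test_set, max_passes=500,
        good_enough=0.9, patience=10):
    best_score, stalled = 0.0, 0
    for _ in range(max_passes):                 # show the same data over and over
        random.shuffle(training_set)            # same samples, new order each pass
        for features, label in training_set:
            train_on_sample(model, features, label)
        score = accuracy(model, test_set)       # evaluation only; no learning here
        if score >= good_enough:
            break                               # good enough to deploy
        if score > best_score:
            best_score, stalled = score, 0      # still improving
        else:
            stalled += 1
            if stalled >= patience:
                break                           # progress has stopped; rethink the approach
    return model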
One reason the algorithm might fail to learn is that it doesn't
have enough computational resources to find the relationship
between the samples and their labels. We sometimes think of the
computer creating a kind of model of the underlying data. For
instance, if the temperature goes up for the first three hours every
day we measure it, the computer might build a "model" that says
morning temperatures rise. This is a version of the data, in the same
way a small plastic version of a sports car or plane is a "model" of
that larger vehicle. The classifier we saw above is one example of a
model. In the computer, the model is formed by the structure of the
software and the values of the parameters it's using. Larger
programs, and larger sets of parameters, can lead to models that are
able to learn more from the data. We say that they have more
capacity, or representational power. We can think of this as how
deeply and broadly the algorithm is able to learn. More capacity
gives us more ability to discover meaning from the data we're given.
As an analogy, suppose we work for a car dealer and we're
supposed to write blurbs for the cars we're selling. We'll suppose
that our marketing department has sent us a list of "approved
words" for describing our cars. After doing our job for a while, we
will probably learn how to best represent each car with this model.
Suppose the dealer then buys its first motorcycle. The words we
have available, or our model, don't have enough capacity to
represent this vehicle, in addition to all the other vehicles we're
already describing. We just don't have a big enough vocabulary to
refer to something with 2 wheels rather than 4. We can do our best,
but it's likely not to be great. If we can use a more powerful model
with greater capacity (that is, a bigger vocabulary for describing
vehicles), we can do a better job. But a bigger model also means
more work. As we'll see later in the book, a bigger model can often
produce better results than a smaller one, though at the cost of
more computation time and computer memory.
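As a rough numerical illustration of capacity (not the book's method), the sketch below fits the same ten made-up points with two models of different sizes: a straight line with 2 parameters and a degree-7 polynomial with 8. The larger model can match the training data more closely, but it takes more computation and memory to fit and store, echoing the trade-off above.

import numpy as np

# Ten made-up (x, y) samples: a gentle curve plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(10)

for degree in (1, 7):                           # 2 parameters versus 8 parameters
    coeffs = np.polyfit(x, y, degree)           # fit a polynomial of this capacity
    predictions = np.polyval(coeffs, x)
    error = np.mean((predictions - y) ** 2)     # mean squared error on the training data
    print(degree, len(coeffs), error)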