DEEP LEARNING: From Basics to Practice, Volume 1
Andrew Glassner

Deep Learning: From Basics to Practice, Volume 1. Copyright (c) 2018 by Andrew Glassner. www.glassner.com / @AndrewGlassner

All rights reserved. No part of this book, except as noted below, may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the author, except in the case of brief quotations embedded in critical articles or reviews. The above reservation of rights does not apply to the program files associated with this book (available on GitHub), or to the images and figures (also available on GitHub), which are released under the MIT license. Any images or figures that are not original to the author retain their original copyrights and protections, as noted in the book and on the web pages where the images are provided.

All software in this book, or in its associated repositories, is provided "as is," without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort, or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.

First published February 20, 2018. Version 1.0.1, March 3, 2018. Version 1.1, March 22, 2018.

Published by The Imaginary Institute, Seattle, WA. http://www.imaginary-institute.com
Contact: [email protected]

For Niko, who's always there with a smile and a wag.

Contents of Both Volumes

Volume 1

Preface i

Chapter 1: An Introduction 1
1.1 Why This Chapter Is Here 3; 1.1.1 Extracting Meaning from Data 4; 1.1.2 Expert Systems 6; 1.2 Learning from Labeled Data 9; 1.2.1 A Learning Strategy 10; 1.2.2 A Computerized Learning Strategy 12; 1.2.3 Generalization 16; 1.2.4 A Closer Look at Learning 18; 1.3 Supervised Learning 21; 1.3.1 Classification 21; 1.3.2 Regression 22; 1.4 Unsupervised Learning 25; 1.4.1 Clustering 25; 1.4.2 Noise Reduction 26; 1.4.3 Dimensionality Reduction 28; 1.5 Generators 32; 1.6 Reinforcement Learning 34; 1.7 Deep Learning 37; 1.8 What's Coming Next 43; References 44; Image credits 45

Chapter 2: Randomness and Basic Statistics 46
2.1 Why This Chapter Is Here 48; 2.2 Random Variables 49; 2.2.1 Random Numbers in Practice 57; 2.3 Some Common Distributions 59; 2.3.1 The Uniform Distribution 60; 2.3.2 The Normal Distribution 61; 2.3.3 The Bernoulli Distribution 67; 2.3.4 The Multinoulli Distribution 69; 2.3.5 Expected Value 70; 2.4 Dependence 70; 2.4.1 i.i.d. Variables 71; 2.5 Sampling and Replacement 71; 2.5.1 Selection With Replacement 73; 2.5.2 Selection Without Replacement 74; 2.5.3 Making Selections 75; 2.6 Bootstrapping 76; 2.7 High-Dimensional Spaces 82; 2.8 Covariance and Correlation 85; 2.8.1 Covariance 86; 2.8.2 Correlation 88; 2.9 Anscombe's Quartet 93; References 95

Chapter 3: Probability 97
3.1 Why This Chapter Is Here 99; 3.2 Dart Throwing 100; 3.3 Simple Probability 103; 3.4 Conditional Probability 104; 3.5 Joint Probability 109; 3.6 Marginal Probability 114; 3.7 Measuring Correctness 115; 3.7.1 Classifying Samples 116; 3.7.2 The Confusion Matrix 119; 3.7.3 Interpreting the Confusion Matrix 121; 3.7.4 When Misclassification Is Okay 126; 3.7.5 Accuracy 129; 3.7.6 Precision 130; 3.7.7 Recall 132; 3.7.8 About Precision and Recall 134; 3.7.9 Other Measures 137; 3.7.10 Using Precision and Recall Together 141; 3.7.11 F1 Score 143; 3.8 Applying the Confusion Matrix 144; References 151

Chapter 4: Bayes' Rule 153
4.1 Why This Chapter Is Here 155; 4.2 Frequentist and Bayesian Probability 156; 4.2.1 The Frequentist Approach 156; 4.2.2 The Bayesian Approach 157; 4.2.3 Discussion 158; 4.3 Coin Flipping 159; 4.4 Is This a Fair Coin? 161; 4.4.1 Bayes' Rule 173; 4.4.2 Notes on Bayes' Rule 175; 4.5 Finding Life Out There 178; 4.6 Repeating Bayes' Rule 183; 4.6.1 The Posterior-Prior Loop 184; 4.6.2 Example: Which Coin Do We Have? 186; 4.7 Multiple Hypotheses 194; References 203

Chapter 5: Curves and Surfaces 205
5.1 Why This Chapter Is Here 207; 5.2 Introduction 207; 5.3 The Derivative 210; 5.4 The Gradient 222; References 229

Chapter 6: Information Theory 231
6.1 Why This Chapter Is Here 233; 6.1.1 Information: One Word, Two Meanings 233; 6.2 Surprise and Context 234; 6.2.1 Surprise 234; 6.2.2 Context 236; 6.3 The Bit as Unit 237; 6.4 Measuring Information 238; 6.5 The Size of an Event 240; 6.6 Adaptive Codes 241; 6.7 Entropy 250; 6.8 Cross-Entropy 253; 6.8.1 Two Adaptive Codes 253; 6.8.2 Mixing Up the Codes 257; 6.9 KL Divergence 260; References 262

Chapter 7: Classification 265
7.1 Why This Chapter Is Here 267; 7.2 2D Classification 268; 7.2.1 2D Binary Classification 269; 7.3 2D Multi-class Classification 275; 7.4 Multiclass Binary Categorizing 277; 7.4.1 One-Versus-Rest 278; 7.4.2 One-Versus-One 280; 7.5 Clustering 286; 7.6 The Curse of Dimensionality 290; 7.6.1 High Dimensional Weirdness 299; References 307

Chapter 8: Training and Testing 309
8.1 Why This Chapter Is Here 311; 8.2 Training 312; 8.2.1 Testing the Performance 314; 8.3 Test Data 318; 8.4 Validation Data 323; 8.5 Cross-Validation 328; 8.5.1 k-Fold Cross-Validation 331; 8.6 Using the Results of Testing 334; References 335; Image Credits 336

Chapter 9: Overfitting and Underfitting 337
9.1 Why This Chapter Is Here 339; 9.2 Overfitting and Underfitting 340; 9.2.1 Overfitting 340; 9.2.2 Underfitting 342; 9.3 Overfitting Data 342; 9.4 Early Stopping 348; 9.5 Regularization 350; 9.6 Bias and Variance 352; 9.6.1 Matching the Underlying Data 353; 9.6.2 High Bias, Low Variance 357; 9.6.3 Low Bias, High Variance 359; 9.6.4 Comparing Curves 360; 9.7 Fitting a Line with Bayes' Rule 363; References 372

Chapter 10: Neurons 374
10.1 Why This Chapter Is Here 376; 10.2 Real Neurons 376; 10.3 Artificial Neurons 379; 10.3.1 The Perceptron 379; 10.3.2 Perceptron History 381; 10.3.3 Modern Artificial Neurons 382; 10.4 Summing Up 390; References 390

Chapter 11: Learning and Reasoning 393
11.1 Why This Chapter Is Here 395; 11.2 The Steps of Learning 396; 11.2.1 Representation 396; 11.2.2 Evaluation 400; 11.2.3 Optimization 400; 11.3 Deduction and Induction 402; 11.4 Deduction 403; 11.4.1 Categorical Syllogistic Fallacies 410; 11.5 Induction 415; 11.5.1 Inductive Terms in Machine Learning 419; 11.5.2 Inductive Fallacies 420; 11.6 Combined Reasoning 422; 11.6.1 Sherlock Holmes, "Master of Deduction" 424; 11.7 Operant Conditioning 425; References 428

Chapter 12: Data Preparation 431
12.1 Why This Chapter Is Here 433; 12.2 Transforming Data 433; 12.3 Types of Data 436; 12.3.1 One-Hot Encoding 438; 12.4 Basic Data Cleaning 440; 12.4.1 Data Cleaning 441; 12.4.2 Data Cleaning in Practice 442; 12.5 Normalizing and Standardizing 443; 12.5.1 Normalization 444; 12.5.2 Standardization 446; 12.5.3 Remembering the Transformation 447; 12.5.4 Types of Transformations 448; 12.6 Feature Selection 450; 12.7 Dimensionality Reduction 451; 12.7.1 Principal Component Analysis (PCA) 452; 12.7.2 Standardization and PCA for Images 459; 12.8 Transformations 468; 12.9 Slice Processing 475; 12.9.1 Samplewise Processing 476; 12.9.2 Featurewise Processing 477; 12.9.3 Elementwise Processing 479; 12.10 Cross-Validation Transforms 480; References 486; Image Credits 486

Chapter 13: Classifiers 488
13.1 Why This Chapter Is Here 490; 13.2 Types of Classifiers 491; 13.3 k-Nearest Neighbors (KNN) 493; 13.4 Support Vector Machines (SVMs) 502; 13.5 Decision Trees 512; 13.5.1 Building Trees 519; 13.5.2 Splitting Nodes 525; 13.5.3 Controlling Overfitting 528; 13.6 Naive Bayes 529; 13.7 Discussion 536; References 538

Chapter 14: Ensembles 539
14.1 Why This Chapter Is Here 541; 14.2 Ensembles 542; 14.3 Voting 543; 14.4 Bagging 544; 14.5 Random Forests 547; 14.6 ExtraTrees 549; 14.7 Boosting 549; References 561

Chapter 15: Scikit-learn 563
15.1 Why This Chapter Is Here 566; 15.2 Introduction 567; 15.3 Python Conventions 569; 15.4 Estimators 574; 15.4.1 Creation 575; 15.4.2 Learning with fit() 576; 15.4.3 Predicting with predict() 578; 15.4.4 decision_function(), predict_proba() 581; 15.5 Clustering 582; 15.6 Transformations 587; 15.6.1 Inverse Transformations 594; 15.7 Data Refinement 598; 15.8 Ensembles 601; 15.9 Automation 605; 15.9.1 Cross-validation 606; 15.9.2 Hyperparameter Searching 610; 15.9.3 Exhaustive Grid Search 614; 15.9.4 Random Grid Search 625; 15.9.5 Pipelines 626; 15.9.6 The Decision Boundary 641; 15.9.7 Pipelined Transformations 643; 15.10 Datasets 647; 15.11 Utilities 650; 15.12 Wrapping Up 652; References 653

Chapter 16: Feed-Forward Networks 655
16.1 Why This Chapter Is Here 657; 16.2 Neural Network Graphs 658; 16.3 Synchronous and Asynchronous Flow 661; 16.3.1 The Graph in Practice 664; 16.4 Weight Initialization 664; 16.4.1 Initialization 667; References 670

Chapter 17: Activation Functions 672
17.1 Why This Chapter Is Here 674; 17.2 What Activation Functions Do 674; 17.2.1 The Form of Activation Functions 679; 17.3 Basic Activation Functions 679; 17.3.1 Linear Functions 680; 17.3.2 The Stair-Step Function 681; 17.4 Step Functions 682; 17.5 Piecewise Linear Functions 685; 17.6 Smooth Functions 690; 17.7 Activation Function Gallery 698; 17.8 Softmax 699; References 702

Chapter 18: Backpropagation 703
18.1 Why This Chapter Is Here 706; 18.1.1 A Word On Subtlety 708; 18.2 A Very Slow Way to Learn 709; 18.2.1 A Slow Way to Learn 712; 18.2.2 A Faster Way to Learn 716; 18.3 No Activation Functions for Now 718; 18.4 Neuron Outputs and Network Error 719; 18.4.1 Errors Change Proportionally 720; 18.5 A Tiny Neural Network 726; 18.6 Step 1: Deltas for the Output Neurons 732; 18.7 Step 2: Using Deltas to Change Weights 745; 18.8 Step 3: Other Neuron Deltas 750; 18.9 Backprop in Action 758; 18.10 Using Activation Functions 765; 18.11 The Learning Rate 774; 18.11.1 Exploring the Learning Rate 777; 18.12 Discussion 787; 18.12.1 Backprop In One Place 787; 18.12.2 What Backprop Doesn't Do 789; 18.12.3 What Backprop Does Do 789; 18.12.4 Keeping Neurons Happy 790; 18.12.5 Mini-Batches 795; 18.12.6 Parallel Updates 796; 18.12.7 Why Backprop Is Attractive 797; 18.12.8 Backprop Is Not Guaranteed 797; 18.12.9 A Little History 798; 18.12.10 Digging into the Math 800; References 802

Chapter 19: Optimizers 805
19.1 Why This Chapter Is Here 807; 19.2 Error as Geometry 807; 19.2.1 Minima, Maxima, Plateaus, and Saddles 808; 19.2.2 Error as a 2D Curve 814; 19.3 Adjusting the Learning Rate 817; 19.3.1 Constant-Sized Updates 819; 19.3.2 Changing the Learning Rate Over Time 829; 19.3.3 Decay Schedules 832; 19.4 Updating Strategies 836; 19.4.1 Batch Gradient Descent 837; 19.4.2 Stochastic Gradient Descent (SGD) 841; 19.4.3 Mini-Batch Gradient Descent 844; 19.5 Gradient Descent Variations 846; 19.5.1 Momentum 847; 19.5.2 Nesterov Momentum 856; 19.5.3 Adagrad 862; 19.5.4 Adadelta and RMSprop 864; 19.5.5 Adam 866; 19.6 Choosing An Optimizer 868; References 870

Volume 2

Chapter 20: Deep Learning 872
20.1 Why This Chapter Is Here 874; 20.2 Deep Learning Overview 874; 20.2.1 Tensors 878; 20.3 Input and Output Layers 879; 20.3.1 Input Layer 879; 20.3.2 Output Layer 880; 20.4 Deep Learning Layer Survey 881; 20.4.1 Fully-Connected Layer 882; 20.4.2 Activation Functions 883; 20.4.3 Dropout 884; 20.4.4 Batch Normalization 887; 20.4.5 Convolution 890; 20.4.6 Pooling Layers 892; 20.4.7 Recurrent Layers 894; 20.4.8 Other Utility Layers 896; 20.5 Layer and Symbol Summary 898; 20.6 Some Examples 899; 20.7 Building A Deep Learner 910; 20.7.1 Getting Started 912; 20.8 Interpreting Results 913; 20.8.1 Satisfactory Explainability 920; References 923; Image credits 925

Chapter 21: Convolutional Neural Networks 927
21.1 Why This Chapter Is Here 930; 21.2 Introduction 931; 21.2.1 The Two Meanings of "Depth" 932; 21.2.2 Sum of Scaled Values 933; 21.2.3 Weight Sharing 938; 21.2.4 Local Receptive Field 940; 21.2.5 The Kernel 943; 21.3 Convolution 944; 21.3.1 Filters 948; 21.3.2 A Fly's-Eye View 953; 21.3.3 Hierarchies of Filters 955; 21.3.4 Padding 963; 21.3.5 Stride 966; 21.4 High-Dimensional Convolution 971; 21.4.1 Filters with Multiple Channels 975; 21.4.2 Striding for Hierarchies 977; 21.5 1D Convolution 979; 21.6 1x1 Convolutions 980; 21.7 A Convolution Layer 983; 21.7.1 Initializing the Filter Weights 984; 21.8 Transposed Convolution 985; 21.9 An Example Convnet 991; 21.9.1 VGG16 996; 21.9.2 Looking at the Filters, Part 1 1001; 21.9.3 Looking at the Filters, Part 2 1008; 21.10 Adversaries 1012; References 1017; Image credits 1022

Chapter 22: Recurrent Neural Networks 1023
22.1 Why This Chapter Is Here 1025; 22.2 Introduction 1027; 22.3 State 1030; 22.3.1 Using State 1032; 22.4 Structure of an RNN Cell 1037; 22.4.1 A Cell with More State 1042; 22.4.2 Interpreting the State Values 1045; 22.5 Organizing Inputs 1046; 22.6 Training an RNN 1051; 22.7 LSTM and GRU 1054; 22.7.1 Gates 1055; 22.7.2 LSTM 1060; 22.8 RNN Structures 1066; 22.8.1 Single or Many Inputs and Outputs 1066; 22.8.2 Deep RNN 1070; 22.8.3 Bidirectional RNN 1072; 22.8.4 Deep Bidirectional RNN 1074; 22.9 An Example 1076; References 1084

Chapter 23: Keras Part 1 1090
23.1 Why This Chapter Is Here 1093; 23.1.1 The Structure of This Chapter 1094; 23.1.2 Notebooks 1094; 23.1.3 Python Warnings 1094; 23.2 Libraries and Debugging 1095; 23.2.1 Versions and Programming Style 1097; 23.2.2 Python Programming and Debugging 1098; 23.3 Overview 1100; 23.3.1 What's a Model? 1101; 23.3.2 Tensors and Arrays 1102; 23.3.3 Setting Up Keras 1102; 23.3.4 Shapes of Tensors Holding Images 1104; 23.3.5 GPUs and Other Accelerators 1108; 23.4 Getting Started 1109; 23.4.1 Hello, World 1110; 23.5 Preparing the Data 1114; 23.5.1 Reshaping 1115; 23.5.2 Loading the Data 1126; 23.5.3 Looking at the Data 1129; 23.5.4 Train-test Splitting 1136; 23.5.5 Fixing the Data Type 1138; 23.5.6 Normalizing the Data 1139; 23.5.7 Fixing the Labels 1142; 23.5.8 Pre-Processing All in One Place 1148; 23.6 Making the Model 1150; 23.6.1 Turning Grids into Lists 1152; 23.6.2 Creating the Model 1154; 23.6.3 Compiling the Model 1163; 23.6.4 Model Creation Summary 1167; 23.7 Training The Model 1169; 23.8 Training and Using A Model 1172; 23.8.1 Looking at the Output 1174; 23.8.2 Prediction 1180; 23.8.3 Analysis of Training History 1186; 23.9 Saving and Loading 1190; 23.9.1 Saving Everything in One File 1190; 23.9.2 Saving Just the Weights 1191; 23.9.3 Saving Just the Architecture 1192; 23.9.4 Using Pre-Trained Models 1193; 23.9.5 Saving the Pre-Processing Steps 1194; 23.10 Callbacks 1195; 23.10.1 Checkpoints 1196; 23.10.2 Learning Rate 1200; 23.10.3 Early Stopping 1201; References 1205; Image Credits 1208

Chapter 24: Keras Part 2 1209
24.1 Why This Chapter Is Here 1212; 24.2 Improving the Model 1212; 24.2.1 Counting Up Hyperparameters 1213; 24.2.2 Changing One Hyperparameter 1214; 24.2.3 Other Ways to Improve 1218; 24.2.4 Adding Another Dense Layer 1219; 24.2.5 Less Is More 1221; 24.2.6 Adding Dropout 1224; 24.2.7 Observations 1230; 24.3 Using Scikit-Learn 1231; 24.3.1 Keras Wrappers 1232; 24.3.2 Cross-Validation 1237; 24.3.3 Cross-Validation with Normalization 1243; 24.3.4 Hyperparameter Searching 1247; 24.4 Convolution Networks 1259; 24.4.1 Utility Layers 1260; 24.4.2 Preparing the Data for A CNN 1263; 24.4.3 Convolution Layers 1268; 24.4.4 Using Convolution for MNIST 1276; 24.4.5 Patterns 1290; 24.4.6 Image Data Augmentation 1293; 24.4.7 Synthetic Data 1298; 24.4.8 Parameter Searching for Convnets 1300; 24.5 RNNs 1301; 24.5.1 Generating Sequence Data 1302; 24.5.2 RNN Data Preparation 1306; 24.5.3 Building and Training an RNN 1314; 24.5.4 Analyzing RNN Performance 1320; 24.5.5 A More Complex Dataset 1330; 24.5.6 Deep RNNs 1334; 24.5.7 The Value of More Data 1338; 24.5.8 Returning Sequences 1343; 24.5.9 Stateful RNNs 1349; 24.5.10 Time-Distributed Layers 1352; 24.5.11 Generating Text 1357; 24.6 The Functional API 1366; 24.6.1 Input Layers 1370; 24.6.2 Making A Functional Model 1371; References 1378; Image Credits 1379

Chapter 25: Autoencoders 1380
25.1 Why This Chapter Is Here 1382; 25.2 Introduction 1383; 25.2.1 Lossless and Lossy Encoding 1384; 25.2.2 Domain Encoding 1386; 25.2.3 Blending Representations 1388; 25.3 The Simplest Autoencoder 1393; 25.4 A Better Autoencoder 1400; 25.5 Exploring the Autoencoder 1405; 25.5.1 A Closer Look at the Latent Variables 1405; 25.5.2 The Parameter Space 1409; 25.5.3 Blending Latent Variables 1415; 25.5.4 Predicting from Novel Input 1418; 25.6 Discussion 1419; 25.7 Convolutional Autoencoders 1420; 25.7.1 Blending Latent Variables 1424; 25.7.2 Predicting from Novel Input 1426; 25.8 Denoising 1427; 25.9 Variational Autoencoders 1430; 25.9.1 Distribution of Latent Variables 1432; 25.9.2 Variational Autoencoder Structure 1433; 25.10 Exploring the VAE 1442; References 1455; Image credits 1457

Chapter 26: Reinforcement Learning 1458
26.1 Why This Chapter Is Here 1461; 26.2 Goals 1462; 26.2.1 Learning A New Game 1463; 26.3 The Structure of RL 1469; 26.3.1 Step 1: The Agent Selects an Action 1471; 26.3.2 Step 2: The Environment Responds 1473; 26.3.3 Step 3: The Agent Updates Itself 1475; 26.3.4 Variations on The Simple Version 1476; 26.3.5 Back to the Big Picture 1478; 26.3.6 Saving Experience 1480; 26.3.7 Rewards 1481; 26.4 Flippers 1490; 26.5 L-learning 1492; 26.5.1 Handling Unpredictability 1505; 26.6 Q-learning 1509; 26.6.1 Q-values and Updates 1510; 26.6.2 Q-Learning Policy 1514; 26.6.3 Putting It All Together 1518; 26.6.4 The Elephant in the Room 1519; 26.6.5 Q-learning in Action 1521; 26.7 SARSA 1532; 26.7.1 SARSA in Action 1535; 26.7.2 Comparing Q-learning and SARSA 1543; 26.8 The Big Picture 1548; 26.9 Experience Replay 1550; 26.10 Two Applications 1551; References 1554

Chapter 27: Generative Adversarial Networks 1558
27.1 Why This Chapter Is Here 1560; 27.2 A Metaphor: Forging Money 1562; 27.2.1 Learning from Experience 1566; 27.2.2 Forging with Neural Networks 1569; 27.2.3 A Learning Round 1572; 27.3 Why Antagonistic? 1574; 27.4 Implementing GANs 1575; 27.4.1 The Discriminator 1576; 27.4.2 The Generator 1577; 27.4.3 Training the GAN 1578; 27.4.4 Playing the Game 1581; 27.5 GANs in Action 1582; 27.6 DCGANs 1591; 27.6.1 Rules of Thumb 1595; 27.7 Challenges 1596; 27.7.1 Using Big Samples 1597; 27.7.2 Modal Collapse 1598; References 1600

Chapter 28: Creative Applications 1603
28.1 Why This Chapter Is Here 1605; 28.2 Visualizing Filters 1605; 28.2.1 Picking A Network 1605; 28.2.2 Visualizing One Filter 1607; 28.2.3 Visualizing One Layer 1610; 28.3 Deep Dreaming 1613; 28.4 Neural Style Transfer 1620; 28.4.1 Capturing Style in a Matrix 1621; 28.4.2 The Big Picture 1623; 28.4.3 Content Loss 1624; 28.4.4 Style Loss 1628; 28.4.5 Performing Style Transfer 1633; 28.4.6 Discussion 1640; 28.5 Generating More of This Book 1642; References 1644; Image Credits 1646

Chapter 29: Datasets 1648
29.1 Public Datasets 1650; 29.2 MNIST and Fashion-MNIST 1651; 29.3 Built-in Library Datasets 1652; 29.3.1 scikit-learn 1652; 29.3.2 Keras 1653; 29.4 Curated Dataset Collections 1654; 29.5 Some Newer Datasets 1655

Chapter 30: Glossary 1658
About The Glossary 1660; 0-9 1660; Greek Letters 1661; A 1662; B 1665; C 1670; D 1678; E 1685; F 1688; G 1694; H 1697; I 1699; J 1703; K 1703; L 1704; M 1708; N 1713; O 1717; P 1719; Q 1725; R 1725; S 1729; T 1738; U 1741; V 1743; W 1745; X 1746; Z 1746

Preface

Welcome! A few quick words of introduction to the book, how to get the notebooks and figures, and thanks to the people who helped me.

What You'll Get from This Book

Hello! If you're interested in deep learning (DL) and machine learning (ML), then there's good stuff for you in this book. My goal in this book is to give you the broad skills to be an effective practitioner of machine learning and deep learning. When you've read this book, you will be able to:

• Design and train your own deep networks.
• Use your networks to understand your data, or make new data.
• Assign descriptive categories to text, images, and other types of data.
• Predict the next value for a sequence of data.
• Investigate the structure of your data.
• Process your data for maximum efficiency.
• Use any programming language and DL library you like.
• Understand new papers and ideas, and put them into practice.
• Enjoy talking about deep learning with other people.

We'll take a serious but friendly approach, supported by tons of illustrations. And we'll do it all without any code, and without any math beyond multiplication. If that sounds good to you, welcome aboard!

Who This Book Is For

This book is designed for people who
want to use machine learning and deep learning in their own work.
This includes programmers, artists, engineers, scientists, executives,
musicians, doctors, and anyone else who wants to work with large
amounts of information to extract meaning from it, or generate new
data. Many of the tools of machine learning, and deep learning in
particular, are embodied in multiple free, open-source libraries that
anyone can immediately download and use. Even though these tools
are free and easy to install, they still require significant technical
knowledge to use them properly. It's easy to ask the computer to do
something nonsensical, and it will happily do it, giving us back more
nonsense as output. This kind of thing happens all the time. Though
machine learning and deep learning libraries are powerful, they're
not yet user friendly. Choosing the right algorithms, and then
applying them properly, still requires a stream of technically
informed decisions. When things often don't go as planned, we need
to use our knowledge of what's going on inside the system in order
to fix it. There are multiple approaches to learning and mastering
this essential information, depending on how you like to learn. Some
people like hardcore, detailed algorithm analysis, supported by
extensive mathematics. If that's how you like to learn, there are
great books out there that offer this style of presentation [Bishop06]
[Goodfellow17]. This approach requires intensive effort, but pays off
with a thorough understanding of how and why the machinery
works. If you start this way, then you have to put in another chunk
of work to translate that theoretical knowledge into contemporary
practice.

At the other extreme, some people just want to know how
to do some particular task. There are great books that take this
cookbook approach for various machine-learning libraries [Chollet17]
[Muller-Guido16] [Raschka15] [VanderPlas16]. This approach is easier
than the mathematically intensive route, but you can feel like you're
missing the structural information that explains why things work as
they do. Without that information, and its vocabulary, it can be hard
to work out why something that you think ought to work doesn't
work, or why something doesn't work as well as you thought it
should. It can also be challenging to read the literature describing
new ideas and results, because those discussions usually assume a
shared body of underlying knowledge that an approach based on a
single library or language doesn't provide. This book takes a middle
road. My purpose is practical: to give you the tools to practice deep
learning with confidence. I want you to make wise choices as you do
your work, and be able to follow the flood of exciting new ideas
appearing almost every day. My goal here is to cover the
fundamentals just deeply enough to give you a broad base of
support. I want you to have enough background not just for the
topics in this book, but also the materials you're likely to need to
consult and read as you actually do deep learning work. This is not a
book about programming. Programming is important, but it
inevitably involves all kinds of details that are irrelevant to our larger
subject. And programming examples lock us into one library, or one
language. While such details are necessary to building final systems,
they can be distracting when we're trying to focus on the big ideas.
Rather than get waylaid by discussions of loops and indices and data
structures, we discuss everything here in a language and library
independent way. Once you have the ideas firmly in place, reading
the documentation for any library will be a straightforward affair. We
do put our feet on the ground in Chapters 15, 23, and 24, when we
discuss the scikit-learn library for machine learning, and the Keras
library for deep learning. These libraries are both Python based. In

those chapters we dive into the details of those Python


libraries, and include plenty of example code. Even if you're not into
Python, these programs will give you a sense for typical workflows
and program structures, which can help show how to attack a new
problem. The code in those programming chapters is available in
Python notebooks. These are for use with the browser-based Jupyter
programming environment [Jupyter16]. Alternatively, you can use a
more classical Python development environment, such as PyCharm
[JetBrains17]. Most of the other chapters also have supporting,
optional Python notebooks. These give the code for every computer-
generated figure in the book, often using the techniques discussed
in that chapter. Because we're not really focusing on Python and
programming (except for the chapters mentioned above), these
notebooks are meant as a "behind the scenes" look, and are only
lightly commented. Machine learning, deep learning, and big data
are having an unexpectedly rapid and profound influence on
societies around the world. What this means for people and cultures
is a complicated and important subject. Some interesting books and
articles tackle the topic head-on, often coming to subtle mixtures of
positive and negative conclusions [Agüera y Arcas 17] [Barrat15]
[Domingos15] [Kaplan16].

Almost No Math

Lots of smart people are not fans of


complicated equations. If that's you, then you're right at home!
There's just about no math in this book. If you're comfortable with
multiplication, you're set, because that's as mathematical as we get.
Many of the algorithms we'll discuss are based on rich sources of
theory and are the result of careful analysis and development. It's
important to know that stuff if you're modifying the algorithm for
some new purpose, or writing your own implementation. But in
practice, just about everyone uses highly optimized implementations
written by experts, available in free and open-source libraries. Our
goals are to understand the principles of these techniques, how to
apply them properly, and how to interpret the results. None of that
requires us to get into the mathematical structure that's under the
hood. If you love math, or you want to see the theory, follow the
references in each chapter. Much of this material is elegant and
intellectually stimulating, and provides details that I have
deliberately omitted from this book. But if math isn't your thing,
there's no need to get into it.

Lots of Figures

Some ideas are more
clearly communicated with pictures than with words. And even when
words do the job, a picture can help cement the ideas. So this book
is profusely illustrated with original figures. All of the figures in this
book are available for free download (see below).

Downloads

You can download the Jupyter/Python notebooks


for this book, all the figures, and other files related to this book, all
for free.

All the Notebooks

All of the Jupyter/Python notebooks for
this book are available on GitHub. The notebooks for Chapter 15
(scikit-learn) and Chapters 23 and 24 (Keras) contain all the code
that's presented in those chapters. The other notebooks are
available as a kind of "behind the scenes" look at how the book's
figures were made. They're lightly documented, and meant to serve
more as references than tutorials. The notebooks are released under
the MIT license, which basically means that you're free to use them
for any purpose. There are no promises of any kind that the code is
free of bugs, that it will run properly, that it won't crash, and so on.
Feel free to grab the code and adapt it as you see fit, though as the
license says, keep the copyright notice around (it's in the file named
simply LICENSE).
https://github.com/blueberrymusic/DeepLearningBookCode-Volume1

https://github.com/blueberrymusic/DeepLearningBookCode-Volume2

All the Figures

All of the figures in this book are available
on GitHub as high-resolution PNG files. You're free to use them in
classes, talks, lectures, reports, papers, even other books. Like the
code, the figures are provided under the MIT license, so you can use
them as you like as long as you keep the copyright notice around.
You don't have to credit me as their creator when you use these
figures, but I'd appreciate it if you would.

The filenames match the figure numbers in the book, so


they're easy to find. When you're looking for something visually, it
may be helpful to look at the thumbnail pages. These hold 20
images each:
https://github.com/blueberrymusic/DeepLearningBookFigures-Thumbnails

The figures themselves are grouped into the two volumes:

https://github.com/blueberrymusic/DeepLearningBookFigures-Volume1

https://github.com/blueberrymusic/DeepLearningBookFigures-Volume2

Resources

The resources directory contains other files, such as a template for the deep learning icons we use later in the book.

https://github.com/blueberrymusic/DeepLearningBook-Resources

Errata

Despite my best efforts, no book of this size is going to be free of errors. If you spot something that seems wrong, please let me know at [email protected]. I'll keep a list of errata on the book's website at https://dlbasics.com.

Two Volumes

This ended up as a large book, so I've organized it into two volumes of roughly equal size. Because the book is cumulative, the second volume picks up where the first leaves off. If you're reading the second volume now, you should have already read the first volume, or feel confident that you understand the material presented there.

Thank You!

Authors like to say that nobody writes a book
alone. We say that because it's true. For their consistent and
enthusiastic support of this project, and helping me feel good about
it all the way through, I am enormously grateful to Eric Braun, Eric
Haines, Steven Drucker, and Tom Reike. Thank you for your
friendship and encouragement. Huge thanks are due to my
reviewers, whose generous and insightful comments greatly
improved this book: Adam Finkelstein, Alex Colburn, Alexander
Keller, Alyn Rockwood, Angelo Pesce, Barbara Mones, Brian Wyvill,
Craig Kaplan, Doug Roble, Eric Braun, Eric Haines, Greg Turk, Jeff
Hultquist, Jessica Hodgins, Kristi Morton, Lesley Istead, Luis
Avarado, Matt Pharr, Mike Tyka, Morgan McGuire, Paul Beardsley,
Paul Strauss, Peter Shirley, Philipp Slusallek, Serban Porumbescu,
Stefanus Du Toit, Steven Drucker, Wenhao Yu, and Zackory Erickson.
Special thanks to super reviewers Alexander Keller, Eric Haines,
Jessica Hodgins, and Luis Avarado, who read all or most of the
manuscript and offered terrific feedback on both presentation and
structure. Thanks to Morgan McGuire for Markdeep, which enabled
me to focus on what I was saying, rather than the mechanics of how
to format it. It made writing this book a remarkably smooth and fluid
process. Thanks to Todd Szymanski for insightful advice on the
design and layout of the book's contents and covers, and for
catching layout errors. Thanks to early readers who caught typos
and other problems: Christian Forfang, David Pol, Eric Haines, Gopi
Meenakshisundaram, Kostya Smolenskiy, Mauricio Vives, Mike Wong,
and Mrinal Mohit. All of these people improved the book, but the
final decisions were my own. Any problems that remain are my
responsibility.
References

This section appears in every chapter. It contains
references to all the documents that are referred to in the body of
the chapter. There may also be other useful papers, websites,
documentation, blogs, and other resources. Whenever possible, I've
preferred to use references that are available online, so you can
immediately access them using the provided link. The exceptions are
usually books, but occasionally I'll include an important online
reference even if it's behind a paywall. [Agiiera y Areas 17] Blaise
Agiiera y Areas, Margaret Mitchell and Alexander Todorov,
"Physiognomy's New Clothes", Medium, 2017.
https://medium.eom/@blaisea/ physiognomys-new-clothes-
f2d4b59fdd6a [Barratis] James Barrat, "Our Final Invention: Artificial
Intelligence and the End of the Human Era", St. Martin's Griffin,
2015. [Bishopo6] Christopher M. Bishop, "Pattern Recognition and
Machine Learning", Springer-Verlag, pp. 149-152, 2006. [Cholleti7]
Francois Chollet, "Deep Learning with Python", Manning Publications,
2017. [Domingosi5] Pedro Domingos, "The Master Algorithm", Basic
Books, 2015. [Goodfellowi7] Ian Goodfellow, Yoshua Bengio, Aaron
Courville, "Deep Learning", MIT Press, 2017.
http://www.deeplearning- book.org/ [JetBrainsi7] Jet Brains,
"Pycharm Community Edition IDE", 2017.
https://www.jetbrains.com/pycharm/ [Jupyten6] The Jupyter team,
2016. http://jupyter.org/ X

[Kaplan16] Jerry Kaplan, "Artificial Intelligence: What Everyone Needs to Know", Oxford University Press, 2016.

[Muller-Guido16] Andreas C. Muller and Sarah Guido, "Introduction to Machine Learning with Python", O'Reilly Press, 2016.

[Raschka15] Sebastian Raschka, "Python Machine Learning", Packt Publishing, 2015.

[VanderPlas16] Jake VanderPlas, "Python Data Science Handbook", O'Reilly Media, 2016.

Chapter 1: An Introduction to Machine Learning and Deep Learning

A quick overview of the ideas, language, and techniques that we'll be using throughout the book.
Contents
1.1 Why This Chapter Is Here 3; 1.1.1 Extracting Meaning from Data 4; 1.1.2 Expert Systems 6; 1.2 Learning from Labeled Data 9; 1.2.1 A Learning Strategy 10; 1.2.2 A Computerized Learning Strategy 12; 1.2.3 Generalization 16; 1.2.4 A Closer Look at Learning 18; 1.3 Supervised Learning 21; 1.3.1 Classification 21; 1.3.2 Regression 22; 1.4 Unsupervised Learning 25; 1.4.1 Clustering 25; 1.4.2 Noise Reduction 26; 1.4.3 Dimensionality Reduction 28; 1.5 Generators 32; 1.6 Reinforcement Learning 34; 1.7 Deep Learning 37; 1.8 What's Coming Next 43; References 44; Image credits 45



1.1 Why This Chapter Is Here

This chapter is to help us get familiar
with the big ideas and basic terminology of machine learning. The
phrase machine learning describes a growing body of techniques
that all have one goal: discover meaningful information from data.
Here, "data" refers to anything that can be recorded and measured.
Data can be raw numbers (like stock prices on successive days, or
the mass of different planets, or the heights of people visiting a
county fair), but it can also be sounds (the words someone speaks
into their cell phone), pictures (photographs of flowers or cats),
words (the text of a newspaper article or a novel), or anything else
that we want to investigate. "Meaningful information" is whatever
we can extract from the data that will be useful to us in some way.
We get to decide what's meaningful to us, and then we design an
algorithm to find as much of it as possible from our data. The phrase
"machine learning" describes a wide diversity of algorithms and
techniques. It would be nice to nail down a specific definition for the
phrase, but it's used by so many people in so many different ways
that it's best to consider it the name for a big, expanding collection
of algorithms and principles that analyze vast quantities of training
data in order to extract meaning from it. More recently, the phrase
deep learning was coined to refer to approaches to machine learning
that use specialized layers of computation, stacked up one after the
next. This makes a "deep" structure, like a stack of pancakes. Since
"deep learning" refers to the nature of the system we create, rather
than any particular algorithm, it really refers to this particular style
or approach to machine learning. It's an approach that has paid off
enormously well in recent years.



Let's look at some example applications that use machine learning to
extract meaning from data.

1.1.1 Extracting Meaning from Data

The
post office needs to sort an enormous number of letters and
packages every day, based on hand-written zip codes. They have
taught computers to read those codes and route the mail
automatically, as in the left illustration in Figure 1.1.

Figure 1.1: Extracting meaning from data sets. Left: Getting a zip code from an envelope. Middle: Reading the numbers and letters on a check. Right: Recognizing faces from photos.

Banks process vast
quantities of hand-written checks. A valid check requires that the
numerical amount hand-written into the total field (e.g., "$25.10")
matches the written-out amount on the text line (e.g., "twenty-five
dollars and ten cents"). Computers can read both the numbers and
words and confirm a match, as in the middle of Figure 1.1. Social
media sites want to identify the people in their member's
photographs. This means not only detecting if there are any faces in
a given photo, but identifying where those faces are located, and
then matching up each face with previously seen faces. This is made
even more difficult when we realize that virtually every photo of a
person is unique: the lighting, angle, expression, clothing, and many
other qualities will



be different from any previous photo. They'd like to be able to take
any photo of any person, and correctly identify who it is, as in the
right of Figure 1.1. Digital assistant providers listen to what people
say into their gadgets so that they can respond intelligently. The
signal from the microphone is a series of numbers that describe the
sound pressure striking the microphone's membrane. The providers
want to analyze those numbers in order to understand the sounds
that caused them, the words that these sounds were part of, the
sentences that the words were part of, and ultimately the meaning
of those sentences, as suggested by the left illustration in Figure 1.2.

Figure 1.2: Extracting
meaning from data. Left: Turning a recording into sounds, then
words, and ultimately a complete utterance. Middle: Finding one
unusual event in a particle smasher's output full of similar-looking
trails. Right: Predicting the population of the northern resident Orca
whale population off of Canada's west coast [Towers15]. Scientists
create vast amounts of data from flying drones, high-energy physics
experiments, and observations of deep space. From these
overwhelming floods of data they often need to pick out the few
instances of events that are almost like all the others, but are slightly
different. Looking through all the data manually would be an
effectively impossible task even for huge numbers of highly-trained
specialists. It's much better to automate this process with computers
that can comb the data exhaustively, never missing any details, as in
the middle of Figure 1.2.



Conservationists track the populations of species over time to see
how well they're doing. If the population declines over a long period
of time, they may need to take action to intervene. If the population
is stable, or growing, then they may be able to rest more easily.
Predicting the next value of a sequence of values is something that
we can train a computer to do well. The recorded annual size of a
population of Orca whales off the Canadian coast, along with a
possible predicted value, is shown in the right of Figure 1.2 (adapted
from [Towers15]). These six examples illustrate applications that
many of us are already familiar with, but there are many more.
Because of their ability to extract meaningful information quickly,
machine learning algorithms are finding their way into an expanding
range of fields. The common threads here are the sheer volume of
the work involved, and its painstaking detail. We might have millions
of pieces of data to examine, and we're trying to extract some
meaning from every one of them. Humans get tired, bored, and
distracted, but computers just plow on steadily and reliably.

1.1.2 Expert Systems

A popular, early approach to finding the meaning
that's hiding inside of data involved creating expert systems. The
essence of the idea was that we would study what human experts
know, what they do, and how they do it, and automate that. In
essence, we'd make a computer system that could mimic the human
experts it was based on. This often meant creating a rule-based
system, where we'd amass a large number of rules for the computer
to follow in order to imitate the human expert. For instance, if we're
trying to recognize digits in zip codes, we might have a rule that
says that 7's are shapes that have a mostly horizontal line near the
top of the figure, and then a mostly diagonal line that starts at the
right edge of the horizontal line and moves left and down, as in
Figure 1.3.



Figure 1.3: Devising a set of rules to recognize a hand-written
digit 7. Top: A typical 7 we'd like to identify. Bottom: The three rules
that make up a 7. A shape would be classified as a 7 if it satisfies all
three rules. We'd have similar rules for every digit. This might work
well enough until we get a digit like Figure 1.4.

Figure 1.4: This 7 is
a valid way to write a 7, but it would not be recognized by the rules
of Figure 1.3 because of the extra line. We hadn't thought about
how some people put a bar through the middle of their 7's. So now
we add another rule for that special case. This process of hand-
crafting the rules to understand data is sometimes called feature
engineering (the term is also used to describe when we use the
computer to find these features for us [VanderPlas16]).
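As a toy sketch of what hand-written rules like these can look like in code (the stroke representation, thresholds, and helper tests below are invented for illustration; they are not taken from any real zip-code reader):

```python
def looks_like_seven(strokes):
    """Hand-crafted rules for a 7: a roughly horizontal stroke near the top,
    a roughly diagonal stroke heading down and to the left, and the two
    meeting at the upper right. Each stroke is ((x0, y0), (x1, y1)), with
    y measured downward from the top of the image."""
    def is_horizontal(s):
        (x0, y0), (x1, y1) = s
        return abs(y1 - y0) < 0.2 * abs(x1 - x0)

    def is_diagonal(s):
        (x0, y0), (x1, y1) = s
        return x1 < x0 and y1 > y0          # moves left and down

    def meet_at_upper_right(top, diag):
        (tx, ty), (dx, dy) = top[1], diag[0]
        return abs(tx - dx) < 0.1 and abs(ty - dy) < 0.1

    tops = [s for s in strokes if is_horizontal(s) and s[0][1] < 0.3]
    diagonals = [s for s in strokes if is_diagonal(s)]
    return any(meet_at_upper_right(t, d) for t in tops for d in diagonals)

# A clean 7: a top bar, then a diagonal that starts at the bar's right end.
print(looks_like_seven([((0.1, 0.1), (0.9, 0.1)), ((0.9, 0.1), (0.3, 0.9))]))  # True
```

Every new writing style means going back and adding or adjusting rules by hand, which is exactly the maintenance burden described here.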



The phrase describes our desire to engineer (or design) all of the
features (or qualities) that a human expert uses and combines to do
their job. In general, it's a very tough job. As we saw, it's easy to
overlook one rule, or even lots of them. Imagine trying to find a set
of rules that summarize how a radiologist determines whether a
smudge on an X-ray image is benign or not, or how an air-traffic
controller handles heavily scheduled air traffic, or how someone
drives a car safely in extreme weather conditions. Rule-based expert
systems are able to manage some jobs, but the difficulty of manually
crafting the right set of rules, and making sure they work properly
across a wide variety of data, has spelled their doom as a general
solution. Articulating every step in a complicated process is
extremely difficult, and when we have to factor in human judgment
based on experience and hunches, it becomes nearly impossible for
any but the simplest scenarios. The beauty of machine learning
systems is that (on a conceptual level) they learn a dataset's
relevant characteristics automatically. So we don't have to tell an
algorithm how to recognize a 2 or a 7, because the system figures
that out for itself. But to do that well, the system often needs a lot of
data. Enormous amounts of data. That's a big reason why machine
learning has exploded in popularity and applications in the last few
years. The flood of raw data provided by the Internet has let these
tools extract a lot of meaning from a lot of data. Online companies
are able to make use of every interaction with every customer to
accumulate more data, which they can then turn around and use as
input to their machine learning algorithms, providing them with even
more information about their customers.



1.2 Learning from Labeled Data

There are lots of machine learning
algorithms, and we'll look at many of them in this book. Many are
conceptually straightforward (though their underlying math or
programming could be complex). For instance, suppose we want to
find the best straight line through a bunch of data points, as in
Figure 1.5.

Figure 1.5:
Given a set of data points (in blue), we can imagine a
straightforward algorithm that computes the best straight line (in
red) through those points. Conceptually, we can imagine an
algorithm that represents any straight line with just a few numbers.
It then uses some formulas to compute those numbers, given the
numbers that represent the input data points. This is a familiar kind
of algorithm, which uses a carefully thought-out analysis to find the
best way to solve a problem, and is then implemented in a program
that performs that analysis. This is a strategy used by many machine
learning algorithms.
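As a concrete illustrative sketch (the data values here are made up, and this is not the book's own code), NumPy's least-squares fit computes those few numbers, the slope and intercept, directly from the points:

```python
import numpy as np

# A handful of made-up (x, y) points that roughly follow a straight line.
x = np.array([-3.0, -1.0, 0.0, 2.0, 4.0, 6.0])
y = np.array([-5.1, -1.8, 0.3, 3.9, 8.2, 11.8])

# A straight line is described by just two numbers: slope and intercept.
# polyfit finds them with a closed-form least-squares calculation.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"best line: y = {slope:.2f} * x + {intercept:.2f}")
```

By contrast, the strategy used by many deep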
learning algorithms is less familiar. It involves slowly learning from
examples, a little bit at a time, over and over. Each time the program
sees a new piece of data to be learned from, it improves its own
parameters, ultimately finding a set of values that will do a good job
of computing what we want. While



we're still carrying out an algorithm, it's much more open-ended
than the one that fits a straight line. The idea here is that we don't
know how to directly calculate the right answer, so we build a
system that can figure out how to do that itself. Our analysis and
programming is to create an algorithm that can work out its own
answers, rather than implementing a known process that directly
yields an answer. If that sounds pretty wild, it is. Programs that can
find their own answers in this way are at the heart of the recent
enormous success of deep learning algorithms. In the next few
sections, we'll look more closely at this technique to get a feeling for
it, since it's probably less familiar than the more traditional machine
learning algorithms. Ultimately we'd like to be able to use this for
tasks like showing the system a photograph, and getting back the
names of everyone in the picture. That's a tall order, so let's start
with something much simpler and look at learning a few facts.

1.2.1 A Learning Strategy

Let's consider an awful way to teach a new
subject to children, summarized in Figure 1.6. This is not how most
children are actually taught, but it is one of the ways that we teach
computers.



Figure 1.6: A truly terrible way to try to teach people.
First, recite a list of facts. Then test each student on those facts, and
on other facts they haven't been exposed to, but which we believe
could be derived if the first set was well-enough understood. If the
student gets great scores on the tests (particularly the second one),
he or she graduates. Otherwise they go through the loop again,
starting with a repeated recitation of the very same facts. In this
scenario (hopefully imaginary), a teacher stands in front of a class
and recites a series of facts that the students are supposed to
memorize. Every Friday afternoon they are given two tests. The first
test grills them on those specific facts, to test their retention. The
second test, given immediately after the first, asks new questions
the students have never seen before, in order to test their overall
understanding of the material. Of course, it's very unlikely that
anyone would "understand" anything if they've only been given a list
of facts, which is one reason this approach would be terrible. If a
student does well on the second test, the teacher declares that
they've learned the subject matter and they immediately graduate. If
a given student doesn't do well on the second test, they repeat the
same process again the next week: the teacher recites the very
same facts as before in exactly the same way, then gives the
students the same first test to measure their retention, and a new
second test to



measure their understanding, or ability to generalize. Over and over
every student repeats this process until they do well enough on the
second test to graduate. This would be a terrible way to teach
children, but it turns out to be a great way to teach computers. In
this book we'll see many other approaches to teaching computers,
but let's stick with this one for now, and look at it more closely. We'll
see that, unlike most people, each time we expose the computer to
the identical information, it learns a little bit more. 1.2.2 A
Computerized Learning Strategy We start by collecting the facts
we're going to teach. We do this by collecting as much data as we
can get. Each piece of observed data (say, the weather at a given
moment) is called a sample, and the names of the measurements
that make it up (the temperature, wind speed, humidity, etc.) are
called its features [Bishop06]. Each named measurement, or feature,
has an associated value, typically stored as a number. To prepare our
data for the computer, we hand each sample (that is, each piece of
data, with a value for each feature) to a human expert, who
examines its features and provides a label for that sample. For
instance, if our sample is a photo, the label might be the name of
the person in the photo, or the type of animal it shows, or whether
or not the traffic in the photo is flowing smoothly or is stuck in
gridlock. Let's use weather measurements on a mountain for our
example. The expert's opinion, using a score from 0 to 100, tells
how confident the expert is that the day's weather would make for
good hiking. The idea is shown in Figure 1.7.
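As an illustrative sketch (the feature names, values, and expert scores are all invented for this example), a few labeled weather samples might be stored like this:

```python
# Each sample is a list of feature values; each sample gets an expert label.
feature_names = ["temperature", "wind_speed", "humidity"]

samples = [
    [22.0,  5.0, 40.0],   # a mild, calm, fairly dry day
    [35.0, 30.0, 85.0],   # hot, windy, and humid
    [10.0, 12.0, 55.0],   # cool, with a light breeze
]

# The expert's confidence, from 0 to 100, that each day is good for hiking.
labels = [90, 15, 70]
```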



Figure 1.7: To label a dataset, we
start with a list of samples, or data items. Each sample is made up
of a list of features that describe it. We give the dataset to a human
expert, who examines the features of each sample one by one, and
assigns a label for that sample. We'll usually take some of these
labeled samples and set them aside for a moment. We'll return to
them soon. Once we have our labeled data, we give it to our
computer, and we tell it to find a way to come up with the right label
for each input. We do not tell it how to do this. Instead, we give it
an algorithm with a large number of parameters it can adjust
(perhaps even millions of them). Different types of learning will use
different algorithms, and much of this book will be devoted to
looking at them and how to use them well. But once we've selected
an algorithm, we can run an input through it to produce an output,
which is the computer's prediction of what it thinks the expert's label
is for that sample. When the computer's prediction matches the
expert's label, we won't change anything. But when the computer
gets it wrong, we ask the computer to modify the internal
parameters for the algorithm it's using, so that it's more likely to
predict the right answer if we show it this piece of data again. The
process is basically trial and error. The computer does its best to
give us the right answer, and when it fails, there's a procedure it can
follow to help it change and improve.



We only check the computer's prediction against the expert's label
once the prediction has been made. When they don't match, we
calculate an error, also called a cost, or loss. This is a number that
tells the algorithm how far off it was. The system uses the current
values of its internal parameters, the expert's prediction (which it
now knows), and its own incorrect prediction, to adjust the
parameters in the algorithm so that it's more likely to predict the
correct label if it sees this sample again. Later on we'll look closely at
how these steps are performed. Figure 1.8 shows the idea.

Figure 1.8: One step of
training, or learning. We split the sample's features and its label.
From the features, the algorithm predicts a label. We compare the
prediction with the real label. If the predicted label matches the label
we want, we don't do a thing. Otherwise, we tell the algorithm to
modify, or update, itself so it's less likely to make this mistake again.
We say that we train the system to learn how to predict label data
by analyzing the samples in the training set and updating its
algorithm in response to incorrect predictions. We'll get into different
choices for the algorithm and update step in detail in later chapters.
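Here is a tiny, self-contained sketch of that trial-and-error loop (the data, the single-parameter model, and the particular update rule are invented for illustration; real systems use far more parameters and subtler updates):

```python
# Learn one parameter w so that w * x comes close to the labels (about 2 * x).
samples = [1.0, 2.0, 3.0, 4.0]
labels  = [2.1, 3.9, 6.2, 8.0]

w = 0.0                 # the internal parameter, starting from a guess
learning_rate = 0.05    # how much to nudge w after each error

for epoch in range(20):                 # show the same data over and over
    for x, y in zip(samples, labels):
        prediction = w * x              # the system's guess for this sample
        error = prediction - y          # positive means we guessed too high
        w -= learning_rate * error * x  # nudge w to reduce this sample's error

print(f"learned w = {w:.2f}")           # ends up close to 2
```

Each pass nudges the parameter a little, which is why repeating the same data, unlike repeating the same lecture to a student, keeps helping.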
For now, it's worth knowing that each algorithm learns by changing
the internal parameters it uses to create its predictions. It can
change them by a lot after each incorrect sample, but then it runs
the risk of changing them so much that it makes other predictions
worse. It could change them by a small amount, but that could
cause learning to run slower than it otherwise might. Finding



the right trade-off between these extremes is something we have to
find by trial and error for each type of algorithm and each dataset
we're training it on. We call the amount of updating the learning
rate, so a small learning rate is cautious and slow, while a large
learning rate speeds things up but could backfire. As an analogy,
suppose we're out in a desert and using a metal detector to find
a buried metal box full of supplies. We'd wave the metal detector
around, and if we got a response in some direction, we'd move in
that direction. If we're being careful, we'd take a small step so that
we don't either walk right past the box or lose the signal. If we're
being aggressive, we'd take a big step so we can get to the box as
soon as possible. Just as we'd probably start out with big steps but
then take smaller and smaller ones the nearer we got to the box, so
too we usually adjust the learning rate so that the network changes
a lot during the start of training, but we reduce the size of those
changes as we go.
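In code, this schedule is often just a learning rate that shrinks as the passes through the data (the epochs) go by. The starting value and decay factor below are made-up numbers for illustration; sensible choices depend on the algorithm and the data.

# Start with big steps, then take smaller and smaller ones as training proceeds.
initial_learning_rate = 0.5
decay_per_epoch = 0.9    # made-up factor; values closer to 1.0 decay more slowly

for epoch in range(10):
    learning_rate = initial_learning_rate * (decay_per_epoch ** epoch)
    # ... one full training pass would go here, using this learning_rate ...
    print(f"epoch {epoch}: learning rate = {learning_rate:.3f}")

There's an interesting way for the computer to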
get a great score without learning how to do anything but remember
the inputs. To get a perfect score, all the algorithm has to do is
memorize the expert's label for each sample, and then return that
label. In other words, it doesn't need to learn how to compute a
label given the sample, it only needs to look up the right answer in a
table. In our imaginary learning scenario above, this is equivalent to
the students memorizing the answers to the questions in the tests.
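A sketch of this memorizing approach, using a plain Python dictionary as the look-up table (the samples and labels here are invented):

# A "model" that just memorizes the expert's label for every training sample.
lookup_table = {}

train_data = [((22, 5), "Great"), ((5, 30), "Lousy")]    # invented samples
for features, label in train_data:
    lookup_table[features] = label                        # memorize, don't learn

print(lookup_table[(22, 5)])       # perfect on a sample it has seen: "Great"
print(lookup_table.get((25, 4)))   # a new sample it never saw: None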
Sometimes this is a great strategy, and we'll see later that there are
useful algorithms that follow just this approach. But if our goal is to
get the computer to learn something about the data that will enable
it to generalize to new data, this method will usually backfire. The
problem is that we already have the labels that the system is
memorizing, so all of our work in training isn't getting us anything
new. And since the computer has learned nothing about the data
itself, instead just getting its answers from a look-up table, it would
have no idea how to create predictions for new data that it hasn't
already seen and
memorized. The whole point is to get the system to be able to
predict labels for new data it's never seen before, so that we can
confidently use it in a deployed system where new data will be
arriving all the time. If the algorithm does well on the training set,
but poorly on new data, we say it doesn't generalize well. Let's see
how to encourage the computer to generalize, or to learn about the
data so that it can accurately predict the labels for new data.

1.2.3 Generalization

Now we'll return to the labeled data that we set aside
in the last section. We'll evaluate how well the system can generalize
what it's learned by showing it these samples that it's never seen
before. This test set shows us how well the system does on new
data. Let's look at a classifier (or categorizer). This kind of system
assigns a label to each sample that describes which of several
categories, or classes, that sample belongs to. If the input is a song,
the label might be the genre (e.g., rock or classical). If it's a photo of
an animal, the label might be which animal is shown (e.g., a tiger or
an elephant). In our running example, we could break up each day's
anticipated hiking experience into 3 categories: Lousy, Good, and
Great. We'll ask the computer to predict the label for each sample in
the test set (the samples it has not seen before), and then we'll
compare the computer's predictions with the expert's labels, as in
Figure 1.9.

Figure 1.9: The
overall process for evaluating a classifier (also called a categorizer).
In Figure 1.9, we've split the test data into features and labels. The
algorithm assigns, or predicts, a label for each set of features. We
then compare the predictions with the real labels to get a
measurement of accuracy. If it's good enough, we can deploy the
system. If the results aren't good enough, we can go back and train
some more. Note that unlike training, in this process there is no
feedback and no learning. Until we return to explicit training, the
algorithm doesn't change its parameters, regardless of the quality of
its predictions. If the computer's predictions on these brand-new
samples (well, brand-new to the algorithm) are not a sufficiently
close match to the expert's labels, then we return to the training
step in Figure 1.8. We show the computer every sample in the
original training set again, letting it learn along the way again. Note
that these are the same samples, so we're asking the computer to
learn over and over again from the very same data. We usually
shuffle the data first so that each sample arrives in a different order
each time, but we're not giving the algorithm any new information.
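Here is a compact sketch of that whole cycle: shuffle, train on every sample, then measure accuracy on the held-out test set, over and over. The data, the toy two-parameter model, and the update rule are all invented for illustration, and the evaluation step only measures; it never changes the parameters.

import random

# Invented training and test data: ([feature1, feature2], expert_label) pairs.
train_data = [([22, 5], "Great"), ([5, 30], "Lousy"),
              ([30, 2], "Great"), ([8, 25], "Lousy")]
test_data = [([25, 4], "Great"), ([6, 28], "Lousy")]

params, learning_rate = [0.0, 0.0], 0.05

def predict(features, p):
    return "Great" if p[0] * features[0] + p[1] * features[1] > 0 else "Lousy"

for epoch in range(200):                  # show the same data hundreds of times
    random.shuffle(train_data)            # a new order, but no new information
    for features, label in train_data:
        if predict(features, params) != label:
            direction = 1.0 if label == "Great" else -1.0
            params = [p + learning_rate * direction * f
                      for p, f in zip(params, features)]

# Test: predict labels for samples the algorithm has never seen, then compare.
correct = sum(predict(f, params) == label for f, label in test_data)
print("test accuracy:", correct / len(test_data))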

Then we ask the algorithm to predict labels for the test set again. If
the performance isn't good enough, we learn from the original
training set again, and then test again. Around and around we go,
repeating this process often hundreds of times, showing the
computer the same data over and over and over again, letting it
learn just a little more each time. As we noted, this would be a
terrible way to teach a student, but the computer doesn't get bored
or cranky seeing the same data over and over. It just learns what it
can and gets a little bit better each time it gets another shot at
learning from the data.

1.2.4 A Closer Look at Learning

We usually
believe that there are relationships in our data. After all, if it was
purely random we wouldn't be trying to extract information from it.
The hope of our process in the previous section is that by exposing
the computer to the training set over and over, and having it learn a
little bit from every sample every time, the algorithm will eventually
find these relationships between the features in each sample and the
label the expert assigned. Then it can apply that relationship to the
new data in the test set. If it gets mostly correct answers, we say
that it has high accuracy, or low generalization error. But if the
computer consistently fails to improve its predictions for the labels
for the test set, we'll stop training, since we're not making any
progress. At that point we'll typically modify our algorithm in hopes
of getting better performance, and then start the training process
over again. But we're just hoping. There's no guarantee that there's
a successful learning algorithm for every set of data, and no
guarantee that if there is one, we'll find it. The good news is that
even without a mathematical guarantee, in practice we can often
find solutions that generalize very well, sometimes doing even better
than human experts.

One reason the algorithm might fail to learn is because it doesn't
have enough computational resources to find the relationship
between the samples and their labels. We sometimes think of the
computer creating a kind of model of the underlying data. For
instance, if the temperature goes up for the first three hours every
day we measure it, the computer might build a "model" that says
morning temperatures rise. This is a version of the data, in the same
way a small plastic version of a sports car or plane is a "model" of
that larger vehicle. The classifier we saw above is one example of a
model. In the computer, the model is formed by the structure of the
software and the values of the parameters it's using. Larger
programs, and larger sets of parameters, can lead to models that are
able to learn more from the data. We say that they have more
capacity, or representational power. We can think of this as how
deeply and broadly the algorithm is able to learn. More capacity
gives us more ability to discover meaning from the data we're given.
As an analogy, suppose we work for a car dealer and we're
supposed to write blurbs for the cars we're selling. We'll suppose
that our marketing department has sent us a list of "approved
words" for describing our cars. After doing our job for a while, we
will probably learn how to best represent each car with this model.
Suppose the dealer then buys its first motorcycle. The words we
have available, or our model, don't have enough capacity to
represent this vehicle, in addition to all the other vehicles we're
already describing. We just don't have a big enough vocabulary to
refer to something with 2 wheels rather than 4. We can do our best,
but it's likely not to be great. If we can use a more powerful model
with greater capacity (that is, a bigger vocabulary for describing
vehicles), we can do a better job. But a bigger model also means
more work. As we'll see later in the book, a bigger model can often
produce better results than a smaller one, though at the cost of
more computation time and computer memory.
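As a rough, made-up illustration of capacity, the sketch below fits the same invented data twice with NumPy: once with a straight line (only two adjustable parameters) and once with a degree-7 polynomial (eight parameters). Polynomial fitting is just a stand-in here, not the book's method; the point is that the bigger model can follow the data more closely, at the cost of more parameters and more arithmetic.

import numpy as np

# Invented data: a wiggly relationship between one feature and a target value.
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).normal(size=x.size)

# A low-capacity model: a straight line with just 2 adjustable parameters.
line = np.polyfit(x, y, deg=1)

# A higher-capacity model: a degree-7 polynomial with 8 adjustable parameters.
curve = np.polyfit(x, y, deg=7)

for name, coeffs in [("line", line), ("degree-7 polynomial", curve)]:
    predictions = np.polyval(coeffs, x)
    error = np.mean((predictions - y) ** 2)
    print(f"{name}: {len(coeffs)} parameters, mean squared error {error:.3f}")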