A Practical and Technical Introduction To Machine Learning

The document discusses the machine learning project lifecycle including problem framing, data collection, data analysis, data preparation, and model training and evaluation. It provides details on each step such as expressing goals, sampling strategies, exploratory data analysis, feature engineering, establishing baselines, debugging models, and monitoring performance.

Introduction to Machine Learning



What is Machine learning?
● Machine learning is a subset of Artificial Intelligence that enables
computers to learn (progressively improve performance on tasks) from
data (examples, experience) without explicit (rule-based) programming, and to
make predictions or decisions.
Notes: learning autonomously from examples; pattern recognition; autonomously identifying
patterns and extracting insights from data; a training (learning) time followed by a test
(prediction, evaluation) time.

● Traditional Programming: Rules + Data → Output
● Machine Learning: Data + Output → Rules

(The Machine Learning diagram depicts supervised learning at training time.)

Types of Machine Learning

● Prediction (supervised learning): Given an input observation, the model
predicts a numeric value (regression) or a class (classification). (one-time
prediction)
● Analysis (unsupervised learning): The model extracts information (patterns,
structure) from the data.
● Generation: The model generates content (possibly given an input).
● Decision (reinforcement learning): The model (agent) makes decisions by
taking actions and getting rewards in an environment to achieve a goal
(sequential decisions; no labels, but it learns from experience).
Types of Machine Learning (loosely speaking)

Each type can be characterized by its data, ground truth, model output, and objective.
The ground truth differs by type:

Type         Ground truth
Prediction   Label
Analysis     Latent variable
Generation   Target output
Decision     World state

The four pillars of Machine Learning: https://www.youtube.com/watch?v=ZlIjJ9Es-fg


Machine learning project lifecycle: Problem framing

(Pipeline: Problem framing → Data collection → Data wrangling → Data analysis → Model training/evaluation → Model deployment)

Express the goal clearly and concisely

❏ What is the main goal? In which context? (SMART)
❏ What value does it add? (cost-benefit analysis, including maintenance)
❏ Can we solve it without ML? Is it feasible?
❏ Do we have enough quality data? (correct, representative, with predictive power)

Express the goal technically

❏ What is the model's goal (measurable)? What is the success/failure metric?
❏ What are the input and output of the model?
❏ What is the performance measure?
❏ What are the non-ML baselines?
Machine learning project lifecycle: Data collection

❏ Select the dataset size and the sampling strategy
❏ Identify feature and label sources (actual vs proxy labels)
❏ Measure data quality (noise, label or feature errors, missing values, predictive power)
❏ Make sure data is representative of the production use case (beware sampling bias)
❏ Split the data into train, validation, and test sets (beware naive random splitting; fix the
random seed; beware data leakage; with temporal data, train should be older than test; see the
sketch below)
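
A minimal sketch (in Python, with hypothetical column names and toy data) of a leakage-aware, time-ordered split with a fixed seed:

# Minimal sketch: time-ordered train/validation/test split (hypothetical columns).
import numpy as np
import pandas as pd

SEED = 42  # fix the random seed so the split is reproducible
rng = np.random.default_rng(SEED)

# Toy stand-in for the real dataset.
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=1000, freq="h"),
    "feature": rng.normal(size=1000),
    "label": rng.integers(0, 2, size=1000),
})

# Sort by time so train is strictly older than validation and test,
# which avoids leaking future information into training.
df = df.sort_values("timestamp").reset_index(drop=True)

n = len(df)
train_df = df.iloc[: int(0.7 * n)]               # oldest 70% for training
val_df = df.iloc[int(0.7 * n): int(0.85 * n)]    # next 15% for model selection
test_df = df.iloc[int(0.85 * n):]                # newest 15%, used once at the end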
Machine learning project lifecycle: Data wrangling

❏ Data transformation (convert non-numeric features to numeric, resize inputs to a fixed size)
- Numeric: normalization (range scaling, log-scaling, clipping, z-score
(standardization)), binning (equally-spaced, quantile-based)
- Categorical: one-hot encoding, tokenization
❏ Transform within a pipeline (beware data leakage; see the sketch below)
❏ Data cleaning (missing values, imputation)
❏ Feature engineering (determining which features are important for training and creating
them from raw data)
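
A minimal scikit-learn sketch of leakage-safe transformations: all statistics (medians, scales, category vocabularies) are learned on the training split only and reused unchanged elsewhere. The data and column names are hypothetical:

# Minimal sketch: fit preprocessing on train only, then reuse it unchanged.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
train_df = pd.DataFrame({"feature": rng.normal(size=700),
                         "category": rng.choice(["a", "b", "c"], size=700)})
val_df = pd.DataFrame({"feature": rng.normal(size=150),
                       "category": rng.choice(["a", "b", "c"], size=150)})

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing values
        ("scale", StandardScaler()),                   # z-score standardization
    ]), ["feature"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
])

# fit_transform learns statistics from train; transform only applies them,
# so nothing leaks from the validation set into preprocessing.
X_train = preprocess.fit_transform(train_df)
X_val = preprocess.transform(val_df)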
Machine learning project lifecycle: Data analysis

❏ Look at data distributions (summary statistics, histogram, density, CDF)
❏ (Periodically) validate the data (expected behavior against a data schema; consistency over
time; consider context, source and sampling strategy)
❏ Identify outliers and decide what to do with them
❏ Handle noise (confidence intervals, hypothesis testing)
❏ Look at individual examples
❏ Group the data (look from the perspective of different subgroups)
❏ Exploratory data analysis (graphs; see the sketch below)
❏ Make hypotheses and look for evidence (scientific method)
❏ Remember correlation != causation
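
A minimal pandas EDA sketch, reusing the hypothetical train_df from the preprocessing sketch above:

# Minimal EDA sketch on the hypothetical train_df from the previous sketch.
import matplotlib.pyplot as plt

print(train_df.describe())       # summary statistics of numeric columns
print(train_df.isna().mean())    # fraction of missing values per column
print(train_df.groupby("category")["feature"].agg(["mean", "std", "count"]))  # subgroup view

train_df["feature"].hist(bins=30)  # distribution of a numeric feature
plt.xlabel("feature")
plt.ylabel("count")
plt.show()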

Machine learning project lifecycle: Model training/evaluation

❏ Establish strong baselines (see the sketch below)
❏ Make sure the (data) pipeline is correct before training (always validate data quality)
❏ Start with a simple model (then incrementally add complexity; train on a small subset first)
❏ Train on training data, select models and tune hyperparameters on validation data, and
test once on test data (update validation and test data over time)
❏ Debug based on the loss and metrics:
- Debug data (validate input data with tests: correctness, representativeness, predictive
power; splits; preprocessing; numerical overflow)
- Debug model: overfitting (reduce model capacity, regularization, more data,
train-test same distribution); underfitting (increase model capacity, reduce
regularization, feature engineering); hyperparameter tuning; feature selection
(correlation with labels, performance on the validation set)
❏ Document everything (especially failures)
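
A minimal sketch (toy data, assumed model choices) comparing a simple model against a trivial baseline; a model that cannot beat the baseline signals a data or pipeline problem:

# Minimal sketch: always compare a simple model against a trivial baseline.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(700, 5)), rng.integers(0, 2, size=700)
X_val, y_val = rng.normal(size=(150, 5)), rng.integers(0, 2, size=150)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("baseline accuracy:", accuracy_score(y_val, baseline.predict(X_val)))
print("model accuracy:", accuracy_score(y_val, model.predict(X_val)))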
Machine learning project lifecycle: Model deployment

❏ Periodically retrain the model on new data
❏ Treat data and model as code (version control)
❏ Test each component of the pipeline (input data, data transformations, model updates,
serving infrastructure)
❏ Run an integration test of the end-to-end pipeline (when introducing new models or
training on new data)
❏ Track training-serving skew (data schema skew, feature skew; beware feedback loops;
update the model on new data; see the sketch below)
❏ Monitor model performance and efficiency (regression testing, checkpointing)
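
One way to track feature skew is to compare the training and serving distributions of each feature; a minimal sketch with a two-sample Kolmogorov-Smirnov test (toy data, assumed alert threshold):

# Minimal sketch: flag training-serving skew on a single numeric feature.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, size=5000)   # distribution at training time
serving_feature = rng.normal(loc=0.3, size=500)  # distribution observed in production

stat, p_value = ks_2samp(train_feature, serving_feature)
ALERT_P = 0.01  # assumed alert threshold
if p_value < ALERT_P:
    print(f"Possible feature skew: KS statistic={stat:.3f}, p-value={p_value:.4f}")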


Machine learning project lifecycle (summary)
1. Problem framing
Express the problem within the business context and emphasize its value
Decide if it is solvable without ML; cost-benefit analysis; feasibility; data requirements
Define the problem technically and choose a performance measure
Prepare the environment
2. Data collection
Make sure data is representative of production use cases
Reduce sampling bias
Data annotation strategy if required
Split the data for evaluation
3. Data analysis
EDA (summary statistics, visualisations, identify outliers)
Extract insights from data
4. Data preparation
Data cleaning and formatting (imputation, encoding, standardization)
Feature engineering
5. Model training and evaluation
Build an end-to-end pipeline that can be tested
Start with simple models and find strong baselines
Model selection and hyperparameter tuning
Error analysis
6. Model deployment and maintenance
Pipeline integration
Monitoring and regression testing
7. Presentation
Terminology
Data
❖ Features
❖ Examples
❖ Labels
❖ Dataset

● Supervised learning: Examples are labeled. The goal is to find a model that
predicts y from x.
❏ Classification: Label is a category.
❏ Regression: Label is a real number.
● Unsupervised learning: Examples are unlabeled.
● Reinforcement learning
Supervised learning
Data
❖ Features: the attributes describing each example
❖ Examples (in the sample space X): x ∈ X
❖ Labels (in the label space Y): y ∈ Y
❖ Labeled dataset: D = {(x_1, y_1), ..., (x_n, y_n)}

Examples are labeled. The goal is to find a function f: X → Y that outputs a
"good" prediction of y given x on unseen examples.
➔ A loss function ℓ(f(x), y) measures how good a prediction is.
The goal is to find f with a small loss on unseen examples.
Error decomposition
● All pairs (x, y) are drawn i.i.d. from an unknown distribution P
on X × Y (data generating assumption).

● The goal is to find f with a small expected loss or risk (generalization error):
L(f) = E_{(x,y)~P}[ℓ(f(x), y)]

● But we cannot compute it since P is unknown (no access to the population).
So we use an estimate (based on a sample, the train dataset) and minimize the
empirical risk (train error; see the numeric sketch below):
L_emp(f) = (1/n) Σ_{i=1}^{n} ℓ(f(x_i), y_i)

By the Law of Large Numbers, L_emp(f) → L(f) almost surely.

● The function f* with the smallest risk over all measurable functions is the Bayes
function (best in theory): f* = argmin_f L(f)

● We have to choose f from a set of functions F called the hypothesis space.
The function with the smallest risk in F is f_F = argmin_{f ∈ F} L(f) (best in class).

● The function with the smallest empirical risk in F is f̂ = argmin_{f ∈ F} L_emp(f)
(best in practice).
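
With squared loss and toy numbers (all hypothetical), a minimal sketch of the empirical risk as the average per-example loss:

# Minimal sketch: empirical risk = average loss over the training sample.
import numpy as np

y_true = np.array([1.0, 0.5, -0.2, 2.0])  # toy labels y_i
y_pred = np.array([0.8, 0.7, 0.0, 1.5])   # toy predictions f(x_i)

squared_losses = (y_pred - y_true) ** 2   # per-example loss l(f(x_i), y_i)
L_emp = squared_losses.mean()             # empirical risk (train error)
print(L_emp)  # 0.0925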

[Figure: the risk of f̂ decomposes into estimation error, approximation error, and the Bayes error.]

● Approximation (representation) error: comes from restricting to the class F
instead of all measurable functions (does not change with infinite data).
Smaller when F is bigger (more complex).
● Estimation (generalization) error: comes from using finite training data
(the empirical risk instead of the true risk; zero with infinite data).
Smaller when F is smaller (less complex).
● Optimization error: comes from the algorithmic problem of minimizing the
empirical risk (f̂ may overfit more than f_F; the loss might not be convex).

Balancing approximation and estimation error is the Bias-Variance trade-off. Your
job is to find an F that balances these errors.
Fundamental questions of Machine Learning

● Representation: What class of functions F should we choose?
● Generalization: Will the performance of the predictor transfer
from seen training examples to unseen examples?
● Optimization: How can we efficiently solve the optimization
problem?

These questions are intertwined rather than independent.

Generalization bounds
(under the i.i.d. data generating assumption, for a loss bounded in [0, 1])

● Finite F
Let F be a finite hypothesis set. Then, for all f in F, for all δ > 0, with probability at
least 1 − δ:
L(f) ≤ L_emp(f) + sqrt((ln|F| + ln(1/δ)) / (2n))

Since this holds uniformly over F, it also bounds the estimation error.

● Infinite F (Vapnik-Chervonenkis)
Let F be a hypothesis set with finite VC dimension d_VC. Then, for all f in F, for
all δ > 0, with probability at least 1 − δ, up to constants:
L(f) ≤ L_emp(f) + O(sqrt((d_VC log(n/d_VC) + log(1/δ)) / n))

The VC dimension measures the "effective" size of the class, that is, the size of the
projection of the class onto finite observations.
Regularization
(complexity, capacity, richness, expressivity)

● The main goal of regularization is to reduce the generalization error by
reducing the complexity of the hypothesis space F.
● Given a complexity measure Ω (e.g., a norm on F), the constrained
hypothesis space F_C = {f ∈ F : Ω(f) ≤ C} is the set of functions with
complexity at most C.

Increasing complexities C = 0, 1, 2, 3.56, ... give nested spaces F_0 ⊆ F_1 ⊆ F_2 ⊆ ...

● Constrained (structural) empirical risk minimization: minimize L_emp(f) subject to Ω(f) ≤ C
● Penalized empirical risk minimization: min_{f ∈ F} L_emp(f) + λ Ω(f)
(see the sketch below)
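
A minimal sketch of penalized empirical risk minimization using ridge regression, which solves min_w L_emp(w) + α‖w‖²; the α values are arbitrary assumptions:

# Minimal sketch: penalized ERM with ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=100)

for alpha in (0.01, 1.0, 100.0):  # arbitrary penalty strengths
    model = Ridge(alpha=alpha).fit(X, y)
    # Stronger penalties constrain the hypothesis space more: smaller ||w||.
    print(f"alpha={alpha:7.2f}  ||w|| = {np.linalg.norm(model.coef_):.3f}")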


