Université Blida 1
Faculté des Sciences
Département d’Informatique
ISI (Ingénierie des Systèmes Intelligents)
Advanced Machine Learning
Remmide
2024/2025
Course Objectives
This subject covers advanced learning concepts to address
complex problems in data science using:
● data streams,
● incremental and constructive learning,
● reinforcement learning,
● complex neural networks,
● deep learning,
● multi-task learning, and
● transfer learning between domains.
Course content
01 Neural Network
02 CNN
03 RNN
04 Transfer Learning
05 GAN
06 Reinforcement Learning
Université Blida 1
Faculté des Sciences
Département d’Informatique
ISI (Ingénierie des Systèmes Intelligents)
Introduction to Advanced Machine Learning
Remmide
2024/2025
Artificial Intelligence
Artificial Intelligence (AI), noun: The field of computer science focused
on creating systems capable of performing tasks that typically require
human intelligence. This encompasses the theory, development, and
deployment of algorithms and models that can:
1. Process and analyze complex data
2. Learn from experience and adapt behavior
3. Recognize patterns and make predictions
4. Understand and generate natural language
5. Perceive and interpret visual information
6. Make decisions and solve problems
7. Engage in reasoning and planning
Machine Learning
Machine Learning (ML), noun: A subset of artificial intelligence that
develops algorithms and statistical models enabling computer systems
to improve their performance on a specific task through experience,
without explicit programming. ML systems learn patterns from data to
make predictions, decisions, or generate insights.
Key characteristics:
1. Data-driven approach to problem-solving
2. Ability to automatically adapt and improve with exposure to more
data
3. Focus on creating models that can generalize from examples
4. Emphasis on statistical and probabilistic methods
Machine Learning
Most ML methods follow the same general structure:
Learning from examples
3 main ingredients
1. Training set / examples: {x1, x2, ..., xN}
2. Machine or model: x → f(x; θ) → y, where f is the function / algorithm, y is the prediction, and θ are the parameters of the model
3. Loss, cost, objective function / energy: argmin_θ E(θ; x1, x2, ..., xN)
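As a minimal sketch (the 1-D linear model, the data, and the grid search below are illustrative assumptions, not course material), the three ingredients look like this in Python:

import numpy as np

# 1. Training set: N noisy samples of a 1-D linear relationship
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

# 2. Machine / model: f(x; theta) with parameters theta = (a, b)
def f(x, theta):
    a, b = theta
    return a * x + b

# 3. Loss / objective: E(theta; x1, ..., xN), here the mean squared error
def E(theta):
    return np.mean((f(x, theta) - y) ** 2)

# argmin over theta, done here by a crude grid search for illustration
grid = [(a, b) for a in np.linspace(-3, 3, 61) for b in np.linspace(-1, 1, 21)]
theta_star = min(grid, key=E)
print(theta_star)  # close to the true parameters (2.0, 0.5)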
Learning from examples
Tools: Data ↔ Statistics, Loss ↔ Optimization
Goal: to extract information from the training set
● relevant for the given task,
● relevant for other data of the same kind.
Terminology
● Sample (Observation or Data): item to process (e.g., classify). Example: an individual, a document, a picture, a sound, a video...
● Features (Input): set of distinct traits that can be used to describe each sample in a quantitative manner. Represented as a multi-dimensional vector usually denoted by x. Example: size, weight, citizenship, ...
● Training set: Set of data used to discover potentially predictive
relationships.
● Validation set: Set used to adjust the model hyperparameters.
● Testing set: Set used to assess the performance of a model.
● Label (Output): The class or outcome assigned to a sample. The actual prediction is often denoted by y and the desired/targeted class by d or t. Example: man/woman, wealth, education level, ...
Learning approaches
● Unsupervised learning: Discovering patterns in unlabeled data.
Example: cluster similar documents based on the text content.
● Supervised learning: Learning with a labeled training set.
Example: email spam detector with training set of already labeled
emails.
● Semi-supervised learning: Learning with a small amount of
labeled data and a large amount of unlabeled data. Example: web
content and protein sequence classifications.
● Reinforcement learning: Learning based on feedback or reward.
Example: learn to play chess by winning or losing.
Machine learning workflow
Problem types
ML Models
Linear Regression:
A supervised learning model that assumes a linear relationship between input features and a
continuous output. It finds the best-fit line by minimizing prediction errors, serving as a simple
but interpretable model.
Logistic Regression:
A classification model that predicts the probability of class membership using a logistic function.
It's commonly used for binary classification tasks and produces easy-to-interpret, probabilistic
outputs.
Decision Trees:
A model that splits data into if-then rules based on feature values, forming a tree structure. It
works for both classification and regression, is easy to understand, but can overfit if too deep.
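As an illustration of the shared interface (assuming scikit-learn; the dataset is synthetic), the two classifiers above can be trained and evaluated identically:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_depth limits tree depth, which mitigates the overfitting noted above
for model in (LogisticRegression(), DecisionTreeClassifier(max_depth=3)):
    model.fit(X_train, y_train)                               # learn from labeled data
    print(type(model).__name__, model.score(X_test, y_test))  # held-out accuracy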
ML Models
Random Forests:
An ensemble of decision trees built on random data subsets to prevent overfitting. Random
forests offer better accuracy and feature importance insights, handling high-dimensional data
well.
Support Vector Machines (SVM):
SVMs find the optimal hyperplane to separate classes in high-dimensional space. Using kernels,
they handle non-linear classification effectively, particularly when there are more features than
samples.
k-Nearest Neighbors (k-NN):
An instance-based algorithm that predicts based on the majority class (classification) or average
value (regression) of the k nearest data points. It’s intuitive but can be slow on large datasets.
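A from-scratch sketch of the k-NN prediction rule (function name and toy data are illustrative); note that every prediction scans the whole training set, which is why k-NN slows down on large datasets:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Classify x by majority vote among its k nearest training points
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training sample
    nearest = np.argsort(dists)[:k]              # indices of the k closest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> 0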
ML Models
K-Means Clustering:
An unsupervised algorithm that groups data into k clusters based on similarity. It’s simple and
scalable but requires choosing the number of clusters in advance and assumes similar-sized
clusters.
Neural Networks:
Inspired by the brain, neural networks consist of layers of nodes that learn complex patterns.
They are powerful for tasks like image recognition and natural language processing.
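A short k-means sketch (assuming scikit-learn; the two-blob data is synthetic), showing that k must be chosen in advance:

import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of unlabeled points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)), rng.normal(3, 0.3, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # k fixed up front
print(km.cluster_centers_)  # roughly (0, 0) and (3, 3)
print(km.labels_[:5])       # cluster assignments for the first 5 points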
What is deep learning?
● Part of the machine learning field of learning representations of data. Exceptionally
effective at learning patterns.
● Utilizes learning algorithms that derive meaning out of data by using a hierarchy of
multiple layers that mimic the neural networks of our brain.
● If you provide the system tons of information, it begins to understand it and respond in
useful ways.
● Rebirth of artificial neural networks
Actors and applications
● A very active technology adopted by major industry players
● Success story for many different academic problems
○ Image processing
○ Computer vision
○ Speech recognition
○ Natural language processing
○ Translation
○ etc.
● Today, every industry is asking whether DL can improve its processes
Timeline of (deep) learning
Limitations of Linear Classifiers
● Linear classifiers (e.g., logistic regression) classify inputs based on linear combinations of features xi
● Many decisions involve non-linear functions of the input
● Canonical example: do 2 input elements have the same value?
● The positive and negative cases cannot be separated by a plane
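A quick check of this (assuming scikit-learn; the four-point dataset encodes "do the two inputs have the same value?"):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([1, 0, 0, 1])  # 1 when the two inputs match (the XOR problem)

clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))  # stuck near chance: no hyperplane separates the classes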
How to Construct Nonlinear Classifiers?
● We would like to construct non-linear discriminative classifiers that utilize functions of
input variables
● Use a large number of simpler functions
○ If these functions are fixed (Gaussian, sigmoid, polynomial basis functions), then
optimization still involves linear combinations of (fixed functions of) the inputs
○ Or we can make these functions depend on additional parameters → need an
efficient method of training extra parameters
A simple decision
Say you want to decide whether you are going to attend a cheese festival this upcoming
weekend. There are three variables that go into your decision:
1. Is the weather good?
2. Does your friend want to go with you?
3. Is it near public transportation?
We’ll assume that answers to these questions are the only factors that go into your decision.
A simple decision
I will write the answers to these questions as binary variables xi, with zero being the answer ‘no’
and one being the answer ‘yes’:
1. Is the weather good? x1
2. Does your friend want to go with you? x2
3. Is it near public transportation? x3
Now, what is an easy way to describe the decision statement resulting from these inputs?
A simple decision
We could determine weights wi indicating how important each feature is to whether you would
like to attend. We can then see if:
w1·x1 + w2·x2 + w3·x3 ≥ threshold
for some predetermined threshold. If this statement is true, we would attend the
festival, and otherwise we would not.
A simple decision
For example, if we really hate bad weather but care less about going with our friend or about public
transit, we could pick the weights 6, 2 and 2.
With a threshold of 5, this causes us to go if and only if the weather is good.
What happens if the threshold is decreased to 3? What about if it is decreased to 1?
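A tiny sketch (illustrative code) that answers these questions by enumerating all eight input combinations at each threshold:

from itertools import product

w = [6, 2, 2]  # weights for: good weather, friend going, near public transport

def attend(x, threshold):
    # Go iff the weighted sum of the binary inputs reaches the threshold
    return sum(wi * xi for wi, xi in zip(w, x)) >= threshold

for threshold in (5, 3, 1):
    going = [x for x in product([0, 1], repeat=3) if attend(x, threshold)]
    print(threshold, going)

With the threshold at 5, only combinations with x1 = 1 pass; lowering the threshold makes the decision progressively easier to trigger.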
A simple decision
If we define a new binary variable y that represents whether we go to the festival, we can write
this variable as:
y = 1 if w1·x1 + w2·x2 + w3·x3 ≥ threshold, and y = 0 otherwise.
A simple decision
Now, if I rewrite this in terms of a dot product between the vector of all binary inputs (x) and a
vector of weights (w), and change the threshold to the negative bias (b = −threshold), we have:
y = 1 if <w, x> + b > 0, and y = 0 otherwise.
So we are really just finding separating hyperplanes again, much as we did with logistic
regression and support vector machines!
A perceptron
We can graphically represent this decision algorithm as an object that takes 3 binary inputs and
produces a single binary output:
This object is called a perceptron when using the type of weighting scheme we just developed.
A network of perceptrons
A perceptron takes a number of binary inputs and emits a binary output. Therefore it is easy to
build a network of such perceptrons, where the outputs of some perceptrons are used as the
inputs of other perceptrons:
Notice that some perceptrons seem to have multiple output arrows, even though we have
defined them as having only one output. This is only meant to indicate that a single output is
being sent to multiple new perceptrons.
A network of perceptrons
The inputs and outputs are typically represented as their own neurons, with the intermediate
neurons grouped into hidden layers.
A network of perceptrons
The biological interpretation of a perceptron is this: emitting a 1 is equivalent to ‘firing’ an
electrical pulse, and emitting a 0 means not firing. The bias indicates how difficult it is
for this particular node to send out a signal.
Inspiration: The Brain
● Many machine learning methods are inspired by biology, e.g., the (human) brain
● Our brain has ∼10^11 neurons, each of which communicates with (is connected to) ∼10^4 other neurons
Principle
1. Data are represented as vectors:
2. Collect training data with positive and negative examples:
Principle
3. Training: find w and b so that:
<w, x> + b is positive for positive samples x,
<w, x> + b is negative for negative samples x.
The equation <w, x> + b = 0 defines a
hyperplane.
The hyperplane acts as a linear
separator.
w is a normal vector to the hyperplane.
Principle
4. Testing: the perceptron can now classify new examples.
● A new example x is classified positive if <w, x> + b is positive,
● and negative if <w, x> + b is negative.
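A minimal sketch of this train/test procedure (toy data and the classic perceptron update rule; labels are taken in {−1, +1}):

import numpy as np

def train_perceptron(X, y, epochs=20):
    # Find w and b so that <w, x> + b matches the sign of each label
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (np.dot(w, x) + b) <= 0:  # misclassified: nudge the hyperplane
                w += t * x
                b += t
    return w, b

# Linearly separable toy data
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 0.3], [1.0, 1.0], [0.8, 0.9], [0.1, 1.2]])
y = np.array([-1, -1, -1, +1, +1, +1])

w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # testing: classify by the sign of <w, x> + b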
Mathematical Model of a Neuron
● Neural networks define functions of the inputs (hidden features), computed by neurons
● Artificial neurons are called units
Activation Functions
Most commonly used activation functions:
Sigmoid: σ(z) = 1 / (1 + e^(−z))
Tanh: tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z))
ReLU (Rectified Linear Unit): ReLU(z) = max(0, z)
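All three are one-liners in NumPy (illustrative sketch):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes to (-1, 1)

def relu(z):
    return np.maximum(0, z)          # zero for negative inputs, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))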
Neuron in Python
Example in Python of a neuron with a sigmoid activation function
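The slide's code is shown as an image and is not reproduced in the text; a minimal reconstruction of such a neuron (the class name and values are illustrative):

import numpy as np

class Neuron:
    # A single unit: weighted sum of inputs plus bias, passed through a sigmoid
    def __init__(self, weights, bias):
        self.weights = np.asarray(weights)
        self.bias = bias

    def forward(self, x):
        z = np.dot(self.weights, x) + self.bias  # linear combination <w, x> + b
        return 1.0 / (1.0 + np.exp(-z))          # sigmoid activation

neuron = Neuron(weights=[0.5, -1.0, 2.0], bias=0.1)
print(neuron.forward(np.array([1.0, 2.0, 3.0])))  # a value in (0, 1)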
Neural Network Architecture (Multi-Layer Perceptron)
Network with one layer of four hidden units:
Each unit computes its value based on linear combination of values of units that point into it,
and an activation function
Neural Network Architecture (Multi-Layer Perceptron)
Naming conventions for a 2-layer neural network (a shallow network):
● One layer of hidden units
● One output layer (we do not count the inputs as a layer)
Neural Network Architecture (Multi-Layer Perceptron)
Going deeper: a 3-layer neural network with two layers of hidden units
Neural Network Architecture (Multi-Layer Perceptron)
Naming conventions for an N-layer neural network (a deep network):
● N − 1 layers of hidden units
● One output layer
Representational Power
A neural network with at least one hidden layer is a universal approximator: given enough hidden
units, it can approximate any continuous function to arbitrary accuracy.
The capacity of the network increases with more hidden units and more hidden layers.
Neural Networks
We only need to know two algorithms
● Forward pass: performs inference
● Backward pass: performs learning
Forward Pass: What does the Network Compute?
● Output of the network can be written as:
hj = f(bj + Σi=1..D xi wij),  ok = g(bk + Σj hj vjk)
(j indexing hidden units, k indexing the output units, D number of inputs; wij are the input-to-hidden weights, vjk the hidden-to-output weights)
● Activation functions f, g: sigmoid/logistic, tanh, or rectified linear (ReLU)
Forward Pass in Python
Example code for a forward pass for a 3-layer network in Python:
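The slide's code is likewise not reproduced in the text; a minimal sketch of a 3-layer forward pass (the layer sizes, weights, and activations below are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 4 inputs, hidden layers of 5 and 3 units, 1 output
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)

def forward(x):
    h1 = relu(W1 @ x + b1)        # first hidden layer
    h2 = relu(W2 @ h1 + b2)       # second hidden layer
    return sigmoid(W3 @ h2 + b3)  # output layer

print(forward(rng.normal(size=4)))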
Special Case
What is a single-layer network (no hidden units) with a sigmoid activation function?
Special Case
Network: y = σ(<w, x> + b)
Logistic regression!
Example Application
● Classify image of handwritten digit (32x32 pixels): 4 vs non-4
● How would you build your network?
● For example, use one hidden layer and the sigmoid activation function:
● How can we train the network, that is, adjust all the parameters w?
Training Neural Networks
● Find weights: w* = argmin_w Σn loss(o(n), t(n)),
where o = f(x; w) is the output of a neural network and t(n) the target for example n
● Define a loss function, e.g.:
○ Squared loss: E = ½ Σk (ok − tk)²
○ Cross-entropy loss: E = −Σn [ t(n) log o(n) + (1 − t(n)) log(1 − o(n)) ]
● Gradient descent: w ← w − η ∂E/∂w,
where η is the learning rate (and E is the error/loss)
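A minimal sketch of this loop for a single sigmoid unit with the squared loss (the toy data, learning rate, and iteration count are assumed):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy labeled data: the target is simply the second input
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
t = np.array([1.0, 0.0, 1.0, 0.0])

w, b, eta = np.zeros(2), 0.0, 0.5  # eta is the learning rate

for _ in range(1000):
    o = sigmoid(X @ w + b)          # forward pass
    grad_z = (o - t) * o * (1 - o)  # dE/dz for E = 1/2 * sum((o - t)^2)
    w -= eta * X.T @ grad_z         # w <- w - eta * dE/dw
    b -= eta * grad_z.sum()         # same update for the bias

print(sigmoid(X @ w + b).round(2))  # outputs move toward the targets [1, 0, 1, 0]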
Training Neural Networks: Back-propagation
Back-propagation: an efficient method for computing gradients needed to perform
gradient-based optimization of the weights in a multi-layer network
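A compact sketch of back-propagation for a network with one hidden layer (sizes, data, and hyperparameters are assumed), trained on XOR, which a linear model cannot fit:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[0.], [1.], [1.], [0.]])  # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)  # input -> hidden (4 units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # hidden -> output
eta = 0.5

for _ in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # Backward pass: propagate dE/dz from the output back through the layers
    d_out = (o - t) * o * (1 - o)         # output delta for the squared loss
    d_hid = (d_out @ W2.T) * h * (1 - h)  # hidden delta via the chain rule
    # Gradient descent updates
    W2 -= eta * h.T @ d_out
    b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X.T @ d_hid
    b1 -= eta * d_hid.sum(axis=0)

print(o.round(2).ravel())  # should approach [0, 1, 1, 0] (may vary with the seed)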
Monitor Loss During Training
Check how your loss behaves during training, to spot wrong hyperparameters, bugs, etc.
Monitor Accuracy on Train/Validation During Training
Check how your desired performance metrics behave during training.
Why "Deep"?
There are structures resembling convolutional neural networks inside our brains:
Human vision ↔ many layers of abstraction ↔ Deep learning