
AIMLCZG567: AI & ML

Techniques for Cyber Security


Jagdish Prasad
BITS Pilani WILP, Pilani Campus

Session : 03
Title : Basics for Machine Learning - I
Agenda
• Primary Data Types
• Structured, Unstructured, Labeled and Unlabeled data
• Data Selection
• Data Sampling
• Machine Learning Fundamentals
• Supervised
• Unsupervised
• Reinforcement



Types of Data



Primary Data Types

Data Type | Description
Numerical | Any form of measurable data, like height, weight, cost of a grocery bill, etc.
Categorical | Grouping based on similar attributes, like gender, species, equipment type, etc.
Time Series | Data points that are indexed at specific points in time, i.e. data is collected at consistent intervals (daily, weekly, monthly, etc.)
Text | Words, sentences, or paragraphs that can provide a level of insight.



Structured Data
• Data is usually composed of numbers or text.
• Stored in Relational databases and can be easily searched using
SQL queries.
• Numerical (Quantitative) Data
• Discrete
• Continuous
• Categorical (Qualitative) Data
• Ordinal - bug severity, customer satisfaction, happiness level
• Nominal – hair colour, gender



Unstructured Data
• Usually composed of everything else, including text, images, videos, speech/audio, time series, etc.
• Stored in non-relational databases and cannot be searched easily
• Examples:
• Image
• Video
• Time Series
• Text



Labeled and Unlabeled Data

• Labelled Data
• Any data which has a characteristic,
category, or attributes assigned to it
can be referred to as labelled data.
• Examples: photo of a cat, height of a
person, price of a product etc.
• Unlabelled Data
• Any data that does not have any labels
specifying its characteristics, identity,
classification, or properties can be
considered unlabelled data.
• Examples: photos, videos, or text that
do not have any category or
classification assigned to it.



Popular Datasets for Machine Learning
Dataset Name | Description
Google Dataset Search Engine (https://datasetsearch.research.google.com/) | Google’s unified repository of major available datasets. One can search for the dataset required for their learning model.
Microsoft Dataset (https://msropendata.com/) | Data repository of datasets created by researchers at Microsoft.
Computer Vision Dataset (https://visualdata.io/) | Provides datasets of images, suitable for image processing, deep learning, or computer vision problems. Excellent visual datasets are available to build computer vision models.
Kaggle Dataset (https://www.kaggle.com/datasets) | Multiple types of datasets of different shapes and sizes. Most of the datasets have kernels associated with them, with analysis notes from data scientists.
Amazon Dataset (https://registry.opendata.aws/) | Contains datasets from fields such as public transport, satellite images, etc. Datasets are available on Amazon Web Services resources like Amazon S3, which is convenient for AWS-based machine learning model development.



Data Selection



What is Data Selection?
• Process of determining the appropriate data type and source and
suitable instruments to collect data.
• Data selection precedes the actual practice of data collection.
• Process of selecting suitable data for an investigation can impact
data integrity.
• Primary objective of data selection is to determine appropriate
data type, source, and instrument that allow investigators to
answer questions adequately.
• Determination is domain-specific and is primarily driven by the
nature of the investigation, existing literature, and accessibility to
necessary data sources.



What Data to Select?
• What is the problem being solved?
• What is the scope of the investigation?
• Defines the parameters of any investigation.
• Selected data should not extend beyond the scope of the investigation.
• What type of data should be considered: quantitative, qualitative
or a combination of both?
• What does previous experience indicate to be the most
appropriate data to collect?
• What features/attributes are required for investigation?



What is Feature Selection?
• ML models require large quantities of data for better learning.
• Collected data will have:
• Large number of attributes (features) – all may not be required for
investigation
• Data may contain noise, resulting in poor learning of the model
• Noisy or irrelevant features can slow down the training process and make the trained model slower.
• Feature selection is a process to:
• Identify attributes (features) which are suitable for model training
• Separates usable features from rest of the features without altering their
value
• Removes noise from data
• Separates good data from the rest – reduces size of dataset



What is Feature Selection?
• Collected features: Size, Locality, Construction type, Type of interior, Income of the owner, Availability, People looking to rent
• After feature selection: Size, Locality, Construction type, Type of interior, Availability, People looking to rent



Why Feature Selection?
• Improved accuracy
• Simpler models are easier to interpret
• Shorter training times
• Enhanced generalization by reducing overfitting
• Easier to implement by software developers
• Reduced risk of data errors during model use
• Removal of redundant variables
• Avoids bad learning behaviour in high-dimensional spaces



Feature Selection Methods

• Filter methods
• Wrapper methods
• Embedded methods

• Ref: https://www.kdnuggets.com/2023/06/advanced-feature-selection-techniques-machine-learning-models.html



Feature Selection: Filter Methods
• Filter methods are generally used as a pre-processing step.
• Selection of features is independent of any machine learning algorithms.
• Features are selected on the basis of their scores in various statistical tests
for their correlation with the outcome variable.
• Characteristics of these methods are:
• They rely on the characteristics of the data (feature characteristics)
• These are model agnostic and tend to be less computationally expensive.
• They usually give lower prediction performance than wrapper methods.
• They are suitable for a quick screen and removal of irrelevant features.
• Common techniques:
• Basic methods
• Univariate methods
• Information Gain
• Fisher’s Score
• ANOVA F-Value
• Correlation matrix with heat map
Filter Methods: Common Techniques
Technique | Description
Basic selection methods | Remove constant-value features; remove quasi-constant-value features.
Univariate selection methods | Estimate the degree of linear dependency between two variables (F-score). Methods include SelectKBest (k highest scores), SelectPercentile (percentile cut-off), SelectFpr (false positive rate), SelectFdr (false discovery rate) and SelectFwe (family-wise error). Chi-square techniques can be used to score features for SelectKBest.
Information Gain (Mutual Information) | Measures how much information the presence/absence of a feature contributes to making the correct prediction on the target. If X and Y are independent, knowing X gives no information about Y and vice versa, so their mutual information is zero. If X is a deterministic function of Y and Y is a deterministic function of X, then all information conveyed by X is shared with Y: knowing X determines the value of Y and vice versa, so the mutual information equals the uncertainty contained in Y (or X) alone, also known as the entropy of Y (or X). mutual_info_classif is used for discrete targets; mutual_info_regression for continuous targets.
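A minimal sketch of univariate filter selection in scikit-learn; the synthetic dataset and k=10 are illustrative choices, not from the slides:

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Illustrative data: 25 features, only 5 of which carry signal.
X, y = make_classification(n_samples=500, n_features=25,
                           n_informative=5, random_state=0)

# Score each feature independently against the target and keep the 10
# highest-scoring ones; SelectPercentile, SelectFpr, etc. work the same way.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)                      # (500, 10)
print(selector.get_support(indices=True))   # indices of the kept features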



Filter Methods: Common Techniques
Technique | Description
Fisher Score | Evaluates categorical variables in a classification task. Compares the observed distribution of the different classes of target Y among the different categories of the feature against the expected distribution of the target classes, regardless of the feature categories.
ANOVA F-Value (Analysis of Variance) | Used for quantitative features: compute the ANOVA F-value between each feature and the target vector. F-value scores examine whether there is a statistically significant difference between the means of the groups formed by the target vector for each feature.
Correlation Matrix with Heatmap | Correlation is a measure of the linear relationship between two or more features. Good features are highly correlated with the target but uncorrelated among themselves. The Pearson correlation method returns coefficient values between -1 and 1: a correlation of 0 between two features means changing either one will not affect the other; a correlation greater than 0 means increasing the values in one feature will also increase the values in the other; a correlation less than 0 means increasing the values in one feature will decrease the values in the other.
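A minimal sketch of a correlation matrix with heatmap, assuming pandas, seaborn, and an illustrative scikit-learn dataset:

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

# Illustrative numeric DataFrame with the target in a 'target' column.
df = load_breast_cancer(as_frame=True).frame

corr = df.corr(method="pearson")        # Pearson coefficients in [-1, 1]
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.title("Pearson correlation matrix")
plt.show()

# Rank features by absolute correlation with the target; ideally keep
# features correlated with the target but not with each other.
print(corr["target"].abs().sort_values(ascending=False).head(10))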



Filter Methods: Common Techniques

Technique | Description
Chi-Square Test | A statistical test used to assess the relationship between two categorical variables. It analyzes the relationship between a categorical feature and the target variable. A greater chi-square score indicates a stronger link between the feature and the target, i.e. the feature is more important for the classification task.
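A minimal sketch of chi-square feature scoring with scikit-learn; the count data here is randomly generated purely for illustration:

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# chi2 expects non-negative features (e.g. counts or one-hot encodings).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 8))   # illustrative count features
y = rng.integers(0, 2, size=200)        # binary target

scores, p_values = chi2(X, y)
print(scores)   # a greater score suggests a stronger feature/target link

# Keep the 3 highest-scoring features (k is an arbitrary choice here).
X_sel = SelectKBest(chi2, k=3).fit_transform(X, y)
print(X_sel.shape)   # (200, 3)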



Feature Selection: Wrapper Method
• Use a subset of features and train a model using
them.
• Based on the inferences drawn from the
previous model, decide to add or remove
features from the subset.
• Essentially reduced to a search problem.
• These methods are usually computationally
very expensive.
• Common techniques:
• Forward Selection
• Backwards Elimination
• Exhaustive selection
• Recursive selection
• Recursive feature elimination with cross-
validation



Wrapper Methods: Common Techniques

Technique | Description
Forward Selection | Start with an empty feature set and iteratively add features to the set.
Backward Selection | Opposite of the forward selection method: start with the entire feature set and iteratively remove features.
Exhaustive Feature Selection | Compares the performance of all possible feature subsets and chooses the best performing subset.
Recursive Feature Selection | Starts with the whole feature set and eliminates features repeatedly depending on their relevance as determined by the learning algorithm.
Recursive Feature Elimination with Cross Validation | Selects the best subset of features for the estimator by removing 0 to N features iteratively using recursive feature elimination; selects the best subset based on the accuracy or cross-validation score of the model.



Wrapper Methods: Forward Selection
• Start with no features in the model, i.e. an empty (reduced) set of features.
• Add the feature which best improves the model, until the addition of a new feature no longer improves its performance.
• The best of the original features is determined and added to the reduced set.
• At each later iteration, the best of the remaining original features is added to the set.
• Step forward feature selection:
• Starts by evaluating all features individually and selects the one that generates the best performing algorithm, according to a pre-set evaluation criterion.
• In the second step, evaluates all possible combinations of the selected feature and a second feature, and selects the pair that produces the best performing algorithm based on the same pre-set criterion.
• The pre-set criterion can be, for example, roc_auc for classification and r_squared for regression.
• The forward selection procedure is called greedy because it evaluates all possible single, double, triple, and so on feature combinations.
• Computationally expensive; if the feature space is big, it may not be feasible.
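A minimal sketch of step forward selection via scikit-learn's SequentialFeatureSelector; the estimator, scoring metric, and number of features are illustrative choices:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Greedy search: start from an empty set and add, at each step, the
# feature that most improves cross-validated roc_auc; stop at 5 features.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=5,
    direction="forward",      # "backward" starts from the full set instead
    scoring="roc_auc",
    cv=5,
)
sfs.fit(X, y)
print(sfs.get_support(indices=True))   # indices of the selected features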



Wrapper Methods: Backward Elimination
• Start with all the features and remove the least significant feature at each iteration, as long as removal improves the performance of the model.
• Repeat until no improvement is observed on removing features.
• The procedure starts with the full set of attributes and, at each step, removes the worst attribute remaining in the set.



Wrapper Methods: Exhaustive Feature Selection
• The best subset of features is selected, over all possible feature subsets, by optimizing
a specified performance metric for a certain machine learning algorithm.
• Example: if the classifier is a logistic regression and the dataset consists of 4 features,
the algorithm will evaluate all 15 feature combinations as follows:
• all possible combinations of 1 feature
• all possible combinations of 2 features
• all possible combinations of 3 features
• all the 4 features
• Select the one that results in the best performance (e.g., classification accuracy) of
the logistic regression classifier.
• Another greedy algorithm as it evaluates all possible feature combinations.
• Computationally expensive, and if feature space is big, can be unfeasible.
• The stopping criterion is an arbitrarily set number of features: the search will finish when it reaches the desired number of selected features.
• This is somewhat arbitrary, because it may select a sub-optimal number of features, or, likewise, an unnecessarily high number of features.



Wrapper Methods: Recursive Feature Elimination
• A greedy optimization algorithm which aims to find the best performing feature subset.
• Repeatedly creates models and sets aside the best or the worst performing feature at each iteration.
• Constructs the next model with the remaining features until all the features are exhausted.
• It then ranks the features based on the order of their elimination.
• In the worst case, if a dataset contains N features, RFE can end up searching over 2^N combinations of features.



Wrapper Methods: Recursive Feature Elimination
with Cross-Validation
• Recursive Feature Elimination with Cross-Validated (RFECV) feature
selection technique selects the best subset of features for the estimator by
removing 0 to N features iteratively using recursive feature elimination.
• Selects the best subset based on the accuracy or cross-validation score or
roc-auc of the model.
• Recursive feature elimination technique eliminates n features from a model
by fitting the model multiple times and at each step, removing the weakest
features.
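A minimal RFECV sketch with scikit-learn; the estimator and scoring metric are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)

# Repeatedly fit the estimator, drop the weakest feature each round,
# and keep the subset size with the best cross-validated score.
rfecv = RFECV(
    estimator=LogisticRegression(max_iter=5000),
    step=1,                    # features removed per iteration
    cv=StratifiedKFold(5),
    scoring="roc_auc",
)
rfecv.fit(X, y)
print(rfecv.n_features_)       # best subset size found
print(rfecv.support_)          # boolean mask of the selected features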



Feature Selection: Embedded Method
• Embedded methods are iterative in the sense that they take care of each iteration of the model training process and extract the features which contribute the most to the training for that iteration.
• Regularization methods are the most commonly used embedded methods; they penalize a feature given a coefficient threshold.
• Regularization methods are also called penalization methods: they introduce additional constraints into the optimization of a predictive algorithm that bias the model toward lower complexity (fewer coefficients).
• LASSO and RIDGE regression are popular examples of these methods, with built-in penalization functions to reduce overfitting.



Embedded Method: LASSO Regression
• Stands for Least Absolute Shrinkage and Selection Operator

• Linear regression equation can be represented as:


y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε
Where: y is the dependent variable (target).
β₀, β₁, β₂, ..., βₚ are coefficients (parameters) to be estimated.
x₁, x₂, ..., xₚ are the independent variables (features).
ε represents the error term.

RSS = Σᵢ (yᵢ − ŷᵢ)²    (residual sum of squares)

L1 = λ · (|β₁| + |β₂| + ... + |βₚ|)

Where: λ is the regularization parameter that controls the amount of regularization applied;
β₁, β₂, ..., βₚ are the coefficients.

Lasso objective: minimize RSS + L1



Embedded Method: LASSO Regression
• L1 regularization adds penalty equivalent to absolute value of the magnitude of
coefficients.
• Penalty is added to the different parameters of model to reduce the freedom of the
model and to avoid overfitting.
• In linear regularisation, the penalty is applied over the coefficients that multiply
each of the predictors.
• Lasso is able to shrink some of the coefficients to zero so that feature can be
removed from the model.
• Lasso regularisation helps to remove non-important features from the dataset.
• Increasing the penalty results in a higher number of features being removed.
• If the penalty is too high, important features may be removed, resulting in a drop in the performance of the algorithm.
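A minimal sketch of Lasso-based selection with scikit-learn; alpha plays the role of the penalty λ above, and its value here is illustrative:

from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X = StandardScaler().fit_transform(X)   # L1 penalties assume comparable scales

# Larger alpha = stronger penalty = more coefficients shrunk to exactly 0.
lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)                      # zeroed coefficients = dropped features

# Keep only the features whose coefficients survived the penalty.
selector = SelectFromModel(lasso, prefit=True)
print(selector.get_support(indices=True))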



Filter Based Feature Selection Approach
Numerical Input, Numerical Output
• Regression predictive modelling problem with numerical input variables.
• A common technique is to use a correlation coefficient:
• Pearson’s correlation coefficient (linear)
• Spearman’s rank coefficient (nonlinear)
Numerical Input, Categorical Output
• Classification predictive modelling problem with numerical input variables.
• A common technique is to use correlation with a categorical target:
• ANOVA correlation coefficient (linear)
• Kendall’s rank coefficient (nonlinear)



Statistics for Filter Based Feature Selection
Categorical Input, Numerical Output
• Regression predictive modelling problem with categorical input variables.
• Can use the “Numerical Input, Categorical Output” methods but in reverse.
Categorical Input, Categorical Output
• Classification predictive modelling problem with categorical input variables.
• Common correlation measure for categorical data is the chi-squared test.
• Can also use mutual information (information gain) method of information
theory.
• Mutual information is a powerful method that may prove useful for both
categorical and numerical data, e.g. it is agnostic to the data types.
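As a rough mapping of these input/output cases onto scikit-learn univariate scoring functions (a sketch under the stated assumptions, not from the slides):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Rough mapping to univariate scoring functions:
#   numerical input,  numerical output    -> f_regression (Pearson-based)
#   numerical input,  categorical output  -> f_classif (ANOVA F-value)
#   categorical input, categorical output -> chi2 or mutual_info_classif
# Spearman's and Kendall's rank coefficients live in scipy.stats
# (spearmanr, kendalltau) rather than in scikit-learn.

# Example: ANOVA F-test for a classification problem with numerical inputs.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)
X_sel = SelectKBest(f_classif, k=4).fit_transform(X, y)
print(X_sel.shape)   # (300, 4)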



Data Sampling



What is Data Sampling?

• A method that allows us to get information about the population based on the statistics from a subset (sample) of the population, without having to investigate every individual.
• Example:
– When conducting research about a group of people, it’s rarely possible to collect data from every person in that group.
– We select a sample: the group of individuals who will actually participate in the research.



Data Sampling Steps

• Sample Goal: the property of the population that is to be estimated using the sample.
• Population: the scope or domain from which observations could theoretically be made.
• Selection Criteria: the methodology that will be used to accept or reject observations in the sample.
• Sample Size: the number of observations that will constitute the sample.



Data Sampling Process

1. Identify and define the target population
2. Select the sampling frame
3. Choose the sampling method
4. Determine the sample size
5. Collect the required data



Sampling Types

• Probabilistic sampling methods
• Simple random sampling
• Cluster sampling
• Systematic sampling
• Stratified random sampling
• Non-probabilistic sampling methods
• Convenience sampling
• Judgemental or Selective sampling
• Snow-ball sampling
• Quota sampling



Simple Random Sampling

• Every individual is chosen entirely randomly, and each member of the population has an equal chance of being selected.
• Simple random sampling reduces selection bias.
• It also reduces the chances of sampling error; the sampling error is lowest in this method across all the methods.
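A minimal sketch of simple random sampling with pandas; the population is illustrative:

import pandas as pd

# Illustrative population of 10,000 records.
population = pd.DataFrame({"id": range(10_000)})

# Every row has an equal chance of being selected.
sample = population.sample(n=500, random_state=42)
print(len(sample))   # 500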



Cluster Sampling

• A cluster sample uses subgroups of the population as the sampling unit rather than individuals.
• The population is divided into subgroups, known as clusters, and a whole cluster is randomly selected to be included in the study.
• This sampling is used when the focus is on a specific region or area.
• There are different types of cluster sampling: single-stage, two-stage and multi-stage cluster sampling methods.



Systematic Sampling

• Systematic sampling is a probability sampling method.
• Data is chosen from a target population by selecting a random starting point; after that, a member is selected at a fixed ‘sampling interval’.
• The sampling interval is calculated by dividing the entire population size by the desired sample size.
• Three types: systematic random sampling, linear systematic sampling, and circular systematic sampling.
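A minimal sketch of linear systematic sampling with NumPy; the population and sample size are illustrative:

import numpy as np

population = np.arange(10_000)                     # illustrative population
sample_size = 500

# Sampling interval = population size / desired sample size.
k = len(population) // sample_size                 # 20
start = int(np.random.default_rng(0).integers(k))  # random starting point
sample = population[start::k]                      # every k-th member afterwards
print(k, len(sample))                              # 20 500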



Stratified Random Sampling

• The population is divided into subgroups (called strata) based on different traits like gender, category, etc.
• The sample(s) are then selected from these subgroups.
• This type of sampling is used when representation is required from all subgroups of the population.
• Stratified sampling requires proper knowledge of the characteristics of the population.
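A minimal sketch of stratified sampling using scikit-learn's train_test_split with its stratify argument; the imbalanced population is illustrative:

import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative population: 90% of one class (stratum), 10% of the other.
df = pd.DataFrame({"x": range(1000),
                   "label": [0] * 900 + [1] * 100})

# Draw a 10% sample whose class proportions match each stratum.
_, sample = train_test_split(df, test_size=0.1,
                             stratify=df["label"], random_state=0)
print(sample["label"].value_counts())   # 90 zeros, 10 ones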



Convenience Sampling

• Convenience sampling (also known as availability sampling) is a method that relies on data collection from population members who are conveniently available.
• Facebook polls or questions are a popular example of convenience sampling.
• Convenience sampling is a type of sampling where the first available primary data source is used without additional requirements.



Judgmental or Selective Sampling

• Based on the assessment of experts in the field when choosing who to include in the sample.
• Takes less time than other methods, since there is a smaller data set to work with.



Snow-ball Sampling

• Existing participants are asked to nominate further people known to them, so that the sample increases in size like a rolling snowball.
• Effective when a sampling frame is difficult to identify.
• There is a significant risk of selection bias in snowball sampling, as the referenced individuals will share common traits with the person who recommends them.



Quota Sampling

• Members are selected based on pre-determined characteristics of the population.
• A fast way of collecting samples, but leaves space for bias.



Sampling Errors

• Selection bias
• Bias introduced when the selection of individuals in the sample isn’t random.
• The sample then cannot be representative of the population that is to be analyzed.
• Sampling error
• Statistical error that occurs when the researcher doesn’t select a sample
that represents the entire population of data.
• Results based on the sample don’t represent the results that would have
been obtained from the entire population.



Machine Learning



What is Machine Learning?

• “[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.” – Arthur Samuel, 1959

• “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” – Tom Mitchell, 1997

• Machine Learning is the science (and art) of programming computers so they can learn from data.



Machine Learning: Example

• An email spam filter is a Machine Learning program that learns to flag spam with the help of examples of spam (flagged by users) and examples of regular (non-spam or “ham”) emails.
• The examples that the system uses to learn are called the training set.
• Each training example is called a training instance (or sample).
• The task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined, e.g. the ratio of correctly classified emails.
• This particular performance measure is called accuracy and it is often used in classification tasks.
• If you download a copy of Wikipedia, your computer has a lot more data - is this Machine Learning?



Why Machine Learning?

Email Spam: Traditional Approach
• Analyze what spam typically looks like.
– Some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in the subject.
– Other similar patterns appear in the sender’s name, the email’s subject and body, and so on.
• Write a detection algorithm for each of the patterns observed; the program flags an email as spam if a number of these patterns are detected.
• Test the program and repeat the previous steps until it reaches an acceptable accuracy level.
• Since the problem is not trivial, the program will likely become a long list of complex rules and will be hard to maintain.

[Figure 1-1. The traditional approach]



Why Machine Learning?

Email Spam: Machine Learning Approach
• Automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham emails.
• The program is much shorter, easier to maintain, and most likely more accurate.

[Figure 1-2. Machine Learning approach]



Why Machine Learning?
• If spammers notice that all their emails containing “4U” are blocked, they might start writing “For U” instead.
• A spam filter using traditional programming techniques would need to be updated to flag “For U” emails.
– If spammers keep working around the spam filter, one will need to keep writing new rules forever.
• A spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without programmatic intervention.

[Figure 1-3. Automatically adapting to change]



Why Machine Learning?
• ML can solve problems that are too complex for the traditional approach, like speech recognition.
– The traditional approach can never scale to the required level.
– A Machine Learning algorithm can learn by itself using many examples of audio.
• ML can help humans learn, as algorithms can be inspected to see what they have learned.
– For instance, a spam filter can be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam.
– Sometimes this may reveal unsuspected correlations or new trends, leading to a better understanding of the problem.
• ML techniques can dig into large amounts of data and help discover patterns that are not immediately apparent. This is called data mining.
• To summarize, Machine Learning is great for:
– Problems for which existing solutions require a lot of hand-tuning or long lists of rules.
– Complex problems for which there is no good solution at all using a traditional approach.
– Fluctuating environments: a Machine Learning system can adapt to new data.
– Getting insights about complex problems and large amounts of data.

[Figure 1-4. Machine Learning can help humans learn]



History of Machine Learning



Types of Machine Learning

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning



Supervised Learning
• Supervised learning is where a known (labelled) dataset is used to classify or predict from the data in hand.
• Supervised learning methods learn from labelled data and then use that insight to make decisions on the testing data.
• Supervised learning has the following two major categories:
• Semi-supervised learning
• A type of learning where the initial training data is incomplete.
• In this type of learning, both labelled and unlabelled data are used in the training phase.
• Active learning
• In this type of learning, the machine learning system actively queries the user and learns on the go.
• This is a specialized case of supervised learning.



Supervised Learning
Types of Supervised learning
• Classification: A
classification problem is
when the output variable is
a category, such as “red” or
“blue” or “disease” and “no
disease”.
• Regression: A regression
problem is when the output
variable is a real value, such
as “dollars” or “weight”.
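A minimal sketch contrasting the two task types with scikit-learn; the datasets and models are illustrative:

from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: the target is a category (malignant vs. benign).
Xc, yc = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000).fit(Xc, yc)
print(clf.predict(Xc[:3]))   # discrete class labels (0 or 1)

# Regression: the target is a real value (disease progression score).
Xr, yr = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(Xr, yr)
print(reg.predict(Xr[:3]))   # continuous predicted values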



Supervised Learning: Example
• Face recognition
• Face recognizers use supervised approaches to identify new faces.
• Face recognizers extract information from a bunch of facial images that
are provided to it during the training phase.
• It uses insights gained after training to detect new faces.
• Spam detection
• Supervised learning helps distinguish spam emails in the inbox by
separating them from legitimate emails also known as ham emails.
• During this process, the training data enables learning, which helps such
systems to send ham emails to the inbox and spam emails to the Spam
folder.



Unsupervised Learning
• Unsupervised learning is a technique where the initial data is not labelled.
• Insights are drawn by processing data whose structure is not known beforehand.
• These are more complex processes, since the system learns by itself without any intervention.



Unsupervised Learning

Types of Unsupervised learning


• Clustering: A clustering
problem is where you want to
discover the inherent
groupings in the data, such as
grouping customers by
purchasing behavior.
• Association: An association
rule learning problem is
where you want to discover
rules that describe large
portions of your data, such as
people that buy X also tend to
buy Y.
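A minimal clustering sketch with scikit-learn's KMeans; the blob data is illustrative:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabelled data with three natural (but unannounced) groupings.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Discover the inherent groupings without using any labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])        # cluster assignment per point
print(km.cluster_centers_)    # the three discovered group centres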



Unsupervised Learning: Example
• User behavior analysis
• Behavior analytics uses unlabelled data about different human traits
and human interactions.
• This data is then used to put each individual into different groups based
on their behavior patterns.
• Market basket analysis
• Another example where unsupervised learning helps identify the likelihood that certain items will appear together.
• An example of such an analysis is shopping cart analysis, where chips, dips, and beer are likely to be found together in the basket.



Reinforcement Learning
• A type of dynamic programming where the machine learns from its environment to produce an output that will maximize the reward.
• The machine requires no external agent but learns from the surrounding processes in the environment.
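A minimal tabular Q-learning sketch on a toy five-state corridor (entirely illustrative, not from the slides); the agent learns from reward alone to walk right toward the goal:

import numpy as np

n_states, n_actions = 5, 2            # states 0..4; actions: 0=left, 1=right
Q = np.zeros((n_states, n_actions))   # action-value table
alpha, gamma, eps = 0.5, 0.9, 0.1     # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(500):                  # training episodes
    s = 0                             # start at the left end
    while s != 4:                     # state 4 is the rewarding goal
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2 = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s2 == 4 else 0.0   # the environment supplies the reward
        # Move the estimate toward reward plus discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1))   # learned policy: action 1 (right) in states 0..3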



Reinforcement Learning



Reinforcement Learning
• Self-driving cars
• Self-driving cars exhibit autonomous motion by learning from the environment.
• The robust vision technologies in such a system are able to adapt to surrounding traffic conditions.
• These technologies are amalgamated with complex software and hardware movements to make it possible to navigate through the traffic.
• Intelligent gaming programs
• DeepMind’s artificially intelligent AlphaZero program has been successful in learning a number of games in a matter of hours.
• Such systems use reinforcement learning in the background to quickly adapt game moves.
• AlphaZero was able to beat the well-known AI chess agent Stockfish with just four hours of training.



Machine Learning Types on a Page



Machine Learning Process



Machine Learning Process

Analysis phase
• The ingested data is analyzed to detect patterns in the data.
• The patterns are used to create explicit features or parameters that can be used to train the model.

Training phase
• Data parameters generated in the previous phase are used to create machine learning models in this phase.
• The training phase is an iterative process, where the data incrementally helps to improve the quality of prediction.

Testing phase
• Machine learning models created in the training phase are tested with more data and the model’s performance is assessed.
• Testing uses data that has not been used in the previous phases.
• Model evaluation may or may not require parameter training.

Application phase
• Tuned models are fed with real-world data at this phase.
• The model is deployed in the production environment.
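A minimal sketch of the training and testing phases with scikit-learn; the dataset and model are illustrative:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Hold out data the model never sees during training (testing phase rule).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)  # training
print(model.score(X_test, y_test))   # testing: accuracy on unseen data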



Learning Components

• Learning = Representation + Evaluation + Optimization

• Representation refers to one or a set of learning algorithms chosen for the learner to learn from.
• Also known as classifiers or the hypothesis space.
• The evaluation function scores the algorithms to distinguish the good algorithms from the not-so-good ones.
• Optimization determines the efficiency and accuracy of the evaluated algorithm and finds whether the algorithm has more than one optimum.



Learning Components: Examples

Table 1. The three components of learning algorithms (after Domingos, 2012).

Representation: Instances (K-nearest neighbor, Support vector machines); Hyperplanes (Naive Bayes, Logistic regression); Decision trees; Sets of rules (Propositional rules, Logic programs); Neural networks; Graphical models (Bayesian networks, Conditional random fields)

Evaluation: Accuracy/Error rate; Precision and recall; Squared error; Likelihood; Posterior probability; Information gain; K-L divergence; Cost/Utility; Margin

Optimization: Combinatorial optimization (Greedy search, Beam search, Branch-and-bound); Continuous optimization, either Unconstrained (Gradient descent, Conjugate gradient, Quasi-Newton methods) or Constrained (Linear programming, Quadratic programming)
Machine Learning in a Nutshell



Applications of Machine Learning
Prediction: Machine learning can be used in prediction systems. Considering the loan example, to compute the probability of a fault, the system will need to classify the available data into groups.

Image recognition: Machine learning can be used for face detection in an image. There is a separate category for each person in a database of several people.

Speech recognition: The translation of spoken words into text. It is used in voice searches and more. Voice user interfaces include voice dialling, call routing, and appliance control. It can also be used for simple data entry and the preparation of structured documents.

Medical diagnosis: ML is trained to recognize cancerous tissues.

Financial industry and trading: Companies use ML in fraud investigations and credit checks.

Examples: wearable fitness trackers like Fitbit, or intelligent home assistants like Google Home.



Machine Learning in Real Life
• A few real-life instances where ML is applied:
• Google search
• Spam mail filters
• Self-driving Google car
• Cyber fraud detection
• Recommendation engines: Facebook, Netflix, Amazon, Flipkart, Makemytrip, etc.
• Big Data analysis is more sophisticated due to ML:
• Enables the analysis of large chunks of Big Data.
• Automates generic methods/algorithms for data extraction and interpretation, replacing traditional statistical techniques.
• The rapid evolution of ML has enabled a large number of use cases and demands in modern life.



Thank You

