Module1 ML2 Final

The document discusses machine learning, focusing on supervised learning with linear regression and unsupervised learning with clustering. Linear regression is used to predict a dependent variable based on an independent variable, while clustering aims to group similar objects without labeled training data. Additionally, it introduces multidimensional scaling (MDS) as a technique for visualizing similarities in datasets.

Machine learning (ML)

Supervised learning – regression


• Linear regression plays an important role in the subfield of artificial intelligence
known as machine learning. The linear regression algorithm is one of the
fundamental supervised machine-learning algorithms due to its relative
simplicity and well-known properties.
• Linear regression analysis is used to predict the value of a variable based on
the value of another variable. The variable you want to predict is called the
dependent variable. The variable you are using to predict the other variable's
value is called the independent variable.
• A well-fitted regression model produces predicted values close to the
actual values.
• Hence, a regression model which ensures that the difference between
predicted and actual values is low can be considered a good model.
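As a minimal sketch of the idea above, the following fits a straight line to a handful of points using the closed-form least-squares formulas. The data (areas and values) and the function names `fit_line` and `predict` are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x


# Hypothetical data: independent variable (area) and dependent variable (value).
areas = [50.0, 70.0, 90.0, 110.0, 130.0]
values = [152.0, 208.0, 271.0, 329.0, 390.0]

intercept, slope = fit_line(areas, values)
predicted_at_100 = predict(intercept, slope, 100.0)
```

With this data the fitted slope is about 2.985 and the intercept about 1.35, so the predictions stay close to the observed values, which is what the bullets above call a good model.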
Model
• The figure represents a very simple problem of real estate value
prediction solved using a linear regression model. If ‘area’ is the
predictor variable (say x) and ‘value’ is the target variable (say y),
the linear regression model expresses y as a linear function of x.

For a certain value of x, say x̂, the value of y is predicted as ŷ,
whereas the actual value of y is Y (say).
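In the usual notation for simple linear regression, the model described above can be written as follows. The coefficient names a (intercept) and b (slope) are assumed here for illustration; they are not taken from the slide.

```latex
\[
  y = a + b\,x,
  \qquad
  \hat{y} = a + b\,\hat{x}
\]
```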
Linear regression

The distance between the actual value Y and the fitted or predicted
value ŷ is known as the residual. The regression model can be
considered well fitted if the residual, i.e. the difference between
the actual and predicted values, is small.
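A small sketch of how residuals can be used to compare two candidate lines. The data and both lines are hypothetical, and the sign convention (actual minus predicted) is an assumption; the distance described above is the absolute value of this quantity.

```python
def residuals(actual, predicted):
    """Residual for each point: actual value minus predicted value."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]


def sum_squared_residuals(actual, predicted):
    """A single number summarising how far predictions are from actual values."""
    return sum(r ** 2 for r in residuals(actual, predicted))


# Hypothetical observations.
xs = [1.0, 2.0, 3.0, 4.0]
actual = [3.0, 5.0, 7.5, 9.0]

# Two hypothetical candidate lines: y = 1 + 2x and y = 4 + x.
line_a = [1.0 + 2.0 * x for x in xs]
line_b = [4.0 + 1.0 * x for x in xs]

ssr_a = sum_squared_residuals(actual, line_a)
ssr_b = sum_squared_residuals(actual, line_b)
better = "line_a" if ssr_a < ssr_b else "line_b"
```

Here `line_a` has the smaller sum of squared residuals, so under this criterion it is the better fit.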
Unsupervised learning
• Unlike supervised learning, in unsupervised learning, there is no
labelled training data to learn from and no prediction to be made. In
unsupervised learning, the objective is to take a dataset as input and try
to find natural groupings or patterns within the data elements or
records.
• Therefore, unsupervised learning is often termed a descriptive model,
and the process of unsupervised learning is referred to as pattern
discovery or knowledge discovery.
Clustering
• Clustering is the main type of unsupervised learning. It intends to group or
organize similar objects together. For that reason, objects belonging to the
same cluster are quite similar to each other while objects belonging to different
clusters are quite dissimilar.
• Hence, the objective of clustering is to discover the intrinsic grouping of
unlabelled data and form clusters. Different measures of similarity can be
applied for clustering. One of the most commonly adopted similarity measures
is distance.
• Two data items are considered part of the same cluster if the distance
between them is small. In the same way, if the distance between the data
items is large, the items do not generally belong to the same cluster.
• This is also known as distance-based clustering.
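The distance rule above can be sketched with a simple threshold-based grouping: two points closer than a chosen threshold are placed in the same cluster, and clusters are grown transitively. This is only one possible form of distance-based clustering (effectively single linkage with a cut-off); the points, the threshold, and the function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster_by_distance(points, threshold):
    """Assign a cluster label to each point; points closer than `threshold`
    (directly or through a chain of neighbours) share a label."""
    labels = [None] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and euclidean(points[i], points[j]) < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical unlabelled data: two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.3, 7.9)]
labels = cluster_by_distance(points, threshold=1.0)
```

The first three points receive one label and the last two another, without any labelled training data being supplied.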
How does unsupervised learning work?
• As the name suggests, unsupervised learning uses self-learning
algorithms that learn without any labels or prior training.
Instead, the model is given raw, unlabeled data and has to infer
its own rules and structure the information based on similarities,
differences, and patterns without explicit instructions on how to
work with each piece of data.
• Unsupervised learning algorithms are better suited for more
complex processing tasks, such as organizing large datasets into
clusters. They are useful for identifying previously undetected
patterns in data and can help identify features useful for
categorizing data.
• Imagine that you have a large dataset about weather. An
unsupervised learning algorithm will go through the data and
identify patterns in the data points. For instance, it might group
data by temperature or similar weather patterns.
• While the algorithm itself does not understand these patterns
based on any previous information you provided, you can then
go through the data groupings and attempt to classify them
based on your understanding of the dataset. For instance, you
might recognize that the different temperature groups represent
all four seasons or that the weather patterns are separated into
different types of weather, such as rain, sleet, or snow.
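The weather example can be sketched with a one-dimensional k-means on temperature readings. K-means is a choice made here for illustration, not something the slides specify; the readings, the initial centres, and the function names are hypothetical.

```python
def assign(values, centers):
    """Put each value into the group of its nearest centre."""
    groups = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, initial_centers, max_iterations=50):
    """Alternate between assigning values and moving centres to group means."""
    centers = list(initial_centers)
    for _ in range(max_iterations):
        groups = assign(values, centers)
        # An empty group keeps its previous centre.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


# Hypothetical daily temperatures in degrees Celsius, with no labels attached.
temperatures = [-3.0, -1.0, 0.0, 14.0, 15.0, 16.0, 29.0, 30.0, 31.0]
centers, groups = kmeans_1d(temperatures, initial_centers=[0.0, 10.0, 20.0])
```

The algorithm only returns three groups of numbers. Interpreting them as, say, cold, mild, and hot periods is the step left to the person who understands the dataset.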
Bulk RNA-seq
MDS
• Multidimensional scaling (MDS) is a
means of visualizing the level of similarity
of individual cases of a dataset.
• MDS is used to translate "information
about the pairwise 'distances' among a
set of objects or individuals" into a
configuration of points mapped into an
abstract Cartesian space.
• It is a form of non-linear dimensionality
reduction.
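A minimal sketch of how pairwise distances can be turned into point coordinates, using classical (Torgerson) MDS. This is one specific, eigendecomposition-based variant chosen for brevity, not necessarily the one used to produce the plots in these slides. It assumes NumPy is available; the input points are hypothetical.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed objects as points whose Euclidean distances approximate `distances`."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distance matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigenvalues, eigenvectors = np.linalg.eigh(b)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    top_values = np.clip(eigenvalues[order], 0.0, None)
    return eigenvectors[:, order] * np.sqrt(top_values)


# Hypothetical objects: corners of a 3 x 4 rectangle, used only to build distances.
original = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
distances = np.linalg.norm(original[:, None, :] - original[None, :, :], axis=-1)

coords = classical_mds(distances)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered coordinates may be rotated or reflected relative to the original points, but their pairwise distances match the input, which is the property MDS aims to preserve.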
Infer sample quality (quality check) of bulk RNA-seq samples

FPKM counts

• Multidimensional scaling (MDS) is a set of data analysis techniques
used to explore the structure of (dis)similarity data.
• MDS represents a set of objects as points in a multidimensional
space in such a way that the points corresponding to similar objects
are located close together, while those corresponding to dissimilar
objects are located far apart.
• The investigator then attempts to make sense of the derived object
configuration by identifying meaningful regions and/or directions in
the space.
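One common way to check how faithfully a point configuration reflects the dissimilarities is a stress value. The sketch below computes Kruskal's stress-1 in its metric form, treating the dissimilarities directly as target distances; that simplification, the example data, and the function names are assumptions made for illustration.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(dissimilarities, points):
    """Stress-1: 0 means the configuration reproduces the dissimilarities exactly;
    larger values mean a poorer representation."""
    fitted = pairwise_distances(points)
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    numerator = sum((dissimilarities[i][j] - fitted[i][j]) ** 2 for i, j in pairs)
    denominator = sum(fitted[i][j] ** 2 for i, j in pairs)
    return math.sqrt(numerator / denominator)


# Hypothetical dissimilarities among three objects A, B, C.
dissimilarities = [
    [0.0, 3.0, 5.0],
    [3.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
]

faithful = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]   # distances 3, 5, 4
distorted = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0)]  # C placed far too close to A

stress_faithful = stress(dissimilarities, faithful)
stress_distorted = stress(dissimilarities, distorted)
```

The faithful configuration has zero stress, while the distorted one, which puts a dissimilar pair close together, scores clearly worse.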
Clusters from single-cell RNA-seq data
