The document discusses machine learning, focusing on supervised learning with linear regression and unsupervised learning with clustering. Linear regression is used to predict a dependent variable based on an independent variable, while clustering aims to group similar objects without labeled training data. Additionally, it introduces multidimensional scaling (MDS) as a technique for visualizing similarities in datasets.
Module1 ML2 Final
Machine learning (ML)
Supervised learning – regression
• Linear regression plays an important role in the subfield of artificial intelligence known as machine learning. It is one of the fundamental supervised machine-learning algorithms because of its relative simplicity and well-known properties.
• Linear regression analysis is used to predict the value of one variable based on the value of another. The variable you want to predict is called the dependent variable. The variable you use to predict it is called the independent variable.
• A well-fitted regression model produces predicted values close to the actual values.
• Hence, a regression model that keeps the difference between predicted and actual values low can be considered a good model.
Model
• The figure represents a very simple real estate value prediction problem solved using a linear regression model. If 'area' is the predictor variable (say x) and 'value' is the target variable (say y), the linear regression model can be represented as a straight line, in which y equals an intercept plus a slope multiplied by x.
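A minimal sketch of fitting such a model by ordinary least squares, assuming the usual single-predictor form y = a + b*x. The names `fit_line`, `a` and `b` are hypothetical, and the (area, value) pairs are made up for illustration only. The last line computes the differences between actual and predicted values, which a good model keeps small.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line always passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Made-up data: area as predictor x, value as target y (assumed units).
areas = [50.0, 70.0, 90.0, 110.0]
values = [105.0, 138.0, 185.0, 212.0]

a, b = fit_line(areas, values)
predicted = [a + b * x for x in areas]
# Actual minus predicted value for each observation.
residuals = [y - y_hat for y, y_hat in zip(values, predicted)]
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), so their sum says nothing about fit quality. The sum of their squares is the usual measure.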
For a certain value of x, say x̂, the value of y is predicted as ŷ, whereas the actual value of y is Y (say).
Linear regression
The distance between the actual value Y and the fitted or predicted value ŷ is known as the residual. The regression model can be considered well fitted if the residual, i.e. the difference between the actual and predicted values, is small.
Unsupervised learning
• Unlike supervised learning, in unsupervised learning there is no labelled training data to learn from and no prediction to be made. The objective is to take a dataset as input and try to find natural groupings or patterns within the data elements or records.
• Therefore, unsupervised learning is often termed a descriptive model, and the process of unsupervised learning is referred to as pattern discovery or knowledge discovery.
Clustering
• Clustering is the main type of unsupervised learning. It aims to group or organize similar objects together, so that objects belonging to the same cluster are quite similar to each other while objects belonging to different clusters are quite dissimilar.
• Hence, the objective of clustering is to discover the intrinsic grouping of unlabelled data and form clusters. Different measures of similarity can be applied for clustering. One of the most commonly adopted similarity measures is distance.
• Two data items are considered part of the same cluster if the distance between them is small. In the same way, if the distance between the data items is large, the items do not generally belong to the same cluster.
• This is also known as distance-based clustering.
How does unsupervised learning work?
• As the name suggests, unsupervised learning uses self-learning algorithms: they learn without any labels or prior training. Instead, the model is given raw, unlabeled data and has to infer its own rules and structure the information based on similarities, differences, and patterns, without explicit instructions on how to work with each piece of data.
• Unsupervised learning algorithms are better suited for more complex processing tasks, such as organizing large datasets into clusters. They are useful for identifying previously undetected patterns in data and can help identify features useful for categorizing data.
• Imagine that you have a large dataset about weather. An unsupervised learning algorithm will go through the data and identify patterns in the data points. For instance, it might group data by temperature or similar weather patterns.
• While the algorithm itself does not understand these patterns based on any previous information you provided, you can then go through the data groupings and attempt to classify them based on your understanding of the dataset. For instance, you might recognize that the different temperature groups represent the four seasons, or that the weather patterns are separated into different types of weather, such as rain, sleet, or snow.
Bulk RNA-seq
MDS
• Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.
• MDS is used to translate "information about the pairwise 'distances' among a set of objects or individuals" into a configuration of points mapped into an abstract Cartesian space.
• It is a form of non-linear dimensionality reduction.
Infer sample quality (quality check) of bulk RNA-seq samples
FPKM counts
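The translation from pairwise distances to a configuration of points can be sketched with classical (Torgerson) MDS. This is one specific variant, chosen here because it is short; other MDS variants instead minimize a stress function iteratively. The function names are hypothetical. Power iteration stands in for a library eigen-solver so that the sketch needs only the standard library. The four input points are toy 2-D data, not RNA-seq samples.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classical_mds(dist, dims=2, iters=500):
    """Embed n objects in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration: converges to the largest remaining eigenpair.
        v = [float(i + 1) for i in range(n)]  # arbitrary fixed start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflation: remove the component just found before the next axis.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Toy input: only the pairwise distances are passed to the algorithm.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
dist = [[euclidean(p, q) for q in points] for p in points]
embedding = classical_mds(dist)
```

The recovered coordinates may be rotated, reflected or shifted relative to the original points, because distances alone do not fix an orientation. What the embedding preserves is the set of pairwise distances.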
• Multidimensional scaling (MDS) is a set of data analysis techniques used to explore the structure of (dis)similarity data.
• MDS represents a set of objects as points in a multidimensional space in such a way that the points corresponding to similar objects are located close together, while those corresponding to dissimilar objects are located far apart.
• The investigator then attempts to make sense of the derived object configuration by identifying meaningful regions and/or directions in the space.
Clusters from single-cell RNA-seq data
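The distance-based clustering idea described earlier, in which items that are close together share a cluster, can be illustrated with a minimal k-means sketch. k-means is one common distance-based method, chosen here for illustration; the slides do not name a specific algorithm. The function names, the naive choice of the first k points as starting centroids, and the toy 2-D points (not single-cell RNA-seq data) are all assumptions.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: returns (labels, centroids)."""
    # Naive initialisation (an assumption): use the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
    return labels, centroids


# Toy data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
```

The result depends on the starting centroids, so practical implementations usually try several initialisations and keep the best one.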