
Dimensionality Reduction

Lecture 2.

Sangsoo Lim
Assistant Professor
School of AI Software Convergence
Dongguk University-Seoul
Course Overview
Lecture 1. Introduction to Dimensionality Reduction
- Introduction to dimensionality reduction
- Challenges in Bioinformatics
- Basics of Principal Component Analysis (PCA)
- PCA in omics data

Lecture 2. Advanced Linear & Non-linear Methods
- Linear Discriminant Analysis (LDA)
- t-SNE and UMAP
- Regression
- Canonical Correlation Analysis (CCA)
- Dimension reduction in multi-omics analysis

Lecture 3. Specialized Techniques & Real-World Applications
- Pathway-based dimension reduction in Bioinformatics
- Benchmarking joint dimension reduction
- Deep learning-based methods

Lecture 4. Practices in Bioinformatics
Recap on Lecture 1
• Dimension: the measure of a specific aspect of a physical object or a mathematical
or conceptual construct

• Curse of Dimensionality

• Singular Value Decomposition & Principal Component Analysis


Linear Discriminant Analysis (LDA)
• Supervised Technique: Uses class labels to maximize the between-class differences and minimize
the within-class differences.

• Dimensionality Reduction: Reduces data dimensions while preserving as much of the class
discriminatory information as possible.

• Projection: Projects data onto a line (or hyperplane) to achieve optimal class separation.

• Between-Class & Within-Class Scatter: Quantifies the separation between and compactness
within different classes.

• Assumptions: Assumes class-conditional densities are Gaussian and share a common covariance
matrix.

• Class Discriminatory Power: Aims to make data of one class distinct from another, enhancing
classification accuracy.

• Applications: Widely used in pattern recognition, face recognition, and predictive modeling.
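Below is a minimal, illustrative sketch of LDA used as a supervised dimensionality reduction step with scikit-learn; the synthetic dataset and parameter values are placeholders, not part of the lecture material.

```python
# Hedged sketch: LDA as supervised dimensionality reduction (scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data: 300 samples, 20 features, 3 classes (placeholder for real omics data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# With C = 3 classes, LDA can project onto at most C - 1 = 2 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)   # note: uses the class labels y (supervised)

print(X_lda.shape)  # (300, 2)
```

Unlike PCA, the projection axes here are chosen to separate the labelled classes rather than to maximize overall variance.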
LDA vs PCA
• Multiple classes and PCA

• Suppose there are C classes in the training data.

• PCA is based on the sample covariance which characterizes the scatter of the
entire data set, irrespective of class-membership.

• The projection axes chosen by PCA might not provide good discrimination
power.
LDA vs PCA
• What is the goal of LDA?

• Perform dimensionality reduction while preserving as much of the class


discriminatory information as possible.

• Seeks to find directions along which the classes are best separated.

• Takes into consideration not only the within-class scatter but also the between-class scatter.

• More capable of distinguishing image variation due to identity from variation due
to other sources such as illumination and expression.
Mathematics behind LDA
• LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter (one standard form of this criterion is written out below).

• Such a transformation should retain class separability while reducing the variation
due to sources other than identity (e.g., illumination).
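A standard way to write this objective (implied by the slide but not shown on it) is Fisher's criterion: with between-class scatter matrix $S_B$ and within-class scatter matrix $S_W$, LDA seeks the projection $\mathbf{W}$ that maximizes

$$J(\mathbf{W}) = \frac{\left|\mathbf{W}^\top S_B \mathbf{W}\right|}{\left|\mathbf{W}^\top S_W \mathbf{W}\right|}$$

whose columns are given by the leading eigenvectors of $S_W^{-1} S_B$.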
tSNE
• t-Distributed Stochastic Neighbor Embedding

• Aims to solve the problems of PCA

• Non-linear scaling to represent changes at different levels

• Optimal separation in 2-dimensions


tSNE
• A nonlinear embedding algorithm that is particularly adept at preserving points
within clusters
tSNE
• t-SNE does not preserve global data structure: distances are meaningful only within each cluster
tSNE – Mathematics
• Calculates the similarity of points in the high-dimensional space and of the corresponding points in the low-dimensional space

• Probability of $j$ being a neighbour of $i$:

$$p_{j|i} = \frac{\exp\!\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)} \qquad q_{j|i} = \frac{\exp\!\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq i} \exp\!\left(-\|y_i - y_k\|^2\right)}$$

• The loss is the Kullback–Leibler divergence between the two distributions:

$$\mathrm{loss} = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$
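A minimal NumPy sketch of the high-dimensional conditional probabilities above (illustrative only; real t-SNE implementations additionally tune each $\sigma_i$ by binary search so that the perplexity of $p_{\cdot|i}$ matches a user-chosen target):

```python
# Sketch: conditional probabilities p_{j|i} for one point i, given a bandwidth sigma_i.
import numpy as np

def p_conditional(X, i, sigma_i):
    """Return p_{j|i} for all j (with p_{i|i} set to 0), using a Gaussian kernel."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)      # squared distances ||x_i - x_j||^2
    w = np.exp(-d2 / (2.0 * sigma_i ** 2))
    w[i] = 0.0                                # a point is not its own neighbour
    return w / w.sum()
```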
How does tSNE work?
• Based around all-vs-all table of pairwise cell to cell distances

0 10 10 295 158 153


9 0 1 217 227 213
1 8 0 154 225 238
205 189 260 0 23 45
248 227 246 44 0 54
233 176 184 41 36 0
Distance scaling and perplexity
• Perplexity = expected number of neighbours within a cluster

• Distances scaled relative to perplexity neighbours

0 4 6 586 657 836


4 0 4 815 527 776
9 3 0 752 656 732
31 28 29 0 4 7
31 24 25 4 0 7
40 37 32 8 8 0
Perplexity Robustness
tSNE Projection
• Randomly scatter all points within the space (normally 2D)

• Start a simulation

• Aim is to make the point distances match the distance matrix

• Shuffle points based on how well they match

• Stop after fixed number of iterations, or

• Stop after distances have converged


tSNE Projection

• X and Y don’t mean anything (unlike PCA)

• Distance doesn’t mean anything (unlike PCA)

• Close proximity is highly informative

• Distant proximity isn’t very interesting

• Can’t rationalize distances, or add in more data
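A hedged usage sketch with scikit-learn's TSNE; the data and parameter values are illustrative, not recommendations from the lecture.

```python
# Sketch: running t-SNE on a (cells x features) matrix X with scikit-learn.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))            # placeholder data

tsne = TSNE(n_components=2,               # project to 2D
            perplexity=30,                # ~expected number of close neighbours
            init="pca",                   # PCA initialisation is a common choice
            random_state=0)
X_2d = tsne.fit_transform(X)              # (500, 2); axes have no intrinsic meaning
```

Because the embedding is stochastic, re-running with a different random_state gives a different layout; only local neighbourhoods are comparable.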


tSNE: Perplexity Settings Matter

(Panels: original data; perplexity = 2; perplexity = 30; perplexity = 100)

https://distill.pub/2016/misread-tsne/
tSNE: Cluster Sizes are Meaningless

(Panels: original data; perplexity = 5; perplexity = 50)

https://distill.pub/2016/misread-tsne/
tSNE: Distances between clusters can’t be trusted

(Panels: original data; perplexity = 5; perplexity = 30)

https://distill.pub/2016/misread-tsne/
So tSNE is great then?

Kind of…

• Imagine a dataset with only one super-informative gene
• Now 3 genes
• Now 3,000 genes
• With thousands of genes, everything becomes roughly the same distance from everything:
  – Few genes: distance within cluster = low, distance between clusters = high
  – Many genes: distance within cluster = higher, distance between clusters = lower
So what is the choice?

• PCA
  – Requires more than 2 dimensions
  – Thrown off by quantised data
  – Expects linear relationships
• tSNE
  – Can't cope with noisy data
  – Loses the ability to cluster
Answer: Combine the two methods, get the best of both worlds.

• PCA
  – Good at extracting signal from noise
  – Extracts informative dimensions
• tSNE
  – Can reduce to 2D well
  – Can cope with non-linear scaling

This is what many pipelines do in their default analysis.


So PCA + tSNE is great then?

Kind of…
• tSNE is slow. This is probably its biggest crime.
  – tSNE doesn't scale well to large numbers of cells (10k+).

• tSNE only gives reliable information on the closest neighbours; large-distance information is almost irrelevant.
UMAP to the rescue!

• UMAP is a replacement for tSNE to fulfil the same role

• Conceptually very similar to tSNE, but with a couple of relevant (and


somewhat technical) changes

• Practical outcome is:


– UMAP is quite a bit quicker than tSNE
– UMAP can preserve more global structure than tSNE*
– UMAP can run on raw data without PCA preprocessing*
– UMAP can allow new data to be added to an existing projection

* In theory, but possibly not in practice


UMAP differences

• Instead of the single perplexity value in tSNE, UMAP defines


– Nearest neighbours: the number of expected nearest neighbours –
basically the same concept as perplexity

– Minimum distance: how tightly UMAP packs points which are close
together

• Nearest neighbours will affect the influence given to global vs local


information. Min dist will affect how compactly packed the local parts of the
plot are.
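A hedged sketch using the third-party umap-learn package (assuming it is installed; the data and parameter values are illustrative):

```python
# Sketch: UMAP with its two key parameters, n_neighbors and min_dist.
import numpy as np
import umap  # the umap-learn package

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))            # placeholder data

reducer = umap.UMAP(n_neighbors=15,       # ~perplexity: local vs global balance
                    min_dist=0.1,         # how tightly close points are packed
                    random_state=0)
X_2d = reducer.fit_transform(X)           # (500, 2)
```

Larger n_neighbors favours global structure; larger min_dist spreads out tightly packed clusters.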
UMAP differences

• Structure preservation – mostly in the 2D projection scoring

[Figure: scoring (penalty) value as a function of distance in the projected data vs. distance in the original data, compared for tSNE and UMAP]
https://towardsdatascience.com/how-exactly-umap-works-13e3040e1668
So UMAP is great then?

• Kind of…

[Figure: the same dataset projected with tSNE and with UMAP]
So UMAP is all hype then?

• No, it really does better for some datasets…

3D mammoth skeleton projected into 2D


– tSNE (perplexity = 2000): 2 h 5 min
– UMAP (n_neighbours = 200, min_dist = 0.25): 3 min


https://pair-code.github.io/understanding-umap/
Practical approach PCA + tSNE/UMAP

• Filter heavily before starting


– Nicely behaving cells
– Expressed genes
– Variable genes

• Do PCA
– Extract most interesting signal
– Take top PCs. Reduce dimensionality (but not to 2)

• Do tSNE/UMAP
– Calculate distances from PCA projections
– Scale distances and project into 2-dimensions
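A hedged sketch of this recipe with scikit-learn; the placeholder matrix stands in for an already-filtered and normalised cells × genes matrix, and the number of PCs and the perplexity are illustrative choices only.

```python
# Sketch: reduce with PCA first, then embed the top PCs with t-SNE (or UMAP).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2000))               # placeholder: cells x genes, pre-filtered/normalised

X_pcs = PCA(n_components=30).fit_transform(X)   # keep the top PCs (not just 2)
X_2d = TSNE(n_components=2, perplexity=30,
            init="pca", random_state=0).fit_transform(X_pcs)
print(X_2d.shape)                               # (1000, 2)
```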
So PCA + UMAP is great then?

Kind of… as long as you only have one dataset


– In 10X every library is a 'batch'
– More biases over time/distance
– Biases prevent comparisons
– Need to align the datasets
Data Integration

• Works on the basis that there are 'equivalent' collections


of cells in two (or more) datasets

• Find 'anchor' points which are equivalent cells which


should be aligned

• Quantitatively skew the data to optimally align the anchors


UMAP/tSNE integration

Define key 'anchor' points between equivalent cells


UMAP/tSNE integration

Skew data to align the anchors


Defining Integration Anchors

• Mutual Nearest Neighbours (MNN)

For each cell in data1 find the 3 closest cells in data2.


Defining Integration Anchors

• Mutual Nearest Neighbours (MNN)

Do the same thing the other way around.


Defining Integration Anchors

• Mutual Nearest Neighbours (MNN)

Select pairs of cells which are in each other's nearest-neighbour groups.
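A minimal NumPy sketch of this mutual-nearest-neighbour idea (illustrative only: the toy matrices, the choice of k, and the Euclidean metric are assumptions, and real pipelines work on a cleaner representation such as CCA or PCA space, as discussed below).

```python
# Sketch: mutual nearest neighbours (MNN) between two cell x feature matrices.
import numpy as np

def mutual_nearest_neighbours(A, B, k=3):
    """Return (i, j) index pairs where A[i] and B[j] are in each other's k-NN."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    nn_a = np.argsort(d, axis=1)[:, :k]       # for each cell in A, k closest in B
    nn_b = np.argsort(d, axis=0)[:k, :].T     # for each cell in B, k closest in A
    return [(i, j) for i in range(A.shape[0]) for j in nn_a[i] if i in nn_b[j]]

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))                 # cells x features, dataset 1
B = rng.normal(size=(60, 10))                 # cells x features, dataset 2
print(mutual_nearest_neighbours(A, B, k=3)[:5])
```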


Defining Integration Anchors

• Distance in original expression quantitation


– Really noisy (different technology, normalization, depth)
– Slow and prone to mis-prediction

• Use a cleaner (less noisy) representation


– Correlation (CCA)
– Principal Components (rPCA)
Canonical Correlation Analysis (at a glance)

[Figure: two genes × cells expression matrices, DataSet 1 and DataSet 2]

Gene expression values may match poorly between the datasets, but gene correlations are more robust.
Reciprocal PCA (at a glance)

• Define a PCA space for Data 1 and a PCA space for Data 2
• Project the cells from Data 2 into the Data 1 PCA space; repeat by projecting the cells from Data 1 into the Data 2 PCA space
• Find mutual nearest neighbours between the two projections

[Figure: PC1 vs PC2 plots of each dataset projected into the other dataset's PCA space]
Linear Regression with Regularization
• Linear regression is a linear approach to modeling the relationship between
• a scalar response (or dependent variable) and

• one or more explanatory variables (or independent variables)

Regression formula:

$$y_i = \beta_0 \cdot 1 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \dots, n$$

with fitted values denoted $\hat{y}_i$.

Loss term (quantitative reasoning): least-squares error

$$\mathrm{loss} = \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Linear Regression with Regularization
• The fitted model can be polynomial (see the sketch below)
• class sklearn.preprocessing.PolynomialFeatures

• How to decide the degree?
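A hedged sketch of polynomial regression with scikit-learn; the toy data and the degree are placeholders, and choosing the degree is exactly the model-selection question raised above.

```python
# Sketch: polynomial features + linear regression (scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * x[:, 0] ** 3 - x[:, 0] + rng.normal(scale=1.0, size=100)  # toy cubic data

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[1.0]])))
```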


Linear Regression with Regularization
• Loss regularization
• constrains the expressive power of the model

$$\mathrm{loss} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

where, as before, $y_i = \beta_0 \cdot 1 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i$, $i = 1, \dots, n$, and $\hat{y}_i$ is the fitted value.

• The model parameters are learned not only to reduce the squared errors but also to shrink the sum of the coefficient magnitudes $|\beta_j|$.
LASSO, Ridge, and ElasticNet

Regularization term:
– LASSO: $\lambda \sum |\beta|$
– Ridge: $\lambda \sum \beta^2$
– ElasticNet: $\lambda_1 \sum |\beta| + \lambda_2 \sum \beta^2$
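A hedged scikit-learn sketch of the three penalties on synthetic data; the alpha values are placeholders, and note that scikit-learn calls the regularization strength alpha rather than λ.

```python
# Sketch: the same linear model fit with the three regularization schemes.
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
beta = np.zeros(50)
beta[:5] = [3, -2, 1.5, 0.8, 4]                          # only a few informative features
y = X @ beta + rng.normal(scale=0.5, size=200)

models = {
    "LASSO":      Lasso(alpha=0.1),                      # L1 penalty
    "Ridge":      Ridge(alpha=1.0),                      # L2 penalty
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),   # mix of L1 and L2
}
for name, m in models.items():
    m.fit(X, y)
    print(name, "non-zero coefficients:", np.sum(m.coef_ != 0))
```

LASSO tends to drive many coefficients exactly to zero (feature selection), Ridge shrinks them smoothly, and ElasticNet interpolates between the two.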
Canonical Correlation Analysis (CCA)
• Q: Axes that are shared by two datasets?

• Canonical components:
• linear combinations of the variables in each group

• The goal of CCA is to find maximally correlated canonical components


Canonical Correlation Analysis (CCA)
• What are canonical components (CCs) that are shared between two
datasets?

• Axes that are shared by two datasets are selected

• The $i$-th CC is orthogonal to the first $(i-1)$ CCs


Canonical Correlation Analysis (CCA)
• Applications
  e.g.) single-cell batch correction: CCA embeds two different feature sets into a common space

Stuart et al. 2019. Cell


Canonical Correlation Analysis (CCA)
• How to compute CCs?
  1. Compute the cross-covariance matrix $K_{XY}$ of datasets $X$ and $Y$
  2. Compute the covariance matrices $K_{XX}$ of dataset $X$ and $K_{YY}$ of dataset $Y$, respectively
  3. Eigenvalue decomposition to find the canonical weights:

$$K_{YY}^{-1} K_{YX} K_{XX}^{-1} K_{XY} \, \mathbf{w}_b = \rho^2 \, \mathbf{w}_b, \qquad \mathbf{w}_a = \frac{K_{XX}^{-1} K_{XY} \, \mathbf{w}_b}{\rho}$$

• Power method for approximation
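A hedged sketch with scikit-learn's CCA, which handles this decomposition internally; the dataset shapes and the shared latent signal are placeholders for illustration.

```python
# Sketch: canonical components shared by two data sets (scikit-learn).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))                     # shared signal
X = latent @ rng.normal(size=(2, 30)) + 0.5 * rng.normal(size=(100, 30))
Y = latent @ rng.normal(size=(2, 20)) + 0.5 * rng.normal(size=(100, 20))

cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X, Y)                     # canonical components
# Correlation of the first pair of canonical components:
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])
```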


Canonical Correlation Analysis (CCA)
Pros & Cons of CCA
• Remove data-specific variations or noise in the two datasets

• Find latent variable shared by the two data sets.

• Only considers linear transformations of the original variables

• Only considers linear correlation of canonical components



Multi-Omics Factor Analysis (MOFA)
A general framework for the unsupervised integration of multi-omic data sets

Source: EMBL-EBI
What are the problems of CCA for multi-omics data
integration?

• In CCA the canonical components are defined as linear combinations


of features that maximize the cross-correlation between the two data
sets. This implies that:
• It only works for the integration of 2 data sets

• It only finds sources of covariation between the two data sets. CCA is not able to
find the sources of variation that are present within individual data sets
Multi-Omics Factor Analysis (MOFA)

• The structure of the data is specified in


the prior distributions of the Bayesian
model

• The critical part of the model is the use of sparsity priors, which enable automatic relevance determination of the factors
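As a rough sketch of the underlying model (following the general description in Argelaguet et al., 2018; exact priors omitted), each omics view $m$ is factorised into shared factors and view-specific weights:

$$\mathbf{Y}^{(m)} \approx \mathbf{Z} \, \mathbf{W}^{(m)\top} + \boldsymbol{\varepsilon}^{(m)}, \qquad m = 1, \dots, M$$

where $\mathbf{Y}^{(m)}$ is the samples × features matrix of view $m$, $\mathbf{Z}$ is the shared samples × factors matrix, and $\mathbf{W}^{(m)}$ are the view-specific loadings on which the sparsity (ARD) priors act.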
Multi-Omics Factor Analysis (MOFA)

[Figure: differentiation state as a hidden/latent variable]
Multi-Omics Factor Analysis (MOFA)

[Figure: the differentiation state (hidden/latent variable) underlying the observed omics variables]
Multi-Omics Factor Analysis (MOFA)
MOFA: Downstream Analysis
MOFA Applications: Chronic Lymphocytic Leukemia

• Inspection of feature weights for Factors 1 and 2


MOFA Applications: Chronic Lymphocytic Leukemia

• Visualization of samples in the latent space


MOFA Applications: Chronic Lymphocytic Leukemia

• Factors are associated with clinical response.


MOFA Applications: Chronic Lymphocytic Leukemia

biofam.github.io/MOFA2
