Lecture 2. Dimension Reduction
Sangsoo Lim
Assistant Professor
School of AI Software Convergence
Dongguk University-Seoul
Course Overview
• Curse of Dimensionality
• Dimensionality Reduction: Reduces data dimensions while preserving as much of the class discriminatory information as possible.
• Projection: Projects data onto a line (or hyperplane) to achieve optimal class separation (see the sketch below).
• Between-Class & Within-Class Scatter: Quantifies the separation between and compactness within different classes.
• Assumptions: Assumes class-conditional densities are Gaussian and share a common covariance matrix.
• Class Discriminatory Power: Aims to make data of one class distinct from another, enhancing classification accuracy.
• Applications: Widely used in pattern recognition, face recognition, and predictive modeling.
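As a concrete illustration of the projection idea above, here is a minimal LDA sketch with scikit-learn (the dataset and parameters are illustrative assumptions, not from the lecture):

```python
# Minimal LDA sketch: project labelled data onto the axes that best separate the classes.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                  # 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)   # at most (n_classes - 1) discriminant axes
X_proj = lda.fit_transform(X, y)                   # data projected for maximal class separation
print(X_proj.shape)                                # (150, 2)
```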
LDA vs PCA
• Multiple classes and PCA
• PCA is based on the sample covariance, which characterizes the scatter of the entire data set, irrespective of class membership.
• The projection axes chosen by PCA might not provide good discrimination power.
LDA vs PCA
• What is the goal of LDA?
• Seeks to find directions along which the classes are best separated.
• More capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.
Mathematics behind LDA
• LDA computes a transformation that maximizes the between-class scatter while minimizing the within-class scatter (see the Fisher criterion below).
• Such a transformation should retain class separability while reducing the variation due to sources other than identity (e.g., illumination).
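One standard form of this objective is the Fisher criterion (stated here for reference; $S_B$ and $S_W$ denote the between-class and within-class scatter matrices):

$$J(W) = \frac{\left| W^\top S_B W \right|}{\left| W^\top S_W W \right|}$$

The columns of the optimal $W$ are the leading generalized eigenvectors of $S_W^{-1} S_B$.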
tSNE
• t-Distributed Stochastic Neighbor Embedding
• $\text{Cost} = \sum_i \mathrm{KL}(P_i \,\|\, Q_i) = \sum_i \sum_j p_{j|i} \log \dfrac{p_{j|i}}{q_{j|i}}$
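For reference, the similarities entering this cost are usually defined as follows. In the high-dimensional space the conditional similarities are Gaussian:

$$p_{j|i} = \frac{\exp\!\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}$$

while in the low-dimensional embedding tSNE uses a heavy-tailed Student-t kernel for the $q$ terms:

$$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}}$$

Here $x$ are the original points, $y$ their low-dimensional embeddings, and $\sigma_i$ is set per point from the perplexity; the slide's cost is written with the conditional $q_{j|i}$, as in the original SNE formulation.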
How does tSNE work?
• Based around all-vs-all table of pairwise cell to cell distances
• Start a simulation
https://distill.pub/2016/misread-tsne/
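A minimal scikit-learn sketch of running tSNE on a cell-by-feature matrix (the data and parameter values are illustrative assumptions, not from the lecture):

```python
# t-SNE sketch: embed a high-dimensional matrix into 2D for visualisation.
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(500, 50))   # e.g. 500 cells x 50 features

tsne = TSNE(
    n_components=2,    # target dimensionality for plotting
    perplexity=30,     # roughly the effective number of neighbours per point
    init="pca",        # PCA initialisation is a common, more stable choice
    random_state=0,
)
X_2d = tsne.fit_transform(X)                           # (500, 2) embedding
```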
tSNE: Cluster Sizes are Meaningless
https://distill.pub/2016/misread-tsne/
tSNE: Distances between clusters can’t be trusted
https://distill.pub/2016/misread-tsne/
So tSNE is great then?
Kind of…
Imagine a dataset with only one super informative gene
• Now 3 genes
• Now 3,000 genes
• PCA
  • Requires more than 2 dimensions
  • Thrown off by quantised data
  • Expects linear relationships
• tSNE
  • Can't cope with noisy data
  • Loses the ability to cluster
Answer: Combine the two methods, get the best of both worlds.
• PCA
  – Good at extracting signal from noise
  – Extracts informative dimensions
• tSNE
  – Can reduce to 2D well
  – Can cope with non-linear scaling
Kind of…
• tSNE is slow. This is probably its biggest crime.
  – tSNE doesn't scale well to large numbers of cells (10k+).
• tSNE only gives reliable information on the closest neighbours; large-distance information is almost irrelevant.
UMAP to the rescue!
– Minimum distance: how tightly UMAP packs points which are close together
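A small umap-learn sketch showing where this parameter lives (values are illustrative assumptions):

```python
# UMAP sketch: min_dist controls how tightly neighbouring points are packed in the embedding.
import numpy as np
import umap

X = np.random.default_rng(0).normal(size=(500, 50))

reducer = umap.UMAP(
    n_neighbors=15,   # balances local vs. global structure
    min_dist=0.1,     # smaller values give tighter, clumpier clusters
    random_state=0,
)
embedding = reducer.fit_transform(X)   # (500, 2)
```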
[Figure: scoring (penalty) value as a function of distance in the projected data vs. distance in the original data, comparing tSNE and UMAP]
https://towardsdatascience.com/how-exactly-umap-works-13e3040e1668
So UMAP is great then?
• Kind of…
[Figure: tSNE and UMAP embeddings]
So UMAP is all hype then?
• Do PCA
– Extract most interesting signal
– Take top PCs. Reduce dimensionality (but not to 2)
• Do tSNE/UMAP
– Calculate distances from PCA projections
– Scale distances and project into 2-dimensions
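A sketch of this PCA-then-UMAP workflow (the number of PCs and the UMAP settings are illustrative assumptions):

```python
# PCA -> UMAP pipeline: denoise/reduce with PCA first, then embed the PCs into 2D.
import numpy as np
import umap
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(1000, 2000))   # cells x genes

# 1) PCA: extract the informative signal, reduce dimensionality (but not to 2)
pcs = PCA(n_components=30).fit_transform(X)               # (1000, 30)

# 2) UMAP: distances are computed on the PCA projection and embedded into 2D
embedding = umap.UMAP(n_neighbors=15, min_dist=0.3, random_state=0).fit_transform(pcs)
```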
So PCA + UMAP is great then?
[Figure: two Genes × Cells expression matrices, DataSet 1 and DataSet 2]
Gene Expression Values may match poorly, but gene correlations are more robust
Reciprocal PCA (at a glance)
[Figure: a PCA space (PC1, PC2) is defined separately for Data 1 and Data 2; each data set is then projected into the other data set's PCA space, e.g. Data 2 into the Data 1 PCA space]
Linear Regression with Regularization
• Linear regression is a linear approach to modeling the relationship between
• a scalar response (or dependent variable) and
• one or more explanatory variables (independent variables).
Regression formula
$y_i = \beta_0 \cdot 1 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n$
• The model parameters are learned not only to reduce the squared errors but also to keep the coefficients $\beta_j$ small (the penalized objective is sketched below).
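Written out (a standard formulation, assumed here since the slide's own expression is not shown), the penalized least-squares objective is:

$$\hat{\beta} = \arg\min_{\beta}\; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda\, \Omega(\beta)$$

where $\Omega(\beta) = \sum_j |\beta_j|$ for LASSO, $\sum_j \beta_j^2$ for Ridge, and a weighted mix of the two for Elastic Net.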
LASSO, Ridge, and ElasticNet
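A minimal scikit-learn sketch of these three penalties (the data and alpha values are illustrative assumptions):

```python
# LASSO (L1), Ridge (L2) and Elastic Net (L1 + L2) on toy data.
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] + rng.normal(size=100)      # only the first feature is informative

lasso = Lasso(alpha=0.1).fit(X, y)            # L1 penalty: drives many coefficients to exactly 0
ridge = Ridge(alpha=1.0).fit(X, y)            # L2 penalty: shrinks coefficients towards 0
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # blend of the two penalties

print((lasso.coef_ != 0).sum(), "non-zero LASSO coefficients out of", X.shape[1])
```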
Canonical Correlation Analysis (CCA)
• Canonical components: linear combinations of the variables in each group
$K_{YY}^{-1} K_{YX} K_{XX}^{-1} K_{XY} \mathbf{w}_b = \rho^2 \mathbf{w}_b$

$\mathbf{w}_a = \dfrac{K_{XX}^{-1} K_{XY} \mathbf{w}_b}{\rho}$
Source: EMBL-EBI
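As a rough illustration (using scikit-learn's linear CCA rather than the kernelized form above; the toy data are assumptions):

```python
# CCA sketch: find linear combinations of two data blocks that are maximally correlated.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 2))                         # hidden shared source of covariation
X = shared @ rng.normal(size=(2, 50)) + 0.5 * rng.normal(size=(200, 50))
Y = shared @ rng.normal(size=(2, 30)) + 0.5 * rng.normal(size=(200, 30))

cca = CCA(n_components=2).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)                             # canonical variates for each block
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])             # first canonical correlation
```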
What are the problems of CCA for multi-omics data integration?
• It only finds sources of covariation between the two data sets. CCA is not able to find the sources of variation that are present within individual data sets.
Multi-Omics Factor Analysis (MOFA)
[Figure: a differentiation state acts as a hidden/latent variable underlying the omics measurements (observed variables)]
MOFA: Downstream Analysis
MOFA Applications: Chronic Lymphocytic Leukemia
biofam.github.io/MOFA2