Dimension Reduction
CS5122 DESCRIPTIVE & PREDICTIVE ANALYTICS
DILUM BANDARA
DILUM.BANDARA@UOM.LK
Recommender Systems
Use knowledge about the preferences of a group of users for certain items to help predict the interest level of other users from the same group
Collaborative filtering
◦ A widely used method for recommender systems
◦ Tries to find traits of shared interest among users in a group to help predict the likes & dislikes of other users within the group (see the sketch below)
Source: Roberto Mirizzi
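As a concrete illustration, here is a minimal sketch of user-based collaborative filtering in R. The tiny rating matrix and the cosine-similarity weighting are assumptions made purely for this example, not the method of any particular system.

# Hypothetical toy rating matrix (NA = not yet rated)
ratings <- matrix(c(5, 4, 1, NA,
                    4, 5, 1, 2,
                    1, 2, 5, 4),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("alice", "bob", "carol"),
                                  paste0("item", 1:4)))

# Cosine similarity computed over co-rated items only
cosine <- function(a, b) {
  ok <- !is.na(a) & !is.na(b)
  sum(a[ok] * b[ok]) / (sqrt(sum(a[ok]^2)) * sqrt(sum(b[ok]^2)))
}

# Predict alice's missing rating for item4 as a similarity-weighted average
sims  <- c(bob   = cosine(ratings["alice", ], ratings["bob", ]),
           carol = cosine(ratings["alice", ], ratings["carol", ]))
known <- ratings[c("bob", "carol"), "item4"]
sum(sims * known) / sum(sims)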
Methods Employed for the Netflix Prize Problem
Nearest Neighbor methods
◦ k-NN with variations
Matrix factorization
◦ Probabilistic Latent Semantic Analysis
◦ Probabilistic Matrix Factorization
◦ Expectation Maximization for Matrix Factorization
◦ Singular Value Decomposition
◦ Regularized Matrix Factorization
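Of the factorization methods listed above, a plain truncated SVD is the simplest to sketch in R. The toy rating matrix and the item-mean imputation below are assumptions made for the sake of a runnable example; the actual Netflix Prize solutions used regularized factorizations fitted only to the observed ratings.

# Hypothetical toy user-item rating matrix
ratings <- matrix(c(5, 4, NA, 1,
                    4, NA, 1, 1,
                    1, 1, 5, NA,
                    NA, 1, 4, 5),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("user", 1:4), paste0("item", 1:4)))

# Fill missing ratings with item (column) means before factorizing
filled <- apply(ratings, 2, function(x) { x[is.na(x)] <- mean(x, na.rm = TRUE); x })

# Rank-2 approximation: keep only the two largest singular values
s <- svd(filled)
k <- 2
approx <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
dimnames(approx) <- dimnames(ratings)

approx["user1", "item3"]   # predicted rating for a previously missing cell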
Dimension Reduction
Statistical methods that provide information about points scattered in multivariate space
◦ Simplify complex relationships between cases and/or variables
◦ Make it easier to recognize patterns by
◦ Identifying & describing the dimensions that underlie the input data
◦ Identifying sets of variables with similar behavior & using only a few of them (illustrated below)
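As a quick illustration of spotting variables with similar behavior, the correlations among R's built-in iris measurements (assumed here as a stand-in for the flower dataset used later) show that petal length & width essentially measure one underlying dimension.

round(cor(iris[, 1:4]), 2)
# Petal.Length & Petal.Width correlate at about 0.96, so a single underlying
# "petal size" dimension captures most of what the two variables measure.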
Consider a 2D scatter of points that show a
high degree of correlation …
(Figure: scatter of x vs. y, with the means x̄ & ȳ marked and the orthogonal regression line drawn through the point cloud)
Rotated Data
The 1st variable may capture so much of the information content in the original dataset that we can ignore the remaining axis
(Figure: the original length & width axes rotated onto new “size” & “shape” axes)
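A minimal sketch of this rotation in R, assuming simulated, strongly correlated length/width measurements ("size" & "shape" are just informal names for the 1st & 2nd components):

set.seed(1)
len <- rnorm(100, mean = 5, sd = 1)
wid <- 0.6 * len + rnorm(100, sd = 0.2)
dat <- data.frame(length = len, width = wid)

pca <- prcomp(dat, scale. = TRUE)   # centre, scale, then rotate onto new axes
summary(pca)                        # PC1 ("size") captures most of the variance
pca$rotation                        # loadings of length & width on PC1 & PC2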
Principal Components Analysis (PCA)
Why?
• Clarify relationships among variables
• Clarify relationships among cases
When?
• Significant correlations exist among variables
How?
• Define new axes (components)
• Examine correlation between axes & variables
• Find scores of cases on new axes
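A minimal sketch of the three "How" steps in R, assuming the built-in iris measurements:

pca <- prcomp(iris[, 1:4], scale. = TRUE)   # define new axes (components)
round(cor(pca$x, iris[, 1:4]), 2)           # correlation between axes & variables
head(pca$x)                                 # scores of the cases on the new axes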
(Figure: loading plot of variables x1–x4 against components pc1 & pc2; a variable's correlation with a component (r = −1 … 0 … 1) is its component loading)
Eigenvalue: sum of all squared loadings on one component
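This relationship can be checked numerically. A minimal sketch, again assuming the iris measurements: when loadings are taken as correlations between variables & components, the squared loadings on a component sum to its eigenvalue.

pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Loadings as variable-component correlations: eigenvectors scaled by
# the component standard deviations
loadings <- pca$rotation %*% diag(pca$sdev)

colSums(loadings^2)   # sums of squared loadings per component ...
pca$sdev^2            # ... match the eigenvalues of the correlation matrix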
Eigenvalues
Sum of all eigenvalues = 100% of the variance in the original data
Proportion accounted for by each eigenvalue = ev/n
◦ n = # of variables
With a correlation matrix, the variance of each variable = 1
◦ If an eigenvalue < 1, it explains less variance than one of the original variables
◦ But 0.7 may be a better threshold…
‘Scree plots’ show the trade-off between loss of information & simplification
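A minimal sketch of these quantities in R, assuming the iris measurements:

pca <- prcomp(iris[, 1:4], scale. = TRUE)
ev  <- pca$sdev^2                # eigenvalues of the correlation matrix
sum(ev)                          # = 4, the number of variables (100% of variance)
ev / length(ev)                  # proportion per eigenvalue (ev/n, since sum(ev) = n)
screeplot(pca, type = "lines")   # trade-off between information loss & simplicity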
R Example
If the ranges of the variables are very different, the data need to be scaled first
◦ Otherwise, variables with larger variances will dominate the final result
Examples (see the sketch below)
◦ Flower dimension dataset
◦ Panel Survey of Income Dynamics
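A minimal sketch of the first example, assuming the “flower dimension dataset” is R's built-in iris (the Panel Survey of Income Dynamics data would have to be loaded separately and is not shown here):

pca_raw    <- prcomp(iris[, 1:4])                 # unscaled: large-range variables dominate
pca_scaled <- prcomp(iris[, 1:4], scale. = TRUE)  # scale first when ranges differ

summary(pca_raw)$importance[2, ]     # proportion of variance, unscaled
summary(pca_scaled)$importance[2, ]  # proportion of variance, scaled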
