**Factor Analysis: A Comprehensive Overview**
### Introduction to Factor Analysis
Factor analysis is a multivariate statistical method used to identify
underlying relationships among a set of observed variables. It reduces the
dimensionality of data by grouping variables that share common variance
into latent factors. The primary goal is to summarize and simplify complex
datasets while retaining as much relevant information as possible.
Factor analysis is widely used in psychology, social sciences, market
research, and economics to uncover hidden patterns in data and
understand the underlying constructs measured by observed variables.
### Assumptions of Factor Analysis
For accurate and meaningful results, factor analysis relies on several key
assumptions:
1. **Adequate Sample Size:** A large sample is required for stable and
reliable factor solutions. A common guideline is at least five observations
per variable.
2. **Linear Relationships:** Factor analysis assumes that the relationships
between variables are linear.
3. **Multivariate Normality:** The data should approximate a normal
distribution, especially in maximum likelihood factoring.
4. **No Extreme Multicollinearity:** Variables should correlate, but near-perfect
correlations (e.g., above 0.90) make the correlation matrix nearly singular and
distort factor extraction.
5. **Sufficient Correlations:** The variables must exhibit meaningful
intercorrelations, typically assessed with the Kaiser-Meyer-Olkin (KMO)
measure of sampling adequacy and Bartlett's test of sphericity.
6. **Interval or Ratio Data:** Factor analysis assumes that the variables
are measured on an interval or ratio scale.
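The adequacy checks in assumption 5 can be computed directly from the data. The sketch below implements Bartlett's test of sphericity (which tests whether the correlation matrix differs from an identity matrix) and the overall KMO measure; it is an illustrative implementation run on synthetic data, not output from a real study:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test: H0 says the correlation matrix is an identity
    matrix, i.e. the variables share no correlation worth factoring."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # The statistic follows a chi-square distribution with p(p-1)/2 df
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, stats.chi2.sf(statistic, df)

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    Rinv = np.linalg.inv(R)
    # Partial (anti-image) correlations from the inverse correlation matrix
    d = np.sqrt(np.diag(Rinv))
    P = -Rinv / np.outer(d, d)
    mask = ~np.eye(R.shape[0], dtype=bool)
    r2 = (R[mask] ** 2).sum()
    p2 = (P[mask] ** 2).sum()
    # KMO compares observed correlations against partial correlations
    return r2 / (r2 + p2)

# Synthetic example: three indicators driven by one latent variable
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = latent + 0.5 * rng.normal(size=(500, 3))
chi2_stat, p_value = bartlett_sphericity(X)
kmo_value = kmo(X)
```

With correlated indicators like these, Bartlett's test rejects the identity hypothesis and the KMO value lands well above the conventional 0.5 cutoff.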
### Principal Factor Analysis (PFA) and Simple Principal Axis (SPA)
**Principal Factor Analysis (PFA)**
- Also known as principal axis factoring (PAF) or common factor analysis,
PFA seeks the smallest number of factors that account for the common
variance among a set of variables.
- Unlike Principal Component Analysis (PCA), which considers total
variance, PFA focuses only on shared variance.
- PFA is commonly used in structural equation modeling (SEM) as it
provides a better representation of latent constructs.
**Simple Principal Axis (SPA)**
- SPA is a variation of PFA where initial communalities are estimated from
the squared multiple correlations of each variable with all others.
- It iteratively refines communalities to improve the accuracy of factor
extraction.
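The iterative refinement described above can be sketched in a few lines of numpy. This is an illustrative implementation under simplifying assumptions, not a reference one: it seeds communalities with squared multiple correlations (SMCs), then repeatedly eigendecomposes the reduced correlation matrix until the communalities stabilize:

```python
import numpy as np

def principal_axis(R, n_factors, max_iter=50, tol=1e-6):
    """Iterated principal axis factoring on a correlation matrix R.
    Initial communalities are the squared multiple correlations."""
    R = np.asarray(R, dtype=float)
    # SMC of variable i = 1 - 1 / (i-th diagonal entry of R inverse)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)           # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)    # eigenvalues in ascending order
        vals, vecs = vals[::-1], vecs[:, ::-1]
        L = vecs[:, :n_factors] * np.sqrt(np.clip(vals[:n_factors], 0, None))
        h2_new = (L ** 2).sum(axis=1)      # updated communalities
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return L, h2

# Toy one-factor example: four variables, all pairwise correlations 0.5
R = np.full((4, 4), 0.5)
np.fill_diagonal(R, 1.0)
loadings, communalities = principal_axis(R, n_factors=1)
```

For this idealized matrix the procedure converges to equal loadings of about 0.707 (so each communality is about 0.5), matching the shared variance built into the example.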
### Factor Loadings
Factor loadings quantify how strongly each observed variable relates to each
latent factor; in orthogonal solutions, a loading is the correlation between
the variable and the factor.
- High loadings (e.g., above 0.6) indicate a strong association between a
variable and a factor.
- A squared factor loading gives the proportion of a variable's variance
accounted for by that factor.
- Factor loadings are used to interpret the meaning of extracted factors.
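These properties are easy to verify numerically. Using a hypothetical loading matrix (the values below are invented for illustration, not taken from a real analysis):

```python
import numpy as np

# Hypothetical loadings: 5 observed variables on 2 factors
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.05],
    [0.70, 0.20],
    [0.15, 0.85],
    [0.10, 0.80],
])

# Squared loadings: share of each variable's variance due to each factor
squared = loadings ** 2
# Communality: variance of a variable explained by all factors together
communality = squared.sum(axis=1)
# Proportion of total variance each factor explains across all variables
variance_explained = squared.sum(axis=0) / loadings.shape[0]
```

Here the first variable's communality is 0.80² + 0.10² = 0.65, i.e. the two factors jointly account for 65% of its variance.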
### Rotation in Factor Analysis
Rotation enhances the interpretability of factor solutions by simplifying the
structure of factor loadings.
**Types of Rotation:**
1. **Orthogonal Rotations (Factors Remain Uncorrelated)**
- **Varimax:** Maximizes the variance of the squared loadings within each
factor, pushing loadings toward 0 or ±1 for a clearer factor structure.
- **Quartimax:** Minimizes the number of factors needed to explain
each variable.
- **Equimax:** A compromise between Varimax and Quartimax.
2. **Oblique Rotations (Factors Can Be Correlated)**
- **Direct Oblimin:** Allows for correlated factors, which may provide a
more realistic representation of data.
- **Promax:** A computationally efficient alternative to Direct Oblimin,
often used for large datasets.
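Varimax is simple enough to implement directly. The sketch below uses the SVD-based formulation of Kaiser's criterion; the example loadings are invented for illustration, with a clean simple structure deliberately mixed by a 30-degree rotation:

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Varimax rotation (SVD formulation of Kaiser's criterion).
    Returns an orthogonally rotated copy of the loading matrix L."""
    p, k = L.shape
    R = np.eye(k)          # accumulated orthogonal rotation
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Update direction for the varimax criterion
        B = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        R = U @ Vt         # nearest orthogonal matrix to B
        d = s.sum()
        if d_old != 0.0 and d / d_old < 1.0 + tol:
            break
        d_old = d
    return L @ R

simple = np.array([[0.9, 0.0],
                   [0.8, 0.0],
                   [0.0, 0.9],
                   [0.0, 0.8]])
theta = np.pi / 6
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
mixed = simple @ mix        # loadings with blurred structure
rotated = varimax(mixed)    # rotate toward simpler structure
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only the distribution of loadings across factors simplifies.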
### Eigenvalues and Factor Retention
Eigenvalues indicate the amount of variance explained by each factor. For a
correlation matrix of p standardized variables the total variance equals p,
so each factor's share of the total variance is its eigenvalue divided by p.
**Rules for Factor Retention:**
1. **Kaiser’s Criterion:** Factors with eigenvalues greater than 1 are
retained.
2. **Scree Plot:** A graphical method where factors above the “elbow” of
the plot are considered meaningful.
3. **Variance Explained:** Retaining factors that explain a significant
proportion (e.g., 60-80%) of the total variance.
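The first and third retention rules are straightforward to apply programmatically. A sketch on synthetic data constructed with two underlying factors (the data-generating setup is invented for illustration):

```python
import numpy as np

# Two blocks of three indicators, each block driven by its own factor
rng = np.random.default_rng(42)
f1 = rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
X = np.hstack([
    f1 + 0.6 * rng.normal(size=(300, 3)),   # block loading on factor 1
    f2 + 0.6 * rng.normal(size=(300, 3)),   # block loading on factor 2
])

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Kaiser's criterion: retain factors with eigenvalue > 1
n_kaiser = int(np.sum(eigenvalues > 1))
# Cumulative proportion of total variance explained
cum_var = np.cumsum(eigenvalues) / eigenvalues.sum()
```

With this setup, exactly two eigenvalues exceed 1, and those two factors together explain most of the total variance; plotting the sorted eigenvalues would show the scree-plot "elbow" after the second factor.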
### Data Matrix in Factor Analysis
Factor analysis begins with a data matrix that consists of:
- **Variables (Columns):** The observed measurements.
- **Cases (Rows):** Individual observations or subjects.
- **Correlation Matrix:** Shows the relationships among variables and
serves as the basis for factor extraction.
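A minimal sketch of this starting point, using randomly generated numbers in place of real measurements:

```python
import numpy as np

# Toy data matrix: rows are cases (e.g., respondents),
# columns are observed variables (e.g., survey items)
rng = np.random.default_rng(1)
data = rng.normal(size=(100, 4))        # 100 cases, 4 variables

# Correlation matrix among the variables: the input to factor extraction
R = np.corrcoef(data, rowvar=False)     # shape (4, 4)
```

The resulting matrix is symmetric with ones on the diagonal (every variable correlates perfectly with itself); factor extraction then operates on this matrix rather than on the raw cases.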
### Conclusion
Factor analysis is a powerful statistical technique for uncovering hidden
structures within data. By identifying latent factors, it helps researchers
and analysts reduce complexity, enhance interpretability, and make data-
driven decisions in various fields.