Factor Analysis in R programming
Last Updated: 19 Apr, 2025
Factor Analysis (FA) is a statistical method that is used to analyze the underlying structure of a set of variables. It is a data reduction technique that attempts to account for the intercorrelations among a large number of variables in terms of fewer unobservable (latent) variables, or factors. In R Programming Language, the psych package offers a range of functions to conduct factor analysis.
Factor analysis involves several steps:
- Data preparation: The data is usually scaled so that all the variables are on a similar scale.
- Factor extraction: The factors are extracted according to how well they explain the variance in the data. Common extraction methods include principal components analysis (PCA), maximum likelihood estimation (MLE), and minimum residual (minres, labeled MR in psych output).
- Factor rotation: The factors are usually rotated to make them easier to interpret. The most common method is varimax, an orthogonal rotation that maximizes the variance of the squared loadings within each factor, so that each factor ends up with a few large loadings and many near-zero ones.
- Factor interpretation: The factors are interpreted by examining the factor loadings, which for orthogonal solutions are the correlations between the variables and the factors. Loadings that are large in absolute value indicate a strong relationship between a variable and a factor.
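The rotation step in the list above can be sketched with base R alone. This is a minimal illustration under stated assumptions, not part of the walkthrough that follows: it uses the built-in ability.cov covariance matrix (chosen here only because it supports a two-factor model), fits an unrotated model with factanal(), and applies varimax() from the stats package. It shows that rotation redistributes the loadings across factors without changing each variable's communality.

```r
# Unrotated two-factor maximum-likelihood fit on a built-in covariance matrix.
# ability.cov is an illustrative choice (assumption), not the data used in this article.
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "none")
L_raw <- unclass(loadings(fit))

# Varimax applies an orthogonal rotation to the loading matrix
rot <- varimax(L_raw)
L_rot <- unclass(rot$loadings)

# Communalities (row sums of squared loadings) before and after rotation
h2_raw <- rowSums(L_raw^2)
h2_rot <- rowSums(L_rot^2)
print(round(cbind(h2_raw, h2_rot), 3))

# TRUE if rotation left the communalities unchanged
print(all.equal(h2_raw, h2_rot))
```

Because the rotation matrix is orthogonal, only the way the explained variance is split between the factors changes, which is why rotation is a matter of interpretability rather than fit.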
Loading the Data
We will use the Iris dataset, which is built into R, for this example. The dataset contains measurements of the sepal length, sepal width, petal length, and petal width of three different iris species.
R
data(iris)
# View the first few rows of the dataset
head(iris)
Output:
First six rows of the dataset (head() returns six rows by default)
Data Preparation
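Prior to performing factor analysis, we prepare the data by scaling each variable to a mean of zero and a standard deviation of one, so that all variables are on a common scale. Note that fa() analyzes the correlation matrix by default, so standardizing does not change its results; scaling matters when an analysis is based on covariances instead.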
R
# Scale the data
iris_scaled <- scale(iris[,1:4])
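A quick check of what scale() does can make this step concrete. The sketch below uses only base R; the variable names X and X_scaled are illustrative. It confirms that the scaled columns have mean zero and standard deviation one, and that the correlation matrix is the same before and after scaling.

```r
# The four numeric iris measurements
X <- iris[, 1:4]
X_scaled <- scale(X)

# Column means are (numerically) zero and standard deviations are one
print(round(colMeans(X_scaled), 10))
print(apply(X_scaled, 2, sd))

# Correlations are unaffected by centering and scaling
print(all.equal(cor(X), cor(X_scaled)))
```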
Determining the Number of Factors
The next step is to determine the number of factors to extract from the data. This can be done using a variety of methods, such as the Kaiser criterion, a scree plot, or parallel analysis. The Kaiser criterion suggests retaining factors with eigenvalues greater than one. The code below fits a model with four factors, the maximum for four variables, so that the contribution of each factor can be compared.
R
# Fit a four-factor model with varimax rotation
library(psych)
fa <- fa(r = iris_scaled,
         nfactors = 4,
         rotate = "varimax")
summary(fa)
Output:
Factor analysis with Call: fa(r = iris_scaled, nfactors = 4, rotate = "varimax")
Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the model is -4 and the objective function was 0
The number of observations was 150 with Chi Square = 0 with prob < NA
The root mean square of the residuals (RMSA) is 0
The df corrected root mean square of the residuals is NA
Tucker Lewis Index of factoring reliability = 1.009
This summary reports the test of the hypothesis that 4 factors are sufficient. Because four factors were requested for only four variables, the model is over-parameterized: the degrees of freedom are negative (-4), the residuals and the chi-square are 0, and the p-value is NA. These fit statistics therefore say nothing about how many factors the data support. The loadings and the proportion of variance explained by each factor are not part of summary(); they are examined in the next section.
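The Kaiser criterion mentioned above can be checked directly from the eigenvalues of the correlation matrix. The following is a minimal base R sketch; the names eigenvalues and n_kaiser are illustrative.

```r
# Eigenvalues of the correlation matrix of the four iris measurements
eigenvalues <- eigen(cor(iris[, 1:4]))$values
print(round(eigenvalues, 3))

# Kaiser criterion: count the eigenvalues greater than one
n_kaiser <- sum(eigenvalues > 1)
print(n_kaiser)
```

On this data only the first eigenvalue exceeds one (the second falls just below it), so the Kaiser rule on its own would point to a single factor. Treat this as one guide among several rather than a definitive answer.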
Interpreting the Results of Factor Analysis
After the factor analysis is finished, we interpret the findings by looking at the factor loadings, which for an orthogonal rotation such as varimax are the correlations between the observed variables and the extracted factors. As a rule of thumb, loadings with an absolute value above 0.4 or 0.5 are treated as meaningful; this is a convention, not a significance test.
R
# View the factor loadings
fa$loadings
Output:
Loadings:
             MR1    MR2    MR3    MR4
Sepal.Length  0.997
Sepal.Width  -0.108  0.757
Petal.Length  0.861 -0.413  0.288
Petal.Width   0.801 -0.317  0.492

                 MR1   MR2   MR3   MR4
SS loadings    2.389 0.844 0.332 0.000
Proportion Var 0.597 0.211 0.083 0.000
Cumulative Var 0.597 0.808 0.891 0.891
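In this case, MR1 (the first factor) loads strongly on Sepal.Length (0.997), Petal.Length (0.861), and Petal.Width (0.801), so it can be read as an overall size factor. MR2 is dominated by Sepal.Width (0.757), with moderate negative loadings on the two petal variables. MR3 explains only about 8% of the variance and MR4 explains none, so the first two factors account for about 81% of the variance.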
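The summary rows of the loadings output can be recomputed from the loadings themselves, which helps when reading such tables. The sketch below re-enters the loadings reported above by hand. It assumes the column placement implied by the SS loadings row and treats blank cells (loadings hidden because they are small) as zero, which is an approximation.

```r
# Loadings as reported in the output above; blank cells are entered as 0
# (assumption: they were suppressed by the print cutoff, not exactly zero).
L <- matrix(c( 0.997,  0.000, 0.000, 0,
              -0.108,  0.757, 0.000, 0,
               0.861, -0.413, 0.288, 0,
               0.801, -0.317, 0.492, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("Sepal.Length", "Sepal.Width",
                              "Petal.Length", "Petal.Width"),
                            c("MR1", "MR2", "MR3", "MR4")))

# SS loadings: column sums of squared loadings
ss_loadings <- colSums(L^2)
print(round(ss_loadings, 3))

# Proportion of variance: SS loadings divided by the number of variables
prop_var <- ss_loadings / nrow(L)
print(round(prop_var, 3))

# Communality of each variable: row sums of squared loadings
communality <- rowSums(L^2)
print(round(communality, 3))
```

The first two column sums reproduce the reported SS loadings of 2.389 and 0.844. The value for MR3 comes out slightly below the reported 0.332 because the hidden small loadings are treated as zero here.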
Validating the Results of Factor Analysis
It is important to validate the factor structure by checking the assumptions and comparing results across different subsets of the data.
R
# Examine factor structure for a subset of the data
subset1 <- subset(iris[, 1:4],
                  iris$Sepal.Length < mean(iris$Sepal.Length))
fa1 <- fa(subset1, nfactors = 4)
print(fa1)
Output:
Factor Analysis using method = minres
Call: fa(r = subset1, nfactors = 4)
Standardized loadings (pattern matrix) based upon correlation matrix
               MR1  MR2   MR3 MR4   h2    u2 com
Sepal.Length  0.66 0.61 -0.12   0 0.82 0.178 2.1
Sepal.Width  -0.68 0.61  0.11   0 0.85 0.150 2.0
Petal.Length  1.00 0.00  0.00   0 1.00 0.005 1.0
Petal.Width   0.97 0.01  0.16   0 0.97 0.031 1.1

                       MR1  MR2  MR3  MR4
SS loadings           2.85 0.74 0.05 0.00
Proportion Var        0.71 0.18 0.01 0.00
Cumulative Var        0.71 0.90 0.91 0.91
Proportion Explained  0.78 0.20 0.01 0.00
Cumulative Proportion 0.78 0.99 1.00 1.00
Mean item complexity = 1.5
Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the null model are 6 and the objective function was 4.57 with Chi Square of 351.02
The degrees of freedom for the model are -4 and the objective function was 0
The root mean square of the residuals (RMSR) is 0
The df corrected root mean square of the residuals is NA
The harmonic number of observations is 80 with the empirical chi square 0 with prob < NA
The total number of observations was 80 with Likelihood Chi Square = 0 with prob < NA
Tucker Lewis Index of factoring reliability = 1.018
Fit based upon off diagonal values = 1
Measures of factor score adequacy
                                                  MR1  MR2   MR3 MR4
Correlation of (regression) scores with factors  1.00 0.91  0.69   0
Multiple R square of scores with factors         1.00 0.82  0.47   0
Minimum correlation of possible factor scores    0.99 0.64 -0.05  -1
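Comparing the loadings across subsets shows how stable the factor structure is. In this subset the two petal variables still load almost entirely on MR1, but Sepal.Length now splits between MR1 (0.66) and MR2 (0.61), so the structure is only partly reproduced. Two caveats apply: this call uses fa()'s default rotation (oblimin) rather than varimax, and four factors for four variables again leave negative degrees of freedom, so the fit statistics are not informative.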
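One way to put a number on "stable" is to compare loading vectors from two subsets with Tucker's congruence coefficient. The sketch below is a simplified stand-in, not the fa() analysis above: to stay within base R it uses the first principal axis of each subset's correlation matrix (principal component loadings) instead of minres factors, and the helper first_axis() is a hypothetical name introduced here.

```r
X <- iris[, 1:4]
below <- X$Sepal.Length < mean(X$Sepal.Length)

# Hypothetical helper: loadings on the first principal axis of the
# correlation matrix (a principal component stand-in for a factor)
first_axis <- function(d) {
  e <- eigen(cor(d))
  v <- e$vectors[, 1] * sqrt(e$values[1])
  v <- v * sign(sum(v))          # resolve the arbitrary sign of the eigenvector
  setNames(v, colnames(d))
}

a <- first_axis(X[below, ])      # flowers with below-average sepal length
b <- first_axis(X[!below, ])     # the remaining flowers

# Tucker's congruence coefficient between the two loading vectors
congruence <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
print(round(rbind(below = a, above = b), 2))
print(round(congruence, 3))
```

Values close to 1 indicate that the two subsets share essentially the same first dimension; clearly lower values suggest the structure depends on which observations are included.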
Using factanal() Function for Factor Analysis
The factanal() function, part of base R's stats package, performs maximum-likelihood factor analysis on a data set. Its main arguments are described below.
Syntax:
factanal(x, factors, rotation, scores, covmat)
where,
- x – The data set to be analyzed.
- factors – The number of factors to extract.
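- rotation – The rotation method to use. factanal() supports "varimax" (the default), "promax", and "none" out of the box; other rotations such as oblimin require a function from an add-on package such as GPArotation.
- scores – The type of factor scores to compute for each observation: "none" (the default), "regression", or "Bartlett".
- covmat – A covariance matrix (or a list as returned by cov.wt()) to analyze in place of the raw data x.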
Here is an example code snippet that demonstrates how to use factanal() function in R:
R
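# install.packages("psych")  # run once if psych is not installed
library(psych)  # not needed by factanal(), which is in base R's stats package
data(mtcars)
# Perform factor analysis on the mtcars dataset
factor_analysis <- factanal(mtcars,
                            factors = 3,
                            rotation = "varimax")
print(factor_analysis)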
Output:
Call:
factanal(x = mtcars, factors = 3, rotation = "varimax")
Uniquenesses:
  mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
0.135 0.055 0.090 0.127 0.290 0.060 0.051 0.223 0.208 0.125 0.158

Loadings:
     Factor1 Factor2 Factor3
mpg   0.643  -0.478  -0.473
cyl  -0.618   0.703   0.261
disp -0.719   0.537   0.323
hp   -0.291   0.725   0.513
drat  0.804  -0.241
wt   -0.778   0.248   0.524
qsec -0.177  -0.946  -0.151
vs    0.295  -0.805  -0.204
am    0.880
gear  0.908           0.224
carb  0.114   0.559   0.719

               Factor1 Factor2 Factor3
SS loadings      4.380   3.520   1.578
Proportion Var   0.398   0.320   0.143
Cumulative Var   0.398   0.718   0.862
Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 30.53 on 25 degrees of freedom.
The p-value is 0.205
In this example, we load the psych package and the mtcars data set, which contains information about different car models. The psych package is not actually required here, because factanal() belongs to base R's stats package. We then use factanal() to extract three factors with varimax rotation and print the results. The chi-square test gives a p-value of 0.205, so the hypothesis that three factors are sufficient is not rejected at the 5% level.
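The scores and covmat arguments described in the syntax list are not exercised by the example above. The sketch below shows one plausible way to use them with the same mtcars model; the object names fit_raw and fit_cov are illustrative. It also pulls the test statistic, degrees of freedom, and p-value out of the fitted object instead of reading them from the printout.

```r
# Regression factor scores: one row per car, one column per factor
fit_raw <- factanal(mtcars, factors = 3, scores = "regression")
print(dim(fit_raw$scores))
print(head(fit_raw$scores, 3))

# The same model fitted from a covariance matrix instead of raw data.
# n.obs is supplied so that the chi-square test can still be computed;
# factor scores are not available without the raw data.
fit_cov <- factanal(covmat = cov(mtcars), factors = 3, n.obs = nrow(mtcars))
print(all.equal(fit_raw$uniquenesses, fit_cov$uniquenesses))

# Test of the hypothesis that 3 factors are sufficient
print(c(statistic = unname(fit_raw$STATISTIC),
        dof       = unname(fit_raw$dof),
        p_value   = unname(fit_raw$PVAL)))
```

Fitting from covmat is useful when only a published covariance or correlation matrix is available, at the cost of not being able to compute scores for individual observations.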