
MULTIPLE REGRESSION ANALYSIS: EXPLAINING CAUSATION AND ESTIMATION

5.1 INTRODUCTION
A regression analysis is done to explain the variation in one variable (dependent variable),
based on variation in one or more other variables (independent variables). In case there is
only one independent variable to explain the variation in one dependent variable, it is known
as simple regression. If there are multiple independent variables to explain the variation in a
single dependent variable, it is known as a multiple regression model.
The linear equation commonly used for a regression analysis is:
$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$

where $y$ is the dependent variable, $x_1, x_2, x_3, \dots, x_n$ are the independent variables, $b_0$ is the intercept, and $b_1, b_2, b_3, \dots, b_n$ are the coefficients of the respective independent variables.
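As a minimal illustration of fitting such a model in Python (a hypothetical sketch using simulated data and the statsmodels package, not part of the original text):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: y depends on three independent variables (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                     # columns are x1, x2, x3
y = 4.0 + 2.0*X[:, 0] - 1.5*X[:, 1] + 0.5*X[:, 2] \
    + rng.normal(scale=0.5, size=50)             # random error term

X_design = sm.add_constant(X)                    # prepend a column of 1s for b0
model = sm.OLS(y, X_design).fit()                # least-squares estimation
print(model.params)                              # estimates of b0, b1, b2, b3
print(model.rsquared)                            # coefficient of determination
```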
5.2 KEYWORDS
Beta Distribution
Regression coefficients are expressed in terms of the units of the associated variable, thereby
making comparisons inappropriate, beta coefficients use standardized data and can be directly
compared.
Coefficient of Determination ($R^2$)
Measure of the proportion of the variance of the dependent variable about its mean that is explained by the independent, or predictor, variables. The coefficient can vary from 0 to 1. If the regression model is properly applied and estimated, the researcher can assume that the higher the value of $R^2$, the greater the explanatory power of the regression equation, and therefore the better the prediction of the dependent variable.
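For reference, the standard computation of $R^2$ (a well-known definition, stated here for completeness) compares the residual sum of squares with the total sum of squares about the mean:

$$R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2}$$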
Degrees of Freedom
Value calculated as the total number of observations minus the number of estimated parameters. The parameter estimates are restrictions on the data because, once made, they define the population from which the data are assumed to have been drawn. For example, when estimating a regression model with a single independent variable, we estimate two parameters, the intercept ($b_0$) and a regression coefficient for the independent variable ($b_1$). Degrees of freedom provide a measure of how restricted the data are to reach a certain level of prediction.
Dependent Variable ($y$)
The variable being predicted or explained by the set of independent variables is known as the dependent variable.
Independent Variable
Variables selected as predictors and potential explanatory variables of the dependent variable are known as independent variables.
Intercept ($b_0$)
The value on the $y$ axis where the line defined by the regression equation $y = b_0 + b_1 x_1$ crosses the axis is known as the intercept. It is described by the constant term $b_0$ in the regression equation. In addition to its role in prediction, the intercept may have a managerial interpretation. If the complete absence of the independent variable has meaning, then the intercept represents that amount. For example, when estimating sales from past advertising expenditures, the intercept represents the level of sales expected if advertising is eliminated. But in many instances the constant has only predictive value, because in no realistic situation are all independent variables absent. An example is predicting product preference based on consumer attitudes: all individuals have some level of attitude, so the intercept has no managerial use, but it still aids in prediction.
Least Squares
Estimation procedure used in simple and multiple regression whereby the regression coefficients are estimated so as to minimize the sum of squared residuals is known as least squares.
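Formally (a standard statement of the criterion, added here for clarity), least squares chooses the coefficients to solve:

$$\min_{b_0, b_1, \dots, b_n} \sum_{i=1}^{N} \left( y_i - b_0 - b_1 x_{1i} - \dots - b_n x_{ni} \right)^2$$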
Linearity
Term used to express the concept that the model possesses the properties of additivity and homogeneity. In a simple sense, linear models predict values that fall in a straight line by having a constant unit change of the dependent variable for a constant change in the independent variable. In the population model $y = b_0 + b_1 x_1 + \varepsilon$, the effect of changing $x_1$ by a value of 1.0 is to add $b_1$ units to $y$.
Multiple Regression
Regression model with two or more independent variables is known as multiple regression.
Outlier
In strict terms, an observation that has a substantial difference between the value for the
dependent variable and the predicted value is known as outlier. Cases that are substantially
different with regard to either the dependent or independent variables are often termed
outliers as well. In all instances, the objective is to identify observations that are inappropriate
representations of the population from which the sample is drawn, so that they may be
discounted or even eliminated from the analysis as unrepresentative.
Parameter
Quantity characterizing the population; for example, $\mu$ and $\sigma^2$ are the symbols used for the population parameters mean ($\mu$) and variance ($\sigma^2$). They are typically estimated from sample data, in which the arithmetic average of the sample is used as a measure of the population average and the variance of the sample is used to estimate the variance of the population.
Regression Coefficient ($b_n$)
Numerical value of the parameter estimate directly associated with an independent variable; for example, in the model $y = b_0 + b_1 x_1$, the value $b_1$ is the regression coefficient for the variable $x_1$. The regression coefficient represents the amount of change in the dependent variable for a one-unit change in the independent variable. In the multiple-predictor model, the regression coefficients are partial coefficients because each takes into account not only the relationship between $y$ and $x_1$ and between $y$ and $x_2$, but also the relationship between $x_1$ and $x_2$. The coefficient is not limited in range, as it is based on both the degree of association and the scale units of the independent variable. For instance, two variables with the same association to $y$ would have different coefficients if one independent variable was measured on a 7-point scale and another was based on a 100-point scale.
Residual ($e$)
The error in predicting our sample data is known as the residual. We assume that random error will occur, and that the residual is an estimate of the true random error in the population, not just the error in prediction for our sample. The error in the population we are estimating is assumed to be distributed with a mean of 0 and a constant variance.
Sampling Error
The expected variation in any estimated parameter that is due to use of a sample rather than
the population is sampling error. Sampling error is reduced as the sample size is increased
and is used to statistically test whether the estimated parameter differs from zero.
Standard Error
The expected variation of an estimated regression coefficient is its standard error. The standard error is similar to the standard deviation of any set of values, but instead denotes the expected range of the coefficient across multiple samples of the data. It is useful in statistical tests of significance to see whether the coefficient is significantly different from zero. The $t$ value of a regression coefficient is the coefficient divided by its standard error.
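For example (hypothetical numbers, for illustration only): a coefficient of 2.0 with a standard error of 0.5 gives $t = 2.0 / 0.5 = 4.0$, which would be judged significantly different from zero at conventional significance levels.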
5.3 METHODS
There are basically two approaches to regression analysis:
 A hit-and-trial approach
 A pre-conceived approach
In the hit-and-trial approach, we collect data on a large number of independent variables and then try to fit a regression model using stepwise regression, entering one variable into the regression equation at a time. The general linear regression model is

$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$

where $y$ is the dependent variable and $x_1, x_2, x_3, \dots, x_n$ are the independent variables to be related to $y$ and expected to explain or predict $y$; $b_1, b_2, b_3, \dots, b_n$ are the coefficients of the respective independent variables, which will be determined from the input data, and $b_0$ is the intercept.
The pre-conceived approach assumes the researcher knows reasonably well which variables explain $y$, and the model is pre-conceived, say, with 4 independent variables $x_1, x_2, x_3, x_4$. Therefore, not too much experimentation is done. In this case the main objective is to find out whether the pre-conceived model is good or not. The equation is of the same form as earlier.
Input data on $y$ and each of the $x$ variables are required to do a regression analysis. These data are fed into a statistical package to perform the regression analysis. The output consists of the $b$ coefficients for all the respective independent variables.
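The stepwise idea behind the hit-and-trial approach can be sketched in a few lines of Python (a hypothetical forward-selection loop built on statsmodels; this illustrates the general idea, not the exact algorithm SPSS implements):

```python
import numpy as np
import statsmodels.api as sm

def forward_stepwise(X, y, names, p_enter=0.05):
    """At each step, add the candidate variable with the smallest
    p-value, as long as that p-value is below p_enter."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        pvals = {}
        for j in remaining:
            cols = selected + [j]
            fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            pvals[j] = fit.pvalues[-1]      # p-value of the newest variable
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_enter:          # no remaining variable qualifies
            break
        selected.append(best)
        remaining.remove(best)
    return [names[j] for j in selected]
```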
5.4 SPSS COMMANDS FOR REGRESSION ANALYSIS
After the input data has been typed along with variable labels and value labels in an SPSS
file, to get the output for a Regression problem similar to that described in the chapter on
regression in the text.
 Click on ANALYZE at the SPSS menu bar.
 Click on REGRESSION, followed by LINEAR.
 On the dialogue box which appears, select a dependent variable by clicking on the right
arrow leading to the dependent box after highlighting the appropriate variable from the
variable list on the left side.
 Select the independent variables to be included in the regression model in the same way,
transferring them from left side to the right side box by clicking on the arrow leading to
the box called independent variables or independent.
 In the same dialogue box, select the METHOD. Choose:
 ENTER as the method if you want all independent variables to be included in the model.
 STEPWISE if you want to use forward stepwise regression.
 BACKWARD if you want backward stepwise regression.
 Select OPTIONS if you want additional output options, select the ones you want, and click
CONTINUE.
 Select PLOTS if you want to see some plots such as residual plots, select those you want,
and click CONTINUE.
 Click OK from the main dialogue box to get the REGRESSION output.
General: All output files can be saved using File Save command. They can be printed using
the File Print Command. Input data also can be separately saved, or printed, using the same
commands (FILE SAVE, FILE PRINT) while the cursor is on the input data file.
5.5 CASE STUDY-22
Problem
A manufacturer and marketer of two wheelers would like to build a regression model
consisting of seven variables to predict sales. Past data has been collected for 15 sales
territories, on seven different variables. Build a regression model and recommend whether or
not it should be used by the company. We will assume that data are for different territories in
which the company operates, and the variables on which data are collected are as follows:
Dependent variable: $y$ = sales in Rs. lakh in the territory
Independent variables:
$x_1$ = market potential in the territory (in Rs. lakh)
$x_2$ = number of dealers of the company in the territory
$x_3$ = number of salespeople in the territory
$x_4$ = index of competitor activity in the territory on a 5-point scale (1 = low, 5 = high)
$x_5$ = number of service people in the territory
$x_6$ = number of existing customers in the territory
Input Data
The data set, consisting of 15 observations (from 15 different sales territories), is given in Table 5.5.1.
Table 5.5.1: Input Data
SALES  POTENTIAL  DEALER  PEOPLE  COMPENSATION  SERVICE  CUSTOMER
5 25 1 6 5 2 20
60 150 12 30 4 5 50
20 45 5 15 3 2 25
11 30 2 10 3 2 20
45 75 12 20 2 4 30
6 10 3 8 2 3 16
15 29 5 18 4 5 30
22 43 7 16 3 6 40
29 70 4 15 2 5 39
3 40 1 6 5 2 5
16 40 4 11 4 2 17
8 25 2 9 3 3 10
18 32 7 14 3 4 31
23 73 10 10 4 3 43
81 150 15 35 4 7 70
From Table 5.5.2 it is observed that $R^2$ (the coefficient of determination) is 0.977, indicating that 97.7% of the variation in sales can be explained by the six independent variables, or by the six factors.
Table 5.5.2: Coefficient of Determination
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .989a .977 .960 4.39102
a. Predictors: (Constant), CUSTOMER, COMPENSN, SERVICE, POTENTIAL, DEALER, PEOPLE
From Table 5.5.3 we can examine whether the regression model is statistically significant. Since the significance of F (the $p$-value) is 0.000, the regression model is statistically significant.
Table 5.5.3: Analysis of Variance
ANOVAa
Model Sum of Squares Df Mean Square F Sig.
Regression 6609.485 6 1101.581 57.133 .000b
1 Residual 154.249 8 19.281
Total 6763.733 14
a. Dependent Variable: SALES
b. Predictors: (Constant), CUSTOMER, COMPENSN, SERVICE, POTENTIAL, DEALER, PEOPLE
We also have the $t$ test for the significance of the individual independent variables at the 95% confidence level. From Table 5.5.4 we can see that only potential and people (number of salespeople) are statistically significant, with $p$-values of 0.016 and 0.031 respectively, both less than 0.05. The other four independent variables are individually not significant because the $p$-values corresponding to these independent variables are all greater than 0.05. From Table 5.5.4, the regression model can be written as follows:

$y = -3.173 + 0.227 x_1 + 0.819 x_2 + 1.091 x_3 - 1.893 x_4 - 0.549 x_5 + 0.066 x_6$

or, equivalently,

sales = −3.173 + 0.227(potential) + 0.819(dealer) + 1.091(people) − 1.893(compensation) − 0.549(service) + 0.066(customer)
Table 5.5.4: Regression Model Output
Coefficientsa
Model Unstandardized Coefficients Standardized T Sig.
Coefficients
B Std. Error Beta
(Constant) -3.173 5.813 -.546 .600
POTENTIAL .227 .075 .439 3.040 .016
DEALER .819 .631 .164 1.298 .230
1 PEOPLE 1.091 .418 .414 2.609 .031
COMPENSN -1.893 1.340 -.085 -1.413 .195
SERVICE -.549 1.568 -.041 -.350 .735
CUSTOMER .066 .195 .050 .338 .744
a. Dependent Variable: SALES
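For readers who prefer to replicate Tables 5.5.2 to 5.5.4 outside SPSS, a minimal Python sketch using statsmodels (assuming the data are keyed in exactly as in Table 5.5.1):

```python
import numpy as np
import statsmodels.api as sm

# Data from Table 5.5.1: sales, potential, dealer, people, compensn, service, customer
data = np.array([
    [ 5,  25,  1,  6, 5, 2, 20],
    [60, 150, 12, 30, 4, 5, 50],
    [20,  45,  5, 15, 3, 2, 25],
    [11,  30,  2, 10, 3, 2, 20],
    [45,  75, 12, 20, 2, 4, 30],
    [ 6,  10,  3,  8, 2, 3, 16],
    [15,  29,  5, 18, 4, 5, 30],
    [22,  43,  7, 16, 3, 6, 40],
    [29,  70,  4, 15, 2, 5, 39],
    [ 3,  40,  1,  6, 5, 2,  5],
    [16,  40,  4, 11, 4, 2, 17],
    [ 8,  25,  2,  9, 3, 3, 10],
    [18,  32,  7, 14, 3, 4, 31],
    [23,  73, 10, 10, 4, 3, 43],
    [81, 150, 15, 35, 4, 7, 70],
])
y, X = data[:, 0], data[:, 1:]
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())   # should broadly match Tables 5.5.2 to 5.5.4
```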
5.6 CASE STUDY-23
Problem
An ABC organization would like to build a regression model consisting of four independent
variables to predict compensation. Past data has been collected for 15 different employees,
on five different variables. The variables on which data are collected are as follows:
Dependent variable: $y$ = compensation in rupees
Independent variables:
$x_1$ = experience in years
$x_2$ = education in years after 10th standard
$x_3$ = number of employees supervised
$x_4$ = number of projects handled
Input Data
The data set, consisting of 15 observations (from 15 different employees), is given in Table 5.6.1.
Table 5.6.1: Input Data
Sl. No. Compens experien Education Noofsuper Projects
1 1500 2 5 4 10
2 1650 3 6 5 10
3 1750 3 3 5 12
4 1400 2 3 3 9
5 2000 4 4 6 15
6 2200 5 6 6 14
7 2100 1 5 4 12
8 2750 5 8 7 15
9 2900 8 9 8 25
10 1100 3 3 2 7
11 1000 4 2 1 5
12 1350 6 4 4 12
13 1550 4 6 4 11
14 1375 8 4 8 13
15 1400 4 3 5 10
From Table 5.6.2 it is observed that $R^2$ (the coefficient of determination) is 0.888, indicating that 88.8% of the variation in compensation can be explained by the four independent variables, or by the four factors.
Table 5.6.2: Coefficient of Determination
Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .942a .888 .843 221.06937
a. Predictors: (Constant), projects, experien, education, noofsuper
From Table 5.6.3 we can examine whether the regression model is statistically significant. Since the significance of F (the $p$-value) is 0.000, the regression model is statistically significant.
Table 5.6.3: Analysis of Variance
ANOVAa
Model Sum of Squares Df Mean Square F Sig.
Regression 3861033.355 4 965258.339 19.751 .000b
1 Residual 488716.645 10 48871.665
Total 4349750.000 14
a. Dependent Variable: compens
b. Predictors: (Constant), projects, experien, education, noofsuper
We also have the $t$ test for the significance of the individual independent variables at the 95% confidence level. From Table 5.6.4 we can see that experience, education, and projects are statistically significant, with $p$-values of 0.040, 0.044, and 0.021 respectively, all less than 0.05. The remaining independent variable, number of employees supervised, is individually not significant because its $p$-value (0.446) is greater than 0.05. From Table 5.6.4, the regression model can be written as follows:

$y = 456.847 - 93.432 x_1 + 110.270 x_2 + 44.211 x_3 + 77.515 x_4$

or, equivalently,

compensation = 456.847 − 93.432(experience) + 110.270(education) + 44.211(number supervised) + 77.515(projects)
Table 5.6.4: Regression Model Output
Coefficientsa
Model Unstandardized Coefficients Standardized T Sig.
Coefficients
B Std. Error Beta
(Constant) 456.847 173.949 2.626 .025
Experien -93.432 39.717 -.340 -2.352 .040
1 Education 110.270 47.788 .392 2.307 .044
Noofsuper 44.211 55.721 .159 .793 .446
Projects 77.515 28.216 .631 2.747 .021
a. Dependent Variable: compens
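To illustrate how the fitted model is used for estimation, consider a hypothetical employee (values chosen purely for illustration) with 4 years of experience, 5 years of education after 10th standard, 5 employees supervised, and 12 projects handled:

$$y = 456.847 - 93.432(4) + 110.270(5) + 44.211(5) + 77.515(12) \approx 1785.70 \text{ rupees}$$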
LOGISTIC REGRESSION

SPSS COMMANDS FOR LOGISTIC REGRESSION ANALYSIS

After the input data have been typed in, along with variable labels and value labels, in an SPSS file, follow these steps to get the output for a logistic regression analysis problem similar to that described in the text:
 Click on ANALYZE, REGRESSION, BINARY LOGISTIC.
 Select the dependent variable, the independent variables (covariates) and the method (the ENTER method, in which all independent variables are entered together, is preferred), since statistical tests such as Wald's test will identify the insignificant variables.
 Click on SAVE and select the statistics (like the HOSMER LEMESHOW TEST) and plots
you wish to see.
 Click OK to get the output.
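Outside SPSS, the same kind of model can be fitted in Python (a hypothetical sketch with simulated data, not the textbook's data set):

```python
import numpy as np
import statsmodels.api as sm

# Simulated binary outcome (e.g., buyer = 1 / non-buyer = 0)
# explained by two covariates.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
p = 1 / (1 + np.exp(-(0.5 + 1.2*X[:, 0] - 0.8*X[:, 1])))
ybin = rng.binomial(1, p)

logit = sm.Logit(ybin, sm.add_constant(X)).fit()
print(logit.summary())   # the z statistics are analogous to SPSS's Wald test
```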
FACTOR ANALYSIS: DATA REDUCTION

8.1 INTRODUCTION
Factor analysis is an interdependence technique, whose primary purpose is to define the
underlying structure among the variables in the analysis. Variables play a key role in any
multivariate analysis. Whether we are making a sales forecast with regression, predicting
success or failure of a new firm with discriminant analysis, or using any of the other
multivariate techniques, we must have a set of variables upon which to form relationships.
As such, variables are the building blocks of relationships. Factor analysis can play a unique role in the application of other multivariate techniques. Broadly speaking, factor analysis provides the tools for analyzing the structure of the interrelationships among a large number of variables by defining sets of variables that are highly interrelated, known as factors. Factor analysis presents several ways of representing these groups of variables for use in other multivariate techniques.
Factor analysis is a very powerful method of reducing data complexity by reducing the number of variables being studied. It is a common experience, for example, to find a marketing decision maker wondering what exactly makes a consumer buy his product. The possible purchasing criteria could range from one or two to twenty or twenty-five, and often the marketing manager is shooting in the dark, trying to figure out what really drives buyer
analysis is a good way of resolving this confusion and identifying latent or underlying factors
from a number of given significant variables. In general, factor analysis is a set of techniques
which, by analyzing correlations between variables, reduces their number into fewer
significant factors which explain much of the original data, more economically and
practically. Even though a subjective interpretation can result from a factor analysis output,
the procedure often provides an insight into relevant psychographic variables, and results in
economical use of data collection efforts. The subjective element of factor analysis could be
reduced by splitting the sample randomly into two, and extracting factors separately from
both parts. If similar factors result, the analysis could be assumed as reliable or stable. Other
innovative uses of factor analysis could be to do it separately for two groups such as users
and nonusers of a brand, and check what differences exist in the factors extracted. This would
be an indirect way to find out differences in buying criteria for the same product category
among the two groups. In many real-life applications, the number of independent variables used in predicting a response variable can be too large. The difficulties of having too many independent variables in such an exercise are as follows:
 Increased computational time to get solution
 Increased time and error in data collection
 Too much expenditure in data collection
 Presence of redundant independent variables
 Difficulty in making inferences
These can be avoided by using factor analysis. Factor analysis aims at grouping the original input variables into factors which underlie the input variables. Each factor will account for one or more input variables. Theoretically, the total number of factors in the factor analysis is equal to the total number of input variables. But, after performing factor analysis, the total number of factors in the study can be reduced by dropping the insignificant factors based on certain criteria. To demonstrate these aspects, consider the case analyses related to factor analysis. Factor analysis is a very useful method of reducing data complexity by reducing the number of variables being studied, and a good way of resolving confusion and identifying latent or underlying significant factors from an array of important variables.
The Stages of Factor Analysis are: Extraction and Rotation
Stage-1: Factor Extraction Process
The purpose of this stage is to identify how many factors will be extracted from the data. The most common method of extracting factors from the data is Principal Component Analysis.
Stage-2: Rotation of Principal Components
After the number of factors is decided upon in Stage-1, the next task is to interpret and name the factors. This is done by identifying which factors are associated with each of the original variables. The methods of orthogonal rotation are Varimax and Quartimax. A higher Eigen value implies a higher amount of variance explained by the factor.
8.2 KEYWORDS
Factor analysis involves many terms, which are presented in this keyword section for a better understanding of the related techniques.
Factor Loading
Correlation between the original variables and the factors, and the key to understanding the nature of a particular factor. Squared factor loadings indicate what percentage of the variance in an original variable is explained by a factor.
Factor Rotation
The process of manipulating or adjusting the factor axes to achieve a simpler and pragmatically more meaningful factor solution is factor rotation.
Communality
The total amount of variance an original variable shares with all other variables included in the analysis is its communality. It is the sum of the squared factor loadings of variable $i$ on all factors:

$$h_i^2 = \sum_{j=1}^{n} L_{ij}^2$$
Component Analysis
Factor analysis model in which the factors are based on the total variance. With component
analysis 1s are used in the diagonal of the correlation matrix; this procedure computationally
implies that all the variance is common or shared.
Equamax
An orthogonal factor rotation method that is a "compromise" between the varimax and quartimax approaches; it is not widely used.
Factor
A linear combination (variate) of the original variables is known as a factor. Factors also represent the underlying dimensions (constructs) that summarize the original set of observed variables.
Orthogonal
Mathematical independence (no or zero correlation) of factor axes to each other (i.e. at right
angles or 90 degrees).
Quartimax
A type of orthogonal factor rotation method focusing on simplifying the rows of a factor matrix is quartimax. Generally it is considered less effective than the varimax rotation method.
Varimax
The most popular orthogonal factor rotation method, focusing on simplifying the columns of a factor matrix, is varimax. Generally it is considered superior to other orthogonal factor rotation methods in achieving a simplified factor structure.
Factor Matrix
A table displaying the factor loadings of all variables on each factor is the factor matrix.
Correlation Coefficient Matrix
Table showing the inter-correlations among all the variables is known as correlation matrix. It
is the matrix of correlation coefficients of the original observations between different pairs of
input variables.
Factor Loadings
The factor loading matrix represents the correlations between the variables and the factors. $L_{ij}$ is the factor loading of variable $i$ on factor $j$, where $i = 1, 2, 3, \dots, n$ and $j = 1, 2, 3, \dots, n$.
Eigen Values
The Eigen value of a factor is the sum of the squared factor loadings of all variables on that factor:

$$\text{Eigen value of factor } j = \sum_{i=1}^{n} L_{ij}^2$$
Note: The sum of the Eigen values of all factors (if no factor is dropped) is equal to the sum
of the communalities of all variables.
Rotation
After obtaining factor loadings, one should examine whether the factor loading matrix possesses simple structure. If a factor loading matrix has a simple structure, it is easy to make interpretations about the factors. If there is no simple structure, then the $n$-dimensional space of the factors should be rotated by an angle such that the factor loadings are revised to have a simple structure, which simplifies the process of interpreting the factors. Such rotation is called rotation of factors.
A simple structure means that each variable has very high factor loading (as high as 1)
on one of the factors and very low factor loading (as low as 0) on other factors. The
communalities of each variable before and after factor rotation will be the same.
The popular methods of rotation of factors are the Varimax method (orthogonal) and the Promax method (oblique). Both techniques aim at better interpretation.
Varimax Method
The Varimax method of factor rotation maintains orthogonality between different pairs of factor axes during rotation.
Promax Method
The Promax method employs oblique rotation. This means that the angles between different pairs of factor axes are not 90° after rotation.
Significant Number of Factors
The main objective of factor analysis is to group the given set of input variables into a minimal number of factors with the maximum capability of extracting information from the reduced set of factors. The criteria to determine the number of factors to be retained for further study are the minimum Eigen value criterion and the scree plot criterion.
Minimum Eigen Value Criterion
If the Eigen value (sum of squares of the factor loadings of all variables on a factor) of a
factor is more than or equal to 1, then that factor is to be retained; otherwise, that factor is to
be dropped.
Scree Plot Criterion
As per this criterion, plot the Eigen values of the factors by taking the factor number on X-
axis and the Eigen values on Y-axis. Then, identify the factor number at which the slope of
the line connecting the points changes from steep to a gradual trailing off towards the right of
the identified factor number. Such change in slope in the graph is known as scree and the
point is known as scree point. The factors which are marked up to the scree point from the
origin are to be retained for future study and all the factors to the right of the scree point are
to be dropped from the study.
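Both criteria are easy to apply outside SPSS as well. A minimal Python sketch of the minimum Eigen value criterion (assuming the data are arranged as an observations-by-variables array; the function name is ours):

```python
import numpy as np

def retained_factors(data):
    """Keep as many factors as there are Eigen values >= 1
    in the correlation matrix of the data."""
    corr = np.corrcoef(data, rowvar=False)    # variables x variables
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # Eigen values, descending
    return int(np.sum(eigvals >= 1.0)), eigvals
```

Plotting `eigvals` against the factor numbers 1, 2, 3, ... gives the scree plot described above.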
Factor Scores
Though a factor is not visible like an original input variable, it is still a variable which can be
used to find the scores for respondents. At the initial stage, the respondents assign scores for
the variables. After performing factor analysis, each factor assigns a score for each
respondent. Such scores are called factor scores. The equation to compute the factor score of
a respondent by the factor k is shown below. By substituting the standardized values of the
input variables assigned by a respondent in this expression, the factor score of that respondent
can be obtained:
$$F_k = w_{1k} X_1 + w_{2k} X_2 + w_{3k} X_3 + \dots + w_{ik} X_i + \dots + w_{nk} X_n = \sum_{i=1}^{n} w_{ik} X_i$$

where $w_{ik}$ is the weight of the input variable $X_i$ in the linear composite of the factor $k$, for $k = 1, 2, 3, \dots, n$.
Latent Factor
Latent factors are otherwise known as underlying factors. Factor analysis is a set of techniques which, by analyzing correlations between variables, reduces their number to a few latent factors which explain much of the original data more economically.
Variance Explained
The concept of Eigen value translates approximately to the “variance explained” concept of
regression analysis. The higher the Eigen value of a factor, the higher is the amount of
variance explained by the factor. What we are attempting to do is to extract the least number
of factors possible which will maximize the explained variance.
8.3 METHODS
There are two stages in factor analysis. Stage 1 can be called the Factor Extraction process,
where the objective is to identify how many factors will be extracted from the data. The most
popular method for this is called Principal Component Analysis. There is also a rule-of-
thumb based on the computation of an Eigen value, to determine how many factors to extract.
The concept of Eigen value translates approximately to the "variance explained" concept of regression analysis. The higher the Eigen value of a factor, the higher is the amount of variance explained by the factor. What we are attempting to do is to extract the least number of factors possible which will maximize the explained variance. Before extraction, it is assumed that each of the original variables has an Eigen value equal to 1. Therefore it stands to reason that we would expect any single factor, which is a linear combination of some of the original variables, to exceed the value of 1. Indeed, that is exactly what the rule-of-thumb for factor extraction says. Theoretically, we can have as many factors as there are original variables. But since the objective is to reduce the variables to a fewer number of factors, we usually retain those with an Eigen value of 1 or more (in other words, a factor must explain at least as much of the variance, if not more, than a single original variable).
Stage 2 is called Rotation of Principal Components. This stage is actually optional, but highly recommended. After the number of extracted factors is decided upon in Stage 1, the next task of the researcher is to interpret and name the factors. This is done by the process of identifying which factors are associated with which of the original variables. The factor matrix is used for this purpose. The original factor matrix is un-rotated and is part of the output from Stage 1; the rotated factor matrix is produced in Stage 2. The factor matrix (whether un-rotated or rotated) gives the loading of each variable on each of the extracted factors. This is similar to a correlation matrix, with loadings ranging between -1 and +1; values close to ±1 represent high loadings and those close to 0, low loadings. The objective is to find variables which have a high loading on one factor, but low loadings on other factors. If factor 1 is loaded highly by variables 3, 7, and 10, for example, it is assumed that factor 1 is a linear combination of these three variables, and it is given a suitable name representing the essence of the original variables of which it is a combination. This process is somewhat subjective, but in skilful hands it results in a very useful interpretation, as is illustrated through the case studies.
There are two popular methods of orthogonal rotation, the varimax and the quartimax. Any
one of these two is adequate for the average user of factor analysis in exploratory or empirical
research.
8.4 RECOMMENDED PROCEDURE
The authors recommend using factor analysis as described above to reduce data variables into a smaller set of factors. The analysis could be started by inspecting a correlation matrix to see whether correlations exist among at least some of the original variables. Unless some of the variables are correlated with each other, factor analysis is not recommended; it may result in forced extraction of non-existent factors. A formal test such as Bartlett's test of sphericity can also be used to ensure that there are some significant correlations among the variables in the input data.
Stage 1 should be performed using Principal Component Analysis followed by
varimax (or quartimax) rotation in stage 2. Both the un-rotated and rotated factor matrix
should be looked at for interpretation of factors. Unless the researcher is doing only an
exploratory factor analysis, the thumb rule of Eigen value being equal to 1 or more should be
used to determine the number of factors to be extracted. The procedure will be illustrated
through the case studies.
8.5 SPSS COMMANDS FOR FACTOR ANALYSIS
After the input data have been typed in, along with variable labels and value labels, in an SPSS file, follow these steps to get the output for a factor analysis problem:
 Click on ANALYZE at the SPSS menu bar.
 Click on DATA REDUCTION, followed by FACTOR.
 On the dialogue box which appears, select all the variables required for the factor analysis
by clicking on the right arrow to transfer them from the variable list on the left to the
variables box on the right.
 Click on EXTRACTION in the lower part of the dialogue box.
 Select “PRINCIPAL COMPONENTS” as the method.
 Under DISPLAY, select “UNROTATED FACTOR SOLUTION”.
 Under EXTRACT, select “EIGEN VALUES OVER 1”.
 Under ANALYZE, choose “CORRELATION MATRIX”.
 Click CONTINUE.
 Click on ROTATION in the lower part of the main dialogue box. Select VARIMAX from
the options under METHOD. Click CONTINUE.
 Click OK to get the FACTOR ANALYSIS output, including the un-rotated factor matrix,
the rotated factor matrix using varimax rotation and extracted factors along with Eigen
values and cumulative variance. Communality figures would also be a part of the output.
Note: It is possible to use other methods such as Generalized Least Squares to get the factor
analysis output instead of Principal Components. It is also possible to use other rotation
methods instead of varimax rotation method.
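For readers working outside SPSS, a rough Python sketch of the same workflow (hypothetical simulated data; note that scikit-learn's FactorAnalysis uses maximum-likelihood extraction rather than Principal Components, so the loadings will not match SPSS's PCA-based output exactly):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Stand-in for a respondents x variables matrix such as the
# 30-item survey in Case Study 35 (simulated here).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 30))

fa = FactorAnalysis(n_components=7, rotation="varimax")
fa.fit(data)
loadings = fa.components_.T                    # variables x factors
communalities = (loadings ** 2).sum(axis=1)    # sum of squared loadings per variable
print(loadings.shape, communalities[:5])
```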
8.6 CASE STUDY-35
Factor analysis was done using the Principal Component Analysis method to reduce the number of factors responsible for variance in the responses of the employees. Respondents rated the following 30 statements on a 5-point scale:
Strongly Agree = 5, Agree = 4, Cannot say = 3, Disagree = 2, Strongly Disagree = 1
1. I feel proud for my employer’s brand.
2. I feel satisfied with the benefits provided by the employer.
3. I feel the tasks assigned to me by my supervisor help me grow professionally.
4. The expectations of my supervisors are realistic.
5. My opinion about work matters to my co-workers.
6. I feel my salary is adequate for workload and responsibility.
7. My job is challenging and my work is meaningful.
8. I feel satisfied with my job security.
9. I feel satisfied with the performance appraisal system and the outcome/feedback.
10. I feel satisfied with my organization’s employee policies (related to promotion, transfer,
training, compensation etc)
11. I feel satisfied with the fairness of the way the organization treats all its employees.
12. I feel satisfied with the overall communication at my organization.
13. I feel satisfied with the facilities provided by my organization.
14. I feel satisfied with my job.
15. I feel satisfied with my level of involvement in project planning and execution.
16. I feel satisfied with the training and development provided by my organization.
17. The organization recognizes my importance for my work.
18. I feel that change is required to improve the working conditions of my organization.
19. I feel satisfied with the present working hours of my organization.
20. I feel satisfied with the clean and hygienic working place of my organization.
21. I am indifferent when somebody is talking ill of my organization.
22. There is high compatibility between my personal goal and organizational goal.
23. Incentives in this organization are based on performance.
24. I am fully satisfied on Bank’s Vision, Mission and Values and work on that.
25. My personal competency is used by the organization.
26. I get co-operation from all other departments.
27. My organization also cares for my family.
28. My co-workers are very supportive and friendly.
29. The organization has a well defined procedure for grievance handling.
30. The organization is concerned with the long-term welfare of all the employees.
Methodology
Data from the above study were tabulated and analyzed using the SPSS package.
Table 8.6.1 Total Variance Explained
Component | Initial Eigenvalues (Total, % of Variance, Cumulative %) | Extraction Sums of Squared Loadings (same three columns) | Rotation Sums of Squared Loadings (same three columns)
1  8.084  26.948  26.948   8.084  26.948  26.948   3.620  12.066  12.066
2  1.785   5.952  32.899   1.785   5.952  32.899   3.195  10.651  22.718
3  1.514   5.047  37.946   1.514   5.047  37.946   2.634   8.780  31.498
4  1.321   4.404  42.350   1.321   4.404  42.350   2.608   8.693  40.192
5  1.265   4.217  46.566   1.265   4.217  46.566   1.500   4.999  45.191
6  1.173   3.911  50.477   1.173   3.911  50.477   1.448   4.826  50.017
7  1.032   3.441  53.918   1.032   3.441  53.918   1.170   3.901  53.918
8 .979 3.264 57.182
9 .954 3.178 60.360
10 .847 2.822 63.183
11 .829 2.763 65.946
12 .752 2.508 68.454
13 .743 2.475 70.930
14 .720 2.400 73.330
15 .681 2.271 75.600
16 .650 2.165 77.766
17 .618 2.059 79.824
18 .608 2.028 81.852
19 .600 2.001 83.854
20 .548 1.826 85.679
21 .538 1.794 87.473
22 .511 1.703 89.176
23 .480 1.600 90.775
24 .457 1.525 92.300
25 .441 1.471 93.771
26 .412 1.373 95.145
27 .405 1.351 96.495
28 .371 1.235 97.731
29 .359 1.196 98.926
30 .322 1.074 100.000
Extraction Method: Principal Component Analysis.
Table 8.6.2: Principal Component Un-Rotated Factor Matrix
Component
1 2 3 4 5 6 7
VAR00001 .288 .109 .397 -.118 .033 -.478 .036
VAR00002 .486 -.546 .302 .087 .035 .102 -.089
VAR00003 .552 -.017 .181 -.195 .275 -.180 .190
VAR00004 .606 -.066 .116 -.138 .262 -.124 -.064
VAR00005 .505 .227 .235 -.086 .133 -.099 -.099
VAR00006 .420 -.543 .181 .180 .061 .350 -.053
VAR00007 .427 .263 .400 .001 .191 -.044 .252
VAR00008 .434 -.152 .262 -.203 -.232 .177 .192
VAR00009 .608 -.055 -.196 .145 .208 -.150 .110
VAR00010 .622 -.295 -.255 -.126 .105 -.102 .085
VAR00011 .648 -.203 -.168 -.023 .045 -.240 -.057
VAR00012 .583 .208 -.033 .004 -.053 -.156 -.376
VAR00013 .596 -.385 .212 .082 -.217 -.013 -.119
VAR00014 .675 -.012 .233 -.083 -.007 .089 -.052
VAR00015 .558 .311 .039 -.043 .116 .117 -.078
VAR00016 .582 .096 -.255 -.154 .248 .105 .226
VAR00017 .628 -.032 -.176 .067 .155 .116 .138
VAR00018 -.034 .176 .402 .459 -.128 -.009 .513
VAR00019 .339 .005 .024 -.068 .253 .591 .045
VAR00020 .394 -.146 .158 .000 -.107 -.114 -.398
VAR00021 .126 .070 .122 .732 .012 -.107 -.160
VAR00022 .481 .242 -.006 .364 .092 -.003 -.084
VAR00023 .400 -.132 -.383 .328 .206 -.033 -.023
VAR00024 .577 .201 -.044 -.089 -.174 .055 -.114
VAR00025 .547 .413 -.129 .126 .137 .168 -.082
VAR00026 .583 .329 -.162 .085 -.288 .182 -.032
VAR00027 .519 .003 -.092 .060 -.551 .117 .110
VAR00028 .487 .363 .157 -.231 -.249 .157 -.135
VAR00029 .557 -.071 -.342 .005 -.247 -.243 .204
VAR00030 .672 -.114 -.170 -.040 -.314 -.152 .252
Extraction Method: Principal Component Analysis.
a. 7 components extracted.
Table 8.6.3: Rotated Component Matrix
Rotated Component Matrixa
Component
1 2 3 4 5 6 7
VAR00001 .047 .061 .622 .071 -.310 .004 .057
VAR00002 .177 -.021 .169 .763 .084 .059 -.020
VAR00003 .355 .041 .583 .163 .131 -.087 .027
VAR00004 .330 .107 .500 .258 .148 .043 -.179
VAR00005 .095 .281 .529 .093 .101 .117 -.080
VAR00006 .159 -.024 -.040 .737 .291 .097 .028
VAR00007 .061 .159 .596 .056 .186 .066 .294
VAR00008 .113 .313 .185 .416 .089 -.281 .197
VAR00009 .597 .095 .250 .119 .133 .225 -.020
VAR00010 .655 .096 .159 .280 .099 -.074 -.163
VAR00011 .580 .171 .244 .275 -.052 .092 -.201
VAR00012 .207 .466 .313 .101 -.043 .275 -.327
VAR00013 .250 .257 .153 .670 -.095 .079 -.018
VAR00014 .200 .375 .399 .393 .181 .030 -.024
VAR00015 .169 .410 .359 .020 .302 .146 -.079
VAR00016 .540 .202 .244 -.030 .403 -.069 -.027
VAR00017 .514 .214 .167 .173 .326 .114 .020
VAR00018 -.114 -.005 .122 .005 -.058 .219 .778
VAR00019 .048 .151 .044 .211 .680 -.040 .001
VAR00020 .044 .235 .199 .393 -.145 .161 -.295
VAR00021 -.014 .015 -.006 .130 -.104 .740 .181
VAR00022 .217 .285 .213 .043 .154 .486 .049
VAR00023 .534 -.010 -.071 .084 .163 .374 -.104
VAR00024 .230 .550 .203 .110 .090 .050 -.105
VAR00025 .233 .443 .232 -.106 .372 .314 -.060
VAR00026 .262 .690 .035 .019 .158 .164 .042
VAR00027 .305 .623 -.116 .266 -.084 -.030 .194
VAR00028 -.036 .663 .288 .066 .121 -.073 -.061
VAR00029 .667 .332 .030 .073 -.171 -.019 .050
VAR00030 .621 .422 .114 .242 -.126 -.092 .143
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 19 iterations.
Interpretation
(Before extraction it is assumed that each of the original variables has an Eigen value of 1.) As evident from Table 8.6.1, the 7 extracted factors together account for 53.918% of the total variance (information contained in the original 30 variables). Hence, the number of variables has been reduced from 30 variables to 7 underlying factors. Table 8.6.2 presents the component factor matrix of the 7 components.
Looking at Table 8.6.3, it is seen that variables 9, 10, 11, 29 and 30 have factor loadings of 0.597, 0.655, 0.580, 0.667 and 0.621 respectively, indicating that factor 1 is a combination of these 5 variables. Therefore this factor can be interpreted as "Employee-centric policies and practices". For factor 2, variables 26, 27 and 28 have high loadings of 0.690, 0.623 and 0.663 respectively, indicating that factor 2 is a combination of these variables. These variables can be clubbed into a single factor called "Co-operative work environment". As for factor 3, variables 1, 3, 5 and 7 have high loadings of 0.622, 0.583, 0.529 and 0.596 respectively. This factor, consisting of the above 4 variables, can be termed "Personal growth and motivation". As for factor 4, variables 2, 6 and 13 have high loadings of 0.763, 0.737 and 0.670 respectively. This factor, consisting of the above 3 variables, can be termed "Financial and non-financial benefits". As for factor 5, variables 16 and 19 have high loadings of 0.403 and 0.680. This factor, consisting of the above 2 variables, can be termed "Measures for improved performance". As for factor 6, variable 21 has a high loading of 0.740. This factor, consisting of this single variable, can be termed "Brand value". As for factor 7, variable 18 has a high loading of 0.778. This factor, consisting of this single variable, can be termed "Perceived opportunity for change".