
Advanced Statistical Methods Project: Data Analysis Using Spss

This document describes a statistical analysis project using SPSS to analyze data on 109 world countries. Correlation and multiple regression analyses were conducted to examine the association between average female life expectancy and variables like literacy rates, GDP per capita, daily calorie intake, and birth rate. The analysis found all variables to be highly correlated with life expectancy and significant predictors in the regression model. The multiple regression helped predict life expectancy based on the collective influence of these factors.

Uploaded by

DHWANI SONI

ADVANCED STATISTICAL METHODS

PROJECT
DATA ANALYSIS USING SPSS

Submitted to-
Ms.Shailaja Rego

By-
Mayank Bhatia
A013
MBA Banking 2nd Year
NMiMS Mumbai

On-
31st August 2012
TECHNIQUES USED

1. Correlation Coefficient and Multiple Regression


2. Factor Analysis and Cluster Analysis

CORRELATION COEFFICIENT AND MULTIPLE REGRESSION

ABOUT THE DATA:

The data set contains information and statistics on 109 countries of the world.
The input file contains the following variables:
1. Name of Country
2. Population in thousands
3. Number of people / sq. kilometer
4. People living in cities (%)
5. Predominant religion
6. Average female life expectancy
7. Average male life expectancy
8. People who read (%)
9. Population increase (% per year)
10. Infant mortality (deaths per 1000 live births)
11. Gross domestic product / capita
12. Region or economic group
13. Daily calorie intake
14. AIDS cases
15. Birth rate per 1000 people
16. Death rate per 1000 people
17. Number of AIDS cases / 100000 people
18. Log (base 10) of GDP_CAP
19. Log (base 10) of AIDS_RT
20. Birth to death ratio
21. Fertility: average number of kids
22. Log (base 10) of Population
23. Males who read (%)
24. Females who read (%)
25. Predominant climate
The following is the variable view of the data:

OBJECTIVE
The objective is to examine the association between several variables and
average female life expectancy. The same variables are used later in the
regression.

METHODOLOGY

Bivariate Correlation is used. The first variable is Average Female Life
Expectancy. This is also used as the outcome variable in the Multiple
Regression. The other variables used are:
1. People who read (% literacy)
2. Gross Domestic Product per capita
3. Daily Calorie Intake
4. Birth Rate per 1000 people
There are 5 variables in total. The Pearson Correlation Coefficient and a
two-tailed test of significance are used.
The output is:

Correlations

                                          Average       People      Gross domestic   Daily      Birth rate
                                          female life   who read    product /        calorie    per 1000
                                          expectancy    (%)         capita           intake     people
Average female     Pearson Correlation    1             .865**      .642**           .775**     -.862**
life expectancy    Sig. (2-tailed)                      .000        .000             .000       .000
                   N                      109           107         109              75         109
People who read    Pearson Correlation    .865**        1           .552**           .682**     -.869**
(%)                Sig. (2-tailed)        .000                      .000             .000       .000
                   N                      107           107         107              74         107
Gross domestic     Pearson Correlation    .642**        .552**      1                .751**     -.651**
product / capita   Sig. (2-tailed)        .000          .000                         .000       .000
                   N                      109           107         109              75         109
Daily calorie      Pearson Correlation    .775**        .682**      .751**           1          -.762**
intake             Sig. (2-tailed)        .000          .000        .000                        .000
                   N                      75            74          75               75         75
Birth rate per     Pearson Correlation    -.862**       -.869**     -.651**          -.762**    1
1000 people        Sig. (2-tailed)        .000          .000        .000             .000
                   N                      109           107         109              75         109

**. Correlation is significant at the 0.01 level (2-tailed).

The topmost row and leftmost column contain the names of the variables.
The value of the correlation between any two variables ranges from -1 to +1.
The diagonal elements are 1 because each variable is perfectly correlated
with itself. A value of 1 is referred to as perfect correlation. This
means that every point falls exactly on the regression line.
The positive or negative sign shows whether the relationship is a direct
relationship or an inverse relationship.
Note that the matrix above is symmetrical about the diagonal. This shows that
the order of the variables in a correlation doesn't matter. So, the correlation
between Daily calorie intake and GDP/capita is the same as the correlation
between GDP/capita and Daily calorie intake.
Each of the cells in the above matrix has 3 values. Let's take them one by
one.
Consider the cell for Average female life expectancy and People who read (%):
The first value is the Pearson coefficient, that is, the Pearson product-moment
correlation, also referred to as r. The value is 0.865, which is a
very high positive value. This implies that countries with a high literacy rate
have a longer average female life expectancy. The second value is the
significance value, also known as the p value. Generally, if this is less than
0.05 the correlation is considered statistically significant, or reliably
different from 0. The last number is the N value, which shows the number of
countries that have data for both variables. So in this case there are 107
countries which have data for both average female life expectancy and literacy
level.
Note that all of the associations above are statistically significant, since
the 2-tailed significance value is less than 0.05 for each of the cells.
The two asterisks next to a Pearson coefficient mean that the correlation is
significant at the 0.01 level. A single asterisk would denote significance at
the 0.05 level, which is also known as the standard level of statistical
significance.
Note that all the variables are highly correlated not only in terms of
significance but also in terms of absolute value. The association between
GDP/capita and literacy level has the smallest absolute value, 0.552,
which is still a large association.
The negative correlation between the birth rate per 1000 and the other
variables signifies that the higher the number of births per 1000 population,
the lower the average female life expectancy, the literacy level, the
GDP/capita and the daily calorie intake. This may be due to the fact that most
of the countries in the data are developing countries, where resources may be
lacking.
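
The different N values in the correlation table come from pairwise deletion: each coefficient uses only the countries that have data for both variables. A minimal sketch of that idea is given below. The function name and the toy values are hypothetical and are not taken from the 109-country file, and the significance test that SPSS also reports is left out.

```python
import numpy as np

def pairwise_pearson(x, y):
    """Return (r, n) using only the cases where both values are present."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~np.isnan(x) & ~np.isnan(y)   # pairwise deletion of missing cases
    r = np.corrcoef(x[keep], y[keep])[0, 1]
    return r, int(keep.sum())

# Hypothetical toy values (illustration only, not the project data).
life_exp = [75.0, 52.0, 80.0, 67.0, 58.0, 79.0]
literacy = [95.0, 35.0, 99.0, np.nan, 50.0, 98.0]

r, n = pairwise_pearson(life_exp, literacy)
r_swapped, _ = pairwise_pearson(literacy, life_exp)
print(f"r = {r:.3f}, N = {n}")
print("symmetric:", np.isclose(r, r_swapped))
```

Swapping the two arguments gives the same coefficient, which is why the correlation matrix is symmetrical about its diagonal.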
MULTIPLE REGRESSION
Multiple Regression looks at the association between the outcome and several
variables taken collectively. It is used to predict the values of a
quantitative outcome variable using several predictor variables, which can be
quantitative or categorical.
Average female life expectancy is used as the outcome. Birth rate per 1000,
Daily calorie intake, Literacy level and Gross Domestic Product per capita
are taken as the predictor variables of female life expectancy.

OBJECTIVE-

In this case, Multiple Regression is used to view the association of all predictor
variables together to predict life expectancy of women.

METHODOLOGY-

Linear Regression is used with the following variables:

Dependent:
1. Average Female Life expectancy
Independent:
1. Birth Rate per 1000
2. Daily calorie intake
3. Literacy level
4. Gross Domestic Product per capita
The Enter method is used.
The following statistics are requested:
• Regression Coefficient Estimates
• Model Fit
• Descriptives
• Part and Partial Correlations
• Durbin-Watson (under Residuals)
• Histogram and Normal Probability Plot of the standardized residuals
(under Plots)

The following is the output from SPSS:

Descriptive Statistics

                                   Mean      Std. Deviation   N
Average female life expectancy     68.70     11.448           74
People who read (%)                75.47     23.127           74
Birth rate per 1000 people         27.743    12.4296          74
Daily calorie intake               2741.96   562.262          74
Gross domestic product / capita    5833.46   7196.274         74

The above table shows the overall distribution of the variables. There are a
total of 74 cases that have data for all five variables.

Model Summary (b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .912(a)   .832       .822                4.824                        2.071

a. Predictors: (Constant), Gross domestic product / capita, People who read (%), Daily
calorie intake, Birth rate per 1000 people

b. Dependent Variable: Average female life expectancy

Under the model summary, R is known as the Multiple Correlation Coefficient.
The value is 0.912, which is extremely high. R Square is 0.832 in this case.
This means that 83 per cent of the variance in Average female life expectancy
is explained by the combination of the four predictor variables (Birth Rate per
1000, Daily calorie intake, Literacy level, Gross Domestic Product per capita).
Many a time the Adjusted R Square, here 0.822, is reported instead. The
Adjusted R Square takes into consideration the number of observations and the
number of predictor variables so that the value is not inflated. The Standard
Error of the Estimate (4.824 years) is the typical size of a residual and is
used in hypothesis testing.

The Durbin-Watson statistic for this model is 2.071, which is within the
acceptable range of 1.5 to 2.5.

ANOVA (b)

Model            Sum of Squares   df   Mean Square   F        Sig.
1  Regression    7961.937         4    1990.484      85.544   .000(a)
   Residual      1605.523         69   23.268
   Total         9567.459         73

a. Predictors: (Constant), Gross domestic product / capita, People who read (%), Daily calorie
intake, Birth rate per 1000 people

b. Dependent Variable: Average female life expectancy

The ANOVA table for the regression analysis indicates whether the model as a
whole is significant. The ANOVA is significant if the value in the 'Sig.'
column of the above table is less than the level of significance (generally
taken as 5% or 1%). Since 0.000 < 0.01, the model is significant: the four
predictors taken together fit the data well.
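
The figures in the Model Summary and ANOVA tables are tied together by simple arithmetic. The sketch below recomputes F, R Square, Adjusted R Square and the Standard Error of the Estimate from the sums of squares reported above; the variable names are illustrative, and only the numbers come from the SPSS output.

```python
# Values copied from the ANOVA table (Enter method).
ss_regression = 7961.937
ss_residual = 1605.523
n_cases = 74
k_predictors = 4

ss_total = ss_regression + ss_residual
df_regression = k_predictors
df_residual = n_cases - k_predictors - 1          # 69

ms_regression = ss_regression / df_regression      # mean square, regression
ms_residual = ss_residual / df_residual            # mean square, residual
f_statistic = ms_regression / ms_residual

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_cases - 1) / df_residual
std_error_estimate = ms_residual ** 0.5

print(f"F = {f_statistic:.3f}")
print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adj_r_squared:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.3f}")
```

The results agree with the reported values (F = 85.544, R Square = .832, Adjusted R Square = .822, standard error = 4.824) up to rounding.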

Variables Entered/Removed

Model   Variables Entered                             Variables Removed   Method
1       Birth rate per 1000 people, Gross domestic    .                   Enter
        product / capita, Daily calorie intake,
        People who read (%) (a)

a. All requested variables entered.

This table shows the different predictor variables which have been taken.

Coefficients (a)

                                      Unstandardized           Standardized
                                      Coefficients             Coefficients                      Correlations
Model                                 B           Std. Error   Beta           t        Sig.     Zero-order   Partial   Part
1  (Constant)                         43.778      8.075                       5.421    .000
   People who read (%)                .226        .050         .457           4.527    .000     .869         .479      .223
   Birth rate per 1000 people         -.256       .110         -.277          -2.324   .023     -.864        -.269     -.115
   Daily calorie intake               .006        .002         .271           3.190    .002     .776         .358      .157
   Gross domestic product / capita    -3.589E-5   .000         -.023          -.273    .786     .676         -.033     -.013

a. Dependent Variable: Average female life expectancy

In the coefficients table, the value of B for the Constant indicates that,
with all the predictor variables held at 0, the average female life expectancy
would be 43.778 years. The B values for the other predictor variables can be
interpreted as follows:
For every percentage point increase in People who read, the average female life
expectancy increases by 0.226 years.
For each additional calorie of daily intake, the average female life expectancy
increases by 0.006 years. The value is small because daily intake runs into
thousands of calories: an increase of 100 calories corresponds to about 0.6
years.
For each unit increase in Birth Rate per 1000, the average female life
expectancy decreases by 0.256 years.
The significance value for each predictor variable shows the probability
level of that predictor. These generally need to be less than 0.05 for the
predictor to be considered reliable, significant or meaningful. All are
reliable except for GDP per capita, whose significance value is 0.786.
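
The B values combine into a single prediction equation. The sketch below applies the unstandardized coefficients from the table above to a set of predictor values. The function and key names are hypothetical; the inputs used in the example are the sample means from the Descriptive Statistics table.

```python
# Unstandardized coefficients (B) from the Enter-method Coefficients table.
COEFFICIENTS = {
    "constant": 43.778,
    "literacy_pct": 0.226,
    "birth_rate": -0.256,
    "calories": 0.006,
    "gdp_per_capita": -3.589e-5,
}

def predict_life_expectancy(literacy_pct, birth_rate, calories, gdp_per_capita):
    """Predicted average female life expectancy in years."""
    c = COEFFICIENTS
    return (c["constant"]
            + c["literacy_pct"] * literacy_pct
            + c["birth_rate"] * birth_rate
            + c["calories"] * calories
            + c["gdp_per_capita"] * gdp_per_capita)

# Evaluate at the sample means reported in the Descriptive Statistics table.
at_means = predict_life_expectancy(75.47, 27.743, 2741.96, 5833.46)
one_more_point = predict_life_expectancy(76.47, 27.743, 2741.96, 5833.46)

print(f"Prediction at the sample means: {at_means:.2f} years")
print(f"Effect of one more literacy point: {one_more_point - at_means:.3f} years")
```

The prediction at the means comes out a little above the observed mean of 68.70 years. That gap is expected here because the coefficients are printed to only three decimals, and the calorie coefficient (.006) is multiplied by values in the thousands.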
Below is the bivariate correlation between Average Female Life Expectancy
and the predictor variables, taken from the correlation analysis above:

                                                         Average female
                                                         life expectancy
Average female life expectancy     Pearson Correlation   1
                                   Sig. (2-tailed)
                                   N                     109
People who read (%)                Pearson Correlation   .865**
                                   Sig. (2-tailed)       .000
                                   N                     107
Gross domestic product / capita    Pearson Correlation   .642**
                                   Sig. (2-tailed)       .000
                                   N                     109
Daily calorie intake               Pearson Correlation   .775**
                                   Sig. (2-tailed)       .000
                                   N                     75
Birth rate per 1000 people         Pearson Correlation   -.862**
                                   Sig. (2-tailed)       .000
                                   N                     109

All predictor variables, including GDP, have a high correlation with the
outcome variable when taken individually. The correlation between
GDP per capita and Average Female life expectancy is 0.642, which is quite
high and has a probability level of less than .001.
However, in the Multiple Regression model, GDP is no longer significantly
associated with the outcome. The reason is that Multiple Regression looks at
the combination of the four variables to predict the outcome. GDP per capita
is itself strongly correlated with the other predictors, so once they are in
the model it adds almost nothing of its own (its part correlation is only
-.013). The Coefficients table shows the contribution of each variable, but
only in combination with the others.

Residuals Statistics (a)

                        Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value         46.91     83.11     68.70   10.444           74
Residual                -14.991   7.579     .000    4.690            74
Std. Predicted Value    -2.086    1.379     .000    1.000            74
Std. Residual           -3.108    1.571     .000    .972             74

a. Dependent Variable: Average female life expectancy

The chart of the standardized residuals is used to test the validity of the
assumption that the residuals are normally distributed. Looking at the chart,
one may conclude that the residuals are approximately normal.
Since not all of the regression coefficients are significant (GDP per capita
is not), the model from the Enter method is not used for estimation.
Hence, for estimation, the Stepwise method is used.

Descriptive Statistics

                                   Mean      Std. Deviation   N
Average female life expectancy     68.70     11.448           74
People who read (%)                75.47     23.127           74
Birth rate per 1000 people         27.743    12.4296          74
Daily calorie intake               2741.96   562.262          74
Gross domestic product / capita    5833.46   7196.274         74

Correlations

                                                         Average       People     Birth rate   Daily     Gross domestic
                                                         female life   who read   per 1000     calorie   product /
                                                         expectancy    (%)        people       intake    capita
Pearson Correlation   Average female life expectancy     1.000         .869       -.864        .776      .676
                      People who read (%)                .869          1.000      -.871        .682      .627
                      Birth rate per 1000 people         -.864         -.871      1.000        -.757     -.741
                      Daily calorie intake               .776          .682       -.757        1.000     .760
                      Gross domestic product / capita    .676          .627       -.741        .760      1.000
Sig. (1-tailed)       Average female life expectancy     .             .000       .000         .000      .000
                      People who read (%)                .000          .          .000         .000      .000
                      Birth rate per 1000 people         .000          .000       .            .000      .000
                      Daily calorie intake               .000          .000       .000         .         .000
                      Gross domestic product / capita    .000          .000       .000         .000      .
N                     Average female life expectancy     74            74         74           74        74
                      People who read (%)                74            74         74           74        74
                      Birth rate per 1000 people         74            74         74           74        74
                      Daily calorie intake               74            74         74           74        74
                      Gross domestic product / capita    74            74         74           74        74

Variables Entered/Removed (a)

Model   Variables Entered             Variables Removed   Method
1       People who read (%)           .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050,
                                                          Probability-of-F-to-remove >= .100).
2       Daily calorie intake          .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050,
                                                          Probability-of-F-to-remove >= .100).
3       Birth rate per 1000 people    .                   Stepwise (Criteria: Probability-of-F-to-enter <= .050,
                                                          Probability-of-F-to-remove >= .100).

a. Dependent Variable: Average female life expectancy


Model Summary (d)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .869(a)   .756       .752                5.698
2       .905(b)   .818       .813                4.948
3       .912(c)   .832       .825                4.792                        2.065

a. Predictors: (Constant), People who read (%)

b. Predictors: (Constant), People who read (%), Daily calorie intake

c. Predictors: (Constant), People who read (%), Daily calorie intake, Birth rate per 1000
people

d. Dependent Variable: Average female life expectancy

In the previous method there was only one model. The Stepwise method reports
the model at each step, adding one significant predictor at a time. The
Durbin-Watson statistic (2.065) is within the acceptable range of 1.5 to 2.5.
The last model is generally the best model.

ANOVA (d)

Model            Sum of Squares   df   Mean Square   F         Sig.
1  Regression    7229.894         1    7229.894      222.690   .000(a)
   Residual      2337.565         72   32.466
   Total         9567.459         73
2  Regression    7829.451         2    3914.726      159.922   .000(b)
   Residual      1738.008         71   24.479
   Total         9567.459         73
3  Regression    7960.208         3    2653.403      115.563   .000(c)
   Residual      1607.252         70   22.961
   Total         9567.459         73

a. Predictors: (Constant), People who read (%)

b. Predictors: (Constant), People who read (%), Daily calorie intake

c. Predictors: (Constant), People who read (%), Daily calorie intake, Birth rate per 1000 people

d. Dependent Variable: Average female life expectancy

The above table gives the ANOVA for each of the three steps, and all three
models are significant.
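
What each stepwise step adds can be read off the regression sums of squares. The sketch below recomputes R Square for the three models and the F statistic for adding Birth rate in the last step. It uses only figures reported in the stepwise ANOVA and Coefficients tables; the variable names are illustrative.

```python
# Regression sums of squares for stepwise models 1 to 3, and the total.
ss_regression = {1: 7229.894, 2: 7829.451, 3: 7960.208}
ss_total = 9567.459
ms_residual_model3 = 22.961      # residual mean square of model 3
t_birth_rate_model3 = -2.386     # t for Birth rate in model 3

# R Square of each model is the share of the total sum of squares explained.
r_squared = {m: ss / ss_total for m, ss in ss_regression.items()}

# F for the one predictor added in step 3 (1 numerator degree of freedom).
f_change_step3 = (ss_regression[3] - ss_regression[2]) / 1 / ms_residual_model3

for model, value in r_squared.items():
    print(f"Model {model}: R Square = {value:.3f}")
print(f"F for adding Birth rate = {f_change_step3:.3f}")
print(f"t squared for Birth rate = {t_birth_rate_model3 ** 2:.3f}")
```

For a single added predictor the F for the change equals the square of that predictor's t value, so the last two printed numbers agree up to rounding in the reported figures.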

Coefficients (a)

                                 Unstandardized        Standardized
                                 Coefficients          Coefficients                      Correlations
Model                            B        Std. Error   Beta           t        Sig.     Zero-order   Partial   Part
1  (Constant)                    36.226   2.275                       15.924   .000
   People who read (%)           .430     .029         .869           14.923   .000     .869         .869      .869
2  (Constant)                    25.838   2.882                       8.964    .000
   People who read (%)           .315     .034         .636           9.202    .000     .869         .738      .465
   Daily calorie intake          .007     .001         .342           4.949    .000     .776         .506      .250
3  (Constant)                    43.784   8.022                       5.458    .000
   People who read (%)           .227     .049         .460           4.605    .000     .869         .482      .226
   Daily calorie intake          .005     .002         .261           3.472    .001     .776         .383      .170
   Birth rate per 1000 people    -.245    .103         -.267          -2.386   .020     -.864        -.274     -.117

a. Dependent Variable: Average female life expectancy

Excluded Variables (d)

                                                                           Partial       Collinearity Statistics
Model                                 Beta In    t        Sig.    Correlation   Tolerance
1  Birth rate per 1000 people         -.442(a)   -4.134   .000    -.440         .242
   Daily calorie intake               .342(a)    4.949    .000    .506          .535
   Gross domestic product / capita    .215(a)    3.036    .003    .339          .606
2  Birth rate per 1000 people         -.267(b)   -2.386   .020    -.274         .192
   Gross domestic product / capita    .042(b)    .525     .601    .063          .400
3  Gross domestic product / capita    -.023(c)   -.273    .786    -.033         .355

a. Predictors in the Model: (Constant), People who read (%)

b. Predictors in the Model: (Constant), People who read (%), Daily calorie intake

c. Predictors in the Model: (Constant), People who read (%), Daily calorie intake, Birth rate per 1000 people

d. Dependent Variable: Average female life expectancy


FACTOR ANALYSIS AND CLUSTER ANALYSIS

Factor Analysis is used to reduce a large number of variables into a smaller
number of factors.

ABOUT THE DATA:

Technology survey data. The CMU technology survey was conducted to find
out how faculty currently use technology for instruction. The survey addresses
many issues, including the use of technology, awareness of the technology
available at CMU, participation in workshops, development of web sites and
their purposes, the difficulties faced when using technology, and priority and
policy issues.

SOURCE OF DATA:

CMU Teaching, Learning & Technology


Variables in the data set are:
Q2: Faculty Rank Classification.
Q5: How often do you use your office computer?
Q10: How often you use your home computer for university related activities?
Q26: Do you use information technology in class instruction?
Q31_A1 to Q31_A12 and Q31_B1 to Q31_B4: These are questions about the
level of difficulty faced when using information technology in class.

The following is the variable view of the data


Questions Q31_A1 through Q31_A12 are considered. These questions are rated
on a scale of 1 to 5, where 1 is least difficult to use and 5 is most difficult
to use.

OBJECTIVE:

To check if these 12 variables can be reduced to smaller number of factors.

METHODOLOGY:

• Analyze -> Dimension Reduction -> Factor
• Univariate Descriptives and Initial Solution are selected as Statistics
• Coefficients is selected under Correlation Matrix
• The extraction method used is Principal Components
• Under Analyze, Correlation Matrix is selected
• Unrotated factor solution and Scree plot are displayed
• Factors with eigenvalues greater than 1 are extracted
• Under Rotation, the Varimax rotation method is used
• Under Options, display sorted by size is chosen.
Performing Factor Analysis on the variables Q31_A1 to Q31_A12, the output
is:

Descriptive Statistics

Mean Std. Deviation Analysis N

Q31A1 2.77 1.565 126

Q31A2 2.73 1.567 126

Q31A3 2.90 1.504 126

Q31A4 2.48 1.672 126

Q31A5 2.86 1.396 126

Q31A6 2.74 1.438 126

Q31A7 2.56 1.567 126

Q31A8 2.43 1.551 126

Q31A9 2.45 1.440 126

Q31A10 2.66 1.807 126

Q31A11 3.36 1.904 126

Q31A12 2.71 1.710 126

From the descriptive table we get to know that there are 126 valid cases for
our analysis.

Correlation Matrix

                      Q31A1   Q31A2   Q31A3   Q31A4   Q31A5   Q31A6   Q31A7   Q31A8   Q31A9   Q31A10  Q31A11  Q31A12
Correlation  Q31A1    1.000   .920    .687    .684    .747    .663    .727    .723    .462    .062    .245    .280
             Q31A2    .920    1.000   .749    .679    .776    .647    .765    .742    .466    .063    .242    .240
             Q31A3    .687    .749    1.000   .651    .752    .673    .667    .683    .511    .197    .372    .278
             Q31A4    .684    .679    .651    1.000   .684    .628    .588    .627    .554    .189    .384    .344
             Q31A5    .747    .776    .752    .684    1.000   .755    .742    .804    .534    .076    .381    .385
             Q31A6    .663    .647    .673    .628    .755    1.000   .630    .643    .475    .125    .382    .347
             Q31A7    .727    .765    .667    .588    .742    .630    1.000   .888    .451    -.009   .233    .197
             Q31A8    .723    .742    .683    .627    .804    .643    .888    1.000   .493    .015    .276    .203
             Q31A9    .462    .466    .511    .554    .534    .475    .451    .493    1.000   .401    .425    .602
             Q31A10   .062    .063    .197    .189    .076    .125    -.009   .015    .401    1.000   .378    .488
             Q31A11   .245    .242    .372    .384    .381    .382    .233    .276    .425    .378    1.000   .489
             Q31A12   .280    .240    .278    .344    .385    .347    .197    .203    .602    .488    .489    1.000

The Principal Component Analysis can be carried out if the correlation matrix
for the variables contains at least two correlations of 0.3 or more. Here we see
that this condition is fulfilled.

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy. .885

Bartlett's Test of Sphericity Approx. Chi-Square 1247.634

Df 66

Sig. .000

The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is an index used to
test the appropriateness of factor analysis. The minimum required KMO is 0.5.
The table above shows that this index for the data is 0.885 and that the
chi-square statistic of Bartlett's test of sphericity is significant (p < 0.05).
This means that Principal Component Analysis is appropriate for this data.
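
As a cross-check, Bartlett's chi-square can be recomputed from the determinant of the correlation matrix. The sketch below assumes the usual textbook formula, chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, and takes ln|R| as the sum of the logs of the twelve eigenvalues listed in the Total Variance Explained table of this report. Whether SPSS computes the statistic in exactly this way is not confirmed by the source.

```python
import math

# Eigenvalues from the "Total Variance Explained" table (rounded to 3 decimals).
eigenvalues = [6.743, 1.875, 0.638, 0.555, 0.469, 0.410,
               0.391, 0.314, 0.264, 0.181, 0.097, 0.063]
n_cases = 126                 # valid cases in the factor analysis
p = len(eigenvalues)          # number of variables (12)

# The determinant of a correlation matrix is the product of its eigenvalues.
log_det = sum(math.log(ev) for ev in eigenvalues)

chi_square = -(n_cases - 1 - (2 * p + 5) / 6) * log_det
df = p * (p - 1) // 2

print(f"Approx. chi-square = {chi_square:.1f}, df = {df}")
```

The result is close to the reported 1247.634 with 66 degrees of freedom; the small difference comes from using eigenvalues rounded to three decimals.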

Communalities

Initial Extraction

Q31A1 1.000 .796

Q31A2 1.000 .834

Q31A3 1.000 .714

Q31A4 1.000 .657

Q31A5 1.000 .824

Q31A6 1.000 .661

Q31A7 1.000 .790

Q31A8 1.000 .805

Q31A9 1.000 .655

Q31A10 1.000 .642

Q31A11 1.000 .536

Q31A12 1.000 .702

Extraction Method: Principal Component Analysis.

The above table shows the initial communalities and extraction communalities.
Communality is the share of the variance of each variable explained by the
common factors retained in the factor analysis. Extraction communalities are
estimates of the variance in each variable accounted for by the components.
E.g. 79.6% of the variance of the variable Q31A1 is explained by the common
factors in this factor analysis. The communalities in the above table are all
above 0.5, which indicates that the extracted components represent the
variables well.

Total Variance Explained

             Initial Eigenvalues                      Extraction Sums of Squared Loadings      Rotation Sums of Squared Loadings
Component    Total   % of Variance   Cumulative %     Total   % of Variance   Cumulative %     Total   % of Variance   Cumulative %
1            6.743   56.191          56.191           6.743   56.191          56.191           6.044   50.370          50.370
2            1.875   15.623          71.814           1.875   15.623          71.814           2.573   21.444          71.814
3            .638    5.316           77.130
4            .555    4.623           81.753
5            .469    3.906           85.660
6            .410    3.417           89.077
7            .391    3.261           92.338
8            .314    2.621           94.959
9            .264    2.196           97.155
10           .181    1.509           98.664
11           .097    .811            99.475
12           .063    .525            100.000

Extraction Method: Principal Component Analysis.

The above table gives the total variance contributed by each component.
In the above table we see that only 2 eigenvalues are greater than 1. Therefore
only these 2 factors are extracted.
The table shows the percentage of variance explained by each factor and
the cumulative variance. If we look at the Rotation Sums of Squared Loadings,
these 2 factors still account for 72% of the variance (71.814%), now split
50.370% and 21.444% between them.
The scree plot plots the eigenvalues against the component number and helps in
determining the optimal number of components. The scree plot supports the
extraction of 2 factors because the eigenvalues level off from the 3rd
eigenvalue onwards.
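
The eigenvalue-greater-than-1 rule can be reproduced outside SPSS. The sketch below uses NumPy (a substitute tool, not the one used in the project) to take the eigenvalues of the 12 x 12 correlation matrix reported earlier and count how many exceed 1. Because the matrix entries are rounded to three decimals, the eigenvalues match the table only approximately.

```python
import numpy as np

# Correlation matrix of Q31A1..Q31A12, copied from the Correlation Matrix table.
R = np.array([
    [1.000, .920, .687, .684, .747, .663, .727, .723, .462, .062, .245, .280],
    [.920, 1.000, .749, .679, .776, .647, .765, .742, .466, .063, .242, .240],
    [.687, .749, 1.000, .651, .752, .673, .667, .683, .511, .197, .372, .278],
    [.684, .679, .651, 1.000, .684, .628, .588, .627, .554, .189, .384, .344],
    [.747, .776, .752, .684, 1.000, .755, .742, .804, .534, .076, .381, .385],
    [.663, .647, .673, .628, .755, 1.000, .630, .643, .475, .125, .382, .347],
    [.727, .765, .667, .588, .742, .630, 1.000, .888, .451, -.009, .233, .197],
    [.723, .742, .683, .627, .804, .643, .888, 1.000, .493, .015, .276, .203],
    [.462, .466, .511, .554, .534, .475, .451, .493, 1.000, .401, .425, .602],
    [.062, .063, .197, .189, .076, .125, -.009, .015, .401, 1.000, .378, .488],
    [.245, .242, .372, .384, .381, .382, .233, .276, .425, .378, 1.000, .489],
    [.280, .240, .278, .344, .385, .347, .197, .203, .602, .488, .489, 1.000],
])

# eigvalsh is meant for symmetric matrices; sort from largest to smallest.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
pct_variance = 100 * eigenvalues / eigenvalues.sum()
n_retained = int((eigenvalues > 1).sum())

for i, (ev, pct) in enumerate(zip(eigenvalues, pct_variance), start=1):
    print(f"Component {i:2d}: eigenvalue = {ev:.3f}, % of variance = {pct:.3f}")
print("Components with eigenvalue > 1:", n_retained)
```

The eigenvalues of a correlation matrix always sum to the number of variables (12 here), which is why each eigenvalue divided by 12 gives the percentage of variance in the table.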

Component Matrix (a)

           Component
           1       2
Q31A5      .901    -.108
Q31A2      .877    -.256
Q31A1      .863    -.227
Q31A8      .855    -.271
Q31A3      .843    -.057
Q31A7      .834    -.308
Q31A6      .813    -.034
Q31A4      .810    .029
Q31A9      .684    .434
Q31A10     .224    .769
Q31A12     .468    .695
Q31A11     .476    .556

Extraction Method: Principal Component Analysis.

a. 2 components extracted.

This table gives the loading of each variable on each unrotated component.
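
The loadings connect directly to two earlier tables: the squared loadings of a variable add up to its extraction communality, and the squared loadings in a column add up to that component's eigenvalue. A short sketch using the loadings above (the variable names are illustrative):

```python
# Unrotated loadings (component 1, component 2) from the Component Matrix.
loadings = {
    "Q31A5": (.901, -.108), "Q31A2": (.877, -.256), "Q31A1": (.863, -.227),
    "Q31A8": (.855, -.271), "Q31A3": (.843, -.057), "Q31A7": (.834, -.308),
    "Q31A6": (.813, -.034), "Q31A4": (.810, .029), "Q31A9": (.684, .434),
    "Q31A10": (.224, .769), "Q31A12": (.468, .695), "Q31A11": (.476, .556),
}

# Communality: sum of squared loadings across the retained components (rows).
communalities = {name: a ** 2 + b ** 2 for name, (a, b) in loadings.items()}

# Eigenvalue of each component: sum of squared loadings down the column.
eigen_1 = sum(a ** 2 for a, _ in loadings.values())
eigen_2 = sum(b ** 2 for _, b in loadings.values())

print(f"Communality of Q31A1 = {communalities['Q31A1']:.3f}")
print(f"Column sums of squares = {eigen_1:.3f}, {eigen_2:.3f}")
```

These reproduce the reported communality of Q31A1 (.796) and the first two eigenvalues (6.743 and 1.875) up to rounding.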


Rotated Component Matrix (a)

           Component
           1       2
Q31A2      .908    .095
Q31A8      .894    .073
Q31A7      .888    .031
Q31A1      .884    .116
Q31A5      .875    .241
Q31A3      .802    .267
Q31A6      .765    .276
Q31A4      .739    .334
Q31A12     .169    .821
Q31A10     -.084   .797
Q31A11     .230    .695
Q31A9      .468    .660

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.

a. Rotation converged in 3 iterations.

The largest loading in each row of the above table indicates the component to
which the variable belongs. We can see from the rotated component matrix that
the first eight variables listed, Q31A2 through Q31A4 in the table order (that
is, Q31A1 to Q31A8), are highly loaded on Factor 1. The last four, Q31A12
through Q31A9 in the table order (that is, Q31A9 to Q31A12), are highly loaded
on Factor 2.
Thus this analysis puts the 12 variables into 2 factors: those which are
highly loaded on Factor 1 and those which are highly loaded on Factor 2.

Component Transformation Matrix

Component    1        2
1            .925     .379
2            -.379    .925

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
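
The rotated loadings are the unrotated loadings multiplied by the Component Transformation Matrix. The sketch below applies that multiplication to the loadings from the Component Matrix and then assigns each variable to the factor on which it loads most highly. NumPy is used here as a stand-in for SPSS, and because the printed figures are rounded the rotated values match the Rotated Component Matrix only to about three decimals.

```python
import numpy as np

# Row order follows the unrotated Component Matrix in the output.
labels = ["Q31A5", "Q31A2", "Q31A1", "Q31A8", "Q31A3", "Q31A7",
          "Q31A6", "Q31A4", "Q31A9", "Q31A10", "Q31A12", "Q31A11"]
unrotated = np.array([
    [.901, -.108], [.877, -.256], [.863, -.227], [.855, -.271],
    [.843, -.057], [.834, -.308], [.813, -.034], [.810, .029],
    [.684, .434], [.224, .769], [.468, .695], [.476, .556],
])

# Component Transformation Matrix (Varimax) from the output.
transform = np.array([[.925, .379],
                      [-.379, .925]])

rotated = unrotated @ transform

# Assign each variable to the factor with the largest absolute rotated loading.
assigned = {label: int(np.argmax(np.abs(row))) + 1
            for label, row in zip(labels, rotated)}

for label, row in zip(labels, rotated):
    print(f"{label:7s} {row[0]: .3f} {row[1]: .3f} -> Factor {assigned[label]}")
```

With these inputs Q31A1 to Q31A8 fall under Factor 1 and Q31A9 to Q31A12 under Factor 2, the same grouping read off the Rotated Component Matrix.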
CLUSTER ANALYSIS
Cluster Analysis can be used to cluster cases or variables into groups.
Hierarchical cluster analysis is used for the following problem.

OBJECTIVE:

To compare Cluster analysis result with Factor Analysis Results

METHODOLOGY:

• Analyze -> Classify -> Hierarchical Cluster
• Variables are clustered
• Statistics and Plots are displayed
• Single Solution of 2 clusters is selected as Cluster Membership
• Agglomeration Schedule is selected
• Dendrogram Plot is chosen
• Between-groups linkage cluster method is used
• Chi-square measure is used as counts measure

Case Processing Summary (a)

                                    Cases
Valid            Rejected: Missing Value   Rejected: Negative Value   Total
N     Percent    N     Percent             N     Percent              N     Percent
126   34.9%      235   65.1%               0     .0%                  361   100.0%

a. Chi-square between Sets of Frequencies used


Agglomeration Schedule

         Cluster Combined                         Stage Cluster First Appears
Stage    Cluster 1   Cluster 2    Coefficients    Cluster 1   Cluster 2          Next Stage
1        1           2            3.219           0           0                  5
2        7           8            3.646           0           0                  5
3        5           6            4.706           0           0                  4
4        3           5            5.409           0           3                  6
5        1           7            5.716           1           2                  6
6        1           3            5.858           5           4                  7
7        1           4            6.645           6           0                  11
8        9           12           6.712           0           0                  9
9        9           10           7.854           8           0                  10
10       9           11           8.125           9           0                  11
11       1           9            8.773           7           10                 0

The above table shows in which step clusters are combined
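
The agglomeration schedule can be replayed by hand or in a few lines of code to see how a 2-cluster solution arises. The sketch below is only an illustration, with a hypothetical function name. It merges the pairs in the order given in the "Cluster Combined" columns and stops when two clusters remain, assuming (as in the schedule) that a merged cluster keeps the lower of the two numbers.

```python
# (Cluster 1, Cluster 2) pairs from the "Cluster Combined" columns, stages 1 to 11.
SCHEDULE = [(1, 2), (7, 8), (5, 6), (3, 5), (1, 7), (1, 3),
            (1, 4), (9, 12), (9, 10), (9, 11), (1, 9)]

def replay_schedule(schedule, n_items, n_clusters):
    """Apply merges in order until only n_clusters clusters remain."""
    clusters = {i: {i} for i in range(1, n_items + 1)}
    for first, second in schedule[: n_items - n_clusters]:
        # The merged cluster keeps the label of the first cluster.
        clusters[first] |= clusters.pop(second)
    return sorted(sorted(members) for members in clusters.values())

two_clusters = replay_schedule(SCHEDULE, n_items=12, n_clusters=2)
for number, members in enumerate(two_clusters, start=1):
    print(f"Cluster {number}:", ", ".join(f"Q31A{m}" for m in members))
```

Stopping one stage before the end leaves Q31A1 to Q31A8 in one cluster and Q31A9 to Q31A12 in the other; the final stage (11) would join them into a single cluster.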

Cluster Membership

Case 2 Clusters

Q31A1 1

Q31A2 1

Q31A3 1

Q31A4 1

Q31A5 1

Q31A6 1

Q31A7 1

Q31A8 1

Q31A9 2

Q31A10 2

Q31A11 2

Q31A12 2

The above table shows that cluster 1 is made up of variables Q31A1 through
Q31A8 and cluster 2 of variables Q31A9 to Q31A12, which is the same grouping
obtained from the factor analysis.
The table shows the same thing as the dendrogram, but it doesn't show the
order in which the clusters are combined.

* * * * * * H I E R A R C H I C A L   C L U S T E R   A N A L Y S I S * * * * * *

Dendrogram using Average Linkage (Between Groups)

Rescaled Distance Cluster Combine

C A S E 0 5 10 15 20 25
Label Num +---------+---------+---------+---------+---------+

Q31A1 1 -+---------------------+
Q31A2 2 -+ |
Q31A7 7 ---+-------------------+ Branch 1
Q31A8 8 ---+ +-------+
Q31A5 5 -------------+-----+ | |
Q31A6 6 -------------+ +---+ +-----------------+
Q31A3 3 -------------------+ | |
Q31A4 4 -------------------------------+ |
Q31A9 9 -------------------------------+---------+ |
Q31A12 12 -------------------------------+ +---+ |
Q31A10 10 -----------------------------------------+ +---+
Q31A11 11 ---------------------------------------------+

Branch2

The dendrogram shows how the variables combine to form clusters. Here it shows
the two clusters. Cluster 1 consists of variables Q31A1 through Q31A8 and the
second cluster consists of variables Q31A9 to Q31A12.
