
HAWASSA UNIVERSITY

SCHOOL OF MATHEMATICAL AND STATISTICAL SCIENCES (STATISTICS PROGRAM)

SPSS TRAINING MANUAL

MAY, 2017
OUTLINE OF THE TRAINING

1. Introduction to SPSS Software

1.1 The SPSS Windows
1.2 The Menus and Their Use
1.3 Entering and Saving Data in SPSS
1.4 Data Editing
1.5 Data Transformation
1.6 Importing and Exporting Data
2. Descriptive Statistics

2.1 Summary Statistics (Frequency, Measures of Central Tendency and Variability)
2.2 Graphical Presentation of Data (Bar Graph (Chart), Pie Chart & Histogram)
3. Statistical Analysis Using SPSS

▪ One-Sample T-test, Independent Samples T-test and Paired


Samples T-test
▪ One-Way ANOVA
▪ Correlation Analysis
▪ Non-parametric test (e.g. Test of Association)
▪ Linear Regression Analysis
▪ Binary Logistic Regression Analysis
1. Introduction to SPSS Software

➢ SPSS stands for Statistical Package for the Social Sciences.

➢ However, it works for any science or art where statistical analysis is required.

Opening SPSS

• Depending on how the computer you are working on is structured, you can open
SPSS in one of two ways.

1) If there is an SPSS shortcut icon on the desktop, simply put the cursor on it
and double click the left mouse button.

2) Click the left mouse button on the Start button on your screen, then put
your cursor on Programs or All Programs and left click the mouse. Select SPSS
16 or 20 for Windows by clicking the left mouse button. Either approach will
launch the program.
Introduction (Cont.)

Exiting SPSS

• To close SPSS, you can either left click on the close button located on the
upper right hand corner of the window or select Exit from the File menu.

• A dialog box like the one below will appear for every open window asking you if you
want to save it before exiting.

• Click No for each dialog box if you have no new files to save, or click Yes if you
want to save the files.
Introduction (Cont.)
➢SPSS software has three windows:

Figure 1: Types of SPSS Windows


Figure 2: Individual SPSS Windows
A. Variable View Window
❖ The Variable View window is where you start working in SPSS.
❖ Before any analysis, you are supposed to register all the information you have
about your variables.
❖ That is, you define every variable in the data and code them in the Variable View
window. Look at image A in figure (2).
❖ As you can see from the first row of image A, you will get the following
column information.
❖ The column description of the Variable View window is given below.

Figure 3: variable view window


Variable View Window (Cont.)
Name: write the full or short (coded) form of your variable name without any spaces.
Type: when you click the Type cell you will see a list of possible options; just select the one
which is suitable for your variable.
❖ Mostly, if you have a quantitative variable with numeric values, or a qualitative variable
coded with numeric values, just tick the Numeric option.
❖ Whereas for a qualitative variable with text values, just tick String. See figure (4).
Width: the total number of characters allowed for the value.
Decimals: does not apply to the String variable type; it sets the number of digits
displayed after the decimal point.

Figure 4: Options to select for the variable type


Variable View Window (Cont.)
Label: a column where you can write an informative description of the variable
named in the Name column. This column is especially important if you used a code
for the variable name.
Values: where you define codes for the variable's values. Just write the
value you will use in the Value box and its meaning in the Value Label box.
E.g., for a sex variable: Value 0, Value Label Male, click Add; then Value 1, Value Label
Female, click Add again; then click OK.
❖ Once you have finished doing this, remember that it is possible to remove or change
anything you wrote. Look at figure (5).
Missing: a column where you can define missing values.
❖ If one observation has no value, do not simply skip it; instead define a missing-value
code and enter that code wherever the value is missing.
❖ Just click the Discrete missing values or Range plus one optional discrete missing
value box and give the value(s) to be treated as missing.
❖ The choice of missing-value code is up to you. Look at figures 5 and 6.
Variable View Window (Cont.)

Figure 5: Variable labeling window

Columns and Align: is a column where you can give value for the width of
the column in the data view and Align tell us about the alignment of the value
in the cell.
Measure: is a place where you define the measurement scale(Nominal,
Ordinal and Scale) of the variable
B. Data View Window

❖ The Data View window is where you will see the variables defined in the
Variable View window listed in the first row of the spreadsheet.
❖ If this is your first time, you are supposed to type in the values of each variable.
❖ However, if you already have the data, you will see them here.
❖ Remember that every column in this window holds one variable's values and each row
holds one observation's (individual's) information for each variable. Look at image B in
figure (2).
C. Output Window
❖ This is the window where you will see all the results of the analyses you run.
❖ You can copy your results from here into Microsoft Word or elsewhere, or you can
even edit the outputs before you copy. Look at image C in figure (2).
SPSS Menus and Icons

The menus and icons are given as follows.


• File includes Open, Save, and Exit; you can open or create new files of multiple types.

• Edit includes the typical Cut, Copy, and Paste commands, and lets you
specify various options for displaying data and output.
• View allows you to select which toolbars to show, select the font size, add or
remove the gridlines that separate each piece of data, and choose whether or not to
display your raw data or the value labels.
• Data allows you to select several options ranging from sorting the data by a
specific variable to selecting certain cases for subsequent analyses.
SPSS Menus and Icons (Cont.)
• Transform includes several options to change current variables. For example, you
can recode a given variable into the same or a different variable, change scores into
rank scores, add a constant to variables, etc.
• Analyze includes all of the commands to carry out statistical analyses.
• Graphs includes the commands to create various types of graphs including box
plots, histograms, line graphs, and bar charts.
• Utilities allows you to list file information: a list of all variables, their
labels, values, locations in the data file, and types.
• Window can be used to select which window you want to view (i.e., Data Editor,
Output Viewer, or Syntax).

• Help has many useful options including a link to the SPSS homepage, a statistics
coach, and a syntax guide. Using Topics, you can use the Index option to type in any
keyword and get a list of options, or you can view the categories and subcategories
available under Contents.
Entering and Saving Data in SPSS
• To enter data, simply begin by typing information into each cell. If you did so,
SPSS would give each column a generic label such as var00001.

• Clearly this is not desirable because you would have no way of identifying what
var00001 meant later. Instead, you have to specify names for your variables.

• To do this, you can simply click on Variable View on the bottom left hand corner of
your window and specify the variable name, type, width, decimal, label and so on.

• After you enter the appropriate data click Save.


Data Editing
Inserting a Variable:
• After specifying all variables for a given data set, say you forgot one variable, e.g. ID,
and you would like it to be the first variable in your data.
• You can add this variable in one of the following ways
• In the Variable View window, highlight the first row and then click Insert Variable,
which places a new variable before the selected variable.
• In Data View Window, highlight the first column and then click the Insert Variable
icon. This will also place a new variable column at the beginning of the file.
• Either highlight the first row in Variable View or highlight the first column in Data
View Window then click on Insert Variable icon from tool bar.
Inserting a Case
• If you want to insert a case between the person with ID 9 and the person with ID 11,
highlight the row for the case with ID 11, and then either:
• click on Insert Cases from the Data menu, or
• click on the Insert Cases icon from the toolbar.
Data Editing (Cont.)

Sort Variables: you can sort variables by their name, type, label, …
• Example: to sort by variable name, Data → Sort Variables → Name.
Sort Cases
• Click Sort Cases under the Data menu
• In the dialog box, select participant ID or Name and move it into the Sort
by box by clicking the arrow.
• Select Ascending or Descending for Sort Order then click Ok.
Data Editing (Cont.)
Merging Files
a) Adding Cases
• Sometimes related data may be in different files and you would like to combine
or merge them into one file.
• In this case, each file contains the same variables but different cases.
• To combine these files, if one of the data files is open, then left click on Merge Files on
the Data menu and select Add Cases.
• Then specify the file from which the new data will come and click Open.
• A dialog box will appear showing you which variables will appear in the new file.
Review it, and if all seems in order, click OK. The two files will be merged.
b) Adding Variables
• In other cases, you might have different data on the same cases or participants in
different files.
• Be sure that variables on the same participant end up in the correct row; that is, you
need to match the cases (same number of observations).
• Click on Merge Files in the Data menu, select Add Variables, indicate the file that
the new variables are coming from, then click OK.
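For readers who also keep their data outside SPSS, the same two merge operations can be sketched in Python with pandas. This is only an illustration; the data values and column names below are made up, not part of the manual's data files.

```python
import pandas as pd

# Adding cases: same variables, different rows
# (SPSS: Data > Merge Files > Add Cases)
file_a = pd.DataFrame({"id": [1, 2], "height": [160, 172]})
file_b = pd.DataFrame({"id": [3, 4], "height": [155, 181]})
combined_cases = pd.concat([file_a, file_b], ignore_index=True)

# Adding variables: same cases, different columns, matched on a key variable
# (SPSS: Data > Merge Files > Add Variables, matching cases on key variables)
weights = pd.DataFrame({"id": [1, 2, 3, 4], "weight": [55, 70, 52, 80]})
combined_vars = combined_cases.merge(weights, on="id")
```

Note that, just as in SPSS, adding variables only lines up correctly when both files contain the same cases identified by the key.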
Data Editing (Cont.)
Split file
• We can split a given file into two or more groups based on the categories/values of
a given variable.
• For example, we can split a given data set into male and female.
Select Cases
• By selecting cases, the researcher can select only certain cases for analysis.
• Click Data, click Select Cases, click Random sample of cases, then select your
preferences.
• You can also use the 'If condition is satisfied' option.
Data Transformation
Compute
• Compute is used to create a new variable by manipulating other variable(s).
• It can use conditional operators (If condition), logical operators, and other
mathematical functions.
• Click ‘Transform’ and then click ‘Compute Variable…’
• Example: create a new variable named 'lnheight' which is the natural log of height.
• Type lnheight in the 'Target Variable' box, then type 'ln(height)' in the
'Numeric Expression' box. Click OK.
• A new variable 'lnheight' is added to the file.
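The same transformation can be sketched in plain Python; the height values below are hypothetical, used only to show what SPSS's ln() expression computes.

```python
import math

# SPSS: Transform > Compute Variable, Target Variable = lnheight,
# Numeric Expression = ln(height)
heights = [160.0, 172.5, 155.0, 181.2]        # hypothetical height values
lnheight = [math.log(h) for h in heights]     # natural log, like SPSS ln()
```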
Data Transformation (Cont.)
Count
• It counts the occurrences of a value within a given variable.
• To do this, click on Transform and then click on Count Values within Cases.
• Move the variable into the Variables box, and write the name and full description of the
target variable in the Target Variable and Target Label boxes respectively.
• Click on Define Values, write the value to be counted in the Value box, and then
add it to the Values to Count box by clicking Add.
• Click on Continue
• Then click OK
Data Transformation (Cont.)
Recode a Variable
• Recoding allows you to create a new variable (or overwrite the same variable) with
different values.
• You can use the command 'Recode into Same Variables' or 'Recode into
Different Variables'.
• For example, to Recode into Different Variables:
• Click Transform, click 'Recode into Different Variables', move the old variable to
the right, give a Name and Label for the Output Variable, click Change, click on
Old and New Values, insert old and new values (click Range to create ranges of
old values), click Add for each value, click Continue and then click OK.
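The steps above can be sketched outside SPSS as a simple mapping from ranges of old values to new codes. The variable (age) and the cut points below are hypothetical, chosen only to illustrate the Range option.

```python
# SPSS: Transform > Recode into Different Variables, with Range old values.
# Hypothetical recode of age (years) into a group code: 1 = under 30, 2 = 30-49, 3 = 50+.
def recode_age(age):
    if age < 30:
        return 1
    elif age < 50:
        return 2
    return 3

ages = [29, 21, 62, 18, 40, 50]
agegroup = [recode_age(a) for a in ages]   # new variable; the old one is kept
```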
Data Transformation (Cont.)
Ranking Cases
• The Transform menu is used to make changes to selected variables in the data file
and to compute new variables based on the values of existing ones.
• Ranking or recoding of data can also be done.
• The steps: simply click on Transform, click on Rank Cases, select the
variable to be ranked, choose Smallest value or Largest value under the 'Assign Rank
1 to' option, then click OK.
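A minimal sketch of what "Assign Rank 1 to Smallest value" produces, in plain Python. The values are hypothetical, and this sketch assumes no ties (SPSS, by default, averages the ranks of tied values).

```python
# SPSS: Transform > Rank Cases, "Assign Rank 1 to: Smallest value"
values = [3670, 1552, 2841, 2005]          # hypothetical, no ties
order = sorted(values)                     # ascending order
ranks = [order.index(v) + 1 for v in values]   # rank 1 = smallest value
```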
Importing Data (Reading Data In From Other Sources)
Opening data from EXCEL
• SPSS can also recognize data from several other sources
• For example, you can open data from Microsoft EXCEL in SPSS
• You should follow all of the variable name guidelines specified by SPSS (e.g.,
character length, no numbers beginning a name, etc.)
• Save your Excel file (with the .xls file extension), then close it.
• Open SPSS and select Open → Data from the File menu.
• A dialog box will appear; under Files of type select Excel, under Look in select
your file, then click Open.
• Select Read variable names from the first row of data, because that is where the
names appear in the Excel file. Then click OK.
Importing Data (Cont.)
Text Data
• A text data file can be created in any word processing program or in Notepad or any
other text editor
• Be sure to save the file with the .txt or .dat file extension.
• If you have collected data from 4 people and typed it in the following format:
012345
021123
031234
042345
• The first two digits are the ID number. The next digit is gender. Digits 4, 5,
and 6 are the responses to the first 3 questions on a survey. No characters
or spaces separate the variables. The data are on your disk in simpletextdata.txt.

• Open SPSS.
• In the SPSS File menu, click Read text data.
• Select simpletextdata under Files of type Text and click Open.
• In the next dialog box, click No for “Does your text file have a predefined format” and
click Next.
• In the next dialog box, select Fixed width under “How are your variables arranged,”
then select No for “Are variable names included in the top of your file.” Then click
Next.
Importing Data (Cont.)
• In the next dialog box, indicate that the data starts on line 1, 1 line represents a case, and
you want to import all cases, then click Next. The following dialog box will appear. We
need to tell SPSS where to insert breaks for variables.

• The next dialog box will show you a draft of what your new data file will look like.
Notice, the variables will have generic names like v1, v2, etc. Then click Next.
• At the next dialog box, you can click Finish and your new data file will appear. You
could then specify variable names, types, and labels as illustrated above.
Importing Data (Cont.)
• Let's take one more example where the text file is tab delimited (a tab was inserted between
each variable) and has variable names at the top.
• Below is an example of the first two lines from this text file:
ID Gender Q1 Q2 Q3
01 2 3 4 5
• On the File menu, click Read text data.
• Select tabtextdata under Files of type Text, then click Open.
• In the next dialog box, you will see a draft that shows a box between each variable to
represent the tabs. Are they in the right place? Select No for predefined format and then click
Next.
• Select Delimited for file arrangement and Yes for variable names at the top of the file, then
click Next.
• In the next dialog box, indicate that the data starts on line 2, Each line represents a case, and
you want to import all cases, then click Next.
• In the next dialog box, check Tab as the type of delimiter and then click Next.
• You will see a draft of your data file. Review it, and then click Next.
• Click Finish at the next dialog box and your new data file will appear.
• The difference between these two examples is that the second included the variable names at
the top of the file.
Exporting Data from SPSS
• Copying is not the only way to take SPSS outputs into Microsoft Word or
another program.
• It is also possible to export SPSS results into a Microsoft package.
• From the File menu click on Export.
• In the Objects to Export box, select All, All Visible, or Selected.

• Under Document Type select Word/RTF (*.doc) from the drop-down menu.

• For the File Name, click on Browse, specify the location, and give a file
name.

• Click OK.
2. DESCRIPTIVE STATISTICS

❖ To run descriptive statistics apply the following (look at figure 7):

Analyze → Descriptive Statistics
❖ Among the descriptive analyses, the most frequently used ones are Frequencies,
Descriptives, and Crosstabs.

A. Frequency

Analyze → Descriptive Statistics → Frequencies


❖ Click Statistics in the Frequencies window to compute measures of central tendency
such as the mean, median, and mode, and measures of dispersion such as the variance,
quartiles, skewness, etc.

Figure 7: Descriptive Statistics window


DESCRIPTIVE STATISTICS (Cont.)

❖ In addition, by clicking Charts you can draw suitable graphs such as a histogram or
pie chart for the selected variable(s). See figure (8) for each window.
❖ Remember, the first job in the above analysis is to select the variables listed in
image A and move them to image B.
❖ That means all the selected analyses will be done for the variables found in image B;
see figure (8).
B. Measure of Central Tendency and Variability
❖ By clicking Analyze → Descriptive Statistics → Descriptives you can select the
variable(s) from image A into image B of figure (9) to calculate the mean, sum, dispersion
measures, shape of the data (skewness and kurtosis), and sorting.
❖ Remember, by clicking the Options box you can choose the ones you want to calculate
rather than the defaults. See figure (9).
Figure 8: Window for frequency
Figure 9: Window for Descriptive
DESCRIPTIVE STATISTICS (Cont.)

C. Cross Tabulation

❖ Cross tabulation is used when you want to construct a two-way table.
❖ As before, all available variables are listed in image A; from these you move the
row variable into image B and the column variable into image C.
❖ If you want an additional categorizing variable, move it into image D. In image D
(the Layer option) you can add more variables. See figure (10).

Figure 10: Window for Cross tabulation


DESCRIPTIVE STATISTICS (Cont.)

D. Graphical Presentation of Data

❖ Based on the variable type you can construct a suitable graph such as a bar graph (chart),
pie chart, or histogram using SPSS. Command: Graphs → Legacy Dialogs.
i. Bar graph (chart): in SPSS, you have three bar graph options:

• Simple bar graph: for a single variable only.

Figure 11: Sample graphical presentations


DESCRIPTIVE STATISTICS (Cont.)

❖ Clustered bar graph: displays more than one variable.

❖ Stacked bar graph: also for more than one variable, but different in display from the
clustered bar graph.
❖ After you select the type of graph you want to draw, you have to fix the type
of summary you are looking for.
❖ Summaries for groups of cases: creates a chart summarizing the categories of a
single variable.
❖ This variable may be numeric or string. The bar height represents a function of
either the category variable itself or another summary variable.
❖ Summaries of separate variables: creates a chart summarizing two or more
numeric variables. There is one bar for each variable.
❖ For either a simple or a clustered bar chart, after you select Summaries for groups
of cases in the Data in Chart Are option, the variables available for the analysis appear,
as usual, on the left side of the new window. Look at figure (12) and
figure (13). Then:
Figure 12: Simple Bar graph: Summaries for Group of Cases
Figure 13: Clustered Bar graph: Summaries for group of cases
DESCRIPTIVE STATISTICS (Cont.)

❖ Image C: select the main categorizing variable and put it in the Category Axis box.
❖ Image M: select what the bars represent in the chart. The options include number of
cases, percent of cases, and others.
❖ If you select the Other option, you are supposed to select one numeric variable; the
bars will then represent a summary (click Change Statistic to select mean, median,
variance, or another statistic) of that numeric variable.
❖ Image T: click this to write the title and caption of the graph.
❖ Image P: this Panel by option gives you options to draw more than one graph in a
single display.
❖ Enter a new (categorical) variable in the Rows option to display the main plot
categorized row-wise, and enter a new (categorical) variable in the Columns option to
display it categorized column-wise.
DESCRIPTIVE STATISTICS (Cont.)

❖ Image I: this is used for the clustered graph. Select the clustering
variable and put it here. Look at figures 14 and 15.
❖ Image N: select two or more numeric variables into image N; then, by clicking
Change Statistic below image N, you can select the summary to be
calculated, such as mean, median, variance, skewness, etc.
❖ Image V: same purpose as image N.
❖ Image C: select the clustering (categorizing) variable here.
Figure 14: Simple Bar graph: Summaries of Separate variables
Figure 15: Clustered Bar graph: Summaries of Separate variables
DESCRIPTIVE STATISTICS (Cont.)

ii. Pie-chart
❖ The procedure is the same as the steps needed to construct the different simple bar
graphs.
Graphs → Legacy Dialogs → Pie.
❖ Then decide which chart you want to construct (summaries for groups of cases,
summaries of separate variables, or values of individual cases).
❖ The rest of the steps are the same.
iii. Histogram: (Look figure16)
❖ Image P: this Panel by option gives you the chance to draw more than one
graph in a single display.
❖ Enter a new (categorical) variable in the Rows option to display the main plot
categorized row-wise, and enter a new (categorical) variable in the Columns option to
display it categorized column-wise.
❖ Image V: insert here the variable for which the histogram is to be constructed.
❖ Image T: click here to write the title and caption for the graph.
Figure 16: SPSS Window for Histogram
3. INFERENTIAL STATISTICS

3.1. Estimation and Hypothesis Test of population mean/s


❖ Just click Analyze → Compare Means; you will then get the
different options to estimate and/or test the population parameter for
one, two, or more than two population means.
3.1.1 One-Sample T-test
❖ The one-sample t-test is used to determine whether a sample comes
from a population with a specific mean (μ).
❖ The null (H0) and alternative hypotheses (H1) for the one-sample t-test
are:
H0: the sample is from a normal population with mean μ.
H1: the sample is not from a normal population with mean μ.
• Command: Analyze → Compare Means → One-Sample T Test.
Refer to figure 17.
Figure 17: One sample T-test
One-Sample T-test (Cont. )
❖ After everything goes OK in the window displayed in figure 17, you will get an output
table that looks like Table 1.
Example: Using the sample data file world95 (with missing values) supplied with the
software, let us test whether the population average life expectancy of males is different
from 60 years. The null and alternative hypotheses are:
H0 : μM = 60 vs H1 : μM ≠ 60

Table 1 : One sample T-test Output table


One-Sample T-test (Cont. )
❖ See Table 2 for the result. Since the p-value (Sig. (2-tailed)) = 0.00, which is less than
0.05, we reject the null hypothesis at the 5 percent level of significance.
❖ Thus, based on the result, we are 95 percent confident that the average male life
expectancy is different from 60 years.

Table 2: SPSS result for the One sample T-test example
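The same test can be verified outside SPSS. Below is a minimal Python sketch using scipy.stats; the sample values are made up for illustration, since the world95 data are not reproduced in this manual.

```python
from scipy import stats

# Hypothetical male life-expectancy values (illustrative only)
male_le = [61, 68, 59, 73, 55, 66, 70, 52, 64, 58]

# H0: mu = 60 vs H1: mu != 60, as in the One-Sample T Test dialog
t_stat, p_value = stats.ttest_1samp(male_le, popmean=60)
reject_h0 = p_value < 0.05   # decision at the 5 percent level
```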


3.1.2. Independent Samples T-test
❖ This type of t-test is used to compare groups of observations that are not related in any
way; that is, the groups are independent of one another.
❖ The null (H0) and alternative hypotheses (H1) for the independent-samples t-test are:
H0: there is no statistically significant difference between the population means of group 1 and group 2.
H1: there is a statistically significant difference between the population means of group 1 and group 2.

❖ After all steps, click OK; you will see an output that looks like Table 3.

Figure 18: Independent Samples T-test


Independent Samples T-test (Cont.)
❖ Remember that the independent-samples t-test requires a test of the equality of the
population variances before we proceed to the mean test.
❖ Thus, in the output table, in the second column and first row, we have the information
for Levene's Test for Equality of Variances.
❖ First accept or reject the equality of the population variances of the two groups, and
then, based on the variance result, proceed to the mean test.
❖ That is, choose the Equal variances assumed or Equal variances not assumed row
based on the variance test.
Table 3: Independent Samples T-test output
Independent Samples T-test (Cont.)

Example: Using the world95 data supplied with the software, let us test whether the
population average life expectancy of males is different from that of females. Assume
the two variables are independent.

The null and alternate hypothesis for the example will be


H0 : μM − μF = 0 vs H1 : μM − μF ≠0 or H0 : μM = μF vs H1 : μM ≠ μF
❖ First, we need the test result for
H0 : σ²M = σ²F vs H1 : σ²M ≠ σ²F

❖ Based on the Levene's test output in Table 4, the p-value (0.135) is greater than
the 0.05 (5 percent) level of significance.
❖ Thus, at the 5 percent level we do not reject the null hypothesis, which means the two
variables have equal population variances.
❖ Next, based on the variance test result, we stick with the first-row information for the
mean test, which assumes equality of the population variances.
❖ The p-value for the mean test is 0.00, which is less than 5 percent; this results in
rejection of the null hypothesis. Therefore, the average life expectancies of males and
females are not equal.
Table 4: SPSS output for the Independent Samples T-test example
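The two-step logic (variance test first, then the matching mean test) can be sketched in Python with scipy.stats. The two samples below are hypothetical; the female values are simply the male values shifted by 6 years, so the equal-variance assumption holds by construction.

```python
from scipy import stats

# Hypothetical male and female life-expectancy samples (illustrative only)
male = [61, 68, 59, 73, 55, 66, 70, 52, 64, 58]
female = [m + 6 for m in male]     # same spread, higher mean

# Step 1: Levene's test for equality of variances
lev_stat, lev_p = stats.levene(male, female)
equal_var = lev_p > 0.05           # do not reject H0 of equal variances

# Step 2: independent-samples t-test, using the row chosen in step 1
t_stat, p_value = stats.ttest_ind(male, female, equal_var=bool(equal_var))
```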
3.1.3. Paired Samples T-test
❖ The paired-samples t-test is used to compare the population means of groups that are related
in some way.
❖ The null (H0) and alternative hypotheses (H1) for this test are:
H0: there is no statistically significant difference between the population means of the two groups.
H1: there is a statistically significant difference between the population means of the two groups.
❖ After you select Paired-Samples T Test in the Compare Means option, you will get a
window which looks like figure 19.

Figure 19: Paired Samples T-test


Paired Samples T-test (Cont.)
❖Image A: shows the number of pairings to be tested, often one pair.
❖ Image B: the place where we insert the first pairing variable
❖ Image C: the place where we insert the second pairing variable
❖ After all, we click OK in the window to get an output table that looks like Table 5.
Table 5: Paired Samples T-test Output
Paired Samples T-test (Cont.)
❖ Example: Using the world95 data supplied with the software, let us test whether the
population average life expectancy of males is different from that of females. Assume the
two example variables are paired. (Read more about paired data.)
i.e. H0 : μM − μF = 0 vs H1 : μM − μF ≠ 0
❖ See Table 6 for the result. Based on the software result, the p-value is 0.00, which leads
to rejection of the null hypothesis.
❖ That is, the average life expectancies of males and females are not equal.
Table 6: SPSS output for the Paired Samples T-test example
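A paired test differs from the independent one in that scipy (like SPSS) works on the per-case differences. The paired values below are hypothetical, constructed so that the female value exceeds the male value in every pair.

```python
from scipy import stats

# Hypothetical paired measurements, e.g. male and female life expectancy per country
male = [61, 68, 59, 73, 55, 66, 70, 52, 64, 58]
female = [65, 73, 62, 80, 59, 70, 76, 57, 68, 63]

# Tests H0: mean(male - female) = 0, pairing the values row by row
t_stat, p_value = stats.ttest_rel(male, female)
```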
3.1.4. One-Way ANOVA
❖ ANOVA stands for Analysis of Variance.
❖ When you want to test the population means of two or more groups, you can apply the
One-Way ANOVA. (Reading about the assumptions is needed.)
❖ The null (H0) and alternative hypotheses (H1) for the One-Way ANOVA test are:
H0: there is no statistically significant difference between the population means of the K
groups, where K = 2, 3, 4, 5, ...
H1: at least one group's population mean is statistically different from the rest.

Figure 20: One-Way ANOVA


One-Way ANOVA (Cont. )
❖ In figure 20, select the variable from image A into image B, and then insert the
factor variable into the Factor option (image D).
Example: Using the world95 data supplied with the software, let us test whether the
population average life expectancy of males differs by region or economic group (OECD,
East Europe, Pacific/Asia, Africa, Middle East, and Latin America).
❖ The null and alternative hypotheses for the example above are:
H0 : μOE = μEE = μPA = μAf = μME = μLA vs
H1 : at least one pair of means is unequal
❖ Based on the result in Table 7, the p-value (Sig.) is 0.00, which leads to rejection of
the null hypothesis since it is less than the 0.05 level of significance.
❖ Thus, the average male life expectancy differs in at least one region from
the others.
❖ You can apply pairwise comparisons (post hoc tests) to find out which one. (Further reading is needed.)
Table 7: SPSS output for the One-Way ANOVA analysis for the example
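The F-test behind the One-Way ANOVA table can be reproduced with scipy.stats. The three groups below are hypothetical (the world95 example has six regions); they are deliberately well separated so the test rejects H0.

```python
from scipy import stats

# Hypothetical male life expectancy for three regions (illustrative only)
africa = [52, 55, 50, 58, 54]
asia = [64, 66, 62, 68, 65]
europe = [72, 74, 70, 75, 73]

# One-way ANOVA: H0 says all group means are equal
f_stat, p_value = stats.f_oneway(africa, asia, europe)
reject_h0 = p_value < 0.05   # at least one group mean differs
```

As in SPSS, a significant F only says *some* pair differs; a post hoc pairwise comparison is still needed to find which one.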
3.2. Simple Linear Correlation Analysis
❖ The correlation coefficient is a measure of the degree of relationship between two
continuous variables.
❖ The population correlation coefficient is represented by ρ and its estimator by r.
❖ The correlation coefficient r is also called Pearson's correlation coefficient since it
was developed by Karl Pearson.
❖ r is given as the ratio of the covariance of the variables x and y to the product of
the standard deviations of x and y.
❖ Symbolically it can be written as follows:

r = Cov(x, y) / [sd(x) · sd(y)] = Σ(x − x̄)(y − ȳ) / sqrt[ Σ(x − x̄)² · Σ(y − ȳ)² ]

In computational (shortcut) form:

r = [ Σxy − (Σx)(Σy)/n ] / sqrt{ [ Σx² − (Σx)²/n ] · [ Σy² − (Σy)²/n ] }
❖The numerator is termed as the sum of products of x and y, SPxy.


❖ In the denominator, the first term is called the sum of squares of x, SSx, and
❖the second term is called the sum of squares of y, SSy. Thus,

r = SPxy / sqrt(SSx · SSy)
Simple Linear Correlation Analysis(Cont.)
❖ The correlation coefficient is always between –1 and 1, that is -1≤r ≤ 1
❖ r = -1 implies perfect negative linear relationship between the variables under consideration
❖ r = 1 implies perfect positive linear relationship between the variables under consideration
❖ r = 0 implies there is no linear relationship between the two variables
❖ but there could be a non-linear relationship between them.
❖ In other words, when two variables are independent, r = 0; but when r = 0, it is not
necessarily true that the variables are independent.
❖ To analyze, use the command

Analyze → Correlate → Bivariate; this will produce the dialog box seen in figure 21.

❖ In the first window, move the variables from the left-hand list into the Variables
box, then tick the Pearson box.

❖ After all, click OK.

❖ You will then get output that looks like Table 9.

❖ Look at figure 21 for more information.


Simple Linear Correlation Analysis(Cont.)

Figure 21: SPSS steps for Correlation Analysis


Simple Linear Correlation Analysis(Cont.)
Table 8: Vendors' Income with Their Age and Hours Worked

Annual earnings age Hours worked per day


2841 29 12
1876 21 8
2934 62 10
1552 18 10
3065 40 11
3670 50 11
2005 65 5
3215 44 8
1930 17 8
2010 70 6
3111 20 9
2882 29 9
1683 15 5
1817 14 7
4066 33 12
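The Pearson correlation that SPSS reports can be reproduced by hand from the computational formula r = SPxy / sqrt(SSx · SSy). The sketch below (plain Python, purely illustrative and not part of SPSS) applies that formula to the earnings and hours-worked columns of Table 8:

```python
import math

# Annual earnings (y) and hours worked per day (x) for the 15 vendors in Table 8.
earnings = [2841, 1876, 2934, 1552, 3065, 3670, 2005, 3215,
            1930, 2010, 3111, 2882, 1683, 1817, 4066]
hours = [12, 8, 10, 10, 11, 11, 5, 8, 8, 6, 9, 9, 5, 7, 12]

def pearson_r(x, y):
    """Pearson's r via the computational formula r = SPxy / sqrt(SSx * SSy)."""
    n = len(x)
    sp_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # SPxy
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n                  # SSx
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n                  # SSy
    return sp_xy / math.sqrt(ss_x * ss_y)

r = pearson_r(hours, earnings)
print(round(r, 3))  # close to the .691 SPSS reports for this pair in Table 9
```

The value agrees with the hours-worked / annual-earnings cell of the SPSS output in Table 9.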
Simple Linear Correlation Analysis(Cont.)

Table 9: Bi-variate Correlation Analysis Output

                                      annualearnings    age    hoursworkedperday
annualearnings     Pearson Correlation      1           .264        .691**
                   Sig. (2-tailed)                      .342        .004
age                Pearson Correlation    .264            1        -.086
                   Sig. (2-tailed)        .342                      .760
hoursworkedperday  Pearson Correlation    .691**       -.086          1
                   Sig. (2-tailed)        .004          .760

**. Correlation is significant at the 0.01 level (2-tailed).


3.3 Test of Association

❖ Test of Independence is used to test the association or dependency between two
categorical variables.
The null (H0) and alternate (H1) hypotheses for this test are
Ho: The two categorical variables are independent.
H1: The two categorical variables are dependent.
The test statistic needed for the test of association is the chi-square (χ²) statistic:

χ² = Σ (i = 1 to r) Σ (j = 1 to c) [ (Oij − eij)² / eij ],  for all i, j

where Oij and eij are the (i, j)-th observed and expected cell frequencies, respectively, in the
crosstab formed by the two categorical variables.
❖ Note that i th

Row Total  j Column Total 

th


e 
ij
Grand Total 
Test of Association (Cont.)
❖ Finally, we will accept or reject our null hypothesis by comparing χ² (chi-square
computed) and χ²(α, df) (chi-square tabulated),
where df = (no. of rows − 1) * (no. of columns − 1) and α is the level of significance, or
we can compare the p-value with the level of significance (α).
❖ To run the analysis, use the command
Analyze → Descriptive Statistics → Crosstabs,
❖ You will get the first window; enter one of the variables in the Row(s) option
and the other in the Column(s) option.
❖ After clicking the Statistics box, a new box will appear.
❖ Tick the Chi-square box, click Continue and then OK.
❖ The output will look like Table 10.
❖ See Figure 22 for more information.
Test of Association (Cont. )

Figure 22: SPSS steps for testing independence


Table 10: Test of independence output window

Table 11: Saving Status by Educational level and Gender of respondents


Saving Gender Educ. Level
0 1 2
1 0 0
0 0 1
0 0 2
1 1 1
0 0 2
0 0 1
1 1 0
0 0 0
1 1 2
0 0 2
1 1 0
0 0 1
0 0 2
0 1 2
1 0 1
1 0 0
0 0 2
0 1 1
0 1 2
Test of Association (Cont.)
Example: using the sample data in Table 11 above, let us test the existence of
dependency between the saving character and educational level of the respondents.
❖ The null and alternate hypotheses for the example will be
• H0: Saving and Educational Level are independent VS
• H1: Saving and Educational Level are not independent
❖ Based on the results in Table 12, the Pearson chi-square p-value (0.035) is less
than 0.05, the level of significance.
❖ Our decision will be to reject the null hypothesis at the 5 percent level of significance.
❖ Thus we can conclude that there exists a dependence between saving and
educational level based on our sample data.
Table 12: SPSS output for the Test of independence for our example
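The chi-square result behind this example can be verified by hand. The sketch below (illustrative Python, using the Saving and Educ. Level columns of Table 11) builds the 2 × 3 crosstab, applies the χ² formula above, and recovers a p-value of about 0.035:

```python
import math

# Saving status (0 = Yes, 1 = No) and educational level (0/1/2)
# for the 20 respondents of Table 11, row by row.
saving = [0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0]
educ   = [2, 0, 1, 2, 1, 2, 1, 0, 0, 2, 2, 0, 1, 2, 2, 1, 0, 2, 1, 2]

# Build the 2 x 3 crosstab of observed frequencies O_ij.
rows, cols = sorted(set(saving)), sorted(set(educ))
obs = [[sum(1 for s, e in zip(saving, educ) if s == r and e == c)
        for c in cols] for r in rows]

n = len(saving)
row_tot = [sum(r) for r in obs]
col_tot = [sum(obs[i][j] for i in range(len(rows))) for j in range(len(cols))]

# Chi-square statistic: sum over cells of (O_ij - e_ij)^2 / e_ij,
# with expected frequency e_ij = (row total * column total) / grand total.
chi2 = sum((obs[i][j] - row_tot[i] * col_tot[j] / n) ** 2
           / (row_tot[i] * col_tot[j] / n)
           for i in range(len(rows)) for j in range(len(cols)))

df = (len(rows) - 1) * (len(cols) - 1)

# For df = 2 the chi-square survival function simplifies to exp(-x/2),
# so the p-value needs no external library here.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 2), df, round(p_value, 3))  # statistic near 6.72, p near 0.035
```

The p-value matches the 0.035 reported in the SPSS output, so the null hypothesis of independence is rejected at the 5 percent level.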
3.4. Regression Analysis

3.4.1 Linear Regression Analysis

The multiple linear regression model is

Y = β0 + β1x1 + β2x2 + … + βkxk + ε

Model assumption: the error term ε is identically and independently normally
distributed with mean 0 and variance σ².

Where k-be the total number of independent variables (covariates) in the model,
Y -be the response (dependent) variable and
x1, ...xk be the independent variables.
❖ If k = 1 the model is called Simple linear regression model and if k > 1 the model is
called Multiple Linear Regression Model.
❖ To have a linear regression output, go to Analyze → Regression → Linear;
❖ then you will get the window shown in Figure 23, where you will enter your
variables and get the possible outputs.
Figure 23: Regression Analysis Window
Regression Analysis (Cont. )

❖ Example: let us use the world95 data that comes with the package. From this data let

Y -Population increase (percent per year),

x1- Average female life expectancy,

x2 -Average male life expectancy and

x3 - Infant mortality (deaths per 1000 live births).

❖ Then, if you follow the usual procedure with the method Enter, you will get the
upcoming outputs in Tables 13, 14 and 15.
❖ Note that the hypothesis for the ANOVA table (Table 14) for our example is

H0: β1 = β2 = β3 = 0 VS H1: at least one βi is different from zero, where i = 1, 2, 3.

❖ Based on our Enter method of variable selection, the fitted model is as given by the
coefficient estimates in Table 15.

❖ Remember to check all the model assumptions before we interpret and use the estimates.

Regression Analysis (Cont. )
Model interpretation:
❖ 1.711: the intercept; when all three covariates are held at zero, the model predicts a
population increase of 1.711 percent per year, capturing what the included covariates
leave unexplained.
❖ -0.226: keeping all the other covariates constant, as the average female life expectancy
increases by one year, the population increase will decrease by 0.226 percentage points
per year.
❖ 0.236: keeping all the other covariates constant, as the average male life expectancy
increases by one year, the population increase will increase by 0.236 percentage points
per year.
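The world95 file ships with SPSS and its values are not listed in this manual, so as an illustrative stand-in the sketch below fits a simple (one-covariate) least-squares regression of annual earnings on hours worked, using the vendor data of Table 8 and the same SPxy and SSx quantities defined in the correlation section:

```python
# Illustrative stand-in for the world95 example: simple linear regression of
# annual earnings (y) on hours worked per day (x) from Table 8.
earnings = [2841, 1876, 2934, 1552, 3065, 3670, 2005, 3215,
            1930, 2010, 3111, 2882, 1683, 1817, 4066]
hours = [12, 8, 10, 10, 11, 11, 5, 8, 8, 6, 9, 9, 5, 7, 12]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(earnings) / n

# Least-squares estimates: slope b1 = SPxy / SSx, intercept b0 = ybar - b1 * xbar.
sp_xy = sum(x * y for x, y in zip(hours, earnings)) - sum(hours) * sum(earnings) / n
ss_x = sum(x * x for x in hours) - sum(hours) ** 2 / n
b1 = sp_xy / ss_x
b0 = mean_y - b1 * mean_x

print(f"fitted model: earnings = {b0:.1f} + {b1:.1f} * hours")
```

These are the estimates SPSS's Analyze → Regression → Linear should report for this pair of variables; the fitted line passes through the point of means (x̄, ȳ) by construction.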

Table 13: Variables Entered/ Removed and Model Summary Tables


Table 14: ANOVA table
Table 15: Coefficients of the fitted model
3.5 Binary Logistic Regression

❖ In simple or multiple linear regression analysis, the dependent variable must be a
continuous variable.
❖ However, there are cases where the dependent variable is categorical.
❖ If the dependent variable is categorical, whatever the independent variables are,
we can fit a Logistic Regression model.
❖ There are different types of categorical variables too, like binary (two possible
values) or multilevel (more than two possible values).
❖ The multilevel categorical variables can in turn be categorized as ordinal or nominal.
❖ You can read further about multilevel logistic regression, but here we intend to
discuss only the binary case.
❖ If the dependent variable has only two categories we use Binary Logistic
Regression.
❖ Let P = Prob(Y = 1 | X = x), where Y is the binary response variable with values 0 and
1, and x is the independent variable(s) (covariates).
Binary Logistic Regression (Cont.)
Then the Binary Logistic regression model becomes:

logit(P) = ln[P / (1 − P)] = β0 + β1x1 + … + βkxk

❖ The corresponding model expression for the probability value is

P = exp(β0 + β1x1 + … + βkxk) / [1 + exp(β0 + β1x1 + … + βkxk)]

❖ The Omnibus Tests of Model Coefficients will be used to test the significance of the
covariates included in the model.
❖ Model comparison will be done using the likelihood ratio test and the chi-square test
statistic.
❖ In addition, the Cox and Snell R² and Nagelkerke R² will be interpreted to see the
quality of the final model. (Needs further reading.)

❖ Command: Analyze → Regression → Binary Logistic.... See Figure 24 for the detailed
steps.
Binary Logistic Regression (Cont.)

Example: Let us consider Table 11 and model the impact of Gender and Educational
level on the saving status of 20 households, where
Dependent variable:
Saving: 0 = Yes, and 1 = No
Independent variables:
Gender: 0 = Male, and 1 = Female
Educational Level: 0 = Elementary or less, 1 = High school and 2 = Above high school
Based on the binary logistic analysis output, the final model is:

where,

Gender(1) = Male, Educ(1) = Elementary or less and Educ(2) = High school


Table 16: Important outputs for Binary Logistic Regression, first part
Figure 24: Basic steps for Binary Logistic Regression
Binary Logistic Regression (Cont.)
Model Coefficients and Odds Ratio Interpretation
❖ The odds ratio Exp(β) = 68.991 = e^4.234 shows respondents with elementary or less
educational level are 68.991 times more likely to stay without saving in
comparison to respondents with above high school educational level.
❖ The 95 percent confidence interval (1.726, 2.757) for the odds ratio of Elementary
or less educational level shows the smallest and largest possible multiplier for the
impact of Elementary or less educational level in comparison to above high school
educ. level of respondents at 5 percent level of significance.
❖ Note again that this 95 percent confidence interval does not contain one, which
shows this educational level is significant in explaining the saving status.
❖ This result is equivalent with the corresponding p-value of 0.024 which is less than
5 percent level of significance.
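The arithmetic linking coefficients to odds ratios can be checked directly. The sketch below (illustrative Python, using the two coefficients reported in the output tables) exponentiates β to recover Exp(β), and also implements the logistic probability expression from the model section:

```python
import math

# Coefficients as reported in the binary logistic output for this example.
beta_educ1 = 4.234     # Educ(1): elementary or less vs. above high school
beta_gender1 = -2.048  # Gender(1): Male vs. Female

# An odds ratio is simply Exp(beta).
or_educ1 = math.exp(beta_educ1)      # about 68.99, the 68.991 in the text
or_gender1 = math.exp(beta_gender1)  # about 0.129

def prob(z):
    """Logistic probability P = e^z / (1 + e^z) for a linear predictor z."""
    return math.exp(z) / (1 + math.exp(z))

# Sanity check: a linear predictor of 0 corresponds to a probability of 0.5.
print(round(or_educ1, 3), round(or_gender1, 3), prob(0.0))
```

This confirms that the reported odds ratios are just the exponentiated coefficients, so an odds ratio above 1 (a positive β) raises the odds of the outcome and one below 1 (a negative β) lowers them.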
Table 17: Important outputs for Binary Logistic Regression
Binary Logistic Regression (Cont.)
❖ The odds ratio Exp(β) = 0.129 = e^−2.048 shows that the odds of not saving for Male
respondents are 0.129 times those of Female respondents.
❖ The 95 percent confidence interval (0.007, 2.217) for the odds ratio of the gender
variable shows that, at the 5 percent level of significance, Males are between 0.007 and
2.217 times as likely to stay without saving in comparison to Female respondents.
❖ Note also that the 95 percent confidence interval for gender contains one, which
shows gender is insignificant in explaining saving status.
❖ The p-value for gender is 0.158, which is greater than the 5 percent level of significance.
❖ Interpret the second coefficient in a similar way to the above.
Thank You
