
Lecture3 Descriptive Statistics

Uploaded by Ricx Rosco
Copyright © All Rights Reserved

Descriptive Statistics

Describing your Data
Generally, one of the first things to do with new data is to get to know it by asking some general questions, including but not limited to the following:
• What variables are included? What information are we getting?
• What is the format of the variables: string, numeric, etc.?
• What type of variables are they: categorical, continuous, or discrete?
• Is this sample or population data?
Sex      Age   Height (in)   Weight   Hours Played Mobile Legend
Male      21        66          55        7
Female    21        65          43        6
Male      29        61          53        5
Male      25        64          43        3
Female    23        51          46        4
Female    21        52          52        7
Female    26        62          57        6
Male      21        54          59        3
Female    23        50          53        1
Female    21        52          63        4
Female    26        58          57        2
After looking at the data, you may want to know:
• How many males/females?
• What is the average age?
• What is the average height?
• Who plays Mobile Legend more frequently: men or women?
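The same questions can be answered outside Excel. Below is a minimal Python sketch, not part of the original lecture, that stores the eleven rows of the table as tuples. The names `rows`, `n_male`, `hours_male` and so on are illustrative assumptions.

```python
from statistics import mean

# The sample table: (sex, age, height in inches, weight, hours played Mobile Legend)
rows = [
    ("Male", 21, 66, 55, 7),
    ("Female", 21, 65, 43, 6),
    ("Male", 29, 61, 53, 5),
    ("Male", 25, 64, 43, 3),
    ("Female", 23, 51, 46, 4),
    ("Female", 21, 52, 52, 7),
    ("Female", 26, 62, 57, 6),
    ("Male", 21, 54, 59, 3),
    ("Female", 23, 50, 53, 1),
    ("Female", 21, 52, 63, 4),
    ("Female", 26, 58, 57, 2),
]

# How many males/females?
n_male = sum(1 for r in rows if r[0] == "Male")
n_female = sum(1 for r in rows if r[0] == "Female")

# Average age and average height over all eleven students
avg_age = mean(r[1] for r in rows)
avg_height = mean(r[2] for r in rows)

# Who plays more frequently? Compare mean hours within each group
hours_male = mean(r[4] for r in rows if r[0] == "Male")
hours_female = mean(r[4] for r in rows if r[0] == "Female")

print(n_male, n_female)                          # 4 7
print(round(avg_age, 2), round(avg_height, 2))   # 23.36 57.73
print(hours_male, round(hours_female, 2))        # 4.5 4.29
```

In this sample the four men average 4.5 hours and the seven women about 4.29 hours. With only eleven students, this describes the sample rather than a wider population.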
You can start answering some of these questions by looking directly at the table. For other questions, you have to do some calculations by obtaining a set of descriptive statistics. These statistics are a collection of measurements of two things: location and variability.
• Location tells you the central value of your variables (the mean is the most common measure of this).
• Variability refers to the spread of the data around the center value (e.g., variance, standard deviation).
• Statistics is basically the study of what causes such variability.

Location      Variability
Mean          Variance
Mode          Standard deviation
Median        Range
Use the Descriptive Statistics data file. You should see the screen below.

Click DATA > Data Analysis > Descriptive Statistics.

For Input Range, select all the numeric data (the Age, Height, Weight and Hours Played columns) together with their labels. The Descriptive Statistics tool does not accept text columns such as Sex.
Since the labels are included in the first row, make sure to check the “Labels in first row” option.

For Output options (the place where Excel will enter the results), select cell O1, or choose a new worksheet or even a new workbook.

Check “Summary statistics”.

Press OK.
You will get the following:
• While all the descriptive statistics cells are still selected, go to Format Cells to change all numbers to show one decimal place. When you get the ‘Format Cells’ window, select the Number category and set Decimal places to 1.
• Click OK.
• All numbers should now have one decimal place, as follows:
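To cross-check the result outside Excel, the sketch below reproduces part of the “Summary statistics” table in Python and prints each value with one decimal place, as the Format Cells step does. It assumes Python 3.8 or later, where `statistics.mode` returns the first of several equally common values. The names `columns`, `summary` and `table` are illustrative, and Excel's Standard Error, Kurtosis and Skewness rows are left out.

```python
import statistics

# Numeric columns of the sample table (worksheet rows 2-12)
columns = {
    "Age":    [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26],
    "Height": [66, 65, 61, 64, 51, 52, 62, 54, 50, 52, 58],
    "Weight": [55, 43, 53, 43, 46, 52, 57, 59, 53, 63, 57],
    "Hours":  [7, 6, 5, 3, 4, 7, 6, 3, 1, 4, 2],
}

def summary(values):
    """A subset of Excel's 'Summary statistics' output for one column."""
    return {
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": statistics.stdev(values),  # sample (n - 1)
        "Sample Variance": statistics.variance(values),  # sample (n - 1)
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": len(values),
    }

table = {name: summary(vals) for name, vals in columns.items()}

# Print with one decimal place, like Format Cells > Number > 1 decimal place
for name, stats in table.items():
    print(name)
    for label, value in stats.items():
        print(f"  {label:<20}{value:.1f}")
```

Note that with one decimal place the mean of Hours is displayed as 4.4 (48 / 11 = 4.36...).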
Now we know something about our data.

• The average student in this sample is 23.36 years old, is 57.73 inches tall and plays Mobile Legend 4.36 hours a day.

We know this by looking at the ‘mean’ value of each variable.


Mean
• The sum of the observations divided by the total number of observations.
• The most common indicator of the central tendency of a variable.

• If you look at the last two rows, “Sum” and “Count”, you can compute the mean by dividing “Sum” by “Count” (sum/count).
• You can also calculate the mean using the function below (IMPORTANT: all functions start with the equal “=” sign):

=AVERAGE(range of cells with the values of interest)
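A minimal Python sketch of the same idea, mean = sum / count, mirrors the “Sum” and “Count” rows of the Excel output. The helper name `mean_of` is hypothetical.

```python
def mean_of(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)

ages = [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26]  # Age column (B2:B12)
print(round(mean_of(ages), 2))  # 23.36, the same value as =AVERAGE(B2:B12)
```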
Sum
• refers to the sum of all the values in a range of values

The Excel function for sum is:
=SUM(range of cells with the values of interest)

For “age”
• means the sum of the ages of all students: =SUM(B2:B12). Dividing it by the count gives the mean, =AVERAGE(B2:B12).
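The “Sum” row of the output applies =SUM(...) to each column. A short Python sketch of the same operation, with the dictionary `columns` as an illustrative container for the table's numeric columns:

```python
# Numeric columns of the sample table (worksheet rows 2-12)
columns = {
    "Age":    [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26],
    "Height": [66, 65, 61, 64, 51, 52, 62, 54, 50, 52, 58],
    "Weight": [55, 43, 53, 43, 46, 52, 57, 59, 53, 63, 57],
    "Hours":  [7, 6, 5, 3, 4, 7, 6, 3, 1, 4, 2],
}

# Equivalent of the "Sum" row: add up every value in each column
sums = {name: sum(values) for name, values in columns.items()}
print(sums)  # {'Age': 257, 'Height': 635, 'Weight': 581, 'Hours': 48}
```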
Count
• refers to the number of cells that contain values (numbers)

The function is:
=COUNT(range of cells with the values of interest)
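=COUNT(...) counts only cells holding numbers, so labels, text and empty cells are skipped. The Python sketch below is a rough imitation of that rule. The function name `excel_like_count` is hypothetical, and an empty cell is assumed to be represented as `None`.

```python
def excel_like_count(cells):
    """Count cells that hold numbers, skipping text and empty cells (None).

    Booleans are excluded explicitly because Python treats bool as a
    subclass of int, while COUNT ignores logical values in a range.
    """
    return sum(
        1 for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )

# Column B with its header label, the 11 ages and one blank cell below them
column_b = ["Age", 21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26, None]
print(excel_like_count(column_b))  # 11
```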
“Min”
• the lowest value in an array of values

The function is:
=MIN(range of cells with the values of interest)
“Max”
• the largest value in an array of values

The function is:
=MAX(range of cells with the values of interest)
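In Python, the built-ins `min()` and `max()` play the role of =MIN(...) and =MAX(...). This sketch applies them to every numeric column. The names `columns` and `extremes` are illustrative.

```python
columns = {
    "Age":    [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26],
    "Height": [66, 65, 61, 64, 51, 52, 62, 54, 50, 52, 58],
    "Weight": [55, 43, 53, 43, 46, 52, 57, 59, 53, 63, 57],
    "Hours":  [7, 6, 5, 3, 4, 7, 6, 3, 1, 4, 2],
}

# (lowest, largest) value of each column
extremes = {name: (min(v), max(v)) for name, v in columns.items()}
for name, (low, high) in extremes.items():
    print(f"{name}: min={low}, max={high}")
# Age: min=21, max=29
# Height: min=50, max=66
# Weight: min=43, max=63
# Hours: min=1, max=7
```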
Median
• another measure of central tendency: the number in the middle
• To get the median, you have to order the data from lowest to highest.
• If the number of cases is odd, the median is the single middle value; for an even number of cases, the median is the average of the two numbers in the middle.

The Excel function is:
=MEDIAN(range of cells with the values of interest)
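The odd/even rule above can be sketched in Python as follows. The helper name `median_of` is hypothetical, and the standard library's `statistics.median` gives the same results.

```python
def median_of(values):
    """Median: sort the data, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ordered = sorted(values)   # order from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:             # odd number of cases: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the two

ages = [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26]
print(median_of(ages))       # 23   (11 values, so the 6th in order)
print(median_of(ages[:10]))  # 22.0 (10 values, average of 21 and 23)
```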
Mode
• refers to the most frequent (most repeated or common) number in the data
• By age, there are more 21-year-old students in the sample than students of any other age.

The function is:
=MODE(range of cells with the values of interest)
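A minimal Python sketch of the mode uses `collections.Counter`. The helper name `mode_of` is hypothetical. `Counter.most_common` keeps first-seen order among equal counts, so on ties it returns the value that appears first in the data. That is assumed, not confirmed here, to match Excel's MODE, which also reports a single value. Raising an error when nothing repeats mirrors MODE returning #N/A in that case.

```python
from collections import Counter

def mode_of(values):
    """Most frequent value; on ties, the one that appears first in the data."""
    if not values:
        raise ValueError("the mode of an empty list is undefined")
    value, freq = Counter(values).most_common(1)[0]
    if freq < 2:
        # Excel's MODE returns #N/A when no value repeats
        raise ValueError("no value repeats, so there is no mode")
    return value

ages = [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26]
print(mode_of(ages))  # 21 (five of the eleven students are 21)
```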
Range
• a measure of dispersion
• It is simply the difference between the largest and smallest values, “max” – “min”.
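In a worksheet, one way to compute the range of the ages is the formula =MAX(B2:B12)-MIN(B2:B12). The same “max” – “min” idea in Python, with a hypothetical helper `range_of`:

```python
def range_of(values):
    """Range: largest value minus smallest value."""
    if not values:
        raise ValueError("the range of an empty list is undefined")
    return max(values) - min(values)

ages = [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26]
heights = [66, 65, 61, 64, 51, 52, 62, 54, 50, 52, 58]
print(range_of(ages))     # 8  (29 - 21)
print(range_of(heights))  # 16 (66 - 50)
```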
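Sample Variance
• measures the dispersion of the data from the mean
• It is essentially the mean of the squared distances from the mean; for a sample, the sum of the squared distances is divided by the number of observations minus 1 rather than by the number of observations.
• It is calculated as $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the mean of X and $n$ is the number of observations.
• Higher variance means more dispersion from the mean.

The Excel function is:
=VAR(range of cells with the values of interest)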
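The formula $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$ translates directly into Python. The sketch below uses a hypothetical helper `sample_variance`, and `statistics.variance` in the standard library computes the same quantity.

```python
def sample_variance(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    m = sum(values) / n                        # mean of X
    squared = sum((x - m) ** 2 for x in values)  # sum of (X - mean of X)^2
    return squared / (n - 1)

ages = [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26]
print(round(sample_variance(ages), 2))  # 7.65
```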
Standard Deviation
• the square root of the variance
• indicates how close the data are to the mean

The Excel function is:
=STDEV(range of cells with the values of interest)
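Because the standard deviation is the square root of the sample variance, a Python sketch only adds `math.sqrt` to the variance calculation. `sample_std` is a hypothetical name, and `statistics.stdev` gives the same result.

```python
import math

def sample_std(values):
    """Sample standard deviation: the square root of the sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("the standard deviation needs at least two values")
    m = sum(values) / n
    variance = sum((x - m) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)

ages = [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26]
print(round(sample_std(ages), 2))  # 2.77
```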
Skewness
• measures the asymmetry of the data, when one of the tails of an otherwise normal curve is longer than the other
• It can be used as a rough test for normality in the data (by dividing it by its standard error).
Skewness
• If it is positive, the mass of the data is concentrated on the left side of the curve and the right tail is longer (right skewed; the median and the mode are usually lower than the mean).
• A negative value indicates that the mass of the data is concentrated on the right of the curve (the left tail is longer, left skewed; the median and the mode are usually higher than the mean).
• A normal distribution has a skew of 0.

It can also be estimated using the function:
=SKEW(range of cells with the values of interest)
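The sketch below computes the adjusted sample skewness, the formula Microsoft documents for SKEW: $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$ with $s$ the sample standard deviation. It is an illustration, not Excel's own code, and the helper name `excel_skew` is hypothetical.

```python
import math

def excel_skew(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample SD
    if s == 0:
        raise ValueError("all values are equal, so skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

ages = [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26]
print(round(excel_skew(ages), 2))  # 0.89
```

The positive value matches the picture: most students are 21 to 23, and the 29-year-old stretches the right tail.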
Kurtosis
• measures the peakedness of the distribution (more precisely, how heavy its tails are)
• also an indicator of normality
• Positive kurtosis indicates more cases in the tails (heavy tails), usually with a tall, sharp peak (leptokurtic).
• Negative kurtosis indicates fewer cases in the tails (light tails) and a flatter distribution (platykurtic).
• A normal distribution has a kurtosis of 0 (given a correction of –3; otherwise it has a kurtosis of 3). Excel's KURT reports this corrected value.

The Excel function is:
=KURT(range of cells with the values of interest)
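The following sketch implements the sample excess-kurtosis formula documented for KURT, which already includes the –3 style correction: $\frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$. The helper name `excel_kurt` is hypothetical, and this is a sketch rather than Excel's implementation.

```python
import math

def excel_kurt(values):
    """Sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least four values")
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))  # sample SD
    if s == 0:
        raise ValueError("all values are equal, so kurtosis is undefined")
    z4 = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction

ages = [21, 21, 29, 25, 23, 21, 26, 21, 23, 21, 26]
print(round(excel_kurt(ages), 2))
```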
