
1 - Basic Concepts

The document discusses basic statistical concepts including defining statistics, identifying variable types and measurement scales, and sampling methods. Statistics is defined as the science of collecting, analyzing, and drawing conclusions from data. Variable types include qualitative, quantitative discrete, and quantitative continuous variables. Measurement scales are nominal, ordinal, interval, and ratio. Probability and non-probability sampling methods are also outlined.

Uploaded by

lyriemaecutara0
Copyright
© All Rights Reserved

Basic Concepts in Statistics
Objectives
At the end of this chapter, the students will be able to explain basic
statistical concepts and measures. Specifically, they will be able to:
1. Define statistics;
2. Identify the type of variable and the scale of measurement to which a
variable belongs; and
3. Identify which method of data presentation is appropriate to use.
Statistical literacy
 Statistics is essential for making good decisions in an uncertain world.
 Statistical literacy is “people’s ability to interpret and critically
evaluate statistical information and data-based arguments appearing in
diverse media channels, and their ability to discuss their opinions
regarding such statistical information” (Gal, as cited by Rumsey, 2002)
 Statistical literacy is important because you are faced with statistics
problems in your personal and professional lives.
 Be able to evaluate media reports about opinion surveys, medical
research studies, the state of the economy, and environmental issues.
Statistical literacy
 How can you evaluate evidence about global warming?
 Is there bias against women in appointing managers?
 Are cell phones dangerous to your health?
Statistical literacy
 How likely are you to win the lotto?
 How can you analyze whether a diet really works?
 How can you predict the selling price of a house?
 What’s the chance that you will get a passing grade in this
class?
What is Statistics?
The art and science of answering questions and exploring
ideas through the processes of gathering data, describing
data, and making generalizations about a population on
the basis of a smaller sample.

(https://onlinecourses.science.psu.edu/stat200/node/113)
What is Statistics?
The art and science of designing studies and analyzing the
data that those studies produce. Its ultimate goal is
translating data into knowledge and understanding of the
world around us. In short, statistics is the art and science
of learning from data.
(Agresti and Franklin, 2013)
What is Statistics?
Statistics (Singular)
 science that deals with techniques for collecting,
presenting, analyzing, and drawing conclusions from data
 science of data

Statistics (Plural)
 numerical descriptions by which we enhance understanding
of data
 summary measures used to describe a sample
Example
1. Statistics are facts or data, either numerical or
nonnumerical.
2. Statistics is the science of organizing and
summarizing numerical or nonnumerical
information.
Variables
 When we obtain our sample, we obtain data values on one or
more variables
 A variable is a characteristic or attribute which varies from one
entity to another entity (tensile strength, no. of buildings in VSU
campus, income of engineers)
 A qualitative variable is one which classifies/identifies/describes
an element of a sample or population (sex of an engineer,
academic rank of faculty)
 A quantitative variable is one which quantifies an element of a
sample or population (age, length of service)
Variables
 A quantitative variable that can assume only a finite or countably
infinite number of possible values (usually integers) is called a
discrete variable (no. of board passers, enrolment in BSCE per
semester)
 A quantitative variable that can theoretically assume any value in a
specified interval (i.e., continuum) is called a continuous variable
(temperature, wind speed, weight of beams)
 measuring instruments limit the number of decimal places of
values of continuous variables
Summary
Variables
  Qualitative
  Quantitative
    Discrete
    Continuous
Measurement
 the process or rule of assigning labels or values to a variable

Importance:
 Identifying the level of measurement of a variable is one of
the factors in choosing a statistical method
Levels or Scales of Measurement

Ratio
Interval
Ordinal
Nominal
Nominal
 objects are classified into categories based on some defined
characteristics
 categories are mutually exclusive
 numbers are sometimes used as category codes but arithmetic
should not be performed on these number codes
 frequencies or counts of observations that belong to each
category are usually obtained to summarize the data

Examples: sex, degree program, religion, civil status
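
The point about number codes can be illustrated with a minimal Python sketch. The code values, category names, and responses below are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
from collections import Counter

# Hypothetical number codes for civil status (a nominal variable).
CODES = {1: "single", 2: "married", 3: "widowed"}

# Hypothetical coded responses from nine respondents.
responses = [1, 2, 2, 1, 3, 1, 2, 1, 1]

# Appropriate summary for nominal data: the frequency of each category.
counts = Counter(CODES[code] for code in responses)

# Arithmetic on the codes runs without error but has no interpretation:
# an "average civil status" of about 1.56 describes no category.
meaningless_mean = sum(responses) / len(responses)

for category, count in counts.items():
    print(category, count)
print(round(meaningless_mean, 2))
```

Relabelling the categories (for example, swapping codes 1 and 3) would change the mean but not the counts, which is why counts are the safe summary here.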


Ordinal
 objects are classified into ordered categories
 categories are mutually exclusive but have some logical order
(one category is “higher” than others)
 categories are scaled according to the amount of the particular
characteristic they possess
 differences between categories are not meaningful because the distance
between any two adjacent categories is not necessarily equal

Examples: faculty position in a university, faculty performance rating,
grading system at VSU
Ordinal Measurement
The interval, or distance, between ranks is unequal and unknown:
Activity    Average rank   Overall rank
Skiing      2.0            1
Axe throw   2.3            2
Rugby       2.7            3
Opera       3.3            4
Interval
 possesses the properties of the ordinal scale
 equal differences between values on the scale represent equal
differences in the characteristic being measured
 the zero point is arbitrary and is just another point on the scale.

Examples:
temperature (in °C), achievement test score, IQ
Ratio
 Has all the properties of the interval scale
 the zero point reflects an absence of the characteristic (absolute
zero point)
 ratios of two values in the scale are meaningful

Examples:
weight, price, distance
Ratio Measurement
Example: Weight in pounds

10 lbs. is twice as much as 5 lbs. (ratios are meaningful: 10/5 = 2), and
zero pounds means no weight or absence of weight (true zero point)


Ratio
Example:
A height of zero is meaningful (it means you don’t exist). Compare
that to a temperature of zero on the Celsius scale, which does not mean
an absence of temperature; it is only the point at which water freezes.
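
The contrast between interval and ratio scales can be checked numerically. The sketch below is illustrative only: the two temperatures are made-up values, and the Kelvin scale is brought in here (it is not part of the lecture) because its zero is a true zero.

```python
# Illustrative values only: two temperatures on the Celsius (interval) scale.
c_low, c_high = 10.0, 20.0

def celsius_to_kelvin(c):
    """Convert Celsius to Kelvin, a scale whose zero point is a true zero."""
    return c + 273.15

# Differences survive a shift of the zero point ...
diff_celsius = c_high - c_low
diff_kelvin = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# ... but ratios do not, so "20 °C is twice as hot as 10 °C" is not meaningful.
ratio_celsius = c_high / c_low
ratio_kelvin = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(diff_celsius, round(diff_kelvin, 2))
print(ratio_celsius, round(ratio_kelvin, 3))
```

The difference is 10 degrees on both scales, while the ratio of 2 on the Celsius scale shrinks to barely above 1 on the Kelvin scale.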
Example
Identify the type of variable and the scale of measurement
of the following variables:
1. Age (e.g., 19 years and 5 months old)

2. Total no. of units enrolled in the current semester

3. Sex (e.g., 0-Male and 1-Female)

4. Temperature (in Celsius)

5. ID number
Population and sample
 A population is the entire collection of objects or
outcomes about which data are collected
 A sample is a subset of the population containing the
observed objects or the outcomes and the resulting data
 Numerical summaries computed from the data of an entire population
are called parameters
 Numbers computed using the data obtained from a
sample are called statistics
 Statistics are used to estimate parameters
Parameters & Statistics
Example:
Decide whether the numerical value describes a population
parameter or a sample statistic.
a.) A recent survey of a sample of 450 engineering
students reported that the average weekly
income for students is $325.
Because the average of $325 is based on a sample, this
is a sample statistic.
b.) The average weekly income for all students is $405.
Because the average of $405 is based on a population,
this is a population parameter.
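
The distinction can be sketched in Python with simulated data. Everything numeric below (the population of 5,000 incomes, its centre and spread, the seed) is a hypothetical assumption made for illustration; only the sample size of 450 echoes the example above.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: weekly incomes (in dollars) of 5,000 students.
population = [round(rng.gauss(400, 80), 2) for _ in range(5000)]

# Parameter: computed from every unit in the population.
parameter_mean = statistics.mean(population)

# Statistic: computed from a sample of 450 units drawn without replacement.
sample = rng.sample(population, 450)
statistic_mean = statistics.mean(sample)

print(f"population mean (parameter): {parameter_mean:.2f}")
print(f"sample mean (statistic):     {statistic_mean:.2f}")
```

The sample mean will generally be close to, but not equal to, the population mean; that gap is what estimation has to account for.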
[Diagram: sampling draws a sample from the population; inference (generalization) goes from the sample back to the population.]
Sampling
 is the process of selecting a small number of elements from a
larger defined target group of elements such that the
information gathered from the small group will allow judgments
to be made about the larger group
 is the process of selecting a number of individuals for a study in
such a way that the individuals represent the larger group from
which they were selected
Sampling
A physician would like to know the characteristics of a person’s
blood (blood type, Rh factor, blood sugar, etc).
To be able to do this, the physician extracts a few milliliters of
blood from his arm.
He subjects this to laboratory analysis and concludes that the
characteristic obtained from the blood sample is the characteristic
of the person’s blood.
Reasons for sampling
1. Reduced cost
2. Greater speed or timeliness
3. Greater scope
4. Convenience
5. Complete enumeration may be physically impossible
Probability sampling
 each element in the population has a known chance of being
included in the sample
 inclusion in the sample is determined by a random mechanism (e.g., a
device that generates random numbers) together with the selection
probability assigned to each unit
Probability sampling
 requires a listing of the population units and assigning a unique
label or identifier (usually counting numbers) to each one
(sampling frame)
 generally referred to as random samples
 allows drawing of valid generalizations about the population
Non-probability sampling
 the manner in which the units are selected from the
population depends on some inclusion rule as specified by the
sampler.
 sampling frame is not always required and the operational
cost is relatively cheaper
Non-probability sampling
 inability to provide objective measurement of accuracy
 inference can only be made by making assumptions
regarding the representativeness of the sample
Sampling Methods
Probability sampling
 simple random sampling
 systematic sampling
 stratified sampling
 cluster sampling
Non-probability sampling
 purposive sampling
 convenience sampling
 quota sampling
Simple Random Sampling
 each element in the population has a known and equal probability of
selection
 each possible sample of a given size (n) has a known and equal
probability of being the sample actually selected
 may not be practical to implement especially for large populations
due to the absence of good quality sampling frame and the
possibility that the selected units may be extremely scattered thus
making it doubly difficult to implement
 may be done with replacement or without replacement
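
A minimal sketch of simple random sampling with Python's standard `random` module. The frame of 500 numbered units, the sample size, and the seed are hypothetical.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: units labelled 1..N.
N, n = 500, 10
frame = list(range(1, N + 1))

# Without replacement: no unit can appear in the sample twice.
srs_without = rng.sample(frame, n)

# With replacement: every draw is made from the full frame, so repeats are possible.
srs_with = rng.choices(frame, k=n)

print(sorted(srs_without))
print(sorted(srs_with))
```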
Systematic Sampling
 the sample is chosen by selecting a random starting point and then
picking every kth element in succession from the sampling frame
 k is the sampling interval and is determined by dividing the
population size N by the sample size n and rounding to the nearest
integer
 a random number (called the random start) is selected from 1 to k;
the unit assigned this number is included in the sample, together with
every kth unit thereafter
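
The procedure can be sketched as follows. The function name `systematic_sample`, the frame of 500 units, and the seed are hypothetical choices for illustration.

```python
import random

def systematic_sample(frame, n, rng):
    """Return a 1-in-k systematic sample from an ordered sampling frame."""
    N = len(frame)
    k = max(1, round(N / n))      # sampling interval, rounded to an integer
    start = rng.randint(1, k)     # random start between 1 and k, inclusive
    # Take the unit at the random start and every kth unit after it.
    # Labels are 1-based and Python indexes are 0-based, hence the "- 1".
    return [frame[pos - 1] for pos in range(start, N + 1, k)]

rng = random.Random(3)
frame = list(range(1, 501))       # hypothetical frame with N = 500 units
sample = systematic_sample(frame, 20, rng)
print(sample)
```

When N/n is not a whole number, the realized sample size can differ slightly from n, and Python's `round` sends exact halves to the nearest even integer; both are details to settle before using this in practice.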
Stratified Sampling
 the population is divided into mutually exclusive sub-populations
called strata based on a stratification variable that is closely related
to the characteristic of interest
 the elements within a stratum should be as homogeneous as
possible, but the elements in different strata should be as
heterogeneous as possible.
 independent simple random samples are obtained from each stratum
 the overall sample size (n) can be distributed into the stratum sample
sizes (n_h) using equal allocation, proportional allocation or optimum
allocation
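
A minimal sketch of stratified sampling with proportional allocation, where each stratum receives n_h = n × N_h / N units. The strata (year levels), their sizes, and the seed are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical strata: student IDs grouped by year level (the stratification variable).
strata = {
    "first year": list(range(0, 400)),
    "second year": list(range(400, 700)),
    "third year": list(range(700, 900)),
    "fourth year": list(range(900, 1000)),
}
N = sum(len(units) for units in strata.values())
n = 50  # overall sample size

sample = {}
for name, units in strata.items():
    n_h = round(n * len(units) / N)          # proportional allocation
    sample[name] = rng.sample(units, n_h)    # independent SRS within the stratum

for name, chosen in sample.items():
    print(name, len(chosen))
```

With these sizes the allocation works out to whole numbers; in general the rounded n_h may not sum exactly to n and need a small adjustment.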
Cluster Sampling
 in many applications, units of the population are naturally grouped
(e.g. villages); these groupings are referred to as clusters
 a random sample of clusters is selected, based on a probability
sampling technique such as SRS
 for each selected cluster, either all the elements are included in the
sample (one-stage) or a sample of elements is drawn
probabilistically (two-stage)
Cluster Sampling
 elements within a cluster should be as heterogeneous as possible,
but clusters themselves should be as homogeneous as possible.
Ideally, each cluster should be a small-scale representation of the
population
 it is administratively convenient to implement
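
One-stage and two-stage cluster sampling can be sketched as below. The 20 villages of 30 households each, the number of clusters chosen, and the within-cluster sample size are all hypothetical.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 villages (clusters) of 30 households each.
clusters = {
    f"village_{v}": [f"v{v}_h{h}" for h in range(1, 31)]
    for v in range(1, 21)
}

# Stage 1: simple random sample of clusters.
chosen = rng.sample(sorted(clusters), 4)

# One-stage: include every element of the selected clusters.
one_stage = [hh for c in chosen for hh in clusters[c]]

# Two-stage: draw a simple random sample of elements within each selected cluster.
two_stage = [hh for c in chosen for hh in rng.sample(clusters[c], 5)]

print(chosen)
print(len(one_stage), len(two_stage))
```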
Descriptive statistics
 Use of numerical information to summarize, simplify, and present
data.
 Organize and summarize data for clear presentation and easy
interpretation
 Computation of measures of location and variation
 Construction of tables and graphs
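
As a small illustration of measures of location and variation, the sketch below uses Python's standard `statistics` module on a made-up data set of eight values; the numbers are hypothetical.

```python
import statistics

# Hypothetical data: eight observed values of a quantitative variable.
data = [52, 55, 55, 58, 60, 61, 64, 70]

# Measures of location.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variation.
data_range = max(data) - min(data)
sample_sd = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)

print(mean, median, mode)
print(data_range, round(sample_sd, 2))
```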
Inferential statistics
 techniques that use sample data to make general statements
about a population
 making decisions and drawing conclusions about a population
based on data obtained from a sample taken from the population
 allows meaningful generalizations only if the subjects in the
sample are representative of the population
 Estimation and hypotheses testing
Data
 refers to facts or figures from which conclusions can be drawn
 refers to the values or labels assigned to a variable
 it is information collected, organized, analyzed, and interpreted by
statisticians
 it is needed whenever we undertake studies or research designed to
answer particular problems, or to provide a basis on which certain
decisions may be formulated
Data
 Can be quantitative or qualitative
 Qualitative data can be transformed into quantitative data
using number codes
 Can be primary or secondary
Methods of data collection
 Sample survey (personal, phone, online)
 Controlled experiments (field or lab)
 Observation (e.g., in psychiatric wards)
 Registration method (e.g., as required by law)
 Focus group discussion (qualitative data)
 Use of existing records (secondary data)
Presenting data
 Generally, there are three ways of presenting statistical data:
textual, tabular, and graphical
 In a textual presentation, statistics are incorporated in a text or
paragraph.
 In a tabular presentation, statistics are organized in rows and
columns with appropriate labels
 In a graphical presentation, statistics are shown in pictorial form
Textual presentation
Poverty incidence among Filipinos in 2015 was estimated at 21.6
percent. During the same period in 2012, poverty incidence among
Filipinos was recorded at 25.2 percent. On the other hand,
subsistence incidence among Filipinos, or the proportion of Filipinos
whose incomes fall below the food threshold, was estimated at 8.1
percent in 2015. In 2012, the subsistence incidence among Filipinos
was 10.4 percent. Subsistence incidence among Filipinos is often
referred to as the proportion of Filipinos in extreme or subsistence
poverty.

Source: Philippine Statistics Authority


Tabular presentation
 arranges figures in a systematic manner using rows and
columns
 data can more readily be understood and comparisons
may more easily be made
Frequency Distributions
 A frequency distribution shows the number of observations
falling into each of several ranges of values.
 Frequency distributions are portrayed as frequency tables,
histograms, or polygons.
 Frequency distributions can show either the actual number
of observations falling in each range or the percentage of
observations.
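
Tallying raw observations into ranges of values can be sketched as follows. The raw values, the starting point of 85, and the class width of 10 are hypothetical assumptions and are not the data behind any table in this lecture.

```python
from collections import Counter

# Hypothetical raw observations of a quantitative variable.
weights = [88, 97, 101, 103, 106, 110, 112, 118, 121, 127, 129, 133, 138, 146]

lower, width = 85, 10  # first class is 85 - 94, the next 95 - 104, and so on

def class_label(x):
    """Return the class interval (as text) that contains the value x."""
    start = lower + ((x - lower) // width) * width
    return f"{start} - {start + width - 1}"

# Frequency distribution: number of observations falling into each class.
freq = Counter(class_label(w) for w in weights)

for label in sorted(freq, key=lambda s: int(s.split(" - ")[0])):
    print(label, freq[label])
```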
Frequency Distributions

Table 1. Distribution of 25 patients afflicted with HIV according to blood type

Blood Type   Number of Patients
O            7
A            6
B            7
AB           5
What Are Frequency Distributions?
Table 2. Distribution of the number of cars owned
by families in a certain residential area
Number of cars Number of families
0 12
1 6
2 7
3 3
4 2
What Are Frequency Distributions?

Table 3. Distribution of the weights of 30 female PE major students.


Weights Number of Students Percent
85 - 94 1 3.3
95 - 104 3 10.0
105 - 114 4 13.3
115 - 124 6 20.0
125 - 134 9 30.0
135 - 144 6 20.0
145 - 154 1 3.3
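
The Percent column of Table 3 is each class frequency divided by the total number of students, times 100. A minimal sketch that recomputes it from the counts in the table (the variable names are illustrative):

```python
# Class intervals and frequencies as given in Table 3.
classes = ["85 - 94", "95 - 104", "105 - 114", "115 - 124",
           "125 - 134", "135 - 144", "145 - 154"]
counts = [1, 3, 4, 6, 9, 6, 1]

total = sum(counts)  # 30 students in all

# Percent of observations in each class, rounded to one decimal place.
percents = [round(100 * c / total, 1) for c in counts]

for label, c, p in zip(classes, counts, percents):
    print(f"{label:>9}  {c:>2}  {p:>5.1f}")
```

Because each percentage is rounded separately, the column can add up to slightly less or more than 100 (here 99.9).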
Graphical Presentation

Graph (or chart) - is any device used in presenting numerical values
or relationships in pictorial form

Advantages
1. There is a better comprehension of data than is possible with
textual matter alone.
2. There is a more penetrating analysis of the subject than is
possible in written text.
3. There is a check on accuracy.
Some Commonly Used Charts
Line Chart
 oldest, simplest, most familiar, and most widely used method
of presenting statistics graphically
 the plotted points of the data are connected by a line.
 the fluctuations of this line show the variations in the trend.
 the distance of the plotted points from the base line of the graph
indicates the quantity
Uses:
1. for emphasizing movement rather than actual amount
2. for depicting time series (data across time)
3. for comparing several series
4. when data cover a long period of time
5. when estimates or forecasts are to be shown
Source: PSA
Some Commonly Used Charts
Column Chart
 to depict numerical values of a given item over a period of
time
 values are represented by the height of the column
 preferable to the line chart when a sharper explanation of
trend is to be shown
Types:
1. Grouped-column chart - used to compare two or sometimes
three independent series over a period of time.
2. Subdivided-column chart - shows the component parts of a total.
These should be few in number and each should carry a
distinctive pattern so that it may be readily identified.
Source: PSA
Some Commonly Used Charts
Horizontal Bar Chart
 simplest form of graph comparing different items at a
specified date
 especially suited to represent categorical data
 bars may be arranged in numerical or alphabetical order,
depending on the purpose of the chart and the given data
Some Commonly Used Charts

Pie Chart
 a circular diagram that is divided into sections to show the
composition of a whole
 size of each section is indicative of the proportion to the
total of the corresponding component
 useful when there are few components to a whole
 many components (more than six) would diminish the
visual impact of the chart
Some Commonly Used Charts

Statistical Map
 used to present geographical statistics
 should be used only when geographic distribution is of
permanent importance and when data can be readily and
correctly interpreted in this form
Types
1. Shaded or cross-hatched map

2. Dot-map chart
Source: PSA
Graphical presentation of frequency distributions
Frequency histogram
 bar graph showing the class boundaries on the x-axis and
the frequencies on the y-axis
 the sides of each bar are erected at the class boundaries

Relative frequency histogram
 the relative frequencies (y-axis) are plotted against the
class boundaries (x-axis)
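
Class boundaries are not listed in the tables above, so the sketch below derives them from the class limits of Table 3 under a common convention (an assumption here): each boundary lies halfway between the upper limit of one class and the lower limit of the next. It then prints a rough text histogram.

```python
# Class limits and frequencies as given in Table 3.
classes = [(85, 94), (95, 104), (105, 114), (115, 124),
           (125, 134), (135, 144), (145, 154)]
freqs = [1, 3, 4, 6, 9, 6, 1]

# Gap between one class's upper limit and the next class's lower limit.
gap = classes[1][0] - classes[0][1]

# Assumed convention: extend each class by half the gap on both sides.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in classes]

# Relative frequencies for a relative frequency histogram.
total = sum(freqs)
rel_freqs = [f / total for f in freqs]

# Text histogram: one row per class, one asterisk per observation.
for (lo, hi), f, rf in zip(boundaries, freqs, rel_freqs):
    print(f"{lo:6.1f} - {hi:6.1f} | {'*' * f} ({rf:.3f})")
```

With boundaries defined this way, adjacent bars share an edge, which is what distinguishes a histogram from an ordinary column chart.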
End of the lecture
