
RSU - Statistics - Lecture 1 - Final - myRSU

The document discusses applied statistics including data types, sampling, descriptive statistics, and statistical methods. It covers topics like population and sample data, determining optimal sample size, and research design. Statistical analysis techniques are used to extract information from data and assess research outputs.

Uploaded by irina.mozajeva


APPLIED STATISTICS

Docent, Dr.oec. Irina Mozhaeva

Lecture 1
Data and Sampling
SYLLABUS

1. Types of Data and Data Collection. Population and sample data.
General ideas and types of sampling, optimal sample size
determination. Bias: how it arises and is avoided.
2. Aim of the survey. Designing questionnaires: best practice, tips
and common mistakes. Types of questions. Defining groups and
intervals. Summary representation of data.
3. Distributions of random variables. Exploring data: basic concepts
of descriptive statistics, measures of central tendency and
variation.
4. Statistical inference, confidence intervals. Central Limit Theorem.
5. Bivariate analyses. Correlation and covariance. Lines of best fit.
6. Time series analysis. Moving average and exponential smoothing
techniques. Seasonality. Use of a trend line and seasonal
component in prediction.
Assessment Criteria

• 50% of the final grade - exam in statistics
• 50% of the final grade - independent work, homework, activity:
– Attendance of lectures and seminars: 5%
– Activity and quality of answers in seminars: 15%
– Homework assignments and interim tests: 30%
LECTURE 1 OUTLINE

• Statistics: Basic Definitions and Concepts
• Statistical methods
• Data types
• Data collection, research design
• Population and sample data
• Sampling, optimal sample size determination
• Bias: how it arises and is avoided
Statistics: Basic Definitions and Concepts
Key Terms

Statistics (also called statistical analysis) is a field of study concerned with collecting, summarizing, and interpreting data and making decisions based on them. Statistical inference is one of its main branches.

The related terms data science and data analysis refer to the study of processes and systems that extract knowledge or insights from data in various forms, either structured or unstructured. Data science builds on fields such as statistics, data mining, and predictive analytics.
Key Terms

A population is the entire collection (the whole set!) of objects (persons, things, etc.) of interest.

To study the population, we usually select a sample. The idea of sampling is to select a subset of the population and study it to gain information about the population as a whole.

A representative sample is a subset of a population that seeks to accurately reflect the characteristics of the population.
Key Terms

A quantity calculated from a sample to estimate a value in a population is called a statistic.

A parameter is a numerical characteristic of the whole population that can be estimated by a statistic.
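To make the statistic/parameter distinction concrete, here is a minimal Python sketch on a simulated population. The exam scores, their mean of 70, and the sample size of 100 are hypothetical assumptions, not data from the lecture. The population mean plays the role of the parameter; the mean of a random sample is the statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: exam scores of 1,000 students (simulated, not real data)
random.seed(1)
population = [random.gauss(70, 10) for _ in range(1000)]

# Parameter: a numerical characteristic of the whole population
mu = statistics.mean(population)

# Statistic: the same quantity computed on a random sample, used to estimate mu
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"parameter (population mean) = {mu:.2f}")
print(f"statistic (sample mean)     = {x_bar:.2f}")
```

Rerunning with a different seed gives a different sample mean each time, while the parameter stays fixed; that sample-to-sample variation is what inferential statistics quantifies.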
Key Terms

The observation unit is the unit described by the data that one analyzes. It might be an individual, a university, a country, etc.

A variable, usually denoted by letters such as X and Y, is a characteristic or measurement that can be determined for each observation unit in a sample or population.

Data are the actual values of a variable. They may be numeric or text (string variables).

The probability of an event is a measure of the likelihood that the event will occur.
Statistical Methods

Statistical methods are mathematical formulas, models, and techniques that are used in the statistical analysis of raw research data.

The application of statistical methods extracts information from research data and provides different ways to assess the robustness of research outputs.
Statistical Methods

Descriptive statistics is the branch of statistics that involves organizing, summarizing, displaying, and describing data.

Inferential statistics is the branch of statistics that involves drawing conclusions that extend beyond the immediate data, for example from a sample to the population it was drawn from.
Descriptive Statistics. Univariate analysis.

Univariate analysis involves examining one variable at a time across all cases.
Since it involves a single variable, it does not deal with causes or relationships. The main purpose of univariate analysis is to describe the data and find patterns that exist within it.
There are three major characteristics of a single variable that we tend to look at:
• the distribution
• the central tendency (e.g. mean, mode, median)
• the dispersion (e.g. variance, standard deviation, interquartile range)

In many situations, we would describe all three of these characteristics for each variable.
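A minimal sketch of these characteristics using Python's standard `statistics` module; the attendance data are hypothetical. Note that `statistics.variance` and `statistics.stdev` use the sample (n − 1) formulas, and `statistics.quantiles` uses its default "exclusive" method, so other software (including Excel) may report slightly different quartiles.

```python
import statistics

# Hypothetical data: number of lectures attended by 10 students
data = [3, 5, 5, 6, 7, 8, 8, 8, 9, 10]

# Central tendency
mean = statistics.mean(data)      # 6.9
median = statistics.median(data)  # 7.5
mode = statistics.mode(data)      # 8

# Dispersion
variance = statistics.variance(data)         # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)             # square root of the sample variance
q1, _, q3 = statistics.quantiles(data, n=4)  # first and third quartiles
iqr = q3 - q1                                # interquartile range

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, sd={std_dev:.2f}, IQR={iqr}")
```

The distribution itself is usually shown with a frequency table or histogram rather than a single number.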
Descriptive Statistics. Bivariate Analysis

Bivariate analysis is used to find out whether there is a relationship between two variables. Something as simple as a scatterplot, plotting one variable against the other on a Cartesian plane (X and Y axes), can often give you a picture of what the data are telling you.
Multivariate Analysis

Multivariate analysis is the analysis of three or more variables. There are many ways to perform multivariate analysis, depending on your goals. Some of these methods include:
• Additive Tree
• Canonical Correlation Analysis
• Cluster Analysis
• Correspondence Analysis
• Factor Analysis
• Generalized Procrustes Analysis
• Multidimensional Scaling
• Multiple Regression Analysis
• Partial Least Square Regression
• Redundancy Analysis
Data Types
Types of Data

Data can be primary or secondary.

Primary data - quantitative or qualitative data obtained directly from individuals, objects or processes. Such data are usually collected specifically for the research problem you plan to study.
Types of Data

Secondary data - data gathered by another researcher or agency (and made available to you). Examples: census data published by the Central Statistical Bureau, stock price data published by CNN, registered (official) unemployment data provided by the State Employment Agency.
Research Designs for Primary Data Collection
Study design

In many ways the design of a study is more important than the analysis. A badly designed study can never be rescued, whereas poor analysis can usually be corrected or redone.

Hence, it is important at the outset to:

• Make objectives/research questions clear and unambiguous (hypothesis-driven)
• Identify what data you need
• Plan your statistical analysis and decide on the methodology before you collect any data.
Kinds of research designs

Four broad kinds of research designs are used in the behavioral and social sciences:
• survey,
• experimental,
• comparative,
• ethnographic.

Survey designs include the collection and analysis of data from censuses, sample surveys, and longitudinal studies, and the examination of various relationships among the observed phenomena. Here, randomization is used to select the members of a sample so that the sample is as representative of the whole population as possible.
Kinds of research designs

Experimental designs, in either laboratory or field settings, systematically manipulate a few variables, while others that may affect the outcome are held constant, randomized, or otherwise controlled. The purpose of randomized experiments is to ensure that only one or a few variables can systematically affect the results, so that causality can be analyzed.
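Random assignment is what lets an experiment attribute a difference in outcomes to the manipulated variable rather than to pre-existing differences between groups. A minimal sketch, assuming a simple two-group (treatment vs control) design; the function name `randomly_assign` and the participant IDs are hypothetical.

```python
import random

def randomly_assign(participants, seed=None):
    """Randomly split participants into treatment and control groups of (nearly) equal size."""
    rng = random.Random(seed)   # separate generator so the assignment can be reproduced
    shuffled = list(participants)
    rng.shuffle(shuffled)       # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs 1..20
treatment, control = randomly_assign(range(1, 21), seed=42)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Recording the seed documents exactly how the groups were formed, which makes the assignment auditable.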
Comparative designs involve retrieving evidence recorded in the flow of current or past events at different times or places, and interpreting and analyzing this evidence.
Ethnographic designs involve a qualitative method in which researchers observe and/or interact with a study’s participants in their real-life environment. The aim of an ethnographic study is to get ‘under the skin’ of a problem (and all its associated issues). The hope is that, by achieving this, the researcher will truly understand the problem and therefore design a far better solution.
Population and Sample Data. Sampling, Optimal Sample Size Determination.
Population and Sample

Population: may be too big and/or expensive to study.
Sample: we can learn nearly as much by studying a suitably large, correctly specified sample of a population as we can from studying the entire population.
Random Selection

Most research study designs require a sample to be randomly selected from a population.
Research¹ suggests that humans cannot generate truly random numbers and thus cannot make random selections by hand.
For a simple random selection using one variable you can:
• Select numbered balls out of a bag (as in a lottery)
• Use an online random number generator, such as www.random.org/integers
• Use the RAND or RANDBETWEEN functions in Excel
o See a tutorial: https://www.youtube.com/watch?v=fkyzQvjsqz0
• Or use the Data Analysis tool in Excel.
o See a tutorial: https://www.youtube.com/watch?v=5XrJcFmbpWI&t=15s

1. Bains, W. (2008) Random number generation and creativity, Medical Hypotheses, 70(1), pp. 186-190
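The same simple random selection can be done in Python. This sketch assumes a hypothetical sampling frame of 500 numbered students. `random.sample` draws distinct units without replacement, and `randint` is the closest equivalent of Excel's RANDBETWEEN (both bounds inclusive).

```python
import random

# Hypothetical sampling frame: a numbered list of 500 students
frame = list(range(1, 501))

rng = random.Random(2024)       # fixed seed so the draw can be reproduced
sample = rng.sample(frame, 30)  # 30 distinct units, drawn without replacement

# Equivalent of Excel's RANDBETWEEN(1, 500): one random integer, bounds inclusive
one_draw = rng.randint(1, 500)

print(sorted(sample))
print(one_draw)
```

Unlike calling RANDBETWEEN 30 times, `sample` can never pick the same student twice.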
Determining Sample Size

One crucial aspect of study design is deciding how big your sample should be. Increasing the sample size increases the precision of your estimates, which means that, for any given size of effect, the larger the sample, the more “statistically significant” the result will be. Conversely, if your analysis is based on a small number of observations, it may fail to detect effects that really exist (the test has low statistical power).
However, a larger sample also costs more, so one should determine an optimal sample size.
Three main criteria need to be specified to determine the appropriate sample size:
1. the level of precision
2. the level of confidence
3. the degree of variability of the parameters measured
1. The level of precision

The level of precision, or sampling error, is the range in which the true population value is estimated to lie.
This range is often expressed in percentage points (e.g. ± 5%). Thus, if we find that 40% of students in our sample have read “Introduction to Statistics” from A to Z, and our precision rate is ± 5%, then we may conclude that between 35% and 45% of all students have read the entire book.
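Precision is tied to sample size through the usual normal-approximation margin of error for a proportion, e = Z·√(p(1 − p)/n). This formula is not on the slide; it is the standard relationship that the sample size formulas invert, and the n values below are chosen arbitrarily for illustration. The helper name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion p based on n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# The slide's example: 40% in the sample, precision of +/-5 percentage points
p_hat, e = 0.40, 0.05
print(f"between {p_hat - e:.0%} and {p_hat + e:.0%} of all students")  # between 35% and 45% of all students

# Precision actually achieved at 95% confidence for a few hypothetical sample sizes
for n in (100, 385, 1000):
    print(n, round(margin_of_error(p_hat, n), 3))
```

The margin shrinks only with the square root of n: quadrupling the sample roughly halves the margin of error.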
2. The confidence level

The confidence or risk level is based on ideas encompassed under the Central Limit Theorem. The key idea of this theorem is that when a population is repeatedly sampled, the average of the values obtained from those samples approaches the true population value. Furthermore, the values obtained from these samples are (approximately) normally distributed about the true value, with some samples giving a higher value and some a lower one.
In a normal distribution, about 95% of the sample values will lie within two standard deviations (more precisely, 1.96) of the true population value. This means that if a 95% confidence level is selected, in the long run 95 out of 100 samples will have the true population value within the specified range of precision.
[Figure: Distribution of Means for Repeated Samples]
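The Central Limit Theorem claim can be checked by simulation. The sketch below works under assumed settings (a skewed, exponentially distributed population of 20,000 values, samples of 100, 1,000 repetitions): it draws repeated samples, builds a 95% interval of ±1.96 estimated standard errors around each sample mean, and counts how often the interval contains the true mean. The share should come out close to, though not exactly, 95%.

```python
import random
import statistics

random.seed(7)
# Hypothetical skewed population (exponential, mean about 50): the CLT does not need normal data
population = [random.expovariate(1 / 50) for _ in range(20_000)]
true_mean = statistics.mean(population)

n, repeats, z = 100, 1000, 1.96
covered = 0
sample_means = []
for _ in range(repeats):
    sample = random.sample(population, n)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5   # estimated standard error of the mean
    sample_means.append(m)
    if m - z * se <= true_mean <= m + z * se:  # does this sample's interval contain the truth?
        covered += 1

print(f"mean of sample means: {statistics.mean(sample_means):.2f} (true mean {true_mean:.2f})")
print(f"intervals containing the true mean: {covered / repeats:.1%}")
```

A histogram of `sample_means` would show the bell shape from the figure, even though the population itself is strongly skewed.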
3. Degree of variability

The degree of variability in the attributes being measured refers to the distribution of attributes in the population. The more heterogeneous a population, the larger the sample size required to obtain a given level of precision. The more homogeneous a population, the smaller the sample size required.
Note that a 50/50 split on a specific attribute or response indicates maximum variability in the population, whereas a 90/10 split means that 90 per cent of the population share an attribute, so the population is less variable. If you don’t know what level of variability to expect, assume that it is 50 per cent. This may mean that you use a larger sample than was really needed, but that is better than using a sample that is too small and then having no confidence in the results.
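Why a 50/50 split is the worst case: the variability of a yes/no attribute enters the sample size formulas as p(1 − p), which is largest at p = 0.5. A minimal sketch; the helper name `variability` is hypothetical.

```python
def variability(p: float) -> float:
    """Variability term p * (1 - p) for a yes/no attribute held by a proportion p."""
    return p * (1 - p)

for p in (0.5, 0.7, 0.9):
    print(f"{p:.0%}/{1 - p:.0%} split -> p(1 - p) = {variability(p):.2f}")
# 50%/50% split -> p(1 - p) = 0.25
# 70%/30% split -> p(1 - p) = 0.21
# 90%/10% split -> p(1 - p) = 0.09
```

Because the required sample size is proportional to p(1 − p), a 90/10 split needs only 0.09 / 0.25 = 36% of the sample required for a 50/50 split at the same precision and confidence.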
Options for Determining Sample Size

There are several approaches to determining the sample size. These include:
1. Census for small populations
2. Using published tables
3. Applying formulae
1. Using a Census for Small Populations

One option is to undertake a census, that is, to survey every member of the population.
For small populations this may well be the only way to guarantee a
degree of accuracy. It eliminates all sampling error and provides data
on the whole population. However, cost considerations make this
impractical once populations exceed a few hundred.
2. Using Published Tables

The quickest option is to find a relevant table of sample sizes. Such tables give sample sizes for different population sizes and for different levels of precision, confidence levels and degrees of variability.
If you are happy with a 95 per cent confidence level, 5 per cent precision and 50 per cent degree of variability, you can read the sample size for your population size straight from such a table.

NB! The sample size refers to the number of respondents, not the number of people invited to participate in a survey. These sample sizes also assume that a truly random sample is used. If you need to reflect differences in gender, age or geographic distribution, then you have to use a stratified sampling design and a larger sample size.
3. Using Formulae

If you want different confidence levels, or have different degrees of variability, then you may find it easiest to use a formula to calculate the sample size. For large populations, and in cases where the population size is unknown, the following formula by Cochran (1963) gives the required sample size:

$$n_0 = \frac{Z^2 \, p \, (1 - p)}{e^2}$$

Z depends on the degree of confidence that you want. For a confidence level of 95 per cent, Z = 1.96; for 90 per cent, Z = 1.645; and for 99 per cent, Z = 2.576.
p is the degree of variability, expressed as a decimal; if you don’t know it, use 0.5.
e is the level of precision, expressed as a decimal.
• It refers to the absolute uncertainty in a quantity. For example, if the prevalence of coronavirus is estimated at 20% ± 10%, the absolute uncertainty is 10 percentage points (e = 0.10).
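Cochran's formula n0 = Z²p(1 − p)/e² translates directly into code. This sketch also derives Z from the confidence level with `statistics.NormalDist` (Python 3.8+) instead of looking it up in a table; the helper names `z_score` and `cochran_n0` are hypothetical. Rounding up with `math.ceil` reflects the rule that you cannot interview a fraction of a person.

```python
import math
from statistics import NormalDist

def z_score(confidence: float) -> float:
    """Two-sided critical value Z for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def cochran_n0(z: float, p: float, e: float) -> float:
    """Cochran's sample size for a proportion in a large or unknown-size population."""
    return z ** 2 * p * (1 - p) / e ** 2

for conf in (0.90, 0.95, 0.99):
    print(conf, round(z_score(conf), 3))  # roughly 1.645, 1.96, 2.576

# 95% confidence, maximum variability (p = 0.5), precision +/-5 percentage points
n0 = cochran_n0(1.96, 0.5, 0.05)
print(round(n0, 2), "->", math.ceil(n0))  # 384.16 -> 385
```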
3. Using Formulae. Example 1.

Imagine that you need to survey a sample from the total population of SMEs to discover how many have a bank loan. You are happy with a confidence level of 95 per cent, a precision rate of ±5 per cent and a degree of variability of 50 per cent.
For a confidence level of 95 per cent, Z = 1.96.

$$n_0 = \frac{1.96^2 \times 0.5 \times (1 - 0.5)}{0.05^2} = 384.16 \approx 385$$

You obviously cannot interview a fraction of a person, so you need to round upwards.
3. Using Formulae. Example 1.
Population Correction
If your total population is small, then your sample can be smaller. If the total population of SMEs is just 1,000 members, then you can adjust your sample size using the following equation, where n is the new sample size, n0 is the sample size for a large population and N is the size of the population:

$$n = \frac{n_0}{1 + \frac{n_0 - 1}{N}}$$

$$n = \frac{384.16}{1 + \frac{384.16 - 1}{1000}} = 277.74 \approx 278$$

You first need to do the calculation for a large population (on the previous slide) to find n0, and then you can apply the ‘finite population correction’.
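The finite population correction n = n0 / (1 + (n0 − 1)/N) as a small function; the name `finite_population_correction` is hypothetical. It reproduces the SME example: 384.16 for a large population shrinks to about 277.74, i.e. 278 respondents, when N = 1,000.

```python
import math

def finite_population_correction(n0: float, N: int) -> float:
    """Adjust a large-population sample size n0 for a finite population of size N."""
    return n0 / (1 + (n0 - 1) / N)

n = finite_population_correction(384.16, 1000)
print(round(n, 2), "->", math.ceil(n))  # 277.74 -> 278
```

Pass the unrounded n0 and round up only once, at the end, so that rounding errors do not accumulate.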
Determining Sample Size Using Formulae. Task 1.

We want to estimate the true immunization coverage in a community of children. Research tells us that immunization coverage should be somewhere around 80% to avoid spread of the disease.
Precision (absolute): we would like the result to be within 2 percentage points of the true value. Confidence level: conventional 95%.
Calculate the appropriate sample size using the formula proposed by Cochran.

p = guess for the expected proportion in the population = 0.80
e = absolute precision = 0.02
Confidence level = 95% → Z = 1.96

$$n_0 = \frac{1.96^2 \times 0.8 \times 0.2}{0.02^2} = 1536.64 \;\rightarrow\; 1537$$
Determining Sample Size Using Formulae. Task 1.

Now imagine we want to estimate the true immunization coverage in one school only. Our population is 1,200 children; the other parameters are the same as above.
Adjust the sample size accordingly using the finite population correction formula:

$$n = \frac{1536.64}{1 + \frac{1536.64 - 1}{1200}} = 674.05 \;\rightarrow\; 675$$

Formula for Sample Size for the Mean

The use of tables and formulas to determine sample size in the above discussion employed proportions, which assume a dichotomous (yes/no) response for the attributes being measured. There are two methods to determine the sample size for variables that are polytomous or continuous. One method is to combine responses into two categories and then use a sample size based on a proportion. The second method is to use the formula for the sample size for the mean. This formula is similar to that for the proportion, except for the measure of variability: the formula for the mean employs σ² instead of p(1 − p):

$$n_0 = \frac{Z^2 \sigma^2}{e^2}$$

σ² is the variance of an attribute in the population, and e is the precision expressed in the attribute's own units.
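A sketch of the mean-based formula n0 = Z²σ²/e². The inputs are assumptions chosen for illustration: estimating average monthly spending to within ±5 EUR at 95% confidence, with σ guessed as 40 EUR (for instance from a pilot study). The helper name `sample_size_for_mean` is hypothetical. The result is only as good as that guess of σ.

```python
import math

def sample_size_for_mean(z: float, sigma: float, e: float) -> float:
    """n0 = Z^2 * sigma^2 / e^2, where e is the precision in the variable's own units."""
    return (z * sigma / e) ** 2

# Hypothetical inputs: 95% confidence, sigma = 40 EUR, precision +/-5 EUR
n0 = sample_size_for_mean(1.96, sigma=40, e=5)
print(round(n0, 2), "->", math.ceil(n0))  # 245.86 -> 246
```

For a small population, the same finite population correction as for proportions can then be applied to n0.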

The disadvantage of the sample size based on the mean is that a
"good" estimate of the population variance is necessary.
Bias: How It Arises and Is Avoided
What is Sample Selection Bias?

Sample selection bias is the bias that results from the failure to
ensure the proper randomization of a population sample.
The flaws of the sample selection process lead to situations where
some groups or individuals in the population are less likely to be
included in the sample, while others are more likely to participate.
The presence of sample selection bias may distort the statistical
analysis of a sample and affect the statistical significance of the
chosen statistical tests.
Types of Sample Selection Bias

1. Self-selection
Self-selection happens when the participants of the study exercise
control over the decision to participate in the study to a certain extent.
Since the participants may decide whether to participate in the
research or not, the selected sample does not represent the entire
population.
2. Selection from a specific area
The participants of the study are selected from certain areas only, while other areas are not represented in the sample.
3. Exclusion
Some groups in the population are excluded from the study.
Types of Sample Selection Bias

4. Survivorship bias
Survivorship bias occurs when a sample is concentrated on subjects
that passed the selection process and ignores subjects that did not pass
the selection process. The survivorship bias results in overly optimistic
findings from the study.

5. Pre-screening of participants
The participants of the study are recruited only from particular groups. Thus, the sample will not represent the entire population of the study.
How to Overcome Bias?

The most obvious method is to establish a random sample selection process.
Furthermore, one should ensure that the selected subgroups are equivalent to the population in terms of their key characteristics (if the key characteristics of the population are known).
By analyzing the population of the study and identifying its subgroups, a researcher must ensure that the selected sample represents the total population as closely as possible.
If some population subgroups are underrepresented in the resulting sample while others are overrepresented, the researcher should apply a statistical correction by assigning weights that correct the bias.
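A minimal sketch of correcting over- and under-representation with post-stratification weights, one common way to assign such weights (weight = population share ÷ sample share). The population shares, sample counts and group results are hypothetical; in practice the population shares would come from a census or official statistics.

```python
# Assumed known population structure and a hypothetical, unbalanced sample
population_share = {"female": 0.52, "male": 0.48}
sample_counts = {"female": 150, "male": 250}   # women under-represented

n = sum(sample_counts.values())
# Weight > 1 for under-represented groups, < 1 for over-represented ones
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

# Hypothetical survey result per group, e.g. share answering "yes"
group_mean = {"female": 0.30, "male": 0.40}
unweighted = sum(group_mean[g] * sample_counts[g] for g in sample_counts) / n
weighted = sum(group_mean[g] * sample_counts[g] * weights[g] for g in sample_counts) / n

print({g: round(w, 3) for g, w in weights.items()})
print(f"unweighted = {unweighted:.4f}, weighted = {weighted:.4f}")  # 0.3625 vs 0.3480
```

The weighted estimate equals what the sample would give if each group appeared in its population proportion (0.52 × 0.30 + 0.48 × 0.40 = 0.348). Weighting corrects the group mix but cannot fix bias within a group, for example from self-selection.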
