ENGINEERING
DATA ANALYSIS
CHAPTER 1 – INTRODUCTION TO STATISTICS
AND DATA PRESENTATION
OBJECTIVE
At the end of this chapter, the graduate students will be able to:
1. understand the key concepts of statistics.
2. differentiate the types of statistics.
3. understand the uses of statistics in everyday life.
4. learn to analyze data presentation.
STATISTICS
is a branch of applied mathematics that involves the collection, description, analysis, and inference
of conclusions from quantitative data. It is concerned with determining how to draw reliable
conclusions about large groups and general phenomena from the observable characteristics of
small samples that represent only a small portion of the large group or limited number of instances
of a general phenomenon. The two major areas of statistics are known as descriptive statistics,
which describes the properties of sample and population data, and inferential statistics, which
uses those properties to test hypotheses and draw conclusions. Descriptive statistics covers the
collection, extraction, summary, and presentation of data, including measures of central tendency
and measures of variability. Its purpose is to facilitate the presentation and
interpretation of data. Inferential statistics, by contrast, uses those properties to test hypotheses
and draw conclusions based on the evidence obtained from samples.
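As a minimal sketch of the distinction, the Python snippet below first summarizes a small sample with descriptive measures (central tendency and variability) and then takes one simple inferential step, the standard error of the mean. The scores are made-up illustrative values, not data from this chapter, and the function name describe is hypothetical.

```python
import math
import statistics

def describe(sample):
    """Return descriptive measures of a sample as a dictionary."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),      # central tendency
        "median": statistics.median(sample),  # central tendency
        "stdev": statistics.stdev(sample),    # variability (sample standard deviation)
    }

# Hypothetical quiz scores of eight students (illustrative values only).
scores = [70, 75, 80, 80, 85, 90, 95, 65]

# Descriptive step: summarize the sample itself.
summary = describe(scores)

# Inferential step: the standard error estimates how far a sample mean
# typically falls from the unknown population mean.
standard_error = summary["stdev"] / math.sqrt(summary["n"])

print(summary)
print(round(standard_error, 2))
```

The descriptive part says something only about these eight scores; the standard error is the first step toward a statement about the larger group they were drawn from.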
USES OF STATISTICS
1. Statistics helps in providing a better understanding and accurate description of nature’s
phenomena.
2. Statistics helps in the proper and efficient planning of a statistical inquiry in any field of study.
3. Statistics helps in collecting appropriate quantitative data.
4. Statistics helps in presenting complex data in a suitable tabular, diagrammatic and graphic
form for an easy and clear comprehension of the data.
5. Statistics helps in understanding the nature and pattern of variability of a phenomenon
through quantitative observations.
6. Statistics helps in drawing valid inferences, along with a measure of their reliability, about the
population parameters from the sample data.
AREAS THAT USE
STATISTICS
IMPORTANCE OF STATISTICS IN
EDUCATION
1. Statistics helps in the collection and presentation of data in a calculated and
systematic manner. Statistics in education helps in the orderly arrangement of
both processed and unprocessed data.
2. Statistics makes the teaching and learning process more efficient. Statistics
in education, with special consideration to the measurement and evaluation of
concepts, is an essential part of the teaching and learning process.
3. Statistics helps in the provision and presentation of the exact type of
description. It practically helps teachers to give an accurate description of data.
This could be found in the cases of the administration of a pupil or observation
of a child.
4. Statistics serves as a reliable source of history in education. This is because
statistical documentation is empirical and easy to understand.
5. Statistics helps in the summary and presentation of results. Statistics helps
one to make data precise, concise, and meaningful and to express it in a way
that the persons involved will understand easily.
6. Statistics helps in the process of achieving an accurate prediction. Statistics helps to
guide one in any thinking activities. When one thinks systematically through a
calculated analysis and statistics, he or she thinks rightly and is likely to arrive at
positive results quickly.
7. Statistics helps in the analysis of some causal factors. Statistics enables teachers to
analyze some of the causal factors underlying complex and otherwise bewildering
(confusing) events; a behavioral outcome is commonly the result of
numerous causal factors.
8. Statistics helps in hospital analysis. Statistical analysis is necessary in the
hospital for reliable test results. A doctor who predicts a disease based on
statistical analysis is more likely to reach a sound result.
9. Statistics assists significantly in the collection of data and information. Statistics aids
in the prediction of future events.
10. Statistics makes studies highly responsive and empirical. The role of statistics
in the collection of data and information is a very sensitive one. With good statistical
data, one can access a variety of information and can easily handle and
manipulate the data accordingly.
POPULATION VS. SAMPLE
In statistics, we commonly hear the words sample and
population. In most experimental research, we usually use the
term sample, while in educational research, we commonly use
population.
A population is the entire group of elements to be studied, while a
sample is a specific group or subset of that population.
Since a sample is a subset of a population, the
sample size is at most the population size and is usually much smaller. To have a
better understanding, let us say that the population refers to
the senior high school students of Pampanga State Agricultural
University. The sample that will be used in the study is the
students of the STEM strand. In research, a population does not
always refer to people; it can also consist of objects, events, or records.
CLASSIFICATION OF STATISTICS
1. Parametric Statistics – is an approach that assumes a
random sample from a normal distribution and involves testing
hypotheses about population parameters. The basic idea
in a parametric method is that there is a set of fixed
parameters that determine a probability model.
2. Nonparametric Statistics – is a statistical approach for
estimation and hypothesis testing when no underlying data
distribution is assumed. In this statistical technique, the set of
parameters is not fixed. It is also referred to as a distribution-free
method.
DATA AND VARIABLES
Data refers to observations and measurements that have been
collected and analyzed in some way, often through research. Data
can be categorized as qualitative (attributes) or quantitative
(numerical). Variables are the characteristics or attributes that
you are observing, measuring, and recording data for. Some
examples include height, weight, eye colour, dog breed, climate,
electrical conductivity, customer service satisfaction, and class
attendance, just to name a few. As the word suggests, the value
of a variable varies from one subject (i.e. person, place, or thing)
to another. Other examples of variables include faculty rank,
educational attainment, salary, and sex.
LEVELS OF DATA MEASUREMENT
Data can be classified into four levels of measurement. They are
(from lowest to highest level): nominal scale, ordinal scale,
interval scale, and ratio scale.
Categorical data is data that is grouped into categories, such
as data for a 'gender' or 'smoking status' variable, while continuous
data is data that is measured on a continuous numerical scale
and can take on a large number of possible values, such as
data for a ‘weight’ or ‘distance’ variable.
Discrete data measures counts or numbers of events, such as
data for a ‘class attendance’ variable. While it is numerical
data, it is not measured on a continuous numerical scale, so it
doesn’t fit neatly into either of the classifications above. It is
usually treated as continuous data, but if there are only a small
number of values (such as for a ‘number of children under three in
family’ variable) you might choose to treat them as categories
instead.
One final thing to note is that any continuous data can always
be turned into categorical data, by simply creating categories
out of it. Continuous data for an ‘age’ variable could be turned
into categorical data by creating categories of 11-20 years
old, 21-30 years old, 31-40 years old, etc., for example, and
this can be useful if you want to analyze your continuous data
using statistics and statistical tests designed for categorical
data. You can’t go the other way around though and turn
categorical data into continuous data, so if you have the
choice then for maximum flexibility it is preferable to collect
continuous data.
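The conversion described above can be sketched in Python as follows. The function name age_category and the sample ages are hypothetical, and the sketch assumes ages are recorded in completed years of 1 or more, with the 10-year categories used in the text.

```python
from collections import Counter

def age_category(age):
    """Map an age in years to a 10-year category label such as '11-20'."""
    if age < 1:
        raise ValueError("this sketch only handles ages of 1 year or more")
    # Fractional ages are truncated to completed years before grouping.
    lower = ((int(age) - 1) // 10) * 10 + 1   # 1, 11, 21, 31, ...
    return f"{lower}-{lower + 9}"

# Hypothetical ages (illustrative values only).
ages = [12, 19, 20, 21, 35, 40, 33]

categories = [age_category(a) for a in ages]
frequencies = Counter(categories)

print(categories)
print(frequencies)
```

Once the ages are replaced by labels, the exact values cannot be recovered from the categories alone, which is why the conversion only works in one direction.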
SOURCES OF DATA
There are two types of statistical data sources, namely the
primary data source and the secondary data source.
1. Primary data source – refers to data that come from
original sources and are collected first hand. These include
data from government agencies, business establishments,
organizations, and individuals who carry original data or
first-hand information relevant to a given problem.
2. Secondary data source – refers to data collected by
others for another purpose, which may include information
stored in books, the internet, brochures, journals, and
periodicals.
PRESENTATION OF DATA
The data gathered in research should be presented in a manner that is
easily understood by the audience or listeners. This presentation can be
textual, tabular, graphical, or a combination of these
methods.
1. Textual presentation – data are presented in paragraph form, written
and read, as a combination of text and numbers.
Example:
Of the 100 students interviewed, the following issues in library use were
noted: 25 for old books, 40 for unarranged books, 15 for unsuitable lighting,
and 20 for torn pages of books.
2. Tabular presentation – uses a statistical table, a systematic organization of data
in columns and rows.
Parts of a statistical table:
Table heading – consists of the table number and title. The table number serves to give
the table an identity.
Stubs – classifications or categories found at the left side of the body of the
table
Box head – the captions at the top of the columns
Body – main part of the table
Footnotes – any statement or note inserted at the foot or bottom of the table
Source notes – notes that acknowledge the origin
of the data.
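As an illustration, the library-use figures from the textual example (25, 40, 15, and 20 out of 100 students) can be arranged into a small table with a box head, stubs, and a body. This is only a sketch: the function name frequency_table, the column captions, and the added Percent column are assumptions, not part of the original example.

```python
def frequency_table(counts, stub_header="Issue"):
    """Build a plain-text table with a box head, stubs, a body, and a total row."""
    total = sum(counts.values())
    width = max(len(stub_header), len("Total"), *(len(label) for label in counts))
    # Box head: the captions at the top of the columns.
    lines = [f"{stub_header:<{width}}  {'Frequency':>9}  {'Percent':>7}"]
    # Stubs (left) and body (right): one row per category.
    for label, count in counts.items():
        percent = 100 * count / total
        lines.append(f"{label:<{width}}  {count:>9}  {percent:>7.1f}")
    lines.append(f"{'Total':<{width}}  {total:>9}  {100.0:>7.1f}")
    return "\n".join(lines)

# Counts taken from the textual presentation example in this chapter.
library_issues = {
    "Old books": 25,
    "Unarranged books": 40,
    "Unsuitable lighting": 15,
    "Torn pages": 20,
}

table = frequency_table(library_issues)
print(table)
```

A full statistical table would also carry a table number, a title, and a source note above and below this body.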
3. Graphical presentation – uses graphs (bar, line, pie or circle, and
pictograph) to present the data. A graph is perhaps the most attractive,
effective, and convincing way of presenting data.
a. Bar Graph – used to show comparison or relationship between groups.
b. Line Graph – most useful in displaying data that changes continuously
over time.
c. Pie or Circle Graph – shows percentages of data effectively.
d. Pictograph (Pictogram) – uses small identical pictures or figures of objects, called
isotypes, in making comparisons. Each picture represents a definite
quantity.
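The idea that each picture stands for a definite quantity can be sketched with plain text, using a repeated symbol in place of a drawn figure. The function name pictograph, the choice of an asterisk as the symbol, and the unit of 5 students per symbol are all assumptions made for illustration; the counts are the library-use figures from the textual example.

```python
def pictograph(counts, unit=5, symbol="*"):
    """Return one row per category; each symbol stands for `unit` items."""
    width = max(len(label) for label in counts)
    rows = []
    for label, count in counts.items():
        # Counts that are not exact multiples of `unit` are rounded.
        symbols = symbol * round(count / unit)
        rows.append(f"{label:<{width}}  {symbols}  ({count})")
    # The key tells the reader what one symbol is worth.
    rows.append(f"Key: each {symbol} = {unit} students")
    return "\n".join(rows)

# Counts taken from the textual presentation example in this chapter.
library_issues = {
    "Old books": 25,
    "Unarranged books": 40,
    "Unsuitable lighting": 15,
    "Torn pages": 20,
}

chart = pictograph(library_issues)
print(chart)
```

The same rows, drawn as bars of proportional length instead of repeated symbols, would give a bar graph of the data.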
DATA COLLECTION METHOD
There are five data collection methods that a researcher can use.
1. Direct or Interview Method. The term ‘interview’ comes, through French, from
Latin roots meaning “to see each other”. In general
terms, an interview is a formal meeting between an
interviewer and an interviewee in which questions are asked by the former
and answers are given by the latter. This method is considered the
most expensive way of collecting data because it needs more time
and money to conduct.
2. Indirect or Questionnaire Method. In the questionnaire method, it
is not possible for the researcher to conduct an
intensive or in-depth study of the feelings, reactions, and
sentiments of the respondents. This method is relatively simple
and inexpensive because it requires only a few staff to handle it.
The following are the types of questionnaires:
Computer questionnaire. Respondents are asked to answer a
questionnaire that is sent by e-mail. The advantages of computer
questionnaires include their low cost and time efficiency. Respondents
also do not feel pressured and can answer when they
have time, which tends to give more accurate answers. However, the main
shortcoming of computer questionnaires is that respondents sometimes
do not bother answering them and can simply ignore the
questionnaire.
Telephone questionnaire. The researcher may choose to call potential
respondents with the aim of getting them to answer the questionnaire.
The advantage of the telephone questionnaire is that it can be
completed in a short amount of time. The main disadvantage of
the telephone questionnaire is that it is expensive most of the time.
Moreover, most people do not feel comfortable answering many
questions over the phone, and it is difficult to get a sample
group to answer a questionnaire over the phone.
In-house survey. This type of questionnaire involves the researcher
visiting respondents in their houses or workplaces. The advantage of
an in-house survey is that respondents can give more focus to
the questions. However, in-house surveys also have a
range of disadvantages: they are time consuming and more
expensive, and respondents may not wish to have the researcher in
their houses or workplaces for various reasons.
Mail questionnaire. This sort of questionnaire involves the
researcher sending the questionnaire to respondents by post,
often with a pre-paid envelope attached. Mail questionnaires have the
advantage of providing more accurate answers, because respondents
can answer the questionnaire in their spare time. The disadvantages
of mail questionnaires are that they are expensive and
time consuming, and respondents sometimes simply throw them
away.
3. Registration Method. This method of collecting data is commonly
enforced by certain laws, ordinances, or standard practices. Examples
are the registration of births, marriages, and deaths,
licensing, and motor vehicle registration.
4. Observation Method. This method makes use of the different
human senses in gathering information.
5. Experimentation. This method is usually conducted in laboratories
where specimens are subjected to some aspects of control to find out
cause and effect relationships.
SAMPLING TECHNIQUES
In research, the researcher should be able to determine the sampling
technique that he or she will use. Various sampling techniques or
sample designs can be used by the researcher. The choice of sampling
technique depends on the nature of the problem and the kind of
population that will be used. If a sample isn't randomly selected, it will
probably be biased in some way and the data may not be
representative of the population.
A. Probability Sampling. Probability sampling is a sampling
technique in which every subject of the population has a known chance
(equal, in the simplest designs) of being selected for the sample.
1. Simple Random Sampling - Every member, and every possible set of members
of a given size, has an equal chance of being included in the sample. Technology,
random number generators, or some other sort of chance process is
needed to get a simple random sample. Usually, this is done by
getting a certain percentage of the population to be included in the
study.
Example — A teacher puts students' names in a hat and chooses
without looking to get a sample of students.
2. Stratified Random Sampling - The population is first split into
groups. The overall sample consists of some members from every
group. The members from each group are chosen randomly.
Example — A student council surveys 100 students by getting
random samples of 25 freshmen, 25 sophomores, 25 juniors, and 25
seniors.
3. Cluster Random Sampling - The population is first split into
groups. The overall sample consists of every member from some of the
groups. The groups are selected at random.
Example — An airline company wants to survey its customers one day,
so they randomly select 5 flights that day and survey every
passenger on those flights.
4. Systematic Random Sampling - Members of the population are
put in some order. A starting point is selected at random, and every
nth member is selected to be in the sample.
Example — A principal takes an alphabetized list of student names and
picks a random starting point. Every 20th student is selected to take a
survey.
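The four probability sampling techniques above can be sketched in Python with the standard random module. This is a minimal illustration under stated assumptions: the function names, the generated student and passenger labels, the group sizes, and the fixed seed are all hypothetical and chosen only to mirror the examples in the text.

```python
import random

def simple_random_sample(population, k, rng):
    """Every set of k members has the same chance of being chosen."""
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, rng):
    """Draw k members at random from every group (stratum)."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole groups at random and keep every member of those groups."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(ordered_population, step, rng):
    """Choose a random starting point, then take every `step`-th member."""
    start = rng.randrange(step)
    return ordered_population[start::step]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical ordered list of 200 students.
students = [f"student_{i:03d}" for i in range(1, 201)]
srs = simple_random_sample(students, 20, rng)

# Hypothetical year levels with 60 students each; 25 drawn from every level.
strata = {level: [f"{level}_{i}" for i in range(1, 61)]
          for level in ("freshman", "sophomore", "junior", "senior")}
stratified = stratified_sample(strata, 25, rng)

# Hypothetical day with 20 flights of 30 passengers; 5 whole flights surveyed.
flights = {f"flight_{f}": [f"flight_{f}_passenger_{p}" for p in range(1, 31)]
           for f in range(1, 21)}
cluster = cluster_sample(flights, 5, rng)

# Every 20th student from the ordered list.
systematic = systematic_sample(students, 20, rng)

print(len(srs), sum(len(v) for v in stratified.values()),
      len(cluster), len(systematic))
```

Note the contrast between the second and third functions: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.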
B. Nonprobability Sampling. Non-probability sampling is a
sampling method in which not all members of the population have
an equal chance of participating in the study.
1. Purposive Sampling - is a non-probability sampling method
that occurs when “elements selected for the sample are
chosen by the judgment of the researcher.” Also known as
judgmental, selective, or subjective sampling, purposive sampling
relies on the judgment of the researcher when it comes to selecting
the units (e.g., people, cases/organizations, events, pieces of data)
that are to be studied. The main
objective of a purposive sample is to produce a sample that can be
logically assumed to be representative of the population.
Example — A team of researchers wanted to understand what the
significance of white skin—whiteness—means to white people, so
they asked white people about this. This is a homogenous sample
created on the basis of race.
2. Convenience Sampling - The researcher chooses a sample that is
readily available in some non-random way. Accidental sampling is also
similar to convenience sampling.
Example — A researcher polls people as they walk by on the street.
3. Quota Sampling - is defined as a non-probability sampling method
in which researchers create a sample involving individuals that
represent a population. Researchers choose these individuals according
to specific traits or qualities. They decide and create quotas so that the
market research samples can be useful in collecting data.
Example — You could divide a population by the province they live in,
income or education level, or sex. The population is divided into groups
(also called strata) and samples are taken from each group to meet a
quota.
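Quota sampling can be sketched as follows: respondents are taken in whatever order the researcher meets them, with no random selection, until the quota for each group is filled. The function name quota_sample, the respondent names, and the quotas of two per sex are hypothetical values used only for illustration.

```python
def quota_sample(respondents, quotas, group_of):
    """Accept respondents in the order met until each group's quota is full."""
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    sample = []
    for person in respondents:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # every quota has been met
    return sample

# Hypothetical respondents as (name, sex), in the order they were approached.
walk_ins = [("Ana", "F"), ("Ben", "M"), ("Cara", "F"),
            ("Dina", "F"), ("Eli", "M"), ("Fe", "F")]

# Quota: two female and two male respondents.
selected = quota_sample(walk_ins, {"F": 2, "M": 2}, group_of=lambda p: p[1])
print(selected)
```

Because selection depends on who happens to be met first, members of the population do not have an equal chance of being included, which is what makes this a non-probability method.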