
Data Management

This document discusses data management and statistics. It defines key terms like statistics, variables, data, population and sample. It identifies statisticians who contributed to the development of statistics like Quetelet, Graunt, Gauss, Galton and Pearson. Fields that use statistical theories are also outlined, such as health, medicine, education and business. The uses and categories of statistics are described, with descriptive statistics focusing on data organization and presentation, and inferential statistics enabling analysis and predictions. The document differentiates between variables, populations, samples, and dependent and independent variables.

Uploaded by

Rachelle Sabatin
Copyright
© © All Rights Reserved

DATA

MANAGEMENT
CLASS ACTIVITY
TWO PICS ONE WORD

C O L L E C T I O N
___ ___ ___ ___ ___ ___ ___ ___ ___ ___
TWO PICS ONE WORD

A N A L Y S I S
___ ___ ___ ___ ___ ___ ___ ___
TWO PICS ONE WORD

P R E S E N T A T I O N
___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
TWO PICS ONE WORD

I N T E R P R E T A T I O N
__ __ __ __ __ __ __ __ __ __ __ __ __ __
FOUR PICS ONE WORD

S T A T I S T I C S
___ ___ ___ ___ ___ ___ ___ ___ ___ ___
STATISTICS
OBJECTIVES:
This section aims to:
1. define statistics;
2. enumerate different statisticians who contributed to
the development of statistics;
3. identify other fields which make use of statistical
theories and techniques; and
4. discuss the uses and categories of statistics.
DEFINITION OF TERMS
Statistics or Data Management
• is the science of collecting, organizing, presenting,
analyzing and interpreting numerical data.
• may also refer to the mere tabulation of numeric information, as in
reports of stock market transactions, or to the body of
techniques used in processing or analyzing data.
Biostatistics
• is a branch of applied statistics that concerns the
application of statistical methods to medicine and
biological problems.
Statistician
• A person who simply collects information or one who
prepares analysis or interpretations.
• A scholar who develops a mathematical theory on
which the science of statistics is based.
Data
• the raw material with which the statistician works. It can be
found through surveys, experiments, numerical records,
and other modes of research.
STATISTICIANS WHO CONTRIBUTED TO THE
DEVELOPMENT OF STATISTICS

The following statisticians contributed to the development of statistics:
• Adolphe Quetelet
• John Graunt
• Carl Gauss
• Sir Francis Galton and Karl Pearson
• Abraham De Moivre
• William S. Gosset
• Sir Ronald Fisher
ADOLPHE QUETELET

referred to as the Father
of Modern Statistics. He
established a commission
for statistics which
became a model for many
organizations of
statisticians.
JOHN GRAUNT

observed that
percentages of deaths
from suicides, accidents
and various diseases
CARL GAUSS

derived, from the study of
errors in repeated
measurements, what is now called the
Gaussian distribution.
SIR FRANCIS GALTON AND KARL PEARSON

–developed the
theory of regression
and correlation
ABRAHAM DE
MOIVRE

– discovered the
equation of the
normal distribution
WILLIAM S. GOSSET

developed the
small-sample theory
SIR RONALD FISHER

– further developed
the small-sample
theory in the 20th
century which had a
great impact on
contemporary
statistical procedures
SOME OF THE SUBJECT AREAS OR FIELDS WHICH MAKE
USE OF STATISTICAL THEORIES AND TECHNIQUES

The following are the subject areas or fields which make use of
statistical theories and techniques:
• Health – public health programs, hospitalization, problems of
medical care, occurrences and cost of diseases, accidents and
handicaps.
• Medicine – causes, diagnosis, treatment and prevention of
communicable and non-communicable diseases.
• Education – teaching-learning process, measurement and
evaluation, educational studies, enrollment, management and
finance.
• Engineering – design and test of performance and quality
control
• Biology – research and experimentation in life processes of
plants and animals to promote growth and prolong life.
• Sports – points made out of so many attempts from the field
or foul from the line such as in basketball, volleyball,
football, etc.
• Business – production, distribution, sale of merchandise,
auditing and accounting procedures.
• Automatic Data Processing – construction, operation and
use of high speed computing and data processing equipment.
• Social Sciences – social systems and social welfare, behavior
patterns of groups of people.
• Economics – production, resources, trade, labor force,
consumers' and producers' responses to products and price
changes, advertising systems and distribution.
USES OF STATISTICS

The following are the uses of statistics in general.

1. Statistics can give a precise description of data.
2. Statistics can predict the outcome of an experiment or
the behavior of an individual.
3. Statistics can be used to test a hypothesis.
CATEGORIES OF STATISTICS
1. Descriptive Statistics – is concerned with
collecting, organizing, presenting, and analyzing
numerical data. The statistician tries to describe a
situation.
• Masses of unorganized numerical data are of little
value unless statistical techniques are available to
organize this type of data and present them into a
meaningful form such as in the form of tables,
charts and graphs.
EXAMPLES OF DESCRIPTIVE STATISTICS:

• Overall average of a student's grades in his seven subjects.
• Median sale of computers for the month of January.
• 63% of those found to have diabetes were not aware that they had
the disease.
• Cigarettes were associated with 29% of the 4,470 civilian fire deaths
in 2012.
• The mean cholesterol level of patients with myocardial infarction.
• Frequency distribution of fasting serum cholesterol level (mg/dL) of
1000 male Medical students.
2. Inferential Statistics – also called Statistical Inference or
Inductive Statistics, is concerned with analyzing the
organized data leading to prediction or inferences.

• It implies that before carrying out an inference, appropriate
and correct descriptive measures or methods are employed
to bring out good results.
• The area of inferential statistics called hypothesis testing
is a decision-making process for evaluating claims about a
population, based on the information obtained from
samples.
EXAMPLES OF INFERENTIAL STATISTICS :

• Majority of the patients who died of lung and liver cancer are males.
• Carbon monoxide is one of the major pollutants of smog.
• Wearing seat belts increases the chance of survival in automobile
accidents.
• Drinking red wine may reduce the risk of heart diseases by 12%.
• Carrot juice may strengthen the lungs.
• Aspirin may lower the rate of heart attacks by 50%.
POPULATION AND SAMPLE

Population or the universe refers to the
collection of all traits under study or under
consideration. A small part of this big group is
called a sample.
POPULATION: PARAMETERS
• Population – refers to the groups or aggregates of people, objects, materials, events, or things of any form.
• The totality of all the samples.
• The measures of population are called “parameters”.
• Examples of population:
  • Scores of the entire students of secondary level
  • All children of any age who have older or younger siblings
KEY TERM: ENTIRE, ALL, GENERAL

SAMPLE: ESTIMATES
• Sample – is a subgroup of the population.
• Taken from the population so as to represent the population characteristics or traits.
• The measures of the samples are called “estimates” or “statistic”.
• Examples of sample:
  • Scores of students in a class
  • The 40 children who actually participated in one specific study about siblings.
  • 10 students who can manipulate the computer easily.
KEY TERM: CLASS, 40, 10, SPECIFIC
NEXT ….!?
VARIABLES AND
DATA
OBJECTIVES:

This section aims to:


1. differentiate the different types of variables;
2. discuss the types and scales of data; and
3. distinguish the types of variables and data
encountered in real-life situations.
VARIABLE

• The characteristic that is being studied is called a
variable.
• It varies across individuals or objects.
• It includes age, race, gender, intelligence, personality
type, attitudes, political or religious affiliation,
height, weight, marital status, eye color, etc.
TWO TYPES OF VARIABLES
1. Qualitative Variables
2. Quantitative Variables
A. Discrete Variable
B. Continuous Variable
1. QUALITATIVE VARIABLES
– represent differences in quality, character, or kind but not in
amount.

• Ex. Sex, birthplace or geographic locations, religious
preference, marital status, eye color, brand of computer
purchased, etc.
2. QUANTITATIVE VARIABLES

– numerical in nature and can be ordered or ranked.

• Ex. Weight, height, age, test scores, speed and body
temperatures, grades, etc.
A. DISCRETE VARIABLE
– is a variable whose values can be counted using integral
values.

• Ex. Number of enrollees, drop-outs, deaths, number of
students in a classroom, number of computers functioning,
number of mathematics subjects and number of calls
received.
B. CONTINUOUS VARIABLE
– is a variable that can assume any numerical value
over an interval or intervals. It can yield decimals or
fractions.

• Ex. Height, weight, temperature, time


TWO TYPES OF VARIABLES

Variable
  Quantitative
    Discrete
    Continuous
  Qualitative
DEPENDENT AND INDEPENDENT
VARIABLES

• Dependent Variable – the variable whose value is being predicted
• Independent Variable – the predictor
EXAMPLE 1: To predict the effect of sunlight on the growth of a
certain plant
Independent Variable:
• amount of sunlight the plant is exposed to
Dependent Variable:
• growth of the plant
EXAMPLE 2: To evaluate the effect of using a computer on the
performance of the students
Independent Variable:
• time spent using the computer
Dependent Variable:
• performance of the students
PRIMARY AND SECONDARY DATA
• Primary data refer to information that is gathered directly from
the original source or that is based on direct or first-hand
experience (e.g. autobiographies, diaries, etc.).
• Secondary data refer to information taken from published or
unpublished data previously gathered by other individuals or
agencies (e.g. books, magazines, newspapers, etc.).
SCALES OF MEASUREMENT OF DATA

1. Nominal Data
2. Ordinal Data
3. Interval Data
4. Ratio Data
NOMINAL DATA
– use numbers for the purpose of identifying membership in a group
or category.

• Examples:
1. electric consumption: (1) residential,
(2) commercial, (3) industrial, (4) government,
(5) others
2. gender of NEUST BSIT/BSN students: (1) male, (2) female
3. field of study: (1) BSN, (2) BIT, (3) BSChem, (4) ECET
ORDINAL DATA
– connote ranking or inequalities

• Examples:
1. grades (A, B, C, D, E)
2. socioeconomic status (low, medium, or high)
3. intelligence (above average, average, below average)
4. build of people (small, medium, large, extra large)
5. contest (first, second, third)
INTERVAL DATA
– does not only include “greater than” and “less than” relationships,
but also has a unit of measurement that permits us to describe how
much more or less one object possesses than another.
• No true zero, which means zero does not indicate the absence of the quantity.
Examples:
1. Fahrenheit temperature scale (25 °F is colder than 36 °F)
2. scores on a test as a measure of knowledge (a score of 5 is better
than a score of 0)
RATIO DATA
– similar to interval data, but has an absolute zero and multiples are
meaningful.

• Examples:
1. election votes
2. measurements of length, height, weight, area, volume, density,
velocity, money, etc.
3. average daily delivery of 1000 packages per day
4. ages of students enrolled in Statistics subject
5. pages in the calculus book for engineering students
PLAY WITH OPERATIONS

1 × 2 × 3 × 4 × 5 = 120
1 × 2 × 3 × 4 × 5 × 6 × 7 × 8 × 9 × 10 =
1 × 2 × 3 × 4 × … × 48 × 49 × 50 =
1+2+3+4+5=15
1+2+3+4+5+6+7+8+9+10=
1+2+3+4+5+…..18+19+20=
1+2+3+4+5+….+48+49+50=
1+2+3+4+5+….+98+99+100=
SHORTCUT OPERATION
FACTORIAL
5!=120
10!=3 628 800
50!=3.04E64
SUMMATION
∑ x, where x is from 1 to 5 = 15
∑ x, where x is from 1 to 10 = 55
∑ x, where x is from 1 to 50 = 1275
∑ x, where x is from 1 to 100 = 5050
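
The shortcut values above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `sum_1_to_n` and `sum_1_to_n_closed_form` are made up for this example, and the closed form n(n + 1)/2 is a standard identity that the slides do not state.

```python
import math


def sum_1_to_n(n):
    """Add the integers 1 + 2 + ... + n one term at a time."""
    return sum(range(1, n + 1))


def sum_1_to_n_closed_form(n):
    """Same sum using the identity n(n + 1)/2 (an addition, not from the slides)."""
    return n * (n + 1) // 2


# Factorials: n! is the product 1 x 2 x ... x n.
for n in (5, 10, 50):
    print(f"{n}! = {math.factorial(n):.3g}")

# Summations: both approaches should agree for every n.
for n in (5, 10, 20, 50, 100):
    print(f"sum of 1..{n} = {sum_1_to_n(n)} (closed form: {sum_1_to_n_closed_form(n)})")
```

The unanswered exercise 1 + 2 + … + 20 comes out as 210 with either helper.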
SUMMATION
OBJECTIVES:

This section aims to:


1. define summation;
2. enumerate the properties of summation; and
3. apply the properties of summation in solving.
SUMMATION
• – a special symbol for writing sums, denoted by ∑ and
defined as:

$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$, where n and i = 1 are called the upper and lower limits,
respectively.
Example 1.
Properties of Summation

1. $\sum_{i=1}^{n} c = cn$, where c is a constant

Examples:
Properties of Summation

2.
Examples:

• a.
• b. If
Properties of Summation

3.

• Examples:
a.
• b. If
MOVE TO OTHER PPP…

•DATA COLLECTION AND


PRESENTATION
MEASURES OF
CENTRAL TENDENCY
MEASURES OF CENTRAL TENDENCY
There are many ways of describing a given set of
data. A good number of descriptive measures exist in
statistics, and their use depends largely on the nature of the data
and the intended purpose of the description. One group of such
measures is the measures of central tendency, which are used to
see how a large set of raw data can be summarized
so that its meaningful essentials can be extracted.

The most commonly used measures of central tendency are
the mean, median, and mode.
MEAN ($\bar{x}$)
The most popular and useful measure of central tendency is
the arithmetic mean, which is simply referred to as the mean.
It is widely referred to in everyday usage as the average.

$\bar{x} = \frac{\sum x}{N}$

Where: $N$ = total number of measurements
$\bar{x}$ = represents the mean
$x$ = represents the individual scores

The symbol ∑ is the Greek letter sigma, which means "sum of".
In plain language, the arithmetic mean is obtained merely by
adding the individual scores and dividing the sum by the
number of scores.
PROPERTIES OF THE MEAN

• easy to compute
• easy to understand
• valuable as a statistical tool
• strongly influenced by extreme values; this is
particularly true when the number of cases is small
• cannot be computed when the distribution contains
open-ended intervals
USES OF MEAN
• for interval and ratio measurement
• when greatest sampling stability is desired
• when the distribution is symmetrical about the
center
• when we want to know the “center of gravity” of a
sample
EXAMPLE 1
The grades of Jake in five subjects are 85, 86, 88,
90, and 92. What is his mean grade?
Solution:

$\bar{x} = \frac{85 + 86 + 88 + 90 + 92}{5} = \frac{441}{5} = 88.2$

Therefore, the average grade of Jake in his five
subjects is 88.2.
EXAMPLE 2
BSIT 1Z has 10 officers in their Statistics class. Two got a grade of
1.5, three got 1.75, another three got 2.25, one got 2.75 and another
got 3.0 as their final grades. What is the mean grade of the student
officers?
Solution:

$\bar{x} = \frac{2(1.5) + 3(1.75) + 3(2.25) + 1(2.75) + 1(3.0)}{10} = \frac{20.75}{10} = 2.075$
Example 3: Determine the weighted mean if 500
bags were sold at P250.00 each, 350 bags at
P200.00 each, 200 bags at P150.00 each, 150
bags at P100.00 each and 50 bags at P80.00 each.
Solution:

[500 x 250] + [350 x 200] + [200 x 150] + [150 x 100] + [50 x 80]
---------------------------------------------------------------------------------
500 + 350 + 200 + 150 + 50

= P244,000.00 / 1250

= P195.20
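
The three mean examples above can be reproduced with a short Python sketch. The function names `mean` and `weighted_mean` are hypothetical helpers written for this illustration, not part of the original slides; the weighted mean treats the number of students or bags as the weight of each value.

```python
def mean(values):
    """Arithmetic mean: sum of the scores divided by the number of scores."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum of value x weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Example 1: Jake's grades in five subjects.
jake = mean([85, 86, 88, 90, 92])

# Example 2: grades of the 10 officers, weighted by how many officers got each grade.
officers = weighted_mean([1.5, 1.75, 2.25, 2.75, 3.0], [2, 3, 3, 1, 1])

# Example 3: bag prices in pesos, weighted by the number of bags sold at each price.
bags = weighted_mean([250, 200, 150, 100, 80], [500, 350, 200, 150, 50])

print(f"Jake's mean grade: {jake:.1f}")
print(f"Officers' mean grade: {officers:.3f}")
print(f"Weighted mean price per bag: P{bags:.2f}")
```

Example 2 is a weighted mean in disguise: listing all ten grades and calling `mean` on them would give the same result.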
MEAN FOR GROUPED DATA
• For grouped data, the formula for finding the mean is as
follows:

$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$

where: $f_i$ = i th frequency
$x_i$ = i th class mark
i = 1,…, k (k = number of classes)
n = total number of observations
EXAMPLE 1
Find the mean age of the patients in Hospital Y given in the table.

Age (in years)   f    x    fx
60-68            1    64   64
51-59            2    55   110
42-50            3    46   138
33-41            2    37   74
24-32            20   28   560
15-23            5    19   95
6-14             3    10   30
                 n = 36    ∑fx = 1,071

The mean age of the patients in Hospital Y is
$\bar{x} = \frac{\sum fx}{n} = \frac{1071}{36} = 29.75$
Therefore, the mean age of the patients in Hospital Y is 29.75 years
old.
EXAMPLE 2

Find the mean score of the BS Environmental Science students in
Statistics examination given in the table.

Scores   f     x    fx
91-95    19    93   1,767
86-90    21    88   1,848
81-85    25    83   2,075
76-80    39    78   3,042
71-75    35    73   2,555
66-70    26    68   1,768
         n = 165    ∑fx = 13,055

Solution:
$\bar{x} = \frac{\sum fx}{n} = \frac{13055}{165} \approx 79.12$

Therefore, the mean of the students’ scores is 79.12.
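
As a rough check of the two grouped-data examples, the sketch below recomputes the class marks and the mean from the class limits and frequencies. The names `class_mark` and `grouped_mean`, and the tuple layout (lower limit, upper limit, frequency), are assumptions made for this illustration.

```python
def class_mark(lower, upper):
    """Class mark x: the midpoint of the class limits."""
    return (lower + upper) / 2


def grouped_mean(classes):
    """Mean of grouped data: sum of f * x divided by n, the total frequency."""
    n = sum(f for _, _, f in classes)
    total_fx = sum(f * class_mark(lower, upper) for lower, upper, f in classes)
    return total_fx / n


# (lower limit, upper limit, frequency) for each class in the two tables.
hospital_y_ages = [(60, 68, 1), (51, 59, 2), (42, 50, 3), (33, 41, 2),
                   (24, 32, 20), (15, 23, 5), (6, 14, 3)]
exam_scores = [(91, 95, 19), (86, 90, 21), (81, 85, 25),
               (76, 80, 39), (71, 75, 35), (66, 70, 26)]

print(f"Mean age in Hospital Y: {grouped_mean(hospital_y_ages):.2f}")
print(f"Mean exam score: {grouped_mean(exam_scores):.2f}")
```

Because every observation in a class is replaced by its class mark, the result is an approximation of the mean of the raw data.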
MEDIAN
The median is the value of the middle item in an
ordered arrangement of data. In an ordered
distribution, half of the terms are located above the
median and half are below the median. The median
is a positional measure; hence, the values of the
individual items in a distribution do not affect the
median.
USES OF MEDIAN
• for ordinal or rank measurement
• when there is no sufficient time to compute the mean
• when the distribution is markedly skewed; this is true
when one or more very extreme measures are at one
end of the distribution.
• when we are interested in whether the cases fall within the
upper or lower halves of the distribution and not in how far
they are from the central point
• when an incomplete distribution is given
• when a typical score is desired
Example: Find the median salary of the seven employees
working in a small government department, whose annual salaries
(in thousand pesos) are as follows: 28, 60, 26, 32, 30, 26, 29.

Solution:
Arrange first the salaries in ascending/descending order.

26 26 28 29 30 32 60

60 32 30 29 28 26 26

The middle (fourth) value is 29. The median salary is therefore P29,000.00.
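
The same steps can be sketched in Python using the standard library's `statistics` module. The variable names are illustrative; the comparison with the mean is added here only to show why the median suits a skewed set like this one.

```python
import statistics

# Annual salaries in thousand pesos, as given in the example.
salaries = [28, 60, 26, 32, 30, 26, 29]

# Step 1: arrange the salaries in ascending order.
ordered = sorted(salaries)

# Step 2: with an odd number of items, the median is the middle one.
middle_value = ordered[len(ordered) // 2]

print("Ordered salaries:", ordered)
print("Middle value:", middle_value)
print("statistics.median:", statistics.median(salaries))

# The single extreme salary (60) pulls the mean up but leaves the median alone.
print("Mean salary:", statistics.mean(salaries))
```

Here the mean works out to 33 thousand pesos, higher than six of the seven salaries, while the median stays at 29.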


MEDIAN FOR GROUPED DATA
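• The formula for finding the median of grouped data is given by:

$Md = L_{Md} + \left( \frac{\frac{n}{2} - {<}cf_b}{f_{Md}} \right) i$

where: $L_{Md}$ = lower class boundary of the median class
$n$ = total number of observations
${<}cf_b$ = cumulative frequency below the median class
$i$ = class size
$f_{Md}$ = frequency of the median class

EXAMPLE 1

Find the median score of the BS Environmental Science
students in Statistics exam given in the table.

Scores   f     x    <cf
91-95    19    93   165
86-90    21    88   146
81-85    25    83   125
76-80    39    78   100
71-75    35    73   61
66-70    26    68   26

Solution:
Since $\frac{n}{2} = \frac{165}{2} = 82.5$, the median class is 76-80, the lowest class whose
cumulative frequency reaches 82.5. Its lower class boundary is 75.5, the
cumulative frequency below it is 26 + 35 = 61, and the class size is 5.

$Md = 75.5 + \left( \frac{82.5 - 61}{39} \right)(5) \approx 78.26$

Therefore, the median score is 78.26.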
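
A minimal Python sketch of the grouped median follows. It assumes integer class limits, so each class boundary lies 0.5 below the lower limit, and it takes the cumulative frequency below the median class as the total frequency of the classes that are lower in value (26 + 35 = 61 for this table). The function name `grouped_median` and the tuple layout are hypothetical.

```python
def grouped_median(classes):
    """Median of grouped data.

    classes: (lower limit, upper limit, frequency) tuples in any order.
    Assumes integer class limits and a common class size.
    """
    ordered = sorted(classes)            # lowest class first
    n = sum(f for _, _, f in ordered)    # total number of observations
    cf_below = 0                         # cumulative frequency of the lower classes
    for lower, upper, f in ordered:
        if cf_below + f >= n / 2:        # first class whose <cf reaches n/2
            boundary = lower - 0.5       # lower class boundary
            size = upper - lower + 1     # class size i
            return boundary + (n / 2 - cf_below) / f * size
        cf_below += f
    raise ValueError("no classes given")


exam_scores = [(91, 95, 19), (86, 90, 21), (81, 85, 25),
               (76, 80, 39), (71, 75, 35), (66, 70, 26)]

print(f"Median score: {grouped_median(exam_scores):.2f}")
```

With these conventions the sketch gives about 78.26 for the exam scores: 61 students scored below 75.5, so the 82.5th position lies 21.5 observations into the 39 of the 76-80 class.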
MODE
The mode is the simplest measure of central tendency. It
may be easily identified by merely looking at an ungrouped
set of scores and locating the score or item which occurs
most frequently.

A distribution with only one mode is said to be
unimodal while a distribution with two or more modes is
described as multimodal. A distribution which has two
modes is labeled as bimodal; with three modes, as
trimodal; and so on.
USES OF MODE
• for nominal or categorical data
• when the quickest value of central tendency is
desired
• when a very rough value of central tendency is
acceptable
• when we wish to know the most typical, or
most frequent case in the distribution
Example 1: The manager of a men’s store observes that
the ten pairs of trousers sold yesterday had the following
waist sizes (in inches): 31, 34, 36, 33, 28, 34, 30, 34, 32, 40.
Find the mode.

Solution:

31, 34, 36, 33, 28, 34, 30, 34, 32, 40

The mode of the waist sizes is 34 inches, and this fact is
undoubtedly of more interest to the manager than are the
facts that the mean waist size is 33.2 inches and the median
waist size is 33.5 inches.
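
The three figures quoted in this example can be verified with Python's standard `statistics` module. This is a small sketch, with variable names chosen for illustration; `statistics.multimode` (available from Python 3.8) returns every value tied for the highest count, which also covers bimodal or trimodal data.

```python
import statistics
from collections import Counter

# Waist sizes (in inches) of the ten pairs of trousers sold.
waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]

# Count how often each size occurs, most frequent first.
counts = Counter(waist_sizes).most_common()

modes = statistics.multimode(waist_sizes)
mean_size = statistics.mean(waist_sizes)
median_size = statistics.median(waist_sizes)

print("Most frequent sizes and counts:", counts[:3])
print("Mode(s):", modes)
print("Mean:", mean_size)
print("Median:", median_size)
```

Size 34 occurs three times and every other size once, so the list returned by `multimode` has a single entry and the distribution is unimodal.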
MODE FOR GROUPED DATA
• To find the mode of grouped data, the formula below is applied.

$Mo = L_{Mo} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) i$

where:

$L_{Mo}$ = lower class boundary of the modal class
$\Delta_1$ = difference between the frequency of the modal class and the
frequency of the next lower class
$\Delta_2$ = difference between the frequency of the modal class and the
frequency of the next higher class
$i$ = class size
EXAMPLE 1

Find the modal score of the BS Environmental Science students in
Statistics exam given in the table.

Scores   f
91-95    19
86-90    21
81-85    25
76-80    39
71-75    35
66-70    26

For the distribution in the table, the modal class is the class interval
76-80, since it contains the highest frequency. Its lower class boundary
is 75.5 and the class size is 5. Hence,

$\Delta_1 = 39 - 35 = 4$ and $\Delta_2 = 39 - 25 = 14$

$Mo = 75.5 + \left( \frac{4}{4 + 14} \right)(5) \approx 76.61$

Therefore, the modal score is 76.61.
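
The grouped mode can be sketched in Python as below. The sketch assumes integer class limits (boundary 0.5 below the lower limit) and a single modal class. It takes Δ1 against the next lower class (71-75, frequency 35) and Δ2 against the next higher class (81-85, frequency 25), the usual convention when the formula starts from the lower class boundary. The function name `grouped_mode` and the tuple layout are hypothetical.

```python
def grouped_mode(classes):
    """Mode of grouped data: L + (d1 / (d1 + d2)) * i.

    classes: (lower limit, upper limit, frequency) tuples in any order.
    Assumes integer class limits and a single modal class.
    """
    ordered = sorted(classes)                  # lowest class first
    freqs = [f for _, _, f in ordered]
    k = freqs.index(max(freqs))                # position of the modal class
    lower, upper, f_modal = ordered[k]

    # Neighbouring frequencies; a missing neighbour counts as zero.
    f_lower = freqs[k - 1] if k > 0 else 0
    f_higher = freqs[k + 1] if k < len(freqs) - 1 else 0

    d1 = f_modal - f_lower                     # difference with next lower class
    d2 = f_modal - f_higher                    # difference with next higher class
    boundary = lower - 0.5                     # lower class boundary
    size = upper - lower + 1                   # class size i
    return boundary + d1 / (d1 + d2) * size


exam_scores = [(91, 95, 19), (86, 90, 21), (81, 85, 25),
               (76, 80, 39), (71, 75, 35), (66, 70, 26)]

print(f"Modal score: {grouped_mode(exam_scores):.2f}")
```

Under this convention the sketch gives about 76.61: the estimate sits nearer the lower boundary because the neighbouring class 71-75 is more crowded than 81-85.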
