Basics of Business Statistics

1) The document discusses the basics of business statistics including definitions, major characteristics, why statistics are studied, and applications in business. 2) It covers descriptive statistics such as measures of central tendency and dispersion, inferential statistics including estimation and hypothesis testing, and statistical concepts used in finance, marketing, and operations management. 3) The document also addresses limitations of statistics as well as different types of statistical data including qualitative and quantitative data.

Uploaded by

Simran Tuteja
Copyright
© All Rights Reserved

Name of the course : Basics of Business Statistics

Course code :
Basics of Statistics

Definition:

Science of collection, presentation, analysis, and reasonable interpretation of data.

Statistics presents a rigorous scientific method for gaining insight into data. For example, suppose we measure the weight of 100 patients in a study. With so many measurements, simply looking at the data fails to provide an informative account.

However, statistics can give an instant overall picture of the data through graphical presentation or numerical summarization, irrespective of the number of data points.

Besides data summarization, another important task of statistics is to make inferences and predict relations between variables.
Major characteristics of statistics
• Statistics are aggregates of facts.
• Statistics are affected by a number of factors.
• Statistics must be reasonably accurate.
• Statistics must be collected in a systematic manner for a pre-determined purpose.
• Statistics should be placed in relation to each other. Data that are unrelated to each other are confusing and do not lead to any logical conclusions. Data should be comparable over time and over space.

Why study statistics?


1. Data are everywhere
2. Statistical techniques are used to make many decisions that affect our lives
3. No matter what your career, you will make professional decisions that involve data. An understanding of
statistical methods will help you make these decisions effectively
Managerial Application of Statistics
• Actuarial science - applies mathematical and statistical methods to assess risk in the insurance and finance
industries.
• Astrostatistics - applies statistical analysis to the understanding of astronomical data.
• Biostatistics - branch of biology that studies biological phenomena and observations by means of statistical
analysis and includes medical statistics.
• Business analytics - applies statistical methods to data sets (often very large) to develop new insights and
understanding of business performance & opportunities
• Demography - statistical study of all populations that can be applied to any kind of dynamic population, that
is, one that changes over time or space.
• Econometrics - branch of economics that applies statistical methods to the empirical study of economic
theories and relationships.
• Quality control - reviews the factors involved in manufacturing and production; it can make use of statistical
sampling of product items to aid decisions in process control or in accepting deliveries.
• Statistical finance - an area of econophysics that uses exemplars from statistical physics, with an emphasis on
emergent or collective properties of financial markets.
Classification of QT
• They can broadly be put under two groups:
1) Statistical Techniques: used in conducting a statistical inquiry concerning a certain phenomenon. They include all the statistical methods, from the collection of data through to the interpretation of the collected data: collection, classification, summarizing, analysing and interpretation of the data.
2) Programming Techniques: used by many decision makers in modern times. They were first designed to tackle defence and military problems and are now being used to solve business problems. They include a variety of techniques like linear programming, game theory, simulation, network analysis, queuing theory, and so on.

Statistical Techniques:
• Data classification, tabulation, presentation
• Measures of Central Tendency
• Measures of Dispersion
• Probability & probability distribution
• Sampling, Estimation
• Hypothesis Testing
• Correlation & Regression
• Time Series & Forecasting

Programming Techniques:
• Linear Programming
• Transportation
• Assignment
• Game Theory
• Simulation
• Network Techniques
• Sequencing
Applications of statistical concepts in the business world
• Finance – correlation and regression, index numbers, time series analysis
• Marketing – hypothesis testing, chi-square tests, nonparametric statistics
• Personnel – hypothesis testing, chi-square tests, nonparametric tests
• Operations management – hypothesis testing, estimation, analysis of variance, time series analysis

Limitations
• The inherent limitation concerning mathematical expressions
• High costs are involved in the use of quantitative techniques
• Statistics has no place where quantification is not possible; e.g., beauty, intelligence and
courage cannot be quantified
• Statistics reveal the average behaviour, the normal or the general trend.
• Statistics are not 100 per cent precise as is Mathematics or Accountancy
Taxonomy of Statistics
Types of statistics
• Descriptive statistics – Methods of organizing, summarizing, and presenting data in an informative way
• Inferential statistics – The methods used to determine something about a population on the basis of a
sample
• Population –The entire set of individuals or objects of interest or the measurements obtained from all
individuals or objects of interest
• Sample – A portion, or part, of the population of interest

Inferential Statistics
• Estimation
• e.g., Estimate the population mean weight using the sample mean
weight
• Hypothesis testing
• e.g., Test the claim that the population mean weight is 70 kg

Inference is the process of drawing conclusions or making decisions about a population based on sample
results
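As a minimal sketch of both ideas, the Python snippet below estimates a population mean from a sample and computes the one-sample t statistic for the claim that the mean weight is 70 kg. The weights are made-up illustrative values, and the variable names are assumptions of this sketch, not part of the course material.

```python
import math
import statistics

# Hypothetical sample of weights in kg (illustrative values only).
sample = [68, 72, 71, 69, 74, 70, 73, 67]
claimed_mean = 70  # the claim to be tested

n = len(sample)
sample_mean = statistics.mean(sample)   # estimation: point estimate of the population mean
sample_sd = statistics.stdev(sample)    # sample standard deviation (n - 1 in the denominator)

# Hypothesis testing: how many standard errors the sample mean lies from the claim.
standard_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - claimed_mean) / standard_error

print(f"sample mean = {sample_mean:.2f}, t = {t_statistic:.3f}")
```

A t statistic near zero means the sample is consistent with the claim. Judging significance still requires comparing it with a critical value from the t distribution with n - 1 degrees of freedom.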
Descriptive Statistics

• Collect data
• e.g., Survey

• Present data
• e.g., Tables and graphs

• Summarize data
• e.g., Sample mean $\bar{x} = \frac{\sum x_i}{n}$
Sampling
A sample should have the same characteristics as the population it is representing.
Sampling can be:
• with replacement: a member of the population may be chosen more than once (picking the candy from the
bowl)
• without replacement: a member of the population may be chosen only once (lottery ticket)
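
The difference between the two schemes can be sketched with Python's standard `random` module. The population of ten numbered members and the fixed seed are assumptions chosen only to make the illustration repeatable.

```python
import random

population = list(range(1, 11))  # hypothetical population of 10 numbered members
rng = random.Random(42)          # fixed seed so the sketch is repeatable

# With replacement: the same member may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each member can be drawn at most once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

`choices` may return duplicates, while `sample` never does, which mirrors the two definitions above.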

Sampling methods
Sampling methods can be:
• random (each member of the population has an equal chance of being selected)
• Nonrandom

The actual process of sampling causes sampling errors. For example, the sample may not be large
enough or representative of the population. Factors not related to the sampling process cause
nonsampling errors. A defective counting device can cause a nonsampling error.
Statistical data
• The collection of data that are relevant to the problem being studied is commonly the most difficult,
expensive, and time-consuming part of the entire research project.
• Statistical data are usually obtained by counting or measuring items.
  • Primary data are collected specifically for the analysis desired
  • Secondary data have already been compiled and are available for statistical analysis
• A variable is an item of interest that can take on many different numerical values.
• A constant has a fixed numerical value.

Variables
• A variable is a characteristic or condition that can change or take on different values.
• Most research begins with a general question about the relationship between two variables for a specific group of individuals.

Types of Variables
• Variables can be classified as discrete or continuous.
• Discrete variables (such as class size) consist of indivisible categories.
• Continuous variables (such as time or weight) are infinitely divisible into whatever units a researcher may choose. For example, time can be measured to the nearest minute, second, half-second, etc.
Data
Statistical data are usually obtained by counting or measuring items. Most data can be put into the following
categories:
• Qualitative - data are measurements that each fall into one of several categories (hair color, ethnic groups and
other attributes of the population)
• Quantitative - data are observations that are measured on a numerical scale (distance traveled to college,
number of children in a family, etc.)

Qualitative data
Qualitative data are generally described by words or letters. They are not as widely used as quantitative data because many numerical techniques do not apply to qualitative data. For example, it does not make sense to find an average hair color or blood type.
Qualitative data can be separated into two subgroups:
• dichotomic (if it takes the form of a word with two options, e.g. gender - male or female)
• polynomic (if it takes the form of a word with more than two options, e.g. education - primary school, secondary school and university)

Quantitative data
Quantitative data are always numbers and are the result of counting or measuring attributes of a population.
Quantitative data can be separated into two subgroups:
• discrete (if it is the result of counting, e.g. the number of students of a given ethnic group in a class, the number of books on a shelf, ...)
• continuous (if it is the result of measuring, e.g. distance traveled, weight of luggage, ...)
Types of variables

Variables
• Qualitative
  • Dichotomic: gender, marital status
  • Polynomic: brand of PC, hair color
• Quantitative
  • Discrete: children in family, strokes on a golf hole
  • Continuous: amount of income tax paid, weight of a student
Measuring Variables
• To establish relationships between variables, researchers must observe the variables and record their
observations. This requires that the variables be measured.
• The process of measuring a variable requires a set of categories called a scale of measurement and a process
that classifies each individual into one category.

4 Types of Measurement Scales

1. A nominal scale is an unordered set of categories identified only by name. Nominal measurements only permit you to determine whether two individuals are the same or different.
2. An ordinal scale is an ordered set of categories. Ordinal measurements tell you the direction of difference between two individuals.
3. An interval scale is an ordered series of equal-sized categories. Interval measurements identify the direction and magnitude of a difference. The zero point is located arbitrarily on an interval scale.
4. A ratio scale is an interval scale where a value of zero indicates none of the variable. Ratio measurements identify the direction and magnitude of differences and allow ratio comparisons of measurements.

Numerical presentation of qualitative data
• pivot table (qualitative dichotomic statistical attributes)
• contingency table (qualitative statistical attributes from which at least one of them is polynomic)

Frequency distributions – numerical presentation of quantitative data
• Frequency distribution – shows the frequency, or number of occurrences, in each of several categories. Frequency distributions are used to summarize large volumes of data values.
• When the raw data are measured on a quantitative scale, either interval or ratio, categories or classes must be designed for the data values before a frequency distribution can be formulated.

Steps for constructing a frequency distribution
1. Determine the number of classes
2. Determine the size of each class
3. Determine the starting point for the first class
4. Tally the number of values that occur in each class
5. Prepare a table of the distribution using actual counts and/or percentages (relative frequencies)
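
The five construction steps can be sketched in Python as follows. The function name `frequency_distribution`, the choice of equal-width integer classes starting at the minimum value, and the ages are all assumptions made for illustration; other class designs are equally valid.

```python
import math

def frequency_distribution(values, num_classes):
    """Return rows of (lower limit, upper limit, count, relative frequency)."""
    low, high = min(values), max(values)
    # Step 2: class size, rounded up so the classes cover the whole range.
    width = math.ceil((high - low + 1) / num_classes)
    rows = []
    for i in range(num_classes):
        lower = low + i * width        # Step 3: the first class starts at the minimum
        upper = lower + width - 1      # inclusive upper limit (integer data assumed)
        count = sum(lower <= v <= upper for v in values)   # Step 4: tally
        rows.append((lower, upper, count, count / len(values)))
    return rows

ages = [21, 25, 22, 34, 29, 41, 38, 27, 30, 45, 23, 36]   # made-up data
table = frequency_distribution(ages, num_classes=5)        # Step 1: five classes

# Step 5: print the table with counts and relative frequencies.
for lower, upper, count, rel in table:
    print(f"{lower}-{upper}: {count} ({rel:.0%})")
```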
Charts and graphs
• Frequency distributions are good ways to present the essential aspects of data collections in concise and
understandable terms
• Pictures are often more effective in displaying large data collections

Histogram

• Frequently used to graphically present interval and ratio data
• The adjacent bars indicate that a numerical range is being summarized by indicating the frequencies in arbitrarily chosen classes

[Figure: Example of histogram]
Frequency polygon

• Another common method for graphically presenting interval and ratio data
• To construct a frequency polygon, mark the frequencies on the vertical axis and the values of the variable being measured on the horizontal axis, as with the histogram.
• If the purpose of the presentation is comparison with other distributions, the frequency polygon provides a good summary of the data
Ogive
• A graph of a cumulative frequency distribution
• Ogive is used when one wants to determine how many observations
lie above or below a certain value in a distribution.
• First cumulative frequency distribution is constructed
• Cumulative frequencies are plotted at the upper class limit of each
category
• Ogive can also be constructed for a relative frequency distribution.
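
A minimal sketch of the points an ogive plots, assuming a hypothetical frequency distribution given as upper class limits and class frequencies (both invented for illustration):

```python
from itertools import accumulate

upper_limits = [25, 30, 35, 40, 45]   # hypothetical upper class limits
frequencies = [4, 3, 1, 2, 2]         # hypothetical class frequencies

# Running totals give the cumulative frequency distribution.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Dividing by the total gives the relative version of the same curve.
relative_cumulative = [c / total for c in cumulative]

# Each ogive point pairs an upper class limit with its cumulative frequency.
ogive_points = list(zip(upper_limits, cumulative))
print(ogive_points)
```

Reading the point at upper limit 30, for example, tells how many observations lie at or below 30; subtracting from the total tells how many lie above.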
Pie Chart
• Effective way of displaying the percentage breakdown of data by category.
• Useful if the relative sizes of the data components are to be emphasized
• Provides an effective way of presenting ratio-scaled or interval-scaled data after being organized into categories

Bar chart
• A common method for graphically presenting nominal and ordinal scaled data
• One bar represents the frequency for each category
• The bars are separated because the graph is used for nominal and ordinal data; the separation emphasizes the plotting of frequencies for distinct categories

Time Series Graph
• The time series graph is a graph of data that have been measured over time.
• The horizontal axis of this graph represents time periods and the vertical axis shows the numerical values corresponding to these time periods
Measures of Central Tendency
• Measures of central tendency are statistical measures which describe the position of a distribution.
• They are also called statistics of location, and are the complement of statistics of dispersion, which provide
information concerning the variance or distribution of observations.
• In the univariate context, the mean, median and mode are the most commonly used measures of central
tendency.
• They are computable values that describe the behavior of the center of a distribution.
• In addition to describing the shape of a distribution, we want to describe the data set’s central tendency
• A measure of central tendency represents the center or middle of the data
• Population mean (μ) is average of the population measurements
• Population parameter: a number calculated from all the population measurements that describes some aspect
of the population
• Sample statistic: a number calculated using the sample measurements that describes some aspect of the
sample
Descriptive Statistics
• Measures of Central Tendency
  • Mean
  • Median
  • Mode

• Measures of Dispersion or variation
  • Range
  • Quartile Deviation
  • Mean Deviation
  • Standard Deviation

• Measures of Symmetry
  • Skewness and Kurtosis
Measures of Central Tendency
• Mean, μ
• The average or expected value
• Population X1, X2, …, XN
• Sample x1, x2, …, xn

• Median, Md
• The value of the middle point of the ordered measurements

• Mode, Mo
• The most frequent value
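
The three measures can be computed with Python's standard `statistics` module. The data values are illustrative; `multimode` is used rather than `mode` so that a tie between most frequent values is visible.

```python
import statistics

measurements = [5, 3, 4, 1, 4, 3, 2]   # illustrative values

mean = statistics.mean(measurements)         # sum of the values divided by their count
median = statistics.median(measurements)     # middle point of the ordered measurements
modes = statistics.multimode(measurements)   # every value tied for most frequent

print(f"mean = {mean:.3f}, median = {median}, modes = {modes}")
```

Here two values share the highest frequency, so the data set has two modes.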
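The Mean

• For a population X1, X2, …, XN the mean is $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$; for a sample x1, x2, …, xn it is $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$.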
Arithmetic Mean Calculation Methods
Median Calculation of Median –
Discrete Series

Calculation of Median –
Continuous Series
The Mode
• Mode is the most frequent value or score in the distribution.
• It is defined as the value of the item that occurs most often in a series.
• It is denoted by the capital letter Z.
• The mode Mo of a population or sample of measurements is the measurement that occurs most frequently.
• It is the highest point of the frequency distribution curve.
• Modes are the values that are observed “most typically”
• Sometimes higher frequencies occur at two or more values
  • If there are two modes, the data is bimodal
  • If more than two modes, the data is multimodal
• When data are in classes, the class with the highest frequency is the modal class
  • The tallest box in the histogram
Relationships Among Mean, Median and Mode

• In symmetrical distributions, the median and mean are equal
• For normal distributions, mean = median = mode
• In positively skewed distributions, the mean is greater than the median
• In negatively skewed distributions, the mean is smaller than the median
• As per Karl Pearson's empirical relation: 3 Median = 2 Mean + Mode
Measures of Variation
• Knowing the measures of central tendency is not enough
• Both of the distributions below have identical measures of central tendency

• Absolute Measures of variation


• Range
• Quartile Deviation
• Mean Deviation
• Standard Deviation
• Relative Measures of Deviation
• Coefficient of Range
• Coefficient of Quartile Deviation
• Coefficient of Mean Deviation
• Coefficient of Variation
The Range

• Largest minus smallest


• Measures the interval spanned by all the data
• For a series – 5,3,4,1,4,3,2
• largest is 5 and smallest is 1
• Range is 5 – 1 = 4
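
The absolute measures listed earlier can be sketched for the same series with Python's `statistics` module. Quartile conventions differ between textbooks; this sketch assumes the default method of `statistics.quantiles`, so the quartile deviation may differ slightly from a hand calculation that uses another rule.

```python
import statistics

series = [5, 3, 4, 1, 4, 3, 2]   # the series used in the range example

data_range = max(series) - min(series)   # largest minus smallest

# Quartile deviation: half the distance between the third and first quartiles.
q1, _, q3 = statistics.quantiles(series, n=4)
quartile_deviation = (q3 - q1) / 2

# Mean deviation: average absolute deviation from the arithmetic mean.
mean = statistics.mean(series)
mean_deviation = sum(abs(x - mean) for x in series) / len(series)

# Standard deviation, treating the series as a whole population.
standard_deviation = statistics.pstdev(series)

print(data_range, quartile_deviation,
      round(mean_deviation, 3), round(standard_deviation, 3))
```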
Quartile Deviation
Mean Deviation

Mean Deviation - merits

Mean Deviation - demerits


Standard Deviation

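Variance

• The population variance is the average squared deviation from the mean: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$. The standard deviation σ is its positive square root.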
The Empirical Rule for Normal Populations

• If a population has mean µ and standard deviation σ and is described by a normal curve, then
  • 68.26% of the population measurements lie within one standard deviation of the mean: [µ-σ, µ+σ]
  • 95.44% lie within two standard deviations of the mean: [µ-2σ, µ+2σ]
  • 99.73% lie within three standard deviations of the mean: [µ-3σ, µ+3σ]
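
These percentages can be checked numerically with `statistics.NormalDist` from the Python standard library. The helper name `coverage` is an assumption of this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Share of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.2%}")
```

The computed shares agree with the figures above to within about 0.01 percentage points; the small gaps come from the rounding used in printed normal tables.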

Merits of Standard Deviation

Demerits of Standard Deviation

Relative measures of Dispersion

Coefficient of Range

Coefficient of Quartile Deviation

Coefficient of Mean Deviation
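Coefficient of Variation

• The coefficient of variation expresses the standard deviation as a percentage of the mean: $\text{CV} = \frac{\sigma}{\text{Mean}} \times 100$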
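As a hedged illustration of the coefficient of variation (standard deviation as a percentage of the mean), the sketch below compares two made-up series that have the same spread but different means. The function name and the data are assumptions.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

series_a = [10, 12, 11, 13, 14]        # hypothetical values, mean 12
series_b = [100, 102, 101, 103, 104]   # same spread, mean 102

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(f"CV of A = {cv_a:.2f}%, CV of B = {cv_b:.2f}%")
```

Both series have the same standard deviation, yet series A is relatively more variable because its mean is smaller. This is why a relative measure is preferred when comparing series with different levels or units.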
Coefficient of Variation – Practical Uses

Concept of Skewness

A distribution is said to be skewed when the mean, median and mode fall at different positions in the distribution and the balance (or center of gravity) is shifted to one side or the other, i.e. to the left or to the right.
Therefore, the concept of skewness helps us to understand the relationship between three measures:
• Mean.
• Median.
• Mode.
Symmetrical Distribution

• A frequency distribution is said to be symmetrical if the frequencies are equally distributed on both the sides
of central value.
• A symmetrical distribution may be either bell – shaped or U shaped.
• In symmetrical distribution, the values of mean, median and mode are equal i.e. Mean=Median=Mode
Skewed Distribution

• A frequency distribution is said to be skewed if the frequencies are not equally distributed on both the sides
of the central value.

• A skewed distribution may be-

• Positively Skewed
• Negatively Skewed
Skewed Distribution

• Negatively Skewed
  • In this, the distribution is skewed to the left (negative)
  • Here, Mode exceeds Mean and Median.
  • Mean < Median < Mode

• Positively Skewed
  • In this, the distribution is skewed to the right (positive)
  • Here, Mean exceeds Mode and Median.
  • Mode < Median < Mean
Tests of Skewness

In order to ascertain whether a distribution is skewed or not, the following tests may be applied. Skewness is present if:
•The values of mean, median and mode do not coincide.
•When the data are plotted on a graph they do not give the normal bell shaped form
i.e. when cut along a vertical line through the center the two halves are not equal.
•The sum of the positive deviations from the median is not equal to the sum of the
negative deviations.
•Quartiles are not equidistant from the median.
•Frequencies are not equally distributed at points of equal deviation from the mode.
Graphical Measures of Skewness
• Measures of skewness help us to know to what degree and in which direction (positive or
negative) the frequency distribution has a departure from symmetry.
• Positive or negative skewness can be detected graphically depending on whether the
right tail or the left tail is longer, but this gives no idea of the magnitude
• Hence some statistical measures are required to find the magnitude of lack of symmetry

[Figure: three frequency curves]
• Symmetrical: Mean = Median = Mode
• Skewed to the Left: Mean < Median < Mode
• Skewed to the Right: Mean > Median > Mode


Statistical Measures of Skewness

Absolute Measures of Skewness
Following are the absolute measures of skewness:
• Skewness (Sk) = Mean – Median
• Skewness (Sk) = Mean – Mode
• Skewness (Sk) = (Q3 - Q2) - (Q2 - Q1)

Relative Measures of Skewness
There are four measures of skewness:
• β and γ Coefficient of skewness
• Karl Pearson's Coefficient of skewness
• Bowley’s Coefficient of skewness
• Kelly’s Coefficient of skewness
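β and γ Coefficient of Skewness

• Based on the second and third moments about the mean: $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ and $\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}}$, so that $\gamma_1 = \pm\sqrt{\beta_1}$ with the sign of $\mu_3$.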
Karl Pearson's Coefficient of Skewness……01

• This method is most frequently used for measuring skewness. The formula for
measuring the coefficient of skewness is given by

$$SK_P = \frac{\text{Mean} - \text{Mode}}{\sigma}$$

Where,
SKP = Karl Pearson's Coefficient of skewness,
σ = standard deviation.

Normally, this coefficient of skewness lies between -3 and +3.


Karl Pearson's Coefficient of Skewness…..02
In case the mode is indeterminate, it is replaced by the empirical estimate Mode = 3 Median - 2 Mean, and the coefficient of skewness is:

$$SK_P = \frac{\text{Mean} - (3\,\text{Median} - 2\,\text{Mean})}{\sigma}$$

This formula simplifies to

$$SK_P = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$

The value of coefficient of skewness is zero, when the distribution is symmetrical.
The value of coefficient of skewness is positive, when the distribution is positively skewed.
The value of coefficient of skewness is negative, when the distribution is negatively skewed.
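
Both forms of Karl Pearson's coefficient can be sketched in Python. The data set is hypothetical and deliberately right-skewed, and the population standard deviation is assumed for σ.

```python
import statistics

data = [1, 2, 2, 3, 4, 7, 9]   # hypothetical, right-skewed values

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)       # well defined here: 2 occurs most often
sigma = statistics.pstdev(data)    # population standard deviation (assumption)

sk_mode = (mean - mode) / sigma            # form based on the mode
sk_median = 3 * (mean - median) / sigma    # form used when the mode is indeterminate

print(f"mode form = {sk_mode:.3f}, median form = {sk_median:.3f}")
```

Both values are positive, indicating positive skewness. The two forms need not give the same number, because Mode = 3 Median - 2 Mean is only an empirical approximation.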
Bowley’s Coefficient of Skewness……01

Bowley developed a measure of skewness, which is based on quartile values.
The formula for measuring skewness is:

$$SK_B = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1}$$

Where,
SKB = Bowley’s Coefficient of skewness,
Q1 = Quartile first
Q2 = Quartile second
Q3 = Quartile third
Bowley’s Coefficient of Skewness…..02

The above formula can be converted, since Q2 is the median, to

$$SK_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$

The value of coefficient of skewness is zero, if it is a symmetrical distribution.
If the value is greater than zero, it is positively skewed distribution.
And if the value is less than zero, it is negatively skewed distribution.
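
A minimal sketch of Bowley's coefficient, (Q3 + Q1 - 2 Median) / (Q3 - Q1), in Python. It assumes the default quartile method of `statistics.quantiles`; textbooks that locate quartiles differently may obtain slightly different values. The function name and data are illustrative.

```python
import statistics

def bowley_skewness(values):
    """Quartile-based skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

skewed = [1, 2, 2, 3, 4, 7, 9]        # hypothetical right-skewed data
symmetric = [1, 2, 3, 4, 5, 6, 7]     # hypothetical symmetric data

sk_skewed = bowley_skewness(skewed)
sk_symmetric = bowley_skewness(symmetric)
print(sk_skewed, sk_symmetric)
```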
Kelly’s Coefficient of Skewness…..01

Kelly developed another measure of skewness, which is based on percentiles and deciles.
The formula for measuring skewness is based on percentiles as follows:

$$SK_K = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$$

Where,
SKK = Kelly’s Coefficient of skewness,
P90 = Percentile Ninety.
P50 = Percentile Fifty.
P10 = Percentile Ten.
Kelly’s Coefficient of Skewness…..02

The formula for measuring skewness based on deciles is as follows:

$$SK_K = \frac{D_9 - 2D_5 + D_1}{D_9 - D_1}$$

Where,
SKK = Kelly’s Coefficient of skewness,
D9 = Deciles Nine.
D5 = Deciles Five.
D1 = Deciles one.
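
Kelly's decile form, (D9 - 2 D5 + D1) / (D9 - D1), can be sketched as below. The nineteen data values are invented, and the deciles come from the default method of `statistics.quantiles`, which is one of several conventions.

```python
import statistics

def kelly_skewness(values):
    """Decile-based skewness: (D9 - 2*D5 + D1) / (D9 - D1)."""
    deciles = statistics.quantiles(values, n=10)   # nine cut points: D1 ... D9
    d1, d5, d9 = deciles[0], deciles[4], deciles[8]
    return (d9 - 2 * d5 + d1) / (d9 - d1)

# Hypothetical data with a long right tail.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 25, 30, 40, 60]
sk_k = kelly_skewness(data)
print(round(sk_k, 4))
```

Because D1, D5 and D9 equal P10, P50 and P90, the same function also evaluates the percentile form.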
Example:

 
Moments:

• In Statistics, moments are used to indicate peculiarities of a frequency distribution.
• The utility of moments lies in the sense that they indicate different aspects of a given distribution.
• Thus, by using moments, we can measure the central tendency of a series, dispersion or variability, skewness
and the peakedness of the curve.
• The moments about the actual arithmetic mean are denoted by μ.
• The first four moments about the mean, or central moments, are the following:
Moments:

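Moments around Mean: $\mu_r = \frac{\sum (x - \bar{x})^r}{N}$

Moments around any Arbitrary No A: $\mu'_r = \frac{\sum (x - A)^r}{N}$

Conversion formula for Moments
• 1st moment (Mean): $\mu_1 = \mu'_1 - \mu'_1 = 0$
• 2nd moment (Variance): $\mu_2 = \mu'_2 - (\mu'_1)^2$
• 3rd moment (Skewness): $\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$
• 4th moment (Kurtosis): $\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$

Two important constants calculated from μ2, μ3 and μ4 are:
• β1 (read as beta one): $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$
• β2 (read as beta two): $\beta_2 = \frac{\mu_4}{\mu_2^2}$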
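
A hedged sketch of how the central moments and the two constants can be computed, using the standard definitions: the r-th central moment is the average of (x - mean)^r, β1 = μ3² / μ2³ and β2 = μ4 / μ2². The function name and the data are assumptions for illustration.

```python
import statistics

def central_moment(values, r):
    """r-th moment about the arithmetic mean: average of (x - mean) ** r."""
    mean = statistics.mean(values)
    return sum((x - mean) ** r for x in values) / len(values)

data = [1, 2, 3, 4, 5]   # hypothetical symmetric data

mu2 = central_moment(data, 2)   # variance
mu3 = central_moment(data, 3)   # zero for symmetric data
mu4 = central_moment(data, 4)

beta1 = mu3 ** 2 / mu2 ** 3     # skewness constant
beta2 = mu4 / mu2 ** 2          # kurtosis constant (equals 3 for a normal curve)

print(mu2, mu3, mu4, beta1, beta2)
```

For this data β1 is zero, reflecting symmetry, and β2 is below 3, which corresponds to a flatter-than-normal (platykurtic) shape.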
Kurtosis
•Kurtosis is another measure of the shape of a frequency curve. It is a Greek word,
which means bulginess.

•While skewness signifies the extent of asymmetry, kurtosis measures the degree
of peakedness of a frequency distribution.

•Karl Pearson classified curves into three types on the basis of the shape of their
peaks. These are:-
•Leptokurtic
•Mesokurtic
•Platykurtic
Kurtosis

• When the peak of a curve becomes relatively high then that curve is called Leptokurtic.

• When the curve is flat-topped, then it is called Platykurtic.

• Since normal curve is neither very peaked nor very flat topped, so it is taken as a basis for comparison.

• This normal curve is called Mesokurtic.
Measure of Kurtosis

• There are two measures of Kurtosis:

• Karl Pearson’s Measures of Kurtosis

• Kelly’s Measure of Kurtosis


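Karl Pearson’s Measures of Kurtosis

Formula:
• $\beta_2 = \frac{\mu_4}{\mu_2^2}$ and $\gamma_2 = \beta_2 - 3$

Result:
• If β2 > 3 (γ2 > 0), the curve is Leptokurtic
• If β2 = 3 (γ2 = 0), the curve is Mesokurtic
• If β2 < 3 (γ2 < 0), the curve is Platykurtic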
Kelly’s Measure of Kurtosis

Formula Result:

•  • 
Example:

•  
