Statistical Analysis

Statistical analysis provides a framework for organizing, analyzing data, and examining business problems logically and systematically. There are two broad categories of statistical methods - descriptive statistics and inferential statistics. Descriptive statistics summarize data through measures of central tendency and variability, while inferential statistics allow inferences about populations based on sample data. Statistics helps managers analyze past performance, predict the future, and make effective decisions by describing markets, informing advertising, setting prices, and responding to changes in demand.

Uploaded by

Ammar Hassan
Copyright
© All Rights Reserved

Statistical analysis provides a framework for organizing data, analyzing data, and examining business problems in a logical and systematic way. Statistical methods may be broken into two broad categories: methods of description and methods of inference. ...

How does statistics help a manager?
Statistical research in business enables managers to analyze past performance, predict future business practices and lead organizations effectively. Statistics can describe markets, inform advertising, set prices and respond to changes in consumer demand.

Descriptive statistics are brief descriptive coefficients that summarize a given data set, which can be either a representation of the entire population or a sample of it. Descriptive statistics are broken down into measures of central tendency and measures of variability (spread). Measures of central tendency include the mean, median and mode, while measures of variability include standard deviation, variance, and the minimum and maximum values, along with kurtosis and skewness, which describe the shape of the distribution.

There are four major types of descriptive statistics:
 Measures of frequency: count, percent, frequency.
 Measures of central tendency: mean, median, and mode.
 Measures of dispersion or variation: range, variance, standard deviation.
 Measures of position: percentile ranks, quartile ranks.

Statistical inference is the process of using data analysis to infer properties of
an underlying distribution of probability.[1] Inferential statistical analysis infers
properties of a population, for example by testing hypotheses and deriving
estimates. It is assumed that the observed data set is sampled from a larger
population.
Inferential statistics can be contrasted with descriptive statistics. Descriptive
statistics is solely concerned with properties of the observed data, and it does not
rest on the assumption that the data come from a larger population. In machine
learning, the term inference is sometimes used instead to mean "make a
prediction, by evaluating an already trained model";[2] in this context inferring
properties of the model is referred to as training or learning (rather
than inference), and using a model for prediction is referred to
as inference (instead of prediction); see also predictive inference

Application of Statistics in Business


Posted by: Dilip D | Business Analytics

The field of statistics has numerous applications in business. Because of technological advancements, large amounts of data are generated by businesses these days. These data are now being used to make decisions, and better decisions help us improve the running of a department, a company, or the entire economy.

“Statistics is extensively used to enhance Business performance through Analytics”



Marketing
As per Philip Kotler and Gary Armstrong, marketing “identifies customer needs and wants, determines which target markets the organisation can serve best, and designs appropriate products, services and programs to serve these markets”.

Marketing is all about creating and growing customers profitably, and statistics is used in almost every aspect of it. Statistics is extensively used in making decisions regarding how to sell products to customers. Intelligent use of statistics also helps managers to design marketing campaigns targeted at potential customers. Marketing research is the systematic and objective gathering, recording and analysis of data about aspects related to marketing. IMRB International, TNS India, RNB Research, Nielsen, Hansa Research and Ipsos Indica Research are some of the popular market research companies in India. Web analytics is about tracking the online behaviour of potential customers and studying how visitors browse various websites.

Use of Statistics is indispensable in forecasting sales, market share and demand for various types
of Industrial products.

Factor analysis, conjoint analysis and multidimensional scaling are invaluable tools which are
based on statistical concepts, for designing of products and services based on customer response.

Finance
Uncertainty is the hallmark of the financial world. All financial decisions are based on “Expectation”, which is best analysed with the help of the theory of probability and statistical techniques. Probability and statistics are used extensively in designing new insurance policies and in fixing their premiums. Statistical tools and techniques are used for analysing and quantifying risk, for valuing derivative instruments, and for comparing the return on investment in two or more instruments or companies.

The beta of a stock or equity is a statistical tool for comparing volatility, and is highly useful for selecting a portfolio of stocks.

The most sophisticated traders in today’s stock markets are those who trade in “derivatives”, i.e. financial instruments whose price depends on the price of some other, underlying asset.

Economics
Statistical data and methods render valuable assistance in the proper understanding of economic problems and the formulation of economic policies. Most economic phenomena and indicators can be quantified and dealt with using statistically sound logic.

In fact, Statistics became so integrated with Economics that it led to the development of a new subject called Econometrics, which deals with economic issues through the use of Statistics.
Operations
The field of operations is about transforming various resources into products and services in the place, quantity, cost, quality and time required by the customers. Statistics plays a very useful role at the input stage through sampling inspection and inventory management, in the process stage through statistical quality control and the Six Sigma method, and in the output stage through sampling inspection. The term Six Sigma quality refers to a situation where there are only 3.4 defects per million opportunities.

Human Resource Management or Development
Human Resource departments are, inter alia, entrusted with the responsibility of evaluating performance, developing rating systems, evolving compensatory reward and training systems, etc. All these functions involve designing forms and collecting, storing, retrieving and analysing a mass of data, and they can be performed efficiently and effectively with the help of statistics.

Information Systems
Information Technology (IT) and statistics share a similar systematic approach to problem solving. IT uses statistics in various areas, such as optimising server time and assessing the performance of a program by finding the time taken as well as the resources used by the program. It is also used in software testing.

Data Mining
Data Mining is used in almost all fields of business.

In Marketing, data mining can be used for market analysis and management, target marketing, CRM, market basket analysis, cross selling, market segmentation, customer profiling and managing web-based marketing, etc.

In Risk analysis and management, it is used for forecasting, customer retention, quality control, competitive analysis and detection of unusual patterns.

In Finance, it is used in corporate planning and risk evaluation, financial planning and asset evaluation, cash flow analysis and prediction, contingent claim analysis to evaluate assets, cross-sectional and time series analysis, customer credit rating, and detection of money laundering and other financial crimes.

In Operations, it is used for resource planning, and for summarising and comparing resources and spending.

In the Retail industry, it is used to identify customer behaviours, patterns and trends, and also for designing more effective goods transportation and distribution policies, etc.

A frequency distribution is a representation, either in a graphical or tabular format, that displays the number of observations within a given interval. The interval size depends on the data being analyzed and the goals of the analyst. The intervals must be mutually exclusive and exhaustive. Frequency distributions are typically used within a statistical context. Generally, frequency distribution can be associated with the charting of a normal distribution.
Uses of Frequency Distribution
 It is quite useful for data analysis.
 It assists in estimating the frequencies of the population on the basis of the sample.
 It facilitates the computation of different statistical measures.
Frequency Distribution Table
Frequency distribution table (also known as frequency table) consists of various
components.
Classes: A large number of observations varying in a wide range are usually classified
in several groups according to the size of their values. Each of these groups is defined
by an interval called class interval. The class interval between 10 and 20 is defined as
10-20.
Class limits: The smallest and largest possible values in each class of a frequency
distribution table are known as class limits. For the class 10-20, the class limits are 10
and 20. 10 is called the lower class limit and 20 is called the upper class limit.
Class mark: The class mark is the midmost value of the class interval. It is also known as the mid value. Mid value of each class = (lower limit + upper limit)/2.
Magnitude of a class interval: The difference between the upper and lower limit of a
class is called the magnitude of a class interval.
Class frequency: The number of observations falling within a class interval is called the class frequency of that class interval.
Relative Frequency Distribution
It is a distribution in which relative frequencies are listed against each class interval. The relative frequency of a class is obtained by dividing its frequency by the total frequency; it is the proportion of the total frequency that falls in that class interval.
Cumulative Frequency Distribution
One of the important types of frequency distribution is the cumulative frequency distribution, in which the frequencies are shown in a cumulative manner. The cumulative frequency for each class interval is the frequency for that class interval added to the preceding cumulative total. Cumulative frequency can also be defined as the sum of all frequencies up to and including the current class.
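The frequency, relative frequency and cumulative frequency columns described above can be sketched in a few lines of Python. This is only an illustration: the scores are hypothetical, and the convention that each class includes its lower limit but excludes its upper limit is an assumption, since the text does not say which class a boundary value such as 20 belongs to.

```python
from collections import Counter


def frequency_table(values, lower, width, n_classes):
    """Build a grouped frequency table with relative and cumulative columns.

    Assumption: a class [lo, hi) includes lo and excludes hi.
    """
    counts = Counter()
    for v in values:
        idx = int((v - lower) // width)
        if 0 <= idx < n_classes:
            counts[idx] += 1
    total = sum(counts.values())

    rows = []
    running = 0
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width
        f = counts[i]
        running += f  # cumulative frequency up to and including this class
        rows.append({
            "class": (lo, hi),
            "frequency": f,
            "relative": f / total,
            "cumulative": running,
        })
    return rows


# Hypothetical scores, not taken from the source document.
scores = [12, 15, 21, 24, 27, 33, 35, 38, 41, 47]
table = frequency_table(scores, lower=10, width=10, n_classes=4)
for row in table:
    print(row)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on any hand-built table.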
Simple Frequency Distribution
Simple frequency distribution is used to organize larger data sets in an orderly fashion. When there are many cases to be studied, it is a good idea to tabulate them rather than work from a lengthy raw list. A simple frequency distribution shows the number of times each score occurs in a set of data. To find the frequency for a score, count how many times the score occurs.
Grouped Frequency Distribution
A grouped frequency distribution is an ordered listing of a variable X into groups in one column, with the frequency of each group listed in a second column, called the frequency column. In other words, it is an arrangement of class intervals and corresponding frequencies in a table.
Ungrouped Frequency Distribution
A frequency distribution with an interval width of 1 is called an ungrouped frequency distribution. It is an arrangement of the observed values in ascending order. Ungrouped data are data that are not arranged in groups; they are known as individual series.
Mean of Frequency Distribution
Mean of frequency distribution can be found by multiplying each midpoint by its frequency, adding up these products, and then dividing by the total number of values in the frequency distribution.
Mean = ∑(f × x) / n
where, f = frequency in each class
x = midpoint (mid value) of each class
n = sum of the frequencies.
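As a small worked sketch of the formula Mean = ∑(f × x) / n, the function below takes class intervals and their frequencies and uses the class midpoints as x. The classes and frequencies are hypothetical values chosen for illustration.

```python
def grouped_mean(classes, frequencies):
    """Mean of a grouped frequency distribution: sum(f * x) / n,
    where x is the midpoint of each class and n is the total frequency."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


# Hypothetical classes and frequencies.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [2, 3, 3, 2]

mean = grouped_mean(classes, frequencies)
print(mean)  # midpoints 15, 25, 35, 45 give 300 / 10 = 30.0
```

Because every observation in a class is represented by the class midpoint, the result is an approximation to the mean of the raw data.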

Graphical representation refers to the use of charts and graphs to visually display, analyze, clarify, and interpret numerical data, functions, and other qualitative structures.
A pie chart is a type of graph that displays data in a circular graph. The pieces of the graph are proportional to the fraction of the whole in each category. In other words, each slice of the pie is relative to the size of that category in the group as a whole. The entire “pie” represents 100 percent of a whole, while the pie “slices” represent portions of the whole.
Frequency bar graph
A bar graph is a graph that displays a bar for each category, with the length of each bar indicating the frequency of that category. To construct a bar graph, we need to draw a vertical axis and a horizontal axis.

Frequency histogram
A frequency histogram is a graph that uses vertical columns to show frequencies (how many times each score occurs), with no gaps between the bars.
Another type of graph that can be drawn to represent the same set of data as a histogram is a frequency polygon. A frequency polygon is a graph constructed by using lines to join the midpoints of each interval, or bin. The heights of the points represent the frequencies.

In statistics, an ogive, also known as a cumulative frequency polygon, can refer to one of two things: any hand-drawn graphic of a cumulative distribution function, or any empirical cumulative distribution function.
Measures of central tendency
The mean (average) of a data set is found by adding all numbers in the data set and then dividing by the number of values in the set. The median is the middle value when a data set is ordered from least to greatest. The mode is the number that occurs most often in a data set.
The arithmetic mean is the simplest and most widely used measure of a mean, or average. It simply involves taking the sum of a group of numbers, then dividing that sum by the count of the numbers used in the series. For example, take the numbers 34, 44, 56, and 78. The sum is 212, and dividing by the count of 4 gives an arithmetic mean of 53.

The geometric mean is calculated by raising the product of a series of numbers to the inverse of the total length of the series, that is, by taking the n-th root of the product of n numbers. The geometric mean is most useful when numbers in the series are not independent of each other or if numbers tend to make large fluctuations.
The harmonic mean is a type of numerical average. It is calculated by dividing the number of observations by the sum of the reciprocals of the numbers in the series. Thus, the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals.
For example, consider 2, 3, 5, 7, and 60 with number of observations as 5: the harmonic mean is 5 / (1/2 + 1/3 + 1/5 + 1/7 + 1/60) ≈ 4.19.
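The three means can be compared side by side with Python's standard `statistics` module. The arithmetic and harmonic inputs are the numbers used in the text; the pair used for the geometric mean is a hypothetical example, since the text gives none.

```python
import statistics

# Arithmetic mean of the numbers from the text: sum / count.
arithmetic = statistics.mean([34, 44, 56, 78])

# Geometric mean: n-th root of the product (hypothetical data: sqrt(4 * 9)).
geometric = statistics.geometric_mean([4, 9])

# Harmonic mean of the numbers from the text: n / sum of reciprocals.
values = [2, 3, 5, 7, 60]
harmonic = statistics.harmonic_mean(values)
harmonic_by_hand = len(values) / sum(1 / x for x in values)

print(arithmetic)            # 53
print(round(geometric, 6))   # 6.0
print(round(harmonic, 2))    # 4.19
```

Note how the single large value 60 pulls the arithmetic mean of 2, 3, 5, 7, 60 up to 15.4, while the harmonic mean stays close to the smaller values.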

The range is the difference between the largest and the smallest observation in the data. The prime advantage of this measure of dispersion is that it is easy to calculate. On the other hand, it has a lot of disadvantages: it is very sensitive to outliers and does not use all the observations in a data set.
The average absolute deviation, or mean absolute deviation (MAD), of a data set is
the average of the absolute deviations from a central point. It is a summary
statistic of statistical dispersion or variability. In the general form, the central point can
be a mean, median, mode, or the result of any other measure of central tendency or any
random data point related to the given data set. The absolute values of the differences
between the data points and their central tendency are totaled and divided by the
number of data points.
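A minimal sketch of the mean absolute deviation follows. The data are hypothetical, and the central point defaults to the arithmetic mean, although, as noted above, the median or another measure could be passed in instead.

```python
import statistics


def mean_absolute_deviation(data, center=None):
    """Average of the absolute deviations from a central point.

    If no central point is given, the arithmetic mean is used.
    """
    if center is None:
        center = statistics.mean(data)
    return sum(abs(x - center) for x in data) / len(data)


# Hypothetical data set.
data = [2, 4, 6, 8, 10]

mad_about_mean = mean_absolute_deviation(data)
mad_about_median = mean_absolute_deviation(data, statistics.median(data))
print(mad_about_mean)  # deviations 4, 2, 0, 2, 4 sum to 12, so 12 / 5 = 2.4
```

For this symmetric data set the mean and median coincide, so both choices of central point give the same result.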

Absolute Measures of Dispersion:


1. Range: The simplest and the easiest method of measuring dispersion of the values of a variable is the Range. It is measured just as the difference between the highest and the lowest values of a variable. The extent of dispersion increases as the divergence between the highest and the lowest values of the variable increases.
We thus express the magnitude of Range as:
Range = (highest value – lowest value) of the variable.

For determining the Range of a variable, it helps to arrange the values in increasing order. This makes the highest and lowest values easy to read off and helps avoid mistakes in calculation.

Let us consider two separate examples below considering both the grouped and the
ungrouped data separately.

Example 1:
Consider the following series of numbers:
1, 2, 4, 6, 8, 10, 12.

Solution:
Here, the highest value of the series is 12 and the lowest is 1.

Therefore, the Range = 12 – 1 = 11 i.e. the values of the variable are scattered within 11
units.

Example 2:
For grouped data presented with their respective frequencies, the Range is measured as the difference between the mid-values of the two marginal (lowest and highest) classes.
Consider the following table:

The required Range is 54.5 – 4.5 = 50, i.e. the observations on the variable are found scattered within 50 units.

It is to be noted that any change in marginal values or the classes of the variable in the series given will change both the absolute and the percentage values of the Range.

At times we need the relative value of the Range rather than its absolute value, and there we use the formula below:
Relative value of the Range = (Highest value – Lowest value)/(Highest value + Lowest value)

In our first example the relative value of the Range is (12 – 1)/(12 + 1) = 11/13 ≈ 0.85.
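Both forms of the Range can be checked with a short sketch. The series is the one from Example 1; the function names are illustrative only.

```python
def value_range(data):
    """Absolute Range: highest value minus lowest value."""
    return max(data) - min(data)


def relative_range(data):
    """Relative Range: (highest - lowest) / (highest + lowest)."""
    highest, lowest = max(data), min(data)
    return (highest - lowest) / (highest + lowest)


series = [1, 2, 4, 6, 8, 10, 12]  # series from Example 1

print(value_range(series))               # 12 - 1 = 11
print(round(relative_range(series), 3))  # 11 / 13, about 0.846
```

Unlike the absolute Range, the relative value has no units, so it can be compared across series measured on different scales.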

The ‘Range’, as a measure of ‘Dispersion’, has a number of advantages and disadvantages.

The concept of Range is, no doubt, simple and easy enough to calculate, especially when the observations are arranged in increasing order. But the main disadvantage is that it is calculated only on the basis of the highest and the lowest values of the variable, without giving any importance to the other values.

Therefore, the result is influenced only by changes in those two values, not by any other value of the variable. Again, in the case of a complex distribution of a variable with respective frequencies, it is not easy to calculate the value of Range correctly in the above way.

For all these reasons, Range, as a measure of the variability of the values of a variable, is not widely accepted or readily prescribed by the statisticians of today. However, it is not totally rejected even today, as it retains certain traditional uses: the weather department represents temperature variations in a day by regularly recording the maximum and the minimum values, and regulators refer to it when imposing price-control and rationing measures against wide fluctuations in the market prices of essential goods and services, mainly to protect the interests of both buyers and sellers.

2. Quartile Deviation: Quartile Deviation is another useful device for measuring the degree of variability of a variable, and an improved one in the sense that it depends on the middle portion of the data rather than on the two extreme values alone. Here the ordered observations are divided into four equal parts, with the quartiles Q1, Q2 and Q3 as the dividing points. Half of the difference between the third and the first quartiles is termed the Quartile Deviation.
Symbolically, we write:
Quartile Deviation = (Q3 – Q1)/2
Through this measure it is ensured that the central 50% of the observations on the variable are reflected in the calculation, and with this method the absolute value of the Quartile Deviation can easily be measured.

For determining the proportionate Quartile Deviation, also called the Coefficient of Quartile Deviation, we use the following formula:
Coefficient of Quartile Deviation = (Q3 – Q1)/(Q3 + Q1)

Consider the following examples:


Example 1:
Calculate the Quartile Deviation and Co-efficient of Quartile Deviation from
the following data:
8, 10, 12, 14, 16, 18, 20.

Solution:
Here, n = 7, the first and third quartiles are:
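The worked values for this example are not reproduced in this copy. As a rough sketch, assuming the common rule that places the k-th quartile at position k(n + 1)/4 in the ordered data (the default "exclusive" method of Python's `statistics.quantiles`), the calculation could look as follows. Other quartile conventions can give slightly different values.

```python
import statistics

data = [8, 10, 12, 14, 16, 18, 20]  # series from Example 1, already ordered

# Assumption: quartiles at positions k * (n + 1) / 4, which is what the
# default method ("exclusive") of statistics.quantiles implements.
q1, q2, q3 = statistics.quantiles(data, n=4)

quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(q1, q3)                       # 10.0 18.0 (2nd and 6th values)
print(quartile_deviation)           # (18 - 10) / 2 = 4.0
print(round(coefficient_of_qd, 3))  # 8 / 28, about 0.286
```

With n = 7 the positions (n + 1)/4 = 2 and 3(n + 1)/4 = 6 are whole numbers, so no interpolation between neighbouring values is needed here.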

Example 2:
Determine the QD and CQD from the following grouped data:

Solution:
In order to determine the values of QD and Co-efficient of QD, let us prepare the following table:
Grouped frequency distribution of X with corresponding cumulative frequencies (Fα).

Relative measures of Dispersion:

While studying the variability of the observations of a variable, we usually use the absolute measures of dispersion, namely the Range, Quartile Deviation, Mean Deviation and Standard Deviation. But the results of such measures are obtained in terms of the units in which the observations are available, and hence they are not comparable with each other. Moreover, the results of the absolute measures are affected by the number of observations obtainable on the given variable, as they consider only the positive differences from their central value (Mean/Median).

To eliminate these deficiencies in the measurement of variability of the observations on a variable, we introduce the Relative measures of dispersion. They are independent of their own units of measurement, and hence they are comparable and can be examined on a common scale when they are expressed in unitary terms. Here lies the superiority of the Relative Measures over the Absolute Measures of dispersion.

The usual Relative Measures of Dispersion are:
 Coefficient of Range
 Coefficient of Quartile Deviation
 Coefficient of Mean Deviation
 Coefficient of Variation

Among these four coefficients, the Coefficient of Variation is widely accepted and used in almost all practical situations, mainly because of its accuracy in reflecting the data.

Example 1:
Determine the Coefficient of Range for the marks obtained by a student in
various subjects given below:

Solution:
Here, the highest and the lowest marks are 52 and 40 respectively.

Hence, Range = 52 – 40 = 12.

The Coeff. of Range = 12/(52 + 40) × 100 ≈ 13.04

Example 2:
The performances of two Batsmen S and R in five successive one-day cricket matches
are given below.

Identify the batsman who is more consistent:

Solution:
Here, we can use ‘Coefficient of Variation’ as the best measure of dispersion to identify
the more consistent one having lesser variation. As the components of CV, we are to
derive first the Mean and the Standard Deviation of the scores obtained by the two
Batsmen separately using the following usual notations:

Let us prepare the following table for finding out Mean and SD of the given
information:
For the cricketer S the Coefficient of Variation is smaller and hence he is more
consistent.
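The score table for this example is not reproduced in this copy, so the sketch below uses two hypothetical sets of scores (not the source's data) to show how the Coefficient of Variation, CV = (Standard Deviation / Mean) × 100, separates a steady performer from an erratic one. Using the population standard deviation (dividing by n) is an assumption; some texts divide by n – 1 instead.

```python
import statistics


def coefficient_of_variation(scores):
    """CV as a percentage: population standard deviation / mean * 100."""
    return statistics.pstdev(scores) / statistics.mean(scores) * 100


# Hypothetical scores over five matches; both players average 50 runs.
steady = [40, 45, 50, 55, 60]
erratic = [10, 30, 50, 70, 90]

cv_steady = coefficient_of_variation(steady)
cv_erratic = coefficient_of_variation(erratic)

print(round(cv_steady, 2), round(cv_erratic, 2))
more_consistent = "steady" if cv_steady < cv_erratic else "erratic"
print(more_consistent)  # the smaller CV marks the more consistent player
```

Both hypothetical players have the same mean, so the comparison turns entirely on the spread of their scores.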

Example 3:
Calculate the Coefficient of Quartile Deviation from the following data:

Solution:
To calculate the required CQD from the given data, let us proceed in the
following way:
Example 4:
Compute the Coefficient of Mean-Deviation for the following data:

To calculate the coefficient of MD we take up the following technique.

Solution:
Calculation for the Coefficient of Mean-Deviation.
The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean. Variance tells you the degree of spread in your data set. The more spread the data, the larger the variance is in relation to the mean.

The standard deviation is a statistic that measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance. ... If the data points are further from the mean, there is a higher deviation within the data set; thus, the more spread out the data, the higher the standard deviation.
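The two definitions can be followed step by step in code. The data set is hypothetical, and the sketch uses the population form (average of squared deviations, dividing by n) as described above; the sample form, `statistics.variance`, divides by n – 1 instead.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

# By hand: average of squared deviations from the mean.
mean = sum(data) / len(data)
variance_by_hand = sum((x - mean) ** 2 for x in data) / len(data)
std_dev_by_hand = math.sqrt(variance_by_hand)

# The same quantities from the standard library (population versions).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

print(mean, variance_by_hand, std_dev_by_hand)  # 5.0 4.0 2.0
```

The squared deviations here are 9, 1, 1, 1, 0, 0, 4 and 16, which sum to 32; dividing by the 8 observations gives a variance of 4 and a standard deviation of 2.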

Probability is the measure of the likelihood that an event will occur in a random experiment. Probability is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates impossibility and 1 indicates certainty. The higher the probability of an event, the more likely it is that the event will occur.

Basic Probability Rules


 Probability Rule One (For any event A, 0 ≤ P(A) ≤ 1)
 Probability Rule Two (The sum of the probabilities of all possible outcomes is 1)
 Probability Rule Three (The Complement Rule)
 Probabilities Involving Multiple Events.
 Probability Rule Four (Addition Rule for Disjoint Events)
 Finding P(A and B) using Logic.

The addition rule for probabilities describes two formulas, one for the probability for
either of two mutually exclusive events happening and the other for the probability of
two non-mutually exclusive events happening. The first formula is just the sum of
the probabilities of the two events.
What is the Addition Rule for Probabilities?

Given multiple events, the addition rule for probabilities is used to compute the
probability that at least one of the events happens. Probability can be defined as the
branch of mathematics that quantifies the certainty or uncertainty of an event or a set of
events.
 

Related Concepts

Before understanding the addition rule, it is important to understand a few simple concepts:

 Sample space: It is the set of all possible outcomes. For example, when flipping a coin, the sample space is {Heads, Tails} because heads and tails are all the possible outcomes.
 Event: In probability, an event is defined as a particular outcome or set of outcomes. For example, flipping a coin and getting heads is an event.
 Mutually exclusive events: They are events such that if one occurs, the other cannot occur. Again, in the coin example, if we get heads, we cannot get tails. Hence, the two are mutually exclusive events.
 Collectively exhaustive events: Events that together encompass the entire sample space. In case of flipping a coin, getting heads and getting tails are collectively exhaustive as the entire sample space is {Heads, Tails}.
 Independent events: Events that occur independently of each other. For example, when flipping two coins, the outcome of the second coin is independent of the outcome of the first coin.

The formula to compute the probability that at least one of two events A and B occurs is given by:

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

Where:

 P(A ∪ B) – Probability that either A or B happens
 P(A) – Probability of Event A
 P(B) – Probability of Event B
 P(A ∩ B) – Probability of A and B happening together

A Venn diagram of the two events shows how and why the formula works: the overlap P(A ∩ B) would be counted twice when adding P(A) and P(B), so we subtract it once.

Calculating P(A ∩ B)

The probability of events A and B both happening, P(A ∩ B), can be easily calculated if the events are independent of each other by multiplying the two probabilities P(A) and P(B).

If A and B are independent events, then:

P(A ∩ B) = P(A) * P(B)

If events A and B are not independent of each other, the probability can be inferred from the nature of the events, or it is otherwise difficult to determine.

Mutually Exclusive Events

In case of mutually exclusive events, the probability of both events occurring at once is zero by definition because if one occurs, the other event cannot. Hence, for mutually exclusive events A and B:

P(A ∩ B) = 0, and so P(A ∪ B) = P(A) + P(B)

Note that mutually exclusive events with non-zero probabilities are not independent: if both P(A) and P(B) are non-zero, then P(A) * P(B) cannot be zero, whereas P(A ∩ B) is zero. In fact, by their very definition, each mutually exclusive event depends on the other event not occurring.

Numerical Example

Let’s move on to a numerical example that illustrates the concept. Assume two
independent events, A and B. Let P(A) = 0.6 and P(B) = 0.4. Then P(A ∪ B) is given by:

 P(A) = 0.6
 P(B) = 0.4

P(A ∩ B) = P(A) * P(B) = 0.6 * 0.4 = 0.24

P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 0.6 + 0.4 – 0.24 = 0.76

Hence, P(A ∪ B) is 76%.
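The same numerical example can be reproduced in a few lines. The function name is illustrative, and the independence of A and B is taken from the example's own assumption; without it, P(A ∩ B) could not be obtained by simple multiplication.

```python
def prob_union(p_a, p_b, p_a_and_b):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


p_a, p_b = 0.6, 0.4

# Valid only because A and B are assumed independent in this example.
p_both = p_a * p_b
p_either = prob_union(p_a, p_b, p_both)

print(round(p_both, 2), round(p_either, 2))  # 0.24 0.76

# For mutually exclusive events the overlap is zero, so the rule
# reduces to a plain sum (hypothetical probabilities).
p_exclusive = prob_union(0.3, 0.2, 0.0)
print(round(p_exclusive, 2))  # 0.5
```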

Conditional probability is the probability of one event occurring given that one or more other events have occurred. For example:

 Event A is that it is raining outside, and it has a 0.3 (30%) chance of raining today.
 Event B is that you will need to go outside, and that has a probability of 0.5 (50%).
A conditional probability would look at these two events in relationship with one another, such as the probability that you will need to go outside given that it is raining.
The formula for conditional probability is:
P(B|A) = P(A and B) / P(A)
which you can also rewrite as:
P(B|A) = P(A∩B) / P(A)

Example 1
In a group of 100 sports car buyers, 40 bought alarm systems, 30 purchased bucket
seats, and 20 purchased an alarm system and bucket seats. If a car buyer chosen at
random bought an alarm system, what is the probability they also bought bucket seats?

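Step 1: Figure out P(A), where A is the event that a buyer bought an alarm system and B is the event that a buyer bought bucket seats. It’s given in the question as 40 out of 100 buyers, or 0.4.
Step 2: Figure out P(A∩B). This is the intersection of A and B: both happening together. It’s given in the question as 20 out of 100 buyers, or 0.2.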
Step 3: Insert your answers into the formula:
P(B|A) = P(A∩B) / P(A) = 0.2 / 0.4 = 0.5.
The probability that a buyer bought bucket seats, given that they purchased an alarm
system, is 50%.
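The three steps can be written out with exact fractions, using the counts from the example. The function name is illustrative.

```python
from fractions import Fraction


def conditional_probability(p_a_and_b, p_a):
    """P(B|A) = P(A and B) / P(A); only defined when P(A) > 0."""
    if p_a == 0:
        raise ValueError("P(A) must be greater than zero")
    return p_a_and_b / p_a


buyers = 100
p_alarm = Fraction(40, buyers)  # P(A): bought an alarm system
p_both = Fraction(20, buyers)   # P(A and B): bought alarm and bucket seats

p_seats_given_alarm = conditional_probability(p_both, p_alarm)
print(p_seats_given_alarm)         # 1/2
print(float(p_seats_given_alarm))  # 0.5
```

The 30 buyers who purchased bucket seats do not enter the calculation: once we know the buyer bought an alarm system, only the 40 alarm buyers matter, and 20 of them also bought bucket seats.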

Rules of Probability

Often, we want to compute the probability of an event from the known probabilities of
other events. This lesson covers some important rules that simplify those computations.

Definitions and Notation

Before discussing the rules of probability, we state the following definitions:

 Two events are mutually exclusive or disjoint if they cannot occur at the same time.
 The probability that Event A occurs, given that Event B has occurred, is called
a conditional probability. The conditional probability of Event A, given Event B,
is denoted by the symbol P(A|B).
 The complement of an event is the event not occurring. The probability that
Event A will not occur is denoted by P(A').
 The probability that Events A and B both occur is the probability of
the intersection of A and B. The probability of the intersection of Events A and B
is denoted by P(A ∩ B). If Events A and B are mutually exclusive, P(A ∩ B) = 0.
 The probability that Events A or B occur is the probability of the union of A and
B. The probability of the union of Events A and B is denoted by P(A ∪ B) .
 If the occurrence of Event A changes the probability of Event B, then Events A and
B are dependent. On the other hand, if the occurrence of Event A does not
change the probability of Event B, then Events A and B are independent.

Rule of Subtraction

In a previous lesson, we learned two important properties of probability:

 The probability of an event ranges from 0 to 1.
 The sum of probabilities of all possible outcomes equals 1.

The rule of subtraction follows directly from these properties.

Rule of Subtraction. The probability that event A will occur is equal to 1 minus the
probability that event A will not occur.

P(A) = 1 - P(A')

Suppose, for example, the probability that Bill will graduate from college is 0.80. What is
the probability that Bill will not graduate from college? Based on the rule of subtraction,
the probability that Bill will not graduate is 1.00 - 0.80 or 0.20.
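
As a quick check, the rule of subtraction can be written as a one-line function. This is a sketch under stated assumptions: the name complement is illustrative, and exact fractions are used because subtracting binary floating-point values such as 1.00 - 0.80 can introduce a small rounding error.

```python
from fractions import Fraction

def complement(p):
    """Rule of subtraction: P(A') = 1 - P(A)."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p

p_graduate = Fraction(80, 100)           # P(Bill graduates) = 0.80
p_not_graduate = complement(p_graduate)  # P(Bill does not graduate)
print(float(p_not_graduate))             # 0.2
```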

Rule of Multiplication

The rule of multiplication applies to the situation when we want to know the probability
of the intersection of two events; that is, we want to know the probability that two
events (Event A and Event B) both occur.

Rule of Multiplication. The probability that Events A and B both occur is equal to the
probability that Event A occurs times the probability that Event B occurs, given that A
has occurred.

P(A ∩ B) = P(A) P(B|A)


Example
An urn contains 6 red marbles and 4 black marbles. Two marbles are drawn without
replacement from the urn. What is the probability that both of the marbles are black?

Solution: Let A = the event that the first marble is black; and let B = the event that the
second marble is black. We know the following:

• In the beginning, there are 10 marbles in the urn, 4 of which are black. Therefore,
P(A) = 4/10.
• After the first selection, there are 9 marbles in the urn, 3 of which are black.
Therefore, P(B|A) = 3/9.

Therefore, based on the rule of multiplication:

P(A ∩ B) = P(A) P(B|A)

P(A ∩ B) = (4/10) * (3/9) = 12/90 = 2/15 = 0.133
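
The result can be cross-checked by listing every ordered pair of marbles that could be drawn. The following is a minimal sketch, not part of the original lesson; the variable names are illustrative, and the marbles are identified by their position in a list so that two draws never pick the same marble.

```python
from fractions import Fraction
from itertools import permutations

# Rule of multiplication: P(A ∩ B) = P(A) * P(B|A)
p_first_black = Fraction(4, 10)               # P(A)
p_second_black_given_first = Fraction(3, 9)   # P(B|A)
p_both_black = p_first_black * p_second_black_given_first

# Cross-check: enumerate all ordered draws of two distinct marbles.
urn = ["red"] * 6 + ["black"] * 4
draws = list(permutations(range(len(urn)), 2))   # 10 * 9 = 90 ordered pairs
both_black = sum(1 for i, j in draws if urn[i] == urn[j] == "black")
p_enumerated = Fraction(both_black, len(draws))

print(p_both_black, p_enumerated)       # 2/15 2/15
print(round(float(p_both_black), 3))    # 0.133
```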

Rule of Addition

The rule of addition applies to the following situation. We have two events, and we want
to know the probability that either event occurs.

Rule of Addition. The probability that Event A or Event B occurs is equal to the
probability that Event A occurs plus the probability that Event B occurs minus the
probability that both Events A and B occur.

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Note: Invoking the fact that P(A ∩ B) = P(A) P(B|A), the Addition Rule can also be
expressed as:

P(A ∪ B) = P(A) + P(B) - P(A) P(B|A)

Example
A student goes to the library. The probability that she checks out (a) a work of fiction is
0.40, (b) a work of non-fiction is 0.30, and (c) both fiction and non-fiction is 0.20. What is
the probability that the student checks out a work of fiction, non-fiction, or both?

Solution: Let F = the event that the student checks out fiction; and let N = the event that
the student checks out non-fiction. Then, based on the rule of addition:

P(F ∪ N) = P(F) + P(N) - P(F ∩ N)

P(F ∪ N) = 0.40 + 0.30 - 0.20 = 0.50
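
The library example can be reproduced with a small helper. This is a sketch only: the function name prob_union is an assumption, and the probabilities are entered as exact fractions so the printed result is free of floating-point rounding.

```python
from fractions import Fraction

def prob_union(p_a, p_b, p_a_and_b):
    """Rule of addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b

p_fiction = Fraction(40, 100)      # P(F)
p_nonfiction = Fraction(30, 100)   # P(N)
p_both = Fraction(20, 100)         # P(F ∩ N)

p_either = prob_union(p_fiction, p_nonfiction, p_both)
print(float(p_either))   # 0.5
```

For mutually exclusive events, P(A ∩ B) = 0, so passing 0 as the third argument reduces the rule to a plain sum.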

Test Your Understanding

Problem 1

An urn contains 6 red marbles and 4 black marbles. Two marbles are drawn with
replacement from the urn. What is the probability that both of the marbles are black?

(A) 0.16
(B) 0.32
(C) 0.36
(D) 0.40
(E) 0.60

Solution

The correct answer is A. Let A = the event that the first marble is black; and let B = the
event that the second marble is black. We know the following:

• In the beginning, there are 10 marbles in the urn, 4 of which are black. Therefore,
P(A) = 4/10.
• After the first selection, we replace the selected marble; so there are still 10
marbles in the urn, 4 of which are black. Therefore, P(B|A) = 4/10.

Therefore, based on the rule of multiplication:

P(A ∩ B) = P(A) P(B|A)

P(A ∩ B) = (4/10)*(4/10) = 16/100 = 0.16
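
Because the marble is replaced, the second draw comes from the same full urn as the first. A brief enumeration, offered as a sketch with illustrative variable names, confirms the answer:

```python
from fractions import Fraction
from itertools import product

urn = ["red"] * 6 + ["black"] * 4

# With replacement, each draw is made from all 10 marbles: 10 * 10 = 100 ordered pairs.
draws = list(product(urn, repeat=2))
both_black = sum(1 for first, second in draws if first == second == "black")

p_both_black = Fraction(both_black, len(draws))
print(p_both_black, float(p_both_black))   # 4/25 0.16
```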

Problem 2

A card is drawn randomly from a deck of ordinary playing cards. You win $10 if the card
is a spade or an ace. What is the probability that you will win the game?

(A) 1/13
(B) 13/52
(C) 4/13
(D) 17/52
(E) None of the above.

Solution

The correct answer is C. Let S = the event that the card is a spade; and let A = the event
that the card is an ace. We know the following:

• There are 52 cards in the deck.
• There are 13 spades, so P(S) = 13/52.
• There are 4 aces, so P(A) = 4/52.
• There is 1 ace that is also a spade, so P(S ∩ A) = 1/52.

Therefore, based on the rule of addition:

P(S ∪ A) = P(S) + P(A) - P(S ∩ A)

P(S ∪ A) = 13/52 + 4/52 - 1/52 = 16/52 = 4/13
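
The count of winning cards can also be verified directly by building the deck. The sketch below is illustrative only (the rank and suit labels are assumptions); it compares a direct count of cards that are spades or aces with the rule of addition.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

spades = {card for card in deck if card[1] == "spades"}
aces = {card for card in deck if card[0] == "A"}

# Direct count of cards in the union S ∪ A.
p_win_counted = Fraction(len(spades | aces), len(deck))

# Rule of addition: P(S ∪ A) = P(S) + P(A) - P(S ∩ A).
n = len(deck)
p_win_rule = (Fraction(len(spades), n) + Fraction(len(aces), n)
              - Fraction(len(spades & aces), n))

print(p_win_counted, p_win_rule)   # 4/13 4/13
```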

Probability distributions indicate the likelihood of an event or outcome. ... p(x) = the
likelihood that a random variable takes a specific value of x. The sum of
all probabilities for all possible values must equal 1. Furthermore, the probability for a
particular value or range of values must be between 0 and 1.
A normal distribution is the formal name for the bell-shaped probability curve. It is
described by its mean µ and standard deviation σ; in the standard normal
distribution the mean is zero and the standard deviation is 1. Every normal
distribution has zero skew and a kurtosis of 3. Normal distributions are symmetrical,
but not all symmetrical distributions are normal.
For example:

The monthly water bills in the village of Hyderabad are normally distributed with
a mean of Rs. 225 and a standard deviation of Rs. 55. The villagers spend much of
their time in the fields watering their plants. In a group of 500 customers, how
many can be expected to have a bill of Rs. 100 or less?

Solution:

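The z-score of a bill amount X is calculated using the formula given below:

Z = (X – µ) / σ

where µ is the mean and σ is the standard deviation.

• Z = (100 – 225) / 55
• Z = -125 / 55
• Z = -2.27

From a standard normal table, P(Z ≤ -2.27) ≈ 0.0116. The expected number of
customers with a bill of Rs. 100 or less is therefore 500 × 0.0116 ≈ 5.8, or about 6
customers.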
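
The z-score can be turned into an expected number of customers with Python's standard library. This is a minimal sketch, assuming Python 3.8 or later for statistics.NormalDist; the variable names are illustrative, and the code uses the unrounded z-score, so its probability may differ slightly from a table lookup at -2.27.

```python
from statistics import NormalDist

mean_bill = 225    # µ, in Rs.
sd_bill = 55       # σ, in Rs.
threshold = 100    # X, in Rs.
customers = 500

# z-score of the threshold: Z = (X - µ) / σ
z = (threshold - mean_bill) / sd_bill

# P(bill <= threshold) under a normal distribution with the given mean and sd.
p_at_most = NormalDist(mu=mean_bill, sigma=sd_bill).cdf(threshold)

# Expected number of customers, out of 500, with a bill at or below the threshold.
expected = customers * p_at_most

print(round(z, 2))   # -2.27
print(p_at_most)     # a little over 0.01
print(expected)      # roughly 6 customers
```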

Estimation statistics is a data analysis framework that uses a combination of
effect sizes, confidence intervals, precision planning, and meta-analysis to plan
experiments, analyze data and interpret results. It is distinct from null hypothesis
significance testing, which is considered to be less informative.
