DECISION SCIENCE
Author: Ken Black
University of Houston - Clear Lake
NMIMS GLOBAL ACCESS - SCHOOL FOR CONTINUING EDUCATION
Chief Academic Officer
Dr. Arun Mohan Sherry
MSc. (Gold Medalist), [Link]. (Computer Science - IIT Kharagpur), Ph.D.
NMIMS Global Access - School for Continuing Education
Content Customized by
Mr. Kali Charan Sabst
NMIMS Global Access - School for Continuing Education
Content Contributors
Abhijit Biswas
&
Lokesh Payasi
Copyright 2019
ISBN: 978-81-265-7826-8
Publisher's address: 4496/7, Ansari Road, Daryaganj, New Delhi-110002
Only for
NMIMS Global Access - School for Continuing Education
School Address: V.L. Mehta Road, Vile Parle (W), Mumbai - 400 056, India.
Ken Black is currently professor of decision sciences in the School of Business at the University of Houston-Clear Lake. He earned a bachelor’s degree in mathematics from Graceland University, a master’s degree in math education from the University of Texas at El Paso, a Ph.D. in business administration (management science), and a Ph.D. in educational research from the University of North Texas. Since joining the faculty of UHCL in 1979, Professor Black has taught all levels of statistics courses, forecasting, management science, market research, and production/operations management. He has published over 20 journal articles and 20 professional papers, as well as two textbooks: Business Statistics: An Introductory Course and Business Statistics for Contemporary Decision Making.

CHAPTER NO.   CHAPTER NAME
1             Introduction to Statistics
2             Charts and Graphs
3             Descriptive Statistics
4             Probability
5             Probability Distributions
6             Sampling and Sampling Distributions
7             Correlation and Simple Regression Analysis
8             Multiple Regression Analysis
9             Time-Series Forecasting
10            Decision Analysis
11            Case Studies
12            Appendix A (Online)
Decision Science
Introduction to Statistics: Statistics in Business, Basic Statistical Concepts, Variables and Data, Data Measurement.
Data Visualization: Frequency Distributions, Quantitative Data Graphs, Qualitative Data Graphs,
Charts and Graphs for Two Variables.
Descriptive Statistics: Measures of Central Tendency: Ungrouped Data, Measures of Variability: Ungrouped Data, Measures of Central Tendency and Variability: Grouped Data, Measures of Shape.
Basics of Probability: Introduction to Probability, Methods of Assigning Probabilities, Structure of
Probability, Marginal probability, Union, Joint Probabilities, Addition Laws, Multiplication Laws,
Conditional Probability, Bayes’ Rule.
Probability Distributions: Discrete Versus Continuous Distribution, Binomial Distribution, Poisson Distribution, Hypergeometric Distribution, The Uniform Distribution, Normal Distribution, Using the Normal Curve to Approximate the Binomial Distribution, Exponential Distribution.
Sampling Techniques and Sampling Distributions: Introduction to Sampling, Reasons for Sampling, Random Versus Non-random Sampling, Sampling Error, Non-sampling Errors, Sampling Distribution of the Sample Mean and the Sample Proportion.
Correlation and Simple Regression: Correlation analysis, Introduction to Simple Regression
Analysis, Equation of the Regression Line, Coefficient of Determination.
Multiple Regression Analysis: Multiple Regression Model with Two Independent Variables,
Determining the Multiple Regression Equation, Coefficient of Multiple Determination, Adjusted R²,
Interpreting Multiple Regression Computer Output.
Forecasting Techniques: Introduction to Forecasting, Time-Series Components, Measurement of
Forecasting Error, Smoothing Techniques, Trend Analysis, Autocorrelation and Autoregression.
Decision Analysis: Decision making under certainty, Decision making under uncertainty, Decision making under risk.

INTRODUCTION TO STATISTICS
CONTENTS
1.1   Introduction
1.2   Basic Statistical Concepts
      Self Assessment Questions
      Activity
1.3   Data Measurement
      1.3.1  Nominal Level
      1.3.2  Ordinal Level
      1.3.3  Interval Level
      1.3.4  Ratio Level
      1.3.5  Comparison of the Four Levels of Data
      Self Assessment Questions
      Activity
1.5   Descriptive Questions
1.6   Solutions for Descriptive Questions
1.7   Answers and Hints
STATISTICS DESCRIBE THE STATE OF BUSINESS IN INDIA'S COUNTRYSIDE
India is the second most populous country in the world, with more than 1.25 billion people. More than 70% of the people live in rural areas scattered about the countryside in 650,000 villages. In fact, it can be said that one in every 10 people in the world lives in rural India. While it has a per capita income of less than $1 (U.S.) per day, rural India, which has been described in the past as poor and semi-illiterate, now contributes about one-half of the country’s gross national product (GNP). However, rural India still has the most households in the world without electricity, over 300,000.
Despite its poverty and economic disadvantages, there are compelling reasons for companies to market their goods and services to rural India. The market of rural India has been growing at five times the rate of the urban India market. There is increasing agricultural productivity leading to growth in disposable income, and there is a reduction in the gap between the tastes of urban and rural customers. The literacy level is increasing, and people are becoming more conscious about their lifestyles and opportunities for a better life.
Around 60% of all middle-income households in India are in rural areas, and more than one-third of all rural households in India now have a main source of income other than farming. Virtually every home has a radio, about one-third have a television, and more than one-half of rural households benefit from banking services. Forty-two percent of the people living in India’s villages and small towns use toothpaste, and that proportion is increasing as rural income rises and as there is greater awareness about oral hygiene.
In rural India, consumers are gaining more disposable income due to the movement of manufacturing jobs to rural areas. It is estimated that nearly 75% of the factories that opened in India in the past decade were built in rural areas. Products that are doing well in sales to people in rural India include televisions, fans, bicycles, bath soap, two- or three-wheelers, cars, and many others. According to MART, a New Delhi-based research organization, rural India buys 46% of all soft drinks and 49% of motorcycles sold in India. Because of such factors, many U.S. and Indian firms such as Microsoft, General Electric, Kellogg's, Colgate-Palmolive, Idea Cellular, Hindustan Lever, Godrej, Nirma Chemical Works, Novartis, Dabur, Tata Motors, and Vodafone India have entered the rural Indian market with enthusiasm. Marketing to rural customers often involves persuading them to try and to adopt products that they may not have used before. Rural India is a huge, relatively untapped market for businesses. However, entering such a market is not without risks and obstacles. The dilemma facing companies is whether to enter this marketplace and, if so, to what extent and how.
Source: Adapted from “Marketing to Rural India: Making the Ends Meet,” March 8, 2007, in India Knowledge@Wharton, at http://[Link]/article.cfm?articleid=4172; “Rural Segment Quickly Catching Up,” September 2015, IBEF (India Brand Equity Foundation), at: [Link]; “Unlocking the Wealth in Rural Markets,” June 2014, Harvard Business Review; “Much of Rural India Still Waits for Electricity,” October 2013, University of Washington, at: [Link]
The primary objective of Chapter 1 is to introduce you to the world of statistics, thereby enabling you to:
• List quantitative and graphical examples of statistics within a business context.
• Define important statistical terms, including population, sample, and parameter, as they relate to descriptive and inferential statistics.
• Explain the difference between variables, measurement, and data.
• Compare the four different levels of data: nominal, ordinal, interval, and ratio.

1.1 INTRODUCTION
Every minute of the working day, decisions are made by businesses around the world that determine whether companies will be profitable and growing or whether they will stagnate and die. Most of these decisions are made with the assistance of information gathered about the marketplace, the economic and financial environment, the workforce, the competition, and other factors. Such information usually comes in the form of data or is accompanied by data. Business statistics provides the tools through which such data are collected, analyzed, summarized, and presented to facilitate the decision-making process, and it plays an important role in the ongoing saga of decision making within the dynamic world of business.

Virtually every area of business uses statistics in decision making. Here are some recent examples:
• According to a national survey of independent business owners conducted by the Institute for Local Self-Reliance in partnership with the Advocates for Independent Business coalition, when asked “Which two public policy changes would most help your business?” (retailers only), 40% said “Pass the Marketplace Fairness Act” and 38% said “Cap Credit Card Swipe Fees”.
• A survey of 1,465 workers by Hotjobs reports that 55% of workers believe that the quality of their work is perceived the same when they work remotely as when they are physically in the office.
• A survey of 477 executives by the Association of Executive Search Consultants determined that 48% of men and 67% of women say they are more likely to negotiate for less business travel compared with five years ago.
• A global Family Business Survey of 2,278 respondents sponsored by PwC reported that 65% of family businesses reported growth in the last twelve months and 49% of respondents are apprehensive about their ability to recruit skilled staff in the next twelve months.
• A Deloitte Retail “Green” survey of 1,080 adults revealed that 54% agreed that plastic, non-compostable shopping bags should be banned.
• A study of consumer electronics spending by a 2,500-member online panel of the NPD Group showed that consumers expect to spend $555, on average, per person on new consumer electronics devices this year.
You can see from these few examples that there is a wide variety of uses and applications of statistics in business. Note that in most of these examples, business researchers have conducted a study and provided us with rich and interesting information.

In this text we will examine several types of graphs for depicting data as we study ways to arrange or structure data into forms that are both meaningful and useful to decision makers. We will learn about techniques for sampling from a population that allow studies of the business world to be conducted more inexpensively and in a more timely manner. We will explore various ways to forecast future values and examine techniques for predicting trends. These and many other exciting statistics and statistical techniques await us on this journey through business statistics. Let us begin.
1.2 BASIC STATISTICAL CONCEPTS
Business statistics, like many areas of study, has its own language. It is important to begin our study with an introduction of some basic concepts in order to understand and communicate about the subject. We begin with a discussion of the word statistics. The word statistics has many different meanings in our culture. Webster's Third New International Dictionary gives a comprehensive definition of statistics as a science dealing with the collection, analysis, interpretation, and presentation of numerical data. Viewed from this perspective, statistics includes all the topics presented in this text. Figure 1.1 graphically displays the key elements of statistics.

The study of statistics can be organized in a variety of ways. One of the main ways is to subdivide statistics into two branches: descriptive statistics and inferential statistics. To understand the difference between descriptive and inferential statistics, definitions of population and sample are helpful.
Webster's Third New International Dictionary defines population as a collec-
tion of persons, objects, or items of interest. The population can be a widely
defined category, such as “all automobiles,” or it can be narrowly defined,
such as “all Ford Mustang cars produced from 2014 to 2016.” A population can be a group of people, such as “all workers presently employed by Microsoft,” or it can be a set of objects, such as “all dishwashers produced on February 2, 2016, by the General Electric Company at the Louisville plant.” The researcher defines the population to be whatever he or she is studying. When researchers gather data from the whole population for a given measurement of interest, they call it a census. Most people are familiar with the “Census of India.” Every 10 years, the government attempts to measure all persons living in the country.

[Figure 1.1: The Key Elements of Statistics]
A sample is a portion of the whole and, if properly taken, is representative of the whole. For various reasons, researchers often prefer to work with a sample of the population instead of the entire population. For example, in conducting quality-control experiments to determine the average life of lightbulbs, a lightbulb manufacturer might randomly sample only 75 lightbulbs during a production run. Because of time and money limitations, a human resources manager might take a random sample of 40 employees instead of using a census to measure company morale.
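As an illustration only, the human resources manager's random sample of 40 employees can be sketched in Python with the standard library's `random.sample`, which draws without replacement so that no employee is selected twice. The roster of 500 employee ID numbers and the seed value are hypothetical assumptions, not figures from the text.

```python
import random

# Hypothetical population: a roster of 500 employee ID numbers (assumed size).
population = list(range(1, 501))

def simple_random_sample(units, n, seed=None):
    """Draw n distinct units without replacement; every unit is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(units, n)

sample = simple_random_sample(population, 40, seed=7)
print(len(sample), len(set(sample)))  # 40 40
```

Measuring morale on these 40 employees instead of all 500 is where the savings in time and money come from; a census would correspond to setting `n` equal to the size of the population.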
If a business analyst is using data gathered on a group to describe or reach conclusions about that same group, the statistics are called descriptive statistics. For example, if an instructor produces statistics to summarize a class’s examination effort and uses those statistics to reach conclusions about that class only, the statistics are descriptive.
Many of the statistical data generated by businesses are descriptive. They might include number of employees on vacation during June, average salary at the Mumbai office, corporate sales for 2016, average managerial satisfaction score on a company-wide census of employee attitudes, and average return on investment for Tata Motors for the years 1996 through 2016.

Another type of statistics is called inferential statistics. If a researcher gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken, the statistics are inferential statistics. The data gathered from the sample are used to infer something about a larger group. Inferential statistics are sometimes referred to as inductive statistics. The use and importance of inferential statistics continue to grow.
One application of inferential statistics is in pharmaceutical research. Some
new drugs are expensive to produce, and therefore tests must be limited to
small samples of patients. Utilizing inferential statistics, researchers can
design experiments with small randomly selected samples of patients and
attempt to reach conclusions and make inferences about the population.
Market researchers use inferential statistics to study the impact of advertising on various market segments. Suppose a soft drink company creates an advertisement depicting a dispensing machine that talks to the buyer, and market researchers want to measure the impact of the new advertisement on various age groups. The researcher could stratify the population into age categories ranging from young to old, randomly sample each stratum, and use inferential statistics to determine the effectiveness of the advertisement for the various age groups in the population. The advantage of using inferential statistics is that they enable the researcher to study effectively a wide range of phenomena without having to conduct a census.
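The stratify-then-sample step in the soft drink example can be sketched in Python as follows. This is a minimal sketch under assumptions: the three age groups, their sizes, and the figure of 50 respondents per group are invented for illustration, and equal allocation per stratum is only one design choice.

```python
import random
from collections import defaultdict

# Hypothetical respondents as (respondent_id, age_group); group labels and sizes are assumptions.
respondents = (
    [(i, "18-29") for i in range(0, 300)]
    + [(i, "30-49") for i in range(300, 700)]
    + [(i, "50+") for i in range(700, 1000)]
)

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample inside each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

samples = stratified_sample(respondents, stratum_of=lambda r: r[1], n_per_stratum=50, seed=1)
for group, members in samples.items():
    print(group, len(members))  # 50 respondents from each age group
```

Drawing the same number from each stratum guarantees that every age group is represented; a researcher who wanted each group's share of the sample to match its share of the population would instead allocate sample sizes proportionally.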
A descriptive measure of the population is called a parameter. Parameters are usually denoted by Greek letters. Examples of parameters are population mean ($\mu$), population variance ($\sigma^2$), and population standard deviation ($\sigma$). A descriptive measure of a sample is called a statistic. Statistics are usually denoted by Roman letters. Examples of statistics are sample mean ($\bar{x}$), sample variance ($s^2$), and sample standard deviation ($s$).
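For reference, the usual definitions behind these symbols are written out below, with $N$ the number of items in the population and $n$ the number in the sample; the descriptive statistics chapter develops these measures in detail. The $n - 1$ divisor in the sample variance is the common textbook convention when $s^2$ is used to estimate $\sigma^2$.

```latex
\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \qquad
\sigma = \sqrt{\sigma^2}

\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad
s = \sqrt{s^2}
```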
Differentiation between the terms parameter and statistic is important only in the use of inferential statistics. A business researcher often wants to estimate the value of a parameter or conduct tests about the parameter. However, the calculation of parameters is usually either impossible or infeasible because of the amount of time and money required to take a census. In such cases, the business researcher can take a random sample of the population, calculate a statistic on the sample, and infer by estimation the value of the parameter. The basis for inferential statistics, then, is the ability to make decisions about parameters without having to complete a census of the population.
For example, a manufacturer of washing machines would probably want to determine the average number of loads that a new machine can wash before it needs repairs. The parameter is the population mean or average number of washes per machine before repair. A company researcher takes a sample of machines, computes the number of washes before repair for each machine, averages the numbers, and estimates the population value or parameter by using the statistic, which in this case is the sample average. Figure 1.2 demonstrates the inferential process.
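The washing-machine example can be simulated in a few lines of Python. This is a sketch under stated assumptions: the population of 10,000 machines, the mean of 1,200 washes, the spread of 150 washes, and the sample size of 60 are all invented for illustration, and the population is generated only so that the estimate can be checked against the parameter, which a real researcher would never see.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: washes-before-repair for 10,000 machines.
# In real studies these values are unknown; they are simulated here only so
# the sample estimate can be compared with the true parameter.
population = [rng.gauss(1200, 150) for _ in range(10_000)]
mu = statistics.fmean(population)      # parameter: population mean (normally unobservable)

sample = rng.sample(population, 60)    # random sample of 60 machines
x_bar = statistics.fmean(sample)       # statistic: sample mean, used to estimate mu

print(f"parameter mu = {mu:.1f}, estimate x_bar = {x_bar:.1f}")
```

Rerunning with a different seed gives a different value of $\bar{x}$; that sample-to-sample variation is the uncertainty that inferences about parameters must account for.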
Inferences about parameters are made under uncertainty. Unless parameters are computed directly from the population, the statistician never knows with certainty whether the estimates or inferences made from samples are true. In an effort to estimate the level of confidence in the result of the process, statisticians use probability statements. For this and other reasons, part of this text is devoted to probability (Chapter 4).

[Figure 1.2: The Inferential Process. Select a random sample from the population; calculate $\bar{x}$ to estimate $\mu$ (the parameter).]
Business statistics is about measuring phenomena in the business world and organizing, analyzing, and presenting the resulting numerical information in such a way that better, more informed business decisions can be made. Most business statistics studies contain variables, measurements, and data.

In business statistics, a variable is a characteristic of any entity being studied that is capable of taking on different values. Some examples of variables in business might include return on investment, advertising dollars, labor productivity, stock price, historical cost, total sales, market share, age of worker, earnings per share, miles driven to work, time spent in store shopping, and many, many others. In business statistics studies, most variables produce a measurement that can be used for analysis. A measurement occurs when a standard process is used to assign numbers to particular attributes or characteristics of a variable. Many measurements are obvious, such as time spent in a store shopping by a customer, age of the worker, or the number of miles driven to work. However, some measurements, such as labor productivity, customer satisfaction, and return on investment, have to be defined by the business researcher or by experts within the field. Once such measurements are recorded and stored, they can be denoted as “data.” It can be said that data are recorded measurements. The processes of measuring and data gathering are basic to all that we do in business statistics. It is data that are analyzed by a business statistician in order to learn more about the variables being studied. Sometimes, sets of data are organized into databases as a way to store data or as a means for more conveniently analyzing data or comparing variables. Valid data are the lifeblood of business statistics, and it is important that the business researcher give thoughtful attention to the creation of meaningful, valid data before embarking on analysis and reaching conclusions.
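To make the distinction between variable, measurement, and data concrete, here is a small Python sketch for the variable “time spent in store shopping.” The record layout, customer IDs, and timestamps are hypothetical. A fixed, standard process (elapsed minutes between entry and exit) assigns a number to each customer, and the recorded numbers are the data.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class StoreVisit:
    customer_id: str   # hypothetical identifier
    entered: datetime
    left: datetime

def minutes_in_store(visit: StoreVisit) -> float:
    """Standard measurement process: elapsed minutes between entry and exit."""
    return (visit.left - visit.entered).total_seconds() / 60

visits = [
    StoreVisit("C001", datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 25)),
    StoreVisit("C002", datetime(2024, 5, 1, 10, 5), datetime(2024, 5, 1, 10, 47)),
]

# Recorded measurements of the variable "time spent in store" are the data.
data = [minutes_in_store(v) for v in visits]
print(data)  # [25.0, 42.0]
```

Because every visit passes through the same `minutes_in_store` rule, the recorded values are comparable across customers, which is what makes them valid data for later analysis.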
Self Assessment Questions
Fill in the blanks:
1. ________ is a collection of persons, objects, or items of interest.
2. When researchers gather data from the whole population for a given measurement of interest, it is called a ________.
3. Inferential statistics are sometimes referred to as ________.
State whether the following statements are true/false:
4. Data interpretation is a key element of statistics.
5. Researchers often prefer to work with the population instead of the entire sample.

Activity
Study the “International Labor Database” containing the civilian unemployment rates in percent from seven countries presented yearly over a 40-year period. The countries are the United States, Canada, Australia, Japan, France, Germany, and Italy. Prepare a comparative report based on your study.
1.3 DATA MEASUREMENT
Millions of numerical data are gathered in businesses every day, representing myriad items. For example, numbers represent costs of items produced, geographical locations of retail outlets, weights of shipments, and rankings of subordinates at yearly reviews. Not all such data should be analyzed the same way statistically, because the entities represented by the numbers are different. For this reason, the business researcher needs to know the level of data measurement represented by the numbers being analyzed.
The disparate use of numbers can be illustrated by the numbers 40 and 80, which could represent the weights of two objects being shipped, the ratings received on a consumer test by two different products, or football jersey numbers of a fullback and a wide receiver. Although 80 pounds is twice as much as 40 pounds, the wide receiver is probably not twice as big as the fullback! Averaging the two weights seems reasonable, but averaging the football jersey numbers makes no sense. The appropriateness of the data analysis depends on the level of measurement of the data gathered. The phenomenon represented by the numbers determines the level of data measurement. Four common levels of data measurement follow.

1. Nominal
2. Ordinal
3. Interval
4. Ratio

Nominal is the lowest level of data measurement, followed by ordinal, interval, and ratio. Ratio is the highest level of data measurement, as shown in Figure 1.3.

[Figure 1.3: Hierarchy of Levels of Data]

1.3.1 NOMINAL LEVEL
The lowest level of data measurement is the nominal level. Numbers representing nominal-level data (the word level often is omitted) can be used only to classify or categorize. Employee identification numbers are an example of nominal data. The numbers are used only to differentiate employees and not to make a value statement about them. Many demographic questions in surveys result in data that are nominal because the questions are used for classification only. The following is an example of such a question that would result in nominal data:
Which of the following employment classifications best describes your area
of work?
1. Educator
2. Construction worker
3. Manufacturing worker
4. Lawyer
5. Doctor
6. Other
Suppose that, for computing purposes, an educator is assigned a 1, a construction worker is assigned a 2, a manufacturing worker is assigned a 3, and so on. These numbers should be used only to classify respondents. The number 1 does not denote the top classification. It is used only to differentiate an educator (1) from a lawyer (4).
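A short Python sketch shows what nominal codes like these do and do not support. The responses are hypothetical. Counting how often each code occurs, and reporting the most frequent category (the mode), is meaningful; averaging the codes produces a number that corresponds to no occupation at all.

```python
from collections import Counter

# Coding scheme from the survey question above (codes are labels only).
CODES = {1: "Educator", 2: "Construction worker", 3: "Manufacturing worker",
         4: "Lawyer", 5: "Doctor", 6: "Other"}

# Hypothetical coded responses.
responses = [1, 3, 3, 4, 1, 3, 6, 2, 3, 5]

counts = Counter(CODES[code] for code in responses)
mode_label, mode_count = counts.most_common(1)[0]
print(mode_label, mode_count)  # Manufacturing worker 4

# The arithmetic mean of the codes can be computed but is meaningless:
# 3.1 does not describe any "average occupation".
print(sum(responses) / len(responses))  # 3.1
```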
Some other types of variables that often produce nominal-level data are sex,
religion, ethnicity, geographic location, and place of birth. Social Security
numbers, telephone numbers, employee ID numbers, and ZIP code num-
bers are further examples of nominal data. Statistical techniques that are
appropriate for analyzing nominal data are limited. However, some of the
more widely used statistics, such as the chi-square statistic, can be applied to
nominal data, often producing useful information.
1.3.2 ORDINAL LEVEL
Ordinal-level data measurement is higher than the nominal level. In addition to the nominal-level capabilities, ordinal-level measurement can be used to rank or order people or objects. For example, using ordinal data, a supervisor can evaluate three employees by ranking their productivity with the numbers 1 through 3. The supervisor could identify one employee as the most productive, one as the least productive, and one as somewhere between by using ordinal data. However, the supervisor could not use ordinal data to establish that the intervals between the employees ranked 1 and 2 and between the employees ranked 2 and 3 are equal; that is, she could not say that the differences in the amount of productivity between workers ranked 1, 2, and 3 are necessarily the same. With ordinal data, the distances or spacing represented by consecutive numbers are not always equal.
Some questionnaire Likert-type scales are considered by many researchers to be ordinal in level. The following is an example of one such scale:

This computer tutorial is ______

not helpful    somewhat helpful    moderately helpful    very helpful    extremely helpful
     1                2                    3                   4                 5

When this survey question is coded for the computer, only the numbers 1 through 5 will remain, not the adjectives. Virtually everyone would agree that a 5 is higher than a 4 on this scale and that ranking responses is possible. However, most respondents would not consider the differences between not helpful, somewhat helpful, moderately helpful, very helpful, and extremely helpful to be equal.
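Because only the order of the codes is meaningful, the median is a natural summary for ordinal responses like these. The sketch below uses hypothetical responses and Python's `statistics.median_low`, which always returns one of the observed codes; with an even number of responses, the ordinary `statistics.median` could return a value such as 3.5, a point on the scale that has no label.

```python
import statistics

LABELS = {1: "not helpful", 2: "somewhat helpful", 3: "moderately helpful",
          4: "very helpful", 5: "extremely helpful"}

# Hypothetical coded responses to "This computer tutorial is ______".
responses = [4, 5, 3, 4, 2, 4, 5, 3, 4]

ranked = sorted(responses)                      # ordering responses is meaningful
median_code = statistics.median_low(responses)  # middle response, always an observed code
print(ranked)               # [2, 3, 3, 4, 4, 4, 4, 5, 5]
print(LABELS[median_code])  # very helpful
```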
Mutual funds as investments are sometimes rated in terms of risk by using measures of default risk, currency risk, and interest rate risk. These three measures are applied to investments by rating them as having high, medium, and low risk. Suppose high risk is assigned a 3, medium risk a 2, and low risk a 1. If a fund is awarded a 3 rather than a 2, it carries more risk, and so on. However, the differences in risk between categories 1, 2, and 3 are not necessarily equal. Thus, these measurements of risk are only ordinal-level measurements. Another example of the use of ordinal numbers in business is the ranking of the top 50 most admired companies in Fortune magazine. The numbers ranking the companies are only ordinal in measurement. Certain statistical techniques are specifically suited to ordinal data, but many other techniques are not appropriate for use on ordinal data. For example, it does not make sense to say that the average of “moderately helpful” and “very helpful” is “moderately helpful and a half.”
Because nominal and ordinal data are often derived from imprecise mea-
surements such as demographic questions, the categorization of people or
objects, or the ranking of items, nominal and ordinal data are nonmetric data
and are sometimes referred to as qualitative data.
1.3.3 INTERVAL LEVEL
Interval-level data measurement is the next to the highest level of data, in which the distances between consecutive numbers have meaning and the data are always numerical. The distances represented by the differences between consecutive numbers are equal; that is, interval data have equal intervals. An example of interval measurement is Fahrenheit temperature. With Fahrenheit temperature numbers, the temperatures can be ranked, and the amounts of heat between consecutive readings, such as 20°, 21°, and 22°, are the same.
In addition, with interval-level data, the zero point is a matter of convention or convenience and not a natural or fixed zero point. Zero is just another point on the scale and does not mean the absence of the phenomenon. For example, zero degrees Fahrenheit is not the lowest possible temperature. Some other examples of interval-level data are the percentage change in employment, the percentage return on a stock, and the dollar change in stock price.
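The arbitrary zero of an interval scale can be checked numerically. In this Python sketch (the readings are illustrative), converting Fahrenheit to Celsius, another interval scale with a different zero point, keeps equal intervals equal, but the ratio 40/20 = 2 does not survive the conversion, so “40° is twice as warm as 20°” is not a meaningful statement.

```python
import math

def f_to_c(f):
    """Convert Fahrenheit to Celsius: shift the zero point, then rescale."""
    return (f - 32) * 5 / 9

readings_f = (20.0, 40.0, 60.0)
readings_c = [f_to_c(f) for f in readings_f]

# Equal Fahrenheit intervals (20 degrees each) remain equal intervals in Celsius.
step1 = readings_c[1] - readings_c[0]
step2 = readings_c[2] - readings_c[1]
print(math.isclose(step1, step2))  # True

# The ratio 40/20 = 2 does not survive the change of scale, because the zero is arbitrary.
print(40.0 / 20.0)                              # 2.0
print(round(readings_c[1] / readings_c[0], 3))  # -0.667
```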
1.3.4 RATIO LEVEL
Ratio-level data measurement is the highest level of data measurement. Ratio data have the same properties as interval data, but ratio data have an absolute zero, and the ratio of two numbers is meaningful. The notion of absolute zero means that zero is fixed, and the zero value in the data represents the absence of the characteristic being studied. The value of zero cannot be arbitrarily assigned because it represents a fixed point. This definition enables the statistician to create ratios with the data.
Examples of ratio data are height, weight, time, volume, and Kelvin temperature. With ratio data, a researcher can state that 180 pounds of weight is twice as much as 90 pounds or, in other words, make a ratio of 180:90. Many of the data measured by valves or gauges in industry are ratio data. Other examples in the business world that are ratio level in measurement are production cycle time, work measurement time, passenger miles, number of trucks sold, complaints per 10,000 fliers, and number of employees.
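On a ratio scale the zero is fixed, so changing units only multiplies every value by a constant and ratios are preserved. This Python sketch checks the 180:90 example after converting pounds to kilograms; the factor 0.45359237 is the exact international definition of the avoirdupois pound.

```python
import math

LB_TO_KG = 0.45359237  # exact: 1 international avoirdupois pound in kilograms

def lb_to_kg(pounds):
    """Change of units on a ratio scale: multiply by a constant, no shift of zero."""
    return pounds * LB_TO_KG

heavy, light = 180.0, 90.0
ratio_lb = heavy / light
ratio_kg = lb_to_kg(heavy) / lb_to_kg(light)

print(ratio_lb)                     # 2.0
print(math.isclose(ratio_kg, 2.0))  # True: 180 lb is still twice 90 lb in kilograms
```

Contrast this with the Fahrenheit-to-Celsius conversion, which subtracts 32 before rescaling; that shift of the zero point is exactly what destroys ratios on an interval scale.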
Because interval- and ratio-level data are usually gathered by precise instruments often used in production and engineering processes, in national standardized testing, or in standardized accounting procedures, they are called metric data and are sometimes referred to as quantitative data.
1.3.5 COMPARISON OF THE FOUR LEVELS OF DATA
Figure 1.4 shows the relationships of the usage potential among the four
levels of data measurement. The concentric squares denote that each higher
level of data can be analyzed by any of the techniques used on lower levels of
data but, in addition, can be used in other statistical techniques. Therefore,
ratio data can be analyzed by any statistical technique applicable to the other
three levels of data plus some others.
Nominal data are the most limited data in terms of the types of statistical analysis that can be used with them. Ordinal data allow the researcher to perform any analysis that can be done with nominal data and some additional analyses. With ratio data, a statistician can make ratio comparisons and appropriately do any analysis that can be performed on nominal, ordinal, or interval data. Some statistical techniques require ratio data and cannot be used to analyze other levels of data.
Statistical techniques can be separated into two categories: parametric statistics and nonparametric statistics. Parametric statistics require that data be interval or ratio. If the data are nominal or ordinal, nonparametric statistics must be used. Nonparametric statistics can also be used to analyze interval or ratio data.
Figure 1.5 contains a summary of metric data and nonmetric data.

[Figure 1.4: Usage Potential of Various Levels of Data]

[Figure 1.5: Metric vs. Nonmetric Data. Metric data: higher level data; interval and ratio. Nonmetric data: lower level data; nominal and ordinal; qualitative data; must use nonparametric statistics.]
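The cumulative relationship in Figure 1.4 and the parametric rule stated in the text can be encoded directly. In this Python sketch the operations listed for each level are an illustrative assignment following common textbook practice (counts and mode for nominal, median for ordinal, mean and standard deviation for interval, ratios for ratio), not an exhaustive catalogue; the chi-square entry comes from the nominal-level discussion above.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of data measurement, ordered from lowest to highest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Operations that first become appropriate at each level (illustrative, not exhaustive).
FIRST_ALLOWED = {
    Level.NOMINAL: ["frequency counts", "mode", "chi-square"],
    Level.ORDINAL: ["ranking", "median"],
    Level.INTERVAL: ["differences", "mean", "standard deviation"],
    Level.RATIO: ["ratios such as 180:90"],
}

def allowed_operations(level: Level) -> list:
    """A higher level inherits every operation available to the levels below it."""
    return [op for lvl in Level if lvl <= level for op in FIRST_ALLOWED[lvl]]

def is_metric(level: Level) -> bool:
    """Interval and ratio data are metric; parametric statistics require them."""
    return level >= Level.INTERVAL

print(allowed_operations(Level.ORDINAL))
# ['frequency counts', 'mode', 'chi-square', 'ranking', 'median']
print(is_metric(Level.ORDINAL), is_metric(Level.RATIO))  # False True
```

Nonparametric techniques remain available at every level, which is why the sketch only needs an explicit test for the parametric case.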
DEMONSTRATION PROBLEM 1.1
Because of increased competition for patients among providers and the need to determine how providers can better serve their clientele, hospital administrators sometimes administer a quality satisfaction survey to their patients after the patient is released. The following types of questions are sometimes asked on such a survey. These questions will result in what level of data measurement?
1. How long ago were you released from the hospital?
2. Which type of unit were you in for most of your stay?
__Coronary care
__Intensive care
__Maternity care
__Medical unit
__Pediatric/children's unit
__Surgical unit
3. In choosing a hospital, how important was the hospital's location?
(circle one)
Very Important   Somewhat Important   Not Very Important   Not at All Important
4. What was your body temperature when you were admitted to the
hospital?
5. Rate the skill of your doctor:
__Excellent __Very Good __Good __Fair __Poor
Solution: Question 1 is a time measurement with an absolute zero and is
therefore ratio-level measurement. A person who has been out of the hospital
for two weeks has been out twice as long as someone who has been out of
the hospital for one week.
Question 2 yields nominal data because the patient is asked only to categorize
the type of unit he or she was in. This question does not require
a hierarchy or ranking of the type of unit. Questions 3 and 5 are likely to
result in ordinal-level data. Suppose a number is assigned the descriptors
in these two questions. For question 3, "very important" might be assigned
a 4, "somewhat important" a 3, "not very important" a 2, and "not at all
important" a 1. Certainly, the higher the number, the more important is the
hospital's location. Thus, these responses can be ranked by selection. However,
the increases in importance from 1 to 2 to 3 to 4 are not necessarily
equal. This same logic applies to the numeric values assigned in question
5. In question 4, body temperature, if measured on a Fahrenheit or Celsius
scale, is interval in measurement.
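Why ratios are meaningful for the weeks in question 1 but not for the temperatures in question 4 can be checked numerically. Changing units on a ratio scale (weeks to days) only multiplies every value by a constant, so ratios survive. Converting Fahrenheit to Celsius also shifts the zero point, so ratios change while differences keep their meaning. The sketch below uses made-up example values (2 and 1 weeks, 100 and 50 degrees Fahrenheit).

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert Fahrenheit to Celsius; note the shift of the zero point."""
    return (f - 32) * 5 / 9


def weeks_to_days(w: float) -> float:
    """Convert weeks to days; a pure rescaling that keeps the same zero."""
    return w * 7


# Ratio scale: two weeks vs. one week is "twice as long" in either unit.
ratio_weeks = 2 / 1
ratio_days = weeks_to_days(2) / weeks_to_days(1)

# Interval scale: 100 F vs. 50 F looks like "twice as hot" ...
ratio_f = 100 / 50
# ... but the same two temperatures in Celsius give a different ratio.
ratio_c = fahrenheit_to_celsius(100) / fahrenheit_to_celsius(50)

# Differences, unlike ratios, are preserved up to the unit factor 5/9.
diff_f = 100 - 50
diff_c = fahrenheit_to_celsius(100) - fahrenheit_to_celsius(50)

print(ratio_weeks, ratio_days)      # 2.0 2.0
print(ratio_f, round(ratio_c, 2))   # 2.0 3.78
print(round(diff_c / diff_f, 4))    # 0.5556
```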
Fill in the blanks:
6. ________ is the lowest level of data measurement.
7. The Fahrenheit scale is an example of ________.
8. ________ is the highest level of data measurement.
State whether the following statements are true or false:
9. Ordinal-level data measurement is higher than the nominal level.
10. Ordinal data are nonmetric data and are sometimes referred to as
quantitative data.
11. Interval-level data may not always be numerical.
From the “Ministry of Statistics and Programme Implementation”,
Government of India website, download the “Energy Statistics” report (use
http:/[Link]/sites/default/files/publication_reports/Energy_Statisties_2018.paf
to download the report). In the report, identify one
variable for each of the four common levels of data measurement.
SUMMARY
Statistics is an important decision-making tool in business and is used
in virtually every area of business. In this course, the word statistics is
defined as the science of gathering, analyzing, interpreting, and presenting
numerical data.
The study of statistics can be subdivided into two main areas: descriptive
statistics and inferential statistics. Descriptive statistics result from gathering
data from a body, group, or population and reaching conclusions
only about that group. Inferential statistics are generated by gathering
sample data from a group, body, or population and reaching conclusions
about the larger group from which the sample was drawn.
Most business statistics studies contain variables, measurements,
and data. A variable is a characteristic of any entity being studied that is
capable of taking on different values. Examples of variables might include
monthly household food spending, time between arrivals at a restaurant,
and patient satisfaction rating. A measurement is when a standard process
is used to assign numbers to particular attributes or characteristics
of a variable. Measurements on monthly household food spending might
be taken in dollars, time between arrivals might be measured in minutes,
and patient satisfaction might be measured using a 5-point scale. Data
are recorded measurements. It is data that are analyzed by business
statisticians in order to learn more about the variables being studied.
The appropriate type of statistical analysis depends on the level of data
measurement, which can be (1) nominal, (2) ordinal, (3) interval, or (4)
ratio. Nominal is the lowest level, representing classification only of such
data as geographic location, sex, or Social Security number. The next level
is ordinal, which provides rank ordering measurements in which the intervals
between consecutive numbers do not necessarily represent equal distances.
Interval is the next to highest level of data measurement, in which
the distances represented by consecutive numbers are equal. The highest
level of data measurement is ratio, which has all the qualities of interval
measurement, but ratio data contain an absolute zero and ratios between
numbers are meaningful. Interval and ratio data sometimes are called
metric or quantitative data. Nominal and ordinal data sometimes are called
nonmetric or qualitative data.
Two major types of inferential statistics are (1) parametric statistics and
(2) nonparametric statistics. Use of parametric statistics requires interval
or ratio data and certain assumptions about the distribution of the data.
The techniques presented in this text are largely parametric. If data are
only nominal or ordinal in level, nonparametric statistics must be used.
1. Census: The process of gathering data from the whole population for a
given measurement of interest is called a census.
2. Variable: A variable is a characteristic of any entity being studied that is
capable of taking on different values.
3. Population: A population is a collection of persons, objects, or items of
interest.
4. Parameter: A descriptive measure of the population is called a
parameter.
5. Sample: A sample is a portion of the whole and, if properly taken, is
representative of the whole.
6. Statistic: A descriptive measure of a sample is called a statistic.
7. Measurement: A measurement is when a standard process is used to
assign numbers to particular attributes or characteristics of a variable.
8. Data: Data are recorded measurements.
9. Descriptive statistics: If a business analyst uses data gathered on a
group to describe or reach conclusions about that same group, the statistics
are called descriptive statistics.
10. Inferential statistics: If a researcher gathers data from a sample and
uses the statistics generated to reach conclusions about the population
from which the sample was taken, the statistics are inferential statistics.
11. Nominal-level data: The lowest level of data measurement is the nominal
level. Numbers representing nominal-level data can be used only to
classify or categorize.
12. Ordinal-level data: This measurement is higher than the nominal level.
In addition to the nominal-level capabilities, ordinal-level measurement
can be used to rank or order people or objects.
13. Interval-level data: This is a level of data in which the distances between
consecutive numbers have meaning and the data are always numerical.
14. Ratio-level data: This measurement is the highest level of data measurement.
Ratio data have the same properties as interval data, but ratio data
have an absolute zero, and the ratio of two numbers is meaningful.
DESCRIPTIVE QUESTIONS
1.1. Give a specific example of data that might be gathered from each
of the following business disciplines: accounting, finance, human
resources, marketing, information systems, production, and man-
agement. An example in the marketing area might be “number of
sales per month by each salesperson.”
1.2. State examples of data that can be gathered for decision-making purposes
from each of the following industries: manufacturing, insurance,
travel, retailing, communications, computing, agriculture, banking,
and healthcare. An example in the travel industry might be the cost of
business travel per day in various European cities.
1.3. Give an example of descriptive statistics in the recorded music industry.
Give an example of how inferential statistics could be used in the
recorded music industry. Compare the two examples. What makes
them different?
1.4. Suppose you are an operations manager for a plant that manufactures
batteries. Give an example of how you could use descriptive statistics to
make better managerial decisions. Give an example of how you could
use inferential statistics to make better managerial decisions.
1.5. There are many types of information that might help the manager of a
large department store run the business more efficiently and better understand
how to improve sales. Think about this in such areas as sales, customers,
human resources, inventory, suppliers, etc., and list five variables
that might produce information that could aid the manager in his or her
job. Write a sentence or two describing each variable, and briefly discuss
some numerical observations that might be generated for each variable.
1.6. Suppose you are the owner of a medium-sized restaurant in a small
city. What are some variables associated with different aspects of the
business that might be helpful to you in making business decisions
about the restaurant? Name four of these variables, and for each variable,
briefly describe a numerical observation that might be the result
of measuring the variable.
1.7. Classify each of the following as nominal, ordinal, interval, or ratio data.
(a) The time required to produce each tire on an assembly line
(b) The number of quarts of milk a family drinks in a month
(c) The ranking of four machines in your plant after they have been
designated as excellent, good, satisfactory, and poor
(d) The telephone area code of clients in the United States
(e) The age of each of your employees
(f) The dollar sales at the local pizza shop each month
(g) An employee's identification number
(h) The response time of an emergency unit
1.8. The Rathburn Manufacturing Company makes electric wiring, which
it sells to contractors in the construction industry. Approximately 900
electric contractors purchase wire from Rathburn annually. Rathburn's
director of marketing wants to determine electric contractors' satisfaction
with Rathburn's wire. He developed a questionnaire that yields
a satisfaction score between 10 and 50 for participant responses.
A random sample of 35 of the 900 contractors is asked to complete a
satisfaction survey. The satisfaction scores for the 35 participants are
averaged to produce a mean satisfaction score.
(a) What is the population for this study?
(b) What is the sample for this study?
(c) What is the statistic for this study?
(d) What would be a parameter for this study?
SOLUTIONS FOR DESCRIPTIVE QUESTIONS
1.1. Examples of data in functional areas:
accounting - cost of goods, salary expense, depreciation, utility costs,
taxes, equipment inventory, etc.
finance - World Bank bond rates, number of failed savings and loans,
measured risk of common stocks, stock dividends, foreign exchange
rate, liquidity rates for a single family, etc.
human resources - salaries, size of engineering staff, years of experience,
age of employees, years of education, etc.
marketing - number of units sold, dollar sales volume, forecast sales,
size of sales force, market share, measurement of consumer motivation,
measurement of consumer frustration, measurement of brand
preference, attitude measurement, measurement of consumer risk, etc.
information systems - CPU time, size of memory, number of workstations,
storage capacity, percent of professionals who are connected
to a computer network, dollar assets of company computing, number of
"hits" on the Internet, time spent on the Internet per day, percentage of
people who use the Internet, retail dollars spent in e-commerce, etc.
production - number of production runs per day, weight of a product,
assembly time, number of defects per run, temperature in the plant,
amount of inventory, turnaround time, etc.
management - measurement of union participation, measurement of
employer support, measurement of tendency to control, number of
subordinates reporting to a manager, measurement of leadership
style, etc.
1.2. Examples of data in business industries:
manufacturing - size of punched hole, number of rejects, amount of
inventory, amount of production, number of production workers, etc.
insurance - number of claims per month, average amount of life
insurance per family head, life expectancy, cost of repairs for major
auto collision, average medical costs incurred for a single female over
45 years of age, etc.
travel - cost of airfare, number of miles traveled for ground-transported
vacations, number of nights away from home, size of traveling party,
amount spent per day besides lodging, etc.
retailing - inventory turnover ratio, sales volume, size of sales force,
number of competitors within 2 miles of retail outlet, area of store,
number of salespeople, etc.
communications - cost per minute, number of phones per office, miles
of cable per customer headquarters, minutes per day of long-distance
usage, number of operators, time between calls, etc.
computing - age of company hardware, cost of software, number of
CAD/CAM stations, age of computer operators, measure to evaluate
competing software packages, size of database, etc.
agriculture - number of farms per county, farm income, number of
acres of corn per farm, wholesale price of a gallon of milk, number of
livestock, grain storage capacity, etc.
banking - size of deposit, number of failed banks, amount loaned to
foreign banks, number of tellers per drive-in facility, average amount
of withdrawal from automatic teller machine, federal reserve discount
rate, etc.
healthcare - number of patients per physician per day, average cost of
hospital stay, average daily census of hospital, time spent waiting to see
a physician, patient satisfaction, number of blood tests done per week.
1.3. Descriptive statistics in the recorded music industry:
1. RCA total sales of compact discs this week, number of artists under
contract to a company at a given time.
2. total dollars spent on advertising last month to promote an album.
3. number of units produced in a day.
4. number of retail outlets selling the company's products.
Inferential statistics in the recorded music industry:
1. Measure the amount spent per month on recorded music for a few
consumers, then use that figure to infer the amount for the population.
2. Determination of market share for rap music by randomly selecting
a sample of 500 purchasers of recorded music.
3. Determination of top ten single records by sampling the number of
requests at a few radio stations.
4. Estimation of the average length of a single recording by taking a
sample of records and measuring them.
The difference between descriptive and inferential statistics lies mainly
in the usage of the data. These descriptive examples all gather data
from every item in the population about which the description is being
made. For example, RCA measures the sales on all its compact discs for
a week and reports the total.
In each of the inferential statistics examples, a sample of the population
is taken and the population value is estimated or inferred from the
sample. For example, it may be practically impossible to determine
the proportion of buyers who prefer rap music. However, a random
sample of buyers can be contacted and interviewed for music preference.
The results can then be used to infer population market share.
1.4. Descriptive statistics in manufacturing batteries to make better
decisions:
1. total number of worker hours per plant per week - helps management
understand labor costs, work allocation, productivity, etc.
2. company sales volume of batteries in a year - helps management
decide if the product is profitable and how much to advertise in the coming
year, and can be compared to costs to determine profitability.
3. total amount of sulfuric acid purchased per month for use in battery
production - can be used by management to study wasted inventory,
scrap, etc.
Inferential statistics in manufacturing batteries to make decisions:
1. Take a sample of batteries and test them to determine the average
shelf life - use the sample average to reach conclusions about all
batteries of this type. Management can then make labeling and
advertising claims. They can compare these figures to the shelf life
of competing batteries.
2. Take a sample of battery consumers and determine how many
batteries they purchase per year. Infer to the entire population -
management can use this information to estimate market potential
and penetration.
3. Interview a random sample of production workers to determine
attitude towards company management - management can use the
survey results to ascertain employee morale and to direct efforts
towards creating a more positive working environment, which,
hopefully, results in greater productivity.
1.5. 1. Size of sale ($) per customer in men's formal wear. Either by
taking a sample or using a census, management could compute the
average sale in men's formal wear over a weekly period and compare
the number to the same average taken a year ago or a month ago to
determine if more is being sold per customer. Other variables might
include number of sales per hour, number of people entering the
department per day, number of dress shirts sold per day, etc.
2. Number of employees working per day. This variable could indicate
the day of the week (certain days have more or less sales), sales
activity (how sales are doing overall), or even health of associates.
Other variables might include percent of employees absent due to
illness, average number of hours worked per week per employee,
number of open positions, etc.
3. Inventory turnover rate. How fast are items in the store selling?
Other variables might include reorder rate, percent of storage space
utilized, number of stockouts per week, etc.
4. Number of customers that enter the store per hour. This figure
will vary by day, time of day, and season. Comparing figures on this
variable from period to period can give some indication of sales
trends, which can help drive human resource planning, etc. Other
variables might include amount of time spent per customer in
the store per visit, distance that customers travel to shop in the
store, number of referrals that customers make to other people
annually, etc.
5. Percentage of people paying with cash. Percentage of people using
credit cards. These can be used to expedite pay systems, investigate
employee theft, calculate surcharges associated with credit cards,
etc. Other variables might include average time per checkout,
average wait time in pay line, etc.
1.6. 1. Size of bill or tab. This variable is the total amount in dollars spent
by a patron per visit to the restaurant. The bill or tab could be for an
individual or a group and would include both food and beverages if
they are all included in the bill. Of course, the measurement would
be in dollars. This information could be very useful for the manager
or owner to know the average size of a bill, both in projecting total
revenues over a period and as a baseline before a marketing effort to
increase sales.
2. Percentage of capacity filled. This variable could be measured
at various intervals, times, and days of the week. The measurement
would be calculated by taking the number of patrons in the restaurant
at any one time divided by the total number of seats in the restaurant
(capacity). From this, management could make staffing decisions for
various times and days of the week. In addition, management could
make decisions about when to expand, how much to advertise, and/or
when to run specials.
3. Length of stay. The measurement is how many minutes people are
actually in the restaurant from the time they are assigned a table
until they leave. From this, management could determine
customer turnover rates, which have capacity implications. That is,
how many times in a day is an average table "turned over"? If people
stay longer, do they spend more?
4. Number of arrivals per 5-minute interval. The measurement is
how many customers arrive at the front door to be greeted by the
maître d' in any given five-minute period. This figure will likely vary
by day of the week, season of the year, and time of day. Management
can use this information for staffing decisions and planning.
1.7. (a) ratio
(b) ratio
(c) ordinal
(d) nominal
(e) ratio
(f) ratio
(g) nominal
(h) ratio
1.8. (a) The population for this study is the 900 electric contractors who
purchased Rathburn wire.
(b) The sample is the randomly chosen group of thirty-five contractors.
(c) The statistic is the average satisfaction score for the sample of
thirty-five contractors.
(d) The parameter is the average satisfaction score for all 900 electric
contractors in the population.
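To make the population/sample and parameter/statistic distinction in 1.8 concrete, the sketch below simulates the Rathburn study. The satisfaction scores are entirely hypothetical, since the question supplies none: one random integer from 10 to 50 is generated for each of the 900 contractors, and a simple random sample of 35 contractors is drawn without replacement using Python's standard `random` and `statistics` modules.

```python
import random
from statistics import mean

# Hypothetical stand-in data: one satisfaction score (10 to 50) for each
# of the 900 contractors in the population. A fixed seed keeps runs repeatable.
rng = random.Random(42)
population_scores = [rng.randint(10, 50) for _ in range(900)]

# Parameter: a descriptive measure of the whole population.
parameter = mean(population_scores)

# Sample: 35 distinct contractors chosen at random (without replacement).
sample_ids = rng.sample(range(900), k=35)
sample_scores = [population_scores[i] for i in sample_ids]

# Statistic: the same descriptive measure computed on the sample only.
statistic = mean(sample_scores)

print(f"parameter (mean of all 900): {parameter:.2f}")
print(f"statistic (mean of sample of 35): {statistic:.2f}")
```

In a real study only the statistic would be observed; the parameter is computed here solely because the simulated population is fully known.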
ANSWERS AND HINTS
ANSWERS FOR SELF ASSESSMENT QUESTIONS
Topic                            Q. No.   Answer
1.2 Basic Statistical Concepts   1.       Population
                                 2.
                                 3.       Inductive statistics
                                 4.       True
                                 5.       False
1.3 Data Measurement             6.       Nominal level
                                 7.       Interval-level data
                                 8.       Ratio-level data
                                 9.       True
                                 10.      False
                                 11.      False
CHAPTER 2
CHARTS AND GRAPHS
CONTENTS
2.1 Introduction
2.2 Frequency Distributions
2.2.1 Class Midpoint
2.2.2 Relative Frequency
2.2.3 Cumulative Frequency
Self Assessment Questions
Activity
2.3 Quantitative Data Graphs
Histograms
Using Histograms to Get an Initial Overview of the Data
Frequency Polygons
Ogives
Self Assessment Questions
2.4 Qualitative Data Graphs
2.4.1 Pie Charts
2.4.2 Bar Graph
Self Assessment Questions
Activity
2.5 Charts and Graphs for Two Variables
Cross Tabulation
Scatter Plot
Self Assessment Questions
Activity
Summary
Descriptive Questions
Solutions for Descriptive Questions
Answers and Hints
DATA VISUALIZATION USING VIZARD
Infruid Labs is a Hyderabad-based business analytics and data
visualization solutions company. The company offers analytics software
that can handle big data to provide actionable business intelligence
to customers by helping them find patterns in large datasets.
According to the Gartner IT Glossary, "Big data is high-volume, high-velocity
and/or high-variety information assets that demand cost-effective, innovative
forms of information processing that enable enhanced insight, decision
making, and process automation."
The company was founded by Mahesh Yella, a management graduate
from the Indian School of Business, Hyderabad, and a B.Tech. graduate of the
Indian Institute of Technology Madras, to help organizations make data-driven
decisions. The company is helping its clients in different business
areas such as product strategy, human resource management, new product
development, operations management, competitor analysis, financial
analysis, etc. The company was named best Enterprise Software
Startup by HYSEA in 2016 and also received one of NASSCOM's 50 Emerging
Startups awards in 2017.
Vizard is Infruid's flagship analytics product. As an instant search-driven
analytics platform, Vizard helps users derive deep insights from
their data. Vizard has helped its clients democratize analytics across the
organization, so that everyone in the organization who needs to make
decisions has easy access to instant insights. Some of the best-run
businesses are using Vizard across Sales, Marketing, Finance, HR and
Operations departments.
Vizard's consumer-grade user experience hides all the complexity of big
data processing behind a simple search box, so users can type any question
about their business and slice and dice their data, even at big data
scale, to get visual insights instantaneously.
Source: htps/[Link]
A few examples where Infruid's analytics platform helped its clients:
• A leading industrial equipment manufacturing company used Vizard
to help its sales team optimize selling prices and maximize revenue.
The optimal selling price for each of thousands of products was determined
in Vizard using historical data. Vizard helped the company
increase its revenue by facilitating data-driven pricing decisions.
• A leading steel manufacturer used Vizard to derive insights from its
inventory data and to interact with its Enterprise Resource Planning
(ERP) software. Vizard has helped the company plan its inventory
stock movement across its various distributed warehouses and reduce
its inventory holding costs.
• A leading renewable energy company used Vizard to run its Network
Operations Centre, analyzing Internet of Things (IoT) data from its
solar power plants to remotely monitor geographically distributed
power plants. Vizard also helped the company lower its operational
expenditure through preventive maintenance.
WAY AHEAD
The company has planned to release enhanced data discovery capabilities
to support augmented data discovery. The augmented data discovery
feature uses Artificial Intelligence (AI) to autodetect patterns of interest
in data and bubble them up to users' attention.
Source: Adapted from Srishti Deoras, "[Link] Of Enterprises Do Not Have A Clear Separation
Of Ownership Of Data And Insights, Says Mahesh Yella Of Infruid", Analytics India Magazine,
ttps/[Link]:not slear-separation-ownorship-data-insights-; Gartner IT Glossary, Big Data.
The overall objective of Chapter 2 is for you to master several techniques
for summarizing and depicting data, thereby enabling you to:
• Create a frequency distribution from a set of data.
• Create and evaluate different types of quantitative data graphs,
including histograms, frequency polygons, ogives, dot plots, and
stem-and-leaf plots, in order to evaluate the data being graphed.
• Create and evaluate different types of qualitative data graphs,
including pie charts, bar graphs, and Pareto charts, in order to
analyze the data being graphed.
• Create a cross-tabulation table and analyze basic
two-variable scatter plots of numerical data.
2.1 INTRODUCTION
In Chapters 2 and 3 many techniques are presented for reformatting or
reducing data so that the data are more manageable and can assist decision
makers more effectively. Some of the most effective mechanisms for presenting
data in a form meaningful to decision makers are graphical depictions.
This chapter focuses on graphical tools for summarizing and presenting
data. Through graphs and charts, the decision maker can often get an overall
picture of the data and reach some useful conclusions merely by studying
the chart or graph. Key characteristics of graphs often suggest appropriate
choices among potential numerical methods (discussed in later chapters) for
analyzing data. Visual representations of data are often much more effective
communication tools than tables of numbers in business meetings.
A first step in exploring and analyzing data is to reduce important and sometimes
expensive data to a graphic picture that is clear, concise, and consistent
with the message of the original data. Converting data to graphics can
be creative and artful. In this chapter, guidelines are provided for selecting
appropriate graphical representations for data sets. Charts and graphs
discussed in Chapter 2 include histograms, frequency polygons, ogives,
dot plots, stem-and-leaf plots, bar charts, pie charts, and Pareto charts for
one-variable data and both cross-tabulation tables and scatter plots for
two-variable numerical data.
2.2 FREQUENCY DISTRIBUTIONS
Raw data, or data that have not been summarized in any way, are sometimes
referred to as ungrouped data. As an example, Table 2.1 contains 60 years
of raw data of the unemployment rates for Canada. Data that have been organized
into a frequency distribution are called grouped data. Table 2.2 presents
a frequency distribution for the data displayed in Table 2.1.
The distinction between ungrouped and grouped data is important because
the calculation of statistics differs between the two types of data. Several
of the charts and graphs presented in this chapter are constructed from
grouped data.
[Table 2.1: 60 years of raw unemployment-rate data for Canada]
One particularly useful tool for grouping data is the frequency distribution,
which is a summary of data presented in the form of class intervals and
frequencies. How is a frequency distribution constructed from raw data? That
is, how are frequency distributions like the one displayed in Table 2.2
constructed from raw data like those presented in Table 2.1? Frequency
distributions are relatively easy to construct. Although some guidelines and rules
of thumb help in their construction, frequency distributions vary in final
shape and design, even when the original raw data are identical. In a sense,
frequency distributions are constructed according to individual business
researchers' taste.
When constructing a frequency distribution, the business researcher should
first determine the range of the raw data. The range often is defined as the
difference between the largest and smallest numbers. The range for the data in
Table 2.1 is 9.7 (12.0 - 2.3).
The second step in constructing a frequency distribution is to determine how
many classes it will contain. One rule of thumb is to select between 3 and
15 classes. If the frequency distribution contains too few classes, the data
summary may be too general to be useful. Too many classes may result in a
frequency distribution that does not aggregate the data enough to be helpful.
The final number of classes is arbitrary. The business researcher arrives at
a number by examining the range and determining a number of classes that
will span the range adequately and also be meaningful to the user. The data
in Table 2.1 were grouped into six classes for Table 2.2.
After selecting the number of classes, the business researcher must determine
the width of the class interval. An approximation of the class width can
be calculated by dividing the range by the number of classes. For the data in
Table 2.1, this approximation would be 9.7/6 = 1.62. Normally, the number is
rounded up to the next whole number, which in this case is 2. The frequency
distribution must start at a value equal to or lower than the lowest number
of the ungrouped data and end at a value equal to or higher than the highest
number. The lowest unemployment rate is 2.3 and the highest is 12.0,
so the business researcher starts the frequency distribution at 1 and ends it
at 13. Table 2.2 contains the completed frequency distribution for the data
in Table 2.1. Class endpoints are selected so that no value of the data can fit
into more than one class. The class interval expression "under" in the
distribution of Table 2.2 avoids such a problem.
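The range, class-width, and "under" steps above can be sketched in Python. The helper names below are hypothetical; the sketch assumes the rounding rule stated in the text (round the approximate width up to the next whole number) and treats each "a-under-b" class as the half-open interval a <= x < b. Only the two extremes of Table 2.1 (2.3 and 12.0) are used, because the range depends on nothing else.

```python
import math


def class_width(values: list[float], num_classes: int) -> int:
    """Approximate class width (range / number of classes), rounded up
    to the next whole number as the text describes."""
    data_range = max(values) - min(values)
    return math.ceil(data_range / num_classes)


def build_classes(start, width, num_classes):
    """Return (lower, upper) endpoints of 'lower-under-upper' classes."""
    return [(start + i * width, start + (i + 1) * width)
            for i in range(num_classes)]


def class_index(value, classes) -> int:
    """Locate the class holding value, using lower <= value < upper,
    so a value on a boundary falls into exactly one class."""
    for i, (lower, upper) in enumerate(classes):
        if lower <= value < upper:
            return i
    raise ValueError(f"{value} lies outside the distribution")


extremes = [2.3, 12.0]                 # smallest and largest values in Table 2.1
width = class_width(extremes, 6)       # ceil(9.7 / 6) = ceil(1.62) = 2
classes = build_classes(1, width, 6)   # start at 1, as in Table 2.2
print(classes)  # [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
```

With this convention a rate of exactly 3.0 lands in "3-under 5" rather than "1-under 3", which is the ambiguity the word "under" is meant to remove.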
2.2.1 CLASS MIDPOINT
The midpoint of each class interval is called the class midpoint and is sometimes
referred to as the class mark. It is the value halfway across the class
interval and can be calculated as the average of the two class endpoints. For
example, in the distribution of Table 2.2, the midpoint of the class interval
3-under 5 is 4, or (3 + 5)/2.
The class midpoint is important because it becomes the representative value
for each class in most group statistics calculations. The third column in Table 2.3
contains the class midpoints for all classes of the data from Table 2.2.
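As a quick check of the midpoint formula, the short Python sketch below averages the two endpoints of each of the six classes used for Table 2.2 (1-under 3 through 11-under 13). The function name `class_midpoint` is illustrative only.

```python
def class_midpoint(lower: float, upper: float) -> float:
    """Class midpoint (class mark): the average of the two class endpoints."""
    return (lower + upper) / 2


# Endpoints of the six classes used for Table 2.2.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
midpoints = [class_midpoint(lo, hi) for lo, hi in classes]
print(midpoints)  # [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```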
2.2.2 RELATIVE FREQUENCY
Relative frequency is the proportion of the total frequency that is in any given
class interval in a frequency distribution. Relative frequency is the individual
class frequency divided by the total frequency. For example, from Table
2.3, the relative frequency for the class interval 5-under 7 is 13/60 = .2167.
Table 2.3: Class Midpoints, Relative Frequencies, and Cumulative Frequencies
Class                      Class      Relative    Cumulative
Interval       Frequency   Midpoint   Frequency   Frequency
1-under 3           4          2        .0667          4
3-under 5          12          4        .2000         16
5-under 7          13          6        .2167         29
7-under 9          19          8        .3167         48
9-under 11          7         10        .1167         55
11-under 13         5         12        .0833         60
Total              60
Consideration of the relative frequency is preparatory to the study of probability
in Chapter 4. Indeed, if values were selected randomly from the data
in Table 2.1, the probability of drawing a number that is "5-under 7" would
be .2167, the relative frequency for that class interval. The fourth column of
Table 2.3 lists the relative frequencies for the frequency distribution of Table 2.2.
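The relative-frequency calculation can be reproduced in a few lines of Python. The sketch below divides each class frequency of Table 2.3 (4, 12, 13, 19, 7, 5) by the total of 60; rounding to four decimals matches the table's presentation.

```python
# Class frequencies from Table 2.3, 1-under 3 through 11-under 13.
frequencies = [4, 12, 13, 19, 7, 5]
total = sum(frequencies)  # 60

# Relative frequency = class frequency / total frequency.
relative = [f / total for f in frequencies]

print([round(r, 4) for r in relative])
# [0.0667, 0.2, 0.2167, 0.3167, 0.1167, 0.0833]
```

Because every observation falls in exactly one class, the unrounded relative frequencies sum to 1, which is what lets them be read as probabilities.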
2.2.3 CUMULATIVE FREQUENCY
The cumulative frequency is a running total of frequencies through the classes of a frequency distribution. The cumulative frequency for each class interval is the frequency for that class interval added to the preceding cumulative total. In Table 2.3 the cumulative frequency for the first class is the same as the class frequency: 4. The cumulative frequency for the second class interval is the frequency of that interval (12) plus the frequency of the first interval (4), which yields a new cumulative frequency of 16. This process continues through the last interval, at which point the cumulative total equals the sum of the frequencies (60). The concept of cumulative frequency is used in many areas, including sales cumulated over a fiscal year, sports scores during a contest (cumulated points), years of service, points earned in a course, and costs of doing business over a period of time. Table 2.3 gives cumulative frequencies for the data in Table 2.2.
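The running-total rule can be written as a short loop. This is a minimal sketch, and `cumulative_frequencies` is a hypothetical helper name. Python's standard `itertools.accumulate` produces the same result.

```python
def cumulative_frequencies(frequencies):
    """Running total: each class frequency added to the preceding cumulative total."""
    running = 0
    totals = []
    for f in frequencies:
        running += f
        totals.append(running)
    return totals


# Class frequencies from Table 2.3.
print(cumulative_frequencies([4, 12, 13, 19, 7, 5]))  # [4, 16, 29, 48, 55, 60]
```

The last cumulative value always equals the total frequency, which provides a quick check on the table.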
DEMONSTRATION PROBLEM 2.1
The following data from the Federal Home Loan Mortgage Corporation are the average monthly 30-year fixed-rate mortgage interest rates for a recent 40-month period.
5.06 4.89 … 4.75 …
4.95 … 4.95 4.07 …
…
…
5.10 4.71 4.27 3.91 3.34
Construct a frequency distribution for these data. Calculate and display the class midpoints, relative frequencies, and cumulative frequencies for this frequency distribution.
Solution: How many classes should this frequency distribution contain? The range of the data is 1.76 (5.10 - 3.34). If 8 classes are used, each class width is approximately:
$$\text{Class width} = \frac{\text{Range}}{\text{Number of classes}} = \frac{1.76}{8} = 0.22$$
If a class width of .25 is used, a frequency distribution can be constructed with endpoints that are more uniform looking and allow the information to be presented in categories more familiar to mortgage interest rate users. The first endpoint must be 3.34 or lower to include the smallest value; the last endpoint must be 5.10 or higher to include the largest value. In this case, the frequency distribution begins at 3.25 and ends at 5.25. The resulting frequency distribution, class midpoints, relative frequencies, and cumulative frequencies are listed in the following table:
Class Interval    Frequency   Class Midpoint   Relative Frequency   Cumulative Frequency
3.25-under 3.50       3           3.375              .075                    3
3.50-under 3.75       4           3.625              .100                    7
3.75-under 4.00       7           3.875              .175                   14
4.00-under 4.25       3           4.125              .075                   17
4.25-under 4.50       4           4.375              .100                   21
4.50-under 4.75       6           4.625              .150                   27
4.75-under 5.00      10           4.875              .250                   37
5.00-under 5.25       3           5.125              .075                   40
Total                40                             1.000
The frequencies and relative frequencies of these data reveal the mortgage interest rate classes that are likely to occur during this period of time. Overall, the mortgage rates are distributed relatively evenly, with the 4.75-under 5.00 class interval containing the greatest frequency (10), followed by the 3.75-under 4.00 class interval (7) and the 4.50-under 4.75 interval (6).
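The solution's arithmetic can be checked in code, starting from the class frequencies in the table rather than from the raw rates. The following is a minimal sketch that assumes the values stated above: a smallest rate of 3.34, a largest of 5.10, 8 classes, a start of 3.25, and a width of .25. `Decimal` is used so that endpoints such as 3.50 stay exact.

```python
from decimal import Decimal

# Demonstration Problem 2.1 inputs taken from the solution above.
smallest, largest = Decimal("3.34"), Decimal("5.10")
num_classes = 8
raw_width = (largest - smallest) / num_classes  # 1.76 / 8 = 0.22

start, width = Decimal("3.25"), Decimal("0.25")  # the rounded choices used above
frequencies = [3, 4, 7, 3, 4, 6, 10, 3]
total = sum(frequencies)  # 40

# Every value must fall inside the first and last endpoints.
assert start <= smallest and start + num_classes * width >= largest

rows = []
running = 0
for i, f in enumerate(frequencies):
    lower = start + i * width
    upper = lower + width
    running += f  # cumulative frequency
    rows.append((f"{lower}-under {upper}", f, (lower + upper) / 2, f / total, running))

print(raw_width)  # 0.22
for label, f, mid, rel, cum in rows:
    print(f"{label:<16}{f:>4}{mid:>8}{rel:>7.3f}{cum:>5}")
```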
1. ______ is the individual class frequency divided by the total frequency.
2. Class midpoint is also sometimes referred to as the ______.
3. The cumulative frequency for each class interval is the ______ for that class interval added to the preceding cumulative total.
List three specific uses of cumulative frequencies in business.
2.3 QUANTITATIVE DATA GRAPHS
One of the most effective mechanisms for presenting data in a form meaningful to decision makers is graphical depiction. Through graphs and charts, the decision maker can often get an overall picture of the data and reach some useful conclusions merely by studying the chart or graph. Converting data to graphics can be creative and artful. Often the most difficult step in this process is to reduce important and sometimes expensive data to a graphic picture that is both clear and concise and yet consistent with the message of
the original data. One of the most important uses of graphical depiction in statistics is to help the researcher determine the shape of a distribution. Data graphs can generally be classified as quantitative or qualitative. Quantitative data graphs are plotted along a numerical scale, and qualitative graphs are plotted using non-numerical categories. In this section, we will examine five types of quantitative data graphs: (1) histogram, (2) frequency polygon, (3) ogive, (4) dot plot, and (5) stem-and-leaf plot.
2.3.1 HISTOGRAMS
One of the more widely used types of graphs for quantitative data is the histogram. A histogram is a series of contiguous rectangles that represent the frequency of data in given class intervals. If the class intervals used along the horizontal axis are equal, then the heights of the rectangles represent the frequency of values in a given class interval. If the class intervals are unequal, then the areas of the rectangles can be used for relative comparisons of class frequencies. Construction of a histogram involves labelling the x-axis (abscissa) with the class endpoints and the y-axis (ordinate) with the frequencies, drawing a horizontal line segment from class endpoint to class endpoint at each frequency value, and connecting each line segment vertically from the frequency value to the x-axis to form a series of rectangles. Figure 2.1 is a histogram of the frequency distribution in Table 2.2.
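The construction steps reduce to one rectangle per class: a left and right edge at the class endpoints and a height. The following is a minimal sketch that only computes the rectangle coordinates and leaves the drawing to whatever charting tool is available. For unequal widths it uses frequency density (frequency / width) as the height, which is a common convention assumed here so that each rectangle's area tracks its class frequency. The equal-width check assumes exact endpoints, such as integers or `Decimal` values.

```python
def histogram_bars(endpoints, frequencies):
    """Return (left, right, height) for each histogram rectangle.

    With equal class widths, the height is the class frequency. With unequal
    widths, the height is frequency / width (a frequency density), so each
    rectangle's AREA, not its height, is proportional to the class frequency.
    """
    widths = [hi - lo for lo, hi in zip(endpoints, endpoints[1:])]
    equal_widths = len(set(widths)) == 1
    bars = []
    for lo, hi, f in zip(endpoints, endpoints[1:], frequencies):
        height = f if equal_widths else f / (hi - lo)
        bars.append((lo, hi, height))
    return bars


# Table 2.2: class endpoints 1, 3, ..., 13 and the class frequencies.
for bar in histogram_bars([1, 3, 5, 7, 9, 11, 13], [4, 12, 13, 19, 7, 5]):
    print(bar)
```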
A histogram is a useful tool for differentiating the frequencies of class intervals. A quick glance at a histogram reveals which class intervals produce the highest frequency totals. Figure 2.1 clearly shows that the class interval 7-under 9 yields by far the highest frequency count (19). Examination of the histogram reveals where large increases or decreases occur between classes, such as from the 1-under 3 class to the 3-under 5 class, an increase of 8, and from the 7-under 9 class to the 9-under 11 class, a decrease of 12.
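The increases and decreases read off the histogram are differences between adjacent class frequencies. The sketch below computes them for the Table 2.2 frequencies and picks out the largest rise and the largest drop.

```python
labels = ["1-under 3", "3-under 5", "5-under 7", "7-under 9", "9-under 11", "11-under 13"]
frequencies = [4, 12, 13, 19, 7, 5]  # Table 2.2

# Change in frequency from each class to the next one.
changes = [
    (labels[i], labels[i + 1], frequencies[i + 1] - frequencies[i])
    for i in range(len(frequencies) - 1)
]
for before, after, diff in changes:
    print(f"{before} -> {after}: {diff:+d}")

largest_rise = max(changes, key=lambda c: c[2])  # ('1-under 3', '3-under 5', 8)
largest_drop = min(changes, key=lambda c: c[2])  # ('7-under 9', '9-under 11', -12)
```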
Note that the scales used along the x- and y-axes for the histogram in Figure 2.1 are almost identical. However, because ranges of meaningful numbers for the two variables being graphed often differ considerably, the graph may have different scales on the two axes.
Figure 2.1: Histogram of Canadian Unemployment Data (x-axis: Unemployment Rates for Canada; y-axis: Frequency)
Figure 2.2 shows what the histogram of unemployment rates would look like if the scale on the y-axis were more compressed than that on the x-axis. Notice that with the compressed graph, Figure 2.2, there appears to be less difference between the heights of the rectangles than in Figure 2.1, implying that the differences in frequencies for the compressed graph are not as great as they are in Figure 2.1. It is important that the user of the graph clearly understands the scales used for the axes of a histogram. Otherwise, a graph's creator can “lie with statistics” by stretching or compressing a graph to make a point.*
Figure 2.2: Histogram of Canadian Unemployment Data (y-axis compressed; x-axis: Unemployment Rates for Canada; y-axis: Frequency)
*It should be pointed out that the software package Excel uses the term histogram to refer to a frequency distribution. However, by checking Chart Output in the Excel Histogram dialog box, a graphical histogram is also created.
2.3.2 USING HISTOGRAMS TO GET AN INITIAL OVERVIEW OF THE DATA
Because of the widespread availability of computers and statistical software packages to business researchers and decision makers, the histogram continues to grow in importance in yielding information about the shape of the distribution of a large database, the variability of the data, the central location of the data, and outlier data. Although most of these concepts are presented in Chapter 3, the notion of the histogram as an initial tool to assess these data characteristics is presented here.
A business researcher measured the volume of stocks traded on Wall Street three times a month for nine years, resulting in a database of 324 observations. Suppose a financial decision maker wants to use these data to reach some conclusions about the stock market.
Figure 2.3 shows a histogram of these data. What can we learn from this histogram? Virtually all stock market volumes fall between zero and 1 billion shares. The distribution takes on a shape that is high on the left end and tapered to the right. In Chapter 3 we will learn that the shape of this distribution is skewed toward the right end. In statistics, it is often useful to determine whether data are approximately normally distributed (bell-shaped curve), as shown in Figure 2.4.
Figure 2.3: Histogram of Stock Volumes (x-axis: stock volume, 0 to 1 billion shares)
Figure 2.4: Normal Distribution
We can see by examining the histogram in Figure 2.3 that the stock market volume data are not normally distributed. Although the centre of the histogram is located near 500 million shares, a large portion of stock volume observations falls in the lower end of the data, somewhere between 100 million and 400 million shares. In addition, the histogram shows some outliers in the upper end of the distribution. Outliers are data points that appear outside of the main body of observations and may represent phenomena that differ from those represented by other data points. By observing the histogram, we notice a few data observations near 1 billion. One could conclude that on a few stock market days an unusually large volume of shares is traded. These and other insights can be gleaned by examining the histogram, and they show that histograms play an important role in the initial analysis of data.
2.3.3 FREQUENCY POLYGONS
A frequency polygon, like the histogram, is a graphical display of class frequencies. However, instead of using rectangles like a histogram, a frequency polygon plots each class frequency as a dot at the class midpoint and connects the dots with a series of line segments. Construction of a frequency polygon begins by scaling class midpoints along the horizontal axis and the frequency scale along the vertical axis. A dot is plotted for the associated frequency value at each class midpoint. Connecting these midpoint dots completes the graph. Figure 2.5 shows a frequency polygon of the distribution data from Table 2.2 produced by using the software package Excel. The information gleaned from frequency polygons and histograms is similar. As with the histogram, changing the scales of the axes can compress or stretch a frequency polygon, which affects the user's impression of what the graph represents.
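The dots of a frequency polygon are simply (class midpoint, frequency) pairs, joined in class order. The following is a minimal sketch that computes those vertices for the Table 2.2 data and leaves the plotting to a charting tool. `frequency_polygon` is a hypothetical helper name.

```python
def frequency_polygon(endpoints, frequencies):
    """Vertices (class midpoint, frequency), one per class, joined in order."""
    return [((lo + hi) / 2, f) for lo, hi, f in zip(endpoints, endpoints[1:], frequencies)]


points = frequency_polygon([1, 3, 5, 7, 9, 11, 13], [4, 12, 13, 19, 7, 5])
print(points)  # [(2.0, 4), (4.0, 12), (6.0, 13), (8.0, 19), (10.0, 7), (12.0, 5)]
```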
Figure 2.5: Frequency Polygon of the Unemployment Data (x-axis: Class Midpoint)
2.3.4 OGIVES
An ogive (o-jive) is a cumulative frequency polygon. Construction begins by labeling the x-axis with the class endpoints and the y-axis with the frequencies. However, the use of cumulative frequency values requires that the scale along the y-axis be great enough to include the frequency total. A dot of zero frequency is plotted at the beginning of the first class, and construction proceeds by marking a dot at the end of each class interval for the cumulative value. Connecting the dots then completes the ogive. Figure 2.6 presents an ogive produced by using Excel for the data in Table 2.2.
Ogives are most useful when the decision maker wants to see running totals. For example, if a comptroller is interested in controlling costs, an ogive could depict cumulative costs over a fiscal year.
Steep slopes in an ogive can be used to identify sharp increases in frequencies. In Figure 2.6, a particularly steep slope occurs in the 7-under 9 class, signifying a large jump in class frequency totals.
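The ogive rules above (zero at the start of the first class, then the cumulative total at the end of each class) and the "steep slope" reading can both be computed directly. The following is a minimal sketch for the Table 2.2 data. The helper names are hypothetical.

```python
def ogive_points(endpoints, frequencies):
    """Ogive vertices: zero at the start of the first class, then the
    cumulative frequency plotted at the end of each class interval."""
    points = [(endpoints[0], 0)]
    running = 0
    for upper, f in zip(endpoints[1:], frequencies):
        running += f
        points.append((upper, running))
    return points


def steepest_segment(points):
    """Return (x_start, x_end, slope) for the steepest segment of the ogive."""
    segments = [
        (x0, x1, (y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    return max(segments, key=lambda s: s[2])


ogive = ogive_points([1, 3, 5, 7, 9, 11, 13], [4, 12, 13, 19, 7, 5])
print(ogive)                    # [(1, 0), (3, 4), (5, 16), (7, 29), (9, 48), (11, 55), (13, 60)]
print(steepest_segment(ogive))  # (7, 9, 9.5)
```

With equal class widths, the steepest segment is always the class with the largest frequency, which here is 7-under 9.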
Figure 2.6: Ogive of the Unemployment Data (x-axis: Class Endpoints; y-axis: Cumulative Frequency)
Table 2.4 contains scores from an examination on plant safety policy and rules given to a group of 35 job trainees. A stem-and-leaf plot of these data is displayed in Table 2.5. One advantage of such a distribution is that the instructor can readily see whether the scores are in the upper or lower end of each bracket and also determine the spread of the scores. A second advantage of stem-and-leaf plots is that the values of the original raw data are retained (whereas most frequency distributions and graphic depictions use the class midpoint to represent the values in a class).
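The "raw data are retained" property can be shown in a few lines. The following is a minimal sketch that assumes whole-number scores with the tens digit as the stem and the units digit as the leaf. The scores are made up purely to show the mechanics and are not the Table 2.4 data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines; stem = tens digit, leaf = units digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


def values_from_plot(lines):
    """Rebuild the original values: each leaf joined back onto its stem."""
    values = []
    for line in lines:
        stem, _, leaves = line.partition("|")
        values.extend(int(stem) * 10 + int(leaf) for leaf in leaves.split())
    return values


# Hypothetical exam scores, for illustration only.
scores = [86, 77, 91, 60, 55, 76, 92, 47, 88, 67]
plot = stem_and_leaf(scores)
print("\n".join(plot))
```

Because `values_from_plot(plot)` gives back every original score, nothing is lost to class midpoints.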
Fill in the blanks!
4. Through ______ and ______, the decision maker can often get an overall picture of the data and reach some useful conclusions.
5. Data graphs can generally be classified as ______ or ______.
6. ______ are data points that appear outside of the main body of observations and may represent phenomena that differ from those represented by other data points.
7. A(n) ______ is a cumulative frequency polygon.
2.4 QUALITATIVE DATA GRAPHS
In contrast to quantitative data graphs that are plotted along a numerical scale, qualitative graphs are plotted using non-numerical categories. In this section, we will examine three types of qualitative data graphs: (1) pie charts, (2) bar charts, and (3) Pareto charts.
2.4.1 PIE CHARTS
A pie chart is a circular depiction of data where the area of the whole pie represents 100% of the data and slices of the pie represent a percentage breakdown of the sublevels. Pie charts show the relative magnitudes of the parts to the whole. They are widely used in business, particularly to depict such things as budget categories, market share, and time/resource allocations. However, the use of pie charts is minimized in the sciences and technology because pie charts can lead to less accurate judgments than are possible with other types of graphs. Generally, it is more difficult for the viewer to interpret the relative size of angles in a pie chart than to judge the length of rectangles in a bar chart.
Construction of the pie chart begins by determining the proportion of the sub-unit to the whole. Table 2.6 contains the refining capacity (1,000 barrels per day) of the top five petroleum refining companies in the United States in a recent year.
To construct a pie chart from these data, first convert the raw capacity figures to proportions by dividing each capacity figure by the total capacity figure (15,134). This proportion is analogous to the relative frequency computed for frequency distributions. Because a circle contains 360°, each proportion is then multiplied by 360 to obtain the number of degrees that represent each company in the pie chart. For example, Exxon Mobil's capacity of 5,589 (1,000 barrels per day) represents a .3693 proportion of the total capacity for these five companies (5,589/15,134 = .3693). Multiplying this value by 360° results in an angle of 132.95°. The pie chart is then constructed by determining each of the other angles and using a compass to lay out the slices. The pie chart in Figure 2.7 depicts the data from Table 2.6.
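The slice angle is (amount / total) × 360. The following is a minimal sketch of that rule. The text quotes only Exxon Mobil's capacity (5,589) and the five-company total (15,134); the other four capacities come from Table 2.6, so the generic helper is also exercised on a small hypothetical input. `pie_slice_degrees` is a hypothetical helper name.

```python
def pie_slice_degrees(amounts):
    """Degrees per slice: (amount / total) * 360."""
    total = sum(amounts.values())
    return {name: amount / total * 360 for name, amount in amounts.items()}


# Exxon Mobil's slice, using the two figures quoted in the text.
exxon_share = 5589 / 15134
exxon_degrees = exxon_share * 360
print(round(exxon_share, 4), round(exxon_degrees, 2))  # 0.3693 132.95

# Hypothetical illustration: three categories with amounts 1, 1 and 2.
print(pie_slice_degrees({"A": 1, "B": 1, "C": 2}))  # {'A': 90.0, 'B': 90.0, 'C': 180.0}
```

Whatever the inputs, the slice angles always sum to 360°, which is a useful check before drawing.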