
Decision S

NMIMS

Uploaded by

Mm Pl
Copyright
© © All Rights Reserved
DECISION SCIENCE

Author: Ken Black
University of Houston - Clear Lake

NMIMS Global Access - School for Continuing Education

Chief Academic Officer
Dr. Arun Mohan Sherry
M.Sc. (Gold Medalist), [Link]. (Computer Science - IIT Kharagpur), Ph.D.
NMIMS Global Access - School for Continuing Education

Content Customized by
Mr. Kali Charan Sabst
NMIMS Global Access - School for Continuing Education

Content Contributors
Abhijit Biswas & Lokesh Payasi

Copyright 2019
ISBN: 978-81-265-7826-8
Address: 4496/7, Ansari Road, Daryaganj, New Delhi-110002
Only for NMIMS Global Access - School for Continuing Education
School Address: V.L. Mehta Road, Vile Parle (W), Mumbai - 400 056, India.

Ken Black is currently professor of decision sciences in the School of Business at the University of Houston-Clear Lake. He earned a bachelor's degree in mathematics from Graceland University, a master's degree in math education from the University of Texas at El Paso, a Ph.D. in business administration (management science), and a Ph.D. in educational research from the University of North Texas. Since joining the faculty of UHCL in 1979, Professor Black has taught all levels of statistics courses, forecasting, management science, market research, and production/operations management. He has published over 20 journal articles and 20 professional papers, as well as two textbooks: Business Statistics: An Introductory Course and Business Statistics for Contemporary Decision Making.

CHAPTER NO.   CHAPTER NAME
1             Introduction to Statistics
2             Charts and Graphs
3             Descriptive Statistics
4             Probability
5             Probability Distributions
6             Sampling and Sampling Distributions
7             Correlation and Simple Regression Analysis
8             Multiple Regression Analysis
9             Time-Series Forecasting
10            Decision Analysis
11            Case Studies
              Appendix A (Online)
Decision Science

Introduction to Statistics: Statistics in Business, Basic Statistical Concepts, Variables and Data, Data Measurement.

Data Visualization: Frequency Distributions, Quantitative Data Graphs, Qualitative Data Graphs, Charts and Graphs for Two Variables.

Descriptive Statistics: Measures of Central Tendency: Ungrouped Data, Measures of Variability: Ungrouped Data, Measures of Central Tendency and Variability: Grouped Data, Measures of Shape.

Basics of Probability: Introduction to Probability, Methods of Assigning Probabilities, Structure of Probability, Marginal Probability, Union, Joint Probabilities, Addition Laws, Multiplication Laws, Conditional Probability, Bayes' Rule.

Probability Distributions: Discrete Versus Continuous Distribution, Binomial Distribution, Poisson Distribution, Hypergeometric Distribution, The Uniform Distribution, Normal Distribution, Using Normal Curve to Approximate Binomial Distribution, Exponential Distribution.

Sampling Techniques and Sampling Distributions: Introduction to Sampling, Reasons for Sampling, Random Versus Non-random Sampling, Sampling Error, Non-sampling Errors, Sampling Distribution of "Sample Mean" and "Sample Proportion".

Correlation and Simple Regression: Correlation Analysis, Introduction to Simple Regression Analysis, Equation of the Regression Line, Coefficient of Determination.

Multiple Regression Analysis: Multiple Regression Model with Two Independent Variables, Determining the Multiple Regression Equation, Coefficient of Multiple Determination, Adjusted R², Interpreting Multiple Regression Computer Output.

Forecasting Techniques: Introduction to Forecasting, Time-Series Components, Measurement of Forecasting Error, Smoothing Techniques, Trend Analysis, Autocorrelation and Autoregression.

Decision Analysis: Decision making under certainty, Decision making under uncertainty, Decision making under risk.

INTRODUCTION TO STATISTICS

CONTENTS
1.1    Introduction
1.2    Basic Statistical Concepts
       Self Assessment Questions
       Activity
1.3    Data Measurement
1.3.1  Nominal Level
1.3.2  Ordinal Level
1.3.3  Interval Level
1.3.4  Ratio Level
1.3.5  Comparison of the Four Levels of Data
       Self Assessment Questions
       Activity
1.4    Summary
1.5    Descriptive Questions
1.6    Solutions for Descriptive Questions
1.7    Answers and Hints

STATISTICS DESCRIBE THE STATE OF BUSINESS IN INDIA'S COUNTRYSIDE

India is the second largest country in the world with more than 1.25 billion people. More than 70% of the people live in rural areas scattered about the countryside in 650,000 villages. In fact, it can be said that one in every 10 people in the world lives in rural India. While it has a per capita income of less than $1 (U.S.) per day, rural India, which has been described in the past as poor and semi-illiterate, now contributes about one-half of the country's gross national product (GNP). However, rural India still has the most households in the world without electricity, over 300,000.

Despite its poverty and economic disadvantages, there are compelling reasons for companies to market their goods and services to rural India. The market of rural India has been growing at five times the rate of the urban India market. There is increasing agricultural productivity leading to growth in disposable income, and there is a reduction in the gap between the tastes of urban and rural customers. The literacy level is increasing, and people are becoming more conscious about their styles and opportunities for a better life. Around 60% of all middle-income households in India are in rural areas, and more than one-third of all rural households in India now have a main source of income other than farming.
Virtually every home has a radio, about one-third have a television, and more than one-half of rural households benefit from banking services. Forty-two percent of the people living in India's villages and small towns use toothpaste, and that proportion is increasing as rural income rises and as there is greater awareness about oral hygiene.

In rural India, consumers are gaining more disposable income due to the movement of manufacturing jobs to rural areas. It is estimated that nearly 75% of the factories that opened in India in the past decade were built in rural areas. Products that are doing well in sales to people in rural India include televisions, fans, bicycles, bath soap, two- or three-wheelers, cars, and many others. According to MART, a New Delhi-based research organization, rural India buys 46% of all soft drinks and 49% of motorcycles sold in India. Because of such factors, many U.S. and Indian firms such as Microsoft, General Electric, Kellogg's, Colgate-Palmolive, Idea Cellular, Hindustan Lever, Godrej, Nirma Chemical Works, Novartis, Dabur, Tata Motors, and Vodafone India have entered the rural Indian market with enthusiasm. Marketing to rural customers often involves persuading them to try and to adopt products that they may not have used before. Rural India is a huge, relatively untapped market for businesses. However, entering such a market is not without risks and obstacles. The dilemma facing companies is whether to enter this marketplace and, if so, to what extent and how.

Source: Adapted from "Marketing to Rural India: Making the Ends Meet," March 8, 2007, in India Knowledge@Wharton, http://[Link]/article.cfm?articleid=4172; "Rural Segment Quickly Catching Up," September 2015, IBEF (India Brand Equity Foundation), at: [Link]; "Unlocking the Wealth in Rural Markets," June 2014, Harvard Business Review; "Much of Rural India Still Waits for Electricity," October 2013, University of Washington.

The primary objective of Chapter 1 is to introduce you to the world of statistics, thereby enabling you to:
- List quantitative and graphical examples of statistics within a business context.
- Define important statistical terms, including population, sample, and parameter, as they relate to descriptive and inferential statistics.
- Explain the difference between variables, measurement, and data.
- Compare the four different levels of data: nominal, ordinal, interval, and ratio.

1.1 INTRODUCTION

Every minute of the working day, decisions are made by businesses around the world that determine whether companies will be profitable and growing or whether they will stagnate and die. Most of these decisions are made with the assistance of information gathered about the marketplace, the economic and financial environment, the workforce, the competition, and other factors. Such information usually comes in the form of data or is accompanied by data. Business statistics provides the tool through which such data are collected, analyzed, summarized, and presented to facilitate the decision-making process, and business statistics plays an important role in the ongoing saga of decision making within the dynamic world of business.

Virtually every area of business uses statistics in decision making.
Here are some recent examples:
- According to a national survey of independent business owners conducted by the Institute for Local Self-Reliance in partnership with the Advocates for Independent Business coalition, when asked "Which two public policy changes would most help your business?" (retailers only), 40% said "Pass the Marketplace Fairness Act" and 38% said "Cap Credit Card Swipe Fees."
- A survey of 1,465 workers by Hotjobs reports that 55% of workers believe that the quality of their work is perceived the same when they work remotely as when they are physically in the office.
- A survey of 477 executives by the Association of Executive Search Consultants determined that 48% of men and 67% of women say they are more likely to negotiate for less business travel compared with five years ago.
- A global Family Business Survey of 2,278 respondents sponsored by PwC reported that 65% of family businesses reported growth in the last twelve months and 49% of respondents are apprehensive about their ability to recruit skilled staff in the next twelve months.
- A Deloitte Retail "Green" survey of 1,080 adults revealed that 54% agreed that plastic, non-compostable shopping bags should be banned.
- A study of consumer electronics spending by a 2,500-member online panel of the NPD Group showed that consumers expect to spend $555, on average, per person on new consumer electronics devices this year.

You can see from these few examples that there is a wide variety of uses and applications of statistics in business. Note that in most of these examples, business researchers have conducted a study and provided us with rich and interesting information.

In this text we will examine several types of graphs for depicting data as we study ways to arrange or structure data into forms that are both meaningful and useful to decision makers.
We will learn about techniques for sampling from a population that allow studies of the business world to be conducted more inexpensively and in a more timely manner. We will explore various ways to forecast future values and examine techniques for predicting trends. These and many other exciting statistics and statistical techniques await us on this journey through business statistics. Let us begin.

1.2 BASIC STATISTICAL CONCEPTS

Business statistics, like many areas of study, has its own language. It is important to begin our study with an introduction of some basic concepts in order to understand and communicate about the subject. We begin with a discussion of the word statistics. The word statistics has many different meanings in our culture. Webster's Third New International Dictionary gives a comprehensive definition of statistics as a science dealing with the collection, analysis, interpretation, and presentation of numerical data. Viewed from this perspective, statistics includes all the topics presented in this text. Figure 1.1 graphically displays the key elements of statistics.

The study of statistics can be organized in a variety of ways. One of the main ways is to subdivide statistics into two branches: descriptive statistics and inferential statistics. To understand the difference between descriptive and inferential statistics, definitions of population and sample are helpful. Webster's Third New International Dictionary defines population as a collection of persons, objects, or items of interest.
The population can be a widely defined category, such as "all automobiles," or it can be narrowly defined, such as "all Ford Mustang cars produced from 2014 to 2016." A population can be a group of people, such as "all workers presently employed by Microsoft," or it can be a set of objects, such as "all dishwashers produced on February 2, 2016, by the General Electric Company at the Louisville plant." The researcher defines the population to be whatever he or she is studying. When researchers gather data from the whole population for a given measurement of interest, they call it a census. Most people are familiar with the "Census of India". Every 10 years, the government attempts to measure all persons living in this country.

[Figure 1.1: The Key Elements of Statistics]

A sample is a portion of the whole and, if properly taken, is representative of the whole. For various reasons, researchers often prefer to work with a sample of the population instead of the entire population. For example, in conducting quality-control experiments to determine the average life of lightbulbs, a lightbulb manufacturer might randomly sample only 75 lightbulbs during a production run. Because of time and money limitations, a human resources manager might take a random sample of 40 employees instead of using a census to measure company morale.

If a business analyst is using data gathered on a group to describe or reach conclusions about that same group, the statistics are called descriptive statistics. For example, if an instructor produces statistics to summarize a class's examination effort and uses those statistics to reach conclusions about that class only, the statistics are descriptive.
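The distinction between a census, a random sample, and descriptive statistics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of lightbulb lifetimes is simulated, and its size (5,000 bulbs), mean (1,000 hours), and spread (50 hours) are invented for the example; only the sample size of 75 comes from the text.

```python
import random
import statistics

# Hypothetical population: lifetime in hours of every bulb in one production run.
# The size, mean, and spread are assumptions made up for this sketch.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
production_run = [round(rng.gauss(1000, 50), 1) for _ in range(5000)]

# A census measures every item in the population.
census_mean = statistics.mean(production_run)

# A simple random sample of 75 bulbs, drawn without replacement.
sample = rng.sample(production_run, 75)

# Descriptive statistics: numbers that summarize the sampled bulbs themselves.
sample_summary = {
    "n": len(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "stdev": statistics.stdev(sample),
}

print("census mean:", round(census_mean, 1))
for name, value in sample_summary.items():
    print(name, round(value, 1))
```

As long as the summary is used only to describe the 75 sampled bulbs, it is descriptive; using it to say something about the whole production run would make it inferential.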
Many of the statistical data generated by businesses are descriptive. They might include number of employees on vacation during June, average salary at the Mumbai office, corporate sales for 2016, average managerial satisfaction score on a company-wide census of employee attitudes, and average return on investment for Tata Motors for the years 1996 through 2016.

Another type of statistics is called inferential statistics. If a researcher gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken, the statistics are inferential statistics. The data gathered from the sample are used to infer something about a larger group. Inferential statistics are sometimes referred to as inductive statistics. The use and importance of inferential statistics continue to grow.

One application of inferential statistics is in pharmaceutical research. Some new drugs are expensive to produce, and therefore tests must be limited to small samples of patients. Utilizing inferential statistics, researchers can design experiments with small randomly selected samples of patients and attempt to reach conclusions and make inferences about the population.

Market researchers use inferential statistics to study the impact of advertising on various market segments. Suppose a soft drink company creates an advertisement depicting a dispensing machine that talks to the buyer, and market researchers want to measure the impact of the new advertisement on various age groups. The researcher could stratify the population into age categories ranging from young to old, randomly sample each stratum, and use inferential statistics to determine the effectiveness of the advertisement for the various age groups in the population. The advantage of using inferential statistics is that they enable the researcher to study effectively a wide range of phenomena without having to conduct a census.

A descriptive measure of the population is called a parameter. Parameters are usually denoted by Greek letters. Examples of parameters are population mean (μ), population variance (σ²), and population standard deviation (σ). A descriptive measure of a sample is called a statistic. Statistics are usually denoted by Roman letters. Examples of statistics are sample mean (x̄), sample variance (s²), and sample standard deviation (s).

Differentiation between the terms parameter and statistic is important only in the use of inferential statistics. A business researcher often wants to estimate the value of a parameter or conduct tests about the parameter. However, the calculation of parameters is usually either impossible or infeasible because of the amount of time and money required to take a census. In such cases, the business researcher can take a random sample of the population, calculate a statistic on the sample, and infer by estimation the value of the parameter. The basis for inferential statistics, then, is the ability to make decisions about parameters without having to complete a census of the population.

For example, a manufacturer of washing machines would probably want to determine the average number of loads that a new machine can wash before it needs repairs. The parameter is the population mean or average number of washes per machine before repair. A company researcher takes a sample of machines, computes the number of washes before repair for each machine, averages the numbers, and estimates the population value or parameter by using the statistic, which in this case is the sample average. Figure 1.2 demonstrates the inferential process.

Inferences about parameters are made under uncertainty.
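The washing machine example can be imitated with simulated data to show why such inferences are uncertain. The sketch below is a hedged illustration, not a prescribed method: the population of 10,000 machines, the range of 600 to 1,400 washes, and the sample size of 50 are all assumptions invented for the example. In practice the parameter μ would be unknown; it is computed here only so that each sample estimate can be compared with it.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: washes before first repair for every machine built.
population = [rng.randint(600, 1400) for _ in range(10_000)]

# Parameters describe the whole population (normally unavailable without a census).
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation


def estimate_mean(sample_size):
    """Draw one random sample and return its mean, which is a statistic."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)


# Five independent samples give five different estimates of the same parameter.
estimates = [estimate_mean(50) for _ in range(5)]

print(f"parameter mu = {mu:.1f}, sigma = {sigma:.1f}")
for i, x_bar in enumerate(estimates, start=1):
    print(f"sample {i}: x-bar = {x_bar:.1f}, error = {x_bar - mu:+.1f}")
```

Each sample mean lands near μ but not exactly on it, and a researcher who sees only one sample cannot tell how far off that estimate is.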
Unless parameters are computed directly from the population, the statistician never knows with certainty whether the estimates or inferences made from samples are true. In an effort to estimate the level of confidence in the result of the process, statisticians use probability statements. For this and other reasons, part of this text is devoted to probability (Chapter 4).

[Figure 1.2: The Inferential Process]

Business statistics is about measuring phenomena in the business world and organizing, analyzing, and presenting the resulting numerical information in such a way that better, more informed business decisions can be made. Most business statistics studies contain variables, measurements, and data.

In business statistics, a variable is a characteristic of any entity being studied that is capable of taking on different values. Some examples of variables in business might include return on investment, advertising dollars, labor productivity, stock price, historical cost, total sales, market share, age of worker, earnings per share, miles driven to work, time spent in store shopping, and many, many others. In business statistics studies, most variables produce a measurement that can be used for analysis. A measurement is taken when a standard process is used to assign numbers to particular attributes or characteristics of a variable. Many measurements are obvious, such as time spent in a store shopping by a customer, age of the worker, or the number of miles driven to work. However, some measurements, such as labor productivity, customer satisfaction, and return on investment, have to be defined by the business researcher or by experts within the field. Once such measurements are recorded and stored, they can be denoted as "data." It can be said that data are recorded measurements. The processes of measuring and data gathering are basic to all that we do in business statistics. It is data that are analyzed by a business statistician in order to learn more about the variables being studied. Sometimes, sets of data are organized into databases as a way to store data or as a means for more conveniently analyzing data or comparing variables. Valid data are the lifeblood of business statistics, and it is important that the business researcher give thoughtful attention to the creation of meaningful, valid data before embarking on analysis and reaching conclusions.

SELF ASSESSMENT QUESTIONS

Fill in the blanks:
1. ________ is a collection of persons, objects, or items of interest.
2. When researchers gather data from the whole population for a given measurement of interest, it is called a ________.
3. Inferential statistics are sometimes referred to as ________.

State whether the following statements are true/false:
4. Data interpretation is a key element of statistics.
5. Researchers often prefer to work with the population instead of the entire sample.

ACTIVITY

Study the "International Labor Database," which contains the civilian unemployment rates in percent from seven countries presented yearly over a 40-year period. The countries are the United States, Canada, Australia, Japan, France, Germany, and Italy. Prepare a comparative report based on your study.

1.3 DATA MEASUREMENT

Millions of numerical data are gathered in businesses every day, representing myriad items. For example, numbers represent costs of items produced, geographical locations of retail outlets, weights of shipments, and rankings of subordinates at yearly reviews. Not all such data should be analyzed the same way statistically because the entities represented by the numbers are different. For this reason, the business researcher needs to know the level of data measurement represented by the numbers being analyzed.
The disparate use of numbers can be illustrated by the numbers 40 and 80, which could represent the weights of two objects being shipped, the ratings received on a consumer test by two different products, or football jersey numbers of a fullback and a wide receiver. Although 80 pounds is twice as much as 40 pounds, the wide receiver is probably not twice as big as the fullback! Averaging the two weights seems reasonable, but averaging the football jersey numbers makes no sense. The appropriateness of the data analysis depends on the level of measurement of the data gathered. The phenomenon represented by the numbers determines the level of data measurement. Four common levels of data measurement follow.
1. Nominal
2. Ordinal
3. Interval
4. Ratio

Nominal is the lowest level of data measurement, followed by ordinal, interval, and ratio. Ratio is the highest level of data measurement, as shown in Figure 1.3.

[Figure 1.3: Hierarchy of Levels of Data]

1.3.1 NOMINAL LEVEL

The lowest level of data measurement is the nominal level. Numbers representing nominal-level data (the word level often is omitted) can be used only to classify or categorize. Employee identification numbers are an example of nominal data. The numbers are used only to differentiate employees and not to make a value statement about them. Many demographic questions in surveys result in data that are nominal because the questions are used for classification only. The following is an example of such a question that would result in nominal data:

Which of the following employment classifications best describes your area of work?
1. Educator
2. Construction worker
3. Manufacturing worker
4. Lawyer
5. Doctor
6. Other

Suppose that, for computing purposes, an educator is assigned a 1, a construction worker is assigned a 2, a manufacturing worker is assigned a 3, and so on. These numbers should be used only to classify respondents. The number 1 does not denote the top classification.
It is used only to differentiate an educator (1) from a lawyer (4).

Some other types of variables that often produce nominal-level data are sex, religion, ethnicity, geographic location, and place of birth. Social Security numbers, telephone numbers, employee ID numbers, and ZIP code numbers are further examples of nominal data. Statistical techniques that are appropriate for analyzing nominal data are limited. However, some of the more widely used statistics, such as the chi-square statistic, can be applied to nominal data, often producing useful information.

1.3.2 ORDINAL LEVEL

Ordinal-level data measurement is higher than the nominal level. In addition to the nominal-level capabilities, ordinal-level measurement can be used to rank or order people or objects. For example, using ordinal data, a supervisor can evaluate three employees by ranking their productivity with the numbers 1 through 3. The supervisor could identify one employee as the most productive, one as the least productive, and one as somewhere between by using ordinal data. However, the supervisor could not use ordinal data to establish that the intervals between the employees ranked 1 and 2 and between the employees ranked 2 and 3 are equal; that is, she could not say that the differences in the amount of productivity between workers ranked 1, 2, and 3 are necessarily the same. With ordinal data, the distances or spacing represented by consecutive numbers are not always equal.

Some questionnaire Likert-type scales are considered by many researchers to be ordinal in level. The following is an example of one such scale:

This computer tutorial is
    not helpful   somewhat helpful   moderately helpful   very helpful   extremely helpful
         1                2                   3                 4                 5

When this survey question is coded for the computer, only the numbers 1 through 5 will remain, not the adjectives. Virtually everyone would agree that a 5 is higher than a 4 on this scale and that ranking responses is possible.
However, most respondents would not consider the differences between not helpful, somewhat helpful, moderately helpful, very helpful, and extremely helpful to be equal.

Mutual funds as investments are sometimes rated in terms of risk by using measures of default risk, currency risk, and interest rate risk. These three measures are applied to investments by rating them as having high, medium, and low risk. Suppose high risk is assigned a 3, medium risk a 2, and low risk a 1. If a fund is awarded a 3 rather than a 2, it carries more risk, and so on. However, the differences in risk between categories 1, 2, and 3 are not necessarily equal. Thus, these measurements of risk are only ordinal-level measurements. Another example of the use of ordinal numbers in business is the ranking of the top 50 most admired companies in Fortune magazine. The numbers ranking the companies are only ordinal in measurement. Certain statistical techniques are specifically suited to ordinal data, but many other techniques are not appropriate for use on ordinal data. For example, it does not make sense to say that the average of "moderately helpful" and "very helpful" is "moderately helpful and a half."

Because nominal and ordinal data are often derived from imprecise measurements such as demographic questions, the categorization of people or objects, or the ranking of items, nominal and ordinal data are nonmetric data and are sometimes referred to as qualitative data.

1.3.3 INTERVAL LEVEL

Interval-level data measurement is the next to the highest level of data in which the distances between consecutive numbers have meaning and the data are always numerical. The distances represented by the differences between consecutive numbers are equal; that is, interval data have equal intervals. An example of interval measurement is Fahrenheit temperature.
With Fahrenheit temperature numbers, the temperatures can be ranked, and the amounts of heat between consecutive readings, such as 20°, 21°, and 22°, are the same.

In addition, with interval-level data, the zero point is a matter of convention or convenience and not a natural or fixed zero point. Zero is just another point on the scale and does not mean the absence of the phenomenon. For example, zero degrees Fahrenheit is not the lowest possible temperature. Some other examples of interval-level data are the percentage change in employment, the percentage return on a stock, and the dollar change in stock price.

1.3.4 RATIO LEVEL

Ratio-level data measurement is the highest level of data measurement. Ratio data have the same properties as interval data, but ratio data have an absolute zero, and the ratio of two numbers is meaningful. The notion of absolute zero means that zero is fixed, and the zero value in the data represents the absence of the characteristic being studied. The value of zero cannot be arbitrarily assigned because it represents a fixed point. This definition enables the statistician to create ratios with the data.

Examples of ratio data are height, weight, time, volume, and Kelvin temperature. With ratio data, a researcher can state that 180 pounds of weight is twice as much as 90 pounds or, in other words, make a ratio of 180:90. Many of the data measured by valves or gauges in industry are ratio data.

Other examples in the business world that are ratio level in measurement are production cycle time, work measurement time, passenger miles, number of trucks sold, complaints per 10,000 fliers, and number of employees.

Because interval- and ratio-level data are usually gathered by precise instruments often used in production and engineering processes, in national standardized testing, or in standardized accounting procedures, they are called metric data and are sometimes referred to as quantitative data.
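The practical difference between interval and ratio data can be checked with a little arithmetic. The following sketch is an illustration only: the readings of 40 and 80 degrees Fahrenheit are chosen for the example, and the helper name `fahrenheit_to_kelvin` is hypothetical. It uses the standard conversion K = (F - 32) × 5/9 + 273.15.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a Fahrenheit reading (interval scale) to Kelvin (ratio scale)."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


low_f, high_f = 40.0, 80.0  # example readings, assumed for this sketch

# Differences are meaningful on an interval scale.
diff_f = high_f - low_f  # 40 Fahrenheit degrees

# Dividing raw Fahrenheit readings gives 2.0, but the zero point is arbitrary,
# so this does NOT mean "twice as hot".
naive_ratio = high_f / low_f

# On the Kelvin scale zero is absolute, so the ratio is meaningful.
kelvin_ratio = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)

# Weight has an absolute zero, so 180 pounds really is twice 90 pounds.
weight_ratio = 180 / 90

print(f"difference: {diff_f} F")
print(f"naive Fahrenheit ratio: {naive_ratio}")
print(f"Kelvin ratio: {kelvin_ratio:.3f}")  # roughly 1.08, far from 2
print(f"weight ratio: {weight_ratio}")
```

The same pair of temperatures yields a ratio of 2 on one scale and about 1.08 on the other, which is why ratios of interval data carry no meaning, while the weight ratio stays at 2 in any unit.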
1.3.5 COMPARISON OF THE FOUR LEVELS OF DATA

Figure 1.4 shows the relationships of the usage potential among the four levels of data measurement. The concentric squares denote that each higher level of data can be analyzed by any of the techniques used on lower levels of data but, in addition, can be used in other statistical techniques. Therefore, ratio data can be analyzed by any statistical technique applicable to the other three levels of data plus some others.

Nominal data are the most limited data in terms of the types of statistical analysis that can be used with them. Ordinal data allow the researcher to perform any analysis that can be done with nominal data and some additional analyses. With ratio data, a statistician can make ratio comparisons and appropriately do any analysis that can be performed on nominal, ordinal, or interval data. Some statistical techniques require ratio data and cannot be used to analyze other levels of data.

Statistical techniques can be separated into two categories: parametric statistics and nonparametric statistics. Parametric statistics require that data be interval or ratio. If the data are nominal or ordinal, nonparametric statistics must be used. Nonparametric statistics can also be used to analyze interval or ratio data. Figure 1.5 contains a summary of metric data and nonmetric data.

[Figure 1.4: Usage Potential of Various Levels of Data]

[Figure 1.5: Metric vs. Nonmetric Data]

DEMONSTRATION PROBLEM 1.1

Because of increased competition for patients among providers and the need to determine how providers can better serve their clientele, hospital administrators sometimes administer a quality satisfaction survey to their patients after the patient is released. The following types of questions are sometimes asked on such a survey. These questions will result in what level of data measurement?

1. How long ago were you released from the hospital?
2. Which type of unit were you in for most of your stay?
   __ Coronary care
   __ Intensive care
   __ Maternity care
   __ Medical unit
   __ Pediatric/children's unit
   __ Surgical unit
3. In choosing a hospital, how important was the hospital's location? (circle one)
   Very Important    Somewhat Important    Not Very Important    Not at All Important
4. What was your body temperature when you were admitted to the hospital?
5. Rate the skill of your doctor:
   __ Excellent  __ Very Good  __ Good  __ Fair  __ Poor

Solution:

Question 1 is a time measurement with an absolute zero and is therefore ratio-level measurement. A person who has been out of the hospital for two weeks has been out twice as long as someone who has been out of the hospital for one week.

Question 2 yields nominal data because the patient is asked only to categorize the type of unit he or she was in. This question does not require a hierarchy or ranking of the type of unit.

Questions 3 and 5 are likely to result in ordinal-level data. Suppose a number is assigned to the descriptors in these two questions. For question 3, "very important" might be assigned a 4, "somewhat important" a 3, "not very important" a 2, and "not at all important" a 1. Certainly, the higher the number, the more important is the hospital's location. Thus, these responses can be ranked by selection. However, the increases in importance from 1 to 2 to 3 to 4 are not necessarily equal. This same logic applies to the numeric values assigned in question 5.

In question 4, body temperature, if measured on a Fahrenheit or Celsius scale, is interval in measurement.

SELF ASSESSMENT QUESTIONS

Fill in the blanks:
6. ________ is the lowest level of data measurement.
7. The Fahrenheit scale is an example of ________.
8. ________ is the highest level of data measurement.

State whether the following statements are true/false:
9. Ordinal-level data measurement is higher than the nominal level.
10. Ordinal data are nonmetric data and are sometimes referred to as quantitative data.
11. Interval-level data may not always be numerical.

ACTIVITY

From the "Ministry of Statistics and Programme Implementation", Government of India website, download the "Energy Statistics" report (use http://[Link]/sites/default/files/publication_reports/Energy_Statistics_2018.pdf to download the report). In the report, identify one variable each for the four common levels of data measurement.

1.4 SUMMARY

Statistics is an important decision-making tool in business and is used in virtually every area of business. In this course, the word statistics is defined as the science of gathering, analyzing, interpreting, and presenting numerical data.

The study of statistics can be subdivided into two main areas: descriptive statistics and inferential statistics. Descriptive statistics result from gathering data from a body, group, or population and reaching conclusions only about that group. Inferential statistics are generated by gathering sample data from a group, body, or population and reaching conclusions about the larger group from which the sample was drawn.

Most business statistics studies contain variables, measurements, and data. A variable is a characteristic of any entity being studied that is capable of taking on different values. Examples of variables might include monthly household food spending, time between arrivals at a restaurant, and patient satisfaction rating.
A measurement is when a standard process is used to assign numbers to particular attributes or characteristics of a variable. Measurements on monthly household food spending might be taken in dollars, time between arrivals might be measured in minutes, and patient satisfaction might be measured using a 5-point scale. Data are recorded measurements. It is data that are analyzed by business statisticians in order to learn more about the variables being studied.

The appropriate type of statistical analysis depends on the level of data measurement, which can be (1) nominal, (2) ordinal, (3) interval, or (4) ratio. Nominal is the lowest level, representing classification only of such data as geographic location, sex, or Social Security number. The next level is ordinal, which provides rank ordering measurements in which the intervals between consecutive numbers do not necessarily represent equal distances. Interval is the next to highest level of data measurement, in which the distances represented by consecutive numbers are equal. The highest level of data measurement is ratio, which has all the qualities of interval measurement, but ratio data contain an absolute zero and ratios between numbers are meaningful. Interval and ratio data sometimes are called metric or quantitative data. Nominal and ordinal data sometimes are called nonmetric or qualitative data.

Two major types of inferential statistics are (1) parametric statistics and (2) nonparametric statistics. Use of parametric statistics requires interval or ratio data and certain assumptions about the distribution of the data. The techniques presented in this text are largely parametric. If data are only nominal or ordinal in level, nonparametric statistics must be used.

1.
Census: The process of gathering data from the whole population for a given measurement of interest is called a census.
2. Variable: A variable is a characteristic of any entity being studied that is capable of taking on different values.
3. Population: A population is a collection of persons, objects, or items of interest.
4. Parameter: A descriptive measure of the population is called a parameter.
5. Sample: A sample is a portion of the whole and, if properly taken, is representative of the whole.
6. Statistic: A descriptive measure of a sample is called a statistic.
7. Measurement: A measurement is when a standard process is used to assign numbers to particular attributes or characteristics of a variable.
8. Data: Data are recorded measurements.
9. Descriptive statistics: If a business analyst uses data gathered on a group to describe or reach conclusions about that same group, the statistics are called descriptive statistics.
10. Inferential statistics: If a researcher gathers data from a sample and uses the statistics generated to reach conclusions about the population from which the sample was taken, the statistics are inferential statistics.
11. Nominal-level data: The lowest level of data measurement is the nominal level. Numbers representing nominal-level data can be used only to classify or categorize.
12. Ordinal-level data: This measurement is higher than the nominal level. In addition to the nominal-level capabilities, ordinal-level measurement can be used to rank or order people or objects.
13. Interval-level data: A level of data in which the distances between consecutive numbers have meaning and the data are always numerical.
14. Ratio-level data: This measurement is the highest level of data measurement. Ratio data have the same properties as interval data, but ratio data have an absolute zero, and the ratio of two numbers is meaningful.

DESCRIPTIVE QUESTIONS

1.1.
Give a specific example of data that might be gathered from each of the following business disciplines: accounting, finance, human resources, marketing, information systems, production, and management. An example in the marketing area might be "number of sales per month by each salesperson."

1.2. State examples of data that can be gathered for decision-making purposes from each of the following industries: manufacturing, insurance, travel, retailing, communications, computing, agriculture, banking, and healthcare. An example in the travel industry might be the cost of business travel per day in various European cities.

1.3. Give an example of descriptive statistics in the recorded music industry. Give an example of how inferential statistics could be used in the recorded music industry. Compare the two examples. What makes them different?

1.4. Suppose you are an operations manager for a plant that manufactures batteries. Give an example of how you could use descriptive statistics to make better managerial decisions. Give an example of how you could use inferential statistics to make better managerial decisions.

1.5. There are many types of information that might help the manager of a large department store run the business more efficiently and better understand how to improve sales. Think about this in such areas as sales, customers, human resources, inventory, suppliers, etc., and list five variables that might produce information that could aid the manager in his or her job. Write a sentence or two describing each variable, and briefly discuss some numerical observations that might be generated for each variable.

1.6. Suppose you are the owner of a medium-sized restaurant in a small city. What are some variables associated with different aspects of the business that might be helpful to you in making business decisions about the restaurant?
Name four of these variables, and for each variable, briefly describe a numerical observation that might be the result of measuring the variable.

1.7. Classify each of the following as nominal, ordinal, interval, or ratio data.
(a) The time required to produce each tire on an assembly line
(b) The number of quarts of milk a family drinks in a month
(c) The ranking of four machines in your plant after they have been designated as excellent, good, satisfactory, and poor
(d) The telephone area code of clients in the United States
(e) The age of each of your employees
(f) The dollar sales at the local pizza shop each month
(g) An employee's identification number
(h) The response time of an emergency unit

1.8. The Rathburn Manufacturing Company makes electric wiring, which it sells to contractors in the construction industry. Approximately 900 electric contractors purchase wire from Rathburn annually. Rathburn's director of marketing wants to determine electric contractors' satisfaction with Rathburn's wire. He developed a questionnaire that yields a satisfaction score between 10 and 50 for participant responses. A random sample of 35 of the 900 contractors is asked to complete a satisfaction survey. The satisfaction scores for the 35 participants are averaged to produce a mean satisfaction score.
(a) What is the population for this study?
(b) What is the sample for this study?
(c) What is the statistic for this study?
(d) What would be a parameter for this study?

SOLUTIONS FOR DESCRIPTIVE QUESTIONS

1.1. Examples of data in functional areas:
accounting - cost of goods, salary expense, depreciation, utility costs, taxes, equipment inventory, etc.
finance - World Bank bond rates, number of failed savings and loans, measured risk of common stocks, stock dividends, foreign exchange rate, liquidity rates for a single-family, etc.
human resources - salaries, size of engineering staff, years experience, age of employees, years of education, etc.
marketing - number of units sold, dollar sales volume, forecast sales, size of sales force, market share, measurement of consumer motivation, measurement of consumer frustration, measurement of brand preference, attitude measurement, measurement of consumer risk, etc.
information systems - CPU time, size of memory, number of work stations, storage capacity, percent of professionals who are connected to a computer network, dollar assets of company computing, number of "hits" on the Internet, time spent on the Internet per day, percentage of people who use the Internet, retail dollars spent in e-commerce, etc.
production - number of production runs per day, weight of a product, assembly time, number of defects per run, temperature in the plant, amount of inventory, turnaround time, etc.
management - measurement of union participation, measurement of employer support, measurement of tendency to control, number of subordinates reporting to a manager, measurement of leadership style, etc.

1.2. Examples of data in business industries:
manufacturing - size of punched hole, number of rejects, amount of inventory, amount of production, number of production workers, etc.
insurance - number of claims per month, average amount of life insurance per family head, life expectancy, cost of repairs for major auto collision, average medical costs incurred for a single female over 45 years of age, etc.
travel - cost of airfare, number of miles traveled for ground transported vacations, number of nights away from home, size of traveling party, amount spent per day besides lodging, etc.
retailing - inventory turnover ratio, sales volume, size of sales force, number of competitors within 2 miles of retail outlet, area of store, number of sales people, etc.
communications - cost per minute, number of phones per office, miles of cable per customer headquarters, minutes per day of long distance usage, number of operators, time between calls, etc.
computing - age of company hardware, cost of software, number of CAD/CAM stations, age of computer operators, measure to evaluate competing software packages, size of data base, etc.
agriculture - number of farms per county, farm income, number of acres of corn per farm, wholesale price of a gallon of milk, number of livestock, grain storage capacity, etc.
banking - size of deposit, number of failed banks, amount loaned to foreign banks, number of tellers per drive-in facility, average amount of withdrawal from automatic teller machine, federal reserve discount rate, etc.
healthcare - number of patients per physician per day, average cost of hospital stay, average daily census of hospital, time spent waiting to see a physician, patient satisfaction, number of blood tests done per week.

1.3. Descriptive statistics in recorded music industry -
1. RCA total sales of compact discs this week, number of artists under contract to a company at a given time.
2. Total dollars spent on advertising last month to promote an album.
3. Number of units produced in a day.
4. Number of retail outlets selling the company's products.

Inferential statistics in recorded music industry -
1. Measure the amount spent per month on recorded music for a few consumers, then use that figure to infer the amount for the population.
2. Determination of market share for rap music by randomly selecting a sample of 500 purchasers of recorded music.
3. Determination of top ten single records by sampling the number of requests at a few radio stations.
4. Estimation of the average length of a single recording by taking a sample of records and measuring them.

The difference between descriptive and inferential statistics lies mainly in the usage of the data. These descriptive examples all gather data from every item in the population about which the description is being made. For example, RCA measures the sales on all its compact discs for a week and reports the total.
In each of the inferential statistics examples, a sample of the population is taken and the population value is estimated or inferred from the sample. For example, it may be practically impossible to determine the proportion of buyers who prefer rap music. However, a random sample of buyers can be contacted and interviewed for music preference. The results can be inferred to population market share.

1.4. Descriptive statistics in manufacturing batteries to make better decisions -
1. Total number of worker hours per plant per week - help management understand labor costs, work allocation, productivity, etc.
2. Company sales volume of batteries in a year - help management decide if the product is profitable, how much to advertise in coming year, compare to costs to determine profitability.
3. Total amount of sulfuric acid purchased per month for use in battery production - can be used by management to study wasted inventory, scrap, etc.

Inferential statistics in manufacturing batteries to make decisions -
1. Take a sample of batteries and test them to determine the average shelf life - use the sample average to reach conclusions about all batteries of this type. Management can then make labeling and advertising claims. They can compare these figures to the shelf life of competing batteries.
2. Take a sample of battery consumers and determine how many batteries they purchase per year. Infer to the entire population - management can use this information to estimate market potential and penetration.
3. Interview a random sample of production workers to determine attitude towards company management - management can use these survey results to ascertain employee morale and to direct efforts towards creating a more positive working environment which, hopefully, results in greater productivity.

1.5.
1. Size of sale ($) per customer in men's formal wear.
Either by taking a sample or using a census, management could compute the average sale in men's formal wear for a weekly period and compare the number to the same average taken a year ago or a month ago to determine if more is being sold per customer. Other variables might include number of sales per hour, number of people entering the department per day, number of dress shirts sold per day, etc.
2. Number of employees working per day. This variable could indicate the day of the week (certain days have more or less sales), sales activity (how sales are doing overall), or even health of associates. Other variables might include percent of employees absent due to illness, average number of hours worked per week per employee, number of open positions, etc.
3. Inventory turnover rate. How fast are items in the store selling? Other variables might include reorder rate, percent of storage space utilized, number of stockouts per week, etc.
4. Number of customers that enter the store per hour. This figure will vary by day, time of day, and season. Comparing figures on this variable from period to period can give some indication of sales trends, which can help drive human resource planning, etc. Other variables might include amount of time spent per customer in the store per visit, distance that customers travel to shop in the store, number of referrals that customers make to other people annually, etc.
5. Percentage of people paying with cash. Percentage of people using credit cards. These can be used to expedite pay systems, investigate employee theft, calculate surcharges associated with credit cards, etc. Other variables might include average time per checkout, average wait time in pay line, etc.

1.6.
1. Size of bill or tab. This variable is the total amount in dollars spent by a patron per visit to the restaurant. The bill or tab could be for an individual or a group and would include both food and beverages if they are all included in the bill.
Of course, the measurement would be in dollars. The average size of a bill could be very useful to the manager or owner, both for projecting total revenues over a period and as a baseline before a marketing effort to increase sales.
2. Percentage of capacity filled. This variable could be measured at various intervals, times, and days of the week. The measurement would be calculated by taking the number of patrons in the restaurant at any one time divided by the total number of seats in the restaurant (capacity). From this, management could make staffing decisions for various times and days of the week. In addition, management could make decisions about when to expand, how much to advertise, and/or when to run specials.
3. Length of stay. The measurement is how many minutes people are actually in the restaurant, from the time they are assigned a table until they leave. From this, management could determine customer turnover rates, which have capacity implications. That is, how many times in a day is an average table "turned over"? If people stay longer, do they spend more?
4. Number of arrivals per 5-minute interval. The measurement is how many customers arrive at the front door to be greeted by the maitre d' in any given five-minute period. This figure will likely vary by day of the week, season of the year, and time of day. Management can use this information for staffing decisions and planning.

1.7.
(a) ratio
(b) ratio
(c) ordinal
(d) nominal
(e) ratio
(f) ratio
(g) nominal
(h) ratio

1.8.
(a) The population for this study is the 900 electric contractors who purchased Rathburn wire.
(b) The sample is the randomly chosen group of thirty-five contractors.
(c) The statistic is the average satisfaction score for the sample of thirty-five contractors.
(d) The parameter is the average satisfaction score for all 900 electric contractors in the population.
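The distinction drawn in solution 1.8 between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the satisfaction scores are simulated rather than taken from any Rathburn survey, and the seed, the uniform spread of scores, and the variable names are all assumptions.

```python
import random
import statistics

# Hypothetical population: one simulated satisfaction score (10 to 50)
# for each of the 900 contractors. These are not real survey results.
rng = random.Random(42)
population = [rng.randint(10, 50) for _ in range(900)]

# Parameter: a descriptive measure of the whole population.
parameter = statistics.mean(population)

# Sample: 35 contractors drawn at random, without replacement.
sample = rng.sample(population, 35)

# Statistic: the same descriptive measure, computed on the sample only.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

In practice only the sample mean would be observed; the population mean is shown here solely because the population was simulated.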
ANSWERS AND HINTS

ANSWERS FOR SELF ASSESSMENT QUESTIONS

Topic                            Q. No.   Answer
1.2 Basic Statistical Concepts   1.       Population
                                 2.       [illegible]
                                 3.       Inductive statistics
                                 4.       True
                                 5.       False
1.3 Data Measurement             6.       Nominal level
                                 7.       Interval-level data
                                 8.       Ratio-level data
                                 9.       True
                                 10.      False
                                 11.      False

CHAPTER 2

CHARTS AND GRAPHS

CONTENTS

2.1 Introduction
2.2 Frequency Distributions
    2.2.1 Class Midpoint
    2.2.2 Relative Frequency
    2.2.3 Cumulative Frequency
    Self Assessment Questions
    Activity
2.3 Quantitative Data Graphs
    2.3.1 Histograms
    Using Histograms to Get an Initial Overview of the Data
    Frequency Polygons
    Ogives
    Self Assessment Questions
2.4 Qualitative Data Graphs
    2.4.1 Pie Charts
    2.4.2 Bar Graph
    Self Assessment Questions
    Activity
2.5 Charts and Graphs for Two Variables
    Cross Tabulation
    Scatter Plot
    Self Assessment Questions
    Activity
Summary
Descriptive Questions
Solutions for Descriptive Questions
Answers and Hints

DATA VISUALIZATION USING VIZARD

Infruid Labs is a Hyderabad-based Business Analytics and Data Visualization Solutions company. The company offers analytics software that can handle big data to provide actionable business intelligence to customers by helping them find patterns in large datasets. According to the Gartner IT Glossary, "Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation."

The company was founded by Mahesh Yellai, a management graduate from the Indian School of Business, Hyderabad, and a B.Tech. from the Indian Institute of Technology - Madras, to help organizations make data-driven decisions. The company is helping its clients in different business areas such as product strategy, human resource management, new product development, operations management, competitor analysis, financial analysis, etc.
The company was awarded best Enterprise Software Startup by HYSEA in 2016 and received one of the 50 Emerging Startups awards from NASSCOM in 2017.

Vizard is Infruid's flagship analytics product. As an instant search-driven analytics platform, Vizard helps users derive deep insights from their data. Vizard has helped its clients democratize analytics across the organization, so that everyone in the organization who needs to make decisions has easy access to instant insights. Some of the best-run businesses are using Vizard across Sales, Marketing, Finance, HR and Operations departments.

Vizard's consumer-grade user experience hides all the complexity of big data processing behind a simple search box, so users can type any question about their business and slice and dice their data, even at big data scale, to get visual insights instantaneously.

Source: https://[Link]

A few examples where Infruid's analytics platform helped its clients:

• A leading industrial equipment manufacturing company used Vizard to help its Sales team optimize selling price and maximize revenue. The optimal selling price, for each of thousands of products, was determined in Vizard using historical data. Vizard helped the company increase its revenue by facilitating data-driven pricing decisions.
• A leading steel manufacturer used Vizard to derive insights from its inventory data and to interact with Enterprise Resource Planning (ERP) software. Vizard has helped the company plan its inventory stock movement across the various distributed warehouses and reduce its inventory holding costs.
• A leading renewable energy company used Vizard to run its Network Operations Centre, to analyze Internet of Things (IoT) data from its solar power plants, and to remotely monitor geographically distributed power plants. Vizard also helped the company lower its operational expenditure through preventive maintenance.
WAY AHEAD

The company has planned to release enhanced data discovery capabilities to support augmented data discovery. The augmented data discovery feature uses Artificial Intelligence (AI) to autodetect patterns of interest from data and bubble them up to users' attention.

Source: Adapted from Srishti Deoras, "[Link] Of Enterprises Do Not Have A Clear Separation Of Ownership Of Data And Insights, Says Mahesh Yellai Of Infruid", Analytics India, https://[Link]; Gartner IT Glossary, "Big Data", https://[Link]

The overall objective of Chapter 2 is for you to master several techniques for summarizing and depicting data, thereby enabling you to:

• Create a frequency distribution from a set of data.
• Create and evaluate different types of quantitative data graphs, including histograms, frequency polygons, ogives, dot plots, and stem-and-leaf plots, in order to evaluate the data being graphed.
• Create and evaluate different types of qualitative data graphs, including pie charts, bar graphs, and Pareto charts, in order to analyze the data being graphed.
• Create a cross-tabulation table and analyze basic two-variable scatter plots of numerical data.

2.1 INTRODUCTION

In Chapters 2 and 3 many techniques are presented for reformatting or reducing data so that the data are more manageable and can assist decision makers more effectively. Some of the most effective mechanisms for presenting data in a form meaningful to decision makers are graphical depictions. This chapter focuses on graphical tools for summarizing and presenting data. Through graphs and charts, the decision maker can often get an overall picture of the data and reach some useful conclusions merely by studying the chart or graph. Key characteristics of graphs often suggest appropriate choices among potential numerical methods (discussed in later chapters) for analyzing data.
Visual representations of data are often much more effective communication tools than tables of numbers in business meetings.

A first step in exploring and analyzing data is to reduce important and sometimes expensive data to a graphic picture that is clear, concise, and consistent with the message of the original data. Converting data to graphics can be creative and artful. In this chapter, guidelines are provided for selecting appropriate graphical representations for data sets. Charts and graphs discussed in Chapter 2 include histograms, frequency polygons, ogives, dot plots, stem-and-leaf plots, bar charts, pie charts, and Pareto charts for one-variable data, and both cross-tabulation tables and scatter plots for two-variable numerical data.

2.2 FREQUENCY DISTRIBUTIONS

Raw data, or data that have not been summarized in any way, are sometimes referred to as ungrouped data. As an example, Table 2.1 contains 60 years of raw data of the unemployment rates for Canada. Data that have been organized into a frequency distribution are called grouped data. Table 2.2 presents a frequency distribution for the data displayed in Table 2.1. The distinction between ungrouped and grouped data is important because the calculation of statistics differs between the two types of data. Several of the charts and graphs presented in this chapter are constructed from grouped data.

One particularly useful tool for grouping data is the frequency distribution, which is a summary of data presented in the form of class intervals and frequencies. How is a frequency distribution constructed from raw data? That is, how are frequency distributions like the one displayed in Table 2.2 constructed from raw data like those presented in Table 2.1? Frequency distributions are relatively easy to construct.
Although some guidelines and rules of thumb help in their construction, frequency distributions vary in final shape and design, even when the original raw data are identical. In a sense, frequency distributions are constructed according to individual business researchers' taste.

When constructing a frequency distribution, the business researcher should first determine the range of the raw data. The range often is defined as the difference between the largest and smallest numbers. The range for the data in Table 2.1 is 9.7 (12.0 - 2.3).

The second step in constructing a frequency distribution is to determine how many classes it will contain. One rule of thumb is to select between 3 and 15 classes. If the frequency distribution contains too few classes, the data summary may be too general to be useful. Too many classes may result in a frequency distribution that does not aggregate the data enough to be helpful. The final number of classes is arbitrary. The business researcher arrives at a number by examining the range and determining a number of classes that will span the range adequately and also be meaningful to the user. The data in Table 2.1 were grouped into six classes for Table 2.2.

After selecting the number of classes, the business researcher must determine the width of the class interval. An approximation of the class width can be calculated by dividing the range by the number of classes. For the data in Table 2.1, this approximation would be 9.7/6 = 1.62. Normally, the number is rounded up to the next whole number, which in this case is 2. The frequency distribution must start at a value equal to or lower than the lowest number of the ungrouped data and end at a value equal to or higher than the highest number. The lowest unemployment rate is 2.3 and the highest is 12.0, so the business researcher starts the frequency distribution at 1 and ends it at 13.
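The steps just described (range, number of classes, class width, starting point) can be traced with a short Python sketch. It uses only the figures quoted in the text; the variable names and the choice of 1 as the starting value are carried over from the example, and the code is an illustration rather than a general-purpose routine.

```python
import math

# Values quoted in the text for the Table 2.1 unemployment data.
lowest, highest = 2.3, 12.0
num_classes = 6

data_range = highest - lowest              # 12.0 - 2.3, about 9.7
approx_width = data_range / num_classes    # about 1.62
class_width = math.ceil(approx_width)      # rounded up to the next whole number

# Start at a convenient value at or below the lowest observation.
start = 1
endpoints = [start + i * class_width for i in range(num_classes + 1)]
classes = list(zip(endpoints, endpoints[1:]))

for lo, hi in classes:
    print(f"{lo}-under {hi}")
```

The "under" wording in each printed label means the upper endpoint belongs to the next class, so no value can fall into two classes.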
Table 2.2 contains the completed frequency distribution for the data in Table 2.1. Class endpoints are selected so that no value of the data can fit into more than one class. The class interval expression "under" in the distribution of Table 2.2 avoids such a problem.

2.2.1 CLASS MIDPOINT

The midpoint of each class interval is called the class midpoint and is sometimes referred to as the class mark. It is the value halfway across the class interval and can be calculated as the average of the two class endpoints. For example, in the distribution of Table 2.2, the midpoint of the class interval 3-under 5 is 4, or (3 + 5)/2.

The class midpoint is important, because it becomes the representative value for each class in most group statistics calculations. The third column in Table 2.3 contains the class midpoints for all classes of the data from Table 2.2.

2.2.2 RELATIVE FREQUENCY

Relative frequency is the proportion of the total frequency that is in any given class interval in a frequency distribution. Relative frequency is the individual class frequency divided by the total frequency. For example, from Table 2.3, the relative frequency for the class interval 5-under 7 is 13/60 = .2167.

Table 2.3: Class midpoints, relative frequencies, and cumulative frequencies for the unemployment data

Interval       Frequency   Class Midpoint   Relative Frequency   Cumulative Frequency
1-under 3          4             2               .0667                    4
3-under 5         12             4               .2000                   16
5-under 7         13             6               .2167                   29
7-under 9         19             8               .3167                   48
9-under 11         7            10               .1167                   55
11-under 13        5            12               .0833                   60
Total             60

Consideration of the relative frequency is preparatory to the study of probability in Chapter 4. Indeed, if values were selected randomly from the data in Table 2.1, the probability of drawing a number that is "5-under 7" would be .2167, the relative frequency for that class interval.
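As a rough check on the class midpoint and relative frequency definitions, the following Python sketch recomputes those two columns from the class intervals and frequencies listed in Table 2.3. The list names are assumptions chosen for readability.

```python
# Class intervals and frequencies as listed in Table 2.3.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
frequencies = [4, 12, 13, 19, 7, 5]

total = sum(frequencies)  # 60 observations

# Class midpoint: the average of the two class endpoints.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Relative frequency: class frequency divided by total frequency.
relative = [f / total for f in frequencies]

for (lo, hi), f, m, r in zip(classes, frequencies, midpoints, relative):
    print(f"{lo}-under {hi}: frequency={f}, midpoint={m:g}, relative={r:.4f}")
```

Because the relative frequencies are proportions of the same total, they sum to 1, which is what allows them to be read as probabilities of a randomly selected value falling in each class.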
The fourth column of Table 2.3 lists the relative frequencies for the frequency distribution of Table 2.2.

2.2.3 CUMULATIVE FREQUENCY

The cumulative frequency is a running total of frequencies through the classes of a frequency distribution. The cumulative frequency for each class interval is the frequency for that class interval added to the preceding cumulative total. In Table 2.3 the cumulative frequency for the first class is the same as the class frequency: 4. The cumulative frequency for the second class interval is the frequency of that interval (12) plus the frequency of the first interval (4), which yields a new cumulative frequency of 16. This process continues through the last interval, at which point the cumulative total equals the sum of the frequencies (60). The concept of cumulative frequency is used in many areas, including sales cumulated over a fiscal year, sports scores during a contest (cumulated points), years of service, points earned in a course, and costs of doing business over a period of time. Table 2.3 gives cumulative frequencies for the data in Table 2.2.

DEMONSTRATION PROBLEM 2.1

The following data from the Federal Home Loan Mortgage Corporation are the average monthly 30-year fixed rate mortgage interest rates for a recent 40-month period.

[Data table: 40 monthly mortgage interest rates, ranging from 3.34 to 5.10.]

Construct a frequency distribution for these data. Calculate and display the class midpoints, relative frequencies, and cumulative frequencies for this frequency distribution.

Solution:

How many classes should this frequency distribution contain? The range of the data is 1.76 (5.10 - 3.34). If 8 classes are used, each class width is approximately:

Class width = Range / Number of classes = 1.76 / 8 = 0.22

If a class width of .25 is used, a frequency distribution can be constructed with endpoints that are more uniform looking and allow for presentation of the information in categories more familiar to mortgage interest rate users.
The first endpoint must be 3.34 or lower to include the smallest value; the last endpoint must be 5.10 or higher to include the largest value. In this case, the frequency distribution begins at 3.25 and ends at 5.25. The resulting frequency distribution, class midpoints, relative frequencies, and cumulative frequencies are listed in the following table:

Class Interval      Frequency    Class Midpoint    Relative Frequency    Cumulative Frequency
3.25-under 3.50         3            3.375              .075                     3
3.50-under 3.75         4            3.625              .100                     7
3.75-under 4.00         7            3.875              .175                    14
4.00-under 4.25         3            4.125              .075                    17
4.25-under 4.50         4            4.375              .100                    21
4.50-under 4.75         6            4.625              .150                    27
4.75-under 5.00        10            4.875              .250                    37
5.00-under 5.25         3            5.125              .075                    40
Total                  40

The frequencies and relative frequencies of these data reveal the mortgage interest rate classes that are likely to occur during this period of time. Overall, the mortgage rates are distributed relatively evenly, with the 4.75-under 5.00 class interval containing the greatest frequency (10), followed by the 3.75-under 4.00 class interval (7) and the 4.50-under 4.75 interval (6).

Fill in the blanks:

1. ________ is the individual class frequency divided by the total frequency.
2. The class midpoint is also sometimes referred to as the ________.
3. The cumulative frequency for each class interval is the ________ for that class interval added to the preceding cumulative total.

List three specific uses of cumulative frequencies in business.

2.3 QUANTITATIVE DATA GRAPHS

One of the most effective mechanisms for presenting data in a form meaningful to decision makers is graphical depiction. Through graphs and charts, the decision maker can often get an overall picture of the data and reach some useful conclusions merely by studying the chart or graph. Converting data to graphics can be creative and artful. Often the most difficult step in this process is to reduce important and sometimes expensive data to a graphic picture that is both clear and concise and yet consistent with the message of the original data.
One of the most important uses of graphical depiction in statistics is to help the researcher determine the shape of a distribution. Data graphs can generally be classified as quantitative or qualitative. Quantitative data graphs are plotted along a numerical scale, and qualitative graphs are plotted using non-numerical categories. In this section, we will examine five types of quantitative data graphs: (1) histogram, (2) frequency polygon, (3) ogive, (4) dot plot, and (5) stem-and-leaf plot.

2.3.1 HISTOGRAMS

One of the more widely used types of graphs for quantitative data is the histogram. A histogram is a series of contiguous rectangles that represent the frequency of data in given class intervals. If the class intervals used along the horizontal axis are equal, then the heights of the rectangles represent the frequency of values in a given class interval. If the class intervals are unequal, then the areas of the rectangles can be used for relative comparisons of class frequencies. Construction of a histogram involves labelling the x-axis (abscissa) with the class endpoints and the y-axis (ordinate) with the frequencies, drawing a horizontal line segment from class endpoint to class endpoint at each frequency value, and connecting each line segment vertically from the frequency value to the x-axis to form a series of rectangles. Figure 2.1 is a histogram of the frequency distribution in Table 2.2.

A histogram is a useful tool for differentiating the frequencies of class intervals. A quick glance at a histogram reveals which class intervals produce the highest frequency totals. Figure 2.1 clearly shows that the class interval 7-under 9 yields by far the highest frequency count (19).
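As a rough illustration of the construction just described, the sketch below renders the Table 2.2 distribution as a text histogram, drawing one bar per class whose length equals the class frequency. It is an assumption-laden stand-in for a real plot: the function name text_histogram and the "#" bar symbol are hypothetical, and equating bar length with frequency is valid here only because all class intervals have equal width.

```python
# Frequencies from Table 2.2; keys are (lower, upper) class endpoints.
distribution = {(1, 3): 4, (3, 5): 12, (5, 7): 13, (7, 9): 19, (9, 11): 7, (11, 13): 5}


def text_histogram(dist, symbol="#"):
    """Return one line per class; bar length equals the class frequency.

    Hypothetical helper. It assumes equal class widths, so that bar
    length (not area) can stand for frequency.
    """
    lines = []
    for (lower, upper), freq in dist.items():
        label = f"{lower}-under {upper}"
        lines.append(f"{label:<12}{symbol * freq} ({freq})")
    return lines


for line in text_histogram(distribution):
    print(line)

# The class interval with the tallest bar.
modal_class = max(distribution, key=distribution.get)
print("Highest frequency:", modal_class, distribution[modal_class])
```

Even in this crude form, the longest bar identifies the 7-under 9 class at a glance, which is the kind of quick comparison a histogram is meant to support.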
Examination of the histogram reveals where large increases or decreases occur between classes, such as from the 1-under 3 class to the 3-under 5 class, an increase of 8, and from the 7-under 9 class to the 9-under 11 class, a decrease of 12.

[Figure 2.1: Histogram of Canadian Unemployment Data (x-axis: Unemployment Rates for Canada; y-axis: Frequency)]

Note that the scales used along the x- and y-axes for the histogram in Figure 2.1 are almost identical. However, because ranges of meaningful numbers for the two variables being graphed often differ considerably, the graph may have different scales on the two axes. Figure 2.2 shows what the histogram of unemployment rates would look like if the scale on the y-axis were more compressed than that on the x-axis. Notice that with the compressed graph, Figure 2.2, there appears to be less difference between the lengths of the rectangles than in Figure 2.1, implying that the differences in frequencies for the compressed graph are not as great as they are in Figure 2.1. It is important that the user of the graph clearly understands the scales used for the axes of a histogram. Otherwise, a graph's creator can “lie with statistics” by stretching or compressing a graph to make a point.*

[Figure 2.2: Histogram of Canadian Unemployment Data (y-axis compressed) (x-axis: Unemployment Rates for Canada; y-axis: Frequency)]

*It should be pointed out that the software package Excel uses the term histogram to refer to a frequency distribution. However, by checking Chart Output in the Excel histogram dialog box, a graphical histogram is also created.

2.3.2 USING HISTOGRAMS TO GET AN INITIAL OVERVIEW OF THE DATA

Because of the widespread availability of computers and statistical software packages to business researchers and decision makers, the histogram continues to grow in importance in yielding information about the shape of the distribution of a large database, the variability of the data, the central location of the data, and outlier data.
Although most of these concepts are presented in Chapter 3, the notion of the histogram as an initial tool to assess these data characteristics is presented here.

A business researcher measured the volume of stocks traded on Wall Street three times a month for nine years, resulting in a database of 324 observations. Suppose a financial decision maker wants to use these data to reach some conclusions about the stock market. Figure 2.3 shows a histogram of these data. What can we learn from this histogram? Virtually all stock market volumes fall between zero and 1 billion shares. The distribution takes on a shape that is high on the left end and tapered to the right. In Chapter 3 we will learn that the shape of this distribution is skewed toward the right end. In statistics, it is often useful to determine whether data are approximately normally distributed (bell-shaped curve) as shown in Figure 2.4.

[Figure 2.3: Histogram of Stock Volumes]

[Figure 2.4: Normal Distribution]

We can see by examining the histogram in Figure 2.3 that the stock market volume data are not normally distributed. Although the centre of the histogram is located near 500 million shares, a large portion of stock volume observations falls in the lower end of the data, somewhere between 100 million and 400 million shares. In addition, the histogram shows some outliers in the upper end of the distribution. Outliers are data points that appear outside of the main body of observations and may represent phenomena that differ from those represented by other data points. By observing the histogram, we notice a few data observations near 1 billion. One could conclude that on a few stock market days an unusually large volume of shares is traded. These and other insights can be gleaned by examining the histogram, and they show that histograms play an important role in the initial analysis of data.
2.3.3 FREQUENCY POLYGONS

A frequency polygon, like the histogram, is a graphical display of class frequencies. However, instead of using rectangles like a histogram, in a frequency polygon each class frequency is plotted as a dot at the class midpoint, and the dots are connected by a series of line segments. Construction of a frequency polygon begins by scaling class midpoints along the horizontal axis and the frequency scale along the vertical axis. A dot is plotted for the associated frequency value at each class midpoint. Connecting these midpoint dots completes the graph. Figure 2.5 shows a frequency polygon of the distribution data from Table 2.2 produced by using the software package Excel. The information gleaned from frequency polygons and histograms is similar. As with the histogram, changing the scales of the axes can compress or stretch a frequency polygon, which affects the user's impression of what the graph represents.

[Figure 2.5: Frequency Polygon of the Unemployment Data (x-axis: Class Midpoint; y-axis: Frequency)]

2.3.4 OGIVES

An ogive (o-jive) is a cumulative frequency polygon. Construction begins by labeling the x-axis with the class endpoints and the y-axis with the frequencies. However, the use of cumulative frequency values requires that the scale along the y-axis be great enough to include the frequency total. A dot of zero frequency is plotted at the beginning of the first class, and construction proceeds by marking a dot at the end of each class interval for the cumulative value. Connecting the dots then completes the ogive. Figure 2.6 presents an ogive produced by using Excel for the data in Table 2.2.

Ogives are most useful when the decision maker wants to see running totals. For example, if a comptroller is interested in controlling costs, an ogive could depict cumulative costs over a fiscal year.

Steep slopes in an ogive can be used to identify sharp increases in frequencies.
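The plotting points for both graphs can be derived directly from the class endpoints and frequencies of Table 2.2. The sketch below is illustrative only and makes no claim about how Excel builds these charts; the variable names are assumptions. It lists the (midpoint, frequency) dots of the frequency polygon and the (endpoint, cumulative frequency) dots of the ogive, then finds the steepest ogive segment.

```python
from itertools import accumulate

endpoints = [1, 3, 5, 7, 9, 11, 13]    # class endpoints from Table 2.2
frequencies = [4, 12, 13, 19, 7, 5]    # one frequency per class

# Frequency polygon: one dot per class at (class midpoint, frequency).
midpoints = [(lo + hi) / 2 for lo, hi in zip(endpoints, endpoints[1:])]
polygon_points = list(zip(midpoints, frequencies))

# Ogive: a dot of zero at the beginning of the first class, then the
# cumulative frequency at the upper endpoint of each class.
ogive_points = list(zip(endpoints, [0] + list(accumulate(frequencies))))

# Rise of each ogive segment. With equal class widths, the steepest
# segment belongs to the class with the largest frequency.
rises = [b[1] - a[1] for a, b in zip(ogive_points, ogive_points[1:])]
steepest = rises.index(max(rises))

print("Polygon points:", polygon_points)
print("Ogive points:", ogive_points)
print(f"Steepest segment: {endpoints[steepest]}-under {endpoints[steepest + 1]}")
```

Because every class here has the same width, comparing the rises is equivalent to comparing slopes; with unequal widths each rise would need to be divided by its class width.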
In Figure 2.6, a particularly steep slope occurs in the 7-under 9 class, signifying a large jump in class frequency totals.

[Figure 2.6: Ogive of the Unemployment Data (x-axis: Class Endpoints; y-axis: Cumulative Frequency)]

Table 2.4 contains scores from an examination on plant safety policy and rules given to a group of 35 job trainees. A stem-and-leaf plot of these data is displayed in Table 2.5. One advantage of such a distribution is that the instructor can readily see whether the scores are in the upper or lower end of each bracket and also determine the spread of the scores. A second advantage of stem-and-leaf plots is that the values of the original raw data are retained (whereas most frequency distributions and graphic depictions use the class midpoint to represent the values in a class).

Fill in the blanks:

4. Through ________ and ________, the decision maker can often get an overall picture of the data and reach some useful conclusions.
5. Data graphs can generally be classified as ________ or ________.
6. ________ are data points that appear outside of the main body of observations and may represent phenomena that differ from those represented by other data points.
7. A(n) ________ is a cumulative frequency polygon.

2.4 QUALITATIVE DATA GRAPHS

In contrast to quantitative data graphs that are plotted along a numerical scale, qualitative graphs are plotted using non-numerical categories. In this section, we will examine three types of qualitative data graphs: (1) pie charts, (2) bar charts, and (3) Pareto charts.

2.4.1 PIE CHARTS

A pie chart is a circular depiction of data where the area of the whole pie represents 100% of the data and slices of the pie represent a percentage breakdown of the sublevels. Pie charts show the relative magnitudes of the parts to the whole. They are widely used in business, particularly to depict such things as budget categories, market share, and time/resource allocations.
However, the use of pie charts is minimized in the sciences and technology because pie charts can lead to less accurate judgments than are possible with other types of graphs. Generally, it is more difficult for the viewer to interpret the relative size of angles in a pie chart than to judge the length of rectangles in a bar chart.

Construction of the pie chart begins by determining the proportion of the sub-unit to the whole. Table 2.6 contains the refining capacity (1,000 barrels per day) of the top five petroleum refining companies in the United States in a recent year. To construct a pie chart from these data, first convert the raw capacity figures to proportions by dividing each capacity figure by the total capacity figure (15,134). This proportion is analogous to the relative frequency computed for frequency distributions. Because a circle contains 360°, each proportion is then multiplied by 360 to obtain the number of degrees to represent each company in the pie chart. For example, Exxon Mobil's capacity of 5,589 (1,000 barrels) represents a .3693 proportion of the total capacity for these five companies (5,589/15,134 = .3693). Multiplying this value by 360° results in an angle of 132.95°. The pie chart is then constructed by determining each of the other angles and using a compass to lay out the slices. The pie chart in Figure 2.7 depicts the data from Table 2.6.
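The conversion from a raw figure to a slice angle is a two-step calculation: divide by the total, then multiply by 360. A minimal Python sketch follows, using only the two figures quoted in the passage (Exxon Mobil's 5,589 and the five-company total of 15,134); the function name slice_angle is hypothetical, and the other companies' capacities from Table 2.6 are deliberately not reproduced here.

```python
def slice_angle(value, total):
    """Degrees of a pie slice: the slice's proportion of the whole times 360.

    Hypothetical helper, named here for illustration only.
    """
    proportion = value / total
    return proportion * 360


# Figures quoted in the text, both in 1,000 barrels per day.
exxon_mobil, total_capacity = 5589, 15134

proportion = exxon_mobil / total_capacity
angle = slice_angle(exxon_mobil, total_capacity)
print(f"proportion = {proportion:.4f}, angle = {angle:.2f} degrees")
```

Applying the same function to each remaining company in Table 2.6 would give the other four angles, and the five angles together should sum to 360°.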
