Unit 3 - BA - July 2022
Exploring Data
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a
publicly accessible website, in whole or in part.
BUSINESS ANALYTICS: DATA ANALYSIS AND DECISION MAKING
Chapter 2
Populations and Samples
A population includes all of the entities of interest
in a study (people, households, machines, etc.)
Examples:
All potential voters in a presidential election
All subscribers to cable television
All invoices submitted for Medicare reimbursement by
nursing homes
A sample is a subset of the population, often
randomly chosen and preferably representative of
the population as a whole.
Examples: Gallup, Harris, other polls today
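As a minimal sketch of random sampling, the Python snippet below draws 500 items from a made-up population of 10,000 invoice amounts and compares the sample mean with the population mean. The population values, the sample size, and the seed are illustrative assumptions, not data from the text.

```python
import random

rng = random.Random(2022)  # fixed seed (an arbitrary choice) so the draw is repeatable

# Hypothetical population: 10,000 invoice amounts in dollars (not from the text).
population = [round(rng.uniform(50, 5000), 2) for _ in range(10_000)]

# Simple random sample without replacement: every invoice is equally likely to be chosen.
sample = rng.sample(population, k=500)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

A randomly chosen sample of this size usually gives a mean close to the population mean, which is what "representative" is meant to suggest.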
Data Sets, Variables, and Observations
Example 2.1:
Questionnaire [Link]
Objective: To illustrate variables and observations in a typical data
set.
Solution: Data set includes observations on 30 people who responded
to a questionnaire on the president’s environmental policies.
Variables include: age, gender, state, children, salary, opinion.
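One way to picture this layout is a list of records, sketched below in Python: each record is an observation (a respondent) and each key is a variable. The three rows and the opinion labels are invented for illustration; only the variable names come from the example.

```python
# Each dict is one observation (one respondent); each key is one variable.
# The three rows below are invented for illustration; the real data set has 30.
observations = [
    {"age": 35, "gender": "Male", "state": "Minnesota", "children": 1,
     "salary": 65400, "opinion": "Agree"},
    {"age": 61, "gender": "Female", "state": "Texas", "children": 2,
     "salary": 62000, "opinion": "Strongly disagree"},
    {"age": 35, "gender": "Male", "state": "Ohio", "children": 0,
     "salary": 63200, "opinion": "Neutral"},
]

variables = list(observations[0])            # the variable (column) names
ages = [row["age"] for row in observations]  # one variable across all observations

print(f"{len(observations)} observations on {len(variables)} variables: {variables}")
print("age column:", ages)
```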
Types of Data
(slide 1 of 5)
Types of Data
(slide 2 of 5)
Types of Data
(slide 4 of 5)
Typical Time Series Data Set
(slide 5 of 5)
Descriptive Measures for
Categorical Variables
Example 2.2:
Supermarket [Link] (slide 1 of 3)
Example 2.2:
Supermarket [Link] (slide 2 of 3)
Example 2.2:
Supermarket [Link] (slide 3 of 3)
Descriptive Measures for
Numerical Variables
Example 2.3:
Baseball Salaries (slide 1 of 2)
Objective: To learn how salaries are distributed across all 2011 MLB
players.
Solution: Data set contains data on 843 Major League Baseball players in
the 2011 season.
Variables are player’s name, team, position, and salary.
Example 2.3:
Baseball Salaries [Link] (slide 2 of 2)
Measures of Central Tendency
(slide 1 of 3)
Measures of Central Tendency
(slide 3 of 3)
Minimum, Maximum,
Percentiles, and Quartiles
For any percentage p, the pth percentile is the value such that
a percentage p of all values are less than it.
The quartiles divide the data into four groups, each with
(approximately) a quarter of all observations.
The first, second and third quartiles are the percentiles
corresponding to p = 25%, p = 50%,
and p = 75%.
By definition, the second quartile (p = 50%) is equal to the median.
The minimum and maximum values can be calculated with
Excel’s MIN and MAX functions, and the percentiles and
quartiles with Excel’s PERCENTILE and QUARTILE
functions.
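For readers working outside Excel, the same measures can be sketched with Python's standard library. The salary figures are hypothetical, and treating `method="inclusive"` as the counterpart of Excel's interpolation is an assumption made here, not something the text states.

```python
import statistics

# Hypothetical salaries in $1000s (not the actual baseball data).
salaries = [400, 420, 450, 500, 750, 1200, 2500, 5000, 12000]

minimum, maximum = min(salaries), max(salaries)

# n=4 returns the three cut points that split the data into four groups.
# method="inclusive" treats the minimum and maximum as the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(salaries, n=4, method="inclusive")

# n=10 returns nine cut points; the last one is the 90th percentile.
p90 = statistics.quantiles(salaries, n=10, method="inclusive")[8]

print(f"min={minimum}, Q1={q1}, median={q2}, Q3={q3}, max={maximum}, p90={p90}")
```

The second quartile returned here equals `statistics.median(salaries)`, in line with the definition above.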
Measures of Variability
(slide 1 of 3)
Empirical Rules for Interpreting Standard Deviation
(slide 1 of 3)
Empirical Rules for Baseball Salaries
(slide 2 of 3)
Empirical Rules for Interpreting Standard Deviation
(slide 3 of 3)
Measures of Shape
(slide 1 of 2)
Measures of Shape
(slide 2 of 2)
Numerical Summary Measures in the
Status Bar and with StatTools
Example 2.3 (Continued):
Baseball Salaries [Link]
Objective: To learn the fundamentals of StatTools and use it to generate
summary measures of baseball salaries.
Solution: First, define a StatTools data set by selecting any cell in the
data set and clicking the Data Set Manager button.
Then generate summary measures for the Salary variable by selecting
One-Variable Summary from the Summary Statistics dropdown list and filling
in the dialog box that appears.
Charts for Numerical Variables
There are many graphical ways to indicate the
distribution of a numerical variable.
For cross-sectional variables:
Histograms
Box plots
Histograms
Example 2.3 (Continued):
Baseball Salaries [Link] (slide 1 of 2)
Example 2.3 (Continued):
Baseball Salaries [Link] (slide 2 of 2)
Example 2.4:
Late or Lost [Link] (slide 1 of 2)
Objective: To fine-tune a histogram for a variable with integer counts.
Solution: Data set lists the number of bags that were either late or lost
for 456 flights.
In the Histogram dialog box, request 9 bins and set the minimum and
maximum to -0.5 and 8.5.
StatTools divides the range into 9 equal-length bins of width 1, so each
bin is centered on one of the integers 0 through 8.
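The bin arithmetic can be checked with a short Python sketch. The bag counts below are invented; only the 9 bins and the limits -0.5 and 8.5 come from the example. With these settings each integer count falls in its own bin.

```python
from collections import Counter

# Hypothetical late-or-lost bag counts for a few flights (the real file has 456).
bags = [0, 2, 3, 3, 4, 4, 4, 5, 6, 8, 1, 3]

low, high, n_bins = -0.5, 8.5, 9
width = (high - low) / n_bins  # (8.5 - (-0.5)) / 9 = 1.0

def bin_index(value):
    """Return the 0-based bin containing value; the top edge goes in the last bin."""
    return min(int((value - low) // width), n_bins - 1)

counts = Counter(bin_index(b) for b in bags)

# Text histogram: one row per bin, one star per flight.
for i in range(n_bins):
    left = low + i * width
    print(f"[{left:4.1f}, {left + width:4.1f})  {'*' * counts[i]}")
```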
Example 2.4:
Late or Lost [Link] (slide 2 of 2)
Box Plots
A box plot (or box-whisker plot) is an alternative
type of chart for showing the distribution of a
variable.
The elements of a generic box plot are shown below:
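The figure is not reproduced in this text. As a rough guide, the Python sketch below computes the quantities a box plot is commonly built from: the quartiles, the interquartile range (IQR), whisker ends, and points flagged as outliers. The 1.5 x IQR cutoff is a widely used convention assumed here, and the data are made up; the original figure may define its elements differently.

```python
import statistics

values = [2, 4, 5, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data with one large value

# The box spans the first to the third quartile, with a line at the median.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points more than 1.5 IQRs beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)  # whiskers stop at the last inside point
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"box: {q1} to {q3}, median {median}, IQR {iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```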
Example 2.3 (Continued):
Baseball Salaries [Link]
Time Series Data
Our main interest in time series variables is how
they change over time, and this information is lost
in traditional summary measures and in histograms
or box plots.
For time series data, a time series graph is used.
This is a graph of the values of one or more time
series, using time on the horizontal axis.
This is always the place to start a time series analysis.
Example 2.5:
Crime in [Link] (slide 1 of 3)
Objective: To see how time series graphs help to detect trends in crime
data.
Solution: Data set contains annual data on violent and property crimes for
the years 1960 to 2010.
In StatTools, designate a StatTools data set.
Then select Time Series Graph from the Time Series and Forecasting
dropdown list and fill in the resulting dialog box.
Example 2.5:
Crime in [Link] (slide 2 of 3)
Population Totals
Example 2.5:
Crime in [Link] (slide 3 of 3)
Example 2.6:
DJIA Monthly [Link] (slide 1 of 2)
Example 2.6:
DJIA Monthly [Link] (slide 2 of 2)
Outliers
An outlier is a value or an entire observation (row)
that lies well outside of the norm.
Some statisticians define an outlier as any value more
than three standard deviations from the mean, but this
is only a rule of thumb.
Even if values are not unusual by themselves, there
still might be unusual combinations of values.
When dealing with outliers, it is best to run the
analyses two ways: with the outliers and without
them.
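The three-standard-deviation rule of thumb and the advice to run the analysis both ways can be sketched in Python as follows. The data values are hypothetical.

```python
import statistics

# Hypothetical measurements: nineteen values near 50 and one extreme value.
values = [52, 48, 50, 51, 49, 47, 53, 50, 52, 48,
          51, 49, 50, 46, 54, 50, 49, 51, 50, 150]

mean = statistics.mean(values)
stdev = statistics.stdev(values)  # sample standard deviation

# Rule of thumb: flag anything more than three standard deviations from the mean.
flagged = [v for v in values if abs(v - mean) > 3 * stdev]
kept = [v for v in values if abs(v - mean) <= 3 * stdev]

# Run the summary both ways: with the outliers and without them.
print(f"with outliers:    mean={mean:.2f}, stdev={stdev:.2f}")
print(f"without outliers: mean={statistics.mean(kept):.2f}, "
      f"stdev={statistics.stdev(kept):.2f}")
print("flagged:", flagged)
```

Comparing the two summaries shows how much a single extreme value can move the mean and standard deviation.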
Missing Values
Most real data sets have gaps in the data.
There are two issues: how to detect these missing values
and what to do about them.
The more important issue is what to do about them:
One option is to simply ignore them. Then you will have to
be aware of how the software deals with missing values.
Another option is to fill in missing values with the average of
nonmissing values, but this isn’t usually a very good option.
A third option is to examine the nonmissing values in the row
of a missing value; these values might provide clues on what
the missing value should be.
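The first two options can be compared in a few lines of Python. The salaries are made up, and `None` stands in for a blank cell. One reason filling in the average is often a poor choice shows up directly: the mean is unchanged, but the spread is understated.

```python
import statistics

# Hypothetical salaries; None marks a missing value.
salaries = [52000, None, 61000, 58000, None, 49000]

# Detect: count the gaps.
present = [s for s in salaries if s is not None]
n_missing = len(salaries) - len(present)

# Option 1: ignore the missing values and summarize what is there.
mean_ignoring = statistics.mean(present)

# Option 2: fill each gap with the average of the nonmissing values.
imputed = [s if s is not None else mean_ignoring for s in salaries]

print(f"{n_missing} missing of {len(salaries)}")
print(f"ignoring gaps: mean={mean_ignoring}, stdev={statistics.stdev(present):.0f}")
print(f"mean-filled:   mean={statistics.mean(imputed)}, stdev={statistics.stdev(imputed):.0f}")
```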
Excel Tables for Filtering,
Sorting, and Summarizing
Example 2.7:
Catalog [Link] (slide 1 of 2)
Example 2.7:
Catalog [Link] (slide 2 of 2)
Filtering
Finding records that match particular criteria is called filtering.
One way to filter is to create an Excel table, which
automatically provides dropdown arrows next to the field names
that allow you to filter.
There are also three ways to filter on any rectangular data set
with variable names:
1. Use the Filter button from the Sort & Filter dropdown list on the
Home ribbon.
2. Use the Filter button from the Sort & Filter group on the Data
ribbon.
3. Right-click any cell in the data set and select Filter. You get several
options, the most popular of which is Filter by Selected Cell’s
Value.
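Outside Excel, filtering amounts to keeping the records that satisfy a condition. The Python sketch below is one hedged illustration; the records, the field names, and the helper `filter_records` are hypothetical.

```python
# Hypothetical customer records; the field names are assumptions for illustration.
customers = [
    {"person": 1, "gender": "Female", "region": "South", "amount_spent": 218},
    {"person": 2, "gender": "Male", "region": "West", "amount_spent": 2632},
    {"person": 3, "gender": "Female", "region": "South", "amount_spent": 3048},
    {"person": 4, "gender": "Female", "region": "East", "amount_spent": 435},
]

def filter_records(rows, **criteria):
    """Keep only the rows whose fields equal every given criterion."""
    return [row for row in rows
            if all(row[field] == value for field, value in criteria.items())]

# Equality filter on two fields, like choosing items from two dropdown arrows.
southern_women = filter_records(customers, gender="Female", region="South")

# Numeric condition, like a "greater than" number filter.
big_spenders = [row for row in customers if row["amount_spent"] > 1000]

print([row["person"] for row in southern_women])
print([row["person"] for row in big_spenders])
```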
Example 2.7 (Continued):
Catalog [Link] (slide 1 of 2)
BUSINESS ANALYTICS: DATA ANALYSIS AND DECISION MAKING
Chapter 3
Finding Relationships among Variables
Introduction
The primary interest in data analysis is usually in
relationships between variables.
The most useful numerical summary measure is correlation.
The most useful graph is a scatterplot.
To break down a numerical variable by a categorical variable, it
is useful to create side-by-side box plots.
An Excel® pivot table breaks down one variable by others so that
many kinds of relationships can be uncovered very quickly.
The diagram in the file Data Analysis [Link]
gives you the big picture of which analyses are appropriate
for which data types and which tools are best for
performing the various analyses.
Relationships Among
Categorical Variables
Example 3.2:
Baseball Salaries 2011 [Link] (slide 1 of 2)
Objective: To learn methods in StatTools for breaking down
baseball salaries by various categorical variables.
Solution: Data set contains the same 2011 baseball data examined
previously, as well as several extra categorical variables.
Create summary measures by selecting One-Variable Summary
from the Summary Statistics dropdown list.
Next, click the Format button and choose Stacked. Then choose the
Cat variable you want to categorize by and the Val variable you
want to summarize.
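The idea behind the stacked format, one category (Cat) column paired with one value (Val) column, can be sketched in Python. The positions and salaries are invented, and this is only an illustration of the breakdown, not of how StatTools computes it.

```python
from collections import defaultdict
import statistics

# Stacked layout: each row pairs a category (Cat) with a value (Val).
# Hypothetical positions and salaries in $1000s.
stacked = [
    ("Pitcher", 500), ("Pitcher", 3200), ("Pitcher", 1100),
    ("Catcher", 800), ("Catcher", 450),
    ("Outfielder", 2500), ("Outfielder", 9000), ("Outfielder", 700),
]

# Split the Val column into one list per category.
groups = defaultdict(list)
for category, value in stacked:
    groups[category].append(value)

# Summary measures of the numerical variable, broken down by category.
summary = {
    category: {
        "count": len(vals),
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
    }
    for category, vals in groups.items()
}

for category, s in summary.items():
    print(f"{category:11} n={s['count']} mean={s['mean']:.1f} median={s['median']:.1f}")
```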
Example 3.2:
Baseball Salaries 2011 [Link] (slide 2 of 2)
Create side-by-side box plots by selecting Box-Whisker Plot from the
Summary Graphs dropdown list and filling in the resulting dialog box.
Select the Stacked format so that you can choose a Cat variable and a
Val variable.
Relationships Among Numerical
Variables
To study relationships among numerical variables,
a new type of chart, called a scatterplot, and two
new summary measures, correlation and
covariance, are used.
These measures can be applied to any variables that
are displayed numerically.
However, they are appropriate only for truly
numerical variables, not for categorical variables
that have been coded numerically.
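A small Python sketch may help make the two measures concrete. It uses the usual sample formulas (dividing by n - 1) and made-up salary and spending figures. It also illustrates a standard property: rescaling a variable changes the covariance but leaves the correlation unchanged.

```python
import math

def covariance(x, y):
    """Sample covariance: average product of deviations, dividing by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: salary in $1000s and amount spent in $100s.
salary = [30, 45, 60, 75, 90]
spent = [10, 14, 15, 21, 25]

# Same salaries expressed in dollars instead of thousands of dollars.
salary_dollars = [1000 * s for s in salary]

print(f"covariance:  {covariance(salary, spent):.2f} "
      f"vs {covariance(salary_dollars, spent):.2f} after rescaling")
print(f"correlation: {correlation(salary, spent):.4f} "
      f"vs {correlation(salary_dollars, spent):.4f} after rescaling")
```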
Scatterplots
A scatterplot is a scatter of points, where each
point denotes the values of an observation for two
selected variables.
It is a graphical method for detecting relationships
between two numerical variables.
The two variables are often labeled generically as X
and Y, so a scatterplot is sometimes called an X-Y
chart.
The purpose of a scatterplot is to make a relationship
(or the lack of it) apparent.
Example 3.3:
[Link] (slide 1 of 2)
Example 3.3:
[Link] (slide 2 of 2)
Trend Lines in Scatterplots
Once you have a scatterplot, Excel enables you to
superimpose one of several trend lines on the
scatterplot.
A trend line is a line or curve that “fits” the scatter as
well as possible.
This could be a straight line, or it could be one of
several types of curves.
To do this, right-click on any point in the chart,
select Add Trendline, and fill out the resulting
dialog box.
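For the straight-line case, "fits as well as possible" is usually taken to mean ordinary least squares. The Python sketch below fits such a line to made-up data; that Excel's linear trend line uses this same criterion is an assumption here, since the text does not say, and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the point of means
    return slope, intercept

# Hypothetical scatter: salary in $1000s (X) and amount spent in $100s (Y).
x = [30, 45, 60, 75, 90]
y = [10, 14, 15, 21, 25]

slope, intercept = fit_line(x, y)
print(f"y = {intercept:.3f} + {slope:.4f} x")
```

The printed equation plays the same role as the equation Excel can display on the chart.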
Scatterplot with Trend Line and Equation
Superimposed
Correlation and Covariance
(slide 1 of 4)
Correlation and Covariance
(slide 3 of 4)
Correlation and Covariance
(slide 4 of 4)
Example 3.3 (Continued)
[Link] (slide 2 of 2)
Pivot Tables
The pivot table is an Excel tool that allows you to
break data down by categories.
Sometimes pivot tables are used to display tables of
counts, often called crosstabs or contingency
tables.
However, crosstabs typically list only counts,
whereas pivot tables can list counts, sums,
averages, and other summary measures.
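To show what "breaking data down by categories" means computationally, here is a minimal Python sketch that summarizes a cost by region and gender with sums and averages. The orders and field names are hypothetical, and the code imitates the result of a pivot table rather than Excel's own mechanism.

```python
from collections import defaultdict

# Hypothetical orders; the field names and amounts are made up for illustration.
orders = [
    {"region": "South", "gender": "Female", "total_cost": 120.0},
    {"region": "South", "gender": "Male", "total_cost": 80.0},
    {"region": "South", "gender": "Female", "total_cost": 200.0},
    {"region": "West", "gender": "Female", "total_cost": 50.0},
    {"region": "West", "gender": "Male", "total_cost": 150.0},
]

# Collect the values that fall in each (row category, column category) cell.
cells = defaultdict(list)
for order in orders:
    cells[(order["region"], order["gender"])].append(order["total_cost"])

# Summarize each cell two ways, as a pivot table's Values area could.
pivot = {
    key: {"sum": sum(costs), "average": sum(costs) / len(costs)}
    for key, costs in cells.items()
}

for (region, gender), summary in sorted(pivot.items()):
    print(f"{region:6} {gender:7} sum={summary['sum']:7.2f} "
          f"average={summary['average']:7.2f}")
```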
Example 3.4:
Elecmart [Link] (slide 1 of 2)
Example 3.4:
Elecmart [Link] (slide 2 of 2)
Hiding Categories (Filtering)
You can filter out any items in a pivot table that you don’t want
to see.
Click the Row Labels dropdown arrow of the active field and check
the items you want to filter on.
A pivot table with hidden categories is shown below.
Sorting on Values or Categories
It is easy to sort in a pivot table, either by the
numbers in the Values area or by the labels in a
Rows or Columns field.
To sort by the numbers in the Values area, right-click
any number and select Sort.
To sort on the labels of a Rows or Columns field, right-click any of
the categories and select Sort.
You can also click the dropdown arrow for the field and get
the dialog box that allows both sorting and filtering.
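The two sort orders can be illustrated in Python on a small summarized result. The region labels and totals are hypothetical.

```python
# Hypothetical pivot-table result: row label -> number in the Values area.
sum_of_total_cost = {"South": 410.5, "East": 980.0, "West": 655.25, "MidWest": 720.0}

# Sort by the numbers in the Values area, largest first.
by_value_desc = sorted(sum_of_total_cost.items(), key=lambda item: item[1], reverse=True)

# Sort by the row labels, A to Z.
by_label = sorted(sum_of_total_cost.items())

print("by value:", by_value_desc)
print("by label:", by_label)
```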
Changing Locations of Fields (Pivoting)
Changing Field Settings
You can change various settings in the Field Settings dialog
box.
To get to this dialog box:
Click the Field Settings button on the Analyze/Options ribbon.
Or right-click any of the pivot table cells and select the Field Settings item.
Pivot Charts
It is easy to accompany pivot tables with pivot charts.
These charts adapt automatically to the underlying pivot table.
To create a pivot chart, click anywhere inside the pivot table,
select the PivotChart button on the Analyze/Options ribbon, and
select a chart type.
Multiple Variables in the Values Area
Summarizing by Count
The variable in the Values area can be summarized by
the Count function.
This is useful when you want to know, for example, how
many of the orders were placed by females in the South.
Right-click any number in the pivot table, select Value
Field Settings, and select the Count function.
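A table of counts like this is a crosstab. The Python sketch below builds one with `collections.Counter` and looks up the number of orders placed by females in the South. The six orders are invented for illustration.

```python
from collections import Counter

# Hypothetical orders, each recorded as (gender, region).
orders = [
    ("Female", "South"), ("Male", "South"), ("Female", "South"),
    ("Female", "West"), ("Male", "East"), ("Female", "East"),
]

counts = Counter(orders)  # number of orders in each (gender, region) cell

genders = sorted({gender for gender, _ in orders})
regions = sorted({region for _, region in orders})

# Print the crosstab: genders down the rows, regions across the columns.
header = "".join(f"{region:>8}" for region in regions)
print(f"{'':8}{header}")
for gender in genders:
    row = "".join(f"{counts[(gender, region)]:>8}" for region in regions)
    print(f"{gender:8}{row}")

print("Orders by females in the South:", counts[("Female", "South")])
```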
Grouping
Categories in a Rows or Columns variable can be grouped.
Suppose you want to summarize Sum of Total Cost by
Date.
Starting with a blank pivot table, check both Date and Total
Cost in the PivotTable Fields pane.
Then right-click any date and select Group.
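Grouping dates can be pictured as mapping each date to a coarser label and summing within each label. The Python sketch below groups by year and month; the dates and costs are hypothetical, and the choice of months is an assumption, since the Group dialog may offer other units.

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders: (order date, total cost).
orders = [
    (date(2012, 3, 4), 100.0),
    (date(2012, 3, 19), 50.5),
    (date(2012, 4, 2), 75.25),
    (date(2012, 4, 30), 24.75),
    (date(2013, 3, 7), 10.0),
]

# Group by (year, month) so that March 2012 and March 2013 stay separate.
sum_by_month = defaultdict(float)
for order_date, total_cost in orders:
    sum_by_month[(order_date.year, order_date.month)] += total_cost

for (year, month), total in sorted(sum_by_month.items()):
    print(f"{year}-{month:02d}: {total:8.2f}")
```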
Other Pivot Table Features
Showing/hiding subtotals and grand totals (check the Layout options on the
Design ribbon)
Dealing with blank rows, that is, categories with no data (right-click any
number, choose PivotTable Options, and check the options on the Layout &
Format tab)
Displaying the data behind a given number in a pivot table (double-click any
number in the Values area to get a new worksheet)
Formatting a pivot table with various styles (check the style options on the
Design ribbon)
Moving or renaming pivot tables (check the PivotTable and Action groups on
the Analyze/Options ribbon)
Refreshing pivot tables as the underlying data changes (check the Refresh
dropdown list on the Analyze/Options ribbon)
Creating pivot table formulas for calculated fields or calculated items (check
the Formulas dropdown list on the Analyze/Options ribbon)
Example 3.5:
Lasagna [Link] (slide 1 of 2)
Example 3.5:
Lasagna [Link] (slide 2 of 2)
Pivot Table and Pivot Chart for Examining the Effect of Gender
Slicers and Timelines
In Excel 2010, Microsoft added slicers—lists of
the distinct values of any variable, which you can
then filter on.
You add a slicer from the Analyze/Options ribbon
under PivotTable Tools.
In Excel 2013, a Timeline feature was added. A
Timeline is like a slicer, but it is specifically for
filtering on a date variable.
Pivot Table with Slicers and a Timeline