Pivot Table

Uploaded by

Cherry


Box plot

Pivot table
Box Plot
A box and whisker plot (box plot) is a method for summarizing a set of
data measured on an interval scale. It displays the minimum, first
quartile, median, third quartile, and maximum of the data.

Parts of Box Plots

Minimum: The minimum value in the given dataset.
First Quartile (Q1): The first quartile is the median of the lower
half of the dataset.
Median: The median is the middle value of the dataset, which
divides the given dataset into two equal parts. The median is
also the second quartile (Q2).
Third Quartile (Q3): The third quartile is the median of the
upper half of the dataset.
Maximum: The maximum value in the given dataset.
Interquartile Range (IQR): The difference between the third
quartile and the first quartile, (i.e.) IQR = Q3 - Q1.

Outlier: A data value that lies far to the left or right of the rest
of the ordered data. Outliers are generally defined as values that fall
more than a specified distance below the first quartile or above the
third quartile.

(i.e.) Outliers are greater than Q3 + (1.5 × IQR) or less than Q1 - (1.5 × IQR).
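As a minimal sketch of the 1.5 × IQR rule, the R snippet below computes the two fences and returns any values outside them. The helper name `iqr_outliers` is hypothetical and not part of the original slides. The scores and the quartiles Q1 = 70 and Q3 = 88 are those of the class example in this document, and the extra score of 20 is invented purely to show a flagged value.

```r
# Hypothetical helper (not from the slides): return the values of x that lie
# outside the fences Q1 - k * IQR and Q3 + k * IQR.
iqr_outliers <- function(x, q1, q3, k = 1.5) {
  iqr <- q3 - q1
  lower_fence <- q1 - k * iqr
  upper_fence <- q3 + k * iqr
  x[x < lower_fence | x > upper_fence]
}

# Test scores from the class example in this document.
scores <- c(91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73)

# Fences are 70 - 27 = 43 and 88 + 27 = 115, so no class score is flagged.
class_outliers <- iqr_outliers(scores, q1 = 70, q3 = 88)

# Assumption for illustration only: append an invented score of 20,
# which lies below the lower fence of 43.
with_low_score <- iqr_outliers(c(scores, 20), q1 = 70, q3 = 88)
```

Passing the quartiles in explicitly keeps the sketch independent of which quartile convention is used to obtain them.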
Suppose you have the math test results for a class of 15
students. Here are the results:
91 95 54 69 80 85 88 73 71 70 66 90 86 84 73
Step 1: Order the data points from least to greatest.
54 66 69 70 71 73 73 80 84 85 86 88 90 91 95
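Step 2: Find the median of the data. With 15 ordered values, the median
is the 8th value: 80.

Step 3: Find the middle points of the two halves divided by the median (find the
upper and lower quartiles). The lower half (54 to 73) has the median Q1 = 70,
and the upper half (84 to 95) has the median Q3 = 88.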
Step 4: Find the extreme values.

This is the easiest part. You need to find the largest and
smallest data values.

Extreme values = 54 and 95.

So, we can determine that the five-number summary for the
class of students is 54, 70, 80, 88, 95.
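The steps above can be sketched in R as follows. This is one possible implementation of the "median of each half" approach, which leaves the median itself out of both halves when the number of values is odd. The variable names are illustrative assumptions.

```r
# Test scores from the class example.
scores <- c(91, 95, 54, 69, 80, 85, 88, 73, 71, 70, 66, 90, 86, 84, 73)

# Step 1: order the data from least to greatest.
sorted <- sort(scores)
n <- length(sorted)

# Steps 2 and 3: split around the median; for odd n the middle value
# belongs to neither half.
half <- floor(n / 2)
lower_half <- sorted[seq_len(half)]
upper_half <- sorted[(n - half + 1):n]

# Step 4 plus the quartiles: assemble the five-number summary.
five_num <- c(
  minimum = min(sorted),
  q1      = median(lower_half),
  median  = median(sorted),
  q3      = median(upper_half),
  maximum = max(sorted)
)
print(five_num)
```

R's built-in `fivenum()` and `quantile()` follow other quartile conventions, so on these 15 scores they may report slightly different values for Q1 and Q3.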

Now we are ready to draw our box and whisker plot.
The plot is divided into four groups: a lower
whisker, a lower box half, an upper box half, and an upper
whisker. Each of those groups holds roughly 25% of the data, because
the quartiles split the ordered data into four parts of about equal size.
Interpreting the box and whisker plot results:

✔ The box and whisker plot shows that about 50% of the students
have scores between 70 and 88 points.

✔ In addition, about 75% scored lower than 88 points, and about 50% have
test results above 80. So, if your test result falls somewhere
in the lower whisker, you may need to study more.
Comparative double box and whisker plot
Suppose an IT company has two stores that sell computers. The
company recorded the number of sales each store made each
month. In the past 12 months, we have the following numbers of
sold computers:
Store 1:
350, 460, 20, 160, 580, 250, 210, 120, 200, 510, 290, 380.
Store 2:
520, 180, 260, 380, 80, 500, 630, 420, 210, 70, 440, 140.
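A possible way to compare the two stores in R is sketched below. It uses the monthly sales listed above. The list names, chart title, and axis label are illustrative assumptions. `fivenum()` gives each store's five-number summary and `boxplot()` draws the two boxes side by side.

```r
# Monthly computer sales for the past 12 months (from the text above).
store1 <- c(350, 460, 20, 160, 580, 250, 210, 120, 200, 510, 290, 380)
store2 <- c(520, 180, 260, 380, 80, 500, 630, 420, 210, 70, 440, 140)

# Named list: each element becomes one box, labelled by its name.
sales <- list("Store 1" = store1, "Store 2" = store2)

# Five-number summary (min, Q1, median, Q3, max) per store, one column each.
summaries <- sapply(sales, fivenum)
print(summaries)

# Draw the comparative (double) box and whisker plot.
boxplot(sales,
        main = "Monthly computer sales by store",
        ylab = "Computers sold",
        col  = "steelblue")
```

With 12 values per store, `fivenum()` agrees with the "median of each half" method used earlier in this document.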
Syntax: boxplot()

x: A vector or a formula describing the data to plot.

data: The data frame that contains the variables (used with a formula).
main: The title of the chart.
names: The group labels shown under each box plot.
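The parameters above might be combined as in the following sketch, which uses R's built-in mtcars dataset. The formula `mpg ~ cyl`, the title, and the group labels are illustrative choices and do not come from the original slides.

```r
# mtcars ships with R; cyl takes the values 4, 6 and 8.
data(mtcars)

# x is given as a formula (mpg split by cyl), data names the data frame,
# main sets the title, and names supplies one label per box.
res <- boxplot(mpg ~ cyl,
               data  = mtcars,
               main  = "Miles per gallon by number of cylinders",
               names = c("4 cyl", "6 cyl", "8 cyl"),
               ylab  = "mpg")

# boxplot() invisibly returns the statistics it drew:
# res$stats has one column per group, res$n the group sizes.
print(res$n)
```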

✔ The mtcars dataset is a built-in dataset in R that contains
measurements on 11 different attributes for 32 different cars.
✔ Load the mtcars dataset:

data(mtcars)
Summarize the mtcars Dataset

We can use the summary() function to quickly summarize each
variable in the dataset:

# summary statistics for every variable
summary(mtcars)
# number of rows and columns
dim(mtcars)
# column names
names(mtcars)

# histogram of the mpg values
hist(mtcars$mpg,
     col='steelblue',
     main='Histogram',
     xlab='mpg',
     ylab='Frequency')

# box plot of the mpg values
boxplot(mtcars$mpg,
        main='Distribution of mpg values',
        ylab='mpg',
        col='steelblue',
        border='black')

# scatterplot of mpg (x-axis) against wt (y-axis)
plot(mtcars$mpg, mtcars$wt,
     col='steelblue',
     main='Scatterplot',
     xlab='mpg',
     ylab='wt',
     pch=19)
Pivot table

✔ The pivot table is one of Microsoft Excel's most powerful features.
It lets us extract meaningful information from a large and detailed
dataset.
✔ A pivot table usually shows a summary statistic for the dataset,
computed by grouping the rows that share a value in some column. To do
this in the R programming language, we use the group_by() and
summarize() functions of the dplyr package.

✔ The dplyr package in the R programming language is a
grammar of data manipulation that provides a uniform set of
verbs that help us preprocess large data.
✔ The group_by() function groups the data by one or more
variables, and the summarize() function then creates a summary
of the data for each group using the aggregate function passed to it.

Syntax:
df %>% group_by(grouping_variables) %>% summarize(label = aggregate_fun())

Parameters:
df: the data frame in use.
grouping_variables: the variable(s) used to group the data.
aggregate_fun(): the function used for the summary, for
example sum, mean, etc.
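As a hedged illustration of this syntax with more than one grouping variable, the sketch below summarizes R's built-in mtcars dataset. The choice of `cyl` and `am` as grouping variables and the column names `n_cars` and `mean_mpg` are assumptions made for this example. It requires the dplyr package to be installed.

```r
library(dplyr)

# Group by two variables (cylinders and transmission type), then compute
# two summaries per group: the row count and the mean of mpg.
pivot <- mtcars %>%
  group_by(cyl, am) %>%
  summarize(n_cars   = n(),
            mean_mpg = mean(mpg),
            .groups  = "drop")   # return an ungrouped result

print(pivot)
```

Each row of the result corresponds to one combination of the grouping variables that occurs in the data.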

# create sample data frame
sample_data <- data.frame(label=c('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
                          value=c(222, 18, 51, 52, 44, 19, 100, 98, 34))

# load library dplyr
library(dplyr)

# create pivot table with sum of value as summary
sample_data %>% group_by(label) %>%
  summarize(sum_values = sum(value))

Output:
  label sum_values
1 x            374
2 y            160
3 z            104

# create sample data frame
sample_data <- data.frame(label=c('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
                          value=c(222, 18, 51, 52, 44, 19, 100, 98, 34))

# load library dplyr
library(dplyr)

# create pivot table with mean of value as summary
sample_data %>% group_by(label) %>%
  summarize(average_values = mean(value))

Output:
  label average_values
1 x              125.
2 y               53.3
3 z               34.7
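The two pivot tables above could also be produced in a single call, since summarize() accepts several named summaries at once. The following is a sketch using the same sample data. The result name `pivot` is an assumption, and dplyr must be installed.

```r
library(dplyr)

# Same sample data as in the examples above.
sample_data <- data.frame(
  label = c('x', 'y', 'z', 'x', 'y', 'z', 'x', 'y', 'z'),
  value = c(222, 18, 51, 52, 44, 19, 100, 98, 34)
)

# One row per label, with the sum and the mean of value side by side.
pivot <- sample_data %>%
  group_by(label) %>%
  summarize(sum_values     = sum(value),
            average_values = mean(value))

print(pivot)
```

The printed means are rounded for display; the stored values keep full precision (for example, 374 / 3 for label x).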
