Histogram
•A histogram is a plot that lets you discover, and show, the underlying frequency
distribution (shape) of a set of continuous data.
•This allows the inspection of the data for its underlying distribution (e.g., normal
distribution), outliers, skewness, etc.
•With bar charts, each column represents a group defined by a categorical variable; and
with histograms, each column represents a group defined by a continuous, quantitative
variable.
Histogram
Wages in Rs.   No. of Workers (f)
0-10           22
10-20          38
20-30          46
30-40          35
40-50          20
[Figure: histogram of No. of Workers (vertical axis, 0 to 50) against Wages in Rs (horizontal axis, classes 0-10 to 40-50)]
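As a minimal sketch, the wage table can be turned into a rough text histogram in Python. The variable names and the text rendering are illustrative assumptions, not part of the original slides; each bar's length is proportional to the class frequency.

```python
# Grouped frequency table from the slide: (lower, upper) wage class -> workers.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freq = [22, 38, 46, 35, 20]

total = sum(freq)
# The modal class is the class with the highest frequency (the tallest bar).
modal_class = classes[freq.index(max(freq))]

# One '#' per two workers keeps the bars short enough for a terminal.
for (lo, hi), f in zip(classes, freq):
    print(f"{lo:>2}-{hi:<2} | {'#' * (f // 2)} {f}")

print("Total workers:", total)
print("Modal class:", modal_class)
```

The bars rise to the 20-30 class and fall away on both sides, which is the kind of shape information a histogram is meant to reveal.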
Box Plot
1. A box plot is a method for graphically depicting groups of numerical data through
their quartiles.
2. Box plots may also have lines extending from the boxes (whiskers) indicating variability
outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-
and-whisker diagram.
3. Outliers may be plotted as individual points.
Types:
• Standard Box Plot
• Variable width box plot
• Notched box plot
Standard box plot
Displays :
1. Quartiles Q1,Q3
2. Median , M
3. Max and Min
4. Outliers
[Figure: standard box plot labelled with Outliers (x), Maximum, Q3, Median, Q1, Minimum and the Whiskers]
Important note
• Data sets can sometimes contain outliers that are suspected to be anomalies (perhaps
because of data collection errors).
• If outliers are present, the whisker on the appropriate side is drawn to
$Q_1 - 1.5 \times \mathrm{IQR}$
or
$Q_3 + 1.5 \times \mathrm{IQR}$
(where $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range) rather than to the data minimum or the data maximum.
• Small circles or unfilled dots are drawn on the chart to indicate where suspected outliers
lie. Filled circles are used for known outliers.
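The whisker limits described in this note can be sketched in Python as follows. The function name `tukey_fences` and the sample data are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption: other conventions exist and give slightly different limits.

```python
import statistics

def tukey_fences(values):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for a list of numbers."""
    # method="inclusive" interpolates between order statistics; this choice
    # of quartile convention is an assumption, not something the slides fix.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data, invented for illustration only.
data = [12, 14, 15, 15, 16, 17, 18, 19, 40]
low, high = tukey_fences(data)
suspected = [x for x in data if x < low or x > high]
print("limits:", low, high)
print("suspected outliers:", suspected)
```

Any value outside the two limits would be drawn as an individual point rather than being covered by a whisker.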
Variable width box plot
Displays :
1. Quartiles Q1,Q3
2. Median , M
3. Max and Min
4. Outliers
5. Sample size
[Figure: two box plots of different widths, labelled n=100 and n=50]
Notched box plot
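[Figure: notched box plot with the Median and the Notch labelled]
For 95% confidence interval
$\text{Notch} = \pm\, 1.57 \times \mathrm{IQR} / \sqrt{n}$ (measured from the median)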
Notched box plot
[Figure, left: overlap of notches, indicating no significant change]
[Figure, right: no overlap of notches, indicating a significant change]
Case study
Ref: www.itl.nist.gov
1. Machine 2 has the smallest median diameter and machine 1 has the largest median diameter.
2. Machines 1 and 2 have comparable variability, while machine 3 has somewhat larger variability.
Case study
Ref: www.itl.nist.gov
1. Neither the location nor the spread seems to differ significantly by day.
Case study
Ref: www.itl.nist.gov
1. Neither the location nor the spread seems to differ significantly by time of day.
Examples
Construct a box-and-whisker plot of the concentration of suspended
solid material from a lake and state your conclusions.
42.4 65.7 29.8 58.7 52.1 55.8
57 68.7 67.3 67.3 54.3 54
73.1 81.3 59.9 56.9 62.2 69.9
66.9 59 56.3 43.3 57.4 45.3
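Lower whisker limit (Q1 - 1.5 × IQR)   35.0625 (data minimum 29.8)
Q1                                     54.225
Median                                 58.05
Q3                                     67
Upper whisker limit (Q3 + 1.5 × IQR)   86.1625 (data maximum 81.3)
The data minimum 29.8 lies below the lower limit of 35.0625, so it is plotted as a suspected outlier.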
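The five values in this example can be checked with a short Python sketch. The use of `statistics.quantiles` with `method="inclusive"` is an assumption, chosen because it reproduces the quartiles quoted in the example; the variable names are illustrative.

```python
import statistics

# Suspended-solid concentrations from the example (24 readings).
conc = [42.4, 65.7, 29.8, 58.7, 52.1, 55.8,
        57.0, 68.7, 67.3, 67.3, 54.3, 54.0,
        73.1, 81.3, 59.9, 56.9, 62.2, 69.9,
        66.9, 59.0, 56.3, 43.3, 57.4, 45.3]

# Quartiles by linear interpolation between order statistics.
q1, med, q3 = statistics.quantiles(conc, n=4, method="inclusive")
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = sorted(x for x in conc if x < lower_limit or x > upper_limit)

print(f"Q1={q1:.3f}  median={med:.2f}  Q3={q3:.1f}  IQR={iqr:.3f}")
print(f"whisker limits: {lower_limit:.4f} to {upper_limit:.4f}")
print("suspected outliers:", outliers)
```

With these data only the reading 29.8 falls outside the limits, while the maximum 81.3 stays inside the upper limit.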
Run Chart
• A run chart, also known as a run-sequence plot, is a graph that displays observed data in a
time sequence.
• Often, the data displayed represent some aspect of the output or performance of a
manufacturing or other business process. It is therefore a form of line chart.
•By collecting and charting data over time, you can find trends or patterns in the process.
Because they do not use control limits, run charts cannot tell you if a process is stable.
[Figure: run chart of No. of complaints (vertical axis) against Days (horizontal axis)]
Run chart rules for interpretation
1. Rule One – A Shift
•A shift on a run chart is six or more consecutive points either all above or all below the
median.
• Skip values that fall on the median and continue counting.
•The change is likely to be attributable to something and not the result of random
variation within a process.
2. Rule Two – Trend:
• A trend on a run chart is five or more consecutive points all going up or all going
down.
• If the value of two or more successive points is the same, ignore one of the points
when counting.
• Like values do not make or break a trend.
[Figures: run charts illustrating a shift and a trend]
Run chart rules for interpretation
3. Rule Three – Runs:
• A run is a series of points in a row on one side of the median.
• A non-random pattern or signal of change is indicated by too few or too many runs
• To determine the number of runs above and below the median, count the number of
times the data line crosses the median and add one
4. Rule Four – Astronomical Point:
• This rule aids in detecting unusually large or small numbers.
• They are characterised by data points that are different from all or most of the
other values
[Figures: run charts illustrating too few runs and an astronomical point]
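Rules One to Three lend themselves to a small Python sketch. The function names and the test series below are hypothetical, and Rule Four is left out because spotting an astronomical point is a judgement call rather than a count.

```python
import statistics

def sides_of_median(values):
    """+1 for points above the median, -1 below; points on the median are skipped."""
    med = statistics.median(values)
    return [1 if v > med else -1 for v in values if v != med]

def longest_run(values):
    """Longest run on one side of the median (Rule One signals at 6 or more)."""
    best = current = 0
    previous = None
    for side in sides_of_median(values):
        current = current + 1 if side == previous else 1
        best = max(best, current)
        previous = side
    return best

def count_runs(values):
    """Number of runs = number of median crossings + 1 (Rule Three)."""
    sides = sides_of_median(values)
    crossings = sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return crossings + 1 if sides else 0

def longest_trend(values):
    """Most consecutive points all rising or all falling (Rule Two signals at 5 or more).

    Repeated successive values are dropped first, so they neither make nor break a trend.
    """
    deduped = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    best = current = 1 if deduped else 0
    direction = 0
    for a, b in zip(deduped, deduped[1:]):
        step = 1 if b > a else -1
        current = current + 1 if step == direction else 2
        best = max(best, current)
        direction = step
    return best

# Hypothetical series, invented for illustration only.
series = [5, 6, 7, 7, 8, 9, 10, 4, 3, 6, 2, 8]
print("longest run:", longest_run(series))
print("number of runs:", count_runs(series))
print("longest trend:", longest_trend(series))
```

Deciding whether the number of runs is too few or too many needs a table of critical values for the number of points, which is not reproduced here.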
Example
An automobile manufacturer produces engine components. The components are ground, and
their out-of-roundness is required to be less than 5 microns. Sample 1, of 19
components, was taken and it was observed that some components did not meet this requirement.
The machine was handed over to the maintenance department. Sample 2 was taken afterwards. The
out-of-roundness values are recorded below.
Draw a notched box plot for a 95% confidence interval (C = 1.96) and offer your comments.
Component number             1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Out of roundness, Sample 1   4 5 7 6 8 7 7 9 4 6 10 9 9 9 4 3 9 8 3
Out of roundness, Sample 2   4 2 2 4 4 3 3 4 1 3 3 2 3 2 2 2 5 5 2
Example
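Sorted data:
Sample 1: 3 3 4 4 4 5 6 6 7 7 7 8 8 9 9 9 9 9 10
Sample 2: 1 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5

       Sample 1   Sample 2
Q1     4          2
Md     7          3
Q3     9          4
IQR    5          2

For 95% confidence interval
$\text{Notch} = \pm\, (1.57 \times \mathrm{IQR}) / \sqrt{n}$
In general,
$S_m = \dfrac{1.25 \times \mathrm{IQR}}{1.35 \times \sqrt{n}}$
$\text{Notch} = \pm\, S_m \times C$
Note: the factor 1.57 corresponds to C ≈ 1.7 (1.25 × 1.7 / 1.35 ≈ 1.57); with C = 1.96 the factor would be about 1.81.
Width of notch:
Sample 1 = $\pm\, (1.57 \times 5)/\sqrt{19} = \pm\, 1.80$
Sample 2 = $\pm\, (1.57 \times 2)/\sqrt{19} = \pm\, 0.72$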
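The notch widths can be reproduced, and the two notches compared, with a short Python sketch. The helper `notch_interval` is a hypothetical name, and `method="exclusive"` is an assumption, chosen because it gives the quartiles quoted in this example (Q1 = 4, Q3 = 9 for Sample 1).

```python
import math
import statistics

sample_1 = [4, 5, 7, 6, 8, 7, 7, 9, 4, 6, 10, 9, 9, 9, 4, 3, 9, 8, 3]
sample_2 = [4, 2, 2, 4, 4, 3, 3, 4, 1, 3, 3, 2, 3, 2, 2, 2, 5, 5, 2]

def notch_interval(values):
    """Return (median - notch, median + notch), notch = 1.57 * IQR / sqrt(n)."""
    # Assumed quartile convention: "exclusive" matches the example's Q1 and Q3.
    q1, med, q3 = statistics.quantiles(values, n=4, method="exclusive")
    notch = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return med - notch, med + notch

lo1, hi1 = notch_interval(sample_1)
lo2, hi2 = notch_interval(sample_2)
# Two intervals overlap when each one starts before the other ends.
notches_overlap = lo1 <= hi2 and lo2 <= hi1

print(f"Sample 1 notch: {lo1:.2f} to {hi1:.2f}")
print(f"Sample 2 notch: {lo2:.2f} to {hi2:.2f}")
print("notches overlap:", notches_overlap)
```

The two notches do not overlap, which by the notched box plot rule indicates a significant change in the median out-of-roundness after maintenance; the Sample 2 median of 3 microns is also below the 5 micron requirement.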
Example
Construct a run chart for the following data showing a process parameter. Comment on
whether the process shows common-cause or special-cause variation. Has there been any
significant trend? Offer your comments.
Obs.  Value    Obs.  Value
1     0.2      13    0.37
2     0.36     14    0.24
3     0.32     15    0.42
4     0.38     16    0.26
5     0.23     17    0.42
6     0.37     18    0.28
7     0.38     19    0.68
8     0.22     20    0.4
9     0.24     21    0.21
10    0.26     22    0.39
11    0.27     23    0.3
12    0.3
Example
[Figure: run chart of the process parameter (vertical axis, 0 to 0.8) against observation number (horizontal axis, 1 to 23)]
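As a hedged sketch, the quantities needed for the run chart rules can be tallied in Python for these 23 observations. The variable names are illustrative, and "farthest from the median" is only an assumed stand-in for the visual judgement behind an astronomical point.

```python
import statistics
from itertools import groupby

values = [0.20, 0.36, 0.32, 0.38, 0.23, 0.37, 0.38, 0.22, 0.24, 0.26, 0.27, 0.30,
          0.37, 0.24, 0.42, 0.26, 0.42, 0.28, 0.68, 0.40, 0.21, 0.39, 0.30]

median = statistics.median(values)

# Side of the median for each point; points on the median are skipped.
sides = ["above" if v > median else "below" for v in values if v != median]
run_lengths = [len(list(group)) for _, group in groupby(sides)]
n_runs = len(run_lengths)
longest_run = max(run_lengths)

# Direction of each step between successive points (no equal neighbours in this data).
steps = ["up" if b > a else "down" for a, b in zip(values, values[1:])]
# k consecutive steps in one direction span k + 1 points.
longest_trend = max(len(list(group)) for _, group in groupby(steps)) + 1

# Observation number (1-based) farthest from the median.
farthest = max(range(len(values)), key=lambda i: abs(values[i] - median)) + 1

print("median:", median)
print("runs:", n_runs, " longest run:", longest_run)
print("longest trend (points):", longest_trend)
print("farthest observation:", farthest, values[farthest - 1])
```

Read against the rules stated earlier: the longest run about the median is 4 points, so there is no shift; observations 8 to 13 rise for six consecutive points, which meets the five-point trend rule; and observation 19 (0.68) stands well apart from the rest, making it a candidate astronomical point.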
Stem and Leaf plot
A Stem and Leaf Plot is a special table where each data value is split into a "stem" (the first
digit or digits) and a "leaf" (usually the last digit).
Data:
120 103 112 126
102 142 119 119
126 145 155 132
152 119 127 133
101 113 105 158
121 144 120 129
101 136 137 143
138 116 125 122

Stem | Leaf
100 | 1 1 2 3 5
110 | 2 3 6 9 9 9
120 | 0 0 1 2 5 6 6 7 9
130 | 2 3 6 7 8
140 | 2 3 4 5
150 | 2 5 8
Key: 100 | 2 = 102
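A stem-and-leaf plot of this kind can be built with a few lines of Python. This is a minimal sketch for non-negative whole numbers; the function name `stem_and_leaf` is hypothetical and the data are the 32 values as read from the slide.

```python
from collections import defaultdict

data = [120, 103, 112, 126, 102, 142, 119, 119,
        126, 145, 155, 132, 152, 119, 127, 133,
        101, 113, 105, 158, 121, 144, 120, 129,
        101, 136, 137, 143, 138, 116, 125, 122]

def stem_and_leaf(values):
    """Map each stem (value with its last digit zeroed) to its sorted leaves (last digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v - v % 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(data)
for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
print("Key: 100 | 2 = 102")
```

Sorting the values first means the leaves on each row come out in ascending order, which makes the shape of the distribution easier to read.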
Draw a stem and leaf plot for the following data.
0.28 0.24 0 -0.34
-0.15 -0.19 -0.27 0.2
-0.25 -0.13 -0.19 0.49
0.19 0.24 -0.31 0.12
-0.26 0.18 0 0.29
-0.2 0.4 0.25 -0.32
-0.33 -0.15 -0.21 0.13
0.16 -0.14 0.19 0.44
0.48 -0.16 0.18 0.29
Example 2
Stem | Leaf
-0.3 | 1 2 3 4
-0.2 | 0 1 5 6 7
-0.1 | 3 4 5 5 6 9 9
 0   | 0 0
 0.1 | 2 3 6 8 8 9 9
 0.2 | 0 4 4 5 8 9 9
 0.3 |
 0.4 | 0 4 8 9
Key: 0.1 | 2 = 0.12
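Signed decimal data need a little more care, because -0.1x and 0.1x must land on different stems. The following Python sketch tallies the leaves directly from the 36 data values; the function names are hypothetical, and giving exact zeros their own stem is an assumption that mirrors the layout of this example. Empty stems (such as 0.3) are simply not printed.

```python
from collections import defaultdict

data = [0.28, 0.24, 0, -0.34, -0.15, -0.19, -0.27, 0.2,
        -0.25, -0.13, -0.19, 0.49, 0.19, 0.24, -0.31, 0.12,
        -0.26, 0.18, 0, 0.29, -0.2, 0.4, 0.25, -0.32,
        -0.33, -0.15, -0.21, 0.13, 0.16, -0.14, 0.19, 0.44,
        0.48, -0.16, 0.18, 0.29]

def signed_stem_and_leaf(values):
    """Stems are signed tenths, leaves are the hundredths digit; zero gets its own stem."""
    plot = defaultdict(list)
    for v in values:
        h = round(v * 100)               # whole hundredths, avoids float noise
        sign = (h > 0) - (h < 0)         # -1, 0 or +1
        tenths, leaf = divmod(abs(h), 10)
        plot[(sign, sign * tenths)].append(leaf)
    # Sorting the keys orders the stems from most negative to most positive.
    return {key: sorted(leaves) for key, leaves in sorted(plot.items())}

def label(key):
    """Text label for a stem key, e.g. (-1, -2) -> '-0.2'."""
    sign, tenths = key
    if sign == 0:
        return "0"
    return f"{'-' if sign < 0 else ''}0.{abs(tenths)}"

result = signed_stem_and_leaf(data)
for key, leaves in result.items():
    print(f"{label(key):>4} | {' '.join(str(leaf) for leaf in leaves)}")
```

Working in whole hundredths keeps the stem and leaf arithmetic exact, which matters because values such as 0.29 are not represented exactly in floating point.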
Example 3
Normal Probability Plot
•The normal probability plot is a graphical technique for assessing whether or not a data set
is approximately normally distributed.
•The data are plotted against a theoretical normal distribution in such a way that the points
should form an approximate straight line.
• There are two ways to make the assessment:
1. On Normal Distribution Probability Paper
2. On regular graph paper
Normal Distribution Probability Paper
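[Figure: data plotted on normal probability paper; source: mathisfun.com]
Normality: the plotted points should lie close to a straight line.
Mean: the x-value read off the line at cumulative probability 0.5.
SD: the x-value at cumulative probability 0.84 minus the x-value at 0.5.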
Test for Normality and estimate the parameters from sample given below
176 192
191 201
214 190
220 183
205 185
j    xj (X axis)   f(t) = (j-0.5)/n (Y axis)
1    176           0.05
2    183           0.15
3    185           0.25
4    190           0.35
5    191           0.45
6    192           0.55
7    201           0.65
8    205           0.75
9    214           0.85
10   220           0.95
[Figure: the points plotted on normal probability paper, horizontal axis from 170 to 220]
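The plotting positions in this table can be generated with a short Python sketch. The sample mean and standard deviation printed at the end are an added numerical check, not part of the graphical method; they give values to compare with the estimates read off the probability paper.

```python
import statistics

sample = [176, 191, 214, 220, 205, 192, 201, 190, 183, 185]

ordered = sorted(sample)
n = len(ordered)
# Plotting position for the j-th smallest value (j starts at 1).
positions = [(j - 0.5) / n for j in range(1, n + 1)]

for j, (x, f) in enumerate(zip(ordered, positions), start=1):
    print(f"{j:>2}  {x:>4}  {f:.2f}")

# Numerical estimates for comparison with the graphical readings.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)
print("sample mean:", sample_mean)
print("sample SD:  ", round(sample_sd, 2))
```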
NPP on regular graph paper
Procedure:
•Arrange your x-values in ascending order.
•Calculate
fi = (i-0.375)/(n+0.25)
where i is the position of the data value in the ordered list and n is the
number of observations.
•Find the z-score for each fi
•Plot your x-values on the horizontal axis and the corresponding z-score
on the vertical axis.
Test for Normality and estimate the parameters from sample given below using
regular graph paper
176 192
191 201
214 190
220 183
205 185
i    xi (X axis)   fi = (i-0.375)/(n+0.25)   Z value
1    176           0.060                     -1.55
2    183           0.158                     -1.0
3    185           0.256                     -0.65
4    190           0.353                     -0.38
5    191           0.451                     -0.12
6    192           0.548                     0.12
7    201           0.646                     0.38
8    205           0.743                     0.65
9    214           0.841                     1.0
10   220           0.939                     1.55
[Figure: z value (vertical axis) plotted against the random variable (horizontal axis)]
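The procedure for regular graph paper can be sketched in Python using the standard library's `NormalDist` to look up the z-scores. The least-squares line fitted at the end is an assumption standing in for the straight line that would be drawn by eye: writing it as x = mu + sigma × z, the intercept estimates the mean and the slope estimates the standard deviation.

```python
from statistics import NormalDist

sample = [176, 191, 214, 220, 205, 192, 201, 190, 183, 185]

ordered = sorted(sample)
n = len(ordered)
std_normal = NormalDist()  # mean 0, standard deviation 1

rows = []
for i, x in enumerate(ordered, start=1):
    f = (i - 0.375) / (n + 0.25)
    z = std_normal.inv_cdf(f)  # z-score whose cumulative probability is f
    rows.append((i, x, f, z))
    print(f"{i:>2}  {x:>4}  {f:.3f}  {z:+.2f}")

# Assumed fitting method: ordinary least squares of x on z.
zs = [row[3] for row in rows]
mean_x = sum(ordered) / n
mean_z = sum(zs) / n
sigma = (sum((z - mean_z) * (x - mean_x) for x, z in zip(ordered, zs))
         / sum((z - mean_z) ** 2 for z in zs))
mu = mean_x - sigma * mean_z
print(f"estimated mean: {mu:.1f}")
print(f"estimated SD:   {sigma:.1f}")
```

Because the z-scores are symmetric about zero, the fitted intercept coincides with the sample mean; the slope depends on how the line is fitted, so a line drawn by eye may give a slightly different standard deviation.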
Example
A soft drink bottler is studying the internal pressure strength of 1 litre glass bottles. A
random sample of 16 bottles is tested and pressure strengths are obtained. The data
collected are shown below. Plot the data on regular graph paper. Does it seem reasonable to
conclude that pressure strength is normally distributed?
236 218 221 231
212 205 213 214
229 203 198 212
203 210 234 211
i x fi Z value
1 198 0.038 -1.77
2 203 0.100 -1.28
3 203 0.162 -0.99
4 205 0.223 -0.76
5 210 0.285 -0.57
6 211 0.346 -0.40
7 212 0.408 -0.23
8 212 0.469 -0.08
9 213 0.531 0.08
10 214 0.592 0.23
11 218 0.654 0.40
12 221 0.715 0.57
13 229 0.777 0.76
14 231 0.838 0.99
15 234 0.900 1.28
16 236 0.962 1.77
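The question of whether the points lie close to a straight line is answered by eye on the plot. As a rough numerical companion, the Python sketch below recomputes the z-scores and the correlation between the ordered strengths and their z-scores. Using the correlation coefficient here is an assumption, not a method from the slides, and no cut-off value is implied.

```python
import math
from statistics import NormalDist

strength = [236, 218, 221, 231, 212, 205, 213, 214,
            229, 203, 198, 212, 203, 210, 234, 211]

ordered = sorted(strength)
n = len(ordered)
# z-score for each plotting position fi = (i - 0.375) / (n + 0.25).
z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(ordered, z)
for i, (x, zi) in enumerate(zip(ordered, z), start=1):
    print(f"{i:>2}  {x:>4}  {zi:+.2f}")
print(f"correlation between x and z: {r:.3f}")
```

A correlation close to 1 is consistent with the points falling near a straight line, but the plot itself should still be inspected for curvature or isolated points.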