Fds Unit 2 Notes
www.BrainKart.com
UNIT 2
Frequency Distribution and Data: Types, Tables, and Graphs
Raw data:
Raw data is an initial collection of information that has not yet been organized. After the very first step of data collection, you get raw data. For example, a group of five friends were asked their favourite colour. The answers are Blue, Green, Blue, Red, and Red. This collection of information is the raw data.
Discrete data: Discrete data is recorded in whole numbers, like the number of children in a school or the number of tigers in a zoo. It cannot be in decimals or fractions.
Continuous data: Continuous data need not be in whole numbers; it can take decimal values. Examples are the temperature in a city over a week or your percentage of marks in the last exam.
• Pictographs
ps://play.google.com/store/apps/details?id=info.therithal.brainkart.annauniversitynotes&hl=en_IN
• Bar Graphs
The frequency of any value is the number of times that value appears in a data set. In the colour example above, two friends like the colour blue, so its frequency is two. To make sense of raw data we must organize it, and finding the frequency of each data value is how this organisation is done.
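As a small illustration, the frequency of each colour in the five-friends example can be counted with Python's standard `collections.Counter`. This is a minimal sketch; the variable names are our own.

```python
from collections import Counter

# Raw data: favourite colours reported by the five friends
raw_data = ["Blue", "Green", "Blue", "Red", "Red"]

# Counter maps each distinct value to the number of times it appears
frequency = Counter(raw_data)

for colour, count in sorted(frequency.items()):
    print(colour, count)
```

The printed pairs form a frequency table: each distinct value next to its frequency.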
Frequency Distribution
It is often not easy or feasible to find the frequency of each value directly in a very large dataset, so to make sense of the data we build frequency tables and graphs. Let us take the example of the heights of ten students in cm.
Frequency Distribution Table
139, 145, 150, 145, 136, 150, 152, 144, 138, 138
The frequency table helps us make better sense of these data. When the data set is big (say, 100 students), we use tally marks for counting, which makes the task more organized and easier. Below is an example of how we use tally marks.
Height (cm)   Tally   Frequency
130-140       ||||    4
140-150       |||     3
150-160       |||     3
From the above table, you can see that the value 150 is put in the class interval 150-160 and not 140-150. This is the convention we must follow: each class includes its lower limit and excludes its upper limit.
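The grouping above can be sketched in Python. The helper `class_label` is a hypothetical name; it assumes classes of width 10 starting at 130 and applies the lower-limit-included, upper-limit-excluded convention, so 150 lands in 150-160.

```python
from collections import Counter

# Heights (cm) of the ten students
heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]

def class_label(value, start=130, width=10):
    """Return the class interval containing value.

    Each class includes its lower limit and excludes its upper limit,
    so 150 falls in 150-160, not 140-150.
    """
    lower = start + ((value - start) // width) * width
    return f"{lower}-{lower + width}"

table = Counter(class_label(h) for h in heights)

# Print each class with its tally marks and frequency
for interval in sorted(table):
    count = table[interval]
    print(f"{interval}  {'|' * count}  {count}")
```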
• The table gives the number of snacks ordered and the number of days as a tally. Find the frequency of snacks ordered.
Answer: From the frequency table, the number of days on which the number of snacks ordered ranged between
• 2 and 4 is 4 days
• 4 and 6 is 3 days
• 6 and 8 is 9 days
• 8 and 10 is 9 days
• 10 and 12 is 7 days.
So the frequencies for all snacks ordered are 4, 3, 9, 9, 7.
• Next, divide the range by the number of groups you want your data in, and then round up.
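This step can be sketched as a small Python function. The name `class_width` is hypothetical, and the student heights from the frequency table example are reused as sample input.

```python
import math

def class_width(data, num_groups):
    """Divide the range of the data by the number of groups and round up."""
    data_range = max(data) - min(data)
    return math.ceil(data_range / num_groups)

heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]
print(class_width(heights, 3))   # range 16, and 16 / 3 rounds up to 6
```

One caution: if the range divides exactly, the largest value sits on the upper limit of the last class, which the lower-limit-included convention excludes, so one more class (or a slightly larger width) is needed in that case.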
Answer: In overview, a frequency distribution lists all distinct values of a variable together with the number of times each value occurs. That is, it shows how the frequencies are distributed over the values. In practice, frequency distributions are mostly used to summarize categorical variables.
Answer: It has great importance in statistics. A well-structured frequency distribution makes possible a detailed analysis of the structure of the population with respect to given characteristics, so the groups into which the population breaks down can be determined.
Answer: The various components of the frequency distribution are: class interval, types of class interval, class boundaries, midpoint or class mark, width or size of class interval, class frequency,
Descriptive Statistics
A population is the group to be studied, and population data is a collection of all elements in the population.
A sample is a subset of data drawn from the population of interest.
The population mean (µ) is estimated by the sample mean (x̄). The population variance (σ²) is estimated by the sample variance (s²).
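A minimal sketch of these estimates using Python's standard `statistics` module. The sample values are hypothetical. `statistics.variance` computes the sample variance (dividing by n − 1), while `statistics.pvariance` divides by n and is meant for complete population data.

```python
import statistics

# A hypothetical sample drawn from some population
sample = [4, 8, 6, 5, 7]

x_bar = statistics.mean(sample)      # sample mean, used to estimate µ
s2 = statistics.variance(sample)     # sample variance (divides by n - 1), estimates σ²

print(x_bar, s2)
```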
Variables are divided into two major groups: Qualitative And Quantitative.
1. Qualitative variables
• Qualitative variables have values that are categories or labels, such as favourite colour, rather than numeric measurements.
2. Quantitative variables
• Quantitative variables have values that are typically numeric, such as measurements.
o Quantitative variables can be broken down further into two more categories: discrete and continuous variables.
Graphs
Data can be described clearly and concisely with the aid of a well-constructed frequency
distribution.
GRAPHS FOR QUANTITATIVE DATA
Histograms
A bar-type graph for quantitative data. The common boundaries between adjacent bars
emphasize the continuity of the data, as with continuous variables.
A casual glance at the histogram of weights in Figure 2.1 confirms previous conclusions: a dense concentration of weights among the 150s, 160s, and 170s, with a spread in the direction of the heavier weights. Let’s pinpoint some of the more important features of histograms.
■ Equal units along the horizontal axis (the X axis, or abscissa) reflect the various class intervals
of the frequency distribution.
■ Equal units along the vertical axis (the Y axis, or ordinate) reflect increases in frequency. (The
units along the vertical axis do not have to be the same width as those along the horizontal axis.)
■ The intersection of the two axes defines the origin, at which both numerical scales equal 0.
Frequency Polygon
A line graph for quantitative data that also emphasizes the continuity of continuous variables.
An important variation on a histogram is the frequency polygon, or line graph. Frequency polygons may be constructed directly from frequency distributions. However, we will follow the step-by-step transformation of a histogram into a frequency polygon, as described in panels A, B, C, and D of Figure 2.2.
A. This panel shows the histogram for the weight distribution.
B. Place dots at the midpoints of each bar top or, in the absence of bar tops, at midpoints for classes on the horizontal axis, and connect them with straight lines. [To find the midpoint of any class, such as 160–169, simply add the two tabled boundaries (160 + 169 = 329) and divide this sum by 2 (329/2 = 164.5).]
C. Anchor the frequency polygon to the horizontal axis. First, extend the upper tail to the midpoint of the first unoccupied class (250–259) on the upper flank of the histogram. Then extend the lower tail to the midpoint of the first unoccupied class (120–129) on the lower flank of the histogram. Now all of the area under the frequency polygon is enclosed completely.
D. Finally, erase all of the histogram bars, leaving only the frequency polygon.
Frequency polygons are particularly useful when two or more frequency distributions or relative frequency distributions are to be included in the same graph.
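Steps B and C can be sketched in Python: compute each class midpoint as (lower + upper) / 2 and anchor both tails at the midpoints of the first unoccupied classes, which get frequency 0. The function name `polygon_points` and the three weight classes below are hypothetical; the actual Figure 2.2 frequencies are not reproduced here.

```python
def polygon_points(classes):
    """Return (midpoint, frequency) points for a frequency polygon.

    classes: list of (lower, upper, frequency) tuples for at least two
    consecutive classes of equal width, such as (160, 169, 8).
    """
    width = classes[1][0] - classes[0][0]   # distance between lower limits
    first_lower, first_upper, _ = classes[0]
    last_lower, last_upper, _ = classes[-1]

    # Anchor both tails at the first unoccupied class on each flank
    padded = (
        [(first_lower - width, first_upper - width, 0)]
        + classes
        + [(last_lower + width, last_upper + width, 0)]
    )
    return [((lo + hi) / 2, f) for lo, hi, f in padded]

# Hypothetical frequencies for three weight classes
classes = [(150, 159, 5), (160, 169, 8), (170, 179, 4)]
print(polygon_points(classes))
```

The midpoint for 160–169 comes out as 164.5, matching the hand calculation in step B.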
• Note: For this to work, ALL data points should be rounded to the same
number of decimal places.
EXAMPLE: Best Actress Oscar Winners
When some of the stems hold a large number of leaves, we can split each stem into two: one holding the leaves 0-4, and the other holding the leaves 5-9. A statistical software package will often do the splitting for you, when appropriate. Note that when rotated 90 degrees counter-clockwise, the stemplot visually resembles a histogram:
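A minimal stem-and-leaf sketch in Python, assuming whole-number data where the stem is every digit except the last and the leaf is the last digit. The ages are hypothetical (not the Oscar data), and the sketch does not split stems.

```python
from collections import defaultdict

def stemplot(data):
    """Build a stem-and-leaf display for non-negative whole numbers."""
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)   # stem = tens, leaf = units
    # Include empty stems so gaps in the data remain visible
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

ages = [26, 33, 41, 29, 35, 62, 38, 44, 31, 27]   # hypothetical ages
for line in stemplot(ages):
    print(line)
```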
Typical Shapes
Whether expressed as a histogram, a frequency polygon, or a stem and leaf display, an important
characteristic of a frequency distribution is its shape. Figure 2.3 shows some of the more typical
shapes for smoothed frequency polygons (which ignore the inevitable irregularities of real data).
Normal
Any distribution that approximates the normal shape in panel A of Figure 2.3 can be analyzed, as
we will see in Chapter 5, with the aid of the well-documented normal curve. The familiar bell-
shaped silhouette of the normal curve can be superimposed on many frequency distributions,
including those for uninterrupted gestation periods of human fetuses, scores on standardized
tests, and even the popping times of individual kernels in a batch of popcorn.
Bimodal
Any distribution that approximates the bimodal shape in panel B of Figure 2.3 might, as
suggested previously, reflect the coexistence of two different types of observations in the same
distribution. For instance, the distribution of the ages of residents in a neighborhood consisting
largely of either new parents or their infants has a bimodal shape.
Positively Skewed
The two remaining shapes in Figure 2.3 are lopsided. A lopsided distribution caused by a few extreme observations in the positive direction (to the right of the majority of observations), as in panel C of Figure 2.3, is a positively skewed distribution.
The distribution of incomes among U.S. families has a pronounced positive skew, with most
family incomes under $200,000 and relatively few family incomes spanning a wide range of
values above $200,000. The distribution of weights in Figure 2.1 also is positively skewed.
Negatively Skewed
A lopsided distribution caused by a few extreme observations in the negative direction (to the left of the majority of observations), as in panel D of Figure 2.3, is a negatively skewed distribution. The distribution of ages at retirement among U.S. job holders has a pronounced negative skew, with most retirement ages at 60 years or older and relatively few retirement ages spanning the wide range of ages younger than 60.
A GRAPH FOR QUALITATIVE (NOMINAL) DATA:
The distribution in Table 2.7, based on replies to the question “Do you have a Facebook profile?”
appears as a bar graph in Figure 2.4. A glance at this graph confirms that Yes replies occur
approximately twice as often as No replies. As with histograms, equal segments along the
horizontal axis are allocated to the different words or classes that appear in the frequency
distribution for qualitative data. Likewise, equal segments along the vertical axis reflect
increases in frequency. The body of the bar graph consists of a series of bars whose heights
reflect the frequencies for the various words or classes. A person’s answer to the question “Do
you have a Facebook profile?” is either Yes or No, not some impossible intermediate value, such
as 40 percent Yes and 60 percent No. Gaps are placed between adjacent bars of bar graphs to
emphasize the discontinuous nature of qualitative data. A bar graph also can be used with
quantitative data to emphasize the discontinuous nature of a discrete variable, such as the
number of children in a family.
Misleading Graphs:
Graphs can be constructed in an unscrupulous manner to support a particular point of view.
Indeed, this type of statistical fraud gives credibility to popular sayings, including “Numbers
don’t lie, but statisticians do” and “There are three kinds of lies—lies, damned lies, and
statistics.” For example, to imply that comparatively many students responded Yes to the
Facebook profile question, an unscrupulous person might resort to the various tricks shown in
Figure 2.5:
■ The width of the Yes bar is more than three times that of the No bar, thus violating the custom
that bars be equal in width.
■ The lower end of the frequency scale is omitted, thus violating the custom that the entire scale
be reproduced, beginning with zero. (Otherwise, a broken scale should be highlighted by
crossover lines, as in Figures 2.1 and 2.2.)
■ The height of the vertical axis is several times the width of the horizontal axis, thus violating
the custom, heretofore unmentioned, that the vertical axis be approximately as tall as the
horizontal axis is wide. Beware of graphs in which, because the vertical axis is many times larger
than the horizontal axis (as in Figure 2.5), frequency differences are exaggerated, or in which,
because the vertical axis is many times smaller than the horizontal axis, frequency differences
are suppressed.
AVERAGES
A center of a data set is a way of describing its location. We can measure the center of data in three different ways: the mean (average), the median and the mode.
The two main numerical measures for the center of a distribution are the mean and the
median. Each one of these measures is based on a completely different idea of
describing the center of a distribution. Let us first present each one of the measures,
and then compare their properties.
MEAN
The mean is the average of a set of observations (i.e., the sum of the observations divided by the number of observations). If the n observations are written as $x_1, x_2, \ldots, x_n$, their mean can be written mathematically as:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
We add all of the ages to get 1233 and divide by the number of ages, which was 32, to get 38.5. We denote this result as x̄ (x-bar) and call it the sample mean.
Often we have large sets of data and use a frequency table to display the data more
efficiently. Data were collected from the last three World Cup soccer tournaments. A
total of 192 games were played. The table below lists the number of goals scored per
game (not including any goals scored in shootouts).
Total # Goals/Game   Frequency
0                    17
1                    45
2                    51
3                    37
4                    25
5                    11
6                    3
7                    2
8                    1
To find the mean number of goals scored per game, we would need to find the sum of all 192 numbers, and then divide that sum by 192.
Rather than add 192 numbers, we use the fact that the same numbers appear many times. For example, the number 0 appears 17 times, the number 1 appears 45 times, the number 2 appears 51 times, etc. Multiplying each value by its frequency and adding gives the total number of goals:
Sum = 0(17) + 1(45) + 2(51) + 3(37) + 4(25) + 5(11) + 6(3) + 7(2) + 8(1) = 453, so the mean is 453/192 ≈ 2.36 goals per game.
Note that, in this example, the values of 1, 2, and 3 are the most common and our average falls in this range, representing the bulk of the data.
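The frequency-table shortcut for the mean (multiply each value by its frequency, add, then divide by the total frequency) can be checked with a few lines of Python using the goals table above.

```python
# Goals per game (value) and number of games (frequency) from the table
goals = [0, 1, 2, 3, 4, 5, 6, 7, 8]
games = [17, 45, 51, 37, 25, 11, 3, 2, 1]

total_games = sum(games)                                   # 192
total_goals = sum(g * f for g, f in zip(goals, games))     # 453
mean_goals = total_goals / total_games

print(total_games, total_goals, round(mean_goals, 2))
```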
MEDIAN
If n is odd, the median M is the center observation in the ordered list, located at position (n + 1) / 2.
If n is even, the median M is the mean of the two center observations in the ordered list. These two observations are the ones “sitting” in the (n / 2) and (n / 2) + 1 spots in the ordered list.
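A minimal Python sketch of the median rule for odd and even n. The positions in the rule are counted from 1, so the code subtracts 1 to get Python's 0-based indexes; the data values are arbitrary examples.

```python
def median(values):
    """Median of a list of numbers, following the odd/even rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the single center observation, at position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the observations at positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([5, 1, 3, 9, 7, 2, 8]))     # n = 7
print(median([5, 1, 3, 9, 7, 2, 8, 4]))  # n = 8
```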
EXAMPLE: Median (1)
For a simple visualization of the location of the median, consider the following two
simple cases of n = 7 and n = 8 ordered observations, with each observation
represented by a solid circle:
Comments:
In the images above, the dots are equally spaced; this need not mean that the data values are actually equally spaced, as we are only interested in listing them in order. In fact, in the above pictures, two subsequent dots could have exactly the same value. The median is always found in the same position, regardless of the distance between data values.
Counting from the top, we find that the 16th ranked observation is 35 and the 17th ranked observation also happens to be 35. Therefore, the median M = (35 + 35) / 2 = 35.
The mean and the median, the most common measures of center, each describe the center of a distribution of values in a different way.
The mean describes the center as an average value, in which the actual values of the
data points play an important role.
The median, on the other hand, locates the middle value as the center, and the order of the data is the key.
Data set A → 64 65 66 68 70 71 73
Data set B → 64 65 66 68 70 71 730
For dataset A, the mean is 68.1, and the median is 68.
Looking at dataset B, notice that all of the observations except the last one are
close together. The observation 730 is very large, and is certainly an outlier. In this case,
the median is still 68, but the mean will be influenced by the high outlier, and shifted
up to 162.
The message that we should take from this example is:
The mean is very sensitive to outliers (because it factors in their magnitude), while
the median is resistant (or robust) to outliers.
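The outlier effect in data sets A and B above can be reproduced with Python's `statistics` module: replacing 73 by 730 moves the mean from about 68.1 to 162, while the median stays at 68.

```python
import statistics

data_a = [64, 65, 66, 68, 70, 71, 73]
data_b = [64, 65, 66, 68, 70, 71, 730]   # last value is an outlier

for name, data in [("A", data_a), ("B", data_b)]:
    print(name, round(statistics.mean(data), 1), statistics.median(data))
```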
MODE
The mode of a data set is the number that occurs most frequently in the set.
• If no value appears more than once in the data set, the data set has no mode.
• If two or more values tie for the highest frequency, each of them is a mode.
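A minimal sketch of these mode rules in Python. It builds on `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. Because `multimode` returns all values when none repeats, the hypothetical wrapper `modes` returns an empty list in that case to match the "no mode" rule.

```python
import statistics

def modes(data):
    """Return the list of modes; an empty list means the data set has no mode."""
    if len(set(data)) == len(data):   # no value appears more than once
        return []
    return statistics.multimode(data)

print(modes([2, 4, 4, 7, 9]))       # one mode
print(modes([1, 1, 3, 5, 5, 8]))    # two modes
print(modes([3, 6, 9]))             # no mode
```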
For skewed right distributions and/or datasets with high outliers: the mean is greater than the median.
For skewed left distributions and/or datasets with low outliers: the mean is less than the median.
• Use the sample mean as a measure of center for symmetric distributions with
no outliers. Otherwise, the median will be a more appropriate measure of the
center of our data.
Let’s Summarize
• The two main numerical measures for the center of a distribution are the
mean and the median. The mean is the average value, while the median is the middle
value.
• The mean is very sensitive to outliers (as it factors in their magnitude),
while the median is resistant to outliers.
• The mean is an appropriate measure of center for symmetric distributions
with no outliers. In all other cases, the median is often a better measure of the
center of the distribution.
Describing Variability
Intuitive Approach
In Figure 4.1, each of the three frequency distributions consists of seven scores with the same mean (10) but with different variabilities. (Ignore the numbers in boxes; their significance will be explained later.) Before reading on, rank the three distributions from least to most variable. Your intuition was correct if you concluded that distribution A has the least variability, distribution B has intermediate variability, and distribution C has the most variability. If this conclusion is not obvious, look at each of the three distributions, one at a time, and note any differences among the values of individual scores. For distribution A with the least (zero) variability, all seven scores have the same value (10). For distribution B with intermediate variability, the values of scores vary slightly (one 9 and one 11), and for distribution C with most variability, they vary even more (one 7, two 9s, two 11s, and one 13).
Importance of Variability
Variability assumes a key role in an analysis of research results. For example, a researcher might ask: Does fitness training improve, on average, the scores of depressed patients on a mental-wellness test? To answer this question, depressed patients are randomly assigned to two groups, fitness training is given to one group, and wellness scores are obtained for both groups. Let’s assume that the mean wellness score is larger for the group with fitness training. Is the observed mean difference between the two groups real or merely transitory? This decision depends not only on the size of the mean difference between the two groups but also on the inevitable variabilities of individual scores within each group.
To illustrate the importance of variability, Figure 4.2 shows the outcomes for two fictitious experiments, each with the same mean difference of 2, but with the two groups in experiment B having less variability than the two groups in experiment C. Notice that groups B and C in Figure 4.2 are the same as their counterparts in Figure 4.1. Although the new group B* retains exactly the same (intermediate) variability as group B, each of its seven scores and its mean have been shifted 2 units to the right. Likewise, although the new group C* retains exactly the same (most) variability as group C, each of its seven scores and its mean have been shifted 2 units to the right. Consequently, the crucial mean difference of 2 (from 12 − 10 = 2) is the same for both experiments. Before reading on, decide which mean difference of 2 in Figure 4.2 is more apparent. The mean difference for experiment B should seem more apparent because of the smaller variabilities within both groups B and B*. Just as it’s easier to hear a phone message when static is reduced, it’s easier to see a difference between group means when variabilities within groups are reduced.
Range
A range measures the spread of data within the limits of a data set. It is calculated as the difference between the highest and lowest values in the data set: the larger the range, the greater the spread of the data. The range covered by the data is the most intuitive measure of variability; it is exactly the distance between the smallest data point (min) and the largest one (Max).
Note: When we first looked at the histogram, and tried to get a first feel for the spread
of the data, we were actually approximating the range, rather than calculating the exact
range.
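The range calculation is a one-liner in Python. The helper name `data_range` is our own, and the student heights from the frequency table example serve as sample input.

```python
def data_range(data):
    """Range = largest value (Max) minus smallest value (min)."""
    return max(data) - min(data)

heights = [139, 145, 150, 145, 136, 150, 152, 144, 138, 138]
print(data_range(heights))   # 152 - 136
```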
variable distribution, B, and 3.14 for the most variable distribution, C, in agreement
with our intuitive judgments about the relative variability of these three distributions.
Reconstructing the Variance
To understand the variance better, let’s reconstruct it step by step. Although a measure of variability, the variance also qualifies as a type of mean, that is, as the balance point for some distribution. To qualify as a type of mean, the values of all scores must be added and then divided by the total number of scores. In the case of the variance, each original score is re-expressed as a distance or deviation from the mean by subtracting the mean. For each of the three distributions in Figure 4.1, the face values of the seven original scores (shown as numbers along the X axis) have been re-expressed as deviation scores from their mean of 10 (shown as numbers in the boxes). For example, in distribution C, one score coincides with the mean of 10, four scores (two 9s and two 11s) deviate 1 unit from the mean, and two scores (one 7 and one 13) deviate 3 units from the mean, yielding a set of seven deviation scores: one 0, two –1s, two 1s, one –3, and one 3. (Deviation scores above the mean are assigned positive signs; those below the mean are assigned negative signs.)
Mean of the Deviations Not a Useful Measure
No useful measure of variability can be produced by calculating the mean of these seven deviations, since, as you will recall from Chapter 3, the sum of all deviations from their mean always equals zero. In effect, the sum of all negative deviations always counterbalances the sum of all positive deviations, regardless of the amount of variability in the group.
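The reconstruction can be sketched in Python with the seven scores of each Figure 4.1 distribution as described in the text (A: seven 10s; B: one 9, five 10s, one 11; C: one 7, two 9s, one 10, two 11s, one 13). The deviations always sum to zero, while the mean of the squared deviations grows with variability. This sketch divides by the number of scores, following the "type of mean" description here; the sample formula later in this unit divides by n − 1 instead.

```python
def deviation_scores(scores):
    """Re-express each score as its deviation from the mean."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]

distributions = {
    "A": [10, 10, 10, 10, 10, 10, 10],
    "B": [9, 10, 10, 10, 10, 10, 11],
    "C": [7, 9, 9, 10, 11, 11, 13],
}

for name, scores in distributions.items():
    devs = deviation_scores(scores)
    variance = sum(d ** 2 for d in devs) / len(devs)   # mean of squared deviations
    print(name, sum(devs), round(variance, 2))
```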
Standard deviation is the measure of the overall spread (variability) of a data set’s values from the mean. The more spread out a data set is, the greater are the distances from the mean and the standard deviation.
There are many notations for the standard deviation: SD, s, Sd, StDev. Here, we’ll use SD as an abbreviation for standard deviation, and use s as the symbol. Formula:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Calculation
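In order to get a better understanding of the standard deviation, it would be useful to see an example of how it is calculated.
The following are the number of customers who entered a video store in 8 consecutive hours: 7, 9, 5, 13, 3, 11, 15, 9
1. Find the mean: (7 + 9 + 5 + 13 + 3 + 11 + 15 + 9) / 8 = 72 / 8 = 9.
2. Subtract the mean from each observation: –2, 0, –4, 4, –6, 2, 6, 0.
3. Square each deviation: 4, 0, 16, 16, 36, 4, 36, 0.
4. Add the squared deviations and divide by n – 1 = 7:
(112)/(7) = 16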
• This value, the sum of the squared deviations divided by n – 1, is called the variance. However, the variance is not used as a measure of spread directly, as its units are the square of the units of the original data.
5. The standard deviation of the data is the square root of the variance calculated in the previous step.
In this case, we have the square root of 16, which is 4. We will use the lower-case letter s to represent the standard deviation: s = 4.
• We take the square root to obtain a measure which is in the original units of the data. The units of the variance of 16 are in “squared customers”, which is difficult to interpret.
• The units of the standard deviation are in “customers”, which makes this measure of variation more useful in practice than the variance.
6. The interpretation of the standard deviation is that, roughly speaking, the actual number of customers who enter the store each hour typically differs from the mean of 9 by about 4.
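The video-store calculation can be followed step by step in Python, then cross-checked against `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

customers = [7, 9, 5, 13, 3, 11, 15, 9]

n = len(customers)
mean = sum(customers) / n                                # 9.0
squared_devs = [(x - mean) ** 2 for x in customers]      # sums to 112
variance = sum(squared_devs) / (n - 1)                   # 112 / 7 = 16
s = math.sqrt(variance)                                  # 4.0

print(mean, sum(squared_devs), variance, s)
print(statistics.stdev(customers))   # standard-library check
```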
• The standard deviation is the square root of the variance (both population and sample).
• While the sample variance is an unbiased estimator of the population variance, its units are the square of the original units.
• The standard deviation is a common method for numerically describing the distribution
of a variable. The population standard deviation is σ (sigma) and sample standard
deviation is s.
Example 7
Compute the standard deviation of the sample data: 3, 5, 7 with a sample mean of 5.
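Following the same steps as the video-store example (squared deviations divided by n − 1, then the square root), one worked solution is:

```latex
\begin{aligned}
\sum (x - \bar{x})^2 &= (3-5)^2 + (5-5)^2 + (7-5)^2 = 4 + 0 + 4 = 8 \\
s^2 &= \frac{8}{n-1} = \frac{8}{3-1} = 4 \\
s &= \sqrt{4} = 2
\end{aligned}
```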
DEGREES OF FREEDOM (df)
Degrees of freedom (df) refers to the number of values that are free to vary, given one
or more mathematical restrictions, in a sample being used to estimate a population
characteristic.
The number of values free to vary, given one or more mathematical restrictions.
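One way to see the restriction: once the sample mean is fixed, only n − 1 values can be chosen freely, and the last one is forced. This is why the sample variance divides by n − 1. The helper `last_value` is a hypothetical name used only for this sketch, with the Example 7 sample (3, 5, 7; mean 5) as input.

```python
def last_value(free_values, mean, n):
    """Given n - 1 freely chosen values and a fixed mean, the last value is forced."""
    return mean * n - sum(free_values)

# Sample of n = 3 values with mean 5: pick any two values freely...
print(last_value([3, 5], mean=5, n=3))   # ...and the third must be 7
print(last_value([1, 2], mean=5, n=3))   # ...or, with other choices, 12
```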
IQR = Q3 – Q1
The following picture illustrates this idea: (Think about the horizontal line as the data
ranging from the min to the Max). IMPORTANT NOTE: The “lines” in the following
illustrations are not to scale. The equal distances indicate equal amounts of data NOT
equal distance between the numeric values.
1. Arrange the data in increasing order, and find the median M. Recall that
the median divides the data, so that 50% of the data points are below the
median, and 50% of the data points are above the median.
2. Find the median of the lower 50% of the data. This is called the first quartile of the distribution, and the point is denoted by Q1. Note from the picture that Q1 divides the lower 50% of the data into two halves, containing 25% of the data points in each half. Q1 is called the first quartile, since one quarter of the data points fall below it.
3. Repeat this again for the top 50% of the data. Find the median of the top 50% of the data. This point is called the third quartile of the distribution, and is denoted by Q3. Note from the picture that Q3 divides the top 50% of the data into two halves, with 25% of the data points in each. Q3 is called the third quartile, since three quarters of the data points fall below it.
4. The middle 50% of the data falls between Q1 and Q3, and therefore:
IQR = Q3 – Q1.
Comments:
1. The last picture shows that Q1, M, and Q3 divide the data into four quarters with 25% of the data points in each, where the median is essentially the second quartile. The use of IQR = Q3 – Q1 as a measure of spread is therefore particularly appropriate when the median M is used as a measure of center.
2. We can define a bit more precisely what is considered the bottom or top
50% of the data. The bottom (top) 50% of the data is all the observations whose
position in the ordered list is to the left (right) of the location of the overall
median M. The following picture will visually illustrate this for the simple cases
of n = 7 and n = 8.
Note that when n is odd (as in n = 7 above), the median is not included in either the bottom or top half of the data; when n is even (as in n = 8 above), the data are naturally divided into two halves.
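The quartile procedure described above (the median of each half, leaving the median itself out of both halves when n is odd) can be sketched in Python. The function names are our own and the sketch assumes at least two values. Other tools, such as `statistics.quantiles` or spreadsheet functions, may use different interpolation rules and give slightly different quartiles.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # positions left of the overall median
    upper = ordered[(n + 1) // 2 :]    # positions right of the overall median
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)

q1, m, q3 = quartiles([1, 2, 3, 4, 5, 6, 7])        # n = 7
print(q1, m, q3, q3 - q1)
q1, m, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8])     # n = 8
print(q1, m, q3, q3 - q1)
```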
To find the IQR of the Best Actress Oscar winners’ distribution, it will be convenient to use the stemplot.
Q1 is the median of the bottom half of the data. Since there are 16 observations in
that half, Q1 is the mean of the 8th and 9th ranked observations in that half:
Q1 = (31 + 33) / 2 = 32
Similarly, Q3 is the median of the top half of the data, and since there are 16
observations in that half, Q3 is the mean of the 8th and 9th ranked observations
in that half:
Q3 = (41 + 42) / 2 = 41.5
IQR = 41.5 – 32 = 9.5
Note that in this example, the range covered by all the ages is 59 years, while the range covered by the middle 50% of the ages is only 9.5 years. While the whole dataset is spread over a range of 59 years, the middle 50% of the data is packed into only 9.5 years.
• The blue curve has a larger standard deviation: it is flatter and its tails are
thicker, so the probability is spread over a wider range of values.
• The mean is at the center of the distribution, where the curve reaches its highest point.
• The curve is symmetric about the mean. (The area to the left of the mean equals the area to
the right of the mean.)
• The total area under the curve is equal to one.
• As x moves away from the mean in either direction, the curve approaches zero but never
touches the horizontal axis.
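The properties listed above can be checked numerically. This sketch evaluates the normal density formula directly using only the standard `math` module, compares a small and a large standard deviation, tests symmetry about the mean, and approximates the total area with a simple midpoint sum. The σ values and the ±10σ integration limits are arbitrary choices made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def area(mu, sigma, steps=20_000):
    """Approximate the total area under the curve with a midpoint sum over mu ± 10 sigma."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    width = (hi - lo) / steps
    return sum(normal_pdf(lo + (k + 0.5) * width, mu, sigma) for k in range(steps)) * width


mu = 0
for sigma in (1, 2):
    peak = normal_pdf(mu, mu, sigma)  # highest point, at the mean
    symmetric = math.isclose(normal_pdf(mu - 1.5, mu, sigma), normal_pdf(mu + 1.5, mu, sigma))
    print(f"sigma={sigma}: peak={peak:.4f}, symmetric={symmetric}, area={area(mu, sigma):.4f}")
```

The larger σ gives a lower peak, which is the "flatter" curve described in the first bullet, while the area stays at one.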
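There are infinitely many possible combinations of means and standard deviations for
normally distributed continuous random variables.
Finding probabilities for each of these variables directly would require us to integrate its PDF
over the range of values we are interested in. Instead, we convert values to Z-scores and use a
single table for the standard normal distribution (μ = 0, σ = 1).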
• The standard normal table gives probabilities associated with specific Z-scores.
• The table we use is cumulative from the left.
• The negative side is for all Z-scores less than zero (all values less than the mean).
• The positive side is for all Z-scores greater than zero (all values greater than the mean).
• Not all standard normal tables work the same way.
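A table that is cumulative from the left lists values of the standard normal CDF, Φ(z) = P(Z < z). As a sketch (not how printed tables were originally produced), Φ can be computed from the error function in Python's standard `math` module through the identity Φ(z) = ½[1 + erf(z/√2)]. The loop prints a few entries from the negative and positive sides, and the final check shows the symmetry that links the two sides.

```python
import math


def phi(z):
    """Cumulative standard normal probability P(Z < z): the area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Negative side of the table (values below the mean) and positive side (above it)
for z in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(f"z = {z:+.1f}  area to the left = {phi(z):.4f}")

# Symmetry: the area to the left of -z equals the area to the right of +z
assert math.isclose(phi(-1.5), 1 - phi(1.5))
```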
Example 10
Figure 11. The standard normal table and associated area for z = 1.62.
• Read down the Z-column to get the first part of the Z-score (1.6).
• Read across the top row to get the second decimal place in the Z-score (0.02).
• The intersection of this row and column gives the area under the curve to the left of the Z-
score.
• What if we have an area and we want to find the Z-score associated with that area?
• Instead of Z-score → area, we want area → Z-score.
• We can use the standard normal table to find the area in the body of the table and read
backwards to find the associated Z-score.
• Using the table, search the probabilities for the area that is closest to the probability you
are interested in.
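Both directions of table use can be mimicked in code. In this sketch the hypothetical helper `table_entry` does the forward lookup (row = first decimal of z, column = second decimal), and `z_for_area` does the reverse lookup by searching all tabulated z-values for the closest area, as the bullets describe. Two assumptions are labeled in the comments: the table is printed to 4 decimals, and it covers z from −3.49 to 3.49.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def table_entry(row, column):
    """Forward lookup: row holds z to one decimal, column holds the second decimal."""
    return round(phi(row + column), 4)  # assumes a 4-decimal printed table


# Assumed table coverage: z = -3.49, -3.48, ..., 3.49
TABLE_Z = [round(k / 100, 2) for k in range(-349, 350)]


def z_for_area(area):
    """Reverse lookup: the tabulated z whose left-tail area is closest to `area`."""
    return min(TABLE_Z, key=lambda z: abs(phi(z) - area))


print(table_entry(1.6, 0.02))  # area to the left of z = 1.62 -> 0.9474
print(z_for_area(0.95))        # -> 1.64
```

For an unrounded inverse, Python's `statistics.NormalDist().inv_cdf(0.95)` returns about 1.6449; the table search can only return a value on the 0.01 grid.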
Example 11
Since the table is cumulative from the left, you must use the complement of 5%: 1 − 0.05 = 0.95.
Figure 13. The standard normal table.
The Z-score for the 95th percentile is 1.64.
Area in Between Two Z-scores
Example 12
• The middle 95% has 2.5% on the right and 2.5% on the left.
• Use the symmetry of the curve.
• Look at your standard normal table. Since the table is cumulative from the left, it is easier
to find the area to the left first.
• Find the area of 0.025 on the negative side of the table.
• The Z-score for the area to the left is -1.96.
• Since the curve is symmetric, the Z-score for the area to the right is 1.96.
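The symmetry argument in these steps can be sketched with Python's standard `statistics.NormalDist` (Python 3.8+). The hypothetical helper `central_z` splits the leftover area equally between the two tails, finds the lower cut-off from the area to its left, and mirrors it to get the upper cut-off.

```python
from statistics import NormalDist


def central_z(confidence):
    """Z-values that enclose the middle `confidence` of the standard normal area."""
    tail = (1 - confidence) / 2          # e.g. 0.025 on each side for 95%
    lower = NormalDist().inv_cdf(tail)   # z with area `tail` to its left
    return lower, -lower                 # symmetry gives the upper cut-off


low, high = central_z(0.95)
print(f"{low:.2f} {high:.2f}")  # -> -1.96 1.96
```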
Common Z-scores
• Z.05 = 1.645 and the area between -1.645 and 1.645 is 90%
• Z.025 = 1.96 and the area between -1.96 and 1.96 is 95%
• Z.005 = 2.575 and the area between -2.575 and 2.575 is 99%
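These three values can be checked by computing the area between −z and +z as the difference of two left areas. This sketch uses Python's standard `statistics.NormalDist`; the results match 90%, 95% and 99% only to about three decimals because the listed Z-values are rounded (2.575, for example, is often quoted as 2.576).

```python
from statistics import NormalDist


def area_between(z):
    """Area under the standard normal curve between -z and +z."""
    std = NormalDist()
    return std.cdf(z) - std.cdf(-z)


for z in (1.645, 1.96, 2.575):
    print(f"area between -{z} and {z}: {area_between(z):.4f}")
```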
Typically, our normally distributed data do not have μ = 0 and σ = 1, but we can relate any
normal distribution to the standard normal distribution using the Z-score, z = (x − μ) / σ,
which transforms values of x into values of z.
For example, if a normally distributed random variable has μ = 6 and σ = 2, then a value of
x = 7 corresponds to a Z-score of (7 − 6) / 2 = 0.5.
This tells you that 7 is one-half a standard deviation above its mean. We can use this
relationship to find probabilities for any normal random variable.
To find the area for values of X, a normal random variable, draw a picture of the area of
interest, convert the x-values to Z-scores using the Z-score formula, and then use the standard
normal table to find areas to the left, to the right, or in between.
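The three-step procedure (convert to z, then take a left area, a right area, or a difference) can be sketched as below. The helper names are hypothetical, and Python's standard `statistics.NormalDist().cdf` stands in for the printed table, so results are not rounded the way a table lookup would be.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def prob_left(x, mu, sigma):
    """P(X < x): area to the left of the corresponding z."""
    return NormalDist().cdf(z_score(x, mu, sigma))


def prob_right(x, mu, sigma):
    """P(X > x): complement of the area to the left."""
    return 1 - prob_left(x, mu, sigma)


def prob_between(a, b, mu, sigma):
    """P(a < X < b): difference of two left areas."""
    return prob_left(b, mu, sigma) - prob_left(a, mu, sigma)


# The mu = 6, sigma = 2 example from the text
print(z_score(7, 6, 2))             # -> 0.5
print(round(prob_left(7, 6, 2), 4))  # -> 0.6915
```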
Example 13
Adult deer population weights are normally distributed with µ = 110 lb. and σ = 29.7 lb. As
a biologist, you determine that a weight less than 82 lb. is unhealthy, and you want to know
what proportion of your population is unhealthy.
Find P(x < 82).
Convert 82 to a Z-score: z = (82 − 110) / 29.7 = −0.94.
This is an “area to the left” problem so you can read directly from the table to get the
probability.
P(x<82) = 0.1736
Approximately 17.36% of the population of adult deer is underweight; equivalently, a deer chosen
at random has a 17.36% chance of weighing less than 82 lb.
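Example 13 can be reproduced in a few lines using Python's standard `statistics.NormalDist`. The sketch rounds z to two decimals, as a table lookup does, and for comparison also computes the probability without rounding; the unrounded value comes out slightly smaller, and the difference is due only to rounding z.

```python
from statistics import NormalDist

mu, sigma, cutoff = 110, 29.7, 82   # deer weights in lb. (Example 13)

z = (cutoff - mu) / sigma           # about -0.943
z_table = round(z, 2)               # a table is read at two decimals: -0.94

p_table = NormalDist().cdf(z_table)          # mimics the table lookup
p_exact = NormalDist(mu, sigma).cdf(cutoff)  # same question, z not rounded

print(f"z = {z_table}, table-style P = {p_table:.4f}, unrounded P = {p_exact:.4f}")
```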
Example 14
Statistics from the Midwest Regional Climate Center indicate that Jones City, which has a
large wildlife refuge, gets an average of 36.7 in. of rain each year with a standard deviation
of 5.1 in. The amount of rain is normally distributed. During what percent of the years does
Jones City get more than 40 in. of rain?
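Convert 40 to a Z-score: z = (40 − 36.7) / 5.1 = 0.65. This is an “area to the right” problem,
so P(x > 40) = 1 − 0.7422 = 0.2578.
For approximately 25.78% of the years, Jones City will get more than 40 in. of rain.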
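The right-tail calculation in Example 14 can be sketched the same way: round z to two decimals as for a table lookup, then subtract the left area from one. Python's standard `statistics.NormalDist` stands in for the printed table here.

```python
from statistics import NormalDist

mu, sigma, threshold = 36.7, 5.1, 40     # annual rainfall in inches (Example 14)

z = round((threshold - mu) / sigma, 2)   # (40 - 36.7) / 5.1 is about 0.647, read as 0.65
p_more = 1 - NormalDist().cdf(z)         # area to the right = 1 - area to the left

print(f"z = {z}, P(X > 40) = {p_more:.4f}")
```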
Assessing Normality
• If the population distribution is unknown and the sample size is 30 or less (too small to rely
on the Central Limit Theorem), we have to assess the assumption of normality.
• Our primary method is the normal probability plot. This plot graphs the observed
data, ranked in ascending order, against the “expected” Z-score of that rank.
• If the sample data were taken from a normally distributed random variable, then the
plot would be approximately linear.
• The center line is the relationship we would expect to see if the data were drawn
from a perfectly normal distribution.
• Notice how the observed data (red dots) loosely follow this linear relationship.
Minitab also computes an Anderson-Darling test to assess normality.
• The null hypothesis for this test is that the sample data have been drawn from a
normally distributed population. A p-value greater than 0.05 means we fail to reject this
null hypothesis, so the data are consistent with the assumption of normality.
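The points on a normal probability plot can be generated by pairing each ranked observation with the expected Z-score of its rank. This is a minimal sketch, not Minitab's method: it assumes the plotting position (i − 0.5) / n, which is one of several conventions, and it summarizes straightness with a Pearson correlation, which is an added illustration rather than something the notes use. Both samples are hypothetical.

```python
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each ranked observation with the expected Z-score of its rank.

    Uses the plotting position (i - 0.5) / n, one common convention;
    software such as Minitab may use a different one.
    """
    ordered = sorted(data)
    n = len(ordered)
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))


def linearity(points):
    """Pearson correlation of the plot points; values near 1 mean a nearly straight plot."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical samples: one roughly symmetric, one strongly skewed right
symmetric = [48, 50, 51, 52, 53, 54, 55, 56, 58, 60]
skewed = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
print(round(linearity(normal_plot_points(symmetric)), 3))
print(round(linearity(normal_plot_points(skewed)), 3))
```

The skewed sample bends away from the straight line, so its correlation is noticeably lower.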
Compare the histogram and the normal probability plot in this next example. The
histogram indicates a skewed right distribution.
Figure 20. Histogram and normal probability plot for skewed right data.
The observed data do not follow a linear pattern, and the p-value for the A-D test is less
than 0.005, indicating that the population distribution is not normal.
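For readers who want to see where an A-D statistic comes from, here is a minimal sketch of the Anderson-Darling statistic for normality with the mean and standard deviation estimated from the sample. It does not reproduce Minitab's p-value calculation; instead it applies the small-sample adjustment A*² = A²(1 + 0.75/n + 2.25/n²) and compares the result with 0.752, a commonly tabulated 5% critical value for that adjusted statistic. Treat that constant as an assumption to check against a reference table. The sample is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling(data):
    """Adjusted Anderson-Darling statistic A*^2 for normality (mean and sd estimated).

    Larger values indicate a worse fit to the normal distribution.
    """
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    f = [fitted.cdf(v) for v in x]  # fitted CDF at each ordered observation
    total = sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1 - f[n - i]))
                for i in range(1, n + 1))
    a2 = -n - total / n
    # Small-sample adjustment used with tabulated critical values
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)


CRITICAL_5_PERCENT = 0.752  # assumed tabulated 5% critical value for A*^2

skewed = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]  # hypothetical right-skewed sample
a = anderson_darling(skewed)
print(f"A*^2 = {a:.3f}; reject normality at the 5% level: {a > CRITICAL_5_PERCENT}")
```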
Normality cannot be assumed. You must always verify this assumption. Remember, the
probabilities we are finding come from the standard NORMAL table. If our data are NOT
normally distributed, then these probabilities DO NOT APPLY.