Today's Work
1. Histogram
A histogram is an estimate of the probability distribution of a continuous variable (quantitative variable) and was first
introduced by Karl Pearson.[1] It is a kind of bar graph. To construct a histogram, the first step is to
"bin" the range of values, that is, divide the entire range of values into a series of intervals, and
then count how many values fall into each interval. The bins are usually specified as consecutive,
non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are
not required to be) of equal size.[2]
If the bins are of equal size, a rectangle is erected over each bin with height proportional to
the frequency, the number of cases in that bin. A histogram may also be normalized to display
"relative" frequencies: it then shows the proportion of cases that fall into each of several categories,
with the sum of the heights equaling 1.
However, bins need not be of equal width; in that case, the erected rectangle is defined to have
its area proportional to the frequency of cases in the bin.[3] The vertical axis is then not the frequency
but the frequency density: the number of cases per unit of the variable on the horizontal axis.
Examples of variable bin width are displayed on Census Bureau data below.
As the adjacent bins leave no gaps, the rectangles of a histogram touch each other to indicate that
the original variable is continuous.[4]
Histograms give a rough sense of the density of the underlying distribution of the data, and are often
used for density estimation: estimating the probability density function of the underlying variable. The total
area of a histogram used for probability density is always normalized to 1. If the lengths of the
intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.
A histogram can be thought of as a simplistic kernel density estimation, which uses a kernel to
smooth frequencies over the bins. This yields a smoother probability density function, which will in
general more accurately reflect the distribution of the underlying variable. The density estimate could be
plotted as an alternative to the histogram, and is usually drawn as a curve rather than a set of boxes.
Another alternative is the average shifted histogram,[5] which is fast to compute and gives a smooth
curve estimate of the density without using kernels.
The histogram is one of the seven basic tools of quality control.[6]
Histograms are sometimes confused with bar charts. A histogram is used for continuous data, where
the bins represent ranges of data, while a bar chart is a plot of categorical variables. Some authors
recommend that bar charts have gaps between the rectangles to clarify the distinction.
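The "bin, then count" construction described above can be sketched in a few lines. This is a minimal illustration, not part of the original article; the sample data and bin edges are invented for the example:

```python
def histogram(values, edges):
    """Count how many values fall into each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

data = [20, 25, 24, 33, 13, 26, 8, 19, 31, 11]
edges = [5, 15, 25, 35]        # three adjacent, non-overlapping bins
print(histogram(data, edges))  # one count per bin
```

Because the bins are adjacent and non-overlapping, every value lands in exactly one bin, which is why the rectangles of a histogram touch.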
2. Grouped data are data formed by aggregating individual observations of a variable into groups,
so that a frequency distribution of these groups serves as a convenient means of summarizing
or analyzing the data.
Example
The idea of grouped data can be illustrated by considering the following raw dataset:
Table 1: Time taken (in seconds) by a group of students to
answer a simple math question
20 25 24 33 13
26 8 19 31 11
16 21 17 11 34
14 15 21 18 17
The above data can be grouped in order to construct a frequency distribution in any of several ways.
One method is to use intervals as a basis.
The smallest value in the above data is 8 and the largest is 34. The interval from 8 to 34 is broken up
into smaller subintervals (called class intervals). For each class interval, the number of data items
falling in this interval is counted. This number is called the frequency of that class interval. The
results are tabulated as a frequency table as follows:
Table 2: Frequency distribution of the time taken (in seconds) by the group of students to
answer a simple math question

Time (seconds)   Frequency
5 ≤ t < 10       1
10 ≤ t < 15      4
15 ≤ t < 20      6
20 ≤ t < 25      4
25 ≤ t < 30      2
30 ≤ t < 35      3
Another method of grouping the data is to use some qualitative characteristic instead of numerical
intervals. For example, suppose in the above example there are three kinds of students: 1) below
normal, if the response time is 5 to 14 seconds; 2) normal, if it is between 15 and 24 seconds; and 3)
above normal, if it is 25 seconds or more. The grouped data then look like:
Category       Frequency
Below normal   5
Normal         10
Above normal   5
Mean of grouped data
The mean of grouped data is calculated with the formula

    mean = Σ(f · x) / Σf

In this formula, x refers to the midpoint of the class intervals, and f is the class frequency. Note
that the result of this will be different from the sample mean of the ungrouped data. The mean
for the grouped data in the above example can be calculated as follows:

Class interval   Frequency (f)   Midpoint (x)   f · x
5 ≤ t < 10       1               7.5            7.5
10 ≤ t < 15      4               12.5           50
15 ≤ t < 20      6               17.5           105
20 ≤ t < 25      4               22.5           90
25 ≤ t < 30      2               27.5           55
30 ≤ t < 35      3               32.5           97.5
TOTAL            20                             405

Thus, the mean of the grouped data is 405 / 20 = 20.25.
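The grouped-mean calculation can be verified in a few lines. This is a minimal sketch using the Table 2 frequencies; the midpoints follow from the class limits:

```python
# Class midpoints and frequencies from Table 2 (midpoint of 5 <= t < 10 is 7.5, etc.).
midpoints =   [7.5, 12.5, 17.5, 22.5, 27.5, 32.5]
frequencies = [1,   4,    6,    4,    2,    3]

# Grouped mean: sum of f*x divided by sum of f.
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
total_f = sum(frequencies)
mean = total_fx / total_f
print(total_fx, total_f, mean)  # 405.0 20 20.25
```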
3. Ungrouped data
The data obtained in original form are called raw data or ungrouped data.
Example: The marks obtained by 25 students in a class in a certain examination are given below:
25, 8, 37, 16, 45, 40, 29, 12, 42, 25, 14, 16, 16, 20, 10, 36, 33, 24, 25, 35, 11, 30, 40, 45, 48.
Arranging the marks of the 25 students in ascending order, we get the following array:
8, 10, 11, 12, 14, 16, 16, 16, 20, 24, 25, 25, 25, 29, 30, 33, 35, 36, 37, 40, 40, 42, 45, 45, 48.
To prepare a frequency distribution table for raw data using tally marks
We take each observation from the data, one at a time, and indicate the frequency (the number of
times the observation has occurred in the data) by small lines, called tally marks. For convenience,
we write tally marks in bunches of five, the fifth one crossing the fourth diagonally. In the table so
formed, the sum of all the frequencies is equal to the total number of observations in the given data.
Example: The sale of shoes of various sizes at a shop on a particular day is given below:
7 8 5 4 9 8 5 7 6 8 9 6 7 9
8 7 9 9 6 5 8 9 4 5 5 8 9 6
Frequency Table

Shoe size   Frequency
4           2
5           5
6           4
7           4
8           6
9           7
Total       28
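Tallying observations one at a time is exactly what a frequency counter does; a minimal sketch using the shoe-size data above (the tally marks are printed here as plain strokes, without the diagonal fifth stroke):

```python
from collections import Counter

# Shoe sizes sold, copied from the example data above.
sales = [7, 8, 5, 4, 9, 8, 5, 7, 6, 8, 9, 6, 7, 9,
         8, 7, 9, 9, 6, 5, 8, 9, 4, 5, 5, 8, 9, 6]

freq = Counter(sales)
for size in sorted(freq):
    # size, tally strokes, frequency
    print(size, "|" * freq[size], freq[size])
```

The sum of the frequencies equals the number of observations, as the text states.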
To put the data in a more condensed form, we make groups of suitable size, and mention the
frequency of each group. Such a table is called a grouped frequency distribution table.
Class-Interval: Each class is bounded by two figures, which are called class limits. The figure on
the left side of a class is called its lower limit and that on its right is called its upper limit.
Example: Suppose the marks obtained by some students in an examination are given. We may
consider the classes 0-10, 10-20, etc. In class 0-10, we include 0 and exclude 10. In class
10-20, we include 10 and exclude 20.
3, 25, 48, 23, 17, 13, 11, 9, 46, 41, 37, 45, 10, 19, 39, 36, 34, 5, 17, 21,
39, 33, 28, 25, 12, 3, 8, 17, 48, 34, 15, 19, 32, 32, 19, 21, 28, 32, 20, 23,
Arrange the data in ascending order and present it as grouped data.
In ascending order, the data are:
3, 3, 5, 8, 9, 10, 11, 12, 13, 15, 17, 17, 17, 19, 19, 19, 20, 21, 21, 23,
23, 25, 25, 28, 28, 32, 32, 32, 33, 34, 34, 36, 37, 39, 39, 41, 45, 46, 48, 48
(i) Discontinuous Interval Form (or Inclusive Form)

Class (marks)   Frequency
1-10            6
11-20           11
21-30           8
31-40           10
41-50           5
Total           40

Note that the class 1-10 means marks obtained from 1 to 10, including both.
(ii) Continuous Interval Form (or Exclusive Form)

Class (marks)   Frequency
0-10            5
10-20           11
20-30           9
30-40           10
40-50           5
Total           40
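Both grouping conventions can be sketched in code. This is a minimal illustration; the function names are invented for the example, and the marks are the forty values from the exercise above:

```python
marks = [3, 25, 48, 23, 17, 13, 11, 9, 46, 41, 37, 45, 10, 19, 39, 36, 34, 5, 17, 21,
         39, 33, 28, 25, 12, 3, 8, 17, 48, 34, 15, 19, 32, 32, 19, 21, 28, 32, 20, 23]

def exclusive_form(values, width=10, start=0):
    """Classes 0-10, 10-20, ...: lower limit included, upper limit excluded."""
    counts = {}
    for v in values:
        lo = start + width * ((v - start) // width)
        counts[(lo, lo + width)] = counts.get((lo, lo + width), 0) + 1
    return counts

def inclusive_form(values, width=10, start=1):
    """Classes 1-10, 11-20, ...: both limits included."""
    counts = {}
    for v in values:
        lo = start + width * ((v - start) // width)
        counts[(lo, lo + width - 1)] = counts.get((lo, lo + width - 1), 0) + 1
    return counts

print(sorted(inclusive_form(marks).items()))
print(sorted(exclusive_form(marks).items()))
```

Note how a value such as 10 moves between classes: it belongs to 1-10 in the inclusive form but to 10-20 in the exclusive form.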
4. Central tendency
In statistics, a central tendency (or measure of central tendency) is a central or typical value for
a probability distribution.[1] It may also be called a center or location of the distribution. Colloquially,
measures of central tendency are often called averages. The term central tendency dates from the
late 1920s.[2]
The most common measures of central tendency are the arithmetic mean, the median and
the mode. A central tendency can be calculated for either a finite set of values or for a theoretical
distribution, such as the normal distribution. Occasionally authors use central tendency to denote
"the tendency of quantitative data to cluster around some central value."[2][3]
The central tendency of a distribution is typically contrasted with its dispersion or variability;
dispersion and central tendency are among the most often characterized properties of distributions. Analysts may
judge whether data have a strong or a weak central tendency based on their dispersion.
Measures
The following may be applied to one-dimensional data. Depending on the circumstances, it may be
appropriate to transform the data before calculating a central tendency. Examples are squaring the
values or taking logarithms. Whether a transformation is appropriate and what it should be, depend
heavily on the data being analyzed.
Arithmetic mean or simply, mean
the sum of all measurements divided by the number of observations in the data set.
Median
the middle value that separates the higher half from the lower half of the data set. The
median and the mode are the only measures of central tendency that can be used for ordinal
data, in which values are ranked relative to each other but are not measured absolutely.
Mode
the most frequent value in the data set. This is the only central tendency measure that can
be used with nominal data, which have purely qualitative category assignments.
Geometric mean
the nth root of the product of the data values, where there are n of these. This measure is
valid only for data that are measured absolutely on a strictly positive scale.
Harmonic mean
the reciprocal of the arithmetic mean of the reciprocals of the data values. This measure too
is valid only for data that are measured absolutely on a strictly positive scale.
Weighted arithmetic mean
an arithmetic mean that incorporates weighting to certain data elements.
Truncated mean or trimmed mean
the arithmetic mean of data values after a certain number or proportion of the highest and
lowest data values have been discarded.
Interquartile mean
a truncated mean based on data within the interquartile range.
Midrange
the arithmetic mean of the maximum and minimum values of a data set.
Midhinge
the arithmetic mean of the two quartiles.
Trimean
the weighted arithmetic mean of the median and two quartiles.
Winsorized mean
an arithmetic mean in which extreme values are replaced by values closer to the median.
Any of the above may be applied to each dimension of multi-dimensional data, but the results may
not be invariant to rotations of the multi-dimensional space. In addition, there are the
Geometric median
which minimizes the sum of distances to the data points. This is the same as the median
when applied to one-dimensional data, but it is not the same as taking the median of each
dimension independently. It is not invariant to different rescaling of the different dimensions.
Quadratic mean (often known as the root mean square)
useful in engineering, but not often used in statistics. This is because it is not a good
indicator of the center of the distribution when the distribution includes negative values.
Simplicial depth
the probability that a randomly chosen simplex with vertices from the given distribution will
contain the given center
Tukey median
a point with the property that every halfspace containing it also contains many sample points
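The three most common measures named at the top of this list can be computed directly with Python's standard library. A minimal sketch, reusing the Table 1 response times from the grouped-data section:

```python
from statistics import mean, median, multimode

# Response times (seconds) from Table 1 of the grouped-data section.
times = [20, 25, 24, 33, 13, 26, 8, 19, 31, 11,
         16, 21, 17, 11, 34, 14, 15, 21, 18, 17]

print(mean(times))               # arithmetic mean: sum / count
print(median(times))             # middle value of the sorted data
print(sorted(multimode(times)))  # all most-frequent values (the data is multimodal)
```

Here `multimode` is used rather than `mode` because three values are tied for most frequent.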
Solutions to variational problems
Several measures of central tendency can be characterized as solving a variational problem, in the
sense of the calculus of variations, namely minimizing variation from the center. That is, given a
measure of statistical dispersion, one asks for a measure of central tendency that minimizes
variation: such that variation from the center is minimal among all choices of center. In a quip,
"dispersion precedes location". This center may or may not be unique. In the sense of Lp spaces,
the correspondence is:
Lp    dispersion                   central tendency
L0    variation ratio              mode
L1    average absolute deviation   median
L2    standard deviation           mean
L∞    maximum deviation            midrange
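The L1 row of this correspondence can be checked numerically: the median minimizes the total absolute deviation. A small sketch, using the sorted response times from the earlier example as sample data:

```python
from statistics import median

def total_abs_deviation(center, values):
    """Sum of absolute deviations of the values from a candidate center."""
    return sum(abs(v - center) for v in values)

data = [8, 11, 11, 13, 14, 15, 16, 17, 17, 18, 19, 20, 21, 21, 24, 25, 26, 31, 33, 34]
m = median(data)  # 18.5

# Scan integer candidate centers: none beats the median.
best = min(total_abs_deviation(c, data) for c in range(0, 50))
print(best, total_abs_deviation(m, data))
```

For an even number of observations any point between the two middle values achieves the minimum, which is why the integer scan ties with the (non-integer) median here.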
Relationships between the mean, median and mode
Main article: Nonparametric skew
For every distribution,[5][6]

    |ν − μ| ≤ σ

where μ is the mean, ν is the median, θ is the mode, and σ is the standard deviation; that is, the
mean and the median of any distribution lie within one standard deviation of each other. For
unimodal distributions, the sharper bounds |θ − μ| ≤ √3 σ and |ν − μ| ≤ √0.6 σ also hold.
When a distribution is skewed to the left, the tail on the curve's left-hand side is longer than the
tail on the right-hand side, and the mean is less than the mode. This situation is also called
negative skewness.
When a distribution is skewed to the right, the tail on the curve's right-hand side is longer than
the tail on the left-hand side, and the mean is greater than the mode. This situation is also called
positive skewness.
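These relationships, and the fact that the mean and median are never more than one standard deviation apart, can be illustrated on a small made-up sample:

```python
from statistics import mean, median, pstdev

# A right-skewed (positively skewed) sample: one long right tail.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 20]

mu, nu, sigma = mean(sample), median(sample), pstdev(sample)
print(mu, nu, sigma)

# The mean is pulled toward the long tail, so mean > median, and the
# two lie within one (population) standard deviation of each other.
```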
5. Kurtosis
From Wikipedia, the free encyclopedia
In probability theory and statistics, kurtosis (from Greek κυρτός, kyrtos or kurtos, meaning "curved,
arching") is a measure of the "tailedness" of the probability distribution of a real-valued random
variable. In a similar way to the concept of skewness, kurtosis is a descriptor of the shape of a
probability distribution and, just as for skewness, there are different ways of quantifying it for a
theoretical distribution and corresponding ways of estimating it from a sample from a population.
Depending on the particular measure of kurtosis that is used, there are various interpretations of
kurtosis, and of how particular measures should be interpreted.
The standard measure of kurtosis, originating with Karl Pearson, is based on a scaled version of the
fourth moment of the data or population. This number is related to the tails of the distribution, not its
peak;[1] hence, the sometimes-seen characterization as "peakedness" is mistaken. For this measure,
higher kurtosis is the result of infrequent extreme deviations (or outliers), as opposed to frequent
modestly sized deviations.
The kurtosis of any univariate normal distribution is 3. It is common to compare the kurtosis of a
distribution to this value. Distributions with kurtosis less than 3 are said to be platykurtic, although
this does not imply the distribution is "flat-topped" as sometimes reported. Rather, it means the
distribution produces fewer and less extreme outliers than does the normal distribution. An example
of a platykurtic distribution is the uniform distribution, which does not produce outliers. Distributions
with kurtosis greater than 3 are said to be leptokurtic. An example of a leptokurtic distribution is
the Laplace distribution, which has tails that asymptotically approach zero more slowly than a
Gaussian, and therefore produces more outliers than the normal distribution. It is also common
practice to use an adjusted version of Pearson's kurtosis, the excess kurtosis, which is the kurtosis
minus 3, to provide the comparison to the normal distribution. Some authors use "kurtosis" by itself
to refer to the excess kurtosis. For clarity and generality, however, this article follows
the non-excess convention and explicitly indicates where excess kurtosis is meant.
Alternative measures of kurtosis are: the L-kurtosis, which is a scaled version of the fourth L-moment;
and measures based on four population or sample quantiles.[2] These are analogous to the
alternative measures of skewness that are not based on ordinary moments.[2]
Pearson moments
The kurtosis is the fourth standardized moment, defined as

    Kurt[X] = E[((X − μ) / σ)⁴] = μ₄ / σ⁴

where μ₄ is the fourth central moment and σ is the standard deviation. Several letters are used in
the literature to denote the kurtosis. A very common choice is κ, which is fine as long as it is
clear that it does not refer to a cumulant. Other choices include γ₂, to be similar to the notation
for skewness, although sometimes this is instead reserved for the excess kurtosis.
The kurtosis is bounded below by the squared skewness plus 1:[3]

    μ₄ / σ⁴ ≥ (μ₃ / σ³)² + 1

where μ₃ is the third central moment. The lower bound is realized by the Bernoulli
distribution. There is no upper limit to the excess kurtosis of a general probability
distribution, and it may be infinite.
A reason why some authors favor the excess kurtosis is that cumulants are extensive.
Formulas related to the extensive property are more naturally expressed in terms of the
excess kurtosis. For example, let X₁, ..., Xₙ be independent random variables for which the
fourth moment exists, and let Y be the random variable defined by the sum of the Xᵢ. The
excess kurtosis of Y is

    Kurt[Y] − 3 = (1 / (Σᵢ σᵢ²)²) Σᵢ σᵢ⁴ (Kurt[Xᵢ] − 3)

where σᵢ is the standard deviation of Xᵢ. In particular if all of the Xᵢ have the same
variance, then this simplifies to

    Kurt[Y] − 3 = (1 / n²) Σᵢ (Kurt[Xᵢ] − 3)
The reason not to subtract off 3 is that the bare fourth moment better generalizes
to multivariate distributions, especially when independence is not assumed.
The cokurtosis between pairs of variables is an order four tensor. For a bivariate
normal distribution, the cokurtosis tensor has off-diagonal terms that are neither 0
nor 3 in general, so attempting to "correct" for an excess becomes confusing. It is
true, however, that the joint cumulants of degree greater than two for
any multivariate normal distribution are zero.
For two random variables, X and Y, not necessarily independent, the kurtosis of the
sum, X + Y, is

    Kurt[X + Y] = (1 / σ_{X+Y}⁴) [ σ_X⁴ Kurt[X] + 4 σ_X³ σ_Y Cokurt[X,X,X,Y]
                  + 6 σ_X² σ_Y² Cokurt[X,X,Y,Y] + 4 σ_X σ_Y³ Cokurt[X,Y,Y,Y] + σ_Y⁴ Kurt[Y] ]

where Kurt[·] is the kurtosis, var(·) is the variance and E(·) is the expectation
operator. Writing Z = (X − μ)/σ for the standardized variable, the kurtosis is

    Kurt[X] = E[Z⁴] = var(Z²) + (E[Z²])² = var(Z²) + 1

The kurtosis can now be seen to be a measure of the dispersion of Z² around its
expectation. Alternatively it can be seen to be a measure of the dispersion of Z
around +1 and −1: Kurt[X] attains its minimal value of 1 in a symmetric two-point
distribution. In terms of the original variable X, the kurtosis is a measure of the
dispersion of X around the two values μ ± σ.
High values of kurtosis arise in two circumstances: where the probability mass is
concentrated around the mean and the data-generating process produces occasional
values far from the mean, and where the probability mass is concentrated in the
tails of the distribution.
Excess kurtosis
The excess kurtosis is defined as the kurtosis minus 3; it provides the comparison to the
kurtosis of the normal distribution.
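As a rough sketch of these definitions (not an example from the article itself), the Pearson kurtosis can be computed from the central moments:

```python
def kurtosis(values):
    """Pearson kurtosis: fourth standardized moment mu4 / sigma^4 (normal -> 3)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n   # variance, sigma^2
    m4 = sum((v - mu) ** 4 for v in values) / n   # fourth central moment
    return m4 / m2 ** 2

two_point = [-1, 1, -1, 1]   # symmetric two-point distribution
uniform = list(range(1, 7))  # discrete uniform sample

print(kurtosis(two_point))      # minimal possible value, 1.0
print(kurtosis(two_point) - 3)  # excess kurtosis, -2.0
print(kurtosis(uniform) < 3)    # uniform data is platykurtic: True
```

The symmetric two-point case attains the lower bound of 1 stated earlier (skewness 0, so the bound is 0² + 1), and the uniform sample comes out below 3, matching the description of platykurtic distributions.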
6. normal curve
noun, Statistics.
1. a bell-shaped curve showing a particular distribution of probability over the values of a random variable.
Also called Gaussian curve, probability curve.
Word Origin: 1890-1895
Based on the Random House Dictionary, Random House, Inc. 2017.
Now, it can readily be seen that this normal curve may also be considered the expectancy curve
if the wind has no effect.
The heavy line represents the smoothed or normal curve, deduced from eighteen years' statistics
and calculated for the year 1908.
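The bell-shaped curve itself is given by the Gaussian density. A minimal sketch (this formula is standard, but the code is an illustration added here, not part of the dictionary entry):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal (Gaussian) curve with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

# The curve peaks at the mean and is symmetric about it.
print(normal_pdf(0.0))                      # peak height, 1/sqrt(2*pi)
print(normal_pdf(1.0) == normal_pdf(-1.0))  # symmetric: True
```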
7. Gantt chart
A Gantt chart showing three kinds of schedule dependencies (in red) and percent complete indications.
A Gantt chart is a type of bar chart that illustrates a project schedule. Gantt charts illustrate the start
and finish dates of the terminal elements and summary elements of a project. Terminal elements
and summary elements comprise the work breakdown structure of the project. Modern Gantt charts
also show the dependency (i.e., precedence network) relationships between activities. Gantt charts
can be used to show current schedule status using percent-complete shadings and a vertical
"TODAY" line as shown here.
Although now regarded as a common charting technique, Gantt charts were considered
revolutionary when first introduced.[1] This chart is also used in information technology to represent
data that has been collected.
Historical development
The first known tool of this type was developed in 1896 by Karol Adamiecki, who called it
a harmonogram.[2] Adamiecki did not publish his chart until 1931, however, and only in Polish, which
limited both its adoption and recognition of his authorship. The chart is named after Henry
Gantt (1861-1919), who designed his chart around the years 1910-1915.[3][4]
One of the first major applications of Gantt charts was by the United States during World War I, at
the instigation of General William Crozier.[5]
In the 1980s, personal computers allowed widespread creation of complex and elaborate Gantt
charts. The first desktop applications were intended mainly for project managers and project
schedulers. With the advent of the Internet and increased collaboration over networks at the end of
the 1990s, Gantt charts became a common feature of web-based applications, including
collaborative groupware.
Example
In the following table there are seven tasks, labeled a through g. Some tasks can be done
concurrently (a and b) while others cannot be done until their predecessor task is complete
(c and d cannot begin until a is complete). Additionally, each task has three time estimates: the
optimistic time estimate (O), the most likely or normal time estimate (M), and the pessimistic time
estimate (P). The expected time (TE) is estimated using the beta probability distribution for the time
estimates, using the formula (O + 4M + P) / 6.
Activity   Predecessor   O   M   P    TE
a          -             2   4   6    4.00
b          -             3   5   9    5.33
c          a             4   5   7    5.17
d          a             4   6   10   6.33
e          b, c          4   5   7    5.17
f          d             3   4   8    4.50
g          e             3   5   8    5.17
Once this step is complete, one can draw a Gantt chart or a network diagram.
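The TE column above can be reproduced with the beta (PERT) formula. A small sketch; the task names and estimates are taken from the table:

```python
def expected_time(o, m, p):
    """Beta-distribution (PERT) estimate: (O + 4M + P) / 6, rounded to 2 places."""
    return round((o + 4 * m + p) / 6, 2)

tasks = {  # name: (optimistic, most likely, pessimistic)
    "a": (2, 4, 6), "b": (3, 5, 9), "c": (4, 5, 7), "d": (4, 6, 10),
    "e": (4, 5, 7), "f": (3, 4, 8), "g": (3, 5, 8),
}
te = {name: expected_time(*est) for name, est in tasks.items()}
print(te)
```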
A Gantt chart created using Microsoft Project (MSP). Note (1) the critical path is in red, (2) the slack is
the black lines connected to non-critical activities, (3) since Saturday and Sunday are not work days
and are thus excluded from the schedule, some bars on the Gantt chart are longer if they cut through a
weekend.
Further applications
Gantt charts can be used for scheduling generic resources as well as project management.
They can also be used for scheduling production processes and employee rostering.[6] In the
latter context, they may also be known as timebar schedules. Gantt charts can be used to track
shifts or tasks and also vacations or other types of out-of-office time.[7] Specialized employee
scheduling software may output schedules as a Gantt chart, or they may be created through
popular desktop publishing software.