Outliers, Variances, Probability Distributions (1) (Read-Only)

This document defines and explains outliers, variance, probability distributions, and correlations. It provides definitions for outliers as distant data points from the rest of a dataset. Variance measures how widely data points vary from the expected value. Probability distributions show the distribution of probability values as a function of variables. Correlation analyzes the strength and direction of relationships between two variables on a 0-100 scale.

Uploaded by

Anagha M
Copyright
© All Rights Reserved
OUTLIERS, VARIANCES,
PROBABILITY DISTRIBUTIONS,
AND CORRELATIONS
Amisha Sarika Gowda (1GA19CS011)
Amulya V D (1GA19CS013)
Anagha M (1GA19CS014)
Anagha S (1GA19CS015)
OUTLIERS
 Outliers are data points that are numerically distant
from the rest of the points in a dataset.
 There are several reasons for the presence of outliers in
a dataset. Some of these are:
 Anomalous situation
 Presence of a previously unknown fact
 Human error
 Sampling error
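One common way to flag points that are "numerically distant" is the interquartile-range (IQR) rule. The slides do not name a detection method, so the rule itself, the multiplier 1.5, the function name `find_outliers`, and the sample readings below are all assumptions made for illustration.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return the values lying outside the IQR fences (Tukey's rule).

    The multiplier k=1.5 is a widely used convention, assumed here;
    the slides do not prescribe a threshold.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical readings; 95 might be a human (typing) error.
readings = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(readings))  # [95]
```

A flagged point is only a candidate: whether it is an error or a previously unknown fact still has to be judged from the context of the data.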
VARIANCE
 Variance is measured as the average of the squared
differences between the values of a variable and its
expected value.
 Variance indicates how widely data points in a dataset
vary.
 A high variance indicates that the data in the dataset are
widely spread out around the mean, whereas a low variance
indicates that the values lie close to the mean and are
very similar to each other.
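As a small worked example, the sketch below computes the variance as the average of squared differences from the mean and compares a spread-out dataset with a tight one. The function name `variance` and both datasets are made up for illustration. It divides by n (population variance); a sample variance would divide by n - 1 instead.

```python
def variance(values):
    """Population variance: the mean of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

spread_out = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical, widely spread values
tight = [4, 5, 5, 6]                   # hypothetical, similar values

print(variance(spread_out))  # 4.0
print(variance(tight))       # 0.5
```

Both datasets have the same mean (5), so the difference in the results comes only from how far the values lie from it.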
PROBABILITY DISTRIBUTION
 A probability distribution gives the probability P as a
function of all possible values of the independent
variables (values, situations, distances and so on).
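 The normal distribution formula is:
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ is the mean and σ is the standard deviation; the standard normal distribution is the special case μ = 0, σ = 1.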
Normal distribution
 It relates to the Gaussian function. The figure shows the
distribution around the mean μ, with standard deviation σ
and variance σ².
 The figure also shows the percentages of areas in five
regions with respect to the total area under the curve for
P(x).
 The variance of a probability distribution describes how
far individual data points are spread from the mean, and
hence from each other, within a dataset.
 The variance is the average of the squared differences
between each data value and the mean.
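The areas under the normal curve can be checked numerically. The sketch below evaluates the normal density P(x) and the fraction of the total area lying within k standard deviations of the mean, using the error function from Python's `math` module. The function names and the choice of bands (1, 2 and 3 standard deviations) are assumptions; the five regions in the original figure are not recoverable from the text.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_within(k):
    """Fraction of the area under the curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * area_within(k):.2f}%")
```

The loop reproduces the familiar figures of roughly 68%, 95% and 99.7% of the area within one, two and three standard deviations.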
CORRELATION
 Correlation is an analysis that lets us find whether an
association exists, or is absent, between two variables,
x and y.
 The correlation coefficient r ranges from -1 to +1. Its
square, R², gives the strength of the relationship
between the model and the dependent variable on a
convenient 0-100% scale.
 Correlation is a statistical technique that measures and
describes the 'strength' and 'direction' of the relationship
between two variables.
CORRELATION
 The correlation r between the two variables x and y is:
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
 where n is the number of observations in the sample, x_i
is the x value for observation i, x̄ is the sample
mean of x, y_i is the y value for observation i, ȳ is
the sample mean of y, s_x is the sample standard deviation
of x, and s_y is the sample standard deviation of y.
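The definition of r in terms of n, the sample means and the sample standard deviations can be sketched directly in Python. The function name `pearson_r` and the sample data are illustrative assumptions. The sketch standardises each observation, multiplies the paired values, and divides the sum by n - 1.

```python
import statistics

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardised values, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length, with at least 2 observations")
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    # A constant variable gives a standard deviation of 0 and r is undefined.
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: y rises exactly in step with x, then falls exactly in step.
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 6))  # 1.0
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 6))  # -1.0
```

The sign of the result shows the direction of the relationship and its magnitude shows the strength.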
THANK YOU
