GLS UNIVERSITY

BCA SEMESTER -III


210302301 STATISTICS FOR DATA ANALYSIS
MODULE 3

UNIT – 4 : CORRELATION

Introduction:
In today’s business world we come across many activities that depend on each other. Businesses face a large number of problems involving two or more variables, and identifying these variables and their dependencies helps in resolving many of them. Often two variables seem to move in the same direction, both increasing or both decreasing; at other times an increase in one variable is accompanied by a decline in the other. Examples include family income and expenditure, the price of a product and its demand, and advertisement expenditure and sales volume. If two quantities vary in such a way that movements in one are accompanied by movements in the other, the quantities are said to be correlated.

Meaning:
Correlation is a statistical technique used to ascertain the association or relationship between two or more variables. Correlation analysis studies the degree and direction of that relationship.
A correlation coefficient is a statistical measure of the degree to which changes in the value of one variable predict changes in the value of another. When the fluctuation of one variable reliably predicts a similar fluctuation in another, there is a tendency to assume that the change in one causes the change in the other; however, correlation by itself does not establish causation.

Types of Correlation:
Correlation is described or classified in several different ways. Three of the
most important are:

I. Positive, Negative and Zero
II. Simple, Partial and Multiple
III. Linear and Non-linear

I. Positive, Negative and Zero Correlation:

Whether correlation is positive (direct) or negative (inverse) depends upon the direction of change of the variables.

Positive Correlation: If both variables vary in the same direction, the correlation is said to be positive. That is, if one variable increases, the other on average also increases; if one decreases, the other on average also decreases. For example, the correlation between the heights and weights of a group of persons is positive.

Negative Correlation: If the variables vary in opposite directions, the correlation is said to be negative. That is, if one variable increases, the other decreases, and vice versa. For example, the correlation between the price of a product and its demand is negative.
Zero Correlation: Strictly speaking this is not a type of correlation, but it is still called zero or no correlation. When no relationship is found between the variables, the correlation is said to be zero: a change in the value of one variable does not influence or change the value of the other. For example, the correlation between the weight of a person and intelligence is zero.

II. Simple, Partial and Multiple Correlation:

The distinction between simple, partial and multiple correlation is based upon
the number of variables studied.
Simple Correlation: When only two variables are studied, it is a case of simple correlation. For example, studying the relationship between the marks secured by a student and the student's attendance in class is a problem of simple correlation.
Partial Correlation: In partial correlation, three or more variables are studied, but only two are considered to be influencing each other, with the effect of the other influencing variables held constant. In the example above, other influences such as the effectiveness of the teacher's instruction and the use of teaching aids like computers and smart boards are assumed to be constant.
Multiple Correlation: When three or more variables are studied jointly, it is a case of multiple correlation. In the example above, if the study covers the relationship between students' marks, attendance, teacher effectiveness, and the use of teaching aids, it is a case of multiple correlation.

III. Linear and Non-linear Correlation:


Depending upon the constancy of the ratio of change between the variables, the
correlation may be Linear or Non-linear Correlation.

Linear Correlation: If the amount of change in one variable bears a constant ratio to the amount of change in the other variable, the correlation is said to be linear. If such variables are plotted on graph paper, all the points fall on a straight line. For example, if producing one unit of finished product requires 10 units of raw material, then producing 2 units requires exactly double that amount.

Non-linear Correlation: If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear. If such variables are plotted on a graph, the points fall on a curve rather than a straight line. For example, doubling advertisement expenditure does not necessarily double sales volume.

Exercise: For each of the cases (scatter diagrams omitted in the source), state whether there is (a) positive correlation, (b) negative correlation, or (c) no correlation.
Karl Pearson’s Coefficient of Correlation:
Karl Pearson’s method of calculating the coefficient of correlation is based on the covariance of the two variables in a series. The method is widely used in practice, and the coefficient is denoted by the symbol “r”. If the two variables under study are X and Y, Karl Pearson's formula for measuring the degree of correlation is:

r = Cov(X, Y) / (σX · σY) = Σ(X − X̄)(Y − Ȳ) / √[ Σ(X − X̄)² · Σ(Y − Ȳ)² ]
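Karl Pearson's formula can be sketched directly in Python. The data here is purely illustrative and not taken from the examples below:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means (covariance term).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the two standard-deviation terms.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Illustrative data: advertisement expenses (X) and sales volume (Y).
x = [10, 12, 15, 23, 20]
y = [14, 17, 23, 25, 21]
r = pearson_r(x, y)
```

By construction r always lies between −1 (perfect negative correlation) and +1 (perfect positive correlation), with 0 meaning no linear relationship.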

Example 1
From the following information, find the correlation coefficient between advertisement expenses and sales volume using Karl Pearson's coefficient of correlation method.
Interpretation: The calculation gives r = 0.7866, a high degree of positive correlation between the two variables: an increase in advertisement expenses is associated with an increase in sales volume.

Example 2
Find the correlation coefficient between age and playing habits of the following
students using Karl Pearson’s coefficient of correlation method.

Solution:
To find the correlation between age and playing habits, we first compute the percentage of students who play regularly:

Percentage of playing habit = (No. of regular players / Total no. of students) × 100

Let the ages of the students be the variable X and the percentages of playing habit be the variable Y.
Example 3
Find Karl Pearson’s coefficient of correlation between capital employed and profit
obtained from the following data.
Example 4
The coefficient of correlation between X and Y is 0.3, their covariance is 9, and the variance of X is 16. Find the standard deviation of the Y series.
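Example 4 follows directly by rearranging r = Cov(X, Y) / (σX · σY); a quick check:

```python
r = 0.3        # given: coefficient of correlation
cov_xy = 9.0   # given: covariance of X and Y
var_x = 16.0   # given: variance of X

sigma_x = var_x ** 0.5            # standard deviation of X = 4
sigma_y = cov_xy / (r * sigma_x)  # rearranged from r = cov / (sigma_x * sigma_y)
# sigma_y = 9 / (0.3 * 4) = 7.5
```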
Regression
Regression analysis is a statistical method that examines the relationship between one or more independent variables and a dependent variable. It is commonly used for prediction and for understanding the strength and nature of relationships between variables.
Types of Regression
Simple Regression: Involves one independent variable predicting a dependent variable, such as predicting someone's height based solely on their age.
Multiple Regression: Involves multiple independent variables predicting a dependent variable, such as predicting someone's weight from age, height, and maybe even daily pizza intake.
Linear Regression: Assumes a linear relationship between variables, meaning the change in the dependent variable is proportional to the change in the independent variable(s). Think of a straight line on a graph.
Non-Linear Regression: In contrast, acknowledges a non-linear relationship. The relationship might be curved, like a sine wave or a parabola, making it trickier to model.
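A minimal sketch of simple linear regression fitted by ordinary least squares, in pure Python. The data points (age predicting height) are hypothetical and chosen to lie exactly on a line so the fit is easy to verify by hand:

```python
def fit_line(xs, ys):
    """Ordinary least squares for the simple linear regression y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x  # the fitted line passes through the point of means
    return a, b

# Hypothetical data: age (x, years) predicting height (y, cm).
a, b = fit_line([4, 6, 8, 10], [100, 112, 124, 136])
# The data is perfectly linear, so the fit recovers y = 76 + 6x exactly.
predicted = a + b * 12
```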
Supervised & Unsupervised Learning

Introduction

Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Supervised learning and unsupervised learning are the two main types of machine learning.

17/02/2025 2
Supervised Learning
Supervised learning is a form of ML in which the model is trained to
associate input data with specific output labels, drawing from labeled
training data.
Here, the algorithm is furnished with a dataset containing input features
paired with corresponding output labels.
The model's objective is to discern the correlation between input features
and output labels, enabling it to provide precise predictions or
classifications when confronted with unseen data.


For example, a labeled dataset of images of elephants, camels, and cows would have each image tagged as “Elephant”, “Camel”, or “Cow”.


Supervised learning involves training a machine from labeled data.

Labeled data consists of examples with the correct answer or
classification.

The machine learns the relationship between inputs (fruit images) and
outputs (fruit labels).

The trained machine can then make predictions on new, unlabeled data.
Example: Suppose you have a basket of fruit you want to identify. The machine first analyzes each image to extract features such as shape, color, and texture. It then compares these features to the features of the fruits it has already learned about. If the new image's features are most similar to those of an apple, the machine predicts that the fruit is an apple.
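The compare-to-learned-examples idea above can be sketched as a nearest-neighbour classifier. The feature vectors (roundness, redness, smoothness on a 0–1 scale) and labels below are invented for illustration, not taken from the text:

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical labeled training data: (roundness, redness, smoothness) -> label.
training = [
    ((0.9, 0.8, 0.9), "apple"),
    ((0.3, 0.1, 0.9), "banana"),
    ((0.8, 0.9, 0.2), "strawberry"),
]

def classify(features):
    """Predict the label of the closest labeled example (1-nearest neighbour)."""
    return min(training, key=lambda item: dist(item[0], features))[1]

label = classify((0.85, 0.75, 0.95))  # round, red, smooth -> most apple-like
```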
Types of Supervised Learning

Classification
In classification tasks, the model predicts a discrete class label or category.
For example, it classifies emails as spam or not based on features like
keywords and sender information.

Regression
In regression tasks, the model anticipates a continuous value or quantity.
For instance, it forecasts house prices by considering features such as
square footage, number of bedrooms, and location.

1. Regression
Regression is a type of supervised learning that is used to predict
continuous values, such as house prices, stock prices, or customer
churn. Regression algorithms learn a function that maps from the input
features to the output value.

Some common regression algorithms include:

Linear Regression

Polynomial Regression

Support Vector Machine Regression

Decision Tree Regression

Random Forest Regression
2- Classification

Classification is a type of supervised learning that is used to predict
categorical values, such as whether a customer will churn or not,
whether an email is spam or not, or whether a medical image shows a
tumor or not. Classification algorithms learn a function that maps from
the input features to a probability distribution over the output classes.

Some common classification algorithms include:

Logistic Regression

Support Vector Machines

Decision Trees

Random Forests

Naive Bayes
Applications of Supervised learning
1. Spam Filtering: Identify and classify spam emails based on their content, helping users avoid unwanted messages.

2. Image Classification: Classify images into different categories, such as animals, objects, or scenes, facilitating tasks like image search, content moderation, and image-based product recommendations. Facebook's facial recognition feature uses supervised learning to identify people in photos.

3. Medical Diagnosis: Assist in medical diagnosis by analyzing patient data, such as medical images, test results, and patient history, to identify patterns that suggest specific diseases or conditions.

4. Fraud Detection: Analyze financial transactions and identify patterns that indicate fraudulent activity, helping financial institutions prevent fraud and protect their customers.

5. Natural Language Processing (NLP): Supervised learning plays a crucial role in NLP tasks, including sentiment analysis, machine translation, and text summarization, enabling machines to understand and process human language effectively.

6. Speech Recognition: Virtual assistants like Siri and Alexa use supervised
learning to recognize spoken words and phrases.

Advantages

Supervised learning allows collecting data and producing outputs from previous experience.

It helps optimize performance criteria using that experience.

It helps solve various types of real-world computation problems.

It performs both classification and regression tasks.

It allows estimating or mapping results to new samples.

We have complete control over the number of classes we want in the training data.

Disadvantages

Classifying big data can be challenging.

Training a supervised model requires a lot of computation time.

Supervised learning cannot handle all complex tasks in machine learning.

It requires a labelled data set.

It requires a training process.

Unsupervised Learning
Unsupervised learning is a type of machine learning that learns from
unlabeled data. This means that the data does not have any pre-existing
labels or categories.
The goal of unsupervised learning is to discover patterns and relationships
in the data without any explicit guidance.
Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. The task of the machine is to group unsorted information according to similarities, patterns, and differences, without any prior training on the data.


Unsupervised learning allows the model to discover patterns and relationships in
unlabeled data.

Clustering algorithms group similar data points together based on their inherent
characteristics.

Feature extraction captures essential information from the data, enabling the model to
make meaningful distinctions.

Label association assigns categories to the clusters based on the extracted patterns and
characteristics.
Example: Suppose the model is given an image containing both dogs and cats that it has never seen before. The machine has no idea of the features of dogs and cats, so it cannot label them as "dogs" and "cats". But it can group the pictures according to their similarities, patterns, and differences: the set can easily be split into two parts, one containing all the pictures with dogs and the other all the pictures with cats.
Types of Unsupervised Learning
Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behavior.
Common clustering algorithms include K-means clustering, hierarchical
clustering, DBSCAN, and Gaussian mixture models (GMM).
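A minimal K-means sketch (Lloyd's algorithm) on 1-D data in pure Python; the data points and starting centers are hypothetical:

```python
def kmeans_1d(points, centers, iterations=10):
    """K-means (Lloyd's algorithm) on 1-D data."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two obvious groups, e.g. small vs. large purchase amounts (hypothetical).
centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0])
```

The algorithm converges here to centers 2.0 and 11.0, the means of the two natural groups.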
Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as "people who buy X also tend to buy Y."
The most well-known algorithm for association rule learning is Apriori, which is used for market basket analysis. Association rule learning is commonly applied in retail to analyze purchasing patterns, identify frequently co-occurring items, and make recommendations.
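The core idea of market basket analysis can be sketched by counting how often item pairs co-occur across transactions; full Apriori prunes this search with its frequent-itemset property, but the counting core looks like this (the baskets are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (market baskets).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"beer", "chips"},
]

# Count the support (number of baskets) of every item pair.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs meeting a minimum support threshold become candidate rules ("X with Y").
frequent = {pair: n for pair, n in pair_counts.items() if n >= 2}
```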
Applications of Unsupervised Learning
1. Customer Segmentation: Online retailers use unsupervised learning to segment
customers based on their buying behavior.
2. Recommendation Systems: Netflix uses unsupervised learning to recommend
movies and TV shows based on user viewing history.
3. Anomaly Detection: Credit card companies use unsupervised learning to detect
unusual transaction patterns that may indicate fraud.
4. Scientific discovery: Unsupervised learning can uncover hidden relationships and
patterns in scientific data, leading to new hypotheses and insights in various scientific
fields.
5. Image analysis: Unsupervised learning can group images based on their content,
facilitating tasks such as image classification, object detection, and image retrieval.

Advantages


It does not require training data to be labeled.

Dimensionality reduction can be easily accomplished using unsupervised
learning.

Capable of finding previously unknown patterns in data.

Unsupervised learning can help you gain insights from unlabeled data that you
might not have been able to get otherwise.

Unsupervised learning is good at finding patterns and relationships in data
without being told what to look for. This can help you learn new things about
your data.

Disadvantages

Difficult to measure accuracy or effectiveness due to the lack of predefined answers during training.

The results are often less accurate.

The user needs to spend time interpreting and labeling the classes that result from the clustering.

Unsupervised learning can be sensitive to data quality, including missing values, outliers, and noisy data.

Without labeled data, it can be difficult to evaluate the performance of unsupervised learning models, making it challenging to assess their effectiveness.

Supervised Vs. Unsupervised Learning

