Data Science Assignment

First data science assignment. It covers three questions on the statistics required for data science.

Uploaded by

Kanak 8064


Question 1: Why is the confusion matrix useful for evaluating the performance of a classifier?

Answer: In data science, a classifier is a machine learning algorithm that assigns a class
label to a data input.

The more often the classifier predicts the correct label, the better it is. To evaluate a
classifier we use a confusion matrix.

A confusion matrix, also known as an error matrix, is a summary table used to assess the
performance of a classifier. The table compares the actual values with the values the classifier
predicted. For a two-class problem there are four types of result:

true positives, true negatives, false positives and false negatives.

Suppose we have a classifier which predicts whether the image given as input is an image of a dog or
not. Its results on 165 input images can be summarised in a table like this:

                     Predicted: dog    Predicted: not dog
Actual: dog             100 (TP)             5 (FN)
Actual: not dog          10 (FP)            50 (TN)

From the table we can see that there are 50 true negatives, 10 false positives, 5 false negatives
and 100 true positives, and the total number of inputs is 165. From this data we can calculate the
accuracy of the classifier. We can also calculate other measures such as the misclassification rate,
the true positive rate and the precision. Together these measures show whether the classifier is a
good classifier or not.
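As a rough illustration, the measures named above can be computed directly from the four counts given in the answer. This is only a sketch: the variable names are chosen here for readability and are not part of the original assignment.

```python
# Counts quoted in the answer (dog vs. not-dog classifier).
tp, tn, fp, fn = 100, 50, 10, 5
total = tp + tn + fp + fn  # 165 inputs in all

# Share of all inputs that were labelled correctly.
accuracy = (tp + tn) / total
# Share of all inputs that were labelled wrongly (1 - accuracy).
misclassification_rate = (fp + fn) / total
# Of the images that really are dogs, the share the classifier found.
true_positive_rate = tp / (tp + fn)
# Of the images predicted to be dogs, the share that really are dogs.
precision = tp / (tp + fp)

print(f"accuracy               = {accuracy:.3f}")
print(f"misclassification rate = {misclassification_rate:.3f}")
print(f"true positive rate     = {true_positive_rate:.3f}")
print(f"precision              = {precision:.3f}")
```

With these counts the accuracy is 150/165 (about 0.91), the misclassification rate is 15/165 (about 0.09), the true positive rate is 100/105 (about 0.95) and the precision is 100/110 (about 0.91).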

Finally, we can say that the confusion matrix is very useful for evaluating a classifier, because it
shows not only how often the classifier is wrong but also which kinds of mistakes it makes.

Question 2: What happens if two features correlate in a linear regression?

Answer: Regression analysis is a set of statistical methods used to estimate the relationship
between a dependent variable and one or more independent variables. Suppose we want to conduct
a regression analysis of the GDP of our country, and we use the following regression
equation:

GDP = B0 + B1 * (interest rate) + B2 * (inflation rate) + e_i

Here GDP is the dependent variable, and the interest rate and the inflation rate are the independent
variables, also called the features of the regression model. B0, B1 and B2 are the coefficients to be
estimated and e_i is the error term. Once the coefficients are estimated, we can calculate GDP from
the interest rate and inflation rate values. The classical linear regression model rests on certain
assumptions.

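One of these assumptions is that the features are not linearly dependent on each other. Strictly,
the model only requires that no feature is an exact linear combination of the others, but in
practice strong correlation between features already causes problems. When two features are
correlated, the model cannot cleanly separate the effect of one from the effect of the other. The
estimated coefficients become unstable: their standard errors grow, small changes in the data can
change their size or even their sign, and their individual interpretation becomes unreliable. The
overall predictions may still be reasonable, but conclusions about each feature are not accurate.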

This problem is called multicollinearity, and it is important to avoid it among the independent
variables of a regression model.
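A common way to check for this problem is the variance inflation factor (VIF). For a model with exactly two features it reduces to 1 / (1 - r^2), where r is the correlation between them. The sketch below uses made-up interest rate and inflation rate values purely for illustration (they are not data from the assignment), and the rule of thumb that a VIF above 10 signals a problem is a convention, not a strict rule.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: the two features move almost together.
interest_rate = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
inflation_rate = [1.9, 2.6, 3.1, 3.4, 4.1, 4.4]

r = pearson_r(interest_rate, inflation_rate)
# With two features, each feature's VIF is 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation between features = {r:.3f}")
print(f"variance inflation factor    = {vif:.1f}")
```

A VIF this far above 10 means the variance of each coefficient estimate is inflated many times over compared with the uncorrelated case, which is why B1 and B2 would be hard to interpret separately.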

Question 3: Prove why Pearson's correlation coefficient is between -1 and 1.

Answer: Correlation coefficients are measures of association between two variables.
A correlation coefficient indicates both the strength of the association and its direction.
Pearson's product-moment correlation coefficient, written as 'r', describes the linear relationship
between two variables.

The value of the correlation coefficient always lies between -1 and 1.

If the correlation coefficient is -1, it indicates a perfect negative linear relationship between the
variables.

If the correlation coefficient is 0, it indicates no linear relationship.

If the correlation coefficient is 1, it indicates a perfect positive linear relationship between the
variables.
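The three cases above can be reproduced with a small hand-written function. The data sets are invented for illustration only: a line with positive slope, a line with negative slope, and a symmetric curve that has no linear trend.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]      # points exactly on a rising line
y_down = [-3 * v + 2 for v in x]   # points exactly on a falling line

x_sym = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x_sym]  # clear pattern, but not a linear one

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_curve = pearson_r(x_sym, y_curve)

print(r_up, r_down, r_curve)
```

The last case is worth noting: the points follow a curve exactly, yet r is 0, because r only measures the linear part of a relationship.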

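The values must lie between -1 and 1 because r is the covariance of the two variables divided by
the product of their standard deviations, and by the Cauchy-Schwarz inequality the absolute value
of the covariance can never exceed that product. Equality holds only when all the points lie exactly
on a straight line, which is the case of a perfect negative or positive relationship, so r is then
either -1 or 1. Nothing can be more perfect than a perfect correlation, so every other value of r
falls strictly between -1 and 1. It cannot be more than 1 or less than -1.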
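One standard way to make this argument rigorous is sketched below. It is not taken from the original answer, and it assumes that neither variable is constant, so both sums of squares are positive. The symbols a_i and b_i are introduced here for the deviations from the means.

```latex
% Deviations from the sample means (notation introduced for this sketch)
a_i = x_i - \bar{x}, \qquad b_i = y_i - \bar{y}, \qquad
r = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}

% A sum of squares is never negative, for every real number t:
0 \le \sum_i (t\,a_i + b_i)^2
  = t^2 \sum_i a_i^2 + 2t \sum_i a_i b_i + \sum_i b_i^2

% A quadratic in t that is never negative has discriminant <= 0:
\Bigl(2 \sum_i a_i b_i\Bigr)^2 - 4 \sum_i a_i^2 \sum_i b_i^2 \le 0
\;\Longrightarrow\;
\Bigl(\sum_i a_i b_i\Bigr)^2 \le \sum_i a_i^2 \sum_i b_i^2

% Dividing by the positive right-hand side gives the bound:
r^2 \le 1 \;\Longleftrightarrow\; -1 \le r \le 1
```

Equality r^2 = 1 occurs only when t a_i + b_i = 0 for all i at some value of t, that is, when every point lies exactly on one straight line.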
