Data Science Assignment
Answer: In data science, a classifier is a machine learning algorithm that assigns a class label to
a data input.
The more accurately a classifier predicts, the better the classifier is. To evaluate a classifier we
use a confusion matrix.
A confusion matrix, also known as an error matrix, is a summary table used to assess the
performance of a classifier. The table compares the actual values with the values predicted by the
classifier. There are four types of results: true positives, true negatives, false positives and
false negatives.
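As a small illustration, the four types of results in a confusion matrix (true positives, true negatives, false positives and false negatives) can be tallied by comparing each actual label with the predicted one. The sketch below is only an example: the function name `confusion_counts` and the two label lists are made up for illustration and are not data from this assignment.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Tally TP, TN, FP and FN for a binary classifier.

    Both arguments are sequences of booleans, where True means
    "belongs to the positive class".
    """
    counts = Counter({"TP": 0, "TN": 0, "FP": 0, "FN": 0})
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1      # positive, predicted positive
        elif not a and not p:
            counts["TN"] += 1      # negative, predicted negative
        elif not a and p:
            counts["FP"] += 1      # negative, wrongly predicted positive
        else:
            counts["FN"] += 1      # positive, wrongly predicted negative
    return counts


# Hypothetical labels, invented only to show the tallying.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

counts = confusion_counts(actual, predicted)
print(dict(counts))
```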
Suppose we have a classifier that predicts whether the image given as input is an image of a dog or
not.
Now from the table we can see that there are 50 true negatives, 10 false positives, 5 false negatives
and 100 true positives, so the total input is 165. From this data we can calculate the accuracy of
the classifier. We can also calculate other measures such as the misclassification rate, the true
positive rate and the precision. From these measures we can understand whether the classifier is a
good classifier or not.
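The measures named above can be worked out directly from the four counts. The following is a minimal sketch using the standard definitions of accuracy, misclassification rate, true positive rate and precision; the function name `confusion_metrics` is chosen here for illustration, and the counts are the ones from the dog example.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Return common evaluation measures for a binary classifier."""
    total = tn + fp + fn + tp
    return {
        "total": total,
        # Fraction of all inputs that were classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of all inputs that were classified wrongly.
        "misclassification_rate": (fp + fn) / total,
        # Of the actual positives, the fraction predicted positive.
        "true_positive_rate": tp / (tp + fn),
        # Of the predicted positives, the fraction that are actual positives.
        "precision": tp / (tp + fp),
    }


# Counts from the dog / not-dog example.
metrics = confusion_metrics(tn=50, fp=10, fn=5, tp=100)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 150/165 (about 0.909), the misclassification rate is 15/165 (about 0.091), the true positive rate is 100/105 (about 0.952) and the precision is 100/110 (about 0.909).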
Finally, we can say that a confusion matrix is very useful for evaluating a classifier, and after the
evaluation we can clearly understand whether the classifier is good or not.
Answer: Regression analysis is a set of statistical methods used to estimate the relationship
between a dependent variable and one or more independent variables. Suppose we want to conduct
a regression analysis of the GDP of our country, and we get the following equation from our
regression analysis -
One of the assumptions of a regression model is that the features are independent of each other,
which means the features should not be strongly correlated with one another. If the features are
correlated, the regression model might not give proper results, because the independent variables
actually depend on each other while the model assumes that they do not. The separate effect of each
variable on the dependent variable then cannot be estimated reliably, so the regression model does
not give accurate results.
So we can say it is important that the independent variables of a regression model do not have a
multicollinearity problem.
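One simple way to check for multicollinearity is to measure how strongly two features are correlated with each other. The sketch below is an illustration only: the feature names and numbers are hypothetical and are not the variables of the GDP regression. It uses the variance inflation factor (VIF), which for exactly two features reduces to 1 / (1 - r^2), where r is the correlation between them. A VIF above about 10 is a commonly quoted rule of thumb for a problem, not a strict limit.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def vif_two_features(xs, ys):
    """VIF when the model has exactly two features: 1 / (1 - r^2).

    Undefined (division by zero) if the features are perfectly correlated.
    """
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Hypothetical features, invented for illustration.
investment = [10, 12, 15, 18, 20, 25]
exports = [21, 23, 31, 35, 41, 49]                  # roughly 2 * investment
population_growth = [1.2, 0.8, 1.5, 0.9, 1.4, 1.1]  # little relation to investment

vif_correlated = vif_two_features(investment, exports)
vif_unrelated = vif_two_features(investment, population_growth)

print(f"investment vs exports:           VIF = {vif_correlated:.1f}")
print(f"investment vs population growth: VIF = {vif_unrelated:.2f}")
```

In this made-up data the first pair gives a very large VIF, so using both features together would be a multicollinearity problem, while the second pair gives a VIF close to 1.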
Answer: A correlation coefficient is a measure of association between two variables. It shows
whether a relationship exists between the variables, and it indicates both the strength of the
association and its direction. Pearson’s product-moment correlation coefficient, written as ‘r’,
describes the linear relationship between two variables.
The value of the correlation coefficient is between -1 and 1.
-1 indicates a perfect negative relationship between the variables, 0 indicates no linear
relationship, and 1 indicates a perfect positive relationship.
The values vary between -1 and 1 because a perfect negative or positive relationship gives a
correlation coefficient of exactly -1 or 1. Nothing can be more perfect than a perfect correlation,
so every other correlation value we get will be between -1 and 1. It cannot be more than 1 or less
than -1.
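The range of the correlation coefficient can be seen in a short calculation. The sketch below computes Pearson's r from its usual definition (the covariance divided by the product of the two standard deviations) for three small data sets that are invented for illustration: one with a perfect positive relationship, one with a perfect negative relationship, and one that is only roughly increasing.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r: sum of co-deviations over the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
perfect_positive = [2 * v + 1 for v in x]    # points lie exactly on a rising line
perfect_negative = [-3 * v + 10 for v in x]  # points lie exactly on a falling line
noisy = [2, 1, 4, 3, 6]                      # rising overall, but not on a line

r_pos = pearson_r(x, perfect_positive)
r_neg = pearson_r(x, perfect_negative)
r_noisy = pearson_r(x, noisy)

print(f"perfect positive: r = {r_pos:.3f}")
print(f"perfect negative: r = {r_neg:.3f}")
print(f"noisy:            r = {r_noisy:.3f}")
```

The two data sets that lie exactly on a straight line give r = 1 and r = -1, and the noisy data set gives a value in between (about 0.822), which matches the range described above.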