Name: Shivam Israni, FYMCA(M), Roll No. 23
ADBMS ASSIGNMENT 5
Q.1) Comparison of all classification algorithms
Logistic regression:
Logistic regression is a machine learning algorithm used to predict a binary
outcome: either something happens, or it does not. This can be expressed as
Yes/No, Pass/Fail, True/False, etc. In this algorithm, the probabilities
describing the possible outcomes of a single trial are modelled using a logistic
function. Logistic regression is designed for classification and is most useful
for understanding the influence of several independent variables on a single
outcome variable. It works only when the predicted variable is binary, assumes
all predictors are independent of each other, and assumes the data is free of
missing values.
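
For example, a minimal sketch in Python, assuming scikit-learn and a synthetic
two-class data set (illustrative only, not part of this assignment's data):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-outcome data: two classes, four predictor variables.
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns the logistic-function probabilities for each class.
print(model.predict_proba(X_test[:3]))
print("accuracy:", model.score(X_test, y_test))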
Naive Bayes:
Naive Bayes calculates the probability that a data point belongs to a certain
category. In text analysis, it can be used to categorize words or phrases as
belonging to a preset "tag" (classification) or not. It is based on Bayes'
theorem with the assumption of independence between every pair of features.
Naive Bayes classifiers work well in many real-world situations such as
document classification and spam filtering. This algorithm requires only a small
amount of training data to estimate the necessary parameters, and Naive Bayes
classifiers are extremely fast compared to more sophisticated methods.
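
As a sketch of the spam-filtering use case, assuming scikit-learn and a tiny
made-up corpus (the texts and labels below are purely illustrative):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "meeting at noon tomorrow",
         "free money claim now", "project report attached"]
labels = ["spam", "ham", "spam", "ham"]

# Bag-of-words counts; Naive Bayes treats each word as conditionally
# independent given the class (the "naive" assumption).
vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vec.transform(["claim your free prize"])))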
KNN:
K-nearest neighbors (KNN) is a pattern recognition algorithm that uses the
training data to find the k closest examples to a new data point. When k-NN is
used for classification, the new point is placed in the category of its nearest
neighbors: if k = 1, it is simply assigned to the class of its single nearest
neighbor; for larger k, the class is decided by a plurality vote among the k
neighbors. It is a type of lazy learning, as it does not attempt to construct a
general internal model but simply stores instances of the training data.
Classification is computed from a simple majority vote of the k nearest
neighbors of each point. The algorithm is simple to implement, robust to noisy
training data, and effective when the training data is large. However, a
suitable value of k must be chosen, and the computational cost is high because
the distance from each query instance to all the training samples must be
computed.
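
A minimal sketch, assuming scikit-learn, the iris data set, and k = 3 (all
illustrative choices, not from the assignment):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Lazy learning: fit() only stores the training instances; distances are
# computed at prediction time and the class is a majority vote over k = 3.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

print("accuracy:", knn.score(X_test, y_test))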
Decision tree:
A decision tree is a supervised learning algorithm that is well suited to
classification problems, as it is able to order classes at a precise level. It
works like a flow chart, separating data points into two categories at a time,
from the "tree trunk" to "branches" to "leaves", where the categories become
progressively more homogeneous. This creates categories within categories,
allowing for organic classification with limited human supervision. Given data
with attributes together with their classes, a decision tree produces a
sequence of rules that can be used to classify the data. Decision trees are
simple to understand and visualise, require little data preparation, and can
handle both numerical and categorical data.
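
For example, a sketch assuming scikit-learn and the iris data set, where the
learned flow chart is printed as a sequence of if/else rules:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text prints the induced rules: each node splits on one attribute,
# moving from the "trunk" down to the "leaves".
print(export_text(tree, feature_names=iris.feature_names))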
Random forest:
The random forest algorithm is an extension of the decision tree: a multitude
of decision trees is first constructed from the training data, and new data is
then classified by this ensemble of trees, the "random forest". It is a
meta-estimator that fits a number of decision trees on various sub-samples of
the data set and uses averaging to improve the predictive accuracy of the model
and to control overfitting. The sub-sample size is always the same as the
original input sample size, but the samples are drawn with replacement. Random
forests reduce overfitting, and a random forest classifier is more accurate
than a single decision tree in most cases.
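
A sketch comparing a single tree with a forest, assuming scikit-learn and a
synthetic data set (illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

# Each of the 100 trees is fit on a bootstrap sample (drawn with replacement,
# the same size as the input) and the ensemble averages their predictions.
forest = RandomForestClassifier(n_estimators=100, random_state=1)
tree = DecisionTreeClassifier(random_state=1)

print("single tree  :", cross_val_score(tree, X, y).mean())
print("random forest:", cross_val_score(forest, X, y).mean())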
Q.2) Construct a decision tree for the given data set & apply Bayesian
classification.
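
The given data set and the worked construction are not reproduced here. As an
illustrative sketch only, the following Python code shows how a decision tree
and a categorical Naive Bayes classifier could be built on a small hypothetical
weather data set, assuming pandas and scikit-learn:

import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical categorical data (NOT the assignment's data set).
df = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rainy", "rainy"],
    "windy":   ["no", "yes", "no", "no", "yes"],
    "play":    ["no", "no", "yes", "yes", "no"],
})

# Encode the categorical attributes as integers for scikit-learn.
enc = OrdinalEncoder()
X = enc.fit_transform(df[["outlook", "windy"]].values)
y = df["play"]

# Decision tree: print the induced classification rules.
dt = DecisionTreeClassifier().fit(X, y)
print(export_text(dt, feature_names=["outlook", "windy"]))

# Bayesian classification on the same encoded data.
nb = CategoricalNB().fit(X, y)
print(nb.predict(enc.transform([["sunny", "no"]])))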