Chapter 4
Machine Learning Algorithms for Classification,
Activation Functions And Perceptron
I. Classification Algorithms in Machine Learning
1. Decision Trees
§Decision Tree is a supervised learning technique that can be
used for both classification and regression problems, but it is
mostly preferred for solving classification problems.
§It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision
rules, and each leaf node represents the outcome.
§The decisions or tests are performed on the basis of the features
of the given dataset.
§It is a graphical representation for getting all the possible
solutions to a problem/decision based on given conditions.
Decision Trees…Cont’D
§It is called a decision tree because, similar to a tree, it starts
with the root node, which expands on further branches and
constructs a tree-like structure.
§In order to build a tree, we use the CART algorithm, which
stands for Classification and Regression Tree algorithm.
§A decision tree simply asks a question, and based on the
answer (Yes/No), it further splits the tree into subtrees.
Decision Trees…Cont’D
§The following diagram explains the general structure of
a decision tree:
Decision Trees…Cont’D
Why Do We Use Decision Trees?
§There are various algorithms in machine learning, so
choosing the best algorithm for the given dataset and
problem is the main point to remember while creating a
machine learning model. The following are two
reasons for using the decision tree:
ØDecision trees usually mimic human thinking while
making a decision, so they are easy to understand.
ØThe logic behind the decision tree can be easily understood
because it shows a tree-like structure.
Decision Trees…Cont’D
Decision Tree Terminologies
Root Node: The root node is where the decision tree starts. It
represents the entire dataset, which further gets divided into two
or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes; the tree
cannot be segregated further after reaching a leaf node.
Splitting: The process of dividing the decision node/root node
into sub-nodes according to the given conditions.
Branch/Sub-Tree: A tree formed by splitting the main tree.
Pruning: The process of removing unwanted
branches from the tree.
Parent/Child Node: The root node of the tree is called the
parent node, and the other nodes are called the child nodes.
Decision Trees…Cont’D
How does the Decision Tree Algorithm Work?
§ In a decision tree, for predicting the class of the given dataset, the
algorithm starts from the root node of the tree.
§ The complete process can be better understood using the following
algorithm:
Ø Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
Ø Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
Ø Step-3: Divide S into subsets that contain the possible values of the
best attribute.
Ø Step-4: Generate the decision tree node which contains the best
attribute.
Ø Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3, until the nodes can no longer be split; such
final nodes are the leaf nodes (see the sketch below).
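To make the procedure concrete, here is a minimal sketch using scikit-learn's
DecisionTreeClassifier, which implements a CART-style algorithm; the tiny
job-offer dataset and feature names below are invented for illustration:

    # Minimal sketch: fitting a CART-style decision tree with scikit-learn.
    # The tiny job-offer dataset below is hypothetical, purely for illustration.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Features: [salary, distance_from_office_km, cab_facility (1 = yes)]
    X = [[40, 5, 1], [15, 20, 0], [35, 8, 1], [10, 25, 0], [45, 30, 1], [20, 3, 0]]
    y = [1, 0, 1, 0, 1, 0]  # 1 = offer accepted, 0 = offer declined

    tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)  # entropy as ASM
    tree.fit(X, y)

    # Print the learned rules: root node, decision nodes, and leaf nodes
    print(export_text(tree, feature_names=["salary", "distance", "cab"]))
    print(tree.predict([[30, 10, 1]]))  # classify a new candidate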
Decision Trees…Cont’D
Example: Suppose there is a candidate who has a job offer and wants to
decide whether he should accept the offer or not. To solve this
problem, the decision tree starts with the root node (the Salary attribute,
chosen by ASM). The root node splits further into the next decision node
(Distance from the office) and one leaf node based on the corresponding
labels. The next decision node further splits into one decision node (Cab
facility) and one leaf node. Finally, the decision node splits into two leaf
nodes (Accepted offer and Declined offer). Consider the following diagram:
Entropy in Machine Learning
§Machine learning contains many algorithms and
concepts that solve complex problems, and one of
them is entropy.
§Almost everyone has heard the word entropy at some point
during their school or college days in physics and chemistry.
§The concept of entropy comes from physics, where it is
defined as a measurement of disorder, randomness,
unpredictability, or impurity in a system.
Entropy …Cont’D
§From the machine learning side, entropy is defined as the
randomness or disorder of the information being processed.
§In other words, entropy is a machine learning metric
that measures the unpredictability or impurity in the system.
Entropy …Cont’D
§When information is processed in the system, every piece of
information has a specific value and can be used to draw
conclusions from it.
§If it is easy to draw a valuable conclusion from a piece of
information, then entropy is low; if
entropy is high, then it is difficult to draw any conclusion
from that piece of information.
§Entropy is a useful tool in machine learning for understanding
concepts such as feature selection, building decision trees, and
fitting classification models.
§As a machine learning engineer or professional data scientist,
you must have in-depth knowledge of entropy in machine learning.
Entropy …Cont’D
§We can understand the term entropy with a simple example:
flipping a coin.
§When we flip a coin, there are two possible outcomes. However, it
is difficult to predict the exact outcome of a coin
flip, because there is no direct relation between flipping a
coin and its outcome.
§There is a 50% probability of both outcomes; in such
scenarios, entropy is high. This is the essence of entropy in
machine learning.
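As a quick numeric check of this intuition, plugging the coin's probabilities
into the entropy formula given on the next slide:
E = -(0.5·log2(0.5) + 0.5·log2(0.5)) = 0.5 + 0.5 = 1,
the maximum possible entropy for two equally likely outcomes.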
Entropy …Cont’D
§Consider a dataset having a total number of N classes; then the
entropy (E) can be determined with the formula:
E(S) = -∑i Pi·log2(Pi), summed over the N classes
Where Pi = probability of randomly selecting an example in class i.
§For a two-class dataset, entropy lies between 0 and 1; however,
depending on the number of classes, it can be greater than 1
(the maximum is log2(N)).
§Let's understand it with an example where we have a dataset
having three colors of fruits: red, green, and yellow.
Entropy …Cont’D
§Suppose we have 2 red, 2 green, and 4 yellow observations
throughout the dataset. Then as per the above equation:
E = -(Pr·log2(Pr) + Pg·log2(Pg) + Py·log2(Py))
Where:
Pr = probability of choosing red fruits, Pg = probability of choosing
green fruits, and Py = probability of choosing yellow fruits.
Pr = 2/8 = 1/4, Pg = 2/8 = 1/4 and Py = 4/8 = 1/2
Now our final equation becomes:
E = -((1/4)·log2(1/4) + (1/4)·log2(1/4) + (1/2)·log2(1/2))
  = 0.5 + 0.5 + 0.5
So, the entropy will be 1.5.
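The same calculation in code, as a small self-contained sketch using the
fruit counts above:

    # Sketch: computing entropy from class counts (fruit example above).
    from math import log2

    def entropy(counts):
        """Entropy E = -sum(p_i * log2(p_i)) over non-empty classes."""
        total = sum(counts)
        return -sum((c / total) * log2(c / total) for c in counts if c > 0)

    print(entropy([2, 2, 4]))  # 1.5 for 2 red, 2 green, 4 yellow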
Attribute Selection Measures (ASM)
§While implementing a decision tree, the main issue
is how to select the best attribute for the root
node and for the sub-nodes.
§To solve such problems, there is a technique
called the Attribute Selection Measure (ASM). Using this
measure, we can easily select the best attribute for
the nodes of the tree. There are two popular ASM
techniques:
ØInformation Gain
ØGini Index
ASM…Cont’D
I. Information Gain:
§Information gain is the measurement of the change in entropy after
the segmentation of a dataset based on an attribute.
§It calculates how much information a feature provides us about the
class.
§According to the value of information gain, we split the node and
build the decision tree.
§A decision tree algorithm always tries to maximize the value of
information gain, and the node/attribute having the highest
information gain is split first. It can be calculated using the
following formula:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
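This formula in code, as a minimal self-contained sketch; the example split
below is hypothetical:

    # Sketch: information gain of a split from class counts.
    from math import log2

    def entropy(counts):
        total = sum(counts)
        return -sum((c / total) * log2(c / total) for c in counts if c > 0)

    def information_gain(parent_counts, child_counts_list):
        """IG = Entropy(S) - weighted average of child entropies."""
        total = sum(parent_counts)
        weighted = sum((sum(c) / total) * entropy(c) for c in child_counts_list)
        return entropy(parent_counts) - weighted

    # Hypothetical split: 5 positives / 5 negatives split into [4,1] and [1,4].
    print(information_gain([5, 5], [[4, 1], [1, 4]]))  # ~0.278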
ASM…Cont’D
II. Gini Index:
§Gini index is a measure of impurity or purity used while creating a
decision tree in the CART algorithm.
§An attribute with a low Gini index should be preferred over one
with a high Gini index.
§It only creates binary splits, and the CART algorithm uses the Gini
index to create those binary splits.
§The Gini index can be calculated using the following formula:
Gini Index = 1 - ∑j Pj², where Pj is the probability of an object being
classified to a particular class.
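The Gini formula in code, as a minimal sketch using the same fruit counts
as the earlier entropy example:

    # Sketch: Gini index from class counts (same fruit counts as before).
    def gini_index(counts):
        """Gini = 1 - sum(p_j^2) over all classes."""
        total = sum(counts)
        return 1 - sum((c / total) ** 2 for c in counts)

    print(gini_index([2, 2, 4]))  # 0.625 for 2 red, 2 green, 4 yellow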
Pruning in Getting an Optimal Decision tree
§ Pruning is a process of deleting the unnecessary nodes from a
tree in order to get the optimal decision tree.
§ A too-large tree increases the risk of overfitting, and a small
tree may not capture all the important features of the dataset.
§ Pruning is thus a technique that decreases the size of the learning
tree without reducing accuracy.
§ There are mainly two tree pruning techniques in use:
Ø Cost Complexity Pruning
Ø Reduced Error Pruning
Advantages And Disadvantages of the Decision Tree
Advantages of the Decision Tree
§ It is simple to understand, as it follows the same process a
human follows while making a decision in real life.
§ It can be very useful for solving decision-related problems.
§ It helps in thinking about all the possible outcomes for a problem.
§ It requires less data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
§ The decision tree may contain many layers, which makes it complex.
§ It may have an overfitting issue, which can be mitigated using
the Random Forest algorithm.
§ With more class labels, the computational complexity of the
decision tree may increase.
2. Bayes Theorem in Machine Learning
§Bayes theorem is also known by other names such as Bayes
rule or Bayes law. Bayes theorem helps to determine the
probability of an event with uncertain knowledge.
§An important concept based on Bayes theorem, the Bayesian method,
is used to calculate conditional probability in machine learning
applications that include classification tasks.
§Further, a simplified version of Bayes theorem (the Naïve Bayes
classifier) is also used to reduce computation time and the average
cost of projects.
§It is used to calculate the probability of one event occurring
given that another has already occurred.
§It is a good method for relating conditional probability and marginal
probability.
Bayes Theorem …Cont’D
§Bayes theorem is one of the most popular machine learning concepts;
it helps to calculate the probability of one event occurring, with
uncertain knowledge, given that another has already occurred.
§Bayes' theorem can be derived using the product rule and the
conditional probability of event X given a known event Y:
P(X|Y) = P(Y|X) * P(X) / P(Y)
§Here, X plays the role of the hypothesis and Y the role of the
observed evidence; the theorem relates the two conditional
probabilities P(X|Y) and P(Y|X).
§The above equation is called Bayes Rule or Bayes Theorem.
Bayes Theorem …Cont’D
§P(X|Y) is called the posterior, which we need to calculate. It is
defined as the updated probability after considering the evidence.
§P(Y|X) is called the likelihood. It is the probability of the evidence
when the hypothesis is true.
§P(X) is called the prior probability, the probability of the hypothesis
before considering the evidence.
§P(Y) is called the marginal probability. It is defined as the
probability of the evidence under any consideration.
§Hence, Bayes Theorem can be written as:
Posterior = likelihood * prior / evidence
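A tiny numeric sketch of this rule; the spam-filter numbers below are made
up purely for illustration:

    # Sketch: Bayes rule with hypothetical spam-filter numbers.
    p_spam = 0.2                 # prior P(X): fraction of mail that is spam
    p_word_given_spam = 0.6      # likelihood P(Y|X): word appears in spam
    p_word = 0.25                # evidence P(Y): word appears in any mail

    # Posterior P(X|Y) = likelihood * prior / evidence
    p_spam_given_word = p_word_given_spam * p_spam / p_word
    print(p_spam_given_word)     # 0.48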
Bayes Theorem …Cont’D
§The Naïve Bayes classifier is also a supervised algorithm, which is
based on Bayes theorem and used to solve classification problems.
§It is one of the simplest and most effective classification algorithms
in machine learning, and it enables us to build various ML models
for quick predictions.
§It is a probabilistic classifier, which means it predicts on the basis of
the probability of an object. Some popular applications of Naïve Bayes
are spam filtering, sentiment analysis, and classifying articles.
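As a quick sketch, here is how a Naïve Bayes classifier could be fit with
scikit-learn's GaussianNB; the toy data is hypothetical:

    # Sketch: Gaussian Naive Bayes with scikit-learn on hypothetical data.
    from sklearn.naive_bayes import GaussianNB

    X = [[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]]  # toy features
    y = [0, 0, 1, 1]                                       # toy class labels

    model = GaussianNB()
    model.fit(X, y)
    print(model.predict([[1.1, 2.0]]))        # predicted class
    print(model.predict_proba([[1.1, 2.0]]))  # posterior probabilities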
Advantages And Disadvantages of Naïve Bayes Classifier
in Machine Learning
Advantages
§ It is one of the simplest and most effective methods for calculating
conditional probability, and it works well for text classification problems.
§ A Naïve Bayes classifier performs better than many other models when
the assumption of independent predictors holds true.
§ It is easier to implement than other models.
§ It requires a small amount of training data to estimate the test data,
which minimizes the training time.
§ It can be used for binary as well as multi-class classification.
Disadvantage
The main disadvantage is its assumption of independent
predictors: it implicitly assumes that all attributes are independent
or unrelated, but in real life it is rarely feasible to get mutually
independent attributes.
II. Activation Functions in Neural Networks
§ What is an Activation Function?
§ It is simply a function used to get the output of a node. It
is also known as a Transfer Function.
§ Why do we use activation functions with neural networks?
§ An activation function is used to determine the output of a neural
network, e.g., yes or no.
§ It maps the resulting values into a range such as 0 to 1 or -1 to 1
(depending upon the function).
§ Activation functions can basically be divided into two types:
Ø Linear Activation Function
Ø Non-linear Activation Functions
II. Activation Functions …Cont’D
Linear or Identity Activation Function
§ The function is a line, i.e., linear. Therefore, the output
of the function is not confined to any range.
Equation: f(x) = x    Range: (-infinity, infinity)
§It does not help the network model the complexity of the usual
data that is fed to neural networks.
II. Activation Functions …Cont’D
Non-linear Activation Function
§The nonlinear activation functions are the most widely used
activation functions. Nonlinearity makes the graph of the
function a curve rather than a straight line.
II. Activation Functions …Cont’D
Non-linear Activation Function
§It makes it easy for the model to generalize or adapt to a variety of
data and to differentiate between the outputs. The main
terminologies needed to understand nonlinear functions are:
§Derivative or Differential: The change in the y-axis w.r.t. the change
in the x-axis. It is also known as the slope.
§Monotonic function: A function which is either entirely non-
increasing or entirely non-decreasing.
§The nonlinear activation functions are mainly divided on the
basis of their range or curves.
II. Activation Functions …Cont’D
1. Sigmoid or Logistic Activation Function
§The Sigmoid Function curve looks like an S-shape.
Sigmoid or Logistic …Cont’D
§The main reason we use the sigmoid function is that its output
exists between (0, 1). Therefore, it is especially used for models
where we have to predict a probability as the output. Since the
probability of anything exists only in the range of 0 to
1, sigmoid is the right choice.
§The function is differentiable. That means we can find the slope
of the sigmoid curve at any point.
§The function is monotonic, but the function's derivative is not.
§The logistic sigmoid function can cause a neural network to get
stuck during training (the vanishing gradient problem).
§The softmax function is a more generalized logistic activation
function which is used for multiclass classification.
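A minimal sketch of the sigmoid and its derivative in code; the standard
formula f(x) = 1/(1 + e^-x) is assumed, since the slide does not state it:

    # Sketch: sigmoid activation f(x) = 1 / (1 + e^-x) and its derivative.
    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def sigmoid_derivative(x):
        s = sigmoid(x)
        return s * (1.0 - s)  # maximal at x = 0, hence non-monotonic

    print(sigmoid(0.0))             # 0.5
    print(sigmoid_derivative(0.0))  # 0.25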
2. Tanh or hyperbolic tangent Activation Function
§Tanh is similar to the logistic sigmoid but often works better. The
range of the tanh function is (-1, 1).
§Tanh is also sigmoidal (S-shaped).
Tanh …Cont’D
§ The advantage is that negative inputs are mapped
strongly negative and zero inputs are mapped near
zero on the tanh graph.
§ The function is differentiable.
§ The function is monotonic, while its derivative is not
monotonic.
§ The tanh function is mainly used for classification between two
classes.
§ Both the tanh and logistic sigmoid activation functions are used
in feed-forward nets.
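A small sketch of tanh and its derivative, using Python's built-in math.tanh:

    # Sketch: tanh activation and its derivative 1 - tanh(x)^2.
    import math

    def tanh_derivative(x):
        t = math.tanh(x)
        return 1.0 - t * t  # peaks at x = 0, so the derivative is not monotonic

    print(math.tanh(-2.0))        # ~ -0.964: strongly negative
    print(math.tanh(0.0))         # 0.0: zero maps to near zero
    print(tanh_derivative(0.0))   # 1.0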
3. ReLU (Rectified Linear Unit) Activation Function
§ReLU is the most widely used activation function in the
world right now, since it is used in almost all convolutional
neural networks and deep learning models.
ReLU …Cont’D
§ ReLU is half rectified (from the bottom): f(z) is
zero when z is less than zero, and f(z) is equal to z when z is greater
than or equal to zero.
§ Range: [0, infinity)
§ The function and its derivative are both monotonic.
§ The issue is that all negative values become zero
immediately, which decreases the ability of the model to fit or train
on the data properly.
§ That means any negative input given to the ReLU activation
function turns into zero immediately, which
in turn affects the resulting mapping by not representing negative
values appropriately (the "dying ReLU" problem).
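A minimal sketch of ReLU:

    # Sketch: ReLU activation f(z) = max(0, z).
    def relu(z):
        return max(0.0, z)

    print(relu(3.5))   # 3.5: positive inputs pass through unchanged
    print(relu(-2.0))  # 0.0: all negative inputs are clipped to zero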
4. Leaky ReLU
§It is an attempt to solve the dying ReLU problem.
Leaky ReLU …Cont’D
§ The leak helps to increase the range of the ReLU function.
Usually, the value of a is 0.01 or so.
§ When a is not 0.01, it is called a Randomized ReLU.
§ Therefore, the range of the Leaky ReLU is (-infinity, infinity).
§ Both Leaky and Randomized ReLU functions are monotonic in
nature. Also, their derivatives are monotonic in nature.
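A minimal sketch of Leaky ReLU; the slope a = 0.01 follows the slide's
usual value:

    # Sketch: Leaky ReLU f(z) = z if z > 0 else a*z, with the usual a = 0.01.
    def leaky_relu(z, a=0.01):
        return z if z > 0 else a * z

    print(leaky_relu(3.5))   # 3.5: positive inputs unchanged
    print(leaky_relu(-2.0))  # -0.02: negative inputs leak through, scaled by a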
III. Perceptron in Machine Learning
§In machine learning and artificial intelligence, Perceptron is one of
the most commonly used terms.
§It is the primary step in learning machine learning and deep
learning technologies, and it consists of a set of weights, input
values or scores, and a threshold.
§The Perceptron is a building block of an Artificial Neural Network.
§Initially, in the mid-20th century, Frank Rosenblatt invented
the Perceptron for performing certain calculations to detect input
data capabilities or business intelligence.
§The Perceptron is a linear machine learning algorithm used for
supervised learning of various binary classifiers.
§This algorithm enables neurons to learn and process elements in
the training set one at a time.
Perceptron …Cont’D
§Further, a Perceptron can also be understood as an artificial neuron
or neural network unit that helps to detect certain input data
computations in business intelligence.
§The Perceptron model is also treated as one of the best and simplest
types of Artificial Neural Networks (ANNs).
§It is a supervised learning algorithm for binary classifiers.
§Hence, we can consider it a single-layer neural network with
four main parameters, i.e., input values, weights and bias, net sum,
and an activation function.
Perceptron …Cont’D
§ In simple words, we can understand it as a classification algorithm
that makes its predictions based on a linear predictor function
combining a set of weights with the feature vector.
§ Basic Components of a Perceptron: Frank Rosenblatt invented
the perceptron model as a binary classifier containing three
main components. These are as follows:
Perceptron …Cont’D
§Input Nodes or Input Layer: This is the primary component of the
Perceptron; it accepts the initial data into the system for further
processing. Each input node holds a real numerical value.
§Weight and Bias: The weight parameter represents the strength of the
connection between units. This is another important parameter of the
Perceptron's components.
Ø Weight is directly proportional to the strength of the associated input
neuron in deciding the output. Further, the bias can be considered as
the intercept in a linear equation.
§Activation Function: This is the final and most important component;
it helps to determine whether the neuron will fire or not. The activation
function can be considered primarily as a step function.
How does Perceptron Work?
§ In machine learning, the Perceptron is considered a single-layer
neural network that consists of four main parameters: input
values (input nodes), weights and bias, net sum, and an activation
function.
§ The perceptron model begins by multiplying all input
values by their weights, then adds these values together to create
the weighted sum.
§ This weighted sum is then applied to the activation function 'f' to
obtain the desired output. This activation function is also known as
the step function and is represented by 'f'.
How does Perceptron Work…Cont’D
§The perceptron model works in two important steps, as follows:
§Step-1: In the first step, multiply all input values by their
corresponding weight values and then add them up to determine the
weighted sum.
§Mathematically, we can calculate the weighted sum as follows:
∑wi*xi = x1*w1 + x2*w2 + … + xn*wn. Then add a special term
called the bias 'b' to this weighted sum to improve the model's
performance: ∑wi*xi + b
§Step-2: In the second step, an activation function is applied to
the above-mentioned weighted sum, which gives us an output either
in binary form or as a continuous value, as follows:
Y = f(∑wi*xi + b)
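These two steps in code, as a minimal sketch; the unit-step activation,
weights, bias, and inputs below are illustrative assumptions:

    # Sketch: the two perceptron steps with a unit-step activation.
    # Weights, bias, and inputs are hypothetical example values.
    def step(z):
        return 1 if z > 0 else 0  # activation function f

    def perceptron(x, w, b):
        weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + b  # Step-1
        return step(weighted_sum)                                # Step-2

    print(perceptron([1.0, 0.5], [0.4, -0.2], b=-0.1))  # 1, since 0.4-0.1-0.1 > 0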
Types of Perceptron Models
Based on the layers, Perceptron models are divided into
two types. These are as follows:
I. Single-layer Perceptron Model
II. Multi-layer Perceptron model
§Single-Layer Perceptron Model: This is one of the simplest types of
artificial neural networks (ANNs).
§A single-layer perceptron model consists of a feed-forward network
and also includes a threshold transfer function inside the model.
§The main objective of the single-layer perceptron model is to
analyze linearly separable objects with binary outcomes.
§Multi-Layer Perceptron Model: This will be discussed in the next
chapter.
Perceptron Function
§The perceptron function f(x) is obtained as output by
multiplying the input vector 'x' with the learned weight vector
'w'. Mathematically, we can express it as follows:
f(x) = 1 if w·x + b > 0; otherwise, f(x) = 0
§'x' represents the vector of input values.
§'w' represents the real-valued weight vector.
§'b' represents the bias.
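Since the next slide notes that the weight coefficients are learned
automatically, here is a minimal sketch of the classic perceptron learning
rule, w <- w + lr*(target - y)*x; the AND-gate dataset, learning rate, and
epoch count are assumptions for illustration:

    # Sketch: the classic perceptron learning rule on a tiny AND-gate dataset.
    # The dataset, learning rate, and epoch count are illustrative assumptions.
    data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    w, b, lr = [0.0, 0.0], 0.0, 0.1

    for _ in range(20):  # a few passes over the data
        for x, target in data:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            error = target - y
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]  # update weights
            b += lr * error                                     # update bias

    print(w, b)  # learned parameters
    print([1 if sum(wi*xi for wi, xi in zip(w, x)) + b > 0 else 0
           for x, _ in data])  # [0, 0, 0, 1] once converged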
Characteristics of Perceptron
The perceptron model has the following characteristics.
1. The Perceptron is a machine learning algorithm for supervised
learning of binary classifiers.
2. In a Perceptron, the weight coefficients are learned automatically.
3. Initially, weights are multiplied by the input features, and a
decision is made as to whether the neuron fires or not.
4. The activation function applies a step rule to check whether the
weighted sum of inputs is greater than zero.
5. A linear decision boundary is drawn, enabling the distinction
between the two linearly separable classes +1 and -1.
6. If the sum of all input values exceeds the threshold, the neuron
emits an output signal; otherwise, no output is shown.
Limitations of Perceptron Model
A perceptron model has the following limitations:
Ø The output of a perceptron can only be a binary number (0
or 1), due to the hard-limit transfer function.
Ø A perceptron can only be used to classify linearly
separable sets of input vectors. If the input vectors are not
linearly separable, it is not easy to classify them properly.