
Naive Bayes Classifiers

Last Updated : 21 May, 2025

Naive Bayes is a classification algorithm that uses probability to predict which category a data point belongs to, assuming that all features are independent of one another. This article gives an overview of the algorithm as well as its more advanced use and implementation in machine learning.

Illustration of the idea behind the Naive Bayes algorithm: we estimate P(x_\alpha|y) independently in each dimension (middle two images) and then obtain an estimate of the full data distribution by assuming conditional independence, P(x|y) = \prod_\alpha P(x_\alpha|y) (rightmost image).

Key Features of Naive Bayes Classifiers

The main idea behind the Naive Bayes classifier is to use Bayes' Theorem to classify data based on the probabilities of the different classes given the features of the data. It is used mostly for high-dimensional text classification.

  • The Naive Bayes classifier is a simple probabilistic classifier with very few parameters, so the resulting models can be built and used for prediction faster than many other classification algorithms.
  • It is a probabilistic classifier because it assumes that each feature in the model is independent of the existence of any other feature; in other words, each feature contributes to the prediction with no relation to the others.
  • The Naive Bayes algorithm is used in spam filtering, sentiment analysis, article classification and many other tasks.

Why Is It Called Naive Bayes?

It is called "Naive" because it assumes that the presence of one feature does not affect the other features. The "Bayes" part of the name refers to its basis in Bayes' Theorem.

Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather conditions, each tuple classifies the conditions as fit ("Yes") or unfit ("No") for playing golf. Here is a tabular representation of our dataset.

     Outlook     Temperature   Humidity   Windy   Play Golf
 0   Rainy       Hot           High       False   No
 1   Rainy       Hot           High       True    No
 2   Overcast    Hot           High       False   Yes
 3   Sunny       Mild          High       False   Yes
 4   Sunny       Cool          Normal     False   Yes
 5   Sunny       Cool          Normal     True    No
 6   Overcast    Cool          Normal     True    Yes
 7   Rainy       Mild          High       False   No
 8   Rainy       Cool          Normal     False   Yes
 9   Sunny       Mild          Normal     False   Yes
10   Rainy       Mild          Normal     True    Yes
11   Overcast    Mild          High       True    Yes
12   Overcast    Hot           Normal     False   Yes
13   Sunny       Mild          High       True    No

The dataset is divided into two parts, namely the feature matrix and the response vector.

  • Feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values of the features. In the above dataset, the features are 'Outlook', 'Temperature', 'Humidity' and 'Windy'.
  • Response vector contains the value of the class variable (prediction or output) for each row of the feature matrix. In the above dataset, the class variable is 'Play Golf'.

Assumption of Naive Bayes

The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome. More specifically:

  • Feature independence: This means that when we are trying to classify something, we assume that each feature (or piece of information) in the data does not affect any other feature.
  • Continuous features are normally distributed: If a feature is continuous, then it is assumed to be normally distributed within each class.
  • Discrete features have multinomial distributions: If a feature is discrete, then it is assumed to have a multinomial distribution within each class.
  • Features are equally important: All features are assumed to contribute equally to the prediction of the class label.
  • No missing data: The data should not contain any missing values.

Introduction to Bayes' Theorem

Bayes’ Theorem provides a principled way to reverse conditional probabilities. It is defined as:

P(y|X) = \frac{P(X|y) \cdot P(y)}{P(X)}

Where:

  • P(y|X): Posterior probability, probability of class y given features X
  • P(X|y): Likelihood, probability of features X given class y
  • P(y): Prior probability of class y
  • P(X): Marginal likelihood or evidence
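
To make the formula concrete, here is a tiny sketch in plain Python with made-up numbers for the prior, likelihood and evidence:

```python
# Bayes' theorem with illustrative (made-up) numbers.
prior = 0.3        # P(y)
likelihood = 0.5   # P(X|y)
evidence = 0.4     # P(X)

posterior = likelihood * prior / evidence   # P(y|X)
print(posterior)                            # 0.375
```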

How Naive Bayes Works

1. Terminology

Consider a classification problem (like predicting if someone plays golf based on weather). Then:

  • y is the class label (e.g. "Yes" or "No" for playing golf)
  • X = (x_1, x_2, ..., x_n) is the feature vector (e.g. Outlook, Temperature, Humidity, Wind)

A sample row from the dataset:

X = \text{(Rainy, Hot, High, False)}, \quad y = \text{No}

This represents:

What is the probability that golf will not be played given that the outlook is Rainy, the temperature is Hot, the humidity is High and there is no wind?

2. The Naive Assumption

The "naive" in Naive Bayes comes from the assumption that all features are independent given the class. That is:

P(x_1, x_2, ..., x_n | y) = P(x_1 | y) \cdot P(x_2 | y) \cdots P(x_n | y)

Thus, Bayes' theorem becomes:

P(y|x_1, ..., x_n) = \frac{P(y) \cdot \prod_{i=1}^{n} P(x_i | y)}{P(x_1, x_2, ..., x_n)}

Since the denominator is constant for a given input, we can write:

P(y|x_1, ..., x_n) \propto P(y) \cdot \prod_{i=1}^{n} P(x_i | y)

3. Constructing the Naive Bayes Classifier

We compute the posterior for each class y and choose the class with the highest probability:

\hat{y} = \arg\max_{y} P(y) \cdot \prod_{i=1}^{n} P(x_i | y)

This becomes our Naive Bayes classifier.
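
As a minimal sketch of this decision rule in plain Python (the priors and per-feature conditional probability tables are assumed to have already been estimated from training data, e.g. by counting):

```python
import math

def predict(x, priors, cond_probs):
    """Return the class y maximizing P(y) * prod_i P(x_i | y).

    priors     : dict mapping class -> P(y)
    cond_probs : dict mapping class -> list (one entry per feature) of
                 dicts mapping feature value -> P(value | y)
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        score = prior
        for i, value in enumerate(x):
            score *= cond_probs[y][i].get(value, 0.0)
        if score > best_score:
            best_class, best_score = y, score
    return best_class
```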

4. Example: Weather Dataset

Let’s take a dataset used for predicting if golf is played based on:

  • Outlook: Sunny, Rainy, Overcast
  • Temperature: Hot, Mild, Cool
  • Humidity: High, Normal
  • Wind: True, False
[Figure: Example tables for Naive Bayes]

Example Input: X = (Sunny, Hot, Normal, False)

Goal: Predict if golf will be played (Yes or No).

5. Pre-computation from Dataset

Class Probabilities:

From the dataset of 14 rows:

  • P(\text{Yes}) = \frac{9}{14}
  • P(\text{No}) = \frac{5}{14}

Conditional Probabilities (Tables 1–4):

Feature        Value     P(Value | Yes)    P(Value | No)
Outlook        Sunny     3/9               2/5
Temperature    Hot       2/9               2/5
Humidity       Normal    6/9               1/5
Wind           False     6/9               2/5

These values are obtained by counting occurrences in the 14-row dataset above.

6. Calculate Posterior Probabilities

For Class = Yes:

P(\text{Yes | today}) \propto \frac{3}{9} \cdot \frac{2}{9} \cdot \frac{6}{9} \cdot \frac{6}{9} \cdot \frac{9}{14}

P(\text{Yes | today}) \approx 0.02116

For Class = No:

P(\text{No | today}) \propto \frac{2}{5} \cdot \frac{2}{5} \cdot \frac{1}{5} \cdot \frac{2}{5} \cdot \frac{5}{14}

P(\text{No | today}) \approx 0.0046

7. Normalize Probabilities

To compare:

P(\text{Yes | today}) = \frac{0.02116}{0.02116 + 0.0046} \approx 0.822

P(\text{No | today}) = \frac{0.0046}{0.02116 + 0.0046} \approx 0.178

8. Final Prediction

Since:

P(\text{Yes | today}) > P(\text{No | today})

The model predicts: Yes (Play Golf)
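
The whole worked example can be reproduced with a few lines of plain Python. The sketch below counts the priors and conditional probabilities directly from the 14-row dataset above and scores the query X = (Sunny, Hot, Normal, False):

```python
# Reproduce the golf example by counting from the 14-row dataset above.
rows = [
    ("Rainy",    "Hot",  "High",   False, "No"),
    ("Rainy",    "Hot",  "High",   True,  "No"),
    ("Overcast", "Hot",  "High",   False, "Yes"),
    ("Sunny",    "Mild", "High",   False, "Yes"),
    ("Sunny",    "Cool", "Normal", False, "Yes"),
    ("Sunny",    "Cool", "Normal", True,  "No"),
    ("Overcast", "Cool", "Normal", True,  "Yes"),
    ("Rainy",    "Mild", "High",   False, "No"),
    ("Rainy",    "Cool", "Normal", False, "Yes"),
    ("Sunny",    "Mild", "Normal", False, "Yes"),
    ("Rainy",    "Mild", "Normal", True,  "Yes"),
    ("Overcast", "Mild", "High",   True,  "Yes"),
    ("Overcast", "Hot",  "Normal", False, "Yes"),
    ("Sunny",    "Mild", "High",   True,  "No"),
]
query = ("Sunny", "Hot", "Normal", False)

scores = {}
for label in ("Yes", "No"):
    class_rows = [r for r in rows if r[-1] == label]
    score = len(class_rows) / len(rows)              # prior P(y)
    for i, value in enumerate(query):                # likelihoods P(x_i | y)
        score *= sum(r[i] == value for r in class_rows) / len(class_rows)
    scores[label] = score

total = sum(scores.values())
for label, score in scores.items():
    print(label, round(score, 5), round(score / total, 3))
# Yes 0.02116 0.822
# No 0.00457 0.178
```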

Naive Bayes for Continuous Features

For continuous features, we assume a Gaussian distribution:

P(x_i | y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left( -\frac{(x_i - \mu_y)^2}{2\sigma^2_y} \right)

Where:

  • \mu_y is the mean of feature x_i for class y
  • \sigma^2_y is the variance of feature x_i for class y

This leads to what is called Gaussian Naive Bayes.
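
A small sketch of this density as a Python function (here mean and var would be the sample mean and variance of the feature computed from the training rows of class y):

```python
import math

def gaussian_likelihood(x, mean, var):
    """P(x_i | y) under the Gaussian assumption, given the class mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# e.g. a temperature of 25 for a class whose temperatures have mean 22 and variance 9
print(gaussian_likelihood(25, mean=22, var=9))   # ~0.081
```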

Types of Naive Bayes Models

There are three main types of Naive Bayes models:

1. Gaussian Naive Bayes

In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution. The Gaussian distribution is also called the normal distribution; when plotted, it gives a bell-shaped curve that is symmetric about the mean of the feature values.
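
As an illustration, scikit-learn's GaussianNB implements this model; the sketch below fits it on the Iris dataset, used here purely as a stand-in numeric dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GaussianNB()                  # one Gaussian per feature per class
model.fit(X_train, y_train)
print(model.score(X_test, y_test))    # accuracy on held-out data
```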

2. Multinomial Naive Bayes

Multinomial Naive Bayes is used when features represent the frequency of terms (such as word counts) in a document. It is commonly applied in text classification, where term frequencies are important.
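
For instance, word counts can be produced with scikit-learn's CountVectorizer and fed to MultinomialNB; the tiny corpus and labels below are made up purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["free prize claim now", "meeting at noon",
        "claim your free offer", "project meeting notes"]
labels = ["spam", "ham", "spam", "ham"]          # toy labels for illustration

vectorizer = CountVectorizer().fit(docs)
X = vectorizer.transform(docs)                   # term-frequency features

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["free offer at noon"])))
```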

3. Bernoulli Naive Bayes

Bernoulli Naive Bayes deals with binary features, where each feature indicates whether a word appears in a document or not. It is suited for scenarios where the presence or absence of terms is more relevant than their frequency. Both the Multinomial and Bernoulli models are widely used in document classification tasks.
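
A similar sketch with scikit-learn's BernoulliNB on binary presence/absence features (a made-up toy example; BernoulliNB can also binarize count features itself via its binarize parameter):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Each column answers "does word i appear in the document?" (1 = yes, 0 = no).
X = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 1]])
y = np.array(["spam", "ham", "spam", "ham"])

clf = BernoulliNB().fit(X, y)
print(clf.predict(np.array([[1, 1, 1, 0]])))
```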

Advantages of Naive Bayes Classifier

  • Easy to implement and computationally efficient.
  • Effective in cases with a large number of features.
  • Performs well even with limited training data.
  • It performs well in the presence of categorical features.
  • For numerical features, the data is assumed to come from normal distributions, which keeps parameter estimation simple.

Disadvantages of Naive Bayes Classifier

  • Assumes that features are independent, which may not always hold in real-world data.
  • Can be influenced by irrelevant attributes.
  • May assign zero probability to unseen feature values, leading to poor generalization; this is commonly mitigated with Laplace (additive) smoothing, sketched below.
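
A minimal sketch of the Laplace-smoothed estimate (alpha = 1 is the classic add-one smoothing):

```python
def smoothed_conditional(count_value_and_class, count_class, n_values, alpha=1.0):
    """Laplace-smoothed estimate of P(value | class)."""
    return (count_value_and_class + alpha) / (count_class + alpha * n_values)

# A feature value never seen with a class still gets a small non-zero probability:
print(smoothed_conditional(0, count_class=9, n_values=3))   # ~0.083 instead of 0.0
```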

Applications of Naive Bayes Classifier

  • Spam Email Filtering: Classifies emails as spam or non-spam based on features.
  • Text Classification: Used in sentiment analysis, document categorization, and topic classification.
  • Medical Diagnosis: Helps in predicting the likelihood of a disease based on symptoms.
  • Credit Scoring: Evaluates creditworthiness of individuals for loan approval.
  • Weather Prediction: Classifies weather conditions based on various factors.
