Naive Bayes Classifier — A Beginner's Guide

Harshit Maheshwari
5 min read · Dec 5, 2023


Naive Bayes Classifier is a popular algorithm used in supervised learning for classification tasks. It is a probabilistic algorithm that relies on Bayes’ theorem to predict the class of an input data point. Despite its simplicity, Naive Bayes Classifier is known to perform well in many real-world applications, including spam filtering, text classification, and sentiment analysis.

Bayes’ Theorem

Before diving into Naive Bayes Classifier, let’s review Bayes’ theorem. Bayes’ theorem is a fundamental concept in probability theory that describes the relationship between the conditional probabilities of two events. It states that the probability of an event A given event B has occurred is proportional to the probability of event B given event A, multiplied by the probability of event A.

Mathematically, Bayes’ theorem can be represented as:

P(A|B) = P(B|A) * P(A) / P(B)

Where:

  • P(A|B) is the conditional probability of A given B has occurred
  • P(B|A) is the conditional probability of B given A has occurred
  • P(A) is the prior probability of A
  • P(B) is the marginal probability of B (the evidence)

In the context of Naive Bayes Classifier, we use Bayes’ theorem to compute the probability of a class given an input data point.
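As a quick sanity check of the formula, here is a worked example with invented numbers (the 20%, 60%, and 16% figures are made up for illustration):

```python
# Suppose 20% of emails are spam (P(A)), the word "offer" appears in
# 60% of spam emails (P(B|A)), and "offer" appears in 16% of all
# emails (P(B)). All three figures are invented for this example.
p_spam = 0.20             # P(A): prior probability of spam
p_word_given_spam = 0.60  # P(B|A): likelihood of "offer" given spam
p_word = 0.16             # P(B): marginal probability of "offer"

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # roughly 0.75
```

Seeing the word "offer" raises the probability of spam from the 20% prior to about 75%.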

Naive Bayes Classifier

Naive Bayes Classifier is a probabilistic algorithm that predicts the class of an input data point based on the probability of the input data point belonging to each class. The algorithm assumes that the features (attributes) of the input data point are independent of each other, hence the term “naive”. This assumption simplifies the computation of the joint probability of the features and allows us to compute the probability of each feature independently.

To illustrate Naive Bayes Classifier, let’s consider a simple example of classifying emails as spam or not spam. We have a dataset of emails, where each email is represented by a set of features (words). The goal is to classify a new email as either spam or not spam.

The first step in Naive Bayes Classifier is to compute the prior probabilities of the classes. In our example, we compute the probability of an email being spam (P(spam)) and the probability of an email being not spam (P(not spam)) based on the frequency of each class in the dataset.

The next step is to compute the conditional probabilities of each feature given the class. We compute the probability of each feature (word) given the class by counting the frequency of the feature in emails of that class and normalizing by the total number of words in emails of that class.

Once we have computed the conditional probabilities of each feature given the class, we can compute the joint probability of the features and the class. We do this by multiplying the conditional probabilities of each feature given the class and the prior probability of the class. We repeat this for each class to obtain the joint probability of the features for each class.

Finally, we select the class with the highest joint probability as the predicted class for the input data point. In our example, we compute the joint probability of the input email belonging to each class and select the class with the highest joint probability as the predicted class (spam or not spam).
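The four steps above can be sketched in a few lines of Python. The toy emails and word lists below are invented, and add-one (Laplace) smoothing is included so that a word unseen in a class does not zero out the product — a standard refinement not covered in the walkthrough. Logarithms are used so the product of many small probabilities does not underflow.

```python
from collections import Counter
from math import log

# Toy training set of (tokenized email, label) pairs — invented data.
train = [
    (["win", "money", "now"], "spam"),
    (["free", "money", "offer"], "spam"),
    (["meeting", "tomorrow", "agenda"], "ham"),
    (["lunch", "tomorrow"], "ham"),
]

# Step 1: prior probabilities from class frequencies.
labels = [label for _, label in train]
priors = {c: labels.count(c) / len(labels) for c in set(labels)}

# Step 2: word counts per class, for P(word | class).
word_counts = {c: Counter() for c in priors}
for words, label in train:
    word_counts[label].update(words)
vocab = {w for words, _ in train for w in words}

def p_word_given_class(word, c):
    # Add-one smoothing: every word gets a pseudo-count of 1.
    return (word_counts[c][word] + 1) / (sum(word_counts[c].values()) + len(vocab))

# Steps 3-4: joint (log-)probability per class, then pick the argmax.
def predict(words):
    scores = {
        c: log(priors[c]) + sum(log(p_word_given_class(w, c)) for w in words)
        for c in priors
    }
    return max(scores, key=scores.get)

print(predict(["free", "money"]))     # spam
print(predict(["meeting", "lunch"]))  # ham
```

Because log is monotonic, taking the class with the highest log-score gives the same answer as comparing the raw products.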

Naive Bayes Classifier Variants

There are three main variants of Naive Bayes Classifier: Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes. Let’s go through each of these variants in more detail.

  1. Gaussian Naive Bayes Classifier
    Gaussian Naive Bayes Classifier is used when the features of the input data point are continuous and have a normal distribution. It assumes that the probability density function (PDF) of each feature for each class follows a normal (Gaussian) distribution. This variant computes the mean and standard deviation of each feature for each class and uses the Gaussian probability density function to compute the conditional probability of each feature given the class.
  2. Multinomial Naive Bayes Classifier
    Multinomial Naive Bayes Classifier is used when the features of the input data point are discrete and represent the frequency of occurrence of a particular event. This variant is commonly used for text classification, where the features are usually word counts or word frequencies. It assumes that the probability distribution of each feature for each class follows a multinomial distribution. This variant computes the probability of each feature (word) given the class using the multinomial distribution and the total count of each feature (word) in the training data.
  3. Bernoulli Naive Bayes Classifier
    Bernoulli Naive Bayes Classifier is also used for discrete data, but it assumes that the features are binary (i.e., 0 or 1). This variant is commonly used for text classification, where the features are binary indicators of the presence or absence of a particular word in the document. It assumes that the probability distribution of each feature for each class follows a Bernoulli distribution. This variant estimates the probability of each feature (word) given the class from the fraction of documents of that class in which the word appears, and, unlike the multinomial variant, it also explicitly penalizes the absence of a word.
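The Gaussian variant's per-feature computation (item 1 above) can be sketched as follows: estimate the mean and standard deviation of a feature for one class, then evaluate the Gaussian probability density function at a new value. The sample feature values here are invented.

```python
from math import exp, pi, sqrt

# Invented feature values (e.g. heights in cm) observed for one class.
values_for_class = [170.0, 175.0, 180.0, 165.0, 172.0]

mean = sum(values_for_class) / len(values_for_class)
var = sum((x - mean) ** 2 for x in values_for_class) / len(values_for_class)
std = sqrt(var)

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution — used as P(feature = x | class)."""
    return exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))

# A value near the class mean gets a higher density than a distant one.
likelihood_near = gaussian_pdf(173.0, mean, std)
likelihood_far = gaussian_pdf(200.0, mean, std)
```

Multiplying one such density per feature, times the class prior, gives the class score exactly as in the discrete case.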

Which variant to use?

The choice of which variant of Naive Bayes Classifier to use depends on the nature of the data and the problem at hand. In general, Gaussian Naive Bayes Classifier is suitable for continuous data, Multinomial Naive Bayes Classifier is suitable for discrete data with multiple occurrences, and Bernoulli Naive Bayes Classifier is suitable for binary discrete data.
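To make the discrete distinction concrete, the same document yields different feature vectors for the two variants — word counts for multinomial, presence/absence indicators for Bernoulli. The vocabulary and document below are invented.

```python
from collections import Counter

# Invented fixed vocabulary and a tokenized document.
vocab = ["free", "money", "meeting", "offer"]
doc = ["free", "money", "money", "free", "free"]

counts = Counter(doc)
multinomial_features = [counts[w] for w in vocab]       # word frequencies
bernoulli_features = [int(w in counts) for w in vocab]  # binary indicators

print(multinomial_features)  # [3, 2, 0, 0]
print(bernoulli_features)    # [1, 1, 0, 0]
```

The multinomial vector reflects how often each word occurs, while the Bernoulli vector only records whether it occurs at all — which is why the former tends to suit longer documents and the latter short, presence-driven ones.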

However, in practice, it is often not clear which variant to use, and it may require some experimentation and evaluation to determine which variant performs best for a given problem. It is also worth noting that Naive Bayes Classifier is a simple algorithm that assumes independence between features, and may not be suitable for complex problems where the features are highly correlated.

Conclusion

In conclusion, Naive Bayes Classifier is a popular algorithm used in supervised learning for classification tasks. It is a probabilistic algorithm that relies on Bayes’ theorem to predict the class of an input data point. There are three main variants of Naive Bayes Classifier: Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes, each with its own assumptions and advantages. The choice of which variant to use depends on the nature of the data and the problem at hand. While Naive Bayes Classifier is a simple algorithm, it can be effective in many real-world applications, including spam filtering, text classification, and sentiment analysis.
