
Getting Started with Machine Learning Algorithms: Naive Bayes

In supervised machine learning, Naive Bayes is one of the most common algorithms we can use for both binary and multi-class classification tasks. Since it has a wide range of real-life applications, it becomes crucial to learn about the concept behind the algorithm. So in this article, we will get an introductory guide to Naive Bayes using the following major points.

Table of contents

  • What is Naive Bayes?
  • How does a Naive Bayes algorithm work?
  • Assumptions of Naive Bayes
  • Code example
  • Pros and Cons of Naive Bayes

What is Naive Bayes?

In the machine learning and data science space, naive Bayes is one of the popular algorithms used for classification tasks. The idea behind this algorithm is based on Bayes' theorem from probability theory, named after Reverend Thomas Bayes. According to this theorem, the probability of a hypothesis (in this case, a particular class) given the evidence (the input features) is proportional to the probability of the evidence given that hypothesis, multiplied by the prior probability of the hypothesis.

In naive Bayes, the word "naive" refers to the assumption that the input features are conditionally independent given the class. The assumption is called naive because it is generally an oversimplification of real-world scenarios, where features can depend on each other. Take a text classification scenario, where the words of a document serve as the input features. Under this assumption, the occurrence of one word does not affect the occurrence of other words in the same document, given the class. This is often not true because, in general, the occurrence of certain words in a document can affect the likelihood of other words appearing as well. Despite this naive assumption, Naive Bayes can still perform well in many real-world applications.
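
As a toy illustration of this text-classification setting, here is a minimal sketch that trains a multinomial Naive Bayes classifier on a handful of made-up sentences; the sentences, labels and the spam/ham framing are purely illustrative assumptions, not data from this article.

# A minimal sketch of Naive Bayes for text classification.
# The sentences and labels below are made up purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = [
    "win a free prize now",      # spam
    "limited offer win money",   # spam
    "meeting at noon tomorrow",  # not spam
    "please review the report",  # not spam
]
labels = ["spam", "spam", "ham", "ham"]

# Each document becomes a vector of word counts; Naive Bayes treats each
# word count as conditionally independent given the class.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["free money offer"])))  # likely ['spam']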

Rather than dwelling further on what naive Bayes is, the best way to understand it is to look at how it operates. So let's see how naive Bayes works.

How does naive Bayes work?

As discussed above, naive Bayes is based on Bayes' theorem from probability theory, so its working depends on calculating the probability of each possible class given the input features. In simple terms, we can express Bayes' theorem using the following mathematical notation:

P(class | features) = P(features | class) x P(class) / P(features)

Where

  • P(class | features) is the posterior probability of the class given the input features.
  • P(features | class) is the likelihood of the input features given the class.
  • P(class) is the prior probability of the class.
  • P(features) is the marginal probability of the evidence (i.e., the input features).

The above notation can be read as follows: the probability of a class label given the evidence of the input features equals the probability of the evidence given that class, multiplied by the prior probability of the class, divided by the marginal probability of the evidence. The likelihood term is calculated assuming that the input features are conditionally independent given the class, as follows:

P(features | class) = P(feature_1 | class) x P(feature_2 | class) x … x P(feature_n | class)

where feature_1, feature_2, …, feature_n are the input features, and P(feature_i | class) is the probability of feature_i given the class.

By just using the likelihood and prior probabilities, we can simplify the formula for Naive Bayes to:

P(class | features) = normalization factor x P(feature_1 | class) x P(feature_2 | class) x … x P(feature_n | class) x P(class)

Here the normalization factor is a constant that makes the probabilities sum up to 1, and the P(feature_i | class) and P(class) can be estimated using the training data.

Put simply, to classify a new instance, Naive Bayes calculates the probability of each possible class label given the input features and, using the above formula, selects the class label with the highest probability as the predicted label for the instance.
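
To make the arithmetic concrete, below is a minimal sketch that applies the above formula by hand to a tiny, made-up dataset with two binary features; the numbers, and the Laplace smoothing used to avoid zero probabilities, are illustrative assumptions rather than part of the original example.

import numpy as np

# Tiny made-up training set: two binary features, binary class label.
X = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 0],
              [1, 1],
              [0, 1]])
y = np.array([1, 1, 0, 0, 1, 0])

x_new = np.array([1, 0])  # instance to classify

posteriors = {}
for c in np.unique(y):
    X_c = X[y == c]
    prior = len(X_c) / len(X)                   # P(class)
    # P(feature_i = 1 | class), with Laplace smoothing to avoid zeros
    likelihoods = (X_c.sum(axis=0) + 1) / (len(X_c) + 2)
    # Use the likelihood of the observed value (1 -> p, 0 -> 1 - p)
    per_feature = np.where(x_new == 1, likelihoods, 1 - likelihoods)
    posteriors[c] = prior * per_feature.prod()  # unnormalised posterior

# Normalise so the probabilities sum to 1, then pick the most probable class
total = sum(posteriors.values())
for c, p in posteriors.items():
    print(f"P(class={c} | features) = {p / total:.3f}")
print("Predicted class:", max(posteriors, key=posteriors.get))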

Going deeper into the subject, we find there are three major variants of the Naive Bayes algorithm, each suited to different use cases. Basic details about these variants are as follows, with a short scikit-learn sketch after the list:

  • Gaussian Naive Bayes: This variant is used when the input features are continuous or numerical. It assumes that the input data follows a Gaussian distribution and estimates the mean and variance of each feature for each class. This variant is widely used in classification problems that involve continuous features, such as predicting whether a house will sell based on its numerical attributes.
  • Multinomial Naive Bayes: This variant is used when the input features are discrete or categorical. It assumes that the input data follows a multinomial distribution and estimates the probabilities of each feature for each class. This variant is widely used in text classification problems, such as classifying emails as spam or not spam based on their content.
  • Bernoulli Naive Bayes: This variant is similar to Multinomial Naive Bayes but is used when the input features are binary or Boolean. It assumes that the input data follows a Bernoulli distribution and estimates the probabilities of each feature being present or absent for each class. This variant is also widely used in text classification problems, such as classifying documents as positive or negative based on the presence or absence of certain words.
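
In scikit-learn, each of these variants is available as a ready-made estimator. The sketch below simply shows which class goes with which kind of feature; the random data shapes are assumptions for illustration only.

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)               # binary class labels

X_continuous = rng.normal(size=(100, 4))       # continuous features -> GaussianNB
X_counts = rng.integers(0, 10, size=(100, 4))  # count features      -> MultinomialNB
X_binary = rng.integers(0, 2, size=(100, 4))   # binary features     -> BernoulliNB

GaussianNB().fit(X_continuous, y)
MultinomialNB().fit(X_counts, y)
BernoulliNB().fit(X_binary, y)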

Now let’s take a look at the assumptions we might need to take care of when choosing Naive Bayes for any data modelling procedure.

Assumptions of Naive Bayes

Here are the important assumptions that we should consider when applying Naive Bayes for data modelling; a short sketch for sanity-checking some of them follows the list:

  • First of all, Naive Bayes assumes that the input features are conditionally independent given the class label, so Independence of Features is one of the most important assumptions to account for. In a more general sense, the presence or absence of one feature does not affect the probability of another feature occurring.
  • Since the naive Bayes algorithm treats all input features as equally important in predicting the class label, Equal Importance of Features becomes the second assumption.
  • When training a naive Bayes model, we need Enough Training Data so that it can give a reliable estimate of the probabilities of the input features given the class label.
  • The data we use to model this algorithm should come with an Absence of Redundancy, meaning that the features should not provide redundant or overlapping information about the class label.
  • The training data we use with a naive Bayes model should have a Balanced Class Distribution. An unbalanced class distribution can make the model inaccurate or biased toward the overrepresented class.
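
As a rough sketch, the checks below use NumPy to eyeball two of these assumptions on the same kind of synthetic data generated later in this article (an assumption for illustration): pairwise feature correlations as a proxy for redundancy and dependence, and the class balance. These are heuristics only, not formal tests.

import numpy as np

# Same kind of synthetic data as in the example later in this article.
X = np.random.rand(1000, 5)
y = np.random.randint(0, 2, size=1000)

# Independence / no-redundancy: strongly correlated feature pairs suggest
# overlapping information, which violates the naive assumption.
corr = np.corrcoef(X, rowvar=False)
off_diagonal = np.abs(corr - np.eye(X.shape[1]))
print("Max absolute pairwise feature correlation:", off_diagonal.max())

# Balanced class distribution: very unequal counts indicate class imbalance.
print("Class counts:", np.bincount(y))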

However, in many cases, it has been seen that this model can still perform well enough if the dependence among the features is not too strong. After knowing about the basics of Naive Bayes, let’s take a look at the code implementation.

Code Example

For this implementation of Naive Bayes, we are going to use the Python programming language, relying on libraries such as scikit-learn and NumPy for generating synthetic data, splitting the data, and building the model. Let's start the implementation by importing the libraries and the modules.

Importing libraries

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import GaussianNB

Here, we have imported the NumPy library, which we will use to generate synthetic data and perform calculations, along with the modules for splitting the data and building a Gaussian naive Bayes model.

Generating Data

Let’s make a dataset

# Generate random data

X = np.random.rand(1000, 5)

y = np.random.randint(0, 2, size=1000)

Here, we have generated random data with 1000 samples and five features, where the target variable (y) is a binary class label.

Let’s split the data

# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

Here, we have split the synthetic data into 70% training and 30% testing data.

Model training

Now, we can train the Gaussian naive Bayes model using the training data. Let’s make a model object.

# Create Naive Bayes model

model = GaussianNB()

Let's fit the Naive Bayes model (the model object defined above) to the training data.

# Train model on training data

model.fit(X_train, y_train)

Here, we have trained the model object on the training data. Now we can make predictions and evaluate our model.

Model Evaluation

Let's make predictions with our trained model.

# Make predictions on testing data

y_pred = model.predict(X_test)

Let's evaluate the model based on the predictions it has made.

# Evaluate model performance

accuracy = np.mean(y_pred == y_test)

print("Accuracy:", accuracy)

# The same accuracy expressed as a percentage

print("Accuracy:", accuracy * 100, "%")

Here, we can see that our model achieved an accuracy of roughly 50%. This is expected, since the data was generated at random and carries no real signal. The performance is not optimal, but our aim of learning the implementation is complete.
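
As a side note, scikit-learn also ships ready-made metrics, so the manual mean above could equally be computed with something like the following sketch, shown only as an alternative and reusing the y_test and y_pred variables defined above.

from sklearn.metrics import accuracy_score, classification_report

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))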

In this series of articles, we cover each machine learning algorithm twice: the first article covers the basics, and the second, more advanced article shows how to use the algorithm to get optimal performance out of it.

So, please subscribe to us to learn more advanced ways of modelling data with different machine learning algorithms. Let's take the discussion ahead and look at the pros and cons of the Naive Bayes algorithm.

Pros and cons of Naive Bayes

There are several advantages and disadvantages of any machine learning algorithm. Similarly, naive Bayes has its own pros and cons. Some of them are listed below:

Pros

  • Naive Bayes can handle both continuous and categorical data, making it versatile for different types of datasets.
  • The algorithm is less prone to overfitting, which means Naive Bayes can generalize well to new data.
  • Naive Bayes performs well on high-dimensional datasets, even when the number of features is larger than the number of observations.
  • We can use Naive Bayes for both binary and multi-class classification problems.
  • Naive Bayes is relatively easy to implement and can be used as a baseline model for other, more complex algorithms.

Cons

  • The assumption of all features being independent of each other becomes a con because this is rarely true in real-world datasets.
  • Naive Bayes can be affected by the presence of outliers in the data.
  • Naive Bayes relies heavily on the quality of the input data and can perform poorly in the case of data being noisy or containing missing values.
  • The algorithm can have difficulties handling datasets with rare events, which can lead to underestimation of probabilities.
  • Naive Bayes is a probabilistic algorithm, which means that it can sometimes produce unreliable probabilities for rare events or extreme cases.

Final words

In the above article, we have discussed the naive Bayes algorithm, which is one of the popular algorithms in the machine learning space. Looking at its basics, we can say that it is mostly based on probability theory. This algorithm can be a good choice when fewer calculations are required or when the features in the dataset are independent of each other or have very low correlation. We have also discussed the assumptions we need to take into account, as well as the pros and cons of this algorithm.

To know more about different machine learning algorithms, one can subscribe to us. More details about us can be found below.

About DSW

DSW, specializing in Artificial Intelligence and Data Science, provides platforms and solutions for leveraging data through AI and advanced analytics. With offices located in Mumbai, India, and Dublin, Ireland, the company serves a broad range of customers across the globe.

Our mission is to democratize AI and Data Science, empowering customers with informed decision-making. Through fostering the AI ecosystem with data-driven, open-source technology solutions, we aim to benefit businesses, customers, and stakeholders and make AI available for everyone.

Our flagship platform ‘UnifyAI’ aims to streamline the data engineering process, provide a unified pipeline, and integrate AI capabilities to support businesses in transitioning from experimentation to full-scale production, ultimately enhancing operational efficiency and driving growth.