Open In App

How to Perform Ordinal Encoding Using Sklearn

Last Updated : 19 Jun, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

Have you ever played a game where you rank things, like your favorite pizza toppings or the scariest monsters? In the field of computers, it is essentially what ordinal encoding accomplishes! It converts ordered data, such as "small," "medium", and "large," into numerical values that a computer can comprehend.

Understanding Ordinal Encoding

What is Ordinal Encoding?

Imagine you're helping a friend sort their movie collection. Three categories are available to you : "terrible," "okay," and "awesome." Each category is given a number by ordinal encoding, such as "terrible" = 1, "okay" = 2, and "awesome" = 3. In this manner "awesome" movies come after "okay" ones so the computer can comprehend. One way to transform categorical data into numerical data is to use ordinal encoding. 'Red, Green and Blue ' are examples of categories or groups that are represented by categorical data. Each category in ordinal encoding is given an integer, such as 1 for "Red," 2 for "Green," and 3 for "Blue." The important thing to remember is that these categories have a purposeful hierarchy. Here are some key terms to remember:

  • Category: A group of similar things, like the movie ratings in our example.
  • Ordinal Data: Data that has a natural order like movie ratings shirt sizes (small, medium, large), or video game difficulty levels (easy, normal, hard).
  • Encoding: Converting data into a format a computer can understand.

Why Use Ordinal Encoding?

Numerous algorithms in machine learning function best with numerical data. To assist the algorithms in processing the data and producing predictions, ordinal encoding converts categories into a number format. In cases when the categories are naturally arranged like grades (A, B, C) or levels (Low, Medium, High) it is extremely helpful.

Preparing the Data

Before we can perform ordinal encoding, we need to have some data to work with. Let's consider a simple dataset:

Student

Grade

Alice

A

Bob

B

Charlie

C

David

A

Eva

B

In this dataset, 'Grade' is a categorical variable that we want to encode.

Implementing Ordinal Encoding in Sklearn

Now, let's move on to the actual implementation using Sklearn.

Step 1: Install Sklearn

First, ensure you have Sklearn installed. You can install it using pip:

Python
pip install scikit-learn

Step 2: Import Necessary Libraries

Next, we'll import the required libraries.

Python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

Step 3: Create the DataFrame

We'll create a DataFrame using the sample data.

Python
data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Grade': ['A', 'B', 'C', 'A', 'B']
}
df = pd.DataFrame(data)
print(df)

Output:

   Student Grade
0 Alice A
1 Bob B
2 Charlie C
3 David A
4 Eva B

Step 4: Initialize and Apply OrdinalEncoder

We'll initialize the OrdinalEncoder and apply it to the 'Grade' column.

Python
encoder = OrdinalEncoder(categories=[['A', 'B', 'C']])
df['Grade_encoded'] = encoder.fit_transform(df[['Grade']])
print(df)

Output:

   Student Grade  Grade_encoded
0 Alice A 0.0
1 Bob B 1.0
2 Charlie C 2.0
3 David A 0.0
4 Eva B 1.0

Verifying the Results

To verify our results, we can check that the 'Grade' column has been correctly encoded. Each category 'A', 'B', and 'C' has been replaced with 0.0, 1.0, and 2.0, respectively. This confirms that our ordinal encoding has been applied successfully.

Example 2: Ordinal Encoding with a Public Dataset

Step 1: Import Necessary Libraries

We'll start by importing the necessary libraries.

Python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Load the Public Dataset

We'll load the "Titanic" dataset directly from a URL.

Python
# Load the Titanic dataset
url = "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
df = pd.read_csv(url)
print("Original Data:")
print(df.head())

Output:

Original Data:
Survived Pclass Name \
0 0 3 Mr. Owen Harris Braund
1 1 1 Mrs. John Bradley (Florence Briggs Thayer) Cum...
2 1 3 Miss. Laina Heikkinen
3 1 1 Mrs. Jacques Heath (Lily May Peel) Futrelle
4 0 3 Mr. William Henry Allen

Sex Age Siblings/Spouses Aboard Parents/Children Aboard Fare
0 male 22.0 1 0 7.2500
1 female 38.0 1 0 71.2833
2 female 26.0 0 0 7.9250
3 female 35.0 1 0 53.1000
4 male 35.0 0 0 8.0500

The Titanic dataset contains various features about passengers, such as 'Pclass' (passenger class), 'Sex', 'Age', etc.

Step 3: Visualize the Column

We'll focus on the 'Sex' column and visualize its distribution.

Python
# Visualizing the 'Sex' column
sns.countplot(x='Sex', data=df)
plt.title('Original Sex Distribution')
plt.show()

Output:

download


The plot shows the frequency of male and female passengers in the dataset.

Step 4: Initialize and Apply OrdinalEncoder

We'll initialize the OrdinalEncoder and apply it to the 'Sex' column.

Python
# Initialize the OrdinalEncoder
encoder = OrdinalEncoder(categories=[['female', 'male']])
df['Sex_encoded'] = encoder.fit_transform(df[['Sex']])
print("\nEncoded Data:")
print(df[['Sex', 'Sex_encoded']].head())

Output:

Encoded Data:
Sex Sex_encoded
0 male 1.0
1 female 0.0
2 female 0.0
3 female 0.0
4 male 1.0

We specify the order of the categories as 'female' and 'male'. The OrdinalEncoder assigns 0 to 'female' and 1 to 'male'.

Step 5: Visualize the Encoded Data

Let's visualize the encoded data to see the distribution of numerical values.

Python
# Visualizing the encoded data
sns.countplot(x='Sex_encoded', data=df)
plt.title('Encoded Sex Distribution')
plt.show()

Output:

download-(2)

Conclusion

Ordinal encoding is a handy way to prepare your data for machine learning tasks. The method is simple and seamless thanks to Sklearn's OrdinalEncoder. You can now use order to your advantage in your data analysis endeavors! When the categories have a natural order, ordinal encoding is a simple yet effective method for turning categorical data into numerical representation. This procedure is significantly more accessible, when Sklearn is used. You may use ordinal encoding into your machine learning projects with ease by following the instructions provided in this article.


Next Article

Similar Reads