DATA SCIENCE AND VISUALIZATION 12202080501060

202046707

Practical 6:
Perform encoding of categorical variables in the given dataset.

Introduction:

In data preprocessing, categorical variables need to be transformed into numerical representations so that machine learning algorithms can process them effectively. This practical demonstrates how to apply One-Hot Encoding and Label Encoding, along with preprocessing techniques such as scaling, normalization, and handling missing values. The dataset used contains student details, including gender, city, mobile number, semester marks, and more.
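Before walking through the full script, here is a minimal, self-contained sketch of the two encoding schemes on a small made-up table (the column values below are hypothetical and are not taken from the student dataset; a reasonably recent scikit-learn is assumed):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Hypothetical toy data, not the real student dataset
toy = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female'],
    'City':   ['Anand', 'Vadodara', 'Anand'],
    'Result': ['Pass', 'Fail', 'Pass'],
})

# One-Hot Encoding: every category becomes its own 0/1 column
ohe = OneHotEncoder(handle_unknown='ignore')
encoded = ohe.fit_transform(toy[['Gender', 'City']]).toarray()
print(ohe.get_feature_names_out())  # Gender_Female, Gender_Male, City_Anand, City_Vadodara
print(encoded)

# Label Encoding: each category is mapped to a single integer (used here for the target)
le = LabelEncoder()
print(le.fit_transform(toy['Result']))  # [1 0 1]

In the actual practical, the same two ideas are applied column-wise through a ColumnTransformer, as shown in the code below.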

Code:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
import numpy as np
import pandas as pd

# Load the dataset and drop identifier columns that carry no predictive information
df = pd.read_csv('/content/drive/MyDrive/DSV/Dataset_(12202080501060)/student_dataset_with_missing_values.csv')
df = df.drop(['Name', 'Enrollment'], axis=1)

# Features: all columns except the last; target: the last column
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# Positions of the columns that will be transformed
gender_col_index = df.columns.get_loc('Gender')
city_col_index = df.columns.get_loc('City')
mobile_col_index = df.columns.get_loc('Mobile')


from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer

# Mean imputation for the numeric 'Mobile' column, one-hot encoding for the categorical columns
numeric_transformer = SimpleImputer(strategy='mean')
categorical_transformer = OneHotEncoder(handle_unknown='ignore')

ct = make_column_transformer(
    (categorical_transformer, [gender_col_index, city_col_index]),
    (numeric_transformer, [mobile_col_index]),
    remainder='passthrough'
)

X = ct.fit_transform(X)

# The transformer may return a sparse matrix; convert it to a dense array if necessary
X = X.toarray() if hasattr(X, 'toarray') else X

print("Data after encoding 'Gender' and 'City' and handling 'Mobile':")

print(X[:5])

from sklearn.preprocessing import LabelEncoder


# Label-encode the target variable
le = LabelEncoder()

y = le.fit_transform(y)

print(y)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 7)

X_train

X_test

y_train

y_test

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

# Columns from index 8 onward are treated as the numerical features
# (the earlier columns come from the one-hot encoding)
X_train_numeric = X_train[:, 8:]
X_test_numeric = X_test[:, 8:]


# Fit the scaler on the training data, then apply the same transformation to the test data
X_train_scaled = sc.fit_transform(X_train_numeric)
X_test_scaled = sc.transform(X_test_numeric)

print("Scaled X_train (numerical columns):")

print(X_train_scaled)

from sklearn.preprocessing import Normalizer
from sklearn.impute import SimpleImputer
import numpy as np

nm = Normalizer()

# Numerical features start at column index 8 (after the one-hot encoded columns)
numerical_cols_indices = slice(8, None)

# Replace missing numerical values with the column mean
imputer_numerical = SimpleImputer(missing_values=np.nan, strategy='mean')


# Impute on the training set and apply the same statistics to the test set
X_train[:, numerical_cols_indices] = imputer_numerical.fit_transform(X_train[:, numerical_cols_indices])
X_test[:, numerical_cols_indices] = imputer_numerical.transform(X_test[:, numerical_cols_indices])

# Rescale each sample's numerical feature vector to unit norm
X_train[:, numerical_cols_indices] = nm.fit_transform(X_train[:, numerical_cols_indices])
X_test[:, numerical_cols_indices] = nm.transform(X_test[:, numerical_cols_indices])

print("Numerical columns normalized after imputation.")

print(X_train)

Important Points:

1. One-Hot Encoding is used for categorical variables like Gender and City.

2. Label Encoding is applied on the target variable.

3. Missing values in numerical columns are handled using mean imputation.

4. StandardScaler standardizes numerical values to zero mean and unit variance, bringing them onto a common scale.

5. Normalizer rescales each sample's feature vector to unit norm; the contrast between the two is sketched just after this list.
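A minimal sketch on made-up marks (purely illustrative numbers, not the real dataset) showing mean imputation, column-wise standardization, and row-wise normalization side by side:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, Normalizer

# Hypothetical marks for three students in two subjects, with one missing value
marks = np.array([[60.0, 70.0],
                  [80.0, np.nan],
                  [40.0, 50.0]])

# Mean imputation fills the missing mark with its column mean (here 60.0)
marks = SimpleImputer(strategy='mean').fit_transform(marks)

# StandardScaler works column-wise: each column ends up with zero mean and unit variance
print(StandardScaler().fit_transform(marks))

# Normalizer works row-wise: each student's vector is rescaled to unit (L2) norm
print(Normalizer().fit_transform(marks))

Scaling is the appropriate choice when features should be comparable across samples, whereas normalization matters when only the direction of each sample's feature vector is relevant.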

Conclusion:
Encoding categorical variables is a crucial step in data preprocessing. It allows machine
learning models to interpret categorical data effectively. In this practical, we successfully
encoded categorical features, handled missing values, and applied scaling and
normalization to numerical data, preparing the dataset for model building.

