Artificial Intelligence and Machine Learning
PRACTICAL 2
Introduction to Machine Learning
Working with Scikit-Learn
Prepared by Nima Dema
1 | Page 15 February 2024
Table of Contents
0. Learning Objectives
1. Introduction to scikit-learn library
2. Scikit-learn datasets
2.1. Import modules
2.2. Load data
2.3. Creating dataframe
2.4. Exploring the data
3. Scikit-learn Pre-processing techniques
3.1. Import Scaler
3.2. Scale the features
3.3. Convert scaled data to DataFrame
TODO: WORKING WITH SKLEARN DIABETES DATASET
0. Learning Objectives
This week’s practical session focuses on using the scikit-learn (sklearn) library.
The aim is to enable students to use scikit-learn effectively for machine learning
tasks.
By the end of the lab, you should be able to:
➔ Use scikit-learn datasets module to load and read data
➔ Use some common scikit-learn methods to pre-process data
1. Introduction to scikit-learn library
The scikit-learn library, often abbreviated as sklearn, is an open-source machine
learning library for the Python programming language. It provides a wide range of
tools for machine learning tasks such as classification, regression, clustering,
dimensionality reduction, and model selection.
In addition to its machine learning capabilities, scikit-learn provides a rich suite of
data pre-processing methods, including scalers, encoders and imputers, which
are essential for preparing datasets before training machine learning models.
Moreover, scikit-learn also offers a collection of built-in datasets that serve as
convenient resources for experimentation and learning. These datasets cover a
diverse range of domains and are readily available for users to explore and apply
various machine learning techniques without the need for external data sources.
2. Scikit-learn datasets
The sklearn.datasets module contains several datasets that you can use to
build machine learning models. In this section we will explore a few of
them.
INSTRUCTIONS:
➔ Load the iris dataset, which is commonly used by beginners to practise
machine learning techniques. For this task use load_iris().
➔ Explore the data returned by the load_iris() method.
➔ Create a dataframe from the loaded dataset. Make sure to include both the
features and the target.
2.1. Import modules
Import the necessary libraries (already done for you). For this task you will need
pandas as well.
from sklearn.datasets import load_iris
import pandas as pd
2.2. Load data
Load the data using load_iris() and answer the following questions:
What is the type of the data returned by load_iris()?
Explain the different attributes of the data returned by load_iris().
#Write your answer here [Expecting 3 lines of code]
…….
…….
…….
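For reference, one possible way to inspect the returned object is sketched below. The variable name iris is an assumption made for illustration; the calls themselves are standard scikit-learn. Use it to check your own answer rather than as the only valid solution.

```python
from sklearn.datasets import load_iris

# Load the dataset; the result is a dictionary-like Bunch object.
iris = load_iris()

print(type(iris))            # the container type returned by load_iris()
print(list(iris.keys()))     # names of the attributes stored in the Bunch
print(iris.data.shape)       # feature matrix: 150 samples, 4 features
print(iris.feature_names)    # names of the four feature columns
print(iris.target_names)     # class labels encoded by iris.target
```

The DESCR attribute (print(iris.DESCR)) holds a longer text description of the dataset if you want more background.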
2.3. Creating dataframe
Create a dataframe from the data loaded above. Make sure you include all the features
as well as the target in your dataframe. Use the dataframe head() method to check your
result.
Your expected result should look like:
Write your solution here [Expecting 3 lines of code]
…….
…….
…….
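A minimal sketch of one way to build the dataframe is shown below. The names iris, df and the column name "target" are assumptions chosen for illustration; your own names may differ.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Build the dataframe from the feature matrix, using the feature names as columns.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Append the class labels as an extra column (column name is our own choice).
df["target"] = iris.target

print(df.head())
```

An alternative is load_iris(as_frame=True), whose frame attribute already combines the features and the target in a single dataframe.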
2.4. Exploring the data
Use the necessary methods to explore your data and determine which type of data
(categorical or numerical) each feature, including the target, falls under. Justify
your answer.
#write your answer here
…….
…….
…….
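The sketch below shows a few pandas methods that are commonly used for this kind of exploration. It rebuilds the dataframe so that it runs on its own; the names df and "target" are assumptions carried over for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

df.info()                            # column names, non-null counts and dtypes
print(df.describe())                 # summary statistics for the numeric columns
print(df["target"].nunique())        # number of distinct target values
print(df["target"].value_counts())   # number of rows in each class
```

Note that a column stored as integers is not necessarily numerical in meaning: a small number of distinct values that stand for class labels suggests a categorical variable.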
3. Scikit-learn Pre-processing techniques
In this section, we will explore the preprocessing tools provided by scikit-learn.
Among the various techniques essential for machine learning, one commonly used
method is feature scaling, which involves adjusting features to the same range.
Sklearn offers numerous methods for scaling features, but for our purposes we
will use StandardScaler.
INSTRUCTIONS:
➔ First find the module that contains StandardScaler and import it.
➔ Create a StandardScaler object.
➔ Scale the features by calling the fit_transform() method of StandardScaler.
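To see what the scaling step does, the sketch below applies StandardScaler to a tiny made-up array (the values in X are an assumption chosen only for illustration) and compares the result with the manual calculation z = (x - mean) / std, computed column by column.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # learn mean/std per column, then scale

# Manual standardization: subtract the column mean, divide by the column
# standard deviation (NumPy's default population standard deviation).
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled)
print(np.allclose(X_scaled, manual))  # True
```

After scaling, both columns hold the same values even though the original features differed by a factor of 100.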
3.1. Import Scaler
You can import StandardScaler from the sklearn.preprocessing module.
#write your solution here
…….
3.2. Scale the features
First instantiate a scaler object, then call its fit_transform() method to scale
the features.
What type of data is returned by the fit_transform() method?
#write your solution here
…….
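One possible way to carry out this step on the iris features is sketched below. The names iris, scaler and X_scaled are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()

scaler = StandardScaler()                    # create the scaler object
X_scaled = scaler.fit_transform(iris.data)   # fit on the features and scale them

print(type(X_scaled))                        # type of the returned object
print(X_scaled.shape)                        # same shape as the input features
print(np.round(X_scaled.mean(axis=0), 6))    # each column mean is close to 0
print(np.round(X_scaled.std(axis=0), 6))     # each column std is close to 1
```

Only the features are scaled here; the target holds class labels and is left unchanged.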
3.3. Convert scaled data to DataFrame
To get a better view of the data, let us convert our scaled data back to a
dataframe. Compare your newly created dataframe with the original dataframe. How are
they different?
#write your solution here
…….
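A minimal sketch of the conversion and comparison follows. It repeats the earlier steps so that it runs on its own; the names df and scaled_df are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Scale the features and wrap the resulting array in a new dataframe.
X_scaled = StandardScaler().fit_transform(iris.data)
scaled_df = pd.DataFrame(X_scaled, columns=iris.feature_names)

# Compare the mean and standard deviation before and after scaling.
print(df.describe().loc[["mean", "std"]])
print(scaled_df.describe().loc[["mean", "std"]])
```

The standard deviations reported by pandas for the scaled data are slightly above 1 because pandas divides by n - 1 (sample standard deviation), whereas StandardScaler divides by n.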
TODO: WORKING WITH SKLEARN DIABETES DATASET
INSTRUCTIONS:
➔ Load the sklearn diabetes dataset. Perform the necessary steps to load and view
the data and compare it with the iris dataset. How is the diabetes dataset different
from the iris dataset? Justify your answer.
➔ Explore other scaling techniques available in sklearn and apply one to your
dataset. How does your chosen scaling technique differ from
StandardScaler?
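As a starting point only, the sketch below loads the diabetes dataset and applies MinMaxScaler. The choice of MinMaxScaler is an assumption made for illustration, since the task leaves the technique open, and the variable names are our own.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import MinMaxScaler

diabetes = load_diabetes()

# Build a dataframe with the features and the target, as was done for iris.
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df["target"] = diabetes.target

print(df.shape)                  # number of rows and columns
print(df.head())
print(df["target"].nunique())    # many distinct values, unlike the iris target

# MinMaxScaler rescales each feature to the range [0, 1].
scaled = MinMaxScaler().fit_transform(diabetes.data)
print(scaled.min(axis=0))        # column minimums after scaling
print(scaled.max(axis=0))        # column maximums after scaling
```

The justification and the comparison with StandardScaler are left for you to write up.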
THANK YOU