“INTRODUCTION TO PATTERN RECOGNITION AND DATA PREPROCESSING”
By
Sakib Chowdhury
ID: 2104010202241
Batch: 40th; Section: C
CSE 460: Pattern Recognition Laboratory
Instructor:
Md. Neamul Haque
Lecturer
Department of Computer Science and Engineering
Premier University

Signature: ____________
Department of Computer Science and Engineering
Premier University
Chattogram-4000, Bangladesh
26th July 2025
Table of Contents
1 Introduction
2 Objective
3 Theoretical Background
  3.1 What is Pattern Recognition?
  3.2 Supervised and Unsupervised Learning
  3.3 Importance of Data Preprocessing
4 Tools and Libraries
5 Procedure with Code and Explanations
  5.1 Step 1: Import Required Libraries
  5.2 Step 2: Load and Explore the Dataset
  5.3 Step 3: Check for Missing Data
  5.4 Step 4: Feature Scaling (Standardization)
  5.5 Step 5: Apply KNN (Supervised Learning)
  5.6 Step 6: Apply K-Means Clustering (Unsupervised Learning)
  5.7 Step 7: Visualization of Clustering Results
6 Observations and Results Summary
7 Conclusion
INTRODUCTION
Pattern recognition is a foundational discipline in artificial intelligence and machine learning
that enables systems to identify meaningful structures and regularities in data. It serves as the
backbone for a wide range of real-world applications, including image and speech recognition,
medical diagnosis, fraud detection, and autonomous systems. At its core, pattern recognition
involves extracting useful features from raw data and using them to make informed
decisions—either by classifying known patterns or discovering hidden structures.
With the rapid growth of data-driven technologies, the ability to process, analyze, and
interpret data has become a critical skill. This laboratory exercise introduces the fundamental
concepts of pattern recognition through practical implementation using Python. By working
with a well-structured dataset and applying essential preprocessing and modeling techniques,
students gain hands-on experience in building intelligent systems that learn from data.
The lab focuses on two major paradigms: supervised learning, where models are trained
on labeled data to predict outcomes, and unsupervised learning, where algorithms discover
inherent groupings without prior labels. Using the widely adopted Iris dataset, the experiment
demonstrates key workflows such as data inspection, feature scaling, model training, and
visualization. These steps form the basis of most machine learning pipelines and provide a
solid foundation for more advanced studies in data science and AI.
OBJECTIVE
This laboratory session aims to provide a foundational understanding of pattern recognition
and its practical implementation using Python. The main objectives are:
- Understand the core concepts of pattern recognition.
- Differentiate between supervised and unsupervised learning paradigms.
- Perform essential data preprocessing steps such as handling missing values and feature
scaling.
- Implement and evaluate two fundamental machine learning algorithms:
- K-Nearest Neighbors (KNN) for classification (supervised learning).
- K-Means Clustering for grouping unlabeled data (unsupervised learning).
- Visualize results using interactive plotting tools.
The Iris dataset, a widely used benchmark in machine learning, is employed due to its
simplicity, clarity, and historical significance in classification tasks.
THEORETICAL BACKGROUND
What is Pattern Recognition?
Pattern recognition is the automated process of identifying patterns, structures, or regularities
in data. It plays a crucial role in various domains, including:
- Machine Learning
- Computer Vision
- Speech Recognition
- Medical Diagnosis
- Data Mining
It typically involves feature extraction, model training, and decision-making based on
learned patterns from data.
Supervised and Unsupervised Learning
Table 1: Comparison of Supervised and Unsupervised Learning

Aspect            | Supervised Learning          | Unsupervised Learning
------------------|------------------------------|--------------------------------------
Data Type         | Labeled (input-output pairs) | Unlabeled (input only)
Goal              | Predict output for new data  | Discover hidden structure or grouping
Training Signal   | Known target variable        | No explicit labels
Example Tasks     | Classification, Regression   | Clustering, Dimensionality Reduction
Example Algorithm | K-Nearest Neighbors (KNN)    | K-Means Clustering
In this lab:
- KNN is used to classify iris species based on labeled training data.
- K-Means attempts to group similar flowers without using species labels.
Importance of Data Preprocessing
Real-world data is often incomplete, inconsistent, or unbalanced. Preprocessing ensures data quality and
model reliability. Key steps include:
- Handling Missing Values: Ensuring completeness of the dataset.
- Feature Scaling: Normalizing or standardizing features to prevent bias in distance-based algorithms.
- Categorical Encoding: Converting non-numeric labels into numerical form (not required here, as the Iris dataset is already encoded).
Without proper preprocessing, algorithms like KNN and K-Means may be dominated by
features with larger numerical ranges.
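To make this concrete, here is a small, hypothetical illustration (the numbers are invented for demonstration and are not from the lab): with two features on very different scales, the Euclidean distance is dominated by the large-range feature until it is rescaled.

```python
import numpy as np

# Two samples with features on very different scales:
# feature 1 lies in [0, 1], feature 2 in [0, 1000].
a = np.array([0.2, 100.0])
b = np.array([0.9, 105.0])

# The Euclidean distance is dominated by the large-range feature...
print(np.linalg.norm(a - b))  # ≈ 5.05

# ...but after rescaling the second feature to [0, 1],
# both features contribute comparably.
a_scaled = np.array([0.2, 100.0 / 1000])
b_scaled = np.array([0.9, 105.0 / 1000])
print(np.linalg.norm(a_scaled - b_scaled))  # ≈ 0.70
```

Before scaling, the 0.7 difference in the first feature is almost invisible next to the 5.0 difference in the second; after scaling, both differences matter.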
TOOLS AND LIBRARIES
The following Python libraries were used for data manipulation, modeling, and visualization:
Table 2: Required Libraries and Their Purposes

Library             | Purpose
--------------------|------------------------------------------
pandas              | Data manipulation using DataFrames
numpy               | Numerical operations and array handling
matplotlib, seaborn | Static data visualization
plotly              | Interactive and dynamic plots
scikit-learn        | Machine learning models and utilities
To install all required packages, run:
!pip install pandas numpy matplotlib seaborn plotly scikit-learn
PROCEDURE WITH CODE AND EXPLANATIONS
This section presents the implementation workflow using annotated screenshots. Code and
outputs are shown visually to enhance clarity and presentation quality.
Step 1: Import Required Libraries
Figure 1: Importing essential libraries for data handling, visualization, and machine learning.
Explanation: All necessary modules are imported at the beginning. scikit-learn provides
pre-built algorithms, while plotly enables interactive visualizations.
Step 2: Load and Explore the Dataset
Figure 2: Loading the Iris dataset into a pandas DataFrame and displaying the first five rows.
Explanation: The dataset contains 150 samples, each with 4 morphological features (sepal
and petal dimensions) and a target label (0: setosa, 1: versicolor, 2: virginica).
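A sketch of the loading code behind Figure 2, assuming the dataset is loaded via scikit-learn's `load_iris` (the conventional source for this dataset):

```python
from sklearn.datasets import load_iris
import pandas as pd

# Load the Iris dataset and build a DataFrame with named feature columns
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target  # 0: setosa, 1: versicolor, 2: virginica

print(df.shape)   # (150, 5)
print(df.head())  # first five rows
```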
Step 3: Check for Missing Data
Figure 3: Output of df.isnull().sum() showing no missing values.
Explanation: The dataset is complete, requiring no imputation or data cleaning—ideal for
immediate modeling.
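The check in Figure 3 can be reproduced as follows (a minimal sketch; the DataFrame is rebuilt here so the snippet is self-contained):

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Count missing values in each column; the Iris dataset is complete,
# so every count is zero and no imputation is needed.
missing = df.isnull().sum()
print(missing)
```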
Step 4: Feature Scaling (Standardization)
Figure 4: Applying StandardScaler to normalize all features.
Explanation: Since KNN and K-Means rely on distance metrics, standardization ensures all
features contribute equally by transforming them to have mean 0 and standard deviation 1.
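A self-contained sketch of the standardization step in Figure 4, using scikit-learn's `StandardScaler`:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import numpy as np

iris = load_iris()
X = iris.data

# Standardize each feature to mean 0 and standard deviation 1 so that
# distance-based methods (KNN, K-Means) weight all features equally.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(np.round(X_scaled.mean(axis=0), 6))  # all ≈ 0
print(np.round(X_scaled.std(axis=0), 6))   # all ≈ 1
```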
Step 5: Apply KNN (Supervised Learning)
Figure 5: Training KNN with k = 3 and evaluating accuracy on the test set.
Explanation:
- Data is split into 80% training and 20% testing.
- KNN classifies new samples based on the majority class among the 3 nearest neighbors.
- Achieved accuracy: 100.0%, indicating strong class separability.
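A plausible reconstruction of the training code in Figure 5; the `random_state` and `stratify` arguments are assumptions for reproducibility, and the exact accuracy depends on the particular train/test split:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target

# 80/20 train/test split; stratify keeps the class proportions balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# KNN with k = 3: each test sample takes the majority class of its
# 3 nearest training neighbours in the standardized feature space.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
acc = accuracy_score(y_test, knn.predict(X_test))
print(f"Test accuracy: {acc:.1%}")  # exact value depends on the split
```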
Step 6: Apply K-Means Clustering (Unsupervised Learning)
Figure 6: Fitting K-Means with 3 clusters and assigning cluster labels.
Explanation:
- K-Means partitions the data into k = 3 groups by minimizing within-cluster variance.
- It operates without labels, discovering structure purely from feature similarity.
- Despite no supervision, clusters often align with true species.
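The clustering step in Figure 6 can be sketched as follows (`n_init` and `random_state` are assumed values chosen for reproducibility):

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)

# K-Means with k = 3: alternately assigns points to the nearest centroid
# and moves each centroid to its cluster mean, minimizing within-cluster
# variance. No species labels are used at any point.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X)
print(cluster_labels[:10])  # cluster assignment of the first ten samples
```

Note that the cluster indices 0, 1, 2 are arbitrary and need not match the species codes; agreement has to be judged by comparing group memberships.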
Step 7: Visualization of Clustering Results
Figure 7: Interactive Plotly scatter plot of K-Means clusters.
Explanation:
- Cluster 0 (purple) is well-separated—likely Iris setosa.
- Clusters 1 and 2 show partial overlap, reflecting the similarity between versicolor and virginica.
- Hover functionality allows inspection of individual samples.
OBSERVATIONS AND RESULTS SUMMARY
Table 3: Summary of Key Steps and Outcomes

Step                  | Observation / Output
----------------------|------------------------------------------------------
Dataset Structure     | 150 samples, 4 features, 3 balanced classes
Missing Values        | None detected; data is clean
Feature Scaling       | All features standardized (mean ≈ 0, std ≈ 1)
KNN Accuracy          | 100.0%; high classification performance
K-Means Clusters      | 3 clusters formed without labels
Cluster Visualization | Clear separation with minor overlap in mid-range values
Insight: The high agreement between clusters and true species labels demonstrates that the
Iris dataset has strong intrinsic structure, making it ideal for teaching pattern recognition.
CONCLUSION
This laboratory successfully introduced the fundamental concepts and practical implementation of pattern recognition. The Iris dataset, though simple, encapsulates the essence of the discipline, and the exercise builds practical skills in constructing end-to-end machine learning pipelines. Key takeaways include:
- Supervised Learning (KNN) achieved high accuracy (100.0%) in classifying iris
species, highlighting the power of labeled data.
- Unsupervised Learning (K-Means) discovered meaningful groupings without labels,
showing how algorithms can reveal hidden patterns.
- Data Preprocessing, especially feature scaling, is essential for distance-based models.
- Interactive Visualization using plotly enhanced interpretability and exploration.
The Iris dataset remains a powerful educational tool for illustrating core machine learning
concepts. Through this experiment, we gained valuable insight into how machines learn from
data—whether guided by labels or discovering patterns autonomously.