Data Exploration
In this stage of the project cycle, we try to interpret useful information from the data we have acquired. For this purpose, we need to explore the data and present it uniformly for a better understanding. This stage deals with validating and verifying the collected data, and involves two activities:
1) Data Cleaning
2) Data Visualization
Data Cleaning
Data cleaning helps in getting rid of commonly found errors and mistakes in a data set. Two of the most commonly found problems in data are outliers and missing values.
Outliers
An outlier is a data point in a dataset that is distant from all other observations.
Missing Data
Missing values can be handled in two ways:
1. By simply removing the rows or columns that contain missing values.
2. By using an imputer to find the best possible substitute to replace the missing values.
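A minimal sketch of both options, assuming pandas and scikit-learn are available; the column name and the values are invented purely for illustration:

import pandas as pd
from sklearn.impute import SimpleImputer

# Invented data with two missing values
df = pd.DataFrame({"marks": [78, None, 92, 85, None, 60]})

# Option 1: simply drop the rows that contain missing values
dropped = df.dropna()

# Option 2: use an imputer to replace missing values, here with the column mean
imputer = SimpleImputer(strategy="mean")
df["marks"] = imputer.fit_transform(df[["marks"]]).ravel()

print(dropped)
print(df)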
Data Visualization
Visualizing the data is important because:
1) We want to quickly get a sense of the trends, relationships and patterns contained within the data.
2) It helps us define a strategy for which model to use at a later stage.
3) A visual representation is easier to understand and to communicate to others.
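For instance, a quick histogram already shows how the values are spread out. A minimal sketch using matplotlib, with marks data invented purely for illustration:

import matplotlib.pyplot as plt

# Invented marks of ten students
marks = [35, 48, 52, 61, 67, 72, 75, 81, 88, 94]

# A histogram gives a quick sense of how the values are distributed
plt.hist(marks, bins=5)
plt.xlabel("Marks")
plt.ylabel("Number of students")
plt.title("Distribution of marks")
plt.show()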
Modelling
It’s the fourth stage of the AI project cycle. In the previous stage, graphical representation made the data understandable for humans, as we could discover trends and patterns from it. But when it comes to machines accessing and analysing data, they need the data in its most basic form of numbers (which is binary: 0s and 1s), and when it comes to discovering patterns and trends in data, the machine relies on mathematical representations of the same.
The ability to mathematically describe the relationship between parameters is the heart of every AI model. Generally, AI
models can be classified as follows:
Rule Based Approach
In this approach, the rules are defined by the developer. The machine follows the rules or instructions mentioned by the developer and performs its task accordingly. It is therefore a static model, i.e. once trained, the machine does not take into consideration any changes made in the original training dataset.
Machine learning was introduced as an extension to this: a learning based model adapts to changes in the data and rules and follows the updated path, while a rule-based model keeps doing only what it was taught once.
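A minimal sketch of a rule-based model in Python; the grade boundaries are assumptions for illustration, written once by the developer and never changing on their own:

# Rules fixed by the developer; the machine only follows them
def grade(marks):
    if marks >= 90:
        return "A"
    elif marks >= 75:
        return "B"
    elif marks >= 50:
        return "C"
    else:
        return "D"

print(grade(82))  # "B" -- the output never adapts to new data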
Learning Based Approach
It’s a type of AI modelling where the machine learns by itself. Under the learning based approach, the AI model gets trained on the data fed to it and is then able to design a model which is adaptive to changes in the data. That is, if the model is trained with X type of data and the machine designs the algorithm around it, the model modifies itself according to the changes which occur in the data, so that exceptions are handled.
After training, the machine is fed with testing data. The testing data might not contain examples similar to the ones on which the model was trained. So, the model relies on the features on which it has been trained and predicts the output accordingly. In this way, the machine learns by itself by adapting to the new data which flows in. This is the machine learning approach, which introduces dynamicity into the model.
Generally, learning based models can be classified as follows:
I. Supervised Learning
In a supervised learning model, the dataset fed to the machine is labelled. In other words, the dataset is known to the person who is training the machine; only then is he/she able to label the data. A label is some information which can be used as a tag for the data. For example, students get grades according to the marks they secure in examinations. These grades are labels which categorise the students according to their marks.
a) Classification
In this model, data is classified according to the labels. For example, in the grading system, students are classified on the basis of the grades they obtain with respect to their marks in the examination. This model works on a discrete dataset, which means the data need not be continuous.
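A minimal sketch of classification with scikit-learn, reusing the marks-to-grade idea; the training data and the choice of a decision tree are assumptions for illustration:

from sklearn.tree import DecisionTreeClassifier

# Labelled (marks -> grade) training data, invented for illustration
marks = [[35], [48], [62], [71], [83], [95]]
grades = ["D", "D", "C", "C", "B", "A"]   # discrete labels

model = DecisionTreeClassifier()
model.fit(marks, grades)

print(model.predict([[78]]))  # predicts a grade for unseen marks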
b) Regression
This model works on continuous data. For example, if you wish to predict your next salary, you would feed in the data of your previous salaries, any increments, etc., and train the model. Here, the data fed to the machine is continuous.
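A minimal sketch of regression with scikit-learn; the salary figures are invented purely for illustration:

from sklearn.linear_model import LinearRegression

# Continuous data: years of experience -> salary (invented figures)
years = [[1], [2], [3], [4], [5]]
salary = [300000, 340000, 385000, 430000, 470000]

model = LinearRegression()
model.fit(years, salary)

print(model.predict([[6]]))  # estimate of the next salary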
II. Unsupervised Learning
An unsupervised learning model works on an unlabelled dataset. This means that the data fed to the machine is random, and it is possible that the person training the model has no information about it. Unsupervised learning models are used to identify relationships, patterns and trends in the data fed into them. They help the user understand what the data is about and what the major features identified by the machine in it are.
For example, suppose you have random data of 1000 dog images and you wish to find some pattern in it. You would feed this data into an unsupervised learning model and train the machine on it. After training, the machine would come up with the patterns it was able to identify in the data. The machine might come up with patterns which are already known to the user, like colour, or it might even come up with something very unusual, like the size of the dogs. There are two main types of unsupervised learning models:
a) Clustering
It refers to the unsupervised learning algorithm which can cluster the unknown data according to the patterns or trends identified in it. The patterns observed might be ones already known to the developer, or the algorithm might come up with some unique patterns of its own.
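A minimal sketch of clustering with scikit-learn's KMeans; the two features (say, height and weight of dogs) and the number of clusters are assumptions for illustration:

from sklearn.cluster import KMeans

# Unlabelled data: [height, weight] of six dogs, invented for illustration
dogs = [[20, 5], [22, 6], [55, 25], [60, 30], [58, 27], [21, 5]]

model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(dogs)

print(labels)  # e.g. [0 0 1 1 1 0] -- small dogs and large dogs grouped apart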
b) Dimensionality Reduction
We humans are able to visualize only up to 3 dimensions, but according to many theories and algorithms there are various entities which exist beyond 3 dimensions. For example, in Natural Language Processing, words are considered to be N-dimensional entities, which means that we cannot visualize them as they exist beyond our visualization ability. Hence, to make sense of them, we need to reduce their dimensions. This is where a dimensionality reduction algorithm is used.
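A minimal sketch of dimensionality reduction with PCA from scikit-learn; the 5-dimensional points are random numbers, purely for illustration:

import numpy as np
from sklearn.decomposition import PCA

points = np.random.rand(100, 5)   # 100 samples in 5 dimensions

pca = PCA(n_components=2)         # reduce to 2 dimensions
reduced = pca.fit_transform(points)

print(reduced.shape)  # (100, 2) -- now easy to show on a 2-D plot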
Evaluation
Evaluation is the process of understanding the reliability of any AI model, based on its outputs, by feeding the testing dataset into the model and comparing its predictions with the actual answers. That is, once a model has been made and trained, it needs to go through proper testing so that one can calculate the efficiency and performance of the model. Hence, the model is tested with the help of the testing data, which was separated out of the acquired dataset at the Data Acquisition stage.
Accuracy
Accuracy is defined as the percentage of correct predictions out of all the observations.
Precision
Precision is defined as the percentage of true positive cases versus all the cases where the prediction is true.
Recall
Recall is defined as the fraction of positive cases that are correctly identified.
F1 score
The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall.
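A minimal sketch that computes all four metrics from the counts of a confusion matrix; the counts (TP, TN, FP, FN) are assumed values, purely for illustration:

# Assumed counts: true positives, true negatives, false positives, false negatives
TP, TN, FP, FN = 40, 45, 5, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)                 # correct predictions / all observations
precision = TP / (TP + FP)                                 # true positives / all predicted positives
recall = TP / (TP + FN)                                    # true positives / all actual positives
f1_score = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(accuracy, precision, recall, f1_score)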