Data Science Lecture No 02

Uploaded by

abdul baqi

Lecture No. 02
AI 7th, SEN 5th

Course: Data Science

Instructor: Dr. Maryum Nisar

12/15/2024
Data Science

Lecture Contents
Data Science
Understanding Data Science
Exploratory Data Analysis

Data Science
- Data science is the application of computational and statistical techniques to address or gain insight into some problem in the real world.
- Data science = statistics + data processing + machine learning + scientific inquiry + visualization + business analytics + big data + …

CRISP-DM process
CRoss-Industry Standard Process for Data Mining (CRISP-DM)

Data Science Process Step

Understanding data science
Data requirements:
- An organization can draw on many sources of data. It is important to understand what type of data the organization needs to collect, curate, and store.
- The data must also be categorized as numerical or categorical, and the format for storage and dissemination must be decided.
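
The categorization step described above can be sketched in plain Python. This is only an illustration: the sample records are hypothetical, and the rule used here (a column counts as numerical when every non-missing value parses as a number) is an assumption, not something the lecture prescribes.

```python
def is_number(value):
    """Return True if the value can be interpreted as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def categorize_columns(records):
    """Label each column 'numerical' or 'categorical'.

    Assumption: a column is numerical when all of its non-missing
    values parse as numbers; otherwise it is treated as categorical.
    """
    # Gather the values of each column across all rows.
    columns = {}
    for row in records:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    kinds = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None and v != ""]
        all_numeric = bool(present) and all(is_number(v) for v in present)
        kinds[name] = "numerical" if all_numeric else "categorical"
    return kinds


# Hypothetical sample data, for illustration only.
records = [
    {"age": 34, "city": "Lahore", "income": "52000"},
    {"age": 29, "city": "Karachi", "income": None},
    {"age": 41, "city": "Quetta", "income": "61000.5"},
]
column_kinds = categorize_columns(records)
print(column_kinds)
```

In practice a data-frame library would usually infer these types, but the same question (is this column numerical or categorical?) still has to be checked by hand for columns such as numeric ID codes.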

Data collection:
- Data collected from several sources must be stored in the correct format and transferred to the right information technology personnel within the company. Data can be collected from several objects on several events using different types of sensors and storage tools.

Data processing:
- Preprocessing means curating the dataset before the actual analysis. Common tasks include correctly exporting the dataset, placing the data under the right tables, structuring it, and exporting it in the correct format.
Data cleaning:
- Preprocessed data is still not ready for detailed analysis. It must be checked for incompleteness, duplicates, errors, and missing values. These tasks are performed in the data cleaning stage, which involves matching the correct records, finding inaccuracies in the dataset, understanding the overall data quality, removing duplicate items, and filling in missing values.
- How can we identify these anomalies in a dataset? One example is to use outlier detection methods for quantitative data cleaning.
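
One common outlier detection method for quantitative data is the interquartile range (IQR) rule. The sketch below is a minimal illustration of that rule and is not taken from the lecture; the `readings` values and the conventional multiplier `k=1.5` are assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95, 10, 12, 13]
outliers = iqr_outliers(readings)
print(outliers)
```

A flagged value is only a candidate for cleaning. Whether it is an error or a genuine extreme observation still has to be decided from knowledge of the data.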

EDA:
- Exploratory data analysis is the stage where we actually start to understand the message contained in the data.

Modeling and algorithm:
- From a data science perspective, generalized models or mathematical formulas can represent or exhibit relationships among different variables, such as correlation or causation.
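
As a small illustration of a mathematical formula that represents a relationship between two variables, the sketch below fits a straight line by ordinary least squares. The choice of a linear model and the `hours` and `scores` data are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data constructed so that scores = 50 + 2 * hours.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]
slope, intercept = fit_line(hours, scores)
print(slope, intercept)
```

A fitted line like this describes an association between the variables. On its own it does not establish that one variable causes the other.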
Data product:
- A data product is generally based on a model developed during data analysis, for example, a recommendation model that inputs user purchase history and recommends a related item that the user is highly likely to buy.
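
A recommendation model of this kind can be sketched with simple co-purchase counts. This is a deliberately naive illustration under stated assumptions: the `baskets` data and the `recommend` function are hypothetical, and real recommenders use far richer models.

```python
from collections import Counter


def recommend(purchase_history, all_baskets, top_n=1):
    """Recommend items that most often co-occur with the user's purchases."""
    owned = set(purchase_history)
    scores = Counter()
    for basket in all_baskets:
        items = set(basket)
        if items & owned:                  # basket shares an item with the user
            for item in items - owned:     # count only items the user lacks
                scores[item] += 1
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical purchase baskets from other customers.
baskets = [
    ["laptop", "mouse"],
    ["laptop", "mouse", "keyboard"],
    ["laptop", "bag"],
    ["phone", "case"],
]
suggestion = recommend(["laptop"], baskets)
print(suggestion)
```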

Communication:
- This stage deals with disseminating the results to end stakeholders to use the result for business intelligence. One of the most notable steps in this stage is data visualization.
- Visualization deals with information relay techniques such as tables, charts, summary diagrams, and bar charts to show the analyzed result.
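
To make the idea of a bar chart concrete without any plotting library, the sketch below renders category counts as text. The `sales` figures are hypothetical, and a real report would normally use a charting tool instead.

```python
def text_bar_chart(counts, width=20):
    """Render category counts as a left-aligned text bar chart."""
    largest = max(counts.values())
    label_width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.items():
        # Scale each bar relative to the largest count.
        bar = "#" * round(width * count / largest)
        lines.append(f"{label.ljust(label_width)} | {bar} {count}")
    return "\n".join(lines)


# Hypothetical analyzed result: units sold per region.
sales = {"North": 40, "South": 20, "East": 10}
chart = text_bar_chart(sales)
print(chart)
```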
Prior Knowledge
Gaining information on:
- Objective of the problem
- Subject area of the problem
- Data

Data Preparation / Data exploration

Data Exploration
Data quality
Handling missing values
Data type conversion
Transformation
Outliers
Feature selection
Sampling

Introduction to Exploratory Data Analysis (EDA)

- EDA is a crucial step in data science that allows for understanding data.
- It involves summarizing data, detecting anomalies, and testing assumptions.
- EDA helps make data-driven decisions before modeling.
Key aspects of EDA
Correlation Analysis
- Checking the relationships between variables to understand how they might affect each other. This includes computing correlation coefficients and creating correlation matrices.
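
A correlation coefficient and a correlation matrix can be computed as in the following sketch. It uses the Pearson coefficient, which is one common choice and an assumption here, since the lecture does not name a specific coefficient. The `data` columns are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of columns."""
    return {
        a: {b: pearson(columns[a], columns[b]) for b in columns}
        for a in columns
    }


# Hypothetical columns: weight rises with height, errors fall with it.
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 60, 70, 80, 90],
    "errors": [9, 7, 5, 3, 1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 a weak one. The coefficient is undefined for a constant column, which this sketch does not guard against.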

Handling Missing Values
- Detecting and deciding how to address missing data points, whether by imputation or removal, depending on their impact and the amount of missing data.
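
The choice between imputation and removal can be sketched as follows. Missing values are represented as `None`, median imputation is just one possible strategy, and the 30% threshold is an arbitrary assumption chosen for illustration.

```python
import statistics


def missing_fraction(values):
    """Share of entries that are missing (represented here as None)."""
    return sum(v is None for v in values) / len(values)


def handle_missing(values, max_missing=0.3):
    """Impute with the median when few values are missing, else drop them.

    Assumption: the 30% threshold is arbitrary and only for illustration.
    """
    present = [v for v in values if v is not None]
    if missing_fraction(values) <= max_missing:
        fill = statistics.median(present)
        return [fill if v is None else v for v in values]
    return present


ages = [23, None, 31, 27, 35]        # 20% missing, so impute
imputed = handle_missing(ages)
sparse = [None, 4, None, None, 8]    # 60% missing, so remove
reduced = handle_missing(sparse)
print(imputed, reduced)
```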

Summary Statistics
- Calculating key statistics that provide insight into data trends and nuances.
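
A minimal sketch of such key statistics, using Python's standard `statistics` module, is shown here. The selection of statistics and the `marks` sample are assumptions for illustration.

```python
import statistics


def summarize(values):
    """Return common summary statistics for a numeric sample."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": median,
        "stdev": statistics.stdev(values),   # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


# Hypothetical exam marks.
marks = [55, 60, 65, 70, 75, 80, 85]
summary = summarize(marks)
print(summary)
```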

Testing Assumptions
- Many statistical tests and models assume the data meet certain conditions (like normality
Why Is Exploratory Data Analysis Important?
Exploratory Data Analysis (EDA) is a critical step in the data analysis process, especially in data science and statistical modeling, for several reasons:
- Understanding Data Structures
- Identifying Patterns and Relationships
- Detecting Anomalies and Outliers
- Testing Assumptions
- Informing Feature Selection and Engineering
- Optimizing Model Design
- Facilitating Data Cleaning
- Enhancing Communication
EDA Importance
Understanding Data Structures
- EDA helps in getting familiar with the dataset, understanding the number of features, the type of data in each feature, and the distribution of data points. This understanding is crucial for selecting appropriate analysis or prediction techniques.

Identifying Patterns and Relationships
- Through visualizations and statistical summaries, EDA can reveal hidden patterns and intrinsic relationships between variables. These insights can guide further analysis and enable more effective feature engineering and model building.

Detecting Anomalies and Outliers
- EDA is essential for identifying errors or unusual data points that may adversely affect the results of your analysis. Detecting these early can prevent costly mistakes in predictive modeling and analysis.
Testing Assumptions
- Many statistical models assume that data follow a certain distribution or that variables are independent. EDA involves checking these assumptions.
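
One quick, informal check of a distributional assumption is sample skewness, which is close to zero for symmetric data such as a normal sample. The sketch below is only a rough screen under that assumption. The two samples are hypothetical, and a near-zero skewness is necessary but not sufficient for normality, so formal tests and plots are still needed.

```python
import statistics


def skewness(values):
    """Population skewness; close to 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    n = len(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / n


symmetric = [2, 4, 6, 8, 10]
right_skewed = [1, 1, 2, 2, 3, 20]   # one large value drags the tail right
sym_skew = skewness(symmetric)
right_skew = skewness(right_skewed)
print(round(sym_skew, 3), round(right_skew, 3))
```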

Informing Feature Selection and Engineering
- Insights gained from EDA can inform which features are most relevant to include in a model and how to transform them (scaling, encoding) to improve model performance.
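
The two transformations named above can be sketched as min-max scaling for a numerical feature and one-hot encoding for a categorical one. These are common choices rather than the only ones, and the function names and inputs are hypothetical.

```python
def min_max_scale(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:                  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - low) / span for v in values]


def one_hot_encode(values):
    """Encode categories as 0/1 indicator lists, one slot per category."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


scaled = min_max_scale([10, 20, 40])
categories, encoded = one_hot_encode(["red", "blue", "red"])
print(scaled)
print(categories, encoded)
```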

Optimizing Model Design
- By understanding the data’s characteristics, analysts can choose appropriate modeling techniques, decide on the complexity of the model, and better tune model parameters.

Facilitating Data Cleaning
- EDA helps in spotting missing values and errors in the data, which are critical to address before further analysis to improve data quality and integrity.

Enhancing Communication
- Visual and statistical summaries from EDA can make it easier to communicate findings and convince others of the validity of your conclusions, particularly when explaining data-driven insights to stakeholders without technical backgrounds.
Traditional Vs Machine Learning Model

Data Science process

Thank You !