Data Analytics
Unit - II | Data Analysis
By - Er. Monu Kumar
B.Tech(CSE), M.Tech(CSE), NET JRF, Ph.D(CSE)*
Introduction to Data Analysis
Introduction to Data Analysis
● Definition: Data analysis involves cleaning, transforming, modeling,
and visualizing data to uncover valuable insights and support
decision-making in engineering applications.
● Importance:
○ Optimizing processes and designs.
○ Predicting system behavior.
○ Identifying potential failures and anomalies.
○ Making data-driven decisions.
Introduction to Data Analysis
● Key Areas:
○ Regression Modeling
○ Multivariate Analysis
○ Bayesian Modeling & Networks
○ Support Vector & Kernel Methods
○ Time Series Analysis
○ Rule Induction
○ Neural Networks
○ Fuzzy Logic
○ Stochastic Search Methods
Introduction to Data Analysis
Introduction to Data Analysis
A simple infographic illustrating the data analysis process:
Step 1: Define why you need data analysis.
Step 2: Begin collecting data from sources.
Step 3: Clean through unnecessary data.
Step 4: Begin analyzing the data.
Step 5: Interpret results and apply them.
Regression Modeling
Regression Modeling
Regression analysis is a statistical method used to examine the relationship
between one or more independent variables and a dependent variable. It
quantifies how changes in independent variables influence the dependent variable,
allowing for prediction and insight into data relationships.
● Simple Linear Regression: Models the relationship between a single
independent variable and a dependent variable using a straight line.
● Multiple Linear Regression: Considers the influence of multiple independent
variables on the dependent variable.
● Polynomial Regression: Captures curvilinear relationships when a straight
line is insufficient.
● Logistic Regression: Predicts the probability of a categorical outcome (e.g.,
yes/no).
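A minimal Python sketch, assuming scikit-learn and a tiny made-up dataset, of fitting a simple linear regression and a logistic regression (variable names and values are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data: hours studied (x), exam score (continuous), pass/fail (categorical).
x = np.array([[1], [2], [3], [4], [5], [6]])
y_score = np.array([52, 55, 61, 64, 70, 74])
y_pass = np.array([0, 0, 0, 1, 1, 1])

# Simple linear regression: fits a straight line y = b0 + b1 * x.
lin = LinearRegression().fit(x, y_score)
print("intercept:", lin.intercept_, "slope:", lin.coef_[0])
print("predicted score for 7 hours:", lin.predict([[7]])[0])

# Logistic regression: predicts the probability of the categorical outcome.
log = LogisticRegression().fit(x, y_pass)
print("P(pass | 7 hours):", log.predict_proba([[7]])[0, 1])
```

Multiple and polynomial regression follow the same pattern, with extra predictor columns or polynomial features added to x.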
Regression Modeling: Types
Regression Modeling
Mathematical Process (Simple Linear Regression)
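A standard sketch of that process: the model assumes a straight-line relationship plus random error, and the coefficients are estimated by least squares.

```latex
\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n
\]
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
```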
Regression Modeling
Regression Modeling
Diagram – Regression Line
Regression Modeling
Key Points
● Regression helps in prediction, forecasting, and understanding
relationships.
● Accuracy depends on assumptions: linearity, independence,
normality, homoscedasticity.
● Extended forms include polynomial regression, ridge/lasso
regression, and logistic regression for classification.
Multivariate Analysis
Multivariate Analysis (MVA)
Multivariate Analysis is a collection of statistical techniques used to
examine data that arises from more than one dependent variable or
multiple independent variables simultaneously.
Unlike univariate (single variable) or bivariate (two variables) analysis,
multivariate analysis helps understand relationships, patterns, and
dependencies across multiple dimensions at once.
Example: A researcher studies how income, education, and age
(independent variables) together affect spending behavior and savings
(dependent variables).
Multivariate Analysis (MVA)
Objectives
1. Identify relationships among multiple variables.
2. Reduce data dimensionality (Principal Component Analysis).
3. Classify & group data (Cluster Analysis, Discriminant Analysis).
4. Predict outcomes based on multiple predictors (Multivariate
Regression).
5. Find hidden structures in large datasets.
Multivariate Analysis (MVA)
Types of Multivariate Analysis
1. Multivariate Regression Analysis – Predicts multiple dependent variables
from multiple independent variables; in matrix form, Y = XB + E.
2. Principal Component Analysis (PCA) – Reduces data dimensions while
retaining variance.
3. Factor Analysis – Identifies hidden (latent) variables influencing observed
data.
4. Discriminant Analysis – Classifies data into categories.
5. Cluster Analysis – Groups similar data points together.
6. MANOVA (Multivariate Analysis of Variance) – Extension of ANOVA for
multiple dependent variables.
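As a concrete illustration of dimensionality reduction, a minimal sketch assuming scikit-learn and a small synthetic dataset (the data and variable count are made up):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic multivariate data: 200 observations of 5 correlated variables,
# generated from 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

# PCA projects the data onto the directions of maximum variance.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("shape before:", X.shape, "after:", scores.shape)
```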
Multivariate Analysis (MVA)
Mathematical Process (Generalized View)
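In the generalized view, with Y the n × q matrix of dependent variables, X the n × p matrix of predictors, B the p × q coefficient matrix, and E the error matrix, the multivariate linear model and its least-squares estimate are:

```latex
\[
\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},
\qquad
\hat{\mathbf{B}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}
\]
```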
Multivariate Analysis (MVA)
Multivariate Analysis (MVA)
Key Points
● Multivariate analysis deals with datasets having multiple
dimensions.
● It helps in prediction, classification, and pattern recognition.
● Widely applied in finance, marketing, biology, medicine, and AI/ML.
● Techniques like PCA, clustering, MANOVA are part of this field.
Bayes' Theorem
Bayes' Theorem
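In its standard form, for a hypothesis H and observed data (evidence) D:

```latex
\[
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
\]
```

Here P(H) is the prior, P(D | H) the likelihood, P(D) the evidence, and P(H | D) the posterior.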
Bayesian Modeling
Bayesian Modeling
Definition
Bayesian modeling in data analytics is a statistical approach where
uncertainty in data is represented using probability distributions, and
Bayes’ theorem is used to update beliefs about parameters or
hypotheses as new data becomes available.
Unlike traditional (frequentist) methods that provide point estimates,
Bayesian modeling gives a distribution of possible outcomes (posterior
distribution), allowing analysts to incorporate prior knowledge and
quantify uncertainty more effectively.
Bayesian Modeling
Why Use Bayesian Modeling in Data Analytics?
1. Handles uncertainty naturally (through probability distributions).
2. Updates knowledge continuously as new data arrives.
3. Incorporates prior knowledge (domain expertise, historical data).
4. Useful in prediction, decision-making, classification, anomaly
detection, risk assessment.
Bayesian Modeling
Mathematical Framework
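A sketch of the framework for parameters θ and data D: the posterior combines the prior with the likelihood, and the evidence acts as a normalizing constant.

```latex
\[
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}
\;\propto\; p(D \mid \theta)\, p(\theta),
\qquad
p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta
\]
```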
Bayesian Modeling
Example in Data Analytics
Customer behavior analysis:
● Prior belief: Customers usually buy product A with probability 0.3.
● Collect new data: Out of 100 customers, 40 buy product A.
● Bayesian model updates the prior with the data → new posterior belief:
probability of purchase ≈ 0.38, with a credible interval quantifying the uncertainty.
This helps businesses predict sales, personalize recommendations,
and manage risks with quantified uncertainty.
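A minimal sketch of this update in Python, assuming SciPy and assuming the prior belief is encoded as a Beta(3, 7) distribution (mean 0.3; the prior strength is a made-up choice). With that prior the posterior mean comes out near the ≈ 0.38 figure above:

```python
from scipy import stats

# Assumed prior: Beta(3, 7), whose mean is 3 / (3 + 7) = 0.3.
prior_a, prior_b = 3, 7

# New data: 40 of 100 customers buy product A.
buys, n = 40, 100

# Conjugate Beta-Binomial update: posterior is Beta(a + successes, b + failures).
posterior = stats.beta(prior_a + buys, prior_b + (n - buys))

print("posterior mean:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```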
Bayesian Modeling: Bayesian Updating in Data Analytics
Bayesian Modeling
Key Takeaways
● Bayesian modeling is a powerful tool in modern data analytics.
● It allows continuous learning as new data arrives.
● It provides probabilistic insights instead of just point estimates.
● Applications: fraud detection, healthcare analytics, customer
segmentation, demand forecasting, and AI/ML models.
Inference Problem
Inference Problem
Support Vector and Kernel
Methods
Support Vector and Kernel Methods
Definition
● Support Vector Machines (SVMs): Supervised learning models that
classify data by finding the optimal hyperplane that maximizes the
margin between different classes.
● Kernel Methods: Mathematical techniques that implicitly map data
into a higher-dimensional feature space, enabling SVMs to handle
non-linear relationships.
Support Vector and Kernel Methods
Key concepts
● Hyperplane: A decision boundary separating different classes in feature
space.
● Support Vectors: Data points closest to the hyperplane, defining the
margin.
● Kernel Function: A function that computes the inner product between
data points in the transformed feature space. Examples include linear,
polynomial, and Radial Basis Function (RBF) kernels.
● Kernel Trick: The ability of kernel methods to work in higher-dimensional
spaces without explicitly computing the transformations.
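A minimal sketch, assuming scikit-learn and a synthetic two-class dataset that is not linearly separable, of an SVM with an RBF kernel:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data with a curved decision boundary.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps points into a higher-dimensional space (kernel trick).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)

print("support vectors per class:", clf.n_support_)
print("test accuracy:", clf.score(X_test, y_test))
```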
Support Vector and Kernel Methods
Example: Classification in flotation processes
● Scenario: Classifying froth images from a sulfur flotation process.
● Approach: SVMs, combined with kernel methods (e.g., RBF kernel and
multiple-kernel functions), can be used to classify froth images into
different appearance categories based on textural features extracted
from the images.
● Application: Accurate classification of froth images, aiding in process
monitoring and optimization.
Analysis of Time Series
Analysis of Time Series
Definition: Time series analysis involves studying data points collected
over time to identify trends, seasonality, cycles, and random fluctuations.
Key aspects
● Linear Systems Analysis: Utilizes linear models to understand and
predict time series behavior, assuming linear relationships between
variables.
● Nonlinear Dynamics: Explores complex and non-linear patterns in
time series data, employing models like Threshold Autoregressive
(TAR) and Autoregressive Conditional Heteroskedasticity (ARCH)
models.
Analysis of Time Series
Components of Time Series
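A time series is typically decomposed into trend (T), seasonal (S), cyclical (C), and irregular (I) components, combined either additively or multiplicatively:

```latex
\[
Y_t = T_t + S_t + C_t + I_t
\qquad \text{or} \qquad
Y_t = T_t \times S_t \times C_t \times I_t
\]
```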
Analysis of Time Series
Example: Financial forecasting
● Scenario: Predicting stock prices, interest rates, or economic indicators.
● Approach: Linear time series models like ARMA (Autoregressive Moving
Average) or ARIMA (Autoregressive Integrated Moving Average) can be
used for forecasting when relationships are relatively stable. Nonlinear
models like ARCH or GARCH may be more suitable for modeling
volatility and capturing non-linear patterns in financial time series.
● Application: Informing investment decisions, risk management
strategies, and economic planning.
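A minimal sketch, assuming statsmodels and a synthetic price-like series; the ARIMA order (1, 1, 1) is an illustrative choice, not a recommendation:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic "price" series: a random walk with drift, observed daily.
rng = np.random.default_rng(0)
prices = pd.Series(100 + np.cumsum(0.1 + rng.normal(scale=1.0, size=250)),
                   index=pd.date_range("2024-01-01", periods=250, freq="D"))

# Fit ARIMA(p, d, q); here (1, 1, 1) is used purely for illustration.
fitted = ARIMA(prices, order=(1, 1, 1)).fit()

# Forecast the next 10 observations.
print(fitted.forecast(steps=10))
```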
Rule Induction
Rule Induction
Definition: Rule induction is a machine learning technique that extracts
classification rules from data, typically in the form of "IF-THEN" statements.
Key concepts
● Sequential Covering: A greedy algorithm that iteratively discovers rules
covering the positive instances of a class, removes the covered instances
from the training data, and repeats the process until all positive
instances are covered.
● Learn-One-Rule: A sub-routine within sequential covering that grows
individual rules by adding conjuncts (conditions) until the rule achieves a
desired accuracy, measured by the ratio of correctly covered positive
instances to all covered instances.
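A toy Python sketch of sequential covering with a greedy learn-one-rule step, on a tiny made-up sensor dataset (attribute names, values, and labels are illustrative only):

```python
# Toy dataset: each record is a dict of attribute values plus a True/False class label.
data = [
    ({"temp": "high", "vibration": "high"}, True),
    ({"temp": "high", "vibration": "low"},  True),
    ({"temp": "low",  "vibration": "high"}, False),
    ({"temp": "low",  "vibration": "low"},  False),
    ({"temp": "high", "vibration": "high"}, True),
    ({"temp": "low",  "vibration": "low"},  False),
]

def covers(rule, record):
    # A rule is a dict of attribute = value conditions; all must hold.
    return all(record.get(attr) == val for attr, val in rule.items())

def precision(rule, examples):
    # Fraction of covered instances that are positive (the rule's accuracy measure).
    covered = [label for rec, label in examples if covers(rule, rec)]
    return sum(covered) / len(covered) if covered else 0.0

def learn_one_rule(examples):
    # Grow one rule by greedily adding the condition that most improves precision.
    rule = {}
    candidates = {(a, v) for rec, _ in examples for a, v in rec.items()}
    improved = True
    while improved and precision(rule, examples) < 1.0:
        improved = False
        for attr, val in candidates:
            trial = {**rule, attr: val}
            if precision(trial, examples) > precision(rule, examples):
                rule, improved = trial, True
    return rule

def sequential_covering(examples):
    # Learn a rule, remove the positives it covers, and repeat until none remain.
    rules, remaining = [], list(examples)
    while any(label for _, label in remaining):
        rule = learn_one_rule(remaining)
        if not rule:
            break  # no useful condition found
        rules.append(rule)
        remaining = [(rec, label) for rec, label in remaining
                     if not (covers(rule, rec) and label)]
    return rules

print(sequential_covering(data))  # e.g. [{'temp': 'high'}] -> IF temp = high THEN failure
```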
Rule Induction
Example: Predictive maintenance
● Scenario: Predicting machine failures in a factory based on sensor
readings.
● Approach: Rule induction algorithms can analyze historical sensor
data leading to failures and non-failures, generating rules like: "IF
cylinder temperature > 852 °C for 3 consecutive hours THEN machine
failure likely within 24 hours".
● Application: Early alerts of imminent machine breakdowns, enabling
preventative maintenance and reducing downtime.
Rule Induction
Applications
● Expert systems
● Credit scoring
● Medical decision systems.
Neural Networks
Neural Networks
Definition
Neural networks (also Artificial Neural Networks or ANNs) are
computational models inspired by the structure and function of biological
neural networks, capable of learning from data and modeling complex
relationships.
Neural Networks
Key aspects
● Learning and Generalization: The process of adjusting network
weights (parameters) based on training data to minimize errors and
enable accurate predictions on unseen data (generalization).
● Competitive Learning: An unsupervised learning paradigm where
network nodes compete to respond to input data, with the winning
node adapting its weights.
● Principal Component Analysis (PCA) and Neural Networks: PCA can
be used as a preprocessing step to reduce dimensionality and
simplify the input to neural networks, potentially improving training
efficiency and generalization performance.
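A minimal sketch, assuming scikit-learn and its built-in digits dataset, of PCA used as a preprocessing step in front of a small feed-forward network:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Handwritten-digit images: 8 x 8 pixels = 64 input features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA reduces the 64 pixel features to 20 components before the network sees them.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=20),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```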
Neural Networks
Example: Image recognition
● Scenario: Classifying images (e.g., identifying objects or faces).
● Approach: Convolutional Neural Networks (CNNs) are a type of ANN
particularly well-suited for image processing, learning features like
edges and shapes through multiple layers of processing, and making
classifications in the output layer.
● Application: Facial recognition systems, medical image object
segmentation, and more.
Neural Networks
Applications
● Speech recognition
● NLP
● Computer vision.
Fuzzy Logic
Fuzzy Logic
Definition
Fuzzy logic is a form of logic that allows propositions to be represented with
degrees of truthfulness and falsehood, enabling reasoning with imprecise
and ambiguous information.
Key aspects
● Fuzzy Models from Data: Extracting fuzzy rules and membership
functions from data to represent relationships and build fuzzy models.
● Fuzzy Decision Trees: A generalization of traditional decision trees that
handle attributes with numeric-symbolic values, offering a more
expressive and understandable representation of induced knowledge.
Fuzzy Logic
Example: Control systems
● Scenario: Designing a control system for a process where precise
mathematical models are difficult to obtain.
● Approach: Fuzzy logic can be used to emulate human reasoning and
control actions, encapsulating expertise in terms of linguistic
descriptions and fuzzy inference rules.
● Application: Linear and non-linear control in various engineering
domains, including chemical engineering, robotics, and vehicular
technology.
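A toy sketch of such reasoning in plain Python (no fuzzy-logic library): triangular membership functions for a temperature input and a weighted-average (Sugeno-style) combination of two linguistic rules. All membership ranges and rule outputs are made-up illustrations.

```python
def triangular(x, a, b, c):
    # Membership rises linearly from 0 at a to 1 at b, then falls back to 0 at c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fan_speed(temperature_c):
    # Degrees of membership in the linguistic sets "warm" and "hot".
    warm = triangular(temperature_c, 15, 25, 35)
    hot = triangular(temperature_c, 25, 40, 55)

    # Rules: IF temperature is warm THEN fan speed = 40 %
    #        IF temperature is hot  THEN fan speed = 90 %
    weights, outputs = [warm, hot], [40.0, 90.0]
    if sum(weights) == 0:
        return 0.0  # neither rule fires
    return sum(w * o for w, o in zip(weights, outputs)) / sum(weights)

for t in (20, 30, 45):
    print(f"{t} C -> fan speed {fan_speed(t):.1f} %")
```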
Fuzzy Logic
Applications
● Control systems (AC, washing machines)
● Decision-making.
Stochastic Search Methods
Stochastic Search Methods
Definition
Stochastic search methods, also known as stochastic optimization,
employ randomness to explore the design space and find optimal
solutions for complex optimization problems.
Key concepts
● Exploration vs. Exploitation: Balancing the search for novel solutions
(exploration) with refining promising existing solutions (exploitation).
● Heuristics: Rules or guidelines, often inspired by natural processes,
that help guide the search process towards optimal solutions.
Stochastic Search Methods
Example: Optimizing hyperparameters in machine learning models
● Scenario: Finding the optimal combination of hyperparameters (e.g.,
kernel function and regularization term C) for a Support Vector
Machine model.
● Approach: Stochastic search methods like Particle Swarm
Optimization (PSO) or Genetic Algorithms (GA) can be used to explore
the hyperparameter space efficiently, identifying the best combination
for the given dataset.
● Application: Improved model performance and generalization ability
of machine learning models.
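A minimal sketch of this idea using plain random search (a simple stochastic search standing in for PSO or GA, which would need extra libraries), assuming scikit-learn and a synthetic dataset:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic classification data standing in for a real dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Stochastic exploration of the SVM hyperparameter space: candidate kernels,
# with C and gamma drawn at random from log-uniform distributions.
param_distributions = {
    "kernel": ["linear", "rbf", "poly"],
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
}
search = RandomizedSearchCV(SVC(), param_distributions,
                            n_iter=30, cv=5, random_state=0)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```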
Stochastic Search Methods
Methods include:
● Genetic Algorithms
● Simulated Annealing
● Particle Swarm Optimization (PSO)
Stochastic Search Methods
Applications
● Feature selection
● Scheduling
● AI game playing.
Stochastic Search Methods
Random search path exploring multiple valleys before finding the global
optimum.