UNIT-2
DATA ANALYSIS
REGRESSION
Regression analysis is a statistical and predictive modeling technique used in data analytics to model and quantify the relationship between a dependent (target) variable and one or more independent (predictor) variables. It measures how changes in the independent variables affect the dependent variable, enabling predictions and insights from data, and is widely used for prediction, forecasting, and causal analysis.
PURPOSE OF REGRESSION ANALYSIS
Understand relationships between variables
Predict future values of the target variable based on predictors
Forecast trends using historical data
Identify key factors influencing outcomes (positive or negative impacts)
Support data-driven decision-making across industries
TYPES OF REGRESSION
LINEAR REGRESSION: Models a linear relationship between one dependent variable and one or
more independent variables. It finds the best fit line minimizing the difference between observed
and predicted values.
MULTIPLE REGRESSION: An extension of linear regression with multiple predictor variables.
POLYNOMIAL REGRESSION: Models nonlinear relationships by introducing polynomial terms.
LOGISTIC REGRESSION: Used for classification problems where the dependent variable is categorical.
HOW REGRESSION WORKS
Data Collection and Preparation: Gather data on dependent and independent variables, clean it,
and handle missing or noisy data.
Model Selection: Choose an appropriate regression model based on data type, relationships, and
problem objective.
Parameter Estimation: Use methods like least squares to find coefficients that best fit the data.
Model Evaluation: Assess accuracy using metrics like R-squared (explained variance), mean
squared error, and visualize residuals.
Prediction and Interpretation: Use the model to predict outcomes and interpret relationships
between variables.
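As a minimal sketch of steps 3-5, the snippet below fits a straight line by least squares with NumPy and evaluates the fit with residuals and R-squared. The hours-vs-score numbers are invented purely for illustration.

```python
# A minimal sketch of the regression workflow above, using NumPy.
# The data values here are made up purely for illustration.
import numpy as np

# Step 1: collected data (hours studied vs. exam score, hypothetical)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 58.0, 65.0, 68.0, 77.0])

# Step 3: parameter estimation by least squares.
# Design matrix with an intercept column; solve for [a, b].
A = np.column_stack([np.ones_like(X), X])
a, b = np.linalg.lstsq(A, y, rcond=None)[0]

# Step 4: model evaluation with residuals and R-squared
y_hat = a + b * X
residuals = y - y_hat
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Step 5: prediction and interpretation
print(f"intercept a = {a:.2f}, slope b = {b:.2f}, R^2 = {r_squared:.3f}")
print(f"predicted score for 6 hours: {a + b * 6:.1f}")
```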
IMPORTANT CONCEPTS
Dependent Variable: The outcome being predicted or explained.
Independent Variables: Factors influencing the outcome.
Intercept: The expected value of the dependent variable when all independent variables are zero.
Coefficient: The estimated impact of an independent variable on the dependent variable.
Residuals: Differences between observed and predicted values, used to measure model fit.
LINEAR REGRESSION
Linear Regression is a supervised machine learning algorithm and statistical technique used to model the
linear relationship between a dependent (target) variable and one or more independent (predictor)
variables. It predicts the value of the dependent variable based on the independent variables, assuming a
straight-line relationship.
Linear regression describes the connection between one dependent variable (outcome) and one or more
independent variables (predictors) using a straight line. The relationship is mathematically represented by
the equation:
Y = a + bX
● Y: Dependent variable (what you want to predict)
● X: Independent variable(s) (used for prediction)
● a: Intercept (value of Y when X=0)
● b: Slope (how much Y changes for a one-unit increase in X)
TYPES OF LINEAR REGRESSION
● Simple Linear Regression: one predictor and one outcome variable (e.g., studying hours and exam scores).
● Multiple Linear Regression: several predictors for one outcome (e.g., height and gender predicting weight).
MULTIPLE LINEAR REGRESSION
Multiple linear regression extends the simple model to several predictors:
Y = a + b1X1 + b2X2 + ... + bnXn
Each coefficient bi is the expected change in Y for a one-unit increase in Xi, holding the other predictors constant.
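A minimal sketch of multiple linear regression with scikit-learn, using the height-and-gender-predicting-weight example above; the specific numbers are invented for illustration.

```python
# A minimal multiple linear regression sketch with scikit-learn.
# The height/gender/weight values are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# Predictors: height in cm, gender encoded as 0/1
X = np.array([
    [160, 0],
    [165, 0],
    [170, 1],
    [175, 1],
    [180, 1],
])
y = np.array([55, 58, 68, 74, 80])  # weight in kg

model = LinearRegression().fit(X, y)
print("intercept a:", model.intercept_)
print("coefficients b1, b2:", model.coef_)

# Predict weight for a 172 cm person with gender = 1
print("prediction:", model.predict([[172, 1]])[0])
```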
LOGISTIC REGRESSION
Logistic Regression is a statistical and supervised machine learning algorithm used for classification
problems where the dependent variable is binary or categorical (often coded as 0 or 1). Unlike linear
regression that predicts continuous values, logistic regression predicts the probability of an outcome
belonging to a particular class, such as "yes" or "no," "success" or "failure."
Logistic regression maps independent variables to probabilities using an S-shaped curve (the sigmoid function), not a straight line.
It classifies results based on a probability threshold (often 0.5: above = the event occurs; below = the event does not occur).
PURPOSE
Predict the probability of a binary event occurring.
Classify observations into two categories based on predictor variables.
Model relationships where the outcome is categorical and explanatory variables can be continuous
or categorical.
EQUATION
Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It predicts the probability that a given input belongs to a particular class using the logistic (sigmoid) function:
P(Y = 1) = 1 / (1 + e^-(a + bX))
Here a + bX is the same linear combination used in linear regression, and the sigmoid squashes it into a probability between 0 and 1.
EXAMPLE
PREDICTING IF A STUDENT PASSES AN EXAM
Hours Studied (X) Pass (Y)
1 0
2 0
3 0
4 1
5 1
6 1
Predict for a Student Who Studied 3.5 Hours
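As a minimal sketch, the snippet below fits logistic regression to the table above with scikit-learn and predicts for 3.5 hours. Since 3.5 hours sits exactly between the failing and passing groups, the predicted probability lands near the 0.5 threshold.

```python
# A minimal sketch fitting logistic regression to the table above
# with scikit-learn, then predicting for 3.5 hours of study.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])  # hours studied
y = np.array([0, 0, 0, 1, 1, 1])              # pass (1) / fail (0)

model = LogisticRegression().fit(X, y)

p_pass = model.predict_proba([[3.5]])[0, 1]   # probability of passing
print(f"P(pass | 3.5 hours) = {p_pass:.2f}")
print("predicted class:", model.predict([[3.5]])[0])  # 0.5 threshold
```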
MULTIVARIATE ANALYSIS
COMPARISON OF UNIVARIATE, BIVARIATE, AND MULTIVARIATE ANALYSIS
UNIVARIATE
● Number of variables: One
● Purpose: Summarize and describe the characteristics of a single variable without investigating relationships
● Example: Heights of students in a class
● Key techniques and focus: Descriptive statistics (mean, median, mode), measures of dispersion, histograms, bar charts
BIVARIATE
● Number of variables: Two
● Purpose: Explore the relationship or association between two variables
● Example: Relationship between temperature and ice cream sales
● Key techniques and focus: Correlation, scatter plots, simple linear regression
MULTIVARIATE
● Number of variables: More than two
● Purpose: Analyze interactions and relationships among multiple variables simultaneously
● Example: Analyzing ad type, gender, and click rates together
● Key techniques and focus: Multiple linear/logistic regression, MANOVA, PCA, factor and cluster analysis
Multivariate analysis is a collection of powerful statistical techniques used to analyze data involving more
than two variables at a time, facilitating the discovery of complex relationships and patterns in large
datasets.
Multivariate analysis evaluates the relationships among multiple variables simultaneously, moving beyond
univariate (one variable) and bivariate (two variable) approaches. It is invaluable for extracting richer
insights from data and is widely used in fields such as marketing, healthcare, finance, social sciences, and
environmental studies.
Discovers patterns and associations that may not be visible when looking at one or two variables alone.
Empowers more accurate prediction, decision-making, and error correction.
Allows companies and researchers to analyze complex phenomena by considering all relevant
influencing factors.
METHODS AND TECHNIQUES
● MULTIPLE LINEAR REGRESSION
○ Predicts a single outcome using several predictors.
● MULTIPLE LOGISTIC REGRESSION
○ Models a binary outcome using multiple variables.
● MANOVA (MULTIVARIATE ANALYSIS OF VARIANCE)
○ Tests for significant differences in multiple dependent variables across groups.
● FACTOR ANALYSIS
○ Identifies underlying dimensions or factors among related variables.
● PRINCIPAL COMPONENT ANALYSIS (PCA)
○ Reduces data complexity by combining correlated variables into principal components.
● CLUSTER ANALYSIS
○ Classifies observations into groups based on similarities among multiple variables.
TYPES
● Dependence Techniques
○ One or more dependent variables are predicted by independent variables (e.g., regression,
MANOVA).
● Interdependence Techniques
○ No clear dependent variable; seeks to reveal structure or patterns among variables (e.g.,
factor analysis, cluster analysis).
FACTOR ANALYSIS
Factor analysis aims to uncover hidden patterns or structures within data by grouping related variables
into factors. These factors represent shared variance, summarizing the information in several correlated
variables with fewer dimensions, making data easier to analyze and interpret.
HOW IT WORKS
Collect data on many variables believed to be related.
Compute the correlation or covariance matrix to assess relationships between variables.
Extract common factors that explain the maximum shared variance.
Each observed variable loads onto one or more factors with different weights (factor loadings).
The end result is a smaller number of factors representing most of the information in the dataset.
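A minimal factor analysis sketch with scikit-learn, following the steps above; the data are synthetic stand-ins (random numbers built from two hidden factors) rather than real survey items.

```python
# A minimal factor analysis sketch with scikit-learn, following the
# steps above. The data here are random stand-ins for real variables.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Six observed variables driven by two hidden factors (synthetic data)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + 0.3 * rng.normal(size=(200, 6))

fa = FactorAnalysis(n_components=2).fit(X)

# Factor loadings: how strongly each observed variable loads on each factor
print(fa.components_.round(2))

# Project observations onto the two extracted factors
scores = fa.transform(X)
print(scores.shape)  # (200, 2)
```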
WHY
Dimensionality Reduction: Decreases the number of variables to work with.
Identify Latent Constructs: Reveals hidden dimensions like personality traits or customer
satisfaction.
Simplify Data: Helps summarize data for better visualization and understanding.
Variable Selection: Identifies which variables contribute most to key factors.
Improve Models: Reduces multicollinearity and noise in predictive modeling.
APPLICATIONS
Psychology and social sciences (e.g., measuring personality or attitudes).
Marketing (e.g., understanding customer preferences).
Finance (e.g., analyzing market factors influencing asset prices).
Health sciences (e.g., grouping symptoms into syndromes).
CLUSTER ANALYSIS
Cluster analysis is a statistical method used to group a set of objects or data points into clusters based on
their similarities, where objects within a cluster are more similar to each other than to those in other
clusters.
It is an unsupervised learning technique—meaning no predefined labels or categories are needed.
The goal is to discover natural groupings or structures in data.
Clusters formed have high internal similarity and are well-separated from other clusters.
WORKS
1. Choose a clustering method, such as k-means, hierarchical, or two-step clustering, depending on the data size and goals.
2. Select the number of clusters, or let the algorithm determine the best number based on data patterns.
3. Choose variables/features to include for measuring similarity.
4. Calculate similarity or distance between data points using metrics such as Euclidean distance,
Manhattan distance, or cosine similarity.
5. Assign data points into clusters where the total intra-cluster distance is minimized.
6. Visualize and interpret the clusters using plots or dendrograms.
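A minimal k-means sketch with scikit-learn, following the steps above; the 2-D points are synthetic stand-ins for real features.

```python
# A minimal k-means clustering sketch with scikit-learn, following
# the steps above. The 2-D points are synthetic.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two loose blobs of points (synthetic data)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[3, 3], scale=0.5, size=(50, 2)),
])

# Step 2: choose k = 2; k-means minimizes the total intra-cluster
# (squared Euclidean) distance to the cluster centers
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("cluster labels:", km.labels_[:10])
print("cluster centers:", km.cluster_centers_.round(2))
print("total intra-cluster distance (inertia):", round(km.inertia_, 2))
```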
COMMON APPLICATIONS
Marketing: Customer segmentation to tailor marketing strategies.
Healthcare: Grouping patients by symptoms to assist in diagnosis.
Biology: Classifying species based on traits.
Retail: Market basket analysis to find product groupings.
PRINCIPAL COMPONENT ANALYSIS (PCA)
Principal Component Analysis (PCA) is a statistical technique used to reduce the dimensionality of large
datasets while retaining most of the original information. It transforms potentially correlated variables into
a smaller set of new, uncorrelated variables called principal components, which capture the maximum
variance in the data.
PCA creates new variables (principal components) that are linear combinations of original variables.
The first principal component explains the largest amount of variation in the dataset.
Each subsequent component explains the next highest amount of variation, with the constraint
that it is uncorrelated (orthogonal) to the previous components.
By using a few principal components, you reduce complexity while preserving essential
information.
PCA WORKS (STEP-BY-STEP)
1. Standardize Data: Variables are scaled to have zero mean and unit variance so that they are comparable.
2. Compute Covariance Matrix: Shows how variables vary together.
3. Calculate Eigenvalues and Eigenvectors: Eigenvectors determine the direction of principal
components; eigenvalues measure the amount of variance explained.
4. Select Principal Components: Choose components with the highest eigenvalues (variance
explained).
5. Transform Data: Original data projected into new components to create a smaller, simplified
dataset.
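A minimal PCA sketch with scikit-learn that mirrors the steps above (standardize, then extract components ordered by explained variance); the four correlated variables are synthetic.

```python
# A minimal PCA sketch with scikit-learn, mirroring the steps above
# (standardize, then extract components). The data are synthetic.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Four correlated variables built from two independent sources
base = rng.normal(size=(100, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] * 2 + 0.1 * rng.normal(size=100),
    base[:, 1],
    base[:, 1] * -1 + 0.1 * rng.normal(size=100),
])

# Step 1: standardize
X_std = StandardScaler().fit_transform(X)

# Steps 2-4: PCA computes the covariance structure internally and
# orders components by explained variance (eigenvalues)
pca = PCA(n_components=2).fit(X_std)
print("variance explained:", pca.explained_variance_ratio_.round(3))

# Step 5: project the data onto the first two principal components
X_reduced = pca.transform(X_std)
print(X_reduced.shape)  # (100, 2)
```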
MANOVA (MULTIVARIATE ANALYSIS OF VARIANCE)
Multivariate Analysis of Variance (MANOVA) is a statistical technique used to compare the means of
multiple dependent variables across two or more groups defined by one or more categorical independent
variables. It extends the Analysis of Variance (ANOVA) by evaluating several outcome variables
simultaneously rather than one at a time.
MANOVA tests if the group means on a combination of dependent variables differ significantly.
Considers correlations and interrelations among dependent variables while assessing group differences.
Commonly used when researchers want to understand how multiple outcomes vary by group or
treatment type.
MANOVA WORKS
Tests the null hypothesis that the mean vectors of all dependent variables are equal across groups.
Calculates test statistics such as Wilks’ Lambda, Pillai’s Trace, Hotelling’s Trace, and Roy’s Largest
Root to determine significance.
Assumes multivariate normality, homogeneity of covariance matrices, independence of
observations, and linear relationships among variables.
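A minimal one-way MANOVA sketch using statsmodels, testing whether group means on two dependent variables differ; the data are synthetic stand-ins.

```python
# A minimal MANOVA sketch with statsmodels: do the group means on two
# dependent variables (y1, y2) differ across a categorical group?
# The data here are synthetic stand-ins.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 30

df = pd.DataFrame({
    "group": ["A"] * n + ["B"] * n,
    "y1": np.concatenate([rng.normal(0, 1, n), rng.normal(1, 1, n)]),
    "y2": np.concatenate([rng.normal(0, 1, n), rng.normal(0.5, 1, n)]),
})

# One-way MANOVA: both dependent variables modeled against 'group'
maov = MANOVA.from_formula("y1 + y2 ~ group", data=df)

# Reports Wilks' lambda, Pillai's trace, Hotelling-Lawley trace,
# and Roy's greatest root with their p-values
print(maov.mv_test())
```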
TYPES OF MANOVA
One-Way MANOVA: Compares groups based on one independent variable.
Two-Way MANOVA: Examines effects of two independent variables and their interaction.
Repeated Measures MANOVA: Measures same subjects multiple times under different conditions.