Objective: Segmentation
Decision Tree
• The goal is to create a model that predicts the value of a target variable
based on several input variables.
• Each interior node corresponds to one of the input variables; there are
edges to children for each of the possible values of that input variable.
• Each leaf represents a value of the target variable given the values of
the input variables represented by the path from the root to the leaf.
• Classification tree analysis – the predicted target is a class label
• Regression tree analysis – the predicted target is a real number
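To make the structure concrete, here is a minimal sketch of fitting a classification tree with scikit-learn (one of the tools listed later); the iris data set is used purely for illustration:

```python
# Minimal sketch: fitting a classification tree with scikit-learn
# (assumes scikit-learn is installed; iris is used purely for illustration).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Each interior node tests one input variable; each leaf predicts a class label.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned root-to-leaf rules as text.
print(export_text(clf, feature_names=load_iris().feature_names))
```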
Decision Tree Construction
Specific decision-tree algorithms:
ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification And Regression Tree)
CHAID (CHI-squared Automatic Interaction Detector). Performs multi-
level splits
MARS: extends decision trees to handle numerical data better.
Conditional Inference Trees: a statistics-based approach that uses non-parametric tests as splitting criteria, corrected for multiple testing to avoid overfitting
Advantages of Decision Trees
• Simple to understand and interpret
• Requires little data preparation
• Able to handle both numerical and categorical data
• Uses a white box model
• Possible to validate a model using statistical tests
• Robust
• Performs well with large datasets
Tools to construct Decision Trees
Salford Systems CART (which licensed the proprietary code of the original CART
authors)
IBM SPSS Modeler
Rapid Miner
SAS Enterprise Miner
Matlab
R (an open-source software environment for statistical computing which includes
several CART implementations such as the rpart, party and randomForest packages)
Weka (a free and open-source data mining suite, contains many decision tree
algorithms)
Orange (a free data mining software suite, which includes the tree module orngTree)
KNIME
Microsoft SQL Server
Scikit-learn
CHAID: CHI-squared Automatic Interaction
Detector
• Morgan and Sonquist (1963): AID – Automatic Interaction Detection
• Stepwise splitting
• One split of k categories into two groups – 2^(k−1) − 1 possible splits (checked in the short sketch after this list)
• Kass (1980) proposed a modification of AID called CHAID for categorized dependent and independent variables.
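A quick way to sanity-check that split count (a plain-Python sketch, not part of AID/CHAID itself):

```python
# Quick check of the split count used by AID: the number of ways to divide
# k categories into two non-empty groups is 2**(k - 1) - 1.
from itertools import combinations

def binary_splits(categories):
    """Enumerate all splits of a category set into two non-empty groups."""
    cats = list(categories)
    splits = []
    # Fix the first category on the left-hand side to avoid counting mirror images twice.
    rest = cats[1:]
    for r in range(len(rest) + 1):
        for subset in combinations(rest, r):
            left = [cats[0], *subset]
            right = [c for c in cats if c not in left]
            if right:                      # both groups must be non-empty
                splits.append((left, right))
    return splits

k = 4
print(len(binary_splits(range(k))), 2 ** (k - 1) - 1)   # both print 7
```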
Key features
1. Categorical Variables: designed for categorical (or categorized) targets and predictors
2. Chi-squared Test: used to determine the best split at each node
3. Multiple Branches: a node may be split into more than two children
4. Merging Categories: categories of a variable are merged if no statistically significant difference is found between them
5. Stopping Criteria: no statistically significant splits are found, or the minimum node size is reached
6. Applications: marketing (segmenting customers), risk assessment, predicting response rates, etc.
7. Visualization: the tree structure highlights the hierarchy of significant variables that lead to different segments
Algorithm – Step 1
Dividing the cases that reach a certain node in the
tree
1. Cross-tabulate the response variable (target) with each of the explanatory variables. For example:

            Gender=Male   Gender=Female
   Yes           12              0
   No             1             13
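For this cross-tab, the Pearson chi-squared statistic can be computed directly; the sketch below assumes SciPy is available, though any chi-squared routine would do:

```python
# Sketch of step 1 for the cross-tab shown above, using SciPy's chi-squared
# test of independence.
import numpy as np
from scipy.stats import chi2_contingency

#                 Gender=Male  Gender=Female
table = np.array([[12,          0],           # response = Yes
                  [ 1,         13]])          # response = No

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, p-value = {p:.4g}")  # very small p-value: Gender is a strong splitter
```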
Algorithm – Step 2
2. When there are more than two columns, find the "best" subtable formed by combining column categories.
2.1 This is applied to each table with more than 2 columns.
2.2 Compute Pearson χ² tests of independence for each allowable subtable.
2.3 Look for the smallest χ² value. If it is not significant, combine the column categories.
2.4 Repeat step 2 if the new table has more than two columns.
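A rough sketch of this merging loop, assuming the node's data has already been cross-tabulated into a responses-by-categories array and that SciPy is available (merge_columns is an illustrative helper name, not a standard API):

```python
# Rough sketch of step 2: repeatedly merge the pair of columns whose 2-column
# subtable has the least significant Pearson chi-squared statistic.
import numpy as np
from itertools import combinations
from scipy.stats import chi2_contingency

def merge_columns(table, labels, alpha_merge=0.05):
    """table: 2-D array (responses x categories); labels: column category names."""
    table = np.asarray(table, dtype=float)
    labels = [[l] for l in labels]                  # each column starts as its own group
    while table.shape[1] > 2:
        best = None
        for i, j in combinations(range(table.shape[1]), 2):
            sub = table[:, [i, j]]                  # allowable 2-column subtable
            chi2, p, _, _ = chi2_contingency(sub, correction=False)
            if best is None or chi2 < best[0]:
                best = (chi2, p, i, j)
        chi2, p, i, j = best
        if p < alpha_merge:                         # smallest chi2 is still significant: stop merging
            break
        table[:, i] += table[:, j]                  # combine the two column categories
        table = np.delete(table, j, axis=1)
        labels[i] += labels[j]
        del labels[j]
    return table, labels
```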
Algorithm – Step 3
3. Allow categories combined at step 2 to be broken apart.
3.1 For each compound category consisting of at least 3 of the original categories, find the "most significant" binary split.
3.2 If χ² is significant, implement the split and return to step 2.
3.3 Otherwise retain the compound categories for this variable, and move on to the next variable.
Algorithm – Step 4
4. You have now completed the "optimal" combining of categories for each explanatory variable.
4.1 Compute a Bonferroni-adjusted chi-squared test of independence for the reduced table of each explanatory variable.
4.2 Find the most significant of these "optimally" merged explanatory variables.
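One common form of the adjustment (an assumption here, following Kass's multiplier for nominal predictors) multiplies the raw p-value by the number of ways the original categories could have been reduced to the merged groups:

```python
# Sketch of the Bonferroni adjustment in step 4, assuming Kass's multiplier for a
# nominal ("free") predictor: the number of ways to reduce c original categories
# to r merged groups, i.e. the Stirling number of the second kind S(c, r).
from math import comb, factorial

def stirling2(c, r):
    """Number of ways to partition c categories into r non-empty groups."""
    return sum((-1) ** i * comb(r, i) * (r - i) ** c for i in range(r + 1)) // factorial(r)

def bonferroni_adjusted_p(p_value, c_original, r_merged):
    """Multiply the raw p-value by the number of possible category reductions."""
    multiplier = stirling2(c_original, r_merged)
    return min(1.0, multiplier * p_value)

# Example: a predictor with 5 original categories merged down to 3 groups.
print(stirling2(5, 3))                       # 25 possible reductions
print(bonferroni_adjusted_p(0.001, 5, 3))    # 0.025
```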
Algorithm – Step 5
5. Use the "most significant" variable from step 4 to split the node with respect to the merged categories for that variable.
5.1 Repeat steps 1–5 for each of the offspring nodes.
5.2 Stop if
• no variable is significant in step 4, or
• the number of cases reaching a node is below a specified limit.
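Putting steps 1–5 together, a highly simplified skeleton of the recursion might look as follows; adjusted_p and split_by_merged_groups are hypothetical placeholders for the operations described above, not a real library:

```python
# Highly simplified skeleton of the CHAID recursion (sketch only; the helper
# names are hypothetical placeholders for the steps described above).
ALPHA_SPLIT = 0.05       # significance level for splitting
MIN_NODE_SIZE = 30       # minimum number of cases in a node

def grow_chaid(cases, target, predictors):
    if len(cases) < MIN_NODE_SIZE:
        return {"leaf": True, "cases": cases}                 # stopping rule: node too small

    # Steps 1-4: optimally merge the categories of every predictor and keep its
    # Bonferroni-adjusted p-value for the reduced cross-tab with the target.
    candidates = [(adjusted_p(cases, target, x), x) for x in predictors]
    best_p, best_x = min(candidates)

    if best_p > ALPHA_SPLIT:
        return {"leaf": True, "cases": cases}                 # stopping rule: nothing significant

    # Step 5: split on the merged categories of the most significant predictor
    # and grow each offspring node recursively.
    children = {group: grow_chaid(subset, target, predictors)
                for group, subset in split_by_merged_groups(cases, best_x)}
    return {"leaf": False, "split_on": best_x, "children": children}
```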
CART – Classification and Regression Tree
• The CART algorithm was introduced in Breiman et al. (1984).
• A CART tree is a binary decision tree that is constructed by splitting a
node into two child nodes repeatedly, beginning with the root node
that contains the whole learning sample.
• The CART growing method attempts to maximize within-node
homogeneity.
• Gini Index – impurity measure
Gini Index
• Another sensible measure of impurity (i and j are classes):
  Gini = Σ_{i≠j} p(i) · p(j) = 1 − Σ_i p(i)²
• After applying attribute A, the resulting Gini index is the size-weighted sum over the partitions S_v induced by A:
  Gini_A = Σ_v (|S_v| / |S|) · Gini(S_v)
• Gini can be interpreted as the expected error rate
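The two formulas translate directly into code; the class counts below are hypothetical, chosen only to illustrate the calculation:

```python
# Direct implementation of the two formulas above (plain-Python sketch).
def gini(labels):
    """Gini impurity of a node: 1 - sum_i p(i)^2."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def gini_after_split(partitions):
    """Gini index after applying an attribute: the size-weighted sum of the
    Gini impurities of the resulting partitions S_v."""
    n = sum(len(p) for p in partitions)
    return sum(len(p) / n * gini(p) for p in partitions)

node = ["triangle"] * 5 + ["square"] * 9            # hypothetical class counts
by_color = [["triangle"] * 3 + ["square"] * 2,      # hypothetical split, e.g. by color
            ["triangle"] * 2 + ["square"] * 3,
            ["square"] * 4]
print(gini(node))                                   # impurity before the split
print(gini(node) - gini_after_split(by_color))      # gain of the Gini index
```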
Gini Index – Example
Attributes: color, border, dot
Classification: triangle, square
[Figure of the sample objects (triangles and squares) omitted]
Gini Index for Color
[Figure omitted: the sample split by Color into red, green, and yellow branches]
Gain of Gini Index
GiniGain(A) = Gini(S) − Gini_A(S)
[Worked computation for the example omitted]
Regression Tree
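A regression tree predicts a real number at each leaf (with squared-error splitting, the mean of the training cases that reach it). A minimal scikit-learn sketch on illustrative data:

```python
# Minimal regression tree sketch: each leaf predicts the mean target value of
# the training cases that reach it (illustrative synthetic data).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)   # noisy 1-D target

reg = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(reg.predict([[2.5], [7.5]]))                      # piecewise-constant predictions
```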
Overfitting – Pruning
• In order to fit the data (even noisy data), the model
keeps generating new nodes and ultimately the tree
becomes too complex to interpret.
Pre- vs Post-pruning
• Pre-pruning – stop the tree from growing too far by setting hyperparameters such as max_depth, min_samples_leaf, and min_samples_split
• Post-pruning – grow the full tree and then prune it back; this may slightly increase the training error but drastically decrease the testing error
Tree Score = SSR + alpha * T, where alpha is a tuning parameter chosen by cross-validation, SSR is the sum of squared residuals, and T is the number of leaves (so alpha * T acts as the tree complexity penalty)
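Both pruning styles can be sketched with scikit-learn (parameter values and the diabetes data set are illustrative assumptions); ccp_alpha plays the role of alpha in the tree score above and is chosen by cross-validation:

```python
# Sketch of pre- and post-pruning with scikit-learn (illustrative values only).
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# Pre-pruning: cap the tree while it is being grown.
pre = DecisionTreeRegressor(max_depth=4, min_samples_leaf=10, min_samples_split=20).fit(X, y)

# Post-pruning: compute the candidate alphas of minimal cost-complexity pruning
# (the slides' tree score: SSR + alpha * T) and pick alpha by cross-validation.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = np.clip(path.ccp_alphas, 0.0, None)        # guard against tiny negative values
search = GridSearchCV(DecisionTreeRegressor(random_state=0),
                      {"ccp_alpha": list(alphas)}, cv=5)
search.fit(X, y)
print(search.best_params_)
```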