We Now Discuss the Process of Building a Regression Tree
1. We divide the predictor space — that is, the set of possible values for X1, X2, ..., Xp — into J distinct and non-overlapping regions, R1, R2, ..., RJ.
2. For every observation that falls into the region Rj, we make the same prediction, which is simply the mean of the response values for the training observations in Rj.
For instance, suppose that in Step 1 we obtain two regions, R1 and R2, and that the response
mean of the training observations in the first region is 10, while the response mean of the
training observations in the second region is 20. Then for a given observation X = x, if x ∈ R1 we
will predict a value of 10, and if x ∈ R2 we will predict a value of 20.
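As a concrete illustration of Step 2, here is a minimal Python sketch of this two-region prediction rule. The data, the single predictor X1, and the hand-picked cutpoint of 0 are illustrative assumptions, not part of the text's example beyond the region means of 10 and 20.

```python
import numpy as np

# Made-up training data: one predictor X1 and a response y.  The cutpoint
# (X1 < 0 versus X1 >= 0) is fixed by hand purely for illustration.
X1 = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([9.0, 10.0, 11.0, 19.0, 20.0, 21.0])

# The region means are the predictions for R1 = {X | X1 < 0} and R2 = {X | X1 >= 0}.
mean_R1 = y[X1 < 0].mean()     # 10.0
mean_R2 = y[X1 >= 0].mean()    # 20.0

def predict(x1):
    """Step 2: every observation falling in a region gets that region's mean."""
    return mean_R1 if x1 < 0 else mean_R2

print(predict(-0.3))   # x is in R1, so we predict 10.0
print(predict(1.7))    # x is in R2, so we predict 20.0
```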
We now elaborate on Step 1 above. How do we construct the regions R1,...,RJ ? In theory, the
regions could have any shape. However, we choose to divide the predictor space into high-
dimensional rectangles, or boxes, for simplicity and for ease of interpretation of the resulting
predictive model. The goal is to find boxes R1, ..., RJ that minimize the RSS, given by

$$\sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2,$$

where $\hat{y}_{R_j}$ is the mean response for the training observations within the jth box.
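To make the RSS formula concrete, the sketch below computes it for a given partition. It assumes the partition is encoded as an array of region labels, one per training observation; that encoding and the toy data are my own illustrative choices, not from the text.

```python
import numpy as np

def rss(y, region):
    """Sum over boxes j of sum_{i in R_j} (y_i - ybar_{R_j})^2, where
    region[i] labels the box containing training observation i."""
    total = 0.0
    for r in np.unique(region):
        y_r = y[region == r]
        total += ((y_r - y_r.mean()) ** 2).sum()
    return total

# Toy partition: observations 0-2 fall in box 1, observations 3-5 in box 2.
y = np.array([9.0, 10.0, 11.0, 19.0, 20.0, 21.0])
region = np.array([1, 1, 1, 2, 2, 2])
print(rss(y, region))   # (1 + 0 + 1) + (1 + 0 + 1) = 4.0
```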
Unfortunately, it is computationally infeasible to consider every possible partition of the
feature space into J boxes. For this reason, we take a top-down, greedy approach that is
known as recursive binary splitting. The approach is top-down because it begins at the top of
the tree (at which point all observations belong to a single region) and then successively splits
the predictor space; each split is indicated via two new branches further down on the tree. It is
greedy because at each step of the tree-building process, the best split is made at that
particular step, rather than looking ahead and picking a split that will lead to a better tree in
some future step.
In order to perform recursive binary splitting, we first select the predictor Xj and the cutpoint s such that splitting the predictor space into the regions {X | Xj < s} and {X | Xj ≥ s} leads to the greatest possible reduction in RSS. (The notation {X | Xj < s} means the region of predictor
space in which Xj takes on a value less than s.) That is, we consider all predictors X1,...,Xp, and
all possible values of the cutpoint s for each of the predictors, and then choose the predictor
and cutpoint such that the resulting tree has the lowest RSS. In greater detail, for any j and s,
we define the pair of half-planes

$$R_1(j, s) = \{X \mid X_j < s\} \quad \text{and} \quad R_2(j, s) = \{X \mid X_j \ge s\},$$

and we seek the value of j and s that minimize

$$\sum_{i:\, x_i \in R_1(j,s)} \left( y_i - \hat{y}_{R_1} \right)^2 + \sum_{i:\, x_i \in R_2(j,s)} \left( y_i - \hat{y}_{R_2} \right)^2,$$

where $\hat{y}_{R_1}$ is the mean response for the training observations in $R_1(j, s)$, and $\hat{y}_{R_2}$ is the mean response for the training observations in $R_2(j, s)$. Finding the values of j and s that minimize this quantity
can be done quite quickly, especially when the number of features p is not too large. Next, we
repeat the process, looking for the best predictor and best cutpoint in order to split the data
further so as to minimize the RSS within each of the resulting regions. However, this time,
instead of splitting the entire predictor space, we split one of the two previously identified
regions. We now have three regions. Again, we look to split one of these three regions further,
so as to minimize the RSS. The process continues until a stopping criterion is reached; for
instance, we may continue until no region contains more than five observations.
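The following is a rough Python sketch of recursive binary splitting as just described: an exhaustive search over every predictor j and candidate cutpoint s for the split that most reduces RSS, applied recursively to each resulting region until the size-based stopping rule is met. The function names and the dictionary representation of the tree are illustrative choices, not a reference implementation.

```python
import numpy as np

def best_split(X, y):
    """Exhaustive greedy search over predictors j and cutpoints s for the
    split into {X | X_j < s} and {X | X_j >= s} with the smallest RSS."""
    best = None                                   # (rss, j, s)
    n_obs, n_features = X.shape
    for j in range(n_features):
        # RSS only changes when s crosses an observed value of X_j,
        # so the observed values suffice as candidate cutpoints.
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or rss < best[0]:
                best = (rss, j, s)
    return best                                   # None if no valid split exists

def grow_tree(X, y, min_size=5):
    """Recursive binary splitting: keep splitting any region with more than
    min_size observations, then recurse into the two halves."""
    split = best_split(X, y) if len(y) > min_size else None
    if split is None:
        return {"prediction": y.mean()}           # leaf: predict the region mean
    _, j, s = split
    mask = X[:, j] < s
    return {"feature": j, "cutpoint": s,
            "left": grow_tree(X[mask], y[mask], min_size),
            "right": grow_tree(X[~mask], y[~mask], min_size)}
```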
Once the regions R1,...,RJ have been created, we predict the response for a given test
observation using the mean of the training observations in the region to which that test
observation belongs.
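To see this fit-then-predict-by-region-mean behaviour end to end, scikit-learn's DecisionTreeRegressor grows a regression tree by recursive binary splitting and predicts with leaf means. The snippet below is a sketch on made-up data; min_samples_split=6 is only a rough stand-in for the "no region with more than five observations" rule above, since scikit-learn's stopping criteria are not identical to it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up training data: two predictors, response near 10 when X1 < 0
# and near 20 when X1 >= 0, echoing the two-region example above.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.where(X[:, 0] < 0, 10.0, 20.0) + rng.normal(scale=0.5, size=200)

# min_samples_split=6 keeps splitting any region with more than five
# observations, loosely mirroring the stopping rule mentioned above.
tree = DecisionTreeRegressor(min_samples_split=6).fit(X, y)

x_test = np.array([[-1.2, 0.4]])        # falls on the X1 < 0 side
print(tree.predict(x_test))             # close to 10, the mean of its region
```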