Decision Trees and Boosting: Helge Voss (MPI-K, Heidelberg) TMVA Workshop

This document discusses decision trees and boosted decision trees for classification. It contains the following key points: 1) Decision trees use sequential cuts to split data into nodes, with leaf nodes classifying events as signal or background. 2) Boosted decision trees combine many decision trees derived from the same sample using different event weights to overcome stability problems in single trees. 3) AdaBoost adaptively reweights events misclassified by previous trees, giving higher weight to difficult events, and also weights trees based on their individual error rates.


Decision Trees and Boosting

Helge Voss (MPI–K, Heidelberg)

TMVA Workshop, CERN, 21 Jan 2011

Helge Voss TMVA-Workshop, CERN, 21. January 2011 ― Decision Trees and Boosting 2
Boosted Decision Trees
• Decision Tree: sequential application of cuts splits the data into nodes, where the final nodes (leaves) classify an event as signal or background
  • used for a long time in general “data-mining” applications, less known in (High Energy) Physics
  • similar to “simple cuts”: each leaf node is a set of cuts → many boxes in phase space, each attributed either to signal or to background
  • independent of monotonic variable transformations, immune against outliers
  • weak variables are ignored (and don’t (much) deteriorate performance)
  • Disadvantage → very sensitive to statistical fluctuations in the training data

• Boosted Decision Trees (1996): combine a whole forest of Decision Trees, derived from the same sample, e.g. using different event weights
  • overcomes the stability problem
  • became popular in HEP since MiniBooNE, B. Roe et al., NIM 543 (2005)
Growing a Decision Tree
• start with the training sample at the root node
• split the training sample at a node into two, using a cut in the variable that gives the best separation gain
• continue splitting until:
  • minimal #events per node
  • maximum number of nodes
  • maximum depth specified
  • a split doesn’t give a minimum separation gain
• leaf nodes classify S, B according to the majority of events, or give a S/B probability

• Why no multiple branches (splits) per node?
  • fragments data too quickly; also: multiple splits per node = series of binary node splits
• What about multivariate splits?
  • time consuming
  • other methods are more adapted for such correlations
  • we’ll see later that for “boosted” DTs weak (dull) classifiers are often better anyway
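The growing procedure above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `grow`, `best_cut`, `classify`, `max_depth`, `min_events` and `min_gain` are hypothetical (they are not TMVA options), the Gini index is assumed as the separation measure, and the "maximum number of nodes" criterion is left out for brevity.

```python
def gini(n_sig, n_bkg):
    """Gini index p(1-p) with p = signal purity; 0 for an empty node."""
    n = n_sig + n_bkg
    if n == 0:
        return 0.0
    p = n_sig / n
    return p * (1.0 - p)


def best_cut(events):
    """Scan all variables and all cut values; return (gain, var, cut) or None.

    events: list of (features, label) with label 1 = signal, 0 = background.
    """
    n = len(events)
    n_sig = sum(label for _, label in events)
    parent = n * gini(n_sig, n - n_sig)
    best = None
    for var in range(len(events[0][0])):
        for cut in sorted({x[var] for x, _ in events}):
            left = [lab for x, lab in events if x[var] <= cut]
            right = [lab for x, lab in events if x[var] > cut]
            if not left or not right:
                continue
            gain = (parent
                    - len(left) * gini(sum(left), len(left) - sum(left))
                    - len(right) * gini(sum(right), len(right) - sum(right)))
            if best is None or gain > best[0]:
                best = (gain, var, cut)
    return best


def grow(events, depth=0, max_depth=3, min_events=2, min_gain=1e-9):
    """Split recursively until one of the stopping criteria is met."""
    n_sig = sum(label for _, label in events)
    leaf = {"leaf": True, "purity": n_sig / len(events)}
    if depth >= max_depth or len(events) < 2 * min_events:
        return leaf
    found = best_cut(events)
    if found is None or found[0] < min_gain:
        return leaf
    _, var, cut = found
    left = [e for e in events if e[0][var] <= cut]
    right = [e for e in events if e[0][var] > cut]
    # simplification: give up if the best cut leaves a node that is too small
    if len(left) < min_events or len(right) < min_events:
        return leaf
    return {"leaf": False, "var": var, "cut": cut,
            "left": grow(left, depth + 1, max_depth, min_events, min_gain),
            "right": grow(right, depth + 1, max_depth, min_events, min_gain)}


def classify(tree, x):
    """Follow the cuts down to a leaf; 1 (signal) if the leaf purity > 0.5."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["var"]] <= tree["cut"] else tree["right"]
    return 1 if tree["purity"] > 0.5 else 0


# Made-up toy sample with one variable: background at low, signal at high values
events = [((0.1,), 0), ((0.2,), 0), ((0.8,), 1), ((0.9,), 1)]
tree = grow(events)
print(tree["var"], tree["cut"], classify(tree, (0.85,)))
```

Each leaf stores the signal purity, so the same structure can either return a majority decision (as `classify` does) or the S/B probability itself.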
Separation Gain
• What do we mean by “best separation gain”?
  define a measure of how mixed S and B are in a node (p = purity):
  • Misclassification error: $1 - \max(p, 1-p)$
  • Gini index: $p(1-p)$ (Corrado Gini 1912, typically used to measure income inequality)
  • Cross entropy: $-(p \ln p + (1-p) \ln(1-p))$
  [Figure: cross entropy, Gini index and misclassification error as functions of the purity]
• differences between the various indices are small; the Gini index is most commonly used
• separation gain, e.g.: $N_{\text{parent}} \cdot \text{Gini}_{\text{parent}} - N_{\text{left}} \cdot \text{Gini}_{\text{left}} - N_{\text{right}} \cdot \text{Gini}_{\text{right}}$
• consider all variables and all possible cut values
  • select the variable and cut that maximise the separation gain
Separation Gain
• Misclassification error → sort of the classical way to choose a cut
BUT:
[Figure: cumulative distributions]
• There are cases where the simple “misclassification” does not have any optimum at all!
• Another example: S=400, B=400 → (S=300, B=100) (S=100, B=300) or (S=200, B=0) (S=200, B=400)
  → equal in terms of misclassification error, but Gini index/entropy favour the latter
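The S=400, B=400 example can be checked numerically. The sketch below evaluates the three indices and the separation gain N_parent·I(parent) − Σ N_daughter·I(daughter) for both candidate splits; the function names are illustrative only.

```python
import math


def misclassification(p):
    return 1.0 - max(p, 1.0 - p)


def gini(p):
    return p * (1.0 - p)


def cross_entropy(p):
    # p*ln(p) -> 0 for p -> 0, so a pure node contributes nothing
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def separation_gain(index, parent, daughters):
    """Nodes are (S, B) event counts; the purity is p = S / (S + B)."""
    def weighted(node):
        s, b = node
        return (s + b) * index(s / (s + b))
    return weighted(parent) - sum(weighted(d) for d in daughters)


parent = (400, 400)
split_1 = [(300, 100), (100, 300)]
split_2 = [(200, 0), (200, 400)]

for index in (misclassification, gini, cross_entropy):
    g1 = separation_gain(index, parent, split_1)
    g2 = separation_gain(index, parent, split_2)
    print(f"{index.__name__:18s} split 1: {g1:7.2f}   split 2: {g2:7.2f}")
```

The misclassification error gives the same gain (200) for both splits, while the Gini index (50 vs. about 66.7) and the cross entropy prefer the second split, which produces one pure node.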
Decision Tree Pruning
• One can continue node splitting until all leaf nodes are basically pure (using the training sample)
  obviously: that’s overtraining
• Two possibilities:
  • stop growing earlier
    generally not a good idea: even useless splits might open up subsequent useful splits
  • grow the tree to the end and “cut back” nodes that seem statistically dominated → pruning
• e.g. Cost Complexity pruning:
  • assign to every sub-tree T a cost
    $C(T,\alpha) = \sum_{\text{leaves of } T} \; \sum_{\text{events in leaf}} |y(x) - y(C)| + \alpha N_{\text{leaf nodes}}$
    (first term: loss function; $\alpha$: regularisation/cost parameter)
  • find the subtree T with minimal $C(T,\alpha)$ for a given $\alpha$
  • use subsequent weakest-link pruning
• which cost parameter $\alpha$?
  • large enough to avoid overtraining
  • tuning parameter or “cross validation” (still to come in TMVA, hopefully soon…)
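To illustrate how the cost parameter steers the choice of subtree, here is a small sketch. The candidate subtrees and their numbers are entirely made up; each is summarised only by its training loss (the first term of the cost) and its number of leaf nodes.

```python
def cost_complexity(loss, n_leaves, alpha):
    """C(T, alpha) = loss summed over the leaves of T + alpha * N_leaf_nodes."""
    return loss + alpha * n_leaves


def best_subtree(subtrees, alpha):
    """subtrees: dict name -> (training loss, number of leaf nodes)."""
    return min(subtrees, key=lambda name: cost_complexity(*subtrees[name], alpha))


# Hypothetical candidates: bigger trees fit the training sample better
subtrees = {
    "full tree": (2.0, 20),
    "pruned": (6.0, 6),
    "stump": (15.0, 2),
}

for alpha in (0.0, 0.5, 3.0):
    print(f"alpha = {alpha}: {best_subtree(subtrees, alpha)}")
```

With alpha = 0 the fully grown tree always wins; increasing alpha penalises leaf nodes and moves the minimum towards smaller trees.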
Decision Tree Pruning
• “Real life” example of an optimally pruned Decision Tree:
[Figure: decision tree before pruning | decision tree after pruning]
• Pruning algorithms are developed and applied on individual trees
  • optimally pruned single trees are not necessarily optimal in a forest!
  • actually they tend to be TOO big when boosted, no matter how hard you prune!
Boosting
[Diagram: Training Sample → classifier C(0)(x) → re-weight → Weighted Sample → classifier C(1)(x) → re-weight → Weighted Sample → classifier C(2)(x) → re-weight → Weighted Sample → classifier C(3)(x) → re-weight → … → Weighted Sample → classifier C(m)(x)]
$y(x) = \sum_{i}^{N_{\text{Classifier}}} w_i \, C^{(i)}(x)$
Adaptive Boosting (AdaBoost)
[Diagram: Training Sample → classifier C(0)(x) → re-weight → Weighted Sample → classifier C(1)(x) → re-weight → Weighted Sample → classifier C(2)(x) → re-weight → Weighted Sample → classifier C(3)(x) → re-weight → … → Weighted Sample → classifier C(m)(x)]
• AdaBoost re-weights events misclassified by the previous classifier by:
  $\frac{1 - f_{err}}{f_{err}}$ with: $f_{err} = \frac{\text{misclassified events}}{\text{all events}}$
• AdaBoost also weights the classifiers, using the error rate of the individual classifier, according to:
  $y(x) = \sum_{i}^{N_{\text{Classifier}}} \log\left(\frac{1 - f_{err}^{(i)}}{f_{err}^{(i)}}\right) C^{(i)}(x)$
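The two AdaBoost weights (event boost weight and classifier weight) can be sketched with decision stumps on a single variable. This is a minimal illustration, not TMVA's implementation: the function names and the toy sample are made up, the stump output is assumed to be +1 (signal) or −1 (background), and the event weights are not renormalised because only weighted error fractions are used.

```python
import math


def stump_predict(x, cut, sign):
    """A decision stump: `sign` if x > cut, otherwise `-sign`."""
    return sign if x > cut else -sign


def train_stump(xs, ys, ws):
    """Find the single cut with the smallest weighted error fraction."""
    total = sum(ws)
    values = sorted(set(xs))
    # one cut below all events, plus one between each pair of neighbours
    cuts = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for cut in cuts:
        for sign in (+1, -1):
            wrong = sum(w for x, y, w in zip(xs, ys, ws)
                        if stump_predict(x, cut, sign) != y)
            if best is None or wrong / total < best[0]:
                best = (wrong / total, cut, sign)
    return best


def adaboost(xs, ys, n_trees):
    """Return a list of (classifier weight, cut, sign); labels are +1 / -1."""
    ws = [1.0] * len(xs)          # start with equal event weights
    forest = []
    for _ in range(n_trees):
        f_err, cut, sign = train_stump(xs, ys, ws)
        if f_err == 0.0 or f_err >= 0.5:
            break                 # boost weight undefined or useless
        boost = (1.0 - f_err) / f_err
        forest.append((math.log(boost), cut, sign))
        # misclassified events get their weight multiplied by the boost weight
        ws = [w * boost if stump_predict(x, cut, sign) != y else w
              for x, y, w in zip(xs, ys, ws)]
    return forest


def response(forest, x):
    """y(x) = sum_i log((1 - f_err_i) / f_err_i) * C_i(x)."""
    return sum(alpha * stump_predict(x, cut, sign) for alpha, cut, sign in forest)


# Toy sample (made up): signal (+1) on both sides of a background (-1) region
xs = [-0.8, -0.7, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.6, 0.7, 0.8, 0.9]
ys = [+1, +1, -1, -1, -1, -1, -1, -1, +1, +1, +1, +1]

forest = adaboost(xs, ys, n_trees=3)
for alpha, cut, sign in forest:
    print(f"weight {alpha:.3f}: predict {sign:+d} for x > {cut:.2f}")
```

No single stump can separate this sample, but in this toy case the sign of the weighted sum of three stumps is correct for every training event.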
Boosted Decision Trees
• Result of ONE Decision Tree for a test event is either “Signal” or “Background”
  • the tree gives a fixed signal efficiency and background rejection
• For a whole Forest however:
  y(B) → 0
  y(S) → 1
AdaBoost in Pictures
[Figure sequence: Start here: equal event weights → misclassified events get larger weights → … and so on]
Boosted Decision Trees – Control Plots

A very well behaved example
Boosted Decision Trees – Control Plots

A more “difficult” example

AdaBoost: A simple demonstration
The example (somewhat artificial… but nice for demonstration):
• Data file with three “bumps”
• Weak classifier (i.e. one single simple “cut” ↔ decision tree stump)
[Figure: decision tree stump with branches var(i) > x and var(i) <= x and leaves labelled B and S; cuts a) and b) marked on the distribution]
Two reasonable cuts:
a) Var0 > 0.5 → εsignal = 66%, εbkg ≈ 0%, misclassified events in total 16.5%
or
b) Var0 < -0.5 → εsignal = 33%, εbkg ≈ 0%, misclassified events in total 33%
The training of a single decision tree stump will find “cut a)”
AdaBoost: A simple demonstration
The first “tree”, choosing cut a), will give an error fraction: err = 0.165
• before building the next “tree”: weight wrongly classified training events by (1-err)/err ≈ 5
• the next “tree” sees essentially the following data sample:
[Figure: re-weighted distribution]
.. and hence will choose “cut b)”: Var0 < -0.5
The combined classifier: Tree1 + Tree2
the (weighted) average of the response to a test event from both trees is able to separate signal from background as well as one would expect from the most powerful classifier
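As a quick check of the numbers on this slide, the boost weight can be worked out explicitly. The second line is a general property of this re-weighting rather than something stated on the slide, and it assumes equal event weights before the boost: afterwards, the previously misclassified events carry as much total weight as the correctly classified ones.

```latex
\[
\frac{1-\mathrm{err}}{\mathrm{err}} = \frac{1-0.165}{0.165} = \frac{0.835}{0.165} \approx 5.06
\]
\[
\underbrace{0.165 \times 5.06}_{\text{misclassified, re-weighted}} \approx 0.835
= \underbrace{1 - 0.165}_{\text{correctly classified}}
\]
```

On the re-weighted sample, cut a) therefore has a weighted error fraction of 50%, which is why the next stump has to pick a different cut.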
AdaBoost: A simple demonstration
[Figure panels: Only 1 tree “stump” | Only 2 tree “stumps” with AdaBoost]
“A Statistical View of Boosting” (Friedman et al. 1998)
Boosted Decision Trees: two different interpretations
• give events that are “difficult to categorize” more weight and afterwards average the results of all classifiers that were obtained with different weights
• see each Tree as a “basis function” of a possible classifier
  • boosting or bagging is just a means to generate a set of “basis functions”
  • a linear combination of basis functions gives the final classifier, or: the final classifier is an expansion in the basis functions:
    $y(\alpha, x) = \sum_{i}^{\text{tree}} \alpha_i T_i(x)$
  • every “boosting” algorithm can be interpreted as optimising the loss function in a “greedy stagewise” manner
    • i.e. from the current point in the optimisation (e.g. the building of the decision tree forest):
    • choose the parameters for the next boost step (weights) such that one moves along the steepest gradient of the loss function
  • AdaBoost: “exponential loss function” = $\exp(-y_0 \, y(\alpha, x))$ where $y_0 = -1$ (bkg), $y_0 = 1$ (signal)
Gradient Boost
• Gradient Boost is a way to implement “boosting” with arbitrary “loss functions” by approximating “somehow” the gradient of the loss function
• AdaBoost: exponential loss $\exp(-y_0 \, y(\alpha, x))$ → theoretically sensitive to outliers
• Binomial log-likelihood loss $\ln(1 + \exp(-2 y_0 \, y(\alpha, x)))$ → better-behaved loss function (the corresponding “GradientBoost” is implemented in TMVA)
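The difference between the two loss functions shows up for badly misclassified events. The sketch below evaluates both for a few values of the classifier output y, taking a signal event (y0 = +1); the function names are illustrative.

```python
import math


def exponential_loss(y0, y):
    """AdaBoost loss: exp(-y0 * y)."""
    return math.exp(-y0 * y)


def binomial_log_likelihood_loss(y0, y):
    """ln(1 + exp(-2 * y0 * y))."""
    return math.log(1.0 + math.exp(-2.0 * y0 * y))


# y0 = +1 (signal); negative y means the event is misclassified
for y in (2.0, 0.0, -2.0, -5.0):
    print(f"y = {y:5.1f}   exponential: {exponential_loss(1, y):8.3f}   "
          f"log-likelihood: {binomial_log_likelihood_loss(1, y):8.3f}")
```

For a strongly misclassified event the exponential loss grows exponentially, while the log-likelihood loss grows only about linearly, so a single outlier dominates the optimisation much less.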
Bagging and Randomised Trees
other classifier combinations:
• Bagging:
  • combine trees grown from “bootstrap” samples (i.e. re-sample the training data with replacement)
• Randomised Trees (Random Forest: trademark L. Breiman, A. Cutler):
  • combine trees grown with:
    • random bootstrap samples (or subsets) of the training data only
    • consider at each node only a random subset of variables for the split
    • NO pruning!
• These combined classifiers work surprisingly well, are very stable and almost perfect “out of the box” classifiers
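The two sources of randomness mentioned above can be sketched with the standard library alone. The helper names are hypothetical, and the tree training itself is left out.

```python
import random


def bootstrap_sample(events, rng):
    """Re-sample the training data with replacement (same size as the input)."""
    return [rng.choice(events) for _ in events]


def random_variable_subset(n_variables, n_use, rng):
    """Pick the variables that are considered for the split at one node."""
    return sorted(rng.sample(range(n_variables), n_use))


rng = random.Random(1)          # fixed seed only to make the sketch repeatable
events = list(range(10))        # stand-in for 10 training events
sample = bootstrap_sample(events, rng)
subset = random_variable_subset(n_variables=5, n_use=2, rng=rng)
print(sample, subset)
```

Bagging uses only the first helper (one bootstrap sample per tree); randomised trees additionally call something like the second helper at every node.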
AdaBoost vs Bagging and Randomised Forests
Sometimes people present “boosting” as nothing else than just “smearing” in order to make the Decision Trees more stable w.r.t. statistical fluctuations in the training.

Clever “boosting” however can do more than, for example,
- Random Forests
- Bagging
as in this case (the previous example of “three bumps”) pure statistical fluctuations are not enough to enhance the 2nd peak sufficiently
[Figure: classifier response for the “three bumps” example; legend: AdaBoost]

however: a “fully grown decision tree” is much more than a “weak classifier”
→ the “stabilization” aspect is more important

Surprisingly: often using smaller trees (weaker classifiers) in AdaBoost and other clever boosting algorithms (e.g. gradient boost) seems to give significantly better overall performance!
Boosting at Work
• Boosting seems to work best on “weak” classifiers (i.e. small, dumb trees)
• Tuning (tree building) parameter settings are important
• For good out-of-the-box performance: large numbers of very small trees
Generalised Classifier Boosting
Principle (just as in BDT): multiple training cycles, each time wrongly classified events get a higher event weight
[Diagram: Training Sample → classifier C(0)(x) → re-weight → Weighted Sample → classifier C(1)(x) → re-weight → Weighted Sample → classifier C(2)(x) → re-weight → … → Weighted Sample → classifier C(m)(x)]
Response is the weighted sum of each classifier response:
$y(x) = \sum_{i}^{N_{\text{Classifier}}} \log\left(\frac{1 - f_{err}^{(i)}}{f_{err}^{(i)}}\right) C^{(i)}(x)$
Boosting might be interesting especially for simple (weak) methods like Cuts, Linear Discriminants, simple (small, few nodes) MLPs
AdaBoost On a linear Classifier (e.g. Fisher)

• Oops… there’s still a problem in TMVA’s generalized boosting. This example doesn’t work yet!
Boosting a Fisher Discriminant in TMVA…
• 100 Boosts of a “Fisher Discriminant”
  • as Multivariate Tree split (yes.. it is in TMVA although I argued against it earlier. I hoped to cope better with linear correlations that way…)
  • generalised boosting of Fisher classifier

Something isn’t quite correct yet!

[Figure panels: 1st Fisher cut | 2nd Fisher cut | 65th Fisher cut]
Learning with Rule Ensembles
• Following the RuleFit approach by Friedman-Popescu (Friedman-Popescu, Tech Rep, Stat. Dpt, Stanford U., 2003)
• Model is a linear combination of rules, where a rule is a sequence of cuts (i.e. a branch of a decision tree)
  RuleFit classifier:
  $y_{RF}(x) = a_0 + \sum_{m=1}^{M_R} a_m r_m(\hat{x}) + \sum_{k=1}^{n_R} b_k \hat{x}_k$
  • first sum: sum of rules; $r_m$ = rule (cut sequence → $r_m = 1$ if all cuts satisfied, $= 0$ otherwise)
  • second sum: linear Fisher term; $\hat{x}_k$ = normalised discriminating event variables
• The problem to solve is:
  • Create rule ensemble: use forest of decision trees
    • pruning removes topologically equal rules (same variables in cut sequence)
  • Add a “Fisher term” to capture linear correlations
  • Fit coefficients $a_m$, $b_k$: gradient direct regularization minimising Risk (Friedman et al.)
[Image caption: One of the elementary cellular automaton rules (Wolfram 1983, 2002). It specifies the next color in a cell, depending on its color and its immediate neighbors. Its rule outcomes are encoded in the binary representation $30 = 00011110_2$.]
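The structure of the RuleFit response (offset + sum of rules + linear term) can be sketched as follows. The rules, coefficients and variable values are invented for illustration, the event variables are assumed to be normalised already, and the actual fit of the coefficients is not shown.

```python
import math


def rule(cuts):
    """cuts: list of (variable index, lower, upper).

    Returns r(x) with r = 1 if all cuts are satisfied and 0 otherwise.
    """
    def r(x):
        return 1.0 if all(lo <= x[i] < hi for i, lo, hi in cuts) else 0.0
    return r


def rulefit_response(x, a0, rule_terms, linear_terms):
    """y_RF(x) = a0 + sum_m a_m * r_m(x) + sum_k b_k * x_k.

    rule_terms: list of (a_m, r_m); linear_terms: list of (k, b_k).
    """
    return (a0
            + sum(a * r(x) for a, r in rule_terms)
            + sum(b * x[k] for k, b in linear_terms))


# Hypothetical ensemble: two rules (tree branches) and one linear term
rule_terms = [
    (1.0, rule([(0, 0.5, math.inf)])),                   # x0 >= 0.5
    (0.3, rule([(0, -math.inf, 0.5), (1, 0.0, 1.0)])),   # x0 < 0.5 and 0 <= x1 < 1
]
linear_terms = [(1, 0.2)]                                # 0.2 * x1

print(rulefit_response((0.7, 0.4), -0.5, rule_terms, linear_terms))
print(rulefit_response((0.2, 0.4), -0.5, rule_terms, linear_terms))
```

Only the rules whose cuts are all satisfied contribute their coefficient, while the linear term contributes for every event.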
Regression Trees
• Rather than calling leaves Signal or Background, one could also give them “values” (i.e. the “mean value” of all values attributed to training events that end up in the node)
  → Regression Tree
• Node Splitting: Separation Gain → Gain in Variance (RMS) of the target function
• Boosting: error fraction → “distance” measure from the mean (linear, square or exponential)
• Use this to model ANY non-analytic function for which you have “training data”, e.g.
  • energy in your calorimeter as a function of shower parameters
  • training data from testbeam
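A variance-based node split for a regression tree can be sketched like this. The gain is taken here as the reduction of the summed squared deviations from the mean, which is one possible reading of "gain in variance"; the function names and the toy data are made up.

```python
def squared_deviations(targets):
    """Sum of squared deviations from the mean (N * variance)."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)


def best_regression_cut(xs, targets):
    """Return (gain, cut, left leaf value, right leaf value) for one variable."""
    parent = squared_deviations(targets)
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= cut]
        right = [t for x, t in zip(xs, targets) if x > cut]
        gain = parent - squared_deviations(left) - squared_deviations(right)
        if best is None or gain > best[0]:
            # each leaf value is the mean target of its training events
            best = (gain, cut, sum(left) / len(left), sum(right) / len(right))
    return best


# Made-up training data: the target jumps from about 1 to about 3
xs = [1, 2, 3, 4, 5, 6]
targets = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]

gain, cut, left_value, right_value = best_regression_cut(xs, targets)
print(f"cut at {cut}: leaf values {left_value:.2f} and {right_value:.2f}")
```

The best cut lands at the step in the target, and the two leaves return the mean target on either side.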
Regression Trees
• Leaf Nodes: one output value
[Figure: regression tree output, with a ZOOM inset]
Regression Trees seem to need larger trees DESPITE BOOSTING
Summary
• Boosted Decision Trees → a “brute force method” that works “out of the box”
  • check tuning parameters anyway
  • start with “small trees” (limit the maximum number of splits (tree depth))
  • automatic tuning parameter optimisation: first implementation is done, obviously needs LOTs of time!
  • be as careful as with “cuts” and check against data
• Boosting can (in principle) be applied to any (weak) classifier
• Boosted Regression Trees → at least as much “brute force”
  • little experience with them yet.. but probably equally robust and powerful
