Semi-random model tree ensembles: an effective
                and scalable regression method

                                       Bernhard Pfahringer
                                 Department of Computer Science
                                University of Waikato, New Zealand



                                           September 22nd, 2011




Bernhard Pfahringer (University of Waikato) · September 22nd, 2011 · 1 / 28
Background


  Outline



   1     Background


   2     Algorithm


   3     Results


   4     Summary






  Local regression




           Non-linear functions can be approximated by a set of locally linear
           estimators
           Regression and model trees are fast multivariate versions of local
           regression






  Piece-wise linear approximation example






  Sample Regression Tree: constants in the leaves




         A159 <= −0.62 :
           A149 <= 0.52 : Y = 1.6977
           A149 > 0.52 : Y = 1.2213
         A159 > −0.62 :
           A149 <= 0.638 :
               A57 <= −0.485 : Y = 0.8388
               A57 > −0.485 : Y = 1.0569
           A149 > 0.638 : Y = 0.6062
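Read top to bottom, the printout is just nested conditionals; a minimal Python sketch of evaluating this particular tree (the function name and dict-based feature access are mine, not the talk's):

```python
def regression_tree_predict(x):
    """Evaluate the sample regression tree from the slide.
    x is a dict mapping attribute names (e.g. 'A159') to values."""
    if x['A159'] <= -0.62:
        if x['A149'] <= 0.52:
            return 1.6977
        return 1.2213
    if x['A149'] <= 0.638:
        if x['A57'] <= -0.485:
            return 0.8388
        return 1.0569
    return 0.6062
```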






  Sample Model Tree: linear models in the leaves


         A159 <= −0.62 :
            A149 <= 0.52 : LM1
            A149 > 0.52 : LM2
         A159 > −0.62 :
            A149 <= 0.638 : LM3
            A149 > 0.638 : LM4

         LM1 Y        = −0.597 ∗ A149 − 0.211 ∗ A159 + 1.901
         LM2 Y        = −0.471 ∗ A149 − 0.211 ∗ A159 + 1.353
         LM3 Y        = −0.365 ∗ A149 − 0.232 ∗ A159 + 1.017
         LM4 Y        = −0.555 ∗ A149 − 0.232 ∗ A159 + 0.776
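The routing is the same, but each leaf now evaluates its own linear model; a sketch using the coefficients printed on the slide (again, the function name is illustrative):

```python
def model_tree_predict(x):
    """Evaluate the sample model tree: route to a leaf, then apply
    that leaf's linear model (coefficients from the slide)."""
    if x['A159'] <= -0.62:
        if x['A149'] <= 0.52:  # LM1
            return -0.597 * x['A149'] - 0.211 * x['A159'] + 1.901
        # LM2
        return -0.471 * x['A149'] - 0.211 * x['A159'] + 1.353
    if x['A149'] <= 0.638:     # LM3
        return -0.365 * x['A149'] - 0.232 * x['A159'] + 1.017
    # LM4
    return -0.555 * x['A149'] - 0.232 * x['A159'] + 0.776
```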



Algorithm


  Outline



   1     Background


   2     Algorithm


   3     Results


   4     Summary






  Ensembles of Semi-Random Model Trees




           Ensembles usually improve results
           Most ensembles use randomization to generate diversity
           Two sources of randomness:
                  For each tree: divide the data into a training and a validation set
                  At each split: select the best attribute from a random subset of all attributes
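Both sources of randomness are simple to sketch; the helper names below are mine, and the equal-halves split follows the experimental setup described later in the talk:

```python
import random

def random_train_validate_split(data, rng=random):
    """Source 1 of randomness: for each tree, shuffle the data and
    divide it into a training half and a validation half."""
    rows = list(data)
    rng.shuffle(rows)
    half = len(rows) // 2
    return rows[:half], rows[half:]

def random_attribute_subset(all_attrs, k, rng=random):
    """Source 2 of randomness: at each split only a random subset of
    k attributes is considered; the best of these is then chosen."""
    return rng.sample(all_attrs, k)
```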






  Single Semi-Random Model Tree




           Only consider the median as split value (which yields balanced trees)
           Leaf model: linear ridge regression
           Cap model predictions inside the observed extremes
           Optimise tree depth and ridge value using the validation set
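Two of these design choices are small enough to sketch directly; the function names are illustrative, not the paper's:

```python
import statistics

def median_split_value(values):
    """Only the (approximate) median is considered as split point, so
    each child receives about half the data and the tree stays balanced."""
    return statistics.median(values)

def cap_prediction(raw, y_min, y_max):
    """Clamp a leaf model's raw prediction into the target range
    observed in the training data, as the slide describes."""
    return min(max(raw, y_min), y_max)
```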






  Build ensemble




   BuildEnsemble(data, numTrees, k)

    1  for i = 1 to numTrees
    2      do randomly split data into two:
    3         train + validate
    4         BuildTree(train, validate, k)
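A runnable translation of this loop, assuming `build_tree` follows the BuildTree pseudocode on the next slides and returns a callable predictor, and that the final prediction averages the trees (as the Parameter Settings slide states):

```python
import random

def build_ensemble(data, num_trees, k, build_tree):
    """Sketch of BuildEnsemble: each tree gets its own random
    half/half division of the data into training and validation parts.
    build_tree(train, validate, k) is assumed to return a predictor."""
    trees = []
    for _ in range(num_trees):
        rows = list(data)
        random.shuffle(rows)
        half = len(rows) // 2
        trees.append(build_tree(rows[:half], rows[half:], k))
    return trees

def ensemble_predict(trees, x):
    """The ensemble prediction is the average of the tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)
```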






  BuildTree

   BuildTree(train, validate, k)

    1  min ← MinTargetValue(train)
    2  max ← MaxTargetValue(train)
    3  localSSE ← LinReg(train, validate)
    4
    5  if |train| > 10 and |validate| > 10
    6      do split ← RandomSplit(train, k)
    7
    8         smT ← Smaller(train, split)
    9         smV ← Smaller(validate, split)
   10         smaller ← BuildTree(smT, smV, k)
   11
   12         laT ← Larger(train, split)
   13         laV ← Larger(validate, split)
   14         larger ← BuildTree(laT, laV, k)


  BuildTree, continued




   15  subSSE ← SSE(smaller, larger, validate)
   16
   17  if localSSE < subSSE
   18      do smaller ← null
   19         larger ← null
   20  else
   21      localModel ← null






  Ridge regression



   LinReg(train, validate)

    1  for ridge in 10^-8, 10^-4, 10^-2, 10^-1, 1, 10
    2      do model_r ← RidgeRegress(train, ridge)
    3         sse_r ← SSE(model_r, validate)
    4  if bestModel == model_10
    5      do build models for ridge = 10^2, 10^3, ...
    6         and so on while improving
    7  localModel ← bestModel
    8  return minimum SSE on validation data
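A runnable sketch of this search; for brevity it fits a one-dimensional ridge model without intercept (my simplification, the talk uses full linear models over all attributes):

```python
def ridge_regress_1d(xs, ys, ridge):
    """Closed-form 1-D ridge regression without intercept:
    w = sum(x*y) / (sum(x^2) + ridge)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + ridge)

def sse(w, xs, ys):
    """Sum of squared errors of the linear model y = w*x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

def lin_reg(train_x, train_y, val_x, val_y):
    """Sketch of LinReg: fit on the fixed ridge grid, keep the model
    with the lowest validation SSE, and keep extending the grid upward
    (10^2, 10^3, ...) as long as the top value keeps winning."""
    grid = [1e-8, 1e-4, 1e-2, 1e-1, 1.0, 10.0]
    best_w, best_sse, best_ridge = None, float('inf'), None
    for ridge in grid:
        w = ridge_regress_1d(train_x, train_y, ridge)
        s = sse(w, val_x, val_y)
        if s < best_sse:
            best_w, best_sse, best_ridge = w, s, ridge
    ridge = grid[-1]
    while best_ridge == ridge:          # the grid's top value is still best
        ridge *= 10.0
        w = ridge_regress_1d(train_x, train_y, ridge)
        s = sse(w, val_x, val_y)
        if s >= best_sse:
            break
        best_w, best_sse, best_ridge = w, s, ridge
    return best_w, best_sse
```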






  Random split selection




   RandomSplit(train, k)

    1  for i = 1 to k
    2      do splitAttr ← RandomChoice(allAttrs)
    3         stump ← Stump(ApproxMedian(splitAttr))
    4         compute SSE(stump, train)
    5  return minimum-SSE stump
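A sketch of this selection in Python, using the exact median instead of an approximate one and a plain list-of-pairs data layout (both my simplifications):

```python
import random
import statistics

def _sse(ys):
    """Sum of squared errors around the mean; 0 for an empty list."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def random_split(train, attrs, k, rng=random):
    """Sketch of RandomSplit: try k randomly chosen attributes, split
    each at its median, and keep the stump with the lowest SSE.
    train is a list of (features_dict, target) pairs."""
    best = None
    for _ in range(k):
        attr = rng.choice(attrs)
        median = statistics.median(row[0][attr] for row in train)
        left = [y for x, y in train if x[attr] <= median]
        right = [y for x, y in train if x[attr] > median]
        score = _sse(left) + _sse(right)
        if best is None or score < best[0]:
            best = (score, attr, median)
    return best[1], best[2]
```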






  Parameter Settings


   Reported experiments:
           average the predictions of 50 randomized model trees
           at each split, select the best of a randomly chosen 50% of all attributes
   Generally: parameters should be optimised separately for every application,
   e.g. using cross-validation
           number of trees: “the more the merrier”, but with diminishing returns
           number of randomly selected attributes: 50% is a good default, but
           may depend on the total number of attributes and on their sparseness




Results


  Outline



    1    Background


    2    Algorithm


    3    Results


    4    Summary






  Comparison


           use more than 20 Torgo/UCI datasets, each with > 900 examples
           repeated 2/3 training, 1/3 testing splits
           training data split into equal build and validation halves (1/3, 1/3)
           preprocessed for missing or categorical values
           compare to:
                   LR: linear ridge regression, optimised ridge value
                   GP: Gaussian process regression, optimised noise level and RBF
                   gamma
                   AG: additive groves, using the “fast” script
           use RMAE: relative mean absolute error
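RMAE itself is easy to sketch; normalising by the MAE of always predicting the mean target is an assumption here, since the slides do not spell out the denominator:

```python
def rmae(predictions, targets):
    """Relative mean absolute error in percent: the model's MAE divided
    by the MAE of always predicting the mean target (the exact
    normaliser used in the talk is an assumption here)."""
    n = len(targets)
    mae = sum(abs(p - t) for p, t in zip(predictions, targets)) / n
    mean = sum(targets) / n
    baseline_mae = sum(abs(mean - t) for t in targets) / n
    return 100.0 * mae / baseline_mae
```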





  RMAE on Torgo/UCI


   [Bar chart: RMAE for the Torgo/UCI data, comparing RMT, GP, LR, and AG
   across all datasets; y-axis from 0 to 100.]


  Build times on Torgo/UCI


   [Bar chart: training time in seconds for the Torgo/UCI data, comparing
   RMT, GP, LR, and AG; log-scale y-axis from 0.1 to 100000.]


  UCI Census dataset




   Table: Partial results; 2,458,285 examples in total, therefore about 800,000 in
   the training fold.

                         Method           RMAE                  Time (secs)
                         LR               15.96                       1205
                         RMT               9.78                      19811
                         GP                   ?    ? (would need 5 TB RAM)
                         AG                   ?     ? (estimated 2,000,000)






  Near infrared (NIR) Datasets




   Proprietary NIR data:
           7 datasets
           from 255 up to 7,500 spectra
           between 170 and 500-odd features
           preprocessed for noise and baseline shift






  Sample NIR spectrum


   [Line plot: preprocessed sample spectrum (nitrogen in soil); roughly 170
   spectral features on the x-axis, values between about -2 and 4.]


  RMAE on NIR data


   [Bar chart: RMAE for the NIR datasets (n, omd, rmd, tc, phe, ph, p5, na, g5),
   comparing RMT, GP, LR, and AG; y-axis from 10 to 90.]


  Build times on NIR data


   [Bar chart: training time in seconds for the NIR data (omd, rmd, na, n, tc,
   ph, phe, p5, g5), comparing RMT, GP, LR, and AG; log-scale y-axis from
   0.1 to 100000.]


  Random Model Tree Build Times discussion




           complexity is O(K · N · log N + K² · N)
           the second term (the linear model computation) seems to dominate
           therefore the observed complexity is approximately O(K² · N)




Summary


  Outline



    1    Background


    2    Algorithm


    3    Results


    4    Summary






  Conclusions




           Semi-Random Model Trees perform well
           They are fast: build time is practically linear in N
           Can model non-linear relationships






  Future Work




           Improve efficiency for large K
           Study more and different regression problems
           More comparisons to alternative regression schemes
           Streaming/MOA variant





Semi-random model tree ensembles: an effective and scalable regression method

  • 1.
    Semi-random model treeensembles: an effective and scalable regression method Bernhard Pfahringer Department of Computer Science University of Waikato, New Zealand September 22nd , 2011 September 22nd , method 1 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 2.
    Background Outline 1 Background 2 Algorithm 3 Results 4 Summary September 22nd , method 2 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 3.
    Background Localregression non-linear functions can be approximated by a set of locally linear estimators Regression and model trees are fast multi-variate versions of local regression September 22nd , method 3 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 4.
    Background Piece-wiselinear approximation example September 22nd , method 4 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 5.
    Background SampleRegression Tree: constants in the leaves A159 <= −0.62 : A149 <= 0.52 : Y = 1.6977 A149 > 0.52 : Y = 1.2213 A159 > −0.62 : A149 <= 0.638 : A57 <= −0.485 : Y = 0.8388 A57 > −0.485 : Y = 1.0569 A149 > 0.638 : Y = 0.6062 September 22nd , method 5 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 6.
    Background SampleModel Tree: linear models in the leaves A159 <= −0.62 : A149 <= 0.52 : LM1 A149 > 0.52 : LM2 A159 > −0.62 : A149 <= 0.638 : LM3 A149 > 0.638 : LM4 LM1 Y = −0.597 ∗ A149 − 0.211 ∗ A159 + 1.901 LM2 Y = −0.471 ∗ A149 − 0.211 ∗ A159 + 1.353 LM3 Y = −0.365 ∗ A149 − 0.232 ∗ A159 + 1.017 LM4 Y = −0.555 ∗ A149 − 0.232 ∗ A159 + 0.776 September 22nd , method 6 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 7.
    Algorithm Outline 1 Background 2 Algorithm 3 Results 4 Summary September 22nd , method 7 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 8.
    Algorithm Ensemblesof Semi-Random Model Trees Ensembles usually improve results Most ensembles use randomization to generate diversity 2 sources of randomness: For each tree: divide data into a train and a validation set To split: select best attribute from a random subset of all attributes September 22nd , method 8 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 9.
    Algorithm SingleSemi-Random Model Tree Only consider median as split value (=> balanced trees) Leaf model: linear ridge regression model Cap model predictions inside observed extremes Optimise tree depth and ridge value using the validation set September 22nd , method 9 / 28 Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and scalable regression2011 Science University tree ensembles: an effective
  • 10.
    Algorithm Buildensemble BUILD E NSEMBLE (data, numTrees, k ) 1 for i = 1 to numTrees 2 do randomly split data into two: 3 train + validate 4 BUILD T REE (train, validate, k) Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and September 22nd , 2011 Science University tree ensembles: an effective scalable regression method 10 / 28
  • 11.
    Algorithm BuildTree BUILD T REE (train, validate, k) 1 min ← MIN TARGET VALUE(train) 2 max ← MAX TARGET VALUE(train) 3 localSSE ← LIN R EG(train, validate) 4 £ 5 if |train| > 10 & |validate| > 10 6 do split ← RANDOM S PLIT(train, k ) 7 £ 8 smT ← SMALLER(train, split) 9 smV ← SMALLER(validate, split) 10 smaller ← BUILD T REE(smT , smV , k ) 11 £ 12 laT ← LARGER(train, split) 13 laV ← LARGER(validate, split) 14 larger ← BUILD T REE(laT , laV , k ) Bernhard Pfahringer Department of ComputerSemi-random model of Waikato, New Zealand () and September 22nd , 2011 Science University tree ensembles: an effective scalable regression method 11 / 28
Algorithm

  BuildTree, continued

    15         subSSE ← SSE(smaller, larger, validate)
    16
    17         if localSSE < subSSE
    18             do smaller ← null
    19                larger ← null
    20         else
    21             localModel ← null
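The recursive structure of BUILD-TREE, including its validation-based pruning rule (keep the subtree only if it beats the local model's validation SSE), can be sketched for a single numeric feature. This is a deliberate simplification: a mean predictor stands in for the ridge-regression leaf, and a plain median split stands in for the semi-random k-attribute split:

```python
def mean_model(train):
    """Toy leaf model: predict the mean target. The real algorithm fits
    a ridge-regression model over all attributes instead."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def sse(model, data):
    """Sum of squared errors of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data)

def build_tree(train, validate, min_size=10):
    """Sketch of BUILD-TREE for one numeric feature: fit a local model,
    split at the median, recurse, then keep either the subtree or the
    local model, whichever has the lower validation SSE."""
    local = mean_model(train)
    node = {"model": local, "left": None, "right": None, "split": None}
    if len(train) > min_size and len(validate) > min_size:
        xs = sorted(x for x, _ in train)
        split = xs[len(xs) // 2]  # median split value => balanced trees
        lt = [(x, y) for x, y in train if x <= split]
        gt = [(x, y) for x, y in train if x > split]
        lv = [(x, y) for x, y in validate if x <= split]
        gv = [(x, y) for x, y in validate if x > split]
        if lt and gt and lv and gv:
            left = build_tree(lt, lv, min_size)
            right = build_tree(gt, gv, min_size)
            def sub(x, l=left, r=right, s=split):
                return predict(l, x) if x <= s else predict(r, x)
            if sse(sub, validate) < sse(local, validate):
                node.update(model=None, left=left, right=right, split=split)
    return node

def predict(node, x):
    """Follow splits down to a node that kept its local model."""
    if node["model"] is not None:
        return node["model"](x)
    return predict(node["left"], x) if x <= node["split"] else predict(node["right"], x)
```

On a step function (y = 0 below x = 20, y = 10 above) the pruning keeps the split near the step and collapses the pure regions back to single local models.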
Algorithm

  Ridge regression

    LIN-REG(train, validate)
    1  for ridge in 10⁻⁸, 10⁻⁴, 10⁻², 10⁻¹, 1, 10
    2      do model_ridge ← RIDGE-REGRESS(train, ridge)
    3         sse_ridge ← SSE(model_ridge, validate)
    4  if bestModel == model₁₀
    5      do build models for ridge = 10², 10³, ...
    6         and so on while improving
    7  localModel ← bestModel
    8  return minimum SSE on validation data
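A sketch of the LIN-REG search above: try the ridge grid, keep the model with minimum validation SSE, and if the largest grid value wins, keep growing the ridge by factors of ten while the validation SSE improves. To stay self-contained, a toy one-dimensional ridge regression without intercept stands in for the multivariate model of the slides:

```python
def ridge_regress(train, ridge):
    """Toy 1-D ridge regression without intercept:
    w = sum(x*y) / (sum(x*x) + ridge). The real leaves use a
    multivariate ridge model over all attributes."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    w = sxy / (sxx + ridge)
    return lambda x: w * x

def sse(model, data):
    """Sum of squared errors over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data)

def lin_reg(train, validate):
    """Sketch of LIN-REG: grid-search the ridge value on the validation
    set; extend the grid upward (10^2, 10^3, ...) while it helps."""
    grid = [1e-8, 1e-4, 1e-2, 1e-1, 1.0, 10.0]
    scored = [(sse(ridge_regress(train, r), validate), r) for r in grid]
    best_sse, best_r = min(scored, key=lambda t: t[0])
    if best_r == grid[-1]:  # largest ridge won: keep growing it
        r = 100.0
        while True:
            s = sse(ridge_regress(train, r), validate)
            if s >= best_sse:
                break
            best_sse, best_r = s, r
            r *= 10.0
    return best_sse, ridge_regress(train, best_r)
```

On noiseless data y = 2x the search settles on the smallest ridge value, recovering w very close to 2.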
Algorithm

  Random split selection

    RANDOM-SPLIT(train, k)
    1  for i = 1 to k
    2      do splitAttr ← RANDOM-CHOICE(allAttrs)
    3         stump ← STUMP(APPROX-MEDIAN(splitAttr))
    4         compute SSE(stump, train)
    5  return minimum-SSE stump
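The RANDOM-SPLIT routine above can be sketched as follows; an exact median stands in for the slides' approximate median, and `num_attrs` stands in for `allAttrs`:

```python
import random

def stump_sse(train, attr, value):
    """SSE of a one-split stump that predicts the mean target
    on each side of the split."""
    lo = [y for x, y in train if x[attr] <= value]
    hi = [y for x, y in train if x[attr] > value]
    total = 0.0
    for side in (lo, hi):
        if side:
            m = sum(side) / len(side)
            total += sum((y - m) ** 2 for y in side)
    return total

def random_split(train, k, num_attrs):
    """Sketch of RANDOM-SPLIT: draw k random attributes, split each at
    its median, and return the (attribute, value) pair whose stump has
    the lowest SSE on the training data."""
    best = None
    for _ in range(k):
        attr = random.randrange(num_attrs)
        vals = sorted(x[attr] for x, _ in train)
        value = vals[len(vals) // 2]  # median split value
        score = stump_sse(train, attr, value)
        if best is None or score < best[0]:
            best = (score, attr, value)
    return best[1], best[2]
```

Given enough draws, an informative attribute (one whose median split separates the targets) wins over an uninformative one.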
Algorithm

  Parameter settings

    Reported experiments:
        average the predictions of 50 randomized model trees
        to split, select the best of 50% randomly selected attributes
    Generally: parameters should be optimised separately for every
    application, e.g. using cross-validation
    Number of trees: "the more the merrier", but with diminishing returns
    Number of randomly selected attributes: 50% is a good default, but may
    depend on the total number of attributes and on their sparseness
Results

  Outline

    1  Background
    2  Algorithm
    3  Results
    4  Summary
Results

  Comparison

    more than 20 Torgo/UCI datasets, each with > 900 examples
    repeated splits: 2/3 training, 1/3 testing
    training split into equal build and validation halves (1/3, 1/3)
    preprocessed for missing or categorical values
    compare to:
        LR: linear ridge regression, optimise the ridge value
        GP: Gaussian process regression, optimise noise level and RBF gamma
        AG: additive groves, using the "fast" script
    metric: RMAE, relative mean absolute error
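The slides do not define RMAE precisely; a common reading (assumed here) is the model's mean absolute error relative to that of a baseline always predicting the mean target, as a percentage:

```python
def rmae(predictions, actuals):
    """Relative mean absolute error as a percentage: the model's MAE
    divided by the MAE of always predicting the mean of the actuals.
    (Assumed definition; the slides do not spell RMAE out.)"""
    n = len(actuals)
    mae = sum(abs(p - a) for p, a in zip(predictions, actuals)) / n
    mean = sum(actuals) / n
    base_mae = sum(abs(mean - a) for a in actuals) / n
    return 100.0 * mae / base_mae
```

Under this definition, 100 means no better than the mean baseline and 0 means perfect predictions, matching how the bar charts on the next slides are read.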
Results

  RMAE on Torgo/UCI

    [Figure: bar chart of RMAE for the Torgo/UCI datasets, comparing
    RMT, GP, LR, and AG.]
Results

  Build times on Torgo/UCI

    [Figure: training time in seconds for the Torgo/UCI datasets,
    comparing RMT, GP, LR, and AG.]
Results

  UCI Census dataset

    Table: Partial results, 2458285 examples in total, therefore about
    800000 in the training fold.

    Method    RMAE     Time (secs)
    LR        15.96     1205
    RMT        9.78    19811
    GP          ?          ?  (would need 5 TB RAM)
    AG          ?          ?  (estimated 2000000)
Results

  Near-infrared (NIR) datasets

    proprietary NIR data
    7 datasets, from 255 up to 7500 spectra
    between 170 and 500-odd features
    preprocessed for noise and baseline shift
Results

  Sample NIR spectrum

    [Figure: preprocessed sample spectrum (nitrogen in soil).]
Results

  RMAE on NIR data

    [Figure: bar chart of RMAE for the NIR datasets (n, omd, rmd, tc,
    phe, ph, p5, na, g5), comparing RMT, GP, LR, and AG.]
Results

  Build times on NIR data

    [Figure: training time in seconds for the NIR datasets, comparing
    RMT, GP, LR, and AG.]
Results

  Random Model Tree build times

    complexity is O(K * N * log N + K² * N)
    the second term (the linear model computation) seems to dominate
    therefore the observed complexity is ~ O(K² * N)
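One way to read the two terms (my accounting; the slide does not derive them): the balanced median splits give roughly log N levels, each of which evaluates up to K candidate stumps over all N instances, while fitting a ridge model over K attributes requires building a K-by-K Gram matrix from its instances, which costs O(K² N) across a level:

```latex
\underbrace{O(K \, N \log N)}_{\text{split selection}}
\;+\;
\underbrace{O(K^{2} \, N)}_{\text{ridge models via } X^{\top} X}
```

Since K is squared only in the second term, it dominates once K grows, which matches the slide's observation that build time behaves like O(K² N), practically linear in N.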
Summary

  Outline

    1  Background
    2  Algorithm
    3  Results
    4  Summary
Summary

  Conclusions

    Semi-Random Model Trees perform well
    They are fast: build time is practically linear in N
    They can model non-linear relationships
Summary

  Future work

    Improve efficiency for large K
    Study more and different regression problems
    More comparisons to alternative regression schemes
    A streaming/MOA variant