Gunnar Rätsch Machine Learning in Science and Engineering CCC Berlin, December 27, 2004
Gunnar Rätsch
Friedrich Miescher Laboratory
Max Planck Society
Tübingen, Germany
http://www.tuebingen.mpg.de/~raetsch
Machine Learning in
Science and Engineering
Roadmap
• Motivating Examples
• Some Background
• Boosting & SVMs
• Applications
Rationale: Let computers learn to automate
processes and to understand highly
complex data
Example 1: Spam Classification
From: smartballlottery@hf-uk.org
Subject: Congratulations
Date: 16. December 2004 02:12:54 MEZ
LOTTERY COORDINATOR,
INTERNATIONAL PROMOTIONS/PRIZE AWARD DEPARTMENT.
SMARTBALL LOTTERY, UK.
DEAR WINNER,
WINNER OF HIGH STAKES DRAWS
Congratulations to you as we bring to your notice, the
results of the the end of year, HIGH STAKES DRAWS of
SMARTBALL LOTTERY UNITED KINGDOM. We are happy to inform you
that you have emerged a winner under the HIGH STAKES DRAWS
SECOND CATEGORY,which is part of our promotional draws. The
draws were held on15th DECEMBER 2004 and results are being
officially announced today. Participants were selected
through a computer ballot system drawn from 30,000
names/email addresses of individuals and companies from
Africa, America, Asia, Australia,Europe, Middle East, and
Oceania as part of our International Promotions Program.
…
From: manfred@cse.ucsc.edu
Subject: ML Positions in Santa Cruz
Date: 4. December 2004 06:00:37 MEZ
We have a Machine Learning position
at Computer Science Department of
the University of California at Santa Cruz
(at the assistant, associate or full professor level).
Current faculty members in related areas:
Machine Learning: DAVID HELMBOLD and MANFRED WARMUTH
Artificial Intelligence: BOB LEVINSON
DAVID HAUSSLER was one of the main ML researchers in our
department. He now has launched the new Biomolecular Engineering
department at Santa Cruz
There is considerable synergy for Machine Learning at Santa
Cruz:
-New department of Applied Math and Statistics with an emphasis
on Bayesian Methods http://www.ams.ucsc.edu/
-- New department of Biomolecular Engineering
http://www.cbse.ucsc.edu/
…
Goal: Classify emails into spam / no spam
How? Learn from previously classified emails!
Training: analyze previous emails
Application: classify new emails
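The training/application loop above can be sketched end-to-end with a bag-of-words representation and a perceptron; the toy emails, labels, and test messages below are invented for illustration.

```python
# Toy spam classifier: learn from labeled emails, then classify new ones.
# Training emails and labels are invented stand-ins.

def tokenize(text):
    return text.lower().split()

def featurize(text, vocab):
    # bag-of-words: count of each vocabulary word
    counts = [0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            counts[vocab[word]] += 1
    return counts

train = [
    ("congratulations you won the lottery claim your prize", +1),  # spam
    ("winner of high stakes draws send your details", +1),         # spam
    ("machine learning position at the university", -1),           # not spam
    ("new department of applied math and statistics", -1),         # not spam
]

# build the vocabulary from the training emails
vocab = {}
for text, _ in train:
    for word in tokenize(text):
        vocab.setdefault(word, len(vocab))

# perceptron training: adjust weights only on misclassified emails
w = [0.0] * len(vocab)
b = 0.0
for _ in range(10):
    for text, y in train:
        x = featurize(text, vocab)
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 0:              # mistake: move weights toward y
            for i, xi in enumerate(x):
                w[i] += y * xi
            b += y

def predict(text):
    x = featurize(text, vocab)
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "spam" if score > 0 else "no spam"

print(predict("you won a lottery prize"))                 # -> spam
print(predict("statistics position at the university"))   # -> no spam
```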
Example 2: Drug Design
Actives, Inactives, Chemist
[Figure: example chemical structures of active and inactive compounds]
The Drug Design Cycle
Actives, Inactives, Chemist (former CombiChem technology)
[Figure: the drug design cycle with example compound structures]
The Drug Design Cycle
Actives, Inactives, Learning Machine (former CombiChem technology)
[Figure: the drug design cycle with the chemist replaced by a learning machine]
Example 3: Face Detection
Premises for Machine Learning
• Supervised Machine Learning
• Observe N training examples with label
• Learn function
• Predict label of unseen example
• Examples generated from statistical process
• Relationship between features and label
• Assumption: unseen examples are generated
from same or similar process
Problem Formulation
[Figure: labeled training examples: Natural (+1), Natural (+1), Plastic (-1), Plastic (-1), and an unlabeled example (?)]
The “World”:
• Data
• Unknown Target Function
• Unknown Distribution
• Objective
Problem: is unknown
Problem Formulation
Example: Natural vs. Plastic Apples
Example: Natural vs. Plastic Apples
Example: Natural vs. Plastic Apples
AdaBoost (Freund & Schapire, 1996)
• Idea:
• Use many simple “rules of thumb”
• Simple hypotheses are not perfect!
• Combining hypotheses => increased accuracy
• Problems
• How to generate different hypotheses?
• How to combine them?
• Method
• Compute distribution on examples
• Find hypothesis on the weighted sample
• Combine hypotheses linearly:
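The three method bullets above can be sketched end-to-end. This is a minimal AdaBoost with decision stumps on a 1-D toy problem; the data, thresholds, and round count are invented for illustration.

```python
import math

# Toy 1-D data: positives on the outside, negatives in the middle,
# so no single stump is perfect and a combination is needed.
X = [0.1, 0.2, 0.4, 0.5, 0.7, 0.8]
y = [+1, +1, -1, -1, +1, +1]

def stump(theta, s):
    # rule of thumb: predict s left of threshold theta, -s to the right
    return lambda x: s if x < theta else -s

def best_stump(w):
    # find the stump minimizing the weighted training error
    best, best_err = None, float("inf")
    for theta in [xi + 0.05 for xi in X]:
        for s in (+1, -1):
            h = stump(theta, s)
            err = sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi)
            if err < best_err:
                best, best_err = h, err
    return best, best_err

w = [1.0 / len(X)] * len(X)        # uniform distribution on examples
ensemble = []
for t in range(3):
    h, err = best_stump(w)         # hypothesis on the weighted sample
    err = max(err, 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)   # linear combination weight
    ensemble.append((alpha, h))
    # recompute the distribution: misclassified examples gain weight
    w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(X, y, w)]
    Z = sum(w)
    w = [wi / Z for wi in w]

def predict(x):
    return 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

train_acc = sum(predict(xi) == yi for xi, yi in zip(X, y)) / len(X)
print(train_acc)   # 1.0 on this toy set after three rounds
```

Three rounds suffice here: the first two stumps carve off the outer positive regions and the third corrects the remainder, exactly the behavior the following build slides illustrate.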
Boosting: 1st iteration (simple hypothesis)
Boosting: recompute weighting
Boosting: 2nd iteration
Boosting: 2nd hypothesis
Boosting: recompute weighting
Boosting: 3rd hypothesis
Boosting: 4th hypothesis
Boosting: combination of hypotheses
Boosting: decision
AdaBoost Algorithm
AdaBoost algorithm
• Combination of
• Decision stumps/trees
• Neural networks
• Heuristic rules
• Further reading
• http://www.boosting.org
• http://www.mlss.cc
Linear Separation
[Figure: two point classes in the (property 1, property 2) plane, separated by a line]
Linear Separation
[Figure: a new, unlabeled point (?) in the (property 1, property 2) plane: which side of the separating line?]
Linear Separation with Margins
large margin => good generalization
[Figure: separating lines with small and large margin in the (property 1, property 2) plane; the margin is the band around the separator]
Large Margin Separation
[Figure: maximum-margin hyperplane with margin band]
Idea:
• Find the hyperplane that maximizes the margin
• Use it for prediction
Solution:
• Linear combination of training examples
• Many expansion coefficients are zero; the examples with nonzero coefficients are the support vectors
• Support Vector Machines
Demo
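The large-margin idea can be tried on invented 2-D "property" data. scikit-learn, a library that postdates this talk, serves here as a stand-in SVM implementation; note how few training points end up as support vectors (most expansion coefficients are zero).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# two well-separated clouds in the (property 1, property 2) plane
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 20 + [-1] * 20)

# linear maximum-margin classifier
clf = SVC(kernel="linear", C=10.0).fit(X, y)

# the hyperplane is a linear combination of examples; only the few
# support vectors carry nonzero coefficients
print(len(clf.support_), "support vectors out of", len(X))
print(clf.predict([[1.5, 1.8], [-1.0, -2.5]]))
```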
Kernel Trick
Linear in
input space
Linear in
feature space
Non-linear in
input space
Example: Polynomial Kernel
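For degree d = 2 the kernel trick can be checked numerically: evaluating (x·z)² in input space agrees with an explicit inner product in the induced feature space. The 2-D vectors are invented for illustration.

```python
import numpy as np

def phi(x):
    # explicit degree-2 feature map for 2-D input:
    # (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

k_direct = np.dot(x, z) ** 2          # polynomial kernel: (x . z)^2
k_feature = np.dot(phi(x), phi(z))    # inner product in feature space

print(k_direct, k_feature)  # both equal 16 (up to floating-point rounding)
```

The kernel evaluates the feature-space inner product without ever constructing phi, which is what makes high-degree (and infinite-dimensional) feature spaces tractable.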
Support Vector Machines
• Demo: Gaussian Kernel
• Many other algorithms can use kernels
• Many other application specific kernels
Capabilities of Current Techniques
• Theoretically & algorithmically well understood:
• Classification with few classes
• Regression (real valued)
• Novelty Detection
Bottom Line: Machine Learning
works well for relatively simple
objects with simple properties
• Current Research
• Complex objects
• Many classes
• Complex learning setup (active learning)
• Prediction of complex properties
Many Applications
• Handwritten Letter/Digit recognition
• Face/Object detection in natural scenes
• Brain-Computer Interfacing
• Gene Finding
• Drug Discovery
• Intrusion Detection Systems (unsupervised)
• Document Classification (by topic, spam mails)
• Non-Intrusive Load Monitoring of electric appliances
• Company Fraud Detection (Questionnaires)
• Fake Interviewer identification in social studies
• Optimized Disk caching strategies
• Optimal Disk-Spin-Down prediction
• …
MNIST Benchmark
SVM with polynomial kernel
(considers d-th order correlations of pixels)
MNIST Error Rates
Face Detection
1. Classifier: face vs. non-face
2. Search: scan every patch of the image at 7 scales (shrinking by a factor of 0.7 per scale):
525,820 patches $= \sum_{l=1}^{7} 600 \cdot 450 \cdot 0.7^{2(l-1)}$
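The quoted patch count can be reproduced by summing scan positions over seven scales with a 0.7 shrink factor per scale. The 600 x 450 base grid is my reading of the garbled slide formula, so treat the constants as a reconstruction.

```python
# Reconstructed patch-count formula: a 600 x 450 grid of scan positions at
# the first scale, with both image dimensions shrunk by 0.7 at each of the
# 7 scales (so the position count shrinks by 0.7^2 per scale).
positions = sum(600 * 450 * 0.7 ** (2 * (l - 1)) for l in range(1, 8))
print(round(positions))  # lands within rounding of the quoted 525,820
```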
Fast Face Detection
Note: for “easy” patches, a quick and inaccurate classification is sufficient.
Method: sequential approximation of the classifier in a Hilbert space.
Result: a set of face detection filters.
(Romdhani, Blake, Schölkopf, & Torr, 2001)
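The cascade idea behind the filter sequence can be sketched abstractly: cheap filters reject most "easy" patches early, so later, costlier stages see only a tiny fraction. The patch "scores" and thresholds below are invented stand-ins for real filter responses.

```python
import random

random.seed(0)
# stand-in face-likeness scores for the patches of one image
patches = [random.random() for _ in range(100_000)]

# each stage keeps only patches scoring above an increasing threshold
thresholds = [0.8, 0.9, 0.95, 0.99]

remaining = patches
counts = []
for i, t in enumerate(thresholds, start=1):
    remaining = [p for p in remaining if p > t]   # early reject the rest
    counts.append(len(remaining))
    print(f"after stage {i}: {100 * len(remaining) / len(patches):.2f}% patches left")
```

The expensive final classifier then runs only on the survivors, mirroring the "X filters, Y% patches left" progression on the next slides.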
1 Filter, 19.8% patches left
Example: 1280x1024 Image
10 Filters, 0.74% patches left
Example: 1280x1024 Image
20 Filters, 0.06% patches left
Example: 1280x1024 Image
30 Filters, 0.01% patches left
Example: 1280x1024 Image
70 Filters, 0.007% patches left
Example: 1280x1024 Image
Single Trial Analysis of EEG: towards BCI
Gabriel Curio, Benjamin Blankertz, Klaus-Robert Müller
Neurophysics Group, Dept. of Neurology, Klinikum Benjamin Franklin, Freie Universität Berlin, Germany
Intelligent Data Analysis Group, Fraunhofer FIRST, Berlin, Germany
Cerebral Cocktail Party Problem
The Cocktail Party Problem
How to decompose superimposed signals?
Decomposing EEG signals poses an analogous signal-processing problem to the cocktail party problem
The Cocktail Party Problem
• input: 3 mixed signals
• algorithm: enforce independence
(“independent component analysis”)
via temporal de-correlation
• output: 3 separated signals
(Demo: Andreas Ziehe, Fraunhofer FIRST, Berlin)
"Imagine that you are on the edge of a lake and a friend challenges you to play a game. The game
is this: Your friend digs two narrow channels up from the side of the lake […]. Halfway up each one,
your friend stretches a handkerchief and fastens it to the sides of the channel. As waves reach the
side of the lake they travel up the channels and cause the two handkerchiefs to go into motion. You
are allowed to look only at the handkerchiefs and from their motions to answer a series of
questions: How many boats are there on the lake and where are they? Which is the most powerful
one? Which one is closer? Is the wind blowing?” (Auditory Scene Analysis, A. Bregman)
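The temporal-decorrelation ICA step above can be sketched numerically: whiten the mixtures, then diagonalize a time-lagged covariance (in the spirit of the AMUSE/TDSEP family). The sources and mixing matrix are invented; sources separate because they have different temporal structure.

```python
import numpy as np

n = 5000
t = np.arange(n)
rng = np.random.default_rng(1)

# two sources with different temporal structure:
# a smooth oscillation and temporally white noise
s1 = np.sin(2 * np.pi * t / 50)
s2 = rng.standard_normal(n)
S = np.vstack([s1, s2])

A = np.array([[1.0, 0.6], [0.4, 1.0]])   # invented mixing matrix
X = A @ S                                 # observed superimposed signals

# step 1: whiten the mixtures
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(X @ X.T / n)
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

# step 2: diagonalize a symmetrized time-lagged covariance (lag tau);
# its eigenvectors undo the remaining rotation
tau = 1
C = Z[:, :-tau] @ Z[:, tau:].T / (n - tau)
_, V = np.linalg.eigh((C + C.T) / 2)
Y = V.T @ Z                               # recovered sources (up to order/sign)

# each recovered component should match one true source almost perfectly
corr = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
print(corr.max(axis=1))                   # both entries near 1
```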
Minimal Electrode Configuration
• coverage: bilateral primary
sensorimotor cortices
• 27 scalp electrodes
• reference: nose
• bandpass: 0.05 Hz - 200 Hz
• ADC 1 kHz
• downsampling to 100 Hz
• EMG (forearms bilaterally):
m. flexor digitorum
• EOG
• event channel:
keystroke timing
(ms precision)
Single Trial vs. Averaging
[Figure: single-trial and averaged EEG traces from -600 to 0 ms, amplitudes -15 to +15 µV; LEFT hand (ch. C4) and RIGHT hand (ch. C3)]
BCI Setup
ACQUISITION modes:
- few single electrodes
- 32-128 channel electrode caps
- subdural macroelectrodes
- intracortical multi-single-units
EEG parameters:
- slow cortical potentials
- µ/β amplitude modulations
- Bereitschafts-/motor-potential
TASK alternatives:
- feedback control
- imagined movements
- movement (preparation)
- mental state diversity
Finding Genes on Genomic DNA
Splice Sites: on the boundary between
• Exons (may code for protein)
• Introns (noncoding)
Coding region starts with Translation Initiation Site (TIS: “ATG”)
Application: TIS Finding
GMD.SCAI
Institute for Algorithms
and Scientific Computing
Alexander Zien
Thomas Lengauer
GMD.FIRST
Institute for
Computer Architecture
and Software Technology
Gunnar Rätsch
Sebastian Mika
Bernhard Schölkopf
Klaus-Robert Müller
Engineering Support Vector Machine (SVM) Kernels
That Recognize Translation Initiation Sites (TIS)
TIS Finding: Classification Problem
• Build fixed-length sequence
representation of candidates
• Select candidate positions
for TIS by looking for ATG
• Transform sequence into
representation in real space
A (1,0,0,0,0)
C (0,1,0,0,0)
G (0,0,1,0,0)
T (0,0,0,1,0)
N (0,0,0,0,1)
1000-dimensional real space
(...,0,1,0,0,0,0,...)
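The encoding on this slide can be sketched directly: each nucleotide maps to a 5-bit indicator vector, so a 200-nt window around a candidate ATG becomes a 1000-dimensional real vector. The window content below is invented for illustration.

```python
# Indicator encoding of nucleotides, as on the slide.
CODE = {
    "A": (1, 0, 0, 0, 0),
    "C": (0, 1, 0, 0, 0),
    "G": (0, 0, 1, 0, 0),
    "T": (0, 0, 0, 1, 0),
    "N": (0, 0, 0, 0, 1),   # unknown nucleotide
}

def encode(seq):
    # concatenate the indicator vectors of all positions
    vec = []
    for nt in seq:
        vec.extend(CODE[nt])
    return vec

window = "ACGTN" * 40        # invented 200-nt window around a candidate TIS
x = encode(window)
print(len(x))                # 1000: the dimensionality quoted on the slide
```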
2-class Splice Site Detection
Window of 150nt
around known splice sites
Positive examples: fixed window around a true splice site
Negative examples: generated by shifting the window
Design of new Support Vector Kernel
The Drug Design Cycle
Actives, Inactives, Learning Machine (former CombiChem technology)
[Figure: the drug design cycle with the chemist replaced by a learning machine]
Three types of Compounds/Points
• actives (few)
• inactives (more)
• untested (plenty)
Shape/Feature Descriptor
Shape/Feature Signature
~10^5 bits
Shape jShape i
bit number
254230
bit =
Shape
Feature type
Feature location
S. Putta, A Novel Shape/Feature Descriptor, 2001
Maximizing the Number of Hits
Largest Selection Strategy, on the Thrombin dataset
[Figure: total number of active examples selected after each batch]
Concluding Remarks
• Computational Challenges
• Algorithms can work with 100,000s of examples
• Usually model parameters to be tuned
(cross-validation computationally expensive)
• Need computer clusters and
Job scheduling systems (pbs, gridengine)
• Often use MATLAB
(to be replaced by python: help!)
• Machine learning is an exciting research area …
• … involving Computer Science, Statistics & Mathematics
• … with…
• a large number of present and future applications (in all situations
where data is available, but explicit knowledge is scarce)…
• an elegant underlying theory…
• and an abundance of questions to study.
New computational biology group in Tübingen: looking for people to hire
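The cross-validation cost mentioned above can be made concrete: k-fold tuning multiplies training runs by the size of the parameter grid, which is why clusters and job schedulers are needed. The grid values and fold count below are invented for illustration.

```python
# Why parameter tuning is computationally expensive: k-fold cross-validation
# retrains the model once per fold for every parameter combination.
from itertools import product

C_grid = [0.1, 1, 10, 100]          # e.g. SVM regularization values
sigma_grid = [0.01, 0.1, 1, 10]     # e.g. Gaussian kernel widths
k_folds = 5

fits = 0
for C, sigma in product(C_grid, sigma_grid):
    for fold in range(k_folds):
        fits += 1   # one full training run per (parameter setting, fold)

print(fits)  # 4 * 4 * 5 = 80 training runs for even a modest grid
```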
Thanks for Your Attention!
Colleagues & Contributors: K. Bennett, G. Dornhege, A.
Jagota, M. Kawanabe, J. Kohlmorgen, S. Lemm, C. Lemmen, P.
Laskov, J. Liao, T. Lengauer, R. Meir, S. Mika, K-R. Müller,
T. Onoda, A. Smola, C. Schäfer, B. Schölkopf, R. Sommer, S.
Sonnenburg, J. Srinivasan, K. Tsuda, M. Warmuth, J. Weston,
A. Zien
Gunnar Rätsch
http://www.tuebingen.mpg.de/~raetsch
Gunnar.Raetsch@tuebingen.mpg.de
Special Thanks: Nora Toussaint, Julia Lüning, Matthias Noll

Machine Learning in Science and Engineering

  • 1.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 1 Gunnar Rätsch Friedrich Miescher Laboratory Max Planck Society Tübingen, Germany http://www.tuebingen.mpg.de/~raetsch Machine Learning in Science and Engineering
  • 2.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 2 Roadmap • Motivating Examples • Some Background • Boosting & SVMs • Applications Rationale: Let computers learn to automate processes and to understand highly complex data
  • 3.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 3 Example 1: Spam Classification From: [email protected] Subject: Congratulations Date: 16. December 2004 02:12:54 MEZ LOTTERY COORDINATOR, INTERNATIONAL PROMOTIONS/PRIZE AWARD DEPARTMENT. SMARTBALL LOTTERY, UK. DEAR WINNER, WINNER OF HIGH STAKES DRAWS Congratulations to you as we bring to your notice, the results of the the end of year, HIGH STAKES DRAWS of SMARTBALL LOTTERY UNITED KINGDOM. We are happy to inform you that you have emerged a winner under the HIGH STAKES DRAWS SECOND CATEGORY,which is part of our promotional draws. The draws were held on15th DECEMBER 2004 and results are being officially announced today. Participants were selected through a computer ballot system drawn from 30,000 names/email addresses of individuals and companies from Africa, America, Asia, Australia,Europe, Middle East, and Oceania as part of our International Promotions Program. … From: [email protected] Subject: ML Positions in Santa Cruz Date: 4. December 2004 06:00:37 MEZ We have a Machine Learning position at Computer Science Department of the University of California at Santa Cruz (at the assistant, associate or full professor level). Current faculty members in related areas: Machine Learning: DAVID HELMBOLD and MANFRED WARMUTH Artificial Intelligence: BOB LEVINSON DAVID HAUSSLER was one of the main ML researchers in our department. He now has launched the new Biomolecular Engineering department at Santa Cruz There is considerable synergy for Machine Learning at Santa Cruz: -New department of Applied Math and Statistics with an emphasis on Bayesian Methods http://www.ams.ucsc.edu/ -- New department of Biomolecular Engineering http://www.cbse.ucsc.edu/ … Goal: Classify emails into spam / no spam How? Learn from previously classified emails! Training: analyze previous emails Application: classify new emails
  • 4.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 4 Example 2: Drug Design Actives Inactives Chemist ON H N NN Cl OH OH N F F F N OH N Cl N OH S O O O O N NH Cl OH N OH OH O OOH OH
  • 5.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 5 The Drug Design Cycle Actives Inactives Chemist former CombiChem technology ON H N NN Cl OH OH N F F F N OH N Cl N OH S O O O O N NH Cl OH N OH OH O OOH OH
  • 6.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 6 The Drug Design Cycle former CombiChem technology Actives InactivesLearning Machine former CombiChem technology ON H N NN Cl OH OH N F F F N OH N Cl N OH S O O O O N NH Cl OH N OH OH O OOH OH
  • 7.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 7 Example 3: Face Detection
  • 8.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 8 Premises for Machine Learning • Supervised Machine Learning • Observe N training examples with label • Learn function • Predict label of unseen example • Examples generated from statistical process • Relationship between features and label • Assumption: unseen examples are generated from same or similar process
  • 9.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 9 Problem Formulation ··· Natural +1 Natural +1 Plastic -1 Plastic -1 ? The “World”: • Data • Unknown Target Function • Unknown Distribution • Objective Problem: is unknown
  • 10.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 10 Problem Formulation
  • 11.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 11 Example: Natural vs. Plastic Apples
  • 12.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 12 Example: Natural vs. Plastic Apples
  • 13.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 13 Example: Natural vs. Plastic Apples
  • 14.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 14 AdaBoost (Freund & Schapire, 1996) •Idea: • Use simple many “rules of thumb” • Simple hypotheses are not perfect! • Hypotheses combination => increased accuracy • Problems • How to generate different hypotheses? • How to combine them? • Method • Compute distribution on examples • Find hypothesis on the weighted sample • Combine hypotheses linearly:
  • 15.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 15 Boosting: 1st iteration (simple hypothesis)
  • 16.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 16 Boosting: recompute weighting
  • 17.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 17 Boosting: 2nd iteration
  • 18.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 18 Boosting: 2nd hypothesis
  • 19.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 19 Boosting: recompute weighting
  • 20.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 20 Boosting: 3rd hypothesis
  • 21.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 21 Boosting: 4rd hypothesis
  • 22.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 22 Boosting: combination of hypotheses
  • 23.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 23 Boosting: decision
  • 24.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 24 AdaBoost Algorithm
  • 25.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 25 AdaBoost algorithm • Combination of • Decision stumps/trees • Neural networks • Heuristic rules • Further reading • http://www.boosting.org • http://www.mlss.cc
  • 26.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 26 Linear Separation property 1 property2
  • 27.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 27 Linear Separation property 1 ? property2
  • 28.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 28 Linear Separation with Margins property 1 property2 property 1 ? large margin => good generalization {m argin property2
  • 29.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 29 Large Margin Separation {m argin Idea: • Find hyperplane that maximizes margin (with ) • Use for prediction Solution: • Linear combination of examples • many ’s are zero • Support Vector Machines Demo
  • 30.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 30 Kernel Trick Linear in input space Linear in feature space Non-linear in input space
  • 31.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 31 Example: Polynomial Kernel
  • 32.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 32 Support Vector Machines • Demo: Gaussian Kernel • Many other algorithms can use kernels • Many other application specific kernels
  • 33.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 33 Capabilities of Current Techniques • Theoretically & algorithmically well understood: • Classification with few classes • Regression (real valued) • Novelty Detection Bottom Line: Machine Learning works well for relatively simple objects with simple properties • Current Research • Complex objects • Many classes • Complex learning setup (active learning) • Prediction of complex properties
  • 34.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 34 Capabilities of Current Techniques • Theoretically & algorithmically well understood: • Classification with few classes • Regression (real valued) • Novelty Detection Bottom Line: Machine Learning works well for relatively simple objects with simple properties • Current Research • Complex objects • Many classes • Complex learning setup (active learning) • Prediction of complex properties
  • 35.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 35 Capabilities of Current Techniques • Theoretically & algorithmically well understood: • Classification with few classes • Regression (real valued) • Novelty Detection Bottom Line: Machine Learning works well for relatively simple objects with simple properties • Current Research • Complex objects • Many classes • Complex learning setup (active learning) • Prediction of complex properties
  • 36.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 36 Capabilities of Current Techniques • Theoretically & algorithmically well understood: • Classification with few classes • Regression (real valued) • Novelty Detection Bottom Line: Machine Learning works well for relatively simple objects with simple properties • Current Research • Complex objects • Many classes • Complex learning setup (active learning) • Prediction of complex properties
  • 37.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 37 Capabilities of Current Techniques • Theoretically & algorithmically well understood: • Classification with few classes • Regression (real valued) • Novelty Detection Bottom Line: Machine Learning works well for relatively simple objects with simple properties • Current Research • Complex objects • Many classes • Complex learning setup (active learning) • Prediction of complex properties
  • 38.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 38 Many Applications • Handwritten Letter/Digit recognition • Face/Object detection in natural scenes • Brain-Computer Interfacing • Gene Finding • Drug Discovery • Intrusion Detection Systems (unsupervised) • Document Classification (by topic, spam mails) • Non-Intrusive Load Monitoring of electric appliances • Company Fraud Detection (Questionaires) • Fake Interviewer identification in social studies • Optimized Disk caching strategies • Optimal Disk-Spin-Down prediction • …
  • 39.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 39 MNIST Benchmark SVM with polynomial kernel (considers d-th order correlations of pixels)
  • 40.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 40 MNIST Error Rates
  • 41.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 41 2. Search Classifier face non-face 1. 525,820 patches= 7 1 )1(2 7.0450600 l l Face Detection
  • 42.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 42 Note: for “easy” patches, a quick and inaccurate classification is sufficient. Method: sequential approximation of the classifier in a Hilbert space Result: a set of face detection filters Romdhani, Blake, Schölkopf, & Torr, 2001 Fast Face Detection
  • 43.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 43 1 Filter, 19.8% patches left Example: 1280x1024 Image
  • 44.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 44 10 Filters, 0.74% Patches left Example: 1280x1024 Image
  • 45.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 45 20 Filters, 0.06% Patches left Example: 1280x1024 Image
  • 46.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 46 30 Filters, 0.01% Patches left Example: 1280x1024 Image
  • 47.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 47 70 Filters, 0.007 % patches left Example: 1280x1024 Image
  • 48.
    Gunnar Rätsch MachineLearning in Science and Engineering CCC Berlin, December 27, 2004 48 Single Trial Analysis of EEG:towards BCI Gabriel Curio Benjamin Blankertz Klaus-Robert Müller Intelligent Data Analysis Group, Fraunhofer-FIRST Berlin, Germany Neurophysics Group Dept. of Neurology Klinikum Benjamin Franklin Freie Universität Berlin, Germany
Cerebral Cocktail Party Problem
The Cocktail Party Problem
How can superimposed signals be decomposed? Separating the many overlapping sources in the EEG poses the same signal-processing problem as the cocktail party.
The Cocktail Party Problem
• input: 3 mixed signals
• algorithm: enforce independence ("independent component analysis") via temporal decorrelation
• output: 3 separated signals
(Demo: Andreas Ziehe, Fraunhofer FIRST, Berlin)
"Imagine that you are on the edge of a lake and a friend challenges you to play a game. The game is this: Your friend digs two narrow channels up from the side of the lake […]. Halfway up each one, your friend stretches a handkerchief and fastens it to the sides of the channel. As waves reach the side of the lake they travel up the channels and cause the two handkerchiefs to go into motion. You are allowed to look only at the handkerchiefs and from their motions to answer a series of questions: How many boats are there on the lake and where are they? Which is the most powerful one? Which one is closer? Is the wind blowing?" (Auditory Scene Analysis, A. Bregman)
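The temporal-decorrelation idea can be illustrated on a toy problem: two sources with different temporal structure (sinusoids at different frequencies) are linearly mixed; whitening the mixtures leaves only an unknown rotation, and the eigenvectors of a time-lagged covariance matrix recover it. This is a SOBI/TDSEP-style sketch, not the original demo code; the sources and mixing matrix are made up:

```python
import numpy as np

# Two sources with different temporal structure, linearly mixed.
n = 2000
t = np.arange(n)
S = np.vstack([np.sin(2 * np.pi * 5 * t / n),
               np.sin(2 * np.pi * 13 * t / n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # unknown mixing matrix
X = A @ S                                 # observed mixtures

# 1) Whiten the mixtures (zero mean, identity covariance).
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
W = E @ np.diag(d ** -0.5) @ E.T
Z = W @ X

# 2) Temporal decorrelation: diagonalize a time-lagged covariance;
# its eigenvectors give the remaining rotation.
tau = 20
C = Z[:, :-tau] @ Z[:, tau:].T / (n - tau)
C = 0.5 * (C + C.T)                       # symmetrize
_, V = np.linalg.eigh(C)
S_hat = V.T @ Z                           # recovered sources (up to sign/order)
```

Because the two sources have distinct lagged autocorrelations, the eigen-decomposition separates them; the recovered signals match the originals up to sign, order, and scale.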
Minimal Electrode Configuration
• coverage: bilateral primary sensorimotor cortices
• 27 scalp electrodes, reference: nose
• bandpass: 0.05-200 Hz, ADC at 1 kHz, downsampled to 100 Hz
• EMG (forearms bilaterally): m. flexor digitorum
• EOG
• event channel: keystroke timing (ms precision)
Single Trial vs. Averaging
(Figure: EEG amplitude [µV] over the 600 ms before the keystroke, single trials vs. averages; panels: LEFT hand (ch. C4) and RIGHT hand (ch. C3))
BCI Setup
ACQUISITION modes: few single electrodes; 32-128 channel electrode caps; subdural macroelectrodes; intracortical multi-single-units.
EEG parameters: slow cortical potentials; µ/β amplitude modulations; Bereitschafts-/motor potential.
TASK alternatives: feedback control; imagined movements; movement (preparation); mental state diversity.
Finding Genes on Genomic DNA
• Exons (may code for protein)
• Introns (noncoding)
Splice sites lie on the exon/intron boundaries; the coding region starts with the Translation Initiation Site (TIS: "ATG").
Application: TIS Finding
"Engineering Support Vector Machine (SVM) Kernels That Recognize Translation Initiation Sites (TIS)"
GMD.SCAI, Institute for Algorithms and Scientific Computing: Alexander Zien, Thomas Lengauer
GMD.FIRST, Institute for Computer Architecture and Software Technology: Gunnar Rätsch, Sebastian Mika, Bernhard Schölkopf, Klaus-Robert Müller
TIS Finding: Classification Problem
• Select candidate positions for a TIS by looking for ATG
• Build a fixed-length sequence representation of the candidates
• Transform the sequence into a representation in real space:
A = (1,0,0,0,0), C = (0,1,0,0,0), G = (0,0,1,0,0), T = (0,0,0,1,0), N = (0,0,0,0,1)
With 5 dimensions per symbol, a 200nt window yields a 1000-dimensional real vector (...,0,1,0,0,0,0,...).
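The one-hot encoding above is easy to make concrete; a minimal sketch:

```python
import numpy as np

# 5-dimensional one-hot code per symbol, as on the slide.
CODE = {'A': 0, 'C': 1, 'G': 2, 'T': 3, 'N': 4}

def encode(seq):
    """Map a DNA sequence to its sparse real-space representation:
    each symbol becomes a 5-dim indicator vector, so a window of
    length L yields a 5*L-dimensional vector (L=200 gives the
    1000-dimensional space from the slide)."""
    v = np.zeros(5 * len(seq))
    for i, ch in enumerate(seq):
        v[5 * i + CODE[ch]] = 1.0
    return v

x = encode("ATGCN")          # 25-dimensional, exactly 5 ones
```

The resulting vectors are sparse, which standard SVM implementations exploit when computing kernel values.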
2-Class Splice Site Detection
• Window of 150nt around known splice sites
• Positive examples: fixed window around a true splice site
• Negative examples: generated by shifting the window
• Design of a new Support Vector kernel
The Drug Design Cycle (former CombiChem technology)
(Figure: a learning machine in the loop between actives and inactives, with example small-molecule structures)
Three Types of Compounds/Points
• actives: few
• inactives: more
• untested: plenty
Shape/Feature Descriptor
Shape/feature signature of ~10^5 bits; each bit (e.g. bit number 254230) encodes a combination of shape, feature type, and feature location.
(S. Putta, A Novel Shape/Feature Descriptor, 2001)
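The slide does not say how two such bit signatures are compared; one common choice for binary fingerprints (an assumption here, not stated on the slide) is the Tanimoto similarity, the ratio of shared bits to bits set in either signature:

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between binary fingerprints:
    |a AND b| / |a OR b|.  A common kernel for large binary
    shape/feature signatures like the ~1e5-bit ones described
    on the slide (the similarity measure is assumed, not given)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

fp1 = np.array([1, 1, 0, 1, 0, 0, 1, 0])
fp2 = np.array([1, 0, 0, 1, 1, 0, 1, 0])
s = tanimoto(fp1, fp2)  # intersection 3, union 5 -> 0.6
```

The measure is 1.0 for identical signatures and ignores bits set in neither, which matters when signatures are very sparse.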
Maximizing the Number of Hits
(Figure: total number of active examples selected after each batch, using the "largest" selection strategy on the Thrombin dataset)
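The "largest" selection strategy can be sketched as one round of batch selection: score all untested compounds with the current classifier and pick the batch with the highest predicted activity. The scores below are made-up stand-ins for SVM outputs:

```python
import numpy as np

def largest_selection(scores, tested, batch_size):
    """One round of the 'largest' selection strategy: among the
    untested compounds, pick the batch_size with the highest
    classifier score (most confidently predicted active)."""
    order = np.argsort(scores)[::-1]                 # best first
    untested = [i for i in order if i not in tested]
    return untested[:batch_size]

scores = np.array([0.9, 0.1, 0.7, 0.4, 0.8])
picked = largest_selection(scores, tested={0}, batch_size=2)  # -> [4, 2]
```

After each batch is assayed, the classifier is retrained on the enlarged labeled set and the next round is selected, which is how the hit count on the slide grows batch by batch.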
Concluding Remarks
Computational challenges:
• Algorithms can work with 100,000s of examples
• Model parameters usually have to be tuned (cross-validation is computationally expensive)
• Need computer clusters and job scheduling systems (PBS, Grid Engine)
• We often use MATLAB (to be replaced by Python: help!)
Machine learning is an exciting research area…
• …involving computer science, statistics and mathematics
• …with a large number of present and future applications (in all situations where data is available but explicit knowledge is scarce), an elegant underlying theory, and an abundance of questions to study.
New computational biology group in Tübingen: looking for people to hire!
Thanks for Your Attention!
Colleagues & contributors: K. Bennett, G. Dornhege, A. Jagota, M. Kawanabe, J. Kohlmorgen, S. Lemm, C. Lemmen, P. Laskov, J. Liao, T. Lengauer, R. Meir, S. Mika, K.-R. Müller, T. Onoda, A. Smola, C. Schäfer, B. Schölkopf, R. Sommer, S. Sonnenburg, J. Srinivasan, K. Tsuda, M. Warmuth, J. Weston, A. Zien
Special thanks: Nora Toussaint, Julia Lüning, Matthias Noll
Gunnar Rätsch
http://www.tuebingen.mpg.de/~raetsch
[email protected]