Sanity Checks
David Duvenaud
Cambridge University
Computational and Biological Learning Lab
April 24, 2013
A Simple Example: Comparing Models of Prawn Minds

• Paper: Comparing evidence for different models of prawn behavior.
• Requires inference conditioned on results of experiments.
• Author's highest-impact publication venue so far.

Actual experiments: 102 repetitions of prawn flocking.
function [logP, samples] = logP_mc_ring_memory(theta, direction, N, modelidx, type)

    if nargin < 5
        type = 1;
        if nargin < 4
            modelidx = 1;
        end
    end

    % downsample inputs, correlation length is ~10 frames
    for i = 1:numel(theta)
        theta = theta(:, 1:2:end);
        direction = direction(:, 1:2:end);
    end

    logP = zeros(N, 1, 'double');
    samples = zeros(N, 6, 'double');

    priormin = [0, 1, -2, -2, 0, -7.5];
    priormax = [pi, 5, 2, 2, 1, -7.49];
    priorrange = priormax - priormin;

    switch modelidx
        case 0
            log_l_pdf = @(x) logP_ring_null(theta, direction, x(1), x(2), x(3:4), x(5), x(6));
        case 1
            log_l_pdf = @(x) logP_ring_mf(theta, direction, x(1), x(2), x(3:4), x(5), x(6));
        case 2
            ...
Is anything amiss?

    % downsample inputs,
    % correlation length is ~10 frames
    for i = 1:numel(theta)
        theta = theta(:, 1:2:end);
        direction = direction(:, 1:2:end);
    end

• theta is a cell array, one cell per experiment; each iteration discards half the experiments!
• At the end of the loop, only 1 of 102 experiments is left.
• So many pointless experiments!
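The bug is easy to reproduce outside Matlab. Below is a hypothetical Python sketch (the function names and data are illustrative, not from the paper's code): reassigning the sliced result to the whole collection inside the loop halves the number of experiments on every pass, instead of halving the frames within each experiment.

```python
# Each "experiment" is a list of per-frame headings; 102 experiments, as on the slide.
experiments = [[float(f) for f in range(100)] for _ in range(102)]

def downsample_buggy(theta):
    # Mimics the Matlab loop: slicing the outer collection discards
    # every other *experiment*, not every other frame.
    theta = list(theta)
    for _ in range(len(theta)):
        theta = theta[::2]
    return theta

def downsample_correct(theta):
    # Downsample frames within each experiment (correlation length ~10 frames).
    return [exp[::2] for exp in theta]

print(len(downsample_buggy(experiments)))    # 1 experiment survives
print(len(downsample_correct(experiments)))  # all 102 experiments kept
```

Indexing inside each experiment, as in `downsample_correct`, keeps all 102 experiments and halves only the frames.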
The lesson: never release your code.

[Update: was fixed and re-published:
www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002961]
Very Common

In Machine Learning:
• 2009: Oxford vision group retraction after including test cases in the training set.
• An unnamed lab member almost didn't include results in a NIPS paper because of a sign error in plots.
• Retraction Watch blog: retractionwatch.wordpress.com

In General:
• Your code will have bugs!
• My rate: about 1 per line of Matlab.
How to trust anything?
When writing code

Standard Advice:
• Write unit tests.
• Physicists have good protocols.
• Compute the same thing in different ways.
• Use checkgrad!

Carl's Advice:
• Re-write your code until it uses the right data structure.
• Keep your code short and simple; no corner cases.
• Ideally, everything fits on one page.
These methods work, but they slow you down.
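For readers who haven't seen it, the checkgrad idea is: compare an analytic gradient against central finite differences of the objective. Below is a minimal pure-Python sketch of that idea (a stand-in, not Rasmussen's actual Matlab checkgrad):

```python
import math

def checkgrad(f, grad, x, eps=1e-6):
    # Compare the analytic gradient grad(x) against central finite
    # differences of f. Returns a relative error: ~1e-8 or smaller
    # for a correct gradient, order 1 for a badly wrong one.
    g_analytic = grad(x)
    g_numeric = []
    for i in range(len(x)):
        xp = list(x); xp[i] += eps
        xm = list(x); xm[i] -= eps
        g_numeric.append((f(xp) - f(xm)) / (2 * eps))
    num = math.sqrt(sum((a - n) ** 2 for a, n in zip(g_analytic, g_numeric)))
    den = math.sqrt(sum(a * a for a in g_analytic)) \
        + math.sqrt(sum(n * n for n in g_numeric))
    return num / den

# Example: f(x) = sum(x_i^2), with a deliberate sign error in the "bad"
# gradient (like the sign error in the NIPS-paper anecdote above).
f = lambda x: sum(xi ** 2 for xi in x)
good_grad = lambda x: [2 * xi for xi in x]
bad_grad = lambda x: [-2 * xi for xi in x]

x0 = [0.3, -1.2, 0.7, 2.0, -0.4]
print(checkgrad(f, good_grad, x0))  # tiny: the gradient matches
print(checkgrad(f, bad_grad, x0))   # ~1.0: the sign error is obvious
```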
When running experiments

Things to always compare against:
• A random guesser (finds bugs in evaluation code).
• A predictor that always guesses the mean/mode (finds too-easy problems).
• 1-nearest neighbour (finds bugs in train/test splitting).

Datasets to include:
• A trivial-to-predict dataset (finds major bugs in any method).
• A dataset with no signal (finds bugs in evaluation code).
• A translated, scaled version of the dataset (finds bugs in the implementation of the model).

Can detect problems without looking at the code.
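A hypothetical Python sketch of these baselines (names like `one_nn` are illustrative): run them on a dataset with no signal, where every method should score near chance; a "model" that does much better here signals a bug such as test data leaking into training.

```python
import random
from collections import Counter

random.seed(0)

# A dataset with no signal: labels are independent of features.
X = [[random.random() for _ in range(3)] for _ in range(200)]
y = [random.choice([0, 1]) for _ in range(200)]
X_train, y_train = X[:100], y[:100]
X_test, y_test = X[100:], y[100:]

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

# Baseline 1: a random guesser (finds bugs in evaluation code).
random_preds = [random.choice([0, 1]) for _ in X_test]

# Baseline 2: always guess the training mode (finds too-easy problems).
mode = Counter(y_train).most_common(1)[0][0]
mode_preds = [mode for _ in X_test]

# Baseline 3: 1-nearest neighbour (finds bugs in train/test splitting).
def one_nn(x):
    dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
    return y_train[dists.index(min(dists))]
nn_preds = [one_nn(x) for x in X_test]

for name, preds in [("random", random_preds), ("mode", mode_preds), ("1-NN", nn_preds)]:
    print(name, accuracy(preds, y_test))
# All three should hover around chance (~0.5) on this no-signal data.
```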
In General

Notice Confusion:
• Notice when you're confused.
• Notice when you're rationalizing.
• Red flag: looking at only one number and making up a story about why it goes up or down (i.e. cog sci).

Empirical Rates

Look at details until they aren't surprising.
Main Takeaways

To keep in mind:
• You probably have bugs.
• Finding them early saves time.
• Finding them before you publish saves retractions.

To practice:
• Include simple baselines.
• Check invariants.
• Plot everything.
• Keep things simple.
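"Check invariants" can be as cheap as one assertion. For example, any hand-coded log-density (like the logP functions earlier) should normalize; here is a hedged Python sketch with a stand-in standard-normal `log_pdf`:

```python
import math

def log_pdf(x):
    # Stand-in log-density: log N(x; 0, 1).
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

# Invariant: exp(log_pdf) should integrate to 1 over the support.
# Crude Riemann sum over [-10, 10], where the tails are negligible.
dx = 0.001
total = sum(math.exp(log_pdf(-10 + i * dx)) * dx for i in range(20001))
print(total)  # ~1.0; anything else means the density has a bug
```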
A few sanity checks go a long way