Sanity 2

The document discusses the importance of sanity checks and testing when writing code and conducting experiments. It notes that bugs are very common in machine learning code and provides examples of bugs found in published papers. It recommends practices like writing unit tests, comparing results to simple baselines, checking invariances, and plotting details to catch bugs early and avoid retractions. The key message is that a few basic sanity checks can go a long way in validating results.

Sanity Checks

David Duvenaud

Cambridge University
Computational and Biological Learning Lab

April 24, 2013


A Simple Example: Comparing Models of Prawn Minds

• Paper: Comparing evidence for different models of prawn behavior.
• Requires inference conditioned on results of experiments.
• Author's highest-impact publication venue so far.

Actual experiments: 102 repetitions of prawn flocking.
function [logP, samples] = logP_mc_ring_memory(theta, direction, N, modelidx, type)

if nargin < 5
    type = 1;
    if nargin < 4
        modelidx = 1;
    end
end

% downsample inputs, coreelation length is ~10 frames
for i = 1:numel(theta)
    theta = theta(:, 1:2:end);
    direction = direction(:, 1:2:end);
end

logP = zeros(N, 1, 'double');
samples = zeros(N, 6, 'double');

priormin = [0, 1, -2, -2, 0, -7.5];
priormax = [pi, 5, 2, 2, 1, -7.49];
priorrange = priormax - priormin;

switch modelidx
    case 0
        log_l_pdf = @(x) logP_ring_null(theta, direction, x(1), x(2), x(3:4), x(5), x(6));
    case 1
        log_l_pdf = @(x) logP_ring_mf(theta, direction, x(1), x(2), x(3:4), x(5), x(6));
    case 2
        ...
Is anything amiss?

% downsample inputs,
% coreelation length is ~10 frames
for i = 1:numel(theta)
    theta = theta(:, 1:2:end);
    direction = direction(:, 1:2:end);
end

• theta is a cell array, one cell per experiment; each iteration discards half the experiments!
• At the end of the loop, only 1 of 102 experiments left.
• So many pointless experiments!
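The bug fits in a few lines: the loop body reassigns the whole collection instead of indexing into one cell, so each pass halves the number of experiments rather than downsampling frames within one experiment. A minimal sketch of the same mistake (in Python rather than the talk's MATLAB; all names are illustrative):

```python
# Each "experiment" is a list of frames; the goal is to downsample
# every experiment by keeping every other frame.
experiments = [list(range(100)) for _ in range(102)]

# Buggy version (mirrors the slide): the loop reassigns the whole
# collection, so each iteration discards half the *experiments*.
buggy = [list(range(100)) for _ in range(102)]
for _ in range(len(buggy)):
    buggy = buggy[::2]
print(len(buggy))  # -> 1 experiment survives out of 102

# Fixed version: downsample frames inside each experiment.
fixed = [exp[::2] for exp in experiments]
print(len(fixed), len(fixed[0]))  # -> 102 experiments, 50 frames each
```

Repeated halving takes 102 down to 1 within seven iterations, after which slicing a one-element list leaves it unchanged: exactly the "1 of 102 experiments left" on the slide.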

The lesson: never release your code

[update: Was fixed and re-published:
www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002961]
Very Common

In Machine Learning

• 2009: Oxford vision group retraction after including test cases in training set.
• An unnamed lab member almost didn't include results in NIPS paper because of a sign error in plots.
• Retraction Watch Blog: retractionwatch.wordpress.com

In General

• Your code will have bugs!
• My rate: about 1 per line of matlab.

How to trust anything?


When writing code

Standard Advice

• Write unit tests.
• Physicists have good protocols.
• Compute the same thing in different ways.
• Use checkgrad!

Carl's Advice

• Re-write your code until it uses the right data structure.
• Keep your code short and simple, no corner cases.
• Ideally, everything fits on one page.

These methods work but slow you down
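A gradient check in the spirit of checkgrad compares a hand-derived gradient against central finite differences. A minimal sketch of the idea (Python; the objective, tolerance, and test point are illustrative, not from the talk):

```python
import math

def f(x):
    # Example objective: f(x) = x0^2 + sin(x1)
    return x[0] ** 2 + math.sin(x[1])

def grad_f(x):
    # Hand-derived gradient to be checked.
    return [2 * x[0], math.cos(x[1])]

def check_grad(f, grad, x, eps=1e-6, tol=1e-4):
    """Return True if the analytic gradient matches
    central finite differences at x."""
    g = grad(x)
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        fd = (f(xp) - f(xm)) / (2 * eps)
        if abs(fd - g[i]) > tol:
            return False
    return True

ok = check_grad(f, grad_f, [0.7, -1.3])
print(ok)  # a correct gradient passes the check
```

A sign error or dropped factor in `grad_f` makes the finite-difference comparison fail immediately, which is exactly the class of bug the slide is warning about.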


When running experiments

Things to always compare against:

• A random guesser (finds bugs in evaluation code)
• Always guesses mean/mode (finds too-easy problems)
• 1-nearest neighbour (finds bugs in train/test splitting)

Datasets to include:

• A trivial-to-predict dataset (finds major bugs in any method)
• A dataset with no signal (finds bugs in evaluation code)
• A translated, scaled version of the dataset (finds bugs in implementation of the model)

Can detect problems without looking at code
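These baselines are cheap to script. A sketch of the first two on a no-signal dataset (Python; the data, split, and accuracy metric are illustrative):

```python
import random
from collections import Counter

random.seed(0)

# A dataset with no signal: labels independent of any inputs.
labels = [random.choice([0, 1]) for _ in range(1000)]
train, test = labels[:800], labels[800:]

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

# Baseline 1: a random guesser.
rand_acc = accuracy([random.choice([0, 1]) for _ in test], test)

# Baseline 2: always guess the training-set mode.
mode = Counter(train).most_common(1)[0][0]
mode_acc = accuracy([mode] * len(test), test)

# On a no-signal dataset, any method that beats these baselines by a
# wide margin points to a bug in the evaluation code.
print(rand_acc, mode_acc)  # both should hover around 0.5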


In General

Notice Confusion

• Notice when you're confused
• Notice when you're rationalizing
• Red flag: Looking at only one number and making up a story about why it goes up or down (i.e. cog sci)

Empirical Rates

Look at details until they aren't surprising


Main Takeaways

To keep in mind

• You probably have bugs
• Finding them early saves time
• Finding them before you publish saves retractions

To practice

• Include simple baselines
• Check invariants
• Plot everything
• Keep things simple

A few sanity checks go a long way
