Skip to content

Remove necessity to define likelihood variable inside model, is done … #2339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 27, 2017

Conversation

hvasbath
Copy link
Contributor

…now inside sampler init

  • added testing intermediate stage loading results
  • renamed ATMIP_sample to smc_sample
  • minor fix in clearing existing stages

Thumbs up to @junpenglao 's idea!

…now inside sampler init

 * added testing intermediate stage loading results
 * renamed ATMIP_sample to smc_sample
 * minor fix in clearing existing stages
@@ -147,6 +147,10 @@ def __init__(self, vars=None, out_vars=None, n_chains=100, scaling=1., covarianc
vars = inputvars(vars)

if out_vars is None:
if not any(likelihood_name == RV.name for RV in model.unobserved_RVs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not leave the option for user to define Deterministic likelihood.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a deterministic likelihood?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is basically to check if by any chance the likelihood name variable is defined already, repeated initialisation of the same model would also add the Deterministic again into the model, which would result in an error
this is basically a workaround to get the likelihood written into the trace @fonnesbeck

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you add the logp as a sampler statistic? That it would be attached to the trace but didn't clutter the normal variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would I do that? @aseyboldt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each step method can set generates_stats to True and define dtypes for the individual values in stats_dtypes. You can have a look at nuts.py lines 75 to 87. The step method must then return for each step a tuple (q, [stats]), where q is the sampled point and stats contains the statistics it wants to store.

Copy link
Contributor Author

@hvasbath hvasbath Jun 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that feature didnt exist when I started writing SMC, I guess now it could be even possible to get it into the common pm.sample API ... however, that would require some major changes I have no time to do that in the near future

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aseyboldt I didnt know about that as well!

@@ -28,7 +28,7 @@
from .arraystep import metrop_select
from ..backends import smc_text as atext

__all__ = ['SMC', 'ATMIP_sample']
__all__ = ['SMC', 'smc_sample']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep acronyms in all caps, so SMC_sample

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok! will change it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other dedicated sampling functions are named the other way round: sample_gp, sample_ppc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And according to pep8 acronyms in class names should only have the first letter capitalized: http://legacy.python.org/dev/peps/pep-0008/#descriptive-naming-styles
I guess that means that acronyms in function names should be all lower case (which is the case in the newer python stdlib modules I think, eg ipaddress).

Copy link
Member

@twiecki twiecki Jun 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on sample_SMC().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@twiecki Sure? Shouldn't it be sample_smc? I think this goes clearly against the common convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I prefer sample_smc, too. sample_SMC looks clunky and doesn't follow convention.

@@ -419,9 +423,9 @@ def resample(self):
return outindx


def ATMIP_sample(n_steps, step=None, start=None, homepath=None, chain=0, stage=0, n_jobs=1,
def smc_sample(n_steps, step=None, start=None, homepath=None, chain=0, stage=0, n_jobs=1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SMC_sample

@hvasbath
Copy link
Contributor Author

I cant see why the tests are marked as failed? There are no failed tests in the list!

@fonnesbeck
Copy link
Member

PyLint failures?

@junpenglao
Copy link
Member

It happened in #1977 as well. Is it relate to xfail?

@aseyboldt
Copy link
Member

@hvasbath Doesn't this do an eigenvalue decomposition of the same proposal correlation matrix in each step right now? There are already normal proposal classes that do a cholesky decomposition once and use that to draw samples in metropolis.py. That should be much faster.

@hvasbath
Copy link
Contributor Author

hvasbath commented Jun 21, 2017

@aseyboldt No, it updates the proposal covariance once each transition state, then uses the same proposal covariance throughout the sampling steps. The proposal_steps are created only once at the beginning of each chain for the whole chain. So no repeated RNG call here. Calling the RNG each step is unbelievably inefficient as discussed here: #1034

@aseyboldt
Copy link
Member

On first glance that looks like it might just be the eigen decomposition I was talking about. To draw a mvnormal you need to factor the covariance matrix first. This is very expensive (O(n^3)), so you only want to do that once. After that you only need to multiply independent normal draws with the factored matrix, which is much faster than the factorization.

In [30]: cov = np.random.randn(500, 500)
In [31]: cov = 500 * np.eye(500) + cov @ cov.T
In [32]: prop = pm.step_methods.metropolis.MultivariateNormalProposal(cov)
In [33]: %timeit prop()
81.6 µs ± 902 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [34]: prop = pm.step_methods.smc.MultivariateNormalProposal(cov)
In [35]: %timeit prop()
140 ms ± 665 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

You would still get a bit of a speedup by drawing several samples at the same time I guess...

@hvasbath
Copy link
Contributor Author

Ah now I get it. Good point! Also did miss that there was improvement in the metropolis MVNormal -however we should improve that to be able to draw several values at once ...

@ColCarroll
Copy link
Member

I restarted that build -- this is an ongoing issue that

pymc3/tests/test_examples.py::TestARM5_4::test_run

takes O(10 minutes) to run. Travis kills the job if there's no output for 10 minutes. A partial fix would be turning on the progressbar to get some output (a real fix would be making the test faster, or adding a docstring explaining why it takes 10 minutes!)

@@ -419,9 +408,9 @@ def resample(self):
return outindx


def ATMIP_sample(n_steps, step=None, start=None, homepath=None, chain=0, stage=0, n_jobs=1,
def sample_SMC(n_steps, step=None, start=None, homepath=None, chain=0, stage=0, n_jobs=1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sample_smc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before I change it again could we please vote ;) or is this the final statement? because I had it already once like that ;) ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're at 2 vs 1 currently (@aseyboldt, @twiecki and @fonnesbeck). @junpenglao @ColCarroll?

@ColCarroll
Copy link
Member

ColCarroll commented Jun 23, 2017 via email

@junpenglao
Copy link
Member

lower case, i agree with.

@fonnesbeck
Copy link
Member

I vote that we just be consistent. What is the policy? Lower case for functions and caps for classes?

@twiecki
Copy link
Member

twiecki commented Jun 23, 2017

I like that policy.

@fonnesbeck
Copy link
Member

Sounds good. Go ahead and change it back @hvasbath. Sorry for the hassle.

@hvasbath hvasbath mentioned this pull request Jun 24, 2017
@hvasbath
Copy link
Contributor Author

Some of the tests exceed the time, and some dont give output for 10mins. I guess its related to what @ColCarroll mentioned above ...

@junpenglao
Copy link
Member

Yeah ongoing issue... restarted them by hand for you.

@hvasbath
Copy link
Contributor Author

hvasbath commented Jun 27, 2017

What happened to the notebook? Ah I see got removed for the release ... I guess here only the tests need to be rerun after the theano caching trick they should run through...

@fonnesbeck
Copy link
Member

#2354 puts the notebook back in. I will go ahead and merge it.

@junpenglao
Copy link
Member

@hvasbath the conflict still need to resolved tho

@junpenglao junpenglao merged commit 53fa46b into pymc-devs:master Jun 27, 2017
@junpenglao
Copy link
Member

Thanks @hvasbath!

@hvasbath hvasbath deleted the smc_model_llk branch June 27, 2017 11:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants