Rule extraction with Bayesian Logistic Regression

Hello!

I am a beginner in Bayesian modelling and am looking for advice on a specific implementation. I am working with small datasets, and I thought that setting priors from the posteriors of a model trained on similar data would help (I work with languages, so the idea is to train a model on a closely related language and then initialize another model with its posteriors).

However, my main task is to work out which factors affect the choice of answer, so I need some kind of feature selection / rule extraction algorithm.

I have been looking into Bayesian sparse logistic regression and into adapting this tutorial. However, maybe I am missing something and there are other ways of doing this?

I would appreciate any advice!

Can I assume you essentially have a linear model of the covariates (Bx) and a logistic transform?

If so, then with the fairly standard practice of putting a Normal prior on the betas you get L2 regularization for free, or you can get L1-style regularization by using a Laplace prior.
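Roughly something like this, using the current PyMC API (the toy data, prior scales, and variable names are just placeholders):

```python
import numpy as np
import pymc as pm

# Placeholder data: X is an (n, p) design matrix, y is a binary outcome.
X = np.random.randn(100, 5)
y = np.random.binomial(1, 0.5, size=100)

with pm.Model() as logistic_model:
    # Normal prior on the betas ~ L2 (ridge) shrinkage.
    # Swap in pm.Laplace("beta", mu=0, b=1, shape=X.shape[1]) for L1-style shrinkage.
    beta = pm.Normal("beta", mu=0, sigma=1, shape=X.shape[1])
    intercept = pm.Normal("intercept", mu=0, sigma=5)

    # Linear predictor plus logistic transform.
    p = pm.math.sigmoid(intercept + pm.math.dot(X, beta))
    pm.Bernoulli("obs", p=p, observed=y)

    idata = pm.sample()
```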

There is some (quite old, possibly out of date) code discussed here: How to add an L1 Regularization on the likelihood when use pymc3 to sample a MCMC, although I expect you can find newer discussions / code approaches here too.

There are a lot of ways to do this.

Unfortunately, this isn’t one of them. If you use these as priors and look at Bayesian posterior means, there is zero probability (measure zero) that you will get a posterior mean of exactly zero, because the posterior is a continuous distribution.

L1 (lasso) regularization can force actual zeros if you use maximum likelihood. L2 (ridge) won’t.
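A quick scikit-learn check of that claim, if it helps to see it concretely (toy data; the penalty strength C=0.5 is arbitrary):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: only the first two covariates actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - 2 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
ridge = LogisticRegression(penalty="l2", C=0.5).fit(X, y)

print("L1 zero coefficients:", int(np.sum(lasso.coef_ == 0)))  # typically > 0
print("L2 zero coefficients:", int(np.sum(ridge.coef_ == 0)))  # typically 0
```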

The only way to assign non-zero probability to zero in the posterior is to have non-zero probability at zero in the prior, which means a spike-and-slab prior. That is, you make the prior a mixture of a point mass at zero and a continuous density elsewhere. This lets the prior, and hence the posterior, assign probability mass to zero. Otherwise you never get posterior probability mass at zero in a Bayesian approach, because {0} is a measure-zero set under a continuous distribution.

[EDIT: But, you’re not going to be able to fit spike-and-slab with HMC/NUTS, because the marginalization is combinatorial (it will work with a handful of coefficients, but not more). You might be able to get PyMC to fit it by sampling slab/no-slab with a discrete sampler, but the problem there is that this is an NP-hard problem in general, so no way to guarantee you get reasonable answers everywhere in reasonable time.]
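For what it’s worth, a sketch of the slab/no-slab indicator idea in PyMC (toy data; the variable names and prior scales are just placeholders). PyMC will assign a binary sampler to the indicators and NUTS to the continuous parameters, but as noted above, don’t expect this to behave well beyond a handful of coefficients:

```python
import numpy as np
import pymc as pm

# Keep p small: the inclusion space is combinatorial (2^p configurations).
X = np.random.randn(100, 5)
y = np.random.binomial(1, 0.5, size=100)

with pm.Model() as spike_slab_model:
    # Inclusion indicators: 1 means the covariate gets the slab, 0 means exactly zero.
    inclusion = pm.Bernoulli("inclusion", p=0.5, shape=X.shape[1])

    # Slab: continuous density for the included coefficients.
    slab = pm.Normal("slab", mu=0, sigma=2, shape=X.shape[1])

    # Effective coefficient is exactly zero whenever the indicator is zero.
    beta = pm.Deterministic("beta", inclusion * slab)
    intercept = pm.Normal("intercept", mu=0, sigma=5)

    p = pm.math.sigmoid(intercept + pm.math.dot(X, beta))
    pm.Bernoulli("obs", p=p, observed=y)

    # PyMC picks a discrete sampler for `inclusion` and NUTS for the rest.
    idata = pm.sample()
```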

P.S. There’s an elastic net prior that combines the good parts of L1 (you can get actual zeros) and L2 (it keeps coefficients identified under collinearity). But again, it won’t work with Bayesian posterior inference, only with penalized maximum likelihood. And even then, you need a special optimizer to get an actual value of zero after finitely many iterations (it has to truncate at zero, otherwise it will see-saw).
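If you go the penalized-maximum-likelihood route, scikit-learn’s saga solver already does that truncation for you via a proximal step (the l1_ratio and C values here are arbitrary, and the toy data is the same kind of placeholder as above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - 2 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

# Elastic net penalized maximum likelihood: l1_ratio mixes the L1 and L2 penalties.
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.5, max_iter=5000)
enet.fit(X, y)
print("zeroed coefficients:", int((enet.coef_ == 0).sum()))
```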

Have you seen this continuous spike-and-slab prior? I played around with it a bit and it seemed to do reasonable things, but I admittedly don’t have a lot of experience working with shrinkage priors to know what “reasonable” is.

The horseshoe prior is also a continuous form of the spike-and-slab. The only problem is that if you have a continuous prior, you have a continuous posterior, so there’s no way to get non-zero probability mass at zero. Oddly, the paper you linked doesn’t seem to mention this. If all you care about is predictive performance, shrinking to nearly zero is good enough (after you take the scale of the covariates into account). But if you have a bajillion covariates and want to trim them for run-time speed, there’s a post-processing step you need to do.
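Here’s a rough sketch of what I mean, using a plain (non-regularized) horseshoe in PyMC; the toy data and the 0.1 trimming cutoff are purely illustrative, and the cutoff only makes sense on standardized covariates:

```python
import numpy as np
import pymc as pm

X = np.random.randn(100, 5)
y = np.random.binomial(1, 0.5, size=100)

with pm.Model() as horseshoe_model:
    # Horseshoe: global scale tau, local scales lam_j.
    tau = pm.HalfCauchy("tau", beta=1)
    lam = pm.HalfCauchy("lam", beta=1, shape=X.shape[1])
    beta = pm.Normal("beta", mu=0, sigma=tau * lam, shape=X.shape[1])
    intercept = pm.Normal("intercept", mu=0, sigma=5)

    p = pm.math.sigmoid(intercept + pm.math.dot(X, beta))
    pm.Bernoulli("obs", p=p, observed=y)

    # The heavy tails make sampling harder; a higher target_accept helps.
    idata = pm.sample(target_accept=0.95)

# Post-processing: the posterior never puts mass exactly at zero, so trim
# covariates whose posterior mean falls below an arbitrary cutoff.
beta_mean = idata.posterior["beta"].mean(dim=("chain", "draw")).values
keep = np.abs(beta_mean) > 0.1
print("covariates kept:", np.where(keep)[0])
```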
