Hi all,
I’m trying to build a PyMC model that estimates the variances for three multivariate normal distributions in a categorical decision-making task. To verify that the model can recover parameters, I first generated synthetic data where I know the true variances.
However, when setting up the model, I run into a problem where the likelihood becomes -inf
, causing a SamplingError
.
I’ll include code that should reproduce the problem with data generation and modeling steps below to show where I think the problem happens.
Data generation:
import numpy as np
import pandas as pd
from scipy import stats
nr_participants = 2
grid_size = 100
x_vals = np.linspace(-1, 1, num=grid_size)
y_vals = np.linspace(-1, 1, num=grid_size)
X, Y = np.meshgrid(x_vals, y_vals)
grid_points = np.vstack([X.ravel(), Y.ravel()]).T
# Define means and covariances for each choice
means = [
np.array([0.1, -0.6]),
np.array([0.5, 0.5]),
np.array([-0.6, 0.2])
]
covariances = [
np.diag([0.1**2, 0.1**2]),
np.diag([0.15**2, 0.15**2]),
np.diag([0.2**2, 0.2**2])
]
# Calculate log-likelihoods for each choice
likelihoods = []
for mean, cov in zip(means, covariances):
dist = stats.multivariate_normal(mean, cov)
likelihoods.append(dist.logpdf(grid_points))
# Convert log-likelihoods to probabilities
log_likelihoods = np.vstack(likelihoods).T
probabilities = np.exp(log_likelihoods)
probabilities /= probabilities.sum(axis=1, keepdims=True)
# Simulate observed choices
observed_choices = []
for _ in range(nr_participants):
for prob in probabilities:
choice = np.random.choice([1, 2, 3], p=prob)
observed_choices.append(choice)
probs_repeated = np.round(probabilities, 3).tolist() * nr_participants
x = np.tile(grid_points[:, 0], nr_participants)
y = np.tile(grid_points[:, 1], nr_participants)
participants = np.repeat(np.arange(nr_participants), len(grid_points))
df= pd.DataFrame({
'x': x,
'y': y,
'participant': participants,
'response': observed_choices,
'probs': probs_repeated
})
Model specification:
import pymc as pm
import pytensor.tensor as pt
values = df[['x', 'y']].values
y = df['response'].values - 1 # Make responses 0-based
means = [
np.array([0.1, -0.6]),
np.array([0.5, 0.5]),
np.array([-0.6, 0.2])
]
with pm.Model() as model:
# Priors on standard deviations
sigma1 = pm.HalfNormal('sigma1', sigma=0.3, shape=2)
sigma2 = pm.HalfNormal('sigma2', sigma=0.3, shape=2)
sigma3 = pm.HalfNormal('sigma3', sigma=0.3, shape=2)
# Covariance matrices
cov1 = pt.diag(sigma1 ** 2)
cov2 = pt.diag(sigma2 ** 2)
cov3 = pt.diag(sigma3 ** 2)
rv1 = pm.MvNormal.dist(mu=means[0], cov=cov1)
rv2 = pm.MvNormal.dist(mu=means[1], cov=cov2)
rv3 = pm.MvNormal.dist(mu=means[2], cov=cov3)
# Log-likelihoods
log_lik1 = pm.logp(rv1, values)
log_lik2 = pm.logp(rv2, values)
log_lik3 = pm.logp(rv3, values)
log_liks = pt.stack([log_lik1, log_lik2, log_lik3], axis=1)
# Softmax transformation
probs = pm.math.softmax(log_liks)
# Clip and renormalize (safe for numerical issues)
epsilon = 1e-7
p_safe = pt.clip(probs, epsilon, 1 - epsilon)
p_safe = p_safe / pt.sum(p_safe, axis=1, keepdims=True)
# Categorical likelihood
obs = pm.Categorical('obs', p=p_safe, observed=y)
# Sample
trace = pm.sample(1000, tune=1000, target_accept=0.9, return_inferencedata=True)
When I try to sample, I get:
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'sigma1_log__': array([-3.07162307, -3.82064486]), 'sigma2_log__': array([-3.28873934, -3.60609323]), 'sigma3_log__': array([-3.91690929, -2.75276419])}
Logp initial evaluation results:
{'sigma1': -1.79, 'sigma2': -1.52, 'sigma3': -2.59, 'obs': -inf}
You can call `model.debug()` for more details.
The likelihood for obs are -inf, eventhough I already add epsilon in case that a 0-probability is encountered.
I added the lists for the log likelihoods and probs to the model to trace back the error and found that the problem is occuring due to the way the log_liks are transfomred into probabilities with the softmax function.
For example the first few entries of the log_liks
array([
[-756.5, -34858.9, -207401.9],
[-729.6, -34399.9, -207053.4],
[-703.3, -33947.2, -206722.9],
])
We can see that the first value is much higher (i.e., less negative) than the others.
Thus, the resulting softmax probabilities should look like:
array([
[~1.0, ~0.0, ~0.0],
[~1.0, ~0.0, ~0.0],
[~1.0, ~0.0, ~0.0],
])
However, what I actually observe from probs.eval()
is:
array([
[0.0, 0.0, 0.0],
[1e-321, 0.0, 0.0],
[1e-310, 0.0, 0.0],
])
Meaning that all choices have nearly zero probability, which leads to the -inf
likelihood.
Why is pm.math.softmax(log_liks)
producing extremely small values (basically zeros) instead of proper probabilities?
How should I correctly transform these log-likelihoods into stable probabilities for the categorical likelihood?
Any help would be greatly appreciated!