Problem with PyMC softmax transformation producing unexpected results

Hi all,

I’m trying to build a PyMC model that estimates the variances for three multivariate normal distributions in a categorical decision-making task. To verify that the model can recover parameters, I first generated synthetic data where I know the true variances.

However, when setting up the model, I run into a problem where the likelihood becomes -inf, causing a SamplingError.

Below I include code that should reproduce the problem, covering both the data generation and the modeling steps, to show where I think the problem happens.

Data generation:

import numpy as np
import pandas as pd
from scipy import stats

nr_participants = 2
grid_size = 100
x_vals = np.linspace(-1, 1, num=grid_size)
y_vals = np.linspace(-1, 1, num=grid_size)

X, Y = np.meshgrid(x_vals, y_vals)
grid_points = np.vstack([X.ravel(), Y.ravel()]).T

# Define means and covariances for each choice
means = [
    np.array([0.1, -0.6]),
    np.array([0.5, 0.5]),
    np.array([-0.6, 0.2])
]
covariances = [
    np.diag([0.1**2, 0.1**2]),
    np.diag([0.15**2, 0.15**2]),
    np.diag([0.2**2, 0.2**2])
]

# Calculate log-likelihoods for each choice
likelihoods = []
for mean, cov in zip(means, covariances):
    dist = stats.multivariate_normal(mean, cov)
    likelihoods.append(dist.logpdf(grid_points))

# Convert log-likelihoods to probabilities
log_likelihoods = np.vstack(likelihoods).T
probabilities = np.exp(log_likelihoods)
probabilities /= probabilities.sum(axis=1, keepdims=True)

# Simulate observed choices
observed_choices = []
for _ in range(nr_participants):
    for prob in probabilities:
        choice = np.random.choice([1, 2, 3], p=prob)
        observed_choices.append(choice)

probs_repeated = np.round(probabilities, 3).tolist() * nr_participants
x = np.tile(grid_points[:, 0], nr_participants)
y = np.tile(grid_points[:, 1], nr_participants)
participants = np.repeat(np.arange(nr_participants), len(grid_points))

df = pd.DataFrame({
    'x': x,
    'y': y,
    'participant': participants,
    'response': observed_choices,
    'probs': probs_repeated
})
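
A side note on the exp/normalize step above: it happens to work here because the logpdfs on this grid stay well above the underflow threshold, but np.exp underflows to exactly 0 for inputs below about -745, which would turn the row normalization into 0/0. A numerically safer sketch of the same computation (using the log_likelihoods array defined above) subtracts the per-row maximum first:

# Log-sum-exp trick: shifting each row by its max leaves the normalized
# probabilities unchanged, but guarantees the largest entry is exp(0) = 1.
shifted = log_likelihoods - log_likelihoods.max(axis=1, keepdims=True)
probabilities = np.exp(shifted)
probabilities /= probabilities.sum(axis=1, keepdims=True)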

Model specification:

import pymc as pm
import pytensor.tensor as pt

values = df[['x', 'y']].values
y = df['response'].values - 1  # Make responses 0-based

means = [
    np.array([0.1, -0.6]),
    np.array([0.5, 0.5]),
    np.array([-0.6, 0.2])
]

with pm.Model() as model:
    # Priors on standard deviations
    sigma1 = pm.HalfNormal('sigma1', sigma=0.3, shape=2)
    sigma2 = pm.HalfNormal('sigma2', sigma=0.3, shape=2)
    sigma3 = pm.HalfNormal('sigma3', sigma=0.3, shape=2)

    # Covariance matrices
    cov1 = pt.diag(sigma1 ** 2)
    cov2 = pt.diag(sigma2 ** 2)
    cov3 = pt.diag(sigma3 ** 2)

    rv1 = pm.MvNormal.dist(mu=means[0], cov=cov1)
    rv2 = pm.MvNormal.dist(mu=means[1], cov=cov2)
    rv3 = pm.MvNormal.dist(mu=means[2], cov=cov3)

    # Log-likelihoods
    log_lik1 = pm.logp(rv1, values)
    log_lik2 = pm.logp(rv2, values)
    log_lik3 = pm.logp(rv3, values)

    log_liks = pt.stack([log_lik1, log_lik2, log_lik3], axis=1)

    # Softmax transformation
    probs = pm.math.softmax(log_liks)

    # Clip and renormalize (safe for numerical issues)
    epsilon = 1e-7
    p_safe = pt.clip(probs, epsilon, 1 - epsilon)
    p_safe = p_safe / pt.sum(p_safe, axis=1, keepdims=True)

    # Categorical likelihood
    obs = pm.Categorical('obs', p=p_safe, observed=y)

    # Sample
    trace = pm.sample(1000, tune=1000, target_accept=0.9, return_inferencedata=True)

When I try to sample, I get:

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'sigma1_log__': array([-3.07162307, -3.82064486]), 'sigma2_log__': array([-3.28873934, -3.60609323]), 'sigma3_log__': array([-3.91690929, -2.75276419])}

Logp initial evaluation results:
{'sigma1': -1.79, 'sigma2': -1.52, 'sigma3': -2.59, 'obs': -inf}
You can call `model.debug()` for more details.

The likelihood for obs is -inf, even though I already add an epsilon in case a zero probability is encountered.
I added the log-likelihoods and probs to the model to trace back the error, and found that the problem occurs in the way the log_liks are transformed into probabilities by the softmax function.

For example, here are the first few entries of log_liks:

array([
  [-756.5, -34858.9, -207401.9],
  [-729.6, -34399.9, -207053.4],
  [-703.3, -33947.2, -206722.9],
])

We can see that in each row the first value is much higher (i.e., less negative) than the others.

Thus, the resulting softmax probabilities should look like:

array([
  [~1.0, ~0.0, ~0.0],
  [~1.0, ~0.0, ~0.0],
  [~1.0, ~0.0, ~0.0],
])

However, what I actually observe from probs.eval() is:

array([
  [0.0, 0.0, 0.0],
  [1e-321, 0.0, 0.0],
  [1e-310, 0.0, 0.0],
])

This means that all choices get essentially zero probability, which leads to the -inf likelihood.
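
As a cross-check of my expectation, applying a row-wise softmax outside of PyMC (here sketched with scipy.special.softmax) to those same example entries does give the ~[1, 0, 0] rows I expected:

import numpy as np
from scipy.special import softmax

example = np.array([
    [-756.5, -34858.9, -207401.9],
    [-729.6, -34399.9, -207053.4],
    [-703.3, -33947.2, -206722.9],
])
print(softmax(example, axis=1))  # each row comes out as ~[1., 0., 0.]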

Why is pm.math.softmax(log_liks) producing extremely small values (basically zeros) instead of proper probabilities?
How should I correctly transform these log-likelihoods into stable probabilities for the categorical likelihood?

Any help would be greatly appreciated!

Are you missing an axis in softmax?
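
If I remember correctly, pm.math.softmax defaults to axis=None, which normalizes over the entire array at once rather than row by row; that would explain why every entry comes out near zero. Passing the axis explicitly should give per-row probabilities:

# Normalize across the three choice columns, not over all entries at once:
probs = pm.math.softmax(log_liks, axis=-1)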


Oops, silly me!
Thank you so much!
Now I am surprised that even ChatGPT didn’t point that out in the whole conversation I’ve been having with it over the past few days, since it’s such a common mistake :melting_face:


Can I maybe ask a follow-up question?
Is there any obvious reason why the model would not recover the parameters correctly? There is a pattern: the parameters are systematically underestimated by approximately the same magnitude.

Is this a magnitude to worry about? Shouldn’t they be recovered with the posterior peaking at the very values I used for simulation?