Skip to content

Conversation

@ericmjl
Copy link
Member

@ericmjl ericmjl commented Nov 20, 2025

Fix: sample_posterior_predictive incorrectly resamples dependencies of Deterministic variables

Problem Description

When using sample_posterior_predictive with predictions=True and including a Deterministic variable in var_names, the function incorrectly resamples the Deterministic's dependencies (random variables) instead of using their values from the posterior trace.

Expected Behavior

When a Deterministic variable is included in var_names, it should be recomputed from the posterior samples of its dependencies, not cause those dependencies to be resampled.

Actual Behavior (Before Fix)

The dependencies (random variables) were resampled (likely from their priors), causing:

  1. Incorrect variance (much higher than expected - often 1000x+)
  2. Values that don't match the posterior
  3. Correlation near zero between posterior and predictions

Impact

This bug affects any use case where:

  • Deterministic variables are included in var_names for sample_posterior_predictive
  • Users want to recompute Deterministic variables for new data/coordinates
  • Users expect Deterministic variables to have the same variance as their dependencies

Root Cause

In pymc/sampling/forward.py, the compile_forward_sampling_function marks variables as "volatile" if they:

  1. Are in the outputs list
  2. Are basic RVs not in the trace
  3. Have volatile inputs

When a Deterministic variable was included in var_names:

  1. It became an output
  2. The graph walking logic treated it like a random variable
  3. Its dependencies (intercepts, slopes, etc.) got incorrectly marked as volatile
  4. They got resampled instead of using trace values

The documentation previously acknowledged this problem (line 844 in forward.py):

"Including a Deterministic in var_names may incorrectly force a random variable to be resampled"

Solution

Added a second pass in compile_forward_sampling_function that:

  1. Identifies Deterministic variables in outputs
  2. Finds their trace ancestors (random variables in the trace)
  3. If all trace ancestors are in the trace, unmarks both the Deterministic and its ancestors as volatile
  4. This ensures Deterministic variables are recomputed from trace values rather than causing resampling

Implementation

The fix adds a second pass after the initial volatile node marking:

# Second pass: Unmark Deterministic outputs and their trace dependencies
# if all trace ancestors of the Deterministic are in trace
for output in fg.outputs:
    if is_deterministic(output):
        # Find all ancestors that are in basic_rvs (the actual random variables)
        output_ancestors = ancestors([output], blockers=[])
        trace_ancestors = [
            anc
            for anc in output_ancestors
            if anc in vars_in_trace_set and anc in basic_rvs
        ]
        all_trace_ancestors_in_trace = all(
            anc in vars_in_trace_set for anc in trace_ancestors
        )

        if all_trace_ancestors_in_trace and trace_ancestors:
            # Unmark the Deterministic itself - it will be recomputed from trace values
            volatile_nodes.discard(output)
            # Unmark its trace ancestors - they should use trace values, not be resampled
            for anc in trace_ancestors:
                volatile_nodes.discard(anc)

This ensures that when sample_posterior_predictive is called with var_names=["mu_grid"] (where mu_grid is a Deterministic), the output shows Sampling: [] instead of Sampling: [intercepts, slopes], indicating that no variables are being resampled.

Changes Made

  1. Fixed compile_forward_sampling_function in pymc/sampling/forward.py:

    • Added deterministics parameter to identify Deterministic variables
    • Added second pass to unmark Deterministic outputs and their trace dependencies as volatile
    • Updated call site to pass model.deterministics
  2. Added comprehensive test suite in tests/sampling/test_deterministic_posterior_predictive.py:

    • Test basic Deterministic recomputation (no resampling)
    • Test random variables depending on Deterministic (correct sampling)
    • Test nested Deterministic dependencies
    • Test mixed trace/non-trace dependencies
    • Test logging verification (Sampling: [])
  3. Updated documentation in pymc/sampling/forward.py:

    • Removed the warning about incorrect behavior
    • Added note explaining correct behavior
    • Added examples showing when resampling occurs vs. when it doesn't

Testing

All tests pass, including:

  • Existing tests continue to pass
  • New tests verify correct behavior for Deterministic variables
  • Edge cases are covered (nested Deterministics, mixed dependencies)

Example

import pymc as pm
import numpy as np

with pm.Model() as model:
    intercepts = pm.Normal("intercepts", mu=0, sigma=1, shape=(2,))
    slopes = pm.Normal("slopes", mu=0, sigma=1, shape=(2,))
    
    # Deterministic variable
    mu_grid = pm.Deterministic(
        "mu_grid",
        intercepts[:, None] + slopes[:, None] * np.array([0, 12, 24])[None, :],
    )
    
    idata = pm.sample(draws=100, tune=100)

# Before fix: Sampling: [intercepts, slopes] (incorrect)
# After fix: Sampling: [] (correct - recomputed from trace)
with model:
    idata_pp = pm.sample_posterior_predictive(
        idata, var_names=["mu_grid"], predictions=True
    )

# Variance matches posterior exactly
assert np.isclose(
    idata.posterior.mu_grid.var(),
    idata_pp.predictions.mu_grid.var(),
    rtol=0.1
)

Question for Maintainers

Does this fix align with the intended behavior of sample_posterior_predictive? We believe this is the correct behavior - Deterministic variables should be recomputed from trace values, not cause their dependencies to be resampled.

… cause resampling

- Fix compile_forward_sampling_function to recompute Deterministic variables
  from trace values instead of resampling their dependencies
- Add comprehensive test suite covering edge cases
- Update documentation to reflect correct behavior

Fixes issue where Deterministic variables in var_names would incorrectly
force their dependencies to be resampled, causing incorrect variance.
@williambdean
Copy link
Contributor

Seems related to #7069

@ericmjl
Copy link
Member Author

ericmjl commented Nov 20, 2025

@ricardoV94 / @lucianopaz -- please hold off on review, I've got a few more things I'd like to do.

@ericmjl ericmjl marked this pull request as draft November 20, 2025 15:53
@ricardoV94
Copy link
Member

Note also #6876

A Deterministic may be structurally "constant" (may not even depend on any random variable) but be stale after graph intervention with pm.do / or change of ConstantData/coords. So while Deterministic shouldn't be a source of volatility, the fix shouldn't override volatility analysis on the inputs. I haven't read the code so this may be handled correctly, just wanted to bring up for context.

We probably need to fix both at once, because right now setting Deterministics in var_names is the only way users can work around the bug in #6876

I think these two fixes can be done regardless of #7069 which will need a careful user deprecation cycle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants