BUG: can not make predictions for a new set of x with the same model #7897

@sylviankahane

Describe the issue:

I am trying to reproduce Example 1 from the paper by Hogg et al. with PyMC. The original data set (x, y, and sigma_y) contains 16 entries, and the PyMC model fits these data without problems. Now I am trying to obtain predictions for a new set of inputs, x_new.
The suggested way is:

with model:
    pm.set_data({"x": x_new})  # Update the shared data container
    y_new = pm.sample_posterior_predictive(trace)

where trace holds the sampling results obtained in the previous step. Averaging y_new over the chains and the draws then gives point predictions.

The problem is that x_new has to be the same length as the original x and y; otherwise the above procedure fails. For len(x_new) = 18:

ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (16,) and arg 1 with shape (18,).

This is disappointing if one wants to mimic the usual machine-learning workflow, in which a data set is split into a large training set and a smaller test set.
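The error message itself can be reproduced with NumPy alone, which may help to see where the 16 and the 18 come from. The sketch below is only an illustration under assumptions: `mu_new` and `sigma_new` are hypothetical stand-ins for the mean and standard deviation evaluated at the 18 new x values, and the fixed `size=(16,)` plays the role of the size taken from the original 16 observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for mu and sigma evaluated at 18 new x values.
mu_new = np.zeros(18)
sigma_new = np.ones(18)

# Requesting 16 draws while the parameters have length 18 cannot be broadcast.
try:
    rng.normal(mu_new, sigma_new, size=(16,))
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# Letting the size follow the parameters works.
draws = rng.normal(mu_new, sigma_new, size=mu_new.shape)
print(draws.shape)
```

If this is indeed the mechanism, the number of draws for y has to follow the length of x instead of staying at the length of the original data.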

Reproducible code example:

import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
import pymc as pm
import arviz as az


# x1, y1, model and trace come from the original 16-point fit (not shown here)
a = np.linspace(0, x1.min(), 8)               # 8 points below the observed range
b = np.linspace(x1.max(), 1.5 * x1.max(), 10)  # 10 points above the observed range
predictors_out_of_sample = np.zeros(18)
predictors_out_of_sample[0:8] = a
predictors_out_of_sample[8:18] = b
print(len(predictors_out_of_sample), "points, more than in x1 or y1")

# fails when len(predictors_out_of_sample) != len(y1)
#x_new = xr.DataArray(predictors_out_of_sample)
with model:
    pm.set_data({"x": predictors_out_of_sample})  # Update the shared data container
    y_test = pm.sample_posterior_predictive(trace)

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File d:\miniconda3\Lib\site-packages\pytensor\compile\function\types.py:1039, in Function.__call__(self, output_subset, *args, **kwargs)
   1038 try:
-> 1039     outputs = vm() if output_subset is None else vm(output_subset=output_subset)
   1040 except Exception:

File d:\miniconda3\Lib\site-packages\pytensor\graph\op.py:544, in Op.make_py_thunk.<locals>.rval(p, i, o, n, cm)
    536 @is_thunk_type
    537 def rval(
    538     p=p,
   (...)    542     cm=node_compute_map,
    543 ):
--> 544     r = p(n, [x[0] for x in i], o)
    545     for entry in cm:

File d:\miniconda3\Lib\site-packages\pytensor\tensor\random\op.py:428, in RandomVariable.perform(self, node, inputs, outputs)
    426 outputs[0][0] = rng
    427 outputs[1][0] = np.asarray(
--> 428     self.rng_fn(rng, *args, None if size is None else tuple(size)),
    429     dtype=self.dtype,
    430 )

File d:\miniconda3\Lib\site-packages\pytensor\tensor\random\op.py:194, in RandomVariable.rng_fn(self, rng, *args, **kwargs)
    193 """Sample a numeric random variate."""
--> 194 return getattr(rng, self.name)(*args, **kwargs)

File numpy/random/_generator.pyx:1290, in numpy.random._generator.Generator.normal()

File numpy/random/_common.pyx:619, in numpy.random._common.cont()

File numpy/random/_common.pyx:536, in numpy.random._common.cont_broadcast_2()

File d:\miniconda3\Lib\site-packages\numpy\__init__.cython-30.pxd:783, in numpy.PyArray_MultiIterNew3()

ValueError: shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (16,) and arg 1 with shape (18,).

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[703], line 5
      3 with model:
      4     pm.set_data({"x": predictors_out_of_sample})  # Update the shared data container
----> 5     y_test = pm.sample_posterior_predictive(trace)

File d:\miniconda3\Lib\site-packages\pymc\sampling\forward.py:951, in sample_posterior_predictive(trace, model, var_names, sample_dims, random_seed, progressbar, progressbar_theme, return_inferencedata, extend_inferencedata, predictions, idata_kwargs, compile_kwargs)
    946 # there's only a single chain, but the index might hit it multiple times if
    947 # the number of indices is greater than the length of the trace.
    948 else:
    949     param = _trace[idx % len_trace]
--> 951 values = sampler_fn(**param)
    953 for k, v in zip(vars_, values):
    954     ppc_trace_t.insert(k.name, v, idx)

File d:\miniconda3\Lib\site-packages\pymc\util.py:390, in point_wrapper.<locals>.wrapped(**kwargs)
    388 def wrapped(**kwargs):
    389     input_point = {k: v for k, v in kwargs.items() if k in ins}
--> 390     return core_function(**input_point)

File d:\miniconda3\Lib\site-packages\pytensor\compile\function\types.py:1049, in Function.__call__(self, output_subset, *args, **kwargs)
   1047     if hasattr(self.vm, "thunks"):
   1048         thunk = self.vm.thunks[self.vm.position_of_error]
-> 1049     raise_with_op(
   1050         self.maker.fgraph,
   1051         node=self.vm.nodes[self.vm.position_of_error],
   1052         thunk=thunk,
   1053         storage_map=getattr(self.vm, "storage_map", None),
   1054     )
   1055 else:
   1056     # old-style linkers raise their own exceptions
   1057     raise

File d:\miniconda3\Lib\site-packages\pytensor\link\utils.py:526, in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
    521     warnings.warn(
    522         f"{exc_type} error does not allow us to add an extra error message"
    523     )
    524     # Some exception need extra parameter in inputs. So forget the
    525     # extra long error message in that case.
--> 526 raise exc_value.with_traceback(exc_trace)

File d:\miniconda3\Lib\site-packages\pytensor\compile\function\types.py:1039, in Function.__call__(self, output_subset, *args, **kwargs)
   1037     t0_fn = time.perf_counter()
   1038 try:
-> 1039     outputs = vm() if output_subset is None else vm(output_subset=output_subset)
   1040 except Exception:
   1041     self._restore_defaults()

File d:\miniconda3\Lib\site-packages\pytensor\graph\op.py:544, in Op.make_py_thunk.<locals>.rval(p, i, o, n, cm)
    536 @is_thunk_type
    537 def rval(
    538     p=p,
   (...)    542     cm=node_compute_map,
    543 ):
--> 544     r = p(n, [x[0] for x in i], o)
    545     for entry in cm:
    546         entry[0] = True

File d:\miniconda3\Lib\site-packages\pytensor\tensor\random\op.py:428, in RandomVariable.perform(self, node, inputs, outputs)
    424     rng = deepcopy(rng)
    426 outputs[0][0] = rng
    427 outputs[1][0] = np.asarray(
--> 428     self.rng_fn(rng, *args, None if size is None else tuple(size)),
    429     dtype=self.dtype,
    430 )

File d:\miniconda3\Lib\site-packages\pytensor\tensor\random\op.py:194, in RandomVariable.rng_fn(self, rng, *args, **kwargs)
    192 def rng_fn(self, rng, *args, **kwargs) -> int | float | np.ndarray:
    193     """Sample a numeric random variate."""
--> 194     return getattr(rng, self.name)(*args, **kwargs)

File numpy/random/_generator.pyx:1290, in numpy.random._generator.Generator.normal()

File numpy/random/_common.pyx:619, in numpy.random._common.cont()

File numpy/random/_common.pyx:536, in numpy.random._common.cont_broadcast_2()

File d:\miniconda3\Lib\site-packages\numpy\__init__.cython-30.pxd:783, in numpy.PyArray_MultiIterNew3()

ValueError: shape mismatch: objects cannot be broadcast to a single shape.  Mismatch is between arg 0 with shape (16,) and arg 1 with shape (18,).
Apply node that caused the error: normal_rv{"(),()->()"}(RNG(<Generator(PCG64) at 0x2038BFA0BA0>), MakeVector{dtype='int64'}.0, Composite{((i0 * i1) + i2)}.0, Composite{sqrt((246.05275315838355 + sqr((0.09374713619993619 * i0))))}.0)
Toposort index: 6
Inputs types: [RandomGeneratorType, TensorType(int64, shape=(1,)), TensorType(float64, shape=(None,)), TensorType(float64, shape=(None,))]
Inputs shapes: ['No shapes', (1,), (18,), (18,)]
Inputs strides: ['No strides', (8,), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x2038BFA0BA0, array([16]), 'not shown', 'not shown']
Outputs clients: [[output[7](normal_rv{"(),()->()"}.0)], [output[6](y_obs), DeepCopyOp(y_obs), DeepCopyOp(y_obs), DeepCopyOp(y_obs), DeepCopyOp(y_obs), DeepCopyOp(y_obs), DeepCopyOp(y_obs)]]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "d:\miniconda3\Lib\site-packages\IPython\core\async_helpers.py", line 128, in _pseudo_sync_runner
    coro.send(None)
  File "d:\miniconda3\Lib\site-packages\IPython\core\interactiveshell.py", line 3362, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "d:\miniconda3\Lib\site-packages\IPython\core\interactiveshell.py", line 3607, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "d:\miniconda3\Lib\site-packages\IPython\core\interactiveshell.py", line 3667, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\sylvi\AppData\Local\Temp\ipykernel_18312\1755518607.py", line 22, in <module>
    y_obs = pm.Normal("y_obs", mu=μ, sigma=σ_μ, observed=y)
  File "d:\miniconda3\Lib\site-packages\pymc\distributions\distribution.py", line 529, in __new__
    rv_out = cls.dist(*args, **kwargs)
  File "d:\miniconda3\Lib\site-packages\pymc\distributions\continuous.py", line 491, in dist
    return super().dist([mu, sigma], **kwargs)
  File "d:\miniconda3\Lib\site-packages\pymc\distributions\distribution.py", line 598, in dist
    return cls.rv_op(*dist_params, size=create_size, **kwargs)

HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

PyMC version information:

'5.25.1'

win11

Context for the issue:

No response
