Add generic observation processes which combine the convolution with the noise model. #644

cdc-mitzimorris · 2025-12-23T16:23:17Z

This PR adds work that was done in https://github.com/cdcent/cfa-pyrenew-hierarchical/pull/4 to PyRenew.

It adds the base observation process class, concrete implementations for Count processes and the abstract base class for Measurement processes, together with unit tests and two new tutorials for count and measurement observation processes respectively.

Once this PR and the work done in https://github.com/cdcent/cfa-pyrenew-hierarchical/pull/5 have been added to PyRenew, subsequent PRs will deprecate unused features and harmonize the documentation and tutorials.

codecov · 2025-12-23T16:27:21Z

Codecov Report

❌ Patch coverage is 98.38710% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.26%. Comparing base (02446c5) to head (7e48001).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
pyrenew/observation/count_observations.py	96.55%	2 Missing ⚠️
pyrenew/observation/noise.py	98.38%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #644      +/-   ##
==========================================
+ Coverage   96.98%   97.26%   +0.28%     
==========================================
  Files          42       47       +5     
  Lines        1094     1280     +186     
==========================================
+ Hits         1061     1245     +184     
- Misses         33       35       +2

Flag	Coverage Δ
unittests	`97.26% <98.38%> (+0.28%)`	⬆️

github-actions · 2025-12-23T16:27:38Z

Thank you for your contribution @cdc-mitzimorris 🚀! Your github-pages is ready for download 👉 here 👈!
(The artifact expires on 2026-02-03T16:32:57Z. You can re-generate it by re-running the workflow here.)

dylanhmorris

Thanks @cdc-mitzimorris! A few things to address and then I can re-review.

cdc-mitzimorris · 2026-01-15T18:40:05Z

@dylanhmorris - ready for re-review - made all suggested changes to the tutorials and added arg "name" to observation processes so that user specifies the signal name.

dylanhmorris

Thanks @cdc-mitzimorris! Just a couple remaining questions.

cdc-mitzimorris · 2026-01-26T15:52:28Z

changes made.

cdc-mitzimorris · 2026-01-26T16:51:57Z

@dylanhmorris - conversation resolved.

dylanhmorris

Still need the separate noise module.

dylanhmorris · 2026-01-26T18:39:39Z

pyrenew/observation/noise.py

@cdc-mitzimorris this still needs to be implemented.

damonbayer · 2026-01-26T18:44:24Z

pyrenew/observation/count_observations.py

+    Output preserves input timeline. First len(delay_pmf)-1 days return
+    -1 or ~0 (depending on noise model) due to NaN padding.


What is meant by ~0? And why does the padding change depending on the noise model?

I think we should discuss this f2f in our upcoming meeting.

damonbayer · 2026-01-26T19:41:10Z

pyrenew/observation/count_observations.py

+        times : ArrayLike | None
+            Day indices for sparse observations. None for dense observations.


index relative to what vector?

cdc-mitzimorris · 2026-01-27T16:35:12Z

@dylanhmorris ready for re-re-re-review!

dylanhmorris · 2026-01-28T00:14:39Z

pyrenew/observation/base.py

+        site_name = f"{self.name}_{suffix}"
+        numpyro.deterministic(site_name, value)


Use the helper function

Suggested change

-        site_name = f"{self.name}_{suffix}"
-        numpyro.deterministic(site_name, value)
+        numpyro.deterministic(self._sample_site_name(suffix), value)

dylanhmorris · 2026-01-28T00:26:36Z

pyrenew/observation/base.py

+    def _validate_pmf(
+        self,
+        pmf: ArrayLike,
+        param_name: str,
+        atol: float = 1e-6,
+    ) -> None:
+        """
+        Validate that an array is a valid probability mass function.
+
+        Checks:
+
+        - Non-empty array
+        - Sums to 1.0 (within tolerance)
+        - All non-negative values
+
+        Parameters
+        ----------
+        pmf : ArrayLike
+            The PMF array to validate
+        param_name : str
+            Name of the parameter (for error messages)
+        atol : float, default 1e-6
+            Absolute tolerance for sum-to-one check
+
+        Raises
+        ------
+        ValueError
+            If PMF is empty, doesn't sum to 1.0 (within tolerance),
+            or contains negative values.
+        """
+        if pmf.size == 0:
+            raise ValueError(f"{param_name} must return non-empty array")
+
+        pmf_sum = jnp.sum(pmf)
+        if not jnp.isclose(pmf_sum, 1.0, atol=atol):
+            raise ValueError(
+                f"{param_name} must sum to 1.0 (±{atol}), got {float(pmf_sum):.6f}"
+            )
+
+        if jnp.any(pmf < 0):
+            raise ValueError(f"{param_name} must have non-negative values")


We should land this PR first and address this afterwards, but this duplicates distutil.validate_discrete_dist_vector.

That said, I prefer the implementation here, so this should replace that.

We could also (in #645) consider having discrete PMFs as a strong abstraction. We could then require certain RVs to be of class PMF and defer the validation to those RVs' constructors. Given how fundamental discrete PMFs are to discrete-time renewal processes, I favor this.
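As a rough illustration of the PMF abstraction idea, here is a minimal pure-Python sketch. `DiscretePMF` is a hypothetical name, not an existing PyRenew class, and the checks simply mirror the three in `_validate_pmf` above (non-empty, non-negative, sums to 1 within `atol`). The point is that validation runs once at construction, so downstream code can rely on the type.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscretePMF:
    """Hypothetical wrapper (not a PyRenew class): validates once, at construction."""

    probs: tuple[float, ...]
    atol: float = 1e-6

    def __post_init__(self) -> None:
        if len(self.probs) == 0:
            raise ValueError("PMF must be non-empty")
        if any(p < 0 for p in self.probs):
            raise ValueError("PMF must have non-negative values")
        total = math.fsum(self.probs)
        if not math.isclose(total, 1.0, abs_tol=self.atol):
            raise ValueError(f"PMF must sum to 1.0 (±{self.atol}), got {total:.6f}")

    def __len__(self) -> int:
        # The length is available without any sampling call.
        return len(self.probs)


delay = DiscretePMF((0.2, 0.5, 0.3))
```

With something like this, a delay-length accessor could read `len(delay)` directly, and invalid inputs such as `DiscretePMF((0.5, 0.4))` would fail at construction rather than at sampling time.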

dylanhmorris · 2026-01-28T00:34:45Z

pyrenew/observation/base.py

+    def get_minimum_observation_day(self) -> int:
+        """
+        Get the first day with valid (non-NaN) convolution results.
+
+        Due to the convolution operation requiring a history window,
+        the first ``len(pmf) - 1`` days will have NaN values in the
+        output. This method returns the index of the first valid day.
+
+        Returns
+        -------
+        int
+            Day index (0-based) of first valid observation.
+            Equal to ``len(pmf) - 1``.
+        """
+        pmf = self.temporal_pmf_rv()
+        return int(len(pmf) - 1)


Not used anywhere in the class. It would be better to implement this in terms of a PMF abstraction (which could then also apply to computing the offset in compute_delay_ascertained_incidence). I think we should refrain from implementing it for now.

Suggested change

-    def get_minimum_observation_day(self) -> int:
-        """
-        Get the first day with valid (non-NaN) convolution results.
-
-        Due to the convolution operation requiring a history window,
-        the first ``len(pmf) - 1`` days will have NaN values in the
-        output. This method returns the index of the first valid day.
-
-        Returns
-        -------
-        int
-            Day index (0-based) of first valid observation.
-            Equal to ``len(pmf) - 1``.
-        """
-        pmf = self.temporal_pmf_rv()
-        return int(len(pmf) - 1)
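For reference, the `len(pmf) - 1` offset described in the docstring can be illustrated with a minimal pure-Python sketch. `delayed_counts` is a hypothetical helper, not PyRenew's implementation, and the alignment convention (output on the input timeline, NaN wherever the history window is incomplete) is assumed from the docstring text.

```python
import math


def delayed_counts(infections: list[float], delay_pmf: list[float]) -> list[float]:
    """Convolve infections with a delay PMF, leaving NaN where history is incomplete."""
    offset = len(delay_pmf) - 1
    out = [math.nan] * len(infections)
    for t in range(offset, len(infections)):
        # delay_pmf[d] is the probability of being observed d days after infection.
        out[t] = sum(delay_pmf[d] * infections[t - d] for d in range(len(delay_pmf)))
    return out


pmf = [0.5, 0.3, 0.2]
result = delayed_counts([10.0, 20.0, 30.0, 40.0], pmf)
first_valid_day = len(pmf) - 1  # days 0 and 1 are NaN; day 2 is the first valid one
```

With a three-element PMF, days 0 and 1 lack a full history window, so only days 2 and 3 carry values.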

dylanhmorris · 2026-01-28T00:47:40Z

pyrenew/observation/count_observations.py

+        ascertainment_rate = self.ascertainment_rate_rv()
+        if jnp.any(ascertainment_rate < 0) or jnp.any(ascertainment_rate > 1):
+            raise ValueError(
+                "ascertainment_rate_rv must be in [0, 1], "
+                "got value(s) outside this range"
+            )


A single sample doesn't suffice to validate this for stochastic ascertainment rates. I think we should remove this until we have actual support for handling RVs. There is also some argument that the scaling factor for counts does not have to be <1, though in practice it usually is, since it usually models the quantity P(event reported | infection) = P(event occurs | infection) * P(event reported | event occurs).

Suggested change

-        ascertainment_rate = self.ascertainment_rate_rv()
-        if jnp.any(ascertainment_rate < 0) or jnp.any(ascertainment_rate > 1):
-            raise ValueError(
-                "ascertainment_rate_rv must be in [0, 1], "
-                "got value(s) outside this range"
-            )
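To make the single-sample concern concrete, here is a small standard-library sketch. The log-normal rate with median 0.5 is purely an assumption for illustration, not a prior used in PyRenew. It shows that a range check on one draw can pass or fail depending on which draw it happens to see.

```python
import math
import random


def in_unit_interval(x: float) -> bool:
    """The same range check as above, applied to a single value."""
    return 0.0 <= x <= 1.0


rng = random.Random(0)
# Stand-in for a stochastic ascertainment rate (assumed distribution, for illustration only).
draws = [rng.lognormvariate(math.log(0.5), 1.0) for _ in range(1000)]

n_pass = sum(in_unit_interval(d) for d in draws)
n_fail = len(draws) - n_pass
```

Both `n_pass` and `n_fail` are non-zero here, so a check run on one draw says little about the distribution's support.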

dylanhmorris · 2026-01-28T00:50:02Z

pyrenew/observation/count_observations.py

+        delay_pmf = self.temporal_pmf_rv()
+        self._validate_pmf(delay_pmf, "delay_distribution_rv")


Similarly, remove for now and implement when RVs have attributes that allow strict validation.

Suggested change

-        delay_pmf = self.temporal_pmf_rv()
-        self._validate_pmf(delay_pmf, "delay_distribution_rv")

dylanhmorris · 2026-01-28T00:52:25Z

pyrenew/observation/count_observations.py

+        int
+            Length of delay distribution PMF.
+        """
+        return len(self.temporal_pmf_rv())


But high priority to replace with something that doesn't require a sampling call.

Suggested change

-        return len(self.temporal_pmf_rv())
+        return jnp.shape(self.temporal_pmf_rv())[0]

dylanhmorris · 2026-01-28T01:03:10Z

pyrenew/observation/count_observations.py

+        is_1d = infections.ndim == 1
+        if is_1d:
+            infections = infections[:, jnp.newaxis]
+
+        def convolve_col(col):  # numpydoc ignore=GL08
+            return self._convolve_with_alignment(col, delay_pmf, ascertainment_rate)[0]
+
+        predicted_counts = jax.vmap(convolve_col, in_axes=1, out_axes=1)(infections)
+
+        return predicted_counts[:, 0] if is_1d else predicted_counts


If there's going to be a switch for 1D input, I think it's cleaner and clearer to vmap only in the ≥2D case, rather than padding the input only to unpad the output.

Suggested change

-        is_1d = infections.ndim == 1
-        if is_1d:
-            infections = infections[:, jnp.newaxis]
-
-        def convolve_col(col):  # numpydoc ignore=GL08
-            return self._convolve_with_alignment(col, delay_pmf, ascertainment_rate)[0]
-
-        predicted_counts = jax.vmap(convolve_col, in_axes=1, out_axes=1)(infections)
-
-        return predicted_counts[:, 0] if is_1d else predicted_counts
+        is_1d = infections.ndim == 1
+        if is_1d:
+            predicted_counts = self._convolve_with_alignment(infections, delay_pmf, ascertainment_rate)[0]
+        else:
+            predicted_counts = jax.vmap(lambda col: self._convolve_with_alignment(col, delay_pmf, ascertainment_rate)[0], in_axes=1, out_axes=1)(infections)
+        return predicted_counts
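A standalone sketch of the same branching, in case it helps to check shapes outside the class. `convolve_one` is a hypothetical stand-in for `_convolve_with_alignment` (it uses a plain "valid" convolution and skips the NaN alignment), so only the 1D-versus-2D dispatch is meant to carry over.

```python
import jax
import jax.numpy as jnp


def convolve_one(col, delay_pmf, rate):
    """Hypothetical stand-in: one 1D time series in, one 1D series out."""
    return rate * jnp.convolve(col, delay_pmf, mode="valid")


def predicted_counts(infections, delay_pmf, rate):
    if infections.ndim == 1:
        # 1D input: call directly, with no reshape on the way in or out.
        return convolve_one(infections, delay_pmf, rate)
    # 2D input (time x subpopulation): map the 1D function over columns.
    return jax.vmap(
        lambda col: convolve_one(col, delay_pmf, rate), in_axes=1, out_axes=1
    )(infections)


delay_pmf = jnp.array([0.5, 0.3, 0.2])
one_d = jnp.array([10.0, 20.0, 30.0, 40.0])
two_d = jnp.stack([one_d, 2.0 * one_d], axis=1)  # shape (4, 2)

out_1d = predicted_counts(one_d, delay_pmf, 0.5)
out_2d = predicted_counts(two_d, delay_pmf, 0.5)
```

The first column of `out_2d` should match `out_1d`, which is the property the pad-and-unpad version relied on implicitly.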

cdc-mitzimorris added 11 commits September 15, 2025 18:24

Merge branch 'main' of https://github.com/CDCgov/PyRenew

680bb1e

update

2cb876b

Merge branch 'main' of github-bf06:CDCgov/PyRenew

60db8df

Merge branch 'main' of github-bf06:CDCgov/PyRenew

32a5314

Merge branch 'main' of github-bf06:CDCgov/PyRenew

d6213f2

Merge branch 'main' of github-bf06:CDCgov/PyRenew

96f27c9

Merge branch 'main' of github-bf06:CDCgov/PyRenew

1cb6fa2

Merge branch 'main' of github-bf06:CDCgov/PyRenew

f62e1e4

Merge branch 'main' of github-bf06:CDCgov/PyRenew

0c6785d

added generic observation processes and unit tests

35bba26

adding tutorials

89d7aab

cdc-mitzimorris requested review from SamuelBrand1, damonbayer and dylanhmorris as code owners December 23, 2025 16:23

cdc-mitzimorris and others added 8 commits December 23, 2025 12:07

improve test coverage

a2e4630

improve unit test coverage

24096bc

consistent names: 'subpop' (not site), 'sensor' for site/lab measurements, 'aggregate' instead of 'jurisdiction'

671d9d0

consistent names: 'subpop' (not site), 'sensor' for site/lab measurements, 'aggregate' instead of 'jurisdiction'

57d2fba

[pre-commit.ci] auto fixes from pre-commit.com hooks

7efb524

for more information, see https://pre-commit.ci

add observation types

8a7947f

Merge branch 'mem_generic_observations' of github-bf06:CDCgov/PyRenew into mem_generic_observations

571ada3

[pre-commit.ci] auto fixes from pre-commit.com hooks

6dff8cd

for more information, see https://pre-commit.ci

dylanhmorris reviewed Dec 31, 2025
