Represent statespace metadata with dataclasses #607

Dekermanjian · 2025-11-02T17:50:36Z

This is a draft proposal for #598

The idea is to handle each component separately using _set_{component} methods, with all information stored in dataclasses for easy mapping.

I believe this will simplify our tests of these components and will reduce redundancies where we have the same information spread across multiple sub-components like data_names and data_info.

@jessegrabowski, let me know what you think. I put a little notebook together to showcase the changes.

review-notebook-app · 2025-11-02T17:50:42Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

jessegrabowski

This is a great first pass, much cleaner than what we have now.

pymc_extras/statespace/models/structural/components/regression_dataclass.py

jessegrabowski · 2025-11-06T20:06:12Z

We can also keep all of the existing properties like state_names, shock_names, state_dims, etc, but move them to the base class and just extract the requested info from the relevant Info objects.
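As a rough illustration of that suggestion, the sketch below keeps a legacy-style `state_names` property on a base class and derives it from an Info object. The `State`, `StateInfo`, and `Component` definitions here are hypothetical stand-ins written for this example, not the classes in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    # Hypothetical record describing a single state.
    name: str
    observed: bool = False


@dataclass(frozen=True)
class StateInfo:
    # Hypothetical container holding every State of a component.
    states: tuple[State, ...] = ()

    @property
    def names(self) -> tuple[str, ...]:
        return tuple(state.name for state in self.states)


class Component:
    """Illustrative base class, not the actual pymc_extras Component."""

    def __init__(self, state_info: StateInfo) -> None:
        self.state_info = state_info

    @property
    def state_names(self) -> tuple[str, ...]:
        # The old attribute becomes a read-only view of the Info object,
        # so the names are stored in exactly one place.
        return self.state_info.names

    @property
    def observed_state_names(self) -> tuple[str, ...]:
        return tuple(s.name for s in self.state_info.states if s.observed)


component = Component(
    StateInfo(states=(State("level", observed=True), State("trend")))
)
print(component.state_names)
```

Because the properties live on the base class, subclasses would only need to build their Info objects.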

jessegrabowski · 2025-11-07T14:10:12Z

Reflecting on it, I am convinced this is the way to go. It's 1000x more ergonomic. I made some changes to your initial code to make the API more "dictionary like", and to reduce code duplication. I moved everything to statespace/core/properties.py, because this is ultimately going to replace what we have in both the core models and the components.
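For readers who have not opened properties.py, here is a minimal sketch of what a "dictionary like" Info container could look like. The `Parameter` and `ParameterInfo` names and fields are assumptions made for this example; the real classes in the PR may differ.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Parameter:
    # Hypothetical record; the fields are chosen only for illustration.
    name: str
    shape: tuple[int, ...]


@dataclass(frozen=True)
class ParameterInfo:
    parameters: tuple[Parameter, ...] = ()

    def __getitem__(self, name: str) -> Parameter:
        # Look items up by name, like a dict keyed on Parameter.name.
        for parameter in self.parameters:
            if parameter.name == name:
                return parameter
        raise KeyError(name)

    def __contains__(self, name: object) -> bool:
        return any(parameter.name == name for parameter in self.parameters)

    def __iter__(self) -> Iterator[str]:
        # Iterating yields keys (names), mirroring dict iteration.
        return iter([parameter.name for parameter in self.parameters])

    def __len__(self) -> int:
        return len(self.parameters)


info = ParameterInfo((Parameter("beta", (3,)), Parameter("sigma", ())))
print(info["beta"].shape, "sigma" in info, list(info), len(info))
```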

Dekermanjian · 2025-11-07T14:37:48Z

@jessegrabowski, this is looking really cool! What can I do to help push this forward?

jessegrabowski · 2025-11-07T15:04:06Z

Delete the new regression_dataclass.py and simply refactor regression.py to use the new stuff.

We should keep your notebook, with the plan to add it as a new example for the docs. Alternatively, it can be merged into the custom statespace notebook. Either way, it should also be updated to import from the new properties.py file.

Dekermanjian · 2025-11-07T15:15:27Z

Perfect! I'll work on that today!! It is really looking cool!

pymc_extras/statespace/core/properties.py

pymc_extras/statespace/models/structural/components/regression.py

pymc_extras/statespace/models/structural/components/regression_dataclass.py

pymc_extras/statespace/models/structural/core.py

jessegrabowski · 2025-11-11T02:39:23Z

pymc_extras/statespace/models/utilities.py

-        SHOCK_DIM: ss_mod.shock_names,
-        SHOCK_AUX_DIM: ss_mod.shock_names,
-    }
+    ALL_STATE_COORD = Coord(dimension=ALL_STATE_DIM, labels=ss_mod.state_names)


This function should be eliminated in favor of the method CoordsInfo.defaults_from_model

This comment still stands. Is there anywhere we need this instead of just using CoordsInfo.defaults_from_model?

tests/statespace/core/test_properties.py

tests/statespace/models/structural/components/test_regression.py

tests/statespace/models/structural/conftest.py

Dekermanjian · 2025-11-11T12:20:08Z

@jessegrabowski, I agree with all of your comments above. I am going to start making those changes.

pymc_extras/statespace/core/properties.py

jessegrabowski

Incomplete review, I'll continue tomorrow AM

jessegrabowski · 2025-12-08T01:24:41Z

pymc_extras/statespace/core/properties.py

+            # if key in index:
+            #     raise ValueError(f"Duplicate {self.key_field} '{key}' detected.") # This needs to be possible for shared states


That shouldn't happen here though, it should come up in merge or add right? And we handle it there with the allow_duplicates flag

I think what happens is that, because our dataclasses are immutable, we always return a new object of the same dataclass from merge/add. The __post_init__ therefore runs right after merge/add, and it sees that there are duplicate keys even though the merge/add method allowed them via allow_duplicates.

There shouldn't be duplicate keys in the final result though. My understanding was that if we set allow_duplicates=False, there's essentially a runtime guard that we aren't trying to add a key that already exists (this will error). If True, we don't raise an error and overwrite the existing key, like a Python dictionary.
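The semantics described above can be sketched as follows. This is an assumption-laden illustration: `Shock`, `ShockInfo`, and this particular `add` signature are invented for the example and are not copied from the PR.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Shock:
    # Hypothetical record; `scale` exists only to make overwrites visible.
    name: str
    scale: float = 1.0


@dataclass(frozen=True)
class ShockInfo:
    shocks: tuple[Shock, ...] = ()

    def add(self, new: Shock, allow_duplicates: bool = False) -> "ShockInfo":
        exists = any(shock.name == new.name for shock in self.shocks)
        if exists and not allow_duplicates:
            # Runtime guard: refuse to add a key that is already present.
            raise ValueError(f"Duplicate name '{new.name}' detected.")
        if exists:
            # Overwrite in place, like assigning to an existing dict key,
            # so the result never holds two entries under one name.
            shocks = tuple(new if s.name == new.name else s for s in self.shocks)
        else:
            shocks = self.shocks + (new,)
        # Frozen dataclass: return a new object instead of mutating.
        return replace(self, shocks=shocks)


info = ShockInfo().add(Shock("eps"))
overwritten = info.add(Shock("eps", scale=2.0), allow_duplicates=True)
print(len(overwritten.shocks), overwritten.shocks[0].scale)
```

With this shape, any validation in `__post_init__` only ever sees a de-duplicated tuple, so a duplicate check there would not fire after an allowed overwrite.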

pymc_extras/statespace/core/properties.py

pymc_extras/statespace/models/structural/components/regression.py

pymc_extras/statespace/models/structural/core.py

pymc_extras/statespace/utils/message_tools.py

Dekermanjian · 2025-12-22T23:22:25Z

Hey @jessegrabowski, by switching a lot of the component attributes to properties I was able to simplify a good number of downstream methods. Would you mind taking a look at the current state of this before I go ahead and do the same with the rest of the SSM components?

jessegrabowski · 2025-12-22T23:38:25Z

pymc_extras/statespace/models/structural/components/regression.py


        self.coords_info = CoordInfo(coords=[regression_state_coord, endogenous_state_coord])

    def populate_component_properties(self) -> None:


This method won't be unique to regression right? We will want to move it up to the base class.

@jessegrabowski, in the base class there is a populate_component_properties method that raises NotImplementedError. Did you want to replace that with a generic method that calls the _set_<foo> methods for the 2 defaults (shocks and states) that we provide?

The generic populate_component_properties should call everything. Any setter that raises by default should always be implemented in the subclass (even if we just have def _set_foo(self): pass). Then it's really obvious that this component doesn't have that specific property. Alternatively we can set an empty Info.

For optional properties, we should make sure it's harmless to call e.g. _set_data_info() for components that don't use it.
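A minimal sketch of that arrangement, assuming a small set of setters: only `populate_component_properties` and `_set_data_info` are names taken from this thread, while `_set_states`, `_set_shocks`, and the `DeterministicTrend` subclass are hypothetical.

```python
class Component:
    """Illustrative base class, not the actual pymc_extras Component."""

    def __init__(self) -> None:
        self.populate_component_properties()

    def populate_component_properties(self) -> None:
        # Generic driver: call every setter unconditionally, in one place.
        self._set_states()
        self._set_shocks()
        self._set_data_info()

    def _set_states(self) -> None:
        # Required setter: raising forces each subclass to decide explicitly.
        raise NotImplementedError("Subclasses must implement _set_states")

    def _set_shocks(self) -> None:
        raise NotImplementedError("Subclasses must implement _set_shocks")

    def _set_data_info(self) -> None:
        # Optional setter: the default is a harmless empty value.
        self.data_info = ()


class DeterministicTrend(Component):
    def _set_states(self) -> None:
        self.state_names = ("trend",)

    def _set_shocks(self) -> None:
        # Explicitly declare that this component has no shocks.
        self.shock_names = ()


trend = DeterministicTrend()
print(trend.state_names, trend.shock_names, trend.data_info)
```

Here the empty tuple plays the role of the "empty Info" mentioned above; a bare `pass` would also work if nothing downstream reads the attribute.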

pymc_extras/statespace/models/structural/core.py

jessegrabowski · 2025-12-22T23:46:37Z

Yeah it looks really great! Go ahead and do the others. Excited to get this over the finish line

Dekermanjian · 2025-12-28T15:36:39Z

> Yeah it looks really great! Go ahead and do the others. Excited to get this over the finish line

Yeah, this is going to be pretty cool! I am going to commit + push the components one by one as I complete them. I will also let you know here once I get all of them done, in case you prefer to review everything at once.

Dekermanjian · 2025-12-29T20:47:10Z

Alright! @jessegrabowski, I refactored all the structural components to use the new dataclass architecture. All tests under ./tests/statespace/models/structural/components/ pass. We will need to refactor the models (ETS, S/VARIMA/X, etc) to use the new architecture.

jessegrabowski · 2025-12-29T20:49:06Z

Amazing! What is your impression of it after refactoring everything? Does it seem a bit more readable? Any sharp edges that are still confusing?

Also it looks like you need to rebase and run pre-commit

Dekermanjian · 2025-12-29T20:51:26Z

I think it is more readable. There is some repeated code that I am trying to figure out how to reduce. These are the k_<foo> variables that you will see at the start of many of the _set_<foo> methods. I am sure we can iron them out once you take a look in your review.

Dekermanjian · 2025-12-29T20:54:06Z

> Also it looks like you need to rebase and run pre-commit

Yes, I definitely need to rebase, but I am pretty sure the pre-commit did run. hmm that is odd.

…uplicate with warning 2. removed unnecessary imports from __init__ after deleting regression_dataclass 3. updated components and structural classes to only utilize dataclasses and pull other objects from <foo>_info dataclasses 4. updated tests to conform to dataclass api

Dekermanjian · 2025-12-29T21:12:42Z

@jessegrabowski, I rebased and ran pre-commit again but the pre-commit says everything passed.

jessegrabowski · 2025-12-29T21:15:30Z

> @jessegrabowski, I rebased and ran pre-commit again but the pre-commit says everything passed.

Looks like the CI is happy now :)

jessegrabowski · 2025-12-29T22:26:47Z

pymc_extras/statespace/core/properties.py

@@ -0,0 +1,257 @@
+from __future__ import annotations


The robot always adds this -- why do we need it?

pymc_extras/statespace/core/properties.py

jessegrabowski · 2025-12-29T22:30:05Z

pymc_extras/statespace/core/properties.py

pymc_extras/statespace/models/structural/core.py

jessegrabowski · 2025-12-30T00:09:18Z

tests/statespace/models/structural/components/test_cycle.py

jessegrabowski · 2025-12-30T00:10:32Z

tests/statespace/models/structural/components/test_level_trend.py

    )


+@pytest.mark.filterwarnings("ignore:Duplicate names found:UserWarning")


Duplicates should never warn: adding one should either raise (if disallowed) or be silently accepted.

I changed the argument name allow_duplicates to overwrite_duplicates because that is really what is happening. I was able to impose the logic of raising if there is a duplicate key. I will remove the warning that is emitted when overwriting duplicate names, since overwriting is required when we share states.

tests/statespace/models/structural/components/test_measurement_error.py

2. added raise value error on duplicate keys in Info class 3. updated arg name from allow_duplicates to overwrite_duplicates 4. updated component child classes so that graph construction method is at the bottom of the file 5. updated setter methods in component child classes so that they can be called in any order 6. removed warning when overwriting duplicate names in info class 7. reduced complexity by using parameter containers in if blocks 8. Switched merge to add methods for TensorVariable and TensorData construction 9. renamed dataclass TensorVariable and TensorVariableInfo due to conflict with pt.TensorVariable

Dekermanjian · 2025-12-30T23:42:42Z

@jessegrabowski, I had to rename the data classes TensorVariable and TensorVariableInfo to PyTensorVariable and PyTensorVariableInfo because there was a conflict with Pytensor's pt.TensorVariable class.

I preferred the older names but didn't want to risk any issues.

jessegrabowski · 2025-12-31T00:28:05Z

> @jessegrabowski, I had to rename the data classes TensorVariable and TensorVariableInfo to PyTensorVariable and PyTensorVariableInfo because there was a conflict with Pytensor's pt.TensorVariable class.

What about SymbolicTensor and SymbolicTensorInfo ?

Dekermanjian · 2025-12-31T00:33:05Z

> > @jessegrabowski, I had to rename the data classes TensorVariable and TensorVariableInfo to PyTensorVariable and PyTensorVariableInfo because there was a conflict with Pytensor's pt.TensorVariable class.
>
> What about SymbolicTensor and SymbolicTensorInfo ?

That is much better! I will switch it to that.

Dekermanjian marked this pull request as draft November 2, 2025 17:50

Dekermanjian added the enhancements, request discussion, and statespace labels Nov 2, 2025

jessegrabowski reviewed Nov 6, 2025

jessegrabowski changed the title from "proposal for updating propogate_component_properties using data classes" to "Represent statespace metadata with dataclasses" Nov 7, 2025

jessegrabowski requested changes Nov 11, 2025

jessegrabowski reviewed Nov 15, 2025

pymc_extras/statespace/core/properties.py (outdated)

jessegrabowski requested changes Dec 8, 2025

jessegrabowski requested changes Dec 13, 2025

jessegrabowski reviewed Dec 22, 2025

Dekermanjian and others added 5 commits December 29, 2025 13:56

proposal for updating propogate_component_properties using data classes

9ff1158

Iterate on proposal

2210d7a

Fix iterator, add to_dict method to CoordsInfo

071799f

Add observed_states helper to StateInfo

257ac75

made necessary changes to get the regression component test to pass u…

09d97d4

…sing the new dataclasses API

Dekermanjian added 9 commits December 29, 2025 13:56

1. added add and merge methods to base class

2762288

2. created tests for add and merge methods 3. added utility to convert from snake to pascal and integrated it in error messaging

removed data & coords setters in _set<foo> medthod in Component class…

52be00a

… and placed default shoch and state setters

1. updated properties base class to handle duplicate names when allow…

ef9ee2f

…_duplicates is False 2. converted component attributes into properties 3. removed _combine_property method 4. removed redundant observed_states property 5. fixed indentation bug

added docstring to setter methods in core and refactored level trend …

6fb4d47

…component to use dataclass structure

1.restructured seasonal components to work with dataclass architecture

99753ad

2. Added TensorVariable and TensorData properties for use with make_and_register_variable/data 3. Updated regression component to use TensorData property

restructured autoregressive component to follow dataclass architecture

798ee7c

restructured measuerment error component to align with dataclass arch…

5efd80b

…itecture

restructured cycle component to use dataclass architecture

f08f7d6

Dekermanjian force-pushed the ssm_populate_component_properties branch from c80a065 to f08f7d6 Compare December 29, 2025 21:10

jessegrabowski requested changes Dec 30, 2025

1. changed PyTensorVariable to SymbolicVariable and TensorData to Sym…

350fa8a

…bolicData 2. removed a remnant of _name_to_vars being used in make_and_register_data
