Pipeline serialization to config improvements #288
* Also Beamform and Map serialization fixes
Walkthrough

Pipeline config now expects a wrapper {"pipeline": {"operations": [...]}}, serialization gained a verbose mode that preserves user JIT kwargs, Pipeline.from_path was added (pipeline_from_yaml deprecated), Merge/Stack/BranchedPipeline were removed from ops exports, CommonMidpointPhaseError was added, and tests/docs were updated to match HF/config path plumbing.
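A minimal sketch of the wrapped config format described in the walkthrough; the operation names are hypothetical and only illustrate the `pipeline`/`operations` nesting:

```yaml
# Hypothetical example of the new wrapped pipeline config format.
pipeline:
  operations:
    - name: demodulate
    - name: beamform
```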
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zea/ops/base.py`:
- Around line 30-31: The current guard uses hasattr(value, "ndim") but does not
ensure value.tolist() exists, risking an AttributeError; change the check around
the tolist call so you only call value.tolist() when it exists and is callable
(e.g., if hasattr(value, "tolist") and callable(getattr(value, "tolist", None)))
or wrap the call in a try/except AttributeError fallback; update the code path
that returns value.tolist() (the line containing value.tolist()) accordingly so
custom objects without tolist won't raise.
- Around line 306-310: When building params in the serialization path (the block
that calls _to_native and assigns params[name] = value), detect callable values
that are not safe for config export (e.g., functools.partial or other callable
subclasses like a bound method) and fail fast: before adding to params, if
callable(value) and value is not a plain importable reference (or does not
round-trip via YAML/JSON), raise a clear TypeError indicating the parameter
(name) is not serializable and must be replaced with a serializable reference;
implement the check right after calling _to_native and before assigning into
params (use the existing names value and name) so required ctor-introspected
params such as Lambda.func are rejected instead of being dumped verbatim.
- Around line 313-336: The serializer omits the Operation attribute
additional_output_keys, so reconstructed Operation instances can lose declared
extra outputs; update the serialization block in Operation (in methods around
where params is built — references: self.additional_output_keys, self.key,
self.output_key, output_keys) to emit params["additional_output_keys"] =
self.additional_output_keys in verbose mode and in compact mode add
params["additional_output_keys"] only when self.additional_output_keys is
non-empty (similar to the existing conditionals for cache/ jit fields) so
round-trip deserialization preserves extra outputs.
* Focus on Config.from_path
* Fixes rabbit
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/test_ops_infra.py (1)
106-172: Please retain coverage for the legacy flat `operations` config.

Current fixtures now only exercise the wrapped format. Since `pipeline_from_config` still supports top-level `{"operations": [...]}`, add one regression test for that branch to prevent accidental breakage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_ops_infra.py` around lines 106 - 172, Add a regression test that exercises the legacy flat {"operations": [...]} branch of pipeline_from_config: create a small legacy_config = {"operations": [{"name":"multiply"},{"name":"add"}]}, call pipeline_from_config(legacy_config) and assert the returned pipeline contains the expected operations (e.g., operation names "multiply" and "add" or equivalent behavior); place this test in tests/test_ops_infra.py (e.g., test_pipeline_from_legacy_operations) so the legacy path is covered alongside the existing wrapped-format fixtures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@zea/ops/pipeline.py`:
- Around line 123-127: The code currently replaces user-provided jit_kwargs when
keras.backend.backend() == "jax" and self.static_params != [], losing options
stored in jit_kwargs; instead merge the static_argnames into the existing
jit_kwargs (referencing self._user_jit_kwargs, jit_kwargs, self.static_params
and the keras.backend.backend() check) so user keys are preserved—e.g., create
an updated dict that keeps all existing jit_kwargs and adds/overwrites only the
"static_argnames" key with self.static_params (or call
jit_kwargs.update({"static_argnames": self.static_params})), rather than
assigning jit_kwargs = {"static_argnames": ...}.
- Around line 1319-1325: The serializer _pipeline_to_serializable_dict currently
only writes operations so top-level Pipeline attributes (e.g., with_batch_dim,
jit_options, jit_kwargs, name) are dropped; update
_pipeline_to_serializable_dict to include these pipeline-level fields in the
returned dict (e.g., add keys for with_batch_dim, jit_options, jit_kwargs, name
populated from the pipeline instance) while still using
Pipeline._pipeline_to_list(...) for operations, and ensure the corresponding
deserializer (pipeline_from_config) will read those keys back to reconstruct the
original Pipeline settings.
- Around line 1351-1355: Replace the YAML emitter to produce portable YAML:
change the yaml.dump call that serializes
_pipeline_to_serializable_dict(pipeline, verbose=verbose) to yaml.safe_dump and
remove the Dumper=yaml.Dumper argument (keep other params like indent and file
handle). This ensures the serialized output (from the pipeline variable via
_pipeline_to_serializable_dict) can be read back by the existing yaml.safe_load
used earlier in the file (around the yaml.safe_load usage) and prevents
Python-specific tags like !!python/tuple from being emitted.
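The `jit_kwargs` merge suggested in the first comment above can be sketched independently of zea (names are hypothetical): keep every user-provided option and only add or overwrite `static_argnames` on the JAX backend.

```python
def merge_jit_kwargs(user_jit_kwargs, static_params, backend):
    """Merge static_argnames into user JIT kwargs instead of replacing them."""
    jit_kwargs = dict(user_jit_kwargs)  # preserve all user-provided options
    if backend == "jax" and static_params:
        jit_kwargs["static_argnames"] = static_params
    return jit_kwargs
```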
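The second comment's richer serializer might look like this sketch; the function and field names follow the review text, but zea's real helpers may differ:

```python
def pipeline_to_serializable_dict(operations, with_batch_dim, jit_options, name):
    """Emit pipeline-level settings alongside the operations list.

    Including these top-level fields lets the deserializer reconstruct the
    original Pipeline settings instead of dropping them.
    """
    return {
        "pipeline": {
            "operations": operations,
            "with_batch_dim": with_batch_dim,
            "jit_options": jit_options,
            "name": name,
        }
    }
```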
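The portability issue in the third comment is easy to demonstrate in isolation, assuming PyYAML is installed: `yaml.dump` with the default `Dumper` tags tuples with `!!python/tuple`, which `yaml.safe_load` refuses, while `yaml.safe_dump` emits them as plain sequences.

```python
import yaml

data = {"grid_shape": (128, 256)}  # a tuple-valued pipeline parameter

# Default Dumper emits a Python-specific !!python/tuple tag.
unsafe = yaml.dump(data)
# safe_dump produces portable YAML (tuples become plain sequences),
# which the existing yaml.safe_load can read back.
portable = yaml.safe_dump(data)
```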
---
Nitpick comments:
In `@tests/test_ops_infra.py`:
- Around line 106-172: Add a regression test that exercises the legacy flat
{"operations": [...]} branch of pipeline_from_config: create a small
legacy_config = {"operations": [{"name":"multiply"},{"name":"add"}]}, call
pipeline_from_config(legacy_config) and assert the returned pipeline contains
the expected operations (e.g., operation names "multiply" and "add" or
equivalent behavior); place this test in tests/test_ops_infra.py (e.g.,
test_pipeline_from_legacy_operations) so the legacy path is covered alongside
the existing wrapped-format fixtures.
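The dual-format handling that the suggested regression test would cover can be sketched without zea (function name hypothetical):

```python
def normalize_pipeline_config(config):
    """Accept both the new wrapped and the legacy flat pipeline config.

    New format:    {"pipeline": {"operations": [...]}}
    Legacy format: {"operations": [...]}
    """
    if "pipeline" in config:
        return config["pipeline"]["operations"]
    return config["operations"]
```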
📒 Files selected for processing (11)
docs/source/getting-started.rst, docs/source/parameters.rst, docs/source/parameters_doc.py, tests/test_configs.py, tests/test_ops_infra.py, zea/config.py, zea/data/preset_utils.py, zea/internal/setup_zea.py, zea/ops/base.py, zea/ops/pipeline.py, zea/scan.py
🧹 Nitpick comments (1)
zea/data/preset_utils.py (1)
77-84: Consider adding a type hint for the `files` parameter.

The return type is `list[str]`, but the `files` parameter is typed as bare `list`. For consistency and clarity, consider using `list[str]`.

Suggested fix:

```diff
 def _download_files_in_path(
     repo_id: str,
-    files: list,
+    files: list[str],
     path_filter: str = None,
     cache_dir=HF_DATASETS_DIR,
     repo_type="dataset",
     **kwargs,
 ) -> list[str]:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@zea/data/preset_utils.py` around lines 77 - 84, The files parameter in _download_files_in_path is currently untyped; update its annotation from bare list to list[str] (i.e., change the function signature of _download_files_in_path to use files: list[str]) so the parameter and return types are consistent and clearer; keep other parameters and return type list[str] unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@zea/data/preset_utils.py`:
- Around line 77-84: The files parameter in _download_files_in_path is currently
untyped; update its annotation from bare list to list[str] (i.e., change the
function signature of _download_files_in_path to use files: list[str]) so the
parameter and return types are consistent and clearer; keep other parameters and
return type list[str] unchanged.
📒 Files selected for processing (2)
tests/tools/test_hf.py, zea/data/preset_utils.py
Replaces 'verbose' with 'compact' in serialization and round-trip methods, clarifying the behavior for minimal or full config output. Moves additional output key definitions to class-level attributes, ensuring they are not serialized as params. Strengthens pipeline config validation and preserves pipeline-level kwargs in round-trips. Improves YAML portability and JAX static_argnames merging logic.
… feature/pipeline-config
* Catch lambda serialization error
* Fix notebook
* Even compacter config (strips)
wesselvannierop
left a comment
Awesome PR, this will greatly improve the sharing of RF data & the particular processing associated with it!
Fixes issue in #280. General improvements to pipeline saving.
When saved to a config, pipelines are now always written with `pipeline` as the top-level key. Saving and loading are now symmetric, and `Pipeline.from_path` was added for consistency with `Model.from_path` and `Config.from_path`. Lastly, default arguments in `Pipeline` are not automatically saved (for a cleaner config) unless specified with `verbose`. See an example below:

Also addressed #289: added the class variable `ADD_OUTPUT_KEYS`.

Finally, removed the `Merge`, `Stack` and `BranchedPipeline` operations, as they were not being maintained anymore. See issues #207, #206 and #209.

Consistency of the API for saving and loading from configs
Now we have the same syntax for loading from a path: `zea.Model.from_path`, `zea.Pipeline.from_path`, `zea.Config.from_path`.

And saving: `zea.Model.to_json` (Keras API), `zea.Pipeline.to_yaml` (or `to_json`), `zea.Config.to_yaml` (or `to_json`).

Summary by CodeRabbit
Breaking Changes
New Features
Improvements
Documentation