
Pipeline serialization to config improvements#288

Merged
tristan-deep merged 18 commits into main from feature/pipeline-config
Mar 13, 2026
Conversation


@tristan-deep tristan-deep commented Mar 12, 2026

Fixes the issue in #280 and makes general improvements to pipeline saving.

When saved to a config, the pipeline is now always written in the form:

pipeline:
    operations:
        - <operation_name>
        - ...

So the pipeline now sits under a top-level key. Saving and loading are symmetric, and Pipeline.from_path was added for consistency with Model.from_path and Config.from_path. Lastly, default arguments in Pipeline are no longer saved automatically (for cleaner configs) unless verbose serialization is requested. See the example below:

import zea

config_path = "hf://zeahub/picmus/config_iq.yaml"
config = zea.Config.from_path(config_path)

pipeline = zea.Pipeline.from_config(config)
pipeline.to_yaml("pipeline_config.yaml")

# Now that the pipeline is saved, there are two ways to load it again,
# both of which should produce the same pipeline as the original:
# 1. load the pipeline directly from a path (local or HF YAML file)
# 2. load the config from a path, then build the pipeline from the config

pipeline_2 = zea.Pipeline.from_yaml("pipeline_config.yaml")
config_2 = zea.Config.from_path("pipeline_config.yaml")

pipeline_3 = zea.Pipeline.from_config(config_2)

print(config.pipeline)
print(config_2.pipeline)

print(pipeline)
print(pipeline_2)
print(pipeline_3)
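The compact-versus-verbose behavior described above can be sketched with a toy operation. The class and parameter names here are illustrative stand-ins, not zea's actual implementation:

```python
import inspect


class Op:
    """Hypothetical stand-in for a zea operation (names are illustrative)."""

    def __init__(self, gain=1.0, offset=0.0):
        self.gain = gain
        self.offset = offset

    def get_dict(self, verbose=False):
        # Collect constructor defaults via introspection.
        defaults = {
            name: p.default
            for name, p in inspect.signature(type(self).__init__).parameters.items()
            if p.default is not inspect.Parameter.empty
        }
        params = {k: getattr(self, k) for k in defaults}
        if not verbose:
            # Compact mode: drop parameters still at their defaults.
            params = {k: v for k, v in params.items() if v != defaults[k]}
        return {"name": type(self).__name__.lower(), **params}


print(Op(gain=2.0).get_dict())             # compact: only the overridden gain
print(Op(gain=2.0).get_dict(verbose=True)) # verbose: gain and offset
```

In compact mode only `gain` survives, which keeps the serialized config minimal while the verbose form remains available for exact round-trips.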

Also addressed #289 by adding the class variable ADD_OUTPUT_KEYS.
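A minimal sketch of what a class-level output-keys declaration might look like. All names here are hypothetical; zea's actual Operation API may differ:

```python
class Operation:
    """Minimal stand-in for an operation base class (illustrative)."""

    # Extra keys an op contributes beyond its main output key. Declared at
    # class level, so they are part of the class contract and are not
    # serialized as per-instance params.
    ADD_OUTPUT_KEYS = ()

    def __init__(self, output_key="data"):
        self.output_key = output_key

    def output_keys(self):
        return (self.output_key, *self.ADD_OUTPUT_KEYS)


class EnvelopeDetect(Operation):
    # Hypothetical subclass declaring an extra output.
    ADD_OUTPUT_KEYS = ("envelope",)


print(EnvelopeDetect().output_keys())  # -> ('data', 'envelope')
```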

Finally, removed the Merge, Stack, and BranchedPipeline operations, as they were no longer being maintained. See issues #207, #206, and #209.

Consistent API for saving and loading from configs

Now we have the same syntax for loading from a path:
zea.Model.from_path, zea.Pipeline.from_path, zea.Config.from_path

And saving:
zea.Model.to_json (Keras API), zea.Pipeline.to_yaml (or to_json), zea.Config.to_yaml (or to_json)

Summary by CodeRabbit

  • Breaking Changes

    • Pipeline configs now require a top-level "pipeline" wrapper; branched pipelines and some legacy pipeline loaders were removed/deprecated.
  • New Features

    • Added a file-based pipeline loader and compact/verbose serialization modes enabling reliable YAML/JSON/config round-trips.
  • Improvements

    • Better serialization fidelity, clearer error and representation messages, and more consistent handling of HF-style config paths.
  • Documentation

    • Docs updated to use unified path-based config loading (including HF URIs).


coderabbitai bot commented Mar 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Pipeline config now expects a wrapper {"pipeline": {"operations": [...]}}, serialization gained a verbose mode that preserves user JIT kwargs, Pipeline.from_path was added (pipeline_from_yaml deprecated), Merge/Stack/BranchedPipeline removed from ops exports, CommonMidpointPhaseError added, and tests/docs updated to match HF/config path plumbing.

Changes

Cohort / File(s) Summary
Tests
tests/test_ops_infra.py, tests/test_configs.py, tests/tools/test_hf.py
Updated tests to use Pipeline, adapt configs to the nested {"pipeline": {...}} shape, remove branched-pipeline tests, add round-trip/verbose serialization and Beamform/Map/PatchedGrid checks; added HF kwargs forwarding tests and adjusted HF test helpers to accept repo_type.
Public exports & ops package
zea/ops/__init__.py
Removed Merge, Stack, BranchedPipeline from exports; added CommonMidpointPhaseError; updated __all__ and examples to reference Pipeline.from_path and the nested pipeline contract.
Pipeline implementation & serialization
zea/ops/pipeline.py
Added Pipeline.from_path and deprecation wrapper for YAML loader; centralized serialization with _pipeline_to_serializable_dict; introduced verbose-controlled get_dict/to_config/to_json/to_yaml; propagate verbose to nested ops and preserve user JIT kwargs; removed branched-pipeline logic.
Operation base behavior
zea/ops/base.py
Added _to_native helper, store user-provided jit_kwargs as _user_jit_kwargs; changed Operation.get_dict(self, verbose=False) to compact vs verbose semantics; removed Merge and Stack classes; equality now uses new serialization semantics.
Interface and pipeline input
zea/interface.py
Pipeline.from_config now receives the full self.config wrapper instead of self.config.pipeline.
Tensor op minor rename
zea/ops/tensor.py
Renamed Threshold's internal attribute _fill_value_type to fill_value and updated resolver usage.
Config, HF helpers & setup plumbing
zea/config.py, zea/data/preset_utils.py, zea/internal/setup_zea.py, docs/...
Added Config.from_path(...) supporting local and hf:// paths; deprecated from_yaml/from_hf; propagated repo_type/**kwargs through HF helpers and setup functions; docs/examples switched to from_path; HF-related helpers/tests updated.
Docs
docs/source/getting-started.rst, docs/source/parameters.rst, docs/source/parameters_doc.py, zea/scan.py
Example/doctest updates to use Config.from_path(...) and HF-style URIs instead of from_yaml/from_hf.
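The hf:// URIs used throughout these examples follow a hf://&lt;org&gt;/&lt;repo&gt;/&lt;file&gt; shape. A hypothetical parser (not zea's actual Config.from_path plumbing) could split such a URI like this:

```python
def split_hf_uri(uri: str):
    """Split an hf:// config URI into (repo_id, filename).

    Illustrative only; zea's real HF path handling may differ.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri}")
    parts = uri[len(prefix):].split("/")
    if len(parts) < 3:
        raise ValueError(f"expected hf://<org>/<repo>/<file>, got: {uri}")
    repo_id = "/".join(parts[:2])   # e.g. "zeahub/picmus"
    filename = "/".join(parts[2:])  # e.g. "config_iq.yaml"
    return repo_id, filename


print(split_hf_uri("hf://zeahub/picmus/config_iq.yaml"))
```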

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • wesselvannierop
  • swpenninga
  • vincentvdschaft
🚥 Pre-merge checks | ✅ 3 passed
  • Title check: ✅ Passed. The title accurately reflects the main change: pipeline serialization to config with the new nested pipeline key structure, verbose mode control, and the from_path method.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 83.70%, which meets the required threshold of 80.00%.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.



codecov bot commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 81.10236% with 48 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
zea/ops/base.py | 76.74% | 9 missing, 11 partials ⚠️
zea/ops/pipeline.py | 85.12% | 7 missing, 11 partials ⚠️
zea/ops/ultrasound.py | 50.00% | 5 missing, 1 partial ⚠️
zea/config.py | 83.33% | 2 missing ⚠️
tests/tools/test_hf.py | 87.50% | 1 missing ⚠️
zea/data/preset_utils.py | 90.00% | 1 missing ⚠️



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zea/ops/base.py`:
- Around line 30-31: The current guard uses hasattr(value, "ndim") but does not
ensure value.tolist() exists, risking an AttributeError; change the check around
the tolist call so you only call value.tolist() when it exists and is callable
(e.g., if hasattr(value, "tolist") and callable(getattr(value, "tolist", None)))
or wrap the call in a try/except AttributeError fallback; update the code path
that returns value.tolist() (the line containing value.tolist()) accordingly so
custom objects without tolist won't raise.
- Around line 306-310: When building params in the serialization path (the block
that calls _to_native and assigns params[name] = value), detect callable values
that are not safe for config export (e.g., functools.partial or other callable
subclasses like a bound method) and fail fast: before adding to params, if
callable(value) and value is not a plain importable reference (or does not
round-trip via YAML/JSON), raise a clear TypeError indicating the parameter
(name) is not serializable and must be replaced with a serializable reference;
implement the check right after calling _to_native and before assigning into
params (use the existing names value and name) so required ctor-introspected
params such as Lambda.func are rejected instead of being dumped verbatim.
- Around line 313-336: The serializer omits the Operation attribute
additional_output_keys, so reconstructed Operation instances can lose declared
extra outputs; update the serialization block in Operation (in methods around
where params is built — references: self.additional_output_keys, self.key,
self.output_key, output_keys) to emit params["additional_output_keys"] =
self.additional_output_keys in verbose mode and in compact mode add
params["additional_output_keys"] only when self.additional_output_keys is
non-empty (similar to the existing conditionals for cache/ jit fields) so
round-trip deserialization preserves extra outputs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a962c4d3-b217-4a49-9df5-791e1f2ed9b8

📥 Commits

Reviewing files that changed from the base of the PR and between ec131ed and 4cda24f.

📒 Files selected for processing (1)
  • zea/ops/base.py

* Focus on Config.from_path
* Fixes rabbit

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/test_ops_infra.py (1)

106-172: Please retain coverage for legacy flat operations config.

Current fixtures now only exercise the wrapped format. Since pipeline_from_config still supports top-level {"operations": [...]}, add one regression test for that branch to prevent accidental breakage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_ops_infra.py` around lines 106 - 172, Add a regression test that
exercises the legacy flat {"operations": [...]} branch of pipeline_from_config:
create a small legacy_config = {"operations":
[{"name":"multiply"},{"name":"add"}]}, call pipeline_from_config(legacy_config)
and assert the returned pipeline contains the expected operations (e.g.,
operation names "multiply" and "add" or equivalent behavior); place this test in
tests/test_ops_infra.py (e.g., test_pipeline_from_legacy_operations) so the
legacy path is covered alongside the existing wrapped-format fixtures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@zea/ops/pipeline.py`:
- Around line 123-127: The code currently replaces user-provided jit_kwargs when
keras.backend.backend() == "jax" and self.static_params != [], losing options
stored in jit_kwargs; instead merge the static_argnames into the existing
jit_kwargs (referencing self._user_jit_kwargs, jit_kwargs, self.static_params
and the keras.backend.backend() check) so user keys are preserved—e.g., create
an updated dict that keeps all existing jit_kwargs and adds/overwrites only the
"static_argnames" key with self.static_params (or call
jit_kwargs.update({"static_argnames": self.static_params})), rather than
assigning jit_kwargs = {"static_argnames": ...}.
- Around line 1319-1325: The serializer _pipeline_to_serializable_dict currently
only writes operations so top-level Pipeline attributes (e.g., with_batch_dim,
jit_options, jit_kwargs, name) are dropped; update
_pipeline_to_serializable_dict to include these pipeline-level fields in the
returned dict (e.g., add keys for with_batch_dim, jit_options, jit_kwargs, name
populated from the pipeline instance) while still using
Pipeline._pipeline_to_list(...) for operations, and ensure the corresponding
deserializer (pipeline_from_config) will read those keys back to reconstruct the
original Pipeline settings.
- Around line 1351-1355: Replace the YAML emitter to produce portable YAML:
change the yaml.dump call that serializes
_pipeline_to_serializable_dict(pipeline, verbose=verbose) to yaml.safe_dump and
remove the Dumper=yaml.Dumper argument (keep other params like indent and file
handle). This ensures the serialized output (from the pipeline variable via
_pipeline_to_serializable_dict) can be read back by the existing yaml.safe_load
used earlier in the file (around the yaml.safe_load usage) and prevents
Python-specific tags like !!python/tuple from being emitted.
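The merging behavior requested in the first comment above can be sketched as a standalone helper. The real code operates on Pipeline attributes; the function below only illustrates the merge-instead-of-replace idea:

```python
def merged_jit_kwargs(user_jit_kwargs, static_params, backend="jax"):
    """Merge static_argnames into user-provided JIT kwargs.

    Hypothetical helper mirroring the review comment: keep all existing
    user keys and only add/overwrite "static_argnames".
    """
    jit_kwargs = dict(user_jit_kwargs or {})
    if backend == "jax" and static_params:
        jit_kwargs["static_argnames"] = list(static_params)
    return jit_kwargs


# User-supplied options survive alongside the injected static_argnames.
print(merged_jit_kwargs({"donate_argnums": (0,)}, ["with_batch_dim"]))
```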

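The YAML portability issue flagged above is easy to reproduce with PyYAML directly: the default Dumper emits Python-specific tags for tuples, while safe_dump degrades them to plain sequences that safe_load can read back:

```python
import yaml

# A config-like dict containing a tuple, as a Pipeline parameter might.
data = {"pipeline": {"operations": [{"name": "patched_grid", "patch_shape": (32, 32)}]}}

# The default Dumper tags tuples as !!python/tuple, which yaml.safe_load rejects.
unsafe_text = yaml.dump(data, Dumper=yaml.Dumper)
print("!!python/tuple" in unsafe_text)  # -> True

# safe_dump emits plain YAML (tuples become sequences), so safe_load round-trips.
safe_text = yaml.safe_dump(data)
loaded = yaml.safe_load(safe_text)
print(loaded["pipeline"]["operations"][0]["patch_shape"])  # -> [32, 32]
```

Note the tuple comes back as a list, so deserialization code must tolerate (or reconvert) sequence types.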

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e77014e4-56d5-429d-bf1a-50211cf447e0

📥 Commits

Reviewing files that changed from the base of the PR and between 4cda24f and 2c7e526.

📒 Files selected for processing (11)
  • docs/source/getting-started.rst
  • docs/source/parameters.rst
  • docs/source/parameters_doc.py
  • tests/test_configs.py
  • tests/test_ops_infra.py
  • zea/config.py
  • zea/data/preset_utils.py
  • zea/internal/setup_zea.py
  • zea/ops/base.py
  • zea/ops/pipeline.py
  • zea/scan.py


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
zea/data/preset_utils.py (1)

77-84: Consider adding type hint for files parameter.

The return type is list[str], but the files parameter is typed as bare list. For consistency and clarity, consider using list[str].

Suggested fix
 def _download_files_in_path(
     repo_id: str,
-    files: list,
+    files: list[str],
     path_filter: str = None,
     cache_dir=HF_DATASETS_DIR,
     repo_type="dataset",
     **kwargs,
 ) -> list[str]:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zea/data/preset_utils.py` around lines 77 - 84, The files parameter in
_download_files_in_path is currently untyped; update its annotation from bare
list to list[str] (i.e., change the function signature of
_download_files_in_path to use files: list[str]) so the parameter and return
types are consistent and clearer; keep other parameters and return type
list[str] unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 89c8d8c3-6cbd-4271-89c8-e69aefa545a6

📥 Commits

Reviewing files that changed from the base of the PR and between 2c7e526 and a69f7dd.

📒 Files selected for processing (2)
  • tests/tools/test_hf.py
  • zea/data/preset_utils.py

Replaces 'verbose' with 'compact' in serialization and round-trip methods, clarifying the behavior for minimal or full config output.

Moves additional output key definitions to class-level attributes, ensuring they are not serialized as params.

Strengthens pipeline config validation and preserves pipeline-level kwargs in round-trips.

Improves YAML portability and JAX static_argnames merging logic.
@tristan-deep tristan-deep linked an issue Mar 13, 2026 that may be closed by this pull request
tristan-deep and others added 4 commits March 13, 2026 12:20

@wesselvannierop wesselvannierop left a comment


Awesome PR, this will greatly improve the sharing of RF data & the particular processing associated with it!

@tristan-deep tristan-deep merged commit 3d06e3d into main Mar 13, 2026
10 checks passed
@tristan-deep tristan-deep deleted the feature/pipeline-config branch March 13, 2026 13:53
@tristan-deep tristan-deep mentioned this pull request Mar 13, 2026

Labels

  • bug: Something isn't working
  • enhancement: New feature or request

Projects

None yet

2 participants