Skip to content

JOSS review @tomelse#283

Open
tristan-deep wants to merge 8 commits intomainfrom
feature/review-tomelse
Open

JOSS review @tomelse#283
tristan-deep wants to merge 8 commits intomainfrom
feature/review-tomelse

Conversation

@tristan-deep
Copy link
Collaborator

@tristan-deep tristan-deep commented Mar 9, 2026

This PR addresses the JOSS review comments and feedback by @tomelse. See a summary of the review here: openjournals/joss-reviews#9881 (comment)

And the related issues:

Please find an updated version of the docs right here: https://zea--283.org.readthedocs.build/en/283/

Summary by CodeRabbit

  • Documentation
    • Updated example notebook: clarified TensorFlow installation requirement for batch loading, added prominent migration notes, separated explanatory text, added keras import and explicit image-to-NumPy conversion, and included a new dataloader usage and visualization snippet.
  • Improvements
    • Added and refined type annotations on public-facing methods to improve clarity and IDE support while preserving runtime behavior.

@tristan-deep tristan-deep added the documentation Improvements or additions to documentation label Mar 9, 2026
@tristan-deep tristan-deep marked this pull request as draft March 9, 2026 16:38
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 9, 2026

Walkthrough

Added return type annotations to Pipeline and data-file public methods; updated the zea_data_example notebook to clarify TensorFlow dependency, add keras imports, convert images to NumPy in examples, and reorganize dataloader usage and explanatory text.

Changes

Cohort / File(s) Summary
Notebook updates
docs/source/notebooks/data/zea_data_example.ipynb
Clarified that zea.backend.tensorflow.make_dataloader requires TensorFlow; added keras imports and new dataloader usage blocks; inserted keras.ops.convert_to_numpy(images) conversion and adjusted explanatory/output text.
Type hints — Data module
zea/data/file.py
Added return type annotations: File.load_data() -> np.ndarray; load_file() -> (np.ndarray, Scan, Probe). No logic changes.
Type hints — Pipeline module
zea/ops/pipeline.py
Imported typing (Any, Dict) and annotated Pipeline.call(self, **inputs) -> Dict[str, Any] and Pipeline.__call__(self, return_numpy=False, **inputs) -> Dict[str, Any]. Signatures only; no behavior changes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • wesselvannierop
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'JOSS review @tomelse' is vague and does not clearly summarize the main changes in the changeset, which include notebook updates, type annotation additions, and pipeline signature modifications. Consider using a more descriptive title that summarizes the primary changes, such as 'Add type hints and update documentation for JOSS review' or 'Address JOSS review feedback with type annotations and notebook updates'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/review-tomelse
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@tristan-deep tristan-deep marked this pull request as ready for review March 10, 2026 11:48
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
zea/ops/pipeline.py (1)

297-317: ⚠️ Potential issue | 🟠 Major

Handle the empty-pipeline case before returning outputs.

outputs is only assigned inside the loop, so Pipeline([]) or a config with operations: [] still crashes here with UnboundLocalError. Either reject empty pipelines in __init__, or return the inputs unchanged when there are no callable layers.

🩹 Local fix
     def call(self, **inputs) -> Dict[str, Any]:
         """Process input data through the pipeline."""
+        if not self._callable_layers:
+            return inputs
+
         for operation in self._callable_layers:
             try:
                 outputs = operation(**inputs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zea/ops/pipeline.py` around lines 297 - 317, The call method can raise
UnboundLocalError when self._callable_layers is empty because outputs is only
set inside the loop; fix by handling the empty-pipeline case up front or
ensuring outputs is initialized: either check if not self._callable_layers and
return inputs immediately, or initialize outputs = inputs before the for-loop in
Pipeline.call so that when no operations exist the original inputs are returned;
update references to self._callable_layers and the call method accordingly.
zea/data/file.py (1)

171-226: ⚠️ Potential issue | 🟡 Minor

Broaden the return type or disallow scalar indexing.

Line 175 declares -> np.ndarray, but data[indices] can return a NumPy scalar when every axis is indexed by an integer. Since the signature accepts arbitrary integer tuples, the type contract is narrower than the actual runtime behavior. Either restrict indices to array-producing cases, or widen the return type to include numpy.generic:

✏️ Typing fix
-    ) -> np.ndarray:
+    ) -> np.ndarray | np.generic:

Mirror the same widened type in load_file(...) for its first tuple element.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@zea/data/file.py` around lines 171 - 226, The return annotation of load_data
currently declares -> np.ndarray but indexing with indices (the indices
parameter of load_data) can yield a NumPy scalar (numpy.generic) when all axes
are integer-indexed; update the type signature to reflect this by widening the
return type to Union[np.ndarray, numpy.generic] (or np.ndarray | numpy.generic)
and adjust any imports/typing aliases as needed; additionally mirror this
widened first-tuple-element return type in the related load_file function so its
corresponding returned element also allows numpy.generic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/notebooks/data/zea_data_example.ipynb`:
- Around line 383-387: The notebook documents that
zea.backend.tensorflow.make_dataloader requires TensorFlow but doesn't install
it; add a new notebook cell immediately before the dataloader import/usage that
installs TensorFlow (run pip install tensorflow) and capture its output so the
subsequent import of zea.backend.tensorflow and the call to make_dataloader
succeed in a fresh Colab environment.

---

Outside diff comments:
In `@zea/data/file.py`:
- Around line 171-226: The return annotation of load_data currently declares ->
np.ndarray but indexing with indices (the indices parameter of load_data) can
yield a NumPy scalar (numpy.generic) when all axes are integer-indexed; update
the type signature to reflect this by widening the return type to
Union[np.ndarray, numpy.generic] (or np.ndarray | numpy.generic) and adjust any
imports/typing aliases as needed; additionally mirror this widened
first-tuple-element return type in the related load_file function so its
corresponding returned element also allows numpy.generic.

In `@zea/ops/pipeline.py`:
- Around line 297-317: The call method can raise UnboundLocalError when
self._callable_layers is empty because outputs is only set inside the loop; fix
by handling the empty-pipeline case up front or ensuring outputs is initialized:
either check if not self._callable_layers and return inputs immediately, or
initialize outputs = inputs before the for-loop in Pipeline.call so that when no
operations exist the original inputs are returned; update references to
self._callable_layers and the call method accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0d321eed-f81b-415d-afa8-10acc2699062

📥 Commits

Reviewing files that changed from the base of the PR and between 9aaa0c3 and 6a4b9c0.

📒 Files selected for processing (3)
  • docs/source/notebooks/data/zea_data_example.ipynb
  • zea/data/file.py
  • zea/ops/pipeline.py

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/source/notebooks/data/zea_data_example.ipynb (1)

383-387: Consider tightening the TensorFlow note wording for Colab vs local runs.

The note is directionally correct; a small wording tweak would make environment expectations clearer and avoid ambiguity.

Suggested wording refinement
- "In machine and deep learning workflows, we often want more features like batching, shuffling, and parallel data loading. The `zea.backend.tensorflow.make_dataloader` function provides a convenient way to create a TensorFlow data loader from a zea  dataset. "
+ "In machine and deep learning workflows, we often want features like batching, shuffling, and parallel data loading. The `zea.backend.tensorflow.make_dataloader` function provides a convenient way to create a TensorFlow data loader from a zea dataset."

- "🚨 Note! This does require a working TensorFlow installation, but does work in combination with any other backend as well. We are [working on migrating](https://github.com/tue-bmd/zea/pull/256) to [Grain](https://github.com/google/grain), which will provide a backend-agnostic dataloader in the near future."
+ "🚨 Note: `make_dataloader` requires TensorFlow. In Colab, TensorFlow is available by default; for local environments, install TensorFlow first. It can still be used alongside other backends. We are [working on migrating](https://github.com/tue-bmd/zea/pull/256) to [Grain](https://github.com/google/grain), which will provide a backend-agnostic dataloader in the near future."

Based on learnings: In zea documentation notebooks intended for Colab, do not include a line like %pip install tensorflow since Colab already ships TensorFlow; prefer a brief note (optionally guarded import) instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/notebooks/data/zea_data_example.ipynb` around lines 383 - 387,
The TensorFlow environment note around the
zea.backend.tensorflow.make_dataloader description is ambiguous for Colab vs
local runs; update the notebook text to a concise guarded-note: remove any
suggestion to run "%pip install tensorflow", instead state that Colab already
includes TensorFlow but local setups may need installation, and optionally show
a short guarded-import snippet (try/except ImportError with pip install only in
the except) as an alternative; edit the paragraph referencing make_dataloader to
use this tightened wording so readers know when installation is required without
prescribing unnecessary Colab installs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/notebooks/data/zea_data_example.ipynb`:
- Line 12: The bullet text "3. Loading data in batches with dataloading
utilities with `zea.backend.tensorflow.make_dataloader` This does require a
working TensorFlow installation!" is a run-on with a duplicated "with" and
missing punctuation; update the string so it reads clearly (for example: "3.
Loading data in batches using the dataloader utility
`zea.backend.tensorflow.make_dataloader`. This requires a working TensorFlow
installation."). Locate and replace the sentence in the loading-options bullet
in the notebook cell that contains `zea.backend.tensorflow.make_dataloader`.

---

Nitpick comments:
In `@docs/source/notebooks/data/zea_data_example.ipynb`:
- Around line 383-387: The TensorFlow environment note around the
zea.backend.tensorflow.make_dataloader description is ambiguous for Colab vs
local runs; update the notebook text to a concise guarded-note: remove any
suggestion to run "%pip install tensorflow", instead state that Colab already
includes TensorFlow but local setups may need installation, and optionally show
a short guarded-import snippet (try/except ImportError with pip install only in
the except) as an alternative; edit the paragraph referencing make_dataloader to
use this tightened wording so readers know when installation is required without
prescribing unnecessary Colab installs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 437080a4-76ea-42ab-b429-6e35ba3b1031

📥 Commits

Reviewing files that changed from the base of the PR and between 6a4b9c0 and 476d651.

📒 Files selected for processing (2)
  • docs/source/notebooks/data/zea_data_example.ipynb
  • zea/ops/pipeline.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • zea/ops/pipeline.py

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/notebooks/data/zea_data_example.ipynb`:
- Around line 383-387: Edit the dataloader documentation paragraph (around the
zea.backend.tensorflow.make_dataloader mention) to fix the double-space and
rephrase the warning for clarity: change "🚨 Note! This does require a working
TensorFlow installation, but does work in combination with any other backend as
well. We are [working on migrating]..." to a concise sentence like "Note: a
working TensorFlow installation is required, though the dataloader can be used
alongside other backends; we are migrating to Grain for a backend-agnostic
dataloader (see PR link)." Ensure the double space is removed and the text
references zea.backend.tensorflow.make_dataloader and the Grain migration link.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: abd8edaf-8ad8-42ae-8ea8-cd91c7eac6f5

📥 Commits

Reviewing files that changed from the base of the PR and between 476d651 and 3fae77c.

📒 Files selected for processing (1)
  • docs/source/notebooks/data/zea_data_example.ipynb

Comment on lines +383 to +387
"In machine and deep learning workflows, we often want more features like batching, shuffling, and parallel data loading. The `zea.backend.tensorflow.make_dataloader` function provides a convenient way to create a TensorFlow data loader from a zea dataset. \n",
"\n",
"🚨 Note! This does require a working TensorFlow installation, but does work in combination with any other backend as well. We are [working on migrating](https://github.com/tue-bmd/zea/pull/256) to [Grain](https://github.com/google/grain), which will provide a backend-agnostic dataloader in the near future.\n",
"\n",
"This dataloader is particularly useful for training models. It is important that there is some consistency in the dataset, which is not the case for [PICMUS](https://www.creatis.insa-lyon.fr/Challenge/IEEE_IUS_2016/home). Therefore in this example we will use a small part of the [CAMUS](https://www.creatis.insa-lyon.fr/Challenge/camus/) dataset."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Polish wording in the dataloader section for readability.

There’s a small double-space typo and slightly awkward phrasing in the warning line.

✏️ Suggested doc text cleanup
- In machine and deep learning workflows, we often want more features like batching, shuffling, and parallel data loading. The `zea.backend.tensorflow.make_dataloader` function provides a convenient way to create a TensorFlow data loader from a zea  dataset. 
+ In machine and deep learning workflows, we often want more features like batching, shuffling, and parallel data loading. The `zea.backend.tensorflow.make_dataloader` function provides a convenient way to create a TensorFlow data loader from a zea dataset.

- 🚨 Note! This does require a working TensorFlow installation, but does work in combination with any other backend as well. We are [working on migrating](https://github.com/tue-bmd/zea/pull/256) to [Grain](https://github.com/google/grain), which will provide a backend-agnostic dataloader in the near future.
+ 🚨 Note: This requires a working TensorFlow installation, but it can be used in combination with other backends. We are [working on migrating](https://github.com/tue-bmd/zea/pull/256) to [Grain](https://github.com/google/grain), which will provide a backend-agnostic dataloader in the near future.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/notebooks/data/zea_data_example.ipynb` around lines 383 - 387,
Edit the dataloader documentation paragraph (around the
zea.backend.tensorflow.make_dataloader mention) to fix the double-space and
rephrase the warning for clarity: change "🚨 Note! This does require a working
TensorFlow installation, but does work in combination with any other backend as
well. We are [working on migrating]..." to a concise sentence like "Note: a
working TensorFlow installation is required, though the dataloader can be used
alongside other backends; we are migrating to Grain for a backend-agnostic
dataloader (see PR link)." Ensure the double space is removed and the text
references zea.backend.tensorflow.make_dataloader and the Grain migration link.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant