
Replaced lazy all_loaded scans with an _n_lazy counter #103

Closed

milosobral wants to merge 12 commits into main from milo/lazy-attributes

Conversation

Contributor

@milosobral milosobral commented Mar 5, 2026

  • Before: lazy objects repeatedly scanned attributes to check if everything was loaded.
  • Now: a simple counter tracks remaining lazy fields and flips to eager mode when it reaches zero.
  • Why this helps: avoids repeated O(n) checks during lazy access, which is important for large objects with many fields.

This optimization makes little difference for small objects, but a substantial one for larger objects with many attributes.
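The counter-driven promotion described above can be sketched as follows. This is an illustrative stand-in, not the actual temporaldata code: LazyThing/EagerThing are hypothetical names, and zero-arg callables play the role of unloaded h5py datasets.

```python
class EagerThing:
    pass

class LazyThing(EagerThing):
    def __init__(self, loaders):
        # loaders: field name -> zero-arg callable standing in for an
        # unloaded h5py.Dataset
        self.__dict__.update(loaders)
        self.__dict__["_n_lazy"] = len(loaders)

    def __getattribute__(self, name):
        d = object.__getattribute__(self, "__dict__")
        if not name.startswith("_") and name in d and callable(d[name]):
            d[name] = d[name]()       # materialize the field exactly once
            d["_n_lazy"] -= 1         # O(1) bookkeeping instead of a full scan
            if d["_n_lazy"] == 0:
                # every field is loaded: promote to the eager class
                object.__setattr__(self, "__class__", EagerThing)
                del d["_n_lazy"]
        return object.__getattribute__(self, name)

obj = LazyThing({"start": lambda: 0.0, "end": lambda: 1.0})
_ = obj.start
assert type(obj) is LazyThing    # one field still pending
_ = obj.end
assert type(obj) is EagerThing   # counter hit zero: promoted
```

The key point is that each access does constant-time bookkeeping, whereas the old approach rescanned all attributes to decide whether everything was loaded.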
Benchmark results:

  Benchmark                           041f06c (µs)   Working Tree (µs)   Speedup
  ------------------------------------------------------------------------------
  Data.slice() (lazy, realistic)         21686.920           19182.811     1.13x
  Data.slice() (in-memory)                 597.558             609.912     0.98x
  IrregularTimeSeries.slice()               55.478              55.153     1.01x
  Interval.slice()                           5.802               5.869     0.99x
  Interval.and (1k&single)                 558.361             551.012     1.01x
  Interval.and (1k&100)                    715.903             728.079     0.98x
  Interval.or (1k|100)                    4122.880            4147.787     0.99x
  Interval.difference (1k-100)            3967.796            3957.231     1.00x
  ArrayDict.keys() x100k                     0.661               0.641     1.03x
  LazyInterval access (10 attrs)          1778.541            1599.147     1.11x
  LazyInterval access (500 attrs)       164313.927           53811.654     3.05x

Summary by CodeRabbit

  • New Features

    • Lazy-loaded temporal objects now track remaining lazy fields and automatically convert to fully materialized versions once all pending fields are loaded.
  • Benchmarks

    • Benchmark suite updated with parameterized lazy-access scenarios and preset 10/500-attribute runs for improved profiling.
  • Tests

    • Added tests covering lazy-load counters, materialization order, and automatic conversion behavior.
  • Changelog

    • Documented the new counter-driven eager conversion.

@milosobral milosobral requested a review from vinamarora8 March 5, 2026 18:26
@milosobral milosobral self-assigned this Mar 5, 2026
@milosobral milosobral added the enhancement New feature or request label Mar 5, 2026

coderabbitai Bot commented Mar 5, 2026

📝 Walkthrough

A per-instance lazy-count _n_lazy is added across LazyArrayDict, LazyInterval, LazyRegularTimeSeries, and LazyIrregularTimeSeries to track remaining HDF5-backed attributes; when it reaches zero, objects convert to their eager counterparts and remove lazy bookkeeping. Benchmarks and tests updated to exercise the counter.

Changes

  • Lazy core updates — temporaldata/arraydict.py, temporaldata/interval.py, temporaldata/regular_ts.py, temporaldata/irregular_ts.py
    Add _n_lazy per-instance counter; narrow __getattribute__ interception to non-private attrs present in __dict__; decrement _n_lazy when materializing h5py datasets; promote to eager classes when _n_lazy == 0; add __setattr__/__delattr__ (ArrayDict); propagate _n_lazy via from_hdf5, select_by_mask, and slice.
  • Benchmarks — benchmarks/benchmark.py
    Refactor bench_lazy_interval_access to accept num_attrs, generate target_pos_{i} fields, reduce timekeys, change measurement loops and iteration counts; add wrappers bench_lazy_interval_access_10 and bench_lazy_interval_access_500; simplify temp file handling.
  • Tests — tests/test_arraydict.py, tests/test_interval.py, tests/test_irregular_ts.py, tests/test_regular_ts.py
    Add tests validating _n_lazy initialization (counts only dataset members), decrementation on attribute access, behavior across select_by_mask and slice, and automatic promotion to eager types after the final lazy attribute is materialized.
  • Changelog — CHANGELOG.md
    Add Unreleased → Added note documenting a simple counter tracking remaining lazy fields and automatic switch to eager mode.

Sequence Diagram

sequenceDiagram
    participant Client
    participant LazyObj as Lazy Object
    participant AttrHandler as __getattribute__
    participant HDF5 as HDF5 Dataset
    participant Converter as Promotion Logic
    participant EagerObj as Eager Object

    Client->>LazyObj: access attribute (e.g., "start" or "target_pos_i")
    LazyObj->>AttrHandler: intercept access
    AttrHandler->>HDF5: detect h5py.Dataset (lazy)
    HDF5-->>AttrHandler: dataset reference
    AttrHandler->>AttrHandler: materialize -> NumPy array
    AttrHandler->>LazyObj: store array in __dict__
    AttrHandler->>LazyObj: decrement _n_lazy
    LazyObj->>LazyObj: check _n_lazy
    alt _n_lazy == 0
        LazyObj->>Converter: trigger promotion
        Converter->>EagerObj: convert instance, remove lazy state
        EagerObj-->>Client: return eager attribute/object
    else _n_lazy > 0
        LazyObj-->>Client: return materialized attribute (object remains lazy)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A tiny counter counts each hop,
Lazy fields fall as datasets drop.
When zero whispers, the swap's complete—
From sleepy hops to eager feet! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 36.54%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the PR title accurately describes the main change: replacing lazy attribute-loading scans with a counter mechanism to track remaining lazy fields.


@milosobral milosobral mentioned this pull request Mar 5, 2026

codecov Bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 95.69892% with 4 lines in your changes missing coverage. Please review.

  File with missing lines: temporaldata/arraydict.py — patch 90.90%, 4 lines missing ⚠️



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
benchmarks/benchmark.py (1)

52-53: Remove or implement the unused span parameter.

span on Line 52 is never used, so the helper API is misleading. If it’s not needed, remove it.

Proposed simplification
-def _make_disjoint_intervals(
-    n, span=10_000, min_gap=1.0, min_dur=0.5, max_dur=2.0, seed=42
-):
+def _make_disjoint_intervals(
+    n, min_gap=1.0, min_dur=0.5, max_dur=2.0, seed=42
+):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark.py` around lines 52 - 53, The function signature in
benchmarks/benchmark.py declares a parameter span (n, span=10_000, min_gap=1.0,
min_dur=0.5, max_dur=2.0, seed=42) that is never used; either remove span from
the signature and any callers, or implement its intended behavior where the
function (and helper functions like any generator or sampler used inside)
constrains/uses the overall span value. Update the signature and all call sites
to match, and run tests to ensure no references to span remain. Ensure you
adjust the docstring and parameter list (and any default) for the function to
reflect the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@benchmarks/benchmark.py`:
- Around line 300-335: The temp file created in bench_lazy_interval_access
(tmpfile/path) is only removed at the end and can leak on exceptions; wrap the
setup and benchmark execution in a try/finally (or use the same pattern as
bench_data_slice_lazy) so os.unlink(path) is always called in the finally block,
ensuring the temporary file is removed on both success and failure; locate
function bench_lazy_interval_access and move the cleanup into a finally that
always executes after the h5py usage and _bench call.

In `@benchmarks/compare.py`:
- Around line 33-38: Replace raw subprocess.run invocations in
benchmarks/compare.py with hardened calls: resolve the git executable via
shutil.which("git") and use that absolute path instead of relying on PATH, add a
reasonable timeout value to prevent hanging, and enable check=True so failures
raise exceptions you can handle. Update each subprocess.run call (the ones
creating result and similar calls around the previous invocations) to use the
resolved git path, timeout, and check parameters and handle
subprocess.CalledProcessError where appropriate.

In `@tests/test_arraydict.py`:
- Around line 196-200: The test uses bare attribute access expressions on the
Data object (data.unit_id, data.brain_region, data.waveform_mean) to drive lazy
loading and triggers Ruff B018; update these to explicitly assign the accessed
values to a throwaway variable (e.g., use "_" ) so the side-effect intent is
clear and the linter is satisfied while keeping the existing asserts that check
data.__dict__["_n_lazy"].

In `@tests/test_interval.py`:
- Around line 269-275: The test uses bare attribute accesses (data.start,
data.end, data.go_cue_time, data.drifting_gratings_dir and the similar access at
line 307) solely to trigger lazy materialization, which Ruff flags as B018;
change each bare access to an explicit discard assignment (e.g., _ = data.start)
so the side effect is preserved while satisfying the linter, updating
occurrences for the attributes referenced above.
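The discard-assignment pattern the two test fixes above describe looks like this; `LoadsLazily` is a toy stand-in for the lazy classes, where attribute access has a load side effect.

```python
# Toy stand-in for the lazy classes: attribute access has a load side
# effect, so the tests access attributes purely for that effect.
class LoadsLazily:
    def __init__(self):
        self.loads = 0

    def __getattr__(self, name):
        # called only for attributes not already in __dict__;
        # pretend this materializes data from disk
        self.loads += 1
        return name

data = LoadsLazily()
_ = data.unit_id        # explicit discard: the side effect is the point,
_ = data.brain_region   # and Ruff B018 no longer flags a useless expression
assert data.loads == 2
```

The `_ =` assignment keeps the side-effect intent visible to readers while satisfying the linter.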

---

Nitpick comments:
In `@benchmarks/benchmark.py`:
- Around line 52-53: The function signature in benchmarks/benchmark.py declares
a parameter span (n, span=10_000, min_gap=1.0, min_dur=0.5, max_dur=2.0,
seed=42) that is never used; either remove span from the signature and any
callers, or implement its intended behavior where the function (and helper
functions like any generator or sampler used inside) constrains/uses the overall
span value. Update the signature and all call sites to match, and run tests to
ensure no references to span remain. Ensure you adjust the docstring and
parameter list (and any default) for the function to reflect the change.
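The subprocess hardening suggested for benchmarks/compare.py above might look like the following sketch. `run_cmd` is a hypothetical helper, not code from this PR; the exact call sites and git arguments in compare.py may differ.

```python
import shutil
import subprocess

def run_cmd(program, args, timeout=30):
    """Run a program resolved explicitly via PATH, with a timeout and check=True."""
    # Resolve the executable up front instead of trusting PATH lookup at
    # call time; fail fast if it is missing.
    exe = shutil.which(program)
    if exe is None:
        raise FileNotFoundError(f"{program} not found on PATH")
    return subprocess.run(
        [exe, *args],
        capture_output=True,
        text=True,
        timeout=timeout,  # never hang indefinitely
        check=True,       # raise CalledProcessError on non-zero exit
    )

# e.g. in compare.py (illustrative):
# sha = run_cmd("git", ["rev-parse", "HEAD"]).stdout.strip()
```

Callers can then catch subprocess.CalledProcessError and subprocess.TimeoutExpired where a failed git invocation should be reported rather than crash the benchmark run.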

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8c5b0d6c-011f-46c8-b6b4-86ce7cd0a2fc

📥 Commits

Reviewing files that changed from the base of the PR and between 041f06c and dcc62cb.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • benchmarks/benchmark.py
  • benchmarks/compare.py
  • temporaldata/arraydict.py
  • temporaldata/interval.py
  • temporaldata/irregular_ts.py
  • temporaldata/regular_ts.py
  • tests/test_arraydict.py
  • tests/test_interval.py

Comment thread benchmarks/benchmark.py Outdated
Comment thread benchmarks/compare.py
Comment thread tests/test_arraydict.py Outdated
Comment thread tests/test_interval.py Outdated

github-actions Bot commented Mar 10, 2026

Benchmark comparison vs main

Baseline: origin/main
Target: HEAD

Running benchmarks for 7f4724182a...
Running benchmarks for b0fdafccb7...

  Benchmark                                     7f4724182a (µs)    b0fdafccb7 (µs)    Speedup
  --------------------------------------------------------------------------------------------
  Data.slice() (lazy, realistic)                       1918.066           1043.529      1.84x
  Data.slice() (in-memory)                              746.784            743.584      1.00x
  IrregularTimeSeries.slice()                            86.482             86.295      1.00x
  RegularTimeSeries.slice()                              44.214             44.005      1.00x
  Interval.slice()                                        9.240              9.232      1.00x
  Interval.__and__ (1k&single)                           50.819             50.627      1.00x
  Interval.__and__ (1k&100)                            1197.922           1177.819      1.02x
  Interval.__or__ (1k|100)                               65.401             64.996      1.01x
  Interval.difference (1k-100)                         5754.280           5771.520      1.00x
  ArrayDict.keys() x100k                                  1.796              1.804      1.00x
  LazyInterval access (10 attrs)                       2758.103           2466.841      1.12x
  LazyInterval access (500 attrs)                    384729.702          84630.665      4.55x

Member

@vinamarora8 vinamarora8 left a comment


@milosobral Leaving some comments!
Also, curious as to why no tests are added for IrregularTimeSeries?
Can you also merge main into this pls.

Comment thread temporaldata/arraydict.py

# store it, now the array is loaded
self.__dict__[name] = out
self.__dict__["_n_lazy"] -= 1
Member


Why not just self._n_lazy -= 1?
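One plausible answer (an editor's assumption, not confirmed in this thread): with __getattribute__ and __setattr__ overridden, a plain `self._n_lazy -= 1` routes the write back through the class's __setattr__ hook, while writing into __dict__ does not. A minimal sketch with hypothetical names, unrelated to the real temporaldata classes:

```python
# Count how often each attribute hook fires, to show which path a
# plain write takes versus a __dict__ write.
class Hooked:
    def __init__(self):
        object.__setattr__(self, "calls", {"get": 0, "set": 0})
        self.__dict__["n"] = 2

    def __getattribute__(self, name):
        object.__getattribute__(self, "calls")["get"] += 1
        return object.__getattribute__(self, name)

    def __setattr__(self, name, value):
        object.__getattribute__(self, "calls")["set"] += 1
        object.__setattr__(self, name, value)

h = Hooked()
h.__dict__["n"] -= 1     # dict write: __setattr__ is bypassed
assert h.calls["set"] == 0
h.n -= 1                 # plain write: __setattr__ fires
assert h.calls["set"] == 1
```

Since this PR also adds a __setattr__ override to ArrayDict, the __dict__ write keeps the decrement from re-entering that hook.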

Comment thread temporaldata/arraydict.py
Comment on lines 453 to 460
obj = cls.__new__(cls)
for key, value in file.items():
obj.__dict__[key] = value

obj._unicode_keys = file.attrs["_unicode_keys"].astype(str).tolist()
obj._lazy_ops = {}
obj._n_lazy = len(file)

Member


Is this going to be safe always? Maybe we should do this?

for key, value in file.items():
    obj.__dict__[key] = value
    if isinstance(value, h5py.Dataset):
        obj._n_lazy += 1

Contributor Author


yea good catch

…n_lazy.

- Added a new test to verify that _n_lazy reflects the number of datasets in LazyArrayDict.
@milosobral milosobral requested a review from vinamarora8 March 23, 2026 18:25

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@temporaldata/arraydict.py`:
- Around line 367-386: The object can become permanently “lazy” because _n_lazy
is not updated when callers set or delete public fields; override
LazyArrayDict.__setattr__ and __delattr__ to keep _n_lazy in sync with actual
dataset-valued entries: when setting an attribute (in __setattr__) check the
previous value from self.__dict__ and the incoming value and increment/decrement
_n_lazy if a h5py.Dataset (or whatever lazy marker you use) is being added or
removed (or replaced by a non-dataset), then delegate to ArrayDict.__setattr__
or assign into __dict__; when deleting an attribute (in __delattr__) decrement
_n_lazy if the removed value was a dataset, then perform deletion and run the
same promotion logic used in __getattribute__ (flip to ArrayDict and del
_lazy_ops/_unicode_keys/_n_lazy when _n_lazy reaches 0). Ensure you reference
__getattribute__, __setattr__, __delattr__, _n_lazy, LazyArrayDict, ArrayDict
and to_hdf5 when implementing these changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 131a3d36-e278-4a0c-9683-e54f4f2c8f6c

📥 Commits

Reviewing files that changed from the base of the PR and between dcc62cb and 4f7de61.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • benchmarks/benchmark.py
  • temporaldata/arraydict.py
  • tests/test_arraydict.py
  • tests/test_interval.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/test_arraydict.py
  • tests/test_interval.py

Comment thread temporaldata/arraydict.py Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
benchmarks/benchmark.py (1)

339-356: ⚠️ Potential issue | 🟡 Minor

Move temp-file cleanup into a finally.

The trailing os.unlink(path) still won't run if setup or the benchmark body raises, so failed benchmark runs can leak temp files.

Possible fix
-    with h5py.File(path, "w") as f:
-        iv.to_hdf5(f)
-
-    results = None
-    with h5py.File(path, "r") as f:
-
-        def go():
-            lazy = LazyInterval.from_hdf5(f)
-            _ = lazy.start
-            _ = lazy.end
-
-            for i in range(num_attrs):
-                _ = getattr(lazy, f"target_pos_{i}")
-
-        results = _bench(f"LazyInterval access ({num_attrs} attrs)", go, number=25)
-
-    os.unlink(path)
-    return results
+    try:
+        with h5py.File(path, "w") as f:
+            iv.to_hdf5(f)
+
+        with h5py.File(path, "r") as f:
+
+            def go():
+                lazy = LazyInterval.from_hdf5(f)
+                _ = lazy.start
+                _ = lazy.end
+
+                for i in range(num_attrs):
+                    _ = getattr(lazy, f"target_pos_{i}")
+
+            return _bench(f"LazyInterval access ({num_attrs} attrs)", go, number=25)
+    finally:
+        if os.path.exists(path):
+            os.unlink(path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/benchmark.py` around lines 339 - 356, The temp file at variable
path can leak if an exception occurs before the final os.unlink(path); wrap the
file-creation and benchmark logic (the with h5py.File(path, "w") as f: ... the
with h5py.File(path, "r") as f: ... and the call to _bench inside go which uses
LazyInterval.from_hdf5) in a try/finally and move os.unlink(path) into the
finally block so the file is always removed even on errors; ensure any early
returns still let the finally execute and reference path,
LazyInterval.from_hdf5, and _bench unchanged.
temporaldata/arraydict.py (1)

342-386: ⚠️ Potential issue | 🟠 Major

Counter-based promotion still needs mutation hooks.

Now that promotion depends entirely on _n_lazy, replacing or deleting a still-lazy public field can leave the instance stuck as LazyArrayDict even after every dataset is gone. Please keep the counter in sync in __setattr__/__delattr__ and mirror the same fix in the other lazy containers added in this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@temporaldata/arraydict.py` around lines 342 - 386, The lazy-promotion counter
_n_lazy is not updated when public fields are set or deleted, so implement hooks
in __setattr__ and __delattr__ to keep it in sync: in __setattr__(self, name,
value) if name is a public key (not starting with "_") adjust _n_lazy up when
assigning an h5py.Dataset (or other lazy sentinel) and adjust down when
replacing an existing h5py.Dataset with a non-lazy value; in __delattr__(self,
name) if deleting a public key decrement _n_lazy if the deleted value is an
h5py.Dataset; after adjustments, check the same promotion condition (if
self._n_lazy == 0 then promote by setting __class__ = ArrayDict and deleting
_lazy_ops/_unicode_keys/_n_lazy) to mirror the behavior in __getattribute__;
apply the same pattern to the other lazy container classes introduced in this PR
so all lazy containers update their _n_lazy consistently.
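The __setattr__/__delattr__ bookkeeping this comment asks for could look roughly like the following; Lazy/Eager/Marker are illustrative stand-ins (Marker plays the role of an unloaded h5py.Dataset), not the actual temporaldata classes.

```python
class Eager:
    pass

class Marker:
    """Stand-in for an unloaded h5py.Dataset."""

class Lazy(Eager):
    def __init__(self, fields):
        self.__dict__["_n_lazy"] = 0
        for key, value in fields.items():
            setattr(self, key, value)

    def _promote_if_done(self):
        if self.__dict__["_n_lazy"] == 0:
            # all lazy fields are gone: flip to the eager class
            object.__setattr__(self, "__class__", Eager)
            self.__dict__.pop("_n_lazy", None)

    def __setattr__(self, name, value):
        if not name.startswith("_"):
            was_lazy = isinstance(self.__dict__.get(name), Marker)
            is_lazy = isinstance(value, Marker)
            self.__dict__["_n_lazy"] += int(is_lazy) - int(was_lazy)
        object.__setattr__(self, name, value)
        if not name.startswith("_"):
            self._promote_if_done()

    def __delattr__(self, name):
        if not name.startswith("_") and isinstance(self.__dict__.get(name), Marker):
            self.__dict__["_n_lazy"] -= 1
        object.__delattr__(self, name)
        if not name.startswith("_"):
            self._promote_if_done()

obj = Lazy({"a": Marker(), "b": Marker()})
obj.a = [1, 2, 3]   # overwritten with loaded data: counter drops to 1
del obj.b           # deleted while still lazy: counter drops to 0
assert type(obj) is Eager
```

With these hooks, replacing or deleting a still-lazy field can no longer strand the instance in its lazy class after every dataset is gone.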
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@temporaldata/arraydict.py`:
- Around line 455-458: The loop that assigns HDF5 group members into
obj.__dict__ is inserting non-dataset members (e.g., groups) which breaks
LazyArrayDict assumptions and _n_lazy counting; change the logic in the loop
inside class LazyArrayDict (the for key, value in file.items() block) to skip
any value that is not an instance of h5py.Dataset before assigning to
obj.__dict__ and before incrementing _n_lazy (i.e., only set obj.__dict__[key]
and increment _n_lazy when isinstance(value, h5py.Dataset)); alternatively, if
you prefer strictness, raise a TypeError for non-dataset members instead of
silently skipping—ensure keys(), __len__ and masking only see dataset entries.

---

Duplicate comments:
In `@benchmarks/benchmark.py`:
- Around line 339-356: The temp file at variable path can leak if an exception
occurs before the final os.unlink(path); wrap the file-creation and benchmark
logic (the with h5py.File(path, "w") as f: ... the with h5py.File(path, "r") as
f: ... and the call to _bench inside go which uses LazyInterval.from_hdf5) in a
try/finally and move os.unlink(path) into the finally block so the file is
always removed even on errors; ensure any early returns still let the finally
execute and reference path, LazyInterval.from_hdf5, and _bench unchanged.

In `@temporaldata/arraydict.py`:
- Around line 342-386: The lazy-promotion counter _n_lazy is not updated when
public fields are set or deleted, so implement hooks in __setattr__ and
__delattr__ to keep it in sync: in __setattr__(self, name, value) if name is a
public key (not starting with "_") adjust _n_lazy up when assigning an
h5py.Dataset (or other lazy sentinel) and adjust down when replacing an existing
h5py.Dataset with a non-lazy value; in __delattr__(self, name) if deleting a
public key decrement _n_lazy if the deleted value is an h5py.Dataset; after
adjustments, check the same promotion condition (if self._n_lazy == 0 then
promote by setting __class__ = ArrayDict and deleting
_lazy_ops/_unicode_keys/_n_lazy) to mirror the behavior in __getattribute__;
apply the same pattern to the other lazy container classes introduced in this PR
so all lazy containers update their _n_lazy consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 82eb43df-8459-4e3a-bab2-cb843c52a1ab

📥 Commits

Reviewing files that changed from the base of the PR and between 4f7de61 and c00c900.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • benchmarks/benchmark.py
  • temporaldata/arraydict.py
  • temporaldata/interval.py
  • temporaldata/irregular_ts.py
  • temporaldata/regular_ts.py
  • tests/test_arraydict.py
  • tests/test_interval.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_interval.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • CHANGELOG.md
  • tests/test_arraydict.py
  • temporaldata/regular_ts.py

Comment thread temporaldata/arraydict.py
Comment on lines 455 to +458
for key, value in file.items():
    obj.__dict__[key] = value
    if isinstance(value, h5py.Dataset):
        n_lazy += 1


⚠️ Potential issue | 🟠 Major

Skip unexpected non-dataset HDF5 members here.

This now excludes non-datasets from _n_lazy, but it still inserts them into __dict__. A root group then shows up in keys() and breaks __len__ / masking because LazyArrayDict assumes every public entry is array-like. Either ignore or reject non-h5py.Dataset members before assignment.

Possible fix
         obj = cls.__new__(cls)
         n_lazy = 0
         for key, value in file.items():
-            obj.__dict__[key] = value
-            if isinstance(value, h5py.Dataset):
-                n_lazy += 1
+            if not isinstance(value, h5py.Dataset):
+                continue
+            obj.__dict__[key] = value
+            n_lazy += 1
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-for key, value in file.items():
-    obj.__dict__[key] = value
-    if isinstance(value, h5py.Dataset):
-        n_lazy += 1
+for key, value in file.items():
+    if not isinstance(value, h5py.Dataset):
+        continue
+    obj.__dict__[key] = value
+    n_lazy += 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@temporaldata/arraydict.py` around lines 455 - 458, The loop that assigns HDF5
group members into obj.__dict__ is inserting non-dataset members (e.g., groups)
which breaks LazyArrayDict assumptions and _n_lazy counting; change the logic in
the loop inside class LazyArrayDict (the for key, value in file.items() block)
to skip any value that is not an instance of h5py.Dataset before assigning to
obj.__dict__ and before incrementing _n_lazy (i.e., only set obj.__dict__[key]
and increment _n_lazy when isinstance(value, h5py.Dataset)); alternatively, if
you prefer strictness, raise a TypeError for non-dataset members instead of
silently skipping—ensure keys(), __len__ and masking only see dataset entries.

…assignment and deletion. Added tests to ensure correct promotion to ArrayDict after modifying datasets.

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
temporaldata/arraydict.py (1)

490-498: ⚠️ Potential issue | 🟡 Minor

Skip non-dataset HDF5 members during from_hdf5 loading.

The loop stores all HDF5 items directly into __dict__, including groups, but only counts datasets for _n_lazy. When select_by_mask() iterates keys() and applies masking with value[mask], h5py.Group objects would fail since they don't support boolean indexing. HDF5 groups are created in production code (e.g., data.py, regular_ts.py, irregular_ts.py), so this issue would manifest in real usage.

🔧 Suggested fix: skip non-datasets before assignment
         obj = cls.__new__(cls)
         n_lazy = 0
         for key, value in file.items():
+            if not isinstance(value, h5py.Dataset):
+                continue
             obj.__dict__[key] = value
-            if isinstance(value, h5py.Dataset):
-                n_lazy += 1
+            n_lazy += 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@temporaldata/arraydict.py` around lines 490 - 498, The from_hdf5 loader is
assigning all HDF5 members (including h5py.Group) into obj.__dict__ but only
counts datasets in _n_lazy, causing select_by_mask to later attempt boolean
indexing on groups; modify the loop in from_hdf5 to only assign and count
members when isinstance(value, h5py.Dataset) (skip non-dataset members), leaving
_lazy_ops, _unicode_keys and _n_lazy behavior unchanged so select_by_mask and
other code that expects datasets won't receive groups.
🧹 Nitpick comments (1)
tests/test_arraydict.py (1)

216-235: Strengthen test to verify non-dataset members are excluded from keys().

The test validates that _n_lazy only counts datasets, but doesn't verify that the nested group "nested_metadata" is excluded from lazy.keys(). This would catch the issue where non-datasets are still inserted into __dict__.

✨ Suggested enhancement
     with h5py.File(test_filepath, "r") as f:
         assert len(f) == 4
         lazy = LazyArrayDict.from_hdf5(f)
         assert lazy.__dict__["_n_lazy"] == 3
+        # Verify non-dataset members are not in keys
+        assert "nested_metadata" not in lazy.keys()
+        assert len(lazy.keys()) == 3
         _ = lazy.unit_id
         _ = lazy.brain_region
         _ = lazy.waveform_mean
         assert lazy.__class__ == ArrayDict
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_arraydict.py` around lines 216 - 235, Add an assertion that the
non-dataset group "nested_metadata" is not treated as a key by the lazy loader:
after creating lazy via LazyArrayDict.from_hdf5(f) (variable name lazy), assert
that "nested_metadata" is not in lazy.keys() (or alternatively not in
lazy.__dict__) to ensure only datasets are represented as keys and non-dataset
groups aren’t inserted into the object's state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@temporaldata/arraydict.py`:
- Around line 490-498: The from_hdf5 loader is assigning all HDF5 members
(including h5py.Group) into obj.__dict__ but only counts datasets in _n_lazy,
causing select_by_mask to later attempt boolean indexing on groups; modify the
loop in from_hdf5 to only assign and count members when isinstance(value,
h5py.Dataset) (skip non-dataset members), leaving _lazy_ops, _unicode_keys and
_n_lazy behavior unchanged so select_by_mask and other code that expects
datasets won't receive groups.

---

Nitpick comments:
In `@tests/test_arraydict.py`:
- Around line 216-235: Add an assertion that the non-dataset group
"nested_metadata" is not treated as a key by the lazy loader: after creating
lazy via LazyArrayDict.from_hdf5(f) (variable name lazy), assert that
"nested_metadata" is not in lazy.keys() (or alternatively not in lazy.__dict__)
to ensure only datasets are represented as keys and non-dataset groups aren’t
inserted into the object's state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88e78040-f20f-4902-8a2c-c4e82bd7989b

📥 Commits

Reviewing files that changed from the base of the PR and between c00c900 and b0fdafc.

📒 Files selected for processing (4)
  • temporaldata/arraydict.py
  • tests/test_arraydict.py
  • tests/test_irregular_ts.py
  • tests/test_regular_ts.py

@milosobral
Contributor Author

After some back and forth, we (@vinamarora8 and I) have decided that, although this does speed up the code quite significantly, the added complexity and potential for dangerous bugs is not worth it for this specific optimization.

@milosobral milosobral closed this Mar 25, 2026

Labels

enhancement New feature or request


2 participants