Replaced lazy all_loaded scans with an _n_lazy counter #103
milosobral wants to merge 12 commits into main from milo/lazy-attributes
Conversation
…oved benchmark to really show that change
📝 Walkthrough
A per-instance lazy-count
Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant LazyObj as Lazy Object
    participant AttrHandler as __getattribute__
    participant HDF5 as HDF5 Dataset
    participant Converter as Promotion Logic
    participant EagerObj as Eager Object
    Client->>LazyObj: access attribute (e.g., "start" or "target_pos_i")
    LazyObj->>AttrHandler: intercept access
    AttrHandler->>HDF5: detect h5py.Dataset (lazy)
    HDF5-->>AttrHandler: dataset reference
    AttrHandler->>AttrHandler: materialize -> NumPy array
    AttrHandler->>LazyObj: store array in __dict__
    AttrHandler->>LazyObj: decrement _n_lazy
    LazyObj->>LazyObj: check _n_lazy
    alt _n_lazy == 0
        LazyObj->>Converter: trigger promotion
        Converter->>EagerObj: convert instance, remove lazy state
        EagerObj-->>Client: return eager attribute/object
    else _n_lazy > 0
        LazyObj-->>Client: return materialized attribute (object remains lazy)
    end
```
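The flow in the diagram above can be sketched in plain Python. This is a minimal illustration, not the library's actual implementation: `LazySlot` is a hypothetical stand-in for `h5py.Dataset` so the example runs without HDF5, and `LazyDict`/`EagerDict` stand in for the PR's `LazyArrayDict`/`ArrayDict`.

```python
# Minimal sketch of counter-based lazy promotion. "LazySlot" is a
# hypothetical stand-in for h5py.Dataset so this runs without HDF5.
import numpy as np


class LazySlot:
    """Placeholder for an on-disk dataset that materializes on demand."""

    def __init__(self, data):
        self._data = data

    def load(self):
        return np.asarray(self._data)


class EagerDict:
    """Target class once every lazy field has been materialized."""


class LazyDict(EagerDict):
    def __init__(self, **fields):
        self.__dict__.update(fields)
        # Count how many public fields are still lazy.
        self.__dict__["_n_lazy"] = sum(
            isinstance(v, LazySlot) for v in fields.values()
        )

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if isinstance(value, LazySlot):
            value = value.load()              # materialize -> NumPy array
            self.__dict__[name] = value       # store; field is now eager
            self.__dict__["_n_lazy"] -= 1
            if self.__dict__["_n_lazy"] == 0:
                del self.__dict__["_n_lazy"]
                self.__class__ = EagerDict    # promote: no lazy state left
        return value


d = LazyDict(start=LazySlot([0.0, 1.0]), end=LazySlot([0.5, 1.5]))
_ = d.start
print(type(d).__name__)  # still LazyDict: one field remains lazy
_ = d.end
print(type(d).__name__)  # EagerDict: promotion happened
```

Access costs a constant-time counter decrement instead of scanning every field for remaining datasets, which is the point of the `_n_lazy` change.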
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Codecov Report: ❌ Patch coverage is
Actionable comments posted: 4
🧹 Nitpick comments (1)
benchmarks/benchmark.py (1)
52-53: Remove or implement the unused `span` parameter.
`span` on line 52 is never used, so the helper API is misleading. If it's not needed, remove it.
Proposed simplification
```diff
-def _make_disjoint_intervals(
-    n, span=10_000, min_gap=1.0, min_dur=0.5, max_dur=2.0, seed=42
-):
+def _make_disjoint_intervals(
+    n, min_gap=1.0, min_dur=0.5, max_dur=2.0, seed=42
+):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/benchmark.py` around lines 52 - 53, The function signature in benchmarks/benchmark.py declares a parameter span (n, span=10_000, min_gap=1.0, min_dur=0.5, max_dur=2.0, seed=42) that is never used; either remove span from the signature and any callers, or implement its intended behavior where the function (and helper functions like any generator or sampler used inside) constrains/uses the overall span value. Update the signature and all call sites to match, and run tests to ensure no references to span remain. Ensure you adjust the docstring and parameter list (and any default) for the function to reflect the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@benchmarks/benchmark.py`:
- Around line 300-335: The temp file created in bench_lazy_interval_access
(tmpfile/path) is only removed at the end and can leak on exceptions; wrap the
setup and benchmark execution in a try/finally (or use the same pattern as
bench_data_slice_lazy) so os.unlink(path) is always called in the finally block,
ensuring the temporary file is removed on both success and failure; locate
function bench_lazy_interval_access and move the cleanup into a finally that
always executes after the h5py usage and _bench call.
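The cleanup pattern this comment asks for can be shown with a self-contained sketch. The names and file contents here are illustrative, not taken from the PR; plain `open()` stands in for `h5py.File`.

```python
# Sketch of temp-file cleanup in a finally block: the file is unlinked
# even when the benchmark body raises. Names are illustrative.
import os
import tempfile


def bench_with_tempfile():
    fd, path = tempfile.mkstemp(suffix=".h5")
    os.close(fd)
    try:
        with open(path, "w") as f:   # stands in for h5py.File(path, "w")
            f.write("payload")
        with open(path) as f:        # stands in for the benchmark body
            return f.read(), path
    finally:
        if os.path.exists(path):
            os.unlink(path)          # always runs, success or failure


data, tmp_path = bench_with_tempfile()
print(data, os.path.exists(tmp_path))
```

Even the early `return` inside the `try` passes through the `finally`, so no code path leaks the file.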
In `@benchmarks/compare.py`:
- Around line 33-38: Replace raw subprocess.run invocations in
benchmarks/compare.py with hardened calls: resolve the git executable via
shutil.which("git") and use that absolute path instead of relying on PATH, add a
reasonable timeout value to prevent hanging, and enable check=True so failures
raise exceptions you can handle. Update each subprocess.run call (the ones
creating result and similar calls around the previous invocations) to use the
resolved git path, timeout, and check parameters and handle
subprocess.CalledProcessError where appropriate.
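The hardening steps listed above can be collected into one small wrapper. This is a sketch of the suggested pattern, not code from `benchmarks/compare.py`; the 30-second timeout is an illustrative choice.

```python
# Hardened git invocation: absolute executable path via shutil.which,
# an explicit timeout, and check=True so failures raise.
import shutil
import subprocess


def run_git(*args, timeout=30):
    """Run git with hardening; return stdout, or None on failure/absence."""
    git = shutil.which("git")  # resolve once instead of trusting PATH at call time
    if git is None:
        return None
    try:
        result = subprocess.run(
            [git, *args],
            capture_output=True,
            text=True,
            timeout=timeout,   # prevents hanging forever
            check=True,        # nonzero exit raises CalledProcessError
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip()


print(run_git("--version"))
```

Returning `None` here is one way to surface failure; re-raising with context is equally valid depending on how `compare.py` wants to report errors.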
In `@tests/test_arraydict.py`:
- Around line 196-200: The test uses bare attribute access expressions on the
Data object (data.unit_id, data.brain_region, data.waveform_mean) to drive lazy
loading and triggers Ruff B018; update these to explicitly assign the accessed
values to a throwaway variable (e.g., use "_" ) so the side-effect intent is
clear and the linter is satisfied while keeping the existing asserts that check
data.__dict__["_n_lazy"].
In `@tests/test_interval.py`:
- Around line 269-275: The test uses bare attribute accesses (data.start,
data.end, data.go_cue_time, data.drifting_gratings_dir and the similar access at
line 307) solely to trigger lazy materialization, which Ruff flags as B018;
change each bare access to an explicit discard assignment (e.g., _ = data.start)
so the side effect is preserved while satisfying the linter, updating
occurrences for the attributes referenced above.
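The discard-assignment fix both test comments describe can be illustrated with a hypothetical property that counts accesses, standing in for lazy materialization:

```python
# Ruff B018 flags a bare attribute expression as useless; assigning to
# "_" keeps the side effect and makes the intent explicit.
class Probe:
    def __init__(self):
        self.loads = 0

    @property
    def start(self):
        self.loads += 1          # stands in for lazy materialization
        return [0.0, 1.0]


data = Probe()
_ = data.start                   # side effect still fires; linter is satisfied
print(data.loads)
```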
---
Nitpick comments:
In `@benchmarks/benchmark.py`:
- Around line 52-53: The function signature in benchmarks/benchmark.py declares
a parameter span (n, span=10_000, min_gap=1.0, min_dur=0.5, max_dur=2.0,
seed=42) that is never used; either remove span from the signature and any
callers, or implement its intended behavior where the function (and helper
functions like any generator or sampler used inside) constrains/uses the overall
span value. Update the signature and all call sites to match, and run tests to
ensure no references to span remain. Ensure you adjust the docstring and
parameter list (and any default) for the function to reflect the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8c5b0d6c-011f-46c8-b6b4-86ce7cd0a2fc
📒 Files selected for processing (9)
- CHANGELOG.md
- benchmarks/benchmark.py
- benchmarks/compare.py
- temporaldata/arraydict.py
- temporaldata/interval.py
- temporaldata/irregular_ts.py
- temporaldata/regular_ts.py
- tests/test_arraydict.py
- tests/test_interval.py
Benchmark comparison vs main
Baseline:
vinamarora8
left a comment
@milosobral Leaving some comments!
Also, curious as to why no tests are added for IrregularTimeSeries?
Can you also merge main into this pls.
```python
# store it, now the array is loaded
self.__dict__[name] = out
self.__dict__["_n_lazy"] -= 1
```
Why not just `self._n_lazy -= 1`?
```python
obj = cls.__new__(cls)
for key, value in file.items():
    obj.__dict__[key] = value

obj._unicode_keys = file.attrs["_unicode_keys"].astype(str).tolist()
obj._lazy_ops = {}
obj._n_lazy = len(file)
```
Is this going to be safe always? Maybe we should do this?

```python
for key, value in file.items():
    obj.__dict__[key] = value
    if isinstance(value, h5py.Dataset):
        obj._n_lazy += 1
```

…n_lazy. - Added a new test to verify that _n_lazy reflects the number of datasets in LazyArrayDict.
…nto milo/lazy-attributes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temporaldata/arraydict.py`:
- Around line 367-386: The object can become permanently “lazy” because _n_lazy
is not updated when callers set or delete public fields; override
LazyArrayDict.__setattr__ and __delattr__ to keep _n_lazy in sync with actual
dataset-valued entries: when setting an attribute (in __setattr__) check the
previous value from self.__dict__ and the incoming value and increment/decrement
_n_lazy if a h5py.Dataset (or whatever lazy marker you use) is being added or
removed (or replaced by a non-dataset), then delegate to ArrayDict.__setattr__
or assign into __dict__; when deleting an attribute (in __delattr__) decrement
_n_lazy if the removed value was a dataset, then perform deletion and run the
same promotion logic used in __getattribute__ (flip to ArrayDict and del
_lazy_ops/_unicode_keys/_n_lazy when _n_lazy reaches 0). Ensure you reference
__getattribute__, __setattr__, __delattr__, _n_lazy, LazyArrayDict, ArrayDict
and to_hdf5 when implementing these changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 131a3d36-e278-4a0c-9683-e54f4f2c8f6c
📒 Files selected for processing (5)
- CHANGELOG.md
- benchmarks/benchmark.py
- temporaldata/arraydict.py
- tests/test_arraydict.py
- tests/test_interval.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test_arraydict.py
- tests/test_interval.py
Actionable comments posted: 1
♻️ Duplicate comments (2)
benchmarks/benchmark.py (1)
339-356: ⚠️ Potential issue | 🟡 Minor
Move temp-file cleanup into a `finally`.
The trailing `os.unlink(path)` still won't run if setup or the benchmark body raises, so failed benchmark runs can leak temp files.
Possible fix
```diff
-    with h5py.File(path, "w") as f:
-        iv.to_hdf5(f)
-
-    results = None
-    with h5py.File(path, "r") as f:
-
-        def go():
-            lazy = LazyInterval.from_hdf5(f)
-            _ = lazy.start
-            _ = lazy.end
-
-            for i in range(num_attrs):
-                _ = getattr(lazy, f"target_pos_{i}")
-
-        results = _bench(f"LazyInterval access ({num_attrs} attrs)", go, number=25)
-
-    os.unlink(path)
-    return results
+    try:
+        with h5py.File(path, "w") as f:
+            iv.to_hdf5(f)
+
+        with h5py.File(path, "r") as f:
+
+            def go():
+                lazy = LazyInterval.from_hdf5(f)
+                _ = lazy.start
+                _ = lazy.end
+
+                for i in range(num_attrs):
+                    _ = getattr(lazy, f"target_pos_{i}")
+
+            return _bench(f"LazyInterval access ({num_attrs} attrs)", go, number=25)
+    finally:
+        if os.path.exists(path):
+            os.unlink(path)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/benchmark.py` around lines 339 - 356, The temp file at variable path can leak if an exception occurs before the final os.unlink(path); wrap the file-creation and benchmark logic (the with h5py.File(path, "w") as f: ... the with h5py.File(path, "r") as f: ... and the call to _bench inside go which uses LazyInterval.from_hdf5) in a try/finally and move os.unlink(path) into the finally block so the file is always removed even on errors; ensure any early returns still let the finally execute and reference path, LazyInterval.from_hdf5, and _bench unchanged.
temporaldata/arraydict.py (1)
342-386: ⚠️ Potential issue | 🟠 Major
Counter-based promotion still needs mutation hooks.
Now that promotion depends entirely on `_n_lazy`, replacing or deleting a still-lazy public field can leave the instance stuck as `LazyArrayDict` even after every dataset is gone. Please keep the counter in sync in `__setattr__`/`__delattr__` and mirror the same fix in the other lazy containers added in this PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@temporaldata/arraydict.py` around lines 342 - 386, The lazy-promotion counter _n_lazy is not updated when public fields are set or deleted, so implement hooks in __setattr__ and __delattr__ to keep it in sync: in __setattr__(self, name, value) if name is a public key (not starting with "_") adjust _n_lazy up when assigning an h5py.Dataset (or other lazy sentinel) and adjust down when replacing an existing h5py.Dataset with a non-lazy value; in __delattr__(self, name) if deleting a public key decrement _n_lazy if the deleted value is an h5py.Dataset; after adjustments, check the same promotion condition (if self._n_lazy == 0 then promote by setting __class__ = ArrayDict and deleting _lazy_ops/_unicode_keys/_n_lazy) to mirror the behavior in __getattribute__; apply the same pattern to the other lazy container classes introduced in this PR so all lazy containers update their _n_lazy consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temporaldata/arraydict.py`:
- Around line 455-458: The loop that assigns HDF5 group members into
obj.__dict__ is inserting non-dataset members (e.g., groups) which breaks
LazyArrayDict assumptions and _n_lazy counting; change the logic in the loop
inside class LazyArrayDict (the for key, value in file.items() block) to skip
any value that is not an instance of h5py.Dataset before assigning to
obj.__dict__ and before incrementing _n_lazy (i.e., only set obj.__dict__[key]
and increment _n_lazy when isinstance(value, h5py.Dataset)); alternatively, if
you prefer strictness, raise a TypeError for non-dataset members instead of
silently skipping—ensure keys(), __len__ and masking only see dataset entries.
---
Duplicate comments:
In `@benchmarks/benchmark.py`:
- Around line 339-356: The temp file at variable path can leak if an exception
occurs before the final os.unlink(path); wrap the file-creation and benchmark
logic (the with h5py.File(path, "w") as f: ... the with h5py.File(path, "r") as
f: ... and the call to _bench inside go which uses LazyInterval.from_hdf5) in a
try/finally and move os.unlink(path) into the finally block so the file is
always removed even on errors; ensure any early returns still let the finally
execute and reference path, LazyInterval.from_hdf5, and _bench unchanged.
In `@temporaldata/arraydict.py`:
- Around line 342-386: The lazy-promotion counter _n_lazy is not updated when
public fields are set or deleted, so implement hooks in __setattr__ and
__delattr__ to keep it in sync: in __setattr__(self, name, value) if name is a
public key (not starting with "_") adjust _n_lazy up when assigning an
h5py.Dataset (or other lazy sentinel) and adjust down when replacing an existing
h5py.Dataset with a non-lazy value; in __delattr__(self, name) if deleting a
public key decrement _n_lazy if the deleted value is an h5py.Dataset; after
adjustments, check the same promotion condition (if self._n_lazy == 0 then
promote by setting __class__ = ArrayDict and deleting
_lazy_ops/_unicode_keys/_n_lazy) to mirror the behavior in __getattribute__;
apply the same pattern to the other lazy container classes introduced in this PR
so all lazy containers update their _n_lazy consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 82eb43df-8459-4e3a-bab2-cb843c52a1ab
📒 Files selected for processing (8)
- CHANGELOG.md
- benchmarks/benchmark.py
- temporaldata/arraydict.py
- temporaldata/interval.py
- temporaldata/irregular_ts.py
- temporaldata/regular_ts.py
- tests/test_arraydict.py
- tests/test_interval.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_interval.py
🚧 Files skipped from review as they are similar to previous changes (3)
- CHANGELOG.md
- tests/test_arraydict.py
- temporaldata/regular_ts.py
```python
for key, value in file.items():
    obj.__dict__[key] = value
    if isinstance(value, h5py.Dataset):
        n_lazy += 1
```
Skip unexpected non-dataset HDF5 members here.
This now excludes non-datasets from _n_lazy, but it still inserts them into __dict__. A root group then shows up in keys() and breaks __len__ / masking because LazyArrayDict assumes every public entry is array-like. Either ignore or reject non-h5py.Dataset members before assignment.
Possible fix
```diff
 obj = cls.__new__(cls)
 n_lazy = 0
 for key, value in file.items():
-    obj.__dict__[key] = value
-    if isinstance(value, h5py.Dataset):
-        n_lazy += 1
+    if not isinstance(value, h5py.Dataset):
+        continue
+    obj.__dict__[key] = value
+    n_lazy += 1
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
for key, value in file.items():
    if not isinstance(value, h5py.Dataset):
        continue
    obj.__dict__[key] = value
    n_lazy += 1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@temporaldata/arraydict.py` around lines 455 - 458, The loop that assigns HDF5
group members into obj.__dict__ is inserting non-dataset members (e.g., groups)
which breaks LazyArrayDict assumptions and _n_lazy counting; change the logic in
the loop inside class LazyArrayDict (the for key, value in file.items() block)
to skip any value that is not an instance of h5py.Dataset before assigning to
obj.__dict__ and before incrementing _n_lazy (i.e., only set obj.__dict__[key]
and increment _n_lazy when isinstance(value, h5py.Dataset)); alternatively, if
you prefer strictness, raise a TypeError for non-dataset members instead of
silently skipping—ensure keys(), __len__ and masking only see dataset entries.
…assignment and deletion. Added tests to ensure correct promotion to ArrayDict after modifying datasets.
♻️ Duplicate comments (1)
temporaldata/arraydict.py (1)
490-498: ⚠️ Potential issue | 🟡 Minor
Skip non-dataset HDF5 members during `from_hdf5` loading.
The loop stores all HDF5 items directly into `__dict__`, including groups, but only counts datasets for `_n_lazy`. When `select_by_mask()` iterates `keys()` and applies masking with `value[mask]`, h5py.Group objects would fail since they don't support boolean indexing. HDF5 groups are created in production code (e.g., data.py, regular_ts.py, irregular_ts.py), so this issue would manifest in real usage.
🔧 Suggested fix: skip non-datasets before assignment

```diff
 obj = cls.__new__(cls)
 n_lazy = 0
 for key, value in file.items():
+    if not isinstance(value, h5py.Dataset):
+        continue
     obj.__dict__[key] = value
-    if isinstance(value, h5py.Dataset):
-        n_lazy += 1
+    n_lazy += 1
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@temporaldata/arraydict.py` around lines 490 - 498, The from_hdf5 loader is assigning all HDF5 members (including h5py.Group) into obj.__dict__ but only counts datasets in _n_lazy, causing select_by_mask to later attempt boolean indexing on groups; modify the loop in from_hdf5 to only assign and count members when isinstance(value, h5py.Dataset) (skip non-dataset members), leaving _lazy_ops, _unicode_keys and _n_lazy behavior unchanged so select_by_mask and other code that expects datasets won't receive groups.
🧹 Nitpick comments (1)
tests/test_arraydict.py (1)
216-235: Strengthen test to verify non-dataset members are excluded from `keys()`.
The test validates that `_n_lazy` only counts datasets, but doesn't verify that the nested group `"nested_metadata"` is excluded from `lazy.keys()`. This would catch the issue where non-datasets are still inserted into `__dict__`.
✨ Suggested enhancement

```diff
 with h5py.File(test_filepath, "r") as f:
     assert len(f) == 4
     lazy = LazyArrayDict.from_hdf5(f)
     assert lazy.__dict__["_n_lazy"] == 3
+    # Verify non-dataset members are not in keys
+    assert "nested_metadata" not in lazy.keys()
+    assert len(lazy.keys()) == 3
     _ = lazy.unit_id
     _ = lazy.brain_region
     _ = lazy.waveform_mean
     assert lazy.__class__ == ArrayDict
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_arraydict.py` around lines 216 - 235, Add an assertion that the non-dataset group "nested_metadata" is not treated as a key by the lazy loader: after creating lazy via LazyArrayDict.from_hdf5(f) (variable name lazy), assert that "nested_metadata" is not in lazy.keys() (or alternatively not in lazy.__dict__) to ensure only datasets are represented as keys and non-dataset groups aren’t inserted into the object's state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@temporaldata/arraydict.py`:
- Around line 490-498: The from_hdf5 loader is assigning all HDF5 members
(including h5py.Group) into obj.__dict__ but only counts datasets in _n_lazy,
causing select_by_mask to later attempt boolean indexing on groups; modify the
loop in from_hdf5 to only assign and count members when isinstance(value,
h5py.Dataset) (skip non-dataset members), leaving _lazy_ops, _unicode_keys and
_n_lazy behavior unchanged so select_by_mask and other code that expects
datasets won't receive groups.
---
Nitpick comments:
In `@tests/test_arraydict.py`:
- Around line 216-235: Add an assertion that the non-dataset group
"nested_metadata" is not treated as a key by the lazy loader: after creating
lazy via LazyArrayDict.from_hdf5(f) (variable name lazy), assert that
"nested_metadata" is not in lazy.keys() (or alternatively not in lazy.__dict__)
to ensure only datasets are represented as keys and non-dataset groups aren’t
inserted into the object's state.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 88e78040-f20f-4902-8a2c-c4e82bd7989b
📒 Files selected for processing (4)
- temporaldata/arraydict.py
- tests/test_arraydict.py
- tests/test_irregular_ts.py
- tests/test_regular_ts.py
After some back and forth, we (@vinamarora8 and I) have decided that, although this does speed up the code quite significantly, the added complexity and potential for dangerous bugs are not worth it for this specific optimization.
This optimization makes quite a small difference for small objects but does make a bigger difference for larger ones with more attributes.
Benchmark results:
Summary by CodeRabbit
New Features
Benchmarks
Tests
Changelog