Generalize the trace storage to support in-memory/zarr/csv #39

aseyboldt · 2025-09-10T13:30:28Z

No description provided.

codecov-commenter · 2025-09-10T14:08:58Z

Codecov Report

❌ Patch coverage is 51.66774% with 1507 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.12%. Comparing base (394c629) to head (8ea2dd6).
⚠️ Report is 78 commits behind head on main.

Files with missing lines	Patch %	Lines
src/storage/zarr/async_impl.rs	0.00%	456 Missing ⚠️
src/storage/ndarray.rs	0.00%	200 Missing ⚠️
src/storage/hashmap.rs	0.00%	189 Missing ⚠️
nuts-derive/src/lib.rs	62.73%	139 Missing ⚠️
nuts-storable/src/lib.rs	24.82%	109 Missing ⚠️
src/storage/csv.rs	84.00%	88 Missing ⚠️
src/storage/zarr/sync_impl.rs	78.75%	75 Missing ⚠️
src/sampler.rs	79.91%	46 Missing ⚠️
src/stepsize/adapt.rs	75.43%	42 Missing ⚠️
src/cpu_math.rs	53.16%	37 Missing ⚠️
... and 10 more

❗ There is a different number of reports uploaded between BASE (394c629) and HEAD (8ea2dd6).

HEAD has 1 upload less than BASE

Flag	BASE (394c629)	HEAD (8ea2dd6)
	2	1

Additional details and impacted files

@@             Coverage Diff             @@
##             main      #39       +/-   ##
===========================================
- Coverage   83.77%   61.12%   -22.65%     
===========================================
  Files           8       27       +19     
  Lines        1923     6429     +4506     
===========================================
+ Hits         1611     3930     +2319     
- Misses        312     2499     +2187


Copilot

Pull Request Overview

This PR generalizes the trace storage system in nuts-rs to support multiple backends including in-memory, Zarr, and CSV formats. The major changes replace the previous Arrow-based storage system with a more flexible architecture using new traits for storage configuration and chain management.

Key changes:

Replaces Arrow-based storage with modular storage backends
Introduces new StorageConfig and TraceStorage traits for flexible storage options
Updates the Math trait to use FlowParameters instead of TransformParams and adds vector expansion capability
Modifies the Model trait interface and sampling workflow
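
To illustrate the idea of swapping storage backends behind a trait, here is a minimal sketch. It is not the nuts-rs API: `DrawSink`, `MemorySink`, `CsvSink`, and `run_chain` are hypothetical names invented for this example, and the real `StorageConfig` and `TraceStorage` traits may look quite different.

```rust
use std::collections::HashMap;
use std::fmt::Write as _;

/// Hypothetical per-chain storage trait (an assumption, not the nuts-rs API).
trait DrawSink {
    type Output;
    /// Append one value for the named variable.
    fn record(&mut self, name: &str, value: f64);
    /// Consume the sink and return whatever the backend produced.
    fn finish(self) -> Self::Output;
}

/// In-memory backend: one growable column per variable name.
#[derive(Default)]
struct MemorySink {
    columns: HashMap<String, Vec<f64>>,
}

impl DrawSink for MemorySink {
    type Output = HashMap<String, Vec<f64>>;
    fn record(&mut self, name: &str, value: f64) {
        self.columns.entry(name.to_string()).or_default().push(value);
    }
    fn finish(self) -> Self::Output {
        self.columns
    }
}

/// CSV-like backend: writes one `name,value` row per recorded value.
#[derive(Default)]
struct CsvSink {
    text: String,
}

impl DrawSink for CsvSink {
    type Output = String;
    fn record(&mut self, name: &str, value: f64) {
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(self.text, "{name},{value}").unwrap();
    }
    fn finish(self) -> Self::Output {
        self.text
    }
}

/// The "sampler" is generic over the backend and never sees its details.
fn run_chain<S: DrawSink>(mut sink: S, draws: &[f64]) -> S::Output {
    for &draw in draws {
        sink.record("theta", draw);
    }
    sink.finish()
}

fn main() {
    let draws = [0.5, 1.5];
    let in_memory = run_chain(MemorySink::default(), &draws);
    let csv = run_chain(CsvSink::default(), &draws);
    println!("{:?}", in_memory["theta"]);
    print!("{csv}");
}
```

The associated `Output` type lets each backend return a different result (a map of columns, a string, or in a real system a file handle or Zarr group) while the sampling loop stays unchanged.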

Reviewed Changes

Copilot reviewed 42 out of 42 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
tests/sample_normal.rs	Updates test to use new Zarr storage backend and updated API
src/zarr_storage/sync_impl.rs	Implements synchronous Zarr storage backend with chunked array operations
src/zarr_storage/mod.rs	Module organization for Zarr storage implementations
src/zarr_storage/common.rs	Common utilities and types for Zarr storage backends
src/zarr_storage/async_impl.rs	Implements asynchronous Zarr storage backend
src/transformed_hamiltonian.rs	Updates to use new storable stats system instead of Arrow builders
src/transform_adapt_strategy.rs	Converts from Arrow-based to storable stats system
src/storage.rs	Defines core storage traits and interfaces
src/stepsize_dual_avg.rs	Adds serialization support to dual averaging types
src/stepsize_adapt.rs	Major refactoring to support multiple step size adaptation methods
src/stepsize_adam.rs	New Adam optimizer implementation for step size adaptation
src/sampler_stats.rs	Simplified stats system using storable traits
src/sampler.rs	Major refactoring of sampler to use new storage and model interfaces
src/nuts.rs	Adds minimum depth support and removes unused Arrow imports
src/ndarray_storage.rs	Implements ndarray-based storage backend
src/model.rs	New model trait definition
src/math_base.rs	Updates Math trait with new methods and renamed types
src/mass_matrix_adapt.rs	Converts to storable stats system
src/mass_matrix.rs	Updates mass matrix types to use storable system
src/low_rank_mass_matrix.rs	Converts complex Arrow stats to simple storable types
src/lib.rs	Updates public API exports for new storage and math systems
src/hashmap_storage.rs	Implements simple HashMap-based storage backend
src/euclidean_hamiltonian.rs	Updates to use storable stats instead of Arrow
src/csv_storage.rs	Implements CSV storage backend with CmdStan compatibility
src/cpu_math.rs	Adds dimension and coordinate support, updates for new Math trait
src/chain.rs	Major refactoring to use storable stats and new sampling interface

Comments suppressed due to low confidence (2)

src/stepsize_adapt.rs:1

The default method is set to Adam but the documentation comment says 'Use dual averaging for step size adaptation (default)'. This inconsistency should be resolved.

use itertools::Either;

tests/sample_normal.rs:1

Commented code should be removed. The function signature appears to have been changed but the old version is still commented out.

use std::{


src/zarr_storage/sync_impl.rs

src/zarr_storage/async_impl.rs

src/sampler.rs

Copilot · 2025-09-12T08:46:32Z

src/nuts.rs


        assert!(!progress.diverging);
-        StatTraceBuilder::<_, NutsChain<_, ThreadRng, _>>::finalize(builder);
+        // TODO check stats?


TODO comment indicates incomplete implementation. This should either be implemented or the comment should be more specific about what needs to be checked.

Suggested change

-        // TODO check stats?
+        // Check that progress.stats is not empty (if stats are collected)
+        // For example, if progress has a stats field:
+        // assert!(!progress.stats.is_empty());

tests/sample_normal.rs

Copilot

Pull Request Overview

Copilot reviewed 48 out of 48 changed files in this pull request and generated 6 comments.


Copilot · 2025-09-12T10:47:46Z

src/transform_adapt_strategy.rs

        if draw < self.final_window_size {
            if draw < 100 {
-                if (draw > 0) & (draw % 10 == 0) {
+                if (draw > 0) & draw.is_multiple_of(10) {


The bitwise AND operator & should be replaced with the logical AND operator && for boolean conditions. While both work due to Rust's type system, && is the conventional operator for logical conditions and provides short-circuiting behavior.

Suggested change

-                if (draw > 0) & draw.is_multiple_of(10) {
+                if (draw > 0) && draw.is_multiple_of(10) {
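
The practical difference between `&` and `&&` on `bool` operands is whether the right-hand side is evaluated when the left-hand side is already `false`. The following standalone sketch makes that visible with a call counter; `touch` and `rhs_evaluations` are hypothetical helpers written for this illustration, and the modulo check is replaced by a side-effecting function so the evaluation can be observed.

```rust
use std::cell::Cell;

/// Returns `true` and records that it was called.
fn touch(calls: &Cell<u32>) -> bool {
    calls.set(calls.get() + 1);
    true
}

/// Counts how often the right-hand operand runs for `&` and for `&&`
/// when the left-hand operand is `false`.
fn rhs_evaluations() -> (u32, u32) {
    let eager = Cell::new(0);
    let lazy = Cell::new(0);
    let draw: u64 = 0;

    // `&` on bools: both operands are always evaluated.
    let _ = (draw > 0) & touch(&eager);
    // `&&`: the right operand is skipped once the left one is false.
    let _ = (draw > 0) && touch(&lazy);

    (eager.get(), lazy.get())
}

fn main() {
    let (eager, lazy) = rhs_evaluations();
    println!("`&` evaluated rhs {eager} time(s), `&&` evaluated rhs {lazy} time(s)");
}
```

For a cheap, side-effect-free right-hand side such as a modulo check, both operators give the same result, so the change is mainly a matter of convention.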

Copilot · 2025-09-12T10:47:46Z

src/transform_adapt_strategy.rs

                    )?;
                }
-            } else if (draw > 0) & (draw % self.options.transform_update_freq == 0) {
+            } else if (draw > 0) & draw.is_multiple_of(self.options.transform_update_freq) {


The bitwise AND operator & should be replaced with the logical AND operator && for boolean conditions. While both work due to Rust's type system, && is the conventional operator for logical conditions and provides short-circuiting behavior.

Suggested change

-            } else if (draw > 0) & draw.is_multiple_of(self.options.transform_update_freq) {
+            } else if (draw > 0) && draw.is_multiple_of(self.options.transform_update_freq) {

Copilot · 2025-09-12T10:47:47Z

src/storage/zarr/common.rs

+        )
+        .dimension_names(Some(dims))
+        .build(store.clone(), &format!("{}/{}", group_path, name))?;
+        //array.store_metadata()?;


Commented-out code should be removed unless there's a specific reason to keep it. If this is temporarily disabled for debugging, consider adding a comment explaining why.

Suggested change

-        //array.store_metadata()?;

Copilot · 2025-09-12T10:47:47Z

src/storage/zarr/async_impl.rs

+        tokio::runtime::Handle::current().block_on(store_zarr_chunk_async(
+            array,
+            data,
+            chain_chunk_index,
+        ))


This code is calling block_on twice in a nested manner, which is an anti-pattern and could lead to deadlocks or performance issues. The inner block_on call should be removed since we're already in an async context from the outer block_on.

Suggested change

-        tokio::runtime::Handle::current().block_on(store_zarr_chunk_async(
-            array,
-            data,
-            chain_chunk_index,
-        ))
+        store_zarr_chunk_async(
+            array,
+            data,
+            chain_chunk_index,
+        ).await

Copilot · 2025-09-12T10:47:47Z

src/mass_matrix/low_rank.rs

+            .filter(|&(&val, _)| {
                (val > self.settings.eigval_cutoff) | (val < self.settings.eigval_cutoff.recip())


The pattern |&(&val, _)| can be simplified to |(val, _)| by removing the reference destructuring, making the code more readable.

Suggested change

-            .filter(|&(&val, _)| {
-                (val > self.settings.eigval_cutoff) | (val < self.settings.eigval_cutoff.recip())
+            .filter(|(val, _)| {
+                (*val > self.settings.eigval_cutoff) | (*val < self.settings.eigval_cutoff.recip())
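
A note on the two closure patterns, as a standalone sketch. It assumes the iterator yields `(&f64, _)` pairs, which the original `&(&val, _)` pattern implies but the diff does not show. Under that assumption, the shorter pattern `|(val, _)|` binds `val` as `&&f64`, so comparing against an `f64` needs two dereferences rather than one. The function names and the `usize` companion slice are hypothetical.

```rust
/// Explicit pattern: `&(&val, _)` peels off both references, so `val: f64`.
fn select_explicit(vals: &[f64], tags: &[usize], cutoff: f64) -> Vec<f64> {
    vals.iter()
        .zip(tags.iter())
        .filter(|&(&val, _)| (val > cutoff) | (val < cutoff.recip()))
        .map(|(&val, _)| val)
        .collect()
}

/// Shorter pattern: `(val, _)` binds `val: &&f64`, hence the `**val`.
fn select_short(vals: &[f64], tags: &[usize], cutoff: f64) -> Vec<f64> {
    vals.iter()
        .zip(tags.iter())
        .filter(|(val, _)| (**val > cutoff) | (**val < cutoff.recip()))
        .map(|(&val, _)| val)
        .collect()
}

fn main() {
    let vals = [0.1, 1.0, 3.0, 0.6];
    let tags = [0, 1, 2, 3];
    // With cutoff 2.0 the kept values lie outside the band [0.5, 2.0].
    println!("{:?}", select_explicit(&vals, &tags, 2.0));
    println!("{:?}", select_short(&vals, &tags, 2.0));
}
```

Both functions select the same elements; which pattern reads better is a style choice.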

Copilot · 2025-09-12T10:47:48Z

src/state.rs

        let rc = unsafe { std::mem::ManuallyDrop::take(&mut self.inner) };
-        if (Rc::strong_count(&rc) == 1) & (Rc::weak_count(&rc) == 0) {
-            if let Some(storage) = rc.reuser.upgrade() {
+        if (Rc::strong_count(&rc) == 1) & (Rc::weak_count(&rc) == 0)


The bitwise AND operator & should be replaced with the logical AND operator && for boolean conditions. The existing && on line 108 shows the correct pattern to follow.

Suggested change

-        if (Rc::strong_count(&rc) == 1) & (Rc::weak_count(&rc) == 0)
+        if (Rc::strong_count(&rc) == 1) && (Rc::weak_count(&rc) == 0)
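
For readers unfamiliar with the check itself: the condition tests whether the current handle is the only remaining reference to the allocation, strong or weak. A minimal sketch using only the standard library follows; `is_sole_owner` is a hypothetical helper, not a function from the PR.

```rust
use std::rc::{Rc, Weak};

/// True when `rc` is the only strong handle and no weak handles exist.
fn is_sole_owner<T>(rc: &Rc<T>) -> bool {
    Rc::strong_count(rc) == 1 && Rc::weak_count(rc) == 0
}

fn main() {
    let state = Rc::new(vec![1.0_f64, 2.0]);
    assert!(is_sole_owner(&state));

    // A second strong handle means the buffer is still shared.
    let shared = Rc::clone(&state);
    assert!(!is_sole_owner(&state));
    drop(shared);

    // An outstanding weak handle also blocks exclusive reuse.
    let observer: Weak<Vec<f64>> = Rc::downgrade(&state);
    assert!(!is_sole_owner(&state));
    drop(observer);

    assert!(is_sole_owner(&state));
    println!("sole owner again: {}", is_sole_owner(&state));
}
```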

aseyboldt added 6 commits August 29, 2025 14:12

feat: allow sampling with fixed step size

4310bd0

feat: add step size jitter

9a4ccb8

feat: add mindepth option for nuts

b5dbaef

fixup: step size jitter

d236556

feat: enable step size jitter by default

e5cf1f9

feat: implement step size adaptation with adam

7ada9be

aseyboldt force-pushed the storage-backend branch 3 times, most recently from fd15e8d to af35e94 Compare September 10, 2025 15:11

aseyboldt requested a review from Copilot September 12, 2025 08:44

Copilot AI reviewed Sep 12, 2025

View reviewed changes

aseyboldt force-pushed the storage-backend branch 2 times, most recently from 4147467 to 62af0c4 Compare September 12, 2025 10:11

aseyboldt added 9 commits September 12, 2025 12:39

feat: generalize sample and stats storage

40a04a3

feature: add csv file storage backend

f3cf5bf

feat: implement async zarr storage

8807769

ci: specify features in CI

4e3bac0

feat: add rng to Model.math()

83c2711

style: some formatting changes

c499645

style: restructure packages

6926c1b

fix: restore dual-average step size adapt as default

0af19b6

style: some clippy fixes

8ea2dd6

aseyboldt force-pushed the storage-backend branch from 62af0c4 to 8ea2dd6 Compare September 12, 2025 10:44

aseyboldt requested a review from Copilot September 12, 2025 10:46

Copilot AI reviewed Sep 12, 2025

View reviewed changes

aseyboldt merged commit 3aac0a5 into pymc-devs:main Sep 12, 2025
4 checks passed
