-
Notifications
You must be signed in to change notification settings - Fork 11
Generalize the trace storage to support in-memory/zarr/csv #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #39 +/- ##
===========================================
- Coverage 83.77% 61.12% -22.65%
===========================================
Files 8 27 +19
Lines 1923 6429 +4506
===========================================
+ Hits 1611 3930 +2319
- Misses 312 2499 +2187 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fd15e8d
to
af35e94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR generalizes the trace storage system in nuts-rs to support multiple backends including in-memory, Zarr, and CSV formats. The major changes replace the previous Arrow-based storage system with a more flexible architecture using new traits for storage configuration and chain management.
Key changes:
- Replaces Arrow-based storage with modular storage backends
- Introduces new
StorageConfig
andTraceStorage
traits for flexible storage options - Updates the
Math
trait to useFlowParameters
instead ofTransformParams
and adds vector expansion capability - Modifies the
Model
trait interface and sampling workflow
Reviewed Changes
Copilot reviewed 42 out of 42 changed files in this pull request and generated 5 comments.
Show a summary per file
File | Description |
---|---|
tests/sample_normal.rs | Updates test to use new Zarr storage backend and updated API |
src/zarr_storage/sync_impl.rs | Implements synchronous Zarr storage backend with chunked array operations |
src/zarr_storage/mod.rs | Module organization for Zarr storage implementations |
src/zarr_storage/common.rs | Common utilities and types for Zarr storage backends |
src/zarr_storage/async_impl.rs | Implements asynchronous Zarr storage backend |
src/transformed_hamiltonian.rs | Updates to use new storable stats system instead of Arrow builders |
src/transform_adapt_strategy.rs | Converts from Arrow-based to storable stats system |
src/storage.rs | Defines core storage traits and interfaces |
src/stepsize_dual_avg.rs | Adds serialization support to dual averaging types |
src/stepsize_adapt.rs | Major refactoring to support multiple step size adaptation methods |
src/stepsize_adam.rs | New Adam optimizer implementation for step size adaptation |
src/sampler_stats.rs | Simplified stats system using storable traits |
src/sampler.rs | Major refactoring of sampler to use new storage and model interfaces |
src/nuts.rs | Adds minimum depth support and removes unused Arrow imports |
src/ndarray_storage.rs | Implements ndarray-based storage backend |
src/model.rs | New model trait definition |
src/math_base.rs | Updates Math trait with new methods and renamed types |
src/mass_matrix_adapt.rs | Converts to storable stats system |
src/mass_matrix.rs | Updates mass matrix types to use storable system |
src/low_rank_mass_matrix.rs | Converts complex Arrow stats to simple storable types |
src/lib.rs | Updates public API exports for new storage and math systems |
src/hashmap_storage.rs | Implements simple HashMap-based storage backend |
src/euclidean_hamiltonian.rs | Updates to use storable stats instead of Arrow |
src/csv_storage.rs | Implements CSV storage backend with CmdStan compatibility |
src/cpu_math.rs | Adds dimension and coordinate support, updates for new Math trait |
src/chain.rs | Major refactoring to use storable stats and new sampling interface |
Comments suppressed due to low confidence (2)
src/stepsize_adapt.rs:1
- The default method is set to Adam but the documentation comment says 'Use dual averaging for step size adaptation (default)'. This inconsistency should be resolved.
use itertools::Either;
tests/sample_normal.rs:1
- Commented code should be removed. The function signature appears to have been changed but the old version is still commented out.
use std::{
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/nuts.rs
Outdated
|
||
assert!(!progress.diverging); | ||
StatTraceBuilder::<_, NutsChain<_, ThreadRng, _>>::finalize(builder); | ||
// TODO check stats? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO comment indicates incomplete implementation. This should either be implemented or the comment should be more specific about what needs to be checked.
// TODO check stats? | |
// Check that progress.stats is not empty (if stats are collected) | |
// For example, if progress has a stats field: | |
// assert!(!progress.stats.is_empty()); |
Copilot uses AI. Check for mistakes.
4147467
to
62af0c4
Compare
62af0c4
to
8ea2dd6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 48 out of 48 changed files in this pull request and generated 6 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
if draw < self.final_window_size { | ||
if draw < 100 { | ||
if (draw > 0) & (draw % 10 == 0) { | ||
if (draw > 0) & draw.is_multiple_of(10) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bitwise AND operator &
should be replaced with the logical AND operator &&
for boolean conditions. While both work due to Rust's type system, &&
is the conventional operator for logical conditions and provides short-circuiting behavior.
if (draw > 0) & draw.is_multiple_of(10) { | |
if (draw > 0) && draw.is_multiple_of(10) { |
Copilot uses AI. Check for mistakes.
)?; | ||
} | ||
} else if (draw > 0) & (draw % self.options.transform_update_freq == 0) { | ||
} else if (draw > 0) & draw.is_multiple_of(self.options.transform_update_freq) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bitwise AND operator &
should be replaced with the logical AND operator &&
for boolean conditions. While both work due to Rust's type system, &&
is the conventional operator for logical conditions and provides short-circuiting behavior.
} else if (draw > 0) & draw.is_multiple_of(self.options.transform_update_freq) { | |
} else if (draw > 0) && draw.is_multiple_of(self.options.transform_update_freq) { |
Copilot uses AI. Check for mistakes.
) | ||
.dimension_names(Some(dims)) | ||
.build(store.clone(), &format!("{}/{}", group_path, name))?; | ||
//array.store_metadata()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented-out code should be removed unless there's a specific reason to keep it. If this is temporarily disabled for debugging, consider adding a comment explaining why.
//array.store_metadata()?; |
Copilot uses AI. Check for mistakes.
tokio::runtime::Handle::current().block_on(store_zarr_chunk_async( | ||
array, | ||
data, | ||
chain_chunk_index, | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is calling block_on
twice in a nested manner, which is an anti-pattern and could lead to deadlocks or performance issues. The inner block_on
call should be removed since we're already in an async context from the outer block_on
.
tokio::runtime::Handle::current().block_on(store_zarr_chunk_async( | |
array, | |
data, | |
chain_chunk_index, | |
)) | |
store_zarr_chunk_async( | |
array, | |
data, | |
chain_chunk_index, | |
).await |
Copilot uses AI. Check for mistakes.
.filter(|&(&val, _)| { | ||
(val > self.settings.eigval_cutoff) | (val < self.settings.eigval_cutoff.recip()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern |&(&val, _)|
can be simplified to |(val, _)|
by removing the reference destructuring, making the code more readable.
.filter(|&(&val, _)| { | |
(val > self.settings.eigval_cutoff) | (val < self.settings.eigval_cutoff.recip()) | |
.filter(|(val, _)| { | |
(*val > self.settings.eigval_cutoff) | (*val < self.settings.eigval_cutoff.recip()) |
Copilot uses AI. Check for mistakes.
let rc = unsafe { std::mem::ManuallyDrop::take(&mut self.inner) }; | ||
if (Rc::strong_count(&rc) == 1) & (Rc::weak_count(&rc) == 0) { | ||
if let Some(storage) = rc.reuser.upgrade() { | ||
if (Rc::strong_count(&rc) == 1) & (Rc::weak_count(&rc) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bitwise AND operator &
should be replaced with the logical AND operator &&
for boolean conditions. The existing &&
on line 108 shows the correct pattern to follow.
if (Rc::strong_count(&rc) == 1) & (Rc::weak_count(&rc) == 0) | |
if (Rc::strong_count(&rc) == 1) && (Rc::weak_count(&rc) == 0) |
Copilot uses AI. Check for mistakes.
No description provided.