
@Sicheng-Pan (Contributor) commented Feb 10, 2026

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • Updated several QuantizedSpannIndexWriter structs to support the new segment writer
    • Moved the scrub and centroid-rebuild logic out of commit() into a separate finish() step
    • Introduced the new QuantizedSpannSegmentWriter behind the feature flag
    • Updated VectorSegmentWriter to use the new writer implementation
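Splitting finish() from commit() means callers must run the two steps in order. One way to make that ordering a compile-time guarantee instead of a convention is a type-state pattern: finish() consumes the writer and returns a distinct type, and only that type has commit(). This is a minimal sketch under stated assumptions: Writer, FinishedWriter and CommitSummary are hypothetical names, not the actual QuantizedSpannIndexWriter API, and the "scrub" step is a placeholder that simply drops empty embeddings.

```rust
// Hypothetical type-state sketch: `commit()` only exists after `finish()`.
// These types are illustrative and are NOT the real QuantizedSpannIndexWriter API.

#[derive(Debug, Default)]
struct Writer {
    // Pending (id, embedding) pairs applied from the log.
    pending: Vec<(u32, Vec<f32>)>,
}

/// A writer whose scrub / centroid-rebuild phase has completed.
#[derive(Debug)]
struct FinishedWriter {
    vectors: Vec<(u32, Vec<f32>)>,
    centroids_rebuilt: bool,
}

/// What `commit()` hands back for a later flush (hypothetical).
#[derive(Debug, PartialEq)]
struct CommitSummary {
    num_vectors: usize,
    centroids_rebuilt: bool,
}

impl Writer {
    fn add(&mut self, id: u32, embedding: Vec<f32>) {
        self.pending.push((id, embedding));
    }

    /// Consumes the writer, scrubs entries and marks centroids as rebuilt.
    fn finish(self) -> FinishedWriter {
        // Placeholder scrub: drop entries with empty embeddings.
        let vectors: Vec<(u32, Vec<f32>)> = self
            .pending
            .into_iter()
            .filter(|(_, e)| !e.is_empty())
            .collect();
        FinishedWriter { vectors, centroids_rebuilt: true }
    }
}

impl FinishedWriter {
    /// Only defined on `FinishedWriter`, so skipping `finish()` fails to compile.
    fn commit(self) -> CommitSummary {
        CommitSummary {
            num_vectors: self.vectors.len(),
            centroids_rebuilt: self.centroids_rebuilt,
        }
    }
}

fn main() {
    let mut w = Writer::default();
    w.add(1, vec![0.1, 0.2]);
    w.add(2, vec![]);
    // w.commit(); // would not compile: `Writer` has no `commit` method
    let summary = w.finish().commit();
    println!("{:?}", summary); // CommitSummary { num_vectors: 1, centroids_rebuilt: true }
}
```

The trade-off is that every call site has to thread the returned type through, but a caller that forgets finish() gets a compiler error rather than a half-built index at commit time.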

Test plan

How are these changes tested?

  • Tests pass locally with pytest for Python, yarn test for JS, and cargo test for Rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

@Sicheng-Pan (Contributor, Author) commented Feb 10, 2026

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (readability, modularity, intuitiveness)?

@Sicheng-Pan Sicheng-Pan force-pushed the 02-10-_enh_quantized_spann_segment branch from 8bded9c to 70b715c (February 10, 2026 22:26)

@Sicheng-Pan Sicheng-Pan changed the title [ENH] Quantized Spann Segment [ENH] Quantized Spann Segment Writer Feb 11, 2026
@Sicheng-Pan Sicheng-Pan force-pushed the 02-10-_enh_quantized_spann_segment branch from 15768fa to e3957b5 (February 11, 2026 02:22)
@Sicheng-Pan Sicheng-Pan marked this pull request as ready for review February 11, 2026 02:23
@propel-code-bot (Contributor) commented Feb 11, 2026

Introduce Quantized SPANN Segment Writer and Supporting Infrastructure

Adds a full Quantized SPANN segment-writing path gated behind the usearch feature flag. Key pieces include a new QuantizedSpannSegmentWriter/flusher, substantial extensions to QuantizedSpannIndexWriter (split finish()/commit(), reopen/create helpers, blockfile plumbing), schema helpers for extracting SPANN configs, and wiring into VectorSegmentWriter with feature-guarded enum variants and error propagation. Supporting updates span record/log handling, configuration defaults, Cargo features, and extensive persistence/integration tests that exercise reopen, persistence, and GC scenarios.

Key Changes

• Added QuantizedSpannSegmentWriter/QuantizedSpannSegmentFlusher in rust/segment/src/quantized_spann.rs with segment lifecycle (from existing/new segments, log application, finish, commit, flush) and async reopen logic
• Enhancements to rust/index/src/spann/quantized_spann.rs: new finish() phase, cluster block sizing, file-id struct relocation, GC/scrub changes, refactored commit ordering, additional tests, and public helpers used by the new segment writer
• Schema/type updates (rust/types/src/collection_schema.rs, rust/types/src/segment.rs, rust/segment/src/types.rs) to expose SPANN configs, add file-path constants, wire new vector writer/flusher variants, and propagate log embedding helpers
• Extended infrastructure in rust/segment/src/distributed_spann.rs and rust/segment/src/blockfile_record.rs to distinguish legacy SPANN segments and route feature-gated quantized writer errors
• Cargo feature gating (rust/segment/Cargo.toml plus lib.rs) enabling compilation only when usearch is available, along with numerous new async persistence tests for the quantized writer pipeline
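The segment writer's "from existing/new segments" step has to decide whether to reopen blockfiles or create fresh ones. A minimal sketch of that decision, assuming the segment carries a file-path map of type HashMap<String, Vec<String>>: if every required key is present, reopen; if none are, create; if only some are, return an explicit error instead of silently falling back. The key strings, OpenMode and OpenError are hypothetical and are not the file-path constants or error types this PR adds.

```rust
use std::collections::HashMap;

// Hypothetical file-path keys; the real constants in rust/types/src/segment.rs may differ.
const CENTROID_KEY: &str = "quantized_centroids";
const CLUSTER_KEY: &str = "quantized_clusters";

#[derive(Debug, PartialEq)]
enum OpenMode {
    /// All required blockfiles exist: reopen them by id.
    Reopen { centroid_id: String, cluster_id: String },
    /// Brand-new segment: create empty blockfiles.
    Create,
}

#[derive(Debug, PartialEq)]
enum OpenError {
    /// Some, but not all, required files are present; names the missing key.
    Partial(&'static str),
}

/// Returns the first file id recorded under `key`, if any.
fn single_id(paths: &HashMap<String, Vec<String>>, key: &str) -> Option<String> {
    paths.get(key).and_then(|ids| ids.first().cloned())
}

fn choose_open_mode(paths: &HashMap<String, Vec<String>>) -> Result<OpenMode, OpenError> {
    match (single_id(paths, CENTROID_KEY), single_id(paths, CLUSTER_KEY)) {
        (Some(centroid_id), Some(cluster_id)) => Ok(OpenMode::Reopen { centroid_id, cluster_id }),
        (None, None) => Ok(OpenMode::Create),
        // Surface partial state loudly rather than disabling reopen silently.
        (Some(_), None) => Err(OpenError::Partial(CLUSTER_KEY)),
        (None, Some(_)) => Err(OpenError::Partial(CENTROID_KEY)),
    }
}

fn main() {
    let mut paths: HashMap<String, Vec<String>> = HashMap::new();
    println!("{:?}", choose_open_mode(&paths)); // Ok(Create)
    paths.insert(CENTROID_KEY.to_string(), vec!["a".to_string()]);
    println!("{:?}", choose_open_mode(&paths)); // Err(Partial("quantized_clusters"))
    paths.insert(CLUSTER_KEY.to_string(), vec!["b".to_string()]);
    println!("{:?}", choose_open_mode(&paths));
}
```

Returning an error for the mixed case makes a half-written segment visible in logs, which is usually easier to debug than a writer that quietly starts from scratch.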

Possible Issues

• Schema reconciliation forcibly errors when quantize is true; validate this aligns with intended UX and won’t reject valid legacy configs
• finish() is now required before commit(); callers that skip finish() (older code paths/tests) may see silent data corruption. Ensure all call sites (including external consumers) are updated
• Quantized writer currently reads OFFSET_ID_TO_DATA blockfiles opportunistically; missing record segments will disable reopen but may not emit clear diagnostics
• Feature-flag gating may leave dead enum variants when usearch is off; confirm exhaustive matches elsewhere still compile
• Long-running integration tests significantly increase suite time; consider marking them as ignored or moving them to a separate job if needed
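On the feature-gating point: in Rust, a variant marked with #[cfg(feature = "...")] is removed entirely when the feature is off, so every match arm that names it needs the same attribute, or builds without the feature fail to resolve the variant. A minimal sketch, assuming a usearch Cargo feature; the VectorWriter enum and its variants here are hypothetical stand-ins, not the actual VectorSegmentWriter definition.

```rust
// Hypothetical stand-in for a writer enum with a feature-gated variant.
// Assumes a Cargo feature named "usearch"; without it the variant is compiled out.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VectorWriter {
    Hnsw,
    Spann,
    #[cfg(feature = "usearch")]
    QuantizedSpann,
}

impl VectorWriter {
    fn kind(&self) -> &'static str {
        match self {
            VectorWriter::Hnsw => "hnsw",
            VectorWriter::Spann => "spann",
            // This arm must carry the same cfg: without it, builds that lack the
            // feature fail because the variant does not exist; with the feature on
            // and no arm, the match is non-exhaustive and also fails to compile.
            #[cfg(feature = "usearch")]
            VectorWriter::QuantizedSpann => "quantized_spann",
        }
    }
}

fn main() {
    for w in [VectorWriter::Hnsw, VectorWriter::Spann] {
        println!("{}", w.kind());
    }
}
```

Avoiding a catch-all `_` arm keeps the compiler checking exhaustiveness in both configurations, so building once with and once without the feature (for example in CI) catches any arm that was missed.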

This summary was automatically generated by @propel-code-bot

@blacksmith-sh (Contributor) commented Feb 11, 2026

OOM Events Detected

  • Job Python tests / test-rust-bindings-stress (3.9) has run into an OOM error.

@Sicheng-Pan Sicheng-Pan force-pushed the 02-10-_enh_quantized_spann_segment branch from e3957b5 to 0a585d2 (February 11, 2026 19:10)