All notable changes to sqlite-diskann will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Dynamic search_list_size auto-scaling: beam width automatically increases with sqrt(index_size) to maintain recall at scale. No manual tuning needed for most workloads.
- Lazy back-edges optimization for deferred edge repair during batch inserts
- diskann_begin_batch() and diskann_end_batch() API for multi-insert optimization
- diskann_abort_batch() for transaction rollback support
- BlobSpot reference counting to prevent use-after-free at scale
- Persistent BlobCache across batch insert operations
- Insert profiling instrumentation (enabled via DISKANN_DEBUG_TIMING=1)
- Performance experiment documentation framework in experiments/ directory
- Comprehensive parameter tuning guide (PARAMETERS.md)
- Benchmark profiles for parameter sweep testing
- Stress tests for large-scale performance validation
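
The search_list_size auto-scaling entry above can be illustrated with a minimal sketch. The function names, the reference index size, and the upper clamp are assumptions made for illustration only; the changelog states just that the beam width grows with sqrt(index_size).

```c
#include <stdint.h>

/* Floor of the integer square root, computed by binary search so the
 * sketch does not depend on libm. */
uint64_t isqrt_u64(uint64_t n) {
  uint64_t lo = 0, hi = 4294967295ULL; /* floor(sqrt(UINT64_MAX)) */
  while (lo < hi) {
    uint64_t mid = lo + (hi - lo + 1) / 2;
    if (mid * mid <= n) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}

/* Hypothetical helper: scale a base beam width by
 * sqrt(index_size) / sqrt(reference_size), never going below the base
 * and never above max_size. All four parameters are illustrative. */
uint32_t scaled_search_list_size(uint32_t base, uint64_t index_size,
                                 uint64_t reference_size, uint32_t max_size) {
  if (reference_size == 0 || index_size <= reference_size) {
    return base; /* small indexes keep the configured width */
  }
  uint64_t scaled =
      (uint64_t)base * isqrt_u64(index_size) / isqrt_u64(reference_size);
  if (scaled < base) scaled = base;
  if (scaled > max_size) scaled = max_size;
  return (uint32_t)scaled;
}
```

Under these assumptions, a base width of 100 tuned at 10k vectors would become 1000 at 1M vectors, since sqrt(1,000,000) / sqrt(10,000) = 10.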
- Virtual table integration now uses cache-only batch mode (lazy edges disabled for vtab path)
- Improved blob handle lifecycle management to prevent COMMIT blocking
- Enhanced experiment tracking with templates and detailed analysis requirements
- Critical: O(n) random start bottleneck reduced by 96.8% (replaced COUNT(*)+OFFSET with indexed seek)
- Critical: Auto-calculate block size based on dimensions × max_neighbors for graph connectivity
- Critical: BlobCache UAF segfault at 100k scale (replaced is_cached/owns_blobs with refcounting)
- Blob handle expiration during Phase 2 immediate flush (auto-reopen in blob_spot_flush())
- Deferred edge repair failure due to missing is_aborted flag on cached blob spots
- Blob handles blocking COMMIT in virtual table path (blob_cache_release_handles())
- Refcount leak in insert cleanup path (removed premature new_blob = NULL assignment)
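
The refcounting fixes above follow a common ownership pattern, sketched below. RefBlob and its functions are hypothetical stand-ins, not the project's BlobSpot or BlobCache types. The idea is that a cache and an in-flight insert each hold their own reference, so neither can free an object the other still uses.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a cached blob handle (not the real BlobSpot). */
typedef struct RefBlob {
  int refcount; /* number of live owners */
  void *data;   /* payload owned by this object */
} RefBlob;

/* Allocate a blob with one reference held by the caller. */
RefBlob *ref_blob_new(size_t size) {
  RefBlob *b = malloc(sizeof(*b));
  if (b == NULL) return NULL;
  b->data = calloc(1, size ? size : 1);
  if (b->data == NULL) {
    free(b);
    return NULL;
  }
  b->refcount = 1;
  return b;
}

/* Take an additional reference, e.g. when a cache stores the pointer. */
RefBlob *ref_blob_retain(RefBlob *b) {
  if (b != NULL) b->refcount++;
  return b;
}

/* Drop one reference. Returns 1 if this call freed the object, else 0. */
int ref_blob_release(RefBlob *b) {
  if (b == NULL) return 0;
  if (--b->refcount > 0) return 0;
  free(b->data);
  free(b);
  return 1;
}
```

In this pattern, a cleanup path that sets its pointer to NULL before calling release skips the release entirely and leaks a reference, which is the kind of bug the last entry above describes.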
- Random start optimization: 26% → 0.9% of insert time (1.5ms → 46µs at 10k scale)
- Batch insert mode enables persistent cache across multiple inserts (0% → expected high hit rate)
- Reduced default insert list size for faster development builds
- Added comprehensive documentation on parameter tuning and experiment tracking
- Created benchmark framework documentation
- Added performance experiment templates and analysis guidelines
- Enhanced TypeScript API reference and usage guide
- Clarified package design for Node.js/TypeScript projects
0.1.2 - 2026-02-10
- Release process refinements
0.1.1 - 2026-02-10
- Platform-specific binary path resolution in TypeScript wrapper
- prepare-release script for automated release workflow
- Extension loading path for platform-specific binaries
0.1.0 - 2026-02-10
- Core DiskANN Implementation
  - Complete extraction of DiskANN algorithm from libSQL
  - Public C API with 9 functions (8 original + diskann_search_filtered)
  - BLOB I/O layer (src/diskann_blob.c)
  - Node binary format with little-endian serialization (src/diskann_node.c)
  - Beam search implementation (src/diskann_search.c)
  - Insert with edge pruning and graph construction (src/diskann_insert.c)
  - Vector deletion support
- Virtual Table Interface
  - Phase 1: Basic virtual table with MATCH search
  - Phase 2: Metadata column support
  - Phase 3: Filtered search with arbitrary WHERE clauses
  - SQL-level CREATE/DROP/INSERT/SEARCH/DELETE operations
- TypeScript/Node.js Package
  - Hybrid CJS/ESM support for maximum compatibility
  - Duck-typed DatabaseLike interface for multi-library support
  - Type-safe TypeScript wrapper API
  - SQL injection prevention with identifier validation
  - Pre-built native binaries for supported platforms
- Testing Infrastructure
  - 175 total tests (126 C API + 49 virtual table)
  - Integration tests for 128D vectors
  - Recall scaling tests
  - Delete-at-scale tests
  - AddressSanitizer (ASan) verification
  - Valgrind memory leak detection
  - Stress tests for performance benchmarking
- Build & CI/CD
  - Cross-platform GitHub Actions workflow
  - Windows, macOS, and Linux support
  - AddressSanitizer integration
  - Valgrind integration
  - Bear (Build EAR) for compilation database
  - clang-tidy static analysis
- Documentation
  - Comprehensive README with installation and usage guide
  - API reference documentation
  - Project guidelines (CLAUDE.md)
  - C coding standards (DESIGN-PRINCIPLES.md)
  - TDD methodology guide (TDD.md)
  - Rust rewrite assessment (decision to stay in C)
- SAVEPOINT removed from index creation (prevented nested transaction issues)
- DiskAnnSearchCtx initialization to avoid undefined behavior
- Magic numbers replaced with named constants
- Windows MSVC compatibility issues
- Cross-platform timing and process ID support
- Build configuration for all platforms
- Copyright header standardization (removed "Original"/"Modifications" qualifiers)
- Metadata stored as INTEGER (not BLOB) for cross-platform portability
- V3 node format only (no V1/V2 compatibility)
- Float32 vectors only (removed VectorPair complexity)
- pruning_alpha stored as fixed-point ×1000 in metadata
- Index/database name validation prevents SQL injection
- Original libSQL bugs identified and documented:
  - diskAnnDelete() line 1676: uses neighbor's own rowid instead of deleted node's rowid
  - diskAnnSearchInternal() line 1413: out: label always returns SQLITE_OK, ignoring rc
  - diskAnnSearchCtxInit(): allocates with sizeof(double) but uses as float*
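
The note above about storing pruning_alpha as fixed-point ×1000 can be sketched as follows. The function names and the round-to-nearest choice are assumptions; only the ×1000 scale comes from this changelog. Storing an integer rather than raw float bytes fits the INTEGER metadata decision listed in the same notes.

```c
#include <stdint.h>

/* Scale factor taken from the changelog entry (fixed-point ×1000). */
#define PRUNING_ALPHA_SCALE 1000

/* Hypothetical encoder: convert a non-negative alpha to a scaled integer,
 * rounding to the nearest step of 0.001. */
int64_t pruning_alpha_encode(double alpha) {
  return (int64_t)(alpha * PRUNING_ALPHA_SCALE + 0.5);
}

/* Hypothetical decoder: recover the floating-point alpha. */
double pruning_alpha_decode(int64_t stored) {
  return (double)stored / PRUNING_ALPHA_SCALE;
}
```

With this encoding an alpha of 1.2 is stored as 1200, and values round-trip to within 0.001.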
0.0.1 - 2026-02-09
- Initial project structure
- Vendored SQLite 3.51.2
- Basic Makefile
- Git repository initialization
- 0.x.x: Pre-1.0 development releases
- 1.0.0: First stable release (planned after block size fix and batch optimization validation)
- Existing 4KB block indices may have poor recall at 100k+ scale
- Recommend rebuilding indices with auto-calculated block size
- Batch insert API is backward compatible (single inserts still work)
- Virtual table schema unchanged
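
To show why a fixed 4KB block can limit graph connectivity, here is a rough sizing sketch. The node layout is an assumption made for illustration, not the documented V3 format: a small header, the node's own float32 vector, and for each neighbor a copy of its vector plus fixed edge metadata. The header size, edge metadata size, and 4096-byte rounding are likewise hypothetical.

```c
#include <stdint.h>

/* All three constants are illustrative assumptions. */
#define SKETCH_HEADER_BYTES 16u
#define SKETCH_EDGE_META_BYTES 16u
#define SKETCH_ALIGN 4096u

/* Bytes needed for one node with max_neighbors edges, rounded up to a
 * multiple of SKETCH_ALIGN. */
uint64_t sketch_block_size(uint32_t dimensions, uint32_t max_neighbors) {
  uint64_t vec_bytes = (uint64_t)dimensions * sizeof(float);
  uint64_t need = SKETCH_HEADER_BYTES + vec_bytes +
                  (uint64_t)max_neighbors * (vec_bytes + SKETCH_EDGE_META_BYTES);
  return (need + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN;
}

/* How many neighbors fit in a block of block_size bytes under the same
 * assumed layout. */
uint32_t sketch_neighbors_that_fit(uint64_t block_size, uint32_t dimensions) {
  uint64_t vec_bytes = (uint64_t)dimensions * sizeof(float);
  if (block_size < SKETCH_HEADER_BYTES + vec_bytes) return 0;
  return (uint32_t)((block_size - SKETCH_HEADER_BYTES - vec_bytes) /
                    (vec_bytes + SKETCH_EDGE_META_BYTES));
}
```

Under these assumptions, a 4096-byte block holds only 6 neighbors for 128-dimensional vectors, while 64 neighbors would need a 36864-byte block.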
- Baseline: 189 inserts/sec at 10k scale
- With batch mode: Pending benchmarks (expected 2-5x improvement)
- Random start: Now <1% of insert time (previously 26%)
- 10k vectors: 97% recall@10
- 100k vectors: Pending block size fix validation
- Search beam width auto-adjusted for filtered queries
Derived from libSQL DiskANN implementation (MIT license).
Original Copyright: 2024 the libSQL authors
Modifications Copyright: 2026 PhotoStructure Inc.