(feat) add vfs: storage abstraction layer for Scorch#2237
(feat) add vfs: storage abstraction layer for Scorch#2237ajroetker wants to merge 6 commits intoblevesearch:masterfrom
vfs: storage abstraction layer for Scorch#2237Conversation
`vfs` is a new storage abstraction layer that decouples Scorch's filesystem operations from the core indexing logic. This enables Scorch to use different storage backends (filesystem, S3, etc.) without modification. - Directory interface abstracting filesystem operations - FSDirectory: Drop-in replacement using local filesystem - Backward compatible with existing Scorch indexes - Allows custom storage backend implementations - Gets closer to enabling cloud-native and distributed architectures
|
Thanks for the contribution @ajroetker . Pinging the team to weigh in on this. |
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a Virtual File System (VFS) abstraction layer for the Scorch index implementation to support pluggable storage backends beyond the local filesystem.
- Adds a
Directoryinterface that abstracts filesystem operations (Open, Create, Remove, Rename, Stat, ReadDir, Lock, Unlock) - Implements
FSDirectoryas a filesystem-based implementation of the Directory interface - Integrates VFS into the Scorch index to use the abstraction layer instead of direct filesystem calls
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| index/scorch/vfs/directory.go | Defines the Directory and FileInfo interfaces for abstracting filesystem operations |
| index/scorch/vfs/fs_directory.go | Implements FSDirectory using the local filesystem with file locking via syscall.Flock |
| index/scorch/vfs/fs_directory_test.go | Tests for FSDirectory covering basic operations, directory operations, locking, and concurrent reads |
| index/scorch/vfs/directory_test.go | Test suite for Directory interface compliance testing |
| index/scorch/scorch.go | Integrates VFS directory into Scorch, allowing custom directory implementations via config |
| index/scorch/persister.go | Updates file operations to use VFS directory, fixes lock scope in removeOldZapFiles |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| t.Errorf("Failed to open file: %v", err) | ||
| return | ||
| } | ||
| defer r.Close() | ||
|
|
||
| data, err := io.ReadAll(r) | ||
| if err != nil { | ||
| t.Errorf("Failed to read file: %v", err) | ||
| return | ||
| } | ||
|
|
||
| if string(data) != string(testData) { | ||
| t.Errorf("Data mismatch in concurrent read") |
There was a problem hiding this comment.
Using t.Errorf from within a goroutine is not safe as testing.T methods are not goroutine-safe. Consider using a different mechanism to capture errors from goroutines, such as collecting errors in a channel or using t.Error with proper synchronization.
|
I think after implementing more of the necessary migration, I'll need to transition the vfs into a different package abstraction, possibly bleve_index_api? So that zapx can use the VFS for opening segments with the vfs interface. |
|
@abhinavdangeti I've tested all these prs together locally and everything passes, not sure how y'all handle the cross repo stuff in ci but happy to change whatever. |
You can update the go.mod entries for bleve_index_api and zapx/v16 to |
|
@abhinavdangeti Awesome thank you! I've done that to the zapx and this PR! I'll take a look at any issues as they come up. |
Complete the VFS abstraction work by implementing CopyTo for online index backups and fixing a bug where merge cancellation wasn't checked between tasks. - Implement snapshot_index.go CopyTo to copy bolt + segments via VFS - Add cancellation check in merge loop between tasks
vfsis a new storage abstraction layer that decouples Scorch's filesystem operations from the core indexing logic. This enables Scorch to use different storage backends (filesystem, S3, etc.) without modification.I've also created a demonstration of the efficacy of the abstraction using an s3 backed minio client with local caching if that's of interest.