A lightweight, efficient vector database with similarity search capabilities, designed for machine learning and AI applications.
- Fast similarity search using optimized indexing
- HNSW indexing for significantly improved search performance
- Vector collections/namespaces for organizing different types of embeddings
- Metadata filtering for fine-grained search control
- Efficient binary storage format for vectors and metadata
- Automatic text embedding with sentence-transformers
- Rich command-line interface with beautiful output
- Cross-platform support (Windows, macOS, Linux)
- Customizable storage locations
- Metadata support for enhanced document management
- Built-in text similarity search
pip install vecstream# Add a document
vecstream add "Machine learning is transforming technology" doc1
# Search for similar documents
vecstream search "AI and machine learning" --k 3
# Search with metadata filtering
vecstream search "cloud computing" --filter '{"category": "ai", "year": 2023}'
# Get document by ID
vecstream get doc1
# View database information
vecstream info
# Create and use a collection
vecstream create_collection research
vecstream add "Neural networks research" doc2 --collection research
# Use custom storage location
vecstream add "Custom storage test" doc3 --db-path "./my_vectors"
# Remove a document
vecstream remove doc1from vecstream.collections import CollectionManager
from vecstream.binary_store import BinaryVectorStore
# Using collections for different vector types
manager = CollectionManager("./vector_db")
research_collection = manager.create_collection("research")
products_collection = manager.create_collection("products")
# Add vectors with metadata to collections
research_collection.add_vector(
id="paper1",
vector=[1.0, 0.0, 0.0],
metadata={"topic": "AI", "year": 2023, "author": "Smith"}
)
# Search with metadata filtering
results = research_collection.search_similar(
query=[1.0, 0.0, 0.0],
k=5,
filter_metadata={"year": 2023, "topic": "AI"}
)
# Basic binary store usage (compatible with earlier versions)
store = BinaryVectorStore("./vector_db")
# Add vectors with metadata
store.add_vector(
id="doc1",
vector=[1.0, 0.0, 0.0],
metadata={"text": "Example document", "tags": ["test"]}
)
# Search similar vectors
results = store.search_similar([1.0, 0.0, 0.0], k=5)
# Get vector with metadata
vector, metadata = store.get_vector_with_metadata("doc1")By default, VecStream stores its data in:
- Windows:
%APPDATA%/VecStream/store/ - macOS/Linux:
~/.vecstream/store/
You can specify a custom storage location using the --db-path option in CLI commands or by passing the path to CollectionManager or BinaryVectorStore.
VecStream uses an efficient binary storage format:
- Vectors: NumPy
.npyformat for fast access - Metadata: JSON format for flexibility
- Automatic compression and optimization
- Collections organized in subdirectories
The command-line interface provides:
- Vector Management: Add, get, update and remove vectors with
add,get, andremovecommands - Similarity Search: Fast vector search with
searchcommand with adjustable k-nearest neighbors - HNSW Indexing: Significantly faster search performance for large datasets (up to 100x faster)
- Collections: Organize vectors by type with
collection create,collection list, and other commands - Metadata Filtering: Filter search results with
--filter '{"key": "value"}'syntax - Nested Filters: Support for dot notation in filters like
--filter '{"details.color": "red"}' - Beautiful UI: Rich, colored output and progress indicators for long operations
- Database Stats: View detailed database information with
infocommand - Custom Storage: Specify storage locations with
--db-pathoption
The Python API offers:
- HNSW Indexing: Fast approximate nearest-neighbor search with customizable parameters:
from vecstream.hnsw_index import HNSWIndex index = HNSWIndex(dim=128, M=16, ef_construction=200)
- Collections: Organize vectors with the CollectionManager:
from vecstream.collections import CollectionManager manager = CollectionManager("./vector_db", use_hnsw=True) collection = manager.create_collection("images")
- Metadata Filtering: Fine-grained search control:
results = collection.search_similar(query, filter_metadata={"category": "electronics"})
- Nested Filtering: Access nested properties with dot notation:
results = collection.search_similar(query, filter_metadata={"details.color": "black"})
- Binary Storage: Efficient serialization for large datasets:
from vecstream.binary_store import BinaryVectorStore store = BinaryVectorStore("./vector_db")
- Vector Operations: Direct access to similarity calculations, normalization, and more
- Type Safety: Strong typing and error handling with descriptive exceptions
- Python 3.8 or higher
- NumPy
- SciPy
- sentence-transformers
- Rich (for CLI)
- Click (for CLI)
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
-
0.3.0 (2024-03-XX)
- Added HNSW indexing for faster similarity search
- Added collections/namespaces for organizing vectors
- Added metadata filtering for search results
- Improved CLI with collection management commands
- Performance optimizations
-
0.2.0 (2024-03-XX)
- Added binary vector store
- Improved persistent storage
- Enhanced CLI functionality
- Added metadata support
-
0.1.0 (2024-03-XX)
- Initial release
- Basic vector storage and search functionality
- CLI interface
- Client-server architecture
| Document | Description | Link |
|---|---|---|
| API Reference | Complete reference of VecStream's classes, methods, and CLI commands | API Reference |
| Advanced Usage | Detailed examples and best practices for using VecStream | Advanced Usage |
| Feature | Description | Documentation |
|---|---|---|
| HNSW Indexing | Fast approximate nearest neighbor search for large datasets | API Reference, Usage Examples |
| Collections | Organize vectors with metadata for better organization | API Reference, Usage Examples |
| Metadata Filtering | Filter search results using metadata properties | API Reference, Usage Examples |
| Binary Storage | Efficient storage format for large vector datasets | API Reference, Usage Examples |