Releases: edwinkys/oasysdb
v0.7.3
What's Changed
This release fixes an issue that occurred when opening an OasysDB database from within another Tokio runtime. The issue was caused by the connection validation inside the open method, which created a new Tokio runtime to run the validation. It is solved by removing the connection validation and trusting that the user provides a valid SQL connection.
Contributors
Full Changelog
v0.7.2
What's Changed
This release fixes a file system issue on Windows that occurs when the default temporary directory is on a different drive than the current working directory. The fix is to create the temporary directory in the root of the database directory instead.
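As a rough illustration of why a temporary directory inside the database directory avoids the problem, the sketch below writes a file into a tmp folder under the database directory and then renames it into place. Because both paths live under the same root, the rename never has to cross drives. The function name, the tmp folder name, and the file name are assumptions made for this example and are not taken from the OasysDB source.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `<db_dir>/<file_name>` by staging the file in a
/// temporary directory located inside `db_dir`, then renaming it into place.
/// Hypothetical helper: not part of the OasysDB API.
fn save_via_local_tmp(db_dir: &Path, file_name: &str, bytes: &[u8]) -> io::Result<()> {
    // Assumed layout: the temporary directory lives at `<db_dir>/tmp`.
    let tmp_dir = db_dir.join("tmp");
    fs::create_dir_all(&tmp_dir)?;

    // Stage the data in the temporary directory first.
    let tmp_path = tmp_dir.join(file_name);
    let mut file = fs::File::create(&tmp_path)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    drop(file);

    // Source and destination share the same root, so this rename stays
    // on a single drive.
    fs::rename(&tmp_path, db_dir.join(file_name))
}

fn main() -> io::Result<()> {
    let db_dir = std::env::temp_dir().join("oasysdb_tmp_dir_sketch");
    save_via_local_tmp(&db_dir, "collection.bin", b"vector data")?;

    let saved = fs::read(db_dir.join("collection.bin"))?;
    assert_eq!(saved, b"vector data".to_vec());

    // Clean up the example directory.
    fs::remove_dir_all(&db_dir)
}
```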
v0.7.1
What's Changed
This release includes a low-level CRUD API for the index implementation in the Database layer. Once the index is built, you can use the CRUD API to manage the index data directly when necessary. This API allows you to perform the following operations:
- Insert a new vector record into the index.
- Update an existing vector record in the index.
- Delete a vector record from the index.
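The three operations above can be pictured with a small in-memory stand-in. This is only a sketch of the general shape of such an API: the FlatIndex type, its method names, and the use of u32 IDs are hypothetical and do not reflect the actual OasysDB types or signatures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an index. Not OasysDB's actual type.
#[derive(Default)]
struct FlatIndex {
    records: HashMap<u32, Vec<f32>>,
}

impl FlatIndex {
    /// Inserts a new vector record; fails if the ID is already present.
    fn insert(&mut self, id: u32, vector: Vec<f32>) -> Result<(), String> {
        if self.records.contains_key(&id) {
            return Err(format!("record {} already exists", id));
        }
        self.records.insert(id, vector);
        Ok(())
    }

    /// Replaces the vector of an existing record.
    fn update(&mut self, id: u32, vector: Vec<f32>) -> Result<(), String> {
        match self.records.get_mut(&id) {
            Some(slot) => {
                *slot = vector;
                Ok(())
            }
            None => Err(format!("record {} not found", id)),
        }
    }

    /// Removes a record and returns its vector.
    fn delete(&mut self, id: u32) -> Result<Vec<f32>, String> {
        self.records
            .remove(&id)
            .ok_or_else(|| format!("record {} not found", id))
    }
}

fn main() {
    let mut index = FlatIndex::default();
    index.insert(1, vec![0.1, 0.2]).unwrap();
    index.update(1, vec![0.3, 0.4]).unwrap();
    assert_eq!(index.delete(1).unwrap(), vec![0.3, 0.4]);

    // Updating a deleted record is an error in this sketch.
    assert!(index.update(1, vec![0.0, 0.0]).is_err());
}
```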
v0.7.0
What's Changed
OasysDB v0.7.0 is a major release that includes a complete overhaul of the system. Instead of being a dedicated vector database, OasysDB is now a hybrid vector database that integrates with SQL databases such as SQLite and PostgreSQL which you can configure to store the vector records. This approach gives various advantages such as:
- Reliability and durability of the data due to SQL database ACID properties.
- Separation of vector storage and computation allowing you to scale the system independently.
These are some of the key changes in this release:
- SQL Storage Layer: OasysDB can be configured to source vector records from a SQL database such as SQLite or PostgreSQL.
- Multi-index Support: OasysDB can support multiple indices for the same SQL table allowing users to improve the search performance.
- Pre-filtering: OasysDB can pre-filter the vector records from SQL tables based on the metadata before inserting them into the index.
- Configurable Algorithm: Each index in OasysDB can be configured with different algorithms and parameters to fit the performance requirements.
v0.6.1
What's Changed
- Add support for boolean metadata type. This allows full compatibility with JSON-like object or dictionary metadata when storing vector records in the collection.
- We improve the performance of the database save and get collection operations by 10-20% by reducing the number of IO operations. Also, the save collection operation is now atomic, which means that the collection is saved to the disk only when the operation is completed successfully.
- We launch our own documentation website at docs.oasysdb.com to provide a better user experience and more comprehensive documentation for the OasysDB library. It's still a work in progress and we will continue to improve the documentation over time.
v0.6.0
What's Changed
- CONDITIONAL BREAKING CHANGE: We remove support for the dot distance metric and replace cosine similarity with the cosine distance metric. This change makes the cosine metric consistent with the other distance metrics.
- The default configuration for the collection (EF Construction and EF Search) is increased to a more sensible value according to the common real-world use cases. The default EF Construction is set to 128 and the default EF Search is set to 64.
- We add a new script to measure the recall rate of the collection search functionality. With this, we improve the search recall rate of OasysDB to match the recall rate of HNSWLib with the same configuration.

      cargo run --example measure-recall
- We add a new benchmark to measure the performance of saving and getting the collection. The benchmark can be run by running the command below.

      cargo bench
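To make the switch from cosine similarity to cosine distance concrete, here is a minimal sketch that assumes the common definition of cosine distance as one minus cosine similarity. Whether OasysDB uses exactly this formula is an assumption; the function name and the handling of zero-length vectors are choices made for this example only.

```rust
/// Cosine distance, assumed here to be 1 - cosine similarity.
/// Returns None for mismatched lengths or zero-norm vectors.
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }

    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();

    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }

    // Similarity lies in [-1, 1], so the distance lies in [0, 2].
    Some(1.0 - dot / (norm_a * norm_b))
}

fn main() {
    let same = cosine_distance(&[1.0, 0.0], &[1.0, 0.0]).unwrap();
    let orthogonal = cosine_distance(&[1.0, 0.0], &[0.0, 1.0]).unwrap();
    let opposite = cosine_distance(&[1.0, 0.0], &[-1.0, 0.0]).unwrap();

    // Smaller values mean closer vectors, as with other distance metrics.
    assert!(same.abs() < 1e-6);
    assert!((orthogonal - 1.0).abs() < 1e-6);
    assert!((opposite - 2.0).abs() < 1e-6);
}
```

With this definition, results can be sorted in ascending order for every metric, which is one plausible reading of the consistency goal stated above.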
v0.5.1
What's Changed
We add a new method Collection.filter to filter the vector records based on the metadata. This method returns a HashMap of the filtered vector records and their corresponding vector IDs. This implementation performs a linear search through the collection and thus might be slow for large datasets.
This implementation supports filtering on the following metadata types:
- String: Stored value must include the filter string.
- Float: Stored value must be equal to the filter float.
- Integer: Stored value must be equal to the filter integer.
- Object: Stored value must match all the key-value pairs in the filter object.
We currently don't support filtering based on the array type metadata because I am not sure of the best way to implement it. If you have any suggestions, please let me know.
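The matching rules listed above can be sketched as a linear scan. The Metadata enum, the is_match and filter_records functions, and the u32 IDs below are hypothetical names for illustration, not the library's actual definitions. The sketch also assumes that values nested inside an object are compared with the same per-type rules, and it returns the matching metadata keyed by ID rather than full vector records.

```rust
use std::collections::HashMap;

/// Hypothetical metadata type mirroring the filterable kinds listed above.
#[derive(Debug, Clone, PartialEq)]
enum Metadata {
    Text(String),
    Integer(i64),
    Float(f64),
    Object(HashMap<String, Metadata>),
}

/// Returns true when `stored` satisfies `query` under the listed rules.
fn is_match(stored: &Metadata, query: &Metadata) -> bool {
    match (stored, query) {
        // String: stored value must include the filter string.
        (Metadata::Text(s), Metadata::Text(q)) => s.contains(q.as_str()),
        // Integer and float: stored value must equal the filter value.
        (Metadata::Integer(s), Metadata::Integer(q)) => s == q,
        (Metadata::Float(s), Metadata::Float(q)) => s == q,
        // Object: every key-value pair in the filter must match.
        (Metadata::Object(s), Metadata::Object(q)) => q
            .iter()
            .all(|(key, want)| s.get(key).map_or(false, |have| is_match(have, want))),
        // Mismatched kinds never match in this sketch.
        _ => false,
    }
}

/// Linear scan over every record, so the cost grows with collection size.
fn filter_records(records: &HashMap<u32, Metadata>, query: &Metadata) -> HashMap<u32, Metadata> {
    records
        .iter()
        .filter(|(_, data)| is_match(data, query))
        .map(|(id, data)| (*id, data.clone()))
        .collect()
}

fn main() {
    let mut records = HashMap::new();
    records.insert(1, Metadata::Text("OasysDB is awesome".to_string()));
    records.insert(2, Metadata::Text("hello".to_string()));
    records.insert(3, Metadata::Integer(7));

    let hits = filter_records(&records, &Metadata::Text("awesome".to_string()));
    assert_eq!(hits.len(), 1);
    assert!(hits.contains_key(&1));
}
```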
v0.5.0
What's Changed
- BREAKING CHANGE: Although there is no change in the database API, the underlying storage format has been changed to save the collection data to dedicated files directly. The details of the new persistent system and how to migrate from v0.4.x to v0.5.0 can be found in this migration guide.
- By adding the feature gen, you can now use the EmbeddingModel trait and OpenAI's embedding models to generate vectors or records from text without external dependencies. This feature is optional and can be enabled by adding the feature to the Cargo.toml file.

      [dependencies]
      oasysdb = { version = "0.5.0", features = ["gen"] }

  Example usage:

      use oasysdb::vectorgen::*;

      fn main() {
          // Change the API key to your own.
          let api_key = "xxx";
          let model = OpenAI::new(api_key, "text-embedding-3-small");

          let content = "OasysDB is awesome!";
          let vector = model.create_vector(content).unwrap();
          assert_eq!(vector.len(), 1536);
      }
v0.4.5
What's Changed
- Add insert benchmark to measure the performance of inserting vectors into the collection. The benchmark can be run using the cargo bench command.
- Fix the issue with large-size dirty IO buffers caused by the database operation. This issue is fixed by flushing the dirty IO buffers after the operation is completed. This operation can be done synchronously or asynchronously based on the user's preference since this operation might take some time to complete.
v0.4.4
What's Changed
- Maximize compatibility with the standard library error types to allow users to convert OasysDB errors to most commonly used error handling libraries such as anyhow, thiserror, etc.
- Add conversion methods to convert metadata to JSON value by serde_json and vice versa. This allows users to store JSON format metadata easily.
- Add normalized cosine distance metric to the collection search functionality. Read more about the normalized cosine distance metric here.
- Fix the search distance calculation to use the correct distance metric and sort it accordingly based on the collection configuration.
- Add vector ID utility methods to the VectorID struct to make it easier to work with the vector ID.
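As an illustration of what compatibility with the standard library error types buys, the sketch below defines a stand-in error type that implements std::error::Error and shows the ? operator converting it into a boxed error. DbError, find_record, and run are hypothetical names for this example, not OasysDB's actual error type or functions.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type standing in for a library error.
#[derive(Debug)]
struct DbError {
    message: String,
}

impl fmt::Display for DbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

// Implementing the standard Error trait is what enables the conversions.
impl Error for DbError {}

/// Stand-in for a library call that can fail.
fn find_record(id: u32) -> Result<u32, DbError> {
    if id == 0 {
        Err(DbError { message: "record not found".to_string() })
    } else {
        Ok(id)
    }
}

/// The `?` operator converts DbError into Box<dyn Error> through the
/// standard library's From implementation for boxed errors.
fn run(id: u32) -> Result<u32, Box<dyn Error>> {
    let found = find_record(id)?;
    Ok(found)
}

fn main() {
    assert_eq!(run(1).unwrap(), 1);

    let err = run(0).unwrap_err();
    assert_eq!(err.to_string(), "record not found");
}
```

Libraries such as anyhow rely on the same trait, so an error type that implements std::error::Error (and is Send, Sync, and 'static) can generally be propagated with ? into their error types as well.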
Additional Notes
- Add a new benchmark to measure the true search (AKA brute-force search) performance of the collection. When dealing with a small dataset, it is recommended to use the true search method where possible for better accuracy. The benchmark can be run using the cargo bench command.
- Improve the documentation to include more examples and explanations on how to use the library: Comprehensive Guide.
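For context on what true search means, here is a minimal brute-force sketch: it computes the distance from the query to every stored vector and keeps the k closest. It uses Euclidean distance and record positions as IDs purely for illustration. The function names are hypothetical and this is not the library's implementation.

```rust
/// Euclidean distance between two vectors of equal length.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}

/// Scans every record and returns the k nearest as (position, distance),
/// sorted from closest to farthest. Exact, but linear in collection size.
fn true_search(records: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = records
        .iter()
        .enumerate()
        .map(|(id, vector)| (id, euclidean(vector, query)))
        .collect();

    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

fn main() {
    let records = vec![vec![0.0, 0.0], vec![1.0, 1.0], vec![5.0, 5.0]];
    let hits = true_search(&records, &[0.9, 0.9], 2);

    // The record at position 1 is closest, followed by position 0.
    let ids: Vec<usize> = hits.iter().map(|hit| hit.0).collect();
    assert_eq!(ids, vec![1, 0]);
}
```

Because every record is compared, the result is exact, which is why this approach suits small datasets where the linear cost is acceptable.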