You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/guide.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -56,7 +56,7 @@ I made this decision to make the indexing algorithm more efficient and performan
56
56
57
57
By default, due to the nature of the vector indexing algorithm, OasysDB stores the vector record data in memory via the collection interface. This means that unless persisted to disk via the database save collection method, the data will be lost when the program is closed.
58
58
59
-
Under the hood, OasysDB serializes the collection using [Serde](https://github.com/serde-rs/serde) and saves it to the database file using[Sled](https://github.com/spacejam/sled). Because of this, **whenever you modify a collection, you need to save the collection back to the database to persist the changes to disk.**
59
+
Under the hood, OasysDB serializes the collection to bytes using [Serde](https://github.com/serde-rs/serde) and writes it to a file. The reference to the file is then saved, along with other details, to the database powered by[Sled](https://github.com/spacejam/sled). Because of this, **whenever you modify a collection, you need to save the collection back to the database to persist the changes to disk.**
60
60
61
61
When opening the database, OasysDB doesn't automatically load the collections from the database file into memory as this would be inefficient if you have many collections you don't necessarily use all the time. Instead, you need to load the collections you want to use into memory manually using the get collection method.
Due to the breaking changes introduced in v0.5.0 on the persistence system, you might need to update your codebase to make it compatible with the new version. This is not required if you are starting a new project from scratch.
4
+
5
+
### What happened?
6
+
7
+
In v0.5.0, we introduced a new persistence system that is more optimized for rapidly changing data. Previously, we were using Sled to store the serialized collection blobs. We found that it was not the best option for our use case as each blob size could be somewhere in between 100MB to 10GB.
8
+
9
+
When the data change rapidly, the collections need to be saved periodically to avoid data loss. With this, the collections need to be reserialized and rewritten back into Sled. The dirty IO buffer during these operations caused some storage issues, bloating the space required to store the collection for up to 100x the collection size.
10
+
11
+
This new system is more optimized for our use case since we now write the serialized collection data directly to a dedicated file on the disk. Now, we only use Sled for storing the collection metadata and the path to where the collection is stored.
12
+
13
+
## How to migrate?
14
+
15
+
To migrate OasysDB from v0.4.5 to v0.5.0, I recommend creating a new Rust project and migrating the database from there. This migration project will read the data from the old database and write them to the new database. And for that, this project need to have access to the database files.
16
+
17
+
If you are using OasysDB on Python, you might want to use Rust to migrate the database as it supports installing both versions of OasysDB on the same project easily which is required for the migration. I can promise you that the migration process is quite simple and straightforward.
18
+
19
+
**Friendly Reminder**: Make sure to create a back-up of your database files before proceeding 😉
20
+
21
+
### 1. Install both versions of OasysDB
22
+
23
+
After setting up the new project, you can install both versions of OasysDB by specifying the package and the version in the `Cargo.toml` file.
24
+
25
+
```toml
26
+
[dependencies]
27
+
odb4 = { package = "oasysdb", version = "0.4.5" }
28
+
odb5 = { package = "oasysdb", version = "0.5.0" }
29
+
```
30
+
31
+
### 2. Migrate the database
32
+
33
+
The following script will read the collections from the old database and write them to the new database which is all we need to do to migrate the database.
After running the script, you can verify the migration by checking the new database files. The new database path should contain a sub-directory called `collections` which stores the serialized collection data. The number of files in this directory should be equal to the number of collections you migrated.
59
+
60
+
Don't forget to point your application to the new database path after the migration or rename the new database path to the old database path to make sure that your application uses the new database correctly.
61
+
62
+
## Conclusion
63
+
64
+
If all the steps are followed correctly, you should have successfully migrated your OasysDB database from v0.4.5 to v0.5.0. If you face any issues during the migration, feel free to reach out to me on our [Discord](https://discord.gg/bDhQrkqNP4).
65
+
66
+
I will be happy to personally assist you with the migration process 😁
🔹 **Built-in Incremental ID**: No headache record management and efficient storage.
46
52
47
53
## Design Philosophy
48
54
@@ -121,6 +127,14 @@ fn main() {
121
127
}
122
128
```
123
129
130
+
## Feature Flags
131
+
132
+
OasysDB provides several feature flags to enable or disable certain features. You can do this by adding the feature flags to your project `Cargo.toml` file. Below are the available feature flags and their descriptions:
133
+
134
+
-`json`: Enables easy Serde's JSON conversion from and to the metadata type. This feature is very useful if you have a complex metadata type or if you use APIs that communicate using JSON.
135
+
136
+
-`gen`: Enables the vector generator trait and modules to extract vector embeddings from your contents using OpenAI or other embedding models. This feature allows OasysDB to handle vector embedding extraction for you without separate dependencies.
0 commit comments