Commit e258949 (parent 0d80b71) - vectors - design doc draft
Signed-off-by: Amit Prinz Setter <[email protected]>

docs/design/vectors.md - 1 file changed, 39 additions, 0 deletions

# S3 Vectors for Noobaa

## S3 API

### Vector Buckets and Indexes

1. Needed for a minimal POC: create vector bucket, create index, put vector, and query vector.
2. The MVP requires all get/list/delete APIs to be implemented.
    1. BucketPolicy is not a must for the MVP.
3. Hopefully we will be able to reuse the current bucket/object implementation with some adaptation.
    1. E.g., add a "bucket content" field to indicate whether a bucket holds objects or vectors, instead of creating a new "vector bucket" entity (see the sketch after this list).
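A minimal sketch of the "bucket content" adaptation, in TypeScript. All field and type names here are assumptions for illustration, not a final schema; the index attributes (dimension, distance metric) mirror what an S3 Vectors index carries and would need to be confirmed against the spec.

```typescript
// Sketch only: extend the existing bucket record with a content
// discriminator instead of introducing a new "vector bucket" entity.
type BucketContent = 'OBJECT' | 'VECTOR';

interface Bucket {
    name: string;
    // ...existing bucket fields stay as they are...
    bucket_content: BucketContent; // 'VECTOR' marks an S3 vector bucket
}

// Indexes hang off a vector bucket; {vector bucket, index} is the
// granularity the middle layer later maps to a LanceDB table.
interface VectorIndex {
    bucket: string;                          // owning vector bucket
    index_name: string;
    dimension: number;                       // fixed per index (assumption)
    distance_metric: 'cosine' | 'euclidean'; // assumption
}
```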
## Noobaa Middle layer

1. Add an abstract VectorSDK, generally parallel to the S3 API (see the sketch after this list).
    1. E.g., VectorSDK.create_vector_bucket(), VectorSDK.put_vector().
2. For the POC - only one concrete implementation of VectorSDK, translating each action to the LanceDB API.
    1. This will make a later switch to the Datastax vector implementation pluggable.
3. Hopefully we might be able to skip the namespace layer after initiating a LanceDB client connected to the appropriate storage. I.e., client initialization will depend on the namespace, but after that all API calls will be the same.
4. We need to translate the S3 API into the LanceDB API.
    1. E.g., {vector bucket, index} -> table.
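A sketch of the VectorSDK abstraction and its LanceDB-backed POC implementation, in TypeScript. The method set beyond create_vector_bucket()/put_vector(), the table-naming scheme, and the exact `@lancedb/lancedb` calls (tableNames/createTable/openTable/add/search) are assumptions to be validated against the JS client:

```typescript
import * as lancedb from '@lancedb/lancedb';

// Abstract vector layer, generally parallel to the S3 Vectors API.
// The method set here is illustrative, not final.
export abstract class VectorSDK {
    abstract create_vector_bucket(bucket: string): Promise<void>;
    abstract put_vector(bucket: string, index: string, key: string, vector: number[]): Promise<void>;
    abstract query_vectors(bucket: string, index: string, vector: number[], top_k: number): Promise<unknown[]>;
}

// POC implementation translating each action into LanceDB calls.
// Mapping from this doc: {vector bucket, index} -> LanceDB table.
export class LanceVectorSDK extends VectorSDK {
    constructor(private db: lancedb.Connection) { super(); }

    // Table-naming scheme is an assumption.
    private table_name(bucket: string, index: string): string {
        return `${bucket}--${index}`;
    }

    async create_vector_bucket(bucket: string): Promise<void> {
        // LanceDB has no bucket entity; bucket metadata stays in Noobaa's own DB.
    }

    async put_vector(bucket: string, index: string, key: string, vector: number[]): Promise<void> {
        const name = this.table_name(bucket, index);
        const existing = await this.db.tableNames();
        if (existing.includes(name)) {
            const table = await this.db.openTable(name);
            await table.add([{ key, vector }]);
        } else {
            // First put creates the table; LanceDB infers the schema from the rows.
            await this.db.createTable(name, [{ key, vector }]);
        }
    }

    async query_vectors(bucket: string, index: string, vector: number[], top_k: number): Promise<unknown[]> {
        const table = await this.db.openTable(this.table_name(bucket, index));
        return await table.search(vector).limit(top_k).toArray();
    }
}
```

A future Datastax-backed implementation would then be another VectorSDK subclass, leaving the S3 layer untouched.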
## Backing storage

1. For an S3-compatible backingstore, we can connect a LanceDB client with the backingstore's S3 credentials (see the sketch after this list).
    1. We need to adapt provider-specific credentials (e.g., AWS access key ID/secret key vs. Azure account name).
2. For a file system backingstore, we can connect LanceDB to a designated directory in the file system.
3. For other use cases we can connect LanceDB to the Noobaa S3 endpoint itself.
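A sketch of the three connection modes. The storage-option key names follow LanceDB's object-store convention but should be verified against the client docs, and all URIs, credentials, and endpoints below are placeholders:

```typescript
import * as lancedb from '@lancedb/lancedb';

// 1. S3-compatible backingstore: point LanceDB at the bucket with the
//    backingstore's credentials (key names are assumptions).
const s3_db = await lancedb.connect('s3://vectors-backingstore/lance', {
    storageOptions: {
        awsAccessKeyId: process.env.BS_ACCESS_KEY!,
        awsSecretAccessKey: process.env.BS_SECRET_KEY!,
        awsEndpoint: 'https://s3.example.com', // non-AWS providers need an explicit endpoint
    },
});

// 2. File system backingstore: a designated directory.
const fs_db = await lancedb.connect('/var/lib/noobaa/vectors');

// 3. Other use cases: LanceDB talks to the Noobaa S3 endpoint itself.
const nb_db = await lancedb.connect('s3://vector-data/lance', {
    storageOptions: {
        awsAccessKeyId: process.env.NOOBAA_ACCESS_KEY!,
        awsSecretAccessKey: process.env.NOOBAA_SECRET_KEY!,
        awsEndpoint: 'https://s3.noobaa.svc.cluster.local',
    },
});
```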
## LanceDB

1. Has a [JS client](https://lancedb.github.io/lancedb/js/#development), which is nice.
2. For containerized deployments, LanceDB clients will probably live inside each endpoint fork. If this is deemed not feasible (or just too wasteful), we will need to run the LanceDB client inside its own process/container/pod.
3. For NC (non-containerized), we will need a new configuration parameter for the storage directory.
4. For an S3 backingstore, we will provide LanceDB with the S3 account, customized per S3 provider that LanceDB supports.
5. There will probably be some discrepancy between LanceDB and AWS S3 Vectors features (see the sketch after this list).
    1. E.g., metadata filters.
6. Paid support considerations - enterprise edition? forking/ds?
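To illustrate the kind of discrepancy item 5 anticipates: LanceDB filters are SQL-like predicate strings, while S3 Vectors expresses metadata filters as JSON documents, so the middle layer would need a translation step. A hedged sketch (the S3-side filter shape is an assumption):

```typescript
// S3 Vectors-style metadata filter (shape assumed for illustration):
//   { "category": "manuals" }
// LanceDB expects a SQL-like predicate string instead, so the middle
// layer would translate one into the other.
function filter_to_lance_predicate(filter: Record<string, string>): string {
    return Object.entries(filter)
        .map(([k, v]) => `${k} = '${v.replace(/'/g, "''")}'`) // escape quotes
        .join(' AND ');
}

// Usage inside a query translation (table is a lancedb.Table):
// const hits = await table.search(query_vec)
//     .where(filter_to_lance_predicate({ category: 'manuals' }))
//     .limit(10)
//     .toArray();
```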
