Experimentally write blobs/parts into a segment/batched file #7
Conversation
jan-auer left a comment:
Some questions and ideas in comments, none of them blocking.
```rust
#[derive(Clone)]
pub struct StorageId {
    pub id: Uuid,
```
I noticed we have several variants for this StorageId now, from protobuf and the manual ones. Should we consolidate?
In a follow-up we should define whether we want to encode the use case and scope (org) into the ID. I think that would be a valuable addition.
Putting all of that into a unified ID sounds good to me. We might as well encode all of it as a path like `{usecase}/{scope}/{id}`.
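A path-style unified ID could be sketched like this (hypothetical names and fields, not the actual `StorageId` in this PR; it assumes the ID part is already a string such as a rendered UUID):

```rust
use std::fmt;

/// Hypothetical unified storage key that encodes use case and scope (org)
/// alongside the ID, as discussed above.
pub struct StorageKey {
    pub usecase: String,
    pub scope: String, // e.g. the organization
    pub id: String,    // e.g. a UUID rendered as a string
}

impl fmt::Display for StorageKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Encode as `{usecase}/{scope}/{id}`.
        write!(f, "{}/{}/{}", self.usecase, self.scope, self.id)
    }
}
```

One nice property of the path encoding is that it maps directly onto an object-store key or a filesystem path later.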
```rust
impl StorageService {
    pub fn new(path: &Path) -> Self {
        let db_path = path.join("db.sqlite");
```
Why not write the blobs directly onto the FS? I know this is just for dev, but we know sqlite handles large blobs poorly. Compression we could detect via magic bytes, and metadata needs to go somewhere else anyway.
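Writing blobs straight to the filesystem could look roughly like this (a sketch under assumptions; the `blobs/` directory layout and function name are hypothetical, not this PR's implementation):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Store each blob as its own file under `root`, keyed by its ID.
/// Metadata (and e.g. compression info) would live elsewhere,
/// for instance in a small sqlite index.
fn write_blob(root: &Path, id: &str, payload: &[u8]) -> io::Result<PathBuf> {
    let dir = root.join("blobs");
    fs::create_dir_all(&dir)?;
    let path = dir.join(id);
    fs::write(&path, payload)?;
    Ok(path)
}
```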
```sql
segment BLOB NOT NULL,
segment_offset INTEGER NOT NULL
```
I would not add an offset at this point and assume every blob is on its own. If the application decides to chunk files, those would be separate blobs, but for dev there's no need to chunk. As for the scalable solution, we ideally test without chunking first.
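A simplified schema along those lines, one row per blob with no offset, might look like this (a hypothetical sketch, not the PR's actual schema):

```rust
/// Hypothetical simplified schema: every blob is on its own,
/// so no `segment_offset` column is needed.
const CREATE_BLOBS: &str = "
CREATE TABLE IF NOT EXISTS blobs (
    id      TEXT PRIMARY KEY,
    segment BLOB NOT NULL
);
";
```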
Force-pushed from f9cfaf4 to 0a59dd7.
My idea with that segmentation/chunking is like this:

Yes, this means that large files are split up, just to then be grouped together again into completely different segments. That was my idea. Whether it makes sense to do is another thing, but I think it's worth a try.
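That split-then-regroup idea could be sketched like this (hypothetical types and chunk size, not code from this PR): large payloads are cut into fixed-size parts, and parts from different blobs are then packed into one segment, with an index recording where each part landed.

```rust
/// Split a large payload into fixed-size parts; each part can later be
/// packed into a segment together with parts from other blobs.
fn split_into_parts(payload: &[u8], part_size: usize) -> Vec<Vec<u8>> {
    payload.chunks(part_size).map(|c| c.to_vec()).collect()
}

/// Append parts (possibly from different blobs) into one segment buffer,
/// returning the (offset, len) of each part for later lookup.
fn pack_segment(parts: &[Vec<u8>]) -> (Vec<u8>, Vec<(usize, usize)>) {
    let mut segment = Vec::new();
    let mut index = Vec::new();
    for part in parts {
        index.push((segment.len(), part.len()));
        segment.extend_from_slice(part);
    }
    (segment, index)
}
```

Reading a blob back would mean looking up its parts in the index and concatenating the corresponding segment ranges.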