Conversation

@Swatinem (Contributor) commented Jul 1, 2025

No description provided.

@Swatinem self-assigned this Jul 1, 2025
@Swatinem requested a review from a team as a code owner July 1, 2025 13:20
@jan-auer (Member) left a comment

Some questions and ideas in comments, none of them blocking.


#[derive(Clone)]
pub struct StorageId {
    pub id: Uuid,

@jan-auer (Member):
I noticed we have several variants for this StorageId now, from protobuf and the manual ones. Should we consolidate?

In a follow-up we should define whether we want to encode the use case and scope (org) into the ID. I think that would be a valuable addition.

@Swatinem (Contributor, Author):
Putting all of that into a unified ID sounds good to me. We might as well just encode all of it as a path like {usecase}/{scope}/{id}.
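
A minimal sketch of what that unified ID could look like, assuming hypothetical usecase and scope fields on StorageId; only the {usecase}/{scope}/{id} path shape comes from the discussion above:

use uuid::Uuid;

#[derive(Clone)]
pub struct StorageId {
    pub usecase: String, // hypothetical, e.g. a use case name
    pub scope: String,   // hypothetical, e.g. an org id or slug
    pub id: Uuid,
}

impl StorageId {
    /// Encode the ID as a {usecase}/{scope}/{id} path.
    pub fn as_path(&self) -> String {
        format!("{}/{}/{}", self.usecase, self.scope, self.id)
    }
}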


impl StorageService {
    pub fn new(path: &Path) -> Self {
        let db_path = path.join("db.sqlite");

@jan-auer (Member):
Why not write the blobs directly onto the FS? I know this is just for dev, but we know that sqlite is pretty bad with large blobs. Compression could be detected via magic bytes, and the metadata needs to go somewhere else anyway.
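
For reference, a minimal sketch of the FS-backed variant suggested here; the write_blob helper and the two-character sharding scheme are assumptions, not code from this PR:

use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Write a blob directly onto the filesystem instead of into sqlite.
// Hypothetical helper: sharding on the first two characters of the id
// keeps any single directory from growing too large.
fn write_blob(root: &Path, id: &str, data: &[u8]) -> io::Result<PathBuf> {
    let dir = root.join(&id[..2]); // assumes ids are at least two chars, e.g. a UUID string
    fs::create_dir_all(&dir)?;
    let path = dir.join(id);
    fs::write(&path, data)?;
    Ok(path)
}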

Comment on lines +47 to +48
segment BLOB NOT NULL,
segment_offset INTEGER NOT NULL

@jan-auer (Member):
I would not add an offset at this point and would assume every blob stands on its own. If the application decides to chunk files, those would be separate blobs; for dev there's no need to chunk. As for the scalable solution, we should ideally test without chunking first.
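
A minimal sketch of the simplified schema this suggests, with the offset column dropped so every row is a standalone blob; the table name and the id column are assumptions for illustration:

// Simplified schema without segment_offset: every blob stands on its own.
const CREATE_BLOBS: &str = "
    CREATE TABLE IF NOT EXISTS blobs (
        id      TEXT PRIMARY KEY,
        segment BLOB NOT NULL
    );
";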

@Swatinem force-pushed the swatinem/write-local-segments branch from f9cfaf4 to 0a59dd7 on July 1, 2025 14:47
@Swatinem (Contributor, Author) commented Jul 1, 2025

My idea with that segmentation/chunking is like this:

  • Clients are splitting large files into parts, and those parts are being uploaded/stored as multi-part requests.
  • Internally, the server then groups those parts into segments. The reason for this is to better optimize for small files.
  • Depending on where the segment is located (locally, or remote in GCS), we can do a range request to fetch the relevant part.

Yes, this means that large files are split up, only to then be grouped together again into completely different segments.
This might seem weird, but I think it actually makes sense: clients that use our client SDK can parallelize downloads according to these internal parts and re-assemble the file locally.
A client without the SDK will just request the whole file via a plain HTTP GET, and we will then stream those parts one after the other.

That was my idea. Whether it makes sense to do is another thing, but I think it's worth a try.
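
A minimal sketch of the part/segment bookkeeping described above, assuming a hypothetical in-memory PartIndex; local segment files stand in for GCS objects, where the seek would become an HTTP range request:

use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;

// Where one uploaded part ended up after being grouped into a segment.
// Hypothetical names, for illustration only.
struct PartLocation {
    segment: PathBuf, // local path; for GCS this would be an object key
    offset: u64,
    len: u64,
}

// Maps client-visible part ids to their location inside a segment.
type PartIndex = HashMap<String, PartLocation>;

// Fetch a single part by seeking into its segment. For a remote segment
// this would turn into an HTTP range request (Range: bytes=offset..offset+len-1).
fn read_part(index: &PartIndex, part_id: &str) -> io::Result<Vec<u8>> {
    let loc = index
        .get(part_id)
        .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown part"))?;
    let mut file = File::open(&loc.segment)?;
    file.seek(SeekFrom::Start(loc.offset))?;
    let mut buf = vec![0u8; loc.len as usize];
    file.read_exact(&mut buf)?;
    Ok(buf)
}

// Re-assemble a whole file by streaming its parts in order, which is what
// the plain HTTP GET fallback would do for non-SDK clients.
fn read_file(index: &PartIndex, part_ids: &[String]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    for id in part_ids {
        out.extend(read_part(index, id)?);
    }
    Ok(out)
}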

@Swatinem merged commit f8ff20f into master Jul 1, 2025
3 checks passed
@Swatinem deleted the swatinem/write-local-segments branch July 1, 2025 14:53