[Question] Recommended way to store large files using postgrest? #2629
Replies: 2 comments 1 reply
-
Hey @elimisteve,
I think you're referring to this one: #278
Yeah, that won't help, because range headers apply to rows and not to column values. Maybe we could split parts of the …
I would only recommend storing the URLs for the files in PostgreSQL while keeping the files themselves on separate storage.
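For example, something along these lines (table and column names are just for illustration), with the actual bytes living in S3, a filesystem, or whatever storage you prefer:

```sql
-- Illustrative: keep only metadata and a pointer in Postgres;
-- the file bytes live in external storage (S3, filesystem, ...).
create table file_refs (
  id         bigint generated always as identity primary key,
  url        text        not null,   -- e.g. s3://bucket/key or https://...
  mime_type  text        not null,
  size_bytes bigint      not null,
  sha256     text,                   -- optional integrity check
  created_at timestamptz not null default now()
);
```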
-
Fully agree - I store my files in the database for the very same reason. And I do store video files in there, too.
I use nginx in front of PostgREST for this. A few important things to consider:

nginx will cache the full file for me and then do the streaming. The first time a file is loaded it needs to be transferred from the database to nginx in full, but after that, range requests are possible without supporting them on the SQL side.

Basically, I use Postgres for consistency but nginx to serve files - the best of both worlds, at the cost of storing every file twice.

I am not sure what your idea is about "streaming writes". I have an upload RPC, which takes the raw octet stream as a `bytea` parameter.
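If it helps, here is a rough sketch of what such an upload function could look like (names are made up; this leans on PostgREST's support for functions with a single unnamed `bytea` parameter receiving a raw `application/octet-stream` body):

```sql
-- Hypothetical table holding uploaded files (illustrative names).
create table if not exists files (
  id   bigint generated always as identity primary key,
  data bytea  not null
);

-- PostgREST can hand the raw request body to a function with a single
-- unnamed bytea parameter:  POST /rpc/upload_file
--                           Content-Type: application/octet-stream
create or replace function upload_file(bytea)
returns bigint
language sql
as $$
  insert into files (data) values ($1) returning id;
$$;
```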
-
Hi there! Long-time postgrest user here.
How should I store large files, preferably using Postgres and postgrest, so that I can do streaming writes and streaming reads?
Even though storing large files in Postgres is often not recommended, since it makes the (then large) database harder to back up, I really want consistency. Storing files in Postgres gives me that when I replicate the DB to other nodes, without my having to synchronize file-system state across nodes in parallel with the DB replication.
Postgres has a Large Objects feature, but allegedly its performance is quite bad and information actually gets stored in 2 tables, making deletes more complex.
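To illustrate what I mean about deletes (purely a sketch, with made-up names): the data lives in pg_largeobject / pg_largeobject_metadata, and deleting a row that merely references an object by oid does not remove the object itself, so it has to be unlinked explicitly (or via the contrib lo module's lo_manage trigger, or vacuumlo):

```sql
-- Store a file as a large object and keep a reference to it (illustrative schema).
create table lo_files (
  id   bigint generated always as identity primary key,
  blob oid    not null
);

insert into lo_files (blob) values (lo_from_bytea(0, 'hello'::bytea));

-- Deleting the row does NOT delete the large object; unlink it explicitly.
with gone as (
  delete from lo_files where id = 1 returning blob
)
select lo_unlink(blob) from gone;
```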
Should I use `bytea`? Supposedly `bytea` writes cannot be streamed, which isn't so good, but perhaps streaming logic was added to postgrest, just as the Java dev in the linked post manually implemented it despite streaming not natively being supported by Postgres?

It's not clear from the postgrest docs whether range headers will allow me to grab part of a file stored in Postgres as a `bytea`. Can I?

When it comes to existing postgrest-using solutions, I see the example in the docs of serving images from Postgres, but usually images easily fit into RAM and therefore streaming capabilities are not needed in the image-serving scenario, unlike my use case.
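If range headers don't help here, I suppose I could always expose an RPC that slices the `bytea` manually, since `substring` works on `bytea` in Postgres. A rough sketch with made-up names:

```sql
-- Hypothetical workaround: return an arbitrary slice of a bytea column so the
-- client can fetch a file in pieces without Range-header support.
create or replace function file_slice(file_id bigint, byte_offset bigint, byte_count int)
returns bytea
language sql stable
as $$
  select substring(data from (byte_offset + 1)::int for byte_count)
  from files
  where id = file_id;
$$;

-- e.g. GET /rpc/file_slice?file_id=1&byte_offset=0&byte_count=1048576
-- (the scalar bytea result should be retrievable as raw binary with
--  Accept: application/octet-stream, if I'm not mistaken)
```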
I could use postgrest with Supabase and let it store files on disk for me, but I'd probably have consistency issues again, yes?
Dealing with large files is such a pain that I'm considering storing them as small chunks in separate Postgres rows, though that makes it much more complex for the client to download a file, since it then needs to be aware of those chunks.
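In case it's not clear what I mean, a minimal sketch of such a chunked layout (purely illustrative names; the client, or some view/RPC, would have to reassemble the chunks in order):

```sql
-- Illustrative chunked layout: one metadata row per file, one row per chunk.
create table file_meta (
  id         bigint generated always as identity primary key,
  name       text   not null,
  chunk_size int    not null default 1024 * 1024,
  total_size bigint not null
);

create table file_chunks (
  file_id bigint not null references file_meta (id) on delete cascade,
  seq     int    not null,          -- 0-based chunk index
  data    bytea  not null,
  primary key (file_id, seq)
);

-- Server-side reassembly is possible but defeats streaming for big files:
--   select string_agg(data, ''::bytea order by seq)
--   from file_chunks where file_id = 1;
```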
Another solution would be to say: forget multi-node consistency, forget database replication, forget Postgres, and just get file storage working on one node first by storing the (often large) files on disk, side-stepping postgrest since it's perhaps not the right tool for the job. But then of course there's a bunch of built-in functionality (e.g., CRUD logic) I'd miss out on by not using postgrest!
Overall, how would you recommend I store large files, preferably with postgrest in the loop?
Thanks!