
Use Storage disk for storage to support different filesystems like S3 #119

Open

barryvdh wants to merge 6 commits into ErugoOSS:main from barryvdh:feat-s3

Conversation

@barryvdh

Fixes #37

@barryvdh barryvdh marked this pull request as draft July 18, 2025 10:59
@barryvdh
Author

For zipping I used https://github.com/stechstudio/laravel-zipstream

@barryvdh barryvdh marked this pull request as ready for review July 18, 2025 11:19
@barryvdh
Author

I'm not really sure how efficient it is, though: the files will be saved to S3 first, then zipped and stored on S3 again, so large files might be a bit problematic. Not sure if you could zip them client-side, or just download multiple files in the browser instead. https://dev.to/cmcnicholas/zipadeedoodah-download-multiple-files-to-zip-on-client-browser-1hgc

@barryvdh
Author

Not sure how direct S3 upload works, but I would guess:

  • prepare the shares on an endpoint (e.g. api/share/createTemporary)
  • save the shares, return a URL for each file with Storage::temporaryUploadUrl($filename, now()->addMinutes(5)); https://laravel.com/docs/12.x/filesystem#temporary-upload-urls
  • upload the files to the temp URL instead of to the server
  • call the API to finish the request and verify

Downloads could use https://laravel.com/docs/12.x/filesystem#temporary-urls, which also works for local files if serving is enabled, and then you could probably just redirect there.
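A rough sketch of that flow, using Laravel's documented temporaryUploadUrl() / temporaryUrl() API (the controller, route, and payload names are assumptions for illustration, not Erugo's actual code):

```php
<?php
// Hypothetical sketch; controller/route names and payload shape are assumptions.
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Storage;

class TemporaryShareController
{
    // POST /api/share/createTemporary — return a pre-signed upload URL per file
    public function createTemporary(Request $request)
    {
        $uploads = [];
        foreach ($request->input('filenames', []) as $filename) {
            // temporaryUploadUrl() returns ['url' => ..., 'headers' => [...]]
            // and is only supported on S3-compatible disks.
            ['url' => $url, 'headers' => $headers] = Storage::disk('s3')
                ->temporaryUploadUrl('uploads/' . $filename, now()->addMinutes(5));

            $uploads[$filename] = ['url' => $url, 'headers' => $headers];
        }

        return response()->json(['uploads' => $uploads]);
    }

    // Download: redirect to a pre-signed URL instead of proxying through PHP
    public function download(string $path)
    {
        return redirect()->away(
            Storage::disk('s3')->temporaryUrl($path, now()->addMinutes(5))
        );
    }
}
```

The client then PUTs each file straight to the pre-signed URL, and a final API call verifies the objects exist before marking the share complete.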

@camdarley

Hey @barryvdh, great work on this PR.
I'd like to suggest an approach that could solve the upload side as well, and address the efficiency concerns you raised about large files and zipping.
It seems that tusd has a first-class S3 storage backend (s3store) that works with any S3-compatible service. It maps the tus resumable protocol onto S3 Multipart Uploads:

```sh
$ export AWS_ACCESS_KEY_ID=xxxxx
$ export AWS_SECRET_ACCESS_KEY=xxxxx
$ export AWS_REGION=us-east-1
$ tusd -s3-bucket=my-bucket -s3-endpoint=https://s3.example.com
```

When running in S3 mode:

  • Each tus PATCH becomes an S3 multipart upload part
  • A temporary buffer on local disk is used per concurrent upload (roughly part_size × (1 + MaxBufferedParts), typically a few hundred MB with defaults — not the full file size)
  • The final assembled object appears in the bucket once the upload completes
  • tusd creates a .info metadata object and potentially a .part temporary object alongside each upload

This means a server with a few GB of local disk could handle concurrent uploads of arbitrary size.

You then could combine both approaches:

  • Upload path (tusd → S3 directly): Configure tusd to use -s3-bucket + -s3-endpoint instead of -upload-dir. This requires modifying the tusd startup script/supervisor config in the Docker image. The TusdHooksController would need to be aware that completed files are now S3 objects (keyed by the tus upload ID) rather than local files.
  • Read/download/delete path (Laravel Storage disk — your PR): Your Storage::disk('s3') approach handles downloads, deletion, and metadata reads. For downloads from S3, temporaryUrl() can generate pre-signed URLs to redirect clients directly to the storage backend (avoiding proxying large files through PHP). This works with S3-compatible services as long as the endpoint is publicly accessible.
  • Zip/bundle downloads: For multi-file shares on S3, laravel-zipstream (which you already integrated) supports streaming directly from S3 sources (s3://bucket/path) to the client as a zip, without storing the full zip on disk. Note: the package uses its own S3 client, so the custom endpoint may need to be wired up via the zipstream.s3client container binding or matching AWS_* env vars.
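For the zip path, the streaming variant could look roughly like this, following laravel-zipstream's documented s3:// source syntax (bucket name and object paths are placeholders):

```php
<?php
// Sketch only: bucket and paths are placeholders, not Erugo's layout.
use STS\ZipStream\ZipStreamFacade as Zip;

// Stream a zip of S3 objects straight to the client, without ever
// buffering the full archive on local disk or writing it back to S3.
return Zip::create('share.zip', [
    's3://my-bucket/shares/abc123/file1.pdf',
    's3://my-bucket/shares/abc123/file2.jpg',
]);
// Returning the ZipStream from a controller streams it as the HTTP response.
```

Note the custom-endpoint caveat above: the package's own S3 client must be pointed at the same endpoint, e.g. via the zipstream.s3client container binding.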

Changes needed in Erugo:

  • Docker entrypoint / supervisor config: Conditionally pass -s3-bucket, -s3-endpoint, -s3-object-prefix to tusd when S3 is configured, instead of -upload-dir
  • Environment variables: Add STORAGE_DISK=local|s3 and the standard AWS_* / S3_* env vars to .env.example
  • config/filesystems.php: Add an S3 disk configuration with endpoint support and 'use_path_style_endpoint' => true (required by most non-AWS S3 services)
  • TusdHooksController: After upload completion, the hook needs to know the file is in S3 (the upload ID maps to the S3 object key). Instead of moving/copying a local file, it would just record the S3 path in the database
  • Download controller: Use Storage::temporaryUrl() for S3 (redirect to pre-signed URL) or Storage::download() for local — your PR already goes in this direction
  • Share cleanup (expiration): Storage::disk()->delete() works for both local and S3. For abandoned/incomplete uploads, tusd's built-in expiration can clean up orphaned multipart uploads, or a cron job can handle it
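The S3 disk entry mentioned above could look like this in config/filesystems.php (standard Laravel disk config; the env var names are the conventional AWS_* ones and would need to match whatever Erugo adds to .env.example):

```php
// config/filesystems.php — entry under 'disks' (sketch, values via env)
's3' => [
    'driver' => 's3',
    'key' => env('AWS_ACCESS_KEY_ID'),
    'secret' => env('AWS_SECRET_ACCESS_KEY'),
    'region' => env('AWS_DEFAULT_REGION', 'us-east-1'),
    'bucket' => env('AWS_BUCKET'),
    'endpoint' => env('AWS_ENDPOINT'),
    // Required by most non-AWS S3 services (MinIO, Swift-based providers):
    'use_path_style_endpoint' => env('AWS_USE_PATH_STYLE_ENDPOINT', true),
    'throw' => false,
],
```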

S3-compatible providers: things to watch
tusd's s3store uses the standard AWS SDK and has been tested with AWS, MinIO, R2, and others. A few things worth noting for non-AWS providers:

  • Path-style addressing: Most S3-compatible services require path-style URLs. tusd enables this automatically when -s3-endpoint is set. On the Laravel side, 'use_path_style_endpoint' => true is needed in the disk config.
  • Region parameter: Some providers expect us-east-1 as a dummy region value. This should be configurable.
  • OpenStack Swift-based providers (OVH, Infomaniak, etc.): The S3 Core API (GET, PUT, DELETE, multipart uploads, pre-signed URLs) is supported. However, object tagging and bucket lifecycle policies are not — so cleanup of incomplete uploads should use a cron-based approach rather than S3 lifecycle rules.
  • Cloudflare R2: Requires uniform part sizes. tusd supports this via -s3-min-part-size and -s3-part-size flags.
  • Content encoding: Some providers don't support aws-chunked transfer encoding. This shouldn't affect tusd's multipart uploads (fixed-size parts), but is worth validating during testing.

Summary

| Concern | Current PR | With tusd S3 backend |
| --- | --- | --- |
| Upload storage | Still local disk | Direct to S3 (temp buffer only) |
| Download | Storage::disk() ✅ | Same + temporaryUrl() redirect |
| Delete | Storage::disk() ✅ | Same |
| Zip efficiency | ⚠️ S3 → local → zip → S3 | Stream via laravel-zipstream |
| Min local disk | Full upload size | ~a few hundred MB for buffering |
| S3-compatible services | ✅ Laravel Flysystem | ✅ tusd s3store + Laravel |

Does it make sense to you?
If it does I'd be happy to help with implementation and testing.

@camdarley

Implementation & Testing Update

Following up on my previous comment, I've implemented and tested the full S3 integration with the tusd S3 backend on an Infomaniak Public Cloud bucket (OpenStack Swift with S3-compatible API).

What was implemented

The implementation is on the feat-s3-tusd branch, based on origin/main (the original feat-s3 branch couldn't be rebased cleanly due to 130+ commits of divergence, mainly the tusd migration).

New files:

  • app/Services/StorageService.php — Centralized helper (isS3(), tusdUploadPath(), tusdUploadExists(), deleteTusdUpload(), deleteDirectory())
  • config/erugo.php — Erugo-specific config (tusd_s3_object_prefix)
  • docker/dev/tusd-wrapper.sh + docker/alpine/tusd-wrapper.sh — Conditional tusd startup script (S3 or local backend)
  • 5 integration tests + 1 unit test for StorageService

Modified files:

  • TusdHooksController.php — S3-aware post-finish hook (records S3 key as temp_path), bundle extraction with S3 round-trip
  • UploadsController.php — Storage::move() for S3 file transfer, StorageService::tusdUploadExists() for verification
  • SharesController.php — Storage::temporaryUrl() for pre-signed URL redirects (single files + zip)
  • CreateShareZip.php — Rewritten with stechstudio/laravel-zipstream (Zip::create()->addFromDisk()->saveToDisk())
  • Share.php — Storage facade for cleanFiles(), StorageService::deleteDirectory() for S3-safe deletion
  • Docker files (Dockerfiles, supervisord.conf, start-container) — tusd wrapper, AWS env var export
  • composer.json — Added league/flysystem-aws-s3-v3 and stechstudio/laravel-zipstream

Adaptations vs. initial proposal

Several things worked differently than expected during real-world testing:

| Aspect | Initial proposal | Actual implementation |
| --- | --- | --- |
| tusd region flag | -s3-region CLI flag | Doesn't exist in tusd v2.6.0 — uses AWS_REGION env var instead |
| tusd upload IDs | Simple hex string | S3 mode uses hex+base64 format (id+multipartUploadId). Required regex update in all hooks and a tusdS3Key() helper to strip the suffix for S3 object lookup |
| Storage::deleteDirectory() | Expected to work on S3 | Silently fails (returns false) on Infomaniak. Built StorageService::deleteDirectory() that lists all files then deletes individually |
| Zip streaming | Stream directly from S3 via laravel-zipstream | Zip::create()->addFromDisk()->saveToDisk() — creates the zip in S3, not streamed to client. Works but stores the zip as a separate S3 object |
| Source files after zip | Assumed could be deleted | Kept in S3 mode to allow individual file downloads via pre-signed URLs |
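The deleteDirectory() workaround in the table above could be sketched like this (the StorageService class is part of this branch, but this exact body is an illustration, with error handling elided):

```php
<?php
// Sketch of the list-then-delete fallback; not the verbatim branch code.
use Illuminate\Support\Facades\Storage;

class StorageService
{
    /**
     * Storage::deleteDirectory() can silently return false on some
     * S3-compatible providers (observed on Infomaniak/Swift), so list
     * every object under the prefix and delete them explicitly instead.
     */
    public static function deleteDirectory(string $directory): bool
    {
        $disk = Storage::disk(config('filesystems.default'));
        $files = $disk->allFiles($directory);

        // delete() accepts an array and returns false if any delete fails.
        return $files === [] || $disk->delete($files);
    }
}
```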

E2E test results (Infomaniak S3)

All tests performed against s3.pub1.infomaniak.cloud with path-style addressing:

| Test | Result |
| --- | --- |
| Docker build + tusd S3 startup | ✅ |
| Single file upload (1.7 GB .mp4 via UI) | ✅ |
| Share creation (file moved from uploads/ to {userId}/{shareId}/) | ✅ |
| Download via pre-signed URL redirect | ✅ |
| Multi-file share + zip creation in S3 | ✅ |
| Individual file download from multi-file share | ✅ |
| Share cleanup (files + zip deleted, status → deleted) | ✅ |
| Unit tests (21/21) | ✅ |
| Integration tests against live S3 (17/17) | ✅ |

Known limitations vs. local block storage

  1. Double storage for multi-file shares — Both individual files and the zip are kept in S3 (~2x space). Necessary for individual file downloads via pre-signed URLs.
  2. Bundle uploads require local temp space — Zip bundles are downloaded from S3 to the container for extraction, then re-uploaded. Large bundles need sufficient ephemeral storage.
  3. StatsController disk stats are meaningless — disk_total_space() / disk_free_space() report container disk, not S3 bucket usage.
  4. No fallback — FILESYSTEM_DISK is global. If S3 is down, Erugo is down. No hybrid mode.
  5. Download speed depends on client ↔ S3 bandwidth — Not proxied through PHP (which is good), but the server can't optimize the path.
  6. S3 provider quirks — Tested only on Infomaniak (OpenStack Swift). AWS native, R2, MinIO may behave differently (especially around deleteDirectory, virtual-hosted style, lifecycle policies).
  7. No multipart upload cleanup policy — Abandoned tusd multipart uploads in S3 aren't cleaned automatically. Would need -s3-max-multipart-lifetime in tusd or a bucket lifecycle rule (not supported on all providers).

Happy to open a separate PR from feat-s3-tusd if you'd like to review the full diff.

@camdarley

The full implementation is available here for review: camdarley/Erugo@feat-s3-tusd (diff vs main)



Development

Successfully merging this pull request may close these issues.

[Enhancements] External Cloud Storage Option

2 participants