Open
Labels
enhancement: New feature or request
Description
Please describe the feature
As a data recipient, I want to be able to use rclone to copy a complete dataset so I can easily manage downloads (US11, priority: could).
Background
rclone supports many backends beyond S3. The most relevant for SDA:
- HTTP: simplest. Needs a page with file links (directory listing). rclone supports custom headers (`--header`), so `Authorization` and `X-C4GH-Public-Key` can be passed.
- WebDAV: more capable (directory listing, metadata), but requires implementing a WebDAV interface.
- S3: most complex. Requires full S3 API compatibility.
As noted in #1680: "rclone does not require full S3 support, there are lots of standard protocols that rclone supports, maybe it's possible with simple http for example."
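As a rough illustration of the HTTP option, the sketch below assembles an rclone invocation that uses the HTTP backend on the fly and passes both headers with `--header`. The base URL, the `Bearer` scheme, the base64 encoding of the public key, and the function name are assumptions for illustration only; whether the download API can serve a listing rclone accepts at that URL is exactly what this issue sets out to investigate.

```python
import shlex


def build_rclone_copy_command(base_url, token, public_key_b64, dest):
    """Return an argv list that copies everything under base_url to dest.

    Assumptions (not confirmed by the download API): the token is sent as
    a Bearer token and the public key is sent base64-encoded.
    """
    return [
        "rclone", "copy",
        ":http:",  # on-the-fly HTTP remote, no rclone.conf entry needed
        dest,
        "--http-url", base_url,
        "--header", f"Authorization: Bearer {token}",
        "--header", f"X-C4GH-Public-Key: {public_key_b64}",
    ]


if __name__ == "__main__":
    # Hypothetical endpoint and placeholder credentials.
    cmd = build_rclone_copy_command(
        "https://download.example.org/listing/EXAMPLE-DATASET/",
        "TOKEN",
        "BASE64_PUBLIC_KEY",
        "./EXAMPLE-DATASET",
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids shell quoting problems with the header values, and `shlex.join` gives a copy-pasteable form for end-user documentation.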
Questions to investigate
- Can rclone's HTTP backend work with the current v2 API as-is? The API already provides dataset file listing (`GET /datasets/{datasetId}/files`) and file download (`GET /files/{fileId}`). The gap is that rclone's HTTP backend expects a browsable HTML directory listing, not a JSON API.
- Is a thin adapter needed? E.g. a lightweight layer that serves an HTML directory listing from the dataset files endpoint, with download links pointing to file endpoints.
- Header forwarding: rclone supports `--header` for custom headers, but does it forward them correctly on redirects? Same concern as htsget-rs.
- Crypt4GH complication: downloaded files are re-encrypted per recipient. rclone would download encrypted files. Is the user expected to decrypt locally with their private key, or do we need integration with crypt4gh-aware tooling?
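To make the "thin adapter" question concrete, here is a minimal sketch of the rendering step only: it turns a file list into an HTML index for one directory level. The JSON field names `fileId` and `filePath` are hypothetical, since the response shape of the files endpoint is not given here. The sketch emits relative links on the working assumption that rclone's HTTP backend only picks up links that resolve directly under the listing URL (to be verified), so the adapter would also need the returned name-to-ID map to route each link to `GET /files/{fileId}`.

```python
import html
from urllib.parse import quote


def render_listing(files, prefix=""):
    """Render an HTML index of the immediate children of `prefix`.

    `files` is a list of dicts with the hypothetical keys "fileId" and
    "filePath". `prefix` is "" for the dataset root or a directory path
    ending in "/". Returns (html_page, name_to_file_id).
    """
    dirs = set()
    name_to_id = {}
    for f in files:
        path = f["filePath"].lstrip("/")
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        head, sep, _ = rest.partition("/")
        if not head:
            continue  # skip empty names such as the prefix itself
        if sep:
            dirs.add(head + "/")  # trailing slash marks a subdirectory
        else:
            name_to_id[head] = f["fileId"]
    entries = sorted(dirs) + sorted(name_to_id)
    links = "\n".join(
        f'<a href="{quote(name)}">{html.escape(name)}</a><br>'
        for name in entries
    )
    return f"<html><body>\n{links}\n</body></html>", name_to_id
```

For example, calling `render_listing(files, "sub/")` lists only what sits directly inside `sub/`, which matches how a client walks a directory tree one level at a time.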
Acceptance criteria
- Document which rclone backend(s) are compatible with the download API
- If an adapter is needed, implement it (internal, not public API)
- Verify rclone can list and download a complete dataset with auth headers
- Document the rclone configuration for end users
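For the header verification step, one possible approach is a throwaway local probe: a tiny server that answers every path with a redirect to a target and records which headers arrive there. Pointing a client at it with the two custom headers then shows whether they survive the redirect. Everything below (paths, names, port handling) is an assumption made for this sketch, and it does not reproduce cross-host redirects, where clients may treat `Authorization` differently.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

SEEN = []  # one dict per request that reached the redirect target


class Probe(BaseHTTPRequestHandler):
    def _respond(self, send_body):
        if self.path == "/target.txt":
            # Record the headers of interest as received after the redirect.
            SEEN.append({
                "Authorization": self.headers.get("Authorization"),
                "X-C4GH-Public-Key": self.headers.get("X-C4GH-Public-Key"),
            })
            body = b"ok\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            if send_body:
                self.wfile.write(body)
        else:
            # Any other path redirects to the target on the same host.
            self.send_response(302)
            self.send_header("Location", "/target.txt")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def do_GET(self):
        self._respond(send_body=True)

    def do_HEAD(self):
        self._respond(send_body=False)

    def log_message(self, fmt, *args):
        pass  # keep the probe quiet


def start_probe(port=0):
    """Start the probe in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), Probe)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = start_probe()`, the chosen port is `server.server_address[1]`. Requesting any path other than `/target.txt` with the two headers set, then inspecting `SEEN`, shows what the client sent after following the redirect; `None` values mean a header was dropped.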
Additional context
- US11 is priority could, so this is lower priority than htsget-rs (US8, must)
- sda-cli already handles dataset downloads (US9) — rclone would be an alternative for users who prefer standard tooling
- The investigation outcome may be "current API already works with rclone HTTP backend + custom headers" — in which case only documentation is needed
Estimation of size
small (investigation) / medium (if adapter needed)
Estimation of priority
low (US11 is could)