Optional DRS upload, update & delete extensions #418

Open
grsr wants to merge 30 commits into develop from feature/issue-416-drs-upload

@grsr
Collaborator

@grsr grsr commented Nov 2, 2025

DRS Upload, Update and Delete Extensions

This PR adds optional upload, delete, access method update and checksum addition endpoints to the DRS specification, enabling complete DRS object lifecycle management. It also includes a new endpoint supporting fetching DRS objects by checksum value. Existing DRS servers remain fully compliant without implementing these extensions.

New Endpoints

Upload

  • POST /upload-request - Negotiate upload methods and credentials for file uploads
  • POST /objects/register - Register uploaded files or existing data as DRS objects

Delete

  • PUT /objects/{object_id}/delete - Delete single DRS object with optional storage data removal
  • PUT /objects/delete - Bulk delete DRS objects with optional storage data removal

Update Access Methods

  • PUT /objects/{object_id}/access-methods - Update access methods for a single DRS object
  • PUT /objects/access-methods - Update access methods for multiple DRS objects

Add Checksums

  • PUT /objects/{object_id}/checksums - Add checksum to a single DRS object
  • PUT /objects/checksums - Add checksums for multiple DRS objects

Fetch by checksum (due to @kellrott)

  • GET /objects/checksum/{checksum} - Fetch DRS objects that match the supplied checksum

Service Discovery

New service-info flags for granular capability advertisement:

  • uploadRequestSupported / objectRegistrationSupported
  • deleteSupported / deleteStorageDataSupported
  • accessMethodUpdateSupported / validateAccessMethods
  • checksumAdditionSupported / validateChecksums
  • Bulk operation limits: maxUploadRequestLength, maxRegisterRequestLength, maxBulkDeleteLength, maxBulkAccessMethodUpdateLength, maxBulkChecksumAdditionLength
  • fetchByChecksumSupported
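
A client would presumably read these flags before calling any of the optional endpoints. The following is a minimal sketch of that check, plus batching against an advertised bulk limit. It assumes the flags arrive as a flat mapping (the real service-info nesting may differ), that a missing flag means "not supported", and that a missing limit means a single request is acceptable. The helper names `supports` and `chunk_for_bulk` are hypothetical.

```python
from typing import Any, Iterator


def supports(service_info: dict[str, Any], flag: str) -> bool:
    """Assumption: an absent flag means the optional extension is not offered."""
    return service_info.get(flag) is True


def chunk_for_bulk(
    ids: list[str], service_info: dict[str, Any], limit_key: str
) -> Iterator[list[str]]:
    """Split object IDs into batches no larger than the advertised limit."""
    limit = service_info.get(limit_key)
    if not isinstance(limit, int) or limit < 1:
        # Assumption: no usable limit advertised, so send everything at once.
        limit = len(ids) or 1
    for start in range(0, len(ids), limit):
        yield ids[start:start + limit]


# Illustrative service-info fragment; the values are made up.
info = {"deleteSupported": True, "maxBulkDeleteLength": 2}

if supports(info, "deleteSupported"):
    batches = list(chunk_for_bulk(["a", "b", "c"], info, "maxBulkDeleteLength"))
```

Each batch could then be sent as one PUT /objects/delete request.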

This approach supports GA4GH Passports, Bearer tokens, and API keys for authentication. We use PUT methods rather than DELETE for delete operations to allow passports to be used in request bodies (as support for request bodies in DELETE methods is not guaranteed in all HTTP infrastructure).
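
To illustrate why a request body matters here, the sketch below builds (but does not send) a single-object delete request using PUT with a JSON body. The body field names `passports` and `delete_storage_data`, the base URL, and the helper `build_delete_request` are assumptions for illustration; the OpenAPI spec in this PR is the authority on the actual schema.

```python
import json
import urllib.parse
import urllib.request


def build_delete_request(
    base_url: str,
    object_id: str,
    passports: list[str],
    delete_storage_data: bool = False,
) -> urllib.request.Request:
    """Build a PUT /objects/{object_id}/delete request carrying passports in the body."""
    # Assumed body shape; check the generated OpenAPI YAML for the real field names.
    body = {"passports": passports, "delete_storage_data": delete_storage_data}
    quoted_id = urllib.parse.quote(object_id, safe="")
    return urllib.request.Request(
        url=f"{base_url}/objects/{quoted_id}/delete",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical server URL and object ID; the passport is a placeholder string.
req = build_delete_request(
    "https://drs.example.org/ga4gh/drs/v1", "obj-123", ["<passport-jwt>"]
)
```

A client using Bearer tokens or API keys instead would presumably add an `Authorization` header and send an empty passports list.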

The details and motivation for the proposal are included in the documentation for the new endpoints, which are rendered here: https://ga4gh.github.io/data-repository-service-schemas/preview/feature/issue-416-drs-upload/docs/

NB: The auto-rendered documentation is not updating and shows stale content; please refer to the files in the most recent commit. The generated OpenAPI spec YAML is up to date and can be viewed by pasting the URL into the Swagger editor. The main prose markdown docs are linked below:

The corresponding issue is #417 but please also refer to the discussion for issue #416, which this PR should be considered a continuation of.

@grsr grsr mentioned this pull request Nov 2, 2025
@grsr
Collaborator Author

grsr commented Nov 4, 2025

I have made a flurry of updates, including changes to minor details of the spec, and added more docs on implementation considerations. I think this proposal is now fairly stable, and I'd very much welcome any comments or feedback. I will present this proposal in the Cloud WS call on November 10th.

@grsr grsr changed the title add proposed DRS upload and delete specification and documentation DRS upload and delete optional extensions Nov 4, 2025
@grsr grsr marked this pull request as draft November 4, 2025 21:32
@grsr grsr self-assigned this Nov 5, 2025
@grsr grsr requested a review from briandoconnor November 5, 2025 23:33
@grsr grsr changed the title DRS upload and delete optional extensions OpyDRS upload and delete optional extensions Nov 9, 2025
@grsr grsr changed the title OpyDRS upload and delete optional extensions Optional DRS upload and delete extensions Nov 9, 2025
@andrew-nimbus

A couple quick questions:

  • If the server supports delete_storage_data, but the user has permission to delete a DRS object but not the underlying storage data--is a 401 returned upon a delete request (w/ delete_storage_data=True) and the DRS object is not deleted? Or, should there be another code for this corner case?
  • Is there a use case where one would upload files, but not register them via DRS? If so, is there a potential problem with /uploadrequest creating state via pre-assigned DRS object IDs? On the other hand, if registration must always happen--any thoughts on how the server should handle an uploadrequest and upload without registration?

Apologies if I missed answers to these in my skim of the docs!

@grsr grsr mentioned this pull request Jan 11, 2026
@grsr grsr marked this pull request as ready for review January 11, 2026 21:31
@grsr
Collaborator Author

grsr commented Jan 11, 2026

A couple quick questions:

  • If the server supports delete_storage_data, but the user has permission to delete a DRS object but not the underlying storage data--is a 401 returned upon a delete request (w/ delete_storage_data=True) and the DRS object is not deleted? Or, should there be another code for this corner case?
  • Is there a use case where one would upload files, but not register them via DRS? If so, is there a potential problem with /uploadrequest creating state via pre-assigned DRS object IDs? On the other hand, if registration must always happen--any thoughts on how the server should handle an uploadrequest and upload without registration?

Apologies if I missed answers to these in my skim of the docs!

Apologies for the slow response @andrew-nimbus, I missed this comment somehow! My initial responses below:

  • I was imagining that in this situation the server would return a 204 success because the DRS object metadata was deleted, even though the underlying data in storage wasn't. My reason for this, arguably misleading, response is that it might take a long time to delete some files, or the server may put files in a delete list and then process them hours later, and so I don't think it's possible to return a useful 4XX message in general to a synchronous delete request. For the happy path where the server can delete the data synchronously (or is happy to confirm to the client that the data will be deleted), perhaps we could return some additional information to the client somehow? Thoughts welcome.
  • In the current proposal /upload-request does not create any state in the DRS server and no DRS IDs are created. If a client does not subsequently register uploaded data with /objects/register after the upload, then a DRS object will not be created. I chose this route in order to avoid intermediate state in the DRS server, but this comes at the cost of intermediate state in the underlying storage. In our implementation we mitigate this by using an S3 bucket with a short lifecycle policy as the "dropzone" for uploads, and then move the data to more permanent storage once the DRS object is created (and update the access_methods accordingly). This also mitigates incomplete (multipart) uploads from clients. Ideally we would implement some form of transaction wrapping around the 3 upload phases (/upload-request, the actual data upload to the negotiated storage location, and /objects/register), but given that the middle phase might take a long time and depends on services DRS knows nothing about, I concluded that this proposal was the least worst option! This also allows /objects/register to be used straightforwardly for objects that are already in suitable storage and for which /upload-request was not necessary, because in either case there is no pre-existing DRS ID. Further comments also very welcome.
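
The behaviour described in the two answers above can be modelled with a toy in-memory server. This is only a sketch of the semantics as I read them, not the proposal's API: the class `ToyDrsServer`, its method names, the `s3://dropzone-bucket/...` location format, and the 404 for an unknown ID are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyDrsServer:
    """In-memory stand-in for a DRS server (hypothetical, for illustration only)."""
    objects: dict[str, dict] = field(default_factory=dict)    # registered DRS objects
    pending_deletes: list[str] = field(default_factory=list)  # storage URLs to remove later

    def upload_request(self, file_name: str) -> dict:
        # Phase 1: return a dropzone location; no server state, no DRS ID.
        return {"upload_url": f"s3://dropzone-bucket/{uuid.uuid4()}/{file_name}"}

    def register(self, name: str, storage_url: str) -> str:
        # Phase 3: the DRS ID only comes into existence at registration.
        object_id = str(uuid.uuid4())
        self.objects[object_id] = {"name": name, "access_url": storage_url}
        return object_id

    def delete(self, object_id: str, delete_storage_data: bool = False) -> int:
        obj = self.objects.pop(object_id, None)
        if obj is None:
            return 404  # assumption: unknown IDs are reported as not found
        if delete_storage_data:
            # Deferred: a later job removes the bytes, so the response cannot
            # report whether storage deletion eventually succeeds.
            self.pending_deletes.append(obj["access_url"])
        return 204  # metadata is gone, whatever happens to the stored data


server = ToyDrsServer()
grant = server.upload_request("sample.bam")
objects_after_request = dict(server.objects)  # still empty at this point

# Phase 2 (the byte transfer to grant["upload_url"]) happens outside DRS.
object_id = server.register("sample.bam", grant["upload_url"])
status = server.delete(object_id, delete_storage_data=True)
```

If the client never calls `register`, the only leftover is the file in the dropzone, which is what the short lifecycle policy is meant to clean up.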

@grsr grsr changed the title Optional DRS upload and delete extensions Optional DRS upload, update & delete extensions Jan 11, 2026
@grsr
Collaborator Author

grsr commented Mar 9, 2026

@kellrott my GitHub skills failed me and so rather than merge in your fetch by checksum changes to this branch I have just added them as a new commit to this PR. I took the liberty of adding a service-info flag allowing servers to advertise if they support the new /objects/checksum/{checksum} endpoint.

@briandoconnor I think this should be good to merge into develop now as we discussed? Apologies for the delay in following up on this.
