
Add DELETE /storage/user-bucket endpoint#66

Merged
nss10 merged 12 commits into master from chore/delete_storage_bucket
Mar 26, 2025
Conversation

nss10 (Contributor) commented Feb 25, 2025:

Link to JIRA ticket if there is one: MIDRC-945

New Features

  • Add new endpoint DELETE /storage/user-bucket that lets users delete their bucket along with all of its contents

Bug Fixes

Improvements

Dependency updates

Deployment changes

@github-actions commented:

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

coveralls commented Feb 25, 2025:

Pull Request Test Coverage Report for Build 14089800485

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 5 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.4%) to 84.06%

File with coverage reduction: aws_utils.py (5 new missed lines, now at 88.89% coverage)

Totals:
  • Change from base Build 13315751395: 0.4%
  • Covered Lines: 501
  • Relevant Lines: 596

💛 - Coveralls

@nss10 nss10 requested a review from paulineribeyre February 25, 2025 23:00
@nss10 nss10 marked this pull request as ready for review February 25, 2025 23:00
@nss10 nss10 removed the request for review from paulineribeyre February 26, 2025 00:08
@nss10 nss10 marked this pull request as draft February 26, 2025 00:08
@nss10 nss10 marked this pull request as ready for review March 6, 2025 15:38
@nss10 nss10 requested a review from paulineribeyre March 6, 2025 15:38
```python
    f"Deleting all contents from '{user_bucket_name}' for user '{user_id}' before deleting the bucket"
)
keys = [{"Key": obj.get("Key")} for obj in object_list["Contents"]]
s3_client.delete_objects(Bucket=user_bucket_name, Delete={"Objects": keys})
```
Collaborator:

According to the docs, this will fail if the list of keys is >1000.
You could delete in batches of 1000 manually, or try something like this (I'd recommend trying that first, it's probably optimized).

nss10 (Contributor, Author) commented Mar 10, 2025:

Thanks for catching this. I totally missed it. Will add batching to this approach.
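A minimal sketch of the batching fix discussed above: `delete_objects` accepts at most 1000 keys per request, so the key list is split into chunks. The helper names (`chunk_keys`, `delete_objects_in_batches`) are illustrative, not the PR's exact implementation.

```python
# Sketch: delete S3 objects in batches of 1000, the per-request limit of
# delete_objects. Function names are assumptions, not the PR's exact code.

def chunk_keys(keys, batch_size=1000):
    """Yield successive slices of at most batch_size keys."""
    for i in range(0, len(keys), batch_size):
        yield keys[i : i + batch_size]


def delete_objects_in_batches(s3_client, bucket_name, keys):
    """Delete every given key from the bucket, 1000 at a time."""
    for batch in chunk_keys(keys):
        s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": batch})
```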

Collaborator:

Your update looks good. I'm curious, why not use bucket.objects.all().delete()?

nss10 (Contributor, Author):

No reason not to use the other approach. Since we were already using the s3_client reference, my initial approach was to maintain uniformity by working within that solution. If it hadn’t worked, I would have considered creating an s3_resource. But since this worked, I didn’t explore the other option further.

@nss10 nss10 requested a review from paulineribeyre March 10, 2025 18:27
Comment on lines +69 to +71

```python
assert (
    e.value.response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 404
), f"Bucket exists: {e.value}"
```
Collaborator:

i think the ClientError is raised before this assert statement runs?

```python
# Verify the bucket is deleted
with pytest.raises(ClientError) as e:
    aws_utils.s3_client.head_bucket(Bucket=bucket_name)
assert (
```
Collaborator:

same comment here about ClientError vs assert statement

nss10 (Contributor, Author):

I think this was indented appropriately. No?

Collaborator:

Yeah sorry, I think I got this wrong: the assert statement can/should be inside of the with block here. I confused it with a try block. Could you revert that? My bad.

nss10 (Contributor, Author):

I don't understand. I think the assert statement must be outside the with block. (reference here)

Collaborator:

whichever is fine.
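The pattern the thread converges on can be sketched as follows. `pytest.raises` exits the `with` block the moment the exception is raised, so the assertion on the captured exception has to sit after the block; inside it, the code would never run. A stand-in `ClientError` and `head_bucket` are used here instead of botocore's, since this is an illustration rather than the PR's test.

```python
import pytest


class ClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError."""

    def __init__(self, response):
        super().__init__(response)
        self.response = response


def head_bucket(Bucket):
    # Simulates calling head_bucket against a deleted bucket.
    raise ClientError({"ResponseMetadata": {"HTTPStatusCode": 404}})


def test_bucket_deleted():
    with pytest.raises(ClientError) as e:
        head_bucket(Bucket="example-bucket")
    # The assert must live outside the with block: any code after the
    # raising call inside the block would never execute.
    assert (
        e.value.response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 404
    ), f"Bucket exists: {e.value}"
```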

Comment on lines +273 to +277

```python
# Since we are unable to use the put_object API due to a probable bug in moto,
# we will monkeypatch the list_objects_v2 API to get a list of objects in the bucket.
# Add 1500 objects to the bucket, to ensure batching is working correctly.
with patch.object(
    aws_utils.s3_client, "list_objects_v2", get_dummy_object_list(bucket_name, 1500)
```
Collaborator:

The problem with this approach is that we're not testing that the delete_object call actually works against a bucket with objects, because it's effectively called against an empty bucket. What's the problem with moto?

Also, does s3_client.delete_objects not raise an error if the object doesn't exist?

nss10 (Contributor, Author):

does s3_client.delete_objects not raise an error if the object doesn't exist?

Weirdly, no! I expected s3_client.delete_objects to raise an error if the object didn’t exist, but Moto doesn’t throw any errors in that case.

I wanted to add tests to verify that the batching logic works correctly without triggering errors. Initially, I considered mocking the delete_objects endpoint to throw an error when the input exceeds 1000 records, but that felt like overfitting, so I decided against it.

What's the problem with moto?

I also couldn’t create objects in the bucket for testing because Moto throws an error when using the putObject API with an encrypted bucket, even when making a signed request. (Relevant Code)

Collaborator:

I see. Since in this case we're not testing the encryption, I think we could use an unencrypted bucket so we can test the deletion properly. The quick way would be to add a line to delete the bucket policy (and a comment linking to the previous issue so we know it's the same thing). A better way would be to reconfigure the service with KMS disabled for this test, but that sounds like a waste of time.

nss10 (Contributor, Author):

I’m glad you brought this up—I realized that the list_objects_v2 function has a restriction of returning no more than 1,000 files per operation. I’ve added a function to properly fetch and delete them in batches.

Additionally, I’ve removed the bucket_policy enforcing encryption for the bucket in the unit test.
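Fetching and deleting in batches, as described above, could be sketched by walking the bucket with `list_objects_v2`'s continuation token. The function name and structure are assumptions, not the PR's exact code.

```python
# Sketch: delete every object, fetching and deleting one page (at most
# 1000 keys) at a time via list_objects_v2's continuation token.
def delete_all_objects(s3_client, bucket_name):
    kwargs = {"Bucket": bucket_name}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        contents = page.get("Contents", [])
        if contents:
            keys = [{"Key": obj["Key"]} for obj in contents]
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```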

@nss10 nss10 requested a review from paulineribeyre March 14, 2025 01:53
@nss10 nss10 requested a review from paulineribeyre March 23, 2025 08:23
paulineribeyre (Collaborator) left a comment:

Looks good, no new comments. Just a few conversations above we still need to address/finalize

@nss10 nss10 requested a review from paulineribeyre March 24, 2025 21:39
```python
        user_id (str): The user's unique Gen3 ID.
        user_bucket_name (str): The name of the S3 bucket.
    """
    object_list = get_all_bucket_objects(user_bucket_name)
```
Collaborator:

This is fine because this endpoint is only meant to be used in integration tests for now. But in general, it would be better practice to fetch 1000 objects and delete them, and then fetch the next 1000, rather than store in memory the full list of objects and then loop over them in batches of 1000.

If you don't want to spend time making this improvement now, could you add a TODO comment before the call to get_all_bucket_objects so we know to do it if we ever need to use this endpoint in production?

nss10 (Contributor, Author):

Thanks! Initially, I considered making the get_all_bucket_objects function a generator that accepts a batch_size parameter (capped at 1,000) and yields results one batch at a time. This approach would avoid the batching logic we’re currently applying in the delete function and could also serve as a form of pagination for end users if needed.

However, given the narrow scope of this use case right now, it felt somewhat over-engineered. If you anticipate us using this functionality in the near future, I'd be happy to allocate time to implement it. Otherwise, I can leave a TODO as you suggested, and we can probably backlog a ticket to revisit this later.
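The generator idea described here might look like the sketch below: yield objects one page at a time (capped at `list_objects_v2`'s 1000-key page size) instead of materializing the full listing. The name `iter_bucket_objects` and the `batch_size` handling are assumptions.

```python
# Sketch of a generator that streams bucket contents in batches, so the
# caller never holds the full object listing in memory. Names are
# illustrative, not the PR's exact code.
def iter_bucket_objects(s3_client, bucket_name, batch_size=1000):
    """Yield lists of at most batch_size objects from the bucket."""
    kwargs = {"Bucket": bucket_name, "MaxKeys": min(batch_size, 1000)}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        contents = page.get("Contents", [])
        if contents:
            yield contents
        if not page.get("IsTruncated"):
            return
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

A caller could then delete each batch as it is yielded, which also addresses the reviewer's point about not storing the full object list in memory.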

Collaborator:

yea a comment is fine

@nss10 nss10 requested a review from paulineribeyre March 26, 2025 19:07
@nss10 nss10 merged commit 8de8a62 into master Mar 26, 2025
8 checks passed
@nss10 nss10 deleted the chore/delete_storage_bucket branch July 21, 2025 21:18