Add DELETE /storage/user-bucket endpoint #66
Conversation
…alled multiple times
gen3workflow/aws_utils.py
Outdated
```python
    f"Deleting all contents from '{user_bucket_name}' for user '{user_id}' before deleting the bucket"
)
keys = [{"Key": obj.get("Key")} for obj in object_list["Contents"]]
s3_client.delete_objects(Bucket=user_bucket_name, Delete={"Objects": keys})
```
Thanks for catching this. I totally missed it. Will add batching to this approach.
your update looks good, i'm curious why not use bucket.objects.all().delete()?
There's no reason not to use the other approach. Since we were already using the s3_client reference, my initial approach kept things uniform by working with that client. If it hadn't worked, I would have considered creating an s3_resource, but since this worked, I didn't explore the other option further.
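As context for the batching discussed above: S3's DeleteObjects API accepts at most 1,000 keys per call, so a list of keys has to be split into chunks. This is a minimal sketch of that chunking logic, not the PR's actual code; `delete_bucket_objects` and the stub client below are illustrative names.

```python
# S3's DeleteObjects API accepts at most 1000 keys per request
MAX_DELETE_BATCH = 1000

def delete_bucket_objects(s3_client, bucket_name, keys):
    """Delete `keys` (a list of {"Key": ...} dicts) in batches of up to 1000."""
    for start in range(0, len(keys), MAX_DELETE_BATCH):
        batch = keys[start : start + MAX_DELETE_BATCH]
        s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": batch})

class _StubS3Client:
    """Stand-in client that records batch sizes, for illustration only."""
    def __init__(self):
        self.batch_sizes = []

    def delete_objects(self, Bucket, Delete):
        self.batch_sizes.append(len(Delete["Objects"]))

stub = _StubS3Client()
keys = [{"Key": f"obj-{i}"} for i in range(1500)]
delete_bucket_objects(stub, "user-bucket", keys)
# with 1500 keys, the stub sees two calls: 1000 keys, then 500
```

The same chunking would work with a real boto3 client, since `delete_objects(Bucket=..., Delete={"Objects": [...]})` is the standard call signature.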
tests/test_misc.py
Outdated
```python
assert (
    e.value.response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 404
), f"Bucket exists: {e.value}"
```
i think the ClientError is raised before this assert statement runs?
```python
# Verify the bucket is deleted
with pytest.raises(ClientError) as e:
    aws_utils.s3_client.head_bucket(Bucket=bucket_name)
assert (
```
same comment here about ClientError vs assert statement
I think this was indented appropriately. No?
yeah sorry, i think i got this wrong, the assert statement can/should be inside of the with block here. I confused it with a try block. Could you revert that? my bad
I don't understand. I think the assert statement must be outside the with block. (reference here)
whichever is fine.
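For the record on this thread: with `pytest.raises`, any code after the raising statement inside the `with` block never runs, and `e.value` is only populated once the block exits, so the assertion does belong outside the block. A minimal sketch, using a fake exception class (`FakeClientError` is a stand-in for botocore's `ClientError`, not the real class):

```python
import pytest

class FakeClientError(Exception):
    """Stand-in for botocore's ClientError, carrying a response dict."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.response = {"ResponseMetadata": {"HTTPStatusCode": status}}

def head_bucket(Bucket):
    # simulates head_bucket against a deleted bucket
    raise FakeClientError(404)

with pytest.raises(FakeClientError) as e:
    head_bucket(Bucket="user-bucket")
    # any line here would never execute: the raise exits the with block

# e.value is only set after the with block exits,
# so assertions on it must live out here
status = e.value.response.get("ResponseMetadata", {}).get("HTTPStatusCode")
```

This is the same structure as the test hunk above, just with the AWS call stubbed out.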
tests/test_misc.py
Outdated
```python
# Since we are unable to use the put_object API due to a probable bug in moto,
# we will monkeypatch the list_objects_v2 API to get a list of objects in the bucket.
# Add 1500 objects to the bucket, to ensure batching is working correctly
with patch.object(
    aws_utils.s3_client, "list_objects_v2", get_dummy_object_list(bucket_name, 1500)
```
The problem with this approach is that we're not testing that the delete_object call actually works against a bucket with objects, because it's effectively called against an empty bucket. What's the problem with moto?
Also, does s3_client.delete_objects not raise an error if the object doesn't exist?
does s3_client.delete_objects not raise an error if the object doesn't exist?
Weirdly, no! I expected s3_client.delete_objects to raise an error if the object didn’t exist, but Moto doesn’t throw any errors in that case.
I wanted to add tests to verify that the batching logic works correctly without triggering errors. Initially, I considered mocking the delete_objects endpoint to throw an error when the input exceeds 1000 records, but that felt like overfitting, so I decided against it.
What's the problem with moto?
I also couldn’t create objects in the bucket for testing because Moto throws an error when using the putObject API with an encrypted bucket, even when making a signed request. (Relevant Code)
I see. Since in this case we're not testing the encryption, i think we could use an unencrypted bucket so we can test the deletion properly. I think the quick way would be to add a line to delete the bucket policy (and a comment linking to the previous issue so we know it's the same thing). A better way would be to reconfigure the service with disabled KMS for this test, but that sounds like a waste of time
I'm glad you brought this up. I realized that the list_objects_v2 function is limited to returning at most 1,000 objects per call, so I've added a function to properly fetch and delete them in batches.
Additionally, I’ve removed the bucket_policy enforcing encryption for the bucket in the unit test.
paulineribeyre left a comment
Looks good, no new comments. There are just a few conversations above that we still need to address/finalize.
… for list_objects_v2
```python
        user_id (str): The user's unique Gen3 ID.
        user_bucket_name (str): The name of the S3 bucket.
    """
    object_list = get_all_bucket_objects(user_bucket_name)
```
This is fine because this endpoint is only meant to be used in integration tests for now. But in general, it would be better practice to fetch 1000 objects and delete them, and then fetch the next 1000, rather than store in memory the full list of objects and then loop over them in batches of 1000.
If you don't want to spend time making this improvement now, could you add a TODO comment before the call to get_all_bucket_objects so we know to do it if we ever need to use this endpoint in production?
Thanks! Initially, I considered making the get_all_bucket_objects function a generator that accepts a batch_size parameter (capped at 1,000) and yields results one batch at a time. This approach would avoid the batching logic we’re currently applying in the delete function and could also serve as a form of pagination for end users if needed.
However, given the narrow scope of this use case right now, it felt somewhat over-engineered. If you anticipate us using this functionality in the near future, I'd be happy to allocate time to implement it. Otherwise, I can leave a TODO as you suggested, and we can probably backlog a ticket to revisit this later.
yea a comment is fine
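The generator idea floated above (fetch a page of up to 1,000 keys, yield it, then fetch the next) can be sketched with list_objects_v2-style `ContinuationToken` pagination. The names `iter_object_batches` and `make_fake_list_objects_v2` are hypothetical, and the fake pager stands in for a real boto3 `s3_client.list_objects_v2`; this is a sketch of the pattern, not the PR's implementation.

```python
def iter_object_batches(list_objects_v2, bucket_name, batch_size=1000):
    """Yield {"Key": ...} batches one page at a time, so the full object
    listing never has to sit in memory at once."""
    batch_size = min(batch_size, 1000)  # list_objects_v2 caps MaxKeys at 1000
    token = None
    while True:
        kwargs = {"Bucket": bucket_name, "MaxKeys": batch_size}
        if token:
            kwargs["ContinuationToken"] = token
        page = list_objects_v2(**kwargs)
        contents = page.get("Contents", [])
        if contents:
            yield [{"Key": obj["Key"]} for obj in contents]
        if not page.get("IsTruncated"):
            break
        token = page.get("NextContinuationToken")

def make_fake_list_objects_v2(total):
    """Build a fake list_objects_v2 that pages through `total` dummy keys,
    mimicking the ContinuationToken/IsTruncated contract."""
    keys = [f"obj-{i}" for i in range(total)]
    def fake(Bucket, MaxKeys=1000, ContinuationToken=None):
        start = int(ContinuationToken) if ContinuationToken else 0
        page_keys = keys[start : start + MaxKeys]
        next_start = start + len(page_keys)
        return {
            "Contents": [{"Key": k} for k in page_keys],
            "IsTruncated": next_start < total,
            "NextContinuationToken": str(next_start),
        }
    return fake

batch_sizes = [
    len(batch)
    for batch in iter_object_batches(make_fake_list_objects_v2(2500), "user-bucket")
]
# 2500 dummy keys arrive as three batches: 1000, 1000, 500
```

Each yielded batch is already in the shape `delete_objects` expects, so a caller could delete page by page instead of materializing the whole listing first.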
Link to JIRA ticket if there is one: MIDRC-945
New Features
`DELETE /storage/user-bucket` that lets the user delete the bucket along with everything in it
Bug Fixes
Improvements
Dependency updates
Deployment changes