MIDRC-1228 Multipart upload support#104
Conversation
| if request.method == "GET" and path == "s3": | ||
| err_msg = f"'ls' not supported, use 'ls s3://{user_bucket}' instead" | ||
| logger.error(err_msg) | ||
| raise HTTPException(HTTP_400_BAD_REQUEST, err_msg) |
|
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
| # The `httpx_client` parameter is not meant to be used in production. It allows mocking | ||
| # external calls when testing. | ||
| app.async_client = httpx_client or httpx.AsyncClient( | ||
| transport=httpx.AsyncHTTPTransport(retries=3), timeout=120 |
There was a problem hiding this comment.
we can fine-tune this during the testing phase
There was a problem hiding this comment.
S3 Connection errors being handled here is nice! Thanks! :)
There was a problem hiding this comment.
hopefully it's enough!
Pull Request Test Coverage Report for Build 22645891739Details
💛 - Coveralls |
|
Failed to Prepare CI environment Please find the Github Action logs here |
|
Failed to Prepare CI environment Please find the Github Action logs here |
|
Failed to Prepare CI environment Please find the Github Action logs here |
nss10
left a comment
There was a problem hiding this comment.
Changes look good. A couple of questions and then good to approve.
| # The `httpx_client` parameter is not meant to be used in production. It allows mocking | ||
| # external calls when testing. | ||
| app.async_client = httpx_client or httpx.AsyncClient( | ||
| transport=httpx.AsyncHTTPTransport(retries=3), timeout=120 |
There was a problem hiding this comment.
S3 Connection errors being handled here is nice! Thanks! :)
| headers["x-amz-security-token"] = credentials.token | ||
|
|
||
| # if this is a PUT request, we need the KMS key ID to use for encryption | ||
| if config["KMS_ENCRYPTION_ENABLED"] and request.method == "PUT": |
There was a problem hiding this comment.
Do Multi-part uploads use POST? I'm wondering why didn't we check for method=POST in the past?
There was a problem hiding this comment.
Yes multipart uploads use POST. i guess we just didn't test with large files until now
| # to test a multipart upload, set the part size to 1 to force splitting the file | ||
| # into multiple parts: | ||
| Config=( | ||
| boto3.s3.transfer.TransferConfig(multipart_threshold=1) |
There was a problem hiding this comment.
According to boto docs, default part size for a multi-part upload seems to be 8MB (here). So, if the file size is 6MB, it will still be uploaded as a single part, even if the multipart threshold is set to 1 MB.
With the current setup, we can test whether multipart uploads are triggered. However, to properly test multipart uploads, you could either:
- Increase the file size (
in line:366) to something larger, such as 12 MB, or - Set
multipart_chunksizeto a smaller value (e.g.,3 MB5MB s3 minimum limit) so that the file is split into multiple parts during upload.
Alternatively, we could do both. For example:
- File size: 12 MB
- Multipart chunk size: 5 MB
This would result in the upload being split into 3 parts, ensuring gen3-workflow's handling of multipart behavior is actually put to test.
There was a problem hiding this comment.
Right now we get 3 calls: init upload, upload part, and complete upload.
The change you're suggesting would just add more "upload part" calls.
Which makes sense but i think that test belongs in integration tests. This unit test doesn't actually test much since all S3 calls are mocked, it really just tests that the code doesn't break for each of the 3 paths (init, upload part, complete). Wdyt?
There was a problem hiding this comment.
As long as we cover init, upload and complete part in the test, I think we are good.
|
Test summary after running integration tests
Test summary after rerunning failed integration tests
Please find the detailed integration test report here Please find the detailed integration test report after rerunning failed tests here Please find the Github Action logs here |
Link to JIRA ticket if there is one: https://ctds-planx.atlassian.net/browse/MIDRC-1228
New Features
Breaking Changes
Bug Fixes
Improvements
Dependency updates
Deployment changes