
MIDRC-1228 Multipart upload support#104

Merged: paulineribeyre merged 4 commits into master from multipart-support on Mar 4, 2026

@paulineribeyre
Collaborator

Link to JIRA ticket if there is one: https://ctds-planx.atlassian.net/browse/MIDRC-1228

New Features

  • The S3 endpoint now supports multipart uploads

Breaking Changes

Bug Fixes

Improvements

Dependency updates

Deployment changes

Comment on lines -195 to -198
if request.method == "GET" and path == "s3":
    err_msg = f"'ls' not supported, use 'ls s3://{user_bucket}' instead"
    logger.error(err_msg)
    raise HTTPException(HTTP_400_BAD_REQUEST, err_msg)
Collaborator Author

This was removed in #102 and added back by mistake in #103

@github-actions

github-actions bot commented Mar 3, 2026

The style in this PR agrees with black. ✔️

This formatting comment was generated automatically by a script in uc-cdis/wool.

# The `httpx_client` parameter is not meant to be used in production. It allows mocking
# external calls when testing.
app.async_client = httpx_client or httpx.AsyncClient(
    transport=httpx.AsyncHTTPTransport(retries=3), timeout=120
)
Collaborator Author

we can fine-tune this during the testing phase

Contributor

S3 Connection errors being handled here is nice! Thanks! :)

Collaborator Author

hopefully it's enough!

@paulineribeyre requested a review from nss10 on March 3, 2026 22:35
@coveralls

Pull Request Test Coverage Report for Build 22645891739

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 17 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.3%) to 88.441%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| app.py | 2 | 96.88% |
| routes/s3.py | 15 | 86.52% |

Totals
Change from base Build 22595832763: +0.3%
Covered Lines: 658
Relevant Lines: 744

💛 - Coveralls

@github-actions

github-actions bot commented Mar 3, 2026

Failed to Prepare CI environment

Please find the Github Action logs here

@github-actions

github-actions bot commented Mar 4, 2026

Failed to Prepare CI environment

Please find the Github Action logs here

@github-actions

github-actions bot commented Mar 4, 2026

Failed to Prepare CI environment

Please find the Github Action logs here

Contributor

@nss10 left a comment

Changes look good. A couple of questions and then good to approve.

headers["x-amz-security-token"] = credentials.token

# if this is a PUT request, we need the KMS key ID to use for encryption
if config["KMS_ENCRYPTION_ENABLED"] and request.method == "PUT":
Contributor

Do multipart uploads use POST? I'm wondering why we didn't check for method=POST in the past.

Collaborator Author

Yes, multipart uploads use POST. I guess we just didn't test with large files until now.
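For reference, in the S3 REST API a multipart upload is made of three kinds of request, and only two of them are POSTs: starting the upload (CreateMultipartUpload, `POST /{key}?uploads`) and finishing it (CompleteMultipartUpload, `POST /{key}?uploadId=...`). Each part is sent as a PUT (UploadPart, `PUT /{key}?partNumber=N&uploadId=...`). The sketch below classifies a request by method and query parameters. `classify_s3_request` is a hypothetical helper, not gen3-workflow code, and it deliberately ignores headers, so for example CopyObject and UploadPartCopy (which add an `x-amz-copy-source` header) are not distinguished from PutObject and UploadPart.

```python
from typing import Mapping


def classify_s3_request(method: str, query: Mapping[str, str]) -> str:
    """Hypothetical helper: name the S3 object operation implied by an HTTP
    method and its query parameters. Only multipart-related cases and a plain
    PUT are recognized; headers are not inspected."""
    method = method.upper()
    if method == "POST" and "uploads" in query:
        # POST /{key}?uploads opens a multipart upload and returns an upload ID
        return "CreateMultipartUpload"
    if method == "PUT" and "partNumber" in query and "uploadId" in query:
        # PUT /{key}?partNumber=N&uploadId=ID sends one part of the object
        return "UploadPart"
    if method == "POST" and "uploadId" in query:
        # POST /{key}?uploadId=ID assembles the uploaded parts into the object
        return "CompleteMultipartUpload"
    if method == "DELETE" and "uploadId" in query:
        # DELETE /{key}?uploadId=ID discards an unfinished multipart upload
        return "AbortMultipartUpload"
    if method == "PUT" and not query:
        # a plain PUT /{key} with no sub-resource is a single-request upload
        return "PutObject"
    return "Other"


print(classify_s3_request("POST", {"uploads": ""}))  # CreateMultipartUpload
print(classify_s3_request("PUT", {"partNumber": "1", "uploadId": "abc"}))  # UploadPart
print(classify_s3_request("POST", {"uploadId": "abc"}))  # CompleteMultipartUpload
```

Under this model, a condition on `request.method == "PUT"` alone matches single-request uploads and individual parts, but never the POST requests that open and close a multipart upload.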

# to test a multipart upload, set the part size to 1 to force splitting the file
# into multiple parts:
Config=(
    boto3.s3.transfer.TransferConfig(multipart_threshold=1)
Contributor

According to the boto3 docs, the default part size for a multipart upload seems to be 8 MB (here). So if the file size is 6 MB, it will still be uploaded as a single part, even though multipart_threshold is set to 1 (that is, 1 byte).

With the current setup, we can test whether multipart uploads are triggered. However, to properly test multipart uploads, you could either:

  • Increase the file size (on line 366) to something larger, such as 12 MB, or
  • Set multipart_chunksize to a smaller value (e.g., 5 MB, the S3 minimum part size) so that the file is split into multiple parts during upload.

Alternatively, we could do both. For example:

  • File size: 12 MB
  • Multipart chunk size: 5 MB

This would split the upload into 3 parts (5 MB + 5 MB + 2 MB), ensuring that gen3-workflow's handling of multipart uploads is actually put to the test.
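The part count in this suggestion is a ceiling division, with one constraint: S3 rejects every part except the last if it is smaller than 5 MiB, which is why 3 MB is not a usable chunk size. The sketch below is a simplified model, not boto3 code. `expected_part_count` is a hypothetical helper, and it assumes (based on our reading of s3transfer, the library behind boto3's `upload_file`, and not verified here) that a too-small `multipart_chunksize` is raised to the 5 MiB minimum rather than rejected. It ignores the other S3 limits (5 GiB maximum part size, 10,000 parts). Sizes are in MiB (1024 × 1024 bytes), the unit boto3's `TransferConfig` defaults use.

```python
MIB = 1024 * 1024
S3_MIN_PART_SIZE = 5 * MIB  # minimum size of every part except the last one


def expected_part_count(file_size: int, chunk_size: int) -> int:
    """Hypothetical helper: number of UploadPart calls for a multipart upload
    of `file_size` bytes, assuming the chunk size is first raised to the
    S3 minimum part size."""
    effective_chunk = max(chunk_size, S3_MIN_PART_SIZE)
    # integer ceiling division: a shorter final part carries the remainder;
    # a multipart upload always has at least one part
    return max(1, (file_size + effective_chunk - 1) // effective_chunk)


print(expected_part_count(6 * MIB, 8 * MIB))  # 1 (boto3's default 8 MiB chunk size)
print(expected_part_count(12 * MIB, 5 * MIB))  # 3 (5 + 5 + 2 MiB)
print(expected_part_count(12 * MIB, 3 * MIB))  # 3 (3 MiB assumed to be raised to 5 MiB)
```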

Collaborator Author

Right now we get 3 calls: init upload, upload part, and complete upload.
The change you're suggesting would just add more "upload part" calls.
That makes sense, but I think that test belongs in the integration tests. This unit test doesn't actually test much since all S3 calls are mocked; it really just checks that the code doesn't break on each of the 3 paths (init, upload part, complete). WDYT?
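The point that more parts only add more "upload part" calls between a single init and a single complete can be illustrated with an in-memory stand-in for S3. `FakeMultipartStore` and `multipart_upload` are hypothetical and much simpler than both S3 and the mocks in the gen3-workflow tests: there are no ETags, no network, and no part-size checks, which is the only reason the tiny 1 KiB chunks below are accepted. The method names echo, but are not, the boto3 client methods `create_multipart_upload`, `upload_part` and `complete_multipart_upload`, whose real signatures take more arguments.

```python
from typing import Dict, List


class FakeMultipartStore:
    """Hypothetical in-memory stand-in for the three S3 multipart calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.pending: Dict[str, Dict[int, bytes]] = {}  # upload ID -> parts
        self.objects: Dict[str, bytes] = {}  # key -> completed object

    def create_multipart_upload(self, key: str) -> str:
        self.calls.append("init")
        upload_id = f"upload-{len(self.pending) + 1}"
        self.pending[upload_id] = {}
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, body: bytes) -> None:
        self.calls.append("upload part")
        self.pending[upload_id][part_number] = body

    def complete_multipart_upload(self, key: str, upload_id: str) -> None:
        self.calls.append("complete")
        # join the parts in ascending part-number order
        ordered = sorted(self.pending.pop(upload_id).items())
        self.objects[key] = b"".join(body for _, body in ordered)


def multipart_upload(
    store: FakeMultipartStore, key: str, data: bytes, chunk_size: int
) -> None:
    """Upload `data` as consecutive chunks; S3 part numbers start at 1."""
    upload_id = store.create_multipart_upload(key)
    for part_number, start in enumerate(range(0, len(data), chunk_size), start=1):
        store.upload_part(upload_id, part_number, data[start : start + chunk_size])
    store.complete_multipart_upload(key, upload_id)


store = FakeMultipartStore()
payload = bytes(range(256)) * 10  # 2560 bytes
multipart_upload(store, "example.bin", payload, chunk_size=1024)
print(store.calls)  # ['init', 'upload part', 'upload part', 'upload part', 'complete']
print(store.objects["example.bin"] == payload)  # True
```

Shrinking the chunk size only lengthens the run of "upload part" entries; the init and complete calls happen once either way.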

Contributor

As long as the test covers init, upload part, and complete, I think we are good.

@paulineribeyre requested a review from nss10 on March 4, 2026 17:25
@github-actions

github-actions bot commented Mar 4, 2026

Test summary after running integration tests

| filepath | passed | failed | SUBTOTAL |
| --- | --- | --- | --- |
| tests/test_gen3_workflow.py | 12 | 1 | 13 |
| TOTAL | 12 | 1 | 13 |

Test summary after rerunning failed integration tests

| filepath | passed | SUBTOTAL |
| --- | --- | --- |
| tests/test_gen3_workflow.py | 1 | 1 |
| TOTAL | 1 | 1 |

Please find the detailed integration test report here

Please find the detailed integration test report after rerunning failed tests here

Please find the Github Action logs here

@paulineribeyre merged commit 80e97ff into master on Mar 4, 2026
13 of 16 checks passed
@paulineribeyre deleted the multipart-support branch on March 4, 2026 20:29

3 participants