Skip to content

s3: ability to gradually increase chunk size for multipart uploard#205

Closed
ali-gholami-aiven wants to merge 1 commit intomainfrom
dynamic-chunk-size-for-s3-multipart-upload
Closed

s3: ability to gradually increase chunk size for multipart uploard#205
ali-gholami-aiven wants to merge 1 commit intomainfrom
dynamic-chunk-size-for-s3-multipart-upload

Conversation

@ali-gholami-aiven
Copy link
Collaborator

About this change - What it does

Resolves: #xxxxx

Why this way

Copy link
Contributor

@giacomo-alzetta-aiven giacomo-alzetta-aiven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

I think we should probably add these to other storage backends as well though.

return chunks, chunk_size

def get_multipart_chunk_size(self, current_chunk_size: int, current_part_number: int) -> int:
if current_part_number % self.multipart_chunk_size_increase_rate == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe part numbers start at 1, but nevertheless, I'd rather be safe than sorry and add an explicit check for the 0 case to avoid accidental increases on the very first chunk:

Suggested change
if current_part_number % self.multipart_chunk_size_increase_rate == 0:
if current_part_number > 0 and current_part_number % self.multipart_chunk_size_increase_rate == 0:

mp_id = cmu_response["UploadId"]

while True:
chunk_size = self.get_multipart_chunk_size(current_chunk_size=chunk_size, current_part_number=part_number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think logging the chunk size change would make sense:

Suggested change
chunk_size = self.get_multipart_chunk_size(current_chunk_size=chunk_size, current_part_number=part_number)
old_chunk_size = chunk_size
chunk_size = self.get_multipart_chunk_size(current_chunk_size=chunk_size, current_part_number=part_number)
if chunk_size != old_chunk_size:
self.log.info("Updated chunk size from %s to %s at part_number %s", old_chunk_size, chunk_size, part_number)

@@ -395,7 +400,7 @@ def test_calculate_chunks_and_chunk_size(
) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should rename the test here

@ali-gholami-aiven ali-gholami-aiven force-pushed the dynamic-chunk-size-for-s3-multipart-upload branch 2 times, most recently from c50be5e to 588fd2f Compare May 2, 2025 15:01
@ali-gholami-aiven ali-gholami-aiven force-pushed the dynamic-chunk-size-for-s3-multipart-upload branch from 588fd2f to eca995d Compare May 2, 2025 15:06
@giacomo-alzetta-aiven
Copy link
Contributor

Closing this since we added #206

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants