Conversation

Contributor

@mhl-b mhl-b commented Oct 3, 2024

Introduce a new S3 repository setting, `max_multipart_parts`. Currently, with default settings, to upload large files to S3 we split them into 5TB chunks and then split each chunk into parts of up to 100MB. That means a single chunk may require up to 50k parts, while the S3 limit is 10k, so attempts to upload files larger than 1TB fail with default settings.

With the new setting we split files by the smaller of `chunk_size` and `buffer_size x max_multipart_parts`. With the default values of `buffer_size=100MB` and `max_multipart_parts=10_000`, the maximum "actual" chunk size becomes ~1TB rather than 5TB.
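
For illustration, a minimal sketch of that computation in plain Java, assuming the default values above; the variable names are hypothetical and not taken from the implementation:

    public class EffectiveChunkSizeDemo {
        public static void main(String[] args) {
            // Defaults described above; names are illustrative only.
            long bufferSizeBytes = 100L * 1024 * 1024;             // buffer_size = 100MB
            int maxMultipartParts = 10_000;                        // max_multipart_parts = 10,000
            long chunkSizeBytes = 5L * 1024 * 1024 * 1024 * 1024;  // chunk_size = 5TB (S3 object size limit)

            // Effective chunk size: the smaller of chunk_size and buffer_size * max_multipart_parts.
            long effectiveChunkSize = Math.min(chunkSizeBytes, bufferSizeBytes * maxMultipartParts);

            // Prints 1048576000000 (~1TB), which keeps each chunk within S3's 10,000-part multipart limit.
            System.out.println(effectiveChunkSize);
        }
    }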

This PR is a successor to #112708

@mhl-b mhl-b added the >enhancement, :Distributed Coordination/Snapshot/Restore, Team:Distributed (Obsolete), and v9.0.0 labels Oct 3, 2024
Contributor

github-actions bot commented Oct 3, 2024

Documentation preview:

@elasticsearchmachine
Collaborator

Hi @mhl-b, I've created a changelog YAML for you.

@mhl-b mhl-b marked this pull request as ready for review October 3, 2024 01:36
@mhl-b mhl-b requested review from DaveCTurner and ywangd October 3, 2024 01:36
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@mhl-b mhl-b requested review from nicktindall and removed request for ywangd October 3, 2024 01:38
Contributor

@nicktindall nicktindall left a comment

LGTM

Contributor

@DaveCTurner DaveCTurner left a comment

I'll take a deeper look later but I'd definitely like a different name for this setting. Could we call it max_multipart_parts?

@mhl-b mhl-b changed the title from "Add parts_number setting to S3 repository" to "Add max_multipart_parts setting to S3 repository" Oct 3, 2024
@mhl-b mhl-b requested a review from DaveCTurner October 3, 2024 18:07
Contributor

@DaveCTurner DaveCTurner left a comment

LGTM except some suggestions for the docs and a naming nit. We can iterate on the docs separately if you'd prefer.

Comment on lines 264 to 267
(<<byte-units,byte value>>) Big files can be broken down into chunks during snapshotting if needed.
Specify the chunk size as a value and unit, for example:
When large file split into chunks, the chunk size will be defined by smallest of `chunk_size`
or `buffer_size * max_multipart_parts`. Specify the chunk size as a value and unit, for example:
`1TB`, `1GB`, `10MB`. Defaults to the maximum size of a blob in the S3 which is `5TB`.
Contributor

I never really liked the wording here to start with :) How about something like this?

    (<<byte-units,byte value>>) The maximum size of object that {es} will write to the repository when creating a snapshot. Files which are
    larger than `chunk_size` will be chunked into several smaller objects. {es} may also split a file across multiple objects to satisfy
    other constraints such as the `max_multipart_parts` limit. Defaults to `5TB` which is the
    https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html[maximum size of an object in AWS S3].

Contributor Author

@mhl-b mhl-b Oct 3, 2024

++ on doc changes, thank you. Will include in this PR

Comment on lines 298 to 302
(<<number,numeric>>) Maximum number of parts for multipart upload. When large file split into
chunks, the chunk size will be defined by smallest of `chunk_size` or `buffer_size * max_multipart_parts`.
Default value is 10,000, also see https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html[S3 multipart upload limits].
For example, with `buffer_size=100MB` and `max_multipart_parts=10,000` summation of all parts is about 1TB.
If chunk_size is set to 5TB then smallest between two would be 1TB.
Contributor

I'd be inclined not to dwell too much on the calculations here, how about something like this?

Suggested change
(<<number,numeric>>) Maximum number of parts for multipart upload. When large file split into
chunks, the chunk size will be defined by smallest of `chunk_size` or `buffer_size * max_multipart_parts`.
Default value is 10,000, also see https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html[S3 multipart upload limits].
For example, with `buffer_size=100MB` and `max_multipart_parts=10,000` summation of all parts is about 1TB.
If chunk_size is set to 5TB then smallest between two would be 1TB.
(<<number,integer>>) The maximum number of parts that {es} will write during a multipart upload of a single object. Files which are
larger than `buffer_size × max_multipart_parts` will be chunked into several smaller objects. {es} may also split a file across multiple
objects to satisfy other constraints such as the `chunk_size` limit. Defaults to `10000` which is the
https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html[maximum number of parts in a multipart upload in AWS S3].

* @param partSize part size in s3 or buffer_size
* @param partsNum number of parts(buffers)
*/
static ByteSizeValue objectSizeLimit(ByteSizeValue objectSize, ByteSizeValue partSize, int partsNum) {
Contributor

Nit: could we align the argument names with the names of the variables passed in? I know they're kinda weird names but we can't change the settings' names for legacy reasons and we're following that naming elsewhere.
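
For reference, a hedged sketch of how the helper could read with the argument names aligned to the settings; `chunkSize`, `bufferSize` and `maxPartsNum` are illustrative choices rather than the merged code, and `ByteSizeValue` is assumed to be Elasticsearch's byte-size unit class:

    /**
     * Illustrative sketch only, not the merged implementation: the effective per-object
     * size limit is the smaller of the configured chunk_size and bufferSize * maxPartsNum.
     *
     * @param chunkSize   the configured chunk_size
     * @param bufferSize  the configured buffer_size (multipart part size)
     * @param maxPartsNum the configured max_multipart_parts
     */
    static ByteSizeValue objectSizeLimit(ByteSizeValue chunkSize, ByteSizeValue bufferSize, int maxPartsNum) {
        long partsLimitBytes = bufferSize.getBytes() * maxPartsNum;
        return chunkSize.getBytes() <= partsLimitBytes ? chunkSize : ByteSizeValue.ofBytes(partsLimitBytes);
    }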

@mhl-b mhl-b merged commit fc1bee2 into elastic:main Oct 4, 2024
16 checks passed
@DaveCTurner
Contributor

As discussed elsewhere, it'd be good to backport this to 8.x too. I'm adding the appropriate labels.

mhl-b added a commit to mhl-b/elasticsearch that referenced this pull request Oct 4, 2024