Add max_multipart_parts setting to S3 repository #113989
Conversation
Documentation preview:
Hi @mhl-b, I've created a changelog YAML for you.
Pinging @elastic/es-distributed (Team:Distributed)
LGTM
I'll take a deeper look later but I'd definitely like a different name for this setting. Could we call it `max_multipart_parts`?
LGTM except some suggestions for the docs and a naming nit. We can iterate on the docs separately if you'd prefer.
(<<byte-units,byte value>>) Big files can be broken down into chunks during snapshotting if needed. When a large file is split into chunks, the chunk size will be defined by the smallest of `chunk_size` or `buffer_size * max_multipart_parts`. Specify the chunk size as a value and unit, for example: `1TB`, `1GB`, `10MB`. Defaults to the maximum size of a blob in S3, which is `5TB`.
I never really liked the wording here to start with :) How about something like this?
(<<byte-units,byte value>>) The maximum size of object that {es} will write to the repository when creating a snapshot. Files which are
larger than `chunk_size` will be chunked into several smaller objects. {es} may also split a file across multiple objects to satisfy
other constraints such as the `max_multipart_parts` limit. Defaults to `5TB` which is the
https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html[maximum size of an object in AWS S3].
++ on doc changes, thank you. Will include in this PR
(<<number,numeric>>) Maximum number of parts for a multipart upload. When a large file is split into chunks, the chunk size will be defined by the smallest of `chunk_size` or `buffer_size * max_multipart_parts`. The default value is 10,000; see also the https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html[S3 multipart upload limits]. For example, with `buffer_size=100MB` and `max_multipart_parts=10,000` the total size of all parts is about 1TB. If `chunk_size` is set to 5TB then the smaller of the two would be 1TB.
I'd be inclined not to dwell too much on the calculations here, how about something like this?
(<<number,integer>>) The maximum number of parts that {es} will write during a multipart upload of a single object. Files which are larger than `buffer_size × max_multipart_parts` will be chunked into several smaller objects. {es} may also split a file across multiple objects to satisfy other constraints such as the `chunk_size` limit. Defaults to `10000` which is the https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html[maximum number of parts in a multipart upload in AWS S3].
 * @param partSize part size in s3 or buffer_size
 * @param partsNum number of parts (buffers)
 */
static ByteSizeValue objectSizeLimit(ByteSizeValue objectSize, ByteSizeValue partSize, int partsNum) {
Nit: could we align the argument names with the names of the variables passed in? I know they're kinda weird names but we can't change the settings' names for legacy reasons and we're following that naming elsewhere.
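For reference, here is a minimal sketch of what this helper might compute, based on the behaviour described in this thread. The exact implementation and argument names in the PR may differ; it assumes the `ByteSizeValue#getBytes` and `ByteSizeValue.ofBytes` helpers from the Elasticsearch codebase.

```java
/**
 * Upper bound on the size of a single object written to S3.
 *
 * @param objectSize the configured chunk_size
 * @param partSize   part size in S3, i.e. buffer_size
 * @param partsNum   maximum number of parts (buffers), i.e. max_multipart_parts
 */
static ByteSizeValue objectSizeLimit(ByteSizeValue objectSize, ByteSizeValue partSize, int partsNum) {
    // A multipart upload can hold at most partsNum parts of partSize bytes each, so the
    // effective object size limit is the smaller of that product and the configured chunk_size.
    long maxMultipartBytes = partSize.getBytes() * partsNum;
    return maxMultipartBytes < objectSize.getBytes() ? ByteSizeValue.ofBytes(maxMultipartBytes) : objectSize;
}
```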
As discussed elsewhere, it'd be good to backport this to
(cherry picked from commit fc1bee2)
Introduces a new S3 repository setting, `max_multipart_parts`. Currently, with default settings, to upload large files to S3 we split them into 5TB chunks and then split each chunk into parts of up to 100MB. That means we would need to send up to 50k parts, while the S3 limit is 10k, so attempts to upload files larger than 1TB fail with default settings. With the new setting we split files by the smallest of `chunk_size` or `buffer_size x max_multipart_parts`. With the default values of `buffer_size=100MB` and `max_multipart_parts=10_000`, the maximum "actual" chunk size will be ~1TB, not 5TB.

This PR is a successor to #112708
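For illustration, here is a hedged example of how the settings fit together when registering an S3 repository. The repository and bucket names are placeholders; `bucket`, `buffer_size`, and `chunk_size` are existing S3 repository settings, and `max_multipart_parts` is the setting added by this PR. With these values the effective chunk size is min(`5TB`, `100MB` × 10,000), i.e. about 1TB.

```
PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "buffer_size": "100mb",
    "max_multipart_parts": 10000,
    "chunk_size": "5tb"
  }
}
```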