Spark job with AWS S3 or S3 compatible storage, s3n file size limit? #130
heungheung asked this question in Query
Are we still using s3n:// for Spark jobs with AWS S3 or S3-compatible storage? If yes, is there any plan to switch to s3a for AWS? How about other S3-compatible storage?
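For context: as far as I can tell, switching an individual job over to s3a is only a matter of Hadoop configuration, so it would not have to wait for a cluster-wide change. A minimal sketch of what I mean, assuming Spark 2.x with the hadoop-aws module (and its AWS SDK dependency) on the classpath; the bucket, endpoint, and credential values below are placeholders, not from our setup:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: select s3a per job through spark.hadoop.* configuration.
// Bucket, endpoint, and credentials are placeholders.
val spark = SparkSession.builder()
  .appName("s3a-example")
  // Credentials; on AWS these could also come from instance profiles.
  .config("spark.hadoop.fs.s3a.access.key", sys.env.getOrElse("S3_ACCESS_KEY", ""))
  .config("spark.hadoop.fs.s3a.secret.key", sys.env.getOrElse("S3_SECRET_KEY", ""))
  // For S3-compatible (non-AWS) storage, point s3a at its endpoint and use
  // path-style access, which many such implementations require. Omit for AWS.
  .config("spark.hadoop.fs.s3a.endpoint", "https://s3.example.internal:9000")
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()

// Same job, but reading through the s3a:// scheme instead of s3n://.
val df = spark.read.text("s3a://my-bucket/path/to/large-input")
df.show(5)
```

As I understand it, s3a also uploads large objects via multipart uploads, so it would sidestep the single-PUT size limit that s3n's putObject path runs into.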
I understand there are adopters who hit a problem when the file size is large; the error is something like:
Diving into the storeFile function, it seems to me that storeFile
https://github.com/apache/hadoop/blob/branch-2.7.3/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.java#L106
should auto-detect whether to use storeLargeFile instead of always calling putObject, since a single putObject is subject to S3's 5 GB limit for a single PUT. How can the storeLargeFile path be enabled? Is it possible to set / override this at the job level, or does it have to be configured for the whole Spark cluster?
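To make the question concrete, here is a minimal sketch of the kind of job-level override I have in mind, assuming the fs.s3n.multipart.uploads.* properties that Hadoop 2.7 lists in core-default.xml; the bucket and paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a per-job override, assuming Hadoop 2.7's s3n connector.
// The fs.s3n.multipart.* keys are the ones from Hadoop 2.7's core-default.xml.
val spark = SparkSession.builder()
  .appName("s3n-multipart-example")
  // Spark copies spark.hadoop.* entries into the job's Hadoop Configuration,
  // so this affects only this job, not the whole cluster.
  .config("spark.hadoop.fs.s3n.multipart.uploads.enabled", "true")
  // Part size in bytes once multipart kicks in (64 MB here).
  .config("spark.hadoop.fs.s3n.multipart.uploads.block.size", (64L * 1024 * 1024).toString)
  .getOrCreate()

// With the flag on, writes to s3n:// should go through storeLargeFile (multipart
// upload) for files larger than the block size, instead of a single putObject.
spark.read.text("hdfs:///data/input")
  .write.text("s3n://my-bucket/output")
```

The same keys could presumably also be set cluster-wide in core-site.xml; what I am asking is whether the per-job spark.hadoop.* route above is the intended way to turn on storeLargeFile, or whether something else is planned.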