GH-47904: [C++] [Python] Allow S3 Filesystem Re-initialization#47903
GH-47904: [C++] [Python] Allow S3 Filesystem Re-initialization#47903corpoverlords wants to merge 1 commit intoapache:mainfrom
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
This would allow Arrow to coexist with other libraries that also use the AWS SDK. Internally since 2 years ago AWS SDK already has a refcount mechanism for `InitAPI` and supports re-init after deinit. - Introduced a mutex for thread-safe re-initialization of the S3 client after finalization, replacing the previous std::call_once mechanism. - Added an Initialize method to reset the finalized state of the S3 client. - Updated EnsureInitialized to allow re-initialization while ensuring thread safety. This change improves the flexibility and safety of the S3 client lifecycle management.
397c69b to
837205e
Compare
|
Thanks for doing this @corpoverlords ! It seems that the Python tests need updating now that re-initialization is allowed, can you take a look? |
| std::lock_guard<std::mutex> lock(init_mutex_); | ||
|
|
||
| if (!is_initialized_.load()) { | ||
| // Not already initialized, allow re-initialization after finalization |
There was a problem hiding this comment.
Can we perhaps only allow it if the AWS SDK version is recent enough?
|
Hi @pitrou Thanks for the review! After internal use for a week now I think there is probably a need for revising the entire AWS SDK initialization mechanism. Since AWS SDK now supports a reference count we can actually let the |
|
What I think can be done:
|
Is it documented somewhere?
Yuck. We'd certainly like to avoid that. |
|
@pitrou aws/aws-sdk-cpp#2710 documented here |
Rationale for this change
This would allow Arrow to coexist with other libraries that also use the AWS SDK. Internally since 2 years ago AWS SDK already has a refcount mechanism for
InitAPIand supports re-init after deinit.This change improves the flexibility and safety of the S3 client lifecycle management.
What changes are included in this PR?
S3FS init changes
Are these changes tested?
Tested with our local infra. I can add a unit test in a dedicated cpp file (due to init/deinit usage) if wanted.
Are there any user-facing changes?
No