Load s3 config from env and directories using aws-config crate #203
kylebarron merged 5 commits into main
ion-elgreco left a comment
Have you checked what the binary size is after adding this?
If these AWS crates add some size you could potentially also make a credential provider that checks if boto3 is available.
I noticed our AWS integration was about 10% of our binary. Admittedly that includes more than just the AWS crates, but it's worth checking :)
///
/// Downstream consumers may explicitly want to depend on tokio and add `rt-multi-thread` as a
/// tokio feature flag to opt-in to the multi-threaded tokio runtime.
pub fn get_runtime(py: Python<'_>) -> PyResult<Arc<Runtime>> {
You might want to reuse our runtime init: https://github.com/delta-io/delta-rs/blob/9dc8f328859b4fe66fa82a636aae662cee22db30/python/src/utils.rs#L11
It has a more graceful error for forked processes
I don't really understand forked processes but I'll take your word for it and use that 😄
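For readers in the same position: a runtime's worker threads do not survive `fork()`, so a child process that inherits the parent's runtime handle can hang instead of failing cleanly. The sketch below is not the linked delta-rs helper; it only illustrates the general idea of recording the creating process's PID and raising a clear error on mismatch. It is written in Python for brevity, and every name in it (`get_runtime`, `ForkedProcessError`, the `current_pid` parameter) is hypothetical.

```python
import os
import threading

_lock = threading.Lock()
_runtime = None      # stand-in for the shared runtime handle
_runtime_pid = None  # PID of the process that created the runtime


class ForkedProcessError(RuntimeError):
    """Raised when the runtime is used from a process that did not create it."""


def get_runtime(current_pid=None):
    """Return the shared runtime, refusing to reuse it across a fork.

    `current_pid` is injectable only so the check can be exercised
    without actually forking; normal callers leave it as None.
    """
    global _runtime, _runtime_pid
    pid = os.getpid() if current_pid is None else current_pid
    with _lock:
        if _runtime is None:
            # Placeholder for real runtime construction.
            _runtime = object()
            _runtime_pid = pid
        elif pid != _runtime_pid:
            raise ForkedProcessError(
                f"Forked process detected: current PID is {pid} but the "
                f"runtime was created by PID {_runtime_pid}"
            )
        return _runtime
```

The point of the check is only the error message: the caller learns that forking after first use is unsupported, rather than debugging a deadlock.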
"""

@classmethod
def from_aws_defaults(
Good that you separated and have to explicitly call it.
We have it disabled when you provide a custom endpoint, because that generally means you are not on AWS. In that case the AWS SDK will emit quite a few warnings and try to resolve things without success.
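A minimal sketch of that gating rule, assuming the store configuration is a plain mapping and that the custom endpoint lives under an `endpoint` key. Both the key name and the function name are assumptions for illustration, not the delta-rs or obstore API.

```python
from typing import Mapping, Optional


def should_resolve_aws_defaults(config: Mapping[str, Optional[str]]) -> bool:
    """Return True only when no custom endpoint is configured.

    A custom endpoint usually means an S3-compatible service rather than
    AWS itself, so the default AWS credential chain is skipped.
    """
    endpoint = config.get("endpoint")  # hypothetical key name
    return not endpoint


# An unset or empty endpoint keeps the AWS defaults enabled.
print(should_resolve_aws_defaults({"region": "us-east-1"}))
print(should_resolve_aws_defaults({"endpoint": "http://localhost:9000"}))
```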
Any naming suggestions here? `from_native_credentials` maybe?
I would keep aws at least in the wording since it's an AWS only feature.
Maybe from_aws_native or from_aws_credentials
Hmm... yes, but the S3 part of S3Store.from_native_credentials should already make clear that it's specific to AWS?
I think from_native_credentials is too long of a name though
(Compressed) wheel goes from 3.4MB to 4.8MB on macOS. I'm inclined to say that's ok if it significantly simplifies AWS credentials
We do have boto3.session integration already by the way: https://developmentseed.org/obstore/latest/api/store/aws/#obstore.store.S3Store.from_session But it only passes in frozen credentials; it won't refresh the credentials. And I imagine having the credential provider go through Python would be slow.
Right, I missed that :)
Yeah up to you :) just wanted to highlight it only in case you wanted to keep the binary size very small
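To make the frozen-versus-refreshing distinction concrete, here is a small sketch. It is not how obstore or the AWS SDK implement credential handling; `Credentials`, `RefreshingCredentials`, and the `fetch` callback are hypothetical names. A frozen credential is fetched once and eventually expires, whereas a refreshing provider re-fetches shortly before expiry.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Credentials:
    access_key_id: str
    secret_access_key: str
    expires_at: Optional[float]  # Unix timestamp; None means no expiry


class RefreshingCredentials:
    """Cache credentials and re-fetch them shortly before they expire."""

    def __init__(
        self,
        fetch: Callable[[], Credentials],
        clock: Callable[[], float] = time.time,
        margin: float = 60.0,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._margin = margin  # refresh this many seconds before expiry
        self._cached: Optional[Credentials] = None

    def get(self) -> Credentials:
        cached = self._cached
        stale = cached is None or (
            cached.expires_at is not None
            and cached.expires_at - self._margin <= self._clock()
        )
        if stale:
            self._cached = self._fetch()
        return self._cached
```

Passing frozen credentials corresponds to calling `fetch()` exactly once and never again, which is why long-running jobs can start failing once a temporary token expires.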
Another thought: maybe you can use a custom env var (USE_AWS_NATIVE_CREDENTIALS) to force AWS native credentials resolution in the init of S3Store. This could be useful in setups where an external library only calls init on the S3Store, but you still want it to pick up the native credentials.
All else equal of course I want the binary size to stay small. It would be ideal if we could optionally distribute the heightened AWS support, but that's hard to do. If ObjectStore were FFI stable we could distribute s3 bindings in a separate Python namespace module, but that's not really feasible today. And given that S3 is the most popular cloud this extra 1.4MB seems worth it?
That sounds much too magical, IMO. A better approach would be for external libraries to (at least optionally) accept a
Perhaps the credentials provider can be in a separate library using FFI instead of the full S3 implementation. I do see the need for an FFI ObjectStore; someone also asked about it in the Polars Discord today. Regarding the automatic credentials loading when an environment variable is set: it is a bit of auto-magic, but you have to consider use cases where users don't provide a store themselves. Take this dagster-obstore thing I wrote: users configure the store through a YAML configuration. If they can toggle credentials loading through an env var, it's quite useful.
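A sketch of how such an opt-in toggle could live in a downstream library rather than in `S3Store` itself. The env var name is the one floated earlier in this thread (spelled here as `USE_AWS_NATIVE_CREDENTIALS`); `native_credentials_requested`, `build_store`, and the two factory callbacks are hypothetical and stand in for whatever constructors the library actually calls.

```python
import os
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")

ENV_FLAG = "USE_AWS_NATIVE_CREDENTIALS"  # name proposed in the discussion
_TRUTHY = {"1", "true", "yes", "on"}


def native_credentials_requested(
    environ: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True when the opt-in env var is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(ENV_FLAG, "").strip().lower() in _TRUTHY


def build_store(
    explicit: Callable[[], T],
    native: Callable[[], T],
    environ: Optional[Mapping[str, str]] = None,
) -> T:
    """Pick the store constructor based on the env var.

    `explicit` builds the store from user-supplied config; `native`
    builds it from the AWS default credential chain. Both are supplied
    by the caller, so this helper does not depend on any store API.
    """
    if native_credentials_requested(environ):
        return native()
    return explicit()
```

Keeping the check in the consuming library means the store class itself has no hidden behavior, which is the concern raised about the env var being too magical.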
I haven't designed an FFI implementation from scratch myself (e.g. I've only reused Arrow's), so while I'm open in theory to using FFI for the credentials provider it's not something I want to put my own time into doing right now (though an issue laying out a path forward would be welcome for discussion). I'm inclined to provisionally merge this and at some point in the future we could refactor to support external credentials providers.
Right... I see in principle the benefit of passing in config via env vars, but I'm very wary of signing up for too much of a maintenance burden too quickly. Especially because I believe it's possible this library could get quite popular and heavily used. So for stuff like that I think it's better to wait and see how many people chime in on the issues asking for it, rather than moving too quickly and being stuck. In dagster-obstore it would be possible for you to check if
Well
I renamed the constructor to and added some documentation to call this provisionally supported. I'd like to get more feedback (including naming) before committing to a specific API, but we can include it in a release still!
Explore using aws-config
Other prior work:
cc @ion-elgreco since you've already discussed this in delta-rs
Closes #202
Todo: