feat(io): Vended credential refresh for S3#3351
Conversation
kevinjqliu
left a comment
There was a problem hiding this comment.
Thanks for the PR! Automatically refreshing storage credentials is very useful.
WDYT about making this a catalog responsibility? The refresh endpoint is a catalog endpoint, so instead of passing session down to the individual FileIO layer, we could handle credential refresh in the catalog itself.
I'm thinking something like how OAuth2AuthManager operates today, it handles the auth token refresh responsibility. We could have something like CredentialProvider that the catalog creates and passes to FileIO, where FileIO just calls get_credentials().
That way the IO layer stays decoupled from REST/session concerns.
|
I think that I a good call out, would be kind of the same implementation as we have to propagate the CredentialProvider object down to FileIO object. But I do agree, it abstracts away the refresh impl from FIleIO which is nice, will make the changes this week :) |
Rationale for this change
When doing long writes to a table the credentials vended with the last load table can be expired if the commit takes too long.
Here as same as Java, we proactively refresh the s3 vended credentials if they are 5 minutes away or less from expiration.
Are these changes tested?
Yes, added tests to tests/io/test_io.py to test all the logic for when its needed to call the vended credentials endpoint from the file io
Are there any user-facing changes?
Yes, users can now add the following properties to allow automatic vended credential refresh to take place