Refresh service account tokens #154 #271
Conversation
I think this generally looks good, but needs more thorough review. In the meantime, approving tests to run now.
@rhusar Is this specific to Kubernetes, or does it apply to
@belaban This pertains to any Kubernetes-based runtime that rotates tokens on a periodic basis. Not all environments do that by default, so we need to be especially cautious not to break legacy runtimes, since the JWT spec (RFC 7519) defines
I'm not an expert in this, nor do I have the time to review this. Are you willing to take over and/or do a review? Otherwise I'll simply accept the PR.
OK @belaban, I will do that. Ideally I want to quickly test this against a real k8s cluster first.
Thanks @rhusar, I owe you a beer (or two)! :-)
```java
public String getToken() throws IOException {
    long currentTime = System.currentTimeMillis() / 1000;
    if (token == null || (tokenExpiry > 0 && tokenExpiry < currentTime)) {
```
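The `tokenExpiry` compared above would come from the token's JWT `exp` claim (seconds since the epoch, which RFC 7519 makes optional). A hypothetical sketch of extracting it, assuming the regex-based scan is an illustrative shortcut and not the PR's actual parsing code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TokenExpiry {
    // Extract the `exp` claim from a JWT's Base64URL-encoded payload.
    // RFC 7519 makes `exp` optional, so 0 is returned when it is absent,
    // which effectively disables the expiry check above.
    static long parseExpiry(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length < 2)
            return 0;
        String payload = new String(Base64.getUrlDecoder().decode(parts[1]), StandardCharsets.UTF_8);
        // Naive scan for "exp": <digits>; a real implementation would use a JSON parser
        java.util.regex.Matcher m =
            java.util.regex.Pattern.compile("\"exp\"\\s*:\\s*(\\d+)").matcher(payload);
        return m.find() ? Long.parseLong(m.group(1)) : 0;
    }

    public static void main(String[] args) {
        // Build a toy JWT whose payload is {"exp":1700000000}
        String payload = Base64.getUrlEncoder().withoutPadding()
                .encodeToString("{\"exp\":1700000000}".getBytes(StandardCharsets.UTF_8));
        String jwt = "hdr." + payload + ".sig";
        System.out.println(parseExpiry(jwt)); // prints 1700000000
    }
}
```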
@phani0207 This is relying on the fact that the system clock time is in sync between the pod and the k8s control plane and that the token is updated in time on the pod. I am assuming this is normally handled by refreshing the token early or leeway on the server – none of which seem to be an option here.
Any chance you have given this some thought? I am wondering whether we need to retry on a 401 to address this situation.
OK, to give some background: we use KUBE_PING in production for cluster discovery and communication. This started failing when we migrated to k8s on Azure, as Azure refreshes the token every hour. On AWS we never faced the issue because the token expiry is 1 year and we restart the pod before the token expires. We have this code currently deployed in our environment, as we could not wait for this commit.
Retry already happens at the Utils.openStream call, so even if a call fails intermittently, it will recover on the next attempt. The responsibility for refreshing the token lies with the k8s service, and the client does not have any control over that. As far as clock sync is concerned, I am not sure there is a way we can handle it.
When considering the implementation, I also thought about relying on the token file's modified time, but it did not feel logical, hence the current approach.
As far as I have investigated, there is a buffer of 7 seconds on Azure before the token expires.
> we use KUBE_PING in our production for cluster discovery and communication
Cool stuff @phani0207! Out of curiosity, is this within WildFly or Infinispan or something else? I ask since I will be back-porting this to older branches.
> Retry already happens at the Utils.openStream call.
Right, good point: the retry sleep there is 1 second (controlled by KUBERNETES_OPERATION_SLEEP) and the number of attempts is 3 (KUBERNETES_OPERATION_ATTEMPTS), so that might address the majority of time skew well enough.
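A minimal sketch of the retry loop being discussed, assuming the generic `withRetry` helper and its defaults (3 attempts, 1 s sleep) are illustrations from this thread, not the actual `Utils.openStream` code:

```java
import java.util.concurrent.Callable;

public class Retry {
    // Re-attempt an operation up to `attempts` times (cf. KUBERNETES_OPERATION_ATTEMPTS,
    // default 3), sleeping `sleepMs` between tries (cf. KUBERNETES_OPERATION_SLEEP,
    // default 1000 ms), so a transiently stale token gets a few seconds
    // to be refreshed on the pod before the call finally fails.
    static <T> T withRetry(Callable<T> op, int attempts, long sleepMs) throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (i < attempts - 1)
                    Thread.sleep(sleepMs);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulate a call that fails twice (e.g. with a 401) and succeeds on the third attempt
        String result = withRetry(() -> {
            if (++calls[0] < 3)
                throw new java.io.IOException("401");
            return "ok";
        }, 3, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```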
> The responsibility of refreshing the token is with the k8s service and the client does not have any control over that.
Right, that's what I meant by "none of which seem to be an option here"; I should have been clearer.
> When considering the implementation, I also considered relying on the token file modified time, but it did not feel logical. Hence went with the current approach.
I was thinking about that too, but I wonder how reliable that is across different implementors/vendors.
> As far as I have investigated, there is a buffer of 7 secs on Azure before the token expires.
OK, so there appears to be some leeway in this case.
I think this all checks out!
@rhusar Neither. We use jgroups as a TPL and have logic for cluster initialization and communication in multiple microservices in our deployment.
@rhusar Any update on this?
Adds logic to refresh the service account token once it has expired. In ecosystems like Azure the service token expiry is 1 hour, which cannot be configured to a higher value, hence this change refreshes the token when it expires.
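The refresh described above could be sketched as follows. This is a hypothetical illustration under the assumptions of this thread (class, field, and path names are invented, not the PR's actual identifiers): cache the mounted service account token and re-read it from the projected volume once the cached copy's expiry has passed, since Kubernetes itself rotates the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TokenProvider {
    private final Path tokenFile; // e.g. the projected service account token volume
    private String token;
    private long tokenExpiry;     // seconds since epoch; 0 = no expiry known

    TokenProvider(Path tokenFile) {
        this.tokenFile = tokenFile;
    }

    // Return the cached token, re-reading it from disk when it is missing
    // or its known expiry has passed. Kubernetes refreshes the file itself;
    // the client only has to pick up the new contents.
    public synchronized String getToken() throws IOException {
        long now = System.currentTimeMillis() / 1000;
        if (token == null || (tokenExpiry > 0 && tokenExpiry < now)) {
            token = Files.readString(tokenFile).trim();
            tokenExpiry = 0; // a real implementation would parse `exp` from the JWT here
        }
        return token;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("token", ".jwt");
        Files.writeString(f, "abc123\n");
        System.out.println(new TokenProvider(f).getToken()); // prints abc123
    }
}
```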