
Conversation

phani0207 (Author):

Adding logic to refresh the token when it has expired. In ecosystems like Azure, the service token expiry is 1 hour and cannot be configured to a higher value, hence the logic to refresh the token once it expires.
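
For illustration, a minimal sketch of the described approach: cache the service-account token and re-read it once the JWT "exp" claim has passed. The field and method names mirror the diff reviewed below; the token path and the payload decoding are assumptions, not necessarily this PR's exact code.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Base64;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TokenCache {
        // Standard service-account mount; an assumption for this sketch.
        private static final Path TOKEN_PATH =
                Path.of("/var/run/secrets/kubernetes.io/serviceaccount/token");
        private static final Pattern EXP = Pattern.compile("\"exp\"\\s*:\\s*(\\d+)");

        private String token;
        private long tokenExpiry; // epoch seconds; 0 means "no exp claim, never refresh"

        public synchronized String getToken() throws IOException {
            long now = System.currentTimeMillis() / 1000;
            if (token == null || (tokenExpiry > 0 && tokenExpiry < now)) {
                token = Files.readString(TOKEN_PATH, StandardCharsets.UTF_8).trim();
                tokenExpiry = parseExpiry(token);
            }
            return token;
        }

        // RFC 7519 makes "exp" optional, so fall back to 0 (legacy behavior) when absent.
        private static long parseExpiry(String jwt) {
            try {
                String[] parts = jwt.split("\\.");
                if (parts.length < 2)
                    return 0; // not a JWT payload we can inspect
                byte[] decoded = Base64.getUrlDecoder().decode(parts[1]);
                Matcher m = EXP.matcher(new String(decoded, StandardCharsets.UTF_8));
                return m.find() ? Long.parseLong(m.group(1)) : 0;
            } catch (RuntimeException e) {
                return 0; // malformed payload: treat as non-expiring rather than fail discovery
            }
        }
    }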

rhusar (Member) commented Nov 3, 2025

I think this generally looks good, but needs more thorough review.

In the meantime, approving tests to run now.

belaban (Member) commented Nov 4, 2025

@rhusar Is this specific to Kubernetes, or does it apply to AWS / Azure etc. too?

rhusar (Member) commented Nov 4, 2025

@belaban This pertains to any Kubernetes-based runtime that rotates tokens on a periodic basis. Not all environments do that by default, so we need to be especially cautious not to break legacy runtimes, since the JWT spec (RFC 7519) defines exp as optional. IIUC, rotation became mostly the default around the 2021-22 time frame even in vanilla k8s (BoundServiceAccountTokenVolume), so this will have to be backported to older branches too.

belaban (Member) commented Nov 4, 2025

I'm not an expert in this, nor do I have the time to review it. Are you willing to take over and/or do a review? Otherwise I'll simply accept the PR.

rhusar (Member) commented Nov 4, 2025

OK @belaban, I will do that; ideally I want to quickly test this against real k8s.

belaban (Member) commented Nov 4, 2025

Thanks @rhusar, I owe you a beer (or two!) :-)


public String getToken() throws IOException {
    long currentTime = System.currentTimeMillis() / 1000; // epoch seconds, matching the JWT "exp" claim
    // tokenExpiry == 0 covers tokens without an "exp" claim, which RFC 7519 allows
    if (token == null || (tokenExpiry > 0 && tokenExpiry < currentTime)) {

rhusar (Member):

@phani0207 This is relying on the system clock being in sync between the pod and the k8s control plane, and on the token being updated in time on the pod. I am assuming this is normally handled by refreshing the token early or by leeway on the server – none of which seem to be an option here.

Any chance you have given some thought to this? I am wondering whether we need to retry on a 401 to address this situation.
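
For concreteness, one shape such a 401 retry could take. This is hypothetical code, assuming a TokenSource whose refreshToken() forces a re-read of the token file:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ApiClient {
        private final TokenSource tokens; // hypothetical: getToken() caches, refreshToken() forces a re-read

        public ApiClient(TokenSource tokens) { this.tokens = tokens; }

        public InputStream get(URL url) throws IOException {
            HttpURLConnection conn = connect(url, tokens.getToken());
            if (conn.getResponseCode() == HttpURLConnection.HTTP_UNAUTHORIZED) {
                conn.disconnect();                           // stale token: refresh once and retry
                conn = connect(url, tokens.refreshToken());
            }
            return conn.getInputStream();
        }

        private static HttpURLConnection connect(URL url, String token) throws IOException {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestProperty("Authorization", "Bearer " + token);
            return conn;
        }

        interface TokenSource {
            String getToken() throws IOException;
            String refreshToken() throws IOException;
        }
    }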

phani0207 (Author):

OK, to give some background: we use KUBE_PING in our production for cluster discovery and communication. This started failing when we migrated to k8s on Azure, as Azure refreshes the token every hour. In AWS we never faced the issue, as the token expiry is 1 year and we restart the pod before the token expires. We have this code currently deployed in our environment as we could not wait for this commit.

Retry already happens at the Utils.openStream call, so even if a call fails intermittently, it will recover on the next attempt. The responsibility of refreshing the token lies with the k8s service, and the client does not have any control over that. As far as clock sync is concerned, I am not sure there is a way we can handle it.

When considering the implementation, I also considered relying on the token file's modified time, but it did not feel logical, hence I went with the current approach.

As far as I have investigated, there is a buffer of 7 seconds on Azure before the token expires.

rhusar (Member) commented Nov 4, 2025:

> we use KUBE_PING in our production for cluster discovery and communication

Cool stuff @phani0207! Out of curiosity, is this within WildFly or Infinispan, or something else? I'm asking since I will be back-porting this to older branches.

> Retry already happens at the Utils.openStream call.

Right, good point. The retry sleep there is 1 second (controlled by KUBERNETES_OPERATION_SLEEP) and the number of attempts is 3 (KUBERNETES_OPERATION_ATTEMPTS), so that might address the majority of time-skew cases well enough.
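
For illustration, the retry shape being described; this is a sketch rather than the actual Utils.openStream implementation, with only the 3 attempts and 1-second sleep taken from the settings above:

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;

    public final class OpenStreamRetry {
        // Defaults mirroring KUBERNETES_OPERATION_ATTEMPTS=3 and KUBERNETES_OPERATION_SLEEP=1000ms.
        public static InputStream open(URL url, int attempts, long sleepMillis)
                throws IOException, InterruptedException {
            for (int i = 1; ; i++) {
                try {
                    return url.openStream();  // may fail while a just-rotated token propagates
                } catch (IOException e) {
                    if (i >= attempts)
                        throw e;              // attempts exhausted, surface the last failure
                    Thread.sleep(sleepMillis); // pause before the next attempt
                }
            }
        }
    }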

> The responsibility of refreshing the token lies with the k8s service, and the client does not have any control over that.

Right, that's what I meant by 'none of which seem to be an option here'; I should have been clearer.

> When considering the implementation, I also considered relying on the token file's modified time, but it did not feel logical, hence I went with the current approach.

I was thinking about that too, but I wonder how reliable that is across different implementors/vendors.
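
For reference, a minimal sketch of that file-mtime alternative. This is hypothetical code; kubelet updates projected tokens via symlink swaps, so the observed mtime may well differ across implementations:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.attribute.FileTime;

    public class MtimeTokenCache {
        private final Path tokenPath =
                Path.of("/var/run/secrets/kubernetes.io/serviceaccount/token"); // assumed mount
        private String token;
        private FileTime lastSeen;

        public synchronized String getToken() throws IOException {
            FileTime mtime = Files.getLastModifiedTime(tokenPath);
            if (token == null || !mtime.equals(lastSeen)) {
                token = Files.readString(tokenPath).trim(); // re-read whenever the file changes
                lastSeen = mtime;
            }
            return token;
        }
    }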

> As far as I have investigated, there is a buffer of 7 seconds on Azure before the token expires.

OK, so there appears to be some leeway in this case, so I think this all checks out!

phani0207 (Author):

@rhusar Neither. We use JGroups as a TPL (third-party library) and have the logic for cluster initialization and communication in multiple microservices in our deployment.

@phani0207 phani0207 closed this Nov 6, 2025
@phani0207 phani0207 reopened this Nov 6, 2025
@rhusar rhusar self-assigned this Nov 10, 2025
phani0207 (Author):

@rhusar Any update on this?
