Contributor

@ebarlas ebarlas commented Oct 23, 2025

This change implements automatic reloading of PKC (Public Key Cryptography) JWK (JSON Web Key) sets from both file and URL sources, with configurable intervals.


Settings Configurations (JwtRealmSettings.java)

Added 4 new realm settings:

  • pkc_jwkset_reload.enabled - Enable/disable automatic reloading (default: false)
  • pkc_jwkset_reload.file_interval - File check interval (default: 5 minutes)
  • pkc_jwkset_reload.url_interval_min - Minimum URL reload interval (default: 60 minutes)
  • pkc_jwkset_reload.url_interval_max - Maximum URL reload interval (default: 5 days)

Core Refactoring (JwkSetLoader.java)

  • Major restructuring to support reloading mechanisms
  • Introduced two new loader backing implementations:
    • FilePkcJwkSetLoader - Monitors file changes using FileWatcher
    • UrlPkcJwkSetLoader - Schedules periodic HTTPS fetches with interval calculation based on Expires header

Threading Integration

  • JwtRealm.java: Now accepts ThreadPool for scheduling reload tasks
  • JwtAuthenticator.java: Updated to pass ThreadPool to signature validator
  • InternalRealms.java: Injects ThreadPool into JWT realm construction

HTTP Response Handling (JwtUtil.java)

  • Changed readBytes() to readResponse() returning new JwksResponse record
  • Added JwksResponse record to capture both content and Expires header
  • Added parseExpires() to parse RFC 1123 HTTP date format

Interface Reorganization

  • Moved PkcJwkSetReloadNotifier interface from JwtSignatureValidator to its own file
  • Updated PkcJwtSignatureValidator to remove reload notification responsibility (now in loader)

@ebarlas ebarlas added the >enhancement, :Security/Security, and Team:Security labels Oct 23, 2025
@elasticsearchmachine
Collaborator

Hi @ebarlas, I've created a changelog YAML for you.

@ebarlas ebarlas requested a review from a team October 23, 2025 05:03
@ebarlas ebarlas force-pushed the periodically-reload-jwks branch from b970c65 to 414c39c Compare October 23, 2025 06:07
Contributor

@jfreden jfreden left a comment

Really nice job on this! Didn't get that far with my review today but wanted to leave some initial comments at least. Will keep looking tomorrow.

return false;
}

record JwksResponse(byte[] content, Instant expires) {
Contributor

Should this also parse Cache-Control?

Contributor Author

@ebarlas ebarlas Oct 23, 2025

Yes, Cache-Control was mentioned in the ticket, but it's less clear how we ought to apply that, since there are many directives. I thought that could be a follow-up enhancement since it doesn't fundamentally change the data flow or reload mechanism.

Contributor

This was specifically requested for serverless, what does it look like there? I agree that this could be done in a follow up if not needed there.

Contributor Author

Okay, I'll add a simple implementation based on the max-age directive.

Contributor Author

Cache-Control max-age directives are now considered in addition to Expires.
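
A minimal sketch of extracting max-age from a Cache-Control value. The regular expression is the MAX_AGE_PATTERN quoted from the PR's JwksResponse record; the class name, method name, and null-on-failure behavior are assumptions made for illustration, not the merged code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MaxAgeSketch {
    // Same expression as the MAX_AGE_PATTERN quoted from the PR's JwksResponse record.
    private static final Pattern MAX_AGE_PATTERN =
        Pattern.compile("\\bmax-age\\s*=\\s*(\\d+)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns the max-age value in seconds, or null if the directive is absent or unusable. */
    static Integer parseMaxAge(String cacheControl) {
        if (cacheControl == null) {
            return null;
        }
        Matcher m = MAX_AGE_PATTERN.matcher(cacheControl);
        if (!m.find()) {
            return null;
        }
        try {
            return Integer.parseInt(m.group(1));
        } catch (NumberFormatException e) {
            return null; // the digits do not fit in an int
        }
    }
}
```

Other directives (no-store, must-revalidate, and so on) are ignored in this sketch, which matches the "simple implementation" scope discussed above.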

}

static class FilePkcJwkSetLoader implements PkcJwkSetLoader {
final RealmConfig realmConfig;
Contributor

Is there a reason these instance variables are not private?

Contributor Author

Only for aesthetic reasons. I opted for less ceremony on the inner class fields, since private fields on an inner class still permit access from the enclosing class.

Contributor

The inner class is package-private, so its internals can still be accessed from outside JwkSetLoader, anywhere within the org.elasticsearch.xpack.security.authc.jwt package. The convention in the code base is also to use private.

Contributor Author

private added to inner class fields.

}

static class UrlPkcJwkSetLoader implements PkcJwkSetLoader {
final RealmConfig realmConfig;
Contributor

Is there a reason these instance variables are not private?

Contributor Author

private added to inner class fields.

- private void handleReloadedContentAndJwksAlgs(byte[] bytes) {
+ private void handleReloadedContentAndJwksAlgs(byte[] bytes, boolean init) {
final ContentAndJwksAlgs newContentAndJwksAlgs = parseContent(bytes);
assert newContentAndJwksAlgs != null;
Contributor

Unrelated to your change but I noticed this assert can be removed.

Contributor Author

Done.

Scheduler.Cancellable task;

FilePkcJwkSetLoader(RealmConfig realmConfig, ThreadPool threadPool, String jwkSetPath, Consumer<byte[]> listener) {
this(realmConfig, threadPool, threadPool == null ? null : threadPool.generic(), jwkSetPath, listener);
Contributor

I can see that you're not interested in ThreadPool per se, more so the ExecutorService instance it provides. It might be more informative then to have an ExecutorService var be the arg for this chain of constructors starting from the very top.

Also, then for tests, you don't need to pass in a null you can do EsExecutors.DIRECT_EXECUTOR_SERVICE for a simple test executor.

You'll find various search hits for:

when(threadPool.generic()).thenReturn(EsExecutors.DIRECT_EXECUTOR_SERVICE);

which suggests that this is a fairly common substitution for testing.

Contributor Author

@ebarlas ebarlas Oct 23, 2025

I spent quite a while fussing with this aspect of the change. It is clunky. But ExecutorService alone won't work. A scheduler abstraction is what I'm after here. And, for better or worse, the ES Scheduler also needs an Executor. So, awkwardly, both are required, I think. Which is why I decided to just propagate ThreadPool.

And I opted for null rather than actual ThreadPool instances (such as TestThreadPool) because the ThreadPool constructor starts threads, which I want to avoid in tests.

Contributor

aah I missed that you need both. makes sense, it's the best you can do here I think

Contributor

How did you land on using the generic threadpool here?

Contributor

> And I opted for null rather than actual ThreadPool instances (such as TestThreadPool) because the ThreadPool constructor starts threads, which I want to avoid in tests.

I think handling the null case here is adding test code to the production code. I think you should consider sticking with the TestThreadPool or a mock instance for that reason if that works?

If you do want to support null here the parameter should be marked as @Nullable.

Contributor Author

> How did you land on using the generic threadpool here?

The JwkSetLoader async workloads are inexpensive and they don't seem related to the other categories.

> I think handling the null case here is adding test code to the production code.

Agreed. I'll take another look.

Contributor Author

ThreadPool is now used throughout.

I discovered that Mockito mocks of ThreadPool use Objenesis internally to create mock instances that don't invoke the superclass (ThreadPool) constructor. As a result, mocks don't result in thread creation.

Contributor

@jfreden jfreden left a comment

Finished looking at the changes. I have a couple of comments but overall I think this looks very good.

I think some more integration testing would be nice. Did you consider adding some tests for this to JwtRestIT? Or maybe a separate ESRestTestCase? It's difficult to test because of the minimum refresh interval of 5 minutes, but at least testing the init could be achieved? If not an ESRestTestCase (external cluster) an ESIntegTestCase could work that operates in the same JVM as the test and therefore allows you to do some pretty invasive things.

}
}

static TimeValue calculateNextUrlReload(TimeValue minVal, TimeValue maxVal, Instant targetTime, double maxJitterPct) {
Contributor

nit: targetTime should be marked @Nullable
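
A rough sketch of how a next-reload delay could be derived from a nullable target time. Only the method signature above comes from the PR; the clamping order, the jitter formula, the use of java.time.Duration in place of TimeValue, and the extra `now` and `random` parameters (added so the result is deterministic) are all assumptions. It also assumes min <= max and treats the jitter argument as a fraction rather than a percentage.

```java
import java.time.Duration;
import java.time.Instant;

final class ReloadIntervalSketch {
    /**
     * Hypothetical stand-in for calculateNextUrlReload.
     * targetTime may be null (no Expires or max-age was available).
     * random is expected to be in [0, 1].
     */
    static Duration nextReload(Duration min, Duration max, Instant targetTime,
                               Instant now, double maxJitterFraction, double random) {
        if (targetTime == null) {
            return min; // nothing to go on, so fall back to the minimum interval
        }
        Duration delay = Duration.between(now, targetTime);
        // Spread reloads out by adding up to maxJitterFraction of the delay.
        long jitterMillis = (long) (delay.toMillis() * maxJitterFraction * random);
        Duration jittered = delay.plusMillis(jitterMillis);
        if (jittered.compareTo(min) < 0) {
            return min;
        }
        if (jittered.compareTo(max) > 0) {
            return max;
        }
        return jittered;
    }
}
```

With a null target the sketch simply returns the minimum, which is consistent with the `return minVal;` fragment visible elsewhere in this review.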

try {
// HTTP dates are in RFC 1123 format
return ZonedDateTime.parse(expires, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
} catch (Exception e) {
Contributor

I think just catching DateTimeException here is sufficient, and adding a trace log (or maybe even debug?) would be helpful.
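
One way the suggestion could look, as a sketch only. `ZonedDateTime.parse` throws `DateTimeParseException`, a subclass of `DateTimeException`, so the narrower catch still covers malformed headers. The class name is hypothetical, and java.util.logging stands in for the project's own logger so the example is self-contained.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.logging.Level;
import java.util.logging.Logger;

final class ExpiresSketch {
    // Stand-in logger; the real code would use the project's logging facility.
    private static final Logger logger = Logger.getLogger(ExpiresSketch.class.getName());

    /** Returns the parsed Instant, or null if the header is null or cannot be parsed. */
    static Instant parseExpires(String expires) {
        if (expires == null) {
            return null;
        }
        try {
            // HTTP dates are in RFC 1123 format
            return ZonedDateTime.parse(expires, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
        } catch (DateTimeException e) {
            // Low-level log so a misbehaving JWKS provider can be diagnosed.
            logger.log(Level.FINE, "Could not parse Expires header [" + expires + "]", e);
            return null;
        }
    }
}
```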


boolean changed() throws IOException {
fileWatcher.checkAndNotify(); // may call onFileInit, onFileChanged
boolean c = changed;
Contributor

This method is doing more than just checking if something changed, I wonder if the name could be updated to better reflect that?

Contributor Author

@ebarlas ebarlas Oct 24, 2025

FileChangeWatcher presents a simple file-changed abstraction. That could be implemented in several different ways. I'd prefer not to encode the implementation in the method name, if possible. I realize there is some hand waving since it's a concrete inner class, but that's the ideal I was striving for.

Contributor

My reservation is that it's not idempotent (since it updates the state of the file watcher) and only reading the method name kind of suggests that it is, but I see your point. An example of a problem this could cause is someone doing logger.debug("File changed [{}]", fileWatcher.changed()) before actually doing the check. Maybe that's a stretch, I'll leave it up to you.

Contributor Author

@ebarlas ebarlas Oct 28, 2025

Agreed! I missed that subtlety in the initial comment. I'll update the name.
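
To illustrate the concern, here is a small sketch of a read-and-reset flag whose name makes the side effect visible. The names `checkAndResetChanged` and `markChanged` are hypothetical; the thread does not state which name was finally chosen, and the file-watching part is left out.

```java
final class ChangeFlagSketch {
    // Plain field: assumes single-threaded access, as discussed in this review.
    private boolean changed = false;

    /** Called by a (hypothetical) file listener when a change is observed. */
    void markChanged() {
        changed = true;
    }

    /** Reports whether a change was recorded since the last call, then clears the flag. */
    boolean checkAndResetChanged() {
        boolean c = changed;
        changed = false;
        return c;
    }
}
```

Calling the method twice in a row returns true and then false after a single change, which is exactly why an accessor-style name such as `changed()` could mislead a caller who only wants to log the state.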


static class FileChangeWatcher implements FileChangesListener {
final FileWatcher fileWatcher;
boolean changed = false;
Contributor

I like how your approach removes some multithreading concerns. I wonder if this should still be a final AtomicBoolean since a single instance could in theory be accessed from multiple threads when scheduled on a tight interval.

Contributor Author

The FileChangeWatcher instance in FilePkcJwkSetLoader cannot be accessed by multiple threads simultaneously. The associated task is scheduled with scheduleWithFixedDelay, so no concurrent access is possible.
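
The no-overlap property of fixed-delay scheduling can be seen in a small sketch. It uses the JDK's `ScheduledExecutorService.scheduleWithFixedDelay` as a stand-in for the Elasticsearch ThreadPool scheduler (an assumption; the two are different classes): the next run is only scheduled after the previous one finishes, so even on a multi-threaded pool the task never runs concurrently with itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class FixedDelaySketch {
    /** Runs a slow task with a short fixed delay and returns the peak number of concurrent runs. */
    static int peakConcurrency(int runs) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(4);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(runs);
        pool.scheduleWithFixedDelay(() -> {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            try {
                Thread.sleep(20); // deliberately longer than the 1 ms delay
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                active.decrementAndGet();
                done.countDown();
            }
        }, 0, 1, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }
}
```

Under fixed-delay scheduling the peak stays at one, which is why a plain boolean field can be acceptable for state touched only by the scheduled task.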

Contributor

Ah yes, looking at this again that makes sense, thanks for clarifying! Really nice that we don't have to worry about concurrency here.

this.listener = listener;
this.httpClient = httpClient;
this.reloadIntervalMin = realmConfig.getSetting(JwtRealmSettings.PKC_JWKSET_RELOAD_URL_INTERVAL_MIN);
this.reloadIntervalMax = realmConfig.getSetting(JwtRealmSettings.PKC_JWKSET_RELOAD_URL_INTERVAL_MAX);
Contributor

I think a check is required here to make sure min <= max.
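
A sketch of the kind of check being asked for. The class name and message are hypothetical, Duration stands in for TimeValue, and IllegalArgumentException stands in for whatever exception type the realm settings code would actually throw.

```java
import java.time.Duration;

final class IntervalBoundsSketch {
    /** Rejects a configuration whose minimum reload interval exceeds its maximum. */
    static void validate(Duration min, Duration max) {
        if (min.compareTo(max) > 0) {
            throw new IllegalArgumentException(
                "reload interval min [" + min + "] must not exceed max [" + max + "]");
        }
    }
}
```

Failing fast in the constructor means a misconfigured realm is reported at startup rather than surfacing later as an odd reload schedule.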


void reload() {
doLoad(ActionListener.wrap(res -> {
logger.debug("Successfully reloaded PKC JWK set from HTTPS URI [{}]", jwkSetPathUri);
Contributor

It would be nice to log the next reload here.

Contributor Author

Great idea!

return minVal;
}
var min = Duration.ofSeconds(minVal.seconds());
var max = Duration.ofSeconds(maxVal.seconds());
Contributor

An assertion that min <= max could help to find bugs here.

Contributor Author

ebarlas commented Oct 24, 2025

Thanks for the thorough review! My intuition was that broader integration tests would do more harm than good in terms of cost and complexity. This change is neutral about the underlying loading mechanisms, which are already tested thoroughly. But it would be nice to have something that verifies that everything is linked properly and that the tasks are running. I'll look into that.

Contributor Author

ebarlas commented Oct 27, 2025

I managed to add JwtRealm integration tests that verify PKC JWK set reloading behavior.

It turns out that the existing integration test scaffolding in JwtRealmTestCase and JwtRealmAuthenticateTests was fairly easy to extend for key set reloading.

@ebarlas ebarlas force-pushed the periodically-reload-jwks branch from 94f6576 to 88ea7ce Compare October 27, 2025 15:16
Contributor

@jfreden jfreden left a comment

LGTM! Great job!

I have a couple of non-blocking comments. The PR is still in draft mode, so let's take it out of draft before merging.


@Override
public void stop() {
if (task != null) {
Contributor

I think closed should be set here too.

Contributor Author

Good catch!


record JwksResponse(byte[] content, Instant expires, Integer maxAgeSeconds) {
private static final Pattern MAX_AGE_PATTERN = Pattern.compile("\\bmax-age\\s*=\\s*(\\d+)\\b", Pattern.CASE_INSENSITIVE);

JwksResponse(byte[] content, String expires, String cacheControl) {
Contributor

nit: expires and maxAgeSeconds can be marked @Nullable

* The Expires header follows RFC 7231 format (e.g., "Mon, 01 Jan 2024 00:00:00 GMT").
* @return the parsed Instant, or null if the header is null or cannot be parsed
*/
static Instant parseExpires(String expires) {
Contributor

nit: can be marked @Nullable

}

/**
* Parse the Cache-Control header to extract the max-age value.
Contributor

nit: Since you reference the RFC in the other method you could potentially add RFC 7234 reference here.


package org.elasticsearch.xpack.security.authc.jwt;

interface PkcJwkSetReloadNotifier {
Contributor

This should be public since it's an argument of the DelegatingJwtSignatureValidator constructor that's public.

ThreadPool threadPool = mock(ThreadPool.class);
CountingCallback callback = new CountingCallback();

new JwkSetLoader.FilePkcJwkSetLoader(realmConfig, threadPool, path.toString(), callback).start(); // schedules task
Contributor

nit: You could also verify close() here.

expectThrows(SettingsException.class, () -> new JwkSetLoader.UrlPkcJwkSetLoader(realmConfig, null, null, null, null));
}

static class CountingCallback implements Consumer<byte[]> {
Contributor

optional: This could be a mock object (looks like there are already some instances doing similar stuff with doAnswer).

@Before
public void init() throws Exception {
threadPool = new TestThreadPool("JWT realm tests");
immediateThreadPool = new FixedDelayThreadPool(TimeValue.timeValueMillis(10));
Contributor

optional: I think doing something like this is a little cleaner.

Contributor Author

@ebarlas ebarlas Oct 28, 2025

It looks like this example invokes the Runnable task synchronously in the foreground thread. I think this would create an infinite recursion since the reloader tasks re-schedule themselves.
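
The recursion concern can be sketched without any Elasticsearch classes. Everything below is hypothetical: `scheduleDirect` models a scheduler that runs the task synchronously on the calling thread, and the task re-schedules itself as the reloader tasks are described to do. A run limit is added so the example terminates; without it the nesting would grow until the stack overflows.

```java
final class DirectScheduleSketch {
    private int depth = 0;
    private int maxDepth = 0;
    private int runs = 0;

    // Stand-in for a scheduler that invokes the task immediately in the foreground thread.
    private void scheduleDirect(Runnable task) {
        task.run();
    }

    private void reloadTask(int limit) {
        depth++;
        maxDepth = Math.max(maxDepth, depth);
        runs++;
        if (runs < limit) {
            // Re-schedule the next reload; with a direct scheduler this call nests inside the current run.
            scheduleDirect(() -> reloadTask(limit));
        }
        depth--;
    }

    /** Returns the deepest nesting reached after the task has run limit times. */
    static int maxNesting(int limit) {
        DirectScheduleSketch s = new DirectScheduleSketch();
        s.scheduleDirect(() -> s.reloadTask(limit));
        return s.maxDepth;
    }
}
```

Each run happens inside the previous one, so the nesting depth equals the number of runs. A real scheduler returns before the next run starts, keeping the depth at one.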

Also, despite the clunkiness, I do think there's value in including a multi-threading environment in these integration tests.

@ebarlas ebarlas marked this pull request as ready for review October 28, 2025 17:29
@elasticsearchmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@ebarlas ebarlas force-pushed the periodically-reload-jwks branch from f0f38a4 to 49219bf Compare October 29, 2025 18:40
@ebarlas ebarlas force-pushed the periodically-reload-jwks branch 2 times, most recently from f440ce2 to 28b92ab Compare October 30, 2025 04:49
@ebarlas ebarlas force-pushed the periodically-reload-jwks branch from 28b92ab to 6205941 Compare October 30, 2025 15:55
@ebarlas ebarlas merged commit 2c64f47 into elastic:main Oct 30, 2025
40 checks passed
chrisparrinello pushed a commit to chrisparrinello/elasticsearch that referenced this pull request Nov 3, 2025
Add automatic reloading of PKC (Public Key Cryptography) JWK (JSON Web
Key) sets from both file and URL sources, with configurable intervals.

File-based JWK sets are reloaded at a fixed interval, with a default of
5 minutes.

URL-based JWK sets are reloaded at adaptive intervals informed by
Expires and Cache-Control header responses from the JWKS provider. The
interval is bounded by a range, with defaults of 5 minutes to 5 days.
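
A sketch of one way the two headers could be combined into a single target time. How the merged code actually combines them is not shown in this conversation; the sketch assumes the standard HTTP caching rule that a max-age directive takes precedence over Expires, and it measures max-age from a caller-supplied `now` rather than accounting for response age. All names are hypothetical.

```java
import java.time.Instant;

final class FreshnessSketch {
    /**
     * Picks the time at which the JWK set should be considered stale.
     * Either input may be null; the result is null when neither header was usable.
     */
    static Instant targetTime(Instant now, Instant expires, Integer maxAgeSeconds) {
        if (maxAgeSeconds != null) {
            return now.plusSeconds(maxAgeSeconds); // max-age wins when both are present
        }
        return expires;
    }
}
```

A null result would then fall back to the configured minimum interval, and any non-null result would still be clamped to the configured range.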
Labels

>enhancement, :Security/Security, Team:Security, v9.3.0
