Skip to content

Conversation

@ankit--sethi
Copy link
Contributor

@ankit--sethi ankit--sethi commented Sep 11, 2025

Introducing the notion of a FileGroupWatcher, an orchestrating watcher built atop FileWatcher that invokes a callback only once per group of file changes.

The motivation is to better handle cases where multiple files may have changed in a particular monitoring interval, yet the action we need to take should execute only once per set of changes. Triggering callbacks over a collection of File Watchers results in the same action being executed more number of times than is necessary.

…built atop FileWatcher that invokes a callback once per group of file changes.
@ankit--sethi ankit--sethi added >non-issue :Security/Security Security issues without another label labels Sep 11, 2025
@elasticsearchmachine elasticsearchmachine added serverless-linked Added by automation, don't add manually Team:Security Meta label for security team labels Sep 11, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@Override
protected void doCheckAndNotify() throws IOException {
if (coalescingListener.filesChanged()) {
// fallback in case state wasn't cleared correctly in previous iteration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds concerning. How can this happen?

Copy link
Contributor Author

@ankit--sethi ankit--sethi Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it shouldn't ever happen due to the reset happening in the finally block. I was thinking of very rare situations where the JDK spec says finally will not execute. I had initially found this reference:

Likewise, if the thread executing the try or catch code is interrupted or killed, the finally block may not execute even though the application as a whole continues

https://stackoverflow.com/a/2417986

After digging a little deeper to write this reply, it turns out this is a JDK documentation mistake that they have now updated. Apparently, the section I quoted above is not actually true!
https://bugs.openjdk.org/browse/JDK-8276156?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel

With things where they are now, I'm fine with removing this if condition if it scares more than helps!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this if-condition


static class CoalescingFileChangesListener implements FileChangesListener {

private final ThreadLocal<List<Path>> filesChanged = ThreadLocal.withInitial(() -> Collections.synchronizedList(new ArrayList<>()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm failing to understand the need for ThreadLocal and synchronized list. Could you explain?

Copy link
Contributor Author

@ankit--sethi ankit--sethi Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think synchronized list was the result of an earlier solution that got superseded by ThreadLocal but I forgot to remove it. I'll explain the general scenario either way though:

  1. File change detected at the end of a particular time interval triggers this listener to action, the listener is executing in thread A.
  2. If the listener execution is time-consuming it is possible that thread A is still executing by the time a second monitoring interval completes. And if there is yet another change to the files, then the resource watcher will assign a second thread B to execute the same same listener. (I'm making some assumptions about how the resource watcher schedules tasks here)
  3. now thread A and B are modifying the same list concurrently.

This edge case requires CoalescingFileChangesListener to be doing something very time-consuming, or the watcher frequency is very very small. It's very unlikely but wrapping a list in the ThreadLocal is a low cost way to have some thread isolation in case this ever happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the listener execution is time-consuming it is possible that thread A is still executing by the time a second monitoring interval completes. And if there is yet another change to the files, then the resource watcher will assign a second thread B to execute the same same listener. (I'm making some assumptions about how the resource watcher schedules tasks here)

I think we should verify these assumptions with @elastic/es-core-infra (assuming you are the owners of resource watcher). My understanding is that file watcher does not schedule new thread until it finishes with processing current. Meaning, it seems to be single threaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds right, I tracked it down to these lines here and the flow is such that only after a task completes is the next runnable scheduled.

I feel though that depending on this information adds risk because it is tightly coupling the design to the internals of how a separate component works. With the ThreadLocal in place we don't need to depend on the scheduler guaranteeing thread safety, or things breaking if the design is ever modified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the synchronizedList usage, it's not required.

@slobodanadamovic slobodanadamovic added :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team labels Sep 16, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@rjernst
Copy link
Member

rjernst commented Sep 18, 2025

I'm trying to understand the usecase. Can you describe a concrete use case that requires grouping in this way?

@slobodanadamovic
Copy link
Contributor

slobodanadamovic commented Sep 19, 2025

I'm trying to understand the usecase. Can you describe a concrete use case that requires grouping in this way?

We need to be able to watch multiple SSL configuration files and be notified if any of the watched files changes, instead of being notified for every file we are watching. We use notification as a trigger to build new SSLContext and re-create an HTTP client that uses this configuration. This is the main motivation for the change.

Some background: Currently, FileWatcher only accepts a single file or a directory. Due to entitlements issues, we went from watching a directory to watching an individual files (#129738). Which means we now get notified and re-create configs and client on each notification. This is not ideal.

We may not necessarily have to take the approach as proposed in this PR. And we could certainly take other approaches or implement security-specific workarounds, but it would be preferred to be able to tell watcher here are files to monitor, notify me once with grouped information about all changed files.

@rjernst
Copy link
Member

rjernst commented Sep 22, 2025

Due to entitlements issues, we went from watching a directory to watching an individual files

We are working towards watching a directory being ok. The code in question would think the secure files do not exist, but would not receive any errors. I think this would be a more robust long term approach. The proposal here seems like a workaround that will maintenance/complexity in the long term.

But aside from the above, I wonder if SSL service really needs to be in xpack core. It seems the only reason it is defined/exposed there is for xpack plugins to be able to create ssl capable clients. I wonder if this could be hidden away so that security implements this "create an ssl capable client", and then it can truly own these secure files.

cc @mosche

@tvernum
Copy link
Contributor

tvernum commented Sep 23, 2025

I wonder if SSL service really needs to be in xpack core

At the moment it does because monitoring, watcher, ML, etc all use it.

We're working on reducing the need for other plugins to directly depend on the SSL service. The end point might get us to where it doesn't need to be in x-pack core anymore, but we're a number of steps away from that.

@ankit--sethi
Copy link
Contributor Author

In light of the discussion above, I'm closing this PR as we will eventually have a better solution native to File Watcher

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Core/Infra/Core Core issues without another label >non-issue :Security/Security Security issues without another label serverless-linked Added by automation, don't add manually Team:Core/Infra Meta label for core/infra team Team:Security Meta label for security team v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants