Skip to content

Conversation

@jfreden
Copy link
Contributor

@jfreden jfreden commented May 30, 2025

This PR addresses #128619 where this appears to be happening:

  1. settings.json is written to config/operator/..TIMESTAMP_TEMP_FOLDER_1/settings.json
  2. A symlink config/operator/..DATA -> config/operator/..TIMESTAMP_TEMP_FOLDER_1 is created
  3. A symlink config/operator/settings.json -> config/operator/..DATA/settings.json is created
  4. File watcher picks up settings.json
  5. An updated settings.json is written to config/operator/..TIMESTAMP_TEMP_FOLDER_2/settings.json
  6. Symlink config/operator/..DATA is removed
  7. Symlink config/operator/..DATA -> config/operator/..TIMESTAMP_TEMP_FOLDER_2 is created
  8. File watcher picks up settings.json

In (6) the ..DATA symlink could be attempted to be resolved in parallel with the removal (because the files in config/operator are being reprocessed), if that happens WindowsFS could throw an AccessDeniedException and therefore the AbstractFileWatchingService throws an exception and stops. This is roughly how things work in Kubernetes so I feel hesitant modifying the test or muting this on Windows (even though kubernetes pods running windows seem unlikely + this is currently only used internally).

The proposed fix in this PR is to add retries for AccessDeniedException.

@jfreden jfreden added the test-windows Trigger CI checks on Windows label May 30, 2025
@jfreden jfreden changed the title Unmute testSymlinkUpdateTriggerReload Add retry for AccessDeniedException in AbstractFileWatchingService May 30, 2025
@jfreden jfreden added :Core/Infra/Settings Settings infrastructure and APIs >bug labels May 30, 2025
@jfreden jfreden marked this pull request as ready for review May 30, 2025 11:37
@jfreden jfreden requested a review from a team as a code owner May 30, 2025 11:37
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Copy link
Collaborator

Hi @jfreden, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Core/Infra Meta label for core/infra team label May 30, 2025
@jfreden
Copy link
Contributor Author

jfreden commented May 30, 2025

Looks like the failing test is a known issue: #128604

Copy link
Contributor

@mark-vieira mark-vieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test for this similar to AbstractFileWatchingServiceTests#testRegisterWatchKeyRetry?

@jfreden
Copy link
Contributor Author

jfreden commented Jun 2, 2025

Thanks for the review @mark-vieira ! Great idea, I've added a test for that.

@jfreden jfreden requested a review from mark-vieira June 2, 2025 07:55
Copy link
Contributor

@mark-vieira mark-vieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jfreden jfreden merged commit 2696451 into elastic:main Jun 3, 2025
21 of 23 checks passed
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
…lastic#128653)

* Unmute testSymlinkUpdateTriggerReload

* Add retry for AccessDeniedException in AbstractFileWatchingService

* Update docs/changelog/128653.yaml
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
…lastic#128653)

* Unmute testSymlinkUpdateTriggerReload

* Add retry for AccessDeniedException in AbstractFileWatchingService

* Update docs/changelog/128653.yaml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Core/Infra/Settings Settings infrastructure and APIs Team:Core/Infra Meta label for core/infra team test-windows Trigger CI checks on Windows v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants