-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Ensure config reload on ..data symlink switch for CSI driver support #127628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5d4a284 to
26e5d5a
Compare
| assertClusterStateSaveOK(savedClusterState.v1(), savedClusterState.v2(), "43mb"); | ||
| } | ||
|
|
||
| public void testSymlinkUpdateTriggerReload() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails on linux without the changes to AbstractFileWatchingService.
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
|
Hi @jfreden, I've created a changelog YAML for you. |
Just wanted to mention that the |
| .collect(Collectors.toSet()); | ||
| for (var changedPath : changedPaths) { | ||
| // If a symlinked dir changed in the settings dir, it could be linked to other symlinks, so reprocess all files | ||
| if (Files.isDirectory(changedPath) && Files.isSymbolicLink(changedPath)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe isDirectory and isSymbolic link need to be extracted to helper methods the way other Files methods are, so that these operations can be granted via entitlements for modules outside server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the file operations to their own methods. I'm trying to understand why this works with the MultiProjectFileSettingsService and I think it might be because getDeclaringClass is used to determine the caller. In this case the caller is MultiProjectFileSettingsService but since we don't override the methods in FileSettingsService the calling class becomes FileSettingsService since that's where the method is declared. FileSettingsService in turn is part of server, so that's why this works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without overriding it, the calling class appears to be in the server module, so server policies will apply. Extracting methods and overriding them in the module is the current way to make the call appear as originated from the module, so the module's policy appears.
It could make no difference today e.g. all modules are granted read access to the config directory, but that might change, so better to structure the code so it resolves to the correct module/policy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(we are designing ways to make this easier and more explicit in the future, but atm this is the way with entitlements)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explaining @ldematte!
For the serverless-multi-project module, is the entitlement-policy inherited from the extended plugin (core in this case) and is that why it works without an explicit policy? I've updated the MultiProjectFileSettingsService to override the files methods in https://github.com/elastic/elasticsearch-serverless/pull/3881
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works because this class, AbstractFileWatchingService, is in server, so the calls made in this class have the entitlements of server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a Files.newInputStream(file) call in the MultiProjectFileSettingsService and that's why my logging captured the MultiProjectFileSettingsService as a the caller and I got confused. I've changed it to use the filesNewInputStream instead. Thanks for explaining, it makes sense to me now! 👍
| for (var changedPath : changedPaths) { | ||
| // If a symlinked dir changed in the settings dir, it could be linked to other symlinks, so reprocess all files | ||
| if (Files.isDirectory(changedPath) && Files.isSymbolicLink(changedPath)) { | ||
| reprocessAllChangedFilesInSettingsDir(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to reprocess the entire settings dir, instead of just what the symbolic link points at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we only want to reprocess what's in the ..data directory symlink (or other directory symlinks are present in the settings dir) we need to iterate all files in the directory ..data points to and then resolve the file names against the settings directory, to find all the symlinks that changed (and also filter any files that are not present).
This works under the assumption that the target file and symlink has the same name, which I think is acceptable in this case since it's a directory symlink. In practice (at least in serverless) all files in the settings directory (config/operator) are managed through the ..data symlink, so when it's switched we always need to reprocess all of them. For that reason adding additional logic (resolving the file names against the settings dir) would add complexity for little benefit. For the general case (ECK?) it might make sense to only try to resolve the files in ..data against the settings directory though. WDYT?
rjernst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR fixes an issue that was surfaced when using the new CSI driver (not yet in production) used by the Elasticsearch controller to mount config files as secrets. Unlike the current kubelet-based approach, the CSI driver doesn't trigger an
ENTRY_MODIFYevent on theconfigfile watcher when there are symlink updates inconfig/operator, causing file changes to go unnoticed by the file-watching service—so the fix ensures a full re-read of theconfig/operatordirectory when directory symlinks are updated.The controller mounts
settings.json,project-xyz.jsonandproject-xyz.secrets.jsonon the Elastcisearch pods in serverless as kubernetes secrets. This is currently done using the kublet that run on each node. The files are are represented by symlinks in theconfig/operatordirectory that point to the files in a symlinked directory. For example:In Elasticsearch,
MultiProjectFileSettingsService(andFileSettingsServicein non-multi-project environments) is listening for file updates in theconfigandconfig/operatorthrough theAbstractFileWatchingService(by registering a WatchService for the config and operator directories and listen forENTRY_MODIFY,ENTRY_CREATEandENTRY_DELETE). Because of the symlink approach, the symlinks for the files that are monitored are not updated, only the..datasymlink is switched to a different timestamped directory, to allow for an atomic update of all files. When running with kublet the..datasymlink is updated (recreated) in theconfig/operatordirectory, theAbstractFileWatchingServicegets an update for theconfigdirectory (ENTRY_MODIFYon theconfigwatch key), this in turn trigger a re-register of theconfig/operatorwatch key and a re-read of all files in theconfig/operator, because we don't know if the file name maps to the same native file system file id.With the new csi-driver, the
..datasymlink switch no longer triggersENTRY_MODIFYon theconfigdirectory aftersettings.jsonis changed, so we don't re-register theconfig/operatorwatch key and the file changes are never picked up. We're not entirely sure what the difference is in the standard kublet flow vs the csi-driver, one difference is that kublet uses atmpfsfile system and the csi-driver usesext4. I any case, I think thatENTRY_MODIFYon the config directory for the..datasymlink switch might have been accidental and that we should always re-read (check if a file has been updated) the fullconfig/operatordirectory every time a directory symlink is updated in the settings directory.I've added a
FileSettingsServiceITtest to reproduce the issue.This only happens on Linux. On Mac (my local) this doesn't happen because the
ENTRY_MODIFYevent is triggered for theconfigwatcher when updating theconfig/operator/..datasymlink. I'm guessing this is because mac doesn't use inotify so things work differently.