-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Infrastructure for resharding search filters #134252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -166,8 +166,8 @@ public interface DirectoryWrapper { | |
| private final AnalysisRegistry analysisRegistry; | ||
| private final EngineFactory engineFactory; | ||
| private final SetOnce<DirectoryWrapper> indexDirectoryWrapper = new SetOnce<>(); | ||
| private final SetOnce<Function<IndexService, CheckedFunction<DirectoryReader, DirectoryReader, IOException>>> indexReaderWrapper = | ||
| new SetOnce<>(); | ||
| private final List<Function<IndexService, CheckedFunction<DirectoryReader, DirectoryReader, IOException>>> indexReaderWrappers = | ||
| new ArrayList<>(); | ||
| private final Set<IndexEventListener> indexEventListeners = new HashSet<>(); | ||
| private final Map<String, TriFunction<Settings, IndexVersion, ScriptService, Similarity>> similarities = new HashMap<>(); | ||
| private final Map<String, IndexStorePlugin.DirectoryFactory> directoryFactories; | ||
|
|
@@ -368,9 +368,9 @@ public void addSimilarity(String name, TriFunction<Settings, IndexVersion, Scrip | |
| } | ||
|
|
||
| /** | ||
| * Sets the factory for creating new {@link DirectoryReader} wrapper instances. | ||
| * Adds a new instance of factory creating new {@link DirectoryReader} wrapper instances. | ||
| * The factory ({@link Function}) is called once the IndexService is fully constructed. | ||
| * NOTE: this method can only be called once per index. Multiple wrappers are not supported. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you know whether there was any reason beyond "not implemented yet" why there might have been a restriction on more than a single wrapper? I ask because the implementation is so small it doesn't seem like it would necessarily have been an impediment on its own. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It was added as a protection to disallow other plugins than security to mess with this low level api. |
||
| * The order of execution of wrappers is not guaranteed. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we do execute them in order and should define it instead. It might be important to performance which wrapper goes first. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do you mean something like if the first wrapper filters out many documents, the second wrapper has less work to do? How do we decide what the correct order should be? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can keep the order of application of wrappers based on the order in which they were added (we already do). The problem with the order here is that this is called by plugins and the order in which plugins are set up is not defined. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So real clients of this API can't really rely on the order in reality. |
||
| * <p> | ||
| * The {@link CheckedFunction} is invoked each time a {@link Engine.Searcher} is requested to do an operation, | ||
| * for example search, and must return a new directory reader wrapping the provided directory reader or if no | ||
|
|
@@ -384,11 +384,11 @@ public void addSimilarity(String name, TriFunction<Settings, IndexVersion, Scrip | |
| * The returned reader is closed once it goes out of scope. | ||
| * </p> | ||
| */ | ||
| public void setReaderWrapper( | ||
| public void addReaderWrapper( | ||
| Function<IndexService, CheckedFunction<DirectoryReader, DirectoryReader, IOException>> indexReaderWrapperFactory | ||
| ) { | ||
| ensureNotFrozen(); | ||
| this.indexReaderWrapper.set(indexReaderWrapperFactory); | ||
| this.indexReaderWrappers.add(indexReaderWrapperFactory); | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -499,8 +499,26 @@ public IndexService newIndexService( | |
| QueryRewriteInterceptor queryRewriteInterceptor | ||
| ) throws IOException { | ||
| final IndexEventListener eventListener = freeze(); | ||
| Function<IndexService, CheckedFunction<DirectoryReader, DirectoryReader, IOException>> readerWrapperFactory = indexReaderWrapper | ||
| .get() == null ? (shard) -> null : indexReaderWrapper.get(); | ||
| Function<IndexService, CheckedFunction<DirectoryReader, DirectoryReader, IOException>> readerWrapperFactory = | ||
| switch (indexReaderWrappers.size()) { | ||
| case 0 -> indexService -> null; | ||
| case 1 -> indexReaderWrappers.get(0); | ||
| default -> indexService -> { | ||
| // Call factories only one time when creating index service to avoid duplicate work. | ||
| var wrappers = new ArrayList<CheckedFunction<DirectoryReader, DirectoryReader, IOException>>(); | ||
| for (var indexReaderWrapper : indexReaderWrappers) { | ||
| wrappers.add(indexReaderWrapper.apply(indexService)); | ||
| } | ||
|
|
||
| return directoryReader -> { | ||
| var wrapped = wrappers.get(0).apply(directoryReader); | ||
| for (int i = 1; i < wrappers.size(); i++) { | ||
| wrapped = wrappers.get(i).apply(wrapped); | ||
| } | ||
| return wrapped; | ||
| }; | ||
| }; | ||
| }; | ||
| eventListener.beforeIndexCreated(indexSettings.getIndex(), indexSettings.getSettings()); | ||
| final IndexStorePlugin.DirectoryFactory directoryFactory = getDirectoryFactory(indexSettings, directoryFactories); | ||
| final IndexStorePlugin.RecoveryStateFactory recoveryStateFactory = getRecoveryStateFactory(indexSettings, recoveryStateFactories); | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -477,7 +477,7 @@ public void testOnIndexModuleIsNoOpWithSecurityDisabled() throws Exception { | |
| ); | ||
| security.onIndexModule(indexModule); | ||
| // indexReaderWrapper is a SetOnce so if Security#onIndexModule had already set an ReaderWrapper we would get an exception here | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This comment seems outdated by this change. I did not "debug" why the comment was made, might be worth looking into. Sounds like this "setReaderWrapper" really verifies that no reader wrapper was set yet? |
||
| indexModule.setReaderWrapper(null); | ||
| indexModule.addReaderWrapper(null); | ||
| } | ||
|
|
||
| public void testFilteredSettings() throws Exception { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jimczi i see you've done work in this area previously. Do you see any problem with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main worry is the ordering. Are we sure that it's ok if the ordering is not guaranteed? Like would we take different decisions in the extension if the live docs are different depending on which directory is last?
If not, we should ensure that the directory implementation applies the recommendation in the javadoc (like delegating cache helper, ...). Maybe that should be enforced too if that's not the case already (in tests or asserts).
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay. Both readers take live docs from the wrapped reader and apply a bitset on top of them e.g.
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/accesscontrol/DocumentSubsetReader.java#L207.
My new implementation essentially boils down to:
If security wrapper comes first then all "unsecure" docs are not live and stay that way since
original.get(index)isfalse. If a new wrapper comes first then security works normally except some of the docs that need to be excluded are already excluded by the new wrapper.There are checks for the contract of the wrappers already in here
elasticsearch/server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
Line 1754 in 8cb29e0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only dangerous code path i can see is if wrapper uses something like
FilterLeafReader.unwrap()which we don't do and IMO it is kinda obvious that this is a dangerous path.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep we only to ensure that the
ElasticsearchDirectoryReadercan be extracted to get the shard infos.From the description of the feature, it looks fine. Let's make sure that we don't add more to it, this is still a very expert thing to do and hopefully not an extension point for any plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have prior art in making sure that certain extension points are only used by Elastic supplied plugins (modules) and not by arbitrary 3rd party plugins
For example
elasticsearch/server/src/main/java/org/elasticsearch/action/ActionModule.java
Lines 548 to 566 in 4042137
@rjernst probably has a view on the preferred way to enforce that