Skip to content

Conversation

@lkts
Copy link
Contributor

@lkts lkts commented Sep 5, 2025

This PR relaxes the restriction on there only ever being one DirectoryReader wrapper per index and adds a possibility to apply multiple layers of wrappers. Currently only two are expected but the code does not enforce this.

This is needed for the ongoing autosharding project work.

@elasticsearchmachine elasticsearchmachine added serverless-linked Added by automation, don't add manually v9.2.0 labels Sep 5, 2025
* Sets the factory for creating new {@link DirectoryReader} wrapper instances.
* Adds a new instance of factory creating new {@link DirectoryReader} wrapper instances.
* The factory ({@link Function}) is called once the IndexService is fully constructed.
* NOTE: this method can only be called once per index. Multiple wrappers are not supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know whether there was any reason beyond "not implemented yet" why there might have been a restriction on more than a single wrapper? I ask because the implementation is so small it doesn't seem like it would necessarily have been an impediment on its own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was added as a protection to disallow other plugins than security to mess with this low level api.

@lkts lkts requested a review from jimczi September 11, 2025 21:26
private final SetOnce<DirectoryWrapper> indexDirectoryWrapper = new SetOnce<>();
private final SetOnce<Function<IndexService, CheckedFunction<DirectoryReader, DirectoryReader, IOException>>> indexReaderWrapper =
new SetOnce<>();
private final List<Function<IndexService, CheckedFunction<DirectoryReader, DirectoryReader, IOException>>> indexReaderWrappers =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jimczi i see you've done work in this area previously. Do you see any problem with this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main worry is the ordering. Are we sure that it's ok if the ordering is not guaranteed? Like would we take different decisions in the extension if the live docs are different depending on which directory is last?
If not, we should ensure that the directory implementation applies the recommendation in the javadoc (like delegating cache helper, ...). Maybe that should be enforced too if that's not the case already (in tests or asserts).

Copy link
Contributor Author

@lkts lkts Sep 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay. Both readers take live docs from the wrapped reader and apply a bitset on top of them e.g.
https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/accesscontrol/DocumentSubsetReader.java#L207.

My new implementation essentially boils down to:

static class FilterBits implements Bits {
        private final Bits original;
        private final Bits filteredOut;

        FilterBits(Bits original, BitSet filteredOut) {
            this.original = original;
            this.filteredOut = filteredOut;
        }

        @Override
        public boolean get(int index) {
            return original.get(index) && (filteredOut.get(index) == false);
        }

        @Override
        public int length() {
            return original.length();
        }
    }

If security wrapper comes first then all "unsecure" docs are not live and stay that way since original.get(index) is false. If a new wrapper comes first then security works normally except some of the docs that need to be excluded are already excluded by the new wrapper.

There are checks for the contract of the wrappers already in here

final ElasticsearchDirectoryReader elasticsearchDirectoryReader = ElasticsearchDirectoryReader.getElasticsearchDirectoryReader(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only dangerous code path i can see is if wrapper uses something like FilterLeafReader.unwrap() which we don't do and IMO it is kinda obvious that this is a dangerous path.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep we only to ensure that the ElasticsearchDirectoryReader can be extracted to get the shard infos.
From the description of the feature, it looks fine. Let's make sure that we don't add more to it, this is still a very expert thing to do and hopefully not an extension point for any plugin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have prior art in making sure that certain extension points are only used by Elastic supplied plugins (modules) and not by arbitrary 3rd party plugins

For example

if (plugin instanceof RestServerActionPlugin restPlugin) {
var newInstance = function.apply(restPlugin);
if (newInstance != null) {
logger.debug("Using custom {} from plugin {}", type, plugin.getClass().getName());
if (isInternalPlugin(plugin) == false) {
throw new IllegalArgumentException(
"The "
+ plugin.getClass().getName()
+ " plugin tried to install a custom "
+ type
+ ". This functionality is not available to external plugins."
);
}
if (result != null) {
throw new IllegalArgumentException("Cannot have more than one plugin implementing a " + type);
}
result = newInstance;
}
}

@rjernst probably has a view on the preferred way to enforce that

@lkts lkts marked this pull request as ready for review September 11, 2025 21:26
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Sep 11, 2025
@lkts lkts added the :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. label Sep 11, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine elasticsearchmachine added Team:Distributed Indexing Meta label for Distributed Indexing team and removed needs:triage Requires assignment of a team area label labels Sep 11, 2025
Copy link
Contributor

@bcully bcully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'd like to get Jim's opinion before merging if possible.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am good with this, but like Jim I'd like a defined order.

* Adds a new instance of factory creating new {@link DirectoryReader} wrapper instances.
* The factory ({@link Function}) is called once the IndexService is fully constructed.
* NOTE: this method can only be called once per index. Multiple wrappers are not supported.
* The order of execution of wrappers is not guaranteed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do execute them in order and should define it instead. It might be important to performance which wrapper goes first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean something like if the first wrapper filters out many documents, the second wrapper has less work to do? How do we decide what the correct order should be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the order of application of wrappers based on the order in which they were added (we already do). The problem with the order here is that this is called by plugins and the order in which plugins are set up is not defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So real clients of this API can't really rely on the order in reality.

MergeMetrics.NOOP
);
security.onIndexModule(indexModule);
// indexReaderWrapper is a SetOnce so if Security#onIndexModule had already set an ReaderWrapper we would get an exception here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems outdated by this change. I did not "debug" why the comment was made, might be worth looking into. Sounds like this "setReaderWrapper" really verifies that no reader wrapper was set yet?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Indexing Meta label for Distributed Indexing team v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants