Skip to content

OAK-12146 Use the latest elastic OR lucene index, if there are multiple#2813

Open
thomasmueller wants to merge 11 commits intotrunkfrom
OAK-12146
Open

OAK-12146 Use the latest elastic OR lucene index, if there are multiple#2813
thomasmueller wants to merge 11 commits intotrunkfrom
OAK-12146

Conversation

@thomasmueller
Copy link
Copy Markdown
Member

No description provided.

@sonarqubecloud
Copy link
Copy Markdown

Comment on lines +164 to +185
// Why we create the ES index manually instead of registering ElasticIndexEditorProvider:
//
// Normally, ElasticIndexEditorProvider would create the Elasticsearch index during
// root.commit(). However, this module's test classpath has a Lucene version conflict:
// oak-lucene depends on Lucene 4.7.2, while oak-search-elastic requires Lucene 9.x.
// Maven's dependency resolution picks lucene-core:4.7.2 (the nearer declaration), so
// lucene-core:9.x is absent. ElasticIndexEditorProvider transitively loads
// ElasticCustomAnalyzer, which imports org.apache.lucene.util.ResourceLoader — a class
// that only exists in lucene-core 9.x (in 4.x it had a different package). The result is
// a NoClassDefFoundError at commit time, even when no custom analyzers are configured.
//
// This test does not need to index actual content — it only checks which index the query
// planner *selects* (via EXPLAIN). For plan generation, ElasticIndexStatistics.numDocs()
// is called to estimate the entry count; it issues a COUNT request to Elasticsearch and
// throws index_not_found_exception if the index does not exist. That exception propagates
// as UncheckedExecutionException and is caught in FulltextIndex.getPlans(), which silently
// skips the index — causing the test to fail with "traverse allNodes".
//
// The fix: after committing the Oak index definitions (which registers the definition in
// the ElasticIndexTracker via the Observer callback), we create an empty Elasticsearch
// index directly via the REST client. numDocs() then returns 0, the planner can generate
// a plan, and version selection is exercised correctly.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we could avoid this. If the issue is just ElasticIndexStatistics.numDocs(), perhaps we can add a @TestOnly constructor in ElasticIndexTracker to pass a NoOp ElasticIndexStatistics instance. This will be used by ElasticIndexNodeManager/ElasticIndexNode. If this works, there would be no need to include oak-search-elastic and testcontainers dependencies along with ElasticConnectionRule that makes the test slow since it has to start an ES container.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an IT test test, so I think it's fine if it's slower than other tests. I think it makes sense that we test with the "correct" setup, and don't mock it. And for this, I think we do need to include both oak-search-elastic and oak-lucene, because this is what we test here: they need to reply with a plan for a given query.

But I'm still trying to find is a way to make it more real-world: that is, I think it should be possible to configure things so that we don't risk NoClassDefFoundError, and we don't need to create a elastic index from within the test case...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants