Indexed table integration for Lucene + parquet by bharath-techie · Pull Request #20954 · opensearch-project/OpenSearch

bharath-techie · 2026-03-21T18:20:04Z

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: bharath-techie <bharath78910@gmail.com>

github-actions · 2026-03-21T18:21:30Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 9de19b4.

Path	Line	Severity	Description
server/src/main/java/org/opensearch/index/IndexSettings.java	862	medium	Default for OPTIMIZED_INDEX_ENABLED_SETTING silently changed from false to true. This globally enables an experimental/optimized index path for ALL indexes without explicit opt-in, potentially altering query behavior across existing deployments. The change is unrelated to the stated PR purpose of adding indexed query support.
plugins/engine-datafusion/jni/src/query_executor.rs	151	medium	target_partitions is hardcoded to 4, replacing the dynamic `target_partitions` parameter that was previously passed from the caller. This silently overrides user/operator configuration and is unrelated to the indexed query feature, suggesting a debug artifact that was accidentally committed and could degrade performance at scale.
plugins/engine-datafusion/src/main/java/org/opensearch/datafusion/search/LuceneIndexSearcher.java	54	medium	Static ConcurrentHashMaps (activeShardWeights, activePartitionScorers) accumulate Weight and Scorer objects per query. On the success path, releaseShardWeight is never called — the comment in IndexedQueryBridge explicitly defers release to the caller but no caller performs this release after successful stream consumption. This creates a JVM-wide memory/resource leak that grows with query volume.
plugins/engine-datafusion/jni/src/indexed_table/stream.rs	303	low	Pervasive eprintln! debug statements (tagged [INDEXED-DEBUG], [PARTITION-DEBUG], [INDEXED-TIMING]) throughout the stream execution path write internal structural details — row group bitset popcounts, doc ranges, thread names, partition assignments, and timing — to stderr. While not exfiltrating to external endpoints, this verbose output is production-inappropriate and leaks index internals.
plugins/engine-datafusion/jni/src/indexed_query_executor.rs	97	low	println! (stdout) used for explain plan output rather than the project's logging framework. Combined with eprintln! usage elsewhere, debug instrumentation was not cleaned up before submission and bypasses log level controls.
plugins/engine-datafusion/src/main/java/org/opensearch/datafusion/search/IndexedQueryBridge.java	80	low	searcher.setQueryCache(null) unconditionally disables Lucene's query cache for all indexed queries. The TODO comment acknowledges this is temporary, but shipping with caching disabled degrades performance and bypasses cache-based optimizations without a runtime setting to re-enable it.
server/src/main/java/org/opensearch/index/engine/InternalEngine.java	673	low	reinitReaderManager replaces volatile externalReaderManager and internalReaderManager fields at runtime using an external IndexWriter. This mutates core engine state post-construction without synchronization guards visible in the diff, potentially causing races with concurrent searcher acquisitions during the replacement window.

The table above displays the top 10 most important findings.

Total: 7 | Critical: 0 | High: 0 | Medium: 3 | Low: 4
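
For the LuceneIndexSearcher.java finding above (entries in a static map are not released on the success path), one common remedy is to pair registration and release in a try/finally so that release runs whether the stream is consumed successfully or fails. The following is a minimal sketch under stated assumptions, not the PR's code: `ShardWeightRegistry`, `withWeight`, and the `long` query id key are hypothetical names chosen for illustration, and a plain `Object` stands in for a Lucene `Weight`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a JVM-wide registry like the static maps named in the finding.
public class ShardWeightRegistry {
    // Assumption: entries are keyed by a per-query id; the real key type is not shown in the report.
    private static final Map<Long, Object> ACTIVE = new ConcurrentHashMap<>();

    public static void register(long queryId, Object weight) {
        ACTIVE.put(queryId, weight);
    }

    public static void release(long queryId) {
        ACTIVE.remove(queryId);
    }

    public static int activeCount() {
        return ACTIVE.size();
    }

    // Registers the weight, runs the caller's work, and always releases the entry,
    // on both the success path and the failure path.
    public static <T> T withWeight(long queryId, Object weight, Function<Object, T> body) {
        register(queryId, weight);
        try {
            return body.apply(weight);
        } finally {
            release(queryId);
        }
    }

    public static void main(String[] args) {
        // Success path: the entry is gone once the work returns.
        String result = withWeight(1L, "weight-1", w -> "consumed " + w);
        System.out.println(result + ", active=" + activeCount());

        // Failure path: the entry is also gone after the exception propagates.
        try {
            withWeight(2L, "weight-2", w -> {
                throw new IllegalStateException("stream failed");
            });
        } catch (IllegalStateException e) {
            System.out.println("failed, active=" + activeCount());
        }
    }
}
```

If the stream is consumed asynchronously after the registering call returns, the same idea would apply at whichever point the consumer closes the stream; where that point is in this PR is not visible from the report.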

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

bharath-techie added 4 commits March 21, 2026 21:58

adding ability to switch to lucene engine

f3890f5

Signed-off-by: bharath-techie <bharath78910@gmail.com>

fixing query

d920617

Signed-off-by: bharath-techie <bharath78910@gmail.com>

indexed lucene changes

3e7b0f6

Signed-off-by: bharath-techie <bharath78910@gmail.com>

adding rest test for Lucene + parquet with a hardcoded plan

9de19b4

Signed-off-by: bharath-techie <bharath78910@gmail.com>

bharath-techie requested review from a team, Bukhtawar, CEHENKLE, Rishikesh1159, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners March 21, 2026 18:20

bharath-techie merged commit 361b2b3 into opensearch-project:feature/datafusion Mar 21, 2026
23 of 46 checks passed

bharath-techie merged 4 commits into opensearch-project:feature/datafusion from bharath-techie:originos-search-integration
