Indexed table integration for Lucene + parquet #20954

Merged
bharath-techie merged 4 commits into opensearch-project:feature/datafusion from bharath-techie:originos-search-integration on Mar 21, 2026
@bharath-techie
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: bharath-techie <bharath78910@gmail.com>
Signed-off-by: bharath-techie <bharath78910@gmail.com>
Signed-off-by: bharath-techie <bharath78910@gmail.com>
Signed-off-by: bharath-techie <bharath78910@gmail.com>
@bharath-techie bharath-techie merged commit 361b2b3 into opensearch-project:feature/datafusion Mar 21, 2026
23 of 46 checks passed
@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 9de19b4.

| Path | Line | Severity | Description |
| --- | --- | --- | --- |
| server/src/main/java/org/opensearch/index/IndexSettings.java | 862 | medium | Default for OPTIMIZED_INDEX_ENABLED_SETTING silently changed from false to true. This globally enables an experimental/optimized index path for ALL indexes without explicit opt-in, potentially altering query behavior across existing deployments. The change is unrelated to the stated PR purpose of adding indexed query support. |
| plugins/engine-datafusion/jni/src/query_executor.rs | 151 | medium | target_partitions is hardcoded to 4, replacing the dynamic `target_partitions` parameter that was previously passed from the caller. This silently overrides user/operator configuration and is unrelated to the indexed query feature, suggesting a debug artifact that was accidentally committed and could degrade performance at scale. |
| plugins/engine-datafusion/src/main/java/org/opensearch/datafusion/search/LuceneIndexSearcher.java | 54 | medium | Static ConcurrentHashMaps (activeShardWeights, activePartitionScorers) accumulate Weight and Scorer objects per query. On the success path, releaseShardWeight is never called: the comment in IndexedQueryBridge explicitly defers release to the caller, but no caller performs this release after successful stream consumption. This creates a JVM-wide memory/resource leak that grows with query volume. |
| plugins/engine-datafusion/jni/src/indexed_table/stream.rs | 303 | low | Pervasive eprintln! debug statements (tagged [INDEXED-DEBUG], [PARTITION-DEBUG], [INDEXED-TIMING]) throughout the stream execution path write internal structural details (row group bitset popcounts, doc ranges, thread names, partition assignments, and timing) to stderr. While not exfiltrating to external endpoints, this verbose output is production-inappropriate and leaks index internals. |
| plugins/engine-datafusion/jni/src/indexed_query_executor.rs | 97 | low | println! (stdout) used for explain plan output rather than the project's logging framework. Combined with eprintln! usage elsewhere, debug instrumentation was not cleaned up before submission and bypasses log level controls. |
| plugins/engine-datafusion/src/main/java/org/opensearch/datafusion/search/IndexedQueryBridge.java | 80 | low | searcher.setQueryCache(null) unconditionally disables Lucene's query cache for all indexed queries. The TODO comment acknowledges this is temporary, but shipping with caching disabled degrades performance and bypasses cache-based optimizations without a runtime setting to re-enable it. |
| server/src/main/java/org/opensearch/index/engine/InternalEngine.java | 673 | low | reinitReaderManager replaces volatile externalReaderManager and internalReaderManager fields at runtime using an external IndexWriter. This mutates core engine state post-construction without synchronization guards visible in the diff, potentially causing races with concurrent searcher acquisitions during the replacement window. |

The table above displays the top 10 most important findings.
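
For the LuceneIndexSearcher.java finding, one possible way to guarantee release on both the success and failure paths is to hand callers a closeable handle and consume the stream inside try-with-resources. The sketch below is illustrative only: `ShardWeightRegistry`, `Handle`, `withWeight`, the `long` key, and the `Object` value are hypothetical stand-ins, not the PR's actual classes or Lucene's `Weight` type.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a static per-query registry such as activeShardWeights. */
final class ShardWeightRegistry {
    // Assumption: entries are keyed by a query/context id; Object is a placeholder for Weight.
    private static final ConcurrentHashMap<Long, Object> ACTIVE = new ConcurrentHashMap<>();

    private ShardWeightRegistry() {}

    /** Handle whose close() removes the entry, so try-with-resources releases on every path. */
    static final class Handle implements AutoCloseable {
        private final long id;

        Handle(long id) {
            this.id = id;
        }

        long id() {
            return id;
        }

        @Override
        public void close() {
            ACTIVE.remove(id);
        }
    }

    /** Registers the weight and returns the handle that owns its lifetime. */
    static Handle register(long id, Object weight) {
        ACTIVE.put(id, weight);
        return new Handle(id);
    }

    /** Number of entries currently held; useful for leak checks in tests. */
    static int activeCount() {
        return ACTIVE.size();
    }

    /** Runs the consumer with the weight registered, then releases it even if the consumer throws. */
    static <T> T withWeight(long id, Object weight, Function<Long, T> consumer) {
        try (Handle handle = register(id, weight)) {
            return consumer.apply(handle.id());
        }
    }
}
```

With this shape, a test can assert that `activeCount()` returns to zero after both a successful and a failing query, which is the property the finding says is currently missing on the success path.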

Total: 7 | Critical: 0 | High: 0 | Medium: 3 | Low: 4
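
For the InternalEngine.java finding, a common way to avoid readers observing one old and one new manager during a swap is to publish both references as a single immutable pair. This is a minimal sketch under that assumption; `ReaderManagers`, `Pair`, `snapshot`, and `reinit` are hypothetical names, and the `String` fields are placeholders for the real reader manager types. It does not address reference counting of in-flight searchers, which a real fix would also need to consider.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: two related references published together as one immutable pair. */
final class ReaderManagers {

    /** Immutable pair; String is a placeholder for the actual reader manager type. */
    static final class Pair {
        final String internal;
        final String external;

        Pair(String internal, String external) {
            this.internal = internal;
            this.external = external;
        }
    }

    private final AtomicReference<Pair> current;

    ReaderManagers(String internal, String external) {
        this.current = new AtomicReference<>(new Pair(internal, external));
    }

    /** Readers take one snapshot, so they never see a mix of old and new managers. */
    Pair snapshot() {
        return current.get();
    }

    /** Swaps both managers in one atomic step and returns the old pair so the caller can close it. */
    Pair reinit(String newInternal, String newExternal) {
        return current.getAndSet(new Pair(newInternal, newExternal));
    }
}
```

The design choice here is to replace two independent volatile writes with one atomic publication, which removes the window in which the two fields disagree.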


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.
