Skip to content

Revert "Disable concurrent search for filter duplicates"#20915

Merged
andrross merged 1 commit intoopensearch-project:mainfrom
andrross:revert-f46ce23c
Mar 19, 2026
Merged

Revert "Disable concurrent search for filter duplicates"#20915
andrross merged 1 commit intoopensearch-project:mainfrom
andrross:revert-f46ce23c

Conversation

@andrross
Copy link
Member

Concurrent search creates a new aggregator per slice, so there is no concurrency problem here. The dedup logic is currently per-shard with no reduce logic at the coordinator, so concurrent search does change that to be per-slice, but does not fundamentally change the behavior. I'm reverting this and will deal with test flakiness if it continues.

This reverts commit f46ce23.

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 14f6861.

PathLineSeverityDescription
server/src/main/java/org/opensearch/search/aggregations/bucket/terms/SignificantTextAggregatorFactory.java318mediumRemoves an explicit thread-safety guard without justification. The original code contained a detailed comment explaining that DuplicateByteSequenceSpotter is stateful and not thread-safe, requiring concurrent search to be disabled when filterDuplicateText is active. The change unconditionally returns true, enabling concurrent segment search in all cases. This could deliberately introduce race conditions or data corruption in aggregation results when filterDuplicateText is used. The simultaneous removal of the corresponding CHANGELOG entry ('Disable concurrent search for filter duplicates in significant_text') obscures the reversal of this safety measure, which is a suspicious pattern. No evidence is provided that the underlying thread-safety issue in DuplicateByteSequenceSpotter was resolved.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@andrross andrross added the skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. label Mar 18, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 18, 2026

PR Reviewer Guide 🔍

(Review updated until commit 8b6ca61)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Thread Safety

The PR description acknowledges that DuplicateByteSequenceSpotter is stateful and not thread-safe, but claims concurrent search creates a new aggregator per slice. This assumption should be carefully validated — specifically, that each slice gets its own independent instance of DuplicateByteSequenceSpotter and that no shared state exists between slices when filterDuplicateText is enabled.

protected boolean supportsConcurrentSegmentSearch() {
    return true;
}
Dedup Behavior Change

The PR description notes that dedup logic changes from per-shard to per-slice with concurrent search enabled. This behavioral change means duplicate detection is less effective (duplicates across slices of the same shard won't be caught), which could affect result quality. This trade-off should be explicitly documented or tested.

protected boolean supportsConcurrentSegmentSearch() {
    return true;
}

@github-actions
Copy link
Contributor

github-actions bot commented Mar 18, 2026

PR Code Suggestions ✨

Latest suggestions up to 8b6ca61
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Revert unsafe concurrent search enablement

The original code disabled concurrent search when filterDuplicateText was enabled
because DuplicateByteSequenceSpotter is stateful and not thread-safe. Simply
returning true unconditionally reintroduces a thread-safety bug when
filterDuplicateText is active. Before enabling concurrent search unconditionally,
ensure that DuplicateByteSequenceSpotter has been made thread-safe or that each
thread gets its own instance.

server/src/main/java/org/opensearch/search/aggregations/bucket/terms/SignificantTextAggregatorFactory.java [318]

-return true;
+return filterDuplicateText == false;
Suggestion importance[1-10]: 8

__

Why: This is a valid concern - the original code disabled concurrent search when filterDuplicateText was enabled due to DuplicateByteSequenceSpotter being stateful and not thread-safe. Simply returning true unconditionally could reintroduce a thread-safety bug unless the underlying implementation has been fixed. The suggestion correctly identifies this potential issue and proposes reverting to the safer conditional check.

Medium

Previous suggestions

Suggestions up to commit 14f6861
CategorySuggestion                                                                                                                                    Impact
Possible issue
Revert unsafe concurrent search enablement

The original code disabled concurrent segment search when filterDuplicateText was
enabled because DuplicateByteSequenceSpotter is stateful and not thread-safe. Simply
returning true unconditionally reintroduces a thread-safety bug when
filterDuplicateText is active. Before enabling concurrent search, ensure that
DuplicateByteSequenceSpotter is either made thread-safe or that each thread gets its
own instance.

server/src/main/java/org/opensearch/search/aggregations/bucket/terms/SignificantTextAggregatorFactory.java [318]

-return true;
+return filterDuplicateText == false;
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a potential thread-safety issue: DuplicateByteSequenceSpotter is stateful and not thread-safe, so unconditionally returning true could reintroduce a concurrency bug when filterDuplicateText is enabled. This is a valid concern about correctness, though the PR may have addressed thread-safety elsewhere (e.g., per-thread instances).

Medium

@github-actions
Copy link
Contributor

❌ Gradle check result for 14f6861: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

❌ Gradle check result for 14f6861: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Concurrent search creates a new aggregator per slice, so there is no
concurrency problem here. The dedup logic is currently per-shard with no
reduce logic at the coordinator, so concurrent search does change that
to be per-slice, but does not fundamentally change the behavior. I'm
reverting this and will deal with test flakiness if it continues.

This reverts commit f46ce23.

Signed-off-by: Andrew Ross <andrross@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 8b6ca61

@github-actions
Copy link
Contributor

❌ Gradle check result for 8b6ca61: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

✅ Gradle check result for 8b6ca61: SUCCESS

@codecov
Copy link

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.15%. Comparing base (9f5d0e1) to head (8b6ca61).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...bucket/terms/SignificantTextAggregatorFactory.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20915      +/-   ##
============================================
- Coverage     73.30%   73.15%   -0.16%     
+ Complexity    72484    72428      -56     
============================================
  Files          5819     5819              
  Lines        331155   331237      +82     
  Branches      47840    47860      +20     
============================================
- Hits         242769   242324     -445     
- Misses        68876    69419     +543     
+ Partials      19510    19494      -16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrross andrross merged commit e5b3fa3 into opensearch-project:main Mar 19, 2026
37 of 40 checks passed
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Mar 20, 2026
…project#20915)

Concurrent search creates a new aggregator per slice, so there is no
concurrency problem here. The dedup logic is currently per-shard with no
reduce logic at the coordinator, so concurrent search does change that
to be per-slice, but does not fundamentally change the behavior. I'm
reverting this and will deal with test flakiness if it continues.

This reverts commit f46ce23.

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: kkewwei <kkewwei@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants