-
Notifications
You must be signed in to change notification settings - Fork 2.3k
[Workload Management][Rule Based Autotagging] Scroll API support in autotagging #20151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[Workload Management][Rule Based Autotagging] Scroll API support in autotagging #20151
Conversation
Signed-off-by: Lamine Idjeraoui <[email protected]>
WalkthroughThe changes enable scroll requests to be tagged by the workload management system by capturing and propagating original request indices through scroll IDs. Index resolution support is added to TransportSearchScrollAction, allowing it to extract indices from parsed scroll IDs for auto-tagging workflows. Changes
Sequence DiagramsequenceDiagram
participant Client
participant TSA as TransportSearchScrollAction
participant PSID as ParsedScrollId
participant ATAF as AutoTaggingActionFilter
participant WLS as RuleProcessingService
Client->>TSA: SearchScrollRequest (scrollId)
activate TSA
TSA->>PSID: parseScrollId()
activate PSID
PSID-->>TSA: ParsedScrollId (with originalIndices)
deactivate PSID
TSA->>TSA: resolveIndices(SearchScrollRequest)
Note over TSA: Extract originalIndices from ParsedScrollId<br/>Build OptionallyResolvedIndices
TSA->>TSA: Store in request metadata
TSA->>ATAF: Filter request
activate ATAF
ATAF->>ATAF: Recognize SearchScrollRequest
ATAF->>ATAF: Extract ResolvedIndices from metadata
ATAF->>ATAF: Create AttributeExtractor with INDEX_PATTERN
ATAF->>WLS: evaluateLabel(RuleAttribute...)
activate WLS
WLS-->>ATAF: Workload group label
deactivate WLS
ATAF-->>Client: Tagged response
deactivate ATAF
deactivate TSA
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/main/java/org/opensearch/action/search/SearchScrollRequest.java (1)
101-111: Bug: Cache is not invalidated when scrollId is changed.If
scrollId(String)is called afterparseScrollId()has already been invoked, the cachedparsedScrollIdwill become stale and return results for the old scroll ID.Apply this diff to clear the cache when scrollId changes:
public SearchScrollRequest scrollId(String scrollId) { this.scrollId = scrollId; + this.parsedScrollId = null; return this; }
🧹 Nitpick comments (5)
plugins/workload-management/src/internalClusterTest/java/org/opensearch/plugin/wlm/WlmAutoTaggingIT.java (1)
442-477: Scroll auto‑tagging IT is correct; consider clearing scroll context as a small improvementThis test is well-structured and mirrors the existing WLM auto-tagging tests: it enables WLM, creates a workload group and rule, indexes data, and then verifies that both the initial search (with
setScroll(...)) and the subsequentprepareSearchScroll(...)call increase completions. The assertions onscrollIdand monotonic increases ingetCompletions(...)make sense and exercise the new scroll-tagging path end-to-end.One minor improvement you might consider:
- After finishing with the scroll, clear the scroll context (e.g.,
client().prepareClearScroll().addScrollId(scrollId).get();), possibly in atry/finally, to avoid leaving scrolls open across potentialassertBusyretries. It’s not a blocker for a small IT but would make the test more self-contained and resource-friendly.server/src/test/java/org/opensearch/action/search/SearchScrollAsyncActionTests.java (1)
481-485: Constructor update is correct, but consider adding test coverage for non-empty originalIndices.The update to match the new
ParsedScrollIdconstructor signature is correct. However, this test file doesn't exercise the neworiginalIndicesfunctionality. Consider adding a test case that verifies scroll behavior whenoriginalIndicesis populated, to ensure the autotagging flow works end-to-end.server/src/main/java/org/opensearch/action/search/ParsedScrollId.java (1)
56-79: Consider defensive copies for array field in public API.Since
ParsedScrollIdis annotated with@PublicApi, external callers could mutate theoriginalIndicesarray after construction or after callinggetOriginalIndices(). Consider defensive copying for better encapsulation.- ParsedScrollId(String source, String type, SearchContextIdForNode[] context, String[] originalIndices) { + ParsedScrollId(String source, String type, SearchContextIdForNode[] context, String[] originalIndices) { this.source = source; this.type = type; this.context = context; - this.originalIndices = originalIndices; + this.originalIndices = originalIndices != null ? originalIndices.clone() : null; }public String[] getOriginalIndices() { - return originalIndices; + return originalIndices != null ? originalIndices.clone() : null; }server/src/main/java/org/opensearch/action/search/TransportSearchHelper.java (1)
137-146: Verify the deserialization logic aligns with the position check.The conditional reading of
originalIndicesis correct, but ensure the position check on line 148 (in.getPosition() != bytes.length) still functions as intended:
- When
originalIndicesare absent, position should equalbytes.lengthbefore line 148.- When
originalIndicesare present, they must be fully consumed so position equalsbytes.length.Consider adding a unit test that verifies scroll IDs with and without
originalIndicespass the position check.plugins/workload-management/src/main/java/org/opensearch/plugin/wlm/AutoTaggingActionFilter.java (1)
100-119: Consider extracting the anonymous AttributeExtractor to improve readability.The inline anonymous
AttributeExtractorimplementation is functionally correct but could be refactored into a named helper method (e.g.,createResolvedIndicesExtractor) to improve code clarity and testability.Example refactor:
private static AttributeExtractor<String> createResolvedIndicesExtractor(Set<String> indexNames) { return new AttributeExtractor<>() { @Override public Attribute getAttribute() { return RuleAttribute.INDEX_PATTERN; } @Override public Iterable<String> extract() { return indexNames; } @Override public LogicalOperator getLogicalOperator() { return LogicalOperator.AND; } }; }Then replace lines 104-119 with:
- attributeExtractors.add(new AttributeExtractor<>() { - @Override - public Attribute getAttribute() { - return RuleAttribute.INDEX_PATTERN; - } - - @Override - public Iterable<String> extract() { - return names; - } - - @Override - public LogicalOperator getLogicalOperator() { - return LogicalOperator.AND; - } - }); + attributeExtractors.add(createResolvedIndicesExtractor(names));
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
plugins/workload-management/src/internalClusterTest/java/org/opensearch/plugin/wlm/WlmAutoTaggingIT.java(1 hunks)plugins/workload-management/src/main/java/org/opensearch/plugin/wlm/AutoTaggingActionFilter.java(3 hunks)plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/AutoTaggingActionFilterTests.java(2 hunks)server/src/main/java/org/opensearch/action/search/AbstractSearchAsyncAction.java(1 hunks)server/src/main/java/org/opensearch/action/search/ParsedScrollId.java(2 hunks)server/src/main/java/org/opensearch/action/search/SearchScrollRequest.java(2 hunks)server/src/main/java/org/opensearch/action/search/TransportSearchHelper.java(3 hunks)server/src/main/java/org/opensearch/action/search/TransportSearchScrollAction.java(4 hunks)server/src/test/java/org/opensearch/action/search/ParsedScrollIdTests.java(1 hunks)server/src/test/java/org/opensearch/action/search/SearchScrollAsyncActionTests.java(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
server/src/main/java/org/opensearch/action/search/AbstractSearchAsyncAction.java (1)
server/src/main/java/org/opensearch/action/search/TransportSearchHelper.java (1)
TransportSearchHelper(55-160)
plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/AutoTaggingActionFilterTests.java (1)
server/src/main/java/org/opensearch/action/support/ActionRequestMetadata.java (1)
ActionRequestMetadata(21-61)
server/src/main/java/org/opensearch/action/search/SearchScrollRequest.java (1)
server/src/main/java/org/opensearch/action/search/TransportSearchHelper.java (1)
TransportSearchHelper(55-160)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: Analyze (java)
🔇 Additional comments (11)
server/src/main/java/org/opensearch/action/search/AbstractSearchAsyncAction.java (1)
777-779: LGTM! Indices are now correctly propagated to scroll ID construction.The change correctly passes
request.indices()tobuildScrollId, enabling the scroll ID to carry original index information for autotagging purposes. TheTransportSearchHelper.buildScrollIdmethod properly handles null/empty arrays with version gating.server/src/test/java/org/opensearch/action/search/ParsedScrollIdTests.java (1)
53-58: LGTM! Constructor call updated correctly.The test correctly uses the new constructor signature. The
hasLocalIndices()test remains valid since it doesn't depend onoriginalIndices. Consider adding a separate test forgetOriginalIndices()to verify it returns the expected values.server/src/main/java/org/opensearch/action/search/TransportSearchScrollAction.java (3)
54-56: LGTM! TransportIndicesResolvingAction implementation enables autotagging for scroll requests.The class now correctly implements
TransportIndicesResolvingAction<SearchScrollRequest>, enabling the workload management system to resolve indices for scroll requests and apply appropriate tags.
87-87: Good change to use cached parsing.Using
request.parseScrollId()instead ofTransportSearchHelper.parseScrollId(request.scrollId())leverages the lazy caching mechanism, avoiding redundant parsing ifresolveIndices()was called earlier in the request lifecycle.
123-143: Robust implementation with proper defensive handling.The
resolveIndices()implementation correctly:
- Guards against null/empty scroll ID strings
- Handles null parsed result
- Returns
OptionallyResolvedIndices.unknown()for missing originalIndices- Catches all exceptions to prevent failures from blocking request processing
One minor note: the null check on line 132 (
parsed == null) should never be true given the currentparseScrollId()implementation throws on parse failure rather than returning null. However, keeping it is good defensive programming.server/src/main/java/org/opensearch/action/search/TransportSearchHelper.java (4)
65-67: LGTM!Delegating to the new overload with
nullfororiginalIndicesmaintains backward compatibility for existing callers.
151-151: Verify the ParsedScrollId constructor signature.The code at line 151 calls
ParsedScrollId(scrollId, type, context, originalIndices)with four parameters. Ensure that theParsedScrollIdclass constructor has been updated to accept all four parameters, includingString[] originalIndicesas the fourth parameter.
63-63: Verify that Version.V_3_3_2 exists and is the intended version.This constant gates a wire format change for including originalIndices in scroll IDs. Ensure that
Version.V_3_3_2is defined in theVersionclass and that this is the correct minimum version for the feature.
88-99: No action required — scroll IDs are designed as opaque, versioned tokens.The code correctly uses version gating to conditionally serialize
originalIndices. OpenSearch's documented design explicitly treats scroll IDs as opaque, internally-versioned structures that are not intended to be cross-version compatible. Scroll IDs should never be parsed or shared across versions; they are opaque tokens whose internal format can change arbitrarily. Each version reads exactly what it wrote, and the conditional serialization ensures older nodes do not encounter unexpected fields.plugins/workload-management/src/main/java/org/opensearch/plugin/wlm/AutoTaggingActionFilter.java (1)
88-90: LGTM: Request type validation extended correctly.The validation logic now correctly includes both
SearchRequestandSearchScrollRequest, allowing scroll operations to be tagged by the workload management system.plugins/workload-management/src/test/java/org/opensearch/plugin/wlm/AutoTaggingActionFilterTests.java (1)
98-115: Additional test coverage should verify fallback behavior and validate extracted indices.The test correctly verifies the happy path for
SearchScrollRequestwith resolved indices and workload tagging. However, without access to the codebase for verification, I cannot confirm:
- Whether a fallback scenario test is actually needed (requires reviewing lines 122-125 in
AutoTaggingActionFilter.java)- Whether the API references in the suggested example (
OptionallyResolvedIndices.unknown()) are valid- Whether the
ResolvedIndices.of()factory method exists as usedThe suggestion to verify actual indices passed to
evaluateLabelvia argument captors remains valid and should be considered if the fallback scenario is indeed a code path that needs testing.
| final OptionallyResolvedIndices optionallyResolved = actionRequestMetadata.resolvedIndices(); | ||
| final boolean hasResolvedIndices = optionallyResolved instanceof ResolvedIndices; | ||
|
|
||
| if (hasResolvedIndices) { | ||
| final ResolvedIndices resolved = (ResolvedIndices) optionallyResolved; | ||
| final Set<String> names = resolved.local().names(); | ||
|
|
||
| attributeExtractors.add(new AttributeExtractor<>() { | ||
| @Override | ||
| public Attribute getAttribute() { | ||
| return RuleAttribute.INDEX_PATTERN; | ||
| } | ||
|
|
||
| @Override | ||
| public Iterable<String> extract() { | ||
| return names; | ||
| } | ||
|
|
||
| @Override | ||
| public LogicalOperator getLogicalOperator() { | ||
| return LogicalOperator.AND; | ||
| } | ||
| }); | ||
| } else if (isSearchRequest) { | ||
| attributeExtractors.add(new IndicesExtractor((IndicesRequest) request)); | ||
| } else { | ||
| chain.proceed(task, action, request, listener); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for LogicalOperator usage in similar contexts
rg -n -C 3 "LogicalOperator\." --type=java -g '!**/test/**' plugins/workload-management/Repository: opensearch-project/OpenSearch
Length of output: 202
🌐 Web query:
OpenSearch workload management rule evaluation LogicalOperator index patterns
💡 Result:
Summary of how OpenSearch evaluates workload-management rules with index patterns
- index_pattern supports exact names or wildcard/prefix patterns (e.g., "logs-2025-04" or "logs-*"). [1][2]
- Within a single attribute value list (e.g., "index_pattern": ["logs-","events-"]), items are matched as OR — a request targeting any listed pattern satisfies that attribute. [1][2]
- A rule with multiple different attributes (for example index_pattern plus principal.username) requires all those attribute conditions to be met to match (the rule’s attributes are conjunctive). In other words, attribute clauses within a rule are combined so the rule matches only when every specified attribute matches the request. [2]
- When multiple rules match a request, OpenSearch resolves which rule to apply using precedence: attributes have a fixed priority order (e.g., principal.username/principal.role higher than index_pattern for workload_group rules), and more specific patterns win over more general ones (e.g., logs-prod-2025-* beats logs-prod-* beats logs-*). The rule with the highest-priority attribute / most specific match is applied. [1][2]
- Practical implications:
- Use specific index patterns to ensure the intended rule wins.
- Use lists in index_pattern to cover multiple indices (they behave as OR).
- Combine attributes in a rule to target narrower request sets (attributes are ANDed). [1][2]
Sources:
[1] OpenSearch — Workload group rules (Rules API / workload_group).
[2] OpenSearch — Rule-based auto-tagging (attribute matching, precedence, examples).
Fix LogicalOperator.AND to LogicalOperator.OR for index pattern matching.
Within a single attribute's value list, index patterns are evaluated as OR (a request matching any pattern satisfies the attribute). Line 117 should use LogicalOperator.OR instead of LogicalOperator.AND to align with OpenSearch workload management rule semantics. Using AND would incorrectly require all index patterns to match simultaneously, breaking the intended behavior where multiple patterns in a list provide alternative matching criteria.
The fallback behavior at lines 122-125 (proceeding silently when SearchScrollRequest lacks resolved indices) appears intentional as a defensive measure, though logging this scenario could improve observability for debugging unexpected states where the transport layer fails to provide resolved indices.
🤖 Prompt for AI Agents
In
plugins/workload-management/src/main/java/org/opensearch/plugin/wlm/AutoTaggingActionFilter.java
around lines 97 to 125, the LogicalOperator used for the INDEX_PATTERN attribute
is incorrect: change LogicalOperator.AND to LogicalOperator.OR on the
AttributeExtractor at line ~117 so that multiple index patterns are treated as
alternatives (match any) rather than requiring all to match; leave the rest of
the control flow intact (keep the IndicesExtractor fallback and the early
proceed/return behavior), and optionally add a debug log where the code proceeds
silently for clarity if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the same logic than IndicesExtractor which uses AND.
public class IndicesExtractor implements AttributeExtractor<String> {
private final IndicesRequest indicesRequest;
/**
* Default constructor
* @param indicesRequest
*/
public IndicesExtractor(IndicesRequest indicesRequest) {
this.indicesRequest = indicesRequest;
}
@Override
public Attribute getAttribute() {
return RuleAttribute.INDEX_PATTERN;
}
@Override
public Iterable<String> extract() {
return List.of(indicesRequest.indices());
}
@Override
public LogicalOperator getLogicalOperator() {
return AND;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for LogicalOperator enum definition and usage in rule evaluation
rg -n -A 10 "enum LogicalOperator" --type=java
# Search for how evaluateLabel uses LogicalOperator
rg -n -C 5 "getLogicalOperator\(\)" --type=java -g '!**/test/**'
# Search for rule evaluation logic
rg -n -C 5 "evaluateLabel" --type=java -g '!**/test/**' | head -100
Length of output: 8550
🏁 Script executed:
#!/bin/bash
# Find tests for IndicesExtractor and rule evaluation with multiple indices
rg -n -B 5 -A 10 "IndicesExtractor" --type=java -g '**/test/**'
# Search for AttributeExtractor interface definition
ast-grep --pattern 'interface AttributeExtractor$_$ {
$$$
}'
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look for the rule processing service implementation
fd -e java RuleProcessingService | head -5 | xargs cat
Length of output: 10746
@laminelam You're absolutely right, and I apologize for the confusion in my previous comment.
After reviewing the code more carefully, I found the documentation in AttributeExtractor.java that clarifies the semantics:
"if the request targets both index A and B, then a rule must have both index A and B as attributes, requiring an AND operator."
The LogicalOperator controls how multiple indices extracted from the request should be matched against rules, not how multiple patterns within a rule are evaluated:
- AND: If a request targets indices ["logs-2024", "metrics-2024"], a matching rule must contain BOTH indices in its
index_patternattribute - OR: If a request targets indices ["logs-2024", "metrics-2024"], a matching rule can contain EITHER index (or both)
Using AND is correct because when a request operates on multiple indices simultaneously, you want to ensure the rule covers all of them. This prevents partial matches that could lead to incorrect tagging when sensitive and non-sensitive indices are accessed together.
Your implementation correctly follows the same logic as IndicesExtractor. My earlier comment about OR was based on a misunderstanding—I was conflating how patterns within a rule definition are evaluated (which is OR) with how the extractor's logical operator works (which should be AND for this use case).
Thank you for the clarification!
🐰
/___/
( o o )
( =^= ) learned something new!
/| |
(| |)
🧬 Code Graph Analysis Results
[plugins/workload-management/src/main/java/org/opensearch/plugin/wlm/rule/attribute_extractor/IndicesExtractor.java] (lines 23-48)
public class IndicesExtractor implements AttributeExtractor<String> {
private final IndicesRequest indicesRequest;
/**
* Default constructor
* @param indicesRequest
*/
public IndicesExtractor(IndicesRequest indicesRequest) {
this.indicesRequest = indicesRequest;
}
@Override
public Attribute getAttribute() {
return RuleAttribute.INDEX_PATTERN;
}
@Override
public Iterable<String> extract() {
return List.of(indicesRequest.indices());
}
@Override
public LogicalOperator getLogicalOperator() {
return AND;
}
}
[server/src/main/java/org/opensearch/action/support/ActionRequestMetadata.java] (lines 21-61)
Docstring-style summary:
- Purpose: ActionRequestMetadata tracks metadata about an ActionRequest, including the ability to resolve indices referenced by the request when supported by the transport action.
- Method described: resolvedIndices()
- Returns: OptionallyResolvedIndices
- Behavior:
- If the underlying transportAction is not an instance of TransportIndicesResolvingAction, returns an unknown/unknown-indices placeholder.
- If it can resolve indices, calls resolveIndices(request) to obtain the actual indices information.
- Important details:
- Uses a generic type <Request extends ActionRequest, Response extends ActionResponse>.
- The method relies on the transport-action's ability to resolve indices and may not cache results, as per the comment in the code.
- This mechanism is used by callers to determine if indices can be resolved for a given request, enabling downstream logic to predicate on resolved local indices.
|
❌ Gradle check result for 2cc9bb7: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Resolves #8362
Description
[Describe what this change achieves]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
Release Notes
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.