-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Fixing indexing regression and bug fixes for grouping criteria #20145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdjusts lookup-map lock retry behavior: raises default retry limit, propagates a max-retry parameter into CompositeIndexWriter's lookup path using bounded tryAcquire loops, removes exception-driven retry from bulk action, narrows a mapper return set, updates tests and CHANGELOG entries. (≤50 words) Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (2)
215-215: Reduced encapsulation ofmapReadLock.The visibility of
mapReadLockhas been changed fromprivate finalto package-private, allowing direct access from other classes in the same package. This field controls critical concurrency behavior, and exposing it directly increases the risk of misuse.Consider:
- Keeping the field
privateand exposing only necessary operations through methods (e.g.,tryAcquireLock()).- If package-private access is required for the retry logic, add clear documentation about proper usage patterns and thread-safety requirements.
- Restrict access using a package-private accessor method rather than exposing the field directly.
498-498: Simplify boolean comparisons.The condition uses explicit
== falseand== truecomparisons which are redundant in Java.Apply this diff:
-if (success == false && current != null && current.mapReadLock.isHeldByCurrentThread() == true) { +if (!success && current != null && current.mapReadLock.isHeldByCurrentThread()) { current.mapReadLock.close(); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(1 hunks)server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java(10 hunks)server/src/main/java/org/opensearch/index/mapper/MapperService.java(2 hunks)server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java(0 hunks)server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java(4 hunks)server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java(1 hunks)
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: gradle-check
🔇 Additional comments (13)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)
87-87: LGTM!Import correctly added to support the stream operations in
getCompositeFieldTypes().
693-697: Verify the behavior change scope and call frequency.The filtering to return only
StarTreeMapper.StarTreeFieldTypeinstances represents a narrowed scope from returning all composite field types. Confirm this change is intentional and whether any callers expect otherCompositeMappedFieldTypeimplementations. Additionally, verify the call frequency of this method; if invoked on hot paths, consider caching the filtered result to avoid repeated stream collection operations.server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java (1)
106-106: LGTM!The test constant is appropriately set to a lower value (20) than the production default (100) for faster test execution while still being within the valid range (5-500).
server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (5)
44-46: LGTM!Mockito imports correctly added to support the new verification test.
71-77: LGTM!Method call correctly updated to include
MAX_NUMBER_OF_RETRIESparameter, aligning with the new bounded retry API.
141-146: LGTM!Method call correctly updated with retry parameter.
197-202: LGTM!Method call correctly updated with retry parameter.
208-227: Test validates bounded retry semantics correctly.The test properly verifies:
LookupMapLockAcquisitionExceptionis thrown after exhausting retriestryAcquire()is called exactlyMAX_NUMBER_OF_RETRIEStimesOne consideration: the mock setup directly assigns to
map.currentandmap.current.mapReadLockwhich accesses package-private fields. This works for testing but creates tight coupling to internal implementation details.server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1)
724-753: Retry logic moved to lower layer - verify exception handling.The
LookupMapLockAcquisitionExceptionretry logic has been removed from bulk action handling and moved toCompositeIndexWriterwith bounded retries. This architectural approach places retry logic closer to where the exception originates.Ensure that when
LookupMapLockAcquisitionExceptionpropagates up after max retries are exhausted, it's properly handled and doesn't cause unexpected bulk operation failures.server/src/main/java/org/opensearch/index/IndexSettings.java (1)
499-506: Significant default value change - verify upgrade impact.The default retry count increased to 100 with a maximum of 500. Since this is a dynamic setting, existing indices will apply the new default upon upgrade. Consider whether this change should be documented in release notes for operators who have tuned their clusters based on previous defaults.
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (3)
691-693: LGTM: Metrics gathering refactoring.The refactoring from stream-based iteration to explicit for-loops improves code clarity and performance for these simple aggregation operations. The logic is correct in all cases, with proper handling of both current and old maps where necessary, and appropriate locking in
ramBytesUsed().Also applies to: 702-704, 731-742, 758-770, 796-806
210-210: Verify removal offinalmodifier is intentional.The
finalmodifier has been removed fromCriteriaBasedIndexWriterLookup,CriteriaBasedWriterLock, andLiveIndexWriterDeletesMap. This allows subclassing of these internal implementation classes. Confirm whether:
- Subclassing is required for test mocking/stubbing.
- If so, consider restricting visibility to test scope or use sealed classes.
- If intentional for production extensibility, document extension points and invariants.
Also applies to: 301-301, 406-406
678-679: Verify retry configuration defaults and bounds.The
maxRetryOnLookupMapAcquisitionExceptionsetting controls retry behavior for lookup map acquisition. Without access to the codebase, I cannot confirm the specific default values, upper/lower bounds, or whether they align with industry best practices for lock acquisition retry mechanisms. Ensure the following:
- Default retry count is reasonable (typically 3–10 attempts for in-thread locks per best practices)
- Upper bounds prevent excessive retries that could cause performance degradation
- Configuration is documented and tunable per workload expectations
- Retry strategy includes backoff/jitter to avoid retry storms (not just blind spinning)
CHANGELOG.md
Outdated
| - Fix node bootstrap error when enable stream transport and remote cluster state ([#19948](https://github.com/opensearch-project/OpenSearch/pull/19948)) | ||
| - Fix deletion failure/error of unused index template; case when an index template matches a data stream but has a lower priority. ([#20102](https://github.com/opensearch-project/OpenSearch/pull/20102)) | ||
| - Fix toBuilder method in EngineConfig to include mergedSegmentTransferTracker([20105](https://github.com/opensearch-project/OpenSearch/pull/20105)) | ||
| - Fix indexing regression and bug fixes for grouping criteria([20145](https://github.com/opensearch-project/OpenSearch/pull/20145)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor formatting inconsistency.
The PR link format is inconsistent with other entries in the changelog. Missing # before the PR number.
Apply this diff to fix the formatting:
-- Fix indexing regression and bug fixes for grouping criteria([20145](https://github.com/opensearch-project/OpenSearch/pull/20145))
+- Fix indexing regression and bug fixes for grouping criteria ([#20145](https://github.com/opensearch-project/OpenSearch/pull/20145))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - Fix indexing regression and bug fixes for grouping criteria([20145](https://github.com/opensearch-project/OpenSearch/pull/20145)) | |
| - Fix indexing regression and bug fixes for grouping criteria ([#20145](https://github.com/opensearch-project/OpenSearch/pull/20145)) |
🤖 Prompt for AI Agents
CHANGELOG.md around line 94: the PR link is missing the '#' before the PR number
which makes it inconsistent with other entries; update the link by inserting '#'
before the PR number so it reads .../pull/#20145 (i.e. change
(https://github.com/opensearch-project/OpenSearch/pull/20145) to include the
hash) and ensure the rest of the entry formatting matches surrounding lines.
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)
475-489: Add exponential backoff to retry loop.The retry loop performs a busy-wait by calling
tryAcquire()repeatedly without any delay between attempts. This pattern wastes CPU cycles under contention when the lock is frequently unavailable.As suggested in the previous review, add exponential backoff between retry attempts with
Thread.sleep(Math.min(1L << counter, 100L))and handleInterruptedExceptionby restoring the interrupt flag and propagatingLookupMapLockAcquisitionException.
🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)
483-483: Simplify boolean comparisons.The expressions
current.isClosed() == true(line 483) andcurrent.mapReadLock.isHeldByCurrentThread() == true(line 498) contain redundant boolean comparisons.Apply this diff to simplify:
- if (current != null && current.isClosed() == true) { + if (current != null && current.isClosed()) {- if (success == false && current != null && current.mapReadLock.isHeldByCurrentThread() == true) { + if (!success && current != null && current.mapReadLock.isHeldByCurrentThread()) {Also applies to: 498-498
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(1 hunks)server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java(10 hunks)server/src/main/java/org/opensearch/index/mapper/MapperService.java(2 hunks)server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java(0 hunks)server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java(4 hunks)server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java(1 hunks)
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (5)
- server/src/main/java/org/opensearch/index/mapper/MapperService.java
- CHANGELOG.md
- server/src/main/java/org/opensearch/index/IndexSettings.java
- server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java
- server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
🔇 Additional comments (6)
server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1)
717-731: Behavior change: retries now only on version conflicts inonCompleteThe retry path in
onCompletenow exclusively handlesVersionConflictEngineException(viaisConflictException) bounded byretryOnConflict, with the previousLookupMapLockAcquisitionException-specific retry branch removed. That seems consistent with moving lock-acquisition retry logic down intoCompositeIndexWriterand keeps this layer from depending on engine-internal exception types.Please double-check that:
LookupMapLockAcquisitionExceptionis fully retried/consumed withinCompositeIndexWriter(or below), and- it does not bubble back up to this point expecting a retry here anymore,
so we don’t inadvertently change failure behavior for callers that previously relied on the bulk action to transparently retry on transient lookup-map lock contention.
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (5)
210-210: Visibility changes support the new retry mechanism.The removal of
finalmodifiers from inner classes and the change ofmapReadLockto package-private access are necessary to support the refactored retry logic. The package-private access onmapReadLock(line 215) enablesLiveIndexWriterDeletesMap.computeIndexWriterIfAbsentForCriteriato calltryAcquire()directly at line 482.Also applies to: 215-215, 301-301, 406-406, 408-408
466-471: Proper parameter propagation for configurable retry limit.The addition of
maxRetryOnLookupMapAcquisitionExceptionparameter enables configurable retry behavior, and the value is correctly sourced fromengineConfig.getIndexSettings().Also applies to: 675-680
691-693: Simplified iteration improves readability.The refactoring from stream-based iteration to direct iteration over
criteriaBasedIndexWriterMap.values()is cleaner and avoids unnecessary intermediate operations.Also applies to: 702-704
731-750: Proper tragic exception checking across all writers.The iteration through both current and old writer maps to detect tragic exceptions is thorough and correctly checks if writers are closed before accessing their tragic exception state.
753-774: Correct synchronization and state checks.The
ramBytesUsed()method properly acquires write locks before iterating, and therollback()method correctly checks if writers are open before attempting rollback operations.Also applies to: 794-811
|
❌ Gradle check result for 4b247a0: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (1)
44-46: Retry wiring in tests looks correct; consider simplifying the mocking for readability.
- Updating all
computeIndexWriterIfAbsentForCriteriainvocations to passMAX_NUMBER_OF_RETRIESkeeps the tests consistent with the new API and the retry semantics; this looks correct.testMaxRetryCountWhenWriteLockDuringIndexingcorrectly verifies thattryAcquire()is invoked exactlyMAX_NUMBER_OF_RETRIEStimes when the lock is never obtained, and theLookupMapLockAcquisitionExceptionis thrown as expected.As a minor test ergonomics tweak, you could stub and verify directly on
writerLockinstead of going throughmap.current.mapReadLockin thewhen(...)andverify(...)calls. That would make the test a bit less coupled to the internal layout ofLiveIndexWriterDeletesMapandCriteriaBasedIndexWriterLookupwhile preserving the behavior being asserted.Also applies to: 72-77, 141-146, 197-202, 208-227
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)
210-215: Visibility and mutability changes for nested types are acceptable but could use an explicit “for testing” annotation.Making
CriteriaBasedIndexWriterLookupandCriteriaBasedWriterLockmore visible, and relaxingmapReadLockandLiveIndexWriterDeletesMap.currentfromfinal, is understandable to support the new tests that need to mock and override these internals.To keep the public surface area tidy and signal intent, consider adding an explicit
@opensearch.internal(or similar) Javadoc tag or comment on these nested types/fields indicating that they are exposed primarily for testing. That helps discourage external production code from depending on them and makes future refactors easier.Also applies to: 301-301, 406-412
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(1 hunks)server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java(10 hunks)server/src/main/java/org/opensearch/index/mapper/MapperService.java(2 hunks)server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java(0 hunks)server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java(4 hunks)server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java(1 hunks)
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (4)
- server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
- server/src/main/java/org/opensearch/index/IndexSettings.java
- CHANGELOG.md
- server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: detect-breaking-change
🔇 Additional comments (3)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)
87-87: LGTM!The import is necessary for the
Collectors.toSet()operation used in thegetCompositeFieldTypes()method below.
694-696: Verify initialization safety, caller expectations, and method naming alignment.The method now filters to return only
StarTreeMapper.StarTreeFieldTypeinstances fromcompositeMappedFieldTypes:
Potential NPE risk: Verify that
getCompositeFieldTypes()is never called beforeinternalMerge()initializescompositeMappedFieldTypesat line 552. If called during early initialization phases,.stream()could fail on a null reference.Semantic narrowing: Confirm whether the method name
getCompositeFieldTypes()still accurately reflects its behavior. If other composite field type implementations exist or may be added, consider renaming togetStarTreeFieldTypes()or updating documentation to clarify the filtering behavior.Performance: If
getCompositeFieldTypes()is called frequently in hot paths, consider caching the filtered result to avoid recreating the set on each invocation.server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)
691-705: Iteration refactors over current/old writer maps look correct and improve clarity.
getFlushingBytes()andgetPendingNumDocs()now iterate directly overliveIndexWriterDeletesMap.current.criteriaBasedIndexWriterMap.values(), summing per-child metrics before adding the accumulating writer’s values. This preserves behavior and is straightforward.getTragicException()now checks bothcurrentandoldchild writers for a tragic exception before falling back to the accumulating writer, which ensures group-specific failures are surfaced.ramBytesUsed()androllback()explicitly iterate over bothcurrentandoldwriters, and the use ofmapWriteLock.acquire()around theramBytesUsed()traversals is appropriate for a consistent snapshot.Overall, these loops are clear and consistent with the data structures being used; no issues from a correctness or concurrency standpoint.
Also applies to: 731-742, 757-772, 796-805
server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java
Show resolved
Hide resolved
|
❌ Gradle check result for dbcae67: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: RS146BIJAY <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)
694-696: Consider renaming method to reflect StarTree-specific filtering.The method name
getCompositeFieldTypes()suggests it returns all composite field types, but it now filters to return onlyStarTreeMapper.StarTreeFieldTypeinstances. This could mislead callers who expect all composite types.Consider renaming to
getStarTreeFieldTypes()or adding a clarifying comment:+ /** + * Returns only StarTree composite field types. + * For all composite field types, use getCompositeFieldTypesFromMapper(). + */ public Set<CompositeMappedFieldType> getCompositeFieldTypes() { return compositeMappedFieldTypes.stream() .filter(compositeMappedFieldType -> compositeMappedFieldType instanceof StarTreeMapper.StarTreeFieldType) .collect(Collectors.toSet()); }
694-696: Consider caching the filtered result to avoid repeated stream operations.The method creates a new stream, filters, and collects to a Set on every invocation. Since
compositeMappedFieldTypesonly changes during merge operations (line 552), the filtered result could be cached in a separate volatile field and updated alongsidecompositeMappedFieldTypes.Example optimization:
private volatile Set<CompositeMappedFieldType> compositeMappedFieldTypes; +private volatile Set<CompositeMappedFieldType> starTreeFieldTypes; // In internalMerge() after line 552: this.compositeMappedFieldTypes = getCompositeFieldTypesFromMapper(); +this.starTreeFieldTypes = compositeMappedFieldTypes.stream() + .filter(type -> type instanceof StarTreeMapper.StarTreeFieldType) + .collect(Collectors.toSet()); public Set<CompositeMappedFieldType> getCompositeFieldTypes() { - return compositeMappedFieldTypes.stream() - .filter(compositeMappedFieldType -> compositeMappedFieldType instanceof StarTreeMapper.StarTreeFieldType) - .collect(Collectors.toSet()); + return starTreeFieldTypes; }server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)
475-489: Consider adding exponential backoff between retry attempts.The retry loop repeatedly calls
tryAcquire()without delay, creating a busy-wait pattern. With a default max retry of 100 (configurable up to 500), this can waste significant CPU cycles under contention.A past review comment suggested adding exponential backoff, but the current code doesn't implement it. Consider adding a small delay (e.g.,
Thread.sleep(Math.min(1L << counter, 100L))) between attempts, with appropriateInterruptedExceptionhandling.int counter = 0; while ((current == null || current.isClosed()) && counter < maxRetryOnLookupMapAcquisitionException) { // This function acquires a first read lock on a map which does not have any write lock present. Current keeps // on getting rotated during refresh, so there will be one current on which read lock can be obtained. // Validate that no write lock is applied on the map and the map is not closed. Idea here is write lock was // never applied on this map as write lock gets only during closing time. We are doing this instead of acquire, // because acquire can also apply a read lock in case refresh completed and map is closed. current = this.current.mapReadLock.tryAcquire(); if (current != null && current.isClosed() == true) { current.mapReadLock.close(); current = null; } + if (current == null && counter < maxRetryOnLookupMapAcquisitionException - 1) { + try { + Thread.sleep(Math.min(1L << counter, 100L)); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new LookupMapLockAcquisitionException(shardId, "Interrupted during retry", e); + } + } ++counter; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(1 hunks)server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java(10 hunks)server/src/main/java/org/opensearch/index/mapper/MapperService.java(2 hunks)server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java(0 hunks)server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java(4 hunks)server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java(1 hunks)
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (2)
- CHANGELOG.md
- server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: detect-breaking-change
🔇 Additional comments (9)
server/src/main/java/org/opensearch/index/IndexSettings.java (1)
499-506: LGTM! Increased retry limits for lock acquisition.The changes to
INDEX_MAX_RETRY_ON_LOOKUP_MAP_LOCK_ACQUISITION_EXCEPTIONappropriately raise the default from 15 to 100 and the maximum from 100 to 500, providing better tolerance for lock contention scenarios during index writes with context-aware grouping criteria.server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)
87-87: LGTM!The
Collectorsimport is necessary for the stream operations added ingetCompositeFieldTypes().
694-696: Verify that filtering to StarTree types only doesn't break existing functionality.This change narrows the return value to only
StarTreeMapper.StarTreeFieldTypeinstances. Ensure that:
- Other composite field type implementations (if any) are correctly handled elsewhere
- The lookup sets (
fieldsPartOfCompositeMappings,nestedFieldsPartOfCompositeMappings) built from allcompositeMappedFieldTypesremain sufficient for other composite types- External callers of
getCompositeFieldTypes()andisCompositeIndexPresent()are not expecting all composite field typesTo verify: Search the codebase for other
CompositeMappedFieldTypeimplementations, all call sites ofgetCompositeFieldTypes()andisCompositeIndexPresent(), and confirm how the filtered results are used downstream.server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1)
730-730: LGTM: Retry logic correctly moved to lower layer.The removal of the exception-driven retry loop simplifies the bulk action layer. Retry handling for lookup map lock acquisition is now managed within
CompositeIndexWriter.computeIndexWriterIfAbsentForCriteria(with boundedtryAcquireloops and a configurable max retry parameter), which provides better encapsulation.server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (3)
497-502: LGTM: Finally block correctly handles lock cleanup.The finally block now uses
assertfor theisHeldByCurrentThread()check, which addresses the previous concern aboutUnsupportedOperationExceptionwhen assertions are disabled. In production, the lock is unconditionally closed on error; in tests/debug builds, ownership is validated.
692-693: LGTM: Improved encapsulation in iteration patterns.The refactoring to iterate directly over
DisposableIndexWritervalues (instead of extracting and operating onIndexWriterinstances) improves encapsulation and reduces coupling. The consistent pattern acrossgetFlushingBytes(),getPendingNumDocs(),getTragicException(),ramBytesUsed(), androllback()enhances maintainability.Also applies to: 703-704, 732-743, 759-772, 797-806
210-210: Verify that visibility relaxations don't expose internal state inappropriately.The removal of
finalmodifiers fromCriteriaBasedIndexWriterLookup,CriteriaBasedWriterLock, andLiveIndexWriterDeletesMap, along with wideningmapReadLockto package-private and makingcurrentnon-final, enables mocking in tests.While these changes support testing, ensure they don't inadvertently expose internal state or allow unintended subclassing or mutation by other classes in the package. Specifically, verify that:
mapReadLock(now package-private) is not accessed outside of test and core composite writer code- The inner classes cannot be extended by unintended classes in the same package
current(now non-final) is not reassigned outside controlled contextsserver/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (2)
72-77: LGTM: Test calls updated consistently with new signature.The calls to
computeIndexWriterIfAbsentForCriteriacorrectly pass the newMAX_NUMBER_OF_RETRIESparameter, maintaining existing test coverage while adapting to the updated signature.Also applies to: 141-146, 197-202
44-46: LGTM: New test provides good coverage of retry exhaustion.The new
testMaxRetryCountWhenWriteLockDuringIndexingtest effectively validates the bounded retry behavior by:
- Mocking lock acquisition to always fail (return
null)- Asserting that
LookupMapLockAcquisitionExceptionis thrown after exhausting retries- Verifying via Mockito that
tryAcquire()is called exactlyMAX_NUMBER_OF_RETRIEStimesThis complements the existing integration test at line 179 and provides focused unit-level coverage of the retry mechanism.
Also applies to: 208-227
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)
694-696: Consider renaming method to reflect StarTree-specific filtering.The method name
getCompositeFieldTypes()suggests it returns all composite field types, but it now filters to return onlyStarTreeMapper.StarTreeFieldTypeinstances. This could mislead callers who expect all composite types.Consider renaming to
getStarTreeFieldTypes()or adding a clarifying comment:+ /** + * Returns only StarTree composite field types. + * For all composite field types, use getCompositeFieldTypesFromMapper(). + */ public Set<CompositeMappedFieldType> getCompositeFieldTypes() { return compositeMappedFieldTypes.stream() .filter(compositeMappedFieldType -> compositeMappedFieldType instanceof StarTreeMapper.StarTreeFieldType) .collect(Collectors.toSet()); }
694-696: Consider caching the filtered result to avoid repeated stream operations.The method creates a new stream, filters, and collects to a Set on every invocation. Since
compositeMappedFieldTypesonly changes during merge operations (line 552), the filtered result could be cached in a separate volatile field and updated alongsidecompositeMappedFieldTypes.Example optimization:
private volatile Set<CompositeMappedFieldType> compositeMappedFieldTypes; +private volatile Set<CompositeMappedFieldType> starTreeFieldTypes; // In internalMerge() after line 552: this.compositeMappedFieldTypes = getCompositeFieldTypesFromMapper(); +this.starTreeFieldTypes = compositeMappedFieldTypes.stream() + .filter(type -> type instanceof StarTreeMapper.StarTreeFieldType) + .collect(Collectors.toSet()); public Set<CompositeMappedFieldType> getCompositeFieldTypes() { - return compositeMappedFieldTypes.stream() - .filter(compositeMappedFieldType -> compositeMappedFieldType instanceof StarTreeMapper.StarTreeFieldType) - .collect(Collectors.toSet()); + return starTreeFieldTypes; }server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (1)
475-489: Consider adding exponential backoff between retry attempts.The retry loop repeatedly calls
tryAcquire()without delay, creating a busy-wait pattern. With a default max retry of 100 (configurable up to 500), this can waste significant CPU cycles under contention.A past review comment suggested adding exponential backoff, but the current code doesn't implement it. Consider adding a small delay (e.g.,
Thread.sleep(Math.min(1L << counter, 100L))) between attempts, with appropriateInterruptedExceptionhandling.int counter = 0; while ((current == null || current.isClosed()) && counter < maxRetryOnLookupMapAcquisitionException) { // This function acquires a first read lock on a map which does not have any write lock present. Current keeps // on getting rotated during refresh, so there will be one current on which read lock can be obtained. // Validate that no write lock is applied on the map and the map is not closed. Idea here is write lock was // never applied on this map as write lock gets only during closing time. We are doing this instead of acquire, // because acquire can also apply a read lock in case refresh completed and map is closed. current = this.current.mapReadLock.tryAcquire(); if (current != null && current.isClosed() == true) { current.mapReadLock.close(); current = null; } + if (current == null && counter < maxRetryOnLookupMapAcquisitionException - 1) { + try { + Thread.sleep(Math.min(1L << counter, 100L)); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new LookupMapLockAcquisitionException(shardId, "Interrupted during retry", e); + } + } ++counter; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CHANGELOG.md(1 hunks)server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java(1 hunks)server/src/main/java/org/opensearch/index/IndexSettings.java(1 hunks)server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java(10 hunks)server/src/main/java/org/opensearch/index/mapper/MapperService.java(2 hunks)server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java(0 hunks)server/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java(4 hunks)server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java(1 hunks)
💤 Files with no reviewable changes (1)
- server/src/test/java/org/opensearch/action/bulk/TransportShardBulkActionTests.java
🚧 Files skipped from review as they are similar to previous changes (2)
- CHANGELOG.md
- server/src/test/java/org/opensearch/index/engine/CriteriaBasedCompositeIndexWriterBaseTests.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: Analyze (java)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: detect-breaking-change
🔇 Additional comments (9)
server/src/main/java/org/opensearch/index/IndexSettings.java (1)
499-506: LGTM! Increased retry limits for lock acquisition.The changes to
INDEX_MAX_RETRY_ON_LOOKUP_MAP_LOCK_ACQUISITION_EXCEPTIONappropriately raise the default from 15 to 100 and the maximum from 100 to 500, providing better tolerance for lock contention scenarios during index writes with context-aware grouping criteria.server/src/main/java/org/opensearch/index/mapper/MapperService.java (2)
87-87: LGTM!The
Collectorsimport is necessary for the stream operations added ingetCompositeFieldTypes().
694-696: Verify that filtering to StarTree types only doesn't break existing functionality.This change narrows the return value to only
StarTreeMapper.StarTreeFieldTypeinstances. Ensure that:
- Other composite field type implementations (if any) are correctly handled elsewhere
- The lookup sets (
fieldsPartOfCompositeMappings,nestedFieldsPartOfCompositeMappings) built from allcompositeMappedFieldTypesremain sufficient for other composite types- External callers of
getCompositeFieldTypes()andisCompositeIndexPresent()are not expecting all composite field typesTo verify: Search the codebase for other
CompositeMappedFieldTypeimplementations, all call sites ofgetCompositeFieldTypes()andisCompositeIndexPresent(), and confirm how the filtered results are used downstream.server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (1)
730-730: LGTM: Retry logic correctly moved to lower layer.The removal of the exception-driven retry loop simplifies the bulk action layer. Retry handling for lookup map lock acquisition is now managed within
CompositeIndexWriter.computeIndexWriterIfAbsentForCriteria(with boundedtryAcquireloops and a configurable max retry parameter), which provides better encapsulation.server/src/main/java/org/opensearch/index/engine/CompositeIndexWriter.java (3)
497-502: LGTM: Finally block correctly handles lock cleanup.The finally block now uses
assertfor theisHeldByCurrentThread()check, which addresses the previous concern aboutUnsupportedOperationExceptionwhen assertions are disabled. In production, the lock is unconditionally closed on error; in tests/debug builds, ownership is validated.
692-693: LGTM: Improved encapsulation in iteration patterns.The refactoring to iterate directly over
DisposableIndexWritervalues (instead of extracting and operating onIndexWriterinstances) improves encapsulation and reduces coupling. The consistent pattern acrossgetFlushingBytes(),getPendingNumDocs(),getTragicException(),ramBytesUsed(), androllback()enhances maintainability.Also applies to: 703-704, 732-743, 759-772, 797-806
210-210: Verify that visibility relaxations don't expose internal state inappropriately.The removal of
finalmodifiers fromCriteriaBasedIndexWriterLookup,CriteriaBasedWriterLock, andLiveIndexWriterDeletesMap, along with wideningmapReadLockto package-private and makingcurrentnon-final, enables mocking in tests.While these changes support testing, ensure they don't inadvertently expose internal state or allow unintended subclassing or mutation by other classes in the package. Specifically, verify that:
mapReadLock(now package-private) is not accessed outside of test and core composite writer code- The inner classes cannot be extended by unintended classes in the same package
current(now non-final) is not reassigned outside controlled contextsserver/src/test/java/org/opensearch/index/engine/CompositeIndexWriterForAppendTests.java (2)
72-77: LGTM: Test calls updated consistently with new signature.The calls to
computeIndexWriterIfAbsentForCriteriacorrectly pass the newMAX_NUMBER_OF_RETRIESparameter, maintaining existing test coverage while adapting to the updated signature.Also applies to: 141-146, 197-202
44-46: LGTM: New test provides good coverage of retry exhaustion.The new
testMaxRetryCountWhenWriteLockDuringIndexingtest effectively validates the bounded retry behavior by:
- Mocking lock acquisition to always fail (return
null)- Asserting that
LookupMapLockAcquisitionExceptionis thrown after exhausting retries- Verifying via Mockito that
tryAcquire()is called exactlyMAX_NUMBER_OF_RETRIEStimesThis complements the existing integration test at line 179 and provides focused unit-level coverage of the retry mechanism.
Also applies to: 208-227
|
❌ Gradle check result for 7ee65b1: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20145 +/- ##
============================================
- Coverage 73.30% 73.21% -0.09%
- Complexity 71732 71749 +17
============================================
Files 5793 5793
Lines 328056 328045 -11
Branches 47245 47243 -2
============================================
- Hits 240476 240193 -283
- Misses 68264 68567 +303
+ Partials 19316 19285 -31 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| if (current != null && current.isClosed() == true) { | ||
| current.mapReadLock.close(); | ||
| current = null; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this logic centrally in close(). Also didn't quite understand why the close operation done as a part of write lock acquisition doesn't handle this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This handles the scenario where try acquire succeded in obtaining the lock on the current writer but the map itself rotated and the writer got closed. In that case, we close the old writer as we retry to obtain lock again on the current. As this ensures the lock is correctly released on the old writer before we try acquiring lock on new writer.
| for (DisposableIndexWriter disposableIndexWriter : liveIndexWriterDeletesMap.old.criteriaBasedIndexWriterMap.values()) { | ||
| if (disposableIndexWriter.getIndexWriter().isOpen() == true) { | ||
| disposableIndexWriter.getIndexWriter().rollback(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not be thread-safe for instance the index writer might be closed while we are doing a rollback?
Description
Fixing indexing regression and bug fixes for grouping criteria. For testing grouping criteria changes, enabled the grouping criteria on local and tested with setting criteria. Wil raise the changes for integ test enablement for CAS in a separate PR as that require decent changes in integ test as well.
Related Issues
#19919
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
Bug Fixes
Tests
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.