-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Handle custom metadata files in subdirectory-store #20157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: xuxiong1 <[email protected]>
WalkthroughLoads and validates metadata for both segment files and custom non-segment files (prefixed Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java (2)
140-154: Consider more precise segment file identification.The check
fileName.contains(IndexFileNames.SEGMENTS)could potentially match non-segment files that happen to contain "segments" in their name. Consider using a more precise check:- if (fileName.contains(IndexFileNames.SEGMENTS)) { + String localFileName = filePath.getFileName().toString(); + if (localFileName.startsWith(IndexFileNames.SEGMENTS)) {This ensures only actual segment files (prefixed with "segments_") are matched, rather than any file containing "segments" anywhere in the path.
247-247: Consider documenting the version selection rationale.Using
minimumIndexCompatibilityVersion().luceneVersionis a reasonable fallback for non-segment files where the actual writer version is unknown. Consider adding a brief comment explaining this choice for future maintainability.+ // Use minimum compatible version as fallback since non-segment files don't store writer version Version version = org.opensearch.Version.CURRENT.minimumIndexCompatibilityVersion().luceneVersion;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java(7 hunks)modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: detect-breaking-change
- GitHub Check: Analyze (java)
🔇 Additional comments (9)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (6)
11-22: LGTM!The new imports for
CodecUtil,IOContext, andIndexOutputare necessary for the non-segment file creation functionality.
116-133: LGTM!The verification logic correctly aggregates both segment files (from
SegmentInfos) and non-segment files (by prefix matching), ensuring comprehensive recovery validation.
146-157: LGTM!The additional verification ensures non-segment files are present in the subdirectory after recovery, which is essential for validating the new functionality.
205-211: LGTM!The constant and counter are well-defined. The counter is an instance field, which is appropriate for this test engine context where each shard has its own engine instance.
259-273: Well-structured non-segment file creation.The method properly creates a Lucene-compatible file with
CodecUtil.writeHeaderandCodecUtil.writeFooter, ensuring the checksum can be validated during recovery. The use of try-with-resources ensures proper resource cleanup.
343-351: LGTM!The modification to include non-segment files in
getFileNames()is consistent with the verification logic and ensures these files are properly included in the recovery process.modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java (3)
13-23: LGTM!The new imports for
CodecUtilandVersionare necessary for the checksum computation and version handling in the newcomputeFileMetadatamethod.
127-128: LGTM!The updated comment and method call correctly reflect the expanded scope to include non-segment files.
158-167: LGTM!The processing logic correctly handles both file types:
- Segment files are processed via
loadMetadataFromSubdirectorySegmentsFilewhich aggregates all referenced segment files- Non-segment files are only processed if not already present in the builder, avoiding duplicate work
...directory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java
Show resolved
Hide resolved
|
@coderabbitai generate docstrings |
✅ Actions performedInitiated docstring generation; will generate only if new commits exist. |
|
Note Docstrings generation - SUCCESS |
Docstrings generation was requested by @xuxiong1. * #20157 (comment) The following files were modified: * `modules/store-subdirectory/src/main/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareStore.java`
|
❌ Gradle check result for 65c8087: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: xuxiong1 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (1)
342-350: LGTM!The addition correctly includes non-segment files in the
IndexCommit.getFileNames()result, ensuring these files are included during recovery. The logic appropriately mirrors the verification method for consistency.Optional: The non-segment file detection logic (filtering by prefix) is duplicated here and in
verifySubdirectoryFilesOnAllNodes. Consider extracting a helper method likeisNonSegmentFile(String fileName)to centralize this logic, though for test code the current approach is acceptable.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.md(1 hunks)modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java(7 hunks)
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: detect-breaking-change
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: Analyze (java)
🔇 Additional comments (5)
modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (5)
11-22: LGTM!The new imports are necessary for creating Lucene-compatible non-segment files with proper codec headers and footers.
116-133: LGTM!The verification logic correctly aggregates both segment files (via
SegmentInfos) and non-segment files (via prefix filtering) to ensure complete file coverage during recovery verification.
146-156: LGTM!Good addition of the assertion to verify non-segment files are actually present. This ensures the test validates the intended functionality rather than silently passing when no non-segment files exist.
204-210: LGTM!The constant for non-segment file prefix and the instance-level counter for generating unique file names are appropriately defined.
251-272: LGTM!The
createNonSegmentFilemethod correctly creates Lucene-compatible files with proper codec headers and footers. This ensures the non-segment files have valid checksums that can be verified during recovery, which aligns with the PR's goal of testing custom metadata file handling.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20157 +/- ##
============================================
- Coverage 73.25% 73.19% -0.07%
- Complexity 71684 71706 +22
============================================
Files 5788 5793 +5
Lines 327866 328073 +207
Branches 47218 47248 +30
============================================
- Hits 240194 240134 -60
- Misses 68399 68599 +200
- Partials 19273 19340 +67 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: xuxiong1 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
modules/store-subdirectory/src/test/java/org/opensearch/plugin/store/subdirectory/SubdirectoryStorePluginTests.java (5)
150-187: TestStoreSetup/createTestStoreSetup look sound, but consider reusing in existing testsThe holder and factory encapsulate shard/store setup cleanly and mirror the existing
testStatssetup; construction ofSubdirectoryAwareStoreand paths looks consistent. You might optionally refactortestStatsto usecreateTestStoreSetup()as well to avoid duplication and keep test configuration changes in one place.
241-292: Mixed files test logic is good; re-use subdirectory path helper to avoid driftThe mixed segment/non‑segment scenario and duplicate‑detection assertions are solid and should catch regressions in how
getMetadatamerges file lists.Same concern as above about
setup.path.resolve("subdir"): it would be safer to compute this path via a shared helper (or a constant matching the production subdirectory name and relative root) so both tests and implementation refer to the same physical location.
294-349: Checksum validation test is strong; consider asserting presence of both files explicitlyThis test nicely validates that the checksum produced by the store metadata matches a direct
CodecUtil.checksumEntireFilecomputation and that different content yields different checksums.Two small suggestions:
- Add explicit
assertTrue(metadata.contains("subdir/checksum_test.dat"))/contains("subdir/checksum_test2.dat")before reading frommetadata2to get clearer failure messages if the files are accidentally filtered out.- As with the other tests, consider factoring out subdirectory path computation to ensure it stays aligned with the production subdirectory location.
351-361: writeTestFileWithCodec helper is correct; optional deterministic timestampThe helper correctly writes a Lucene‑style header/footer and some payload, which is appropriate for files that will be checksummed via Lucene APIs.
If you ever need bit‑for‑bit deterministic test data (e.g., across JVMs or runs), you could replace
System.currentTimeMillis()with a fixed value. For the current use (only comparing different files), the current approach is fine.
363-460: TestIndexCommit implementation is generally correct; consider deduplication and small robustness tweaksThe custom
TestIndexCommitcorrectly:
- Delegates segment metadata to a captured
SegmentInfos.- Aggregates main‑directory segment files plus any subdirectory segment files (when a commit exists).
- Adds non‑segment “custom metadata” files while filtering out Lucene internals and test infrastructure via
isCustomMetadataFile.A few optional improvements:
Avoid shadowing
directory: insidegetFileNames, the localDirectory directory = FSDirectory.open(subdirPath)shadows the fielddirectory. Renaming the local (e.g.,subdirDirectory) would improve readability.Return a Set to enforce uniqueness: using a
Set<String>forallFileswould guarantee uniqueness at construction time rather than relying on downstreamdistinct()checks.Code reuse with existing TestIndexCommit: there’s a very similar
TestIndexCommitinSubdirectoryAwareRecoveryTests. If feasible, consider sharing a common helper (e.g., in a test utility) to avoid divergence in how file lists are composed and filtered.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
modules/store-subdirectory/src/test/java/org/opensearch/plugin/store/subdirectory/SubdirectoryStorePluginTests.java(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
modules/store-subdirectory/src/test/java/org/opensearch/plugin/store/subdirectory/SubdirectoryStorePluginTests.java (2)
libs/common/src/main/java/org/opensearch/common/util/io/IOUtils.java (1)
IOUtils(58-317)modules/store-subdirectory/src/internalClusterTest/java/org/opensearch/plugin/store/subdirectory/SubdirectoryAwareRecoveryTests.java (1)
TestIndexCommit(314-385)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: gradle-check
- GitHub Check: Analyze (java)
- GitHub Check: precommit (25, ubuntu-24.04-arm)
- GitHub Check: precommit (25, macos-15-intel)
- GitHub Check: precommit (21, windows-2025, true)
- GitHub Check: precommit (25, windows-latest)
- GitHub Check: precommit (21, macos-15-intel)
- GitHub Check: precommit (25, macos-15)
- GitHub Check: precommit (25, ubuntu-latest)
- GitHub Check: precommit (21, ubuntu-24.04-arm)
- GitHub Check: precommit (21, ubuntu-latest)
- GitHub Check: precommit (21, windows-latest)
- GitHub Check: precommit (21, macos-15)
- GitHub Check: detect-breaking-change
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (21, ubuntu-latest)
...ory/src/test/java/org/opensearch/plugin/store/subdirectory/SubdirectoryStorePluginTests.java
Show resolved
Hide resolved
yupeng9
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good
| try (IndexInput in = dir.openInput(localFileName, IOContext.READONCE)) { | ||
| long length = in.length(); | ||
| String checksum = Store.digestToString(CodecUtil.checksumEntireFile(in)); | ||
| Version version = org.opensearch.Version.CURRENT.minimumIndexCompatibilityVersion().luceneVersion; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for non-segment info file, we also need lucene version as placeholder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if you check the constructor of StoreFileMetadata.java, the version is mandatory.
this.writtenBy = Objects.requireNonNull(writtenBy, "writtenBy must not be null");
Description
Added support for handling non-segment files (custom metadata) in subdirectory-aware store recovery.
loadSubdirectoryMetadataFromSegments()toloadSubdirectoryMetadata()computeFileMetadata()to calculate metadata (checksum, length) for non-segment filesCheck List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.