Conversation

@gbbafna
Contributor

@gbbafna gbbafna commented Dec 2, 2025

Description

Currently we delete files one by one in the Remote Segment Store. This is highly inefficient: for every segments_N file, we end up making 5 to 10 delete calls instead of batching them together. A minimal sketch of the contrast follows.
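
A hedged sketch of the shape of the change (illustrative names only, not the actual OpenSearch code):

    import java.io.IOException;
    import java.util.List;

    // Illustrative contrast between the old per-file loop and the new batched call.
    class BatchDeleteSketch {
        interface RemoteData {
            void deleteFile(String name) throws IOException;          // old path: one remote call per file
            void deleteFiles(List<String> names) throws IOException;  // new path: one remote call per batch
        }

        static void deleteStale(RemoteData dir, List<String> filesToDelete) throws IOException {
            // Before: for (String file : filesToDelete) dir.deleteFile(file);  // N round-trips
            // After: a single batched request for the whole list.
            dir.deleteFiles(filesToDelete);
        }
    }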

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Added batch deletion functionality that deletes multiple files in a single operation, improving efficiency and reducing I/O overhead. Error handling consolidated for batch operations.
  • Tests

    • Added comprehensive test coverage for batch deletion scenarios, including error handling and edge cases.


@coderabbitai

coderabbitai bot commented Dec 2, 2025

Walkthrough

The pull request introduces batch deletion capabilities to the remote directory layer by adding a new deleteFiles(List<String>) method to RemoteDirectory and refactoring RemoteSegmentStoreDirectory to use batch deletion instead of per-file operations, consolidating error handling and cache updates accordingly.

Changes

Cohort / File(s) Summary
Batch deletion API
server/src/main/java/org/opensearch/index/store/RemoteDirectory.java
Adds a new public method deleteFiles(List<String> names) that delegates to blobContainer.deleteBlobsIgnoringIfNotExists(), ignoring missing files and treating null/empty input as a no-op (see the sketch after this table).
Batch deletion integration
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java
Refactors deletion logic from iterative per-file deletes to single batch operation via deleteFiles(). Consolidates error handling into one try-catch block, moves cache updates to execute after successful batch deletion, and maintains metadata file deletion only upon batch success.
Batch deletion API tests
server/src/test/java/org/opensearch/index/store/RemoteDirectoryTests.java
Adds five new test methods (testDeleteFiles, testDeleteFilesEmptyCollection, testDeleteFilesNullCollection, testDeleteFilesWithNonExistentFiles, testDeleteFilesException) validating deleteFiles behavior including edge cases and exception propagation.
Batch deletion integration tests
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java
Adds two new batch-specific tests and updates existing deletion tests to verify batch deleteFiles() calls with order-insensitive matching; introduces Set-based verification and adjusts logging expectations for batch operation context.
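
For orientation, a hedged reconstruction of the new method from the summary above (the PR's exact body may differ; deleteBlobsIgnoringIfNotExists is the existing BlobContainer batch-delete API, and blobContainer is RemoteDirectory's existing field):

    // Sketch of RemoteDirectory.deleteFiles, reconstructed from the walkthrough, not copied from the PR.
    public void deleteFiles(List<String> names) throws IOException {
        if (names == null || names.isEmpty()) {
            return; // null or empty input is a no-op, per the summary
        }
        // One batched request; blobs that no longer exist are ignored rather than failing the call.
        blobContainer.deleteBlobsIgnoringIfNotExists(names);
    }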

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Areas requiring extra attention:

  • Error handling consolidation in RemoteSegmentStoreDirectory.java: verify that cache consistency is maintained when batch deletion fails
  • Cache update ordering: confirm deletedSegmentFiles and segmentsUploadedToRemoteStore are updated correctly only after successful batch deletion
  • Test assertions in RemoteSegmentStoreDirectoryTests.java: review batch matching logic using argThat and Set equality to ensure all test scenarios are correctly validated
  • Metadata file deletion flow: ensure it remains conditional on batch success

Poem

🐰 A batch of files hops toward deletion's call,
No longer one-by-one, but all in thrall!
Error handling grouped, caches sync with care,
Efficiency blooms in the refactored air. 🌱

Pre-merge checks

❌ Failed checks (1 warning)
Docstring Coverage: ⚠️ Warning. Docstring coverage is 55.56%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
Title check: ✅ Passed. The title accurately and concisely describes the main change: introducing batch deletion operations in the remote segment store instead of per-file deletions.
Description check: ✅ Passed. The description explains the problem (per-file deletes are inefficient) and the solution (batch operations), with testing confirmed. However, the Related Issues section is incomplete: the issue number placeholder is not filled in.

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

❌ Gradle check result for a71020a: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1)

1023-1034: Consider using a plain boolean instead of AtomicBoolean.

The deletionSuccessful variable is only accessed within a single thread (no concurrent access), so AtomicBoolean is unnecessary overhead. A simple boolean variable would suffice.

-            AtomicBoolean deletionSuccessful = new AtomicBoolean(true);
+            boolean deletionSuccessful = true;
             try {
                 // Batch delete all stale segment files
                 remoteDataDirectory.deleteFiles(filesToDelete);
                 deletedSegmentFiles.addAll(filesToDelete);

                 // Update cache after successful batch deletion
                 for (String file : filesToDelete) {
                     if (!activeSegmentFilesMetadataMap.containsKey(getLocalSegmentFilename(file))) {
                         segmentsUploadedToRemoteStore.remove(getLocalSegmentFilename(file));
                     }
                 }
             } catch (IOException e) {
-                deletionSuccessful.set(false);
+                deletionSuccessful = false;
                 logger.warn(
                     ...
                 );
             }
-            if (deletionSuccessful.get()) {
+            if (deletionSuccessful) {
                 logger.debug("Deleting stale metadata file {} from remote segment store", metadataFile);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f76826c and a71020a.

📒 Files selected for processing (4)
  • server/src/main/java/org/opensearch/index/store/RemoteDirectory.java (1 hunks)
  • server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1 hunks)
  • server/src/test/java/org/opensearch/index/store/RemoteDirectoryTests.java (1 hunks)
  • server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (5 hunks)
🧰 Additional context used
🪛 GitHub Actions: Gradle Precommit
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java

[error] 1037-1037: Bad usage of logger: Wrong usage of org.apache.logging.log4j.Logger#warn detected by OpenSearchLoggerUsageChecker (likely incorrect arguments).

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: gradle-check
🔇 Additional comments (10)
server/src/main/java/org/opensearch/index/store/RemoteDirectory.java (1)

187-201: LGTM! Clean batch delete implementation.

The new deleteFiles method correctly handles edge cases (null/empty) and delegates to the underlying blob container's batch delete API, maintaining consistency with the existing deleteFile method behavior.

server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java (1)

1016-1021: LGTM! Batch file collection logic is correct.

The filtering logic correctly identifies files to delete by excluding:

  1. Files present in the active segment set
  2. Files already deleted in previous iterations

This ensures safe batch deletion without affecting active segments.
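
A sketch of the filtering described here (field and helper names are assumptions drawn from elsewhere in this review, not verified against the diff):

    // Collect only the stale files that are safe to delete in one batch:
    // skip files still referenced by the active metadata, and files already
    // deleted while processing an earlier stale metadata file.
    List<String> filesToDelete = staleSegmentRemoteFilenames.stream()
        .filter(file -> !activeSegmentFilesMetadataMap.containsKey(getLocalSegmentFilename(file)))
        .filter(file -> !deletedSegmentFiles.contains(file))
        .collect(Collectors.toList());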

server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java (7)

68-68: LGTM!

Import for argThat matcher added to support order-independent verification of batch delete calls.


960-982: LGTM! Test correctly validates batch deletion semantics.

The test properly uses Set comparison with argThat for order-independent verification of the batch delete call, which is appropriate since the order of files in the delete operation doesn't affect correctness.
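
As an illustration of that pattern, a generic Mockito sketch (not the test's literal code; expectedFilesToDelete is assumed to be a Set):

    // Assert the batch delete happened exactly once with the expected files,
    // ignoring the order in which they appear in the List argument.
    verify(remoteDataDirectory).deleteFiles(argThat(files -> new HashSet<>(files).equals(expectedFilesToDelete)));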


991-1011: LGTM!

Test correctly validates that batch deletion respects metadata file locks.


1065-1092: LGTM! Deduplication test correctly validates batch behavior.

The test verifies that:

  1. First batch deletes unique stale files from metadataFilename3
  2. Second batch (for metadataFilename4 with duplicate content) correctly passes an empty list since files were already deleted

1094-1124: LGTM!

Test correctly validates that when batch deletion fails with an IOException, the metadata file is preserved (not deleted), maintaining data consistency.


1126-1152: LGTM!

Test correctly validates that the batch delete operation (using deleteBlobsIgnoringIfNotExists) handles non-existent files gracefully, allowing the metadata file deletion to proceed.


1190-1231: LGTM! Error handling test is thorough.

The test correctly validates:

  1. Batch delete is attempted with correct files
  2. Metadata file is preserved when deletion fails
  3. Cache entries remain intact after failed deletion (correctly using local filenames for verification)
server/src/test/java/org/opensearch/index/store/RemoteDirectoryTests.java (1)

201-255: LGTM! Comprehensive test coverage for the new batch delete method.

The five new tests provide thorough coverage:

  • Normal operation with multiple files
  • Empty collection handling (no-op)
  • Null collection handling (no-op)
  • Mixed existent/non-existent files
  • Exception propagation

Tests follow the existing patterns in the file and use appropriate verification.
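
For example, the no-op cases could be asserted along these lines (a sketch; mock and field names assumed):

    public void testDeleteFilesEmptyCollection() throws IOException {
        remoteDirectory.deleteFiles(Collections.emptyList());
        // A no-op must not reach the blob container at all.
        verifyNoInteractions(blobContainer);
    }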

Comment on lines +1035 to +1042
} catch (IOException e) {
deletionSuccessful.set(false);
logger.warn(
"Exception while deleting segment files corresponding to metadata file {}. Deletion will be re-tried",
metadataFile,
e
);
}
@coderabbitai coderabbitai bot Dec 2, 2025

⚠️ Potential issue | 🔴 Critical

Fix incorrect logger usage causing pipeline failure.

The pipeline reports "Bad usage of logger" at line 1037. The logger.warn call has a format string with one placeholder {} but passes two additional arguments (metadataFile, e). For Log4j2 parameterized logging with an exception, the exception should be the last argument after all placeholder substitutions are complete.

Apply this diff to fix the logger usage:

             } catch (IOException e) {
                 deletionSuccessful.set(false);
                 logger.warn(
-                    "Exception while deleting segment files corresponding to metadata file {}. Deletion will be re-tried",
-                    metadataFile,
-                    e
+                    () -> new ParameterizedMessage(
+                        "Exception while deleting segment files corresponding to metadata file {}. Deletion will be re-tried",
+                        metadataFile
+                    ),
+                    e
                 );
             }
🧰 Tools
🪛 GitHub Actions: Gradle Precommit

[error] 1037-1037: Bad usage of logger: Wrong usage of org.apache.logging.log4j.Logger#warn detected by OpenSearchLoggerUsageChecker (likely incorrect arguments).

🤖 Prompt for AI Agents
In
server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java
around lines 1035-1042, the logger.warn call is using parameterized message and
passing the exception in the middle of the varargs which triggers "Bad usage of
logger"; ensure the Throwable is the final argument by calling
logger.warn("Exception while deleting segment files corresponding to metadata
file {}. Deletion will be re-tried", metadataFile, e) so the metadataFile fills
the placeholder and the exception is passed as the last parameter to be logged
with the stack trace.

@linuxpi
Contributor

looks like a valid catch

@coderabbitai coderabbitai bot

@linuxpi, thank you for confirming! The fix will ensure the exception is properly logged with its stack trace by Log4j2. 🐰

Comment on lines +1181 to +1188
// Get the cache after deletion
Map<String, RemoteSegmentStoreDirectory.UploadedSegmentMetadata> cacheAfter = remoteSegmentStoreDirectory
.getSegmentsUploadedToRemoteStore();
// Verify that unique files from deleted metadata are removed from cache
for (String file : expectedFilesToDelete) {
assertFalse("File " + file + " should not be in cache after deletion", cacheAfter.containsKey(file));
}
}
@coderabbitai coderabbitai bot Dec 2, 2025

⚠️ Potential issue | 🟡 Minor

Incorrect cache verification logic - test assertion will always pass.

The cache (segmentsUploadedToRemoteStore) uses local filenames as keys (e.g., "_0.cfe"), but expectedFilesToDelete contains remote filenames (e.g., "_0.cfe__uuid"). The assertion cacheAfter.containsKey(file) will always be false for remote filenames, making this test pass even if the cache wasn't properly updated.

Apply this diff to fix the assertion:

         // Verify that unique files from deleted metadata are removed from cache
         for (String file : expectedFilesToDelete) {
-            assertFalse("File " + file + " should not be in cache after deletion", cacheAfter.containsKey(file));
+            String localFilename = file.split(RemoteSegmentStoreDirectory.SEGMENT_NAME_UUID_SEPARATOR)[0];
+            assertFalse("File " + localFilename + " should not be in cache after deletion", cacheAfter.containsKey(localFilename));
         }
🤖 Prompt for AI Agents
In
server/src/test/java/org/opensearch/index/store/RemoteSegmentStoreDirectoryTests.java
around lines 1181-1188, the test incorrectly checks cache keys using remote
filenames (which include the "__uuid" suffix) while the cache stores local
filenames (without the suffix), so the assertion always passes; to fix, convert
each entry in expectedFilesToDelete to the local filename before asserting
(e.g., strip the "__" and trailing UUID part or parse until the first "__" and
use that substring), then assert that cacheAfter does not
containKey(localFilename) for each converted name, ensuring the cache
verification actually validates removal.

@linuxpi
Contributor

this is valid as well. cacheAfter would contain keys with file names without the uuid

@coderabbitai coderabbitai bot

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how segmentsUploadedToRemoteStore is populated and what keys it uses
echo "=== Searching for segmentsUploadedToRemoteStore usage ==="
rg -n -C5 'segmentsUploadedToRemoteStore\.put' --type=java

echo -e "\n=== Checking UploadedSegmentMetadata and cache key structure ==="
rg -n -C3 'class.*UploadedSegmentMetadata' --type=java

echo -e "\n=== Looking at how cache is accessed with containsKey ==="
rg -n -C3 'segmentsUploadedToRemoteStore.*containsKey' --type=java

Length of output: 6469


🏁 Script executed:

#!/bin/bash
# Look at the actual deletion implementation to understand cache update
echo "=== Searching for deleteStaleCommits or deleteStale in RemoteSegmentStoreDirectory ==="
ast-grep --pattern 'deleteStale$$$($$$) {
  $$$
}'

echo -e "\n=== Check how cache entries are removed after deletion ==="
rg -n -C5 'segmentsUploadedToRemoteStore\.remove' --type=java

Length of output: 2951


🏁 Script executed:

#!/bin/bash
# Look at the metadata format and how local vs remote filenames are used
echo "=== Check SEGMENT_NAME_UUID_SEPARATOR usage ==="
rg -n -C3 'SEGMENT_NAME_UUID_SEPARATOR' --type=java server/src/main/java/org/opensearch/index/store/

echo -e "\n=== Check how metadata maps local to remote names ==="
rg -n -C5 'getMetadata\(\)' --type=java server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java

Length of output: 5873


@linuxpi Thank you for the clarification! I understand your point, but I believe there's actually an issue with the test logic. Let me explain:

The cache does store local filenames as keys (you're correct about that), but expectedFilesToDelete contains remote filenames with UUIDs. This creates a mismatch:

Evidence from the implementation:

  1. Cache keys are local filenames:

    // Line 700: When adding to cache
    segmentsUploadedToRemoteStore.put(src, segmentMetadata);  // 'src' is local filename
    
    // Line 513: When removing from cache  
    segmentsUploadedToRemoteStore.remove(name);  // 'name' is local filename
  2. The actual deletion implementation (Line 1032) correctly converts remote to local:

    segmentsUploadedToRemoteStore.remove(getLocalSegmentFilename(file));

The test's issue:

The test extracts remote filenames:

Set<String> expectedFilesToDelete = metadataFilenameContentMapping.get(metadataFilename3)
    .values()
    .stream()
    .map(metadata -> metadata.split(RemoteSegmentStoreDirectory.UploadedSegmentMetadata.SEPARATOR)[1])
    .collect(Collectors.toSet());

This gets index [1] from the split, which is the remote filename (e.g., _0.cfe__uuid), not the local filename (e.g., _0.cfe).

Then it checks:

cacheAfter.containsKey(file)  // 'file' is "_0.cfe__uuid", but cache key is "_0.cfe"

This will always return false, making the test pass without actually validating cache cleanup.

The fix should mirror what the implementation does:

for (String file : expectedFilesToDelete) {
    String localFilename = file.split(RemoteSegmentStoreDirectory.SEGMENT_NAME_UUID_SEPARATOR)[0];
    assertFalse("File " + localFilename + " should not be in cache after deletion", cacheAfter.containsKey(localFilename));
}

Does this clarification help explain the issue?


🧬 Code Graph Analysis Results

server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java

<possibly_relevant_code_snippets>

Snippet 1: MetadataFilenameUtils (relevance: filename structure, nodeId, and parsing)

File: server/src/main/java/org/opensearch/index/store/RemoteSegmentStoreDirectory.java
Lines: 404-486

public static class MetadataFilenameUtils {
        public static final String SEPARATOR = "__";
        public static final String METADATA_PREFIX = "metadata";

        static String getMetadataFilePrefixForCommit(long primaryTerm, long generation) {
            return String.join(
                SEPARATOR,
                METADATA_PREFIX,
                RemoteStoreUtils.invertLong(primaryTerm),
                RemoteStoreUtils.invertLong(generation)
            );
        }

        // Visible for testing
        public static String getMetadataFilename(
            long primaryTerm,
            long generation,
            long translogGeneration,
            long uploadCounter,
            int metadataVersion,
            String nodeId,
            long creationTimestamp
        ) {
            return String.join(
                SEPARATOR,
                METADATA_PREFIX,
                RemoteStoreUtils.invertLong(primaryTerm),
                RemoteStoreUtils.invertLong(generation),
                RemoteStoreUtils.invertLong(translogGeneration),
                RemoteStoreUtils.invertLong(uploadCounter),
                String.valueOf(Objects.hash(nodeId)),
                RemoteStoreUtils.invertLong(creationTimestamp),
                String.valueOf(metadataVersion)
            );
        }

        public static String getMetadataFilename(
            long primaryTerm,
            long generation,
            long translogGeneration,
            long uploadCounter,
            int metadataVersion,
            String nodeId
        ) {
            return getMetadataFilename(
                primaryTerm,
                generation,
                translogGeneration,
                uploadCounter,
                metadataVersion,
                nodeId,
                System.currentTimeMillis()
            );
        }

        // Visible for testing
        static long getPrimaryTerm(String[] filenameTokens) {
            return RemoteStoreUtils.invertLong(filenameTokens[1]);
        }

        // Visible for testing
        static long getGeneration(String[] filenameTokens) {
            return RemoteStoreUtils.invertLong(filenameTokens[2]);
        }

        public static long getTimestamp(String filename) {
            String[] filenameTokens = filename.split(SEPARATOR);
            return RemoteStoreUtils.invertLong(filenameTokens[filenameTokens.length - 2]);
        }

        public static Tuple<String, String> getNodeIdByPrimaryTermAndGen(String filename) {
            String[] tokens = filename.split(SEPARATOR);
            if (tokens.length < 8) {
                // For versions < 2.11, we don't have node id.
                return null;
            }
            String primaryTermAndGen = String.join(SEPARATOR, tokens[1], tokens[2], tokens[3]);

            String nodeId = tokens[5];
            return new Tuple<>(primaryTermAndGen, nodeId);
        }

    }

</possibly_relevant_code_snippets>

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

❌ Gradle check result for a71020a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Contributor

github-actions bot commented Dec 2, 2025

❌ Gradle check result for a71020a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ajaymovva
Contributor

A few nitpicks from the above comments. Rest LGTM.

@github-actions
Contributor

github-actions bot commented Dec 3, 2025

❌ Gradle check result for a71020a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
