[NSFS | NC | GLACIER] Add support for tape reclaim#9241

Open
tangledbytes wants to merge 1 commit intonoobaa:masterfrom
tangledbytes:utkarsh/feat/add-tape-reclaim

Conversation

Member

@tangledbytes tangledbytes commented Oct 13, 2025

Describe the Problem

Tape keeps accumulating deleted data, so there needs to be a way to reclaim the space. IBM Diamondback has this feature, but it isn't exposed in NooBaa. This PR adds support for it.

Explain the Changes

This PR adds support for tape reclaim by adding a new command under the noobaa-cli glacier commands. Like restore and migrate, the tape reclaim feature relies on journaling, but it is disabled by default and must be enabled by setting NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM to true.

Entries are added to the reclaim log/journal under 3 circumstances:

  1. When the object is explicitly deleted
  2. When the object is overwritten
  3. When the object is automatically deleted due to lifecycle configuration
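
A rough sketch of what one reclaim log entry could look like in each of these cases. The field names (full_path, logical_size, ea) follow the append_to_reclaim_wal code quoted later in this conversation; the helper name build_reclaim_entry and the sample xattr key and value are hypothetical and used only for illustration.

```javascript
// Sketch only: build_reclaim_entry is a hypothetical helper, not part of NooBaa.
// It mirrors the JSON shape that append_to_reclaim_wal writes in this PR.
function build_reclaim_entry(file_path, stat) {
    return JSON.stringify({
        full_path: file_path,
        logical_size: stat.size,
        ea: stat.xattr,
    });
}

// Example stat object; the xattr key and value are made up for illustration.
const sample_stat = { size: 1024, xattr: { 'dmapi.IBMUID': 'example-uid' } };
const entry = build_reclaim_entry('/bucket/key', sample_stat);
console.log(entry);
```

Each entry is a single JSON string, so the reclaim run can parse entries independently of one another.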

Testing Instructions:

./node_modules/.bin/mocha src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • Adds automated tape reclaim for Glacier with a configurable reclaim interval and a new CLI reclaim action.
    • Records reclaimable objects during uploads, overwrites, and deletions to a reclaim log for later processing.
  • Improvements

    • Adds DMAPI tape UID support and cluster locking/timestamping to coordinate reclaim runs and reduce conflicts.
  • Tests

    • Adds unit tests verifying reclaim logging and reclaim behavior.
  • Style

    • Minor documentation/JSDoc formatting tweaks.


coderabbitai bot commented Oct 13, 2025

Walkthrough

Adds end-to-end Glacier tape reclaim: new config flags, CLI reclaim action, manage_nsfs reclaim handler, Glacier reclaim flow and cluster lock, TapeCloud reclaim execution, NamespaceFS reclaim WAL logging, native DMAPI xattr exposure for tape UID, and tests exercising reclaim logging.

Changes

Cohort / File(s): Summary

  • Configuration flags (config.js): Adds NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM (default false) and NSFS_GLACIER_RECLAIM_INTERVAL (default 15 * 60 * 1000).
  • CLI / Manage constants (src/cmd/manage_nsfs.js, src/manage_nsfs/manage_nsfs_constants.js): Adds GLACIER_ACTIONS.RECLAIM, maps CLI option, and routes RECLAIM to the new process handler.
  • Manage reclaim process (src/manage_nsfs/manage_nsfs_glacier.js): Adds process_reclaim() mirroring expiry flow: fs_context, free-space and interval gating, calls Glacier reclaim, writes reclaim timestamp.
  • Glacier SDK (src/sdk/glacier.js): Adds constants (RECLAIM_TIMESTAMP_FILE, GPFS_DMAPI_XATTR_TAPE_UID, RECLAIM_WAL_NAME, GLACIER_RECLAIM_CLUSTER_LOCK), new reclaim(fs_context, log_file, failure_recorder) method, and perform(..., "RECLAIM") branch that acquires cluster lock and processes the reclaim WAL.
  • TapeCloud integration (src/sdk/glacier_tapecloud.js): Adds TapeCloudUtils.RECLAIM_SCRIPT and reclaim(file); adds TapeCloudGlacier.reclaim(...) and helper _reclaim(file) to run reclaim script and log errors (non-throwing).
  • NamespaceFS reclaim WAL (src/sdk/namespace_fs.js): Adds append_to_reclaim_wal(fs_context, file_path, stat), static getter reclaim_wal, and NamespaceFS._reclaim_wal field; logs reclaim entries during upload finalize and delete/unlink flows when flags enabled.
  • Native FS DMAPI xattr (src/native/fs/fs_napi.cpp): Adds macros for IBMUID/GPFS DMAPI tape UID and includes GPFS_DMAPI_XATTR_TAPE_UID in exported DMAPI xattr list.
  • Tests: nsfs glacier backend (src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js): Test instrumentation enabling tape reclaim, initializes reclaim WAL, patches backend _reclaim, and adds tests verifying reclaim entries on delete/overwrite/lifecycle flows.
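
The interval gating mentioned for process_reclaim() could be sketched roughly as follows. This is a minimal illustration, assuming the last run time is available as a millisecond timestamp; should_run_reclaim is a hypothetical name, treating a missing timestamp as "run now" is an assumption, and the free-space check is not modeled here.

```javascript
// Default taken from the config.js summary above: 15 minutes in milliseconds.
const NSFS_GLACIER_RECLAIM_INTERVAL = 15 * 60 * 1000;

// Hypothetical helper: returns true when enough time has passed since the last run.
// last_run_ms is undefined when no timestamp has been recorded yet (assumed first run).
function should_run_reclaim(last_run_ms, now_ms, interval_ms = NSFS_GLACIER_RECLAIM_INTERVAL) {
    if (last_run_ms === undefined) return true;
    return now_ms - last_run_ms >= interval_ms;
}
```

A caller would pass Date.now() as now_ms and write a fresh timestamp only after a run actually happened.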

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client as App/Client
  participant NFS as NamespaceFS
  participant FS as Filesystem (stat/xattrs)
  participant WAL as Reclaim WAL

  Client->>NFS: finalize upload / delete
  alt DMAPI reclaim enabled
    NFS->>FS: stat & read xattrs (incl. IBMUID)
    NFS->>WAL: append(full_path, logical_size, xattrs)
    WAL-->>NFS: ack
  end
  NFS-->>Client: operation complete
sequenceDiagram
  autonumber
  actor Op as Operator/Timer
  participant CLI as manage_nsfs
  participant MNG as manage_nsfs_glacier
  participant G as Glacier.perform("RECLAIM")
  participant Lock as ClusterLock
  participant TC as TapeCloudGlacier
  participant Utils as TapeCloudUtils

  Op->>CLI: run --glacier reclaim
  CLI->>MNG: manage_glacier_operations(RECLAIM)
  MNG->>G: perform(fs_context, "RECLAIM")
  G->>Lock: acquire GLACIER_RECLAIM_CLUSTER_LOCK
  alt interval & free-space checks pass
    G->>TC: reclaim(fs_context, log_file, failure_recorder)
    TC->>Utils: reclaim(WAL entry file)
    Utils-->>TC: returns (logs errors)
    TC-->>G: processed entries
    G->>G: update reclaim timestamp file
  else skipped
    G-->>MNG: noop
  end
  G->>Lock: release
  MNG-->>CLI: done

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • alphaprinz
  • jackyalbo
  • naveenpaul1
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Check name: Docstring Coverage
  • Status: ⚠️ Warning
  • Explanation: Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.
  • Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically describes the main feature being added: tape reclaim support for NSFS Glacier functionality.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
src/sdk/namespace_fs.js (1)

2128-2140: Pass the stat to avoid duplicate filesystem calls.

In both deletion paths, the code calls append_to_reclaim_wal without passing the stat parameter, which causes the method to fetch the stat again from the filesystem. However, the stat is already available in the lifecycle deletion path from the verification step.

Optimize by passing the stat when available:

At line 2128, in the lifecycle deletion path:

 try {
     files = await this._open_files(fs_context, { src_path: file_path, delete_version: true });
+    const stat = await files.delete_version.src_file.stat(fs_context);
     await this._verify_lifecycle_filter_and_unlink(fs_context, params, file_path, files.delete_version);
-    await this.append_to_reclaim_wal(fs_context, file_path);
+    await this.append_to_reclaim_wal(fs_context, file_path, stat);
 } catch (err) {

Note: The non-lifecycle path at lines 2135-2140 doesn't have easy access to the stat, so it's acceptable to let the method fetch it.
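
To illustrate why passing the stat matters, here is a small sketch with a fake filesystem that counts stat calls. Both make_fake_fs and append_entry are hypothetical stand-ins (for nb_native().fs and append_to_reclaim_wal respectively) and are simplified for the example.

```javascript
// Sketch only: a fake filesystem that counts stat calls, standing in for nb_native().fs.
function make_fake_fs() {
    const fake = {
        stat_calls: 0,
        async stat(file_path) {
            fake.stat_calls += 1;
            return { size: 42, xattr: {} };
        },
    };
    return fake;
}

// Hypothetical simplified version of append_to_reclaim_wal: it only stats
// when the caller did not already provide a stat.
async function append_entry(fs, wal, file_path, stat) {
    if (!stat) stat = await fs.stat(file_path);
    wal.push(JSON.stringify({ full_path: file_path, logical_size: stat.size, ea: stat.xattr }));
}
```

When the caller already holds a stat and passes it in, append_entry performs no additional stat call; when it is omitted, exactly one extra call is made.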

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69d598a and fa98cef.

📒 Files selected for processing (7)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
  • GLACIER_ACTIONS (25-30)
src/manage_nsfs/manage_nsfs_glacier.js (1)
src/util/native_fs_utils.js (2)
  • fs_context (549-554)
  • config (10-10)
src/sdk/glacier.js (3)
src/cmd/manage_nsfs.js (3)
  • fs_context (73-73)
  • type (57-57)
  • native_fs_utils (23-23)
src/manage_nsfs/manage_nsfs_glacier.js (5)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
  • native_fs_utils (9-9)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
src/sdk/glacier_tapecloud.js (3)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
src/manage_nsfs/manage_nsfs_glacier.js (4)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
  • fs_context (191-191)
  • fs_context (303-303)
  • fs_context (439-439)
src/sdk/namespace_fs.js (2)
src/manage_nsfs/manage_nsfs_glacier.js (7)
  • config (6-6)
  • nb_native (7-7)
  • nb_native (85-85)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
src/native/fs/fs_napi.cpp (2)
  • stat (2029-2033)
  • stat (2030-2030)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (6)
src/manage_nsfs/manage_nsfs_constants.js (1)

29-29: LGTM!

The RECLAIM action is correctly added to GLACIER_ACTIONS and properly configured in VALID_OPTIONS_GLACIER following the same pattern as existing glacier actions.

Also applies to: 76-76

src/cmd/manage_nsfs.js (1)

882-884: LGTM!

The RECLAIM action handler is correctly integrated following the same pattern as other glacier operations (MIGRATE, RESTORE, EXPIRY).

config.js (1)

931-931: LGTM!

The new configuration flag is appropriately placed in the GLACIER configuration section with a conservative default of false, requiring explicit opt-in for tape reclaim functionality.

src/sdk/glacier.js (1)

24-24: LGTM!

The reclaim infrastructure is correctly integrated:

  • New constants follow naming conventions (RECLAIM_TIMESTAMP_FILE, RECLAIM_WAL_NAME, GLACIER_RECLAIM_CLUSTER_LOCK)
  • Base reclaim() method properly throws 'Unimplemented' for subclass implementation
  • perform() correctly handles "RECLAIM" type with single-phase processing (no staging)

Note: The reclaim flow intentionally differs from MIGRATE/RESTORE by using direct log processing without a staging phase, which is appropriate for cleanup operations.

Also applies to: 89-89, 101-101, 197-209, 229-229, 296-299

src/sdk/namespace_fs.js (2)

3771-3780: LGTM - reclaim_wal getter follows established patterns.

The static getter for reclaim_wal is correctly implemented and follows the same pattern as migrate_wal and restore_wal.


3854-3855: LGTM - static field declaration is correct.

The static field declaration for _reclaim_wal is consistent with other WAL declarations in the class.

@tangledbytes tangledbytes force-pushed the utkarsh/feat/add-tape-reclaim branch from fa98cef to 1217008 Compare October 13, 2025 05:25

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/sdk/namespace_fs.js (2)

1408-1409: Logic inconsistency: the destination is stat'ed before the file is moved there, and the wrong path is then logged.

This code has a critical flaw flagged in the previous review that remains unaddressed:

  1. You stat file_path (the destination) before the uploaded file has been moved there from upload_path
  2. Errors are silently suppressed with .catch(_.noop), potentially hiding real issues
  3. At line 1460, you log upload_path to the reclaim WAL, not file_path, which is inconsistent with the stat you just performed

If the intent is to detect tape reclaim on the existing destination file (before overwrite), then:

  • file_path might not exist yet (ENOENT for new uploads)
  • You should log file_path (the file being replaced), not upload_path
  • ENOENT should be handled explicitly

Apply this diff to fix the logic:

-const file_path_stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM &&
-    await nb_native().fs.stat(fs_context, file_path).catch(_.noop);
+let file_path_stat;
+if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM && !same_inode && !part_upload) {
+    try {
+        file_path_stat = await nb_native().fs.stat(fs_context, file_path);
+    } catch (err) {
+        if (err.code !== 'ENOENT') {
+            dbg.error('_finish_upload: failed to stat destination for reclaim check', file_path, err);
+        }
+        // Destination doesn't exist yet (new file), no reclaim needed
+    }
+}

And update line 1460 to use file_path:

 if (file_path_stat && file_path_stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) {
-    await this.append_to_reclaim_wal(fs_context, upload_path, file_path_stat);
+    await this.append_to_reclaim_wal(fs_context, file_path, file_path_stat);
 }
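
The explicit ENOENT handling suggested above can be factored into a small helper, sketched here. stat_if_exists is a hypothetical name, and stat_fn is injected so the example does not depend on nb_native(); in the PR it would presumably wrap nb_native().fs.stat.

```javascript
// Hypothetical helper: stat a path, treating "does not exist" as undefined
// and reporting any other error through on_error without throwing.
async function stat_if_exists(stat_fn, file_path, on_error = console.warn) {
    try {
        return await stat_fn(file_path);
    } catch (err) {
        if (err.code !== 'ENOENT') on_error('stat failed', file_path, err);
        return undefined;
    }
}
```

Unlike .catch(_.noop), this keeps unexpected errors such as EPERM or EIO visible in the logs while still treating a missing destination as a normal new upload.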

3725-3745: Add error handling to prevent reclaim logging failures from breaking main operations.

The method lacks error handling for the stat operation and WAL append, which could cause unhandled promise rejections. While debug console.log statements mentioned in the previous review appear to have been removed, the error handling issue remains critical.

Apply this diff:

 async append_to_reclaim_wal(fs_context, file_path, stat) {
     if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;

-    if (!stat) {
-        stat = await nb_native().fs.stat(fs_context, file_path);
+    try {
+        if (!stat) {
+            stat = await nb_native().fs.stat(fs_context, file_path);
+        }
+
+        const data = JSON.stringify({
+            full_path: file_path,
+            logical_size: stat.size,
+            ea: stat.xattr,
+        });
+        await NamespaceFS.reclaim_wal.append(data);
+    } catch (err) {
+        // Log error but don't fail the operation
+        dbg.error('append_to_reclaim_wal: failed to log reclaim entry', file_path, err);
     }
-
-    const data = JSON.stringify({
-        full_path: file_path,
-        logical_size: stat.size,
-        ea: stat.xattr,
-    });
-    await NamespaceFS.reclaim_wal.append(data);
 }

This ensures that reclaim logging failures don't break the main operation flow (upload, delete, etc.).
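
The same idea can be expressed as a generic wrapper, sketched below. best_effort is a hypothetical helper, not an existing NooBaa utility; it only shows the "log but never reject" behavior the comment asks for.

```javascript
// Hypothetical wrapper: run a best-effort side task (such as reclaim logging)
// so that its failure is reported but never rejects the caller's promise.
async function best_effort(task, label, log_error = console.error) {
    try {
        await task();
        return true;
    } catch (err) {
        log_error(`${label}: failed`, err);
        return false;
    }
}
```

The boolean result lets the caller notice a failed log append without having to wrap the call in its own try/catch.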

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa98cef and 1217008.

📒 Files selected for processing (9)
  • .editorconfig (0 hunks)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/native/fs/fs_napi.cpp (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (6 hunks)
💤 Files with no reviewable changes (1)
  • .editorconfig
🚧 Files skipped from review as they are similar to previous changes (2)
  • config.js
  • src/manage_nsfs/manage_nsfs_constants.js
🧰 Additional context used
🧬 Code graph analysis (2)
src/manage_nsfs/manage_nsfs_glacier.js (1)
src/util/native_fs_utils.js (2)
  • fs_context (549-554)
  • config (10-10)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
  • GLACIER_ACTIONS (25-30)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (3)
src/sdk/namespace_fs.js (3)

2128-2128: LGTM: Reclaim logging after lifecycle deletion.

The call to append_to_reclaim_wal here correctly logs files deleted via lifecycle policies. The error handling appropriately catches ENOENT in case the file was already deleted.


2135-2140: LGTM: Reclaim logging after regular deletion.

The call to append_to_reclaim_wal here correctly logs files after successful deletion in non-lifecycle flows. The try-catch properly handles expected errors (ENOENT, EISDIR) while allowing unexpected errors to propagate.


3769-3778: LGTM: Proper lazy initialization for reclaim WAL logger.

The static getter follows the established pattern used by migrate_wal and restore_wal, with appropriate configuration for shared locking and polling interval.

Comment on lines +196 to +207
    static async reclaim(file) {
        try {
            dbg.log1("Starting reclaim for file", file);
            const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true });
            dbg.log4("reclaim finished with:", out);
            dbg.log1("Finished reclaim for file", file);
        } catch (error) {
            dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
        }

        return true;
    }

⚠️ Potential issue | 🟠 Major

Propagate reclaim failures so WAL entries aren’t lost.

TapeCloudUtils.reclaim() always resolves to true, even when exec() fails. That tells PersistentLogger.process() the reclaim batch succeeded, so the WAL file is deleted and the entries are never retried. Please return false (or rethrow) on failure so the WAL stays on disk for the next run.

     static async reclaim(file) {
         try {
             dbg.log1("Starting reclaim for file", file);
             const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true });
             dbg.log4("reclaim finished with:", out);
             dbg.log1("Finished reclaim for file", file);
-        } catch (error) {
-            dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
+            return true;
+        } catch (error) {
+            dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
+            return false;
         }
-
-        return true;
     }
🤖 Prompt for AI Agents
In src/sdk/glacier_tapecloud.js around lines 196-207 the async reclaim() always
returns true even when exec() throws, causing PersistentLogger to treat failures
as successes and delete WAL entries; change the catch to return false (or
rethrow) so failures propagate — keep the existing error logging, and ensure the
function only returns true on successful exec and returns false inside the catch
block so the WAL is retained for retry.
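
The effect of the return value can be shown with a toy model. The sketch below is not NooBaa's PersistentLogger; process_logs is a hypothetical stand-in that only models the behavior described in the comment, namely that a log file is dropped when the callback reports success and kept otherwise.

```javascript
// Sketch only: a toy stand-in for a persistent log processor. It is NOT
// NooBaa's PersistentLogger; it just models "delete the log file only when
// the callback reports success".
async function process_logs(pending_files, callback) {
    const remaining = [];
    for (const file of pending_files) {
        const ok = await callback(file);
        if (!ok) remaining.push(file); // keep the file for the next run
    }
    return remaining;
}
```

With a callback that always returns true, a failed reclaim would leave nothing in remaining, which is the entry loss this comment warns about.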

@tangledbytes tangledbytes force-pushed the utkarsh/feat/add-tape-reclaim branch from 1217008 to 0cd41c1 Compare October 13, 2025 07:24

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1217008 and 0cd41c1.

📒 Files selected for processing (8)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/native/fs/fs_napi.cpp (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • config.js
  • src/cmd/manage_nsfs.js
  • src/manage_nsfs/manage_nsfs_glacier.js
  • src/sdk/glacier_tapecloud.js
  • src/sdk/namespace_fs.js
🧰 Additional context used
🧬 Code graph analysis (1)
src/sdk/glacier.js (4)
src/cmd/manage_nsfs.js (3)
  • fs_context (73-73)
  • type (57-57)
  • native_fs_utils (23-23)
src/manage_nsfs/manage_nsfs_glacier.js (5)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
  • native_fs_utils (9-9)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
  • fs_context (191-191)
  • fs_context (303-303)
  • fs_context (439-439)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
🔇 Additional comments (9)
src/manage_nsfs/manage_nsfs_constants.js (2)

25-30: LGTM!

The addition of the RECLAIM action follows the existing pattern and is consistent with other Glacier actions.


72-77: LGTM!

The reclaim option is correctly configured with the same pattern as other Glacier operations.

src/native/fs/fs_napi.cpp (2)

53-57: LGTM!

The IBMUID DMAPI extended attribute macros follow the established pattern for other DMAPI xattrs (IBMObj, IBMPMig, IBMTPS) and are correctly constructed.


256-261: LGTM!

The GPFS_DMAPI_XATTR_TAPE_UID is correctly added to the GPFS_DMAPI_XATTRS vector, ensuring it will be retrieved alongside other DMAPI attributes.

src/sdk/glacier.js (5)

24-24: LGTM!

The RECLAIM_TIMESTAMP_FILE constant follows the established naming pattern for other timestamp files.


89-89: LGTM!

The RECLAIM_WAL_NAME constant follows the naming pattern of other WAL constants.


101-101: LGTM!

The GLACIER_RECLAIM_CLUSTER_LOCK constant follows the established naming pattern for cluster locks.


229-229: LGTM!

The perform method signature is correctly updated to include "RECLAIM" in the type union.


296-299: LGTM!

The RECLAIM handling correctly uses a cluster lock and processes the reclaim WAL logs. The simpler pattern (without staging) is appropriate for cleanup operations.

@tangledbytes tangledbytes force-pushed the utkarsh/feat/add-tape-reclaim branch from 0cd41c1 to d747791 Compare October 13, 2025 12:44

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/sdk/glacier.js (2)

75-84: Tighten IBMUID JSDoc phrasing.

Redundant wording; simplify for clarity.

Apply:

-    /**
-     * GPFS_DMAPI_XATTR_TAPE_UID xattr contains UID which contains the unique ID of the UID
+    /**
+     * GPFS_DMAPI_XATTR_TAPE_UID xattr contains the unique ID of the file

197-209: Fix typos in reclaim stub.

  • JSDoc: “inindexed” → “unindexed”.
  • Error text: “Unimplementented” → “Unimplemented”.

Apply:

-    /**
-     * reclaim cleans up inindexed items in the underlying
+    /**
+     * reclaim cleans up unindexed items in the underlying
@@
-    async reclaim(fs_context, log_file, failure_recorder) {
-        throw new Error('Unimplementented');
+    async reclaim(fs_context, log_file, failure_recorder) {
+        throw new Error('Unimplemented');
     }
src/sdk/namespace_fs.js (1)

3730-3750: Make append_to_reclaim_wal non-failing and filter entries.

This must never break uploads/deletes. Wrap in try/catch, and log only when DMAPI tape xattr is present. Optionally, trim xattrs (privacy).

Apply:

-    async append_to_reclaim_wal(fs_context, file_path, stat) {
-        if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;
-
-        if (!stat) {
-            stat = await nb_native().fs.stat(fs_context, file_path);
-        }
-
-        const data = JSON.stringify({
-            full_path: file_path,
-            logical_size: stat.size,
-            ea: stat.xattr,
-        });
-        await NamespaceFS.reclaim_wal.append(data);
-    }
+    async append_to_reclaim_wal(fs_context, file_path, stat) {
+        if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;
+        try {
+            if (!stat) {
+                stat = await nb_native().fs.stat(fs_context, file_path);
+            }
+            if (!stat?.xattr?.[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) return;
+            const data = JSON.stringify({
+                full_path: file_path,
+                logical_size: stat.size,
+                // keep only DMAPI-related EAs to reduce noise/sensitivity
+                ea: {
+                    [Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR],
+                    [Glacier.GPFS_DMAPI_XATTR_TAPE_PREMIG]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_PREMIG],
+                    [Glacier.GPFS_DMAPI_XATTR_TAPE_TPS]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_TPS],
+                    [Glacier.GPFS_DMAPI_XATTR_TAPE_UID]: stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_UID],
+                },
+            });
+            await NamespaceFS.reclaim_wal.append(data);
+        } catch (err) {
+            dbg.warn('append_to_reclaim_wal: failed to log reclaim entry', file_path, err);
+        }
+    }
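
The xattr trimming proposed in this diff could also be written as a small reusable helper, sketched here. pick_xattrs is a hypothetical name, and the key strings in DMAPI_KEYS are placeholders for illustration; the real names are whatever the Glacier.GPFS_DMAPI_XATTR_* constants resolve to.

```javascript
// Hypothetical helper: copy only the listed keys from an xattr map.
// Keys that are absent on the file are skipped rather than written as undefined.
function pick_xattrs(xattr, keys) {
    const picked = {};
    for (const key of keys) {
        if (xattr[key] !== undefined) picked[key] = xattr[key];
    }
    return picked;
}

// Placeholder key names for illustration only; the real names come from
// the Glacier.GPFS_DMAPI_XATTR_* constants.
const DMAPI_KEYS = ['dmapi.IBMObj', 'dmapi.IBMPMig', 'dmapi.IBMTPS', 'dmapi.IBMUID'];
```

Since JSON.stringify omits properties whose value is undefined, the serialized entry should come out the same as with the object literal in the diff; the helper mainly avoids repeating each constant twice.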
🧹 Nitpick comments (1)
src/sdk/namespace_fs.js (1)

2136-2145: Deletion path: WAL logging should gate on DMAPI indicator and avoid noisy warns.

  • Avoid logging all deletes; check DMAPI indicator before appending.
  • Wrap stat + append in try/catch; don’t warn on ENOENT.

Apply:

-            try {
-                const stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM &&
-                    await nb_native().fs.stat(fs_context, file_path).catch(dbg.warn.bind(this));
-                await nb_native().fs.unlink(fs_context, file_path);
-                if (stat) {
-                    await this.append_to_reclaim_wal(fs_context, file_path, stat);
-                }
-            } catch (err) {
+            try {
+                let stat;
+                if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) {
+                    try {
+                        stat = await nb_native().fs.stat(fs_context, file_path);
+                    } catch (e) {
+                        if (!['ENOENT', 'EISDIR'].includes(e.code)) dbg.warn('delete_object stat failed', file_path, e);
+                    }
+                }
+                await nb_native().fs.unlink(fs_context, file_path);
+                if (stat?.xattr?.[Glacier.GPFS_DMAPI_XATTR_TAPE_INDICATOR]) {
+                    await this.append_to_reclaim_wal(fs_context, file_path, stat).catch(dbg.warn.bind(this));
+                }
+            } catch (err) {
                 if (err.code !== 'ENOENT' && err.code !== 'EISDIR') throw err;
             }
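
The ordering in this suggestion (stat first, unlink second, log last) can be illustrated with an in-memory model. The sketch below uses a Map as a stand-in filesystem; delete_and_log and the indicator key passed to it are hypothetical and only demonstrate why the stat has to be captured before the file is removed.

```javascript
// Sketch only: an in-memory "filesystem" used to show ordering. After the
// delete, the stat (and its xattrs) is gone, so it has to be captured first.
function delete_and_log(files, wal, file_path, indicator_key) {
    const stat = files.get(file_path);          // capture before removal
    if (!files.delete(file_path)) return false; // nothing to delete
    if (stat && stat.xattr && stat.xattr[indicator_key]) {
        wal.push(JSON.stringify({ full_path: file_path, logical_size: stat.size, ea: stat.xattr }));
    }
    return true;
}
```

Only files carrying the indicator xattr produce a WAL entry, so ordinary deletes of files that never reached tape do not add noise to the reclaim log.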
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd41c1 and d747791.

📒 Files selected for processing (8)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/native/fs/fs_napi.cpp (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/manage_nsfs/manage_nsfs_glacier.js
  • src/sdk/glacier_tapecloud.js
  • src/native/fs/fs_napi.cpp
  • config.js
🧰 Additional context used
🧬 Code graph analysis (2)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
  • GLACIER_ACTIONS (25-30)
src/sdk/glacier.js (1)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (6)
src/cmd/manage_nsfs.js (1)

882-884: Add RECLAIM action handling — looks good.

Switch case correctly wires to manage_nsfs_glacier.process_reclaim().

Optionally verify manage_nsfs_glacier exports process_reclaim and CLI help includes 'reclaim'.

src/manage_nsfs/manage_nsfs_constants.js (1)

29-30: RECLAIM action and options registered — good coverage.

GLACIER_ACTIONS and VALID_OPTIONS_GLACIER updated consistently.

Also applies to: 76-77

src/sdk/glacier.js (1)

24-25: RECLAIM constants and perform() branch — LGTM.

New timestamp/WAL/lock constants and perform() path align with existing patterns.

Also applies to: 89-90, 101-102, 296-300

src/sdk/namespace_fs.js (3)

3774-3783: Expose reclaim_wal — LGTM.

Consistent with migrate/restore WALs; SHARED writer lock matches usage.


3811-3813: Lifecycle deletion WAL append — OK, will be safe after making append non-throwing.

No further action if append_to_reclaim_wal is wrapped.

Ensure reclaim processing tolerates mixed entries with trimmed xattrs as proposed.


3860-3862: Static _reclaim_wal init — LGTM.

@tangledbytes tangledbytes force-pushed the utkarsh/feat/add-tape-reclaim branch from d747791 to 5b6ed9f Compare October 21, 2025 17:09

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (6)
src/sdk/glacier_tapecloud.js (2)

471-484: Unused parameter: failure_recorder in reclaim

failure_recorder is accepted but never used. Either wire a failure recorder path (if partial failures will be tracked) or drop the param for clarity. The former is preferable if retry semantics are needed.
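
One possible way to wire the failure recorder is sketched below, purely as an illustration. reclaim_entries and reclaim_one are hypothetical names, and the assumption that failure_recorder is an async function taking the failed entry is mine; the PR's actual reclaim script works on a whole WAL file rather than per entry.

```javascript
// Sketch only: process entries one by one and hand failed ones to a recorder
// so a later run can retry them. reclaim_one and the entry format are hypothetical.
async function reclaim_entries(entries, reclaim_one, failure_recorder) {
    let failed_count = 0;
    for (const entry of entries) {
        try {
            await reclaim_one(entry);
        } catch (err) {
            failed_count += 1;
            await failure_recorder(entry); // remember the entry for retry
        }
    }
    return failed_count;
}
```

A non-zero return value tells the caller that some entries were recorded for retry instead of being silently dropped.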


196-207: Don’t return success on reclaim failures — keep WAL for retry

TapeCloudUtils.reclaim() always returns true, even when exec() throws. This will delete WAL files and lose entries.

Apply this fix to propagate failure:

-    static async reclaim(file) {
+    static async reclaim(file) {
         try {
             dbg.log1("Starting reclaim for file", file);
             const out = await exec(`${get_bin_path(TapeCloudUtils.RECLAIM_SCRIPT)} ${file}`, { return_stdout: true });
             dbg.log4("reclaim finished with:", out);
             dbg.log1("Finished reclaim for file", file);
-        } catch (error) {
-            dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
-        }
-
-        return true;
+            return true;
+        } catch (error) {
+            dbg.error("Failed to run TapeCloudUtils.reclaim for file:", file, "due to error:", error);
+            return false;
+        }
     }
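
The effect of the return value can be illustrated with a minimal sketch. It assumes, as the comment above implies, that the caller removes a log file only when the processing callback reports success. `process_log_file`, `run_reclaim` and the in-memory `logs` set are hypothetical stand-ins for illustration, not NooBaa APIs.

```javascript
'use strict';

// Hypothetical stand-in for a WAL processor: the log is removed only when
// the callback resolves to true, otherwise it is kept for a later retry.
async function process_log_file(logs, name, callback) {
    const ok = await callback(name);
    if (ok) logs.delete(name);
    return ok;
}

// Hypothetical reclaim wrapper that reports failure instead of masking it.
async function run_reclaim(task) {
    try {
        await task();
        return true;
    } catch (err) {
        console.error('reclaim failed:', err.message);
        return false;
    }
}

async function demo() {
    const logs = new Set(['ok.log', 'bad.log']);
    await process_log_file(logs, 'ok.log', () => run_reclaim(async () => {}));
    await process_log_file(logs, 'bad.log', () => run_reclaim(async () => {
        throw new Error('exec failed');
    }));
    // Only the log whose reclaim failed should still be present.
    return [...logs];
}
```

If the wrapper returned true unconditionally, both entries would be removed in this sketch and the failed one could not be retried from the log.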
src/sdk/glacier.js (2)

75-84: Tidy IBMUID JSDoc phrasing

Clarify wording.

-    /**
-     * GPFS_DMAPI_XATTR_TAPE_UID xattr contains UID which contains the unique ID of the UID
+    /**
+     * GPFS_DMAPI_XATTR_TAPE_UID xattr contains the unique ID of the file

197-209: Fix JSDoc typo in reclaim description

"inindexed" → "unindexed".

-    /**
-     * reclaim cleans up inindexed items in the underlying
+    /**
+     * reclaim cleans up unindexed items in the underlying
src/sdk/namespace_fs.js (2)

1410-1411: Consider explicit ENOENT handling instead of silently swallowing all errors.

The .catch(_.noop) suppresses all stat errors (EPERM, EIO, etc.), not just ENOENT when the destination doesn't exist yet. While this might not break functionality, it could hide unexpected issues.

Consider applying the suggested fix from the past review comment:

-const file_path_stat = config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM &&
-    await nb_native().fs.stat(fs_context, file_path).catch(_.noop);
+let file_path_stat;
+if (config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) {
+    try {
+        file_path_stat = await nb_native().fs.stat(fs_context, file_path);
+    } catch (err) {
+        if (err.code !== 'ENOENT') {
+            dbg.warn('append_to_reclaim_wal: failed to stat destination', file_path, err);
+        }
+        // Destination doesn't exist yet or stat failed, no reclaim needed
+    }
+}
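
The same idea can be sketched outside NooBaa with Node's built-in `fs/promises` module. `stat_if_exists` and its `warn` parameter are hypothetical names used only for illustration; the PR itself uses `nb_native().fs.stat` and `dbg.warn`.

```javascript
'use strict';
const fs = require('node:fs/promises');

// Returns the stat result, or undefined when the stat fails.
// A missing path (ENOENT) is expected and stays silent; any other
// error code is reported through `warn` so it is not hidden.
async function stat_if_exists(file_path, warn = console.warn) {
    try {
        return await fs.stat(file_path);
    } catch (err) {
        if (err.code !== 'ENOENT') {
            warn('failed to stat destination', file_path, err);
        }
        return undefined;
    }
}
```

The caller still treats `undefined` as "nothing to reclaim", so behavior is unchanged for new uploads while unexpected errors become visible in the logs.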

3733-3753: Add error handling to prevent reclaim logging failures from breaking operations.

The past review comment requested wrapping this method's body in try-catch so failures don't bubble up and break the main operation flow. This has not been implemented yet.

As suggested in the previous review, apply this error handling:

 async append_to_reclaim_wal(fs_context, file_path, stat) {
     if (!config.NSFS_GLACIER_LOGS_ENABLED || !config.NSFS_GLACIER_DMAPI_ENABLE_TAPE_RECLAIM) return;

-    if (!stat) {
-        stat = await nb_native().fs.stat(fs_context, file_path);
-    }
-
-    const data = JSON.stringify({
-        full_path: file_path,
-        logical_size: stat.size,
-        ea: stat.xattr,
-    });
-    await NamespaceFS.reclaim_wal.append(data);
+    try {
+        if (!stat) {
+            stat = await nb_native().fs.stat(fs_context, file_path);
+        }
+
+        const data = JSON.stringify({
+            full_path: file_path,
+            logical_size: stat.size,
+            ea: stat.xattr,
+        });
+        await NamespaceFS.reclaim_wal.append(data);
+    } catch (err) {
+        // Log error but don't fail the operation
+        dbg.error('append_to_reclaim_wal: failed to log reclaim entry', file_path, err);
+    }
 }
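
A reduced version of this best-effort pattern, independent of NooBaa, might look like the sketch below. `append_best_effort` and the `wal` object with an async `append` method are assumptions made for the example; only the shape of the entry (`full_path`, `logical_size`, `ea`) comes from the diff above.

```javascript
'use strict';

// Hypothetical best-effort wrapper: a failed append is logged and swallowed
// so the calling operation (upload, delete) is never failed by logging.
async function append_best_effort(wal, entry, log_error = console.error) {
    try {
        await wal.append(JSON.stringify(entry));
        return true;
    } catch (err) {
        log_error('failed to log reclaim entry', entry.full_path, err);
        return false;
    }
}
```

Returning a boolean rather than nothing is a choice made for this sketch so that callers and tests can tell whether the entry was recorded.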

Based on learnings

🧹 Nitpick comments (1)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (1)

171-179: Close reclaim WAL in teardown to avoid FD leak

NamespaceFS._reclaim_wal is opened but not closed in mocha.after(). Add close similar to migrate/restore to avoid dangling handles.

 mocha.after(async function() {
-        await Promise.all([
-            fs.rm(ns_src_bucket_path, { recursive: true, force: true }),
-            fs.rm(config.NSFS_GLACIER_LOGS_DIR, { recursive: true, force: true }),
-        ]);
+        try {
+            await NamespaceFS._reclaim_wal?.close();
+        } finally {
+            await Promise.all([
+                fs.rm(ns_src_bucket_path, { recursive: true, force: true }),
+                fs.rm(config.NSFS_GLACIER_LOGS_DIR, { recursive: true, force: true }),
+            ]);
+        }
     });
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d747791 and 5b6ed9f.

📒 Files selected for processing (9)
  • config.js (3 hunks)
  • src/cmd/manage_nsfs.js (1 hunks)
  • src/manage_nsfs/manage_nsfs_constants.js (2 hunks)
  • src/manage_nsfs/manage_nsfs_glacier.js (2 hunks)
  • src/native/fs/fs_napi.cpp (2 hunks)
  • src/sdk/glacier.js (7 hunks)
  • src/sdk/glacier_tapecloud.js (4 hunks)
  • src/sdk/namespace_fs.js (7 hunks)
  • src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/manage_nsfs/manage_nsfs_constants.js
  • config.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
🧬 Code graph analysis (5)
src/cmd/manage_nsfs.js (1)
src/manage_nsfs/manage_nsfs_constants.js (1)
  • GLACIER_ACTIONS (25-30)
src/manage_nsfs/manage_nsfs_glacier.js (3)
src/cmd/manage_nsfs.js (3)
  • fs_context (73-73)
  • native_fs_utils (23-23)
  • config (18-18)
src/manage_nsfs/manage_nsfs_cli_utils.js (2)
  • fs_context (125-125)
  • native_fs_utils (7-7)
src/util/native_fs_utils.js (2)
  • fs_context (549-554)
  • config (10-10)
src/sdk/namespace_fs.js (2)
src/manage_nsfs/manage_nsfs_glacier.js (7)
  • config (6-6)
  • nb_native (7-7)
  • nb_native (85-85)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
src/native/fs/fs_napi.cpp (2)
  • stat (2032-2036)
  • stat (2033-2033)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
src/manage_nsfs/manage_nsfs_glacier.js (8)
  • require (5-5)
  • require (8-8)
  • require (10-10)
  • backend (14-14)
  • backend (29-29)
  • backend (43-43)
  • backend (63-63)
  • config (6-6)
src/sdk/glacier.js (4)
  • require (7-7)
  • require (10-10)
  • config (9-9)
  • s3_utils (6-6)
src/sdk/namespace_fs.js (8)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • lifecycle_utils (27-27)
  • config (13-13)
  • s3_utils (15-15)
  • buffer_utils (18-18)
  • crypto (14-14)
src/sdk/glacier.js (3)
src/cmd/manage_nsfs.js (3)
  • fs_context (73-73)
  • type (57-57)
  • native_fs_utils (23-23)
src/manage_nsfs/manage_nsfs_glacier.js (5)
  • fs_context (13-13)
  • fs_context (28-28)
  • fs_context (42-42)
  • fs_context (62-62)
  • native_fs_utils (9-9)
src/util/native_fs_utils.js (1)
  • fs_context (549-554)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (12)
src/native/fs/fs_napi.cpp (3)

260-260: LGTM! Addition to GPFS_DMAPI_XATTRS vector is correct.

The new xattr is properly added to the GPFS_DMAPI_XATTRS vector, which is used by get_fd_gpfs_xattr() (line 482) to retrieve GPFS-specific extended attributes when use_dmapi is enabled. This change is backward compatible and follows the established pattern.


53-260: Note: PR description and tests need attention.

While the code changes are correct and minimal, please note:

  1. The PR description contains only placeholder text with no actual explanation of the changes
  2. The PR checklist shows "Tests added" is unchecked

Given that this adds support for a new DMAPI xattr for Glacier tape reclaim, please:

  • Update the PR description to explain the purpose of the IBMUID xattr
  • Add tests to verify the xattr is correctly retrieved when use_dmapi is enabled
  • Document the expected format/content of the IBMUID xattr value

53-57: LGTM! IBMUID attribute name verified and implementation is consistent.

The new macros follow the exact same naming convention and pattern as the existing DMAPI xattrs. Verification across the codebase confirms "IBMUID" is the correct GPFS DMAPI extended attribute name for tape UID—it's already used in src/sdk/glacier.js with documentation confirming it indicates file migration/premigration status. The C++ implementation properly mirrors the JavaScript constant and adds the xattr to the retrieval vector using the established pattern.

src/cmd/manage_nsfs.js (1)

882-884: RECLAIM action wiring LGTM

Dispatch to manage_nsfs_glacier.process_reclaim() with break is correct and consistent with other actions.

src/manage_nsfs/manage_nsfs_glacier.js (2)

61-73: process_reclaim logic looks correct

Uses NSFS_GLACIER_RECLAIM_INTERVAL and timestamp gating; updates timestamp post-run.
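
For readers unfamiliar with the gating pattern, here is a minimal sketch. It keeps the last-run timestamp in memory, whereas the PR persists it in a timestamp file; `is_due`, `run_if_due` and the `state` object are hypothetical names, and treating a missing timestamp as "due" is an assumption of the sketch.

```javascript
'use strict';

// Returns true when at least `interval_ms` has passed since `last_run_ms`.
// A missing timestamp (first run) is treated as due in this sketch.
function is_due(last_run_ms, interval_ms, now_ms = Date.now()) {
    if (last_run_ms === undefined || last_run_ms === null) return true;
    return now_ms - last_run_ms >= interval_ms;
}

// Runs `work` only when the interval has elapsed, then records the run time.
async function run_if_due(state, interval_ms, work, now_ms = Date.now()) {
    if (!is_due(state.last_run_ms, interval_ms, now_ms)) return false;
    await work();
    state.last_run_ms = now_ms; // updated only after the run completes
    return true;
}
```

Updating the timestamp after the work finishes means a run that throws is attempted again on the next invocation instead of waiting a full interval.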


145-145: Export added correctly

process_reclaim exported alongside other flows.

src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (1)

540-596: Reclaim test coverage is solid

Validates single/multi delete, overwrite, and lifecycle-led delete entries in reclaim WAL. No issues noted.

Also applies to: 598-657, 659-685

src/sdk/glacier.js (1)

296-300: perform('RECLAIM') path LGTM

Cluster lock + processing RECLAIM WAL mirrors migrate/restore patterns.

src/sdk/namespace_fs.js (4)

1459-1462: LGTM: Reclaim logging for tape-resident files.

The logic correctly checks for the tape indicator xattr and logs the destination file for reclaim before it's overwritten. The path consistency issue mentioned in past reviews has been addressed.


3777-3786: LGTM: Standard lazy initialization pattern.

The reclaim WAL getter follows the same pattern as migrate_wal and restore_wal, with proper lazy initialization and consistent configuration.


3814-3816: LGTM: Lifecycle deletion reclaim logging with error handling.

The call to append_to_reclaim_wal is properly wrapped with .catch(dbg.warn.bind(this)) to ensure logging failures don't break the lifecycle deletion flow.


3863-3864: LGTM: Static field declaration.

Properly declares the static field for the reclaim WAL instance.

@tangledbytes tangledbytes requested review from a team, alphaprinz, jackyalbo, naveenpaul1 and shirady and removed request for a team October 27, 2025 06:34
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add all kind of tests for lifecycle

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
@tangledbytes tangledbytes force-pushed the utkarsh/feat/add-tape-reclaim branch from 5b6ed9f to 8fc8213 Compare January 13, 2026 03:49
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @src/sdk/glacier_tapecloud.js:
- Around line 471-484: reclaim currently catches exceptions but doesn't return a
boolean on error; update the catch block in async reclaim(fs_context, log_file,
failure_recorder) to return a boolean (e.g., return false) or rethrow the error
to match the Promise<boolean> signature, ensure you keep the existing
dbg.error('unexpected error occured while running tapecloud.reclaim:', error)
call and return false after logging so callers always receive a boolean result;
refer to the reclaim method and the internal call to
this._reclaim(log_file.log_path).
🧹 Nitpick comments (2)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (1)

625-681: Consider using assert.strictEqual for better error messages.

The assertions assert(found === 1) and assert(found === 3) will not provide helpful error messages on failure. Using assert.strictEqual would show the actual vs expected values.

♻️ Suggested improvement
-            assert(found === 1);
+            assert.strictEqual(found, 1, 'Expected 1 reclaim entry for single delete');
...
-            assert(found === 3);
+            assert.strictEqual(found, 3, 'Expected 3 reclaim entries for all deleted objects');
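
The difference is visible on the error objects that Node's built-in `assert` module throws. The sketch below captures both errors for comparison; `capture` and the `found` value are illustrative only.

```javascript
'use strict';
const assert = require('node:assert');

// Run an assertion and hand back the error it throws (or null if it passes).
function capture(fn) {
    try {
        fn();
        return null;
    } catch (err) {
        return err;
    }
}

const found = 2; // pretend the test found 2 entries instead of 3

const plain = capture(() => assert(found === 3));
const strict = capture(() => assert.strictEqual(found, 3, 'Expected 3 reclaim entries'));

// The plain assert only knows about the boolean result of the comparison.
console.log(plain.actual, plain.expected);
// strictEqual records the compared values, which test runners can display.
console.log(strict.actual, strict.expected);
console.log(strict.message);
```

Note that a custom message replaces the generated one, so the compared values are carried by the `actual` and `expected` properties of the error rather than by its message text.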
src/sdk/namespace_fs.js (1)

3654-3674: Document that stat should always be provided when file may be deleted.

The stat parameter is marked optional, but the fallback path (lines 3664-3666) will fail with ENOENT if called after the file is deleted. All current callers correctly provide stat, but this could be a footgun for future callers.

Consider either:

  1. Making stat required since all callers provide it
  2. Adding a JSDoc note explaining that stat must be provided for deleted files
📝 Suggested documentation improvement
     /**
-     *
+     * Appends reclaim data to the reclaim WAL for tape reclaim processing.
+     * 
+     * IMPORTANT: When logging reclaim for a deleted file, `stat` MUST be provided
+     * since the file no longer exists on disk.
+     * 
      * @param {nb.NativeFSContext} fs_context
      * @param {string} file_path
-     * @param {nb.NativeFSStats} [stat]
+     * @param {nb.NativeFSStats} [stat] - Required when file may be deleted
      * @returns
      */
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b6ed9f and 8fc8213.

📒 Files selected for processing (9)
  • config.js
  • src/cmd/manage_nsfs.js
  • src/manage_nsfs/manage_nsfs_constants.js
  • src/manage_nsfs/manage_nsfs_glacier.js
  • src/native/fs/fs_napi.cpp
  • src/sdk/glacier.js
  • src/sdk/glacier_tapecloud.js
  • src/sdk/namespace_fs.js
  • src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/manage_nsfs/manage_nsfs_glacier.js
  • src/manage_nsfs/manage_nsfs_constants.js
  • src/cmd/manage_nsfs.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*

⚙️ CodeRabbit configuration file

src/test/**/*.*: Ensure that the PR includes tests for the changes.

Files:

  • src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
🧠 Learnings (5)
📓 Common learnings
Learnt from: tangledbytes
Repo: noobaa/noobaa-core PR: 9241
File: src/sdk/glacier_tapecloud.js:471-484
Timestamp: 2025-10-22T11:00:25.141Z
Learning: In TapeCloudGlacier, the reclaim operation does not implement failure recording or retry logic because retry handling is the responsibility of tcmgr (TapeCloud manager). The failure_recorder parameter in reclaim() is unused by design.
📚 Learning: 2025-10-22T11:00:25.141Z
Learnt from: tangledbytes
Repo: noobaa/noobaa-core PR: 9241
File: src/sdk/glacier_tapecloud.js:471-484
Timestamp: 2025-10-22T11:00:25.141Z
Learning: In TapeCloudGlacier, the reclaim operation does not implement failure recording or retry logic because retry handling is the responsibility of tcmgr (TapeCloud manager). The failure_recorder parameter in reclaim() is unused by design.

Applied to files:

  • src/sdk/glacier_tapecloud.js
  • src/sdk/glacier.js
📚 Learning: 2025-08-18T05:00:52.294Z
Learnt from: tangledbytes
Repo: noobaa/noobaa-core PR: 9183
File: src/sdk/glacier_tapecloud.js:377-406
Timestamp: 2025-08-18T05:00:52.294Z
Learning: In TapeCloudGlacier.restore method, the method always returns true (rather than propagating _recall() result) because it uses a staging approach where the original log file should always be deleted after processing. Failed entries are captured by failure_recorder for retry through staged files, so the original log preservation is not needed.

Applied to files:

  • src/sdk/glacier_tapecloud.js
  • src/sdk/glacier.js
📚 Learning: 2025-08-05T10:05:51.959Z
Learnt from: tangledbytes
Repo: noobaa/noobaa-core PR: 9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.959Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.

Applied to files:

  • src/sdk/glacier_tapecloud.js
📚 Learning: 2025-08-18T05:00:21.149Z
Learnt from: tangledbytes
Repo: noobaa/noobaa-core PR: 9183
File: src/sdk/glacier_tapecloud.js:273-311
Timestamp: 2025-08-18T05:00:21.149Z
Learning: In TapeCloudGlacier.migrate method, ignoring the boolean return value from _migrate() and always proceeding to unstage entries is acceptable behavior - the migrate operation doesn't require checking the return value like other operations might.

Applied to files:

  • src/sdk/glacier_tapecloud.js
🧬 Code graph analysis (1)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (2)
src/sdk/glacier.js (2)
  • require (7-7)
  • require (10-10)
src/sdk/namespace_fs.js (4)
  • require (26-26)
  • require (29-29)
  • require (30-30)
  • require (31-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-jest-unit-tests
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
🔇 Additional comments (19)
config.js (1)

944-946: LGTM!

The new configuration flags follow existing naming conventions and are appropriately placed. The default values (disabled by default, 15-minute interval matching other glacier intervals) are sensible.

src/native/fs/fs_napi.cpp (2)

60-64: LGTM!

The new DMAPI xattr constants follow the established naming conventions and patterns used by existing tape-related xattrs (IBMOBJ, IBMPMig, IBMTPS).


260-265: LGTM!

Adding GPFS_DMAPI_XATTR_TAPE_UID to the GPFS_DMAPI_XATTRS vector ensures it will be retrieved alongside other DMAPI xattrs when use_dmapi is enabled.

src/sdk/glacier_tapecloud.js (2)

186-207: LGTM!

The TapeCloudUtils.reclaim implementation correctly logs errors without throwing, and always returns true since retry handling is the responsibility of tcmgr (TapeCloud manager). Based on learnings, this design is intentional.


553-562: LGTM!

The _reclaim private method follows the established pattern of other similar methods (_migrate, _recall, _process_expired) by delegating to the corresponding TapeCloudUtils static method.

src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (4)

98-98: LGTM!

The patched backend correctly stubs out _reclaim to return true, consistent with the stubbing of other methods like _migrate, _recall, and _process_expired.


180-187: LGTM!

The reclaim WAL setup follows the same pattern as the existing migrate and restore WAL initialization, properly closing any existing logger before replacing it.


683-742: LGTM!

The test correctly verifies that overwriting an object logs a reclaim entry for the previous version, while the initial upload does not create a reclaim entry. This validates the expected behavior that reclaim entries are only created when replacing existing data.


744-770: LGTM!

Good coverage for lifecycle-triggered deletions marking objects for tape reclaim. The test uses lifecycle_utils.build_lifecycle_filter to simulate real lifecycle deletion behavior.

src/sdk/glacier.js (4)

24-24: LGTM!

The RECLAIM_TIMESTAMP_FILE constant follows the naming pattern of existing timestamp files (MIGRATE_TIMESTAMP_FILE, RESTORE_TIMESTAMP_FILE, EXPIRY_TIMESTAMP_FILE).


75-83: LGTM!

The GPFS_DMAPI_XATTR_TAPE_UID constant is well-documented with an example format, matching the documentation style of other DMAPI xattr constants in this class.


194-206: LGTM!

The reclaim method signature follows the established pattern of other glacier operations (migrate, restore), with consistent parameter types and return type.


284-286: No locking issue with reclaim processing.

The RECLAIM path uses the same PersistentLogger infrastructure as MIGRATION and RESTORE, which is initialized with EXCLUSIVE file-level locking (line 250). The open_with_lock call, with EXCLUSIVE locking and retries, prevents concurrent access to log files, so only one process can hold the lock at a time. Unlike MIGRATION and RESTORE, which additionally use GLACIER_CLUSTER_LOCK for their staging operations, RECLAIM relies on the file-level lock alone, which is sufficient for this use case.

src/sdk/namespace_fs.js (6)

1337-1338: LGTM - Guarded stat for reclaim tracking.

The short-circuit evaluation and .catch(_.noop) correctly handle the case where the file doesn't exist yet (new upload), returning undefined to skip reclaim logging.


1387-1391: Consider adding error suppression for reclaim logging during uploads.

Unlike the lifecycle deletion path (line 3736), errors from append_to_reclaim_wal here will propagate and fail the upload. Per the retrieved learnings, reclaim retry handling is the responsibility of tcmgr.

If reclaim logging failures should not fail user uploads, consider:

🔧 Suggested change
             if (file_path_stat) {
-                await this.append_to_reclaim_wal(fs_context, file_path, file_path_stat);
+                await this.append_to_reclaim_wal(fs_context, file_path, file_path_stat).catch(dbg.warn.bind(this));
             }

2059-2068: LGTM - Correct integration of reclaim logging into delete flow.

The logic correctly:

  1. Stats the file before deletion (only when feature enabled)
  2. Proceeds with unlink regardless of stat result
  3. Logs reclaim only if both stat succeeded and unlink succeeded
  4. Preserves the original ENOENT/EISDIR error suppression behavior

Based on learnings, tcmgr is responsible for reclaim retries, which justifies this error handling approach.
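
The ordering described above (stat first, unlink, then log) can be sketched with Node's built-in `fs/promises`. `delete_and_record` and the `record` callback are hypothetical; the real code uses `nb_native().fs` and `append_to_reclaim_wal`. The sketch also shows why the stat has to be captured before the unlink: afterwards the file no longer exists.

```javascript
'use strict';
const fs = require('node:fs/promises');

// Hypothetical delete helper: capture the stat while the file still exists,
// unlink it, and record a reclaim entry only if both steps succeeded.
async function delete_and_record(file_path, record) {
    const stat = await fs.stat(file_path).catch(() => undefined);
    try {
        await fs.unlink(file_path);
    } catch (err) {
        // Keep ignoring "already gone" and "is a directory", as before.
        if (err.code === 'ENOENT' || err.code === 'EISDIR') return false;
        throw err;
    }
    if (stat) {
        await record({ full_path: file_path, logical_size: stat.size });
    }
    return true;
}
```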


3698-3707: LGTM - Follows established WAL getter pattern.

The implementation is consistent with migrate_wal and restore_wal getters, using the same lazy initialization pattern and configuration.
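
As a reference for the pattern being approved, a lazy static getter can be sketched as follows. `FakeLogger` and `FakeNamespace` are hypothetical stand-ins; the constructor arguments are placeholders and do not reflect the PersistentLogger options used in the PR.

```javascript
'use strict';

// Hypothetical stand-in for a persistent logger; counts constructions.
class FakeLogger {
    static created = 0;

    constructor(dir, namespace) {
        this.dir = dir;
        this.namespace = namespace;
        FakeLogger.created += 1;
    }
}

class FakeNamespace {
    // Backing field, left empty until the getter is first used.
    static _reclaim_wal = null;

    // Build the logger on first access, then reuse the same instance.
    static get reclaim_wal() {
        if (!FakeNamespace._reclaim_wal) {
            FakeNamespace._reclaim_wal = new FakeLogger('/tmp/logs', 'reclaim');
        }
        return FakeNamespace._reclaim_wal;
    }
}
```

Keeping the backing field static is also what allows a test to close and replace the logger between runs.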


3735-3737: LGTM - Appropriate error suppression for lifecycle reclaim logging.

The .catch(dbg.warn.bind(this)) pattern correctly prevents reclaim logging failures from propagating to the lifecycle deletion flow, consistent with the design that tcmgr handles retry logic. Based on learnings, this is the intended behavior.


3784-3785: LGTM - Consistent static field initialization.

Comment on lines +471 to +484
    /**
     *
     * @param {nb.NativeFSContext} fs_context
     * @param {LogFile} log_file log filename
     * @param {(entry: string) => Promise<void>} failure_recorder
     * @returns {Promise<boolean>}
     */
    async reclaim(fs_context, log_file, failure_recorder) {
        try {
            return this._reclaim(log_file.log_path);
        } catch (error) {
            dbg.error('unexpected error occured while running tapecloud.reclaim:', error);
        }
    }
⚠️ Potential issue | 🟡 Minor

Missing return statement in catch block.

The method signature indicates it returns Promise<boolean>, but when an exception is caught, the function implicitly returns undefined. This inconsistency could cause issues for callers expecting a boolean.

🐛 Proposed fix
     async reclaim(fs_context, log_file, failure_recorder) {
         try {
             return this._reclaim(log_file.log_path);
         } catch (error) {
             dbg.error('unexpected error occured while running tapecloud.reclaim:', error);
+            return false;
         }
     }
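
One general JavaScript detail is worth keeping in mind with this shape of code: inside an async function, `return promise` without `await` hands the promise back to the caller, so a rejection is not seen by the surrounding `catch`. Whether `_reclaim` can actually reject is not visible in this hunk (the `TapeCloudUtils.reclaim` shown earlier catches its own errors), so the sketch below only illustrates the language behavior with hypothetical function names.

```javascript
'use strict';

// Hypothetical operation that always rejects.
async function failing_reclaim() {
    throw new Error('tape unavailable');
}

// Without await, the rejected promise leaves the try block untouched,
// so the catch never runs and the caller sees the rejection.
async function reclaim_without_await() {
    try {
        return failing_reclaim();
    } catch (err) {
        return false;
    }
}

// With await, the rejection is raised inside the try block and the
// catch converts it into a boolean result.
async function reclaim_with_await() {
    try {
        return await failing_reclaim();
    } catch (err) {
        return false;
    }
}
```

If the intent is for callers to always receive a boolean, `return await` inside the `try` is the form that guarantees it.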
🤖 Prompt for AI Agents
In @src/sdk/glacier_tapecloud.js around lines 471 - 484, reclaim currently
catches exceptions but doesn't return a boolean on error; update the catch block
in async reclaim(fs_context, log_file, failure_recorder) to return a boolean
(e.g., return false) or rethrow the error to match the Promise<boolean>
signature, ensure you keep the existing dbg.error('unexpected error occured
while running tapecloud.reclaim:', error) call and return false after logging so
callers always receive a boolean result; refer to the reclaim method and the
internal call to this._reclaim(log_file.log_path).
