[NSFS | GLACIER] Add support for parallel recall and migrate #9183
Conversation
Walkthrough
Adds a staging layer and centralized orchestration/locking for Glacier migration/restore/expiry, introduces a LogFile abstraction and open_with_lock utility, updates restore finalization to optionally check DMAPI xattrs via a new config flag, and adapts utilities and tests to the new LogFile APIs.
Sequence Diagram(s)
sequenceDiagram
participant Manager as manage_nsfs_glacier.js
participant Backend as Glacier / TapeCloudGlacier
participant Logger as PersistentLogger
participant LogFile as LogFile
Manager->>Backend: perform(fs_context, "MIGRATION"/"RESTORE"/"EXPIRY")
alt MIGRATION or RESTORE
Backend->>Logger: process(primary logs) -- acquire GLACIER_CLUSTER_LOCK
Logger->>LogFile: init (acquire log lock)
LogFile->>Backend: stage_migrate/stage_restore(log_file, failure_recorder)
Backend->>Logger: process(staged logs) -- acquire per-type cluster lock
Logger->>LogFile: init (staged log lock)
LogFile->>Backend: migrate/restore(log_file, failure_recorder)
Backend->>Backend: finalize (e.g., _finalize_restore with optional DMAPI check)
else EXPIRY
Backend->>Logger: process(expiry logs) -- guarded by GLACIER_SCAN_LOCK
Logger->>LogFile: init (acquire log lock)
LogFile->>Backend: process expiry entries
end
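For orientation, here is a minimal sketch of the two-phase flow shown above. It is not the PR's actual code: lock_path() and process_logs() are placeholder helpers, and only the constant and method names referenced in this walkthrough are assumed to exist.

```js
// Sketch only: the two-phase flow from the diagram above.
// Assumes Glacier constants, native_fs_utils.lock_and_run and the backend
// methods named in this walkthrough are in scope; lock_path() and
// process_logs() are placeholders for the real helpers.
async function perform_sketch(backend, fs_context, type) {
    if (type === 'EXPIRY') {
        // Expiry runs on its own, guarded by the scan lock.
        await native_fs_utils.lock_and_run(fs_context, lock_path(Glacier.GLACIER_SCAN_LOCK),
            async () => backend.expiry(fs_context));
        return;
    }
    const [primary_wal, stage_wal, stage_fn, process_fn, op_lock] = type === 'MIGRATION' ?
        [Glacier.MIGRATE_WAL_NAME, Glacier.MIGRATE_STAGE_WAL_NAME, backend.stage_migrate, backend.migrate, Glacier.GLACIER_MIGRATE_CLUSTER_LOCK] :
        [Glacier.RESTORE_WAL_NAME, Glacier.RESTORE_STAGE_WAL_NAME, backend.stage_restore, backend.restore, Glacier.GLACIER_RESTORE_CLUSTER_LOCK];

    // Phase 1: stage entries from the primary WAL under the cluster-wide lock.
    await native_fs_utils.lock_and_run(fs_context, lock_path(Glacier.GLACIER_CLUSTER_LOCK),
        async () => process_logs(primary_wal, stage_fn.bind(backend)));

    // Phase 2: consume the staged WAL under the per-operation lock, so one
    // migration and one restore can run concurrently across the cluster.
    await native_fs_utils.lock_and_run(fs_context, lock_path(op_lock),
        async () => process_logs(stage_wal, process_fn.bind(backend)));
}
```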
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
src/util/native_fs_utils.js (Outdated)
 * }} [cfg]
 * @returns {Promise<nb.NativeFile | undefined>}
 */
async function open_with_lock(fs_context, file_path, flags, mode, cfg = {})
Consider using this in NewlineReader. However, NewlineReader wants to be able to read even a deleted file, I guess.
@tangledbytes reevaluate.
Actionable comments posted: 4
🧹 Nitpick comments (5)
src/native/fs/fs_napi.cpp (1)
1897-1954
: Well-implemented non-blocking file locking with a minor typo. The implementation correctly handles non-blocking file locking semantics:
- Uses F_OFD_SETLK for non-blocking operation
- Properly distinguishes between lock unavailable (EAGAIN) and actual errors
- Returns a boolean result to indicate lock acquisition success
- Follows established patterns from FileFcntlLock
Fix the typo in the comment:
- // Failed to aquire lock because another process
+ // Failed to acquire lock because another process

src/util/native_fs_utils.js (1)
800-802
: Consider additional file validation checks. The current validation only checks inode match and nlink > 0. Consider adding additional checks, like file size or modification time, to ensure the file hasn't been replaced.

  if (fh_stat.ino === path_stat.ino && fh_stat.nlink > 0) {
+     // Optionally validate other attributes like size or mtime
+     // if specific consistency requirements exist
      return file;
  }

src/sdk/glacier_tapecloud.js (1)
217-217
: Inconsistent file handle management. At line 217, entry_fh is closed without checking if it's defined, but at line 251 optional chaining is used. Consider using a consistent approach.

- await entry_fh?.close(fs_context);
+ if (entry_fh) await entry_fh.close(fs_context);

Also applies to: 251-251
src/util/persistent_logger.js (1)
250-253
: Consider making NewlineReader options configurable. The NewlineReader is created with hardcoded options. Consider making these configurable for flexibility.

- constructor(fs_context, log_path) {
+ constructor(fs_context, log_path, options = {}) {
      this.fs_context = fs_context;
      this.log_path = log_path;
      this.log_reader = new NewlineReader(
          this.fs_context,
-         this.log_path, { lock: 'EXCLUSIVE', skip_overflow_lines: true, skip_leftover_line: true },
+         this.log_path, {
+             lock: 'EXCLUSIVE',
+             skip_overflow_lines: true,
+             skip_leftover_line: true,
+             ...options
+         },
      );
  }

src/sdk/glacier.js (1)
104-112
: Consider adding validation in base staging methods. The base implementations of stage_migrate and stage_restore simply copy all entries without validation. While subclasses can override these, consider adding basic validation in the base class.

  async stage_migrate(fs_context, log_file, failure_recorder) {
      try {
-         await log_file.collect(Glacier.MIGRATE_STAGE_WAL_NAME, async (entry, batch_recorder) => batch_recorder(entry));
+         await log_file.collect(Glacier.MIGRATE_STAGE_WAL_NAME, async (entry, batch_recorder) => {
+             // Basic validation: check if file exists
+             try {
+                 await nb_native().fs.stat(fs_context, entry);
+                 await batch_recorder(entry);
+             } catch (err) {
+                 if (err.code !== 'ENOENT') {
+                     await failure_recorder(entry);
+                 }
+             }
+         });
          return true;
      } catch (error) {
          dbg.error('Glacier.stage_migrate error:', error);
          throw error;
      }
  }

Also applies to: 145-153
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- config.js (1 hunks)
- src/manage_nsfs/manage_nsfs_glacier.js (1 hunks)
- src/native/fs/fs_napi.cpp (4 hunks)
- src/sdk/glacier.js (7 hunks)
- src/sdk/glacier_tapecloud.js (6 hunks)
- src/sdk/nb.d.ts (1 hunks)
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (2 hunks)
- src/util/bucket_logs_utils.js (4 hunks)
- src/util/file_reader.js (1 hunks)
- src/util/native_fs_utils.js (2 hunks)
- src/util/notifications_util.js (3 hunks)
- src/util/persistent_logger.js (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*
⚙️ CodeRabbit Configuration File
src/test/**/*.*
: Ensure that the PR includes tests for the changes.
Files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
🧠 Learnings (1)
📚 Learning: the noobaa config.js file has a built-in environment variable override system via `load_config_env_o...
Learnt from: liranmauda
PR: noobaa/noobaa-core#9174
File: config.js:1044-1046
Timestamp: 2025-07-30T10:20:09.402Z
Learning: The NooBaa config.js file has a built-in environment variable override system via `load_config_env_overrides()` function. Any config property can be overridden using environment variables prefixed with `CONFIG_JS_`. For example, `CONFIG_JS_NC_HEALTH_BUCKETS_COUNT_LIMIT=10000` would override `config.NC_HEALTH_BUCKETS_COUNT_LIMIT`. The system handles type conversion automatically and provides consistent override patterns across all config values.
Applied to files:
config.js
🧬 Code Graph Analysis (4)
src/util/file_reader.js (1)
src/manage_nsfs/manage_nsfs_glacier.js (2)
fh
(96-96)fh
(98-98)
src/util/native_fs_utils.js (1)
src/util/debug_module.js (2)
nb_native
(26-26)file
(60-60)
src/manage_nsfs/manage_nsfs_glacier.js (3)
src/util/native_fs_utils.js (3)
fs_context
(549-554)config
(10-10)path
(7-7)src/manage_nsfs/manage_nsfs_cli_utils.js (2)
fs_context
(125-125)native_fs_utils
(7-7)src/sdk/glacier.js (3)
config
(9-9)path
(4-4)native_fs_utils
(11-11)
src/sdk/glacier_tapecloud.js (5)
src/util/native_fs_utils.js (9)
fs_context
(549-554)dbg
(4-4)nb_native
(12-12)nb_native
(670-670)stat
(401-401)stat
(449-449)stat
(598-598)err
(353-353)config
(10-10)src/sdk/glacier.js (3)
dbg
(8-8)nb_native
(5-5)config
(9-9)src/util/persistent_logger.js (2)
dbg
(9-9)nb_native
(5-5)src/util/file_reader.js (1)
nb_native
(5-5)config.js (2)
stat
(1348-1348)config
(7-7)
🔇 Additional comments (18)
config.js (1)
919-922
: LGTM! Well-documented configuration flag added. The new NSFS_GLACIER_DMAPI_FINALIZE_RESTORE_ENABLE configuration flag is properly named, well-documented, and follows existing conventions. The conservative default value of false is appropriate for a feature that prevents accidental blocking reads during restore finalization.

src/sdk/nb.d.ts (1)
1033-1033
: LGTM! Consistent non-blocking lock method added. The new fcntltrylock method is well-designed, with a signature consistent with existing lock methods. The Promise<boolean> return type appropriately indicates the success or failure of the non-blocking lock attempt, providing useful functionality for concurrent file operations.
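As an illustration only, a caller-side sketch of how such a non-blocking lock might be retried (the retry loop and helper name are assumptions, not code from this PR):

```js
// Sketch only: retrying a non-blocking exclusive lock with jittered backoff.
// Assumes `file` is an nb.NativeFile from nb_native().fs.open and that
// fcntltrylock resolves to true when the lock is acquired, false otherwise.
async function try_lock_with_retries(fs_context, file, retries = 5, backoff_ms = 100) {
    for (let attempt = 0; attempt < retries; attempt += 1) {
        const locked = await file.fcntltrylock(fs_context, 'EXCLUSIVE');
        if (locked) return true;
        // Another process holds the lock: back off with jitter and retry.
        await new Promise(resolve => setTimeout(resolve, backoff_ms * (1 + Math.random())));
    }
    return false;
}
```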
src/util/file_reader.js (1)
203-206
: LGTM! Improved file handle lifecycle management. The refactored close method enhances safety by immediately clearing the instance's file handle reference (this.fh = null) before performing the asynchronous close operation. This prevents potential race conditions and accidental reuse of the file handle during or after the close operation, which is particularly important in concurrent scenarios.
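For readers skimming, the pattern being praised is roughly the following (a sketch of the shape, not the exact file_reader.js code):

```js
// Sketch only: clear the handle reference before awaiting the async close,
// so concurrent callers cannot reuse or double-close the same handle.
class ReaderSketch {
    constructor(fs_context, fh) {
        this.fs_context = fs_context;
        this.fh = fh;
    }
    async close() {
        const fh = this.fh;
        this.fh = null;                            // make the handle unreachable first
        if (fh) await fh.close(this.fs_context);   // then release it asynchronously
    }
}
```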
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (2)
179-179
: LGTM! Correctly adapted to the new LogFile abstraction. This change properly uses the log_reader property from the LogFile instance instead of manually creating a NewlineReader. This aligns with the refactored log file handling system, where LogFile manages the reader internally with proper locking.
221-221
: LGTM! Correctly updated for the new restore method signature. The addition of the third argument async () => null as a failure recorder callback aligns with the updated restore method signature in the Glacier backend. The no-op callback is appropriate for this test scenario, where failure handling isn't specifically tested.

src/util/notifications_util.js (3)
6-6
: LGTM! Import correctly updated for refactored design.Removing the
LogFile
import is appropriate sinceLogFile
instances are now passed directly to the_notify
method rather than being constructed locally. ThePersistentLogger
import is correctly retained.
100-100
: LGTM! Call correctly simplified for new interface.The simplified call to
_notify
with onlyfile
andfailure_append
parameters aligns with the refactored method signature. Thefs_context
parameter is no longer needed since theLogFile
instance encapsulates the necessary context.
116-120
: LGTM! Method signature correctly updated for LogFile abstraction.The updated
_notify
method signature properly accepts aLogFile
instance directly rather than constructing one internally. The JSDoc comments are appropriately updated to reflect the new parameter types. This change aligns with the broader refactoring to use theLogFile
abstraction for better resource management and locking consistency.src/native/fs/fs_napi.cpp (3)
1458-1458
: LGTM!The method registration correctly follows the established pattern used by other FileWrap methods.
1489-1489
: LGTM!The method declaration is consistent with other FileWrap methods and follows the established pattern.
2114-2118
: LGTM!The method implementation correctly follows the established pattern used by other FileWrap methods, properly utilizing the
api<>
template to create and queue the worker.src/util/bucket_logs_utils.js (4)
9-9
: LGTM!The removal of the unused
LogFile
import is correct and aligns with the refactoring whereLogFile
instances are now provided byPersistentLogger.process
.
39-39
: LGTM!The removal of the
fs_context
parameter is correct and consistent with the updated_upload_to_targets
function signature. TheLogFile
instance provided byPersistentLogger.process
should contain all necessary context.
55-59
: LGTM!The function signature simplification and updated JSDoc documentation correctly reflect the API changes:
- Removal of the redundant
fs_context
parameter- Direct use of
LogFile
instance parameter- Proper JSDoc type annotation for the
log_file
parameter
63-64
: LGTM!The updated usage correctly leverages the provided
LogFile
instance:
- Using
log_file.log_path
for logging maintains the same information while working with the new API- Direct call to
log_file.collect_and_process()
eliminates redundant LogFile creation- Consistent error logging with
log_file.log_path
Also applies to: 101-101
src/util/native_fs_utils.js (2)
724-736
: Potential resource management issue with lock release. The current implementation always closes the file descriptor in the finally block, but if the callback uses release_lock() to unlock, the file descriptor might be in an inconsistent state. Consider tracking whether the lock was released to avoid potential issues.

  async function lock_and_run(fs_context, lock_path, cb) {
      const lockfd = await nb_native().fs.open(fs_context, lock_path, 'w');
+     let lock_released = false;
      const release_lock = async () => {
+         if (lock_released) return;
          await lockfd.fcntllock(fs_context, 'UNLOCK');
+         lock_released = true;
      };
      try {
          await lockfd.fcntllock(fs_context, 'EXCLUSIVE');
          await cb(release_lock);
      } finally {
-         await lockfd.close(fs_context);
+         if (!lock_released) {
+             await lockfd.fcntllock(fs_context, 'UNLOCK');
+         }
+         await lockfd.close(fs_context);
      }
  }

Likely an incorrect or invalid review comment.
763-791
: Incorrect retry logic and unnecessary delays. The retry logic has several issues:
- The loop decrements retries at the start, causing one less retry than configured
- There's an unnecessary delay after successful non-blocking lock acquisition

- for (; retries > 0; retries -= 1) {
+ while (retries > 0) {
      try {
          file = await nb_native().fs.open(fs_context, file_path, flags, mode);
          if (cfg.lock && cfg.lock_no_block) {
              if (!await file.fcntltrylock(fs_context, cfg.lock)) {
+                 await file.close(fs_context);
+                 file = undefined;
+                 retries -= 1;
                  await P.delay(backoff * (1 + Math.random()));
                  continue;
              }
          }

Likely an incorrect or invalid review comment.
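For context, the overall shape of such an open-and-validate loop looks roughly like this. It is a hedged sketch based on the behavior described in this review (inode/nlink re-validation plus optional non-blocking locking), not the PR's exact implementation:

```js
// Sketch only: open the file, optionally lock it, then confirm the handle still
// refers to the live file at file_path (same inode, not unlinked) before
// returning it; otherwise close, back off and retry.
// Assumes nb_native(), fcntllock and fcntltrylock behave as discussed above.
async function open_with_lock_sketch(fs_context, file_path, flags, mode, cfg = {}, retries = 5, backoff_ms = 100) {
    for (let attempt = 0; attempt < retries; attempt += 1) {
        const file = await nb_native().fs.open(fs_context, file_path, flags, mode);
        let keep = false;
        try {
            let locked = true;
            if (cfg.lock && cfg.lock_no_block) {
                locked = await file.fcntltrylock(fs_context, cfg.lock);
            } else if (cfg.lock) {
                await file.fcntllock(fs_context, cfg.lock);
            }
            if (locked) {
                const fh_stat = await file.stat(fs_context);
                const path_stat = await nb_native().fs.stat(fs_context, file_path);
                // Accept only if the handle still points at the path and wasn't unlinked.
                keep = fh_stat.ino === path_stat.ino && fh_stat.nlink > 0;
            }
        } finally {
            if (!keep) await file.close(fs_context);
        }
        if (keep) return file;
        // Lock unavailable or stale handle: back off with jitter and retry.
        await new Promise(resolve => setTimeout(resolve, backoff_ms * (1 + Math.random())));
    }
    return undefined;
}
```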
src/manage_nsfs/manage_nsfs_glacier.js (1)
31-34
: The low_free_space check for restores is correct; no change needed. Restores bring data back onto disk and consume space, so it's intentional to skip restore operations when free space is low. You can safely ignore the earlier suggestion to invert this condition.
Likely an incorrect or invalid review comment.
cb099e6 to 0b11eac
Actionable comments posted: 1
♻️ Duplicate comments (4)
src/sdk/glacier_tapecloud.js (2)
503-503
: Add documentation for the intentional error case. As noted in past comments, this section needs documentation to explain why the error is thrown when failure_recorder is not provided.
Add a comment explaining the intentional error behavior:

+ // Intentionally throw error to fail the entire batch and preserve log file for retry
+ // This prevents files from being silently stuck in an unrestored state
  if (!failure_recorder) {
      throw new Error('restored file not actually restored');
  }
344-344
: Fix the incorrect time comparison condition. The condition stat.ctime <= stat.mtime will never be true, as noted in previous reviews, and needs to be removed.
Apply this diff to fix the condition:

- if (stat.xattr[Glacier.XATTR_STAGE_MIGRATE] || stat.ctime <= stat.mtime) {
+ if (stat.xattr[Glacier.XATTR_STAGE_MIGRATE]) {

src/sdk/glacier.js (2)
156-156
: Fix reference to a non-existent function. The comment references restore_namespace(), which doesn't exist, as noted in previous reviews.
Apply this diff to fix the comment:

- * restore must take a log file (from `restore_namespace()`) which will
+ * restore must take a log file (from `Glacier.RESTORE_STAGE_WAL_NAME`) which will
238-240
: Add missing await in async arrow function. The arrow function is missing an await before the process_glacier_logs call, as noted in previous reviews.
Apply this diff to add the missing await:

  await native_fs_utils.lock_and_run(
-     fs_context, lock_path(lock_file), () => process_glacier_logs(namespace_stage, log_process_cb)
+     fs_context, lock_path(lock_file), async () => await process_glacier_logs(namespace_stage, log_process_cb)
  );
🧹 Nitpick comments (1)
src/sdk/glacier.js (1)
243-260
: Consider making operation type handling more explicit. The current if statements work correctly, but the mutual exclusivity of operation types could be made more explicit with an if-else if structure.
Consider this more explicit structure:
- if (type === 'MIGRATION') { + if (type === 'MIGRATION') { await run_operation( Glacier.MIGRATE_WAL_NAME, Glacier.MIGRATE_STAGE_WAL_NAME, this.stage_migrate.bind(this), this.migrate.bind(this), Glacier.GLACIER_MIGRATE_CLUSTER_LOCK, ); - } - if (type === 'RESTORE') { + } else if (type === 'RESTORE') { await run_operation( Glacier.RESTORE_WAL_NAME, Glacier.RESTORE_STAGE_WAL_NAME, this.stage_restore.bind(this), this.restore.bind(this), Glacier.GLACIER_RESTORE_CLUSTER_LOCK, ); }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- config.js (1 hunks)
- src/manage_nsfs/manage_nsfs_glacier.js (1 hunks)
- src/native/fs/fs_napi.cpp (4 hunks)
- src/sdk/glacier.js (7 hunks)
- src/sdk/glacier_tapecloud.js (6 hunks)
- src/sdk/nb.d.ts (1 hunks)
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (2 hunks)
- src/util/bucket_logs_utils.js (4 hunks)
- src/util/file_reader.js (1 hunks)
- src/util/native_fs_utils.js (2 hunks)
- src/util/notifications_util.js (3 hunks)
- src/util/persistent_logger.js (6 hunks)
✅ Files skipped from review due to trivial changes (1)
- config.js
🚧 Files skipped from review as they are similar to previous changes (9)
- src/manage_nsfs/manage_nsfs_glacier.js
- src/util/file_reader.js
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
- src/sdk/nb.d.ts
- src/util/bucket_logs_utils.js
- src/util/native_fs_utils.js
- src/native/fs/fs_napi.cpp
- src/util/notifications_util.js
- src/util/persistent_logger.js
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.938Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.
📚 Learning: in tapecloudglacier._finalize_restore method, when failure_recorder is not provided but the dmapi pr...
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.938Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.
Applied to files:
src/sdk/glacier_tapecloud.js
src/sdk/glacier.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-package-lock-validation
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (8)
src/sdk/glacier_tapecloud.js (5)
18-18
: LGTM: Clean JSDoc import for type annotationsThe JSDoc import for LogFile is appropriate and aligns with the new LogFile abstraction from persistent_logger.
199-260
: Well-implemented staging method with proper error handlingThe
stage_migrate
method correctly implements the staging pattern with:
- Proper file handle management and cleanup
- Appropriate error handling for ENOENT cases
- Staging xattr marking before batch recording
- Comprehensive failure recording
268-306
: Solid implementation of migration processingThe
migrate
method correctly separates migration execution from staging cleanup. The approach of removing staging xattrs from all files (regardless of individual migration success) is appropriate since failed files are recorded for retry.
378-408
: Well-structured restore processing with proper callback handlingThe
restore
method correctly implements the two-phase restore approach with appropriate success/failure callbacks and conditional batch processing.
491-513
: DMAPI premig check implementation is correct. The DMAPI premigration check is properly implemented. The error-throwing behavior when failure_recorder is not provided (lines 507-509) is intentional per the retrieved learning: it ensures the entire batch fails and preserves the log file for retry, preventing files from being silently stuck in an unrestored state.
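To make the discussed behavior concrete, a hedged sketch of the finalize step; is_premigrated() is a placeholder for the DMAPI xattr inspection, and only the config flag and the thrown error message are taken from this PR:

```js
// Sketch only: finalize one restore entry. is_premigrated() stands in for the
// DMAPI xattr inspection; the config flag and the thrown error come from this PR.
async function finalize_restore_sketch(fs_context, entry, stat, failure_recorder) {
    if (config.NSFS_GLACIER_DMAPI_FINALIZE_RESTORE_ENABLE && is_premigrated(stat)) {
        if (!failure_recorder) {
            // Intentional: fail the whole batch and keep the log file so the
            // entry is retried, instead of silently leaving it unrestored.
            throw new Error('restored file not actually restored');
        }
        await failure_recorder(entry);
        return false;
    }
    // ...otherwise clear the restore-request xattr and set the expiry here...
    return true;
}
```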
src/sdk/glacier.js (3)
4-89
: New imports and constants properly support the staging architecture. The added imports (path, PersistentLogger, native_fs_utils) and new constants for staging WAL namespaces, cluster locks, and the staging xattr are well-structured and consistent with the overall refactor.
104-112
: Appropriate base implementation for staging. The stage_migrate method provides a solid base implementation that simply transfers entries to the staging namespace, allowing subclasses to implement more sophisticated staging logic.
446-446
: Correct addition of the staging xattr to the should_restore check. Adding Glacier.XATTR_STAGE_MIGRATE to the xattr keys ensures proper staging state checking during restore operations.
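A rough sketch of how such a per-entry staging check could look during restore staging; the function name is hypothetical and the flow is simplified from what the review describes:

```js
// Sketch only: per-entry check while staging a restore. If the file is already
// staged for migration, record it as a failure so the restore is retried later
// instead of racing with the migrate pass. Names mirror those in this review.
async function stage_restore_entry_sketch(fs_context, entry, batch_recorder, failure_recorder) {
    let fh;
    try {
        fh = await nb_native().fs.open(fs_context, entry);
        const stat = await fh.stat(fs_context, {
            xattr_get_keys: [Glacier.XATTR_RESTORE_REQUEST, Glacier.XATTR_STAGE_MIGRATE],
        });
        if (stat.xattr[Glacier.XATTR_STAGE_MIGRATE]) {
            await failure_recorder(entry); // migrate in flight, retry this restore later
        } else {
            await batch_recorder(entry);   // safe to stage for restore
        }
    } catch (err) {
        if (err.code !== 'ENOENT') await failure_recorder(entry); // deleted files are skipped
    } finally {
        if (fh) await fh.close(fs_context);
    }
}
```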
0b11eac to 5452e21
5452e21 to 35a442a
@alphaprinz - This PR attempts to deal with the race between @alphaprinz @jackyalbo the change in the
35a442a to 5b850e1
@tangledbytes Nsfs panic was not reproduced with the fix. Great job :) The changes to notification_utils are minor, I have no problems with them.
Not familiar with the Glacier content.
Had some minor notes regarding comments.
57f9bef to 09fbdf6
Actionable comments posted: 6
🔭 Outside diff range comments (1)
src/sdk/glacier_tapecloud.js (1)
204-266
: Ensure the file handle is always awaited on close and error paths. A couple of close calls aren't awaited, which can leak descriptors and hold locks longer than needed if the promise rejects.

- await entry_fh?.close(fs_context);
+ await entry_fh?.close(fs_context);
  ...
- } finally {
-     entry_fh?.close(fs_context);
- }
+ } finally {
+     await entry_fh?.close(fs_context);
+ }
🧹 Nitpick comments (4)
src/util/native_fs_utils.js (1)
764-769
: includes_any should return a boolean in all casesThe helper returns true or undefined. Make it explicit to avoid accidental truthiness/falseiness bugs and improve readability.
- const includes_any = (string, ...chars) => { - for (const ch of chars) { - if (string.includes(ch)) return true; - } - }; + const includes_any = (string, ...chars) => { + for (const ch of chars) { + if (string.includes(ch)) return true; + } + return false; + };src/manage_nsfs/manage_nsfs_glacier.js (1)
56-58
: Fix function name typo: prepare_galcier_fs_context → prepare_glacier_fs_contextMinor typo in the helper name; rename for clarity and consistency.
- await backend.perform(prepare_galcier_fs_context(fs_context), "EXPIRY"); + await backend.perform(prepare_glacier_fs_context(fs_context), "EXPIRY");- await backend.perform(prepare_galcier_fs_context(fs_context), "RESTORE"); + await backend.perform(prepare_glacier_fs_context(fs_context), "RESTORE");-function prepare_galcier_fs_context(fs_context) { +function prepare_glacier_fs_context(fs_context) {Also applies to: 117-127, 21-23, 36-38
src/sdk/glacier.js (2)
114-121
: Fix typo in documentation (“LofFile” → “LogFile”) and clarify namespaceDoc nit; correct type name improves readability for consumers.
- * migrate must take a LofFile object (this should from the + * migrate must take a LogFile object (this should be from the
449-457
: Include only-needed xattrs in should_restore; XATTR_STAGE_MIGRATE inclusion is redundant hereYou fetch XATTR_STAGE_MIGRATE in should_restore but don’t use it in the function. It’s checked in stage_restore. Consider removing it here to save a sys-call payload, unless you plan to consult it here as well.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)
- config.js (1 hunks)
- src/manage_nsfs/manage_nsfs_glacier.js (1 hunks)
- src/sdk/glacier.js (7 hunks)
- src/sdk/glacier_tapecloud.js (7 hunks)
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (4 hunks)
- src/util/bucket_logs_utils.js (4 hunks)
- src/util/file_reader.js (1 hunks)
- src/util/native_fs_utils.js (3 hunks)
- src/util/notifications_util.js (3 hunks)
- src/util/persistent_logger.js (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- src/util/file_reader.js
- src/util/bucket_logs_utils.js
- src/util/notifications_util.js
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
- src/util/persistent_logger.js
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-05T10:05:51.959Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.959Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.
Applied to files:
src/sdk/glacier_tapecloud.js
src/sdk/glacier.js
🧬 Code Graph Analysis (5)
config.js (16)
src/manage_nsfs/manage_nsfs_glacier.js (1)
config
(6-6)src/util/notifications_util.js (1)
config
(5-5)src/util/native_fs_utils.js (1)
config
(10-10)src/server/node_services/nodes_monitor.js (1)
config
(15-15)src/sdk/namespace_fs.js (1)
config
(13-13)src/endpoint/endpoint.js (1)
config
(18-18)src/manage_nsfs/nc_lifecycle.js (1)
config
(10-10)src/util/http_utils.js (1)
config
(19-19)src/util/fork_utils.js (1)
config
(11-11)src/cmd/manage_nsfs.js (1)
config
(18-18)src/sdk/bucketspace_fs.js (1)
config
(9-9)src/endpoint/s3/s3_rest.js (1)
config
(15-15)src/cmd/nsfs.js (1)
config
(20-20)src/agent/block_store_services/block_store_base.js (1)
config
(9-9)src/util/debug_module.js (1)
config
(24-24)src/server/bg_workers.js (1)
config
(16-16)
src/sdk/glacier_tapecloud.js (4)
src/sdk/glacier.js (2)
nb_native
(5-5)config
(9-9)src/util/persistent_logger.js (1)
nb_native
(5-5)src/util/file_reader.js (1)
nb_native
(5-5)config.js (2)
stat
(1351-1351)config
(7-7)
src/manage_nsfs/manage_nsfs_glacier.js (4)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (6)
backend
(90-90)backend
(174-174)fs_context
(191-191)fs_context
(303-303)fs_context
(439-439)config
(11-11)src/util/native_fs_utils.js (2)
fs_context
(549-554)config
(10-10)src/cmd/manage_nsfs.js (3)
fs_context
(73-73)config
(18-18)native_fs_utils
(23-23)src/manage_nsfs/manage_nsfs_cli_utils.js (2)
fs_context
(125-125)native_fs_utils
(7-7)
src/sdk/glacier.js (3)
src/manage_nsfs/manage_nsfs_glacier.js (5)
path
(4-4)native_fs_utils
(9-9)fs_context
(13-13)fs_context
(28-28)fs_context
(42-42)src/util/persistent_logger.js (3)
path
(4-4)dbg
(9-9)native_fs_utils
(6-6)src/util/native_fs_utils.js (3)
path
(7-7)dbg
(4-4)fs_context
(549-554)
src/util/native_fs_utils.js (1)
src/native/fs/fs_napi.cpp (5)
flags
(800-828)error
(700-708)error
(700-700)error
(722-735)error
(722-722)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-package-lock-validation
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (6)
config.js (1)
919-923
: LGTM: adds DMAPI finalize-restore toggle. The new NSFS_GLACIER_DMAPI_FINALIZE_RESTORE_ENABLE flag and inline docs look good and align with the TapeCloud finalize path.
src/util/native_fs_utils.js (1)
762-763
: Confirm '+*' open mode support in nb_native().fs.open
We didn't find any parsing for the custom '+' mode in our JS wrappers or native sources, and '+' isn't a standard fs.open flag. Please verify that nb_native().fs.open supports '+*'; if not, replace it with a supported mode (for example, 'r+' or 'rw').
• Location to check/use:
– src/util/native_fs_utils.js:762
flags = flags || (cfg.lock === 'EXCLUSIVE' ? '+*' : 'r');
src/manage_nsfs/manage_nsfs_glacier.js (2)
16-24
: LGTM: migration trigger conditions and timestamping. The OR combination of low_free_space/time_exceeded/log_threshold makes sense to drive more aggressive migrations during pressure while keeping periodic runs and a size-triggered fast path. Timestamp handling is correct.
31-39
: LGTM: restore guard prevents work under pressure and enforces the interval. Early-return on low_free_space or insufficient interval is appropriate to avoid competing with migrations and keep restore cadence bounded.
src/sdk/glacier_tapecloud.js (1)
121-130
: LGTM: task ID parsing handles the new "a:b:c" format with a fallback. The updated regex with a sane fallback is robust to both TapeCloud versions.
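Purely as an illustration of the idea (the real regexes live in glacier_tapecloud.js and may differ):

```js
// Sketch only: accept either the newer colon-delimited task id ("a:b:c") or the
// legacy purely numeric id; returns the id string or undefined if none matched.
function parse_task_id_sketch(line) {
    const colon_format = line.match(/([\w-]+:[\w-]+:[\w-]+)/);
    if (colon_format) return colon_format[1];
    const legacy_format = line.match(/(\d+)/); // fall back to the older numeric id
    return legacy_format ? legacy_format[1] : undefined;
}
```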
src/sdk/glacier.js (1)
200-271
: LGTM: two-phase orchestration with cluster locks. The central perform() flow (cluster-wide staging + per-op consumption under separate locks) matches the PR design goals for parallelism and safety. The lock file derivation under NSFS_GLACIER_LOGS_DIR is consistent and avoids cross-namespace contention.
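For readers following along, a small hedged sketch of how a named cluster lock could be taken under the logs directory; lock_path() here is an assumed helper shape, while lock_and_run and the constants are the ones referenced in this review:

```js
// Sketch only: derive a named lock file under the Glacier logs directory and run
// the callback while lock_and_run holds an exclusive fcntl lock on it.
const path = require('path');

function lock_path(lock_name) {
    return path.join(config.NSFS_GLACIER_LOGS_DIR, lock_name); // assumed derivation
}

async function with_migrate_lock(fs_context, work) {
    await native_fs_utils.lock_and_run(
        fs_context,
        lock_path(Glacier.GLACIER_MIGRATE_CLUSTER_LOCK),
        async () => work(fs_context),
    );
}
```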
09fbdf6 to d0a93ec
Actionable comments posted: 1
🔭 Outside diff range comments (2)
src/manage_nsfs/manage_nsfs_glacier.js (1)
92-105
: File handle leak inmigrate_log_exceeds_threshold
fh
is opened but never closed, which can leak FDs. Close it in a finally block.async function migrate_log_exceeds_threshold(threshold = config.NSFS_GLACIER_MIGRATE_LOG_THRESHOLD) { const log = new PersistentLogger(config.NSFS_GLACIER_LOGS_DIR, Glacier.MIGRATE_WAL_NAME, { locking: null }); let log_size = Number.MAX_SAFE_INTEGER; + let fh; try { - const fh = await log._open(); + fh = await log._open(); const { size } = await fh.stat(log.fs_context); log_size = size; } catch (error) { console.error("failed to get size of", Glacier.MIGRATE_WAL_NAME, error); + } finally { + try { await fh?.close(log.fs_context); } catch (_) {} } return log_size > threshold; }src/sdk/glacier_tapecloud.js (1)
204-258
: Two resource-handling bugs instage_migrate
(FD leak and missing await on close)
- FD leak: when
should_migrate
is false, the openedentry_fh
is not closed before returning.- Missing await:
entry_fh?.close(fs_context)
is not awaited in thefinally
block, risking delayed cleanup and lock leaks.Fix both:
let entry_fh; let should_migrate = true; try { entry_fh = await nb_native().fs.open(fs_context, entry); const stat = await entry_fh.stat(fs_context, { xattr_get_keys: [ Glacier.XATTR_RESTORE_REQUEST, Glacier.XATTR_RESTORE_EXPIRY, Glacier.STORAGE_CLASS_XATTR, ], }); should_migrate = await this.should_migrate(fs_context, entry, stat); } catch (err) { - await entry_fh?.close(fs_context); + await entry_fh?.close(fs_context); if (err.code === 'ENOENT') { // Skip this file return; } ... } // Skip the file if it shouldn't be migrated - if (!should_migrate) return; + if (!should_migrate) { + await entry_fh?.close(fs_context); + return; + } // Mark the file staged try { await entry_fh.replacexattr(fs_context, { [Glacier.XATTR_STAGE_MIGRATE]: Date.now().toString() }); await batch_recorder(entry); } catch (error) { dbg.error('failed to mark the entry migrate staged', error); ... } finally { - entry_fh?.close(fs_context); + await entry_fh?.close(fs_context); }
♻️ Duplicate comments (2)
src/util/native_fs_utils.js (1)
762-775
: Reiterating prior discussion: default flag'+*'
is unconventionalThis was discussed previously and you opted to keep
'+*'
. Acknowledging and not blocking; just tagging as a duplicate for traceability.src/sdk/glacier_tapecloud.js (1)
483-511
: Finalize-restore DMAPI premig enforcementThe premig check and failure-recorder/throw behavior match the previously stated design choice to avoid silent "restored" states. No change requested.
🧹 Nitpick comments (17)
src/util/native_fs_utils.js (4)
735-745
: JSDoc inconsistencies and typos inopen_with_lock
docs
- "aqcuired" → "acquired"
- "cfg.locking" should be "cfg.lock" (matches implementation)
- "O_RDRW" should be "O_RDWR" or reflect your actual flag strings (e.g., 'r+')
Recommend correcting to avoid confusion for future callers.
- * acquiring the lock on the file. Once lock is aqcuired + * acquiring the lock on the file. Once lock is acquired ... - * - If locking is needed then cfg.locking must be specified. + * - If locking is needed then cfg.lock must be specified. ... - * the flag is automatically updated to O_RDRW. + * the flag is automatically updated to a write-capable mode (e.g., 'r+').
764-769
: Ensureincludes_any
returns a booleanCurrently returns
true
orundefined
. Make it explicit for readability and to prevent accidental truthiness bugs in the future.const includes_any = (string, ...chars) => { for (const ch of chars) { if (string.includes(ch)) return true; } + return false; };
786-801
: Consider retrying on transient race errors during validation (stat
phase)If
nb_native().fs.stat(fs_context, file_path)
races with a rename/unlink, it can throwENOENT
. Today this aborts immediately. Given you already have a retry loop for identity mismatches, treating certain errors as retriable can makeopen_with_lock
more robust without changing lock semantics.} catch (error) { - await file?.close(fs_context); - throw error; + try { await file?.close(fs_context); } catch (_) {} + file = undefined; + if (['ENOENT', 'ESTALE'].includes(error.code) && retries > 1) { + await P.delay(backoff * (1 + Math.random())); + continue; + } + throw error; }
804-807
: Explicitly returnundefined
for clarity when retries are exhaustedThe function currently relies on implicit
undefined
return. Being explicit improves readability.- await file?.close(fs_context); + await file?.close(fs_context); + return undefined;src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (1)
124-140
: Restore all mutated config flags insafe_dmapi_surround
(missing NSFS_GLACIER_ENABLED)The helper mutates
NSFS_GLACIER_ENABLED
but does not restore it, which can leak state across tests. Capture and restore it like the others.- const safe_dmapi_surround = async (init, cb) => { + const safe_dmapi_surround = async (init, cb) => { await dmapi_config_semaphore.surround(async () => { - const start_value_1 = config.NSFS_GLACIER_DMAPI_ENABLE; - const start_value_2 = config.NSFS_GLACIER_DMAPI_IMPLICIT_SC; - const start_value_3 = config.NSFS_GLACIER_DMAPI_IMPLICIT_RESTORE_STATUS; + const start_value_enabled = config.NSFS_GLACIER_ENABLED; + const start_value_dmapi_enable = config.NSFS_GLACIER_DMAPI_ENABLE; + const start_value_implicit_sc = config.NSFS_GLACIER_DMAPI_IMPLICIT_SC; + const start_value_implicit_restore = config.NSFS_GLACIER_DMAPI_IMPLICIT_RESTORE_STATUS; config.NSFS_GLACIER_ENABLED = init; config.NSFS_GLACIER_DMAPI_IMPLICIT_SC = init; config.NSFS_GLACIER_DMAPI_IMPLICIT_RESTORE_STATUS = init; await cb(); - config.NSFS_GLACIER_DMAPI_ENABLE = start_value_1; - config.NSFS_GLACIER_DMAPI_IMPLICIT_SC = start_value_2; - config.NSFS_GLACIER_DMAPI_IMPLICIT_RESTORE_STATUS = start_value_3; + config.NSFS_GLACIER_ENABLED = start_value_enabled; + config.NSFS_GLACIER_DMAPI_ENABLE = start_value_dmapi_enable; + config.NSFS_GLACIER_DMAPI_IMPLICIT_SC = start_value_implicit_sc; + config.NSFS_GLACIER_DMAPI_IMPLICIT_RESTORE_STATUS = start_value_implicit_restore; }); };src/util/bucket_logs_utils.js (1)
39-45
: Incorrect property in error log (log.file
)
log
is aPersistentLogger
instance; usinglog.file
is likely incorrect or undefined. Prefer a path identifier such aslog.log_path
(or construct from dir/name).- dbg.error('processing log file failed', log.file); + dbg.error('processing log file failed', log.log_path);If
log.log_path
is not available on PersistentLogger, consider:path.join(config.PERSISTENT_BUCKET_LOG_DIR, entry.name)src/manage_nsfs/manage_nsfs_glacier.js (1)
117-127
: Typo in helper name:prepare_galcier_fs_context
Consider renaming to
prepare_glacier_fs_context
for clarity and consistency. If you agree, update all call sites in this file and the export.-function prepare_galcier_fs_context(fs_context) { +function prepare_glacier_fs_context(fs_context) { ... - return { ...fs_context, backend: 'GPFS', use_dmapi: true }; + return { ...fs_context, backend: 'GPFS', use_dmapi: true }; ... - return { ...fs_context }; + return { ...fs_context }; } -exports.process_migrations = process_migrations; +exports.process_migrations = process_migrations; ... -exports.process_expiry = process_expiry; +exports.process_expiry = process_expiry; -// and export the renamed function if it's part of the public API +// update call sites above: prepare_glacier_fs_context(fs_context)src/sdk/glacier.js (8)
86-90
: Lock taxonomy is clear; consider grouping constantsThe separate cluster-wide vs per-operation locks are well-defined. Minor: consider grouping these under a single object/enum to avoid string typos and improve discoverability.
- static GLACIER_CLUSTER_LOCK = 'glacier.cluster.lock'; - static GLACIER_MIGRATE_CLUSTER_LOCK = 'glacier.cluster.migrate.lock'; - static GLACIER_RESTORE_CLUSTER_LOCK = 'glacier.cluster.restore.lock'; - static GLACIER_SCAN_LOCK = 'glacier.scan.lock'; + static LOCKS = { + CLUSTER: 'glacier.cluster.lock', + MIGRATE: 'glacier.cluster.migrate.lock', + RESTORE: 'glacier.cluster.restore.lock', + SCAN: 'glacier.scan.lock', + };
91-103
: JSDoc nits: typos and parameter clarificationTighten wording and fix typos for clarity (“separated”, parameter type/description).
- /** - * stage_migrate must take a LogFile object (this should be from the - * `GLACIER.MIGRATE_WAL_NAME` namespace) which will have - * newline seperated entries of filenames which needs to be - * migrated to GLACIER and should stage the files for migration. - * - * The function should return false if it needs the log file to be - * preserved. - * @param {nb.NativeFSContext} fs_context - * @param {LogFile} log_file log filename - * @param {(entry: string) => Promise<void>} failure_recorder - * @returns {Promise<boolean>} - */ + /** + * Stage entries from a MIGRATE log into the stage namespace. + * - Input: LogFile from Glacier.MIGRATE_WAL_NAME (newline-separated file paths). + * - Output: Entries are copied into Glacier.MIGRATE_STAGE_WAL_NAME for later processing. + * - Return: true to delete the source log; false to preserve it. + * @param {nb.NativeFSContext} fs_context + * @param {LogFile} log_file + * @param {(entry: string) => Promise<void>} failure_recorder + * @returns {Promise<boolean>} + */
105-112
: Stage-migrate flow is correct; minor stylistic nitReturning true is consistent with the staging design. The async arrow’s body can just return the promise for brevity.
- await log_file.collect(Glacier.MIGRATE_STAGE_WAL_NAME, async (entry, batch_recorder) => batch_recorder(entry)); + await log_file.collect(Glacier.MIGRATE_STAGE_WAL_NAME, (entry, batch_recorder) => batch_recorder(entry));
115-126
: Fix typos in JSDoc (LogFile, grammar)LofFile → LogFile; minor wording/typos.
- /** - * migrate must take a LofFile object (this should from the - * `GLACIER.MIGRATE_STAGE_WAL_NAME` namespace) which will have newline - * separated entries of filenames which needs to be migrated to GLACIER - * and should perform migration of those files if feasible. + /** + * migrate must take a LogFile object (from the + * `Glacier.MIGRATE_STAGE_WAL_NAME` namespace) which has newline- + * separated file paths to migrate to GLACIER and should perform + * migration of those files if feasible.
134-146
: Fix copy/paste errors in JSDoc for stage_restoreThe input should be from RESTORE (primary) namespace, and the wording should be about staging restore, not migration.
- /** - * stage_restore must take a log file (from `Glacier.RESTORE_STAGE_WAL_NAME`) - * which will have newline seperated entries of filenames which needs to be - * migrated to GLACIER and should stage the files for migration. - * - * The function should return false if it needs the log file to be - * preserved. - * @param {nb.NativeFSContext} fs_context - * @param {LogFile} log_file log filename - * @param {(entry: string) => Promise<void>} failure_recorder - * @returns {Promise<boolean>} - */ + /** + * Stage entries from a RESTORE log into the stage namespace. + * - Input: LogFile from `Glacier.RESTORE_WAL_NAME` (newline-separated file paths). + * - Output: Entries are copied into `Glacier.RESTORE_STAGE_WAL_NAME` for later processing. + * - Return: true to delete the source log; false to preserve it. + * @param {nb.NativeFSContext} fs_context + * @param {LogFile} log_file + * @param {(entry: string) => Promise<void>} failure_recorder + * @returns {Promise<boolean>} + */
156-167
: JSDoc typo and consistency in restore signatureMinor: “seperated” → “separated”; ensure LogFile type mention is consistent across methods.
- * have newline seperated entries of filenames which needs to be + * have newline separated entries of filenames which need to be
219-230
: Nit: rename “entry” to “log_file” for clarityThe callback receives a LogFile, not an entry. Renaming improves readability and avoids confusion with line entries inside the log.
- await logs.process(async (entry, failure_recorder) => cb(fs_context, entry, failure_recorder)); + await logs.process(async (log_file, failure_recorder) => cb(fs_context, log_file, failure_recorder));
449-456
: XATTR_STAGE_MIGRATE is read but never usedFetching XATTR_STAGE_MIGRATE here has no effect since the logic doesn’t reference it. Either remove it from xattr_get_keys or incorporate it into the decision (e.g., to avoid redundant restore staging when a file is already flagged for migration).
- xattr_get_keys: [ - Glacier.XATTR_RESTORE_REQUEST, - Glacier.STORAGE_CLASS_XATTR, - Glacier.XATTR_STAGE_MIGRATE, - ], + xattr_get_keys: [ + Glacier.XATTR_RESTORE_REQUEST, + Glacier.STORAGE_CLASS_XATTR, + ],Would you like to factor the staging xattr into should_restore/should_migrate decisions? If not, dropping it from the stat call reduces overhead.
src/util/persistent_logger.js (2)
137-140
: Consider preserving deletion failure contextCurrently rethrowing will surface the error; consider adding the caught error (with code/path) to dbg.warn for quicker triage, or include it in the throw to retain context.
- } catch (error) { - dbg.warn('failed to delete the log file:', log_file.log_path); - throw error; + } catch (error) { + dbg.warn('failed to delete the log file:', log_file.log_path, 'error:', error); + throw error; }
280-322
: collect pipeline is correct; minor nits
- Resetting the reader before each run is fine and avoids partial reads.
- Minor: close() is called both before return and in finally; harmless but redundant.
Optional cleanup:
- if (filtered_log.local_size === 0) return; - - await filtered_log.close(); + if (filtered_log.local_size === 0) return;
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)
- config.js (1 hunks)
- src/manage_nsfs/manage_nsfs_glacier.js (1 hunks)
- src/sdk/glacier.js (7 hunks)
- src/sdk/glacier_tapecloud.js (7 hunks)
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (4 hunks)
- src/util/bucket_logs_utils.js (4 hunks)
- src/util/file_reader.js (1 hunks)
- src/util/native_fs_utils.js (3 hunks)
- src/util/notifications_util.js (3 hunks)
- src/util/persistent_logger.js (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/util/notifications_util.js
- src/util/file_reader.js
- config.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*
⚙️ CodeRabbit Configuration File
src/test/**/*.*
: Ensure that the PR includes tests for the changes.
Files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
🧠 Learnings (4)
📚 Learning: 2025-08-18T05:00:52.268Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:377-406
Timestamp: 2025-08-18T05:00:52.268Z
Learning: In TapeCloudGlacier.restore method, the method always returns true (rather than propagating _recall() result) because it uses a staging approach where the original log file should always be deleted after processing. Failed entries are captured by failure_recorder for retry through staged files, so the original log preservation is not needed.
Applied to files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
src/sdk/glacier.js
src/sdk/glacier_tapecloud.js
📚 Learning: 2025-08-05T10:05:51.959Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.959Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.
Applied to files:
src/sdk/glacier.js
src/sdk/glacier_tapecloud.js
📚 Learning: 2025-08-18T05:01:58.212Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/util/native_fs_utils.js:778-799
Timestamp: 2025-08-18T05:01:58.212Z
Learning: In the NooBaa codebase, fcntllock uses F_OFD_SETLKW which is a blocking lock operation that waits until the lock can be acquired, rather than returning EAGAIN/EWOULDBLOCK like non-blocking variants would.
Applied to files:
src/util/native_fs_utils.js
📚 Learning: 2025-08-18T05:00:21.111Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:273-311
Timestamp: 2025-08-18T05:00:21.111Z
Learning: In TapeCloudGlacier.migrate method, ignoring the boolean return value from _migrate() and always proceeding to unstage entries is acceptable behavior - the migrate operation doesn't require checking the return value like other operations might.
Applied to files:
src/sdk/glacier_tapecloud.js
🧬 Code Graph Analysis (3)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (2)
src/manage_nsfs/manage_nsfs_glacier.js (6)
backend
(14-14)backend
(29-29)backend
(43-43)fs_context
(13-13)fs_context
(28-28)fs_context
(42-42)src/util/native_fs_utils.js (1)
fs_context
(549-554)
src/sdk/glacier.js (3)
src/manage_nsfs/manage_nsfs_glacier.js (5)
config
(6-6)native_fs_utils
(9-9)fs_context
(13-13)fs_context
(28-28)fs_context
(42-42)src/util/persistent_logger.js (1)
native_fs_utils
(6-6)src/util/native_fs_utils.js (2)
config
(10-10)fs_context
(549-554)
src/manage_nsfs/manage_nsfs_glacier.js (5)
src/util/native_fs_utils.js (3)
fs_context
(549-554)config
(10-10)path
(7-7)src/cmd/manage_nsfs.js (3)
fs_context
(73-73)config
(18-18)native_fs_utils
(23-23)src/manage_nsfs/manage_nsfs_cli_utils.js (2)
fs_context
(125-125)native_fs_utils
(7-7)src/sdk/glacier.js (3)
config
(9-9)path
(4-4)native_fs_utils
(11-11)config.js (2)
config
(7-7)path
(11-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
🔇 Additional comments (26)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (4)
89-98
: Good test helper: centralized backend patchingCentralizing backend patching into
get_patched_backend()
reduces duplication and makes tests easier to reason about.
176-207
: Test logic aligns with staged WAL processingUsing
collect_and_process
on a LogFile and deferring delete by returningfalse
matches the new PersistentLogger callback contract. Also verifyingshould_migrate
here is a nice sanity check.
236-237
: RESTORE orchestration throughbackend.perform
is the right abstractionSwitching to
backend.perform(fs_context, "RESTORE")
appropriately reflects the new centralized execution model.
306-307
: End-to-end failure path validation for RESTORE looks goodExercising partial failures via a patched backend and asserting only success entries finalize is valuable coverage for the staging+finalization pipeline.
src/util/bucket_logs_utils.js (2)
9-9
: Import cleanup aligns with new LogFile callback contractRelying on the LogFile instance passed by PersistentLogger.process keeps responsibilities clear and avoids redundant wrappers.
55-60
: JSDoc update to acceptLogFile
is correctProcessing the provided
log_file
directly viacollect_and_process
aligns with the PersistentLogger changes.src/manage_nsfs/manage_nsfs_glacier.js (3)
16-24
: Migration trigger logic correctly centralizes via backend.performConditionally invoking
backend.perform(..., "MIGRATION")
and updating the timestamp afterward matches the stated design. Looks good.
36-39
: Restore trigger path LGTMTime-based gating followed by
backend.perform(..., "RESTORE")
is clear and consistent with the new orchestration.
56-58
: Expiry trigger path LGTMUsing
is_desired_time
and recording timestamp afterperform(..., "EXPIRY")
keeps the periodicity semantics intact.src/sdk/glacier_tapecloud.js (3)
122-131
: Robust task ID parsing with compatibility fallbackRecognizing the colon-delimited task ID and falling back to the legacy numeric format is a good compatibility improvement.
319-369
: Stage RESTORE flow reads and routes correctlyOpening per-entry, checking
should_restore
, and routing entries tofailure_recorder
whenXATTR_STAGE_MIGRATE
is present protects against migrate/restore overlap. Cleanup is awaited. Looks good.
377-405
: RESTORE orchestration and finalization path LGTMDeferring to
_recall
with partial failure/success recorders and then finalizing via_finalize_restore
on success aligns with the staged design. Always returningtrue
here (to delete the original log) is consistent with the intended pipeline.src/sdk/glacier.js (7)
4-4
: Imports and type hinting look correctpath, PersistentLogger, native_fs_utils, and the LogFile JSDoc import are appropriate for the new staging/locking flow.
Also applies to: 10-12, 13-13
49-50
: Clarify semantics and lifecycle of XATTR_STAGE_MIGRATEThis xattr is introduced but not used in this file’s logic. Please confirm:
- Which component is responsible for setting/clearing it (backend migrate/restore flows)?
- Whether should_migrate should also read/consider it to avoid re-staging already-staged entries (or if that’s intentionally handled purely via WAL names).
75-75
: Stage WAL naming is consistentstage.migrate and stage.restore align with the two-phase orchestration and make the flow readable.
Also applies to: 77-77
145-153
: Stage-restore mirrors stage-migrate correctlyFlow matches the intended design (collect into staged WAL and delete original). Good error handling.
205-211
: Expiry under dedicated scan lock looks goodRunning expiry under GLACIER_SCAN_LOCK isolates it from staging/processing locks. LGTM.
239-251
: Two-phase orchestration and locks are correct
- Phase 1: cluster-wide staging under GLACIER_CLUSTER_LOCK
- Phase 2: type-specific processing under MIGRATE/RESTORE lock
This cleanly enables one migrate and one restore concurrently cluster-wide.
253-269
: MIGRATION/RESTORE wiring is correctCorrect mapping of primary/staged namespaces and functions to the two-phase runner.
src/util/persistent_logger.js (7)
54-64
: open_with_lock + ensured directory creation = safer initCreating the directory and using native_fs_utils.open_with_lock with retries/backoff resolves the race seen previously and improves robustness under contention.
100-100
: Updated callback signature is soundUpdating the internal processing callback to accept LogFile enables holding the file lock across callback execution and deletion.
119-146
: Race fix: delete under lock, then deinit — correct sequencing
- LogFile.init() acquires an exclusive lock.
- Callback runs while held.
- Deletion happens before deinit, avoiding the previous race.
- Finally deinit ensures the lock is released even on errors.
This addresses the unlink/init race and is aligned with the PR objective.
250-254
: EXCLUSIVE NewlineReader per LogFile is appropriateEnsures per-file mutual exclusion while iterating entries and interacting with staging/processing.
256-268
: Explicit init semantics are cleanEagerly initializing to acquire the lock when needed is helpful; callers have control to hold/release locks as desired.
269-279
: Graceful deinitIdempotent close/reset is good; safe even if the callback preemptively deinitialized.
345-348
: Reset before collect_and_process is goodEnsures a fresh read regardless of prior usage; matches the updated lifecycle model.
d0a93ec to 37e5240
Actionable comments posted: 1
🔭 Outside diff range comments (2)
src/sdk/glacier_tapecloud.js (1)
208-258
: File descriptor leak when should_migrate() returns falseentry_fh is not closed when should_migrate is false (early return on Line 242), leaking fds under load. Ensure the handle is closed before returning.
Apply this diff within the callback to close the handle on the early return:
- // Skip the file if it shouldn't be migrated - if (!should_migrate) return; + // Skip the file if it shouldn't be migrated + if (!should_migrate) { + await entry_fh?.close(fs_context); + return; + }src/manage_nsfs/manage_nsfs_glacier.js (1)
93-105
: File handle leak in migrate_log_exceeds_thresholdfh from log._open() is never closed, leaking descriptors. Close it in a finally block.
Apply this diff:
async function migrate_log_exceeds_threshold(threshold = config.NSFS_GLACIER_MIGRATE_LOG_THRESHOLD) { const log = new PersistentLogger(config.NSFS_GLACIER_LOGS_DIR, Glacier.MIGRATE_WAL_NAME, { locking: null }); let log_size = Number.MAX_SAFE_INTEGER; - try { - const fh = await log._open(); - - const { size } = await fh.stat(log.fs_context); - log_size = size; - } catch (error) { - console.error("failed to get size of", Glacier.MIGRATE_WAL_NAME, error); - } + let fh; + try { + fh = await log._open(); + const { size } = await fh.stat(log.fs_context); + log_size = size; + } catch (error) { + console.error("failed to get size of", Glacier.MIGRATE_WAL_NAME, error); + } finally { + if (fh) { + try { await fh.close(log.fs_context); } catch (e) { /* best effort */ } + } + } return log_size > threshold; }
♻️ Duplicate comments (1)
src/sdk/glacier_tapecloud.js (1)
483-511
: Consider gating premig check on fs_context.use_dmapi (already discussed)A prior review suggested adding fs_context.use_dmapi to the conditional to avoid false positives when DMAPI isn’t used. The discussion concluded it’s not necessary. Just noting the prior context here.
🧹 Nitpick comments (13)
src/sdk/glacier_tapecloud.js (1)
49-52
: Error message does not include the exit code due to incorrect Error constructionnew Error accepts only the message string; passing errcode as a second argument is ignored, losing the exit code in the message.
Apply this diff:
- if (errcode) { - throw new Error('process exited with non-zero exit code:', errcode); - } + if (errcode) { + throw new Error(`process exited with non-zero exit code: ${errcode}`); + }src/manage_nsfs/manage_nsfs_glacier.js (4)
16-24: Add error handling around MIGRATION perform + timestamp update
If backend.perform throws, you currently skip timestamping (good) but also lose diagnostics. Wrap with try/catch to log the failure explicitly.

Apply this diff:

```diff
     if (
         await backend.low_free_space() ||
         await time_exceeded(fs_context, config.NSFS_GLACIER_MIGRATE_INTERVAL, Glacier.MIGRATE_TIMESTAMP_FILE) ||
         await migrate_log_exceeds_threshold()
     ) {
-        await backend.perform(prepare_galcier_fs_context(fs_context), "MIGRATION");
-        const timestamp_file_path = path.join(config.NSFS_GLACIER_LOGS_DIR, Glacier.MIGRATE_TIMESTAMP_FILE);
-        await record_current_time(fs_context, timestamp_file_path);
+        try {
+            await backend.perform(prepare_galcier_fs_context(fs_context), "MIGRATION");
+            const timestamp_file_path = path.join(config.NSFS_GLACIER_LOGS_DIR, Glacier.MIGRATE_TIMESTAMP_FILE);
+            await record_current_time(fs_context, timestamp_file_path);
+        } catch (err) {
+            console.error('process_migrations: backend.perform failed:', err);
+        }
     }
```
31-39: Add error handling around RESTORE perform + timestamp update
Mirror the migration handling to improve observability on failures.

Apply this diff:

```diff
-    await backend.perform(prepare_galcier_fs_context(fs_context), "RESTORE");
-    const timestamp_file_path = path.join(config.NSFS_GLACIER_LOGS_DIR, Glacier.RESTORE_TIMESTAMP_FILE);
-    await record_current_time(fs_context, timestamp_file_path);
+    try {
+        await backend.perform(prepare_galcier_fs_context(fs_context), "RESTORE");
+        const timestamp_file_path = path.join(config.NSFS_GLACIER_LOGS_DIR, Glacier.RESTORE_TIMESTAMP_FILE);
+        await record_current_time(fs_context, timestamp_file_path);
+    } catch (err) {
+        console.error('process_restores: backend.perform failed:', err);
+    }
```
56-58: Add error handling around EXPIRY perform + timestamp update
Keep behavior consistent with other operations and preserve diagnostics.

Apply this diff:

```diff
-    await backend.perform(prepare_galcier_fs_context(fs_context), "EXPIRY");
-    await record_current_time(fs_context, timestamp_file_path);
+    try {
+        await backend.perform(prepare_galcier_fs_context(fs_context), "EXPIRY");
+        await record_current_time(fs_context, timestamp_file_path);
+    } catch (err) {
+        console.error('process_expiry: backend.perform failed:', err);
+    }
```
117-127: Fix typo: prepare_galcier_fs_context → prepare_glacier_fs_context
Minor naming typo; rename for clarity and consistency.

Apply this diff for the definition:

```diff
-function prepare_galcier_fs_context(fs_context) {
+function prepare_glacier_fs_context(fs_context) {
```

And update call sites similarly:
- Line 21: backend.perform(prepare_glacier_fs_context(fs_context), "MIGRATION")
- Line 36: backend.perform(prepare_glacier_fs_context(fs_context), "RESTORE")
- Line 56: backend.perform(prepare_glacier_fs_context(fs_context), "EXPIRY")
src/sdk/glacier.js (8)
91-112: Docs: fix typos ("seperated" → "separated"), tighten wording
Small documentation cleanup for stage_migrate.

Apply this diff within the comment block:

```diff
- * newline seperated entries of filenames which needs to be
- * migrated to GLACIER and should stage the files for migration.
+ * newline separated entries of filenames that need to be
+ * migrated to GLACIER and should stage the files for migration.
- * The function should return false if it needs the log file to be
- * preserved.
+ * Return false to preserve the log file.
```
114-129: Docs: fix type name ("LofFile" → "LogFile") and typos
Minor documentation corrections.

Apply this diff:

```diff
- * migrate must take a LofFile object (this should from the
+ * migrate must take a LogFile object (from the
- * `GLACIER.MIGRATE_STAGE_WAL_NAME` namespace) which will have newline
- * separated entries of filenames which needs to be migrated to GLACIER
+ * `GLACIER.MIGRATE_STAGE_WAL_NAME` namespace) which has newline
+ * separated filenames that need to be migrated to GLACIER
```
133-153: Docs: stage_restore description is copied from migrate; adjust wording
This block should describe staging for restore, not migration, and fix typos.

Apply this diff:

```diff
- * stage_restore must take a log file (from `Glacier.RESTORE_STAGE_WAL_NAME`)
- * which will have newline seperated entries of filenames which needs to be
- * migrated to GLACIER and should stage the files for migration.
+ * stage_restore must take a log file (from `Glacier.RESTORE_WAL_NAME`)
+ * which has newline separated filenames that need to be
+ * restored from GLACIER and should stage the files for restore.
- * The function should return false if it needs the log file to be
- * preserved.
+ * Return false to preserve the log file.
- * @param {LogFile} log_file log filename
+ * @param {LogFile} log_file
```
155-169: Docs: fix typos and clarify restore description
Minor clarity improvements.

Apply this diff:

```diff
- * restore must take a log file (from `Glacier.RESTORE_WAL_NAME`) which will
- * have newline seperated entries of filenames which needs to be
- * restored from GLACIER and should perform restore of those files if
- * feasible
+ * restore must take a log file (from `Glacier.RESTORE_STAGE_WAL_NAME`) which
+ * has newline separated filenames to restore from GLACIER and should
+ * perform restore of those files if feasible.
```
129-131: Error message typo: "Unimplementented" → "Unimplemented"
Fix the exception text for clarity.

Apply this diff:

```diff
-        throw new Error('Unimplementented');
+        throw new Error('Unimplemented');
```
171-172: Error message typo: "Unimplementented" → "Unimplemented" (restore)
Same typo in restore.

Apply this diff:

```diff
-        throw new Error('Unimplementented');
+        throw new Error('Unimplemented');
```
181-182: Error message typo: "Unimplementented" → "Unimplemented" (expiry)
Same typo in expiry.

Apply this diff:

```diff
-        throw new Error('Unimplementented');
+        throw new Error('Unimplemented');
```
197-198: Error message typo: "Unimplementented" → "Unimplemented" (low_free_space)
Same typo in low_free_space.

Apply this diff:

```diff
-        throw new Error('Unimplementented');
+        throw new Error('Unimplemented');
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)
- config.js (1 hunks)
- src/manage_nsfs/manage_nsfs_glacier.js (1 hunks)
- src/sdk/glacier.js (7 hunks)
- src/sdk/glacier_tapecloud.js (7 hunks)
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (4 hunks)
- src/util/bucket_logs_utils.js (4 hunks)
- src/util/file_reader.js (1 hunks)
- src/util/native_fs_utils.js (3 hunks)
- src/util/notifications_util.js (3 hunks)
- src/util/persistent_logger.js (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- config.js
- src/util/file_reader.js
- src/util/notifications_util.js
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
- src/util/persistent_logger.js
- src/util/bucket_logs_utils.js
- src/util/native_fs_utils.js
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-18T05:00:52.268Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:377-406
Timestamp: 2025-08-18T05:00:52.268Z
Learning: In TapeCloudGlacier.restore method, the method always returns true (rather than propagating _recall() result) because it uses a staging approach where the original log file should always be deleted after processing. Failed entries are captured by failure_recorder for retry through staged files, so the original log preservation is not needed.
Applied to files:
src/sdk/glacier.js
src/sdk/glacier_tapecloud.js
📚 Learning: 2025-08-05T10:05:51.959Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.959Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.
Applied to files:
src/sdk/glacier.js
src/sdk/glacier_tapecloud.js
📚 Learning: 2025-08-18T05:00:21.111Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:273-311
Timestamp: 2025-08-18T05:00:21.111Z
Learning: In TapeCloudGlacier.migrate method, ignoring the boolean return value from _migrate() and always proceeding to unstage entries is acceptable behavior - the migrate operation doesn't require checking the return value like other operations might.
Applied to files:
src/sdk/glacier_tapecloud.js
🧬 Code Graph Analysis (1)
- src/manage_nsfs/manage_nsfs_glacier.js (5)
  - src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (7): backend (90-90), backend (174-174), fs_context (191-191), fs_context (303-303), fs_context (439-439), config (11-11), path (6-6)
  - src/util/native_fs_utils.js (3): fs_context (549-554), config (10-10), path (7-7)
  - src/cmd/manage_nsfs.js (3): fs_context (73-73), config (18-18), native_fs_utils (23-23)
  - src/sdk/glacier.js (3): config (9-9), path (4-4), native_fs_utils (11-11)
  - config.js (2): config (7-7), path (11-11)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
🔇 Additional comments (3)
src/sdk/glacier_tapecloud.js (3)
18-18: JSDoc import-only for LogFile looks good
Using a JSDoc import to pull types without a runtime dependency is appropriate here.
121-130: Robust task ID parsing with backward-compatible fallback
The new regex supports the modern 1005:1111:4444 format and correctly falls back to the legacy numeric format. Good defensive handling.
380-402: Restore always returns true by design (staging semantics) — LGTM
Returning true regardless of _recall()'s boolean aligns with the two-phase staging approach: original logs are deleted post-staging; failed entries are retried via staged logs using failure_recorder.
Signed-off-by: Utkarsh Srivastava <[email protected]>

remove fcntltrylock

Signed-off-by: Utkarsh Srivastava <[email protected]>

add support for new task ID format

Signed-off-by: Utkarsh Srivastava <[email protected]>
37e5240 to 197213b (Compare)
Actionable comments posted: 2
🔭 Outside diff range comments (2)
src/sdk/glacier.js (1)
449-456: Include RESTORE_EXPIRY in xattr_get_keys for should_restore
get_restore_status consults XATTR_RESTORE_EXPIRY. Fetch it in the stat call to avoid partial status evaluation, especially when DMAPI implicit flags are disabled.

```diff
         stat = await nb_native().fs.stat(fs_context, file, {
             xattr_get_keys: [
                 Glacier.XATTR_RESTORE_REQUEST,
                 Glacier.STORAGE_CLASS_XATTR,
+                Glacier.XATTR_RESTORE_EXPIRY,
                 Glacier.XATTR_STAGE_MIGRATE,
             ],
         });
```
src/sdk/glacier_tapecloud.js (1)
207-258: File descriptor leak when skipping non-migratable entries + missing await on close.
- If should_migrate is false, the function returns without closing entry_fh.
- The final close is not awaited; if it rejects you may get an unhandled rejection, and you also risk racing with subsequent operations.
Apply this diff to ensure proper closure in all paths:
```diff
             // Skip the file if it shouldn't be migrated
-            if (!should_migrate) return;
+            if (!should_migrate) {
+                await entry_fh?.close(fs_context);
+                return;
+            }

             // Mark the file staged
             try {
                 await entry_fh.replacexattr(fs_context, { [Glacier.XATTR_STAGE_MIGRATE]: Date.now().toString() });
                 await batch_recorder(entry);
             } catch (error) {
                 dbg.error('failed to mark the entry migrate staged', error);
                 // Can't really do anything if this fails - provider
                 // needs to make sure that appropriate error handling
                 // is being done there
                 await failure_recorder(entry);
             } finally {
-                entry_fh?.close(fs_context);
+                await entry_fh?.close(fs_context);
             }
```
♻️ Duplicate comments (5)
src/util/persistent_logger.js (1)
274-278: Redundant reader reset during deinit [same concern as before]
deinit already closes the reader; both collect and collect_and_process also reset before iterating. Resetting here is likely redundant; consider removing to reduce surprises and keep lifecycle symmetric.
src/sdk/glacier_tapecloud.js (4)
18-18: Type-only JSDoc import is correct; keep it.
This import is useful for tooling/typing and has no runtime cost.
273-311: Migrate flow matches the two-stage design; ignoring _migrate()'s boolean is OK.
Proceeding to unstage regardless of _migrate()'s boolean aligns with the design decision captured in this PR's discussion and learnings.
383-407: Returning true from restore() is intentional with the staged flow.
Capturing success to decide whether to iterate the staged log but always returning true matches the stated design: original logs are deleted post-staging; failures are retried via staged files.
489-517: DMAPI pre-migration gate: behavior and documentation look correct; optional guard for misconfig.
- Good addition: only request the DMAPI xattr when fs_context.use_dmapi is true.
- Throwing when failure_recorder is absent is intentional to preserve the log and avoid silent stuck files.
- The doc block explains the rationale clearly. Nice.
Optional hardening: Add fs_context.use_dmapi to the conditional to avoid false positives if the config flag is accidentally enabled without DMAPI.
- if ( - config.NSFS_GLACIER_DMAPI_FINALIZE_RESTORE_ENABLE && - !stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_PREMIG] - ) { + if ( + config.NSFS_GLACIER_DMAPI_FINALIZE_RESTORE_ENABLE && + fs_context.use_dmapi && + !stat.xattr[Glacier.GPFS_DMAPI_XATTR_TAPE_PREMIG] + ) {
🧹 Nitpick comments (10)
src/util/notifications_util.js (2)
100-100: Remove redundant await and async in the callback passed to PersistentLogger.process
No need to wrap _notify with an extra async/await. Let process await the returned promise directly.

```diff
-        await log.process(async (file, failure_append) => await this._notify(file, failure_append));
+        await log.process((file, failure_append) => this._notify(file, failure_append));
```
120-160: Avoid creating a temporary filtered log when only iterating entries
Using file.collect_and_process without ever calling the batch_recorder parameter forces a temporary PersistentLogger allocation per file (albeit quickly removed). Consider adding and using a lightweight iteration helper (e.g., LogFile.forEach) or exposing NewlineReader iteration on LogFile to avoid the extra logger allocations. If not feasible now, optional. A sketch of such a helper follows.
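Purely as an illustration of the suggestion above, a `forEach`-style helper on LogFile might look like the following; the internal `reader` field, the NewlineReader options, and the entry shape are assumptions about the current implementation, not its actual API.

```js
// Hypothetical LogFile.forEach, sketched with an assumed `reader` field that
// wraps NewlineReader; only meant to illustrate the suggestion above.
const { NewlineReader } = require('../util/file_reader'); // path assumed

class LogFileSketch {
    constructor(fs_context, log_path) {
        this.fs_context = fs_context;
        this.log_path = log_path;
        this.reader = null;
    }

    async init() {
        // assumed: opens the file, takes the EXCLUSIVE lock and builds the reader
        if (!this.reader) this.reader = new NewlineReader(this.fs_context, this.log_path, { lock: 'EXCLUSIVE' });
    }

    /** Iterate entries without allocating a temporary PersistentLogger. */
    async forEach(cb) {
        await this.init();
        this.reader.reset();                      // start from the beginning of the log
        await this.reader.forEachFilePathEntry(async entry => {
            await cb(entry.path);                 // hand only the path string to the caller
            return true;                          // keep iterating
        });
    }
}
```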
src/util/persistent_logger.js (2)
174-188: Consider closing the failure logger after processing inactive failures
Current code closes failure_log in finally, which is fine. Optionally, call await failure_log.close() after each phase to release handles earlier if contention arises. Not critical.
345-349: Reset before iteration is fine; ensure this stays the single point of reset
Given the explicit reset before reading, the extra reset in deinit can be dropped to reduce duplication. Keeping reset localized to iteration points improves clarity.
src/util/bucket_logs_utils.js (1)
39-39: Integrate failure recording to improve retry granularity (optional)
You're returning false on any error, which preserves the whole WAL file. If you pass through failure_recorder and use it for per-entry failures, you can delete the file while persisting only failed entries to a failure log, reducing reprocessing. This is optional and depends on your retry strategy.

For example, thread the recorder through:

```diff
-    return log.process(async file => _upload_to_targets(s3_connection, file, bucket_to_owner_keys_func));
+    return log.process(async (file, failure_recorder) =>
+        _upload_to_targets(s3_connection, file, bucket_to_owner_keys_func, failure_recorder)
+    );
```

And extend _upload_to_targets accordingly to append selectively on failures.
src/sdk/glacier.js (2)
91-112: Stage-to-stage copy is correct; underscore unused params (optional)
stage_migrate writes primary entries to the staged WAL namespace. failure_recorder isn't used here; prefixing it with _failure_recorder will satisfy linters.
223-229: Close the logs after processing to minimize handle lifetime (optional)
process_glacier_logs constructs a PersistentLogger and runs process but never closes the logger. Today that's fine (no append/init), but explicitly closing improves hygiene and future-proofs changes.

```diff
     const logs = new PersistentLogger(
         config.NSFS_GLACIER_LOGS_DIR,
         namespace,
         { locking: 'EXCLUSIVE' },
     );
-    await logs.process(async (entry, failure_recorder) => cb(fs_context, entry, failure_recorder));
+    try {
+        await logs.process(async (file, failure_recorder) => cb(fs_context, file, failure_recorder));
+    } finally {
+        await logs.close();
+    }
```

src/sdk/glacier_tapecloud.js (3)
121-130: Unify task-id parsing into a single, future-proof regex
Support both the new colon-delimited form and the legacy numeric form without a fallback branch. Also make the match case-insensitive.

Apply this diff:

```diff
-        // ID in latest version must look like 1005:1111:4444
-        let match = stdout.match(/task ID is (\d+:\d+:\d+)/);
-        if (!match || match.length !== 2) {
-            // Fallback to the older task ID extraction in case we fail to extract new one
-            match = stdout.match(/task ID is (\d+)/);
-            if (!match || match.length !== 2) {
-                throw error;
-            }
-        }
+        // ID can look like 1005:1111:4444 (new) or a plain number (old)
+        const match = stdout.match(/task ID is (\d+(?::\d+){2,}|\d+)/i);
+        if (!match) {
+            throw error;
+        }
```
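For reference, a quick standalone check of the proposed pattern; the sample stdout strings below are made up and only exercise the two formats mentioned in this comment.

```js
// Node one-off to sanity check the proposed task-ID regex.
const TASK_ID_RE = /task ID is (\d+(?::\d+){2,}|\d+)/i;

console.log('task ID is 1005:1111:4444'.match(TASK_ID_RE)?.[1]); // '1005:1111:4444'
console.log('Task ID is 42'.match(TASK_ID_RE)?.[1]);             // '42' (case-insensitive)
console.log('no id in this line'.match(TASK_ID_RE));             // null -> caller rethrows
```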
295-301: Typo in log message
Minor nit: "enty" -> "entry".

```diff
-            // Add the enty to the failure log - This could be wasteful as it might
+            // Add the entry to the failure log - This could be wasteful as it might
```
495-506: Nit: fix step numbering in the comment
Step 8 is missing (goes 7 -> 9). Keeping sequential numbering aids future readers.

```diff
-    // 9. NooBaa finalizes the restore
+    // 8. NooBaa finalizes the restore
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (10)
- config.js (1 hunks)
- src/manage_nsfs/manage_nsfs_glacier.js (1 hunks)
- src/sdk/glacier.js (7 hunks)
- src/sdk/glacier_tapecloud.js (7 hunks)
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (4 hunks)
- src/util/bucket_logs_utils.js (4 hunks)
- src/util/file_reader.js (1 hunks)
- src/util/native_fs_utils.js (3 hunks)
- src/util/notifications_util.js (3 hunks)
- src/util/persistent_logger.js (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- config.js
- src/util/native_fs_utils.js
- src/util/file_reader.js
- src/manage_nsfs/manage_nsfs_glacier.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*
⚙️ CodeRabbit Configuration File
src/test/**/*.*: Ensure that the PR includes tests for the changes.
Files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
🧠 Learnings (3)
📚 Learning: 2025-08-18T05:00:52.268Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:377-406
Timestamp: 2025-08-18T05:00:52.268Z
Learning: In TapeCloudGlacier.restore method, the method always returns true (rather than propagating _recall() result) because it uses a staging approach where the original log file should always be deleted after processing. Failed entries are captured by failure_recorder for retry through staged files, so the original log preservation is not needed.
Applied to files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
src/sdk/glacier.js
src/sdk/glacier_tapecloud.js
📚 Learning: 2025-08-05T10:05:51.959Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.959Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.
Applied to files:
src/sdk/glacier.js
src/sdk/glacier_tapecloud.js
📚 Learning: 2025-08-18T05:00:21.111Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:273-311
Timestamp: 2025-08-18T05:00:21.111Z
Learning: In TapeCloudGlacier.migrate method, ignoring the boolean return value from _migrate() and always proceeding to unstage entries is acceptable behavior - the migrate operation doesn't require checking the return value like other operations might.
Applied to files:
src/sdk/glacier_tapecloud.js
🧬 Code Graph Analysis (5)
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3)
  - src/manage_nsfs/manage_nsfs_glacier.js (6): backend (14-14), backend (29-29), backend (43-43), fs_context (13-13), fs_context (28-28), fs_context (42-42)
  - src/sdk/namespace_fs.js (2): s3_utils (15-15), entry (267-267)
  - src/sdk/glacier.js (1): s3_utils (6-6)
- src/util/bucket_logs_utils.js (2)
  - src/util/persistent_logger.js (1): require (8-8)
  - src/endpoint/s3/s3_bucket_logging.js (1): log (146-158)
- src/sdk/glacier.js (2)
  - src/manage_nsfs/manage_nsfs_glacier.js (8): path (4-4), nb_native (7-7), nb_native (72-72), config (6-6), native_fs_utils (9-9), fs_context (13-13), fs_context (28-28), fs_context (42-42)
  - src/util/persistent_logger.js (3): path (4-4), nb_native (5-5), native_fs_utils (6-6)
- src/sdk/glacier_tapecloud.js (3)
  - src/native/fs/fs_napi.cpp (2): stat (2029-2033), stat (2030-2030)
  - src/util/native_fs_utils.js (7): fs_context (549-554), nb_native (12-12), nb_native (670-670), stat (401-401), stat (449-449), stat (598-598), config (10-10)
  - src/sdk/glacier.js (2): nb_native (5-5), config (9-9)
- src/util/notifications_util.js (3)
  - src/sdk/namespace_fs.js (2): require (26-26), require (29-29)
  - src/server/bg_services/lifecycle.js (1): require (13-14)
  - src/manage_nsfs/manage_nsfs_glacier.js (1): log (93-93)
- GitHub Check: ceph-s3-tests / Ceph S3 Tests
🔇 Additional comments (7)
src/util/notifications_util.js (1)
115-120: API migration to LogFile looks good
JSDoc and signature updates correctly align _notify to accept a LogFile and a failure recorder, matching the new PersistentLogger.process contract. Implementation uses the provided file instance without re-locking.
src/util/persistent_logger.js (1)
119-147: Good: switching _process to LogFile with explicit lifecycle and lock ownership
Creating LogFile per inactive WAL, acquiring EX lock in init(), and deinit() in finally ensures safe processing and deletion semantics. The delete-or-keep contract is clear.
src/util/bucket_logs_utils.js (1)
59-105: LGTM: direct use of LogFile.collect_and_process
Using the LogFile instance directly simplifies the flow and aligns with the new PersistentLogger API. Logging with log_file.log_path is consistent.
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (1)
89-98: Centralized backend patching is clean
get_patched_backend() consolidates test doubles for migrate/recall/expiry and keeps tests focused. Good refactor.
src/sdk/glacier.js (2)
253-269: Two-phase orchestration with cluster and per-op locks looks good
Staging under GLACIER_CLUSTER_LOCK and executing under per-op locks (migrate/restore) is a solid approach to improve parallelism while preserving consistency. Nice.
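To illustrate the ordering described here, the toy sketch below serializes phase 1 of every operation on a shared cluster lock and phase 2 on a per-operation lock, so MIGRATION and RESTORE phase-2 work can overlap. The in-memory `with_lock` helper is only a stand-in for the real file-based cluster locks, and the per-operation lock names are invented for the example.

```js
// Toy stand-in for the real cluster locks; serializes callers per lock name.
const locks = new Map();
function with_lock(name, fn) {
    const prev = locks.get(name) || Promise.resolve();
    const next = prev.then(fn, fn);          // run after the previous holder, success or failure
    locks.set(name, next.catch(() => {}));   // keep the chain alive even on errors
    return next;
}

// Phase 1 (staging) is guarded by the shared cluster lock; phase 2 (the actual
// migrate/restore work) only takes its own per-operation lock.
async function perform(op, stage_fn, process_fn) {
    await with_lock('GLACIER_CLUSTER_LOCK', stage_fn);
    await with_lock(`GLACIER_${op}_LOCK`, process_fn);
}

// MIGRATION and RESTORE no longer serialize against each other in phase 2.
Promise.all([
    perform('MIGRATE', async () => console.log('stage migrate'), async () => console.log('run migrate')),
    perform('RESTORE', async () => console.log('stage restore'), async () => console.log('run restore')),
]);
```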
133-153: Fix JSDoc: source namespace for stage_restore
Documentation says stage_restore takes a log file from Glacier.RESTORE_STAGE_WAL_NAME, but stage_restore is responsible for copying from the primary RESTORE WAL to the staged log. Update the doc to say it takes a file from Glacier.RESTORE_WAL_NAME.

```diff
- * stage_restore must take a log file (from `Glacier.RESTORE_STAGE_WAL_NAME`)
+ * stage_restore must take a log file (from `Glacier.RESTORE_WAL_NAME`)
```

⛔ Skipped due to learnings
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:377-406
Timestamp: 2025-08-18T05:00:52.268Z
Learning: In TapeCloudGlacier.restore method, the method always returns true (rather than propagating _recall() result) because it uses a staging approach where the original log file should always be deleted after processing. Failed entries are captured by failure_recorder for retry through staged files, so the original log preservation is not needed.

Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.959Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.
src/sdk/glacier_tapecloud.js (1)
319-370: Stage/restore screening and requeueing look solid.
- Requests the needed xattrs up-front.
- Correctly requeues entries staged for migration to avoid restore/migrate contention.
- Handles ENOENT gracefully and closes handles properly.
```diff
         mocha.it('upload to GLACIER should work', async function() {
             const data = crypto.randomBytes(100);
             const upload_res = await glacier_ns.upload_object({
                 bucket: upload_bkt,
                 key: upload_key,
                 storage_class: s3_utils.STORAGE_CLASS_GLACIER,
                 xattr,
                 source_stream: buffer_utils.buffer_to_read_stream(data)
             }, dummy_object_sdk);

             console.log('upload_object response', inspect(upload_res));

             // Check if the log contains the entry
             let found = false;
             await NamespaceFS.migrate_wal._process(async file => {
                 const fs_context = glacier_ns.prepare_fs_context(dummy_object_sdk);
-                const reader = new NewlineReader(fs_context, file, { lock: 'EXCLUSIVE' });
-
-                await reader.forEachFilePathEntry(async entry => {
-                    if (entry.path.endsWith(`${upload_key}`)) {
-                        found = true;
-
-                        // Not only should the file exist, it should be ready for
-                        // migration as well
-                        assert(backend.should_migrate(fs_context, entry.path));
-                    }
-
-                    return true;
-                });
+                await file.collect_and_process(async entry => {
+                    if (entry.endsWith(`${upload_key}`)) {
+                        found = true;
+
+                        // Not only should the file exist, it should be ready for
+                        // migration as well
+                        assert(backend.should_migrate(fs_context, entry));
+                    }
+                });

                 // Don't delete the file
                 return false;
             });

             assert(found);
         });
```
Await the async should_migrate() before asserting
should_migrate returns a Promise. Asserting the Promise object will always pass. Await it before asserting the boolean result.
- assert(backend.should_migrate(fs_context, entry));
+ assert(await backend.should_migrate(fs_context, entry));
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```js
        mocha.it('upload to GLACIER should work', async function() {
            const data = crypto.randomBytes(100);
            const upload_res = await glacier_ns.upload_object({
                bucket: upload_bkt,
                key: upload_key,
                storage_class: s3_utils.STORAGE_CLASS_GLACIER,
                xattr,
                source_stream: buffer_utils.buffer_to_read_stream(data)
            }, dummy_object_sdk);

            console.log('upload_object response', inspect(upload_res));

            // Check if the log contains the entry
            let found = false;
            await NamespaceFS.migrate_wal._process(async file => {
                const fs_context = glacier_ns.prepare_fs_context(dummy_object_sdk);
                await file.collect_and_process(async entry => {
                    if (entry.endsWith(`${upload_key}`)) {
                        found = true;

                        // Not only should the file exist, it should be ready for
                        // migration as well
                        assert(await backend.should_migrate(fs_context, entry));
                    }
                });

                // Don't delete the file
                return false;
            });

            assert(found);
        });
```
🤖 Prompt for AI Agents
In src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js around lines 176 to
207, the test calls backend.should_migrate(fs_context, entry) which returns a
Promise<boolean> but the code asserts the Promise object directly; update the
test to await the result (e.g. const migratable = await
backend.should_migrate(fs_context, entry)) and then assert the boolean
(assert(migratable)); ensure the surrounding callback is async so the await
works and no Promise is accidentally asserted.
```js
        await native_fs_utils._create_path(this.dir, this.fs_context);
        this.fh = await native_fs_utils.open_with_lock(
            this.fs_context,
            this.active_path,
            'as+',
            undefined,
            { lock: this.locking, retries: 10, backoff: 5 }
        );
```
🛠️ Refactor suggestion
Preserve active file change detection by setting fh_stat on init
The poller (_poll_active_file_change) relies on this.fh_stat to detect inode changes, but init no longer sets it. Without it, active file change detection is effectively disabled. Set fh_stat after opening to restore the behavior.
this.fh = await native_fs_utils.open_with_lock(
this.fs_context,
this.active_path,
'as+',
undefined,
{ lock: this.locking, retries: 10, backoff: 5 }
);
-
- return this.fh;
+ // Record current inode to allow _poll_active_file_change to detect rotations
+ try {
+ this.fh_stat = await nb_native().fs.stat(this.fs_context, this.active_path);
+ } catch (e) {
+ // If the file vanishes between open and stat we will re-initialize on next append
+ dbg.warn('failed to stat active log after open:', this.active_path, e);
+ }
+ return this.fh;
Also applies to: 63-63
🤖 Prompt for AI Agents
In src/util/persistent_logger.js around lines 54-61 (and also line 63), init
opens the active file but no longer sets this.fh_stat, which disables the
poller’s inode-change detection; after successfully opening the file handle
assign this.fh_stat to the current file stat (e.g., obtain the inode/mtime via
the file handle’s stat or fs.fstat on the descriptor) so the
_poll_active_file_change routine has a baseline to compare against; apply the
same assignment in the alternate/branch at line 63 so both init paths restore
active file change detection.
Describe the Problem
Currently NooBaa allows running only one glacier operation at a time: either a migration or a restore can be running. This was done to avoid several kinds of race cases. It doesn't create any "correctness" problem, but it means the underlying tape drives are not fully utilized.

Another problem this PR tries to address is that `PersistentLogger.process` would delete the file after the callback returned. This can potentially create a race because the file gets "moved" outside of the lock.

Explain the Changes
This PR addresses the parallel processing of migration and recall by splitting the processing into 2 different stages: a staging stage that drains the primary WALs, and an execution stage that performs the actual migrate/recall on the staged entries. It also makes it possible to use `Promise.all([this._migrate...])` to run multiple migrations concurrently within the same process, which could potentially improve the drive utilization further.

To address the race in the delete scenario, this PR acquires a lock on the file before handing it over to the callback and releases the lock once the callback returns. This has some implications:
- The callback should not `deinit` the LogFile, as that would close the file, leading to losing the lock.
- The callbacks are now handed a `LogFile` object.
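For illustration only, a minimal sketch of the "hold the lock across the callback, delete under the lock" behavior described above; the require paths and the exact fs helper behavior are assumptions, and the actual logic lives in PersistentLogger/LogFile.

```js
const nb_native = require('../util/nb_native');             // paths assumed for the sketch
const native_fs_utils = require('../util/native_fs_utils');

async function process_wal(fs_context, wal_path, callback) {
    // Open and lock the inactive WAL; the lock stays held for the whole callback
    const fh = await native_fs_utils.open_with_lock(fs_context, wal_path, 'r', undefined, { lock: 'EXCLUSIVE' });
    if (!fh) return; // assumed: lock not acquired or the file is already gone

    try {
        const should_delete = await callback(wal_path, fh);
        if (should_delete) {
            // Unlink while still holding the lock so no other process can grab
            // the file between the callback returning and the delete
            await nb_native().fs.unlink(fs_context, wal_path);
        }
    } finally {
        await fh.close(fs_context); // closing releases the lock
    }
}
```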
Testing Instructions:
./node_modules/.bin/mocha src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests