[NSFS | GLACIER] Support newline and backslash character in Glacier logs #9142
base: master
Conversation
Walkthrough
Adds log entry encoding/decoding support across Glacier components. Introduces encode_log/decode_log APIs in Glacier, implements the actual escaping with a delimiter in TapeCloudGlacier, applies encoding when appending WAL entries in NamespaceFS, and updates unit tests to cover special-character keys and encoded WAL behavior.
Sequence Diagram(s)
sequenceDiagram
participant NSFS as NamespaceFS
participant G as Glacier (backend API)
participant TCG as TapeCloudGlacier
participant WAL as migrate_wal
participant FR as FailureRecorder
NSFS->>G: getBackend()
NSFS->>G: encode_log(entry)
G-->>NSFS: encoded_entry
NSFS->>WAL: append(encoded_entry)
TCG->>TCG: migrate()
TCG->>FR: onFailure(encode_log(entry))
FR-->>TCG: recorded
sequenceDiagram
participant NSFS as NamespaceFS
participant G as Glacier (backend API)
participant TCG as TapeCloudGlacier
participant WAL as restore_wal
participant BR as BatchRecorder
participant FR as FailureRecorder
NSFS->>G: encode_log(entry)
NSFS->>WAL: append(encoded_entry)
TCG->>WAL: read()
WAL-->>TCG: encoded_entry
TCG->>TCG: decode_log(entry)
TCG->>BR: process(decoded)
TCG->>FR: onFailure(encode_log(entry))
Actionable comments posted: 2
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- src/sdk/glacier_tapecloud.js (2 hunks)
- src/sdk/namespace_fs.js (1 hunks)
- src/util/file_reader.js (4 hunks)
- src/util/persistent_logger.js (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-package-lock-validation
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (9)
src/sdk/glacier_tapecloud.js (1)
196-196: Delimiter consistency confirmed
The '\n-- ' delimiter added in src/sdk/glacier_tapecloud.js (lines 196 & 246) exactly matches the `-- ` prefix used when appending entries in src/sdk/namespace_fs.js (lines 3690–3696: migrate_wal.append(`-- ${entry}`) and restore_wal.append(`-- ${entry}`)). No further changes needed.

src/util/persistent_logger.js (3)
9-9: Import statement correctly updated.
The addition of LineReader to the imports is necessary for the new delimiter functionality.

263-269: Constructor properly extended with optional delimiter parameter.
The optional delim parameter is correctly added with appropriate JSDoc documentation, maintaining backward compatibility for existing code.
293-305: Conditional reader instantiation correctly implemented.
The logic properly uses LineReader with the custom delimiter when provided, falling back to NewlineReader for backward compatibility. The options (lock, skip_overflow_lines, skip_leftover_line) are consistently applied to both readers.

src/util/file_reader.js (5)
22-26: Delimiter validation is correctly implemented.
The type check ensures the delimiter is either a string or a number, providing clear error messaging for invalid types.

29-30: Delimiter and encoding properties correctly initialized.
The delimiter is stored as provided, and encoding defaults to 'utf8', which is appropriate for text log files.
64-80: Delimiter handling in line extraction is correctly implemented.
The code properly:
- Calculates the delimiter length for both string and numeric types
- Uses the encoding parameter when searching for string delimiters
- Advances the buffer position by the correct delimiter length
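The three behaviors above can be sketched roughly as follows. This is illustrative only — the function and variable names are assumptions for this sketch, not the actual file_reader.js code:

```javascript
// Rough sketch of delimiter-aware record extraction. A delimiter may be a
// string (e.g. '\n-- ') or a numeric character code (e.g. 10 for '\n').
function extract_records(buffer, delim, encoding = 'utf8') {
    // Delimiter length: a numeric char code occupies one byte in this sketch;
    // a string delimiter's byte length depends on the encoding.
    const delim_len = typeof delim === 'number' ? 1 : Buffer.byteLength(delim, encoding);
    const records = [];
    let start = 0;
    for (;;) {
        // Use the encoding parameter only when searching for string delimiters.
        const idx = typeof delim === 'number' ?
            buffer.indexOf(delim, start) :
            buffer.indexOf(delim, start, encoding);
        if (idx === -1) break;
        records.push(buffer.slice(start, idx).toString(encoding));
        start = idx + delim_len; // advance past the full delimiter
    }
    // Bytes after the last delimiter are the leftover (possibly partial) record.
    return { records, leftover: buffer.slice(start) };
}
```

Note how with a compound delimiter like '\n-- ', a record at offset 0 that lacks the leading '\n' is never split out — which is exactly the boundary concern the review comments below raise.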
201-234: NewlineReader subclass maintains backward compatibility.
The implementation correctly:
- Extends LineReader with the newline character code (10) as the delimiter
- Preserves the original forEachFilePathEntry method
- Maintains the same constructor interface for existing code

236-238: Module exports correctly updated.
The new LineReader class is properly exported alongside existing exports, maintaining backward compatibility.
src/sdk/namespace_fs.js (outdated)

  async append_to_migrate_wal(entry) {
      if (!config.NSFS_GLACIER_LOGS_ENABLED) return;

-     await NamespaceFS.migrate_wal.append(entry);
+     await NamespaceFS.migrate_wal.append(`-- ${entry}`);
  }
Delimiter is incomplete – missing the leading \n; log parsing will break
PersistentLogger elsewhere was updated to split on '\n-- '. Appending `-- ${entry}` omits that mandatory preceding newline, so the very first record after a file-open (and every subsequent one if the previous append did not end with \n) will not be recognised by the reader. Glacier WAL re-play and compaction will mis-detect boundaries.
- await NamespaceFS.migrate_wal.append(`-- ${entry}`);
+ // Prepend the required newline so each record begins with "\n-- "
+ await NamespaceFS.migrate_wal.append(`\n-- ${entry}`);
(Consider factoring the delimiter into a constant to avoid future drift.)
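To illustrate the constant-factoring suggestion, a minimal sketch of one shared delimiter used by both the appenders and the reader — the constant and helper names here are assumptions, not actual noobaa identifiers:

```javascript
// Hedged sketch: one shared delimiter constant keeps the WAL appenders and
// the WAL reader from drifting apart.
const GLACIER_LOG_DELIM = '\n-- ';

// Frame an entry so every record, including the first one after file-open,
// begins with the full delimiter the reader splits on.
function frame_wal_entry(entry) {
    return `${GLACIER_LOG_DELIM}${entry}`;
}

// Split raw WAL data back into entries; drop the empty leading chunk that
// the delimiter at offset 0 produces.
function split_wal(data) {
    return data.split(GLACIER_LOG_DELIM).filter(chunk => chunk.length > 0);
}
```

An entry that itself contains the delimiter string would still split incorrectly, which is why the PR pairs the delimiter with encode_log/decode_log escaping.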
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

  async append_to_migrate_wal(entry) {
      if (!config.NSFS_GLACIER_LOGS_ENABLED) return;
      // Prepend the required newline so each record begins with "\n-- "
      await NamespaceFS.migrate_wal.append(`\n-- ${entry}`);
  }
🤖 Prompt for AI Agents
In src/sdk/namespace_fs.js around lines 3687 to 3691, the append_to_migrate_wal
method appends entries with the delimiter "-- " but misses the required leading
newline character '\n', causing log parsing issues. Fix this by prepending '\n'
to the delimiter so the append call uses '\n-- ' before the entry. Also, define
this delimiter as a constant to ensure consistency and prevent future errors.
src/sdk/namespace_fs.js
Outdated
async append_to_restore_wal(entry) { | ||
if (!config.NSFS_GLACIER_LOGS_ENABLED) return; | ||
|
||
await NamespaceFS.restore_wal.append(entry); | ||
await NamespaceFS.restore_wal.append(`-- ${entry}`); | ||
} |
Same delimiter bug as above in restore WAL
The restore WAL must use the same delimiter; otherwise it will suffer the identical parsing failure.
- await NamespaceFS.restore_wal.append(`-- ${entry}`);
+ await NamespaceFS.restore_wal.append(`\n-- ${entry}`);
Please fix both occurrences and, ideally, reuse a shared constant for the delimiter.
🤖 Prompt for AI Agents
In src/sdk/namespace_fs.js around lines 3693 to 3697, the append_to_restore_wal
method uses a delimiter for entries that is inconsistent with the one used
elsewhere, causing parsing failures. Fix this by defining a shared constant for
the delimiter at a common scope and replace the hardcoded delimiter in this
method and the other occurrence with that constant to ensure consistency.
@tangledbytes a few comments: What if the filename contains this new compound delimiter? I feel like this solution is a bit arbitrary, and I might consider using encodeURI on the paths when writing and then decodeURI when reading them:
Another option is to use null character '\0' as a line terminator, which is very stable, though it makes the file a bit less readable. |
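For illustration, the encodeURI alternative mentioned above would round-trip newlines and backslashes as shown below. This is a sketch of the suggestion only, not what the PR implements, and the helper names are made up for the example:

```javascript
// Sketch of the encodeURI alternative: percent-escape the path before
// logging, decode after reading. '\n' becomes '%0A' and '\' becomes '%5C',
// so neither can collide with a newline-based line terminator.
function encode_log_entry(path) {
    return encodeURI(path);
}

function decode_log_entry(line) {
    return decodeURI(line);
}
```

One caveat: decodeURI throws on malformed percent-sequences, so a reader using this scheme would need to handle corrupt log lines explicitly.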
@guymguym, just want to mention that I am still working on this PR, it is not clean yet. Regarding the solution, I agree it is a little strange; however, it was chosen not only because our reader suffers from the newline delimiter problem but because eeadm has the same problem as well. It cannot handle files with '\n' in path names. If we do want it to support that, we need to use …
Force-pushed from 3bce1bd to e29cfff (compare)
Signed-off-by: Utkarsh Srivastava <[email protected]>
Force-pushed from e29cfff to 950f8a2 (compare)
Actionable comments posted: 0
🧹 Nitpick comments (7)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (7)
169-171: Add backslash coverage to match PR objective (newline + backslash).
The tests introduce newline-containing keys, but there's no backslash-focused coverage. Since the PR target explicitly includes backslash support in Glacier logs, please add representative backslash cases too (including a literal backslash and a "\n" sequence that should not be interpreted as a newline).
Apply this diff to add backslash variants alongside the newline keys:
  const restore_key_spcl_char_1 = 'restore_key_2_\n';
  const restore_key_spcl_char_2 = 'restore_key_2_\n_2';
+ // Backslash-focused keys:
+ // - restore_key_spcl_char_bs_1: contains a literal backslash
+ // - restore_key_spcl_char_bs_2: contains the two characters '\' + 'n' (not a newline)
+ const restore_key_spcl_char_bs_1 = 'restore_key_3_\\_1';
+ const restore_key_spcl_char_bs_2 = 'restore_key_3_\\n_mix';
285-291: Encode before recording WAL entries: good. Clarify the comment to avoid confusion.
Using failure_backend.encode_log(...) before recording is correct and aligns with the new encode/decode flow. The "unintentionally" wording in the comments can be misleading since duplication here is part of how the test simulates the system's behavior. Consider clarifying the intent.

- // This unintentionally also replicates duplicate entries in WAL
+ // This intentionally replicates duplicate entries in WAL for test coverage
  await failure_recorder(failure_backend.encode_log(failed_file_path));
- // This unintentionally also replicates duplicate entries in WAL
+ // This intentionally replicates duplicate entries in WAL for test coverage
  await success_recorder(failure_backend.encode_log(success_file_path));
329-371: Extend the special-characters test to include backslash cases.
To fully validate the new encoding/decoding semantics, include the two backslash keys in the all_params list so both newline and backslash are exercised end-to-end.

  const all_params = [
      {
          bucket: upload_bkt,
          key: restore_key_spcl_char_1,
          storage_class: s3_utils.STORAGE_CLASS_GLACIER,
          xattr,
          days: 1,
          source_stream: buffer_utils.buffer_to_read_stream(data)
      },
      {
          bucket: upload_bkt,
          key: restore_key_spcl_char_2,
          storage_class: s3_utils.STORAGE_CLASS_GLACIER,
          xattr,
          days: 1,
          source_stream: buffer_utils.buffer_to_read_stream(data)
-     }
+     },
+     {
+         bucket: upload_bkt,
+         key: restore_key_spcl_char_bs_1,
+         storage_class: s3_utils.STORAGE_CLASS_GLACIER,
+         xattr,
+         days: 1,
+         source_stream: buffer_utils.buffer_to_read_stream(data)
+     },
+     {
+         bucket: upload_bkt,
+         key: restore_key_spcl_char_bs_2,
+         storage_class: s3_utils.STORAGE_CLASS_GLACIER,
+         xattr,
+         days: 1,
+         source_stream: buffer_utils.buffer_to_read_stream(data)
+     }
  ];
353-354: Keep tests quiet; gate debug logs behind an env var.
Unconditional console logging in unit tests adds noise and can slow CI. Gate logs behind a debug flag.

- console.log('upload_object response', inspect(upload_res));
+ if (process.env.DEBUG_NSFS_GLACIER_TESTS) {
+     console.log('upload_object response', inspect(upload_res));
+ }
413-418: Same nit: gate debug logs to reduce noise in CI.

  const upload_res_1 = await glacier_ns.upload_object(failed_params, dummy_object_sdk);
- console.log('upload_object response', inspect(upload_res_1));
+ if (process.env.DEBUG_NSFS_GLACIER_TESTS) {
+     console.log('upload_object response', inspect(upload_res_1));
+ }

  const upload_res_2 = await glacier_ns.upload_object(success_params, dummy_object_sdk);
- console.log('upload_object response', inspect(upload_res_2));
+ if (process.env.DEBUG_NSFS_GLACIER_TESTS) {
+     console.log('upload_object response', inspect(upload_res_2));
+ }
425-429: Reuse the prepared fs_context.
You compute fs_context but don't use it in perform. Minor cleanup that avoids re-preparing the context.

- await failure_backend.perform(glacier_ns.prepare_fs_context(dummy_object_sdk), "RESTORE");
+ await failure_backend.perform(fs_context, "RESTORE");
329-371: Optional: Add a quick encode/decode roundtrip assertion for edge strings.
A focused unit assertion will catch regressions in escaping rules independent of the broader flow tests.

Add this test (outside the shown ranges) under the same describe('nsfs_glacier_tapecloud', ...):

mocha.it('encode_log/decode_log should be a lossless roundtrip for edge strings', function() {
    const backend = new TapeCloudGlacier();
    const samples = [
        'plain',
        'contains newline\ninside',
        'ends with newline\n',
        '\\',                 // single backslash
        '\\n',                // backslash + n (not newline)
        'mix\\nline\nend\\',  // combo + trailing backslash
    ];
    samples.forEach(s => {
        const enc = backend.encode_log(s);
        const dec = backend.decode_log(enc);
        assert.strictEqual(dec, s, `Roundtrip failed for sample: ${JSON.stringify(s)}`);
    });
});

Also applies to: 373-447
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- src/sdk/glacier.js (1 hunks)
- src/sdk/glacier_tapecloud.js (11 hunks)
- src/sdk/namespace_fs.js (1 hunks)
- src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/sdk/namespace_fs.js
- src/sdk/glacier.js
- src/sdk/glacier_tapecloud.js
🧰 Additional context used
📓 Path-based instructions (1)
src/test/**/*.*
⚙️ CodeRabbit Configuration File
src/test/**/*.*
: Ensure that the PR includes tests for the changes.
Files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
🧠 Learnings (2)
📚 Learning: 2025-08-05T10:05:51.959Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:507-509
Timestamp: 2025-08-05T10:05:51.959Z
Learning: In TapeCloudGlacier._finalize_restore method, when failure_recorder is not provided but the DMAPI premig check fails, the code intentionally throws an error to cause the entire batch to fail and preserve the log file for retry, preventing files from being silently stuck in an unrestored state.
Applied to files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
📚 Learning: 2025-08-18T05:00:52.268Z
Learnt from: tangledbytes
PR: noobaa/noobaa-core#9183
File: src/sdk/glacier_tapecloud.js:377-406
Timestamp: 2025-08-18T05:00:52.268Z
Learning: In TapeCloudGlacier.restore method, the method always returns true (rather than propagating _recall() result) because it uses a staging approach where the original log file should always be deleted after processing. Failed entries are captured by failure_recorder for retry through staged files, so the original log preservation is not needed.
Applied to files:
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ceph-s3-tests / Ceph S3 Tests
🔇 Additional comments (1)
src/test/unit_tests/nsfs/test_nsfs_glacier_backend.js (1)
373-447: Good end-to-end failure-path coverage with special characters.
The test validates both success and failure flows with newline-containing keys, including xattr checks. This meaningfully exercises the new encode/decode path via WAL.
Describe the Problem
NooBaa Glacier cannot work with object keys which have newline characters in their name.
Explain the Changes
This PR adds/changes the following:
- Adds encode_log and decode_log APIs to the Glacier backend. The idea is to be able to encode a log entry before writing it to disk and decode it after reading it from disk. This allows encoding special characters like \n.
Issues: Fixed #xxx / Gap #xxx
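One way such an encode/decode pair can work is a backslash-escape scheme. The sketch below is an assumption for illustration — the PR's actual escaping lives in TapeCloudGlacier and may differ in detail:

```javascript
// Illustrative backslash-escape scheme (an assumption, not necessarily the
// PR's exact code): escape backslashes first, then newlines, so every '\n'
// in the encoded form unambiguously came from a real newline.
function encode_log(entry) {
    return entry.replace(/\\/g, '\\\\').replace(/\n/g, '\\n');
}

function decode_log(entry) {
    // Decode in a single left-to-right pass; two sequential replaces would
    // mis-handle inputs like a literal backslash followed by the letter 'n'.
    return entry.replace(/\\(\\|n)/g, (_m, c) => (c === 'n' ? '\n' : '\\'));
}
```

The encoded form never contains a raw newline, so each log record stays on one physical line and a newline-delimited reader can split records safely.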
Testing Instructions:
./node_modules/.bin/mocha src/test/unit_tests/test_nsfs_glacier_backend.js