feat: chunkserver side chunk lock #743

Merged: dmga44 merged 3 commits into dev from feat-cs-size-chunk-lock, Feb 18, 2026

@dmga44 dmga44 commented Feb 11, 2026

The current implementation of chunk locking relies only on the client responses sent when write and truncate operations finish. This commit additionally uses chunkserver-side replies, sent after the chunkserver receives an actual stream of write operations, to decide whether the chunk is locked. In the current state of the code this is not yet useful and must not change the behavior of the system, because the client responses that make the master unlock the chunk arrive only after the data is already on the chunkserver or an error has occurred.

The idea of the change is to track, for each chunk part, whether it is currently expected to be written. Please note that this affects all writes coming from the client and some truncates. Those cases trigger one of these functions:

  • chunk_create_operation
  • chunk_increase_version_operation
  • chunk_lock_operation
  • chunk_duplicate_operation

which can be traced back to the chunk_multi_modify function. After some conditions are checked (especially whether the CS version supports this feature), the new packets are sent to the CS. In most cases the write operation has to wait for the CS responses, and the chunk parts are considered as being written.
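
The per-part tracking described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/master/chunks.cc: the names `beingWritten` and `isLocked` appear in the review summaries of this PR, while `PartSketch`, `ChunkSketch`, the `lockedTo` expiry field and the exact locking rule are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one chunk part kept by the master.
struct PartSketch {
	uint32_t chunkserverVersion;  // version of the chunkserver holding the part
	bool beingWritten;            // true while a chunkserver-side write is expected
};

// Hypothetical, simplified stand-in for the master's chunk record.
struct ChunkSketch {
	uint32_t lockedTo = 0;  // assumed client-side lock expiry (seconds), 0 = none
	std::vector<PartSketch> parts;

	// Mark every part whose chunkserver is new enough to report write status.
	void markPartsBeingWritten(uint32_t firstSupportingVersion) {
		for (PartSketch &part : parts) {
			if (part.chunkserverVersion >= firstSupportingVersion) {
				part.beingWritten = true;
			}
		}
	}

	// Clear the flag for one part, e.g. after its write-end status arrives.
	void unmarkBeingWritten(std::size_t index) {
		assert(index < parts.size());
		parts[index].beingWritten = false;
	}

	// Locked while the client-side lock is valid or any part is being written.
	bool isLocked(uint32_t now) const {
		if (now < lockedTo) {
			return true;
		}
		for (const PartSketch &part : parts) {
			if (part.beingWritten) {
				return true;
			}
		}
		return false;
	}
};
```

In this sketch, parts on older chunkservers are simply never marked, so for them only the client-side lock applies.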

On the CS side, the locking is handled by the master's main JobPool. It can start, enforce, end and erase chunk lock jobs. Locked chunks are special in that, once the write operations on them end, the master receives the status of the write operations that was not reported to the clients. So far this status is always OK, because everything is reported to the clients. Master jobs on chunks with an enforced lock have to wait until the chunk is released, i.e. until the write finishes.
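
One way to picture the start / enforce / end lifecycle and the waiting master jobs is the sketch below. The method names are borrowed from the review summaries in this PR, but the class `ChunkLockRegistrySketch`, the record layout and the exact semantics (for example, that only enforced locks defer jobs) are assumptions made for illustration; erasing a lock is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

using JobSketch = std::function<void()>;

// Hypothetical per-chunk record, loosely inspired by the LockedChunkData idea.
struct LockRecordSketch {
	bool enforced = false;               // set once a write stream has arrived
	std::vector<JobSketch> pendingJobs;  // jobs deferred until the write ends
};

class ChunkLockRegistrySketch {
public:
	// The master announced that a write is expected on this chunk.
	void startChunkLock(uint64_t chunkId) {
		std::lock_guard<std::mutex> guard(mutex_);
		locks_[chunkId];  // creates a not-yet-enforced record if missing
	}

	// A write stream arrived; returns false if the chunk was never locked.
	bool enforceChunkLock(uint64_t chunkId) {
		std::lock_guard<std::mutex> guard(mutex_);
		auto it = locks_.find(chunkId);
		if (it == locks_.end()) {
			return false;
		}
		it->second.enforced = true;
		return true;
	}

	// Runs the job now, or defers it while the lock is enforced.
	// Returns true if the job ran immediately.
	bool addJobIfNotLocked(uint64_t chunkId, JobSketch job) {
		assert(job);  // an empty std::function would throw when called
		{
			std::lock_guard<std::mutex> guard(mutex_);
			auto it = locks_.find(chunkId);
			if (it != locks_.end() && it->second.enforced) {
				it->second.pendingJobs.push_back(std::move(job));
				return false;
			}
		}
		job();  // run outside the mutex so jobs may call back into the registry
		return true;
	}

	// The write ended: release the lock and run deferred jobs in arrival order.
	// Returns the number of deferred jobs that were run.
	std::size_t endChunkLock(uint64_t chunkId) {
		std::vector<JobSketch> pending;
		{
			std::lock_guard<std::mutex> guard(mutex_);
			auto it = locks_.find(chunkId);
			if (it == locks_.end()) {
				return 0;
			}
			pending = std::move(it->second.pendingJobs);
			locks_.erase(it);
		}
		for (JobSketch &job : pending) {
			job();
		}
		return pending.size();
	}

private:
	std::mutex mutex_;
	std::map<uint64_t, LockRecordSketch> locks_;
};
```

Running the deferred jobs after the mutex is released is a design choice of this sketch, made so that a job can itself touch the registry without deadlocking.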

Back on the master side, disconnections of parts being written and errors on those writes are handled the same way: by increasing the version when the chance appears.
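
The "same way" handling can be illustrated with a small sketch in which both events go through one shared path. The field name `needVersionIncrease` is taken from the review summaries; the types, the function names and the idea of a single boolean flag consumed later are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class WriteEndStatusSketch { Ok, Error };

// Hypothetical, simplified part and chunk state on the master.
struct PartStateSketch {
	bool beingWritten;
};

struct ChunkStateSketch {
	std::vector<PartStateSketch> parts;
	bool needVersionIncrease = false;  // acted on later, "when the chance appears"
};

// Shared path: stop tracking the write and remember a failure, if any.
void stopTrackingWrite(ChunkStateSketch &chunk, std::size_t index, bool failed) {
	assert(index < chunk.parts.size());
	PartStateSketch &part = chunk.parts[index];
	if (!part.beingWritten) {
		return;  // nothing was in flight on this part
	}
	part.beingWritten = false;
	if (failed) {
		chunk.needVersionIncrease = true;
	}
}

// A part that disconnects mid-write is treated like a failed write.
void onPartDisconnected(ChunkStateSketch &chunk, std::size_t index) {
	stopTrackingWrite(chunk, index, true);
}

// A write-end status from the chunkserver: only a non-OK status counts as a failure.
void onWriteEndStatus(ChunkStateSketch &chunk, std::size_t index, WriteEndStatusSketch status) {
	stopTrackingWrite(chunk, index, status != WriteEndStatusSketch::Ok);
}
```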

Side changes:

  • refactor the chunk_multi_modify and chunk_multi_truncate functions.
  • reduce the effective grace time used at startup when trying to create
    a chunk before responding "no space", from 600s to 60s.
  • add documentation to some members of the Chunk class deeply involved
    in the change.
  • remove usedummylockid from the chunk_multi_modify and writeChunk
    functions.
  • on the client side, always talk to the chunkservers when writing.
  • update the main SaunaFS package version to 5.8.0.
  • improve the cases that return SAUNAFS_ERROR_NOSPACE on the master
    side.

The new behavior described above can now be enabled and disabled
via the option USE_CHUNKSERVER_SIDE_CHUNK_LOCK. This option is
reloadable. The decision is made at the moment the specific packet
type is sent to the chunkservers.
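
A rough sketch of deciding at send time, assuming the option is held in a flag that a configuration reload can flip. The option name and the idea of a version threshold come from this PR; the threshold value, the global flag, the enum and the function `chooseCreatePacket` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative threshold only; not the real value of
// kFirstVersionWithChunkserverSideChunkLock.
constexpr uint32_t kFirstSupportingVersionSketch = 100;

// Stand-in for the reloadable USE_CHUNKSERVER_SIDE_CHUNK_LOCK option.
// Disabled by default, as the PR describes for the master.
std::atomic<bool> gUseChunkserverSideChunkLock{false};

enum class CreatePacketSketch { PlainCreate, CreateAndLock };

struct SendDecisionSketch {
	CreatePacketSketch packet;
	bool sentChunkLock;  // tells the caller whether to mark the part as being written
};

// The option and the chunkserver version are both read when the packet is
// chosen, so a reload affects only packets sent afterwards.
SendDecisionSketch chooseCreatePacket(uint32_t chunkserverVersion) {
	const bool useLock = gUseChunkserverSideChunkLock.load() &&
	                     chunkserverVersion >= kFirstSupportingVersionSketch;
	return {useLock ? CreatePacketSketch::CreateAndLock
	                : CreatePacketSketch::PlainCreate,
	        useLock};
}
```

Because nothing is cached per chunk in this sketch, toggling the option mid-run only changes which packet type later operations use.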

The testing framework was modified in order to enable this option in
all tests, while the default master behavior has it disabled.

Closes LS-314.

Signed-off-by: Dave <dave@leil.io>

@gemini-code-assist

Summary of Changes

Hello @dmga44, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's chunk locking mechanism by introducing chunkserver-side locking. This change aims to improve the robustness and consistency of write and truncate operations by allowing chunkservers to track the 'being written' state of chunk parts. The master now actively manages these locks, ensuring that operations are properly serialized and that chunk versions are correctly increased in the event of failures or disconnections. This provides a more reliable foundation for handling concurrent modifications to chunks across the distributed file system.

Highlights

  • Chunkserver-Side Locking Mechanism: Implemented a new chunkserver-side chunk locking mechanism to track write operations more robustly. This involves new JobPool methods (addJobIfNotLocked, addLockJob, startChunkLock, enforceChunkLock, endChunkLock, eraseChunkLock) and associated data structures (LockedChunkData) to manage locked chunks and pending jobs.
  • Master-Side Chunk Operation Refactoring: Refactored master-side chunk modification logic by introducing granular functions like chunk_create_operation, chunk_increase_version_operation, chunk_lock_operation, chunk_duplicate_operation, chunk_truncate_operation, and chunk_duplicate_and_truncate_operation. The main chunk_multi_modify and chunk_multi_truncate functions now orchestrate these smaller operations, improving code clarity and maintainability.
  • Enhanced Chunk State Management: Introduced a beingWritten flag in the ChunkPart struct on the master side to indicate if a chunk part is actively being written to, complementing the existing lockedto and lockid fields. The isLocked() logic was updated to consider this new flag, providing a more accurate representation of a chunk's locked state.
  • New Protocol Messages and Versioning: Added several new protocol messages (SAU_MATOCS_CREATE_AND_LOCK_CHUNK, SAU_MATOCS_SET_VERSION_AND_LOCK, SAU_MATOCS_LOCK_CHUNK, SAU_MATOCS_UNLOCK_CHUNK, SAU_MATOCS_DUPLICATE_AND_LOCK_CHUNK, SAU_CSTOMA_LOCK_CHUNK, SAU_CSTOMA_WRITE_END_STATUS) to facilitate communication between master and chunkservers regarding the new locking states. A new version constant kFirstVersionWithChunkserverSideChunkLock was introduced to manage compatibility.
  • Client-Side Write Operation Consistency: Modified client-side write operations to always communicate with chunkservers, even if no data is being written, to ensure that chunkserver-side locks are properly released. This change improves the consistency of write operations and lock management.

Changelog

  • src/chunkserver/bgjobs.cc
    • Included 'common/output_packet.h'.
    • Added addJobIfNotLocked method to JobPool to conditionally add jobs or defer them if a chunk is locked.
    • Added addLockJob method to JobPool for creating marker jobs for locked chunks.
    • Implemented startChunkLock, enforceChunkLock, endChunkLock, and eraseChunkLock methods in JobPool to manage chunk locks.
    • Modified existing job creation calls to use addJobIfNotLocked for chunk operations like delete, version change, truncate, and duplicate.
  • src/chunkserver/bgjobs.h
    • Included 'chunkserver-common/chunk_map.h'.
    • Defined AddJobFunc type for pending job functions.
    • Declared new addJobIfNotLocked and addLockJob methods in JobPool.
    • Declared new startChunkLock, enforceChunkLock, endChunkLock, and eraseChunkLock methods in JobPool.
    • Added LockedChunkData struct to hold information about locked chunks, including pending jobs and write initialization status.
    • Introduced chunkToJobReplyMapMutex_ and chunkToJobReplyMap_ for managing locked chunk data.
  • src/chunkserver/chunk_high_level_ops.cc
    • Included 'chunkserver/masterconn.h'.
    • Modified ReadHighLevelOp::cleanup to reset chunk ID, version, and type.
    • Added a call to masterconn_get_job_pool()->enforceChunkLock in WriteHighLevelOp::setup.
    • Added a check for chunkId_ == 0 at the beginning of WriteHighLevelOp::cleanup to prevent unnecessary operations.
    • Added a call to masterconn_get_job_pool()->endChunkLock in WriteHighLevelOp::cleanup to signal the end of a write operation.
  • src/chunkserver/chunkserver_entry.cc
    • Modified checkAndApplyClosed to conditionally call cleanup on writeHLO_ and readHLO_ only if a chunk is associated.
  • src/chunkserver/master_connection.cc
    • Added new case statements in gotPacket for SAU_MATOCS_CREATE_AND_LOCK_CHUNK, SAU_MATOCS_SET_VERSION_AND_LOCK, SAU_MATOCS_LOCK_CHUNK, SAU_MATOCS_UNLOCK_CHUNK, and SAU_MATOCS_DUPLICATE_AND_LOCK_CHUNK.
    • Implemented createAndLockChunk to handle chunk creation with locking.
    • Implemented setChunkVersionAndLock to handle chunk version updates with locking.
    • Implemented lockChunk to initiate a chunk lock on the chunkserver.
    • Implemented unlockChunk to remove a chunk lock on the chunkserver.
    • Implemented duplicateAndLockChunk to handle chunk duplication with locking.
    • Added sauJobFinishedAndLock callback function to manage post-job locking procedures.
  • src/chunkserver/master_connection.h
    • Included 'common/chunk_part_type.h'.
    • Declared new methods: createAndLockChunk, setChunkVersionAndLock, lockChunk, unlockChunk, duplicateAndLockChunk.
    • Declared sauJobFinishedAndLock static callback function.
  • src/chunkserver/masterconn.cc
    • Added masterconn_get_job_pool function to return the global JobPool instance.
  • src/chunkserver/masterconn.h
    • Included 'chunkserver/bgjobs.h'.
    • Declared masterconn_get_job_pool function.
  • src/common/event_loop.h
    • Updated documentation for eventloop_time to clarify it returns time in seconds.
  • src/common/saunafs_version.h
    • Added kFirstVersionWithChunkserverSideChunkLock constant.
  • src/master/chunks.cc
    • Added beingWritten boolean field to ChunkPart struct and associated is_being_written, mark_being_written, unmark_being_written methods.
    • Added LOCK to the ChunkOperation enum in the Chunk class.
    • Renamed needverincrease to needVersionIncrease in Chunk struct and updated its usage.
    • Modified Chunk::isLocked() to also consider the beingWritten flag.
    • Updated invalidateCopy to unmark beingWritten for the invalidated part.
    • Introduced chunk_create_operation, chunk_increase_version_operation, chunk_lock_operation, chunk_duplicate_operation, chunk_truncate_operation, and chunk_duplicate_and_truncate_operation functions to encapsulate specific chunk operations.
    • Modified chunk_emergency_increase_version to use chunk_increase_version_operation.
    • Updated chunk_finalize_failed_operation to send chunkunlock messages for parts that were beingWritten.
    • Enhanced chunk_handle_disconnected_copies to account for beingWritten parts when determining if an operation was interrupted.
    • Modified chunk_unlock to check isLocked() before notifying clients.
    • Added should_increase_chunk_version_on_modification helper function.
    • Refactored chunk_multi_modify into chunk_create and chunk_modify helper functions, and updated chunk_multi_modify to use these new helpers and the new operation functions.
    • Refactored chunk_multi_truncate to use chunk_truncate_operation and chunk_duplicate_and_truncate_operation.
    • Introduced chunk_write_end_status to handle status updates for write end operations.
    • Added chunk_got_chunklock_status and chunk_got_writeend_status handlers.
  • src/master/chunks.h
    • Declared should_increase_chunk_version_on_modification function.
    • Declared chunk_got_chunklock_status and chunk_got_writeend_status functions.
  • src/master/filesystem_operations.cc
    • Updated changeLog in writeChunk to use should_increase_chunk_version_on_modification for logging the operation flag.
  • src/master/matocsserv.cc
    • Modified matocsserv_send_createchunk to include a sentChunkLock parameter and send SAU_MATOCS_CREATE_AND_LOCK_CHUNK if the chunkserver supports it.
    • Added matocsserv_send_chunklock to send chunk lock requests.
    • Implemented matocsserv_got_chunklock_status to process chunk lock status replies.
    • Implemented matocsserv_got_writeend_status to process write end status replies.
    • Added matocsserv_send_chunkunlock to send chunk unlock requests.
    • Modified matocsserv_send_setchunkversion to include needChunkLock and sentChunkLock parameters, sending SAU_MATOCS_SET_VERSION_AND_LOCK if supported and needed.
    • Modified matocsserv_send_duplicatechunk to include a sentChunkLock parameter and send SAU_MATOCS_DUPLICATE_AND_LOCK_CHUNK if the chunkserver supports it.
    • Added case statements in matocsserv_gotpacket for SAU_CSTOMA_LOCK_CHUNK and SAU_CSTOMA_WRITE_END_STATUS.
  • src/master/matocsserv.h
    • Modified declarations for matocsserv_send_createchunk, matocsserv_send_setchunkversion, and matocsserv_send_duplicatechunk to include sentChunkLock parameters.
    • Declared matocsserv_send_chunklock, matocsserv_send_chunkunlock, matocsserv_got_chunklock_status, and matocsserv_got_writeend_status.
  • src/mount/writedata.cc
    • Removed haveDataToWrite flag and logic in InodeChunkWriter::processJob.
    • Modified InodeChunkWriter::processJob and ChunkJobWriter::processJob to always initialize and finish communication with chunkservers, ensuring lock release regardless of data presence.
  • src/protocol/SFSCommunication.h
    • Defined new protocol message IDs: SAU_MATOCS_CREATE_AND_LOCK_CHUNK, SAU_MATOCS_DUPLICATE_AND_LOCK_CHUNK, SAU_MATOCS_SET_VERSION_AND_LOCK, SAU_MATOCS_LOCK_CHUNK, SAU_CSTOMA_LOCK_CHUNK, SAU_CSTOMA_WRITE_END_STATUS, SAU_MATOCS_UNLOCK_CHUNK.
  • src/protocol/cstoma.h
    • Defined packet serialization for cstoma::chunkLock and cstoma::writeEndStatus.
  • src/protocol/matocs.h
    • Defined packet serialization for matocs::setVersionAndLock, matocs::chunkLock, matocs::chunkUnlock, matocs::createAndLockChunk, and matocs::duplicateAndLockChunk.
  • tests/test_suites/ShortSystemTests/test_read_corrupted_files.sh
    • Added WriteMaxRetries=2 to the .saunafs_tweaks file for the test.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-implemented feature: chunkserver-side chunk locking. The changes are extensive, touching the master, chunkserver, and network protocol, but they appear to be consistent and correct. The refactoring of chunk_multi_modify and chunk_multi_truncate in src/master/chunks.cc is a major improvement for code readability and maintainability. The new locking mechanism, including deferred job execution on locked chunks and proper handling of disconnections and errors, is thoughtfully designed.

I have a couple of suggestions for minor improvements. One is to improve efficiency in chunk_handle_disconnected_copies by reducing redundant iterations. The other is to replace a magic number with a named constant for better readability.

Overall, this is a high-quality contribution.

Copilot AI left a comment

Pull request overview

This pull request implements chunkserver-side chunk locking to improve write operation tracking and consistency. The system now tracks whether chunk parts are being written on the chunkserver side, in addition to the existing client-side locking mechanism. This allows the master to make better decisions about chunk availability and operations.

Changes:

  • Introduced new protocol messages for chunkserver-side chunk locking (lock, unlock, write end status)
  • Refactored chunk modification operations in the master into separate, well-documented functions
  • Implemented chunk lock job management in the chunkserver's JobPool
  • Modified client-side write operations to always communicate with chunkservers for lock release
  • Fixed test_read_corrupted_files test by adding WriteMaxRetries configuration

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/protocol/matocs.h | Added packet definitions for createAndLockChunk, setVersionAndLock, chunkLock, chunkUnlock, and duplicateAndLockChunk |
| src/protocol/cstoma.h | Added packet definitions for chunkLock and writeEndStatus responses |
| src/protocol/SFSCommunication.h | Added packet command constants for the new lock-related messages |
| src/common/saunafs_version.h | Added version constant for chunkserver-side chunk lock feature (5.7.0) |
| src/common/event_loop.h | Fixed documentation comment for eventloop_time() (returns seconds, not milliseconds) |
| src/master/matocsserv.h | Updated function signatures to support lock tracking with sentChunkLock parameter |
| src/master/matocsserv.cc | Implemented new message sending functions with version checks for backward compatibility |
| src/master/chunks.h | Added declarations for new chunk operation functions and lock status handlers |
| src/master/chunks.cc | Major refactoring: extracted chunk operations into separate functions, added beingWritten tracking, improved disconnection handling |
| src/master/filesystem_operations.cc | Updated changelog to use operation type instead of boolean flag |
| src/chunkserver/masterconn.h | Added declaration for masterconn_get_job_pool() |
| src/chunkserver/masterconn.cc | Implemented masterconn_get_job_pool() to expose job pool |
| src/chunkserver/master_connection.h | Added declarations for new lock-related message handlers |
| src/chunkserver/master_connection.cc | Implemented handlers for lock messages and lock-aware job completion callbacks |
| src/chunkserver/bgjobs.h | Added chunk lock management API to JobPool |
| src/chunkserver/bgjobs.cc | Implemented chunk lock tracking and deferred job execution for locked chunks |
| src/chunkserver/chunk_high_level_ops.cc | Integrated lock enforcement and release into write operations |
| src/chunkserver/chunkserver_entry.cc | Added defensive checks before calling cleanup() |
| src/mount/writedata.cc | Removed optimization to always communicate with chunkservers for lock release |
| tests/test_suites/ShortSystemTests/test_read_corrupted_files.sh | Added WriteMaxRetries=2 configuration for test reliability |

@dmga44 dmga44 force-pushed the feat-cs-size-chunk-lock branch from 888d7ec to 09b15e5 Compare February 11, 2026 05:27
@dmga44 dmga44 requested a review from Copilot February 11, 2026 05:31

dmga44 commented Feb 11, 2026

@gemini-code-assist Please review again.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-implemented feature for chunkserver-side chunk locking. The changes are comprehensive, spanning the master, chunkserver, and client-side components, and appear to be correctly implemented.

The most notable improvement is the major refactoring of chunk_multi_modify and chunk_multi_truncate functions in src/master/chunks.cc. Breaking down these large functions into smaller, single-purpose, and well-documented functions (chunk_create_operation, chunk_increase_version_operation, etc.) greatly enhances the readability and maintainability of the code.

The new locking logic seems robust, with careful attention to concurrency, potential deadlocks, and error handling scenarios such as chunkserver disconnections during a write operation. The state management on both the master (with the beingWritten flag) and the chunkserver (via the JobPool) is consistent and appears correct.

Overall, this is a high-quality contribution that not only adds an important feature but also improves the existing codebase. The code is clean, well-documented, and demonstrates a solid understanding of the system's architecture.

Copilot AI left a comment

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 6 comments.

@dmga44 dmga44 force-pushed the feat-cs-size-chunk-lock branch from 09b15e5 to 0c1055b Compare February 11, 2026 06:48
@dmga44 dmga44 requested a review from Copilot February 11, 2026 06:49

dmga44 commented Feb 11, 2026

@gemini-code-assist Please review again.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant feature: chunkserver-side chunk locking. This enhances the existing client-side locking mechanism by allowing the master to track writes at the chunkserver level, improving consistency. The implementation involves new packet types for lock management, updates to the chunkserver's JobPool to handle locked chunks and pending jobs, and corresponding changes on the master side to manage the lock lifecycle.

The review identified a potential crash in a new unit test due to a nullptr being passed to a callback that doesn't handle it. Additionally, there are a couple of opportunities to reduce code duplication in the new JobPool logic by extracting common functionality into helper methods.

Overall, the changes are well-implemented. The extensive refactoring of chunk_multi_modify and chunk_multi_truncate in src/master/chunks.cc is a notable improvement, making the code much more readable and maintainable.

Copilot AI left a comment

Pull request overview

Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.

Copilot AI commented Feb 11, 2026

@dmga44 I've opened a new pull request, #745, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI commented Feb 11, 2026

@dmga44 I've opened a new pull request, #746, to work on those changes. Once the pull request is ready, I'll request review from you.

@dmga44 dmga44 force-pushed the feat-cs-size-chunk-lock branch 2 times, most recently from 1202734 to b8b0dd0 Compare February 11, 2026 23:17
@dmga44 dmga44 force-pushed the feat-cs-size-chunk-lock branch 5 times, most recently from 2651c75 to 004dada Compare February 13, 2026 13:18
@dmga44 dmga44 requested a review from Copilot February 13, 2026 21:14

dmga44 commented Feb 13, 2026

@gemini-code-assist Please review again.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and valuable feature: chunkserver-side locking. The implementation is comprehensive, touching on the master, chunkserver, and client components. The refactoring of chunk_multi_modify and chunk_multi_truncate into smaller, more focused functions is a major improvement for code clarity and maintainability. The new locking logic appears robust, and the addition of tests, especially the enhanced concurrent write test that toggles the feature flag, is excellent. I've identified one potential race condition related to client lock timeouts, which I've detailed in a specific comment. Overall, this is a high-quality contribution.

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

@dmga44 dmga44 force-pushed the feat-cs-size-chunk-lock branch from 004dada to a7009ec Compare February 16, 2026 01:51
@dmga44 dmga44 requested a review from Copilot February 16, 2026 01:53

dmga44 commented Feb 16, 2026

@gemini-code-assist Please review again.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-implemented feature: chunkserver-side chunk locking. The changes are extensive, touching the master, chunkserver, client-side mount, and protocol definitions, but they are logically sound and consistent with the detailed description provided.

The core of the change is to allow the master to track writes not just by client acknowledgements but also by chunkserver-side status, which improves the robustness of chunk locking. This is controlled by a new, reloadable configuration option USE_CHUNKSERVER_SIDE_CHUNK_LOCK.

Key highlights of this PR include:

  • A major and beneficial refactoring of the chunk_multi_modify and chunk_multi_truncate functions in the master, which greatly improves readability and maintainability.
  • Introduction of new packets and handlers for lock management between the master and chunkservers, with proper handling for backward compatibility.
  • Robust implementation on the chunkserver side to queue jobs on locked chunks and manage the lock lifecycle.
  • Thoughtful updates to the client-side write logic to ensure lock release protocols are followed.
  • Comprehensive updates to unit tests and system tests, including a stress test that toggles the new feature dynamically.

The code quality is high, with good attention to detail, especially regarding memory management of network packets and handling of concurrent operations. I have reviewed the changes carefully and found no issues that meet the required severity levels. This is an excellent contribution to the project.

Copilot AI left a comment

Pull request overview

Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

src/mount/writedata.cc:623

  • At line 618, when e.status() == SAUNAFS_ERROR_LOCKED, the retry counter is not incremented and the error is thrown again (line 623). However, there's no mechanism shown here to wait or backoff before retrying. If the chunk remains locked, this could lead to a tight retry loop consuming CPU. Consider whether a delay or backoff mechanism should be added before retrying locked chunks, or verify that such a mechanism exists elsewhere in the write job scheduling logic.
			if (inodeData_->trycnt >= maxretries && e.status() != SAUNAFS_ERROR_LOCKED) {
				// Convert error to an unrecoverable error
				throw UnrecoverableWriteException(e.message(), e.status());
			} else {
				// This may be recoverable or unrecoverable error
				throw;
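
The condition in the excerpt above, together with the kind of delay the reviewer asks about, could look like the sketch below. Only the condition itself comes from the excerpt; the capped exponential backoff is a hypothetical illustration of the suggestion, not something this PR is known to implement, and all names and default values are assumptions.

```cpp
#include <cassert>
#include <cstdint>

enum class WriteErrorActionSketch { ConvertToUnrecoverable, RethrowAsIs };

// Mirrors the condition in the excerpt: a LOCKED status never exhausts
// the retry budget, so it is always rethrown unchanged.
WriteErrorActionSketch decideOnWriteError(uint32_t tryCount, uint32_t maxRetries,
                                          bool statusIsLocked) {
	if (tryCount >= maxRetries && !statusIsLocked) {
		return WriteErrorActionSketch::ConvertToUnrecoverable;
	}
	return WriteErrorActionSketch::RethrowAsIs;
}

// Hypothetical capped exponential backoff for retries on locked chunks:
// the delay doubles per attempt until it reaches capMs.
uint32_t backoffDelayMs(uint32_t attempt, uint32_t baseMs = 10, uint32_t capMs = 1000) {
	assert(baseMs > 0 && baseMs <= capMs);
	uint32_t delay = baseMs;
	for (uint32_t i = 0; i < attempt && delay < capMs; ++i) {
		// Compare against capMs / 2 first so the doubling cannot overflow.
		delay = (delay > capMs / 2) ? capMs : delay * 2;
	}
	return delay;
}
```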

@dmga44 dmga44 mentioned this pull request Feb 16, 2026
@dmga44 dmga44 requested a review from uristdwarf February 16, 2026 09:27

@lgsilva3087 lgsilva3087 left a comment

Great work. Please check my comments.


@ralcolea ralcolea left a comment

Great work @dmga44!! 💪 💪 💪
I shared some minor suggestions for your consideration.

@dmga44 dmga44 force-pushed the feat-cs-size-chunk-lock branch from a7009ec to 1b04041 Compare February 17, 2026 14:03

@lgsilva3087 lgsilva3087 left a comment

Great work

@ralcolea

Great work @dmga44!! 👍 💪 🔥

@dmga44 dmga44 force-pushed the feat-cs-size-chunk-lock branch from 1b04041 to 17d1d2c Compare February 17, 2026 17:50

The current implementation has a critical error when the following
conditions are met:
- clients are creating files in goal ec(D, P).
- more than one minute has passed.
- all the chunkservers restart.

In such a case, the master is very likely to reply
SAUNAFS_ERROR_NOSPACE to some of the chunk write (creation) requests.
This happens when the chunk cannot be created because fewer than D
chunkservers are available, while at least one is available.

The error code SAUNAFS_ERROR_NOSPACE is not retryable by the client,
which is correct, but it makes the write operations fail completely.
The incorrect behavior is the master's: it must reply
SAUNAFS_ERROR_NOCHUNKSERVERS in such cases and reserve the NOSPACE
error code for the case when at least D chunkservers are available.

Signed-off-by: Dave <dave@leil.io>
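
The corrected rule from the commit message above can be sketched as a small classification function. The two error names come from the commit message; the function, its parameters and the split between "connected" chunkservers and those "with free space" are illustrative assumptions rather than the master's actual logic.

```cpp
#include <cassert>

enum class CreateOutcomeSketch { Ok, NoChunkservers, NoSpace };

// dataParts is D from the goal ec(D, P).
CreateOutcomeSketch classifyEcCreate(unsigned connected, unsigned withFreeSpace,
                                     unsigned dataParts) {
	assert(withFreeSpace <= connected);
	if (connected < dataParts) {
		// Too few chunkservers are available (e.g. right after a restart):
		// corresponds to SAUNAFS_ERROR_NOCHUNKSERVERS.
		return CreateOutcomeSketch::NoChunkservers;
	}
	if (withFreeSpace < dataParts) {
		// At least D chunkservers are available but the chunk still cannot be
		// placed: corresponds to SAUNAFS_ERROR_NOSPACE.
		return CreateOutcomeSketch::NoSpace;
	}
	return CreateOutcomeSketch::Ok;
}
```

For example, with ec(3, P) and a single chunkserver back online, this sketch yields `NoChunkservers` rather than `NoSpace`, which is the case the commit message describes.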
@dmga44 dmga44 force-pushed the feat-cs-size-chunk-lock branch from 17d1d2c to 4f0aea4 Compare February 17, 2026 20:18
@dmga44 dmga44 merged commit 4264de9 into dev Feb 18, 2026
11 checks passed
@dmga44 dmga44 deleted the feat-cs-size-chunk-lock branch February 18, 2026 04:13
6 participants