perf(chunkserver): Protect IO performance during massive delete LS #79 (#739)

Draft

GigaCronos wants to merge 2 commits into dev from perf-improve-massive-delete
Conversation

@GigaCronos GigaCronos commented Feb 7, 2026

This PR further reduces trash-GC impact on foreground I/O by dynamically throttling expired-file deletion based on observed disk throughput, and improves delete wall-clock time by running removals per-disk in parallel.

Key changes:

  • I/O-aware GC throttling (with floor guard)

    • Adds GC-specific I/O counters in HddStats and increments them on every read/write accounting path.
    • In ChunkTrashManagerImpl::collectGarbage():
      • Samples and resets GC I/O counters each cycle (exchange(0)).
      • Normalizes by disk count (gDisks.size() under gDisksMutex) to estimate per-disk pressure.
      • Maintains max observed per-disk bytes (read/write) to create a stable baseline.
      • Computes totalIOPercentage and feeds it through an inverted sigmoid (steepness=12, center=0.3) to scale down
        trashGarbageCollectorBulkSize under load.
      • Adds a minimum threshold: only perform expired-file deletion when the scaled bulk size is >= 5, preventing “death by a thousand tiny deletes” during heavy I/O.
  • Parallel trash removal across disks
    • removeTrashFiles() now spawns a std::jthread per disk and delegates actual deletion/index updates to removeTrashFilesFromDisk(), allowing independent disks to progress concurrently.

  • Behavioral defaults

    • CHUNK_TRASH_ENABLED default flips 0 → 1 (trash manager enabled unless explicitly disabled).
    • Default retention (kDefaultTrashTimeLimitSeconds) flips 259200 → 0 (immediate expiry unless configured).
    • Changed documentation accordingly.
  • Resource thread pacing

    • In the hddFreeResourcesThread() loop, sleep() is replaced with Timeout + usleep(timeout.remaining_us()) for more precise loop cadence.
  • Test dependency: disabled trash GC for the rebalancing test, mainly because chunks in trash affect disk usage statistics.

  • Narrowed implMutex so it protects only the getter and setter of the current GC implementation.

Rationale:
Large delete storms can saturate disks and degrade chunkserver latency. This patch makes the garbage collector self-throttling using real I/O signals and avoids thrashing by skipping deletion work when the system is already busy, while parallelizing deletes per disk to reclaim space faster when bandwidth is available.
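The throttling math described above can be sketched roughly as follows. This is an illustration, not the PR's actual code: the names (invertedSigmoid, scaledBulkSize, kMinBulkSize) and the exact formula are assumptions, with the steepness/center values taken from this description.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative throttle for the GC bulk size under I/O pressure.
static double invertedSigmoid(double x, double steepness, double center) {
	// ~1 when x is well below center (idle disks), ~0 when well above it.
	return 1.0 / (1.0 + std::exp(steepness * (x - center)));
}

static uint32_t scaledBulkSize(uint32_t configuredBulkSize,
                               double totalIOPercentage) {
	constexpr double kSteepness = 12.0;  // from the PR description
	constexpr double kCenter = 0.3;      // from the PR description
	double factor = invertedSigmoid(totalIOPercentage, kSteepness, kCenter);
	auto scaled = static_cast<uint32_t>(configuredBulkSize * factor);
	// Floor guard: below the minimum, skip deletion for this cycle entirely,
	// avoiding "death by a thousand tiny deletes" during heavy I/O.
	constexpr uint32_t kMinBulkSize = 5;
	return (scaled >= kMinBulkSize) ? scaled : 0;
}
```

At zero I/O pressure the configured bulk size passes through nearly untouched; at the center point it is halved; under saturation the floor guard suppresses deletion for the cycle.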

@gemini-code-assist

Summary of Changes

Hello @GigaCronos, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the chunkserver's garbage collection mechanism to mitigate performance impacts during intensive file deletion operations. By dynamically monitoring disk I/O and adjusting the rate of trash file removal, the system can now intelligently throttle deletions when disks are under heavy load. Furthermore, the introduction of parallel deletion across individual disks allows for more efficient and faster space recovery when system resources permit, leading to a more robust and responsive chunkserver environment.

Highlights

  • Dynamic I/O-aware GC throttling: Implemented dynamic I/O-aware throttling for garbage collection, adjusting deletion bulk size based on observed disk throughput to prevent performance degradation. This includes tracking GC-specific I/O, calculating I/O percentage, and applying an inverted sigmoid function to scale the bulk size, with a minimum threshold to avoid tiny deletes.
  • Parallel trash file removal: Introduced parallel trash file removal across multiple disks using std::jthread to accelerate space reclamation, allowing independent disks to process deletions concurrently.
  • Updated behavioral defaults: Changed default configurations: CHUNK_TRASH_ENABLED now defaults to true (enabled), and kDefaultTrashTimeLimitSeconds defaults to 0 (immediate expiry) for more aggressive space reclamation by default.
  • Improved resource thread pacing: Enhanced resource thread pacing in hddFreeResourcesThread by increasing the loop period from 2 seconds to 3 seconds and replacing sleep() with Timeout + usleep() for more precise and controlled loop cadence.
Changelog
  • src/chunkserver/chunkserver-common/chunk_trash_index.cc
    • Refactored getExpiredFilesInternal to ensure bulkSize is positive before processing, and removed internal bulk size counting, implying external control.
  • src/chunkserver/chunkserver-common/chunk_trash_manager.cc
    • Changed the default value of CHUNK_TRASH_ENABLED configuration from disabled (0) to enabled (1).
  • src/chunkserver/chunkserver-common/chunk_trash_manager_impl.cc
    • Added necessary headers (<cstdint>, global_shared_resources.h, hdd_stats.h) for concurrency and I/O statistics.
    • Introduced static members maxBytesReadPerDisk and maxBytesWritePerDisk for I/O tracking.
    • Modified removeTrashFiles to utilize std::jthread for parallel execution of removeTrashFilesFromDisk per disk.
    • Implemented removeTrashFilesFromDisk to handle the actual file removal and index updates for a given disk.
    • Enhanced collectGarbage with I/O-aware throttling logic, including tracking GC-specific I/O, calculating totalIOPercentage, applying an inverted sigmoid function to scale the bulk size, and enforcing a minimum bulk size threshold of 5.
  • src/chunkserver/chunkserver-common/chunk_trash_manager_impl.h
    • Declared the new static parallel deletion function removeTrashFilesFromDisk.
    • Added declarations for maxBytesReadPerDisk and maxBytesWritePerDisk.
    • Updated kDefaultTrashTimeLimitSeconds to 0, enabling immediate file expiry by default.
  • src/chunkserver/chunkserver-common/hdd_stats.cc
    • Integrated GC-specific I/O counters (gGCTotalBytesRead, gGCTotalBytesWrite) into the totalRead and totalWrite functions to track garbage collection I/O.
  • src/chunkserver/chunkserver-common/hdd_stats.h
    • Declared new atomic variables gGCTotalBytesWrite and gGCTotalBytesRead for tracking I/O operations specifically related to garbage collection.
  • src/chunkserver/hddspacemgr.cc
    • Increased the kDelayedStep for hddFreeResourcesThread from 2 to 3 seconds.
    • Replaced the sleep() call with a Timeout object and usleep() for more precise and controlled loop pacing.
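The deadline-based pacing above can be sketched as follows. The Timeout class here is a minimal stand-in with an assumed interface, only to illustrate the pattern; it is not the project's actual class.

```cpp
#include <algorithm>
#include <chrono>

// Minimal stand-in for the project's Timeout class (assumed interface).
class Timeout {
public:
	explicit Timeout(std::chrono::microseconds duration)
	    : deadline_(std::chrono::steady_clock::now() + duration) {}

	// Microseconds left until the deadline, clamped at zero.
	std::chrono::microseconds remaining_us() const {
		auto left = std::chrono::duration_cast<std::chrono::microseconds>(
		    deadline_ - std::chrono::steady_clock::now());
		return std::max(left, std::chrono::microseconds(0));
	}

private:
	std::chrono::steady_clock::time_point deadline_;
};

// Loop shape (illustrative):
//   while (!gTerminate) {
//     Timeout timeout(std::chrono::microseconds(kDelayedStep * 1'000'000));
//     doPeriodicWork();
//     usleep(timeout.remaining_us().count());
//   }
```

Unlike a fixed sleep(), sleeping only for the remainder keeps the loop period constant regardless of how long the work itself took.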
Activity
  • No specific activity (comments, reviews, etc.) has been recorded for this pull request yet.

@GigaCronos GigaCronos changed the title perf(chunkserver): Protect IO performance during massive delete perf(chunkserver): Protect IO performance during massive delete LS #79 Feb 7, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces I/O-aware throttling for the garbage collector and parallelizes trash file removal to improve performance during mass deletions. The changes are generally well-implemented, but I've identified a few issues. There's a significant bug in getExpiredFilesInternal that breaks the behavior for unlimited bulk sizes, and another in collectGarbage where local static variables incorrectly shadow class members, preventing state persistence. I've also suggested improvements for code clarity and efficiency by renaming misleadingly named global counters and removing a redundant calculation. Overall, the approach to improving I/O performance is solid, and these corrections are needed for correctness and maintainability.

@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch 3 times, most recently from 609b6e5 to 56c1a1a Compare February 8, 2026 14:27
@GigaCronos GigaCronos self-assigned this Feb 8, 2026
@GigaCronos

@gemini-code-assist take a look


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces I/O-aware throttling for the garbage collector and parallelizes trash removal to improve performance during massive delete operations. The implementation is solid, using std::jthread for parallelization and a sigmoid function for dynamic throttling. I've identified one high-severity issue in the I/O percentage calculation logic that could cause overly aggressive throttling, which is critical for performance-sensitive components, and have provided a suggested fix. The other changes, including configuration updates and making the main resource loop timing more precise, are well-implemented and align with the PR's goals.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant performance improvements for the chunkserver's garbage collection by making it I/O-aware and parallelizing file deletion. The core ideas of dynamic throttling based on disk throughput and parallel per-disk removal are excellent. The implementation is mostly solid, but I've identified a couple of logic issues in the new throttling mechanism that could undermine its effectiveness, particularly in how the deletion batch size is calculated and applied. I've also included a few minor suggestions to improve code clarity and maintainability. Overall, this is a valuable enhancement, and with a few adjustments, it will be a great addition.

@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch 2 times, most recently from 0960b78 to da26c17 Compare February 8, 2026 15:03
@GigaCronos

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an intelligent I/O-aware throttling mechanism for the garbage collector to reduce its impact on foreground I/O, which is a significant improvement. The parallelization of trash removal using std::jthread is also a welcome performance enhancement. The changes to default configurations are clearly described. The code is well-structured, but I have a couple of suggestions to refine the new throttling logic for better accuracy and smoother behavior on startup.

@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch from da26c17 to 0db29d5 Compare February 8, 2026 15:24
@GigaCronos

/gemini review, deeper review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an intelligent I/O-aware throttling mechanism for the garbage collector and parallelizes trash file removal across disks. These are excellent performance improvements, especially for scenarios with massive file deletions. The code is well-structured, and the use of std::jthread for parallelization and a precise Timeout for the resource thread loop are good modern C++ practices. I've found one potential bug in the new throttling logic related to integer division and a few minor typos. Overall, this is a solid contribution to improving the chunkserver's performance under load.

@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch from 0db29d5 to edf34c1 Compare February 8, 2026 15:37
@GigaCronos

/gemini review take another look, pay attention to inconsistent whitespace


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant performance improvements to the chunkserver's trash garbage collection. The I/O-aware throttling mechanism is a clever way to protect foreground I/O performance during heavy delete operations, and parallelizing deletions per disk should notably improve wall-clock time for cleanup. The code is generally well-structured, but I've identified a few areas for improvement regarding correctness and maintainability. My main concern is a logic change that appears to alter the garbage collection batch size from a total limit to a per-disk limit, which could lead to excessive deletions. I've also included suggestions to improve code clarity and testability, aligning with best practices for explicit dependencies and accurate documentation.

@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch 3 times, most recently from e32dca7 to 701b465 Compare February 11, 2026 13:13
@GigaCronos

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a sophisticated I/O-aware throttling mechanism for the garbage collector and parallelizes trash removal to improve performance during large-scale deletions. The changes are well-structured and include corresponding updates to documentation and tests. My review focuses on refining the throttling logic to be more adaptive over time and improving the clarity of related documentation and correcting minor typos. Overall, this is a solid performance enhancement.

@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch 5 times, most recently from f042ea5 to 8cd0352 Compare February 18, 2026 07:26
@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch 2 times, most recently from cb73e4c to 23dc728 Compare February 23, 2026 13:42
@GigaCronos

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant performance improvements to the chunkserver's garbage collection mechanism. Key changes include I/O-aware throttling of expired-file deletion based on observed disk throughput and parallel trash removal across disks using std::jthread. The default behavior for CHUNK_TRASH_ENABLED and CHUNK_TRASH_EXPIRATION_SECONDS has been updated, and the CHUNK_TRASH_GC_BATCH_SIZE is now dynamically scaled. Additionally, the mutex locking for ChunkTrashManager::getImpl() has been refactored to minimize lock holding time, which is a good practice. Test configurations have been updated to disable trash for rebalancing tests, ensuring consistent test environments. Overall, these changes aim to protect foreground I/O performance during heavy delete operations and improve the efficiency of space reclamation. A specific comment highlights a discrepancy between the PR description and the code regarding the hddFreeResourcesThread() loop period, requesting clarification or an update.

	pthread_setname_np(pthread_self(), "freeResThread");

	while (!gTerminate) {
		Timeout timeout(std::chrono::microseconds(kDelayedStep * 1'000'000));


medium

The pull request description states that the hddFreeResourcesThread() loop period increases from 2s to 3s. However, the code uses kDelayedStep, which is defined as 2, resulting in a 2-second timeout. This creates a discrepancy between the intended change and the actual implementation. Please clarify if the loop period should indeed be 3 seconds, and if so, update kDelayedStep accordingly.

@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch from 23dc728 to 63bbc23 Compare February 28, 2026 05:16
This PR further reduces trash-GC impact on foreground I/O
by dynamically throttling expired-file deletion based on observed
disk throughput, and improves delete wall-clock time
by running removals per-disk in parallel.

Key changes:
- I/O-aware GC throttling (with floor guard)
  - Adds GC-specific I/O counters in HddStats
 and increments them on every read/write accounting path.
  - In ChunkTrashManagerImpl::collectGarbage():
    - Samples and resets GC I/O counters each cycle (exchange(0)).
    - Normalizes by disk count (gDisks.size() under gDisksMutex)
    to estimate per-disk pressure.
    - Maintains max observed per-disk bytes (read/write) to create
    a stable baseline.
    - Computes totalIOPercentage and feeds it through
    an inverted sigmoid (steepness=12, center=0.3) to scale down
    trashGarbageCollectorBulkSize under load.
    - Adds a minimum threshold: only perform expired-file deletion
    when the scaled bulk size is >= 5, preventing “death by a thousand
    tiny deletes” during heavy I/O.

- Parallel trash removal across disks
  - removeTrashFiles() now spawns a std::jthread per disk and
  delegates actual deletion/index updates to removeTrashFilesFromDisk(),
  allowing independent disks to progress concurrently.

- Behavioral defaults
  - CHUNK_TRASH_ENABLED default flips 0 → 1 (trash manager enabled
  unless explicitly disabled).
  - Default retention (kDefaultTrashTimeLimitSeconds) flips 259200 → 0
  (immediate expiry unless configured).
  - Changed documentation accordingly.

- Resource thread pacing
  - In the hddFreeResourcesThread() loop, replaced sleep()
  with Timeout + usleep(timeout.remaining_us())
  for more precise loop cadence.

- Narrowed implMutex so it protects only the getter and setter
of the current GC implementation.

- Test dependency:
  - Disabled GC for rebalancing test, mainly because chunks
  in trash affect disk usage statistics.
  - Added more sleep time to sfschunkserver_check_no_buffer_in_use()
  given that GC cycles now take more time.

Rationale:
Large delete storms can saturate disks and degrade chunkserver latency.
This patch makes the garbage collector self-throttling using real I/O
signals and avoids thrashing by skipping deletion work when
the system is already busy, while parallelizing deletes per disk
to reclaim space faster when bandwidth is available.

Signed-off-by: GigaCronos <jorge.cabrera@leil.io>
Add a ProducerConsumerQueue and ThreadPool
with up to 5 runners, for executing the per disk deletion jobs.

Signed-off-by: GigaCronos <jorge.cabrera@leil.io>
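The producer-consumer shape this commit describes can be sketched as below. This is a minimal illustration under stated assumptions; the real ProducerConsumerQueue and ThreadPool in the codebase will differ in interface and element type.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>

// Minimal blocking job queue: producers push disk-deletion jobs,
// workers pop until the queue is closed and drained.
class JobQueue {
public:
	void push(std::string job) {
		{
			std::lock_guard<std::mutex> lock(mtx_);
			jobs_.push(std::move(job));
		}
		cv_.notify_one();
	}
	void close() {
		{
			std::lock_guard<std::mutex> lock(mtx_);
			closed_ = true;
		}
		cv_.notify_all();
	}
	// Blocks until a job is available; nullopt means closed and drained.
	std::optional<std::string> pop() {
		std::unique_lock<std::mutex> lock(mtx_);
		cv_.wait(lock, [this] { return closed_ || !jobs_.empty(); });
		if (jobs_.empty()) return std::nullopt;
		std::string job = std::move(jobs_.front());
		jobs_.pop();
		return job;
	}

private:
	std::mutex mtx_;
	std::condition_variable cv_;
	std::queue<std::string> jobs_;
	bool closed_ = false;
};
```

With a queue like this, up to five runner threads (std::jthread in the commit) can each loop on pop() until it returns nullopt, so per-disk deletion jobs proceed concurrently while the queue bounds coordination to one mutex and one condition variable.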
@GigaCronos GigaCronos force-pushed the perf-improve-massive-delete branch from 63bbc23 to 06a78b6 Compare February 28, 2026 05:24
@GigaCronos

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces significant performance improvements to the chunkserver's garbage collection process by adding I/O-aware throttling and parallelizing file deletion. The review identified a critical performance and Denial of Service (DoS) risk in the trash removal synchronization logic, a high-severity issue with unsafe memory management, and several medium-severity issues related to configuration consistency and hardcoded values. All original comments have been retained as they do not contradict any established rules and provide valuable feedback. One comment has been enhanced with a reference to an existing rule regarding parameter passing for job functions.

Comment on lines +182 to +184
	reinterpret_cast<uint8_t *>(
	    new std::pair<ChunkTrashIndex::TrashIndexFileEntries, std::string>(
	        fileEntries, diskPath)),


high

Using new with reinterpret_cast is unsafe and not idiomatic C++. It's prone to memory leaks if not handled carefully in all code paths in the consumer thread. Also, passing fileEntries by value to the std::pair constructor causes the entire std::multimap to be copied for each job, which can be inefficient.

Consider using smart pointers like std::unique_ptr to manage the lifetime of the job data automatically. This would make the code safer and more aligned with modern C++ practices.

For example, you could define a struct for the job data:

struct TrashRemovalJob {
    ChunkTrashIndex::TrashIndexFileEntries files;
    std::string diskPath;
};

Then, you can use std::make_unique<TrashRemovalJob>(...) and pass the raw pointer to the queue, and reconstruct the unique_ptr in the worker thread to ensure automatic cleanup.
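One way to realize that suggestion, assuming a queue that carries raw uint8_t pointers as in the PR; TrashRemovalJob, makeJob, and consumeJob are hypothetical names for illustration:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical job payload, following the reviewer's suggestion.
struct TrashRemovalJob {
	std::multimap<uint64_t, std::string> files;  // stand-in for TrashIndexFileEntries
	std::string diskPath;
};

// Producer side: move data in, then release ownership into the raw-pointer
// queue slot (no copy of the multimap, no ad-hoc new/reinterpret_cast pair).
uint8_t *makeJob(std::multimap<uint64_t, std::string> files,
                 std::string diskPath) {
	auto job = std::make_unique<TrashRemovalJob>(
	    TrashRemovalJob{std::move(files), std::move(diskPath)});
	return reinterpret_cast<uint8_t *>(job.release());
}

// Consumer side: reconstruct the unique_ptr so the job is freed on every
// code path, even if the deletion work throws.
std::string consumeJob(uint8_t *raw) {
	std::unique_ptr<TrashRemovalJob> job(
	    reinterpret_cast<TrashRemovalJob *>(raw));
	return job->diskPath;  // actual removal/index updates would go here
}
```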

References
  1. Prefer passing individual parameters to low-level job functions instead of a pointer to a high-level operation object. Encapsulating job data in a dedicated struct and managing it with smart pointers aligns with this principle by providing a clear, safe, and efficient way to pass job-specific parameters.

Comment on lines +188 to +190
while(NotIdleThreadCount && !removeFromTrashJobQueue->isEmpty()){

}


security-medium medium

The removeTrashFiles function contains a busy-wait loop that consumes 100% of a CPU core, posing a performance and Denial of Service (DoS) risk. The condition NotIdleThreadCount && !removeFromTrashJobQueue->isEmpty() is logically flawed; it can cause the loop to exit prematurely, leading to inefficient space recovery and potential resource exhaustion as callers might incorrectly assume jobs are complete. It's recommended to replace this busy-wait with a proper synchronization mechanism, such as a std::condition_variable, to efficiently wait for job completion.

Suggested change
while(NotIdleThreadCount && !removeFromTrashJobQueue->isEmpty()){
}
while(!removeFromTrashJobQueue->isEmpty() || NotIdleThreadCount > 0) {
std::this_thread::yield();
}
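A condition-variable based completion wait, as the comment recommends, could look like the sketch below. The names (JobTracker, pendingJobs) are illustrative, not the PR's actual members.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Tracks outstanding jobs and lets a caller block, without spinning,
// until every submitted job has finished.
struct JobTracker {
	std::mutex mtx;
	std::condition_variable cv;
	int pendingJobs = 0;

	void jobSubmitted() {
		std::lock_guard<std::mutex> lock(mtx);
		++pendingJobs;
	}
	void jobFinished() {
		{
			std::lock_guard<std::mutex> lock(mtx);
			--pendingJobs;
		}
		cv.notify_all();
	}
	void waitForAll() {
		std::unique_lock<std::mutex> lock(mtx);
		cv.wait(lock, [this] { return pendingJobs == 0; });
	}
};
```

The predicate overload of cv.wait() handles spurious wakeups, and the waiter sleeps in the kernel instead of burning a core, which addresses both the DoS risk and the premature-exit flaw of the original condition.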

 	{"PLUGINS_DIR", ""},
-	{"CHUNK_TRASH_ENABLED", "0"},
+	{"CHUNK_TRASH_ENABLED", "1"},
 	{"CHUNK_TRASH_EXPIRATION_SECONDS", "259200"},


medium

The default value for CHUNK_TRASH_EXPIRATION_SECONDS appears to be outdated. It is set to "259200" here, but in src/chunkserver/chunkserver-common/chunk_trash_manager_impl.h, the corresponding default kDefaultTrashTimeLimitSeconds has been changed to 0. To ensure consistency across the system, this default value should be updated to match.

    {"CHUNK_TRASH_EXPIRATION_SECONDS", "0"},

}
}

if (ChunkTrashManagerImpl::removeFromTrashThreads.size() < 5) {


medium

The number of worker threads for removing trash files is hardcoded to 5. This should be a named constant to improve readability and maintainability. Consider making it configurable or basing it on std::thread::hardware_concurrency().

    static constexpr uint8_t kRemoveFromTrashThreadCount = 5;
    if (ChunkTrashManagerImpl::removeFromTrashThreads.size() < kRemoveFromTrashThreadCount) {

Comment on lines +357 to +358
const double steepness = 10.0; // steepness
const double center = 0.15; // center in [0,1]


medium

The parameters for the invertedSigmoid function (steepness = 10.0, center = 0.15) do not match the values mentioned in the pull request description (steepness=12, center=0.3). Please clarify if this discrepancy is intentional.

Additionally, these magic numbers should be defined as named constants for better readability and maintainability.

        const double kSigmoidSteepness = 12.0; // As per PR description
        const double kSigmoidCenter = 0.3;     // As per PR description
