Skip to content

Commit f042ea5

Browse files
committed
perf(chunkserver): Protect IO performance during massive delete
This PR further reduces trash-GC impact on foreground I/O by dynamically throttling expired-file deletion based on observed disk throughput, and improves delete wall-clock time by running removals per-disk in parallel. Key changes: - I/O-aware GC throttling (with floor guard) - Adds GC-specific I/O counters in HddStats and increments them on every read/write accounting path. - In ChunkTrashManagerImpl::collectGarbage(): - Samples and resets GC I/O counters each cycle (exchange(0)). - Normalizes by disk count (gDisks.size() under gDisksMutex) to estimate per-disk pressure. - Maintains max observed per-disk bytes (read/write) to create a stable baseline. - Computes totalIOPercentage and feeds it through an inverted sigmoid (steepness=12, center=0.3) to scale down trashGarbageCollectorBulkSize under load. - Adds a minimum threshold: only perform expired-file deletion when the scaled bulk size is >= 5, preventing “death by a thousand tiny deletes” during heavy I/O. - Parallel trash removal across disks -removeTrashFiles() now spawns a std::jthread per disk and delegates actual deletion/index updates to removeTrashFilesFromDisk() allowing independent disks to progress concurrently. - Behavioral defaults - CHUNK_TRASH_ENABLED default flips 0 → 1 (trash manager enabled unless explicitly disabled). - Default retention (kDefaultTrashTimeLimitSeconds) flips 259200 → 0 (immediate expiry unless configured). - Changed documentation accordingly. - Resource thread pacing - hddFreeResourcesThread() loop period increases 2s → 3s, replacing sleep() with Timeout + usleep(timeout.remaining_us()) for more precise loop cadence. - Changed the functionality of implMutex for protecting only the getter and setter of the current GC implementation. - Test dependency: - Disabled GC for rebalancing test, mainly because chunks in trash affect disk usage statistics. -Added more sleep time to sfschunkserver_check_no_buffer_in_use() given that GC cycles now take more time. Rationale: Large delete storms can saturate disks and degrade chunkserver latency. This patch makes the garbage collector self-throttling using real I/O signals and avoids thrashing by skipping deletion work when the system is already busy, while parallelizing deletes per disk to reclaim space faster when bandwidth is available. Signed-off-by: GigaCronos <jorge.cabrera@leil.io>
1 parent 2483a82 commit f042ea5

16 files changed

+113
-40
lines changed

doc/sfschunkserver.cfg.5.adoc

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -186,7 +186,7 @@ defined at build time), /usr/local/lib/saunafs/plugins/chunkserver, and
186186

187187
*CHUNK_TRASH_ENABLED (EXPERIMENTAL)*:: enables or disables the chunk trash
188188
feature. When enabled, deleted chunks are moved to a trash directory instead of
189-
being immediately removed. (Default: 0)
189+
being immediately removed. (Default: 1)
190190

191191
*CHUNK_TRASH_EXPIRATION_SECONDS (EXPERIMENTAL)*:: specifies the timeout in
192192
seconds for chunks to remain in the trash before being permanently deleted.
@@ -198,9 +198,11 @@ threshold, the system will start deleting older chunks from the trash to free
198198
up space. A suggested value is about the 15% of the capacity of the smaller
199199
configured disk. (Default: 0)
200200

201-
*CHUNK_TRASH_GC_BATCH_SIZE (EXPERIMENTAL)*:: defines the bulk size for the
202-
garbage collector when processing chunks in the trash. This determines how many
203-
files are processed in each garbage collection cycle. (Default: 1000)
201+
*CHUNK_TRASH_GC_BATCH_SIZE (EXPERIMENTAL)*:: defines the maximum bulk size
202+
per disk for the garbage collector when processing chunks in the trash.
203+
This value is used when no I/O pressure is detected and is dynamically
204+
scaled down based on current disk load.
205+
(Default: 1000)
204206

205207
*CHUNK_TRASH_GC_SPACE_RECOVERY_BATCH_SIZE (EXPERIMENTAL)*:: [ADVANCED] The
206208
number of files to remove from the trash in a single GC cycle, in case the disk

src/admin/dump_config_command.cc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -176,7 +176,7 @@ const static std::unordered_map<std::string, std::string> defaultOptionsCS = {
176176
{"REPLICATION_CONNECTION_TIMEOUT_MS", "1000"},
177177
{"REPLICATION_WAVE_TIMEOUT_MS", "500"},
178178
{"PLUGINS_DIR", ""},
179-
{"CHUNK_TRASH_ENABLED", "0"},
179+
{"CHUNK_TRASH_ENABLED", "1"},
180180
{"CHUNK_TRASH_EXPIRATION_SECONDS", "259200"},
181181
{"CHUNK_TRASH_FREE_SPACE_THRESHOLD_GB", "0"},
182182
{"CHUNK_TRASH_GC_BATCH_SIZE", "1000"},

src/chunkserver/chunkserver-common/chunk_trash_index.cc

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -80,10 +80,8 @@ ChunkTrashIndex::TrashIndexDiskEntries ChunkTrashIndex::getExpiredFilesInternal(
8080
const time_t &timeLimit, size_t bulkSize) {
8181
TrashIndexDiskEntries expiredFiles;
8282

83-
size_t count = 0;
8483
for (const auto &diskEntry : trashIndex) {
85-
if (bulkSize != 0 && count >= bulkSize) { break; }
86-
count += getExpiredFilesInternal(diskEntry.first, timeLimit, expiredFiles, bulkSize);
84+
getExpiredFilesInternal(diskEntry.first, timeLimit, expiredFiles, bulkSize);
8785
}
8886

8987
return expiredFiles;

src/chunkserver/chunkserver-common/chunk_trash_manager.cc

Lines changed: 22 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -54,9 +54,11 @@ int ChunkTrashManager::moveToTrash(const std::filesystem::path &filePath,
5454
if (!isEnabled) { return 0; }
5555

5656
// Protect against concurrent access
57-
std::lock_guard<std::mutex> lock(implMutex);
58-
59-
auto &impl = getImpl();
57+
ImplementationPtr impl;
58+
{
59+
std::lock_guard<std::mutex> lock(implMutex);
60+
impl = getImpl();
61+
}
6062
if (!impl) {
6163
safs::log_error_code(SAUNAFS_ERROR_EINVAL,
6264
"ChunkTrashManager implementation not initialized");
@@ -69,9 +71,11 @@ int ChunkTrashManager::init(const std::string &diskPath) {
6971
reloadConfig();
7072

7173
// Protect against concurrent access
72-
std::lock_guard<std::mutex> lock(implMutex);
73-
74-
auto &impl = getImpl();
74+
ImplementationPtr impl;
75+
{
76+
std::lock_guard<std::mutex> lock(implMutex);
77+
impl = getImpl();
78+
}
7579
if (!impl) {
7680
safs::log_error_code(SAUNAFS_ERROR_EINVAL,
7781
"ChunkTrashManager implementation not initialized");
@@ -84,9 +88,11 @@ void ChunkTrashManager::collectGarbage() {
8488
if (!isEnabled) { return; }
8589

8690
// Protect against concurrent access
87-
std::lock_guard<std::mutex> lock(implMutex);
88-
89-
auto &impl = getImpl();
91+
ImplementationPtr impl;
92+
{
93+
std::lock_guard<std::mutex> lock(implMutex);
94+
impl = getImpl();
95+
}
9096
if (!impl) {
9197
safs::log_error_code(SAUNAFS_ERROR_EINVAL,
9298
"ChunkTrashManager implementation not initialized");
@@ -96,17 +102,19 @@ void ChunkTrashManager::collectGarbage() {
96102
}
97103

98104
void ChunkTrashManager::reloadConfig() {
99-
// Protect against concurrent access
100-
std::lock_guard<std::mutex> lock(implMutex);
101-
102-
auto &impl = getImpl();
105+
// Protect against concurrent access
106+
ImplementationPtr impl;
107+
{
108+
std::lock_guard<std::mutex> lock(implMutex);
109+
impl = getImpl();
110+
}
103111
if (!impl) {
104112
safs::log_error_code(SAUNAFS_ERROR_EINVAL,
105113
"ChunkTrashManager implementation not initialized");
106114
return;
107115
}
108116

109-
isEnabled = cfg_get("CHUNK_TRASH_ENABLED", static_cast<u_short>(0));
117+
isEnabled = cfg_get("CHUNK_TRASH_ENABLED", static_cast<u_short>(1));
110118
safs::log_info("Chunk trash manager is {}", isEnabled ? "enabled" : "disabled");
111119
impl->reloadConfig();
112120
}

src/chunkserver/chunkserver-common/chunk_trash_manager_impl.cc

Lines changed: 48 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -20,11 +20,13 @@
2020

2121
#include <sys/statvfs.h>
2222
#include <algorithm>
23+
#include <cstdint>
2324
#include <ctime>
2425

2526
#include "chunkserver-common/chunk_trash_manager_impl.h"
2627
#include "config/cfg.h"
2728
#include "errors/saunafs_error_codes.h"
29+
#include "global_shared_resources.h"
2830
#include "hdd_stats.h"
2931
#include "slogger/slogger.h"
3032

@@ -36,6 +38,9 @@ size_t ChunkTrashManagerImpl::trashGarbageCollectorBulkSize = kDefaultTrashGarba
3638
size_t ChunkTrashManagerImpl::garbageCollectorSpaceRecoveryStep =
3739
kDefaultGarbageCollectorSpaceRecoveryStep;
3840

41+
uint64_t ChunkTrashManagerImpl::maxBytesReadPerDisk = 1024 * 1024; //1MiB
42+
uint64_t ChunkTrashManagerImpl::maxBytesWritePerDisk = 1024 * 1024; //1MiB
43+
3944
const std::string ChunkTrashManagerImpl::kTrashGuardString =
4045
std::string("/") + ChunkTrashManager::kTrashDirname + "/";
4146

@@ -159,12 +164,19 @@ int ChunkTrashManagerImpl::moveToTrash(const fs::path &filePath, const fs::path
159164

160165
void ChunkTrashManagerImpl::removeTrashFiles(
161166
const ChunkTrashIndex::TrashIndexDiskEntries &filesToRemove) const {
167+
std::vector<std::jthread> parallelRemovers;
168+
162169
for (const auto &[diskPath, fileEntries] : filesToRemove) {
163-
for (const auto &fileEntry : fileEntries) {
164-
if (removeFileFromTrash(fileEntry.second) != SAUNAFS_STATUS_OK) { continue; }
170+
parallelRemovers.emplace_back(removeTrashFilesFromDisk, fileEntries, diskPath);
171+
}
172+
}
173+
174+
void ChunkTrashManagerImpl::removeTrashFilesFromDisk(
175+
const ChunkTrashIndex::TrashIndexFileEntries &filesToRemove, const std::string &diskPath) {
176+
for (const auto &fileEntry : filesToRemove) {
177+
if (removeFileFromTrash(fileEntry.second) != SAUNAFS_STATUS_OK) { continue; }
165178
HddStats::gStatsOperationsGCPurge++;
166-
getTrashIndex().remove(fileEntry.first, fileEntry.second, diskPath);
167-
}
179+
getTrashIndex().remove(fileEntry.first, fileEntry.second, diskPath);
168180
}
169181
}
170182

@@ -256,7 +268,38 @@ void ChunkTrashManagerImpl::collectGarbage() {
256268
if (!ChunkTrashManager::isEnabled) { return; }
257269
std::time_t const currentTime = std::time(nullptr);
258270
std::time_t const expirationTime = currentTime - trashTimeLimitSeconds;
259-
removeExpiredFiles(expirationTime, trashGarbageCollectorBulkSize);
271+
uint64_t currentBytesWrite = HddStats::gBytesWrittenSinceLastGCSweep.exchange(0);
272+
uint64_t currentBytesRead = HddStats::gBytesReadSinceLastGCSweep.exchange(0);
273+
uint64_t currentDiskCount = 1;
274+
{
275+
std::lock_guard disksLockGuard(gDisksMutex);
276+
currentDiskCount = std::max(currentDiskCount, gDisks.size());
277+
}
278+
currentBytesRead /= currentDiskCount;
279+
currentBytesWrite /= currentDiskCount;
280+
maxBytesReadPerDisk = std::max(maxBytesReadPerDisk, currentBytesRead);
281+
maxBytesWritePerDisk = std::max(maxBytesWritePerDisk, currentBytesWrite);
282+
283+
double totalIOPercentage =
284+
static_cast<double>(currentBytesRead) * 100.0 / maxBytesReadPerDisk +
285+
static_cast<double>(currentBytesWrite) * 100.0 / maxBytesWritePerDisk;
286+
287+
auto invertedSigmoid = [](double val) -> double {
288+
const double steepness = 10.0; // steepness
289+
const double center = 0.15; // center in [0,1]
290+
const double valnorm = val / 100.0; // normalize so that 100% total I/O maps to 1.0
291+
292+
const double res = 1.0 / (1.0 + std::exp(-steepness * (valnorm - center)));
293+
return 1.0 - res;
294+
};
295+
296+
uint64_t bulksizeScaled = trashGarbageCollectorBulkSize * invertedSigmoid(totalIOPercentage);
297+
298+
static constexpr uint64_t kMinGCBulkSizeForActivation = 5;
299+
300+
if (bulksizeScaled >= kMinGCBulkSizeForActivation) {
301+
removeExpiredFiles(expirationTime, bulksizeScaled);
302+
}
260303
makeSpace(availableThresholdGB, garbageCollectorSpaceRecoveryStep);
261304
}
262305

src/chunkserver/chunkserver-common/chunk_trash_manager_impl.h

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -80,6 +80,14 @@ class ChunkTrashManagerImpl : public IChunkTrashManagerImpl {
8080
*/
8181
void removeTrashFiles(const ChunkTrashIndex::TrashIndexDiskEntries &filesToRemove) const;
8282

83+
/**
84+
* @brief Removes a set of specified files from the trash for an specific FileEntries.
85+
* @param filesToRemove The list of files to be permanently deleted.
86+
* @param diskPath The path of the disk of the files.
87+
*/
88+
static void removeTrashFilesFromDisk(
89+
const ChunkTrashIndex::TrashIndexFileEntries &filesToRemove, const std::string &diskPath);
90+
8391
/**
8492
* @brief Checks if a given timestamp string matches the expected format.
8593
* @param timestamp The timestamp string to validate.
@@ -137,6 +145,10 @@ class ChunkTrashManagerImpl : public IChunkTrashManagerImpl {
137145
void reloadConfig() override;
138146

139147
private:
148+
/// Maximum recorded BytesReadPerDisk and BytesWritePerDisk in a GC cycle
149+
static uint64_t maxBytesReadPerDisk;
150+
static uint64_t maxBytesWritePerDisk;
151+
140152
/// Minimum available space threshold (in GB) before triggering garbage
141153
/// collection.
142154
static size_t availableThresholdGB;
@@ -145,7 +157,7 @@ class ChunkTrashManagerImpl : public IChunkTrashManagerImpl {
145157
/// Time limit (in seconds) for files to be considered eligible for
146158
/// deletion.
147159
static size_t trashTimeLimitSeconds;
148-
static constexpr size_t kDefaultTrashTimeLimitSeconds = 259200;
160+
static constexpr size_t kDefaultTrashTimeLimitSeconds = 0;
149161

150162
/// Number of files processed in each bulk operation during garbage
151163
/// collection.

src/chunkserver/chunkserver-common/hdd_stats.cc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,7 @@ static inline void totalRead(IDisk *disk, uint64_t size, MicroSeconds duration)
4242

4343
gStatsTotalOperationsRead++;
4444
gStatsTotalBytesRead += size;
45+
gBytesReadSinceLastGCSweep += size;
4546
gStatsTotalTimeRead += duration;
4647

4748
auto &diskStats = disk->getCurrentStats();
@@ -61,6 +62,7 @@ static inline void totalWrite(IDisk *disk, uint64_t size,
6162

6263
gStatsTotalOperationsWrite++;
6364
gStatsTotalBytesWrite += size;
65+
gBytesWrittenSinceLastGCSweep += size;
6466
gStatsTotalTimeWrite += duration;
6567

6668
auto &diskStats = disk->getCurrentStats();

src/chunkserver/chunkserver-common/hdd_stats.h

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -58,6 +58,12 @@ inline std::atomic<uint32_t> gStatsOperationsDupTrunc(0);
5858

5959
inline std::atomic<uint32_t> gStatsOperationsGCPurge(0);
6060

61+
62+
// This is for internal use of GarbageCollector
63+
inline std::atomic<uint64_t> gBytesWrittenSinceLastGCSweep(0);
64+
inline std::atomic<uint64_t> gBytesReadSinceLastGCSweep(0);
65+
66+
6167
struct statsReport {
6268
statsReport(uint64_t *overBytesRead, uint64_t *overBytesWrite,
6369
uint32_t *overOpsRead, uint32_t *overOpsWrite,

src/chunkserver/hddspacemgr.cc

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2524,22 +2524,22 @@ void hddDisksThread() {
25242524
}
25252525

25262526
void hddFreeResourcesThread() {
2527-
static const int kDelayedStep = 2;
2527+
static const int kDelayedStep = 3;
25282528
static const uint32_t kOldIoBuffersExpirationTimeMs = kDelayedStep * 1000;
25292529
static const int kMaxFreeUnused = 1024;
25302530
TRACETHIS();
25312531

25322532
pthread_setname_np(pthread_self(), "freeResThread");
25332533

25342534
while (!gTerminate) {
2535+
Timeout timeout(std::chrono::microseconds(kDelayedStep * 1'000'000));
25352536
gOpenChunks.freeUnused(eventloop_time(), gChunksMapMutex,
25362537
kMaxFreeUnused);
25372538
ChunkTrashManager::collectGarbage();
25382539
hddReleaseDisksToBeDeleted();
25392540
/// Release buffers older than kDelayedStep seconds
25402541
releaseOldIoBuffers(kOldIoBuffersExpirationTimeMs);
2541-
2542-
sleep(kDelayedStep);
2542+
usleep(timeout.remaining_us());
25432543
}
25442544
}
25452545

src/data/sfschunkserver.cfg.in

Lines changed: 4 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -253,9 +253,10 @@
253253
## the capacity of the smaller configured disk. (Default: 0)
254254
# CHUNK_TRASH_FREE_SPACE_THRESHOLD_GB = 0
255255

256-
## Defines the bulk size for the garbage collector when processing chunks in the
257-
## trash. This determines how many files are processed in each garbage
258-
## collection cycle. (Default: 1000)
256+
## Defines the maximum bulk size per disk for the garbage
257+
## collector when processing chunks in the trash.
258+
## This value is used when no I/O pressure is detected and is dynamically
259+
## scaled down based on current disk load. (Default: 1000)
259260
# CHUNK_TRASH_GC_BATCH_SIZE = 1000
260261

261262
## [ADVANCED] The number of files to remove from the trash in a single GC cycle,

0 commit comments

Comments
 (0)