Skip to content

Conversation

gagik
Copy link
Collaborator

@gagik gagik commented Feb 6, 2025

Adds retentionGB which enforces maximal GB of log files which are kept.

@gagik gagik requested a review from alenakhineika February 6, 2025 13:19
@gagik gagik changed the title feat(mongodb-log-writer): add logRetentionGB configuration MONGOSH-1985 feat(mongodb-log-writer): add retentionGB configuration MONGOSH-1985 Feb 7, 2025
@gagik gagik force-pushed the gagik/log-writer-gb-retention branch from d7a3e25 to 3f326b0 Compare February 7, 2025 12:02
@gagik gagik force-pushed the gagik/log-writer-gb-retention branch from 3f326b0 to 6e4e929 Compare February 7, 2025 12:03
leastRecentFileHeap.size() > this._options.maxLogFileCount;

if (reachedMaxStorageSize || reachedMaxFileCount) {
toDelete = leastRecentFileHeap.pop();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So ... took some time to think this through, but the problem with this approach is that we'll generally want to delete the oldest files that are necessary to ensure that max file count + max file size limitations

But with this approach, that's no longer the case – you could end up with e.g. the most recent file being deleted, if it happens to be the first file we read from the directory and it's already over the size limit – right?

(I'm not sure if we need to let go of the heap as the underlying data structure, certainly wouldn't be the end of the world, but I'm pretty sure that the "read one file, check conditions, maybe delete one file" approach won't work anymore)

Copy link
Collaborator Author

@gagik gagik Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah, we could get rid of the heap, sort by creation date (which would be same as the log's name?) from the getgo and adjust the logic accordingly

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which would be same as the log's name?

Yes, that's an assumption I'd still keep around 👍

@gagik gagik force-pushed the gagik/log-writer-gb-retention branch from 8b8c74e to b008854 Compare February 7, 2025 15:46
@@ -17,9 +16,11 @@ interface MongoLogOptions {
retentionDays: number;
/** The maximal number of log files which are kept. */
maxLogFileCount?: number;
/** A handler for warnings related to a specific filesystem path. */
onerror: (err: Error, path: string) => unknown | Promise<void>;
/** The maximal GB of log files which are kept. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** The maximal GB of log files which are kept. */
/** The maximal size of log files which are kept. */

Comment on lines 50 to 51
const files = await fs.readdir(dir, { withFileTypes: true });
for (const file of files) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not .opendir(), which properly streams, if we're iterating it in a loop anyway?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, I originally had a reduce-based logic (which is also how the sort got lost...) which is why I switched to read, will switch back

}

sortedLogFiles.push({ fullPath, id, size });
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This being a loop means we should be checking maxDurationMs now here as well

It's also a bit unfortunate that if this part takes longer than the maximum duration, we now don't end up deleting files even if we have already identified that they should be deleted regardless of that (e.g. through the expiration time setting or the max file count setting)

? this._options.retentionGB * 1024 * 1024 * 1024
: Infinity;

for await (const { id, fullPath } of [...sortedLogFiles]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused – Where is sortedLogFiles being sorted?

@gagik gagik force-pushed the gagik/log-writer-gb-retention branch 3 times, most recently from 5bc4cb7 to b219436 Compare February 10, 2025 13:49
}
}

if (this._options.retentionGB) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should address most of the concerns.
Not sure what other issues might be with fs.stat worth handling

Includes test for random OS-based file order as well as mixing different settings together.
@gagik gagik force-pushed the gagik/log-writer-gb-retention branch from b219436 to 99afade Compare February 10, 2025 13:53
@gagik gagik force-pushed the gagik/log-writer-gb-retention branch from 96537b0 to 9f3c1bf Compare February 10, 2025 14:55
@gagik gagik merged commit 4448d1b into main Feb 10, 2025
6 checks passed
@gagik gagik deleted the gagik/log-writer-gb-retention branch February 10, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants