fix: resolve DirectoryScanner memory leak and improve file limit handling #5785
- Remove codeBlocks accumulation that was causing memory exhaustion
- Fix batch processing bugs where file info was added multiple times per file
- Move totalBlockCount increment outside the block loop to fix a counting bug
- Return an empty codeBlocks array since it's not used by the main orchestrator logic
- Update tests to expect an empty codeBlocks array

This fixes the extension running out of memory during indexing of large codebases. Memory usage should drop from ~500 MB-1 GB to ~10-50 MB for large projects.
- Remove codeBlocks property from IDirectoryScanner interface
- Update scanner implementation to not return codeBlocks
- Update tests to remove codeBlocks assertions
- This completes the memory optimization by eliminating the unused return value

The scanner now only returns stats and totalBlockCount, which are the only values actually used by the orchestrator. This further reduces memory usage and simplifies the interface.
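The two commits above can be illustrated with a minimal sketch. It is not the scanner's actual code: the types, the BATCH_SIZE value, the `scanFiles` function, and the `flushBatch` callback are all hypothetical stand-ins. It shows blocks being handed off in bounded batches instead of accumulated, the block count being added once per file outside the block loop, and a result that carries only stats and totalBlockCount.

```typescript
// Hypothetical, simplified types; the real scanner's types are not shown on this page.
interface CodeBlock {
  filePath: string
  content: string
}

interface ScanResult {
  stats: { processed: number; skipped: number }
  totalBlockCount: number
}

// Assumed small batch size, chosen only to keep the example readable.
const BATCH_SIZE = 2

function scanFiles(
  // `blocks: null` stands in for a file that was skipped.
  files: Array<{ filePath: string; blocks: CodeBlock[] | null }>,
  flushBatch: (batch: CodeBlock[]) => void,
): ScanResult {
  let batch: CodeBlock[] = []
  let totalBlockCount = 0
  let processed = 0
  let skipped = 0

  for (const file of files) {
    if (file.blocks === null) {
      skipped++
      continue
    }
    let fileBlockCount = 0
    for (const block of file.blocks) {
      batch.push(block)
      fileBlockCount++
      if (batch.length >= BATCH_SIZE) {
        flushBatch(batch)
        // Start a fresh array so flushed blocks are no longer referenced here.
        batch = []
      }
    }
    // Count once per file, outside the block loop.
    totalBlockCount += fileBlockCount
    processed++
  }

  // Flush whatever is left; nothing is kept for the return value.
  if (batch.length > 0) flushBatch(batch)
  return { stats: { processed, skipped }, totalBlockCount }
}
```

Because the result contains only counters, the memory held by the scan is bounded by the batch size rather than by the size of the codebase.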
…tant
- Rename MAX_LIST_FILES_LIMIT to MAX_LIST_FILES_LIMIT_CODE_INDEX for clarity
- Increase limit from 3,000 to 50,000 files to handle larger codebases
- This complements the memory leak fixes by allowing proper scanning of enterprise projects
// Add file info once per file (outside the block loop)
if (addedBlocksFromFile) {
  totalBlockCount += fileBlockCount
The update to shared batch accumulators (totalBlockCount and currentBatchFileInfos) is done outside a mutex lock. This could lead to race conditions. Also, consider reusing the cached hash value (avoid calling cacheManager.getHash(filePath) twice).
✅ No security or compliance issues detected. Reviewed everything up to b16dc44.
- Wrap totalBlockCount and currentBatchFileInfos updates in mutex lock to prevent race conditions
- Cache isNewFile result to avoid duplicate cacheManager.getHash() calls
- Ensures thread-safe batch processing in concurrent file parsing
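A rough sketch of the pattern this commit describes, under stated assumptions: `SimpleMutex` is a small promise-chain lock written only for this example (the project's own mutex is not shown on this page), and `cacheManager`, `hashes`, `getHashCalls`, and `recordFile` are hypothetical stand-ins. The point is that the hash lookup happens once per file and the shared accumulators are only touched inside the lock.

```typescript
// Hypothetical promise-chain mutex, written for this sketch only.
class SimpleMutex {
  private tail: Promise<void> = Promise.resolve()

  async runExclusive<T>(fn: () => Promise<T> | T): Promise<T> {
    const previous = this.tail
    let release!: () => void
    // Later callers queue behind this promise until release() is called.
    this.tail = new Promise<void>((resolve) => {
      release = resolve
    })
    await previous
    try {
      return await fn()
    } finally {
      release()
    }
  }
}

// Hypothetical stand-in for the cache manager; counts lookups for illustration.
const hashes: Record<string, string | undefined> = { "old.ts": "abc123" }
let getHashCalls = 0
const cacheManager = {
  getHash(filePath: string): string | undefined {
    getHashCalls++
    return hashes[filePath]
  },
}

interface FileInfo {
  filePath: string
  isNew: boolean
}

const mutex = new SimpleMutex()
let totalBlockCount = 0
const currentBatchFileInfos: FileInfo[] = []

async function recordFile(filePath: string, fileBlockCount: number): Promise<void> {
  // Look the hash up once and reuse the result (assumes "no hash" means "new file").
  const isNewFile = cacheManager.getHash(filePath) === undefined

  // Shared accumulators are updated only while holding the lock.
  await mutex.runExclusive(() => {
    totalBlockCount += fileBlockCount
    currentBatchFileInfos.push({ filePath, isNew: isNewFile })
  })
}
```

In a single-threaded runtime the lock matters once the critical section itself contains an `await`, for example when a full batch is flushed while other files are still being parsed.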
- Convert activeBatchPromises from Array to Set for efficient removal
- Clean up completed promises immediately after they finish
- Remove unnecessary Array.from() when passing Set to Promise.all
- Prevents unbounded growth of promise references during large scans
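The promise-tracking change can be sketched as follows. This is an illustration, not the scanner's code: `runWithLimit`, `maxConcurrent`, and the use of Promise.race for back-pressure are assumptions made for the example. Each promise removes itself from the Set as soon as it settles, and the Set is passed straight to Promise.all because it is iterable.

```typescript
// Hypothetical helper: runs tasks while tracking only in-flight promises.
// Returns the peak number of tracked promises so the bound can be checked.
async function runWithLimit(
  tasks: Array<() => Promise<void>>,
  maxConcurrent: number,
): Promise<number> {
  const activePromises = new Set<Promise<void>>()
  let peak = 0

  for (const task of tasks) {
    // The promise deletes itself from the Set once it settles,
    // so completed work is not kept alive by a stale reference.
    const promise: Promise<void> = task().finally(() => {
      activePromises.delete(promise)
    })
    activePromises.add(promise)
    peak = Math.max(peak, activePromises.size)

    // Assumed back-pressure: wait for any in-flight task before adding more.
    if (activePromises.size >= maxConcurrent) {
      await Promise.race(activePromises)
    }
  }

  // A Set is iterable, so no Array.from() is needed here.
  await Promise.all(activePromises)
  return peak
}
```

Deleting from a Set is a constant-time operation, whereas removing a specific promise from an array requires a search, which is the efficiency the first bullet refers to.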
* main:
  fix: Resolve confusing auto-approve checkbox states (RooCodeInc#5602)
  fix: prevent empty mode names from being saved (RooCodeInc#5766) (RooCodeInc#5794)
  Format time in ISO 8601 (RooCodeInc#5793)
  fix: resolve DirectoryScanner memory leak and improve file limit handling (RooCodeInc#5785)
  Fix settings dirty check (RooCodeInc#5779)
  feat: increase Ollama API timeout values and extract as constants (RooCodeInc#5778)
  fix: Exclude Terraform and Terragrunt cache directories from checkpoints (RooCodeInc#4601) (RooCodeInc#5750)
  Move less commonly used provider settings into an advanced dropdown (RooCodeInc#5762)
  feat: Add configurable error & repetition limit with unified control (RooCodeInc#5654) (RooCodeInc#5752)
  list-files must include at least the first-level directory contents (RooCodeInc#5303)
  Update evals repo link (RooCodeInc#5758)
  Feature/vertex ai model name conversion (RooCodeInc#5728)
  fix(litellm): handle baseurl with paths correctly (RooCodeInc#5697)
  Add telemetry for todos (RooCodeInc#5746)
  feat: add undo functionality for enhance prompt feature (fixes RooCodeInc#5741) (RooCodeInc#5742)
  Fix max_tokens limit for moonshotai/kimi-k2-instruct on Groq (RooCodeInc#5740)
  Changeset version bump (RooCodeInc#5735)
  Add changeset for v3.23.12 patch release (RooCodeInc#5734)
  Update the max-token calculation in model-params to use the shared logic (RooCodeInc#5720)
  Changeset version bump (RooCodeInc#5719)
  chore: add changeset for v3.23.11 patch release (RooCodeInc#5718)
  Add Kimi K2 model and better support (RooCodeInc#5717)
  Fix: Remove invalid skip-checkout parameter from GitHub Actions workflows (RooCodeInc#5676)
  feat: add Cmd+Shift+. keyboard shortcut for previous mode switching (RooCodeInc#5695)
  Changeset version bump (RooCodeInc#5708)
  chore: add changeset for v3.23.10 patch release (RooCodeInc#5707)
  Add padding to the index model options (RooCodeInc#5706)
  fix: prioritize built-in model dimensions over custom dimensions (RooCodeInc#5705)
  Update CHANGELOG.md
  Changeset version bump (RooCodeInc#5702)
  chore: add changeset for v3.23.9 patch release (RooCodeInc#5701)
  Tweaks to command timeout error (RooCodeInc#5700)
  Update contributors list (RooCodeInc#5639)
  feat: enable Claude Code provider to run natively on Windows (RooCodeInc#5615)
  feat: Add configurable timeout for command execution (RooCodeInc#5668)
  feat: add gemini-embedding-001 model to code-index service (RooCodeInc#5698)
  fix: resolve vector dimension mismatch error when switching embedding models (RooCodeInc#5616) (RooCodeInc#5617)
  fix: [5424] return the cwd in the exec tool's response so that the model is not lost after subsequent calls (RooCodeInc#5667)
  Changeset version bump (RooCodeInc#5670)
  chore: add changeset for v3.23.8 patch release (RooCodeInc#5669)
Closes #5642
Closes #5516
Closes #5763
Fixes critical memory leak in DirectoryScanner that was causing out-of-memory issues when indexing large codebases.
Key Changes:
Impact:
All tests pass and functionality is preserved.
Important
Fixes memory leak in DirectoryScanner, increases file limit, and simplifies interface by removing codeBlocks accumulation and return value.
- Fixes memory leak in DirectoryScanner by removing codeBlocks accumulation in scanner.ts.
- Increases MAX_LIST_FILES_LIMIT_CODE_INDEX from 3,000 to 50,000 in constants/index.ts.
- Removes codeBlocks return value from scanDirectory() in file-processor.ts and scanner.ts.
- Updates scanner.spec.ts to reflect removal of codeBlocks and verify processing without it.
- … scanner.ts.

This description was created for b16dc44.