fix: optimize parser performance by reusing parser instances #7478
Conversation
- Add global caches for parser instances and loaded languages
- Reuse existing parser instances instead of creating new ones for each file
- Cache loaded WASM language files to avoid redundant loading
- This significantly reduces CPU usage during code indexing

Fixes #7476
Reviewed my own code and found bugs I created 5 minutes ago. Classic.
```typescript
// Reuse parser instance from cache or create new one
let parser: ParserT
if (parserInstanceCache.has(languageName)) {
```
Critical issue: The caching logic has a fundamental flaw. Multiple file extensions map to the same language (e.g., .js, .jsx, .json all use "javascript"), but you're caching by languageName while returning parsers keyed by extension.
This means when processing ["test.js", "test.jsx"], the second iteration will reuse the cached parser but still store it under a different key. More importantly, if different extensions need different parser configurations, this approach won't work.
Consider either:
- Cache by extension instead of language name
- Or ensure the parserKey logic aligns with the caching key
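The mismatch above can be avoided by routing every lookup through one explicit extension-to-language mapping, so the cache key and the reuse logic always agree. A minimal sketch of that idea, assuming hypothetical names (`ParserT`, `extensionToLanguage`, `getParserForExtension`) rather than the actual Roo-Code API:

```typescript
// Hypothetical sketch: cache parsers by language name, with an explicit
// extension -> language mapping so every extension that shares a grammar
// reuses the same cached parser instance.
type ParserT = { language: string }

// Assumed mapping for illustration; the real table would cover all
// supported extensions.
const extensionToLanguage: Record<string, string> = {
	js: "javascript",
	jsx: "javascript",
	ts: "typescript",
}

const parserInstanceCache = new Map<string, ParserT>()

function getParserForExtension(ext: string): ParserT {
	const languageName = extensionToLanguage[ext] ?? ext
	let parser = parserInstanceCache.get(languageName)
	if (!parser) {
		// Stand-in for `new Parser()` + `parser.setLanguage(...)`
		parser = { language: languageName }
		parserInstanceCache.set(languageName, parser)
	}
	return parser
}
```

With this shape, processing `["test.js", "test.jsx"]` touches a single cache entry instead of storing one parser under two keys.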
```typescript
// Create query
query = new Query(language, queryString)
```
Performance concern: While you're caching parser instances and languages, Query objects are still being created fresh on every call. Since Query construction can be expensive (it parses and compiles the query string), should we also cache these?
```typescript
// Global cache for parser instances to avoid recreating them
const parserInstanceCache: Map<string, ParserT> = new Map()
// Global cache for loaded languages to avoid reloading WASM files
const languageCache: Map<string, LanguageT> = new Map()
```
Memory management concern: These caches will grow indefinitely without any eviction policy. For large codebases with many file types, this could lead to memory issues over time.
Consider:
- Adding a maximum cache size with LRU eviction
- Providing a way to clear caches when needed
- Adding cache statistics for monitoring
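The suggested LRU eviction can be built on the fact that a JavaScript `Map` preserves insertion order: re-inserting an entry on access keeps the least-recently-used entry first, ready for eviction. A minimal sketch (the class name and the cap are illustrative, not part of the PR):

```typescript
// Hypothetical bounded cache with LRU eviction, as suggested above.
// A Map iterates in insertion order, so the first key is always the
// least-recently-used one once we refresh entries on access.
class LruCache<V> {
	private map = new Map<string, V>()
	constructor(private maxSize: number) {}

	get(key: string): V | undefined {
		const value = this.map.get(key)
		if (value !== undefined) {
			// Refresh recency by moving the entry to the end
			this.map.delete(key)
			this.map.set(key, value)
		}
		return value
	}

	set(key: string, value: V): void {
		this.map.delete(key)
		this.map.set(key, value)
		if (this.map.size > this.maxSize) {
			// Evict the least-recently-used (first) entry
			const oldest = this.map.keys().next().value as string
			this.map.delete(oldest)
		}
	}

	has(key: string): boolean {
		return this.map.has(key)
	}

	get size(): number {
		return this.map.size
	}
}
```

Swapping `parserInstanceCache` and `languageCache` to this shape would cap memory while keeping hot languages resident; a `clear()` method and hit/miss counters could cover the other two suggestions.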
```typescript
let isParserInitialized = false

// Global cache for parser instances to avoid recreating them
const parserInstanceCache: Map<string, ParserT> = new Map()
```
Thread safety consideration: If this code runs in a multi-threaded environment (workers, etc.), these global Map objects could face race conditions. While JavaScript is single-threaded in most contexts, VS Code extensions can use workers.
Is this a concern for the current architecture?
Closing, see #7476 (comment). This is already done where the languageParser function is used; there's no need to do it again in the actual function.
This PR attempts to address Issue #7476 regarding high CPU usage during code indexing.

Problem

The CPU profile analysis revealed that the parseContent function was being called excessively (32,538 hits), with new Parser instances being created for each file instead of reusing existing ones.

Solution

Implemented a caching strategy to optimize parser performance.

Changes

- Updated src/services/tree-sitter/languageParser.ts to implement parser and language caching

Testing

Performance Impact

This optimization should significantly reduce CPU usage by:

- Reusing existing parser instances instead of creating new ones for each file
- Caching loaded WASM language files to avoid redundant loading

Fixes #7476

Feedback and guidance are welcome!
Important
Optimizes parser performance in languageParser.ts by implementing caching for parser instances and loaded languages, reducing CPU usage.

- Updates languageParser.ts to reuse parser instances and cache loaded languages.
- Adds parserInstanceCache and languageCache to store parser instances and loaded languages.
- loadRequiredLanguageParsers() checks caches before loading languages or creating parsers.

This description was generated automatically for 32fae63.