chore: add worker thread for processWorkspaceFolders #2441
base: main
Conversation
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2441      +/-   ##
==========================================
- Coverage   62.76%   62.58%   -0.19%
==========================================
  Files         266      266
  Lines       59671    59784     +113
  Branches     3844     3837       -7
==========================================
- Hits        37454    37413      -41
- Misses      22142    22296     +154
  Partials       75       75
```
```ts
for (const file of files) {
    const fileExtName = '.' + getFileExtensionName(file)
    if (!uniqueFiles.has(file) && fileExtensions.includes(fileExtName)) {
```
Converting fileExtensions to a Set should speed up this lookup compared to includes().
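For example, a sketch of the suggested change, reusing the names from the quoted diff (files, uniqueFiles, getFileExtensionName) so only the lookup mechanism changes:

```ts
// Build the Set once, outside the loop: Set#has is O(1) per lookup,
// versus the O(n) scan that Array#includes performs for every file.
const fileExtensionSet = new Set(fileExtensions)

for (const file of files) {
    const fileExtName = '.' + getFileExtensionName(file)
    if (!uniqueFiles.has(file) && fileExtensionSet.has(fileExtName)) {
        // ... existing handling of the matched file
    }
}
```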
This is copied from the previous processWorkspaceFolders implementation.
```ts
})

// Wait if too many batches in progress
while (batchesInProgress > 5) {
```
Move these to constants:

```ts
private readonly BATCH_SIZE = 10000;
private readonly MAX_CONCURRENT_BATCHES = 5;
private readonly WORKER_TIMEOUT_MS = 300_000;
```
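With those in place, the throttle loop above could read something like this, inside an async method (the sleep helper is illustrative, not from the PR):

```ts
// Hypothetical async sleep helper.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Throttle: don't dispatch another batch while too many are in flight.
while (batchesInProgress > this.MAX_CONCURRENT_BATCHES) {
    await sleep(50)
}
```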
```ts
tls: false,
http2: false,
buffer: require.resolve('buffer/'),
worker_threads: false,
```
Confused about this... wouldn't this turn off worker threads?
No, the worker thread files are copied to the servers folder by package.sh; we are not putting them in the browser bundle via webpack. This setting resolves a CI failure in the webpack build.
```ts
batchesInProgress--
} else if (msg.type === 'result') {
    clearTimeout(timeout)
    void worker.terminate()
```
Add a try/catch around terminate() to handle failures/errors.
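Something like this, for instance (a sketch; the logging call is illustrative):

```ts
// worker.terminate() returns a Promise in Node's worker_threads, so attach a
// catch (or await it inside try/catch) so a failed teardown doesn't surface
// as an unhandled rejection.
worker.terminate().catch((err) => {
    console.warn('failed to terminate indexing worker', err)
})
```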
```ts
parentPort.postMessage({
    type: 'result',
    data: {
        files: [...uniqueFiles],
```
In the case of 184k files as in the description, would sending a single message carrying all 184k file paths be a problem? Could we stream it instead, or send it in chunks?
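For illustration, a chunked variant might look roughly like this (the chunk size is arbitrary, and uniqueFiles is the set from the quoted diff):

```ts
import { parentPort } from 'worker_threads'

// Sketch: emit the results in fixed-size chunks rather than one message
// carrying all 184k paths, then signal completion so the main thread knows
// when it has reassembled the full list.
const CHUNK_SIZE = 5000 // arbitrary; tune against postMessage serialization cost
const allFiles = [...uniqueFiles]
for (let i = 0; i < allFiles.length; i += CHUNK_SIZE) {
    parentPort?.postMessage({ type: 'chunk', data: { files: allFiles.slice(i, i + CHUNK_SIZE) } })
}
parentPort?.postMessage({ type: 'done' })
```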
Problem
Node.js is single-threaded, and we previously did all processing on the main thread. For large projects (e.g., Chromium with 184k+ files), workspace indexing blocked the main thread indefinitely and never completed. The synchronous file size checks (fs.statSync()) prevented the event loop from processing other operations, causing both file collection and vector library indexing to fail.
Solution
Implemented a worker thread to offload file size checking from the main thread.
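In outline, the pattern looks roughly like this (a simplified sketch, not the exact code in this PR; the file names, message shapes, and size limit are illustrative):

```ts
// worker.ts (sketch): run the blocking fs.statSync calls off the main thread.
import { parentPort, workerData } from 'worker_threads'
import * as fs from 'fs'

const MAX_FILE_SIZE_BYTES = 1024 * 1024 // illustrative limit, not the PR's value

const candidates: string[] = workerData.files
const kept = candidates.filter((f) => {
    try {
        return fs.statSync(f).size <= MAX_FILE_SIZE_BYTES
    } catch {
        return false // unreadable entry: skip it
    }
})
parentPort?.postMessage({ type: 'result', data: { files: kept } })
```

```ts
// Main thread (sketch): spawn the worker and resolve with its result, so the
// event loop stays free to serve other requests while the stat calls run.
import * as path from 'path'
import { Worker } from 'worker_threads'

function checkFileSizes(files: string[]): Promise<string[]> {
    return new Promise((resolve, reject) => {
        const worker = new Worker(path.join(__dirname, 'worker.js'), { workerData: { files } })
        worker.on('message', (msg) => {
            if (msg.type === 'result') {
                void worker.terminate()
                resolve(msg.data.files)
            }
        })
        worker.on('error', reject)
    })
}
```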
Changes
Large projects now complete indexing successfully (previously it never finished). The main thread remains responsive throughout indexing. Both file collection and vector library operations are faster thanks to parallel execution. For small projects, there is also a large improvement in both how quickly the context command is served and how quickly the repomap index is built.
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.