@XiaoxuanLu
Contributor

Problem

Node runs JavaScript on a single thread, and previously we used the main thread for all processing. For large projects (e.g., Chromium with 184k+ files), workspace indexing blocked the main thread indefinitely and never completed. The synchronous file size checks (fs.statSync()) prevented the event loop from processing other operations, causing both file collection and vector library indexing to fail.

Solution

Implemented a worker thread to offload file size checking from the main thread:

  • Worker Thread - Handles fs.statSync() calls in a separate thread, preventing main thread blocking
  • Batch Processing - Sends files to the worker in batches of 10k to avoid stack overflow
  • Parallel Execution - File processing and vector library operations now run simultaneously without blocking each other
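The batch-and-check step described above could look roughly like the following. This is a minimal sketch, not the code in fileProcessingWorker.js: it assumes the worker's job is to drop files above a size limit, and the names `toBatches`, `filterBySize`, and the `maxBytes` parameter are hypothetical. Slicing into fixed-size arrays also avoids spreading one huge array into a single call (e.g. `push(...files)`), which is one plausible way to hit "Maximum call stack size exceeded".

```typescript
import { statSync } from 'node:fs'

// Split a large list into fixed-size batches (the PR uses 10k per batch).
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
    if (batchSize <= 0) throw new RangeError('batchSize must be positive')
    const batches: T[][] = []
    for (let i = 0; i < items.length; i += batchSize) {
        batches.push(items.slice(i, i + batchSize))
    }
    return batches
}

// Hypothetical per-batch job: keep files whose size is at most maxBytes.
// statSync blocks the calling thread, which is why this belongs in a worker.
function filterBySize(files: readonly string[], maxBytes: number): string[] {
    const kept: string[] = []
    for (const file of files) {
        try {
            if (statSync(file).size <= maxBytes) kept.push(file)
        } catch {
            // Missing or unreadable files are skipped instead of failing the batch.
        }
    }
    return kept
}
```

Inside a worker, each batch would be handled as something like `toBatches(allFiles, 10_000).map(b => filterBySize(b, limit))`, with `limit` being whatever size cap the indexer uses.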

Changes

  • Added: server/aws-lsp-codewhisperer/src/shared/fileProcessingWorker.js - Worker thread implementation
  • Modified: localProjectContextController.ts - Worker thread integration, with a fallback to synchronous processing if the worker file is not found
  • Modified: package.sh - Copies the worker file into our bundle
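The "fallback to synchronous processing if worker file not found" behavior could be structured like this. A hedged sketch only: `getFileSizes`, `sizesSync`, and `sizesViaWorker` are hypothetical names, and it assumes the worker script reads `workerData`, posts one array of sizes, and then exits on its own.

```typescript
import { existsSync, statSync } from 'node:fs'
import { Worker } from 'node:worker_threads'

// Synchronous path: blocks the main thread, used only as a fallback.
// Returns -1 for files that cannot be stat'ed.
function sizesSync(files: readonly string[]): number[] {
    return files.map(file => {
        try {
            return statSync(file).size
        } catch {
            return -1
        }
    })
}

// Worker path: assumes the worker posts a single number[] and then exits.
function sizesViaWorker(workerPath: string, files: readonly string[]): Promise<number[]> {
    return new Promise((resolve, reject) => {
        const worker = new Worker(workerPath, { workerData: files })
        worker.once('message', (sizes: number[]) => resolve(sizes))
        worker.once('error', reject)
        worker.once('exit', code => {
            // No-op if the promise already resolved from the 'message' event.
            if (code !== 0) reject(new Error(`worker exited with code ${code}`))
        })
    })
}

async function getFileSizes(workerPath: string, files: readonly string[]): Promise<number[]> {
    if (!existsSync(workerPath)) {
        // Worker file was not packaged (e.g. package.sh did not copy it): do the work inline.
        return sizesSync(files)
    }
    return sizesViaWorker(workerPath, files)
}
```

Checking for the file up front keeps the failure mode explicit: a missing bundle entry degrades to the old blocking behavior instead of crashing indexing.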

Large projects now complete indexing successfully (previously, indexing never finished), and the main thread remains responsive throughout. Both file collection and vector library operations are faster because they run in parallel. Small projects also improve significantly, both in how soon the context command is received and in how quickly the repomap index is built.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@XiaoxuanLu XiaoxuanLu marked this pull request as ready for review October 24, 2025 01:51
@XiaoxuanLu XiaoxuanLu requested a review from a team as a code owner October 24, 2025 01:51
@codecov-commenter

codecov-commenter commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 5.21739% with 109 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.58%. Comparing base (a53d14e) to head (04b9158).

Files with missing lines | Patch % | Lines
...sperer/src/shared/localProjectContextController.ts | 1.80% | 109 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2441      +/-   ##
==========================================
- Coverage   62.76%   62.58%   -0.19%     
==========================================
  Files         266      266              
  Lines       59671    59784     +113     
  Branches     3844     3837       -7     
==========================================
- Hits        37454    37413      -41     
- Misses      22142    22296     +154     
  Partials       75       75              
Flag | Coverage Δ
unittests | 62.58% <5.21%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown.


for (const file of files) {
const fileExtName = '.' + getFileExtensionName(file)
if (!uniqueFiles.has(file) && fileExtensions.includes(fileExtName)) {
Contributor

Converting fileExtensions to a Set should speed up the includes lookup.
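A sketch of the suggestion: build the Set once, then use `Set.prototype.has` (average constant time) instead of `Array.prototype.includes` (a linear scan on every call). The function name `collectFiles` is hypothetical, and `path.extname` stands in for the PR's `getFileExtensionName`; the two may differ on edge cases such as dotfiles (`path.extname('.gitignore')` returns `''`).

```typescript
import * as path from 'node:path'

function collectFiles(files: readonly string[], fileExtensions: readonly string[]): string[] {
    // Built once per call instead of scanning the extension array for every file.
    const extensionSet = new Set(fileExtensions)
    const uniqueFiles = new Set<string>()
    for (const file of files) {
        // path.extname includes the leading dot, e.g. '.ts', matching '.' + ext in the original.
        if (!uniqueFiles.has(file) && extensionSet.has(path.extname(file))) {
            uniqueFiles.add(file)
        }
    }
    return Array.from(uniqueFiles)
}
```

With a handful of extensions the gain per lookup is small, but it is paid once per file, so it adds up across 184k paths.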

Contributor Author

This is copied from the previous processWorkspaceFolders.

})

// Wait if too many batches in progress
while (batchesInProgress > 5) {
Contributor

Move these to constants:

private readonly BATCH_SIZE = 10000;
private readonly MAX_CONCURRENT_BATCHES = 5;
private readonly WORKER_TIMEOUT_MS = 300_000;
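The suggested constants could be put to work roughly as follows. This is a hypothetical sketch (`BatchRunner`, `run`, and `withTimeout` are not from the PR): instead of polling `batchesInProgress` in a `while` loop, it starts at most `MAX_CONCURRENT_BATCHES` "lanes" that each pull the next unclaimed batch, and it bounds each batch by `WORKER_TIMEOUT_MS`.

```typescript
class BatchRunner {
    private readonly BATCH_SIZE = 10_000
    private readonly MAX_CONCURRENT_BATCHES = 5
    private readonly WORKER_TIMEOUT_MS = 300_000

    // Runs processBatch over BATCH_SIZE slices with at most MAX_CONCURRENT_BATCHES in flight.
    async run<T, R>(items: readonly T[], processBatch: (batch: T[]) => Promise<R>): Promise<R[]> {
        const batches: T[][] = []
        for (let i = 0; i < items.length; i += this.BATCH_SIZE) {
            batches.push(items.slice(i, i + this.BATCH_SIZE))
        }
        const results: R[] = new Array(batches.length)
        let next = 0
        // JavaScript runs this on one thread, so claiming an index with next++ cannot race.
        const lane = async (): Promise<void> => {
            while (next < batches.length) {
                const index = next++
                results[index] = await this.withTimeout(processBatch(batches[index]))
            }
        }
        const laneCount = Math.min(this.MAX_CONCURRENT_BATCHES, batches.length)
        await Promise.all(Array.from({ length: laneCount }, lane))
        return results
    }

    // Rejects if a batch takes longer than WORKER_TIMEOUT_MS; always clears the timer.
    private withTimeout<R>(work: Promise<R>): Promise<R> {
        let timer: ReturnType<typeof setTimeout> | undefined
        const timeout = new Promise<never>((_, reject) => {
            timer = setTimeout(() => reject(new Error('batch timed out')), this.WORKER_TIMEOUT_MS)
        })
        return Promise.race([work, timeout]).then(
            value => {
                clearTimeout(timer)
                return value
            },
            err => {
                clearTimeout(timer)
                throw err
            }
        )
    }
}
```

A lane pool keeps concurrency capped without a sleep-and-recheck loop, and results stay in batch order regardless of which batch finishes first.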

tls: false,
http2: false,
buffer: require.resolve('buffer/'),
worker_threads: false,
Contributor

I'm confused about this... wouldn't this turn off worker threads?

Contributor Author

No, the worker thread file is copied to the servers folder by package.sh; we are not putting it in the browser bundle built by webpack. This setting helps resolve a CI failure in the webpack build.
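For context, webpack 5's `resolve.fallback` maps a Node core module either to a browser polyfill or to `false`, which tells webpack to use an empty module instead. So `worker_threads: false` only affects the webpack-built browser bundle; the Node server still loads the worker file that package.sh copies. A minimal config fragment, assuming a TypeScript webpack config (the other entries mirror the PR's):

```typescript
// Hypothetical webpack.config.ts fragment for the browser bundle only.
const config = {
    resolve: {
        fallback: {
            tls: false,
            http2: false,
            // No polyfill for worker_threads in the browser build: webpack substitutes an empty module.
            worker_threads: false,
        },
    },
}

export default config
```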

batchesInProgress--
} else if (msg.type === 'result') {
clearTimeout(timeout)
void worker.terminate()
Contributor

Add a try/catch around terminate() to handle termination failures or errors.
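One detail worth noting: in Node, `Worker.prototype.terminate()` returns a Promise, so a synchronous try/catch around `void worker.terminate()` would not see an asynchronous rejection. Awaiting inside the try block handles both cases. A sketch with a hypothetical `safeTerminate` helper and `log` callback:

```typescript
// Structural type so the helper also accepts test doubles;
// a node:worker_threads Worker satisfies it (terminate() returns Promise<number>).
interface Terminable {
    terminate(): Promise<number>
}

// Never throws or rejects; reports failures through `log` instead.
async function safeTerminate(worker: Terminable, log: (msg: string) => void): Promise<void> {
    try {
        // Awaiting here is what lets the catch block observe an async rejection.
        await worker.terminate()
    } catch (err) {
        log(`worker.terminate() failed: ${err instanceof Error ? err.message : String(err)}`)
    }
}
```

At the call site this would become `void safeTerminate(worker, msg => logger.warn(msg))`, where `logger` stands in for whatever logging the controller already uses.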

parentPort.postMessage({
type: 'result',
data: {
files: [...uniqueFiles],
Contributor

@ashishrp-aws Oct 27, 2025

With 184k files, as in the description, would sending all 184k file paths in a single message work well? Could we stream them instead, or send them in chunks?
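A chunked reply could look like the following. This is a sketch under assumptions: the `ResultMessage` protocol, `postInChunks`, and `ChunkCollector` are hypothetical. Each `postMessage` still structured-clones its payload, so the total data copied is the same; chunking bounds the size of any single message and lets the main thread consume results incrementally.

```typescript
// Hypothetical message protocol for returning results in pieces.
type ResultMessage =
    | { type: 'chunk'; files: string[] }
    | { type: 'done'; total: number }

// Worker side: in a real worker, `post` would be msg => parentPort?.postMessage(msg).
function postInChunks(post: (msg: ResultMessage) => void, files: readonly string[], chunkSize: number): void {
    for (let i = 0; i < files.length; i += chunkSize) {
        post({ type: 'chunk', files: files.slice(i, i + chunkSize) })
    }
    post({ type: 'done', total: files.length })
}

// Main-thread side: feed it each message from worker.on('message', ...).
class ChunkCollector {
    private readonly files: string[] = []

    // Returns the complete list once 'done' arrives, otherwise undefined.
    handle(msg: ResultMessage): string[] | undefined {
        if (msg.type === 'chunk') {
            // Push one at a time rather than push(...chunk) on very large chunks.
            for (const file of msg.files) this.files.push(file)
            return undefined
        }
        if (this.files.length !== msg.total) {
            throw new Error(`expected ${msg.total} files, got ${this.files.length}`)
        }
        return this.files
    }
}
```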

Labels

None yet

Projects

None yet

3 participants