Parallelize forest parsing for faster initial load #758

@polarmutex

Description

Problem

Initial forest parsing processes included files sequentially in a single-threaded loop.

Affected file:

  • crates/lsp/src/forest.rs:78-202

Current flow:

while !done {
    for file in to_process.iter() {
        // Parse file (sequential)
        let tree = parser.parse(&text, None).unwrap();
        // Extract data
        let beancount_data = BeancountData::new(&tree, &content);
        // Process includes, add to queue
    }
}

Impact for project with 50 included files:

  • Current: 50 files × ~10ms each = ~500ms initial load (assuming roughly 10ms to read and parse one file)
  • Wasted: on an 8-core machine, 7 cores sit idle while files are parsed one at a time

Solution

Use rayon to parse independent files in parallel:

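use rayon::prelude::*;

// Parse all files in the current batch in parallel.
// map_init creates a Parser per rayon job rather than one per file.
// anyhow::Result is an assumption; substitute the crate's own error type.
let results: Vec<anyhow::Result<_>> = to_process
    .par_iter()
    .map_init(
        || {
            let mut parser = Parser::new();
            parser.set_language(&tree_sitter_beancount::language()).unwrap();
            parser
        },
        |parser, file| {
            // file_cache is shared by all workers, so it must be thread-safe
            // and borrowed immutably (`&mut file_cache` cannot be captured here).
            let text = read_file_cached(file, &file_cache)?;
            // parse returns None only if no language is set or on timeout/cancellation
            let tree = parser.parse(&text, None).unwrap();
            let content = Rope::from_str(&text);
            let beancount_data = BeancountData::new(&tree, &content);

            // Extract include patterns
            let includes = extract_includes(&tree, &text, file);

            Ok((file.clone(), tree, beancount_data, includes))
        },
    )
    .collect();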

// Process results, collect new includes
for result in results {
    // Send to main thread, update state
    // Add discovered includes to next batch
}
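
The batch loop around the parallel step could look roughly like the sketch below. It uses only the standard library (std::thread::scope stands in for rayon so the example is self-contained), and `discover_forest` and `includes_of` are hypothetical names: `includes_of` stands in for "parse one file and return the files it includes". Deduplication happens on the main thread between batches, so no file is parsed twice.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::thread;

/// Walks the include graph one batch at a time and returns files in discovery order.
/// `includes_of` is a hypothetical stand-in for parsing a file and listing its includes.
fn discover_forest<F>(root: PathBuf, workers: usize, includes_of: F) -> Vec<PathBuf>
where
    F: Fn(&Path) -> Vec<PathBuf> + Sync,
{
    let includes_of = &includes_of;
    let mut seen: HashSet<PathBuf> = HashSet::from([root.clone()]);
    let mut order = Vec::new();
    let mut batch = vec![root];

    while !batch.is_empty() {
        // batch is non-empty here, so chunk_len is at least 1.
        let chunk_len = batch.len().div_ceil(workers.max(1));
        // Files within one batch are independent, so each chunk gets its own thread.
        let discovered: Vec<PathBuf> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .chunks(chunk_len)
                .map(|chunk| {
                    s.spawn(move || {
                        chunk
                            .iter()
                            .flat_map(|file| includes_of(file.as_path()))
                            .collect::<Vec<PathBuf>>()
                    })
                })
                .collect();
            handles
                .into_iter()
                .flat_map(|handle| handle.join().unwrap())
                .collect()
        });

        order.append(&mut batch);
        // Only files that have not been seen before go into the next batch.
        for path in discovered {
            if seen.insert(path.clone()) {
                batch.push(path);
            }
        }
    }
    order
}
```

Because a file is only known once its parent has been parsed, parallelism is limited to the width of each batch; a long chain of single includes would still be processed one file at a time.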

Considerations:

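  • Each worker needs its own Parser (parse takes &mut self, so one Parser cannot be shared across threads)
  • Results come back to the main thread through collect(); a Mutex or channel is only needed if results should be streamed as they complete
  • Progress reporting needs a thread-safe counter (e.g. an AtomicUsize)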
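
For the progress counter, one possible shape is an AtomicUsize that every worker increments after finishing a file. The sketch below is standard-library only (scoped threads instead of rayon), and `parse_with_progress`, `work`, and `report` are hypothetical names rather than anything in the current code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Runs `work` on every file using up to `workers` threads and calls
/// `report(finished, total)` after each one. Returns the number of files completed.
fn parse_with_progress<T: Sync>(
    files: &[T],
    workers: usize,
    work: impl Fn(&T) + Sync,
    report: impl Fn(usize, usize) + Sync,
) -> usize {
    let total = files.len();
    let done = AtomicUsize::new(0);
    // chunks() panics on a chunk size of 0, so clamp to at least 1.
    let chunk_len = total.div_ceil(workers.max(1)).max(1);

    thread::scope(|s| {
        for chunk in files.chunks(chunk_len) {
            let (done, work, report) = (&done, &work, &report);
            s.spawn(move || {
                for file in chunk {
                    work(file);
                    // fetch_add returns the previous value, so +1 is this file's count.
                    let finished = done.fetch_add(1, Ordering::Relaxed) + 1;
                    report(finished, total);
                }
            });
        }
    });
    // All scoped threads have been joined by the time scope() returns.
    done.load(Ordering::Relaxed)
}
```

Each call to `report` sees a distinct `finished` value, but calls may arrive out of order, so a progress reporter should tolerate a lower number arriving after a higher one.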

Expected Impact

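  • Up to 8x faster initial load on an 8-core system (500ms → ~62ms in the ideal case); the real gain is lower when includes are nested, because each batch can only contain files discovered so far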
  • Better CPU utilization
  • Scales with available cores
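
As a rough sanity check on these numbers, a simplified model can estimate the load time per batch. It assumes every file costs the same and ignores scheduling and I/O overhead; `estimated_load_ms` is a hypothetical helper, not part of the codebase.

```rust
/// Estimated wall-clock time in ms when files are parsed batch by batch.
/// Assumes a uniform per-file cost and no scheduling or I/O overhead.
fn estimated_load_ms(batch_sizes: &[u64], cores: u64, per_file_ms: u64) -> u64 {
    batch_sizes
        .iter()
        // Each batch needs ceil(files / cores) rounds of parsing.
        .map(|&files| files.div_ceil(cores.max(1)) * per_file_ms)
        .sum()
}
```

Under this model, 50 files in a single batch on 8 cores come out at 70ms rather than 62ms, since 50 is not a multiple of 8, and a root file followed by 49 includes (`&[1, 49]`) comes out at 80ms. The benchmark in the acceptance criteria would show how close real numbers get.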

Acceptance Criteria

  • Forest parsing uses rayon for parallel processing
  • Each worker thread has its own Parser
  • Progress reporting works correctly with parallel execution
  • File cache is thread-safe (or per-thread)
  • Existing tests pass
  • No race conditions or data races
  • Benchmark showing speedup with multiple files
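
For the thread-safe file cache criterion, one option is a Mutex around the map with contents stored as Arc<str>, so workers share text without copying. This is only a sketch: `FileCache` and `get_or_load` are hypothetical names, and the real cache in forest.rs may be shaped differently.

```rust
use std::collections::HashMap;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};

/// Hypothetical shared cache: file contents keyed by path, usable from many threads.
#[derive(Default)]
struct FileCache {
    entries: Mutex<HashMap<PathBuf, Arc<str>>>,
}

impl FileCache {
    /// Returns the cached text for `path`, calling `load` only on a cache miss.
    fn get_or_load(
        &self,
        path: &Path,
        load: impl FnOnce(&Path) -> io::Result<String>,
    ) -> io::Result<Arc<str>> {
        // The lock guard is a temporary, so it is released at the end of this statement.
        let cached = self.entries.lock().unwrap().get(path).cloned();
        if let Some(text) = cached {
            return Ok(text);
        }
        // Load without holding the lock so slow I/O does not block other workers.
        let text: Arc<str> = load(path)?.into();
        let mut entries = self.entries.lock().unwrap();
        // If another thread inserted first, keep its entry so callers share one copy.
        let shared = entries.entry(path.to_path_buf()).or_insert(text).clone();
        Ok(shared)
    }
}
```

The trade-off is that two workers missing on the same path at the same moment may both read the file; only one copy is kept. A per-thread cache would avoid the lock entirely at the cost of duplicate reads.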

Effort Estimate

~4 hours

Priority

Medium - High impact but only affects initial load, not ongoing usage

Notes

  • Consider using ThreadPoolBuilder to limit max threads
  • Need to handle errors gracefully from parallel workers
  • Progress reporting might need adjustment for concurrent updates
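
For the error-handling note, one approach is to keep a Result per file and split successes from failures after the parallel step, so a single unreadable include does not abort the whole load. `split_results` is a hypothetical helper and paths are plain Strings here only to keep the sketch short.

```rust
/// Splits per-file results into parsed values and (path, error) failures.
fn split_results<T, E>(
    results: Vec<(String, Result<T, E>)>,
) -> (Vec<(String, T)>, Vec<(String, E)>) {
    let mut parsed = Vec::new();
    let mut failed = Vec::new();
    for (path, result) in results {
        match result {
            Ok(value) => parsed.push((path, value)),
            Err(err) => failed.push((path, err)),
        }
    }
    (parsed, failed)
}
```

The failures could then be logged or surfaced as diagnostics while the successfully parsed files are still added to the forest.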

Metadata

Assignees

No one assigned

Labels

enhancement: New feature or request
