Optimize RegexMatcher and CaseInsensitiveMatcher hot paths #27
Open
Labels
area/perf (Performance, benchmarks), area/search (Search pipeline, matchers), type/refactor (Code improvement without behavior change)
Description
Source: Code quality review 2026-03-31 (Score: 52/100)
Category: Performance Characteristics (3/10)
Priority: 3/5
Problem
RegexMatcher double-match (search.rs:181-200)
let content_str = String::from_utf8_lossy(content); // full-file UTF-8 pass; copies the whole file if any byte is invalid
if !self.re.is_match(&content_str) { return vec![]; } // 1st: full scan
for (i, line) in content_str.lines().enumerate() {
    if self.re.is_match(line) { ... } // 2nd: per-line scan
}
With 5 candidate files (100 KB each): 500 KB string conversion + 500 KB full scan + 500 KB line scan = 1.5 MB of bytes touched per search, a third of which is the redundant full scan. The first is_match on the entire content is redundant: line iteration alone is sufficient.
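A minimal sketch of what a single-pass version might look like, assuming the matcher only needs the 1-based numbers of matching lines. The function name `matching_lines` and the `is_match` closure are hypothetical; the closure stands in for `self.re.is_match` so the example needs nothing beyond the standard library. Each line is validated on its own with `std::str::from_utf8`, and the lossy conversion is used only for lines that actually contain invalid bytes.

```rust
/// Returns the 1-based numbers of lines for which `is_match` returns true.
/// `is_match` is a hypothetical stand-in for `self.re.is_match`.
fn matching_lines<F>(content: &[u8], is_match: F) -> Vec<usize>
where
    F: Fn(&str) -> bool,
{
    let mut hits = Vec::new();
    if content.is_empty() {
        return hits;
    }
    // Drop one trailing newline so no empty final "line" is produced,
    // mirroring what `str::lines` does.
    let body = match content {
        [rest @ .., b'\n'] => rest,
        _ => content,
    };
    for (i, raw) in body.split(|&b| b == b'\n').enumerate() {
        // Treat "\r\n" as a line ending as well.
        let raw = match raw {
            [rest @ .., b'\r'] => rest,
            _ => raw,
        };
        // Validate just this line. Valid lines are borrowed with no copy;
        // only a line with invalid bytes pays for a lossy conversion.
        let matched = match std::str::from_utf8(raw) {
            Ok(line) => is_match(line),
            Err(_) => is_match(&String::from_utf8_lossy(raw)),
        };
        if matched {
            hits.push(i + 1);
        }
    }
    hits
}
```

Called as `matching_lines(content, |line| re.is_match(line))`, this scans each byte once for validation and once for matching, with no whole-file pre-check. Whether that beats the current early rejection on files with no match at all is something the benchmarks would have to show.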
CaseInsensitiveMatcher full copy (search.rs:137-138)
let mut lowered = content.to_vec(); // full file copy
lowered.make_ascii_lowercase();
Every candidate file that passes early rejection gets a full byte copy followed by a lowercase pass.
Proposed Solution
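- Remove the redundant is_match on full content in RegexMatcher
- Replace from_utf8_lossy with per-line from_utf8 so that no full-file String is ever built (from_utf8_lossy already borrows when the input is valid UTF-8, but copies the entire file as soon as one byte is invalid)
- Explore lazy line-by-line lowercasing for CaseInsensitiveMatcher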
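The lazy lowercasing idea could be sketched as follows, assuming the search term is a plain byte string that has already been ASCII-lowercased once up front. The name `matching_lines_ascii_ci` and its signature are hypothetical, not taken from search.rs. Instead of copying the whole file, one scratch buffer is reused for each line, so peak extra memory is bounded by the longest line.

```rust
/// Returns the 1-based numbers of lines containing `needle_lower`,
/// ignoring ASCII case. `needle_lower` is assumed to be lowercased already.
fn matching_lines_ascii_ci(content: &[u8], needle_lower: &[u8]) -> Vec<usize> {
    let mut hits = Vec::new();
    // `windows(0)` would panic, so an empty needle is treated as "no hits"
    // here; the real matcher may want different semantics.
    if needle_lower.is_empty() {
        return hits;
    }
    // One buffer reused across lines; it grows to the longest line seen.
    let mut scratch: Vec<u8> = Vec::new();
    for (i, line) in content.split(|&b| b == b'\n').enumerate() {
        scratch.clear();
        scratch.extend_from_slice(line);
        scratch.make_ascii_lowercase();
        // Naive substring search, kept simple for the sketch.
        if scratch
            .windows(needle_lower.len())
            .any(|w| w == needle_lower)
        {
            hits.push(i + 1);
        }
    }
    hits
}
```

The window comparison is quadratic in the worst case, so a real implementation would likely pair the per-line buffer with whatever substring search the matcher already uses. The sketch only illustrates where the copy moves to.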
Why Now
This should be done immediately after the benchmark infrastructure (#2) lands, so the impact can be measured quantitatively. Expected improvement: 1.5-2x throughput for regex search.
Estimated effort: 2 hours