Optimize RegexMatcher and CaseInsensitiveMatcher hot paths #27

@momokun7

Source: Code quality review 2026-03-31 (Score: 52/100)
Category: Performance Characteristics (3/10)
Priority: 3/5

Problem

RegexMatcher double-match (search.rs:181-200)

let content_str = String::from_utf8_lossy(content);  // validates whole file; copies only if invalid UTF-8
if !self.re.is_match(&content_str) { return vec![]; } // 1st: full scan
for (i, line) in content_str.lines().enumerate() {
    if self.re.is_match(line) { ... }                  // 2nd: per-line scan
}

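With 5 candidate files (100KB each) that all contain a match: 500KB of UTF-8 conversion + 500KB full scan + 500KB line scan = 1.5MB processed for 500KB of input. The first is_match on the entire content is redundant for such files, since line iteration alone finds every hit; it only pays off as an early exit for files with no match at all. Note that the two checks can disagree for anchored patterns (^ or $ without multi-line mode match only at the start or end of the whole content), so removing the pre-check may change results for those patterns.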
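A single-pass version of the loop above could look roughly like the sketch below. It is not the code in search.rs: `matching_lines` is a hypothetical helper, the return type is assumed, and the matcher is passed as a closure so the example compiles with the standard library alone. With the regex crate the closure would be something like `|line| re.is_match(line)`.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not from search.rs): returns the 1-based line number
/// and text of every line accepted by `is_match`, scanning the buffer once.
pub fn matching_lines<F>(content: &[u8], is_match: F) -> Vec<(usize, String)>
where
    F: Fn(&str) -> bool,
{
    let mut hits = Vec::new();
    if content.is_empty() {
        return hits;
    }
    // Drop one trailing newline so we do not report a phantom empty last line,
    // mirroring how `str::lines` behaves.
    let content = content.strip_suffix(b"\n").unwrap_or(content);
    for (i, raw) in content.split(|&b| b == b'\n').enumerate() {
        // Treat CRLF line endings like `str::lines` does.
        let raw = raw.strip_suffix(b"\r").unwrap_or(raw);
        // Borrow when the line is valid UTF-8; otherwise make a lossy copy of
        // this one line only, not of the whole file.
        let line: Cow<'_, str> = match std::str::from_utf8(raw) {
            Ok(s) => Cow::Borrowed(s),
            Err(_) => String::from_utf8_lossy(raw),
        };
        if is_match(&*line) {
            hits.push((i + 1, line.into_owned()));
        }
    }
    hits
}
```

Splitting on the byte `\n` is safe for UTF-8 input because that byte never appears inside a multi-byte sequence. Whether this beats the current code for files without any match depends on per-call matcher overhead, so it is worth confirming with the benchmarks from #2.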

CaseInsensitiveMatcher full copy (search.rs:137-138)

let mut lowered = content.to_vec();  // full file copy
lowered.make_ascii_lowercase();

Every candidate file that passes early rejection is copied in full and then lowercased before it is searched.
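One way to avoid the file-sized copy is to lowercase one line at a time into a reused scratch buffer. The sketch below is only an illustration under assumptions: `lines_containing_ignore_ascii_case` is a hypothetical name, the return type is assumed, and the naive `windows` comparison stands in for whatever substring search the real matcher uses.

```rust
/// Hypothetical helper (not from search.rs): returns the 1-based numbers of
/// lines that contain `needle_lower`, ignoring ASCII case.
/// `needle_lower` is assumed to be ASCII-lowercased by the caller.
pub fn lines_containing_ignore_ascii_case(content: &[u8], needle_lower: &[u8]) -> Vec<usize> {
    let mut hits = Vec::new();
    if content.is_empty() || needle_lower.is_empty() {
        return hits;
    }
    // Drop one trailing newline so no phantom empty last line is produced.
    let content = content.strip_suffix(b"\n").unwrap_or(content);
    // Reused across lines: grows to the longest line, never to the file size.
    let mut scratch: Vec<u8> = Vec::new();
    for (i, line) in content.split(|&b| b == b'\n').enumerate() {
        // A line shorter than the needle cannot contain it.
        if line.len() < needle_lower.len() {
            continue;
        }
        scratch.clear();
        scratch.extend_from_slice(line);
        scratch.make_ascii_lowercase();
        // Naive substring search; a placeholder for a faster searcher.
        if scratch.windows(needle_lower.len()).any(|w| w == needle_lower) {
            hits.push(i + 1);
        }
    }
    hits
}
```

This still touches every byte once, so the saving is the file-sized allocation rather than the copy work itself; the actual gain should be measured before committing to the approach.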

Proposed Solution

  1. Remove the redundant is_match on full content in RegexMatcher
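  2. Replace the whole-file from_utf8_lossy with per-line from_utf8, so invalid bytes cost a per-line fallback instead of a full-file copy (from_utf8_lossy already borrows without allocating when the input is valid UTF-8)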
  3. Explore lazy line-by-line lowercasing for CaseInsensitiveMatcher

Why Now

This should be done immediately after the benchmark infrastructure (#2) lands, so the impact can be measured quantitatively. Expected improvement: 1.5-2x throughput for regex search.

Estimated effort: 2 hours

Metadata

Labels

area/perf (Performance, benchmarks), area/search (Search pipeline, matchers), type/refactor (Code improvement without behavior change)
