[M-003] MemoryRouter performance optimization: reduce 260ms overhead to <20ms #734

@rjmurillo-bot

Context

The MemoryRouter module (M-003, ADR-037) was implemented with correct functionality but does not meet the performance targets specified in ADR-037. This issue tracks the optimization work to achieve the target latency.

Current Performance vs Targets

| Metric | ADR-037 Target | Actual | Gap |
|---|---|---|---|
| Serena-only search | <20ms | 477.30ms | 457ms over target |
| Augmented (Serena + Forgetful) | <50ms | Not tested | - |
| Health check (cached) | <1ms | 4.48ms | 3.48ms over target |

Root Cause Analysis

The module adds 260ms overhead compared to the raw benchmark script (217.42ms baseline → 477.30ms with module). Contributing factors identified:

1. SHA-256 Hashing (Estimated: ~100-150ms)

Each matched file has its content hashed for deduplication. With 465 memory files scanned and 9-10 results per query, the following call runs once per matched file:

# Current implementation (executed per file)
[System.Security.Cryptography.SHA256]::HashData([System.Text.Encoding]::UTF8.GetBytes($Content))

2. File Content Reading (Estimated: ~80-100ms)

Full file content is read for every matched file, even when only the filename matched:

# Current: reads all content upfront
$content = Get-Content -Path $file.FullName -Raw -Encoding UTF8

3. Input Validation (Estimated: ~10-20ms)

The ValidatePattern regex is executed on every call:

[ValidatePattern('^[a-zA-Z0-9\s\-.,_()&:]+$')]

4. Object Construction (Estimated: ~30-50ms)

A full PSCustomObject with 6 properties is created for each result.
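
The per-factor figures above are estimates. A minimal sketch of how two of them could be checked in isolation, assuming PowerShell 7.1+ (needed for SHA256.HashData). Measure-AverageMs is a hypothetical helper, not part of the module, and a plain -match only approximates what the ValidatePattern attribute costs during parameter binding:

```powershell
# Hypothetical helper: average wall-clock time of a script block in milliseconds.
function Measure-AverageMs {
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$Iterations = 100
    )

    $null = & $Action   # warm-up run so JIT cost is not counted
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    for ($i = 0; $i -lt $Iterations; $i++) { $null = & $Action }
    $sw.Stop()

    $sw.Elapsed.TotalMilliseconds / $Iterations
}

$query   = 'git hooks validation'
$content = 'x' * 1024   # stand-in for a ~1KB memory file (size quoted in the issue)

$timings = [ordered]@{
    # Same pattern as the ValidatePattern attribute, applied with -match.
    Validation = Measure-AverageMs { $query -match '^[a-zA-Z0-9\s\-.,_()&:]+$' }
    # Same hashing call as the current implementation.
    Hashing    = Measure-AverageMs {
        [System.Security.Cryptography.SHA256]::HashData(
            [System.Text.Encoding]::UTF8.GetBytes($content))
    }
}
$timings
```

Numbers from a harness like this are per-call averages, so they need to be multiplied by the number of calls per query before comparing them with the estimates.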

Proposed Optimizations

High Impact (P0)

  1. Lazy content loading: Don't read file content until explicitly requested

    • Return [Lazy[string]] or callback instead of eager loading
    • Skip SHA-256 hash until merge phase requires it
  2. Filename-only mode: Add -NamesOnly switch for index-style queries

    • Return just name + path without content/hash
    • Sufficient for most agent memory lookups
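
The two P0 items could be sketched together as follows. This is an illustration under assumptions, not the module's code: New-LazyMemoryResult and its property names are hypothetical, [System.IO.File]::ReadAllText is swapped in for Get-Content -Raw, and PowerShell 7.1+ is assumed for HashData and ToHexString.

```powershell
# Hypothetical helper: builds one search result without touching file content.
function New-LazyMemoryResult {
    param(
        [Parameter(Mandatory)][string]$Path,   # pass a full path, e.g. $file.FullName
        [switch]$NamesOnly
    )

    $name = [System.IO.Path]::GetFileNameWithoutExtension($Path)

    if ($NamesOnly) {
        # Index-style query: name + path only, no content and no hash.
        return [pscustomobject]@{ Name = $name; Path = $Path }
    }

    # GetNewClosure() captures $Path so the deferred read targets the right file.
    $reader = {
        [System.IO.File]::ReadAllText($Path, [System.Text.Encoding]::UTF8)
    }.GetNewClosure()
    $content = [Lazy[string]]::new([Func[string]]$reader)

    # The hash is deferred too; it reads the content only when first requested.
    $hasher = {
        $bytes = [System.Text.Encoding]::UTF8.GetBytes($content.Value)
        [System.Convert]::ToHexString(
            [System.Security.Cryptography.SHA256]::HashData($bytes))
    }.GetNewClosure()
    $hash = [Lazy[string]]::new([Func[string]]$hasher)

    [pscustomobject]@{ Name = $name; Path = $Path; Content = $content; Hash = $hash }
}
```

With this shape, a caller that only inspects Name and Path never pays for a file read, while a merge phase would access .Hash.Value (and a consumer .Content.Value) to trigger the work once per file.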

Medium Impact (P1)

  1. Content caching: Cache file contents with modification time check

    • $script:ContentCache = @{} with LastWriteTime validation
    • 465 files × ~1KB avg = ~500KB memory footprint
  2. Pre-computed keyword index: Build index at module load

    • Map keywords → file paths
    • Refresh on -Force or TTL expiry
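
A minimal sketch of both P1 ideas, under stated assumptions: Get-CachedContent and New-KeywordIndex are hypothetical names, the cache entry shape is invented for illustration, and keywords are derived by splitting file names on non-alphanumeric characters, which may not match how the module tokenizes queries.

```powershell
# Hypothetical cache keyed by full path; each entry records the write time it was read at.
$script:ContentCache = @{}

function Get-CachedContent {
    param([Parameter(Mandatory)][string]$Path)

    $stamp = [System.IO.File]::GetLastWriteTimeUtc($Path)
    $entry = $script:ContentCache[$Path]

    if ($null -ne $entry -and $entry.LastWriteTimeUtc -eq $stamp) {
        return $entry.Content   # cache hit: file unchanged since it was cached
    }

    # Cache miss or stale entry: read the file and store it with its timestamp.
    $content = [System.IO.File]::ReadAllText($Path, [System.Text.Encoding]::UTF8)
    $script:ContentCache[$Path] = @{ LastWriteTimeUtc = $stamp; Content = $content }
    $content
}

# Hypothetical index builder: maps each keyword in a file name to the paths containing it.
function New-KeywordIndex {
    param([Parameter(Mandatory)][string[]]$Paths)

    $index = @{}
    foreach ($path in $Paths) {
        $name = [System.IO.Path]::GetFileNameWithoutExtension($path).ToLowerInvariant()
        foreach ($keyword in ($name -split '[^a-z0-9]+' | Where-Object { $_ })) {
            if (-not $index.ContainsKey($keyword)) {
                $index[$keyword] = [System.Collections.Generic.List[string]]::new()
            }
            $index[$keyword].Add($path)
        }
    }
    $index
}
```

The timestamp check costs one metadata lookup per file, which is cheaper than a read but not free; the index avoids even that for filename-only matches, at the price of needing a refresh when files are added or renamed.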

Low Impact (P2)

  1. Health check pre-warming: Call Test-ForgetfulAvailable at module import

    • Reduces first-call latency
    • Current 4.48ms → target <1ms
  2. Batch hashing: Hash multiple files in parallel using runspaces
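
One way the cached health check and its pre-warming might look, as a sketch only. Get-CachedHealth, the cache shape, and the 30-second TTL are assumptions made for illustration; the probe is passed in as a script block because the signature of Test-ForgetfulAvailable is not shown in this issue.

```powershell
# Hypothetical cache: last probe result plus the UTC time at which it expires.
$script:HealthCache = @{ Value = $false; Expires = [datetime]::MinValue }

function Get-CachedHealth {
    param(
        [Parameter(Mandatory)][scriptblock]$Probe,   # e.g. { Test-ForgetfulAvailable }
        [int]$TtlSeconds = 30                        # assumed TTL, not from ADR-037
    )

    $now = [datetime]::UtcNow
    if ($now -lt $script:HealthCache.Expires) {
        return $script:HealthCache.Value   # cached path: no probe is run
    }

    # Expired or never populated: run the probe once and remember the answer.
    $value = [bool](& $Probe)
    $script:HealthCache = @{ Value = $value; Expires = $now.AddSeconds($TtlSeconds) }
    $value
}
```

Pre-warming would then amount to calling Get-CachedHealth once at module import, so that the first real query already takes the cached path.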

Acceptance Criteria

  • Serena-only search: <20ms (currently 477ms)
  • Health check cached: <1ms (currently 4.48ms)
  • Augmented search: <50ms (when Forgetful available)
  • No regression in test coverage (38 tests passing)
  • Memory footprint documented

Validation Approach

Use existing benchmark script to measure before/after:

# Baseline comparison
pwsh scripts/Measure-MemoryPerformance.ps1 -Queries @('memory router', 'git hooks') -Iterations 5

References

  • ADR: .agents/architecture/ADR-037-memory-router-architecture.md
  • Module: scripts/MemoryRouter.psm1
  • Tests: tests/MemoryRouter.Tests.ps1
  • Baseline: .agents/analysis/M-003-baseline.md
  • Validation: .agents/analysis/M-003-performance-validation.md
  • Implementation commit: 59cabcd

Detailed Benchmark Data

Query-by-Query Performance (5 iterations each)

| Query | Average (ms) | Results |
|---|---|---|
| memory router | 490.95 | 9 |
| PowerShell arrays | 392.67 | 9 |
| git hooks validation | 577.45 | 10 |
| session protocol | 411.23 | 10 |
| security patterns | 514.21 | 10 |

Observations:

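  • Queries returning 10 results average ~59ms slower than 9-result queries (500.96ms vs 441.81ms), although individual queries overlap (session protocol, with 10 results, is faster than memory router, with 9)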
  • Variance correlates with result count (more files = more hashing)
  • git hooks validation is slowest due to keyword density in filenames
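
The figures behind these observations can be recomputed from the benchmark table. The snippet below is a small arithmetic check using only the numbers reported in this issue; the variable names are illustrative.

```powershell
# Per-query averages copied from the benchmark table above.
$rows = @(
    [pscustomobject]@{ Query = 'memory router';        AverageMs = 490.95; Results = 9 }
    [pscustomobject]@{ Query = 'PowerShell arrays';    AverageMs = 392.67; Results = 9 }
    [pscustomobject]@{ Query = 'git hooks validation'; AverageMs = 577.45; Results = 10 }
    [pscustomobject]@{ Query = 'session protocol';     AverageMs = 411.23; Results = 10 }
    [pscustomobject]@{ Query = 'security patterns';    AverageMs = 514.21; Results = 10 }
)

# Mean latency across all five queries (the "Actual" Serena-only figure).
$overall = [math]::Round(($rows | Measure-Object AverageMs -Average).Average, 2)

# Mean latency grouped by result count.
$byCount = $rows | Group-Object Results | ForEach-Object {
    [pscustomobject]@{
        Results = [int]$_.Name
        MeanMs  = [math]::Round(($_.Group | Measure-Object AverageMs -Average).Average, 2)
    }
}

$overall
$byCount
```

The overall mean comes out at 477.30ms, matching the reported Serena-only latency. The group means are 441.81ms for 9 results and 500.96ms for 10 results, a gap of about 59ms, so with only five queries the result-count correlation is suggestive rather than conclusive.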

Created from M-003 implementation (Session 126)

Labels

  • agent-memory: Context persistence agent
  • agent-qa: Testing and verification agent
  • agent-security: Security assessment agent
  • area-infrastructure: Build, CI/CD, configuration
  • enhancement: New feature or request
  • priority:P1: Important: Affects user experience significantly, high business value
