fix(security): reclassify search_code as ToolResult, add PII NER input truncation by bug-ops · Pull Request #2518 · bug-ops/zeph

bug-ops · 2026-03-31T11:27:34Z

Summary

Reclassify search_code as ToolResult (same trust tier as shell/read_file) instead of McpResponse. The tool queries a local AST index built from user-owned code — treating it as remote/untrusted caused false-positive injection flags on Cargo.toml section headers, README badge URLs, and shell examples in code blocks, silently blocking Qdrant memory writes.
Add pii_ner_max_chars config field (default 8192) to ClassifiersConfig / AgentSecurity. PII NER input is now truncated at a valid UTF-8 boundary before backend.classify(). Previously, large search_code outputs produced 150+ DeBERTa chunks, which exceeded the per-call timeout and caused a fallback to regex-only detection.
Add sanitizer_injection_fp_local and pii_ner_timeouts metrics counters.
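
The truncation step described in the second item can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `truncate_at_char_boundary` is hypothetical, and the sketch assumes the limit is applied as a byte budget that is backed off to the nearest character boundary (the summary does not say whether pii_ner_max_chars counts bytes or characters).

```rust
/// Hypothetical helper (not taken from the PR): returns the longest prefix of
/// `input` that is at most `max_len` bytes long and ends on a UTF-8 character
/// boundary, so the result is always a valid `&str`.
fn truncate_at_char_boundary(input: &str, max_len: usize) -> &str {
    if input.len() <= max_len {
        // Short inputs pass through untouched.
        return input;
    }
    // Walk backwards from the byte budget until we land on a boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_len;
    while !input.is_char_boundary(end) {
        end -= 1;
    }
    &input[..end]
}
```

Slicing a `&str` at an index that is not a character boundary panics, which is why the cut point is moved back rather than used directly.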

Closes #2515, closes #2516.

Changed files

crates/zeph-config/src/classifiers.rs — new pii_ner_max_chars: usize field (default 8192)
crates/zeph-core/src/agent/state/mod.rs — pii_ner_max_chars in SecurityState
crates/zeph-core/src/agent/builder.rs — with_pii_ner_classifier accepts max_chars
crates/zeph-core/src/agent/mod.rs — default initializer
crates/zeph-core/src/agent/tool_execution/mod.rs — reclassify search_code; truncate NER input
crates/zeph-core/src/metrics.rs — two new counters
src/agent_setup.rs — passes pii_ner_max_chars from config to builder
CHANGELOG.md — entries under [Unreleased]
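
The reclassification listed for tool_execution/mod.rs can be pictured with a small sketch. Only the tier names ToolResult and McpResponse and the tool names shell, read_file, and search_code come from this PR; the `ContentSource` enum, the `classify_source` function, and the catch-all branch are assumptions made for illustration and may not match the real code.

```rust
/// Hypothetical trust tiers; only the variant names appear in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContentSource {
    /// Output of local tools operating on user-owned data.
    ToolResult,
    /// Remote content, checked with higher-sensitivity injection patterns.
    McpResponse,
}

/// Hypothetical mapping from tool name to trust tier.
fn classify_source(tool_name: &str) -> ContentSource {
    match tool_name {
        // After this PR, search_code sits in the same tier as shell/read_file.
        "shell" | "read_file" | "search_code" => ContentSource::ToolResult,
        // Assumption for this sketch only: every other tool is treated as remote.
        _ => ContentSource::McpResponse,
    }
}
```

Under this reading, the fix amounts to moving one tool name from the remote branch to the local one, so the stricter patterns no longer run on output from the local AST index.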

Test plan

7564/7564 tests pass (cargo nextest run --workspace --features full --lib --bins)
cargo clippy --workspace --features full -- -D warnings clean
cargo +nightly fmt --check clean
Live session: ask agent to read Cargo.toml — no injection flags in log
Live session: trigger search_code — no PII NER timeout in log

…t truncation

Fixes #2515 and #2516.

- Remove search_code from McpResponse source branch in tool_execution; it queries a local AST index (user-owned code) and must not be subject to the higher-sensitivity injection patterns intended for remote content. Eliminates false-positive flags=9 on Cargo.toml / README reads.
- Add pii_ner_max_chars (default 8192) to AgentSecurity and ClassifiersConfig. Truncate NER input at a valid UTF-8 boundary before backend.classify() to prevent 150+ DeBERTa chunks and timeout on large search_code outputs. char_to_byte offset map is built from the truncated slice so span offsets remain correct.
- Add sanitizer_injection_fp_local and pii_ner_timeouts metrics counters.
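
The note about the char_to_byte offset map can be illustrated with a short sketch. It assumes the NER backend reports spans as character indices that must be converted to byte offsets; the PR does not confirm this, and the function names `char_to_byte_map` and `char_span_to_bytes` are hypothetical.

```rust
/// Hypothetical helper: entry `i` is the byte offset of the `i`-th character
/// in `text`, with one extra trailing entry equal to `text.len()` so that an
/// exclusive end index can also be looked up.
fn char_to_byte_map(text: &str) -> Vec<usize> {
    let mut map: Vec<usize> = text.char_indices().map(|(byte_idx, _)| byte_idx).collect();
    map.push(text.len());
    map
}

/// Hypothetical helper: converts a character span `[start, end)` into a byte
/// range, returning `None` if either index falls outside the mapped text.
fn char_span_to_bytes(map: &[usize], start: usize, end: usize) -> Option<(usize, usize)> {
    Some((*map.get(start)?, *map.get(end)?))
}
```

Building the map from the truncated slice rather than the full input means every span the classifier returns resolves to a byte range inside the text it actually saw, and any out-of-range index is rejected instead of pointing past the cut.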

github-actions bot added the bug, size/M, documentation, rust, and core labels and removed the bug and size/M labels Mar 31, 2026

bug-ops enabled auto-merge (squash) March 31, 2026 11:27

bug-ops merged commit 2fdfd1e into main Mar 31, 2026
27 checks passed

bug-ops deleted the fix/2515-security-sanitizer branch March 31, 2026 11:35

bug-ops mentioned this pull request Mar 31, 2026

perf(classifiers): PII NER still timing out at 30s in --features full builds — metal feature not included in 'full' #2538

Closed

bug-ops merged 1 commit into main from fix/2515-security-sanitizer
