fix(security): reclassify search_code as ToolResult, add PII NER input truncation#2518
Merged
…t truncation

Fixes #2515 and #2516.

- Remove search_code from the McpResponse source branch in tool_execution; it queries a local AST index (user-owned code) and must not be subject to the higher-sensitivity injection patterns intended for remote content. Eliminates false-positive flags=9 on Cargo.toml / README reads.
- Add pii_ner_max_chars (default 8192) to AgentSecurity and ClassifiersConfig. Truncate NER input at a valid UTF-8 boundary before backend.classify() to prevent 150+ DeBERTa chunks and a timeout on large search_code outputs. The char_to_byte offset map is built from the truncated slice so span offsets remain correct.
- Add sanitizer_injection_fp_local and pii_ner_timeouts metrics counters.
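The truncation and offset-map step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `truncate_chars`, `char_to_byte_map`, and `span_to_bytes` are hypothetical, and it assumes the limit counts characters (the actual implementation may count bytes and round down to a character boundary instead).

```rust
use std::ops::Range;

/// Returns the longest prefix of `text` holding at most `max_chars` characters.
/// The cut always lands on a UTF-8 character boundary, so slicing cannot panic.
/// (Hypothetical helper; assumes the limit is measured in characters.)
fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        // Byte offset of the first character past the limit.
        Some((byte_idx, _)) => &text[..byte_idx],
        // The text has at most `max_chars` characters: nothing to cut.
        None => text,
    }
}

/// Maps each character index of `text` to its byte offset. One trailing entry
/// equal to `text.len()` lets an exclusive span end be converted as well.
fn char_to_byte_map(text: &str) -> Vec<usize> {
    let mut map: Vec<usize> = text.char_indices().map(|(byte_idx, _)| byte_idx).collect();
    map.push(text.len());
    map
}

/// Converts a character span (as a classifier might report it) into a byte
/// range of the same text. Returns `None` for spans outside the mapped text.
fn span_to_bytes(map: &[usize], start: usize, end: usize) -> Option<Range<usize>> {
    if start > end {
        return None;
    }
    Some(*map.get(start)?..*map.get(end)?)
}
```

Building the map from the truncated slice rather than from the full input is what keeps the offsets consistent: every span the classifier can return refers to text that is present in the map, and anything beyond the cut is rejected instead of indexing out of range.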
Summary
- Reclassify `search_code` as `ToolResult` (same trust tier as `shell`/`read_file`) instead of `McpResponse`. The tool queries a local AST index built from user-owned code; treating it as remote/untrusted caused false-positive injection flags on Cargo.toml section headers, README badge URLs, and shell examples in code blocks, silently blocking Qdrant memory writes.
- Add a `pii_ner_max_chars` config field (default 8192) to `ClassifiersConfig`/`AgentSecurity`. PII NER input is now truncated at a valid UTF-8 boundary before `backend.classify()`, so large `search_code` outputs no longer produce 150+ DeBERTa chunks that exceed the per-call timeout and fall back to regex-only detection.
- Add `sanitizer_injection_fp_local` and `pii_ner_timeouts` metrics counters.

Closes #2515, closes #2516.
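The reclassification can be pictured as moving one tool name out of the remote branch of a lookup. The sketch below is illustrative only: the enum `SourceKind`, both function names, and the tool name `remote_fetch` are hypothetical stand-ins, and the real branch in `tool_execution` may be structured differently.

```rust
/// Trust tier attached to a tool's output before sanitization.
/// Variant names follow the PR text; the enum itself is a hypothetical stand-in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceKind {
    /// Output of local tools such as `shell` or `read_file`.
    ToolResult,
    /// Output that originates from a remote server.
    McpResponse,
}

/// Picks the trust tier for a tool by name.
fn source_kind_for(tool_name: &str) -> SourceKind {
    match tool_name {
        // `remote_fetch` is a made-up remote tool. Before the change,
        // `search_code` would have been listed in this arm as well.
        "remote_fetch" => SourceKind::McpResponse,
        // Everything else, now including `search_code`, is treated as local.
        _ => SourceKind::ToolResult,
    }
}

/// Only remote content gets the higher-sensitivity injection patterns.
fn uses_strict_injection_patterns(kind: SourceKind) -> bool {
    matches!(kind, SourceKind::McpResponse)
}
```

Under this sketch, `search_code` output is scanned with the same patterns as `shell` and `read_file` output, which is the behavior the first bullet describes.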
Changed files
- `crates/zeph-config/src/classifiers.rs`: new `pii_ner_max_chars: usize` field (default 8192)
- `crates/zeph-core/src/agent/state/mod.rs`: `pii_ner_max_chars` in `SecurityState`
- `crates/zeph-core/src/agent/builder.rs`: `with_pii_ner_classifier` accepts `max_chars`
- `crates/zeph-core/src/agent/mod.rs`: default initializer
- `crates/zeph-core/src/agent/tool_execution/mod.rs`: reclassify `search_code`; truncate NER input
- `crates/zeph-core/src/metrics.rs`: two new counters
- `src/agent_setup.rs`: passes `pii_ner_max_chars` from config to builder
- `CHANGELOG.md`: entries under `[Unreleased]`

Test plan
- `cargo nextest run --workspace --features full --lib --bins`
- `cargo clippy --workspace --features full -- -D warnings` clean
- `cargo +nightly fmt --check` clean
- `Cargo.toml`: no injection flags in log
- `search_code`: no PII NER timeout in log