fix(security): sidecar OOM caps, secret-file exclusion, SLSA attestations, cargo-deny by avifenesh · Pull Request #37 · agent-sh/agent-analyzer

avifenesh · 2026-04-26T11:15:54Z

Summary

Security hardening from an internal audit (2026-04-26). 5 commits, each a single concern:

#	Severity	Finding	Fix
1	HIGH	Sidecar parser allocates up to 4 GiB from attacker-controlled u32 lengths (OOM + `panic = abort`)	Cap header_len (1 MiB), vector_count (10M), dim (4096), name_len (1 KiB); 5 reject-path tests
2	HIGH	No release attestation — only SHA256 sidecars	Publish SLSA build-provenance via `actions/attest-build-provenance`
3	MED	Embed/slop walkers include hidden files (`.env`, `.git/`, `.ssh/`, keys)	`.hidden(true)` + is_secret_like deny patterns
4	MED	No per-file size cap before embedding	`max_filesize(5 MiB)` on both walkers
5	MED	No cargo-deny config	Add `deny.toml` (advisories/licenses/bans/sources)

Test plan

`cargo test -p analyzer-embed`: 35 unit + 4 integration, all pass
`cargo build --release`: clean
`cargo deny check`: all categories clean

Follow-ups (separate issues)

Client-side attestation verification in `agent-core`'s `lib/binary/index.js`
Add `cargo deny check` CI job (`EmbarkStudios/cargo-deny-action`)
Deduplicate `is_secret_like` between analyzer-embed and analyzer-graph

Note

Medium Risk
Moderate risk because it changes low-level sidecar parsing and repo file discovery (hidden/large/secret-like files now skipped), which could affect embedding/analyzer coverage, and it alters release workflow permissions/steps for attestations.

Overview
Adds security hardening across artifact parsing, repo walking, and releases.

The sidecar (repo-intel.embeddings.bin) reader now enforces strict caps and size cross-checks (header length, vector count, dim, string lengths) via Sidecar::from_bytes, with new tests to ensure malformed sidecars fail fast instead of causing OOM/panic.

Repo walkers used by embedding and slop analysis now exclude hidden paths, skip secret-like filenames/extensions/directories, and apply a 5 MiB per-file size cap (with new scan.rs tests).

The release workflow now publishes SLSA build-provenance attestations for per-target archives (OIDC id-token/attestations permissions, pinned actions/attest-build-provenance), and the repo adds a baseline deny.toml policy for cargo-deny (advisories/licenses/sources).

^{Reviewed by Cursor Bugbot for commit c6dc519. Configure here.}

The sidecar binary format is fully attacker-controlled once a repository's embedding cache is written to disk, and Sidecar::read previously trusted every length/count field to drive allocations directly: - header_len (u32) -> vec![0u8; header_len] could request up to 4 GiB. - header.vector_count drove an unbounded outer loop. - header.dim drove Vec::with_capacity per vector. - name_len (u32) drove another unbounded string allocation. Introduce explicit caps - 1 MiB header, 10M vectors, 4096 dim, 1 KiB names - and validate each field before any allocation. Add a from_bytes entry point that additionally cross-checks the declared vector count against the actual byte buffer so a malformed header cannot claim 10M vectors in a 100-byte file. read<R: Read> now slurps and delegates. Unit tests cover each cap and the implied-size check. The existing round-trip tests continue to pass.

…slop walks Both the embed scanner and the slop walker built WalkBuilder with .hidden(false), which meant .git/, .env, .ssh/id_rsa, .aws/credentials and similar secret-bearing paths were visited and - in the embed case - hashed and potentially shipped in the vector sidecar. Flip .hidden(true) in both walkers and add an explicit is_secret_like() filter that rejects well-known dotfiles (.env, .env.*, .npmrc, .pypirc, .netrc, .htpasswd), SSH/PGP/cloud config directories (.git/, .ssh/, .gnupg/, .aws/, .gcloud/, .azure/), SSH private-key filenames (id_rsa*, id_ed25519*, ...), and credential-like extensions (.pem/.key/.crt/.p12/.pfx/.jks/.keystore). The helper is duplicated across the two crates to avoid pulling analyzer-embed into analyzer-graph for one filter. Test: walk_excludes_dotfiles_and_secret_patterns constructs a tempdir populated with src/a.rs plus a set of secret-like files and confirms only the legitimate source file survives the walk.

Both walkers called fs::read / std::fs::read_to_string on every file they visited with no size check. A single large vendored bundle or a hostile 1 GiB file in the repo would force that much RAM per worker. Use WalkBuilder::max_filesize(Some(5 * 1024 * 1024)) in both the embed scanner and the slop walker; ignore::WalkBuilder already short-circuits oversized files before yielding them, so no per-file metadata call is needed in hot paths. 5 MiB comfortably covers the long tail of real source files while cutting off the obvious DoS vector. Test: walk_skips_files_larger_than_cap writes a 10 MiB .rs file alongside a small one in a tempdir and asserts only the small file is returned.

Previously the release workflow only published SHA-256 checksums, which prove integrity but not provenance. Add actions/attest-build-provenance (pinned to SHA) after archive creation on each matrix job so every published .tar.gz / .zip gets a signed in-toto attestation traceable back to the exact workflow run that produced it. Grant the job id-token: write and attestations: write so the action can sign via the runner's OIDC identity and post the attestation to the release. Note: this is defense in depth - the attestations are published but not yet verified client-side. The agent-core binary downloader would need cosign or slsa-verifier to check them; wiring that in is a separate PR.

deny.toml establishes baseline policy for four checks: - advisories: vulnerability = deny, yanked = deny. - licenses: allow MIT/Apache-2.0/BSD-{2,3}-Clause/ISC/MPL-2.0/Unicode/ Zlib/CC0; GPL-*/AGPL-*/SSPL are rejected by omission. - bans: multiple-versions + wildcards = warn; deny list is empty for now, extend as concrete risks surface. - sources: unknown-registry and unknown-git both deny; crates.io is the only allowed registry, agent-sh is the only allowed git org. Verified locally with cargo deny check - all four checks pass (the only warnings are "unmatched license allowance" / "multiple-versions", which are informational). CI integration (a `cargo deny check` job) is a follow-up.

gemini-code-assist

Code Review

This pull request introduces several security hardening measures, including file size limits and secret-pattern exclusion during repository walks, as well as robust input validation for sidecar parsing to prevent potential memory exhaustion attacks. Additionally, a deny.toml configuration file is added to manage dependency security and licensing. The review feedback suggests refactoring the hardcoded file size limit in slop.rs into a constant for consistency and improving performance by reusing the tree_sitter::Parser instance.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit c6dc519. Configure here.}

…aps, deduplicate is_secret_like into analyzer-core - Introduce analyzer-core::limits with MAX_WALK_FILE_SIZE (5 MiB), MAX_NAME_LEN (1 KiB) and MAX_PATH_LEN (4 KiB / PATH_MAX) as the single source of truth for defense-in-depth bounds. Replaces the hardcoded 5*1024*1024 in analyzer-graph::slop::walk_repo_files and the local MAX_FILE_BYTES in analyzer-embed::scan. - Introduce analyzer-core::secrets::is_secret_like as the shared deny-list for credential-looking file/dir patterns. analyzer-embed and analyzer-graph previously carried byte-identical copies; any future addition (say, a new cloud-cred directory) would have had to be made in two places. Now there's one. - Split path vs. name caps in the sidecar reader. `read_string` previously used MAX_NAME_LEN (1 KiB) for the length prefix on path strings, which rejects legitimate deeply-nested monorepo paths. It now uses MAX_PATH_LEN via a new read_bounded_string helper; the name_len branch keeps MAX_NAME_LEN for symbol identifiers where a tight cap is appropriate. - Add tests: 5 new is_secret_like cases in analyzer-core, plus oversized_path_len_is_rejected and long_path_under_cap_round_trips in the sidecar suite to cover the new split.

…Store The security pass set .hidden(true) for both embed and slop walkers, but slop's tracked_artifacts() needs to flag .DS_Store, .swp, etc. as deletable junk. Secret-like paths are still filtered by is_secret_like. - embed walker: .hidden(true) kept (don't index .env) - slop walker: .hidden(false) restored (must see OS junk)

avifenesh added 5 commits April 26, 2026 13:22

gemini-code-assist Bot reviewed Apr 26, 2026

View reviewed changes

Comment thread crates/analyzer-graph/src/slop.rs Outdated

cursor Bot reviewed Apr 26, 2026

View reviewed changes

Comment thread crates/analyzer-embed/src/sidecar.rs

Comment thread crates/analyzer-graph/src/slop.rs Outdated

avifenesh added 3 commits April 26, 2026 14:30

style: cargo fmt

9d0bfe2

avifenesh merged commit 1234c8d into main Apr 26, 2026
4 checks passed

avifenesh deleted the fix/security-hardening branch April 26, 2026 12:35

avifenesh mentioned this pull request Apr 26, 2026

ci: add cargo-deny workflow #38

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(security): sidecar OOM caps, secret-file exclusion, SLSA attestations, cargo-deny#37

fix(security): sidecar OOM caps, secret-file exclusion, SLSA attestations, cargo-deny#37
avifenesh merged 8 commits into
mainfrom
fix/security-hardening

avifenesh commented Apr 26, 2026 •

edited by cursor Bot

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

avifenesh commented Apr 26, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Follow-ups (separate issues)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

avifenesh commented Apr 26, 2026 •

edited by cursor Bot

Loading