fix(security): sidecar OOM caps, secret-file exclusion, SLSA attestations, cargo-deny#37
Conversation
The sidecar binary format is fully attacker-controlled once a repository's embedding cache is written to disk, and Sidecar::read previously trusted every length/count field to drive allocations directly: - header_len (u32) -> vec![0u8; header_len] could request up to 4 GiB. - header.vector_count drove an unbounded outer loop. - header.dim drove Vec::with_capacity per vector. - name_len (u32) drove another unbounded string allocation. Introduce explicit caps - 1 MiB header, 10M vectors, 4096 dim, 1 KiB names - and validate each field before any allocation. Add a from_bytes entry point that additionally cross-checks the declared vector count against the actual byte buffer so a malformed header cannot claim 10M vectors in a 100-byte file. read<R: Read> now slurps and delegates. Unit tests cover each cap and the implied-size check. The existing round-trip tests continue to pass.
…slop walks Both the embed scanner and the slop walker built WalkBuilder with .hidden(false), which meant .git/, .env, .ssh/id_rsa, .aws/credentials and similar secret-bearing paths were visited and - in the embed case - hashed and potentially shipped in the vector sidecar. Flip .hidden(true) in both walkers and add an explicit is_secret_like() filter that rejects well-known dotfiles (.env, .env.*, .npmrc, .pypirc, .netrc, .htpasswd), SSH/PGP/cloud config directories (.git/, .ssh/, .gnupg/, .aws/, .gcloud/, .azure/), SSH private-key filenames (id_rsa*, id_ed25519*, ...), and credential-like extensions (.pem/.key/.crt/.p12/.pfx/.jks/.keystore). The helper is duplicated across the two crates to avoid pulling analyzer-embed into analyzer-graph for one filter. Test: walk_excludes_dotfiles_and_secret_patterns constructs a tempdir populated with src/a.rs plus a set of secret-like files and confirms only the legitimate source file survives the walk.
Both walkers called fs::read / std::fs::read_to_string on every file they visited with no size check. A single large vendored bundle or a hostile 1 GiB file in the repo would force that much RAM per worker. Use WalkBuilder::max_filesize(Some(5 * 1024 * 1024)) in both the embed scanner and the slop walker; ignore::WalkBuilder already short-circuits oversized files before yielding them, so no per-file metadata call is needed in hot paths. 5 MiB comfortably covers the long tail of real source files while cutting off the obvious DoS vector. Test: walk_skips_files_larger_than_cap writes a 10 MiB .rs file alongside a small one in a tempdir and asserts only the small file is returned.
Previously the release workflow only published SHA-256 checksums, which prove integrity but not provenance. Add actions/attest-build-provenance (pinned to SHA) after archive creation on each matrix job so every published .tar.gz / .zip gets a signed in-toto attestation traceable back to the exact workflow run that produced it. Grant the job id-token: write and attestations: write so the action can sign via the runner's OIDC identity and post the attestation to the release. Note: this is defense in depth - the attestations are published but not yet verified client-side. The agent-core binary downloader would need cosign or slsa-verifier to check them; wiring that in is a separate PR.
deny.toml establishes baseline policy for four checks:
- advisories: vulnerability = deny, yanked = deny.
- licenses: allow MIT/Apache-2.0/BSD-{2,3}-Clause/ISC/MPL-2.0/Unicode/
Zlib/CC0; GPL-*/AGPL-*/SSPL are rejected by omission.
- bans: multiple-versions + wildcards = warn; deny list is empty for
now, extend as concrete risks surface.
- sources: unknown-registry and unknown-git both deny; crates.io is the
only allowed registry, agent-sh is the only allowed git org.
Verified locally with cargo deny check - all four checks pass (the only
warnings are "unmatched license allowance" / "multiple-versions", which
are informational).
CI integration (a `cargo deny check` job) is a follow-up.
There was a problem hiding this comment.
Code Review
This pull request introduces several security hardening measures, including file size limits and secret-pattern exclusion during repository walks, as well as robust input validation for sidecar parsing to prevent potential memory exhaustion attacks. Additionally, a deny.toml configuration file is added to manage dependency security and licensing. The review feedback suggests refactoring the hardcoded file size limit in slop.rs into a constant for consistency and improving performance by reusing the tree_sitter::Parser instance.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit c6dc519. Configure here.
…aps, deduplicate is_secret_like into analyzer-core - Introduce analyzer-core::limits with MAX_WALK_FILE_SIZE (5 MiB), MAX_NAME_LEN (1 KiB) and MAX_PATH_LEN (4 KiB / PATH_MAX) as the single source of truth for defense-in-depth bounds. Replaces the hardcoded 5*1024*1024 in analyzer-graph::slop::walk_repo_files and the local MAX_FILE_BYTES in analyzer-embed::scan. - Introduce analyzer-core::secrets::is_secret_like as the shared deny-list for credential-looking file/dir patterns. analyzer-embed and analyzer-graph previously carried byte-identical copies; any future addition (say, a new cloud-cred directory) would have had to be made in two places. Now there's one. - Split path vs. name caps in the sidecar reader. `read_string` previously used MAX_NAME_LEN (1 KiB) for the length prefix on path strings, which rejects legitimate deeply-nested monorepo paths. It now uses MAX_PATH_LEN via a new read_bounded_string helper; the name_len branch keeps MAX_NAME_LEN for symbol identifiers where a tight cap is appropriate. - Add tests: 5 new is_secret_like cases in analyzer-core, plus oversized_path_len_is_rejected and long_path_under_cap_round_trips in the sidecar suite to cover the new split.
…Store The security pass set .hidden(true) for both embed and slop walkers, but slop's tracked_artifacts() needs to flag .DS_Store, .swp, etc. as deletable junk. Secret-like paths are still filtered by is_secret_like. - embed walker: .hidden(true) kept (don't index .env) - slop walker: .hidden(false) restored (must see OS junk)

Summary
Security hardening from an internal audit (2026-04-26). 5 commits, each a single concern:
panic = abort)Test plan
Follow-ups (separate issues)
Note
Medium Risk
Moderate risk because it changes low-level sidecar parsing and repo file discovery (hidden/large/secret-like files now skipped), which could affect embedding/analyzer coverage, and it alters release workflow permissions/steps for attestations.
Overview
Adds security hardening across artifact parsing, repo walking, and releases.
The sidecar (
repo-intel.embeddings.bin) reader now enforces strict caps and size cross-checks (header length, vector count, dim, string lengths) viaSidecar::from_bytes, with new tests to ensure malformed sidecars fail fast instead of causing OOM/panic.Repo walkers used by embedding and slop analysis now exclude hidden paths, skip secret-like filenames/extensions/directories, and apply a 5 MiB per-file size cap (with new
scan.rstests).The release workflow now publishes SLSA build-provenance attestations for per-target archives (OIDC
id-token/attestationspermissions, pinnedactions/attest-build-provenance), and the repo adds a baselinedeny.tomlpolicy forcargo-deny(advisories/licenses/sources).Reviewed by Cursor Bugbot for commit c6dc519. Configure here.