
Secret store: credential isolation, encryption at rest, output scrubbing#260

Merged
jamiepine merged 11 commits into main from secrets-store
Mar 1, 2026

Conversation

@jamiepine
Member

@jamiepine jamiepine commented Feb 28, 2026

Screenshot 2026-02-27 at 9 17 45 PM

Summary

Complete secret store implementation per docs/design-docs/secret-store.md. Instance-level credential storage with layered security — every protection layer works without encryption; encryption adds defense against volume compromise.

  • Two secret categories — system (never exposed to subprocesses) and tool (injected as env vars into workers). Auto-categorized via the SystemSecrets trait so each config section declares its own credential fields
  • Config resolution — secret: prefix in config.toml resolves from the store at load time, keeping config files free of plaintext credentials
  • Encryption at rest — opt-in AES-256-GCM with Argon2id KDF. Master key stored in OS credential store (macOS Keychain / Linux kernel keyring), never on disk
  • Output scrubbing — all tool secret values redacted from worker output before reaching channels. Rolling buffer handles secrets split across stream chunks
  • Worker secret_set tool — workers can store credentials they obtain (account creation, OAuth tokens) for future use
  • Full API — CRUD, encryption lifecycle (encrypt/unlock/lock/rotate), migration from plaintext config, export/import
  • CLI — 12 spacebot secrets subcommands
  • Dashboard UI — secrets management panel in Settings
  • Platform integration — master key generation, injection via Fly Machines, rotation endpoint
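The rolling-buffer scrubbing mentioned above can be sketched like this (hypothetical names; the PR's implementation lives in src/secrets/scrub.rs). The idea: hold back the last max_secret_len − 1 bytes of scrubbed output so a secret split across stream chunks is still caught when the next chunk arrives.

```rust
/// Sketch of chunk-boundary-safe secret scrubbing (hypothetical API;
/// not the actual code in src/secrets/scrub.rs).
struct StreamScrubber {
    /// Tail of already-scrubbed text, retained in case a secret straddles chunks.
    carry: String,
    /// Length of the longest secret value in bytes.
    max_secret_len: usize,
}

impl StreamScrubber {
    fn new(secrets: &[(String, String)]) -> Self {
        let max_secret_len = secrets.iter().map(|(_, v)| v.len()).max().unwrap_or(0);
        Self { carry: String::new(), max_secret_len }
    }

    /// Scrub one chunk, returning only the prefix that is safe to forward now.
    fn scrub_chunk(&mut self, chunk: &str, secrets: &[(String, String)]) -> String {
        self.carry.push_str(chunk);
        let mut text = self.carry.clone();
        for (name, value) in secrets {
            if !value.is_empty() {
                text = text.replace(value, &format!("[REDACTED:{name}]"));
            }
        }
        // Any secret occurrence that extends into the next chunk must start
        // within the last (max_secret_len - 1) bytes, so hold those back.
        let hold = self.max_secret_len.saturating_sub(1).min(text.len());
        let mut split = text.len() - hold;
        while !text.is_char_boundary(split) {
            split -= 1;
        }
        self.carry = text[split..].to_string();
        text.truncate(split);
        text
    }

    /// Flush whatever is still held back at end of stream.
    fn flush(&mut self) -> String {
        std::mem::take(&mut self.carry)
    }
}
```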

Depends On

#259 (Sandbox hardening) must be merged first — this branch includes those commits. After #259 merges, merge main back into this branch and the sandbox commits will converge.

Key Design Decisions

  • Secrets are instance-level, not per-agent. Config.toml is shared across all agents, so the store is a single instance-level redb database at <instance_dir>/data/secrets.redb
  • Self-declaring secrets via traits. Each config section (LlmConfig, DiscordConfig, etc.) implements SystemSecrets to declare its credential fields. No hard-coded lists to maintain
  • Auto-categorization is inverted from the design doc. Known internal credentials are explicitly declared as system; everything else defaults to tool (exposed to workers)
  • Unencrypted mode is fully functional. All features work without encryption — categories, env sanitization, scrubbing, config resolution. Encryption adds protection against disk access only
  • Worker prompt injection via Jinja. Tool secret names are passed to render_worker_prompt() and rendered via a {% if tool_secret_names %} block — no hardcoded push_str() in Rust
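The self-declaring trait and inverted auto-categorization can be sketched as follows (hypothetical signatures, not the actual code in src/secrets/store.rs): each config section lists its credential fields, and any name not declared by a section defaults to the tool category.

```rust
/// Sketch of the self-declaring secrets pattern (hypothetical signatures).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SecretCategory {
    /// Never exposed to subprocesses.
    System,
    /// Injected into worker environments as env vars.
    Tool,
}

/// Each config section declares its own credential fields,
/// so there is no hard-coded central list to maintain.
trait SystemSecrets {
    fn system_secret_names(&self) -> &'static [&'static str];
}

struct LlmConfig;
impl SystemSecrets for LlmConfig {
    fn system_secret_names(&self) -> &'static [&'static str] {
        // Illustrative field names, not the PR's actual list.
        &["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]
    }
}

/// Inverted categorization: known internal credentials are system,
/// everything else defaults to tool (exposed to workers).
fn auto_categorize(name: &str, sections: &[&dyn SystemSecrets]) -> SecretCategory {
    let is_system = sections
        .iter()
        .any(|s| s.system_secret_names().iter().any(|&n| n == name));
    if is_system { SecretCategory::System } else { SecretCategory::Tool }
}
```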

Files Changed

New files

  • src/secrets/keystore.rs — OS credential store abstraction (macOS Keychain, Linux kernel keyring, Noop fallback)
  • src/api/secrets.rs — full CRUD + lifecycle API endpoints
  • src/tools/secret_set.rs — worker tool for storing secrets
  • prompts/en/tools/secret_set_description.md.j2 — tool description
  • docs/content/docs/(configuration)/secrets.mdx — user-facing documentation

Modified (core)

  • src/secrets/store.rs — SecretField, SystemSecrets trait, auto_categorize(), system_secret_registry(), encryption lifecycle, 14 tests
  • src/config.rs — secret: prefix in resolve_env_value(), SystemSecrets impls on all config sections
  • src/sandbox.rs — tool secret injection in all 3 wrap methods, keyring isolation
  • src/main.rs — bootstrap_secrets_store(), legacy migration, CLI subcommands
  • src/agent/channel.rs — tool secrets via Jinja, output scrubbing
  • src/agent/cortex.rs — tool secrets via Jinja (was missing)
  • src/agent/branch.rs — scrubbing branch conclusions
  • src/tools/set_status.rs — scrubbing status text
  • src/opencode/worker.rs — SSE scrubbing
  • prompts/en/worker.md.j2 — secret_set docs + {% if tool_secret_names %} block

Modified (platform)

  • spacebot-platform/api/src/db.rs — master_key column
  • spacebot-platform/api/src/fly.rs — key injection via Fly Machines
  • spacebot-platform/api/src/routes.rs — rollout re-injection, rotate endpoint
  • spacebot-platform/api/migrations/007_instance_master_key.sql

Style cleanup

  • Removed all // ── section-divider comments from store.rs, keystore.rs, api/secrets.rs, api/mcp.rs, main.rs per RUST_STYLE_GUIDE.md

Testing

  • 294/294 lib tests pass (includes 14 secret store tests)
  • Clean clippy, clean fmt
  • Integration tests compile
  • Manually tested: encrypted mode, secret CRUD via dashboard, config resolution, worker env injection, output scrubbing

@coderabbitai
Contributor

coderabbitai bot commented Feb 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds an instance-level secret store (system/tool categories) with optional AES-256-GCM encryption and OS keystore integration, secret: config resolution and migration, CLI/API/UI for secret lifecycle, secret-aware sandbox/worker wiring (injection + scrubbing), prompt/tool updates, docs, and tests.

Changes

Cohort / File(s) / Summary

  • Docs — Configuration & Secrets
    docs/content/docs/(configuration)/config.mdx, docs/content/docs/(configuration)/permissions.mdx, docs/content/docs/(configuration)/secrets.mdx, docs/content/docs/(configuration)/sandbox.mdx
    Add secret: resolution mode, secret-store docs, migration guidance, passthrough_env notes, secret-store cross-links, and minor wording updates.
  • Docs — Architecture, Agents & Deployment
    docs/content/docs/(core)/agents.mdx, docs/content/docs/(core)/architecture.mdx, docs/content/docs/(deployment)/roadmap.mdx, docs/content/docs/(features)/workers.mdx, docs/content/docs/index.mdx
    Split agent data into config.redb, settings.redb, secrets.redb; document per-store roles and shipped Secrets Store.
  • Top-level Docs & README
    AGENTS.md, README.md
    Add Bun tooling note and reorganize security section into Credential Isolation, Process Containment, and Durable binary storage items.
  • API Server & Client
    src/api/secrets.rs, src/api/server.rs, interface/src/api/client.ts
    New /secrets HTTP routes and server wiring; client API types/methods added (note: client declarations duplicated in-file).
  • Core Secrets Implementation
    src/secrets/store.rs, src/secrets/keystore.rs, src/secrets/scrub.rs, src/secrets.rs
    Implement SecretsStore with categories, encryption lifecycle, export/import, OS keystore abstraction (macOS/Linux/Noop), and redaction helpers.
  • Config & Runtime
    src/config.rs
    Add SystemSecrets trait impls, thread-local resolver for secret: resolution (with env fallback), RuntimeConfig secrets field and setter APIs.
  • Startup & CLI
    src/main.rs
    Secrets CLI subcommands and bootstrap logic to pre-open/migrate/unlock store; propagate bootstrapped store into run/initialize flow and API state; add HTTP helper wrappers.
  • Worker / Prompt / Sandbox Integration
    src/agent/channel.rs, src/agent/branch.rs, src/agent/cortex.rs, src/prompts/engine.rs, src/sandbox.rs, src/opencode/worker.rs
    Thread tool_secret_names into prompt rendering, scrub branch and worker outputs, propagate secrets_store into sandbox/worker creation, inject tool secrets into subprocess envs, and add Linux keyring isolation hooks.
  • Tools & Worker Tools
    src/tools.rs, src/tools/secret_set.rs, src/tools/set_status.rs, prompts/en/tools/secret_set_description.md.j2, prompts/en/worker.md.j2
    Add secret_set tool and re-export; SetStatusTool supports scrubbing; tool implementation and templates for guidance.
  • API State & Module Wiring
    src/api.rs, src/api/state.rs, src/api/mcp.rs
    Declare secrets module, add ArcSwap secrets_store to ApiState with setter(s) (duplicate setter occurrences present), and minor mcp.rs cleanup.
  • Interface / UI
    interface/src/routes/Settings.tsx, interface/src/components/UpdateBanner.tsx, interface/src/components/UpdatePill.tsx, interface/src/router.tsx, interface/src/routes/Overview.tsx, interface/src/api/client.ts
    Add Secrets settings tab/section with CRUD/encryption UI; replace UpdateBanner with UpdatePill and adjust router/overview; expand client API surface for secrets (duplicate entries noted).
  • Tests & Helpers
    tests/bulletin.rs, tests/context_dump.rs
    Bootstrap secrets store for tests so secret: resolves before config parsing; adjust test helpers and prompt/context rendering.
  • Prompts / Templates / Text
    src/prompts/text.rs, prompts/en/channel.md.j2, prompts/en/tools/secret_set_description.md.j2, prompts/en/worker.md.j2
    Add tool description mapping and template edits to surface secret_set guidance and list available tool secrets.
  • Other Rust changes
    src/tools/secret_set.rs, src/tools.rs, src/tools/set_status.rs, src/opencode/worker.rs
    New tool implementation, tool re-exports, and tool/worker scrubbing integrations across invocation paths.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes


🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

  • Title check — ✅ Passed: The title clearly and concisely summarizes the main changes: secret store implementation with credential isolation, encryption, and output scrubbing.
  • Description check — ✅ Passed: The description is comprehensive and directly related to the changeset, covering implementation details, design decisions, file changes, testing status, and dependencies.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



@jamiepine jamiepine changed the base branch from main to sandbox-hardening February 28, 2026 05:23
Contributor

@l33t0 l33t0 left a comment


  1. Does scrubbing apply to structured tool result payloads, or just the raw stream? A secret injected as an env var could end up reflected in a JSON tool result and bypass it entirely.
  2. What's the failure mode in bootstrap_secrets_store() when the kernel keyring isn't available on a headless Linux box - hard fail, graceful fallback, or silent unencrypted mode?
  3. Is there anything in the setup flow that nudges operators toward enabling encryption, or is it purely opt-in? Most people won't enable it if it's not the default path...

@jamiepine
Member Author

  1. Does scrubbing apply to structured tool result payloads, or just the raw stream? A secret injected as an env var could end up reflected in a JSON tool result and bypass it entirely.

It does not currently. That wouldn't be an issue if not for the worker_inspect tool, which allows branches to pull worker logs. We might need to apply the same scrubbing technique in that tool.

  2. What's the failure mode in bootstrap_secrets_store() when the kernel keyring isn't available on a headless Linux box - hard fail, graceful fallback, or silent unencrypted mode?

It currently falls back silently to unencrypted mode, which is not ideal. The store can be locked and unlocked via the dashboard, but for unattended updates that is awkward. I can't think of many other ways to protect the key inside a Docker deployment from unsandboxed agents while also persisting across restarts. Will keep thinking on this.

  3. Is there anything in the setup flow that nudges operators toward enabling encryption, or is it purely opt-in? Most people won't enable it if it's not the default path...

Not yet, aside from the secrets screen. We could probably put a dismissible card somewhere.

Contributor

@vsumner vsumner left a comment


Thanks for the large security pass here. Sharing a few targeted correctness issues to tighten before merge.

// and the worker is not started (correct — a worker that inherits
// the parent's session keyring could access the master key).
unsafe {
cmd.pre_exec(|| crate::secrets::keystore::pre_exec_new_session_keyring());

This currently hard-fails worker spawn whenever keyctl join fails. In restricted Linux/container environments that can take down all workers. Consider a guarded fallback: log a warning and continue spawn without keyring isolation (or gate strict fail-closed behind config). Please add a test path that forces pre_exec_new_session_keyring() failure.
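The guarded fallback being suggested could be factored as a small decision function; a sketch assuming a strict fail-closed flag that this PR does not currently expose:

```rust
/// Sketch of the suggested guarded fallback for keyring isolation at
/// worker spawn (hypothetical; `strict` would come from a config flag
/// that does not exist in this PR).
#[derive(Debug, PartialEq)]
enum SpawnDecision {
    /// Keyring join succeeded: spawn with isolation.
    Isolated,
    /// Join failed, strict mode off: warn and spawn without isolation.
    Unisolated,
    /// Join failed, strict mode on: refuse to spawn (fail closed).
    Abort,
}

fn decide_spawn(keyring_join: Result<(), i32>, strict: bool) -> SpawnDecision {
    match (keyring_join, strict) {
        (Ok(()), _) => SpawnDecision::Isolated,
        (Err(_), false) => SpawnDecision::Unisolated, // log a warning here
        (Err(_), true) => SpawnDecision::Abort,
    }
}
```

A test that forces the join to fail would then assert on the Unisolated/Abort branches rather than on the raw pre_exec hook.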

value: &str,
category: SecretCategory,
) -> Result<(), SecretsError> {
let state = self.state();

set() checks state, then encodes/writes later. lock() / rotate_key() can interleave between those steps, which can leave mixed plaintext/ciphertext or old-key ciphertext in storage. Suggest serializing mutating store operations behind a single internal lock/guard and adding a concurrent stress test (set vs lock, set vs rotate_key).
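The serialization being suggested amounts to taking one guard across both the state check and the write; a toy sketch (hypothetical types, not the PR's actual store):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Toy store demonstrating serialized mutations: set() and lock() share
/// one guard, so lock() can never interleave between set()'s state check
/// and its write (hypothetical types, not the PR's SecretsStore).
struct ToyStore {
    inner: Mutex<ToyState>,
}

struct ToyState {
    locked: bool,
    values: HashMap<String, String>,
}

impl ToyStore {
    fn new() -> Self {
        Self {
            inner: Mutex::new(ToyState { locked: false, values: HashMap::new() }),
        }
    }

    fn set(&self, name: &str, value: &str) -> Result<(), String> {
        // State check and write happen under one guard, so they are
        // atomic with respect to lock() and any key-rotation path.
        let mut state = self.inner.lock().expect("poisoned");
        if state.locked {
            return Err("store is locked".into());
        }
        state.values.insert(name.to_string(), value.to_string());
        Ok(())
    }

    fn lock(&self) {
        self.inner.lock().expect("poisoned").locked = true;
    }
}
```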

.tool(ReadSkillTool::new(runtime_config));
.tool({
let mut status_tool = SetStatusTool::new(agent_id, worker_id, channel_id, event_tx);
if let Some(store) = runtime_config.secrets.load().as_ref() {

SetStatusTool snapshots tool secrets only at worker creation. If a worker calls secret_set mid-run, later status messages won't scrub the new value. Suggest storing Arc<SecretsStore> in SetStatusTool and scrubbing with fresh pairs per call (or refreshing pairs before each status emit).

… scrubbing, worker secret_set tool

Instance-level credential storage with system/tool categories, config
resolution via secret: prefix, AES-256-GCM encryption with OS keystore
(macOS Keychain / Linux kernel keyring), output scrubbing across all
worker output paths, and a secret_set tool for autonomous credential
management.

Includes full API (CRUD + lifecycle), CLI subcommands, dashboard UI,
platform master key injection, SystemSecrets trait for auto-categorization,
Jinja-based worker prompt injection, and documentation.
@jamiepine jamiepine changed the base branch from sandbox-hardening to main March 1, 2026 03:32
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/tools/set_status.rs (1)

95-105: ⚠️ Potential issue | 🟡 Minor

Scrub before truncation to avoid partial secret leaks.

Right now truncation runs first, so a secret split by the 256-char cutoff can bypass exact-match redaction. Scrub first, then cap length.

🛡️ Proposed ordering fix
-        let status = if args.status.len() > 256 {
-            let end = args.status.floor_char_boundary(256);
-            let boundary = args.status[..end].rfind(char::is_whitespace).unwrap_or(end);
-            format!("{}...", &args.status[..boundary])
-        } else {
-            args.status
-        };
-
-        // Scrub tool secret values before the status reaches the channel.
-        let status = crate::secrets::scrub::scrub_secrets(&status, &self.tool_secret_pairs);
+        // Scrub tool secret values before any truncation.
+        let scrubbed = crate::secrets::scrub::scrub_secrets(&args.status, &self.tool_secret_pairs);
+        let status = if scrubbed.len() > 256 {
+            let end = scrubbed.floor_char_boundary(256);
+            let boundary = scrubbed[..end].rfind(char::is_whitespace).unwrap_or(end);
+            format!("{}...", &scrubbed[..boundary])
+        } else {
+            scrubbed
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/set_status.rs` around lines 95 - 105, The truncation runs before
redaction so partial secrets can escape; call
crate::secrets::scrub::scrub_secrets on the original args.status (passing
&self.tool_secret_pairs) first and assign that to a status variable, then
perform the existing length check/truncation logic (using floor_char_boundary,
rfind(char::is_whitespace), and format!("{}...", ...)) against the scrubbed
status string so the final status is both scrubbed and length-capped.
src/agent/channel.rs (1)

2352-2365: ⚠️ Potential issue | 🔴 Critical

Scrub worker failure text too, not only success text.

At Line [2364], the error path emits Worker failed: {error} without redaction. If the error string reflects a tool secret, it can leak into WorkerComplete.result and channel context.

Proposed fix
-        let (result_text, notify, success) = match future.await {
-            Ok(text) => {
-                // Scrub tool secret values from the result before it reaches
-                // the channel. The channel never sees raw secret values.
-                let scrubbed = if let Some(store) = &secrets_store {
-                    crate::secrets::scrub::scrub_with_store(&text, store)
-                } else {
-                    text
-                };
-                (scrubbed, true, true)
-            }
-            Err(error) => {
-                tracing::error!(worker_id = %worker_id, %error, "worker failed");
-                (format!("Worker failed: {error}"), true, false)
-            }
-        };
+        let (raw_result_text, notify, success) = match future.await {
+            Ok(text) => (text, true, true),
+            Err(error) => {
+                tracing::error!(worker_id = %worker_id, %error, "worker failed");
+                (format!("Worker failed: {error}"), true, false)
+            }
+        };
+        let result_text = if let Some(store) = &secrets_store {
+            crate::secrets::scrub::scrub_with_store(&raw_result_text, store)
+        } else {
+            raw_result_text
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel.rs` around lines 2352 - 2365, The error path currently
returns an unredacted error string (format!("Worker failed: {error}")) which can
leak secrets; update the Err(error) branch in the same scope where secrets_store
and crate::secrets::scrub::scrub_with_store are available to scrub the error
text before returning it: build the message string ("Worker failed: {error}"),
then if let Some(store) = &secrets_store call scrub_with_store on that message
(mirroring the Ok branch) and return the scrubbed text along with the same
booleans; keep tracing::error logging but ensure the value sent back in
WorkerComplete.result is the scrubbed version.
src/opencode/worker.rs (1)

379-405: ⚠️ Potential issue | 🟠 Major

Scrub/scan is missing on ToolState::Error output path.

ToolState::Completed output is protected, but ToolState::Error still emits raw tool error text through send_status, which leaves a leak path open.

🔧 Suggested patch
                                 ToolState::Error { error, .. } => {
-                                    let description = error.as_deref().unwrap_or("unknown");
-                                    self.send_status(&format!(
-                                        "tool error: {tool_name}: {description}"
-                                    ));
+                                    let description = error.as_deref().unwrap_or("unknown");
+                                    let scrubbed = self.scrub_text(description);
+                                    if crate::secrets::scrub::scan_for_leaks(&scrubbed).is_some() {
+                                        tracing::error!(
+                                            worker_id = %self.id,
+                                            tool = %tool_name,
+                                            "LEAK DETECTED in OpenCode tool error output — terminating"
+                                        );
+                                        return EventAction::Error(
+                                            "leak detected in tool error output".to_string(),
+                                        );
+                                    }
+                                    self.send_status(&format!(
+                                        "tool error: {tool_name}: {scrubbed}"
+                                    ));
                                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/opencode/worker.rs` around lines 379 - 405, ToolState::Error currently
emits raw tool error text via send_status, leaving secrets un-scrubbed; update
the ToolState::Error branch (in the match handling tool results) to mirror the
ToolState::Completed path: if the error contains output, run
self.scrub_text(...) and call crate::secrets::scrub::scan_for_leaks(&scrubbed)
and if a leak is found log the leak with tracing::error! (including worker_id
and tool) and return EventAction::Error("leak detected in tool
output".to_string()); after scanning, clear current_tool (set to None when
current_tool.as_deref() == Some(tool_name.as_str())) and call self.send_status
with a sanitized description (not the raw error text). Ensure you reference
ToolState::Error, send_status, scrub_text, crate::secrets::scrub::scan_for_leaks
and current_tool when making the change.
♻️ Duplicate comments (5)
src/secrets/store.rs (2)

283-332: ⚠️ Potential issue | 🔴 Critical

Serialize state transitions and writes to prevent mixed-encryption corruption.

set() can interleave with lock()/rotate_key()/enable_encryption() between state check and write. That can persist plaintext or old-key ciphertext while config state has moved, making secrets unreadable.

🧵 Suggested synchronization pattern
 pub struct SecretsStore {
     db: Database,
     cipher_state: RwLock<Option<CipherState>>,
     encrypted: RwLock<bool>,
+    mutation_lock: std::sync::Mutex<()>,
 }
@@
     pub fn set(&self, name: &str, value: &str, category: SecretCategory) -> Result<(), SecretsError> {
+        let _mutation_guard = self.mutation_lock.lock().expect("mutation lock poisoned");
         let state = self.state();
         if state == StoreState::Locked {
@@
     pub fn enable_encryption(&self) -> Result<Vec<u8>, SecretsError> {
+        let _mutation_guard = self.mutation_lock.lock().expect("mutation lock poisoned");
         ...
     }
@@
     pub fn unlock(&self, master_key: &[u8]) -> Result<(), SecretsError> {
+        let _mutation_guard = self.mutation_lock.lock().expect("mutation lock poisoned");
         ...
     }
@@
     pub fn lock(&self) -> Result<(), SecretsError> {
+        let _mutation_guard = self.mutation_lock.lock().expect("mutation lock poisoned");
         ...
     }
@@
     pub fn rotate_key(&self) -> Result<Vec<u8>, SecretsError> {
+        let _mutation_guard = self.mutation_lock.lock().expect("mutation lock poisoned");
         ...
     }

Also applies to: 527-785

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/store.rs` around lines 283 - 332, The set() method currently
checks self.state() then performs writes, allowing interleaving with
lock()/rotate_key()/enable_encryption() that changes encryption semantics; fix
by serializing state transitions and writes so the state check and DB
insert/commit happen atomically: acquire the same mutex or state guard used by
lock()/rotate_key()/enable_encryption() (or add one if missing) at the start of
set(), hold it while calling self.state(), encode_value(), building
SecretMetadata, opening tables (SECRETS_TABLE/METADATA_TABLE), performing
inserts, and committing the transaction, then release the guard; ensure
get_metadata and metadata construction still read under the guard so created_at
is preserved consistently.

276-282: ⚠️ Potential issue | 🟠 Major

Validate secret names centrally in set().

set() accepts arbitrary names, which can flow into worker env injection for tool secrets. Please enforce a strict allowlist format and block dangerous reserved names at the store boundary.

🔐 Suggested central guard
+fn validate_secret_name(name: &str) -> Result<(), SecretsError> {
+    static VALID: std::sync::LazyLock<regex::Regex> =
+        std::sync::LazyLock::new(|| regex::Regex::new(r"^[A-Z][A-Z0-9_]*$").expect("hardcoded regex"));
+    const RESERVED: &[&str] = &["PATH", "HOME", "LD_PRELOAD", "DYLD_INSERT_LIBRARIES"];
+
+    if !VALID.is_match(name) {
+        return Err(SecretsError::Other(anyhow::anyhow!("invalid secret name format")));
+    }
+    if RESERVED.contains(&name) || name.starts_with("SPACEBOT_") {
+        return Err(SecretsError::Other(anyhow::anyhow!("reserved secret name")));
+    }
+    Ok(())
+}
+
     pub fn set(
         &self,
         name: &str,
         value: &str,
         category: SecretCategory,
     ) -> Result<(), SecretsError> {
+        validate_secret_name(name)?;
         let state = self.state();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/store.rs` around lines 276 - 282, The set(...) method currently
accepts arbitrary secret names; add a central validation step at the start of
SecretStore::set to enforce a strict allowlist pattern (e.g., ASCII
alphanumerics, underscores, limited length) and explicitly reject
dangerous/reserved names (e.g., "PATH", "HOME", "SHELL", any OS env var names
you consider reserved or prefixes like "LD_", etc.) based on SecretCategory;
implement this via a private helper validate_secret_name(name: &str, category:
SecretCategory) that returns Result<(), SecretsError> and call it from set,
returning a SecretsError on invalid names so nothing unsafe is stored or later
injected into worker envs. Ensure tests are updated to cover allowed,
disallowed, and reserved-name cases.
src/secrets/keystore.rs (1)

113-119: ⚠️ Potential issue | 🔴 Critical

Fix Linux keyctl arguments: wrong opcode and non-C-string descriptions.

Line 116 uses 0x0b while labeled KEYCTL_GET_KEYRING_ID (0x0b is KEYCTL_READ), and description.as_ptr() is used without NUL termination in add/search calls. Both can break store/load/delete reliability.

🐧 Proposed fix pattern
+use std::ffi::CString;
+
     fn store_key(&self, instance_id: &str, key: &[u8]) -> Result<(), SecretsError> {
-        let description = format!("{SERVICE_NAME}:{instance_id}");
+        let description = CString::new(format!("{SERVICE_NAME}:{instance_id}"))
+            .map_err(|_| SecretsError::Other(anyhow::anyhow!("invalid key description")))?;
 
         // Get the session keyring.
         let session_keyring = unsafe {
             libc::syscall(
                 libc::SYS_keyctl,
-                0x0b_i64, // KEYCTL_GET_KEYRING_ID
-                -3_i64,   // KEY_SPEC_SESSION_KEYRING
+                libc::KEYCTL_GET_KEYRING_ID as libc::c_long,
+                libc::KEY_SPEC_SESSION_KEYRING as libc::c_long,
                 0_i64,    // don't create
             )
         };
@@
         let result = unsafe {
             libc::syscall(
                 libc::SYS_add_key,
-                b"user\0".as_ptr(),
+                b"user\0".as_ptr().cast::<libc::c_char>(),
                 description.as_ptr(),
                 key.as_ptr(),
                 key.len(),
                 session_keyring,
             )
         };

Apply the same CString approach to load_key() and delete_key() search calls.

#!/bin/bash
# Read-only verification of the problematic call sites in this PR branch.
rg -n -C2 '0x0b_i64|KEYCTL_GET_KEYRING_ID|description\.as_ptr\(\)|SYS_keyctl' src/secrets/keystore.rs

Also applies to: 134-135, 155-161, 217-224

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/keystore.rs` around lines 113 - 119, The syscall and string-arg
handling for keyctl calls are wrong: replace the hardcoded 0x0b opcode with the
correct KEYCTL_GET_KEYRING_ID symbol (use libc::KEYCTL_GET_KEYRING_ID or the
correct constant) when obtaining session_keyring, and convert any Rust &str
description passed to keyctl/add_key/search/delete into a NUL-terminated C
string (use CString::new(description)? and pass its .as_ptr()) in load_key(),
delete_key(), and the add/search call sites (refer to functions load_key,
delete_key, and the session_keyring syscall usage) so keyctl opcodes and
description pointers are correct and safe.
src/tools.rs (1)

413-417: ⚠️ Potential issue | 🟠 Major

set_status scrubbing still uses a stale secret snapshot.

At Line [416], with_tool_secrets(store.tool_secret_pairs()) captures values only once at worker creation. If secret_set updates credentials mid-run (Line [423]), later status messages can miss redaction for newly added/rotated secrets.

Also applies to: 422-424

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools.rs` around lines 413 - 417, The SetStatusTool is capturing a
one-time secret snapshot via with_tool_secrets(store.tool_secret_pairs()) during
creation (see SetStatusTool::new and with_tool_secrets) which causes later
status messages to miss newly added/rotated secrets from
runtime_config.secrets.load(); instead, change the design so the tool consults
the live secret store at send-time: either pass a reference/Arc to the secret
holder (runtime_config.secrets) into SetStatusTool or replace with_tool_secrets
with a callback/accessor on SetStatusTool that calls
runtime_config.secrets.load().as_ref().map(...).tool_secret_pairs() each time a
status is redacted/sent, so secrets are resolved dynamically rather than
captured once. Ensure the change touches SetStatusTool methods that perform
redaction/send so they use the live accessor rather than the initial snapshot.
src/config.rs (1)

3311-3331: ⚠️ Potential issue | 🟠 Major

Thread-local secret resolution can fail nondeterministically across reload paths.

At Line 3316, secret lookup depends on per-thread state. If Config::load*() runs on a different thread than the one that called set_resolve_secrets_store, secret: values resolve to None silently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` around lines 3311 - 3331, The current RESOLVE_SECRETS_STORE is
thread-local, so calls to
set_resolve_secrets_store()/clear_resolve_secrets_store() only affect the
current thread and cause nondeterministic missing secrets when Config::load*()
runs on another thread; replace the thread-local storage with a process-wide,
thread-safe global (e.g. a static OnceCell<Arc<...>> or a static
RwLock<Option<Arc<crate::secrets::store::SecretsStore>>>), update
set_resolve_secrets_store and clear_resolve_secrets_store to write/read that
global, and ensure resolve_env_value (and any callers that read
RESOLVE_SECRETS_STORE) uses the new global so secret resolution is consistent
across threads.
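A minimal std-only sketch of the process-wide replacement follows. The actual fix in this PR uses `LazyLock<ArcSwap<...>>`; this version uses `OnceLock` plus `RwLock` to stay dependency-free, and `SecretsStore` here is a stand-in for the real type.

```rust
use std::sync::{Arc, OnceLock, RwLock};

// Stand-in for the real SecretsStore type.
struct SecretsStore;

impl SecretsStore {
    fn get(&self, alias: &str) -> Option<String> {
        Some(format!("value-of-{alias}"))
    }
}

// One process-wide slot, visible from every thread — unlike a
// thread_local!, which each thread sees as independently empty.
static RESOLVE_SECRETS_STORE: OnceLock<RwLock<Option<Arc<SecretsStore>>>> = OnceLock::new();

fn slot() -> &'static RwLock<Option<Arc<SecretsStore>>> {
    RESOLVE_SECRETS_STORE.get_or_init(|| RwLock::new(None))
}

fn set_resolve_secrets_store(store: Arc<SecretsStore>) {
    *slot().write().expect("lock poisoned") = Some(store);
}

fn resolve_secret(alias: &str) -> Option<String> {
    slot().read().expect("lock poisoned").as_ref()?.get(alias)
}

fn main() {
    set_resolve_secrets_store(Arc::new(SecretsStore));
    // Resolution now succeeds even from a different thread, such as a
    // file watcher or API handler triggering a config reload.
    let handle = std::thread::spawn(|| resolve_secret("ANTHROPIC_API_KEY"));
    println!("{:?}", handle.join().expect("reload thread panicked"));
}
```

With a thread-local, the spawned thread above would see `None`; with the process-wide slot it resolves the same store the main thread registered.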
🟡 Minor comments (12)
docs/content/docs/(configuration)/permissions.mdx-53-53 (1)

53-53: ⚠️ Potential issue | 🟡 Minor

Avoid absolute wording about plaintext secrets in config.

Line 53 says config.toml has “secret references, not plaintext keys,” but plaintext/literal values can still exist on unmigrated setups. Please soften this to avoid a false security guarantee.

📝 Suggested wording
-~/.spacebot/config.toml                    # instance config (secret references, not plaintext keys)
+~/.spacebot/config.toml                    # instance config (prefer secret references; plaintext may still exist until migrated)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/content/docs/`(configuration)/permissions.mdx at line 53, Replace the
absolute claim in the inline comment for ~/.spacebot/config.toml (the text
"(secret references, not plaintext keys)") with softened language that
acknowledges some setups may still contain literal secrets, e.g. "(typically
contains secret references; may contain plaintext values in older or unmigrated
setups)"; update the line that references config.toml in permissions.mdx
accordingly so the wording no longer promises that plaintext keys are never
present.
src/secrets/store.rs-493-497 (1)

493-497: ⚠️ Potential issue | 🟡 Minor

Log or propagate per-secret read failures in tool env export.

Currently failed self.get(name) calls are silently ignored, which makes missing env injection hard to diagnose.

🔍 Suggested minimal observability fix
-        for (name, meta) in &metadata {
-            if meta.category == SecretCategory::Tool
-                && let Ok(secret) = self.get(name)
-            {
-                result.insert(name.clone(), secret.expose().to_string());
-            }
-        }
+        for (name, metadata_entry) in &metadata {
+            if metadata_entry.category != SecretCategory::Tool {
+                continue;
+            }
+            match self.get(name) {
+                Ok(secret) => {
+                    result.insert(name.clone(), secret.expose().to_string());
+                }
+                Err(error) => {
+                    tracing::warn!(secret_name = %name, %error, "failed to read tool secret for env injection");
+                }
+            }
+        }

As per coding guidelines: "Don't silently discard errors. No let _ = on Results. Handle them, log them, or propagate them."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/store.rs` around lines 493 - 497, The loop that filters
SecretCategory::Tool currently ignores errors from self.get(name); update the
code around the SecretCategory::Tool branch so that the Err case from self.get
is not discarded but either logged (e.g., tracing::warn/error with the secret
name and the error) or propagated upstream as a Result; specifically handle the
Result from self.get(name) instead of using the pattern that only matches Ok,
and on Ok continue to call secret.expose() and result.insert(name.clone(), ...),
while on Err emit a clear log entry referencing name and the error (or
collect/return the error) so per-secret read failures are observable.
docs/content/docs/(core)/agents.mdx-235-237 (1)

235-237: ⚠️ Potential issue | 🟡 Minor

secrets.redb is instance-level, not per-agent.

Line 237 documents secrets.redb under each agent's data directory, but it is actually stored at the instance level (~/.spacebot/data/secrets.redb). The code explicitly opens it as instance-level (see src/main.rs:1120), and secrets.mdx confirms the layout was migrated from legacy per-agent storage to instance-level. This documentation should be corrected to avoid operator confusion during setup, backups, and migrations.

Note: architecture.mdx:180 has the same issue, documenting "per-agent credential storage" when it should reference instance-level storage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/content/docs/`(core)/agents.mdx around lines 235 - 237, The docs
currently list secrets.redb as per-agent under each agent's data directory;
update agents.mdx (the entries listing config.redb, settings.redb, secrets.redb)
to mark secrets.redb as instance-level and not per-agent, and also update
architecture.mdx where it mentions "per-agent credential storage" (around the
architecture.mdx:180 reference) to reflect instance-level storage; reference the
actual implementation that opens the file at instance level (see
src/main.rs:1120) and link or mention secrets.mdx migration notes so the wording
clarifies that secrets.redb lives under the instance data directory (e.g.,
~/.spacebot/data/secrets.redb) and should be handled in instance-level
backups/migration instructions.
tests/context_dump.rs-21-30 (1)

21-30: ⚠️ Potential issue | 🟡 Minor

Handle encrypted-store unlock failures explicitly.

At Line 26, let _ = store.unlock(&key); silently drops unlock errors. That can mask encrypted-store failures and produce misleading config-resolution behavior in this test harness.

Proposed fix
-    if let Ok(store) = spacebot::secrets::store::SecretsStore::new(&secrets_path) {
+    if let Ok(store) = spacebot::secrets::store::SecretsStore::new(&secrets_path) {
         let store = Arc::new(store);
         if store.is_encrypted() {
             let keystore = spacebot::secrets::keystore::platform_keystore();
             if let Some(key) = keystore.load_key("instance").ok().flatten() {
-                let _ = store.unlock(&key);
+                if let Err(error) = store.unlock(&key) {
+                    eprintln!("warning: failed to unlock secrets store for config resolution: {error}");
+                    return;
+                }
             }
         }
         spacebot::config::set_resolve_secrets_store(store);
     }
As per coding guidelines, "Don't silently discard errors. No `let _ =` on Results. Handle them, log them, or propagate them."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/context_dump.rs` around lines 21 - 30, The code silently drops the
Result from store.unlock(&key) which can hide failures; change the call in the
SecretsStore setup (where SecretsStore::new, is_encrypted, platform_keystore,
load_key and unlock are used) to explicitly handle the Result returned by
unlock: e.g., match or unwrap the Result and on Err either propagate/panic or at
minimum log the error with context (include the key id and error) before
proceeding, so failures are not discarded when calling store.unlock(&key) prior
to calling set_resolve_secrets_store.
docs/content/docs/(configuration)/secrets.mdx-175-189 (1)

175-189: ⚠️ Potential issue | 🟡 Minor

Secrets API docs are out of sync with the implemented contract.

Please align this section with runtime behavior:

  • PUT is /api/secrets/{name} (not /api/secrets).
  • Unlock uses a master key (hex), not a password.
  • Import behavior should document overwrite/conflict semantics (skips vs updates).

Also applies to: 203-208, 231-239

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/content/docs/`(configuration)/secrets.mdx around lines 175 - 189, Update
the Secrets API docs to match runtime behavior: change the PUT endpoint from
`/api/secrets` to `/api/secrets/{name}` and show the request body without a
top-level "name" field (since the name comes from the path); replace any mention
of "password" for unlock with "master key (hex)" and show the expected hex
format; and add explicit import semantics describing whether the import endpoint
skips existing secrets or updates/overwrites them (state clearly which behavior
the runtime implements and any flags to force overwrite). Apply the same
corrections to the other occurrences of this section referenced (lines ~203-208
and ~231-239).
src/main.rs-1268-1271 (1)

1268-1271: ⚠️ Potential issue | 🟡 Minor

Silent error discard when migrating keystore entry.

If keystore.store_key fails, the key won't persist, and after the next restart the user will have to unlock manually with no warning about why.

Proposed fix
         if let Ok(Some(key)) = keystore.load_key(&agent_id) {
             // Migrate the key to the instance-level keystore entry.
-            let _ = keystore.store_key(KEYSTORE_INSTANCE_ID, &key);
+            if let Err(error) = keystore.store_key(KEYSTORE_INSTANCE_ID, &key) {
+                tracing::warn!(%error, "failed to migrate master key to instance keystore");
+            }
             return Some(key);
         }

As per coding guidelines: "Don't silently discard errors. No let _ = on Results."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.rs` around lines 1268 - 1271, The code currently discards the Result
from keystore.store_key when migrating a key (in the block using
keystore.load_key, keystore.store_key, KEYSTORE_INSTANCE_ID and agent_id);
replace the silent `let _ =` with proper error handling: match or use if let
Err(e) = keystore.store_key(KEYSTORE_INSTANCE_ID, &key) and emit a clear
warning/error log that includes the agent_id and the error details so failures
are visible, and decide whether to still return Some(key) or propagate the error
per surrounding semantics (at minimum log the failure instead of dropping it
silently).
src/api/secrets.rs-338-355 (1)

338-355: ⚠️ Potential issue | 🟡 Minor

Keystore deletion should happen after successful store.lock().

The current order deletes the key from the OS credential store before calling store.lock(). If lock() fails, the key is already removed from the keystore, potentially leaving the system in an inconsistent state where the store is unlocked but the key is no longer persisted.

Proposed fix
-    // Clear key from OS credential store too.
-    let keystore = crate::secrets::keystore::platform_keystore();
-    if let Err(error) = keystore.delete_key(KEYSTORE_INSTANCE_ID) {
-        tracing::warn!(%error, "failed to delete master key from OS credential store");
-    }
-
     match store.lock() {
-        Ok(()) => Json(serde_json::json!({
+        Ok(()) => {
+            // Clear key from OS credential store after successful lock.
+            let keystore = crate::secrets::keystore::platform_keystore();
+            if let Err(error) = keystore.delete_key(KEYSTORE_INSTANCE_ID) {
+                tracing::warn!(%error, "failed to delete master key from OS credential store");
+            }
+            Json(serde_json::json!({
             "state": "locked",
             "message": "Secret manager locked. Secrets remain encrypted on disk."
-        }))
-        .into_response(),
+            }))
+            .into_response()
+        }
         Err(error) => (
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/secrets.rs` around lines 338 - 355, Move the keystore deletion to
occur only after store.lock() succeeds: call store.lock() first, check that it
returned Ok(()) and only then obtain the platform keystore and call
keystore.delete_key(KEYSTORE_INSTANCE_ID); if store.lock() returns Err(error) do
not delete the OS key and return the Err branch (StatusCode::BAD_REQUEST ...).
Ensure you keep the same error logging (tracing::warn!(%error, "failed to delete
master key from OS credential store")) but invoke it only on a failed
keystore.delete_key call after successful store.lock().
src/api/secrets.rs-218-231 (1)

218-231: ⚠️ Potential issue | 🟡 Minor

Distinguish "not found" from other errors in secret_info.

All errors from get_metadata are mapped to NOT_FOUND, but if the store is locked or encounters an I/O error, returning 404 is misleading. Other endpoints (e.g., put_secret, delete_secret) explicitly check for StoreState::Locked and return StatusCode::LOCKED.

Proposed fix
+    if store.state() == StoreState::Locked {
+        return (
+            StatusCode::LOCKED,
+            Json(serde_json::json!({"error": "secret store is locked — unlock first"})),
+        )
+            .into_response();
+    }
+
     match store.get_metadata(&name) {
         Ok(meta) => Json(SecretInfoResponse {
             name,
             category: meta.category,
             created_at: meta.created_at,
             updated_at: meta.updated_at,
         })
         .into_response(),
-        Err(_) => (
+        Err(error) => (
             StatusCode::NOT_FOUND,
-            Json(serde_json::json!({"error": format!("secret '{name}' not found")})),
+            Json(serde_json::json!({"error": format!("secret '{name}' not found: {error}")})),
         )
             .into_response(),
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/secrets.rs` around lines 218 - 231, The current secret_info handler
maps all errors from store.get_metadata(&name) to NOT_FOUND; update the Err
branch to inspect the error returned by get_metadata and return appropriate
status codes: if the error indicates StoreState::Locked (match
StoreState::Locked or the store error variant that wraps it) return
StatusCode::LOCKED, if the error is a not-found variant return
StatusCode::NOT_FOUND as before, otherwise return an internal server error
(StatusCode::INTERNAL_SERVER_ERROR) with a useful JSON error message; locate the
code around the call to get_metadata in the secret_info handler and replace the
blanket Err(_) arm with a match that checks for StoreState::Locked and other
specific error variants before falling back to 500.
src/main.rs-1191-1248 (1)

1191-1248: ⚠️ Potential issue | 🟡 Minor

Multiple silent error discards in migration logic.

Per coding guidelines, errors should not be silently discarded. While migration is "best effort," logging failures helps with debugging:

  • Line 1198: Err(_) => return — no indication why agent directory couldn't be read
  • Line 1215: Err(_) => continue — legacy store open failures are silently skipped
  • Line 1223: let _ = legacy_store.unlock(&key) — unlock failures discarded
  • Line 1232: Err(_) => continue — export failures silently skipped
Proposed fix
 fn migrate_legacy_agent_stores(
     instance_dir: &std::path::Path,
     target_store: &spacebot::secrets::store::SecretsStore,
 ) {
     let agents_dir = instance_dir.join("agents");
     let entries = match std::fs::read_dir(&agents_dir) {
         Ok(entries) => entries,
-        Err(_) => return,
+        Err(error) => {
+            tracing::debug!(%error, "no legacy agent stores to migrate");
+            return;
+        }
     };

     // ... inside loop ...
         let legacy_store = match spacebot::secrets::store::SecretsStore::new(&secrets_path) {
             Ok(store) => store,
-            Err(_) => continue,
+            Err(error) => {
+                tracing::debug!(%error, path = %secrets_path.display(), "skipping unreadable legacy store");
+                continue;
+            }
         };

         if legacy_store.is_encrypted() {
             let agent_id = entry.file_name().to_string_lossy().to_string();
             let keystore = spacebot::secrets::keystore::platform_keystore();
             if let Some(key) = keystore.load_key(&agent_id).ok().flatten() {
-                let _ = legacy_store.unlock(&key);
+                if let Err(error) = legacy_store.unlock(&key) {
+                    tracing::debug!(%error, %agent_id, "failed to unlock legacy store, skipping");
+                    continue;
+                }
             } else {
                 continue;
             }
         }

         let export = match legacy_store.export_all() {
             Ok(export) => export,
-            Err(_) => continue,
+            Err(error) => {
+                tracing::debug!(%error, path = %secrets_path.display(), "failed to export legacy store");
+                continue;
+            }
         };

As per coding guidelines: "Don't silently discard errors. No let _ = on Results."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.rs` around lines 1191 - 1248, In migrate_legacy_agent_stores: don't
silently discard errors — when read_dir(&agents_dir) fails, log the error
(include agents_dir.display()) before returning; when
SecretsStore::new(&secrets_path) returns Err, log that failure with the
secrets_path and error and continue; when legacy_store.unlock(&key) returns Err,
handle it by logging the error with agent_id and skip that agent; when
legacy_store.export_all() returns Err, log the export failure with secrets_path
and the error and continue; include clear contextual messages and the error
details for each of these failure points so debugging is possible.
src/api/secrets.rs-314-327 (1)

314-327: ⚠️ Potential issue | 🟡 Minor

Match on typed SecretsError variants instead of string inspection.

The error classification at lines 315-317 relies on error.to_string().contains("invalid") which is fragile. Since SecretsError has typed variants including InvalidKey, use pattern matching:

let status = match error {
    SecretsError::InvalidKey | SecretsError::DecryptionFailed(_) => StatusCode::UNAUTHORIZED,
    _ => StatusCode::INTERNAL_SERVER_ERROR,
};

This avoids brittle string matching and leverages the type system.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/secrets.rs` around lines 314 - 327, Replace the fragile string
inspection on the error variable with a match on the SecretsError enum:
pattern-match error (the Err(error) branch) against SecretsError::InvalidKey and
SecretsError::DecryptionFailed(_) to return StatusCode::UNAUTHORIZED and fall
back to StatusCode::INTERNAL_SERVER_ERROR for other variants, then build the
Json response with error.to_string() as before; reference the Err(error) branch
and the SecretsError variants (InvalidKey, DecryptionFailed) when making this
change.
src/config.rs-446-455 (1)

446-455: ⚠️ Potential issue | 🟡 Minor

LlmConfig secret mappings include keys that are not modeled in config structs.

Line 447 (cerebras_key) and Line 452 (sambanova_key) don’t have corresponding fields in LlmConfig / TOML parsing structs in this file, so these mappings can’t resolve into runtime provider config.

🔧 Proposed fix (remove dead mappings until full provider support is added)
-            SecretField {
-                toml_key: "cerebras_key",
-                secret_name: "CEREBRAS_API_KEY",
-                instance_pattern: None,
-            },
-            SecretField {
-                toml_key: "sambanova_key",
-                secret_name: "SAMBANOVA_API_KEY",
-                instance_pattern: None,
-            },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` around lines 446 - 455, The mapping entries in the SecretField
array referencing toml_key "cerebras_key" and "sambanova_key" are dead because
LlmConfig (and the TOML parsing structs) do not define fields for those
providers; remove the SecretField structs with secret_name "CEREBRAS_API_KEY"
and "SAMBANOVA_API_KEY" (the two entries with toml_key "cerebras_key" and
"sambanova_key") from the secret mappings until full provider support and
corresponding fields are added to LlmConfig so the mappings can resolve at
runtime.
src/sandbox.rs-467-467 (1)

467-467: ⚠️ Potential issue | 🟡 Minor

Fix duplicate comment numbering.

This comment says "13." but the previous section (line 460) already uses "13." for re-injecting safe environment variables. Should be "14." with subsequent sections renumbered accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sandbox.rs` at line 467, The inline comment "// 13. Re-inject tool
secrets from the secret store." is a duplicate of the earlier "// 13. Re-inject
safe environment variables" comment; update the former to "// 14. Re-inject tool
secrets from the secret store." and renumber any subsequent numbered section
comments accordingly so numbering is sequential (ensure you update the comment
containing "Re-inject safe environment variables" reference and all later
section comments that follow "re-inject tool secrets" to preserve correct
ordering).
🧹 Nitpick comments (6)
src/secrets/store.rs (1)

832-836: Validate ExportData.version during import.

import_all() currently ignores the version field. Adding a guard now will prevent silent misreads when the export schema evolves.

♻️ Suggested guard
     pub fn import_all(
         &self,
         data: &ExportData,
         overwrite: bool,
     ) -> Result<ImportResult, SecretsError> {
+        if data.version != 1 {
+            return Err(SecretsError::Other(anyhow::anyhow!(
+                "unsupported export version: {}",
+                data.version
+            )));
+        }
         if self.state() == StoreState::Locked {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/store.rs` around lines 832 - 836, The import_all function
currently ignores ExportData.version; add a version guard at the start of the
import_all(&self, data: &ExportData, overwrite: bool) -> Result<ImportResult,
SecretsError> method that checks data.version against the supported schema
version(s) and returns a clear SecretsError when mismatched (instead of
proceeding), so imports for incompatible ExportData versions are rejected;
update any tests or callers that expect ImportResult only on successful version
validation.
interface/src/api/client.ts (1)

1300-1304: Use StoreState for secret state response typing.

You already define StoreState; tightening these response shapes will improve type safety in UI flows.

🔧 Suggested patch
 export interface UnlockResponse {
-	state: string;
+	state: StoreState;
 	secret_count: number;
 	message: string;
 }

-	lockSecrets: async (): Promise<{ state: string; message: string }> => {
+	lockSecrets: async (): Promise<{ state: StoreState; message: string }> => {
 		const response = await fetch(`${API_BASE}/secrets/lock`, { method: "POST" });
 		if (!response.ok) {
 			const body = await response.json().catch(() => ({}));
 			throw new Error(body.error || `API error: ${response.status}`);
 		}
 		return response.json();
 	},

Also applies to: 2047-2054

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@interface/src/api/client.ts` around lines 1300 - 1304, The UnlockResponse
interface currently types state as a plain string; change it to use the existing
StoreState type (i.e., state: StoreState) to tighten typing for UI flows, and
update the similar response shape(s) referenced around the other occurrence
(lines 2047-2054) to also use StoreState instead of string so all secret/state
response interfaces consistently refer to StoreState (look for the
UnlockResponse interface and the related response interfaces in that second
block and replace their state:string with state: StoreState).
src/api/secrets.rs (2)

24-25: Consider extracting KEYSTORE_INSTANCE_ID to a shared module.

This constant is duplicated in src/main.rs (line 1118). Extract it to src/secrets/mod.rs or src/secrets/keystore.rs to maintain a single source of truth.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/secrets.rs` around lines 24 - 25, The constant KEYSTORE_INSTANCE_ID
is duplicated; extract it into a shared secrets module (e.g., create
KEYSTORE_INSTANCE_ID in src/secrets/mod.rs or src/secrets/keystore.rs) and
remove the local definitions in src/api/secrets.rs and src/main.rs; then export
it (pub const KEYSTORE_INSTANCE_ID: &str = "instance") and update all references
to use the shared symbol (e.g., crate::secrets::KEYSTORE_INSTANCE_ID or
crate::secrets::keystore::KEYSTORE_INSTANCE_ID) and add the necessary mod
declaration so both modules compile against the single source of truth.

421-432: Consider using async file I/O for config operations.

std::fs::read_to_string and std::fs::write are blocking operations called in an async context. While config files are typically small and this is unlikely to cause issues in practice, using tokio::fs would be more consistent with async patterns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/secrets.rs` around lines 421 - 432, The code in the handler uses
blocking std::fs::read_to_string to load the config (see config_path,
config_content and the match on std::fs::read_to_string), which should be
replaced with async file I/O to avoid blocking the async runtime; change the
read to use tokio::fs::read_to_string (await it) and adjust error handling to
the async result, and similarly replace any std::fs::write uses for saving
config with tokio::fs::write so all config file ops in this module (e.g., where
config_path is used) use non-blocking tokio async APIs.
src/main.rs (1)

2070-2074: Redundant repeated call to set_resolve_secrets_store in loop.

This call is inside the for agent_config in &resolved_agents loop, setting the same store reference repeatedly. The store is already set in bootstrap_secrets_store (line 1184). If this is needed for runtime resolution, it should be called once outside the loop.

Proposed fix
+    // Share the instance-level secrets store with all agents.
+    if let Some(secrets_store) = bootstrapped_store {
+        spacebot::config::set_resolve_secrets_store(secrets_store.clone());
+    }
+
     for agent_config in &resolved_agents {
         // ...
-        // Share the instance-level secrets store with this agent.
-        if let Some(secrets_store) = bootstrapped_store {
-            runtime_config.set_secrets(secrets_store.clone());
-            spacebot::config::set_resolve_secrets_store(secrets_store.clone());
-        }
+        if let Some(secrets_store) = bootstrapped_store {
+            runtime_config.set_secrets(secrets_store.clone());
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.rs` around lines 2070 - 2074, The loop repeatedly calls
spacebot::config::set_resolve_secrets_store with the same bootstrapped_store
reference, which is redundant; remove the repeated call from inside the for
agent_config in &resolved_agents loop and instead call
spacebot::config::set_resolve_secrets_store(bootstrapped_store.clone()) once
outside the loop (after bootstrapped_store is created or after the existing
bootstrap_secrets_store initialization); keep
runtime_config.set_secrets(secrets_store.clone()) only if per-agent
runtime_config requires it, otherwise also move that single global set to run
once using the existing bootstrapped_store variable to avoid repeated
assignments.
src/sandbox.rs (1)

376-376: Consider extracting common parameters into a struct.

The #[allow(clippy::too_many_arguments)] suppression is acceptable for now, but as an optional future refactor, bundling path_env, config, and tool_secrets into a WrapContext struct would reduce parameter sprawl across all three wrap methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sandbox.rs` at line 376, Extract the repeated parameters path_env,
config, and tool_secrets into a new struct (e.g., WrapContext) and update the
three wrap methods to accept a single WrapContext parameter instead of separate
arguments; specifically, define WrapContext with fields for path_env, config,
and tool_secrets, replace occurrences of those individual parameters in the
signatures of the wrap methods and their callers, and update internal references
inside the wrap functions to use context.path_env, context.config, and
context.tool_secrets to eliminate the #[allow(clippy::too_many_arguments)]
suppression.
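The optional refactor can be sketched as follows. Field names match the review comment; the field types and the `--no-network`/`--pass-env` flags are hypothetical stand-ins for whatever the real wrap methods emit.

```rust
use std::collections::HashMap;

// Stand-in for the real sandbox config type.
struct SandboxConfig {
    allow_network: bool,
}

// Hypothetical bundle of the three parameters the wrap methods share.
struct WrapContext {
    path_env: String,
    config: SandboxConfig,
    tool_secrets: HashMap<String, String>,
}

fn wrap_command(program: &str, context: &WrapContext) -> Vec<String> {
    // One struct parameter replaces three positional ones, so adding a
    // field later doesn't ripple through every wrap signature and caller.
    let mut argv = vec![program.to_string()];
    argv.push(format!("PATH={}", context.path_env));
    if !context.config.allow_network {
        argv.push("--no-network".to_string());
    }
    let mut names: Vec<&String> = context.tool_secrets.keys().collect();
    names.sort(); // deterministic order for the pass-through flags
    for name in names {
        argv.push(format!("--pass-env={name}"));
    }
    argv
}

fn main() {
    let context = WrapContext {
        path_env: "/usr/bin:/bin".to_string(),
        config: SandboxConfig { allow_network: false },
        tool_secrets: HashMap::from([("GITHUB_TOKEN".to_string(), "t".to_string())]),
    };
    println!("{:?}", wrap_command("worker", &context));
}
```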

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba4d2a4 and b73c822.

⛔ Files ignored due to path filters (3)
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
  • Cargo.toml is excluded by !**/*.toml
  • docs/content/docs/(configuration)/meta.json is excluded by !**/*.json
📒 Files selected for processing (44)
  • AGENTS.md
  • README.md
  • docs/content/docs/(configuration)/config.mdx
  • docs/content/docs/(configuration)/permissions.mdx
  • docs/content/docs/(configuration)/sandbox.mdx
  • docs/content/docs/(configuration)/secrets.mdx
  • docs/content/docs/(core)/agents.mdx
  • docs/content/docs/(core)/architecture.mdx
  • docs/content/docs/(deployment)/roadmap.mdx
  • docs/content/docs/(features)/workers.mdx
  • docs/content/docs/index.mdx
  • interface/src/api/client.ts
  • interface/src/components/UpdateBanner.tsx
  • interface/src/components/UpdatePill.tsx
  • interface/src/router.tsx
  • interface/src/routes/Overview.tsx
  • interface/src/routes/Settings.tsx
  • prompts/en/channel.md.j2
  • prompts/en/tools/secret_set_description.md.j2
  • prompts/en/worker.md.j2
  • src/agent/branch.rs
  • src/agent/channel.rs
  • src/agent/cortex.rs
  • src/api.rs
  • src/api/mcp.rs
  • src/api/secrets.rs
  • src/api/server.rs
  • src/api/state.rs
  • src/config.rs
  • src/error.rs
  • src/main.rs
  • src/opencode/worker.rs
  • src/prompts/engine.rs
  • src/prompts/text.rs
  • src/sandbox.rs
  • src/secrets.rs
  • src/secrets/keystore.rs
  • src/secrets/scrub.rs
  • src/secrets/store.rs
  • src/tools.rs
  • src/tools/secret_set.rs
  • src/tools/set_status.rs
  • tests/bulletin.rs
  • tests/context_dump.rs
💤 Files with no reviewable changes (3)
  • src/api/mcp.rs
  • interface/src/router.tsx
  • interface/src/components/UpdateBanner.tsx

Address review feedback from @vsumner:

1. Replace thread-local RESOLVE_SECRETS_STORE with a process-wide
   LazyLock<ArcSwap<...>> so secret: config references resolve correctly
   on any thread (file watcher, API handlers, tokio workers). The
   thread-local only worked on the main thread at startup, causing
   silent None resolution during hot-reload.

2. Fix wrong keyctl constant in store_key: 0x0b is KEYCTL_READ, not
   KEYCTL_GET_KEYRING_ID (0x00). The wrong operation returned payload
   size instead of the keyring serial, which was then passed as a
   keyring ID to add_key.

3. Fix missing NUL termination on description strings passed to kernel
   syscalls (add_key, keyctl SEARCH). Rust String does not include a
   trailing NUL — use CString to guarantee kernel-safe C strings.
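Item 3 can be illustrated in isolation: `std::ffi::CString` guarantees the trailing NUL that add_key(2)/keyctl(2) expect and rejects interior NULs. The syscall wrapper is omitted and the `spacebot:` description prefix is illustrative, not the PR's actual format.

```rust
use std::ffi::CString;

// Build a kernel-safe description for add_key(2)/keyctl(2). A Rust
// String is not NUL-terminated, so handing str::as_ptr() to the kernel
// would make it read past the end of the buffer.
fn key_description(instance_id: &str) -> Result<CString, std::ffi::NulError> {
    // CString::new appends the trailing NUL and fails on interior NULs.
    CString::new(format!("spacebot:{instance_id}"))
}

fn main() {
    let desc = key_description("instance").expect("no interior NUL");
    // as_bytes_with_nul exposes the terminator the kernel relies on;
    // desc.as_ptr() is what a real syscall wrapper would pass.
    assert_eq!(desc.as_bytes_with_nul().last(), Some(&0u8));
    println!("{}", desc.to_str().expect("valid UTF-8"));
}
```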
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
src/secrets/keystore.rs (1)

58-60: ⚠️ Potential issue | 🟠 Major

Do not silently ignore Keychain pre-delete errors.

store_key drops all delete_generic_password failures, which can mask permission/Keychain-state problems before writing the new key.

Suggested fix
-        // Delete any existing entry first (set_generic_password fails on duplicate).
-        let _ = delete_generic_password(SERVICE_NAME, instance_id);
+        // Delete any existing entry first (set_generic_password fails on duplicate).
+        if let Err(error) = delete_generic_password(SERVICE_NAME, instance_id) {
+            // errSecItemNotFound
+            if error.code() != -25300 {
+                return Err(SecretsError::Other(anyhow::anyhow!(
+                    "keychain pre-delete failed for {SERVICE_NAME}/{instance_id}: {error}"
+                )));
+            }
+        }

As per coding guidelines, "Don't silently discard errors. No let _ = on Results. Handle them, log them, or propagate them."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/keystore.rs` around lines 58 - 60, In store_key, don't ignore the
Result from delete_generic_password(SERVICE_NAME, instance_id); instead handle
failures (e.g., log a warning with the error and context or return/propagate the
Err) so Keychain/permission issues aren't masked before attempting
set_generic_password; replace the `let _ = delete_generic_password(...)` with an
explicit match or ?/map_err usage that records the error via the module's logger
or converts it into the function's error type.
src/config.rs (1)

3294-3301: ⚠️ Potential issue | 🟠 Major

Handle resolver failures explicitly and preserve env-first precedence.

At Line 3299, .ok() drops secret-store errors, and the secret: path currently skips env-first override semantics. This can hide real breakages and invert expected precedence.

🔧 Proposed fix
 fn resolve_env_value(value: &str) -> Option<String> {
     if let Some(alias) = value.strip_prefix("secret:") {
+        if let Ok(from_env) = std::env::var(alias) {
+            return Some(from_env);
+        }
+
         let guard = RESOLVE_SECRETS_STORE.load();
         match (*guard).as_ref() {
-            Some(store) => store
-                .get(alias)
-                .ok()
-                .map(|secret| secret.expose().to_string()),
+            Some(store) => match store.get(alias) {
+                Ok(secret) => Some(secret.expose().to_string()),
+                Err(error) => {
+                    tracing::warn!(secret_alias = alias, %error, "failed to resolve secret reference");
+                    None
+                }
+            },
             None => None,
         }
     } else if let Some(var_name) = value.strip_prefix("env:") {
         std::env::var(var_name).ok()
     } else {

As per coding guidelines: "Don't silently discard errors. No let _ = on Results..." and "Use config resolution order: env > DB > default with per-subsystem resolve() methods".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` around lines 3294 - 3301, The secret resolution branch
currently calls value.strip_prefix("secret:") then uses
RESOLVE_SECRETS_STORE.get(...).ok(), which silently drops store errors and also
bypasses the required env-first precedence; change this to first respect any
environment override (ensure the env-check runs before entering the "secret:"
handling), and then handle the secret store Result explicitly instead of .ok():
call store.get(alias) and match its Result to propagate or log errors (return an
Err or propagate the error through the resolve() path) when the secret store
fails, while only treating a genuine “not found” result as None; reference
RESOLVE_SECRETS_STORE, value.strip_prefix("secret:"), store.get(alias) and
secret.expose() when making the change.
src/sandbox.rs (2)

479-484: ⚠️ Potential issue | 🟠 Major

Block reserved env names when injecting tool secrets.

Tool secrets are injected without is_reserved_env_var() checks, so names like PATH, HOME, or TMPDIR can override hardened defaults.

🛡️ Proposed fix
@@
-        for (name, value) in tool_secrets {
-            cmd.arg("--setenv").arg(name).arg(value);
-        }
+        for (name, value) in tool_secrets {
+            if is_reserved_env_var(name) {
+                tracing::warn!(%name, "skipping tool secret with reserved env var name");
+                continue;
+            }
+            cmd.arg("--setenv").arg(name).arg(value);
+        }
@@
-        for (name, value) in tool_secrets {
-            cmd.env(name, value);
-        }
+        for (name, value) in tool_secrets {
+            if is_reserved_env_var(name) {
+                tracing::warn!(%name, "skipping tool secret with reserved env var name");
+                continue;
+            }
+            cmd.env(name, value);
+        }
@@
-        for (name, value) in tool_secrets {
-            cmd.env(name, value);
-        }
+        for (name, value) in tool_secrets {
+            if is_reserved_env_var(name) {
+                tracing::warn!(%name, "skipping tool secret with reserved env var name");
+                continue;
+            }
+            cmd.env(name, value);
+        }

Also applies to: 551-554, 598-601

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sandbox.rs` around lines 479 - 484, When injecting tool_secrets into the
subprocess (the loop that does for (name, value) in tool_secrets {
cmd.arg("--setenv").arg(name).arg(value); }), first check each name with
is_reserved_env_var(name) and skip (and optionally log/debug) any reserved env
names so we don't override PATH/HOME/TMPDIR; apply the same guard to the other
similar injection sites referenced (the other loops using cmd.arg("--setenv")
around tool_secrets) to ensure reserved vars are never injected into the
subprocess environment.
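A standalone sketch of the guard this finding asks for. The reserved-name list here is illustrative; the authoritative `is_reserved_env_var()` lives in src/sandbox.rs:

```rust
// Illustrative stand-in for the is_reserved_env_var() guard: never let a
// tool secret's name shadow an env var the sandbox hardens.
fn is_reserved_env_var(name: &str) -> bool {
    matches!(
        name,
        "PATH" | "HOME" | "TMPDIR" | "USER" | "SHELL" | "LD_PRELOAD" | "LD_LIBRARY_PATH"
    )
}

// Filters the (name, value) pairs before they reach cmd.env()/--setenv.
fn inject_tool_secrets(tool_secrets: &[(String, String)]) -> Vec<(String, String)> {
    tool_secrets
        .iter()
        .filter(|(name, _)| !is_reserved_env_var(name))
        .cloned()
        .collect()
}

fn main() {
    let secrets = vec![
        ("GITHUB_TOKEN".to_string(), "ghp_example".to_string()),
        ("PATH".to_string(), "/evil/bin".to_string()),
    ];
    let injected = inject_tool_secrets(&secrets);
    assert_eq!(injected.len(), 1);
    assert_eq!(injected[0].0, "GITHUB_TOKEN");
}
```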

498-509: ⚠️ Potential issue | 🟠 Major

Unconditional Linux keyring isolation can make workers unspawnable.

If pre_exec_new_session_keyring() fails in restricted/containerized Linux environments, worker spawn fails hard. Consider a guarded fail-open mode (or capability probe) so operators can choose availability vs strict fail-closed behavior.

Also applies to: 612-619

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sandbox.rs` around lines 498 - 509, The current unconditional call to set
a pre-exec that runs crate::secrets::keystore::pre_exec_new_session_keyring()
makes spawn fail if that call errors in restricted/containerized environments;
change the logic around cmd.pre_exec so the pre-exec closure returns a Result
that either applies the new session keyring or, on failure, either logs the
error and returns Ok(()) (fail-open) or returns Err (fail-closed) based on a
configuration flag or capability probe. In practice, replace the direct unsafe
cmd.pre_exec(...) invocation with a closure that calls
pre_exec_new_session_keyring(), matches its Result, and uses a runtime flag
(e.g. enable_keyring_isolation or probe_keyring_capability) to decide whether to
ignore the error (log and Ok(())) or propagate it (Err), so operators can choose
availability vs strict isolation; apply the same pattern for the other
occurrence around pre_exec at the other location.
🧹 Nitpick comments (1)
src/config.rs (1)

446-455: Remove or fully wire unsupported LLM secret mappings.

SecretField entries for toml_key: "cerebras_key" and toml_key: "sambanova_key" don’t have corresponding config fields in this file, so they cannot be resolved through the current LLM config path.

♻️ Proposed cleanup
-            SecretField {
-                toml_key: "cerebras_key",
-                secret_name: "CEREBRAS_API_KEY",
-                instance_pattern: None,
-            },
-            SecretField {
-                toml_key: "sambanova_key",
-                secret_name: "SAMBANOVA_API_KEY",
-                instance_pattern: None,
-            },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` around lines 446 - 455, The SecretField entries for toml_key
"cerebras_key" and "sambanova_key" are orphaned (no matching config fields) so
either remove these SecretField entries or add corresponding config fields and
resolution logic; update the SecretField list in the same struct/initializer
where SecretField is defined (look for the SecretField initializer block in
src/config.rs) and either delete the two SecretField entries for
"cerebras_key"/"sambanova_key" or add matching config struct members and ensure
the LLM config path resolution code (the functions/methods that read secret_name
-> config field) is updated to include those keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/secrets/keystore.rs`:
- Around line 239-242: The delete_key function currently treats any negative
return from the KEYCTL_SEARCH syscall as "already deleted"; change it to check
errno and only treat ENOKEY as non-fatal: after calling KEYCTL_SEARCH in
delete_key, if the return is negative inspect the OS error (errno) and if it
equals ENOKEY return Ok(()), otherwise propagate the error (convert to Err with
the original error) instead of silently returning Ok. Use the same
error-handling pattern used in read_key to locate the behavior and ensure
KEYCTL_SEARCH failures like EACCES/EINVAL are surfaced.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b73c822 and 688a93d.

📒 Files selected for processing (4)
  • src/config.rs
  • src/prompts/engine.rs
  • src/sandbox.rs
  • src/secrets/keystore.rs

…se-in-depth

- docs/secrets.mdx: soften absolute guarantee wording for secret
  scrubbing coverage.

- config.rs: log secret-store read errors in resolve_env_value instead
  of silently dropping them with .ok(). Failed secret: references now
  produce a tracing::warn with the alias and error.

- main.rs: log failures for tmpfs key deletion (security concern),
  keystore persistence, and secret store unlock instead of let _ =.

- sandbox.rs: apply is_reserved_env_var guard to tool secret injection
  in all three sandbox paths (bubblewrap, sandbox-exec, passthrough)
  for defense-in-depth consistency with passthrough_env handling.

- secrets/scrub.rs: sort secret values by descending length before
  replacement so longer secrets are scrubbed first, preventing partial
  replacement when one secret value is a prefix of another.

- secrets/store.rs: log metadata deserialization failures instead of
  silently skipping corrupted rows.

- tools/secret_set.rs: validate secret names against env-var-safe
  format (A-Z, 0-9, _ only, starting with a letter) before persisting,
  since names are injected as subprocess environment variables.
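The env-var-safe format from the last bullet can be sketched as a small predicate. This is an approximation of the rule stated above (A-Z, 0-9, underscore, starting with a letter); the shipped validator in src/tools/secret_set.rs may differ in detail:

```rust
// Env-var-safe secret names: ASCII uppercase letters, digits, and
// underscores only, starting with a letter. Anything else could corrupt
// or shadow the subprocess environment block the name is injected into.
fn is_env_safe_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_uppercase() => {}
        _ => return false, // empty, or starts with digit/underscore/lowercase
    }
    chars.all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}

fn main() {
    assert!(is_env_safe_name("STRIPE_API_KEY"));
    assert!(!is_env_safe_name("1BAD"));       // must start with a letter
    assert!(!is_env_safe_name("lower_case")); // uppercase only
    assert!(!is_env_safe_name("FOO=BAR"));    // '=' would corrupt the env block
}
```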
delete_key treated any negative KEYCTL_SEARCH return as 'not found',
silently swallowing EACCES/EINVAL errors. Now only ENOKEY returns Ok,
matching the error handling pattern in load_key.
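The branch logic this commit describes can be separated from the syscall and tested in isolation. A sketch under the assumption of Linux asm-generic errno numbering (`ENOKEY` = 126, "Required key not available"); the enum and function names are illustrative:

```rust
// Pure classification of a keyctl(KEYCTL_SEARCH, ...) result: only ENOKEY
// may be treated as "already deleted"; EACCES/EINVAL must be surfaced.
const ENOKEY: i32 = 126; // Linux asm-generic errno

#[derive(Debug, PartialEq)]
enum SearchOutcome {
    Found(i64),  // key serial
    NotFound,    // ENOKEY: nothing to delete, return Ok(())
    Error(i32),  // EACCES, EINVAL, ...: propagate
}

fn classify_search(ret: i64, errno: i32) -> SearchOutcome {
    if ret >= 0 {
        SearchOutcome::Found(ret)
    } else if errno == ENOKEY {
        SearchOutcome::NotFound
    } else {
        SearchOutcome::Error(errno)
    }
}

fn main() {
    assert_eq!(classify_search(42, 0), SearchOutcome::Found(42));
    assert_eq!(classify_search(-1, ENOKEY), SearchOutcome::NotFound);
    assert_eq!(classify_search(-1, 13), SearchOutcome::Error(13)); // EACCES
}
```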
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (4)
src/secrets/scrub.rs (1)

198-206: ⚠️ Potential issue | 🟠 Major

Keep streaming and non-streaming redaction ordering consistent.

Line 202 correctly applies longest-first replacement for non-streaming text, but StreamScrubber::scrub() / flush() still use unsorted self.secrets. Overlapping secret values can still leak suffixes on streaming paths.

🔒 Suggested fix (sort once at construction)
 impl StreamScrubber {
     /// Create a new scrubber with the given secret name/value pairs.
-    pub fn new(secrets: Vec<(String, String)>) -> Self {
+    pub fn new(mut secrets: Vec<(String, String)>) -> Self {
+        // Longest-first prevents prefix-collision partial leaks.
+        secrets.sort_by(|a, b| b.1.len().cmp(&a.1.len()));
+
         let max_secret_len = secrets
             .iter()
             .map(|(_, value)| value.len())
             .max()
             .unwrap_or(0);
         Self {
             secrets,
             tail: String::new(),
             max_secret_len,
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/scrub.rs` around lines 198 - 206, Non-streaming scrub_secrets
sorts secrets by descending value length but StreamScrubber::scrub() and
StreamScrubber::flush() still iterate over unsorted self.secrets, allowing
overlapping secrets to leak during streaming; fix by sorting secrets once when
constructing StreamScrubber (store secrets in longest-first order) so both
scrub() and flush() use the pre-sorted self.secrets, ensuring consistent
longest-first replacement across streaming and non-streaming paths (refer to
scrub_secrets, StreamScrubber::scrub, StreamScrubber::flush, and self.secrets).
src/secrets/store.rs (2)

283-333: ⚠️ Potential issue | 🔴 Critical

Serialize mutating operations to avoid write/rotation races that can corrupt stored secrets.

set() can encode with the old cipher while rotate_key() commits a new salt/cipher, allowing ciphertext produced under the old cipher to be committed after rotation, at which point it can no longer be decrypted.

Suggested direction
 pub struct SecretsStore {
     db: Database,
     cipher_state: RwLock<Option<CipherState>>,
     encrypted: RwLock<bool>,
+    mutation_lock: std::sync::Mutex<()>,
 }

 // in new()
 Ok(Self {
     db,
     cipher_state: RwLock::new(None),
     encrypted: RwLock::new(encrypted),
+    mutation_lock: std::sync::Mutex::new(()),
 })

 pub fn set(...) -> Result<(), SecretsError> {
+    let _guard = self.mutation_lock.lock().expect("mutation lock poisoned");
     ...
 }

 pub fn rotate_key(&self) -> Result<Vec<u8>, SecretsError> {
+    let _guard = self.mutation_lock.lock().expect("mutation lock poisoned");
     ...
 }

Apply the same lock to enable_encryption, lock, unlock, import_all, and delete to keep state transitions atomic.

Also applies to: 718-794

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/store.rs` around lines 283 - 333, The set() path can race with
rotate_key() (and other mutating ops) because encoding happens before the DB
commit and key rotation can change cipher/salt, so acquire and hold the same
mutation lock for the entire operation (from before encode_value() through
commit) to serialize state-changing operations; apply the same locking
discipline to rotate_key(), enable_encryption(), lock(), unlock(), import_all(),
and delete() so they all use the identical mutex/guard to make encode/commit and
key/state transitions atomic and prevent old-cipher values being written after a
rotation.

276-282: ⚠️ Potential issue | 🟠 Major

Enforce secret-name policy in SecretsStore::set as the central guardrail.

Validation currently depends on caller behavior; direct store/API paths can still persist names that are unsafe or operationally dangerous for env injection.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/store.rs` around lines 276 - 282, Add centralized name validation
inside SecretsStore::set so all callers are guarded; before persisting, validate
the provided name against the project's secret-name policy (allowed characters,
length, reserved words, env-injection safety) and return an appropriate
SecretsError (e.g., InvalidName) if it fails. Apply the check at the top of the
set method (method: SecretsStore::set) and ensure behavior is consistent for all
categories (SecretCategory) so direct store/API paths cannot bypass it; update
any existing callers/tests that currently expect the store to accept invalid
names to reflect the new validation.
src/main.rs (1)

1228-1230: ⚠️ Potential issue | 🟠 Major

Handle legacy unlock/store failures instead of discarding them.

Both legacy_store.unlock(&key) and keystore.store_key(...) failures are ignored, which can silently skip migration and break later auto-unlock.

Suggested fix
-            if let Some(key) = keystore.load_key(&agent_id).ok().flatten() {
-                let _ = legacy_store.unlock(&key);
+            if let Some(key) = keystore.load_key(&agent_id).ok().flatten() {
+                if let Err(error) = legacy_store.unlock(&key) {
+                    tracing::warn!(agent_id = %agent_id, %error, "failed to unlock legacy secrets store");
+                    continue;
+                }
             } else {
                 continue; // Can't read encrypted store without key.
             }
...
-        if let Ok(Some(key)) = keystore.load_key(&agent_id) {
-            // Migrate the key to the instance-level keystore entry.
-            let _ = keystore.store_key(KEYSTORE_INSTANCE_ID, &key);
+        if let Ok(Some(key)) = keystore.load_key(&agent_id) {
+            if let Err(error) = keystore.store_key(KEYSTORE_INSTANCE_ID, &key) {
+                tracing::warn!(agent_id = %agent_id, %error, "failed to migrate legacy key to instance-level keystore entry");
+            }
             return Some(key);
         }
As per coding guidelines: "Don't silently discard errors. No `let _ =` on Results. Handle them, log them, or propagate them."

Also applies to: 1274-1277

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.rs` around lines 1228 - 1230, The code is silently discarding
Results from legacy_store.unlock(&key) and from keystore.store_key(...); instead
of using "let _ =", match or use ?/map_err to surface failures: for the block
that calls keystore.load_key(&agent_id) and then legacy_store.unlock(&key),
check the Result from legacy_store.unlock and either propagate the error or log
it with context (include agent_id and operation name). Do the same for the later
keystore.store_key(...) call (and any legacy_store.store_key calls) — handle the
Result by matching Ok/Err, logging descriptive errors or returning Err so
migration/auto-unlock failures are not ignored. Ensure to reference the existing
functions keystore.load_key, legacy_store.unlock, and keystore.store_key when
updating the code paths.
🧹 Nitpick comments (1)
src/config.rs (1)

3317-3324: Keep process-wide resolver state and RuntimeConfig secrets in sync.

RESOLVE_SECRETS_STORE and RuntimeConfig::secrets can diverge because RuntimeConfig::set_secrets only updates the runtime field. Syncing both in one place reduces stale-resolution risk during reload/update paths.

♻️ Proposed refactor
 pub fn set_secrets(&self, secrets: Arc<crate::secrets::store::SecretsStore>) {
-    self.secrets.store(Arc::new(Some(secrets)));
+    self.secrets.store(Arc::new(Some(secrets.clone())));
+    set_resolve_secrets_store(secrets);
 }

Also applies to: 5549-5550, 5627-5630

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.rs` around lines 3317 - 3324, The process-wide resolver state
(RESOLVE_SECRETS_STORE) and the per-config secrets field can go out of sync;
update the code so setting secrets updates both places: modify
RuntimeConfig::set_secrets to also call the process-wide setter
(set_resolve_secrets_store) or directly store into RESOLVE_SECRETS_STORE (using
std::sync::Arc::new(Some(store.clone()))), ensuring you pass an
Arc<crate::secrets::store::SecretsStore> and clone where needed; apply the same
sync change to the other setters/locations referenced (around the other
occurrences noted) so the runtime field and RESOLVE_SECRETS_STORE remain
consistent.
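The process-wide resolver slot this nitpick references can be modeled with std alone. A sketch: the PR uses `LazyLock<ArcSwap<...>>`, but `RwLock<Option<Arc<..>>>` gives the same any-thread visibility the old thread-local lacked; `SecretsStore` here is a stand-in type, not the real src/secrets/store.rs struct:

```rust
use std::sync::{Arc, LazyLock, RwLock};

struct SecretsStore; // stand-in for crate::secrets::store::SecretsStore

static RESOLVE_SECRETS_STORE: LazyLock<RwLock<Option<Arc<SecretsStore>>>> =
    LazyLock::new(|| RwLock::new(None));

// Called from the set_secrets path so the runtime field and the
// process-wide resolver state never diverge.
fn set_resolve_secrets_store(store: Arc<SecretsStore>) {
    *RESOLVE_SECRETS_STORE.write().expect("resolver lock poisoned") = Some(store);
}

fn resolver_available() -> bool {
    RESOLVE_SECRETS_STORE.read().expect("resolver lock poisoned").is_some()
}

fn main() {
    assert!(!resolver_available());
    set_resolve_secrets_store(Arc::new(SecretsStore));
    // Visible from any thread, not just the one that set it.
    let handle = std::thread::spawn(resolver_available);
    assert!(handle.join().unwrap());
}
```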
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/config.rs`:
- Around line 446-455: The added SecretField entries for toml_key "cerebras_key"
and "sambanova_key" are dead because LlmConfig/TOML structs and provider
registration don't define or use these keys; remove these SecretField blocks or
fully wire them by adding corresponding fields to LlmConfig and updating the
provider registration and TOML parsing so the secrets are actually consumed.
Locate the SecretField entries in the array (the structs with toml_key
"cerebras_key" and "sambanova_key"), then either delete those entries or add
matching fields to LlmConfig (and any TOML serde structs) and register/handle
those provider keys in the provider registration code so the secrets are
correctly mapped and used.

In `@src/main.rs`:
- Line 1118: Replace the fixed KEYSTORE_INSTANCE_ID constant with a per-instance
identifier: implement a keystore_instance_id(instance_dir: &Path) -> String (or
similar) that derives a unique id from the instance directory (e.g., hash or
canonicalized path) and use &keystore_instance_id(&instance_dir) wherever
KEYSTORE_INSTANCE_ID was used (references in the diff include the symbol
KEYSTORE_INSTANCE_ID and its call sites around the keystore creation/lookup code
such as the blocks that use KEYSTORE_INSTANCE_ID at lines noted in the review).
Compute the keystore_instance_id once (or pass it through to functions that need
it) and replace all occurrences so each instance directory gets a unique
keystore id, preventing cross-instance collisions.
- Around line 780-781: Replace the plain std::fs::write call with creating and
writing via std::fs::OpenOptions configured with owner-only permissions (Unix)
to ensure exported secret backups are created with 0o600; specifically, use
OpenOptions::new().write(true).create(true).truncate(true).mode(0o600).open(&output)
and then write_all the content (and keep the existing with_context wrapper for
errors), and add the import for std::os::unix::fs::OpenOptionsExt so the
.mode(...) call compiles; this change targets the write site where
std::fs::write(&output, content) is used and ensures the file is created with
restrictive permissions.

In `@src/secrets/store.rs`:
- Around line 501-506: In tool_env_vars(), do not silently ignore failures from
self.get(name); either change tool_env_vars() to return a Result and propagate
each Err from self.get(name) upward, or at minimum log each per-secret error
before skipping it (e.g., include the secret name and the error), instead of
using `if let Ok(secret) = self.get(name)` that discards errors; locate the loop
over metadata in tool_env_vars(), replace the `if let Ok(secret) =
self.get(name)` pattern with explicit match on self.get(name) and handle Err by
returning or logging, and keep the successful branch that calls
secret.expose().to_string() and inserts into result.
- Around line 218-221: The code currently treats any
table.get(CONFIG_KEY_ENCRYPTED) error as "false" which can misclassify
encrypted/corrupted stores; change the match on table.get(CONFIG_KEY_ENCRYPTED)
to handle Err explicitly instead of falling through to `_ => false`: either
propagate the error (use `?` or return Err(e) from the enclosing function) or
return a distinct Result/enum indicating the read failure, keeping only
Ok(Some(v)) => v.value() == [1] and Ok(None) => false as the non-error branches;
update the function that calls table.get(CONFIG_KEY_ENCRYPTED) accordingly so
errors are not silenced.


📥 Commits

Reviewing files that changed from the base of the PR and between 688a93d and 598249a.

📒 Files selected for processing (7)
  • docs/content/docs/(configuration)/secrets.mdx
  • src/config.rs
  • src/main.rs
  • src/sandbox.rs
  • src/secrets/scrub.rs
  • src/secrets/store.rs
  • src/tools/secret_set.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/content/docs/(configuration)/secrets.mdx

Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
src/secrets/keystore.rs (1)

58-60: ⚠️ Potential issue | 🟠 Major

Handle Keychain pre-delete errors instead of discarding them.

Line 59 drops a Result, which can hide Keychain permission/state failures before set_generic_password.

Proposed fix
-        // Delete any existing entry first (set_generic_password fails on duplicate).
-        let _ = delete_generic_password(SERVICE_NAME, instance_id);
+        // Delete any existing entry first (set_generic_password fails on duplicate).
+        if let Err(error) = delete_generic_password(SERVICE_NAME, instance_id) {
+            if error.code() != -25300 {
+                return Err(SecretsError::Other(anyhow::anyhow!(
+                    "keychain pre-delete failed for {SERVICE_NAME}/{instance_id}: {error}"
+                )));
+            }
+        }

As per coding guidelines: "Don't silently discard errors. No let _ = on Results. Handle them, log them, or propagate them."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/secrets/keystore.rs` around lines 58 - 60, The call to
delete_generic_password(SERVICE_NAME, instance_id) currently drops its Result
(let _ = ...) which hides keychain errors; replace that with proper handling in
the function that calls it (match or ?): call
delete_generic_password(SERVICE_NAME, instance_id) and if it returns Err, either
propagate the error (use ? to return a Result from the surrounding function) or
log it with context (include SERVICE_NAME and instance_id) and continue only for
expected "not found" cases; ensure you reference delete_generic_password,
set_generic_password, SERVICE_NAME and instance_id when adding the
match/log/propagate so failures are not silently discarded.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 598249a and 380cb81.

📒 Files selected for processing (1)
  • src/secrets/keystore.rs

…store

Add Mutex<()> mutation guard to serialize set/lock/rotate_key/enable_encryption
and prevent interleaving races that could write plaintext to an encrypted store.

Write secret exports with 0600 permissions on Unix. Propagate config table read
errors instead of silently defaulting to unencrypted. Log per-secret decryption
failures in tool_env_vars() instead of silently skipping.
@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/main.rs`:
- Around line 1293-1296: The migration currently discards the Result from
keystore.store_key(KEYSTORE_INSTANCE_ID, &key) which hides failures; update the
code around keystore.load_key(&agent_id) / keystore.store_key(...) to handle the
Result explicitly—either propagate the error (using ? if inside a function
returning Result), or match/if let Err(e) to log the failure via the appropriate
logger and decide on fallback behavior (e.g., return None or continue) instead
of using let _ =; ensure KEYSTORE_INSTANCE_ID, agent_id, and the returned key
handling remain consistent.
- Around line 1246-1252: The call to legacy_store.unlock(&key) currently
discards any Result (let _ = ...); change this to handle the Result from
legacy_store.unlock in the unlock logic: call legacy_store.unlock(&key) and
match or use if let Err(e) to log the error (including context like agent_id and
which store) and decide whether to continue or propagate; ensure you use the
project logger (e.g., tracing::error or the existing logger in scope) rather
than silently ignoring the error; keep the keystore.load_key(&agent_id) flow
intact but replace the silent discard with explicit error handling for
legacy_store.unlock.


📥 Commits

Reviewing files that changed from the base of the PR and between 380cb81 and b64a050.

📒 Files selected for processing (2)
  • src/main.rs
  • src/secrets/store.rs

The [target.'cfg(target_os = "macos")'.dependencies] section was placed
mid-file, causing all subsequent deps (serenity, emojis, tempfile, etc.)
to be scoped to macOS only. This broke Linux CI with 602 unresolved crate
errors.
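The Cargo.toml pitfall described above follows from how TOML tables scope: every dependency listed after a `[target.…]` header belongs to that target until the next table header. A hypothetical layout sketch (crate names and versions are illustrative):

```toml
[dependencies]
serenity = "0.12"
tempfile = "3"

# Platform-scoped tables go after the shared [dependencies] table —
# anything written below this header is macOS-only until the next
# table header, which is what accidentally scoped the rest of the
# file's deps to macOS and broke Linux CI.
[target.'cfg(target_os = "macos")'.dependencies]
security-framework = "2"
```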
Test keyctl once at boot. If the syscall is blocked (restrictive seccomp,
gVisor, etc.), disable keyring isolation but let workers start normally.
Workers always start — keyring isolation is defense-in-depth, not a hard
requirement.
@jamiepine jamiepine merged commit fdab751 into main Mar 1, 2026
4 checks passed