Skip to content

Background agent improvements: retry, env scrub, tasks, budget, reactive triggers#108

Merged
priyanshujain merged 42 commits intomasterfrom
background-agent-refactor
Mar 20, 2026
Merged

Background agent improvements: retry, env scrub, tasks, budget, reactive triggers#108
priyanshujain merged 42 commits intomasterfrom
background-agent-refactor

Conversation

@priyanshujain
Copy link
Copy Markdown
Collaborator

@priyanshujain priyanshujain commented Mar 20, 2026

Summary

Six improvements to background agent capabilities, implemented across 34 atomic commits:

  • Error-classified retry — Fix dead-code NextRetryAtNextRetry bug in River worker. Classify errors (auth, rate-limit, transient) for adaptive retry timing. Cancel non-retryable jobs immediately with river.JobCancel.
  • Environment scrubbing — Strip sensitive env vars (API keys, tokens, secrets) from all external agent subprocesses (Gemini, Codex), not just Claude. Uses allowlist/blocklist pattern matching.
  • Persistent task tracking — New service/tasks/ package with SQLite/Postgres schema. TaskTracker gets write-through DB backing for cross-session task visibility. 7-day auto-cleanup.
  • Budget controlsBudgetTracker wraps UsageRecorder to enforce per-task cost limits. Schedules carry model_tier (default: fast) and max_budget_usd. Shared pricing extracted to provider/pricing.go.
  • Event-driven reactive triggers — New reactive schedule type. SQL WHERE clauses evaluated against synced data (gmail, whatsapp, imessage, applenotes) with watermark tracking. SyncNotifier signals trigger evaluation after each sync cycle.
  • Spec tests — End-to-end tests with real LLM: reactive email trigger pipeline and no-match negative test.

Review fixes (post-review)

Seven fixes from PR review:

  1. P0 — SQL injection in trigger queries — Replaced denylist (DROP, DELETE, etc.) with allowlist of permitted column names per source + allowed SQL keywords. String literals stripped before validation. Blocks UNION SELECT, subqueries, ATTACH, PRAGMA, load_extension, unknown columns.
  2. P0 — WhatsApp reactive triggers never firednotifier.Notify("whatsapp") was only called on goroutine exit (streaming sync). Added periodic 30s ticker to match cadence of other sync sources.
  3. P1 — Deduplicated openTaskTracker — Identical function in internal/cli/chat.go and channel/telegram/session.go extracted to tools.OpenPersistentTaskTracker.
  4. P1 — Unreadable task row formatting%v on map[string]string produced Go map syntax. Now formats as sorted key: value | key: value pairs.
  5. P2 — Dead code in NextRetry — Auth/context-window error cases were unreachable (already cancelled in Work() via river.JobCancel). Removed.
  6. P2 — Env scrub gaps — Added DATABASE_URL, REDIS_URL, MONGODB_URI, _DSN, _PRIVATE_KEY to sensitive patterns.
  7. P2 — Missing budget test — Added TestLoop_BudgetExceeded and TestLoop_BudgetNotExceeded to verify agent loop stops on budget exceeded with partial text.

Test plan

  • go test ./service/scheduler/ — store CRUD, reactive queries, trigger validation (allowlist)
  • go test ./service/tasks/ — insert, complete, fail, list, cleanup
  • go test ./agent/ — budget tracker (incl. agent loop integration), persistent task tracker, env scrubbing
  • go test ./daemon/jobs/ — sync notifier, scheduled task retry classification
  • go test ./provider/ — shared pricing estimation
  • Spec tests pass with Gemini (OBK_TEST_PROVIDER=gemini go test ./spectest/ -run Reactive -v)

NextRetryAt was dead code — River's Worker interface expects NextRetry,
so the custom 15-minute retry backoff was never active.
Auth/context-window errors delay retry by 1 year (effectively never).
429 rate limits retry after 30min, 5xx after 10min, default stays 15min.
Auth and context-window errors are now immediately cancelled instead of
retrying. Uses job.MaxAttempts instead of hardcoded 2 for retry check.
Gives scheduled tasks one more retry attempt before permanent failure
notification, working alongside the new error-classified retry backoff.
Adds scrubEnv function that strips API keys, tokens, secrets, and
passwords from environment variables passed to external agent processes.
Uses allowlist for safe vars and pattern matching for sensitive ones.
Previously only Claude had CLAUDECODE stripped. Now all agents (Gemini,
Codex, Claude) get sensitive env vars removed via scrubEnv.
Dual SQLite/Postgres schema for the tasks table. Stores task ID, prompt,
agent kind, status, timestamps, output, and error.
Insert, SetCompleted, SetFailed, Get, List, CountRunning, DeleteOlderThan.
Get returns (nil, nil) for not-found. List orders by started_at DESC.
Tests: InsertAndGet, SetCompleted, SetFailed, GetNotFound, List ordering,
CountRunning, DeleteOlderThan, Migrate idempotent.
Cleanup deletes completed/failed tasks older than 7 days. Called on
tracker construction (best-effort, errors logged but not propagated).
Adds TasksConfig struct, TasksDataDSN() method (default ~/.obk/tasks/data.db),
and wires into Default() and applyDefaults().
TaskTracker now optionally writes to a database. Start/Complete/Fail
write-through to DB. Get falls through to DB for cross-session lookup.
List reads from DB when available. DB errors are logged, never block.
Tests cross-session Get/List via DB fallthrough and write-through
verification (Start writes to DB, Complete updates DB).
Adds openTaskTracker helper that opens tasks DB and creates a persistent
tracker. Falls back to in-memory on DB error. Defers tracker.Close().
Tracker lives for lifetime of SessionManager, persisting tasks across
session restarts. Falls back to in-memory on DB error.
DelegateTaskConfig now accepts AuditLogger. Async task completion/failure
is logged with context="delegated". Scheduled task worker records results
in tasks DB when TasksDB is set.
Moves model pricing data from internal/cli/usage.go to provider package.
Adds EstimateCost function with prefix-matching for versioned model names.
Replaces local modelPricing map and estimateCost with thin wrapper
around provider.EstimateCost. Removes duplicate pricing data.
BudgetTracker wraps UsageRecorder, accumulates cost per LLM call, and
returns error when max budget is exceeded. 0 means unlimited.
After each LLM call, checks the budget. On exceeded, returns partial
text and error for graceful degradation.
Add model_tier and max_budget_usd to INSERT/SELECT/scan operations.
Remove unused alterBudget const from schema.go.
Add ModelTier and MaxBudgetUSD to ScheduledTaskArgs. Use Router.Resolve
to pick model by tier (default "fast"). Wire BudgetTracker when budget > 0.
Add model_tier and max_budget_usd params to CreateScheduleTool schema.
Default tier to "fast". Show tier/budget in list output. Pass both
fields through ScheduledTaskArgs in daemon/scheduler.go.
Add trigger_source, trigger_query, last_trigger_id to INSERT/SELECT/scan.
Add ListEnabledReactive and UpdateLastTriggerID functions.
Pass SyncNotifier through daemon startup to applenotes, imessage,
whatsapp, and gmail sync. Each notifies on successful sync. Scheduler
receives the notifier for reactive trigger evaluation.
Listen on SyncNotifier channel for sync completion signals. On each
signal, load reactive schedules for that source, evaluate trigger
queries against source DB, enqueue matching tasks with augmented
prompt, and update watermark to prevent re-firing.
Add reactive to type enum with trigger_source and trigger_query params.
Validate source and query on creation. Show reactive info in list output.
Add TestSpec_ReactiveEmailTrigger (full pipeline: seed emails, create
reactive schedule, trigger check, River job, LLM, pusher, watermark).
Add TestSpec_ReactiveNoMatchDoesNotFire (negative test).
Add CheckReactiveTriggersForTest helper to daemon/scheduler.go.
Add "tasks" to createSourceDirs in local_fixture.go.
Cover ListEnabledReactive, UpdateLastTriggerID, MarkCompleted,
tasks.Cleanup, and BudgetTracker.Total to close coverage gaps.
…ent SQL injection

The denylist approach was bypassable (missing UNION, SELECT, ATTACH, etc.).
Now uses an allowlist of permitted column names per source and SQL keywords.
String literals are stripped before checking to avoid false positives.
WhatsApp uses streaming sync (Follow: true) that blocks until context
cancellation. The notifier was only firing on exit, so reactive triggers
for WhatsApp data never evaluated during normal operation.
The identical openTaskTracker function existed in both internal/cli/chat.go
and channel/telegram/session.go. Moved DB-opening logic to tools package.
The %v formatting of map[string]string produced unreadable Go syntax.
Now formats as sorted key-value pairs (e.g. "from_addr: x | subject: y").
Auth and context-window errors are already cancelled via river.JobCancel
in Work(), so NextRetry is never called for those. Removed unreachable
switch cases and updated tests accordingly.
Connection strings (DATABASE_URL, REDIS_URL, MONGODB_URI, etc.) and
private keys were not caught by the existing patterns.
Verifies the agent loop stops on budget exceeded, returns partial text
with the error, and passes through normally when under budget.
@priyanshujain priyanshujain merged commit b656cf2 into master Mar 20, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant