Background agent improvements: retry, env scrub, tasks, budget, reactive triggers by priyanshujain · Pull Request #108 · 73ai/openbotkit

priyanshujain · 2026-03-20T11:02:29Z

Summary

Six improvements to background agent capabilities, implemented across 34 atomic commits:

Error-classified retry — Fix dead-code NextRetryAt → NextRetry bug in River worker. Classify errors (auth, rate-limit, transient) for adaptive retry timing. Cancel non-retryable jobs immediately with river.JobCancel.
Environment scrubbing — Strip sensitive env vars (API keys, tokens, secrets) from all external agent subprocesses (Gemini, Codex), not just Claude. Uses allowlist/blocklist pattern matching.
Persistent task tracking — New service/tasks/ package with SQLite/Postgres schema. TaskTracker gets write-through DB backing for cross-session task visibility. 7-day auto-cleanup.
Budget controls — BudgetTracker wraps UsageRecorder to enforce per-task cost limits. Schedules carry model_tier (default: fast) and max_budget_usd. Shared pricing extracted to provider/pricing.go.
Event-driven reactive triggers — New reactive schedule type. SQL WHERE clauses evaluated against synced data (gmail, whatsapp, imessage, applenotes) with watermark tracking. SyncNotifier signals trigger evaluation after each sync cycle.
Spec tests — End-to-end tests with real LLM: reactive email trigger pipeline and no-match negative test.

Review fixes (post-review)

Seven fixes from PR review:

P0 — SQL injection in trigger queries — Replaced denylist (DROP, DELETE, etc.) with allowlist of permitted column names per source + allowed SQL keywords. String literals stripped before validation. Blocks UNION SELECT, subqueries, ATTACH, PRAGMA, load_extension, unknown columns.
P0 — WhatsApp reactive triggers never fired — notifier.Notify("whatsapp") was only called on goroutine exit (streaming sync). Added periodic 30s ticker to match cadence of other sync sources.
P1 — Deduplicated openTaskTracker — Identical function in internal/cli/chat.go and channel/telegram/session.go extracted to tools.OpenPersistentTaskTracker.
P1 — Unreadable task row formatting — %v on map[string]string produced Go map syntax. Now formats as sorted key: value | key: value pairs.
P2 — Dead code in NextRetry — Auth/context-window error cases were unreachable (already cancelled in Work() via river.JobCancel). Removed.
P2 — Env scrub gaps — Added DATABASE_URL, REDIS_URL, MONGODB_URI, _DSN, _PRIVATE_KEY to sensitive patterns.
P2 — Missing budget test — Added TestLoop_BudgetExceeded and TestLoop_BudgetNotExceeded to verify agent loop stops on budget exceeded with partial text.

Test plan

go test ./service/scheduler/ — store CRUD, reactive queries, trigger validation (allowlist)
go test ./service/tasks/ — insert, complete, fail, list, cleanup
go test ./agent/ — budget tracker (incl. agent loop integration), persistent task tracker, env scrubbing
go test ./daemon/jobs/ — sync notifier, scheduled task retry classification
go test ./provider/ — shared pricing estimation
Spec tests pass with Gemini (OBK_TEST_PROVIDER=gemini go test ./spectest/ -run Reactive -v)

NextRetryAt was dead code — River's Worker interface expects NextRetry, so the custom 15-minute retry backoff was never active.

Auth/context-window errors delay retry by 1 year (effectively never). 429 rate limits retry after 30min, 5xx after 10min, default stays 15min.

Auth and context-window errors are now immediately cancelled instead of retrying. Uses job.MaxAttempts instead of hardcoded 2 for retry check.

Gives scheduled tasks one more retry attempt before permanent failure notification, working alongside the new error-classified retry backoff.

Adds scrubEnv function that strips API keys, tokens, secrets, and passwords from environment variables passed to external agent processes. Uses allowlist for safe vars and pattern matching for sensitive ones.

Previously only Claude had CLAUDECODE stripped. Now all agents (Gemini, Codex, Claude) get sensitive env vars removed via scrubEnv.

Dual SQLite/Postgres schema for the tasks table. Stores task ID, prompt, agent kind, status, timestamps, output, and error.

Insert, SetCompleted, SetFailed, Get, List, CountRunning, DeleteOlderThan. Get returns (nil, nil) for not-found. List orders by started_at DESC.

Tests: InsertAndGet, SetCompleted, SetFailed, GetNotFound, List ordering, CountRunning, DeleteOlderThan, Migrate idempotent.

Cleanup deletes completed/failed tasks older than 7 days. Called on tracker construction (best-effort, errors logged but not propagated).

Adds TasksConfig struct, TasksDataDSN() method (default ~/.obk/tasks/data.db), and wires into Default() and applyDefaults().

TaskTracker now optionally writes to a database. Start/Complete/Fail write-through to DB. Get falls through to DB for cross-session lookup. List reads from DB when available. DB errors are logged, never block.

Tests cross-session Get/List via DB fallthrough and write-through verification (Start writes to DB, Complete updates DB).

Adds openTaskTracker helper that opens tasks DB and creates a persistent tracker. Falls back to in-memory on DB error. Defers tracker.Close().

Tracker lives for lifetime of SessionManager, persisting tasks across session restarts. Falls back to in-memory on DB error.

DelegateTaskConfig now accepts AuditLogger. Async task completion/failure is logged with context="delegated". Scheduled task worker records results in tasks DB when TasksDB is set.

Moves model pricing data from internal/cli/usage.go to provider package. Adds EstimateCost function with prefix-matching for versioned model names.

Replaces local modelPricing map and estimateCost with thin wrapper around provider.EstimateCost. Removes duplicate pricing data.

BudgetTracker wraps UsageRecorder, accumulates cost per LLM call, and returns error when max budget is exceeded. 0 means unlimited.

After each LLM call, checks the budget. On exceeded, returns partial text and error for graceful degradation.

Add model_tier and max_budget_usd to INSERT/SELECT/scan operations. Remove unused alterBudget const from schema.go.

Add ModelTier and MaxBudgetUSD to ScheduledTaskArgs. Use Router.Resolve to pick model by tier (default "fast"). Wire BudgetTracker when budget > 0.

Add model_tier and max_budget_usd params to CreateScheduleTool schema. Default tier to "fast". Show tier/budget in list output. Pass both fields through ScheduledTaskArgs in daemon/scheduler.go.

Add trigger_source, trigger_query, last_trigger_id to INSERT/SELECT/scan. Add ListEnabledReactive and UpdateLastTriggerID functions.

Pass SyncNotifier through daemon startup to applenotes, imessage, whatsapp, and gmail sync. Each notifies on successful sync. Scheduler receives the notifier for reactive trigger evaluation.

Listen on SyncNotifier channel for sync completion signals. On each signal, load reactive schedules for that source, evaluate trigger queries against source DB, enqueue matching tasks with augmented prompt, and update watermark to prevent re-firing.

Add reactive to type enum with trigger_source and trigger_query params. Validate source and query on creation. Show reactive info in list output.

Add TestSpec_ReactiveEmailTrigger (full pipeline: seed emails, create reactive schedule, trigger check, River job, LLM, pusher, watermark). Add TestSpec_ReactiveNoMatchDoesNotFire (negative test). Add CheckReactiveTriggersForTest helper to daemon/scheduler.go. Add "tasks" to createSourceDirs in local_fixture.go.

Cover ListEnabledReactive, UpdateLastTriggerID, MarkCompleted, tasks.Cleanup, and BudgetTracker.Total to close coverage gaps.

…ent SQL injection The denylist approach was bypassable (missing UNION, SELECT, ATTACH, etc.). Now uses an allowlist of permitted column names per source and SQL keywords. String literals are stripped before checking to avoid false positives.

WhatsApp uses streaming sync (Follow: true) that blocks until context cancellation. The notifier was only firing on exit, so reactive triggers for WhatsApp data never evaluated during normal operation.

The identical openTaskTracker function existed in both internal/cli/chat.go and channel/telegram/session.go. Moved DB-opening logic to tools package.

The %v formatting of map[string]string produced unreadable Go syntax. Now formats as sorted key-value pairs (e.g. "from_addr: x | subject: y").

Auth and context-window errors are already cancelled via river.JobCancel in Work(), so NextRetry is never called for those. Removed unreachable switch cases and updated tests accordingly.

Connection strings (DATABASE_URL, REDIS_URL, MONGODB_URI, etc.) and private keys were not caught by the existing patterns.

Verifies the agent loop stops on budget exceeded, returns partial text with the error, and passes through normally when under budget.

priyanshujain added 30 commits March 20, 2026 17:18

fix(jobs): rename NextRetryAt to NextRetry to match River interface

aa7f075

NextRetryAt was dead code — River's Worker interface expects NextRetry, so the custom 15-minute retry backoff was never active.

feat(jobs): classify errors for adaptive retry timing

03f2dff

Auth/context-window errors delay retry by 1 year (effectively never). 429 rate limits retry after 30min, 5xx after 10min, default stays 15min.

feat(jobs): cancel non-retryable jobs with river.JobCancel

161ac8c

Auth and context-window errors are now immediately cancelled instead of retrying. Uses job.MaxAttempts instead of hardcoded 2 for retry check.

feat(scheduler): increase MaxAttempts to 3

a9c0dcb

Gives scheduled tasks one more retry attempt before permanent failure notification, working alongside the new error-classified retry backoff.

feat(tools): add scrubEnv to remove sensitive env vars

09185b2

Adds scrubEnv function that strips API keys, tokens, secrets, and passwords from environment variables passed to external agent processes. Uses allowlist for safe vars and pattern matching for sensitive ones.

fix(tools): scrub env for all external agent runners

a75d25b

Previously only Claude had CLAUDECODE stripped. Now all agents (Gemini, Codex, Claude) get sensitive env vars removed via scrubEnv.

feat(tasks): add schema and migration

9b2aa73

Dual SQLite/Postgres schema for the tasks table. Stores task ID, prompt, agent kind, status, timestamps, output, and error.

feat(tasks): add CRUD store functions

da1f86d

Insert, SetCompleted, SetFailed, Get, List, CountRunning, DeleteOlderThan. Get returns (nil, nil) for not-found. List orders by started_at DESC.

test(tasks): add store tests

4a3bdbd

Tests: InsertAndGet, SetCompleted, SetFailed, GetNotFound, List ordering, CountRunning, DeleteOlderThan, Migrate idempotent.

feat(tasks): add cleanup with 7-day retention

70d501c

Cleanup deletes completed/failed tasks older than 7 days. Called on tracker construction (best-effort, errors logged but not propagated).

feat(config): add tasks storage config

be06c19

Adds TasksConfig struct, TasksDataDSN() method (default ~/.obk/tasks/data.db), and wires into Default() and applyDefaults().

refactor(tools): add NewPersistentTaskTracker with DB backing

5541116

TaskTracker now optionally writes to a database. Start/Complete/Fail write-through to DB. Get falls through to DB for cross-session lookup. List reads from DB when available. DB errors are logged, never block.

test(tools): add persistent task tracker tests

3973e15

Tests cross-session Get/List via DB fallthrough and write-through verification (Start writes to DB, Complete updates DB).

feat(cli): wire persistent task tracker

7c08f05

Adds openTaskTracker helper that opens tasks DB and creates a persistent tracker. Falls back to in-memory on DB error. Defers tracker.Close().

feat(telegram): wire persistent task tracker

352209d

Tracker lives for lifetime of SessionManager, persisting tasks across session restarts. Falls back to in-memory on DB error.

feat(delegate): add audit logging with context=delegated

b331f30

DelegateTaskConfig now accepts AuditLogger. Async task completion/failure is logged with context="delegated". Scheduled task worker records results in tasks DB when TasksDB is set.

feat(provider): extract shared pricing map

736de9f

Moves model pricing data from internal/cli/usage.go to provider package. Adds EstimateCost function with prefix-matching for versioned model names.

refactor(cli): use shared pricing from provider package

09023f0

Replaces local modelPricing map and estimateCost with thin wrapper around provider.EstimateCost. Removes duplicate pricing data.

feat(agent): add BudgetTracker and BudgetChecker

25456cf

BudgetTracker wraps UsageRecorder, accumulates cost per LLM call, and returns error when max budget is exceeded. 0 means unlimited.

feat(agent): add WithBudgetChecker option to agent loop

28382fc

After each LLM call, checks the budget. On exceeded, returns partial text and error for graceful degradation.

feat(scheduler): update store for budget columns

c506ae4

Add model_tier and max_budget_usd to INSERT/SELECT/scan operations. Remove unused alterBudget const from schema.go.

feat(provider): export Router.Resolve method

f9c613f

feat(jobs): use model tier and budget in scheduled task worker

13eaa0c

Add ModelTier and MaxBudgetUSD to ScheduledTaskArgs. Use Router.Resolve to pick model by tier (default "fast"). Wire BudgetTracker when budget > 0.

feat(tools): add model_tier and max_budget_usd to schedule tools

baa74f6

Add model_tier and max_budget_usd params to CreateScheduleTool schema. Default tier to "fast". Show tier/budget in list output. Pass both fields through ScheduledTaskArgs in daemon/scheduler.go.

feat(scheduler): add reactive type and trigger fields to types

265be6e

feat(scheduler): add trigger columns to schema

ad973e1

feat(scheduler): update store for trigger columns

d042624

Add trigger_source, trigger_query, last_trigger_id to INSERT/SELECT/scan. Add ListEnabledReactive and UpdateLastTriggerID functions.

feat(scheduler): add trigger validation and query builder

ec00309

test(scheduler): add trigger tests

591a599

feat(daemon): add SyncNotifier for sync completion signals

3bc608d

priyanshujain added 12 commits March 20, 2026 17:44

feat(daemon): wire sync notifier into all sync goroutines

23107c7

Pass SyncNotifier through daemon startup to applenotes, imessage, whatsapp, and gmail sync. Each notifies on successful sync. Scheduler receives the notifier for reactive trigger evaluation.

feat(tools): add reactive type to schedule tools

ddfdd2b

Add reactive to type enum with trigger_source and trigger_query params. Validate source and query on creation. Show reactive info in list output.

test(coverage): add tests for reactive store, cleanup, and budget Total

b02055c

Cover ListEnabledReactive, UpdateLastTriggerID, MarkCompleted, tasks.Cleanup, and BudgetTracker.Total to close coverage gaps.

fix(daemon): notify reactive triggers periodically during WhatsApp sync

c63f1ee

WhatsApp uses streaming sync (Follow: true) that blocks until context cancellation. The notifier was only firing on exit, so reactive triggers for WhatsApp data never evaluated during normal operation.

refactor(tools): extract OpenPersistentTaskTracker to deduplicate

6b68654

The identical openTaskTracker function existed in both internal/cli/chat.go and channel/telegram/session.go. Moved DB-opening logic to tools package.

fix(daemon): format reactive trigger rows as key-value pairs

59c1515

The %v formatting of map[string]string produced unreadable Go syntax. Now formats as sorted key-value pairs (e.g. "from_addr: x | subject: y").

fix(jobs): remove dead code in NextRetry for cancelled error types

0820ad2

Auth and context-window errors are already cancelled via river.JobCancel in Work(), so NextRetry is never called for those. Removed unreachable switch cases and updated tests accordingly.

fix(tools): expand env scrub to catch DATABASE_URL, DSN, PRIVATE_KEY

8c8b5c1

Connection strings (DATABASE_URL, REDIS_URL, MONGODB_URI, etc.) and private keys were not caught by the existing patterns.

test(agent): add tests for budget-exceeded in agent loop

ee84c06

Verifies the agent loop stops on budget exceeded, returns partial text with the error, and passes through normally when under budget.

priyanshujain merged commit b656cf2 into master Mar 20, 2026
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Background agent improvements: retry, env scrub, tasks, budget, reactive triggers#108

Background agent improvements: retry, env scrub, tasks, budget, reactive triggers#108
priyanshujain merged 42 commits intomasterfrom
background-agent-refactor

priyanshujain commented Mar 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

priyanshujain commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review fixes (post-review)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

priyanshujain commented Mar 20, 2026 •

edited

Loading