Production operations guide for the Akashi decision trace server.
For lifecycle-specific retention, archival, and reconciliation procedures, see:
docs/operations/data-lifecycle.md.
GET /health
No authentication required.
{
  "data": {
    "status": "healthy",
    "version": "1.0.0",
    "postgres": "connected",
    "qdrant": "connected",
    "buffer_depth": 0,
    "buffer_status": "ok",
    "sse_broker": "running",
    "uptime_seconds": 86400
  },
  "meta": {
    "request_id": "9a4c58db-8d9f-4cad-9ec8-c9476e4af9a6",
    "timestamp": "2026-02-14T04:21:00Z"
  }
}

| Field (under data) | Healthy Value | Unhealthy Value |
|---|---|---|
| status | "healthy" | "unhealthy" |
| postgres | "connected" | "disconnected" |
| qdrant | "connected" | "disconnected" |
| buffer_status | "ok" | "high" / "critical" |
HTTP status is 200 when healthy, 503 when unhealthy. The endpoint returns 503 if and only if PostgreSQL is unreachable. Qdrant being down does NOT cause a 503 -- the system degrades to text search.
The qdrant field is omitted entirely when Qdrant is not configured (no QDRANT_URL).
The sse_broker field is omitted when SSE/NOTIFY is disabled.
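Because a Qdrant outage degrades search without failing the health check, a monitor that only looks at the HTTP status code will miss it. The sketch below (assuming `jq` is available; the sample payload stands in for a live `curl` against `/health`) classifies a health response into ok / degraded / down:

```shell
# Classify a /health payload. "down" corresponds to the 503 case (PostgreSQL
# unreachable); "degraded" means Qdrant is disconnected but the API is up.
classify_health() {
  jq -r 'if .data.status != "healthy" then "down"
         elif .data.qdrant == "disconnected" then "degraded"
         else "ok" end'
}

# Live usage would be: curl -s http://localhost:8080/health | classify_health
echo '{"data":{"status":"healthy","postgres":"connected","qdrant":"disconnected"}}' \
  | classify_health   # prints "degraded"
```

When the qdrant field is omitted entirely (Qdrant not configured), the check falls through to "ok".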
# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3

# Kubernetes readiness probe
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 2

For AWS ALB/NLB target groups, use /health with expected status 200.
Metrics are exported via OTLP/HTTP to the endpoint specified by OTEL_EXPORTER_OTLP_ENDPOINT. The metric reader flushes every 15 seconds. Traces are batched every 5 seconds. If the endpoint is not set, OTEL is disabled (no-op providers).
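A minimal export configuration might look like this (the collector address is an illustrative assumption, not an Akashi default):

```shell
# Point the OTLP/HTTP exporter at a collector (address is an example).
export OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318

# Leaving the variable unset disables OTEL entirely (no-op providers):
# unset OTEL_EXPORTER_OTLP_ENDPOINT
```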
| Metric | Type | Unit | Labels |
|---|---|---|---|
| http.server.request_count | Counter | 1 | http.method, http.route, http.status_code, akashi.agent_id |
| http.server.duration | Histogram | ms | http.method, http.route, http.status_code, akashi.agent_id |
| akashi.buffer.depth | Gauge | 1 | (none) |
| akashi.buffer.dropped_total | Gauge | 1 | (none; ingress rejections due to capacity or shutdown drain) |
| akashi.embedding.duration | Histogram | ms | (none) |
| akashi.search.duration | Histogram | ms | (none) |
| akashi.outbox.depth | Gauge | 1 | (none; via pg_class.reltuples estimate) |
Trace spans include http.method, http.url, http.request_id, http.status_code, akashi.agent_id, and akashi.role.
| Condition | Query / Check | Suggested Threshold | Severity |
|---|---|---|---|
| Request latency p99 | histogram_quantile(0.99, http.server.duration) | > 2000 ms for 5 min | Warning |
| Request latency p99 | histogram_quantile(0.99, http.server.duration) | > 5000 ms for 5 min | Critical |
| 5xx error rate | rate(http.server.request_count{http.status_code=~"5.."}) | > 1% of total for 5 min | Warning |
| 5xx error rate | rate(http.server.request_count{http.status_code=~"5.."}) | > 5% of total for 2 min | Critical |
| Health endpoint down | GET /health returns non-200 | 3 consecutive failures | Critical |
| Outbox lag (stuck entries) | SELECT count(*) FROM search_outbox WHERE attempts > 0 | > 100 entries for 10 min | Warning |
| Outbox dead letters | SELECT count(*) FROM search_outbox WHERE attempts >= 10 | > 0 | Critical |
| Event ingestion rejected | akashi.buffer.dropped_total increasing OR log line "trace: buffer at capacity" | Any occurrence | Critical |
| PostgreSQL pool exhaustion | pgxpool metrics or connection wait time | > 80% utilization | Warning |
| Qdrant health | /health response qdrant: "disconnected" | Sustained > 5 min | Warning |
| Rate limit 429s | rate(http.server.request_count{http.status_code="429"}) | > 10/s sustained | Warning |
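Outside Kubernetes, the "Health endpoint down" condition (3 consecutive failures) can be approximated with a small probe. This is a sketch: the check command, threshold, and the absence of an inter-attempt delay are placeholders to adapt, not Akashi tooling.

```shell
# Report an alert only after N consecutive failures of a check command.
# A real probe would also sleep between attempts (periodSeconds equivalent).
consecutive_failures() {
  local threshold=$1; shift
  local fails=0
  for _ in $(seq "$threshold"); do
    if "$@"; then
      echo "ok"; return 0          # any success resets the alert condition
    fi
    fails=$((fails + 1))
  done
  echo "ALERT: $fails consecutive failures"
  return 1
}

# Example (assumes a local instance):
#   consecutive_failures 3 curl -sf http://localhost:8080/health
```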
Akashi logs JSON to stdout. Key log messages to monitor:
| Log Message (substring) | Meaning |
|---|---|
| "trace: flush failed" | Event buffer failed to write to PostgreSQL |
| "trace: buffer at capacity" | Event ingestion backpressure engaged (request rejected) |
| "trace: buffer is draining" | Node is shutting down; new event ingestion rejected |
| "search outbox: dead-letter entry" | Outbox entry exceeded 10 retry attempts |
| "search outbox: qdrant upsert" + error | Qdrant write failure (entries will retry) |
| "storage: notify reconnect attempt failed" | LISTEN/NOTIFY connection dropped; attempting recovery |
| "conflict refresh failed" | Obsolete: conflicts are event-driven; RefreshConflicts is now a no-op |
| "rate limiter error, permitting request" | Limiter malfunction; request allowed (fail-open) |
| "rate limit exceeded" | Agent hit rate limit; request rejected with 429 |
Symptoms: /health shows qdrant: "disconnected". POST /v1/search returns degraded results (text fallback). Outbox entries accumulate.
Impact: Semantic (vector) search unavailable. Text-based search still works. No data loss -- new decisions continue to be written to PostgreSQL and queued in the search_outbox table.
Remediation:
- Restore Qdrant.
- Outbox worker will automatically sync accumulated entries on next poll cycle.
- Verify: `SELECT count(*) FROM search_outbox WHERE attempts < 10;` should trend to 0.
No operator intervention required once Qdrant is restored.
Symptoms: /health returns 503. All API requests fail with 500.
Impact: Complete service outage. No queries or writes succeed.
Remediation:
- Restore PgBouncer.
- If PgBouncer cannot be restored quickly, update `DATABASE_URL` to point directly to PostgreSQL and restart Akashi. Be aware that direct connections bypass pooling -- monitor connection count.
Symptoms: SSE subscriptions (GET /v1/subscribe) stop receiving updates. Log lines: "storage: notify reconnect attempt failed" followed by "storage: notify connection restored" on success.
Impact: Real-time event streaming paused. All other functionality (API, ingestion, search) is unaffected.
Automatic recovery: The connection reconnects with exponential backoff (500ms base, doubling, up to 5 attempts with jitter). All previously subscribed channels (akashi_decisions, akashi_conflicts) are re-established on reconnect.
Remediation (if auto-reconnect fails after 5 attempts):
- Check that the `NOTIFY_URL` PostgreSQL instance is reachable.
- Restart the Akashi process to re-establish the connection.
Symptoms: POST /v1/trace and POST /v1/runs/{run_id}/events return errors. Log line: "trace: buffer at capacity".
Impact: New event ingestion is rejected (backpressure). Decisions and queries are unaffected.
Hard cap: 100,000 events in memory regardless of AKASHI_EVENT_BUFFER_SIZE.
Remediation:
- Check if PostgreSQL is accepting writes -- the buffer cannot flush if the database is down.
- Check for log line `"trace: flush failed"` to identify the underlying cause.
- If the load is legitimate, increase `AKASHI_EVENT_BUFFER_SIZE` (up to 100,000) and restart.
- Requests rejected at capacity increment `akashi.buffer.dropped_total`; clients must retry to avoid event loss.
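Since rejected ingestion requests are not queued server-side, client retry logic is what prevents event loss. A minimal retry-with-exponential-backoff wrapper might look like this (the send command shown in the comment is illustrative, not an Akashi client):

```shell
# Retry a command with exponential backoff: 1s, 2s, 4s, ...
retry_send() {
  local max=$1; shift
  local delay=1
  for _ in $(seq "$max"); do
    if "$@"; then
      return 0                     # accepted
    fi
    sleep "$delay"
    delay=$((delay * 2))
  done
  echo "send failed after $max attempts" >&2
  return 1
}

# Illustrative usage:
#   retry_send 5 curl -sf -X POST http://localhost:8080/v1/trace -d @event.json
```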
Symptoms: POST /v1/trace, POST /v1/runs, or POST /v1/runs/{run_id}/events returns 409 CONFLICT with a message about idempotency key mismatch or request already in progress.
Impact:
- No duplicate write is committed.
- A conflicting request is rejected until key/payload consistency is restored.
Common causes:
- Same `Idempotency-Key` reused for a different payload
- Client retries while the original request is still processing
- A very long-running request exceeded the in-progress reclaim window
Remediation:
- Verify retries use the same payload bytes for the same key.
- For "already in progress", use exponential backoff and retry.
- Stale in-progress keys are cleared by the background cleanup job (`AKASHI_IDEMPOTENCY_ABANDONED_TTL`).
- Ensure key generation is unique per logical write operation.
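One way to satisfy both rules at once (a sketch, not something Akashi prescribes) is to derive the key deterministically from a stable identifier for the logical write, so every retry of the same operation naturally reuses the same key:

```shell
# Derive an idempotency key from a stable logical-operation identifier.
idem_key() {
  printf '%s' "$1" | sha256sum | awk '{print $1}'
}

KEY=$(idem_key "run-42:event-7")   # same logical write -> same key on retry

# Illustrative usage:
#   curl -X POST http://localhost:8080/v1/trace \
#     -H "Idempotency-Key: $KEY" -d @event.json
```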
Symptoms: SELECT count(*) FROM search_outbox WHERE attempts >= 10; returns non-zero. Log line: "search outbox: dead-letter entry".
Impact: Those specific decisions are not indexed in Qdrant. They exist in PostgreSQL and are queryable via SQL, but not via semantic search.
Common causes:
- Embedding dimension mismatch between Akashi config and Qdrant collection
- Qdrant collection deleted or renamed
- Persistent Qdrant connectivity issues
Remediation:
- Inspect the error:
SELECT id, decision_id, operation, attempts, last_error, created_at FROM search_outbox WHERE attempts >= 10 ORDER BY created_at DESC LIMIT 20;
- Fix the underlying issue (restore collection, fix dimensions, etc.).
- Reset attempts to allow retry:
UPDATE search_outbox SET attempts = 0, locked_until = NULL WHERE attempts >= 10;
- The outbox worker will pick them up on the next poll cycle.
Automatic cleanup: Dead-letter entries older than 7 days are archived to search_outbox_dead_letters, then removed from search_outbox (checked hourly).
Symptoms: Agents receiving 429 Too Many Requests responses. Log line: "rate limit exceeded".
Default limits: 100 requests/second sustained, 200 burst per agent per org.
Tuning:
AKASHI_RATE_LIMIT_RPS=200    # Double the sustained rate
AKASHI_RATE_LIMIT_BURST=500  # Allow larger bursts

Disable entirely (not recommended for production):

AKASHI_RATE_LIMIT_ENABLED=false

Platform admins are exempt from rate limiting. If a specific agent needs higher limits, either raise the global limit or promote the agent to platform_admin.
When Akashi runs behind a load balancer, set AKASHI_TRUST_PROXY=true so IP-based rate limits use the client IP from X-Forwarded-For instead of the proxy's address. Only enable when behind a trusted reverse proxy.
Symptoms: Log line "embedding: ... error". Decisions stored with embedding = NULL. Semantic search returns fewer results than expected.
Common causes:
- `AKASHI_EMBEDDING_PROVIDER=openai` but `OPENAI_API_KEY` is unset or invalid
- Ollama is down or unreachable (check `OLLAMA_URL` if using Ollama)
- Embedding dimension mismatch between `AKASHI_EMBEDDING_DIMENSIONS` and model output
Recovery: Fix the provider, then restart the server. The startup backfill job will embed any decisions that have embedding IS NULL.
Akashi uses Ed25519 (EdDSA) for JWT signing. Keys are loaded from PEM files at startup.
openssl genpkey -algorithm Ed25519 -out akashi-private.pem
openssl pkey -in akashi-private.pem -pubout -out akashi-public.pem
chmod 600 akashi-private.pem akashi-public.pem

Key files must have permissions 0600 or stricter. The server refuses to start if they are world-readable.
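Before deploying a new pair it is worth confirming the public key really was derived from the private key; a mismatched pair would make every issued token fail validation. A sketch using openssl (regenerating a throwaway pair for the demonstration):

```shell
# Generate a pair, then verify the deployed public key matches the one
# derivable from the private key, byte for byte.
openssl genpkey -algorithm Ed25519 -out akashi-private.pem
openssl pkey -in akashi-private.pem -pubout -out akashi-public.pem
chmod 600 akashi-private.pem akashi-public.pem

if openssl pkey -in akashi-private.pem -pubout | diff -q - akashi-public.pem >/dev/null; then
  echo "key pair matches"
else
  echo "MISMATCH: public key not derived from this private key" >&2
fi
```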
- Generate new key pair (see above).
- Place the files where the server can read them.
- Update environment variables:
AKASHI_JWT_PRIVATE_KEY=/path/to/new/akashi-private.pem AKASHI_JWT_PUBLIC_KEY=/path/to/new/akashi-public.pem
- Restart the Akashi process.
- Default token lifetime: 24 hours (`AKASHI_JWT_EXPIRATION`).
- After rotation, existing tokens signed with the old key fail validation immediately, because the server only holds one public key in memory.
- There is no token revocation list. To force all sessions to re-authenticate, rotate keys and restart.
- If you need zero-downtime rotation, coordinate with clients to re-authenticate within the restart window.
If AKASHI_JWT_PRIVATE_KEY and AKASHI_JWT_PUBLIC_KEY are both unset, the server generates an ephemeral key pair in memory. Tokens are invalidated on every restart. Never use this in production -- a warning is logged at startup.
Migrations are managed by Atlas. Files live in migrations/ as sequential numbered SQL files.
# Apply pending migrations
atlas migrate apply --dir file://migrations --url "$DATABASE_URL"
# Validate migration integrity (checksums)
atlas migrate validate --dir file://migrations
# Rehash after modifying migration files
atlas migrate hash --dir file://migrations

On startup: by default, the server applies migrations from the embedded migrations package (built into the binary). Migration failure is fatal -- the server will not start.
If you run Atlas externally in production, set:
AKASHI_SKIP_EMBEDDED_MIGRATIONS=true

This avoids startup migration races and keeps migration ownership with Atlas.
Standard pg_dump works. Key tables by priority:
| Table | Notes |
|---|---|
| organizations | Tenant configuration. Small. Always back up. |
| agents | Auth identities. Small. Always back up. |
| agent_runs | Trace run metadata. Moderate size. |
| agent_events | Append-only event log. Potentially very large. Consider partial or time-bounded backup. |
| decisions | Core decision data with embeddings. Can be large. |
| alternatives | Decision options. Required for complete decision reconstruction. |
| evidence | Decision evidence and citations. |
| access_grants | RBAC grants. Small. Always back up. |
| scored_conflicts | Conflict graph data used by conflict APIs. |
| integrity_proofs | Merkle batch proofs for tamper/audit verification. |
| idempotency_keys | Replay safety records for write APIs. |
| schema_migrations | Migration version tracking. |
| search_outbox | Pending sync queue for Qdrant. |
| search_outbox_dead_letters | Archived failed outbox entries (paper trail). |
| deletion_audit_log | Archived deleted records from destructive admin operations. |
| mutation_audit_log | Append-only ledger for API mutation paper trail. |
# Full backup
pg_dump "$DATABASE_URL" -Fc -f akashi-backup-$(date +%Y%m%d).dump

# Data-only backup of core tables (skip events)
pg_dump "$DATABASE_URL" -Fc --table=organizations --table=agents \
  --table=decisions --table=agent_runs --table=access_grants \
  -f akashi-core-$(date +%Y%m%d).dump

- Stop all Akashi instances so no new writes arrive during restore.
- Restore PostgreSQL from a known-good dump:
  pg_restore --clean --if-exists --no-owner --no-privileges \
    -d "$DATABASE_URL" akashi-backup-YYYYMMDD.dump
- Start Akashi and verify health:
  curl -sf http://localhost:8080/health | jq .data
- Run post-restore checks:
  SELECT count(*) FROM decisions WHERE valid_to IS NULL;
  SELECT count(*) FROM agent_runs;
  SELECT count(*) FROM agent_events;
  SELECT count(*) FROM search_outbox WHERE attempts < 10;
- If Qdrant index state is stale or missing, repopulate the outbox from current decisions, then allow worker replay:
  INSERT INTO search_outbox (decision_id, org_id, operation)
  SELECT id, org_id, 'upsert'
  FROM decisions
  WHERE valid_to IS NULL AND embedding IS NOT NULL
  ON CONFLICT (decision_id, operation)
  DO UPDATE SET created_at = now(), attempts = 0, locked_until = NULL;
PostgreSQL is the source of truth. search_outbox is transient and cannot by itself reconstruct all historical sync intent after restore.
Automated verification helper:
# Verify restore invariants and table integrity checks
DATABASE_URL=postgres://... make verify-restore

# Optionally repopulate outbox from current decisions during drill recovery
DATABASE_URL=postgres://... REBUILD_OUTBOX=true make verify-restore

-- Overall sync status
SELECT count(*) AS pending,
       count(*) FILTER (WHERE attempts > 0) AS retrying,
       count(*) FILTER (WHERE attempts >= 10) AS dead_letter,
       max(attempts) AS max_attempts,
       min(created_at) AS oldest_entry
FROM search_outbox;

-- Recent errors
SELECT decision_id, operation, attempts, last_error, created_at
FROM search_outbox
WHERE last_error IS NOT NULL
ORDER BY created_at DESC
LIMIT 10;

Use reconciliation to detect and repair drift between the PostgreSQL source-of-truth decisions and Qdrant indexed points.
# Detect drift (exit non-zero if mismatch exists)
DATABASE_URL=postgres://... QDRANT_URL=https://...:6333 make reconcile-qdrant
# Repair missing Qdrant points by queueing outbox upserts
DATABASE_URL=postgres://... QDRANT_URL=https://...:6333 make reconcile-qdrant-repair

reconcile-qdrant-repair only queues missing entries into search_outbox; it does not delete extra Qdrant points automatically.
Use a single verifier to evaluate durability/consistency gates with structured JSON output:
DATABASE_URL=postgres://... make verify-exit-criteria

Optional thresholds:

- `MAX_DEAD_LETTERS` (default `0`)
- `MAX_OUTBOX_OLDEST_SECONDS` (default `1800`)
- `STRICT_RETENTION_CHECK` (default `false`)
- `RETAIN_DAYS` (default `90`, only used when strict retention is enabled)
When QDRANT_URL is set, the verifier also checks Postgres/Qdrant drift by running reconciliation in read-only mode.
For protected branches (for example main), configure GitHub branch protection to require this status check before merge:
Verify Exit Criteria
Recommended minimum required checks:

- CI
- Build with UI
- Verify Exit Criteria
Use the admin-only endpoint:
DELETE /v1/agents/{agent_id}
Authorization: Bearer <admin-jwt>
X-Akashi-Org-Id: <org-uuid>

This performs a transactional delete of the agent and related records (runs, events, decisions, access grants), and clears supersedes links that point at deleted decisions.
The endpoint is disabled by default; set AKASHI_ENABLE_DESTRUCTIVE_DELETE=true to allow execution.
agent_events is a Timescale hypertable and can grow quickly. Use archive-before-purge to preserve paper trail while controlling storage.
# Preview one archival window (safe default, no purge)
DATABASE_URL=postgres://... make archive-events-dry-run
# Archive then purge one window (explicit destructive mode)
DATABASE_URL=postgres://... DRY_RUN=false ENABLE_PURGE=true make archive-events

Optional knobs:

- `RETAIN_DAYS` (default `90`) -- keep recent events in the primary hypertable
- `BATCH_DAYS` (default `1`) -- process one bounded time window per run to reduce lock pressure

Archive destination: `agent_events_archive` holds immutable historical rows moved out of the hot hypertable.
-- Active connections (run against PostgreSQL directly, not PgBouncer)
SELECT count(*) AS total,
count(*) FILTER (WHERE state = 'active') AS active,
count(*) FILTER (WHERE state = 'idle') AS idle
FROM pg_stat_activity
WHERE datname = 'akashi';

A single Akashi binary handles approximately 1,000 req/s on modest hardware (4 vCPU, 8 GB RAM). The event buffer and COPY-based batch writes amortize database round trips.
Run multiple Akashi instances behind a load balancer.
| Component | Scaling Behavior |
|---|---|
| HTTP API | Stateless. Any instance can serve any request. |
| Event buffer | Per-instance. Each instance flushes its own buffer to PostgreSQL. |
| Outbox worker | Per-instance. Uses FOR UPDATE SKIP LOCKED -- multiple workers safely share work. |
| LISTEN/NOTIFY | Per-instance. Each instance maintains its own direct PostgreSQL connection. |
| SSE broker | Per-instance. Clients receive events only from the instance they are connected to. |
| JWT validation | Stateless. All instances must have the same public key. |
- PostgreSQL is the primary bottleneck. Scale read replicas for query load. Consider connection pooling (PgBouncer) tuning.
- Qdrant for vector search at scale. Monitor query latency via `http.server.duration` on `/v1/search`.
SSE subscriptions are bound to the instance the client connects to. With multiple instances behind a load balancer, a client only receives events produced by its connected instance. For full coverage, clients should use polling (GET /v1/decisions/recent) or ensure sticky sessions.
See configuration.md for the full environment variable reference.
On SIGTERM or SIGINT, the server shuts down in this order:
1. HTTP server drains -- stops accepting new requests, completes in-flight (`AKASHI_SHUTDOWN_HTTP_TIMEOUT`)
2. Event buffer drains -- final flush to PostgreSQL (always indefinite, durability-first)
3. Outbox worker drains -- syncs remaining entries to Qdrant (`AKASHI_SHUTDOWN_OUTBOX_DRAIN_TIMEOUT`)
4. Database pools close -- PgBouncer pool + NOTIFY connection
5. OTEL flushes -- final trace/metric export
There is no single shared shutdown timeout. Each phase has its own timeout, and setting a timeout to 0 waits indefinitely.
- DO NOT send `kill -9` during shutdown. If the event buffer is mid-flush, events in memory will be lost.
- Buffer drain is durability-critical and runs without a timeout. Do not force-stop the process while draining.
- The outbox worker drain timeout (log: `"search outbox: drain timed out"`) means some outbox entries were not synced to Qdrant. They remain in PostgreSQL and will sync on the next startup.
- Ensure the load balancer has stopped sending traffic (remove the instance from the target group or mark it unhealthy).
- Send `SIGTERM`.
- Wait for exit (can be indefinite when drain timeouts are set to `0`).
- Verify clean shutdown: log line `"akashi stopped"` with exit code 0.
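The procedure can be rehearsed without a live deployment. The sketch below uses a stand-in background process that traps SIGTERM, "drains", and prints the same final log line; it only demonstrates the signal/wait/verify flow, not Akashi itself.

```shell
# Stand-in process: on SIGTERM it "drains" and reports a clean stop.
(
  trap 'echo "akashi stopped"; exit 0' TERM
  while :; do sleep 0.1; done
) > shutdown.log &
PID=$!

sleep 0.3            # let the stand-in start
kill -TERM "$PID"    # send SIGTERM (never kill -9)
wait "$PID"          # wait for exit; can be long while buffers drain
grep -q "akashi stopped" shutdown.log && echo "clean shutdown verified"
```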
# Is the server running?
curl -sf http://localhost:8080/health | jq .data.status
# Outbox sync status
psql "$DATABASE_URL" -c "
SELECT count(*) AS pending,
count(*) FILTER (WHERE attempts > 0) AS retrying,
count(*) FILTER (WHERE attempts >= 10) AS dead_letter
FROM search_outbox;
"
# Recent outbox errors
psql "$DATABASE_URL" -c "
SELECT decision_id, attempts, last_error, created_at
FROM search_outbox
WHERE last_error IS NOT NULL
ORDER BY created_at DESC LIMIT 5;
"
# Decision count per org (capacity check)
psql "$DATABASE_URL" -c "
SELECT o.name, count(d.id) AS decisions
FROM organizations o
LEFT JOIN decisions d ON d.org_id = o.id AND d.valid_to IS NULL
GROUP BY o.name
ORDER BY decisions DESC;
"
# Active PostgreSQL connections
psql "$NOTIFY_URL" -c "
SELECT count(*) AS total,
count(*) FILTER (WHERE state = 'active') AS active
FROM pg_stat_activity
WHERE datname = 'akashi';
"
# Reset dead-letter outbox entries after fixing root cause
psql "$DATABASE_URL" -c "
UPDATE search_outbox SET attempts = 0, locked_until = NULL WHERE attempts >= 10;
"

Akashi detects conflicts between agent decisions using an embedding-based scorer followed by an optional LLM validator. Two evaluation modes let you measure detection quality against ground truth.
| Term | Meaning |
|---|---|
| Scorer | Embedding similarity pipeline that flags candidate conflicts. Fast, cheap, always on. |
| Validator | LLM-based second pass that confirms or rejects scorer candidates. Slower, costs API tokens. |
| Ground truth label | Human judgment on a detected conflict: was it real? |
Every detected conflict (in scored_conflicts) can be labeled with one of three values:
| Label | Meaning | Scorer eval role |
|---|---|---|
| genuine | Real conflict -- the decisions truly contradict each other | True positive |
| related_not_contradicting | Same topic but not actually contradictory (e.g. paraphrases) | False positive |
| unrelated_false_positive | Different topics entirely -- should not have been flagged | False positive |
All label endpoints require admin authentication.
# Authenticate
TOKEN=$(curl -s http://localhost:8081/auth/token \
-d '{"agent_id":"admin","api_key":"ak_..."}' | jq -r .token)
# List detected conflicts to find IDs to label
curl -s http://localhost:8081/v1/admin/conflicts \
-H "Authorization: Bearer $TOKEN" | jq '.conflicts[:5]'
# Label a conflict as genuine
curl -X PUT http://localhost:8081/v1/admin/conflicts/{id}/label \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"label": "genuine", "notes": "clearly opposite caching strategies"}'
# Label a conflict as false positive
curl -X PUT http://localhost:8081/v1/admin/conflicts/{id}/label \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"label": "related_not_contradicting", "notes": "same decision, different wording"}'
# View a label
curl -s http://localhost:8081/v1/admin/conflicts/{id}/label \
-H "Authorization: Bearer $TOKEN" | jq .
# List all labels with counts
curl -s http://localhost:8081/v1/admin/conflict-labels \
-H "Authorization: Bearer $TOKEN" | jq .
# Delete a label (to re-label)
curl -X DELETE http://localhost:8081/v1/admin/conflicts/{id}/label \
-H "Authorization: Bearer $TOKEN"

Once you have labeled conflicts, compute scorer precision. This measures what fraction of the scorer's detections are genuine conflicts.
Precision = genuine / (genuine + related_not_contradicting + unrelated_false_positive)
Note: recall cannot be computed from labels alone because labels only cover detected conflicts. Measuring recall requires a separate dataset of known conflicts that should have been detected.
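As a worked instance of the formula -- for example, with 6 genuine labels, 1 related_not_contradicting, and 0 unrelated_false_positive:

```shell
# Precision = genuine / (genuine + related_not_contradicting + unrelated)
genuine=6; related=1; unrelated=0
awk -v tp="$genuine" -v fp="$((related + unrelated))" \
  'BEGIN { printf "Scorer Precision: %.1f%% (%d TP, %d FP, %d labeled)\n",
           100 * tp / (tp + fp), tp, fp, tp + fp }'
# -> Scorer Precision: 85.7% (6 TP, 1 FP, 7 labeled)
```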
# Via CLI (recommended)
export AKASHI_URL=http://localhost:8081
export AKASHI_AGENT_ID=admin
export AKASHI_API_KEY=ak_...
go run ./cmd/eval-conflicts --mode=scorer
# Save results to ./eval-results/
go run ./cmd/eval-conflicts --mode=scorer --save
# Via API directly
curl -X POST http://localhost:8081/v1/admin/scorer-eval \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{}' | jq .

Example output:
Scorer Precision: 85.7% (6 TP, 1 FP, 7 labeled)
The validator eval runs a hardcoded dataset of 27 decision pairs through the LLM validator and measures precision and recall of the LLM's conflict/no-conflict judgments.
export AKASHI_URL=http://localhost:8081
export AKASHI_AGENT_ID=admin
export AKASHI_API_KEY=ak_...
# Run validator eval (requires LLM API access — costs tokens)
go run ./cmd/eval-conflicts --mode=validator
# Save results
go run ./cmd/eval-conflicts --mode=validator --save

The validator eval requires the akashi server to have a working embedding/LLM provider configured.
The --save flag writes JSON results to ./eval-results/:
eval-results/
scorer_2026-03-07T14-30-00.json
validator_2026-03-07T14-35-00.json
This directory is gitignored. Results accumulate locally so you can track precision over time as you tune scorer thresholds or retrain embeddings.
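To trend precision across runs you can extract it from the saved files. Note the `precision` field name below is an assumption about the JSON shape -- inspect a real results file and adjust the `jq` path accordingly.

```shell
# Print one "file precision" line per saved scorer run.
show_precision_trend() {
  for f in eval-results/scorer_*.json; do
    [ -e "$f" ] || continue                 # no saved results yet
    printf '%s %s\n' "$f" "$(jq -r '.precision // "n/a"' "$f")"
  done
}
show_precision_trend
```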
A 300-pair synthetic dataset tests the scorer's embedding math in isolation (not real detection quality). This is gated behind an environment variable and requires a running TimescaleDB with testcontainers:
AKASHI_BENCH=1 go test -run TestScorerPrecisionRecall ./internal/conflicts/ -v

This is useful for verifying that threshold changes don't break the scorer's ability to distinguish orthogonal embeddings. It does not test real-world detection quality -- use the label-based eval for that.
| Variable | Default | Description |
|---|---|---|
| AKASHI_URL | http://localhost:8081 | Base URL of the akashi instance to evaluate |
| AKASHI_AGENT_ID | (required) | Agent ID for authentication |
| AKASHI_API_KEY | (required) | API key for admin authentication |
- Run your akashi instance locally on port 8081.
- Exercise the system -- trace decisions, let the scorer detect conflicts.
- Review detected conflicts via the UI or `GET /v1/admin/conflicts`.
- Label 20+ conflicts across all three categories for a meaningful precision measurement.
- Run `go run ./cmd/eval-conflicts --mode=scorer --save` to compute precision.
- Tune scorer thresholds (`AKASHI_CONFLICT_THRESHOLD`, early exit floor) and re-evaluate.
- Repeat after model or embedding changes to catch regressions.