Akashi automatically detects when AI agents make contradictory decisions. This document covers the full conflict pipeline: how conflicts are found, how they're scored, and how to resolve them.
When a decision is recorded via POST /v1/trace, Akashi asynchronously:
- Retrieves candidates — finds semantically similar past decisions via Qdrant (or brute-force cosine similarity in local-lite mode).
- Scores significance — computes a composite score from topic similarity, outcome divergence, confidence, and temporal decay.
- Validates via LLM (optional) — classifies the relationship as contradiction, supersession, complementary, refinement, or unrelated.
- Persists conflicts — only contradictions and supersessions are stored.
- Notifies — fires a
LISTEN/NOTIFYevent on theakashi_conflictschannel.
Every candidate pair receives a significance score:
significance = topic_similarity × outcome_divergence × confidence_weight × temporal_decay
| Component | Formula | Meaning |
|---|---|---|
| Topic similarity | cosine(embedding_A, embedding_B) |
How related the decisions are |
| Outcome divergence | 1 - cosine(outcome_emb_A, outcome_emb_B) |
How much the conclusions differ |
| Confidence weight | sqrt(confidence_A × confidence_B) |
Penalizes low-confidence decisions |
| Temporal decay | exp(-lambda × days_between) |
Older conflicts matter less |
The default significance threshold is 0.30 — pairs below this are not persisted.
When two decisions have high topic similarity (≥ 0.70), Akashi also performs claim-level analysis. Each outcome is split into individual claims, and claim pairs are scored independently. If claim-level scoring produces a higher significance than full-outcome scoring, the claim-level result wins.
Claims are extracted using one of two strategies:
-
Regex extraction (default): Splits on markdown lists, sentence boundaries, numbered items, colons, and semicolons. Filters boilerplate phrases ("LGTM", "all tests pass", etc.) and deduplicates. Minimum claim length: 20 characters.
-
LLM extraction (opt-in via
AKASHI_CLAIM_EXTRACTION_LLM=true): Uses the configured conflict LLM (Ollama or OpenAI) to extract structured claims with categories:Category Participates in conflict scoring? findingYes assessmentYes recommendationNo (informational only) statusNo (informational only) LLM extraction produces higher-quality claims but adds latency and cost. Use it when precision matters more than throughput.
When claim-level scoring wins, the specific claim_text_a and claim_text_b that
produced the highest score are persisted on the conflict record. These appear in the UI
to help reviewers understand exactly which statements are in tension.
When an LLM validator is configured (Ollama or OpenAI), candidate pairs that pass the significance threshold are classified into relationship types:
| Relationship | Meaning | Creates conflict? |
|---|---|---|
contradiction |
Incompatible positions on the same question | Yes |
supersession |
One decision explicitly replaces the other | Yes |
complementary |
Different findings about different aspects | No |
refinement |
One deepens the other without contradicting | No |
unrelated |
Different topics despite surface similarity | No |
The validator also assigns:
- Category:
factual,assessment,strategic, ortemporal - Severity:
critical,high,medium, orlow
The LLM prompt includes contextual signals that reduce false positives:
- Cross-project: Decisions from different projects are flagged as almost certainly unrelated.
- Same session: Sequential decisions in the same session (e.g., review → fix) are hinted as likely refinement or complementary.
- Workflow pairs: Known cause-and-effect patterns (code_review → bug_fix, assessment → implementation) are flagged.
When AKASHI_CONFLICT_EARLY_EXIT_FLOOR > 0 (default: 0.25), candidates are sorted by
pre-computed significance in descending order. Candidates below the floor are skipped
unless they qualify for "direct-to-scorer" bypass (high topic similarity pairs that only
an LLM can properly classify).
An optional cross-encoder can pre-filter candidates before LLM validation, reducing LLM calls by 50–80%. Configure with:
| Variable | Default | Description |
|---|---|---|
AKASHI_CROSS_ENCODER_URL |
(empty) | Cross-encoder service endpoint. Empty = disabled. |
AKASHI_CROSS_ENCODER_THRESHOLD |
0.50 |
Minimum cross-encoder score to proceed to LLM. |
Pairs below the threshold are skipped without an LLM call. If the cross-encoder service is unreachable, the system falls open to LLM validation (no silent suppression).
Decisions linked by supersedes_id (intentional revisions) are excluded from conflict
detection. The scorer walks both forward and backward through the revision chain (up to
100 hops) to find all related decisions.
| Status | Meaning | Transitions to |
|---|---|---|
open |
Detected, awaiting action | acknowledged, resolved, wont_fix |
acknowledged |
Reviewed but no action taken | resolved, wont_fix |
resolved |
Resolved with declared winner | Terminal |
wont_fix |
Dismissed as not a real conflict | Terminal |
PATCH /v1/conflicts/{id}
{
"status": "resolved",
"resolution_note": "Team agreed on approach A",
"winning_decision_id": "uuid-of-winning-decision"
}POST /v1/conflicts/{id}/adjudicate
{
"outcome": "Chose microservices based on team capacity",
"reasoning": "Monolith would require rewrite of auth layer",
"winning_decision_id": "uuid-of-winning-decision"
}This atomically creates a new decision (the adjudication trace) and resolves the conflict. The adjudication itself becomes part of the audit trail.
Conflicts between the same agent pair, decision type, and conflict kind are grouped. Resolve all open/acknowledged conflicts in a group with one call:
PATCH /v1/conflict-groups/{id}/resolve
{
"status": "resolved",
"resolution_note": "Cleaning up stale conflicts after architecture review",
"winning_agent": "planner-agent"
}When winning_agent is provided, the winning decision for each conflict is automatically
derived from the agent columns.
When a decision is revised (superseded by a new version), all open conflicts involving the old decision are automatically resolved. This prevents stale conflicts from accumulating when agents correct themselves.
For unresolved conflicts, GET /v1/conflicts/{id} returns an optional recommendation
field computed from four heuristic signals:
| Signal | Weight | Logic |
|---|---|---|
| Confidence delta | 0.35 | Higher-confidence decision favored |
| Recency | 0.25 | More recent decision favored (tanh-normalized over 1 week) |
| Agent win rate | 0.25 | Agent's historical success rate on this decision type (requires ≥ 3 prior resolutions) |
| Revision depth | 0.15 | Deeper revision chain = more refined position (tanh-normalized at depth 3) |
Recommendations are computed on-demand (not stored) and only returned when the composite signal exceeds 0.10. Each recommendation includes human-readable reasons sorted by contribution magnitude.
The UI surfaces recommendations in the adjudicate dialog with a one-click "Accept recommendation" button.
GET /v1/conflicts/analytics
Returns aggregated conflict metrics over a time window:
- Summary: Total detected, total resolved, mean time to resolution, false-positive rate
- Breakdowns: By agent pair, decision type, severity
- Trend: Daily detected-vs-resolved counts
| Param | Values | Default | Description |
|---|---|---|---|
period |
7d, 30d, 90d |
7d |
Convenience time window |
from / to |
RFC 3339 timestamps | — | Custom range (overrides period, max 365 days) |
agent_id |
string | — | Filter by agent |
decision_type |
string | — | Filter by type |
conflict_kind |
cross_agent, self_contradiction |
— | Filter by kind |
When OpenTelemetry is configured, the conflict pipeline emits these metrics:
| Metric | Attributes | Description |
|---|---|---|
akashi.conflicts.detected |
scoring_method, relationship, conflict_kind, severity | Conflicts found |
akashi.conflicts.resolved |
status, conflict_kind | Conflicts resolved |
akashi.conflicts.llm_calls |
result (success/error/timeout), validator | LLM validation calls |
akashi.conflicts.candidates_evaluated |
— | Candidate pairs evaluated |
akashi.conflicts.claim_level_wins |
— | Times claim scoring beat full-outcome scoring |
| Metric | Unit | Description |
|---|---|---|
akashi.conflicts.scoring_duration_ms |
ms | Per-decision conflict scoring latency |
akashi.conflicts.llm_call_duration_ms |
ms | Per-call LLM validation latency |
akashi.conflicts.significance_distribution |
— | Significance scores of detected conflicts |
akashi.conflicts.candidates_examined |
— | Candidates examined after early exit |
| Metric | Description |
|---|---|
akashi.conflicts.open_total |
Current open + acknowledged conflicts |
akashi.conflicts.backfill_remaining |
Decisions with embeddings not yet conflict-scored |
akashi.conflicts.open_total> 50: Conflicts are accumulating faster than resolution. Review triage workflow.akashi.conflicts.llm_calls{result=error}sustained: LLM service degraded. Check Ollama/OpenAI connectivity.akashi.conflicts.scoring_duration_msp99 > 5000: Scoring is slow. Consider enabling early exit or cross-encoder.akashi.conflicts.backfill_remainingnot decreasing: Backfill stalled. Check embedding provider health.
| Variable | Default | Description |
|---|---|---|
AKASHI_CONFLICT_SIGNIFICANCE_THRESHOLD |
0.30 |
Minimum significance to persist a conflict |
AKASHI_CONFLICT_CANDIDATE_LIMIT |
20 |
Max candidates retrieved from Qdrant per decision |
AKASHI_CONFLICT_EARLY_EXIT_FLOOR |
0.25 |
Min pre-LLM significance for early exit (0 disables) |
AKASHI_CONFLICT_DECAY_LAMBDA |
0.01 |
Temporal decay rate (0 disables; ~70 days to half significance) |
AKASHI_CONFLICT_BACKFILL_WORKERS |
4 |
Parallel workers for batch scoring |
AKASHI_CONFLICT_LLM_MODEL |
(empty) | LLM model for validation (e.g., qwen3.5:9b) |
AKASHI_CONFLICT_LLM_THREADS |
floor(NumCPU/3) |
CPU threads for Ollama |
AKASHI_CONFLICT_CLAIM_TOPIC_SIM_FLOOR |
0.60 |
Min cosine similarity for claim pairs |
AKASHI_CONFLICT_CLAIM_DIV_FLOOR |
0.15 |
Min outcome divergence for claim pairs |
AKASHI_CONFLICT_DECISION_TOPIC_SIM_FLOOR |
0.70 |
Min decision similarity to activate claim scoring |
AKASHI_CROSS_ENCODER_URL |
(empty) | Cross-encoder reranking endpoint |
AKASHI_CROSS_ENCODER_THRESHOLD |
0.50 |
Min cross-encoder score for LLM validation |
AKASHI_CLAIM_EXTRACTION_LLM |
false |
Use LLM for structured claim extraction |
AKASHI_FORCE_CONFLICT_RESCORE |
false |
Clear and re-score all conflicts at startup |
POST /v1/admin/conflicts/validate-pair
{
"outcome_a": "Use microservices for the payment system",
"outcome_b": "Use a monolith for the payment system",
"type_a": "architecture",
"type_b": "architecture"
}Returns the relationship, category, severity, and explanation without persisting anything.
POST /v1/admin/conflicts/eval
Runs the built-in evaluation dataset and returns precision, recall, F1, and accuracy. Use this after changing the LLM model or tuning thresholds to verify detection quality.