- - **Detail**: Fuzzy string matching (rapidfuzz ratio >=85%) flags semantically distinct values as potential duplicates when they are string-similar but numerically different. Observed in live testing:
- `'2016-05-02 20:00'` vs `'2016-05-02 22:00'` (92%) — different timestamps (8 PM vs 10 PM)
- `'50,000 dollars'` vs `'2,000 dollars'` (88%) — 25x difference in amount
- `'$5,000'` vs `'$50,000'` (88%) — 10x difference in amount
- **Impact**: No data corruption (flags only, not auto-merged). But these false flags add noise for Phase 8 LLM resolution, wasting tokens and potentially confusing the synthesis agent.
- **Fix**: Add type-aware matching logic before fuzzy comparison. For `entity_type` in (`timestamp`, `monetary_amount`, `date`, `other` when value looks numeric): parse the actual value and compare semantically instead of string-matching. E.g., for monetary amounts, extract the number and compare magnitude; for timestamps, parse and compare actual time difference.
- - **Priority**: Medium (should be fixed before or during Phase 8 to avoid noisy LLM resolution input)
- - **Phase**: Fix during Phase 8 (Synthesis) when fuzzy flags are consumed
+ - **Resolution**: The programmatic KG Builder with fuzzy dedup (`backend/app/services/kg_builder.py`) was entirely replaced by the LLM-based KG Builder Agent. The LLM uses a clear-and-rebuild strategy and handles deduplication naturally by seeing all findings holistically. No fuzzy string matching exists in the current pipeline.
+ - **Original file**: `backend/app/services/kg_builder.py` — programmatic service is now dead code (superseded by `backend/app/agents/kg_builder.py`)
  - **Added**: 2026-02-07, live pipeline testing
+ - **Resolved**: 2026-02-08, Phase 7.1 LLM KG Builder

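The type-aware fix described in the removed MI-003 lines could be sketched roughly as follows. This is an illustration only, not the project's code: the function names (`parse_amount`, `parse_timestamp`, `is_potential_duplicate`) are hypothetical, the timestamp formats are assumed from the examples quoted above, and the standard library's `difflib.SequenceMatcher` stands in for the rapidfuzz ratio so the snippet has no dependencies. The `other` entity type with numeric-looking values is left out for brevity.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation
from difflib import SequenceMatcher
from typing import Optional

FUZZY_THRESHOLD = 0.85  # mirrors the >=85% ratio quoted in MI-003

# First numeric token, allowing thousands separators and a decimal part.
_NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")
# Assumed formats, taken from the observed examples only.
_TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d")


def parse_amount(value: str) -> Optional[Decimal]:
    """Extract the first number, ignoring currency symbols and words."""
    match = _NUMBER_RE.search(value)
    if match is None:
        return None
    try:
        return Decimal(match.group().replace(",", ""))
    except InvalidOperation:
        return None


def parse_timestamp(value: str) -> Optional[datetime]:
    """Try each assumed format; return None if none of them fits."""
    for fmt in _TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def is_potential_duplicate(a: str, b: str, entity_type: str) -> bool:
    """Compare parsed values for typed entities, else fall back to fuzzy text."""
    if entity_type == "monetary_amount":
        x, y = parse_amount(a), parse_amount(b)
        if x is not None and y is not None:
            return x == y
    elif entity_type in ("timestamp", "date"):
        t1, t2 = parse_timestamp(a), parse_timestamp(b)
        if t1 is not None and t2 is not None:
            return t1 == t2
    # Unparseable or untyped values: plain string similarity.
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= FUZZY_THRESHOLD
```

With this ordering, the three pairs listed in the issue are compared by parsed value and no longer reach the string-similarity fallback, while differently formatted spellings of the same amount (such as `'$5,000'` and `'5,000 dollars'`) still match.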
## MI-004: Pipeline summary log mixes two different entity count semantics
.planning/REQUIREMENTS.md (+62 -47: 62 additions & 47 deletions)
@@ -511,10 +511,13 @@ This document defines formal requirements for Holmes v1. Requirements are derive
  **Priority:** HIGH
  **Description:** Agent that cross-references all domain findings to produce hypotheses, contradictions, gaps, timeline events, cross-domain conclusions, and case-level summary/verdict.
  **Acceptance Criteria:**
- - Receives ALL case_findings from PostgreSQL (rich markdown findings with citations)
- - Additional context: entity summary from KG tables, file metadata, case description
+ - **Two-source DB input assembly:**
+ - Source 1: ALL `case_findings` rows for the case workflow — rich markdown `finding_text` with inline citations, `citations` JSONB (file_id + locator + exact_excerpt), `agent_type` (financial/legal/evidence/strategy), `category`, `confidence`. These are the persisted outputs from all domain agents + strategy agent (pipeline Stage 6).
+ - Source 2: Curated knowledge graph from `kg_entities` (name, entity_type, description_brief, description_detailed, aliases, domains, source_finding_ids) + `kg_relationships` (label, relationship_type, evidence_excerpt, temporal_context, source_finding_ids, confidence). These are the outputs from the LLM KG Builder Agent (pipeline Stage 7).
+ - Additional context: case metadata (name, description, case_type) + file metadata (filenames, types) from `cases` and `case_files` tables.
  - Gemini 3 Pro with `thinking_level="high"` and 1M context window
  - Runs in fresh stage-isolated ADK session (consistent with pipeline pattern)
+ - Pipeline position: Stage 8, after LLM KG Builder (Stage 7) + Entity Backfill (Stage 7b) complete
  - Produces structured SynthesisOutput:
  - `hypotheses`: Case hypotheses with initial confidence + supporting/contradicting evidence references
  - `contradictions`: Detected contradictions with exact source pairs from both sides, severity classification (minor/significant/critical)
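The two-source input assembly in the acceptance criteria above might look something like the sketch below. It is not taken from the Holmes codebase: the function name `assemble_synthesis_input`, the prompt layout, and the file-row keys `filename` and `file_type` are assumptions (the requirement only says "filenames, types"). Rows are assumed to be already fetched from PostgreSQL as dict-like objects, and only a subset of the listed columns is rendered.

```python
from typing import Any, Mapping, Sequence

Row = Mapping[str, Any]  # one DB row, already fetched as a dict-like object


def assemble_synthesis_input(
    case: Row,
    files: Sequence[Row],
    findings: Sequence[Row],
    entities: Sequence[Row],
    relationships: Sequence[Row],
) -> str:
    """Render both sources plus case context into one prompt string."""
    lines = [
        f"# Case: {case['name']} ({case['case_type']})",
        str(case["description"]),
        "",
        "## Files",
    ]
    # Hypothetical key names for case_files rows.
    lines += [f"- {f['filename']} ({f['file_type']})" for f in files]

    # Source 1: persisted findings from the domain and strategy agents.
    lines += ["", "## Source 1: case_findings"]
    for row in findings:
        lines.append(
            f"### {row['agent_type']} / {row['category']} "
            f"(confidence {row['confidence']})"
        )
        lines.append(row["finding_text"])
        for cite in row.get("citations", []):
            lines.append(
                f"> [{cite['file_id']} @ {cite['locator']}] {cite['exact_excerpt']}"
            )

    # Source 2: curated knowledge graph from the KG Builder stage.
    lines += ["", "## Source 2: knowledge graph", "### Entities"]
    for ent in entities:
        aliases = ", ".join(ent.get("aliases", [])) or "none"
        lines.append(
            f"- {ent['name']} [{ent['entity_type']}] "
            f"aliases: {aliases}. {ent['description_brief']}"
        )
    lines.append("### Relationships")
    for rel in relationships:
        lines.append(
            f"- {rel['label']} [{rel['relationship_type']}] "
            f"(confidence {rel['confidence']}): {rel['evidence_excerpt']}"
        )
    return "\n".join(lines)
```

Keeping the two sources under separate headings is one way to let the synthesis agent tell raw findings apart from curated graph facts; the actual prompt structure may differ.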
@@ -1550,73 +1553,84 @@ This document defines formal requirements for Holmes v1. Requirements are derive
*Architecture revision: 2026-02-08 (REQ-AGENT-009 revised for LLM-based KG Builder; REQ-VIS-003 updated for D3.js with Epstein-inspired patterns; vis-network deferred)*
*Status: Complete - Integration features added (REQ-RESEARCH, REQ-HYPO, REQ-GEO, REQ-TASK)*