Commit 6030135

docs(07.1-02): complete KG Builder agent implementation plan
Tasks completed: 2/2
- Task 1: KG Builder Agent -- Runner, Factory, Prompt, Input Assembly, DB Write
- Task 2: Pipeline Wiring -- Replace Programmatic KG Builder with LLM Agent

SUMMARY: .planning/phases/07.1-llm-kg-builder-agent/07.1-02-SUMMARY.md
1 parent 618958b commit 6030135

2 files changed: +124 -9 lines changed

.planning/STATE.md

Lines changed: 11 additions & 9 deletions
@@ -1,7 +1,7 @@
 # Holmes Project State

 **Last Updated:** 2026-02-08
-**Current Phase:** 7.1 of 12 (LLM-Based KG Builder Agent) — IN PROGRESS (1/2 plans)
+**Current Phase:** 7.1 of 12 (LLM-Based KG Builder Agent) — COMPLETE (2/2 plans)
 **Next Phase:** 7.2 (D3.js KG Frontend Enhancement) → 8 (Synthesis)
 **Current Milestone:** M1 - Holmes v1.0

@@ -18,7 +18,7 @@
 | 5 | Agent Flow | COMPLETE | 2026-02-04 | 2026-02-05 | SSE pipeline complete; HITL infra built but verification deferred to Phase 6+ |
 | 6 | Domain Agents | COMPLETE | 2026-02-06 | 2026-02-06 | 5 plans (14 commits) + 21 post-plan commits (35 total): refactoring, routing HITL, production hardening, live-testing bugfixes |
 | 7 | Knowledge Storage & Domain Agent Enrichment | COMPLETE | 2026-02-07 | 2026-02-07 | 6 plans (11 commits), 8/8 verified: 9 DB models + migration, KG/findings schemas, KG Builder + findings service, prompt enrichment, 10 API endpoints, pipeline wiring |
-| 7.1 | LLM-Based KG Builder Agent | IN_PROGRESS | 2026-02-08 | - | Plan 01/02 complete: schema evolution + Pydantic schemas |
+| 7.1 | LLM-Based KG Builder Agent | COMPLETE | 2026-02-08 | 2026-02-08 | 2 plans (4 commits): schema evolution, Pydantic schemas, agent runner/prompt/factory, pipeline wiring |
 | 7.2 | KG Frontend (D3.js Enhancement) | NOT_STARTED | - | - | Epstein-inspired D3.js improvements |
 | 7.3 | KG Frontend (vis-network) | DEFERRED | - | - | Optional; only if D3.js proves insufficient |
 | 8 | Intelligence Layer & Geospatial | NOT_STARTED | - | - | |
@@ -53,14 +53,12 @@
 - vis-network deferred to optional Phase 7.3
 - New phase structure: 7.1 (LLM KG Builder) → 7.2 (D3.js Enhancement) → 7.3 (vis-network, optional)

-**Phase 7.1 Plan 01 complete** (2026-02-08): Schema evolution + Pydantic schemas -- 2 plans, 2 commits
-- Alembic migration adding 10 new nullable columns (5 entity, 5 relationship)
-- KgBuilderOutput/KgBuilderEntity/KgBuilderRelationship Pydantic schemas with integer ID cross-referencing
-- EntityResponse and RelationshipResponse updated with new optional fields
-- Frontend API types regenerated
+**Phase 7.1 Complete** (2026-02-08): LLM-Based KG Builder Agent -- 2 plans, 4 commits
+- Plan 01: Alembic migration adding 10 new nullable columns + KgBuilderOutput Pydantic schemas with integer ID cross-referencing
+- Plan 02: KgBuilderAgentRunner with text-only input, KG_BUILDER_SYSTEM_PROMPT (8+1 entity taxonomy), AgentFactory.create_kg_builder_agent(), DB writer with clear-and-rebuild, pipeline Stage 7 replaced with LLM invocation
+- Full pipeline: Triage -> Orchestrator -> Domain -> Strategy -> HITL -> Save Findings -> LLM KG Builder -> Backfill Entity IDs -> Final

 **What's next:**
-- Phase 7.1 Plan 02: KG Builder agent runner, prompt, DB writer, pipeline wiring
 - Phase 7.2: D3.js KG Frontend Enhancement — Epstein-inspired layout, physics, sidebars, filtering, document excerpts
 - Phase 8: Synthesis Agent & Intelligence Layer — cross-referencing, hypotheses, contradictions, gaps, timeline

@@ -373,6 +371,10 @@ All frontend features need these backend endpoints:
 | KG API data access | Direct model queries vs Service layer | Direct model queries | Simple CRUD doesn't need service abstraction; findings uses service for complex search |
 | EntityCreateRequest metadata mapping | Direct field vs Renamed | metadata -> properties | Schema field "metadata" maps to DB column "properties" (SQLAlchemy reserved attribute) |
 | KG Builder entity cross-referencing | Name matching vs Integer IDs | Integer IDs (1, 2, 3...) | Eliminates name-matching inconsistencies; LLM assigns sequential IDs, mapped to DB UUIDs during write |
+| KG Builder input format | Multimodal files vs Text-only | Text-only (findings + entities + case description) | Domain agents already processed raw evidence; KG Builder only needs pre-processed text |
+| KG Builder rebuild strategy | Incremental merge vs Clear-and-rebuild | Clear-and-rebuild | Clean slate every run; delete all KG data then insert curated LLM output |
+| KG Builder failure handling | Block pipeline vs Non-blocking | Non-blocking (try/except, continue) | KG Builder failure emits SSE error, pipeline continues; KG page shows empty state |
+| KG Builder media resolution | HIGH vs None | None (text-only input) | No generate_content_config needed; KG Builder receives text, not files |

 ---

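
The "non-blocking (try/except, continue)" decision recorded in the table above can be illustrated with a minimal sketch. This is not the code from `pipeline.py`: the helper `run_stage_non_blocking`, the `emit` callback, and the event names are hypothetical stand-ins for whatever the real pipeline uses to publish SSE events.

```python
import asyncio
import logging

logger = logging.getLogger("pipeline")


async def run_stage_non_blocking(stage_name, stage_fn, emit):
    """Run one stage; on failure emit an error event and return None.

    `emit` is a hypothetical async callback that would push an SSE event.
    """
    await emit({"type": f"{stage_name}-started"})
    try:
        result = await stage_fn()
    except Exception as exc:  # broad on purpose: this stage must not abort the run
        logger.warning("%s failed: %s", stage_name, exc)
        await emit({"type": f"{stage_name}-error", "error": str(exc)})
        return None
    await emit({"type": f"{stage_name}-complete"})
    return result


async def demo():
    events = []

    async def emit(event):
        events.append(event)

    async def failing_kg_builder():
        raise RuntimeError("LLM call failed")

    # The caller gets None back and simply moves on to the next stage.
    result = await run_stage_non_blocking("kg-builder", failing_kg_builder, emit)
    return result, events


result, events = asyncio.run(demo())
```

With this shape, a failed stage leaves a started/error pair in the event stream and the caller decides what an empty result means, which matches the "KG page shows empty state" outcome described in the table.
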
@@ -385,7 +387,7 @@ None currently.
 ## Session Continuity

 Last session: 2026-02-08
-Stopped at: Completed 07.1-01-PLAN.md (schema evolution + Pydantic schemas)
+Stopped at: Completed 07.1-02-PLAN.md (KG Builder agent runner, prompt, pipeline wiring)
 Resume file: None

 ---
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+---
+phase: 07.1-llm-kg-builder-agent
+plan: 02
+subsystem: agents
+tags: [gemini-pro, adk, knowledge-graph, llm-agent, pipeline, sse]
+
+# Dependency graph
+requires:
+  - phase: 07.1-01
+    provides: "Alembic migration + ORM models + KgBuilderOutput Pydantic schema"
+  - phase: 07
+    provides: "KgEntity, KgRelationship ORM models, programmatic KG builder, pipeline wiring"
+provides:
+  - "KgBuilderAgentRunner subclass with text-only content preparation"
+  - "KG_BUILDER_SYSTEM_PROMPT with 8+1 entity taxonomy and semantic relationship instructions"
+  - "AgentFactory.create_kg_builder_agent() with Pro model and high thinking"
+  - "Input assembly from case_findings + domain entities + case description"
+  - "DB write with clear-and-rebuild, lenient parsing, entity degree computation"
+  - "Pipeline Stage 7 replaced with LLM KG Builder invocation in try/except"
+  - "SSE lifecycle events for KG Builder (started, complete, error)"
+affects: [07.2, 08, 09]
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "Text-only DomainAgentRunner subclass (no multimodal files)"
+    - "Clear-and-rebuild KG strategy (delete all, insert curated data)"
+    - "Lenient LLM output parsing (skip malformed, log warning)"
+    - "Non-blocking pipeline stage (try/except, continues on failure)"
+
+key-files:
+  created:
+    - "backend/app/agents/kg_builder.py"
+    - "backend/app/agents/prompts/kg_builder.py"
+  modified:
+    - "backend/app/agents/factory.py"
+    - "backend/app/services/kg_builder.py"
+    - "backend/app/services/pipeline.py"
+
+key-decisions:
+  - "KG Builder receives text-only input (no multimodal files) -- findings + entities + case description"
+  - "Clear-and-rebuild strategy: delete all existing KG data before writing curated data"
+  - "KG Builder failure is non-blocking: try/except in pipeline, emits SSE error, continues"
+  - "Old programmatic builder functions preserved as deprecated (not deleted)"
+
+patterns-established:
+  - "Text-only DomainAgentRunner subclass pattern for agents consuming pre-processed text"
+  - "Non-blocking pipeline stage pattern with SSE error emission on failure"
+
+# Metrics
+duration: 5min
+completed: 2026-02-08
+---
+
+# Phase 7.1 Plan 02: KG Builder Agent Implementation Summary
+
+**LLM-based KG Builder agent with Gemini Pro, text-only input assembly, clear-and-rebuild DB writer, and non-blocking pipeline integration with SSE lifecycle events**
+
+## Performance
+
+- **Duration:** 5 min
+- **Started:** 2026-02-08T11:19:59Z
+- **Completed:** 2026-02-08T11:25:07Z
+- **Tasks:** 2
+- **Files modified:** 5
+
+## Accomplishments
+- KgBuilderAgentRunner subclass following Strategy pattern with text-only content preparation from case findings, domain entities, and case description
+- System prompt with 8+1 entity taxonomy (PERSON, ORGANIZATION, LOCATION, EVENT, ASSET, FINANCIAL_ENTITY, COMMUNICATION, DOCUMENT, OTHER), semantic relationship instructions, deduplication rules, and evidence grounding requirements
+- AgentFactory.create_kg_builder_agent() producing fresh LlmAgent with Pro model, high thinking, and text-only input (no media resolution)
+- DB writer that clears existing KG data, inserts curated entities/relationships with integer-ID-to-UUID mapping, computes entity degrees, and uses lenient parsing (skip malformed, log warning)
+- Pipeline Stage 7 replaced: run_kg_builder() called instead of build_knowledge_graph(), wrapped in try/except for graceful failure, SSE events for started/complete/error
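
The DB writer bullet above combines three ideas: mapping LLM-assigned integer IDs to database UUIDs, skipping malformed items, and computing entity degrees. A rough, in-memory sketch of how these could fit together follows. It is not the code of `write_kg_from_llm_output()`: the function `map_llm_output`, the dictionary field names (`source_id`, `target_id`, `label`), and the definition of degree as "number of relationships touching the entity" are all assumptions made for illustration.

```python
import logging
import uuid
from collections import Counter

logger = logging.getLogger("kg_writer")


def map_llm_output(entities, relationships):
    """Map integer IDs to fresh UUIDs, drop malformed items, count degrees."""
    id_map = {}  # LLM integer ID -> generated UUID
    entity_rows = []
    for ent in entities:
        try:
            llm_id = int(ent["id"])
            row = {"id": uuid.uuid4(), "name": ent["name"], "type": ent["type"]}
        except (KeyError, TypeError, ValueError) as exc:
            logger.warning("Skipping malformed entity %r: %s", ent, exc)
            continue
        id_map[llm_id] = row["id"]
        entity_rows.append(row)

    degree = Counter()
    rel_rows = []
    for rel in relationships:
        try:
            # A KeyError here also covers references to unknown or skipped entities.
            src = id_map[int(rel["source_id"])]
            dst = id_map[int(rel["target_id"])]
            label = rel["label"]
        except (KeyError, TypeError, ValueError) as exc:
            logger.warning("Skipping malformed relationship %r: %s", rel, exc)
            continue
        rel_rows.append({"id": uuid.uuid4(), "source_id": src, "target_id": dst, "label": label})
        degree[src] += 1
        degree[dst] += 1

    for row in entity_rows:
        row["degree"] = degree[row["id"]]  # Counter yields 0 for isolated entities
    return entity_rows, rel_rows


entities = [
    {"id": 1, "name": "Alice", "type": "PERSON"},
    {"id": 2, "name": "Acme", "type": "ORGANIZATION"},
    {"name": "missing id"},  # malformed: skipped with a warning
]
relationships = [
    {"source_id": 1, "target_id": 2, "label": "EMPLOYED_BY"},
    {"source_id": 1, "target_id": 9, "label": "UNKNOWN_TARGET"},  # skipped
]
entity_rows, rel_rows = map_llm_output(entities, relationships)
```

Because each item is handled in its own try/except, one bad entity or dangling reference costs only that item, which is the "partial results preserved" behaviour the summary describes.
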
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: KG Builder Agent -- Runner, Factory, Prompt, Input Assembly, DB Write** - `07428c7` (feat)
+2. **Task 2: Pipeline Wiring -- Replace Programmatic KG Builder with LLM Agent** - `618958b` (feat)
+
+## Files Created/Modified
+- `backend/app/agents/kg_builder.py` - KgBuilderAgentRunner, assemble_kg_builder_input(), write_kg_from_llm_output(), run_kg_builder()
+- `backend/app/agents/prompts/kg_builder.py` - KG_BUILDER_SYSTEM_PROMPT with entity taxonomy and relationship instructions
+- `backend/app/agents/factory.py` - Added create_kg_builder_agent() static method
+- `backend/app/services/kg_builder.py` - Marked 4 old functions as deprecated (extract_entities_from_output, build_relationships_from_findings, deduplicate_entities, build_knowledge_graph)
+- `backend/app/services/pipeline.py` - Stage 7 replaced with LLM KG Builder invocation, SSE lifecycle events added
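
The commit does not show how the four old functions were marked deprecated (it may be a docstring note only). One common way to do it in Python, shown here as an assumption rather than as what `services/kg_builder.py` actually contains, is a small decorator that raises a `DeprecationWarning` while leaving the function callable. The `deprecated` decorator and the stand-in function body are hypothetical.

```python
import functools
import warnings


def deprecated(replacement):
    """Hypothetical decorator: warn on every call, then run the old function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__}() is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("run_kg_builder()")
def build_knowledge_graph(case_id):
    # Stand-in body: the real signature and logic are not shown in this commit.
    return {"case_id": case_id, "entities": []}
```

Keeping the old functions importable this way preserves backward compatibility while making any remaining callers visible in logs or test output.
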
+
+## Decisions Made
+- KG Builder receives text-only input assembled from case_findings (with [FINDING:uuid] prefixes), DomainEntity JSON lists, and case description -- no multimodal files needed since domain agents already processed raw evidence
+- Clear-and-rebuild strategy for KG data: delete all kg_entities and kg_relationships for the case before writing curated data from LLM output
+- Lenient parsing: each entity and relationship insertion wrapped in try/except, malformed items skipped with warning log, partial results preserved
+- Old programmatic builder functions (extract_entities_from_output, build_relationships_from_findings, deduplicate_entities, build_knowledge_graph) marked deprecated but preserved for backward compatibility and audit trail
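
The first decision above describes the text-only input as findings with `[FINDING:uuid]` prefixes, per-agent entity lists as JSON, and the case description. A minimal sketch of such an assembly step might look like the following. It is not `assemble_kg_builder_input()` itself: the function name `assemble_text_input`, the section headings, and the dictionary shapes are assumptions; only the `[FINDING:...]` prefix convention comes from the summary.

```python
import json


def assemble_text_input(case_description, findings, domain_entities):
    """Build one plain-text prompt body from pre-processed case data.

    findings: list of dicts with hypothetical keys "id" and "text".
    domain_entities: dict mapping a hypothetical agent name to a list of entity dicts.
    """
    parts = ["## Case Description", case_description.strip(), "", "## Findings"]
    for finding in findings:
        # The prefix lets the model cite the finding a graph element is grounded in.
        parts.append(f"[FINDING:{finding['id']}] {finding['text']}")
    parts += ["", "## Domain Entities"]
    for agent, entities in domain_entities.items():
        parts.append(f"{agent}: {json.dumps(entities, sort_keys=True)}")
    return "\n".join(parts)


text = assemble_text_input(
    "Fraud case",
    [{"id": "abc", "text": "Wire transfer to offshore account"}],
    {"financial": [{"name": "Acme", "type": "ORGANIZATION"}]},
)
```

Since everything is already text, no file handles or media settings are involved, which is consistent with the "no media resolution" note in the accomplishments.
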
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+- Import sorting: ruff pre-commit hook caught unsorted import of run_kg_builder in pipeline.py -- fixed by moving import to alphabetical position among app.agents.* imports
+
+## User Setup Required
+None - no external service configuration required.
+
+## Next Phase Readiness
+- LLM KG Builder agent fully wired into pipeline, ready for end-to-end testing
+- Phase 7.2 (D3.js KG Frontend Enhancement) can proceed -- existing KG API endpoints return curated data without changes
+- Phase 8 (Synthesis) can proceed -- KG Builder produces entities with semantic relationships for synthesis agent consumption
+- No blockers
+
+---
+*Phase: 07.1-llm-kg-builder-agent*
+*Completed: 2026-02-08*
