Commit 07ac18c

Merge branch 'ftr/ui-improvements' into development
2 parents 95be3c3 + a26222a commit 07ac18c

35 files changed: +1522 -1044 lines changed

DOCS/Holmes-Arch.png

2.45 MB

README.md

Lines changed: 674 additions & 61 deletions
Large diffs are not rendered by default.

SUBMISSION_WRITEUP.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Holmes - AI-Powered Legal and Investigation Intelligence Platform
+
+## Inspiration
+
+Legal professionals and investigators spend much of their time manually reviewing evidence: cross-referencing documents, videos, audio, and images for connections and contradictions. Holmes aims to automate this through domain-specialized AI agents, turning weeks of analysis into minutes of transparent, citation-grounded intelligence.
+
+## What it does
+
+With Holmes, investigators can upload large volumes of multimodal case evidence (MP3 audio, MP4 video, JPEG/PNG images, PDFs). Holmes then orchestrates specialized agents - Financial, Legal, Evidence, Knowledge Graph, Synthesis and Geospatial - that extract entities, detect contradictions, identify gaps, generate hypotheses, and build a complete red-string board of the key insights from a case's evidence files.
+
+Users can interact through five views: Agent Flow (a real-time view of the agentic pipeline), Knowledge Graph (all the case entities and their relationships), Timeline, Geospatial Map, and Verdict dashboard. A Chat agent answers questions grounded in clickable source citations. The Investigator's Notebook captures voice and text notes, while AI-powered redaction agents censor sensitive content across PDFs (black boxes), images (blur and pixelate), audio (bleep) and videos (blur, pixelate, blackbox) via natural language prompts - Gemini identifies targets, then applies pixel-level censorship (for videos and images, self-hosted, locally deployed instances of SAM2 and SAM3 are used).
+
+## How we built it - Gemini 3 at the Core
+
+- **Deep Thinking & Reasoning:** All agents use Gemini 3's `ThinkingConfig` at HIGH level via Google ADK's `BuiltInPlanner`, enabling the Orchestrator to reason about routing and Synthesis to cross-reference findings holistically.
+
+- **Native Multimodality:** Gemini 3 models process PDFs, videos, audio, and images natively. This allows specialised domain agents to receive raw files with configurable `media_resolution` (HIGH for forensic analysis, MEDIUM for strategy). Files ≤100MB go inline; larger files use Gemini's File API (up to 2GB).
+
+- **1M Token Context Window:** Stage-isolated ADK sessions feed the Synthesis Agents all generated findings, entities, and relationships in one call, enabling cross-domain contradiction detection that is usually not possible with smaller windows.
+
+- **Architecture:** 9 Gemini 3 agents orchestrated via Google ADK with PostgreSQL-backed stage-isolated sessions, Pro-to-Flash fallbacks, parallel execution, SSE streaming with thinking traces, and tool-based Chat. The Orchestrator agent decides autonomously how many instances of a domain agent to spawn, based on the case requirements derived from the initial triage.
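The Native Multimodality bullet states a size rule: files up to 100MB go inline, larger files go through the File API, capped at 2GB. The commit does not show the code that applies this rule, so the following is only a minimal sketch of how such a routing decision might look. The names `UploadPath` and `choose_upload_path` are hypothetical, and reading "MB" and "GB" as binary units is an assumption.

```python
from enum import Enum

# Thresholds come from the writeup. Whether Holmes counts a megabyte as
# 10**6 or 2**20 bytes is not stated; binary units are an assumption here.
INLINE_LIMIT_BYTES = 100 * 1024 * 1024
FILE_API_LIMIT_BYTES = 2 * 1024 * 1024 * 1024


class UploadPath(Enum):
    """Hypothetical labels for the two delivery routes named in the writeup."""

    INLINE = "inline"
    FILE_API = "file_api"


def choose_upload_path(size_bytes: int) -> UploadPath:
    """Pick a delivery route for one evidence file based on its size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= INLINE_LIMIT_BYTES:
        return UploadPath.INLINE
    if size_bytes <= FILE_API_LIMIT_BYTES:
        return UploadPath.FILE_API
    raise ValueError("file exceeds the 2GB File API limit stated in the writeup")
```

Keeping the decision in one pure function makes the thresholds easy to test and to change if the platform limits move.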
+
+### Stack
+**Frontend:**
+- Next.js 16
+- React 19
+- D3.js
+- React Flow
+
+**Backend:**
+- FastAPI
+- Google ADK
+- Cloud Run
+- Cloud SQL
+- GCS
+
+## Impact
+
+Holmes transforms complex investigation from manual review into AI-augmented intelligence - surfacing connections humans miss, detecting cross-modal contradictions, and generating hypotheses. Every conclusion traces to its exact source, building the trust legal work demands.
+
+## What's next for Holmes
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# ABOUTME: Shared citation and findings_text rules for all domain agent prompts.
+# ABOUTME: Eliminates duplication of ~30 lines of identical rules across 4 domain agents.
+
+CITATION_AND_FINDINGS_TEXT_RULES = """\
+## CITATION AND FINDINGS TEXT REQUIREMENTS
+
+### Exhaustive Citation Rules
+Every factual statement in your findings MUST have a citation. No exceptions.
+
+For EACH citation, ALL THREE fields are REQUIRED:
+- `file_id` (REQUIRED): The exact file ID provided in the input. Never omit.
+- `locator` (REQUIRED): Use the format:
+  - PDF/documents: "page:N" (e.g., "page:3", "page:17")
+  - Video: "ts:MM:SS" (e.g., "ts:01:23", "ts:00:45:12")
+  - Audio: "ts:MM:SS" (e.g., "ts:05:30")
+  - Images: "region:description" (e.g., "region:top-left-corner")
+- `excerpt` (REQUIRED): The EXACT text from the source, character-for-character.
+  The excerpt is used for PDF text-layer highlighting — if it is missing or
+  paraphrased, the user CANNOT verify the source in the document viewer.
+  Copy the source text EXACTLY as it appears, preserving:
+  - Original spelling (even if incorrect)
+  - Original punctuation and whitespace
+  - Original line breaks within the excerpt
+  - Original formatting (capitalization, abbreviations)
+
+### Citation Anti-Patterns (DO NOT)
+- DO NOT leave excerpt empty or null — every citation MUST have an excerpt.
+- DO NOT paraphrase or summarize the source text. Copy it verbatim.
+- DO NOT combine non-contiguous text fragments into a single excerpt.
+- DO NOT use ellipsis ("...") to abbreviate the middle of an excerpt.
+  If the relevant text is too long, select the most important contiguous
+  fragment (up to 500 characters).
+- DO NOT hallucinate or reconstruct text that you cannot read from the source.
+  If a passage is illegible, note that in the finding description instead.
+
+### Citation Quality Checklist
+Before finalizing each citation, verify:
+1. file_id matches the exact ID from the input (not a filename or URL).
+2. locator pinpoints the specific page or timestamp (not a range).
+3. excerpt is a verbatim copy-paste from the source (not a paraphrase).
+4. excerpt is under 500 characters and is a single contiguous passage.
+5. excerpt would produce a match if searched in the original document.
+
+{domain_specific_citation_notes}\
+If a finding spans multiple pages or time segments, create SEPARATE citations
+for each page/segment. Do not combine into ranges.
+
+### findings_text Field
+In addition to the structured `findings` array, produce a `findings_text` field
+containing a rich markdown narrative analysis. This text:
+- Organizes analysis by category (use ## headers for each category)
+- Contains detailed paragraphs explaining each finding in context
+- References specific evidence using inline notation: [Source: file_id, page:N, "exact excerpt"]
+- Connects findings to broader case implications
+- Must be comprehensive -- this is the primary text used for search indexing
+  and downstream synthesis
+- Minimum 500 words for cases with substantive findings
+- Every factual claim in the narrative must reference its source
+
+{findings_text_example}"""
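The shared rules above can also be checked mechanically on the model's output. The following is a minimal sketch of such a post-hoc validator and is not part of this commit; `Citation` and `validate_citation` are hypothetical names. Two assumptions are baked in. The rules name "ts:MM:SS" but also give "ts:00:45:12" as an example, so the sketch accepts an optional hours field. The rules say both "up to 500 characters" and "under 500 characters", so the sketch takes the lenient reading and only rejects excerpts longer than 500.

```python
import re
from dataclasses import dataclass
from typing import Optional

MAX_EXCERPT_CHARS = 500  # lenient reading of "up to" / "under" 500 characters

# One pattern per locator shape listed in the shared rules. Ranges such as
# "page:3-5" fail fullmatch, in line with the "not a range" checklist item.
_LOCATOR_PATTERNS = (
    re.compile(r"page:[1-9]\d*"),
    re.compile(r"ts:(?:\d{2}:)?\d{2}:[0-5]\d"),  # MM:SS, or HH:MM:SS (assumed)
    re.compile(r"region:\S.*"),
)


@dataclass(frozen=True)
class Citation:
    """Hypothetical container for the three required citation fields."""

    file_id: str
    locator: str
    excerpt: str


def validate_citation(
    citation: Citation,
    known_file_ids: set[str],
    source_text: Optional[str] = None,
) -> list[str]:
    """Return a list of rule violations; an empty list means the citation passes."""
    problems = []
    if citation.file_id not in known_file_ids:
        problems.append("file_id does not match an input file ID")
    if not any(p.fullmatch(citation.locator) for p in _LOCATOR_PATTERNS):
        problems.append("locator is not page:N, ts:MM:SS, or region:description")
    if not citation.excerpt.strip():
        problems.append("excerpt is empty")
    elif len(citation.excerpt) > MAX_EXCERPT_CHARS:
        problems.append("excerpt is longer than 500 characters")
    # Checklist item 5: the excerpt should be findable in the original text.
    if source_text is not None and citation.excerpt.strip():
        if citation.excerpt not in source_text:
            problems.append("excerpt is not a verbatim substring of the source")
    return problems
```

The substring check only applies when extracted source text is available, for example a PDF text layer. For audio, video, and image citations the sketch falls back to the structural checks.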

backend/app/agents/prompts/evidence.py

Lines changed: 44 additions & 59 deletions
@@ -1,7 +1,35 @@
 # ABOUTME: System prompt for the Evidence domain agent guiding authenticity, custody, and forensic analysis.
 # ABOUTME: Instructs the model to produce findings, entities, hypothesis evaluations, and quality assessments.
 
-EVIDENCE_SYSTEM_PROMPT = """\
+from app.agents.prompts._citation_rules import CITATION_AND_FINDINGS_TEXT_RULES
+
+_DOMAIN_CITATION_NOTES = """\
+For evidence files, pay special attention to:
+- Metadata timestamps in their exact original format (e.g., "2025:01:15 14:23:07")
+- Chain of custody details (custodian names, transfer dates, handling notes)
+- Authenticity indicators (device fingerprints, GPS coordinates, EXIF fields)
+- For video/audio evidence, use second-level timestamps (MM:SS or HH:MM:SS)
+  to mark exact moments where key testimony or events occur
+
+"""
+
+_DOMAIN_FINDINGS_TEXT_EXAMPLE = """\
+Example findings_text format:
+```
+## Authenticity Analysis
+
+Examination of the photograph's EXIF metadata (file_id: img789) reveals a
+creation date of 2025-01-15 [Source: img789, region:EXIF-header,
+"DateTimeOriginal=2025:01:15 14:23:07"] which precedes the claimed incident
+date by approximately three months. The GPS coordinates embedded in the metadata
+indicate Los Angeles rather than the claimed Chicago location.
+
+## Chain of Custody
+
+The evidence submission lacks standard chain of custody documentation...
+```"""
+
+_PREAMBLE = """\
 You are the **Evidence Analysis Agent** for Holmes, an investigative intelligence platform.
 
 Your role is to perform forensically rigorous evidence evaluation on files routed to you \
@@ -100,7 +128,10 @@
 - **Image regions**: "region:x,y,w,h" (pixel coordinates)
 - **Document sections**: "section:Metadata Header"
 
-Include an excerpt (up to 500 characters) when it helps clarify the citation.
+Every citation MUST include all three fields: file_id, locator, and excerpt. \
+The excerpt must contain the EXACT verbatim text from the source — it is used \
+for PDF text-layer highlighting. If the excerpt is missing or paraphrased, the \
+user cannot verify the source. Excerpts must be under 500 characters.
 
 ### 6. Hypothesis Evaluation
 
@@ -171,64 +202,9 @@
 
 ---
 
-## CITATION AND FINDINGS TEXT REQUIREMENTS
-
-### Exhaustive Citation Rules
-Every factual statement in your findings MUST have a citation. No exceptions.
-
-For EACH citation:
-- `file_id`: The exact file ID provided in the input.
-- `locator`: Use the format:
-  - PDF/documents: "page:N" (e.g., "page:3", "page:17")
-  - Video: "ts:MM:SS" (e.g., "ts:01:23", "ts:00:45:12")
-  - Audio: "ts:MM:SS" (e.g., "ts:05:30")
-  - Images: "region:description" (e.g., "region:top-left-corner")
-- `excerpt`: The EXACT text from the source, character-for-character.
-  Copy the source text EXACTLY as it appears, preserving:
-  - Original spelling (even if incorrect)
-  - Original punctuation and whitespace
-  - Original line breaks within the excerpt
-  - Original formatting (capitalization, abbreviations)
-  DO NOT paraphrase, summarize, or clean up the excerpt.
-  The excerpt will be used for exact-match highlighting in a PDF viewer.
-
-For evidence files, pay special attention to:
-- Metadata timestamps in their exact original format (e.g., "2025:01:15 14:23:07")
-- Chain of custody details (custodian names, transfer dates, handling notes)
-- Authenticity indicators (device fingerprints, GPS coordinates, EXIF fields)
-- For video/audio evidence, use second-level timestamps (MM:SS or HH:MM:SS)
-  to mark exact moments where key testimony or events occur
-
-If a finding spans multiple pages or time segments, create SEPARATE citations
-for each page/segment. Do not combine into ranges.
-
-### findings_text Field
-In addition to the structured `findings` array, produce a `findings_text` field
-containing a rich markdown narrative analysis. This text:
-- Organizes analysis by category (use ## headers for each category)
-- Contains detailed paragraphs explaining each finding in context
-- References specific evidence using inline notation: [Source: file_id, page:N, "exact excerpt"]
-- Connects findings to broader case implications
-- Must be comprehensive -- this is the primary text used for search indexing
-  and downstream synthesis
-- Minimum 500 words for cases with substantive findings
-- Every factual claim in the narrative must reference its source
-
-Example findings_text format:
-```
-## Authenticity Analysis
-
-Examination of the photograph's EXIF metadata (file_id: img789) reveals a
-creation date of 2025-01-15 [Source: img789, region:EXIF-header,
-"DateTimeOriginal=2025:01:15 14:23:07"] which precedes the claimed incident
-date by approximately three months. The GPS coordinates embedded in the metadata
-indicate Los Angeles rather than the claimed Chicago location.
-
-## Chain of Custody
-
-The evidence submission lacks standard chain of custody documentation...
-```
+"""
 
+_OUTPUT_FORMAT = """
 ---
 
 ## OUTPUT FORMAT
@@ -294,3 +270,12 @@
 
 Analyze the file(s) provided below and respond with the JSON output.
 """
+
+EVIDENCE_SYSTEM_PROMPT = (
+    _PREAMBLE
+    + CITATION_AND_FINDINGS_TEXT_RULES.format(
+        domain_specific_citation_notes=_DOMAIN_CITATION_NOTES,
+        findings_text_example=_DOMAIN_FINDINGS_TEXT_EXAMPLE,
+    )
+    + _OUTPUT_FORMAT
+)
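The composition above formats only the shared rules template and concatenates the preamble and output format unchanged. This matters because `str.format` treats every `{` and `}` in the template it is called on as placeholder syntax, while text passed in as arguments or joined afterwards is left alone. The following is a minimal, self-contained sketch of the same pattern using shortened stand-in strings. Every constant and helper name here is illustrative, not the repository's.

```python
from string import Formatter

# Stand-in for the shared rules template: two named placeholders, no other braces.
SHARED_RULES = """\
## CITATION RULES
Every factual statement MUST have a citation.

{domain_specific_citation_notes}\
If a finding spans multiple pages, create SEPARATE citations.

{findings_text_example}"""

DOMAIN_NOTES = """\
For evidence files, pay special attention to:
- Metadata timestamps in their exact original format

"""

# Braces inside *arguments* are safe: format() does not re-parse substituted values.
DOMAIN_EXAMPLE = 'Example: [Source: img789, region:EXIF-header, "{raw EXIF}"]'

PREAMBLE = "You are the Evidence Analysis Agent.\n\n"

# Contains literal JSON braces, so it is concatenated and never passed to format().
OUTPUT_FORMAT = '\n---\n\n## OUTPUT FORMAT\n{"findings": []}\n'


def placeholder_names(template: str) -> set[str]:
    """List the replacement fields a format template expects."""
    return {name for _, name, _, _ in Formatter().parse(template) if name is not None}


def build_prompt() -> str:
    """Compose preamble + formatted shared rules + output format."""
    return (
        PREAMBLE
        + SHARED_RULES.format(
            domain_specific_citation_notes=DOMAIN_NOTES,
            findings_text_example=DOMAIN_EXAMPLE,
        )
        + OUTPUT_FORMAT
    )
```

A check such as `placeholder_names(SHARED_RULES)` in a unit test would flag a stray literal brace added to the shared template later; a literal brace there would have to be written as `{{` or `}}`.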

backend/app/agents/prompts/financial.py

Lines changed: 43 additions & 58 deletions
@@ -1,7 +1,34 @@
 # ABOUTME: System prompt for the Financial domain agent guiding transaction analysis and entity extraction.
 # ABOUTME: Instructs the model to produce structured findings, entities, hypothesis evaluations, and citations.
 
-FINANCIAL_SYSTEM_PROMPT = """\
+from app.agents.prompts._citation_rules import CITATION_AND_FINDINGS_TEXT_RULES
+
+_DOMAIN_CITATION_NOTES = """\
+For financial documents, pay special attention to:
+- Exact dollar amounts (e.g., "$450,000.00" not "$450K")
+- Account numbers as they appear in the source
+- Transaction dates in their original format
+- Table cell values with cell-level precision (cite the specific row/column)
+
+"""
+
+_DOMAIN_FINDINGS_TEXT_EXAMPLE = """\
+Example findings_text format:
+```
+## Financial Transactions
+
+Analysis of the bank statements (file_id: abc123, page:2) reveals a series of
+wire transfers totaling $2.3M between January and March 2025. The first transfer
+of $450,000 [Source: abc123, page:2, "Wire Transfer - $450,000.00 - 01/15/2025 -
+Recipient: Offshore Holdings Ltd"] was directed to an entity not previously
+disclosed in the corporate filings.
+
+## Anomalies Detected
+
+A significant discrepancy exists between the reported revenue...
+```"""
+
+_PREAMBLE = """\
 You are the **Financial Analysis Agent** for Holmes, an investigative intelligence platform.
 
 Your role is to perform deep financial analysis on evidence files routed to you by the \
@@ -92,7 +119,10 @@
 - **Image regions**: "region:x,y,w,h" (pixel coordinates)
 - **Document sections**: "section:Executive Summary"
 
-Include an excerpt (up to 500 characters) when it helps clarify the citation.
+Every citation MUST include all three fields: file_id, locator, and excerpt. \
+The excerpt must contain the EXACT verbatim text from the source — it is used \
+for PDF text-layer highlighting. If the excerpt is missing or paraphrased, the \
+user cannot verify the source. Excerpts must be under 500 characters.
 
 ### 6. Hypothesis Evaluation
 
@@ -140,63 +170,9 @@
 
 ---
 
-## CITATION AND FINDINGS TEXT REQUIREMENTS
-
-### Exhaustive Citation Rules
-Every factual statement in your findings MUST have a citation. No exceptions.
-
-For EACH citation:
-- `file_id`: The exact file ID provided in the input.
-- `locator`: Use the format:
-  - PDF/documents: "page:N" (e.g., "page:3", "page:17")
-  - Video: "ts:MM:SS" (e.g., "ts:01:23", "ts:00:45:12")
-  - Audio: "ts:MM:SS" (e.g., "ts:05:30")
-  - Images: "region:description" (e.g., "region:top-left-corner")
-- `excerpt`: The EXACT text from the source, character-for-character.
-  Copy the source text EXACTLY as it appears, preserving:
-  - Original spelling (even if incorrect)
-  - Original punctuation and whitespace
-  - Original line breaks within the excerpt
-  - Original formatting (capitalization, abbreviations)
-  DO NOT paraphrase, summarize, or clean up the excerpt.
-  The excerpt will be used for exact-match highlighting in a PDF viewer.
-
-For financial documents, pay special attention to:
-- Exact dollar amounts (e.g., "$450,000.00" not "$450K")
-- Account numbers as they appear in the source
-- Transaction dates in their original format
-- Table cell values with cell-level precision (cite the specific row/column)
-
-If a finding spans multiple pages or time segments, create SEPARATE citations
-for each page/segment. Do not combine into ranges.
-
-### findings_text Field
-In addition to the structured `findings` array, produce a `findings_text` field
-containing a rich markdown narrative analysis. This text:
-- Organizes analysis by category (use ## headers for each category)
-- Contains detailed paragraphs explaining each finding in context
-- References specific evidence using inline notation: [Source: file_id, page:N, "exact excerpt"]
-- Connects findings to broader case implications
-- Must be comprehensive -- this is the primary text used for search indexing
-  and downstream synthesis
-- Minimum 500 words for cases with substantive findings
-- Every factual claim in the narrative must reference its source
-
-Example findings_text format:
-```
-## Financial Transactions
-
-Analysis of the bank statements (file_id: abc123, page:2) reveals a series of
-wire transfers totaling $2.3M between January and March 2025. The first transfer
-of $450,000 [Source: abc123, page:2, "Wire Transfer - $450,000.00 - 01/15/2025 -
-Recipient: Offshore Holdings Ltd"] was directed to an entity not previously
-disclosed in the corporate filings.
-
-## Anomalies Detected
-
-A significant discrepancy exists between the reported revenue...
-```
+"""
 
+_OUTPUT_FORMAT = """
 ---
 
 ## OUTPUT FORMAT
@@ -249,3 +225,12 @@
 
 Analyze the file(s) provided below and respond with the JSON output.
 """
+
+FINANCIAL_SYSTEM_PROMPT = (
+    _PREAMBLE
+    + CITATION_AND_FINDINGS_TEXT_RULES.format(
+        domain_specific_citation_notes=_DOMAIN_CITATION_NOTES,
+        findings_text_example=_DOMAIN_FINDINGS_TEXT_EXAMPLE,
+    )
+    + _OUTPUT_FORMAT
+)
