Frfr Design Document

Problem

LLMs can miss details or introduce inaccuracies when analyzing complex compliance documents. High-stakes use cases (security audits, compliance verification, technical design review) require exhaustive, validated fact extraction.

Solution

Current Implementation (Phase 1) ✅

Extract structured facts from documents using:

Maximum depth extraction - Extract 5-10 facts per paragraph with enhanced metadata
Real-time validation - Verify every fact against source text immediately
Multiple evidence quotes - Support facts synthesized from multiple locations (V5)
Quality post-processing - Filter and score facts for optimal quality/density

Future Vision (Phase 2) 🔮

Add consensus layer:

Run multiple LLM instances in parallel with different random seeds
Use non-LLM semantic comparison to identify consensus
Discard outliers and resolve contradictions with judge model
Result: high-confidence answers with exact citations

Architecture

Current Implementation (Phase 1) ✅

Documents (PDF/Markdown)
    ↓
PyPDF2/OCR → Chunk (sliding window + resume)
    ↓
Enhanced Extract (Claude Sonnet via CLI)
  - Maximum depth (5-10 facts/paragraph)
  - 8 metadata fields
  - Multiple evidence quotes (V5)
  - Section-aware prompting
  - Parallel workers (5-11)
    ↓
Real-Time Validation
  - Quote verification
  - Line number validation
  - Fuzzy matching (70%)
  - Fact recovery (40-79%)
    ↓
Post-Processing Pipeline
  - Retroactive QV tagging
  - Quality scoring
  - Aggressive filtering (35% QV target)
    ↓
Structured Facts (JSON) → Validated 100%

Future Architecture (Phase 2) 🔮

Validated Facts (from Phase 1)
    ↓
Embed Facts (sentence transformers)
    ↓
Cluster by Semantic Similarity
    ↓
Swarm Extraction (N instances, different seeds)
    ↓
Detect Consensus (plurality threshold)
    ↓
Detect Contradictions
    ↓
Judge Resolution (Claude Opus) ← only for contradictions
    ↓
Interactive Q&A + Report Generation

Core Components

Current Implementation (Phase 1) ✅

1. Document Processing

Input: PDFs via PyPDF2 (fast) or Tesseract OCR (fallback)
Encrypted PDFs: Handled automatically (pycryptodome)
Chunking: Configurable (default: 500 lines, 100 overlap)
Resume: Smart session tracking for interrupted extractions
Target: SOC2 reports (50-150 pages), pentest reports (20-100 pages)

2. Enhanced Fact Extraction (V5)

Model: Claude Sonnet via CLI (headless mode)
Extraction Mode: Maximum depth (5-10 facts per paragraph)
Schema: Enhanced with 8 metadata fields
- fact_type, control_family, specificity_score
- entities, quantitative_values, process_details
- section_context, related_control_ids
V5 Feature: Multiple evidence quotes per fact
Parallel Processing: 5-11 workers (ThreadPoolExecutor)
Output: Structured facts (JSON schema)

3. Enhanced Fact Schema (V5)

{
  "claim": "specific assertion",
  "source_doc": "filename.pdf",
  "source_location": "Lines 42-45",
  "evidence_quotes": [                    // V5: Multiple quotes
    {
      "quote": "exact text from source",
      "source_location": "Lines 42-45",
      "relevance": "why this supports claim"
    }
  ],
  "confidence": 0.95,
  // Enhanced metadata (8 fields)
  "fact_type": "technical_control",
  "control_family": "access_control",
  "specificity_score": 0.85,
  "entities": ["AWS", "TLS 1.2"],
  "quantitative_values": ["daily", "90 days"],
  "process_details": {"who": "IT team", "when": "quarterly", "how": "automated"},
  "section_context": "Control Testing",
  "related_control_ids": ["CC6.1"]
}

4. Real-Time Validation

Quote Verification: Check every quote exists in source text
Line Number Validation: Verify quotes in specified ranges
Fuzzy Matching: 70% threshold for OCR artifacts
Fact Recovery: LLM-assisted recovery for 40-79% matches
Philosophy: 100% validation rate before saving

5. Post-Processing Pipeline

Retroactive QV Tagging: Scan claims for missed quantitative values
Quality Scoring: Specificity + entities + process details
Aggressive Filtering: Achieve target QV coverage (35%)
Output: High-quality, high-density fact set

6. Session Management

Storage: Local session directories
Resume: Track processed chunks, resume from interruption
Progress: Real-time progress tracking
Consolidation: Auto-consolidate facts per session

Future Components (Phase 2) 🔮

Consensus Detection (Planned)

Embedding: Sentence transformers (local, fast, no API costs)
Clustering: Cosine similarity (threshold: 0.85)
Plurality: Configurable threshold (default: 0.8 or "all but one")
Outliers: Facts below threshold are discarded as errors
Philosophy: Precision over recall

Contradiction Handling (Planned)

Detection: Semantically similar facts with conflicting claims
Resolution: Judge model (Claude Opus) examines source docs
Output: Resolved fact + reasoning documented in report appendix

Judge Synthesis (Planned)

Model: Claude Opus or latest Sonnet (highest quality)
Input: Consensus facts + resolved contradictions
Output: Final answer with confidence score and citations

Interactive Q&A (Planned)

Query: Natural language questions over extracted facts
Search: Semantic search for relevant facts
Synthesis: Generate answers with citations

Data Flow Example

User Question: "Does the system implement 2FA?"

Swarm Extraction (5 instances):

Instance 1: Extracts 3 facts about 2FA
Instance 2: Extracts 3 facts (2 match Instance 1)
Instance 3: Extracts 2 facts (both match previous)
Instance 4: Extracts 4 facts (3 match, 1 is hallucination)
Instance 5: Extracts 3 facts (all match consensus)

Consensus:

Cluster 1: "System uses 2FA" (5/5 instances) → Consensus ✓
Cluster 2: "2FA via SMS and TOTP" (4/5 instances) → Consensus ✓
Cluster 3: "Hardware tokens required" (1/5 instances) → Outlier ✗

Judge Synthesis:

Combines consensus facts
Confidence: 95% (strong consensus)
Answer: "Yes, 2FA is implemented via SMS and TOTP"

Report:

Direct answer with confidence
Supporting facts with exact page/section citations
Appendix: 1 hallucination corrected (hardware tokens)

Key Design Decisions

Non-LLM Consensus

Using embeddings instead of LLM-based comparison:

Fast: Local computation, no API latency
Cheap: No API costs for comparison
Deterministic: Reproducible clustering
Judge LLM only invoked for contradictions

High Confidence Default

Low-occurrence facts (1-2 instances) are assumed errors
Saves human time by not flagging low-confidence findings
Trade-off: May miss rare but true facts

Exact Citations

All facts must link to exact text in source
No paraphrasing in evidence quotes
Enables verification and trust

Workflow Orchestration 🔮 (Planned)

Session state persists across workflow steps
Parallel swarm execution with retries
Resume sessions across CLI invocations

Configuration

CLI flags:

--swarm-size: Number of LLM instances (default: 5)
--consensus-threshold: Plurality threshold (default: 0.8)
--similarity-threshold: Semantic clustering (default: 0.85)
--swarm-model: Model for swarm (default: claude-sonnet-4)
--judge-model: Model for judge (default: claude-opus-4)

Interface

$ frfr process documents/report.pdf documents/spec.md

Loading documents...
Session started: sess_abc123

> does the system implement 2FA?

[Processing with 5 instances...]

Answer: Yes, 2FA is implemented via SMS and TOTP.
Confidence: 95%

Type 'report' to view full details.

> report

# Full markdown report displayed #

> what TOTP apps are supported?

[Querying extracted facts...]

Answer: Google Authenticator and Authy.
Confidence: 88%

> exit

Session ended.

Output Format

Markdown report:

Executive Summary: Direct answer
Confidence Score: 0-100%
Supporting Facts: All consensus facts with citations
Methodology: Swarm size, consensus stats, outliers
Appendix:
- Corrected hallucinations
- Resolved contradictions
- Low-confidence facts (informational)

Implementation

Language: Python 3.10+

Dependencies:

anthropic: Claude API (via CLI)
pypdf2, pytesseract: Document processing
click, rich: CLI
textual: TUI interface
🔮 temporalio: Workflow orchestration (planned)
🔮 sentence-transformers: Embeddings (planned)

Module Structure:

frfr/
├── documents/      # Parsing, chunking, storage
├── extraction/     # LLM fact extraction, swarm coordination
├── consensus/      # 🔮 Embeddings, clustering, comparison (planned)
├── judge/          # 🔮 Contradiction resolution, synthesis (planned)
├── workflows/      # 🔮 Workflow orchestration (planned)
├── reporting/      # 🔮 Markdown generation, formatting (planned)
└── cli.py          # Interactive CLI

Use Cases

Security Audits: "Does this pentest report identify critical vulnerabilities?"
Compliance: "Does this SOC2 report implement controls in reference spec?"
Design Review: "Does this architecture doc address scaling requirements?"
Governance: "What data retention policies are described?"

Limitations

Speed: Multiple LLM calls are slower than single instance
Cost: N instances = N × API cost (mitigated by smaller swarm model)
Rare Facts: Low-occurrence truths may be discarded as outliers
Context Windows: Very large documents require chunking

Future Enhancements

Multi-model swarm (different models for diversity)
Dynamic prompt generation (meta-LLM creates variations)
Session persistence across CLI invocations
Incremental fact extraction for multi-document sessions
Web UI wrapper
Advanced comparison modes (compliance checking, feature verification)

Philosophy

Precision over recall: Better to say "I don't know" than hallucinate.

Transparency: Show all work in appendix.

Exact citations: Never paraphrase evidence.

Observability: Full execution history and session tracking.

Implementation Status

Phase 1 (Extraction & Validation): ✅ COMPLETE - Production Ready

Document processing: ✅ Complete (PyPDF2 + OCR)
Enhanced extraction: ✅ Complete (V5 with 8 metadata fields)
Real-time validation: ✅ Complete (100% rate achieved)
Post-processing: ✅ Complete (QV tagging + filtering)
CLI: ✅ Complete (7 commands)
Session management: ✅ Complete (resume capability)

Phase 2 (Consensus & Q&A): 🔮 Planned

Swarm consensus: 🔮 Planned
Semantic clustering: 🔮 Planned
Contradiction resolution: 🔮 Planned
Enhanced interactive Q&A: 🔮 Planned
Workflow orchestration: 🔮 Planned

Production Metrics (V5):

1,011 validated facts from 155-page SOC2 report
35.0% quantitative value coverage (target achieved)
0.878 average specificity (high quality)
28 minutes extraction time (170 chunks, 11 workers)
100% validation rate

License: TBD (intended for open source release)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Frfr Design Document

Problem

Solution

Current Implementation (Phase 1) ✅

Future Vision (Phase 2) 🔮

Architecture

Current Implementation (Phase 1) ✅

Future Architecture (Phase 2) 🔮

Core Components

Current Implementation (Phase 1) ✅

1. Document Processing

2. Enhanced Fact Extraction (V5)

3. Enhanced Fact Schema (V5)

4. Real-Time Validation

5. Post-Processing Pipeline

6. Session Management

Future Components (Phase 2) 🔮

Consensus Detection (Planned)

Contradiction Handling (Planned)

Judge Synthesis (Planned)

Interactive Q&A (Planned)

Data Flow Example

Key Design Decisions

Non-LLM Consensus

High Confidence Default

Exact Citations

Workflow Orchestration 🔮 (Planned)

Configuration

Interface

Output Format

Implementation

Use Cases

Limitations

Future Enhancements

Philosophy

Implementation Status

FilesExpand file tree

DESIGN.md

Latest commit

History

DESIGN.md

File metadata and controls

Frfr Design Document

Problem

Solution

Current Implementation (Phase 1) ✅

Future Vision (Phase 2) 🔮

Architecture

Current Implementation (Phase 1) ✅

Future Architecture (Phase 2) 🔮

Core Components

Current Implementation (Phase 1) ✅

1. Document Processing

2. Enhanced Fact Extraction (V5)

3. Enhanced Fact Schema (V5)

4. Real-Time Validation

5. Post-Processing Pipeline

6. Session Management

Future Components (Phase 2) 🔮

Consensus Detection (Planned)

Contradiction Handling (Planned)

Judge Synthesis (Planned)

Interactive Q&A (Planned)

Data Flow Example

Key Design Decisions

Non-LLM Consensus

High Confidence Default

Exact Citations

Workflow Orchestration 🔮 (Planned)

Configuration

Interface

Output Format

Implementation

Use Cases

Limitations

Future Enhancements

Philosophy

Implementation Status