
@hizv hizv commented Oct 30, 2025

Summary

This PR implements a baseline hallucination defense for TokenSmith by integrating LettuceDetect, a lightweight ModernBERT-based hallucination detector. After generation, the detector checks the answer against the retrieved contexts, flags unsupported claims, and warns users about potential hallucinations.

Changes Made

  • New File: src/hallucination_detector.py - Wrapper for LettuceDetect API with token-level hallucination detection

  • Modified: src/main.py - Added post-generation hallucination checking in the answer pipeline

  • Modified: src/config.py - Added hallucination detection configuration fields

  • Modified: config/config.yaml - Added hallucination detection settings:

    hallucination_detection:
      enabled: true
      model_path: "KRLabsOrg/lettucedect-base-modernbert-en-v1"
      threshold: 0.1

Other Changes

  • Modified: src/embedder.py - Minor compatibility fixes
  • Modified: src/preprocessing/chunking.py - Resource optimization for constrained devices

Technical Details

Runtime Behavior

  1. Retrieval: Get top-k relevant chunks (unchanged)
  2. Generation: Local GGUF model produces answer from contexts (unchanged)
  3. Detection: LettuceDetect compares the answer against the retrieved contexts and returns hallucination spans
  4. Warnings: If the unsupported fraction of the answer exceeds the configured threshold, a warning about potential hallucinations is appended to the answer
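Step 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/main.py`: it assumes spans arrive as dictionaries with `start` and `end` character offsets into the answer, it measures the unsupported fraction in characters (the PR may measure it differently, for example in tokens), and `WARNING_TEXT`, `unsupported_fraction`, and `finalize_answer` are hypothetical names.

```python
from typing import Iterable, Mapping

# Hypothetical wording; the actual warning text in the PR is not shown.
WARNING_TEXT = "Warning: parts of this answer may not be supported by the retrieved sources."


def unsupported_fraction(answer: str, spans: Iterable[Mapping[str, int]]) -> float:
    """Return the share of answer characters covered by flagged spans.

    Overlapping spans are merged so that no character is counted twice.
    """
    if not answer:
        return 0.0
    # Clamp each span to the answer bounds, then process in order of start offset.
    intervals = sorted(
        (max(0, span["start"]), min(len(answer), span["end"])) for span in spans
    )
    covered = 0
    cursor = 0  # end of the region already counted
    for start, end in intervals:
        start = max(start, cursor)
        if end > start:
            covered += end - start
            cursor = end
    return covered / len(answer)


def finalize_answer(answer: str, spans: Iterable[Mapping[str, int]], threshold: float = 0.1) -> str:
    """Append the warning only when the unsupported fraction exceeds the threshold."""
    if unsupported_fraction(answer, spans) > threshold:
        return f"{answer}\n\n{WARNING_TEXT}"
    return answer
```

The comparison is strict, matching the "fraction > threshold" rule above, so an answer whose unsupported share equals the threshold exactly is returned unchanged.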

Detection Mechanism

  • Uses ModernBERT-based model trained on RAGTruth dataset
  • Performs token-level analysis to identify unsupported spans
  • Configurable threshold (default 0.1: a warning is triggered when more than 10% of the answer is unsupported)
  • Lightweight inference suitable for CPU deployment
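A thin wrapper around the detector might look like the sketch below. It is not the contents of `src/hallucination_detector.py`; `SpanPredictor`, `load_lettuce_detector`, and `HallucinationChecker` are hypothetical names. The import path, constructor arguments, and `predict(..., output_format="spans")` call follow the usage shown in the LettuceDetect README and should be verified against the installed version. The predictor is injected so the wrapper can be exercised with a stub, without downloading the model.

```python
from typing import Any, Dict, List, Protocol


class SpanPredictor(Protocol):
    """Anything exposing a LettuceDetect-style predict method."""

    def predict(
        self, context: List[str], question: str, answer: str, output_format: str
    ) -> List[Dict[str, Any]]: ...


def load_lettuce_detector(model_path: str) -> SpanPredictor:
    """Load the real detector lazily so importing this module stays cheap."""
    # Assumption: import path and arguments as documented in the LettuceDetect README.
    from lettucedetect.models.inference import HallucinationDetector

    return HallucinationDetector(method="transformer", model_path=model_path)


class HallucinationChecker:
    """Hypothetical wrapper that returns unsupported spans for an answer."""

    def __init__(self, predictor: SpanPredictor) -> None:
        self._predictor = predictor

    def unsupported_spans(
        self, contexts: List[str], question: str, answer: str
    ) -> List[Dict[str, Any]]:
        # Nothing to check for an empty answer, so skip inference entirely.
        if not answer.strip():
            return []
        return self._predictor.predict(
            context=contexts, question=question, answer=answer, output_format="spans"
        )
```

In real use the checker would be built as `HallucinationChecker(load_lettuce_detector(model_path))`, with `model_path` read from the configuration.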

Testing

  • Synthetic test: Correctly flags unsupported claims (e.g., wrong population numbers)
  • End-to-end: Chat pipeline appends warnings when detection enabled
  • Integration: Works with existing GGUF models and retrieval pipeline

Future Work

This PR establishes the foundation for:

  • RAGXplain explainability layer (human-readable explanations)
  • Proactive defenses (CRAG, Self-RAG for pre-generation filtering)

@hizv hizv force-pushed the u/himanish/feat/lettucedetect branch 2 times, most recently from 42a865d to da515f4 Compare October 30, 2025 20:52
@hizv hizv force-pushed the u/himanish/feat/lettucedetect branch from da515f4 to 5980532 Compare November 14, 2025 22:03