A lightweight fact-checking system that analyzes news posts and social media statements using Retrieval-Augmented Generation (RAG) with custom embeddings and robust exact-match logic.
- 🔍 Claim Extraction: Uses spaCy NLP to extract key claims and entities from input text, always including the full input as a claim
- ⚡ Exact Match Detection: Instantly returns 'True' if your input or any claim exactly matches a fact in the database (case-insensitive)
- 📊 Vector Database: Embeddings stored in FAISS for fast similarity search, with persistent disk caching
- 🤖 LLM Analysis: OpenAI GPT for intelligent claim verification
- 🌐 Web Interface: Beautiful Streamlit app for easy interaction
- 📈 Confidence Scoring: Similarity-based confidence metrics
# Create virtual environment
python -m venv env
# Windows:
env\Scripts\activate
# Mac/Linux:
source env/bin/activate
# Install dependencies
pip install -r requirements.txt
# Download spaCy model
python -m spacy download en_core_web_sm

Set your OpenAI API key as an environment variable (recommended):
# Windows
set OPENAI_API_KEY=your_api_key_here
# Mac/Linux
export OPENAI_API_KEY=your_api_key_here

Or add it to a .env file in the project root:
OPENAI_API_KEY=your_api_key_here
streamlit run app.py

python main.py

- Input Processing: The full input is always checked for an exact match in the fact database
- Claim Extraction: spaCy NLP extracts claims/entities, always including the full input
- Embedding: Convert text to vectors using Sentence Transformers
- Retrieval: Find similar facts in FAISS vector database (with persistent disk caching)
- Analysis: LLM compares claims against retrieved facts
- Verdict: Classify as True ✅, False ❌, or Unverifiable 🤷‍♂️
Input Text → [Exact Match Check] → Claim Extraction → Embedding → Vector Search → LLM Analysis → Verdict
                  (fast)              (spaCy)        (SentenceT)     (FAISS)       (OpenAI)      (JSON)
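The fast path above can be sketched as follows; `fact_check` is illustrative, not the actual main.py API, and the RAG stages are stubbed out:

```python
def fact_check(text: str, facts: list[str]) -> dict:
    """Sketch of the pipeline: only the exact-match fast path is
    implemented here; embedding/LLM stages are stubbed."""
    # Normalize both sides: case-insensitive, whitespace-trimmed
    normalized = {f.strip().lower() for f in facts}
    if text.strip().lower() in normalized:
        return {"verdict": "True", "confidence": 1.0,
                "evidence": [text.strip()],
                "reasoning": "The input exactly matches a verified fact in the database."}
    # Slow path (claim extraction -> embedding -> FAISS search -> LLM) goes here
    return {"verdict": "Unverifiable", "confidence": 0.0,
            "reasoning": "No exact match; RAG stages stubbed in this sketch."}
```

Because the fast path never touches the embedding model or the LLM, an exact match returns in microseconds.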
Prime Minister greets the people of Telangana on their Statehood Day
{
"claim": "Prime Minister greets the people of Telangana on their Statehood Day",
"verdict": "True",
"evidence": ["Prime Minister greets the people of Telangana on their Statehood Day"],
"reasoning": "The input exactly matches a verified fact in the database.",
"similar_facts": [{"fact": "Prime Minister greets the people of Telangana on their Statehood Day", "distance": 0.0, "similarity": 1.0}],
"confidence": 1.0
}

├── main.py              # Core fact-checking pipeline
├── app.py # Streamlit web interface
├── requirements.txt # Python dependencies
├── pib_headlines.csv # Fact database (PIB press releases)
├── README.md # This file
├── cache/ # Embedding and FAISS index cache
└── env/ # Virtual environment
- Uses spaCy's NLP pipeline
- Always includes the full input text as a claim
- Extracts noun chunks and named entities
- Filters for meaningful claims
- Checks for exact (case-insensitive, whitespace-trimmed) match of input or any claim against the fact database
- Returns 'True' instantly if found, bypassing embedding/LLM
- Sentence Transformers model: all-MiniLM-L6-v2
- Creates dense vector representations
- Optimized for semantic similarity
- Embeddings are cached on disk for fast startup
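The disk cache can be sketched like this; the cache path and keying scheme are assumptions, not necessarily what main.py uses:

```python
import hashlib
import os

import numpy as np

def embed_facts(facts, cache_dir="cache"):
    os.makedirs(cache_dir, exist_ok=True)
    # Key the cache by a hash of the facts, so edits invalidate stale entries
    key = hashlib.sha256("\n".join(facts).encode("utf-8")).hexdigest()[:16]
    path = os.path.join(cache_dir, f"embeddings_{key}.npy")
    if os.path.exists(path):
        return np.load(path)  # cache hit: skip loading the model entirely
    from sentence_transformers import SentenceTransformer  # lazy import
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(facts, convert_to_numpy=True)
    np.save(path, vectors)
    return vectors
```

The lazy import keeps startup fast on a cache hit, which is the point of persisting embeddings to disk.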
- FAISS IndexFlatL2 for L2 distance search
- Fast similarity retrieval
- Index is cached on disk for fast startup
- OpenAI GPT-4.1 nano for reasoning
- Structured JSON output
- Error handling and fallbacks
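The fallback behavior can be sketched as below; `parse_verdict` is a hypothetical helper showing why defensive parsing matters: LLMs sometimes wrap the requested JSON in prose or code fences, and a parse failure should degrade to "Unverifiable" rather than crash.

```python
import json

def parse_verdict(raw: str) -> dict:
    fallback = {"verdict": "Unverifiable", "confidence": 0.0,
                "reasoning": "Could not parse LLM response."}
    # Extract the outermost JSON object, ignoring any surrounding text/fences
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        return fallback
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return fallback
    if data.get("verdict") not in ("True", "False", "Unverifiable"):
        return fallback
    return data
```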
- spaCy model not found:
  python -m spacy download en_core_web_sm
- OpenAI API errors:
  - Check that your API key is valid
  - Ensure you have credits
  - Check your internet connection
- FAISS installation issues:
  pip install faiss-cpu  # CPU-only version
- Cache not updating after changing facts:
  - The cache is keyed by a hash of the fact database file. If you update pib_headlines.csv, the cache will refresh automatically on the next run.
  - To force a refresh, delete the files in the cache/ directory.
- Exact match not detected:
  - Ensure your input matches a fact in the database exactly (case-insensitive, ignoring leading/trailing whitespace).
  - The full input is always checked first, then each extracted claim.
- Memory issues:
  - Reduce the batch size when embedding
  - Use smaller models
  - Process fewer facts at once
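The batching idea can be sketched as follows; `encode_fn` stands in for the model's encode call (sentence-transformers' `encode()` also accepts a `batch_size` argument directly):

```python
import numpy as np

def embed_in_batches(texts, encode_fn, batch_size=64):
    # Encode a slice at a time so peak memory is bounded by batch_size
    chunks = []
    for start in range(0, len(texts), batch_size):
        chunks.append(np.asarray(encode_fn(texts[start:start + batch_size])))
    return np.vstack(chunks)
```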