🛡️ Truth-Lens: Real-Time Audio Deepfake Detector

A sophisticated AI-powered system for detecting synthetic audio in real-time

Features • Architecture • Installation • Usage • Demo

🎯 Problem Statement

With the rise of generative AI models like ElevenLabs and VALL-E, audio deepfakes have become indistinguishable to the human ear. These synthetic voices can:

Impersonate public figures
Conduct voice-based fraud
Spread misinformation
Bypass voice authentication systems

Truth-Lens is the digital immune system that detects these threats in real-time.

✨ Features

🧠 Advanced AI Detection

Ensemble Architecture: Multi-feature CNN with attention mechanism
Feature Engineering: MFCC + Mel-Spectrogram + Spectral analysis
Real-Time Processing: 3-second analysis windows
High Accuracy: 85%+ on ASVspoof benchmark

🔍 Explainable AI

Grad-CAM Heatmaps: Visual explanation of detection
Confidence Scores: Separate probabilities for real vs fake
Decision Transparency: Shows which audio regions triggered detection

⚡ Production Ready

FastAPI Backend: Async, scalable API
Modern Frontend: React-based UI with real-time visualization
Error Handling: Robust preprocessing and validation
Rate Limiting: Protection against abuse

🏗️ Architecture

System Overview

┌─────────────────┐      ┌──────────────┐      ┌─────────────────┐
│   Browser UI    │ ───> │  FastAPI     │ ───> │  CNN Model      │
│   (React)       │ <─── │  Backend     │ <─── │  (TensorFlow)   │
└─────────────────┘      └──────────────┘      └─────────────────┘
        │                       │                        │
        │                       │                        │
        v                       v                        v
   Audio Capture          Preprocessing            Feature Extract
   (Web Audio API)        (Librosa)               (MFCC + Mel-Spec)

Model Architecture

Input Audio (3 seconds @ 16kHz)
          │
          ├─── MFCC Features (40 coefficients × 3 [Δ, ΔΔ])
          │         │
          │         └─> Conv2D(32) -> Pool -> Conv2D(64) -> Pool
          │                                          │
          ├─── Mel-Spectrogram (128 bins)            │
          │         │                                │
          │         └─> Conv2D(32) -> Pool -> Conv2D(64) -> Pool
          │                                          │
          └────────────────────┬────────────────────┘
                              │
                      Feature Concatenation
                              │
                       Attention Layer
                              │
                      Dense(256) -> Dense(128)
                              │
                       Output: [Real, Fake]

📦 Installation

Prerequisites

Python 3.9+
pip
(Optional) CUDA-enabled GPU for faster training

Quick Start

# Clone repository
git clone https://github.com/yourusername/truth-lens.git
cd truth-lens

# Install dependencies
pip install -r requirements.txt

# Create necessary directories
mkdir -p data/{raw/{real,fake},processed,models} logs

# Configure (optional)
# Edit configs/config.yaml to customize settings

🎓 Training the Model

1. Prepare Dataset

Download audio files and organize as follows:

data/raw/
├── real/           # Authentic human speech
│   ├── sample1.wav
│   ├── sample2.wav
│   └── ...
└── fake/           # AI-generated speech
    ├── sample1.wav
    ├── sample2.wav
    └── ...

Recommended Datasets:

ASVspoof 2019 LA (Gold standard)
Fake-or-Real (FoR) (Kaggle, smaller)

2. Train Model

cd src
python train.py

Training Output:

Model: data/models/truth_lens_model.h5
Best checkpoint: data/models/best_model.h5
Training curves: data/models/training_curves.png
Confusion matrix: data/models/confusion_matrix.png

3. Evaluation

python evaluate.py

🚀 Running the Application

Backend

cd src/api
python app.py

Server runs on http://localhost:8000

API Endpoints:

GET / - Health check
POST /analyze - Analyze single audio file
POST /batch-analyze - Batch processing (up to 10 files)

Frontend

cd frontend
python -m http.server 3000

Open http://localhost:3000 in your browser

💻 Usage

Web Interface

Click "ACTIVATE SHIELD"
Allow microphone access
Speak or play audio
Real-time results appear every 3 seconds

API Usage

import requests

# Upload audio file
with open('test_audio.wav', 'rb') as f:
    files = {'file': f}
    response = requests.post('http://localhost:8000/analyze', files=files)
    
result = response.json()
print(f"Result: {result['result']}")
print(f"Confidence: {result['confidence']:.1f}%")

🔬 Technical Deep Dive

Why This Approach Works

1. Multi-Feature Analysis

Human speech and AI-generated speech differ in:

Feature	Real Speech	Fake Speech
Phase Continuity	Smooth transitions	Micro-breaks
Spectral Shape	Natural variations	Perfect but unnatural patterns
Silence Patterns	Natural pauses	Robotic gaps
Formant Structure	Complex harmonics	Simplified artifacts

2. MFCC Features

MFCCs capture the vocal tract shape - how sound is produced. AI models struggle to replicate the subtle imperfections of human vocal cords.

3. Attention Mechanism

Not all parts of audio are equally important. Attention helps the model focus on:

Transition regions between phonemes
Breath sounds
Background artifacts

📊 Performance

Metrics (ASVspoof 2019 LA Dataset)

Metric	Score
Accuracy	88.5%
Precision	89.2%
Recall	87.8%
F1-Score	88.5%
AUC-ROC	0.94

Inference Speed

Average: 150ms per 3-second clip
Hardware: CPU (Intel i7)
Real-time: ✅ Yes (under 200ms threshold)

🎨 Screenshots

Main Interface

Detection in Action

Explainability Heatmap

🛣️ Roadmap

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

⚖️ Legal & Ethics

Dataset Usage

This project uses the ASVspoof 2019 dataset for training. The dataset is used strictly for non-commercial research in compliance with its distribution license.

Trademarks

"ElevenLabs," "VALL-E," and other product names are trademarks of their respective owners. This project is not affiliated with these entities.

Privacy

Truth-Lens does not:

Store audio recordings
Transmit audio to external servers (when self-hosted)
Record conversation content

Truth-Lens only analyzes:

Audio signal integrity
Spectral patterns
Statistical features

Responsible Use

This tool should be used to:

✅ Verify authenticity of audio evidence
✅ Protect against voice-based fraud
✅ Educate about deepfake threats

This tool should NOT be used to:

❌ Violate privacy
❌ Harass individuals
❌ Enable illegal surveillance

📄 License

This project is licensed under the MIT License - see LICENSE file for details.

🙏 Acknowledgments

ASVspoof Challenge for the benchmark dataset
Librosa for audio processing
TensorFlow team
FastAPI framework

📧 Contact

Project Lead: Your Name
Email: your.email@example.com
GitHub: @yourusername
LinkedIn: Your Profile

🏆 Hackathon Information

Event: Quantumard National Hackathon 2026
Track: Artificial Intelligence & Machine Learning
Team: Truth-Lens Innovations

Problem Addressed: Audio deepfakes pose a growing threat to digital trust and security. Truth-Lens provides a real-time, explainable solution.

Innovation: First system to combine multi-feature ensemble learning with attention mechanisms and real-time explainability for audio deepfake detection.

⭐ If this project helped you, please give it a star! ⭐

Made with ❤️ for a safer digital future

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
docs		docs
frontend		frontend
src		src
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
DELIVERY_PACKAGE.md		DELIVERY_PACKAGE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
PROJECT_STRUCTURE.txt		PROJECT_STRUCTURE.txt
README.md		README.md
config.yaml		config.yaml
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt
run.sh		run.sh
setup.sh		setup.sh

Folders and files

Latest commit

History

Repository files navigation

🛡️ Truth-Lens: Real-Time Audio Deepfake Detector

🎯 Problem Statement

✨ Features

🧠 Advanced AI Detection

🔍 Explainable AI

⚡ Production Ready

🏗️ Architecture

System Overview

Model Architecture

📦 Installation

Prerequisites

Quick Start

🎓 Training the Model

1. Prepare Dataset

2. Train Model

3. Evaluation

🚀 Running the Application

Backend

Frontend

💻 Usage

Web Interface

API Usage

🔬 Technical Deep Dive

Why This Approach Works

1. Multi-Feature Analysis

2. MFCC Features

3. Attention Mechanism

📊 Performance

Metrics (ASVspoof 2019 LA Dataset)

Inference Speed

🎨 Screenshots

Main Interface

Detection in Action

Explainability Heatmap

🛣️ Roadmap

🤝 Contributing

⚖️ Legal & Ethics

Dataset Usage

Trademarks

Privacy

Responsible Use

📄 License

🙏 Acknowledgments

📧 Contact

🏆 Hackathon Information

⭐ If this project helped you, please give it a star! ⭐

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages