```bash
pip install -r requirements.txt
python download_samples.py
```

This creates test samples in the `samples/` directory:
- Human voices (clean recordings)
- TTS samples (AI-generated speech)
- Manipulated audio (pitch-shifted, time-stretched)
Option A: Web GUI (Recommended - Most User-Friendly)
```bash
python start_gui.py
```

Opens a beautiful web interface in your browser with:
- 🖱️ Drag-and-drop file upload
- 📊 Real-time visualizations
- 📥 Download reports
- 📁 Batch processing
Option B: Simple Command Line
```bash
# Analyze a single file
python analyze.py samples/tts/tts_smooth_prosody.wav
# Analyze all samples
python analyze.py --batch samples/
```

```bash
python analyze.py your_audio.wav
```

Output:
```text
╔══════════════════════════════════════════════════════════════╗
║ AUDIOANALYSISX1 ║
║ Voice Manipulation & AI Detection System ║
╚══════════════════════════════════════════════════════════════╝
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ANALYSIS RESULTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Voice Type: Female F0: 185.7 Hz (Median)
Physical Characteristics: Male F1: 235 Hz, F2: 473 Hz
Manipulation: DETECTED 99% (Very High)
AI Voice: DETECTED AI-Generated (Type Unknown)
⚠ EVIDENCE DETECTED:
[1] Pitch-Formant Incoherence Detected
[2] Phase Decoherence / Transient Smearing Detected
[3] Spectral Artifacts Detected
[4] AI Voice Detected (80% confidence)
```
```bash
python analyze.py --batch samples/ --output results/
```

Analyzes all audio files in a directory and generates a summary table.
```bash
python tui.py interactive
```

Provides a menu-driven interface with:
- Single file analysis
- Batch processing
- Sample generation
- Full visualizations
- ✅ Pitch-shifting (male ↔ female voice conversion)
- ✅ Time-stretching (speed manipulation)
- ✅ Combined attacks (pitch + time)
- ✅ TTS Systems (Tacotron, FastSpeech, VITS)
- ✅ Voice Cloning (Real-Time VC, SV2TTS)
- ✅ Neural Vocoders (WaveNet, WaveGlow, HiFi-GAN)
- ✅ Deepfakes (multi-stage synthesis)
Every analysis generates:
- JSON Report (`*_report.json`)
  - Complete technical details
  - Cryptographically signed
  - Machine-readable
- Markdown Report (`*_report.md`)
  - Human-readable summary
  - Executive summary
  - Evidence details
- Visualizations (4 PNG files)
  - Overview dashboard
  - Mel spectrogram
  - Phase analysis
  - Pitch-formant comparison
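
Because the JSON report is machine-readable, it can be post-processed in a script. The sketch below assumes the report is a flat JSON object that uses the field names from the example in this guide (`ALTERATION_DETECTED`, `AI_VOICE_DETECTED`, `AI_TYPE`, `CONFIDENCE`). The `summarize_report` helper, the sample payload, and the file path in the comment are hypothetical; real reports may contain additional or differently structured fields.

```python
import json


def summarize_report(report: dict) -> str:
    """Build a one-line summary from a parsed report (field names are assumed)."""
    altered = report.get("ALTERATION_DETECTED", False)
    ai_voice = report.get("AI_VOICE_DETECTED", False)
    parts = [
        f"manipulation={'yes' if altered else 'no'}",
        f"ai_voice={'yes' if ai_voice else 'no'}",
    ]
    if ai_voice:
        parts.append(f"ai_type={report.get('AI_TYPE', 'unknown')}")
    parts.append(f"confidence={report.get('CONFIDENCE', 'n/a')}")
    return ", ".join(parts)


# Hypothetical payload for illustration. In practice, load a *_report.json file:
#   with open("results/example_report.json") as fh:  # hypothetical path
#       report = json.load(fh)
sample = json.loads("""
{"ALTERATION_DETECTED": true, "AI_VOICE_DETECTED": true,
 "AI_TYPE": "Neural Vocoder", "CONFIDENCE": "95% (Very High)"}
""")
print(summarize_report(sample))
```

Using `dict.get` with defaults keeps the helper tolerant of reports that omit a field.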
```json
{
  "ALTERATION_DETECTED": true, // Any manipulation found?
  "AI_VOICE_DETECTED": true, // AI-generated voice?
  "AI_TYPE": "Neural Vocoder", // Type of AI synthesis
  "CONFIDENCE": "95% (Very High)", // Detection confidence
  "PRESENTED_AS": "Female", // Apparent gender (from pitch)
  "PROBABLE_SEX": "Male", // Actual gender (from formants)
  "EVIDENCE_VECTOR_1_PITCH": "...", // Pitch manipulation evidence
  "EVIDENCE_VECTOR_2_TIME": "...", // Time manipulation evidence
  "EVIDENCE_VECTOR_3_SPECTRAL": "...", // Spectral artifacts
  "EVIDENCE_VECTOR_4_AI": "..." // AI detection evidence
}
```

```bash
# Skip visualization output
python analyze.py audio.wav --no-viz
# Write results to a custom directory
python analyze.py audio.wav --output ./investigation_001/
# Batch-analyze only files matching a pattern
python analyze.py --batch ./evidence/ --pattern "*.mp3"
```

Extracts fundamental frequency to determine presented pitch.
Extracts vocal tract resonances (physical characteristics).
- Pitch-formant incoherence
- Mel spectrogram artifacts
- Phase decoherence/transient smearing
- Neural vocoder artifacts
- Prosody unnaturalness
- Breathing/pause patterns
- Micro-timing perfection
- Harmonic structure anomalies
- Statistical feature anomalies
Consolidates all findings with cryptographic verification.
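
As a rough illustration of the pitch-extraction step described above, the sketch below estimates fundamental frequency (F0) with a plain autocorrelation peak search. This is a generic textbook technique chosen for illustration, not necessarily the method the pipeline uses. The function names, the 75-400 Hz search range, and the 165 Hz labelling threshold are all assumptions.

```python
import math


def estimate_f0(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate F0 in Hz by finding the lag with the highest autocorrelation."""
    min_lag = int(sample_rate / fmax)  # shortest period considered
    max_lag = int(sample_rate / fmin)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def presented_pitch_label(f0_hz, threshold_hz=165.0):
    """Map F0 to a coarse label; the 165 Hz threshold is an illustrative assumption."""
    return "Female" if f0_hz >= threshold_hz else "Male"


# Synthetic 185 Hz tone standing in for a voiced speech frame.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 185.0 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(tone, sample_rate)
print(f"Estimated F0: {f0:.1f} Hz -> {presented_pitch_label(f0)}")
```

The label derived from pitch alone corresponds to the `PRESENTED_AS` idea in the report; comparing it against formant-based evidence is what exposes pitch-formant incoherence.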
- Use WAV files for best accuracy (lossless format)
- Clips of 3-10 seconds are the optimal length
- Clean audio works better than noisy recordings
- Review visualizations for detailed analysis
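
The format and length tips above can be checked before running an analysis. The following sketch uses Python's standard `wave` module to read a WAV file's duration and compare it against the 3-10 second range. The helper names are hypothetical, and the demo file is generated silence used only to make the example self-contained.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file; wave.open raises wave.Error for non-WAV input."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_recommended_length(path, min_s=3.0, max_s=10.0):
    """Return (within_range, duration) for the 3-10 second guideline."""
    duration = wav_duration_seconds(path)
    return min_s <= duration <= max_s, duration


# Demo input: 5 seconds of silent mono 16-bit audio at 8 kHz.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000 * 5)

ok, duration = check_recommended_length(demo_path)
print(f"{duration:.1f} s, within recommended range: {ok}")
```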
| Confidence | Meaning | Action |
|---|---|---|
| 95-99% | Very High | Strong evidence, high certainty |
| 85-94% | High | Multiple indicators detected |
| 60-84% | Medium | Some indicators, review carefully |
| <60% | Low | Inconclusive, may need more analysis |
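
When post-processing results in a script, the bands in the table above can be expressed as a small lookup. This is a minimal sketch: the `confidence_level` function is hypothetical, and treating values above 99% as "Very High" is an assumption, since the table does not list them.

```python
def confidence_level(percent: float) -> str:
    """Map a confidence percentage to the bands from the table above."""
    if percent >= 95:
        return "Very High"
    if percent >= 85:
        return "High"
    if percent >= 60:
        return "Medium"
    return "Low"


for value in (99, 90, 72, 40):
    print(f"{value}% -> {confidence_level(value)}")
```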
```text
voice/
├── analyze.py            # ← Simple CLI interface
├── tui.py                # ← Interactive TUI
├── download_samples.py   # ← Sample downloader
├── pipeline.py           # Core detection pipeline
├── samples/              # Test audio samples
│   ├── human/
│   ├── tts/
│   ├── voice_cloning/
│   ├── deepfake/
│   └── manipulated/
└── results/              # Analysis outputs
```
```bash
pip install -r requirements.txt
```

```bash
# Check file path
ls -la your_audio.wav
# Use absolute path
python analyze.py /full/path/to/audio.wav
```

```bash
# Convert to WAV format
ffmpeg -i input.mp3 output.wav
# Then analyze
python analyze.py output.wav
```

- Full Documentation: See README.md
- Technical Details: See TECHNICAL.md
- API Reference: See API.md
- Deployment Guide: See DEPLOYMENT.md
- ✅ Run `python download_samples.py` to get test samples
- ✅ Try `python analyze.py samples/tts/tts_smooth_prosody.wav`
- ✅ Analyze your own audio files
- ✅ Explore interactive mode: `python tui.py interactive`
- ✅ Read full documentation for advanced features
🔬 You're ready to detect voice manipulation and AI-generated voices!