Ultra-fast local audio fingerprinting search system inspired by Shazam and Google Sound Search.
Audio Fingerprint is a production-ready audio identification system that uses acoustic fingerprinting with spectral peak extraction and combinatorial hashing. Identify songs from short audio clips in milliseconds with high accuracy.
- Fast: Query 10-second clips in <100ms against 10,000 songs
- Accurate: 95%+ recognition rate with noise and distortion
- Scalable: Handle up to 1 million songs on a single server
- Simple: Easy deployment with SQLite or in-memory storage
- Production-Ready: Docker support, logging, metrics, error handling
- REST API: Clean HTTP API for easy integration
```bash
# Clone repository
git clone <repository-url>
cd Audio-Fingerprint

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install FFmpeg (required for audio processing)
# Ubuntu/Debian: sudo apt-get install ffmpeg libsndfile1
# macOS: brew install ffmpeg
# Windows: download from https://ffmpeg.org
```

```bash
# Create data directory
mkdir -p data/songs

# Copy your audio files to data/songs/

# Index the songs
python scripts/train_index.py --songs-dir ./data/songs --workers 4
```

```bash
# Start Flask development server
python -m fingerprint.api.app

# Server runs at http://localhost:5000
```

```bash
# Using curl
curl -X POST -F "audio=@query.mp3" http://localhost:5000/api/v1/search
```
```python
# Using Python
import requests

with open('query.mp3', 'rb') as f:
    response = requests.post(
        'http://localhost:5000/api/v1/search',
        files={'audio': f}
    )
print(response.json())
```

- Audio Processing: Load audio, resample to 11025 Hz, convert to mono
- Fingerprinting: Extract spectral peaks using STFT and local maxima detection
- Hashing: Generate combinatorial hashes from peak pairs
- Matching: Query database and score candidates using time-offset histograms
```
Audio -> STFT -> Spectral Peaks -> Combinatorial Hashes -> Database
Query -> Same Process -> Match Hashes -> Score by Time Alignment -> Results
```
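The four steps above can be sketched in a few dozen lines. This is an illustrative simplification assuming NumPy/SciPy, not the actual implementation in `fingerprint/core/`; the parameter defaults mirror the configuration values shown later (`N_FFT`, `HOP_LENGTH`, `PEAK_NEIGHBORHOOD_SIZE`, `FAN_VALUE`).

```python
# Simplified Shazam-style pipeline: STFT -> peaks -> hashes -> offset scoring.
# Illustrative sketch only; the real code lives in fingerprint/core/.
from collections import Counter

import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import stft

def extract_peaks(samples, sr=11025, n_fft=2048, hop=512,
                  neighborhood=20, min_amp=10):
    """Return (freq_bin, frame) pairs at local spectral maxima."""
    _, _, spec = stft(samples, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(spec)
    # A bin is a peak if it equals the maximum of its local neighborhood
    local_max = maximum_filter(mag, size=neighborhood) == mag
    peaks = np.argwhere(local_max & (mag > min_amp))
    return peaks[np.argsort(peaks[:, 1])]  # sort by time frame

def generate_hashes(peaks, fan_value=5):
    """Pair each peak with the next `fan_value` peaks (combinatorial hashing)."""
    hashes = []
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1 : i + 1 + fan_value]:
            dt = int(t2) - int(t1)
            if dt > 0:
                hashes.append(((int(f1), int(f2), dt), int(t1)))
    return hashes

def score_match(query_hashes, index):
    """Vote on (song, db_time - query_time); the tallest histogram bin wins."""
    votes = Counter()
    for h, qt in query_hashes:
        for song_id, dbt in index.get(h, ()):
            votes[(song_id, dbt - qt)] += 1
    if not votes:
        return None, 0
    (song_id, _), count = votes.most_common(1)[0]
    return song_id, count
```

Indexing a song means storing every `(hash, (song_id, anchor_time))` pair; at query time the same extraction runs on the clip, and a true match shows up as many hashes agreeing on a single time offset.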
```
POST /api/v1/search
Content-Type: multipart/form-data
```

Response:

```json
{
  "matches": [
    {
      "song_id": "abc123",
      "title": "Song Title",
      "artist": "Artist Name",
      "confidence": 0.95
    }
  ],
  "processing_time_ms": 85.3,
  "found": true
}
```

Additional endpoints:

- `GET /api/v1/songs`
- `GET /api/v1/songs/<song_id>`
- `GET /api/v1/stats`
- `GET /api/v1/health`

See API Documentation for complete details.
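On the client side, a small helper can pick the strongest result out of this response. A minimal sketch assuming the JSON shape above; the `min_confidence` threshold is an arbitrary client-side choice, not part of the API:

```python
def best_match(response_json, min_confidence=0.5):
    """Return the highest-confidence match, or None if nothing clears the bar."""
    if not response_json.get("found"):
        return None
    matches = response_json.get("matches", [])
    best = max(matches, key=lambda m: m["confidence"], default=None)
    return best if best and best["confidence"] >= min_confidence else None
```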
| Metric | Value |
|---|---|
| Indexing Speed | 100-200 songs/minute (single core) |
| Query Time | <100ms for 10-second clip (10k songs) |
| Accuracy | 95%+ for 5+ second clips |
| Memory Usage | ~200 MB for 10,000 songs |
| Database Size | ~60 MB per 1,000 songs (SQLite) |
| Songs | Query Time | Memory (RAM) | Storage |
|---|---|---|---|
| 1,000 | 30-50 ms | 20 MB | 60 MB |
| 10,000 | 80-120 ms | 200 MB | 600 MB |
| 100,000 | 150-250 ms | 2 GB | 6 GB |
| 1,000,000 | 400-600 ms | 20 GB | 60 GB |
See Performance Documentation for detailed benchmarks.
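Since the table above scales roughly linearly (about 20 MB RAM and 60 MB SQLite storage per 1,000 songs), a back-of-envelope estimate for other catalog sizes is straightforward; this helper simply restates those figures:

```python
def estimate_resources(num_songs):
    """Linear extrapolation of the benchmark table:
    ~20 MB RAM and ~60 MB SQLite storage per 1,000 songs."""
    ram_mb = 20 * num_songs / 1000
    storage_mb = 60 * num_songs / 1000
    return ram_mb, storage_mb
```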
Create a `.env` file:

```bash
# Flask Environment
FLASK_ENV=development

# Storage Configuration
STORAGE_TYPE=memory  # memory, sqlite, postgres
SQLITE_DATABASE_PATH=./data/database/fingerprint.db

# Audio Processing
SAMPLE_RATE=11025
N_FFT=2048
HOP_LENGTH=512
PEAK_NEIGHBORHOOD_SIZE=20
MIN_AMPLITUDE=10
FAN_VALUE=5

# API Configuration
MAX_CONTENT_LENGTH=16777216  # 16 MB
CORS_ORIGINS=*

# Logging
LOG_LEVEL=INFO
LOG_FILE=./data/logs/fingerprint.log
```

Memory Store (default):
- Fastest performance
- Data lost on restart
- Best for <100k songs
SQLite:
- Persistent storage
- Single-file database
- Best for <500k songs
PostgreSQL:
- Distributed storage
- High concurrency
- Best for >100k songs
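The `.env` file shown above can be loaded with a library such as `python-dotenv`; where that dependency is unavailable, a minimal stdlib parser like this sketch covers the simple `KEY=VALUE` format used here (it does not handle quoting or multi-line values):

```python
import os

def load_env(path=".env"):
    """Minimal .env loader for KEY=VALUE lines; '#' starts a comment.
    Sketch only; python-dotenv is the more robust choice."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments
            if "=" in line:
                key, _, val = line.partition("=")
                values[key.strip()] = val.strip()
    os.environ.update(values)  # make settings visible to the app
    return values
```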
```bash
# Build image
docker build -t audio-fingerprint:latest -f docker/Dockerfile .

# Run container
docker run -d \
  --name fingerprint \
  -p 5000:5000 \
  -v $(pwd)/data:/app/data \
  audio-fingerprint:latest

# Or use docker-compose
docker-compose up -d
```

See Deployment Guide for production setup.
```
Audio-Fingerprint/
├── config/              # Configuration files
├── fingerprint/         # Main application package
│   ├── core/            # Fingerprinting algorithms
│   ├── storage/         # Database backends
│   ├── api/             # REST API
│   ├── training/        # Indexing module
│   └── utils/           # Utilities
├── scripts/             # CLI tools
│   ├── train_index.py   # Index songs
│   ├── benchmark.py     # Performance tests
│   ├── export_db.py     # Backup database
│   └── import_db.py     # Restore database
├── tests/               # Test suite
├── docs/                # Documentation
└── data/                # Data directory (created on first run)
    ├── songs/           # Audio files
    ├── database/        # Database files
    └── logs/            # Application logs
```
Index songs:

```bash
python scripts/train_index.py \
  --songs-dir ./data/songs \
  --storage-type sqlite \
  --workers 4
```

Run benchmarks:

```bash
python scripts/benchmark.py \
  --audio-dir ./data/songs \
  --num-samples 10
```

Export the database:

```bash
python scripts/export_db.py \
  --storage-type sqlite \
  --output backup.json \
  --format json
```

Import a backup:

```bash
python scripts/import_db.py \
  --storage-type sqlite \
  --input backup.json \
  --format json
```

```bash
# Install test dependencies
pip install pytest pytest-cov

# Run all tests
pytest tests/

# Run with coverage
pytest --cov=fingerprint tests/

# Run specific test file
pytest tests/test_fingerprinter.py
```

- `fingerprint/core/audio_processor.py` - Audio loading and preprocessing
- `fingerprint/core/fingerprinter.py` - Spectral peak detection
- `fingerprint/core/hash_generator.py` - Combinatorial hashing
- `fingerprint/core/matcher.py` - Fingerprint matching and scoring
- `fingerprint/storage/` - Storage backend implementations
- `fingerprint/api/` - Flask REST API
- Architecture - System design and algorithms
- API Reference - Complete API documentation
- Deployment Guide - Production deployment
- Performance - Benchmarks and optimization
- Music Identification: Identify songs from audio clips
- Copyright Detection: Find copyrighted content in videos
- Duplicate Detection: Find duplicate audio files
- Audio Matching: Match audio across different sources
- Content Monitoring: Monitor audio streams for specific content
- Requires 5+ seconds of audio for reliable matching
- Performance degrades with very noisy audio
- Not suitable for speech recognition
- Database size grows linearly with song count
- Shazam: Cloud-based, 70M+ songs, proprietary
- This System: Self-hosted, unlimited songs, open source
- Chromaprint: Chroma-based fingerprinting
- This System: Spectral peak-based (similar to Shazam)
- AcoustID: Music identification service
- This System: Complete self-hosted solution
Contributions are welcome! Please feel free to submit pull requests or open issues.
See LICENSE file for details.
- Inspired by the Shazam algorithm (Avery Wang, 2003)
- Based on spectral peak fingerprinting techniques
- Uses librosa for audio processing
For questions or issues:
- Open an issue on GitHub
- Check the documentation
- Review the implementation guide
- WebSocket support for real-time streaming
- Web UI for drag-and-drop search
- Redis caching layer
- Distributed database sharding
- GPU acceleration for STFT
- Machine learning for parameter optimization
Built with Python, Flask, NumPy, SciPy, and librosa

