A compact, reproducible speaker-verification experiment built around a mini slice of LibriSpeech, a pretrained SpeechBrain ECAPA model, and Integrated Gradients explanations. It evaluates how simple pitch-shift anonymization affects match scores, reports EER and threshold, and visualizes time–frequency relevance.
Example results from a sample run
- Original EER: 0.00% @ threshold ≈ 0.463
- Anonymized (+4 semitones) EER: 46.88%
- Failure rates among same‑speaker pairs (anonymized audio still matches the original speaker): +1 st ≈ 68.8%, +2 st ≈ 12.5%, +3–6 st ≈ 0% on this toy set
Numbers will vary with random seeds, subset, and environment.
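For readers unfamiliar with how an EER and its operating threshold are derived from match scores, here is a minimal sketch. It assumes similarity scores where higher means "more likely the same speaker"; the function name `compute_eer` and the toy scores are hypothetical and are not taken from the notebook, which may compute the EER differently (for example via an ROC curve).

```python
def compute_eer(genuine, impostor):
    """Return (eer, threshold) for similarity scores (higher = more similar).

    genuine:  scores of same-speaker pairs
    impostor: scores of different-speaker pairs
    """
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    best = None
    # Try every observed score as a candidate threshold.
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection: a genuine pair scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: an impostor pair scored at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, invented purely for illustration.
genuine = [0.82, 0.75, 0.64, 0.58]
impostor = [0.31, 0.22, 0.47, 0.61]
eer, thr = compute_eer(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {thr:.3f}")
```

On a set this small the EER moves in coarse steps (one pair out of four is 25%), which is one reason the figures above should be read as illustrative.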
- Data: small subset of LibriSpeech test-clean (auto‑downloaded)
- Model: speechbrain/spkrec-ecapa-voxceleb
- Anonymization: waveform pitch shift (+N semitones)
- Metrics: cosine similarity, EER & operating threshold
- Explainability: Integrated Gradients attributions on the waveform, projected to log‑mel heatmaps
- Failure mining: find/visualize pairs where anonymization still matches
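The scoring and failure-mining steps listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the notebook's code: embeddings are plain Python lists standing in for ECAPA embeddings, the names `cosine_similarity`, `semitone_ratio`, and `mine_failures` are hypothetical, and the threshold 0.463 is simply the value reported in the sample run.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def semitone_ratio(n_steps):
    """Frequency ratio for a shift of n_steps semitones (equal temperament)."""
    return 2.0 ** (n_steps / 12.0)


def mine_failures(pairs, threshold):
    """Find same-speaker pairs that still match after anonymization.

    pairs: list of (pair_id, enrollment_embedding, anonymized_embedding)
    Returns (failure_rate, failing), where failing holds (pair_id, score)
    sorted from the strongest remaining match to the weakest.
    """
    scored = [(pid, cosine_similarity(enr, anon)) for pid, enr, anon in pairs]
    failing = [(pid, s) for pid, s in scored if s >= threshold]
    rate = len(failing) / len(pairs) if pairs else 0.0
    return rate, sorted(failing, key=lambda item: -item[1])


# Toy 3-dimensional "embeddings", invented for illustration only.
pairs = [
    ("spk1", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),  # barely changed
    ("spk2", [0.0, 1.0, 0.0], [1.0, 0.2, 0.0]),  # strongly changed
    ("spk3", [0.0, 0.0, 1.0], [0.0, 0.5, 0.5]),  # partly changed
    ("spk4", [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]),  # orthogonal
]
rate, failing = mine_failures(pairs, threshold=0.463)
print(f"failure rate: {rate:.1%}", [pid for pid, _ in failing])
print(f"+4 semitones scales frequencies by {semitone_ratio(4):.4f}")
```

A "failure" here means the anonymized utterance still scores at or above the operating threshold against its own speaker's enrollment, which is the case the notebook visualizes.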
- Click the badge at the top or open the notebook directly: https://colab.research.google.com/github/{GITHUB_USER}/{REPO_NAME}/blob/main/notebooks/speaker_anonymization_ecapa_ig.ipynb
- Runtime ▸ Change runtime type ▸ (GPU optional, but faster).
- Runtime ▸ Run all. The notebook will download a tiny dataset and produce figures + metrics.
Tip: Clear the notebook outputs before committing (Colab: Edit ▸ Clear all outputs) to keep diffs small and the repo lean.
You can run locally with Python ≥3.10. Install PyTorch and Torchaudio builds that match your CPU or CUDA setup.
# 1) Install torch/torchaudio appropriate for your system:
# See https://pytorch.org/get-started/locally/ for the right command.
pip install torch torchaudio
# 2) Install the project requirements:
pip install -r requirements.txt
Then open the notebook in Jupyter:
jupyter lab notebooks/speaker_anonymization_ecapa_ig.ipynb
├─ notebooks/
│ └─ speaker_anonymization_ecapa_ig.ipynb # main Colab-friendly notebook
├─ requirements.txt
├─ LICENSE
├─ .gitignore
└─ README.md
- Results are illustrative on a small subset; not a benchmark.
- Pitch shift is a simple anonymization; stronger methods exist (voice conversion, formant/prosody changes).
- IG explains the speaker-similarity score (cosine similarity against a fixed enrollment embedding), not ASR text.
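To make the last note concrete, here is a toy sketch of Integrated Gradients applied to a cosine score against a fixed enrollment vector. It is a simplification under explicit assumptions: the input is a short list of numbers rather than a waveform, the gradient is written analytically instead of flowing through ECAPA, the baseline is an arbitrary non-zero vector (cosine similarity is undefined at the zero vector), and all names (`cosine`, `cosine_grad`, `integrated_gradients`) are hypothetical. The notebook's actual implementation may differ.

```python
import math


def cosine(x, e):
    """Cosine similarity between input x and fixed enrollment e."""
    dot = sum(a * b for a, b in zip(x, e))
    nx = math.sqrt(sum(a * a for a in x))
    ne = math.sqrt(sum(b * b for b in e))
    return dot / (nx * ne)


def cosine_grad(x, e):
    """Analytic gradient of cosine(x, e) with respect to x."""
    nx = math.sqrt(sum(a * a for a in x))
    ne = math.sqrt(sum(b * b for b in e))
    c = cosine(x, e)
    return [b / (nx * ne) - c * a / (nx * nx) for a, b in zip(x, e)]


def integrated_gradients(x, baseline, e, steps=200):
    """Midpoint Riemann-sum approximation of IG along baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line between baseline and input.
        point = [b + alpha * (a - b) for a, b in zip(x, baseline)]
        grad = cosine_grad(point, e)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the averaged gradient by the input-baseline difference.
    return [(a - b) * t / steps for a, b, t in zip(x, baseline, total)]


# Toy vectors, invented for illustration only.
enrollment = [1.0, 0.5, 0.0]
x = [0.9, 0.6, 0.1]
baseline = [0.1, 0.1, 0.9]  # assumed non-zero reference input

attributions = integrated_gradients(x, baseline, enrollment)
score_change = cosine(x, enrollment) - cosine(baseline, enrollment)
print("attributions:", [round(a, 4) for a in attributions])
print("sum of attributions:", round(sum(attributions), 4))
print("score change:", round(score_change, 4))
```

The attributions sum (approximately) to the change in similarity score between baseline and input, so each value can be read as that input element's share of the match score. This is why the heatmaps speak to speaker similarity rather than to transcribed content.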
This project is released under the MIT License (see LICENSE).
Questions or suggestions? Open an issue or reach out.