This repository evaluates how a +4 semitone pitch‑shift (as a simple anonymization) affects speaker verification and explains the model’s decisions using Integrated Gradients (Captum). We use a pretrained ECAPA‑TDNN (SpeechBrain, VoxCeleb‑trained) on LibriSpeech and produce both quantitative metrics (cosine scores, EER) and qualitative time–frequency attributions (IG heatmaps).
Model: `speechbrain/spkrec-ecapa-voxceleb` · Data: LibriSpeech (16 kHz) · XAI: Captum Integrated Gradients (+ optional NoiseTunnel)
- Verification baseline — ECAPA embeddings + cosine scoring on LibriSpeech.
- Anonymization — apply +4 semitone pitch shift to the test utterance only.
- Two independent runs — Run A and Run B, each on a different (disjoint) set of speakers, for robust comparison.
- Explainability — log‑Mel + Integrated Gradients heatmaps (signed and |IG|) showing which time–frequency regions support or oppose “same speaker.”
- Reproducible outputs — each run writes its own folder under `results/<run_name>/` with config, log, pairs, metrics, and figures.
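The cosine scoring behind the verification baseline can be sketched as follows (plain NumPy; the toy vectors stand in for ECAPA embeddings, which in the notebook come from SpeechBrain):

```python
import numpy as np

def cosine_score(emb_enroll: np.ndarray, emb_test: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings, in [-1, 1]."""
    num = float(np.dot(emb_enroll, emb_test))
    den = float(np.linalg.norm(emb_enroll) * np.linalg.norm(emb_test))
    return num / den

# Toy embeddings: collinear vectors score ~1.0, orthogonal vectors score 0.0
a = np.array([1.0, 2.0, 3.0])
score_same = cosine_score(a, 2 * a)
score_orth = cosine_score(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Trials whose score exceeds a chosen threshold are accepted as "same speaker"; the EER threshold reported per run is one such operating point.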
repo/
├── scripts/
│ └── LibriSpeech_ECAPA_TDNN.ipynb # main notebook
├── data/ # small text artifacts
│ ├── ig_run1_pairs.txt
│ └── ig_run2_pairs.txt
├── results/ # per‑run outputs
│ ├── ig_run1/
│ │ ├── run_config.json
│ │ ├── run.log
│ │ ├── pairs.txt
│ │ └── ig_explanations/
│ │ ├── same_orig_1.png
│ │ ├── same_orig_1.abs.png
│ │ ├── same_orig_1.attr.npy
│ │ └── same_orig_1.fbank.npy
│ └── ig_run2/ ... (same layout)
└── README.md
Note: Large, derived artifacts (e.g., `results/**`) are typically excluded from Git; keep `run_config.json`, `run.log`, and small text files (pairs) for reproducibility.
- Python ≥ 3.10 (tested with 3.11)
- PyTorch + torchaudio (CUDA optional)
- SpeechBrain, Captum, scikit‑learn, tqdm, matplotlib, soundfile
- LibriSpeech accessible locally (e.g., `/speechdat/LibriSpeech/LibriSpeech`)
Create the environment:
conda create -n xai_lrp python=3.11 -y
conda activate xai_lrp
# Pick the correct PyTorch/torchaudio wheels for your system (CPU or CUDA)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# Core libs
pip install speechbrain==0.5.15 captum==0.7.0 librosa==0.10.2.post1 scikit-learn tqdm matplotlib soundfile

- Index LibriSpeech (e.g., `test-clean`) and gather speakers with ≥3 utterances.
- Build trials for each run:
- same‑speaker pairs (enroll vs two other utterances of the same speaker)
- different‑speaker pairs (enroll vs a different speaker)
- for both original and anonymized (+4 st) versions (anonymization applied to test only)
- Score with ECAPA embeddings + cosine; compute EER split by original vs anonymized.
- Explain selected trials with Integrated Gradients on the model's log‑Mel feature pipeline.
- Save per‑run artifacts under `results/<run_name>/`.
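The EER step above can be sketched with scikit-learn's ROC utilities (this midpoint convention at the FAR ≈ FRR crossing is one common choice, not necessarily the notebook's exact implementation):

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """Equal Error Rate: operating point where false-accept rate equals
    false-reject rate. labels: 1 = same speaker, 0 = different speaker;
    scores: cosine similarities."""
    fpr, tpr, thresholds = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))     # point closest to FAR == FRR
    eer = (fpr[idx] + fnr[idx]) / 2
    return eer, thresholds[idx]

# Toy trials: perfectly separated scores give EER = 0
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.0]
eer, thr = compute_eer(labels, scores)
print(f"EER = {eer:.2%} @ thr = {thr:.3f}")
```

Running this separately for the original and anonymized trial subsets yields the two EER lines shown in the example output.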
Open `scripts/LibriSpeech_ECAPA_TDNN.ipynb` and set the parameters near the top (adjust paths as needed):
DATA_ROOT = "/speechdat/LibriSpeech/LibriSpeech"
SUBSET = "test-clean"
DEVICE = "auto" # or "cuda" / "cpu"
MAX_SPEAKERS = 40
IG_STEPS = 64
EXPLAIN_PER_SPLIT = 2

Then run the multi‑run cell, preconfigured to produce two runs with different speakers (disjoint cohorts), e.g.:
# Example: two runs with disjoint speakers, both using +4 semitone anonymization
spk2utt = index_librispeech(DATA_ROOT, SUBSET, min_utts_per_spk=3)
classifier = load_ecapa(device)
# Split into two non‑overlapping sets
all_spks = sorted(spk2utt.keys())
mid = len(all_spks) // 2
spk2utt_1 = {s: spk2utt[s] for s in all_spks[:mid]}
spk2utt_2 = {s: spk2utt[s] for s in all_spks[mid:]}
run_experiment(name="ig_run1", spk2utt=spk2utt_1, classifier=classifier,
data_root=DATA_ROOT, subset=SUBSET, device=device,
max_speakers=min(MAX_SPEAKERS, len(spk2utt_1)), seed=101,
pitch_steps=4, ig_steps=IG_STEPS, internal_bs=16,
explain_per_split=EXPLAIN_PER_SPLIT, smooth=True,
nt_type="smoothgrad_sq", stdevs=0.02, nt_samples=8)
run_experiment(name="ig_run2", spk2utt=spk2utt_2, classifier=classifier,
data_root=DATA_ROOT, subset=SUBSET, device=device,
max_speakers=min(MAX_SPEAKERS, len(spk2utt_2)), seed=202,
pitch_steps=4, ig_steps=IG_STEPS, internal_bs=16,
explain_per_split=EXPLAIN_PER_SPLIT, smooth=True,
nt_type="smoothgrad_sq", stdevs=0.02, nt_samples=8)

Each run writes to `results/ig_run1/` and `results/ig_run2/` respectively.
Console and log output is limited to the key summaries, for example:
[ig_run1] speakers=40 trials=120
[ig_run1] Original: N=120 pos=80 neg=40
[ig_run1] Anonymized: N=120 pos=80 neg=40
[ig_run1] Original EER: 3.12% @ thr=0.301
[ig_run1] Anonymized (+4 st) EER: 30.63% @ thr=0.107
Tab‑separated trials (paths are relative to DATA_ROOT):
enroll_path test_path label anonymized
test-clean/1089/.../0000.flac test-clean/1089/.../0001.flac 1 0 # same speaker, original
test-clean/1089/.../0000.flac test-clean/1089/.../0001.flac 1 1 # same speaker, anonymized (test)
test-clean/1089/.../0000.flac test-clean/1188/.../0001.flac 0 0 # different speakers, original
test-clean/1089/.../0000.flac test-clean/1188/.../0001.flac 0 1 # different speakers, anonymized (test)
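Reading such a pairs file back can be sketched as follows (the field order matches the header above; `parse_pairs` is a hypothetical helper, not part of the notebook):

```python
def parse_pairs(lines):
    """Parse tab-separated trial lines into dicts, skipping the header,
    blank lines, and trailing '#' comments."""
    trials = []
    for line in lines:
        line = line.split("#")[0].strip()          # drop trailing comments
        if not line or line.startswith("enroll_path"):
            continue                               # skip header and blanks
        enroll, test, label, anon = line.split("\t")
        trials.append({"enroll": enroll, "test": test,
                       "label": int(label), "anonymized": int(anon)})
    return trials

rows = [
    "enroll_path\ttest_path\tlabel\tanonymized",
    "test-clean/1089/a/0000.flac\ttest-clean/1089/a/0001.flac\t1\t0",
    "test-clean/1089/a/0000.flac\ttest-clean/1188/b/0001.flac\t0\t1",
]
trials = parse_pairs(rows)
```

Splitting the parsed trials by the `anonymized` flag reproduces the original-vs-anonymized EER comparison.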
- Top panel — log‑Mel spectrogram of the test utterance (what ECAPA sees).
- Bottom panel — Integrated Gradients attribution for the cosine score vs the enroll embedding:
warmer (positive) supports “same speaker,” cooler (negative) supports “different.”
Companion `*.abs.png` files show |IG| (importance magnitude only).
Plots can optionally be standardized to a fixed width for comparison; underlying arrays remain full‑length.
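The Integrated Gradients idea behind these heatmaps can be illustrated on a toy function with a known gradient (pure NumPy; the notebook itself uses Captum's implementation on the log‑Mel pipeline):

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=64):
    """Riemann-sum approximation of Integrated Gradients:
    IG_i = (x_i - x'_i) * mean_k grad_i(x' + alpha_k * (x - x'))."""
    alphas = (np.arange(steps) + 0.5) / steps      # midpoint rule on [0, 1]
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy "score" f(x) = x0 * x1 with analytic gradient [x1, x0]
f = lambda x: x[0] * x[1]
grad = lambda x: np.array([x[1], x[0]])

x, x0 = np.array([2.0, 3.0]), np.zeros(2)
attr = integrated_gradients(grad, x, x0)

# Completeness axiom: attributions sum to f(x) - f(baseline)
print(attr, attr.sum(), f(x) - f(x0))
```

In the notebook the role of `f` is played by the cosine score against the enroll embedding, the inputs are log‑Mel frames, and the signed attributions are what the heatmaps visualize.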
- Each run saves a `run_config.json` with parameters (paths, seeds, IG settings).
- A file‑only logger writes `run.log` with summary lines (INFO). Noisy per‑image messages are logged at DEBUG and omitted from the file (but still printed to the console).
- The +4 semitone anonymization is applied to the test side only, when `anonymized=1`.
- IG is computed on the model's log‑Mel feature representation for faithful attributions.
- Expect some negative attributions in voiced regions (due to normalization/attention coupling).
- EER will vary slightly across runs because speaker cohorts differ by design.
- SpeechBrain — Mirco Ravanelli et al., SpeechBrain: A General-Purpose Speech Toolkit.
- ECAPA‑TDNN — Desplanques et al., Emphasized Channel Attention, Propagation and Aggregation in TDNN based Speaker Verification.
- Captum — Kokhlikyan et al., Captum: A Model Interpretability Library for PyTorch.