# Whisper Large-v3 Fine-Tuning for Accented English ASR

Fine-tuning Whisper large-v3 (1.5B parameters) with LoRA on 100+ hours of accented English speech to improve ASR performance on underrepresented English accents.

## Approach

- Base model: OpenAI Whisper large-v3
- Fine-tuning: QLoRA (rank=32, alpha=64, 4-bit quantization) targeting attention and feed-forward layers; see the configuration sketch below
- Training: 15 epochs, cosine LR schedule, effective batch size 32
- Hardware: single NVIDIA A10 (24 GB VRAM), ~140 hours total training time
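
A minimal sketch of this QLoRA setup using `transformers`, `peft`, and `bitsandbytes`. The target module names follow the Whisper attention and feed-forward projections as named in `transformers`; the exact hyperparameters and module list used by the training script may differ:

```python
import torch
from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load Whisper large-v3 with 4-bit NF4 quantization (the QLoRA base model).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters (rank=32, alpha=64) on attention and feed-forward projections.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```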

## Datasets

| Dataset | Hours | Description |
|---|---|---|
| EdAcc | ~40h | Diverse international accents, conversational English over Zoom |
| English Dialects | ~31h | British Isles regional accents (Irish, Scottish, Welsh, Northern, Midlands, Southern) |
| Common Voice v24 en-AU | ~40h | Australian English subset via Mozilla Data Collective (CC0-1.0) |
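
Whisper's feature extractor expects 16 kHz mono audio, so each corpus is resampled on load. A sketch using the `datasets` library; the Hugging Face hub ID below is illustrative, so check where each corpus is actually hosted before relying on it:

```python
from datasets import load_dataset, Audio

# Hub ID is illustrative -- substitute the actual location of each corpus.
edacc = load_dataset("edinburghcstr/edacc", split="validation")

# Cast the audio column so every example is decoded at Whisper's 16 kHz rate.
edacc = edacc.cast_column("audio", Audio(sampling_rate=16_000))

sample = edacc[0]["audio"]
print(sample["sampling_rate"], len(sample["array"]))
```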

## Benchmark

The fine-tuned model is evaluated against the EdAcc Leaderboard, which benchmarks ASR systems on accented English. The leaderboard scores WER via sclite. Note that two evaluation protocols exist (V0.1 conversation-level, V1.0 utterance-level) and their scores are not comparable.
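
For quick local checks, a `jiwer`-backed WER via the `evaluate` library is a reasonable approximation, though it applies different text normalization than sclite and is not directly comparable to leaderboard numbers. A sketch, using Whisper's English text normalizer from `transformers` (one common choice for Whisper evaluation):

```python
import evaluate
from transformers.models.whisper.english_normalizer import EnglishTextNormalizer

wer_metric = evaluate.load("wer")
normalizer = EnglishTextNormalizer({})  # empty custom spelling-mapping dict

references = ["the quick brown fox jumps over the lazy dog"]
predictions = ["the quick brown fox jumped over a lazy dog"]

# Normalize both sides before scoring, as in standard Whisper evaluation.
wer = wer_metric.compute(
    references=[normalizer(r) for r in references],
    predictions=[normalizer(p) for p in predictions],
)
print(f"WER: {wer:.3f}")
```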

## Usage

```bash
# Install dependencies (Cell 1), restart runtime, then run from Cell 2
pip install torch torchvision torchaudio
pip install transformers accelerate "datasets[audio]" peft bitsandbytes evaluate jiwer librosa
```

The full training script is in `whisper_accent_finetune.py`. It handles dataset downloading, preprocessing, training, and evaluation end-to-end.
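
After training, the resulting LoRA adapter can be applied on top of the base model at inference time. A sketch; the adapter directory `./whisper-accent-lora` is hypothetical, so point it at wherever the training script saves its output:

```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=torch.float16, device_map="auto"
)
# Hypothetical adapter directory produced by whisper_accent_finetune.py.
model = PeftModel.from_pretrained(base, "./whisper-accent-lora")

# Load a 16 kHz mono waveform; Whisper's feature extractor expects this rate.
audio, _ = librosa.load("sample.wav", sr=16_000)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
input_features = inputs.input_features.to(model.device, torch.float16)

generated = model.generate(input_features, language="en", task="transcribe")
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```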

## Citations

```bibtex
@inproceedings{sanabria23edacc,
  title={The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR},
  author={Sanabria, Ramon and Bogoychev, Nikolay and Markl, Nina and Carmantini, Andrea and Klejch, Ondrej and Bell, Peter},
  booktitle={ICASSP 2023},
  year={2023}
}

@inproceedings{radford2023robust,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  booktitle={ICML 2023},
  year={2023}
}

@inproceedings{ardila2020common,
  title={Common Voice: A Massively-Multilingual Speech Corpus},
  author={Ardila, Rosana and Branson, Megan and Davis, Kelly and Henretty, Michael and Kohler, Michael and Meyer, Josh and Morais, Reuben and Saunders, Lindsay and Tyers, Francis and Weber, Gregor},
  booktitle={LREC 2020},
  year={2020}
}
```

## License

The training code is MIT-licensed. Dataset licenses vary: EdAcc (CC-BY-4.0), English Dialects (check the source), Common Voice v24 en-AU (CC0-1.0).
