
Conversation

a2d8a4v (Contributor) commented Jan 2, 2026

Well, CTranslate2 already has a wav2vec 2.0 codebase, which can run wav2vec 2.0, MMS, parts of the omnilingual-asr models (the -SSL and -CTC branches), and HuBERT (which, to the best of my knowledge, only differs in training strategy; the backbone model is the same). However, WavLM has a gated relative attention mechanism: the gated relative position bias is computed in the first attention layer from the pre-layernorm hidden states. Once computed, the position bias is added to the query-key attention scores just before the softmax (i.e. when computing the attention matrix), and it is then passed on to the later attention layers without being recomputed.
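To make the mechanism concrete, below is a minimal NumPy sketch of the idea; the gating is deliberately reduced to a single sigmoid gate, so this is an illustration of the concept rather than the exact Hugging Face or CTranslate2 computation:

```python
# Minimal NumPy sketch of WavLM's gated relative position bias as described
# above. Deliberately simplified (single sigmoid gate, pre-expanded bias table);
# not the exact Hugging Face or CTranslate2 implementation.
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_position_bias(hidden_states, rel_bias, gate_w):
    # hidden_states: pre-layernorm input of the first layer, [seq_len, d_model]
    # rel_bias:      relative position bias expanded to [num_heads, seq_len, seq_len]
    # gate_w:        [d_model, num_heads] projection used only for the gate (assumed shape)
    gate = 1.0 / (1.0 + np.exp(-(hidden_states @ gate_w)))  # [seq_len, num_heads]
    gate = gate.transpose(1, 0)[:, :, None]                 # [num_heads, seq_len, 1]
    return gate * rel_bias                                   # gated bias, computed once


def attention_with_position_bias(q, k, v, position_bias):
    # q, k, v: [num_heads, seq_len, head_dim]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = scores + position_bias                          # added before the softmax
    return softmax(scores) @ v
```

In the real model the bias is produced once in the first layer and the same tensor is then threaded through the remaining layers, which is the position_bias object mentioned below.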

The major changes compared to the wav2vec 2.0 C++ codebase are in two files:
src/layers/attention.cc, where I modified the logic inside the dot_product_attention function, and
src/layers/wavlm.cc, where I pass an additional object called position_bias.

I've tested the code by extracting the last hidden state and computing its cosine similarity with the one produced by the Hugging Face WavLM. The result is 1.0, so I believe the logic of my code is correct.
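For reference, a hedged sketch of that comparison; the ctranslate2.models.WavLM class and its encode() call are assumptions about the Python API added in this PR (commented out below), not confirmed:

```python
# Hedged sketch of the validation described above: compare the converted model's
# last hidden state against Hugging Face WavLM with cosine similarity.
# The ctranslate2.models.WavLM class and its encode() call are assumptions about
# the Python API added in this PR; adapt to the actual names.
import numpy as np
import torch
from transformers import WavLMModel

hf_model = WavLMModel.from_pretrained("microsoft/wavlm-large").eval()
audio = torch.randn(1, 16000)  # 1 s of dummy 16 kHz audio

with torch.no_grad():
    hf_hidden = hf_model(audio).last_hidden_state[0].numpy()

# Hypothetical CTranslate2 side (model converted beforehand with the converter):
# import ctranslate2
# ct2_model = ctranslate2.models.WavLM("wavlm_ct2/", device="cpu")
# ct2_hidden = np.asarray(ct2_model.encode(audio.numpy()))[0]


def cosine_similarity(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# print(cosine_similarity(hf_hidden, ct2_hidden))  # reported as 1.0 in this PR
```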

References:

  1. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, NeurIPS 2020.
  2. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, TASLP 2021.
  3. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing, JSTSP 2022.
  4. Scaling Speech Technology to 1,000+ Languages, JMLR 2024.
  5. Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages, arXiv preprint, 2025.

jordimas (Collaborator) commented Jan 3, 2026

Great work. I am looking forward to testing it.

Four quick comments:

  1. Would it be possible to add a small example of how to use it here: https://github.com/OpenNMT/CTranslate2/blob/master/docs/guides/transformers.md?

  2. Could you look at this test failure:

     =========================== short test summary info ============================
     FAILED python/tests/test_transformers.py::TestWavLM::test_transformers_wavlm[microsoft/wavlm-large-expected_transcription0-cpu] - TypeError: TransformerEncoderLayerSpec.__init__() got an unexpected keyword argument 'gated_relative_attention_bias'

  3. Run python -m black python/ to fix the check-python-style job that is not passing.

  4. Consider adding a test here: https://github.com/OpenNMT/CTranslate2/blob/master/python/tests/test_transformers.py Look at what is done for Whisper.

Thanks!

namespace models {

struct WavLMOptions {
// Maximum generation length.
jordimas (Collaborator) commented on the diff:

Are we planning to use the WavLMOptions structure?
It is not referenced at the moment.

a2d8a4v (Contributor, Author) commented Jan 4, 2026

Hmm, in fact it is not used at the moment.
I used microsoft/wavlm-large for the test case, which outputs only the last hidden state. The structure may be useful when someone runs WavLM plus a linear layer (language-model head) trained with CTC loss, which outputs tokens at inference time.

a2d8a4v (Contributor, Author) commented Jan 4, 2026

Hi @jordimas,
My replies to the four comments:

  1. Of course. Thank you for reminding me of this; I had missed that document, haha.
    I also noticed the document lacks a wav2vec 2.0 section, so I will add that part as well.
  2. OK, it seems I forgot to add the argument (a rough sketch of the fix follows below).
  3. Sure, let me check it again.
  4. Hmm, I think I've already added the TestWavLM class inside test_transformers.py.
    Can you take a look at: https://github.com/OpenNMT/CTranslate2/pull/1966/files#diff-87c343e816a510ee31b7408b49bf7da834849e5e62cf90b897ffc2485ccf91a1
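Regarding point 2, a hypothetical sketch of the shape of the fix implied by the TypeError; the real TransformerEncoderLayerSpec (presumably in python/ctranslate2/specs/transformer_spec.py) has many more parameters, and this is not the actual CTranslate2 code:

```python
# Hypothetical, simplified sketch: the layer spec has to accept (and store) the
# new keyword argument that the WavLM loader passes when building the model spec.
# All real parameters of TransformerEncoderLayerSpec are omitted here.
class TransformerEncoderLayerSpec:
    def __init__(self, gated_relative_attention_bias=False):
        # Flag telling the converter/runtime that this layer uses WavLM's
        # gated relative attention bias.
        self.gated_relative_attention_bias = gated_relative_attention_bias


# The loader can then construct the layer spec without raising the TypeError:
layer = TransformerEncoderLayerSpec(gated_relative_attention_bias=True)
```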

Thanks a lot!

a2d8a4v (Contributor, Author) commented Jan 4, 2026

By the way, @jordimas, I would like to ask for your advice.
Regarding wav2vec 2.0: although I said that wav2vec 2.0, HuBERT, MMS, and omnilingual-asr can all use the wav2vec 2.0 codebase because they share the same backbone architecture, the current codebase cannot load them directly because of how the converters are set up. For example, HubertConfig is not registered in the converter.

It would need some additional changes to support those models. I'm wondering whether I should create model templates for each of them, or just adapt the configs and converters.
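To illustrate the second option, here is a hedged sketch of what registering one of these configs on top of the existing wav2vec 2.0 loader might look like, assuming the register_loader / Wav2Vec2Loader pattern in python/ctranslate2/converters/transformers.py (the exact names and the "HubertForCTC" architecture name are assumptions):

```python
# Hedged sketch of the "adapt the configs and converters" option: register the
# existing wav2vec 2.0 loader for an additional Hugging Face config class.
# Assumes the register_loader / Wav2Vec2Loader pattern in
# python/ctranslate2/converters/transformers.py; exact names may differ.
from ctranslate2.converters.transformers import Wav2Vec2Loader, register_loader


@register_loader("HubertConfig")
class HubertLoader(Wav2Vec2Loader):
    @property
    def architecture_name(self):
        # HuBERT shares the wav2vec 2.0 backbone, so the weight mapping of the
        # existing loader is reused unchanged.
        return "HubertForCTC"
```

If only the config class names differ, subclassing like this may be enough; if the architectures diverge, separate model templates would be the cleaner option.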

Thank you for your attention.

jordimas (Collaborator) commented Jan 4, 2026


Would it be possible to add one of these models to the PR, so we can see exactly what the problem looks like?

a2d8a4v (Contributor, Author) commented Jan 7, 2026

Sure
