Multilingual IPA Phone Recognition Model

What is this?

This repository contains the base source code for using a Wav2Vec2-based phone recognition and feature extraction model. It was created as part of the PhD thesis "Phonetic Transfer Learning from Healthy References for the Analysis of Pathological Speech" by Philipp Klumpp to analyze pathological speech signals. The model parameters are hosted on Hugging Face. Check out the model card here.

Who wants to use this model?

  • You want to recognize IPA phones (for example to evaluate pronunciation)
  • You want to extract feature vectors and group them by their respective underlying speech sound
  • You want to work with multilingual data
  • You need a highly robust phone recognizer

This recognizer is capable of predicting phone symbols following the International Phonetic Alphabet (IPA), even for audio recorded under imperfect conditions, such as reverberation, background noise or poor recording equipment (like a smartphone). The model was trained using the multilingual Common Phone dataset.

For every recognized phone, the model also emits an associated Softmax probability, as well as two feature vectors. The first comes from the CNN block, the second from the last Transformer block.
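Grouping feature vectors by their underlying speech sound can be sketched as follows. This is a minimal illustration, not the repository's API: the helper `group_by_phone`, the toy phone labels, and the 4-dimensional features are all hypothetical stand-ins for whatever the model actually returns.

```python
from collections import defaultdict

import numpy as np


def group_by_phone(phones, features):
    """Collect feature vectors per phone symbol (hypothetical helper).

    phones:   sequence of N phone symbols, one per recognized phone
    features: array of shape (N, D), one feature vector per phone
    """
    groups = defaultdict(list)
    for phone, vector in zip(phones, features):
        groups[phone].append(vector)
    # Stack each group's vectors into a single (count, D) array
    return {phone: np.stack(vectors) for phone, vectors in groups.items()}


# Toy data standing in for real model output (assumed shapes)
phones = ["a", "t", "a"]
features = np.arange(12, dtype=np.float32).reshape(3, 4)

grouped = group_by_phone(phones, features)
# One mean vector per phone, e.g. for comparing speakers or conditions
centroids = {phone: vectors.mean(axis=0) for phone, vectors in grouped.items()}
```

The same grouping works for either of the two feature vectors, as long as each vector is paired with the phone it was emitted for.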

Which IPA symbols does this model understand?

Check out /phonetics/ipa.py for the full list of IPA symbols.

Got any numbers?

Sure, the model was evaluated on the test split of Common Phone. The following results represent Phone Error Rates (PER) in percent:

Language    Test PER (%)
English     11.0
French       9.9
German       9.8
Italian      9.1
Russian      6.6
Spanish      8.8
Average      9.2
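For readers unfamiliar with the metric, PER is conventionally the Levenshtein (edit) distance between the recognized and the reference phone sequence, divided by the reference length. The sketch below follows that common definition; it is not the evaluation script used for the numbers above, and the function names and example sequences are made up for illustration.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences of phone symbols."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        current = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_phone != hyp_phone)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def phone_error_rate(reference, hypothesis):
    """PER in percent for a single utterance (assumes a non-empty reference)."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one substituted phone out of four
reference = ["h", "ɛ", "l", "oʊ"]
hypothesis = ["h", "ə", "l", "oʊ"]
per = phone_error_rate(reference, hypothesis)
```

For a whole test set, the usual convention is to sum the edit distances over all utterances and divide by the total number of reference phones, rather than averaging per-utterance rates.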

Quick start

Creating an instance of the model and downloading the parameters takes a single line of code:

    wav2vec2 = Wav2Vec2.from_pretrained("pklumpp/Wav2Vec2_CommonPhone")

The example.py script briefly summarizes the core functions of this model and how to use them. Always standardize your inputs before feeding them into the model (see the example)!
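As a rough idea of what input standardization can look like, the sketch below scales a waveform to zero mean and unit variance per recording. This is an assumption about what "standardize" means here; the `standardize` helper and the synthetic signal are hypothetical, and example.py remains the authoritative reference.

```python
import numpy as np


def standardize(waveform, eps=1e-7):
    """Scale a mono waveform to zero mean and unit variance (assumed scheme)."""
    waveform = np.asarray(waveform, dtype=np.float32)
    # eps guards against division by zero for silent (all-constant) input
    return (waveform - waveform.mean()) / (waveform.std() + eps)


# Synthetic samples standing in for a real recording
rng = np.random.default_rng(0)
audio = 0.1 * rng.standard_normal(16000) + 0.05

normalized = standardize(audio)
```

Standardizing each recording separately keeps the input scale independent of microphone gain and recording level, which matters when audio comes from varied devices.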

Under what license is this work distributed?

Creative Commons Zero 1.0. You can use this model for any purpose, even commercially. See the LICENSE for further information.

How can I reference this work in my publication?

To cite this work, please use the following BibTeX snippet:

@phdthesis{klumpp2024phdthesis,
  author  = "Philipp Klumpp",
  title   = "Phonetic Transfer Learning from Healthy References for the Analysis of Pathological Speech",
  school  = "Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg",
  address = "Erlangen, Germany",
  year    = 2024,
  month   = may
}