Albanian language support #675

e-jajaga · 2022-12-13T14:08:19Z

e-jajaga
Dec 13, 2022

Dear all,

You have some results for the Albanian language. However, in the notebook example Multilingual_ASR.ipynb, the language code is missing.
languages = {"af_za": "Afrikaans", ...
What code have you used for the Albanian language setting? Is it sq_al?


jongwook · 2022-12-13T23:24:38Z

jongwook
Dec 13, 2022
Maintainer

Those language codes are the ones used by the Fleurs dataset, which did not include Albanian. Those are slightly different from what's used in our tokenizer.py.

In Whisper, you can specify --language Albanian or --language sq to tell the model to transcribe Albanian speech, but we unfortunately don't have quantitative metrics on Whisper's performance in Albanian. I'd be curious to know if you can share your experience.
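
For anyone calling Whisper from Python rather than the command line, here is a minimal sketch of the same setting. It assumes the openai-whisper package and ffmpeg are installed; the helper name `transcribe_albanian`, the constant `ALBANIAN`, and the "base" checkpoint are illustrative choices, not something from the notebook.

```python
# Sketch: transcribe Albanian speech with the openai-whisper Python package.
# Assumptions: `pip install openai-whisper` and ffmpeg on PATH.
# The helper name and the default checkpoint are illustrative only.

ALBANIAN = "sq"  # same value as `--language sq` on the command line


def transcribe_albanian(audio_path: str, model_name: str = "base") -> str:
    """Return the transcript of `audio_path`, decoded as Albanian."""
    # Imported lazily so this module can be loaded without the package.
    import whisper

    model = whisper.load_model(model_name)
    # Passing `language` explicitly skips automatic language detection.
    result = model.transcribe(audio_path, language=ALBANIAN)
    return result["text"]
```

Calling `transcribe_albanian("speech_sq.mp3")` (the file name is a placeholder) should return the transcript as a string. As noted above, there are no published accuracy figures for Albanian, so treat the output as something to check by hand.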

7 replies

marisbasha Feb 3, 2023

I have created a fine-tuned version for Albanian.

https://huggingface.co/spaces/n-iv/wsq

If anyone is interested in collaborating or using the model, message me.

https://huggingface.co/niv-al/peshperima-large-v2

@marisbasha Thanks! Is the fine-tuning data available? I was interested in a pure transformer model, so that I can use it with CTranslate2 :)

florijanqosja Mar 27, 2023

carlos-havier Apr 20, 2023

marisbasha Apr 20, 2023

carlos-havier Apr 24, 2023