Those language codes are the ones used by the Fleurs dataset, which did not include Albanian. They differ slightly from the codes used in our tokenizer.py.

In Whisper, you can specify --language Albanian or --language sq to tell the model to transcribe Albanian speech, but we unfortunately don't have quantitative metrics on Whisper's performance in Albanian. I'd be curious to know if you can share your experience.
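For anyone who wants to try this from a script, here is a minimal sketch of the command-line call described above. It only uses the `--language` flag mentioned in this answer; the helper names and the `sample.wav` file name are made up for illustration, and it assumes the `whisper` command is installed and on your PATH.

```python
import shutil
import subprocess


def build_whisper_command(audio_path, language="sq"):
    """Return the argument list for a Whisper CLI call with an explicit language.

    `language` can be the code ("sq") or the name ("Albanian"), as noted above.
    """
    return ["whisper", audio_path, "--language", language]


def transcribe(audio_path, language="sq"):
    """Run the Whisper CLI if it is available; return None if it is not installed."""
    if shutil.which("whisper") is None:
        return None
    cmd = build_whisper_command(audio_path, language)
    # check=True raises CalledProcessError if the CLI exits with an error.
    return subprocess.run(cmd, check=True)


# Hypothetical usage (sample.wav is a placeholder file name):
# transcribe("sample.wav", language="sq")
```

Setting the language explicitly skips automatic language detection, which may help on short or noisy clips, though as said above there are no published numbers for Albanian.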

Answer selected by jongwook