For anyone facing this issue in transformers:

  1. Check your dataset for bad clips. Too many clips that are silent or in the wrong audio format will hurt training; the sample rate should be exactly 16000 Hz.
  2. Check your clip lengths. Cut any clip longer than 30 seconds, otherwise you may run into problems with padding and hallucination.
  3. Use the latest transformers version.
  4. Reduce the learning rate and batch size. In my case, a large batch size and learning rate (1e-4 with batch size 80 and gradient accumulation 2) made generalization worse and sometimes even caused hallucination; I'm not sure why. The model did very well with a 1e-5/2 learning rate and batch size 32 on the medium model.
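
The dataset checks in items 1 and 2 could be sketched roughly as below. This is only an illustration, not the exact procedure used here: the function name `clip_problems` and the silence threshold are hypothetical, and the audio is assumed to be already decoded into a flat sequence of float samples (a 1-D NumPy array works too).

```python
import math
from typing import Sequence

TARGET_SAMPLE_RATE = 16_000   # item 1: sample rate should be exactly 16000 Hz
MAX_DURATION_S = 30.0         # item 2: cut clips longer than 30 seconds
SILENCE_RMS_THRESHOLD = 1e-3  # hypothetical value; tune it on your own data


def clip_problems(samples: Sequence[float], sample_rate: int) -> list[str]:
    """Return the reasons a clip needs fixing or dropping (empty list if it looks OK)."""
    problems = []
    if sample_rate != TARGET_SAMPLE_RATE:
        problems.append("wrong_sample_rate")
    if len(samples) == 0:
        problems.append("empty")
        return problems
    # Duration is only meaningful for a positive sample rate.
    if sample_rate > 0 and len(samples) / sample_rate > MAX_DURATION_S:
        problems.append("too_long")
    # Root-mean-square amplitude as a crude silence detector.
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms < SILENCE_RMS_THRESHOLD:
        problems.append("silent")
    return problems
```

Running this over every clip before training and counting the flagged ones gives a quick sense of how clean the dataset is; clips flagged `too_long` would be cut rather than dropped.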

In my case, the dataset was fi…

Answer selected by DavraYoung