Training data information #1807
marisbasha started this conversation in General
-
Hi, we didn't specifically include those two datasets in training, but some samples from them may have been mixed into the data given the scale of the dataset. I'd still consider them "out-of-distribution".
-
Hello @jongwook, could you tell me whether Whisper large-v2 was trained on the AMI speech corpus and TIMIT? I am writing a paper on using Whisper embeddings as a learned similarity, and I need to know this because, if the model was pretrained on those datasets, it would be hard to make assumptions.