I would advise you to check out Lightning Flash. This is what we are building there: the ability to make predictions on raw data.
-
Hi, I'm trying to fine-tune the baseline wav2vec model on my own audio training/test data using Lightning Flash, following the tutorial in this doc almost exactly:
https://lightning-flash.readthedocs.io/en/latest/reference/speech_recognition.html
However, when I generate a prediction for an audio file, I get a null output:
I'm not sure what the issue is. The only change I made was replacing the TIMIT dataset with my own input data for fine-tuning; the rest of the script is identical to the doc above. All of the input data are wav files with the following format:
I'm new to PyTorch Lightning and training with wav2vec as a whole, so I'm guessing that I'm missing something obvious. Any help would be greatly appreciated!
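One possible explanation for an empty prediction, offered as an assumption rather than a confirmed diagnosis: wav2vec 2.0 speech recognition heads are trained with CTC, and a model that has not yet learned much from a small or short fine-tuning run often predicts the blank token at every audio frame. Greedy CTC decoding collapses repeated frame predictions and then drops blanks, so an all-blank sequence decodes to an empty string. The sketch below is a minimal, illustrative greedy decoder; `greedy_ctc_decode`, `VOCAB`, and the blank index `0` are hypothetical names chosen for this example, not part of the Lightning Flash API.

```python
from itertools import groupby
from typing import Sequence

# Hypothetical toy vocabulary: index 0 is the CTC blank token.
VOCAB = ["<blank>", " ", "c", "a", "t"]
BLANK_ID = 0


def greedy_ctc_decode(frame_ids: Sequence[int], vocab: Sequence[str], blank: int = BLANK_ID) -> str:
    """Greedy CTC decoding: merge consecutive repeats, then remove blank tokens."""
    # groupby merges runs of identical consecutive frame predictions into one token.
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Blanks carry no characters, so they are dropped after collapsing.
    return "".join(vocab[t] for t in collapsed if t != blank)


if __name__ == "__main__":
    # A model that has learned something: repeats collapse, blanks separate characters.
    print(repr(greedy_ctc_decode([2, 2, 0, 3, 0, 4, 4], VOCAB)))  # 'cat'
    # An under-trained model that predicts blank on every frame yields an empty string.
    print(repr(greedy_ctc_decode([0] * 50, VOCAB)))  # ''
```

If the real model's per-frame argmax looks like the second case, the decoding step is behaving correctly and the problem lies upstream, for example in too little training, mismatched audio preprocessing, or transcripts that do not line up with the audio.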
Here is the full script I'm running:
And here is a sample of the train.csv file with the annotations: