The audio and the labels should be segmented into chunks of 30 seconds or shorter, to match the training distribution. I guess that alone should bring the memory usage low enough. Using mixed precision and gradient checkpointing may further reduce memory usage during fine-tuning.
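For what it's worth, here is a rough sketch of that segmentation step, not a definitive implementation. It assumes 16 kHz audio held as a flat sequence of samples and labels given as `(start_sec, end_sec, text)` tuples; that label format, the `Chunk` class, and the `chunk_audio_and_labels` helper are all hypothetical names made up for illustration. It greedily packs whole labelled segments into chunks of at most 30 seconds, so no utterance is cut in the middle.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

SAMPLE_RATE = 16_000   # assumption: audio is sampled at 16 kHz
MAX_CHUNK_SEC = 30.0   # upper bound on chunk length, per the advice above

# Hypothetical label format: (start_sec, end_sec, text) for each utterance.
Segment = Tuple[float, float, str]


@dataclass
class Chunk:
    start_sec: float
    end_sec: float
    samples: Sequence[float]
    text: str


def chunk_audio_and_labels(
    samples: Sequence[float],
    segments: List[Segment],
    sample_rate: int = SAMPLE_RATE,
    max_sec: float = MAX_CHUNK_SEC,
) -> List[Chunk]:
    """Greedily pack whole labelled segments into chunks of at most max_sec."""
    chunks: List[Chunk] = []
    group: List[Segment] = []  # segments collected for the chunk being built

    def flush() -> None:
        # Turn the collected segments into one chunk and reset the group.
        if not group:
            return
        start, end = group[0][0], group[-1][1]
        lo, hi = int(start * sample_rate), int(end * sample_rate)
        text = " ".join(seg[2] for seg in group)
        chunks.append(Chunk(start, end, samples[lo:hi], text))
        group.clear()

    for seg in sorted(segments):
        start, end, _ = seg
        if end - start > max_sec:
            # A single utterance longer than the limit needs manual splitting.
            raise ValueError(f"segment {start}-{end}s is longer than {max_sec}s")
        if group and end - group[0][0] > max_sec:
            flush()  # adding this segment would exceed the limit
        group.append(seg)
    flush()
    return chunks
```

Each resulting chunk pairs an audio slice with the concatenated text of the utterances it covers, which is the shape a fine-tuning dataset would typically need. A single utterance longer than 30 seconds raises an error here, because splitting it requires finer-grained (for example word-level) timestamps.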

Answer selected by jongwook