Hi!

  1. The <|notimestamps|> token was used in 50% of the samples.
  2. Timestamp tokens were included in the prompt when <|notimestamps|> was not used (50% of the time) and omitted from the prompt when it was used (the other 50%). In practice, the model mostly behaves as expected with or without timestamp tokens in the prompt.
  3. The model was trained with one-hot labels, and many training examples started at the <|0.00|> timestamp, which resulted in a strong bias toward that token as well as toward the integer timestamps. I think some form of soft labels, like you suggested, would mitigate this issue.
  4. That's a great point! I should not zero-pad the spectrogram but zero-pad the audio and then convert …
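
To make the soft-label idea in point 3 concrete, here is a minimal sketch, not the code used for training. It assumes a truncated Gaussian over neighbouring timestamp token ids, which is only one possible form of soft label. The names `soft_timestamp_label`, `soft_cross_entropy`, `timestamp_begin`, `sigma`, and `radius` are hypothetical, and the toy vocabulary size is made up for illustration.

```python
import math


def soft_timestamp_label(target_id, vocab_size, timestamp_begin, sigma=1.0, radius=3):
    """Build a target distribution for one label position.

    Assumption: every id >= timestamp_begin is a timestamp token and
    adjacent ids are adjacent in time. Text tokens stay one-hot;
    timestamp tokens are smoothed over nearby timestamp ids only.
    """
    probs = [0.0] * vocab_size
    if target_id < timestamp_begin:
        # Ordinary text token: keep the usual one-hot label.
        probs[target_id] = 1.0
        return probs

    # Truncated Gaussian window, clipped to the timestamp range.
    lo = max(timestamp_begin, target_id - radius)
    hi = min(vocab_size - 1, target_id + radius)
    weights = {
        i: math.exp(-0.5 * ((i - target_id) / sigma) ** 2)
        for i in range(lo, hi + 1)
    }
    total = sum(weights.values())
    for i, w in weights.items():
        probs[i] = w / total
    return probs


def soft_cross_entropy(log_probs, soft_label):
    """Cross-entropy between model log-probabilities and a soft target."""
    return -sum(p * lp for p, lp in zip(soft_label, log_probs) if p > 0.0)


# Toy setup: 20-token vocabulary where ids 10..19 play the role of timestamps.
label = soft_timestamp_label(target_id=10, vocab_size=20, timestamp_begin=10)
uniform_log_probs = [math.log(1.0 / 20)] * 20
loss = soft_cross_entropy(uniform_log_probs, label)
```

With a target like this, a prediction that is one timestamp step off is penalized less than one that is far away, which may reduce the pull toward a single over-represented token such as <|0.00|>. Whether it does so in practice would need to be checked experimentally.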

Answer selected by jongwook