Replies: 3 comments 12 replies
-
I think before this commit I was also getting loss around 2 that wouldn't decrease, so I'd check that a) you have the latest version and b) you're able to intentionally overfit a tiny dataset. If the loss is still stuck at 2 even on a tiny dataset, that looks like a sign of a bug.

After the commit, I took audiolm_pytorch, trained it on a very tiny dataset made up of the same data file copied a bunch of times, and was able to intentionally overfit it so the loss was near (or sometimes exactly) 0, and the output sounded pretty similar to the input! Here are the original input and what I was able to generate with the overfitted network:

out_305380_mp4_format.mp4
original_sample_mp4_format.mp4

Since the dataset was intentionally so tiny, I didn't really need a ton of steps: I set 5000, but the loss went to 0 really fast, so that was probably unnecessary. Batch size and grad-accum-every were both 1, and I trained on a 1-second sample. The sample I used is at 24 kHz, since I'm using EnCodec and wanted to avoid any potential issues in case there's a bug in the resampling part (rather than the model itself).
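A minimal sketch of how such an overfitting dataset can be built, i.e. one audio file duplicated into a folder the trainer can point at (the function name, folder layout, and copy count here are my own illustration, not from the original post):

```python
import shutil
from pathlib import Path

def make_overfit_dataset(source_wav: str, out_dir: str, copies: int = 16) -> list:
    """Duplicate a single audio file `copies` times so the trainer sees
    the same sample over and over. If the model and data pipeline are
    healthy, the loss on this dataset should drop to ~0 quickly."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    src = Path(source_wav)
    paths = []
    for i in range(copies):
        dst = out / f"{src.stem}_{i}{src.suffix}"
        shutil.copyfile(src, dst)  # byte-for-byte copy of the same clip
        paths.append(dst)
    return paths
```

If the loss refuses to go near 0 even on a dataset like this, the problem is almost certainly in the code or setup rather than in the amount or quality of training data.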
-
Sorry for jumping in about the semantic transformer: if I want to train on a non-English language, should I first train HuBERT/wav2vec on that language?
-
On the same subject but from a different angle: which wav2vec model did you use? I'm using HuBERT with 500 k-means clusters, but in the paper they used w2v-BERT with 1000 clusters. Has anyone trained with 1000 clusters?
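For context on why the cluster count matters: the semantic tokens are just the ids of the k-means clusters fitted on the SSL model's features, so choosing 500 vs. 1000 clusters directly sets the semantic vocabulary size the transformer predicts over. A toy, pure-Python sketch of that quantization step (the centroids and features below are made up for illustration; real ones come from HuBERT/w2v-BERT features):

```python
def assign_semantic_tokens(frames, centroids):
    """Map each feature frame to the index of its nearest k-means
    centroid (squared Euclidean distance). With K centroids, the
    resulting token ids lie in [0, K), which is why K (500 or 1000)
    is the semantic vocabulary size."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sq_dist(f, centroids[k]))
            for f in frames]

# Toy example: 2-D "features", K = 3 clusters.
centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (1.1, 0.8)]
tokens = assign_semantic_tokens(frames, centroids)  # -> [0, 1, 2, 1]
```

More clusters give a finer-grained discretization of the speech features at the cost of a larger vocabulary for the semantic transformer to model.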
-
Hey, has anyone succeeded in training an AudioLM model from this repo, or even just the SemanticTransformer, and is willing to share the training params? How long did you train? Any other tips?
I'm using a batch size of 64 and 2 s per sample, with the other settings at their defaults. The loss drops to ~2 very quickly, but after that the model doesn't seem to get any better.
Thanks in advance!
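One detail worth double-checking when comparing setups: if I'm reading the repo correctly, the trainers' crop-length argument (`data_max_length`) is given in raw samples, so "2 sec per sample" means a different number depending on the codec's sample rate (24 kHz for EnCodec, 16 kHz input for HuBERT). A tiny sanity-check helper (the function name is mine, not from the library):

```python
def crop_length_in_samples(seconds: float, sample_rate: int) -> int:
    """Convert a clip duration to the raw-sample count expected by a
    `data_max_length`-style argument."""
    return int(seconds * sample_rate)

# 2-second crops at EnCodec's 24 kHz vs. HuBERT's 16 kHz input rate:
two_sec_encodec = crop_length_in_samples(2, 24_000)  # 48000
two_sec_hubert = crop_length_in_samples(2, 16_000)   # 32000
```

Passing a sample count computed at the wrong rate silently changes the effective clip length, which can make runs with nominally identical hyperparameters behave quite differently.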