During training, why is the word embedding size also 64, the same as the batch size? Is that intentional, so the embeddings align one-to-one with the features?
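For reference, here is a minimal sketch of the setup I am asking about, assuming a PyTorch `nn.Embedding` layer (the vocabulary size and sequence length are hypothetical values I made up for illustration):

```python
import torch
import torch.nn as nn

# Both happen to be 64 in the setup I am asking about.
batch_size = 64      # number of sequences processed per training step
embedding_dim = 64   # size of each word's embedding vector
vocab_size = 10000   # hypothetical vocabulary size
seq_len = 20         # hypothetical sequence length

embedding = nn.Embedding(vocab_size, embedding_dim)

# A batch of token indices: shape (batch_size, seq_len)
tokens = torch.randint(0, vocab_size, (batch_size, seq_len))

# After lookup: shape (batch_size, seq_len, embedding_dim)
embedded = embedding(tokens)
print(embedded.shape)  # torch.Size([64, 20, 64])
```

The two 64s end up on different axes of the output tensor, which is what makes me wonder whether setting them equal serves any purpose or is just a coincidence.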