Thank you for your great work!
When I run inference on a dataset with the following command:
python inference.py \
--hf-dataset xxx \
--ea-model-path Tengyunw/qwen3_8b_eagle3 \
--base-model-path Qwen/Qwen3-8B \
--temperature 0.2 \
--max-new-tokens 512 \
--total-token -1
I encountered an error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (112x151552 and 12288x4096)
It seems that the hidden states from all layers are concatenated together (37 * 4096 = 151552), while the MLP expects only three of them (3 * 4096 = 12288). Is additional processing in model/utils.py required?
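To double-check the arithmetic, here is a minimal sketch that derives the number of concatenated hidden states from the two widths in the error message. The helper name `num_states` is hypothetical and not part of the repository, and the hidden size of 4096 is my assumption based on 12288 = 3 * 4096.

```python
# Widths taken from the RuntimeError message above.
MAT1_COLS = 151552   # feature width actually passed to the layer
MAT2_ROWS = 12288    # input width the layer's weight expects
HIDDEN_SIZE = 4096   # assumed hidden size of the base model


def num_states(width: int, hidden_size: int = HIDDEN_SIZE) -> int:
    """Return how many hidden states of size `hidden_size` make up `width`."""
    count, remainder = divmod(width, hidden_size)
    if remainder:
        raise ValueError(f"{width} is not a multiple of {hidden_size}")
    return count


received = num_states(MAT1_COLS)
expected = num_states(MAT2_ROWS)
print(received, expected)  # 37 3
```

So the input appears to carry 37 hidden states where only 3 are expected, which is why I suspect the selection of the three states is being skipped somewhere.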
Thank you!