Your question
Hi, I'm hitting a number of assertion errors when trying to train a pure Mamba2 model (no attention layers at all) by setting NUM_ATTENTION_HEADS=0.
Can I instead pass
--hybrid-attention-ratio 0 \
--hybrid-mlp-ratio 0 \
and set NUM_ATTENTION_HEADS to an arbitrary dummy value, just to avoid triggering the assertions?
When I do this, I no longer see any of the errors.
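For reference, here is roughly what my launch command looks like. This is only a sketch: it assumes Megatron-LM's pretrain_mamba.py entry point, NUM_ATTENTION_HEADS=8 is a dummy placeholder (any non-zero value, only there to satisfy the argument checks), and the remaining model/data/optimizer args are omitted:

    # dummy, non-zero value only to get past the attention-head assertions;
    # with --hybrid-attention-ratio 0 no attention layers should be built
    NUM_ATTENTION_HEADS=8

    torchrun --nproc_per_node=8 pretrain_mamba.py \
        --hybrid-attention-ratio 0 \
        --hybrid-mlp-ratio 0 \
        --num-attention-heads ${NUM_ATTENTION_HEADS}
        # plus the usual model/data/optimizer args, omitted here

Is this the intended way to configure a pure Mamba2 stack, or does the dummy head count leak into anything else (e.g. parameter shapes or checkpoint layout)?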