Your question
Hi, I'm hitting a number of assertion errors when trying to train a pure Mamba2 model (no attention layers at all) by setting NUM_ATTENTION_HEADS=0.
Can I instead pass
--hybrid-attention-ratio 0 \
--hybrid-mlp-ratio 0 \
and set NUM_ATTENTION_HEADS to an arbitrary dummy value, just to avoid triggering the assertions?
When I do this, I no longer see any of the errors.
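For reference, here is roughly what my launch command looks like. This is only a sketch: it assumes Megatron-LM's pretrain_mamba.py entry point, NUM_ATTENTION_HEADS=8 is a dummy placeholder (any non-zero value, only there to satisfy the argument checks), and the remaining model/data/optimizer args are omitted:

    # dummy, non-zero value only to get past the attention-head assertions;
    # with --hybrid-attention-ratio 0 no attention layers should be built
    NUM_ATTENTION_HEADS=8

    torchrun --nproc_per_node=8 pretrain_mamba.py \
        --hybrid-attention-ratio 0 \
        --hybrid-mlp-ratio 0 \
        --num-attention-heads ${NUM_ATTENTION_HEADS}
        # plus the usual model/data/optimizer args, omitted here

Is this the intended way to configure a pure Mamba2 stack, or does the dummy head count leak into anything else (e.g. parameter shapes or checkpoint layout)?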