Windows is a tricky beast for running this model. I decided to test it on the only box I have with the capacity - a Windows 11 machine.
This is as far as I can get:
Downloaded the 7-8GB Spiking Brain-7B model weights
Installed PyTorch 2.6.0 with CUDA 12.4 support
Set up all CUDA paths and Triton compiler
Model loads successfully - I can see GPU utilization between 30% and 90%
Triton CUDA kernels compile correctly for my RTX 3080 Ti
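For anyone retracing these steps, a quick sanity check along these lines should confirm the PyTorch, CUDA and Triton parts of the setup before loading the model. This is only a sketch: `environment_report` and `has_module` are names I made up for illustration, and it only reports what is installed; it does not test kernel compilation.

```python
import importlib.util
import platform


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def environment_report() -> dict:
    """Collect basic facts about the local PyTorch / CUDA / Triton setup."""
    report = {
        "os": platform.system(),
        "torch_installed": has_module("torch"),
        "triton_installed": has_module("triton"),
        "cuda_available": False,
        "gpu_name": None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the script still runs without it

        report["torch_version"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
        if torch.cuda.is_available():
            report["cuda_available"] = True
            report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

On a setup like the one described here, I would expect `torch_version` to start with 2.6.0, `cuda_build` to read 12.4, and `gpu_name` to mention the RTX 3080 Ti.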
Current Blocker:
The model requires flash-attn (Flash Attention 2.7.3), which doesn't officially support Windows. This is blocking the forward pass.
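To tell whether you are about to hit the same wall, a check like the following reports whether flash-attn is importable before the model gets as far as the forward pass. It is a rough sketch: `flash_attn_status` is a hypothetical helper name, and I am assuming the package imports as `flash_attn` and is distributed as `flash-attn`.

```python
import importlib.util
import platform
from importlib import metadata


def flash_attn_status() -> str:
    """Describe whether flash-attn is installed, with a hint on Windows."""
    try:
        found = importlib.util.find_spec("flash_attn") is not None
    except (ImportError, ValueError):
        found = False

    if not found:
        hint = ""
        if platform.system() == "Windows":
            # Assumption based on the blocker described above.
            hint = " (not officially supported on native Windows)"
        return "flash_attn: not installed" + hint

    try:
        version = metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        version = "unknown"
    return f"flash_attn: installed, version {version}"


if __name__ == "__main__":
    print(flash_attn_status())
```

If this prints "not installed" on Windows, the model will most likely load but then fail once the attention layers are called, which matches what I am seeing.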
The answer may be WSL2 on Windows 11 for folks like me who are (for the moment) stuck with their best hardware in a Windows environment. I'll update this comment if it works, as it may help others.