Hi, thank you for your amazing work.
I have been running experiments with the Llama 3.1 base model and it performs well on some of my tasks. I was wondering whether the current implementation supports Mistral-Small-Instruct-2409.
Furthermore, are there any plans to support newer LLMs, as well as multi-GPU training for larger models?
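For reference, here is a minimal sketch of how I would expect to load the model, using the plain Hugging Face transformers API rather than this repo's interface (the Hub ID is my assumption):

```python
# Minimal sketch with the standard Hugging Face transformers API.
# Illustrative only; "mistralai/Mistral-Small-Instruct-2409" is the
# Hub ID I assume for this model, not something from this repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Small-Instruct-2409"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Quick generation check to confirm the weights load and run.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```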
Thank you!