Hi, thank you for your amazing work.
I have been running experiments with the Llama 3.1 base model and it performs well on some of my tasks. I was wondering whether the current implementation supports Mistral-Small-Instruct-2409.
Furthermore, are there any plans to support newer LLMs, as well as multi-GPU training for larger models?
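For reference, here is a minimal sketch of how I would expect to load the model, using the plain Hugging Face transformers API rather than this repo's interface (the Hub ID is my assumption):

```python
# Minimal sketch with the standard Hugging Face transformers API.
# Illustrative only; "mistralai/Mistral-Small-Instruct-2409" is the
# Hub ID I assume for this model, not something from this repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Small-Instruct-2409"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Quick generation check to confirm the weights load and run.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```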
Thank you!