Add RapidFire AI concurrent multi-config SFT training cookbook #341
This cookbook demonstrates how to fine-tune LLMs using Supervised Fine-Tuning (SFT) with RapidFire AI, enabling concurrent training of multiple configurations on a single GPU.
- Uses TRL/Transformers APIs with LoRA adapters
- Shows chunk-based scheduling for 16-24x faster experimentation
- Includes TensorBoard integration for real-time monitoring
- Demonstrates Interactive Control Operations (IC Ops) for mid-training management

cc @merveenoyan @stevhliu
stevhliu
left a comment
cool, lgtm! please upload your images to huggingface/cookbook-images :)
pinging @sergiopaniego for a quick look in case you have any feedback!
|
Hi! I tried to upload the image to `huggingface/cookbook-images` but I'm getting a 403 Forbidden error. It seems external contributors don't have write access to that dataset. Could you please either:
Once uploaded, I'll update the notebook to reference the correct URL. Let me know which option works best! Thanks!
|
can you try opening a PR on the |
|
Done! I've uploaded the image to `huggingface/cookbook-images` and removed it from this PR.
The notebook now references the correct URL. Thanks!
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
|
@stevhliu Does everything look good to you, or do I need to do anything else for this PR to move forward? Thanks
sergiopaniego
left a comment
Thanks a lot for this contribution!
The content is really nice. A small tip for future occasions: please consider a more subtle and neutral focus in the recipes.
Summary
This PR adds a new cookbook demonstrating how to fine-tune LLMs using Supervised Fine-Tuning (SFT) with RapidFire AI, enabling concurrent training of multiple configurations on a single GPU.
What's Included
- `rapidfire_sft_multiconfig_training.ipynb` - Complete tutorial for concurrent multi-config SFT training
- `rapidfire-gantt-1gpu.png` - Diagram showing sequential vs RapidFire AI scheduling
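
For reviewers who want a feel for what the scheduling diagram contrasts, here is a toy sketch in plain Python. It does not use the RapidFire AI API. It assumes that "chunk-based scheduling" means the dataset is split into chunks and the configurations take turns on the GPU one chunk at a time; the config names, the chunk count, and all function names are hypothetical and exist only for illustration.

```python
def sequential_schedule(configs, num_chunks):
    """Train each config on every chunk before starting the next config."""
    return [(cfg, chunk) for cfg in configs for chunk in range(num_chunks)]


def interleaved_schedule(configs, num_chunks):
    """Run every config on one chunk before moving on to the next chunk."""
    return [(cfg, chunk) for chunk in range(num_chunks) for cfg in configs]


def slots_until_all_seen(schedule, configs):
    """Count schedule slots until every config has finished at least one chunk."""
    seen = set()
    for slot, (cfg, _chunk) in enumerate(schedule, start=1):
        seen.add(cfg)
        if len(seen) == len(configs):
            return slot
    return None


# Hypothetical config names and chunk count, chosen only for the example.
configs = ["lora_r8", "lora_r16", "lora_r32", "lora_r64"]
num_chunks = 4

seq = sequential_schedule(configs, num_chunks)
inter = interleaved_schedule(configs, num_chunks)

print(slots_until_all_seen(seq, configs))    # 13
print(slots_until_all_seen(inter, configs))  # 4
```

Both schedules do the same total amount of work (16 config-chunk slots here). The difference in this toy model is only how early every configuration has produced a first set of metrics to compare, which is the point at which weak configurations could be stopped early.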
Key Features Demonstrated
Checklist
Note on Image
The image `rapidfire-gantt-1gpu.png` is included in this PR. Per the contribution guidelines, it should be uploaded to the `huggingface/cookbook-images` dataset. I can coordinate with the team to upload it there, or it can be done during the merge process.
cc @merveenoyan @stevhliu