Add RapidFire AI concurrent multi-config SFT training cookbook #341

Merged
sergiopaniego merged 2 commits into huggingface:main from kamran-rapidfireAI:add-rapidfire-sft-cookbook on Jan 15, 2026

Conversation

@kamran-rapidfireAI kamran-rapidfireAI commented Jan 5, 2026

Summary

This PR adds a new cookbook demonstrating how to fine-tune LLMs using Supervised Fine-Tuning (SFT) with RapidFire AI, enabling concurrent training of multiple configurations on a single GPU.

What's Included

  • Notebook: `rapidfire_sft_multiconfig_training.ipynb` - Complete tutorial for concurrent multi-config SFT training
  • Image: `rapidfire-gantt-1gpu.png` - Diagram showing sequential vs RapidFire AI scheduling

Key Features Demonstrated

  • Concurrent LLM Experimentation: Train 4 LoRA configurations simultaneously using chunk-based scheduling
  • 16-24x Faster Experimentation: Compare multiple hyperparameter combinations in the time it takes to run one sequentially
  • TRL/Transformers Integration: Uses familiar APIs with minimal code changes
  • TensorBoard Monitoring: Real-time visualization of training progress across all configurations
  • Interactive Control Operations: Stop, Resume, Clone-Modify, and Delete runs mid-training
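The idea behind chunk-based scheduling can be illustrated with a small, library-free sketch. This is not RapidFire AI's implementation or API; the config names and chunk count below are hypothetical, and the code only compares the order in which (config, chunk) pairs would be visited on a single GPU. With interleaving, every configuration has produced early metrics after the first chunk instead of waiting for earlier configurations to finish.

```python
def sequential_order(configs, num_chunks):
    """Each config trains on every chunk before the next config starts."""
    return [(cfg, chunk) for cfg in configs for chunk in range(num_chunks)]


def chunked_order(configs, num_chunks):
    """Every config trains on chunk k before any config moves on to chunk k + 1."""
    return [(cfg, chunk) for chunk in range(num_chunks) for cfg in configs]


def steps_until_all_seen(order, configs):
    """Count scheduling steps until every config has trained on at least one chunk."""
    seen = set()
    for step, (cfg, _chunk) in enumerate(order, start=1):
        seen.add(cfg)
        if len(seen) == len(configs):
            return step
    return None


# Hypothetical names for four LoRA configurations (illustration only).
configs = ["lora_a", "lora_b", "lora_c", "lora_d"]
num_chunks = 4  # assumed number of dataset chunks

print(steps_until_all_seen(sequential_order(configs, num_chunks), configs))
print(steps_until_all_seen(chunked_order(configs, num_chunks), configs))
```

Both schedules do the same total amount of work (16 config-chunk steps here); the difference is only in how soon all configurations become comparable.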

Checklist

  • Notebook filename is lowercase with underscores
  • Author attribution included after first header
  • Added to `_toctree.yml` under LLM Recipes
  • Added to `index.md` Latest notebooks section
  • Cell outputs cleared
  • Image uploaded to `huggingface/cookbook-images` dataset (included in PR for now, can be moved on request)

Note on Image

The image `rapidfire-gantt-1gpu.png` is included in this PR. Per the contribution guidelines, it should be uploaded to the `huggingface/cookbook-images` dataset. I can coordinate with the team to upload it there, or it can be done during the merge process.

cc @merveenoyan @stevhliu

This cookbook demonstrates how to fine-tune LLMs using Supervised Fine-Tuning (SFT)
with RapidFire AI, enabling concurrent training of multiple configurations on a single GPU.

- Uses TRL/Transformers APIs with LoRA adapters
- Shows chunk-based scheduling for 16-24x faster experimentation
- Includes TensorBoard integration for real-time monitoring
- Demonstrates Interactive Control Operations (IC Ops) for mid-training management

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Member

@stevhliu stevhliu left a comment

cool, lgtm! please upload your images to huggingface/cookbook-images :)

pinging @sergiopaniego for a quick look in case you have any feedback!

@kamran-rapidfireAI
Author

Hi! I tried to upload the image to `huggingface/cookbook-images` but I'm getting a 403 Forbidden error - it seems external contributors don't have write access to that dataset.

```
403 Forbidden: Authorization error.
Cannot access content at: https://huggingface.co/datasets/huggingface/cookbook-images.git/info/lfs/objects/batch.
Make sure your token has the correct permissions.
```

Could you please either:

  1. Grant temporary write access to my HuggingFace account (`kbigdelysh`) so I can upload, or
  2. Upload it on my behalf - the image is included in this PR at `notebooks/en/rapidfire-gantt-1gpu.png`

Once uploaded, I'll update the notebook to reference the correct URL. Let me know which option works best!

Thanks!

@stevhliu
Member

stevhliu commented Jan 8, 2026

can you try opening a PR on the huggingface/cookbook-images repo here and then i can merge for you?
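For readers hitting the same 403, one way to open a pull request against a dataset repo without write access is the `create_pr=True` option of `HfApi.upload_file` in `huggingface_hub`. The sketch below is a hedged illustration, not the exact steps taken in this thread: it assumes `huggingface_hub` is installed and a token is configured, and the destination path inside the dataset (the bare file name at the repo root) is an assumption, as is the helper naming.

```python
from pathlib import Path


def build_upload_args(local_path, repo_id="huggingface/cookbook-images"):
    """Build keyword arguments for HfApi.upload_file that open a PR instead of pushing."""
    return {
        "path_or_fileobj": str(local_path),
        # Assumption: store the file under its own name at the dataset root.
        "path_in_repo": Path(local_path).name,
        "repo_id": repo_id,
        "repo_type": "dataset",
        # create_pr=True asks the Hub to open a pull request with the change.
        "create_pr": True,
    }


def upload_as_pr(local_path):
    """Upload the file as a Hub pull request (needs network access and a logged-in token)."""
    from huggingface_hub import HfApi  # imported lazily so the helper above has no dependency

    return HfApi().upload_file(**build_upload_args(local_path))


args = build_upload_args("notebooks/en/rapidfire-gantt-1gpu.png")
print(args["path_in_repo"], args["repo_type"], args["create_pr"])
```

Calling `upload_as_pr("notebooks/en/rapidfire-gantt-1gpu.png")` would then create the pull request for a maintainer to merge.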

@kamran-rapidfireAI
Author

Done! I've uploaded the image to `huggingface/cookbook-images` and removed it from this PR.

The notebook now references the correct URL. Thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kamran-rapidfireAI
Author

@stevhliu Does everything look good to you, or do I need to do anything else for this PR to move forward? Thanks

Member

@sergiopaniego sergiopaniego left a comment

Thanks a lot for this contribution!
The content is really nice. A small tip for future occasions: please consider a more subtle and neutral focus in the recipes

@sergiopaniego sergiopaniego merged commit 06ff7de into huggingface:main Jan 15, 2026
1 check passed