Adding SFT Notebook #424
The notebook doesn't render in diffs, so leaving some suggestions here:
HamidShojanazeri
left a comment
Thanks for the PR @HosseinKaviani-H. It is hard to load the notebook in the PR here, so I am leaving my notebook comments below:
- Opening message, "This notebook allows you to configure and run SFT training without any YAML files!": it would be good to align this with the Forge messaging, something along these lines: "This notebook introduces a seamless fine-tuning experience by abstracting away the complexities of distributed training, allowing you to configure and run SFT jobs across multiple nodes."
- Please add an explanation at the top (intro) of what the user will see here: dataset, hardware requirements, capabilities, etc.
- The "Benefits" section might not be necessary; it should be fine to remove it or replace it with some value props of Forge.
- Please add a reference to the Forge docs so readers can educate themselves.
- 8-step configuration: can we please remove "step" from the text? We can keep "Step 1" for the configurations, followed by the different cells/sections.
- The "Alternative: Manual Lifecycle Control" section needs more clarification and explanation of actors and how this separation helps.
@init27 Thanks for your comments Sanyam. I have implemented them.
@HamidShojanazeri Thanks for the helpful comments Hamid. I have addressed your comments and updated the PR.
…g the extra steps
- Removed _init_dist() method and its call from BaseForgeActor.__init__()
- This method was removed in upstream PR meta-pytorch#561
- Distributed initialization is now handled by the provisioner
- Fixed linting issues: removed unused 'os' import, combined __init__ docstring with class docstring
- Keeps SFT_Notebook branch compatible with latest upstream changes
dfdcd56 to 8774767
- Added setup_eval_dataloaders() function to utils.py for multi-dataset evaluation
- Added evaluate() method to TrainerActor for periodic and final evaluation
- Added forward_backward_eval() for evaluation forward passes (no backprop)
- Evaluation supports:
  - Multiple eval datasets
  - Periodic evaluation during training (eval_every_n_steps)
  - Final evaluation at end of training
  - Macro/micro average loss across datasets
  - StopAfterOneEpoch for proper epoch boundaries
  - max_eval_steps cap support
- Fixed docstring to comply with pydoclint
- Now matches full evaluation capabilities from main.py
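As a rough illustration of the "macro/micro average loss across datasets" item above, here is a minimal sketch of the two aggregations. It is not the PR's code: the `EvalResult` container and both helper functions are hypothetical, and it assumes the micro average is weighted by token count (the PR text does not say whether the weighting is per token or per sample).

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical container: summed loss and token count for one eval dataset."""
    name: str
    total_loss: float  # sum of per-token losses over the dataset
    num_tokens: int    # number of tokens the loss was summed over


def macro_average(results):
    """Mean of the per-dataset mean losses, so every dataset counts equally."""
    per_dataset = [r.total_loss / r.num_tokens for r in results]
    return sum(per_dataset) / len(per_dataset)


def micro_average(results):
    """Total loss over total tokens, so every token counts equally."""
    total_loss = sum(r.total_loss for r in results)
    total_tokens = sum(r.num_tokens for r in results)
    return total_loss / total_tokens


results = [
    EvalResult("small_set", total_loss=200.0, num_tokens=100),  # mean loss 2.0
    EvalResult("large_set", total_loss=900.0, num_tokens=900),  # mean loss 1.0
]

macro = macro_average(results)  # (2.0 + 1.0) / 2
micro = micro_average(results)  # 1100.0 / 1000
```

The two numbers diverge whenever dataset sizes differ: the macro average lets a small dataset pull the result as strongly as a large one, while the micro average is dominated by the larger dataset.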
Adds an interactive Jupyter notebook and supporting utilities to configure and run SFT training without YAML files, making experimentation more accessible.
What's New
Interactive Configuration Notebook (interactive_config_notebook.ipynb)
- … await run_actor()) or manual lifecycle control

Supporting Files
- spawn_actor.py - Actor spawning and lifecycle management
- trainer_actor.py - Trainer actor implementation
- actor.py - Base actor abstractions
- utils.py - Helper functions
- README.md - Documentation

Example Usage

    # Configure in notebook cells
    model_config = {"name": "llama3", "flavor": "8B", ...}
    training_config = {"local_batch_size": 1, "steps": 1000, ...}

    # Run training
    await run_actor(TrainerActor, cfg)

Benefits
✅ No YAML editing required
✅ Interactive experimentation
✅ Educational with clear documentation
✅ Backward compatible - CLI workflow unchanged
✅ Production-ready
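The two execution styles mentioned under "What's New" (a single `await run_actor(...)` call versus manual lifecycle control) could look roughly like the sketch below. Everything in it is an assumption made for illustration: `ToyActor`, its `setup`/`run`/`cleanup` methods, and `run_actor_sketch` are hypothetical stand-ins, not the PR's `TrainerActor` or `run_actor`, and the toy actor only records events instead of training.

```python
import asyncio


class ToyActor:
    """Hypothetical stand-in for an actor; it records events instead of training."""

    def __init__(self, cfg):
        self.cfg = cfg
        self.events = []

    async def setup(self):
        self.events.append("setup")

    async def run(self):
        for step in range(self.cfg["steps"]):
            self.events.append(f"step {step}")

    async def cleanup(self):
        self.events.append("cleanup")


async def run_actor_sketch(actor_cls, cfg):
    """One-call path: create the actor, set it up, run it, and always clean up."""
    actor = actor_cls(cfg)
    try:
        await actor.setup()
        await actor.run()
    finally:
        await actor.cleanup()
    return actor


async def manual_lifecycle(cfg):
    """Manual path: the caller drives each phase and can inspect state in between."""
    actor = ToyActor(cfg)
    await actor.setup()
    # A notebook user could inspect or adjust the actor here before training starts.
    await actor.run()
    await actor.cleanup()
    return actor


# A notebook can use top-level `await`; a plain script needs asyncio.run().
one_call = asyncio.run(run_actor_sketch(ToyActor, {"steps": 2}))
manual = asyncio.run(manual_lifecycle({"steps": 2}))
```

The one-call helper trades flexibility for safety, since cleanup runs even if training raises; the manual path leaves that responsibility with the caller in exchange for the ability to pause between phases.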
Compatibility