Fix buffer.total_epochs not working in SFT/DPO #140
Conversation
Summary of Changes
Hello @pan-x-c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a bug where the buffer.total_epochs and buffer.total_steps configuration parameters were not being correctly propagated and applied during SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) training. My changes involve modifying the configuration logic to ensure these parameters are correctly routed to the training buffer and introducing new and updated unit tests to verify the fix across both SFT and DPO training modes.
Highlights
- Core Bug Fix for Training Epochs: The primary change is in `trinity/common/config.py`, specifically within the `_check_buffer` method. I've introduced conditional logic to ensure that `buffer.total_epochs` and `buffer.total_steps` are correctly applied to the `trainer_input.experience_buffer` when the system is operating in 'train' mode. Previously, these parameters were unconditionally assigned to `explorer_input.taskset`, which made them ineffective for SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) training. (A sketch of this routing follows the list.)
- New SFT Test Case Added: To validate the fix, I've added a new test class, `TestTrainerSFT`, in `tests/trainer/trainer_test.py`. This test explicitly configures an SFT training run with a specified `buffer.total_epochs` and asserts that training proceeds for the expected number of steps, confirming that the `total_epochs` setting works for SFT.
- DPO Test Case Enhanced: I've also updated the existing DPO test case in `tests/trainer/trainer_test.py`. It now explicitly sets `buffer.total_epochs` and `buffer.total_steps` for the DPO configuration, providing additional verification that these parameters are honored during DPO training.
- Test Environment Configuration: A minor adjustment was made to `tests/template/config.yaml`, changing the cluster configuration from 2 nodes with 2 GPUs per node to 1 node with 4 GPUs per node, aligning the test environment with the new and updated tests.
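To make the routing concrete, here is a minimal sketch of the conditional described above. It is an illustration only: the function name `apply_buffer_budget`, the signature, and the exact `"train"` mode check are assumptions based on this summary, not the actual code in `trinity/common/config.py`.

```python
# Hypothetical sketch of the routing fix described above. The attribute
# names come from the PR summary; the function name, signature, and the
# "train" mode string are assumptions, not the actual Trinity code.
def apply_buffer_budget(config) -> None:
    buffer = config.buffer
    if config.mode == "train":
        # SFT/DPO train mode: the trainer consumes experiences directly,
        # so the epoch/step budget belongs on the experience buffer.
        buffer.trainer_input.experience_buffer.total_epochs = buffer.total_epochs
        buffer.trainer_input.experience_buffer.total_steps = buffer.total_steps
    else:
        # Explore/RL modes: the budget drives the explorer's taskset,
        # which was previously the unconditional (and buggy) behavior.
        buffer.explorer_input.taskset.total_epochs = buffer.total_epochs
        buffer.explorer_input.taskset.total_steps = buffer.total_steps
```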
/unittest-module-trainer
Code Review
This pull request correctly addresses an issue where `buffer.total_epochs` was not being respected for SFT and DPO training modes. The core logic change in `trinity/common/config.py` properly directs this setting to the `trainer_input` for train mode, and the new tests for SFT and DPO effectively validate this fix.
I've identified one critical issue that could lead to a crash if `experience_buffer` is not configured (a sketch of one possible guard follows), and a minor issue with a leftover debug statement in a test. Once these are addressed, the PR will be ready for merging.
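One way to address the crash scenario the review mentions would be an explicit guard before the assignment. This is a hypothetical sketch (including the function name `apply_trainer_budget`), not the reviewer's suggested code or the PR's actual fix.

```python
# Hypothetical guard for the crash the review points out: fail with a
# clear error when experience_buffer is unset in "train" mode, instead
# of raising AttributeError on the assignment below. All names mirror
# the summary above; the guard itself is an assumption.
def apply_trainer_budget(config) -> None:
    experience_buffer = config.buffer.trainer_input.experience_buffer
    if experience_buffer is None:
        raise ValueError(
            "buffer.trainer_input.experience_buffer must be configured "
            "in 'train' mode before total_epochs/total_steps can apply."
        )
    experience_buffer.total_epochs = config.buffer.total_epochs
    experience_buffer.total_steps = config.buffer.total_steps
```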
/unittest-module-trainer
[Test summary and results generated by Github Test Reporter by CTRF 💚]
Description
As the title says
Checklist
Please check the following items before the code is ready to be reviewed.