Fix buffer.total_epochs not working in SFT/DPO #140
Conversation
Summary of Changes
Hello @pan-x-c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a bug where the buffer.total_epochs and buffer.total_steps configuration parameters were not being correctly propagated and applied during SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) training. My changes involve modifying the configuration logic to ensure these parameters are correctly routed to the training buffer and introducing new and updated unit tests to verify the fix across both SFT and DPO training modes.
Highlights
- Core Bug Fix for Training Epochs: The primary change is in `trinity/common/config.py`, specifically within the `_check_buffer` method. I've introduced conditional logic to ensure that `buffer.total_epochs` and `buffer.total_steps` are correctly applied to the `trainer_input.experience_buffer` when the system is operating in 'train' mode. Previously, these parameters were unconditionally assigned to `explorer_input.taskset`, which made them ineffective for SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) training. (A sketch of this routing follows the list.)
- New SFT Test Case Added: To validate the fix, I've added a new test class, `TestTrainerSFT`, in `tests/trainer/trainer_test.py`. This test explicitly configures an SFT training run with a specified `buffer.total_epochs` and asserts that training proceeds for the expected number of steps, confirming that the `total_epochs` setting works for SFT.
- DPO Test Case Enhanced: I've also updated the existing DPO test case in `tests/trainer/trainer_test.py`. It now explicitly sets `buffer.total_epochs` and `buffer.total_steps` for the DPO configuration, providing additional verification that these parameters are honored during DPO training.
- Test Environment Configuration: A minor adjustment was made to `tests/template/config.yaml`, changing the cluster configuration from 2 nodes with 2 GPUs per node to 1 node with 4 GPUs per node, aligning the test environment with the new and updated tests.
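To make the routing concrete, here is a minimal sketch of the conditional described above. It is an illustration only: the function name `apply_buffer_budget`, the signature, and the exact `"train"` mode check are assumptions based on this summary, not the actual code in `trinity/common/config.py`.

```python
# Hypothetical sketch of the routing fix described above. The attribute
# names come from the PR summary; the function name, signature, and the
# "train" mode string are assumptions, not the actual Trinity code.
def apply_buffer_budget(config) -> None:
    buffer = config.buffer
    if config.mode == "train":
        # SFT/DPO train mode: the trainer consumes experiences directly,
        # so the epoch/step budget belongs on the experience buffer.
        buffer.trainer_input.experience_buffer.total_epochs = buffer.total_epochs
        buffer.trainer_input.experience_buffer.total_steps = buffer.total_steps
    else:
        # Explore/RL modes: the budget drives the explorer's taskset,
        # which was previously the unconditional (and buggy) behavior.
        buffer.explorer_input.taskset.total_epochs = buffer.total_epochs
        buffer.explorer_input.taskset.total_steps = buffer.total_steps
```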
/unittest-module-trainer
Code Review
This pull request correctly addresses an issue where `buffer.total_epochs` was not being respected for SFT and DPO training modes. The core logic change in `trinity/common/config.py` properly directs this setting to the `trainer_input` for train mode, and the new tests for SFT and DPO effectively validate this fix.
I've identified one critical issue that could lead to a crash if `experience_buffer` is not configured (a sketch of one possible guard follows), and a minor issue with a leftover debug statement in a test. Once these are addressed, the PR will be ready for merging.
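One way to address the crash scenario the review mentions would be an explicit guard before the assignment. This is a hypothetical sketch (including the function name `apply_trainer_budget`), not the reviewer's suggested code or the PR's actual fix.

```python
# Hypothetical guard for the crash the review points out: fail with a
# clear error when experience_buffer is unset in "train" mode, instead
# of raising AttributeError on the assignment below. All names mirror
# the summary above; the guard itself is an assumption.
def apply_trainer_budget(config) -> None:
    experience_buffer = config.buffer.trainer_input.experience_buffer
    if experience_buffer is None:
        raise ValueError(
            "buffer.trainer_input.experience_buffer must be configured "
            "in 'train' mode before total_epochs/total_steps can apply."
        )
    experience_buffer.total_epochs = config.buffer.total_epochs
    experience_buffer.total_steps = config.buffer.total_steps
```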
/unittest-module-trainer
[Test summary and results generated by Github Test Reporter by CTRF 💚]
Description
As the title says
Checklist
Please check the following items before the code is ready to be reviewed.