Conversation

@pan-x-c (Collaborator) commented Jul 23, 2025

Description

As the title says: make `buffer.total_epochs` and `buffer.total_steps` take effect in SFT and DPO training modes.

Checklist

Please check the following items before the code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@gemini-code-assist (Contributor, bot) left a comment

Summary of Changes

Hello @pan-x-c, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug where the buffer.total_epochs and buffer.total_steps configuration parameters were not being correctly propagated and applied during SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) training. My changes involve modifying the configuration logic to ensure these parameters are correctly routed to the training buffer and introducing new and updated unit tests to verify the fix across both SFT and DPO training modes.

Highlights

  • Core Bug Fix for Training Epochs: The primary change is in trinity/common/config.py, specifically within the _check_buffer method. I've introduced conditional logic to ensure that buffer.total_epochs and buffer.total_steps are correctly applied to the trainer_input.experience_buffer when the system is operating in 'train' mode. Previously, these parameters were unconditionally assigned to explorer_input.taskset, which caused them to be ineffective for SFT (Supervised Fine-Tuning) and DPO (Direct Preference Optimization) training.
  • New SFT Test Case Added: To validate the fix, I've added a new test class, TestTrainerSFT, in tests/trainer/trainer_test.py. This test explicitly configures an SFT training run with a specified buffer.total_epochs and asserts that the training proceeds for the expected number of steps, confirming the correct functionality of the total_epochs setting for SFT.
  • DPO Test Case Enhanced: I've also updated the existing DPO test case within tests/trainer/trainer_test.py. This update now explicitly sets buffer.total_epochs and buffer.total_steps for the DPO configuration, providing additional verification that these parameters are correctly honored during DPO training.
  • Test Environment Configuration: A minor adjustment was made to tests/template/config.yaml, changing the cluster configuration from 2 nodes with 2 GPUs per node to 1 node with 4 GPUs per node. This change likely optimizes or aligns the test environment setup for the new and updated tests.
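The routing described in the first highlight can be sketched as follows. This is a simplified, illustrative stand-in, not the actual `_check_buffer` implementation in `trinity/common/config.py`; the dataclass shapes and the `mode` values are assumptions based on the field names mentioned in the summary.

```python
# Simplified sketch of routing buffer.total_epochs / buffer.total_steps.
# The dataclasses below are illustrative stand-ins for the real config types.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StorageConfig:
    total_epochs: int = 1
    total_steps: Optional[int] = None


@dataclass
class ExplorerInput:
    taskset: StorageConfig = field(default_factory=StorageConfig)


@dataclass
class TrainerInput:
    experience_buffer: Optional[StorageConfig] = None


@dataclass
class BufferConfig:
    total_epochs: int = 1
    total_steps: Optional[int] = None
    explorer_input: ExplorerInput = field(default_factory=ExplorerInput)
    trainer_input: TrainerInput = field(default_factory=TrainerInput)


def check_buffer(mode: str, buffer: BufferConfig) -> None:
    """Route epoch/step totals to whichever input actually drives the run."""
    if mode == "train":
        # SFT/DPO: the trainer reads directly from the experience buffer,
        # so the totals must land there, not on the explorer's taskset.
        if buffer.trainer_input.experience_buffer is not None:
            buffer.trainer_input.experience_buffer.total_epochs = buffer.total_epochs
            buffer.trainer_input.experience_buffer.total_steps = buffer.total_steps
    else:
        # RL-style modes: the explorer's taskset controls iteration,
        # which was previously the unconditional (and buggy) behavior.
        buffer.explorer_input.taskset.total_epochs = buffer.total_epochs
        buffer.explorer_input.taskset.total_steps = buffer.total_steps
```

The key point is the `mode == "train"` branch: before the fix, the totals were always written to `explorer_input.taskset`, which SFT and DPO runs never consult.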
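The SFT test's assertion ("training proceeds for the expected number of steps") presumably rests on arithmetic along these lines. The helper and all numbers here are hypothetical, not taken from `tests/trainer/trainer_test.py`:

```python
from typing import Optional


def expected_train_steps(dataset_size: int, batch_size: int,
                         total_epochs: int,
                         total_steps: Optional[int] = None) -> int:
    """Steps the trainer should execute.

    An explicit total_steps takes precedence; otherwise each epoch
    contributes one step per full batch of the dataset.
    """
    if total_steps is not None:
        return total_steps
    return (dataset_size // batch_size) * total_epochs
```

For example, a 128-sample dataset with batch size 32 and `total_epochs=2` should yield 8 training steps, which is the kind of value the test can assert against the trainer's final step counter.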

@pan-x-c (Collaborator, Author) commented Jul 23, 2025

/unittest-module-trainer

@gemini-code-assist (Contributor, bot) left a comment

Code Review

This pull request correctly addresses an issue where buffer.total_epochs was not being respected for SFT and DPO training modes. The core logic change in trinity/common/config.py properly directs this setting to the trainer_input for train mode, and the new tests for SFT and DPO effectively validate this fix.

I've identified one critical issue that could lead to a crash if experience_buffer is not configured, and a minor issue with a leftover debug statement in a test. Once these are addressed, the PR will be ready for merging.
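The "critical issue" the reviewer flags is that the new `train`-mode branch can dereference a `None` `experience_buffer`. A minimal guard of the kind such a review typically asks for might look like this (the function name and message wording are hypothetical, not from the actual fix):

```python
def require_experience_buffer(experience_buffer, mode: str):
    """Fail fast with a clear error instead of crashing later on None.

    In 'train' mode (SFT/DPO) the trainer consumes the experience buffer
    directly, so a missing buffer is a configuration error.
    """
    if mode == "train" and experience_buffer is None:
        raise ValueError(
            "buffer.trainer_input.experience_buffer must be set when mode='train'"
        )
    return experience_buffer
```

Raising a `ValueError` during config validation surfaces the misconfiguration at startup, rather than as an `AttributeError` deep inside the trainer.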

@pan-x-c (Collaborator, Author) commented Jul 23, 2025

/unittest-module-trainer

@github-actions

Summary

Tests 📝: 9 | Passed ✅: 9 | Failed ❌: 0 | Skipped ⏭️: 0 | Other ❓: 0 | Flaky 🍂: 0 | Duration ⏱️: 821ms

Tests

Test Name | Status | Duration
tests/trainer/trainer_test.py::BaseTrainerCase::test_trainer | ✅ | 1ms
tests/trainer/trainer_test.py::TestTrainerCountdown::test_trainer | ✅ | 256ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer | ✅ | 98ms
tests/trainer/trainer_test.py::TestTrainerGSM8K::test_trainer | ✅ | 60ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer | ✅ | 110ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer | ✅ | 45ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer | ✅ | 42ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_0_queue | ✅ | 100ms
tests/trainer/trainer_test.py::TestFullyAsyncMode::test_fully_async_mode_1_priority_queue | ✅ | 98ms

Github Test Reporter by CTRF 💚

@pan-x-c pan-x-c merged commit 36344ec into agentscope-ai:main Jul 23, 2025
1 check passed