[Refactor] refactor trainer fit loop for better code organization #1388
base: main
Conversation
HAOCHENYE commented Dec 24, 2025
- Extract model input preparation logic into _prepare_model_input method
- Move loss_log update logic from trainer to train_engine
- Simplify _log_step method signature by using instance variables
- Fix type hints: consumed_tokens and consumed_img_tokens should be int
- Adjust consumed_samples calculation position for better logic flow
Force-pushed from 8428bfb to 1ef1c72 (Compare)
Pull request overview
This PR refactors the trainer fit loop to improve code organization by extracting model input preparation logic, relocating loss_log update logic, simplifying method signatures, and fixing type hints.
- Extracted model input preparation into a dedicated `_prepare_model_input` method for better code modularity (see the sketch below)
- Moved loss_log update logic from trainer to train_engine for better separation of concerns
- Simplified the `_log_step` method signature by using instance variables instead of passing them as parameters
- Fixed type hints for `consumed_tokens` and `consumed_img_tokens` from float to int, with appropriate conversions
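For illustration, a minimal, self-contained sketch of what the `_prepare_model_input` extraction could look like; the field names (`seq_ctx`, `loss_ctx`) and the loop body are assumptions, not the actual xtuner code:

```python
# Hypothetical sketch of the extraction; field names are assumptions.
class Trainer:
    def _prepare_model_input(self, data_batch: dict) -> dict:
        # All batch-to-engine-input massaging now lives in one method,
        # keeping the fit loop itself short and readable.
        return {"seq_ctx": data_batch["seq_ctx"], "loss_ctx": data_batch["loss_ctx"]}

    def fit(self, dataloader) -> None:
        for data_batch in dataloader:
            model_input = self._prepare_model_input(data_batch)
            print(model_input)  # stand-in for self._engine.train_step(model_input)


Trainer().fit([{"seq_ctx": "ctx0", "loss_ctx": "loss0"}])  # toy batch
```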
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| xtuner/v1/train/trainer.py | Refactored fit loop by extracting _prepare_model_input method, removed loss_log update logic (moved to engine), simplified _log_step signature, adjusted consumed_samples calculation timing, updated _reduce_number_across_rank type hints, and removed unused ModelForwardExtraLogInfo import |
| xtuner/v1/engine/train_engine.py | Updated type hints for consumed_tokens and consumed_img_tokens to int, added loss_log update logic (moved from trainer), and added int conversion for consumed_tokens |
| xtuner/v1/engine/vision_compose_train_engine.py | Added int conversions for consumed_tokens and consumed_img_tokens to match updated type hints |
| other_log["extra_info"] = train_engine_extra_info # type: ignore[assignment] | ||
| other_log["efficient_attn_ratio"] = (efficient_forward_tokens / total_forward_tokens).item() | ||
| other_log["consumed_img_tokens"] = step_consumed_img_tokens | ||
| other_log["consumed_img_tokens"] = int(step_consumed_img_tokens) |
Copilot AI commented Dec 24, 2025
The variable step_consumed_img_tokens is initialized as a float (0.0) on line 148 and may contain a fractional value after division on line 163. Converting to int here will truncate any fractional part. Consider using 0 instead of 0.0 on line 148 and using integer division (//) on line 163 if integer values are required, or document that truncation is intentional.
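A minimal sketch of the alternative Copilot suggests; the accumulation loop, token counts, and divisor below are made up to illustrate the point:

```python
# Keep the counter integral from the start instead of truncating at log time.
step_consumed_img_tokens = 0  # was 0.0

img_token_counts = [1024, 2048, 513]  # made-up per-batch counts
divisor = 2                           # made-up stand-in for the division the comment mentions

for n in img_token_counts:
    # Integer division keeps the running total an int, but it still truncates
    # per batch; only appropriate if the division is known to be exact or
    # truncation is intentional.
    step_consumed_img_tokens += n // divisor

other_log = {"consumed_img_tokens": step_consumed_img_tokens}  # no int() cast needed
print(other_log)
```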
Force-pushed from 1ef1c72 to 4f6412f (Compare)
```python
else:
    extra_info_updated = ModelForwardExtraLogInfo(extra_info)
    extra_info_dict = extra_info_updated.get()
    loss_log.update(extra_info_dict)
```
If extra_info is not updated here, sft/pretrain will no longer print the per-rank loss. Is that the expected behavior?
This part of the logic has been moved to `TrainEngine`; `Trainer` should not be aware of it.
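A self-contained sketch of the separation being described; everything beyond the `TrainEngine` and `ModelForwardExtraLogInfo` names is made up:

```python
# Hypothetical: the engine folds extra forward info into loss_log itself,
# so the trainer only ever sees the merged log dict.
class ModelForwardExtraLogInfo:  # stand-in for the real xtuner class
    def __init__(self, info: dict):
        self._info = info

    def get(self) -> dict:
        return dict(self._info)


class TrainEngine:
    def train_step(self) -> dict:
        loss_log = {"loss": 0.42}          # made-up values
        extra_info = {"rank0_loss": 0.40}  # made-up values
        if extra_info is not None:
            loss_log.update(ModelForwardExtraLogInfo(extra_info).get())
        return loss_log


print(TrainEngine().train_step())  # trainer side: just consumes the merged log
```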
OK
YanhuiDua left a comment
LGTM
/gemini review
jayhenry left a comment
LGTM
```diff
- consumed_tokens: float
- consumed_img_tokens: NotRequired[float]
+ consumed_tokens: int
+ consumed_img_tokens: NotRequired[int]
```
A follow-up PR has already renamed this to step_consumed_tokens, so take care when rebasing.
Per the rename PR, the prefix convention for statistics variables is:
- Spatial (per dp rank vs. reduced sum): per-rank values use the `local_` prefix; reduced values carry no prefix by default.
- Temporal (per step vs. cumulative): use `step_` and `total_`.
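For example, under this convention the four combinations would be named as follows (values are made up):

```python
# Hypothetical examples of the prefix convention; values are made up.
local_step_consumed_tokens = 1_024     # this dp rank, this step
step_consumed_tokens = 8_192           # reduced (summed) across ranks, this step
local_total_consumed_tokens = 50_000   # this dp rank, cumulative over training
total_consumed_tokens = 400_000        # reduced across ranks, cumulative
```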