[InternalMetrics][Fix] fix dummy_fwd && refactor code && add CI #1468

Merged
HAOCHENYE merged 6 commits into InternLM:main from nil0x9:linty/dev-rm-metric-globals
Feb 4, 2026
Conversation

nil0x9 (Contributor) commented Jan 30, 2026

This PR involves several improvements for internal metrics:

  1. (critical) Fixes the error raised when the internal metrics monitor is turned on while chunk loss is enabled. This is because this PR adapted chunk loss to use torch.autograd.grad instead of torch.func.grad_and_value, which requires the loss to be computed on tensors that require grad. The original dummy forward in the internal metrics monitor computed the loss in no_grad mode, causing a runtime error. This PR removes the unnecessary loss_ctx to avoid that.
  2. Refactor: removes the global variables from the original implementation, since we found that the recompile count stays on par when these variables are moved to class attributes.
  3. Adds unit tests for internal metrics.
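
For reference, the failure mode in item 1 can be reproduced outside the codebase. The snippet below is a minimal sketch, not the code touched by this PR: `toy_loss`, `weight`, and `inputs` are hypothetical stand-ins, and it only assumes standard PyTorch 2.x behavior. It shows that `torch.func.grad_and_value` still works inside an enclosing `no_grad` block, while `torch.autograd.grad` raises because no graph was recorded for the loss.

```python
import torch
from torch.func import grad_and_value


def toy_loss(weight, inputs):
    # Hypothetical stand-in for a chunked loss; not the project's loss.
    return (inputs @ weight).pow(2).mean()


torch.manual_seed(0)
weight = torch.randn(4, 2, requires_grad=True)
inputs = torch.randn(8, 4)

# With grad mode on, the loss has a grad_fn, so torch.autograd.grad works.
loss = toy_loss(weight, inputs)
(autograd_grad,) = torch.autograd.grad(loss, weight)

with torch.no_grad():
    # torch.func transforms do not depend on an enclosing no_grad context,
    # so the functional API still returns a gradient here.
    func_grad, func_loss = grad_and_value(toy_loss)(weight.detach(), inputs)

    # The same loss computed eagerly under no_grad has no graph attached,
    # so asking autograd for its gradient raises a RuntimeError.
    no_grad_loss = toy_loss(weight, inputs)
    try:
        torch.autograd.grad(no_grad_loss, weight)
        raised = False
    except RuntimeError:
        raised = True
```

This is why a dummy forward that wraps the loss calculation in no_grad was harmless with the functional API but breaks once the loss path calls `torch.autograd.grad`.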

nil0x9 force-pushed the linty/dev-rm-metric-globals branch 3 times, most recently from 30d15b1 to 91ffdf6, on February 4, 2026 08:31
nil0x9 force-pushed the linty/dev-rm-metric-globals branch from 91ffdf6 to 87840e7 on February 4, 2026 10:39
nil0x9 force-pushed the linty/dev-rm-metric-globals branch from 87840e7 to 4e59041 on February 4, 2026 10:49
HAOCHENYE merged commit e368d87 into InternLM:main on Feb 4, 2026
4 of 5 checks passed