feat(pt): add cosine annealing lr scheduler #5133
Conversation
Pull request overview
This PR adds support for cosine annealing learning rate scheduling to the PyTorch backend. The implementation provides an alternative to the existing exponential decay scheduler with a standard cosine annealing formula that smoothly decreases the learning rate from start_lr to stop_lr over the training period.
Key changes:
- Added `LearningRateCosine` class implementing cosine annealing with the formula `lr = stop_lr + (start_lr - stop_lr) * 0.5 * (1 + cos(π * step / stop_steps))` (see the sketch after this list)
- Extended configuration schema to accept "cosine" as a learning rate type option (PyTorch-only)
- Refactored training logic to support multiple learning rate scheduler types with improved error handling
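A minimal sketch of such a scheduler, assembled from the formula above and the snippets quoted later in this review; the class layout in the PR itself may differ slightly:

```python
from typing import Any

import numpy as np


class LearningRateCosine:
    """Cosine-annealed learning rate, decaying from start_lr to stop_lr over stop_steps."""

    def __init__(self, start_lr: float, stop_lr: float, stop_steps: int, **kwargs: Any) -> None:
        self.start_lr = start_lr
        self.stop_lr = stop_lr
        # Clamp to at least one step to avoid division by zero in value().
        self.stop_steps = max(1, stop_steps)

    def value(self, step: int) -> np.float64:
        """Learning rate at the given step; plateaus at stop_lr beyond stop_steps."""
        clamped_step = min(step, self.stop_steps)
        cosine = 0.5 * (1.0 + np.cos(np.pi * clamped_step / self.stop_steps))
        return np.float64(self.stop_lr + (self.start_lr - self.stop_lr) * cosine)
```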
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| deepmd/dpmodel/utils/learning_rate.py | Implements the core LearningRateCosine class with cosine annealing formula |
| deepmd/utils/argcheck.py | Adds configuration arguments for cosine scheduler with start_lr and stop_lr parameters |
| deepmd/pt/utils/learning_rate.py | Exports LearningRateCosine for PyTorch backend |
| deepmd/pt/train/training.py | Refactors get_lr function to handle both exponential and cosine schedulers dynamically |
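To illustrate the `get_lr` dispatch mentioned in the last table row, a factory of this shape is plausible; the import paths and exact signatures below are assumptions, not the PR's verbatim code:

```python
# Illustrative only: import locations and signatures are assumed, not verified against the PR.
from deepmd.pt.utils.learning_rate import (
    LearningRateCosine,
    LearningRateExp,
)


def get_lr(lr_params: dict):
    """Build a learning-rate scheduler from the training config's learning_rate section."""
    lr_type = lr_params.get("type", "exp")
    if lr_type == "exp":
        return LearningRateExp(**lr_params)
    if lr_type == "cosine":
        return LearningRateCosine(**lr_params)
    raise ValueError(f"Unknown learning rate type: {lr_type}")
```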
📝 Walkthrough

Added a cosine-annealing learning-rate scheduler (LearningRateCosine), integrated it into the training LR factory and public PT API, extended argument validation to accept a cosine variant, and added unit tests to verify cosine schedule behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Trainer as Trainer (training.py)
    participant LRFactory as get_lr()
    participant LRClass as LearningRateCosine / LearningRateExp
    participant Optim as Optimizer
    User->>Trainer: start training
    Trainer->>LRFactory: build lr from lr_params (type, start_lr, stop_steps, stop_lr/stop_lr_factor)
    alt type == "cosine"
        LRFactory->>LRClass: instantiate LearningRateCosine(config)
    else type == "exp"
        LRFactory->>LRClass: instantiate LearningRateExp(config)
    end
    loop per training step
        Trainer->>LRClass: lr = value(step)
        LRClass-->>Trainer: lr (np.float64)
        Trainer->>Optim: set lr and step optimizer
        Optim-->>Trainer: step result
    end
    Note right of LRClass: Cosine annealing computed as\nstop_lr + (start_lr - stop_lr) * 0.5*(1+cos(pi*clamped_step/stop_steps))
```
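The per-step interaction in the diagram can be approximated in plain PyTorch as in the toy loop below; the model, data, and hyperparameters are placeholders, and only the learning-rate handling mirrors the diagram, not the trainer's actual code:

```python
import torch

# Toy stand-ins for the real model and data.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-3)
lr_schedule = LearningRateCosine(start_lr=1.0e-3, stop_lr=1.0e-8, stop_steps=1000)  # sketch class from above

for step in range(1000):
    # Query the scheduler and push the new learning rate into the optimizer.
    cur_lr = float(lr_schedule.value(step))
    for group in optimizer.param_groups:
        group["lr"] = cur_lr

    optimizer.zero_grad()
    x = torch.randn(8, 4)
    loss = (model(x) ** 2).mean()
    loss.backward()
    optimizer.step()
```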
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 0
🧹 Nitpick comments (2)
source/tests/pt/test_lr.py (1)
106-120: Consider expanding test coverage to match the comprehensiveness of existing tests.

While the basic curve test validates key behaviors (start, end, plateau, mid-point), it tests only a single parameter configuration with hardcoded values. The existing `TestLearningRate` class (lines 18-104) demonstrates more comprehensive testing with:
- Multiple parameter combinations using `np.arange`
- Edge case validation
- Consistency checks across different configurations

Consider adding:
- Tests with varied `start_lr`, `stop_lr`, and `stop_steps` combinations
- Edge cases: `stop_steps=1`, very large `stop_steps`, `step` exceeding `stop_steps`
- Verification that the curve is monotonically decreasing (for `start_lr > stop_lr`)
- More intermediate points to verify the smoothness of the cosine curve
💡 Example: Enhanced test with multiple configurations
```python
def test_multiple_configurations(self) -> None:
    """Test cosine annealing with various parameter combinations."""
    start_lrs = [1.0, 0.01, 0.001]
    stop_lrs = [0.1, 0.0001, 1e-8]
    stop_steps_list = [10, 100, 1000]
    for start_lr in start_lrs:
        for stop_lr in stop_lrs:
            if stop_lr >= start_lr:
                continue
            for stop_steps in stop_steps_list:
                lr = LearningRateCosine(start_lr, stop_lr, stop_steps)
                # Verify boundary conditions
                self.assertTrue(np.allclose(lr.value(0), start_lr))
                self.assertTrue(np.allclose(lr.value(stop_steps), stop_lr))
                # Verify monotonic decrease
                vals = [lr.value(i) for i in range(stop_steps + 1)]
                self.assertTrue(all(vals[i] >= vals[i + 1] for i in range(len(vals) - 1)))
```

deepmd/dpmodel/utils/learning_rate.py (1)
60-88: Consider adding input validation for robustness.

While the current implementation handles the critical case of `stop_steps` (clamping to 1), consider adding validation for other edge cases to improve robustness:
- Negative `step` values (currently would produce unexpected results)
- `start_lr` or `stop_lr` being non-positive (if that's invalid for your use case)
- `start_lr < stop_lr` (cosine would increase rather than decrease)

This is not critical if the calling code guarantees valid inputs, but defensive validation can prevent subtle bugs.
💡 Example: Optional input validation
```diff
 def __init__(
     self,
     start_lr: float,
     stop_lr: float,
     stop_steps: int,
     **kwargs: Any,
 ) -> None:
     """
     Construct a cosine-annealed learning rate.

     Parameters
     ----------
     start_lr
         The learning rate at the start of the training.
     stop_lr
         The desired learning rate at the end of the training.
     stop_steps
         The total training steps for learning rate scheduler.
     """
+    if start_lr <= 0 or stop_lr <= 0:
+        raise ValueError("Learning rates must be positive")
+    if stop_steps <= 0:
+        raise ValueError("stop_steps must be positive")
     self.start_lr = start_lr
     self.stop_lr = stop_lr
     self.stop_steps = max(1, stop_steps)
```

For the `value` method:

```diff
 def value(self, step: int) -> np.float64:
     """Get the learning rate at the given step."""
+    if step < 0:
+        step = 0
     clamped_step = min(step, self.stop_steps)
     cosine = 0.5 * (1.0 + np.cos(np.pi * clamped_step / self.stop_steps))
     return np.float64(self.stop_lr + (self.start_lr - self.stop_lr) * cosine)
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/dpmodel/utils/learning_rate.py
- source/tests/pt/test_lr.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-15T22:22:24.889Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4219
File: deepmd/utils/learning_rate.py:48-53
Timestamp: 2024-10-15T22:22:24.889Z
Learning: Methods in `deepmd/utils/learning_rate.py` that return NumPy scalar types should have return type annotations using the corresponding NumPy types, such as `np.float64`.
Applied to files:
deepmd/dpmodel/utils/learning_rate.py
🧬 Code graph analysis (1)
source/tests/pt/test_lr.py (1)
deepmd/dpmodel/utils/learning_rate.py (3)
- LearningRateCosine (60-88)
- value (52-57)
- value (84-88)
🪛 Ruff (0.14.10)
deepmd/dpmodel/utils/learning_rate.py
66-66: Unused method argument: kwargs
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
- GitHub Check: Test Python (6, 3.10)
- GitHub Check: Test Python (5, 3.13)
- GitHub Check: Test Python (1, 3.10)
- GitHub Check: Test Python (3, 3.13)
- GitHub Check: Test Python (4, 3.10)
- GitHub Check: Test Python (5, 3.10)
- GitHub Check: Test Python (4, 3.13)
- GitHub Check: Test Python (1, 3.13)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Test C++ (false, false, false, true)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Test C++ (true, true, true, false)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (3)
source/tests/pt/test_lr.py (1)
10-10: LGTM! The import is correctly added and necessary for the new test class.
deepmd/dpmodel/utils/learning_rate.py (2)
66-66: The unused `kwargs` parameter is acceptable for API consistency.

While Ruff flags this as unused, accepting `**kwargs` maintains API consistency with `LearningRateExp.__init__` (line 17) and provides forward compatibility for potential extensions without breaking existing code.
84-88: LGTM! The cosine annealing implementation is mathematically correct.

The formula correctly implements cosine annealing:
- At `step=0`: returns `start_lr`
- At `step=stop_steps`: returns `stop_lr`
- Smooth cosine interpolation in between
- Clamping prevents extrapolation beyond `stop_steps` (plateau behavior)
- Return type `np.float64` follows the learning about NumPy scalar types
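As a quick numeric spot-check of these properties (arbitrary values, standalone re-implementation of the same formula):

```python
import numpy as np

start_lr, stop_lr, stop_steps = 1.0e-3, 1.0e-8, 1000


def lr(step: int) -> float:
    # Same cosine-annealing formula, with the step clamped to stop_steps.
    s = min(step, stop_steps)
    return stop_lr + (start_lr - stop_lr) * 0.5 * (1.0 + np.cos(np.pi * s / stop_steps))


print(lr(0))     # start_lr: 1.0e-3
print(lr(500))   # midpoint: cos(pi/2) = 0, so (start_lr + stop_lr) / 2 ≈ 5.0e-4
print(lr(1000))  # stop_lr: 1.0e-8
print(lr(2000))  # beyond stop_steps: clamped, still stop_lr
```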
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/utils/argcheck.py (1)
2512-2544: Document the precedence behavior when both `stop_lr` and `stop_lr_factor` are provided.

The documentation explains behavior when `stop_lr` is omitted, but doesn't clarify what happens when both `stop_lr` and `stop_lr_factor` are provided. Based on the implementation in `deepmd/dpmodel/utils/learning_rate.py` (lines 87-94), `stop_lr` takes precedence. This should be documented to avoid confusion.

📝 Suggested documentation improvement
```diff
 doc_stop_lr = "The desired learning rate at the end of the training."
 doc_stop_lr_factor = (
     "The factor to scale the learning rate at the end of the training. "
     "The actual stop_lr is calculated as `start_lr * stop_lr_factor`. "
-    "If `stop_lr` is not provided, this option will be used."
+    "If `stop_lr` is provided, it takes precedence over this option."
 )
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/dpmodel/utils/learning_rate.py
- deepmd/utils/argcheck.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-15T22:22:24.889Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4219
File: deepmd/utils/learning_rate.py:48-53
Timestamp: 2024-10-15T22:22:24.889Z
Learning: Methods in `deepmd/utils/learning_rate.py` that return NumPy scalar types should have return type annotations using the corresponding NumPy types, such as `np.float64`.
Applied to files:
deepmd/dpmodel/utils/learning_rate.py
🧬 Code graph analysis (1)
deepmd/dpmodel/utils/learning_rate.py (2)
deepmd/tf/utils/learning_rate.py (1)
- start_lr (96-98)

deepmd/pt/train/training.py (1)
- step (752-1124)
🪛 Ruff (0.14.10)
deepmd/dpmodel/utils/learning_rate.py
67-67: Unused method argument: kwargs
(ARG002)
92-94: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Test C++ (false, false, false, true)
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Test C++ (true, true, true, false)
🔇 Additional comments (3)
deepmd/utils/argcheck.py (1)
2546-2563: LGTM! Clean integration of the cosine learning rate variant.

The cosine annealing option is correctly integrated into the existing learning rate configuration system, following the same pattern as the exponential variant. The PT-only restriction is properly documented.
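For context, a user would presumably select this scheduler through the `learning_rate` section of a PT training input. The snippet below shows that section as a Python dict; key names other than `type`, `start_lr`, `stop_lr`, and `stop_lr_factor` are assumptions and not verified against the final schema:

```python
# Hypothetical "learning_rate" section of a PyTorch-backend training input.
lr_params = {
    "type": "cosine",   # selects the cosine-annealing scheduler (PT-only per this PR)
    "start_lr": 1.0e-3,
    "stop_lr": 1.0e-8,  # alternatively, stop_lr_factor may be given instead of stop_lr
}
```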
deepmd/dpmodel/utils/learning_rate.py (2)
61-95: LGTM! The constructor validation logic is sound.

The parameter validation correctly ensures either `stop_lr` or `stop_lr_factor` is provided, with `stop_lr` taking precedence when both are specified. The clamping of `stop_steps` to a minimum of 1 prevents division by zero in the `value()` method.

The unused `**kwargs` parameter (flagged by static analysis) is acceptable here; it maintains consistency with `LearningRateExp.__init__` and provides forward compatibility.
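A minimal sketch of the resolution logic this comment describes, written as a hypothetical helper rather than the constructor's literal code:

```python
def resolve_stop_lr(
    start_lr: float,
    stop_lr: float | None = None,
    stop_lr_factor: float | None = None,
) -> float:
    """Explicit stop_lr takes precedence; otherwise derive it from stop_lr_factor."""
    if stop_lr is not None:
        return stop_lr
    if stop_lr_factor is not None:
        return start_lr * stop_lr_factor
    raise ValueError("Either stop_lr or stop_lr_factor must be provided")
```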
97-101: LGTM! The cosine annealing formula is mathematically correct.

The implementation correctly computes cosine-annealed learning rates:
- At step 0: returns `start_lr`
- At `stop_steps`: returns `stop_lr`
- Steps beyond `stop_steps` plateau at `stop_lr` due to clamping

The return type `np.float64` aligns with the existing `LearningRateExp.value()` method and follows established conventions for this module. Based on learnings, methods in this module returning NumPy scalar types should use `np.float64` annotations.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #5133      +/-   ##
==========================================
- Coverage   82.15%   81.89%   -0.26%
==========================================
  Files         709      712       +3
  Lines       72468    74560    +2092
  Branches     3616     3615       -1
==========================================
+ Hits        59535    61063    +1528
- Misses      11769    12334     +565
+ Partials     1164     1163       -1
```

☔ View full report in Codecov by Sentry.
Reopened in #5142 with previous implementation. |
Summary by CodeRabbit
New Features
Chores
Tests