
Conversation

Collaborator

@OutisLi OutisLi commented Jan 7, 2026

Summary by CodeRabbit

  • New Features

    • Added a cosine-annealing learning-rate scheduler; choose "cosine" in the training config with customizable start/stop learning rates and total steps (see the example config after this list).
  • Chores

    • Training config parsing and validation updated to accept the new "cosine" option and to raise a clear error for unsupported LR types.
  • Tests

    • Added unit tests verifying the cosine schedule curve, mid-point values, end plateau, and boundary behavior.
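
For reference, the learning_rate section of a training input selecting the new scheduler might look like the following (a minimal sketch based on the argument spec in this PR; exact key names and defaults should be checked against argcheck.py, and stop_steps may instead be filled in from the trainer's total step count):

"learning_rate": {
    "type": "cosine",
    "start_lr": 1e-3,
    "stop_lr": 1e-5,
    "stop_steps": 1000000
}

Per the argcheck changes further below, stop_lr_factor can be supplied instead of stop_lr, in which case the effective stop_lr is start_lr * stop_lr_factor.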


Copilot AI review requested due to automatic review settings January 7, 2026 06:53
@github-actions github-actions bot added the Python label Jan 7, 2026
@dosubot dosubot bot added the new feature label Jan 7, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for cosine annealing learning rate scheduling to the PyTorch backend. The implementation provides an alternative to the existing exponential decay scheduler with a standard cosine annealing formula that smoothly decreases the learning rate from start_lr to stop_lr over the training period.

Key changes:

  • Added LearningRateCosine class implementing cosine annealing with the formula lr = stop_lr + (start_lr - stop_lr) * 0.5 * (1 + cos(π * step / stop_steps)) (a short runnable sketch follows this list)
  • Extended configuration schema to accept "cosine" as a learning rate type option (PyTorch-only)
  • Refactored training logic to support multiple learning rate scheduler types with improved error handling
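
A minimal, self-contained sketch of the schedule described above (not the PR's actual class; the clamping behavior follows the review comments further below):

import numpy as np

def cosine_lr(step: int, start_lr: float, stop_lr: float, stop_steps: int) -> np.float64:
    """Cosine-annealed learning rate, held at stop_lr once step reaches stop_steps."""
    clamped_step = min(step, stop_steps)
    cosine = 0.5 * (1.0 + np.cos(np.pi * clamped_step / stop_steps))
    return np.float64(stop_lr + (start_lr - stop_lr) * cosine)

# cosine_lr(0, 1e-3, 1e-5, 100)   -> 1e-3     (start_lr)
# cosine_lr(50, 1e-3, 1e-5, 100)  -> 5.05e-4  (mid-point: average of start_lr and stop_lr)
# cosine_lr(100, 1e-3, 1e-5, 100) -> 1e-5     (stop_lr)
# cosine_lr(150, 1e-3, 1e-5, 100) -> 1e-5     (plateau beyond stop_steps)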

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Changed files
  • deepmd/dpmodel/utils/learning_rate.py: implements the core LearningRateCosine class with the cosine annealing formula
  • deepmd/utils/argcheck.py: adds configuration arguments for the cosine scheduler with start_lr and stop_lr parameters
  • deepmd/pt/utils/learning_rate.py: exports LearningRateCosine for the PyTorch backend
  • deepmd/pt/train/training.py: refactors the get_lr function to handle both the exponential and cosine schedulers dynamically (a sketch follows this list)
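
A rough sketch of the factory behavior described for training.py (the real get_lr lives in deepmd/pt/train/training.py; the exact argument handling is an assumption based on the walkthrough below):

from deepmd.pt.utils.learning_rate import (
    LearningRateCosine,
    LearningRateExp,
)

def get_lr(lr_params: dict):
    """Build a learning-rate scheduler from the training config (sketch)."""
    lr_type = lr_params.get("type", "exp")
    if lr_type == "exp":
        return LearningRateExp(**lr_params)
    elif lr_type == "cosine":
        return LearningRateCosine(**lr_params)
    raise ValueError(f"Unsupported learning rate type: {lr_type}")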


Contributor

coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Added a cosine-annealing learning-rate scheduler (LearningRateCosine), integrated it into the training LR factory and public PT API, extended argument validation to accept a cosine variant, and added unit tests to verify cosine schedule behavior.

Changes

  • Core scheduler implementation (deepmd/dpmodel/utils/learning_rate.py): Added the LearningRateCosine class, with a constructor taking start_lr, stop_lr (or stop_lr_factor), and stop_steps, and a value(step) method computing the cosine-annealed learning rate.
  • Training pipeline integration (deepmd/pt/train/training.py): Updated the LR factory (get_lr) to branch on lr_params["type"] (default "exp"), build a config including stop_steps, instantiate LearningRateCosine or LearningRateExp, and raise ValueError for unknown types. Imported LearningRateCosine.
  • PyTorch API exports (deepmd/pt/utils/learning_rate.py): Imported LearningRateCosine and added it to the module exports (__all__) (sketched after this list).
  • Configuration & validation (deepmd/utils/argcheck.py): Added a learning_rate_cosine() argument spec (exposing start_lr, stop_lr/stop_lr_factor, stop_steps) and extended learning_rate_variant_type_args() to include a "cosine" variant alongside "exp".
  • Tests (source/tests/pt/test_lr.py): Added TestLearningRateCosine with test_basic_curve asserting the LR at the start, mid-point, end, and plateau after stop_steps.
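
The PyTorch API export mentioned above amounts to a thin re-export; a sketch of what deepmd/pt/utils/learning_rate.py would contain (exact contents may differ):

from deepmd.dpmodel.utils.learning_rate import (
    LearningRateCosine,
    LearningRateExp,
)

__all__ = [
    "LearningRateCosine",
    "LearningRateExp",
]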

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Trainer as Trainer (training.py)
  participant LRFactory as get_lr()
  participant LRClass as LearningRateCosine / LearningRateExp
  participant Optim as Optimizer

  User->>Trainer: start training
  Trainer->>LRFactory: build lr from lr_params (type, start_lr, stop_steps, stop_lr/stop_lr_factor)
  alt type == "cosine"
    LRFactory->>LRClass: instantiate LearningRateCosine(config)
  else type == "exp"
    LRFactory->>LRClass: instantiate LearningRateExp(config)
  end
  loop per training step
    Trainer->>LRClass: lr = value(step)
    LRClass-->>Trainer: lr (np.float64)
    Trainer->>Optim: set lr and step optimizer
    Optim-->>Trainer: step result
  end
  Note right of LRClass: Cosine annealing computed as stop_lr + (start_lr - stop_lr) * 0.5 * (1 + cos(pi * clamped_step / stop_steps))
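
As a concrete illustration of the per-step loop in the diagram, trainer-side usage would look roughly like this (a sketch only; the model, optimizer, and constructor arguments below are placeholders, not the PR's actual training code):

import torch
from deepmd.pt.utils.learning_rate import LearningRateCosine

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = LearningRateCosine(start_lr=1e-3, stop_lr=1e-5, stop_steps=1000)

for step in range(1000):
    cur_lr = scheduler.value(step)        # np.float64 from the scheduler
    for group in optimizer.param_groups:
        group["lr"] = float(cur_lr)       # apply the scheduled LR before the update
    loss = model(torch.randn(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()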

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 36.36%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title accurately and concisely describes the main change, adding a cosine annealing learning rate scheduler to the PyTorch training module, which aligns with the file changes and objectives.


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
source/tests/pt/test_lr.py (1)

106-120: Consider expanding test coverage to match the comprehensiveness of existing tests.

While the basic curve test validates key behaviors (start, end, plateau, mid-point), it tests only a single parameter configuration with hardcoded values. The existing TestLearningRate class (lines 18-104) demonstrates more comprehensive testing with:

  • Multiple parameter combinations using np.arange
  • Edge case validation
  • Consistency checks across different configurations

Consider adding:

  • Tests with varied start_lr, stop_lr, and stop_steps combinations
  • Edge cases: stop_steps=1, very large stop_steps, step exceeding stop_steps
  • Verification that the curve is monotonically decreasing (for start_lr > stop_lr)
  • More intermediate points to verify the smoothness of the cosine curve
💡 Example: Enhanced test with multiple configurations
def test_multiple_configurations(self) -> None:
    """Test cosine annealing with various parameter combinations."""
    start_lrs = [1.0, 0.01, 0.001]
    stop_lrs = [0.1, 0.0001, 1e-8]
    stop_steps_list = [10, 100, 1000]
    
    for start_lr in start_lrs:
        for stop_lr in stop_lrs:
            if stop_lr >= start_lr:
                continue
            for stop_steps in stop_steps_list:
                lr = LearningRateCosine(start_lr, stop_lr, stop_steps)
                
                # Verify boundary conditions
                self.assertTrue(np.allclose(lr.value(0), start_lr))
                self.assertTrue(np.allclose(lr.value(stop_steps), stop_lr))
                
                # Verify monotonic decrease
                vals = [lr.value(i) for i in range(stop_steps + 1)]
                self.assertTrue(all(vals[i] >= vals[i+1] for i in range(len(vals)-1)))
deepmd/dpmodel/utils/learning_rate.py (1)

60-88: Consider adding input validation for robustness.

While the current implementation handles the critical case of a non-positive stop_steps (clamping it to a minimum of 1), consider adding validation for other edge cases to improve robustness:

  • Negative step values (currently would produce unexpected results)
  • start_lr or stop_lr being non-positive (if that's invalid for your use case)
  • start_lr < stop_lr (cosine would increase rather than decrease)

This is not critical if the calling code guarantees valid inputs, but defensive validation can prevent subtle bugs.

💡 Example: Optional input validation
 def __init__(
     self,
     start_lr: float,
     stop_lr: float,
     stop_steps: int,
     **kwargs: Any,
 ) -> None:
     """
     Construct a cosine-annealed learning rate.

     Parameters
     ----------
     start_lr
         The learning rate at the start of the training.
     stop_lr
         The desired learning rate at the end of the training.
     stop_steps
         The total training steps for learning rate scheduler.
     """
+    if start_lr <= 0 or stop_lr <= 0:
+        raise ValueError("Learning rates must be positive")
+    if stop_steps <= 0:
+        raise ValueError("stop_steps must be positive")
     self.start_lr = start_lr
     self.stop_lr = stop_lr
     self.stop_steps = max(1, stop_steps)

For the value method:

 def value(self, step: int) -> np.float64:
     """Get the learning rate at the given step."""
+    if step < 0:
+        step = 0
     clamped_step = min(step, self.stop_steps)
     cosine = 0.5 * (1.0 + np.cos(np.pi * clamped_step / self.stop_steps))
     return np.float64(self.stop_lr + (self.start_lr - self.stop_lr) * cosine)
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 32aa0e5 and 96576ce.

📒 Files selected for processing (2)
  • deepmd/dpmodel/utils/learning_rate.py
  • source/tests/pt/test_lr.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-15T22:22:24.889Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4219
File: deepmd/utils/learning_rate.py:48-53
Timestamp: 2024-10-15T22:22:24.889Z
Learning: Methods in `deepmd/utils/learning_rate.py` that return NumPy scalar types should have return type annotations using the corresponding NumPy types, such as `np.float64`.

Applied to files:

  • deepmd/dpmodel/utils/learning_rate.py
🧬 Code graph analysis (1)
source/tests/pt/test_lr.py (1)
deepmd/dpmodel/utils/learning_rate.py (3)
  • LearningRateCosine (60-88)
  • value (52-57)
  • value (84-88)
🪛 Ruff (0.14.10)
deepmd/dpmodel/utils/learning_rate.py

66-66: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
  • GitHub Check: Test Python (6, 3.10)
  • GitHub Check: Test Python (5, 3.13)
  • GitHub Check: Test Python (1, 3.10)
  • GitHub Check: Test Python (3, 3.13)
  • GitHub Check: Test Python (4, 3.10)
  • GitHub Check: Test Python (5, 3.10)
  • GitHub Check: Test Python (4, 3.13)
  • GitHub Check: Test Python (1, 3.13)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test C++ (true, false, false, true)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Test C++ (false, true, true, false)
  • GitHub Check: Test C++ (false, false, false, true)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Test C++ (true, true, true, false)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
🔇 Additional comments (3)
source/tests/pt/test_lr.py (1)

10-10: LGTM!

The import is correctly added and necessary for the new test class.

deepmd/dpmodel/utils/learning_rate.py (2)

66-66: The unused kwargs parameter is acceptable for API consistency.

While Ruff flags this as unused, accepting **kwargs maintains API consistency with LearningRateExp.__init__ (line 17) and provides forward compatibility for potential extensions without breaking existing code.


84-88: LGTM! The cosine annealing implementation is mathematically correct.

The formula correctly implements cosine annealing:

  • At step=0: returns start_lr
  • At step=stop_steps: returns stop_lr
  • Smooth cosine interpolation in between
  • Clamping prevents extrapolation beyond stop_steps (plateau behavior)
  • Return type np.float64 follows the learning about NumPy scalar types

Contributor

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/utils/argcheck.py (1)

2512-2544: Document the precedence behavior when both stop_lr and stop_lr_factor are provided.

The documentation explains behavior when stop_lr is omitted, but doesn't clarify what happens when both stop_lr and stop_lr_factor are provided. Based on the implementation in deepmd/dpmodel/utils/learning_rate.py (lines 87-94), stop_lr takes precedence. This should be documented to avoid confusion.

📝 Suggested documentation improvement
     doc_stop_lr = "The desired learning rate at the end of the training."
     doc_stop_lr_factor = (
         "The factor to scale the learning rate at the end of the training. "
         "The actual stop_lr is calculated as `start_lr * stop_lr_factor`. "
-        "If `stop_lr` is not provided, this option will be used."
+        "If `stop_lr` is provided, it takes precedence over this option."
     )
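
For reference, a minimal sketch of the precedence behavior described above (the actual constructor logic lives in deepmd/dpmodel/utils/learning_rate.py and may differ in detail):

def resolve_stop_lr(start_lr: float, stop_lr: float | None = None, stop_lr_factor: float | None = None) -> float:
    """Return the effective stop_lr: an explicit stop_lr takes precedence over stop_lr_factor."""
    if stop_lr is not None:
        return stop_lr
    if stop_lr_factor is not None:
        return start_lr * stop_lr_factor
    raise ValueError("Either stop_lr or stop_lr_factor must be provided")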
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96576ce and d1da658.

📒 Files selected for processing (2)
  • deepmd/dpmodel/utils/learning_rate.py
  • deepmd/utils/argcheck.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2024-10-15T22:22:24.889Z
Learnt from: njzjz
Repo: deepmodeling/deepmd-kit PR: 4219
File: deepmd/utils/learning_rate.py:48-53
Timestamp: 2024-10-15T22:22:24.889Z
Learning: Methods in `deepmd/utils/learning_rate.py` that return NumPy scalar types should have return type annotations using the corresponding NumPy types, such as `np.float64`.

Applied to files:

  • deepmd/dpmodel/utils/learning_rate.py
🧬 Code graph analysis (1)
deepmd/dpmodel/utils/learning_rate.py (2)
deepmd/tf/utils/learning_rate.py (1)
  • start_lr (96-98)
deepmd/pt/train/training.py (1)
  • step (752-1124)
🪛 Ruff (0.14.10)
deepmd/dpmodel/utils/learning_rate.py

67-67: Unused method argument: kwargs

(ARG002)


92-94: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Test C++ (true, false, false, true)
  • GitHub Check: Test C++ (false, false, false, true)
  • GitHub Check: Test C++ (false, true, true, false)
  • GitHub Check: Test C++ (true, true, true, false)
🔇 Additional comments (3)
deepmd/utils/argcheck.py (1)

2546-2563: LGTM! Clean integration of the cosine learning rate variant.

The cosine annealing option is correctly integrated into the existing learning rate configuration system, following the same pattern as the exponential variant. The PT-only restriction is properly documented.
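
For illustration, wiring a new variant into argcheck.py typically follows the dargs pattern below (a sketch only; the actual argument lists, defaults, and docs in this PR may differ):

from dargs import Argument, Variant

def learning_rate_cosine() -> list:
    # Arguments exposed by the "cosine" variant; the defaults here are assumptions.
    return [
        Argument("start_lr", float, optional=True, default=1e-3),
        Argument("stop_lr", [float, type(None)], optional=True, default=None),
        Argument("stop_lr_factor", [float, type(None)], optional=True, default=None),
        Argument("stop_steps", [int, type(None)], optional=True, default=None),
    ]

def learning_rate_variant_type_args() -> Variant:
    return Variant(
        "type",
        [
            Argument("exp", dict, learning_rate_exp()),        # existing exp spec in argcheck.py
            Argument("cosine", dict, learning_rate_cosine()),  # new cosine variant (PT only)
        ],
        optional=True,
        default_tag="exp",
    )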

deepmd/dpmodel/utils/learning_rate.py (2)

61-95: LGTM! The constructor validation logic is sound.

The parameter validation correctly ensures either stop_lr or stop_lr_factor is provided, with stop_lr taking precedence when both are specified. The clamping of stop_steps to a minimum of 1 prevents division by zero in the value() method.

The unused **kwargs parameter (flagged by static analysis) is acceptable here—it maintains consistency with LearningRateExp.__init__ and provides forward compatibility.


97-101: LGTM! The cosine annealing formula is mathematically correct.

The implementation correctly computes cosine-annealed learning rates:

  • At step 0: returns start_lr
  • At stop_steps: returns stop_lr
  • Steps beyond stop_steps plateau at stop_lr due to clamping

The return type np.float64 aligns with the existing LearningRateExp.value() method and follows established conventions for this module.

Based on learnings, methods in this module returning NumPy scalar types should use np.float64 annotations.

@wanghan-iapcm wanghan-iapcm requested a review from iProzd January 8, 2026 05:12
@OutisLi OutisLi closed this Jan 9, 2026
@OutisLi OutisLi reopened this Jan 9, 2026
@OutisLi OutisLi marked this pull request as draft January 9, 2026 04:37

codecov bot commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.89%. Comparing base (fe1662d) to head (d1da658).
⚠️ Report is 6 commits behind head on master.

Files with missing lines:
  • deepmd/dpmodel/utils/learning_rate.py: 23.07% patch coverage, 10 lines missing ⚠️
  • deepmd/pt/train/training.py: 54.54% patch coverage, 5 lines missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5133      +/-   ##
==========================================
- Coverage   82.15%   81.89%   -0.26%     
==========================================
  Files         709      712       +3     
  Lines       72468    74560    +2092     
  Branches     3616     3615       -1     
==========================================
+ Hits        59535    61063    +1528     
- Misses      11769    12334     +565     
+ Partials     1164     1163       -1     


Collaborator

iProzd commented Jan 9, 2026

Reopened in #5142 with previous implementation.

@iProzd iProzd closed this Jan 9, 2026