-
Notifications
You must be signed in to change notification settings - Fork 584
feat(pt): add warmup_ratio setting to set warmup steps conveniently #5134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdded two warm-up options: Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @deepmd/pt/train/training.py:
- Around line 418-426: The warmup_ratio is used to compute self.warmup_steps
without validation and with int() truncation; add a validation step: if
warmup_ratio is provided ensure 0.0 <= warmup_ratio < 1.0 and raise a clear
ValueError mentioning warmup_ratio when out of range, then compute
self.warmup_steps using round(warmup_ratio * self.num_steps) (or keep int() but
add a comment if truncation is intentional) and if round(...) == 0 while
warmup_ratio > 0, set self.warmup_steps = 1 to avoid producing zero warmup
steps; update the error message used by the existing assertion (the warmup_steps
check) to include warmup_ratio when raising.
🧹 Nitpick comments (1)
deepmd/utils/argcheck.py (1)
3344-3349: Consider adding range validation forwarmup_ratioin the argument schema.The
warmup_ratioparameter lacks explicit range constraints. Based on the implementation intraining.py, valid values should be in the range[0, 1)to ensure the computedwarmup_stepsdoesn't exceednum_steps. Adding validation here would provide clearer error messages during configuration validation rather than at runtime.♻️ Suggested improvement with validation
While the
dargsArgumentclass may not directly support min/max validation for float types, you could document the expected range more explicitly and consider adding a validation function:Argument( "warmup_ratio", float, optional=True, - doc=doc_only_pt_supported + doc_warmup_ratio, + doc=doc_only_pt_supported + doc_warmup_ratio + " Valid range: [0, 1).", ),Alternatively, if custom validation is supported, add explicit bounds checking to fail fast during configuration parsing.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deepmd/pt/train/training.pydeepmd/utils/argcheck.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: CodeQL analysis (python)
- GitHub Check: Agent
- GitHub Check: Test Python (6, 3.13)
- GitHub Check: Test Python (5, 3.13)
- GitHub Check: Test Python (1, 3.13)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test Python (3, 3.10)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Test Python (5, 3.10)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (2, 3.10)
- GitHub Check: Test Python (3, 3.13)
- GitHub Check: Test Python (4, 3.13)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Test Python (4, 3.10)
- GitHub Check: Test Python (2, 3.13)
- GitHub Check: Test Python (6, 3.10)
- GitHub Check: Test Python (1, 3.10)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Test C++ (true, true, true, false)
- GitHub Check: Test C++ (false, false, false, true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a convenient warmup_ratio parameter for PyTorch training that allows users to specify warmup steps as a ratio of total training steps, rather than an absolute number.
Key changes:
- Added
warmup_ratioconfiguration parameter that calculates warmup steps as a fraction ofnumb_steps - Implemented precedence logic where
warmup_stepstakes priority overwarmup_ratiowhen both are specified
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| deepmd/utils/argcheck.py | Adds documentation and argument definition for the new warmup_ratio parameter, marked as PyTorch-only |
| deepmd/pt/train/training.py | Implements the warmup_ratio logic by calculating warmup_steps = int(warmup_ratio * num_steps) when warmup_steps is not explicitly set |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
deepmd/pt/train/training.py (2)
424-425: Consider clarifying why the upper bound is exclusive.The validation correctly enforces
warmup_ratio < 1, but the error message doesn't explain why 1.0 is excluded. Users might wonder why the full range isn't[0, 1].📝 Suggested improvement
if not 0 <= warmup_ratio < 1: - raise ValueError(f"warmup_ratio must be in [0, 1), got {warmup_ratio}") + raise ValueError( + f"warmup_ratio must be in [0, 1) to leave steps for training, got {warmup_ratio}" + )This also addresses the Ruff TRY003 style hint by breaking the message into multiple lines.
435-435: Add validation forwarmup_start_factorto prevent unexpected behavior.The
warmup_start_factorparameter lacks range validation. While the most common use case is[0, 1](starting from 0% to 100% of the target learning rate), negative or > 1 values could cause unexpected learning rate schedules.🛡️ Proposed validation
Add validation after line 435:
self.warmup_start_factor = training_params.get("warmup_start_factor", 0.0) if not 0 <= self.warmup_start_factor <= 1: log.warning( f"warmup_start_factor is typically in [0, 1], got {self.warmup_start_factor}. " f"This may result in unusual learning rate schedules." )Alternatively, if values outside
[0, 1]should be disallowed, useraise ValueErrorinstead oflog.warning.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deepmd/pt/train/training.pydeepmd/utils/argcheck.py
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/utils/argcheck.py
🧰 Additional context used
🪛 Ruff (0.14.10)
deepmd/pt/train/training.py
425-425: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (36)
- GitHub Check: Test Python (9, 3.10)
- GitHub Check: Test Python (10, 3.10)
- GitHub Check: Test Python (12, 3.13)
- GitHub Check: Test Python (9, 3.13)
- GitHub Check: Test Python (11, 3.13)
- GitHub Check: Test Python (4, 3.13)
- GitHub Check: Test Python (10, 3.13)
- GitHub Check: Test Python (7, 3.13)
- GitHub Check: Test Python (8, 3.13)
- GitHub Check: Test Python (8, 3.10)
- GitHub Check: Test Python (4, 3.10)
- GitHub Check: Test Python (6, 3.10)
- GitHub Check: Test Python (5, 3.13)
- GitHub Check: Test Python (7, 3.10)
- GitHub Check: Test Python (5, 3.10)
- GitHub Check: Test Python (2, 3.10)
- GitHub Check: Test Python (1, 3.13)
- GitHub Check: Test Python (1, 3.10)
- GitHub Check: Test Python (3, 3.10)
- GitHub Check: Test Python (2, 3.13)
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build wheels for cp311-macosx_arm64
- GitHub Check: Build wheels for cp311-manylinux_x86_64
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Test C++ (false, true, true, false)
- GitHub Check: Test C++ (true, false, false, true)
- GitHub Check: Test C++ (true, true, true, false)
- GitHub Check: Test C++ (false, false, false, true)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (2)
deepmd/pt/train/training.py (2)
419-434: LGTM! Warmup configuration logic is well-structured.The precedence (explicit
warmup_steps→warmup_ratio→ default 0) is clear and correctly implements the feature. The validation and warning for truncation are helpful for users.
685-691: LGTM! Linear warmup implementation is mathematically correct.The modified warmup schedule correctly interpolates the learning rate multiplier from
warmup_start_factorto1.0overwarmup_stepssteps. The division by zero is protected by the conditional check, and the formula produces the expected linear ramp.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5134 +/- ##
==========================================
- Coverage 82.15% 81.93% -0.22%
==========================================
Files 709 712 +3
Lines 72468 72900 +432
Branches 3616 3617 +1
==========================================
+ Hits 59535 59733 +198
- Misses 11769 12003 +234
Partials 1164 1164 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.