Conversation
📝 Walkthrough

This PR adds CUDA TF32 (TensorFloat-32) capability detection and configuration support to Axolotl. It introduces tf32 capability detection during CLI config validation, extends the configuration schema to support automatic detection via an "auto" mode, updates the GPU capabilities data model, and modifies tests to include the new tf32 capability parameter.
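TF32 tensor cores are available on NVIDIA GPUs starting with the Ampere architecture (compute capability 8.0). A minimal sketch of the kind of capability check this implies — the function name and signature here are illustrative, not Axolotl's actual API:

```python
def supports_tf32(major: int, minor: int) -> bool:
    """Return True if a GPU with this CUDA compute capability has TF32 tensor cores.

    TF32 was introduced with the Ampere architecture (compute capability 8.0),
    so anything at or above (8, 0) qualifies.
    """
    # Tuple comparison handles both the major and minor version in one check.
    return (major, minor) >= (8, 0)
```

In practice the compute capability would come from `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple for the active device.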
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (warning)
📖 Documentation Preview: https://69b7864d7a8bf7f12a8126cb--resonant-treacle-0fd729.netlify.app — deployed on Netlify from commit 676256b
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/axolotl/utils/schemas/config.py`:
- Around line 1223-1234: The check_tf32 model_validator currently treats
self.tf32 == None as "auto" when capabilities.tf32 is True but doesn't normalize
None to False when capabilities.tf32 is False; update the elif branch in
check_tf32 to check if self.tf32 is either "auto" or None (e.g., if self.tf32 in
(None, "auto")) and set self.tf32 = False and log the disable message so legacy
tf32: null configs are normalized to False; keep the rest of the logic and
return self as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 68ee5856-47eb-4d76-a93c-51faaceab425
📒 Files selected for processing (6)
- src/axolotl/cli/config.py
- src/axolotl/utils/schemas/config.py
- src/axolotl/utils/schemas/internal/__init__.py
- tests/e2e/test_llama.py
- tests/test_validation_dataset.py
- tests/utils/schemas/validation/test_moe_quant.py
```python
def check_tf32(self):
    if self.capabilities.tf32:
        if self.tf32 is None or self.tf32 == "auto":
            self.tf32 = True
            LOG.info(
                "tf32 support detected, enabling tf32 automatically for this configuration."
            )
    elif self.tf32 is None or self.tf32 == "auto":
        self.tf32 = False
        LOG.info("tf32 support not found, disabling tf32 for this configuration.")
    return self
```
This condition feels awkward to read. Should it just be: if tf32 is "auto", then `self.tf32 = self.capabilities.tf32`?
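A sketch of that suggested simplification as a standalone helper (names here are illustrative — Axolotl's actual validator operates on the Pydantic config model): treating both `"auto"` and the legacy `None` default as "follow hardware detection" collapses the two branches into one.

```python
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger(__name__)


def resolve_tf32(tf32_setting, hardware_supports_tf32):
    """Normalize a tf32 config value against detected GPU capabilities.

    Both "auto" and the legacy None default follow hardware detection;
    explicit True/False values pass through unchanged.
    """
    if tf32_setting in (None, "auto"):
        LOG.info(
            "tf32 support %s, setting tf32=%s for this configuration.",
            "detected" if hardware_supports_tf32 else "not found",
            hardware_supports_tf32,
        )
        return hardware_supports_tf32
    return tf32_setting
```

This also normalizes legacy `tf32: null` configs to an explicit boolean either way, which is the behavior the review comment above asks for.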
Description
Most folks don't need full IEEE FP32 precision, so automatically enable TF32 if the GPU supports it.