automatically enable tf32 if supported#3473

Open
winglian wants to merge 5 commits into main from auto-tf32

@winglian
Collaborator

@winglian winglian commented Mar 6, 2026

Description

Most folks don't need full IEEE fp32 precision, so automatically enable tf32 if the GPU supports it.

Summary by CodeRabbit

  • New Features

    • Introduced CUDA TF32 capability detection with automatic configuration that intelligently enables or disables optimization based on hardware support.
    • TF32 now defaults to "auto" mode for seamless hardware-aware optimization.
  • Tests

    • Updated test suite to validate TF32 functionality and configuration.

@coderabbitai
Contributor

coderabbitai bot commented Mar 6, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 580c1ba1-7182-48cf-9978-4807a56c4295

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR adds CUDA TF32 (TensorFloat-32) capability detection and configuration support to Axolotl. It introduces tf32 capability detection during CLI config validation, extends the configuration schema to support automatic detection via "auto" mode, updates the GPU capabilities data model, and modifies tests to include the new tf32 capability parameter.
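For illustration, here is a minimal sketch of what such a capability check could look like, assuming detection is based on the CUDA compute capability (TF32 tensor cores arrived with NVIDIA Ampere, compute capability 8.0). The helper names `supports_tf32` and `detect_tf32` are hypothetical and not taken from this PR; the actual logic lives in src/axolotl/cli/config.py and may differ.

```python
from typing import Tuple


def supports_tf32(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability has TF32 tensor cores.

    Assumption: NVIDIA GPUs only, where TF32 requires Ampere (8.0) or newer.
    """
    major, _minor = capability
    return major >= 8


def detect_tf32() -> bool:
    """Best-effort detection on the current device.

    Returns False when torch is not installed, when the build is not a CUDA
    build (for example CPU-only or ROCm), or when no CUDA device is visible.
    """
    try:
        import torch
    except ImportError:
        return False
    if torch.version.cuda is None or not torch.cuda.is_available():
        return False
    return supports_tf32(torch.cuda.get_device_capability())
```

Keeping the threshold check separate from the device query makes the rule testable on machines without a GPU.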

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| TF32 Capability Detection & Config Schema: src/axolotl/cli/config.py, src/axolotl/utils/schemas/config.py | Added tf32 capability detection in CLI config validation; extended AxolotlInputConfig and AxolotlConfigWCapabilities to support "auto" mode with post-validation logic that auto-enables/disables tf32 based on capability availability. |
| GPU Capabilities Model: src/axolotl/utils/schemas/internal/__init__.py | Added tf32 boolean field with default False to GPUCapabilities model. |
| Test Updates: tests/e2e/test_llama.py, tests/test_validation_dataset.py, tests/utils/schemas/validation/test_moe_quant.py | Updated test fixtures and parametrization to include tf32 capability in GPU capabilities dictionaries and test configurations. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

ready to merge

Suggested reviewers

  • SalmanMohammadi
  • ved1beta
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'automatically enable tf32 if supported' directly and clearly describes the main change: adding automatic TF32 enablement when GPU support is available, which is reflected across all modified files. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@github-actions
Contributor

github-actions bot commented Mar 6, 2026

📖 Documentation Preview: https://69b7864d7a8bf7f12a8126cb--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 676256b

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/axolotl/utils/schemas/config.py`:
- Around line 1223-1234: The check_tf32 model_validator currently treats
self.tf32 == None as "auto" when capabilities.tf32 is True but doesn't normalize
None to False when capabilities.tf32 is False; update the elif branch in
check_tf32 to check if self.tf32 is either "auto" or None (e.g., if self.tf32 in
(None, "auto")) and set self.tf32 = False and log the disable message so legacy
tf32: null configs are normalized to False; keep the rest of the logic and
return self as before.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 68ee5856-47eb-4d76-a93c-51faaceab425

📥 Commits

Reviewing files that changed from the base of the PR and between 80f7088 and a2517d4.

📒 Files selected for processing (6)
  • src/axolotl/cli/config.py
  • src/axolotl/utils/schemas/config.py
  • src/axolotl/utils/schemas/internal/__init__.py
  • tests/e2e/test_llama.py
  • tests/test_validation_dataset.py
  • tests/utils/schemas/validation/test_moe_quant.py

@codecov

codecov bot commented Mar 7, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/axolotl/utils/schemas/config.py | 72.72% | 3 Missing ⚠️ |

@winglian winglian requested a review from NanoCode012 March 8, 2026 15:27
Comment on lines +1224 to +1234
def check_tf32(self):
    if self.capabilities.tf32:
        if self.tf32 is None or self.tf32 == "auto":
            self.tf32 = True
            LOG.info(
                "tf32 support detected, enabling tf32 automatically for this configuration."
            )
    elif self.tf32 is None or self.tf32 == "auto":
        self.tf32 = False
        LOG.info("tf32 support not found, disabling tf32 for this configuration.")
    return self
Collaborator

This condition feels awkward to read. Should it just be: if tf32 is "auto", then cfg.tf32 = self.capabilities.tf32?
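A simplification along those lines could look roughly like the sketch below. It uses plain dataclasses rather than the project's Pydantic models: `Config` is a hypothetical stand-in for the real config class, and `GPUCapabilities` here only mirrors the single tf32 field mentioned in the walkthrough. Treating `None` the same as "auto" is carried over from the snippet under review, not from the suggestion itself.

```python
from dataclasses import dataclass, field
from typing import Literal, Union


@dataclass
class GPUCapabilities:
    # Mirrors the boolean field described in the PR (default False).
    tf32: bool = False


@dataclass
class Config:
    # Hypothetical stand-in for the real config model.
    tf32: Union[bool, Literal["auto"], None] = "auto"
    capabilities: GPUCapabilities = field(default_factory=GPUCapabilities)

    def check_tf32(self) -> "Config":
        # Resolve only the unset/"auto" cases; explicit True/False is left alone.
        if self.tf32 is None or self.tf32 == "auto":
            self.tf32 = self.capabilities.tf32
        return self


# "auto" follows the detected capability; explicit values are untouched.
resolved_on = Config(tf32="auto", capabilities=GPUCapabilities(tf32=True)).check_tf32()
resolved_off = Config(tf32=None, capabilities=GPUCapabilities(tf32=False)).check_tf32()
explicit_off = Config(tf32=False, capabilities=GPUCapabilities(tf32=True)).check_tf32()
explicit_on = Config(tf32=True, capabilities=GPUCapabilities(tf32=False)).check_tf32()
```

This resolves to the same values as the two-branch version for all four input combinations; the per-branch log messages would need to be re-added if they are still wanted.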
