automatically enable tf32 if supported by winglian · Pull Request #3473 · axolotl-ai-cloud/axolotl

winglian · 2026-03-06T20:36:03Z

Description

Most folks don't need full IEEE fp32 precision, so automatically enable tf32 if the GPU supports it.

Summary by CodeRabbit

New Features
- Introduced CUDA TF32 capability detection with automatic configuration that intelligently enables or disables optimization based on hardware support.
- TF32 now defaults to "auto" mode for seamless hardware-aware optimization.
Tests
- Updated test suite to validate TF32 functionality and configuration.

coderabbitai · 2026-03-06T20:36:24Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 580c1ba1-7182-48cf-9978-4807a56c4295

Walkthrough

This PR adds CUDA TF32 (TensorFloat-32) capability detection and configuration support to Axolotl. It introduces tf32 capability detection during CLI config validation, extends the configuration schema to support automatic detection via "auto" mode, updates the GPU capabilities data model, and modifies tests to include the new tf32 capability parameter.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| TF32 Capability Detection & Config Schema `src/axolotl/cli/config.py`, `src/axolotl/utils/schemas/config.py` | Added tf32 capability detection in CLI config validation; extended AxolotlInputConfig and AxolotlConfigWCapabilities to support "auto" mode with post-validation logic that auto-enables/disables tf32 based on capability availability. |
| GPU Capabilities Model `src/axolotl/utils/schemas/internal/__init__.py` | Added tf32 boolean field with default False to GPUCapabilities model. |
| Test Updates `tests/e2e/test_llama.py`, `tests/test_validation_dataset.py`, `tests/utils/schemas/validation/test_moe_quant.py` | Updated test fixtures and parametrization to include tf32 capability in GPU capabilities dictionaries and test configurations. |
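
For readers unfamiliar with how TF32 support is typically determined, here is a minimal sketch. It assumes TF32 is available on NVIDIA GPUs with compute capability 8.0 or higher (Ampere and newer). The helper names `supports_tf32` and `detect_tf32` are hypothetical and are not taken from this PR; the actual detection code in `src/axolotl/cli/config.py` is not shown on this page and may differ.

```python
from typing import Tuple


def supports_tf32(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) CUDA compute capability implies TF32 support.

    Assumption: TF32 tensor cores were introduced with Ampere (compute capability 8.0).
    """
    major, _minor = capability
    return major >= 8


def detect_tf32() -> bool:
    """Best-effort check on the current machine; False if torch or CUDA is unavailable."""
    try:
        import torch  # imported lazily so the sketch also runs without torch installed
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # get_device_capability() returns a (major, minor) tuple for the current device.
    return supports_tf32(torch.cuda.get_device_capability())
```

Keeping the capability check as a pure function of the `(major, minor)` tuple makes it easy to cover in tests without a GPU, which matches how the test fixtures in this PR pass capabilities in as plain dictionaries.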

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

use new tf32 APIs for torch 2.9+ #3467: Implements runtime tf32 enabling for Torch 2.9+ APIs, complementing this PR's capability detection and schema support infrastructure.

Suggested labels

ready to merge

Suggested reviewers

SalmanMohammadi
ved1beta

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'automatically enable tf32 if supported' directly and clearly describes the main change: adding automatic TF32 enablement when GPU support is available, which is reflected across all modified files. |

github-actions · 2026-03-06T20:43:35Z

📖 Documentation Preview: https://69b7864d7a8bf7f12a8126cb--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 676256b

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/axolotl/utils/schemas/config.py`:
- Around line 1223-1234: The check_tf32 model_validator currently treats
self.tf32 == None as "auto" when capabilities.tf32 is True but doesn't normalize
None to False when capabilities.tf32 is False; update the elif branch in
check_tf32 to check if self.tf32 is either "auto" or None (e.g., if self.tf32 in
(None, "auto")) and set self.tf32 = False and log the disable message so legacy
tf32: null configs are normalized to False; keep the rest of the logic and
return self as before.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 80f7088 and a2517d4.

📒 Files selected for processing (6)

src/axolotl/cli/config.py
src/axolotl/utils/schemas/config.py
src/axolotl/utils/schemas/internal/__init__.py
tests/e2e/test_llama.py
tests/test_validation_dataset.py
tests/utils/schemas/validation/test_moe_quant.py

codecov · 2026-03-07T03:59:20Z

Codecov Report

❌ Patch coverage is 80.00000% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/axolotl/utils/schemas/config.py | 72.72% | 3 Missing ⚠️ |

NanoCode012 · 2026-03-09T06:08:42Z

src/axolotl/utils/schemas/config.py

+    def check_tf32(self):
+        if self.capabilities.tf32:
+            if self.tf32 is None or self.tf32 == "auto":
+                self.tf32 = True
+                LOG.info(
+                    "tf32 support detected, enabling tf32 automatically for this configuration."
+                )
+        elif self.tf32 is None or self.tf32 == "auto":
+            self.tf32 = False
+            LOG.info("tf32 support not found, disabling tf32 for this configuration.")
+        return self


This condition feels awkward to read. Should it just be: if tf32 is "auto", then cfg.tf32 = self.capabilities.tf32?
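
The simplification suggested in this comment could look roughly like the sketch below. It is a stand-alone illustration using plain dataclasses rather than the project's pydantic models; `SketchConfig` and `SketchCapabilities` are hypothetical names, and treating `None` the same as "auto" is carried over from the diff above rather than confirmed as the final merged behavior.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional, Union


@dataclass
class SketchCapabilities:
    # Hypothetical stand-in for the GPUCapabilities model; defaults to no TF32.
    tf32: bool = False


@dataclass
class SketchConfig:
    # Hypothetical stand-in for the config model; "auto" defers to the hardware.
    tf32: Optional[Union[bool, Literal["auto"]]] = "auto"
    capabilities: SketchCapabilities = field(default_factory=SketchCapabilities)

    def check_tf32(self) -> "SketchConfig":
        # Resolve "auto" (and legacy None) to whatever the hardware supports.
        # An explicit True or False from the user is left untouched.
        if self.tf32 is None or self.tf32 == "auto":
            self.tf32 = self.capabilities.tf32
        return self


# Usage: "auto" follows the detected capability, explicit values win.
auto_on = SketchConfig(capabilities=SketchCapabilities(tf32=True)).check_tf32()
auto_off = SketchConfig(capabilities=SketchCapabilities(tf32=False)).check_tf32()
forced_off = SketchConfig(tf32=False, capabilities=SketchCapabilities(tf32=True)).check_tf32()
```

Collapsing the two branches into a single assignment removes the duplicated condition, at the cost of the two distinct log messages in the original version.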

coderabbitai bot reviewed Mar 6, 2026

winglian requested a review from NanoCode012 March 8, 2026 15:27

NanoCode012 reviewed Mar 9, 2026

winglian added 5 commits March 16, 2026 00:16

- automatically enable tf32 if supported (60958f5)
- update fixtures (18d1a6e)
- handle only when True (0ac3c1c)
- Address CR comments (0af6943)
- address readability from pr comment (676256b)

winglian force-pushed the auto-tf32 branch from e9646d8 to 676256b · March 16, 2026 04:18

NanoCode012 approved these changes Mar 16, 2026

NanoCode012 added the ready to merge label Mar 16, 2026

winglian wants to merge 5 commits into main from auto-tf32
