[Fix] Mixed Precision (float16) numerical instability in GroupNormalization with small epsilon #22589
ChiragSW wants to merge 5 commits into keras-team:master from
Conversation
Code Review
This pull request updates the GroupNormalization layer to improve numerical stability during mixed precision training. It disables automatic casting for the layer and its weights (gamma and beta) and adds an explicit cast to compute_dtype at the end of the call method. New test cases are included to ensure that large input values do not result in NaNs when running within a float16 autocast scope. I have no feedback to provide.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #22589      +/-   ##
==========================================
- Coverage   83.28%   83.28%   -0.01%
==========================================
  Files         596      596
  Lines       68089    68110      +21
  Branches    10607    10611       +4
==========================================
+ Hits        56711    56728      +17
- Misses       8634     8637       +3
- Partials     2744     2745      +1
```
@hertschuh please review
```python
    ):
        super().__init__(**kwargs)
        self.supports_masking = True
        self.autocast = False
```
We should not hardcode `self.autocast = False`. The fix is indeed to do this:

```python
keras.layers.GroupNormalization(groups=8, epsilon=1e-12, autocast=False)
```

But this should be controlled by users, not hardcoded.
The contract of autocast is to accept lower precision to improve speed, and that option should remain open to people who want it.
We could, however, print a warning when the epsilon is smaller than the precision of the compute dtype, since such a value cannot take effect.
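The warning the reviewer suggests could be sketched as follows. This is an illustration, not the actual Keras code; `warn_if_epsilon_below_precision` is a hypothetical helper, and using `np.finfo(...).tiny` as the precision threshold is an assumption.

```python
import warnings

import numpy as np


def warn_if_epsilon_below_precision(epsilon, compute_dtype):
    # Hypothetical helper sketching the reviewer's suggestion: if epsilon is
    # smaller than the smallest positive normal value of the compute dtype,
    # the stabilizing term underflows to zero and has no effect.
    tiny = np.finfo(compute_dtype).tiny
    if epsilon < tiny:
        warnings.warn(
            f"epsilon={epsilon} is below the precision of {np.dtype(compute_dtype).name} "
            f"(smallest normal value {tiny}); consider a larger epsilon or "
            "autocast=False."
        )


# epsilon=1e-12 underflows in float16 (tiny is about 6.1e-05), so this warns.
warn_if_epsilon_below_precision(1e-12, np.float16)
```

With `autocast=False`, the math runs in float32 and an epsilon of 1e-12 remains representable, so no warning would be needed there.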
Root Cause
With autocast=True (the default), inputs were cast to float16 before reaching call(). Values exceeding float16's maximum (65504) overflowed to inf, and the inf propagated as NaN through the normalization math. The existing internal float32 upcast could not recover values that were already lost.
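The overflow-to-NaN chain described above can be reproduced in plain NumPy (a standalone illustration, not the layer's code):

```python
import numpy as np

# Values above float16's max (65504) overflow to inf when cast down,
# and inf then propagates as NaN through the normalization arithmetic.
x32 = np.array([70000.0, 70000.0], dtype=np.float32)
x16 = x32.astype(np.float16)  # overflows to [inf, inf]
mean = x16.mean()             # inf
centered = x16 - mean         # inf - inf -> nan
print(np.isinf(x16).all(), np.isnan(centered).all())  # True True
```

Upcasting `x16` back to float32 afterwards still yields inf, which is why the layer's internal float32 upcast alone could not fix this.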
Fix
- `self.autocast = False` keeps inputs in their original dtype (float32), preventing overflow
- `autocast=False` on the gamma/beta weights stores the weights in float32 for precision
- `ops.cast(outputs, self.compute_dtype)` returns a proper float16 output for mixed precision

I also added regression tests:

- `test_large_value_within_autocast_scope`: verifies weights aren't corrupted by autocast (the same test exists in BatchNormalization and LayerNormalization)
- `test_mixed_float16_large_inputs`: catches the actual NaN bug
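The fix pattern above (compute in float32, cast to the compute dtype only at the end) can be sketched in NumPy. This is a simplified stand-in for the PR's change, not the actual Keras implementation; `group_norm_stable` is a hypothetical name and it normalizes over the whole array rather than per group:

```python
import numpy as np


def group_norm_stable(x, epsilon=1e-3):
    # Sketch of the fix pattern: keep the input and the math in float32
    # (mirroring autocast=False), then cast to float16 only at the end
    # (mirroring ops.cast(outputs, self.compute_dtype)).
    x32 = np.asarray(x, dtype=np.float32)           # no premature float16 cast
    mean = x32.mean()
    var = x32.var()
    normed = (x32 - mean) / np.sqrt(var + epsilon)  # safe in float32
    return normed.astype(np.float16)                # cast at the very end


# Inputs that would overflow float16 stay finite end to end.
out = group_norm_stable([70000.0, -70000.0, 70000.0, -70000.0])
print(out.dtype, np.isnan(out).any())  # float16 False
```

Had the input been cast to float16 first, the 70000.0 values would already be inf before the mean was taken, and the output would be NaN.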