-
|
Hi JAX team, I want to confirm whether this is expected behavior and how to reason about it.

Summary
For identical inputs, `jax.lax.div` produces slightly different fp32 results on TPU than on GPU.

Environment
ReproI run the same seeded input on TPU and GPU (saved GPU output, then compared on TPU): import os
# Set before importing jax so the backend's C++ logging level is picked up.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import jax
import jax.numpy as jnp
import torch
import numpy as np
# NOTE(review): x64 is enabled globally, but the inputs below are explicitly
# cast to float32, so the comparison itself is an fp32 comparison.
jax.config.update("jax_enable_x64", True)
# Seeded host-side RNG so both backends receive byte-identical inputs.
rng = np.random.default_rng(seed=0)
x = rng.normal(size=(1024, 1024))
y = rng.normal(size=(1024, 1024))
x_j = jnp.array(x, dtype=jnp.float32)  # float64 host data down-cast to fp32
y_j = jnp.array(y, dtype=jnp.float32)
o = jax.lax.div(x_j, y_j)


def to_numpy_f32(arr):
    """Coerce *arr* (a torch.Tensor or any array-like) to a C-contiguous float32 ndarray."""
    if isinstance(arr, torch.Tensor):
        # Detach from autograd and move to host memory before handing to NumPy.
        host = arr.detach().cpu().numpy()
    else:
        host = np.asarray(arr)
    converted = host.astype(np.float32, copy=False)
    return np.ascontiguousarray(converted)
def ulp_diff_float32(a, b):
    """Return the elementwise ULP distance between two float32 arrays.

    Each float's raw bits are mapped onto a single monotonic integer scale
    (negative floats are reflected below the non-negative range), so the
    absolute difference of the mapped values counts representable float32
    values between the two inputs, including across the sign boundary
    (-0.0 vs +0.0 is 0 ULP).

    Args:
        a, b: float32 ndarrays of the same shape.

    Returns:
        int64 ndarray of ULP distances.
    """
    # BUGFIX: the bits must be read as *unsigned* 32-bit values. Viewing them
    # as signed int32 made every negative float's mapped value too large by
    # exactly 2**32, so any pair straddling zero reported ~4.3e9 ULP.
    # Widening to int64 gives headroom for the subtraction below.
    a_u = a.view(np.uint32).astype(np.int64)
    b_u = b.view(np.uint32).astype(np.int64)
    # Sign bit set (>= 0x80000000) means negative: reflect below zero so the
    # scale increases monotonically from -inf through -0.0/+0.0 to +inf.
    a_ordered = np.where(a_u >= 0x80000000, 0x80000000 - a_u, a_u)
    b_ordered = np.where(b_u >= 0x80000000, 0x80000000 - b_u, b_u)
    return np.abs(a_ordered - b_ordered)
def compare_metrics(name, test, ref):
test = to_numpy_f32(test)
ref = to_numpy_f32(ref)
finite_mask = np.isfinite(test) & np.isfinite(ref)
skipped = test.size - int(finite_mask.sum())
test_f = test[finite_mask]
ref_f = ref[finite_mask]
abs_err = np.abs(test_f - ref_f)
rel_denom = np.maximum(np.abs(ref_f), np.finfo(np.float32).eps)
rel_err = abs_err / rel_denom
ulp_err = ulp_diff_float32(test_f, ref_f)
print(f"\n{name}")
print(
f" valid elements: {test_f.size}/{test.size}, skipped non-finite pairs: {skipped}"
)
print(
f" abs_err: max={abs_err.max():.6e}, mean={abs_err.mean():.6e}, p99={np.percentile(abs_err, 99):.6e}"
)
print(
f" rel_err: max={rel_err.max():.6e}, mean={rel_err.mean():.6e}, p99={np.percentile(rel_err, 99):.6e}"
)
print(
f" ulp_err: max={int(ulp_err.max())}, mean={ulp_err.mean():.6f}, p99={np.percentile(ulp_err, 99):.6f}"
)Observed metrics (TPU vs GPU, fp32)
Max-diff example:
Extra notes
Questions
|
Beta Was this translation helpful? Give feedback.
Answered by
hawkinsp
Feb 26, 2026
Replies: 1 comment 1 reply
-
|
Yes, this is expected. TPU implements floating point division as multiplication by the reciprocal. The best we could really do here is document it. |
Beta Was this translation helpful? Give feedback.
1 reply
Answer selected by
lingebeng
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Yes, this is expected. TPU implements floating point division as multiplication by the reciprocal. The best we could really do here is document it.