
Commit 6aaabc1

[release/2.8] layernorm tests: Tweak test thresholds for comparing tensors (#2583)
After PR pytorch#156600, this test was failing internally on large tensors because the differences were greater than the tolerances on some CUDA devices. We now raise the tolerances for larger tensors.

Pull Request resolved: pytorch#156699
Approved by: https://github.com/eqy, https://github.com/ngimel
(cherry picked from commit 36dd598)
Fixes SWDEV-547998

Co-authored-by: Ahmad Sharif <[email protected]>
Parent: 7b2a4fd · Commit: 6aaabc1
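
For context on why tensor size matters here, the sketch below is an illustration added for this writeup, not part of the commit: it compares a float32 sum against a float64 reference at several reduction sizes. The accumulated rounding error of a float32 reduction tends to grow with the number of summed elements, which is consistent with loosening tolerances once `m` exceeds `64 * 1024`.

```python
import torch

torch.manual_seed(0)
for m in (1024, 64 * 1024, 4 * 1024 * 1024):
    x = torch.randn(m)
    # float64 sum serves as a near-exact reference for the float32 reduction
    ref = x.double().sum().float()
    err = (x.sum() - ref).abs().item()
    print(f"{m=}: |float32 sum - float64 sum| = {err:.2e}")
```

The exact error depends on the device and reduction order (GPU reductions are tree-shaped and differ across architectures), which is why only some CUDA devices tripped the old thresholds.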


test/test_nn.py

Lines changed: 7 additions & 2 deletions
```diff
@@ -7437,10 +7437,15 @@ def test_layer_norm_backwards_eps(self):
         ln_out_cuda = ln_cuda(x_cuda)
         ln_out.backward(grad_output)
         ln_out_cuda.backward(grad_output_cuda)
+        atol = 1e-4
+        rtol = 1e-5
+        if m > 64 * 1024:
+            atol = 1e-3
+            rtol = 1e-3
         if elementwise_affine:
-            self.assertEqual(ln.weight.grad, ln_cuda.weight.grad, f"weight grad failed: {m=} {n=}", rtol=1e-4, atol=1e-4)
+            self.assertEqual(ln.weight.grad, ln_cuda.weight.grad, f"weight grad failed: {m=} {n=}", rtol=rtol, atol=atol)
         if bias and elementwise_affine:
-            self.assertEqual(ln.bias.grad, ln_cuda.bias.grad, f"bias grad failed: {m=} {n=}", rtol=1e-5, atol=1e-4)
+            self.assertEqual(ln.bias.grad, ln_cuda.bias.grad, f"bias grad failed: {m=} {n=}", rtol=rtol, atol=atol)

     @largeTensorTest("40GB", device="cuda")
     def test_layer_norm_large_tensor(self):
```
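
As a note on how these `rtol`/`atol` values behave: PyTorch tensor comparisons such as `torch.testing.assert_close` accept a pair when `|actual - expected| <= atol + rtol * |expected|` elementwise. The snippet below is a minimal sketch added here (using the public `assert_close` rather than the internal test harness's `assertEqual`) showing a deviation that the old tight tolerances reject but the relaxed large-tensor tolerances accept.

```python
import torch

expected = torch.full((4,), 100.0)
actual = expected + 5e-3  # every element off by 5e-3

# Old tight tolerances: allowed error = 1e-4 + 1e-5 * 100 = 1.1e-3 -> 5e-3 fails
try:
    torch.testing.assert_close(actual, expected, rtol=1e-5, atol=1e-4)
except AssertionError:
    print("rejected under rtol=1e-5, atol=1e-4")

# Relaxed large-tensor tolerances: allowed error = 1e-3 + 1e-3 * 100 = 0.101 -> passes
torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-3)
print("accepted under rtol=1e-3, atol=1e-3")
```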
