[release/2.7][ROCm][tunableop] UT tolerance increase for matmul_small_brute_force_… (#2397)

naromero77amd · jithunnair-amd · commit d010db77f8d1 · 2025-07-31T00:31:42.000-05:00
TunableOp will sometimes find a less precise solution due to the small input vectors used in this UT. Bumping up tolerance to eliminate flakiness.

Pull Request resolved: pytorch#158788
Approved by: https://github.com/jeffdaily
(cherry picked from commit c917c63)
(cherry picked from commit 35daec9)
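The sketch below shows one plausible way two functionally equivalent GEMM solutions can disagree in float16: summing the same products in a different order. It is an illustration under stated assumptions, not a model of TunableOp's candidate kernels. Real kernels often accumulate in fp32, and the input values here are hypothetical. The helper names (`to_fp16`, `fp16_sum_sequential`, `fp16_sum_pairwise`, `within_tolerance`) are invented for this example. The comparison rule `|actual - expected| <= atol + rtol * |expected|` has the form documented for `torch.isclose`. How the PyTorch test harness applies `@precisionOverride` internally is not modeled.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value.

    struct's 'e' format packs half precision using round-half-to-even.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum_sequential(values):
    """Left-to-right sum, rounding to fp16 after every addition."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return acc


def fp16_sum_pairwise(values):
    """Tree (pairwise) sum, rounding to fp16 after every addition."""
    if not values:
        return 0.0
    if len(values) == 1:
        return to_fp16(values[0])
    mid = len(values) // 2
    return to_fp16(fp16_sum_pairwise(values[:mid]) + fp16_sum_pairwise(values[mid:]))


def within_tolerance(actual, expected, atol, rtol):
    """Scalar form of |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical inputs: 2**-11 is half of fp16's spacing just above 1.0.
values = [1.0, 2.0**-11, 2.0**-11]
seq = fp16_sum_sequential(values)   # each small term is rounded away
pair = fp16_sum_pairwise(values)    # small terms combine first and survive

print(seq, pair)                                          # 1.0 1.0009765625
print(within_tolerance(seq, pair, atol=1e-5, rtol=0.0))   # False: tight tolerance
print(within_tolerance(seq, pair, atol=1e-1, rtol=0.0))   # True: loosened tolerance
```

Neither summation order is wrong. Both are valid float16 results of the same sum. A check that compares them must therefore use a tolerance wide enough to absorb order-dependent rounding. Loosening the float16 tolerance in this test plays that role.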
diff --git a/test/test_linalg.py b/test/test_linalg.py
@@ -4762,6 +4762,7 @@ def test_matmul_small_brute_force_3d_Nd(self, device, dtype):
     @onlyCUDA
     @skipCUDAIfNotRocm  # Skipping due to SM89 OOM in CI, UT doesn't do much on NV anyways
     @dtypes(*floating_types_and(torch.half))
+    @precisionOverride({torch.float16: 1e-1})  # TunableOp may occasionally find less precise solution
     def test_matmul_small_brute_force_tunableop(self, device, dtype):
         # disable tunableop buffer rotation for all tests everywhere, it can be slow
         # We set the TunableOp numerical check environment variable here because it is