Enable input fusion for a specific kernel pattern.

Google-ML-Automation · jax authors · commit 489febee04d5 · 2024-06-10T12:37:49.000-07:00
cl/640530524 adds batching support for some Pallas calls that do not yet support it, by dynamically slicing the input and dynamically updating the output. This CL ensures that XLA-guided input fusion into Pallas kernels works as expected for that pattern. Fusion on the output side is not yet supported for Pallas kernels.
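To make the slice-and-update batching pattern concrete, here is a minimal sketch in plain JAX. It assumes a hypothetical function `per_item_kernel` that stands in for a Pallas call with no batching rule, and a hypothetical wrapper `batched_via_slicing`. It does not reproduce the actual batching rule from cl/640530524. It only shows the shape of the computation: each loop step takes one item with `lax.dynamic_slice`, runs the kernel on it, and writes the result back with `lax.dynamic_update_slice`.

```python
import jax
import jax.numpy as jnp
from jax import lax


def per_item_kernel(x):
    # Hypothetical stand-in for a pallas_call that only accepts a single,
    # unbatched input of shape x.shape[1:] and has no batching rule.
    return x * 2.0 + 1.0


def batched_via_slicing(xs):
    """Apply per_item_kernel over the leading (batch) axis of xs.

    Assumes the kernel's output shape equals its input shape.
    """
    out = jnp.zeros(xs.shape, xs.dtype)

    def body(i, out):
        start = (i,) + (0,) * (xs.ndim - 1)
        # Input side: dynamically slice one batch item out of xs.
        item = lax.dynamic_slice(xs, start, (1,) + xs.shape[1:])
        result = per_item_kernel(item.reshape(xs.shape[1:]))
        # Output side: dynamically write the result into the output buffer.
        return lax.dynamic_update_slice(out, result[None].astype(out.dtype), start)

    return lax.fori_loop(0, xs.shape[0], body, out)
```

In this shape, the dynamic slice produces the kernel's input, so it is the kind of producer that XLA-guided input fusion can fold into the kernel; the dynamic update consumes the kernel's output, which is the side that is not yet fused.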

PiperOrigin-RevId: 641989012
diff --git a/tests/pallas/pallas_call_tpu_test.py b/tests/pallas/pallas_call_tpu_test.py
@@ -311,6 +311,7 @@ def kernel(s, x):
               grid=8,
           ),
           interpret=self.interpret,
+          compiler_params=dict(mosaic=dict(allow_input_fusion=[False, True])),
       )(s, x)
 
     first = x[0, ...].reshape((1, 8, 8, -1))[:, s[0, ...]].reshape(x.shape[1:])