Clarify Batch Matrix Multiply operator usage

pareenaverma · web-flow · commit b47b9c42d71b · 2025-11-20T16:21:39.000-05:00
Updated the explanation of the Batch Matrix Multiply operator and clarified the instructions for constructing benchmark models.
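
For readers unfamiliar with the operator, the following is a minimal pure-Python sketch of what a batch matrix multiply computes: one independent matrix product per batch index, with input shapes (b, n, m) and (b, m, p) giving an output of shape (b, n, p). The function name `bmm_reference` is hypothetical and used for illustration only. It describes the operator's semantics, not how XNNPACK or KleidiAI implement the underlying kernels.

```python
def bmm_reference(a, b):
    """Reference batch matrix multiply on nested lists.

    a has shape (batch, n, m) and b has shape (batch, m, p).
    Returns a nested list of shape (batch, n, p) where
    out[i] is the ordinary matrix product of a[i] and b[i].
    """
    if len(a) != len(b):
        raise ValueError("batch sizes differ")

    out = []
    for a_i, b_i in zip(a, b):
        inner = len(b_i)       # shared inner dimension m
        cols = len(b_i[0])     # output columns p
        if any(len(row) != inner for row in a_i):
            raise ValueError("inner dimensions differ")
        # One independent GEMM per batch entry.
        out.append([
            [sum(row[t] * b_i[t][c] for t in range(inner)) for c in range(cols)]
            for row in a_i
        ])
    return out


# A batch of one 2x2 product.
result = bmm_reference([[[1, 2], [3, 4]]], [[[5, 6], [7, 8]]])
print(result)
```

Each batch entry is computed independently, which is why the operator can be expressed as a series of GEMM calls.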
diff --git a/content/learning-paths/mobile-graphics-and-gaming/measure-kleidiai-kernel-performance-on-executorch/06-create-matrix-mul-model.md b/content/learning-paths/mobile-graphics-and-gaming/measure-kleidiai-kernel-performance-on-executorch/06-create-matrix-mul-model.md
@@ -6,9 +6,9 @@ weight: 7
 layout: learningpathall
 ---
 
-In the previous section, we discussed that the Batch Matrix Multiply operator supports multiple GEMM (General Matrix Multiplication) variants.
+When delegated to XNNPACK, the Batch Matrix Multiply operator (`torch.bmm`) lowers to GEMM (General Matrix Multiplication) and, when shapes and dtypes match supported patterns, can dispatch to KleidiAI micro-kernels on Arm.
 
-To evaluate the performance of these variants across different hardware platforms, we construct a set of benchmark models that utilize the batch matrix multiply operator with different GEMM implementations for comparative analysis.
+To compare GEMM performance across hardware platforms, you will construct a set of benchmark models that use the batch matrix multiply operator with different GEMM implementations.
 
 
 ### Matrix multiply benchmark model
@@ -72,11 +72,10 @@ export_mutrix_mul_model(torch.float32,"matrix_mul_pf32_gemm")
 
 ```
 
-**NOTE:** 
-
+{{%notice Note%}}
 When exporting models, the **generate_etrecord** option is enabled to produce the .etrecord file alongside the .pte model file.
 These ETRecord files are essential for subsequent model analysis and performance evaluation.
-
+{{%/notice%}}
 
 After running this script, both the PTE model file and the etrecord file are generated.