Update llama README to reflect that KleidiAI is the default (#12905)
KleidiAI is now enabled by default for the XNNPACK backend, so we update the README in examples/models/llama to reflect this.
Co-authored-by: Max Ren <[email protected]>
examples/models/llama/README.md (1 addition, 1 deletion)
@@ -65,7 +65,7 @@ Please see the [Llama 3.2 model card](https://github.com/meta-llama/llama-models
### Performance
- Llama 3.2 1B and 3B performance was measured on Android OnePlus 12 device. The performance measurement is expressed in terms of tokens per second using an [adb binary-based approach](#step-4-run-benchmark-on-android-phone) with prompt length of 64. It is measured with KleidiAI library. KleidiAI is not enabled by default yet. Use `-DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON` to enable it in the build.
+ Llama 3.2 1B and 3B performance was measured on Android OnePlus 12 device. The performance measurement is expressed in terms of tokens per second using an [adb binary-based approach](#step-4-run-benchmark-on-android-phone) with prompt length of 64. It is measured with KleidiAI library. KleidiAI is enabled by default on the XNNPACK Backend for all ARM devices.
|Model | Decode (tokens/s) | Time-to-first-token (sec) | Prefill (tokens/s) | Model size (PTE file size in MiB) | Memory size (RSS in MiB) |
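
For context, the flag mentioned in the removed line was a build-time opt-in. Below is a minimal sketch of what that looked like, assuming a standard CMake configure step for ExecuTorch; every argument other than the KleidiAI flag is illustrative and not taken from this PR.

```bash
# Before this change: KleidiAI had to be enabled explicitly at configure time.
# Build directory and other options below are placeholders, not from the README.
cmake -DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON \
      -DCMAKE_BUILD_TYPE=Release \
      -Bcmake-out .

# After this change: KleidiAI is on by default for the XNNPACK backend on Arm
# devices, so the ordinary configure step needs no extra flag.
cmake -DCMAKE_BUILD_TYPE=Release -Bcmake-out .
```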