
Commit fb8f415

chunyuan-ws and svekars authored
Update prototype_source/max_autotune_on_CPU_tutorial.rst
Co-authored-by: Svetlana Karslioglu <[email protected]>
1 parent 7540b9a commit fb8f415

1 file changed: +2 −2 lines changed


prototype_source/max_autotune_on_CPU_tutorial.rst

Lines changed: 2 additions & 2 deletions
@@ -24,8 +24,8 @@ We have covered most popular data types, including FP32, BF16, FP16, and INT8, w
 
 While the development is still in progress, we have already seen promising speedups over pure ATen-based GEMMs as measured by the three benchmark suites and the inference of LLMs.
 
-How to activate ``max-autotune`` mode
-------------
+Activating the ``max-autotune`` mode
+-------------------------------------
 To activate the ``max-autotune`` mode in PyTorch, set the ``mode`` argument to ``max-autotune`` when compiling your model using ``torch.compile``.
 If you prefer to bypass the tuning process and always use the CPP template implementations, you can configure this via an environment variable:
 ``export TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS=CPP``.
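For context, the activation the changed section describes amounts to a one-line change at compile time. A minimal sketch, assuming a toy ``TinyMLP`` module and input shape that are illustrative placeholders, not part of the tutorial::

    import torch

    class TinyMLP(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # Linear layers produce the GEMMs that max-autotune can tune.
            self.fc1 = torch.nn.Linear(64, 128)
            self.fc2 = torch.nn.Linear(128, 10)

        def forward(self, x):
            return self.fc2(torch.relu(self.fc1(x)))

    model = TinyMLP().eval()

    # mode="max-autotune" asks TorchInductor to benchmark candidate GEMM
    # implementations at compile time and keep the fastest one.
    compiled = torch.compile(model, mode="max-autotune")

    with torch.no_grad():
        out = compiled(torch.randn(8, 64))

To bypass the tuning process and always use the CPP template implementations, set ``export TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS=CPP`` in the environment before running, as the changed section notes.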
