Commit dce78f3

Add more details for freezing
1 parent 806ee21 commit dce78f3

File tree

1 file changed (+6 −2)


prototype_source/max_autotune_on_CPU_tutorial.rst

Lines changed: 6 additions & 2 deletions
@@ -30,8 +30,12 @@ Example code
 ------------
 The below code is an example of using the ``max-autotune`` mode on a simple neural network with a linear layer followed by a ReLU activation.
 
-We only support frozen model with ``torch.no_grad`` or the inference mode
-Therefore, you need to set the environment variable ``export TORCHINDUCTOR_FREEZING=1``
+In the C++ template-based GEMM implementation, we pre-pack the weight for good cache usage.
+In the case of inference, which is the primary scenario of CPU AI workloads,
+model weights are constant, and we pack them upfront during compilation
+so that data accesses are contiguous within the cache blocks.
+Thus, we only support a frozen model with ``torch.no_grad`` or the inference mode.
+You need to set the environment variable ``export TORCHINDUCTOR_FREEZING=1``
 and ensure that both the compilation and inference steps are executed within the ``torch.no_grad`` context.
 
 .. code:: python
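
For context, here is a minimal sketch of the workflow the added text describes: a toy linear + ReLU model compiled with ``max-autotune`` under ``torch.no_grad`` with freezing enabled. The layer sizes and the choice to set the environment variable from Python (rather than via ``export`` in the shell, as the tutorial does) are illustrative assumptions, not taken from the tutorial.

.. code:: python

    import os

    # Enable Inductor freezing before importing torch so the flag is picked
    # up when the inductor config initializes (the tutorial sets it via
    # ``export TORCHINDUCTOR_FREEZING=1`` in the shell instead).
    os.environ["TORCHINDUCTOR_FREEZING"] = "1"

    import torch


    class M(torch.nn.Module):
        # Toy network: a linear layer followed by a ReLU activation.
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(10, 16)
            self.relu = torch.nn.ReLU()

        def forward(self, x):
            return self.relu(self.linear(x))


    model = M().eval()  # inference: weights are constant and can be pre-packed
    x = torch.randn(2, 10)

    # Run both compilation and inference under torch.no_grad so the
    # frozen-model requirement is satisfied.
    with torch.no_grad():
        compiled = torch.compile(model, mode="max-autotune")
        y = compiled(x)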
