Commit 29effc5

add request on frozen and no_grad

1 parent 9380b9d commit 29effc5

1 file changed

prototype_source/max_autotune_on_CPU_tutorial.rst

Lines changed: 4 additions & 1 deletion
@@ -27,8 +27,10 @@ If you prefer to bypass the tuning process and always use the CPP template imple
 Example code
 ------------
 The below code is an example of using the ``max-autotune`` mode on a simple neural network with a linear layer followed by a ReLU activation.
-You could run the example code by setting this environment variable ``export TORCHINDUCTOR_FREEZING=1``.
 
+We only support frozen models with ``torch.no_grad`` or inference mode.
+Therefore, you need to set the environment variable ``export TORCHINDUCTOR_FREEZING=1``
+and ensure that both the compilation and inference steps are executed within the ``torch.no_grad`` context.
 
 .. code:: python
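A minimal sketch of the workflow these new lines describe, assuming a toy linear + ReLU model like the tutorial's (the module name, shapes, and input are illustrative stand-ins, not the tutorial's exact code):

.. code:: python

    import os

    # TORCHINDUCTOR_FREEZING must be set before Inductor compiles the model;
    # the tutorial sets it in the shell via ``export TORCHINDUCTOR_FREEZING=1``.
    os.environ["TORCHINDUCTOR_FREEZING"] = "1"

    import torch


    class M(torch.nn.Module):
        # Illustrative stand-in: a linear layer followed by a ReLU activation.
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(10, 32)
            self.relu = torch.nn.ReLU()

        def forward(self, x):
            return self.relu(self.linear(x))


    model = M().eval()
    x = torch.randn(2, 10)

    compiled = torch.compile(model, mode="max-autotune")

    # Both the first (compiling) call and subsequent inference calls run
    # inside ``torch.no_grad``, as the updated text requires.
    with torch.no_grad():
        y = compiled(x)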
@@ -86,6 +88,7 @@ We could check the generated output code by setting ``export TORCH_LOGS="+output
 When CPP template is selected, we won't have ``torch.ops.mkldnn._linear_pointwise.default`` (for bfloat16) or ``torch.ops.mkl._mkl_linear.default`` (for float32)
 in the generated code anymore, instead, we'll find kernel based on CPP GEMM template ``cpp_fused__to_copy_relu_1``
 (only part of the code is demonstrated below for simplicity) with the bias and relu epilogues fused inside the CPP GEMM template kernel.
+
 The generated code differs by CPU architecture and is implementation-specific, which is subject to change.
 
 .. code:: python
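For the inspection step this hunk refers to, a hedged sketch of how the generated code can be surfaced, assuming the example above is saved as ``example.py`` (the file name is an assumption):

.. code:: python

    # Shell usage (illustrative): dump Inductor's generated output code
    # while running the example:
    #
    #   TORCHINDUCTOR_FREEZING=1 TORCH_LOGS="+output_code" python example.py
    #
    # The same logging can also be enabled from Python:
    import torch._logging

    torch._logging.set_logs(output_code=True)

    # When the CPP GEMM template is selected, the logged code should contain
    # a kernel named like ``cpp_fused__to_copy_relu_1`` instead of
    # ``torch.ops.mkldnn._linear_pointwise.default`` (bfloat16) or
    # ``torch.ops.mkl._mkl_linear.default`` (float32).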
