prototype_source/max_autotune_on_CPU_tutorial.rst
+4 −1 (4 additions, 1 deletion)
@@ -27,8 +27,10 @@ If you prefer to bypass the tuning process and always use the CPP template imple
Example code
------------
The code below is an example of using the ``max-autotune`` mode on a simple neural network with a linear layer followed by a ReLU activation.
-You could run the example code by setting this environment variable ``export TORCHINDUCTOR_FREEZING=1``.

+We only support frozen models with ``torch.no_grad`` or the inference mode.
+Therefore, you need to set the environment variable ``export TORCHINDUCTOR_FREEZING=1``
+and ensure that both the compilation and inference steps are executed within the ``torch.no_grad`` context.
.. code:: python
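    # The tutorial's example code falls outside this hunk; the body below is a
    # rough sketch of what it plausibly contains, not the tutorial's exact code.
    # The module name ``M``, layer sizes, and input shape are illustrative
    # assumptions. Run with ``export TORCHINDUCTOR_FREEZING=1`` set beforehand.
    import torch

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(10, 32)
            self.relu = torch.nn.ReLU()

        def forward(self, x):
            return self.relu(self.linear(x))

    model = M().eval()
    x = torch.randn(128, 10)

    # Both compilation and inference run under torch.no_grad, as the prose
    # above requires for frozen models.
    with torch.no_grad():
        compiled = torch.compile(model, mode="max-autotune")
        y = compiled(x)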
@@ -86,6 +88,7 @@ We could check the generated output code by setting ``export TORCH_LOGS="+output
When the CPP template is selected, we won't have ``torch.ops.mkldnn._linear_pointwise.default`` (for bfloat16) or ``torch.ops.mkl._mkl_linear.default`` (for float32)
in the generated code anymore; instead, we'll find a kernel based on the CPP GEMM template, ``cpp_fused__to_copy_relu_1``
(only part of the code is shown below for simplicity), with the bias and ReLU epilogues fused inside the CPP GEMM template kernel.
+
The generated code differs across CPU architectures and is implementation-specific, so it is subject to change.
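If you prefer to enable the output-code logging from inside a script rather than via the shell, the programmatic toggle below should be equivalent; this is a sketch based on PyTorch's ``torch._logging`` API, and is worth verifying against your PyTorch version:

.. code:: python

    import torch

    # Programmatic counterpart of the TORCH_LOGS environment variable:
    # ask Inductor to print the generated output code, so you can check
    # whether a cpp_fused_* CPP GEMM template kernel was selected.
    torch._logging.set_logs(output_code=True)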