Example code
------------
The code below is an example of using the ``max-autotune`` mode on a simple neural network with a linear layer followed by a ReLU activation.
In the C++ template-based GEMM implementation, we pre-pack the weights for good cache usage. For inference, which is the primary scenario of CPU AI workloads, model weights are constant, so we pack them upfront during compilation to make data accesses contiguous within the cache blocks. Thus, we only support frozen models under ``torch.no_grad`` or inference mode. You need to set the environment variable ``export TORCHINDUCTOR_FREEZING=1`` and ensure that both the compilation and inference steps are executed within the ``torch.no_grad`` context.
.. code:: python