This document outlines the setup for [Torchdynamo Benchmarks](https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo) with the XPU Backend for Triton\*. The benchmark contains different suites and shares a common frontend. The examples below cover [Hugging Face\*](https://huggingface.co/), [TIMM Models](https://github.com/rwightman/pytorch-image-models), and [TorchBench](https://github.com/pytorch/benchmark) End-to-End models within the [Torchdynamo Benchmarks](https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo) context.
# Prerequisites

The PyTorch version should match the one in the [installation guide for intel_extension_for_pytorch](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html#installation-guide).
# Package Installation

## HuggingFace and TIMM Models Installation
The scripts in [Torchdynamo Benchmarks](https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo) automatically download and install the `transformers` and `timm` packages. However, there are cases where the script may uninstall the XPU version of PyTorch and install the CUDA version instead, so verifying the PyTorch version before running is crucial.
If the PyTorch version is incorrect, please reinstall the [XPU version of PyTorch](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html#installation-guide).
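A quick way to verify the installed build (a minimal check, assuming `intel_extension_for_pytorch` is installed) is:

```Bash
# Should print an XPU-enabled PyTorch version and report the XPU device as available.
python -c "import torch; import intel_extension_for_pytorch; print(torch.__version__); print(torch.xpu.is_available())"
```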
## TorchBench Installation
TorchBench relies on [torchvision](https://github.com/pytorch/vision.git), [torchtext](https://github.com/pytorch/text), and [torchaudio](https://github.com/pytorch/audio.git). Since these packages are built with CUDA support by default, for XPU support all of them need to be **BUILT FROM SOURCE**.
Please use the following commands to build and install the dependencies:
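The exact commands are not preserved in this extract; the sketch below shows the general build-from-source flow against the already-installed XPU PyTorch (branch and tag choices are placeholders, so pick the versions that match your PyTorch):

```Bash
# Sketch only: check out the branches/tags that match your installed PyTorch version.
git clone --recursive https://github.com/pytorch/vision.git
cd vision && python setup.py install && cd ..

git clone --recursive https://github.com/pytorch/text.git
cd text && python setup.py install && cd ..

git clone --recursive https://github.com/pytorch/audio.git
cd audio && python setup.py install && cd ..

# Then install TorchBench itself.
git clone https://github.com/pytorch/benchmark.git
cd benchmark && python install.py
```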
Note that the build may report the following error, which can be ignored:
```Bash
Processing dependencies for torchtext==0.17.0a0+c0d0685
error: torch 2.1.0a0+gitdd9913f is installed but torch==2.1.0 is required by {'torchdata'}
```
Simply run the model using the following sh file. Note that there are some tricks for debugging. It is recommended to refer to [Debugging Tips](#debugging-tips).
Copy the shell script [intel_xpu_backend/.github/scripts/inductor_xpu_test.sh](../../.github/scripts/inductor_xpu_test.sh) to the PyTorch source folder, then execute the command:
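The exact invocation is not preserved in this extract; the call below is a hypothetical example — the script takes positional arguments (suite, data type, mode, scenario, ...) whose exact order is defined in `inductor_xpu_test.sh` itself and shown in the CI workflow linked below:

```Bash
# Hypothetical example: check the header of inductor_xpu_test.sh for the real argument order.
bash inductor_xpu_test.sh huggingface float32 inference accuracy
```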
For a real example, refer to our CI command in [triton_xpu_backend_e2e_nightly.yml](https://github.com/intel/intel-xpu-backend-for-triton/blob/da1bc1fb7a39cb3c3332a92fba47c2fc1df25396/.github/workflows/triton_xpu_backend_e2e_nightly.yml#L230-L233).
Environment variables for debugging include:
- `TORCHINDUCTOR_CACHE_DIR={some_DIR}`: Specifies the cache directory. Useful for debugging.
- `TORCH_COMPILE_DEBUG=1`: Enables debug information printing.
- `TRITON_XPU_PROFILE=ON`: Displays XPU Triton kernels for debugging.
By default, the cache dir is under `/tmp/torchinductor_{user}/`. It is advisable to change this when debugging, as demonstrated below:
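A minimal example (the directory path is just an illustration):

```Bash
# Use a fresh cache directory for this debugging session.
export TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_debug
```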
Full argument lists are accessible via:
```Bash
python benchmarks/dynamo/huggingface.py --help
```
Additional configuration settings are available in Python code, specifically in [torch._dynamo.config](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py) and [torch._inductor.config](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/config.py). Set these configurations as needed.
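For illustration, here are a couple of commonly used knobs (the exact attribute names can vary across PyTorch versions, so treat this as a sketch):

```Python
import torch._dynamo.config as dynamo_config
import torch._inductor.config as inductor_config

# Illustrative settings; apply them before calling torch.compile.
dynamo_config.verbose = True      # more verbose dynamo logging
inductor_config.debug = True      # dump extra inductor debug artifacts
```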
## Debugging Tips
It is recommended to set the following environment variables for debugging:
- `TORCHINDUCTOR_CACHE_DIR={some-dir}`: Designates the torchinductor cache location.
- `TRITON_CACHE_DIR={some-dir}`: Specifies the Triton cache directory, usually within the `TORCHINDUCTOR_CACHE_DIR/triton` folder.
- `TORCH_COMPILE_DEBUG_DIR={some-dir}`: Where the compile debug files are put. You will see folders like `aot_torchinductor`, containing the TorchInductor logs, and `torchdynamo`, containing the Dynamo log.
- `TORCH_COMPILE_DEBUG=1`: Enables detailed TorchInductor tracing. It prints a lot of messages, so it is recommended to redirect the output to a file. With this flag set, the reproducible Python file can be found easily.
Alternatively, the above environment flags can also be set in a Python file, as shown below:
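A minimal sketch (the values are illustrative; set them before PyTorch reads them, i.e. before compiling the model):

```Python
import os

# Equivalent to exporting the flags in the shell before running the benchmark.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/tmp/torchinductor_debug"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
os.environ["TRITON_XPU_PROFILE"] = "ON"
```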
Re-running the whole model is quite a burden, so for efficiency, reproduce errors using a smaller Python file. Enable `TORCH_COMPILE_DEBUG=1` to generate detailed outputs, which can be redirected to a file for easier inspection. The debug folder will contain files like `fx_graph_readable.py`, `fx_graph_runnable.py`, and `output_code.py`, which can be used for further analysis and debugging.
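In particular, the dumped graph can be executed on its own to reproduce the issue in isolation (the file lives inside the per-run debug directory; the path below is illustrative):

```Bash
# Run the standalone graph dump produced by TORCH_COMPILE_DEBUG=1.
python fx_graph_runnable.py
```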
To profile the result, one should use the `performance` mode instead of `accuracy`, and make sure the profiler trace flag `--export-profiler-trace` is enabled in `inductor_xpu_test.sh`. That is, one should use:
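As before, the exact command is not preserved in this extract; a hypothetical invocation, assuming the positional-argument order sketched earlier, would be:

```Bash
# Hypothetical example: "performance" replaces "accuracy" in the scenario argument.
bash inductor_xpu_test.sh huggingface float32 inference performance
```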
## Option 1: Use Legacy Profiling

For now, we use [profiler_legacy](https://github.com/intel/intel-extension-for-pytorch/blob/xpu-master/docs/tutorials/features/profiler_legacy.md) to capture the profiling results. We are migrating from legacy profiling to Kineto profiling; since legacy profiling is more stable, it is recommended to use it first.
A typical profiling code snippet would look like the one below:
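The full example is not preserved in this extract; the following is a minimal sketch built around the `torch.autograd.profiler_legacy.profile(use_xpu=True)` context manager referenced above — the profiled workload and the sort key are illustrative:

```Python
import torch
import intel_extension_for_pytorch  # noqa: F401  # registers the XPU device and legacy profiler support

model = torch.nn.Linear(128, 128).to("xpu")
inp = torch.randn(32, 128, device="xpu")

with torch.autograd.profiler_legacy.profile(use_xpu=True) as prof:
    out = model(inp)
    torch.xpu.synchronize()

# print the result table formatted by the legacy profiler tool as your wish
print(prof.key_averages().table(sort_by="self_xpu_time_total"))
```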
For E2E tests, there are several places to change. Go to `pytorch/benchmarks/dynamo` and modify `common.py` as shown below. Note that the line numbers may differ, but the places to change are unique.
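The exact patch is not preserved in this extract; the sketch below illustrates the kind of change, assuming the profiler is created inside a contextmanager helper in `common.py` (the helper name and surrounding code are illustrative) — the idea is to swap the Kineto profiler for the legacy XPU profiler:

```Python
from contextlib import contextmanager

import torch
import intel_extension_for_pytorch  # noqa: F401


@contextmanager
def maybe_profile(enabled, *args, **kwargs):
    # Originally wraps torch.profiler.profile(...); for legacy XPU profiling,
    # use torch.autograd.profiler_legacy.profile(use_xpu=True) instead.
    if enabled:
        with torch.autograd.profiler_legacy.profile(use_xpu=True) as p:
            yield p
    else:
        yield
```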
## Option 2: Use Kineto Profiling
We are migrating to Kineto profiling; in the future, this will be the only option. A typical profiler example looks like the one below. For now, be sure to enable the environment variable `export IPEX_ZE_TRACING=1`.
```Python
import torch
import intel_extension_for_pytorch
from torch.profiler import profile, ProfilerActivity

a = torch.randn(3).xpu()
b = torch.randn(3).xpu()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU]) as prof:
    c = a + b

print(prof.key_averages().table())
```
### Profiling Settings
Same as with the legacy profiling, you can modify the code like this:

```
-        with torch.profiler.profile(*args, **kwargs) as p:
+        with torch.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU], *args, **kwargs) as p:
             yield p
     else:
         yield
```
### End-to-end Tests Setting
#### Profiling Tips
To run the model, add the `--export-profiler-trace` flag when running. Because the profiling process links libtorch, it greatly increases the kernel compiling time, so it is highly recommended to **run twice** for quicker results:
If you wish to make the kernel names more readable, you can enable the following flag: