Commit 35ab427

[DOC] Add docs for TIMM and TorchBench setup (#125)
* Add instructions for TIMM and TorchBench
* Update end_to_end_tests.md
* Update
1 parent da1bc1f commit 35ab427

docs/test_docs/end_to_end_tests.md

Lines changed: 155 additions & 45 deletions

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Package Installation](#package-installation)
  - [HuggingFace and TIMM Models Installation](#huggingface-and-timm-models-installation)
  - [TorchBench Installation](#torchbench-installation)
    - [Install Torch Vision](#install-torch-vision)
    - [Install Torch Text](#install-torch-text)
    - [Install Torch Audio](#install-torch-audio)
    - [Install TorchBenchmark](#install-torchbenchmark)
- [Run the Model](#run-the-model)
  - [Command Details](#command-details)
  - [Debugging Tips](#debugging-tips)
- [Profiling](#profiling)
  - [Option 1: Use Legacy Profiling](#option-1-use-legacy-profiling)
    - [Profiling Settings](#profiling-settings)
  - [Option 2: Use Kineto Profiling](#option-2-use-kineto-profiling)
    - [Profiling Settings](#profiling-settings-1)
  - [End-to-end Tests Setting](#end-to-end-tests-setting)
    - [Profiling Tips](#profiling-tips)

# Overview

This document describes the [Torchdynamo Benchmarks](https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo) setup for the XPU Backend for Triton\*. The benchmark collection bundles several model suites behind a common frontend; the examples below cover the [Hugging Face\*](https://huggingface.co/), [TIMM Models](https://github.com/rwightman/pytorch-image-models), and [TorchBench](https://github.com/pytorch/benchmark) End-to-End models.

# Prerequisites

The PyTorch version should match the one in the [installation guide for intel_extension_for_pytorch](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html#installation-guide).

# Package Installation

## HuggingFace and TIMM Models Installation

The scripts in [Torchdynamo Benchmarks](https://github.com/pytorch/pytorch/tree/main/benchmarks/dynamo) automatically download and install the `transformers` and `timm` packages. However, the setup may uninstall the XPU build of PyTorch and install the CUDA build instead, so verifying the PyTorch build before running is crucial:

```Bash
# Wrong one: the environment picked up the CUDA build
(triton_env) ➜ python
>>> import torch
>>> torch.__version__
'2.1.0+cu121'
>>> torch.__file__
'/home/user/miniconda3/envs/triton_env/lib/python3.10/site-packages/torch/__init__.py'

# Correct one: the XPU build
>>> import torch
>>> torch.__version__
'2.1.0a0+gitdd9913f'
>>> torch.__file__
'/home/user/pytorch/torch/__init__.py'
```

If the PyTorch version is incorrect, please reinstall the [XPU version of PyTorch](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installation.html#installation-guide).
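
For a quick non-interactive check of the same thing, a one-liner such as the following can be used (it only prints the version string and the install location):

```Bash
python -c "import torch; print(torch.__version__, torch.__file__)"
```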

## TorchBench Installation

TorchBench depends on [torchvision](https://github.com/pytorch/vision.git), [torchtext](https://github.com/pytorch/text) and [torchaudio](https://github.com/pytorch/audio.git). By default these packages are built with CUDA support, so for XPU support all of them need to be **built from source**.

Use the following commands to build and install the dependencies:

### Install Torch Vision

```Bash
git clone --recursive https://github.com/pytorch/vision.git
cd vision
conda install libpng jpeg
conda install -c conda-forge ffmpeg
python setup.py install
```

### Install Torch Text

```Bash
git clone --recursive https://github.com/pytorch/text
cd text
python setup.py clean install
```

Note that the build may report the following error; it can be ignored:

```Bash
Processing dependencies for torchtext==0.17.0a0+c0d0685
error: torch 2.1.0a0+gitdd9913f is installed but torch==2.1.0 is required by {'torchdata'}
```

### Install Torch Audio

```Bash
pip install torchaudio
git clone --recursive https://github.com/pytorch/audio.git
cd audio
python setup.py install
```

### Install TorchBenchmark

Ensure all dependencies are correctly installed:

```Bash
python -c "import torchvision,torchtext,torchaudio;print(torchvision.__version__, torchtext.__version__, torchaudio.__version__)"
```

Then install TorchBenchmark as a library:

```Bash
conda install git-lfs pyyaml pandas scipy psutil
git clone --recursive https://github.com/pytorch/benchmark.git

cd benchmark
python install.py
pip install .
```
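
After `pip install .`, a quick sanity check that the library is importable (assuming the installed package is named `torchbenchmark`, which is how the repo exposes it):

```Bash
python -c "import torchbenchmark; print(torchbenchmark.__file__)"
```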

# Run the Model

Simply run the model using the following shell script. Note that there are some tricks for debugging; it is recommended to refer to [Debugging Tips](#debugging-tips).

Copy the shell script [intel_xpu_backend/.github/scripts/inductor_xpu_test.sh](../../.github/scripts/inductor_xpu_test.sh) to the PyTorch source folder, then execute the command:

```Bash
# Run all models
bash xpu_run_batch.sh huggingface amp_bf16 training performance xpu 0
# Run a single model (e.g. T5Small)
bash xpu_run_batch.sh huggingface amp_bf16 training performance xpu 0 static 1 0 T5Small
```

For a real example, refer to our CI command in [triton_xpu_backend_e2e_nightly.yml](https://github.com/intel/intel-xpu-backend-for-triton/blob/da1bc1fb7a39cb3c3332a92fba47c2fc1df25396/.github/workflows/triton_xpu_backend_e2e_nightly.yml#L230-L233).

Environment variables useful for debugging include:
- `TORCHINDUCTOR_CACHE_DIR={some_DIR}`: Specifies the cache directory. Useful for debugging.
- `TORCH_COMPILE_DEBUG=1`: Enables debug information printing.
- `TRITON_XPU_PROFILE=ON`: Displays XPU Triton kernels for debugging.

By default, the cache dir is under `/tmp/torchinductor_{user}/`; it is advisable to change this when debugging, as demonstrated below:

```Bash
LOG_DIR=${WORKSPACE}/inductor_log/${SUITE}/${MODEL}/${DT}
export TORCHINDUCTOR_CACHE_DIR=${LOG_DIR}
```

## Command Details

For fine-grained control, the typical command structure is as follows:

```Bash
python benchmarks/dynamo/${SUITE}.py --only ${MODEL} --accuracy --amp -dxpu -n50 --no-skip --dashboard ${Mode_extra} --backend=inductor --timeout=4800 --output=${LOG_DIR}/${LOG_NAME}.csv
```

The full argument list is accessible via:

```Bash
python benchmarks/dynamo/huggingface.py --help
```

Additional configuration settings are available in Python code, specifically in [torch._dynamo.config](https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py) and [torch._inductor.config](https://github.com/pytorch/pytorch/blob/main/torch/_inductor/config.py). One example of using these configs is in [Debugging Tips](#debugging-tips); set them according to your needs.
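
For instance, a minimal sketch of setting such knobs at the top of a benchmark script (the specific options shown here are only illustrative, not required settings):

```Python
import torch._dynamo
import torch._inductor.config

# surface compilation errors instead of silently falling back to eager mode
torch._dynamo.config.suppress_errors = False
# give Triton kernels descriptive names (also used later in this doc)
torch._inductor.config.triton.unique_kernel_names = True
```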

## Debugging Tips

It is recommended to set the following environment variables for debugging (a combined example is shown after the list):

- `TORCHINDUCTOR_CACHE_DIR={some-dir}`: Designates the TorchInductor cache location.
- `TRITON_CACHE_DIR={some-dir}`: Specifies the Triton cache directory; by default it is the `TORCHINDUCTOR_CACHE_DIR/triton` folder.
- `TORCH_COMPILE_DEBUG_DIR={some-dir}`: Where the compile debug files are put. You will see folders like `aot_torchinductor`, containing the TorchInductor logs, and `torchdynamo`, containing the Dynamo log.
- `TORCH_COMPILE_DEBUG=1`: Enables detailed TorchInductor tracing. It prints a lot of messages, so it is recommended to redirect the output to a file. With this flag set, the reproducible Python file can be found easily.
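
For example (a sketch; the directory names are placeholders, and the benchmark command follows the same style as above):

```Bash
export TORCHINDUCTOR_CACHE_DIR=$PWD/inductor_cache
export TRITON_CACHE_DIR=$PWD/triton_cache
export TORCH_COMPILE_DEBUG_DIR=$PWD/compile_debug

# redirect the very verbose debug output to a file
TORCH_COMPILE_DEBUG=1 python benchmarks/dynamo/huggingface.py --only T5Small --accuracy --amp -dxpu --backend=inductor &> debug.log
```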

Alternatively, the above env flags can also be set in a Python file, like below:

```Python
# helps to generate descriptive kernel names
torch._inductor.config.triton.unique_kernel_names = True
torch._inductor.config.kernel_name_max_ops = 8
```

**Reproducing Errors with a Smaller Python File**

For efficiency, reproduce errors using a smaller Python file. Enable `TORCH_COMPILE_DEBUG=1` to generate detailed outputs, which can be redirected to a file for easier inspection. The debug folder will contain files like `fx_graph_readable.py`, `fx_graph_runnable.py`, and `output_code.py`, which can be used for further analysis and debugging.

Since the output is very large, redirect it to a file:

```Bash
TORCH_COMPILE_DEBUG=1 python ... &> test.log
```
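
The TorchDynamo minifier can also shrink a failing run into a small repro script; a minimal sketch, assuming the standard `repro_after` knob in `torch._dynamo.config`:

```Python
import torch._dynamo

# dump a minified repro script after a TorchDynamo-level failure
torch._dynamo.config.repro_after = "dynamo"
```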
# Profiling

To profile the result, use the `performance` mode instead of `accuracy`, and make sure the profiler trace flag `--export-profiler-trace` is enabled in `inductor_xpu_test.sh`, i.e.:

```Bash
python benchmarks/dynamo/${SUITE}.py ... --performance --export-profiler-trace ...
```

## Option 1: Use Legacy Profiling

For now, we use [profiler_legacy](https://github.com/intel/intel-extension-for-pytorch/blob/xpu-master/docs/tutorials/features/profiler_legacy.md) to capture the profiling result. We are migrating from legacy profiling to Kineto profiling; since legacy profiling is currently more stable, it is recommended to use it first.

A typical profiling snippet looks like the following (only the core lines are shown; run the workload to be profiled inside the `with` block):

```Python
with torch.autograd.profiler_legacy.profile(use_xpu=True) as prof:
    # run the model or op to be profiled here
    ...

# print the result table formatted by the legacy profiler tool as your wish
print(prof.key_averages().table(sort_by="self_xpu_time_total"))
```

### Profiling Settings

For E2E tests, there are several places to change. `cd` to `pytorch/benchmarks/dynamo` and change `common.py` as below. Note that the line numbers may not be the same, but the places to change are unique. The hunk below is only a sketch of the idea (swap the default profiler for the legacy one inside `maybe_profile`); adapt it to the code you actually see:

```diff
 def maybe_profile(*args, **kwargs):
     if kwargs.pop("enabled", True):
-        with torch.profiler.profile(*args, **kwargs) as p:
+        with torch.autograd.profiler_legacy.profile(use_xpu=True) as p:
             yield p
     else:
         yield
```

## Option 2: Use Kineto Profiling

We are migrating to Kineto profiling; in the future, it will be the only option. For now, be sure to enable the environment variable `export IPEX_ZE_TRACING=1`. A typical profiling case looks like below:

```Python
import torch
import intel_extension_for_pytorch
from torch.profiler import profile, ProfilerActivity

a = torch.randn(3).xpu()
b = torch.randn(3).xpu()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU]) as prof:
    c = a + b

print(prof.key_averages().table())
```

### Profiling Settings

Same as with legacy profiling, you can modify the code like this:

```diff
@@ -530,7 +536,7 @@ def speedup_experiment(args, model_iter_fn, model, example_inputs, **kwargs):
 @contextlib.contextmanager
 def maybe_profile(*args, **kwargs):
     if kwargs.pop("enabled", True):
-        with torch.profiler.profile(*args, **kwargs) as p:
+        with torch.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU], *args, **kwargs) as p:
             yield p
     else:
         yield
```
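
Note that this change assumes `ProfilerActivity` is in scope in `common.py`; if it is not, an import such as `from torch.profiler import ProfilerActivity` would also be needed (an assumption; check the actual file).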

### End-to-end Tests Setting

#### Profiling Tips

To run the model, add the `--export-profiler-trace` flag. Because the profiling process links libtorch, it is highly recommended to **run twice**: the second run greatly reduces the kernel compiling time and returns results much faster, as sketched below.
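
For example, using the same wrapper script as above (a sketch; adjust the suite and model arguments to your case):

```Bash
# first run: compiles the kernels and warms the caches
bash xpu_run_batch.sh huggingface amp_bf16 training performance xpu 0 static 1 0 T5Small
# second run: the profiled run, reusing the already compiled kernels
bash xpu_run_batch.sh huggingface amp_bf16 training performance xpu 0 static 1 0 T5Small
```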

If you wish to make the kernel names more readable, you can enable the following:

```Python
# common.py
torch._inductor.config.triton.unique_kernel_names = True
torch._inductor.config.kernel_name_max_ops = 8
```