
Commit da4bdca

authored
Refresh compile/run examples in deployment configuration guides. (#19920)
Progress on #18174, updating some stale documentation.

> [!NOTE]
> Demo here: https://scotttodd.github.io/iree/guides/deployment-configurations/cpu/

Changes included:

* Switch examples to use ONNX instead of TensorFlow given that users are trying to use TensorFlow and failing: #19852
* Add more documentation for CPU targets and features for #18561
* Standardize some formatting across CPU/CUDA/ROCm/Vulkan pages
* Adjust some parts of the ONNX guide now that support is more mature
1 parent 25ec84c commit da4bdca

File tree

10 files changed: +295 -306 lines changed

docs/website/docs/guides/deployment-configurations/cpu.md

Lines changed: 65 additions & 66 deletions
@@ -13,27 +13,32 @@ IREE supports efficient program execution on CPU devices by using
 highly optimized CPU native instruction streams, which are embedded in one of
 IREE's deployable formats.
 
-To compile a program for CPU execution, pick one of IREE's supported executable
-formats:
+To compile a program for CPU execution:
 
-| Executable Format | Description |
-| ----------------- | ----------------------------------------------------- |
-| embedded ELF      | portable, high performance dynamic library |
-| system library    | platform-specific dynamic library (.so, .dll, etc.) |
-| VMVX              | reference target |
+1. Pick a CPU target supported by LLVM. By default, IREE includes these LLVM
+   targets:
 
-At runtime, CPU executables can be loaded using one of IREE's CPU HAL drivers:
+    * X86
+    * ARM
+    * AArch64
+    * RISCV
 
-* `local-task`: asynchronous, multithreaded driver built on IREE's "task"
-  system
-* `local-sync`: synchronous, single-threaded driver that executes work inline
+    Other targets may work, but in-tree test coverage and performance work is
+    focused on that list.
+
+2. Pick one of IREE's supported executable formats:
 
-!!! todo
+    | Executable Format | Description |
+    | ----------------- | ----------------------------------------------------- |
+    | Embedded ELF      | (Default) Portable, high performance dynamic library |
+    | System library    | Platform-specific dynamic library (.so, .dll, etc.) |
+    | VMVX              | Reference target |
 
-    Add IREE's CPU support matrix: what architectures are supported; what
-    architectures are well optimized; etc.
+At runtime, CPU executables can be loaded using one of IREE's CPU HAL devices:
 
-<!-- TODO(??): when to use CPU vs GPU vs other backends -->
+* `local-task`: asynchronous, multithreaded device built on IREE's "task"
+  system
+* `local-sync`: synchronous, single-threaded device that executes work inline
 
 ## :octicons-download-16: Prerequisites
 
@@ -44,22 +49,17 @@ At runtime, CPU executables can be loaded using one of IREE's CPU HAL drivers:
 Python packages are distributed through multiple channels. See the
 [Python Bindings](../../reference/bindings/python.md) page for more details.
 The core [`iree-base-compiler`](https://pypi.org/project/iree-base-compiler/)
-package includes the LLVM-based CPU compiler:
+package includes the compiler tools:
 
 --8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-compiler-from-release.md"
 
 #### :material-hammer-wrench: Build the compiler from source
 
 Please make sure you have followed the
 [Getting started](../../building-from-source/getting-started.md) page to build
-IREE for your host platform and the
-[Android cross-compilation](../../building-from-source/android.md) or
-[iOS cross-compilation](../../building-from-source/ios.md) page if you are cross
-compiling for a mobile device. The `llvm-cpu` compiler backend is compiled in by
-default on all platforms.
-
-Ensure that the `IREE_TARGET_BACKEND_LLVM_CPU` CMake option is `ON` when
-configuring for the host.
+IREE for your host platform. The `llvm-cpu` compiler backend is compiled in by
+default on all platforms, though you should ensure that the
+`IREE_TARGET_BACKEND_LLVM_CPU` CMake option is `ON` when configuring.
 
 !!! tip
     `iree-compile` will be built under the `iree-build/tools/` directory. You
@@ -71,10 +71,14 @@ You will need to get an IREE runtime that supports the local CPU HAL driver,
 along with the appropriate executable loaders for your application.
 
 You can check for CPU support by looking for the `local-sync` and `local-task`
-drivers:
+drivers and devices:
 
-```console hl_lines="5 6"
---8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md"
+```console hl_lines="10-11"
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md:1"
+```
+
+```console hl_lines="4-5"
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-device-list-amd.md"
 ```
 
 #### :octicons-download-16: Download the runtime from a release
@@ -88,47 +92,49 @@ package includes the local CPU HAL drivers:
 
 #### :material-hammer-wrench: Build the runtime from source
 
-Please make sure you have followed the
-[Getting started](../../building-from-source/getting-started.md) page to build
-IREE for your host platform and the
-[Android cross-compilation](../../building-from-source/android.md) page if you
-are cross compiling for Android. The local CPU HAL drivers are compiled in by
-default on all platforms.
-
-Ensure that the `IREE_HAL_DRIVER_LOCAL_TASK` and
-`IREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF` (or other executable loader) CMake
-options are `ON` when configuring for the target.
+Please make sure you have followed one of the
+[Building from source](../../building-from-source/index.md) pages to build
+IREE for your target platform. The local CPU HAL drivers and devices are
+compiled in by default on all platforms, though you should ensure that the
+`IREE_HAL_DRIVER_LOCAL_TASK` and `IREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF`
+(or other executable loader) CMake options are `ON` when configuring.
 
 ## Compile and run a program
 
 With the requirements out of the way, we can now compile a model and run it.
 
 ### :octicons-file-code-16: Compile a program
 
-The IREE compiler transforms a model into its final deployable format in many
-sequential steps. A model authored with Python in an ML framework should use the
-corresponding framework's import tool to convert into a format (i.e.,
-[MLIR](https://mlir.llvm.org/)) expected by the IREE compiler first.
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-import-onnx-mobilenet.md"
 
-Using MobileNet v2 as an example, you can download the SavedModel with trained
-weights from
-[TensorFlow Hub](https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification)
-and convert it using IREE's
-[TensorFlow importer](../ml-frameworks/tensorflow.md). Then run the following
-command to compile with the `llvm-cpu` target:
+Then run the following command to compile with the `llvm-cpu` target:
 
-``` shell hl_lines="2"
+``` shell hl_lines="2-3"
 iree-compile \
   --iree-hal-target-backends=llvm-cpu \
-  mobilenet_iree_input.mlir -o mobilenet_cpu.vmfb
+  --iree-llvmcpu-target-cpu=host \
+  mobilenetv2.mlir -o mobilenet_cpu.vmfb
 ```
 
-!!! tip "Tip - CPU targets"
+???+ tip "Tip - Target CPUs and CPU features"
+
+    By default, the compiler will use a generic CPU target which will result in
+    poor performance. A target CPU or target CPU feature set should be selected
+    using one of these options:
+
+    * `--iree-llvmcpu-target-cpu=...`
+    * `--iree-llvmcpu-target-cpu-features=...`
+
+    When not cross compiling, passing `--iree-llvmcpu-target-cpu=host` is
+    usually sufficient on most devices.
+
+???+ tip "Tip - CPU targets"
 
     The `--iree-llvmcpu-target-triple` flag tells the compiler to generate code
     for a specific type of CPU. You can see the list of supported targets with
-    `iree-compile --iree-llvmcpu-list-targets`, or pass "host" to let LLVM
-    infer the triple from your host machine (e.g. `x86_64-linux-gnu`).
+    `iree-compile --iree-llvmcpu-list-targets`, or use the default value of
+    "host" to let LLVM infer the triple from your host machine
+    (e.g. `x86_64-linux-gnu`).
 
     ```console
     $ iree-compile --iree-llvmcpu-list-targets
@@ -149,28 +155,21 @@ iree-compile \
     x86-64 - 64-bit X86: EM64T and AMD64
     ```
 
-!!! tip "Tip - CPU features"
-
-    The `--iree-llvmcpu-target-cpu-features` flag tells the compiler to generate
-    code using certain CPU "features", like SIMD instruction sets. Like the
-    target triple, you can pass "host" to this flag to let LLVM infer the
-    features supported by your host machine.
-
 ### :octicons-terminal-16: Run a compiled program
 
-In the build directory, run the following command:
+To run the compiled program:
 
 ``` shell hl_lines="2"
-tools/iree-run-module \
+iree-run-module \
   --device=local-task \
   --module=mobilenet_cpu.vmfb \
-  --function=predict \
-  --input="1x224x224x3xf32=0"
+  --function=torch-jit-export \
+  --input="1x3x224x224xf32=0"
 ```
 
-The above assumes the exported function in the model is named as `predict` and
-it expects one 224x224 RGB image. We are feeding in an image with all 0 values
-here for brevity, see `iree-run-module --help` for the format to specify
+The above assumes the exported function in the model is named `torch-jit-export`
+and it expects one 224x224 RGB image. We are feeding in an image with all 0
+values here for brevity; see `iree-run-module --help` for the format to specify
 concrete values.
 
 <!-- TODO(??): measuring performance -->
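The splatted `--input="1x3x224x224xf32=0"` above keeps the example short. As a sketch of preparing a more realistic input, one can build the NCHW tensor with NumPy and pass the saved file via `--input=@input.npy` (`iree-run-module` accepts NumPy files as inputs; check `iree-run-module --help`). The normalization constants below are the common ImageNet values and are an assumption, not something this guide specifies:

```python
import numpy as np

def make_input(height: int = 224, width: int = 224) -> np.ndarray:
    """Build a 1x3x224x224 float32 batch for the MobileNetV2 example."""
    # A real application would load and resize an image here; random pixel
    # values stand in for one in this sketch.
    image = np.random.randint(0, 256, (height, width, 3)).astype(np.float32)
    # Assumed ImageNet-style normalization; match your model's preprocessing.
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    image = (image / 255.0 - mean) / std
    # HWC -> CHW, then add a leading batch dimension.
    return image.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)

if __name__ == "__main__":
    np.save("input.npy", make_input())
    # Then: iree-run-module ... --input=@input.npy
```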

docs/website/docs/guides/deployment-configurations/gpu-cuda.md

Lines changed: 36 additions & 46 deletions
@@ -52,16 +52,12 @@ Next you will need to get an IREE runtime that includes the CUDA HAL driver.
 
 You can check for CUDA support by looking for a matching driver and device:
 
-```console hl_lines="3"
---8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md"
+```console hl_lines="8"
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md:1"
 ```
 
 ```console hl_lines="3"
-$ iree-run-module --list_devices
-
-cuda://GPU-00000000-1111-2222-3333-444444444444
-local-sync://
-local-task://
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-device-list-nvidia.md"
 ```
 
 #### :octicons-download-16: Download the runtime from a release
@@ -82,69 +78,63 @@ IREE from source, then enable the CUDA HAL driver with the
 
 ## Compile and run a program model
 
-With the compiler and runtime ready, we can now compile programs and run them
-on GPUs.
+With the requirements out of the way, we can now compile a model and run it.
 
 ### :octicons-file-code-16: Compile a program
 
-The IREE compiler transforms a model into its final deployable format in many
-sequential steps. A model authored with Python in an ML framework should use the
-corresponding framework's import tool to convert into a format (i.e.,
-[MLIR](https://mlir.llvm.org/)) expected by the IREE compiler first.
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-import-onnx-mobilenet.md"
 
-Using MobileNet v2 as an example, you can download the SavedModel with trained
-weights from
-[TensorFlow Hub](https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification)
-and convert it using IREE's
-[TensorFlow importer](../ml-frameworks/tensorflow.md). Then run one of the
-following commands to compile:
+Then run the following command to compile with the `cuda` target:
 
 ```shell hl_lines="2-3"
 iree-compile \
   --iree-hal-target-backends=cuda \
   --iree-cuda-target=<...> \
-  mobilenet_iree_input.mlir -o mobilenet_cuda.vmfb
+  mobilenetv2.mlir -o mobilenet_cuda.vmfb
 ```
 
-Canonically a CUDA target (`iree-cuda-target`) matching the LLVM NVPTX backend
-of the form `sm_<arch_number>` is needed to compile towards each GPU
-architecture. If no architecture is specified then we will default to `sm_60`.
+???+ tip "Tip - CUDA targets"
+
+    Canonically a CUDA target (`iree-cuda-target`) matching the LLVM NVPTX
+    backend of the form `sm_<arch_number>` is needed to compile towards each GPU
+    architecture. If no architecture is specified then we will default to
+    `sm_60`.
 
-Here is a table of commonly used architectures:
+    Here is a table of commonly used architectures:
 
-| CUDA GPU            | Target Architecture | Architecture Code Name
-| ------------------- | ------------------- | ----------------------
-| NVIDIA P100         | `sm_60`             | `pascal`
-| NVIDIA V100         | `sm_70`             | `volta`
-| NVIDIA A100         | `sm_80`             | `ampere`
-| NVIDIA H100         | `sm_90`             | `hopper`
-| NVIDIA RTX20 series | `sm_75`             | `turing`
-| NVIDIA RTX30 series | `sm_86`             | `ampere`
-| NVIDIA RTX40 series | `sm_89`             | `ada`
+    | CUDA GPU            | Target Architecture | Architecture Code Name
+    | ------------------- | ------------------- | ----------------------
+    | NVIDIA P100         | `sm_60`             | `pascal`
+    | NVIDIA V100         | `sm_70`             | `volta`
+    | NVIDIA A100         | `sm_80`             | `ampere`
+    | NVIDIA H100         | `sm_90`             | `hopper`
+    | NVIDIA RTX20 series | `sm_75`             | `turing`
+    | NVIDIA RTX30 series | `sm_86`             | `ampere`
+    | NVIDIA RTX40 series | `sm_89`             | `ada`
 
-In addition to the canonical `sm_<arch_number>` scheme, `iree-cuda-target` also
-supports two additonal schemes to make a better developer experience:
+    In addition to the canonical `sm_<arch_number>` scheme, `iree-cuda-target`
+    also supports two additional schemes to make a better developer experience:
 
-* Architecture code names like `volta` or `ampere`
-* GPU product names like `a100` or `rtx3090`
+    * Architecture code names like `volta` or `ampere`
+    * GPU product names like `a100` or `rtx3090`
 
-These two schemes are translated into the canonical form under the hood.
-We add support for common code/product names without aiming to be exhaustive.
-If the ones you want are missing, please use the canonical form.
+    These two schemes are translated into the canonical form under the hood.
+    We add support for common code/product names without aiming to be exhaustive.
+    If the ones you want are missing, please use the canonical form.
 
 ### :octicons-terminal-16: Run a compiled program
 
-Run the following command:
+To run the compiled program:
 
 ``` shell hl_lines="2"
 iree-run-module \
   --device=cuda \
   --module=mobilenet_cuda.vmfb \
-  --function=predict \
-  --input="1x224x224x3xf32=0"
+  --function=torch-jit-export \
+  --input="1x3x224x224xf32=0"
 ```
 
-The above assumes the exported function in the model is named as `predict` and
-it expects one 224x224 RGB image. We are feeding in an image with all 0 values
-here for brevity, see `iree-run-module --help` for the format to specify
+The above assumes the exported function in the model is named `torch-jit-export`
+and it expects one 224x224 RGB image. We are feeding in an image with all 0
+values here for brevity; see `iree-run-module --help` for the format to specify
 concrete values.
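The target name translation described in the CUDA targets tip can be sketched as a simple lookup. The pairs below mirror the table in the diff; the helper itself is illustrative and is not IREE's actual implementation (for instance, `ampere` is mapped to `sm_80` here even though the table shows the name also covers `sm_86` parts):

```python
# Architecture code names and GPU product names mapped to canonical
# sm_<arch_number> targets, mirroring the table in this guide.
CUDA_TARGETS = {
    "pascal": "sm_60",
    "volta": "sm_70",
    "turing": "sm_75",
    "ampere": "sm_80",  # the name also covers sm_86 (RTX30 series) parts
    "ada": "sm_89",
    "hopper": "sm_90",
    "a100": "sm_80",
    "rtx3090": "sm_86",
}

def canonical_cuda_target(name: str) -> str:
    """Translate a code/product name into the canonical --iree-cuda-target form."""
    name = name.lower()
    if name.startswith("sm_"):
        return name  # already canonical; pass through unchanged
    if name not in CUDA_TARGETS:
        raise ValueError(f"unknown CUDA target {name!r}; use sm_<arch_number>")
    return CUDA_TARGETS[name]
```

For example, `canonical_cuda_target("rtx3090")` yields `sm_86`, matching `--iree-cuda-target=sm_86` from the table; unlisted names fall back to requiring the canonical form, as the guide recommends.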
