@@ -13,27 +13,32 @@ IREE supports efficient program execution on CPU devices by using
 highly optimized CPU native instruction streams, which are embedded in one of
 IREE's deployable formats.
 
-To compile a program for CPU execution, pick one of IREE's supported executable
-formats:
+To compile a program for CPU execution:
 
-| Executable Format | Description                                         |
-| ----------------- | --------------------------------------------------- |
-| embedded ELF      | portable, high performance dynamic library          |
-| system library    | platform-specific dynamic library (.so, .dll, etc.) |
-| VMVX              | reference target                                    |
+1. Pick a CPU target supported by LLVM. By default, IREE includes these LLVM
+   targets:
 
-At runtime, CPU executables can be loaded using one of IREE's CPU HAL drivers:
+    * X86
+    * ARM
+    * AArch64
+    * RISCV
 
-* `local-task`: asynchronous, multithreaded driver built on IREE's "task"
-  system
-* `local-sync`: synchronous, single-threaded driver that executes work inline
+    Other targets may work, but in-tree test coverage and performance work is
+    focused on that list.
+
+2. Pick one of IREE's supported executable formats:
 
-!!! todo
+    | Executable Format | Description                                          |
+    | ----------------- | ---------------------------------------------------- |
+    | Embedded ELF      | (Default) Portable, high performance dynamic library |
+    | System library    | Platform-specific dynamic library (.so, .dll, etc.)  |
+    | VMVX              | Reference target                                     |
 
-    Add IREE's CPU support matrix: what architectures are supported; what
-    architectures are well optimized; etc.
+At runtime, CPU executables can be loaded using one of IREE's CPU HAL devices:
 
-<!-- TODO(??): when to use CPU vs GPU vs other backends -->
+* `local-task`: asynchronous, multithreaded device built on IREE's "task"
+  system
+* `local-sync`: synchronous, single-threaded device that executes work inline
 
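As a quick illustration of the two devices above, the same compiled module can be run under either one by changing only the `--device=` flag (the module name and entry function below are placeholders, not from this guide):

```shell
# Synchronous, single-threaded execution; simplest to debug.
iree-run-module --device=local-sync --module=program.vmfb --function=main

# Asynchronous, multithreaded execution on IREE's task system.
iree-run-module --device=local-task --module=program.vmfb --function=main
```

Both devices load the same `.vmfb`; only the scheduling strategy differs.
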
 ## :octicons-download-16: Prerequisites
 
@@ -44,22 +49,17 @@ At runtime, CPU executables can be loaded using one of IREE's CPU HAL drivers:
 Python packages are distributed through multiple channels. See the
 [Python Bindings](../../reference/bindings/python.md) page for more details.
 The core [`iree-base-compiler`](https://pypi.org/project/iree-base-compiler/)
-package includes the LLVM-based CPU compiler:
+package includes the compiler tools:
 
 --8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-compiler-from-release.md"
 
 #### :material-hammer-wrench: Build the compiler from source
 
 Please make sure you have followed the
 [Getting started](../../building-from-source/getting-started.md) page to build
-IREE for your host platform and the
-[Android cross-compilation](../../building-from-source/android.md) or
-[iOS cross-compilation](../../building-from-source/ios.md) page if you are cross
-compiling for a mobile device. The `llvm-cpu` compiler backend is compiled in by
-default on all platforms.
-
-Ensure that the `IREE_TARGET_BACKEND_LLVM_CPU` CMake option is `ON` when
-configuring for the host.
+IREE for your host platform. The `llvm-cpu` compiler backend is compiled in by
+default on all platforms, though you should ensure that the
+`IREE_TARGET_BACKEND_LLVM_CPU` CMake option is `ON` when configuring.
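
That configure step can be sketched as follows (the build directory and generator are illustrative, not prescribed by this guide):

```shell
# Configure from an IREE source checkout with the CPU backend enabled
# (it is ON by default; the flag below just makes it explicit).
cmake -G Ninja -B ../iree-build/ -S . \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DIREE_TARGET_BACKEND_LLVM_CPU=ON

# Build the compiler.
cmake --build ../iree-build/ --target iree-compile
```
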
 
 !!! tip
     `iree-compile` will be built under the `iree-build/tools/` directory. You
@@ -71,10 +71,14 @@ You will need to get an IREE runtime that supports the local CPU HAL driver,
 along with the appropriate executable loaders for your application.
 
 You can check for CPU support by looking for the `local-sync` and `local-task`
-drivers:
+drivers and devices:
 
-```console hl_lines="5 6"
---8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md"
+```console hl_lines="10-11"
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-driver-list.md:1"
+```
+
+```console hl_lines="4-5"
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-run-module-device-list-amd.md"
 ```
 
 #### :octicons-download-16: Download the runtime from a release
@@ -88,47 +92,49 @@ package includes the local CPU HAL drivers:
 
 #### :material-hammer-wrench: Build the runtime from source
 
-Please make sure you have followed the
-[Getting started](../../building-from-source/getting-started.md) page to build
-IREE for your host platform and the
-[Android cross-compilation](../../building-from-source/android.md) page if you
-are cross compiling for Android. The local CPU HAL drivers are compiled in by
-default on all platforms.
-
-Ensure that the `IREE_HAL_DRIVER_LOCAL_TASK` and
-`IREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF` (or other executable loader) CMake
-options are `ON` when configuring for the target.
+Please make sure you have followed one of the
+[Building from source](../../building-from-source/index.md) pages to build
+IREE for your target platform. The local CPU HAL drivers and devices are
+compiled in by default on all platforms, though you should ensure that the
+`IREE_HAL_DRIVER_LOCAL_TASK` and `IREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF`
+(or other executable loader) CMake options are `ON` when configuring.
 
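A minimal sketch of that runtime configuration, assuming an IREE source checkout (paths and generator are illustrative):

```shell
cmake -G Ninja -B ../iree-build/ -S . \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DIREE_HAL_DRIVER_LOCAL_TASK=ON \
    -DIREE_HAL_EXECUTABLE_LOADER_EMBEDDED_ELF=ON
```
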
 ## Compile and run a program
 
 With the requirements out of the way, we can now compile a model and run it.
 
 ### :octicons-file-code-16: Compile a program
 
-The IREE compiler transforms a model into its final deployable format in many
-sequential steps. A model authored with Python in an ML framework should use the
-corresponding framework's import tool to convert into a format (i.e.,
-[MLIR](https://mlir.llvm.org/)) expected by the IREE compiler first.
+--8<-- "docs/website/docs/guides/deployment-configurations/snippets/_iree-import-onnx-mobilenet.md"
 
-Using MobileNet v2 as an example, you can download the SavedModel with trained
-weights from
-[TensorFlow Hub](https://tfhub.dev/google/tf2-preview/mobilenet_v2/classification)
-and convert it using IREE's
-[TensorFlow importer](../ml-frameworks/tensorflow.md). Then run the following
-command to compile with the `llvm-cpu` target:
+Then run the following command to compile with the `llvm-cpu` target:
 
-```shell hl_lines="2"
+```shell hl_lines="2-3"
 iree-compile \
     --iree-hal-target-backends=llvm-cpu \
-    mobilenet_iree_input.mlir -o mobilenet_cpu.vmfb
+    --iree-llvmcpu-target-cpu=host \
+    mobilenetv2.mlir -o mobilenet_cpu.vmfb
 ```
 
-!!! tip "Tip - CPU targets"
+???+ tip "Tip - Target CPUs and CPU features"
+
+    By default, the compiler will use a generic CPU target which will result in
+    poor performance. A target CPU or target CPU feature set should be selected
+    using one of these options:
+
+    * `--iree-llvmcpu-target-cpu=...`
+    * `--iree-llvmcpu-target-cpu-features=...`
+
+    When not cross compiling, passing `--iree-llvmcpu-target-cpu=host` is
+    usually sufficient on most devices.
+
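    For example, when cross compiling you might pin an explicit feature set instead of `host` (the feature names below are an illustrative x86 sketch; valid names come from LLVM's target definitions):

    ```shell
    iree-compile \
        --iree-hal-target-backends=llvm-cpu \
        --iree-llvmcpu-target-cpu-features=+avx2,+fma \
        mobilenetv2.mlir -o mobilenet_cpu.vmfb
    ```
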
+???+ tip "Tip - CPU targets"
 
     The `--iree-llvmcpu-target-triple` flag tells the compiler to generate code
     for a specific type of CPU. You can see the list of supported targets with
-    `iree-compile --iree-llvmcpu-list-targets`, or pass "host" to let LLVM
-    infer the triple from your host machine (e.g. `x86_64-linux-gnu`).
+    `iree-compile --iree-llvmcpu-list-targets`, or use the default value of
+    "host" to let LLVM infer the triple from your host machine
+    (e.g. `x86_64-linux-gnu`).
 
     ```console
     $ iree-compile --iree-llvmcpu-list-targets
@@ -149,28 +155,21 @@ iree-compile \
       x86-64     - 64-bit X86: EM64T and AMD64
     ```
 
-!!! tip "Tip - CPU features"
-
-    The `--iree-llvmcpu-target-cpu-features` flag tells the compiler to generate
-    code using certain CPU "features", like SIMD instruction sets. Like the
-    target triple, you can pass "host" to this flag to let LLVM infer the
-    features supported by your host machine.
-
 ### :octicons-terminal-16: Run a compiled program
 
-In the build directory, run the following command:
+To run the compiled program:
 
 ```shell hl_lines="2"
-tools/iree-run-module \
+iree-run-module \
     --device=local-task \
     --module=mobilenet_cpu.vmfb \
-    --function=predict \
-    --input="1x224x224x3xf32=0"
+    --function=torch-jit-export \
+    --input="1x3x224x224xf32=0"
 ```
 
-The above assumes the exported function in the model is named as `predict` and
-it expects one 224x224 RGB image. We are feeding in an image with all 0 values
-here for brevity, see `iree-run-module --help` for the format to specify
+The above assumes the exported function in the model is named `torch-jit-export`
+and it expects one 224x224 RGB image. We are feeding in an image with all 0
+values here for brevity; see `iree-run-module --help` for the format to specify
 concrete values.
 
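To feed a real image instead of zeros, `iree-run-module` can also read an input tensor from a NumPy file; a sketch (the `.npy` filename is a placeholder):

```shell
iree-run-module \
    --device=local-task \
    --module=mobilenet_cpu.vmfb \
    --function=torch-jit-export \
    --input=@image_tensor.npy
```
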
 <!-- TODO(??): measuring performance -->