
Commit 082e8e7

lucylq authored and dbort committed
Fix lowering example (#958)
Summary: Pull Request resolved: #958

Reviewed By: kirklandsign

Differential Revision: D50332532

fbshipit-source-id: baee429309b7f838161feef1e43ef4de60d86f5d
1 parent 4bf5cf7 · commit 082e8e7

File tree

2 files changed: +23 −19 lines changed

docs/source/concepts.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -224,7 +224,7 @@ The default PAL implementation can be overridden if it doesn’t work for a part
 
 Kernels that support a subset of tensor dtypes and/or dim orders.
 
-## Partitioner
+## [Partitioner](./compiler-custom-compiler-passes#Partitioner)
 
 Parts of a model may be delegated to run on an optimized backend. The partitioner splits the graph into the appropriate sub-networks and tags them for delegation.
 
@@ -263,7 +263,7 @@ Models may lose accuracy after quantization. QAT enables higher accuracy compare
 
 Techniques for performing computations and memory accesses on tensors with lower precision data, usually `int8`. Quantization improves model performance by lowering the memory usage and (usually) decreasing computational latency; depending on the hardware, computation done in lower precision will typically be faster, e.g. `int8` matmul vs `fp32` matmul. Often, quantization comes at the cost of model accuracy.
 
-## Runtime
+## [Runtime](./runtime-overview.md)
 
 The ExecuTorch runtime executes models on edge devices. It is responsible for program initialization, program execution and, optionally, destruction (releasing backend owned resources).
 
````
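As a side note on the `int8` vs `fp32` trade-off described in the Quantization entry above: a minimal sketch of symmetric quantization in plain PyTorch. This is illustrative only, not ExecuTorch's quantization API, and the scale handling is deliberately simplified.

```python
import torch

# Minimal symmetric quantization sketch (illustrative, not ExecuTorch's API).
# Real flows derive scales per tensor or per channel from calibration data.
x = torch.randn(4, 4)

scale = x.abs().max() / 127.0  # map the fp32 value range onto int8
x_int8 = torch.clamp((x / scale).round(), -128, 127).to(torch.int8)

x_dequant = x_int8.to(torch.float32) * scale  # back to fp32 for comparison
print("max quantization error:", (x - x_dequant).abs().max())  # the accuracy cost
```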

docs/source/examples-end-to-end-to-lower-model-to-delegate.md

Lines changed: 21 additions & 17 deletions
````diff
@@ -1,12 +1,12 @@
 # Lowering a Model as a Delegate
 
-Audience: ML Engineers, who are interested in applying delegates to accelerate their program in runtime
+Audience: ML Engineers, who are interested in applying delegates to accelerate their program in runtime.
 
 Backend delegation is an entry point for backends to process and execute PyTorch
-programs to leverage performance and efficiency benefits of specialized
+programs to leverage the performance and efficiency benefits of specialized
 backends and hardware, while still providing PyTorch users with an experience
 close to that of the PyTorch runtime. The backend delegate is usually either provided by
-ExecuTorch or vendors. The way to leverage delegate in your program is via a standard entry point `to_backend`.
+ExecuTorch or vendors. The way to leverage delegation in your program is via a standard entry point `to_backend`.
 
 
 ## Frontend Interfaces
````
````diff
@@ -24,8 +24,7 @@ There are three flows for delegating a program to a backend:
 ### Flow 1: Lowering the whole module
 
 This flow starts from a traced graph module with Edge Dialect representation. To
-lower it, we call the following function which returns a `LoweredBackendModule`
-(more documentation on this function can be found in the Python API reference):
+lower it, we call the following function which returns a `LoweredBackendModule` (more documentation on this function can be found in the [Export API reference](export-to-executorch-api-reference.rst))
 
 ```python
 # defined in backend_api.py
````
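The hunk is cut off before the function body, so for orientation only: the call shape of this Flow 1 entry point, paraphrased from the surrounding docs rather than quoted from `backend_api.py` (the import paths for the types are assumptions):

```python
from typing import List

from executorch.exir.backend.compile_spec_schema import CompileSpec  # assumed path
from executorch.exir.lowered_backend_module import LoweredBackendModule  # assumed path
from torch.export import ExportedProgram

# Paraphrased shape of the Flow 1 entry point, not quoted from the file.
def to_backend(
    backend_id: str,                   # e.g. 'BackendWithCompilerDemo'
    edge_program: ExportedProgram,     # the Edge dialect program to lower
    compile_specs: List[CompileSpec],  # backend-specific compile options
) -> LoweredBackendModule:
    ...
```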
````diff
@@ -45,7 +44,7 @@ that can be loaded by the runtime.
 The following is an example of this flow:
 
 ```python
-from executorch.exir.backend.backend_api import to_backend, MethodCompileSpec
+from executorch.exir.backend.backend_api import to_backend
 import executorch.exir as exir
 import torch
 
@@ -66,7 +65,7 @@ to_be_lowered_exir_submodule = exir.capture(to_be_lowered, example_input).to_edg
 from executorch.exir.backend.test.backend_with_compiler_demo import (
     BackendWithCompilerDemo,
 )
-lowered_module = to_backend('BackendWithCompilerDemo', to_be_lowered_exir_submodule, [])
+lowered_module = to_backend('BackendWithCompilerDemo', to_be_lowered_exir_submodule.exported_program, [])
 ```
 
 We can serialize the program to a flatbuffer format by directly running:
````
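The serialization snippet itself lies outside this hunk. Continuing the example above, and assuming the `buffer()` accessor on `LoweredBackendModule` that the surrounding docs use, it would look roughly like:

```python
# Sketch of the elided serialization step; buffer() is assumed to return the
# flatbuffer bytes of the lowered module. The file name is hypothetical.
save_path = "lowered_module.pte"
with open(save_path, "wb") as f:
    f.write(lowered_module.buffer())
```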
````diff
@@ -132,7 +131,7 @@ def to_backend(
 ```
 
 This function takes in a `Partitioner` which adds a tag to all the nodes that
-are meant to be lowered. It will return a `partition_tags` mapping tags to
+are meant to be lowered. It will return a `partition_tags` dictionary mapping tags to
 backend names and module compile specs. The tagged nodes will then be
 partitioned and lowered to their mapped backends using Flow 1's process.
 Available helper partitioners are documented
````
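To make the tagging contract concrete, here is a schematic custom partitioner; the class name and the string-based op matching are hypothetical, and the `Partitioner`/`PartitionResult` import path and fields are assumptions modeled on the demo partitioners:

```python
from executorch.exir.backend.partitioner import (  # assumed import path
    DelegationSpec,
    Partitioner,
    PartitionResult,
)

# Schematic sketch: tag every add node for the demo backend. Real
# partitioners match operator targets precisely and group nodes into
# connected partitions instead of tagging them one by one.
class AddOnlyPartitionerDemo(Partitioner):  # hypothetical name
    def __init__(self):
        self.delegation_spec = DelegationSpec("BackendWithCompilerDemo", [])

    def partition(self, exported_program) -> PartitionResult:
        partition_tags = {}
        for i, node in enumerate(exported_program.graph.nodes):
            if node.op == "call_function" and "add" in str(node.target):
                tag = f"tag{i}"
                node.meta["delegation_tag"] = tag  # mark node for lowering
                partition_tags[tag] = self.delegation_spec
        return PartitionResult(
            tagged_exported_program=exported_program,
            partition_tags=partition_tags,  # tag -> backend id + compile specs
        )
```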
````diff
@@ -141,8 +140,14 @@ will be inserted into the top-level module and serialized.
 
 The following is an example of the flow:
 ```python
-from executorch.exir.backend.backend_api import to_backend
 import executorch.exir as exir
+from executorch.exir.backend.backend_api import to_backend
+from executorch.exir.backend.test.op_partitioner_demo import AddMulPartitionerDemo
+from executorch.exir.program import (
+    EdgeProgramManager,
+    to_edge,
+)
+from torch.export import export
 import torch
 
 class Model(torch.nn.Module):
@@ -160,12 +165,11 @@ class Model(torch.nn.Module):
 
 model = Model()
 model_inputs = (torch.randn(1, 3), torch.randn(1, 3))
-gm = exir.capture(model, model_inputs).to_edge()
 
-from executorch.exir.backend.test.op_partitioner_demo import AddMulPartitionerDemo
-exec_prog = to_backend(gm, AddMulPartitionerDemo).to_executorch(
-    exir.ExecutorchBackendConfig(passes=SpecPropPass())
-)
+core_aten_ep = export(model, model_inputs)
+edge: EdgeProgramManager = to_edge(core_aten_ep)
+edge = edge.to_backend(AddMulPartitionerDemo)
+exec_prog = edge.to_executorch()
 
 # Save the flatbuffer to a local file
 save_path = "delegate.pte"
````
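Stitched together from the hunks above, the updated example reads roughly as follows. The `Model` body is hypothetical (the diff elides it) and is chosen so `AddMulPartitionerDemo` has add/mul nodes to tag; the final `exec_prog.buffer` write follows the save snippet in the surrounding context:

```python
import torch
from executorch.exir.backend.test.op_partitioner_demo import AddMulPartitionerDemo
from executorch.exir.program import EdgeProgramManager, to_edge
from torch.export import export

class Model(torch.nn.Module):
    def forward(self, x, y):
        # Hypothetical body (elided by the diff): add and mul nodes give
        # the demo partitioner subgraphs to tag and lower.
        return x + y * y

model = Model()
model_inputs = (torch.randn(1, 3), torch.randn(1, 3))

core_aten_ep = export(model, model_inputs)        # torch.export -> ExportedProgram
edge: EdgeProgramManager = to_edge(core_aten_ep)  # ATen -> Edge dialect
edge = edge.to_backend(AddMulPartitionerDemo)     # partition and lower tagged nodes
exec_prog = edge.to_executorch()

# Save the flatbuffer to a local file
save_path = "delegate.pte"
with open(save_path, "wb") as f:
    f.write(exec_prog.buffer)
```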
````diff
@@ -177,8 +181,8 @@ with open(save_path, "wb") as f:
 
 After having the program with delegates, to run the model with the backend, we'd need to register the backend.
 Depending on the delegate implementation, the backend can be registered either as part of global variables or
-explicitly registered inside main function.
+explicitly registered inside the main function.
 
-- If it's registered during global variables initialization, the backend will be registered as long as it's static linked. Users only need to include the library as part of the dependency.
+- If it's registered during global variables initialization, the backend will be registered as long as it's statically linked. Users only need to include the library as part of the dependency.
 
-- If the vendor provides an API to register the backend, users need to include the library as part of the dependency, and call the API provided by vendors to explicitly register the backend as part of the main function
+- If the vendor provides an API to register the backend, users need to include the library as part of the dependency, and call the API provided by vendors to explicitly register the backend as part of the main function.
````
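For a quick sanity check from Python (on device, the C++ registration described above still applies), the saved `.pte` can be loaded through ExecuTorch's pybindings, which register statically linked backends at import time. The module path and method here are assumptions, and the target backend must be linked into the extension:

```python
import torch
# Assumed pybinding entry point; importing the extension runs the static
# initializers that register linked backends (first bullet above).
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("delegate.pte")
outputs = module.forward([torch.randn(1, 3), torch.randn(1, 3)])
print(outputs)
```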
