
Commit 47db14f

Merge pull request #448 from NVIDIA/bowa_fallback
doc: documentation for partitioning phase as well as automatic fallback APIs
2 parents e63d8b2 + 6e0da4d

File tree: 1 file changed (+67, -0)

core/partitioning/README.md

# TRTorch Partitioning

The TRTorch partitioning phase was developed to support the `automatic fallback` feature in TRTorch. This phase does
not run by default; it runs only when the automatic fallback feature is enabled.

At a high level, the TRTorch partitioning phase does the following:
- `Segmentation`. Go through the set of operators in order and check whether there is a converter for each one. Then,
roughly separate the graph into parts that TRTorch can support and parts it cannot (a sketch of this step follows the
list).
- `Dependency Analysis`. Every operator to be compiled must have a "complete dependency graph", meaning that every
input can be traced back to a Tensor or TensorList input. Go through all segments after segmentation and run
dependency analysis to ensure that every TensorRT segment has only Tensor/TensorList inputs and outputs.
- `Shape Analysis`. For each segment, determine the input and output shapes starting from the input shapes provided
by the user. Shapes can be calculated by running the graphs with JIT.
- `Conversion`. Every TensorRT segment is converted into a TensorRT engine. This part is done in compiler.cpp, but it
is still a phase of the partitioning process.
- `Stitching`. Stitch all TensorRT engines and PyTorch nodes back together into the final graph.
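
The segmentation step can be pictured as a single greedy pass over the graph's nodes. Below is a minimal Python
sketch of that idea, not the actual C++ implementation; `is_node_supported` is a hypothetical stand-in for TRTorch's
converter-registry lookup, and nodes are represented as plain operator-name strings.

```python
# Minimal sketch of greedy segmentation; NOT the actual TRTorch C++ code.
# `is_node_supported` is a hypothetical stand-in for the converter-registry
# lookup TRTorch performs for each operator.
def segment(nodes, is_node_supported, min_block_size=1, forced_fallback_ops=()):
    segments, current, target = [], [], None
    for node in nodes:
        to_trt = is_node_supported(node) and node not in forced_fallback_ops
        node_target = "tensorrt" if to_trt else "pytorch"
        if node_target != target and current:
            segments.append((target, current))  # flush when support status flips
            current = []
        target = node_target
        current.append(node)
    if current:
        segments.append((target, current))
    # TensorRT runs shorter than min_block_size fall back to PyTorch.
    return [("pytorch", seg) if t == "tensorrt" and len(seg) < min_block_size
            else (t, seg)
            for t, seg in segments]
```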

Test cases for each of these components can be found [here](https://github.com/NVIDIA/TRTorch/tree/master/tests/core/partitioning).

Here is a brief description of each file:
- `PartitionInfo.h/cpp`: The automatic fallback APIs used for partitioning.
- `SegmentedBlock.h/cpp`: The main data structures used to maintain information for each segment after segmentation.
- `shape_analysis.h/cpp`: Implementation that obtains the shapes for each segment by running it with JIT (a sketch of
this step follows the list).
- `partitioning.h/cpp`: APIs and the main implementation of the partitioning phase.
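
As a rough picture of the shape analysis step, the snippet below is an illustrative Python analogue, not the C++ code
in `shape_analysis.h/cpp`: a scripted segment is run with a dummy input of the user-provided shape, and the observed
output shape becomes the input shape of the next segment.

```python
import torch

# Stand-in for one segment of the partitioned graph (hypothetical example).
@torch.jit.script
def segment_fn(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x).sum(dim=1)

in_shape = [1, 3, 224, 224]              # input shape provided by the user
out = segment_fn(torch.randn(in_shape))  # run the segment with JIT
print(out.shape)                         # torch.Size([1, 224, 224]); feeds the next segment
```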

### Automatic Fallback

To enable the automatic fallback feature, you can set the following attributes in Python:
```python
import torch
import trtorch

...
model = MyModel()
ts_model = torch.jit.script(model)
trt_model = trtorch.compile(ts_model, {
    ...
    "torch_fallback" : {
        "enabled" : True,
        "min_block_size" : 3,
        "forced_fallback_ops": ["aten::add"],
    }
})
```
- `enabled`: Automatic fallback is off by default; setting this to True enables it.
- `min_block_size`: The minimum number of consecutive supported operations required for a segment to be converted to
TensorRT. For example, if this is set to 3, there must be at least 3 consecutive supported operators for the segment
to be converted (see the sketch after this list).
- `forced_fallback_ops`: A list of operation names that the user explicitly wants to run as PyTorch nodes.
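
To make `min_block_size` concrete, here is the hypothetical `segment()` sketch from earlier in this README applied to
a short operator sequence with `min_block_size=3`; the two-operator supported run is too short to convert and stays
in PyTorch.

```python
ops = ["aten::conv2d", "aten::relu",               # only 2 supported ops in a row
       "aten::my_custom_op",                       # unsupported -> PyTorch
       "aten::conv2d", "aten::relu", "aten::add"]  # 3 supported ops in a row
supported = lambda op: op != "aten::my_custom_op"  # hypothetical support check
print(segment(ops, supported, min_block_size=3))
# [('pytorch', ['aten::conv2d', 'aten::relu']),
#  ('pytorch', ['aten::my_custom_op']),
#  ('tensorrt', ['aten::conv2d', 'aten::relu', 'aten::add'])]
```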

To enable the automatic fallback feature in C++, the following APIs can be used:

```c++
#include "torch/script.h"
#include "trtorch/trtorch.h"

...
auto in = torch::randn({1, 3, 224, 224}, {torch::kCUDA});

auto mod = torch::jit::load("trt_ts_module.ts");
auto input_sizes = std::vector<trtorch::CompileSpec::InputRange>{{in.sizes()}};
trtorch::CompileSpec cfg(input_sizes);
cfg.torch_fallback = trtorch::CompileSpec::TorchFallback(true);
cfg.torch_fallback.min_block_size = 2;
cfg.torch_fallback.forced_fallback_ops.push_back("aten::relu");
auto trt_mod = trtorch::CompileGraph(mod, cfg);
auto out = trt_mod.forward({in});
```
