[5545101]: AutoCast: Add options to force include node/op in low precision
Add options nodes_to_include, op_types_to_include that force-include nodes in the conversion, overriding NodeClassifier exclusion logic
Signed-off-by: Gal Hubara Agam <[email protected]>
docs/source/guides/8_autocast.rst: 15 additions, 0 deletions
@@ -31,6 +31,8 @@ AutoCast can also be used programmatically through its Python API:
 low_precision_type="fp16", # or "bf16"
 nodes_to_exclude=None, # optional list of node name patterns to keep in FP32
 op_types_to_exclude=None, # optional list of op types to keep in FP32
+nodes_to_include=None, # optional list of node name patterns to force-include in low precision
+op_types_to_include=None, # optional list of op types to force-include in low precision
 data_max=512, # threshold for node outputs
 init_max=65504, # threshold for initializers
 keep_io_types=False, # whether to preserve input/output types
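The precedence implied by the two new options can be illustrated with a minimal sketch. This is not ModelOpt's NodeClassifier: the helper `stays_in_high_precision` is hypothetical, it assumes name patterns are applied with `re.match`, and it models only the `data_max` rule among the default checks.

```python
import re


def stays_in_high_precision(
    node_name,
    op_type,
    max_abs_io,
    nodes_to_exclude=None,
    op_types_to_exclude=None,
    nodes_to_include=None,
    op_types_to_include=None,
    data_max=512,
):
    """Hypothetical helper: True if the node would be kept in high precision."""

    def matches(patterns):
        # Assumption: patterns are regexes anchored at the start of the node name.
        return any(re.match(p, node_name) for p in patterns or [])

    # Force-include overrides every exclusion rule below.
    if matches(nodes_to_include) or op_type in (op_types_to_include or []):
        return False
    # Explicit user exclusions.
    if matches(nodes_to_exclude) or op_type in (op_types_to_exclude or []):
        return True
    # Default magnitude rule: I/O values higher than data_max stay in high precision.
    return max_abs_io > data_max
```

With these assumptions, a MatMul whose outputs reach 1000 would normally stay in high precision, but passing `op_types_to_include=["MatMul"]` moves it to low precision regardless of the magnitude check.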
@@ -60,6 +62,19 @@ AutoCast follows these steps to convert a model:
 - Analyzes each node in the graph
 - Determines which nodes should remain in FP32 based on input and output tensor magnitudes, operation types, and node name patterns
 - If a calibration dataset is provided, it is used to generate intermediate tensor magnitudes for more accurate node classification; otherwise, random data is used.
+- Use ``nodes_to_include`` and ``op_types_to_include`` to force-include nodes in low precision, even if they would otherwise be excluded.
+
+- Default classification rules. Nodes that meet any of these rules will be kept in high precision:
+- Node I/O magnitudes are higher than ``data_max`` (default: 512). Due to precision limitations, computing on high-magnitude tensors in low precision might not be accurate. In FP16, the unit in the last place (ULP) at 512 is 0.5, at 1024 it is 1.0, and so on.
+- Initializer magnitudes are higher than ``init_max`` (default: 65504). Initializers are often used for non-compute-intensive operations and are more likely to be controlled by the user. However, values above ``init_max`` will cause overflow, so they are kept in high precision.
+
+Additional classification rules (disabled by default):
+- ``max_depth_of_reduction``: Require nodes with a high depth of reduction (e.g., large matrix multiplications, convolutions with large kernels) to be kept in high precision.
+- ``nodes_to_exclude``: List of regex patterns for node names to keep in high precision.
+- ``op_types_to_exclude``: List of operation types to keep in high precision.
+- ``nodes_to_include``: List of regex patterns for node names to force-include in low precision.
+- ``op_types_to_include``: List of operation types to force-include in low precision.
+- ``custom_rule``: Optional custom rule for node classification (inherits from NodeRuleBase).
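The numbers behind the ``data_max`` and ``init_max`` defaults can be checked for FP16 with the standard library alone. The sketch below is illustrative and not part of the commit; `fp16_ulp` and `fits_in_fp16` are hypothetical helper names, and the results apply to IEEE half precision only (BF16 has a different spacing and range).

```python
import struct


def fp16_ulp(x):
    """Gap between x (rounded to FP16) and the next larger FP16 value.

    Assumes x is positive and below the FP16 maximum.
    """
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    here = struct.unpack("<e", struct.pack("<H", bits))[0]
    above = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return above - here


def fits_in_fp16(x):
    """True if x can be packed as a finite FP16 value."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


print(fp16_ulp(512.0))        # spacing at the default data_max
print(fp16_ulp(1024.0))       # spacing doubles at the next power of two
print(fits_in_fp16(65504.0))  # the default init_max is the largest finite FP16 value
print(fits_in_fp16(70000.0))  # larger magnitudes cannot be represented
```

At 512 the representable values are 0.5 apart, so results in that range are rounded to the nearest half, and the gap doubles at each further power of two.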