Description
🚀 The feature, motivation and pitch
Background:
FP32 arithmetic is typically avoided in the embedded (microcontroller) domain because of tight cycle and memory constraints, so sensors usually produce integer data. The input and output of an int8-quantized NN should therefore ideally be of an integer dtype (int8) to save cycles and memory.
Current behavior:
Input/output is always fp32. Example:
input --(fp32)-- q --(int8)-- accelerated subgraph --(int8)-- dq --(fp32)-- output
Notes:
• In this example, “accelerated subgraph” is a node (subgraph) delegated to e.g. an NPU such as Ethos-U.
• For the Arm TOSA delegate, we have implemented a workaround (#3056) that tags the q/dq nodes directly connected to the input/output so that the delegate does not consume them.
• As a result, the q and dq nodes above are executed on the CPU, which costs memory and cycles.
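To make that cost concrete, here is a minimal sketch of the per-element work a q and a dq node perform, assuming standard affine int8 quantization. The `SCALE` and `ZERO_POINT` values and the function names are hypothetical and chosen only for illustration; they are not taken from the Arm TOSA delegate or from #3056.

```python
# Per-element work done by the q and dq nodes, assuming affine int8 quantization.
# SCALE and ZERO_POINT are hypothetical values chosen for illustration.
SCALE = 0.5
ZERO_POINT = -3


def quantize(x: float, scale: float = SCALE, zero_point: int = ZERO_POINT) -> int:
    """fp32 -> int8: divide by scale, round, add zero-point, clamp to [-128, 127]."""
    q = round(x / scale) + zero_point  # Python's round() rounds halves to even
    return max(-128, min(127, q))


def dequantize(q: int, scale: float = SCALE, zero_point: int = ZERO_POINT) -> float:
    """int8 -> fp32: subtract zero-point, multiply by scale."""
    return (q - zero_point) * scale


samples = [0.0, 1.0, -2.5, 100.0]
quantized = [quantize(x) for x in samples]      # work of the q node on the CPU
restored = [dequantize(q) for q in quantized]   # work of the dq node on the CPU

print(quantized)  # [-3, -1, -8, 127]  (100.0 saturates at 127)
print(restored)   # [0.0, 1.0, -2.5, 65.0]
```

Each element costs a floating-point divide or multiply plus rounding and clamping, and the fp32 buffers on either side take four bytes per element instead of one.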
Desired behavior:
Ideally, we'd like a mechanism to change the graph signature so that the int8-quantized NN takes int8 input and produces int8 output:
input --(int8)-- accelerated subgraph --(int8)-- output
How, where and when to do that in a way that works well with the rest of the framework is unclear.
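As a purely conceptual sketch of the requested transformation, the toy pass below drops a leading q node and a trailing dq node and updates the signature dtypes to match. `ToyGraph` and `strip_io_quantization` are hypothetical names invented for this illustration; they are not framework types or APIs, and a real pass would operate on the exported graph and its signature.

```python
from dataclasses import dataclass


@dataclass
class ToyGraph:
    """Hypothetical stand-in for an exported program; not a framework type."""
    input_dtype: str
    nodes: list  # op names in execution order
    output_dtype: str


def strip_io_quantization(graph: ToyGraph) -> ToyGraph:
    """Remove the q node at the input and the dq node at the output, if present,
    and change the corresponding signature dtype to int8."""
    nodes = list(graph.nodes)
    input_dtype, output_dtype = graph.input_dtype, graph.output_dtype
    if nodes and nodes[0] == "q":
        nodes = nodes[1:]
        input_dtype = "int8"
    if nodes and nodes[-1] == "dq":
        nodes = nodes[:-1]
        output_dtype = "int8"
    return ToyGraph(input_dtype, nodes, output_dtype)


current = ToyGraph("fp32", ["q", "accelerated_subgraph", "dq"], "fp32")
desired = strip_io_quantization(current)
print(desired)
```

With the q/dq nodes gone, the caller would need the scale and zero-point of the input and output tensors to interpret the int8 values, so a real mechanism would presumably have to expose those parameters as well.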
Alternatives
No response
Additional context
No response
RFC (Optional)
No response