Commit 2d3a765

Qualcomm backend documentation update (#15043)
Added detailed sections on QnnPartitioner, quantization, operator support, and custom ops support.
1 parent 39f474f commit 2d3a765

File tree

1 file changed: +84 -3 lines changed

docs/source/backends-qualcomm.md

Lines changed: 84 additions & 3 deletions
@@ -286,6 +286,13 @@ The model, inputs, and output location are passed to `qnn_executorch_runner` by

Please refer to `$EXECUTORCH_ROOT/examples/qualcomm/scripts/` and `$EXECUTORCH_ROOT/examples/qualcomm/oss_scripts/` for the list of supported models.

Each script demonstrates:
- Model export (torch.export)
- Quantization (PTQ/QAT)
- Lowering and compilation to QNN delegate
- Deployment on device or HTP emulator
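
The flow those scripts share looks roughly like the sketch below. It is illustrative only: the `torch.nn.Linear` module, the `SM8650` SoC target, and the output path are placeholders, and helper import paths (`generate_qnn_executorch_compiler_spec`, `QcomChipset`, the PT2E entry points) may shift between ExecuTorch releases.

```python
import torch
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer
from executorch.backends.qualcomm.serialization.qc_schema import QcomChipset
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)
from executorch.exir import to_edge_transform_and_lower
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Placeholder model and input; substitute your own module.
model = torch.nn.Linear(16, 16).eval()
sample_inputs = (torch.randn(1, 16),)

# 1. Model export (torch.export)
exported = torch.export.export(model, sample_inputs)

# 2. Quantization (PTQ shown; QAT uses the prepare_qat_pt2e flow instead)
quantizer = QnnQuantizer()  # defaults to the 8a8w scheme
prepared = prepare_pt2e(exported.module(), quantizer)
prepared(*sample_inputs)  # calibration pass over representative inputs
quantized = convert_pt2e(prepared)
exported = torch.export.export(quantized, sample_inputs)

# 3. Lowering and compilation to QNN delegate
backend_options = generate_htp_compiler_spec(use_fp16=False)
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,  # pick the SoC of your target device
    backend_options=backend_options,
)
edge = to_edge_transform_and_lower(
    exported, partitioner=[QnnPartitioner(compiler_specs)]
)

# 4. Serialize for deployment on device or the HTP emulator
with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```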

## How to Support a Custom Model in HTP Backend

### Step-by-Step Implementation Guide
@@ -389,12 +396,86 @@ with open(model_name, "wb") as f:

print(f"Model successfully exported to {model_name}")
```

-## What is coming?
-- Improve the performance for llama3-8B-Instruct and support batch prefill.
-- We will support pre-compiled binaries from [Qualcomm AI Hub](https://aihub.qualcomm.com/).

## Deep Dive

### Partitioner API

The **QnnPartitioner** identifies and groups supported subgraphs for execution on the QNN backend.
It uses `QnnOperatorSupport` to check node-level compatibility with the Qualcomm backend via QNN SDK APIs.

The partitioner tags supported nodes with a `delegation_tag` and handles constants, buffers, and mutable states appropriately.
Please check out [QnnPartitioner](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/partition/qnn_partitioner.py#L125) for the latest changes. Its constructor takes the following four arguments, of which only `compiler_specs` is required:
```python
class QnnPartitioner(Partitioner):
    """
    QnnPartitioner identifies subgraphs that can be lowered to the QNN backend by tagging
    nodes for delegation, and manages special cases such as mutable buffers and consumed constants.
    """

    def __init__(
        self,
        compiler_specs: List[CompileSpec],
        skip_node_id_set: set = None,
        skip_node_op_set: set = None,
        skip_mutable_buffer: bool = False,
    ):
        ...
```
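
As an illustration of these arguments, here is a hedged sketch that keeps some nodes on CPU. It assumes `compiler_specs` and an `exported` program as in the end-to-end sketch earlier; the skipped node and op names are made up.

```python
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.exir import to_edge_transform_and_lower

partitioner = QnnPartitioner(
    compiler_specs,                          # required: from generate_qnn_executorch_compiler_spec
    skip_node_id_set={"conv2d_7"},           # illustrative: keep these specific nodes on CPU
    skip_node_op_set={"aten.topk.default"},  # illustrative: keep this op type on CPU
    skip_mutable_buffer=False,               # True leaves mutable buffers undelegated
)
edge = to_edge_transform_and_lower(exported, partitioner=[partitioner])
```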

### Quantization

Quantization in the QNN backend supports multiple data bit-widths and training modes (PTQ/QAT).
The `QnnQuantizer` defines quantization configurations and annotations compatible with Qualcomm hardware.

Supported schemes include:
- 8a8w (default)
- 16a16w
- 16a8w
- 16a4w
- 16a4w_block

Highlights:
- `QuantDtype` enumerates bit-width combinations for activations and weights.
- `ModuleQConfig` manages per-layer quantization behavior and observers.
- `QnnQuantizer` integrates with the PT2E prepare/convert flow to annotate and quantize models.

It also supports:
- Per-channel and per-block quantization
- Custom quantization annotation via `custom_quant_annotations`
- Skipping specific nodes or ops
- Per-module customization via `submodule_qconfig_list`

For details, see `backends/qualcomm/quantizer/quantizer.py`. A minimal configuration sketch follows.
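
The sketch below shows how these knobs might compose. It assumes `QnnQuantizer` exposes `set_default_quant_config` and `add_discard_ops` as in the current quantizer.py, so verify the method names against that file; `exported` and `sample_inputs` are reused from the end-to-end sketch above.

```python
import torch
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer, QuantDtype
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

quantizer = QnnQuantizer()
# Choose one of the schemes listed above, e.g. 16-bit activations / 4-bit weights.
quantizer.set_default_quant_config(
    QuantDtype.use_16a4w,
    is_qat=False,              # PTQ; set True for the QAT flow
    is_conv_per_channel=True,  # per-channel weight quantization for convolutions
)
# Assumed helper: ops listed here stay un-annotated and therefore unquantized.
quantizer.add_discard_ops([torch.ops.aten.softmax.int])

prepared = prepare_pt2e(exported.module(), quantizer)
prepared(*sample_inputs)  # calibration
quantized = convert_pt2e(prepared)
```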

### Operator Support

[The full operator support matrix](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/backends/qualcomm/builders#operator-support-status) is tracked and frequently updated in the ExecuTorch repository.

It lists:
- Supported PyTorch ops (aten.*, custom ops)
- Planned ops
- Deprecated ops

This matrix directly corresponds to the implementations in [executorch/backends/qualcomm/builders/node_visitors/*.py](https://github.com/pytorch/executorch/tree/main/backends/qualcomm/builders).

### Custom Ops Support

You can extend QNN backend support to your own operators.
Follow the [tutorial](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/examples/qualcomm/custom_op#custom-operator-support), which covers:
- Writing a new `NodeVisitor` for your op
- Registering it via `@register_node_visitor`
- Creating and linking `libQnnOp*.so` for the delegate
- Testing and verifying custom kernels on HTP
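
The core pattern is a `NodeVisitor` subclass registered for your op's schema name. A skeletal sketch, assuming a hypothetical custom op `my_ops.mul3.default` (the exact import path for `register_node_visitor` and the wrapper-building details follow the tutorial and may differ by release):

```python
from executorch.backends.qualcomm.builders.node_visitor import (
    NodeVisitor,
    register_node_visitor,
)

@register_node_visitor
class Mul3Visitor(NodeVisitor):
    # Fully-qualified schema name of the (hypothetical) custom op this visitor lowers.
    target = ["my_ops.mul3.default"]

    def __init__(self, *args) -> None:
        super().__init__(*args)

    def define_node(self, node, nodes_to_wrappers):
        # Build QNN tensor wrappers for the node's inputs and outputs here, then
        # return the op wrapper that maps onto the kernel in your libQnnOp*.so.
        ...
```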
## FAQ

If you encounter any issues while reproducing the tutorial, please file a GitHub
[issue](https://github.com/pytorch/executorch/issues) on the ExecuTorch repo and tag it with the `#qcom_aisw` tag.

### Debugging tips

- Before trying any complicated models, try out [a simple model example](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/examples/qualcomm#simple-examples-to-verify-the-backend-is-working) and see if it works on device.