Commit 2d3a765

Qualcomm backend documentation update (#15043)
Added detailed sections on QnnPartitioner, quantization, operator support, and custom ops support.
1 parent 39f474f commit 2d3a765

File tree

1 file changed: +84 -3 lines changed

docs/source/backends-qualcomm.md

Lines changed: 84 additions & 3 deletions
@@ -286,6 +286,13 @@ The model, inputs, and output location are passed to `qnn_executorch_runner` by

Please refer to `$EXECUTORCH_ROOT/examples/qualcomm/scripts/` and `$EXECUTORCH_ROOT/examples/qualcomm/oss_scripts/` for the list of supported models.

Each script demonstrates:
- Model export (torch.export)
- Quantization (PTQ/QAT)
- Lowering and compilation to QNN delegate
- Deployment on device or HTP emulator
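
The flow those scripts share looks roughly like the sketch below. It is illustrative only: the `torch.nn.Linear` module, the `SM8650` SoC target, and the output path are placeholders, and helper import paths (`generate_qnn_executorch_compiler_spec`, `QcomChipset`, the PT2E entry points) may shift between ExecuTorch releases.

```python
import torch
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer
from executorch.backends.qualcomm.serialization.qc_schema import QcomChipset
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)
from executorch.exir import to_edge_transform_and_lower
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Placeholder model and input; substitute your own module.
model = torch.nn.Linear(16, 16).eval()
sample_inputs = (torch.randn(1, 16),)

# 1. Model export (torch.export)
exported = torch.export.export(model, sample_inputs)

# 2. Quantization (PTQ shown; QAT uses the prepare_qat_pt2e flow instead)
quantizer = QnnQuantizer()  # defaults to the 8a8w scheme
prepared = prepare_pt2e(exported.module(), quantizer)
prepared(*sample_inputs)  # calibration pass over representative inputs
quantized = convert_pt2e(prepared)
exported = torch.export.export(quantized, sample_inputs)

# 3. Lowering and compilation to QNN delegate
backend_options = generate_htp_compiler_spec(use_fp16=False)
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,  # pick the SoC of your target device
    backend_options=backend_options,
)
edge = to_edge_transform_and_lower(
    exported, partitioner=[QnnPartitioner(compiler_specs)]
)

# 4. Serialize for deployment on device or the HTP emulator
with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```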

## How to Support a Custom Model in HTP Backend

### Step-by-Step Implementation Guide
@@ -389,12 +396,86 @@ with open(model_name, "wb") as f:

print(f"Model successfully exported to {model_name}")
```

-## What is coming?
-- Improve the performance for llama3-8B-Instruct and support batch prefill.
-- We will support pre-compiled binaries from [Qualcomm AI Hub](https://aihub.qualcomm.com/).

## Deep Dive

### Partitioner API

The **QnnPartitioner** identifies and groups supported subgraphs for execution on the QNN backend.
It uses `QnnOperatorSupport` to check node-level compatibility with the Qualcomm backend via QNN SDK APIs.

The partitioner tags supported nodes with a `delegation_tag` and handles constants, buffers, and mutable states appropriately.
Please check out [QnnPartitioner](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/partition/qnn_partitioner.py#L125) for the latest changes. Its constructor takes the following four arguments, of which only `compiler_specs` is required:
```python
class QnnPartitioner(Partitioner):
    """
    QnnPartitioner identifies subgraphs that can be lowered to the QNN backend by tagging
    nodes for delegation, and manages special cases such as mutable buffers and consumed constants.
    """

    def __init__(
        self,
        compiler_specs: List[CompileSpec],
        skip_node_id_set: set = None,
        skip_node_op_set: set = None,
        skip_mutable_buffer: bool = False,
    ):
        ...
```
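
As an illustration of these arguments, here is a hedged sketch that keeps some nodes on CPU. It assumes `compiler_specs` and an `exported` program as in the end-to-end sketch earlier; the skipped node and op names are made up.

```python
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.exir import to_edge_transform_and_lower

partitioner = QnnPartitioner(
    compiler_specs,                          # required: from generate_qnn_executorch_compiler_spec
    skip_node_id_set={"conv2d_7"},           # illustrative: keep these specific nodes on CPU
    skip_node_op_set={"aten.topk.default"},  # illustrative: keep this op type on CPU
    skip_mutable_buffer=False,               # True leaves mutable buffers undelegated
)
edge = to_edge_transform_and_lower(exported, partitioner=[partitioner])
```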

### Quantization

Quantization in the QNN backend supports multiple data bit-widths and training modes (PTQ/QAT).
The `QnnQuantizer` defines quantization configurations and annotations compatible with Qualcomm hardware.

Supported schemes include:
- 8a8w (default)
- 16a16w
- 16a8w
- 16a4w
- 16a4w_block

Highlights:
- `QuantDtype` enumerates bit-width combinations for activations and weights.
- `ModuleQConfig` manages per-layer quantization behavior and observers.
- `QnnQuantizer` integrates with the PT2E prepare/convert flow to annotate and quantize models.

It also supports:
- Per-channel and per-block quantization
- Custom quantization annotation via `custom_quant_annotations`
- Skipping specific nodes or ops
- Per-module customization via `submodule_qconfig_list`

For details, see `backends/qualcomm/quantizer/quantizer.py`. A minimal configuration sketch follows.
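
The sketch below shows how these knobs might compose. It assumes `QnnQuantizer` exposes `set_default_quant_config` and `add_discard_ops` as in the current quantizer.py, so verify the method names against that file; `exported` and `sample_inputs` are reused from the end-to-end sketch above.

```python
import torch
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer, QuantDtype
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

quantizer = QnnQuantizer()
# Choose one of the schemes listed above, e.g. 16-bit activations / 4-bit weights.
quantizer.set_default_quant_config(
    QuantDtype.use_16a4w,
    is_qat=False,              # PTQ; set True for the QAT flow
    is_conv_per_channel=True,  # per-channel weight quantization for convolutions
)
# Assumed helper: ops listed here stay un-annotated and therefore unquantized.
quantizer.add_discard_ops([torch.ops.aten.softmax.int])

prepared = prepare_pt2e(exported.module(), quantizer)
prepared(*sample_inputs)  # calibration
quantized = convert_pt2e(prepared)
```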

### Operator Support

[The full operator support matrix](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/backends/qualcomm/builders#operator-support-status) is tracked and frequently updated in the ExecuTorch repository.

It lists:
- Supported PyTorch ops (aten.*, custom ops)
- Planned ops
- Deprecated ops

This matrix directly corresponds to the implementations in [executorch/backends/qualcomm/builders/node_visitors/*.py](https://github.com/pytorch/executorch/tree/main/backends/qualcomm/builders).

### Custom Ops Support

You can extend QNN backend support to your own operators.
Follow the [tutorial](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/examples/qualcomm/custom_op#custom-operator-support), which covers:
- Writing a new `NodeVisitor` for your op
- Registering it via `@register_node_visitor`
- Creating and linking `libQnnOp*.so` for the delegate
- Testing and verifying custom kernels on HTP
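
The core pattern is a `NodeVisitor` subclass registered for your op's schema name. A skeletal sketch, assuming a hypothetical custom op `my_ops.mul3.default` (the exact import path for `register_node_visitor` and the wrapper-building details follow the tutorial and may differ by release):

```python
from executorch.backends.qualcomm.builders.node_visitor import (
    NodeVisitor,
    register_node_visitor,
)

@register_node_visitor
class Mul3Visitor(NodeVisitor):
    # Fully-qualified schema name of the (hypothetical) custom op this visitor lowers.
    target = ["my_ops.mul3.default"]

    def __init__(self, *args) -> None:
        super().__init__(*args)

    def define_node(self, node, nodes_to_wrappers):
        # Build QNN tensor wrappers for the node's inputs and outputs here, then
        # return the op wrapper that maps onto the kernel in your libQnnOp*.so.
        ...
```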
## FAQ

If you encounter any issues while reproducing the tutorial, please file a GitHub
[issue](https://github.com/pytorch/executorch/issues) on the ExecuTorch repo and tag it with the `#qcom_aisw` tag.

### Debugging tips

- Before trying any complicated models, try out [a simple model example](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/examples/qualcomm#simple-examples-to-verify-the-backend-is-working) and see if it works on device.