The model, inputs, and output location are passed to `qnn_executorch_runner` by

Please refer to `$EXECUTORCH_ROOT/examples/qualcomm/scripts/` and `$EXECUTORCH_ROOT/examples/qualcomm/oss_scripts/` for the list of supported models.

Each script demonstrates:
- Model export (`torch.export`)
- Quantization (PTQ/QAT)
- Lowering and compilation to the QNN delegate
- Deployment on device or the HTP emulator

A minimal sketch of this end-to-end flow is shown below.
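The sketch is illustrative rather than taken from any single script: the model, input shape, and chipset are placeholders, and exact import paths can shift between ExecuTorch versions.

```python
import torch
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.serialization.qc_schema import QcomChipset
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)
from executorch.exir import to_edge_transform_and_lower

model = MyModel().eval()  # hypothetical nn.Module
sample_inputs = (torch.randn(1, 3, 224, 224),)

# 1. Model export
exported = torch.export.export(model, sample_inputs)

# 2. Quantization (PTQ/QAT) would slot in here; see the Quantization section below.

# 3. Lowering and compilation to the QNN delegate
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,  # pick the target SoC
    backend_options=generate_htp_compiler_spec(use_fp16=True),
)
edge_prog = to_edge_transform_and_lower(
    exported, partitioner=[QnnPartitioner(compiler_specs)]
)

# 4. Serialize for deployment with qnn_executorch_runner
with open("model.pte", "wb") as f:
    f.write(edge_prog.to_executorch().buffer)
```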

## How to Support a Custom Model in HTP Backend

### Step-by-Step Implementation Guide
```python
with open(model_name, "wb") as f:
    ...  # write the serialized ExecuTorch program buffer to the file
print(f"Model successfully exported to {model_name}")
```

## Deep Dive

### Partitioner API

The **QnnPartitioner** identifies and groups supported subgraphs for execution on the QNN backend.
It uses `QnnOperatorSupport` to check node-level compatibility with the Qualcomm backend via QNN SDK APIs.

The partitioner tags supported nodes with a `delegation_tag` and handles constants, buffers, and mutable states appropriately.
Please check out [QnnPartitioner](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/partition/qnn_partitioner.py#L125) for the latest changes. Its constructor takes the following four arguments, of which only the compiler specs are required:
```python
class QnnPartitioner(Partitioner):
    """
    QnnPartitioner identifies subgraphs that can be lowered to QNN backend, by tagging nodes for delegation,
    and manages special cases such as mutable buffers and consumed constants.
    """

    def __init__(
        self,
        compiler_specs: List[CompileSpec],
        skip_node_id_set: set = None,
        skip_node_op_set: set = None,
        skip_mutable_buffer: bool = False,
    ):
        ...
```
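For example, assuming `compiler_specs` was built with the helpers shown in the flow sketch earlier, the optional arguments let you keep selected nodes on CPU (the node and op names below are hypothetical):

```python
partitioner = QnnPartitioner(
    compiler_specs,                          # required: list of CompileSpec
    skip_node_id_set={"conv2d_3"},           # hypothetical node names to keep on CPU
    skip_node_op_set={"aten.topk.default"},  # hypothetical op targets to keep on CPU
    skip_mutable_buffer=False,               # whether to keep mutable buffers on CPU
)
```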

### Quantization

Quantization in the QNN backend supports multiple bit-widths through both post-training quantization (PTQ) and quantization-aware training (QAT).
The `QnnQuantizer` defines quantization configurations and annotations compatible with Qualcomm hardware.

Supported schemes include:
- 8a8w (8-bit activations, 8-bit weights; the default)
- 16a16w
- 16a8w
- 16a4w
- 16a4w_block


Highlights:
- `QuantDtype` enumerates bit-width combinations for activations and weights.
- `ModuleQConfig` manages per-layer quantization behavior and observers.
- `QnnQuantizer` integrates with the PT2E prepare/convert flow to annotate and quantize models.

The quantizer also supports:

- Per-channel and per-block quantization
- Custom quant annotation via `custom_quant_annotations`
- Skipping specific nodes or ops
- Per-module customization via `submodule_qconfig_list`

For details, see [backends/qualcomm/quantizer/quantizer.py](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/quantizer/quantizer.py).
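A minimal PTQ sketch with the PT2E flow might look like the following. Method names such as `set_default_quant_config` follow the current quantizer.py and may differ across versions; `model` and `sample_inputs` are assumed from the export step:

```python
import torch
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer, QuantDtype
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

quantizer = QnnQuantizer()
quantizer.set_default_quant_config(quant_dtype=QuantDtype.use_16a8w)  # one of the schemes above

# Annotate, calibrate, and convert with the standard PT2E flow.
graph_module = torch.export.export(model, sample_inputs).module()
prepared = prepare_pt2e(graph_module, quantizer)
prepared(*sample_inputs)  # calibration pass for PTQ
quantized = convert_pt2e(prepared)
```

The quantized module is then lowered through `QnnPartitioner` exactly as in the floating-point flow.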

### Operator Support
[The full operator support matrix](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/backends/qualcomm/builders#operator-support-status) is tracked and frequently updated in the ExecuTorch repository.

It lists:
- Supported PyTorch ops (`aten.*`, custom ops)
- Planned ops
- Deprecated ops

This matrix directly corresponds to the implementations in: [executorch/backends/qualcomm/builders/node_visitors/*.py](https://github.com/pytorch/executorch/tree/main/backends/qualcomm/builders)

### Custom Ops Support

You can extend the QNN backend to support your own operators by following the [tutorial](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/examples/qualcomm/custom_op#custom-operator-support); a skeleton of the visitor pattern is sketched after the list below.

It covers:
- Writing a new `NodeVisitor` for your op
- Registering it via `@register_node_visitor`
- Creating and linking `libQnnOp*.so` for the delegate
- Testing and verifying custom kernels on HTP
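As a rough sketch (not the authoritative version from the tutorial), a visitor typically subclasses `NodeVisitor`, declares the op target it handles, and emits a QNN op wrapper in `define_node`. The import path and the `my_ops.my_op.default` target below are assumptions for illustration:

```python
import torch
from executorch.backends.qualcomm.builders.node_visitor import (  # path may vary by version
    NodeVisitor,
    register_node_visitor,
)


@register_node_visitor
class MyCustomOpVisitor(NodeVisitor):
    # Edge op target(s) this visitor lowers; a hypothetical custom op.
    target = ["my_ops.my_op.default"]

    def define_node(self, node: torch.fx.Node, nodes_to_wrappers):
        # Build QNN tensor wrappers for the node's inputs/outputs, then return
        # an op wrapper bound to the kernel shipped in your libQnnOp*.so.
        ...
```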

## FAQ

If you encounter any issues while reproducing the tutorial, please file a GitHub
[issue](https://github.com/pytorch/executorch/issues) on the ExecuTorch repo and use the `#qcom_aisw` tag.

### Debugging tips
- Before trying any complicated models, try out [a simple model example](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/examples/qualcomm#simple-examples-to-verify-the-backend-is-working) and see if it works on device.