Please refer to `$EXECUTORCH_ROOT/examples/qualcomm/scripts/` and `$EXECUTORCH_ROOT/examples/qualcomm/oss_scripts/` for the list of supported models.

Each script demonstrates:
- Model export (torch.export)
- Quantization (PTQ/QAT)
- Lowering and compilation to QNN delegate
- Deployment on device or HTP emulator
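For example, a typical invocation looks roughly like the sketch below; the script name, SoC model, and flags are illustrative assumptions, so consult each script's `--help` for the exact arguments.

```bash
# Illustrative invocation (script name, SoC, and flags are assumptions;
# check each script's --help for the authoritative arguments).
python $EXECUTORCH_ROOT/examples/qualcomm/scripts/mobilenet_v2.py \
    -b build-android \
    -m SM8650 \
    -s <device_serial>
```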
## How to Support a Custom Model in HTP Backend
### Step-by-Step Implementation Guide
The guide concludes by saving the exported program to disk:

```python
with open(model_name, "wb") as f:
    ...
print(f"Model successfully exported to {model_name}")
```
## Deep Dive

### Partitioner API

The **QnnPartitioner** identifies and groups supported subgraphs for execution on the QNN backend.
It uses `QnnOperatorSupport` to check node-level compatibility with the Qualcomm backend via QNN SDK APIs.

The partitioner tags supported nodes with a `delegation_tag` and handles constants, buffers, and mutable states appropriately.
Please check out [QnnPartitioner](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/partition/qnn_partitioner.py#L125) for the latest changes. Its constructor takes the following four inputs, of which only the compiler specs are required:
```python
class QnnPartitioner(Partitioner):
    """
    QnnPartitioner identifies subgraphs that can be lowered to QNN backend, by tagging nodes for delegation,
    and manages special cases such as mutable buffers and consumed constants.
    """

    def __init__(
        self,
        compiler_specs: List[CompileSpec],
        skip_node_id_set: set = None,
        skip_node_op_set: set = None,
        skip_mutable_buffer: bool = False,
    ):
        ...
```
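As a minimal sketch, lowering with the partitioner might look like the following; the compile-spec helpers and import paths are assumptions based on the current repository layout and may differ between releases, and `exported_program` is assumed to come from `torch.export`.

```python
# A minimal sketch, assuming `exported_program` from torch.export and a
# Snapdragon target; helper names and import paths may vary by release.
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.serialization.qc_schema import QcomChipset
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)
from executorch.exir import to_edge_transform_and_lower

compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,  # pick the SoC you target
    backend_options=generate_htp_compiler_spec(use_fp16=True),
)

# Only the compile specs are required; the skip_* arguments are optional.
partitioner = QnnPartitioner(compiler_specs)
edge_program = to_edge_transform_and_lower(exported_program, partitioner=[partitioner])
executorch_program = edge_program.to_executorch()
```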
### Quantization
Quantization in the QNN backend supports multiple data bit-widths and both post-training quantization (PTQ) and quantization-aware training (QAT).
The `QnnQuantizer` defines quantization configurations and annotations compatible with Qualcomm hardware.

Supported schemes include:
- 8a8w (default)
- 16a16w
- 16a8w
- 16a4w
- 16a4w_block
Highlights:
- `QuantDtype` enumerates bit-width combinations for activations and weights.
- `ModuleQConfig` manages per-layer quantization behavior and observers.
- `QnnQuantizer` integrates with the PT2E prepare/convert flow to annotate and quantize models.
It also supports:
- Per-channel and per-block quantization
- Custom quant annotation via `custom_quant_annotations`
- Skipping specific nodes or ops
- Per-module customization via `submodule_qconfig_list`

For details, see `backends/qualcomm/quantizer/quantizer.py`.
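As a rough sketch of how these pieces fit together in the PT2E flow (assuming an existing `model` and `example_inputs`; the `set_default_quant_config` helper reflects the current sources and may change):

```python
# A hedged PTQ sketch using the PT2E flow; `model` and `example_inputs` are
# assumed, and helper names may differ across ExecuTorch versions.
import torch
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer, QuantDtype
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

quantizer = QnnQuantizer()
quantizer.set_default_quant_config(QuantDtype.use_8a8w)  # or use_16a8w, use_16a4w, ...

module = torch.export.export_for_training(model, example_inputs).module()
module = prepare_pt2e(module, quantizer)  # insert observers per annotations
module(*example_inputs)                   # calibrate on representative inputs
module = convert_pt2e(module)             # fold observers into quantized ops
```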
### Operator Support
[The full operator support matrix](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/backends/qualcomm/builders#operator-support-status) is tracked and frequently updated in the ExecuTorch repository.

It lists:
- Supported PyTorch ops (`aten.*`, custom ops)
- Planned ops
- Deprecated ops

This matrix directly corresponds to the implementations in: [executorch/backends/qualcomm/builders/node_visitors/*.py](https://github.com/pytorch/executorch/tree/main/backends/qualcomm/builders)
### Custom Ops Support
You can extend the QNN backend to support your own operators by following the [tutorial](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/examples/qualcomm/custom_op#custom-operator-support).

It covers:
- Writing a new `NodeVisitor` for your op (see the sketch after this list)
- Registering it via `@register_node_visitor`
- Creating and linking `libQnnOp*.so` for the delegate
- Testing and verifying custom kernels on HTP
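A hypothetical skeleton of such a visitor is sketched below; the op target, module paths, and wrapper details are placeholders patterned after the built-in visitors, not the exact API.

```python
# Hypothetical NodeVisitor skeleton; the op target and module paths are
# placeholders patterned after the built-in visitors.
import torch
from executorch.backends.qualcomm.builders.node_visitor import (
    NodeVisitor,
    register_node_visitor,
)


@register_node_visitor
class MyCustomOpVisitor(NodeVisitor):
    target = ["my_ops.my_custom_op.default"]  # placeholder custom op target

    def __init__(self, *args) -> None:
        super().__init__(*args)

    def define_node(self, node: torch.fx.Node, nodes_to_wrappers) -> None:
        # Build QNN tensor wrappers for the node's inputs/outputs and emit the
        # corresponding QNN op, mirroring builders/node_visitors/*.py.
        ...
```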
## FAQ
If you encounter any issues while reproducing the tutorial, please file a GitHub [issue](https://github.com/pytorch/executorch/issues) on the ExecuTorch repo and tag it with the `#qcom_aisw` tag.
### Debugging tips
- Before trying any complicated models, try out [a simple model example](https://github.com/pytorch/executorch/tree/f32cdc3de6f7176d70a80228f1a60bcd45d93437/examples/qualcomm#simple-examples-to-verify-the-backend-is-working) and see if it works on device.