Commit d3cfa31 (1 parent: be55afe)

committed: added qnn issue

File tree: 1 file changed (+3 −0 lines)


articles/ai-foundry/foundry-local/reference/reference-best-practice.md: 3 additions & 0 deletions
@@ -35,6 +35,7 @@ foundry model info <model> --license
 
 Foundry Local is designed for on-device inference and *not* distributed, containerized, or multi-machine production deployments.
 
+
 ## Troubleshooting
 
 ### Common issues and solutions
@@ -44,11 +45,13 @@ Foundry Local is designed for on-device inference and *not* distributed, contain
 | Slow inference | CPU-only model with large parameter count | Use GPU-optimized model variants when available |
 | Model download failures | Network connectivity issues | Check your internet connection and run `foundry cache list` to verify cache status |
 | The service fails to start | Port conflicts or permission issues | Try `foundry service restart` or [report an issue](https://github.com/microsoft/Foundry-Local/issues) with logs using `foundry zip-logs` |
+| Qualcomm NPU error (`Qnn error code 5005: "Failed to load from EpContext model. qnn_backend_manager."`) | Failure in the Qualcomm QNN backend while loading the precompiled (EpContext) model | |
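
The diagnostic commands cited in the table can be chained into a quick first-pass check. A minimal sketch, using only commands that appear verbatim in this doc (output varies by install):

```bash
# Model download failures: confirm what is already cached locally
foundry cache list

# Service fails to start: restart the local service
foundry service restart

# Still failing: bundle service logs to attach to a GitHub issue report
foundry zip-logs
```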
 
 ### Improving performance
 
 If you experience slow inference, consider the following strategies:
 
+- Running ONNX models in the AI Toolkit for VS Code at the same time causes resource contention. Stop the AI Toolkit inference session before running Foundry Local.
 - Use GPU acceleration when available
 - Identify bottlenecks by monitoring memory usage during inference
 - Try more quantized model variants, like INT8 instead of FP16 (see the sketch after this list)
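
For the quantization strategy, you can list the available variants and inspect a candidate before switching. A minimal sketch: `foundry model info <model> --license` appears earlier in this doc, while `foundry model list` is assumed to be available in your CLI build:

```bash
# List models Foundry Local can run on this machine; variant names
# typically indicate precision and target hardware (CPU, GPU, NPU)
foundry model list

# Inspect a candidate variant (and its license) before switching to it
foundry model info <model> --license
```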
