XNNPack execution fails on the listed models #12134

@brucekk-kim

🐛 Describe the bug

When we run the models listed below with the XNNPack backend, most of them fail with errors.
We use benchmark.py from #11039 to run them on an Android device.
@SS-JIA @kimishpatel @mergennachin
Commands:

# pte generation
python3 -m examples.xnnpack.aot_compiler --model_name dl3 --delegate --quantize -o xnnpack_pte/
# xnnpack executor build
#!/bin/bash

if [[ -z "$ANDROID_NDK_ROOT" ]]; then
  echo "Please export ANDROID_NDK_ROOT=/path/to/ndk"
  exit 1
fi

CLEAN_BUILD="false"
BUILD_FOLDER="build-xnnpack"
BUILD_TYPE="Release"

# Parse options; the single shift after esac consumes each flag exactly once.
while [[ "$#" -gt 0 ]]; do
  case "$1" in
    -c|--clean_build) CLEAN_BUILD="true";;
    -d|--debug) BUILD_TYPE="Debug";;
    *) echo "unknown arg passed: $1"; exit 1;;
  esac
  shift
done

# Remove the previous build tree when a clean build is requested.
if [ "$CLEAN_BUILD" = true ]; then
  rm -rf "$BUILD_FOLDER"
fi

cmake \
  -DCMAKE_INSTALL_PREFIX=$BUILD_FOLDER \
  -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_ROOT/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI='arm64-v8a' \
  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DEXECUTORCH_ENABLE_LOGGING=ON \
  -DPYTHON_EXECUTABLE=python \
  -B$BUILD_FOLDER .

cmake --build $BUILD_FOLDER -j9 --target install --config $BUILD_TYPE

# benchmark
python3 benchmark.py -p xnnpack_pte/dl3_xnnpack_q8.pte -s <ADB_SERIAL_NUM> -b xnn
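
The export and benchmark commands above cover a single model. A minimal sketch of how the full set of command lines could be generated is shown below. It only builds and prints the commands (a dry run). The model names are inferred from the .pte file names in this report, and the assumption that omitting `--quantize` yields the `_fp32` file is mine, not confirmed by the report; `MODELS`, `export_command`, and `benchmark_command` are hypothetical helper names.

```python
import shlex

# Model names inferred from the .pte file names listed in this report.
MODELS = ["dl3", "edsr", "emformer_transcribe", "ic3", "ic4",
          "mobilebert", "mv3", "resnet18", "resnet50", "vit", "w2l"]


def export_command(model, quantize, out_dir="xnnpack_pte/"):
    """Build the aot_compiler argv for one model.

    Assumption: leaving out --quantize produces the fp32 variant.
    """
    cmd = ["python3", "-m", "examples.xnnpack.aot_compiler",
           "--model_name", model, "--delegate"]
    if quantize:
        cmd.append("--quantize")
    cmd += ["-o", out_dir]
    return cmd


def benchmark_command(model, quantize, serial, out_dir="xnnpack_pte/"):
    """Build the benchmark.py argv, using the file naming seen in the report."""
    suffix = "q8" if quantize else "fp32"
    pte = f"{out_dir}{model}_xnnpack_{suffix}.pte"
    return ["python3", "benchmark.py", "-p", pte, "-s", serial, "-b", "xnn"]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for model in MODELS:
        for quantize in (False, True):
            print(shlex.join(export_command(model, quantize)))
            print(shlex.join(benchmark_command(model, quantize, "<ADB_SERIAL_NUM>")))
```

Printing rather than executing keeps the sketch safe to run without a device attached; the printed lines can be pasted into a shell once the serial number is filled in.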

output summary:

name                                              load     1st      avg        peak_mem  avg_mem     note
xnnpack_pte/dl3_xnnpack_fp32.pte                  0        0        0          0         0           Aborted
xnnpack_pte/dl3_xnnpack_q8.pte                    0        0        0          0         0           Aborted
xnnpack_pte/edsr_xnnpack_fp32.pte                 14.256   612.801  468.82162  323398    323236.459  success
xnnpack_pte/edsr_xnnpack_q8.pte                   0        0        0          0         0           Aborted
xnnpack_pte/emformer_transcribe_xnnpack_fp32.pte  0        0        0          0         0           libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
xnnpack_pte/emformer_transcribe_xnnpack_q8.pte    0        0        0          0         0           Aborted
xnnpack_pte/ic3_xnnpack_fp32.pte                  0        0        0          0         0           Aborted
xnnpack_pte/ic3_xnnpack_q8.pte                    0        0        0          0         0           Aborted
xnnpack_pte/ic4_xnnpack_fp32.pte                  0        0        0          3234      3234.0      Aborted
xnnpack_pte/ic4_xnnpack_q8.pte                    0        0        0          0         0           Aborted
xnnpack_pte/mobilebert_xnnpack_fp32.pte           0        0        0          10530     10530.0     libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
xnnpack_pte/mv3_xnnpack_fp32.pte                  11.439   64.515   2.68764    0         0           awk: division by zero
xnnpack_pte/mv3_xnnpack_q8.pte                    0        0        0          1814      1814.0      Aborted
xnnpack_pte/resnet18_xnnpack_fp32.pte             95.514   86.227   13.44775   66779     66768.308   success
xnnpack_pte/resnet18_xnnpack_q8.pte               0        0        0          0         0           Aborted
xnnpack_pte/resnet50_xnnpack_fp32.pte             108.011  181.618  31.16065   171762    128501.375  Unable to read dmabuf info for 29909
xnnpack_pte/resnet50_xnnpack_q8.pte               0        0        0          0         0           Aborted
xnnpack_pte/vit_xnnpack_fp32.pte                  0        0        0          0         0           Aborted
xnnpack_pte/vit_xnnpack_q8.pte                    0        0        0          0         0           Aborted
xnnpack_pte/w2l_xnnpack_fp32.pte                  111.403  122.821  18.16669   224629    143850.55   success
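
The note column in the summary mixes two kinds of problems: runs that produced no timings at all, and runs that produced timings but hit an error while memory was being sampled (for example the mv3 fp32 row). A rough sketch of how the rows could be separated along that line is shown below. It assumes whitespace-separated columns in the order given by the header; `parse_row` and `classify` are hypothetical helpers, and the embedded rows are copied from the summary.

```python
from collections import Counter

# Four rows copied from the summary; columns are whitespace-separated.
SUMMARY = """\
xnnpack_pte/dl3_xnnpack_fp32.pte 0 0 0 0 0 Aborted
xnnpack_pte/edsr_xnnpack_fp32.pte 14.256 612.801 468.82162 323398 323236.459 success
xnnpack_pte/mobilebert_xnnpack_fp32.pte 0 0 0 10530 10530.0 libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
xnnpack_pte/mv3_xnnpack_fp32.pte 11.439 64.515 2.68764 0 0 awk: division by zero
"""


def parse_row(line):
    """Split one summary row; only the trailing note may contain spaces."""
    name, load, first, avg, peak_mem, avg_mem, note = line.split(None, 6)
    return {
        "name": name,
        "load": float(load),
        "first": float(first),
        "avg": float(avg),
        "peak_mem": float(peak_mem),
        "avg_mem": float(avg_mem),
        "note": note.strip(),
    }


def classify(row):
    """Label a row as success, crashed, or measurement_issue."""
    if row["note"] == "success":
        return "success"
    if row["avg"] > 0:
        # Timings exist, so inference ran; only the measurement step failed.
        return "measurement_issue"
    return "crashed"


rows = [parse_row(line) for line in SUMMARY.splitlines()]
counts = Counter(classify(row) for row in rows)
for label, count in sorted(counts.items()):
    print(label, count)
```

Treating a nonzero average latency as evidence that inference ran is a heuristic, not something benchmark.py reports directly.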

outputs

command: python3 benchmark.py -p xnnpack_pte/dl3_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/dl3_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/edsr_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
load: 14.256000
1st: 612.801000
avg: 468.821620
peak_mem: 323398
avg_mem: 323236.459

command: python3 benchmark.py -p xnnpack_pte/edsr_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/emformer_transcribe_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/emformer_transcribe_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/ic3_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
Aborted
Invalid arguments - only one [PID] argument is allowed
Usage: dmabuf_dump [-abh] [PID] [-o <raw|csv>]
-a       show all dma buffers (ion) in big table, [buffer x process] grid
-b       show DMA-BUF per-buffer, per-exporter and per-device statistics
-o       [raw][csv] print output in the specified format.
-h       show this help
         If PID is supplied, the dmabuf information for that process is shown.
         Per-buffer DMA-BUF stats do not take an argument.
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/ic3_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/ic4_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
Aborted
Invalid arguments - only one [PID] argument is allowed
Usage: dmabuf_dump [-abh] [PID] [-o <raw|csv>]
-a       show all dma buffers (ion) in big table, [buffer x process] grid
-b       show DMA-BUF per-buffer, per-exporter and per-device statistics
-o       [raw][csv] print output in the specified format.
-h       show this help
         If PID is supplied, the dmabuf information for that process is shown.
         Per-buffer DMA-BUF stats do not take an argument.
peak_mem: 3234
avg_mem: 3234.000

command: python3 benchmark.py -p xnnpack_pte/ic4_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/mobilebert_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
libc++abi: terminating due to uncaught exception of type std::out_of_range: unordered_map::at: key not found
Aborted
peak_mem: 10530
avg_mem: 10530.000

command: python3 benchmark.py -p xnnpack_pte/mv3_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
awk: division by zero
 source line number 1
load: 11.439000
1st: 64.515000
avg: 2.687640
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/mv3_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
Invalid arguments - only one [PID] argument is allowed
Usage: dmabuf_dump [-abh] [PID] [-o <raw|csv>]
-a       show all dma buffers (ion) in big table, [buffer x process] grid
-b       show DMA-BUF per-buffer, per-exporter and per-device statistics
-o       [raw][csv] print output in the specified format.
-h       show this help
         If PID is supplied, the dmabuf information for that process is shown.
         Per-buffer DMA-BUF stats do not take an argument.
peak_mem: 1814
avg_mem: 1814.000

command: python3 benchmark.py -p xnnpack_pte/resnet18_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
load: 95.514000
1st: 86.227000
avg: 13.447750
peak_mem: 66779
avg_mem: 66768.308

command: python3 benchmark.py -p xnnpack_pte/resnet18_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/resnet50_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
Unable to read dmabuf info for 29909
load: 108.011000
1st: 181.618000
avg: 31.160650
peak_mem: 171762
avg_mem: 128501.375

command: python3 benchmark.py -p xnnpack_pte/resnet50_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/vit_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/vit_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
Aborted
awk: division by zero
 source line number 1
peak_mem: 0
avg_mem:

command: python3 benchmark.py -p xnnpack_pte/w2l_xnnpack_fp32.pte -s 172.17.32.1:80 -b xnn
load: 111.403000
1st: 122.821000
avg: 18.166690
peak_mem: 224629
avg_mem: 143850.550

command: python3 benchmark.py -p xnnpack_pte/w2l_xnnpack_q8.pte -s 172.17.32.1:80 -b xnn
load: 126.316000
1st: 153.128000
avg: 18.250880
peak_mem: 224564
avg_mem: 144070.632
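
In the outputs above, "awk: division by zero" always appears together with an empty avg_mem value. That pattern is consistent with an average computed over zero memory samples, for instance when the process exits before the first sample is taken. The internals of benchmark.py are not shown in this report, so this is an assumption. A minimal sketch of a guarded average, with the hypothetical helper names `summarize_memory` and `format_memory`:

```python
def summarize_memory(samples):
    """Return (peak, avg) for memory samples; avg is None when there are none."""
    if not samples:
        # No samples collected: report a zero peak and no average
        # instead of dividing by zero.
        return 0, None
    return max(samples), sum(samples) / len(samples)


def format_memory(samples):
    """Render the two lines in the same shape as the outputs in this report."""
    peak, avg = summarize_memory(samples)
    avg_text = "n/a (no samples)" if avg is None else f"{avg:.3f}"
    return f"peak_mem: {peak}\navg_mem: {avg_text}"


if __name__ == "__main__":
    print(format_memory([]))
    print(format_memory([3234]))
```

Reporting an explicit "no samples" marker makes it easier to tell a failed measurement apart from a crashed run when reading the summary.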

Versions

Collecting environment information...
PyTorch version: 2.7.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.6
Libc version: glibc-2.35

Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i7-1360P
CPU family: 6
Model: 186
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 2
BogoMIPS: 5222.41
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 384 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 10 MiB (8 instances)
L3 cache: 18 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Vulnerable: No microcode
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] executorch==0.6.0a0+2d5c84f
[pip3] numpy==2.2.6
[pip3] torch==2.7.0+cpu
[pip3] torchao==0.10.0+git8b264ce1
[pip3] torchaudio==2.7.0+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.22.0+cpu
[conda] executorch 0.6.0a0+2d5c84f pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] torch 2.7.0+cpu pypi_0 pypi
[conda] torchao 0.10.0+git8b264ce1 pypi_0 pypi
[conda] torchaudio 2.7.0+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.22.0+cpu pypi_0 pypi

cc @digantdesai @mcr229 @cbilgin

Metadata

Assignees: No one assigned
Labels: module: xnnpack, triaged
Status: Done
Milestone: No milestone