
The ExecuTorch SwiftPM binary distribution (v0.7.0) fails to load the "forward" method from exported PTE models with Error 32 (NotFound), while the same PTE files work perfectly with the Python ExecuTorch runtime (v0.7.0) #14809

@andytriboletti

Description

πŸ› Describe the bug

ExecuTorch SwiftPM Binary Distribution Bug Report

Summary

The ExecuTorch SwiftPM binary distribution (v0.7.0) fails to load the "forward" method from exported PTE models with Error 32 (NotFound), while the same PTE files work perfectly with the Python ExecuTorch runtime (v0.7.0).

Environment

  • Platform: macOS 26.0.1 (Apple Silicon, M1 Max)
  • Xcode: 26.0.1
  • ExecuTorch Python: 0.7.0 (pip package)
  • ExecuTorch SwiftPM: 0.7.0 (branch: swiftpm-0.7.0)
  • Model: SmolLM-135M exported with LLM exporter
  • Export Command:
python -m executorch.examples.models.llama.export_llama \
  --model smollm2 \
  -X --pt2e_quantize xnnpack_dynamic \
  --max_seq_length 128 \
  --params params.json \
  -n smollm2_simple.pte \
  -o . \
  -v

Issue Description

What Works (Python Runtime)

import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer

with open("smollm2_simple.pte", "rb") as f:
    module = _load_for_executorch_from_buffer(f.read())

# This works perfectly
tokens = torch.tensor([[1, 72, 101, 108, 108, 111]], dtype=torch.long)
result = module.forward([tokens])
print(result[0].shape)  # Output: torch.Size([1, 49152])

Result: ✅ Works perfectly, generates coherent text

What Fails (C++ SwiftPM Runtime)

// Load model
auto module = std::make_unique<torch::executor::Module>(modelPath, ...);
auto load_err = module->load();
// load_err == Error::Ok ✅

// Try to load the forward method
auto load_method_err = module->load_method("forward");
// load_method_err == Error::NotFound (32) ❌

// Try to execute forward
std::vector<executorch::runtime::EValue> inputs;
inputs.emplace_back(*input_tensor);
auto result = module->forward(inputs);
// result.error() == Error::NotFound (32) ❌

// Try execute("forward", ...)
auto result2 = module->execute("forward", inputs);
// result2.error() == Error::NotFound (32) ❌

Result: ❌ Error 32 (NotFound) - method cannot be loaded or executed
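For readability in the sections below, the raw codes can be decoded with a small helper. This is a minimal sketch covering only the values observed in this report; it assumes ExecuTorch's hex-valued Error enum (runtime/core/error.h), where decimal 32 is NotFound = 0x20, 20 is OperatorMissing = 0x14, and 18 is InvalidArgument = 0x12:

#include <executorch/runtime/core/error.h>

// Map the raw error codes seen in this report to readable names.
// The enum is hex-valued: 32 == 0x20, 20 == 0x14, 18 == 0x12.
const char* error_name(executorch::runtime::Error e) {
  using executorch::runtime::Error;
  switch (e) {
    case Error::Ok:              return "Ok (0x00)";
    case Error::InvalidArgument: return "InvalidArgument (0x12 / 18)";
    case Error::OperatorMissing: return "OperatorMissing (0x14 / 20)";
    case Error::NotFound:        return "NotFound (0x20 / 32)";
    default:                     return "unrecognized";
  }
}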

Detailed Observations

1. Method Exists in Metadata

auto method_names = module->method_names();
// Returns: ["enable_dynamic_shape", "use_sdpa_with_kv_cache", "get_n_layers", 
//           "get_eos_ids", "get_max_seq_len", "get_vocab_size", "get_bos_id", 
//           "get_max_context_len", "use_kv_cache", "forward"]

The "forward" method is listed in method_names(), but cannot be loaded.

2. Method Metadata is Accessible

auto meta_result = module->method_meta("forward");
// meta_result.ok() == true ✅

auto meta = meta_result.get();
// meta.num_inputs() == 1
// meta.input_tensor_meta(0).dtype() == ScalarType::Long (4)
// meta.input_tensor_meta(0).sizes() == [1, dynamic]
// meta.num_outputs() == 1
// meta.output_tensor_meta(0).dtype() == ScalarType::Float (6)
// meta.output_tensor_meta(0).sizes() == [1, 49152]

Metadata is accessible and shows correct input/output signatures.
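The input we pass is constructed to match that metadata exactly. A minimal sketch using the tensor extension, assuming make_tensor_ptr() deduces ScalarType::Long from int64_t data as in the 0.7.0 extension/tensor headers:

#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>

using executorch::extension::make_tensor_ptr;

// Shape [1, 6], dtype int64 -- matches the reported input_tensor_meta.
auto tokens = make_tensor_ptr({1, 6}, std::vector<int64_t>{1, 72, 101, 108, 108, 111});
auto result = module->forward(tokens);
// Still fails with Error::NotFound (32) despite the matching signature.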

3. All Operators Are Present

We verified that all required operators are present in the SwiftPM binaries by:

  • Extracting operator list from PTE using gen_oplist.py
  • Confirming all operators exist in kernels_optimized and backend_xnnpack
  • Disabling custom operator registration → error remains 32 (not 20/OperatorMissing)

4. KV Cache Methods Also Fail

auto kv_result = module->execute("use_kv_cache", {true});
// kv_result.error() == Error::InvalidArgument (18) ❌

Helper methods added by the LLM exporter also fail.
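One caveat: the helpers the LLM exporter embeds are typically zero-argument methods that return constants, so passing {true} could by itself produce InvalidArgument (18) as an argument-count mismatch. Probing them with an empty input list separates that from the loading failure; a sketch, assuming execute() accepts an empty input vector:

// Probe the exporter-added helpers with no inputs, so a failure reflects
// method loading rather than a possible argument-count mismatch.
for (const char* name : {"get_vocab_size", "get_max_seq_len", "use_kv_cache"}) {
  auto r = module->execute(name, std::vector<executorch::runtime::EValue>{});
  std::printf("%s -> %s\n", name, r.ok() ? "ok" : "error");
}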

Attempted Solutions

✅ Tried: Custom Operator Registration

  • Generated selected_operators.yaml from PTE
  • Created RegisterCodegenUnboxedKernelsEverything.cpp
  • Linked all required kernels (optimized, xnnpack, custom, quantized)
  • Result: Same error (32)

✅ Tried: Different Input Signatures

Attempted a broad range of input combinations (see the probe sketch after this list):

  • [ids:int64 1xL], [ids:int32 1xL]
  • [ids:int64 L], [ids:int32 L]
  • [ids:int64, mask:int64], [ids:int32, mask:int32]
  • [ids:int64, seq_len:int32], [ids:int32, seq_len:int32]
  • [ids:int64, positions:int64], [ids:int32, positions:int32]
  • Fixed sequence lengths: K ∈ {16, 32, 64, 128, 256}
  • Result: All return error 32
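A condensed sketch of the sequence-length sweep from the list above (dummy token ids; make_tensor_ptr as in the metadata example earlier):

// Sweep the fixed sequence lengths K with dummy int64 token ids.
for (int k : {16, 32, 64, 128, 256}) {
  std::vector<int64_t> ids(static_cast<size_t>(k), 1);
  auto t = executorch::extension::make_tensor_ptr({1, k}, ids);
  auto r = module->forward(t);
  // Every variant returns Error::NotFound (32).
}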

✅ Tried: Different Module Loading Patterns

  • Owner module (path-based): Module(path, ...)
  • Allocator-backed module: Module(path, ..., allocator)
  • Explicit load_forward() calls
  • Result: All return error 32

✅ Tried: Version Alignment

  • Downgraded Python to 0.4.0 → same error
  • Upgraded Python to 0.7.0 → works in Python, fails in C++
  • Result: Python works, C++ fails regardless of version

Root Cause Hypothesis

The SwiftPM binary distribution appears to have a fundamental incompatibility with PTE files exported by the Python ExecuTorch 0.7.0 package. Specifically:

  1. Method Plan Loading: The C++ runtime cannot load method plans from PTE files, even though:

    • The method exists in metadata
    • All operators are present
    • The Python runtime loads the same file successfully
  2. SwiftPM Branch Discrepancy: The swiftpm-0.7.0 branch doesn't exist in the official PyTorch/ExecuTorch repository. The latest official SwiftPM branch is swiftpm-0.4.0, suggesting the 0.7.0 binaries may be:

    • An unofficial build
    • An outdated/incomplete build
    • Missing critical runtime components

Workaround

We implemented a Python subprocess bridge (see the sketch below) that:

  1. Spawns Python process running ExecuTorch
  2. Communicates via JSON over stdin/stdout
  3. Uses the working Python runtime for inference
  4. Adds ~1-5ms IPC overhead per token (negligible)

Result: ✅ Fully functional, generates coherent text
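For reference, a minimal sketch of such a bridge (plain POSIX pipe/fork/exec; runner.py is a hypothetical script that loads the PTE with the Python runtime, reads one JSON request per stdin line, and writes one JSON response per stdout line, so line-delimited JSON keeps the framing trivial):

#include <cstdio>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Bidirectional pipe to a persistent Python child process.
struct PyBridge {
  FILE* to_child = nullptr;    // our writes become the child's stdin
  FILE* from_child = nullptr;  // the child's stdout becomes our reads

  bool start(const char* script) {
    int in_pipe[2], out_pipe[2];  // [0] = read end, [1] = write end
    if (pipe(in_pipe) != 0 || pipe(out_pipe) != 0) return false;
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {  // child: wire the pipes to stdin/stdout, exec Python
      dup2(in_pipe[0], STDIN_FILENO);
      dup2(out_pipe[1], STDOUT_FILENO);
      close(in_pipe[1]);
      close(out_pipe[0]);
      execlp("python3", "python3", script, (char*)nullptr);
      _exit(127);  // only reached if exec failed
    }
    close(in_pipe[0]);   // parent keeps the write end of the child's stdin
    close(out_pipe[1]);  // ... and the read end of the child's stdout
    to_child = fdopen(in_pipe[1], "w");
    from_child = fdopen(out_pipe[0], "r");
    return to_child && from_child;
  }

  // Send one JSON request line; block until the child's JSON response line.
  std::string request(const std::string& json_line) {
    std::fprintf(to_child, "%s\n", json_line.c_str());
    std::fflush(to_child);
    char buf[1 << 16];
    if (!std::fgets(buf, sizeof(buf), from_child)) return {};
    return std::string(buf);
  }
};

The ~1-5ms per-token overhead quoted above is the full round trip through request().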

Expected Behavior

The C++ SwiftPM runtime should be able to:

  1. Load the "forward" method: load_method("forward") → Error::Ok
  2. Execute forward pass: forward(inputs) → valid output tensor
  3. Match Python runtime behavior exactly

Actual Behavior

  • load_method("forward") β†’ Error::NotFound (32)
  • forward(inputs) β†’ Error::NotFound (32)
  • execute("forward", inputs) β†’ Error::NotFound (32)

Reproduction Steps

  1. Export model (Python):
python -m executorch.examples.models.llama.export_llama \
  --model smollm2 \
  -X --pt2e_quantize xnnpack_dynamic \
  --max_seq_length 128 \
  --params params.json \
  -n model.pte
  2. Test in Python (works):
from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer
import torch

with open("model.pte", "rb") as f:
    module = _load_for_executorch_from_buffer(f.read())

tokens = torch.tensor([[1, 2, 3]], dtype=torch.long)
result = module.forward([tokens])
print("Success:", result[0].shape)
  3. Test in C++ with SwiftPM (fails):
#import <executorch/extension/module/module.h>

auto module = std::make_unique<torch::executor::Module>(modelPath);
module->load();

auto result = module->load_method("forward");
// result == Error::NotFound (32)

Additional Information

  • PTE File Size: 238 MB
  • Model Architecture: SmolLM-135M (30 layers, 9 heads, vocab_size=49152)
  • Quantization: XNNPACK dynamic quantization
  • Backend: XNNPACK + optimized kernels
  • Platform: macOS (arm64)

Questions

  1. Is the swiftpm-0.7.0 branch an official release?
  2. Are there known incompatibilities between Python-exported PTEs and SwiftPM binaries?
  3. Should we use a different export path for SwiftPM compatibility?
  4. Is there a recommended way to debug Error::NotFound (32) in method loading?

Files Available

  • smollm2_simple.pte - Exported model file (works in Python, fails in C++)
  • selected_operators.yaml - Extracted operator list
  • Full Xcode project with reproduction case

Happy to provide any additional information or test patches!

Versions

(venv) andytriboletti@macbookpro alientavern-v2 % python collect_env.py

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: macOS 26.0.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.3.19.1)
CMake version: version 4.1.2
Libc version: N/A

Python version: 3.13.7 (main, Aug 14 2025, 11:12:11) [Clang 17.0.0 (clang-1700.3.19.1)] (64-bit runtime)
Python platform: macOS-26.0.1-arm64-arm-64bit-Mach-O
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Is XPU available: N/A
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Apple M1 Max

Versions of relevant libraries:
[pip3] No relevant packages
[conda] Could not collect

cc @shoumikhin @cbilgin
