### 🐛 Describe the bug
**ExecuTorch SwiftPM Binary Distribution Bug Report**

## Summary

The ExecuTorch SwiftPM binary distribution (v0.7.0) fails to load the "forward" method from exported PTE models with `Error::NotFound` (32), while the same PTE files work perfectly with the Python ExecuTorch runtime (v0.7.0).
## Environment

- Platform: macOS 26.0.1 (Apple Silicon M1/M2)
- Xcode: 26.0.1
- ExecuTorch Python: 0.7.0 (pip package)
- ExecuTorch SwiftPM: 0.7.0 (branch: swiftpm-0.7.0)
- Model: SmolLM-135M exported with LLM exporter
- Export Command:

```shell
python -m executorch.examples.models.llama.export_llama \
  --model smollm2 \
  -X --pt2e_quantize xnnpack_dynamic \
  --max_seq_length 128 \
  --params params.json \
  -n smollm2_simple.pte \
  -o . \
  -v
```
## Issue Description

### What Works (Python Runtime)

```python
from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer
import torch

with open("smollm2_simple.pte", "rb") as f:
    module = _load_for_executorch_from_buffer(f.read())

# This works perfectly
tokens = torch.tensor([[1, 72, 101, 108, 108, 111]], dtype=torch.long)
result = module.forward([tokens])
print(result[0].shape)  # Output: torch.Size([1, 49152])
```

Result: ✅ Works perfectly, generates coherent text
### What Fails (C++ SwiftPM Runtime)

```cpp
// Load model
auto module = std::make_unique<torch::executor::Module>(modelPath, ...);
auto load_err = module->load();
// load_err == Error::Ok ✅

// Try to load forward method
auto load_method_err = module->load_method("forward");
// load_method_err == Error::NotFound (32) ❌

// Try to execute forward
std::vector<executorch::runtime::EValue> inputs;
inputs.emplace_back(*input_tensor);
auto result = module->forward(inputs);
// result.error() == Error::NotFound (32) ❌

// Try execute("forward", ...)
auto result2 = module->execute("forward", inputs);
// result2.error() == Error::NotFound (32) ❌
```

Result: ❌ Error 32 (NotFound): the method can be neither loaded nor executed
## Detailed Observations

### 1. Method Exists in Metadata

```cpp
auto method_names = module->method_names();
// Returns: ["enable_dynamic_shape", "use_sdpa_with_kv_cache", "get_n_layers",
//           "get_eos_ids", "get_max_seq_len", "get_vocab_size", "get_bos_id",
//           "get_max_context_len", "use_kv_cache", "forward"]
```

The "forward" method is listed in `method_names()`, but cannot be loaded.
### 2. Method Metadata Is Accessible

```cpp
auto meta_result = module->method_meta("forward");
// meta_result.ok() == true ✅
auto meta = meta_result.get();
// meta.num_inputs() == 1
// meta.input_tensor_meta(0).dtype() == ScalarType::Long (4)
// meta.input_tensor_meta(0).sizes() == [1, dynamic]
// meta.num_outputs() == 1
// meta.output_tensor_meta(0).dtype() == ScalarType::Float (6)
// meta.output_tensor_meta(0).sizes() == [1, 49152]
```

Metadata is accessible and shows the correct input/output signatures.
### 3. All Operators Are Present

We verified that all required operators are present in the SwiftPM binaries by:

- Extracting the operator list from the PTE using `gen_oplist.py`
- Confirming all operators exist in `kernels_optimized` and `backend_xnnpack`
- Disabling custom operator registration → the error remains 32, not 20 (OperatorMissing)
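The verification above boils down to a set difference. A minimal sketch, assuming the operator names from `gen_oplist.py` and from the linked kernel libraries have already been flattened into plain Python lists (the operator names below are illustrative, not the real model's full list):

```python
def missing_ops(required, available):
    """Return the required operators that are absent from the runtime's registry."""
    return sorted(set(required) - set(available))

# Illustrative names; the real lists come from gen_oplist.py (PTE side)
# and from dumping the kernels registered in the SwiftPM binaries.
required = ["aten::add.out", "aten::mm.out", "llama::sdpa_with_kv_cache.out"]
available = ["aten::add.out", "aten::mm.out", "llama::sdpa_with_kv_cache.out"]

print(missing_ops(required, available))  # [] -> nothing missing, so error 32 is not an operator gap
```

An empty difference is what rules out error 20 (OperatorMissing) as the explanation.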
### 4. KV Cache Methods Also Fail

```cpp
auto kv_result = module->execute("use_kv_cache", {true});
// kv_result.error() == Error::InvalidArgument (18) ❌
```

The helper methods added by the LLM exporter also fail.
## Attempted Solutions

### ❌ Tried: Custom Operator Registration

- Generated `selected_operators.yaml` from the PTE
- Created `RegisterCodegenUnboxedKernelsEverything.cpp`
- Linked all required kernels (optimized, xnnpack, custom, quantized)
- Result: same error (32)
### ❌ Tried: Different Input Signatures

Attempted all plausible input combinations:

- `[ids:int64 1xL]`, `[ids:int32 1xL]`
- `[ids:int64 L]`, `[ids:int32 L]`
- `[ids:int64, mask:int64]`, `[ids:int32, mask:int32]`
- `[ids:int64, seq_len:int32]`, `[ids:int32, seq_len:int32]`
- `[ids:int64, positions:int64]`, `[ids:int32, positions:int32]`
- Fixed sequence lengths: K ∈ {16, 32, 64, 128, 256}
- Result: all return error 32
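The sweep above can be expressed as a small generator; the dtype, extra-input, and sequence-length axes mirror the list (this is our own harness sketch, and in the real run each candidate was turned into tensors and fed to the module's forward call):

```python
from itertools import product

# Axes of the sweep, matching the combinations listed above.
DTYPES = ["int64", "int32"]
EXTRAS = [None, "mask", "seq_len", "positions"]  # None = token ids only
SEQ_LENS = [16, 32, 64, 128, 256]

def signature_candidates():
    """Yield (dtype, extra_input, seq_len) triples covering the sweep.

    In the real harness, each triple was materialized as torch tensors of the
    given dtype/shape and passed to module->forward(); every single candidate
    came back with Error::NotFound (32).
    """
    yield from product(DTYPES, EXTRAS, SEQ_LENS)

print(sum(1 for _ in signature_candidates()))  # 2 * 4 * 5 = 40 combinations
```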
### ❌ Tried: Different Module Loading Patterns

- Owner module (path-based): `Module(path, ...)`
- Allocator-backed module: `Module(path, ..., allocator)`
- Explicit `load_forward()` calls
- Result: all return error 32
### ❌ Tried: Version Alignment

- Downgraded Python to 0.4.0 → same error
- Upgraded Python to 0.7.0 → works in Python, fails in C++
- Result: Python works, C++ fails regardless of version
## Root Cause Hypothesis

The SwiftPM binary distribution appears to have a fundamental incompatibility with PTE files exported by the Python ExecuTorch 0.7.0 package. Specifically:

1. Method Plan Loading: the C++ runtime cannot load method plans from PTE files, even though:
   - The method exists in metadata
   - All operators are present
   - The Python runtime loads the same file successfully
2. SwiftPM Branch Discrepancy: the `swiftpm-0.7.0` branch doesn't exist in the official PyTorch/ExecuTorch repository. The latest official SwiftPM branch is `swiftpm-0.4.0`, suggesting the 0.7.0 binaries may be:
   - An unofficial build
   - An outdated/incomplete build
   - Missing critical runtime components
## Workaround

We implemented a Python subprocess bridge that:

- Spawns a Python process running ExecuTorch
- Communicates via JSON over stdin/stdout
- Uses the working Python runtime for inference
- Adds ~1-5 ms of IPC overhead per token (negligible)

Result: ✅ Fully functional, generates coherent text
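A minimal sketch of the bridge's Python side, assuming a line-delimited JSON protocol (the field names `tokens` and `logits_shape` are our own, hypothetical schema, and `run_forward` stands in for the real ExecuTorch call):

```python
import json
import sys

def handle_request(line, run_forward):
    """Parse one JSON request line and return a JSON response line.

    `run_forward` maps a token list to a result; in the real bridge it wraps
    the module loaded via _load_for_executorch_from_buffer.
    """
    req = json.loads(line)
    shape = run_forward(req["tokens"])
    return json.dumps({"logits_shape": shape})

def serve(run_forward, stdin=sys.stdin, stdout=sys.stdout):
    """Main loop: one JSON request per stdin line, one JSON response per stdout line."""
    for line in stdin:
        stdout.write(handle_request(line, run_forward) + "\n")
        stdout.flush()  # the host app blocks on each response, so flush per token

# Example with a stub in place of the ExecuTorch runtime:
print(handle_request('{"tokens": [1, 2, 3]}', lambda toks: [1, 49152]))
# prints {"logits_shape": [1, 49152]}
```

The host app spawns this script once, then writes one request line per token step; keeping the process alive amortizes model load time, which is why the per-token overhead stays in the low milliseconds.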
## Expected Behavior

The C++ SwiftPM runtime should be able to:

- Load the "forward" method: `load_method("forward")` → `Error::Ok`
- Execute the forward pass: `forward(inputs)` → valid output tensor
- Match Python runtime behavior exactly

## Actual Behavior

- `load_method("forward")` → `Error::NotFound` (32)
- `forward(inputs)` → `Error::NotFound` (32)
- `execute("forward", inputs)` → `Error::NotFound` (32)
## Reproduction Steps

1. Export the model (Python):

```shell
python -m executorch.examples.models.llama.export_llama \
  --model smollm2 \
  -X --pt2e_quantize xnnpack_dynamic \
  --max_seq_length 128 \
  --params params.json \
  -n model.pte
```

2. Test in Python (works):

```python
from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer
import torch

with open("model.pte", "rb") as f:
    module = _load_for_executorch_from_buffer(f.read())
tokens = torch.tensor([[1, 2, 3]], dtype=torch.long)
result = module.forward([tokens])
print("Success:", result[0].shape)
```

3. Test in C++ with SwiftPM (fails):

```cpp
#import <executorch/extension/module/module.h>

auto module = std::make_unique<torch::executor::Module>(modelPath);
module->load();
auto result = module->load_method("forward");
// result == Error::NotFound (32)
```
## Additional Information

- PTE File Size: 238 MB
- Model Architecture: SmolLM-135M (30 layers, 9 heads, vocab_size=49152)
- Quantization: XNNPACK dynamic quantization
- Backend: XNNPACK + optimized kernels
- Platform: macOS (arm64)
## Questions

- Is the `swiftpm-0.7.0` branch an official release?
- Are there known incompatibilities between Python-exported PTEs and SwiftPM binaries?
- Should we use a different export path for SwiftPM compatibility?
- Is there a recommended way to debug `Error::NotFound` (32) in method loading?
## Files Available

- `smollm2_simple.pte`: exported model file (works in Python, fails in C++)
- `selected_operators.yaml`: extracted operator list
- Full Xcode project with the reproduction case

Happy to provide any additional information or test patches!
## Versions

```
(venv) andytriboletti@macbookpro alientavern-v2 % python collect_env.py
Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: macOS 26.0.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.3.19.1)
CMake version: version 4.1.2
Libc version: N/A

Python version: 3.13.7 (main, Aug 14 2025, 11:12:11) [Clang 17.0.0 (clang-1700.3.19.1)] (64-bit runtime)
Python platform: macOS-26.0.1-arm64-arm-64bit-Mach-O
Is CUDA available: N/A
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Is XPU available: N/A
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Apple M1 Max

Versions of relevant libraries:
[pip3] No relevant packages
[conda] Could not collect
```