
Phi4-mini tokenizer.json does not work with the ExecuTorch C++ runner #14077

@metascroy


🐛 Describe the bug

The phi4-mini model used to run with our demo apps, but it no longer does. When using the phi4-mini tokenizer.json file from Hugging Face, I get the following error: https://github.com/pytorch/executorch/actions/runs/17559629529/job/49872638210. For convenience, the error is:

I tokenizers:regex.cpp:27] Registering override fallback regex
I tokenizers:regex.cpp:27] Registering override fallback regex
I 00:00:00.000455 executorch:cpuinfo_utils.cpp:71] Reading file /sys/devices/soc0/image_version
I 00:00:00.000485 executorch:cpuinfo_utils.cpp:87] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.000494 executorch:cpuinfo_utils.cpp:100] Reading file /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
I 00:00:00.000504 executorch:cpuinfo_utils.cpp:109] Failed to open midr file /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
I 00:00:00.000511 executorch:cpuinfo_utils.cpp:125] CPU info and manual query on # of cpus dont match.
I 00:00:00.000517 executorch:main.cpp:87] Resetting threadpool with num threads = 0
I 00:00:00.000528 executorch:runner.cpp:44] Creating LLaMa runner: model_path=model.pte, tokenizer_path=/var/lib/ci-user/.cache/huggingface/hub/models--metascroy--Phi-4-mini-instruct-INT8-INT4/snapshots/fb6d43d74a0349abe2d92f6f2136ecb7258abdcb/tokenizer.json
I tokenizers:hf_tokenizer.cpp:109] Setting up normalizer...
I tokenizers:hf_tokenizer.cpp:115] Normalizer field is null, skipping
I tokenizers:hf_tokenizer.cpp:127] Setting up pretokenizer...
terminate called after throwing an instance of 'std::runtime_error'
  what():  invert=true is not supported for Split PreTokenizer. Only invert=false is supported.
.ci/scripts/test_torchao_huggingface_checkpoints.sh: line 105: 69317 Aborted                 (core dumped) ./cmake-out/examples/models/llama/llama_main --model_path=$MODEL_OUT --tokenizer_path="${HF_MODEL_DIR}/tokenizer.json" --prompt="Once upon a time,"
    main()
  File "/home/ec2-user/actions-runner/_work/executorch/executorch/test-infra/.github/scripts/run_with_env_secrets.py", line 98, in main
    run_cmd_or_die(f"docker exec -t {container_name} /exec")
  File "/home/ec2-user/actions-runner/_work/executorch/executorch/test-infra/.github/scripts/run_with_env_secrets.py", line 39, in run_cmd_or_die
    raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
RuntimeError: Command docker exec -t 3b60b4fc3d885023ab8d38a531bbb1b66a33cd0d5cd87be4a787fbb5ecb4c86e /exec failed with exit code 134

The regression appears related to meta-pytorch/tokenizers#87, which added the invert check that throws this error.

To reproduce, check out #14074 (if not already landed) and run:

.ci/scripts/test_torchao_huggingface_checkpoints.sh phi_4_mini --test_with_runner

Versions

PyTorch version: 2.9.0.dev20250811
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 15.6.1 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.0.13.5)
CMake version: version 3.31.6
Libc version: N/A

Python version: 3.10.18 (main, Jun 5 2025, 08:37:47) [Clang 14.0.6 ] (64-bit runtime)
Python platform: macOS-15.6.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Apple M1 Pro

Versions of relevant libraries:
[pip3] executorch==0.8.0a0+1a7441f
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==24.4.26
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] mypy==1.14.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] pytorch_tokenizers==0.1.0
[pip3] torch==2.9.0.dev20250811
[pip3] torchao==0.14.0+gitf1acc1e2a
[pip3] torchaudio==2.8.0.dev20250811
[pip3] torchdata==0.11.0
[pip3] torchsr==1.0.4
[pip3] torchtune==0.6.1
[pip3] torchvision==0.24.0.dev20250811
[conda] executorch 0.8.0a0+1a7441f pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] pytorch-tokenizers 0.1.0 pypi_0 pypi
[conda] torch 2.9.0.dev20250811 pypi_0 pypi
[conda] torchao 0.14.0+gitf1acc1e2a pypi_0 pypi
[conda] torchaudio 2.8.0.dev20250811 pypi_0 pypi
[conda] torchdata 0.11.0 pypi_0 pypi
[conda] torchfix 0.6.0 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.6.1 pypi_0 pypi
[conda] torchvision 0.24.0.dev20250811 pypi_0 pypi
