Conversation

@metascroy
Contributor

CoreML Multifunction Model Experiment

This PR adds tooling to create and benchmark CoreML multifunction models that combine prefill and decode functions into a single model package.

Overview

CoreML multifunction models allow multiple functions (e.g., prefill and decode) to share weights within a single model package. This experiment evaluates:

  • Memory usage of multifunction models vs. individual models
  • Performance characteristics when switching between prefill and decode
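
As background, a multifunction .mlpackage exposes each function by name at load time, and all functions read from the same weight blobs. Below is a minimal sketch of loading two functions with coremltools (this assumes coremltools >= 8.0; the package, function, and input names are illustrative, not the ones produced by this tooling):

import coremltools as ct
import numpy as np

# Load each function from the same package; the weights are shared.
prefill = ct.models.MLModel("combined.mlpackage", function_name="prefill")
decode = ct.models.MLModel("combined.mlpackage", function_name="decode")

# Hypothetical token input for a static export (seqlen 32 for prefill, 1 for decode).
prefill_out = prefill.predict({"tokens": np.zeros((1, 32), dtype=np.int32)})
decode_out = decode.predict({"tokens": np.zeros((1, 1), dtype=np.int32)})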

Step 1: Export Static Models

First, export two PTE files with different sequence lengths using export_static_llm_coreml.py:

# Export prefill model (seqlen=32)
python export_static_llm_coreml.py \
    --checkpoint <path_to_checkpoint> \
    --params <path_to_params.json> \
    --seq_length 32 \
    --output model_32.pte

# Export decode model (seqlen=1)
python export_static_llm_coreml.py \
    --checkpoint <path_to_checkpoint> \
    --params <path_to_params.json> \
    --seq_length 1 \
    --output model_1.pte

Step 2: Create Multifunction Models

Use create_multifunctions.py to combine the prefill and decode models:

python create_multifunctions.py \
    --prefill_model $HOME/Desktop/model_32.pte \
    --decode_model $HOME/Desktop/model_1.pte \
    --output_dir $HOME/Desktop/mods

This will:

  1. Extract CoreML models from both PTE files
  2. Create multifunction packages combining prefill/decode for each model piece
  3. Output mod1.mlpackage, mod2.mlpackage, mod3.mlpackage
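
The packaging step corresponds roughly to coremltools' multifunction utilities. Here is a minimal sketch of combining one prefill piece and one decode piece (assuming coremltools >= 8.0; the file and function names are illustrative, and create_multifunctions.py may differ in detail):

import coremltools as ct

# Merge two single-function packages into one multifunction package.
# Identical weights are deduplicated inside the resulting .mlpackage.
desc = ct.utils.MultiFunctionDescriptor()
desc.add_function("prefill_piece.mlpackage", src_function_name="main", target_function_name="prefill")
desc.add_function("decode_piece.mlpackage", src_function_name="main", target_function_name="decode")
desc.default_function_name = "prefill"
ct.utils.save_multifunction(desc, "mod1.mlpackage")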

Optional: Pre-compile Models

Add the --compile flag to pre-compile the models to .mlmodelc format:

python create_multifunctions.py \
    --prefill_model $HOME/Desktop/model_32.pte \
    --decode_model $HOME/Desktop/model_1.pte \
    --output_dir $HOME/Desktop/mods \
    --compile

This outputs mod1.mlmodelc, mod2.mlmodelc, mod3.mlmodelc instead. Pre-compiled models skip the compilation step at runtime.
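
Pre-compilation can also be reproduced directly with coremltools, since loading an .mlpackage compiles it to a temporary .mlmodelc. A minimal sketch (assuming coremltools >= 7.0; the --compile flag may take a different path internally):

import shutil
import coremltools as ct

# Loading compiles the package; copy the compiled bundle out so the app
# can load the .mlmodelc directly and skip on-device compilation.
m = ct.models.MLModel("mod1.mlpackage")
shutil.copytree(m.get_compiled_model_path(), "mod1.mlmodelc")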

Step 3: Benchmark with CoreML Test

  1. Open the Benchmark app Xcode project at extension/benchmark/apple/Benchmark
  2. Drag and drop the .mlpackage or .mlmodelc files into the Resources folder
  3. Run the benchmark: Product → Test

Configuring the Benchmark

Edit CoreMLTests.mm to configure the benchmark behavior:

// Enable/disable decode function benchmarking
const BOOL kEnableDecode = YES;

// Enable/disable individual model pieces
const BOOL kEnableMod1 = YES;  // Embedding piece
const BOOL kEnableMod2 = YES;  // Transformer piece
const BOOL kEnableMod3 = YES;  // Output piece

Benchmark Output

The benchmark runs:

  • Prefill 1: 30 iterations × enabled models
  • Decode: 50 iterations × enabled models (if kEnableDecode = YES)
  • Prefill 2: 30 iterations × enabled models

Output example:

=== Benchmark Results ===
Prefill 1: 30 iterations x 3 models, total time: 1234.56 ms (41.15 ms/iter)
Decode: 50 iterations x 3 models, total time: 567.89 ms (11.36 ms/iter)
Prefill 2: 30 iterations x 3 models, total time: 1230.12 ms (41.00 ms/iter)
Total time (prefill 1 + decode + prefill 2): 3032.57 ms
=========================

Observations

Memory Usage

Multifunction models do not appear to use significantly more memory than individual models. The weights are shared between the prefill and decode functions, so memory overhead is minimal.

Model Piece Memory

The embedding piece (mod1) uses significantly more memory than the other pieces. This can be observed by toggling kEnableMod1 = NO and comparing memory usage:

// To isolate memory usage of mod2 and mod3:
const BOOL kEnableMod1 = NO;   // Disable embedding piece
const BOOL kEnableMod2 = YES;
const BOOL kEnableMod3 = YES;

This suggests the embedding table is a major contributor to overall memory footprint.
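
A back-of-envelope estimate supports this. With hypothetical Llama-style dimensions (the vocabulary size, hidden size, and dtype below are assumptions, not values from this PR), the embedding table alone is on the order of hundreds of MiB:

# Rough embedding-table size estimate; all numbers are illustrative assumptions.
vocab_size = 128_256      # e.g., a Llama-3-style tokenizer
hidden_dim = 2048         # e.g., a ~1B-parameter model
bytes_per_param = 2       # fp16 weights

embedding_bytes = vocab_size * hidden_dim * bytes_per_param
print(f"Embedding table: {embedding_bytes / 2**20:.0f} MiB")  # ~501 MiB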

@pytorch-bot

pytorch-bot bot commented Jan 9, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16514

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures, 1 Unrelated Failure

As of commit fcc943d with merge base 913436a:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 9, 2026
@github-actions

github-actions bot commented Jan 9, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@JacobSzwejbka
Contributor

And there are no cache coordination problems because the static cache is I/O?
