@metascroy (Contributor) commented on Jan 13, 2025

This is a version of llama that delegates to ANE and handles a fixed sequence length (e.g., for prefill) with input_pos >= 0. To export the model, run model_export_script.sh, after modifying these variables in the script:

export MODEL_IN=$HOME/models/stories110M/stories110M.pt
export TOKENIZER=$HOME/models/stories110M/tokenizer.bin
export PARAMS=$HOME/models/stories110M/params.json
export MODEL_OUT_DIR=$HOME/models/stories110M
export STATIC_SEQ_LENGTH=500  

Set these to appropriate values for the model you are exporting. If STATIC_SEQ_LENGTH=1, the model can be used for decoding; if STATIC_SEQ_LENGTH > 1, it can be used for static prefill.

Note that although the arg enable_dynamic_shape is set, dynamic_shapes is overridden to None during export. We set enable_dynamic_shape because that is how llama_transformer.py handles seq_length > 1.
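
For reference, here is a minimal sketch of what this looks like at the export call site (the names model, tokens, and input_pos are illustrative placeholders, not the exact code in this PR):

import torch

# enable_dynamic_shape stays set in the model args so that
# llama_transformer.py takes its seq_length > 1 path, but the export
# itself uses fully static shapes.
exported_program = torch.export.export(
    model,                  # hypothetical eager llama module
    (tokens, input_pos),    # tokens: (1, STATIC_SEQ_LENGTH), input_pos: (1,)
    dynamic_shapes=None,    # overridden to None: no symbolic dims reach CoreML
)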

To avoid issues with CoreML delegation, we do the following (a sketch of this skip logic follows the list):

  • Run SDPA/KV-cache ops outside of CoreML by skipping them in the partitioner. The ops required to update the KV-cache are not supported on ANE.
  • We also skip any node that is a symbolic int or has a symbolic-int arg, since symbolic ints are not supported by CoreML.
  • Finally, we skip the embedding op because CoreML converts the token to uint16 when running on ANE, which will not work with llama3-type models.
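
A minimal sketch of the skip logic described above (the op names and the symbolic-int check are assumptions based on this description, not the authoritative code from this PR):

import torch
from executorch.backends.apple.coreml.partition import CoreMLPartitioner

def _is_symbolic_int(node: torch.fx.Node) -> bool:
    # A node whose value is a SymInt cannot be lowered by CoreML.
    return isinstance(node.meta.get("val"), torch.SymInt)

def should_skip(node: torch.fx.Node) -> bool:
    # Skip nodes that produce a symbolic int or take one as an arg.
    return _is_symbolic_int(node) or any(
        isinstance(arg, torch.fx.Node) and _is_symbolic_int(arg)
        for arg in node.args
    )

# Keep SDPA, KV-cache updates, and the embedding op out of the CoreML
# partition; this op list is illustrative.
partitioner = CoreMLPartitioner(
    skip_ops_for_coreml_delegation=[
        "aten.scaled_dot_product_attention.default",
        "aten.index_put_.default",   # KV-cache update, not supported on ANE
        "aten.embedding.default",    # ANE casts token ids to uint16
    ],
)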

For stories110M, I get:

  • 163 tokens/sec decode on iPhone 15 Pro
  • 56 ms for 500-token prefill

(I get a segfault in the SDPA op when STATIC_SEQ_LENGTH >= 512; this requires investigation.)

Note: you will encounter an error during CoreML conversion when running model_export_script.sh. You must first make the change below so that coremltools can handle negative infinity in sympy numbers.

Update the function _map_sympy_number_to_int in coremltools/converters/mil/frontend/torch/exir_utils.py as follows:

def _map_sympy_number_to_int(sympy_number: sympy.core.numbers.Number) -> int:
    # Clamp sympy values (including +/- sympy.oo) to the signed 32-bit range.
    MAX_DIM = 2**31 - 1
    MIN_DIM = -2**31
    if sympy_number == sympy.oo or sympy_number > MAX_DIM:
        return MAX_DIM
    elif sympy_number == -sympy.oo or sympy_number < MIN_DIM:
        return MIN_DIM
    else:
        return int(sympy_number)
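
As a quick sanity check of the patched behavior (assuming the function above is in scope):

import sympy

assert _map_sympy_number_to_int(sympy.oo) == 2**31 - 1      # +inf clamps to INT32_MAX
assert _map_sympy_number_to_int(-sympy.oo) == -(2**31)      # -inf clamps to INT32_MIN
assert _map_sympy_number_to_int(sympy.Integer(500)) == 500  # in-range values pass through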

pytorch-bot bot commented on Jan 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7616

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit d0e196b with merge base 3f9324c:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label on Jan 13, 2025
github-actions bot commented
This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

metascroy changed the title from "coreml llama" to "[not for landing] coreml llama" on Jan 13, 2025
metascroy closed this on May 29, 2025