Make TorchTune Llama model KV cache compatible in eager #6643

jackzhxng · 2024-11-04T21:26:01Z

Summary

Sets up KV caches for the TorchTune Llama model.
Adds a separate runner for TorchTune Llama models, since their input handling differs enough from the existing runner to warrant a new copy.
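The input handling diverges mainly because of how tokens are fed once a KV cache is in place. The following is a minimal pure-Python sketch, not the runner from this PR. `generate`, `ToyModel`, and the `(tokens, input_pos)` call signature are hypothetical. Without a cache, every step re-feeds the whole sequence. With a cache, the prefill sees the full prompt, and each later step passes only the newest token plus its absolute position.

```python
from typing import Callable, List, Optional

# Hypothetical model signature: (tokens to process, input_pos) -> next token id.
# input_pos is None when no KV cache is used; otherwise it is the absolute
# position of the first token in `tokens`.
Model = Callable[[List[int], Optional[int]], int]


def generate(model: Model, prompt: List[int], max_new_tokens: int, use_kv_cache: bool) -> List[int]:
    if max_new_tokens <= 0:
        return []
    tokens = list(prompt)
    # Prefill: the first forward pass always sees the whole prompt.
    tokens.append(model(tokens, 0 if use_kv_cache else None))
    for _ in range(max_new_tokens - 1):
        if use_kv_cache:
            # Keys/values for earlier positions are cached, so feed only the
            # newest token together with its position in the sequence.
            next_tok = model([tokens[-1]], len(tokens) - 1)
        else:
            # No cache: re-run the model over the full sequence every step.
            next_tok = model(tokens, None)
        tokens.append(next_tok)
    return tokens[len(prompt):]


class ToyModel:
    """Hypothetical stand-in for a decoder: predicts (sum of all tokens so far) % 97.

    In cached mode it keeps one entry per position in `self.cache`, the way a
    KV cache keeps per-position keys/values, so later calls only need the
    newest token.
    """

    def __init__(self) -> None:
        self.cache: List[int] = []

    def __call__(self, toks: List[int], input_pos: Optional[int]) -> int:
        if input_pos is None:
            return sum(toks) % 97
        # Drop anything stored at or after input_pos, then write the new entries.
        del self.cache[input_pos:]
        self.cache.extend(toks)
        return sum(self.cache) % 97


if __name__ == "__main__":
    print(generate(ToyModel(), [1, 2, 3], 5, use_kv_cache=True))   # [6, 12, 24, 48, 96]
    print(generate(ToyModel(), [1, 2, 3], 5, use_kv_cache=False))  # [6, 12, 24, 48, 96]
```

Both paths yield the same tokens. Only the shape of what is passed to the model changes, and that difference is what a dedicated runner has to handle.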

Test plan

Download the checkpoint: tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct

Eager without KV cache:

python -m examples.models.llama3_2_vision.runner.eager --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth  --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model -d fp32 --verbose --prompt "What is the capital of USA?" --max_seq_length 64

Eager with KV cache:

python -m examples.models.llama3_2_vision.runner.eager --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth  --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model -d fp32 --verbose --prompt "What is the capital of USA?" --max_seq_length 64 -kv
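The two commands above differ only in `-kv`. With the same prompt they should produce the same completion, up to floating-point differences, because a KV cache only avoids recomputing keys and values for earlier positions. It does not change the attention result. The following toy single-head causal attention in pure Python illustrates this. `project`, `ToyKVCache`, and the fixed scale factors are assumptions standing in for learned projections. None of this is TorchTune's `KVCache`.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    # Scaled dot-product attention of one query over the given keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    w = softmax(scores)
    return [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]


def project(x: Vec, scale: float) -> Vec:
    # Toy stand-in for a learned W_q / W_k / W_v projection (assumption).
    return [scale * xi for xi in x]


def full_recompute(xs: List[Vec]) -> List[Vec]:
    """No cache: at step t, rebuild keys/values for positions 0..t from scratch."""
    outs = []
    for t in range(len(xs)):
        keys = [project(x, 0.5) for x in xs[: t + 1]]
        values = [project(x, 2.0) for x in xs[: t + 1]]
        outs.append(attend(project(xs[t], 1.0), keys, values))
    return outs


class ToyKVCache:
    def __init__(self) -> None:
        self.k: List[Vec] = []
        self.v: List[Vec] = []

    def update(self, k: Vec, v: Vec) -> None:
        # Append this position's key/value; earlier entries are reused as-is.
        self.k.append(k)
        self.v.append(v)


def cached_decode(xs: List[Vec]) -> List[Vec]:
    """With cache: compute one new key/value per step and attend over the cache."""
    cache = ToyKVCache()
    outs = []
    for x in xs:
        cache.update(project(x, 0.5), project(x, 2.0))
        outs.append(attend(project(x, 1.0), cache.k, cache.v))
    return outs


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.5, -1.0], [2.0, 1.0]]
    print(full_recompute(xs) == cached_decode(xs))  # True
```

The cached path does O(1) new projection work per step instead of O(t), which is the point of enabling `-kv` for generation.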

larryliu0820

Please add a test in a follow-up.

jackzhxng added 30 commits October 9, 2024 15:33

Changes to native runner to run tt

7f81e00

Add kwarg example inputs to eager model base

0b5a9a7

Create create new method for example kwarg inputs instead

a9647d2

Add kwarg example inputs to eager model base

fa3b1d2

Lint

e8715ba

Accept model type parameter in export_llama

a6f96a2

Remove future implementation

328c72c

Lint

ec80bba

Create create new method for example kwarg inputs instead

c9bbe12

Accept model type parameter in export_llama

99d5bfb

Torchtune llama3_2_vision model in ET, no quantization

1fb2236

Fix vision model example input

e0c4b8a

Lint

e145bd1

Kv cache

ed906cb

Merge branch 'main' into jz/tt-llama

6dd47e7

Update READMEs

1825972

Change model default arg

196499a

Update eager runner and eval llama

96ba40b

Merge branch 'jz/tt-llama-rebased' into jz/tt-llama-2

18a82e1

Fix tests

0f3035d

Merge branch 'jz/tt-llama-rebased' into jz/tt-llama-2

e677e14

Fix tests again

b1f6678

Merge branch 'jz/tt-llama-rebased' into jz/tt-llama-2

13d004b

Strict = True

c79b773

Things work

b8ff8e2

Merge branch 'jz/tt-llama-rebased' into jz/native-runner-tt

25ec7ce

Clip logits if torchtune

6e38763

Merge branch 'jz/tt-llama-2' into jz/native-runner-tt

7a7041d

Fix

96d5798

Kv cache by default is false

f275e2e

jackzhxng added 5 commits November 13, 2024 07:37

Merge branch 'jz/tt-llama-2' into jz/native-runner-tt

e1ec74c

Fixes

84422d9

Remove token count printing

1163769

Merge branch 'jz/native-runner-tt' into jz/tt-llama-kv-cache

a0e33d9

Fix faulty merge

aa289ea

jackzhxng changed the title from "Make TorchTune Llama model KV cache compatible" to "Make TorchTune Llama model KV cache compatible in eager" on Nov 13, 2024

jackzhxng added 4 commits November 13, 2024 14:07

Add runner

eeeeb8a

Remove has_full_logits from llama runner

c80ce1c

Lint

9bd405f

Modularize and update base eager runner

7507002

jackzhxng marked this pull request as ready for review November 13, 2024 22:54

jackzhxng added 2 commits November 14, 2024 09:45

Move to subdir

e5428de

Merge branch 'jz/tt-llama-2' into jz/native-runner-tt

eefadaa

jackzhxng mentioned this pull request Nov 14, 2024

Export TorchTune llama3_2_vision in ET #5911

Merged

jackzhxng added 4 commits November 14, 2024 10:34

Merge remote-tracking branch 'origin/main' into jz/tt-llama-2

bf33485

Merge branch 'jz/tt-llama-2' into jz/native-runner-tt

9c5647c

Tarun rev

f61a347

Merge branch 'jz/native-runner-tt' into jz/tt-llama-kv-cache

a36703e

larryliu0820 approved these changes Nov 14, 2024


jackzhxng added 4 commits November 14, 2024 12:49

Add automatically generated export tests

7a0101f

Fix internal pyre warning

9777e23

Merge branch 'jz/tt-llama-2' into jz/native-runner-tt

2b9f281

Merge branch 'jz/native-runner-tt' into jz/tt-llama-kv-cache

f504cc5

Base automatically changed from jz/native-runner-tt to main November 14, 2024 22:04

jackzhxng added 4 commits November 14, 2024 16:13

Add executorch runner

1e26f60

Merge remote-tracking branch 'origin/main' into jz/tt-llama-kv-cache

b74e2c3

Fix test

f8f8f06

Lint

09e9675

jackzhxng merged commit 7b76f0f into main Nov 15, 2024
39 checks passed

jackzhxng deleted the jz/tt-llama-kv-cache branch November 15, 2024 03:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Make TorchTune Llama model KV cache compatible in eager #6643

Make TorchTune Llama model KV cache compatible in eager #6643

Uh oh!

jackzhxng commented Nov 4, 2024 •

edited

Loading

Uh oh!

larryliu0820 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Make TorchTune Llama model KV cache compatible in eager #6643

Make TorchTune Llama model KV cache compatible in eager #6643

Uh oh!

Conversation

jackzhxng commented Nov 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

larryliu0820 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jackzhxng commented Nov 4, 2024 •

edited

Loading