Contributor

@jackzhxng jackzhxng commented Nov 4, 2024

Summary

  • Sets up KV caches for the TorchTune Llama model
  • Adds a separate runner for TorchTune Llama models, since their input handling differs enough from the existing runner to warrant a separate copy

Test plan

Download checkpoint:

tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct

Eager without KV cache:

python -m examples.models.llama3_2_vision.runner.eager --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth  --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model -d fp32 --verbose --prompt "What is the capital of USA?" --max_seq_length 64
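The --metadata value in the command above is a JSON object passed as a single quoted string. A quick way to sanity-check such a string before putting it on the command line is to parse it with the standard library. This is only a sketch: the variable names are illustrative, and it makes no assumption about how the runner itself consumes these fields.

```python
import json

# The exact string passed to --metadata in the test plan.
metadata_arg = (
    '{"append_eos_to_prompt": 0, "get_bos_id":128000, '
    '"get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}'
)

# json.loads raises json.JSONDecodeError if the string is malformed,
# for example because of shell quoting mistakes.
metadata = json.loads(metadata_arg)

for key, value in sorted(metadata.items()):
    print(f"{key}: {value!r}")
```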

Eager with KV cache:

python -m examples.models.llama3_2_vision.runner.eager --model llama3_2_vision --checkpoint /tmp/Llama-3.2-11B-Vision-Instruct/original/consolidated.pth  --params examples/models/llama3_2_vision/text_decoder/params/demo_config.json --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}' --output_name="llama3_2_vision.pte" --tokenizer /tmp/Llama-3.2-11B-Vision-Instruct/original/tokenizer.model -d fp32 --verbose --prompt "What is the capital of USA?" --max_seq_length 64 -kv
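The two commands above differ only in the trailing -kv flag. As a rough illustration of what a KV cache changes during token-by-token generation, here is a toy single-head attention loop in plain Python. It is not the runner's or TorchTune's implementation: `project`, `KVCache`, and both step functions are hypothetical names, and the "projections" are fixed scalings standing in for learned weights.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def project(token_vec, factor):
    """Hypothetical stand-in for a learned key or value projection."""
    return [factor * x for x in token_vec]


def step_without_cache(tokens):
    """Recompute keys and values for the whole sequence at every step."""
    keys = [project(t, 0.5) for t in tokens]
    values = [project(t, 2.0) for t in tokens]
    return attend(tokens[-1], keys, values)


class KVCache:
    """Keeps the keys and values of all previously seen tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, token_vec):
        # Only the new token is projected; earlier entries are reused.
        self.keys.append(project(token_vec, 0.5))
        self.values.append(project(token_vec, 2.0))
        return self.keys, self.values


def step_with_cache(cache, new_token):
    """Process only the newest token, attending over the cached history."""
    keys, values = cache.update(new_token)
    return attend(new_token, keys, values)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
for i, tok in enumerate(tokens):
    full = step_without_cache(tokens[: i + 1])
    cached = step_with_cache(cache, tok)
    # Both paths should produce the same attention output at every step.
    assert all(math.isclose(a, b) for a, b in zip(full, cached))
```

Without the cache, step i projects all i tokens again; with it, each step projects only the newest token and reuses the stored keys and values. Both paths should therefore produce the same outputs while doing different amounts of work.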

@jackzhxng jackzhxng changed the title from "Make TorchTune Llama model KV cache compatible" to "Make TorchTune Llama model KV cache compatible in eager" Nov 13, 2024
@jackzhxng jackzhxng marked this pull request as ready for review November 13, 2024 22:54
Contributor

@larryliu0820 larryliu0820 left a comment

Please add a test in a follow-up.

Base automatically changed from jz/native-runner-tt to main November 14, 2024 22:04
@jackzhxng jackzhxng merged commit 7b76f0f into main Nov 15, 2024
39 checks passed
@jackzhxng jackzhxng deleted the jz/tt-llama-kv-cache branch November 15, 2024 03:55
Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

4 participants