# TinyLlama Text Generation Example

This document provides a step-by-step guide for generating and processing a TinyLlama text-generation model.

## Summary

1. Set up the environment and install dependencies.
2. Generate the initial `prefill` and `decode` Circle model files.
3. Run the pipeline to fuse, reshape, and merge the models, producing a final `model.circle` ready for inference.

## Prerequisites

### 1. Python virtual environment
```bash
cd runtime/ggma/examples/generate_text/
python3 -m venv _
source _/bin/activate
```

### 2. Install required Python packages
```bash
pip install -r requirements.txt
```

### 3. Install TICO (Torch IR to Circle ONE)
```bash
# Clone the repository
git clone https://github.com/Samsung/TICO.git
# Install it in editable mode
pip install -e TICO
```
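To confirm the editable install is importable, a quick check (this assumes the package is exposed as the module `tico`; adjust the name if your checkout differs):

```bash
# Print where Python resolves the package from; with an editable install
# this should point into the cloned TICO directory.
python -c "import tico; print(tico.__file__)"
```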

### 4. Get [o2o](https://github.com/Samsung/ONE/pull/16233) in PATH
*Requires the GitHub CLI (`gh`).*
```bash
gh pr checkout 16233
export PATH=../../../../tools/o2o:$PATH
```
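Before moving on, it can help to verify that the o2o scripts used later in this guide actually resolve through the updated `PATH` (the tool names below are taken from the pipeline steps that follow):

```bash
# Report any o2o script that is not reachable on PATH.
for tool in fuse.attention.py fuse.bmm_lhs_const.py reshape.io.py \
            transpose.io.kvcache.py merge.circles.py; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

If anything is reported missing, re-check the `export PATH=...` step above.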

## Generating Model Files

### 1. Create the prefill and decode Circle model files
```bash
python prefill.py  # Generates prefill.circle
python decode.py   # Generates decode_.circle
```

Verify the generated files:
```bash
ls -lh *.circle
# -rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 decode_.circle
# -rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 prefill.circle
```
### 2. Transform `decode_.circle` into `decode.circle`
Fuse attention and normalize the KV-cache inputs for the decode model.

```bash
# Fuse attention and reshape KV-cache for the decode model
fuse.attention.py < decode_.circle \
  | fuse.bmm_lhs_const.py \
  | reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4] \
  | transpose.io.kvcache.py > decode.circle
```
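Each o2o script in the pipeline above reads a Circle model on stdin and writes the transformed model to stdout, so when debugging it can be run one stage at a time, keeping each intermediate for inspection (the `stepN.circle` file names here are only illustrative):

```bash
# Same stages as the pipeline above, with intermediates kept on disk.
fuse.attention.py < decode_.circle > step1.circle
fuse.bmm_lhs_const.py < step1.circle > step2.circle
reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4] < step2.circle > step3.circle
transpose.io.kvcache.py < step3.circle > decode.circle
ls -lh step*.circle decode.circle  # inspect each intermediate
```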

### 3. Merge prefill and decode circles
Merge the models, retype input IDs, and clean up.

```bash
merge.circles.py prefill.circle decode.circle \
  | downcast.input_ids.py \
  | gc.py > model.circle
```

Verify the final model files:
```bash
ls -l {decode,prefill,model}.circle
# -rw-rw-r-- 1 gyu gyu 18594868 Nov 22 17:26 decode.circle
# -rw-rw-r-- 1 gyu gyu 18642052 Nov 22 07:53 prefill.circle
# -rw-rw-r-- 1 gyu gyu 18629520 Nov 22 17:28 model.circle
```

## Create a GGMA package

1. Create the package root directory and move `model.circle` there:
```bash
cd runtime/ggma/examples/generate_text
mkdir tinyllama
mv model.circle tinyllama/
```

2. Copy the tokenizer files (replace `{your_snapshot}` with the actual snapshot hash):
```bash
cp -L ~/.cache/huggingface/hub/models--Maykeye--TinyLLama-v0/snapshots/{your_snapshot}/tokenizer.* tinyllama/
cp -L ~/.cache/huggingface/hub/models--Maykeye--TinyLLama-v0/snapshots/{your_snapshot}/config.json tinyllama/
```
The package should now contain:
```bash
tree tinyllama/
tinyllama/
├── config.json
├── model.circle
├── tokenizer.json
└── tokenizer.model
```
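As a quick sanity check before running, the package contents can be verified against the file list assembled above (this loop is just a convenience sketch, not part of the official workflow):

```bash
# Report any expected package file that is missing.
for f in model.circle tokenizer.json tokenizer.model config.json; do
  [ -f "tinyllama/$f" ] || echo "missing: tinyllama/$f"
done
```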

## Build and run `ggma_run`

```bash
make -j$(nproc)
make install
```

Check the version:
```bash
Product/out/bin/ggma_run --version
# ggma_run v0.1.0 (nnfw runtime: v1.31.0)
```

Run the model:
```bash
Product/out/bin/ggma_run tinyllama
# prompt: Lily picked up a flower.
# generated: { 1100, 7899, 289, 826, 351, 600, 2439, 288, 266, 3653, 31843, 1100, 7899, 289, 1261, 291, 5869, 291, 1261, 31843, 1100, 7899 }
# detokenized: She liked to play with her friends in the park. She liked to run and jump and run. She liked
```