# TinyLlama Text Generation Developer Guide

This document provides a detailed technical guide for generating, processing, and optimizing the TinyLlama text-generation model. For basic usage, see [USER.md](USER.md).

## Summary

1. Set up the environment and install dependencies.
2. Generate the initial `prefill` and `decode` Circle model files.
3. Run the pipeline to optimize, reshape, and merge the models, producing a final `model.circle` ready for inference.
4. Package the model with its tokenizer files and run it with `ggma_run`.

## Prerequisites

### 1. Python virtual environment
```bash
$ cd runtime/ggma/examples/generate_text/
$ python3 -m venv _
$ source _/bin/activate
```

### 2. Prepare [gyu](tools/gyu/README.md) and o2o tools
Install the dependencies and set up the `o2o` tools (similar to what `tools/gyu/init.py` does).

> **Note**: We install the CPU build of `torch` first because `gyu` depends on `TICO`, which by default pulls in the large NVIDIA build of `torch`. Installing the CPU build beforehand prevents this.

```bash
# 1. Install torch (CPU) and gyu requirements
$ pip install torch --index-url https://download.pytorch.org/whl/cpu
$ pip install -r tools/gyu/requirements.txt

# 2. Fetch o2o tools from PR #16233
$ git fetch origin pull/16233/head:pr-16233
$ git checkout pr-16233 -- tools/o2o
$ chmod +x tools/o2o/*.py

# 3. Add tools to PATH
$ export PATH=$PWD/tools/o2o:$PWD/tools/gyu:$PATH
```

## Generating Model Files

### 1. Install model dependencies
```bash
$ pip install -r tinyllama/tinyllama.requirements
```

### 2. Create the prefill and decode Circle model files
```bash
$ python tinyllama/tinyllama.py --mode prefill  # Generates prefill.circle
$ python tinyllama/tinyllama.py --mode decode   # Generates decode_.circle
```

Verify the generated files:
```bash
$ ls -lh *.circle
-rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 decode_.circle
-rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 prefill.circle
```

### 3. Create `decode.circle`
Fuse attention and normalize the KV-cache inputs of the decode model.

```bash
$ fuse.attention.py < decode_.circle \
  | reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4] \
  | transpose.io.kvcache.py > decode.circle
```
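The two `--by_shape` arguments above differ only in the third dimension (30 vs. 32). As a rough illustration, assuming the KV-cache tensors are laid out as `[batch, heads, seq_len, head_dim]` (an assumption for clarity, not something the tool guarantees), moving between the two shapes corresponds to growing the cache along the sequence axis:

```python
import numpy as np

# Assumed layout: [batch, heads, seq_len, head_dim]; a cache sized for
# 30 past positions...
past_kv = np.zeros((1, 16, 30, 4), dtype=np.float32)

# ...grows along the sequence axis (axis=2) to hold 32 positions, e.g. to
# make room for newly decoded tokens.
new_kv = np.zeros((1, 16, 2, 4), dtype=np.float32)
grown_kv = np.concatenate([past_kv, new_kv], axis=2)

print(grown_kv.shape)  # (1, 16, 32, 4)
```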

### 4. Merge prefill and decode circles
Merge the models, retype the input IDs, and clean up.

```bash
$ merge.circles.py prefill.circle decode.circle \
  | fuse.bmm_lhs_const.py \
  | downcast.input_ids.py \
  | gc.py > model.circle
```
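`downcast.input_ids.py` retypes the token-ID input. Here is a minimal sketch of the idea, assuming the retype is an int64-to-int32 downcast (the actual source and target types are defined by the tool itself):

```python
import numpy as np

# Token IDs traced from PyTorch are typically int64; storing them as int32
# halves the input size and is lossless while every ID fits in 32 bits.
ids64 = np.array([[1, 1100, 7899, 289]], dtype=np.int64)

assert ids64.max() < 2**31  # precondition for a safe downcast
ids32 = ids64.astype(np.int32)

print(ids32.dtype)  # int32
```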

Verify the final model files:
```bash
$ ls -l {decode,prefill,model}.circle
-rw-rw-r-- 1 gyu gyu 18594868 Nov 22 17:26 decode.circle
-rw-rw-r-- 1 gyu gyu 18642052 Nov 22 07:53 prefill.circle
-rw-rw-r-- 1 gyu gyu 18629520 Nov 22 17:28 model.circle
```

## Create a GGMA package

1. Create the package root directory (if it does not exist yet) and move `model.circle` there:
```bash
$ cd runtime/ggma/examples/generate_text
$ mkdir -p tinyllama
$ mv model.circle tinyllama/
```

2. Copy the tokenizer files and model config (replace `{your_snapshot}` with the actual snapshot hash):
```bash
$ cp -L ~/.cache/huggingface/hub/models--Maykeye--TinyLLama-v0/snapshots/{your_snapshot}/tokenizer.* tinyllama/
$ cp -L ~/.cache/huggingface/hub/models--Maykeye--TinyLLama-v0/snapshots/{your_snapshot}/config.json tinyllama/
```
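If you'd rather not look up the snapshot hash by hand, a small helper can select the most recently modified snapshot directory. This is a sketch assuming the standard Hugging Face cache layout (`hub/<repo>/snapshots/<hash>/`); `latest_snapshot` is a hypothetical helper, not part of the tooling here:

```python
from pathlib import Path

def latest_snapshot(repo_dir: Path) -> Path:
    """Return the most recently modified snapshot directory of a cached repo."""
    snapshots = sorted(
        (repo_dir / "snapshots").iterdir(),
        key=lambda p: p.stat().st_mtime,
    )
    return snapshots[-1]

repo = Path.home() / ".cache/huggingface/hub/models--Maykeye--TinyLLama-v0"
if (repo / "snapshots").is_dir():
    print(latest_snapshot(repo))
```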

```bash
$ tree tinyllama/
tinyllama/
├── config.json
├── model.circle
├── tokenizer.json
└── tokenizer.model
```

## Build and run `ggma_run`

```bash
$ make -j$(nproc)
$ make install
```

Check the version:
```bash
$ Product/out/bin/ggma_run --version
ggma_run v0.1.0 (nnfw runtime: v1.31.0)
```

Run the model:
```bash
$ Product/out/bin/ggma_run tinyllama
prompt: Lily picked up a flower.
generated: { 1100, 7899, 289, 826, 351, 600, 2439, 288, 266, 3653, 31843, 1100, 7899, 289, 1261, 291, 5869, 291, 1261, 31843, 1100, 7899 }
detokenized: She liked to play with her friends in the park. She liked to run and jump and run. She liked
```