- # TinyLlama Example Documentation
+ # TinyLlama Example

This document provides a step‑by‑step guide for generating and processing a text generation model.

@@ -32,34 +32,37 @@ This document provides a step‑by‑step guide for generating and processing a

## Generating Model Files

- Run the provided scripts to create the prefill and decode Circle model files:
+ 1. Run the provided scripts to create the prefill and decode Circle model files:

```bash
- python prefill.py # Generates tinyllama.prefill.circle
- python decode.py # Generates tinyllama.decode.circle
+ python prefill.py # Generates prefill.circle
+ python decode.py # Generates decode_.circle
```

You can verify the generated files:

```bash
ls -lh *.circle
# Expected output:
- # -rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 tinyllama.decode.circle
- # -rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 tinyllama.prefill.circle
+ # -rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 decode_.circle
+ # -rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 prefill.circle
```
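
As a scripted alternative to reading the `ls` listing by eye, the sketch below checks that each expected file exists and is non-empty. `check_circles` is a hypothetical helper, not part of the repository, and the file names assume the updated names introduced by this change (`prefill.circle`, `decode_.circle`).

```bash
# Hypothetical helper: succeed only if every given file exists and is non-empty.
check_circles() {
  status=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# File names assumed from the updated prefill.py / decode.py comments.
check_circles prefill.circle decode_.circle \
  || echo "regenerate the missing files before continuing" >&2
```

The `-s` test is used instead of `-e` so that a zero-byte file left behind by a failed run is also reported.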

- ## Full Processing Pipeline
+ 2. Update tinyllama.decode.circle

- The following pipeline shows how to chain several tools to transform the model:
+ Add [tools/o2o](https://github.com/Samsung/ONE/pull/16233) to PATH.

```bash
- with.py tinyllama.decode.circle |
- fuse.attention.py \
- fuse.bmm_lhs_const.py | reshape.fc_weight.py | \
+ export PATH=../../../../tools/o2o:$PATH
+ ```
+
+ Then, run the following:
+
+ ```bash
+ fuse.attention.py < decode_.circle | \
+ fuse.bmm_lhs_const.py | \
reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4] | \
transpose.io.kvcache.py | \
- remove.io.py output --keep_by_id 0 | \
- select.op.py --by_id 0-181 | \
gc.py | \
retype.input_ids.py > decode.circle
```
@@ -68,17 +71,28 @@ retype.input_ids.py > decode.circle

| Tool | Purpose |
|------|---------|
- | `with.py` | Reads the Circle model from stdin and writes it to stdout. |
| `fuse.attention.py` | Fuses attention‑related operators for optimization. |
| `fuse.bmm_lhs_const.py` | Fuses constant left‑hand side matrices in batch matrix multiplication. |
- | `reshape.fc_weight.py` | Reshapes fully‑connected layer weights. |
| `reshape.io.py input --by_shape [...]` | Reshapes input tensors to the specified shapes. |
| `transpose.io.kvcache.py` | Transposes the KV‑cache tensors. |
- | `remove.io.py output --keep_by_id 0` | Keeps only the output tensor with ID 0, removing the rest. |
- | `select.op.py --by_id 0-181` | Selects operators with IDs from 0 to 181. |
| `gc.py` | Performs garbage collection, removing unused tensors and operators. |
| `retype.input_ids.py` | Changes the data type of input IDs as needed. |
| `> decode.circle` | Saves the final processed model to `decode.circle`. |


Feel free to adjust the pipeline arguments (e.g., shapes, IDs) to suit your specific model configuration.
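
Every stage of the pipeline is looked up on `PATH`, and the shell creates `decode.circle` through the `>` redirect even when an earlier stage cannot be found. It can therefore help to confirm that the tools resolve before running anything. A minimal sketch, assuming a POSIX shell: `check_tools` is a hypothetical helper, and the tool names are the ones that remain in the pipeline after this change.

```bash
# Hypothetical helper: report any command that does not resolve on PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not on PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tool names taken from the pipeline in this document.
check_tools fuse.attention.py fuse.bmm_lhs_const.py reshape.io.py \
  transpose.io.kvcache.py gc.py retype.input_ids.py \
  || echo "add tools/o2o to PATH before running the pipeline" >&2
```

`command -v` only reports files that are executable, so a script that is present but lacks the execute bit is listed as missing as well.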
+
+
+ 3. Merge prefill and decode circle into 1 circle
+
+ ```bash
+ merge.circles.py prefill.circle decode.circle > tinyllama.circle
+ ```
+
+ ```
+ ls -lh *.circle
+ -rw-rw-r-- 1 gyu gyu 18M Nov 21 17:43 decode.circle
+ -rw-rw-r-- 1 gyu gyu 18M Nov 21 17:43 decode_.circle
+ -rw-rw-r-- 1 gyu gyu 18M Nov 18 17:35 prefill.circle
+ -rw-rw-r-- 1 gyu gyu 18M Nov 21 17:43 tinyllama.circle
+ ```