
Commit f78430e

Sanggyu Lee authored and committed
Update document
1 parent 1343862 commit f78430e

File tree

4 files changed: +126 -86 lines changed
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# TinyLlama Text Generation Example

This document provides a step-by-step guide for generating and processing a TinyLlama text-generation model.

## Summary

1. Set up the environment and install dependencies.
2. Generate the initial `prefill` and `decode` Circle model files.
3. Run the o2o pipeline to fuse, reshape, and merge the models, producing a final `model.circle` ready for inference.

## Prerequisites

### 1. Python virtual environment
```bash
cd runtime/ggma/examples/generate_text/
python3 -m venv _
source _/bin/activate
```

### 2. Install required Python packages
```bash
pip install -r requirements.txt
```

### 3. Install TICO (Torch IR to Circle ONE)
```bash
# Clone the repository
git clone https://github.com/Samsung/TICO.git
# Install it in editable mode
pip install -e TICO
```

### 4. Get [o2o](https://github.com/Samsung/ONE/pull/16233) in PATH
*Requires the GitHub CLI (`gh`).*
```bash
gh pr checkout 16233
# Run from runtime/ggma/examples/generate_text/ so that the relative path
# below resolves to the repository's tools/o2o directory
export PATH=../../../../tools/o2o:$PATH
```

## Generating Model Files

### 1. Create the prefill and decode Circle model files
```bash
python prefill.py  # Generates prefill.circle
python decode.py   # Generates decode_.circle
```

Verify the generated files:
```bash
ls -lh *.circle
# -rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 decode_.circle
# -rw-rw-r-- 1 gyu gyu 18M Nov 14 14:09 prefill.circle
```

### 2. Create `decode.circle` from `decode_.circle`
Fuse attention and normalize the KV-cache inputs of the decode model.

```bash
# Fuse attention and reshape the KV-cache for the decode model
fuse.attention.py < decode_.circle \
  | fuse.bmm_lhs_const.py \
  | reshape.io.py input --by_shape [1,16,30,4] [1,16,32,4] \
  | transpose.io.kvcache.py > decode.circle
```
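The two `--by_shape` arguments pin the decode model's KV-cache inputs to static shapes. Assuming the conventional (batch, num_heads, seq_len, head_dim) cache layout (an assumption about these tensors, not something the o2o tools state), a small numpy sketch of the tensors involved and of the kind of axis reordering a tool like `transpose.io.kvcache.py` might apply:

```python
# Illustration only: the o2o tools operate on Circle files; this numpy sketch
# just shows the assumed (batch, num_heads, seq_len, head_dim) KV-cache layout
# behind the two --by_shape arguments.
import numpy as np

batch, num_heads, head_dim = 1, 16, 4

k_in = np.zeros((batch, num_heads, 30, head_dim), dtype=np.float32)   # cache fed into decode
k_out = np.zeros((batch, num_heads, 32, head_dim), dtype=np.float32)  # cache coming out

# One plausible IO transpose: move seq_len next to batch. Whether
# transpose.io.kvcache.py does exactly this permutation is an assumption.
k_in_t = k_in.transpose(0, 2, 1, 3)

print(k_in.shape, k_out.shape, k_in_t.shape)
# (1, 16, 30, 4) (1, 16, 32, 4) (1, 30, 16, 4)
```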

### 3. Merge prefill and decode circles
Merge the models, retype the input IDs, and clean up.

```bash
merge.circles.py prefill.circle decode.circle \
  | downcast.input_ids.py \
  | gc.py > model.circle
```
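By its name, `downcast.input_ids.py` presumably narrows the token-ID input from int64 (PyTorch's default integer dtype) to a smaller type such as int32; the tool's actual behavior is an assumption here. A minimal numpy sketch of that retyping, with a range check:

```python
# Hypothetical sketch of input-ID downcasting; not the actual
# downcast.input_ids.py implementation.
import numpy as np

def downcast_input_ids(ids: np.ndarray) -> np.ndarray:
    """Narrow int64 token IDs to int32, refusing values that would overflow."""
    info = np.iinfo(np.int32)
    if ids.max() > info.max or ids.min() < info.min:
        raise ValueError("token IDs exceed int32 range")
    return ids.astype(np.int32)

ids = np.array([[1100, 7899, 289, 826]], dtype=np.int64)  # IDs like those ggma_run prints
print(downcast_input_ids(ids).dtype)  # int32
```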

Verify the final model files:
```bash
ls -l {decode,prefill,model}.circle
# -rw-rw-r-- 1 gyu gyu 18594868 Nov 22 17:26 decode.circle
# -rw-rw-r-- 1 gyu gyu 18642052 Nov 22 07:53 prefill.circle
# -rw-rw-r-- 1 gyu gyu 18629520 Nov 22 17:28 model.circle
```
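As a rough mental model of the merge-and-gc step (an assumption about intent, not the tools' actual Circle-level algorithm): merging combines the two graphs while storing identical constant tensors once, and garbage collection drops tensors nothing references. Sketched on toy dict-based "models":

```python
# Toy mental model of merge.circles.py + gc.py on models represented as
# {tensor_name: bytes}; NOT the actual Circle file semantics.
def merge(a: dict, b: dict) -> dict:
    merged = dict(a)
    for name, data in b.items():
        if merged.get(name, data) != data:
            raise ValueError(f"conflicting tensor: {name}")
        merged[name] = data  # identical tensors end up stored once
    return merged

def gc(model: dict, reachable: set) -> dict:
    """Drop every tensor not referenced by any remaining graph."""
    return {k: v for k, v in model.items() if k in reachable}

prefill_t = {"embed": b"W", "lm_head": b"H", "scratch": b"S"}
decode_t = {"embed": b"W", "kv_proj": b"K"}
final = gc(merge(prefill_t, decode_t), {"embed", "lm_head", "kv_proj"})
print(sorted(final))  # ['embed', 'kv_proj', 'lm_head']
```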

## Create a GGMA package

1. Create the package root directory and move `model.circle` there:
```bash
cd runtime/ggma/examples/generate_text
mkdir tinyllama
mv model.circle tinyllama/
```

2. Copy the tokenizer files (replace `{your_snapshot}` with the actual snapshot hash):
```bash
cp -L ~/.cache/huggingface/hub/models--Maykeye--TinyLLama-v0/snapshots/{your_snapshot}/tokenizer.* tinyllama/
```

The package should now look like this:
```bash
tree tinyllama/
tinyllama/
├── model.circle
├── tokenizer.json
└── tokenizer.model
```
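The two packaging steps above can also be scripted; a minimal pathlib/shutil sketch, assuming `model.circle` and a tokenizer snapshot directory already exist (the `build_package` helper is illustrative, not part of the repo):

```python
# Hypothetical packaging helper mirroring the mkdir/mv/cp -L steps above.
import shutil
from pathlib import Path

def build_package(model: Path, tokenizer_dir: Path, out: Path) -> None:
    """Assemble a GGMA package: model.circle plus the tokenizer files."""
    out.mkdir(exist_ok=True)
    shutil.move(str(model), str(out / "model.circle"))
    for tok in tokenizer_dir.glob("tokenizer.*"):
        shutil.copy(str(tok), str(out / tok.name))  # copy follows symlinks, like cp -L
```

Called as `build_package(Path("model.circle"), snapshot_dir, Path("tinyllama"))`, it reproduces the layout shown by `tree` above.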

## Build and run `ggma_run`

```bash
make -j$(nproc)
make install
```

Check the version:
```bash
Product/out/bin/ggma_run --version
# ggma_run v0.1.0 (nnfw runtime: v1.31.0)
```

Run the model:
```bash
Product/out/bin/ggma_run tinyllama
# prompt: Lily picked up a flower.
# generated: { 1100, 7899, 289, 826, 351, 600, 2439, 288, 266, 3653, 31843, 1100, 7899, 289, 1261, 291, 5869, 291, 1261, 31843, 1100, 7899 }
# detokenized: She liked to play with her friends in the park. She liked to run and jump and run. She liked
```
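The merged model packages the standard two-phase generation loop: one prefill pass over the whole prompt, then one decode step per generated token against the KV-cache. A toy Python sketch of that control flow (the `prefill` and `decode` functions here are stand-ins, not the ggma_run or nnfw API):

```python
# Toy illustration of the prefill/decode split; the model functions are fakes.
from typing import List, Tuple

Cache = List[int]  # stand-in for the real KV-cache tensors

def prefill(prompt_ids: List[int]) -> Tuple[int, Cache]:
    """Process the whole prompt in one pass; return first new token + cache."""
    cache = list(prompt_ids)
    return max(prompt_ids) % 100, cache          # fake "logits argmax"

def decode(token: int, cache: Cache) -> Tuple[int, Cache]:
    """Process one token using the cache; return next token + updated cache."""
    cache = cache + [token]
    return (token * 31 + 7) % 100, cache         # fake next-token rule

def generate(prompt_ids: List[int], n_tokens: int) -> List[int]:
    out = []
    token, cache = prefill(prompt_ids)           # phase 1: one pass over the prompt
    out.append(token)
    for _ in range(n_tokens - 1):                # phase 2: one token per step
        token, cache = decode(token, cache)
        out.append(token)
    return out

print(generate([1, 2, 3], 4))  # [3, 0, 7, 24]
```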

runtime/ggma/examples/generate_text/decode.py

Lines changed: 1 addition & 1 deletion
@@ -65,4 +65,4 @@
  model = AutoModelForCausalLM.from_pretrained(model_name)
  model.eval()
  circle_model = tico.convert(model, captured_input)
- circle_model.save(f"tinyllama.decode.circle")
+ circle_model.save(f"decode_.circle")

runtime/ggma/examples/generate_text/prefill.py

Lines changed: 1 addition & 1 deletion
@@ -72,4 +72,4 @@
  model = AutoModelForCausalLM.from_pretrained(model_name)
  model.eval()
  circle_model = tico.convert(model, captured_input)
- circle_model.save(f"tinyllama.prefill.circle")
+ circle_model.save(f"prefill.circle")

runtime/ggma/examples/generate_text/tinyllama.md

Lines changed: 0 additions & 84 deletions
This file was deleted.
