Commit 2b5ca16
Use external hf_tokenizer in llama runner (pytorch#9112)
### Summary
Use the HuggingFace tokenizer from https://github.com/pytorch-labs/tokenizers in
the Llama runner.
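In practice this means the `--tokenizer_path` handed to the runner can now be a HuggingFace `tokenizer.json` file. Below is a rough sketch of the load-time dispatch this enables; the header paths, class names, and try-then-fall-back order are assumptions about the pytorch-labs/tokenizers API, not code lifted from this diff:
```
// Sketch only: header paths and signatures are assumptions about the
// pytorch-labs/tokenizers API, not code from this PR.
#include <pytorch/tokenizers/hf_tokenizer.h>
#include <pytorch/tokenizers/llama2c_tokenizer.h>

#include <memory>
#include <string>

std::unique_ptr<tokenizers::Tokenizer> load_tokenizer(
    const std::string& path) {
  // Try the HuggingFace tokenizer.json format first...
  auto hf = std::make_unique<tokenizers::HFTokenizer>();
  if (hf->load(path) == tokenizers::Error::Ok) {
    return hf;
  }
  // ...then fall back to the pre-existing binary BPE format.
  auto bpe = std::make_unique<tokenizers::Llama2cTokenizer>();
  if (bpe->load(path) == tokenizers::Error::Ok) {
    return bpe;
  }
  return nullptr;
}
```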
Results on Qwen2.5 with `extension/llm/tokenizers` checked out to
meta-pytorch/tokenizers#50:
```
Once upon a time, there was a little girl named Lily. She was very happy. She had a big garden in the back of her house. She planted many flowers in it. They were red, yellow and blue. They were very pretty. Lily loved them very much. One day, she was watering them. Suddenly, she heard a noise. It was a noise in the tree. She looked up. There was a big bird in the tree. It was eating one of Lily's flowers. Lily was very angry. She ran to the tree. "Hello!" she said to the bird. "What are you doing in my
I 00:00:08.624959 executorch:runner.cpp:294] RSS after finishing text generation: 2147.121094 MiB (0 if unsupported)
PyTorchObserver {"prompt_tokens":4,"generated_tokens":123,"model_load_start_ms":1744936315023,"model_load_end_ms":1744936318524,"inference_start_ms":1744936318524,"inference_end_ms":1744936323646,"prompt_eval_end_ms":1744936318580,"first_token_ms":1744936318580,"aggregate_sampling_time_ms":274877907025,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:00:08.625019 executorch:stats.h:106] Prompt Tokens: 4 Generated Tokens: 123
I 00:00:08.625021 executorch:stats.h:112] Model Load Time: 3.501000 (seconds)
I 00:00:08.625023 executorch:stats.h:119] Total inference time: 5.122000 (seconds) Rate: 24.014057 (tokens/second)
I 00:00:08.625033 executorch:stats.h:129] Prompt evaluation: 0.056000 (seconds) Rate: 71.428571 (tokens/second)
I 00:00:08.625038 executorch:stats.h:138] Generated 123 tokens: 5.066000 (seconds) Rate: 24.279510 (tokens/second)
I 00:00:08.625045 executorch:stats.h:149] Time to first generated token: 0.056000 (seconds)
I 00:00:08.625047 executorch:stats.h:155] Sampling time over 127 tokens: 274877907.025000 (seconds)
```
### Test plan
Build the llama runner locally (note the inclusion of
`-DSUPPORT_REGEX_LOOKAHEAD=ON`):
```
cmake -DPYTHON_EXECUTABLE=python \
-DCMAKE_INSTALL_PREFIX=cmake-out \
-DCMAKE_BUILD_TYPE=Release \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
-DSUPPORT_REGEX_LOOKAHEAD=ON \
-Bcmake-out/examples/models/llama \
examples/models/llama
cmake --build cmake-out/examples/models/llama -j16 --config Release
```
Run on Qwen2.5:
```
cmake-out/examples/models/llama/llama_main --model_path=qwen2_5.pte --tokenizer_path ~/hf/models--Qwen--Qwen2.5-1.5B/snapshots/8faed761d45a263340a0528343f099c05c9a4323/tokenizer.json --prompt="Once upon a time" --temperature 0
```
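As a quick standalone sanity check against the `prompt_tokens: 4` reported above, the tokenizer can be exercised directly. As before, the header path and the `load`/`encode` signatures are assumptions about the pytorch-labs/tokenizers API:
```
// Hypothetical check; API details are assumptions, not from this diff.
#include <pytorch/tokenizers/hf_tokenizer.h>

#include <iostream>

int main(int argc, char** argv) {
  tokenizers::HFTokenizer tok;
  if (argc < 2 || tok.load(argv[1]) != tokenizers::Error::Ok) {
    std::cerr << "usage: check <path to tokenizer.json>\n";
    return 1;
  }
  // bos=0, eos=0: add no special tokens, matching the run above.
  auto ids = tok.encode("Once upon a time", /*bos=*/0, /*eos=*/0);
  if (!ids.ok()) {
    return 1;
  }
  std::cout << "token count: " << ids.get().size() << "\n";  // expect 4
  return 0;
}
```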
File tree: 7 files changed (+67, -36 lines)
- examples
  - models/llama/runner
  - qualcomm
    - qaihub_scripts/llama
- extension/llm
  - runner
Submodule tokenizers updated from 35d185e to 1621280