Commit 4783816

HavenDV and claude committed
perf: optimize Decode with ArrayPool and add complete decode benchmarks

- Add DecodeToString using ArrayPool<byte> on .NET 8+ to avoid intermediate byte[] allocation (single-pass approach)
- Pre-cache special token bytes in constructor
- Add MicrosoftML and TokenizerLib decode benchmarks for full comparison
- HelloWorld decode: 30% faster (53→38ns), KingLear: 14% faster (433→374ns)
- Allocation reduction: Bitcoin 124KB→40KB, KingLear 1.8KB→616B

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent e7a300c commit 4783816
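
The commit message describes the decode change only in words, and the source diff is not shown here. As a rough, language-neutral illustration of the idea (not the library's C# code), the Python sketch below copies each token's bytes into one reusable buffer and decodes the result once. The `bytearray` stands in for an array rented from `ArrayPool<byte>`; `ToyDecoder`, its vocabulary, and the initial buffer size are all hypothetical.

```python
class ToyDecoder:
    """Illustrative decoder; every name and value here is hypothetical."""

    def __init__(self, vocab, special_tokens):
        # Token id -> raw bytes for ordinary tokens.
        self._vocab = dict(vocab)
        # Encode special-token text to bytes once, up front, so decoding
        # never re-encodes a string on the hot path.
        self._special = {
            token_id: text.encode("utf-8")
            for token_id, text in special_tokens.items()
        }
        # One reusable scratch buffer, standing in for a pooled array.
        self._scratch = bytearray(256)

    def _lookup(self, token_id):
        piece = self._vocab.get(token_id)
        if piece is None:
            piece = self._special[token_id]  # raises KeyError for unknown ids
        return piece

    def decode_to_string(self, token_ids):
        written = 0
        for token_id in token_ids:
            piece = self._lookup(token_id)
            end = written + len(piece)
            if end > len(self._scratch):
                # Grow the buffer (at least doubling) when it is too small.
                new_size = max(end, 2 * len(self._scratch))
                self._scratch.extend(bytes(new_size - len(self._scratch)))
            # Same-length slice assignment overwrites bytes in place.
            self._scratch[written:end] = piece
            written = end
        # Decode only the filled prefix, after all tokens are copied.
        return self._scratch[:written].decode("utf-8")


decoder = ToyDecoder(
    vocab={0: b"Hello", 1: b",", 2: b" World", 3: b"!", 4: b"\xc3", 5: b"\xa9"},
    special_tokens={100: "<|endoftext|>"},
)
print(decoder.decode_to_string([0, 1, 2, 3]))  # Hello, World!
print(decoder.decode_to_string([4, 5, 100]))   # é<|endoftext|>
```

Decoding once after all bytes are copied also matters for correctness: a multi-byte UTF-8 character can be split across tokens, as with ids 4 and 5 above.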

5 files changed: +196 -114 lines changed

README.md

Lines changed: 55 additions & 49 deletions
@@ -41,55 +41,61 @@ Apple M4 Max, 1 CPU, 16 logical and 16 physical cores


 ```
-| Method | Categories | Data | Mean | Median | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
-|---------------------------------- |------------ |-------------------- |--------------:|--------------:|------:|---------:|--------:|----------:|------------:|
-| **SharpTokenV2_0_3_** | **CountTokens** | **1. (...)57. [19866]** | **378,104.35 ns** | **374,371.36 ns** | **1.00** | **1.9531** | **-** | **20112 B** | **1.00** |
-| TiktokenSharpV1_1_5_ | CountTokens | 1. (...)57. [19866] | 249,330.44 ns | 247,579.14 ns | 0.66 | 7.8125 | 0.4883 | 65968 B | 3.28 |
-| MicrosoftMLTokenizerV1_0_0_ | CountTokens | 1. (...)57. [19866] | 249,838.63 ns | 247,990.36 ns | 0.66 | - | - | 304 B | 0.02 |
-| TokenizerLibV1_3_3_ | CountTokens | 1. (...)57. [19866] | 499,477.14 ns | 499,648.64 ns | 1.32 | 184.5703 | 75.1953 | 1547672 B | 76.95 |
-| Tiktoken_ | CountTokens | 1. (...)57. [19866] | 158,290.75 ns | 157,653.59 ns | 0.42 | - | - | - | 0.00 |
-| | | | | | | | | | |
-| **SharpTokenV2_0_3_** | **CountTokens** | **Hello, World!** | **237.23 ns** | **235.68 ns** | **1.00** | **0.0305** | **-** | **256 B** | **1.00** |
-| TiktokenSharpV1_1_5_ | CountTokens | Hello, World! | 166.19 ns | 165.99 ns | 0.70 | 0.0238 | - | 200 B | 0.78 |
-| MicrosoftMLTokenizerV1_0_0_ | CountTokens | Hello, World! | 182.46 ns | 182.27 ns | 0.77 | 0.0124 | - | 104 B | 0.41 |
-| TokenizerLibV1_3_3_ | CountTokens | Hello, World! | 294.78 ns | 295.13 ns | 1.24 | 0.1769 | 0.0005 | 1480 B | 5.78 |
-| Tiktoken_ | CountTokens | Hello, World! | 101.43 ns | 101.06 ns | 0.43 | - | - | - | 0.00 |
-| | | | | | | | | | |
-| **SharpTokenV2_0_3_** | **CountTokens** | **King(...)edy. [275]** | **4,033.61 ns** | **4,004.98 ns** | **1.00** | **0.0610** | **-** | **520 B** | **1.00** |
-| TiktokenSharpV1_1_5_ | CountTokens | King(...)edy. [275] | 2,503.27 ns | 2,500.90 ns | 0.62 | 0.0916 | - | 776 B | 1.49 |
-| MicrosoftMLTokenizerV1_0_0_ | CountTokens | King(...)edy. [275] | 2,147.42 ns | 2,142.02 ns | 0.53 | 0.0114 | - | 104 B | 0.20 |
-| TokenizerLibV1_3_3_ | CountTokens | King(...)edy. [275] | 4,741.88 ns | 4,738.57 ns | 1.18 | 2.3117 | 0.0992 | 19344 B | 37.20 |
-| Tiktoken_ | CountTokens | King(...)edy. [275] | 1,359.61 ns | 1,357.52 ns | 0.34 | 0.0038 | - | 32 B | 0.06 |
-| | | | | | | | | | |
-| **SharpTokenV2_0_3_Decode** | **Decode** | **1. (...)57. [19866]** | **47,227.99 ns** | **47,179.53 ns** | **1.00** | **14.8926** | **-** | **125232 B** | **1.00** |
-| TiktokenSharpV1_1_5_Decode | Decode | 1. (...)57. [19866] | 35,960.97 ns | 35,924.18 ns | 0.76 | 15.8691 | 2.6245 | 133400 B | 1.07 |
-| Tiktoken_Decode | Decode | 1. (...)57. [19866] | 42,623.50 ns | 42,504.76 ns | 0.90 | 14.7705 | 1.4648 | 124248 B | 0.99 |
-| | | | | | | | | | |
-| **SharpTokenV2_0_3_Decode** | **Decode** | **Hello, World!** | **60.16 ns** | **59.93 ns** | **1.00** | **0.0564** | **-** | **472 B** | **1.00** |
-| TiktokenSharpV1_1_5_Decode | Decode | Hello, World! | 43.96 ns | 43.87 ns | 0.73 | 0.0105 | - | 88 B | 0.19 |
-| Tiktoken_Decode | Decode | Hello, World! | 53.35 ns | 53.48 ns | 0.89 | 0.0277 | - | 232 B | 0.49 |
-| | | | | | | | | | |
-| **SharpTokenV2_0_3_Decode** | **Decode** | **King(...)edy. [275]** | **555.68 ns** | **555.68 ns** | **1.00** | **0.2146** | **-** | **1800 B** | **1.00** |
-| TiktokenSharpV1_1_5_Decode | Decode | King(...)edy. [275] | 464.31 ns | 464.17 ns | 0.84 | 0.0734 | - | 616 B | 0.34 |
-| Tiktoken_Decode | Decode | King(...)edy. [275] | 433.38 ns | 433.43 ns | 0.78 | 0.2227 | - | 1864 B | 1.04 |
-| | | | | | | | | | |
-| **SharpTokenV2_0_3_Encode** | **Encode** | **1. (...)57. [19866]** | **359,477.48 ns** | **360,047.91 ns** | **1.00** | **1.9531** | **-** | **20112 B** | **1.00** |
-| TiktokenSharpV1_1_5_Encode | Encode | 1. (...)57. [19866] | 247,277.33 ns | 246,921.70 ns | 0.69 | 7.8125 | 0.4883 | 65968 B | 3.28 |
-| MicrosoftMLTokenizerV1_0_0_Encode | Encode | 1. (...)57. [19866] | 256,138.26 ns | 254,792.06 ns | 0.71 | 7.8125 | 0.4883 | 66144 B | 3.29 |
-| TokenizerLibV1_3_3_Encode | Encode | 1. (...)57. [19866] | 509,557.25 ns | 507,033.43 ns | 1.42 | 184.5703 | 75.1953 | 1547672 B | 76.95 |
-| Tiktoken_Encode | Encode | 1. (...)57. [19866] | 170,554.17 ns | 170,710.12 ns | 0.47 | 7.8125 | 0.7324 | 66152 B | 3.29 |
-| | | | | | | | | | |
-| **SharpTokenV2_0_3_Encode** | **Encode** | **Hello, World!** | **233.57 ns** | **232.82 ns** | **1.00** | **0.0305** | **-** | **256 B** | **1.00** |
-| TiktokenSharpV1_1_5_Encode | Encode | Hello, World! | 168.67 ns | 168.01 ns | 0.72 | 0.0238 | - | 200 B | 0.78 |
-| MicrosoftMLTokenizerV1_0_0_Encode | Encode | Hello, World! | 212.94 ns | 208.56 ns | 0.91 | 0.0210 | - | 176 B | 0.69 |
-| TokenizerLibV1_3_3_Encode | Encode | Hello, World! | 297.23 ns | 297.86 ns | 1.27 | 0.1769 | 0.0005 | 1480 B | 5.78 |
-| Tiktoken_Encode | Encode | Hello, World! | 140.66 ns | 140.45 ns | 0.60 | 0.0458 | - | 384 B | 1.50 |
-| | | | | | | | | | |
-| **SharpTokenV2_0_3_Encode** | **Encode** | **King(...)edy. [275]** | **3,923.79 ns** | **3,923.00 ns** | **1.00** | **0.0610** | **-** | **520 B** | **1.00** |
-| TiktokenSharpV1_1_5_Encode | Encode | King(...)edy. [275] | 2,464.95 ns | 2,462.37 ns | 0.63 | 0.0916 | - | 776 B | 1.49 |
-| MicrosoftMLTokenizerV1_0_0_Encode | Encode | King(...)edy. [275] | 2,215.75 ns | 2,209.49 ns | 0.56 | 0.0877 | - | 752 B | 1.45 |
-| TokenizerLibV1_3_3_Encode | Encode | King(...)edy. [275] | 4,704.92 ns | 4,688.91 ns | 1.20 | 2.3117 | 0.0992 | 19344 B | 37.20 |
-| Tiktoken_Encode | Encode | King(...)edy. [275] | 1,520.53 ns | 1,517.05 ns | 0.39 | 0.1183 | - | 992 B | 1.91 |
+| Method | Categories | Data | Mean | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
+|---------------------------------- |------------ |-------------------- |--------------:|------:|---------:|--------:|----------:|------------:|
+| **SharpTokenV2_0_3_** | **CountTokens** | **1. (...)57. [19866]** | **352,748.15 ns** | **1.00** | **1.9531** | **-** | **20112 B** | **1.00** |
+| TiktokenSharpV1_1_5_ | CountTokens | 1. (...)57. [19866] | 242,255.09 ns | 0.69 | 7.8125 | 0.4883 | 65968 B | 3.28 |
+| MicrosoftMLTokenizerV1_0_0_ | CountTokens | 1. (...)57. [19866] | 244,002.56 ns | 0.69 | - | - | 304 B | 0.02 |
+| TokenizerLibV1_3_3_ | CountTokens | 1. (...)57. [19866] | 473,840.64 ns | 1.34 | 184.5703 | 75.1953 | 1547672 B | 76.95 |
+| Tiktoken_ | CountTokens | 1. (...)57. [19866] | 154,192.33 ns | 0.44 | - | - | - | 0.00 |
+| | | | | | | | | |
+| **SharpTokenV2_0_3_** | **CountTokens** | **Hello, World!** | **226.96 ns** | **1.00** | **0.0305** | **-** | **256 B** | **1.00** |
+| TiktokenSharpV1_1_5_ | CountTokens | Hello, World! | 165.53 ns | 0.73 | 0.0238 | - | 200 B | 0.78 |
+| MicrosoftMLTokenizerV1_0_0_ | CountTokens | Hello, World! | 191.55 ns | 0.84 | 0.0124 | - | 104 B | 0.41 |
+| TokenizerLibV1_3_3_ | CountTokens | Hello, World! | 292.79 ns | 1.29 | 0.1769 | 0.0005 | 1480 B | 5.78 |
+| Tiktoken_ | CountTokens | Hello, World! | 101.90 ns | 0.45 | - | - | - | 0.00 |
+| | | | | | | | | |
+| **SharpTokenV2_0_3_** | **CountTokens** | **King(...)edy. [275]** | **3,941.04 ns** | **1.00** | **0.0610** | **-** | **520 B** | **1.00** |
+| TiktokenSharpV1_1_5_ | CountTokens | King(...)edy. [275] | 2,442.44 ns | 0.62 | 0.0916 | - | 776 B | 1.49 |
+| MicrosoftMLTokenizerV1_0_0_ | CountTokens | King(...)edy. [275] | 2,122.37 ns | 0.54 | 0.0114 | - | 104 B | 0.20 |
+| TokenizerLibV1_3_3_ | CountTokens | King(...)edy. [275] | 4,698.25 ns | 1.19 | 2.3117 | 0.0992 | 19344 B | 37.20 |
+| Tiktoken_ | CountTokens | King(...)edy. [275] | 1,382.18 ns | 0.35 | 0.0038 | - | 32 B | 0.06 |
+| | | | | | | | | |
+| **SharpTokenV2_0_3_Decode** | **Decode** | **1. (...)57. [19866]** | **46,848.18 ns** | **1.00** | **14.8926** | **-** | **125232 B** | **1.00** |
+| TiktokenSharpV1_1_5_Decode | Decode | 1. (...)57. [19866] | 35,494.59 ns | 0.76 | 15.8691 | 2.6245 | 133400 B | 1.07 |
+| MicrosoftMLTokenizerV1_0_0_Decode | Decode | 1. (...)57. [19866] | 67,996.10 ns | 1.45 | 4.6387 | - | 39800 B | 0.32 |
+| TokenizerLibV1_3_3_Decode | Decode | 1. (...)57. [19866] | 47,744.68 ns | 1.02 | 28.0151 | 2.9297 | 234680 B | 1.87 |
+| Tiktoken_Decode | Decode | 1. (...)57. [19866] | 43,773.75 ns | 0.93 | 4.6997 | - | 39800 B | 0.32 |
+| | | | | | | | | |
+| **SharpTokenV2_0_3_Decode** | **Decode** | **Hello, World!** | **60.35 ns** | **1.00** | **0.0564** | **-** | **472 B** | **1.00** |
+| TiktokenSharpV1_1_5_Decode | Decode | Hello, World! | 42.81 ns | 0.71 | 0.0105 | - | 88 B | 0.19 |
+| MicrosoftMLTokenizerV1_0_0_Decode | Decode | Hello, World! | 46.36 ns | 0.77 | 0.0105 | - | 88 B | 0.19 |
+| TokenizerLibV1_3_3_Decode | Decode | Hello, World! | 46.03 ns | 0.76 | 0.0344 | - | 288 B | 0.61 |
+| Tiktoken_Decode | Decode | Hello, World! | 37.60 ns | 0.62 | 0.0105 | - | 88 B | 0.19 |
+| | | | | | | | | |
+| **SharpTokenV2_0_3_Decode** | **Decode** | **King(...)edy. [275]** | **556.70 ns** | **1.00** | **0.2146** | **-** | **1800 B** | **1.00** |
+| TiktokenSharpV1_1_5_Decode | Decode | King(...)edy. [275] | 458.64 ns | 0.82 | 0.0734 | - | 616 B | 0.34 |
+| MicrosoftMLTokenizerV1_0_0_Decode | Decode | King(...)edy. [275] | 562.85 ns | 1.01 | 0.0734 | - | 616 B | 0.34 |
+| TokenizerLibV1_3_3_Decode | Decode | King(...)edy. [275] | 447.74 ns | 0.80 | 0.3901 | 0.0005 | 3264 B | 1.81 |
+| Tiktoken_Decode | Decode | King(...)edy. [275] | 374.09 ns | 0.67 | 0.0734 | - | 616 B | 0.34 |
+| | | | | | | | | |
+| **SharpTokenV2_0_3_Encode** | **Encode** | **1. (...)57. [19866]** | **359,194.60 ns** | **1.00** | **1.9531** | **-** | **20112 B** | **1.00** |
+| TiktokenSharpV1_1_5_Encode | Encode | 1. (...)57. [19866] | 239,457.49 ns | 0.67 | 7.8125 | 0.4883 | 65968 B | 3.28 |
+| MicrosoftMLTokenizerV1_0_0_Encode | Encode | 1. (...)57. [19866] | 250,800.76 ns | 0.70 | 7.8125 | 0.4883 | 66144 B | 3.29 |
+| TokenizerLibV1_3_3_Encode | Encode | 1. (...)57. [19866] | 489,822.50 ns | 1.36 | 184.5703 | 75.1953 | 1547672 B | 76.95 |
+| Tiktoken_Encode | Encode | 1. (...)57. [19866] | 168,313.91 ns | 0.47 | 7.8125 | 0.7324 | 66152 B | 3.29 |
+| | | | | | | | | |
+| **SharpTokenV2_0_3_Encode** | **Encode** | **Hello, World!** | **231.24 ns** | **1.00** | **0.0305** | **-** | **256 B** | **1.00** |
+| TiktokenSharpV1_1_5_Encode | Encode | Hello, World! | 165.16 ns | 0.71 | 0.0238 | - | 200 B | 0.78 |
+| MicrosoftMLTokenizerV1_0_0_Encode | Encode | Hello, World! | 200.68 ns | 0.87 | 0.0210 | - | 176 B | 0.69 |
+| TokenizerLibV1_3_3_Encode | Encode | Hello, World! | 291.37 ns | 1.26 | 0.1769 | 0.0005 | 1480 B | 5.78 |
+| Tiktoken_Encode | Encode | Hello, World! | 146.61 ns | 0.63 | 0.0458 | - | 384 B | 1.50 |
+| | | | | | | | | |
+| **SharpTokenV2_0_3_Encode** | **Encode** | **King(...)edy. [275]** | **3,977.22 ns** | **1.00** | **0.0610** | **-** | **520 B** | **1.00** |
+| TiktokenSharpV1_1_5_Encode | Encode | King(...)edy. [275] | 2,452.75 ns | 0.62 | 0.0916 | - | 776 B | 1.49 |
+| MicrosoftMLTokenizerV1_0_0_Encode | Encode | King(...)edy. [275] | 2,202.46 ns | 0.55 | 0.0877 | - | 752 B | 1.45 |
+| TokenizerLibV1_3_3_Encode | Encode | King(...)edy. [275] | 4,742.05 ns | 1.19 | 2.3117 | 0.0992 | 19344 B | 37.20 |
+| Tiktoken_Encode | Encode | King(...)edy. [275] | 1,549.81 ns | 0.39 | 0.1183 | - | 992 B | 1.91 |
 ```
 <!--BENCHMARKS_END-->
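
The percentages in the commit message can be checked against the `Tiktoken_Decode` rows of the removed and added tables. The Python sketch below assumes those are the rows the message refers to, and that "Bitcoin" is the `[19866]` input and "KingLear" the `[275]` input; neither mapping is stated in the diff. The before and after values come from separate benchmark runs, so small differences are expected.

```python
def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


# (before, after) pairs copied from the removed and added Tiktoken_Decode rows.
mean_ns = {
    "Hello, World!": (53.35, 37.60),
    "King(...)edy. [275]": (433.38, 374.09),
}
allocated_bytes = {
    "1. (...)57. [19866]": (124248, 39800),
    "King(...)edy. [275]": (1864, 616),
}

for data, (before, after) in mean_ns.items():
    print(f"{data}: mean time down {percent_reduction(before, after):.1f}%")
for data, (before, after) in allocated_bytes.items():
    print(f"{data}: allocation down {percent_reduction(before, after):.1f}%")
```

Rounded to whole percent, the two timing results are 30% and 14%, matching the commit message, and allocations drop by roughly two thirds for both inputs.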
