Commit e7a300c

HavenDV and claude committed
perf: add GetAlternateLookup optimization and decode benchmarks
Use .NET 9+ GetAlternateLookup<ReadOnlySpan<char>> for zero-allocation dictionary lookups in CountTokensNative and EncodeNative. CountTokens now achieves 0 bytes allocated on large text.

Also adds decode benchmarks for SharpToken, TiktokenSharp, and Tiktoken.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1e248c6 commit e7a300c
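
The span-keyed lookup named in the commit message can be sketched as follows. This is a minimal illustration, assuming .NET 9 or later and a dictionary built with `StringComparer.Ordinal`; the `SpanLookupSketch` class, its tiny vocabulary, and the greedy longest-match loop are hypothetical stand-ins, not the repository's `CountTokensNative` or its BPE merge logic. The point is only that `TryGetValue` accepts a slice of the input directly, so no substring is allocated per probe.

```csharp
using System;
using System.Collections.Generic;

public static class SpanLookupSketch
{
    // Hypothetical toy vocabulary; a real tokenizer loads tens of thousands of entries.
    private static readonly Dictionary<string, int> Vocabulary = new(StringComparer.Ordinal)
    {
        ["Hello"] = 1,
        [","] = 2,
        [" World"] = 3,
        ["!"] = 4,
    };

    // Longest key in the toy vocabulary above (" World").
    private const int MaxPieceLength = 6;

    // Counts pieces using greedy longest match (an assumption of this sketch, not BPE).
    public static int CountKnownPieces(ReadOnlySpan<char> text)
    {
        // .NET 9+: a view over the same dictionary that accepts ReadOnlySpan<char> keys.
        // It throws InvalidOperationException if the comparer does not support span keys.
        var lookup = Vocabulary.GetAlternateLookup<ReadOnlySpan<char>>();

        int count = 0;
        int position = 0;
        while (position < text.Length)
        {
            int matched = 0;
            int longest = Math.Min(MaxPieceLength, text.Length - position);
            for (int length = longest; length > 0; length--)
            {
                // Slicing a span creates no string, so each probe is allocation-free.
                if (lookup.TryGetValue(text.Slice(position, length), out _))
                {
                    matched = length;
                    break;
                }
            }

            // In this sketch an unknown character is counted as a single piece.
            position += matched == 0 ? 1 : matched;
            count++;
        }

        return count;
    }
}
```

Calling `SpanLookupSketch.CountKnownPieces("Hello, World!")` walks the input as `Hello`, `,`, ` World`, `!` without creating intermediate strings; the equivalent code on a plain `Dictionary<string, int>` would need a `Substring` call for every probe.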

4 files changed: +204 -89 lines changed

README.md

Lines changed: 49 additions & 37 deletions
@@ -41,43 +41,55 @@ Apple M4 Max, 1 CPU, 16 logical and 16 physical cores


 ```
-| Method | Categories | Data | Mean | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
-|---------------------------------- |------------ |-------------------- |-------------:|------:|---------:|--------:|----------:|------------:|
-| **SharpTokenV2_0_3_** | **CountTokens** | **1. (...)57. [19866]** | **367,069.9 ns** | **1.00** | **1.9531** | **-** | **20112 B** | **1.00** |
-| TiktokenSharpV1_1_5_ | CountTokens | 1. (...)57. [19866] | 248,732.4 ns | 0.68 | 7.8125 | 0.4883 | 65968 B | 3.28 |
-| MicrosoftMLTokenizerV1_0_0_ | CountTokens | 1. (...)57. [19866] | 247,020.3 ns | 0.67 | - | - | 304 B | 0.02 |
-| TokenizerLibV1_3_3_ | CountTokens | 1. (...)57. [19866] | 500,866.7 ns | 1.36 | 184.5703 | 75.1953 | 1547672 B | 76.95 |
-| Tiktoken_ | CountTokens | 1. (...)57. [19866] | 199,159.1 ns | 0.54 | 17.5781 | - | 148312 B | 7.37 |
-| | | | | | | | | |
-| **SharpTokenV2_0_3_** | **CountTokens** | **Hello, World!** | **234.7 ns** | **1.00** | **0.0305** | **-** | **256 B** | **1.00** |
-| TiktokenSharpV1_1_5_ | CountTokens | Hello, World! | 166.4 ns | 0.71 | 0.0238 | - | 200 B | 0.78 |
-| MicrosoftMLTokenizerV1_0_0_ | CountTokens | Hello, World! | 185.8 ns | 0.79 | 0.0124 | - | 104 B | 0.41 |
-| TokenizerLibV1_3_3_ | CountTokens | Hello, World! | 298.7 ns | 1.27 | 0.1769 | 0.0005 | 1480 B | 5.78 |
-| Tiktoken_ | CountTokens | Hello, World! | 122.8 ns | 0.52 | 0.0143 | - | 120 B | 0.47 |
-| | | | | | | | | |
-| **SharpTokenV2_0_3_** | **CountTokens** | **King(...)edy. [275]** | **4,018.7 ns** | **1.00** | **0.0610** | **-** | **520 B** | **1.00** |
-| TiktokenSharpV1_1_5_ | CountTokens | King(...)edy. [275] | 2,507.5 ns | 0.62 | 0.0916 | - | 776 B | 1.49 |
-| MicrosoftMLTokenizerV1_0_0_ | CountTokens | King(...)edy. [275] | 2,096.7 ns | 0.52 | 0.0114 | - | 104 B | 0.20 |
-| TokenizerLibV1_3_3_ | CountTokens | King(...)edy. [275] | 4,807.5 ns | 1.20 | 2.3117 | 0.0992 | 19344 B | 37.20 |
-| Tiktoken_ | CountTokens | King(...)edy. [275] | 1,761.0 ns | 0.44 | 0.2346 | - | 1976 B | 3.80 |
-| | | | | | | | | |
-| **SharpTokenV2_0_3_Encode** | **Encode** | **1. (...)57. [19866]** | **393,341.1 ns** | **1.00** | **1.9531** | **-** | **20112 B** | **1.00** |
-| TiktokenSharpV1_1_5_Encode | Encode | 1. (...)57. [19866] | 250,618.0 ns | 0.64 | 7.8125 | 0.4883 | 65968 B | 3.28 |
-| MicrosoftMLTokenizerV1_0_0_Encode | Encode | 1. (...)57. [19866] | 254,877.6 ns | 0.65 | 7.8125 | 0.4883 | 66144 B | 3.29 |
-| TokenizerLibV1_3_3_Encode | Encode | 1. (...)57. [19866] | 502,130.9 ns | 1.28 | 184.5703 | 75.1953 | 1547672 B | 76.95 |
-| Tiktoken_Encode | Encode | 1. (...)57. [19866] | 214,646.0 ns | 0.55 | 25.3906 | 2.9297 | 214464 B | 10.66 |
-| | | | | | | | | |
-| **SharpTokenV2_0_3_Encode** | **Encode** | **Hello, World!** | **236.9 ns** | **1.00** | **0.0305** | **-** | **256 B** | **1.00** |
-| TiktokenSharpV1_1_5_Encode | Encode | Hello, World! | 170.8 ns | 0.72 | 0.0238 | - | 200 B | 0.78 |
-| MicrosoftMLTokenizerV1_0_0_Encode | Encode | Hello, World! | 208.9 ns | 0.88 | 0.0210 | - | 176 B | 0.69 |
-| TokenizerLibV1_3_3_Encode | Encode | Hello, World! | 301.8 ns | 1.27 | 0.1769 | 0.0005 | 1480 B | 5.78 |
-| Tiktoken_Encode | Encode | Hello, World! | 163.3 ns | 0.69 | 0.0601 | - | 504 B | 1.97 |
-| | | | | | | | | |
-| **SharpTokenV2_0_3_Encode** | **Encode** | **King(...)edy. [275]** | **4,030.5 ns** | **1.00** | **0.0610** | **-** | **520 B** | **1.00** |
-| TiktokenSharpV1_1_5_Encode | Encode | King(...)edy. [275] | 2,481.8 ns | 0.62 | 0.0916 | - | 776 B | 1.49 |
-| MicrosoftMLTokenizerV1_0_0_Encode | Encode | King(...)edy. [275] | 2,193.7 ns | 0.54 | 0.0877 | - | 752 B | 1.45 |
-| TokenizerLibV1_3_3_Encode | Encode | King(...)edy. [275] | 4,881.7 ns | 1.21 | 2.3117 | 0.0992 | 19344 B | 37.20 |
-| Tiktoken_Encode | Encode | King(...)edy. [275] | 1,881.4 ns | 0.47 | 0.3510 | - | 2936 B | 5.65 |
+| Method | Categories | Data | Mean | Median | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
+|---------------------------------- |------------ |-------------------- |--------------:|--------------:|------:|---------:|--------:|----------:|------------:|
+| **SharpTokenV2_0_3_** | **CountTokens** | **1. (...)57. [19866]** | **378,104.35 ns** | **374,371.36 ns** | **1.00** | **1.9531** | **-** | **20112 B** | **1.00** |
+| TiktokenSharpV1_1_5_ | CountTokens | 1. (...)57. [19866] | 249,330.44 ns | 247,579.14 ns | 0.66 | 7.8125 | 0.4883 | 65968 B | 3.28 |
+| MicrosoftMLTokenizerV1_0_0_ | CountTokens | 1. (...)57. [19866] | 249,838.63 ns | 247,990.36 ns | 0.66 | - | - | 304 B | 0.02 |
+| TokenizerLibV1_3_3_ | CountTokens | 1. (...)57. [19866] | 499,477.14 ns | 499,648.64 ns | 1.32 | 184.5703 | 75.1953 | 1547672 B | 76.95 |
+| Tiktoken_ | CountTokens | 1. (...)57. [19866] | 158,290.75 ns | 157,653.59 ns | 0.42 | - | - | - | 0.00 |
+| | | | | | | | | | |
+| **SharpTokenV2_0_3_** | **CountTokens** | **Hello, World!** | **237.23 ns** | **235.68 ns** | **1.00** | **0.0305** | **-** | **256 B** | **1.00** |
+| TiktokenSharpV1_1_5_ | CountTokens | Hello, World! | 166.19 ns | 165.99 ns | 0.70 | 0.0238 | - | 200 B | 0.78 |
+| MicrosoftMLTokenizerV1_0_0_ | CountTokens | Hello, World! | 182.46 ns | 182.27 ns | 0.77 | 0.0124 | - | 104 B | 0.41 |
+| TokenizerLibV1_3_3_ | CountTokens | Hello, World! | 294.78 ns | 295.13 ns | 1.24 | 0.1769 | 0.0005 | 1480 B | 5.78 |
+| Tiktoken_ | CountTokens | Hello, World! | 101.43 ns | 101.06 ns | 0.43 | - | - | - | 0.00 |
+| | | | | | | | | | |
+| **SharpTokenV2_0_3_** | **CountTokens** | **King(...)edy. [275]** | **4,033.61 ns** | **4,004.98 ns** | **1.00** | **0.0610** | **-** | **520 B** | **1.00** |
+| TiktokenSharpV1_1_5_ | CountTokens | King(...)edy. [275] | 2,503.27 ns | 2,500.90 ns | 0.62 | 0.0916 | - | 776 B | 1.49 |
+| MicrosoftMLTokenizerV1_0_0_ | CountTokens | King(...)edy. [275] | 2,147.42 ns | 2,142.02 ns | 0.53 | 0.0114 | - | 104 B | 0.20 |
+| TokenizerLibV1_3_3_ | CountTokens | King(...)edy. [275] | 4,741.88 ns | 4,738.57 ns | 1.18 | 2.3117 | 0.0992 | 19344 B | 37.20 |
+| Tiktoken_ | CountTokens | King(...)edy. [275] | 1,359.61 ns | 1,357.52 ns | 0.34 | 0.0038 | - | 32 B | 0.06 |
+| | | | | | | | | | |
+| **SharpTokenV2_0_3_Decode** | **Decode** | **1. (...)57. [19866]** | **47,227.99 ns** | **47,179.53 ns** | **1.00** | **14.8926** | **-** | **125232 B** | **1.00** |
+| TiktokenSharpV1_1_5_Decode | Decode | 1. (...)57. [19866] | 35,960.97 ns | 35,924.18 ns | 0.76 | 15.8691 | 2.6245 | 133400 B | 1.07 |
+| Tiktoken_Decode | Decode | 1. (...)57. [19866] | 42,623.50 ns | 42,504.76 ns | 0.90 | 14.7705 | 1.4648 | 124248 B | 0.99 |
+| | | | | | | | | | |
+| **SharpTokenV2_0_3_Decode** | **Decode** | **Hello, World!** | **60.16 ns** | **59.93 ns** | **1.00** | **0.0564** | **-** | **472 B** | **1.00** |
+| TiktokenSharpV1_1_5_Decode | Decode | Hello, World! | 43.96 ns | 43.87 ns | 0.73 | 0.0105 | - | 88 B | 0.19 |
+| Tiktoken_Decode | Decode | Hello, World! | 53.35 ns | 53.48 ns | 0.89 | 0.0277 | - | 232 B | 0.49 |
+| | | | | | | | | | |
+| **SharpTokenV2_0_3_Decode** | **Decode** | **King(...)edy. [275]** | **555.68 ns** | **555.68 ns** | **1.00** | **0.2146** | **-** | **1800 B** | **1.00** |
+| TiktokenSharpV1_1_5_Decode | Decode | King(...)edy. [275] | 464.31 ns | 464.17 ns | 0.84 | 0.0734 | - | 616 B | 0.34 |
+| Tiktoken_Decode | Decode | King(...)edy. [275] | 433.38 ns | 433.43 ns | 0.78 | 0.2227 | - | 1864 B | 1.04 |
+| | | | | | | | | | |
+| **SharpTokenV2_0_3_Encode** | **Encode** | **1. (...)57. [19866]** | **359,477.48 ns** | **360,047.91 ns** | **1.00** | **1.9531** | **-** | **20112 B** | **1.00** |
+| TiktokenSharpV1_1_5_Encode | Encode | 1. (...)57. [19866] | 247,277.33 ns | 246,921.70 ns | 0.69 | 7.8125 | 0.4883 | 65968 B | 3.28 |
+| MicrosoftMLTokenizerV1_0_0_Encode | Encode | 1. (...)57. [19866] | 256,138.26 ns | 254,792.06 ns | 0.71 | 7.8125 | 0.4883 | 66144 B | 3.29 |
+| TokenizerLibV1_3_3_Encode | Encode | 1. (...)57. [19866] | 509,557.25 ns | 507,033.43 ns | 1.42 | 184.5703 | 75.1953 | 1547672 B | 76.95 |
+| Tiktoken_Encode | Encode | 1. (...)57. [19866] | 170,554.17 ns | 170,710.12 ns | 0.47 | 7.8125 | 0.7324 | 66152 B | 3.29 |
+| | | | | | | | | | |
+| **SharpTokenV2_0_3_Encode** | **Encode** | **Hello, World!** | **233.57 ns** | **232.82 ns** | **1.00** | **0.0305** | **-** | **256 B** | **1.00** |
+| TiktokenSharpV1_1_5_Encode | Encode | Hello, World! | 168.67 ns | 168.01 ns | 0.72 | 0.0238 | - | 200 B | 0.78 |
+| MicrosoftMLTokenizerV1_0_0_Encode | Encode | Hello, World! | 212.94 ns | 208.56 ns | 0.91 | 0.0210 | - | 176 B | 0.69 |
+| TokenizerLibV1_3_3_Encode | Encode | Hello, World! | 297.23 ns | 297.86 ns | 1.27 | 0.1769 | 0.0005 | 1480 B | 5.78 |
+| Tiktoken_Encode | Encode | Hello, World! | 140.66 ns | 140.45 ns | 0.60 | 0.0458 | - | 384 B | 1.50 |
+| | | | | | | | | | |
+| **SharpTokenV2_0_3_Encode** | **Encode** | **King(...)edy. [275]** | **3,923.79 ns** | **3,923.00 ns** | **1.00** | **0.0610** | **-** | **520 B** | **1.00** |
+| TiktokenSharpV1_1_5_Encode | Encode | King(...)edy. [275] | 2,464.95 ns | 2,462.37 ns | 0.63 | 0.0916 | - | 776 B | 1.49 |
+| MicrosoftMLTokenizerV1_0_0_Encode | Encode | King(...)edy. [275] | 2,215.75 ns | 2,209.49 ns | 0.56 | 0.0877 | - | 752 B | 1.45 |
+| TokenizerLibV1_3_3_Encode | Encode | King(...)edy. [275] | 4,704.92 ns | 4,688.91 ns | 1.20 | 2.3117 | 0.0992 | 19344 B | 37.20 |
+| Tiktoken_Encode | Encode | King(...)edy. [275] | 1,520.53 ns | 1,517.05 ns | 0.39 | 0.1183 | - | 992 B | 1.91 |

 <!--BENCHMARKS_END-->
