Commit ae218af

feat: updated
1 parent 3983c2e commit ae218af

4 files changed (+101 / -107 lines)

CLAUDE.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

High-performance .NET port of OpenAI's [tiktoken](https://github.com/openai/tiktoken) tokenizer, optimized for token counting speed. Published as [Tiktoken](https://www.nuget.org/packages/Tiktoken/) on NuGet.

## Build Commands

```bash
# Build the solution (.slnx solution files require .NET SDK 9.0.200 or later)
dotnet build Tiktoken.slnx

# Build for release
dotnet build Tiktoken.slnx -c Release

# Run unit tests
dotnet test src/tests/Tiktoken.UnitTests/Tiktoken.UnitTests.csproj

# Run all tests
dotnet test Tiktoken.slnx

# Run benchmarks
dotnet run -c Release --project src/benchmarks/Tiktoken.Benchmarks/Tiktoken.Benchmarks.csproj
```

## Architecture

### Project Layout

| Project | Purpose |
|---------|---------|
| `src/libs/Tiktoken/` | Main convenience library -- bundles Core + cl100k + o200k encodings |
| `src/libs/Tiktoken.Core/` | Core tokenizer engine (`Encoder`, `ModelToEncoder`, BPE logic) |
| `src/libs/Tiktoken.Encodings.Abstractions/` | Base types for encoding definitions |
| `src/libs/Tiktoken.Encodings.cl100k/` | `cl100k_base` encoding (GPT-3.5/GPT-4) |
| `src/libs/Tiktoken.Encodings.o200k/` | `o200k_base` encoding (GPT-4o) |
| `src/libs/Tiktoken.Encodings.p50k/` | `p50k_base` / `p50k_edit` encodings |
| `src/libs/Tiktoken.Encodings.r50k/` | `r50k_base` encoding |
| `src/tests/Tiktoken.UnitTests/` | Unit tests (MSTest + FluentAssertions + Verify) |
| `src/benchmarks/Tiktoken.Benchmarks/` | BenchmarkDotNet performance benchmarks |
| `benchmarks/` | Historical benchmark result reports (Markdown) |

### Supported Encodings

- `o200k_base` -- GPT-4o models
- `cl100k_base` -- GPT-3.5-turbo, GPT-4 models
- `r50k_base` -- older GPT-3 models
- `p50k_base` / `p50k_edit` -- Codex models

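The list above pairs each encoding with a model family. How `ModelToEncoder.For` actually resolves a model name is not described here; as a hypothetical sketch only, an ordered prefix table captures the idea. `EncodingForModel` and the example prefixes are assumptions for illustration, not the library's API or its real mapping table. The one design point worth noting is ordering: `"gpt-4o"` must be tested before `"gpt-4"`, or GPT-4o models would fall through to `cl100k_base`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, illustrative table (NOT the library's ModelToEncoder data).
// Order matters: more specific prefixes must come before shorter ones.
var prefixToEncoding = new List<(string Prefix, string Encoding)>
{
    ("gpt-4o", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5-turbo", "cl100k_base"),
    ("code-davinci", "p50k_base"), // Codex family
    ("davinci", "r50k_base"),      // older GPT-3 family
};

// Returns the encoding name for the first prefix the model name starts with.
string EncodingForModel(string model)
{
    foreach (var (prefix, encoding) in prefixToEncoding)
    {
        if (model.StartsWith(prefix, StringComparison.Ordinal))
        {
            return encoding;
        }
    }
    throw new ArgumentException($"Unknown model: {model}", nameof(model));
}

Console.WriteLine(EncodingForModel("gpt-4o-mini")); // o200k_base
Console.WriteLine(EncodingForModel("gpt-4-turbo")); // cl100k_base
```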
### Key API

```csharp
using Tiktoken;

var encoder = ModelToEncoder.For("gpt-4"); // cl100k_base
var tokens = encoder.Encode("hello world"); // [15339, 1917]
var text = encoder.Decode(tokens); // "hello world"
var count = encoder.CountTokens(text); // 2
var parts = encoder.Explore(text); // ["hello", " world"]
```

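`Explore` returning `["hello", " world"]` reflects the byte-pair merge step that the Core project implements. As a minimal sketch of that technique (not the library's implementation), the loop below repeatedly merges the adjacent pair whose concatenation has the lowest rank, and the surviving pieces' ranks become the token ids. Assumptions: it works on characters instead of UTF-8 bytes, skips regex pre-tokenization, uses a made-up rank table, and `BytePairMerge`/`Encode` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy ranks: lower rank = merged earlier; rank doubles as token id.
// Every single character used in input must be present (real tiktoken ranks all 256 bytes).
var ranks = new Dictionary<string, int>
{
    ["h"] = 0, ["e"] = 1, ["l"] = 2, ["o"] = 3,
    ["he"] = 4, ["ll"] = 5, ["hell"] = 6, ["hello"] = 7,
};

// Greedy BPE: merge the lowest-ranked adjacent pair until no pair is a known token.
List<string> BytePairMerge(string piece)
{
    var parts = piece.Select(c => c.ToString()).ToList();
    while (parts.Count > 1)
    {
        int bestIndex = -1;
        int bestRank = int.MaxValue;
        for (int i = 0; i < parts.Count - 1; i++)
        {
            if (ranks.TryGetValue(parts[i] + parts[i + 1], out int rank) && rank < bestRank)
            {
                bestRank = rank;
                bestIndex = i;
            }
        }
        if (bestIndex < 0)
        {
            break; // no adjacent pair forms a known token
        }
        parts[bestIndex] += parts[bestIndex + 1];
        parts.RemoveAt(bestIndex + 1);
    }
    return parts;
}

// Map the final pieces to their ranks (token ids).
int[] Encode(string piece) => BytePairMerge(piece).Select(p => ranks[p]).ToArray();

Console.WriteLine(string.Join(", ", Encode("hello"))); // 7
```

The full rescan on every merge keeps the sketch short but is quadratic in the piece length; a performance-focused port would avoid that, which is the kind of work the benchmarks project measures.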
### Build Configuration

- **Target frameworks:** `net4.6.2`, `netstandard2.0`, `netstandard2.1`, `net8.0`, `net9.0`
- **Language:** C# with nullable reference types
- **Unsafe code:** Enabled in Core for performance
- **Encoding data:** Embedded as `.tiktoken` resources in `Tiktoken.Core/Encodings/`
- **Versioning:** Semantic versioning from git tags via MinVer
- **Testing:** MSTest + FluentAssertions + Verify

### CI/CD

- Uses shared workflows from the `HavenDV/workflows` repo
- Dependabot updates NuGet packages

Tiktoken.sln

Lines changed: 0 additions & 104 deletions
This file was deleted.

Tiktoken.sln.DotSettings

Lines changed: 0 additions & 3 deletions
This file was deleted.

Tiktoken.slnx

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
<Solution>
  <Folder Name="/benchmarks/">
    <Project Path="src/benchmarks/Tiktoken.Benchmarks/Tiktoken.Benchmarks.csproj" />
  </Folder>
  <Folder Name="/libs/">
    <File Path="src/libs/Directory.Build.props" />
    <Project Path="src/libs/Tiktoken.Core/Tiktoken.Core.csproj" />
    <Project Path="src/libs/Tiktoken.Encodings.Abstractions/Tiktoken.Encodings.Abstractions.csproj" />
    <Project Path="src/libs/Tiktoken.Encodings.cl100k/Tiktoken.Encodings.cl100k.csproj" />
    <Project Path="src/libs/Tiktoken.Encodings.o200k/Tiktoken.Encodings.o200k.csproj" />
    <Project Path="src/libs/Tiktoken.Encodings.p50k/Tiktoken.Encodings.p50k.csproj" />
    <Project Path="src/libs/Tiktoken.Encodings.r50k/Tiktoken.Encodings.r50k.csproj" />
    <Project Path="src/libs/Tiktoken/Tiktoken.csproj" />
  </Folder>
  <Folder Name="/misc/">
    <File Path=".gitattributes" />
    <File Path=".github/workflows/dotnet.yml" />
    <File Path=".gitignore" />
    <File Path="LICENSE.txt" />
    <File Path="README.md" />
    <File Path="src/Directory.Build.props" />
    <File Path="src/Directory.Packages.props" />
  </Folder>
  <Folder Name="/tests/">
    <Project Path="src/tests/Tiktoken.UnitTests/Tiktoken.UnitTests.csproj" />
  </Folder>
</Solution>
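`Tiktoken.slnx` replaces the deleted `Tiktoken.sln` with the XML-based solution format: `<Folder>` elements group `<Project>` entries and loose `<File>` solution items. As a minimal sketch (assumption: an abbreviated copy of the file is inlined so it runs standalone; this is not how MSBuild or the `dotnet` CLI read it), `System.Xml.Linq` can list the projects in each solution folder.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Abbreviated copy of Tiktoken.slnx, inlined for a self-contained example.
const string slnx = """
<Solution>
  <Folder Name="/benchmarks/">
    <Project Path="src/benchmarks/Tiktoken.Benchmarks/Tiktoken.Benchmarks.csproj" />
  </Folder>
  <Folder Name="/libs/">
    <File Path="src/libs/Directory.Build.props" />
    <Project Path="src/libs/Tiktoken.Core/Tiktoken.Core.csproj" />
    <Project Path="src/libs/Tiktoken/Tiktoken.csproj" />
  </Folder>
</Solution>
""";

// Folder name -> project paths; <File> entries are solution items, so they are skipped.
var projectsByFolder = XDocument.Parse(slnx).Root!
    .Elements("Folder")
    .ToDictionary(
        folder => folder.Attribute("Name")!.Value,
        folder => folder.Elements("Project")
            .Select(project => project.Attribute("Path")!.Value)
            .ToList());

foreach (var (folder, projects) in projectsByFolder)
{
    Console.WriteLine($"{folder}: {projects.Count} project(s)");
}
```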
