docs/ai/how-to/use-tokenizers.md
Lines changed: 16 additions & 19 deletions
@@ -4,17 +4,16 @@ description: Learn how to use the Microsoft.ML.Tokenizers library to tokenize te
 ms.topic: how-to
 ms.date: 10/29/2025
 ai-usage: ai-assisted
-#customer intent: As a .NET developer, I want to use the Microsoft.ML.Tokenizers library to tokenize text so I can work with AI models, manage costs, and handle token limits effectively.
 ---
 # Use Microsoft.ML.Tokenizers for text tokenization
 
-The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides a comprehensive set of tools for tokenizing text in .NET applications. Tokenization is essential when working with large language models (LLMs), as it allows you to manage token counts, estimate costs, and preprocess text for AI models.
+The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides a comprehensive set of tools for tokenizing text in .NET applications. Tokenization is essential when you work with large language models (LLMs), as it allows you to manage token counts, estimate costs, and preprocess text for AI models.
 
 This article shows you how to use the library's key features and work with different tokenizer models.
 
 ## Prerequisites
 
-- [.NET 8.0 SDK](https://dotnet.microsoft.com/download/dotnet/8.0) or later
+- [.NET 8 SDK](https://dotnet.microsoft.com/download/dotnet/8.0) or later
-The library also provides specialized tokenizers like `BpeTokenizer` and `EnglishRobertaTokenizer` that you can configure with custom vocabularies for specific models.
+The library also provides specialized tokenizers like <xref:Microsoft.ML.Tokenizers.BpeTokenizer> and <xref:Microsoft.ML.Tokenizers.EnglishRobertaTokenizer> that you can configure with custom vocabularies for specific models.
 
 ## Common tokenizer operations
 
-All tokenizers in the library implement the `Tokenizer` base class, which provides a consistent API:
+All tokenizers in the library implement the <xref:Microsoft.ML.Tokenizers.Tokenizer> base class. The following table shows the available methods.
 
-- **`EncodeToIds`**: Converts text to a list of token IDs
-- **`Decode`**: Converts token IDs back to text
-- **`CountTokens`**: Returns the number of tokens in a text string
-- **`EncodeToTokens`**: Returns detailed token information including values and IDs
-- **`GetIndexByTokenCount`**: Finds the character index for a specific token count from the start
-- **`GetIndexByTokenCountFromEnd`**: Finds the character index for a specific token count from the end
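
The six methods named in the list above can be exercised roughly as follows. This is a minimal sketch under stated assumptions, not the article's own sample: it assumes the project references `Microsoft.ML.Tokenizers` together with the `Microsoft.ML.Tokenizers.Data.O200kBase` data package, and that `gpt-4o` is a model name accepted by `TiktokenTokenizer.CreateForModel`. The sample text and the budget of 5 tokens are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Assumption: the gpt-4o vocabulary is supplied by the
// Microsoft.ML.Tokenizers.Data.O200kBase package.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");

string text = "Text tokenization is the process of splitting a string into a list of tokens.";

// EncodeToIds: convert text to a list of token IDs.
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);

// Decode: convert token IDs back to text.
string? decoded = tokenizer.Decode(ids);

// CountTokens: count tokens without keeping the IDs.
int count = tokenizer.CountTokens(text);
Console.WriteLine($"Token count: {count}");

// EncodeToTokens: get each token's ID and string value.
IReadOnlyList<EncodedToken> tokens = tokenizer.EncodeToTokens(text, out string? normalizedText);
foreach (EncodedToken token in tokens)
{
    Console.WriteLine($"{token.Id}\t'{token.Value}'");
}

// GetIndexByTokenCount: character index where the first (at most) 5 tokens end.
int headEnd = tokenizer.GetIndexByTokenCount(text, 5, out string? processedText, out int headTokens);
string head = (processedText ?? text).Substring(0, headEnd);
Console.WriteLine($"First {headTokens} tokens: {head}");

// GetIndexByTokenCountFromEnd: character index where the last (at most) 5 tokens start.
int tailStart = tokenizer.GetIndexByTokenCountFromEnd(text, 5, out processedText, out int tailTokens);
string tail = (processedText ?? text).Substring(tailStart);
Console.WriteLine($"Last {tailTokens} tokens: {tail}");
```

The two index methods return positions in the processed text, which is `null` when the tokenizer applies no normalization, so the sketch falls back to the original string before calling `Substring`.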