Commit bb3802e

move tokenizers under ml.net
1 parent 295b53b commit bb3802e

1 file changed, +24 -30 lines changed

docs/core/whats-new/dotnet-9/overview.md

Lines changed: 24 additions & 30 deletions
@@ -53,58 +53,52 @@ The .NET 9 SDK introduces _workload sets_, where all of your workloads stay at a
 
 For more information, see [What's new in the SDK for .NET 9](sdk.md).
 
-## AI building blocks and fundamentals
-
-### Microsoft.Extensions.AI & Microsoft.Extensions.VectorData
+## AI building blocks
 
 .NET 9 introduces a unified layer of C# abstractions through the [Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI.Abstractions/) and [Microsoft.Extensions.VectorData](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions/) packages. These abstractions facilitate interaction with AI services, including small and large language models (SLMs and LLMs), embeddings, vector stores, and middleware.
 
-### Tokenizers
-
-The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides .NET developers with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and preprocess text when working with local models.
-
-The latest release introduces significant new capabilities:
-
-- Tokenizers
-- Tiktoken for GPT (3, 3.5, 4, 4o, o1) and Llam3 models
-- Llama (based on SentencePiece) for Llama and Mistral models
-- CodeGen for code generation models like codegen-350M-mono
-- Phi2 (based on CodeGen) for Microsoft Phi2 model
-- WordPiece
-- Bert (based on WordPiece) for Bert supported models like optimum--all-MiniLM-L6-v2
-
-### Tensors
-
-In .NET 9, `TensorPrimitives` and the new `Tensor<T>` type expand AI capabilities by enabling efficient encoding, manipulation, and computation of multi-dimensional data.
+.NET 9 also includes new tensor types that expand AI capabilities. <xref:System.Numerics.Tensors.TensorPrimitives> and the new <xref:System.Numerics.Tensors.Tensor%601> type expand AI capabilities by enabling efficient encoding, manipulation, and computation of multi-dimensional data. You can find these types in the latest release of the [System.Numerics.Tensors package](https://www.nuget.org/packages/System.Numerics.Tensors/).
 
-Improvements in the latest release of [System.Numerics.Tensors](https://www.nuget.org/packages/System.Numerics.Tensors/) include:
+### TensorPrimitives
 
-#### TensorPrimitives
+- Expanded method scope: Increased from 40 to nearly 200 overloads, now including numerical operations similar to `Math`, `MathF`, and `INumber<T>` but for spans of values.
+- Performance enhancements: Many operations are now SIMD-optimized for better performance.
+- Generic overloads: Supports any type `T` that implements a certain interface, expanding beyond just spans of float values in .NET.
 
-- **Expanded Method Scope:** Increased from 40 to nearly 200 overloads, now including numerical operations similar to `Math`, `MathF`, and `INumber<T>`, but for spans of values.
-- **Performance Enhancements:** Many operations are now SIMD-optimized for better performance.
-- **Generic Overloads:** Supports any T that implements a certain interface, expanding beyond just spans of float values in .NET.
-
-#### Tensor
+### Tensor\<T>
 
 - Builds on top of `TensorPrimitives` for efficient math operations.
 - Provides efficient interop with AI libraries (ML.NET, TorchSharp, ONNX Runtime) using zero copies where possible.
 - Enables easy and efficient data manipulation with indexing and slicing operations.
 
-## ML.NET
+### ML.NET
 
 [ML.NET](https://www.nuget.org/packages/Microsoft.ML/) is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications.
 
 ML.NET 4.0 brings the following improvements:
 
-- New ways to programatically configure `MLContext` options.
+- New ways to programmatically configure `MLContext` options.
 - Load ONNX models as `Stream`.
 - DataFrame improvements.
+- New capabilities for [tokenizers](#tokenizers).
 - (Experimental) TorchSharp ports of Llama and Phi family of models.
-- (Experimental) CausalLM pipeline APIs
+- (Experimental) CausalLM pipeline APIs.
 
 For more information, see [What's new in ML.NET](../../../machine-learning/whats-new/overview.md).
 
+#### Tokenizers
+
+The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides .NET developers with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and preprocess text when working with local models.
+
+The latest release introduces significant new capabilities for tokenizers:
+
+- Tiktoken for GPT (3, 3.5, 4, 4o, o1) and Llam3 models
+- Llama (based on SentencePiece) for Llama and Mistral models
+- CodeGen for code-generation models like codegen-350M-mono
+- Phi2 (based on CodeGen) for Microsoft Phi2 model
+- WordPiece
+- Bert (based on WordPiece) for Bert-supported models like optimum--all-MiniLM-L6-v2
+
 ## .NET Aspire
 
 .NET Aspire is an opinionated, cloud-ready stack for building observable, production ready, distributed applications. .NET Aspire is delivered through a collection of NuGet packages that handle specific cloud-native concerns, and is available in preview for .NET 9. For more information, see [.NET Aspire](/dotnet/aspire).
