From 3b9c37235d6cd2764bf65939a205b147ed03f2ab Mon Sep 17 00:00:00 2001
From: Maria N
Date: Mon, 11 Nov 2024 07:54:19 -0500
Subject: [PATCH 01/10] ML.NET to AI and Machine Learning

update the ML.NET title
---
 docs/core/whats-new/dotnet-9/overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index 9625a15d1deb9..476da2b727825 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -53,7 +53,7 @@ The .NET 9 SDK introduces _workload sets_, where all of your workloads stay at a

 For more information, see [What's new in the SDK for .NET 9](sdk.md).

-## ML.NET
+## AI and Machine Learning

 ML.NET is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications. The latest version, ML.NET 4.0, adds [additional tokenizer support](../../../machine-learning/whats-new/overview.md#additional-tokenizer-support) for tokenizers such as Tiktoken and models such as Llama and CodeGen.

From f6a4f1ef080a1b0d5b28b13a10acee64dd31b163 Mon Sep 17 00:00:00 2001
From: Luis Quintanilla <46974588+luisquintanilla@users.noreply.github.com>
Date: Mon, 11 Nov 2024 15:18:44 -0500
Subject: [PATCH 02/10] Initial commit. Still WIP

---
 docs/core/whats-new/dotnet-9/overview.md | 32 +++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index 476da2b727825..e939f3337dfcf 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -53,7 +53,37 @@ The .NET 9 SDK introduces _workload sets_, where all of your workloads stay at a

 For more information, see [What's new in the SDK for .NET 9](sdk.md).

-## AI and Machine Learning
+## AI Building Blocks and Fundamentals
+
+### Tokenizers
+
+The Microsoft.ML.Tokenizers library provides .NET developer with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and pre-process text when working with local models.
+
+The latest release introduces significant new capabilities:
+
+- Tokenizers
+  - SentencePiece
+  - WordPiece
+  - BERT
+  - CodeGen
+- Built-in tokenizers for the following models:
+  - GPT (3, 3.5, 4, 4o, o1)
+  - Llama
+  - Phi
+
+### Tensors
+
+
+
+### Microsoft.Extensions.AI
+
+
+
+### Microsoft.Extensions.VectorData
+
+
+
+## ML.NET

 ML.NET is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications. The latest version, ML.NET 4.0, adds [additional tokenizer support](../../../machine-learning/whats-new/overview.md#additional-tokenizer-support) for tokenizers such as Tiktoken and models such as Llama and CodeGen.

From 75f4e8d189d788f964f2d5244d8a7b1f5be16a3d Mon Sep 17 00:00:00 2001
From: Luis Quintanilla <46974588+luisquintanilla@users.noreply.github.com>
Date: Mon, 11 Nov 2024 18:42:41 -0500
Subject: [PATCH 03/10] Updates AI sections

---
 docs/core/whats-new/dotnet-9/overview.md | 35 ++++++++++++++++++------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index e939f3337dfcf..6bd79f238ead7 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -62,30 +62,49 @@ The Microsoft.ML.Tokenizers library provides .NET developer with capabilities fo
 The latest release introduces significant new capabilities:

 - Tokenizers
+  - Byte-Level BPE
   - SentencePiece
   - WordPiece
-  - BERT
-  - CodeGen
 - Built-in tokenizers for the following models:
   - GPT (3, 3.5, 4, 4o, o1)
   - Llama
   - Phi
+  - BERT
+  - CodeGen

 ### Tensors

-
+In .NET 9, `TensorPrimitives` and the new `Tensor` type expand AI capabilities by enabling efficient encoding, manipulation, and computation of multi-dimensional data.
+
+Improvements in the latest release of System.Numerics.Tensors include:

-### Microsoft.Extensions.AI
+#### TensorPrimitives

-
+- **Expanded Method Scope:** Increased from 40 to nearly 200 overloads, now including numerical operations similar to `Math`, `MathF`, and `INumber`, but for spans of values.
+- **Performance Enhancements:** Many operations are now SIMD-optimized for better performance.
+- **Generic Overloads:** Supports any T that implements a certain interface, expanding beyond just spans of float values in .NET

-### Microsoft.Extensions.VectorData
+#### Tensor

-
+- Builds on top of `TensorPrimitives` for efficient math operations.
+- Provides efficient interop with AI libraries (ML.NET, TorchSharp, ONNX Runtime) using zero copies where possible.
+- Enables easy and efficient data manipulation with indexing and slicing operations.
+
+### Microsoft.Extensions.AI & Microsoft.Extensions.VectorData
+
+.NET 9 introduces a unified layer of C# abstractions through Microsoft.Extensions.AI and Microsoft.Extensions.VectorData. These abstractions facilitate interaction with AI services, including small and large language models (SLMs and LLMs), embeddings, vector stores, and middleware.

 ## ML.NET

-ML.NET is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications. The latest version, ML.NET 4.0, adds [additional tokenizer support](../../../machine-learning/whats-new/overview.md#additional-tokenizer-support) for tokenizers such as Tiktoken and models such as Llama and CodeGen.
+ML.NET is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications.
+
+ML.NET 4.0 brings the following improvements:
+
+- New ways to programatically configure `MLContext` options.
+- Load ONNX models as `Stream`.
+- DataFrame improvements.
+- (Experimental) TorchSharp ports of Llama and Phi family of models.
+- (Experimental) CausalLM pipeline APIs

 ## .NET Aspire

From 1c80c4c4f4795ec28203927bd91b5fec1e873d98 Mon Sep 17 00:00:00 2001
From: Luis Quintanilla <46974588+luisquintanilla@users.noreply.github.com>
Date: Mon, 11 Nov 2024 18:48:47 -0500
Subject: [PATCH 04/10] Rearrange to place MEAI and VD upfront

---
 docs/core/whats-new/dotnet-9/overview.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index 6bd79f238ead7..6ce8cf0e9b90f 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -55,6 +55,10 @@ For more information, see [What's new in the SDK for .NET 9](sdk.md).

 ## AI Building Blocks and Fundamentals

+### Microsoft.Extensions.AI & Microsoft.Extensions.VectorData
+
+.NET 9 introduces a unified layer of C# abstractions through [Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI.Abstractions/) and [Microsoft.Extensions.VectorData](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions/). These abstractions facilitate interaction with AI services, including small and large language models (SLMs and LLMs), embeddings, vector stores, and middleware.
+
 ### Tokenizers

 The Microsoft.ML.Tokenizers library provides .NET developer with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and pre-process text when working with local models.
@@ -90,10 +94,6 @@ Improvements in the latest release of System.Numerics.Tensors include:
 - Provides efficient interop with AI libraries (ML.NET, TorchSharp, ONNX Runtime) using zero copies where possible.
 - Enables easy and efficient data manipulation with indexing and slicing operations.

-### Microsoft.Extensions.AI & Microsoft.Extensions.VectorData
-
-.NET 9 introduces a unified layer of C# abstractions through Microsoft.Extensions.AI and Microsoft.Extensions.VectorData. These abstractions facilitate interaction with AI services, including small and large language models (SLMs and LLMs), embeddings, vector stores, and middleware.
-
 ## ML.NET

 ML.NET is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications.

From d9e01393e3409b09daf7dee3ba504d66ece70534 Mon Sep 17 00:00:00 2001
From: Luis Quintanilla <46974588+luisquintanilla@users.noreply.github.com>
Date: Mon, 11 Nov 2024 18:51:24 -0500
Subject: [PATCH 05/10] Add link to ML.NET package

---
 docs/core/whats-new/dotnet-9/overview.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index 6ce8cf0e9b90f..233151b656f44 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -61,7 +61,7 @@ For more information, see [What's new in the SDK for .NET 9](sdk.md).

 ### Tokenizers

-The Microsoft.ML.Tokenizers library provides .NET developer with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and pre-process text when working with local models.
+The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides .NET developer with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and pre-process text when working with local models.

 The latest release introduces significant new capabilities:

@@ -80,7 +80,7 @@ The latest release introduces significant new capabilities:

 In .NET 9, `TensorPrimitives` and the new `Tensor` type expand AI capabilities by enabling efficient encoding, manipulation, and computation of multi-dimensional data.

-Improvements in the latest release of System.Numerics.Tensors include:
+Improvements in the latest release of [System.Numerics.Tensors](https://www.nuget.org/packages/System.Numerics.Tensors/) include:

 #### TensorPrimitives

@@ -96,7 +96,7 @@ Improvements in the latest release of System.Numerics.Tensors include:

 ## ML.NET

-ML.NET is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications.
+[ML.NET](https://www.nuget.org/packages/Microsoft.ML/) is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications.

 ML.NET 4.0 brings the following improvements:

From 4e6f28f9f5441c23f8da18e12b751788e6bb3697 Mon Sep 17 00:00:00 2001
From: James Montemagno
Date: Mon, 11 Nov 2024 16:34:31 -0800
Subject: [PATCH 06/10] Apply suggestions from code review

---
 docs/core/whats-new/dotnet-9/overview.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index 233151b656f44..0f89a312c0d20 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -66,7 +66,6 @@ The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokeni
 The latest release introduces significant new capabilities:

 - Tokenizers
-  - Byte-Level BPE
   - SentencePiece
   - WordPiece
 - Built-in tokenizers for the following models:
@@ -86,7 +85,7 @@ Improvements in the latest release of [System.Numerics.Tensors](https://www.nuge

 - **Expanded Method Scope:** Increased from 40 to nearly 200 overloads, now including numerical operations similar to `Math`, `MathF`, and `INumber`, but for spans of values.
 - **Performance Enhancements:** Many operations are now SIMD-optimized for better performance.
-- **Generic Overloads:** Supports any T that implements a certain interface, expanding beyond just spans of float values in .NET
+- **Generic Overloads:** Supports any T that implements a certain interface, expanding beyond just spans of float values in .NET.

 #### Tensor

From f29a82643f3e1679db9b41a407b48d92320f63d4 Mon Sep 17 00:00:00 2001
From: James Montemagno
Date: Mon, 11 Nov 2024 16:58:26 -0800
Subject: [PATCH 07/10] Apply suggestions from code review

Co-authored-by: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
---
 docs/core/whats-new/dotnet-9/overview.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index 0f89a312c0d20..a922c0ba65ceb 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -53,15 +53,15 @@ The .NET 9 SDK introduces _workload sets_, where all of your workloads stay at a

 For more information, see [What's new in the SDK for .NET 9](sdk.md).

-## AI Building Blocks and Fundamentals
+## AI building blocks and fundamentals

 ### Microsoft.Extensions.AI & Microsoft.Extensions.VectorData

-.NET 9 introduces a unified layer of C# abstractions through [Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI.Abstractions/) and [Microsoft.Extensions.VectorData](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions/). These abstractions facilitate interaction with AI services, including small and large language models (SLMs and LLMs), embeddings, vector stores, and middleware.
+.NET 9 introduces a unified layer of C# abstractions through the [Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI.Abstractions/) and [Microsoft.Extensions.VectorData](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions/) packages. These abstractions facilitate interaction with AI services, including small and large language models (SLMs and LLMs), embeddings, vector stores, and middleware.

 ### Tokenizers

-The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides .NET developer with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and pre-process text when working with local models.
+The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides .NET developers with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and preprocess text when working with local models.

 The latest release introduces significant new capabilities:

@@ -105,6 +105,8 @@ ML.NET 4.0 brings the following improvements:
 - (Experimental) TorchSharp ports of Llama and Phi family of models.
 - (Experimental) CausalLM pipeline APIs

+For more information, see [What's new in ML.NET](dotnet/machine-learning/whats-new/overview).
+
 ## .NET Aspire

 .NET Aspire is an opinionated, cloud-ready stack for building observable, production ready, distributed applications.​ .NET Aspire is delivered through a collection of NuGet packages that handle specific cloud-native concerns, and is available in preview for .NET 9. For more information, see [.NET Aspire](/dotnet/aspire).
From 130523783b773349bde07814fbe3ed54b21bf5ea Mon Sep 17 00:00:00 2001
From: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
Date: Mon, 11 Nov 2024 20:20:56 -0800
Subject: [PATCH 08/10] Apply suggestions from code review

Co-authored-by: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com>
---
 docs/core/whats-new/dotnet-9/overview.md | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index a922c0ba65ceb..62cfa886f1f30 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -66,14 +66,12 @@ The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokeni
 The latest release introduces significant new capabilities:

 - Tokenizers
-  - SentencePiece
+  - Tiktoken for GPT (3, 3.5, 4, 4o, o1) and Llam3 models
+  - Llama (based on SentencePiece) for Llama and Mistral models
+  - CodeGen for code generation models like codegen-350M-mono
+  - Phi2 (based on CodeGen) for Microsoft Phi2 model
   - WordPiece
-- Built-in tokenizers for the following models:
-  - GPT (3, 3.5, 4, 4o, o1)
-  - Llama
-  - Phi
-  - BERT
-  - CodeGen
+  - Bert (based on WordPiece) for Bert supported models like optimum--all-MiniLM-L6-v2

 ### Tensors

From 295b53b08560523e51e301ebd936192b5470596c Mon Sep 17 00:00:00 2001
From: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
Date: Mon, 11 Nov 2024 20:25:48 -0800
Subject: [PATCH 09/10] Update docs/core/whats-new/dotnet-9/overview.md

---
 docs/core/whats-new/dotnet-9/overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index 62cfa886f1f30..76809c1882a57 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -103,7 +103,7 @@ ML.NET 4.0 brings the following improvements:
 - (Experimental) TorchSharp ports of Llama and Phi family of models.
 - (Experimental) CausalLM pipeline APIs

-For more information, see [What's new in ML.NET](dotnet/machine-learning/whats-new/overview).
+For more information, see [What's new in ML.NET](../../../machine-learning/whats-new/overview.md).

 ## .NET Aspire

From bb3802ef14e6e1acb440cebe32b82446d5df354d Mon Sep 17 00:00:00 2001
From: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
Date: Tue, 12 Nov 2024 09:09:48 -0800
Subject: [PATCH 10/10] move tokenizers under ml.net

---
 docs/core/whats-new/dotnet-9/overview.md | 54 +++++++++++-------------
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git a/docs/core/whats-new/dotnet-9/overview.md b/docs/core/whats-new/dotnet-9/overview.md
index 76809c1882a57..0c383146cf795 100644
--- a/docs/core/whats-new/dotnet-9/overview.md
+++ b/docs/core/whats-new/dotnet-9/overview.md
@@ -53,58 +53,52 @@ The .NET 9 SDK introduces _workload sets_, where all of your workloads stay at a

 For more information, see [What's new in the SDK for .NET 9](sdk.md).

-## AI building blocks and fundamentals
-
-### Microsoft.Extensions.AI & Microsoft.Extensions.VectorData
+## AI building blocks

 .NET 9 introduces a unified layer of C# abstractions through the [Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI.Abstractions/) and [Microsoft.Extensions.VectorData](https://www.nuget.org/packages/Microsoft.Extensions.VectorData.Abstractions/) packages. These abstractions facilitate interaction with AI services, including small and large language models (SLMs and LLMs), embeddings, vector stores, and middleware.

-### Tokenizers
-
-The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides .NET developers with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and preprocess text when working with local models.
-
-The latest release introduces significant new capabilities:
-
-- Tokenizers
-  - Tiktoken for GPT (3, 3.5, 4, 4o, o1) and Llam3 models
-  - Llama (based on SentencePiece) for Llama and Mistral models
-  - CodeGen for code generation models like codegen-350M-mono
-  - Phi2 (based on CodeGen) for Microsoft Phi2 model
-  - WordPiece
-  - Bert (based on WordPiece) for Bert supported models like optimum--all-MiniLM-L6-v2
-
-### Tensors
-
-In .NET 9, `TensorPrimitives` and the new `Tensor` type expand AI capabilities by enabling efficient encoding, manipulation, and computation of multi-dimensional data.
+.NET 9 also includes new tensor types that expand AI capabilities. `TensorPrimitives` and the new `Tensor<T>` type enable efficient encoding, manipulation, and computation of multi-dimensional data. You can find these types in the latest release of the [System.Numerics.Tensors package](https://www.nuget.org/packages/System.Numerics.Tensors/).

-Improvements in the latest release of [System.Numerics.Tensors](https://www.nuget.org/packages/System.Numerics.Tensors/) include:
+### TensorPrimitives

-#### TensorPrimitives
+- Expanded method scope: Increased from 40 to nearly 200 overloads, now including numerical operations similar to `Math`, `MathF`, and `INumber` but for spans of values.
+- Performance enhancements: Many operations are now SIMD-optimized for better performance.
+- Generic overloads: Supports any type `T` that implements a certain interface, expanding beyond just spans of float values in .NET.

-- **Expanded Method Scope:** Increased from 40 to nearly 200 overloads, now including numerical operations similar to `Math`, `MathF`, and `INumber`, but for spans of values.
-- **Performance Enhancements:** Many operations are now SIMD-optimized for better performance.
-- **Generic Overloads:** Supports any T that implements a certain interface, expanding beyond just spans of float values in .NET.
-
-#### Tensor
+### Tensor\<T>

 - Builds on top of `TensorPrimitives` for efficient math operations.
 - Provides efficient interop with AI libraries (ML.NET, TorchSharp, ONNX Runtime) using zero copies where possible.
 - Enables easy and efficient data manipulation with indexing and slicing operations.

-## ML.NET
+### ML.NET

 [ML.NET](https://www.nuget.org/packages/Microsoft.ML/) is an open-source, cross-platform framework that enables integration of custom machine-learning models into .NET applications.

 ML.NET 4.0 brings the following improvements:

-- New ways to programatically configure `MLContext` options.
+- New ways to programmatically configure `MLContext` options.
 - Load ONNX models as `Stream`.
 - DataFrame improvements.
+- New capabilities for [tokenizers](#tokenizers).
 - (Experimental) TorchSharp ports of Llama and Phi family of models.
-- (Experimental) CausalLM pipeline APIs
+- (Experimental) CausalLM pipeline APIs.

 For more information, see [What's new in ML.NET](../../../machine-learning/whats-new/overview.md).

+#### Tokenizers
+
+The [Microsoft.ML.Tokenizers](https://www.nuget.org/packages/Microsoft.ML.Tokenizers) library provides .NET developers with capabilities for encoding and decoding text to tokens. For AI scenarios, this is important to manage context, calculate cost, and preprocess text when working with local models.
+
+The latest release introduces significant new capabilities for tokenizers:
+
+- Tiktoken for GPT (3, 3.5, 4, 4o, o1) and Llama 3 models
+- Llama (based on SentencePiece) for Llama and Mistral models
+- CodeGen for code-generation models like codegen-350M-mono
+- Phi2 (based on CodeGen) for Microsoft Phi2 model
+- WordPiece
+- Bert (based on WordPiece) for Bert-supported models like optimum--all-MiniLM-L6-v2
+
 ## .NET Aspire

 .NET Aspire is an opinionated, cloud-ready stack for building observable, production ready, distributed applications.​ .NET Aspire is delivered through a collection of NuGet packages that handle specific cloud-native concerns, and is available in preview for .NET 9. For more information, see [.NET Aspire](/dotnet/aspire).