From aafcc19a7f27964147c3bcc5249322e8ecd37a8b Mon Sep 17 00:00:00 2001 From: David Pine Date: Mon, 16 Dec 2024 14:57:06 -0600 Subject: [PATCH 1/6] Add the runtime libraries bits for MEAI --- docs/ai/ai-extensions.md | 6 +- .../extensions/artificial-intelligence.md | 632 ++++++++++++++++++ docs/core/extensions/http-ratelimiter.md | 12 +- .../snippets/ai/ConsoleAI/ConsoleAI.csproj | 14 + .../snippets/ai/ConsoleAI/Program.cs | 8 + .../snippets/ai/ConsoleAI/SampleChatClient.cs | 58 ++ .../ai/ConsoleAI/SampleEmbeddingGenerator.cs | 35 + docs/fundamentals/toc.yml | 3 + 8 files changed, 760 insertions(+), 8 deletions(-) create mode 100644 docs/core/extensions/artificial-intelligence.md create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI/ConsoleAI.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI/SampleChatClient.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI/SampleEmbeddingGenerator.cs diff --git a/docs/ai/ai-extensions.md b/docs/ai/ai-extensions.md index 226e4ec178e82..b9a539f310948 100644 --- a/docs/ai/ai-extensions.md +++ b/docs/ai/ai-extensions.md @@ -1,7 +1,7 @@ --- title: Unified AI building blocks for .NET description: Learn how to develop with unified AI building blocks for .NET using Microsoft.Extensions.AI and Microsoft.Extensions.AI.Abstractions libraries -ms.date: 11/04/2024 +ms.date: 12/16/2024 ms.topic: quickstart ms.custom: devx-track-dotnet, devx-track-dotnet-ai author: alexwolfmsft @@ -16,11 +16,13 @@ The .NET ecosystem provides abstractions for integrating AI services into .NET a - How to work with AI abstractions in your apps and the benefits they offer. - Essential AI middleware concepts. +For more information, see [Introduction to Microsoft.Extensions.AI](../core/extensions/artificial-intelligence.md). + ## What is the Microsoft.Extensions.AI library? `Microsoft.Extensions.AI` is a set of core .NET libraries created in collaboration with developers across the .NET ecosystem, including Semantic Kernel. These libraries provide a unified layer of C# abstractions for interacting with AI services, such as small and large language models (SLMs and LLMs), embeddings, and middleware. -:::image type="content" source="media/ai-extensions/meai-architecture-diagram.png" alt-text="An architectural diagram of the AI extensions libraries."::: +:::image type="content" source="media/ai-extensions/meai-architecture-diagram.png" lightbox="media/ai-extensions/meai-architecture-diagram.png" alt-text="An architectural diagram of the AI extensions libraries."::: `Microsoft.Extensions.AI` provides abstractions that can be implemented by various services, all adhering to the same core concepts. This library is not intended to provide APIs tailored to any specific provider's services. The goal of `Microsoft.Extensions.AI` is to act as a unifying layer within the .NET ecosystem, enabling developers to choose their preferred frameworks and libraries while ensuring seamless integration and collaboration across the ecosystem. diff --git a/docs/core/extensions/artificial-intelligence.md b/docs/core/extensions/artificial-intelligence.md new file mode 100644 index 0000000000000..aa1e99669b102 --- /dev/null +++ b/docs/core/extensions/artificial-intelligence.md @@ -0,0 +1,632 @@ +--- +title: Artificial Intelligence in .NET (Preview) +description: Learn how to use the Microsoft.Extensions.AI library to integrate and interact with various AI services in your .NET applications. 
author: IEvangelist
ms.author: dapine
ms.date: 12/16/2024
---

# Artificial Intelligence in .NET (Preview)

With a growing variety of artificial intelligence (AI) services available, developers need a way to integrate and interact with these services in their .NET applications. The `Microsoft.Extensions.AI` library provides a unified approach for representing generative AI components, enabling seamless integration and interoperability with various AI services. This article introduces the library, its installation, and usage examples to help you get started.

## Install the package

To install the [📦 Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI) NuGet package, use either the .NET CLI or add a package reference directly to your C# project file:

### [.NET CLI](#tab/dotnet-cli)

```dotnetcli
dotnet add package Microsoft.Extensions.AI --prerelease
```

### [PackageReference](#tab/package-reference)

```xml
<PackageReference Include="Microsoft.Extensions.AI" Version="*-*" />
```

---

For more information, see [dotnet add package](../tools/dotnet-add-package.md) or [Manage package dependencies in .NET applications](../tools/dependencies.md).

## Usage examples

The `IChatClient` interface defines a client abstraction responsible for interacting with AI services that provide chat capabilities. It includes methods for sending and receiving messages with multi-modal content (text, images, audio, etc.), either as a complete set or streamed incrementally. Additionally, it provides metadata information about the client and allows retrieving strongly typed services.

> [!IMPORTANT]
> For more usage examples and real-world scenarios, see [AI for .NET developers](/dotnet/ai/).

### The `IChatClient` interface

The following sample implements `IChatClient` to show the general structure.

:::code language="csharp" source="snippets/ai/ConsoleAI/SampleChatClient.cs":::

You can find other concrete implementations of `IChatClient` in the following NuGet packages:

- [📦 Microsoft.Extensions.AI.AzureAIInference](https://www.nuget.org/packages/Microsoft.Extensions.AI.AzureAIInference): Implementation backed by [Azure AI Model Inference API](/azure/ai-studio/reference/reference-model-inference-api).
- [📦 Microsoft.Extensions.AI.Ollama](https://www.nuget.org/packages/Microsoft.Extensions.AI.Ollama): Implementation backed by [Ollama](https://ollama.com/).
- [📦 Microsoft.Extensions.AI.OpenAI](https://www.nuget.org/packages/Microsoft.Extensions.AI.OpenAI): Implementation backed by either [OpenAI](https://openai.com/) or OpenAI-compatible endpoints (such as [Azure OpenAI](https://azure.microsoft.com/products/ai-services/openai-service)).

#### Request chat completion

To request a completion, call the `CompleteAsync` method. The request is composed of one or more messages, each of which is composed of one or more pieces of content. Accelerator methods exist to simplify common cases, such as constructing a request for a single piece of text content.

:::code language="csharp" source="snippets/ai/ConsoleAI/Program.cs":::

The core `IChatClient.CompleteAsync` method accepts a list of messages. This list represents the history of all messages that are part of the conversation.

```csharp
using Microsoft.Extensions.AI;

IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "my-custom-model");

Console.WriteLine(await client.CompleteAsync(
[
    new(ChatRole.System, "You are a helpful AI assistant"),
    new(ChatRole.User, "What is AI?"),
]));
```

Each message in the history is represented by a `ChatMessage` object.
The `ChatMessage` class provides a `Role` property that indicates the role of the message. By default, the `ChatRole.User` role is used. The following roles are available:

- `ChatRole.System`: Instructs or sets the behavior of the assistant.
- `ChatRole.Assistant`: Provides responses to system-instructed, user-prompted input.
- `ChatRole.Tool`: Provides additional information and references for chat completions.
- `ChatRole.User`: Provides input for chat completions.

Each chat message is instantiated, assigning to its `Contents` property—a new `AIContent`. There are various [types of content](xref:Microsoft.Extensions.AI.AIContent) that may be represented, such as a simple string, or it may be a more complex object that represents a multi-modal message with text, images, audio, etc.

#### Request chat completion with streaming

The inputs to `CompleteStreamingAsync` are identical to those of `CompleteAsync`. However, rather than returning the complete response as part of a `ChatCompletion` object, the method returns an `IAsyncEnumerable<T>` where `T` is `StreamingChatCompletionUpdate`, providing a stream of updates that collectively form the single response.

```csharp
using Microsoft.Extensions.AI;

IChatClient client = new SampleChatClient(
    new Uri("http://coolsite.ai"), "my-custom-model");

await foreach (var update in client.CompleteStreamingAsync("What is AI?"))
{
    Console.Write(update);
}
```

> [!TIP]
> Streaming APIs are nearly synonymous with AI user experiences. C# enables compelling scenarios with its `IAsyncEnumerable<T>` support, allowing for a natural and efficient way to stream data.

#### Tool calling

Some models and services support _tool calling_, where requests can include tools for the model to invoke functions to gather additional information. Instead of sending a final response, the model requests a function invocation with specific arguments. The client then invokes the function and sends the results back to the model along with the conversation history. The `Microsoft.Extensions.AI` library includes abstractions for various message content types, including function call requests and results. While consumers can interact with this content directly, `Microsoft.Extensions.AI` automates these interactions and provides:

- `AIFunction`: Represents a function that can be described to an AI service and invoked.
- `AIFunctionFactory`: Provides factory methods for creating commonly used implementations of `AIFunction`.
- `FunctionInvokingChatClient`: Wraps an `IChatClient` to add automatic function invocation capabilities.

Consider the following example, that demonstrates a random function invocation:

```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;

[Description("Gets the current weather")]
string GetCurrentWeather() => Random.Shared.NextDouble() > 0.5
    ? "It's sunny"
    : "It's raining";

IChatClient client = new ChatClientBuilder(
        new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseFunctionInvocation()
    .Build();

var response = client.CompleteStreamingAsync(
    "Should I wear a rain coat?",
    new() { Tools = [ AIFunctionFactory.Create(GetCurrentWeather) ] });

await foreach (var update in response)
{
    Console.Write(update);
}
```

The preceding code:

- Defines a function named `GetCurrentWeather` that returns a random weather forecast.
  - This function is decorated with a `Description` attribute, which is used to provide a description of the function to the AI service.
- Instantiates a `ChatClientBuilder` with an `OllamaChatClient` and configures it to use function invocation.
- Calls `CompleteStreamingAsync` on the client, passing a prompt and a list of tools that includes a function created with `AIFunctionFactory.Create`.
- Iterates over the response, printing each update to the console.

#### Cache responses

If you're familiar with [Caching in .NET](caching.md), it's good to know that `Microsoft.Extensions.AI` provides other such delegating `IChatClient` implementations. The `DistributedCachingChatClient` is an `IChatClient` that layers caching around another arbitrary `IChatClient` instance. When a unique chat history is submitted to the `DistributedCachingChatClient`, it forwards it along to the underlying client, and then caches the response before it being sent back to the consumer. The next time the same history is submitted, such that a cached response can be found in the cache, the `DistributedCachingChatClient` can return back the cached response rather than needing to forward the request along the pipeline.

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;

var sampleChatClient = new SampleChatClient(
    new Uri("http://coolsite.ai"), "my-custom-model");

IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .Build();

string[] prompts = ["What is AI?", "What is .NET?", "What is AI?"];

foreach (var prompt in prompts)
{
    await foreach (var update in client.CompleteStreamingAsync(prompt))
    {
        Console.Write(update);
    }

    Console.WriteLine();
}
```

#### Use telemetry

Another example of a delegating chat client is the `OpenTelemetryChatClient`. This implementation adheres to the [OpenTelemetry Semantic Conventions for Generative AI systems](https://opentelemetry.io/docs/specs/semconv/gen-ai/). Similar to other `IChatClient` delegators, it layers metrics and spans around any underlying `IChatClient` implementation, providing enhanced observability.

```csharp
using Microsoft.Extensions.AI;
using OpenTelemetry.Trace;

// Configure OpenTelemetry exporter
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

var sampleChatClient = new SampleChatClient(
    new Uri("http://coolsite.ai"), "my-custom-model");

IChatClient client = new ChatClientBuilder(sampleChatClient)
    .UseOpenTelemetry(sourceName, static c => c.EnableSensitiveData = true)
    .Build();

Console.WriteLine((await client.CompleteAsync("What is AI?")).Message);
```

#### Provide options

Every call to `CompleteAsync` or `CompleteStreamingAsync` may optionally supply a `ChatOptions` instance containing additional parameters for the operation. The most common parameters among AI models and services show up as strongly typed properties on the `ChatOptions` type, such as `ChatOptions.Temperature`. Other parameters can be supplied by name in a weakly typed manner via the `ChatOptions.AdditionalProperties` dictionary.

Options may also be specified when building an `IChatClient` with the fluent API, and chaining a call to the `ConfigureOptions` extension method. This delegating client wraps another client and invokes the supplied delegate to populate a `ChatOptions` instance for every call.
For example, to ensure that the `ChatOptions.ModelId` property defaults to a particular model name, code like the following can be used:

```csharp
using Microsoft.Extensions.AI;

IChatClient client = new ChatClientBuilder(
        new OllamaChatClient(new Uri("http://localhost:11434")))
    .ConfigureOptions(options => options.ModelId ??= "phi3")
    .Build();

// will request "phi3"
Console.WriteLine(await client.CompleteAsync("What is AI?"));

// will request "llama3.1"
Console.WriteLine(await client.CompleteAsync("What is AI?", new() { ModelId = "llama3.1" }));
```

#### Functionality pipelines

`IChatClient` instances can be layered to create a pipeline of components, each adding specific functionality. These components can come from `Microsoft.Extensions.AI`, other NuGet packages, or custom implementations. This approach allows you to augment the behavior of the `IChatClient` in various ways to meet your specific needs. Consider the following example code that layers a distributed cache, function invocation, and OpenTelemetry tracing around a sample chat client:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenTelemetry.Trace;

// Configure OpenTelemetry exporter
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

// Explore changing the order of the intermediate "Use" calls to see the impact
// that has on what gets cached, traced, etc.
IChatClient client = new ChatClientBuilder(
        new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1"))
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .UseFunctionInvocation()
    .UseOpenTelemetry(sourceName, static c => c.EnableSensitiveData = true)
    .Build();

ChatOptions options = new()
{
    Tools =
    [
        AIFunctionFactory.Create(
            () => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining",
            name: "GetCurrentWeather",
            description: "Gets the current weather")
    ]
};

for (int i = 0; i < 3; ++i)
{
    List<ChatMessage> history =
    [
        new ChatMessage(ChatRole.System, "You are a helpful AI assistant"),
        new ChatMessage(ChatRole.User, "Do I need an umbrella?")
    ];

    Console.WriteLine(await client.CompleteAsync(history, options));
}
```

#### Custom `IChatClient` middleware

To add additional functionality, you can implement `IChatClient` directly or use the `DelegatingChatClient` class. This class serves as a base for creating chat clients that delegate operations to another `IChatClient` instance. It simplifies chaining multiple clients, allowing calls to pass through to an underlying client.

The `DelegatingChatClient` class provides default implementations for methods like `CompleteAsync`, `CompleteStreamingAsync`, and `Dispose`, which forward calls to the inner client. You can derive from this class and override only the methods you need to enhance behavior, while delegating other calls to the base implementation. This approach helps create flexible and modular chat clients that are easy to extend and compose.
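In its simplest form, a derived client overrides only the method it cares about and lets everything else fall through. The following minimal sketch illustrates that shape; the `LoggingChatClient` name and its log messages are illustrative only, not part of the library:

```csharp
using Microsoft.Extensions.AI;

// Minimal sketch of a delegating client: only CompleteAsync is overridden.
// CompleteStreamingAsync, GetService, and Dispose all fall through to the
// inner client via the DelegatingChatClient base class.
public sealed class LoggingChatClient(IChatClient innerClient)
    : DelegatingChatClient(innerClient)
{
    public override async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        Console.WriteLine($"Sending {chatMessages.Count} message(s).");

        ChatCompletion completion =
            await base.CompleteAsync(chatMessages, options, cancellationToken);

        Console.WriteLine("Received a completion.");
        return completion;
    }
}
```

The rate-limiting example that follows applies the same pattern to both the synchronous and streaming call paths.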
The following is an example class derived from `DelegatingChatClient` to provide rate limiting functionality, utilizing the `RateLimiter`:

```csharp
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;
using System.Runtime.CompilerServices;

public sealed class RateLimitingChatClient(
    IChatClient innerClient, RateLimiter rateLimiter)
    : DelegatingChatClient(innerClient)
{
    public override async Task<ChatCompletion> CompleteAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);

        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        return await base.CompleteAsync(chatMessages, options, cancellationToken)
            .ConfigureAwait(false);
    }

    public override async IAsyncEnumerable<StreamingChatCompletionUpdate> CompleteStreamingAsync(
        IList<ChatMessage> chatMessages,
        ChatOptions? options = null,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);

        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        await foreach (var update in base.CompleteStreamingAsync(chatMessages, options, cancellationToken)
            .ConfigureAwait(false))
        {
            yield return update;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }

        base.Dispose(disposing);
    }
}
```

Composition of the `RateLimitingChatClient` with another client is straightforward:

```csharp
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

var client = new RateLimitingChatClient(
    new SampleChatClient(new Uri("http://localhost"), "test"),
    new ConcurrencyLimiter(new() { PermitLimit = 1, QueueLimit = int.MaxValue }));

await client.CompleteAsync("What color is the sky?");
```

To simplify the composition of such components with others, the author of the component is recommended to create a `Use*` extension method for registering this component into a pipeline, for example consider the following:

```csharp
public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder, RateLimiter rateLimiter) =>
        builder.Use(innerClient => new RateLimitingChatClient(innerClient, rateLimiter));
}
```

Such extensions may also query for relevant services from the DI container; the `IServiceProvider` used by the pipeline is passed in as an optional parameter:

```csharp
public static class RateLimitingChatClientExtensions
{
    public static ChatClientBuilder UseRateLimiting(
        this ChatClientBuilder builder, RateLimiter? rateLimiter = null) =>
        builder.Use((innerClient, services) =>
            new RateLimitingChatClient(
                innerClient,
                rateLimiter ?? services.GetRequiredService<RateLimiter>()));
}
```

The consumer can then easily use this in their pipeline, for example:

```csharp
var client = new SampleChatClient(new Uri("http://localhost"), "test")
    .AsBuilder()
    .UseDistributedCache()
    .UseRateLimiting()
    .UseOpenTelemetry()
    .Build(services);
```

The preceding extension methods demonstrate using a `Use` method on `ChatClientBuilder`. The `ChatClientBuilder` also provides `Use` overloads that make it easier to write such delegating handlers.
+ +- +- +- + +For example, in the earlier `RateLimitingChatClient` example, the overrides of `CompleteAsync` and `CompleteStreamingAsync` only need to do work before and after delegating to the next client in the pipeline. To achieve the same thing without writing a custom class, an overload of `Use` may be used that accepts a delegate which is used for both `CompleteAsync` and `CompleteStreamingAsync`, reducing the boilerplate required: + +```csharp +RateLimiter rateLimiter = new ConcurrencyLimiter(new() +{ + PermitLimit = 1, + QueueLimit = int.MaxValue +}); + +var client = new SampleChatClient(new Uri("http://localhost"), "test") + .AsBuilder() + .UseDistributedCache() + .Use(static async (chatMessages, options, nextAsync, cancellationToken) => + { + using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) + .ConfigureAwait(false); + + if (!lease.IsAcquired) + { + throw new InvalidOperationException("Unable to acquire lease."); + } + + await nextAsync(chatMessages, options, cancellationToken); + }) + .UseOpenTelemetry() + .Build(); +``` + +The preceding overload internally uses a `AnonymousDelegatingChatClient`, which enables more complicated patterns with only a little additional code. For example, to achieve the same as above but with the retrieved from DI: + +```csharp +var client = new SampleChatClient(new Uri("http://localhost"), "test") + .AsBuilder() + .UseDistributedCache() + .Use(static (innerClient, services) => + { + var rateLimiter = services.GetRequiredService(); + + return new AnonymousDelegatingChatClient( + innerClient, async (chatMessages, options, nextAsync, cancellationToken) => + { + using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) + .ConfigureAwait(false); + + if (!lease.IsAcquired) + { + throw new InvalidOperationException("Unable to acquire lease."); + } + + await nextAsync(chatMessages, options, cancellationToken); + }); + }) + .UseOpenTelemetry() + .Build(); +``` + +For scenarios where the developer would like to specify delegating implementations of `CompleteAsync` and `CompleteStreamingAsync` inline, and where it's important to be able to write a different implementation for each in order to handle their unique return types specially, another overload of `Use` exists that accepts a delegate for each. + +#### Dependency Injection + + implementations will typically be provided to an application via [dependency injection (DI)](dependency-injection.md). In this example, an is added into the DI container, as is an `IChatClient`. The registration for the `IChatClient` employs a builder that creates a pipeline containing a caching client (which will then use an `IDistributedCache` retrieved from DI) and the sample client. Elsewhere in the app, the injected `IChatClient` may be retrieved and used. + +```csharp +using Microsoft.Extensions.AI; +using Microsoft.Extensions.DependencyInjection; +using Microsoft.Extensions.Hosting; + +// App Setup +var builder = Host.CreateApplicationBuilder(); + +builder.Services.AddDistributedMemoryCache(); +builder.Services.AddChatClient(new SampleChatClient(new Uri("http://coolsite.ai"), "my-custom-model")) + .UseDistributedCache(); + +var host = builder.Build(); + +// Elsewhere in the app +var chatClient = host.Services.GetRequiredService(); + +Console.WriteLine(await chatClient.CompleteAsync("What is AI?")); +``` + +What instance and configuration is injected may differ based on the current needs of the application, and multiple pipelines may be injected with different keys. 
### The `IEmbeddingGenerator` interface

The `IEmbeddingGenerator<TInput, TEmbedding>` interface represents a generic generator of embeddings. Here, `TInput` is the type of input values being embedded, and `TEmbedding` is the type of generated embedding, which inherits from the `Embedding` class.

The `Embedding` class serves as a base class for embeddings generated by an `IEmbeddingGenerator`. It's designed to store and manage the metadata and data associated with embeddings. Derived types like `Embedding<float>` provide the concrete embedding vector data. For instance, an `Embedding<float>` exposes a `Vector` property to access its embedding data.

The `IEmbeddingGenerator` interface defines a method to asynchronously generate embeddings for a collection of input values, with optional configuration and cancellation support. It also provides metadata describing the generator and allows for the retrieval of strongly typed services that may be provided by the generator or its underlying services.

#### Sample implementation

Consider the following sample implementation of an `IEmbeddingGenerator` that shows the general structure but just generates random embedding vectors.

:::code language="csharp" source="snippets/ai/ConsoleAI/SampleEmbeddingGenerator.cs":::

The preceding code:

- Defines a class named `SampleEmbeddingGenerator` that implements the `IEmbeddingGenerator<string, Embedding<float>>` interface.
- Its primary constructor accepts an endpoint and model ID, which are used to identify the generator.
- Exposes a `Metadata` property that provides metadata about the generator.
- Implements the `GenerateAsync` method to generate embeddings for a collection of input values:
  - It simulates an asynchronous operation by delaying for 100 milliseconds.
  - Returns random embeddings for each input value.

You can find actual concrete implementations in the following packages:

- [📦 Microsoft.Extensions.AI.OpenAI](https://www.nuget.org/packages/Microsoft.Extensions.AI.OpenAI)
- [📦 Microsoft.Extensions.AI.Ollama](https://www.nuget.org/packages/Microsoft.Extensions.AI.Ollama)

#### Create embeddings

The primary operation performed with an `IEmbeddingGenerator` is generating embeddings, which is accomplished with its `GenerateAsync` method.

```csharp
using Microsoft.Extensions.AI;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new SampleEmbeddingGenerator(
        new Uri("http://coolsite.ai"), "my-custom-model");

foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
```

#### Custom `IEmbeddingGenerator` middleware

As with `IChatClient`, `IEmbeddingGenerator` implementations may be layered. Just as `Microsoft.Extensions.AI` provides delegating implementations of `IChatClient` for caching and telemetry, it does so for `IEmbeddingGenerator` as well.

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenTelemetry.Trace;

// Configure OpenTelemetry exporter
var sourceName = Guid.NewGuid().ToString();
var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder()
    .AddSource(sourceName)
    .AddConsoleExporter()
    .Build();

// Explore changing the order of the intermediate "Use" calls to see the impact
// that has on what gets cached, traced, etc.
var generator = new EmbeddingGeneratorBuilder<string, Embedding<float>>(
        new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "my-custom-model"))
    .UseDistributedCache(
        new MemoryDistributedCache(Options.Create(new MemoryDistributedCacheOptions())))
    .UseOpenTelemetry(sourceName)
    .Build();

var embeddings = await generator.GenerateAsync(
[
    "What is AI?",
    "What is .NET?",
    "What is AI?"
]);

foreach (var embedding in embeddings)
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
```

The `IEmbeddingGenerator` enables building custom middleware that extends the functionality of an `IEmbeddingGenerator`. The `DelegatingEmbeddingGenerator<TInput, TEmbedding>` class is an implementation of the `IEmbeddingGenerator` interface that serves as a base class for creating embedding generators which delegate their operations to another `IEmbeddingGenerator` instance. It allows for chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as `GenerateAsync` and `Dispose`, which forward the calls to the inner generator instance, enabling flexible and modular embedding generation.

The following is an example implementation of such a delegating embedding generator that rate limits embedding generation requests:

```csharp
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

public class RateLimitingEmbeddingGenerator(
    IEmbeddingGenerator<string, Embedding<float>> innerGenerator, RateLimiter rateLimiter)
    : DelegatingEmbeddingGenerator<string, Embedding<float>>(innerGenerator)
{
    public override async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
        IEnumerable<string> values,
        EmbeddingGenerationOptions? options = null,
        CancellationToken cancellationToken = default)
    {
        using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken)
            .ConfigureAwait(false);

        if (!lease.IsAcquired)
        {
            throw new InvalidOperationException("Unable to acquire lease.");
        }

        return await base.GenerateAsync(values, options, cancellationToken);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            rateLimiter.Dispose();
        }

        base.Dispose(disposing);
    }
}
```

This can then be layered around an arbitrary `IEmbeddingGenerator<string, Embedding<float>>` to rate limit all embedding generation operations performed.

```csharp
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new RateLimitingEmbeddingGenerator(
        new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "my-custom-model"),
        new ConcurrencyLimiter(new() { PermitLimit = 1, QueueLimit = int.MaxValue }));

foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
```

In this way, the `RateLimitingEmbeddingGenerator` can be composed with other `IEmbeddingGenerator<string, Embedding<float>>` instances to provide rate limiting functionality.
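Following the pattern shown earlier for chat clients, a `Use*` extension method can make this generator easier to register in a builder pipeline. This minimal sketch assumes the `EmbeddingGeneratorBuilder.Use` overload that accepts a factory delegate, mirroring `ChatClientBuilder.Use`; the extension class and method names are illustrative:

```csharp
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

public static class RateLimitingEmbeddingGeneratorExtensions
{
    // Registers RateLimitingEmbeddingGenerator as one stage of the
    // embedding generator pipeline.
    public static EmbeddingGeneratorBuilder<string, Embedding<float>> UseRateLimiting(
        this EmbeddingGeneratorBuilder<string, Embedding<float>> builder,
        RateLimiter rateLimiter) =>
        builder.Use(innerGenerator =>
            new RateLimitingEmbeddingGenerator(innerGenerator, rateLimiter));
}
```

With such an extension in place, rate limiting composes with caching and telemetry in a single fluent chain, for example: `new EmbeddingGeneratorBuilder<string, Embedding<float>>(innerGenerator).UseRateLimiting(rateLimiter).Build()`.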
## See also

- [Develop .NET applications with AI features](../../ai/get-started/dotnet-ai-overview.md)
- [Unified AI building blocks for .NET using Microsoft.Extensions.AI](../../ai/ai-extensions.md)
- [Build an AI chat app with .NET](../../ai/quickstarts/get-started-openai.md)
- [.NET dependency injection](dependency-injection.md)
- [Rate limit an HTTP handler in .NET](http-ratelimiter.md)
- [.NET Generic Host](generic-host.md)
- [Caching in .NET](caching.md)
diff --git a/docs/core/extensions/http-ratelimiter.md b/docs/core/extensions/http-ratelimiter.md
index d8634be350c84..04ebe462f9695 100644
--- a/docs/core/extensions/http-ratelimiter.md
+++ b/docs/core/extensions/http-ratelimiter.md
@@ -3,7 +3,7 @@ title: Rate limiting an HTTP handler in .NET
 description: Learn how to create a client-side HTTP handler that limits the number of requests, with the inbuilt rate limiter API from .NET.
 author: IEvangelist
 ms.author: dapine
-ms.date: 03/13/2023
+ms.date: 12/16/2024
 ---
 
 # Rate limit an HTTP handler in .NET
@@ -161,12 +161,12 @@ You'll notice that the first logged entries are always the immediately returned
 
 Note also that each URL's query string is unique: examine the `iteration` parameter to see that it's incremented by one for each request. This parameter helps to illustrate that the 429 responses aren't from the first requests, but rather from the requests that are made after the rate limit is reached. The 200 responses arrive later but these requests were made earlier—before the limit was reached.
 
-To have a better understanding of the various rate-limiting algorithms, try rewriting this code to accept a different `RateLimiter` implementation. In addition to the `TokenBucketRateLimiter` you could try:
+To have a better understanding of the various rate-limiting algorithms, try rewriting this code to accept a different <xref:System.Threading.RateLimiting.RateLimiter> implementation. In addition to the <xref:System.Threading.RateLimiting.TokenBucketRateLimiter> you could try:
 
-- `ConcurrencyLimiter`
-- `FixedWindowRateLimiter`
-- `PartitionedRateLimiter`
-- `SlidingWindowRateLimiter`
+- <xref:System.Threading.RateLimiting.ConcurrencyLimiter>
+- <xref:System.Threading.RateLimiting.FixedWindowRateLimiter>
+- <xref:System.Threading.RateLimiting.PartitionedRateLimiter>
+- <xref:System.Threading.RateLimiting.SlidingWindowRateLimiter>
 
 ## Summary
 
diff --git a/docs/core/extensions/snippets/ai/ConsoleAI/ConsoleAI.csproj b/docs/core/extensions/snippets/ai/ConsoleAI/ConsoleAI.csproj
new file mode 100644
index 0000000000000..37684fe82e198
--- /dev/null
+++ b/docs/core/extensions/snippets/ai/ConsoleAI/ConsoleAI.csproj
@@ -0,0 +1,14 @@
+<Project Sdk="Microsoft.NET.Sdk">
+
+  <PropertyGroup>
+    <OutputType>Exe</OutputType>
+    <TargetFramework>net9.0</TargetFramework>
+    <ImplicitUsings>enable</ImplicitUsings>
+    <Nullable>enable</Nullable>
+  </PropertyGroup>
+
+  <ItemGroup>
+    <PackageReference Include="Microsoft.Extensions.AI" Version="*-*" />
+  </ItemGroup>
+
+</Project>
diff --git a/docs/core/extensions/snippets/ai/ConsoleAI/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI/Program.cs
new file mode 100644
index 0000000000000..6c3d58ae5772f
--- /dev/null
+++ b/docs/core/extensions/snippets/ai/ConsoleAI/Program.cs
@@ -0,0 +1,8 @@
+using Microsoft.Extensions.AI;
+
+IChatClient client = new SampleChatClient(
+    new Uri("http://coolsite.ai"), "my-custom-model");
+
+var response = await client.CompleteAsync("What is AI?");
+
+Console.WriteLine(response.Message);
diff --git a/docs/core/extensions/snippets/ai/ConsoleAI/SampleChatClient.cs b/docs/core/extensions/snippets/ai/ConsoleAI/SampleChatClient.cs
new file mode 100644
index 0000000000000..99e0fb033df9a
--- /dev/null
+++ b/docs/core/extensions/snippets/ai/ConsoleAI/SampleChatClient.cs
@@ -0,0 +1,58 @@
+using System.Runtime.CompilerServices;
+using Microsoft.Extensions.AI;
+
+public sealed class SampleChatClient(Uri endpoint, string modelId) : IChatClient
+{
+    public ChatClientMetadata Metadata { get; } = new(nameof(SampleChatClient), endpoint, modelId);
+
+    public async Task<ChatCompletion> CompleteAsync(
+        IList<ChatMessage> chatMessages,
+        ChatOptions? options = null,
+        CancellationToken cancellationToken = default)
+    {
+        // Simulate some operation.
+        await Task.Delay(300, cancellationToken);
+
+        // Return a sample chat completion response randomly.
+        string[] responses =
+        [
+            "This is the first sample response.",
+            "Here is another example of a response message.",
+            "This is yet another response message."
+        ];
+
+        return new([new ChatMessage()
+        {
+            Role = ChatRole.Assistant,
+            Text = responses[Random.Shared.Next(responses.Length)],
+        }]);
+    }
+
+    public async IAsyncEnumerable<StreamingChatCompletionUpdate> CompleteStreamingAsync(
+        IList<ChatMessage> chatMessages,
+        ChatOptions? options = null,
+        [EnumeratorCancellation] CancellationToken cancellationToken = default)
+    {
+        // Simulate streaming by yielding messages one by one.
+        string[] words = ["This ", "is ", "the ", "response ", "for ", "the ", "request."];
+        foreach (string word in words)
+        {
+            // Simulate some operation.
+            await Task.Delay(100, cancellationToken);
+
+            // Yield the next message in the response.
+            yield return new StreamingChatCompletionUpdate
+            {
+                Role = ChatRole.Assistant,
+                Text = word,
+            };
+        }
+    }
+
+    public object? GetService(Type serviceType, object? serviceKey) => this;
+
+    public TService? GetService<TService>(object? key = null)
+        where TService : class => this as TService;
+
+    void IDisposable.Dispose() { }
+}
diff --git a/docs/core/extensions/snippets/ai/ConsoleAI/SampleEmbeddingGenerator.cs b/docs/core/extensions/snippets/ai/ConsoleAI/SampleEmbeddingGenerator.cs
new file mode 100644
index 0000000000000..8cf53982d2cb1
--- /dev/null
+++ b/docs/core/extensions/snippets/ai/ConsoleAI/SampleEmbeddingGenerator.cs
@@ -0,0 +1,35 @@
+using Microsoft.Extensions.AI;
+
+public sealed class SampleEmbeddingGenerator(
+    Uri endpoint, string modelId)
+    : IEmbeddingGenerator<string, Embedding<float>>
+{
+    public EmbeddingGeneratorMetadata Metadata { get; } =
+        new(nameof(SampleEmbeddingGenerator), endpoint, modelId);
+
+    public async Task<GeneratedEmbeddings<Embedding<float>>> GenerateAsync(
+        IEnumerable<string> values,
+        EmbeddingGenerationOptions? options = null,
+        CancellationToken cancellationToken = default)
+    {
+        // Simulate some async operation
+        await Task.Delay(100, cancellationToken);
+
+        // Create random embeddings
+        return
+        [
+            .. from value in values
+               select new Embedding<float>(
+                   Enumerable.Range(0, 384)
+                       .Select(_ => Random.Shared.NextSingle())
+                       .ToArray())
+        ];
+    }
+
+    public object? GetService(Type serviceType, object? serviceKey) => this;
+
+    public TService? GetService<TService>(object? key = null)
+        where TService : class => this as TService;
+
+    void IDisposable.Dispose() { }
+}
diff --git a/docs/fundamentals/toc.yml b/docs/fundamentals/toc.yml
index 41d78e90e1aa4..bdc68869b8553 100644
--- a/docs/fundamentals/toc.yml
+++ b/docs/fundamentals/toc.yml
@@ -998,6 +998,9 @@ items:
       href: runtime-libraries/system-console.md
     - name: The System.Random class
       href: runtime-libraries/system-random.md
+    - name: Artificial Intelligence (AI)
+      displayName: microsoft.extensions.ai,ollama,ai,openai,azure inference,ichatclient
+      href: ../core/extensions/artificial-intelligence.md
     - name: Dependency injection
       items:
       - name: Overview

From 60dc38548ee4867485104134dc04cf80eff09653 Mon Sep 17 00:00:00 2001
From: David Pine
Date: Mon, 16 Dec 2024 20:49:10 -0600
Subject: [PATCH 2/6] Apply suggestions from code review

Co-authored-by: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
---
 .../extensions/artificial-intelligence.md | 47 ++++++++++---------
 docs/fundamentals/toc.yml                 |  2 +-
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/docs/core/extensions/artificial-intelligence.md b/docs/core/extensions/artificial-intelligence.md
index aa1e99669b102..0adf1f5a41691 100644
--- a/docs/core/extensions/artificial-intelligence.md
+++ b/docs/core/extensions/artificial-intelligence.md
@@ -4,15 +4,16 @@ description: Learn how to use the Microsoft.Extensions.AI library to integrate a
 author: IEvangelist
 ms.author: dapine
 ms.date: 12/16/2024
+ms.collection: ce-skilling-ai-copilot
 ---
 
-# Artificial Intelligence in .NET (Preview)
+# Artificial intelligence in .NET (Preview)
 
-With a growing variety of artificial intelligence (AI) services available, developers need a way to integrate and interact with these services in their .NET applications. The `Microsoft.Extensions.AI` library provides a unified approach for representing generative AI components, enabling seamless integration and interoperability with various AI services. This article introduces the library, its installation, and usage examples to help you get started.
+With a growing variety of artificial intelligence (AI) services available, developers need a way to integrate and interact with these services in their .NET applications. The `Microsoft.Extensions.AI` library provides a unified approach for representing generative AI components, which enables seamless integration and interoperability with various AI services. This article introduces the library and provides installation instructions and usage examples to help you get started.
 
 ## Install the package
 
-To install the [📦 Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI) NuGet package, use either the .NET CLI or add a package reference directly to your C# project file:
+To install the [📦 Microsoft.Extensions.AI](https://www.nuget.org/packages/Microsoft.Extensions.AI) NuGet package, use the .NET CLI or add a package reference directly to your C# project file:
 
 ### [.NET CLI](#tab/dotnet-cli)
 
@@ -33,10 +34,10 @@ For more information, see [dotnet add package](../tools/dotnet-add-package.md) o
 
 ## Usage examples
 
-The `IChatClient` interface defines a client abstraction responsible for interacting with AI services that provide chat capabilities. It includes methods for sending and receiving messages with multi-modal content (text, images, audio, etc.), either as a complete set or streamed incrementally. Additionally, it provides metadata information about the client and allows retrieving strongly typed services.
+The `IChatClient` interface defines a client abstraction responsible for interacting with AI services that provide chat capabilities. It includes methods for sending and receiving messages with multi-modal content (such as text, images, and audio), either as a complete set or streamed incrementally. Additionally, it provides metadata information about the client and allows retrieving strongly typed services.
+The interface defines a client abstraction responsible for interacting with AI services that provide chat capabilities. It includes methods for sending and receiving messages with multi-modal content (such as text, images, and audio), either as a complete set or streamed incrementally. Additionally, it provides metadata information about the client and allows retrieving strongly typed services. > [!IMPORTANT] -> For more usage examples and real-world scenarios, see [AI for .NET developers](/dotnet/ai/). +> For more usage examples and real-world scenarios, see [AI for .NET developers](../../ai/index.yml). ### The `IChatClient` interface @@ -78,7 +79,7 @@ Each message in the history is represented by a : Provides additional information and references for chat completions. - : Provides input for chat completions. -Each chat message is instantiated, assigning to its property—a new . There are various [types of content](xref:Microsoft.Extensions.AI.AIContent) that may be represented, such as a simple string, or it may be a more complex object that represents a multi-modal message with text, images, audio, etc.: +Each chat message is instantiated, assigning to its property a new . There are various [types of content](xref:Microsoft.Extensions.AI.AIContent) that can be represented, such as a simple string or a more complex object that represents a multi-modal message with text, images, and audio: - - @@ -115,7 +116,7 @@ Some models and services support _tool calling_, where requests can include tool - : Provides factory methods for creating commonly used implementations of `AIFunction`. - : Wraps an `IChatClient` to add automatic function invocation capabilities. -Consider the following example, that demonstrates a random function invocation: +Consider the following example that demonstrates a random function invocation: ```csharp using System.ComponentModel; @@ -151,7 +152,7 @@ The preceding code: #### Cache responses -If you're familiar with [Caching in .NET](caching.md), it's good to know that provides other such delegating `IChatClient` implementations. The is an `IChatClient` that layers caching around another arbitrary `IChatClient` instance. When a unique chat history is submitted to the `DistributedCachingChatClient`, it forwards it along to the underlying client, and then caches the response before it being sent back to the consumer. The next time the same history is submitted, such that a cached response can be found in the cache, the `DistributedCachingChatClient` can return back the cached response rather than needing to forward the request along the pipeline. +If you're familiar with [Caching in .NET](caching.md), it's good to know that provides other such delegating `IChatClient` implementations. The is an `IChatClient` that layers caching around another arbitrary `IChatClient` instance. When a unique chat history is submitted to the `DistributedCachingChatClient`, it forwards it to the underlying client and then caches the response before sending it back to the consumer. The next time the same history is submitted, such that a cached response can be found in the cache, the `DistributedCachingChatClient` returns the cached response rather than needing to forward the request along the pipeline. ```csharp using Microsoft.Extensions.AI; @@ -204,9 +205,9 @@ Console.WriteLine((await client.CompleteAsync("What is AI?")).Message); #### Provide options -Every call to or may optionally supply a instance containing additional parameters for the operation. 
The most common parameters among AI models and services show up as strongly typed properties on the type, such as . Other parameters can be supplied by name in a weakly typed manner via the dictionary. +Every call to or can optionally supply a instance containing additional parameters for the operation. The most common parameters among AI models and services show up as strongly typed properties on the type, such as . Other parameters can be supplied by name in a weakly typed manner via the dictionary. -Options may also be specified when building an `IChatClient` with the fluent API, and chaining a call to the `ConfigureOptions` extension method. This delegating client wraps another client and invokes the supplied delegate to populate a `ChatOptions` instance for every call. For example, to ensure that the property defaults to a particular model name, code like the following can be used: +You can also specify options when building an `IChatClient` with the fluent API and chaining a call to the `ConfigureOptions` extension method. This delegating client wraps another client and invokes the supplied delegate to populate a `ChatOptions` instance for every call. For example, to ensure that the property defaults to a particular model name, you can use code like the following: ```csharp using Microsoft.Extensions.AI; @@ -351,7 +352,7 @@ var client = new RateLimitingChatClient( await client.CompleteAsync("What color is the sky?"); ``` -To simplify the composition of such components with others, the author of the component is recommended to create a `Use*` extension method for registering this component into a pipeline, for example consider the following: +To simplify the composition of such components with others, component authors should create a `Use*` extension method for registering the component into a pipeline. For example, consider the following extension method: ```csharp public static class RateLimitingChatClientExtensions @@ -362,7 +363,7 @@ public static class RateLimitingChatClientExtensions } ``` -Such extensions may also query for relevant services from the DI container; the used by the pipeline is passed in as an optional parameter: +Such extensions can also query for relevant services from the DI container; the used by the pipeline is passed in as an optional parameter: ```csharp public static class RateLimitingChatClientExtensions @@ -393,7 +394,7 @@ The preceding extension methods demonstrate using a `Use` method on - -For example, in the earlier `RateLimitingChatClient` example, the overrides of `CompleteAsync` and `CompleteStreamingAsync` only need to do work before and after delegating to the next client in the pipeline. To achieve the same thing without writing a custom class, an overload of `Use` may be used that accepts a delegate which is used for both `CompleteAsync` and `CompleteStreamingAsync`, reducing the boilerplate required: +For example, in the earlier `RateLimitingChatClient` example, the overrides of `CompleteAsync` and `CompleteStreamingAsync` only need to do work before and after delegating to the next client in the pipeline. 
To achieve the same thing without writing a custom class, you can use an overload of `Use` that accepts a delegate that's used for both `CompleteAsync` and `CompleteStreamingAsync`, reducing the boilerplate required: ```csharp RateLimiter rateLimiter = new ConcurrencyLimiter(new() @@ -421,7 +422,7 @@ var client = new SampleChatClient(new Uri("http://localhost"), "test") .Build(); ``` -The preceding overload internally uses a `AnonymousDelegatingChatClient`, which enables more complicated patterns with only a little additional code. For example, to achieve the same as above but with the retrieved from DI: +The preceding overload internally uses an `AnonymousDelegatingChatClient`, which enables more complicated patterns with only a little additional code. For example, to achieve the same result but with the retrieved from DI: ```csharp var client = new SampleChatClient(new Uri("http://localhost"), "test") @@ -451,9 +452,9 @@ var client = new SampleChatClient(new Uri("http://localhost"), "test") For scenarios where the developer would like to specify delegating implementations of `CompleteAsync` and `CompleteStreamingAsync` inline, and where it's important to be able to write a different implementation for each in order to handle their unique return types specially, another overload of `Use` exists that accepts a delegate for each. -#### Dependency Injection +#### Dependency injection - implementations will typically be provided to an application via [dependency injection (DI)](dependency-injection.md). In this example, an is added into the DI container, as is an `IChatClient`. The registration for the `IChatClient` employs a builder that creates a pipeline containing a caching client (which will then use an `IDistributedCache` retrieved from DI) and the sample client. Elsewhere in the app, the injected `IChatClient` may be retrieved and used. + implementations will typically be provided to an application via [dependency injection (DI)](dependency-injection.md). In this example, an is added into the DI container, as is an `IChatClient`. The registration for the `IChatClient` employs a builder that creates a pipeline containing a caching client (which will then use an `IDistributedCache` retrieved from DI) and the sample client. The injected `IChatClient` can be retrieved and used elsewhere in the app. ```csharp using Microsoft.Extensions.AI; @@ -475,7 +476,7 @@ var chatClient = host.Services.GetRequiredService(); Console.WriteLine(await chatClient.CompleteAsync("What is AI?")); ``` -What instance and configuration is injected may differ based on the current needs of the application, and multiple pipelines may be injected with different keys. +What instance and configuration is injected can differ based on the current needs of the application, and multiple pipelines can be injected with different keys. ### The `IEmbeddingGenerator` interface @@ -483,7 +484,7 @@ The interface represents a The `Embedding` class serves as a base class for embeddings generated by an `IEmbeddingGenerator`. It's designed to store and manage the metadata and data associated with embeddings. Derived types like `Embedding` provide the concrete embedding vector data. For instance, an embedding exposes a property to access its embedding data. -The `IEmbeddingGenerator` interface defines a method to asynchronously generate embeddings for a collection of input values, with optional configuration and cancellation support. 
It also provides metadata describing the generator and allows for the retrieval of strongly typed services that may be provided by the generator or its underlying services. +The `IEmbeddingGenerator` interface defines a method to asynchronously generate embeddings for a collection of input values, with optional configuration and cancellation support. It also provides metadata describing the generator and allows for the retrieval of strongly typed services that can be provided by the generator or its underlying services. #### Sample implementation @@ -494,10 +495,10 @@ Consider the following sample implementation of an `IEmbeddingGenerator` to show The preceding code: - Defines a class named `SampleEmbeddingGenerator` that implements the `IEmbeddingGenerator>` interface. -- Its primary constructor accepts an endpoint and model ID, which are used to identify the generator. +- Has a primary constructor that accepts an endpoint and model ID, which are used to identify the generator. - Exposes a `Metadata` property that provides metadata about the generator. - Implements the `GenerateAsync` method to generate embeddings for a collection of input values: - - It simulates an asynchronous operation by delaying for 100 milliseconds. + - Simulates an asynchronous operation by delaying for 100 milliseconds. - Returns random embeddings for each input value. You can find actual concrete implementations in the following packages: @@ -507,7 +508,7 @@ You can find actual concrete implementations in the following packages: #### Create embeddings -The primary operation performed with an is generating embeddings, which is accomplished with its method. +The primary operation performed with an is embedding generation, which is accomplished with its method. ```csharp using Microsoft.Extensions.AI; @@ -524,7 +525,7 @@ foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is #### Custom `IEmbeddingGenerator` middleware -As with `IChatClient`, `IEmbeddingGenerator` implementations may be layered. Just as `Microsoft.Extensions.AI` provides delegating implementations of `IChatClient` for caching and telemetry, it does so for `IEmbeddingGenerator` as well. +As with `IChatClient`, `IEmbeddingGenerator` implementations can be layered. Just as `Microsoft.Extensions.AI` provides delegating implementations of `IChatClient` for caching and telemetry, it provides an implementation for `IEmbeddingGenerator` as well. ```csharp using Microsoft.Extensions.AI; @@ -562,7 +563,7 @@ foreach (var embedding in embeddings) } ``` -The `IEmbeddingGenerator` enables building custom middleware that extends the functionality of an `IEmbeddingGenerator`. The class is an implementation of the `IEmbeddingGenerator` interface that serves as a base class for creating embedding generators which delegate their operations to another `IEmbeddingGenerator` instance. It allows for chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as and `Dispose`, which forward the calls to the inner generator instance, enabling flexible and modular embedding generation. +The `IEmbeddingGenerator` enables building custom middleware that extends the functionality of an `IEmbeddingGenerator`. The class is an implementation of the `IEmbeddingGenerator` interface that serves as a base class for creating embedding generators that delegate their operations to another `IEmbeddingGenerator` instance. 
It allows for chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as and `Dispose`, which forward the calls to the inner generator instance, enabling flexible and modular embedding generation. The following is an example implementation of such a delegating embedding generator that rate limits embedding generation requests: diff --git a/docs/fundamentals/toc.yml b/docs/fundamentals/toc.yml index bdc68869b8553..f3c21c755d52a 100644 --- a/docs/fundamentals/toc.yml +++ b/docs/fundamentals/toc.yml @@ -998,7 +998,7 @@ items: href: runtime-libraries/system-console.md - name: The System.Random class href: runtime-libraries/system-random.md - - name: Artificial Intelligence (AI) + - name: Artificial intelligence (AI) displayName: microsoft.extensions.ai,ollama,ai,openai,azure inference,ichatclient href: ../core/extensions/artificial-intelligence.md - name: Dependency injection From b40a75720724b7747d6bada2bbd64600806f43d0 Mon Sep 17 00:00:00 2001 From: David Pine Date: Tue, 17 Dec 2024 08:13:34 -0600 Subject: [PATCH 3/6] Added a mock TOC (or in article nav) --- .../extensions/artificial-intelligence.md | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/docs/core/extensions/artificial-intelligence.md b/docs/core/extensions/artificial-intelligence.md index 0adf1f5a41691..ebcaaf8b89392 100644 --- a/docs/core/extensions/artificial-intelligence.md +++ b/docs/core/extensions/artificial-intelligence.md @@ -3,7 +3,7 @@ title: Artificial Intelligence in .NET (Preview) description: Learn how to use the Microsoft.Extensions.AI library to integrate and interact with various AI services in your .NET applications. author: IEvangelist ms.author: dapine -ms.date: 12/16/2024 +ms.date: 12/17/2024 ms.collection: ce-skilling-ai-copilot --- @@ -39,6 +39,23 @@ The interface defines a client abstra > [!IMPORTANT] > For more usage examples and real-world scenarios, see [AI for .NET developers](../../ai/index.yml). +**In this section** + +- [The `IChatClient` interface](#the-ichatclient-interface) + - [Request chat completion](#request-chat-completion) + - [Request chat completion with streaming](#request-chat-completion-with-streaming) + - [Tool calling](#tool-calling) + - [Cache responses](#cache-responses) + - [Use telemetry](#use-telemetry) + - [Provide options](#provide-options) + - [Functionality pipelines](#functionality-pipelines) + - [Custom `IChatClient` middleware](#custom-ichatclient-middleware) + - [Dependency injection](#dependency-injection) +- [The `IEmbeddingGenerator` interface](#the-iembeddinggenerator-interface) + - [Sample implementation](#sample-implementation) + - [Create embeddings](#create-embeddings) + - [Custom `IEmbeddingGenerator` middleware](#custom-iembeddinggenerator-middleware) + ### The `IChatClient` interface The following sample implements `IChatClient` to show the general structure. 
From 87e05e25957fa4b2a182cff80b184cef46ff9874 Mon Sep 17 00:00:00 2001 From: David Pine Date: Tue, 17 Dec 2024 11:03:14 -0600 Subject: [PATCH 4/6] Add all the snippets --- .../extensions/artificial-intelligence.md | 442 ++---------------- .../snippets/ai/AI.Shared/AI.Shared.csproj | 14 + .../ai/AI.Shared/RateLimitingChatClient.cs | 55 +++ ...ngChatClientExtensions.OptionalOverload.cs | 17 + .../RateLimitingChatClientExtensions.cs | 13 + .../RateLimitingEmbeddingGenerator.cs | 33 ++ .../SampleChatClient.cs | 0 .../SampleEmbeddingGenerator.cs | 0 .../ConsoleAI.CacheResponses.csproj | 18 + .../ai/ConsoleAI.CacheResponses/Program.cs | 24 + .../ConsoleAI.CompleteAsyncArgs.csproj | 14 + .../ai/ConsoleAI.CompleteAsyncArgs/Program.cs | 10 + .../ConsoleAI.CompleteStreamingAsync.csproj | 14 + .../Program.cs | 9 + .../ConsoleAI.ConsumeClientMiddleware.csproj | 15 + .../Program.cs | 26 ++ ...soleAI.ConsumeRateLimitingEmbedding.csproj | 14 + .../Program.cs | 16 + .../ConsoleAI.CreateEmbeddings.csproj | 14 + .../ai/ConsoleAI.CreateEmbeddings/Program.cs | 10 + .../ConsoleAI.CustomClientMiddle.csproj | 18 + .../ConsoleAI.CustomClientMiddle/Program.cs | 12 + .../ConsoleAI.CustomEmbeddingsMiddle.csproj | 16 + .../Program.cs | 34 ++ .../ConsoleAI.DependencyInjection.csproj | 16 + .../ConsoleAI.DependencyInjection/Program.cs | 20 + .../ConsoleAI.FunctionalityPipelines.csproj | 20 + .../Program.cs | 46 ++ .../ConsoleAI.ProvideOptions.csproj | 18 + .../ai/ConsoleAI.ProvideOptions/Program.cs | 13 + .../ConsoleAI.ToolCalling.csproj | 18 + .../ai/ConsoleAI.ToolCalling/Program.cs | 21 + .../ConsoleAI.UseExample.csproj | 14 + .../ai/ConsoleAI.UseExample/Program.cs | 28 ++ .../ConsoleAI.UseExampleAlt.csproj | 14 + .../ai/ConsoleAI.UseExampleAlt/Program.cs | 27 ++ .../ConsoleAI.UseTelemetry.csproj | 18 + .../ai/ConsoleAI.UseTelemetry/Program.cs | 20 + .../snippets/ai/ConsoleAI/ConsoleAI.csproj | 4 + .../snippets/ai/ConsoleAI/Program.cs | 2 +- 40 files changed, 731 insertions(+), 406 deletions(-) create mode 100644 docs/core/extensions/snippets/ai/AI.Shared/AI.Shared.csproj create mode 100644 docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClient.cs create mode 100644 docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClientExtensions.OptionalOverload.cs create mode 100644 docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClientExtensions.cs create mode 100644 docs/core/extensions/snippets/ai/AI.Shared/RateLimitingEmbeddingGenerator.cs rename docs/core/extensions/snippets/ai/{ConsoleAI => AI.Shared}/SampleChatClient.cs (100%) rename docs/core/extensions/snippets/ai/{ConsoleAI => AI.Shared}/SampleEmbeddingGenerator.cs (100%) create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CacheResponses/ConsoleAI.CacheResponses.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CacheResponses/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CompleteAsyncArgs/ConsoleAI.CompleteAsyncArgs.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CompleteAsyncArgs/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CompleteStreamingAsync/ConsoleAI.CompleteStreamingAsync.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CompleteStreamingAsync/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.ConsumeClientMiddleware/ConsoleAI.ConsumeClientMiddleware.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.ConsumeClientMiddleware/Program.cs create mode 100644 
docs/core/extensions/snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/ConsoleAI.ConsumeRateLimitingEmbedding.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CreateEmbeddings/ConsoleAI.CreateEmbeddings.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CreateEmbeddings/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CustomClientMiddle/ConsoleAI.CustomClientMiddle.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CustomClientMiddle/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/ConsoleAI.CustomEmbeddingsMiddle.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.DependencyInjection/ConsoleAI.DependencyInjection.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.DependencyInjection/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.FunctionalityPipelines/ConsoleAI.FunctionalityPipelines.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.FunctionalityPipelines/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.ProvideOptions/ConsoleAI.ProvideOptions.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.ProvideOptions/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.ToolCalling/ConsoleAI.ToolCalling.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.ToolCalling/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.UseExample/ConsoleAI.UseExample.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.UseExample/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.UseExampleAlt/ConsoleAI.UseExampleAlt.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.UseExampleAlt/Program.cs create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.UseTelemetry/ConsoleAI.UseTelemetry.csproj create mode 100644 docs/core/extensions/snippets/ai/ConsoleAI.UseTelemetry/Program.cs diff --git a/docs/core/extensions/artificial-intelligence.md b/docs/core/extensions/artificial-intelligence.md index ebcaaf8b89392..22a1b45abb093 100644 --- a/docs/core/extensions/artificial-intelligence.md +++ b/docs/core/extensions/artificial-intelligence.md @@ -76,18 +76,7 @@ To request a completion, call the `CompleteAsync` method. Each message in the history is represented by a `ChatMessage` object. The `ChatMessage` class provides a `Role` property that indicates the role of the message. By default, the `ChatRole.User` role is used. The following roles are available: @@ -110,17 +99,7 @@ Each chat message is instantiated, assigning to its `Contents` property. The inputs of `CompleteStreamingAsync` are identical to those of `CompleteAsync`. However, rather than returning the complete response as part of a `ChatCompletion` object, the method returns an `IAsyncEnumerable<T>` where `T` is `StreamingChatCompletionUpdate`, providing a stream of updates that collectively form the single response. -```csharp -using Microsoft.Extensions.AI; - -IChatClient client = new SampleChatClient( - new Uri("http://coolsite.ai"), "my-custom-model"); - -await foreach (var update in client.CompleteStreamingAsync("What is AI?")) -{ - Console.Write(update); -} -``` +:::code language="csharp" source="snippets/ai/ConsoleAI.CompleteStreamingAsync/Program.cs"::: > [!TIP] > Streaming APIs are nearly synonymous with AI user experiences.
C# enables compelling scenarios with its `IAsyncEnumerable<T>` support, allowing for a natural and efficient way to stream data. @@ -135,90 +114,33 @@ Some models and services support _tool calling_, where requests can include tool Consider the following example that demonstrates a random function invocation: -```csharp -using System.ComponentModel; -using Microsoft.Extensions.AI; +:::code language="csharp" source="snippets/ai/ConsoleAI.ToolCalling/Program.cs"::: -[Description("Gets the current weather")] -string GetCurrentWeather() => Random.Shared.NextDouble() > 0.5 - ? "It's sunny" - : "It's raining"; - -IChatClient client = new ChatClientBuilder( - new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1")) - .UseFunctionInvocation() - .Build(); - -var response = client.CompleteStreamingAsync( - "Should I wear a rain coat?", - new() { Tools = [ AIFunctionFactory.Create(GetCurrentWeather) ] }); - -await foreach (var update in response) -{ - Console.Write(update); -} -``` +The preceding example depends on the [📦 Microsoft.Extensions.AI.Ollama](https://www.nuget.org/packages/Microsoft.Extensions.AI.Ollama) NuGet package. The preceding code: - Defines a function named `GetCurrentWeather` that returns a random weather forecast. - - This function is decorated with a `Description` attribute, which is used to provide a description of the function to the AI service. -- Instantiates a `ChatClientBuilder` with an `OllamaChatClient` and configures it to use function invocation. -- Calls `CompleteStreamingAsync` on the client, passing a prompt and a list of tools that includes a function created with `AIFunctionFactory.Create`. + - This function is decorated with a `DescriptionAttribute`, which is used to provide a description of the function to the AI service. +- Instantiates a `ChatClientBuilder` with an `OllamaChatClient` and configures it to use function invocation. +- Calls `CompleteStreamingAsync` on the client, passing a prompt and a list of tools that includes a function created with `AIFunctionFactory.Create`. - Iterates over the response, printing each update to the console. #### Cache responses If you're familiar with [Caching in .NET](caching.md), it's good to know that `Microsoft.Extensions.AI` provides other such delegating `IChatClient` implementations. The `DistributedCachingChatClient` is an `IChatClient` that layers caching around another arbitrary `IChatClient` instance. When a unique chat history is submitted to the `DistributedCachingChatClient`, it forwards it to the underlying client and then caches the response before sending it back to the consumer. The next time the same history is submitted, such that a cached response can be found in the cache, the `DistributedCachingChatClient` returns the cached response rather than needing to forward the request along the pipeline.
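Because the caching layer targets `IDistributedCache`, any implementation of that interface works, not only the in-memory one used in the snippets. As a sketch (assuming the Microsoft.Extensions.Caching.StackExchangeRedis package and a local Redis instance; the endpoint and model name are placeholders), cached completions can even be shared across processes:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.DependencyInjection;

// Sketch: a Redis-backed IDistributedCache shared by multiple app instances.
var services = new ServiceCollection();
services.AddStackExchangeRedisCache(options =>
    options.Configuration = "localhost:6379");

using var provider = services.BuildServiceProvider();

IChatClient client = new SampleChatClient(new Uri("http://coolsite.ai"), "target-ai-model")
    .AsBuilder()
    .UseDistributedCache(provider.GetRequiredService<IDistributedCache>())
    .Build();

// The second identical prompt is served from the cache,
// not from the underlying client.
Console.WriteLine(await client.CompleteAsync("What is AI?"));
Console.WriteLine(await client.CompleteAsync("What is AI?"));
```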
-```csharp -using Microsoft.Extensions.AI; -using Microsoft.Extensions.Caching.Distributed; -using Microsoft.Extensions.Caching.Memory; -using Microsoft.Extensions.Options; - -var sampleChatClient = new SampleChatClient(new Uri("http://coolsite.ai"), "my-custom-model"); -IChatClient client = new ChatClientBuilder(sampleChatClient) - .UseDistributedCache(new MemoryDistributedCache(Options.Create(new MemoryDistributedCacheOptions()))) - .Build(); - -string[] prompts = ["What is AI?", "What is .NET?", "What is AI?"]; +:::code language="csharp" source="snippets/ai/ConsoleAI.CacheResponse/Program.cs"::: -foreach (var prompt in prompts) -{ - await foreach (var update in client.CompleteStreamingAsync(prompt)) - { - Console.Write(update); - } - - Console.WriteLine(); -} -``` +The preceding example depends on the [📦 Microsoft.Extensions.Caching.Memory](https://www.nuget.org/packages/Microsoft.Extensions.Caching.Memory) NuGet package. For more information, see [Caching in .NET](caching.md). #### Use telemetry Another example of a delegating chat client is the `OpenTelemetryChatClient`. This implementation adheres to the [OpenTelemetry Semantic Conventions for Generative AI systems](https://opentelemetry.io/docs/specs/semconv/gen-ai/). Similar to other `IChatClient` delegators, it layers metrics and spans around any underlying `IChatClient` implementation, providing enhanced observability. -```csharp -using Microsoft.Extensions.AI; -using OpenTelemetry.Trace; - -// Configure OpenTelemetry exporter -var sourceName = Guid.NewGuid().ToString(); -var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder() - .AddSource(sourceName) - .AddConsoleExporter() - .Build(); +:::code language="csharp" source="snippets/ai/ConsoleAI.UseTelemetry/Program.cs"::: -var sampleChatClient = new SampleChatClient( - new Uri("http://coolsite.ai"), "my-custom-model"); - -IChatClient client = new ChatClientBuilder(sampleChatClient) - .UseOpenTelemetry(sourceName, static c => c.EnableSensitiveData = true) - .Build(); - -Console.WriteLine((await client.CompleteAsync("What is AI?")).Message); -``` +The preceding example depends on the [📦 OpenTelemetry.Exporter.Console](https://www.nuget.org/packages/OpenTelemetry.Exporter.Console) NuGet package. #### Provide options @@ -226,70 +148,21 @@ Every call to or API and chaining a call to the `ConfigureOptions` extension method. This delegating client wraps another client and invokes the supplied delegate to populate a `ChatOptions` instance for every call. For example, to ensure that the `ChatOptions.ModelId` property defaults to a particular model name, you can use code like the following: -```csharp -using Microsoft.Extensions.AI; +:::code language="csharp" source="snippets/ai/ConsoleAI.ProvideOptions/Program.cs"::: -IChatClient client = new ChatClientBuilder( - new OllamaChatClient(new Uri("http://localhost:11434"))) - .ConfigureOptions(options => options.ModelId ??= "phi3") - .Build(); - -// will request "phi3" -Console.WriteLine(await client.CompleteAsync("What is AI?")); - -// will request "llama3.1" -Console.WriteLine(await client.CompleteAsync("What is AI?", new() { ModelId = "llama3.1" })); -``` +The preceding example depends on the [📦 Microsoft.Extensions.AI.Ollama](https://www.nuget.org/packages/Microsoft.Extensions.AI.Ollama) NuGet package. #### Functionality pipelines `IChatClient` instances can be layered to create a pipeline of components, each adding specific functionality. These components can come from `Microsoft.Extensions.AI`, other NuGet packages, or custom implementations.
This approach allows you to augment the behavior of the `IChatClient` in various ways to meet your specific needs. Consider the following example code that layers a distributed cache, function invocation, and OpenTelemetry tracing around a sample chat client: -```csharp -using Microsoft.Extensions.AI; -using Microsoft.Extensions.Caching.Distributed; -using Microsoft.Extensions.Caching.Memory; -using Microsoft.Extensions.Options; -using OpenTelemetry.Trace; - -// Configure OpenTelemetry exporter -var sourceName = Guid.NewGuid().ToString(); -var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder() - .AddSource(sourceName) - .AddConsoleExporter() - .Build(); - -// Explore changing the order of the intermediate "Use" calls to see that impact -// that has on what gets cached, traced, etc. -IChatClient client = new ChatClientBuilder( - new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1")) - .UseDistributedCache(new MemoryDistributedCache(Options.Create(new MemoryDistributedCacheOptions()))) - .UseFunctionInvocation() - .UseOpenTelemetry(sourceName, static c => c.EnableSensitiveData = true) - .Build(); - -ChatOptions options = new() -{ - Tools = - [ - AIFunctionFactory.Create( - () => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining", - name: "GetCurrentWeather", - description: "Gets the current weather") - ] -}; - -for (int i = 0; i < 3; ++i) -{ - List history = - [ - new ChatMessage(ChatRole.System, "You are a helpful AI assistant"), - new ChatMessage(ChatRole.User, "Do I need an umbrella?") - ]; - - Console.WriteLine(await client.CompleteAsync(history, options)); -} -``` +:::code language="csharp" source="snippets/ai/ConsoleAI.FunctionalityPipelines/Program.cs"::: + +The preceding example depends on the following NuGet packages: + +- [📦 Microsoft.Extensions.Caching.Memory](https://www.nuget.org/packages/Microsoft.Extensions.Caching.Memory) +- [📦 Microsoft.Extensions.AI.Ollama](https://www.nuget.org/packages/Microsoft.Extensions.AI.Ollama) +- [📦 OpenTelemetry.Exporter.Console](https://www.nuget.org/packages/OpenTelemetry.Exporter.Console) #### Custom `IChatClient` middleware @@ -299,113 +172,25 @@ The `DelegatingChatClient` class provides default implementations for methods li The following is an example class derived from `DelegatingChatClient` to provide rate limiting functionality, utilizing the `RateLimiter`: -```csharp -using Microsoft.Extensions.AI; -using System.Threading.RateLimiting; - -public sealed class RateLimitingChatClient( - IChatClient innerClient, RateLimiter rateLimiter) - : DelegatingChatClient(innerClient) -{ - public override async Task CompleteAsync( - IList chatMessages, - ChatOptions? options = null, - CancellationToken cancellationToken = default) - { - using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) - .ConfigureAwait(false); - - if (!lease.IsAcquired) - { - throw new InvalidOperationException("Unable to acquire lease."); - } - - return await base.CompleteAsync(chatMessages, options, cancellationToken) - .ConfigureAwait(false); - } - - public override async IAsyncEnumerable CompleteStreamingAsync( - IList chatMessages, - ChatOptions?
options = null, - [EnumeratorCancellation] CancellationToken cancellationToken = default) - { - using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) - .ConfigureAwait(false); - - if (!lease.IsAcquired) - { - throw new InvalidOperationException("Unable to acquire lease."); - } - - await foreach (var update in base.CompleteStreamingAsync(chatMessages, options, cancellationToken) - .ConfigureAwait(false)) - { - yield return update; - } - } - - protected override void Dispose(bool disposing) - { - if (disposing) - { - rateLimiter.Dispose(); - } - - base.Dispose(disposing); - } -} -``` - -Composition of the `RateLimitingChatClient` with another client is straightforward: - -```csharp -using Microsoft.Extensions.AI; -using System.Threading.RateLimiting; +:::code language="csharp" source="snippets/ai/AI.Shared/RateLimitingChatClient.cs"::: -var client = new RateLimitingChatClient( - new SampleChatClient(new Uri("http://localhost"), "test"), - new ConcurrencyLimiter(new() { PermitLimit = 1, QueueLimit = int.MaxValue })); +The preceding example depends on the [📦 System.Threading.RateLimiting](https://www.nuget.org/packages/System.Threading.RateLimiting) NuGet package. Composition of the `RateLimitingChatClient` with another client is straightforward: -await client.CompleteAsync("What color is the sky?"); -``` +:::code language="csharp" source="snippets/ai/ConsoleAI.CustomClientMiddle/Program.cs"::: To simplify the composition of such components with others, component authors should create a `Use*` extension method for registering the component into a pipeline. For example, consider the following extension method: -```csharp -public static class RateLimitingChatClientExtensions -{ - public static ChatClientBuilder UseRateLimiting( - this ChatClientBuilder builder, RateLimiter rateLimiter) => - builder.Use(innerClient => new RateLimitingChatClient(innerClient, rateLimiter)); -} -``` +:::code language="csharp" source="snippets/ai/AI.Shared/RateLimitingChatClientExtensions.cs"::: Such extensions can also query for relevant services from the DI container; the `IServiceProvider` used by the pipeline is passed in as an optional parameter: -```csharp -public static class RateLimitingChatClientExtensions -{ - public static ChatClientBuilder UseRateLimiting( - this ChatClientBuilder builder, RateLimiter? rateLimiter = null) => - builder.Use((innerClient, services) => - new RateLimitingChatClient( - innerClient, - rateLimiter ?? services.GetRequiredService())); -} -``` +:::code language="csharp" source="snippets/ai/AI.Shared/RateLimitingChatClientExtensions.OptionalOverload.cs"::: The consumer can then easily use this in their pipeline, for example: -```csharp -var client = new SampleChatClient(new Uri("http://localhost"), "test") - .AsBuilder() - .UseDistributedCache() - .UseRateLimiting() - .UseOpenTelemetry() - .Build(services); -``` +:::code language="csharp source="snippets/ai/ConsoleAI.ConsumeClientMiddleware/Program.cs" id="program"::: -The preceding extension methods demonstrate using a `Use` method on . The `ChatClientBuilder` also provides overloads that make it easier to write such delegating handlers. +This example demonstrates a [hosted scenario](generic-host.md), where the consumer relies on [dependency injection](dependency-injection.md) to provide the `RateLimiter` instance. The preceding extension methods demonstrate using a `Use` method on `ChatClientBuilder`. The `ChatClientBuilder` also provides overloads that make it easier to write such delegating handlers.
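For completeness, the first (required-parameter) `UseRateLimiting` overload composes just as easily without DI. A sketch, reusing the `RateLimitingChatClient` and `SampleChatClient` types defined in this article:

```csharp
using Microsoft.Extensions.AI;
using System.Threading.RateLimiting;

// The non-DI overload takes the RateLimiter directly.
IChatClient client = new SampleChatClient(new Uri("http://localhost"), "test")
    .AsBuilder()
    .UseRateLimiting(new ConcurrencyLimiter(new()
    {
        PermitLimit = 1,
        QueueLimit = int.MaxValue
    }))
    .Build();

Console.WriteLine(await client.CompleteAsync("What color is the sky?"));
```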
- - @@ -413,59 +198,11 @@ The preceding extension methods demonstrate using a `Use` method on - { - using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) - .ConfigureAwait(false); - - if (!lease.IsAcquired) - { - throw new InvalidOperationException("Unable to acquire lease."); - } - - await nextAsync(chatMessages, options, cancellationToken); - }) - .UseOpenTelemetry() - .Build(); -``` +:::code language="csharp" source="snippets/ai/ConsoleAI.UseExample/Program.cs"::: The preceding overload internally uses an `AnonymousDelegatingChatClient`, which enables more complicated patterns with only a little additional code. For example, to achieve the same result but with the `RateLimiter` retrieved from DI: -```csharp -var client = new SampleChatClient(new Uri("http://localhost"), "test") - .AsBuilder() - .UseDistributedCache() - .Use(static (innerClient, services) => - { - var rateLimiter = services.GetRequiredService(); - - return new AnonymousDelegatingChatClient( - innerClient, async (chatMessages, options, nextAsync, cancellationToken) => - { - using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) - .ConfigureAwait(false); - - if (!lease.IsAcquired) - { - throw new InvalidOperationException("Unable to acquire lease."); - } - - await nextAsync(chatMessages, options, cancellationToken); - }); - }) - .UseOpenTelemetry() - .Build(); -``` +:::code language="csharp" source="snippets/ai/ConsoleAI.UseExampleAlt/Program.cs"::: For scenarios where the developer would like to specify delegating implementations of `CompleteAsync` and `CompleteStreamingAsync` inline, and where it's important to be able to write a different implementation for each in order to handle their unique return types specially, another overload of `Use` exists that accepts a delegate for each. @@ -473,25 +210,12 @@ For scenarios where the developer would like to specify delegating implementatio `IChatClient` implementations will typically be provided to an application via [dependency injection (DI)](dependency-injection.md). In this example, an `IDistributedCache` is added into the DI container, as is an `IChatClient`. The registration for the `IChatClient` employs a builder that creates a pipeline containing a caching client (which will then use an `IDistributedCache` retrieved from DI) and the sample client. The injected `IChatClient` can be retrieved and used elsewhere in the app. -```csharp -using Microsoft.Extensions.AI; -using Microsoft.Extensions.DependencyInjection; -using Microsoft.Extensions.Hosting; - -// App Setup -var builder = Host.CreateApplicationBuilder(); +:::code language="csharp" source="snippets/ai/ConsoleAI.DependencyInjection/Program.cs"::: -builder.Services.AddDistributedMemoryCache(); -builder.Services.AddChatClient(new SampleChatClient(new Uri("http://coolsite.ai"), "my-custom-model")) - .UseDistributedCache(); +The preceding example depends on the following NuGet packages: -var host = builder.Build(); - -// Elsewhere in the app -var chatClient = host.Services.GetRequiredService(); - -Console.WriteLine(await chatClient.CompleteAsync("What is AI?")); -``` +- [📦 Microsoft.Extensions.Hosting](https://www.nuget.org/packages/Microsoft.Extensions.Hosting) +- [📦 Microsoft.Extensions.Caching.Memory](https://www.nuget.org/packages/Microsoft.Extensions.Caching.Memory) What instance and configuration is injected can differ based on the current needs of the application, and multiple pipelines can be injected with different keys.
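As a sketch of that multi-pipeline idea, standard keyed DI registrations work; the `"cached"` and `"direct"` keys here are hypothetical:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder();
builder.Services.AddDistributedMemoryCache();

// Hypothetical keys distinguish a cached pipeline from a direct one.
builder.Services.AddKeyedSingleton<IChatClient>("cached", (services, _) =>
    new SampleChatClient(new Uri("http://coolsite.ai"), "target-ai-model")
        .AsBuilder()
        .UseDistributedCache()
        .Build(services));

builder.Services.AddKeyedSingleton<IChatClient>("direct",
    new SampleChatClient(new Uri("http://coolsite.ai"), "target-ai-model"));

using var host = builder.Build();

var cachedChatClient = host.Services.GetRequiredKeyedService<IChatClient>("cached");
Console.WriteLine(await cachedChatClient.CompleteAsync("What is AI?"));
```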
@@ -527,115 +251,23 @@ You can find actual concrete implementations in the following packages: The primary operation performed with an `IEmbeddingGenerator` is embedding generation, which is accomplished with its `GenerateAsync` method. -```csharp -using Microsoft.Extensions.AI; - -IEmbeddingGenerator> generator = - new SampleEmbeddingGenerator( - new Uri("http://coolsite.ai"), "my-custom-model"); - -foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"])) -{ - Console.WriteLine(string.Join(", ", embedding.Vector.ToArray())); -} -``` +:::code language="csharp" source="snippets/ai/ConsoleAI.CreateEmbeddings/Program.cs"::: #### Custom `IEmbeddingGenerator` middleware As with `IChatClient`, `IEmbeddingGenerator` implementations can be layered. Just as `Microsoft.Extensions.AI` provides delegating implementations of `IChatClient` for caching and telemetry, it provides an implementation for `IEmbeddingGenerator` as well. -```csharp -using Microsoft.Extensions.AI; -using Microsoft.Extensions.Caching.Distributed; -using Microsoft.Extensions.Caching.Memory; -using Microsoft.Extensions.Options; -using OpenTelemetry.Trace; - -// Configure OpenTelemetry exporter -var sourceName = Guid.NewGuid().ToString(); -var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder() - .AddSource(sourceName) - .AddConsoleExporter() - .Build(); - -// Explore changing the order of the intermediate "Use" calls to see that impact -// that has on what gets cached, traced, etc. -var generator = new EmbeddingGeneratorBuilder>( - new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "my-custom-model")) - .UseDistributedCache( - new MemoryDistributedCache(Options.Create(new MemoryDistributedCacheOptions()))) - .UseOpenTelemetry(sourceName) - .Build(); - -var embeddings = await generator.GenerateAsync( -[ - "What is AI?", - "What is .NET?", - "What is AI?" -]); - -foreach (var embedding in embeddings) -{ - Console.WriteLine(string.Join(", ", embedding.Vector.ToArray())); -} -``` +:::code language="csharp" source="snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/Program.cs"::: The `IEmbeddingGenerator` enables building custom middleware that extends the functionality of an `IEmbeddingGenerator`. The `DelegatingEmbeddingGenerator` class is an implementation of the `IEmbeddingGenerator` interface that serves as a base class for creating embedding generators that delegate their operations to another `IEmbeddingGenerator` instance. It allows for chaining multiple generators in any order, passing calls through to an underlying generator. The class provides default implementations for methods such as `GenerateAsync` and `Dispose`, which forward the calls to the inner generator instance, enabling flexible and modular embedding generation. The following is an example implementation of such a delegating embedding generator that rate limits embedding generation requests: -```csharp -using Microsoft.Extensions.AI; -using System.Threading.RateLimiting; - -public class RateLimitingEmbeddingGenerator( - IEmbeddingGenerator> innerGenerator, RateLimiter rateLimiter) - : DelegatingEmbeddingGenerator>(innerGenerator) -{ - public override async Task>> GenerateAsync( - IEnumerable values, - EmbeddingGenerationOptions?
options = null, - CancellationToken cancellationToken = default) - { - using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) - .ConfigureAwait(false); - - if (!lease.IsAcquired) - { - throw new InvalidOperationException("Unable to acquire lease."); - } - - return await base.GenerateAsync(values, options, cancellationToken); - } - - protected override void Dispose(bool disposing) - { - if (disposing) - { - rateLimiter.Dispose(); - } - - base.Dispose(disposing); - } -} -``` +:::code language="csharp" source="snippets/ai/AI.Shared/RateLimitingEmbeddingGenerator.cs"::: This can then be layered around an arbitrary `IEmbeddingGenerator>` to rate limit all embedding generation operations performed. -```csharp -using Microsoft.Extensions.AI; -using System.Threading.RateLimiting; - -IEmbeddingGenerator> generator = - new RateLimitingEmbeddingGenerator( - new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "my-custom-model"), - new ConcurrencyLimiter(new() { PermitLimit = 1, QueueLimit = int.MaxValue })); - -foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"])) -{ - Console.WriteLine(string.Join(", ", embedding.Vector.ToArray())); -} -``` +:::code language="csharp source="snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/Program.cs" id="program"::: In this way, the `RateLimitingEmbeddingGenerator` can be composed with other `IEmbeddingGenerator>` instances to provide rate limiting functionality. diff --git a/docs/core/extensions/snippets/ai/AI.Shared/AI.Shared.csproj b/docs/core/extensions/snippets/ai/AI.Shared/AI.Shared.csproj new file mode 100644 index 0000000000000..f7e01ba0c9f9c --- /dev/null +++ b/docs/core/extensions/snippets/ai/AI.Shared/AI.Shared.csproj @@ -0,0 +1,14 @@ + + + + net9.0 + enable + enable + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClient.cs b/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClient.cs new file mode 100644 index 0000000000000..e5d3ada7f1f60 --- /dev/null +++ b/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClient.cs @@ -0,0 +1,55 @@ +using Microsoft.Extensions.AI; +using System.Runtime.CompilerServices; +using System.Threading.RateLimiting; + +public sealed class RateLimitingChatClient( + IChatClient innerClient, RateLimiter rateLimiter) + : DelegatingChatClient(innerClient) +{ + public override async Task CompleteAsync( + IList chatMessages, + ChatOptions? options = null, + CancellationToken cancellationToken = default) + { + using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) + .ConfigureAwait(false); + + if (!lease.IsAcquired) + { + throw new InvalidOperationException("Unable to acquire lease."); + } + + return await base.CompleteAsync(chatMessages, options, cancellationToken) + .ConfigureAwait(false); + } + + public override async IAsyncEnumerable CompleteStreamingAsync( + IList chatMessages, + ChatOptions? 
options = null, + [EnumeratorCancellation] CancellationToken cancellationToken = default) + { + using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) + .ConfigureAwait(false); + + if (!lease.IsAcquired) + { + throw new InvalidOperationException("Unable to acquire lease."); + } + + await foreach (var update in base.CompleteStreamingAsync(chatMessages, options, cancellationToken) + .ConfigureAwait(false)) + { + yield return update; + } + } + + protected override void Dispose(bool disposing) + { + if (disposing) + { + rateLimiter.Dispose(); + } + + base.Dispose(disposing); + } +} diff --git a/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClientExtensions.OptionalOverload.cs b/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClientExtensions.OptionalOverload.cs new file mode 100644 index 0000000000000..066cf22f6ee44 --- /dev/null +++ b/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClientExtensions.OptionalOverload.cs @@ -0,0 +1,17 @@ +namespace Example.Two; + +// +using Microsoft.Extensions.AI; +using Microsoft.Extensions.DependencyInjection; +using System.Threading.RateLimiting; + +public static class RateLimitingChatClientExtensions +{ + public static ChatClientBuilder UseRateLimiting( + this ChatClientBuilder builder, RateLimiter? rateLimiter = null) => + builder.Use((innerClient, services) => + new RateLimitingChatClient( + innerClient, + rateLimiter ?? services.GetRequiredService())); +} +// diff --git a/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClientExtensions.cs b/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClientExtensions.cs new file mode 100644 index 0000000000000..5f0fe5765b193 --- /dev/null +++ b/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingChatClientExtensions.cs @@ -0,0 +1,13 @@ +namespace Example.One; + +// +using Microsoft.Extensions.AI; +using System.Threading.RateLimiting; + +public static class RateLimitingChatClientExtensions +{ + public static ChatClientBuilder UseRateLimiting( + this ChatClientBuilder builder, RateLimiter rateLimiter) => + builder.Use(innerClient => new RateLimitingChatClient(innerClient, rateLimiter)); +} +// diff --git a/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingEmbeddingGenerator.cs b/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingEmbeddingGenerator.cs new file mode 100644 index 0000000000000..f71650698eaac --- /dev/null +++ b/docs/core/extensions/snippets/ai/AI.Shared/RateLimitingEmbeddingGenerator.cs @@ -0,0 +1,33 @@ +using Microsoft.Extensions.AI; +using System.Threading.RateLimiting; + +public class RateLimitingEmbeddingGenerator( + IEmbeddingGenerator> innerGenerator, RateLimiter rateLimiter) + : DelegatingEmbeddingGenerator>(innerGenerator) +{ + public override async Task>> GenerateAsync( + IEnumerable values, + EmbeddingGenerationOptions? 
options = null, + CancellationToken cancellationToken = default) + { + using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) + .ConfigureAwait(false); + + if (!lease.IsAcquired) + { + throw new InvalidOperationException("Unable to acquire lease."); + } + + return await base.GenerateAsync(values, options, cancellationToken); + } + + protected override void Dispose(bool disposing) + { + if (disposing) + { + rateLimiter.Dispose(); + } + + base.Dispose(disposing); + } +} diff --git a/docs/core/extensions/snippets/ai/ConsoleAI/SampleChatClient.cs b/docs/core/extensions/snippets/ai/AI.Shared/SampleChatClient.cs similarity index 100% rename from docs/core/extensions/snippets/ai/ConsoleAI/SampleChatClient.cs rename to docs/core/extensions/snippets/ai/AI.Shared/SampleChatClient.cs diff --git a/docs/core/extensions/snippets/ai/ConsoleAI/SampleEmbeddingGenerator.cs b/docs/core/extensions/snippets/ai/AI.Shared/SampleEmbeddingGenerator.cs similarity index 100% rename from docs/core/extensions/snippets/ai/ConsoleAI/SampleEmbeddingGenerator.cs rename to docs/core/extensions/snippets/ai/AI.Shared/SampleEmbeddingGenerator.cs diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CacheResponses/ConsoleAI.CacheResponses.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.CacheResponses/ConsoleAI.CacheResponses.csproj new file mode 100644 index 0000000000000..be3d111984a25 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CacheResponses/ConsoleAI.CacheResponses.csproj @@ -0,0 +1,18 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CacheResponses/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.CacheResponses/Program.cs new file mode 100644 index 0000000000000..51096d1df9a95 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CacheResponses/Program.cs @@ -0,0 +1,24 @@ +using Microsoft.Extensions.AI; +using Microsoft.Extensions.Caching.Distributed; +using Microsoft.Extensions.Caching.Memory; +using Microsoft.Extensions.Options; + +var sampleChatClient = new SampleChatClient( + new Uri("http://coolsite.ai"), "target-ai-model"); + +IChatClient client = new ChatClientBuilder(sampleChatClient) + .UseDistributedCache(new MemoryDistributedCache( + Options.Create(new MemoryDistributedCacheOptions()))) + .Build(); + +string[] prompts = ["What is AI?", "What is .NET?", "What is AI?"]; + +foreach (var prompt in prompts) +{ + await foreach (var update in client.CompleteStreamingAsync(prompt)) + { + Console.Write(update); + } + + Console.WriteLine(); +} diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CompleteAsyncArgs/ConsoleAI.CompleteAsyncArgs.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.CompleteAsyncArgs/ConsoleAI.CompleteAsyncArgs.csproj new file mode 100644 index 0000000000000..b615dd1b868c2 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CompleteAsyncArgs/ConsoleAI.CompleteAsyncArgs.csproj @@ -0,0 +1,14 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CompleteAsyncArgs/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.CompleteAsyncArgs/Program.cs new file mode 100644 index 0000000000000..eda37fef75fbf --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CompleteAsyncArgs/Program.cs @@ -0,0 +1,10 @@ +using Microsoft.Extensions.AI; + +IChatClient client = new SampleChatClient( + new Uri("http://coolsite.ai"), "target-ai-model"); + +Console.WriteLine(await client.CompleteAsync( +[ + 
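+    // Target-typed `new` creates ChatMessage instances: the system message
+    // primes the assistant, and the user message carries the question.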
new(ChatRole.System, "You are a helpful AI assistant"), + new(ChatRole.User, "What is AI?"), +])); diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CompleteStreamingAsync/ConsoleAI.CompleteStreamingAsync.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.CompleteStreamingAsync/ConsoleAI.CompleteStreamingAsync.csproj new file mode 100644 index 0000000000000..b615dd1b868c2 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CompleteStreamingAsync/ConsoleAI.CompleteStreamingAsync.csproj @@ -0,0 +1,14 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CompleteStreamingAsync/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.CompleteStreamingAsync/Program.cs new file mode 100644 index 0000000000000..a5e32ce3438a0 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CompleteStreamingAsync/Program.cs @@ -0,0 +1,9 @@ +using Microsoft.Extensions.AI; + +IChatClient client = new SampleChatClient( + new Uri("http://coolsite.ai"), "target-ai-model"); + +await foreach (var update in client.CompleteStreamingAsync("What is AI?")) +{ + Console.Write(update); +} diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeClientMiddleware/ConsoleAI.ConsumeClientMiddleware.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeClientMiddleware/ConsoleAI.ConsumeClientMiddleware.csproj new file mode 100644 index 0000000000000..ffd67c3f5495a --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeClientMiddleware/ConsoleAI.ConsumeClientMiddleware.csproj @@ -0,0 +1,15 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeClientMiddleware/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeClientMiddleware/Program.cs new file mode 100644 index 0000000000000..f95efffe26568 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeClientMiddleware/Program.cs @@ -0,0 +1,26 @@ +using Example.Two; + +// +using Microsoft.Extensions.AI; +using Microsoft.Extensions.DependencyInjection; +using Microsoft.Extensions.Hosting; + +var builder = Host.CreateApplicationBuilder(args); + +builder.Services.AddChatClient(services => + new SampleChatClient(new Uri("http://localhost"), "test") + .AsBuilder() + .UseDistributedCache() + .UseRateLimiting() + .UseOpenTelemetry() + .Build(services)); + +using var app = builder.Build(); + +// Elsewhere in the app +var chatClient = app.Services.GetRequiredService(); + +Console.WriteLine(await chatClient.CompleteAsync("What is AI?")); + +app.Run(); +// diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/ConsoleAI.ConsumeRateLimitingEmbedding.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/ConsoleAI.ConsumeRateLimitingEmbedding.csproj new file mode 100644 index 0000000000000..b615dd1b868c2 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/ConsoleAI.ConsumeRateLimitingEmbedding.csproj @@ -0,0 +1,14 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/Program.cs new file mode 100644 index 0000000000000..d7987319e07ee --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/Program.cs @@ -0,0 +1,16 @@ +using Microsoft.Extensions.AI; +using System.Threading.RateLimiting; + 
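+// Wrap the sample generator in the custom RateLimitingEmbeddingGenerator;
+// the ConcurrencyLimiter permits a single in-flight GenerateAsync call.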
+IEmbeddingGenerator> generator = + new RateLimitingEmbeddingGenerator( + new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model"), + new ConcurrencyLimiter(new() + { + PermitLimit = 1, + QueueLimit = int.MaxValue + })); + +foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"])) +{ + Console.WriteLine(string.Join(", ", embedding.Vector.ToArray())); +} diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CreateEmbeddings/ConsoleAI.CreateEmbeddings.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.CreateEmbeddings/ConsoleAI.CreateEmbeddings.csproj new file mode 100644 index 0000000000000..b615dd1b868c2 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CreateEmbeddings/ConsoleAI.CreateEmbeddings.csproj @@ -0,0 +1,14 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CreateEmbeddings/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.CreateEmbeddings/Program.cs new file mode 100644 index 0000000000000..c3d8ece9410fb --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CreateEmbeddings/Program.cs @@ -0,0 +1,10 @@ +using Microsoft.Extensions.AI; + +IEmbeddingGenerator> generator = + new SampleEmbeddingGenerator( + new Uri("http://coolsite.ai"), "target-ai-model"); + +foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"])) +{ + Console.WriteLine(string.Join(", ", embedding.Vector.ToArray())); +} diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CustomClientMiddle/ConsoleAI.CustomClientMiddle.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.CustomClientMiddle/ConsoleAI.CustomClientMiddle.csproj new file mode 100644 index 0000000000000..be4820d0ade34 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CustomClientMiddle/ConsoleAI.CustomClientMiddle.csproj @@ -0,0 +1,18 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CustomClientMiddle/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.CustomClientMiddle/Program.cs new file mode 100644 index 0000000000000..dd69572c6c7a2 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CustomClientMiddle/Program.cs @@ -0,0 +1,12 @@ +using Microsoft.Extensions.AI; +using System.Threading.RateLimiting; + +var client = new RateLimitingChatClient( + new SampleChatClient(new Uri("http://localhost"), "test"), + new ConcurrencyLimiter(new() + { + PermitLimit = 1, + QueueLimit = int.MaxValue + })); + +await client.CompleteAsync("What color is the sky?"); diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/ConsoleAI.CustomEmbeddingsMiddle.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/ConsoleAI.CustomEmbeddingsMiddle.csproj new file mode 100644 index 0000000000000..0f51adef5a2a3 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/ConsoleAI.CustomEmbeddingsMiddle.csproj @@ -0,0 +1,16 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/Program.cs new file mode 100644 index 0000000000000..ffa45b4be6dd4 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.CustomEmbeddingsMiddle/Program.cs @@ -0,0 +1,34 @@ +using Microsoft.Extensions.AI; +using Microsoft.Extensions.Caching.Distributed; +using Microsoft.Extensions.Caching.Memory; 
+using Microsoft.Extensions.Options; +using OpenTelemetry.Trace; + +// Configure OpenTelemetry exporter +var sourceName = Guid.NewGuid().ToString(); +var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder() + .AddSource(sourceName) + .AddConsoleExporter() + .Build(); + +// Explore changing the order of the intermediate "Use" calls to see that impact +// that has on what gets cached, traced, etc. +var generator = new EmbeddingGeneratorBuilder>( + new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model")) + .UseDistributedCache( + new MemoryDistributedCache( + Options.Create(new MemoryDistributedCacheOptions()))) + .UseOpenTelemetry(sourceName: sourceName) + .Build(); + +var embeddings = await generator.GenerateAsync( +[ + "What is AI?", + "What is .NET?", + "What is AI?" +]); + +foreach (var embedding in embeddings) +{ + Console.WriteLine(string.Join(", ", embedding.Vector.ToArray())); +} diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.DependencyInjection/ConsoleAI.DependencyInjection.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.DependencyInjection/ConsoleAI.DependencyInjection.csproj new file mode 100644 index 0000000000000..dd7f9e2936369 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.DependencyInjection/ConsoleAI.DependencyInjection.csproj @@ -0,0 +1,16 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.DependencyInjection/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.DependencyInjection/Program.cs new file mode 100644 index 0000000000000..930b0b036c74e --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.DependencyInjection/Program.cs @@ -0,0 +1,20 @@ +using Microsoft.Extensions.AI; +using Microsoft.Extensions.DependencyInjection; +using Microsoft.Extensions.Hosting; + +// App setup +var builder = Host.CreateApplicationBuilder(); + +builder.Services.AddDistributedMemoryCache(); +builder.Services.AddChatClient(new SampleChatClient( + new Uri("http://coolsite.ai"), "target-ai-model")) + .UseDistributedCache(); + +using var app = builder.Build(); + +// Elsewhere in the app +var chatClient = app.Services.GetRequiredService(); + +Console.WriteLine(await chatClient.CompleteAsync("What is AI?")); + +app.Run(); diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.FunctionalityPipelines/ConsoleAI.FunctionalityPipelines.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.FunctionalityPipelines/ConsoleAI.FunctionalityPipelines.csproj new file mode 100644 index 0000000000000..3fe7c47c39ce1 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.FunctionalityPipelines/ConsoleAI.FunctionalityPipelines.csproj @@ -0,0 +1,20 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.FunctionalityPipelines/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.FunctionalityPipelines/Program.cs new file mode 100644 index 0000000000000..16f563b3689a3 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.FunctionalityPipelines/Program.cs @@ -0,0 +1,46 @@ +using Microsoft.Extensions.AI; +using Microsoft.Extensions.Caching.Distributed; +using Microsoft.Extensions.Caching.Memory; +using Microsoft.Extensions.Options; +using OpenTelemetry.Trace; + +// Configure OpenTelemetry exporter +var sourceName = Guid.NewGuid().ToString(); +var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder() + .AddSource(sourceName) + .AddConsoleExporter() + .Build(); + +// Explore changing the 
order of the intermediate "Use" calls to see that impact +// that has on what gets cached, traced, etc. +IChatClient client = new ChatClientBuilder( + new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1")) + .UseDistributedCache(new MemoryDistributedCache( + Options.Create(new MemoryDistributedCacheOptions()))) + .UseFunctionInvocation() + .UseOpenTelemetry( + sourceName: sourceName, + configure: static c => c.EnableSensitiveData = true) + .Build(); + +ChatOptions options = new() +{ + Tools = + [ + AIFunctionFactory.Create( + () => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining", + name: "GetCurrentWeather", + description: "Gets the current weather") + ] +}; + +for (int i = 0; i < 3; ++i) +{ + List history = + [ + new ChatMessage(ChatRole.System, "You are a helpful AI assistant"), + new ChatMessage(ChatRole.User, "Do I need an umbrella?") + ]; + + Console.WriteLine(await client.CompleteAsync(history, options)); +} diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.ProvideOptions/ConsoleAI.ProvideOptions.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.ProvideOptions/ConsoleAI.ProvideOptions.csproj new file mode 100644 index 0000000000000..52b8ab4531c7f --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.ProvideOptions/ConsoleAI.ProvideOptions.csproj @@ -0,0 +1,18 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.ProvideOptions/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.ProvideOptions/Program.cs new file mode 100644 index 0000000000000..c6ce0bfb7010e --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.ProvideOptions/Program.cs @@ -0,0 +1,13 @@ +using Microsoft.Extensions.AI; + +IChatClient client = new ChatClientBuilder( + new OllamaChatClient(new Uri("http://localhost:11434"))) + .ConfigureOptions(options => options.ModelId ??= "phi3") + .Build(); + +// will request "phi3" +Console.WriteLine(await client.CompleteAsync("What is AI?")); + +// will request "llama3.1" +Console.WriteLine(await client.CompleteAsync( + "What is AI?", new() { ModelId = "llama3.1" })); diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.ToolCalling/ConsoleAI.ToolCalling.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.ToolCalling/ConsoleAI.ToolCalling.csproj new file mode 100644 index 0000000000000..52b8ab4531c7f --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.ToolCalling/ConsoleAI.ToolCalling.csproj @@ -0,0 +1,18 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.ToolCalling/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.ToolCalling/Program.cs new file mode 100644 index 0000000000000..ff8ef0c2ba7c8 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.ToolCalling/Program.cs @@ -0,0 +1,21 @@ +using System.ComponentModel; +using Microsoft.Extensions.AI; + +[Description("Gets the current weather")] +string GetCurrentWeather() => Random.Shared.NextDouble() > 0.5 + ? 
"It's sunny" + : "It's raining"; + +IChatClient client = new ChatClientBuilder( + new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1")) + .UseFunctionInvocation() + .Build(); + +var response = client.CompleteStreamingAsync( + "Should I wear a rain coat?", + new() { Tools = [AIFunctionFactory.Create(GetCurrentWeather)] }); + +await foreach (var update in response) +{ + Console.Write(update); +} diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.UseExample/ConsoleAI.UseExample.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.UseExample/ConsoleAI.UseExample.csproj new file mode 100644 index 0000000000000..b615dd1b868c2 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.UseExample/ConsoleAI.UseExample.csproj @@ -0,0 +1,14 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.UseExample/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.UseExample/Program.cs new file mode 100644 index 0000000000000..aa3de1cec1423 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.UseExample/Program.cs @@ -0,0 +1,28 @@ +using Microsoft.Extensions.AI; +using System.Threading.RateLimiting; + +RateLimiter rateLimiter = new ConcurrencyLimiter(new() +{ + PermitLimit = 1, + QueueLimit = int.MaxValue +}); + +var client = new SampleChatClient(new Uri("http://localhost"), "test") + .AsBuilder() + .UseDistributedCache() + .Use(async (chatMessages, options, nextAsync, cancellationToken) => + { + using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) + .ConfigureAwait(false); + + if (!lease.IsAcquired) + { + throw new InvalidOperationException("Unable to acquire lease."); + } + + await nextAsync(chatMessages, options, cancellationToken); + }) + .UseOpenTelemetry() + .Build(); + +// Use client diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.UseExampleAlt/ConsoleAI.UseExampleAlt.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.UseExampleAlt/ConsoleAI.UseExampleAlt.csproj new file mode 100644 index 0000000000000..b615dd1b868c2 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.UseExampleAlt/ConsoleAI.UseExampleAlt.csproj @@ -0,0 +1,14 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.UseExampleAlt/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.UseExampleAlt/Program.cs new file mode 100644 index 0000000000000..1b35dfe6d25c5 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.UseExampleAlt/Program.cs @@ -0,0 +1,27 @@ +using System.Threading.RateLimiting; +using Microsoft.Extensions.AI; +using Microsoft.Extensions.DependencyInjection; + +var client = new SampleChatClient(new Uri("http://localhost"), "test") + .AsBuilder() + .UseDistributedCache() + .Use(static (innerClient, services) => + { + var rateLimiter = services.GetRequiredService(); + + return new AnonymousDelegatingChatClient( + innerClient, async (chatMessages, options, nextAsync, cancellationToken) => + { + using var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken) + .ConfigureAwait(false); + + if (!lease.IsAcquired) + { + throw new InvalidOperationException("Unable to acquire lease."); + } + + await nextAsync(chatMessages, options, cancellationToken); + }); + }) + .UseOpenTelemetry() + .Build(); diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.UseTelemetry/ConsoleAI.UseTelemetry.csproj b/docs/core/extensions/snippets/ai/ConsoleAI.UseTelemetry/ConsoleAI.UseTelemetry.csproj new file mode 100644 
index 0000000000000..b97375a313615 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.UseTelemetry/ConsoleAI.UseTelemetry.csproj @@ -0,0 +1,18 @@ + + + + Exe + net9.0 + enable + enable + + + + + + + + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI.UseTelemetry/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI.UseTelemetry/Program.cs new file mode 100644 index 0000000000000..d4c5e2c28e723 --- /dev/null +++ b/docs/core/extensions/snippets/ai/ConsoleAI.UseTelemetry/Program.cs @@ -0,0 +1,20 @@ +using Microsoft.Extensions.AI; +using OpenTelemetry.Trace; + +// Configure OpenTelemetry exporter +var sourceName = Guid.NewGuid().ToString(); +var tracerProvider = OpenTelemetry.Sdk.CreateTracerProviderBuilder() + .AddSource(sourceName) + .AddConsoleExporter() + .Build(); + +var sampleChatClient = new SampleChatClient( + new Uri("http://coolsite.ai"), "target-ai-model"); + +IChatClient client = new ChatClientBuilder(sampleChatClient) + .UseOpenTelemetry( + sourceName: sourceName, + configure: static c => c.EnableSensitiveData = true) + .Build(); + +Console.WriteLine((await client.CompleteAsync("What is AI?")).Message); diff --git a/docs/core/extensions/snippets/ai/ConsoleAI/ConsoleAI.csproj b/docs/core/extensions/snippets/ai/ConsoleAI/ConsoleAI.csproj index 37684fe82e198..bcec98d0ad009 100644 --- a/docs/core/extensions/snippets/ai/ConsoleAI/ConsoleAI.csproj +++ b/docs/core/extensions/snippets/ai/ConsoleAI/ConsoleAI.csproj @@ -11,4 +11,8 @@ + + + + diff --git a/docs/core/extensions/snippets/ai/ConsoleAI/Program.cs b/docs/core/extensions/snippets/ai/ConsoleAI/Program.cs index 6c3d58ae5772f..258a6d41ac681 100644 --- a/docs/core/extensions/snippets/ai/ConsoleAI/Program.cs +++ b/docs/core/extensions/snippets/ai/ConsoleAI/Program.cs @@ -1,7 +1,7 @@ using Microsoft.Extensions.AI; IChatClient client = new SampleChatClient( - new Uri("http://coolsite.ai"), "my-custom-model"); + new Uri("http://coolsite.ai"), "target-ai-model"); var response = await client.CompleteAsync("What is AI?"); From 3202f36950a45e3ee1c1983d4c1806a083ecce43 Mon Sep 17 00:00:00 2001 From: David Pine Date: Tue, 17 Dec 2024 11:07:05 -0600 Subject: [PATCH 5/6] Apply suggestions from code review Co-authored-by: alexwolfmsft <93200798+alexwolfmsft@users.noreply.github.com> --- docs/core/extensions/artificial-intelligence.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/core/extensions/artificial-intelligence.md b/docs/core/extensions/artificial-intelligence.md index 22a1b45abb093..970c54c410e1a 100644 --- a/docs/core/extensions/artificial-intelligence.md +++ b/docs/core/extensions/artificial-intelligence.md @@ -128,7 +128,7 @@ The preceding code: #### Cache responses -If you're familiar with [Caching in .NET](caching.md), it's good to know that provides other such delegating `IChatClient` implementations. The is an `IChatClient` that layers caching around another arbitrary `IChatClient` instance. When a unique chat history is submitted to the `DistributedCachingChatClient`, it forwards it to the underlying client and then caches the response before sending it back to the consumer. The next time the same history is submitted, such that a cached response can be found in the cache, the `DistributedCachingChatClient` returns the cached response rather than needing to forward the request along the pipeline. +If you're familiar with [Caching in .NET](caching.md), it's good to know that provides other such delegating `IChatClient` implementations. 
The `DistributedCachingChatClient` is an `IChatClient` that layers caching around another arbitrary `IChatClient` instance. When a unique chat history is submitted to the `DistributedCachingChatClient`, it forwards it to the underlying client and then caches the response before sending it back to the consumer. The next time the same prompt is submitted, such that a cached response can be found in the cache, the `DistributedCachingChatClient` returns the cached response rather than needing to forward the request along the pipeline. :::code language="csharp" source="snippets/ai/ConsoleAI.CacheResponse/Program.cs"::: From 5c4ab2fbffab8f5981ad44e4856bb512dd1e396b Mon Sep 17 00:00:00 2001 From: David Pine Date: Tue, 17 Dec 2024 11:20:31 -0600 Subject: [PATCH 6/6] Fix code includes --- docs/core/extensions/artificial-intelligence.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/core/extensions/artificial-intelligence.md b/docs/core/extensions/artificial-intelligence.md index 970c54c410e1a..d872a9b14f55e 100644 --- a/docs/core/extensions/artificial-intelligence.md +++ b/docs/core/extensions/artificial-intelligence.md @@ -60,7 +60,7 @@ The interface defines a client abstra The following sample implements `IChatClient` to show the general structure. -:::code language="csharp" source="snippets/ai/ConsoleAI/SampleChatClient.cs"::: +:::code language="csharp" source="snippets/ai/AI.Shared/SampleChatClient.cs"::: You can find other concrete implementations of `IChatClient` in the following NuGet packages: @@ -130,7 +130,7 @@ The preceding code: If you're familiar with [Caching in .NET](caching.md), it's good to know that `Microsoft.Extensions.AI` provides other such delegating `IChatClient` implementations. The `DistributedCachingChatClient` is an `IChatClient` that layers caching around another arbitrary `IChatClient` instance. When a unique chat history is submitted to the `DistributedCachingChatClient`, it forwards it to the underlying client and then caches the response before sending it back to the consumer. The next time the same prompt is submitted, such that a cached response can be found in the cache, the `DistributedCachingChatClient` returns the cached response rather than needing to forward the request along the pipeline. -:::code language="csharp" source="snippets/ai/ConsoleAI.CacheResponse/Program.cs"::: +:::code language="csharp" source="snippets/ai/ConsoleAI.CacheResponses/Program.cs"::: The preceding example depends on the [📦 Microsoft.Extensions.Caching.Memory](https://www.nuget.org/packages/Microsoft.Extensions.Caching.Memory) NuGet package. For more information, see [Caching in .NET](caching.md). @@ -188,7 +188,7 @@ Such extensions can also query for relevant services from the DI container; the The consumer can then easily use this in their pipeline, for example: -:::code language="csharp source="snippets/ai/ConsoleAI.ConsumeClientMiddleware/Program.cs" id="program"::: +:::code language="csharp" source="snippets/ai/ConsoleAI.ConsumeClientMiddleware/Program.cs" id="program"::: This example demonstrates a [hosted scenario](generic-host.md), where the consumer relies on [dependency injection](dependency-injection.md) to provide the `RateLimiter` instance. The preceding extension methods demonstrate using a `Use` method on `ChatClientBuilder`. The `ChatClientBuilder` also provides overloads that make it easier to write such delegating handlers.
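Note that the DI-based `UseRateLimiting()` overload resolves a `RateLimiter` from the container, so one must be registered for that resolution to succeed. A sketch of such a registration (the singleton lifetime and limiter settings are illustrative choices):

```csharp
using Microsoft.Extensions.DependencyInjection;
using System.Threading.RateLimiting;

var services = new ServiceCollection();

// Register the RateLimiter that the pipeline's extension method resolves.
services.AddSingleton<RateLimiter>(new ConcurrencyLimiter(new()
{
    PermitLimit = 1,
    QueueLimit = int.MaxValue
}));
```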
@@ -231,7 +231,7 @@ The `IEmbeddingGenerator` interface defines a method to asynchronously generate Consider the following sample implementation of an `IEmbeddingGenerator` to show the general structure but that just generates random embedding vectors. -:::code language="csharp" source="snippets/ai/ConsoleAI/SampleEmbeddingGenerator.cs"::: +:::code language="csharp" source="snippets/ai/AI.Shared/SampleEmbeddingGenerator.cs"::: The preceding code: @@ -267,7 +267,7 @@ The following is an example implementation of such a delegating embedding genera This can then be layered around an arbitrary `IEmbeddingGenerator>` to rate limit all embedding generation operations performed. -:::code language="csharp source="snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/Program.cs" id="program"::: +:::code language="csharp" source="snippets/ai/ConsoleAI.ConsumeRateLimitingEmbedding/Program.cs"::: In this way, the `RateLimitingEmbeddingGenerator` can be composed with other `IEmbeddingGenerator>` instances to provide rate limiting functionality.
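Middleware like this also composes with the built-in pieces. The following sketch layers the built-in distributed cache and the custom rate limiter in one embedding pipeline; it assumes `EmbeddingGeneratorBuilder` exposes a `Use` overload symmetric to `ChatClientBuilder`'s, and reuses the `SampleEmbeddingGenerator` and `RateLimitingEmbeddingGenerator` types from this article:

```csharp
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using System.Threading.RateLimiting;

IEmbeddingGenerator<string, Embedding<float>> generator =
    new EmbeddingGeneratorBuilder<string, Embedding<float>>(
        new SampleEmbeddingGenerator(new Uri("http://coolsite.ai"), "target-ai-model"))
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    // Cache sits outside the limiter, so cache hits never consume a permit.
    .Use(innerGenerator => new RateLimitingEmbeddingGenerator(
        innerGenerator,
        new ConcurrencyLimiter(new() { PermitLimit = 1, QueueLimit = int.MaxValue })))
    .Build();

foreach (var embedding in await generator.GenerateAsync(["What is AI?", "What is .NET?"]))
{
    Console.WriteLine(string.Join(", ", embedding.Vector.ToArray()));
}
```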