
Commit fe30769

.Net: Add Provider Support to ONNX Connector (+CUDA Sample) (#12861)
### Motivation and Context

1. **Why is this change required?** Currently, users can configure ONNX execution providers only by manually creating a `genai_config.json` file next to the ONNX model.
2. **What problem does it solve?** This change removes the need for manual JSON configuration files by providing a programmatic API for provider configuration, making it easier for developers to configure ONNX execution providers directly in code via the extension methods.
3. **What scenario does it contribute to?** It improves the developer experience when working with ONNX models in Semantic Kernel, particularly where:
   - Developers want to select providers dynamically based on runtime conditions
   - Teams prefer code-based configuration over file-based configuration
4. **If it fixes an open issue, please link to the issue here.** Closes #12828

### Description

**Changes Made:**

- **New Provider Class**: Added a `Provider` class to encapsulate provider configuration with `Id` and `Options` properties
- **Enhanced Chat Completion Service**: Extended `OnnxRuntimeGenAIChatCompletionService` to accept the providers parameter
- **Updated Builder Extension**: Modified `OnnxKernelBuilderExtensions.AddOnnxRuntimeGenAIChatCompletion` to accept the providers parameter
- **New Demo Project**: Created the `OnnxSimpleChatWithCuda` demo showcasing the new provider configuration API
- **Package Configuration**: Added the Microsoft.ML.OnnxRuntime.Gpu package version in Directory.Packages.props

**Design Approach:**

The implementation leverages ONNX Runtime GenAI's existing `Config` API to programmatically set providers and their options. The `Provider` class provides a clean abstraction for specifying a provider ID (e.g., "cuda", "cpu") and custom options. During service initialization, the providers are configured through the underlying ONNX Runtime GenAI configuration system.

**Usage Example:**

```csharp
builder.AddOnnxRuntimeGenAIChatCompletion(
    modelId: "onnx",
    modelPath: modelPath,
    providers: [new Provider { Id = "cuda" }]
);
```

**Backward Compatibility:**

This change maintains full backward compatibility. Existing code continues to work without modification, and the manual `genai_config.json` approach remains supported.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: Roger Barreto <[email protected]>
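To make the design approach concrete, here is a minimal sketch of the wiring described above. It is not the connector's verbatim source: the helper name `LoadModel` is invented for illustration, and it assumes the service initialization uses the `Config` methods `ClearProviders`, `AppendProvider`, and `SetProviderOption` from Microsoft.ML.OnnxRuntimeGenAI.

```csharp
using System.Collections.Generic;
using Microsoft.ML.OnnxRuntimeGenAI;

// Hypothetical helper illustrating the described Config wiring.
// "enable_cuda_graph" below is only an example option; see the ONNX Runtime
// GenAI config reference for the options each provider accepts.
static Model LoadModel(string modelPath, string providerId, Dictionary<string, string> providerOptions)
{
    using var config = new Config(modelPath); // starts from genai_config.json next to the model
    config.ClearProviders();                  // discard any file-configured providers
    config.AppendProvider(providerId);        // e.g. "cuda" or "cpu"
    foreach (var (name, value) in providerOptions)
    {
        config.SetProviderOption(providerId, name, value);
    }

    return new Model(config);                 // the model now runs on the requested provider
}
```

Under this reading, `providers: [new Provider { Id = "cuda" }]` in the usage example is roughly equivalent to a `genai_config.json` whose provider list contains only `cuda`.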
1 parent 11cdd33 commit fe30769

15 files changed: +861 -42 lines changed

dotnet/Directory.Packages.props

Lines changed: 1 addition & 0 deletions
```diff
@@ -59,6 +59,7 @@
     <PackageVersion Include="Microsoft.Identity.Client.Extensions.Msal" Version="4.74.1" />
     <PackageVersion Include="Microsoft.IdentityModel.JsonWebTokens" Version="8.13.0" />
     <PackageVersion Include="Microsoft.ML.OnnxRuntime" Version="1.22.1" />
+    <PackageVersion Include="Microsoft.ML.OnnxRuntime.Gpu" Version="1.22.1"/>
     <PackageVersion Include="Microsoft.ML.Tokenizers.Data.Cl100kBase" Version="1.0.1" />
     <PackageVersion Include="Microsoft.SemanticKernel.Abstractions" Version="1.58.0" />
     <PackageVersion Include="Microsoft.SemanticKernel.Connectors.OpenAI" Version="1.58.0" />
```

dotnet/SK-dotnet.slnx

Lines changed: 1 addition & 0 deletions
```diff
@@ -41,6 +41,7 @@
     <Project Path="samples/Demos/ModelContextProtocolPluginAuth/ModelContextProtocolPluginAuth.csproj" />
     <Project Path="samples/Demos/OllamaFunctionCalling/OllamaFunctionCalling.csproj" />
     <Project Path="samples/Demos/OnnxSimpleRAG/OnnxSimpleRAG.csproj" />
+    <Project Path="samples/Demos/OnnxSimpleChatWithCuda/OnnxSimpleChatWithCuda.csproj" />
     <Project Path="samples/Demos/OpenAIRealtime/OpenAIRealtime.csproj" />
     <Project Path="samples/Demos/ProcessWithDapr/ProcessWithDapr.csproj" />
     <Project Path="samples/Demos/QualityCheck/QualityCheckWithFilters/QualityCheckWithFilters.csproj" />
```
dotnet/samples/Demos/OnnxSimpleChatWithCuda/OnnxSimpleChatWithCuda.csproj

Lines changed: 20 additions & 0 deletions

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <NoWarn>$(NoWarn);CA2007,CA2208,CS1591,CA1024,IDE0009,IDE0055,IDE0073,IDE0211,VSTHRD111,SKEXP0001</NoWarn>
  </PropertyGroup>
  <ItemGroup>
    <!--
      TODO: fix this WORKAROUND
      CUDA provider set up with Microsoft.ML.OnnxRuntimeGenAI.Cuda 0.8.3 + Microsoft.ML.OnnxRuntime.Gpu 1.22.1
      - doesn't work with Microsoft.ML.OnnxRuntime 1.22.1
      - works with Microsoft.ML.OnnxRuntime 1.22.0
    -->
    <PackageReference Include="Microsoft.ML.OnnxRuntime" VersionOverride="1.22.0" NoWarn="NU1605"/>
    <PackageReference Include="Microsoft.ML.OnnxRuntime.Gpu" />
    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda"/>
    <ProjectReference Include="..\..\..\src\Connectors\Connectors.Onnx\Connectors.Onnx.csproj"/>
    <ProjectReference Include="..\..\..\src\SemanticKernel.Abstractions\SemanticKernel.Abstractions.csproj"/>
  </ItemGroup>
</Project>
```
dotnet/samples/Demos/OnnxSimpleChatWithCuda/Program.cs

Lines changed: 48 additions & 0 deletions

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.AI;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.Onnx;

// Path to the folder of your downloaded ONNX CUDA model
// i.e: D:\repo\huggingface\Phi-3-mini-4k-instruct-onnx\cuda\cuda-int4-rtn-block-32
string modelPath = "MODEL_PATH";

IKernelBuilder builder = Kernel.CreateBuilder();
builder.AddOnnxRuntimeGenAIChatClient(
    modelPath: modelPath,

    // Specify the provider you want to use, e.g., "cuda" for GPU support
    // For other execution providers, check: https://onnxruntime.ai/docs/genai/reference/config#provideroptions
    providers: [new Provider("cuda")]
);

Kernel kernel = builder.Build();

using IChatClient chatClient = kernel.GetRequiredService<IChatClient>();

List<ChatMessage> chatHistory = [];

while (true)
{
    Console.Write("User > ");
    string userMessage = Console.ReadLine()!;
    if (string.IsNullOrEmpty(userMessage))
    {
        break;
    }

    chatHistory.Add(new ChatMessage(ChatRole.User, userMessage));

    try
    {
        ChatResponse result = await chatClient.GetResponseAsync(chatHistory, new() { MaxOutputTokens = 1024 });
        Console.WriteLine($"Assistant > {result.Text}");

        chatHistory.AddRange(result.Messages);
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
    }
}
```
dotnet/samples/Demos/OnnxSimpleChatWithCuda/README.md

Lines changed: 44 additions & 0 deletions

# ONNX Simple Chat with CUDA Execution Provider

This sample demonstrates how to use the ONNX Connector with the CUDA execution provider to run local models straight from files using Semantic Kernel.

In this example we set up a chat client from the ONNX Connector with the [Microsoft Phi-3 ONNX](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) model.

> [!IMPORTANT]
> You can modify this sample to use any other combination of models enabled for the ONNX runtime.

## Semantic Kernel Features Used

- [Chat Client](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/IChatCompletionService.cs) - Uses the chat completion service from the [ONNX Connector](https://github.com/microsoft/semantic-kernel/blob/main/dotnet/src/Connectors/Connectors.Onnx/OnnxRuntimeGenAIChatCompletionService.cs) to generate responses from the local model.

## Prerequisites

- [.NET 8](https://dotnet.microsoft.com/download/dotnet/8.0)
- [NVIDIA GPU](https://www.nvidia.com/en-us/geforce/graphics-cards)
- [NVIDIA CUDA v12 Toolkit](https://developer.nvidia.com/cuda-12-0-0-download-archive)
- [NVIDIA cuDNN v9.11](https://developer.nvidia.com/cudnn-9-11-0-download-archive)
- Windows users only:

  Ensure the `PATH` environment variable includes the `bin` folders of the CUDA Toolkit and cuDNN, i.e.:

  - C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin
  - C:\Program Files\NVIDIA\CUDNN\v9.11\bin\12.9

- Downloaded ONNX models (see below)

## Downloading the Model

For this example we chose Hugging Face as the repository to download the local models from. Go to a directory of your choice where the models should be downloaded and run the following commands:

```powershell
git lfs install
git clone https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx
```

Update the line below in `Program.cs` with the path to the model you downloaded in the previous step.

```csharp
// i.e. Running on Windows
string modelPath = "D:\\repo\\huggingface\\Phi-3-mini-4k-instruct-onnx\\cuda\\cuda-int4-rtn-block-32";
```

dotnet/src/Connectors/Connectors.Onnx.UnitTests/OnnxChatClientExtensionsTests.cs

Lines changed: 74 additions & 0 deletions
```diff
@@ -1,9 +1,12 @@
 // Copyright (c) Microsoft. All rights reserved.
 
+using System.Collections.Generic;
 using System.Linq;
 using Microsoft.Extensions.AI;
 using Microsoft.Extensions.DependencyInjection;
+using Microsoft.ML.OnnxRuntimeGenAI;
 using Microsoft.SemanticKernel;
+using Microsoft.SemanticKernel.Connectors.Onnx;
 using Xunit;
 
 namespace SemanticKernel.Connectors.Onnx.UnitTests;
@@ -74,4 +77,75 @@ public void AddOnnxRuntimeGenAIChatClientToKernelBuilderWithServiceId()
         Assert.NotNull(serviceDescriptor);
         Assert.Equal(ServiceLifetime.Singleton, serviceDescriptor.Lifetime);
     }
+
+    [Fact]
+    public void AddOnnxRuntimeGenAIChatClientWithProvidersToServiceCollection()
+    {
+        // Arrange
+        var collection = new ServiceCollection();
+        var providers = new List<Provider> { new("cuda"), new("cpu") };
+
+        // Act
+        collection.AddOnnxRuntimeGenAIChatClient("modelPath", providers);
+
+        // Assert
+        var serviceDescriptor = collection.FirstOrDefault(x => x.ServiceType == typeof(IChatClient));
+        Assert.NotNull(serviceDescriptor);
+        Assert.Equal(ServiceLifetime.Singleton, serviceDescriptor.Lifetime);
+        Assert.NotNull(serviceDescriptor.ImplementationFactory);
+    }
+
+    [Fact]
+    public void AddOnnxRuntimeGenAIChatClientWithProvidersToKernelBuilder()
+    {
+        // Arrange
+        var collection = new ServiceCollection();
+        var kernelBuilder = collection.AddKernel();
+        var providers = new List<Provider> { new("cuda"), new("cpu") };
+
+        // Act
+        kernelBuilder.AddOnnxRuntimeGenAIChatClient("modelPath", providers);
+
+        // Assert
+        var serviceDescriptor = collection.FirstOrDefault(x => x.ServiceType == typeof(IChatClient));
+        Assert.NotNull(serviceDescriptor);
+        Assert.Equal(ServiceLifetime.Singleton, serviceDescriptor.Lifetime);
+        Assert.NotNull(serviceDescriptor.ImplementationFactory);
+    }
+
+    [Fact]
+    public void AddOnnxRuntimeGenAIChatClientWithProvidersAndServiceIdToServiceCollection()
+    {
+        // Arrange
+        var collection = new ServiceCollection();
+        var providers = new List<Provider> { new("cuda") };
+
+        // Act
+        collection.AddOnnxRuntimeGenAIChatClient("modelPath", providers, serviceId: "test-service");
+        var serviceProvider = collection.BuildServiceProvider();
+
+        // Assert
+        var exception = Assert.Throws<OnnxRuntimeGenAIException>(() => serviceProvider.GetRequiredKeyedService<IChatClient>("test-service"));
+
+        Assert.Contains("genai_config.json", exception.Message);
+    }
+
+    [Fact]
+    public void AddOnnxRuntimeGenAIChatClientWithProvidersAndServiceIdToKernelBuilder()
+    {
+        // Arrange
+        var collection = new ServiceCollection();
+        var kernelBuilder = collection.AddKernel();
+        var providers = new List<Provider> { new("cuda") };
+
+        // Act
+        kernelBuilder.AddOnnxRuntimeGenAIChatClient("modelPath", providers, serviceId: "test-service");
+        var serviceProvider = collection.BuildServiceProvider();
+
+        // Assert
+        var kernel = serviceProvider.GetRequiredService<Kernel>();
+        var exception = Assert.Throws<OnnxRuntimeGenAIException>(() => kernel.GetRequiredService<IChatClient>("test-service"));
+
+        Assert.Contains("genai_config.json", exception.Message);
+    }
 }
```

dotnet/src/Connectors/Connectors.Onnx.UnitTests/OnnxExtensionsTests.cs

Lines changed: 76 additions & 0 deletions
```diff
@@ -1,6 +1,10 @@
 // Copyright (c) Microsoft. All rights reserved.
 
+using System.Collections.Generic;
+using System.Linq;
+using Microsoft.Extensions.AI;
 using Microsoft.Extensions.DependencyInjection;
+using Microsoft.ML.OnnxRuntimeGenAI;
 using Microsoft.SemanticKernel;
 using Microsoft.SemanticKernel.ChatCompletion;
 using Microsoft.SemanticKernel.Connectors.Onnx;
@@ -46,4 +50,76 @@ public void AddOnnxRuntimeGenAIChatCompletionToKernelBuilder()
         Assert.NotNull(service);
         Assert.IsType<OnnxRuntimeGenAIChatCompletionService>(service);
     }
+
+    [Fact]
+    public void AddOnnxRuntimeGenAIChatCompletionWithProvidersToServiceCollection()
+    {
+        // Arrange
+        var collection = new ServiceCollection();
+        var providers = new List<Provider> { new("cuda"), new("cpu") };
+        collection.AddOnnxRuntimeGenAIChatCompletion("modelId", "modelPath", providers);
+
+        // Act
+        var serviceDescriptor = collection.FirstOrDefault(x => x.ServiceType == typeof(IChatCompletionService));
+
+        // Assert
+        Assert.NotNull(serviceDescriptor);
+        Assert.Equal(ServiceLifetime.Singleton, serviceDescriptor.Lifetime);
+        Assert.NotNull(serviceDescriptor.ImplementationFactory);
+    }
+
+    [Fact]
+    public void AddOnnxRuntimeGenAIChatCompletionWithProvidersToKernelBuilder()
+    {
+        // Arrange
+        var collection = new ServiceCollection();
+        var kernelBuilder = collection.AddKernel();
+        var providers = new List<Provider> { new("cuda"), new("cpu") };
+        kernelBuilder.AddOnnxRuntimeGenAIChatCompletion("modelId", "modelPath", providers);
+
+        // Act
+        var serviceDescriptor = collection.FirstOrDefault(x => x.ServiceType == typeof(IChatCompletionService));
+
+        // Assert
+        Assert.NotNull(serviceDescriptor);
+        Assert.Equal(ServiceLifetime.Singleton, serviceDescriptor.Lifetime);
+        Assert.NotNull(serviceDescriptor.ImplementationFactory);
+    }
+
+    [Fact]
+    public void AddOnnxRuntimeGenAIChatCompletionWithProvidersAndServiceIdToServiceCollection()
+    {
+        // Arrange
+        var collection = new ServiceCollection();
+        var providers = new List<Provider> { new("cuda") };
+        collection.AddOnnxRuntimeGenAIChatCompletion("modelId", "modelPath", providers, serviceId: "test-service");
+
+        // Act
+        var serviceProvider = collection.BuildServiceProvider();
+
+        // Assert
+        var exception = Assert.Throws<OnnxRuntimeGenAIException>(() => serviceProvider.GetRequiredKeyedService<IChatCompletionService>("test-service"));
+
+        Assert.Contains("genai_config.json", exception.Message);
+    }
+
+    [Fact]
+    public void AddOnnxRuntimeGenAIChatCompletionWithProvidersAndServiceIdToKernelBuilder()
+    {
+        // Arrange
+        var collection = new ServiceCollection();
+        var kernelBuilder = collection.AddKernel();
+        var providers = new List<Provider> { new("cuda") };
+        kernelBuilder.AddOnnxRuntimeGenAIChatCompletion("modelId", "modelPath", providers, serviceId: "test-service");
+
+        // Act
+        var serviceDescriptor = collection.FirstOrDefault(x => x.ServiceType == typeof(IChatCompletionService) && x.ServiceKey?.ToString() == "test-service");
+        var serviceProvider = collection.BuildServiceProvider();
+
+        // Assert
+        var kernel = serviceProvider.GetRequiredService<Kernel>();
+        var exception = Assert.Throws<OnnxRuntimeGenAIException>(() => kernel.GetRequiredService<IChatCompletionService>("test-service"));
+
+        Assert.Contains("genai_config.json", exception.Message);
+    }
 }
```
