
Commit 61d3e6e

Python: Introduce NvidiaChatCompletion AI Connector (#12952)
### Motivation and Context

This change adds chat completion support with structured output to the existing NVIDIA connector, enabling developers to use NVIDIA's chat models through Semantic Kernel when building conversational AI applications that require structured, validated responses. The connector will support more model types iteratively, with future PRs introducing VLM models and additional capabilities.

### Description

- New chat completion service: `NvidiaChatCompletion`, a chat completion service following the same structure as other connectors
- Extends the existing NVIDIA embedding connector's patterns and architecture
- Custom logic for handling structured output specific to NVIDIA models
- Enhanced configuration: `NvidiaChatPromptExecutionSettings`, chat-specific settings following existing connector patterns

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄
1 parent d8af1ed commit 61d3e6e

File tree

14 files changed (+939, −31 lines)


python/samples/concepts/setup/ALL_SETTINGS.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -30,7 +30,8 @@
 | | [VertexAITextEmbedding](../../../semantic_kernel/connectors/ai/google/google_ai/services/google_ai_text_embedding.py) | project_id, <br> region, <br> embedding_model_id | VERTEX_AI_PROJECT_ID, <br> VERTEX_AI_REGION, <br> VERTEX_AI_EMBEDDING_MODEL_ID | Yes, <br> No, <br> Yes | |
 | HuggingFace | [HuggingFaceTextCompletion](../../../semantic_kernel/connectors/ai/hugging_face/services/hf_text_completion.py) | ai_model_id | N/A | Yes | |
 | | [HuggingFaceTextEmbedding](../../../semantic_kernel/connectors/ai/hugging_face/services/hf_text_embedding.py) | ai_model_id | N/A | Yes | |
-| NVIDIA NIM | [NvidiaTextEmbedding](../../../semantic_kernel/connectors/ai/nvidia/services/nvidia_text_embedding.py) | ai_model_id, <br> api_key, <br> base_url | NVIDIA_API_KEY, <br> NVIDIA_TEXT_EMBEDDING_MODEL_ID, <br> NVIDIA_BASE_URL | Yes | [NvidiaAISettings](../../../semantic_kernel/connectors/ai/nvidia/settings/nvidia_settings.py) |
+| NVIDIA NIM | [NvidiaChatCompletion](../../../semantic_kernel/connectors/ai/nvidia/services/nvidia_chat_completion.py) | ai_model_id, <br> api_key, <br> base_url | NVIDIA_CHAT_MODEL_ID, <br> NVIDIA_API_KEY, <br> NVIDIA_BASE_URL | Yes (default: meta/llama-3.1-8b-instruct), <br> Yes, <br> No | [NvidiaAISettings](../../../semantic_kernel/connectors/ai/nvidia/settings/nvidia_settings.py) |
+| | [NvidiaTextEmbedding](../../../semantic_kernel/connectors/ai/nvidia/services/nvidia_text_embedding.py) | ai_model_id, <br> api_key, <br> base_url | NVIDIA_API_KEY, <br> NVIDIA_TEXT_EMBEDDING_MODEL_ID, <br> NVIDIA_BASE_URL | Yes | [NvidiaAISettings](../../../semantic_kernel/connectors/ai/nvidia/settings/nvidia_settings.py) |
 | Mistral AI | [MistralAIChatCompletion](../../../semantic_kernel/connectors/ai/mistral_ai/services/mistral_ai_chat_completion.py) | ai_model_id, <br> api_key | MISTRALAI_CHAT_MODEL_ID, <br> MISTRALAI_API_KEY | Yes, <br> Yes | [MistralAISettings](../../../semantic_kernel/connectors/ai/mistral_ai/settings/mistral_ai_settings.py) |
 | | [MistralAITextEmbedding](../../../semantic_kernel/connectors/ai/mistral_ai/services/mistral_ai_text_embedding.py) | ai_model_id, <br> api_key | MISTRALAI_EMBEDDING_MODEL_ID, <br> MISTRALAI_API_KEY | Yes, <br> Yes | |
 | Ollama | [OllamaChatCompletion](../../../semantic_kernel/connectors/ai/ollama/services/ollama_chat_completion.py) | ai_model_id, <br> host | OLLAMA_CHAT_MODEL_ID, <br> OLLAMA_HOST | Yes, <br> No | [OllamaSettings](../../../semantic_kernel/connectors/ai/ollama/ollama_settings.py) |
```

python/samples/concepts/setup/chat_completion_services.py

Lines changed: 26 additions & 0 deletions
```diff
@@ -28,6 +28,7 @@ class Services(str, Enum):
     ONNX = "onnx"
     VERTEX_AI = "vertex_ai"
     DEEPSEEK = "deepseek"
+    NVIDIA = "nvidia"


 service_id = "default"
@@ -64,6 +65,7 @@ def get_chat_completion_service_and_request_settings(
         Services.ONNX: lambda: get_onnx_chat_completion_service_and_request_settings(),
         Services.VERTEX_AI: lambda: get_vertex_ai_chat_completion_service_and_request_settings(),
         Services.DEEPSEEK: lambda: get_deepseek_chat_completion_service_and_request_settings(),
+        Services.NVIDIA: lambda: get_nvidia_chat_completion_service_and_request_settings(),
     }

     # Call the appropriate lambda or function based on the service name
@@ -414,3 +416,27 @@ def get_deepseek_chat_completion_service_and_request_settings() -> tuple[
     request_settings = OpenAIChatPromptExecutionSettings(service_id=service_id)

     return chat_service, request_settings
+
+
+def get_nvidia_chat_completion_service_and_request_settings() -> tuple[
+    "ChatCompletionClientBase", "PromptExecutionSettings"
+]:
+    """Return NVIDIA chat completion service and request settings.
+
+    The service credentials can be read in three ways:
+    1. Via the constructor
+    2. Via environment variables
+    3. Via an environment file
+
+    The request settings control the behavior of the service. The default settings are sufficient to get started.
+    However, you can adjust the settings to suit your needs.
+    Note: Some of the settings are NOT meant to be set by the user.
+    Please refer to the Semantic Kernel Python documentation for more information:
+    https://learn.microsoft.com/en-us/python/api/semantic-kernel/semantic_kernel?view=semantic-kernel-python
+    """
+    from semantic_kernel.connectors.ai.nvidia import NvidiaChatCompletion, NvidiaChatPromptExecutionSettings
+
+    chat_service = NvidiaChatCompletion(service_id=service_id)
+    request_settings = NvidiaChatPromptExecutionSettings(service_id=service_id)
+
+    return chat_service, request_settings
```
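The first two hunks above register NVIDIA in a registry of lazy factories keyed by a `Services` enum, so a connector's imports only run when that service is actually selected. That dispatch pattern can be sketched in isolation (the enum members and the tuple-returning stand-in factories below are illustrative, not the sample's real ones):

```python
from enum import Enum


class Services(str, Enum):
    """Subset of the sample's service enum, for illustration."""
    OPENAI = "openai"
    NVIDIA = "nvidia"


# Lambdas defer construction (and heavy imports) until a service is requested.
_factories = {
    Services.OPENAI: lambda: ("openai-service", "openai-settings"),
    Services.NVIDIA: lambda: ("nvidia-service", "nvidia-settings"),
}


def get_service_and_settings(name: Services) -> tuple[str, str]:
    """Look up and invoke the factory for the requested service."""
    if name not in _factories:
        raise ValueError(f"Unsupported service: {name}")
    return _factories[name]()


service, settings = get_service_and_settings(Services.NVIDIA)
```

Because `Services` subclasses `str`, a plain string such as `"nvidia"` can also be converted with `Services("nvidia")` before lookup.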

python/semantic_kernel/connectors/ai/nvidia/README.md

Lines changed: 35 additions & 1 deletion
````diff
@@ -1,6 +1,6 @@
 # semantic_kernel.connectors.ai.nvidia

-This connector enables integration with NVIDIA NIM API for text embeddings. It allows you to use NVIDIA's embedding models within the Semantic Kernel framework.
+This connector enables integration with NVIDIA NIM API for text embeddings and chat completion. It allows you to use NVIDIA's models within the Semantic Kernel framework.

 ## Quick start

@@ -13,6 +13,8 @@ kernel = sk.Kernel()
 ### Add NVIDIA text embedding service
 You can provide your API key directly or through environment variables
 ```python
+from semantic_kernel.connectors.ai.nvidia import NvidiaTextEmbedding
+
 embedding_service = NvidiaTextEmbedding(
     ai_model_id="nvidia/nv-embedqa-e5-v5",  # Default model if not specified
     api_key="your-nvidia-api-key",  # Can also use NVIDIA_API_KEY env variable
@@ -30,3 +32,35 @@ kernel.add_service(embedding_service)
 texts = ["Hello, world!", "Semantic Kernel is awesome"]
 embeddings = await kernel.get_service("nvidia-embeddings").generate_embeddings(texts)
 ```
+
+### Add NVIDIA chat completion service
+```python
+from semantic_kernel.connectors.ai.nvidia import NvidiaChatCompletion
+
+chat_service = NvidiaChatCompletion(
+    ai_model_id="meta/llama-3.1-8b-instruct",  # Default model if not specified
+    api_key="your-nvidia-api-key",  # Can also use NVIDIA_API_KEY env variable
+    service_id="nvidia-chat"  # Optional service identifier
+)
+kernel.add_service(chat_service)
+```
+
+### Basic chat completion
+```python
+response = await kernel.invoke_prompt("Hello, how are you?")
+```
+
+### Using with Chat Completion Agent
+```python
+from semantic_kernel.agents import ChatCompletionAgent
+from semantic_kernel.connectors.ai.nvidia import NvidiaChatCompletion
+
+agent = ChatCompletionAgent(
+    service=NvidiaChatCompletion(),
+    name="SK-Assistant",
+    instructions="You are a helpful assistant.",
+)
+response = await agent.get_response(messages="Write a haiku about Semantic Kernel.")
+print(response.content)
+```
````
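The PR's headline feature is structured output: the model is steered to return JSON that the application can validate. Independent of the connector's actual wire format, here is a stdlib-only sketch of the receiving side — `validate_reply` and the field names are illustrative, not connector API:

```python
import json


def validate_reply(raw: str, required_fields: set[str]) -> dict:
    """Parse a model reply as JSON and check that required keys are present.

    Illustrative stand-in for consuming a structured-output response;
    the real connector does this via response_format / nvext settings.
    """
    data = json.loads(raw)
    missing = required_fields - data.keys()
    if missing:
        raise ValueError(f"reply missing fields: {sorted(missing)}")
    return data


# A well-formed structured reply passes validation.
reply = '{"city": "Paris", "population": 2102650}'
record = validate_reply(reply, {"city", "population"})
```

In the connector itself, the schema is communicated to the model through `NvidiaChatPromptExecutionSettings` rather than checked after the fact; this sketch only shows why validated JSON is easier to consume than free text.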

python/semantic_kernel/connectors/ai/nvidia/__init__.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -1,13 +1,17 @@
 # Copyright (c) Microsoft. All rights reserved.

 from semantic_kernel.connectors.ai.nvidia.prompt_execution_settings.nvidia_prompt_execution_settings import (
+    NvidiaChatPromptExecutionSettings,
     NvidiaEmbeddingPromptExecutionSettings,
     NvidiaPromptExecutionSettings,
 )
+from semantic_kernel.connectors.ai.nvidia.services.nvidia_chat_completion import NvidiaChatCompletion
 from semantic_kernel.connectors.ai.nvidia.services.nvidia_text_embedding import NvidiaTextEmbedding
 from semantic_kernel.connectors.ai.nvidia.settings.nvidia_settings import NvidiaSettings

 __all__ = [
+    "NvidiaChatCompletion",
+    "NvidiaChatPromptExecutionSettings",
     "NvidiaEmbeddingPromptExecutionSettings",
     "NvidiaPromptExecutionSettings",
     "NvidiaSettings",
```
python/semantic_kernel/connectors/ai/nvidia/prompt_execution_settings/nvidia_prompt_execution_settings.py

Lines changed: 45 additions & 13 deletions
```diff
@@ -2,7 +2,7 @@

 from typing import Annotated, Any, Literal

-from pydantic import Field
+from pydantic import BaseModel, Field

 from semantic_kernel.connectors.ai.prompt_execution_settings import PromptExecutionSettings

@@ -13,18 +13,6 @@ class NvidiaPromptExecutionSettings(PromptExecutionSettings):
     format: Literal["json"] | None = None
     options: dict[str, Any] | None = None

-    def prepare_settings_dict(self, **kwargs) -> dict[str, Any]:
-        """Prepare the settings as a dictionary for sending to the AI service.
-
-        By default, this method excludes the service_id and extension_data fields.
-        As well as any fields that are None.
-        """
-        return self.model_dump(
-            exclude={"service_id", "extension_data", "structured_json_response", "input_type", "truncate"},
-            exclude_none=True,
-            by_alias=True,
-        )
-

 class NvidiaEmbeddingPromptExecutionSettings(NvidiaPromptExecutionSettings):
     """Settings for NVIDIA embedding prompt execution."""
@@ -39,3 +27,47 @@ class NvidiaEmbeddingPromptExecutionSettings(NvidiaPromptExecutionSettings):
     extra_body: dict | None = None
     timeout: float | None = None
     dimensions: Annotated[int | None, Field(gt=0)] = None
+
+    def prepare_settings_dict(self, **kwargs) -> dict[str, Any]:
+        """Override only for embeddings to exclude input_type and truncate."""
+        return self.model_dump(
+            exclude={"service_id", "extension_data", "structured_json_response", "input_type", "truncate"},
+            exclude_none=True,
+            by_alias=True,
+        )
+
+
+class NvidiaChatPromptExecutionSettings(NvidiaPromptExecutionSettings):
+    """Settings for NVIDIA chat prompt execution."""
+
+    messages: list[dict[str, str]] | None = None
+    ai_model_id: Annotated[str | None, Field(serialization_alias="model")] = None
+    temperature: float | None = None
+    top_p: float | None = None
+    n: int | None = None
+    stream: bool = False
+    stop: str | list[str] | None = None
+    max_tokens: int | None = None
+    presence_penalty: float | None = None
+    frequency_penalty: float | None = None
+    logit_bias: dict[str, float] | None = None
+    user: str | None = None
+    tools: list[dict[str, Any]] | None = None
+    tool_choice: str | dict[str, Any] | None = None
+    response_format: (
+        dict[Literal["type"], Literal["text", "json_object"]] | dict[str, Any] | type[BaseModel] | type | None
+    ) = None
+    seed: int | None = None
+    extra_headers: dict | None = None
+    extra_body: dict | None = None
+    timeout: float | None = None
+    # NVIDIA-specific structured output support
+    nvext: dict[str, Any] | None = None
+
+    def prepare_settings_dict(self, **kwargs) -> dict[str, Any]:
+        """Override for chat to exclude response_format, which is handled separately."""
+        return self.model_dump(
+            exclude={"service_id", "extension_data", "structured_json_response", "response_format"},
+            exclude_none=True,
+            by_alias=True,
+        )
```
