Add NVIDIA NIM support to Docker Model Runner #225
Conversation
- Add detection for NVIDIA NIM images (`nvcr.io/nim/` prefix)
- Create NIM-specific container lifecycle management
- Configure NIM containers with GPU support, shared memory, and NGC API key
- Proxy chat requests to NIM container's OpenAI-compatible API
- Add tests for NIM detection and container naming
- Support both single prompt and interactive chat modes for NIM
- Add comprehensive NIM support documentation to README
- Improve user feedback for NIM initialization status
- Add GPU detection status messages
- Improve timeout error message with troubleshooting tip

Signed-off-by: Eric Curtin <[email protected]>
Reviewer's Guide

Extends the Docker Model Runner CLI to detect NVIDIA NIM images and manage their full container lifecycle, including pull, creation, GPU and cache configuration, readiness checks, and OpenAI-compatible chat proxying, while adding tests and comprehensive documentation.

Sequence diagram for NIM container lifecycle management

```mermaid
sequenceDiagram
    actor User
    participant CLI as Docker Model Runner CLI
    participant Docker as Docker Engine
    participant NIM as NIM Container
    participant API as OpenAI-compatible API
    User->>CLI: Run command with NIM image
    CLI->>CLI: isNIMImage(model)
    CLI->>Docker: Check for existing NIM container
    alt Container exists and running
        CLI->>Docker: Use existing container
    else Container exists but not running
        CLI->>Docker: Start container
    else No container exists
        CLI->>Docker: Pull NIM image
        CLI->>Docker: Create and start container
    end
    CLI->>NIM: Wait for readiness (poll /v1/models)
    NIM-->>CLI: Ready
    User->>CLI: Provide prompt or start chat
    CLI->>API: Proxy chat request to NIM OpenAI API
    API-->>CLI: Stream response
    CLI-->>User: Display response
```
Class diagram for NIM container management types

```mermaid
classDiagram
    class NIMManager {
        +isNIMImage(model string) bool
        +nimContainerName(model string) string
        +pullNIMImage(ctx, dockerClient, model, cmd) error
        +findNIMContainer(ctx, dockerClient, model) (string, error)
        +createNIMContainer(ctx, dockerClient, model, cmd) (string, error)
        +waitForNIMReady(ctx, cmd) error
        +runNIMModel(ctx, dockerClient, model, cmd) error
        +chatWithNIM(cmd, model, prompt) error
    }
    class gpupkg {
        +ProbeGPUSupport(ctx, dockerClient) (GPUSupport, error)
        +HasNVIDIARuntime(ctx, dockerClient) (bool, error)
        GPUSupportNone
        GPUSupportCUDA
    }
    NIMManager -- gpupkg : uses
```
File-Level Changes
Summary of Changes

Hello @ericcurtin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly extends the Docker Model Runner's capabilities by introducing native support for NVIDIA Inference Microservices (NIM). It streamlines the process of deploying and interacting with NVIDIA's optimized AI models by automatically managing NIM container lifecycles, configuring GPU resources, and providing a seamless chat interface. This integration aims to simplify the user experience for leveraging high-performance inference models.

Highlights
Hey there - I've reviewed your changes - here's some feedback:
- Consider making nimDefaultPort, nimDefaultShmSize and cache directory configurable via CLI flags or environment variables rather than hardcoding values to allow greater flexibility.
- The SSE parsing in chatWithNIM relies on manual string operations—consider using a proper JSON streaming decoder or SSE client library to handle edge cases and keep the code more robust.
- The newRunCmd interactive loop and runNIMModel setup share overlapping logic—factoring out common behaviors could reduce duplication and make future maintenance easier.
Pull Request Overview
This PR adds comprehensive NVIDIA NIM (NVIDIA Inference Microservices) support to Docker Model Runner, enabling users to run NVIDIA's optimized inference containers directly through the existing CLI interface.
- Automatic detection of NIM images based on the `nvcr.io/nim/` registry prefix
- Complete container lifecycle management with GPU support, shared memory configuration, and NGC API key handling
- OpenAI-compatible API integration for both single prompt and interactive chat modes
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| cmd/cli/commands/run.go | Adds NIM image detection and routing logic to the main run command |
| cmd/cli/commands/nim.go | Implements core NIM functionality including container management and chat API integration |
| cmd/cli/commands/nim_test.go | Provides unit tests for NIM image detection and container naming functions |
| README.md | Documents NIM support with setup instructions, usage examples, and technical details |
```go
// Parse the JSON and extract the content
// For simplicity, we'll use basic string parsing
// In production, we'd use proper JSON parsing
if strings.Contains(data, `"content"`) {
	// Extract content field - simple approach
	contentStart := strings.Index(data, `"content":"`)
	if contentStart != -1 {
		contentStart += len(`"content":"`)
		contentEnd := strings.Index(data[contentStart:], `"`)
		if contentEnd != -1 {
			content := data[contentStart : contentStart+contentEnd]
			// Unescape basic JSON escapes
			content = strings.ReplaceAll(content, `\n`, "\n")
			content = strings.ReplaceAll(content, `\t`, "\t")
			content = strings.ReplaceAll(content, `\"`, `"`)
			cmd.Print(content)
		}
	}
}
```
Copilot AI, Oct 14, 2025:
The manual JSON parsing approach is fragile and may fail with complex content that contains escaped quotes or nested JSON structures. This could result in incorrect content extraction or parsing failures. Use proper JSON unmarshaling with a struct to handle the SSE response data safely.
```go
return fmt.Errorf("NIM failed to become ready within timeout. Check container logs with: docker logs $(docker ps -q --filter name=docker-model-nim-)")
```
Copilot AI, Oct 14, 2025:
The error message uses a complex shell command that may not work correctly in all environments. The docker ps -q --filter name=docker-model-nim- command could return multiple container IDs if multiple NIM containers exist. Consider providing a more specific container name or a simpler troubleshooting command.
```go
content = strings.ReplaceAll(content, `\n`, "\n")
content = strings.ReplaceAll(content, `\t`, "\t")
content = strings.ReplaceAll(content, `\"`, `"`)
```
Copilot AI, Oct 14, 2025:
The JSON unescaping is incomplete and may not handle all valid JSON escape sequences. Missing handling for \\ (backslash), \/ (forward slash), \b (backspace), \f (form feed), \r (carriage return), and Unicode escapes (\uXXXX). This could result in incorrectly displayed content.
Code Review
This pull request introduces significant new functionality by adding support for NVIDIA NIM containers. The implementation is comprehensive, covering container lifecycle management, GPU configuration, and proxying chat requests. The code is generally well-structured, and the inclusion of documentation and tests is commendable. I have identified a few areas for improvement, primarily concerning robustness, maintainability, and correctness. Specifically, there are some issues with JSON handling and container name generation that should be addressed. I've also suggested refactoring to reduce code duplication. Overall, this is a solid contribution that will be even better with a few adjustments.
```go
func nimContainerName(model string) string {
	// Extract the model name from the reference (e.g., nvcr.io/nim/google/gemma-3-1b-it:latest -> google-gemma-3-1b-it)
	parts := strings.Split(strings.TrimPrefix(model, nimPrefix), "/")
	name := strings.Join(parts, "-")
	// Remove tag if present
	if idx := strings.Index(name, ":"); idx != -1 {
		name = name[:idx]
	}
	// Replace any remaining special characters
	name = strings.ReplaceAll(name, ":", "-")
	name = strings.ReplaceAll(name, "/", "-")
	return nimContainerPrefix + name
}
```
The current implementation for generating a container name is not robust. It can produce incorrect names for models that have a colon in their path segments (e.g., nvcr.io/nim/a:b/c:latest would incorrectly result in a container name based on a). The order of operations should be changed to first sanitize the path by replacing slashes, and then remove the tag. Using strings.LastIndex is also safer for finding the tag separator.
The current implementation also has redundant ReplaceAll calls.
```go
func nimContainerName(model string) string {
	// Extract the model name from the reference (e.g., nvcr.io/nim/google/gemma-3-1b-it:latest -> google-gemma-3-1b-it)
	name := strings.TrimPrefix(model, nimPrefix)
	// Replace path separators with dashes
	name = strings.ReplaceAll(name, "/", "-")
	// Remove tag, which is separated by the last colon.
	if idx := strings.LastIndex(name, ":"); idx != -1 {
		name = name[:idx]
	}
	return nimContainerPrefix + name
}
```

```go
reqBody := fmt.Sprintf(`{
	"model": "%s",
	"messages": [
		{"role": "user", "content": %q}
	],
	"stream": true
}`, modelName, prompt)
```
Using fmt.Sprintf to construct a JSON string is not robust and can be error-prone, especially when dealing with user input that might contain special characters. The %q format verb is for Go string literals and doesn't guarantee a valid JSON string for all inputs. It's much safer and more idiomatic to define a struct for the request body and use json.Marshal.
You will need to import the bytes and encoding/json packages.
```go
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

payload := chatRequest{
	Model: modelName,
	Messages: []chatMessage{
		{Role: "user", Content: prompt},
	},
	Stream: true,
}
reqBodyBytes, err := json.Marshal(payload)
if err != nil {
	return fmt.Errorf("failed to marshal request body: %w", err)
}
```

```go
// Parse the JSON and extract the content
// For simplicity, we'll use basic string parsing
// In production, we'd use proper JSON parsing
if strings.Contains(data, `"content"`) {
	// Extract content field - simple approach
	contentStart := strings.Index(data, `"content":"`)
	if contentStart != -1 {
		contentStart += len(`"content":"`)
		contentEnd := strings.Index(data[contentStart:], `"`)
		if contentEnd != -1 {
			content := data[contentStart : contentStart+contentEnd]
			// Unescape basic JSON escapes
			content = strings.ReplaceAll(content, `\n`, "\n")
			content = strings.ReplaceAll(content, `\t`, "\t")
			content = strings.ReplaceAll(content, `\"`, `"`)
			cmd.Print(content)
		}
	}
}
```
Parsing JSON with string manipulation is fragile and error-prone. The comment acknowledges this is for simplicity, but it should be replaced with proper JSON parsing for robustness. The current implementation only unescapes a few characters (\n, \t, \") and will fail on others. Using json.Unmarshal into a struct is the correct and safer approach.
```go
// Parse the JSON and extract the content
var streamResp struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}
if err := json.Unmarshal([]byte(data), &streamResp); err == nil {
	if len(streamResp.Choices) > 0 {
		cmd.Print(streamResp.Choices[0].Delta.Content)
	}
}
```

```go
if err != nil {
	return "", fmt.Errorf("failed to get home directory: %w", err)
}
cacheDir = homeDir + "/.cache/nim"
```
```go
if err := waitForNIMReady(ctx, cmd); err != nil {
	return err
}
```
To help users debug issues, it's useful to wrap the error from waitForNIMReady with more specific context, including the container name and a hint on how to check its logs. The current error message in waitForNIMReady contains a shell command that is not robust and may fail if multiple NIM containers are running. It would be better to simplify the error in waitForNIMReady to just report the timeout, and add the specific debugging instructions here in runNIMModel.
```go
if err := waitForNIMReady(ctx, cmd); err != nil {
	containerName := nimContainerName(model)
	return fmt.Errorf("failed to wait for NIM container %q to be ready: %w. You can check its logs with: 'docker logs %s'", containerName, err, containerName)
}
```

```go
// If no prompt provided, enter interactive mode
if prompt == "" {
	scanner := bufio.NewScanner(os.Stdin)
	cmd.Println("Interactive chat mode started. Type '/bye' to exit.")

	for {
		userInput, err := readMultilineInput(cmd, scanner)
		if err != nil {
			if err.Error() == "EOF" {
				cmd.Println("\nChat session ended.")
				break
			}
			return fmt.Errorf("Error reading input: %v", err)
		}

		if strings.ToLower(strings.TrimSpace(userInput)) == "/bye" {
			cmd.Println("Chat session ended.")
			break
		}

		if strings.TrimSpace(userInput) == "" {
			continue
		}

		if err := chatWithNIM(cmd, model, userInput); err != nil {
			cmd.PrintErr(fmt.Errorf("failed to chat with NIM: %w", err))
			continue
		}

		cmd.Println()
	}
	return nil
}
```
This interactive chat loop is very similar to the one for non-NIM models later in the file (lines 452-477). This code duplication can be avoided by refactoring the loop into a helper function that takes a chat function as an argument. This would improve maintainability and reduce the chance of bugs when one loop is updated but the other is not.
Additionally, checking for EOF by string comparison (err.Error() == "EOF") is not robust. It's better to use errors.Is(err, io.EOF). This would require readMultilineInput to return io.EOF directly instead of a formatted error string.
This will need a contributor with Nvidia hardware to own this.
Plenty of the AI bot comments here make sense to address.
```go
		Count:        -1,
		Capabilities: [][]string{{"gpu"}},
	}}
}
```
nit: If someone is running NIM, they likely want NVIDIA GPU support. Maybe we should warn the user if we could not detect one?
Summary by Sourcery
Add full NVIDIA NIM support to Docker Model Runner, enabling detection and lifecycle management of NIM containers with GPU, memory, and API key configuration, and proxying chat requests to their OpenAI-compatible API.
New Features:
Enhancements:
Documentation:
Tests: