Commit 5b56415

Merge pull request #254 from DC-Shi/fix-nim-image-auth
Add NIM support: Fix NIM auth issue for downloading nvcr.io images.
2 parents 1bb9fcf + e7933ea commit 5b56415

File tree

4 files changed: +691, -0 lines changed

README.md

Lines changed: 74 additions & 0 deletions
@@ -282,6 +282,80 @@ The response will contain the model's reply:
}
```

## NVIDIA NIM Support
Docker Model Runner supports running NVIDIA NIM (NVIDIA Inference Microservices) containers directly. This provides a simplified workflow for deploying NVIDIA's optimized inference containers.
### Prerequisites

- Docker with NVIDIA GPU support (nvidia-docker2 or Docker with the NVIDIA Container Runtime); see the sanity check after this list
- An NGC API key (optional; required by some NIM models)
- Docker login to the nvcr.io registry
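
If you are unsure whether Docker can see your GPU, a quick sanity check is to run `nvidia-smi` inside any CUDA base image (the image tag below is just an example):

```bash
# This should print the GPU table if NVIDIA container support is working
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```
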
### Quick Start

1. **Log in to the NVIDIA Container Registry:**

```bash
docker login nvcr.io
Username: $oauthtoken
Password: <PASTE_API_KEY_HERE>
```

2. **Set NGC API Key (if required by the model):**

```bash
export NGC_API_KEY=<PASTE_API_KEY_HERE>
```

3. **Run a NIM model:**

```bash
docker model run nvcr.io/nim/google/gemma-3-1b-it:latest
```

That's it! The Docker Model Runner will:
- Automatically detect that this is a NIM image
- Pull the NIM container image
- Configure it with proper GPU support, shared memory (16GB), and NGC credentials (see the sketch after this list)
- Start the container and wait for it to be ready
- Provide an interactive chat interface
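
For reference, this setup corresponds roughly to a single `docker run` invocation like the one below. This is a sketch reconstructed from the behavior described above, not the runner's verbatim command; the container name and the container-side cache path are assumptions:

```bash
# Approximate equivalent of what the runner sets up (sketch):
# the container name and the /opt/nim/.cache mount path are assumptions.
docker run -d --name nim-gemma-3-1b-it \
  --gpus all \
  --shm-size=16g \
  -e NGC_API_KEY \
  -p 127.0.0.1:8000:8000 \
  -v ~/.cache/nim:/opt/nim/.cache \
  nvcr.io/nim/google/gemma-3-1b-it:latest
```
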
### Features

- **Automatic GPU Detection**: Configures NVIDIA GPU support automatically when a GPU is available
- **Persistent Caching**: Models are cached in `~/.cache/nim` (or `$LOCAL_NIM_CACHE` if set)
- **Interactive Chat**: Supports both single-prompt and interactive chat modes
- **Container Reuse**: Existing NIM containers are reused across runs

### Example Usage

**Single prompt:**
```bash
docker model run nvcr.io/nim/google/gemma-3-1b-it:latest "Explain quantum computing"
```

**Interactive chat:**
```bash
docker model run nvcr.io/nim/google/gemma-3-1b-it:latest
> Tell me a joke
...
> /bye
```

### Configuration

- **NGC_API_KEY**: Set this environment variable to authenticate with NVIDIA's services
- **LOCAL_NIM_CACHE**: Override the default cache location (default: `~/.cache/nim`); see the example after this list
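
For example, to keep model downloads on a larger disk (the path below is only an illustration):

```bash
# Cache NIM models under /mnt/models instead of the default ~/.cache/nim
export LOCAL_NIM_CACHE=/mnt/models/nim-cache
docker model run nvcr.io/nim/google/gemma-3-1b-it:latest
```
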
### Technical Details

NIM containers:
- Run on port 8000 (localhost only); see the query example after this list
- Use 16GB shared memory by default
- Mount `~/.cache/nim` for model caching
- Support NVIDIA GPU acceleration when available
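
Since the server listens on localhost:8000, a running NIM container can also be queried directly. NIM containers generally expose an OpenAI-compatible HTTP API; the `model` value below assumes the example image used earlier:

```bash
# Query the NIM container's OpenAI-compatible endpoint directly
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-1b-it",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```
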
## Metrics
The Model Runner exposes [the metrics endpoint](https://github.com/ggml-org/llama.cpp/tree/master/tools/server#get-metrics-prometheus-compatible-metrics-exporter) of llama.cpp server at the `/metrics` endpoint. This allows you to monitor model performance, request statistics, and resource usage.
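
For instance, assuming host TCP access to the Model Runner is enabled on its default port 12434 (an assumption; adjust if you configured a different port):

```bash
# Scrape Prometheus-compatible metrics from the Model Runner
curl http://localhost:12434/metrics
```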
