GitHub - RaidanAi/bring-llms-to-nim: Nvidia

NVIDIA AI Blueprint: Bring Your LLM to NIM

Deploy your own Large Language Model (LLM) using NVIDIA NIM - a powerful service that automatically optimizes your model for maximum performance on NVIDIA GPUs.

What is NIM?

NVIDIA NIM is a containerized solution that:

Automatically optimizes your LLM for the best performance
Handles all the complexity of model deployment and serving
Provides an OpenAI-compatible API for easy integration
Supports multiple model formats without manual conversion

What You'll Learn

This repository contains three hands-on notebooks showing different deployment approaches:

📘 Notebook 1: Deploy HuggingFace Safetensors

Best for: Quick deployment of popular models from HuggingFace

Deploy models directly from HuggingFace with a single command
Download and deploy from local storage for offline use
Choose between TensorRT-LLM and vLLM backends

📗 Notebook 2: Deploy TensorRT-LLM Checkpoints and Engines

Best for: Maximum performance with custom optimization

Convert HuggingFace models to TensorRT-LLM format
Compile optimized engines for production deployment
Fine-tune performance parameters

📙 Notebook 3: Deploy GGUF Checkpoints

Best for: Memory-efficient deployment with quantized models

Deploy pre-quantized GGUF models (4-bit, 8-bit, etc.)
Reduce GPU memory requirements significantly
Handle external configuration requirements

Important

NVIDIA cannot guarantee the security of any models hosted on non-NVIDIA systems such as HuggingFace. Malicious or insecure models can result in serious security risks up to and including full remote code execution. We strongly recommend that before attempting to load it you manually verify the safety of any model not provided by NVIDIA, through such mechanisms as a) ensuring that the model weights are serialized using the safetensors format, b) conducting a manual review of any model or inference code to ensure that it is free of obfuscated or malicious code, and c) validating the signature of the model, if available, to ensure that it comes from a trusted source and has not been modified.

What You'll Learn

What is NIM? NVIDIA Inference Microservice (NIM) is a containerized solution that automatically optimizes LLM deployment
Why use NIM? Handles model optimization, backend selection, and scaling automatically
What you need to know: Basic Docker commands and Python. No prior LLM deployment experience required.

Quick Start: Deploy Your First Model in 3 Minutes

Step 1: Pull the NIM container

docker pull nvcr.io/nim/nvidia/llm-nim:latest

Step 2: Deploy a small model

docker run -it --rm --name=my-first-nim \
  --runtime=nvidia --gpus all \
  -p 8000:8000 \
  -e NIM_MODEL_NAME="hf://Qwen/Qwen2.5-0.5B" \
  nvcr.io/nim/nvidia/llm-nim:latest

Step 3: Test it

curl http://localhost:8000/v1/completions \
  -d '{"model": "Qwen/Qwen2.5-0.5B", "prompt": "Hello world"}'

Overview

This blueprint demonstrates how to deploy almost any Large Language Model using NVIDIA NIM (NVIDIA Inference Microservice). NIM provides a streamlined way to deploy and serve LLMs with optimized performance, handling model analysis, backend selection, and configuration automatically.

Key capabilities:

Deploy models directly from Hugging Face Hub
Deploy models from local storage
Multiple inference backends (TensorRT-LLM, vLLM)
Customizable deployment parameters
OpenAI-compatible REST API

Software Components

NVIDIA NIM LLM Container: nvcr.io/nim/nvidia/llm-nim:latest
Supported Inference Backends:
- TensorRT-LLM (high performance)
- vLLM (versatile deployment)
Container Runtime: Docker with NVIDIA Container Runtime
Model Sources: Hugging Face Hub, Local filesystem

Hardware Requirements

GPU: NVIDIA GPU with compute capability 7.0+ (V100, T4, A10, A100, H100, etc.)
GPU Memory: Varies by model size (minimum 8GB for small models, 40GB+ for large models)
System Memory: 16GB+ RAM recommended
Storage: Sufficient disk space for model weights and cache
NVIDIA Driver: Version 535+ required

API Definition

The deployed NIM service provides OpenAI-compatible REST API endpoints:

Health Check: GET /v1/health/ready
Completions: POST /v1/completions
Chat Completions: POST /v1/chat/completions
Models: GET /v1/models

Example completion request:

{
  "model": "model-name",
  "prompt": "Your prompt here",
  "max_tokens": 250,
  "temperature": 0.7
}

Deployment

Prerequisites

NVIDIA Developer License: Access to NVIDIA Container Registry
NGC API Key: Required for pulling NIM containers (Generate here)
Hugging Face Token: Required for downloading models from Hugging Face Hub
Docker: With NVIDIA Container Runtime installed

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
.github/workflows		.github/workflows
deploy		deploy
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NVIDIA AI Blueprint: Bring Your LLM to NIM

What is NIM?

What You'll Learn

📘 Notebook 1: Deploy HuggingFace Safetensors

📗 Notebook 2: Deploy TensorRT-LLM Checkpoints and Engines

📙 Notebook 3: Deploy GGUF Checkpoints

What You'll Learn

Quick Start: Deploy Your First Model in 3 Minutes

Step 1: Pull the NIM container

Step 2: Deploy a small model

Step 3: Test it

Overview

Software Components

Hardware Requirements

API Definition

Deployment

Prerequisites

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NVIDIA AI Blueprint: Bring Your LLM to NIM

What is NIM?

What You'll Learn

📘 Notebook 1: Deploy HuggingFace Safetensors

📗 Notebook 2: Deploy TensorRT-LLM Checkpoints and Engines

📙 Notebook 3: Deploy GGUF Checkpoints

What You'll Learn

Quick Start: Deploy Your First Model in 3 Minutes

Step 1: Pull the NIM container

Step 2: Deploy a small model

Step 3: Test it

Overview

Software Components

Hardware Requirements

API Definition

Deployment

Prerequisites

About

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages