Welcome to the NVIDIA RAG Blueprint documentation. Here you can learn how to get started with, customize, and troubleshoot the RAG Blueprint.
- To view this documentation on docs.nvidia.com, browse to NVIDIA RAG Blueprint Documentation.
- To view this documentation on GitHub, browse to NVIDIA RAG Blueprint Documentation.
For the release notes, refer to Release Notes.
For hardware requirements and other information, refer to the Support Matrix.
- Use the procedures in Get Started to quickly deploy the NVIDIA RAG Blueprint.
- Experiment and test in the Web User Interface.
- Use the Python Package to interact with the RAG system directly from Python code.
- Explore the notebooks that demonstrate how to use the APIs. For details, refer to Notebooks.
You can deploy the RAG Blueprint with Docker, Helm, or the NIM Operator, and target dedicated hardware or a Kubernetes cluster. Use the following documentation to deploy the blueprint.
:::{important}
Before you deploy, consider the following:

- Self-hosted deployments require ~200 GB of free disk space for model downloads and caching.
- First-time deployments take 15-30 minutes (Docker) or 60-70 minutes (Kubernetes) as large models are downloaded.
- Model downloads do not show progress bars; see the deployment guides for monitoring commands.
- Subsequent deployments are much faster (2-15 minutes) because models are already cached.

For detailed requirements, refer to the Support Matrix.
:::
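As a quick sanity check before a first deployment, you can verify free disk space where models will be cached and tail the container logs to watch downloads. This is a minimal sketch assuming a Docker Compose deployment on Linux; the cache location and service names are assumptions, so check the deployment guides for the exact monitoring commands for your setup.

```shell
# Check free disk space on the filesystem holding the model cache
# (self-hosted deployments need roughly 200 GB free).
df -h "${HOME}"

# Model downloads show no progress bars; follow the container logs
# instead. Uncomment after deploying (service names vary by setup):
# docker compose logs -f
```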
- Deploy with Docker (Self-Hosted Models)
- Deploy with Docker (NVIDIA-Hosted Models)
- Deploy on Kubernetes with Helm
- Deploy on Kubernetes with Helm from the repository
- Deploy on Kubernetes with Helm and MIG Support
- Deploy Retrieval-Only Mode
Alternative Deployment Options:
- Use the Python Package (Library Mode) - Use the NVIDIA RAG Python package directly for programmatic access to the RAG system
- Containerless Deployment (Lite Mode) - Simplified Python-only setup using Milvus Lite and NVIDIA cloud APIs, without Docker containers
After you deploy the RAG blueprint, you can customize it for your use cases.
- Common configurations
  - Best Practices for Common Settings
  - Change the LLM or Embedding Model
  - Customize LLM Parameters at Runtime
  - Customize Prompts
  - Model Profiles for Hardware Configurations
  - Multi-Collection Retrieval
  - Multi-Turn Conversation Support
  - Reasoning in the Nemotron LLM Model
  - Self-Reflection to Improve Accuracy
  - Summarization
- Data Ingestion and Processing
- Vector Database and Retrieval
- Multimodal and Advanced Generation
- Evaluation
- Governance
- Observability and Telemetry
- NVIDIA NeMo Retriever Delivers Accurate Multimodal PDF Data Extraction 15x Faster
- Finding the Best Chunking Strategy for Accurate AI Responses
```{toctree}
:name: NVIDIA RAG Blueprint
:caption: NVIDIA RAG Blueprint
:maxdepth: 1
:hidden:

Release Notes <release-notes.md>
Support Matrix <support-matrix.md>
```

```{toctree}
:name: Get Started
:caption: Get Started
:maxdepth: 1
:hidden:

Get an API Key <api-key.md>
Get Started with the RAG Blueprint <deploy-docker-self-hosted.md>
Web User Interface <user-interface.md>
Use the RAG Python Package <python-client.md>
Notebooks <notebooks.md>
```

```{toctree}
:name: Deployment Options for RAG Blueprint
:caption: Deployment Options for RAG Blueprint
:maxdepth: 1
:hidden:

Deploy with Docker (NVIDIA-Hosted Models) <deploy-docker-nvidia-hosted.md>
Deploy on Kubernetes with Helm <deploy-helm.md>
Deploy on Kubernetes with Helm from the repository <deploy-helm-from-repo.md>
Deploy on Kubernetes with Helm and MIG Support <mig-deployment.md>
Deploy Retrieval-Only Mode <retrieval-only-deployment.md>
```

```{toctree}
:name: Common configurations
:caption: Common configurations
:maxdepth: 1
:hidden:

Best Practices for Common Settings <accuracy_perf.md>
Change the Model <change-model.md>
Customize Parameters <llm-params.md>
Customize Prompts <prompt-customization.md>
Model Profiles <model-profiles.md>
Multi-Collection Retrieval <multi-collection-retrieval.md>
Multi-Turn Conversation Support <multiturn.md>
Reasoning <enable-nemotron-thinking.md>
Self-reflection <self-reflection.md>
Summarization <summarization.md>
```

```{toctree}
:name: Data Ingestion and Processing
:caption: Data Ingestion and Processing
:maxdepth: 1
:hidden:

Audio Ingestion Support <audio_ingestion.md>
Custom Metadata Support <custom-metadata.md>
Data Catalog for Collections and Documents <data-catalog.md>
File System Access to Results <mount-ingestor-volume.md>
Multimodal Embedding Support (Early Access) <vlm-embed.md>
OCR Configuration Guide <nemoretriever-ocr.md>
Enhanced PDF Extraction <nemotron-parse-extraction.md>
Standalone NV-Ingest <nv-ingest-standalone.md>
Text-Only Ingestion <text_only_ingest.md>
MCP Server Usage <mcp.md>
```

```{toctree}
:name: Vector Database and Retrieval
:caption: Vector Database and Retrieval
:maxdepth: 1
:hidden:

Change the Vector Database <change-vectordb.md>
Hybrid Search <hybrid_search.md>
Milvus Configuration <milvus-configuration.md>
Query Decomposition <query_decomposition.md>
```

```{toctree}
:name: Multimodal and Advanced Generation
:caption: Multimodal and Advanced Generation
:maxdepth: 1
:hidden:

Image Captioning <image_captioning.md>
Multimodal Query Support <multimodal-query.md>
VLM-based Inferencing <vlm.md>
```

```{toctree}
:name: Evaluation
:caption: Evaluation
:maxdepth: 1
:hidden:

Evaluate Your RAG System <evaluate.md>
```

```{toctree}
:name: Governance
:caption: Governance
:maxdepth: 1
:hidden:

NeMo Guardrails <nemo-guardrails.md>
```

```{toctree}
:name: Observability and Telemetry
:caption: Observability and Telemetry
:maxdepth: 1
:hidden:

Observability <observability.md>
Query-to-Answer Pipeline <query-to-answer-pipeline.md>
```

```{toctree}
:name: Troubleshoot RAG Blueprint
:caption: Troubleshoot RAG Blueprint
:maxdepth: 1
:hidden:

Troubleshoot <troubleshooting.md>
RAG Pipeline Debugging Guide <debugging.md>
Migration Guide <migration_guide.md>
```

```{toctree}
:name: Reference
:caption: Reference
:maxdepth: 1
:hidden:

Milvus Collection Schema <milvus-schema.md>
Service Port and GPU Reference <service-port-gpu-reference.md>
API - Ingestor Server Schema <api-ingestor.md>
API - RAG Server Schema <api-rag.md>
```