NVIDIA RAG Blueprint Documentation

Welcome to the NVIDIA RAG Blueprint documentation. This documentation explains how to get started with the RAG Blueprint, how to customize it, and how to troubleshoot it.

Release Notes

For the release notes, refer to Release Notes.

Support Matrix

For hardware requirements and other information, refer to the Support Matrix.

Get Started With RAG Blueprint

  • Follow the procedures in Get Started to get up and running quickly with the NVIDIA RAG Blueprint.
  • Experiment and test in the Web User Interface.
  • Use the Python Package to interact with the RAG system directly from Python code.
  • Explore the notebooks that demonstrate how to use the APIs. For details, refer to Notebooks.

Deployment Options for RAG Blueprint

You can deploy the RAG Blueprint with Docker, Helm, or NIM Operator, and target dedicated hardware or a Kubernetes cluster. Use the following documentation to deploy the blueprint.

:::{important}
Before you deploy, consider the following:

  • Self-hosted deployments require ~200GB of free disk space for model downloads and caching.
  • First-time deployments take 15-30 minutes (Docker) or 60-70 minutes (Kubernetes) as large models are downloaded.
  • Model downloads do not show progress bars; see the deployment guides for monitoring commands.
  • Subsequent deployments are much faster (2-15 minutes) because models are already cached.

For detailed requirements, refer to Support Matrix.
:::
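The disk-space requirement above can be checked before you start a self-hosted deployment. The following is a minimal sketch, not part of the blueprint itself: the helper names `free_gb` and `has_enough_space` and the `model_cache_dir` path are hypothetical, and the 200 GB threshold is the approximate figure from the note above. Point the path at whichever volume holds your model cache.

```python
import shutil

# Approximate free space needed for model downloads and caching,
# taken from the deployment note above (assumption: decimal gigabytes).
REQUIRED_FREE_GB = 200


def free_gb(path: str = ".") -> float:
    """Return the free disk space on the volume containing `path`, in GB."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(free: float, required: float = REQUIRED_FREE_GB) -> bool:
    """Return True when `free` gigabytes meets or exceeds `required`."""
    return free >= required


if __name__ == "__main__":
    # Hypothetical location; replace with the volume used for model caching.
    model_cache_dir = "."
    available = free_gb(model_cache_dir)
    if has_enough_space(available):
        print(f"OK: {available:.0f} GB free on the volume for {model_cache_dir!r}")
    else:
        print(
            f"Warning: only {available:.0f} GB free; "
            f"about {REQUIRED_FREE_GB} GB is recommended"
        )
```

Running the script on the deployment host prints either a confirmation or a warning; it does not modify anything on disk.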

Alternative Deployment Options:

Developer Guide

After you deploy the RAG Blueprint, you can customize it for your use cases.

Troubleshoot RAG Blueprint

Reference

Blog Posts

:::{toctree}
   :name: NVIDIA RAG Blueprint
   :caption: NVIDIA RAG Blueprint
   :maxdepth: 1
   :hidden:

   Release Notes <release-notes.md>
   Support Matrix <support-matrix.md>
:::

:::{toctree}
   :name: Get Started
   :caption: Get Started
   :maxdepth: 1
   :hidden:

   Get an API Key <api-key.md>
   Get Started with the RAG Blueprint <deploy-docker-self-hosted.md>
   Web User Interface <user-interface.md>
   Use the RAG Python Package <python-client.md>
   Notebooks <notebooks.md>
:::

:::{toctree}
   :name: Deployment Options for RAG Blueprint
   :caption: Deployment Options for RAG Blueprint
   :maxdepth: 1
   :hidden:

   Deploy with Docker (NVIDIA-Hosted Models) <deploy-docker-nvidia-hosted.md>
   Deploy on Kubernetes with Helm <deploy-helm.md>
   Deploy on Kubernetes with Helm from the repository <deploy-helm-from-repo.md>
   Deploy on Kubernetes with Helm and MIG Support <mig-deployment.md>
   Deploy Retrieval-Only Mode <retrieval-only-deployment.md>
:::

:::{toctree}
   :name: Common configurations
   :caption: Common configurations
   :maxdepth: 1
   :hidden:

   Best Practices for Common Settings <accuracy_perf.md>
   Change the Model <change-model.md>
   Customize Parameters <llm-params.md>
   Customize Prompts <prompt-customization.md>
   Model Profiles <model-profiles.md>
   Multi-Collection Retrieval <multi-collection-retrieval.md>
   Multi-Turn Conversation Support <multiturn.md>
   Reasoning <enable-nemotron-thinking.md>
   Self-reflection <self-reflection.md>
   Summarization <summarization.md>
:::

:::{toctree}
   :name: Data Ingestion and Processing
   :caption: Data Ingestion and Processing
   :maxdepth: 1
   :hidden:

   Audio Ingestion Support <audio_ingestion.md>
   Custom metadata Support <custom-metadata.md>
   Data Catalog for Collections and Documents <data-catalog.md>
   File System Access to Results <mount-ingestor-volume.md>
   Multimodal Embedding Support (Early Access) <vlm-embed.md>
   OCR Configuration Guide <nemoretriever-ocr.md>
   Enhanced PDF Extraction <nemotron-parse-extraction.md>
   Standalone NV-Ingest <nv-ingest-standalone.md>
   Text-Only Ingestion <text_only_ingest.md>
   MCP Server Usage <mcp.md>
:::

:::{toctree}
   :name: Vector Database and Retrieval
   :caption: Vector Database and Retrieval
   :maxdepth: 1
   :hidden:

   Change the Vector Database <change-vectordb.md>
   Hybrid Search <hybrid_search.md>
   Milvus Configuration <milvus-configuration.md>
   Query Decomposition <query_decomposition.md>
:::

:::{toctree}
   :name: Multimodal and Advanced Generation
   :caption: Multimodal and Advanced Generation
   :maxdepth: 1
   :hidden:

   Image Captioning <image_captioning.md>
   Multimodal Query Support <multimodal-query.md>
   VLM-based Inferencing <vlm.md>
:::

:::{toctree}
   :name: Evaluation
   :caption: Evaluation
   :maxdepth: 1
   :hidden:

   Evaluate Your RAG System <evaluate.md>
:::

:::{toctree}
   :name: Governance
   :caption: Governance
   :maxdepth: 1
   :hidden:

   NeMo Guardrails <nemo-guardrails.md>
:::

:::{toctree}
   :name: Observability and Telemetry
   :caption: Observability and Telemetry
   :maxdepth: 1
   :hidden:

   Observability <observability.md>
   Query-to-Answer Pipeline <query-to-answer-pipeline.md>
:::

:::{toctree}
   :name: Troubleshoot RAG Blueprint
   :caption: Troubleshoot RAG Blueprint
   :maxdepth: 1
   :hidden:

   Troubleshoot <troubleshooting.md>
   RAG Pipeline Debugging Guide <debugging.md>
   Migration Guide <migration_guide.md>
:::

:::{toctree}
   :name: Reference
   :caption: Reference
   :maxdepth: 1
   :hidden:

   Milvus Collection Schema <milvus-schema.md>
   Service Port and GPU Reference <service-port-gpu-reference.md>
   API - Ingestor Server Schema <api-ingestor.md>
   API - RAG Server Schema <api-rag.md>
:::