Enable Image Captioning Support for NVIDIA RAG Blueprint

You can enable image captioning support for NVIDIA RAG Blueprint. Enabling image captioning will yield higher accuracy for querstions relevant to images in the ingested documents at the cost of higher ingestion latency.

After you have deployed the blueprint, to enable image captioning support, you have the following options:

Using on-prem VLM model (Recommended)
Using cloud hosted VLM model
Using Helm chart deployment (On-prem only)

:::{warning} B200 GPUs are not supported for image captioning support for ingested documents. For this feature, use H100 or A100 GPUs instead. :::

Using on-prem VLM model (Recommended)

Deploy the VLM model on-prem. You need a H100 or A100 or B200 GPU to deploy this model.

export VLM_MS_GPU_ID=<AVAILABLE_GPU_ID>
USERID=$(id -u) docker compose -f deploy/compose/nims.yaml --profile vlm-only up -d

Make sure the vlm container is up and running

docker ps --filter "name=nemo-vlm-microservice" --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

Example Output

NAMES                                   STATUS
nemo-vlm-microservice                   Up 5 minutes (healthy)

Enable image captioning Export the below environment variable and relaunch the ingestor-server container.

export APP_NVINGEST_EXTRACTIMAGES="True"
export APP_NVINGEST_CAPTIONENDPOINTURL="http://vlm-ms:8000/v1/chat/completions"
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d

Using cloud hosted VLM model

Set caption endpoint and model to API catalog

export APP_NVINGEST_CAPTIONENDPOINTURL="https://integrate.api.nvidia.com/v1/chat/completions"
export APP_NVINGEST_CAPTIONMODELNAME="nvidia/nemotron-nano-12b-v2-vl"

Enable image captioning Export the below environment variable and relaunch the ingestor-server container.

export APP_NVINGEST_EXTRACTIMAGES="True"
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d

:::{tip} You can change the model name and model endpoint in case of an externally hosted VLM model by setting these two environment variables and restarting the ingestion services :::

export APP_NVINGEST_CAPTIONMODELNAME="<vlm_nim_http_endpoint_url>"
export APP_NVINGEST_CAPTIONMODELNAME="<model_name>"

Using Helm chart deployment (On-prem only)

To enable image captioning in Helm-based deployments by using an on-prem VLM model, use the following procedure.

Modify values.yaml to enable image captioning:

# Enable VLM NIM for image captioning
nim-vlm:
  enabled: true

# Configure ingestor-server for image captioning
ingestor-server:
  envVars:
    # ... existing configurations ...
    
    # === Image Captioning ===
    APP_NVINGEST_EXTRACTIMAGES: "True"
    APP_NVINGEST_CAPTIONENDPOINTURL: "http://nim-vlm:8000/v1/chat/completions"
    APP_NVINGEST_CAPTIONMODELNAME: "nvidia/nemotron-nano-12b-v2-vl"

Apply the updated Helm chart:

After modifying values.yaml, apply the changes as described in Change a Deployment.

For detailed HELM deployment instructions, see Helm Deployment Guide.

:::{note} Enabling the on-prem VLM model increases the total GPU requirement to 9xH100 GPUs. :::

:::{warning} With image captioning enabled, uploaded files will fail to get ingested, if they do not contain any graphs, charts, tables or plots. This is currently a known limitation. :::

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable Image Captioning Support for NVIDIA RAG Blueprint

Using on-prem VLM model (Recommended)

Using cloud hosted VLM model

Using Helm chart deployment (On-prem only)

FilesExpand file tree

image_captioning.md

Latest commit

History

image_captioning.md

File metadata and controls

Enable Image Captioning Support for NVIDIA RAG Blueprint

Using on-prem VLM model (Recommended)

Using cloud hosted VLM model

Using Helm chart deployment (On-prem only)