Milvus Collection Schema Requirements for NVIDIA RAG Blueprint

To work with the NVIDIA RAG Blueprint server, a Milvus collection must follow a specific schema so that it is compatible with the search and generate APIs. This document lists the required fields and their configuration.

:::{note}
If you ingest data with LangChain's Milvus integration or NVIDIA's nv-ingest tool, these schema requirements are handled for you: both tools create and configure the collection with the correct schema fields. You only need to follow these requirements when creating collections manually or using custom ingestion methods.
:::

Required Schema Fields

The following fields are required in your Milvus collection schema:

Vector Field
- Name: vector
- Description: Stores the document embeddings

Text Field
- Name: text
- Description: Stores the document content

Source Field
- Name: source
- Can be configured in two ways:
  1. Simple string format: Directly store the filename
  2. JSON format: Store a JSON object with a source_id field containing the filename
```
{
  "source_id": "document.pdf"
}
```
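Code that reads the source field back has to cope with both formats. The following is a minimal sketch, assuming a hypothetical helper named `source_filename` (not part of the RAG Blueprint) and assuming that a JSON value may arrive either as a dict or as a JSON-encoded string:

```python
import json


def source_filename(source):
    """Return the filename from either supported `source` format.

    Hypothetical helper for illustration; not part of the RAG Blueprint.
    """
    if isinstance(source, dict):
        # JSON format: {"source_id": "document.pdf"}
        return source["source_id"]
    if isinstance(source, str):
        # A string may be a plain filename or a JSON-encoded object.
        try:
            parsed = json.loads(source)
        except json.JSONDecodeError:
            return source
        if isinstance(parsed, dict) and "source_id" in parsed:
            return parsed["source_id"]
        return source
    raise TypeError(f"unsupported source value: {source!r}")
```

Both `source_filename("document.pdf")` and `source_filename({"source_id": "document.pdf"})` yield the same filename, so downstream code does not need to know which format was stored.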

Content Metadata Field (Optional)
- Name: content_metadata
- Type: JSON (DataType.JSON)
- Description: Stores additional metadata about the document content
- Can be used for filtering during search and retrieval
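The field requirements above can be checked before pointing the RAG server at a manually created collection. This is a minimal sketch, assuming the schema is available as a plain dict with a `fields` list whose entries carry a `name` key; `missing_required_fields` is a hypothetical helper, not part of any library:

```python
# Field names the RAG server expects, as listed in this document.
REQUIRED_FIELDS = ("vector", "text", "source")


def missing_required_fields(schema_dict):
    """Return the required field names that are absent from a schema dict."""
    present = {field["name"] for field in schema_dict.get("fields", [])}
    return [name for name in REQUIRED_FIELDS if name not in present]
```

An empty result means all required field names are present. The check covers names only; it does not verify field types or the vector dimension.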

Example Schema Definition

Here's an example of a complete schema definition that meets all requirements:

```python
from pymilvus import DataType  # provides the DataType enum used below

# Schema in dict form: an auto-generated primary key plus the fields
# described in this document.
schema_dict = {
    'auto_id': True,
    'description': '',
    'fields': [
        {
            'name': 'pk',
            'description': '',
            'type': DataType.INT64,
            'is_primary': True,
            'auto_id': True
        },
        {
            'name': 'vector',
            'description': '',
            'type': DataType.FLOAT_VECTOR,
            'params': {'dim': 2048}
        },
        {
            'name': 'source',
            'description': '',
            'type': DataType.JSON
        },
        {
            'name': 'content_metadata',
            'description': '',
            'type': DataType.JSON
        },
        {
            'name': 'text',
            'description': '',
            'type': DataType.VARCHAR,
            'params': {'max_length': 65535}
        }
    ],
    'enable_dynamic_field': True
}
```

Usage with RAG Server

When using this schema with the RAG server:

- The search API uses the vector field for similarity search.
- The text field is used to return the actual content.
- The source field is used to track document sources.
- The content_metadata field can be used for filtering through the filter_expr parameter of the search and generate APIs.
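To illustrate the last point, the sketch below builds a `filter_expr` string from metadata key/value pairs and places it in a request body. The expression uses Milvus's JSON-field syntax (`content_metadata["key"] == "value"`). The helper name, the `category` key, and the `query` key of the request body are assumptions made for illustration, so check the RAG server's API reference for the exact request shape.

```python
import json


def build_filter_expr(metadata):
    """Combine metadata equality checks into one filter expression."""
    clauses = [
        # json.dumps adds the double quotes and escaping for keys and values.
        f"content_metadata[{json.dumps(key)}] == {json.dumps(value)}"
        for key, value in metadata.items()
    ]
    return " and ".join(clauses)


# Hypothetical request body; only `filter_expr` is named in this document.
payload = {
    "query": "What does the report conclude?",
    "filter_expr": build_filter_expr({"category": "report"}),
}
print(json.dumps(payload, indent=2))
```

Generating the expression from a dict keeps quoting consistent and avoids hand-editing strings when several metadata conditions are combined.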

For more information about using metadata for filtering, refer to the Custom Metadata Documentation.
