206 changes: 124 additions & 82 deletions agent_knowledge/search-with-milvus/search-with-watsonx-data-milvus.md
@@ -1,92 +1,134 @@
# How to set up watsonx.data Milvus Cloud Pak for Data as a content repository of Agent Knowledge in watsonx Orchestrate Cloud Pak for Data by using an embedding model in watsonx.ai Cloud Pak for Data
This document explains how to set up watsonx.data Milvus Cloud Pak for Data as a content repository of Agent Knowledge in watsonx Orchestrate Cloud Pak for Data by using an embedding model in watsonx.ai Cloud Pak for Data.

## Before you begin
1. Provision a watsonx.data instance
* To install and set up watsonx.data on Cloud Pak for Data (on-prem), see [Installing and administering Cloud Pak for Data](https://www.ibm.com/docs/en/cloud-paks/cp-data/5.2.x?topic=installing-administering-cloud-pak-data)
2. Add a Milvus service in the watsonx.data console
* To add a Milvus service on Cloud Pak for Data (on-prem), see [Adding a Milvus service](https://www.ibm.com/docs/en/watsonxdata/standard/2.2.x?topic=milvus-adding-service)
3. This guide focuses on how to set up watsonx.data Milvus on Cloud Pak for Data as the content repository. For instructions on configuring watsonx.ai and watsonx Orchestrate on Cloud Pak for Data, refer to their respective documentation.

## Table of contents
* [Step 1: Collect Milvus connection information](#step-1-collect-milvus-connection-information)
* [Get the credentials](#get-the-credentials)
* [Get other connection details](#get-other-connection-details)
* [Step 2: Ingest data into Milvus](#step-2-ingest-data-into-milvus)
* [Option 1: Ingest data through watsonx.ai](#option-1-ingest-data-through-watsonxai)
* [Option 2: Ingest data using custom code](#option-2-ingest-data-using-custom-code)
* [Step 3: Connect to Agent Knowledge in watsonx Orchestrate](#step-3-connect-to-agent-knowledge-in-watsonx-orchestrate)

## Step 1: Collect Milvus connection information
### Get the credentials
#### Username
The default username is `cpadmin` for the Milvus service on watsonx.data. You can also find the username in the Milvus "Access control" panel in the `Infrastructure Manager` of the watsonx.data instance.

#### Password
Use the password that you created for `cpadmin`, or for a different user that has access to the Milvus service as identified in the `Infrastructure Manager`.

### Get other connection details
Apply the following steps to collect additional Milvus connection details from the watsonx.data console:

1. Go to the `Infrastructure manager` page.
2. Click the Milvus service to open the `Details` page.
3. Click on `View connect details` to view more connection details.
4. Collect the GRPC `host`, `port`, and the SSL certificate from the service details.

## Step 2: Ingest data into Milvus
You can ingest data into the Milvus vector database either through watsonx.ai or by using custom code.
### Option 1: Ingest data through watsonx.ai
#### Create a Milvus connection
On the watsonx.ai Project Assets page, click on `New asset` > choose `Connect a data source` > choose `Milvus` > click `Next` > fill in the connection details and credentials as below > `Test connection` > click `Create`.

<img src="./assets/create-milvus-connection.png" width="1080" height="574" />

#### Create a vector index and upload documents
On the watsonx.ai Project Assets page, use the Milvus connection configured in the previous step to create a vector index and upload documents. Do the following steps:
1. On the watsonx.ai Project Assets page, click on `New asset` > choose `Ground gen AI with vectorized documents`.
2. On the left-side panel, select `watsonx.data Milvus` as the vector store > fill in name and description > select the Milvus connection created earlier.
3. Select `Database` and `Embeddings model` from the dropdowns > click `Next`. For example,
<img src="./assets/create-milvus-index-watsonx-ai.png" width="1080" height="574" />

4. Click `New collection` to create a new collection.
5. Enter a unique collection name, select the files to include in the Milvus collection, and then click `Create`. Note the values for the `Document name` and the `Text` under `Advanced settings`, for later use during Agent Knowledge setup.
<img src="./assets/create-milvus-collection-and-ingest-watsonx-ai.png" width="1080" height="574" />

6. Once the document upload is complete, you can start testing it in the prompt lab.

**NOTE: By default, `document_name` and `text` are the two main fields created in the Milvus collection schema. When searching this Milvus collection using custom code, you must specify these two fields as `output_fields`. When setting up Milvus as content repository in Agent Knowledge, you must configure the `Title` and `Body` fields with these two fields.**

### Option 2: Ingest data using custom code
To ingest documents into Milvus, refer to the sample code: [../examples/index-with-milvus.py](../examples/index-with-milvus.py). To run the code,
1. Install dependencies.
# Using Milvus in watsonx.data as an Agent Knowledge Repository for Cloud Pak for Data

This guide explains how to configure watsonx.data Milvus as a content repository for Agent Knowledge in watsonx Orchestrate **on Cloud Pak for Data**, using embedding models from watsonx.ai Cloud Pak for Data.

## Prerequisites

Before starting this integration, ensure you have:

- **Cloud Pak for Data environment**: A properly configured Cloud Pak for Data environment
- **watsonx.data instance**: A properly configured watsonx.data instance on Cloud Pak for Data
- For installation instructions, see [Installing and administering Cloud Pak for Data](https://www.ibm.com/docs/en/cloud-paks/cp-data/5.2.x?topic=installing-administering-cloud-pak-data)
- **Milvus service**: Added to your watsonx.data console
- For setup instructions, see [Adding a Milvus service](https://www.ibm.com/docs/en/watsonxdata/standard/2.2.x?topic=milvus-adding-service)
- **Access credentials**: Administrative access to both watsonx.data and watsonx.ai
- **Documents**: Content you want to make available to your agents

## Table of Contents

- [Step 1: Collect Milvus Connection Information](#step-1-collect-milvus-connection-information)
- [Step 2: Ingest Data into Milvus](#step-2-ingest-data-into-milvus)
- [Step 3: Connect to Agent Knowledge in watsonx Orchestrate](#step-3-connect-to-agent-knowledge-in-watsonx-orchestrate)
- [Troubleshooting](#troubleshooting)
- [Conclusion](#conclusion)

## Integration Process

### Step 1: Collect Milvus Connection Information

#### Authentication Credentials

- **Username**: Default is `cpadmin` for Milvus service on watsonx.data
- You can verify this in the Milvus "Access control" panel in the Infrastructure Manager
- **Password**: Use the password created for the Milvus service user

#### Connection Details

1. Navigate to the **Infrastructure manager** page
2. Select your Milvus service to open the **Details** page
3. Click **View connect details**
4. Record the following information:
- GRPC host
- GRPC port
- SSL certificate

### Step 2: Ingest Data into Milvus

Choose one of the following methods to populate your Milvus vector database:

#### Option 1: Using watsonx.ai Interface

1. **Create a Milvus connection**:
- Go to watsonx.ai Project Assets page
- Click **New asset** > **Connect a data source** > **Milvus** > **Next**
- Enter your connection details and credentials
- Click **Test connection** > **Create**

![Milvus Connection Setup](./assets/create-milvus-connection.png)

2. **Create a vector index and upload documents**:
- On the watsonx.ai Project Assets page, click **New asset** > **Ground gen AI with vectorized documents**
- Select **watsonx.data Milvus** as the vector store
- Fill in name and description
- Select your Milvus connection
- Choose your **Database** and **Embeddings model** > click **Next**

![Milvus Index Creation](./assets/create-milvus-index-watsonx-ai.png)

3. **Create a collection and upload documents**:
- Click **New collection**
- Enter a unique collection name
- Select files to include
- Click **Create**
- Note the values for **Document name** and **Text** under Advanced settings (needed for Agent Knowledge setup)

![Milvus Collection Creation](./assets/create-milvus-collection-and-ingest-watsonx-ai.png)

> **Important**: By default, `document_name` and `text` are the two main fields created in the Milvus collection schema. When searching this collection using custom code, you must specify these as `output_fields`. When configuring Agent Knowledge, map these to the `Title` and `Body` fields.
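
As an illustration of the note above, a custom search can request `document_name` and `text` through `output_fields` and return them under the names that Agent Knowledge uses. This is a sketch under assumptions: the vector field name (`vector`) and the metric type (`L2`) are placeholders that must match your collection's schema and index, `collection` is expected to behave like a loaded `pymilvus.Collection`, and `search_documents` is a hypothetical helper.

```python
OUTPUT_FIELDS = ["document_name", "text"]  # default fields named in the note above


def search_documents(collection, query_vector, vector_field="vector",
                     metric_type="L2", limit=3):
    """Run a vector search and return each hit as a title/body pair.

    vector_field and metric_type are assumptions; check them against
    the schema and index of your own collection.
    """
    results = collection.search(
        data=[query_vector],                      # one query vector, so one result list
        anns_field=vector_field,
        param={"metric_type": metric_type, "params": {}},
        limit=limit,
        output_fields=OUTPUT_FIELDS,
    )
    return [
        {"title": hit.entity.get("document_name"), "body": hit.entity.get("text")}
        for hit in results[0]
    ]


# Example wiring (needs a live connection and a query embedding):
# from pymilvus import Collection
# collection = Collection("your_collection_name")
# collection.load()
# print(search_documents(collection, query_vector))
```

The query vector has to come from the same embeddings model that was selected when the collection was created.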

#### Option 2: Using Custom Code

To programmatically ingest documents:

1. **Install dependencies**:
```bash
python3 -m pip install pymilvus langchain langchain-milvus langchain-ibm ibm-watsonx-ai PyPDF2
```
2. Create environment variables for Milvus credentials.
```bash
export MILVUS_HOST="Your Milvus GRPC host"
export MILVUS_PORT="Your Milvus GRPC port"
export MILVUS_USER="cpadmin" # The default username for watsonx.data Milvus on-prem
export MILVUS_PASSWORD="Your watsonx.data Milvus on-prem password"
export MILVUS_PEM_PATH="the file path to the watsonx.data Milvus on-prem TLS certificate"
export MILVUS_COLLECTION_NAME="Your Milvus collection name" # It can be anything

export WATSONX_AI_URL="Your watsonx.ai on-prem URL"
export WATSONX_AI_USERNAME="Your watsonx.ai on-prem username" # watsonx.ai embeddings model is used to create vectors
export WATSONX_AI_PASSWORD="Your watsonx.ai on-prem password" # watsonx.ai embeddings model is used to create vectors
export WATSONX_AI_PROJECT_ID="Your watsonx.ai project ID" # watsonx.ai project ID is required to access the embeddings models
```
3. Update the `SOURCE_FILES`, `SOURCE_URLS`, and `SOURCE_TITLES` variables at the beginning of the script to your file names, URLs, and titles respectively.
4. Run the script.

2. **Set environment variables**:
```bash
python3 index-with-milvus.py
export MILVUS_HOST="Your Milvus GRPC host"
export MILVUS_PORT="Your Milvus GRPC port"
export MILVUS_USER="cpadmin" # Default username for watsonx.data Milvus on-prem
export MILVUS_PASSWORD="Your on-prem watsonx.data Milvus password"
export MILVUS_PEM_PATH="path/to/milvus/tls/certificate"
export MILVUS_COLLECTION_NAME="your_collection_name"

export WATSONX_AI_URL="Your on-prem watsonx.ai URL"
export WATSONX_AI_USERNAME="Your on-prem watsonx.ai username"
export WATSONX_AI_PASSWORD="Your on-prem watsonx.ai password"
export WATSONX_AI_PROJECT_ID="Your on-prem watsonx.ai project ID"
```

## Step 3: Connect to Agent Knowledge in watsonx Orchestrate
3. **Modify and run the sample script**:
- Update `SOURCE_FILES`, `SOURCE_URLS`, and `SOURCE_TITLES` in the script
- Run the script:
```bash
python3 index-with-milvus.py
```
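
Before running the script, it can help to confirm that every environment variable from the list above is actually set in the current shell. The following stdlib-only sketch does that; the variable names are the ones listed in this guide, while `missing_variables` is a hypothetical helper and not part of the sample script.

```python
import os

# Variable names as listed in the "Set environment variables" step.
REQUIRED_VARS = [
    "MILVUS_HOST",
    "MILVUS_PORT",
    "MILVUS_USER",
    "MILVUS_PASSWORD",
    "MILVUS_PEM_PATH",
    "MILVUS_COLLECTION_NAME",
    "WATSONX_AI_URL",
    "WATSONX_AI_USERNAME",
    "WATSONX_AI_PASSWORD",
    "WATSONX_AI_PROJECT_ID",
]


def missing_variables(env=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```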

### Step 3: Connect to Agent Knowledge in watsonx Orchestrate

> **Important**: The embedding model used for search must match the one used during data ingestion in Step 2.
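
One inexpensive consistency check is to compare the length of a sample embedding with the dimension declared in the collection schema. This is only a sketch: it assumes `collection` behaves like a `pymilvus.Collection`, whose schema fields expose `name` and a `params` dictionary containing `dim` for dense vector fields, and the helper names are hypothetical. A matching dimension does not prove that the models are identical, but a different dimension does show that they are not.

```python
def vector_field_dims(collection):
    """Return {field_name: dim} for every schema field that declares a dimension."""
    dims = {}
    for field in collection.schema.fields:
        dim = field.params.get("dim")
        if dim is not None:
            dims[field.name] = int(dim)
    return dims


def embedding_fits_collection(collection, sample_embedding):
    """True if the sample embedding's length matches some vector field's dimension."""
    return len(sample_embedding) in vector_field_dims(collection).values()
```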

To configure watsonx.data Milvus as a content repository in watsonx Orchestrate:

1. Navigate to the Agent Knowledge section in watsonx Orchestrate
2. Follow the integration steps for Milvus content repository
3. Configure the connection using the details collected in Step 1
4. Map the `document_name` field to `Title` and `text` field to `Body`

For detailed instructions, see [Connecting to a Milvus content repository](https://www.ibm.com/docs/en/watsonx/watson-orchestrate/base?topic=agents-connecting-milvus-content-repository).

## Troubleshooting

**NOTE: The embedding model used for search in [Step 3](#step-3-connect-to-agent-knowledge-in-watsonx-orchestrate) must align with the embedding model used for data ingestion in [Step 2](#step-2-ingest-data-into-milvus).**
- **Connection Issues**: Verify your host, port, and credentials are correct
- **SSL Certificate Problems**: Ensure the certificate path is correct and the certificate is valid
- **Embedding Model Mismatch**: Confirm the same embedding model is used for both ingestion and search
- **Missing Fields**: Check that `document_name` and `text` fields are properly configured in your collection
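
The first two checks can be partly automated with the Python standard library before involving any Milvus client. The sketch below is an assumption-laden pre-flight check, not an official diagnostic: it only verifies that the certificate file exists and looks like PEM, and that a TCP connection to the gRPC host and port can be opened. It does not validate the certificate chain or the credentials, and the function names are hypothetical.

```python
import os
import socket


def check_certificate_file(path):
    """Return None if the file exists and looks like a PEM certificate, else a message."""
    if not os.path.isfile(path):
        return f"certificate file not found: {path}"
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        if "-----BEGIN CERTIFICATE-----" not in handle.read():
            return f"file does not look like a PEM certificate: {path}"
    return None


def check_port_reachable(host, port, timeout=5.0):
    """Return None if a TCP connection to host:port succeeds, else a message."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return None
    except OSError as exc:
        return f"cannot reach {host}:{port}: {exc}"
```

Both functions return `None` on success and a short message otherwise, so the results can be printed or collected in a list when several checks run together.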

This option allows you to integrate with your watsonx.data Milvus on-prem service through the Agent Knowledge feature of watsonx Orchestrate.
## Conclusion

For detailed instructions on setting up watsonx.data Milvus (on-prem) through the Agent Knowledge feature of watsonx Orchestrate, see [Connecting to a Milvus content repository](https://www.ibm.com/docs/en/watsonx/watson-orchestrate/base?topic=agents-connecting-milvus-content-repository).
You have now successfully set up watsonx.data Milvus as a content repository for Agent Knowledge in watsonx Orchestrate on Cloud Pak for Data. Your agents can now search and retrieve information from the documents you've ingested, enhancing their capabilities with domain-specific knowledge.

For additional support or advanced configurations, refer to the official [watsonx documentation](https://www.ibm.com/docs/en/watsonx).