Merge pull request #250262 from ChenJieting/jieting/tool_related_doc

v-ccolin · web-flow · commit 0d7435f96a04 · 2023-09-06T08:21:39.000+01:00
update VectorDB tools docs and add a faq.md for tools
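
The tool-name update that the new troubleshoot-guidance.md describes as a manual edit of "flow.dag.yaml" could also be scripted. The following is a minimal sketch, not part of this PR: it treats the file as plain text and swaps each legacy name for its replacement. The legacy names are taken from the error messages quoted in the notes added by this PR, and the new names from the table in troubleshoot-guidance.md. The helper names `rename_tools` and `migrate_flow_file` are hypothetical. Content Safety (Text) is left out because its legacy name isn't stated in this PR.

```python
from pathlib import Path

# Legacy -> new tool names. Legacy names are taken from the error messages
# quoted in the docs; new names are taken from the troubleshooting table.
TOOL_RENAMES = {
    "embeddingstore.tool.faiss_index_lookup.search":
        "promptflow_vectordb.tool.faiss_index_lookup.FaissIndexLookup.search",
    "embeddingstore.tool.vector_index_lookup.search":
        "promptflow_vectordb.tool.vector_index_lookup.VectorIndexLookup.search",
    "embeddingstore.tool.vector_db_lookup.search":
        "promptflow_vectordb.tool.vector_db_lookup.VectorDBLookup.search",
}


def rename_tools(yaml_text: str) -> str:
    """Return yaml_text with every legacy tool name replaced by its new name."""
    for old, new in TOOL_RENAMES.items():
        yaml_text = yaml_text.replace(old, new)
    return yaml_text


def migrate_flow_file(path: str) -> bool:
    """Rewrite the file in place (hypothetical helper); return True if it changed."""
    flow_file = Path(path)
    original = flow_file.read_text(encoding="utf-8")
    updated = rename_tools(original)
    if updated != original:
        flow_file.write_text(updated, encoding="utf-8")
    return updated != original
```

Calling `migrate_flow_file("flow.dag.yaml")` after updating the runtime would stand in for the "Update the tool names" step of Option 1. Plain text replacement is used so that comments and key order in the file are left untouched.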
diff --git a/articles/machine-learning/prompt-flow/media/faq/switch-to-raw-file-mode.png b/articles/machine-learning/prompt-flow/media/faq/switch-to-raw-file-mode.png
diff --git a/articles/machine-learning/prompt-flow/media/faq/update-tool-name.png b/articles/machine-learning/prompt-flow/media/faq/update-tool-name.png
diff --git a/articles/machine-learning/prompt-flow/tools-reference/faiss-index-lookup-tool.md b/articles/machine-learning/prompt-flow/tools-reference/faiss-index-lookup-tool.md
@@ -18,13 +18,15 @@ Faiss Index Lookup is a tool tailored for querying within a user-provided Faiss-
 
 ## Prerequisites
 - Prepare an accessible path on Azure Blob Storage. Here's the guide if a new storage account needs to be created:  [Azure Storage Account](../../../storage/common/storage-account-create.md).
-- Create related Faiss-based index files on Azure Blob Storage. We support the LangChain format (index.faiss + index.pkl) for the index files, which can be prepared either by employing our EmbeddingStore SDK or following the quick guide from [LangChain documentation](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/faiss). Please refer to [the sample notebook for creating Faiss index](https://aka.ms/pf-sample-build-faiss-index) for building index using EmbeddingStore SDK.
+- Create related Faiss-based index files on Azure Blob Storage. We support the LangChain format (index.faiss + index.pkl) for the index files, which can be prepared either by using the promptflow-vectordb SDK or by following the quick guide in the [LangChain documentation](https://python.langchain.com/docs/modules/data_connection/vectorstores/integrations/faiss). See [the sample notebook for creating Faiss index](https://aka.ms/pf-sample-build-faiss-index) for building the index with the promptflow-vectordb SDK.
 - Based on where you put your own index files, the identity used by the promptflow runtime should be granted with certain roles. Please refer to [Steps to assign an Azure role](../../../role-based-access-control/role-assignments-steps.md):
 
     | Location | Role |
     | ---- | ---- |
     | workspace datastores or workspace default blob | AzureML Data Scientist |
     | other blobs | Storage Blob Data Reader |
+> [!NOTE]
+> When you switch legacy tools to code-first mode, if you encounter the error "'embeddingstore.tool.faiss_index_lookup.search' is not found", see the [Troubleshoot Guidance](./troubleshoot-guidance.md).
 
 ## Inputs
 
@@ -38,7 +40,7 @@ The tool accepts the following inputs:
 
 ## Outputs
 
-The following is an example for JSON format response returned by the tool, which includes the top-k scored entities. The entity follows a generic schema of vector search result provided by our EmbeddingStore SDK. For the Faiss Index Search, the following fields are populated:
+The following is an example of the JSON response returned by the tool, which includes the top-k scored entities. Each entity follows the generic vector search result schema provided by the promptflow-vectordb SDK. For the Faiss Index Search, the following fields are populated:
 
 | Field Name | Type | Description |
 | ---- | ---- | ----------- |
diff --git a/articles/machine-learning/prompt-flow/tools-reference/troubleshoot-guidance.md b/articles/machine-learning/prompt-flow/tools-reference/troubleshoot-guidance.md
@@ -0,0 +1,44 @@
+---
+title: Troubleshoot guidance
+titleSuffix: Azure Machine Learning
+description: This article addresses frequent questions about tool usage.
+services: machine-learning
+ms.service: machine-learning
+ms.subservice: core
+ms.topic: reference
+author: ChenJieting
+ms.author: chenjieting
+ms.reviewer: lagayhar
+ms.date: 09/05/2023
+---
+
+# Troubleshoot guidance
+
+This article addresses frequent questions about tool usage.
+
+## Error "package tool is not found" occurs when updating a flow for the code-first experience
+
+When you update a flow for the code-first experience and the flow uses any of these tools (Faiss Index Lookup, Vector Index Lookup, Vector DB Lookup, Content Safety (Text)), you might see an error message like the following:
+
+<code><i>Package tool 'embeddingstore.tool.faiss_index_lookup.search' is not found in the current environment.</i></code>
+
+To resolve the issue, you have two options:
+
+- **Option 1**
+  - Update your runtime to the latest version.
+  - Select "Raw file mode" to switch to the raw code view, then open the "flow.dag.yaml" file.
+     ![how-to-switch-to-raw-file-mode](../media/faq/switch-to-raw-file-mode.png)
+  - Update the tool names.
+     ![how-to-update-tool-name](../media/faq/update-tool-name.png)
+     
+      | Tool | New tool name |
+      | ---- | ---- |
+      | Faiss Index Lookup tool | promptflow_vectordb.tool.faiss_index_lookup.FaissIndexLookup.search |
+      | Vector Index Lookup | promptflow_vectordb.tool.vector_index_lookup.VectorIndexLookup.search |
+      | Vector DB Lookup | promptflow_vectordb.tool.vector_db_lookup.VectorDBLookup.search |
+      | Content Safety (Text) | content_safety_text.tools.content_safety_text_tool.analyze_text |
+  - Save the "flow.dag.yaml" file.
+
+- **Option 2**
+  - Update your runtime to the latest version.
+  - Remove the old tool and create a new one.
diff --git a/articles/machine-learning/prompt-flow/tools-reference/vector-db-lookup-tool.md b/articles/machine-learning/prompt-flow/tools-reference/vector-db-lookup-tool.md
@@ -19,68 +19,188 @@ Vector DB Lookup is a vector search tool that allows users to search top k simil
 | Name | Description |
 | --- | --- |
 | Azure Cognitive Search | Microsoft's cloud search service with built-in AI capabilities that enrich all types of information to help identify and explore relevant content at scale. |
+| Qdrant | Qdrant is a vector similarity search engine that provides a production-ready service with a convenient API to store, search and manage points (i.e. vectors) with an additional payload. |
+| Weaviate | Weaviate is an open source vector database that stores both objects and vectors. This allows for combining vector search with structured filtering. |
 
-This tool adds support for more vector databases, including Pinecone, Weaviete, Qdrant etc.
+This tool will support more vector databases.
 
 > [!IMPORTANT]
 > Prompt flow is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
 > For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
 
 ## Prerequisites
 
-The tool searches data from a third-party vector database. To use it, you should create resources in advance and establish connections between the tool and the resource.
+The tool searches data from a third-party vector database. To use it, create the resources in advance and establish a connection between the tool and the resource.
 
   - **Azure Cognitive Search:**
     - Create resource [Azure Cognitive Search](../../../search/search-create-service-portal.md).
-    - Add "CognitiveSearchConnection" connection. Fill "API key" field with "Primary admin key" from "Keys" section of created resource, and fill "Api Base" field with the URL, the URL format is `https://{your_serive_name}.search.windows.net`.
+    - Add a "Cognitive search" connection. Fill the "API key" field with the "Primary admin key" from the "Keys" section of the created resource, and fill the "API base" field with the URL, in the format `https://{your_service_name}.search.windows.net`.
+
+  - **Qdrant:**
+    - Follow the [installation guide](https://qdrant.tech/documentation/quick-start/) to deploy Qdrant to a self-maintained cloud server.
+    - Add a "Qdrant" connection. Fill the "API base" field with your self-maintained cloud server address, and fill the "API key" field.
+
+  - **Weaviate:**
+    - Follow the [installation guide](https://weaviate.io/developers/weaviate/installation) to deploy Weaviate to a self-maintained instance.
+    - Add a "Weaviate" connection. Fill the "API base" field with your self-maintained instance address, and fill the "API key" field.
+
+> [!NOTE]
+> When you switch legacy tools to code-first mode, if you encounter the error "'embeddingstore.tool.vector_db_lookup.search' is not found", see the [Troubleshoot Guidance](./troubleshoot-guidance.md).
 
 ## Inputs
 
-The following are available input parameters:
+The tool accepts the following inputs:
 - **Azure Cognitive Search:**
 
   | Name | Type | Description | Required |
   | ---- | ---- | ----------- | -------- |
-  | connection | CognitiveSearchConnection | The created workspace connection for accessing to Cognitive Search service endpoint. | Yes |
+  | connection | CognitiveSearchConnection | The created connection for accessing the Cognitive Search endpoint. | Yes |
   | index_name | string | The index name created in Cognitive Search resource. | Yes |
-  | text_field | string | The text field name. The returned text filed will populate the result of text. | No |
+  | text_field | string | The text field name. The returned text field populates the text of the output. | No |
   | vector_field | string | The vector field name. The target vector is searched in this vector field. | Yes |
-  | search_params | dict | The search parameters. It's key-value pairs. Except for parameters in the tool input list mentioned above, additional search parameters can be formed into a JSON object as search_params. For example, use `{"select": ""}` as search_params to select the returned fields, use `{"search": "", "queryType": "", ""semanticConfiguration": "", "queryLanguage": ""}` to perform a hybrid search. | No |
+  | search_params | dict | The search parameters, as key-value pairs. In addition to the parameters in the tool input list above, other search parameters can be passed as a JSON object in search_params. For example, use `{"select": ""}` to select the returned fields, or use `{"search": ""}` to perform a [hybrid search](../../../search/search-get-started-vector.md#hybrid-search). | No |
   | search_filters | dict | The search filters. It's key-value pairs, the input format is like `{"filter": ""}` | No |
-  | vector | list | The target vector to be queried, which can be generated by the LLM tool. | Yes |
+  | vector | list | The target vector to be queried, which can be generated by the Embedding tool. | Yes |
   | top_k | int | The count of top-scored entities to return. Default value is 3 | No |
 
+- **Qdrant:**
 
-## Outputs
+  | Name | Type | Description | Required |
+  | ---- | ---- | ----------- | -------- |
+  | connection | QdrantConnection | The created connection for accessing the Qdrant server. | Yes |
+  | collection_name | string | The collection name created in the self-maintained cloud server. | Yes |
+  | text_field | string | The text field name. The returned text field populates the text of the output. | No |
+  | search_params | dict | The search parameters, passed as a JSON object. For example, use `{"params": {"hnsw_ef": 0, "exact": false, "quantization": null}}` to set search_params. | No |
+  | search_filters | dict | The search filters, as key-value pairs. The input format is like `{"filter": {"should": [{"key": "", "match": {"value": ""}}]}}`. | No |
+  | vector | list | The target vector to be queried, which can be generated by the Embedding tool. | Yes |
+  | top_k | int | The count of top-scored entities to return. The default value is 3. | No |
 
-The following is an example JSON format response returned by the tool, which includes the top-k scored entities. The entity follows a generic schema of vector search result provided by our EmbeddingStore SDK.
+- **Weaviate:**
+
+  | Name | Type | Description | Required |
+  | ---- | ---- | ----------- | -------- |
+  | connection | WeaviateConnection | The created connection for accessing Weaviate. | Yes |
+  | class_name | string | The class name. | Yes |
+  | text_field | string | The text field name. The returned text field populates the text of the output. | No |
+  | vector | list | The target vector to be queried, which can be generated by the Embedding tool. | Yes |
+  | top_k | int | The count of top-scored entities to return. The default value is 3. | No |
 
-**Azure Cognitive Search:**
+## Outputs
 
-  For the Azure Cognitive Search, the following fields are populated:
+The following is an example of the JSON response returned by the tool, which includes the top-k scored entities. Each entity follows the generic vector search result schema provided by the promptflow-vectordb SDK.
+- **Azure Cognitive Search:**
 
-| Field Name      | Type   | Description                                                       |
-|-----------------|--------|-------------------------------------------------------------------|
-| vector          | list   | vector of the entity, the vector field name is specified in input |
-| text            | string | text of the entity, the text field name is specified in input     |
-| score           | float  | @search.score from the original entity, which evaluates the similarity between the entity and the query vector                   |
-| original_entity | dict   | the original response json from search REST API                   |
+  For Azure Cognitive Search, the following fields are populated:
 
+  | Field Name | Type | Description |
+  | ---- | ---- | ----------- |
+  | original_entity | dict | the original response json from search REST API|
+  | score | float |  @search.score from the original entity, which evaluates the similarity between the entity and the query vector |
+  | text | string | text of the entity|
+  | vector | list | vector of the entity|
 
+  <details>
+    <summary>Output</summary>
+    
   ```json
   [
     {
       "metadata": null,
       "original_entity": {
         "@search.score": 0.5099789,
         "id": "",
-        "your_text_filed_name": "text",
+        "your_text_filed_name": "sample text1",
         "your_vector_filed_name": [-0.40517663431890405, 0.5856996257406859, -0.1593078462266455, -0.9776269170785785, -0.6145604369828972],
         "your_additional_field_name": ""
       },
       "score": 0.5099789,
-      "text": "text",
+      "text": "sample text1",
       "vector": [-0.40517663431890405, 0.5856996257406859, -0.1593078462266455, -0.9776269170785785, -0.6145604369828972]
     }
   ]
   ```
+  </details>
+
+- **Qdrant:**
+
+  For Qdrant, the following fields are populated:
+
+  | Field Name | Type | Description |
+  | ---- | ---- | ----------- |
+  | original_entity | dict | the original response json from search REST API|
+  | metadata | dict | payload from the original entity|
+  | score | float | score from the original entity, which evaluates the similarity between the entity and the query vector|
+  | text | string | text of the payload|
+  | vector | list | vector of the entity|
+
+  <details>
+    <summary>Output</summary>
+    
+  ```json
+  [
+    {
+      "metadata": {
+        "text": "sample text1"
+      },
+      "original_entity": {
+        "id": 1,
+        "payload": {
+          "text": "sample text1"
+        },
+        "score": 1,
+        "vector": [0.18257418, 0.36514837, 0.5477226, 0.73029673],
+        "version": 0
+      },
+      "score": 1,
+      "text": "sample text1",
+      "vector": [0.18257418, 0.36514837, 0.5477226, 0.73029673]
+    }
+  ]
+  ```
+  </details>
+
+- **Weaviate:**
+
+  For Weaviate, the following fields are populated:
+  
+  | Field Name | Type | Description |
+  | ---- | ---- | ----------- |
+  | original_entity | dict | the original response json from search REST API|
+  | score | float | certainty from the original entity, which evaluates the similarity between the entity and the query vector|
+  | text | string | text in the original entity|
+  | vector | list | vector of the entity|
+
+  <details>
+    <summary>Output</summary>
+    
+  ```json
+  [
+    {
+      "metadata": null,
+      "original_entity": {
+        "_additional": {
+          "certainty": 1,
+          "distance": 0,
+          "vector": [
+            0.58,
+            0.59,
+            0.6,
+            0.61,
+            0.62
+          ]
+        },
+        "text": "sample text1."
+      },
+      "score": 1,
+      "text": "sample text1.",
+      "vector": [
+        0.58,
+        0.59,
+        0.6,
+        0.61,
+        0.62
+      ]
+    }
+  ]
+  ```
+  </details>
diff --git a/articles/machine-learning/prompt-flow/tools-reference/vector-index-lookup-tool.md b/articles/machine-learning/prompt-flow/tools-reference/vector-index-lookup-tool.md
@@ -29,6 +29,8 @@ Vector index lookup is a tool tailored for querying within an Azure Machine Lear
     | ---- | ---- |
     | workspace datastores or workspace default blob | AzureML Data Scientist |
     | other blobs | Storage Blob Data Reader |
+> [!NOTE]
+> When you switch legacy tools to code-first mode, if you encounter the error "'embeddingstore.tool.vector_index_lookup.search' is not found", see the [Troubleshoot Guidance](./troubleshoot-guidance.md).
 
 ## Inputs
 
@@ -42,7 +44,7 @@ The tool accepts the following inputs:
 
 ## Outputs
 
-The following is an example for JSON format response returned by the tool, which includes the top-k scored entities. The entity follows a generic schema of vector search result provided by our EmbeddingStore SDK. For the Vector Index Search, the following fields are populated:
+The following is an example of the JSON response returned by the tool, which includes the top-k scored entities. Each entity follows the generic vector search result schema provided by the promptflow-vectordb SDK. For the Vector Index Search, the following fields are populated:
 
 | Field Name | Type | Description |
 | ---- | ---- | ----------- |
diff --git a/articles/machine-learning/toc.yml b/articles/machine-learning/toc.yml
@@ -656,6 +656,8 @@
               href: ./prompt-flow/tools-reference/vector-db-lookup-tool.md
             - name: SERP API tool
               href: ./prompt-flow/tools-reference/serp-api-tool.md
+            - name: Troubleshoot Guidance
+              href: ./prompt-flow/tools-reference/troubleshoot-guidance.md
     - name: Retrieval Augmented Generation (RAG)
       items:
         - name: What is RAG