Skip to content

Commit e0911d9

Browse files
committed
Pablo feedback, part 2
1 parent ecd4f92 commit e0911d9

File tree

4 files changed

+20
-14
lines changed

4 files changed

+20
-14
lines changed

articles/search/vector-search-how-to-chunk-documents.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ ms.date: 06/29/2023
1515
> [!IMPORTANT]
1616
> Vector search is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). It's available through the Azure portal, preview REST API, and [alpha SDKs](https://github.com/Azure/cognitive-search-vector-pr#readme).
1717
18-
This article describes several approaches for chunking large documents so that you can generate embeddings for vector search.
18+
This article describes several approaches for chunking large documents so that you can generate embeddings for vector search. Chunking is only required if source documents are too large for the maximum input size imposed by models.
1919

2020
## Why is chunking important?
2121

@@ -42,7 +42,7 @@ Here are some common chunking techniques, starting with the most widely used met
4242

4343
### Content overlap considerations
4444

45-
When chunking data, overlapping a small amount of text between chunks can help preserve context. We recommend starting with an overlap of approximately 10%. For example, given a fixed chunk size of 256 tokens, you would begin testing with an overlap of 25 tokens. The actual amount of overlap varies depending on the type of data and the specific use case, but we have found that 10-15% works for many scenarios.
45+
When you chunk data, overlapping a small amount of text between chunks can help preserve context. We recommend starting with an overlap of approximately 10%. For example, given a fixed chunk size of 256 tokens, you would begin testing with an overlap of 25 tokens. The actual amount of overlap varies depending on the type of data and the specific use case, but we have found that 10-15% works for many scenarios.
4646

4747
### Factors for chunking data
4848

articles/search/vector-search-how-to-create-index.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -37,7 +37,9 @@ Prior to indexing, assemble a document payload that includes vector data. The do
3737

3838
Vector fields contain vector data generated by embedding models. We recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as **text-embedding-ada-002** for text documents or the [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for images.
3939

40-
1. Optionally, include other fields with alphanumeric content for hybrid query scenarios that include full text search or semantic ranking.
40+
1. Provide any other fields with alphanumeric content for any nonvector queries you want to support, as well as for hybrid query scenarios that include full text search or semantic ranking in the same request.
41+
42+
Your search index should include fields and content for all of the query scenarios you want to support. Suppose you want to search or filter over product names, versions, metadata, or addresses. In this case, similarity search isn't especially helpful and keyword search, geo-search, or filters would be a better choice. A search index that includes a comprehensive field collection of vector and non-vector data provides maximum flexibility for query construction.
4143

4244
## Add a vector field to the fields collection
4345

articles/search/vector-search-how-to-generate-embeddings.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ author: farzad528
77
ms.author: fsunavala
88
ms.service: cognitive-search
99
ms.topic: how-to
10-
ms.date: 06/29/2023
10+
ms.date: 07/10/2023
1111
---
1212

1313
# Create and use embeddings for search queries and documents
@@ -23,9 +23,9 @@ Dimension attributes have a minimum of 2 and a maximum of 2048 dimensions per ve
2323

2424
+ Query inputs require that you submit user-provided input to an embedding model that quickly converts human readable text into a vector.
2525

26-
+ We used **text-embedding-ada-002** to generate text embeddings and [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for image embeddings.
26+
+ For example, you can use **text-embedding-ada-002** to generate text embeddings and [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for image embeddings.
2727

28-
+ To avoid [rate limiting](/azure/cognitive-services/openai/quotas-limits), we implemented retry logic in our workload. For the Python demo, we used [tenacity](https://pypi.org/project/tenacity/).
28+
+ To avoid [rate limiting](/azure/cognitive-services/openai/quotas-limits), you can implement retry logic in your workload. For the Python demo, we used [tenacity](https://pypi.org/project/tenacity/).
2929

3030
+ Query outputs are any matching documents found in a search index. Your search index must have been previously loaded with documents having one or more vector fields with embeddings. Whatever model you used for indexing, use the same model for queries.
3131

@@ -65,8 +65,8 @@ print(embeddings)
6565

6666
+ **Identify use cases:** Evaluate the specific use cases where embedding model integration for vector search features can add value to your search solution. This can include matching image content with text content, cross-lingual searches, or finding similar documents.
6767
+ **Optimize cost and performance**: Vector search can be resource-intensive and is subject to maximum limits, so consider only vectorizing the fields that contain semantic meaning.
68-
+ **Choose the right embedding model:** Select an appropriate model for your specific use case, such as word embeddings for text-based searches or image embeddings for visual searches. Consider using pre-trained models like **text-embedding-ada-002** from OpenAI or **Image Retreival** REST API from [Azure AI Computer Vision](/azure/cognitive-services/computer-vision/how-to/image-retrieval).
69-
+ **Normalize Vector lengths**: Ensure that the vector lengths are normalized before storing them in the search index to improve the accuracy and performance of similarity search. Most pre-trained models already are normalized but not all.
68+
+ **Choose the right embedding model:** Select an appropriate model for your specific use case, such as word embeddings for text-based searches or image embeddings for visual searches. Consider using pretrained models like **text-embedding-ada-002** from OpenAI or **Image Retrieval** REST API from [Azure AI Computer Vision](/azure/cognitive-services/computer-vision/how-to/image-retrieval).
69+
+ **Normalize Vector lengths**: Ensure that the vector lengths are normalized before storing them in the search index to improve the accuracy and performance of similarity search. Most pretrained models already are normalized but not all.
7070
+ **Fine-tune the model**: If needed, fine-tune the selected model on your domain-specific data to improve its performance and relevance to your search application.
7171
+ **Test and iterate**: Continuously test and refine your embedding model integration to achieve the desired search performance and user satisfaction.
7272

articles/search/vector-search-how-to-query.md

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -7,26 +7,26 @@ author: HeidiSteen
77
ms.author: heidist
88
ms.service: cognitive-search
99
ms.topic: how-to
10-
ms.date: 07/07/2023
10+
ms.date: 07/10/2023
1111
---
1212

1313
# Query vector data in a search index
1414

1515
> [!IMPORTANT]
1616
> Vector search is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). It's available through the Azure portal, preview REST API, and [alpha SDKs](https://github.com/Azure/cognitive-search-vector-pr#readme).
1717
18-
In Azure Cognitive Search, if you added vector fields to a search index, this article explains how to query those fields. It also explains how to combine vector queries with full text search and semantic search for hybrid query scenarios.
18+
In Azure Cognitive Search, if you added vector fields to a search index, this article explains how to query those fields. It also explains how to combine vector queries with full text search and semantic search for hybrid query combination scenarios.
1919

2020
## Prerequisites
2121

22-
+ Azure Cognitive Search, in any region and on any tier. However, if you want to also use [semantic search](semantic-search-overview.md) for hybrid queries, your search service must be Basic tier or higher, with [semantic search enabled](semantic-search-overview.md#enable-semantic-search).
23-
24-
Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields will fail on creation. In this situation, a new service must be created.
22+
+ Azure Cognitive Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields will fail on creation. In this situation, a new service must be created.
2523

2624
+ A search index containing vector fields. See [Add vector fields to a search index](vector-search-how-to-query.md).
2725

2826
+ Use REST API version 2023-07-01-preview or Azure portal to query vector fields. You can also use alpha versions of the Azure SDKs. For more information, see [this readme](https://github.com/Azure/cognitive-search-vector-pr/blob/main/README.md).
2927

28+
+ (Optional) If you want to also use [semantic search (preview)](semantic-search-overview.md) and vector search together, your search service must be Basic tier or higher, with [semantic search enabled](semantic-search-overview.md#enable-semantic-search).
29+
3030
## Check your index for vector fields
3131

3232
In the index schema, check for:
@@ -107,7 +107,11 @@ The response includes 5 matches, and each result provides a search score, title,
107107

108108
## Query syntax for hybrid search
109109

110-
A hybrid query combines full text search, semantic search (reranking), and vector search. The search engine runs full text and vector queries in parallel. Semantic ranking is applied to the results from the text search. A single result set is returned in the response.
110+
A hybrid query combines full text search and vector search. The search engine runs full text and vector queries in parallel. All matches are evaluated for relevance using Reciprocal Rank Fusion (RRF) and a single result set is returned in the response.
111+
112+
You can also write queries that target just the vector fields, or just the text fields, within your search index. For example, besides vector queries, you might also want to write queries that filter by location or search over product names or titles, scenarios for which similarity search isn't a good fit.
113+
114+
The following example is from the [Postman collection of REST APIs](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-python) that demonstrate query configurations. It shows a complete request that includes vector search, full text search with filters, and semantic search with captions and answers. Semantic search is an optional premium feature. It's not required for vector search or hybrid search. For content that includes rich descriptive text *and* vectors, it's possible to benefit from all of the search modalities in one request.
111115

112116
```http
113117
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version={{api-version}}

0 commit comments

Comments
 (0)