Commit 875f605

Acrolinx edits, porting lost edits from other PR
1 parent f65aba5 commit 875f605

File tree

4 files changed

+13
-19
lines changed


articles/search/search-get-started-vector.md

Lines changed: 10 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -248,7 +248,7 @@ There are several queries to demonstrate the patterns. We use the same query str
248248

249249
In this vector query, which is shortened for brevity, the "value" contains the vectorized text of the query input, "fields" determines which vector fields are searched, and "k" specifies the number of nearest neighbors to return.
250250

251-
Recall that the vector query was generated from this string: "what Azure services support full text search". The search targets the `contentVector` field.
251+
Recall that the vector query was generated from this string: `"what Azure services support full text search"`. The search targets the `contentVector` field.
252252

253253
```http
254254
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version={{api-version}}
@@ -277,7 +277,7 @@ The response includes 5 results, and each result provides a search score, title,
277277

278278
You can add filters, but the filters are applied to the nonvector content in your index. In this example, the filter applies to the "category" field.
279279

280-
The response is 10 Azure services, with a search score, title, and category for each one. You'll also notice the `select` property here to visually only see the fields are necessary in my the response.
280+
The response is 10 Azure services, with a search score, title, and category for each one. Notice the `select` property, which limits the response to only the fields you need.
281281

282282
```http
283283
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version={{api-version}}
@@ -304,7 +304,7 @@ api-key: {{admin-api-key}}
304304

305305
### Cross-field vector search
306306

307-
Cross-field vector search allows you to send a single query across multiple vector fields in your vector index. For this example, I want to calculate the similarity across both `titleVector` and `contentVector`:
307+
A cross-field vector query sends a single query across multiple vector fields in your search index. This query example looks for similarity in both `titleVector` and `contentVector`:
308308

309309
```http
310310
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version={{api-version}}
@@ -329,7 +329,7 @@ api-key: {{admin-api-key}}
329329

330330
### Multi-query vector search
331331

332-
Multi-query vector search allows you to send a multiple queries across multiple vector fields in your vector index. For this example, I want to calculate the similarity across both `titleVector` and `contentVector` but will send in two different query embeddings respectively. This scenario is ideal for multi-modal use cases where you want to search over a `textVector` field and an `imageVector` field. You can also use this scenario if you have different embedding models with different dimensions in your search index.
332+
Multi-query vector search sends multiple queries across multiple vector fields in your search index. This query example looks for similarity in both `titleVector` and `contentVector`, sending a separate query embedding to each field. This scenario is ideal for multi-modal use cases where you want to search over a `textVector` field and an `imageVector` field. You can also use this scenario if you have different embedding models with different dimensions in your search index.
333333

334334
```http
335335
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version={{api-version}}
@@ -365,9 +365,9 @@ api-key: {{admin-api-key}}
365365

366366
### Hybrid search
367367

368-
Hybrid search allows to compose keyword queries and vector queries in a single search request.
368+
Hybrid search combines keyword queries and vector queries in a single search request.
369369

370-
The response includes the top 10 ordereded by search score. Both vector queries and free text queries are assigned a search score according to the scoring or similarity functions configured on the fields (BM25 for text fields). The scores are merged using Reciprocal Rank Fusion (RRF) to weight each document with the inverse of its position in the ranked result set.
370+
The response includes the top 10 ordered by search score. Both vector queries and free text queries are assigned a search score according to the scoring or similarity functions configured on the fields (BM25 for text fields). The scores are merged using [Reciprocal Rank Fusion (RRF)](vector-search-ranking.md#reciprocal-rank-fusion-rrf-for-hybrid-queries) to weight each document with the inverse of its position in the ranked result set.
371371

372372
```http
373373
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version={{api-version}}
@@ -392,9 +392,9 @@ api-key: {{admin-api-key}}
392392
}
393393
```
394394

395-
Compare the responses between Single Vector Search and Simple Hybrid Search for the top result. The different ranking algorithms produce scores that may seem in different magnitudes. This is by design of the RRF algorithm. When using Hybrid search, it's important to note that the reciprocal of the ranked documents are taken given the relatively smaller score vs pure vector search.
395+
Compare the responses between Single Vector Search and Simple Hybrid Search for the top result. The different ranking algorithms, HNSW's similarity metric and RRF respectively, produce scores with different magnitudes. This is by design. RRF scores can appear quite low, even for a high similarity match, because an RRF score is computed from the reciprocal of each document's rank in the merged result sets rather than from a raw similarity value.
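To illustrate why RRF scores are small, here's a minimal sketch of Reciprocal Rank Fusion. The constant `k=60` is the value commonly cited in the RRF literature, not necessarily the service's internal constant, and the document IDs are made up:

```python
def rrf_merge(rankings, k=60):
    """Merge ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document's score is the sum of 1 / (k + rank) over every
    ranking it appears in, so scores are small by construction.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical keyword ranking and vector ranking over the same index
keyword_ranked = ["doc1", "doc2", "doc3"]
vector_ranked = ["doc1", "doc4", "doc2"]
merged = rrf_merge([keyword_ranked, vector_ranked])
```

Even the top document here scores roughly 0.03, which is why RRF scores look low next to cosine similarities near 1.0.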
396396

397-
**Single Vector Search**: Results ordered by cosine similarity (default vector similarity distance function)
397+
**Single Vector Search**: Results ordered by cosine similarity (default vector similarity distance function).
398398

399399
```json
400400
{
@@ -418,7 +418,7 @@ Compare the responses between Single Vector Search and Simple Hybrid Search for
418418

419419
### Hybrid search with filter
420420

421-
This example adds a filter, which is applied to the non-vector content of the search index.
421+
This example adds a filter, which is applied to the nonvector content of the search index.
422422

423423
```http
424424
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version={{api-version}}
@@ -479,7 +479,7 @@ api-key: {{admin-api-key}}
479479

480480
### Semantic hybrid search with filter
481481

482-
Here's the last query in the collection. It's the same hybrid query as above, with a filter.
482+
Here's the last query in the collection. It's the same hybrid query as the previous example, but with a filter.
483483

484484
```http
485485
POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version={{api-version}}

articles/search/vector-search-how-to-chunk-documents.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ This article describes several approaches for chunking large documents so that y
1919

2020
## Why is chunking important?
2121

22-
The models used to generate embedding vectors have maximum limits on the text fragments provided as input. For example, the maximum length of input text for the [Azure OpenAI](/azure/cognitive-services/openai/how-to/embeddings) embedding models is 8,191 tokens (equivalent to around 6000 words of text). If you're using these models to generate embeddings, it's critical that the input text stays under the limit. Partitioning your content into chunks ensures that your data can be processed by the Large Language Models (LLM) used for indexing and queries.
22+
The models used to generate embedding vectors have maximum limits on the text fragments provided as input. For example, the maximum length of input text for the [Azure OpenAI](/azure/cognitive-services/openai/how-to/embeddings) embedding models is 8,191 tokens. Given that each token is around four characters of text for common OpenAI models, this maximum limit is equivalent to around 6,000 words of text. If you're using these models to generate embeddings, it's critical that the input text stays under the limit. Partitioning your content into chunks ensures that your data can be processed by the large language models (LLMs) used for indexing and queries.
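To make the arithmetic concrete, here's a rough sketch of character-budget chunking using the ~4-characters-per-token heuristic. A production pipeline would use a real tokenizer such as tiktoken and add overlap between chunks; the function and parameter names are illustrative:

```python
def chunk_text(text, max_tokens=8191, chars_per_token=4):
    """Split text into word-aligned chunks under an approximate token budget."""
    max_chars = max_tokens * chars_per_token
    chunks, current, length = [], [], 0
    for word in text.split():
        # Flush the current chunk if adding this word would exceed the budget
        if length + len(word) + 1 > max_chars and current:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1  # +1 for the joining space
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk stays under `max_tokens * chars_per_token` characters, so its estimated token count stays under the model limit.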
2323

2424
## How chunking fits into the workflow
2525

articles/search/vector-search-how-to-create-index.md

Lines changed: 1 addition & 1 deletion
@@ -230,4 +230,4 @@ api-key: {{admin-api-key}}
230230

231231
As a next step, we recommend [Query vector data in a search index](vector-search-how-to-query.md).
232232

233-
You might also consider reviewing the demo code for [Python](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-python), [JavaScript](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-javascript), or [C#](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-dotnet).
233+
You might also consider reviewing the demo code for [Python](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-python) or [C#](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-dotnet).

articles/search/vector-search-how-to-generate-embeddings.md

Lines changed: 1 addition & 7 deletions
@@ -63,14 +63,8 @@ print(embeddings)
6363

6464
## Tips and recommendations for embedding model integration
6565

66-
<!-- + Python and JavaScript demos offer more scalability than the REST APIs for generating embeddings. As of this writing, the REST API doesn't currently support batching. -->
67-
<!--
68-
+ We've done proof-of-concept testing with indexers and skillsets, where a custom skill calls a machine learning model to generate embeddings. There's currently no tutorial or walkthrough, but we intend to provide this content as part of the public preview launch, if not sooner. -->
69-
<!--
70-
+ We've done proof-of-concept testing of embeddings for a thousand images using [image retrieval vectorization in Cognitive Services](/azure/cognitive-services/computer-vision/how-to/image-retrieval). We hope to provide a demo of this soon. -->
71-
7266
+ **Identify use cases:** Evaluate the specific use cases where embedding model integration for vector search features can add value to your search solution. This can include matching image content with text content, cross-lingual searches, or finding similar documents.
73-
+ **Optimize cost and performance**: Vector search can be resource-intensive, so consider only vectorizing the fields that contain semantic meaning
67+
+ **Optimize cost and performance**: Vector search can be resource-intensive and is subject to maximum limits, so consider only vectorizing the fields that contain semantic meaning.
7468
+ **Choose the right embedding model:** Select an appropriate model for your specific use case, such as word embeddings for text-based searches or image embeddings for visual searches. Consider using pre-trained models like **text-embedding-ada-002** from OpenAI or the **Image Retrieval** REST API from [Azure AI Computer Vision](/azure/cognitive-services/computer-vision/how-to/image-retrieval).
7569
+ **Normalize vector lengths**: Ensure that the vector lengths are normalized before storing them in the search index to improve the accuracy and performance of similarity search. Most pre-trained models are already normalized, but not all.
7670
+ **Fine-tune the model**: If needed, fine-tune the selected model on your domain-specific data to improve its performance and relevance to your search application.
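The normalization tip above can be sketched in a few lines. This is a dependency-free illustration; in practice you'd typically use numpy's `linalg.norm`:

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    # Leave an all-zero vector unchanged to avoid division by zero
    return [x / norm for x in vector] if norm else list(vector)

v = l2_normalize([3.0, 4.0])  # unit-length: [0.6, 0.8]
```

With unit-length vectors, cosine similarity reduces to a plain dot product, which is cheaper to compute at query time.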
