Skip to content

Commit bc28a6c

Browse files
Merge pull request #5427 from MicrosoftDocs/main
Merged by Learn.Build PR Management system
2 parents 0565698 + fc4a458 commit bc28a6c

10 files changed

+173
-164
lines changed

articles/search/index-add-custom-analyzers.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ ms.service: azure-ai-search
99
ms.custom:
1010
- ignite-2023
1111
ms.topic: how-to
12-
ms.date: 01/16/2025
12+
ms.date: 06/16/2025
1313
---
1414

1515
# Add custom analyzers to string fields in an Azure AI Search index
@@ -20,7 +20,9 @@ In a custom analyzer, character filters prepare the input text before it's proce
2020

2121
## Why use a custom analyzer?
2222

23-
A custom analyzer gives you control over the process of converting plain text into indexable and searchable tokens by allowing you to choose which types of analysis or filtering to invoke, and the order in which they occur.
23+
You should consider a custom analyzer in classic search workflows that don't include large language models and their ability to handle content anomalies.
24+
25+
In class search, a custom analyzer gives you control over the process of converting plain text into indexable and searchable tokens by allowing you to choose which types of analysis or filtering to invoke, and the order in which they occur.
2426

2527
Create and assign a custom analyzer if none of default (Standard Lucene), built-in, or language analyzers are sufficient for your needs. You might also create a custom analyzer if you want to use a built-in analyzer with custom options. For example, if you wanted to change the `maxTokenLength` on Standard Lucene, you would create a custom analyzer, with a user-defined name, to set that option.
2628

articles/search/index-add-language-analyzers.md

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,15 @@
11
---
22
title: Add language analyzers to string fields
33
titleSuffix: Azure AI Search
4-
description: Configure multi-lingual lexical analysis for non-English queries and indexes in Azure AI Search.
4+
description: Configure multi-lingual lexical analysis for non-English text queries and indexes in Azure AI Search.
55
author: HeidiSteen
66
manager: nitinme
77
ms.author: heidist
88
ms.service: azure-ai-search
99
ms.custom:
1010
- ignite-2023
1111
ms.topic: how-to
12-
ms.date: 01/16/2025
12+
ms.date: 06/16/2025
1313
---
1414

1515
# Add language analyzers to string fields in an Azure AI Search index
@@ -18,9 +18,11 @@ A *language analyzer* is a specific type of [text analyzer](search-analyzers.md)
1818

1919
## When to use a language analyzer
2020

21-
You should consider a language analyzer when awareness of word or sentence structure adds value to text parsing. A common example is the association of irregular verb forms ("bring" and "brought) or plural nouns ("mice" and "mouse"). Without linguistic awareness, these strings are parsed on physical characteristics alone, which fails to catch the connection. Since large chunks of text are more likely to have this content, fields consisting of descriptions, reviews, or summaries are good candidates for a language analyzer.
21+
You should consider a language analyzer in classic search workflows that don't include large language models and their awareness of linguistic rules and multilingual content.
2222

23-
You should also consider language analyzers when content consists of non-Western language strings. While the [default analyzer (Standard Lucene)](search-analyzers.md#default-analyzer) is language-agnostic, the concept of using spaces and special characters (hyphens and slashes) to separate strings is more applicable to Western languages than non-Western ones.
23+
In class search, you might add a language analyzer when awareness of word or sentence structure adds value to text parsing. A common example is the association of irregular verb forms ("bring" and "brought) or plural nouns ("mice" and "mouse"). Without linguistic awareness, these strings are parsed on physical characteristics alone, which fails to catch the connection. Since large chunks of text are more likely to have this content, fields consisting of descriptions, reviews, or summaries are good candidates for a language analyzer.
24+
25+
You might also consider language analyzers when content consists of non-Western language strings. While the [default analyzer (Standard Lucene)](search-analyzers.md#default-analyzer) is language-agnostic, the concept of using spaces and special characters (hyphens and slashes) to separate strings is more applicable to Western languages than non-Western ones.
2426

2527
For example, in Chinese, Japanese, Korean (CJK), and other Asian languages, a space isn't necessarily a word delimiter. Consider the following Japanese string. Because it has no spaces, a language-agnostic analyzer would likely analyze the entire string as one token, when in fact the string is actually a phrase.
2628

@@ -37,7 +39,7 @@ A better experience is to search for individual words: 明るい (Bright), 私
3739

3840
Azure AI Search supports 35 language analyzers backed by Lucene, and 50 language analyzers backed by proprietary Microsoft natural language processing technology used in Office and Bing.
3941

40-
Some developers might prefer the more familiar, simple, open-source solution of Lucene. Lucene language analyzers are faster, but the Microsoft analyzers have advanced capabilities, such as lemmatization, word decompounding (in languages like German, Danish, Dutch, Swedish, Norwegian, Estonian, Finnish, Hungarian, Slovak) and entity recognition (URLs, emails, dates, numbers). If possible, you should run comparisons of both the Microsoft and Lucene analyzers to decide which one is a better fit. You can use [Analyze API](/rest/api/searchservice/indexes/analyze) to see the tokens generated from a given text using a specific analyzer.
42+
Some developers might prefer the open-source solution of Lucene. Lucene language analyzers are faster, but the Microsoft analyzers have advanced capabilities, such as lemmatization, word decompounding (in languages like German, Danish, Dutch, Swedish, Norwegian, Estonian, Finnish, Hungarian, Slovak) and entity recognition (URLs, emails, dates, numbers). If possible, you should run comparisons of both the Microsoft and Lucene analyzers to decide which one is a better fit. You can use [Analyze API](/rest/api/searchservice/indexes/analyze) to see the tokens generated from a given text using a specific analyzer.
4143

4244
Indexing with Microsoft analyzers is on average two to three times slower than their Lucene equivalents, depending on the language. Search performance shouldn't be significantly affected for average size queries.
4345

articles/search/search-agentic-retrieval-how-to-create.md

Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ A knowledge agent specifies:
2323
+ A target search index used at query time
2424
+ Parameters on the index for setting default behaviors and response shaping
2525

26-
After you can create a knowledge agent, you can update its properties at any time. If the knowledge agent is in use, updates take effect on the next job.
26+
After you create a knowledge agent, you can update its properties at any time. If the knowledge agent is in use, updates take effect on the next job.
2727

2828
## Prerequisites
2929

@@ -37,7 +37,7 @@ After you can create a knowledge agent, you can update its properties at any tim
3737

3838
+ A search index containing plain text or vectors. The index must [meet the requirements for agentic retrieval](search-agentic-retrieval-how-to-index.md), including a [semantic configuration](semantic-how-to-configure.md) with the `defaultConfiguration` specified.
3939

40-
+ API requirements. To create or use a knowledge agent, use [2025-05-01-preview](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-05-01-preview&preserve-view=true) data plane REST API. Or, use a prerelease package of an Azure SDK that provides knowledge agent APIs: [Azure SDK for Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/CHANGELOG.md), [Azure SDK for .NET](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/search/Azure.Search.Documents/CHANGELOG.md#1170-beta3-2025-03-25), [Azure SDK for Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/search/azure-search-documents/CHANGELOG.md).
40+
+ API requirements. To create or use a knowledge agent, use the [2025-05-01-preview](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-05-01-preview&preserve-view=true) data plane REST API. Or, use a prerelease package of an Azure SDK that provides knowledge agent APIs: [Azure SDK for Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/CHANGELOG.md), [Azure SDK for .NET](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/search/Azure.Search.Documents/CHANGELOG.md#1170-beta3-2025-03-25), [Azure SDK for Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/search/azure-search-documents/CHANGELOG.md).
4141

4242
To follow the steps in this guide, we recommend [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for sending preview REST API calls to Azure AI Search. There's no portal support at this time.
4343

@@ -75,7 +75,7 @@ In Azure, you must have **Owner** or **User Access Administrator** permissions o
7575

7676
1. [Configure Azure AI Search to use a managed identity](search-howto-managed-identities-data-sources.md).
7777

78-
1. On your model provider, such Foundry Model, create a role assignment that gives the search service managed identity **Cognitive Services User** permissions. If you're testing locally, assign yourself to the same role.
78+
1. On your model provider, such as Foundry Model, create a role assignment that gives the search service managed identity **Cognitive Services User** permissions. If you're testing locally, assign yourself to the same role.
7979

8080
1. For local testing, follow the steps in [Quickstart: Connect without keys](search-get-started-rbac.md) to get a personal access token and to ensure you're logged in to a specific subscription and tenant. Paste your personal identity token into the `@accessToken` variable. A request that connects using your personal identity should look similar to the following example:
8181

@@ -111,7 +111,7 @@ You can use API keys if you don't have permission to create role assignments.
111111

112112
## Check for existing knowledge agents
113113

114-
The following request lists knowledge agents by name on your search service. Within the knowledge agents collection, all knowledge agents are uniquely named. It's helpful for knowing about existing knowledge agents for reuse or naming purposes.
114+
The following request lists knowledge agents by name. Within the knowledge agents collection, all knowledge agents must be uniquely named. It's helpful to know about existing knowledge agents for reuse or for naming new agents.
115115

116116
<!-- ### [**REST APIs**](#tab/rest-get) -->
117117

@@ -143,7 +143,6 @@ To create an agent, use the 2025-05-01-preview data plane REST API or an Azure S
143143

144144
```http
145145
@search-url=<YOUR SEARCH SERVICE URL>
146-
@search-api-key=<YOUR SEARCH SERVICE ADMIN API KEY>
147146
@agent-name=<YOUR AGENT NAME>
148147
@index-name=<YOUR INDEX NAME>
149148
@model-provider-url=<YOUR AZURE OPENAI RESOURCE URI>
@@ -222,7 +221,7 @@ Call the **retrieve** action on the knowledge agent object to confirm the model
222221
Replace "What are my vision benefits?" with a query string that's valid for your search index.
223222

224223
```http
225-
# Send Grounding Request
224+
# Send grounding request
226225
POST https://{{search-url}}/agents/{{agent-name}}/retrieve?api-version=2025-05-01-preview
227226
Content-Type: application/json
228227
Authorization: Bearer {{accessToken}}
@@ -261,7 +260,7 @@ For more information about the **retrieve** API and the shape of the response, s
261260
If you no longer need the agent, or if you need to rebuild it on the search service, use this request to delete the current object.
262261

263262
```http
264-
# Delete Agent
263+
# Delete agent
265264
DELETE https://{{search-url}}/agents/{{agent-name}}?api-version=2025-05-01-preview
266265
Authorization: Bearer {{accessToken}}
267266
```

articles/search/search-agentic-retrieval-how-to-index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ ms.date: 05/05/2025
1515

1616
[!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
1717

18-
In Azure AI Search, *agentic retrieval* is a new parallel query architecture that uses a conversational language model for query planning, generating subqueries that broaden the scope of what's searchable and relevant.
18+
In Azure AI Search, *agentic retrieval* is a new parallel query architecture that uses a chat completion model for query planning, generating subqueries that broaden the scope of what's searchable and relevant.
1919

2020
Queries are created internally. Certain aspects of those generated queries are determined by your search index. This article explains which index elements affect agentic retrieval. None of the required elements are new or specific to agentic retrieval, which means you can use an existing index if it meets the criteria identified in this article, even if it was created using earlier API versions.
2121

articles/search/search-agentic-retrieval-how-to-pipeline.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ This exercise differs from the [Agentic Retrieval Quickstart](search-get-started
2626

2727
The following resources are required for this design pattern:
2828

29-
+ Azure AI Search, basic tier or higher, in a [region that provides semantic ranking](search-region-support.md).
29+
+ Azure AI Search, Basic pricing tier or higher, in a [region that provides semantic ranking](search-region-support.md).
3030

3131
+ A search index that satisfies the [index criteria for agentic retrieval](search-agentic-retrieval-how-to-index.md).
3232

articles/search/search-agentic-retrieval-how-to-retrieve.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -32,27 +32,27 @@ The retrieve request can include instructions for query processing that override
3232

3333
+ A [knowledge agent](search-agentic-retrieval-how-to-create.md) that represents the chat completion model and a valid target index.
3434

35-
+ Azure AI Search, in any [region that provides semantic ranker](search-region-support.md), on basic tier and higher. Your search service must have a [managed identity](search-howto-managed-identities-data-sources.md) for role-based access to a chat completion model.
35+
+ Azure AI Search, in any [region that provides semantic ranker](search-region-support.md), on Basic pricing tier and higher. Your search service must have a [managed identity](search-howto-managed-identities-data-sources.md) for role-based access to a chat completion model.
3636

3737
+ Permissions on Azure AI Search. **Search Index Data Reader** can run queries on Azure AI Search, but the search service managed identity must have **Cognitive Services User** permissions on the Azure OpenAI resource. For more information about local testing and obtaining access tokens, see [Quickstart: Connect without keys](search-get-started-rbac.md).
3838

39-
+ API requirements. To create or use a knowledge agent, use [2025-05-01-preview](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-05-01-preview&preserve-view=true) data plane REST API. Or, use a prerelease package of an Azure SDK that provides knowledge agent APIs: [Azure SDK for Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/CHANGELOG.md), [Azure SDK for .NET](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/search/Azure.Search.Documents/CHANGELOG.md#1170-beta3-2025-03-25), [Azure SDK for Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/search/azure-search-documents/CHANGELOG.md).
39+
+ API requirements. To create or use a knowledge agent, use the [2025-05-01-preview](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-05-01-preview&preserve-view=true) data plane REST API. Or, use a prerelease package of an Azure SDK that provides knowledge agent APIs: [Azure SDK for Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/CHANGELOG.md), [Azure SDK for .NET](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/search/Azure.Search.Documents/CHANGELOG.md#1170-beta3-2025-03-25), [Azure SDK for Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/search/azure-search-documents/CHANGELOG.md).
4040

4141
To follow the steps in this guide, we recommend [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for sending REST API calls to Azure AI Search. There's no portal support at this time.
4242

4343
## Call the retrieve action
4444

4545
Call the **retrieve** action on the knowledge agent object to invoke retrieval and return a response. Use the [2025-05-01-preview](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-05-01-preview&preserve-view=true) data plane REST API or an Azure SDK prerelease package that provides equivalent functionality for this task.
4646

47-
All `searchable` fields in the search index are in-scope for query execution. If the index includes vector fields, your index should have a valid vectorizer definition so that it can vectorize the query inputs. Otherwise, vector fields are ignored. The implied query type is `semantic`, and there's no search mode or selection of search fields.
47+
All `searchable` fields in the search index are in-scope for query execution. If the index includes vector fields, your index should have a valid [vectorizer definition](vector-search-how-to-configure-vectorizer.md) so that it can vectorize the query inputs. Otherwise, vector fields are ignored. The implied query type is `semantic`, and there's no search mode or selection of search fields.
4848

4949
The input for the retrieval route is chat conversation history in natural language, where the `messages` array contains the conversation.
5050

5151
```http
5252
@search-url=<YOUR SEARCH SERVICE URL>
5353
@accessToken=<YOUR PERSONAL ID>
5454
55-
# Send Grounding Request
55+
# Send grounding request
5656
POST https://{{search-url}}/agents/{{agent-name}}/retrieve?api-version=2025-05-01-preview
5757
Content-Type: application/json
5858
Authorization: Bearer {{accessToken}}
@@ -77,7 +77,7 @@ POST https://{{search-url}}/agents/{{agent-name}}/retrieve?api-version=2025-05-0
7777
"targetIndexParams" : [
7878
{
7979
"indexName" : "{{index-name}}",
80-
"filterAddOn" : "page_number eq 105'",
80+
"filterAddOn" : "page_number eq '105'",
8181
"IncludeReferenceSourceData": true,
8282
"rerankerThreshold" : 2.5,
8383
"maxDocsForReranker": 50
@@ -130,9 +130,9 @@ The body of the response is also structured in the chat message style format. Cu
130130

131131
**Key points**:
132132

133-
+ `content` is a JSON array. It's a single string composed of the most relevant documents (or chunks) found in the search index, given the query and chat history inputs. This array is your grounding data that a conversational language model uses to formulate a response to the user's question.
133+
+ `content` is a JSON array. It's a single string composed of the most relevant documents (or chunks) found in the search index, given the query and chat history inputs. This array is your grounding data that a chat completion model uses to formulate a response to the user's question.
134134

135-
+ text is the only valid value for type, and the text consists of the reference ID of the chunk (used for citation purposes), and any fields specified in the semantic configuration of the target index. In this example, you should assume the semantic configuration in the target index has a "title" field, a "terms" field, and a "content" filed.
135+
+ "text" is the only valid value for `type`, and it consists of the reference ID of the chunk (used for citation purposes), and any fields specified in the semantic configuration of the target index. In this example, you should assume the semantic configuration in the target index has a "title" field, a "terms" field, and a "content" field.
136136

137137
> [!NOTE]
138138
> The `maxOutputSize` property on the [knowledge agent](search-agentic-retrieval-how-to-create.md) determines the length of the string. We recommend 5,000 tokens.

articles/search/search-how-to-index-logic-apps-indexers.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,7 @@ Review the following requirements before you start:
5252

5353
+ You must be an **Owner** or **Contributor** in your Azure subscription, with permissions to create resources.
5454

55-
+ Azure AI Search, basic tier or higher if you want to use a search service identity for connections to an Azure data source, otherwise you can use any tier, subject to tier limits.
55+
+ Azure AI Search, Basic pricing tier or higher if you want to use a search service identity for connections to an Azure data source, otherwise you can use any tier, subject to tier limits.
5656

5757
+ Azure OpenAI, with a [supported embedding model](#supported-models) deployment. Vectorization is integrated into the workflow. If you don't need vectors, you can ignore the fields or try another indexing strategy.
5858

0 commit comments

Comments
 (0)