Commit 3764071

Merge pull request #269420 from HeidiSteen/heidist-fix
[azure search] March freshness pass
2 parents 6127c09 + 81d57ce commit 3764071

10 files changed: +113 -146 lines changed

articles/search/cognitive-search-create-custom-skill-example.md

Lines changed: 12 additions & 14 deletions
@@ -6,7 +6,7 @@ author: gmndrg
ms.author: gimondra
ms.service: cognitive-search
ms.topic: conceptual
- ms.date: 12/01/2022
+ ms.date: 03/18/2024
ms.custom:
  - devx-track-csharp
  - ignite-2023
@@ -20,31 +20,33 @@ In this example, learn how to create a web API custom skill. This skill will acc

+ Read the [custom skill interface](cognitive-search-custom-skill-interface.md) article if you aren't familiar with the input/output interface that a custom skill should implement.

- + Create a [Bing Search v7 resource](https://portal.azure.com/#create/Microsoft.BingSearch) through the Azure Portal. A free tier is available and sufficient for this example.
+ + Create a [Bing Search resource](https://portal.azure.com/#create/Microsoft.BingSearch) through the Azure portal. A free tier is available and sufficient for this example.

- + Install [Visual Studio 2019](https://www.visualstudio.com/vs/) or later, including the Azure development workload.
+ + Install [Visual Studio](https://www.visualstudio.com/vs/).

## Create an Azure Function

Although this example uses an Azure Function to host a web API, it isn't required. As long as you meet the [interface requirements for a cognitive skill](cognitive-search-custom-skill-interface.md), the approach you take is immaterial. Azure Functions, however, make it easy to create a custom skill.

- ### Create a function app
+ ### Create a project

1. In Visual Studio, select **New** > **Project** from the File menu.

- 1. In the New Project dialog, select **Azure Functions** as the template and select **Next**. Type a name for your project, and select **Create**. The function app name must be valid as a C# namespace, so don't use underscores, hyphens, or any other non-alphanumeric characters.
+ 1. Choose **Azure Functions** as the template and select **Next**. Type a name for your project, and select **Create**. The function app name must be valid as a C# namespace, so don't use underscores, hyphens, or any other non-alphanumeric characters.

- 1. Select the type to be **HTTP Trigger**
+ 1. Select a framework that has long-term support.

- 1. For Storage Account, you may select **None**, as you won't need any storage for this function.
+ 1. Choose **HTTP Trigger** for the type of function to add to the project.
+
+ 1. Choose **Function** for the authorization level.

1. Select **Create** to create the function project and HTTP triggered function.

- ### Modify the code to call the Bing Entity Search Service
+ ### Add code to call the Bing Entity API

- Visual Studio creates a project and in it a class that contains boilerplate code for the chosen function type. The *FunctionName* attribute on the method sets the name of the function. The *HttpTrigger* attribute specifies that the function is triggered by an HTTP request.
+ Visual Studio creates a project with boilerplate code for the chosen function type. The *FunctionName* attribute on the method sets the name of the function. The *HttpTrigger* attribute specifies that the function is triggered by an HTTP request.

- Now, replace all of the content of the file *Function1.cs* with the following code:
+ Replace the contents of *Function1.cs* with the following code:

```csharp
using System;
@@ -308,10 +310,6 @@ namespace SampleSkills

Make sure to enter your own *key* value in the `key` constant based on the key you got when signing up for the Bing entity search API.

- This sample includes all necessary code in a single file for convenience. You can find a slightly more structured version of that same skill in [the power skills repository](https://github.com/Azure-Samples/azure-search-power-skills/tree/main/Text/BingEntitySearch).
-
- Of course, you may rename the file from `Function1.cs` to `BingEntitySearch.cs`.
-
## Test the function from Visual Studio

Press **F5** to run the program and test function behaviors. In this case, we'll use the function below to look up two entities. Use a REST client to issue a call like the one shown below:

articles/search/cognitive-search-custom-skill-scale.md

Lines changed: 20 additions & 32 deletions
@@ -9,51 +9,48 @@ ms.service: cognitive-search
ms.custom:
  - ignite-2023
ms.topic: conceptual
- ms.date: 12/01/2022
+ ms.date: 03/18/2024
---

# Efficiently scale out a custom skill

Custom skills are web APIs that implement a specific interface. A custom skill can be implemented on any publicly addressable resource. The most common implementations for custom skills are:
- * Azure Functions for custom logic skills
- * Azure Webapps for simple containerized AI skills
- * Azure Kubernetes service for more complex or larger skills.
+
+ + Azure Functions for custom logic skills
+ + Azure Web apps for simple containerized AI skills
+ + Azure Kubernetes service for more complex or larger skills.

## Prerequisites

- + Review the [custom skill interface](cognitive-search-custom-skill-interface.md) for an introduction into the input/output interface that a custom skill should implement.
+ + Review the [custom skill interface](cognitive-search-custom-skill-interface.md) for an introduction to the inputs and outputs that a custom skill should implement.

- + Set up your environment. You could start with [this tutorial end-to-end](../azure-functions/create-first-function-vs-code-python.md) to set up serverless Azure Function using Visual Studio Code and Python extensions.
+ + Set up your environment. You can start with [this end-to-end tutorial](../azure-functions/create-first-function-vs-code-python.md) to set up a serverless Azure Function using Visual Studio Code with the Python extension.

## Skillset configuration

- Configuring a custom skill for maximizing throughput of the indexing process requires an understanding of the skill, indexer configurations and how the skill relates to each document. For example, the number of times a skill is invoked per document and the expected duration per invocation.
-
- ### Skill settings
-
- On the [custom skill](cognitive-search-custom-skill-web-api.md) set the following parameters.
+ The following properties on a [custom skill](cognitive-search-custom-skill-web-api.md) are used for scale.

1. Set `batchSize` of the custom skill to configure the number of records sent to the skill in a single invocation of the skill.

- 2. Set the `degreeOfParallelism` to calibrate the number of concurrent requests the indexer will make to your skill.
+ 1. Set the `degreeOfParallelism` to calibrate the number of concurrent requests the indexer makes to your skill.

- 3. Set `timeout`to a value sufficient for the skill to respond with a valid response.
+ 1. Set `timeout` to a value sufficient for the skill to respond with a valid response.

- 4. In the `indexer` definition, set [`batchSize`](/rest/api/searchservice/create-indexer#indexer-parameters) to the number of documents that should be read from the data source and enriched concurrently.
+ 1. In the `indexer` definition, set [`batchSize`](/rest/api/searchservice/create-indexer#indexer-parameters) to the number of documents that should be read from the data source and enriched concurrently.

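To see where these four settings live in code, here's a minimal sketch using the Azure.Search.Documents SDK for .NET. The service endpoint, key, skill URI, field mappings, resource names, and numeric values are placeholder assumptions for illustration only, not recommendations from the article; the article configures the same properties in the skillset and indexer JSON definitions.

```csharp
// Illustrative sketch only. Assumes the Azure.Search.Documents NuGet package;
// all names, URIs, and numeric values are placeholders showing where the
// batchSize, degreeOfParallelism, timeout, and indexer batchSize settings live.
using System;
using Azure;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

var indexerClient = new SearchIndexerClient(
    new Uri("https://<your-search-service>.search.windows.net"),
    new AzureKeyCredential("<admin-api-key>"));

// Custom skill: batchSize, degreeOfParallelism, and timeout shape how the
// indexer calls your web API.
var customSkill = new WebApiSkill(
    new[] { new InputFieldMappingEntry("text") { Source = "/document/content" } },
    new[] { new OutputFieldMappingEntry("result") { TargetName = "myResult" } },
    "https://<your-function-app>.azurewebsites.net/api/skill")
{
    BatchSize = 50,                       // records sent per skill invocation
    DegreeOfParallelism = 5,              // concurrent requests to the skill
    Timeout = TimeSpan.FromSeconds(230)   // time allowed for a valid response
};

indexerClient.CreateOrUpdateSkillset(
    new SearchIndexerSkillset("demo-skillset", new SearchIndexerSkill[] { customSkill }));

// Indexer: batchSize controls how many source documents are read and enriched together.
var indexer = new SearchIndexer("demo-indexer", "demo-datasource", "demo-index")
{
    SkillsetName = "demo-skillset",
    Parameters = new IndexingParameters { BatchSize = 10 }
};
indexerClient.CreateOrUpdateIndexer(indexer);
```

In REST terms, these correspond to `batchSize`, `degreeOfParallelism`, and `timeout` on the web API skill definition and to `parameters.batchSize` on the indexer.
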
### Considerations

- Setting these variables to optimize the indexers performance requires determining if your skill performs better with many concurrent small requests or fewer large requests. A few questions to consider are:
+ There's no "one size fits all" set of recommendations. You should plan on testing different configurations to reach an optimum result. Strategies are either fewer large requests or many small requests.

- * What is the skill invocation cardinality? Does the skill execute once for each document, for instance a document classification skill, or could the skill execute multiple times per document, a paragraph classification skill?
+ + Skill invocation cardinality: Does the skill execute once for each document (`/document/content`) or multiple times per document (`/document/reviews_text/pages/*`)?

- * On average how many documents are read from the data source to fill out a skill request based on the skill batch size? Ideally, this should be less than the indexer batch size. With batch sizes greater than 1 your skill can receive records from multiple source documents. For example if the indexer batch count is 5 and the skill batch count is 50 and each document generates only five records, the indexer will need to fill a custom skill request across multiple indexer batches.
+ + On average, how many documents are read from the data source to fill out a skill request based on the skill batch size? Ideally, this should be less than the indexer batch size. With batch sizes greater than one, your skill can receive records from multiple source documents. For example, if the indexer batch count is 5, the skill batch count is 50, and each document generates only five records, the indexer will need to fill a custom skill request across multiple indexer batches.

- * The average number of requests an indexer batch can generate should give you an optimal setting for the degrees of parallelism. If your infrastructure hosting the skill cannot support that level of concurrency, consider dialing down the degrees of parallelism. As a best practice, test your configuration with a few documents to validate your choices on the parameters.
+ + The average number of requests an indexer batch can generate should give you an optimal setting for the degrees of parallelism. If your infrastructure hosting the skill can't support that level of concurrency, consider dialing down the degrees of parallelism. As a best practice, test your configuration with a few documents to validate your choices on the parameters.

- * Testing with a smaller sample of documents, evaluate the execution time of your skill to the overall time taken to process the subset of documents. Does your indexer spend more time building a batch or waiting for a response from your skill?
+ + Testing with a smaller sample of documents, compare the execution time of your skill to the overall time taken to process the subset of documents. Does your indexer spend more time building a batch or waiting for a response from your skill?

- * Consider the upstream implications of parallelism. If the input to a custom skill is an output from a prior skill, are all the skills in the skillset scaled out effectively to minimize latency?
+ + Consider the upstream implications of parallelism. If the input to a custom skill is an output from a prior skill, are all the skills in the skillset scaled out effectively to minimize latency?

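For a rough worked example with assumed numbers (not taken from the article): if the indexer `batchSize` is 10, each source document yields on average 2 records for the skill, and the skill `batchSize` is 5, a single indexer batch generates roughly (10 × 2) / 5 = 4 skill requests. That suggests starting with a `degreeOfParallelism` of about 4, then adjusting up or down based on test runs and on what the skill's hosting infrastructure can sustain.
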
## Error handling in the custom skill

@@ -83,25 +80,16 @@ Start by testing your custom skill with a REST API client to validate:

* Returns a valid HTTP status code

- Create a [debug session](cognitive-search-debug-session.md) to add your skill to the skillset and make sure it produces a valid enrichment. While a debug session does not allow you to tune the performance of the skill, it enables you to ensure that the skill is configured with valid values and returns the expected enriched objects.
+ Create a [debug session](cognitive-search-debug-session.md) to add your skill to the skillset and make sure it produces a valid enrichment. While a debug session doesn't allow you to tune the performance of the skill, it enables you to ensure that the skill is configured with valid values and returns the expected enriched objects.

## Best practices

* While skills can accept and return larger payloads, consider limiting the response to 150 MB or less when returning JSON.

* Consider setting the batch size on the indexer and skill to ensure that each data source batch generates a full payload for your skill.

- * For long running tasks, set the timeout to a high enough value to ensure the indexer does not error out when processing documents concurrently.
+ * For long-running tasks, set the timeout to a high enough value to ensure the indexer doesn't error out when processing documents concurrently.

* Optimize the indexer batch size, skill batch size, and skill degrees of parallelism to generate the load pattern your skill expects: fewer large requests or many small requests.

- * Monitor custom skills with detailed logs of failures as you can have scenarios where specific requests consistently fail as a result of the data variability.
-
-
- ## Next steps
- Congratulations! Your custom skill is now scaled right to maximize throughput on the indexer.
-
- + [Power Skills: a repository of custom skills](https://github.com/Azure-Samples/azure-search-power-skills)
- + [Add a custom skill to an AI enrichment pipeline](cognitive-search-custom-skill-interface.md)
- + [Add an Azure Machine Learning skill](./cognitive-search-aml-skill.md)
- + [Use debug sessions to test changes](./cognitive-search-debug-session.md)
+ * Monitor custom skills with detailed logs of failures, because you can have scenarios where specific requests consistently fail as a result of data variability.

articles/search/index-sql-relational-data.md

Lines changed: 8 additions & 7 deletions
@@ -1,21 +1,22 @@
---
title: Model SQL relational data for import and indexing
titleSuffix: Azure AI Search
- description: Learn how to model relational data, de-normalized into a flat result set, for indexing and full text search in Azure AI Search.
+ description: Learn how to model relational data, denormalized into a flat result set, for indexing and full text search in Azure AI Search.
author: HeidiSteen
manager: nitinme
ms.author: heidist
ms.service: cognitive-search
ms.custom:
  - ignite-2023
ms.topic: how-to
- ms.date: 02/22/2023
+ ms.date: 03/18/2024
---
+

# How to model relational SQL data for import and indexing in Azure AI Search

Azure AI Search accepts a flat rowset as input to the [indexing pipeline](search-what-is-an-index.md). If your source data originates from joined tables in a SQL Server relational database, this article explains how to construct the result set, and how to model a parent-child relationship in an Azure AI Search index.

- As an illustration, we refer to a hypothetical hotels database, based on [demo data](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/hotels). Assume the database consists of a Hotels$ table with 50 hotels, and a Rooms$ table with rooms of varying types, rates, and amenities, for a total of 750 rooms. There's a one-to-many relationship between the tables. In our approach, a view provides the query that returns 50 rows, one row per hotel, with associated room detail embedded into each row.
+ As an illustration, we refer to a hypothetical hotels database, based on [demo data](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/hotels). Assume the database consists of a `Hotels$` table with 50 hotels, and a `Rooms$` table with rooms of varying types, rates, and amenities, for a total of 750 rooms. There's a one-to-many relationship between the tables. In our approach, a view provides the query that returns 50 rows, one row per hotel, with associated room detail embedded into each row.

![Tables and view in the Hotels database](media/index-sql-relational-data/hotels-database-tables-view.png "Tables and view in the Hotels database")

4344

4445
The solution is to capture the room detail as nested JSON, and then insert the JSON structure into a field in a view, as shown in the second step.
4546

46-
1. Assume you've two joined tables, Hotels$ and Rooms$, that contain details for 50 hotels and 750 rooms and are joined on the HotelID field. Individually, these tables contain 50 hotels and 750 related rooms.
47+
1. Assume you have two joined tables, `Hotels$` and `Rooms$`, that contain details for 50 hotels and 750 rooms and are joined on the HotelID field. Individually, these tables contain 50 hotels and 750 related rooms.
4748

4849
```sql
4950
CREATE TABLE [dbo].[Hotels$](
@@ -106,7 +107,7 @@ This rowset is now ready for import into Azure AI Search.
106107

107108
## Use a complex collection for the "many" side of a one-to-many relationship
108109

109-
On the Azure AI Search side, create an index schema that models the one-to-many relationship using nested JSON. The result set you created in the previous section generally corresponds to the index schema provided below (we cut some fields for brevity).
110+
On the Azure AI Search side, create an index schema that models the one-to-many relationship using nested JSON. The result set you created in the previous section generally corresponds to the index schema provided next (we cut some fields for brevity).
110111

111112
The following example is similar to the example in [How to model complex data types](search-howto-complex-data-types.md#create-complex-fields). The *Rooms* structure, which has been the focus of this article, is in the fields collection of an index named *hotels*. This example also shows a complex type for *Address*, which differs from *Rooms* in that it's composed of a fixed set of items, as opposed to the multiple, arbitrary number of items allowed in a collection.
112113

@@ -144,11 +145,11 @@ The following example is similar to the example in [How to model complex data ty
144145
}
145146
```
146147

147-
Given the previous result set and the above index schema, you've all the required components for a successful indexing operation. The flattened data set meets indexing requirements yet preserves detail information. In the Azure AI Search index, search results will fall easily into hotel-based entities, while preserving the context of individual rooms and their attributes.
148+
Given the previous result set and the above index schema, you have all the required components for a successful indexing operation. The flattened data set meets indexing requirements yet preserves detail information. In the Azure AI Search index, search results fall easily into hotel-based entities, while preserving the context of individual rooms and their attributes.
148149

149150
## Facet behavior on complex type subfields
150151

151-
Fields that have a parent, such as the fields under Address and Rooms, are called *subfields*. Although you can assign a "facetable" attribute to a subfield, the count of the facet will always be for the main document.
152+
Fields that have a parent, such as the fields under Address and Rooms, are called *subfields*. Although you can assign a "facetable" attribute to a subfield, the count of the facet is always for the main document.
152153

153154
For complex types like Address, where there's just one "Address/City" or "Address/stateProvince" in the document, the facet behavior works as expected. However, in the case of Rooms, where there are multiple subdocuments for each main document, the facet counts can be misleading.
154155
