
Commit 2e78625

Merge pull request #1905 from HeidiSteen/heidist-freshness
Update index content, port material from old REST API to concept docs
2 parents 48e40df + 44d1fc0 commit 2e78625

5 files changed (+137, −29 lines)


articles/search/search-how-to-define-index-projections.md

Lines changed: 1 addition & 2 deletions
@@ -292,8 +292,7 @@ For data sources that provide change tracking and deletion detection, an indexer

If you add new content to your data source, new chunks or child documents are added to the index on the next indexer run.

If you modify existing content in the data source, chunks are updated incrementally in the search index if the data source you're using supports change tracking and deletion detection. For example, if a word or sentence changes in a document, the chunk in the target index that contains that word or sentence is updated on the next indexer run. Other types of updates, such as changing a field type and some attributes, aren't supported for existing fields. For more information about allowed updates, see [Update an index schema](search-howto-reindex.md#update-an-index-schema).

Some data sources like [Azure Storage](search-howto-index-changed-deleted-blobs.md) support change and deletion tracking by default, based on the timestamp. Other data sources such as [OneLake](search-how-to-index-onelake-files.md), [Azure SQL](search-how-to-index-sql-database.md), or [Azure Cosmos DB](search-howto-index-cosmosdb.md) must be configured for change tracking.

articles/search/search-howto-reindex.md

Lines changed: 127 additions & 17 deletions
@@ -11,7 +11,7 @@ ms.service: azure-ai-search
ms.custom:
  - ignite-2024
ms.topic: how-to
ms.date: 12/09/2024
---

# Update or rebuild an index in Azure AI Search
@@ -20,28 +20,126 @@ This article explains how to update an existing index in Azure AI Search with sc

During active development, it's common to drop and rebuild indexes when you're iterating over index design. Most developers work with a small representative sample of their data so that reindexing goes faster.

For schema changes on applications already in production, we recommend creating and testing a new index that runs side by side with the existing index. Use an [index alias](search-how-to-alias.md) to swap in the new index so that you can avoid changing your application code.

## Update content

Incremental indexing and synchronizing an index against changes in source data are fundamental to most search applications. This section explains the workflow for updating field contents in a search index through the REST API, but the Azure SDKs provide equivalent functionality.

The body of the request contains one or more documents to be indexed. Documents are identified by a unique case-sensitive key. Each document is associated with an action: "upload", "delete", "merge", or "mergeOrUpload". Upload requests must include the document data as a set of key/value pairs.

```json
{
  "value": [
    {
      "@search.action": "upload (default) | merge | mergeOrUpload | delete",
      "key_field_name": "unique_key_of_document", (key/value pair for key field from index schema)
      "field_name": field_value (key/value pairs matching index schema)
      ...
    },
    ...
  ]
}
```

+ First, use the APIs for loading documents, such as [Documents - Index (REST)](/rest/api/searchservice/documents) or an equivalent API in the Azure SDKs. For more information about indexing techniques, see [Load documents](search-how-to-load-search-index.md).

+ For a large update, batching (up to 1,000 documents per batch, or about 16 MB per batch, whichever limit comes first) is recommended and significantly improves indexing performance.

+ Set the `@search.action` parameter on the API to determine the effect on existing documents.

| Action | Effect |
|--------|--------|
| delete | Removes the entire document from the index. If you want to remove an individual field, use merge instead, setting the field in question to null. Deleted documents and fields don't immediately free up space in the index. Every few minutes, a background process performs the physical deletion. Whether you use the Azure portal or an API to return index statistics, you can expect a small delay before the deletion is reflected in the Azure portal and through APIs. |
| merge | Updates a document that already exists, and fails a document that can't be found. Merge replaces existing values. For this reason, be sure to check for collection fields that contain multiple values, such as fields of type `Collection(Edm.String)`. For example, if a `tags` field starts with a value of `["budget"]` and you execute a merge with `["economy", "pool"]`, the final value of the `tags` field is `["economy", "pool"]`. It won't be `["budget", "economy", "pool"]`. <br><br>The same behavior applies to complex collections. If the document contains a complex collection field named Rooms with a value of `[{ "Type": "Budget Room", "BaseRate": 75.0 }]`, and you execute a merge with a value of `[{ "Type": "Standard Room" }, { "Type": "Budget Room", "BaseRate": 60.5 }]`, the final value of the Rooms field is `[{ "Type": "Standard Room" }, { "Type": "Budget Room", "BaseRate": 60.5 }]`. It won't append or merge new and existing values. |
| mergeOrUpload | Behaves like merge if the document exists, and upload if the document is new. This is the most common action for incremental updates. |
| upload | Similar to an "upsert" where the document is inserted if it's new, and updated or replaced if it exists. If the document is missing values that the index requires, the document field's value is set to null. |
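The request body and batch limits described above can be sketched client-side. The following is a minimal illustration, not the Azure SDK; the `hotelId` key field and the helper names are hypothetical, chosen only to show how actions are assembled and split to respect the 1,000-documents-per-batch limit:

```python
MAX_BATCH_DOCS = 1000  # service limit: 1,000 documents per indexing request

def make_action(action, key, **fields):
    """Build one indexing action; the key field name 'hotelId' is hypothetical."""
    doc = {"@search.action": action, "hotelId": key}
    doc.update(fields)
    return doc

def batches(actions, size=MAX_BATCH_DOCS):
    """Split a list of actions into request bodies of at most `size` documents."""
    for i in range(0, len(actions), size):
        yield {"value": actions[i:i + size]}

# 2,500 incremental updates become three request bodies: 1,000 + 1,000 + 500
actions = [make_action("mergeOrUpload", str(n), tags=["economy"]) for n in range(2500)]
bodies = list(batches(actions))
print([len(b["value"]) for b in bodies])
```

Each `body` would then be posted to the Documents - Index endpoint; the roughly 16 MB payload cap isn't checked here and would need a size test in real code.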
Queries continue to run during indexing, but if you're updating or removing existing fields, you can expect mixed results and a higher incidence of throttling.

> [!NOTE]
> There are no ordering guarantees for which action in the request body is executed first. Avoid multiple "merge" actions associated with the same document in a single request body. If multiple "merge" actions are required for the same document, perform the merging client-side before updating the document in the search index.
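The client-side merging that the note recommends can be as simple as collapsing repeated merge actions per key before submission. A rough sketch under assumptions (a hypothetical `hotelId` key field, last-writer-wins per top-level field; collection fields still replace wholesale, as the merge action itself does):

```python
def coalesce_merges(actions, key_field="hotelId"):
    """Collapse repeated merge actions for the same key into one action each."""
    merged, order = {}, []
    for act in actions:
        key = act[key_field]
        if key not in merged:
            merged[key] = dict(act)
            order.append(key)
        else:
            merged[key].update(act)  # later fields overwrite earlier ones
    return [merged[k] for k in order]

actions = [
    {"@search.action": "merge", "hotelId": "1", "rating": 4},
    {"@search.action": "merge", "hotelId": "1", "tags": ["pool"]},
]
print(coalesce_merges(actions))
```

The two partial merges for document "1" become a single action carrying both `rating` and `tags`, sidestepping the undefined execution order.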
### Responses

Status code 200 is returned for a successful response, meaning that all items were stored durably and will start to be indexed. Indexing runs in the background and makes new documents available (that is, queryable and searchable) a few seconds after the indexing operation completes. The specific delay depends on the load on the service.

Successful indexing is indicated by the `status` property being set to true for all items, and by the `statusCode` property being set to either 201 (for newly uploaded documents) or 200 (for merged or deleted documents):

```json
{
  "value": [
    {
      "key": "unique_key_of_new_document",
      "status": true,
      "errorMessage": null,
      "statusCode": 201
    },
    {
      "key": "unique_key_of_merged_document",
      "status": true,
      "errorMessage": null,
      "statusCode": 200
    },
    {
      "key": "unique_key_of_deleted_document",
      "status": true,
      "errorMessage": null,
      "statusCode": 200
    }
  ]
}
```
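A hedged sketch of inspecting such a response body in client code (the dict below mirrors the JSON shape shown above; it isn't an SDK type):

```python
# A parsed indexing response, shaped like the JSON examples in this section.
response = {
    "value": [
        {"key": "1", "status": True, "errorMessage": None, "statusCode": 201},
        {"key": "2", "status": False,
         "errorMessage": "Document not found.", "statusCode": 404},
    ]
}

# All-true statuses correspond to the HTTP 200 case; any false status
# corresponds to the HTTP 207 multi-status case described below.
succeeded = [r["key"] for r in response["value"] if r["status"]]
failed = {r["key"]: r["statusCode"] for r in response["value"] if not r["status"]}
print(succeeded, failed)
```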
Status code 207 is returned when at least one item wasn't successfully indexed. Items that weren't indexed have the `status` field set to false. The `errorMessage` and `statusCode` properties indicate the reason for the indexing error:

```json
{
  "value": [
    {
      "key": "unique_key_of_document_1",
      "status": false,
      "errorMessage": "The search service is too busy to process this document. Please try again later.",
      "statusCode": 503
    },
    {
      "key": "unique_key_of_document_2",
      "status": false,
      "errorMessage": "Document not found.",
      "statusCode": 404
    },
    {
      "key": "unique_key_of_document_3",
      "status": false,
      "errorMessage": "Index is temporarily unavailable because it was updated with the 'allowIndexDowntime' flag set to 'true'. Please try again later.",
      "statusCode": 422
    }
  ]
}
```

The following table explains the various per-document status codes that can be returned in the response. Some status codes indicate problems with the request itself, while others indicate temporary error conditions; retry the latter after a delay.

| Status code | Meaning | Retryable | Notes |
|-------------|---------|-----------|-------|
| 200 | Document was successfully modified or deleted. | n/a | Delete operations are idempotent. That is, even if a document key doesn't exist in the index, attempting a delete operation with that key results in a 200 status code. |
| 201 | Document was successfully created. | n/a | |
| 400 | There was an error in the document that prevented it from being indexed. | No | The error message in the response indicates what is wrong with the document. |
| 404 | The document couldn't be merged because the given key doesn't exist in the index. | No | This error doesn't occur for uploads since they create new documents, and it doesn't occur for deletes because they're idempotent. |
| 409 | A version conflict was detected when attempting to index a document. | Yes | This can happen when you're trying to index the same document more than once concurrently. |
| 422 | The index is temporarily unavailable because it was updated with the 'allowIndexDowntime' flag set to 'true'. | Yes | |
| 503 | Your search service is temporarily unavailable, possibly due to heavy load. | Yes | Your code should wait before retrying in this case or you risk prolonging the service unavailability. |
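The retryable column above can be encoded directly in client code. A minimal sketch of a retry decision with exponential backoff; the sets, delays, and helper names are illustrative, not part of any SDK:

```python
RETRYABLE = {409, 422, 503}    # per the table: version conflict, index downtime, busy service
NONRETRYABLE = {400, 404}      # malformed document, missing key: retrying can't help

def should_retry(status_code):
    """Return True only for per-document status codes the table marks retryable."""
    return status_code in RETRYABLE

def backoff_seconds(attempt, base=2.0, cap=60.0):
    """Exponential backoff; waiting matters especially for 503 so that retries
    don't prolong the service unavailability."""
    return min(cap, base * (2 ** attempt))

print([backoff_seconds(a) for a in range(4)])  # [2.0, 4.0, 8.0, 16.0]
```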
If your client code frequently encounters a 207 response, one possible reason is that the system is under load. You can confirm this by checking the `statusCode` property for 503. If so, we recommend throttling indexing requests. Otherwise, if indexing traffic doesn't subside, the system could start rejecting all requests with 503 errors.

Status code 429 indicates that you've exceeded your quota on the number of documents per index. You must either create a new index or upgrade for higher capacity limits.

> [!NOTE]
> When you upload `DateTimeOffset` values with time zone information to your index, Azure AI Search normalizes these values to UTC. For example, 2024-01-13T14:03:00-08:00 is stored as 2024-01-13T22:03:00Z. If you need to store time zone information, add an extra column to your index for this data point.
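The UTC normalization described in the note can be reproduced with Python's standard datetime handling, which makes it easy to predict what the service will store:

```python
from datetime import datetime, timezone

# The example value from the note, as sent to the service
local = datetime.fromisoformat("2024-01-13T14:03:00-08:00")

# Azure AI Search stores the UTC equivalent; the original -08:00 offset is
# lost, so keep a separate field if you need to preserve it.
stored = local.astimezone(timezone.utc)
print(stored.isoformat())  # 2024-01-13T22:03:00+00:00 (i.e., 2024-01-13T22:03:00Z)
```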
### Tips for incremental indexing

+ [Indexers automate incremental indexing](search-indexer-overview.md). If you can use an indexer, and if the data source supports change tracking, you can run the indexer on a recurring schedule to add, update, or overwrite searchable content so that it's synchronized to your external data.
@@ -88,18 +186,22 @@

```
GET {{baseUrl}}/indexes/hotels-vector-quickstart/docs('1')?api-version=2024-07-
api-key: {{apiKey}}
```

## Update an index schema

The index schema defines the physical data structures created on the search service, so there aren't many schema changes that you can make without incurring a full rebuild.

### Updates with no rebuild

The following list enumerates the schema changes that can be introduced seamlessly into an existing index. Generally, the list includes new fields and functionality used during query execution.

+ Add a new field
+ Set the `retrievable` attribute on an existing field
+ Update `searchAnalyzer` on a field having an existing `indexAnalyzer`
+ Add a new [analyzer definition](index-add-custom-analyzers.md) in an index (which can be applied to new fields)
+ Add, update, or delete [scoring profiles](index-add-scoring-profiles.md)
+ Add, update, or delete [synonymMaps](search-synonyms.md)
+ Add, update, or delete [semantic configurations](semantic-how-to-configure.md)
+ Add, update, or delete CORS settings

The order of operations is:
@@ -115,7 +217,7 @@ When you update an index schema to include a new field, existing documents in th

There should be no query disruptions during the updates, but query results will vary as the updates take effect.

### Updates requiring a rebuild

Some modifications require an index drop and rebuild, replacing a current index with a new one.
@@ -144,6 +246,8 @@ The order of operations is:

When you create the index, physical storage is allocated for each field in the index schema, with an inverted index created for each searchable field and a vector index created for each vector field. Fields that aren't searchable can be used in filters or expressions, but don't have inverted indexes and aren't full-text or fuzzy searchable. On an index rebuild, these inverted indexes and vector indexes are deleted and recreated based on the index schema you provide.

To minimize disruption to application code, consider [creating an index alias](search-how-to-alias.md). Application code references the alias, but you can update the name of the index that the alias points to.

## Balancing workloads

Indexing doesn't run in the background, but the search service will balance any indexing jobs against ongoing queries. During indexing, you can [monitor query requests](search-monitor-queries.md) in the Azure portal to ensure queries are completing in a timely manner.
@@ -156,7 +260,13 @@ You can begin querying an index as soon as the first document is loaded. If you

You can use [Search Explorer](search-explorer.md) or a [REST client](search-get-started-rest.md) to check for updated content.

If you added or renamed a field, use [select](search-query-odata-select.md) to return that field:

```json
"search": "*",
"select": "document-id, my-new-field, some-old-field",
"count": true
```

The Azure portal provides index size and vector index size. You can check these values after updating an index, but remember to expect a small delay as the service processes the change and to account for portal refresh rates, which can be a few minutes.
162272

articles/search/search-limits-quotas-capacity.md

Lines changed: 4 additions & 3 deletions
@@ -198,17 +198,18 @@ Static rate request limits for operations related to a service:

+ Service Statistics (GET /servicestats): 4 per second per search unit

### Semantic ranker throttling limits

[Semantic ranker](search-get-started-semantic.md) uses a queuing system to manage concurrent requests. This system allows search services to achieve the highest possible number of queries per second. When the limit of concurrent requests is reached, additional requests are placed in a queue. If the queue is full, further requests are rejected and must be retried.

Total semantic ranker queries per second varies based on the following factors:

+ The tier of the search service. Both queue capacity and concurrent request limits vary by tier.
+ The number of search units in the search service. The simplest way to increase the maximum number of concurrent semantic ranker queries is to [add more search units to your search service](search-capacity-planning.md#how-to-change-capacity).
+ The total available semantic ranker capacity in the region.
+ The amount of time it takes to serve a query using semantic ranker. This varies based on how busy the search service is.

The following table describes the semantic ranker throttling limits by tier, subject to available capacity in the region. You can contact Microsoft support to request a limit increase.

| Resource | Basic | S1 | S2 | S3 | S3-HD | L1 | L2 |
|----------|-------|----|----|----|-------|----|----|
