articles/search/search-how-to-create-search-index.md
Lines changed: 3 additions & 3 deletions
@@ -28,7 +28,7 @@ In this article, learn the steps for defining a schema for the index and pushing
28
28
29
29
+ A stable index location. Moving an existing index to a different search service isn't supported out-of-the-box. Revisit application requirements and make sure that the capacity and location of your existing search service are sufficient for your needs.
30
30
31
-
+ Finally, all service tiers have [index limits](search-limits-quotas-capacity.md#index-limits) on the number of objects that you can create. For example, if you're experimenting on the Free tier, you can only have three indexes at any given time. Within the index itself, there are [limits on vectors](search-limits-quotas-capacity.md#vector-index-size-limits) and [index limits](search-limits-quotas-capacity#index-limits) on the number of simple and complex fields.
31
+
+ Finally, all service tiers have [index limits](search-limits-quotas-capacity.md#index-limits) on the number of objects that you can create. For example, if you're experimenting on the Free tier, you can only have three indexes at any given time. Within the index itself, there are [limits on vectors](search-limits-quotas-capacity.md#vector-index-size-limits) and [index limits](search-limits-quotas-capacity.md#index-limits) on the number of simple and complex fields.
32
32
33
33
## Document keys
34
34
@@ -62,7 +62,7 @@ Use this checklist to assist the design decisions for your search index.
62
62
63
63
+ Filterable fields are returned in arbitrary order, so consider making them sortable as well.
64
64
65
-
1. For vector fields, specify a vector search configuration and the algorithms used for creating navigation paths and filling the embedding space. For more information, see [Add vector fields](vector-search-how-to-create.md).
65
+
1. For vector fields, specify a vector search configuration and the algorithms used for creating navigation paths and filling the embedding space. For more information, see [Add vector fields](vector-search-how-to-create-index.md).
66
66
67
67
Vector fields have extra properties that nonvector fields don't have, such as which algorithms to use and vector compression.
68
68
@@ -229,6 +229,6 @@ To minimize churn in the design process, the following table describes which ele
229
229
Use the following links to become familiar with loading an index with data, or extending an index with a synonyms map.
articles/search/search-how-to-load-search-index.md
Lines changed: 61 additions & 12 deletions
@@ -14,21 +14,25 @@ ms.date: 07/01/2024
14
14
15
15
# Load data into a search index in Azure AI Search
16
16
17
-
This article explains how to import, refresh, and manage content in a predefined search index. In Azure AI Search, a [search index is created first](search-how-to-create-search-index.md) with [data import](search-what-is-data-import.md) following as a second step. The exception is [Import wizards](search-import-data-portal.md) in the portal and indexer pipelines, which create and load an index in one workflow.
17
+
This article explains how to import documents into a predefined search index. In Azure AI Search, a [search index is created first](search-how-to-create-search-index.md) with [data import](search-what-is-data-import.md) following as a second step. The exception is [Import wizards](search-import-data-portal.md) in the portal and indexer pipelines, which create and load an index in one workflow.
18
18
19
-
A search service imports and indexes plain text and vectors in JSON, used in full text search, vector search, hybrid search, and knowledge mining scenarios. Plain text content is obtainable from alphanumeric fields in the external data source, metadata that's useful in search scenarios, or enriched content created by a [skillset](cognitive-search-working-with-skillsets.md) (skills can extract or infer textual descriptions from images and unstructured content). Vector content is vectorized using an [external embedding model](vector-search-how-to-generate-embeddings.md) or [integrated vectorization (preview)](vector-search-integrated-vectorization.md) using Azure AI Search features that integrate with applied AI.
19
+
## How data import works
20
20
21
-
Once data is indexed, the physical data structures of the index are locked in. For guidance on what can and can't be changed, see [Update and rebuild an index](search-howto-reindex.md).
22
-
23
-
Indexing isn't a background process. A search service will balance indexing and query workloads, but if [query latency is too high](search-performance-analysis.md#impact-of-indexing-on-queries), you can either [add capacity](search-capacity-planning.md#adjust-capacity) or identify periods of low query activity for loading an index.
21
+
A search service accepts JSON documents that conform to the index schema. A search service imports and indexes plain text and vectors in JSON, used in full text search, vector search, hybrid search, and knowledge mining scenarios.
24
22
25
-
## Load documents
23
+
+ Plain text content is obtainable from alphanumeric fields in the external data source, metadata that's useful in search scenarios, or enriched content created by a [skillset](cognitive-search-working-with-skillsets.md) (skills can extract or infer textual descriptions from images and unstructured content).
26
24
27
-
A search service accepts JSON documents that conform to the index schema.
25
+
+ Vector content is generated using an [external embedding model](vector-search-how-to-generate-embeddings.md) or [integrated vectorization (preview)](vector-search-integrated-vectorization.md), which uses Azure AI Search features that integrate with applied AI.
28
26
29
27
You can prepare these documents yourself, but if content resides in a [supported data source](search-indexer-overview.md#supported-data-sources), running an [indexer](search-indexer-overview.md) or using an Import wizard can automate document retrieval, JSON serialization, and indexing.
30
28
31
-
### [**Azure portal**](#tab/portal)
29
+
Once data is indexed, the physical data structures of the index are locked in. For guidance on what can and can't be changed, see [Update and rebuild an index](search-howto-reindex.md).
30
+
31
+
Indexing isn't a background process. A search service will balance indexing and query workloads, but if [query latency is too high](search-performance-analysis.md#impact-of-indexing-on-queries), you can either [add capacity](search-capacity-planning.md#adjust-capacity) or identify periods of low query activity for loading an index.
32
+
33
+
For more information, see [Data import strategies](search-what-is-data-import.md).
34
+
35
+
## Load documents using the Azure portal
32
36
33
37
In the Azure portal, use the Import wizards to create and load indexes in a seamless workflow. If you want to load an existing index, choose an alternative approach.
34
38
@@ -40,7 +44,7 @@ In the Azure portal, use the Import wizards to create and load indexes in a seam
40
44
41
45
If indexers are already defined, you can [reset and run an indexer](search-howto-run-reset-indexers.md) from the Azure portal, which is useful if you're adding fields incrementally. Reset forces the indexer to start over, picking up all fields from all source documents.
42
46
43
-
### [**REST**](#tab/import-rest)
47
+
## Load documents using the REST APIs
44
48
45
49
[Documents - Index (REST)](/rest/api/searchservice/documents) is the means by which you can import data into a search index through the REST APIs. The `@search.action` parameter determines whether documents are added in full, or partially in terms of new or replacement values for specific fields.
46
50
@@ -82,11 +86,15 @@ If indexers are already defined, you can [reset and run an indexer](search-howto
82
86
83
87
When the document key or ID is new, **null** becomes the value for any field that is unspecified in the document. For actions on an existing document, updated values replace the previous values. Any fields that weren't specified in a "merge" or "mergeUpload" are left intact in the search index.
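The request body described above can be sketched in Python. This is a minimal illustration, not the service's client code: the helper `build_index_batch` and the field names `HotelId`, `HotelName`, and `City` are hypothetical, and the sketch only builds the JSON payload (a `value` array in which each document carries an `@search.action`) without sending it.

```python
import json

# Hypothetical helper: wraps (action, document) pairs in the body shape
# expected by a Documents - Index request.
def build_index_batch(actions):
    allowed = {"upload", "merge", "mergeOrUpload", "delete"}
    value = []
    for action, document in actions:
        if action not in allowed:
            raise ValueError(f"Unsupported @search.action: {action}")
        # Each entry is the document itself plus its @search.action.
        value.append({"@search.action": action, **document})
    return {"value": value}


# Field names are assumptions; use the key and fields from your own schema.
batch = build_index_batch([
    ("upload", {"HotelId": "1", "HotelName": "Sample Hotel", "City": "Seattle"}),
    ("merge", {"HotelId": "2", "City": "Portland"}),  # only City is replaced
    ("delete", {"HotelId": "3"}),                     # the key alone is enough
])

print(json.dumps(batch, indent=2))
```

A single batch can mix actions, which is why the action is attached to each document rather than to the request as a whole.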
84
88
85
-
### [**.NET SDK (C#)**](#tab/importcsharp)
89
+
## Load documents using the Azure SDKs
86
90
87
-
Azure AI Search supports the following APIs for simple and bulk document uploads into an index:
91
+
The following Azure SDKs provide APIs for loading documents into an index.
88
92
89
-
+ [IndexDocumentsAsync (Azure SDK for .NET)](/dotnet/api/azure.search.documents.searchclient.indexdocumentsasync)
93
+
### [**.NET**](#tab/sdk-dotnet)
94
+
95
+
The Azure SDK for .NET provides the following APIs for simple and bulk document uploads into an index:
There are several samples that illustrate indexing in context of simple and large-scale indexing:
@@ -97,6 +105,47 @@ There are several samples that illustrate indexing in context of simple and larg
97
105
98
106
+ [**Tutorial: Index any data**](tutorial-optimize-indexing-push-api.md) couples batch indexing with testing strategies for determining an optimum size.
99
107
108
+
+ Be sure to check the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) repo for code examples showing how to index vector fields.
109
+
110
+
### [**Python**](#tab/sdk-python)
111
+
112
+
The Azure SDK for Python provides the following APIs for simple and bulk document uploads into an index:
+ Be sure to check the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) repo for code examples showing how to index vector fields.
122
+
123
+
### [**JavaScript**](#tab/sdk-javascript)
124
+
125
+
The Azure SDK for JavaScript/TypeScript provides the following APIs for simple and bulk document uploads into an index:
+ See this quickstart for basic steps: [Quickstart: Full text search using the Azure SDKs](search-get-started-text.md?tabs=javascript)
133
+
134
+
+ Be sure to check the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) repo for code examples showing how to index vector fields.
135
+
136
+
### [**Java**](#tab/sdk-java)
137
+
138
+
The Azure SDK for Java provides the following APIs for simple and bulk document uploads into an index:
+ Be sure to check the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) repo for code examples showing how to index vector fields.
148
+
100
149
---
101
150
102
151
Internally during indexing, each vector field is populated with embeddings in an internal vector index, and each nonvector field's inverted index is populated with all of the unique, tokenized words from each document. Each field is associated with a document key that determines the logical structure of the document. For example, when indexing a hotels data set, an inverted index created for a City field might contain terms for Seattle, Portland, and so forth. Documents that include Seattle or Portland in the City field would have their document ID listed alongside the term. On any [Documents - Index](/rest/api/searchservice/documents) operation, the terms and document ID list are updated accordingly. For more information about inverted indexes, see [Full text search in Azure AI Search](search-lucene-query-architecture.md).
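To make the inverted index idea concrete, here is a toy sketch in Python. It is a simplification under stated assumptions: the real service applies language analyzers and stores far more than a term-to-ID map, and the function `build_inverted_index` and the sample `HotelId` and `City` fields are hypothetical.

```python
from collections import defaultdict

# Toy inverted index: maps each lowercase, whitespace-split term in one
# field to the set of document keys that contain it.
def build_inverted_index(documents, key_field, text_field):
    index = defaultdict(set)
    for doc in documents:
        text = doc.get(text_field) or ""
        for term in text.lower().split():
            index[term].add(doc[key_field])
    return index


hotels = [
    {"HotelId": "1", "City": "Seattle"},
    {"HotelId": "2", "City": "Portland"},
    {"HotelId": "3", "City": "Seattle"},
]

city_index = build_inverted_index(hotels, "HotelId", "City")
print(sorted(city_index["seattle"]))   # ['1', '3']
print(sorted(city_index["portland"]))  # ['2']
```

Adding, updating, or deleting a document means updating the ID list for every term that document contributes, which is the bookkeeping the paragraph above describes.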
articles/search/search-howto-reindex.md
Lines changed: 20 additions & 8 deletions
@@ -14,32 +14,44 @@ ms.date: 07/01/2024
14
14
15
15
# Update or rebuild an index in Azure AI Search
16
16
17
-
This article explains how to update an existing index in Azure AI Search. It explains the circumstances under which rebuilds are required, and provides recommendations for mitigating the effects of rebuilds on ongoing query requests. If you have to rebuild frequently, we recommend using [index aliases](search-how-to-alias.md) to make it easier to swap which index your application is pointing to.
17
+
This article explains how to update an existing index in Azure AI Search with incremental indexing. It explains the circumstances under which rebuilds are required, and provides recommendations for mitigating the effects of rebuilds on ongoing query requests.
18
18
19
19
During active development, it's common to drop and rebuild indexes when you're iterating over index design. Most developers work with a small representative sample of their data so that reindexing goes faster.
20
20
21
21
For applications already in production, we recommend creating a new index that runs side by side with an existing index to avoid query downtime and using an [index alias](search-how-to-alias.md) to avoid changing your application code.
22
22
23
23
## Update content
24
24
25
-
Incremental indexing and synchronizing an index against changes in source data is a basic requirement in search scenarios. This section explains the workflow for overwriting field contents in a search index.
25
+
Incremental indexing and synchronizing an index against changes in source data is a basic requirement for most search applications. This section explains the workflow for overwriting field contents in a search index.
26
26
27
-
1. Use the same techniques for loading documents: [Documents - Index (REST)](/rest/api/searchservice/documents) or an equivalent API in the Azure SDKs. For more information, see [Load documents](search-how-to-load-search-index.md).
27
+
1. Use the same techniques for loading documents: [Documents - Index (REST)](/rest/api/searchservice/documents) or an equivalent API in the Azure SDKs. For more information about indexing, see [Load documents](search-how-to-load-search-index.md).
28
28
29
29
1. Set the `@search.action` parameter to determine the effect on existing documents:
30
30
31
-
+`delete` removes the entire document from the index. If you want to remove an individual field, use `merge` instead, setting the field in question to null. Deleted documents don't immediately free up space in the index. Every few minutes, a background process performs the physical deletion. Whether you use the portal or an API to return index statistics, you can expect a small delay before the deletion is reflected in the portal and through APIs.
32
-
+`merge` updates a document that already exists, and fails a document that can't be found. Merge replaces existing values. For this reason, be sure to check for collection fields that contain multiple values, such as fields of type `Collection(Edm.String)`. For example, if a `tags` field starts with a value of `["budget"]` and you execute a merge with `["economy", "pool"]`, the final value of the `tags` field is `["economy", "pool"]`. It won't be `["budget", "economy", "pool"]`.
33
-
+`mergeOrUpload` behaves like `merge` if the document exists, and `upload` if the document is new.
34
-
+`upload`, similar to an "upsert" where the document is inserted if it's new, and updated or replaced if it exists. If the document is missing values that the index requires, the document field's value is set to null.
31
+
| Action | Effect |
32
+
|--------|--------|
33
+
|`delete`| Removes the entire document from the index. If you want to remove an individual field, use `merge` instead, setting the field in question to null. Deleted documents and fields don't immediately free up space in the index. Every few minutes, a background process performs the physical deletion. Whether you use the portal or an API to return index statistics, you can expect a small delay before the deletion is reflected in the portal and through APIs. |
34
+
|`merge`| Updates a document that already exists, and fails if the document can't be found. Merge replaces existing values. For this reason, be sure to check for collection fields that contain multiple values, such as fields of type `Collection(Edm.String)`. For example, if a `tags` field starts with a value of `["budget"]` and you execute a merge with `["economy", "pool"]`, the final value of the `tags` field is `["economy", "pool"]`. It won't be `["budget", "economy", "pool"]`. |
35
+
|`mergeOrUpload`| Behaves like `merge` if the document exists, and `upload` if the document is new. This is the most common action for incremental updates. |
36
+
|`upload`| Similar to an "upsert" where the document is inserted if it's new, and updated or replaced if it exists. If the document is missing values that the index requires, the document field's value is set to null. |
35
37
36
38
1. Post the update.
37
39
38
40
Queries continue to run, but if you're updating or removing existing fields, you can expect mixed results and a higher incidence of throttling.
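The effect of each `@search.action` value can be simulated with an in-memory dictionary. The Python sketch below is only a model of the documented behavior, not how the service is implemented; `apply_action`, the schema list, and the hotel fields are hypothetical, and ignoring a delete for a missing key is an assumption.

```python
def apply_action(index, schema_fields, key_field, action, document):
    """Apply one @search.action to an in-memory stand-in for a search index."""
    key = document[key_field]
    if action == "delete":
        # Removes the whole document; a missing key is ignored (assumption).
        index.pop(key, None)
    elif action == "upload":
        # Insert or fully replace; fields absent from the payload become None.
        index[key] = {name: document.get(name) for name in schema_fields}
    elif action == "merge":
        if key not in index:
            raise KeyError(f"merge failed: document {key!r} not found")
        # Supplied values replace existing ones, including whole collections.
        index[key].update(document)
    elif action == "mergeOrUpload":
        fallback = "merge" if key in index else "upload"
        apply_action(index, schema_fields, key_field, fallback, document)
    else:
        raise ValueError(f"Unsupported action: {action}")


schema_fields = ["HotelId", "HotelName", "tags"]
index = {}
apply_action(index, schema_fields, "HotelId", "upload",
             {"HotelId": "1", "tags": ["budget"]})
apply_action(index, schema_fields, "HotelId", "merge",
             {"HotelId": "1", "tags": ["economy", "pool"]})
print(index["1"]["tags"])  # ['economy', 'pool']
```

The merge replaces the `tags` collection rather than appending to it, matching the `["budget"]` example in the table, and `HotelName` stays null because the original upload never supplied it.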
39
41
42
+
## Tips for incremental indexing
43
+
44
+
+ Use `mergeOrUpload` as the search action.
45
+
46
+
+ The payload must include the keys or identifiers of every document you want to add, update, or delete.
47
+
48
+
+ For merging, avoid listing fields that contain content you want to preserve. For example, if you populated vector fields, but only need to update a few nonvector fields, the payload should list just those fields you want to update. Specifying an empty field overwrites the existing value with a null value.
49
+
50
+
+ [Indexers](search-indexer-overview.md) are designed for incremental indexing. If you can use an indexer, and if the data source supports change tracking, you can run the indexer on a recurring schedule to add, update, and delete searchable content so that the index stays synchronized with your external data.
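The tips above can be sketched as a small Python helper that compares the stored and incoming versions of a document and emits only the key plus the fields that changed. The function `build_partial_update` and the fields `HotelId`, `Description`, `Rating`, and `DescriptionVector` are hypothetical names used for illustration.

```python
def build_partial_update(key_field, old_doc, new_doc):
    """Return a mergeOrUpload entry with only the changed fields, or None."""
    changed = {
        name: value
        for name, value in new_doc.items()
        if name != key_field and old_doc.get(name) != value
    }
    if not changed:
        return None  # nothing to send for this document
    # The key is always required; untouched fields are deliberately omitted.
    return {"@search.action": "mergeOrUpload", key_field: new_doc[key_field], **changed}


stored = {
    "HotelId": "1",
    "Description": "Quiet rooms near the waterfront.",
    "Rating": 4.2,
    "DescriptionVector": [0.12, -0.03, 0.88],  # existing embedding to preserve
}
incoming = {
    "HotelId": "1",
    "Description": "Quiet rooms near the waterfront.",
    "Rating": 4.5,
}

update = build_partial_update("HotelId", stored, incoming)
print(update)
```

Because `DescriptionVector` never appears in the payload, a merge leaves the existing embedding intact instead of overwriting it with null.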
51
+
40
52
## Change an index schema
41
53
42
-
The index schema defines the physical data structures created on the search service, so there aren't many schema changes that you can make without incurring a full rebuild. The following list enumerates the schema changes that can be introduced seamlessly into an existing index. The list includes new fields and functionality used during query executions.
54
+
The index schema defines the physical data structures created on the search service, so there aren't many schema changes that you can make without incurring a full rebuild. The following list enumerates the schema changes that can be introduced seamlessly into an existing index. Generally, the list includes new fields and functionality used during query executions.
43
55
44
56
+ Add a new field
45
57
+ Set the **retrievable** attribute on an existing field