Commit 388efdf

Author: Jill Grant
Merge pull request #279942 from HeidiSteen/heidist-june28
[azure search] Refactor update index scenarios and covered delete behavior
2 parents a694951 + b374b7a commit 388efdf

6 files changed: +242 -124 lines changed

articles/search/TOC.yml

Lines changed: 6 additions & 6 deletions
@@ -218,16 +218,16 @@
218218
href: search-howto-move-across-regions.md
219219
- name: Index management
220220
items:
221-
- name: Create a search index
221+
- name: Create an index
222222
href: search-how-to-create-search-index.md
223-
- name: Create an index alias
224-
href: search-how-to-alias.md
225223
- name: Load an index
226224
href: search-how-to-load-search-index.md
227-
- name: Index large data sets
228-
href: search-howto-large-index.md
229-
- name: Drop and rebuild an index
225+
- name: Update or rebuild an index
230226
href: search-howto-reindex.md
227+
- name: Alias an index
228+
href: search-how-to-alias.md
229+
- name: Import large data sets
230+
href: search-howto-large-index.md
231231
- name: Indexers
232232
items:
233233
- name: Create an indexer

articles/search/resource-tools.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ manager: nitinme
77
ms.author: heidist
88
ms.service: cognitive-search
99
ms.topic: conceptual
10-
ms.date: 05/22/2024
10+
ms.date: 07/02/2024
1111
---
1212

1313
# Productivity tools - Azure AI Search
@@ -16,7 +16,7 @@ Productivity tools are built by engineers at Microsoft, but aren't part of the A
1616

1717
| Tool name | Description | Source code |
1818
|-----------|------------ |-------------|
19-
| [Back up and Restore](https://github.com/liamca/azure-search-backup-restore/blob/master/README.md) | Download the retrievable fields of an index to your local device and then upload the index and its content to a new search service. | [https://github.com/liamca/azure-search-backup-restore](https://github.com/liamca/azure-search-backup-restore) |
19+
| [Back up and Restore](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python/code/index-backup-restore) | Download the retrievable fields of an index to your local device and then upload the index and its content to a new search service. | [https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python/code/index-backup-restore](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python/code/index-backup-restore) |
2020
| [Chat with your data solution accelerator](https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/blob/main/README.md) | Code and docs to create interactive search solution in production environments. | [https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator](https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator) |
2121
| [Knowledge Mining Accelerator](https://github.com/Azure-Samples/azure-search-knowledge-mining/blob/main/README.md) | Code and docs to jump start a knowledge store using your data. | [https://github.com/Azure-Samples/azure-search-knowledge-mining](https://github.com/Azure-Samples/azure-search-knowledge-mining) |
2222
| [Performance testing solution](https://github.com/Azure-Samples/azure-search-performance-testing/blob/main/README.md) | This solution helps you load test Azure AI Search. It uses Apache JMeter as an open source load and performance testing tool and Terraform to dynamically provision and destroy the required infrastructure on Azure. | [https://github.com/Azure-Samples/azure-search-performance-testing](https://github.com/Azure-Samples/azure-search-performance-testing) |

articles/search/search-how-to-create-search-index.md

Lines changed: 39 additions & 22 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: Create a search index
2+
title: Create an index
33
titleSuffix: Azure AI Search
44
description: Create a search index using the Azure portal, REST APIs, or an Azure SDK.
55

@@ -8,33 +8,33 @@ author: HeidiSteen
88
ms.author: heidist
99

1010
ms.service: cognitive-search
11-
ms.custom:
12-
- ignite-2023
1311
ms.topic: how-to
14-
ms.date: 09/25/2023
12+
ms.date: 07/01/2024
1513
---
1614

1715
# Create an index in Azure AI Search
1816

19-
In Azure AI Search, query requests target the searchable text in a [**search index**](search-what-is-an-index.md).
20-
21-
In this article, learn the steps for defining and publishing a search index. Creating an index establishes the physical data structures on your search service. Once the index definition exists, [**loading the index**](search-what-is-data-import.md) follows as a separate task.
17+
In this article, learn the steps for defining a schema for a [**search index**](search-what-is-an-index.md) and pushing it to a search service. Creating an index establishes the physical data structures on your search service. Once the index exists, [**load the index**](search-what-is-data-import.md) as a separate task.
2218

2319
## Prerequisites
2420

25-
+ Write permissions. Permission can be granted through an [admin API key](search-security-api-keys.md) on the request. Alternatively, if you're using [role-based access control](search-security-rbac.md), send a request as a member of the Search Contributor role.
21+
+ Write permissions as a [**Search Service Contributor**](search-security-rbac.md) or an [admin API key](search-security-api-keys.md) for key-based authentication.
2622

27-
+ An understanding of the data you want to index. Creating an index is a schema definition exercise, so you should have a clear idea of which source fields you want to make searchable, retrievable, filterable, facetable, and sortable (see the [schema checklist](#schema-checklist) for guidance).
23+
+ An understanding of the data you want to index. A search index is based on external content that you want to make searchable. Searchable content is stored as fields in an index. You should have a clear idea of which source fields you want to make searchable, retrievable, filterable, facetable, and sortable (see the [schema checklist](#schema-checklist) for guidance).
2824

29-
You must also have a unique field in source data that can be used as the [document key (or ID)](#document-keys) in the index.
25+
+ You must also have a unique field in source data that can be used as the [document key (or ID)](#document-keys) in the index.
3026

31-
+ A stable index location. Moving an existing index to a different search service isn't supported out-of-the-box. Revisit application requirements and make sure that your existing search service, its capacity and location, are sufficient for your needs.
27+
+ A stable index location. Moving an existing index to a different search service isn't supported out-of-the-box. Revisit application requirements and make sure that your existing search service, including its capacity and location, is sufficient for your needs.
3228

33-
+ Finally, all service tiers have [index limits](search-limits-quotas-capacity.md#index-limits) on the number of objects that you can create. For example, if you're experimenting on the Free tier, you can only have three indexes at any given time. Within the index itself, there are limits on the number of complex fields and collections.
29+
+ Finally, all service tiers have [index limits](search-limits-quotas-capacity.md#index-limits) on the number of objects that you can create. For example, if you're experimenting on the Free tier, you can only have three indexes at any given time. Within the index itself, there are [limits on vectors](search-limits-quotas-capacity.md#vector-index-size-limits) and [index limits](search-limits-quotas-capacity.md#index-limits) on the number of simple and complex fields.
3430

3531
## Document keys
3632

37-
A search index has one required field: a document key. A document key is the unique identifier of a search document. In Azure AI Search, it must be a string, and it must originate from unique values in the data source that's providing the content to be indexed. A search service doesn't generate key values, but in some scenarios (such as the [Azure table indexer](search-howto-indexing-azure-tables.md)) it synthesizes existing values to create a unique key for the documents being indexed.
33+
A search index has two requirements: it must have a name and a document key.
34+
35+
A document key is the unique identifier of a search document, and a search document is a collection of fields that completely describes something. For example, if you're indexing a [movies data set](https://www.kaggle.com/datasets/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows), a search document contains the title, genre, and duration of a single movie.
36+
37+
In Azure AI Search, a document key must be a string, and it must originate from unique values in the data source that's providing the content to be indexed. A search service doesn't generate key values, but in some scenarios (such as the [Azure table indexer](search-howto-indexing-azure-tables.md)) it synthesizes existing values to create a unique key for the documents being indexed.
3838

3939
During incremental indexing, where new and updated content is indexed, incoming documents with new keys are added, while incoming documents with existing keys are either merged or overwritten, depending on whether index fields are null or populated.
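
The document key rules above can be sketched as a small check over an index definition held as a plain dictionary. This is only an illustration: `find_document_key`, the `movies-sample` index name, and the field names are hypothetical and aren't part of the article or of any Azure SDK.

```python
def find_document_key(index_definition: dict) -> str:
    """Return the name of the key field, enforcing the rules described above.

    Hypothetical helper: an index needs exactly one key field, and that
    field must be a string (Edm.String).
    """
    key_fields = [f for f in index_definition.get("fields", []) if f.get("key")]
    if len(key_fields) != 1:
        raise ValueError(f"expected exactly one key field, found {len(key_fields)}")
    key_field = key_fields[0]
    if key_field.get("type") != "Edm.String":
        raise ValueError("the document key must be a string (Edm.String)")
    return key_field["name"]


# Illustrative schema for a movies data set; all names are assumptions.
movies_index = {
    "name": "movies-sample",
    "fields": [
        {"name": "movieId", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "genre", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "durationMinutes", "type": "Edm.Int32", "filterable": True, "sortable": True},
    ],
}

print(find_document_key(movies_index))
```

Running a check like this before sending the definition can catch a missing or non-string key locally, although the search service remains the authority on what it accepts.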
4040

@@ -44,32 +44,44 @@ Use this checklist to assist the design decisions for your search index.
4444

4545
1. Review [naming conventions](/rest/api/searchservice/naming-rules) so that index and field names conform to the naming rules.
4646

47-
1. Review [supported data types](/rest/api/searchservice/supported-data-types). The data type affects how the field is used. For example, numeric content is filterable but not full text searchable. The most common data type is `Edm.String` for searchable text, which is tokenized and queried using the full text search engine.
47+
1. Review [supported data types](/rest/api/searchservice/supported-data-types). The data type affects how the field is used. For example, numeric content is filterable but not full text searchable. The most common data type is `Edm.String` for searchable text, which is tokenized and queried using the full text search engine. The most common data type for a vector field is `Edm.Single` but you can use other types as well.
4848

4949
1. Identify a [document key](#document-keys). A document key is an index requirement. It's a single string field and it's populated from a source data field that contains unique values. For example, if you're indexing from Blob Storage, the metadata storage path is often used as the document key because it uniquely identifies each blob in the container.
5050

51-
1. Identify the fields in your data source that contribute searchable content in the index. Searchable content includes short or long strings that are queried using the full text search engine. If the content is verbose (small phrases or bigger chunks), experiment with different analyzers to see how the text is tokenized.
51+
1. Identify the fields in your data source that contribute searchable content in the index.
52+
53+
Searchable nonvector content includes short or long strings that are queried using the full text search engine. If the content is verbose (small phrases or bigger chunks), experiment with different analyzers to see how the text is tokenized.
54+
55+
Searchable vector content can be images or text (in any language) that exists as a mathematical representation. You can use narrow data types or vector compression to make vector fields smaller.
5256

5357
[Field attribute assignments](search-what-is-an-index.md#index-attributes) determine both search behaviors and the physical representation of your index on the search service. Determining how fields should be specified is an iterative process for many customers. To speed up iterations, start with sample data so that you can drop and rebuild easily.
5458

5559
1. Identify which source fields can be used as filters. Numeric content and short text fields, particularly those with repeating values, are good choices. When working with filters, remember:
5660

61+
+ Filters can be used in vector and nonvector queries, but the filter itself is applied to alphanumeric (nonvector) fields in your index.
62+
5763
+ Filterable fields can optionally be used in faceted navigation.
5864

5965
+ Filterable fields are returned in arbitrary order, so consider making them sortable as well.
6066

61-
1. Determine whether to use the default analyzer (`"analyzer": null`) or a different analyzer. [Analyzers](search-analyzers.md) are used to tokenize text fields during indexing and query execution.
67+
1. For vector fields, specify a vector search configuration and the algorithms used for creating navigation paths and filling the embedding space. For more information, see [Add vector fields](vector-search-how-to-create-index.md).
68+
69+
Vector fields have extra properties that nonvector fields don't have, such as which algorithms to use and vector compression.
70+
71+
Vector fields omit attributes that aren't useful on vector data, such as sorting, filtering, and faceting.
72+
73+
1. For nonvector fields, determine whether to use the default analyzer (`"analyzer": null`) or a different analyzer. [Analyzers](search-analyzers.md) are used to tokenize text fields during indexing and query execution.
6274

6375
For multi-lingual strings, consider a [language analyzer](index-add-language-analyzers.md).
6476

6577
For hyphenated strings or special characters, consider [specialized analyzers](index-add-custom-analyzers.md#built-in-analyzers). One example is [keyword](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html) that treats the entire contents of a field as a single token. This behavior is useful for data like zip codes, IDs, and some product names. For more information, see [Partial term search and patterns with special characters](search-query-partial-matching.md).
6678

6779
> [!NOTE]
68-
> Full text search is conducted over terms that are tokenized during indexing. If your queries fail to return the results you expect, [test for tokenization](/rest/api/searchservice/test-analyzer) to verify the string actually exists. You can try different analyzers on strings to see how tokens are produced for various analyzers.
80+
> Full text search is conducted over terms that are tokenized during indexing. If your queries fail to return the results you expect, [test for tokenization](/rest/api/searchservice/indexes/analyze) to verify the string you're searching for actually exists. You can try different analyzers on strings to see how tokens are produced for various analyzers.
6981
7082
## Create an index
7183

72-
When you're ready to create the index, use a search client that can send the request. You can use the Azure portal or REST APIs for early development and proof-of-concept testing.
84+
When you're ready to create the index, use a search client that can send the request. You can use the Azure portal or REST APIs for early development and proof-of-concept testing; otherwise, it's common to use the Azure SDKs.
7385

7486
During development, plan on frequent rebuilds. Because physical structures are created in the service, [dropping and re-creating indexes](search-howto-reindex.md) is necessary for many modifications. You might consider working with a subset of your data to make rebuilds go faster.
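
As a rough illustration of the schema checklist, the following sketch assembles an index definition with a key, a full text field, a filterable field, and a vector field. It assumes the REST property names `dimensions`, `vectorSearchProfile`, and `vectorSearch` (with `algorithms` and `profiles`) and the field type `Collection(Edm.Single)`. The index name, field names, profile names, and the 1536 dimension count are hypothetical.

```python
import json


def build_index_definition(name: str, dimensions: int) -> dict:
    """Assemble an illustrative index schema as a plain dict (hypothetical helper)."""
    return {
        "name": name,
        "fields": [
            # Document key: a single string field with unique values.
            {"name": "id", "type": "Edm.String", "key": True},
            # Nonvector searchable text; None serializes to "analyzer": null (default analyzer).
            {"name": "description", "type": "Edm.String", "searchable": True, "analyzer": None},
            # Short, repeating values make good filters and facets.
            {"name": "category", "type": "Edm.String", "filterable": True,
             "facetable": True, "sortable": True},
            # Vector field: no filterable, sortable, or facetable attributes.
            {"name": "descriptionVector", "type": "Collection(Edm.Single)",
             "searchable": True, "dimensions": dimensions,
             "vectorSearchProfile": "my-profile"},
        ],
        # Vector search configuration referenced by the vector field above.
        "vectorSearch": {
            "algorithms": [{"name": "my-hnsw", "kind": "hnsw"}],
            "profiles": [{"name": "my-profile", "algorithm": "my-hnsw"}],
        },
    }


definition = build_index_definition("hotels-sketch", 1536)
print(json.dumps(definition, indent=2))
```

Keeping the definition as data makes it easy to drop and rebuild the index during early iterations. Verify the property names against the REST API version you target before relying on them.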
7587

@@ -79,10 +91,12 @@ Index design through the portal enforces requirements and schema rules for speci
7991

8092
1. Sign in to the [Azure portal](https://portal.azure.com).
8193

94+
1. Check for space. Search services are subject to a [maximum number of indexes](search-limits-quotas-capacity.md), varying by service tier. Make sure you have room for a second index.
95+
8296
1. In the search service Overview page, choose either option for creating a search index:
8397

8498
+ **Add index**, an embedded editor for specifying an index schema
85-
+ [**Import data wizard**](search-import-data-portal.md)
99+
+ [**Import wizards**](search-import-data-portal.md)
86100

87101
The wizard is an end-to-end workflow that creates an indexer, a data source, and a finished index. It also loads the data. If this is more than what you want, use **Add index** instead.
88102

@@ -95,7 +109,7 @@ The following screenshot highlights where **Add index** and **Import data** appe
95109
96110
### [**REST**](#tab/index-rest)
97111

98-
[**Create Index (REST API)**](/rest/api/searchservice/create-index) is used to create an index. You need a REST client to connect to your search service and send requests. See [Quickstart: Text search using REST](search-get-started-rest.md) to get started.
112+
[**Create Index (REST API)**](/rest/api/searchservice/indexes/create) is used to create an index. You need a REST client to connect to your search service and send requests. See [Quickstart: Full text search using REST](search-get-started-rest.md) or [Quickstart: Vector search using REST](search-get-started-vector.md) to get started.
99113

100114
The REST API provides defaults for field attribution. For example, all `Edm.String` fields are searchable by default. Attributes are shown in full below for illustrative purposes, but you can omit attribution in cases where the default values apply.
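
A minimal sketch of preparing such a request with only the Python standard library follows. It builds the request but doesn't send it. The service name, index name, and key are placeholders, `build_create_index_request` is a hypothetical helper, and the `2023-11-01` API version is an assumption. Field attributes are omitted where the defaults are expected to apply.

```python
import json
import urllib.request


def build_create_index_request(service_name: str, index_definition: dict,
                               api_key: str, api_version: str = "2023-11-01"):
    """Build (but don't send) a PUT request that creates or updates an index."""
    url = (f"https://{service_name}.search.windows.net/indexes/"
           f"{index_definition['name']}?api-version={api_version}")
    body = json.dumps(index_definition).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Attributes are left out so that the service defaults apply,
# for example Edm.String fields being searchable by default.
index_definition = {
    "name": "hotels-sketch",  # hypothetical index name
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "description", "type": "Edm.String"},
    ],
}

request = build_create_index_request("my-search-service", index_definition, "<admin-api-key>")
print(request.get_method(), request.full_url)
```

To send the request, pass it to `urllib.request.urlopen` with a real service name and admin key. A dedicated REST client or an Azure SDK is usually more convenient beyond quick experiments.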
101115

@@ -192,7 +206,7 @@ The following properties can be set for CORS:
192206

193207
## Allowed updates on existing indexes
194208

195-
[**Create Index**](/rest/api/searchservice/create-index) creates the physical data structures (files and inverted indexes) on your search service. Once the index is created, your ability to effect changes using [**Update Index**](/rest/api/searchservice/update-index) is contingent upon whether your modifications invalidate those physical structures. Most field attributes can't be changed once the field is created in your index.
209+
[**Create Index**](/rest/api/searchservice/indexes/create) creates the physical data structures (files and inverted indexes) on your search service. Once the index is created, your ability to effect changes using [**Create or Update Index**](/rest/api/searchservice/indexes/create-or-update) is contingent upon whether your modifications invalidate those physical structures. Most field attributes can't be changed once the field is created in your index.
196210

197211
Alternatively, you can [create an index alias](search-how-to-alias.md) that serves as a stable reference in your application code. Instead of updating your code, you can update an index alias to point to newer index versions.
198212

@@ -205,6 +219,7 @@ To minimize churn in the design process, the following table describes which ele
205219
| Field names and types | No |
206220
| Field attributes (searchable, filterable, facetable, sortable) | No |
207221
| Field attribute (retrievable) | Yes |
222+
| Stored (applies to vectors) | No |
208223
| [Analyzer](search-analyzers.md) | You can add and modify custom analyzers in the index. Regarding analyzer assignments on string fields, you can only modify `searchAnalyzer`. All other assignments and modifications require a rebuild. |
209224
| [Scoring profiles](index-add-scoring-profiles.md) | Yes |
210225
| [Suggesters](index-add-suggesters.md) | No |
@@ -216,5 +231,7 @@ To minimize churn in the design process, the following table describes which ele
216231
Use the following links to become familiar with loading an index with data or extending an index with a synonym map.
217232

218233
+ [Data import overview](search-what-is-data-import.md)
219-
+ [Add, Update or Delete Documents (REST)](/rest/api/searchservice/addupdate-or-delete-documents)
234+
+ [Add vector fields](vector-search-how-to-create-index.md)
235+
+ [Load documents](search-how-to-load-search-index.md)
236+
+ [Update an index](search-howto-reindex.md)
220237
+ [Synonym maps](search-synonyms.md)
