Commit 18fce7f

ported field attribution content from REST to conceptual docs
1 parent 7336536 commit 18fce7f

1 file changed

+63
-10
lines changed

articles/search/search-how-to-create-search-index.md

Lines changed: 63 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -7,7 +7,7 @@ author: HeidiSteen
77
ms.author: heidist
88
ms.service: azure-ai-search
99
ms.topic: how-to
10-
ms.date: 10/20/2024
10+
ms.date: 11/01/2024
1111
---
1212

1313
# Create an index in Azure AI Search
@@ -18,24 +18,32 @@ In this article, learn the steps for defining a schema for a [**search index**](
1818

1919
+ Write permissions as a [**Search Service Contributor**](search-security-rbac.md) or an [admin API key](search-security-api-keys.md) for key-based authentication.
2020

21-
+ An understanding of the data you want to index. A search index is based on external content that you want to make searchable. Searchable content is stored as fields in an index. You should have a clear idea of which source fields you want to make searchable, retrievable, filterable, facetable, and sortable. See the [schema checklist](#schema-checklist) for guidance.
21+
+ An understanding of the data you want to index. A search index is based on external content that you want to make searchable. Searchable content is stored as fields in an index. You should have a clear idea of which source fields you want to make searchable, retrievable, filterable, facetable, and sortable on Azure AI Search. See the [schema checklist](#schema-checklist) for guidance.
2222

2323
+ You must also have a unique field in source data that can be used as the [document key (or ID)](#document-keys) in the index.
2424

25-
+ A stable index location. Moving an existing index to a different search service isn't supported out-of-the-box. Revisit application requirements and make sure that your existing search service (capacity and location), are sufficient for your needs.
25+
+ A stable index location. Moving an existing index to a different search service isn't supported out-of-the-box. Revisit application requirements and make sure that your existing search service (capacity and region) is sufficient for your needs. If you're taking a dependency on Azure AI services or Azure OpenAI, [choose a region](search-create-service-portal.md#checklist-for-choosing-a-region) that provides all of the necessary resources.
2626

2727
+ Finally, all service tiers have [index limits](search-limits-quotas-capacity.md#index-limits) on the number of objects that you can create. For example, if you're experimenting on the Free tier, you can only have three indexes at any given time. Within the index itself, there are [limits on vectors](search-limits-quotas-capacity.md#vector-index-size-limits) and [index limits](search-limits-quotas-capacity.md#index-limits) on the number of simple and complex fields.
2828

2929
## Document keys
3030

31-
A search index has two requirements: it must have a name and a document key.
31+
Search index creation has two requirements: an index must have a unique name on the search service, and it must have a document key. The boolean `key` attribute on a field can be set to true to indicate which field provides the document key.
3232

33-
A document key is the unique identifier of a search document, and a search document is a collection of fields that completely describes something. For example, if you're indexing a [movies data set](https://www.kaggle.com/datasets/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows), a search document contains the title, genre, and duration of a single movie.
33+
A document key is the unique identifier of a search document, and a search document is a collection of fields that completely describes something. For example, if you're indexing a [movies data set](https://www.kaggle.com/datasets/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows), a search document contains the title, genre, and duration of a single movie. Movie names are unique in this dataset, so you might use the movie name as the document key.
3434

35-
In Azure AI Search, a document key must be a string, and it must originate from unique values in the data source that's providing the content to be indexed. A search service doesn't generate key values, but in some scenarios (such as the [Azure table indexer](search-howto-indexing-azure-tables.md)) it synthesizes existing values to create a unique key for the documents being indexed.
35+
In Azure AI Search, a document key is a string, and it must originate from unique values in the data source that's providing the content to be indexed. As a general rule, a search service doesn't generate key values, but in some scenarios (such as the [Azure table indexer](search-howto-indexing-azure-tables.md)) it synthesizes existing values to create a unique key for the documents being indexed. Another scenario is one-to-many indexing for chunked or partitioned data, in which case document keys are generated for each chunk.
3636

3737
During incremental indexing, where new and updated content is indexed, incoming documents with new keys are added, while incoming documents with existing keys are either merged or overwritten, depending on whether index fields are null or populated.
3838

39+
Important points about document keys include:
40+
41+
+ The maximum length of values in a key field is 1,024 characters.
42+
+ Exactly one top-level field in each index must be chosen as the key field and it must be of type `Edm.String`.
43+
+ The default of the `key` attribute is false for simple fields and null for complex fields.
44+
45+
Key fields can be used to look up documents directly and update or delete specific documents. The values of key fields are handled in a case-sensitive manner when looking up or indexing documents. See [GET Document (REST)](/rest/api/searchservice/documents/get) and [Index Documents (REST)](/rest/api/searchservice/documents) for details.
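
The key-field rules above can be checked locally before an index definition is sent to the service. The following Python sketch is illustrative only: `find_key_field` and `check_key_value` are hypothetical helpers, not part of any Azure SDK, and the field list is a plain list of dictionaries that reuses the attribute names from this article.

```python
MAX_KEY_LENGTH = 1024  # maximum length of a key value, per the list above


def find_key_field(fields):
    """Return the name of the single top-level key field, or raise ValueError."""
    keys = [f for f in fields if f.get("key") is True]
    if len(keys) != 1:
        raise ValueError(f"expected exactly one key field, found {len(keys)}")
    if keys[0].get("type") != "Edm.String":
        raise ValueError("the key field must be of type Edm.String")
    return keys[0]["name"]


def check_key_value(value):
    """Raise ValueError if a document key value breaks the rules above."""
    if not isinstance(value, str):
        raise ValueError("key values must be strings")
    if len(value) > MAX_KEY_LENGTH:
        raise ValueError("key values can be at most 1,024 characters")


# Hypothetical field list for the movies example.
fields = [
    {"name": "title", "type": "Edm.String", "key": True},
    {"name": "genre", "type": "Edm.String", "key": False},
]
key_name = find_key_field(fields)
check_key_value("The Godfather")

# Key values are case-sensitive, so these two values identify different documents.
distinct_keys = {"The Godfather", "the godfather"}
```

A check like this only catches schema mistakes early. The search service remains the authority on whether a definition or a key value is accepted.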
46+
3947
## Schema checklist
4048

4149
Use this checklist to assist the design decisions for your search index.
@@ -52,15 +60,15 @@ Use this checklist to assist the design decisions for your search index.
5260

5361
Searchable vector content can be images or text (in any language) that exists as a mathematical representation. You can use narrow data types or vector compression to make vector fields smaller.
5462

55-
[Field attribute assignments](search-what-is-an-index.md#index-attributes) determine both search behaviors and the physical representation of your index on the search service. Determining how fields should be specified is an iterative process for many customers. To speed up iterations, start with sample data so that you can drop and rebuild easily.
63+
[Attributes set on fields](search-what-is-an-index.md#index-attributes), such as `retrievable` or `filterable`, determine both search behaviors and the physical representation of your index on the search service. Determining how fields should be attributed is an iterative process for many developers. To speed up iterations, start with sample data so that you can drop and rebuild easily.
5664

5765
1. Identify which source fields can be used as filters. Numeric content and short text fields, particularly those with repeating values, are good choices. When working with filters, remember:
5866

59-
+ Filters can be used in vector and nonvector queries, but the filter itself is applied alphanumeric (nonvector) fields in your index.
67+
+ Filters can be used in vector and nonvector queries, but the filter itself is applied to alphanumeric (nonvector) fields in your index.
6068

6169
+ Filterable fields can optionally be used in faceted navigation.
6270

63-
+ Filterable fields are returned in arbitrary order, so consider making them sortable as well.
71+
+ Filterable fields are returned in arbitrary order and don't undergo relevance scoring, so consider making them sortable as well.
6472

6573
1. For vector fields, specify a vector search configuration and the algorithms used for creating navigation paths and filling the embedding space. For more information, see [Add vector fields](vector-search-how-to-create-index.md).
6674

@@ -77,6 +85,51 @@ Use this checklist to assist the design decisions for your search index.
7785
> [!NOTE]
7886
> Full text search is conducted over terms that are tokenized during indexing. If your queries fail to return the results you expect, [test for tokenization](/rest/api/searchservice/indexes/analyze) to verify the string you're searching for actually exists. You can try different analyzers on strings to see how tokens are produced for various analyzers.
7987
88+
## Configure field definitions
89+
90+
The fields collection defines the structure of a search document. All fields have a name, data type, and attributes.
91+
92+
Setting a field as searchable, filterable, sortable, or facetable has an effect on index size and query performance. Don't set those attributes on fields that aren't meant to be referenced in query expressions.
93+
94+
If a field isn't set to be searchable, filterable, sortable, or facetable, the field can't be referenced in any query expression. This is desirable for fields that aren't used in queries, but are needed in search results.
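
As a rough illustration of the two paragraphs above, the following Python sketch builds a fields collection as plain dictionaries and lists which fields could be referenced in a query expression. The index name `movies-sketch`, the field names, and the `queryable_fields` helper are all hypothetical, and the dictionary isn't guaranteed to be a complete request body for any particular API version.

```python
import json

# Attributes that make a field usable in query expressions.
QUERY_ATTRIBUTES = ("searchable", "filterable", "sortable", "facetable")

# Hypothetical index definition; names and values are for illustration only.
index_definition = {
    "name": "movies-sketch",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "retrievable": True,
         "searchable": False, "filterable": False, "sortable": False, "facetable": False},
        {"name": "title", "type": "Edm.String", "retrievable": True,
         "searchable": True, "filterable": False, "sortable": False, "facetable": False},
        {"name": "genre", "type": "Edm.String", "retrievable": True,
         "searchable": False, "filterable": True, "sortable": False, "facetable": True},
        # Returned in results but never referenced in a query expression.
        {"name": "poster_url", "type": "Edm.String", "retrievable": True,
         "searchable": False, "filterable": False, "sortable": False, "facetable": False},
    ],
}


def queryable_fields(definition):
    """Return names of fields that have at least one query-related attribute set."""
    return [
        field["name"]
        for field in definition["fields"]
        if any(field.get(attribute) for attribute in QUERY_ATTRIBUTES)
    ]


print(json.dumps(index_definition, indent=2))
print(queryable_fields(index_definition))
```

In this sketch, `poster_url` is retrievable only, which matches the case described above: it appears in results without adding query-related structures to the index.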
95+
96+
The REST APIs have default attribution based on data types, which is also used by the [Import wizards](search-import-data-portal.md) in the Azure portal. The Azure SDKs don't have defaults, but they have field subclasses that incorporate properties and behaviors, such as [SearchableField](/dotnet/api/azure.search.documents.indexes.models.searchablefield) for strings and [SimpleField](/dotnet/api/azure.search.documents.indexes.models.simplefield) for primitives.
97+
98+
Default field attributions for the REST APIs are summarized in the following table.
99+
100+
| Data type | Searchable | Retrievable | Filterable | Facetable | Sortable | Stored |
101+
|-----------|-------------|------------|------------|-----------|----------|--------|
102+
| `Edm.String` |||||||
103+
| `Collection(Edm.String)` |||||||
104+
| `Edm.Boolean` |||||||
105+
| `Edm.Int32`, `Edm.Int64`, `Edm.Double` |||||||
106+
| `Edm.DateTimeOffset` |||||||
107+
| `Edm.GeographyPoint` |||||||
108+
| `Edm.ComplexType` |||||||
109+
| `Collection(Edm.Single)` and all other vector field types |||||||
110+
111+
String fields can also be optionally associated with [analyzers](search-analyzers.md) and [synonym maps](search-synonyms.md). Fields of type `Edm.String` that are filterable, sortable, or facetable can be at most 32 kilobytes in length. This is because values of such fields are treated as a single search term, and the maximum length of a term in Azure AI Search is 32 kilobytes. If you need to store more text than this in a single string field, you should explicitly set filterable, sortable, and facetable to `false` in your index definition.
112+
113+
Vector fields must be associated with [dimensions and vector profiles](vector-search-how-to-create-index.md).
114+
115+
Field attributes are described in the following table.
116+
117+
|Attribute|Description|
118+
|---------------|-----------------|
119+
|name|Required. Sets the name of the field, which must be unique within the fields collection of the index or parent field.|
120+
|type|Required. Sets the data type for the field. Fields can be simple or complex. Simple fields are of primitive types, like `Edm.String` for text or `Edm.Int32` for integers. [Complex fields](search-howto-complex-data-types.md) can have sub-fields that are themselves either simple or complex. This allows you to model objects and arrays of objects, which in turn enables you to upload most JSON object structures to your index. See [Supported data types](/rest/api/searchservice/supported-data-types) for the complete list of supported types.|
121+
|key|Required. Set this attribute to true to designate that a field's values uniquely identify documents in the index. See [Document keys](#document-keys) in this article for details.|
122+
|retrievable| Indicates whether the field can be returned in a search result. Set this attribute to `false` if you want to use a field as a filter, sorting, or scoring mechanism but don't want the field to be visible to the end user. This attribute must be `true` for key fields, and it must be `null` for complex fields. This attribute can be changed on existing fields. Setting retrievable to `true` doesn't cause any increase in index storage requirements. Default is `true` for simple fields and `null` for complex fields.|
123+
|searchable| Indicates whether the field is full-text searchable and can be referenced in search queries. This means it undergoes [lexical analysis](search-analyzers.md) such as word-breaking during indexing. If you set a searchable field to a value like "Sunny day", internally it's normalized into the individual tokens \"sunny\" and \"day\". This enables full-text searches for these terms. Fields of type `Edm.String` or `Collection(Edm.String)` are searchable by default. This attribute must be `false` for simple fields of other nonstring data types, and it must be `null` for complex fields. </br></br>A searchable field consumes extra space in your index since Azure AI Search processes the contents of those fields and organizes them in auxiliary data structures for performant searching. If you want to save space in your index and you don't need a field to be included in searches, set searchable to `false`. See [How full-text search works in Azure AI Search](search-lucene-query-architecture.md) for details. |
124+
|filterable| Indicates whether to enable the field to be referenced in `$filter` queries. Filterable differs from searchable in how strings are handled. Fields of type `Edm.String` or `Collection(Edm.String)` that are filterable don't undergo lexical analysis, so comparisons are for exact matches only. For example, if you set such a field `f` to "Sunny day", `$filter=f eq 'sunny'` finds no matches, but `$filter=f eq 'Sunny day'` will. This attribute must be `null` for complex fields. Default is `true` for simple fields and `null` for complex fields. To reduce index size, set this attribute to `false` on fields that you won't be filtering on.|
125+
|sortable| Indicates whether to enable the field to be referenced in `$orderby` expressions. By default Azure AI Search sorts results by score, but in many experiences users want to sort by fields in the documents. A simple field can be sortable only if it's single-valued (it has a single value in the scope of the parent document). </br></br>Simple collection fields can't be sortable, since they're multi-valued. Simple subfields of complex collections are also multi-valued, and therefore can't be sortable. This is true whether it's an immediate parent field, or an ancestor field, that's the complex collection. Complex fields can't be sortable and the sortable attribute must be `null` for such fields. The default for sortable is `true` for single-valued simple fields, `false` for multi-valued simple fields, and `null` for complex fields.|
126+
|facetable| Indicates whether to enable the field to be referenced in facet queries. Typically used in a presentation of search results that includes hit count by category (for example, search for digital cameras and see hits by brand, by megapixels, by price, and so on). This attribute must be `null` for complex fields. Fields of type `Edm.GeographyPoint` or `Collection(Edm.GeographyPoint)` can't be facetable. Default is `true` for all other simple fields. To reduce index size, set this attribute to `false` on fields that you won't be faceting on. |
127+
|analyzer|Sets the lexical analyzer for tokenizing strings during indexing and query operations. Valid values for this property include [language analyzers](index-add-language-analyzers.md), [built-in analyzers](index-add-custom-analyzers.md#built-in-analyzers), and [custom analyzers](index-add-custom-analyzers.md). The default is `standard.lucene`. This attribute can only be used with searchable string fields, and it can't be set together with either searchAnalyzer or indexAnalyzer. Once the analyzer is chosen and the field is created in the index, it can't be changed for the field. Must be `null` for [complex fields](search-howto-complex-data-types.md). |
128+
|searchAnalyzer|Set this property together with indexAnalyzer to specify different lexical analyzers for indexing and queries. If you use this property, set analyzer to `null` and make sure indexAnalyzer is set to an allowed value. Valid values for this property include built-in analyzers and custom analyzers. This attribute can be used only with searchable fields. The search analyzer can be updated on an existing field since it's only used at query-time. Must be `null` for complex fields.|
129+
|indexAnalyzer|Set this property together with searchAnalyzer to specify different lexical analyzers for indexing and queries. If you use this property, set analyzer to `null` and make sure searchAnalyzer is set to an allowed value. Valid values for this property include built-in analyzers and custom analyzers. This attribute can be used only with searchable fields. Once the index analyzer is chosen, it can't be changed for the field. Must be `null` for complex fields.|
130+
|synonymMaps|A list of the names of synonym maps to associate with this field. This attribute can be used only with searchable fields. Currently only one synonym map per field is supported. Assigning a synonym map to a field ensures that query terms targeting that field are expanded at query-time using the rules in the synonym map. This attribute can be changed on existing fields. Must be `null` or an empty collection for complex fields.|
131+
|fields|A list of subfields if this is a field of type `Edm.ComplexType` or `Collection(Edm.ComplexType)`. Must be `null` or empty for simple fields. See [How to model complex data types in Azure AI Search](search-howto-complex-data-types.md) for more information on how and when to use subfields.|
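
The difference between `searchable` and `filterable` string handling described in the table can be approximated in a few lines of Python. This is a simplified sketch, not how the service is implemented: `tokenize` is a hypothetical stand-in for lexical analysis that only lowercases and splits on whitespace, whereas real analyzers do considerably more.

```python
def tokenize(text):
    """Rough stand-in for lexical analysis: lowercase, then split on whitespace."""
    return text.lower().split()


def full_text_match(field_value, query_term):
    """Searchable behavior: the query term is compared against individual tokens."""
    return query_term.lower() in tokenize(field_value)


def filter_eq(field_value, literal):
    """Filterable behavior: the whole value is compared, with no analysis."""
    return field_value == literal


value = "Sunny day"

search_hit = full_text_match(value, "sunny")   # matches the token "sunny"
filter_partial = filter_eq(value, "sunny")     # no match: not the full value
filter_exact = filter_eq(value, "Sunny day")   # match: exact, case-sensitive value
```

Under these assumptions, a search for `sunny` finds the document, while `$filter=f eq 'sunny'` doesn't, which is the behavior described in the `filterable` row.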
132+
80133
## Create an index
81134

82135
When you're ready to create the index, use a search client that can send the request. You can use the Azure portal or REST APIs for early development and proof-of-concept testing; otherwise, it's common to use the Azure SDKs.
@@ -168,7 +221,7 @@ SearchIndex index = new SearchIndex(indexName)
168221
await indexClient.CreateIndexAsync(index);
169222
```
170223

171-
For more examples, see[azure-search-dotnet-samples/quickstart/v11/](https://github.com/Azure-Samples/azure-search-dotnet-samples/tree/main/quickstart/v11).
224+
For more examples, see [azure-search-dotnet-samples/quickstart/v11/](https://github.com/Azure-Samples/azure-search-dotnet-samples/tree/main/quickstart/v11).
172225

173226
### [**Other SDKs**](#tab/index-other-sdks)
174227
