Commit 7debb69

Updated art, H2s
1 parent a211cbd commit 7debb69

4 files changed: +75 -83 lines changed

articles/search/TOC.yml

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@
126126
href: samples-rest.md
127127
- name: Concepts
128128
items:
129-
- name: Search index
129+
- name: Search indexes
130130
href: search-what-is-an-index.md
131131
- name: Full-text search
132132
href: search-lucene-query-architecture.md

articles/search/search-how-to-create-search-index.md

Lines changed: 69 additions & 79 deletions
@@ -36,13 +36,15 @@ Index creation is largely a schema definition exercise. Before creating one, you
3636

3737
Finally, all service tiers have [index limits](search-limits-quotas-capacity.md#index-limits) on the number of objects that you can create. For example, if you are experimenting on the Free tier, you can only have 3 indexes at any given time. Within the index itself, there are limits on the number of complex fields and collections.
3838

39-
## Mutable and immutable changes
39+
## Allowed updates
4040

41-
To minimize churn in the design process, the following table describes which elements are fixed and flexible. Changing a fixed element requires an index rebuild, whereas flexible elements can be changed at any time without impacting the physical implementation.
41+
A [Create Index](/rest/api/searchservice/create-index) operation creates the physical data structures (files and inverted indexes) on your search service. Your ability to update an existing index hinges on whether the modification invalidates those physical structures.
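As a rough illustration, the following minimal C# sketch compares an existing index definition with a proposed one and reports which field changes could be applied in place. `ClassifyFieldChanges` and `FieldSignatures` are hypothetical helpers, not part of the REST API or the Azure SDK. The sketch assumes both definitions are written as Create Index (REST) JSON, treats a new field as an allowed update, and treats a change to an existing field's type, key, or searchable/filterable/sortable/facetable setting, or the removal of that field, as requiring a rebuild. It only inspects top-level fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical helper (not an Azure API): compares the top-level "fields" of an existing
// index definition with a proposed one, both written as Create Index (REST) JSON.
// Assumptions: a new field is an allowed update; changing an existing field's type, key,
// or searchable/filterable/sortable/facetable setting, or removing it, needs a rebuild.
List<string> ClassifyFieldChanges(string existingJson, string proposedJson)
{
    var existing = FieldSignatures(existingJson);
    var proposed = FieldSignatures(proposedJson);
    var report = new List<string>();

    foreach (var (name, signature) in proposed)
    {
        if (!existing.TryGetValue(name, out var oldSignature))
            report.Add($"{name}: new field, can be added without a rebuild");
        else if (oldSignature != signature)
            report.Add($"{name}: existing field changed, requires a rebuild");
    }

    foreach (var name in existing.Keys.Where(n => !proposed.ContainsKey(n)))
        report.Add($"{name}: existing field removed, requires a rebuild");

    return report;
}

// Maps each top-level field name to a string built from the settings compared above.
// Simplification: an omitted setting and an explicitly set default count as different.
Dictionary<string, string> FieldSignatures(string indexJson)
{
    string[] compared = { "type", "key", "searchable", "filterable", "sortable", "facetable" };
    using var doc = JsonDocument.Parse(indexJson);
    return doc.RootElement.GetProperty("fields").EnumerateArray().ToDictionary(
        f => f.GetProperty("name").GetString()!,
        f => string.Join(";", compared.Select(p =>
            f.TryGetProperty(p, out var value) ? $"{p}={value}" : $"{p}=(omitted)")));
}

// Usage: City becomes filterable (a change to an existing field) and Rating is new.
string currentIndex = @"{ ""name"": ""hotels"", ""fields"": [
  { ""name"": ""HotelId"", ""type"": ""Edm.String"", ""key"": true },
  { ""name"": ""City"", ""type"": ""Edm.String"", ""filterable"": false } ] }";
string updatedIndex = @"{ ""name"": ""hotels"", ""fields"": [
  { ""name"": ""HotelId"", ""type"": ""Edm.String"", ""key"": true },
  { ""name"": ""City"", ""type"": ""Edm.String"", ""filterable"": true },
  { ""name"": ""Rating"", ""type"": ""Edm.Double"" } ] }";

foreach (var line in ClassifyFieldChanges(currentIndex, updatedIndex))
    Console.WriteLine(line);
```

A real migration tool would compare attribute values after applying defaults; the textual signature here keeps the sketch short.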
42+
43+
To minimize churn in the design process, the following table describes which elements are fixed and flexible in the schema. Changing a fixed element requires an index rebuild, whereas flexible elements can be changed at any time without impacting the physical implementation.
4244

4345
| Element | Mutable |
4446
|---------|---------|
45-
| Name | No |
47+
| Name | No. Refer to [naming conventions](/rest/api/searchservice/naming-rules) when naming an index. |
4648
| Key | No |
4749
| Field names and types | No |
4850
| Field attributes (searchable, filterable, facetable, sortable) | No. You can add new fields at any time, but changing an existing field is not supported. |
@@ -63,35 +65,88 @@ During development, plan on frequent rebuilds. Because physical structures are c
6365

6466
### [**Azure portal**](#tab/indexer-portal)
6567

66-
The portal provides two options for creating a search index: [**Import data wizard**](search-import-data-portal.md) and **Add Index** that provides fields for specifying an index schema.
68+
Index design through the portal enforces requirements and schema rules for specific data types, such as disallowing full-text search capabilities on numeric fields. In the portal, there are two options for creating a search index:
69+
70+
+ **Add Index** is an embedded editor for specifying an index schema
71+
+ [**Import data**](search-import-data-portal.md) is a wizard
6772

6873
The wizard packs in additional operations by also creating an indexer, data source, and loading data. If this is more than what you want, you should just use **Add Index** or another approach.
6974

70-
The following screenshot shows where you can find **Add Index** and **Import data** on the command bar.
75+
The following screenshot shows where you can find **Add Index** and **Import data** on the command bar. After an index is created, you can find it again in the Indexes tab.
7176

7277
:::image type="content" source="media/search-what-is-an-index/add-index.png" alt-text="Add index command" border="true":::
7378

7479
> [!Tip]
75-
> Index design through the portal enforces requirements and schema rules for specific data types, such as disallowing full text search capabilities on numeric fields. Once you have a workable index, you can copy the JSON from the portal and add it to your solution.
80+
> After creating an index in the portal, you can copy the JSON representation and add it to your application code.
7681
7782
### [**REST**](#tab/kstore-rest)
7883

79-
Both Postman and Visual Studio Code (with an extension for Azure Cognitive Search) can function as a search index client. Using either tool, you can connect to your search service and send [Create Index (REST)](/rest/api/searchservice/create-index) requests. There are numerous tutorials and examples that demonstrate REST clients for creating objects.
80-
81-
Start with either of these articles to learn about each client:
84+
[**Create Index (REST)**](/rest/api/searchservice/create-index) is used to create an index. Both Postman and Visual Studio Code (with an extension for Azure Cognitive Search) can function as a search index client. Using either tool, you can connect to your search service and send Create Index requests. To learn about each client, start with one of these articles:
8285

8386
+ [Create a search index using REST and Postman](search-get-started-rest.md)
8487
+ [Get started with Visual Studio Code and Azure Cognitive Search](search-get-started-vs-code.md)
8588

89+
The REST API applies default values to field attributes. For example, `Edm.String` fields are searchable by default. Attributes are spelled out in the example below for illustration, but you can omit any attribute whose default value applies.
90+
8691
Refer to [Index operations (REST)](/rest/api/searchservice/index-operations) for help with formulating index requests.
8792

88-
### [**.NET SDK**](#tab/kstore-dotnet)
93+
```json
94+
POST https://[servicename].search.windows.net/indexes?api-version=[api-version]
95+
{
96+
"name": "hotels",
97+
"fields": [
98+
{ "name": "HotelId", "type": "Edm.String", "key": true, "retrievable": true, "searchable": true, "filterable": true },
99+
{ "name": "HotelName", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": false, "sortable": true, "facetable": false },
100+
{ "name": "Description", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.microsoft" },
101+
{ "name": "Description_fr", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "fr.microsoft" },
102+
{ "name": "Address", "type": "Edm.ComplexType",
103+
"fields": [
104+
{ "name": "StreetAddress", "type": "Edm.String", "retrievable": true, "filterable": false, "sortable": false, "facetable": false, "searchable": true },
105+
{ "name": "City", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": true, "sortable": true, "facetable": true },
106+
{ "name": "StateProvince", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": true, "sortable": true, "facetable": true }
107+
]
108+
}
109+
],
110+
"suggesters": [ ],
111+
"scoringProfiles": [ ],
112+
"analyzers": [ ]
113+
}
115+
```
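Before sending a request like this, it can help to check the key field locally. The following C# sketch is a hypothetical pre-flight check, not part of the REST API or any Azure SDK. It applies the key rules described for search indexes (exactly one field is the key, the key is `Edm.String`, and the key must be retrievable) to a request body without the `POST` line. It only looks at top-level fields and assumes an omitted `retrievable` attribute means `true`; the service still performs its own validation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical pre-flight check (not an Azure API) for a Create Index request body.
// Rules checked: exactly one top-level field has "key": true, that field is Edm.String,
// and it is not marked "retrievable": false (omitted is assumed to mean true).
List<string> CheckKeyField(string indexJson)
{
    using var doc = JsonDocument.Parse(indexJson);
    var keyFields = doc.RootElement.GetProperty("fields").EnumerateArray()
        .Where(f => f.TryGetProperty("key", out var k) && k.ValueKind == JsonValueKind.True)
        .Select(f => (
            Name: f.GetProperty("name").GetString(),
            Type: f.GetProperty("type").GetString(),
            Retrievable: !(f.TryGetProperty("retrievable", out var r) && r.ValueKind == JsonValueKind.False)))
        .ToList(); // materialize before the JsonDocument is disposed

    var errors = new List<string>();
    if (keyFields.Count != 1)
        errors.Add($"expected exactly one key field, found {keyFields.Count}");
    foreach (var key in keyFields)
    {
        if (key.Type != "Edm.String")
            errors.Add($"key field '{key.Name}' is {key.Type}, but keys must be Edm.String");
        if (!key.Retrievable)
            errors.Add($"key field '{key.Name}' must be retrievable");
    }
    return errors;
}

// Usage with a trimmed-down version of the hotels body above.
string hotelsBody = @"{ ""name"": ""hotels"", ""fields"": [
  { ""name"": ""HotelId"", ""type"": ""Edm.String"", ""key"": true, ""retrievable"": true },
  { ""name"": ""HotelName"", ""type"": ""Edm.String"", ""searchable"": true } ] }";

var problems = CheckKeyField(hotelsBody);
Console.WriteLine(problems.Count == 0 ? "key field OK" : string.Join("; ", problems));
```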
89116

90-
For Cognitive Search, the Azure SDKs implement generally available features. As such, you can use any of the SDKs to create a search index. All of them provide a **SearchIndexClient** that has methods for creating and updating indexes.
117+
### [**.NET SDK**](#tab/kstore-dotnet)
91118

92-
| Azure SDK | Client | Examples |
93-
|-----------|--------|----------|
94-
| .NET | [SearchIndexClient](/dotnet/api/azure.search.documents.indexes.searchindexclient) | [azure-search-dotnet-samples/quickstart/v11/](https://github.com/Azure-Samples/azure-search-dotnet-samples/tree/master/quickstart/v11) |
119+
The Azure SDK for .NET has [**SearchIndexClient**](/dotnet/api/azure.search.documents.indexes.searchindexclient) with methods for creating and updating indexes.
120+
121+
```csharp
122+
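// Create the index (assumes indexClient is an existing SearchIndexClient and synonymMapName holds the name of an existing synonym map)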
123+
string indexName = "hotels";
124+
SearchIndex index = new SearchIndex(indexName)
125+
{
126+
Fields =
127+
{
128+
new SimpleField("hotelId", SearchFieldDataType.String) { IsKey = true, IsFilterable = true, IsSortable = true },
129+
new SearchableField("hotelName") { IsFilterable = true, IsSortable = true },
130+
new SearchableField("description") { AnalyzerName = LexicalAnalyzerName.EnLucene },
131+
new SearchableField("descriptionFr") { AnalyzerName = LexicalAnalyzerName.FrLucene },
132+
new ComplexField("address")
133+
{
134+
Fields =
135+
{
136+
new SearchableField("streetAddress"),
137+
new SearchableField("city") { IsFilterable = true, IsSortable = true, IsFacetable = true },
138+
new SearchableField("stateProvince") { IsFilterable = true, IsSortable = true, IsFacetable = true },
139+
new SearchableField("country") { SynonymMapNames = new[] { synonymMapName }, IsFilterable = true, IsSortable = true, IsFacetable = true },
140+
new SearchableField("postalCode") { IsFilterable = true, IsSortable = true, IsFacetable = true }
141+
}
142+
}
143+
}
144+
};
145+
146+
await indexClient.CreateIndexAsync(index);
147+
```
148+
149+
For more examples, see [azure-search-dotnet-samples/quickstart/v11/](https://github.com/Azure-Samples/azure-search-dotnet-samples/tree/master/quickstart/v11).
95150

96151
### [**Other SDKs**](#tab/other-sdks)
97152

@@ -105,71 +160,6 @@ For Cognitive Search, the Azure SDKs implement generally available features. As
105160

106161
---
107162

108-
<!-- ## Define fields
109-
110-
A search document is defined by the `fields` collection. You will need fields for queries and keys. You will probably also need fields to support filters, facets, and sorts. You might also need fields for data that a user never sees, for example you might want fields for profit margins or marketing promotions that you can use to modify search rank.
111-
112-
One field of type Edm.String must be designated as the document key. It's used to uniquely identify each search document and is case-sensitive. You can retrieve a document by its key to populate a details page.
113-
114-
If incoming data is hierarchical in nature, assign the [complex type](search-howto-complex-data-types.md) data type to represent the nested structures. The built-in sample data set, Hotels, illustrates complex types using an Address (contains multiple sub-fields) that has a one-to-one relationship with each hotel, and a Rooms complex collection, where multiple rooms are associated with each hotel.
115-
116-
Assign any analyzers to string fields before the index is created. Do the same for suggesters if you want to enable autocomplete on specific fields. -->
117-
118-
<!-- <a name="index-attributes"></a>
119-
120-
### Attributes
121-
122-
Field attributes determine how a field is used, such as whether it is used in full text search, faceted navigation, sort operations, and so forth.
123-
124-
String fields are often marked as "searchable" and "retrievable". Fields used to narrow search results include "sortable", "filterable", and "facetable".
125-
126-
|Attribute|Description|
127-
|---------------|-----------------|
128-
|"searchable" |Full-text searchable, subject to lexical analysis such as word-breaking during indexing. If you set a searchable field to a value like "sunny day", internally it will be split into the individual tokens "sunny" and "day". For details, see [How full text search works](search-lucene-query-architecture.md).|
129-
|"filterable" |Referenced in $filter queries. Filterable fields of type `Edm.String` or `Collection(Edm.String)` do not undergo word-breaking, so comparisons are for exact matches only. For example, if you set such a field f to "sunny day", `$filter=f eq 'sunny'` will find no matches, but `$filter=f eq 'sunny day'` will. |
130-
|"sortable" |By default the system sorts results by score, but you can configure sort based on fields in the documents. Fields of type `Collection(Edm.String)` cannot be "sortable". |
131-
|"facetable" |Typically used in a presentation of search results that includes a hit count by category (for example, hotels in a specific city). This option cannot be used with fields of type `Edm.GeographyPoint`. Fields of type `Edm.String` that are filterable, "sortable", or "facetable" can be at most 32 kilobytes in length. For details, see [Create Index (REST API)](/rest/api/searchservice/create-index).|
132-
|"key" |Unique identifier for documents within the index. Exactly one field must be chosen as the key field and it must be of type `Edm.String`.|
133-
|"retrievable" |Determines whether the field can be returned in a search result. This is useful when you want to use a field (such as *profit margin*) as a filter, sorting, or scoring mechanism, but do not want the field to be visible to the end user. This attribute must be `true` for `key` fields.|
134-
135-
Although you can add new fields at any time, existing field definitions are locked in for the lifetime of the index. For this reason, developers typically use the portal for creating simple indexes, testing ideas, or using the portal pages to look up a setting. Frequent iteration over an index design is more efficient if you follow a code-based approach so that you can rebuild the index easily.
136-
137-
> [!NOTE]
138-
> The APIs you use to build an index have varying default behaviors. For the [REST APIs](/rest/api/searchservice/Create-Index), most attributes are enabled by default (for example, "searchable" and "retrievable" are true for string fields) and you often only need to set them if you want to turn them off. For the .NET SDK, the opposite is true. On any property you do not explicitly set, the default is to disable the corresponding search behavior unless you specifically enable it.
139-
140-
<a name="index-size"></a>
141-
142-
## Attributes and index size (storage implications)
143-
144-
The size of an index is determined by the size of the documents you upload, plus index configuration, such as whether you include suggesters, and how you set attributes on individual fields.
145-
146-
The following screenshot illustrates index storage patterns resulting from various combinations of attributes. The index is based on the **real estate sample index**, which you can create easily using the Import data wizard. Although the index schemas are not shown, you can infer the attributes based on the index name. For example, *realestate-searchable* index has the "searchable" attribute selected and nothing else, *realestate-retrievable* index has the "retrievable" attribute selected and nothing else, and so forth.
147-
148-
![Index size based on attribute selection](./media/search-what-is-an-index/realestate-index-size.png "Index size based on attribute selection")
149-
150-
Although these index variants are artificial, we can refer to them for broad comparisons of how attributes affect storage. Does setting "retrievable" increase index size? No. Does adding fields to a **suggester** increase index size? Yes.
151-
152-
Making a field filterable or sortable also adds to storage consumption because filtered and sorted fields are not tokenized so that character sequences can be matched verbatim.
153-
154-
Also not reflected in the above table is the impact of [analyzers](search-analyzers.md). If you are using the edgeNgram tokenizer to store verbatim sequences of characters (a, ab, abc, abcd), the size of the index will be larger than if you used a standard analyzer.
155-
156-
> [!Note]
157-
> Storage architecture is considered an implementation detail of Azure Cognitive Search and could change without notice. There is no guarantee that current behavior will persist in the future.
158-
159-
<a name="corsoptions"></a>
160-
161-
## About `corsOptions`
162-
163-
Index schemas include a section for setting `corsOptions`. Client-side JavaScript cannot call any APIs by default since the browser will prevent all cross-origin requests. To allow cross-origin queries to your index, enable CORS (Cross-Origin Resource Sharing) by setting the **corsOptions** attribute. For security reasons, only query APIs support CORS.
164-
165-
The following options can be set for CORS:
166-
167-
+ **allowedOrigins** (required): This is a list of origins that will be granted access to your index. This means that any JavaScript code served from those origins will be allowed to query your index (assuming it provides the correct api-key). Each origin is typically of the form `protocol://<fully-qualified-domain-name>:<port>` although `<port>` is often omitted. See [Cross-origin resource sharing (Wikipedia)](https://en.wikipedia.org/wiki/Cross-origin_resource_sharing) for more details.
168-
169-
If you want to allow access to all origins, include `*` as a single item in the **allowedOrigins** array. *This is not recommended practice for production search services* but it is often useful for development and debugging.
170-
171-
+ **maxAgeInSeconds** (optional): Browsers use this value to determine the duration (in seconds) to cache CORS preflight responses. This must be a non-negative integer. The larger this value is, the better performance will be, but the longer it will take for CORS policy changes to take effect. If it is not set, a default duration of 5 minutes will be used. -->
172-
173163
## Next steps
174164

175165
Use the following links to become familiar with loading an index with data.

articles/search/search-what-is-an-index.md

Lines changed: 5 additions & 3 deletions
@@ -15,9 +15,11 @@ ms.date: 11/08/2021
1515

1616
Cognitive Search stores searchable content used for full text and filtered queries in a *search index*. An index is defined by a schema and saved to the service, with data import following as a second step.
1717

18-
Indexes contain *search documents*. Conceptually, a document is a single unit of searchable data in your index. A retailer might have a document for each product, a news organization might have a document for each article, and so forth. Mapping these concepts to more familiar database equivalents: a *search index* equates to a *table*, and *documents* are roughly equivalent to *rows* in a table.
18+
This article is an introduction to search indexes. If you'd rather start building, see [Create a search index](search-how-to-create-search-index.md).
1919

20-
## What's an index schema?
20+
## What's a search index?
21+
22+
In Cognitive Search, indexes contain *search documents*. Conceptually, a document is a single unit of searchable data in your index. For example, a retailer might have a document for each product, a news organization might have a document for each article, and so forth. Mapping these concepts to more familiar database equivalents: a *search index* equates to a *table*, and *documents* are roughly equivalent to *rows* in a table.
2123

2224
The physical structure of an index is determined by the schema. The 'fields' collection is typically the largest part of an index, where each field is named, assigned a [data type](/rest/api/searchservice/Supported-data-types), and attributed with allowable behaviors that determine how it is used.
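One way to picture that structure is the small C# sketch below, which builds a schema as a name plus a `fields` collection and serializes it to JSON. Anonymous types stand in for the SDK's own classes, the property names follow the Create Index (REST) JSON, and the three fields are hypothetical examples rather than a recommended design.

```csharp
using System;
using System.Text.Json;

// Illustrative only: anonymous types stand in for an index schema so its shape is visible.
// Property names follow the Create Index (REST) JSON; the fields are hypothetical examples.
var schema = new
{
    name = "hotels",
    fields = new object[]
    {
        // Each field has a name, a data type, and attributes that determine how it is used.
        new { name = "HotelId", type = "Edm.String", key = true, filterable = true },
        new { name = "HotelName", type = "Edm.String", searchable = true, sortable = true },
        new { name = "Rating", type = "Edm.Double", filterable = true, sortable = true, facetable = true }
    }
};

// System.Text.Json serializes each object[] element using its runtime (anonymous) type.
string json = JsonSerializer.Serialize(schema, new JsonSerializerOptions { WriteIndented = true });
Console.WriteLine(json);
```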
2325

@@ -126,7 +128,7 @@ You can get hands-on experience creating an index using almost any sample or wal
126128

127129
But you'll also want to become familiar with methodologies for loading an index with data. Index definition and data import strategies are defined in tandem. The following articles provide more information about creating and loading an index.
128130

129-
+ [Creating a search index](search-how-to-create-search-index.md)
131+
+ [Create a search index](search-how-to-create-search-index.md)
130132

131133
+ [Data import overview](search-what-is-data-import.md)
132134
