
Commit 194263a

committed
reordered some sections
1 parent 6eae210 commit 194263a

1 file changed: +60 −62 lines


articles/search/search-normalizers.md

Lines changed: 60 additions & 62 deletions
@@ -24,11 +24,6 @@ By applying a normalizer, you can achieve light text transformations that improv
 + Normalize accents and diacritics like ö or ê to ASCII equivalent characters "o" and "e"
 + Map characters like `-` and whitespace into a user-specified character
 
-Normalizers are specified on string fields in the index and applied during indexing and query execution.
-
-> [!NOTE]
-> If fields are both searchable and filterable (or facetable or sortable), both analyzers and normalizers can be used. Analyzers are always used on searchable fields because its required for tokenization. Normalizers are optional.
-
 ## Benefits of normalizers
 
 Searching and retrieving documents from a search index requires matching the query input to the contents of the document. Matching is either over tokenized content, as is the case when "search" is invoked, or over non-tokenized content if the request is a $filter, facet, or $orderby operation.
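The mismatch this paragraph describes can be illustrated with a minimal Python sketch. The field values, query, and lowercase normalizer below are illustrative stand-ins, not service code:

```python
# Conceptual sketch: why a $filter over non-tokenized content misses
# near-identical values, and how a lowercase normalizer (applied at both
# indexing and query time) smooths the differences.

def lowercase_normalizer(value: str) -> str:
    """Light transformation applied to both the indexed value and the query."""
    return value.lower()

indexed_values = ["Las Vegas", "LAS VEGAS", "las vegas"]
query = "Las Vegas"  # e.g. $filter=city eq 'Las Vegas'

# Without a normalizer: exact comparison over the verbatim strings.
exact_matches = [v for v in indexed_values if v == query]

# With a normalizer: both sides are normalized before comparison.
normalized_matches = [
    v for v in indexed_values
    if lowercase_normalizer(v) == lowercase_normalizer(query)
]

print(len(exact_matches))       # only the verbatim value matches
print(len(normalized_matches))  # all three variants match
```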
@@ -43,22 +38,11 @@ Because non-tokenized content is also not analyzed, small differences in the con
 
 A normalizer, which is invoked during indexing and query execution, adds light transformations that smooth out minor differences in text for filter, facet, and sort scenarios. In the previous examples, the variants of "Las Vegas" would be processed according to the normalizer you select (for example, all text is lower-cased) for more uniform results.
 
-## Predefined and custom normalizers
-
-Azure Cognitive Search provides built-in normalizers for common use-cases along with the capability to customize as required.
-
-| Category | Description |
-|----------|-------------|
-| [Predefined normalizers](#predefined-normalizers) | Provided out-of-the-box and can be used without any configuration. |
-|[Custom normalizers](#add-custom-normalizers) <sup>1</sup> | For advanced scenarios. Requires user-defined configuration of a combination of existing elements, consisting of char and token filters.|
-
-<sup>(1)</sup> Custom normalizers don't specify tokenizers since normalizers always produce a single token.
-
-## How to specify normalizers
+## How to specify a normalizer
 
 Normalizers are specified in an index definition, on a per-field basis, on text fields (`Edm.String` and `Collection(Edm.String)`) that have at least one of "filterable", "sortable", or "facetable" properties set to true. Setting a normalizer is optional and it's null by default. We recommend evaluating predefined normalizers before configuring a custom one.
 
-Normalizers can only be specified when a new field is added to the index. Try to assess the normalization needs upfront and assign normalizers in the initial stages of development when dropping and recreating indexes is routine. Normalizers can't be specified on a field that has already been created.
+Normalizers can only be specified when a new field is added to the index, so if possible, try to assess the normalization needs upfront and assign normalizers in the initial stages of development when dropping and recreating indexes is routine.
 
 1. When creating a field definition in the [index](/rest/api/searchservice/create-index), set the "normalizer" property to one of the following values: a [predefined normalizer](#predefined-normalizers) such as "lowercase", or a custom normalizer (defined in the same index schema).

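As a sketch of step 1 above, the field definition can be assembled as a plain JSON payload before sending it to the Create Index REST API. The index name ("hotels") and field name ("city") are hypothetical; "lowercase" is a predefined normalizer and "en.microsoft" a predefined analyzer:

```python
# Sketch of the Create Index payload from step 1; names are illustrative.
import json

field = {
    "name": "city",
    "type": "Edm.String",
    "searchable": True,
    "filterable": True,   # a normalizer requires filterable, sortable, or facetable
    "sortable": True,
    "analyzer": "en.microsoft",
    "normalizer": "lowercase",
}

index_definition = {"name": "hotels", "fields": [field]}
print(json.dumps(index_definition, indent=2))
```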
@@ -73,12 +57,12 @@ Normalizers can only be specified when a new field is added to the index. Try to
       "analyzer": "en.microsoft",
       "normalizer": "lowercase"
       ...
-    },
+    }
+  ]
 ```
 
 1. Custom normalizers are defined in the "normalizers" section of the index first, and then assigned to the field definition as shown in the previous step. For more information, see [Create Index](/rest/api/searchservice/create-index) and also [Add custom normalizers](#add-custom-normalizers).
 
-
 ```json
 "fields": [
   {
@@ -90,52 +74,22 @@ Normalizers can only be specified when a new field is added to the index. Try to
     "normalizer": "my_custom_normalizer"
   },
 ```
+
 > [!NOTE]
 > To change the normalizer of an existing field, you'll have to rebuild the index entirely (you cannot rebuild individual fields).
 
 A good workaround for production indexes, where rebuilding indexes is costly, is to create a new field identical to the old one but with the new normalizer, and use it in place of the old one. Use [Update Index](/rest/api/searchservice/update-index) to incorporate the new field and [mergeOrUpload](/rest/api/searchservice/addupdate-or-delete-documents) to populate it. Later, as part of planned index servicing, you can clean up the index to remove obsolete fields.
 
-## Add custom normalizers
-
-Custom normalizers are [defined within the index schema](/rest/api/searchservice/create-index). The definition includes a name, a type, one or more character filters and token filters. The character filters and token filters are the building blocks for a custom normalizer and responsible for the processing of the text. These filters are applied from left to right.
+## Predefined and custom normalizers
 
-The `token_filter_name_1` is the name of token filter, and `char_filter_name_1` and `char_filter_name_2` are the names of char filters (see [supported token filters](#supported-token-filters) and [supported char filters](#supported-char-filters)tables below for valid values).
+Azure Cognitive Search provides built-in normalizers for common use-cases along with the capability to customize as required.
 
-```json
-"normalizers":(optional)[
-  {
-    "name":"name of normalizer",
-    "@odata.type":"#Microsoft.Azure.Search.CustomNormalizer",
-    "charFilters":[
-      "char_filter_name_1",
-      "char_filter_name_2"
-    ],
-    "tokenFilters":[
-      "token_filter_name_1
-    ]
-  }
-],
-"charFilters":(optional)[
-  {
-    "name":"char_filter_name_1",
-    "@odata.type":"#char_filter_type",
-    "option1":value1,
-    "option2":value2,
-    ...
-  }
-],
-"tokenFilters":(optional)[
-  {
-    "name":"token_filter_name_1",
-    "@odata.type":"#token_filter_type",
-    "option1":value1,
-    "option2":value2,
-    ...
-  }
-]
-```
+| Category | Description |
+|----------|-------------|
+| [Predefined normalizers](#predefined-normalizers) | Provided out-of-the-box and can be used without any configuration. |
+|[Custom normalizers](#add-custom-normalizers) <sup>1</sup> | For advanced scenarios. Requires user-defined configuration of a combination of existing elements, consisting of char and token filters.|
 
-Custom normalizers can be added during index creation or later by updating an existing one. Adding a custom normalizer to an existing index requires the "allowIndexDowntime" flag to be specified in [Update Index](/rest/api/searchservice/update-index) and will cause the index to be unavailable for a few seconds.
+<sup>(1)</sup> Custom normalizers don't specify tokenizers since normalizers always produce a single token.
 
 ## Normalizers reference
 

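The production workaround above (add a replacement field via Update Index, then populate it) can be sketched as a mergeOrUpload batch against the documents endpoint. The key field "hotelId" and the replacement field "cityNormalized" are hypothetical names; `"@search.action": "mergeOrUpload"` merges into existing documents and uploads those that don't exist yet:

```python
# Sketch of the mergeOrUpload batch used to backfill a newly added field.
# Document keys and values are illustrative.
import json

batch = {
    "value": [
        {"@search.action": "mergeOrUpload", "hotelId": "1", "cityNormalized": "Las Vegas"},
        {"@search.action": "mergeOrUpload", "hotelId": "2", "cityNormalized": "las vegas"},
    ]
}

print(json.dumps(batch, indent=2))
```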
@@ -174,15 +128,57 @@ The list below shows the token filters supported for normalizers and is a subset
 + [lowercase](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/LowerCaseFilter.html)
 + [uppercase](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/core/UpperCaseFilter.html)
 
+## Create a custom normalizer
+
+Custom normalizers are [defined within the index schema](/rest/api/searchservice/create-index). The definition includes a name, a type, one or more character filters and token filters. The character filters and token filters are the building blocks for a custom normalizer and are responsible for the processing of the text. These filters are applied from left to right.
+
+The `token_filter_name_1` is the name of a token filter, and `char_filter_name_1` and `char_filter_name_2` are the names of char filters (see the [supported token filters](#supported-token-filters) and [supported char filters](#supported-char-filters) tables below for valid values).
+
+```json
+"normalizers":(optional)[
+  {
+    "name":"name of normalizer",
+    "@odata.type":"#Microsoft.Azure.Search.CustomNormalizer",
+    "charFilters":[
+      "char_filter_name_1",
+      "char_filter_name_2"
+    ],
+    "tokenFilters":[
+      "token_filter_name_1"
+    ]
+  }
+],
+"charFilters":(optional)[
+  {
+    "name":"char_filter_name_1",
+    "@odata.type":"#char_filter_type",
+    "option1":value1,
+    "option2":value2,
+    ...
+  }
+],
+"tokenFilters":(optional)[
+  {
+    "name":"token_filter_name_1",
+    "@odata.type":"#token_filter_type",
+    "option1":value1,
+    "option2":value2,
+    ...
+  }
+]
+```
+
+Custom normalizers can be added during index creation or later by updating an existing one. Adding a custom normalizer to an existing index requires the "allowIndexDowntime" flag to be specified in [Update Index](/rest/api/searchservice/update-index) and will cause the index to be unavailable for a few seconds.
+
 ## Custom normalizer example
 
 The example below illustrates a custom normalizer definition with corresponding character filters and token filters. Custom options for character filters and token filters are specified separately as named constructs, and then referenced in the normalizer definition as illustrated below.
 
-* A custom normalizer named "my_custom_normalizer" is defined in the "normalizers" section of the index definition.
++ A custom normalizer named "my_custom_normalizer" is defined in the "normalizers" section of the index definition.
 
-* The normalizer is composed of two character filters and three token filters: elision, lowercase, and customized asciifolding filter "my_asciifolding".
++ The normalizer is composed of two character filters and three token filters: elision, lowercase, and customized asciifolding filter "my_asciifolding".
 
-* The first character filter "map_dash" replaces all dashes with underscores while the second one "remove_whitespace" removes all spaces.
++ The first character filter "map_dash" replaces all dashes with underscores while the second one "remove_whitespace" removes all spaces.
 
 ```json
 {
@@ -241,6 +237,8 @@ The example below illustrates a custom normalizer definition with corresponding
 
 ## See also
 
++ [Querying concepts in Azure Cognitive Search](search-query-overview.md)
+
 + [Analyzers for linguistic and text processing](search-analyzers.md)
 
-+ [Search Documents REST API](/rest/api/searchservice/search-documents)
++ [Search Documents REST API](/rest/api/searchservice/search-documents)
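Stepping back from the diff itself: the "my_custom_normalizer" example it describes chains char filters and then token filters, each applied left to right. A conceptual Python sketch of that pipeline, using simplified stand-ins for the Lucene filters (the elision filter is omitted for brevity, and the asciifolding stand-in handles only decomposable characters):

```python
# Conceptual sketch of the custom-normalizer pipeline, not service code.
import unicodedata

def map_dash(text: str) -> str:          # char filter: replace dashes with underscores
    return text.replace("-", "_")

def remove_whitespace(text: str) -> str: # char filter: remove all spaces
    return text.replace(" ", "")

def lowercase(text: str) -> str:         # token filter
    return text.lower()

def asciifolding(text: str) -> str:      # simplified stand-in for "my_asciifolding"
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

def my_custom_normalizer(text: str) -> str:
    # Char filters run first, then token filters, each left to right.
    for f in (map_dash, remove_whitespace, lowercase, asciifolding):
        text = f(text)
    return text  # a normalizer always produces a single token

print(my_custom_normalizer("Las Vegas - Résidence"))  # lasvegas_residence
```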
