articles/search/search-normalizers.md
Lines changed: 60 additions & 62 deletions
@@ -24,11 +24,6 @@ By applying a normalizer, you can achieve light text transformations that improv
24
24
+ Normalize accents and diacritics like ö or ê to ASCII equivalent characters "o" and "e"
25
25
+ Map characters like `-` and whitespace into a user-specified character
26
26
27
-
Normalizers are specified on string fields in the index and applied during indexing and query execution.
28
-
29
-
> [!NOTE]
30
-
> If fields are both searchable and filterable (or facetable or sortable), both analyzers and normalizers can be used. Analyzers are always used on searchable fields because they're required for tokenization. Normalizers are optional.
31
-
32
27
## Benefits of normalizers
33
28
34
29
Searching and retrieving documents from a search index requires matching the query input to the contents of the document. Matching is either over tokenized content, as is the case when "search" is invoked, or over non-tokenized content if the request is a $filter, facet, or $orderby operation.
@@ -43,22 +38,11 @@ Because non-tokenized content is also not analyzed, small differences in the con
43
38
44
39
A normalizer, which is invoked during indexing and query execution, adds light transformations that smooth out minor differences in text for filter, facet, and sort scenarios. In the previous examples, the variants of "Las Vegas" would be processed according to the normalizer you select (for example, all text is lower-cased) for more uniform results.
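As a rough illustration of what such light transformations do, the sketch below approximates lowercasing plus ASCII folding in plain Python. It is not how the service implements normalizers; the function name `normalize_for_filter` and the sample strings are hypothetical and chosen only for this example.

```python
import unicodedata


def normalize_for_filter(text: str) -> str:
    """Approximate a lowercase + ASCII-folding normalizer (illustrative only)."""
    # Decompose accented characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks so "ö" becomes "o" and "ê" becomes "e".
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase so that case variants compare equal.
    return folded.lower()


# Hypothetical case variants of the same city name.
variants = ["Las Vegas", "LAS VEGAS", "las vegas"]
normalized = {normalize_for_filter(v) for v in variants}
```

After this step every variant collapses to a single value, which is the effect a normalizer aims for in filter, facet, and sort operations.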
45
40
46
-
## Predefined and custom normalizers
47
-
48
-
Azure Cognitive Search provides built-in normalizers for common use-cases along with the capability to customize as required.
49
-
50
-
| Category | Description |
51
-
|----------|-------------|
52
-
|[Predefined normalizers](#predefined-normalizers)| Provided out-of-the-box and can be used without any configuration. |
53
-
|[Custom normalizers](#add-custom-normalizers) <sup>1</sup> | For advanced scenarios. Requires user-defined configuration of a combination of existing elements, consisting of char and token filters.|
54
-
55
-
<sup>(1)</sup> Custom normalizers don't specify tokenizers since normalizers always produce a single token.
56
-
57
-
## How to specify normalizers
41
+
## How to specify a normalizer
58
42
59
43
Normalizers are specified in an index definition, on a per-field basis, on text fields (`Edm.String` and `Collection(Edm.String)`) that have at least one of "filterable", "sortable", or "facetable" properties set to true. Setting a normalizer is optional and it's null by default. We recommend evaluating predefined normalizers before configuring a custom one.
60
44
61
-
Normalizers can only be specified when a new field is added to the index. Try to assess the normalization needs upfront and assign normalizers in the initial stages of development when dropping and recreating indexes is routine. Normalizers can't be specified on a field that has already been created.
45
+
Normalizers can only be specified when a new field is added to the index, so if possible, try to assess the normalization needs upfront and assign normalizers in the initial stages of development when dropping and recreating indexes is routine.
62
46
63
47
1. When creating a field definition in the [index](/rest/api/searchservice/create-index), set the "normalizer" property to one of the following values: a [predefined normalizer](#predefined-normalizers) such as "lowercase", or a custom normalizer (defined in the same index schema).
64
48
@@ -73,12 +57,12 @@ Normalizers can only be specified when a new field is added to the index. Try to
73
57
"analyzer": "en.microsoft",
74
58
"normalizer": "lowercase"
75
59
...
76
-
},
60
+
}
61
+
]
77
62
```
78
63
79
64
1. Custom normalizers are defined in the "normalizers" section of the index first, and then assigned to the field definition as shown in the previous step. For more information, see [Create Index](/rest/api/searchservice/create-index) and also [Add custom normalizers](#add-custom-normalizers).
80
65
81
-
82
66
```json
83
67
"fields": [
84
68
{
@@ -90,52 +74,22 @@ Normalizers can only be specified when a new field is added to the index. Try to
90
74
"normalizer": "my_custom_normalizer"
91
75
},
92
76
```
77
+
93
78
> [!NOTE]
94
79
> To change the normalizer of an existing field, you'll have to rebuild the index entirely (you cannot rebuild individual fields).
95
80
96
81
A good workaround for production indexes, where rebuilding indexes is costly, is to create a new field identical to the old one but with the new normalizer, and use it in place of the old one. Use [Update Index](/rest/api/searchservice/update-index) to incorporate the new field and [mergeOrUpload](/rest/api/searchservice/addupdate-or-delete-documents) to populate it. Later, as part of planned index servicing, you can clean up the index to remove obsolete fields.
97
82
98
-
## Add custom normalizers
99
-
100
-
Custom normalizers are [defined within the index schema](/rest/api/searchservice/create-index). The definition includes a name, a type, and one or more character filters and token filters. The character filters and token filters are the building blocks of a custom normalizer and are responsible for processing the text. These filters are applied from left to right.
83
+
## Predefined and custom normalizers
101
84
102
-
The `token_filter_name_1` is the name of a token filter, and `char_filter_name_1` and `char_filter_name_2` are the names of char filters (see the [supported token filters](#supported-token-filters) and [supported char filters](#supported-char-filters) tables below for valid values).
85
+
Azure Cognitive Search provides built-in normalizers for common use-cases along with the capability to customize as required.

| Category | Description |
|----------|-------------|
| [Predefined normalizers](#predefined-normalizers) | Provided out-of-the-box and can be used without any configuration. |
90
+
|[Custom normalizers](#add-custom-normalizers) <sup>1</sup> | For advanced scenarios. Requires user-defined configuration of a combination of existing elements, consisting of char and token filters.|
137
91
138
-
Custom normalizers can be added during index creation or later by updating an existing one. Adding a custom normalizer to an existing index requires the "allowIndexDowntime" flag to be specified in [Update Index](/rest/api/searchservice/update-index) and will cause the index to be unavailable for a few seconds.
92
+
<sup>(1)</sup> Custom normalizers don't specify tokenizers since normalizers always produce a single token.
139
93
140
94
## Normalizers reference
141
95
@@ -174,15 +128,57 @@ The list below shows the token filters supported for normalizers and is a subset
Custom normalizers are [defined within the index schema](/rest/api/searchservice/create-index). The definition includes a name, a type, and one or more character filters and token filters. The character filters and token filters are the building blocks of a custom normalizer and are responsible for processing the text. These filters are applied from left to right.
134
+
135
+
The `token_filter_name_1` is the name of a token filter, and `char_filter_name_1` and `char_filter_name_2` are the names of char filters (see the [supported token filters](#supported-token-filters) and [supported char filters](#supported-char-filters) tables below for valid values).
Custom normalizers can be added during index creation or later by updating an existing one. Adding a custom normalizer to an existing index requires the "allowIndexDowntime" flag to be specified in [Update Index](/rest/api/searchservice/update-index) and will cause the index to be unavailable for a few seconds.
172
+
177
173
## Custom normalizer example
178
174
179
175
The example below illustrates a custom normalizer definition with corresponding character filters and token filters. Custom options for character filters and token filters are specified separately as named constructs, and then referenced in the normalizer definition as illustrated below.
180
176
181
-
* A custom normalizer named "my_custom_normalizer" is defined in the "normalizers" section of the index definition.
177
+
+ A custom normalizer named "my_custom_normalizer" is defined in the "normalizers" section of the index definition.
182
178
183
-
* The normalizer is composed of two character filters and three token filters: elision, lowercase, and a customized asciifolding filter, "my_asciifolding".
179
+
+ The normalizer is composed of two character filters and three token filters: elision, lowercase, and a customized asciifolding filter, "my_asciifolding".
184
180
185
-
* The first character filter "map_dash" replaces all dashes with underscores while the second one "remove_whitespace" removes all spaces.
181
+
+ The first character filter "map_dash" replaces all dashes with underscores while the second one "remove_whitespace" removes all spaces.
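
To make the effect of this pipeline concrete, here is a minimal Python sketch that mimics the steps listed above. It is only an approximation written for illustration: the helper functions are hypothetical stand-ins, the elision filter is omitted for brevity, and running the character filters before the token filters is an assumption rather than something this sketch can confirm about the service.

```python
import unicodedata


def map_dash(text: str) -> str:
    # Character filter stand-in: replace every dash with an underscore.
    return text.replace("-", "_")


def remove_whitespace(text: str) -> str:
    # Character filter stand-in: remove all space characters.
    return text.replace(" ", "")


def ascii_fold(text: str) -> str:
    # Token filter stand-in: strip diacritics (rough approximation of asciifolding).
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def my_custom_normalizer(text: str) -> str:
    # Assumed ordering: character filters first, then token filters,
    # each applied in the order listed.
    for step in (map_dash, remove_whitespace, str.lower, ascii_fold):
        text = step(text)
    return text


# Hypothetical input value for a filterable field.
result = my_custom_normalizer("Saint-Étienne Métropole")
```

Because each step is a plain string-to-string function, the whole value stays a single token, which matches the way normalizers are described in this article.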
186
182
187
183
```json
188
184
{
@@ -241,6 +237,8 @@ The example below illustrates a custom normalizer definition with corresponding
241
237
242
238
## See also
243
239
240
+
+[Querying concepts in Azure Cognitive Search](search-query-overview.md)
241
+
244
242
+[Analyzers for linguistic and text processing](search-analyzers.md)