Commit 7585b59

update prefix and suffix matching docs

1 parent 115c87f

File tree

1 file changed (+76, -33)


articles/search/search-query-partial-matching.md

Lines changed: 76 additions & 33 deletions
````diff
@@ -190,24 +190,24 @@ The following example illustrates a custom analyzer that provides the keyword to
 {
   "fields": [
     {
-      "name": "accountNumber",
-      "analyzer":"myCustomAnalyzer",
-      "type": "Edm.String",
-      "searchable": true,
-      "filterable": true,
-      "retrievable": true,
-      "sortable": false,
-      "facetable": false
+      "name": "accountNumber",
+      "analyzer":"myCustomAnalyzer",
+      "type": "Edm.String",
+      "searchable": true,
+      "filterable": true,
+      "retrievable": true,
+      "sortable": false,
+      "facetable": false
     }
   ],
 
   "analyzers": [
     {
-      "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
-      "name":"myCustomAnalyzer",
-      "charFilters":[],
-      "tokenizer":"keyword_v2",
-      "tokenFilters":["lowercase"]
+      "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
+      "name":"myCustomAnalyzer",
+      "charFilters":[],
+      "tokenizer":"keyword_v2",
+      "tokenFilters":["lowercase"]
     }
   ],
   "tokenizers":[],
````
````diff
@@ -241,52 +241,95 @@ The previous sections explained the logic. This section steps through each API y
 
 For infix and suffix queries, such as querying "num" or "numeric" to find a match on "alphanumeric", use the full Lucene syntax and a regular expression: `search=/.*num.*/&queryType=full`
 
-## Tune query performance
+## Optimizing prefix and suffix queries
 
 If you implement the recommended configuration that includes the keyword_v2 tokenizer and lower-case token filter, you might notice a decrease in query performance due to the extra token filter processing over existing tokens in your index.
 
-The following example adds an [EdgeNGramTokenFilter](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html) to make prefix matches faster. Tokens are generated in 2-25 character combinations. Here's an example progression from two to seven tokens: MS, MSF, MSFT, MSFT/, MSFT/S, MSFT/SQ, MSFT/SQL.
+The following example adds an [`EdgeNGramTokenFilter`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html) to make prefix matches faster. Tokens are generated in 2-25 character combinations. Here's an example progression from two to seven tokens: MS, MSF, MSFT, MSFT/, MSFT/S, MSFT/SQ, MSFT/SQL. `EdgeNGramTokenFilter` requires a `side` parameter that determines which side of the string the character combinations are generated from. Use `front` for prefix queries and `back` for suffix queries.
 
 Extra tokenization results in a larger index. If you have sufficient capacity to accommodate the larger index, this approach with its faster response time might be the best solution.
 
 ```json
 {
   "fields": [
     {
-      "name": "accountNumber",
-      "analyzer":"myCustomAnalyzer",
-      "type": "Edm.String",
-      "searchable": true,
-      "filterable": true,
-      "retrievable": true,
-      "sortable": false,
-      "facetable": false
+      "name": "accountNumber_prefix",
+      "indexAnalyzer": "ngram_front_analyzer",
+      "searchAnalyzer": "keyword",
+      "type": "Edm.String",
+      "searchable": true,
+      "filterable": false,
+      "retrievable": true,
+      "sortable": false,
+      "facetable": false
+    },
+    {
+      "name": "accountNumber_suffix",
+      "indexAnalyzer": "ngram_back_analyzer",
+      "searchAnalyzer": "keyword",
+      "type": "Edm.String",
+      "searchable": true,
+      "filterable": false,
+      "retrievable": true,
+      "sortable": false,
+      "facetable": false
     }
   ],
 
   "analyzers": [
     {
-      "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
-      "name":"myCustomAnalyzer",
-      "charFilters":[],
-      "tokenizer":"keyword_v2",
-      "tokenFilters":["lowercase", "my_edgeNGram"]
+      "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
+      "name":"ngram_front_analyzer",
+      "charFilters":[],
+      "tokenizer":"keyword_v2",
+      "tokenFilters":["lowercase", "front_edgeNGram"]
+    },
+    {
+      "@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
+      "name":"ngram_back_analyzer",
+      "charFilters":[],
+      "tokenizer":"keyword_v2",
+      "tokenFilters":["lowercase", "back_edgeNGram"]
     }
   ],
   "tokenizers":[],
   "charFilters": [],
   "tokenFilters": [
     {
-      "@odata.type":"#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
-      "name":"my_edgeNGram",
-      "minGram": 2,
-      "maxGram": 25,
-      "side": "front"
+      "@odata.type":"#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
+      "name":"front_edgeNGram",
+      "minGram": 2,
+      "maxGram": 25,
+      "side": "front"
+    },
+    {
+      "@odata.type":"#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
+      "name":"back_edgeNGram",
+      "minGram": 2,
+      "maxGram": 25,
+      "side": "back"
     }
   ]
 }
 ```
 
+To search for account numbers that start with `123`, use the following query:
+
+```json
+{
+  "search": "123",
+  "searchFields": "accountNumber_prefix"
+}
+```
+
+To search for account numbers that end with `456`, use the following query:
+
+```json
+{
+  "search": "456",
+  "searchFields": "accountNumber_suffix"
+}
+```
+
 ## Next steps
 
 This article explains how analyzers both contribute to query problems and solve query problems. As a next step, take a closer look at how analyzers affect indexing and query processing.
````
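The effect of the `side` parameter on the edge n-gram filter can be sketched in plain Python (an illustration of the technique, not the service's implementation). Because the `lowercase` filter runs before the edge n-gram filter in the analyzer chains above, the generated grams are lowercase:

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 25,
                side: str = "front") -> list[str]:
    """Illustrative approximation of EdgeNGramTokenFilterV2: emit prefixes
    of the token (side='front') or suffixes (side='back'), from min_gram
    to max_gram characters long."""
    upper = min(max_gram, len(token))
    if side == "front":
        return [token[:n] for n in range(min_gram, upper + 1)]
    return [token[-n:] for n in range(min_gram, upper + 1)]

# Matches the progression in the commit (lowercased by the preceding filter):
print(edge_ngrams("msft/sql"))
# ['ms', 'msf', 'msft', 'msft/', 'msft/s', 'msft/sq', 'msft/sql']
print(edge_ngrams("msft/sql", side="back"))
# ['ql', 'sql', '/sql', 't/sql', 'ft/sql', 'sft/sql', 'msft/sql']
```

This also shows why the fields pair an n-gram `indexAnalyzer` with a `keyword` `searchAnalyzer`: the index stores every prefix (or suffix) as its own token, so the query string only needs to match one stored gram verbatim.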
