articles/search/query-lucene-syntax.md (2 additions, 2 deletions)
@@ -148,7 +148,7 @@ If you get syntax errors in your regular expression, review the [escape rules](#
## <a name="bkmk_wildcard"></a> Wildcard search

-You can use generally recognized syntax for multiple (`*`) or single (`?`) character wildcard searches. Full Lucene syntax supports prefix, infix, and suffix matching.
+You can use generally recognized syntax for multiple (`*`) or single (`?`) character wildcard searches. Full Lucene syntax supports prefix and infix matching. Use [regular expression](#bkmk_regex) syntax for suffix matching.

Note that the Lucene query parser supports the use of these symbols with a single term, not a phrase.
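As an editorial illustration (not part of this diff), the same wildcard syntax can be sent in a Search Documents request body. Here's a minimal sketch reusing the `980?2*` example from the next hunk; `search` and `queryType` are standard request parameters, and setting `queryType` to `full` is what routes the expression to the full Lucene parser rather than the default simple parser:

```
{
  "search": "980?2*",
  "queryType": "full"
}
```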
@@ -163,7 +163,7 @@ You can combine operators in one expression. For example, `980?2*` matches on `9
Suffix matching requires the regular expression forward slash `/` delimiters. Generally, you can’t use a `*` or `?` symbol as the first character of a term without the `/`. It's also important to note that the `*` behaves differently when used outside of regex queries. Outside of the regex forward slash `/` delimiter, the `*` is a wildcard character and matches any series of characters, much like `.*` in regex. As an example, `search=/non.*al/` produces the same result set as `search=non*al`.

> [!NOTE]
-> As a rule, pattern matching is slow so you might want to explore alternative methods, such as edge n-gram tokenization that creates tokens for sequences of characters in a term. With n-gram tokenization, the index will be larger, but queries might execute faster, depending on the pattern construction and the length of strings you are indexing. For more information, see [Partial term search and patterns with special characters](search-query-partial-matching.md#tune-query-performance).
+> As a rule, pattern matching is slow so you might want to explore alternative methods, such as edge n-gram tokenization that creates tokens for sequences of characters in a term. With n-gram tokenization, the index will be larger, but queries might execute faster, depending on the pattern construction and the length of strings you are indexing. For more information, see [Partial term search and patterns with special characters](search-query-partial-matching.md#optimizing-prefix-and-suffix-queries).
articles/search/search-query-partial-matching.md

@@ -241,52 +241,95 @@ The previous sections explained the logic. This section steps through each API y
For infix and suffix queries, such as querying "num" or "numeric" to find a match on "alphanumeric", use the full Lucene syntax and a regular expression: `search=/.*num.*/&queryType=full`
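Expressed as a request body rather than a query string, that same regex query would look like the following sketch (the parameters simply mirror the inline example above):

```
{
  "search": "/.*num.*/",
  "queryType": "full"
}
```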
-## Tune query performance
+## Optimizing prefix and suffix queries

-If you implement the recommended configuration that includes the keyword_v2 tokenizer and lower-case token filter, you might notice a decrease in query performance due to the extra token filter processing over existing tokens in your index.
+Matching prefixes and suffixes using the default analyzer requires additional query features. Prefixes require [wildcard search](query-lucene-syntax.md#bkmk_wildcard) and suffixes require [regular expression search](query-lucene-syntax.md#bkmk_regex). Both of these features can reduce query performance.

-The following example adds an [EdgeNGramTokenFilter](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html) to make prefix matches faster. Tokens are generated in 2-25 character combinations that include characters. Here's an example progression from two to seven tokens: MS, MSF, MSFT, MSFT/, MSFT/S, MSFT/SQ, MSFT/SQL.
+The following example adds an [`EdgeNGramTokenFilter`](https://lucene.apache.org/core/6_6_1/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html) to make prefix or suffix matches faster. Tokens are generated in 2-25 character combinations. Here's an example progression from two to seven tokens: MS, MSF, MSFT, MSFT/, MSFT/S, MSFT/SQ, MSFT/SQL. `EdgeNGramTokenFilter` requires a `side` parameter that determines which side of the string character combinations are generated from. Use `front` for prefix queries and `back` for suffix queries.
Extra tokenization results in a larger index. If you have sufficient capacity to accommodate the larger index, this approach with its faster response time might be the best solution.
+To search for account numbers that start with `123`, we can use the following query:
+
+```
+{
+  "search": "123",
+  "searchFields": "accountNumber_prefix"
+}
+```
+
+To search for account numbers that end with `456`, we can use the following query:
+
+```
+{
+  "search": "456",
+  "searchFields": "accountNumber_suffix"
+}
+```
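For these queries to work as shown, the `accountNumber_prefix` and `accountNumber_suffix` fields need n-gram analyzers along the lines sketched earlier. A common arrangement (hypothetical here, since the actual field definitions sit in the collapsed portion of the diff) applies the n-gram analyzer only at indexing time and a plain keyword-plus-lowercase analyzer at query time:

```
{
  "name": "accountNumber_suffix",
  "type": "Edm.String",
  "searchable": true,
  "indexAnalyzer": "suffix_analyzer",
  "searchAnalyzer": "keyword_search_analyzer"
}
```

Splitting `indexAnalyzer` and `searchAnalyzer` this way keeps the query term `456` as a single token, so it matches the `456` gram emitted at indexing time instead of being expanded into n-grams itself.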
## Next steps
This article explains how analyzers both contribute to query problems and solve query problems. As a next step, take a closer look at how analyzers affect indexing and query processing.