Commit b4347c8

Merge pull request #110907 from HeidiSteen/heidist-search
[Azure Cog Search] edits
2 parents 7087f79 + 9148a0e commit b4347c8

1 file changed

+14
-10
lines changed

articles/search/search-query-partial-matching.md

Lines changed: 14 additions & 10 deletions
@@ -8,35 +8,35 @@ author: HeidiSteen
88
ms.author: heidist
99
ms.service: cognitive-search
1010
ms.topic: conceptual
11-
ms.date: 04/02/2020
11+
ms.date: 04/09/2020
1212
---
1313
# Partial term search and patterns with special characters (wildcard, regex, patterns)
1414

15-
A *partial term search* refers to queries consisting of term fragments, such as the first, last, or interior parts of a string. A *pattern* might a combination of fragments, sometimes with special characters such as dashes or slashes that are part of the query. Common use-cases include querying for portions of a phone number, URL, people or product codes, or compound words.
15+
A *partial term search* refers to queries consisting of term fragments, where instead of a whole term, you might have just the start, middle, or end of a term (sometimes referred to as prefix, infix, or suffix queries). A *pattern* might be a combination of fragments, often with special characters such as dashes or slashes that are part of the query string. Common use-cases include querying for portions of a phone number, URL, people or product codes, or compound words.
1616

17-
Partial search can be problematic if the index doesn't have terms in the format required for pattern matching. During the text analysis phase of indexing, using the default standard analyzer, special characters are discarded, composite and compound strings are split up, causing pattern queries to fail when no match is found. For example, a phone number like `+1 (425) 703-6214`(tokenized as `"1"`, `"425"`, `"703"`, `"6214"`) won't show up in a `"3-62"` query because that content doesn't actually exist in the index.
17+
Partial and pattern search can be problematic if the index doesn't have terms in the expected format. During the [lexical analysis phase](search-lucene-query-architecture.md#stage-2-lexical-analysis) of indexing (assuming the default standard analyzer), special characters are discarded, composite and compound strings are split up, and whitespace is deleted, all of which can cause pattern queries to fail because no match is found. For example, a phone number like `+1 (425) 703-6214` (tokenized as `"1"`, `"425"`, `"703"`, `"6214"`) won't show up in a `"3-62"` query because that content doesn't actually exist in the index.
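
The effect described above can be illustrated with a minimal Python sketch. It only approximates the behavior: `standard_like_tokens` and `keyword_like_tokens` are hypothetical helpers, not the Lucene standard or keyword analyzers, and the substring test stands in for an infix pattern query against indexed terms.

```python
import re

def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def keyword_like_tokens(text):
    """Rough stand-in for a content-preserving analyzer: one lowercase token, nothing dropped."""
    return [text.lower()]

def has_partial_match(tokens, fragment):
    """True if the fragment occurs inside any single indexed token."""
    return any(fragment.lower() in token for token in tokens)

phone = "+1 (425) 703-6214"
split_tokens = standard_like_tokens(phone)   # dash, spaces, and parentheses are gone
intact_tokens = keyword_like_tokens(phone)   # whole string survives as one token

print(split_tokens)
print(has_partial_match(split_tokens, "3-62"))
print(has_partial_match(intact_tokens, "3-62"))
```

Because the dash never reaches the index in the first case, no single token contains `3-62`; the intact token does.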
1818

1919
The solution is to invoke an analyzer that preserves a complete string, including spaces and special characters if necessary, so that you can match on partial terms and patterns. Creating an additional field for an intact string, plus using a content-preserving analyzer, is the basis of the solution.
2020

2121
## What is partial search in Azure Cognitive Search
2222

2323
In Azure Cognitive Search, partial and pattern search are available in these forms:
2424

25-
+ [Prefix search](query-simple-syntax.md#prefix-search), such as `search=cap*`, matching on "Cap'n Jack's Waterfront Inn" or "Gacc Capital". You can use the simply query syntax for prefix search.
25+
+ [Prefix search](query-simple-syntax.md#prefix-search), such as `search=cap*`, matching on "Cap'n Jack's Waterfront Inn" or "Gacc Capital". You can use the simple query syntax or the full Lucene query syntax for prefix search.
2626

27-
+ [Wildcard search](query-lucene-syntax.md#bkmk_wildcard) or [Regular expressions](query-lucene-syntax.md#bkmk_regex) that search for a pattern or parts of an embedded string, including the suffix. Wildcard and regular expressions require the full Lucene syntax.
27+
+ [Wildcard search](query-lucene-syntax.md#bkmk_wildcard) or [Regular expressions](query-lucene-syntax.md#bkmk_regex) that search for a pattern or parts of an embedded string. Wildcard and regular expressions require the full Lucene syntax. Suffix and infix queries are formulated as a regular expression.
2828

29-
Some examples of partial term search include the following. For a suffix query, given the term "alphanumeric", you would use a wildcard search (`search=/.*numeric.*/`) to find a match. For a partial term that includes characters, such as a URL fragment, you might need to add escape characters. In JSON, a forward slash `/` is escaped with a backward slash `\`. As such, `search=/.*microsoft.com\/azure\/.*/` is the syntax for the URL fragment "microsoft.com/azure/".
29+
Some examples of partial term search include the following. For a suffix query, given the term "alphanumeric", you would use a regular expression (`search=/.*numeric.*/`) to find a match. For a partial term that includes interior characters, such as the slashes in a URL fragment, you might need to add escape characters. In JSON, a forward slash `/` is escaped with a backslash `\`. As such, `search=/.*microsoft.com\/azure\/.*/` is the syntax for the URL fragment "microsoft.com/azure/".
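
As a small sketch of the escaping rule, the Python helper below builds the regular expression shown for the URL fragment. The function name `regex_fragment_query` is hypothetical, and it escapes only forward slashes, mirroring the example in the text rather than covering every special character.

```python
def regex_fragment_query(fragment):
    """Wrap a fragment in /.* ... .*/ and escape forward slashes with a backslash."""
    escaped = fragment.replace("/", "\\/")
    return "/.*" + escaped + ".*/"

suffix_query = regex_fragment_query("numeric")
url_query = regex_fragment_query("microsoft.com/azure/")

print("search=" + suffix_query)
print("search=" + url_query)
```

The slashes that delimit the expression are left alone; only the slashes inside the fragment are escaped.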
3030

3131
As noted, all of the above require that the index contains strings in a format conducive to pattern matching, which the standard analyzer does not provide. By following the steps in this article, you can ensure that the necessary content exists to support these scenarios.
3232

33-
## Solving partial search problems
33+
## Solving partial/pattern search problems
3434

35-
When you need to search on patterns or special characters, you can override the default analyzer with a custom analyzer that operates under simpler tokenization rules, retaining the whole string. Taking a step back, the approach looks like this:
35+
When you need to search on fragments, patterns, or special characters, you can override the default analyzer with a custom analyzer that operates under simpler tokenization rules, retaining the whole string. Taking a step back, the approach looks like this:
3636

3737
+ Define a field to store an intact version of the string (assuming you want analyzed and non-analyzed text)
38-
+ Choose a predefined analyzer or define a custom analyzer to output an intact string
39-
+ Assign the analyzer to the field
38+
+ Choose a predefined analyzer or define a custom analyzer to output a non-analyzed intact string
39+
+ Assign the custom analyzer to the field
4040
+ Build and test the index
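
The steps above might translate into an index definition along these lines. This is a sketch under assumptions: the index name and field names are hypothetical, and it uses the predefined `keyword` analyzer rather than a custom one. The Python code only assembles the JSON body; it does not call the service.

```python
import json

# Hypothetical index and field names; illustration only.
index_definition = {
    "name": "phone-numbers-demo",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        # Analyzed copy for ordinary full text search (default standard analyzer).
        {"name": "phone_number", "type": "Edm.String", "searchable": True},
        # Intact copy: assumes the predefined "keyword" analyzer, which keeps
        # the whole string as a single token.
        {
            "name": "phone_number_intact",
            "type": "Edm.String",
            "searchable": True,
            "analyzer": "keyword",
        },
    ],
}

request_body = json.dumps(index_definition, indent=2)
print(request_body)
```

Keeping both fields lets regular full text queries use the analyzed copy while partial and pattern queries target the intact copy.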
4141

4242
> [!TIP]
@@ -218,6 +218,10 @@ The previous sections explained the logic. This section steps through each API y
218218

219219
+ [Search Documents](https://docs.microsoft.com/rest/api/searchservice/search-documents) explains how to construct a query request, using either [simple syntax](query-simple-syntax.md) or [full Lucene syntax](query-lucene-syntax.md) for wildcard and regular expressions.
220220

221+
For partial term queries, such as querying "3-6214" to find a match on "+1 (425) 703-6214", you can use the simple syntax: `search=3-6214&queryType=simple`.
222+
223+
For infix and suffix queries, such as querying "num" or "numeric" to find a match on "alphanumeric", use the full Lucene syntax and a regular expression: `search=/.*num.*/&queryType=full`.
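
A minimal Python sketch of how the two query strings might be assembled for a GET request follows. The helper name `build_query_string` is hypothetical, and the service URL and API version are deliberately left out. Note that `urlencode` percent-encodes the regular expression, so the encoded form looks different from the readable form shown above.

```python
from urllib.parse import urlencode, parse_qs

def build_query_string(search_text, query_type):
    """Return a URL-encoded query string carrying search and queryType parameters."""
    return urlencode({"search": search_text, "queryType": query_type})

simple_query = build_query_string("3-6214", "simple")   # partial term, simple syntax
regex_query = build_query_string("/.*num.*/", "full")   # infix match, full Lucene syntax

print(simple_query)
print(regex_query)

# Decoding recovers the original regular expression unchanged.
decoded = parse_qs(regex_query)
print(decoded["search"][0])
```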
224+
221225
## Tips and best practices
222226

223227
### Tune query performance
