Commit c0bc334

Merge pull request #110290 from HeidiSteen/heidist-search
[Azure Cognitive Search] edits for clarity/readability
2 parents 577e507 + f6baa6f commit c0bc334

2 files changed (+13, -8 lines)

articles/search/query-lucene-syntax.md

Lines changed: 3 additions & 1 deletion
@@ -100,7 +100,7 @@ Field grouping is similar but scopes the grouping to a single field. For example

 ### OR operator `OR` or `||`

-The OR operator is a vertical bar or pipe character. For example: `wifi || luxury` will search for documents containing either "wifi" or "luxury" or both. Because OR is the default conjunction operator, you could also leave it out, such that `wifi luxury` is the equivalent of `wifi || luxuery`.
+The OR operator is a vertical bar or pipe character. For example: `wifi || luxury` will search for documents containing either "wifi" or "luxury" or both. Because OR is the default conjunction operator, you could also leave it out, such that `wifi luxury` is the equivalent of `wifi || luxury`.

 ### AND operator `AND`, `&&` or `+`

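To make the OR equivalence concrete at the request level, here is a minimal Python sketch that builds GET URLs for both forms of the query against the Search Documents REST endpoint. The service name `my-service`, the index name `hotels`, the `api-version` value, and the helper `build_search_url` are placeholders and assumptions, not values from this commit. The sketch only builds URLs; it does not send them (a real request also needs an `api-key` header).

```python
from urllib.parse import urlencode


def build_search_url(service: str, index: str, search: str,
                     api_version: str = "2019-05-06") -> str:
    """Build a GET URL for a full Lucene query.

    Hypothetical helper: service, index, and api_version are placeholders.
    """
    params = {
        "api-version": api_version,
        "queryType": "full",  # full Lucene syntax, where `||` is the OR operator
        "search": search,
    }
    # urlencode percent-encodes the pipes as %7C and turns spaces into "+".
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{urlencode(params)}"


# Explicit OR versus the default conjunction: per the text, these describe the same query.
explicit = build_search_url("my-service", "hotels", "wifi || luxury")
implicit = build_search_url("my-service", "hotels", "wifi luxury")
print(explicit)
print(implicit)
```

Encoding the query with `urlencode` rather than concatenating strings keeps the `|` characters from being sent raw in the URL.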
@@ -159,6 +159,8 @@ The following example helps illustrate the differences. Suppose that there's a s

 For example, to find documents containing "motel" or "hotel", specify `/[mh]otel/`. Regular expression searches are matched against single words.

+Some tools and languages impose additional escape character requirements. For JSON, strings that include a forward slash are escaped with a backward slash: "microsoft.com/azure/" becomes `search=/.*microsoft.com\/azure\/.*/`, where `search=/.*<string-placeholder>.*/` sets up the regular expression and `microsoft.com\/azure\/` is the string with an escaped forward slash.
+
 ## <a name="bkmk_wildcard"></a> Wildcard search
 You can use generally recognized syntax for multiple (*) or single (?) character wildcard searches. Note the Lucene query parser supports the use of these symbols with a single term, and not a phrase.

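The escaping for the "microsoft.com/azure/" example happens at two layers that are easy to conflate. Inside a Lucene regular expression, `/` delimits the expression, so a literal slash is written `\/`. If the same query is then sent in a JSON request body, a JSON encoder also escapes the backslash, so the body carries `\\/`. The minimal Python sketch below shows both layers. It assumes a POST search request whose body has `search` and `queryType` properties, and `contains_regex` is a hypothetical helper, not part of any SDK.

```python
import json


def contains_regex(fragment: str) -> str:
    """Wrap a literal fragment in a Lucene "contains" regular expression (hypothetical helper).

    Escapes backslashes first, then forward slashes (the regex delimiter).
    Other metacharacters such as "." are left alone to mirror the example in
    the text, so "." still matches any single character.
    """
    escaped = fragment.replace("\\", "\\\\").replace("/", "\\/")
    return f"/.*{escaped}.*/"


# Layer 1: the regular expression itself, as used in a GET `search=` parameter.
query = contains_regex("microsoft.com/azure/")
print(query)  # /.*microsoft.com\/azure\/.*/

# Layer 2: the JSON encoder escapes each backslash again for the request body.
body = json.dumps({"search": query, "queryType": "full"})
print(body)  # {"search": "/.*microsoft.com\\/azure\\/.*/", "queryType": "full"}
```

Decoding the body with `json.loads` yields the layer-1 string again. Writing `\/` by hand inside a JSON string would not work the same way, because JSON decodes `\/` to a plain `/`.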
articles/search/search-query-partial-matching.md

Lines changed: 10 additions & 7 deletions
@@ -10,22 +10,25 @@ ms.service: cognitive-search
 ms.topic: conceptual
 ms.date: 04/02/2020
 ---
-# Partial term search in Azure Cognitive Search queries (wildcard, regex, fuzzy search, patterns)
+# Partial term search and patterns with special characters - Azure Cognitive Search (wildcard, regex, patterns)

-A *partial term search* refers to queries consisting of term fragments, such as the first, last, or interior parts of a string, or a pattern consisting of a combination of fragments, often separated by special characters such as dashes or slashes. Common use-cases include querying for portions of a phone number, URL, people or product codes, or compound words.
+A *partial term search* refers to queries consisting of term fragments, such as the first, last, or interior parts of a string. A *pattern* might be a combination of fragments, sometimes with special characters such as dashes or slashes that are part of the query. Common use cases include querying for portions of a phone number, URL, people or product codes, or compound words.

-Partial search can be problematic because the index itself does not typically store terms in a way that is conducive to partial string and pattern matching. During the text analysis phase of indexing, special characters are discarded, composite and compound strings are split up, causing pattern queries to fail when no match is found. For example, a phone number like `+1 (425) 703-6214`(tokenized as `"1"`, `"425"`, `"703"`, `"6214"`) won't show up in a `"3-62"` query because that content doesn't actually exist in the index.
+Partial search can be problematic if the index doesn't have terms in the format required for pattern matching. During the text analysis phase of indexing, the default standard analyzer discards special characters and splits up composite and compound strings, causing pattern queries to fail when no match is found. For example, a phone number like `+1 (425) 703-6214` (tokenized as `"1"`, `"425"`, `"703"`, `"6214"`) won't show up in a `"3-62"` query because that content doesn't actually exist in the index.

-The solution is to store intact versions of these strings in the index so that you can support partial search scenarios. Creating an additional field for an intact string, plus using a content-preserving analyzer, is the basis of the solution.
+The solution is to invoke an analyzer that preserves a complete string, including spaces and special characters if necessary, so that you can support partial terms and patterns. Creating an additional field for an intact string, plus using a content-preserving analyzer, is the basis of the solution.

 ## What is partial search in Azure Cognitive Search

-In Azure Cognitive Search, partial search is available in these forms:
+In Azure Cognitive Search, partial term search and pattern search are available in these forms:

 + [Prefix search](query-simple-syntax.md#prefix-search), such as `search=cap*`, matching on "Cap'n Jack's Waterfront Inn" or "Gacc Capital". You can use the simple query syntax for prefix search.
-+ [Wildcard search](query-lucene-syntax.md#bkmk_wildcard) or [Regular expressions](query-lucene-syntax.md#bkmk_regex) that search for a pattern or parts of an embedded string, including the suffix. For example, given the term "alphanumeric", you would use a wildcard search (`search=/.*numeric.*/`) for a suffix query match on that term. Wildcard and regular expressions require the full Lucene syntax.

-When any of the above query types are needed in your client application, follow the steps in this article to ensure the necessary content exists in your index.
++ [Wildcard search](query-lucene-syntax.md#bkmk_wildcard) or [Regular expressions](query-lucene-syntax.md#bkmk_regex) that search for a pattern or parts of an embedded string, including the suffix. Wildcard and regular expressions require the full Lucene syntax.
+
+Some examples of partial term search follow. For a suffix query, given the term "alphanumeric", you would use a regular expression search (`search=/.*numeric.*/`) to find a match. For a partial term that includes special characters, such as a URL fragment, you might need to add escape characters. In JSON, a forward slash `/` is escaped with a backward slash `\`, so `search=/.*microsoft.com\/azure\/.*/` is the syntax for the URL fragment "microsoft.com/azure/".
+
+As noted, all of the above require that the index contain strings in a format conducive to pattern matching, which the standard analyzer does not provide. By following the steps in this article, you can ensure that the necessary content exists to support these scenarios.

 ## Solving partial search problems

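The phone-number example above can be reproduced with a small Python simulation. It is a toy model under stated assumptions. `toy_standard_tokens` only approximates the standard analyzer with a `\w+` split, whereas the real Lucene tokenizer follows Unicode text segmentation rules. `keyword_tokens` models a content-preserving analyzer, such as the built-in `keyword` analyzer, that keeps the whole value as one term. `re.fullmatch` stands in for the way a regular expression query must match an entire term. All function names are hypothetical.

```python
import re
from typing import List

PHONE = "+1 (425) 703-6214"
# Regex queries must match a whole term, so the fragment is wrapped in ".*" on both sides.
PATTERN = r".*3-62.*"


def toy_standard_tokens(text: str) -> List[str]:
    """Rough stand-in for the default standard analyzer (a simplification):
    lowercase, then keep runs of word characters, dropping '+', '(', ')', and '-'."""
    return re.findall(r"\w+", text.lower())


def keyword_tokens(text: str) -> List[str]:
    """Content-preserving analysis: the entire value becomes a single term."""
    return [text]


def regex_matches_any(pattern: str, terms: List[str]) -> bool:
    """Anchored match against each indexed term, as a regex query would be evaluated."""
    return any(re.fullmatch(pattern, term) is not None for term in terms)


standard_terms = toy_standard_tokens(PHONE)
intact_terms = keyword_tokens(PHONE)

print(standard_terms)                              # ['1', '425', '703', '6214']
print(regex_matches_any(PATTERN, standard_terms))  # False: no single term contains "3-62"
print(regex_matches_any(PATTERN, intact_terms))    # True: the intact term does
print(regex_matches_any("3-62", intact_terms))     # False: without ".*" the whole term must equal "3-62"
```

Adding a field that stores the intact value, as the article recommends, is what turns the second check from `False` into `True`. The last check shows why the regex examples in the text wrap the fragment in `.*`.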