Commit 02c1128

Checkpoint on fuzzy
1 parent f73a867 commit 02c1128

File tree

1 file changed

+25
-12
lines changed

articles/search/search-query-fuzzy.md

Lines changed: 25 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -10,29 +10,37 @@ ms.service: cognitive-search
1010
ms.topic: conceptual
1111
ms.date: 04/06/2020
1212
---
13-
# Fuzzy search for auto-corrected misspellings and typos
13+
# Fuzzy search to correct misspellings and typos
1414

15-
Azure Cognitive Search provides fuzzy search, a type of query that scans for highly similar terms in addition to the verbatim term. Expanding search to include a near-match has the effect of auto-correcting a typo when the discrepancy is just a few characters off.
15+
Azure Cognitive Search provides fuzzy search, a type of query that scans for highly similar terms in addition to the exact term. Expanding search to include a near-match has the effect of auto-correcting a typo when the discrepancy is just a few characters off.
1616

1717
## What is fuzzy search?
1818

19-
It's an expansion exercise that produces a match on similarly constructed terms, where the first character is the same, but other discrepancies are limited to one to two characters within the term. For example, given `"special~"`, s
19+
It's an expansion exercise that produces a match on similar terms. A term is considered similar if its first character matches the first character of the query term and the remaining differences amount to two or fewer edits, where an edit is a single character that is inserted, deleted, substituted, or transposed.
2020

21-
A fuzzy search can expand a term up to 50 additional terms, but a typical expansion is usually much less.
21+
The string correction algorithm that specifies the difference between two terms is the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) metric, "where distance is the minimum number of operations (insertions, deletions, substitutions, or transpositions of two adjacent characters) required to change one word into the other".
2222

23-
The distance criteria is the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) metric, where distance is the minimum number of operations (insertions, deletions, substitutions, or transpositions of two adjacent characters) required to change one word into the other. By default the distance is 2. A value of `~0` signifies no expansion (search for the exact term as given), and `~1` signifies one degree of difference.
23+
In Azure Cognitive Search:
2424

25-
Fuzzy search applies to whole terms, but you can support phrases through AND constructions. For example, "Unviersty~ of~ "Wshington~" would match on "University of Washington".
25+
+ The default distance is 2. A value of `~0` signifies no expansion (only the exact term is considered a match), and `~1` signifies one degree of difference.
26+
27+
+ A fuzzy query can expand a term to up to 50 additional permutations, although a typical expansion produces far fewer.
28+
29+
+ Fuzzy search applies to whole terms, but you can support phrases through AND constructions. For example, "Unviersty~ of~ Wshington~" would match on "University of Washington".
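
To make the distance metric concrete, here is a minimal Python sketch of the optimal string alignment variant of Damerau-Levenshtein distance. It is an illustration only, not the implementation used by Azure Cognitive Search or Lucene, and the function name `osa_distance` is made up for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Count the insertions, deletions, substitutions, and adjacent
    transpositions needed to turn string a into string b."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the first i chars of a
    # and the first j chars of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped counts as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(osa_distance("Unviersty", "University"))
print(osa_distance("Wshington", "Washington"))
```

Under this metric, "Unviersty" is one transposition plus one insertion away from "University", so it falls within the default distance of 2.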
2630

2731
## How to use fuzzy search
2832

2933
Fuzzy search is constructed using the full Lucene query syntax, invoking the [Lucene query parser](https://lucene.apache.org/core/6_6_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html).
3034

31-
+ Set the full Lucene parser (`queryType=full`) on the query
35+
1. Set the full Lucene parser on the query (`queryType=full`).
36+
37+
1. In the query request, make sure you are targeting non-analyzed or minimally analyzed fields that keep strings intact. Use this parameter to target a query on specific fields (`searchFields=<field1,field2>`).
3238

33-
+ In the query request, specify fields that were analyzed during indexing with a custom analyzer or built-in analyzer that keeps strings intact (with minimal transformations and reductions). The `searchFields` parameter is used to target a query on specific fields.
39+
This step assumes you have defined a field that contains the exact string, and that the field was analyzed during indexing with a content-preserving analyzer, such as a keyword analyzer.
3440

35-
+ Use the tilde (`~`) operator at the end of a single word with an optional parameter, a number between 0 and 2 (default), that specifies the edit distance. For example, "blue~" or "blue~1" would return "blue", "blues", and "glue".
41+
1. Use the tilde (`~`) operator at the end of the whole term (`search=<string>~`).
42+
43+
Alternatively, you can include an optional parameter, a number between 0 and 2 (default), that specifies the edit distance (`~1`). For example, "blue~" or "blue~1" would return "blue", "blues", and "glue".
3644

3745
In Azure Cognitive Search, besides the term and distance (up to 2), there are no additional parameters to set on the query.
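
The three steps can be sketched as a small Python helper that assembles the query string. This is a hedged illustration: the helper name `fuzzy_query_params` is hypothetical, and only the `search`, `queryType`, and `searchFields` parameters come from the text above. The service URL, index name, API version, and authentication are deliberately left out.

```python
from urllib.parse import urlencode


def fuzzy_query_params(term, distance=None, search_fields=None):
    """Build a URL-encoded query string for a fuzzy search on one term.

    term          -- a single whole term, for example "blue"
    distance      -- optional edit distance (0, 1, or 2); omitted means default
    search_fields -- optional list of field names to target
    """
    if distance is not None and distance not in (0, 1, 2):
        raise ValueError("edit distance must be 0, 1, or 2")

    # Step 3: append the tilde operator, with the distance if one was given.
    suffix = "~" if distance is None else f"~{distance}"

    # Step 1: fuzzy search requires the full Lucene parser.
    params = {"search": term + suffix, "queryType": "full"}

    # Step 2: optionally scope the query to specific fields.
    if search_fields:
        params["searchFields"] = ",".join(search_fields)

    return urlencode(params)


print(fuzzy_query_params("blue", distance=1, search_fields=["Description"]))
```

The helper rejects distances outside 0 to 2 up front, so an invalid value fails in the client rather than in the request.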
3846

@@ -41,9 +49,14 @@ In Azure Cognitive Search, besides the term and distance (up to 2), there are no
4149
4250
## How to test fuzzy search
4351

44-
For testing, we recommend Search explorer or Postman for iterating over a query expression. You can introduce permutations of a term and evaluate the responses that come back.
52+
For simple testing, we recommend [Search explorer](search-explorer.md) or [Postman](search-get-started-postman.md) for iterating over a query expression. Both tools are interactive, which means you can quickly step through multiple variants of a term and evaluate the responses that come back.
53+
54+
When results are ambiguous, [hit highlighting](search-pagination-page-layout.md#hit-highlighting) can help you identify the match in the response.
55+
56+
> [!Important]
57+
> This technique works best for focused testing on fuzzy search itself. If your index has scoring profiles, or if you combine fuzzy search with additional syntax, hit highlighting might not work in those situations.
4558
46-
When results are ambiguous, [hit highlighting](search-pagination-page-layout.md#hit-highlighting) can help you identify the match in the response. For example, assume you have a document with this string: `"Description": "Test queries with special characters, plus strings for MSFT, SQL and Java."`
59+
Assume the following string exists in a `"Description"` field in a search document: `"Test queries with special characters, plus strings for MSFT, SQL and Java."`
4760

4861
Start with a fuzzy search on "special" and add hit highlighting to the Description field:
4962

@@ -75,7 +88,7 @@ Notice that the same response is returned, but now instead of matching on "speci
7588

7689
"@search.score": 0.4232868,
7790
"@search.highlights": {
78-
"longerText": [
91+
"Description": [
7992
"Mix of special characters, plus strings for MSFT, <em>SQL</em>, 2019, Linux, Java."
8093
]
8194
