Commit 1e0ec3b

last edits from Ishan
1 parent e4e3686 commit 1e0ec3b

1 file changed: +4 -4 lines changed
articles/search/search-query-fuzzy.md

Lines changed: 4 additions & 4 deletions
@@ -16,11 +16,11 @@ Azure Cognitive Search supports fuzzy search, a type of query that compensates f
 
 ## What is fuzzy search?
 
-It's an expansion exercise that produces a match on terms having a similar composition. When a fuzzy search is specified, the engine builds a graph of similarly composed terms, for all whole terms in the query. For example, if your query includes three terms "university of washington", a graph is created for each term (`search=university~ of~ washington~`).
+It's an expansion exercise that produces a match on terms having a similar composition. When a fuzzy search is specified, the engine builds a graph (based on [deterministic finite automaton theory](https://en.wikipedia.org/wiki/Deterministic_finite_automaton)) of similarly composed terms, for all whole terms in the query. For example, if your query includes three terms "university of washington", a graph is created for every term in the query `search=university~ of~ washington~` (there is no stop-word removal in fuzzy search, so "of" gets a graph).
 
 The graph consists of up to 50 expansions, or permutations, of each term, capturing both correct and incorrect variants in the process. The engine then returns the topmost relevant matches in the response.
 
-For a term like "university", the graph might have "unversty, universty, university, universe, inverse". Any documents that match on those in the graph are included in results. In contrast with language analyzers that can handle irregularities between singular and plural forms of the same word ("mice" and "mouse"), the comparisons in a fuzzy query are taken at face value with no attempt at reconciling the semantic differences. "Universe" and "inverse" will match because the character discrepancies are small.
+For a term like "university", the graph might have "unversty, universty, university, universe, inverse". Any documents that match on those in the graph are included in results. In contrast with other queries that analyze the text to handle different forms of the same word ("mice" and "mouse"), the comparisons in a fuzzy query are taken at face value without any linguistic analysis on the text. "Universe" and "inverse", which are semantically different, will match because the syntactic discrepancies are small.
 
 A match succeeds if the discrepancies are limited to two or fewer edits, where an edit is an inserted, deleted, substituted, or transposed character. The string correction algorithm that specifies the differential is the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) metric, described as the "minimum number of operations (insertions, deletions, substitutions, or transpositions of two adjacent characters) required to change one word into the other".
 
@@ -37,7 +37,7 @@ In Azure Cognitive Search:
 Collectively, the graphs are submitted as match criteria against tokens in the index. As you can imagine, fuzzy search is inherently slower than other query forms. The size and complexity of your index can determine whether the benefits are enough to offset the latency of the response.
 
 > [!NOTE]
-> Because fuzzy search tends to be slow, it might be worthwhile to investigate alternatives such as n-gram indexing, with its progression of short character sequences (two and three character sequences for bigram and trigram tokens). Depending on your language and query surface, n-gram might give you better performance.
+> Because fuzzy search tends to be slow, it might be worthwhile to investigate alternatives such as n-gram indexing, with its progression of short character sequences (two and three character sequences for bigram and trigram tokens). Depending on your language and query surface, n-gram might give you better performance. The trade off is that n-gram indexing is very storage intensive and generates much bigger indexes.
 >
 > Another alternative, which you could consider if you want to handle just the most egregious cases, would be a [synonym map](search-synonyms.md). For example, mapping "search" to "serach, serch, sarch", or "retrieve" to "retreive".
 
@@ -90,7 +90,7 @@ In the response, because you added hit highlighting, formatting is applied to "s
 "Test queries with <em>special</em> characters, plus strings for MSFT, SQL and Java."
 ]
 
-Try the request again, misspelling "special" by taking out letters several letters ("pe"):
+Try the request again, misspelling "special" by taking out several letters ("pe"):
 
 search=scial~&highlight=Description
 
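
For reference, the "two or fewer edits" rule described in the changed article can be illustrated with a small edit-distance function. This is only a sketch, not the search service's implementation: it computes the restricted Damerau-Levenshtein distance (also called optimal string alignment), and the names `osa_distance` and `within_fuzzy_limit` are hypothetical helpers introduced here for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of two
    adjacent characters. Illustrative only; not the engine's algorithm.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def within_fuzzy_limit(term: str, candidate: str, max_edits: int = 2) -> bool:
    """Hypothetical helper: True if candidate is within max_edits of term."""
    return osa_distance(term, candidate) <= max_edits


print(osa_distance("scial", "special"))
print(osa_distance("serach", "search"))
```

Under this sketch, "scial" is two insertions away from "special", so it falls inside the two-edit limit used in the example query, while "serach" differs from "search" by a single adjacent transposition.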
