Commit ce02aae

Merge pull request #202499 from HeidiSteen/heidist-support-case
[azure search] Updates on fuzzy search per support case
2 parents 5ba4759 + d3eb068 commit ce02aae

1 file changed: +29 -18 lines changed

articles/search/search-query-fuzzy.md

Lines changed: 29 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -1,39 +1,41 @@
11
---
22
title: Fuzzy search
33
titleSuffix: Azure Cognitive Search
4-
description: Implement a "did you mean" search experience to auto-correct a misspelled term or typo.
4+
description: Implement a fuzzy search query for a "did you mean" search experience. Fuzzy search will auto-correct a misspelled term or typo on the query.
55

66
manager: nitinme
77
author: HeidiSteen
88
ms.author: heidist
99
ms.service: cognitive-search
10-
ms.topic: conceptual
11-
ms.date: 03/03/2021
10+
ms.topic: how-to
11+
ms.date: 06/22/2022
1212
---
1313
# Fuzzy search to correct misspellings and typos
1414

1515
Azure Cognitive Search supports fuzzy search, a type of query that compensates for typos and misspelled terms in the input string. It does this by scanning for terms having a similar composition. Expanding search to cover near-matches has the effect of auto-correcting a typo when the discrepancy is just a few misplaced characters.
1616

1717
## What is fuzzy search?
1818

19-
It's an expansion exercise that produces a match on terms having a similar composition. When a fuzzy search is specified, the engine builds a graph (based on [deterministic finite automaton theory](https://en.wikipedia.org/wiki/Deterministic_finite_automaton)) of similarly composed terms, for all whole terms in the query. For example, if your query includes three terms "university of washington", a graph is created for every term in the query `search=university~ of~ washington~` (there is no stop-word removal in fuzzy search, so "of" gets a graph).
19+
It's a query expansion exercise that produces a match on terms having a similar composition. When a fuzzy search is specified, the search engine builds a graph (based on [deterministic finite automaton theory](https://en.wikipedia.org/wiki/Deterministic_finite_automaton)) of similarly composed terms, for all whole terms in the query. For example, if your query includes three terms "university of washington", a graph is created for every term in the query `search=university~ of~ washington~` (there's no stop-word removal in fuzzy search, so "of" gets a graph).
2020

2121
The graph consists of up to 50 expansions, or permutations, of each term, capturing both correct and incorrect variants in the process. The engine then returns the topmost relevant matches in the response.
2222

2323
For a term like "university", the graph might have "unversty, universty, university, universe, inverse". Any documents that match on those in the graph are included in results. In contrast with other queries that analyze the text to handle different forms of the same word ("mice" and "mouse"), the comparisons in a fuzzy query are taken at face value without any linguistic analysis on the text. "Universe" and "inverse", which are semantically different, will match because the syntactic discrepancies are small.
2424

25-
A match succeeds if the discrepancies are limited to two or fewer edits, where an edit is an inserted, deleted, substituted, or transposed character. The string correction algorithm that specifies the differential is the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) metric, described as the "minimum number of operations (insertions, deletions, substitutions, or transpositions of two adjacent characters) required to change one word into the other".
25+
A match succeeds if the discrepancies are limited to two or fewer edits, where an edit is an inserted, deleted, substituted, or transposed character. The string correction algorithm that specifies the differential is the [Damerau-Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) metric. It's described as the "minimum number of operations (insertions, deletions, substitutions, or transpositions of two adjacent characters) required to change one word into the other".
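
The edit-distance rule above can be illustrated with a short Python sketch. It computes the restricted Damerau-Levenshtein distance (the "optimal string alignment" variant), which counts insertions, deletions, substitutions, and transpositions of adjacent characters. The function name `osa_distance` is hypothetical, and this is only an illustration of the metric, not the search engine's internal implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Illustrative only: the name and approach are assumptions, not the
    service's actual implementation.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the first i chars of a and first j chars of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# A term is a fuzzy match candidate when the distance is 2 or less.
for query, indexed in [("unviersty", "university"), ("universe", "inverse"), ("blue", "glue")]:
    print(query, indexed, osa_distance(query, indexed))
```

Under this metric, "unviersty" is two edits away from "university" (one transposition plus one insertion), so it falls within the default distance.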
2626

2727
In Azure Cognitive Search:
2828

2929
+ Fuzzy query applies to whole terms, but you can support phrases through AND constructions. For example, "Unviersty~ of~ Wshington~" would match on "University of Washington".
3030

3131
+ The default edit distance is 2. A value of `~0` signifies no expansion (only the exact term is considered a match), whereas `~1` allows one degree of difference, or one edit.
3232

33-
+ A fuzzy query can expand a term up to 50 additional permutations. This limit is not configurable, but you can effectively reduce the number of expansions by decreasing the edit distance to 1.
33+
+ A fuzzy query can expand a term up to 50 permutations. This limit isn't configurable, but you can effectively reduce the number of expansions by decreasing the edit distance to 1.
3434

3535
+ Responses consist of documents containing a relevant match (up to 50).
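
To make the interaction between edit distance and the expansion cap concrete, here is a rough Python sketch. It generates every string reachable within `max_edits` single-character edits of a term, keeps those that appear in a vocabulary of indexed tokens, and truncates the result to 50. The names `one_edit_variants`, `expand`, and `MAX_EXPANSIONS` are hypothetical, the alphabetical ordering used before truncation is an assumption, and the real engine builds an automaton rather than enumerating strings.

```python
import string

MAX_EXPANSIONS = 50  # expansion cap stated in the article


def one_edit_variants(term, alphabet=string.ascii_lowercase):
    """All strings exactly one insert, delete, substitute, or transpose away."""
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    substitutes = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | transposes | substitutes | inserts


def expand(term, vocabulary, max_edits=2):
    """Return indexed tokens within max_edits of term, capped at MAX_EXPANSIONS."""
    term = term.lower()  # lowercasing is the only transformation applied to the query
    reachable = {term}
    frontier = {term}
    for _ in range(max_edits):
        frontier = {v for t in frontier for v in one_edit_variants(t)} - reachable
        reachable |= frontier
    # Assumption: alphabetical order stands in for the engine's own selection.
    return sorted(reachable & set(vocabulary))[:MAX_EXPANSIONS]


vocabulary = {"seattle", "settle", "seat", "tacoma"}
print(expand("seatle", vocabulary, max_edits=1))
print(expand("seatle", vocabulary, max_edits=2))
```

With a distance of 0, only the exact term survives. Each additional edit widens the candidate set considerably, which is why lowering the distance to 1 effectively reduces the number of expansions.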
3636

37+
During query processing, fuzzy queries don't undergo [lexical analysis](search-lucene-query-architecture.md#stage-2-lexical-analysis). The query input is added directly to the query tree and expanded to create a graph of terms. The only transformation performed is lower casing.
38+
3739
Collectively, the graphs are submitted as match criteria against tokens in the index. As you can imagine, fuzzy search is inherently slower than other query forms. The size and complexity of your index can determine whether the benefits are enough to offset the latency of the response.
3840

3941
> [!NOTE]
@@ -43,37 +45,46 @@ Collectively, the graphs are submitted as match criteria against tokens in the i
4345
4446
## Indexing for fuzzy search
4547

46-
Analyzers are not used during query processing to create an expansion graph, but that doesn't mean analyzers should be ignored in fuzzy search scenarios. After all, analyzers are used during indexing to create tokens against which matching is done, whether the query is free form, filtered search, or a fuzzy search with a graph as input.
48+
Make sure the index includes text fields that are conducive to fuzzy search, such as names, categories, descriptions, or tags.
4749

48-
Generally, when assigning analyzers on a per-field basis, the decision to fine-tune the analysis chain is based on the primary use case (a filter or full text search) rather than specialized query forms like fuzzy search. For this reason, there is not a specific analyzer recommendation for fuzzy search.
50+
Analyzers aren't used to create an expansion graph, but that doesn't mean analyzers should be ignored in fuzzy search scenarios. Analyzers are important for tokenization during indexing, where tokens are used for both full text search and for matching against the graph.
4951

50-
However, if test queries are not producing the matches you expect, you could try varying the indexing analyzer, setting it to a [language analyzer](index-add-language-analyzers.md), to see if you get better results. Some languages, particularly those with vowel mutations, can benefit from the inflection and irregular word forms generated by the Microsoft natural language processors. In some cases, using the right language analyzer can make a difference in whether a term is tokenized in a way that is compatible with the value provided by the user.
52+
As always, if test queries aren't producing the matches you expect, you could try varying the indexing analyzer, setting it to a [language analyzer](index-add-language-analyzers.md), to see if you get better results. Some languages, particularly those with vowel mutations, can benefit from the inflection and irregular word forms generated by the Microsoft natural language processors. In some cases, using the right language analyzer can make a difference in whether a term is tokenized in a way that is compatible with the value provided by the user.
5153

5254
## How to use fuzzy search
5355

54-
Fuzzy queries are constructed using the full Lucene query syntax, invoking the [Lucene query parser](https://lucene.apache.org/core/6_6_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html).
56+
Fuzzy queries are constructed using the full Lucene query syntax, invoking the [full Lucene query parser](https://lucene.apache.org/core/6_6_1/queryparser/org/apache/lucene/queryparser/classic/package-summary.html).
57+
58+
```http
59+
POST https://[service name].search.windows.net/indexes/hotels-sample-index/docs/search?api-version=2020-06-30
60+
{
61+
"search": "seatle~2",
62+
"queryType": "full",
63+
"searchMode": "any",
64+
"searchFields": "HotelName, Address/City",
65+
"select": "HotelName, Address/City",
66+
"count": true
67+
}
68+
```
5569

56-
1. Set the full Lucene parser on the query (`queryType=full`).
70+
1. Set the query type to the full Lucene syntax (`queryType=full`).
5771

5872
1. Optionally, scope the request to specific fields, using this parameter (`searchFields=<field1,field2>`).
5973

60-
1. Append the tilde (`~`) operator at the end of the whole term (`search=<string>~`).
74+
1. Provide the query string. An expansion graph will be created for every term in the query input. Append the tilde (`~`) operator at the end of each whole term (`search=<string>~`).
6175

6276
Include an optional parameter, a number between 0 and 2 (default), if you want to specify the edit distance (`~1`). For example, "blue~" or "blue~1" would return "blue", "blues", and "glue".
6377

64-
In Azure Cognitive Search, besides the term and distance (maximum of 2), there are no additional parameters to set on the query.
65-
66-
> [!NOTE]
67-
> During query processing, fuzzy queries do not undergo [lexical analysis](search-lucene-query-architecture.md#stage-2-lexical-analysis). The query input is added directly to the query tree and expanded to create a graph of terms. The only transformation performed is lower casing.
78+
In Azure Cognitive Search, besides the term and distance (maximum of 2), there are no other parameters to set on the query.
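
The steps above can be sketched as a small Python helper that assembles the JSON request body. The function name `build_fuzzy_query` is hypothetical, the field names come from the hotels sample request shown earlier, and the helper doesn't escape characters that are reserved in Lucene syntax, so treat it as a starting point rather than a complete client.

```python
import json


def build_fuzzy_query(text, distance=None, search_fields=None):
    """Build a request body for a fuzzy query (illustrative helper).

    distance=None appends a bare "~" (the default distance of 2 applies);
    otherwise distance must be 0, 1, or 2.
    """
    if distance is not None and distance not in (0, 1, 2):
        raise ValueError("edit distance must be 0, 1, or 2")
    suffix = "~" if distance is None else f"~{distance}"
    body = {
        # Append the tilde operator to each whole term.
        "search": " ".join(term + suffix for term in text.split()),
        # Fuzzy search requires the full Lucene query syntax.
        "queryType": "full",
    }
    if search_fields:
        # Optionally scope the query to specific fields.
        body["searchFields"] = ", ".join(search_fields)
    return body


print(json.dumps(build_fuzzy_query("seatle", 2, ["HotelName", "Address/City"]), indent=2))
```

The resulting dictionary can be serialized and sent as the body of a POST request to the index's `docs/search` endpoint.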
6879

6980
## Testing fuzzy search
7081

7182
For simple testing, we recommend [Search explorer](search-explorer.md) or [Postman](search-get-started-rest.md) for iterating over a query expression. Both tools are interactive, which means you can quickly step through multiple variants of a term and evaluate the responses that come back.
7283

7384
When results are ambiguous, [hit highlighting](search-pagination-page-layout.md#hit-highlighting) can help you identify the match in the response.
7485

75-
> [!Note]
76-
> The use of hit highlighting to identify fuzzy matches has limitations and only works for basic fuzzy search. If your index has scoring profiles, or if you layer the query with additional syntax, hit highlighting might fail to identify the match.
86+
> [!NOTE]
87+
> The use of hit highlighting to identify fuzzy matches has limitations and only works for basic fuzzy search. If your index has scoring profiles, or if you layer the query with more syntax, hit highlighting might fail to identify the match.
7788
7889
### Example 1: fuzzy search with the exact term
7990
