articles/search/semantic-answers.md (5 additions & 5 deletions)
@@ -59,7 +59,7 @@ The "searchFields" parameter is critical to returning a high quality answer, bot
+ A query string must not be null and should be formulated as a question. In this preview, the "queryType" and "queryLanguage" must be set exactly as shown in the example.
- The "searchFields" parameter determines which fields provide tokens to the extraction model. Be sure to set this parameter. You must have at least one string field, but include any string field that you think is useful in providing an answer. Only about 8,000 tokens per document are passed into the model. Start the field list with concise fields, and then progress to text-rich fields. For precise guidance on how to set this field, see [Set searchFields](semantic-how-to-query-request.md#searchfields).
+ The "searchFields" parameter determines which fields provide tokens to the extraction model. Be sure to set this parameter. You must have at least one string field, but include any string field that you think is useful in providing an answer. Collectively across all fields in searchFields, only about 8,000 tokens per document are passed into the model. Start the field list with concise fields, and then progress to text-rich fields. For precise guidance on how to set this field, see [Set searchFields](semantic-how-to-query-request.md#searchfields).
+ For "answers", the basic parameter construction is `"answers": "extractive"`, where the default number of answers returned is one. You can increase the number of answers by adding a count, up to a maximum of five. Whether you need more than one answer depends on the user experience of your app, and how you want to render results.
@@ -111,15 +111,15 @@ Given the query "how do clouds form", the following answer is returned in the re
For best results, return semantic answers on a document corpus having the following characteristics:
- "searchFields" should include one or more fields that provide sufficient text in which an answer is likely to be found.
-
- Semantic extraction and summarization have limits over how much content can be analyzed in a timely fashion. Collectively, only the first 20,000 tokens are analyzed. Anything beyond that is ignored. In practical terms, if you have large documents that run into hundreds of pages, you should try to break the content up into manageable parts first.
+
+ "searchFields" must provide fields that offer sufficient text in which an answer is likely to be found. Only verbatim text from a document can appear as an answer.
+
+ Query strings must not be null (search=`*`) and the string should have the characteristics of a question, as opposed to a keyword search (a sequential list of arbitrary terms or phrases). If the query string does not appear to be a question, answer processing is skipped, even if the request specifies "answers" as a query parameter.
+
+ Semantic extraction and summarization have limits on how many tokens per document can be analyzed in a timely fashion. In practical terms, if you have large documents that run into hundreds of pages, you should try to break the content up into smaller documents first.
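Because only verbatim text can appear as an answer, the response surfaces answers as literal passages under "@search.answers". The following is a hedged sketch of that fragment — the document key, passage text, and score are invented, not taken from the article:

```json
"@search.answers": [
    {
        "key": "4123",
        "text": "Clouds form when rising air cools and water vapor condenses around tiny airborne particles.",
        "highlights": "Clouds form when <em>rising air cools</em> and water vapor condenses around tiny airborne particles.",
        "score": 0.94
    }
]
```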
articles/search/semantic-how-to-query-request.md (4 additions & 4 deletions)
@@ -136,7 +136,7 @@ Follow these guidelines to ensure optimum results when two or more searchFields
+ Follow those fields with descriptive fields where the answer to semantic queries may be found, such as the main content of a document.
- If only one field is specified, use a descriptive field where the answer to semantic queries may be found, such as the main content of a document. Choose a field that provides sufficient content. To ensure timely processing, only about 8,000 tokens of the collective contents of searchFields undergo semantic evaluation and ranking.
+ If only one field is specified, use a descriptive field where the answer to semantic queries may be found, such as the main content of a document. Choose a field that provides sufficient content. To ensure timely processing, only about 8,000 tokens of the aggregate contents of searchFields undergo semantic evaluation and ranking.
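To make the ordering concrete, a hedged sketch with invented field names — concise fields first, the text-rich field last, so the concise fields fall inside the roughly 8,000-token window:

```json
"searchFields": "title,category,content"
```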
#### Step 3: Remove orderBy clauses
@@ -186,7 +186,7 @@ The response for the above example query returns the following match as the top
Recall that semantic ranking and responses are built over an initial result set. Any logic that improves the quality of the initial results will carry forward to semantic search. As a next step, review the features that contribute to initial results, including analyzers that affect how strings are tokenized, scoring profiles that can tune results, and the default relevance algorithm.
+ [Analyzers for text processing](search-analyzers.md)
-
+ [Similarity and scoring in Cognitive Search](index-similarity-and-scoring.md)
articles/search/semantic-ranking.md (20 additions & 10 deletions)
@@ -24,25 +24,35 @@ The semantic ranking is both resource and time intensive. In order to complete p
For semantic ranking, the model uses both machine reading comprehension and transfer learning to re-score the documents based on how well each one matches the intent of the query.
- 1. For each document, the semantic ranker evaluates the fields in the searchFields parameter in order, consolidating the contents into one large string.
+ ### Preparation (passage extraction) phase
- 1. The string is then trimmed to ensure the overall length is not more than 8,000 tokens. If you have very large documents, with a content field or merged_content field that has many pages of content, anything after the token limit is ignored.
+ For each document in the initial results, there is a passage extraction exercise that identifies key passages. This is a downsizing exercise that reduces content to an amount that can be processed swiftly.
- 1. Each of the 50 documents is now represented by a single long string. This string is sent to the summarization model. The summarization model produces captions (and answers), using machine reading comprehension to identify passages that appear to summarize the content or answer the question. The output of the summarization model is a further reduced string, which will be at most 128 tokens.
+ 1. For each of the 50 documents, each field in the searchFields parameter is evaluated in consecutive order. Contents from each field are consolidated into one long string.
- 1. The smaller string becomes the caption of the document, and it represents the most relevant passages found in the larger string. The set of 50 (or fewer) captions is then ranked in order of relevance.
+ 1. The long string is then trimmed to ensure the overall length is not more than 8,000 tokens. For this reason, it's recommended that you position concise fields first so that they are included in the string. If you have very large documents with text-heavy fields, anything after the token limit is ignored.
- Conceptual and semantic relevance is established through vector representation and term clusters. Whereas a keyword similarity algorithm might give equal weight to any term in the query, the semantic model has been trained to recognize the interdependency and relationships among words that are otherwise unrelated on the surface. As a result, if a query string includes terms from the same cluster, a document containing both will rank higher than one that doesn't.
+ 1. Each document is now represented by a single long string that is up to 8,000 tokens. These strings are sent to the summarization model, which will reduce the string further. The summarization model evaluates the long string for key sentences or passages that best summarize the document or answer the question.
- :::image type="content" source="media/semantic-search-overview/semantic-vector-representation.png" alt-text="Vector representation for context" border="true":::
+ 1. The output of this phase is a caption (and optionally, an answer). The caption is at most 128 tokens per document, and it is considered the most representative of the document.
- ## Next steps
+ ### Scoring and ranking phases
+
+ In this phase, all 50 captions are evaluated to assess relevance.
+
+ 1. Scoring is determined by evaluating each caption for conceptual and semantic relevance, relative to the query provided.
+
+ The following diagram provides an illustration of what "semantic relevance" means. Consider the term "capital", which could be used in the context of finance, law, geography, or grammar. If a query includes terms from the same vector space (for example, "capital" and "investment"), a document that also includes tokens in the same cluster will score higher than one that doesn't.
- Semantic ranking is offered on Standard tiers, in specific regions. For more information and to sign up, see [Availability and pricing](semantic-search-overview.md#availability-and-pricing).
+ :::image type="content" source="media/semantic-search-overview/semantic-vector-representation.png" alt-text="Vector representation for context" border="true":::
+
+ 1. The output of this phase is an @search.rerankerScore assigned to each document. Once all documents are scored, they are listed in descending order and included in the query response payload.
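To make the output of the two phases concrete, here is a hedged sketch of one scored document in the response payload — the scores, caption text, and `title` field are invented; the property names (`@search.rerankerScore`, `@search.captions`) follow the preview response shape these docs describe:

```json
{
    "@search.score": 11.37,
    "@search.rerankerScore": 2.87,
    "@search.captions": [
        {
            "text": "Clouds form when rising air cools and water vapor condenses around airborne particles.",
            "highlights": "Clouds form when <em>rising air cools</em> and water vapor condenses around airborne particles."
        }
    ],
    "title": "How clouds form"
}
```

Documents in the response are listed by descending `@search.rerankerScore`.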
+
+ ## Next steps
- A new query type enables the relevance ranking and response structures of semantic search. [Create a semantic query](semantic-how-to-query-request.md) to get started.
+ Semantic ranking is offered on Standard tiers, in specific regions. For more information and to sign up, see [Availability and pricing](semantic-search-overview.md#availability-and-pricing). A new query type enables the relevance ranking and response structures of semantic search. To get started, [create a semantic query](semantic-how-to-query-request.md).
Alternatively, review either of the following articles for related information.
-
+ [Add spell check to query terms](speller-how-to-add.md)
articles/search/semantic-search-overview.md (5 additions & 3 deletions)
@@ -8,13 +8,13 @@ author: HeidiSteen
ms.author: heidist
ms.service: cognitive-search
ms.topic: conceptual
- ms.date: 03/12/2021
+ ms.date: 03/18/2021
ms.custom: references_regions
---
# Semantic search in Azure Cognitive Search
> [!IMPORTANT]
- > Semantic search features are in public preview, available through the preview REST API only. Preview features are offered as-is, under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/), and are not guaranteed to have the same implementation at general availability. For more information, see [Availability and pricing](semantic-search-overview.md#availability-and-pricing).
+ > Semantic search is in public preview, available through the preview REST API only. Preview features are offered as-is, under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/), and are not guaranteed to have the same implementation at general availability. These features are billable. For more information, see [Availability and pricing](semantic-search-overview.md#availability-and-pricing).
Semantic search is a collection of query-related features that support a higher-quality, more natural query experience.
@@ -67,4 +67,6 @@ A new query type enables the relevance ranking and response structures of semant
+ [Add spell check to query terms](speller-how-to-add.md)
+ [Find meaningful insights using semantic capabilities (AI Show video)](https://channel9.msdn.com/Shows/AI-Show/Find-meaningful-insights-using-semantic-capabilities-in-Azure-Cognitive-Search)