Commit 709f91d

authored
Merge pull request #178378 from mrbullwinkle/mrb_11_2_2021_updates_004
[Cognitive Services] Q&A Updates
2 parents 3620f6a + 7c7da61 commit 709f91d

3 files changed

+287
-0
lines changed

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
---
2+
title: Limits and boundaries - question answering
3+
description: Question answering has meta-limits for parts of the knowledge base and service. It is important to keep your knowledge base within those limits in order to test and publish.
4+
ms.service: cognitive-services
5+
ms.subservice: language-service
6+
author: mrbullwinkle
7+
ms.author: mbullwin
8+
ms.topic: conceptual
9+
ms.date: 11/02/2021
10+
---
11+
12+
# Project limits and boundaries
13+
14+
The limits below combine the [Azure Cognitive Search pricing tier limits](../../../../search/search-limits-quotas-capacity.md) with limits specific to question answering. Both sets of limits affect how many knowledge bases you can create per resource and how large each knowledge base can grow.
15+
16+
## Knowledge bases
17+
18+
The maximum number of knowledge bases is based on [Azure Cognitive Search tier limits](../../../../search/search-limits-quotas-capacity.md).
19+
20+
|**Azure Cognitive Search tier** | **Free** | **Basic** |**S1** | **S2**| **S3** |**S3 HD**|
21+
|---|---|---|---|---|---|----|
22+
|Maximum number of published knowledge bases allowed|2|14|49|199|199|2,999|
23+
24+
For example, if your tier has 15 allowed indexes, you can publish 14 knowledge bases (one index per published knowledge base). The 15th index, `testkb`, is used for all the knowledge bases for authoring and testing.
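The relationship between a tier's index quota and the number of publishable knowledge bases can be sketched as below. This is a minimal illustration, not part of the service: the function name is hypothetical, and the index quota itself must be taken from the Azure Cognitive Search limits page.

```python
def max_published_knowledge_bases(allowed_indexes: int) -> int:
    """Return how many knowledge bases can be published for a given index quota.

    One index (`testkb`) is shared by all knowledge bases for authoring and
    testing, so it is subtracted from the quota.
    """
    if allowed_indexes < 1:
        raise ValueError("a tier must allow at least one index")
    return allowed_indexes - 1


# The example from the text: 15 allowed indexes leave room for 14 published knowledge bases.
print(max_published_knowledge_bases(15))
```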
25+
26+
## Extraction limits
27+
28+
### File naming constraints
29+
30+
File names may not include the following characters:
31+
32+
|Do not use character|
33+
|--|
34+
|Single quote `'`|
35+
|Double quote `"`|
36+
37+
### Maximum file size
38+
39+
|Format|Max file size (MB)|
40+
|--|--|
41+
|`.docx`|10|
42+
|`.pdf`|25|
43+
|`.tsv`|10|
44+
|`.txt`|10|
45+
|`.xlsx`|3|
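The naming and size constraints above can be checked on the client before a file is uploaded. The following is a sketch only: `check_source_file` is a hypothetical helper, and it assumes that 1 MB means 1024 × 1024 bytes, which the tables above do not specify.

```python
from pathlib import PurePath

# Size limits copied from the table above, in megabytes.
MAX_FILE_SIZE_MB = {".docx": 10, ".pdf": 25, ".tsv": 10, ".txt": 10, ".xlsx": 3}
FORBIDDEN_NAME_CHARS = ("'", '"')
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def check_source_file(file_name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes these checks."""
    problems = []
    if any(ch in file_name for ch in FORBIDDEN_NAME_CHARS):
        problems.append("file name contains a single or double quote")
    suffix = PurePath(file_name).suffix.lower()
    limit_mb = MAX_FILE_SIZE_MB.get(suffix)
    if limit_mb is None:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    elif size_bytes > limit_mb * BYTES_PER_MB:
        problems.append(f"{suffix} files are limited to {limit_mb} MB")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a caller report every issue with a file in a single pass.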
46+
47+
### Maximum number of files
48+
49+
> [!NOTE]
50+
> Question answering currently has no limits on the number of sources that can be added. Throughput is currently capped at 10 transactions per second for both management APIs and prediction APIs.
51+
52+
### Maximum number of deep-links from URL
53+
54+
The maximum number of deep-links that can be crawled for extraction of question answer pairs from a URL page is **20**.
55+
56+
## Metadata limits
57+
58+
Metadata is presented as a text-based `key:value` pair, such as `product:windows 10`. It is stored and compared in lower case. Maximum number of metadata fields is based on your **[Azure Cognitive Search tier limits](../../../../search/search-limits-quotas-capacity.md)**.
59+
60+
If you choose to have projects with multiple languages in a single language resource, there is a dedicated test index per project/knowledge base, so the limit is applied per project/knowledge base in the language service.
61+
62+
|**Azure Cognitive Search tier** | **Free** | **Basic** |**S1** | **S2**| **S3** |**S3 HD**|
63+
|---|---|---|---|---|---|----|
64+
|Maximum metadata fields per language service (per knowledge base)|1,000|100*|1,000|1,000|1,000|1,000|
65+
66+
If you don't choose the option to have projects with multiple different languages, then the limits are applied across all knowledge bases in the language service.
67+
68+
|**Azure Cognitive Search tier** | **Free** | **Basic** |**S1** | **S2**| **S3** |**S3 HD**|
69+
|---|---|---|---|---|---|----|
70+
|Maximum metadata fields per Language service (across all knowledge bases)|1,000|100*|1,000|1,000|1,000|1,000|
71+
72+
### By name and value
73+
74+
The length and acceptable characters for metadata name and value are listed in the following table.
75+
76+
|Item|Allowed chars|Regex pattern match|Max chars|
77+
|--|--|--|--|
78+
|Name (key)|Allows<br>Alphanumeric (letters and digits)<br>`_` (underscore)<br> Must not contain spaces.|`^[a-zA-Z0-9_]+$`|100|
79+
|Value|Allows everything except<br>`:` (colon)<br>`|` (vertical pipe)<br>Only one value allowed.|`^[^:|]+$`|500|
80+
|||||
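
The patterns and length limits in the table can be applied directly when validating metadata before it is sent to the service. This is a minimal sketch; `is_valid_metadata` is a hypothetical helper name, and the regular expressions are the ones listed in the table.

```python
import re

# Patterns and limits taken from the table above.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_]+$")
VALUE_PATTERN = re.compile(r"^[^:|]+$")
MAX_NAME_CHARS = 100
MAX_VALUE_CHARS = 500


def is_valid_metadata(name: str, value: str) -> bool:
    """Check a metadata name/value pair against the documented patterns and lengths."""
    return (
        len(name) <= MAX_NAME_CHARS
        and len(value) <= MAX_VALUE_CHARS
        and NAME_PATTERN.fullmatch(name) is not None
        and VALUE_PATTERN.fullmatch(value) is not None
    )


print(is_valid_metadata("product", "windows 10"))
print(is_valid_metadata("product name", "windows 10"))
```

The first call passes; the second fails because the name contains a space.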
81+
82+
## Knowledge base content limits
83+
Overall limits on the content in the knowledge base:
84+
* Length of answer text: 25,000 characters
85+
* Length of question text: 1,000 characters
86+
* Length of metadata key text: 100 characters
87+
* Length of metadata value text: 500 characters
88+
* Supported characters for metadata name: Letters, digits, and `_`
89+
* Supported characters for metadata value: All except `:` and `|`
90+
* Length of file name: 200
91+
* Supported file formats: ".tsv", ".pdf", ".txt", ".docx", ".xlsx".
92+
* Maximum number of alternate questions: 300
93+
* Maximum number of question-answer pairs: Depends on the **[Azure Cognitive Search tier](../../../../search/search-limits-quotas-capacity.md#document-limits)** chosen. Each question and answer pair maps to one document in the Azure Cognitive Search index.
94+
* URL/HTML page: 1 million characters
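
As an illustration, the per-pair limits above can be checked locally before content is imported. The `QnaPair` class and `limit_violations` function are hypothetical and do not mirror the service's data model. The sketch also assumes that the first question in the list is the primary question and that only the remaining ones count as alternates.

```python
from dataclasses import dataclass

# Limits copied from the list above.
MAX_ANSWER_CHARS = 25_000
MAX_QUESTION_CHARS = 1_000
MAX_ALTERNATE_QUESTIONS = 300


@dataclass
class QnaPair:
    answer: str
    questions: list[str]  # assumption: first entry is primary, the rest are alternates


def limit_violations(pair: QnaPair) -> list[str]:
    """Return a description of every documented limit the pair exceeds."""
    violations = []
    if len(pair.answer) > MAX_ANSWER_CHARS:
        violations.append("answer text exceeds 25,000 characters")
    if any(len(q) > MAX_QUESTION_CHARS for q in pair.questions):
        violations.append("a question exceeds 1,000 characters")
    if len(pair.questions[1:]) > MAX_ALTERNATE_QUESTIONS:
        violations.append("more than 300 alternate questions")
    return violations
```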
95+
96+
## Create project call limits
97+
98+
These represent the limits for each create project/knowledge base action; that is, selecting *Create new project* or calling the REST API to create a project/knowledge base.
99+
100+
* Recommended maximum number of alternate questions per answer: 300
101+
* Maximum number of URLs: 10
102+
* Maximum number of files: 10
103+
* Maximum number of QnAs permitted per call: 1000
104+
105+
## Update knowledge base call limits
106+
107+
These represent the limits for each update action; that is, selecting *Save* or calling the REST API with an update request.
108+
* Length of each source name: 300
109+
* Recommended maximum number of alternate questions added or deleted: 300
110+
* Maximum number of metadata fields added or deleted: 10
111+
* Maximum number of URLs that can be refreshed: 5
112+
* Maximum number of QnAs permitted per call: 1000
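
Because a single call accepts at most 1000 QnAs, a larger set has to be sent in several calls. The sketch below shows one way to split a list into call-sized batches. The `batches` helper is hypothetical, and how each batch is submitted (SDK or REST) is left out.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_QNAS_PER_CALL = 1000  # from the list above


def batches(items: Sequence[T], size: int = MAX_QNAS_PER_CALL) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 2,500 QnAs are split into three calls.
qnas = [f"qna-{i}" for i in range(2500)]
print([len(batch) for batch in batches(qnas)])
```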
113+
114+
## Add unstructured file limits
115+
116+
> [!NOTE]
117+
> * If you need to use larger files than the limit allows, you can break the file into smaller files before sending them to the API.
118+
119+
These limits apply when you use unstructured files with *Create new project* or when you call the REST API to create a knowledge base:
120+
* Length of file: Only the first 32,000 characters are extracted.
121+
* Maximum three responses per file.
122+
123+
## Prebuilt question answering limits
124+
125+
> [!NOTE]
126+
> * If you need to use larger documents than the limit allows, you can break the text into smaller chunks of text before sending them to the API.
127+
> * A document is a single string of text characters.
128+
129+
These limits apply when you use the REST API to answer a question without creating a project/knowledge base:
130+
* Number of documents: 5
131+
* Maximum size of a single document: 5,120 characters
132+
* Maximum three responses per document.
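
The note above suggests breaking long text into smaller chunks. A minimal sketch of that idea follows, using hypothetical helper names. It splits at fixed character offsets, so a chunk can end mid-sentence; splitting on paragraph or sentence boundaries would likely give better answers but is not shown here.

```python
# Limits copied from the list above.
MAX_DOCUMENT_CHARS = 5120
MAX_DOCUMENTS_PER_REQUEST = 5


def split_text(text: str, max_chars: int = MAX_DOCUMENT_CHARS) -> list[str]:
    """Cut `text` into consecutive documents of at most `max_chars` characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def group_documents(
    documents: list[str], per_request: int = MAX_DOCUMENTS_PER_REQUEST
) -> list[list[str]]:
    """Group documents so that each request carries at most `per_request` of them."""
    return [documents[i:i + per_request] for i in range(0, len(documents), per_request)]


documents = split_text("a" * 12_000)
print([len(d) for d in documents])
```

With a 12,000-character input, the call yields three documents that fit into a single request.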
Lines changed: 151 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,151 @@
1+
---
2+
title: Plan your app - question answering
3+
description: Learn how to plan your question answering app. Understand how question answering works and interacts with other Azure services and some knowledge base concepts.
4+
ms.service: cognitive-services
5+
ms.subservice: language-service
6+
author: mrbullwinkle
7+
ms.author: mbullwin
8+
ms.topic: conceptual
9+
ms.date: 11/02/2021
10+
---
11+
12+
# Plan your question answering app
13+
14+
To plan your question answering app, you need to understand how question answering works and interacts with other Azure services. You should also have a solid grasp of knowledge base concepts.
15+
16+
## Azure resources
17+
18+
Each [Azure resource](azure-resources.md#resource-purposes) created with question answering has its own purpose, limits, and [pricing tier](azure-resources.md#pricing-tier-considerations). It's important to understand the function of these resources so that you can apply that knowledge in your planning process.
19+
20+
| Resource | Purpose |
21+
|--|--|
22+
| [Language resource](azure-resources.md) | Authoring, query prediction endpoint, and telemetry |
23+
| [Cognitive Search](azure-resources.md#azure-cognitive-search-resource) resource | Data storage and search |
24+
25+
### Resource planning
26+
27+
Question answering throughput is currently capped at 10 transactions per second for both management APIs and prediction APIs. To target 10 transactions per second for your service, we recommend the S1 (one instance) SKU of Azure Cognitive Search.
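
A client that issues many requests can pace itself to stay under the 10 transactions per second cap. The class below is a sketch of client-side pacing, not something the service or its SDK provides. The `Throttle` name and its injectable `clock` and `sleep` parameters are assumptions made for illustration and testing.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` of them start per second."""

    def __init__(self, rate: float = 10.0, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
        self._next_allowed = max(now, self._next_allowed) + self._interval
```

Call `wait()` immediately before each request. Because this pacing is only local, requests from other clients of the same resource still count toward the cap.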
28+
29+
### Language resource
30+
31+
A single language resource with the custom question answering feature enabled can host more than one project/knowledge base. The number of projects/knowledge bases is determined by the Cognitive Search pricing tier's quantity of supported indexes. Learn more about the [relationship of indexes to knowledge bases](azure-resources.md#index-usage).
32+
33+
### Knowledge base size and throughput
34+
35+
When you build a real app, plan sufficient resources for the size of your knowledge base and for your expected query prediction requests.
36+
37+
A knowledge base size is controlled by the:
38+
* [Cognitive Search resource](../../../../search/search-limits-quotas-capacity.md) pricing tier limits
39+
* [Question answering limits](./limits.md)
40+
41+
The knowledge base query prediction request is controlled by the web app plan and web app. Refer to [recommended settings](azure-resources.md#recommended-settings) to plan your pricing tier.
42+
43+
### Understand the impact of resource selection
44+
45+
Proper resource selection means your knowledge base answers query predictions successfully.
46+
47+
If your knowledge base isn't functioning properly, it's typically an issue of improper resource management.
48+
49+
Improper resource selection requires investigation to determine which [resource needs to change](azure-resources.md#pricing-tier-considerations).
50+
51+
## Project
52+
53+
A project/knowledge base is directly tied to its language resource. It holds the question and answer (QnA) pairs that are used to answer query prediction requests.
54+
55+
### Language considerations
56+
57+
You can now have projects in different languages within the same language resource where the custom question answering feature is enabled. When you create the first project, you choose between two options: use a single language for all projects/knowledge bases in the resource, which then applies to every subsequent project, or select a language each time you create a project.
58+
59+
### Ingest data sources
60+
61+
Question answering also supports unstructured content. You can upload a file that has unstructured content.
62+
63+
Currently we do not support URLs for unstructured content.
64+
65+
The ingestion process converts supported content types to markdown. All further editing of the *answer* is done with markdown. After you create a knowledge base, you can edit QnA pairs in the Language Studio portal with rich text authoring.
66+
67+
### Data format considerations
68+
69+
Because the final format of a QnA pair is markdown, it's important to understand markdown support.
70+
71+
### Bot personality
72+
73+
Add a bot personality to your project/knowledge base with [chit-chat](../how-to/chit-chat.md). This personality comes through with answers provided in a certain conversational tone such as *professional* and *friendly*. This chit-chat is provided as a conversational set, which you have total control to add, edit, and remove.
74+
75+
A bot personality is recommended if your bot connects to your knowledge base. You can choose to use chit-chat in your knowledge base even if you also connect to other services, but you should review how the bot service interacts to know if that is the correct architectural design for your use.
76+
77+
### Conversation flow with a project
78+
79+
Conversation flow usually begins with a salutation from a user, such as `Hi` or `Hello`. Your knowledge base can answer with a general answer, such as `Hi, how can I help you`, and it can also provide a selection of follow-up prompts to continue the conversation.
80+
81+
You should design your conversational flow with a loop in mind so that a user knows how to use your bot and isn't abandoned by the bot in the conversation. [Follow-up prompts](../tutorials/guided-conversations.md) link QnA pairs together, which makes the conversational flow possible.
82+
83+
### Authoring with collaborators
84+
85+
Collaborators may be other developers who share the full development stack of the knowledge base application or may be limited to just authoring the knowledge base.
86+
87+
Knowledge base authoring supports several role-based access permissions you apply in the Azure portal to limit the scope of a collaborator's abilities.
88+
89+
## Integration with client applications
90+
91+
Integration with client applications is accomplished by sending a query to the prediction runtime endpoint. A query is sent to your specific project/knowledge base with an SDK or REST-based request to your question answering web app endpoint.
92+
93+
To authenticate a client request correctly, the client application must send the correct credentials and knowledge base ID. If you're using an Azure Bot Service, configure these settings as part of the bot configuration in the Azure portal.
94+
95+
### Conversation flow in a client application
96+
97+
Conversation flow in a client application, such as an Azure bot, may require functionality before and after interacting with the knowledge base.
98+
99+
Does your client application support conversation flow, either by providing alternate means to handle follow-up prompts or by including chit-chat? If so, design these flows early, and make sure each client application query is handled correctly, whether it is routed to another service or sent to your knowledge base.
100+
101+
### Active learning from a client application
102+
103+
Question answering uses _active learning_ to improve your knowledge base by suggesting alternate questions to an answer. The client application is responsible for a part of this [active learning](../tutorials/active-learning.md). Through conversational prompts, the client application can determine that the knowledge base returned an answer that's not useful to the user, and it can determine a better answer. The client application needs to send that information back to the knowledge base to improve the prediction quality.
104+
105+
### Providing a default answer
106+
107+
If your knowledge base doesn't find an answer, it returns the _default answer_. This answer is configurable on the **Settings** page.
108+
109+
This default answer is different from the Azure bot default answer. You configure the default answer for your Azure bot in the Azure portal as part of configuration settings. It's returned when the score threshold isn't met.
110+
111+
## Prediction
112+
113+
The prediction is the response from your knowledge base, and it includes more information than just the answer. To get a query prediction response, use the question answering API.
114+
115+
### Prediction score fluctuations
116+
117+
A score can change based on several factors:
118+
119+
* Number of answers you requested in response with the `top` property
120+
* Variety of available alternate questions
121+
* Filtering for metadata
122+
* Query sent to `test` or `production` project/knowledge base
123+
124+
### Analytics with Azure Monitor
125+
126+
In question answering, telemetry is offered through the [Azure Monitor service](../../../../azure-monitor/index.yml). Use our [top queries](../how-to/analytics.md) to understand your metrics.
127+
128+
## Development lifecycle
129+
130+
The development lifecycle of a knowledge base is ongoing: editing, testing, and publishing your knowledge base.
131+
132+
### Knowledge base development of question answer pairs
133+
134+
Your QnA pairs should be designed and developed based on your client application usage.
135+
136+
Each pair can contain:
137+
* Metadata - filterable when querying to allow you to tag your QnA pairs with additional information about the source, content, format, and purpose of your data.
138+
* Follow-up prompts - helps to determine a path through your knowledge base so the user arrives at the correct answer.
139+
* Alternate questions - important to allow search to match to your answer from different forms of the question. [Active learning suggestions](../tutorials/active-learning.md) turn into alternate questions.
140+
141+
### DevOps development
142+
143+
Developing a knowledge base to insert into a DevOps pipeline requires that the knowledge base is isolated during batch testing.
144+
145+
A knowledge base shares the Cognitive Search index with all other knowledge bases on the language resource. While the knowledge base is isolated by partition, sharing the index can cause a difference in the score when compared to the published knowledge base.
146+
147+
To have the _same score_ on the `test` and `production` knowledge bases, isolate a language resource to a single knowledge base. In this architecture, the resource only needs to live as long as the isolated batch test.
148+
149+
## Next steps
150+
151+
* [Azure resources](./azure-resources.md)

articles/cognitive-services/language-service/toc.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -515,12 +515,16 @@ items:
515515
items:
516516
- name: Resource planning
517517
href: question-answering/concepts/azure-resources.md
518+
- name: App planning
519+
href: question-answering/concepts/plan.md
518520
- name: Precise answering
519521
href: question-answering/concepts/precise-answering.md
520522
- name: Confidence score
521523
href: question-answering/concepts/confidence-score.md
522524
- name: Best practices
523525
href: question-answering/concepts/best-practices.md
526+
- name: Limits
527+
href: question-answering/concepts/limits.md
524528
- name: Tutorials
525529
items:
526530
- name: Create a FAQ Bot
