Commit aa8b4c6

committed
update migrate ref docs
1 parent 689e34c commit aa8b4c6

File tree

2 files changed

+192
-0
lines changed

Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
1+
---
2+
title: Import document format guidelines - question answering
3+
description: Use these guidelines for importing documents to get the best results for your content.
4+
ms.service: cognitive-services
5+
ms.subservice: language-service
6+
ms.author: mbullwin
7+
author: mrbullwinkle
8+
ms.topic: reference
9+
ms.date: 01/23/2022
10+
---
11+
12+
# Format guidelines for imported documents and URLs
13+
14+
Review these formatting guidelines to get the best results for your content.
15+
16+
## Formatting considerations
17+
18+
After you import a file or URL, question answering converts your content and stores it in the [markdown format](https://en.wikipedia.org/wiki/Markdown). The conversion process adds new lines in the text, such as `\n\n`. Knowing the markdown format helps you understand the converted content and manage your knowledge base content.
19+
20+
If you add or edit your content directly in your knowledge base, use **markdown formatting** to create rich text content or change the markdown content that is already in the answer. Question answering supports much of the markdown format to bring rich text capabilities to your content. However, the client application, such as a chat bot, might not support the same set of markdown formats. Test how the client application displays answers.
21+
22+
## Basic document formatting
23+
24+
Question answering identifies sections, subsections, and their relationships in the file based on visual cues such as:
25+
26+
* font size
27+
* font style
28+
* numbering
29+
* colors
30+
31+
> [!NOTE]
32+
> Extraction of images from uploaded documents is currently not supported.
33+
34+
### Product manuals
35+
36+
A manual is typically guidance material that accompanies a product. It helps the user to set up, use, maintain, and troubleshoot the product. When question answering processes a manual, it extracts the headings and subheadings as questions and the subsequent content as answers. See an example [here](https://download.microsoft.com/download/2/9/B/29B20383-302C-4517-A006-B0186F04BE28/surface-pro-4-user-guide-EN.pdf).
37+
38+
Below is an example of a manual with an index page and hierarchical content:
39+
40+
> [!div class="mx-imgBorder"]
41+
> ![Product Manual example for a knowledge base](../../../qnamaker/media/qnamaker-concepts-datasources/product-manual.png)
42+
43+
> [!NOTE]
44+
> Extraction works best on manuals that have a table of contents and/or an index page, and a clear structure with hierarchical headings.
45+
46+
### Brochures, guidelines, papers, and other files
47+
48+
Many other types of documents can also be processed to generate question answer pairs, provided they have a clear structure and layout. These include brochures, guidelines, reports, white papers, scientific papers, policies, and books. See an example [here](https://qnamakerstore.blob.core.windows.net/qnamakerdata/docs/Manage%20Azure%20Blob%20Storage.docx).
49+
50+
Below is an example of a semi-structured doc, without an index:
51+
52+
> [!div class="mx-imgBorder"]
53+
> ![Azure Blob storage semi-structured Doc](../../../qnamaker/media/qnamaker-concepts-datasources/semi-structured-doc.png)
54+
55+
### Unstructured document support
56+
57+
Custom question answering now supports unstructured documents. A document is considered unstructured if its content isn't organized in a well-defined hierarchy, it lacks a set structure, or its content is free flowing.
58+
59+
Below is an example of an unstructured PDF document:
60+
61+
> [!div class="mx-imgBorder"]
62+
> ![Unstructured document example for a knowledge base](../../../qnamaker/media/qnamaker-concepts-datasources/unstructured-qna-pdf.png)
63+
64+
Currently, this functionality is available only via document upload and only for PDF and DOC file formats.
65+
66+
> [!IMPORTANT]
67+
> Support for unstructured file/content is available only in question answering.
68+
69+
### Structured question answering document
70+
71+
In DOC files, structured question and answer content alternates line by line: each question is on its own line, followed by its answer on the next line, as shown below:
72+
73+
```text
74+
Question1
75+
76+
Answer1
77+
78+
Question2
79+
80+
Answer2
81+
```
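
The alternating layout above can be checked locally before upload. The following is a minimal sketch, not the service's extraction logic: it assumes the document text has already been exported to a plain string, and the function name `parse_alternating_qna` is hypothetical.

```python
def parse_alternating_qna(text: str) -> list[tuple[str, str]]:
    """Pair each question line with the answer line that follows it."""
    # Drop blank lines so only question and answer lines remain.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("Expected question/answer lines in pairs.")
    # Even positions hold questions, odd positions hold their answers.
    return list(zip(lines[0::2], lines[1::2]))


# Plain-text stand-in for the content of a structured DOC file.
sample = "Question1\n\nAnswer1\n\nQuestion2\n\nAnswer2\n"
pairs = parse_alternating_qna(sample)
print(pairs)
```

An odd number of non-blank lines raises an error, which is a quick way to spot a question that is missing its answer.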
82+
83+
Below is an example of a structured question answering word document:
84+
85+
> [!div class="mx-imgBorder"]
86+
> ![Structured question answering document example for a knowledge base](../../../qnamaker/media/qnamaker-concepts-datasources/structured-qna-doc.png)
87+
88+
### Structured *TXT*, *TSV* and *XLS* Files
89+
90+
Question and answer pairs in structured *.txt*, *.tsv*, or *.xls* files can also be uploaded to question answering to create or augment a knowledge base. The content can be plain text, RTF, or HTML. Question answer pairs have an optional metadata field that can be used to group question answer pairs into categories.
91+
92+
| Question | Answer | Metadata (1 key: 1 value) |
93+
|-----------|---------|-------------------------|
94+
| Question1 | Answer1 | <code>Key1:Value1 &#124; Key2:Value2</code> |
95+
| Question2 | Answer2 | `Key:Value` |
96+
97+
Any additional columns in the source file are ignored.
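
As an illustration of the three-column layout in the table above, the sketch below writes a small *.tsv* file body with Python's standard `csv` module. The header names (`Question`, `Answer`, `Metadata`) and the helper names are assumptions made for this example; confirm the exact headers the service expects before uploading.

```python
import csv
import io


def format_metadata(metadata: dict[str, str]) -> str:
    """Join key/value pairs in the 'Key1:Value1 | Key2:Value2' style."""
    return " | ".join(f"{key}:{value}" for key, value in metadata.items())


def build_qna_tsv(rows: list[tuple[str, str, dict[str, str]]]) -> str:
    """Return tab-separated text with one question answer pair per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Header names are assumed for illustration.
    writer.writerow(["Question", "Answer", "Metadata"])
    for question, answer, metadata in rows:
        writer.writerow([question, answer, format_metadata(metadata)])
    return buffer.getvalue()


tsv_text = build_qna_tsv([
    ("Question1", "Answer1", {"Key1": "Value1", "Key2": "Value2"}),
    ("Question2", "Answer2", {"Key": "Value"}),
])
print(tsv_text)
```

Using `csv.writer` rather than manual string joins means any answer that contains a tab or a line break is quoted instead of silently shifting columns.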
98+
99+
#### Example of structured Excel file
100+
101+
Below is an example of a structured question and answer *.xls* file, with HTML content:
102+
103+
> [!div class="mx-imgBorder"]
104+
> ![Structured question answering excel example for a knowledge base](../../../qnamaker/media/qnamaker-concepts-datasources/structured-qna-xls.png)
105+
106+
#### Example of alternate questions for single answer in Excel file
107+
108+
Below is an example of a structured question answer *.xls* file, with several alternate questions for a single answer:
109+
110+
> [!div class="mx-imgBorder"]
111+
> ![Example of alternate questions for single answer in Excel file](../../../qnamaker/media/qnamaker-concepts-datasources/xls-alternate-question-example.png)
112+
113+
After the file is imported, the question-and-answer pair is in the knowledge base as shown below:
114+
115+
> [!div class="mx-imgBorder"]
116+
> ![Screenshot of alternate questions for single answer imported into knowledge base](../../../qnamaker/media/qnamaker-concepts-datasources/xls-alternate-question-example-after-import.png)
117+
118+
### Structured data format through import
119+
120+
Importing a knowledge base replaces the content of the existing knowledge base. Import requires a structured .tsv file that contains data source information. This information helps group the question-answer pairs and attribute them to a particular data source. [Question answer pairs](./How-To/edit-knowledge-base.md#question-and-answer-pairs) have an optional metadata field that can be used to group question answer pairs into categories.
121+
122+
| Question | Answer | Source| Metadata (1 key: 1 value) |
123+
|-----------|---------|----|---------------------|
124+
| Question1 | Answer1 | Url1 | <code>Key1:Value1 &#124; Key2:Value2</code> |
125+
| Question2 | Answer2 | Editorial| `Key:Value` |
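
To show how the `Source` column groups pairs, here is a minimal sketch that reads a four-column *.tsv* and collects the pairs per source. The sample data mirrors the table above; the header names and the functions `parse_metadata` and `group_by_source` are hypothetical and are not part of any question answering SDK.

```python
import csv
import io
from collections import defaultdict

# Sample import file; header names are assumed for illustration.
sample_tsv = (
    "Question\tAnswer\tSource\tMetadata\n"
    "Question1\tAnswer1\tUrl1\tKey1:Value1 | Key2:Value2\n"
    "Question2\tAnswer2\tEditorial\tKey:Value\n"
)


def parse_metadata(cell: str) -> dict[str, str]:
    """Split 'Key1:Value1 | Key2:Value2' into a dictionary."""
    metadata = {}
    for part in cell.split("|"):
        part = part.strip()
        if part:
            # Split on the first colon only, so values may contain colons.
            key, _, value = part.partition(":")
            metadata[key.strip()] = value.strip()
    return metadata


def group_by_source(tsv_text: str) -> dict[str, list[dict]]:
    """Group question answer pairs by the value of their Source column."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        grouped[row["Source"]].append({
            "question": row["Question"],
            "answer": row["Answer"],
            "metadata": parse_metadata(row["Metadata"]),
        })
    return dict(grouped)


grouped = group_by_source(sample_tsv)
for source, pairs in grouped.items():
    print(source, len(pairs))
```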
126+
127+
<a href="#formatting-considerations"></a>
128+
129+
### Multi-turn document formatting
130+
131+
* Use headings and subheadings to denote hierarchy. For example, you can use h1 to denote the parent question answer pair and h2 to denote the question answer pair that should be taken as a prompt. Use smaller heading sizes to denote subsequent hierarchy. Do not use style, color, or some other mechanism to imply structure in your document, because question answering will not extract the multi-turn prompts.
132+
* The first character of a heading must be capitalized.
133+
* Do not end a heading with a question mark, `?`.
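
The two heading rules in the list above are easy to check before upload. The sketch below is a hypothetical pre-check, assuming the heading text has already been pulled out of the document as plain strings; it does not inspect heading levels or styles.

```python
def check_heading(heading: str) -> list[str]:
    """Return the multi-turn formatting rules that a heading breaks."""
    text = heading.strip()
    if not text:
        return ["heading is empty"]
    problems = []
    # Rule: the first character of a heading must be capitalized.
    if not text[0].isupper():
        problems.append("first character is not capitalized")
    # Rule: a heading must not end with a question mark.
    if text.endswith("?"):
        problems.append("heading ends with a question mark")
    return problems


headings = ["Set up your device", "how do I charge it", "What is in the box?"]
report = {heading: check_heading(heading) for heading in headings}
for heading, problems in report.items():
    print(heading, "->", problems or "ok")
```

This simple check also flags headings that start with a digit or symbol, since those characters are not uppercase letters; whether the service treats such headings as invalid is not stated here.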
134+
135+
**Sample documents**:<br>[Surface Pro (docx)](https://github.com/Azure-Samples/cognitive-services-sample-data-files/blob/master/qna-maker/data-source-formats/multi-turn.docx)<br>[Contoso Benefits (docx)](https://github.com/Azure-Samples/cognitive-services-sample-data-files/blob/master/qna-maker/data-source-formats/Multiturn-ContosoBenefits.docx)<br>[Contoso Benefits (pdf)](https://github.com/Azure-Samples/cognitive-services-sample-data-files/blob/master/qna-maker/data-source-formats/Multiturn-ContosoBenefits.pdf)
136+
137+
## FAQ URLs
138+
139+
Question answering can support FAQ web pages in three different forms:
140+
141+
* Plain FAQ pages
142+
* FAQ pages with links
143+
* FAQ pages with a Topics Homepage
144+
145+
### Plain FAQ pages
146+
147+
This is the most common type of FAQ page, in which the answers immediately follow the questions in the same page.
148+
149+
Below is an example of a plain FAQ page:
150+
151+
> [!div class="mx-imgBorder"]
152+
> ![Plain FAQ page example for a knowledge base](../../../qnamaker/media/qnamaker-concepts-datasources/plain-faq.png)
153+
154+
### FAQ pages with links
155+
156+
In this type of FAQ page, questions are aggregated together and are linked to answers that are either in different sections of the same page, or in different pages.
157+
158+
Below is an example of an FAQ page with links in sections that are on the same page:
159+
160+
> [!div class="mx-imgBorder"]
161+
> ![Section Link FAQ page example for a knowledge base](../../../qnamaker/media/qnamaker-concepts-datasources/sectionlink-faq.png)
162+
163+
### Parent Topics page links to child answers pages
164+
165+
This type of FAQ has a Topics page where each topic is linked to a corresponding set of questions and answers on a different page. Question answering crawls all the linked pages to extract the corresponding questions and answers.
166+
167+
Below is an example of a Topics page with links to FAQ sections in different pages:
168+
169+
> [!div class="mx-imgBorder"]
170+
> ![Deep link FAQ page example for a knowledge base](../../../qnamaker/media/qnamaker-concepts-datasources/topics-faq.png)
171+
172+
### Support URLs
173+
174+
Question answering can process semi-structured support web pages, such as web articles that describe how to perform a given task, how to diagnose and resolve a given problem, or what the best practices are for a given process. Extraction works best on content that has a clear structure with hierarchical headings.
175+
176+
> [!NOTE]
177+
> Extraction for support articles is a new feature and is in early stages. It works best for simple pages that are well structured and do not contain complex headers or footers.
178+
179+
> [!div class="mx-imgBorder"]
180+
> ![Question answering supports extraction from semi-structured web pages where a clear structure is presented with hierarchical headings](../../../qnamaker/media/qnamaker-concepts-datasources/support-web-pages-with-heirarchical-structure.png)
181+
182+
## Import and export knowledge base
183+
184+
**TSV and XLS files** from exported knowledge bases can only be used by importing the files from the **Settings** page in Language Studio. They cannot be used as data sources during knowledge base creation or from the **+ Add file** or **+ Add URL** feature on the **Settings** page.
185+
186+
When you import the knowledge base through these **TSV and XLS files**, the question answer pairs are added to the editorial source, not to the sources from which the questions and answers were extracted in the exported knowledge base.
187+
188+
## Next steps
189+
190+
* [Tutorial: Create an FAQ bot](../tutorials/bot-service.md)

articles/cognitive-services/language-service/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -589,6 +589,8 @@ items:
589589
items:
590590
- name: Markdown format
591591
href: question-answering/reference/markdown-format.md
592+
- name: Format guidelines
593+
href: question-answering/reference/document-format-guidelines.md
592594
- name: REST API
593595
items:
594596
- name: Prebuilt
