
Commit d76edea

Merge pull request #246102 from jboback/csum-format-update
Csum format & sample update
2 parents 5347996 + a0bd988 commit d76edea

2 files changed: 99 additions, 32 deletions

articles/ai-services/language-service/summarization/custom/how-to/data-formats.md

Lines changed: 95 additions & 30 deletions
@@ -32,44 +32,70 @@ In the abstractive document summarization scenario, each document (whether it ha
 
 ## Custom summarization conversation sample format
 
-In the abstractive conversation summarization scenario, each conversation (whether it has a provided label or not) is expected to be provided in a plain .txt file. Each conversation turn must be provided in a single line that is formatted as Speaker + “: “ + text (I.e., Speaker and text are separated by a colon followed by a space). The following is an example conversation of three turns between two speakers (Agent and Customer).
-
-Agent: Hello, how can I help you?
-
-Customer: How do I upgrade office? I have been getting error messages all day.
-
-Agent: Please press the upgrade button, then sign in and follow the instructions.
+In the abstractive conversation summarization scenario, each conversation (whether it has a provided label or not) is expected to be provided in a .json file, which is similar to the input format for our [pre-built conversation summarization service](https://learn.microsoft.com/rest/api/language/2023-04-01/analyze-conversation/submit-job?tabs=HTTP#textconversation). The following is an example conversation of three turns between two speakers (Agent and Customer).
 
+```json
+{
+    "conversationItems": [
+        {
+            "text": "Hello, how can I help you?",
+            "modality": "text",
+            "id": "1",
+            "participantId": "Agent",
+            "role": "Agent"
+        },
+        {
+            "text": "How do I upgrade office? I have been getting error messages all day.",
+            "modality": "text",
+            "id": "2",
+            "participantId": "Customer",
+            "role": "Customer"
+        },
+        {
+            "text": "Please press the upgrade button, then sign in and follow the instructions.",
+            "modality": "text",
+            "id": "3",
+            "participantId": "Agent",
+            "role": "Agent"
+        }
+    ],
+    "modality": "text",
+    "id": "conversation1",
+    "language": "en"
+}
+```
 
-## Custom summarization document and sample mapping JSON format
+## Sample mapping JSON format
 
 In both document and conversation summarization scenarios, a set of documents and corresponding labels can be provided in a single JSON file that references individual document/conversation and summary files.
 
-<!--- The JSON file is expected to contain the following fields:
+The JSON file is expected to contain the following fields:
 
 ```json
-projectFileVersion": TODO,
-"stringIndexType": TODO,
-"metadata": {
-    "projectKind": TODO,
-    "storageInputContainerName": TODO,
-    "projectName": a string project name,
-    "multilingual": TODO,
-    "description": a string project description,
-    "language": TODO:
-},
-"assets": {
-    "projectKind": TODO,
-    "documents": a list of document-label pairs, each is defined with three fields:
-    [
-        {
-            "summaryLocation": a string path to the summary txt file,
-            "location": a string path to the document txt file,
-            "language": TODO
-        }
-    ]
+{
+    "projectFileVersion": The version of the exported project,
+    "stringIndexType": Specifies the method used to interpret string offsets. For more information, see https://aka.ms/text-analytics-offsets,
+    "metadata": {
+        "projectKind": The project kind you need to import. Values for summarization are CustomAbstractiveSummarization and CustomConversationSummarization. Both projectKind fields must be identical.,
+        "storageInputContainerName": The name of the storage container that contains the documents/conversations and the summaries,
+        "projectName": a string project name,
+        "multilingual": A flag denoting whether this project should allow multilingual documents or not. For summarization, this option is turned off,
+        "description": a string project description,
+        "language": The default language of the project. Possible values are "en" and "en-us"
+    },
+    "assets": {
+        "projectKind": The project kind you need to import. Values for summarization are CustomAbstractiveSummarization and CustomConversationSummarization. Both projectKind fields must be identical.,
+        "documents": a list of document-label pairs, each defined by three fields: [
+            {
+                "summaryLocation": a string path to the summary txt (for documents) or json (for conversations) file,
+                "location": a string path to the document txt (for documents) or json (for conversations) file,
+                "language": The language of the documents. Possible values are "en" and "en-us"
+            }
+        ]
+    }
 }
-``` --->
+```
+## Custom document summarization mapping sample
 
 The following is an example mapping file for the abstractive document summarization scenario with three documents and corresponding labels.
 
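The hunk above replaces the plain-text `Speaker: text` conversation format with a JSON document modeled on the pre-built conversation summarization input. For anyone migrating conversations that were already prepared as .txt files, the conversion can be sketched in Python as follows. The helper `txt_to_conversation` and its `conversation_id`/`language` parameters are hypothetical and not part of this commit or any Azure SDK; it assumes, as in the sample, that `participantId` and `role` both carry the speaker name and that turn ids are sequential strings starting at "1".

```python
import json


def txt_to_conversation(text: str, conversation_id: str, language: str = "en") -> dict:
    """Hypothetical helper: convert old-style 'Speaker: text' turns into the
    JSON layout used by the new conversation sample."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # the old .txt format separated turns with blank lines
        # Old format: speaker and text separated by a colon followed by a space.
        speaker, sep, utterance = line.partition(": ")
        if not sep:
            raise ValueError(f"turn has no 'Speaker: ' prefix: {line!r}")
        items.append({
            "text": utterance,
            "modality": "text",
            "id": str(len(items) + 1),  # sequential ids, as in the sample
            "participantId": speaker,
            "role": speaker,  # assumption: role mirrors the speaker name
        })
    return {
        "conversationItems": items,
        "modality": "text",
        "id": conversation_id,
        "language": language,
    }


if __name__ == "__main__":
    old_format = (
        "Agent: Hello, how can I help you?\n\n"
        "Customer: How do I upgrade office? I have been getting error messages all day.\n\n"
        "Agent: Please press the upgrade button, then sign in and follow the instructions.\n"
    )
    print(json.dumps(txt_to_conversation(old_format, "conversation1"), indent=4))
```

Given the three-turn example from the removed .txt format, the script prints the same structure as the new JSON sample. Splitting only at the first `": "` keeps any later colons inside the utterance text.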
@@ -108,6 +134,45 @@ The following is an example mapping file for the abstractive document summarizat
 }
 ```
 
+## Custom conversation summarization mapping sample
+
+The following is an example mapping file for the abstractive conversation summarization scenario with three conversations and corresponding labels.
+
+```json
+{
+    "projectFileVersion": "2022-10-01-preview",
+    "stringIndexType": "Utf16CodeUnit",
+    "metadata": {
+        "projectKind": "CustomAbstractiveSummarization",
+        "storageInputContainerName": "abstractivesummarization",
+        "projectName": "sample_custom_summarization",
+        "multilingual": false,
+        "description": "Creating a custom summarization model",
+        "language": "en-us"
+    },
+    "assets": {
+        "projectKind": "CustomAbstractiveSummarization",
+        "documents": [
+            {
+                "summaryLocation": "conv1_summary.txt",
+                "location": "conv1.json",
+                "language": "en-us"
+            },
+            {
+                "summaryLocation": "conv2_summary.txt",
+                "location": "conv2.json",
+                "language": "en-us"
+            },
+            {
+                "summaryLocation": "conv3_summary.txt",
+                "location": "conv3.json",
+                "language": "en-us"
+            }
+        ]
+    }
+}
+```
+
 ## Next steps
 
 [Get started with custom summarization](../../custom/quickstart.md)
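The schema added in this commit states that `metadata.projectKind` and `assets.projectKind` must be identical, that `multilingual` is off for summarization, and that each `documents` entry pairs a `location` with a `summaryLocation`. A small script can generate a mapping file like the conversation sample and check those rules before import. The Python sketch below is illustrative only: `build_mapping` and `check_mapping` are hypothetical names, fixed values are copied from the samples, and the check covers only the rules written in the schema description, not the service's own validation. The project kind is a required argument because the conversation sample uses CustomAbstractiveSummarization while the schema also lists CustomConversationSummarization.

```python
import json

# Values taken from the schema description in this commit.
ALLOWED_KINDS = {"CustomAbstractiveSummarization", "CustomConversationSummarization"}
ALLOWED_LANGUAGES = {"en", "en-us"}


def build_mapping(pairs, project_kind, container, project_name,
                  description="", language="en-us"):
    """Hypothetical helper: build a mapping dict from (location, summaryLocation)
    pairs. Field names and fixed values follow the sample mapping files."""
    return {
        "projectFileVersion": "2022-10-01-preview",  # value used in the samples
        "stringIndexType": "Utf16CodeUnit",
        "metadata": {
            "projectKind": project_kind,
            "storageInputContainerName": container,
            "projectName": project_name,
            "multilingual": False,  # the schema says this is off for summarization
            "description": description,
            "language": language,
        },
        "assets": {
            "projectKind": project_kind,  # must equal metadata.projectKind
            "documents": [
                {"summaryLocation": summary, "location": location, "language": language}
                for location, summary in pairs
            ],
        },
    }


def check_mapping(mapping):
    """Hypothetical helper: list violations of the rules stated in the schema
    description. This is not the service's own validation."""
    problems = []
    kind = mapping["metadata"]["projectKind"]
    if kind not in ALLOWED_KINDS:
        problems.append(f"unexpected projectKind: {kind}")
    if mapping["assets"]["projectKind"] != kind:
        problems.append("metadata.projectKind and assets.projectKind differ")
    if mapping["metadata"]["multilingual"]:
        problems.append("multilingual should be false for summarization")
    if mapping["metadata"]["language"] not in ALLOWED_LANGUAGES:
        problems.append(f"unsupported project language: {mapping['metadata']['language']}")
    for doc in mapping["assets"]["documents"]:
        if not doc.get("location") or not doc.get("summaryLocation"):
            problems.append(f"incomplete document entry: {doc}")
        if doc.get("language") not in ALLOWED_LANGUAGES:
            problems.append(f"unsupported document language: {doc}")
    return problems


if __name__ == "__main__":
    # Reproduce the three-conversation sample from this commit.
    pairs = [(f"conv{i}.json", f"conv{i}_summary.txt") for i in (1, 2, 3)]
    mapping = build_mapping(pairs, "CustomAbstractiveSummarization",
                            "abstractivesummarization", "sample_custom_summarization",
                            "Creating a custom summarization model")
    print(check_mapping(mapping) or "no problems found")
    print(json.dumps(mapping, indent=4))
```

Running the script prints `no problems found` followed by a mapping with the same fields and values as the conversation sample. Generating the file with `json.dumps` rather than by hand also avoids syntax slips such as a missing comma between `metadata` and `assets`.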

articles/ai-services/language-service/summarization/includes/quickstarts/custom-language-studio.md

Lines changed: 4 additions & 2 deletions
@@ -23,9 +23,11 @@ Before you can use custom Summarization, you'll need to create an Azure AI Langu
 
 [!INCLUDE [create a new resource from the Azure portal](../../../includes/custom/resource-creation-azure-portal.md)]
 
-## Upload sample data to blob container
+## Download sample data
+
+If you need sample data, we've provided some for [document summarization](https://github.com/Azure-Samples/cognitive-services-sample-data-files/tree/master/language-service/Custom%20summarization/abstractive-document-samples) and [conversation summarization](https://github.com/Azure-Samples/cognitive-services-sample-data-files/tree/master/language-service/Custom%20summarization/abstractive-conversation-samples) scenarios for the purpose of this quickstart.
 
-If you need sample data, we've [provided some](https://github.com/Azure-Samples/cognitive-services-sample-data-files/tree/master/language-service/Custom%20summarization/sample-docs-and-labels) for the purpose of this quickstart.
+## Upload sample data to blob container
 
 [!INCLUDE [Uploading sample data for custom Summarization](../../../includes/custom/language-studio/upload-data-to-storage.md)]
 