articles/ai-services/language-service/summarization/custom/how-to/data-formats.md
Lines changed: 95 additions & 30 deletions
Original file line number
Diff line number
Diff line change
@@ -32,44 +32,70 @@ In the abstractive document summarization scenario, each document (whether it ha
32
32
33
33
## Custom summarization conversation sample format
34
34
35
-
In the abstractive conversation summarization scenario, each conversation (whether it has a provided label or not) is expected to be provided in a plain .txt file. Each conversation turn must be provided in a single line that is formatted as Speaker + “: “ + text (I.e., Speaker and text are separated by a colon followed by a space). The following is an example conversation of three turns between two speakers (Agent and Customer).
36
-
37
-
Agent: Hello, how can I help you?
38
-
39
-
Customer: How do I upgrade office? I have been getting error messages all day.
40
-
41
-
Agent: Please press the upgrade button, then sign in and follow the instructions.
35
+
In the abstractive conversation summarization scenario, each conversation (whether or not it has a provided label) is expected to be provided in a .json file whose structure is similar to the input format for our [pre-built conversation summarization service](https://learn.microsoft.com/rest/api/language/2023-04-01/analyze-conversation/submit-job?tabs=HTTP#textconversation). The following is an example conversation of three turns between two speakers (Agent and Customer).
42
36
37
+
```json
38
+
{
39
+
"conversationItems": [
40
+
{
41
+
"text": "Hello, how can I help you?",
42
+
"modality": "text",
43
+
"id": "1",
44
+
"participantId": "Agent",
45
+
"role": "Agent"
46
+
},
47
+
{
48
+
"text": "How do I upgrade office? I have been getting error messages all day.",
49
+
"modality": "text",
50
+
"id": "2",
51
+
"participantId": "Customer",
52
+
"role": "Customer"
53
+
},
54
+
{
55
+
"text": "Please press the upgrade button, then sign in and follow the instructions.",
56
+
"modality": "text",
57
+
"id": "3",
58
+
"participantId": "Agent",
59
+
"role": "Agent"
60
+
}
61
+
],
62
+
"modality": "text",
63
+
"id": "conversation1",
64
+
"language": "en"
65
+
}
66
+
```
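For conversations that still exist in the earlier plain-text layout (one `Speaker: text` turn per line), a small conversion script may help. The following is a minimal sketch, not an official tool: the function name `turns_to_conversation` is hypothetical, and it assumes that `role` can simply reuse the speaker label, as it does in the sample above. Check the accepted `role` values against the service reference before relying on it.

```python
import json


def turns_to_conversation(lines, conversation_id="conversation1", language="en"):
    """Convert 'Speaker: text' lines into the conversation JSON structure shown above.

    Hypothetical helper: the field names come from the sample, and using the
    speaker label for both participantId and role is an assumption.
    """
    items = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between turns
        # Split on the first ": " only, so colons inside the text are preserved.
        speaker, sep, text = line.partition(": ")
        if not sep:
            raise ValueError(f"Expected 'Speaker: text', got: {line!r}")
        items.append({
            "text": text,
            "modality": "text",
            "id": str(len(items) + 1),  # turn ids are 1-based strings
            "participantId": speaker,
            "role": speaker,
        })
    return {
        "conversationItems": items,
        "modality": "text",
        "id": conversation_id,
        "language": language,
    }


if __name__ == "__main__":
    sample = [
        "Agent: Hello, how can I help you?",
        "Customer: How do I upgrade office? I have been getting error messages all day.",
        "Agent: Please press the upgrade button, then sign in and follow the instructions.",
    ]
    print(json.dumps(turns_to_conversation(sample), indent=2))
```

Splitting on the first `": "` only keeps any later colons inside the turn text intact.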
43
67
44
-
## Custom summarization document and sample mapping JSON format
68
+
## Sample mapping JSON format
45
69
46
70
In both document and conversation summarization scenarios, a set of documents and corresponding labels can be provided in a single JSON file that references individual document/conversation and summary files.
47
71
48
-
<!--- The JSON file is expected to contain the following fields:
72
+
The JSON file is expected to contain the following fields:
49
73
50
74
```json
51
-
projectFileVersion": TODO,
52
-
"stringIndexType": TODO,
53
-
"metadata": {
54
-
"projectKind": TODO,
55
-
"storageInputContainerName": TODO,
56
-
"projectName": a string project name,
57
-
"multilingual": TODO,
58
-
"description": a string project description,
59
-
"language": TODO:
60
-
},
61
-
"assets": {
62
-
"projectKind": TODO,
63
-
"documents": a list of document-label pairs, each is defined with three fields:
64
-
[
65
-
{
66
-
"summaryLocation": a string path to the summary txt file,
67
-
"location": a string path to the document txt file,
68
-
"language": TODO
69
-
}
70
-
]
75
+
{
76
+
"projectFileVersion": The version of the exported project,
77
+
"stringIndexType": Specifies the method used to interpret string offsets. For additional information see https://aka.ms/text-analytics-offsets,
78
+
"metadata": {
79
+
"projectKind": The project kind you need to import. Values for summarization are CustomAbstractiveSummarization and CustomConversationSummarization. Both projectKind fields must be identical,
80
+
"storageInputContainerName": The name of the storage container that contains the documents/conversations and the summaries,
81
+
"projectName": a string project name,
82
+
"multilingual": A flag denoting whether this project should allow multilingual documents or not. For Summarization this option is turned off,
83
+
"description": a string project description,
84
+
"language": The default language of the project. Possible values are "en" and "en-us"
85
+
},
86
+
"assets": {
87
+
"projectKind": The project kind you need to import. Values for summarization are CustomAbstractiveSummarization and CustomConversationSummarization. Both projectKind fields must be identical,
88
+
"documents": a list of document-label pairs, each defined with three fields: [
89
+
{
90
+
"summaryLocation": a string path to the summary txt (for documents) or json (for conversations) file,
91
+
"location": a string path to the document txt (for documents) or json (for conversations) file,
92
+
"language": The language of the documents. Possible values are "en" and "en-us"
93
+
}
94
+
]
95
+
}
71
96
}
72
-
``` --->
97
+
```
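Because the field listing above describes the fields rather than showing valid JSON, it can help to generate the mapping file programmatically. The following is a minimal sketch under stated assumptions: `build_mapping` is a hypothetical helper, the `project_file_version` default is a placeholder that must be replaced with the version of your exported project, `"Utf16CodeUnit"` is an assumed `stringIndexType` value, and the file names in the usage example are illustrative only.

```python
import json

# The two project kinds named in the field descriptions above.
ALLOWED_KINDS = {"CustomAbstractiveSummarization", "CustomConversationSummarization"}


def build_mapping(project_name, container_name, pairs,
                  project_kind="CustomConversationSummarization",
                  project_file_version="<project-file-version>",  # placeholder, not a real version
                  string_index_type="Utf16CodeUnit",  # assumed value
                  language="en", description=""):
    """Build the mapping structure from (location, summaryLocation) pairs."""
    if project_kind not in ALLOWED_KINDS:
        raise ValueError(f"Unsupported projectKind: {project_kind!r}")
    return {
        "projectFileVersion": project_file_version,
        "stringIndexType": string_index_type,
        "metadata": {
            "projectKind": project_kind,
            "storageInputContainerName": container_name,
            "projectName": project_name,
            "multilingual": False,  # turned off for Summarization
            "description": description,
            "language": language,
        },
        "assets": {
            # Reuse the same value so both projectKind fields stay identical.
            "projectKind": project_kind,
            "documents": [
                {"summaryLocation": summary, "location": location, "language": language}
                for location, summary in pairs
            ],
        },
    }


if __name__ == "__main__":
    mapping = build_mapping(
        "my-project", "my-container",
        [("conversation1.json", "summary1.json")],  # hypothetical file names
    )
    print(json.dumps(mapping, indent=2))
```

Passing a single `project_kind` argument and writing it to both places avoids a mismatch between the `metadata` and `assets` values.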
98
+
## Custom document summarization mapping sample
73
99
74
100
The following is an example mapping file for the abstractive document summarization scenario with three documents and corresponding labels.
75
101
@@ -108,6 +134,45 @@ The following is an example mapping file for the abstractive document summarizat
articles/ai-services/language-service/summarization/includes/quickstarts/custom-language-studio.md
Lines changed: 4 additions & 2 deletions
Original file line number
Diff line number
Diff line change
@@ -23,9 +23,11 @@ Before you can use custom Summarization, you'll need to create an Azure AI Langu
23
23
24
24
[!INCLUDE [create a new resource from the Azure portal](../../../includes/custom/resource-creation-azure-portal.md)]
25
25
26
-
## Upload sample data to blob container
26
+
## Download sample data
27
+
28
+
If you need sample data, we've provided some for [document summarization](https://github.com/Azure-Samples/cognitive-services-sample-data-files/tree/master/language-service/Custom%20summarization/abstractive-document-samples) and [conversation summarization](https://github.com/Azure-Samples/cognitive-services-sample-data-files/tree/master/language-service/Custom%20summarization/abstractive-conversation-samples) scenarios for the purpose of this quickstart.
27
29
28
-
If you need sample data, we've [provided some](https://github.com/Azure-Samples/cognitive-services-sample-data-files/tree/master/language-service/Custom%20summarization/sample-docs-and-labels) for the purpose of this quickstart.
30
+
## Upload sample data to blob container
29
31
30
32
[!INCLUDE [Uploading sample data for custom Summarization](../../../includes/custom/language-studio/upload-data-to-storage.md)]