articles/cognitive-services/language-service/custom-classification/concepts/data-formats.md
+14 -5 (14 additions & 5 deletions)
@@ -34,7 +34,7 @@ Your tags file should be in the `json` format below.
     ],
     "documents": [
         {
-            "location": "doc1.txt",
+            "location": "file1.txt",
             "language": "en-us",
             "classifiers": [
                 {
@@ -44,6 +44,15 @@ Your tags file should be in the `json` format below.
                     "classifierName": "Class1"
                 }
             ]
+        },
+        {
+            "location": "file2.txt",
+            "language": "en-us",
+            "classifiers": [
+                {
+                    "classifierName": "Class2"
+                }
+            ]
         }
     ]
 }
@@ -52,10 +61,10 @@ Your tags file should be in the `json` format below.
 ### Data description

 * `classifiers`: An array of classifiers for your data. Each classifier represents one of the classes you want to tag your data with.
-* `documents`: An array of tagged documents. For example:
-  * `location`: The path of the JSON file containing tags. The tags file has to be in root of the storage container.
-  * `language`: Language of the document. Use one of the [supported culture locales](../language-support.md).
-  * `classifiers`: Array of classifier objects assigned to the document. If you're working on a single classification project, there should be one classifier only.
+* `documents`: An array of tagged documents.
+  * `location`: The path of the file. The file has to be in root of the storage container.
+  * `language`: Language of the file. Use one of the [supported culture locales](../language-support.md).
+  * `classifiers`: Array of classifier objects assigned to the file. If you're working on a single classification project, there should be one classifier per file only.
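As a rough illustration of the updated schema, the Python sketch below builds the `documents` entries from the sample and flags files that break the one-classifier-per-file rule for single classification projects. The helper names `make_document` and `find_invalid_single_label_files` are hypothetical and not part of any Azure SDK; only the field names come from the sample. The top-level `classifiers` array is left out because its element shape is not visible in this diff.

```python
import json


def make_document(location, class_names, language="en-us"):
    """Build one entry of the `documents` array (hypothetical helper)."""
    return {
        "location": location,
        "language": language,
        "classifiers": [{"classifierName": name} for name in class_names],
    }


def find_invalid_single_label_files(documents):
    """Return the locations of entries that do not carry exactly one classifier."""
    return [
        doc["location"]
        for doc in documents
        if len(doc.get("classifiers", [])) != 1
    ]


documents = [
    make_document("file1.txt", ["Class1"]),
    make_document("file2.txt", ["Class2"]),
]

# Print the fragment in the same shape as the sample tags file.
print(json.dumps({"documents": documents}, indent=4))
```

Running `find_invalid_single_label_files(documents)` before uploading a tags file is one way to catch an entry that carries zero or several classifiers in a single classification project.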
 |`{YOUR-ENDPOINT}`| The endpoint for authenticating your API request. |`https://<your-custom-subdomain>.cognitiveservices.azure.com`|
 |`{PROJECT-NAME}`| The name for your project. This value is case-sensitive. |`myProject`|

-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.|`2021-11-01-preview`|
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers

 Use the following header to authenticate your request.
@@ -198,7 +178,7 @@ Use the following JSON in your request. The model will be named `MyModel` once t
 Once you send your API request, you will receive a `202` response indicating success. In the response headers, extract the `location` value. It will be formatted like this:
@@ -218,16 +198,6 @@ Use the following **GET** request to query the status of your model's training p
 |`{PROJECT-NAME}`| The name for your project. This value is case-sensitive. |`myProject`|
 |`{JOB-ID}`| The ID for locating your model's training status. This is in the `location` header value you received in the previous step. |`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx`|

-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.|`2021-11-01-preview`|
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers

 Use the following header to authenticate your request.
@@ -277,7 +247,7 @@ Once you send the request, you will get the following response.
 Create a **PUT** request using the following URL, headers, and JSON body to start deploying a text classification model.
@@ -286,16 +256,6 @@ Create a **PUT** request using the following URL, headers, and JSON body to star
 |`{PROJECT-NAME}`| The name for your project. This value is case-sensitive. |`myProject`|
 |`{DEPLOYMENT-NAME}`| The name of your deployment. This value is case-sensitive. |`prod`|

-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.|`2021-11-01-preview`|
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers

 Use the following header to authenticate your request.
@@ -318,7 +278,7 @@ Use the following JSON in your request. The model will be named `MyModel` once t
 Once you send your API request, you will receive a `202` response indicating success. In the response headers, extract the `location` value. It will be formatted like this:
 `JOB-ID` is used to identify your request, since this operation is asynchronous. You will use this URL in the next step to get the publishing status.
@@ -328,7 +288,7 @@ Once you send your API request, you will receive a `202` response indicating suc
 Use the following **GET** request to query the status of your model's publishing process. You can use the URL you received from the previous step, or replace the placeholder values below with your own values.
@@ -338,16 +298,6 @@ Use the following **GET** request to query the status of your model's publishing
 |`{DEPLOYMENT-NAME}`| The name of your deployment. This value is case-sensitive. |`prod`|
 |`{JOB-ID}`| The ID for locating your model's training status. This is in the `location` header value you received in the previous step. |`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx`|

-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.|`2021-11-01-preview`|
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers

 Use the following header to authenticate your request.
@@ -525,4 +475,4 @@ Use the following header to authenticate your request.

 |Key|Value|
 |--|--|
-|Ocp-Apim-Subscription-Key| The key to your resource. Used for authenticating your API requests.|
+|Ocp-Apim-Subscription-Key| The key to your resource. Used for authenticating your API requests.|
articles/cognitive-services/language-service/custom-named-entity-recognition/concepts/data-formats.md
+58 -30 (58 additions & 30 deletions)
@@ -24,48 +24,76 @@ When you tag entities, the tags are saved as in the following JSON format. If yo

 ```json
 {
-    //List of entity names. Their index within this array is used as an ID.
-    "entityNames": [
-        "entity_name1",
-        "entity_name2"
+    "extractors": [
+        {
+            "name": "Entity1"
+        },
+        {
+            "name": "Entity2"
+        }
     ],
-    "documents": "path_to_document", //Relative file path to get the text.
-    "culture": "en-US", //Standard culture strings supported by CultureInfo.
-    "entities": [
+    "documents": [
         {
-            "regionStart": 0,
-            "regionLength": 69,
-            "labels": [
+            "location": "file1.txt",
+            "language": "en-us",
+            "extractors": [
                 {
-                    "entity": 0, // Index of the entity in the "entityNames" array. Positions are relative to the original text (not bounding box)
-                    "start": 4,
-                    "length": 10
-                },
+                    "regionOffset": 0,
+                    "regionLength": 5129,
+                    "labels": [
+                        {
+                            "extractorName": "Entity1",
+                            "offset": 77,
+                            "length": 10
+                        },
+                        {
+                            "extractorName": "Entity2",
+                            "offset": 3062,
+                            "length": 8
+                        }
+                    ]
+                }
+            ]
+        },
+        {
+            "location": "file2.txt",
+            "language": "en-us",
+            "extractors": [
                 {
-                    "entity": 1,
-                    "start": 18,
-                    "length": 11
+                    "regionOffset": 0,
+                    "regionLength": 6873,
+                    "labels": [
+                        {
+                            "extractorName": "Entity2",
+                            "offset": 60,
+                            "length": 7
+                        },
+                        {
+                            "extractorName": "Entity1",
+                            "offset": 2805,
+                            "length": 10
+                        }
+                    ]
                 }
             ]
         }
-    ]
+    ]
 }
 ```

-The following list describes the various JSON properties of the sample above.
+### Data description

-* `entityNames`: An array of entity names. Index of the entity within the array is used as its ID.
+* `extractors`: An array of extractors for your data. Each extractor represents one of the entities you want to extract from your data.
 * `documents`: An array of tagged documents.
-  * `location`: The path of the document relative to the JSON file. For example, docs on the same level as the tags file `file.txt`, for docs inside one directory level `dir1/file.txt`.
-  * `culture`: culture/language of the document. <!-- See [language support](../language-support.md) for more information. -->
-  * `entities`: Specifies the entity recognition tags.
-    * `regionStart`: The inclusive character position of the start of the text.
-    * `regionLength`: The length of the bounding box in terms of UTF16 characters. Training only considers the data in this region, so if this is a tagged file, set the `regionStart` to 0 and the `regionLength` to the last index of last character in the file. You can also set this region if you want to introduce a negative sample to the training, by defining the region as a portion of the file with no tags.
-
-    * `labels`: All tags occurring within the bounding box.
-      * `entity`: The index of the entity in the `entityNames` array.
-      * `start`: The inclusive character position of the start of the tag in the document text. This is not relative to the bounding box.
-      * `length`: The length of the tag in terms of UTF16 characters.
+  * `location`: The path of the file. The file has to be in root of the storage container.
+  * `language`: Language of the file. Use one of the [supported culture locales](../language-support.md).
+  * `extractors`: Array of extractor objects to be extracted from the file.
+    * `regionOffset`: The inclusive character position of the start of the text.
+    * `regionLength`: The length of the bounding box in terms of UTF16 characters. Training only considers the data in this region.
+    * `labels`: Array of all the tagged entities within the specified region.
+      * `extractorName`: Type of the entity to be extracted.
+      * `offset`: The inclusive character position of the start of the entity. This is not relative to the bounding box.
+      * `length`: The length of the entity in terms of UTF16 characters.
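Because lengths are counted in UTF-16 characters, slicing a Python string directly by `offset` and `length` can drift when the text contains characters outside the Basic Multilingual Plane. The sketch below shows one way to read labelled spans back out of a file's text. It assumes that `offset` and `regionOffset` are also counted in UTF-16 code units from the start of the file (the description only states this for lengths), and the helper names `utf16_slice` and `extract_labels` are hypothetical.

```python
def utf16_slice(text, offset, length):
    """Slice `text` by UTF-16 code units instead of Python code points."""
    raw = text.encode("utf-16-le")  # every UTF-16 code unit is 2 bytes
    return raw[2 * offset : 2 * (offset + length)].decode("utf-16-le")


def extract_labels(document, text):
    """Return (extractorName, labelled text) pairs for one `documents` entry.

    Assumption: label offsets are absolute positions in the file, not
    relative to the region, as the `offset` description states.
    """
    results = []
    for region in document["extractors"]:
        region_start = region["regionOffset"]
        region_end = region_start + region["regionLength"]
        for label in region["labels"]:
            label_end = label["offset"] + label["length"]
            # Labels are expected to sit inside the region they belong to.
            if label["offset"] < region_start or label_end > region_end:
                raise ValueError(f"label {label} falls outside its region")
            span = utf16_slice(text, label["offset"], label["length"])
            results.append((label["extractorName"], span))
    return results


# Made-up sample text; the emoji occupies two UTF-16 code units.
sample_text = "\U0001F600 Contoso hired Ana."
sample_document = {
    "location": "file1.txt",
    "language": "en-us",
    "extractors": [
        {
            "regionOffset": 0,
            "regionLength": 21,
            "labels": [
                {"extractorName": "Entity1", "offset": 3, "length": 7},
                {"extractorName": "Entity2", "offset": 17, "length": 3},
            ],
        }
    ],
}

print(extract_labels(sample_document, sample_text))
```

Encoding to `utf-16-le` and slicing bytes keeps the arithmetic in the same unit the tags file uses, at the cost of one extra encode per lookup.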