Commit 73608f1

Merge pull request #180490 from magrefaat/patch-65
Update data-formats.md
2 parents 5863d6e + be38111 commit 73608f1

4 files changed: +88 -151 lines

articles/cognitive-services/language-service/custom-classification/concepts/data-formats.md

Lines changed: 14 additions & 5 deletions
@@ -34,7 +34,7 @@ Your tags file should be in the `json` format below.
 ],
 "documents": [
 {
-"location": "doc1.txt",
+"location": "file1.txt",
 "language": "en-us",
 "classifiers": [
 {
@@ -44,6 +44,15 @@ Your tags file should be in the `json` format below.
 "classifierName": "Class1"
 }
 ]
+},
+{
+"location": "file2.txt",
+"language": "en-us",
+"classifiers": [
+{
+"classifierName": "Class2"
+}
+]
 }
 ]
 }
@@ -52,10 +61,10 @@ Your tags file should be in the `json` format below.
 ### Data description
 
 * `classifiers`: An array of classifiers for your data. Each classifier represents one of the classes you want to tag your data with.
-* `documents`: An array of tagged documents. For example:
-* `location`: The path of the JSON file containing tags. The tags file has to be in root of the storage container.
-* `language`: Language of the document. Use one of the [supported culture locales](../language-support.md).
-* `classifiers`: Array of classifier objects assigned to the document. If you're working on a single classification project, there should be one classifier only.
+* `documents`: An array of tagged documents.
+* `location`: The path of the file. The file has to be in root of the storage container.
+* `language`: Language of the file. Use one of the [supported culture locales](../language-support.md).
+* `classifiers`: Array of classifier objects assigned to the file. If you're working on a single classification project, there should be one classifier per file only.
 
 ## Next steps

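With this change, a single classification tags file lists several files, each with its own `location`, `language`, and `classifiers`. The Python sketch below is illustrative only: it builds a tags object in that shape and checks two rules stated in the updated description (files sit in the root of the storage container, and single classification projects use one classifier per file). The helpers `build_tags` and `find_problems` are hypothetical, and the `{"name": ...}` shape of the top-level `classifiers` entries is an assumption borrowed from the entity recognition format, because this diff does not show it.

```python
import json


def build_tags(assignments, language="en-us"):
    """Build a classification tags object from {file name: [class names]}.

    Assumption: top-level classifier entries are {"name": ...} objects; the
    diff only shows the per-file {"classifierName": ...} entries.
    """
    class_names = sorted({c for classes in assignments.values() for c in classes})
    return {
        "classifiers": [{"name": name} for name in class_names],
        "documents": [
            {
                "location": file_name,
                "language": language,
                "classifiers": [{"classifierName": c} for c in classes],
            }
            for file_name, classes in assignments.items()
        ],
    }


def find_problems(tags, single_label=True):
    """Return (location, reason) pairs that break the rules described above."""
    problems = []
    for doc in tags["documents"]:
        # Files must be in the root of the storage container, so no folders.
        if "/" in doc["location"]:
            problems.append((doc["location"], "not in container root"))
        # Single classification projects: exactly one classifier per file.
        if single_label and len(doc["classifiers"]) != 1:
            problems.append((doc["location"], "expected exactly one classifier"))
    return problems


tags = build_tags({"file1.txt": ["Class1"], "file2.txt": ["Class2"]})
print(json.dumps(tags, indent=4))
print(find_problems(tags))  # prints []
```

Passing `single_label=False` skips the one-classifier check, which is how the same sketch could be reused for a multiple classification project.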
articles/cognitive-services/language-service/custom-classification/includes/quickstarts/rest-api.md

Lines changed: 8 additions & 58 deletions
@@ -72,23 +72,13 @@ Create a **POST** request using the following URL, headers, and JSON body to cre
 Use the following URL to create a project and import your tags file. Replace the placeholder values below with your own values.
 
 ```rest
-{YOUR-ENDPOINT}/language/analyze-text/projects/{projectName}/:import.
+{YOUR-ENDPOINT}/language/analyze-text/projects/{projectName}/:import?api-version=2021-11-01-preview
 ```
 
 |Placeholder |Value | Example |
 |---------|---------|---------|
 |`{YOUR-ENDPOINT}` | The endpoint for authenticating your API request. | `https://<your-custom-subdomain>.cognitiveservices.azure.com` |
 
-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.| `2021-11-01-preview` |
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers
 
 Use the following header to authenticate your request.
@@ -153,24 +143,14 @@ After your project has been created, you can begin training a text classificatio
 Use the following URL when creating your API request. Replace the placeholder values below with your own values.
 
 ```rest
-{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}/:train
+{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}/:train?api-version=2021-11-01-preview
 ```
 
 |Placeholder |Value | Example |
 |---------|---------|---------|
 |`{YOUR-ENDPOINT}` | The endpoint for authenticating your API request. | `https://<your-custom-subdomain>.cognitiveservices.azure.com` |
 |`{PROJECT-NAME}` | The name for your project. This value is case-sensitive. | `myProject` |
 
-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.| `2021-11-01-preview` |
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers
 
 Use the following header to authenticate your request.
@@ -198,7 +178,7 @@ Use the following JSON in your request. The model will be named `MyModel` once t
 Once you send your API request, you will receive a `202` response indicating success. In the response headers, extract the `location` value. It will be formatted like this:
 
 ```rest
-{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/train/jobs/{JOB-ID}?api-version=xxxx-xx-xx-xxxxxxx
+{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/train/jobs/{JOB-ID}?api-version=2021-11-01-preview
 ```
 
 `JOB-ID` is used to identify your request, since this operation is asynchronous. You will use this URL in the next step to get the training status.
@@ -209,7 +189,7 @@ Use the following **GET** request to query the status of your model's training p
 
 
 ```rest
-{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/train/jobs/{JOB-ID}
+{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/train/jobs/{JOB-ID}?api-version=2021-11-01-preview
 ```
 
 |Placeholder |Value | Example |
@@ -218,16 +198,6 @@ Use the following **GET** request to query the status of your model's training p
 |`{PROJECT-NAME}` | The name for your project. This value is case-sensitive. | `myProject` |
 |`{JOB-ID}` | The ID for locating your model's training status. This is in the `location` header value you received in the previous step. | `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx` |
 
-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.| `2021-11-01-preview` |
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers
 
 Use the following header to authenticate your request.
@@ -277,7 +247,7 @@ Once you send the request, you will get the following response.
 Create a **PUT** request using the following URL, headers, and JSON body to start deploying a text classification model.
 
 ```rest
-{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}
+{YOUR-ENDPOINT}/language/analyze-text/projects/{PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}?api-version=2021-11-01-preview
 ```
 
 |Placeholder |Value | Example |
@@ -286,16 +256,6 @@ Create a **PUT** request using the following URL, headers, and JSON body to star
 |`{PROJECT-NAME}` | The name for your project. This value is case-sensitive. | `myProject` |
 |`{DEPLOYMENT-NAME}` | The name of your deployment. This value is case-sensitive. | `prod` |
 
-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.| `2021-11-01-preview` |
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers
 
 Use the following header to authenticate your request.
@@ -318,7 +278,7 @@ Use the following JSON in your request. The model will be named `MyModel` once t
 Once you send your API request, you will receive a `202` response indicating success. In the response headers, extract the `location` value. It will be formatted like this:
 
 ```rest
-{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version=xxxx-xx-xx-xxxxxxx
+{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version=2021-11-01-preview
 ```
 
 `JOB-ID` is used to identify your request, since this operation is asynchronous. You will use this URL in the next step to get the publishing status.
@@ -328,7 +288,7 @@ Once you send your API request, you will receive a `202` response indicating suc
 Use the following **GET** request to query the status of your model's publishing process. You can use the URL you received from the previous step, or replace the placeholder values below with your own values.
 
 ```rest
-{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}
+{YOUR-ENDPOINT}/language/analyze-text/projects/{YOUR-PROJECT-NAME}/deployments/{DEPLOYMENT-NAME}/jobs/{JOB-ID}?api-version=2021-11-01-preview
 ```
 
 |Placeholder |Value | Example |
@@ -338,16 +298,6 @@ Use the following **GET** request to query the status of your model's publishing
 |`{DEPLOYMENT-NAME}` | The name of your deployment. This value is case-sensitive. | `prod` |
 |`{JOB-ID}` | The ID for locating your model's training status. This is in the `location` header value you received in the previous step. | `xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx` |
 
-### Parameters
-
-Pass the following parameter with your request.
-
-|Key|Explanation|Value|
-|--|--|--|
-|`api-version`| The API version used.| `2021-11-01-preview` |
-
-To pass the parameter, add `?api-version=2021-11-01-preview` to the end of your request URL.
-
 ### Headers
 
 Use the following header to authenticate your request.
@@ -525,4 +475,4 @@ Use the following header to authenticate your request.
 
 |Key|Value|
 |--|--|
-|Ocp-Apim-Subscription-Key| The key to your resource. Used for authenticating your API requests.|
+|Ocp-Apim-Subscription-Key| The key to your resource. Used for authenticating your API requests.|

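The updated quickstart appends `?api-version=2021-11-01-preview` directly to every URL instead of documenting it in separate Parameters tables, and the train and deploy calls return a versioned job URL in the `location` header. The following standard-library Python sketch shows one way to build those URLs with the `Ocp-Apim-Subscription-Key` header and to poll a job URL. It builds requests but sends nothing. The helpers `versioned_url`, `make_request`, and `poll_job` are hypothetical; the `status` field name and the terminal values `succeeded`/`failed` are assumptions that this diff does not confirm, and `fetch_status` must be supplied by the caller.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

API_VERSION = "2021-11-01-preview"  # value used throughout the updated quickstart


def versioned_url(endpoint, path):
    """Return {endpoint}/language/analyze-text/projects/{path}?api-version=..."""
    base = endpoint.rstrip("/") + "/language/analyze-text/projects/" + path
    return base + "?" + urlencode({"api-version": API_VERSION})


def make_request(url, key, method="GET", body=None):
    """Build, but do not send, a request that carries the resource key."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method)
    # urllib stores header names capitalized ("Ocp-apim-subscription-key");
    # HTTP header names are case-insensitive, so this is harmless on the wire.
    req.add_header("Ocp-Apim-Subscription-Key", key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def poll_job(fetch_status, job_url, interval=5.0, max_polls=60,
             terminal=("succeeded", "failed")):
    """Call fetch_status(job_url) until it reports a terminal status.

    fetch_status is caller-supplied and must return the parsed JSON body.
    The "status" key and the terminal values are assumptions.
    """
    for _ in range(max_polls):
        status = fetch_status(job_url)["status"]
        if status in terminal:
            return status
        time.sleep(interval)
    raise TimeoutError(f"job at {job_url} still running after {max_polls} polls")


# Usage: the import and deployment URLs from the diff, with placeholder values.
endpoint = "https://<your-custom-subdomain>.cognitiveservices.azure.com"
import_req = make_request(versioned_url(endpoint, "myProject/:import"), "<your-key>", method="POST")
deploy_req = make_request(versioned_url(endpoint, "myProject/deployments/prod"), "<your-key>", method="PUT")
print(import_req.full_url)
```

Injecting `fetch_status` keeps the polling logic independent of any particular HTTP client, so it can be exercised offline with canned responses.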
articles/cognitive-services/language-service/custom-named-entity-recognition/concepts/data-formats.md

Lines changed: 58 additions & 30 deletions
@@ -24,48 +24,76 @@ When you tag entities, the tags are saved as in the following JSON format. If yo
 
 ```json
 {
-//List of entity names. Their index within this array is used as an ID.
-"entityNames": [
-"entity_name1",
-"entity_name2"
+"extractors": [
+{
+"name": "Entity1"
+},
+{
+"name": "Entity2"
+}
 ],
-"documents": "path_to_document", //Relative file path to get the text.
-"culture": "en-US", //Standard culture strings supported by CultureInfo.
-"entities": [
+"documents": [
 {
-"regionStart": 0,
-"regionLength": 69,
-"labels": [
+"location": "file1.txt",
+"language": "en-us",
+"extractors": [
 {
-"entity": 0, // Index of the entity in the "entityNames" array. Positions are relative to the original text (not bounding box)
-"start": 4,
-"length": 10
-},
+"regionOffset": 0,
+"regionLength": 5129,
+"labels": [
+{
+"extractorName": "Entity1",
+"offset": 77,
+"length": 10
+},
+{
+"extractorName": "Entity2",
+"offset": 3062,
+"length": 8
+}
+]
+}
+]
+},
+{
+"location": "file2.txt",
+"language": "en-us",
+"extractors": [
 {
-"entity": 1,
-"start": 18,
-"length": 11
+"regionOffset": 0,
+"regionLength": 6873,
+"labels": [
+{
+"extractorName": "Entity2",
+"offset": 60,
+"length": 7
+},
+{
+"extractorName": "Entity1",
+"offset": 2805,
+"length": 10
+}
+]
 }
 ]
 }
-]
+]
 }
 ```
 
-The following list describes the various JSON properties of the sample above.
+### Data description
 
-* `entityNames`: An array of entity names. Index of the entity within the array is used as its ID.
+* `extractors`: An array of extractors for your data. Each extractor represents one of the entities you want to extract from your data.
 * `documents`: An array of tagged documents.
-* `location`: The path of the document relative to the JSON file. For example, docs on the same level as the tags file `file.txt`, for docs inside one directory level `dir1/file.txt`.
-* `culture`: culture/language of the document. <!-- See [language support](../language-support.md) for more information. -->
-* `entities`: Specifies the entity recognition tags.
-* `regionStart`: The inclusive character position of the start of the text.
-* `regionLength`: The length of the bounding box in terms of UTF16 characters. Training only considers the data in this region, so if this is a tagged file, set the `regionStart` to 0 and the `regionLength` to the last index of last character in the file. You can also set this region if you want to introduce a negative sample to the training, by defining the region as a portion of the file with no tags.
-
-* `labels`: All tags occurring within the bounding box.
-* `entity`: The index of the entity in the `entityNames` array.
-* `start`: The inclusive character position of the start of the tag in the document text. This is not relative to the bounding box.
-* `length`: The length of the tag in terms of UTF16 characters.
+* `location`: The path of the file. The file has to be in root of the storage container.
+* `language`: Language of the file. Use one of the [supported culture locales](../language-support.md).
+* `extractors`: Array of extractor objects to be extracted from the file.
+* `regionOffset`: The inclusive character position of the start of the text.
+* `regionLength`: The length of the bounding box in terms of UTF16 characters. Training only considers the data in this region.
+* `labels`: Array of all the tagged entities within the specified region.
+* `extractorName`: Type of the entity to be extracted.
+* `offset`: The inclusive character position of the start of the entity. This is not relative to the bounding box.
+* `length`: The length of the entity in terms of UTF16 characters.
 
 ## Next steps

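This diff replaces the index-based `entityNames`/`entity` scheme with named `extractors`, renames `regionStart` to `regionOffset` and `start` to `offset`, replaces `culture` with `language`, and turns `documents` into an array. As a rough sketch, the hypothetical helper `migrate_old_tags` below converts an object in the removed shape into the new one, and `find_label_problems` checks that each label names a declared extractor and lies inside its region (offsets are absolute positions in the file, per the description). Two assumptions: the old `documents` string is treated as the single file's path, as the removed sample suggests, and the culture string is copied unchanged even though the new sample writes `en-us` in lowercase. `utf16_len` reflects the note that lengths count UTF-16 characters; Python's `len` counts code points, which differs for characters outside the Basic Multilingual Plane.

```python
def utf16_len(text):
    """Length of text in UTF-16 code units, the unit used for offsets and lengths."""
    return len(text.encode("utf-16-le")) // 2


def migrate_old_tags(old):
    """Convert a tags object in the removed format into the new format."""
    names = old["entityNames"]  # old labels referenced entities by index
    return {
        "extractors": [{"name": n} for n in names],
        "documents": [{
            "location": old["documents"],  # assumed: the single file's path
            "language": old["culture"],    # copied unchanged, case not normalized
            "extractors": [{
                "regionOffset": region["regionStart"],
                "regionLength": region["regionLength"],
                "labels": [{
                    "extractorName": names[label["entity"]],
                    "offset": label["start"],
                    "length": label["length"],
                } for label in region["labels"]],
            } for region in old["entities"]],
        }],
    }


def find_label_problems(tags):
    """Report labels outside their region or naming an undeclared extractor."""
    declared = {e["name"] for e in tags["extractors"]}
    problems = []
    for doc in tags["documents"]:
        for region in doc["extractors"]:
            start = region["regionOffset"]
            end = start + region["regionLength"]
            for label in region["labels"]:
                # Offsets are positions in the whole file, not relative to the
                # region, so they are compared with the region bounds directly.
                if label["extractorName"] not in declared:
                    problems.append((doc["location"], label["offset"], "undeclared extractor"))
                if label["offset"] < start or label["offset"] + label["length"] > end:
                    problems.append((doc["location"], label["offset"], "outside region"))
    return problems


# Usage with the values from the removed sample.
old = {
    "entityNames": ["entity_name1", "entity_name2"],
    "documents": "path_to_document",
    "culture": "en-US",
    "entities": [{
        "regionStart": 0,
        "regionLength": 69,
        "labels": [
            {"entity": 0, "start": 4, "length": 10},
            {"entity": 1, "start": 18, "length": 11},
        ],
    }],
}
new = migrate_old_tags(old)
print(find_label_problems(new))  # prints []
```

Resolving indices to names during migration is what makes the new format order-independent: reordering the top-level `extractors` array no longer changes which entity a label refers to.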