Commit 9eb9d0d

committed
fixing conflict
2 parents c9a3675 + 32bb2e2 commit 9eb9d0d

4 files changed

+159
-7
lines changed

articles/cognitive-services/language-service/custom-text-analytics-for-health/concepts/data-formats.md

Lines changed: 7 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -13,11 +13,11 @@ ms.author: aahi
1313
ms.custom: language-service-custom-ta4h
1414
---
1515

16-
# Accepted custom Text Analytics for health data formats
16+
# Accepted data formats in custom text analytics for health
1717

1818
Use this article to learn about formatting your data to be imported into custom text analytics for health.
1919

20-
<!--If you are trying to [import your data](../how-to/create-project.md#import-project) into custom Text Analytics for health, it has to follow a specific format. If you don't have data to import, you can [create your project](../how-to/create-project.md) and use the Language Studio to [label your documents](../how-to/tag-data.md).-->
20+
If you are trying to [import your data](../how-to/create-project.md#import-project) into custom Text Analytics for health, it has to follow a specific format. If you don't have data to import, you can [create your project](../how-to/create-project.md) and use the Language Studio to [label your documents](../how-to/label-data.md).
2121

2222
Your Labels file should be in the `json` format below to be used when importing your labels into a project.
2323

@@ -130,13 +130,13 @@ Your Labels file should be in the `json` format below to be used when importing
130130

131131
```
132132

133-
<!--|Key |Placeholder |Value | Example |
133+
|Key |Placeholder |Value | Example |
134134
|---------|---------|----------|--|
135135
| `multilingual` | `true`| A boolean value that enables you to have documents in multiple languages in your dataset and when your model is deployed you can query the model in any supported language (not necessarily included in your training documents). See [language support](../language-support.md#) to learn more about multilingual support. | `true`|
136136
|`projectName`|`{PROJECT-NAME}`|Project name|`myproject`|
137137
| `storageInputContainerName` |`{CONTAINER-NAME}`|Container name|`mycontainer`|
138138
| `entities` | | Array containing all the entity types you have in the project. These are the entity types that will be extracted from your documents.| |
139-
| `category` | | The name of the entity type, which can be user defined in the case of a new entity definition or predefined in the case of prebuilt entities. For more information, see the entity naming rules below.| |
139+
| `category` | | The name of the entity type, which can be user defined for new entity definitions, or predefined for prebuilt entities. For more information, see the entity naming rules below.| |
140140
|`compositionSetting`|`{COMPOSITION-SETTING}`|Rule that defines how to manage multiple components in your entity. Options are `combineComponents` or `separateComponents`. |`combineComponents`|
141141
| `list` | | Array containing all the sublists you have in the project for a specific entity. Lists can be added to prebuilt entities or new entities with learned components.| |
142142
|`sublists`|`[]`|Array containing sublists. Each sublist is a key and its associated values.|`[]`|
@@ -147,7 +147,7 @@ Your Labels file should be in the `json` format below to be used when importing
147147
| `prebuilts` | `MedicationName` | The name of the prebuilt component populating the prebuilt entity. [Prebuilt entities](../../text-analytics-for-health/concepts/health-entity-categories.md) are automatically loaded into your project by default but you can extend them with list components in your labels file. | `MedicationName` |
148148
| `documents` | | Array containing all the documents in your project and list of the entities labeled within each document. | [] |
149149
| `location` | `{DOCUMENT-NAME}` | The location of the documents in the storage container. Since all the documents are in the root of the container this should be the document name.|`doc1.txt`|
150-
| `dataset` | `{DATASET}` | The test set to which this file goes to when split before training. Learn more about data splitting [here](../how-to/train-model.md#data-splitting) . Possible values for this field are `Train` and `Test`. |`Train`|
150+
| `dataset` | `{DATASET}` | The set to which this file is assigned when the data is split before training. <!--Learn more about data splitting [here](../how-to/train-model.md#data-splitting).--> Possible values for this field are `Train` and `Test`. |`Train`|
151151
| `regionOffset` | | The inclusive character position of the start of the text. |`0`|
152152
| `regionLength` | | The length of the bounding box in terms of UTF16 characters. Training only considers the data in this region. |`500`|
153153
| `category` | | The type of entity associated with the span of text specified. | `Entity1`|
@@ -165,5 +165,5 @@ Your Labels file should be in the `json` format below to be used when importing
165165

166166
## Next steps
167167
* You can import your labeled data into your project directly. Learn how to [import project](../how-to/create-project.md#import-project)
168-
* See the [how-to article](../how-to/tag-data.md) more information about labeling your data. When you're done labeling your data, you can [train your model](../how-to/train-model.md).
169-
-->
168+
* See the [how-to article](../how-to/label-data.md) for more information about labeling your data.
169+
<!--* When you're done labeling your data, you can [train your model](../how-to/train-model.md).-->
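
The table above says `regionLength` is measured in UTF-16 characters and that `dataset` accepts only `Train` and `Test`. The following is a minimal Python sketch of how those two rules could be checked before importing a labels file. The helper names `utf16_length` and `check_document_entry` are hypothetical, and the flat `{"location": ..., "dataset": ...}` entry shape is an assumption based only on the keys listed in the table, not on the full schema.

```python
def utf16_length(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode text.

    Characters outside the Basic Multilingual Plane (for example, many
    emoji) take two code units, so this can differ from len(text).
    """
    return len(text.encode("utf-16-le")) // 2


def check_document_entry(entry: dict) -> list:
    """Return a list of problems found in one document entry.

    Assumption: entry is a dict with "location" and "dataset" keys,
    mirroring the key names in the table above.
    """
    problems = []
    if entry.get("dataset") not in ("Train", "Test"):
        problems.append("dataset must be 'Train' or 'Test'")
    location = entry.get("location", "")
    # Documents live in the container root, so the location should be
    # a bare file name without any path separators.
    if not location or "/" in location or "\\" in location:
        problems.append("location must be a file name in the container root")
    return problems
```

For example, `utf16_length` can be used to compute a `regionLength` that covers a whole document, which may be larger than Python's `len()` when the text contains characters encoded as surrogate pairs.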
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
---
2+
title: Language and region support for custom Text Analytics for health
3+
titleSuffix: Azure Cognitive Services
4+
description: Learn about the languages and regions supported by custom Text Analytics for health
5+
services: cognitive-services
6+
author: aahill
7+
manager: nitinme
8+
ms.service: cognitive-services
9+
ms.subservice: language-service
10+
ms.topic: conceptual
11+
ms.date: 05/06/2022
12+
ms.custom: language-service-custom-ta4h
13+
ms.author: aahi
14+
---
15+
16+
# Language support for custom text analytics for health
17+
18+
Use this article to learn about the languages currently supported by custom Text Analytics for health.
19+
20+
## Multilingual option
21+
22+
With custom Text Analytics for health, you can train a model in one language and use it to extract entities from documents in other languages. This feature saves you the trouble of building separate projects for each language. Instead, you combine your datasets in a single project, which makes it easier to scale your projects to multiple languages. For example, you can train your project entirely with English documents and query it in French, German, Italian, and other languages. You can enable the multilingual option as part of the project creation process or later through the project settings.
23+
24+
You aren't expected to add the same number of documents for every language. You should build the majority of your project in one language, and only add a few documents in languages you observe aren't performing well. If you create a project that is primarily in English, and start testing it in French, German, and Spanish, you might observe that German doesn't perform as well as the other two languages. In that case, consider adding 5% of your original English documents in German, training a new model, and testing in German again. You should see better results for German queries. In the [data labeling](how-to/label-data.md) page in Language Studio, you can select the language of the document you're adding. The more labeled documents you add, the more likely the results are to improve. When you add data in another language, you shouldn't expect it to negatively affect other languages.
25+
26+
Hebrew is not supported in multilingual projects. If the primary language of the project is Hebrew, you will not be able to add training data in other languages, or query the model with other languages. Similarly, if the primary language of the project is not Hebrew, you will not be able to add training data in Hebrew, or query the model in Hebrew.
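
The Hebrew restriction above can be expressed as a small check. This is a sketch only: the function name `can_combine` is hypothetical, and the set of language codes is copied from the language support table in this article.

```python
# Language codes copied from the "Language support" table in this article.
SUPPORTED_LANGUAGES = {"en", "fr", "de", "es", "it", "pt-pt", "he"}


def can_combine(primary: str, other: str) -> bool:
    """Return True if a project whose primary language is `primary`
    can be trained on, or queried with, documents in `other`.

    Hebrew cannot take part in a multilingual project, so it only
    combines with itself.
    """
    for code in (primary, other):
        if code not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language code: {code}")
    if primary == other:
        return True
    return "he" not in (primary, other)
```

With this check, `can_combine("en", "de")` is allowed, while `can_combine("en", "he")` and `can_combine("he", "fr")` are not.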
27+
28+
## Language support
29+
30+
Custom Text Analytics for health supports `.txt` files in the following languages:
31+
32+
| Language | Language code |
33+
| --- | --- |
34+
| English | `en` |
35+
| French | `fr` |
36+
| German | `de` |
37+
| Spanish | `es` |
38+
| Italian | `it` |
39+
| Portuguese (Portugal) | `pt-pt` |
40+
| Hebrew | `he` |
41+
42+
43+
## Next steps
44+
45+
* [Custom Text Analytics for health overview](overview.md)
46+
* [Service limits](reference/service-limits.md)
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
---
2+
title: Custom Text Analytics for health service limits
3+
titleSuffix: Azure Cognitive Services
4+
description: Learn about the data and service limits when using Custom Text Analytics for health.
5+
services: cognitive-services
6+
author: aahill
7+
manager: nitinme
8+
ms.service: cognitive-services
9+
ms.subservice: language-service
10+
ms.topic: conceptual
11+
ms.date: 05/06/2022
12+
ms.author: aahi
13+
ms.custom: language-service-custom-ta4h, references_regions
14+
---
15+
16+
# Custom Text Analytics for health service limits
17+
18+
Use this article to learn about the data and service limits when using custom Text Analytics for health.
19+
20+
## Language resource limits
21+
22+
* Your Language resource has to be created in one of the [supported regions](#regional-availability).
23+
24+
* Your resource must be one of the supported pricing tiers:
25+
26+
|Tier|Description|Limit|
27+
|--|--|--|
28+
|S |Paid tier|You can have unlimited Language S tier resources per subscription. |
29+
30+
31+
* You can only connect one storage account per resource. This process is irreversible: if you connect a storage account to your resource, you cannot unlink it later. Learn more about [connecting a storage account](../how-to/create-project.md#create-language-resource-and-connect-storage-account).
32+
33+
* You can have up to 500 projects per resource.
34+
35+
* Project names have to be unique within the same resource across all custom features.
36+
37+
## Regional availability
38+
39+
Custom Text Analytics for health is only available in some Azure regions since it is a preview service. Some regions may be available for **both authoring and prediction**, while other regions may be for **prediction only**. Language resources in authoring regions allow you to create, edit, train, and deploy your projects. Language resources in prediction regions allow you to get predictions from a deployment.
40+
41+
| Region | Authoring | Prediction |
42+
|--------------------|-----------|-------------|
43+
| East US |||
44+
| UK South |||
45+
| North Europe |||
46+
47+
## API limits
48+
49+
|Item|Request type| Maximum limit|
50+
|:-|:-|:-|
51+
|Authoring API|POST|10 per minute|
52+
|Authoring API|GET|100 per minute|
53+
|Prediction API|GET/POST|1,000 per minute|
54+
|Document size|--|125,000 characters. You can send up to 20 documents as long as they collectively do not exceed 125,000 characters|
55+
56+
> [!TIP]
57+
> If you need to send larger files than the limit allows, you can break the text into smaller chunks before sending them to the API. You can use the [chunk command from CLUtils](https://github.com/microsoft/CognitiveServicesLanguageUtilities/blob/main/CustomTextAnalytics.CLUtils/Solution/CogSLanguageUtilities.ViewLayer.CliCommands/Commands/ChunkCommand/README.md) for this process.
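
As an alternative illustration of the limits in the table above, here is a minimal Python sketch that splits long text into chunks and groups documents into request-sized batches. It is not the CLUtils implementation. The function names are hypothetical, and the sketch assumes that "characters" can be approximated by Python's `len()`, which the service may count differently.

```python
MAX_CHARS = 125_000  # per-request character limit from the table above
MAX_DOCS = 20        # maximum documents per request from the table above


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list:
    """Split text into consecutive pieces of at most max_chars characters.

    This naive split can cut in the middle of a word or sentence.
    """
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def batch_documents(docs, max_docs: int = MAX_DOCS, max_chars: int = MAX_CHARS) -> list:
    """Group documents into batches that respect both limits.

    Each batch holds at most max_docs documents whose combined length
    does not exceed max_chars. Documents keep their original order.
    """
    batches, current, current_chars = [], [], 0
    for doc in docs:
        if len(doc) > max_chars:
            raise ValueError("document exceeds the limit; chunk it first")
        # Start a new batch when adding this document would break a limit.
        if current and (len(current) >= max_docs or current_chars + len(doc) > max_chars):
            batches.append(current)
            current, current_chars = [], 0
        current.append(doc)
        current_chars += len(doc)
    if current:
        batches.append(current)
    return batches
```

A typical flow would be to run `chunk_text` on any oversized document first, then pass all resulting pieces to `batch_documents` and send one request per batch. Splitting on sentence or paragraph boundaries would likely give better extraction results than the fixed-width split shown here.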
58+
59+
## Quota limits
60+
61+
|Pricing tier |Item |Limit |
62+
| --- | --- | ---|
63+
|S|Training time| Unlimited, free |
64+
|S|Prediction Calls| 5,000 text records for free per language resource|
65+
66+
## Document limits
67+
68+
* You can only use `.txt` files. If your data is in another format, you can use the [CLUtils parse command](https://github.com/microsoft/CognitiveServicesLanguageUtilities/blob/main/CustomTextAnalytics.CLUtils/Solution/CogSLanguageUtilities.ViewLayer.CliCommands/Commands/ParseCommand/README.md) to open your document and extract the text.
69+
70+
* All files uploaded in your container must contain data. Empty files are not allowed for training.
71+
72+
* All files should be available at the root of your container.
73+
74+
## Data limits
75+
76+
The following limits are observed for authoring.
77+
78+
|Item|Lower Limit| Upper Limit |
79+
| --- | --- | --- |
80+
|Documents count | 10 | 100,000 |
81+
|Document length in characters | 1 | 128,000 characters; approximately 28,000 words or 56 pages. |
82+
|Count of entity types | 1 | 200 |
83+
|Entity length in characters | 1 | 500 |
84+
|Count of trained models per project| 0 | 10 |
85+
|Count of deployments per project| 0 | 10 |
86+
87+
## Naming limits
88+
89+
| Item | Limits |
90+
|--|--|
91+
| Project name | You can only use letters `(a-z, A-Z)`, numbers `(0-9)`, and symbols `_ . -`, with no spaces. Maximum allowed length is 50 characters. |
92+
| Model name | You can only use letters `(a-z, A-Z)`, numbers `(0-9)` and symbols `_ . -`. Maximum allowed length is 50 characters. |
93+
| Deployment name | You can only use letters `(a-z, A-Z)`, numbers `(0-9)` and symbols `_ . -`. Maximum allowed length is 50 characters. |
94+
| Entity name| You can only use letters `(a-z, A-Z)`, numbers `(0-9)` and all symbols except `: $ & % * ( ) + ~ # / ?`. Maximum allowed length is 50 characters. See the supported [data format](../concepts/data-formats.md#entity-naming-rules) for more information on entity names when importing a labels file. |
95+
| Document name | You can only use letters `(a-z, A-Z)`, and numbers `(0-9)` with no spaces. |
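
The project, model, and deployment name rules in the table above can be checked client-side with a regular expression. The sketch below is one possible reading of those rows, not the service's own validation. The function name `is_valid_resource_name` is hypothetical, and it assumes that names must be non-empty and that spaces are disallowed for model and deployment names as well, since spaces are not among the listed characters.

```python
import re

# Letters, digits, and the symbols _ . - only; between 1 and 50 characters.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,50}")


def is_valid_resource_name(name: str) -> bool:
    """Return True if name fits the project/model/deployment naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_resource_name("my-project_1.0")` passes, while a name containing a space or a name longer than 50 characters does not.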
96+
97+
98+
## Next steps
99+
100+
* [Custom text analytics for health overview](../overview.md)

articles/cognitive-services/language-service/toc.yml

Lines changed: 6 additions & 0 deletions
@@ -1081,6 +1081,8 @@ items:
10811081
href: custom-text-analytics-for-health/overview.md
10821082
- name: Custom text analytics for health quickstart
10831083
href: custom-text-analytics-for-health/quickstart.md
1084+
- name: Custom text analytics for health language support
1085+
href: custom-text-analytics-for-health/language-support.md
10841086
- name: How-to guides
10851087
items:
10861088
- name: Create projects
@@ -1107,6 +1109,10 @@ items:
11071109
href: custom-text-analytics-for-health/concepts/entity-components.md
11081110
- name: Evaluation metrics
11091111
href: custom-text-analytics-for-health/concepts/evaluation-metrics.md
1112+
- name: Reference
1113+
items:
1114+
- name: Service limits
1115+
href: custom-text-analytics-for-health/reference/service-limits.md
11101116
- name: Summarization (preview)
11111117
items:
11121118
- name: Summarization overview
