Commit 87b8af2

Merge pull request #266082 from MicrosoftDocs/main
02/12 PM Publishing
2 parents 52ead6f + da47c2f commit 87b8af2

File tree

102 files changed

+1152
-1060
lines changed

articles/ai-services/language-service/language-detection/how-to/call-api.md

Lines changed: 108 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ author: jboback
77
manager: nitinme
88
ms.service: azure-ai-language
99
ms.topic: how-to
10-
ms.date: 12/19/2023
10+
ms.date: 01/16/2024
1111
ms.author: jboback
1212
ms.custom: language-service-language-detection
1313
---
@@ -50,14 +50,19 @@ Analysis is performed upon receipt of the request. Using the language detection
5050

5151
When you get results from language detection, you can stream the results to an application or save the output to a file on the local system.
5252

53-
Language detection will return one predominant language for each document you submit, along with it's [ISO 639-1](https://www.iso.org/standard/22109.html) name, a human-readable name, and a confidence score. A positive score of 1 indicates the highest possible confidence level of the analysis.
53+
Language detection returns one predominant language for each document you submit, along with its [ISO 639-1](https://www.iso.org/standard/22109.html) name, a human-readable name, a confidence score, and a script name and script code according to the [ISO 15924 standard](https://wikipedia.org/wiki/ISO_15924). A score of 1 indicates the highest possible confidence level of the analysis.
54+
5455

5556
### Ambiguous content
5657

5758
In some cases, it can be hard to disambiguate languages based on the input. You can use the `countryHint` parameter to specify an [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) country/region code. By default, the API uses "US" as the country hint. To remove this behavior, reset the parameter by setting it to an empty string: `countryHint = ""`.
5859

5960
For example, "communication" is common to both English and French. If it's submitted with limited context, the response is based on the "US" country/region hint. If the text is known to originate from France, you can supply that as a hint.
6061

62+
> [!NOTE]
63+
> Ambiguous content can cause confidence scores to be lower.
64+
> The `countryHint` in the response is only applicable if the confidence score is less than 0.8.
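
As a rough illustration of the hint behavior described above, the following Python sketch builds a request body in which one document keeps the default hint, one supplies a hint for France, and one clears the hint with an empty string. The helper name `build_detection_request` is hypothetical, and placing `countryHint` on each document is an assumption based on the description in this article rather than a confirmed schema.

```python
import json


def build_detection_request(documents, model_version="latest"):
    """Build a LanguageDetection request body (hypothetical helper).

    `documents` is a list of (id, text, country_hint) tuples.
    A country_hint of None omits the property so the service default applies;
    an empty string clears the default hint.
    """
    docs = []
    for doc_id, text, country_hint in documents:
        doc = {"id": doc_id, "text": text}
        if country_hint is not None:
            doc["countryHint"] = country_hint
        docs.append(doc)
    return {
        "kind": "LanguageDetection",
        "parameters": {"modelVersion": model_version},
        "analysisInput": {"documents": docs},
    }


body = build_detection_request([
    ("1", "communication", None),   # default hint applies
    ("2", "communication", "FR"),   # text known to come from France
    ("3", "communication", ""),     # default hint removed
])
print(json.dumps(body, indent=2))
```

Omitting the property and sending an empty string are kept distinct here because the article describes them as different behaviors.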
65+
6166
**Input**
6267

6368
```json
@@ -76,7 +81,8 @@ For example, "communication" is common to both English and French and if given w
7681
}
7782
```
7883

79-
The language detection model now has additional context to make a better judgment:
84+
With the second document, the language detection model has additional context to make a better judgment because it contains the `countryHint` property in the input above. This will return the following output.
85+
8086

8187
**Output**
8288

@@ -129,7 +135,7 @@ If the analyzer can't parse the input, it returns `(Unknown)`. An example is if
129135
}
130136
],
131137
"errors": [],
132-
"modelVersion": "2021-01-05"
138+
"modelVersion": "2023-12-01"
133139
}
134140
```
135141

@@ -156,22 +162,107 @@ The resulting output consists of the predominant language, with a score of less
156162

157163
```json
158164
{
159-
"documents": [
160-
{
161-
"id": "1",
162-
"detectedLanguage": {
163-
"name": "Spanish",
164-
"iso6391Name": "es",
165-
"confidenceScore": 0.88
166-
},
167-
"warnings": []
168-
}
169-
],
170-
"errors": [],
171-
"modelVersion": "2021-01-05"
165+
"kind": "LanguageDetectionResults",
166+
"results": {
167+
"documents": [
168+
{
169+
"id": "1",
170+
"detectedLanguage": {
171+
"name": "Spanish",
172+
"iso6391Name": "es",
173+
"confidenceScore": 0.97,
174+
"script": "Latin",
175+
"scriptCode": "Latn"
176+
},
177+
"warnings": []
178+
}
179+
],
180+
"errors": [],
181+
"modelVersion": "2023-12-01"
182+
}
183+
}
184+
```
185+
186+
## Script name and script code
187+
188+
> [!NOTE]
189+
> * Script detection is currently limited to [select languages](../language-support.md#script-detection).
190+
> * Script detection is only available for text input that is greater than 12 characters in length.
191+
192+
Language detection offers the ability to detect more than one script per language according to the [ISO 15924 standard](https://wikipedia.org/wiki/ISO_15924). Specifically, Language Detection returns two script-related properties:
193+
194+
* `script`: The human-readable name of the identified script
195+
* `scriptCode`: The ISO 15924 code for the identified script
196+
197+
The output of the API includes the value of the `scriptCode` property for documents that are at least 12 characters in length and match the list of supported languages and scripts. Script detection is designed to benefit users whose language can be transliterated or written in more than one script, such as Kazakh or Hindi.
198+
199+
Previously, language detection was designed to detect the language of documents in a wide variety of languages, dialects, and regional variants, but was limited by "Romanization". Romanization refers to conversion of text from one writing system to the Roman (Latin) script, and is necessary to detect many Indo-European languages. However, there are other languages which are written in multiple scripts, such as Kazakh, which can be written in Cyrillic, Perso-Arabic, and Latin scripts. There are also other cases in which users may either choose or are required to transliterate their language in more than one script, such as Hindi transliterated in Latin script, due to the limited availability of keyboards which support its Devanagari script.
200+
201+
Consequently, language detection's expanded support for script detection behaves as follows:
202+
203+
**Input**
204+
205+
```json
206+
{
207+
    "kind": "LanguageDetection",
208+
    "parameters": {
209+
        "modelVersion": "latest"
210+
    },
211+
    "analysisInput": {
212+
        "documents": [
213+
            {
214+
                "id": "1",
215+
                "text": "आप कहाँ जा रहे हैं?"
216+
            },
217+
            {
218+
                "id": "2",
219+
                "text": "Туған жерім менің - Қазақстаным"
220+
            }
221+
        ]
222+
    }
223+
}
224+
```
225+
226+
**Output**
227+
228+
The resulting output consists of the predominant language, along with a script name, script code, and confidence score.
229+
230+
```json
231+
{
232+
    "kind": "LanguageDetectionResults",
233+
    "results": {
234+
        "documents": [
235+
            {
236+
                "id": "1",
237+
                "detectedLanguage": {
238+
                    "name": "Hindi",
239+
                    "iso6391Name": "hi",
240+
                    "confidenceScore": 1.0,
241+
                    "script": "Devanagari",
242+
                    "scriptCode": "Deva"
243+
                },
244+
                "warnings": []
245+
            },
246+
            {
247+
                "id": "2",
248+
                "detectedLanguage": {
249+
                    "name": "Kazakh",
250+
                    "iso6391Name": "kk",
251+
                    "confidenceScore": 1.0,
252+
                    "script": "Cyrillic",
253+
                    "scriptCode": "Cyrl"
254+
                },
255+
                "warnings": []
256+
            }
257+
        ],
258+
        "errors": [],
259+
        "modelVersion": "2023-12-01"
260+
    }
172261
}
173262
```
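
A minimal Python sketch of how a client might read the script fields from a response shaped like the output above. The function name `summarize_detection` is hypothetical, and the sample is trimmed from the response shown in this article. `scriptCode` is read with `.get()` on the assumption that it can be absent when script detection doesn't apply.

```python
def summarize_detection(response):
    """Return (id, language code, confidence, script code) for each document."""
    summary = []
    for doc in response["results"]["documents"]:
        detected = doc["detectedLanguage"]
        summary.append((
            doc["id"],
            detected["iso6391Name"],
            detected["confidenceScore"],
            detected.get("scriptCode"),  # None if the property is missing
        ))
    return summary


# Trimmed copy of the sample response from this article.
sample = {
    "kind": "LanguageDetectionResults",
    "results": {
        "documents": [
            {"id": "1", "detectedLanguage": {
                "name": "Hindi", "iso6391Name": "hi", "confidenceScore": 1.0,
                "script": "Devanagari", "scriptCode": "Deva"}, "warnings": []},
            {"id": "2", "detectedLanguage": {
                "name": "Kazakh", "iso6391Name": "kk", "confidenceScore": 1.0,
                "script": "Cyrillic", "scriptCode": "Cyrl"}, "warnings": []},
        ],
        "errors": [],
        "modelVersion": "2023-12-01",
    },
}

for row in summarize_detection(sample):
    print(row)
```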
174263

264+
265+
175266
## Service and data limits
176267

177268
[!INCLUDE [service limits article](../../includes/service-limits-link.md)]

articles/ai-services/language-service/language-detection/how-to/use-containers.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ author: jboback
77
manager: nitinme
88
ms.service: azure-ai-language
99
ms.topic: how-to
10-
ms.date: 12/19/2023
10+
ms.date: 02/12/2024
1111
ms.author: jboback
1212
ms.custom: language-service-language-detection
1313
keywords: on-premises, Docker, container
@@ -35,7 +35,7 @@ The following table describes the minimum and recommended specifications for the
3535

3636
| | Minimum host specs | Recommended host specs | Minimum TPS | Maximum TPS|
3737
|---|---------|-------------|--|--|
38-
| **Language detection** | 1 core, 2GB memory | 1 core, 4GB memory |15 | 30|
38+
| **Language detection** | 1 core, 5GB memory | 1 core, 8GB memory |15 | 30|
3939

4040
CPU core and memory correspond to the `--cpus` and `--memory` settings, which are used as part of the `docker run` command.
4141

articles/ai-services/language-service/language-detection/includes/quickstarts/rest-api.md

Lines changed: 21 additions & 17 deletions
@@ -10,7 +10,7 @@ ms.author: jboback
1010

1111
[Reference documentation](https://go.microsoft.com/fwlink/?linkid=2239169)
1212

13-
Use this quickstart to send language detection requests using the REST API. In the following example, you will use cURL to identify the language that a text sample was written in.
13+
Use this quickstart to send language detection requests using the REST API. In the following example, you'll use cURL to identify the language that a text sample was written in.
1414

1515
[!INCLUDE [Use Language Studio](../../../includes/use-language-studio.md)]
1616

@@ -20,7 +20,7 @@ Use this quickstart to send language detection requests using the REST API. In t
2020
* Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services)
2121
* The current version of [cURL](https://curl.haxx.se/).
2222
* Once you have your Azure subscription, <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesTextAnalytics" title="Create a Language resource" target="_blank">create a Language resource </a> in the Azure portal to get your key and endpoint. After it deploys, select **Go to resource**.
23-
* You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
23+
* You'll need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
2424
* You can use the free pricing tier (`Free F0`) to try the service, and upgrade later to a paid tier for production.
2525

2626
> [!NOTE]
@@ -47,7 +47,7 @@ The following cURL commands are executed from a BASH shell. Edit these commands
4747
[!INCLUDE [REST API quickstart instructions](../../../includes/rest-api-instructions.md)]
4848

4949
```bash
50-
curl -i -X POST https://<your-language-resource-endpoint>/language/:analyze-text?api-version=2022-05-01 \
50+
curl -i -X POST https://<your-language-resource-endpoint>/language/:analyze-text?api-version=2023-11-15-preview \
5151
-H "Content-Type: application/json" \
5252
-H "Ocp-Apim-Subscription-Key:<your-language-resource-key>" \
5353
-d \
@@ -76,19 +76,23 @@ curl -i -X POST https://<your-language-resource-endpoint>/language/:analyze-text
7676

7777
```json
7878
{
79-
"kind": "LanguageDetectionResults",
80-
"results": {
81-
"documents": [{
82-
"id": "1",
83-
"detectedLanguage": {
84-
"name": "English",
85-
"iso6391Name": "en",
86-
"confidenceScore": 1.0
87-
},
88-
"warnings": []
89-
}],
90-
"errors": [],
91-
"modelVersion": "2022-10-01"
92-
}
79+
"kind": "LanguageDetectionResults",
80+
"results": {
81+
"documents": [
82+
{
83+
"id": "1",
84+
"detectedLanguage": {
85+
"name": "English",
86+
"iso6391Name": "en",
87+
"confidenceScore": 1.0,
88+
"script": "Latin",
89+
"scriptCode": "Latn"
90+
},
91+
"warnings": []
92+
}
93+
],
94+
"errors": [],
95+
"modelVersion": "2023-12-01"
96+
}
9397
}
9498
```
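
For readers who prefer Python to cURL, the following sketch builds the same kind of request with the standard library only. The function name `build_request`, the sample text, and the host name in the usage line are hypothetical. The URL path, API version, and header names are taken from the cURL command in this quickstart. The request body follows the `LanguageDetection` shape used in the how-to article, which is assumed here because the quickstart body isn't shown in this diff.

```python
import json
import urllib.request

API_VERSION = "2023-11-15-preview"


def build_request(endpoint, key, text):
    """Build (but do not send) a language detection request."""
    url = f"https://{endpoint}/language/:analyze-text?api-version={API_VERSION}"
    body = {
        "kind": "LanguageDetection",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": [{"id": "1", "text": text}]},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )


# Hypothetical endpoint and key; replace with the values from your resource.
request = build_request(
    "my-resource.cognitiveservices.azure.com",
    "<your-language-resource-key>",
    "This is a document written in English.",
)
print(request.get_method(), request.full_url)
```

To send the request, pass `request` to `urllib.request.urlopen` and parse the response body with `json.load`.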

articles/ai-services/language-service/language-detection/language-support.md

Lines changed: 20 additions & 0 deletions
@@ -162,6 +162,26 @@ If you have content expressed in a less frequently used language, you can try La
162162
| Telugu | `te` |
163163
| Urdu | `ur` |
164164

165+
## Script detection
166+
167+
| Language | Language code | Scripts |
168+
| --- | --- | --- |
169+
| Bengali (Bengali-Assamese) | `as` | `Latn`, `Beng` |
170+
| Bengali (Bangla) | `bn` | `Latn`, `Beng` |
171+
| Gujarati | `gu` | `Latn`, `Gujr` |
172+
| Hindi | `hi` | `Latn`, `Deva` |
173+
| Kannada | `kn` | `Latn`, `Knda` |
174+
| Malayalam | `ml` | `Latn`, `Mlym` |
175+
| Marathi | `mr` | `Latn`, `Deva` |
176+
| Oriya | `or` | `Latn`, `Orya` |
177+
| Punjabi | `pa` | `Latn`, `Guru` |
178+
| Tamil | `ta` | `Latn`, `Taml` |
179+
| Telugu | `te` | `Latn`, `Telu` |
180+
| Urdu | `ur` | `Latn`, `Arab` |
181+
| Tatar | `tt` | `Latn`, `Cyrl` |
182+
| Serbian | `sr` | `Latn`, `Cyrl` |
183+
| Inuktitut | `iu` | `Latn`, `Cans` |
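
If it helps to check a language and script pair on the client side, the table above can be mirrored as a simple lookup. This is only a sketch: the names `SUPPORTED_SCRIPTS` and `is_supported` are hypothetical, and the mapping copies only the rows listed here, so it may lag behind what the service actually supports.

```python
# Language code -> script codes, copied from the table above.
SUPPORTED_SCRIPTS = {
    "as": {"Latn", "Beng"},
    "bn": {"Latn", "Beng"},
    "gu": {"Latn", "Gujr"},
    "hi": {"Latn", "Deva"},
    "kn": {"Latn", "Knda"},
    "ml": {"Latn", "Mlym"},
    "mr": {"Latn", "Deva"},
    "or": {"Latn", "Orya"},
    "pa": {"Latn", "Guru"},
    "ta": {"Latn", "Taml"},
    "te": {"Latn", "Telu"},
    "ur": {"Latn", "Arab"},
    "tt": {"Latn", "Cyrl"},
    "sr": {"Latn", "Cyrl"},
    "iu": {"Latn", "Cans"},
}


def is_supported(language_code, script_code):
    """Return True if the pair appears in the table above."""
    return script_code in SUPPORTED_SCRIPTS.get(language_code, set())


print(is_supported("hi", "Deva"))
print(is_supported("hi", "Cyrl"))
```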
184+
165185
## Next steps
166186

167187
[Language detection overview](overview.md)

articles/ai-services/language-service/language-detection/overview.md

Lines changed: 10 additions & 2 deletions
@@ -14,13 +14,21 @@ ms.custom: language-service-language-detection
1414

1515
# What is language detection in Azure AI Language?
1616

17-
Language detection is one of the features offered by [Azure AI Language](../overview.md), a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. Language detection can detect the language a document is written in, and returns a language code for a wide range of languages, variants, dialects, and some regional/cultural languages.
17+
Language detection is one of the features offered by [Azure AI Language](../overview.md), a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. Language detection is able to detect more than 100 languages in their primary script. In addition, it offers [script detection](./how-to/call-api.md#script-name-and-script-code) to detect multiple scripts per language according to the [ISO 15924 standard](https://wikipedia.org/wiki/ISO_15924) for a [select number of languages](./language-support.md#script-detection).
1818

1919
This documentation contains the following types of articles:
2020

2121
* [**Quickstarts**](quickstart.md) are getting-started instructions to guide you through making requests to the service.
2222
* [**How-to guides**](how-to/call-api.md) contain instructions for using the service in more specific or customized ways.
2323

24+
## Language detection features
25+
26+
* Language detection: Returns one predominant language for each document you submit, along with its ISO 639-1 name, a human-readable name, a confidence score, and a script name and script code according to the ISO 15924 standard.
27+
28+
* Script detection: To distinguish between multiple scripts used to write certain languages, such as Kazakh, language detection returns a script name and script code according to the ISO 15924 standard.
29+
30+
* Ambiguous content handling: To help disambiguate language based on the input, you can specify an ISO 3166-1 alpha-2 country/region code. For example, the word "communication" is common to both English and French. Specifying the origin of the text as France can help the language detection model determine the correct language.
31+
2432
[!INCLUDE [Typical workflow for pre-configured language features](../includes/overview-typical-workflow.md)]
2533

2634

@@ -30,7 +38,7 @@ This documentation contains the following types of articles:
3038

3139
## Responsible AI
3240

33-
An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Read the [transparency note for language detection](/legal/cognitive-services/language-service/transparency-note-language-detection?context=/azure/ai-services/language-service/context/context) to learn about responsible AI use and deployment in your systems. You can also see the following articles for more information:
41+
An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it's deployed. Read the [transparency note for language detection](/legal/cognitive-services/language-service/transparency-note-language-detection?context=/azure/ai-services/language-service/context/context) to learn about responsible AI use and deployment in your systems. You can also see the following articles for more information:
3442

3543
[!INCLUDE [Responsible AI links](../includes/overview-responsible-ai-links.md)]
3644

articles/ai-services/language-service/language-detection/quickstart.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ author: jboback
77
manager: nitinme
88
ms.service: azure-ai-language
99
ms.topic: quickstart
10-
ms.date: 12/19/2023
10+
ms.date: 01/16/2024
1111
ms.author: jboback
1212
ms.devlang: csharp
1313
# ms.devlang: csharp, java, javascript, python

articles/ai-services/language-service/whats-new.md

Lines changed: 4 additions & 0 deletions
@@ -15,6 +15,10 @@ ms.author: aahi
1515

1616
Azure AI Language is updated on an ongoing basis. To stay up-to-date with recent developments, this article provides you with information about new releases and features.
1717

18+
## February 2024
19+
20+
* Expanded [language detection](./language-detection/how-to/call-api.md#script-name-and-script-code) support for additional scripts according to the [ISO 15924 standard](https://wikipedia.org/wiki/ISO_15924) is now available starting in API version `2023-11-15-preview`.
21+
1822
## January 2024
1923

2024
* [Native document support](native-document-support/use-native-documents.md) is now available in `2023-11-15-preview` public preview.
