@@ -50,14 +50,19 @@ Analysis is performed upon receipt of the request. Using the language detection
50
50
51
51
When you get results from language detection, you can stream the results to an application or save the output to a file on the local system.
52
52
53
-
Language detection will return one predominant language for each document you submit, along with it's [ISO 639-1](https://www.iso.org/standard/22109.html) name, a human-readable name, and a confidence score. A positive score of 1 indicates the highest possible confidence level of the analysis.
53
+
Language detection will return one predominant language for each document you submit, along with its [ISO 639-1](https://www.iso.org/standard/22109.html) name, a human-readable name, a confidence score, and a script name and script code according to the [ISO 15924 standard](https://wikipedia.org/wiki/ISO_15924). A positive score of 1 indicates the highest possible confidence level of the analysis.
54
+
54
55
55
56
### Ambiguous content
56
57
57
58
In some cases it may be hard to disambiguate languages based on the input. You can use the `countryHint` parameter to specify an [ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) country/region code. By default, the API uses "US" as the country hint. To remove this behavior, reset the parameter by setting it to an empty string: `countryHint = ""`.
58
59
59
60
For example, "communication" is common to both English and French, and if it is submitted with limited context, the response is based on the "US" country/region hint. If the text is known to originate from France, you can give that as a hint.
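
As a rough sketch of how a hint might be attached, the Python snippet below builds a request body in which one document carries a `countryHint` and another does not. The body shape (`kind`, `parameters`, `analysisInput`) and the per-document placement of `countryHint` are assumptions about the REST request format, and `build_request` is a hypothetical helper. No request is sent.

```python
import json


def build_request(texts_with_hints):
    """Build a language detection request body (hypothetical helper).

    texts_with_hints is a list of (text, country_hint) pairs. A hint of
    None omits the property so the service default applies; an empty
    string is passed through to clear the default.
    """
    documents = []
    for index, (text, hint) in enumerate(texts_with_hints, start=1):
        document = {"id": str(index), "text": text}
        if hint is not None:
            # Assumption: countryHint is set on the individual document.
            document["countryHint"] = hint
        documents.append(document)
    return {
        "kind": "LanguageDetection",
        "parameters": {"modelVersion": "latest"},
        "analysisInput": {"documents": documents},
    }


# Same ambiguous word twice: once with no hint, once hinted as France.
body = build_request([("communication", None), ("communication", "fr")])
print(json.dumps(body, indent=2))
```

Keeping `None` and `""` distinct lets a caller either rely on the default hint or explicitly clear it, which matches the two behaviors described above.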
60
61
62
+
> [!NOTE]
63
+
> Ambiguous content can cause confidence scores to be lower.
64
+
> The `countryHint` in the response is only applicable if the confidence score is less than 0.8.
65
+
61
66
**Input**
62
67
63
68
```json
@@ -76,7 +81,8 @@ For example, "communication" is common to both English and French and if given w
76
81
}
77
82
```
78
83
79
-
The language detection model now has additional context to make a better judgment:
84
+
Because the second document contains the `countryHint` property in the input above, the language detection model has additional context to make a better judgment. This returns the following output.
85
+
80
86
81
87
**Output**
82
88
@@ -129,7 +135,7 @@ If the analyzer can't parse the input, it returns `(Unknown)`. An example is if
129
135
}
130
136
],
131
137
"errors": [],
132
-
"modelVersion": "2021-01-05"
138
+
"modelVersion": "2023-12-01"
133
139
}
134
140
```
135
141
@@ -156,22 +162,107 @@ The resulting output consists of the predominant language, with a score of less
156
162
157
163
```json
158
164
{
159
-
"documents": [
160
-
{
161
-
"id": "1",
162
-
"detectedLanguage": {
163
-
"name": "Spanish",
164
-
"iso6391Name": "es",
165
-
"confidenceScore": 0.88
166
-
},
167
-
"warnings": []
168
-
}
169
-
],
170
-
"errors": [],
171
-
"modelVersion": "2021-01-05"
165
+
"kind": "LanguageDetectionResults",
166
+
"results": {
167
+
"documents": [
168
+
{
169
+
"id": "1",
170
+
"detectedLanguage": {
171
+
"name": "Spanish",
172
+
"iso6391Name": "es",
173
+
"confidenceScore": 0.97,
174
+
"script": "Latin",
175
+
"scriptCode": "Latn"
176
+
},
177
+
"warnings": []
178
+
}
179
+
],
180
+
"errors": [],
181
+
"modelVersion": "2023-12-01"
182
+
}
183
+
}
184
+
```
185
+
186
+
## Script name and script code
187
+
188
+
> [!NOTE]
189
+
> * Script detection is currently limited to [select languages](../language-support.md#script-detection).
190
+
> * The script detection is only available for textual input which is greater than 12 characters in length.
191
+
192
+
Language detection offers the ability to detect more than one script per language according to the [ISO 15924 standard](https://wikipedia.org/wiki/ISO_15924). Specifically, Language Detection returns two script-related properties:
193
+
194
+
* `script`: The human-readable name of the identified script
195
+
* `scriptCode`: The ISO 15924 code for the identified script
196
+
197
+
The output of the API includes the value of the `scriptCode` property for documents that are at least 12 characters in length and that match the list of supported languages and scripts. Script detection is designed to benefit users whose language can be transliterated or written in more than one script, such as Kazakh or Hindi.
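
Because the script properties are only returned for some documents, client code may want to treat them as optional. The Python sketch below reads a response shaped like the sample output in this article (documents nested under `results`) and pulls out the language and script fields. The `describe_detection` helper is hypothetical, and the second document in the sample data is made up to illustrate a result with no script information.

```python
def describe_detection(response):
    """Summarize each document in a language detection response.

    Assumes documents are nested under "results". The "script" and
    "scriptCode" fields are read with .get() because they may be absent
    for short or unsupported input.
    """
    summaries = []
    for document in response["results"]["documents"]:
        detected = document["detectedLanguage"]
        summaries.append(
            {
                "id": document["id"],
                "language": detected["iso6391Name"],
                "confidence": detected["confidenceScore"],
                "script": detected.get("script"),
                "scriptCode": detected.get("scriptCode"),
            }
        )
    return summaries


sample = {
    "kind": "LanguageDetectionResults",
    "results": {
        "documents": [
            {
                "id": "1",
                "detectedLanguage": {
                    "name": "Spanish",
                    "iso6391Name": "es",
                    "confidenceScore": 0.97,
                    "script": "Latin",
                    "scriptCode": "Latn",
                },
                "warnings": [],
            },
            {
                # Hypothetical short document without script fields.
                "id": "2",
                "detectedLanguage": {
                    "name": "English",
                    "iso6391Name": "en",
                    "confidenceScore": 0.9,
                },
                "warnings": [],
            },
        ],
        "errors": [],
        "modelVersion": "2023-12-01",
    },
}

for summary in describe_detection(sample):
    print(summary)
```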
198
+
199
+
Previously, language detection was designed to detect the language of documents in a wide variety of languages, dialects, and regional variants, but was limited by "Romanization". Romanization refers to the conversion of text from one writing system to the Roman (Latin) script, and is necessary to detect many Indo-European languages. However, some languages are written in multiple scripts, such as Kazakh, which can be written in Cyrillic, Perso-Arabic, and Latin scripts. In other cases, users either choose or are required to transliterate their language into another script, such as Hindi transliterated in Latin script, due to the limited availability of keyboards that support its Devanagari script.
200
+
201
+
Consequently, language detection's expanded support for script detection behaves as follows:
202
+
203
+
**Input**
204
+
205
+
```json
206
+
{
207
+
"kind": "LanguageDetection",
208
+
"parameters": {
209
+
"modelVersion": "latest"
210
+
},
211
+
"analysisInput": {
212
+
"documents": [
213
+
{
214
+
"id": "1",
215
+
"text": "आप कहाँ जा रहे हैं?"
216
+
},
217
+
{
218
+
"id": "2",
219
+
"text": "Туған жерім менің - Қазақстаным"
220
+
}
221
+
]
222
+
}
223
+
}
224
+
```
225
+
226
+
**Output**
227
+
228
+
The resulting output consists of the predominant language, along with a script name, script code, and confidence score.
Use this quickstart to send language detection requests using the REST API. In the following example, you will use cURL to identify the language that a text sample was written in.
13
+
Use this quickstart to send language detection requests using the REST API. In the following example, you'll use cURL to identify the language that a text sample was written in.
14
14
15
15
[!INCLUDE [Use Language Studio](../../../includes/use-language-studio.md)]
16
16
@@ -20,7 +20,7 @@ Use this quickstart to send language detection requests using the REST API. In t
20
20
* Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services)
21
21
* The current version of [cURL](https://curl.haxx.se/).
22
22
* Once you have your Azure subscription, <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesTextAnalytics" title="Create a Language resource" target="_blank">create a Language resource</a> in the Azure portal to get your key and endpoint. After it deploys, select **Go to resource**.
23
-
* You will need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
23
+
* You'll need the key and endpoint from the resource you create to connect your application to the API. You'll paste your key and endpoint into the code below later in the quickstart.
24
24
* You can use the free pricing tier (`Free F0`) to try the service, and upgrade later to a paid tier for production.
25
25
26
26
> [!NOTE]
@@ -47,7 +47,7 @@ The following cURL commands are executed from a BASH shell. Edit these commands
47
47
[!INCLUDE [REST API quickstart instructions](../../../includes/rest-api-instructions.md)]
48
48
49
49
```bash
50
-
curl -i -X POST https://<your-language-resource-endpoint>/language/:analyze-text?api-version=2022-05-01 \
50
+
curl -i -X POST https://<your-language-resource-endpoint>/language/:analyze-text?api-version=2023-11-15-preview \
# What is language detection in Azure AI Language?
16
16
17
-
Language detection is one of the features offered by [Azure AI Language](../overview.md), a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. Language detection can detect the language a document is written in, and returns a language code for a wide range of languages, variants, dialects, and some regional/cultural languages.
17
+
Language detection is one of the features offered by [Azure AI Language](../overview.md), a collection of machine learning and AI algorithms in the cloud for developing intelligent applications that involve written language. Language detection is able to detect more than 100 languages in their primary script. In addition, it offers [script detection](./how-to/call-api.md#script-name-and-script-code) to detect multiple scripts per language according to the [ISO 15924 standard](https://wikipedia.org/wiki/ISO_15924) for a [select number of languages](./language-support.md#script-detection).
18
18
19
19
This documentation contains the following types of articles:
20
20
21
21
* [**Quickstarts**](quickstart.md) are getting-started instructions to guide you through making requests to the service.
22
22
* [**How-to guides**](how-to/call-api.md) contain instructions for using the service in more specific or customized ways.
23
23
24
+
## Language detection features
25
+
26
+
* Language detection: Returns one predominant language for each document you submit, along with its ISO 639-1 name, a human-readable name, a confidence score, and a script name and script code according to the ISO 15924 standard.
27
+
28
+
* Script detection: To distinguish between multiple scripts used to write certain languages, such as Kazakh, language detection returns a script name and script code according to the ISO 15924 standard.
29
+
30
+
* Ambiguous content handling: To help disambiguate language based on the input, you can specify an ISO 3166-1 alpha-2 country/region code. For example, the word "communication" is common to both English and French. Specifying the origin of the text as France can help the language detection model determine the correct language.
31
+
24
32
[!INCLUDE [Typical workflow for pre-configured language features](../includes/overview-typical-workflow.md)]
25
33
26
34
@@ -30,7 +38,7 @@ This documentation contains the following types of articles:
30
38
31
39
## Responsible AI
32
40
33
-
An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it is deployed. Read the [transparency note for language detection](/legal/cognitive-services/language-service/transparency-note-language-detection?context=/azure/ai-services/language-service/context/context) to learn about responsible AI use and deployment in your systems. You can also see the following articles for more information:
41
+
An AI system includes not only the technology, but also the people who will use it, the people who will be affected by it, and the environment in which it's deployed. Read the [transparency note for language detection](/legal/cognitive-services/language-service/transparency-note-language-detection?context=/azure/ai-services/language-service/context/context) to learn about responsible AI use and deployment in your systems. You can also see the following articles for more information:
34
42
35
43
[!INCLUDE [Responsible AI links](../includes/overview-responsible-ai-links.md)]
articles/ai-services/language-service/whats-new.md
@@ -15,6 +15,10 @@ ms.author: aahi
15
15
16
16
Azure AI Language is updated on an ongoing basis. To stay up-to-date with recent developments, this article provides you with information about new releases and features.
17
17
18
+
## February 2024
19
+
20
+
* Expanded [language detection](./language-detection/how-to/call-api.md#script-name-and-script-code) support for additional scripts according to the [ISO 15924 standard](https://wikipedia.org/wiki/ISO_15924) is now available starting in API version `2023-11-15-preview`.
21
+
18
22
## January 2024
19
23
20
24
* [Native document support](native-document-support/use-native-documents.md) is now available in `2023-11-15-preview` public preview.