articles/cognitive-services/text-analytics/concepts/process-offsets.md
+5 -5 (5 additions & 5 deletions)
@@ -1,7 +1,7 @@
 ---
 title: Text offsets in the Text Analytics API
 titleSuffix: Azure Cognitive Services
-description: Learn how to process text offsets in the output of the Text Analytics API.
+description: Learn about offsets caused by multilingual and emoji encodings.
 services: cognitive-services
 author: aahill
 manager: nitinme
@@ -13,13 +13,13 @@ ms.author: aahi
 ms.reviewer: jdesousa
 ---
 
-# How to process text offsets in the output of the Text Analytics API
+# Text offsets in the Text Analytics API output
 
 Multilingual and emoji support has led to Unicode encodings that use more than one [code point](https://wikipedia.org/wiki/Code_point) to represent a single displayed character, called a grapheme. For example, emojis like 🌷 and 👍 may use several characters to compose the shape with additional characters for visual attributes, such as skin tone. Similarly, the Hindi word `अनुच्छेद` is encoded as five letters and three combining marks:
 
 `अ` + `न` +` ु` + `च` +` ्` + `छ` + ` े` + `द`
 
-Because of the different lengths of possible multilingual and emoji encodings, the Text Analytics API returns grapheme offsets.
+Because of the different lengths of possible multilingual and emoji encodings, the Text Analytics API may return offsets in the response.
 
 ## Offsets in the API response.
 
@@ -29,9 +29,9 @@ Whenever offsets are returned the API response, such as [Named Entity Recognitio
 * HTTP POST/GET payloads are encoded in [UTF-8](https://www.w3schools.com/charsets/ref_html_utf8.asp), which may or may not be the default character encoding on your client-side compiler or operating system.
 * Offsets refer to grapheme counts based on the [Unicode 8.0.0](https://unicode.org/versions/Unicode8.0.0) standard, not character counts.
 
-## Extracting substrings using Grapheme offsets
+## Extracting substrings from text with offsets
 
-These offsets can cause problems when using character-based substring methods, for example the .NET [substring()](https://docs.microsoft.com/dotnet/api/system.string.substring?view=netframework-4.8) method. One problem is that an offset may cause a substring method to end in the middle of a multi-character grapheme encoding instead of the end.
+Offsets can cause problems when using character-based substring methods, for example the .NET [substring()](https://docs.microsoft.com/dotnet/api/system.string.substring?view=netframework-4.8) method. One problem is that an offset may cause a substring method to end in the middle of a multi-character grapheme encoding instead of the end.
 
 In .NET consider using the [StringInfo](https://docs.microsoft.com/dotnet/api/system.globalization.stringinfo?view=netframework-4.8) class, which enables you to work with a string as a series of textual elements, rather than individual character objects. You can also look for grapheme splitter libraries in your preferred software environment.
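
Illustration of the process-offsets.md text above: a minimal Python sketch of why grapheme offsets and character-based substring methods disagree. It uses only the standard `unicodedata` module. The helper names `split_graphemes` and `grapheme_substring` are hypothetical, and the splitting rule is a deliberate simplification (combining marks, emoji skin-tone modifiers, and zero-width joiners only). It is not the segmentation the Text Analytics API itself performs; a full grapheme splitter library covers many more cases.

```python
import unicodedata

ZWJ = "\u200d"  # zero-width joiner, glues emoji sequences together


def split_graphemes(text):
    """Approximate grapheme clusters (simplified, not full Unicode segmentation)."""
    clusters = []
    join_next = False
    for ch in text:
        extends = (
            unicodedata.category(ch).startswith("M")  # combining marks
            or 0x1F3FB <= ord(ch) <= 0x1F3FF          # emoji skin-tone modifiers
            or ch == ZWJ
        )
        if clusters and (extends or join_next):
            clusters[-1] += ch      # attach to the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
        join_next = ch == ZWJ
    return clusters


def grapheme_substring(text, offset, length):
    """Slice by grapheme offset and length instead of by code point."""
    return "".join(split_graphemes(text)[offset:offset + length])


# The Hindi word from the article: five letters and three combining marks.
word = "\u0905\u0928\u0941\u091a\u094d\u091b\u0947\u0926"
print(len(word))                   # code points
print(len(split_graphemes(word)))  # approximate graphemes

# Thumbs-up emoji with a skin-tone modifier, followed by " ok".
text = "\U0001F44D\U0001F3FD ok"
print(len(text.encode("utf-16-le")) // 2)  # UTF-16 code units, as counted by .NET strings
print(repr(text[2:4]))                     # code-point slice lands on the wrong characters
print(repr(grapheme_substring(text, 2, 2)))
```

Python strings index by code point, so the naive slice is merely shifted; in a UTF-16 environment such as .NET the same offset could also fall between the two halves of a surrogate pair.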
articles/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking.md
+1 -1 (1 addition & 1 deletion)
@@ -178,7 +178,7 @@ The Text Analytics API is stateless. No data is stored in your account, and resu
 
 All POST requests return a JSON formatted response with the IDs and detected entity properties.
 
-Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See [how to process offsets in API output](../concepts/process-offsets.md) for more information.
+Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See [how to process text offsets](../concepts/process-offsets.md) for more information.
articles/cognitive-services/text-analytics/how-tos/text-analytics-how-to-sentiment-analysis.md
+1 -1 (1 addition & 1 deletion)
@@ -155,7 +155,7 @@ The Text Analytics API is stateless. No data is stored in your account, and resu
 
 The sentiment analyzer classifies text as predominantly positive or negative. It assigns a score in the range of 0 to 1. Values close to 0.5 are neutral or indeterminate. A score of 0.5 indicates neutrality. When a string can't be analyzed for sentiment or has no sentiment, the score is always 0.5 exactly. For example, if you pass in a Spanish string with an English language code, the score is 0.5.
 
-Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system. Then, import the output into an application that you can use to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See [how to process offsets in API output](../concepts/process-offsets.md) for more information.
+Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system. Then, import the output into an application that you can use to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See [how to process offsets](../concepts/process-offsets.md) for more information.