Commit 301e278

committed
wording updates
1 parent 445fbdf commit 301e278

3 files changed, with 7 additions and 7 deletions

articles/cognitive-services/text-analytics/concepts/process-offsets.md

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 ---
 title: Text offsets in the Text Analytics API
 titleSuffix: Azure Cognitive Services
-description: Learn how to process text offsets in the output of the Text Analytics API.
+description: Learn about offsets caused by multilingual and emoji encodings.
 services: cognitive-services
 author: aahill
 manager: nitinme
@@ -13,13 +13,13 @@ ms.author: aahi
 ms.reviewer: jdesousa
 ---

-# How to process text offsets in the output of the Text Analytics API
+# Text offsets in the Text Analytics API output

 Multilingual and emoji support has led to Unicode encodings that use more than one [code point](https://wikipedia.org/wiki/Code_point) to represent a single displayed character, called a grapheme. For example, emojis like 🌷 and 👍 may use several characters to compose the shape with additional characters for visual attributes, such as skin tone. Similarly, the Hindi word `अनुच्छेद` is encoded as five letters and three combining marks:

 `अ` + `न` +`ु` + `च` +`्` + `छ` + `े` + `द`

-Because of the different lengths of possible multilingual and emoji encodings, the Text Analytics API returns grapheme offsets.
+Because of the different lengths of possible multilingual and emoji encodings, the Text Analytics API may return offsets in the response.

 ## Offsets in the API response.

@@ -29,9 +29,9 @@ Whenever offsets are returned the API response, such as [Named Entity Recognitio
 * HTTP POST/GET payloads are encoded in [UTF-8](https://www.w3schools.com/charsets/ref_html_utf8.asp), which may or may not be the default character encoding on your client-side compiler or operating system.
 * Offsets refer to grapheme counts based on the [Unicode 8.0.0](https://unicode.org/versions/Unicode8.0.0) standard, not character counts.

-## Extracting substrings using Grapheme offsets
+## Extracting substrings from text with offsets

-These offsets can cause problems when using character-based substring methods, for example the .NET [substring()](https://docs.microsoft.com/dotnet/api/system.string.substring?view=netframework-4.8) method. One problem is that an offset may cause a substring method to end in the middle of a multi-character grapheme encoding instead of the end.
+Offsets can cause problems when using character-based substring methods, for example the .NET [substring()](https://docs.microsoft.com/dotnet/api/system.string.substring?view=netframework-4.8) method. One problem is that an offset may cause a substring method to end in the middle of a multi-character grapheme encoding instead of the end.

 In .NET consider using the [StringInfo](https://docs.microsoft.com/dotnet/api/system.globalization.stringinfo?view=netframework-4.8) class, which enables you to work with a string as a series of textual elements, rather than individual character objects. You can also look for grapheme splitter libraries in your preferred software environment.
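
For illustration only (this sketch is not part of the commit): the substring problem described in the changed text can be reproduced in a few lines of Python. The helper names `split_graphemes` and `grapheme_substring` are hypothetical, and the grouping rule used here (a base character plus any combining marks that follow it) is a simplifying assumption. It is not the full Unicode grapheme cluster algorithm that `StringInfo` or a grapheme splitter library would apply, so results may differ for emoji sequences and other complex cases.

```python
import unicodedata


def split_graphemes(text):
    """Approximate grapheme clusters (ASSUMPTION: simplified rule).

    Each cluster is one base character plus any following characters whose
    Unicode general category is a mark (Mn, Mc, or Me).
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the combining mark to its base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def grapheme_substring(text, offset, length):
    """Extract `length` clusters starting at cluster index `offset`."""
    clusters = split_graphemes(text)
    return "".join(clusters[offset:offset + length])


# The Hindi word from the article, spelled out as its eight code points.
word = "\u0905\u0928\u0941\u091a\u094d\u091b\u0947\u0926"

print(len(word))                        # 8 code points
print(len(split_graphemes(word)))       # 5 approximate graphemes
print(word[1:3])                        # code-point slice: only the first cluster
print(grapheme_substring(word, 1, 2))   # cluster slice: two whole clusters
```

With an offset of 1 and a length of 2, the code-point slice returns a single letter with its vowel sign, while the cluster-based slice returns the two displayed characters the offsets were meant to select.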

articles/cognitive-services/text-analytics/how-tos/text-analytics-how-to-entity-linking.md

Lines changed: 1 addition & 1 deletion
@@ -178,7 +178,7 @@ The Text Analytics API is stateless. No data is stored in your account, and resu

 All POST requests return a JSON formatted response with the IDs and detected entity properties.

-Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See [how to process offsets in API output](../concepts/process-offsets.md) for more information.
+Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See [how to process text offsets](../concepts/process-offsets.md) for more information.

 #### [Version 3.0-preview)](#tab/version-3)

articles/cognitive-services/text-analytics/how-tos/text-analytics-how-to-sentiment-analysis.md

Lines changed: 1 addition & 1 deletion
@@ -155,7 +155,7 @@ The Text Analytics API is stateless. No data is stored in your account, and resu

 The sentiment analyzer classifies text as predominantly positive or negative. It assigns a score in the range of 0 to 1. Values close to 0.5 are neutral or indeterminate. A score of 0.5 indicates neutrality. When a string can't be analyzed for sentiment or has no sentiment, the score is always 0.5 exactly. For example, if you pass in a Spanish string with an English language code, the score is 0.5.

-Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system. Then, import the output into an application that you can use to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See [how to process offsets in API output](../concepts/process-offsets.md) for more information.
+Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system. Then, import the output into an application that you can use to sort, search, and manipulate the data. Due to multilingual and emoji support, the response may contain text offsets. See [how to process offsets](../concepts/process-offsets.md) for more information.

 #### [Version 3.0-preview](#tab/version-3)

0 commit comments
