
Commit a2f62a3 (1 parent: 434291e)
Author: ecfan
Commit message: Update reference info

File tree: 1 file changed, +15 −10 lines

articles/logic-apps/parse-document-chunk-text.md

Lines changed: 15 additions & 10 deletions
@@ -18,7 +18,7 @@ ms.date: 07/26/2024
 > This capability is in preview and is subject to the
 > [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

-Sometimes you have to convert content into token form or break down a large document into smaller pieces before you can use this content with some actions. For example, actions such as **Azure AI Search** or **Azure OpenAI** expect tokenized input and can handle only a limited number of tokens, which are words or chunks of characters.
+Sometimes you have to convert content into token form or break down a large document into smaller pieces before you can use this content with some actions. For example, actions such as **Azure AI Search** or **Azure OpenAI** expect tokenized input and can handle only a limited number of tokens, which are words or chunks of characters.

 For these scenarios, use the **Data Operations** actions named **Parse a document** and **Chunk text** in your Standard logic app workflow. These actions respectively convert content, such as a PDF document, CSV file, Excel file, and so on, into tokenized string output and then split the string into pieces, based on the number of tokens or characters. You can then reference and use these outputs with subsequent actions in your workflow.
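To make the token limit above concrete, here is an illustrative Python sketch (not part of the article or the service) that estimates token counts with the common rough heuristic of about four characters per token; real services such as Azure OpenAI use model-specific tokenizers, so treat this purely as an approximation:

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate: about four characters per token is a common
    rule of thumb for English text; real tokenizers vary by model."""
    return max(1, len(text) // chars_per_token)

def fits_token_limit(text: str, max_tokens: int = 8000) -> bool:
    """Check whether content likely fits within a model's token limit,
    which is why large documents must be chunked first."""
    return estimate_tokens(text) <= max_tokens

# A 50,000-character document far exceeds a typical 8,000-token limit,
# so it must be split into smaller chunks before use.
large_document = "word " * 10_000
needs_chunking = not fits_token_limit(large_document)
```

The 8,000-token limit here mirrors the maximum **TokenSize** value documented later in this article; other services impose their own limits.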

@@ -84,13 +84,14 @@ If you use other content sources, such as Azure Blob Storage, SharePoint, OneDri

 | Name | Value | Data type | Description | Limit |
 |------|-------|-----------|-------------|-------|
-| **Document Content** | <*content-to-parse*> | Any | The content to parse. ||
+| **Document Content** | <*content-to-parse*> | Any | The content to parse. | None |

 #### Outputs

 | Name | Data type | Description |
 |------|-----------|-------------|
-| **Parsed result text** | String ||
+| **Parsed result text** | String array | An array of strings. |
+| **Parsed result** | Object | An object that contains the entire parsed text. |

 ## Chunk text
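To picture the two outputs that this change documents, the following hypothetical Python sketch models them as a JSON-like object; the property names are illustrative only, not the action's exact schema:

```python
# Hypothetical shape of the "Parse a document" outputs, for illustration only:
# "Parsed result text" is an array of strings, while "Parsed result" is an
# object wrapping the entire parsed text.
parsed_outputs = {
    "parsedResultText": [
        "First passage extracted from the PDF.",
        "Second passage extracted from the PDF.",
    ],
    "parsedResult": {
        "text": "First passage extracted from the PDF. "
                "Second passage extracted from the PDF.",
    },
}

# A subsequent workflow action could join the string array back into one text.
combined = " ".join(parsed_outputs["parsedResultText"])
```

In a real workflow, you would reference these outputs through the designer's dynamic content picker rather than by the dictionary keys shown here.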

@@ -144,23 +145,23 @@ Now, when you add other actions that expect and use tokenized input, such as the

 | Name | Value | Data type | Description | Limit |
 |------|-------|-----------|-------------|-------|
-| **Chunking Strategy** | **FixedLength** or **TokenSize** | String | **FixedLength**: Split the content, based on the number of characters <br><br>**TokenSize**: Split the content, based on the number of tokens. <br><br>Default: **FixedLength** ||
-| **Text** | <*content-to-chunk*> | Any | The content to chunk. ||
+| **Chunking Strategy** | **FixedLength** or **TokenSize** | String enum | **FixedLength**: Split the content, based on the number of characters. <br><br>**TokenSize**: Split the content, based on the number of tokens. <br><br>Default: **FixedLength** ||
+| **Text** | <*content-to-chunk*> | Any | The content to chunk. | See [Limits and configuration reference guide](logic-apps-limits-and-config.md#character-limits) |

 For **Chunking Strategy** set to **FixedLength**:

 | Name | Value | Data type | Description | Limit |
 |------|-------|-----------|-------------|-------|
-| **MaxPageLength** | <*max-char-per-chunk*> | Integer | The maximum number of characters per content chunk. <br><br>Default: **5000** ||
-| **PageOverlapLength** | <*number-of-overlapping-characters*> | Integer | The number of characters from the end of the previous chunk to include in the next chunk. This setting helps you avoid losing important information when splitting content into chunks and preserves continuity and context across chunks. <br><br>Default: **0** - No overlapping characters exist. ||
+| **MaxPageLength** | <*max-char-per-chunk*> | Integer | The maximum number of characters per content chunk. <br><br>Default: **5000** | Minimum: **1** |
+| **PageOverlapLength** | <*number-of-overlapping-characters*> | Integer | The number of characters from the end of the previous chunk to include in the next chunk. This setting helps you avoid losing important information when splitting content into chunks and preserves continuity and context across chunks. <br><br>Default: **0** - No overlapping characters exist. | Minimum: **0** |
 | **Language** | <*language*> | String | The [language](/azure/ai-services/language-service/language-detection/language-support) to use for the resulting chunks. <br><br>Default: **en-us** | Not applicable |

 For **Chunking Strategy** set to **TokenSize**:

 | Name | Value | Data type | Description | Limit |
 |------|-------|-----------|-------------|-------|
-| **TokenSize** | <*max-tokens-per-chunk*> | Integer | The maximum number of tokens per content chunk. <br><br>Default: None | |
-| **Encoding model** | <*encoding-method*> | String | The encoding method to use. <br><br>Default: None | Not applicable |
+| **TokenSize** | <*max-tokens-per-chunk*> | Integer | The maximum number of tokens per content chunk. <br><br>Default: None | - Minimum: **1** <br><br>- Maximum: **8000** |
+| **Encoding model** | <*encoding-method*> | String enum | The [encoding method]() to use: **cl100k_base**, **cl200k_base**, **p50k_base**, **p50k_edit**, **r50k_base** <br><br>Default: None | Not applicable |

 > [!TIP]
 >
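The **FixedLength** strategy described in the tables above can be sketched in Python as follows. This is an illustrative reimplementation under assumed semantics (each chunk after the first starts **PageOverlapLength** characters before the end of the previous one), not the action's actual code:

```python
def chunk_fixed_length(text: str, max_page_length: int = 5000,
                       page_overlap_length: int = 0) -> list[str]:
    """Split text into chunks of at most max_page_length characters.

    Each chunk after the first starts page_overlap_length characters
    before the end of the previous chunk, preserving continuity and
    context across chunk boundaries.
    """
    if max_page_length < 1:
        raise ValueError("max_page_length must be at least 1")
    if not 0 <= page_overlap_length < max_page_length:
        raise ValueError("page_overlap_length must be smaller than max_page_length")

    step = max_page_length - page_overlap_length
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_page_length])
        if start + max_page_length >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks

# With a chunk size of 4 and an overlap of 1, each chunk repeats the
# final character of the previous chunk: ["abcd", "defg", "ghij"].
chunks = chunk_fixed_length("abcdefghij", max_page_length=4, page_overlap_length=1)
```

A **TokenSize** strategy would work the same way, except that the split positions are measured in tokens produced by the selected encoding model rather than in characters.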
@@ -173,7 +174,11 @@ For **Chunking Strategy** set to **TokenSize**:

 #### Outputs

-
+| Name | Data type | Description |
+|------|-----------|-------------|
+| **Chunked result Text items** | String array | An array of strings. |
+| **Chunked result Text items Item** | String | A single string in the array. |
+| **Chunked result** | Object | An object that contains the entire chunked text. |

 ## Example workflow
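To picture how a downstream action might consume the outputs documented above, here is a purely illustrative Python sketch; the property names are hypothetical, not the workflow's actual expression syntax:

```python
# Hypothetical shape of the "Chunk text" outputs, for illustration only.
chunked_outputs = {
    "chunkedResultTextItems": [
        "First chunk of the document...",
        "Second chunk...",
        "Third chunk...",
    ],
}

# A subsequent action, such as one that indexes documents into Azure AI
# Search, would typically loop over the string array, handling one chunk
# (one "Chunked result Text items Item") per iteration.
records = [
    {"id": f"doc-chunk-{index}", "content": chunk}
    for index, chunk in enumerate(chunked_outputs["chunkedResultTextItems"])
]
```

In the designer, this per-chunk handling usually takes the form of a **For each** loop over the **Chunked result Text items** output.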
