-**Vertex AI**: Use [Vertex AI](https://cloud.google.com/vertex-ai) to generate embeddings by using the [textembedding-gecko@001](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) model, with 768 dimensions.
platform/overview.mdx
Lines changed: 13 additions & 3 deletions
@@ -23,9 +23,19 @@ To get your data RAG-ready, the Unstructured Platform moves it through the follo
  <Step title="Route">
  Routing determines which strategy the Unstructured Platform uses to transform your documents into Unstructured's canonical JSON schema. The Unstructured Platform provides these [partitioning](/platform/partitioning) strategies for document transformation:

- - **Fast** is great for when there is extractable text available, like in HTML files or in the Microsoft Office Document format.
- - **Hi Res** is best for PDFs and tables and where accurate classification of document elements is critical.
- - If you're unsure which strategy to use, choose **Auto**, and the Unstructured Platform will handle the decision for you.
+ - **Basic** / **Fast** is ideal for simple, text-only documents.
+ - **Advanced** / **High Res** is best for PDFs, images, and complex file types.
+
+ <Note>
+ During **Advanced** / **High Res** processing, any detected text-based files are processed and billed at the **Basic** / **Fast** rate instead.
+ </Note>
+
+ - **Platinum** / **VLM** is for challenging documents, including scanned and handwritten content.
+
+ <Note>
+ During **Platinum** / **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** / **High Res** or **Basic** / **Fast** rate instead.
+ Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** / **Fast** rate instead. The other files are processed and billed at the **Advanced** / **High Res** rate instead.
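The billing fallback described in the notes above can be sketched as a small lookup. This is only an illustration under stated assumptions: the `Strategy` and `FileKind` enums and the `billed_rate` function are hypothetical names, not part of any Unstructured API. The sketch also assumes that each file falls into exactly one category and that a file processed with **Fast** is always billed at the **Fast** rate. How the platform detects "text-based" files is not specified in this change.

```python
from enum import Enum


class Strategy(Enum):
    FAST = "Fast"          # shown as Basic in the overview
    HIGH_RES = "High Res"  # shown as Advanced in the overview
    VLM = "VLM"            # shown as Platinum in the overview


class FileKind(Enum):
    # Hypothetical, mutually exclusive categories (an assumption of this sketch).
    PDF = "pdf"
    IMAGE = "image"
    TEXT = "text"    # "text-based" files; detection is not specified in the source
    OTHER = "other"  # anything that is not a PDF, an image, or text-based


def billed_rate(selected: Strategy, kind: FileKind) -> Strategy:
    """Return the rate a file is billed at for the selected strategy."""
    if selected is Strategy.FAST:
        # Assumption: Fast is never rerouted to another rate.
        return Strategy.FAST
    if kind is FileKind.TEXT:
        # Text-based files fall back to Fast under both High Res and VLM.
        return Strategy.FAST
    if selected is Strategy.VLM and kind not in (FileKind.PDF, FileKind.IMAGE):
        # Remaining non-PDF, non-image files fall back to High Res under VLM.
        return Strategy.HIGH_RES
    return selected
```

For example, `billed_rate(Strategy.VLM, FileKind.TEXT)` returns `Strategy.FAST`, while `billed_rate(Strategy.VLM, FileKind.PDF)` stays at `Strategy.VLM`.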
platform/partitioning.mdx
Lines changed: 12 additions & 7 deletions
@@ -20,11 +20,16 @@ To choose one of these strategies, select one of the **Partition Strategy** opti
  <Note>You can change a workflow's predefined strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>

  - **Fast**: This strategy is ideal for simple, text-based documents.
- - **Hi-Res**: This strategy is best for PDFs, images, and complex file types.
- - **VLM**: For your most challenging documents, including scanned and handwritten content, use this strategy, which leverages vision
- language models (VLMs). During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged
- at the **Hi-Res** rate instead.
- - **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
- or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
- **Fast** rate for that file.
+ - **High Res**: This strategy is best for PDFs, images, and complex file types.
+
+ <Note>
+ During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
+ </Note>
+
+ - **VLM**: For your most challenging documents, including scanned and handwritten content.
+
+ <Note>
+ During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
+ Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
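For comparison, the **Auto** selection rule that this hunk removes can be sketched as follows. This is a minimal illustration of the removed wording only: the function name and its boolean parameters are hypothetical, and how the platform actually detected embedded tables or images is not described in the source.

```python
def auto_strategy(is_image: bool, is_pdf: bool, has_embedded_table_or_image: bool) -> str:
    """Pick the strategy (and billing rate) per the removed Auto description."""
    # Images always went to Hi-Res; PDFs only when at least one
    # embedded table or image was found in them.
    if is_image or (is_pdf and has_embedded_table_or_image):
        return "Hi-Res"
    # Every other file was processed and charged at the Fast rate.
    return "Fast"
```

Under this reading, a PDF with no embedded tables or images was processed with **Fast**, which is the case the new per-strategy notes replace with explicit fallback rules.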
platform/workflows.mdx
Lines changed: 35 additions & 18 deletions
@@ -54,8 +54,17 @@ To create an automatic workflow:
  - **Basic** Ideal for simple, text-only documents.
  - **Advanced** Best for PDFs, images, and complex file types.
- - **Platinum** For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
- During processing, files that are not PDFs or images are processed by using the **Advanced** strategy and are charged at the **Advanced** rate instead.
+
+ <Note>
+ During **Advanced** processing, any detected text-based files are processed and billed at the **Basic** rate instead.
+ </Note>
+
+ - **Platinum** For your most challenging documents, including scanned and handwritten content.
+
+ <Note>
+ During **Platinum** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** or **Basic** rate instead.
+ Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** rate instead. The other files are processed and billed at the **Advanced** rate instead.
+ </Note>

  9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
@@ -109,12 +118,18 @@ There are two ways to create a custom workflow:
  9. In the **Strategy** area, choose one of the following:

  - **Fast**: Ideal for simple, text-only documents.
- - **Hi-Res**: Best for PDFs, images, and complex file types.
- - **VLM**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
- During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged at the **Hi-Res** rate instead.
- - **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
- or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
- **Fast** rate for that file.
+ - **High Res**: Best for PDFs, images, and complex file types.
+
+ <Note>
+ During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
+ </Note>
+
+ - **VLM**: For your most challenging documents, including scanned and handwritten content.
+
+ <Note>
+ During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
+ Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
+ </Note>

  [Learn more](/platform/partitioning).
@@ -189,8 +204,6 @@ There are two ways to create a custom workflow:
  - **Vertex AI**: Use Vertex AI to generate embeddings by using the `textembedding-gecko@001` model, with 768 dimensions. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).
-
  Learn more:

  - [Embedding overview](/platform/embedding)
@@ -266,12 +279,18 @@ There are two ways to create a custom workflow:
  For **Partition Strategy**, choose one of the following:

  - **Fast**: Ideal for simple, text-only documents.
- - **Hi-Res**: Best for PDFs, images, and complex file types.
- - **VLM**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
- During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged at the **Hi-Res** rate instead.
- - **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
- or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
- **Fast** rate for that file.
+ - **High Res**: Best for PDFs, images, and complex file types.
+
+ <Note>
+ During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
+ </Note>
+
+ - **VLM**: For your most challenging documents, including scanned and handwritten content.
+
+ <Note>
+ During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
+ Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
+ </Note>

  [Learn more](/platform/partitioning).
  </Accordion>
@@ -338,8 +357,6 @@ There are two ways to create a custom workflow:
  - **Vertex AI**: Use Vertex AI to generate embeddings by using the `textembedding-gecko@001` model, with 768 dimensions. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).