Commit a379a58

Platform: Remove Auto and Vertex AI, add strategy fallback formulas (#351)
1 parent c3e8a91 commit a379a58

4 files changed: 61 additions and 31 deletions.

platform/embedding.mdx

Lines changed: 1 addition & 3 deletions
@@ -69,6 +69,4 @@ To generate embeddings, choose one of the following embedding providers and mode
 - **text-embedding-3-large**, with 3072 dimensions.
 - **Ada 002 (Text)**, with 1536 dimensions.
 
-[Learn more](https://platform.openai.com/docs/guides/embeddings).
-
-- **Vertex AI**: Use [Vertex AI](https://cloud.google.com/vertex-ai) to generate embeddings by using the [textembedding-gecko@001](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings) model, with 768 dimensions.
+[Learn more](https://platform.openai.com/docs/guides/embeddings).

platform/overview.mdx

Lines changed: 13 additions & 3 deletions
@@ -23,9 +23,19 @@ To get your data RAG-ready, the Unstructured Platform moves it through the follo
 <Step title="Route">
 Routing determines which strategy Unstructured Platform uses to transforming your documents into Unstructured's canonical JSON schema. The Unstructured Platform provides these [partitioning](/platform/partitioning) strategies for document transformation:
 
-- **Fast** is great for when there is extractable text available, like in HTML files or in the Microsoft Office Document format.
-- **Hi Res** is best for PDFs and tables and where accurate classification of document elements is critical.
-- If you're unsure which strategy to use, choose **Auto**, and the Unstructured Platform will handle the decision for you.
+- **Basic** / **Fast** is ideal for simple, text-only documents.
+- **Advanced** / **High Res** is best for PDFs, images, and complex file types.
+
+<Note>
+During **Advanced** / **High Res** processing, any detected text-based files are processed and billed at the **Basic** / **Fast** rate instead.
+</Note>
+
+- **Platinum** / **VLM** is for challenging documents, including scanned and handwritten content.
+
+<Note>
+During **Platinum** / **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** / **High Res** or **Basic** / **Fast** rate instead.
+Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** / **Fast** rate instead. The other files are processed and billed at the **Advanced** / **High Res** rate instead.
+</Note>
 
 </Step>
 <Step title="Transform">

platform/partitioning.mdx

Lines changed: 12 additions & 7 deletions
@@ -20,11 +20,16 @@ To choose one of these strategies, select one of the **Partition Strategy** opti
 <Note>You can change a workflow's predefined strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>
 
 - **Fast**: This strategy is ideal for simple, text-based documents.
-- **Hi-Res**: This strategy is best for PDFs, images, and complex file types.
-- **VLM**: For your most challenging documents, including scanned and handwritten content, use this strategy, which leverages vision
-language models (VLMs). During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged
-at the **Hi-Res** rate instead.
-- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
-or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
-**Fast** rate for that file.
+- **High Res**: This strategy is best for PDFs, images, and complex file types.
+
+<Note>
+During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
+</Note>
+
+- **VLM**: For your most challenging documents, including scanned and handwritten content.
+
+<Note>
+During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
+Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
+</Note>
 

platform/workflows.mdx

Lines changed: 35 additions & 18 deletions
@@ -54,8 +54,17 @@ To create an automatic workflow:
 
 - **Basic** Ideal for simple, text-only documents.
 - **Advanced** Best for PDFs, images, and complex file types.
-- **Platinum** For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
-During processing, files that are not PDFs or images are processed by using the **Advanced** strategy and are charged at the **Advanced** rate instead.
+
+<Note>
+During **Advanced** processing, any detected text-based files are processed and billed at the **Basic** rate instead.
+</Note>
+
+- **Platinum** For your most challenging documents, including scanned and handwritten content.
+
+<Note>
+During **Platinum** processing, any detected files that are not PDFs or images are processed and billed at either the **Advanced** or **Basic** rate instead.
+Of those non-PDF and non-image files, all text-based files are processed and billed at the **Basic** rate instead. The other files are processed and billed at the **Advanced** rate instead.
+</Note>
 
 9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
 
@@ -109,12 +118,18 @@ There are two ways to create a custom workflow:
 9. In the **Strategy** area, choose one of the following:
 
 - **Fast**: Ideal for simple, text-only documents.
-- **Hi-Res**: Best for PDFs, images, and complex file types.
-- **VLM**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
-During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged at the **Hi-Res** rate instead.
-- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
-or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
-**Fast** rate for that file.
+- **High Res**: Best for PDFs, images, and complex file types.
+
+<Note>
+During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
+</Note>
+
+- **VLM**: For your most challenging documents, including scanned and handwritten content.
+
+<Note>
+During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
+Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
+</Note>
 
 [Learn more](/platform/partitioning).
 
@@ -189,8 +204,6 @@ There are two ways to create a custom workflow:
 
 [Learn more](https://platform.openai.com/docs/guides/embeddings).
 
-- **Vertex AI**: Use Vertex AI to generate embeddings by using the `textembedding-gecko@001` model, with 768 dimensions. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).
-
 Learn more:
 
 - [Embedding overview](/platform/embedding)
@@ -266,12 +279,18 @@ There are two ways to create a custom workflow:
 For **Partition Strategy**, choose one of the following:
 
 - **Fast**: Ideal for simple, text-only documents.
-- **Hi-Res**: Best for PDFs, images, and complex file types.
-- **VLM**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
-During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged at the **Hi-Res** rate instead.
-- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
-or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
-**Fast** rate for that file.
+- **High Res**: Best for PDFs, images, and complex file types.
+
+<Note>
+During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
+</Note>
+
+- **VLM**: For your most challenging documents, including scanned and handwritten content.
+
+<Note>
+During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead.
+Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
+</Note>
 
 [Learn more](/platform/partitioning).
 </Accordion>
@@ -338,8 +357,6 @@ There are two ways to create a custom workflow:
 
 [Learn more](https://platform.openai.com/docs/guides/embeddings).
 
-- **Vertex AI**: Use Vertex AI to generate embeddings by using the `textembedding-gecko@001` model, with 768 dimensions. [Learn more](https://cloud.google.com/vertex-ai/generative-ai/docs/embeddings/get-text-embeddings).
-
 Learn more:
 
 - [Embedding overview](/platform/embedding)
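
The fallback rules that this commit adds in its `<Note>` blocks can be read as a small decision table. The following Python sketch is illustrative only and is not taken from the Unstructured Platform: the names `Strategy`, `FileKind`, and `effective_strategy` are hypothetical, the three file categories are assumed to be mutually exclusive, and the diff does not describe how the platform detects a "text-based" file. The sketch also assumes that **Fast** never falls back, which the diff does not state.

```python
from enum import Enum


class Strategy(Enum):
    # UI labels from the diff; automatic workflows call these
    # Basic, Advanced, and Platinum respectively.
    FAST = "Fast"
    HIGH_RES = "High Res"
    VLM = "VLM"


class FileKind(Enum):
    # Hypothetical categories, assumed mutually exclusive for this sketch.
    TEXT_BASED = "text-based"
    PDF_OR_IMAGE = "pdf-or-image"
    OTHER = "other"


def effective_strategy(selected: Strategy, kind: FileKind) -> Strategy:
    """Return the strategy a file is processed and billed at."""
    if selected is Strategy.FAST:
        # Assumption: no fallback is documented for Fast.
        return Strategy.FAST
    if selected is Strategy.HIGH_RES:
        # Text-based files drop to the Fast rate; everything else stays.
        if kind is FileKind.TEXT_BASED:
            return Strategy.FAST
        return Strategy.HIGH_RES
    # Selected strategy is VLM: only PDFs and images stay on VLM.
    if kind is FileKind.PDF_OR_IMAGE:
        return Strategy.VLM
    if kind is FileKind.TEXT_BASED:
        return Strategy.FAST
    return Strategy.HIGH_RES


if __name__ == "__main__":
    for selected in Strategy:
        for kind in FileKind:
            billed = effective_strategy(selected, kind)
            print(f"{selected.value} + {kind.value} -> {billed.value}")
```

Under these assumptions, a file is never billed at a higher rate than the selected strategy, which matches the "billed at ... rate instead" wording in each note.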
