Commit e7e0cfb

Platform: Platinum/VLM strategy (#319)
1 parent ec4073d commit e7e0cfb

4 files changed: +31 -50 lines changed

faq/faq.mdx

Lines changed: 0 additions & 17 deletions
@@ -46,23 +46,6 @@ When you log in to the Serverless API dashboard, you can access your API keys by
 Under the `Actions` column, click the `Copy` icon to copy the key or an example code snippet to process the documents
 using the Unstructured REST API, or the [Unstructured Ingest CLI](/ingestion/overview#unstructured-ingest-cli), or the [Unstructured Python SDK](https://github.com/Unstructured-IO/unstructured-python-client) or [Unstructured JavaScript/TypeScript SDK](https://github.com/Unstructured-IO/unstructured-js-client).

-### What is the new Unstructured API pricing structure?
-
-We offer a clear and straightforward pay-per-page pricing model, giving you full control and predictability over your
-document preprocessing costs. We have two different document processing strategies:
-
-- **Fast Strategy**: $1 per 1000 pages processed
-
-Designed for low-latency use cases, Fast Strategy is for file types other than PDF or images, where we can capitalize on the structure of the document to exact and classify the text. This strategy uses rules-based parsers to deliver fast and cost-effective processing to render natural language in structured JSON.
-
-- **Hi-Res Strategy**: $10 per 1000 pages processed
-
-Best for complex file types like PDF and JPEG or documents with images, forms, and tables. This strategy uses AI models to understand document layouts and render their contents in structured JSON.
-
-import SharedPagesBilling from '/snippets/general-shared-text/pages-billing.mdx';
-
-<SharedPagesBilling />
-
 ### How can I check my balance and make the payment?

 Your usage information can be found on your API dashboard. Click the `Usage` link on the side navigation. You can view your usage by date, including detailed information, such as the number of requests, total pages processed by fast and hi-res strategy, and total cost.

platform/partitioning.mdx

Lines changed: 8 additions & 22 deletions
@@ -19,26 +19,12 @@ To choose one of these strategies, select one of the **Partition Strategy** opti

 <Note>You can change a workflow's predefined strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>

-- **Auto**: This strategy leaves the choice of using **High Res** or **Fast** to Unstructured to determine on a file-by-file basis as it goes along.
-Unstructured will use **High Res** if it can determine that the current file under analysis is an image file or a PDF file with embedded images or tables.
-Otherwise, Unstructured will use **Fast** on the current file.
-You should choose this strategy if you know that all of the files are a combination of:
-
-- At least one image file; or at least one PDF file with embedded images or tables in it; and any number of other kinds of files.
-
-Choosing **Auto** can be an effective choice with a reasonable balance of speed, cost, and quality
-when you have a mixture of these types of files.
-
-- **Fast**: This strategy is rule-based. It is faster and cheaper than **High Res** but might provide lower-quality resolution.
-You should choose this strategy if you know that:
-
-- You have only PDF files, and you know that none of them have embedded images or tables in them, or
-- You have no PDF files or image files at all.
-
-- **Hi-res**: This strategy uses an image-to-text model for inference. It is slower and costlier than **Fast** but can provide
-higher-quality resolution. You should choose this strategy if you know that:
-
-- All of the files are only image files, or
-- All of the files are only PDF files, and they have embedded images or tables in them, or
-- All of the files are a combination of only these two kinds of files.
+- **Fast**: This strategy is ideal for simple, text-based documents.
+- **Hi-Res**: This strategy is best for PDFs, images, and complex file types.
+- **VLM**: For your most challenging documents, including scanned and handwritten content, use this strategy, which leverages vision
+language models (VLMs). During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged
+at the **Hi-Res** rate instead.
+- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
+or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
+**Fast** rate for that file.
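
Review note: the per-file routing that the new **VLM** and **Auto** descriptions in platform/partitioning.mdx spell out can be read as a small decision function. The following is a minimal Python sketch under stated assumptions. `FileInfo`, `effective_strategy`, the strategy strings, and the image-extension list are hypothetical names chosen for illustration and are not part of any Unstructured API. The diff does not say how the platform detects file types or embedded tables and images, so the sketch takes that as a precomputed flag.

```python
import os
from dataclasses import dataclass

# Assumption: an illustrative set of image extensions, not an official list.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic"}


@dataclass
class FileInfo:
    """Hypothetical description of one input file."""
    name: str
    # Assumption: detection of embedded tables/images happens elsewhere.
    has_embedded_tables_or_images: bool = False


def _extension(file: FileInfo) -> str:
    return os.path.splitext(file.name)[1].lower()


def is_image(file: FileInfo) -> bool:
    return _extension(file) in IMAGE_EXTENSIONS


def is_pdf(file: FileInfo) -> bool:
    return _extension(file) == ".pdf"


def effective_strategy(selected: str, file: FileInfo) -> str:
    """Return the strategy that would process (and be billed for) one file."""
    if selected == "auto":
        # Images, and PDFs with at least one embedded table or image, go to Hi-Res.
        if is_image(file) or (is_pdf(file) and file.has_embedded_tables_or_images):
            return "hi_res"
        return "fast"
    if selected == "vlm":
        # Files that are not PDFs or images fall back to Hi-Res.
        return "vlm" if (is_image(file) or is_pdf(file)) else "hi_res"
    if selected in ("fast", "hi_res"):
        return selected
    raise ValueError(f"unknown strategy: {selected!r}")
```

For example, under these assumptions `effective_strategy("vlm", FileInfo("notes.docx"))` yields `"hi_res"`, matching the statement that such files are charged at the **Hi-Res** rate instead.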

platform/workflows.mdx

Lines changed: 19 additions & 9 deletions
@@ -52,8 +52,10 @@ To create an automatic workflow:
 7. Click **Continue**.
 8. In the **Optimize for** section, select the option to choose one of these predefined workflow settings groups:

-- **Basic** is a good choice if you have text-only documents that have no images or tables in them.
-- **Advanced** is a good choice if you have complex documents that have images or tables or both in them.
+- **Basic** Ideal for simple, text-only documents.
+- **Advanced** Best for PDFs, images, and complex file types.
+- **Platinum** For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
+During processing, files that are not PDFs or images are processed by using the **Advanced** strategy and are charged at the **Advanced** rate instead.

 9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:

@@ -78,7 +80,7 @@ To create an automatic workflow:

 There are two ways to create a custom workflow:

-- Through [Build it with me > Custom](#build-it-with-me-custom). This option enables you to fine-tune the kinds of settings that are in **Basic** and **Advanced**.
+- Through [Build it with me > Custom](#build-it-with-me-custom). This option enables you to fine-tune the kinds of settings that are in **Basic**, **Advanced**, and **Platinum**.
 - Through [Build it myself](#build-it-myself). This option offers a visual workflow designer with even more fine-tuning than the **Custom** option.

 #### Build it with me - Custom
@@ -106,9 +108,13 @@ There are two ways to create a custom workflow:

 9. In the **Strategy** area, choose one of the following:

-- **Fast**: This strategy uses traditional NLP extraction techniques to quickly pull in all text elements. This strategy is not good for image-based file types or files with images or tables in them.
-- **Hi-res**: This strategy uses the document layout to gain additional information about document elements. This strategy is good for image-based file types and files with images or tables in them. This strategy is recommended if your use case is highly sensitive to correct classification for document elements.
-- **Auto**: This strategy chooses the partitioning strategy on a file-by-file basis, depending on detected document characteristics.
+- **Fast**: Ideal for simple, text-only documents.
+- **Hi-Res**: Best for PDFs, images, and complex file types.
+- **VLM**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
+During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged at the **Hi-Res** rate instead.
+- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
+or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
+**Fast** rate for that file.

 [Learn more](/platform/partitioning).

@@ -259,9 +265,13 @@ There are two ways to create a custom workflow:
 <Accordion title="Partitioner node">
 For **Partition Strategy**, choose one of the following:

-- **Auto**: This strategy chooses the partitioning strategy on a file-by-file basis, depending on detected document characteristics.
-- **Fast**: This strategy uses traditional NLP extraction techniques to quickly pull in all text elements. This strategy is not good for image-based file types or files with images or tables in them.
-- **Hi-res**: This strategy uses the document layout to gain additional information about document elements. This strategy is good for image-based file types and files with images or tables in them. This strategy is recommended if your use case is highly sensitive to correct classification for document elements.
+- **Fast**: Ideal for simple, text-only documents.
+- **Hi-Res**: Best for PDFs, images, and complex file types.
+- **VLM**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
+During processing, files that are not PDFs or images are processed by using the **Hi-Res** strategy and are charged at the **Hi-Res** rate instead.
+- **Auto**: This strategy examines each file before processing it. If the file is an image, or if the file is a PDF and at least one embedded table
+or image is found in it, **Hi-Res** is used to process that file and charged at the **Hi-Res** rate for that file. Otherwise, **Fast** is used and charged at the
+**Fast** rate for that file.

 [Learn more](/platform/partitioning).
 </Accordion>

snippets/quickstarts/platform.mdx

Lines changed: 4 additions & 2 deletions
@@ -74,8 +74,10 @@ allowfullscreen
 7. Click **Continue**.
 8. In the **Optimize for** section, select the option to choose one of these predefined workflow settings groups:

-- **Basic** is a good choice if you have text-only documents that have no images or tables in them.
-- **Advanced** is a good choice if you have complex documents that have images or tables or both in them.
+- **Basic**: Ideal for simple, text-only documents.
+- **Advanced**: Best for PDFs, images, and complex file types.
+- **Platinum**: For your most challenging documents, including scanned and handwritten content. It uses vision language models (VLMs).
+During processing, files that are not PDFs or images are processed by using the **Advanced** strategy and are charged at the **Advanced** rate instead.

 9. The **Reprocess all** box applies only to the Amazon S3 and Azure Blob Storage source connectors:
