Commit a43e863

More details on partitioning strategy routing logic for files and pages (#701)
1 parent eaf4051 commit a43e863

1 file changed: +9 -1 lines changed

snippets/general-shared-text/platform-partitioning-strategies.mdx

Lines changed: 9 additions & 1 deletion
@@ -8,4 +8,12 @@ including reduction in transformation quality.
 
 - **VLM**: For the highest-quality transformation of these file types: `.bmp`, `.gif`, `.heic`, `.jpeg`, `.jpg`, `.pdf`, `.png`, `.tiff`, and `.webp`.
 - **High Res**: For all other [supported file types](/ui/supported-file-types), and for the generation of bounding box coordinates.
-- **Fast**: For text-only documents.
+- **Fast**: For text-only documents.
+
+The **Auto** partitioning strategy routes each file as a complete unit to the appropriate partitioning strategy (**VLM**, **High Res**, or **Fast**)
+based on the preceding file types. Additionally, for `.pdf` files, the **Auto** partitioning strategy routes these files' pages
+on a page-by-page basis, as follows:
+
+- A page is routed to **Fast** when it contains only embedded text and no images or tables are detected.
+- All other kinds of pages are routed to **VLM** or **High Res**, depending on the complexity of a page's
+content. Unstructured constantly optimizes its proprietary algorithm for routing to **VLM** or **High Res** in these cases.
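
The routing rules added in this commit can be sketched roughly as follows. This is an illustrative sketch only, not Unstructured's implementation: the function names, the `Page` type, the `TEXT_ONLY_EXTENSIONS` set, and the `is_complex` flag are all hypothetical. In particular, the docs describe the **VLM** versus **High Res** decision as a proprietary algorithm, so a plain boolean stands in for it here.

```python
from dataclasses import dataclass
from pathlib import Path

# File types the documentation lists for the VLM strategy.
VLM_EXTENSIONS = {
    ".bmp", ".gif", ".heic", ".jpeg", ".jpg", ".pdf", ".png", ".tiff", ".webp",
}

# ASSUMPTION: the docs say only "text-only documents"; these extensions
# are illustrative and not taken from the source.
TEXT_ONLY_EXTENSIONS = {".txt", ".md"}


@dataclass
class Page:
    """Hypothetical summary of what was detected on one PDF page."""
    has_embedded_text: bool
    has_images: bool
    has_tables: bool
    # Hypothetical stand-in for the undisclosed complexity signal.
    is_complex: bool = False


def route_file(filename: str) -> str:
    """Route a whole file by its extension, per the documented list.

    PDFs are additionally routed page by page; see route_pdf_page.
    """
    ext = Path(filename).suffix.lower()
    if ext in VLM_EXTENSIONS:
        return "VLM"
    if ext in TEXT_ONLY_EXTENSIONS:
        return "Fast"
    return "High Res"


def route_pdf_page(page: Page) -> str:
    """Route a single PDF page, per the page-by-page rules."""
    text_only = page.has_embedded_text and not (page.has_images or page.has_tables)
    if text_only:
        return "Fast"
    # The real choice between VLM and High Res is proprietary; this flag
    # is only a placeholder for that decision.
    return "VLM" if page.is_complex else "High Res"
```

For example, under these assumptions `route_pdf_page(Page(True, False, False))` yields `"Fast"`, while a page with a detected table falls through to the **VLM** or **High Res** branch.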
