Platform: Recommend using Auto partitioning strategy whenever possible (#492)

Paul-Cornell · web-flow · commit 85a8f9131c11 · 2025-02-25T09:13:15.000-08:00
diff --git a/api-reference/how-to/choose-partitioning-strategy.mdx b/api-reference/how-to/choose-partitioning-strategy.mdx
@@ -42,12 +42,6 @@ See [Changing partition strategy for a PDF](/api-reference/api-services/examples
 
 ## Auto partitioning strategy logic
 
-Setting `--strategy` or `strategy` to `auto` leaves the decision up to Unstructured on a file-by-file basis about which partitioning strategy to use. Specifically:
+Setting `--strategy` or `strategy` to `auto` lets Unstructured decide, on a page-by-page basis, which partitioning strategy to use.
 
-- If the file is an image, the `hi_res` strategy is used for that file. The `layout_v1.0.0` high-resolution object detection model is used.
-- If the file is a PDF, the local processing logic or Unstructured tries to detect whether there are any embedded tables or images in that file.
-
-  - If no embedded tables or images are detected, the `fast` strategy is used for that file. No high-resolution object detection model is used.
-  - If at least one embedded table or image is found, the `hi_res` strategy is used for that file. The `layout_v1.0.0` high-resolution object detection model is used.
-
-- If `--strategy` or `strategy` is not specified, the `auto` strategy is used by default.
+If `--strategy` or `strategy` is not specified, the `auto` strategy is used by default.
diff --git a/platform/overview.mdx b/platform/overview.mdx
@@ -29,33 +29,16 @@ flowchart LR
     Connect-->Route-->Transform-->Chunk-->Enrich-->Embed-->Persist
 ```
 
+import PlatformPartitioningStrategies from '/snippets/general-shared-text/platform-partitioning-strategies.mdx';
+
 <Steps>
   <Step title="Connect">
     The Unstructured Platform offers multiple [source connectors](/platform/sources/overview) to connect to your data in its existing location.
   </Step>
   <Step title="Route">
-    Routing determines which strategy Unstructured Platform uses to transforming your documents into Unstructured's canonical JSON schema. The Unstructured Platform provides these [partitioning](/platform/partitioning) strategies for document transformation:
+    Routing determines which strategy the Unstructured Platform uses to transform your documents into Unstructured's canonical JSON schema. The Unstructured Platform provides four [partitioning](/platform/partitioning) strategies for document transformation, as follows.
     
-    - **Fast** is ideal for simple, text-only documents.
-    - **High Res** is best for PDFs, images, and complex file types.
-
-      <Note>
-          During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
-      </Note>
-
-    - **VLM** is for challenging documents, including scanned and handwritten content.
-
-      <Note>
-          During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead. 
-          Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
-      </Note>
-
-    - **Auto** automatically analyzes and processes files on a page-by-page basis (for PDF files) and on a document-by-document basis for everything else:
-
-      - If the page or document has no images and likely does not have tables, **Fast** partitioning is used, and the page or document is billed at the **Fast** rate for processing.
-      - If the page or document has only a few tables or images with standard layouts and languages, **High Res** partitioning is used, and the page or document is billed at the **High Res** rate for processing.
-      - If the page or document has more than a few tables or images, **VLM** partitioning is used, and the page or document is billed at the **VLM** rate for processing.
-
+    <PlatformPartitioningStrategies />
   </Step>
   <Step title="Transform"> 
     Your source document is transformed into Unstructured's canonical JSON schema. Regardless of the input document, this JSON schema gives you a [standardized output](/platform/document-elements). It contains more than 20 elements, such as `Header`, `Footer`, `Title`, `NarrativeText`, `Table`, `Image`, and many more. Each document is wrapped in extensive metadata so you can understand languages, file types, sources, hierarchies, and much more.
diff --git a/platform/partitioning.mdx b/platform/partitioning.mdx
@@ -15,30 +15,11 @@ model-based workflows, which can be slower and costlier because they require a m
 When you choose a partitioning strategy for your files, you should be mindful of these speed, cost, and quality trade-offs. 
 For example, the **Fast** strategy can be about 100 times faster than leading image-to-text models.
 
-To choose one of these strategies, select one of the **Partition Strategy** options in the **Partitioner** node of a workflow:
+To choose one of these strategies, select one of the following four **Partition Strategy** options in the **Partitioner** node of a workflow.
 
 <Note>You can change a workflow's preconfigured strategy only through [Custom](/platform/workflows#create-a-custom-workflow) workflow settings.</Note>
 
-- **Fast**: This strategy is ideal for simple, text-based documents.
-- **High Res**: This strategy is best for PDFs, images, and complex file types.
+import PlatformPartitioningStrategies from '/snippets/general-shared-text/platform-partitioning-strategies.mdx';
 
-  <Note>
-      During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
-  </Note>
-
-- **VLM**: For your most challenging documents, including scanned and handwritten content.
-
-  <Note>
-      During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead. 
-      Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
-   
-      When you use the **VLM** strategy with embeddings for PDF files of 200 or more pages, you might notice some errors when 
-      these files are processed. These errors typically occur when these larger PDF files have lots of tables and high-resolution images.
-  </Note>
-
-- **Auto**: Unstructured automatically analyzes and processes files on a page-by-page basis (for PDF files) and on a document-by-document basis for everything else:
-
-  - If the page or document has no images and likely does not have tables, **Fast** partitioning is used, and the page or document is billed at the **Fast** rate for processing.
-  - If the page or document has only a few tables or images with standard layouts and languages, **High Res** partitioning is used, and the page or document is billed at the **High Res** rate for processing.
-  - If the page or document has more than a few tables or images, **VLM** partitioning is used, and the page or document is billed at the **VLM** rate for processing.
+<PlatformPartitioningStrategies />
 
diff --git a/platform/workflows.mdx b/platform/workflows.mdx
@@ -197,20 +197,15 @@ If you did not previously set the workflow to run on a schedule, you can [run th
 
 #### Custom workflow node types
 
+import PlatformPartitioningStrategies from '/snippets/general-shared-text/platform-partitioning-strategies.mdx';
+
 <AccordionGroup>
     <Accordion title="Partitioner node">
-        For **Partition Strategy**, choose one of the following:
-
-        - **Fast**: Ideal for simple, text-only documents.  
-        - **High Res**: Best for PDFs, images, and complex file types.
-
-          <Note>
-              During **High Res** processing, any detected text-based files are processed and billed at the **Fast** rate instead.
-          </Note>
+        Choose one of the four available partitioning strategies.
 
-        - **VLM**: For your most challenging documents, including scanned and handwritten content.
+        <PlatformPartitioningStrategies />
 
-          You must also choose a VLM provider and model. Available choices include:
+        For **VLM**, you must also choose a VLM provider and model. Available choices include:
 
           - **Anthropic**: 
       
@@ -232,19 +227,10 @@ If you did not previously set the workflow to run on a schedule, you can [run th
             - **Meta Llama 3.2 11B Instruct**
 
           <Note>
-              During **VLM** processing, any detected files that are not PDFs or images are processed and billed at either the **High Res** or **Fast** rate instead. 
-              Of those non-PDF and non-image files, all text-based files are processed and billed at the **Fast** rate instead. The other files are processed and billed at the **High Res** rate instead.
-          
               When you use the **VLM** strategy with embeddings for PDF files of 200 or more pages, you might notice some errors when 
               these files are processed. These errors typically occur when these larger PDF files have lots of tables and high-resolution images.
           </Note>
 
-        - **Auto** automatically analyzes and processes files on a page-by-page basis (for PDF files) and on a document-by-document basis for everything else:
-
-          - If the page or document has no images and likely does not have tables, **Fast** partitioning is used, and the page or document is billed at the **Fast** rate for processing.
-          - If the page or document has only a few tables or images with standard layouts and languages, **High Res** partitioning is used, and the page or document is billed at the **High Res** rate for processing.
-          - If the page or document has more than a few tables or images, **VLM** partitioning is used, and the page or document is billed at the **VLM** rate for processing.
-
         [Learn more](/platform/partitioning).
     </Accordion>
     <Accordion title="Chunker node">
diff --git a/snippets/general-shared-text/platform-partitioning-strategies.mdx b/snippets/general-shared-text/platform-partitioning-strategies.mdx
@@ -0,0 +1,11 @@
+Unstructured recommends that you choose the **Auto** partitioning strategy in most cases. With **Auto**, Unstructured does all 
+the heavy lifting, optimizing at runtime, page by page, for the highest quality at the lowest cost.
+
+You should consider the following additional strategies only if you are absolutely sure that your documents are of the same 
+type. Each of the following strategies is best suited for specific situations. Choosing one of these 
+strategies other than **Auto** for sets of documents of different types could produce undesirable results, 
+including a reduction in transformation quality.
+
+- **VLM**: For the highest-quality transformation of these file types: `.bmp`, `.gif`, `.heic`, `.jpeg`, `.jpg`, `.pdf`, `.png`, `.tiff`, and `.webp`.
+- **High Res**: For all other [supported file types](/platform/supported-file-types), and for the generation of bounding box coordinates.
+- **Fast**: For text-only documents.
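
Reviewer note: the guidance in the new snippet can be read as a small decision rule. The following is a minimal sketch of that reading, not Unstructured's actual routing logic. The helper name `suggest_strategy` and the `text_only` flag are hypothetical and exist only for illustration. The extension list is copied from the **VLM** bullet above.

```python
from pathlib import Path

# Extensions listed under the VLM bullet in the snippet above.
VLM_EXTENSIONS = {
    ".bmp", ".gif", ".heic", ".jpeg", ".jpg", ".pdf", ".png", ".tiff", ".webp",
}


def suggest_strategy(filenames, text_only=False):
    """Hypothetical helper: suggest a Partition Strategy label for a batch of files.

    Assumption: "documents of the same type" is approximated here by
    "all files share one file extension".
    """
    extensions = {Path(name).suffix.lower() for name in filenames}

    # Mixed (or unknown) file types: the snippet recommends Auto.
    if len(extensions) != 1:
        return "Auto"

    # Caller asserts that the documents are text-only.
    if text_only:
        return "Fast"

    (ext,) = extensions
    return "VLM" if ext in VLM_EXTENSIONS else "High Res"
```

For example, `suggest_strategy(["a.pdf", "b.docx"])` falls back to `"Auto"` because the batch mixes file types, which is the case the snippet warns about.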