ui/partitioning.mdx
38 additions & 0 deletions
@@ -23,6 +23,44 @@ import PlatformPartitioningStrategies from '/snippets/general-shared-text/platfo
23
23
24
24
<PlatformPartitioningStrategies />
25
25
26
+
## Images and tables in PDF files
27
+
28
+
The differences between the partitioning strategies are clearest in how each strategy handles images and tables in PDF files.
29
+
30
+
For example, the **Fast** partitioning strategy skips images in PDF files altogether:
31
+
32
+

33
+
34
+
For tables, the **Fast** strategy interprets table cells in PDF files as a mixture of title, list, and uncategorized text elements:
35
+
36
+

37
+
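One way to check for this behavior is to tally the element types in a partitioned result. The following is a minimal sketch, not Unstructured's own tooling: it assumes the output is a list of JSON-like objects with `type` and `text` fields, and that the type names `Title`, `ListItem`, and `UncategorizedText` correspond to the title, list, and uncategorized text elements mentioned above. The sample records are hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical sample of Fast-strategy output for a small table.
# The field names and type names here are assumptions for illustration.
fast_elements = [
    {"type": "Title", "text": "Region"},
    {"type": "UncategorizedText", "text": "1,200"},
    {"type": "ListItem", "text": "North"},
    {"type": "UncategorizedText", "text": "950"},
]


def count_element_types(elements):
    """Return a Counter mapping each element type to its number of occurrences."""
    return Counter(element["type"] for element in elements)


type_counts = count_element_types(fast_elements)

# No dedicated table element appears in this sample; the cells are scattered
# across other element types.
has_table = "Table" in type_counts
```

If a tally like this shows no table elements for a document that clearly contains tables, that may be a sign the cells were split across other element types.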
38
+
The **High Res** strategy, by itself, processes images in PDF files, but sometimes with limited output:
39
+
40
+

41
+
42
+
However, when combined with the [image description](/ui/enriching/image-descriptions) enrichment, the **High Res** strategy can produce better output for images in PDF files:
43
+
44
+

45
+
46
+
For tables in PDF files, the **High Res** strategy outputs the table's text and an HTML representation of the table:
47
+
48
+

49
+
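The table's text and its HTML representation can be pulled out of a result with a few lines of code. The following is a hedged sketch: it assumes each table arrives as an element whose `type` is `Table`, with the plain text in `text` and the HTML in `metadata.text_as_html`. These field names are assumptions that should be checked against actual output, and the sample records are hypothetical.

```python
# Hypothetical sample of High Res output; field names are assumptions.
high_res_elements = [
    {"type": "Title", "text": "Quarterly sales"},
    {
        "type": "Table",
        "text": "Region Units North 1,200",
        "metadata": {
            "text_as_html": (
                "<table><tr><td>Region</td><td>Units</td></tr>"
                "<tr><td>North</td><td>1,200</td></tr></table>"
            )
        },
    },
]


def extract_tables(elements):
    """Return a list of (text, html) pairs, one per table element.

    html is None when an element carries no HTML representation.
    """
    tables = []
    for element in elements:
        if element.get("type") != "Table":
            continue
        html = element.get("metadata", {}).get("text_as_html")
        tables.append((element.get("text", ""), html))
    return tables


tables = extract_tables(high_res_elements)
```

Keeping the text and the HTML together as a pair makes it straightforward to index the plain text for search while preserving the row and column structure for display.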
50
+
When combined with the [table description](/ui/enriching/table-descriptions) and [tables to HTML](/ui/enriching/table-to-html) enrichments, the **High Res** strategy can produce even richer output for tables in PDF files:
51
+
52
+

53
+
54
+
The **VLM** strategy processes images in PDF files and outputs image summaries and text as HTML elements. The following example uses GPT-4o by OpenAI. If
55
+
the **Auto** strategy is selected in this example, Unstructured will route to the **VLM** strategy for processing:
56
+
57
+

58
+
59
+
For tables in PDF files, the **VLM** strategy outputs the table's text and an HTML representation of the table, similar to the **High Res** strategy.
60
+
The following example uses GPT-4o by OpenAI. If the **Auto** strategy is selected in this example, Unstructured will route to the **VLM** strategy for processing:
61
+
62
+

63
+
26
64
## Supported languages
27
65
28
66
**Fast** partitioning accepts any text input, though automatic language detection of that input is limited to what [langdetect](https://pypi.org/project/langdetect/) supports.