ui/partitioning.mdx (27 additions, 0 deletions)
@@ -61,6 +61,33 @@ The following example shows GPT-4o by OpenAI being used. If the **Auto** strateg

## Handwriting and multilanguage characters in PDF files

The differences between the partitioning strategies show most clearly in how each strategy handles handwriting and multilanguage characters in PDF files.

For example, the **Fast** strategy skips handwriting in PDF files altogether.

The **Fast** strategy processes multilanguage characters in PDF files with limited output, depending on the language. In the following example, Japanese hiragana characters are processed as text, but the output can be very difficult to work with:

For handwriting, the **High Res** strategy typically produces unusable results, for example:

For multilanguage characters, the **High Res** strategy also typically produces unusable results. In the following example, it fails to recognize Japanese hiragana characters:

The **VLM** strategy can produce great results for handwriting, as in this example that uses GPT-4o by OpenAI:

The **VLM** strategy also recognizes multilanguage characters well, as in this example that uses GPT-4o by OpenAI to recognize Japanese hiragana characters:

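Separately from the examples above, one rough way to check whether a strategy's output preserved Japanese hiragana is to measure the share of hiragana characters in the extracted text. The following is a minimal sketch using only the Python standard library. It is not part of Unstructured; the `hiragana_ratio` helper and the sample strings are hypothetical and serve only as illustration.

```python
def hiragana_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are hiragana."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # U+3041..U+3096 covers the hiragana letters in the Unicode Hiragana block.
    hits = sum(1 for c in chars if "\u3041" <= c <= "\u3096")
    return hits / len(chars)


if __name__ == "__main__":
    # Hypothetical strings for illustration, not real partitioning output.
    recognized = "こんにちは"
    garbled = "Z/uIzlt"
    print(hiragana_ratio(recognized))
    print(hiragana_ratio(garbled))
```

A ratio near zero for a document known to contain hiragana suggests that the chosen strategy did not recognize the characters. A low ratio alone is not conclusive, because Japanese text usually mixes hiragana with katakana and kanji.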
## Supported languages

**Fast** partitioning accepts any text inputs, though automatic language detection of those inputs is restricted to the languages that [langdetect](https://pypi.org/project/langdetect/) can detect.