Commit 2864275

image_base64 metadata is generated only for documents or PDF pages that use High Res (not Fast or VLM) (#741)
1 parent d7c06f7 commit 2864275
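
As an illustration of what this commit documents, a downstream consumer may want to handle elements that lack `image_base64` instead of assuming the field is always present. The following is a minimal sketch, not Unstructured's own code. It assumes the workflow output is a list of element objects, each with a `type` and a `metadata` object that holds `image_base64` when the field is generated. The function names and the sample data are hypothetical.

```python
import base64

# Assumption: each element is a dict with a "type" key and a "metadata" dict,
# and `image_base64`, when generated, is stored under "metadata".


def has_image_base64(element):
    """Return True if the element carries a non-empty image_base64 payload."""
    metadata = element.get("metadata") or {}
    return bool(metadata.get("image_base64"))


def split_by_image_payload(elements, types=("Image", "Table")):
    """Split Image/Table elements into (with payload, without payload)."""
    present, missing = [], []
    for element in elements:
        if element.get("type") not in types:
            continue  # Other element types are not expected to carry the field.
        (present if has_image_base64(element) else missing).append(element)
    return present, missing


def decode_image(element):
    """Decode the payload to raw bytes, or return None if it is absent."""
    if not has_image_base64(element):
        return None
    return base64.b64decode(element["metadata"]["image_base64"])


# Hypothetical sample data, not real Unstructured output.
sample = [
    {
        "type": "Image",
        "text": "A bar chart.",
        "metadata": {"image_base64": base64.b64encode(b"fake-bytes").decode("ascii")},
    },
    {"type": "Table", "text": "A table summary.", "metadata": {}},
    {"type": "NarrativeText", "text": "Plain text.", "metadata": {}},
]

present, missing = split_by_image_payload(sample)
print(len(present), len(missing))
```

Elements in the `missing` list would be the expected result for documents or PDF pages partitioned with the Fast or VLM strategy, so treating an absent field as a normal case rather than an error is probably the safer design.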

4 files changed: +26 -3 lines changed

ui/enriching/image-descriptions.mdx

Lines changed: 5 additions & 0 deletions
@@ -39,6 +39,11 @@ Line breaks have been inserted here for readability. The output will not contain
 }
 ```
 
+<Note>
+The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
+documents or PDF pages that are partitioned by using the Fast or VLM strategy.
+</Note>
+
 For workflows that use [chunking](/ui/chunking), note the following changes:
 
 - Each `Image` element is replaced by a `CompositeElement` element.

ui/enriching/table-descriptions.mdx

Lines changed: 11 additions & 1 deletion
@@ -43,14 +43,24 @@ Line breaks have been inserted here for readability. The output will not contain
 }
 ```
 
+<Note>
+The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
+documents or PDF pages that are partitioned by using the Fast or VLM strategy.
+</Note>
+
 Here are two examples of the descriptions for detected tables. These descriptions are generated with GPT-4o by OpenAI:
 
 ![Description of a table with information about endoscopic datasets](/img/enriching/Table-Description-1.png)
 
 ![Description of a table with information about potentiodynamic polarization of stainless steel](/img/enriching/Table-Description-2.png)
 
 The generated table's summary will overwrite any text that Unstructured had previously extracted from that table into the `text` field.
-The table's original content is available in the `image_base64` field.
+The table's original content is available in the `image_base64` field.
+
+<Note>
+The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
+documents or PDF pages that are partitioned by using the Fast or VLM strategy.
+</Note>
 
 For workflows that use [chunking](/ui/chunking), note the following changes:

ui/enriching/table-to-html.mdx

Lines changed: 5 additions & 2 deletions
@@ -60,14 +60,17 @@ Line breaks have been inserted here for readability. The output will not contain
 }
 ```
 
+<Note>
+The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
+documents or PDF pages that are partitioned by using the Fast or VLM strategy.
+</Note>
+
 For workflows that use [chunking](/ui/chunking), note the following changes:
 
 - If a `Table` element must be chunked, the `Table` element is replaced by a set of related `TableChunk` elements.
 - Each of these `TableChunk` elements will contain HTML table output for only its own element.
 - None of the these `TableChunk` elements will contain an `image_base64` field.
 
-
-
 ## Generate table-to-HTML output
 
 import EnrichmentTableToHTMLHiResOnly from '/snippets/general-shared-text/enrichment-table-to-html-hi-res-only.mdx';

ui/summarizing.mdx

Lines changed: 5 additions & 0 deletions
@@ -71,6 +71,11 @@ Line breaks have been inserted here for readability. The output will not contain
 }
 ```
 
+<Note>
+The `image_base64` field is generated only for documents or PDF pages that are [partitioned](/ui/partitioning) by using the High Res strategy. This field is not generated for
+documents or PDF pages that are partitioned by using the Fast or VLM strategy.
+</Note>
+
 ## Summarize images or tables
 
 import EnrichmentImagesTablesHiResOnly from '/snippets/general-shared-text/enrichment-images-tables-hi-res-only.mdx';
