-|`attached_to_filename`| MSG | The name of the file that the attached file is attached to. |
-|`bcc_recipient`| EML | The related [email](#email) BCC recipient. |
-|`cc_recipient`| EML | The related [email](#email) CC recipient. |
-|`email_message_id`| EML | The related [email](#email) message ID. |
-|`header_footer_type`| Word Doc | The pages that a header or footer applies to in a [Word document](#microsoft-word-files): `primary`, `even_only`, and `first_page`. |
-|`link_urls`| HTML | The URL that is associated with a link in a document. |
-|`link_texts`| HTML | The text that is associated with a link in a document. |
-|`page_name`| XLSX | The related sheet's name in an [Excel file](#microsoft-excel-files). |
-|`page_number`| DOCX, PDF, PPT, XLSX | The related file's page number. |
-|`section`| EPUB | The book section title corresponding to a table of contents. |
-|`sent_from`| EML | The related [email](#email) sender. |
-|`sent_to`| EML | The related [email](#email) recipient. |
-|`signature`| EML | The related [email](#email) signature. |
-|`subject`| EML | The related [email](#email) subject. |
+| Field name | Applicable file types | Description |
+|`attached_to_filename`| MSG | The name of the file that the attached file is attached to. |
+|`bcc_recipient`| EML | The related [email](#email) BCC recipient. |
+|`cc_recipient`| EML | The related [email](#email) CC recipient. |
+|`email_message_id`| EML | The related [email](#email) message ID. |
+|`header_footer_type`| Word Doc | The pages that a header or footer applies to in a [Word document](#microsoft-word-files): `primary`, `even_only`, and `first_page`. |
+|`image_path`| PDF | The path to the image. This is useful when you want to extract the image and save it in a specified path instead of serializing the image within the processed data. |
+|`image_mime_type`| PDF | The MIME type of the image. |
+|`image_url`| HTML | The URL to the image. |
+|`link_start_indexes`| HTML, PDF | A list of the index locations within the extracted content where the `links` can be found. |
+|`link_texts`| HTML | A list of text strings that are associated with the `link_urls`. |
+|`link_urls`| HTML | A list of URLs within the extracted content. |
+|`links`| PDF | A list of links within the extracted content. |
+|`page_name`| XLSX | The related sheet's name in an [Excel file](#microsoft-excel-files). |
+|`page_number`| DOCX, PDF, PPT, XLSX | The related file's page number. |
+|`section`| EPUB | The book section title corresponding to a table of contents. |
+|`sent_from`| EML | The related [email](#email) sender. |
+|`sent_to`| EML | The related [email](#email) recipient. |
+|`signature`| EML | The related [email](#email) signature. |
+|`subject`| EML | The related [email](#email) subject. |
ui/document-elements.mdx: 6 additions & 2 deletions
@@ -135,8 +135,12 @@ The `coordinates` metadata field contains:
 |`cc_recipient`| EML | The related [email](#email) CC recipient. |
 |`email_message_id`| EML | The related [email](#email) message ID. |
 |`header_footer_type`| Word Doc | The pages that a header or footer applies to in a [Word document](#microsoft-word-files): `primary`, `even_only`, and `first_page`. |
-|`link_urls`| HTML | The URL that is associated with a link in a document. |
-|`link_texts`| HTML | The text that is associated with a link in a document. |
+|`image_mime_type`| HTML, image, PDF | The MIME type of the image. |
+|`image_url`| HTML | The URL to the image. |
+|`link_start_indexes`| HTML, PDF | A list of the index locations within the extracted content where the `links` can be found. |
+|`link_texts`| HTML | A list of text strings that are associated with the `link_urls`. |
+|`link_urls`| HTML | A list of URLs within the extracted content. |
+|`links`| PDF | A list of links within the extracted content. |
 |`page_name`| XLSX | The related sheet's name in an [Excel file](#microsoft-excel-files). |
 |`page_number`| DOCX, PDF, PPT, XLSX | The related file's page number. |
 |`section`| EPUB | The book section title corresponding to a table of contents. |
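
Because the new descriptions make `link_texts`, `link_urls`, and `link_start_indexes` lists rather than single values, a reader may want to see how they could be consumed together. The following is a minimal sketch, not the documented behavior: it assumes the three lists are parallel (entry `i` of each describes the same link) and that the metadata arrives as a plain dictionary. The helper name `pair_links` and the sample values are hypothetical.

```python
from typing import Any


def pair_links(metadata: dict[str, Any]) -> list[dict[str, Any]]:
    """Combine the list-valued link fields into one record per link.

    Assumption (not confirmed by the table): the lists are parallel,
    so index i in each list refers to the same link. Missing or
    shorter lists are tolerated and filled with None.
    """
    urls = metadata.get("link_urls") or []
    texts = metadata.get("link_texts") or []
    starts = metadata.get("link_start_indexes") or []

    records = []
    for i, url in enumerate(urls):
        records.append(
            {
                "url": url,
                "text": texts[i] if i < len(texts) else None,
                "start_index": starts[i] if i < len(starts) else None,
            }
        )
    return records


# Hypothetical metadata for an HTML element; the values are illustrative only.
sample_metadata = {
    "link_texts": ["Docs", "Pricing"],
    "link_urls": ["https://example.com/docs", "https://example.com/pricing"],
    "link_start_indexes": [4],
}

for record in pair_links(sample_metadata):
    print(record)
```

Filling gaps with `None` instead of raising keeps the helper usable when only some of the fields are present, which seems likely given that the table lists different applicable file types for each field.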