
Commit be546bc

UI/API document elements: minor doc updates (#593)
1 parent fd33a67 commit be546bc


1 file changed (+7, -2 lines)


ui/document-elements.mdx

Lines changed: 7 additions & 2 deletions
@@ -27,7 +27,8 @@ Here's an example of what an element might look like:

 Every element has a [type](#element-type); an [element_id](#element-id); the extracted `text`; and some [metadata](#metadata) which might
 vary depending on the element type, file structure, and some additional settings that are applied during
-[partitioning](/ui/partitioning), chunking, summarizing, and embedding.
+[partitioning](/ui/partitioning), [chunking](/ui/chunking), and [enriching](/ui/enriching/overview). Optionally, the element can also have
+[embeddings](/ui/embedding) derived from the `text`; the length of `embeddings` depends on the embedding model that is used.

 ## Element type

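The element anatomy described in this hunk can be sketched as a plain Python dictionary. This is a minimal sketch, not Unstructured's own schema: the field names `type`, `element_id`, `text`, `metadata`, and `embeddings` come from the text above, while every value and the helper `summarize_element` are hypothetical.

```python
from typing import Any

# Hypothetical element: field names follow the doc text, values are made up.
element: dict[str, Any] = {
    "type": "NarrativeText",
    "element_id": "hypothetical-id-123",
    "text": "Quarterly revenue grew steadily across all regions.",
    "metadata": {"filename": "report.pdf", "page_number": 1},
    # Optional: present only if an embedding step ran. The real vector length
    # depends on the embedding model; 4 values are used here only for brevity.
    "embeddings": [0.12, -0.03, 0.44, 0.08],
}


def summarize_element(el: dict[str, Any]) -> str:
    """Return a one-line summary, treating `embeddings` as optional."""
    dims = len(el["embeddings"]) if "embeddings" in el else 0
    return f'{el["type"]} ({el["element_id"]}): {len(el["text"])} chars, {dims}-dim embedding'


print(summarize_element(element))
```

Checking for `embeddings` before reading it mirrors the text's point that embeddings are optional, and reading the vector length instead of hard-coding it reflects that the length depends on the model.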
@@ -43,18 +44,21 @@ Here are some examples of the element types your file might contain:
 | Element type | Description |
 |---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
 | `Address` | A text element for capturing physical addresses. |
+| `CodeSnippet` | A text element for capturing code snippets. |
 | `EmailAddress` | A text element for capturing email addresses. |
 | `FigureCaption` | An element for capturing text associated with figure captions. |
 | `Footer` | An element for capturing document footers. |
+| `FormKeysValues` | An element for capturing key-value pairs in a form. |
 | `Formula` | An element containing formulas in a file. |
 | `Header` | An element for capturing document headers. |
 | `Image` | A text element for capturing image metadata. |
 | `ListItem` | `ListItem` is a `NarrativeText` element that is part of a list. |
 | `NarrativeText` | `NarrativeText` is an element consisting of multiple, well-formulated sentences. This excludes elements such as titles, headers, footers, and captions. |
 | `PageBreak` | An element for capturing page breaks. |
+| `PageNumber` | An element for capturing page numbers. |
 | `Table` | An element for capturing tables. |
 | `Title` | A text element for capturing titles. |
-| `UncategorizedText` | Base element for capturing free text from within files. |
+| `UncategorizedText` | Base element for capturing free text from within files. Applies to extracted text not associated with bounding boxes if the input is a PDF file. |

 If you apply chunking, you will also see the `CompositeElement` type.
 `CompositeElement` is a chunk formed from text (non-`Table`) elements.
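To make the `Table` versus non-`Table` distinction concrete, here is a simplified sketch that counts element types and separates `Table` elements from everything else. It assumes elements are dicts with a `type` key; it does not reproduce Unstructured's actual chunking logic, and the sample elements and the helper `split_tables` are hypothetical.

```python
from collections import Counter
from typing import Any

# Hypothetical partitioning output, reduced to the fields this sketch needs.
elements: list[dict[str, Any]] = [
    {"type": "Title", "text": "Results"},
    {"type": "NarrativeText", "text": "Revenue grew in every region."},
    {"type": "Table", "text": "Region Revenue North 10 South 12"},
    {"type": "ListItem", "text": "North: +10%"},
    {"type": "CodeSnippet", "text": "print('hello')"},
]


def split_tables(
    els: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate Table elements from all non-Table elements, preserving order."""
    tables = [el for el in els if el["type"] == "Table"]
    non_tables = [el for el in els if el["type"] != "Table"]
    return tables, non_tables


type_counts = Counter(el["type"] for el in elements)
tables, non_tables = split_tables(elements)
print(type_counts.most_common())
print(f"{len(tables)} table(s), {len(non_tables)} non-table element(s)")
```

Only the non-`Table` group corresponds to the kind of input the text says `CompositeElement` chunks are formed from; how a real chunker sizes and combines those elements is outside this sketch.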
@@ -172,6 +176,7 @@ Documents can include additional file metadata, based on the specified source co
 - `date_created`
 - `date_modified`
 - `date_processed`
+- `permissions_data`
 - `record_locator`
 - `url`
 - `version`
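Because these fields depend on the source connector, downstream code is safer treating each one as optional. The sketch below reads whichever of the listed fields are present. It assumes they sit in a nested `data_source` dictionary inside `metadata`; that nesting is an assumption here, so check your own output. The helper `source_metadata` and the sample record are hypothetical.

```python
from typing import Any

# Field names taken from the list above; every one is treated as optional.
SOURCE_FIELDS = (
    "date_created",
    "date_modified",
    "date_processed",
    "permissions_data",
    "record_locator",
    "url",
    "version",
)


def source_metadata(element: dict[str, Any]) -> dict[str, Any]:
    """Collect the listed fields that are present on an element.

    Assumes (hypothetically) the fields are nested under metadata["data_source"].
    """
    data_source = element.get("metadata", {}).get("data_source") or {}
    return {key: data_source[key] for key in SOURCE_FIELDS if key in data_source}


# Hypothetical element from a connector that reports only some of the fields.
element = {
    "type": "Title",
    "text": "Quarterly report",
    "metadata": {
        "data_source": {
            "url": "s3://example-bucket/report.pdf",
            "date_modified": "2024-05-01T12:00:00",
            # Made-up shape, for illustration only.
            "permissions_data": [{"read": {"users": ["alice"]}}],
        }
    },
}

print(source_metadata(element))
```

Iterating over a fixed tuple of field names keeps the output order stable and silently skips fields a given connector does not supply.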
