-After partitioning and chunking, you can have Unstructured generate a list of recognized entities and their types (such as the names of organizations, products, and people) in the content, through a process known as _named entity recognition_ (NER).
+After partitioning and chunking, you can have Unstructured generate a list of recognized entities and their types (such as the names of organizations, products, and people) in the content, through a process known as _named entity recognition_ (NER).
+You can also have Unstructured generate a list of relationships between the entities that are recognized.

 This NER is done by using models offered through these providers:

 - [GPT-4o](https://openai.com/index/hello-gpt-4o/), provided through OpenAI.
 - [Claude 3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet), provided through Anthropic.

-Here is an example of a list of recognized entities and their types using GPT-4o. Note specifically the `entities` field that is added.
+Here is an example of a list of recognized entities and their entity types, along with a list of relationships between those
+entities and their relationship types, using GPT-4o. Note specifically the `entities` field that is added to the `metadata` field.

 ```json
 {
@@ -31,63 +33,109 @@ Here is an example of a list of recognized entities and their types using GPT-4o
             "eng"
         ],
         "page_number": 2,
-        "entities": [
-            {
-                "entity": "Senate",
-                "type": "ORGANIZATION"
-            },
-            {
-                "entity": "United States",
-                "type": "LOCATION"
-            },
-            {
-                "entity": "Senators",
-                "type": "PERSON"
-            },
-            {
-                "entity": "State",
-                "type": "LOCATION"
-            },
-            {
-                "entity": "Legislature",
-                "type": "ORGANIZATION"
-            },
-            {
-                "entity": "six Years",
-                "type": "DATE"
-            },
-            {
-                "entity": "first Election",
-                "type": "EVENT"
-            },
-            {
-                "entity": "second Year",
-                "type": "DATE"
-            },
-            {
-                "entity": "fourth Year",
-                "type": "DATE"
-            },
-            {
-                "entity": "sixth Year",
-                "type": "DATE"
-            },
-            {
-                "entity": "Executive",
-                "type": "PERSON"
-            },
-            {
-                "entity": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
-                "type": "ARTIFACT"
-            }
-        ]
+        "entities": {
+            "items": [
+                {
+                    "entity": "Senate",
+                    "type": "ORGANIZATION"
+                },
+                {
+                    "entity": "United States",
+                    "type": "LOCATION"
+                },
+                {
+                    "entity": "Senator",
+                    "type": "ROLE"
+                },
+                {
+                    "entity": "State",
+                    "type": "LOCATION"
+                },
+                {
+                    "entity": "Legislature",
+                    "type": "ORGANIZATION"
+                },
+                {
+                    "entity": "Executive",
+                    "type": "ROLE"
+                },
+                {
+                    "entity": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
+                    "type": "DOCUMENT"
+                }
+            ],
+            "relationships": [
+                {
+                    "from": "Senate",
+                    "relationship": "based_in",
+                    "to": "United States"
+                },
+                {
+                    "from": "Senator",
+                    "relationship": "has_role",
+                    "to": "Senate"
+                },
+                {
+                    "from": "Legislature",
+                    "relationship": "has_office_in",
+                    "to": "State"
+                },
+                {
+                    "from": "Executive",
+                    "relationship": "has_role",
+                    "to": "State"
+                },
+                {
+                    "from": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
+                    "relationship": "dated",
+                    "to": "DATE"
+                }
+            ]
+        }
     }
 }
 ```
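Because this change turns `entities` from a flat list into an object with `items` and `relationships`, code that reads the field may need to accept both shapes. The following is a minimal sketch of one way to do that, not part of the change itself; the helper name `split_entities` and the sample `element` are hypothetical, and only the field names shown in the JSON above are assumed.

```python
from __future__ import annotations

from typing import Any


def split_entities(metadata: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Return (items, relationships) from an element's metadata.

    Accepts both shapes shown in the diff:
    - the earlier flat list of {"entity", "type"} objects, and
    - the newer object with "items" and "relationships" lists.
    """
    entities = metadata.get("entities")
    if entities is None:
        return [], []
    if isinstance(entities, list):
        # Earlier shape: a bare list of entities, with no relationships.
        return list(entities), []
    # Newer shape: tolerate either key being absent.
    return list(entities.get("items", [])), list(entities.get("relationships", []))


# Hypothetical element, trimmed to the fields used here.
element = {
    "metadata": {
        "page_number": 2,
        "entities": {
            "items": [
                {"entity": "Senate", "type": "ORGANIZATION"},
                {"entity": "United States", "type": "LOCATION"},
            ],
            "relationships": [
                {"from": "Senate", "relationship": "based_in", "to": "United States"},
            ],
        },
    }
}

items, relationships = split_entities(element["metadata"])
for rel in relationships:
    print(f'{rel["from"]} -[{rel["relationship"]}]-> {rel["to"]}')
```

Returning two plain lists keeps downstream code independent of which shape the element was produced with.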

-# Generate a list of entities and their types
-
-To generate a list of recognized entities and their types, in an **Enrichment** node in a workflow, specify the following:
+By default, the following entity types are supported for NER:
+
+- `PERSON`
+- `ORGANIZATION`
+- `LOCATION`
+- `DATE`
+- `TIME`
+- `EVENT`
+- `MONEY`
+- `PERCENT`
+- `FACILITY`
+- `PRODUCT`
+- `ROLE`
+- `DOCUMENT`
+- `DATASET`
+
+By default, the following entity relationships are supported for NER:
+You can add, rename, or delete items in this list of default entity types and default entity relationship types.
+You can also add any clarifying instructions to the
+prompt that is used to run NER. To do this, see the next section.
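Since the entity type list can be customized, it can be useful to check generated output against the set of types you expect. The sketch below counts entities per type and reports any type outside an allowed set. The function name `audit_entity_types` and the sample `items` are hypothetical; the default set is copied from the entity type list above, and `ARTIFACT` is used as the unexpected example because it appears only in the earlier output.

```python
from collections import Counter

# Default entity types as listed above; extend this set if the prompt is customized.
DEFAULT_ENTITY_TYPES = frozenset({
    "PERSON", "ORGANIZATION", "LOCATION", "DATE", "TIME", "EVENT", "MONEY",
    "PERCENT", "FACILITY", "PRODUCT", "ROLE", "DOCUMENT", "DATASET",
})


def audit_entity_types(items, allowed=DEFAULT_ENTITY_TYPES):
    """Count entities per type and collect any types not in `allowed`."""
    counts = Counter(item["type"] for item in items)
    unexpected = sorted(t for t in counts if t not in allowed)
    return counts, unexpected


# Hypothetical sample in the same shape as the `items` list.
items = [
    {"entity": "Senate", "type": "ORGANIZATION"},
    {"entity": "Senator", "type": "ROLE"},
    {"entity": "Executive", "type": "ROLE"},
    {"entity": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S", "type": "ARTIFACT"},
]

counts, unexpected = audit_entity_types(items)
print(dict(counts))
print(unexpected)
```

If you add or rename types in the prompt, pass a matching set as `allowed` so that the audit reflects your customized list.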
+
+# Generate a list of entities and their relationships
+
+To generate a list of recognized entities and their relationships, in an **Enrichment** node in a workflow, specify the following:

 <Note>
 You can change a workflow's NER settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings.
@@ -97,16 +145,17 @@ To generate a list of recognized entities and their types, in an **Enrichment**

 1. Select **Text**.
 2. For **Model**, select either **OpenAI (GPT-4o)** or **Anthropic (Claude 3.5 Sonnet)**.
-3. The selected model will follow a default set of instructions (called a _prompt_) to perform NER using a set of predefined entity types. To experiment
+3. The selected model will follow a default set of instructions (called a _prompt_) to perform NER using a set of predefined entity types and relationships. To experiment
 with running the default prompt against some sample data, click **Edit**, and then click **Run Prompt**. The selected **Model** uses the
 **Prompt** to run NER on the **Input sample** and shows the results in the **Output**. Look specifically at the `response_json` field for the
-entities that were recognized and their types.
+entities that were recognized and their relationships.
 4. To customize the prompt, change the contents of **Prompt**.

 <Note>
 For best results, Unstructured strongly recommends that you limit your changes only to certain portions of the default prompt, specifically:

 - Adding, renaming, or deleting items in the list of predefined types (such as `PERSON`, `ORGANIZATION`, `LOCATION`, and so on).
+- Adding, renaming, or deleting items in the list of predefined relationships (such as `works_for`, `based_in`, `has_role`, and so on).
 - As needed, adding any clarifying instructions only between these two lines:

 ```text
@@ -123,5 +172,4 @@ To generate a list of recognized entities and their types, in an **Enrichment**
 </Note>

 5. To experiment with different data, change the contents of **Input sample**. For best results, Unstructured strongly recommends that the JSON structure in **Input sample** be preserved.
-6. When you are satisfied with the **Model** and **Prompt** that you want to use, click **Save**.
-
+6. When you are satisfied with the **Model** and **Prompt** that you want to use, click **Save**.
-You can also customize the prompt used to add or remove entities. In the **Details** tab, under **Prompt**, click **Edit**. Click **Run Prompt** in the
+You can also customize the prompt used to add or remove entities and relationships. In the **Details** tab, under **Prompt**, click **Edit**. Click **Run Prompt** in the
 **Edit & Test Prompt** section to test the prompt.