Commit 299c273

Entity relationships added to NER (#588)
1 parent 17e89be commit 299c273

2 files changed

+109
-61
lines changed

ui/enriching/ner.mdx

Lines changed: 107 additions & 59 deletions
Original file line number | Diff line number | Diff line change
@@ -2,14 +2,16 @@
22
title: Named entity recognition (NER)
33
---
44

5-
After partitioning and chunking, you can have Unstructured generate a list of recognized entities and their types (such as the names of organizations, products, and people) in the content, through a process known as _named entity recognition_ (NER).
5+
After partitioning and chunking, you can have Unstructured generate a list of recognized entities and their types (such as the names of organizations, products, and people) in the content, through a process known as _named entity recognition_ (NER).
6+
You can also have Unstructured generate a list of relationships between the entities that are recognized.
67

78
This NER is done by using models offered through these providers:
89

910
- [GPT-4o](https://openai.com/index/hello-gpt-4o/), provided through OpenAI.
1011
- [Claude 3.5 Sonnet](https://www.anthropic.com/news/claude-3-5-sonnet), provided through Anthropic.
1112

12-
Here is an example of a list of recognized entities and their types using GPT-4o. Note specifically the `entities` field that is added.
13+
Here is an example of a list of recognized entities and their entity types, along with a list of relationships between those
14+
entities and their relationship types, using GPT-4o. Note specifically the `entities` field that is added to the `metadata` field.
1315

1416
```json
1517
{
@@ -31,63 +33,109 @@ Here is an example of a list of recognized entities and their types using GPT-4o
3133
"eng"
3234
],
3335
"page_number": 2,
34-
"entities": [
35-
{
36-
"entity": "Senate",
37-
"type": "ORGANIZATION"
38-
},
39-
{
40-
"entity": "United States",
41-
"type": "LOCATION"
42-
},
43-
{
44-
"entity": "Senators",
45-
"type": "PERSON"
46-
},
47-
{
48-
"entity": "State",
49-
"type": "LOCATION"
50-
},
51-
{
52-
"entity": "Legislature",
53-
"type": "ORGANIZATION"
54-
},
55-
{
56-
"entity": "six Years",
57-
"type": "DATE"
58-
},
59-
{
60-
"entity": "first Election",
61-
"type": "EVENT"
62-
},
63-
{
64-
"entity": "second Year",
65-
"type": "DATE"
66-
},
67-
{
68-
"entity": "fourth Year",
69-
"type": "DATE"
70-
},
71-
{
72-
"entity": "sixth Year",
73-
"type": "DATE"
74-
},
75-
{
76-
"entity": "Executive",
77-
"type": "PERSON"
78-
},
79-
{
80-
"entity": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
81-
"type": "ARTIFACT"
82-
}
83-
]
36+
"entities": {
37+
"items": [
38+
{
39+
"entity": "Senate",
40+
"type": "ORGANIZATION"
41+
},
42+
{
43+
"entity": "United States",
44+
"type": "LOCATION"
45+
},
46+
{
47+
"entity": "Senator",
48+
"type": "ROLE"
49+
},
50+
{
51+
"entity": "State",
52+
"type": "LOCATION"
53+
},
54+
{
55+
"entity": "Legislature",
56+
"type": "ORGANIZATION"
57+
},
58+
{
59+
"entity": "Executive",
60+
"type": "ROLE"
61+
},
62+
{
63+
"entity": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
64+
"type": "DOCUMENT"
65+
}
66+
],
67+
"relationships": [
68+
{
69+
"from": "Senate",
70+
"relationship": "based_in",
71+
"to": "United States"
72+
},
73+
{
74+
"from": "Senator",
75+
"relationship": "has_role",
76+
"to": "Senate"
77+
},
78+
{
79+
"from": "Legislature",
80+
"relationship": "has_office_in",
81+
"to": "State"
82+
},
83+
{
84+
"from": "Executive",
85+
"relationship": "has_role",
86+
"to": "State"
87+
},
88+
{
89+
"from": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
90+
"relationship": "dated",
91+
"to": "DATE"
92+
}
93+
]
94+
}
8495
}
8596
}
8697
```
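
The change above moves `entities` from a bare list to an object with `items` and `relationships`. As a minimal sketch of how a consumer might read either shape, assuming the element has already been loaded as JSON (the helper name `extract_entities` and the trimmed sample element are hypothetical, not part of Unstructured):

```python
import json

# Hypothetical, trimmed element. Only the shape of metadata.entities is
# taken from the example output in this diff.
element_json = """
{
  "metadata": {
    "page_number": 2,
    "entities": {
      "items": [
        {"entity": "Senate", "type": "ORGANIZATION"},
        {"entity": "United States", "type": "LOCATION"},
        {"entity": "Senator", "type": "ROLE"}
      ],
      "relationships": [
        {"from": "Senate", "relationship": "based_in", "to": "United States"},
        {"from": "Senator", "relationship": "has_role", "to": "Senate"}
      ]
    }
  }
}
"""


def extract_entities(element: dict) -> tuple[dict, list]:
    """Return ({entity: type}, [(from, relationship, to), ...]) for one element."""
    entities = element.get("metadata", {}).get("entities", {})
    if isinstance(entities, list):
        # Shape shown on the removed lines: a bare list, no relationships.
        items, relationships = entities, []
    else:
        # Shape shown on the added lines: an object with two lists.
        items = entities.get("items", [])
        relationships = entities.get("relationships", [])
    types = {item["entity"]: item["type"] for item in items}
    triples = [(r["from"], r["relationship"], r["to"]) for r in relationships]
    return types, triples


types, triples = extract_entities(json.loads(element_json))
for source, relationship, target in triples:
    # Fall back to "?" when an endpoint is not listed under "items".
    print(f"{source} ({types.get(source, '?')}) "
          f"-{relationship}-> {target} ({types.get(target, '?')})")
```

Handling both shapes is a design choice for consumers that may still hold output generated before this change; code that only ever sees the new output can drop the list branch.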
8798

88-
# Generate a list of entities and their types
89-
90-
To generate a list of recognized entities and their types, in an **Enrichment** node in a workflow, specify the following:
99+
By default, the following entity types are supported for NER:
100+
101+
- `PERSON`
102+
- `ORGANIZATION`
103+
- `LOCATION`
104+
- `DATE`
105+
- `TIME`
106+
- `EVENT`
107+
- `MONEY`
108+
- `PERCENT`
109+
- `FACILITY`
110+
- `PRODUCT`
111+
- `ROLE`
112+
- `DOCUMENT`
113+
- `DATASET`
114+
115+
By default, the following entity relationships are supported for NER:
116+
117+
- `PERSON` - `ORGANIZATION`: `works_for`, `affiliated_with`, `founded`
118+
- `PERSON` - `LOCATION`: `born_in`, `lives_in`, `traveled_to`
119+
- `ORGANIZATION` - `LOCATION`: `based_in`, `has_office_in`
120+
- Entity - `DATE`: `occurred_on`, `founded_on`, `died_on`, `published_in`
121+
- `PERSON` - `PERSON`: `married_to`, `parent_of`, `colleague_of`
122+
- `PRODUCT` - `ORGANIZATION`: `developed_by`, `owned_by`
123+
- `EVENT` - `LOCATION`: `held_in`, `occurred_in`
124+
- Entity - `ROLE`: `has_title`, `acts_as`, `has_role`
125+
- `DATASET` - `PERSON`: `mentions`
126+
- `DATASET` - `DOCUMENT`: `located_in`
127+
- `PERSON` - `DATASET`: `published`
128+
- `DOCUMENT` - `DOCUMENT`: `referenced_in`, `contains`
129+
- `DOCUMENT` - `DATE`: `dated`
130+
- `PERSON` - `DOCUMENT`: `published`
131+
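
The default relationship list above can be treated as a lookup table when checking model output. The following is a rough sketch only: the names `DEFAULT_RELATIONSHIPS`, `allowed_relationships`, and `is_default_relationship` are hypothetical, and reading each row as directed (left type to right type) is an assumption, since the list does not state a direction.

```python
# Default relationship types, copied from the list above. "*" stands in for
# the generic "Entity" rows. Treating each row as directed (left type to
# right type) is an assumption; the source does not state a direction.
DEFAULT_RELATIONSHIPS = {
    ("PERSON", "ORGANIZATION"): {"works_for", "affiliated_with", "founded"},
    ("PERSON", "LOCATION"): {"born_in", "lives_in", "traveled_to"},
    ("ORGANIZATION", "LOCATION"): {"based_in", "has_office_in"},
    ("*", "DATE"): {"occurred_on", "founded_on", "died_on", "published_in"},
    ("PERSON", "PERSON"): {"married_to", "parent_of", "colleague_of"},
    ("PRODUCT", "ORGANIZATION"): {"developed_by", "owned_by"},
    ("EVENT", "LOCATION"): {"held_in", "occurred_in"},
    ("*", "ROLE"): {"has_title", "acts_as", "has_role"},
    ("DATASET", "PERSON"): {"mentions"},
    ("DATASET", "DOCUMENT"): {"located_in"},
    ("PERSON", "DATASET"): {"published"},
    ("DOCUMENT", "DOCUMENT"): {"referenced_in", "contains"},
    ("DOCUMENT", "DATE"): {"dated"},
    ("PERSON", "DOCUMENT"): {"published"},
}


def allowed_relationships(from_type: str, to_type: str) -> set:
    """Union of the specific row and the generic "Entity" row for a type pair."""
    specific = DEFAULT_RELATIONSHIPS.get((from_type, to_type), set())
    generic = DEFAULT_RELATIONSHIPS.get(("*", to_type), set())
    return specific | generic


def is_default_relationship(from_type: str, relationship: str, to_type: str) -> bool:
    """True if the relationship appears in the default list for this type pair."""
    return relationship in allowed_relationships(from_type, to_type)


print(is_default_relationship("ORGANIZATION", "based_in", "LOCATION"))
print(is_default_relationship("ROLE", "has_role", "ORGANIZATION"))
```

Under this reading the second check fails, which matches the shape of the `Senator` `has_role` `Senate` row in the example output. That suggests a check like this is better used to flag rows for review than to reject them, because model output may not follow the default pairings strictly.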
132+
You can add, rename, or delete items in this list of default entity types and default entity relationship types.
133+
You can also add any clarifying instructions to the
134+
prompt that is used to run NER. To do this, see the next section.
135+
136+
# Generate a list of entities and their relationships
137+
138+
To generate a list of recognized entities and their relationships, in an **Enrichment** node in a workflow, specify the following:
91139

92140
<Note>
93141
You can change a workflow's NER settings only through [Custom](/ui/workflows#create-a-custom-workflow) workflow settings.
@@ -97,16 +145,17 @@ To generate a list of recognized entities and their types, in an **Enrichment**
97145

98146
1. Select **Text**.
99147
2. For **Model**, select either **OpenAI (GPT-4o)** or **Anthropic (Claude 3.5 Sonnet)**.
100-
3. The selected model will follow a default set of instructions (called a _prompt_) to perform NER using a set of predefined entity types. To experiment
148+
3. The selected model will follow a default set of instructions (called a _prompt_) to perform NER using a set of predefined entity types and relationships. To experiment
101149
with running the default prompt against some sample data, click **Edit**, and then click **Run Prompt**. The selected **Model** uses the
102150
**Prompt** to run NER on the **Input sample** and shows the results in the **Output**. Look specifically at the `response_json` field for the
103-
entities that were recognized and their types.
151+
entities that were recognized and their relationships.
104152
4. To customize the prompt, change the contents of **Prompt**.
105153

106154
<Note>
107155
For best results, Unstructured strongly recommends that you limit your changes only to certain portions of the default prompt, specifically:
108156

109157
- Adding, renaming, or deleting items in the list of predefined types (such as `PERSON`, `ORGANIZATION`, `LOCATION`, and so on).
158+
- Adding, renaming, or deleting items in the list of predefined relationships (such as `works_for`, `based_in`, `has_role`, and so on).
110159
- As needed, adding any clarifying instructions only between these two lines:
111160

112161
```text
@@ -123,5 +172,4 @@ To generate a list of recognized entities and their types, in an **Enrichment**
123172
</Note>
124173

125174
5. To experiment with different data, change the contents of **Input sample**. For best results, Unstructured strongly recommends that the JSON structure in **Input sample** be preserved.
126-
6. When you are satisfied with the **Model** and **Prompt** that you want to use, click **Save**.
127-
175+
6. When you are satisfied with the **Model** and **Prompt** that you want to use, click **Save**.

ui/workflows.mdx

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -313,13 +313,13 @@ import PlatformPartitioningStrategies from '/snippets/general-shared-text/platfo
313313

314314
[Learn more](/ui/enriching/table-to-html).
315315

316-
- **Text** to generate a list of recognized entities and their types by using a technique called _named entity recognition_ (NER).
316+
- **Text** to generate a list of recognized entities and their relationships by using a technique called _named entity recognition_ (NER).
317317
Also select one of the following provider (and model) combinations to use:
318318

319319
- **OpenAI (GPT-4o)**. [Learn more](https://openai.com/index/hello-gpt-4o/).
320320
- **Anthropic (Claude 3.5 Sonnet)**. [Learn more](https://www.anthropic.com/news/claude-3-5-sonnet).
321321

322-
You can also customize the prompt used to add or remove entities. In the **Details** tab, under **Prompt**, click **Edit**. Click **Run Prompt** in the
322+
You can also customize the prompt used to add or remove entities and relationships. In the **Details** tab, under **Prompt**, click **Edit**. Click **Run Prompt** in the
323323
**Edit & Test Prompt** section to test the prompt.
324324

325325
[Learn more](/ui/enriching/ner).
