Extracting output of displacy.serve #12726
-
I've been able to run displacy.serve so that it opens a page on localhost displaying this: "Jerry PERSON walked past the School ORG, down to the lake where Martin PERSON was fishing." Is it possible to extract the words that have labels from this output? Is there a way I could filter those words into lists based on the label that displacy gave them?
Replies: 3 comments 6 replies
-
Hey EvelynGriffith, I'm not sure I understand everything correctly, since there is no code snippet, but it seems to me that you are running displacy.serve(doc, style="dep").
The named entity annotations you are looking for are stored on the doc, which is an instance of Doc: https://spacy.io/api/doc.
You can find the entity spans in the document like this:

    import spacy
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Jerry walked past the School, down to the lake where Martin was fishing.")
    print(doc.ents)

This should print (Jerry, School, Martin). The elements of doc.ents are Spans: https://spacy.io/api/span

    for ent in doc.ents:
        print(doc[ent.start:ent.end])

This prints:

    Jerry
    School
    Martin

If you'd like to extract just the strings for the various entity types, you could do something like this:

    from collections import defaultdict
    ents_dict = defaultdict(list)
    for ent in doc.ents:
        ents_dict[ent.label_].append(ent.text)
    print(ents_dict)

This prints:

    defaultdict(<class 'list'>, {'PERSON': ['Jerry', 'Martin'], 'ORG': ['School']})

You can then look up each type of entity in the dictionary by its label.
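To make the lookup step concrete, here is a minimal sketch that wraps the grouping in a helper and then pulls out one list per label. The function name group_entities_by_label and the SimpleNamespace stand-ins are my own, purely for illustration; the stand-ins only mimic the two Span attributes used here (.text and .label_), so the snippet runs without a model. With a real pipeline you would pass doc.ents instead.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities_by_label(ents):
    """Group entity texts by label.

    `ents` is any iterable of objects exposing `.label_` and `.text`,
    for example `doc.ents` from a spaCy Doc.
    """
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    # Return a plain dict so a missing label raises KeyError instead of
    # silently creating an empty list.
    return dict(grouped)


# Hypothetical stand-ins for spaCy Span objects (assumption: only the
# attributes `.text` and `.label_` are needed).
fake_ents = [
    SimpleNamespace(text="Jerry", label_="PERSON"),
    SimpleNamespace(text="School", label_="ORG"),
    SimpleNamespace(text="Martin", label_="PERSON"),
]

by_label = group_entities_by_label(fake_ents)

# Use .get with a default so labels that never occurred give an empty list.
people = by_label.get("PERSON", [])
orgs = by_label.get("ORG", [])
print(people)
print(orgs)
```

Using .get with an empty-list default is a small design choice: it avoids a KeyError for labels the model did not predict in a given text.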
-
Okay yes that’s what I was hoping to do! Thank you!
…On Fri, Jun 16, 2023 at 5:31 AM kadarakos ***@***.***> wrote:
Hey EvelynGriffith,
I'm not sure if I understand everything correctly -- due to not having a
code snippet -- but to me it seems like you are running displacy.serve(doc,
style="dep").
The named entity annotations you are looking for are stored on the doc
which is an instance of the Doc: https://spacy.io/api/doc.
You can find the entity spans in the document like this:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Jerry walked past the School, down to the lake where Martin was fishing.")
print(doc.ents)
This should print (Jerry, School, Martin). The elements of doc.ents are
Spans: https://spacy.io/api/span
for ent in doc.ents:
    print(doc[ent.start:ent.end])
This prints:
Jerry
School
Martin
-
May I ask you to share your code using formatting and a code environment with syntax highlighting?
If you'd like to change the labels and have the change reflected in doc.ents, it's better to create a new list of entities and assign it back to doc.ents. You can loop through doc.ents and, for each entity, decide whether to add it to new_ents as it is or to first create a new Span and then add that to new_ents. Finally, assign doc.ents = new_ents.
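A rough sketch of that relabeling loop, under a few assumptions of mine: the helper name relabel_entities, the label_map argument, and the ORG to FAC mapping are hypothetical, and the entities are set by hand on a blank English pipeline so the example does not depend on a trained model's predictions.

```python
import spacy
from spacy.tokens import Span


def relabel_entities(doc, label_map):
    """Rebuild doc.ents, swapping labels according to `label_map`.

    Entities whose label is not in `label_map` are kept unchanged.
    """
    new_ents = []
    for ent in doc.ents:
        new_label = label_map.get(ent.label_)
        if new_label is None:
            # Keep the existing entity as it is.
            new_ents.append(ent)
        else:
            # Create a new Span over the same tokens with the new label.
            new_ents.append(Span(doc, ent.start, ent.end, label=new_label))
    # Assign the full list back in one step.
    doc.ents = new_ents
    return doc


# Blank pipeline: tokenizer only, no trained NER component.
nlp = spacy.blank("en")
doc = nlp("Jerry walked past the School")

# Entities set manually for illustration (token 0 = "Jerry", token 4 = "School").
doc.ents = [Span(doc, 0, 1, label="PERSON"), Span(doc, 4, 5, label="ORG")]

relabel_entities(doc, {"ORG": "FAC"})
print([(ent.text, ent.label_) for ent in doc.ents])
```

With a trained pipeline such as en_core_web_sm, the same helper could be called on the doc returned by nlp(...), without setting doc.ents by hand first.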