Setting span.id_ for doc.sents spans #13835
-
I want to set a unique ID for each of the sentence spans in `doc.sents`. However, if I iterate over `doc.sents` and set a value for each span, the value does not persist. Is there a better way to do this? E.g.:

```python
for sent in doc.sents:
    sent.id_ = "<unique_id>"

# ...

for sent in doc.sents:
    print(sent.id_)
# > ""
```
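The reason the value disappears can be sketched as follows. Each access to `doc.sents` builds brand-new `Span` objects from the sentence boundaries, and `sent.id_` is stored on the individual `Span` object, so an ID set inside one loop is thrown away with that object. This sketch assumes a recent spaCy v3 release in which `Span.id_` is settable (as in the question), and it uses a blank English pipeline with the rule-based `sentencizer` in place of a trained model so nothing needs to be downloaded.

```python
import spacy

# Assumption: a blank English pipeline plus the rule-based sentencizer
# stands in for a trained model, so no model download is needed.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp("This is sentence one. This is sentence two.")

# Each access to doc.sents creates brand-new Span objects.
first_pass = list(doc.sents)
second_pass = list(doc.sents)
print(first_pass[0] is second_pass[0])  # False

# Setting id_ only changes the temporary Span objects from this loop.
for sent_i, sent in enumerate(doc.sents):
    sent.id_ = f"sent-{sent_i}"

# A fresh iteration yields fresh Spans whose id_ was never set.
ids = [sent.id_ for sent in doc.sents]
print(ids)  # ['', '']
```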
-
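Hi @ghinch,

There is no built-in attribute for sentences. However, `doc.sents` yields `Span` objects, so you can take advantage of the corresponding `Span` for each sentence if you need to store the ID within a spaCy object. A similar issue was answered by one of spaCy's maintainers in another thread. The spaCy documentation on extension attributes explains how to add new attributes. I provided an implementation that uses the attribute extension:

```python
import spacy
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")
# Register a custom attribute on every Span; spans that were never labelled report -1
Span.set_extension("unique_id", default=-1)

doc = nlp("This is sentence one. This is sentence two.")
for sent_i, sent in enumerate(doc.sents):
    sent._.unique_id = sent_i
```

You can change the `<unique_id>` value to whatever fits your application. This works because the sentence segmentation is the same every time you iterate over `doc.sents`, so each new `Span` covers the exact same slice. Extension values for a `Span` are stored on the `Doc` itself (in `doc.user_data`), keyed by the span's character offsets, so any `Span` with the same boundaries sees the stored value. By contrast, `sent.id_` lives on the individual `Span` object, which is discarded after each iteration.

You can verify this by iterating over the sentences again:

```python
for sent in doc.sents:
    print(sent._.unique_id)
```

Output:

```
0
1
```

Now, the preferred solution combines attribute extensions with a custom pipeline component, so the IDs are assigned automatically every time the pipeline runs:

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span

nlp = spacy.load("en_core_web_sm")
# force=True overwrites the extension if it was already registered in this session
Span.set_extension("unique_id", default=-1, force=True)

@Language.component("label_sents")
def assign_sentence_ids(doc):
    # Number the sentences in document order
    for sent_i, sent in enumerate(doc.sents):
        sent._.unique_id = sent_i
    return doc
```

Then you can run:

```python
nlp.add_pipe("label_sents", last=True)
doc = nlp("Sentence one. Sentence two.")
for sent in doc.sents:
    print(sent._.unique_id)
```

Of course you can use a Python list or a pandas …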
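To see why the extension-based approach persists where `sent.id_` does not, here is a small check. It is a sketch that again assumes a blank English pipeline with the `sentencizer` instead of `en_core_web_sm`: a freshly created slice with the same token boundaries as a labelled sentence reads back the stored ID, while a slice with different boundaries falls back to the default.

```python
import spacy
from spacy.tokens import Span

# force=True overwrites the extension if it already exists in this session
Span.set_extension("unique_id", default=-1, force=True)

# Assumption: blank English pipeline + sentencizer instead of en_core_web_sm
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
doc = nlp("This is sentence one. This is sentence two.")

for sent_i, sent in enumerate(doc.sents):
    sent._.unique_id = sent_i

sents = list(doc.sents)
# A new Span built by slicing the Doc with the same token boundaries
same_slice = doc[sents[1].start : sents[1].end]
# A new Span that drops the final token, so its character offsets differ
partial = doc[sents[1].start : sents[1].end - 1]

print(same_slice._.unique_id)  # 1
print(partial._.unique_id)     # -1 (the default)
```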
Hi @ghinch,

There is no built-in attribute for sentences. However, `doc.sents` yields `Span` objects, so you can take advantage of the corresponding `Span` for each sentence if you need to store the ID within a spaCy object. A similar issue was answered by one of spaCy's maintainers in another thread. The spaCy documentation on extension attributes explains how to add new attributes. I provided an implementation that uses the attribute extension: