Closed
Labels: feat / doc (Feature: Doc, Span and Token objects), feat / serialize (Feature: Serialization, saving and loading)
Description
How to reproduce the behaviour
I am trying to use Doc.to_bytes() after extending Doc with a custom attribute that holds a list of spans. Serializing and deserializing the custom attribute on its own works, but calling Doc.to_bytes() on the same document fails. Here's a minimal reproducible example:
import spacy
from spacy.tokens import Doc

nlp = spacy.blank('en')

def serialize_spans(obj, attr):
    # Convert the stored spans to (start_char, end_char) tuples
    return [(span.start_char, span.end_char) for span in getattr(obj._, attr)]

def deserialize_spans(obj, attr, value):
    # Rebuild Span objects from the (start_char, end_char) tuples in `value`
    setattr(obj._, attr, [obj.char_span(start, end) for start, end in value])

Doc.set_extension("special_spans", default=[], to_bytes=serialize_spans, from_bytes=deserialize_spans)

doc = nlp('The quick brown fox jumped over the lazy dog.')
doc._.special_spans = [doc[0:2], doc[4:6]]

# Works well
serialize_spans(doc, 'special_spans')

# Doesn't work
doc.to_bytes()
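A likely explanation, based on how spaCy v3 appears to behave (not confirmed in this thread): `Doc.set_extension` only recognizes `default`, `method`, `getter`, `setter` and `force`, so the `to_bytes`/`from_bytes` arguments above seem to be silently ignored rather than used as serialization hooks. `Doc.to_bytes()` then msgpack-serializes `doc.user_data`, which is where extension values are stored, and a list of `Span` objects is not msgpack-serializable. The sketch below shows two possible workarounds, assuming spaCy v3: store plain character offsets in the extension, or put the spans in `doc.spans`, which `Doc.to_bytes()` serializes natively. The extension name `special_span_offsets` and the helpers `set_special_spans`/`get_special_spans` are hypothetical, chosen for illustration only.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hypothetical extension: holds (start_char, end_char) pairs, which msgpack
# can serialize, instead of Span objects, which it cannot.
Doc.set_extension("special_span_offsets", default=None, force=True)


def set_special_spans(doc, spans):
    """Hypothetical helper: store spans on the doc as character offsets."""
    doc._.special_span_offsets = [(span.start_char, span.end_char) for span in spans]


def get_special_spans(doc):
    """Hypothetical helper: rebuild Span objects from the stored offsets.

    Doc.char_span returns None if an offset pair does not align with token
    boundaries; offsets taken from existing spans always align.
    """
    offsets = doc._.special_span_offsets or []
    return [doc.char_span(start, end) for start, end in offsets]


doc = nlp("The quick brown fox jumped over the lazy dog.")

# Workaround 1: serializable offsets in a custom extension.
set_special_spans(doc, [doc[0:2], doc[4:6]])

# Workaround 2: doc.spans is written and restored by Doc.to_bytes/from_bytes.
doc.spans["special_spans"] = [doc[0:2], doc[4:6]]

data = doc.to_bytes()  # no longer tries to msgpack Span objects
restored = Doc(nlp.vocab).from_bytes(data)

print([span.text for span in get_special_spans(restored)])
print([span.text for span in restored.spans["special_spans"]])
```

Of the two, `doc.spans` avoids custom conversion code entirely; the offsets approach keeps the data under `doc._` if other code already expects a custom attribute.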
Your Environment
- spaCy version: 3.6.1
- Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.31
- Python version: 3.11.7
- Pipelines: en_core_sci_lg (0.5.3)