Doc won't serialize with custom attribute #13281

@avivbrokman

How to reproduce the behaviour

I am trying to use Doc.to_bytes() after extending Doc with a custom attribute whose value is a list of Span objects. Calling my serialization function on the custom attribute directly works, but calling Doc.to_bytes() on the same Doc fails. Here's a minimal reproducible example:

import spacy
from spacy.tokens import Doc

nlp = spacy.blank('en')

def serialize_spans(obj, attr):
    return [(span.start_char, span.end_char) for span in getattr(obj._, attr)]

def deserialize_spans(obj, attr, value):
    # value: the list of (start_char, end_char) pairs produced by serialize_spans
    setattr(obj._, attr, [obj.char_span(start, end) for start, end in value])

Doc.set_extension("special_spans", default=list(), to_bytes=serialize_spans, from_bytes=deserialize_spans)

doc = nlp('The quick brown fox jumped over the lazy dog.')
doc._.special_spans = [doc[0:2], doc[4:6]]

# Works well
serialize_spans(doc, 'special_spans')

# Doesn't work
doc.to_bytes()
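
A possible workaround, offered as a sketch rather than a confirmed fix. It rests on two assumptions that should be checked against the spaCy docs: that Doc.set_extension only acts on default, getter, setter, method and force (so the to_bytes and from_bytes keywords above would have no effect), and that Doc.to_bytes() serializes extension values stored in doc.user_data with msgpack, which cannot encode Span objects. Under those assumptions, one option is to store plain character offsets and rebuild the spans on access. The names special_span_offsets, get_special_spans and set_special_spans are hypothetical.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

# Hypothetical backing attribute: holds plain (start_char, end_char) pairs,
# which are assumed to be serializable by Doc.to_bytes().
Doc.set_extension("special_span_offsets", default=None, force=True)

def get_special_spans(doc):
    # Rebuild Span objects from the stored character offsets on every access.
    offsets = doc._.special_span_offsets or []
    return [doc.char_span(start, end) for start, end in offsets]

def set_special_spans(doc, spans):
    # Keep only the character offsets, never the Span objects themselves.
    doc._.special_span_offsets = [(s.start_char, s.end_char) for s in spans]

# Expose the spans through a getter/setter pair instead of a stored default.
Doc.set_extension(
    "special_spans", getter=get_special_spans, setter=set_special_spans, force=True
)

doc = nlp("The quick brown fox jumped over the lazy dog.")
doc._.special_spans = [doc[0:2], doc[4:6]]

# Round trip through bytes and read the spans back from the restored Doc.
data = doc.to_bytes()
restored = Doc(nlp.vocab).from_bytes(data)
restored_texts = [span.text for span in restored._.special_spans]
```

The trade-off in this sketch is that spans are recomputed each time doc._.special_spans is read, and char_span returns None for offsets that do not align with token boundaries in the restored Doc.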

Your Environment

  • spaCy version: 3.6.1
  • Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.31
  • Python version: 3.11.7
  • Pipelines: en_core_sci_lg (0.5.3)

Labels

  • feat / doc (Feature: Doc, Span and Token objects)
  • feat / serialize (Feature: Serialization, saving and loading)
