This was a bit of a surprise to me, but I confirmed that it is happening and figured out why.

When you set a custom extension attribute on a Span, the value is actually stored on the Doc, in Doc.user_data (a dict). The key is a tuple made of the attribute name and the span's start and end, not the span object's id or any other information. So two Span objects with the same start and end will share the same data.
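
To make the mechanism concrete, here is a minimal pure-Python sketch that mimics it without depending on spaCy. ToyDoc, ToySpan, set_ext and get_ext are hypothetical stand-ins, not spaCy's API (real code would use span._.my_attr), and the exact key layout is an assumption modeled on the description above.

```python
from typing import Any, Dict, Hashable, Tuple


class ToyDoc:
    """Hypothetical stand-in for a Doc: it only owns the user_data dict."""

    def __init__(self) -> None:
        self.user_data: Dict[Hashable, Any] = {}


class ToySpan:
    """Hypothetical stand-in for a Span: a lightweight view over a ToyDoc."""

    def __init__(self, doc: ToyDoc, start: int, end: int) -> None:
        self.doc = doc
        self.start = start
        self.end = end

    def _key(self, name: str) -> Tuple[str, str, int, int]:
        # Assumed key layout: attribute name plus start/end only.
        # id(self) is NOT part of the key, so equal offsets collide.
        return ("._.", name, self.start, self.end)

    def set_ext(self, name: str, value: Any) -> None:
        # The value lives on the doc, not on the span object.
        self.doc.user_data[self._key(name)] = value

    def get_ext(self, name: str, default: Any = None) -> Any:
        return self.doc.user_data.get(self._key(name), default)


if __name__ == "__main__":
    doc = ToyDoc()
    a = ToySpan(doc, 0, 5)
    b = ToySpan(doc, 0, 5)  # a different object over the same text
    a.set_ext("sentiment", "positive")
    print(a is b)                   # prints: False
    print(b.get_ext("sentiment"))   # prints: positive
```

Because the span object itself holds no state for the attribute, any span that resolves to the same key reads and writes the same dict entry.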

I guess this is a design decision, but it is surprising that distinct objects share data, and we should probably highlight this more in the docs. On the other hand, I'm having a hard time imagining concrete cases where it would make sense for two spans over the same text to have different user data. Could you tell …

Replies: 1 comment 2 replies

@polkaYK
@polm
Answer selected by svlandeg
Labels
feat / doc Feature: Doc, Span and Token objects
2 participants