Add an example on how to discover available extensions on a Doc/Span/Token object #12322
I'm new to spaCy and was searching the Doc API documentation (https://spacy.io/api/doc) for how to "discover" which extensions are available on a given Doc/Span/Token object. After some searching and head-scratching, it dawned on me that a classic Python dir() actually does the trick:
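A minimal sketch of the `dir()` approach, assuming it is applied to the `._` attribute of each object. The extension names (`category`, `is_product`, `score`) and the helper `list_extensions` are hypothetical and exist only for illustration; `spacy.blank("en")` is used so that no trained model is needed.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Hypothetical extensions, registered only so there is something to discover.
# force=True keeps the snippet re-runnable in the same session.
Doc.set_extension("category", default=None, force=True)
Span.set_extension("is_product", default=False, force=True)
Token.set_extension("score", default=0.0, force=True)

nlp = spacy.blank("en")  # tokenizer-only pipeline, no model download
doc = nlp("spaCy makes custom extensions easy")


def list_extensions(obj):
    """Return the custom extension names visible via dir(obj._).

    dir() on the underscore object also lists its get/set/has helpers,
    so those three names are filtered out here.
    """
    helpers = {"get", "set", "has"}
    return sorted(name for name in dir(obj._) if name not in helpers)


doc_exts = list_extensions(doc)          # Doc
span_exts = list_extensions(doc[0:2])    # Span
token_exts = list_extensions(doc[0])     # Token
print(doc_exts, span_exts, token_exts)
```

Extensions are registered per class rather than per instance, so every Doc in the process reports the same names.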
or, alternatively, a bit more involved but cleaner:
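A sketch of what a cleaner variant might look like, under the assumption that reading spaCy's internal registries is acceptable. `Underscore` in `spacy.tokens.underscore` is an internal class, not documented public API, and may change between versions. The extension name `category` and the helper `registered_extensions` are hypothetical. For a name that is already known, the documented classmethods `Doc.has_extension` and `Doc.get_extension` are the supported route.

```python
from spacy.tokens import Doc
from spacy.tokens.underscore import Underscore  # internal module, may change

# Hypothetical extension, registered only for illustration.
Doc.set_extension("category", default=None, force=True)


def registered_extensions():
    """Collect extension names per object type from the internal registries."""
    return {
        "doc": sorted(Underscore.doc_extensions),
        "span": sorted(Underscore.span_extensions),
        "token": sorted(Underscore.token_extensions),
    }


exts = registered_extensions()
print(exts["doc"])

# Documented API for checking a known name:
known = Doc.has_extension("category")
# get_extension returns a (default, method, getter, setter) tuple.
default, method, getter, setter = Doc.get_extension("category")
```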
Would it be possible to add to one of the examples about extensions a one-liner on how to list the available extensions of a Doc, Span and/or Token object, either in the API docs or in the usage docs under "Overwriting custom extension attributes"? Would it make sense to have a Thanks
Replies: 3 comments
Thanks for the suggestion. As you note, this information is already available, if awkward to access, so it could make sense to document that. Could you describe how you're using this in more detail? I'm trying to imagine having an unknown set of extensions, and I feel like in that case it'd be better to use a single extension attribute that is itself a dict with sub-values, for example.
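A rough sketch of the single-dict idea, assuming one extension (hypothetically named `results`) holds all the sub-values, so that "what is available" becomes an ordinary dict lookup. The keys and values are illustrative placeholders, not real output of any pipeline.

```python
import spacy
from spacy.tokens import Doc

# One extension instead of many; the name "results" is hypothetical.
# default=None and an explicit per-doc assignment avoid sharing one dict.
Doc.set_extension("results", default=None, force=True)

nlp = spacy.blank("en")  # tokenizer-only pipeline, no model download
doc = nlp("The quick brown fox jumps over the lazy dog")

# Illustrative placeholder data stored under sub-keys.
doc._.results = {
    "animals": ["fox", "dog"],
    "colors": ["brown"],
}

# Discovering the available sub-values is now just listing the dict keys.
available = sorted(doc._.results)
print(available)
```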
I am trying to sort noun chunks into different categories. Instead of passing the results of the processing functions and the related Doc objects around separately, the idea is to tack the results onto the Doc object as extensions, similar to the attributes of a Python dataclass, and pass around or return such an "enhanced" Doc.
Thanks for explaining, but can you give a specific example of the kind of data you have? Based on your explanation it sounds like you should use
We'll think about documenting this better, but I think that in any case where you'd want to do this, it would be better to do something else instead.