Add an example on how to discover available extensions on a Doc/Span/Token object #12322

mattkeanny · 2022-12-21T16:08:57Z

I'm new to spaCy and was searching the Doc API documentation (https://spacy.io/api/doc) for how to "discover" which extensions are available on a given Doc/Span/Token object.
The use case is discovering the available extensions of a doc-like object returned by an (external) library function.
The options are either to go back and read the library's docs or to list the available extensions directly in the shell/notebook.

After some searching and head-scratching, it dawned on me that Python's built-in dir() actually does the trick
(using the example from "Overwriting custom extension attributes"):

>>> print("list extensions:", dir(doc[0]._))
list extensions: ['get', 'has', 'is_musician', 'set']

Alternatively, a bit more involved but with cleaner output:

>>> print("list extensions:", list(doc[0]._.__dict__['_extensions']))
list extensions: ['is_musician']
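
For anyone who wants to try this without the docs example at hand, here is a minimal, self-contained sketch. It assumes a blank English pipeline and a single `is_musician` token extension registered for illustration; the `builtin_helpers` filter is my own addition, not part of spaCy. It relies on dir() behaving as shown above and otherwise uses only the public `Token.set_extension` and `Token.has_extension` calls.

```python
import spacy
from spacy.tokens import Token

# Register a token-level extension. force=True lets the snippet be re-run
# in the same session without an error about the name already existing.
Token.set_extension("is_musician", default=False, force=True)

nlp = spacy.blank("en")  # blank pipeline, no trained model needed
doc = nlp("I like David Bowie")
token = doc[0]

# dir() on the underscore proxy lists the get/set/has helpers plus the
# registered extensions, so filter the helpers out (illustrative filter).
builtin_helpers = {"get", "set", "has"}
via_dir = [name for name in dir(token._) if name not in builtin_helpers]

# Public per-name check that does not touch any private attribute.
is_registered = Token.has_extension("is_musician")

print("extensions via dir():", via_dir)
print("is_musician registered:", is_registered)
```

The dir() route avoids reaching into `__dict__['_extensions']`, which is a private detail and could change between versions.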

Would it be possible to add a one-liner to one of the extension examples showing how to list the available extensions of a Doc, Span and/or Token object, either in the API docs or in the usage docs under "Overwriting custom extension attributes"?

Would it make sense to add a .list_extensions() class method to the Doc, Span and Token classes for this purpose?
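
To make the idea concrete, here is a rough sketch of what such a helper could look like as a plain function. `list_extensions` is hypothetical and does not exist in spaCy; the extension names `source`, `category` and `is_musician` are also just placeholders. It assumes that dir() on the `._` proxy returns the three helper names plus the registered extensions, as observed above.

```python
import spacy
from spacy.tokens import Doc, Span, Token

# Names that dir() reports on the underscore proxy but that are helpers,
# not user-registered extensions.
_HELPERS = {"get", "set", "has"}


def list_extensions(obj):
    """Hypothetical helper: sorted extension names on a Doc, Span or Token."""
    return sorted(name for name in dir(obj._) if name not in _HELPERS)


# Placeholder extensions, one per object type.
Doc.set_extension("source", default=None, force=True)
Span.set_extension("category", default=None, force=True)
Token.set_extension("is_musician", default=False, force=True)

nlp = spacy.blank("en")
doc = nlp("David Bowie wrote many songs")

doc_exts = list_extensions(doc)
span_exts = list_extensions(doc[0:2])
token_exts = list_extensions(doc[0])

print("Doc:", doc_exts)
print("Span:", span_exts)
print("Token:", token_exts)
```

Extensions are registered per class, so a Token extension does not show up on a Doc and vice versa.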

Thanks

Answered by polm

polm · 2022-12-23T05:56:12Z

Thanks for the suggestion. As you note this information is already available, if awkward to access, so it could make sense to document that.

Could you describe how you're using this in more detail? I'm trying to imagine having an unknown set of extensions, and I feel like in that case it'd be better to use a single extension attribute that was itself a dict with sub-values, for example.
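
As a rough illustration of the single-attribute idea, the sketch below registers one Doc extension and stores a dict in it. The attribute name `chunk_info` and the keys `people` and `works` are made up for the example, and a blank English pipeline is assumed so that no trained model is needed.

```python
import spacy
from spacy.tokens import Doc

# One extension attribute that acts as a container for all results.
# "chunk_info" is a placeholder name chosen for this sketch.
Doc.set_extension("chunk_info", default=None, force=True)

nlp = spacy.blank("en")
doc = nlp("David Bowie wrote many songs")

# Assign a fresh dict per Doc; the keys are hypothetical categories.
doc._.chunk_info = {
    "people": [doc[0:2].text],
    "works": [doc[3:5].text],
}

# Discovering what is available is now an ordinary dict operation.
available_keys = sorted(doc._.chunk_info)
print(available_keys)
```

With this layout the caller only needs to know one extension name, and the sub-values can be inspected like any other dict.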

mattkeanny · 2022-12-25T14:04:18Z

Author

I am trying to sort noun chunks into different categories. Instead of passing the results of the processing functions and the related Doc objects around separately, the idea is to attach the results to the Doc object as extensions, similar to the attributes of a Python dataclass, and to pass around or return such an "enhanced" Doc.
While simply printing a dataclass or calling dir() on it reveals its attributes, doing the same for doc-like extensions requires an extra step that is perhaps not so obvious.
Documenting it would already be a great help.

polm · 2022-12-26T04:10:41Z

Thanks for explaining, but can you give a specific example of the kind of data you have? Based on your explanation it sounds like you should use doc.spans. For example:

doc.spans["my_chunks"] = list(doc.noun_chunks)

We'll think about documenting this better, but I think that in any case where you'd want to do this, it would be better to do something else instead.
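
A slightly fuller sketch of the doc.spans approach, under a few assumptions: it uses a blank English pipeline, so doc.noun_chunks is not available (it needs a parser) and hand-built spans stand in for the noun chunks. The group names `people` and `works` and the labels are placeholders.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("David Bowie wrote many songs")

# Hand-built spans stand in for noun chunks here. With a parsed Doc,
# list(doc.noun_chunks) could be filtered into these groups instead.
doc.spans["people"] = [Span(doc, 0, 2, label="PERSON")]
doc.spans["works"] = [Span(doc, 3, 5, label="WORK")]

# doc.spans behaves like a dict, so the stored groups are discoverable
# through its keys.
group_names = sorted(doc.spans.keys())
people_texts = [span.text for span in doc.spans["people"]]

print(group_names)
print(people_texts)
```

Because the span groups travel with the Doc, a function can return the Doc alone and the caller can inspect `doc.spans.keys()` to see which categories were filled in.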
