Token vectors are empty using en_core_web_trf model #8061
-
This is a copy of #8037 / #8047 because for some reason the migration on that won't finish. This was originally posted by @erip.
I imagine the model is trained on subwords, so maybe the alignment between those and the tokens on the spaCy side is causing issues?
-
Sorry for the delayed reply on this, not sure what's up with the issue migration.

This is a design decision and not a bug. Basically the `.vector` API is only for static word vectors, not for contextual vectors like those generated by the transformer. The transformer models in spaCy don't include static word vectors because if you have transformers you usually don't need them. If you need per-token representations, what you can do instead is use the data in `doc._.trf_data`, which contains tensors, wordpieces, and an alignment between spaCy tokens and the wordpieces. (I'm not sure there's a guide to this anywhere yet.)
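To make the alignment idea concrete without requiring the `en_core_web_trf` download, here is a small sketch of the pooling step: each spaCy token maps to one or more wordpieces, and you can mean-pool the wordpiece vectors into a per-token vector. The toy `wordpiece_vectors` and `alignment` below are stand-ins for the tensors and token-to-wordpiece alignment you would pull out of `doc._.trf_data`; the exact shapes and attribute layout there depend on the spacy-transformers version, so treat this as the general recipe rather than the exact API.

```python
import numpy as np

def pool_token_vectors(wordpiece_vectors, alignment):
    """Mean-pool wordpiece vectors into per-token vectors.

    wordpiece_vectors: (n_wordpieces, dim) array of contextual vectors.
    alignment: list where alignment[i] holds the wordpiece indices for
    token i (mirroring the token-to-wordpiece alignment in trf_data).
    Tokens with no aligned wordpieces get a zero vector.
    """
    dim = wordpiece_vectors.shape[1]
    return np.array([
        wordpiece_vectors[idxs].mean(axis=0) if idxs else np.zeros(dim)
        for idxs in alignment
    ])

# Toy data: 4 wordpieces with 3-dim vectors, aligned to 2 tokens.
wp = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [2.0, 2.0, 2.0]])
align = [[0, 1], [2, 3]]  # token 0 -> wordpieces 0,1; token 1 -> 2,3

vecs = pool_token_vectors(wp, align)
print(vecs)
```

With a real pipeline the same loop would run over the alignment stored on `doc._.trf_data` instead of the toy `align` list; mean pooling is just one reasonable choice here (last-wordpiece or max pooling are common alternatives).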