Mapping transformer vectors for long (and thus chunked) documents to spaCy tokens #9705
-
I'm opening a separate discussion for my question from another thread. The question is: what does the transformer actually do with the chunks? In this spaCy tutorial it appears we do get some sort of tensor that respects the mapping from a spaCy span's [start-end] to the transformer vectors (even though a span can fall inside two chunks if we use a stride), so we can pool it into one vector and thus get a span embedding. But how do I get a spaCy-token-wise tensor for each and every doc token from a chunked input text?
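To make the pooling I have in mind concrete, here is a minimal numpy-only sketch. All names and shapes here are my own assumptions, not actual spacy-transformers internals: I assume the chunked output is a tensor of shape `(n_chunks, seq_len, width)`, and that `align` maps each spaCy token to the indices of its wordpieces in the *flattened* `(n_chunks * seq_len)` view, so a token that straddles two chunks simply maps to rows from both.

```python
import numpy as np

def token_vectors(chunk_tensors: np.ndarray, align: list[list[int]]) -> np.ndarray:
    """Mean-pool wordpiece rows into one vector per spaCy token.

    chunk_tensors: hypothetical transformer output, shape (n_chunks, seq_len, width)
    align: for each token, the wordpiece row indices in the flattened tensor
    """
    width = chunk_tensors.shape[-1]
    flat = chunk_tensors.reshape(-1, width)      # (n_chunks * seq_len, width)
    out = np.zeros((len(align), width), dtype=flat.dtype)
    for i, wp_idxs in enumerate(align):
        if wp_idxs:                              # tokens with no wordpieces stay zero
            out[i] = flat[wp_idxs].mean(axis=0)  # a token may map into two chunks
    return out

# Toy example: 2 chunks of 3 wordpieces each, width 4.
tensors = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
# Token 2 maps to the last row of chunk 0 AND the first row of chunk 1
# (a stride overlap), which is exactly the case I'm asking about.
align = [[0], [1, 2], [2, 3]]
vecs = token_vectors(tensors, align)
print(vecs.shape)  # (3, 4)
```

If this is roughly what spacy-transformers does internally, my question reduces to: where do I get this flattened alignment for the whole doc?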
-
I'm not sure if this is related to the topic of chunk merging and whether I should create an issue instead, but my implementation of a wordpiece-aware span getter leads to random crashes with an error.
-
This looks like a useful example: https://applied-language-technology.readthedocs.io/en/latest/notebooks/part_iii/05_embeddings_continued.html