spaCy handles long documents in transformer pipelines using span getters: they pass slices of the original Doc to the transformer, and the resulting vectors are then combined into the final representation for the whole Doc. As a result, 256 tokens isn't a hard limit on the length of a Doc.

If you don't like how the default span getters work, you can implement a custom one. For your example document, it might make sense to split on newlines.
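
As a rough sketch of what a newline-based span getter could look like: a span getter is a function that takes a batch of Docs and returns one list of Spans per Doc. The names `configure_newline_spans` and `max_tokens` below are made up for illustration, and the cap counts spaCy tokens, not wordpieces, so treat it as an approximation rather than a guarantee that every span fits the model.

```python
from typing import Callable, Iterable, List

from spacy.tokens import Doc, Span


def _chunk(doc: Doc, start: int, end: int, max_tokens: int) -> List[Span]:
    # Cut doc[start:end] into pieces of at most max_tokens spaCy tokens.
    # An empty range (start == end) yields no spans at all.
    return [doc[i:min(i + max_tokens, end)] for i in range(start, end, max_tokens)]


def configure_newline_spans(
    max_tokens: int = 128,
) -> Callable[[Iterable[Doc]], List[List[Span]]]:
    """Hypothetical factory: build a span getter that splits Docs on newlines."""

    def get_newline_spans(docs: Iterable[Doc]) -> List[List[Span]]:
        all_spans = []
        for doc in docs:
            spans: List[Span] = []
            start = 0
            for token in doc:
                if "\n" in token.text:
                    # Close the current line, leaving out the newline token itself.
                    spans.extend(_chunk(doc, start, token.i, max_tokens))
                    start = token.i + 1
            # Whatever follows the last newline is the final line.
            spans.extend(_chunk(doc, start, len(doc), max_tokens))
            all_spans.append(spans)
        return all_spans

    return get_newline_spans
```

To use something like this from a training config, you'd import `spacy_transformers`, register the factory with `@spacy.registry.span_getters("your_name.v1")` (the name is your choice), and reference it in the `[components.transformer.model.get_spans]` block in place of the built-in getter.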

Answer selected by svlandeg
Labels
feat / ner Feature: Named Entity Recognizer feat / transformer Feature: Transformer