If you are using only transformer and not tok2vec (either as a separate component or internal to another component whose architecture includes some form of HashEmbed), then the custom tokenizer will mainly affect the tokenization you see in the resulting spaCy Doc. During training, the transformer component uses the transformer's own tokenizer internally, not the spaCy tokenization. However, if there are a lot of alignment issues between your gold annotations and the tokenization predicted by the custom tokenizer, this will affect training and evaluation for a component like ner, but that is true for both transformer and tok2vec.
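As a quick way to check that in practice, here is a minimal sketch (spaCy v3; the text and entity offsets are made up for illustration) that compares gold character-offset annotations against whatever tokenization your pipeline's tokenizer produces. Any token that doesn't line up with a gold span boundary comes back as `-` in the BILUO tags:

```python
# Minimal sketch (spaCy v3): check how gold character-offset annotations
# align with the tokenization a pipeline produces. The text and entity
# offsets below are hypothetical, for illustration only.
import spacy
from spacy.training import offsets_to_biluo_tags

nlp = spacy.blank("en")  # stand-in for a pipeline with a custom tokenizer

text = "Take 3.5mg daily"
entities = [(5, 10, "DOSE")]  # hypothetical gold span for "3.5mg"

doc = nlp(text)
tags = offsets_to_biluo_tags(doc, entities)
print(list(zip([t.text for t in doc], tags)))
# Tokens that don't line up with a gold span boundary are tagged "-".
# Misaligned tokens are treated as missing values for ner, so heavy
# misalignment degrades both training and evaluation.
```

If you see a lot of `-` tags across your corpus, that's the alignment problem described above, and it applies regardless of whether the embedding layer is a transformer or tok2vec.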

I think you could also potentially run into poor results…
