That is a pretty complicated use of the pipeline! I am glad it worked for you, but I have never heard of anyone else changing the tokenizer on a pipeline in the same document, and I am surprised you didn't run into issues.

Regarding the issue with `spacy.load`, what is happening is that when you load the pipeline, it also calls `import spacy.lang.zh` under the hood and has no way to see the custom code you defined. You should be able to define a custom language, use it at training/assemble time, and save a pipeline that way, though I think there are better ways to do this.
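
If you do go the custom-language route, the key constraint is that the registration code has to be importable before `spacy.load` runs, since loading only knows about `spacy.lang.zh` itself. A rough sketch of what that can look like (the names `CustomChinese` and `custom_zh` are made up here, and this assumes spaCy v3's `languages` registry):

```python
# Sketch: register a custom Chinese subclass so spacy.blank / spacy.load can find it.
# "custom_zh" and CustomChinese are hypothetical names.
import spacy
from spacy.lang.zh import Chinese


@spacy.registry.languages("custom_zh")
class CustomChinese(Chinese):
    lang = "custom_zh"
    # custom tokenizer / Defaults overrides would go here


nlp = spacy.blank("custom_zh")          # usable at training/assemble time as well
nlp.to_disk("./custom_zh_pipeline")

# Loading only works because the class above has already been registered in this
# process; in a real project the module doing the registration (or a
# spacy_languages entry point) has to be imported before spacy.load() is called.
nlp2 = spacy.load("./custom_zh_pipeline")
```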

It sounds like you only need the jieba segmentation for comparison, right? In that case I think it makes sense to have a separ…
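
For just comparing segmentations, a lightweight option is to keep a second, bare Chinese pipeline that only does jieba segmentation next to whatever main pipeline you load. A minimal sketch, assuming spaCy v3's built-in Chinese tokenizer config, that `jieba` is installed, and that a model package such as `zh_core_web_sm` is available:

```python
import spacy
from spacy.lang.zh import Chinese

# Main pipeline, loaded the normal way (assumes the package is installed)
nlp = spacy.load("zh_core_web_sm")

# Separate bare pipeline that only segments with jieba (needs `pip install jieba`)
jieba_nlp = Chinese.from_config({"nlp": {"tokenizer": {"segmenter": "jieba"}}})

text = "西门子将努力参与中国的三峡工程建设。"
print([t.text for t in nlp(text)])        # segmentation from the main model
print([t.text for t in jieba_nlp(text)])  # jieba segmentation, for comparison
```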
