Precision on spaCy pipeline and the possibility to use the same base models independently #10787
For details on preparing training data, see the training data section of the docs. For POS tags and dependency annotations in particular, it might be easiest to convert from a CoNLL-U file; see here.

You can add multiple copies of the same component to a pipeline; each instance just needs its own name. See the double NER project for an example of two NER components. However, using two part-of-speech taggers or dependency parsers won't really work, since there's only one place in a Doc object to put POS or dependency annotations.

Is the goal to compare the output of your model with the pretrained spaCy models? If so, it might make sense to just have two pipelines. Also, can you clarify why you're training a custom tagger/parser? It's not something that's required very often, so extra background on your problem might help us understand your goals better.
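A minimal sketch of the "multiple copies of the same component" idea, assuming spaCy v3: each `add_pipe` call with the same factory just needs a distinct `name`. (This uses a blank pipeline and NER for illustration; the second component here is untrained and would still need its own training.)

```python
import spacy

# Start from a blank English pipeline (no model download needed).
nlp = spacy.blank("en")

# Two instances of the same "ner" factory, each under its own name.
nlp.add_pipe("ner")                     # default name: "ner"
nlp.add_pipe("ner", name="ner_custom")  # second instance, distinct name

print(nlp.pipe_names)  # ['ner', 'ner_custom']
```

To reuse an already-trained component from another pipeline instead of a blank one, `nlp.add_pipe("ner", name="ner_custom", source=other_nlp)` copies it over under the new name.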
Hello,
I have a question about the spaCy pipeline and the right way to implement my idea. I'm trying to add a custom spaCy tagger, train it on custom data, and use it in a pipeline alongside the pretrained tagger of the model I'm using. From what I understand, adding this custom tagger to an existing pipeline would replace its existing tagger. Is there a way to add multiple instances of the same spaCy component to the same pipeline, in this fashion: ["original spaCy tagger", "custom tagger"]?
I have the same question for the base DependencyParser model.
If that's possible, could you point me to the way to do this and to correctly build the spaCy Doc objects for training (especially, how to point to the correct Doc attribute to store my custom tags)?
Thanks a lot!