SpanCategorizer for default NER models #9644
Is there a way to use the SpanCategorizer component to get confidence scores from the default trained models, such as en_core_web_trf and en_core_web_md? We don't want to train new models; we just need the confidence scores for NER and POS from the default models. How can this be done?
Replies: 1 comment 1 reply
You cannot get confidence scores from the default models without training anything. The tagger is not a model that generates a meaningful confidence score; it isn't structured like that. It's like asking for feathers from a horse.
If you have access to OntoNotes, you can train a spancat on the same data the NER models were trained on. If you do not have access to OntoNotes, there is another option, though I would not recommend it: use the pretrained models to annotate a lot of text, then use those annotations as training data for a spancat model. For NER this might work acceptably if you use enough text. I don't think this would work well for POS.
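For the second option, here is a minimal sketch of turning a pretrained pipeline's NER predictions into spancat training data. The helper names (`annotate_as_spans`, `build_silver_corpus`), the output path, and the use of `"sc"` as the spans key are my assumptions for illustration, not something from the spaCy docs for this exact workflow. The annotations are only as good as the pretrained model's predictions.

```python
import spacy
from spacy.tokens import DocBin

# Assumption: "sc" matches the spans_key in your spancat config.
SPANS_KEY = "sc"


def annotate_as_spans(nlp, texts, spans_key=SPANS_KEY):
    """Run `nlp` over `texts` and copy each doc's predicted entities
    into doc.spans[spans_key], where spancat reads its training spans."""
    docs = []
    for doc in nlp.pipe(texts):
        doc.spans[spans_key] = list(doc.ents)
        docs.append(doc)
    return docs


def build_silver_corpus(model_name, texts, out_path):
    """Hypothetical convenience wrapper: annotate `texts` with a
    pretrained pipeline and save the result as a .spacy file."""
    nlp = spacy.load(model_name)  # e.g. "en_core_web_md", must be installed
    doc_bin = DocBin()
    for doc in annotate_as_spans(nlp, texts):
        doc_bin.add(doc)
    doc_bin.to_disk(out_path)
```

You would call something like `build_silver_corpus("en_core_web_md", my_texts, "./train.spacy")` on a large text collection, then train a pipeline containing a `spancat` component on the resulting file with `python -m spacy train`. The scores on the spans that the trained spancat predicts come from that new model, not from the original NER component.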