Training one's own model with Prodigy + misc questions #11029

nlovell1 · 2022-06-25T19:20:40Z

It seems that for all of the Spanish pretrained models, there is no tagger element, so it cannot be trained using Prodigy's pos.teach feature. Does it have to do with the fact that Spanish models have rule-based tagging?

The behavior of pos.correct is also a little odd to me. When specifying a label with
prodigy pos.correct POS_correct es_core_news_lg ./data.json -l PROPN,NOUN,INTJ -U
I was only able to find examples of INTJ and PROPN highlighted in the dataset. Obvious examples of nouns that should have been highlighted were not. I think this is due to having the morphology appended to the POS tag for the fine-grained information.

This command
prodigy pos.correct POS_correct es_core_news_lg ./data.json -U
resulted in an error
✘ No --label argument set and no labels found in model

So,

How would one go about improving the POS tagging for these models?
- Would I have to just start from scratch?
- Is there any workaround to separate the fine tuned morphology from the POS tag and use pos.correct as normal?
- Is there any way to use pos.teach on these models?
- Which pipeline gives the predictions for the POS tags and the morphology in these models?
Is there any way to train morphology predictions? Lemma predictions?
Does the POS tagging of words influence the predictions that an NER model makes after training? How much can incorrect POS tagging affect the accuracy of a model?
When you run prodigy train, is this just the default training settings for spaCy in a different wrapper?
And a more open-ended question: I'm starting to get to the point where it's time to fine-tune my training. Word vectors, transformers, etc., are all beyond my knowledge. A post here suggested a few resources, but I found the linked guides, for example this one, still above my level.
What resources are recommended in the community to demystify this?

polm · 2022-06-26T06:26:13Z

To address just some points...

It seems that for all of the Spanish pretrained models, there is no tagger element, so it cannot be trained using Prodigy's pos.teach feature. Does it have to do with the fact that Spanish models have rule-based tagging?

The Spanish models do not use rule-based tagging. The post you link to states they have a rule-based tokenizer and lemmatizer. The Spanish models use a Morphologizer instead of a Tagger, so they predict Universal Dependencies coarse POS tags (aka UPOS tags), and use an attribute ruler to set fine-grained POS tags from that. I'm not specifically familiar with the data for the Spanish pipeline, but this is usually done when the training data only has UPOS tags.

It may be helpful to keep in mind that UPOS tags the Morphologizer sets correspond to tok.pos, while fine-grained tags the Tagger sets go in tok.tag.
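To make the tok.pos / tok.tag distinction concrete, here is a minimal sketch that lists both attributes plus the morphology for each token. The helper name describe_tokens is hypothetical, and the Doc is built by hand with illustrative annotations (not real model output) so the snippet runs without downloading a pipeline. The tag is deliberately left unset to show what an empty tok.tag looks like.

```python
import spacy
from spacy.tokens import Doc


def describe_tokens(doc):
    """Return (text, coarse UPOS, fine-grained tag, morphology) per token."""
    return [(t.text, t.pos_, t.tag_, str(t.morph)) for t in doc]


# Hand-built Doc: the annotation values below are illustrative assumptions,
# not the output of es_core_news_lg.
nlp = spacy.blank("es")
doc = Doc(
    nlp.vocab,
    words=["Madrid", "es", "bonita"],
    pos=["PROPN", "AUX", "ADJ"],
    morphs=["Number=Sing", "Mood=Ind|Number=Sing|Person=3", "Gender=Fem|Number=Sing"],
)
rows = describe_tokens(doc)
for row in rows:
    print(row)

# With the real pipeline installed, the same helper can be used like this:
# nlp = spacy.load("es_core_news_lg")
# print(nlp.pipe_names)
# print(describe_tokens(nlp("Madrid es bonita")))
```

Running the commented-out lines against the installed pipeline should show which components are present and which of the two attributes they actually fill in.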

The issue with Prodigy saying you have no labels and generally not working is probably because the training data has nothing in the tok.tag fields. One way you could train a Tagger model is to run the existing pipeline (or just the attribute ruler) over your training data to set the tags, and train a Tagger on that, which should then work with the existing Prodigy infrastructure.
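A rough sketch of the "train a Tagger on tags you already have" idea follows. It is not the exact workflow used for the released pipelines. The TRAIN_DATA pairs are hypothetical; in practice the tags would be copied from the output of the existing pipeline run over your own texts, and you would more likely serialize the docs and use spacy train with a config rather than an in-process loop.

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

# Hypothetical (words, tags) pairs standing in for tags produced by
# running the existing pipeline over your own data.
TRAIN_DATA = [
    (["Me", "gusta", "Madrid"], ["PRON", "VERB", "PROPN"]),
    (["Ana", "come", "pan"], ["PROPN", "VERB", "NOUN"]),
]

nlp = spacy.blank("es")
tagger = nlp.add_pipe("tagger")

# Build gold-standard examples; the tagger reads its targets from "tags".
examples = [
    Example.from_dict(Doc(nlp.vocab, words=words), {"tags": tags})
    for words, tags in TRAIN_DATA
]

# initialize() infers the label set from the examples.
optimizer = nlp.initialize(get_examples=lambda: examples)
for _ in range(30):
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

print(tagger.labels)
print([(t.text, t.tag_) for t in nlp("Ana come pan")])
```

The point of the sketch is that the resulting pipeline contains a tagger component with a non-empty label set, which is what the "no labels found in model" error above was complaining about.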

Is there any way to train morphology predictions? Lemma predictions?

The Morphologizer can be trained like any component, by setting the values on your training data. For lemmas you can use the EditTreeLemmatizer.
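As a minimal sketch of "setting the values on your training data", the snippet below trains a Morphologizer from scratch on two hand-written sentences. The sentences and feature values are made up for illustration, and a real setup would update the existing component with far more data. The gold annotations go in the "pos" and "morphs" keys of each example.

```python
import spacy
from spacy.tokens import Doc
from spacy.training import Example

# Hypothetical training data: one UPOS tag and one feature string per token.
TRAIN_DATA = [
    (
        ["Ana", "come", "pan"],
        ["PROPN", "VERB", "NOUN"],
        ["Number=Sing", "Number=Sing|Person=3", "Number=Sing"],
    ),
    (
        ["Ellos", "comen", "panes"],
        ["PRON", "VERB", "NOUN"],
        ["Number=Plur", "Number=Plur|Person=3", "Number=Plur"],
    ),
]

nlp = spacy.blank("es")
morphologizer = nlp.add_pipe("morphologizer")

examples = [
    Example.from_dict(Doc(nlp.vocab, words=words), {"pos": pos, "morphs": morphs})
    for words, pos, morphs in TRAIN_DATA
]

# Labels (combined POS + feature strings) are collected during initialize().
optimizer = nlp.initialize(get_examples=lambda: examples)
for _ in range(30):
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

print(morphologizer.labels)
print([(t.text, t.pos_, str(t.morph)) for t in nlp("Ana come pan")])
```

The trainable lemmatizer follows the same pattern, with gold lemmas supplied under the "lemmas" key; it is left out here to keep the sketch short.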

Does the POS tagging of words influence the predictions that an NER model makes after training? How much can incorrect POS tagging affect the accuracy of a model?

NER models don't directly use POS predictions. The only way they can interact is through a shared tok2vec, and sharing one is usually not beneficial and doesn't have a large effect in either direction. There are a couple of previous threads on this, like #9641.
