Training both NER and Neural Edit-Tree Lemmatization with a transformer #10554
-
We are currently preparing for the edit-tree lemmatizer to be included in mainline spaCy, so there will likely be models available in the future (hopefully spaCy 3.3). The models that Adriane Boyd trained for the UD benchmarks also use the edit-tree lemmatizer: https://explosion.ai/blog/ud-benchmarks-v3-2
You can train both at the same time if you have a single dataset annotated with both NER and lemmas. Otherwise, you will have to train on the two datasets separately, but you can still make both components part of one pipeline configuration. For example, you could train the lemmatizer first and then create a configuration for NER that sources the trained lemmatizer and marks it as frozen, so that it doesn't get updated while training NER.
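To make the second option concrete, here is a minimal sketch of what the NER config excerpt could look like. The component name `trainable_lemmatizer`, the source path, and the `replace_listeners` entry are placeholders that depend on how you trained the lemmatizer (in the experimental package the factory is registered under a different name):

```ini
# Hypothetical excerpt of the config for the NER training run; component
# names and paths are placeholders, not exact spaCy/experimental names.
[nlp]
lang = "xx"
pipeline = ["transformer", "trainable_lemmatizer", "ner"]

[components.trainable_lemmatizer]
# Reuse the already trained lemmatizer component from disk.
source = "training/lemmatizer-model"
# If the lemmatizer listens to a shared transformer, give it its own
# copy of the embedding layer so freezing it is safe (assumption: the
# lemmatizer model has a "model.tok2vec" listener to replace).
replace_listeners = ["model.tok2vec"]

[components.ner]
factory = "ner"

[training]
# Keep the sourced lemmatizer fixed while the NER component trains.
frozen_components = ["trainable_lemmatizer"]
```

With `replace_listeners`, the frozen lemmatizer keeps its own copy of the embeddings, so updating the shared transformer for NER should not degrade the lemmatizer's predictions.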
Which language/model are you referring to?
-
Hello,
I am trying to use the experimental Neural Edit-Tree Lemmatizer from the spacy-experimental repo with a transformer (xlm-roberta-base). I did not find any model on GitHub along with the lemmatizer, so I assume it needs to be trained. Ideally, we would like to train it alongside NER so that a single pipeline handles both lemmatization and NER.
Is it possible to train both at the same time? If so, how would you set up the training, given that our NER dataset has no lemmatization annotations and vice versa?
Finally, which dataset was used to train the Neural Edit-Tree Lemmatizer with xlm-roberta?
Thanks!