how to replicate the training of one of spacy's pipelines #9288
-
Hello, I'm trying to replicate the training of your pipeline named "fr_dep_news_trf" so that I can transpose it to another dataset with a different label scheme as accurately as possible. I'm using fr_dep_news_trf's config file (changing only the train and dev paths) and I'm training the models on Ubuntu 18.04 with PyTorch and CUDA 11.1. If necessary, the config.cfg and meta.json files of the trained models are attached. Thank you in advance.
Replies: 1 comment 2 replies
-
Hmm, as long as you're using UD French Sequoia v2.5 and the exact same config, that sounds unexpected. Our reported evaluation is on the dev set rather than the test set, so maybe that explains the difference? For that particular corpus I'd be surprised if the splits were so different, but for some UD corpora there are large differences/imbalances between test and the other splits. (We're concerned about repeatedly evaluating on the test sets in case we want to run a clean evaluation for a future publication, so we set the test sets aside and don't use them in our standard training setup.)
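To check whether the gap really comes from dev vs. test, it may help to put the two sets of scores side by side. Here is a rough sketch that only uses the standard library. It assumes both meta.json files keep their scores under a top-level "performance" key, which is how I understand spaCy v3 writes them, and that both were scored on the same split. The function names and the paths in the usage note are made up for illustration.

```python
import json
from pathlib import Path


def load_scores(meta_path):
    """Return the flat numeric scores from the "performance" block of a meta.json.

    Assumption: scores live under a top-level "performance" key.
    """
    meta = json.loads(Path(meta_path).read_text(encoding="utf8"))
    performance = meta.get("performance", {})
    # Keep only top-level numbers; nested per-label breakdowns and
    # missing (null) scores are skipped.
    return {
        name: float(value)
        for name, value in performance.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def compare_scores(reference, retrained):
    """Map each shared metric to (reference, retrained, retrained - reference)."""
    return {
        name: (reference[name], retrained[name], retrained[name] - reference[name])
        for name in sorted(reference.keys() & retrained.keys())
    }


def print_report(reference_meta, retrained_meta):
    """Print one line per metric that appears in both meta.json files."""
    rows = compare_scores(load_scores(reference_meta), load_scores(retrained_meta))
    for name, (ref, new, delta) in rows.items():
        print(f"{name:<12} {ref:.4f} {new:.4f} {delta:+.4f}")
```

For example, `print_report("fr_dep_news_trf/meta.json", "my_output/model-best/meta.json")`, where both paths are placeholders for wherever the released and retrained pipelines live. If the retrained numbers came from the test split, they should be recomputed on the dev split first, for instance with the `spacy evaluate` command, before reading anything into the differences.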