Ukrainian model proposal #10561
kurnosovv started this conversation in Language Support
Replies: 2 comments 6 replies
-
Thanks, it would be great to be able to add Ukrainian pipelines! Can you provide some more information about the sources/citations for the training data? If you'd prefer to discuss it over email, you can contact me at [email protected].
-
At the moment, the gold-standard Ukrainian-language datasets are:
To create the silver-standard data, the following steps were performed:
The resulting dataset is synthetic and is stored on my Google Drive (I have not published it anywhere yet).
-
@honnibal @adrianeboyd
I created spaCy configs for the Ukrainian language and would like to propose training the models and adding them to the official spaCy models registry.
Training configs are
All training sources are available under the MIT license. The training and evaluation data are silver-standard datasets for the Ukrainian language.
Could you train these models and add them to the official spaCy models registry? In my attempts, training stops much earlier than the specified num_epochs, which seems to lead to lower accuracy than could be achieved in the best case. Perhaps you could modify the training procedure or change the training config so that it produces higher-accuracy models. I am happy to assist and answer any questions.
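One possible explanation for the early stop, offered as a guess rather than a diagnosis: in spaCy v3 the `[training]` block of a config has a `patience` setting (the number of steps to continue without improvement in the evaluation score) alongside `max_epochs`, `max_steps` and `eval_frequency`. If the configs keep a non-zero `patience`, training can end long before the epoch limit is reached. The sketch below is a simplified stand-in for that kind of stopping rule, not spaCy's own code; the function name `stopping_step` and the score list are hypothetical, and the default values are assumed to match spaCy's documented defaults.

```python
def stopping_step(scores, eval_frequency=200, patience=1600, max_steps=20000):
    """Return (step, reason) at which a patience-based training loop would stop.

    scores[i] is the evaluation score measured at step (i + 1) * eval_frequency.
    patience=0 disables early stopping; max_steps=0 means no step limit.
    This is an illustrative simplification, not spaCy's implementation.
    """
    best_score, best_step = float("-inf"), 0
    step = 0
    for i, score in enumerate(scores):
        step = (i + 1) * eval_frequency
        # Track the best score seen so far and the step where it occurred.
        if score > best_score:
            best_score, best_step = score, step
        # Stop once too many steps have passed without a new best score.
        if patience and step - best_step >= patience:
            return step, "patience"
        # Stop once the hard step limit is reached.
        if max_steps and step >= max_steps:
            return step, "max_steps"
    return step, "scores exhausted"


# Hypothetical run: the score peaks at step 600 and then plateaus.
plateau = [0.5, 0.6, 0.7] + [0.69] * 20
print(stopping_step(plateau))              # stopped by patience
print(stopping_step(plateau, patience=0))  # early stopping disabled
```

If this is indeed the cause, setting `patience = 0` in `[training]` (or passing `--training.patience 0` to `python -m spacy train`) should disable early stopping so that the epoch or step limit applies instead. Whether that actually improves accuracy on this data is untested.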