Which corpora is the French accurate pipeline (fr_dep_news_trf) trained on? #8032
XavBeckers
started this conversation in
Language Support
Hi, the sources are included in In the v3.0.0 releases, the vectors information was accidentally left out of the metadata. The vectors for French are fastText cbow vectors trained on OSCAR (Common Crawl) and Wikipedia, published by us (Explosion).
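As a rough illustration of checking this yourself, here is a minimal sketch that lists the training sources recorded in a pipeline's metadata. It assumes the metadata is a dict with a `sources` list whose entries carry `name` and `url` keys; the helper `list_sources` and the `example_meta` values are hypothetical and only stand in for what an installed pipeline would report.

```python
def list_sources(meta):
    """Return (name, url) pairs from a pipeline metadata dict.

    Returns an empty list when no "sources" entry is present.
    """
    sources = meta.get("sources") or []
    return [(s.get("name", ""), s.get("url", "")) for s in sources]


# Hypothetical metadata, for illustration only. With the pipeline package
# installed, the real dict should be available through something like
# `spacy.load("fr_dep_news_trf").meta` (assumption: spaCy v3 is installed).
example_meta = {
    "lang": "fr",
    "name": "example_pipeline",
    "sources": [
        {"name": "Example Treebank", "url": "https://example.org/treebank"},
    ],
}

for name, url in list_sources(example_meta):
    print(f"{name}: {url}")
```

If the list comes back empty for a given release, that may simply mean the field was left out of the metadata, as happened with the vectors information in v3.0.0.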
Hello everyone!
As part of my graduate work, a linguistic evaluation of syntactic parsing results for French, I'm looking for information about spaCy's French pipeline. We are considering comparing the performance of pipelines trained on two label sets (UD and SUD) and carrying out a qualitative analysis of the results, focused more on the linguistic side than the statistical side.
For this purpose, I need to know whether the corpora used to train the original pipeline are known and available, so that I can train a custom pipeline with the SUD tag set on the same data (annotated with SUD) and keep the comparison fair.
Does anyone have any information about the training data for this particular pipeline?
Have a good afternoon!