With which corpora is the French accurate pipeline (fr_dep_new_trf) trained? #8032

XavBeckers · 2021-05-07T13:17:07Z

Hello everyone!
For my graduate work, a linguistic evaluation of syntactic parsing results in French, I'm looking for information about spaCy's French pipeline. We want to compare the performance of pipelines trained on two label sets (UD and SUD) and carry out a qualitative analysis of the results, focusing more on the linguistic side than the statistical side.
To do this, I need to know whether the corpora used to train the original pipeline are known and available, so that I can train a custom pipeline with the SUD tag set on the same data (annotated with SUD) and keep the comparison fair.
Does anyone have any information about the training data for this particular pipeline?
Have a good afternoon!

adrianeboyd · 2021-05-07T17:07:54Z

Hi, the sources are included in meta.json in the model package and shown for each model on the models page here: https://spacy.io/models/fr

In the v3.0.0 releases, the vectors information was accidentally left out of the metadata. The vectors for French are fastText cbow vectors trained on OSCAR (Common Crawl) and Wikipedia, published by us (Explosion).
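
For anyone who wants to check the sources locally, here is a minimal sketch of reading them out of a package's meta.json. It assumes the file has a top-level "sources" list whose entries are dictionaries with keys such as "name", "url" and "license"; the function name `list_sources` and the path you pass in are hypothetical, and the exact fields may differ between releases.

```python
import json
from pathlib import Path


def list_sources(meta_path):
    """Return (name, url, license) tuples from a pipeline's meta.json.

    Assumption: "sources" is a list of dicts; this is not checked against
    any particular spaCy release.
    """
    meta = json.loads(Path(meta_path).read_text(encoding="utf8"))
    # Missing keys fall back to an empty string; a missing or null
    # "sources" entry yields an empty list.
    return [
        (src.get("name", ""), src.get("url", ""), src.get("license", ""))
        for src in meta.get("sources") or []
    ]


if __name__ == "__main__":
    import sys

    # Usage: python list_sources.py path/to/meta.json
    for name, url, license_ in list_sources(sys.argv[1]):
        print(f"{name}\t{license_}\t{url}")
```

If the pipeline is already installed, loading it with `spacy.load(...)` and inspecting `nlp.meta` should expose the same dictionary without having to locate the file by hand.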
