OntoNotes 5.0 dependency parsing #8501
-
Hello everyone, I have a question about the dependency parsing dataset used in spaCy. Is the dataset publicly available anywhere? I found that it is OntoNotes 5.0, converted to CoNLL-U format with the Stanford CoreNLP parser. If the dataset is not available, would it be possible to release it publicly?
Replies: 2 comments 4 replies
-
OntoNotes is available for a fee from the LDC: https://catalog.ldc.upenn.edu/LDC2013T19 It is "public" in the sense that it is available to anyone. We can't release it the same way we do the spaCy source due to rights issues.
-
And to clarify a bit, it's OntoNotes 5, but converted with the ClearNLP converter, with some additional work to align OntoNotes/PTB in order to have raw texts for as many documents as possible. Some of the source/info links were accidentally left out of the v3.0.0 model metadata, but they'll be added back in v3.1.0. In the meantime, you can see the v2 info here (it's all the same): https://v2.spacy.io/api/annotation#dependency-parsing
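If you want to check what a given pipeline reports about its training data, a minimal sketch along these lines may help. It assumes the pipeline metadata is exposed as a dict via `nlp.meta` and that data sources, when present, sit under a `sources` key as either dicts with `name`/`url` fields or plain strings. The helper names `describe_sources` and `print_pipeline_sources` are made up for illustration, and `en_core_web_sm` is only an example package name.

```python
from typing import Any, Dict, List


def describe_sources(meta: Dict[str, Any]) -> List[str]:
    """Return one readable line per entry under the metadata's "sources" key."""
    lines = []
    # Assumption: "sources" may be missing or empty, as in the v3.0.0 case above.
    for source in meta.get("sources") or []:
        if isinstance(source, dict):
            name = source.get("name", "unknown")
            url = source.get("url")
            lines.append(f"{name} ({url})" if url else name)
        else:
            # Assumption: some metadata lists sources as plain strings.
            lines.append(str(source))
    return lines


def print_pipeline_sources(model_name: str) -> None:
    """Load an installed pipeline and print whatever sources it reports."""
    import spacy  # imported here so describe_sources works without spaCy

    nlp = spacy.load(model_name)
    for line in describe_sources(nlp.meta):
        print(line)
```

Calling `print_pipeline_sources("en_core_web_sm")` requires spaCy and that pipeline package to be installed. An empty result only means the metadata lists nothing, not that no data was used.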