
Hey there,

Unfortunately, the OntoNotes data itself is quite hard to work with in many ways, and as you say, it's not publicly available. However, you don't necessarily need it to understand how to format your data for the coref component. Running

python -m spacy project assets --extra

should download the LitBank dataset into the directory assets/litbank. Then running

python -m spacy project run prep-artifical-unit-test-data

preprocesses a single file assets/litbank/95_the_prisoner_of_zenda_brat.conll using the scripts/preprocess.py script.

You can look at the .conll files in assets/litbank to see the format that scripts/preprocess.py expects. Y…
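It's best to check the exact columns in the LitBank files themselves. As rough orientation, here is a minimal sketch that assumes the common CoNLL-2012 layout used for OntoNotes-style coreference data. In that layout the word form sits in the fourth column and the last column marks mentions with bracketed cluster ids, such as "(12" (mention starts), "12)" (mention ends) and "(12)" (one-token mention), with "-" meaning no annotation. The helper name parse_coref_column, its word_col parameter and the sample rows are hypothetical, for illustration only. This is not the logic in scripts/preprocess.py.

```python
import re
from collections import defaultdict

# One coreference annotation: "(12" opens, "12)" closes, "(12)" is a one-token mention.
_COREF_PART = re.compile(r"^(\()?(\d+)(\))?$")


def parse_coref_column(lines, word_col=3):
    """Hypothetical helper: read CoNLL-2012-style rows into tokens and clusters.

    Assumes whitespace-separated columns, the word form at index `word_col`
    and the coreference annotation in the last column ("-" means none).
    Returns (tokens, clusters); clusters maps a cluster id to a list of
    (start, end) token spans, inclusive, indexed across the whole document.
    """
    tokens = []
    clusters = defaultdict(list)
    open_starts = defaultdict(list)  # cluster id -> stack of unclosed start indices
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # sentence breaks and #begin/#end document markers
        cols = line.split()
        idx = len(tokens)
        tokens.append(cols[word_col])
        if cols[-1] == "-":
            continue
        for part in cols[-1].split("|"):
            m = _COREF_PART.match(part)
            if m is None:
                raise ValueError(f"unexpected coref annotation: {part!r}")
            cluster_id = int(m.group(2))
            if m.group(1):  # a mention of this cluster starts here
                open_starts[cluster_id].append(idx)
            if m.group(3):  # the most recently opened mention of this cluster ends here
                start = open_starts[cluster_id].pop()
                clusters[cluster_id].append((start, idx))
    return tokens, dict(clusters)


# Made-up rows (not taken from LitBank) in the assumed layout.
SAMPLE = """#begin document (example); part 000
example 0 0 John - - - - - - - (0)
example 0 1 said - - - - - - - -
example 0 2 he - - - - - - - (0)
example 0 3 left - - - - - - - -

example 0 0 The - - - - - - - (1
example 0 1 old - - - - - - - -
example 0 2 man - - - - - - - 1)
#end document
"""

tokens, clusters = parse_coref_column(SAMPLE.splitlines())
print(tokens)    # ['John', 'said', 'he', 'left', 'The', 'old', 'man']
print(clusters)  # {0: [(0, 0), (2, 2)], 1: [(4, 6)]}
```

Each cluster id gets its own stack of open start positions, so mentions of different entities can nest or overlap freely. Whether preprocess.py needs anything beyond these spans (for example, sentence boundaries or speaker information) is best confirmed by reading the script.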

Answer selected by svlandeg
Labels: feat / coref (Feature: Coreference resolution)