You can easily install discopy-data by using pip:
```shell
pip install git+https://github.com/rknaebel/discopy-data
```

Alternatively, clone the repository and install discopy-data through pip in editable mode:
```shell
pip install -e path/to/discopy-data
```

Discopy-data is the discopy backend that handles data structures, preprocessing, and dataset extraction.
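After either installation route, pip should place discopy-data's command-line tools on your `PATH`. The following Python sketch checks for the commands used in this README with `shutil.which`. The helper `missing_commands` is hypothetical and not part of discopy-data.

```python
import shutil

# Command-line tools invoked in the examples of this README.
DISCOPY_COMMANDS = [
    "discopy-tokenize",
    "discopy-add-parses",
    "discopy-extract",
    "discopy-add-annotations",
    "discopy-update-parses",
]


def missing_commands(names):
    """Return the names that cannot be found on PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_commands(DISCOPY_COMMANDS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All discopy-data commands are available.")
```

An empty result means every tool is reachable; if some are missing, the environment you installed into is probably not the active one.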
Preprocessing is done by two scripts.
The first, `discopy-tokenize`, uses trankit for tokenization, tagging, and dependency parsing.
The second, `discopy-add-parses`, adds constituency trees using the supar parser.
To have supar add dependency trees as well, pass the flag `-d`.
Both scripts can be chained with a pipe:

```shell
discopy-tokenize -i /some/examples/wsj_0336 | discopy-add-parses -c
```

Tokenization alone, without tagging or parsing, might be useful for neural pipelines that do not rely on linguistic features:
```shell
cat /some/text | discopy-tokenize --tokenize-only
```

Dataset extraction is still experimental. The supported datasets are listed in `cli/extract.py`.
```shell
discopy-extract pdtb /data/discourse/conll2016/ --use-gpu --limit 2 |
  discopy-add-annotations pdtb /data/discourse/conll2016/ --simple-connectives --sense-level 2 |
  discopy-update-parses --dependency-parser ''
```
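The extraction pipeline can also be driven from Python, for example to post-process its output in the same program. The sketch below chains the three commands with `subprocess.Popen`, reusing the arguments from the shell pipeline unchanged. It assumes, without confirmation from this README, that the tools exchange one JSON-encoded document per line; `run_pipeline` and `read_json_lines` are hypothetical helpers, not discopy-data APIs.

```python
import json
import subprocess


def run_pipeline(commands):
    """Connect commands stdout -> stdin like a shell pipe; return the last stdout as text."""
    if not commands:
        raise ValueError("need at least one command")
    procs = []
    prev_stdout = None  # the first stage inherits this process's stdin
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Drop our copy so only the downstream stage holds the read end.
            prev_stdout.close()
        procs.append(proc)
        prev_stdout = proc.stdout
    output, _ = procs[-1].communicate()
    # Wait for every stage and raise if any failed (similar in spirit to `set -o pipefail`).
    for cmd, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, cmd)
    return output.decode("utf-8")


def read_json_lines(text):
    """Parse one JSON value per non-empty line (ASSUMED output format of the tools)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Same stages as the shell pipeline, arguments copied verbatim ('' becomes "").
EXTRACTION_PIPELINE = [
    ["discopy-extract", "pdtb", "/data/discourse/conll2016/", "--use-gpu", "--limit", "2"],
    ["discopy-add-annotations", "pdtb", "/data/discourse/conll2016/",
     "--simple-connectives", "--sense-level", "2"],
    ["discopy-update-parses", "--dependency-parser", ""],
]
```

With discopy-data installed, `read_json_lines(run_pipeline(EXTRACTION_PIPELINE))` would return one parsed object per output line. Passing argument lists instead of a shell string avoids quoting issues such as the empty `''` argument.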