Penn Attribution Relation Corpus 3.0 (PARC 3.0), described in the corpus paper.
To obtain the corpus, contact its owner, Silvia Pareti. You will need valid LDC licenses for the Penn Treebank (PTB) and the Penn Discourse Treebank (PDTB).
Given a source directory, the reader looks for all files with the ".xml" extension in all nested sub-directories. Each document is read into a TextAnnotation instance with the following views, whose names are defined in the ViewNames class:
- TOKENS: TokenLabelView that keeps the gold tokenization from the corpus.
- SENTENCE: TokenLabelView that keeps the gold sentence split from the corpus.
- ATTRIBUTION_RELATION: PredicateArgumentView. Each attribution relation corresponds to one predicate-argument set: the "cue" of the relation serves as the predicate, and its "source" and "span" elements serve as arguments.
- POS (optional): TokenLabelView that keeps the POS tags from the corpus.
- LEMMA (optional): TokenLabelView that keeps the lemma of each token from the corpus.
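To make the ATTRIBUTION_RELATION mapping concrete, here is a minimal, self-contained sketch that models one attribution relation as a cue (the predicate) plus a list of argument spans. The Span and AttributionRelation types, the build helper, the example sentence, and the "Content" label for the relation's span argument are all hypothetical illustrations. This is not the CogComp PredicateArgumentView API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of one attribution relation (AR).
// NOT the CogComp PredicateArgumentView API; it only illustrates how an AR
// maps onto one predicate with its arguments.
public class AttributionSketch {

    // A labeled token span covering tokens [start, end).
    record Span(String label, int start, int end) {
        String text(String[] tokens) {
            return String.join(" ", Arrays.copyOfRange(tokens, start, end));
        }
    }

    // One AR = one predicate-argument set: the cue is the predicate,
    // the source and content spans are its arguments.
    record AttributionRelation(Span cue, List<Span> arguments) {}

    static AttributionRelation build(Span cue, List<Span> sources, List<Span> contents) {
        List<Span> args = new ArrayList<>(sources);  // sources first
        args.addAll(contents);                       // then content spans
        return new AttributionRelation(cue, List.copyOf(args));
    }

    public static void main(String[] argv) {
        String[] tokens = {"Analysts", "said", "the", "deal", "was", "off", "."};
        AttributionRelation ar = build(
                new Span("Cue", 1, 2),
                List.of(new Span("Source", 0, 1)),
                List.of(new Span("Content", 2, 6)));

        System.out.println("predicate: " + ar.cue().text(tokens));
        for (Span arg : ar.arguments()) {
            System.out.println("  " + arg.label() + ": " + arg.text(tokens));
        }
    }
}
```

Running main prints the cue "said" as the predicate, followed by its two arguments, "Analysts" (Source) and "the deal was off" (Content).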
The corpus follows the standard WSJ directory structure:
\PARC3
    \train
        \00
            wsj-0001.xml
            ...
        \01
            wsj-0101.xml
            ...
        ...
    \test
        \23
            ...
    \dev
        \24
            ...
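The reader's file discovery step (collect every ".xml" file in every nested sub-directory of the source directory) can be approximated with java.nio.file.Files.walk. The sketch below is an assumption about that behavior, not PARC3Reader's actual implementation. XmlFileFinder and findXmlFiles are hypothetical names, and the sort is added only to give a deterministic order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of recursive ".xml" discovery under a root directory.
// Hypothetical helper; not PARC3Reader's own code.
public class XmlFileFinder {

    static List<Path> findXmlFiles(Path root) throws IOException {
        // Files.walk visits root and all nested sub-directories;
        // try-with-resources closes the underlying directory stream.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".xml"))
                    .sorted()  // e.g. 00/wsj-0001.xml before 01/wsj-0101.xml
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "data/PARC3/train");
        for (Path p : findXmlFiles(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Because the search is recursive, pointing it at "data/PARC3/train" picks up the files in every section folder (00, 01, ...), while pointing it at "data/PARC3" would also include test and dev.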
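import edu.illinois.cs.cogcomp.core.datastructures.textannotation.TextAnnotation;
import edu.illinois.cs.cogcomp.core.utilities.configuration.ResourceManager;
import edu.illinois.cs.cogcomp.nlp.corpusreaders.parcReader.PARC3Reader;
import edu.illinois.cs.cogcomp.nlp.corpusreaders.parcReader.PARC3ReaderConfigurator;

// Read all training data with the default settings (gold POS and LEMMA are discarded)
PARC3Reader reader = new PARC3Reader("data/PARC3/train");

Alternatively, specify your own settings by creating a *.properties file; see PARC3ReaderConfigurator for the fields you should specify.

// Read with custom settings (the ResourceManager constructor may throw IOException)
PARC3Reader reader = new PARC3Reader(new ResourceManager("my-parc3-config.properties"));

PARC3Reader implements the Iterable<TextAnnotation> interface, and documents can be read in either of two ways: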
while (reader.hasNext()) {
    TextAnnotation doc = reader.next();
    // ... process doc
}

or

for (TextAnnotation doc : reader) {
    // ... process doc
}
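Why both loop styles work can be seen in a small, self-contained sketch of a reader that is its own Iterator (hasNext()/next()) and also Iterable (for-each). DocReaderSketch is a hypothetical class, with String standing in for TextAnnotation; it is not PARC3Reader's code, and whether PARC3Reader shares one cursor between the two styles is an assumption this sketch does not settle.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical reader that is both Iterable and its own Iterator.
// String stands in for TextAnnotation; not PARC3Reader's implementation.
public class DocReaderSketch implements Iterable<String>, Iterator<String> {
    private final List<String> docs;
    private int pos = 0;  // shared cursor

    DocReaderSketch(List<String> docs) {
        this.docs = docs;
    }

    @Override
    public boolean hasNext() {
        return pos < docs.size();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return docs.get(pos++);
    }

    // Returns this same object, so a for-each loop shares the cursor
    // with hasNext()/next(): once consumed, the reader is exhausted.
    @Override
    public Iterator<String> iterator() {
        return this;
    }

    public static void main(String[] args) {
        DocReaderSketch reader = new DocReaderSketch(List.of("wsj_0001", "wsj_0002"));
        while (reader.hasNext()) {
            System.out.println("while: " + reader.next());
        }
        for (String doc : reader) {
            System.out.println("for: " + doc);  // never runs: cursor is at the end
        }
    }
}
```

With this design a second pass over the data needs a new reader instance; creating a fresh PARC3Reader for each pass is the safe choice if its iteration behavior is unknown.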