* `-i`: the path to the parse file or a directory containing the parse files to convert.
* `-r`: if set, recursively process all files with the `-pe` extension in the subdirectories of the input directory.
* `-n`: if set, normalize the parse trees before the conversion.
* `-pe`: the extension of the parse files; required if the input path indicates a directory (default: `parse`).
* `-oe`: the extension of the output files (default: `ddg`).

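As an illustration of how the `-i`, `-r`, `-pe`, and `-oe` options interact, the following Python sketch selects input files the way the descriptions above suggest. It is not the converter's actual implementation: the function names are hypothetical, and appending the output extension to the input file name is an assumption, since the output naming scheme is not specified here.

```python
from pathlib import Path


def collect_parse_files(input_path, parse_ext="parse", recursive=False):
    """Return the parse files selected by -i, -pe, and -r (illustrative only)."""
    path = Path(input_path)
    # -i pointing at a single file: the extension filter does not apply.
    if path.is_file():
        return [path]
    pattern = f"*.{parse_ext}"
    # -r: also descend into subdirectories; otherwise only the top level.
    matches = path.rglob(pattern) if recursive else path.glob(pattern)
    return sorted(p for p in matches if p.is_file())


def output_path(parse_file, output_ext="ddg"):
    # Assumption: the -oe extension is appended to the input file name.
    # The real tool may name its output files differently.
    return Path(f"{parse_file}.{output_ext}")
```

For example, `collect_parse_files("corpus", recursive=True)` would gather every `*.parse` file under a hypothetical `corpus` directory, including those in nested subdirectories.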
## Corpora

DDG conversion has been tested on the following corpora. Some of these corpora require you to be a member of the [Linguistic Data Consortium](https://www.ldc.upenn.edu) (LDC). Retrieve the corpora from LDC and run the following command for each corpus to generate DDG.

We have internally updated these corpora to reduce annotation errors and produce a richer representation. If you want to take advantage of our latest updates, merge the original annotation with our annotation. You still need to retrieve the original corpora from LDC.

* [English Web Treebank](https://catalog.ldc.upenn.edu/LDC2012T13):

```
java -cp nlp4j-ddr.jar edu.emory.mathcs.nlp.bin.DDGMerge eng_web_tbk/data ddr/english/google/ewt tree
```

* [QuestionBank with Manually Revised Treebank Annotation 1.0](https://catalog.ldc.upenn.edu/LDC2012R121):

```
java -cp nlp4j-ddr.jar edu.emory.mathcs.nlp.bin.DDGMerge QB-revised.tree ddr/english/google/qb/QB-revised.tree.skel tree
```

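If you merge several corpora regularly, the commands above can be wrapped in a small script. The Python sketch below only rebuilds the two invocations shown above. The arguments are copied verbatim from those commands, while the helper names, the dictionary keys, and the jar sitting in the working directory are assumptions made for illustration.

```python
import subprocess

JAR = "nlp4j-ddr.jar"  # assumed to be in the current working directory
MERGE_CLASS = "edu.emory.mathcs.nlp.bin.DDGMerge"

# Arguments copied verbatim from the merge commands above.
CORPORA = {
    "English Web Treebank": ("eng_web_tbk/data", "ddr/english/google/ewt", "tree"),
    "QuestionBank": ("QB-revised.tree", "ddr/english/google/qb/QB-revised.tree.skel", "tree"),
}


def merge_command(args, jar=JAR):
    """Build the argument vector for one DDGMerge invocation."""
    return ["java", "-cp", jar, MERGE_CLASS, *args]


def run_all(dry_run=True):
    """Print each merge command, or execute it when dry_run is False."""
    for name, args in CORPORA.items():
        cmd = merge_command(args)
        if dry_run:
            print(f"{name}: {' '.join(cmd)}")
        else:
            # check=True raises CalledProcessError if java exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

The dry run is the default so that the paths can be checked against your local copy of the LDC data before anything is executed.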
## Format

DDG is represented in the tab-separated values (TSV) format, where each column represents a different field. Semantic roles are indicated in the `feats` column with the key `sem`.

```
1	You	you	PRP	_	3	nsbj	7:nsbj	O
2	can	can	MD	_	3	modal	_	O
3	ascend	ascend	VB	_	0	root	_	O
4	Victoria	victoria	NNP	_	5	com	_	B-LOC
5	Peak	peak	NNP	_	3	obj	_	L-LOC
6	to	to	TO	_	7	aux	_	O
7	get	get	VB	sem=prp	3	advcl	_	O
8	a	a	DT	_	10	det	_	O
9	panoramic	panoramic	JJ	_	10	attr	_	O
10	view	view	NN	_	7	obj	_	O
11	of	of	IN	_	16	case	_	O
12	Victoria	victoria	NNP	_	13	com	_	B-LOC
13	Harbor	harbor	NNP	_	16	poss	_	I-LOC
14	's	's	POS	_	13	case	_	L-LOC
15	beautiful	beautiful	JJ	_	16	attr	_	O
16	scenery	scenery	NN	_	10	ppmod	_	O
17	.	.	.	_	3	p	_	O
```

* `id`: current token ID (starting at 1).
* `form`: word form.
* `lemma`: lemma.
* `pos`: part-of-speech tag.
* `feats`: extra features; different features are delimited by `|`, keys and values are delimited by `=` (`_` indicates no feature).
* `headId`: head token ID.
* `deprel`: dependency label.
* `sheads`: secondary heads (`_` indicates no secondary head).
* `nament`: named entity tags in the `BILOU` notation if the annotation is available.
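
The fields above can be read with a few lines of Python. This is a minimal sketch, not part of the toolkit: the `Token` class and the function names are hypothetical, each row is assumed to carry all nine columns, and secondary heads are assumed to be written as `headId:label` (as in `7:nsbj`) and joined by `|` when there are several.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    pos: str
    feats: dict    # e.g. {"sem": "prp"}; empty when the column is "_"
    head_id: int
    deprel: str
    sheads: list   # [(head_id, label), ...]; empty when the column is "_"
    nament: str


def parse_feats(column):
    """Split `key=value` pairs delimited by `|`; `_` means no feature."""
    if column == "_":
        return {}
    return dict(pair.split("=", 1) for pair in column.split("|"))


def parse_sheads(column):
    """Parse secondary heads; the `|` delimiter is an assumption."""
    if column == "_":
        return []
    heads = []
    for item in column.split("|"):
        head, label = item.split(":", 1)
        heads.append((int(head), label))
    return heads


def parse_token(line):
    """Turn one tab-separated row into a Token."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    return Token(int(cols[0]), cols[1], cols[2], cols[3], parse_feats(cols[4]),
                 int(cols[5]), cols[6], parse_sheads(cols[7]), cols[8])


def entity_spans(tokens):
    """Decode BILOU tags into (label, first_token_id, last_token_id) tuples."""
    spans, start = [], None
    for tok in tokens:
        prefix, _, label = tok.nament.partition("-")
        if prefix == "U":                       # single-token entity
            spans.append((label, tok.id, tok.id))
        elif prefix == "B":                     # entity begins
            start = tok.id
        elif prefix == "L" and start is not None:
            spans.append((label, start, tok.id))  # entity ends
            start = None
    return spans
```

Applied to the sample sentence, the semantic role of token 7 is available as `parse_token(row).feats["sem"]`, and tokens 12 to 14 decode to a single `LOC` span.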