Commit 103241b

Modify the default path of DuEE-fin (#2132)
1 parent 63ab917 commit 103241b

2 files changed: +11 -6 lines changed

examples/information_extraction/DuEE/README.md

Lines changed: 2 additions & 2 deletions
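@@ -51,8 +51,8 @@ f1_score = (2 * P * R) / (P + R), where

 ### Quick baseline reproduction, Step 1: preprocess and load the data

-Download the dataset from the competition website, extract it into the data/DuEE-Fin directory, and preprocess the raw data into sequence-labeling format.
-The processed data is also placed under data/DuEE-Fin: trigger-recognition data files are stored under data/DuEE-Fin/role, and argument-role-recognition data files under data/DuEE-Fin/trigger.
+Download the dataset from the competition website, extract it level by level into the data/DuEE-fin directory, and run the following script to preprocess the raw data into sequence-labeling format.
+The processed data is placed under data/DuEE-Fin: trigger-recognition data files are stored under data/DuEE-Fin/role, and argument-role-recognition data files under data/DuEE-Fin/trigger.
 Enum-classification data is stored under data/DuEE-Fin/enum.

 ```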
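The updated README step asks for the archives to be extracted level by level into data/DuEE-fin. A minimal sketch for checking that layout before running the preprocessing script, assuming the input paths are exactly the ones hard-coded in duee_fin_data_prepare.py after this commit. The names `EXPECTED_FILES` and `missing_inputs` are hypothetical and not part of the repository.

```python
import os

# Input paths hard-coded in duee_fin_data_prepare.py after this commit,
# relative to the DuEE example directory. The train/dev/test archives are
# assumed to extract into a folder named after the JSON file they contain.
# EXPECTED_FILES and missing_inputs are hypothetical helpers.
EXPECTED_FILES = [
    "./data/DuEE-fin/duee_fin_event_schema.json",
    "./data/DuEE-fin/duee_fin_train.json/duee_fin_train.json",
    "./data/DuEE-fin/duee_fin_dev.json/duee_fin_dev.json",
    "./data/DuEE-fin/duee_fin_test2.json/duee_fin_test2.json",
]


def missing_inputs(root="."):
    """Return the expected input files that do not exist under `root`."""
    return [p for p in EXPECTED_FILES
            if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing inputs (check the extraction layout):")
        for path in missing:
            print("  " + path)
    else:
        print("All DuEE-fin inputs found.")
```

Running it from the example directory lists any file that ended up one level too shallow or too deep after extraction.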

examples/information_extraction/DuEE/duee_fin_data_prepare.py

Lines changed: 9 additions & 4 deletions
@@ -217,7 +217,9 @@ def docs_data_process(path):
     # schema process
     print("\n=================DUEE FINANCE DATASET==============")
     conf_dir = "./conf/DuEE-Fin"
-    schema_path = "{}/event_schema.json".format(conf_dir)
+    if not os.path.exists(conf_dir):
+        os.makedirs(conf_dir)
+    schema_path = "./data/DuEE-fin/duee_fin_event_schema.json"
     tags_trigger_path = "{}/trigger_tag.dict".format(conf_dir)
     tags_role_path = "{}/role_tag.dict".format(conf_dir)
     tags_enum_path = "{}/enum_tag.dict".format(conf_dir)
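The added lines create ./conf/DuEE-Fin when it is missing, so the tag dictionaries can be written into it. Since Python 3.2, `os.makedirs(path, exist_ok=True)` does the same in one call and removes the small window between the existence check and the creation. A minimal sketch of that alternative, not what the commit does; `ensure_dir` and `tag_dict_paths` are hypothetical helper names.

```python
import os


def ensure_dir(path):
    """Create `path` and any missing parents; a no-op if it already exists."""
    # exist_ok=True folds the os.path.exists() check into the call, so another
    # process creating the directory between "check" and "create" cannot make
    # os.makedirs raise FileExistsError.
    os.makedirs(path, exist_ok=True)
    return path


def tag_dict_paths(conf_dir="./conf/DuEE-Fin"):
    """Return the tag-dictionary paths the script uses, creating conf_dir first."""
    ensure_dir(conf_dir)
    return {
        "trigger": "{}/trigger_tag.dict".format(conf_dir),
        "role": "{}/role_tag.dict".format(conf_dir),
        "enum": "{}/enum_tag.dict".format(conf_dir),
    }
```

Calling `ensure_dir` repeatedly on the same path is safe, which is the behavior the `if not os.path.exists(...)` guard in the commit provides.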
@@ -245,11 +247,14 @@ def docs_data_process(path):
     print("\n********** start document process **********")
     if not os.path.exists(sentence_dir):
         os.makedirs(sentence_dir)
-    train_sent = docs_data_process("{}/duee_fin_train.json".format(data_dir))
+    train_sent = docs_data_process(
+        "./data/DuEE-fin/duee_fin_train.json/duee_fin_train.json")
     write_by_lines("{}/train.json".format(sentence_dir), train_sent)
-    dev_sent = docs_data_process("{}/duee_fin_dev.json".format(data_dir))
+    dev_sent = docs_data_process(
+        "./data/DuEE-fin/duee_fin_dev.json/duee_fin_dev.json")
     write_by_lines("{}/dev.json".format(sentence_dir), dev_sent)
-    test_sent = docs_data_process("{}/duee_fin_test1.json".format(data_dir))
+    test_sent = docs_data_process(
+        "./data/DuEE-fin/duee_fin_test2.json/duee_fin_test2.json")
     write_by_lines("{}/test.json".format(sentence_dir), test_sent)
     print("train {} dev {} test {}".format(
         len(train_sent), len(dev_sent), len(test_sent)))
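This hunk replaces paths built from `data_dir` with literal paths that include the extra directory produced by extracting each archive (for example `duee_fin_train.json/duee_fin_train.json`). If you want the script to accept either layout, one option is a small resolver that tries the nested path first and falls back to the flat one. The helper name `resolve_split` and its fallback behavior are assumptions for illustration, not part of the commit.

```python
import os


def resolve_split(data_dir, filename):
    """Return the path to `filename` under `data_dir`.

    Tries the nested layout (data_dir/filename/filename) produced by
    extracting each archive into a folder of the same name, then the flat
    layout (data_dir/filename). Raises FileNotFoundError if neither exists.
    """
    candidates = [
        os.path.join(data_dir, filename, filename),
        os.path.join(data_dir, filename),
    ]
    for path in candidates:
        # os.path.isfile returns False (rather than raising) for paths that
        # pass through a regular file, so the nested candidate is safe to test
        # even when the flat layout is in use.
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "{} not found; tried: {}".format(filename, ", ".join(candidates)))
```

With this helper, a call such as `docs_data_process(resolve_split("./data/DuEE-fin", "duee_fin_train.json"))` would work whether or not the archive created the extra directory level.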