This repository was archived by the owner on Nov 8, 2022. It is now read-only.
bug: ECB Alignment issues with raw ECB files #158
Open
Labels
bug: Something isn't working
Description
I've been looking through your processed ECB data (thanks for sharing a processed version) and cross-checking it against the original files.
There seems to be an alignment issue. Take this entry from your data at https://raw.githubusercontent.com/NervanaSystems/nlp-architect/master/datasets/ecb/ecb_all_event_mentions.json:
{
    "coref_chain": "ACT15731460277214564",
    "doc_id": "1_21ecbplus.xml",
    "is_continuous": true,
    "is_singleton": false,
    "mention_head": "agreed",
    "mention_head_lemma": "agree",
    "mention_head_pos": "VERB",
    "mention_id": "1_21ecbplus.xml_6_15",
    "mention_ner": null,
    "mention_type": "ACT",
    "predicted_coref_chain": null,
    "score": -1.0,
    "sent_id": 6,
    "tokens_number": [
        15
    ],
    "tokens_str": "agreed",
    "topic_id": "1_ecbplus"
},
If I then go back to the raw ECB XML files and look at sentence 6 in file 1_21ecbplus, the corresponding tokens are:
<token t_id="106" sentence="6" number="0">Nothing</token>
<token t_id="107" sentence="6" number="1">bad</token>
<token t_id="108" sentence="6" number="2">is</token>
<token t_id="109" sentence="6" number="3">going</token>
<token t_id="110" sentence="6" number="4">to</token>
<token t_id="111" sentence="6" number="5">happen</token>
<token t_id="112" sentence="6" number="6">.</token>
<token t_id="113" sentence="6" number="7">"</token>
None of these tokens is "agreed", and the sentence as listed has no token number 15. The reason I want to check this is that, if I pull out the full token list from the raw file and attach it to this payload, the alignment is off.
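For reference, the comparison can be sketched roughly as below. This is a minimal sketch and not code from the repository: the helper names `sentence_tokens` and `check_mention` are hypothetical, the `<Document>` wrapper element is an assumption about the raw file layout, and it assumes `tokens_str` is a whitespace-separated join of the mention tokens. The sample data is only the excerpt quoted above, not the full file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def sentence_tokens(xml_text):
    """Map sentence id -> {token number -> token text} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    sentences = defaultdict(dict)
    for tok in root.iter("token"):
        sentences[int(tok.get("sentence"))][int(tok.get("number"))] = tok.text
    return sentences

def check_mention(mention, sentences):
    """Return (matches, found_tokens) for one processed mention entry."""
    sent = sentences.get(mention["sent_id"], {})
    # A token number missing from the sentence shows up as None.
    found = [sent.get(n) for n in mention["tokens_number"]]
    # Assumption: tokens_str is the mention tokens joined by single spaces.
    expected = mention["tokens_str"].split()
    return found == expected, found

# Excerpt of sentence 6 as quoted above; the <Document> root is an assumption.
raw_xml = """<Document>
<token t_id="106" sentence="6" number="0">Nothing</token>
<token t_id="107" sentence="6" number="1">bad</token>
<token t_id="108" sentence="6" number="2">is</token>
<token t_id="109" sentence="6" number="3">going</token>
<token t_id="110" sentence="6" number="4">to</token>
<token t_id="111" sentence="6" number="5">happen</token>
<token t_id="112" sentence="6" number="6">.</token>
<token t_id="113" sentence="6" number="7">"</token>
</Document>"""

mention = {"sent_id": 6, "tokens_number": [15], "tokens_str": "agreed"}

sentences = sentence_tokens(raw_xml)
ok, found = check_mention(mention, sentences)
print(ok, found)
```

Running the same check over every entry in the JSON file would show whether the mismatch is limited to this mention or affects a whole document or topic.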
Is this a bug, or am I looking at this incorrectly?