[Feature] Add Ner Suffix feature #1123
base: v0.x
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1123      +/-   ##
==========================================
- Coverage   88.34%   88.21%   -0.13%
==========================================
  Files          66       66
  Lines        6290     6290
==========================================
- Hits         5557     5549       -8
- Misses        733      741       +8
Job PR-1123/1 is complete.
                        help='Learning rate for optimization')
arg_parser.add_argument('--warmup-ratio', type=float, default=0.1,
                        help='Warmup ratio for learning rate scheduling')
arg_parser.add_argument('--tagging-first-token', type=str2bool, default=True,
How about parser.add_argument('--tag-last-token', action='store_true')? It seems simpler to call finetune_bert.py --tag-last-token than finetune_bert.py --tagging-first-token=False.
In either case, please update the test case in scripts/tests/ to invoke finetune_bert.py with both options. You can parametrize the test following, for example, Haibin's recent PR: https://github.com/dmlc/gluon-nlp/pull/1121/files#diff-fa82d34d543ff657c2fe09553bd0fa34R234
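A minimal sketch of what the suggested flag and a parametrized test could look like (the test name, and checking only the parsed value rather than running the full script, are illustrative assumptions, not the repository's actual test harness):

import argparse
import pytest

def build_parser():
    arg_parser = argparse.ArgumentParser()
    # Boolean flag: absent -> False (tag the first sub-word), present -> True (tag the last one)
    arg_parser.add_argument('--tag-last-token', action='store_true',
                            help='Use the last sub-word piece of each word instead of the first')
    return arg_parser

@pytest.mark.parametrize('tag_last_token', [False, True])
def test_finetune_bert_tag_last_token(tag_last_token):
    argv = ['--tag-last-token'] if tag_last_token else []
    args = build_parser().parse_args(argv)
    assert args.tag_last_token is tag_last_token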
Sure, I will update it.
Have you found any performance differences?
@sxjscience I've tried the default parameters set in the scripts on the CoNLL-2003 dataset. The performance using the suffix feature is a little lower than using the prefix feature.
I think we can try the following:
One problem is that, since we are using self-attention, we can tailor the attention weights to cover the first, last, and average cases. Thus, I don't think selecting the first/last token will impact the performance much.
@sxjscience For a classification task, I think it does not matter. But in a sequence labeling task, each word has one label. If we break a word 'w' into several subwords [sw1, sw2, ...], then only sw1 will have the label, and the labels of the others will be set to NULL. I don't think that makes sense.
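To make the point concrete, here is a small illustration (the word split, the tag, and the NULL_TAG value are made up for the example) of how a word-level tag lands on only one sub-word piece under the first-piece vs. last-piece schemes:

NULL_TAG = 'X'  # placeholder tag for sub-word pieces that carry no label

def assign_subword_tags(pieces, word_tag, tag_first=True):
    # Give the word's tag to exactly one piece; the rest get NULL_TAG.
    tags = [NULL_TAG] * len(pieces)
    tags[0 if tag_first else -1] = word_tag
    return tags

pieces = ['Wash', '##ington']                                  # assumed sub-word split
print(assign_subword_tags(pieces, 'B-LOC', tag_first=True))    # ['B-LOC', 'X']
print(assign_subword_tags(pieces, 'B-LOC', tag_first=False))   # ['X', 'B-LOC']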
  dataset = BERTTaggingDataset(text_vocab, None, None, config.test_path,
-                              config.seq_len, train_config.cased, tag_vocab=tag_vocab)
+                              config.seq_len, train_config.cased, tag_vocab=tag_vocab,tagging_first_token=config.tagging_first_token)
Pls add white space after the comma.
Because we are using attention, the state bound to sw1 will be related to the other sub-words; the same holds for sw_n. A reasonable approach is to mask the loss corresponding to the other sub-word tokens and only use the state of the first sub-word as the contextualized word embedding.
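As a rough sketch of that idea (the shapes and mask values below are made up; in practice the mask would flag each word's first sub-word position), Gluon's sample_weight argument can zero out the loss at the remaining sub-word positions:

import mxnet as mx

batch_size, seq_len, num_tags = 2, 6, 5
logits = mx.nd.random.normal(shape=(batch_size, seq_len, num_tags))
labels = mx.nd.random.randint(0, num_tags, shape=(batch_size, seq_len)).astype('float32')
# 1.0 where a position is a word's first sub-word, 0.0 for the remaining pieces
first_subword_mask = mx.nd.array([[1, 1, 0, 1, 0, 0],
                                  [1, 0, 0, 1, 1, 1]])

loss_fn = mx.gluon.loss.SoftmaxCrossEntropyLoss()
# sample_weight multiplies the per-token loss, so masked positions contribute nothing
loss = loss_fn(logits, labels, first_subword_mask.expand_dims(axis=2)).mean()
print(loss.asscalar())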
entries.append(PredictedToken(text=text,
                              true_tag=true_tag, pred_tag=pred_tag))
tmptext = ''
else:
Can both cases be merged here? For example, if len(tmptext) == 0, you can still have text = tmptext + token_text which is equivalent to token_text.
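A small sketch of the suggested merge (PredictedToken is recreated here as a namedtuple purely for illustration; the field names follow the diff above):

from collections import namedtuple

PredictedToken = namedtuple('PredictedToken', ['text', 'true_tag', 'pred_tag'])

def append_entry(entries, tmptext, token_text, true_tag, pred_tag):
    # With tmptext defaulting to '', both cases collapse into one:
    # '' + token_text is simply token_text, so no separate branch is needed.
    entries.append(PredictedToken(text=tmptext + token_text,
                                  true_tag=true_tag, pred_tag=pred_tag))
    return ''  # reset the accumulator after flushing a word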
                              true_tag=true_tag, pred_tag=pred_tag))

if true_tag == NULL_TAG:
    tmptext += token_text
Better to name it tmp_text. Or what about partial_text?
@sxjscience I agree with you, and I'll try this method.
I am confused about this part. Why is masking the loss of the other sub-word tokens reasonable? For example, on NER tasks the suffix is much more important than the prefix in words like …
Since we are using attention, the higher-level state associated with …
@sxjscience Do you think we should continue with this pull request?
Description
Add a parameter "tagging_first_token" so you can choose to use the first piece or the last piece of each word. The first piece captures the prefix feature of a word, and the last piece captures the suffix feature.
Checklist
Essentials
Comments