
Small modifications to enhance the baseline performance like dev EM = 75% #2

Open
akeyhero wants to merge 5 commits into SkelterLabsInc:main from akeyhero:main

Conversation


@akeyhero akeyhero commented Mar 1, 2022

Thank you for sharing the great Japanese QA dataset!

I would like to share my changes, which improve the baseline performance by 10%+ (EM).

Inference log:

  1/3939 | EM: 1.0000, F1: 1.0000
        (Sample) pred: "奈良", answer: "奈良"
Token indices sequence length is longer than the specified maximum sequence length for this model (538 > 512). Running this sequence through the model will result in indexing errors
  201/3939 | EM: 0.7861, F1: 0.8720
        (Sample) pred: "スティーブンズ・プリンター社", answer: "スティーブンズ・プリンター社"
  401/3939 | EM: 0.7781, F1: 0.8698
        (Sample) pred: "湿潤状態", answer: "湿潤状態"
  601/3939 | EM: 0.7604, F1: 0.8593
        (Sample) pred: "1881年", answer: "1881年"
  801/3939 | EM: 0.7491, F1: 0.8490
        (Sample) pred: "器具メーカー", answer: "光源メーカー"
  1001/3939 | EM: 0.7652, F1: 0.8555
        (Sample) pred: "Graduation", answer: "Graduation"
  1201/3939 | EM: 0.7619, F1: 0.8515
        (Sample) pred: "黒田孝高", answer: "黒田孝高"
  1401/3939 | EM: 0.7523, F1: 0.8445
        (Sample) pred: "煙害", answer: "煙害"
  1601/3939 | EM: 0.7552, F1: 0.8440
        (Sample) pred: "カルロス門", answer: "カルロス門"
  1801/3939 | EM: 0.7601, F1: 0.8479
        (Sample) pred: "「一心寮」", answer: "「一心寮」"
  2001/3939 | EM: 0.7651, F1: 0.8517
        (Sample) pred: "2015年7月18日", answer: "2015年7月18日"
  2201/3939 | EM: 0.7665, F1: 0.8517
        (Sample) pred: "SE車", answer: "SE車"
  2401/3939 | EM: 0.7668, F1: 0.8523
        (Sample) pred: "久原房之助", answer: "久原房之助"
  2601/3939 | EM: 0.7655, F1: 0.8507
        (Sample) pred: "1900年", answer: "地方議員"
  2801/3939 | EM: 0.7608, F1: 0.8491
        (Sample) pred: "藤山一郎", answer: "東海林太郎"
  3001/3939 | EM: 0.7614, F1: 0.8503
        (Sample) pred: "フィリップ・ファンデンベルク", answer: "フィリップ・ファンデンベルク"
  3201/3939 | EM: 0.7619, F1: 0.8500
        (Sample) pred: "大峯奥駈道", answer: "大峯奥駈道"
  3401/3939 | EM: 0.7601, F1: 0.8482
        (Sample) pred: "エラーヒューゼン", answer: "アルリック・エラーヒューゼン"
  3601/3939 | EM: 0.7540, F1: 0.8435
        (Sample) pred: "「道路標示黄色見本」", answer: "「道路標示黄色見本」"
  3801/3939 | EM: 0.7506, F1: 0.8402
        (Sample) pred: "『ヘントの祭壇画』", answer: "『ヘントの祭壇画』"
F1 score: 0.8404927719328006
Exact Match: 0.751967504442752

Performance by types:

[Screenshot 2022-03-01 15:26:33]

akeyhero added 3 commits March 1, 2022 14:34
Small modifications to enhance the baseline performance like dev EM = 75%
    val += [padding] * pad_len
    return val

    for i in range(0, input_len - max_seq_len + stride, stride):
Author

@akeyhero akeyhero Mar 1, 2022


This range will be empty when input_len <= max_seq_len - stride
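A quick numeric sketch of this failure mode (the numbers are hypothetical, not from the notebook):

```python
max_seq_len, stride = 512, 128

# If input_len <= max_seq_len - stride (here, 384), the stop value
# input_len - max_seq_len + stride is <= 0, so the range is empty
# and no window at all is produced for that input.
input_len = 300
starts = list(range(0, input_len - max_seq_len + stride, stride))
print(starts)  # -> []
```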

Comment on lines +198 to +199
step = max_seq_len - question_len - stride
for i in range(0, max(context_len - stride, step), step):
Author

@akeyhero akeyhero Mar 1, 2022


A stride is the number of overlapping tokens between consecutive spans, in the Hugging Face manner (if I am correct).
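A sketch with hypothetical numbers of how the proposed loop behaves: the range is always non-empty, so even a short context yields one span, and on long contexts consecutive spans overlap by `stride` tokens.

```python
max_seq_len, question_len, stride = 512, 30, 128
step = max_seq_len - question_len - stride  # 354 new context tokens per span

# max(context_len - stride, step) keeps the range non-empty, so a short
# context still produces one span starting at 0; a long context produces
# span starts `step` apart (each span overlaps the previous by `stride`).
for context_len in (100, 1000):
    starts = list(range(0, max(context_len - stride, step), step))
    print(context_len, starts)
# 100  -> [0]
# 1000 -> [0, 354, 708]
```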



Thank you for this comment, but we chose to keep the original meaning of stride.

As you say, the stride of a Tokenizer means the length of the overlapping tokens.
However, Hugging Face sometimes uses stride as the interval between two spans (e.g. squad.py).
In my view, this is implementation specific.

Author


Thank you for your comment. That's so confusing 😭

answer_start_index = ctx_start
answer_end_index = len(offsets) - 1
while offsets[answer_start_index][0] < start_char:
while offsets[answer_start_index][1] < start_char:
Author

@akeyhero akeyhero Mar 1, 2022


One may not like this change, but I prefer inclusive answer chunks.

e.g. where 分間 is a single token:
Original answer: 九十分
Previous answer chunk: 九十
Proposed answer chunk: 九十分間
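A sketch of the inclusive idea with hypothetical offsets (not the notebook's exact code): a boundary token is kept whenever it overlaps the answer's character span at all.

```python
# Context "九十分間", tokenized as 九 | 十 | 分間, with character offsets:
offsets = [(0, 1), (1, 2), (2, 4)]
start_char, answer = 0, "九十分"
end_char = start_char + len(answer)  # 3

# Inclusive selection: skip tokens that end before the answer starts
# and tokens that start after the answer ends; keep everything else.
start_idx = 0
while offsets[start_idx][1] <= start_char:
    start_idx += 1
end_idx = len(offsets) - 1
while offsets[end_idx][0] >= end_char:
    end_idx -= 1

context = "九十分間"
print(context[offsets[start_idx][0]:offsets[end_idx][1]])  # -> 九十分間
```

The multi-character token 分間 overlaps the answer end, so the chunk grows to cover it instead of being cut back to 九十.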



When I tested both options, I found that the inclusive answer chunks performed better.
Thank you.

Comment on lines -488 to -489
while offsets[answer_end_index][1] > start_char + len(answer):
    answer_end_index -= 1
Author


We get an end index that is smaller by 1 when the token at the answer boundary has a length >= 2.
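A small sketch of that off-by-one with hypothetical offsets: when the last token is multi-character, the removed loop steps past it.

```python
# Context "九十分間" tokenized as 九 | 十 | 分間; the answer "九十分"
# ends at character 3, but the last token "分間" spans [2, 4).
offsets = [(0, 1), (1, 2), (2, 4)]
end_char = 3

# The removed loop shrinks the end index while the token's end char
# exceeds the answer's end char, so the 2-character token is dropped:
end_idx = len(offsets) - 1
while offsets[end_idx][1] > end_char:
    end_idx -= 1
print(end_idx)  # -> 1, one smaller than the overlapping token's index
```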


akeyhero commented Mar 1, 2022

Also, the type of model here should be QAModel, although I did not fix it in this PR (because it is trivial):

def get_answers(model: AutoModelForQuestionAnswering,


akeyhero commented Mar 1, 2022

Another finding is that we can instantly get better performance with cl-tohoku/bert-base-japanese-v2:

F1 score: 0.8517716934422136
Exact Match: 0.7684691546077684

Plus, with lr = 5e-5, epochs = 6:

F1 score: 0.8738306001360371
Exact Match: 0.7994414826097994

@combacsa combacsa assigned ghost Mar 2, 2022

ghost commented Mar 2, 2022

Thank you for the PR.
I will run with your code and merge it after some modifications.

  • I would appreciate it if you changed the type of model in the get_answers function. 😄


ghost commented Mar 2, 2022

@akeyhero If you don't mind, can I commit the changes I want to this PR?


akeyhero commented Mar 2, 2022

@w4-ByunghoonSo Thank you. I've invited you to my forked repo so you can commit your changes. (Ignore the invitation if you don't need access to my repo.)
