Add possibility to input multiple sentences #4
Conversation
simonepri left a comment
I left some comments; more are coming.
Also, if you don't manage to run the code formatter on your computer, you can copy-paste the code here: https://black.now.sh/
# Conflicts:
#   lm_scorer/models/abc/base.py
#   lm_scorer/models/gpt2.py
I have taken into account all your comments. There is still one typing issue; however, I am not sure how to resolve this one. Am I supposed to use a # type: ignore, or is there another possibility?
No, it is correctly pointing out a bug in the code.
I will also add some more tests for this new batch feature in test_gpt2.
Would you mind if I ask you to split this PR in two? It would be convenient to have a first PR (we can use this one) with the API changes + GPT2 implemented as just a for loop on the old single sentence code. Then the second PR will actually optimize GPT2 using batching. |
simonepri left a comment
Some small changes and we are almost ready to merge it.
Thanks for the work!!
Co-authored-by: Simone Primarosa <simonepri@outlook.com>
@dldk-gael Thanks!
No problem, and thank you for your patience. I was not familiar with all those tools and good practices; I learned a lot from your code.
In order to fully benefit from parallelization, I propose adding the possibility to feed sentences into the transformer models in batches.
The work is not finished yet and I still have to pass the CI tests (I will proceed as you suggest in #3), but it is working and I would like to have your opinion before going further.
In order to make minimal changes to your code, for now tokens_score returns a list of log_probs, ids, and tokens (one item for each sentence). However, it would be much more efficient to perform the reduction while we still have the log_prob scores as a tensor. One possibility would be for _tokens_log_prob to return a tensor of shape (number of sentences, max sentence length). What do you think of that? A rough sketch of what I mean is below.
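To make the idea concrete, here is a self-contained sketch of the kind of batched scoring and tensor-level reduction I have in mind. The function name batched_tokens_log_prob is just for illustration, it assumes a recent transformers version, and it is not the current lm_scorer code (which also handles BOS/EOS differently):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()


def batched_tokens_log_prob(sentences):
    # Tokenize the whole batch at once; padding makes the tensors rectangular.
    encoded = tokenizer(sentences, return_tensors="pt", padding=True)
    input_ids = encoded["input_ids"]            # (batch, max_len)
    attention_mask = encoded["attention_mask"]  # 1 for real tokens, 0 for padding
    with torch.no_grad():
        logits = model(input_ids, attention_mask=attention_mask).logits
    # Log-probability of each token given its left context (shift by one).
    log_probs = logits[:, :-1].log_softmax(dim=-1)
    token_log_probs = log_probs.gather(2, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    pad_mask = attention_mask[:, 1:].eq(0)      # True where padding was added
    # Shape (batch, max_len - 1): one row per sentence, padding zeroed out.
    return token_log_probs.masked_fill(pad_mask, 0.0), pad_mask


scores, pad_mask = batched_tokens_log_prob(["Hello world.", "A longer example sentence."])
sentence_log_probs = scores.sum(dim=1)  # one log-probability per sentence, padding ignored
```

The point is that the reduction (the final sum) happens directly on the batched tensor instead of looping over per-sentence lists.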
Also, I did not use the pad_sequence function from torch.nn.utils.rnn, but recoded it so that it also returns a mask (with 1 at the positions of the padding values), which is useful later in the code to drop the scores of the padding values; something along the lines of the sketch below. I think this function should not live in the GPT2LMScorer class but somewhere else.
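A minimal sketch of that helper (the name pad_with_mask and its exact signature are just illustrative, not what is currently in the repo):

```python
from typing import List, Tuple

import torch


def pad_with_mask(
    sequences: List[torch.Tensor], padding_value: int = 0
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Like pad_sequence(batch_first=True), but also returns a padding mask."""
    batch_size = len(sequences)
    max_len = max(seq.size(0) for seq in sequences)
    padded = sequences[0].new_full((batch_size, max_len), padding_value)
    mask = torch.ones(batch_size, max_len, dtype=torch.bool)  # 1 (True) = padding
    for i, seq in enumerate(sequences):
        padded[i, : seq.size(0)] = seq
        mask[i, : seq.size(0)] = False  # 0 (False) = real token
    return padded, mask


# Usage: zero out the scores at padded positions before summing per sentence.
# scores.masked_fill(mask, 0.0).sum(dim=1)
```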