Hi,
It was not clear to me from the article what your final PPL results are for each model.
Can you share them too?
At first glance I thought you achieved the same or comparable PPL results, but now I am not sure. Can you clarify?
Do you have a baseline model with comparable PPL to the original base model?
Can someone use what you did as a baseline for smaller-scale research (4-8 "commodity" GPUs, for example)?
Extra detail on total training time:
I noticed that you count in tokens instead of steps,
where `tokens_per_global_batch = global_batch_size * seq_len`.
Using the parameters in the script, `required_steps = max_tokens / tokens_per_global_batch` yields, in steps (see the sketch after the table):
| config | num GPUs | max tokens | seq len | per-GPU batch | global batch size | tokens per batch | required steps | PPL |
|---|---|---|---|---|---|---|---|---|
| single machine | 1 | 1.8B | 128 | 32 | 32 | 4096 | 439453.125 | ? |
| single machine | 2 | 1.8B | 128 | 32 | 64 | 8192 | 219726.5625 | ? |
| single machine | 4 | 1.8B | 128 | 32 | 128 | 16384 | 109863.2813 | ? |
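
To make the arithmetic reproducible, here is a minimal sketch of the calculation behind the table (the parameter names are mine, not necessarily those used in the training script):

```python
# Minimal sketch of the step-count calculation above.
# Parameter names are illustrative, not taken from the training script.
def required_steps(max_tokens, seq_len, per_gpu_batch, num_gpus):
    global_batch_size = per_gpu_batch * num_gpus           # data-parallel scaling
    tokens_per_global_batch = global_batch_size * seq_len
    return max_tokens / tokens_per_global_batch

for num_gpus in (1, 2, 4):
    steps = required_steps(max_tokens=1.8e9, seq_len=128,
                           per_gpu_batch=32, num_gpus=num_gpus)
    print(f"{num_gpus} GPU(s): {steps:,.0f} steps")
# 1 GPU(s): 439,453 steps
# 2 GPU(s): 219,727 steps
# 4 GPU(s): 109,863 steps
```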
Comparing with the base_wiki103 config from the original repo
(they used only data parallelism), we get:
| config | num GPUs | total tokens | seq len | per-GPU batch | global batch size | tokens per batch | steps | PPL |
|---|---|---|---|---|---|---|---|---|
| original-base-wt103 | don't care | 1.92B | 150 | don't care | 64 | 9600 | 200000 | 24 |
=> They trained on more tokens overall (1.92B vs. 1.8B), at a longer sequence length.
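
As a quick sanity check on that comparison (numbers taken from the table above):

```python
# Total tokens in the original base_wiki103 run, from the table above.
steps, seq_len, global_batch = 200_000, 150, 64
tokens_per_batch = seq_len * global_batch      # 9,600 tokens per step
total_tokens = steps * tokens_per_batch        # 1,920,000,000
print(f"{total_tokens / 1e9:.2f}B tokens")     # 1.92B vs. the 1.8B budget here
```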
If your results are really comparable, the model you present here is worth using as a baseline for future Transformer-XL experiments because it's faster. Right?