# Pre-trained Models

⚠️ Disclaimer: Checkpoints are based on training with publicly available datasets.
Some datasets contain limitations, including non-commercial use limitations.
Please review the terms and conditions made available by third parties before
using the datasets provided. Checkpoints are licensed under
[Apache 2.0](https://github.com/tensorflow/models/blob/master/LICENSE).

⚠️ Disclaimer: Datasets hyperlinked from this page are not owned or distributed
by Google. Such datasets are made available by third parties. Please review the
terms and conditions made available by the third parties before using the data.

### How to Initialize from Checkpoint

**Note:** TF-HUB/Kaggle SavedModel is the preferred way to distribute models as
it is self-contained. Please consider using TF-HUB/Kaggle for finetuning tasks
first.

If you use the [NLP training library](train.md),
you can specify the checkpoint path directly when launching your job. For
example:

```shell
python3 train.py \
  --params_override=task.init_checkpoint=PATH_TO_INIT_CKPT
```
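
Before pointing `task.init_checkpoint` at a downloaded checkpoint, it can help
to verify what the file contains. A minimal sketch, assuming TensorFlow 2.x and
a hypothetical extracted prefix `uncased_L-12_H-768_A-12/bert_model.ckpt`
(adjust to the actual archive layout):

```python
import tensorflow as tf

# Hypothetical checkpoint prefix from the extracted archive; adjust to the
# actual layout after unpacking.
ckpt_path = "uncased_L-12_H-768_A-12/bert_model.ckpt"

# tf.train.load_checkpoint returns a reader for enumerating saved variables.
reader = tf.train.load_checkpoint(ckpt_path)
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)
```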

### How to load TF-HUB/Kaggle SavedModel

Finetuning tasks such as question answering (SQuAD) and sentence
prediction (GLUE) support loading a model from TF-HUB/Kaggle. These built-in
tasks support a specific `task.hub_module_url` parameter. To set this parameter,
replace `--params_override=task.init_checkpoint=...` with
`--params_override=task.hub_module_url=TF_HUB_URL`, like below:

```shell
python3 train.py \
  --params_override=task.hub_module_url=TF_HUB_URL
```

## BERT

### Checkpoints

Model | Configuration | Training Data | Checkpoint & Vocabulary | Kaggle SavedModels
---------------------------------------- | :--------------------------: | ------------: | ----------------------: | ------:
BERT-base uncased English | uncased_L-12_H-768_A-12 | Wiki + Books | [uncased_L-12_H-768_A-12](https://storage.googleapis.com/tf_model_garden/nlp/bert/v3/uncased_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/)
BERT-base cased English | cased_L-12_H-768_A-12 | Wiki + Books | [cased_L-12_H-768_A-12](https://storage.googleapis.com/tf_model_garden/nlp/bert/v3/cased_L-12_H-768_A-12.tar.gz) | [`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/)
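
The entries in the "Checkpoint & Vocabulary" column are tarballs. A minimal
sketch of downloading and unpacking one via `tf.keras.utils.get_file` (the
layout inside the archive is an assumption; inspect it after extraction):

```python
import tensorflow as tf

# Download the archive into the Keras cache and extract it in place.
archive = tf.keras.utils.get_file(
    "uncased_L-12_H-768_A-12.tar.gz",
    "https://storage.googleapis.com/tf_model_garden/nlp/bert/v3/uncased_L-12_H-768_A-12.tar.gz",
    extract=True,
)
print(archive)  # Location of the downloaded/extracted files.
```

The extracted checkpoint prefix is what `PATH_TO_INIT_CKPT` in the training
command above should point to.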

We also have pretrained BERT models with variants in both network architecture
and training methodologies. These models achieve higher downstream accuracy
scores.

Model | Configuration | Training Data | Kaggle SavedModels | Comment
-------------------------------- | :----------------------: | -----------------------: | ------------------------------------------------------------------------------------: | ------:
BERT-base talking heads + ggelu | uncased_L-12_H-768_A-12 | Wiki + Books | [talkheads_ggelu_base](https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1) | BERT-base trained with [talking heads attention](https://arxiv.org/abs/2003.02436) and [gated GeLU](https://arxiv.org/abs/2002.05202).
BERT-large talking heads + ggelu | uncased_L-24_H-1024_A-16 | Wiki + Books | [talkheads_ggelu_large](https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_large/1) | BERT-large trained with [talking heads attention](https://arxiv.org/abs/2003.02436) and [gated GeLU](https://arxiv.org/abs/2002.05202).
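
These SavedModels can also be used directly as Keras layers outside the
training library. A minimal sketch, assuming `tensorflow_hub` is installed and
that the model follows the dict-based input/output signature of the TF2 BERT
encoders (verify the exact signature on the model page):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the encoder from TF-HUB/Kaggle; trainable=True enables finetuning.
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/talkheads_ggelu_bert_en_base/1",
    trainable=True,
)

# Toy batch of two sequences of length 8, already tokenized and padded.
inputs = dict(
    input_word_ids=tf.zeros([2, 8], tf.int32),
    input_mask=tf.ones([2, 8], tf.int32),
    input_type_ids=tf.zeros([2, 8], tf.int32),
)
outputs = encoder(inputs)
print(outputs["pooled_output"].shape)    # (batch, hidden_size)
print(outputs["sequence_output"].shape)  # (batch, seq_len, hidden_size)
```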

## ALBERT

The checkpoints below are converted from the original ALBERT repository.

### Checkpoints

Model | Training Data | Checkpoint & Vocabulary | Kaggle SavedModels
---------------------------------------- | ------------: | ----------------------: | ------:
ALBERT-base English | Wiki + Books | [`ALBERT Base`](https://storage.googleapis.com/tf_model_garden/nlp/albert/albert_base.tar.gz) | [albert_en_base](https://tfhub.dev/tensorflow/albert_en_base/3)
ALBERT-large English | Wiki + Books | [`ALBERT Large`](https://storage.googleapis.com/tf_model_garden/nlp/albert/albert_large.tar.gz) | [albert_en_large](https://tfhub.dev/tensorflow/albert_en_large/3)
ALBERT-xlarge English | Wiki + Books | [`ALBERT XLarge`](https://storage.googleapis.com/tf_model_garden/nlp/albert/albert_xlarge.tar.gz) | [albert_en_xlarge](https://tfhub.dev/tensorflow/albert_en_xlarge/3)
ALBERT-xxlarge English | Wiki + Books | [`ALBERT XXLarge`](https://storage.googleapis.com/tf_model_garden/nlp/albert/albert_xxlarge.tar.gz) | [albert_en_xxlarge](https://tfhub.dev/tensorflow/albert_en_xxlarge/3)
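
The ALBERT encoders pair with a text preprocessing model that maps raw strings
to the encoder's expected inputs. A minimal sketch; the preprocessor handle
`albert_en_preprocess/3` is an assumption, so check the encoder's model page
for the recommended preprocessor:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Assumed preprocessor handle; it converts raw strings into the int32 input
# dict (input_word_ids, input_mask, input_type_ids) the encoder expects.
preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/albert_en_preprocess/3")
encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/albert_en_base/3", trainable=True)

sentences = tf.constant(["The quick brown fox.", "Pre-trained models are handy."])
outputs = encoder(preprocess(sentences))
print(outputs["pooled_output"].shape)  # (2, 768) for ALBERT-base.
```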

## ELECTRA