Commit d1862ca

Internal change
PiperOrigin-RevId: 417673004
1 parent 5c1cb32 commit d1862ca

27 files changed: +5628 −0 lines

official/legacy/transformer/README.md

Lines changed: 220 additions & 0 deletions
# Transformer Translation Model

This is an implementation of the Transformer translation model as described in
the [Attention is All You Need](https://arxiv.org/abs/1706.03762) paper. The
implementation leverages tf.keras and is compatible with TF 2.x.

**Warning: the features in the `transformer/` folder have been fully integrated
into nlp/modeling. Due to its dependencies, we will remove this folder after the
model garden 2.5 release. The model in `nlp/modeling/models/seq2seq_transformer.py`
is identical to the model in this folder.**

## Contents
* [Contents](#contents)
* [Walkthrough](#walkthrough)
* [Detailed instructions](#detailed-instructions)
  * [Environment preparation](#environment-preparation)
  * [Download and preprocess datasets](#download-and-preprocess-datasets)
  * [Model training and evaluation](#model-training-and-evaluation)
* [Implementation overview](#implementation-overview)
  * [Model Definition](#model-definition)
  * [Model Trainer](#model-trainer)
  * [Test dataset](#test-dataset)

## Walkthrough

Below are the commands for running the Transformer model. See the
[Detailed instructions](#detailed-instructions) for more details on running the
model.

```
# Ensure that PYTHONPATH is correctly defined as described in
# https://github.com/tensorflow/models/tree/master/official#requirements
export PYTHONPATH="$PYTHONPATH:/path/to/models"

cd /path/to/models/official/legacy/transformer

# Export variables
PARAM_SET=big
DATA_DIR=$HOME/transformer/data
MODEL_DIR=$HOME/transformer/model_$PARAM_SET
VOCAB_FILE=$DATA_DIR/vocab.ende.32768

# Download training/evaluation/test datasets
python3 data_download.py --data_dir=$DATA_DIR

# Train the model for 100000 steps and evaluate every 5000 steps on a single GPU.
# Each training step takes 4096 tokens as the batch budget, with a maximum
# sequence length of 64.
python3 transformer_main.py --data_dir=$DATA_DIR --model_dir=$MODEL_DIR \
    --vocab_file=$VOCAB_FILE --param_set=$PARAM_SET \
    --train_steps=100000 --steps_between_evals=5000 \
    --batch_size=4096 --max_length=64 \
    --bleu_source=$DATA_DIR/newstest2014.en \
    --bleu_ref=$DATA_DIR/newstest2014.de \
    --num_gpus=1 \
    --enable_time_history=false

# Run during training in a separate process to get continuous updates,
# or after training is complete.
tensorboard --logdir=$MODEL_DIR
```

## Detailed instructions

0. ### Environment preparation

#### Add models repo to PYTHONPATH
Follow the instructions described in the [Requirements](https://github.com/tensorflow/models/tree/master/official#requirements) section to add the models folder to the Python path.
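
For example, assuming the repository was cloned to `/path/to/models` (the same placeholder path used in the walkthrough above):

```shell
# Make the models repo importable; adjust the path to where you cloned it.
export PYTHONPATH="$PYTHONPATH:/path/to/models"
```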

#### Export variables (optional)

Export the following variables, or modify the values in each of the snippets below:

```shell
PARAM_SET=big
DATA_DIR=$HOME/transformer/data
MODEL_DIR=$HOME/transformer/model_$PARAM_SET
VOCAB_FILE=$DATA_DIR/vocab.ende.32768
```

1. ### Download and preprocess datasets

[data_download.py](data_download.py) downloads and preprocesses the training and evaluation WMT datasets. After the data is downloaded and extracted, the training data is used to generate a vocabulary of subtokens. The evaluation and training strings are tokenized, and the resulting data is sharded, shuffled, and saved as TFRecords.

1.75GB of compressed data will be downloaded. In total, the raw files (compressed, extracted, and combined files) take up 8.4GB of disk space. The resulting TFRecord and vocabulary files are 722MB. The script takes around 40 minutes to run, with the bulk of the time spent downloading and ~15 minutes spent on preprocessing.

Command to run:
```
python3 data_download.py --data_dir=$DATA_DIR
```

Arguments:
* `--data_dir`: Path where the preprocessed TFRecord data and vocab file will be saved.
* Use the `--help` or `-h` flag to get a full list of possible arguments.

2. ### Model training and evaluation

[transformer_main.py](transformer_main.py) creates a Transformer Keras model and trains it with Keras `model.fit()`.

Users need to adjust `batch_size` and `num_gpus` to get good performance when running on multiple GPUs.

**Note:** when using multiple GPUs or TPUs, `batch_size` is the global batch size for all devices. For example, if the batch size is `4096*4` and there are 4 devices, each device will take 4096 tokens as a batch budget.

Command to run:
```
python3 transformer_main.py --data_dir=$DATA_DIR --model_dir=$MODEL_DIR \
    --vocab_file=$VOCAB_FILE --param_set=$PARAM_SET
```

Arguments:
* `--data_dir`: This should be set to the same directory given to `data_download.py`'s `data_dir` argument.
* `--model_dir`: Directory to save Transformer model training checkpoints.
* `--vocab_file`: Path to the subtoken vocabulary file. If data_download was used, you may find the file in `data_dir`.
* `--param_set`: Parameter set to use when creating and training the model. Options are `base` and `big` (default).
* `--enable_time_history`: Whether to add the TimeHistory callback. If so, `--log_steps` must be specified.
* `--batch_size`: The number of tokens to consider in a batch. Together with `--max_length`, it determines how many sequences are used per batch; for example, with `--batch_size=4096` and `--max_length=64`, a batch holds roughly 4096 / 64 = 64 sequences when sequences are near the maximum length, and more when they are shorter.
* Use the `--help` or `-h` flag to get a full list of possible arguments.

#### Using multiple GPUs
You can train these models on multiple GPUs using the `tf.distribute.Strategy` API.
You can read more about distribution strategies in this
[guide](https://www.tensorflow.org/guide/distribute_strategy).

In this example, we have made it easier to use with just a command-line flag,
`--num_gpus`. By default this flag is 1 if TensorFlow is compiled with CUDA,
and 0 otherwise.

- `--num_gpus=0`: Uses tf.distribute.OneDeviceStrategy with CPU as the device.
- `--num_gpus=1`: Uses tf.distribute.OneDeviceStrategy with GPU as the device.
- `--num_gpus=2+`: Uses tf.distribute.MirroredStrategy to run synchronous
  distributed training across the GPUs.
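
As an illustrative sketch (assuming a machine with 4 GPUs; the flag values are only an example), a synchronous multi-GPU run looks like the single-GPU walkthrough command with the token budget scaled to the number of devices, since `--batch_size` is the global batch size:

```shell
# 4 GPUs with MirroredStrategy; 16384 = 4096 tokens per device * 4 devices.
python3 transformer_main.py --data_dir=$DATA_DIR --model_dir=$MODEL_DIR \
    --vocab_file=$VOCAB_FILE --param_set=$PARAM_SET \
    --num_gpus=4 --batch_size=16384 --max_length=64
```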

#### Using Cloud TPUs

You can train the Transformer model on Cloud TPUs using
`tf.distribute.TPUStrategy`. If you are not familiar with Cloud TPUs, it is
strongly recommended that you go through the
[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
create a TPU and GCE VM.

To run the Transformer model on a TPU, you must set
`--distribution_strategy=tpu`, `--tpu=$TPU_NAME`, and `--use_ctl=True`, where
`$TPU_NAME` is the name of your TPU in the Cloud Console.

An example command to run Transformer on a v2-8 or v3-8 TPU would be:

```bash
python transformer_main.py \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --vocab_file=$DATA_DIR/vocab.ende.32768 \
  --bleu_source=$DATA_DIR/newstest2014.en \
  --bleu_ref=$DATA_DIR/newstest2014.de \
  --batch_size=6144 \
  --train_steps=2000 \
  --static_batch=true \
  --use_ctl=true \
  --param_set=big \
  --max_length=64 \
  --decode_batch_size=32 \
  --decode_max_length=97 \
  --padded_decode=true \
  --distribution_strategy=tpu
```

Note: `$MODEL_DIR` and `$DATA_DIR` must be GCS paths.
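
As a minimal sketch (the bucket name below is a placeholder, not part of these instructions), the variables can point at a Cloud Storage bucket like this:

```shell
# Hypothetical bucket; replace with your own GCS bucket.
GCS_BUCKET=gs://your-bucket
DATA_DIR=$GCS_BUCKET/transformer/data
MODEL_DIR=$GCS_BUCKET/transformer/model_$PARAM_SET
```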

#### Customizing training schedule

By default, the model will train for 10 epochs and evaluate after every epoch. The training schedule may be defined through the following flags (an example command follows the list):

* Training with steps:
  * `--train_steps`: sets the total number of training steps to run.
  * `--steps_between_evals`: Number of training steps to run between evaluations.
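
For example, to mirror the schedule used in the walkthrough (100000 training steps with an evaluation every 5000 steps), append these flags to the training command:

```shell
python3 transformer_main.py --data_dir=$DATA_DIR --model_dir=$MODEL_DIR \
    --vocab_file=$VOCAB_FILE --param_set=$PARAM_SET \
    --train_steps=100000 --steps_between_evals=5000
```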

#### Compute BLEU score during model evaluation

Use these flags to compute the BLEU score when the model evaluates:

* `--bleu_source`: Path to file containing text to translate.
* `--bleu_ref`: Path to file containing the reference translation.

When running `transformer_main.py`, use the flags: `--bleu_source=$DATA_DIR/newstest2014.en --bleu_ref=$DATA_DIR/newstest2014.de`

#### TensorBoard
Training and evaluation metrics (loss, accuracy, approximate BLEU score, etc.) are logged and can be displayed in the browser using TensorBoard.
```
tensorboard --logdir=$MODEL_DIR
```
The values are displayed at [localhost:6006](http://localhost:6006).

## Implementation overview

A brief look at each component in the code:

### Model Definition
* [transformer.py](transformer.py): Defines a tf.keras.Model: `Transformer`.
* [embedding_layer.py](embedding_layer.py): Contains the layer that calculates the embeddings. The embedding weights are also used to calculate the pre-softmax probabilities from the decoder output.
* [attention_layer.py](attention_layer.py): Defines the multi-headed attention and self-attention layers that are used in the encoder/decoder stacks.
* [ffn_layer.py](ffn_layer.py): Defines the feedforward network that is used in the encoder/decoder stacks. The network is composed of 2 fully connected layers.

Other files:
* [beam_search.py](beam_search.py) contains the beam search implementation, which is used during model inference to find high-scoring translations.

### Model Trainer
[transformer_main.py](transformer_main.py) creates a `TransformerTask` to train and evaluate the model using tf.keras.

### Test dataset
The [newstest2014 files](https://storage.googleapis.com/tf-perf-public/official_transformer/test_data/newstest2014.tgz)
are extracted from the [NMT Seq2Seq tutorial](https://google.github.io/seq2seq/nmt/#download-data).
The raw text files are converted from the SGM format of the
[WMT 2016](http://www.statmt.org/wmt16/translation-task.html) test sets. The
newstest2014 files are put into `$DATA_DIR` when executing `data_download.py`.

Lines changed: 14 additions & 0 deletions

# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
