Commit 8b29b4d

fix wmt readme
1 parent 6dece93 commit 8b29b4d

1 file changed, +22 -20 lines changed

examples/wmt/README.md

Lines changed: 22 additions & 20 deletions
@@ -12,33 +12,21 @@ https://arxiv.org/abs/1806.00187
 
 ### Training a new model on WMT'16 En-De
 
-First download the [preprocessed WMT'16 En-De data provided by Google](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8).
-
-Then:
-
-##### 1. Extract the WMT'16 En-De data
-```bash
-TEXT=wmt16_en_de_bpe32k
-mkdir -p $TEXT
-tar -xzvf wmt16_en_de.tar.gz -C $TEXT
-```
-
-##### 2. Preprocess the dataset with a joined dictionary
+##### 1. Preprocess the dataset with a joined dictionary (optional)
 ```bash
-RAW=raw
 TOK=tok
 BIN=bin
 rm -rf $TOK $BIN
 mkdir -p $TOK $BIN
 # train
-cp wmt16_en_de_bpe32k/train.tok.clean.bpe.32000.en $TOK/train.bpe.source
-cp wmt16_en_de_bpe32k/train.tok.clean.bpe.32000.de $TOK/train.bpe.target
+wget -O $TOK/train.bpe.source https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/tok/train.bpe.source
+wget -O $TOK/train.bpe.target https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/tok/train.bpe.target
 # val
-cp wmt16_en_de_bpe32k/newstest2013.tok.bpe.32000.en $TOK/val.bpe.source
-cp wmt16_en_de_bpe32k/newstest2013.tok.bpe.32000.de $TOK/val.bpe.target
+wget -O $TOK/val.bpe.source https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/tok/val.bpe.source
+wget -O $TOK/val.bpe.target https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/tok/val.bpe.target
 # test
-cat wmt16_en_de_bpe32k/newstest201[456].tok.bpe.32000.en > $TOK/test.bpe.source
-cat wmt16_en_de_bpe32k/newstest201[456].tok.bpe.32000.de > $TOK/test.bpe.target
+wget -O $TOK/test.bpe.source https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/tok/test.bpe.source
+wget -O $TOK/test.bpe.target https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/tok/test.bpe.target
 fairseq-preprocess \
 --source-lang source --target-lang target \
 --validpref $TOK/val.bpe \
@@ -50,7 +38,21 @@ fairseq-preprocess \
 --workers 20
 ```
 
-##### 3. Train a model (optional)
+Or you can download the preprocessed data directly
+```bash
+TOK=tok
+BIN=bin
+rm -rf $TOK $BIN
+mkdir -p $TOK $BIN
+wget -O $BIN/dict.source.txt https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/bin/dict.source.txt
+wget -O $BIN/dict.target.txt https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/bin/dict.target.txt
+wget -O $BIN/test.source-target.source.bin https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/bin/test.source-target.source.bin
+wget -O $BIN/test.source-target.source.idx https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/bin/test.source-target.source.idx
+wget -O $BIN/test.source-target.target.bin https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/bin/test.source-target.target.bin
+wget -O $BIN/test.source-target.target.idx https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k/bin/test.source-target.target.idx
+```
+
+##### 2. Train a model (optional)
 ```bash
 fairseq-train \
 bin/ \
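
The six tokenized-data `wget` calls added in this commit differ only in the split (`train`, `val`, `test`) and side (`source`, `target`), so they could be generated by a loop. The following is a minimal sketch, not part of the commit: the names `BASE_URL` and `list_tok_downloads` are hypothetical, and the URL layout is assumed from the lines in the diff. By default it only prints the destination/URL pairs and downloads nothing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: BASE_URL and list_tok_downloads are illustrative names,
# with the URL layout taken from the wget lines in the diff.
BASE_URL=https://fastseq.blob.core.windows.net/data/tasks/wmt16_en_de_bpe32k
TOK=tok

# Print one "destination url" pair per line for every split/side combination.
list_tok_downloads() {
  local split side
  for split in train val test; do
    for side in source target; do
      printf '%s %s\n' "$TOK/$split.bpe.$side" "$BASE_URL/tok/$split.bpe.$side"
    done
  done
}

# Dry run: show what would be fetched without touching the network.
list_tok_downloads

# To actually download, uncomment the two lines below:
# mkdir -p "$TOK"
# list_tok_downloads | while read -r dest url; do wget -O "$dest" "$url"; done
```

Keeping the listing separate from the download step makes it easy to check the generated paths against the README before fetching anything.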
