Commit 4ba8729

Modify README of bert example (#174)
* Modify README of bert example
* fix some comments
* fix some comments

1 parent cb746fa · commit 4ba8729

File tree

1 file changed: +9 −4 lines changed

examples/bert/README.md

Lines changed: 9 additions & 4 deletions
````diff
@@ -79,7 +79,7 @@ The latest wikipedia dump can be downloaded [at this link](https://dumps.wikimed
 or via command line:
 
 ```shell
-curl https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
+curl -O https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
 ```
 The dump can be extracted with the `wikiextractor` tool.
 
````
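The `-O` flag makes `curl` save the dump under its remote filename instead of streaming the compressed bytes to stdout. For the extraction step the README defers to `wikiextractor`; a minimal sketch of the download-and-extract sequence, assuming the `wikiextractor` package is installed (output directory is illustrative, and flags may vary across versions):

```shell
# Download the dump; -O writes it to ./enwiki-latest-pages-articles.xml.bz2
# rather than printing the archive to stdout.
curl -O https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

# Extract plain text from the dump (pip install wikiextractor);
# the output directory name here is illustrative.
python -m wikiextractor.WikiExtractor enwiki-latest-pages-articles.xml.bz2 \
    -o path/to/extracted-wiki-text
```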
````diff
@@ -126,7 +126,12 @@ The `create_vocabulary.py` script allows you to compute your own WordPiece
 vocabulary for use with BERT. In most cases however, it is desirable to use the
 standard BERT vocabularies from the original models. You can download the
 English uncased vocabulary
-[here](https://storage.googleapis.com/tensorflow/keras-nlp/examples/bert/bert_vocab_uncased.txt).
+[here](https://storage.googleapis.com/tensorflow/keras-nlp/examples/bert/bert_vocab_uncased.txt),
+or in your terminal run:
+
+```shell
+curl -O https://storage.googleapis.com/tensorflow/keras-nlp/examples/bert/bert_vocab_uncased.txt
+```
 
 ### Tokenize, mask, and combine sentences into training examples
 
````
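After downloading, a quick sanity check confirms the vocabulary landed where the later scripts expect it. A hedged sketch, assuming the file is fetched into the current working directory (the exact token count depends on the vocabulary version):

```shell
# -O keeps the remote filename, so the file appears as ./bert_vocab_uncased.txt.
curl -O https://storage.googleapis.com/tensorflow/keras-nlp/examples/bert/bert_vocab_uncased.txt

# WordPiece vocabularies are one token per line; the first entries are
# special tokens such as [PAD] and [UNK].
head -n 5 bert_vocab_uncased.txt
wc -l bert_vocab_uncased.txt
```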
````diff
@@ -169,7 +174,7 @@ for file in path/to/sentence-split-data/*; do
   output="path/to/pretraining-data/$(basename -- "$file" .txt).tfrecord"
   python examples/bert/create_pretraining_data.py \
     --input_files ${file} \
-    --vocab_file vocab.txt \
+    --vocab_file bert_vocab_uncased.txt \
     --output_file ${output}
 done
 ```
````
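The loop maps each sentence-split shard to its own TFRecord. For orientation, a single invocation might look like the following, with a hypothetical shard name and the vocabulary file downloaded as above:

```shell
# One shard in, one TFRecord out; shard-0.txt is an illustrative filename.
python examples/bert/create_pretraining_data.py \
    --input_files path/to/sentence-split-data/shard-0.txt \
    --vocab_file bert_vocab_uncased.txt \
    --output_file path/to/pretraining-data/shard-0.tfrecord
```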
````diff
@@ -183,7 +188,7 @@ for file in path/to/sentence-split-data/*; do
   output="path/to/pretraining-data/$(basename -- "$file" .txt).tfrecord"
   echo python examples/bert/create_pretraining_data.py \
     --input_files ${file} \
-    --vocab_file vocab.txt \
+    --vocab_file bert_vocab_uncased.txt \
     --output_file ${output}
 done | parallel -j ${NUM_JOBS}
 ```
````
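Here the loop only `echo`es the command lines and pipes them to GNU `parallel`, which runs up to `NUM_JOBS` of them concurrently. The value of `NUM_JOBS` is left to the reader; a common starting point, assuming a Linux machine with GNU `parallel` installed:

```shell
# One job per CPU core; nproc is Linux-specific
# (use `sysctl -n hw.ncpu` on macOS instead).
NUM_JOBS=$(nproc)
```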
