
Commit 17b93de

language fix for the docs
I found TensorLayer very useful and its tutorials are actually clearer than TensorFlow's tutorials. However, I found many typos, grammatical errors, and other language issues that had a negative impact on the reading experience. For this edit I am only revising the PTB example in this tutorial; I can revise the other parts later if needed.
1 parent 979c1e0 commit 17b93de

1 file changed: docs/user/tutorial.rst (53 additions, 55 deletions)
@@ -971,24 +971,24 @@ challenging task of language modeling.

 Given a sentence "I am from Imperial College London", the model can learn to
 predict "Imperial College London" from "from Imperial College". In other
-word, it predict next words in a text given a history of previous words.
-In previous example , ``num_steps (sequence length)`` is 3.
+words, it predicts the next word in a text given a history of previous words.
+In the previous example, ``num_steps`` (sequence length) is 3.

 .. code-block:: bash

     python tutorial_ptb_lstm.py


-The script provides three settings (small, medium, large), larger model has
-better performance, you can choice different setting in:
+The script provides three settings (small, medium, large), where a larger model has
+better performance. You can choose different settings in:

 .. code-block:: python

     flags.DEFINE_string(
         "model", "small",
         "A type of model. Possible options are: small, medium, large.")

-If you choice small setting, you can see:
+If you choose the small setting, you can see:

 .. code-block:: text

@@ -1021,11 +1021,11 @@ If you choice small setting, you can see:
     Epoch: 13 Valid Perplexity: 121.475
     Test Perplexity: 116.716

-The PTB example proves RNN is able to modeling language, but this example
-did not do something practical. However, you should read through this example
-and “Understand LSTM” in order to understand the basic of RNN.
-After that, you learn how to generate text, how to achieve language translation
-and how to build a questions answering system by using RNN.
+The PTB example shows that an RNN is able to model language, but this example
+does not do anything practically interesting. However, you should read through this example
+and “Understand LSTM” in order to understand the basics of RNN.
+After that, you will learn how to generate text, how to achieve language translation,
+and how to build a question answering system by using RNN.


 Understand LSTM
@@ -1038,7 +1038,7 @@ We personally think Andrey Karpathy's blog is the best material to
 `Understand Recurrent Neural Network`_ , after reading that, Colah's blog can
 help you to `Understand LSTM Network`_ `[chinese] <http://dataunion.org/9331.html>`_
 which can solve The Problem of Long-Term
-Dependencies. We do not describe more about RNN, please read through these blogs
+Dependencies. We will not describe the theory of RNNs further here, so please read through these blogs
 before you go on.

 .. _fig_0601:
@@ -1051,28 +1051,28 @@ Image by Andrey Karpathy
 Synced sequence input and output
 ---------------------------------

-The model in PTB example is a typically type of synced sequence input and output,
+The model in the PTB example is a typical type of synced sequence input and output,
 which was described by Karpathy as
 "(5) Synced sequence input and output (e.g. video classification where we wish
-to label each frame of the video). Notice that in every case are no pre-specified
-constraints on the lengths sequences because the recurrent transformation (green)
-is fixed and can be applied as many times as we like."
-
-The model is built as follow. Firstly, transfer the words into word vectors by
-looking up an embedding matrix, in this tutorial, no pre-training on embedding
-matrix. Secondly, we stacked two LSTMs together use dropout among the embedding
-layer, LSTM layers and output layer for regularization. In the last layer,
+to label each frame of the video). Notice that in every case there are no pre-specified
+constraints on the lengths of sequences because the recurrent transformation (green)
+is fixed and can be applied as many times as we like."
+
+The model is built as follows. Firstly, we transfer the words into word vectors by
+looking up an embedding matrix. In this tutorial, there is no pre-training on the embedding
+matrix. Secondly, we stack two LSTMs together using dropout between the embedding
+layer, LSTM layers, and the output layer for regularization. In the final layer,
 the model provides a sequence of softmax outputs.

-The first LSTM layer outputs [batch_size, num_steps, hidden_size] for stacking
-another LSTM after it. The second LSTM layer outputs [batch_size*num_steps, hidden_size]
-for stacking DenseLayer after it, then compute the softmax outputs of each example
-(n_examples = batch_size*num_steps).
+The first LSTM layer outputs ``[batch_size, num_steps, hidden_size]`` for stacking
+another LSTM after it. The second LSTM layer outputs ``[batch_size*num_steps, hidden_size]``
+for stacking a DenseLayer after it. Then the DenseLayer computes the softmax outputs of each example
+(``n_examples = batch_size*num_steps``).

 To understand the PTB tutorial, you can also read `TensorFlow PTB tutorial
 <https://www.tensorflow.org/versions/r0.9/tutorials/recurrent/index.html#recurrent-neural-networks>`_.

-(Note that, TensorLayer supports DynamicRNNLayer after v1.1, so you can set the input/output dropouts, number of RNN layer in one single layer)
+(Note that TensorLayer supports DynamicRNNLayer after v1.1, so you can set the input/output dropouts and the number of RNN layers in one single layer.)


 .. code-block:: python
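The model-definition code block that begins here is unchanged by this commit, so the diff does not show it. As a rough, framework-free sketch of the shape bookkeeping described above (the sizes are illustrative values, not necessarily the tutorial's settings):

.. code-block:: python

    import numpy as np

    batch_size, num_steps, hidden_size, vocab_size = 20, 35, 650, 10000

    # First LSTM layer: keeps the time axis so another LSTM can be stacked on it.
    lstm1_out = np.zeros((batch_size, num_steps, hidden_size))

    # Second LSTM layer: returns a 2D tensor so a DenseLayer can be stacked on it;
    # every time step of every segment becomes one "example" row.
    lstm2_out = lstm1_out.reshape(batch_size * num_steps, hidden_size)

    # DenseLayer: one softmax output per example (n_examples = batch_size * num_steps).
    dense_w = np.zeros((hidden_size, vocab_size))
    logits = lstm2_out.dot(dense_w)
    print(logits.shape)   # (700, 10000), i.e. [batch_size * num_steps, vocab_size]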
@@ -1118,26 +1118,26 @@ To understand the PTB tutorial, you can also read `TensorFlow PTB tutorial
 Dataset iteration
 ^^^^^^^^^^^^^^^^^

-The batch_size can be seem as how many concurrent computations.
-As the following example shows, the first batch learn the sequence information by using 0 to 9.
-The second batch learn the sequence information by using 10 to 19.
-So it ignores the information from 9 to 10 !\n
-If only if we set the batch_size = 1, it will consider all information from 0 to 20.
+The ``batch_size`` can be seen as the number of concurrent computations we are running.
+As the following example shows, the first batch learns the sequence information by using items 0 to 9.
+The second batch learns the sequence information by using items 10 to 19.
+So it ignores the information from item 9 to item 10!
+Only if we set ``batch_size = 1`` will it consider all the information from items 0 to 20.

-The meaning of batch_size here is not the same with the batch_size in MNIST example. In MNIST example,
-batch_size reflects how many examples we consider in each iteration, while in
-PTB example, batch_size is how many concurrent processes (segments)
-for speed up computation.
+The meaning of ``batch_size`` here is not the same as the ``batch_size`` in the MNIST example. In the MNIST example,
+``batch_size`` reflects how many examples we consider in each iteration, while in the
+PTB example, ``batch_size`` is the number of concurrent processes (segments)
+used to accelerate the computation.

-Some Information will be ignored if batch_size > 1, however, if your dataset
-is "long" enough (a text corpus usually has billions words), the ignored
-information would not effect the final result.
+Some information will be ignored if ``batch_size`` > 1; however, if your dataset
+is "long" enough (a text corpus usually has billions of words), the ignored
+information will not affect the final result.

-In PTB tutorial, we set batch_size = 20, so we cut the dataset into 20 segments.
-At the beginning of each epoch, we initialize (reset) the 20 RNN states for 20
-segments, then go through 20 segments separately.
+In the PTB tutorial, we set ``batch_size = 20``, so we divide the dataset into 20 segments.
+At the beginning of each epoch, we initialize (reset) the 20 RNN states for the 20
+segments to zero, then go through the 20 segments separately.

-A example of generating training data as follow:
+An example of generating training data is as follows:

 .. code-block:: python

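The generator code that follows here is unchanged and therefore not shown in the diff. A minimal NumPy sketch of this kind of iteration (not the tutorial's actual implementation) makes the segment boundaries explicit:

.. code-block:: python

    import numpy as np

    def iterate_segments(data, batch_size, num_steps):
        # Cut the corpus into `batch_size` concurrent segments, then yield
        # (input, target) arrays of shape [batch_size, num_steps], where the
        # targets are the inputs shifted by one position.
        data = np.asarray(data)
        segment_len = len(data) // batch_size
        segments = data[:batch_size * segment_len].reshape(batch_size, segment_len)
        for i in range((segment_len - 1) // num_steps):
            x = segments[:, i * num_steps:(i + 1) * num_steps]
            y = segments[:, i * num_steps + 1:(i + 1) * num_steps + 1]
            yield x, y

    # With batch_size = 2, items 0-9 and 10-19 end up in different segments,
    # so the transition from item 9 to item 10 is never seen.
    for x, y in iterate_segments(np.arange(20), batch_size=2, num_steps=3):
        print(x, y)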
@@ -1169,7 +1169,7 @@ A example of generating training data as follow:
 Loss and update expressions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^

-The cost function is the averaged cost of each mini-batch:
+The cost function is the average cost of each mini-batch:

 .. code-block:: python

@@ -1181,7 +1181,7 @@ The cost function is the averaged cost of each mini-batch:
     # targets : 2D tensor [batch_size, num_steps], need to be reshaped.
     # n_examples = batch_size * num_steps
     # so
-    # cost is the averaged cost of each mini-batch (concurrent process).
+    # cost is the average cost of each mini-batch (concurrent process).
     loss = tf.nn.seq2seq.sequence_loss_by_example(
         [outputs],
         [tf.reshape(targets, [-1])],
@@ -1193,9 +1193,7 @@ The cost function is the averaged cost of each mini-batch:
     cost = loss_fn(network.outputs, targets, batch_size, num_steps)


-For updating, this example decreases the initial learning rate after several
-epochs (defined by ``max_epoch``), by multiplying a ``lr_decay``. In addition,
-truncated backpropagation clips values of gradients by the ratio of the sum of
+For updating, truncated backpropagation clips values of gradients by the ratio of the sum of
 their norms, so as to make the learning process tractable.

 .. code-block:: python
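The clipping code itself is unchanged and collapsed in this diff. A small NumPy sketch of clipping by global norm (the idea behind ``tf.clip_by_global_norm``; the gradient values are made up for illustration):

.. code-block:: python

    import numpy as np

    def clip_by_global_norm(grads, max_norm):
        # If the global norm of all gradients exceeds max_norm, scale every
        # gradient by max_norm / global_norm; otherwise leave them unchanged.
        global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
        scale = min(1.0, max_norm / max(global_norm, 1e-12))
        return [g * scale for g in grads], global_norm

    grads = [np.array([3.0, 4.0]), np.array([12.0])]      # global norm = 13
    clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
    print(norm, clipped)                                   # each gradient is scaled by 5/13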
@@ -1210,7 +1208,7 @@ their norms, so as to make the learning process tractable.
     train_op = optimizer.apply_gradients(zip(grads, tvars))


-If the epoch index greater than ``max_epoch``, decrease the learning rate
+In addition, if the epoch index is greater than ``max_epoch``, we decrease the learning rate
 by multipling ``lr_decay``.

 .. code-block:: python
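The decay code itself is unchanged and not shown here. The schedule being described can be sketched in plain Python; the numbers below are illustrative values, not necessarily the tutorial's settings:

.. code-block:: python

    learning_rate, lr_decay = 1.0, 0.5        # illustrative values
    max_epoch, max_max_epoch = 4, 13

    for i in range(max_max_epoch):
        # Keep the initial learning rate for the first `max_epoch` epochs,
        # then multiply it by `lr_decay` once for every additional epoch.
        new_lr = learning_rate * lr_decay ** max(i + 1 - max_epoch, 0.0)
        print("Epoch %d, learning rate %.4f" % (i + 1, new_lr))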
@@ -1220,8 +1218,8 @@ by multipling ``lr_decay``.


 At the beginning of each epoch, all states of LSTMs need to be reseted
-(initialized) to zero states, then after each iteration, the LSTMs' states
-is updated, so the new LSTM states (final states) need to be assigned as the initial states of next iteration:
+(initialized) to zero states. Then after each iteration, the LSTMs' states
+are updated, so the new LSTM states (final states) need to be assigned as the initial states of the next iteration:

 .. code-block:: python

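The assignment code is unchanged and not shown in the diff. The control flow it implements looks roughly like the following, where ``run_one_training_step`` is a hypothetical stand-in for the actual training step that feeds the current states and fetches the final states:

.. code-block:: python

    import numpy as np

    def run_one_training_step(x, y, state):
        # Hypothetical stand-in: feed `state` as the LSTMs' initial states,
        # run one update, and return the cost plus the LSTMs' final states.
        return float(np.mean(x)), state + 1.0

    num_epochs, num_batches = 2, 3
    for epoch in range(num_epochs):
        state = np.zeros(1)        # reset the states to zero at the start of each epoch
        for step in range(num_batches):
            x = np.random.randint(0, 100, (20, 20))
            y = np.random.randint(0, 100, (20, 20))
            cost, state = run_one_training_step(x, y, state)   # carry the final states forward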
@@ -1249,8 +1247,8 @@ Predicting
 ^^^^^^^^^^^^^

 After training the model, when we predict the next output, we no long consider
-the number of steps (sequence length), i.e. ``batch_size, num_steps`` are ``1``.
-Then we can output the next word step by step, instead of predict a sequence
+the number of steps (sequence length), i.e. ``batch_size, num_steps`` are set to ``1``.
+Then we can output the next word one by one, instead of predicting a sequence
 of words from a sequence of words.

 .. code-block:: python
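The prediction code is unchanged and not shown here. Conceptually, generation proceeds one word at a time, feeding each predicted word and the updated state back into the model; ``predict_next`` below is a hypothetical stand-in for a single inference step with ``batch_size = num_steps = 1``:

.. code-block:: python

    import numpy as np

    vocab_size = 10000

    def predict_next(word_id, state):
        # Hypothetical stand-in: return a probability distribution over the
        # vocabulary for the next word, plus the updated LSTM state.
        probs = np.full(vocab_size, 1.0 / vocab_size)
        return probs, state

    state = np.zeros(1)       # fresh (zero) state before generation
    word = 0                  # id of the seed word
    generated = [word]
    for _ in range(10):
        probs, state = predict_next(word, state)
        word = int(np.argmax(probs))    # greedy choice; sampling is also common
        generated.append(word)
    print(generated)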
@@ -1291,12 +1289,12 @@ of words from a sequence of words.
 What Next?
 -----------

-Now, you understand Synced sequence input and output. Let think about
-Many to one (Sequence input and one output), LSTM is able to predict
+Now that you understand Synced sequence input and output, let's think about
+Many to one (Sequence input and one output), where an LSTM is able to predict
 the next word "English" from "I am from London, I speak ..".

-Please read and understand the code of ``tutorial_generate_text.py``,
-it show you how to restore a pre-trained Embedding matrix and how to learn text
+Please read and understand the code of ``tutorial_generate_text.py``.
+It shows you how to restore a pre-trained Embedding matrix and how to learn text
 generation from a given context.

 Karpathy's blog :