@@ -971,24 +971,24 @@ challenging task of language modeling.
971971
972972Given a sentence "I am from Imperial College London", the model can learn to
973973predict "Imperial College London" from "from Imperial College". In other
974- word, it predict next words in a text given a history of previous words.
975- In previous example , ``num_steps (sequence length) `` is 3.
974+ words, it predicts the next word in a text given a history of previous words.
975+ In the previous example, ``num_steps`` (sequence length) is 3.
976976
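For instance, with ``num_steps = 3``, one training example pairs a window of three
input words with the same window shifted by one word. A minimal illustration in plain
Python (this snippet is not part of the script):

.. code-block :: python

    sentence = "I am from Imperial College London".split()
    num_steps = 3  # sequence length

    # Take the window "from Imperial College" as the input ...
    x = sentence[2:2 + num_steps]   # ['from', 'Imperial', 'College']
    # ... and the same window shifted by one word as the target.
    y = sentence[3:3 + num_steps]   # ['Imperial', 'College', 'London']

    print(x, "->", y)
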
977977.. code-block :: bash
978978
979979 python tutorial_ptb_lstm.py
980980
981981
982- The script provides three settings (small, medium, large), larger model has
983- better performance, you can choice different setting in:
982+ The script provides three settings (small, medium, large), where a larger model has
983+ better performance. You can choose different settings in:
984984
985985.. code-block :: python
986986
987987 flags.DEFINE_string(
988988     "model", "small",
989989     "A type of model. Possible options are: small, medium, large.")
990990
991- If you choice small setting, you can see:
991+ If you choose the small setting, you can see:
992992
993993.. code-block :: text
994994
@@ -1021,11 +1021,11 @@ If you choice small setting, you can see:
10211021 Epoch: 13 Valid Perplexity: 121.475
10221022 Test Perplexity: 116.716
10231023
1024- The PTB example proves RNN is able to modeling language, but this example
1025- did not do something practical . However, you should read through this example
1026- and “Understand LSTM” in order to understand the basic of RNN.
1027- After that, you learn how to generate text, how to achieve language translation
1028- and how to build a questions answering system by using RNN.
1024+ The PTB example shows that an RNN is able to model language, but this example
1025+ does not do anything practically interesting. However, you should read through this example
1026+ and “Understand LSTM” in order to understand the basics of RNNs.
1027+ After that, you will learn how to generate text, how to achieve language translation,
1028+ and how to build a question answering system using RNNs.
10291029
10301030
10311031Understand LSTM
@@ -1038,7 +1038,7 @@ We personally think Andrey Karpathy's blog is the best material to
10381038`Understand Recurrent Neural Network `_ , after reading that, Colah's blog can
10391039help you to `Understand LSTM Network `_ `[chinese] <http://dataunion.org/9331.html >`_
10401040which can solve The Problem of Long-Term
1041- Dependencies. We do not describe more about RNN, please read through these blogs
1041+ Dependencies. We will not go deeper into the theory of RNNs here, so please read through these blogs
10421042before you go on.
10431043
10441044.. _fig_0601 :
@@ -1051,28 +1051,28 @@ Image by Andrey Karpathy
10511051Synced sequence input and output
10521052---------------------------------
10531053
1054- The model in PTB example is a typically type of synced sequence input and output,
1054+ The model in the PTB example is a typical case of synced sequence input and output,
10551055which was described by Karpathy as
10561056"(5) Synced sequence input and output (e.g. video classification where we wish
1057- to label each frame of the video). Notice that in every case are no pre-specified
1058- constraints on the lengths sequences because the recurrent transformation (green)
1059- is fixed and can be applied as many times as we like."
1060-
1061- The model is built as follow . Firstly, transfer the words into word vectors by
1062- looking up an embedding matrix, in this tutorial, no pre-training on embedding
1063- matrix. Secondly, we stacked two LSTMs together use dropout among the embedding
1064- layer, LSTM layers and output layer for regularization. In the last layer,
1057+ to label each frame of the video). Notice that in every case there are no pre-specified
1058+ constraints on the lengths of sequences because the recurrent transformation (green)
1059+ can be applied as many times as we like."
1060+
1061+ The model is built as follows. Firstly, we convert the words into word vectors by
1062+ looking up an embedding matrix. In this tutorial, there is no pre-training of the embedding
1063+ matrix. Secondly, we stack two LSTMs together, applying dropout between the embedding
1064+ layer, the LSTM layers, and the output layer for regularization. In the final layer,
10651065the model provides a sequence of softmax outputs.
10661066
1067- The first LSTM layer outputs [batch_size, num_steps, hidden_size] for stacking
1068- another LSTM after it. The second LSTM layer outputs [batch_size*num_steps, hidden_size]
1069- for stacking DenseLayer after it, then compute the softmax outputs of each example
1070- (n_examples = batch_size*num_steps).
1067+ The first LSTM layer outputs ``[batch_size, num_steps, hidden_size]`` for stacking
1068+ another LSTM after it. The second LSTM layer outputs ``[batch_size*num_steps, hidden_size]``
1069+ for stacking a DenseLayer after it. Then the DenseLayer computes the softmax outputs of each example
1070+ (``n_examples = batch_size*num_steps``).
10711071
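As a rough illustration of how the tensor shapes flow through this stack, here is a
NumPy-only sketch (the sizes are typical "small" setting values, and the snippet is
not part of the tutorial code):

.. code-block :: python

    import numpy as np

    batch_size, num_steps, hidden_size, vocab_size = 20, 20, 200, 10000

    # Output of the first LSTM layer: one hidden vector per word per segment.
    lstm1_out = np.zeros((batch_size, num_steps, hidden_size))

    # The second LSTM layer returns a 2D tensor so that a dense (softmax) layer
    # can be stacked on top: one row per example.
    lstm2_out = lstm1_out.reshape(batch_size * num_steps, hidden_size)

    # The DenseLayer then maps every row to vocab_size logits.
    print(lstm1_out.shape)                        # (20, 20, 200)
    print(lstm2_out.shape)                        # (400, 200)
    print((batch_size * num_steps, vocab_size))   # (400, 10000)
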
10721072To understand the PTB tutorial, you can also read `TensorFlow PTB tutorial
10731073<https://www.tensorflow.org/versions/r0.9/tutorials/recurrent/index.html#recurrent-neural-networks> `_.
10741074
1075- (Note that, TensorLayer supports DynamicRNNLayer after v1.1, so you can set the input/output dropouts, number of RNN layer in one single layer)
1075+ (Note that TensorLayer supports DynamicRNNLayer since v1.1, so you can set the input/output dropouts and the number of RNN layers in a single layer.)
10761076
10771077
10781078.. code-block :: python
@@ -1118,26 +1118,26 @@ To understand the PTB tutorial, you can also read `TensorFlow PTB tutorial
11181118 Dataset iteration
11191119^^^^^^^^^^^^^^^^^
11201120
1121- The batch_size can be seem as how many concurrent computations.
1122- As the following example shows, the first batch learn the sequence information by using 0 to 9.
1123- The second batch learn the sequence information by using 10 to 19.
1124- So it ignores the information from 9 to 10 !\n
1125- If only if we set the batch_size = 1, it will consider all information from 0 to 20.
1121+ The ``batch_size`` can be seen as the number of concurrent computations we are running.
1122+ As the following example shows, the first batch learns the sequence information by using items 0 to 9.
1123+ The second batch learns the sequence information by using items 10 to 19.
1124+ So it ignores the information between items 9 and 10!
1125+ Only if we set ``batch_size = 1`` will it consider all the information from items 0 to 20.
11261126
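To see why the transition from item 9 to item 10 is never learned, here is a tiny
NumPy sketch of how a sequence of 20 items is cut into ``batch_size = 2`` contiguous
segments (illustrative only, not the tutorial code):

.. code-block :: python

    import numpy as np

    data = np.arange(20)          # items 0 .. 19
    batch_size = 2

    # Cut the data into batch_size contiguous segments, one per row.
    segments = data.reshape(batch_size, -1)
    print(segments)
    # [[ 0  1  2  3  4  5  6  7  8  9]
    #  [10 11 12 13 14 15 16 17 18 19]]

    # Each row is fed to its own RNN state, so the step from 9 to 10
    # is never seen as a transition by the model.
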
1127- The meaning of batch_size here is not the same with the batch_size in MNIST example. In MNIST example,
1128- batch_size reflects how many examples we consider in each iteration, while in
1129- PTB example, batch_size is how many concurrent processes (segments)
1130- for speed up computation.
1127+ The meaning of ``batch_size`` here is not the same as the ``batch_size`` in the MNIST example. In the MNIST example,
1128+ ``batch_size`` reflects how many examples we consider in each iteration, while in the
1129+ PTB example, ``batch_size`` is the number of concurrent processes (segments)
1130+ used to speed up the computation.
11311131
1132- Some Information will be ignored if batch_size > 1, however, if your dataset
1133- is "long" enough (a text corpus usually has billions words), the ignored
1134- information would not effect the final result.
1132+ Some information will be ignored if ``batch_size`` > 1; however, if your dataset
1133+ is "long" enough (a text corpus usually has billions of words), the ignored
1134+ information would not affect the final result.
11351135
1136- In PTB tutorial, we set batch_size = 20, so we cut the dataset into 20 segments.
1137- At the beginning of each epoch, we initialize (reset) the 20 RNN states for 20
1138- segments, then go through 20 segments separately.
1136+ In the PTB tutorial, we set ``batch_size = 20``, so we divide the dataset into 20 segments.
1137+ At the beginning of each epoch, we initialize (reset) the 20 RNN states for the 20
1138+ segments to zero, then go through the 20 segments separately.
11391139
1140- A example of generating training data as follow :
1140+ An example of generating training data is as follows:
11411141
11421142.. code-block :: python
11431143
@@ -1169,7 +1169,7 @@ A example of generating training data as follow:
11691169Loss and update expressions
11701170^^^^^^^^^^^^^^^^^^^^^^^^^^^
11711171
1172- The cost function is the averaged cost of each mini-batch:
1172+ The cost function is the average cost of each mini-batch:
11731173
11741174.. code-block :: python
11751175
@@ -1181,7 +1181,7 @@ The cost function is the averaged cost of each mini-batch:
11811181 # targets : 2D tensor [batch_size, num_steps], need to be reshaped.
11821182 # n_examples = batch_size * num_steps
11831183 # so
1184- # cost is the averaged cost of each mini-batch (concurrent process).
1184+ # cost is the average cost of each mini-batch (concurrent process).
11851185 loss = tf.nn.seq2seq.sequence_loss_by_example(
11861186 [outputs],
11871187 [tf.reshape(targets, [-1])],
@@ -1193,9 +1193,7 @@ The cost function is the averaged cost of each mini-batch:
11931193 cost = loss_fn(network.outputs, targets, batch_size, num_steps)
11941194
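The perplexity values printed during training are, to a good approximation, the
exponential of this average per-word cost. A minimal sketch of the relation (the
variable names here are illustrative, not taken from the script):

.. code-block :: python

    import numpy as np

    # Hypothetical running totals accumulated over one epoch:
    #   total_cost  - sum of the mini-batch costs returned by the loss
    #   total_steps - number of time steps (words) processed so far
    total_cost, total_steps = 1500.0, 320

    # Perplexity is the exponential of the average cross-entropy per word.
    perplexity = np.exp(total_cost / total_steps)
    print(perplexity)
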
11951195
1196- For updating, this example decreases the initial learning rate after several
1197- epochs (defined by ``max_epoch ``), by multiplying a ``lr_decay ``. In addition,
1198- truncated backpropagation clips values of gradients by the ratio of the sum of
1196+ For updating, truncated backpropagation clips values of gradients by the ratio of the sum of
11991197their norms, so as to make the learning process tractable.
12001198
12011199.. code-block :: python
@@ -1210,7 +1208,7 @@ their norms, so as to make the learning process tractable.
12101208 train_op = optimizer.apply_gradients(zip(grads, tvars))
12111209
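The clipping step rescales every gradient by the same factor whenever their combined
(global) norm exceeds a threshold (``max_grad_norm`` in the standard PTB configuration).
A rough NumPy illustration of the idea with toy numbers (not the tutorial code):

.. code-block :: python

    import numpy as np

    # Toy gradients for two parameter tensors (hypothetical values).
    grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]
    max_grad_norm = 5.0

    # Global norm = sqrt of the sum of the squared norms of all gradients.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))        # 13.0

    # Every gradient is scaled by the same factor, so directions are kept.
    scale = max_grad_norm / max(global_norm, max_grad_norm)
    clipped = [g * scale for g in grads]

    print(global_norm)                                      # 13.0
    print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))    # 5.0
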
12121210
1213- If the epoch index greater than ``max_epoch ``, decrease the learning rate
1211+ In addition, if the epoch index is greater than ``max_epoch``, we decrease the learning rate
12141212by multiplying it by ``lr_decay``.
12151213
12161214.. code-block :: python
@@ -1220,8 +1218,8 @@ by multipling ``lr_decay``.
12201218
12211219
12221220 At the beginning of each epoch, all states of the LSTMs need to be reset
1223- (initialized) to zero states, then after each iteration, the LSTMs' states
1224- is updated, so the new LSTM states (final states) need to be assigned as the initial states of next iteration:
1221+ (initialized) to zero states. Then after each iteration, the LSTMs' states
1222+ are updated, so the new LSTM states (final states) need to be assigned as the initial states of the next iteration:
12251223
12261224.. code-block :: python
12271225
@@ -1249,8 +1247,8 @@ Predicting
12491247^^^^^^^^^^^^^
12501248
12511249After training the model, when we predict the next output, we no longer consider
1252- the number of steps (sequence length), i.e. ``batch_size, num_steps `` are ``1 ``.
1253- Then we can output the next word step by step , instead of predict a sequence
1250+ the number of steps (sequence length), i.e. both ``batch_size`` and ``num_steps`` are set to ``1``.
1251+ Then we can generate the next word one step at a time, instead of predicting a sequence
12541252of words from a sequence of words.
12551253
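Conceptually, generation then becomes a loop that feeds one word at a time and carries
the LSTM state forward. The sketch below only shows the control flow; ``softmax_step``
is a hypothetical helper (wrapping one run of the single-step network), not a function
from the script:

.. code-block :: python

    import numpy as np

    def generate(seed_words, n_words, softmax_step, initial_state, word_to_id, id_to_word):
        # softmax_step(word_id, state) -> (probs, new_state) is a hypothetical
        # helper that runs the batch_size = num_steps = 1 model for one step.
        state = initial_state
        probs = None
        # Warm up the state on the seed context, one word at a time.
        for w in seed_words:
            probs, state = softmax_step(word_to_id[w], state)

        generated = []
        for _ in range(n_words):
            word_id = int(np.argmax(probs))   # greedy choice; sampling also works
            generated.append(id_to_word[word_id])
            probs, state = softmax_step(word_id, state)
        return generated
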
12561254.. code-block :: python
@@ -1291,12 +1289,12 @@ of words from a sequence of words.
12911289 What Next?
12921290-----------
12931291
1294- Now, you understand Synced sequence input and output. Let think about
1295- Many to one (Sequence input and one output), LSTM is able to predict
1292+ Now that you understand Synced sequence input and output, let's think about
1293+ Many to one (sequence input and one output), where an LSTM is able to predict
12961294the next word "English" from "I am from London, I speak ..".
12971295
1298- Please read and understand the code of ``tutorial_generate_text.py ``,
1299- it show you how to restore a pre-trained Embedding matrix and how to learn text
1296+ Please read and understand the code of ``tutorial_generate_text.py``.
1297+ It shows you how to restore a pre-trained Embedding matrix and how to learn text
13001298generation from a given context.
13011299
13021300Karpathy's blog :