77 | 77 | "id": "BwpJ5IffzRG6"
78 | 78 | },
79 | 79 | "source": [
80 |    | - "This tutorial demonstrates how to generate text using a character-based RNN. We will work with a dataset of Shakespeare's writing from Andrej Karpathy's [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Given a sequence of characters from this data (\"Shakespear\"), train a model to predict the next character in the sequence (\"e\"). Longer sequences of text can be generated by calling the model repeatedly.\n",
   | 80 | + "This tutorial demonstrates how to generate text using a character-based RNN. You will work with a dataset of Shakespeare's writing from Andrej Karpathy's [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Given a sequence of characters from this data (\"Shakespear\"), train a model to predict the next character in the sequence (\"e\"). Longer sequences of text can be generated by calling the model repeatedly.\n",
81 | 81 | "\n",
82 |    | - "Note: Enable GPU acceleration to execute this notebook faster. In Colab: *Runtime > Change runtime type > Hardware acclerator > GPU*. If running locally make sure TensorFlow version >= 1.11.\n",
   | 82 | + "Note: Enable GPU acceleration to execute this notebook faster. In Colab: *Runtime > Change runtime type > Hardware accelerator > GPU*. If running locally make sure TensorFlow version >= 1.11.\n",
83 | 83 | "\n",
84 | 84 | "This tutorial includes runnable code implemented using [tf.keras](https://www.tensorflow.org/programmers_guide/keras) and [eager execution](https://www.tensorflow.org/programmers_guide/eager). The following is sample output when the model in this tutorial trained for 30 epochs, and started with the string \"Q\":\n",
85 | 85 | "\n",
...
98 | 98 | "To watch the next way with his father with his face?\n",
99 | 99 | "\n",
100 | 100 | "ESCALUS:\n",
101 |     | - "The cause why then we are all resolved more sons.\n",
    | 101 | + "The cause why then us all resolved more sons.\n",
102 | 102 | "\n",
103 | 103 | "VOLUMNIA:\n",
104 | 104 | "O, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, no, it is no sin it should be dead,\n",
...
248 | 248 | "source": [
249 | 249 | "### Vectorize the text\n",
250 | 250 | "\n",
251 |     | - "Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters."
    | 251 | + "Before training, you need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters."
252 | 252 | ]
253 | 253 | },
254 | 254 | {
...
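For context, a minimal sketch of the two lookup tables this cell describes, assuming `text` holds the downloaded Shakespeare corpus as a Python string (set up in an earlier part of the notebook that is not shown in this diff):

```python
import numpy as np

# The unique characters in the file form the vocabulary.
vocab = sorted(set(text))

# Lookup table: character -> integer index.
char2idx = {u: i for i, u in enumerate(vocab)}

# Reverse lookup table: integer index -> character.
idx2char = np.array(vocab)
```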
272 | 272 | "id": "tZfqhkYCymwX"
273 | 273 | },
274 | 274 | "source": [
275 |     | - "Now we have an integer representation for each character. Notice that we mapped the character as indexes from 0 to `len(unique)`."
    | 275 | + "Now you have an integer representation for each character. Notice that you mapped the characters to indexes from 0 to `len(unique)`."
276 | 276 | ]
277 | 277 | },
278 | 278 | {
...
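A small sketch of what that integer representation looks like, assuming `text`, `char2idx`, and `idx2char` from the vectorization step above:

```python
import numpy as np

# Convert the whole text to integer IDs.
text_as_int = np.array([char2idx[c] for c in text])

# Show how the first few characters map to integers.
print('{} ---- characters mapped to int ---- > {}'.format(
    repr(text[:13]), text_as_int[:13]))
```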
316 | 316 | "id": "wssHQ1oGymwe"
317 | 317 | },
318 | 318 | "source": [
319 |     | - "Given a character, or a sequence of characters, what is the most probable next character? This is the task we're training the model to perform. The input to the model will be a sequence of characters, and we train the model to predict the output—the following character at each time step.\n",
    | 319 | + "Given a character, or a sequence of characters, what is the most probable next character? This is the task you are training the model to perform. The input to the model will be a sequence of characters, and you train the model to predict the output—the following character at each time step.\n",
320 | 320 | "\n",
321 | 321 | "Since RNNs maintain an internal state that depends on the previously seen elements, given all the characters computed until this moment, what is the next character?\n"
322 | 322 | ]
...
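To make that prediction task concrete, here is a sketch of how each training chunk can be split into an input sequence and a target sequence shifted by one character; the helper name `split_input_target` is illustrative:

```python
def split_input_target(chunk):
    # For a chunk like "Hello", the input is "Hell" and the target is "ello":
    # the target is the input shifted one character to the right.
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text
```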
346 | 346 | },
347 | 347 | "outputs": [],
348 | 348 | "source": [
349 |     | - "# The maximum length sentence we want for a single input in characters\n",
    | 349 | + "# The maximum length sentence you want for a single input in characters\n",
350 | 350 | "seq_length = 100\n",
351 | 351 | "examples_per_epoch = len(text)//seq_length\n",
352 | 352 | "\n",
...
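A sketch of how `seq_length` can be used to cut the integer-encoded text into training sequences with `tf.data`, assuming `text_as_int` from the vectorization step; the names `char_dataset` and `sequences` are illustrative:

```python
import tensorflow as tf

# Build a dataset of individual character IDs...
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

# ...and group them into chunks of seq_length + 1 characters, so each chunk
# yields an input and a target of length seq_length after splitting.
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)
```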
458 | 458 | "source": [
459 | 459 | "### Create training batches\n",
460 | 460 | "\n",
461 |     | - "We used `tf.data` to split the text into manageable sequences. But before feeding this data into the model, we need to shuffle the data and pack it into batches."
    | 461 | + "You used `tf.data` to split the text into manageable sequences. But before feeding this data into the model, you need to shuffle the data and pack it into batches."
462 | 462 | ]
463 | 463 | },
464 | 464 | {
...
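A minimal sketch of that shuffling and batching, assuming `dataset` already yields (input, target) sequence pairs; the buffer and batch sizes here are examples, not required values:

```python
# tf.data shuffles within a bounded buffer rather than the whole dataset,
# which keeps memory use fixed regardless of corpus size.
BATCH_SIZE = 64
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
```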
650 | 650 | "id": "uwv0gEkURfx1"
651 | 651 | },
652 | 652 | "source": [
653 |     | - "To get actual predictions from the model we need to sample from the output distribution, to get actual character indices. This distribution is defined by the logits over the character vocabulary.\n",
    | 653 | + "To get actual predictions from the model you need to sample from the output distribution, to get actual character indices. This distribution is defined by the logits over the character vocabulary.\n",
654 | 654 | "\n",
655 | 655 | "Note: It is important to _sample_ from this distribution as taking the _argmax_ of the distribution can easily get the model stuck in a loop.\n",
656 | 656 | "\n",
...
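As a sketch of that sampling step, assuming `example_batch_predictions` holds the model output for one batch with shape `(batch_size, sequence_length, vocab_size)`; `tf.multinomial` is the TF 1.x call used elsewhere in this notebook:

```python
import tensorflow as tf

# Sample one character index per time step from the predicted logits
# of the first example in the batch, instead of taking the argmax.
sampled_indices = tf.multinomial(example_batch_predictions[0], num_samples=1)
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()
```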
746 | 746 | "source": [
747 | 747 | "The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case because it is applied across the last dimension of the predictions.\n",
748 | 748 | "\n",
749 |     | - "Because our model returns logits, we need to set the `from_logits` flag.\n"
    | 749 | + "Because the model returns logits, you need to set the `from_logits` flag.\n"
750 | 750 | ]
751 | 751 | },
752 | 752 | {
...
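A sketch of such a loss wrapper, following the approach described in this cell:

```python
import tensorflow as tf

def loss(labels, logits):
    # from_logits=True because the model outputs raw logits,
    # not a softmax-normalized distribution.
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)
```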
771 | 771 | "id": "jeOXriLcymww"
772 | 772 | },
773 | 773 | "source": [
774 |     | - "Configure the training procedure using the `tf.keras.Model.compile` method. We'll use `tf.train.AdamOptimizer` with default arguments and the loss function."
    | 774 | + "Configure the training procedure using the `tf.keras.Model.compile` method. You'll use `tf.train.AdamOptimizer` with default arguments and the loss function."
775 | 775 | ]
776 | 776 | },
777 | 777 | {
...
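A sketch of that configuration, assuming `model` and the `loss` wrapper defined above:

```python
import tensorflow as tf

# tf.train.AdamOptimizer is the TF 1.x optimizer; `loss` applies
# sparse_categorical_crossentropy with from_logits=True.
model.compile(optimizer=tf.train.AdamOptimizer(), loss=loss)
```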
891 | 891 | "\n",
892 | 892 | "Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.\n",
893 | 893 | "\n",
894 |     | - "To run the model with a different `batch_size`, we need to rebuild the model and restore the weights from the checkpoint.\n"
    | 894 | + "To run the model with a different `batch_size`, you need to rebuild the model and restore the weights from the checkpoint.\n"
895 | 895 | ]
896 | 896 | },
897 | 897 | {
...
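A sketch of that rebuild-and-restore step, assuming a `build_model(vocab_size, embedding_dim, rnn_units, batch_size)` helper and a `checkpoint_dir` from earlier parts of the notebook that are not shown in this diff (the names are illustrative if the notebook uses different ones):

```python
import tensorflow as tf

# Rebuild the same architecture, but with batch_size=1 for generation.
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

# Restore the trained weights from the latest checkpoint.
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

# Fix the input shape: a single sequence of arbitrary length.
model.build(tf.TensorShape([1, None]))
```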
992 | 992 | " predictions = predictions / temperature\n",
993 | 993 | " predicted_id = tf.multinomial(predictions, num_samples=1)[-1,0].numpy()\n",
994 | 994 | "\n",
995 |     | - " # We pass the predicted word as the next input to the model\n",
    | 995 | + " # You pass the predicted character as the next input to the model\n",
996 | 996 | " # along with the previous hidden state\n",
997 | 997 | " input_eval = tf.expand_dims([predicted_id], 0)\n",
998 | 998 | "\n",
...
1035 | 1035 | "\n",
1036 | 1036 | "So now that you've seen how to run the model manually let's unpack the training loop, and implement it ourselves. This gives a starting point, for example, to implement _curriculum learning_ to help stabilize the model's open-loop output.\n",
1037 | 1037 | "\n",
1038 |      | - "We will use `tf.GradientTape` to track the gradients. You can learn more about this approach by reading the [eager execution guide](https://www.tensorflow.org/r1/guide/eager).\n",
     | 1038 | + "You will use `tf.GradientTape` to track the gradients. You can learn more about this approach by reading the [eager execution guide](https://www.tensorflow.org/r1/guide/eager).\n",
1039 | 1039 | "\n",
1040 | 1040 | "The procedure works as follows:\n",
1041 | 1041 | "\n",
1042 |      | - "* First, initialize the RNN state. We do this by calling the `tf.keras.Model.reset_states` method.\n",
     | 1042 | + "* First, initialize the RNN state. You do this by calling the `tf.keras.Model.reset_states` method.\n",
1043 | 1043 | "\n",
1044 | 1044 | "* Next, iterate over the dataset (batch by batch) and calculate the *predictions* associated with each.\n",
1045 | 1045 | "\n",
...
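A condensed sketch of one epoch of the custom loop this cell describes, assuming `model`, `dataset`, and the `loss` wrapper from the earlier steps; the optimizer choice mirrors the `tf.train.AdamOptimizer` used above:

```python
import tensorflow as tf

optimizer = tf.train.AdamOptimizer()

# Reset the RNN state at the start of the epoch.
model.reset_states()

for batch_n, (inp, target) in enumerate(dataset):
    with tf.GradientTape() as tape:
        # Forward pass: predictions for every time step in the batch.
        predictions = model(inp)
        batch_loss = tf.reduce_mean(loss(target, predictions))

    # Backward pass: compute and apply gradients tracked by the tape.
    grads = tape.gradient(batch_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    if batch_n % 100 == 0:
        print('Batch {} Loss {:.4f}'.format(batch_n, batch_loss.numpy()))
```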