"We'll also install Comet. f you followed the instructions from Lab 1, you should have you comet account set up. Otherwise, you'll need to sign up for an account and enter your API key. Click on the final link in the output to see your experiment."
85
+
"We'll also install Comet. If you followed the instructions from Lab 1, you should have your Comet account set up. Enter your API key below."
"To define the architecture of this first fully connected neural network, we'll once again use the Keras API and define the model using the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential) class. Note how we first use a [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model.\n",
191
+
"To define the architecture of this first fully connected neural network, we'll once again use the Keras API and define the model using the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential) class. Note how we first use a [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) layer, which flattens the input so that it can be fed into the model.\n",
"\n",
"In this next block, you'll define the fully connected layers of this simple work."
" # [TODO Dense layer to output classification probabilities]\n",
"\n",
" ])\n",
" return fc_model\n",
"\n",
"\n",
"After the pixels are flattened, the network consists of a sequence of two `tf.keras.layers.Dense` layers. These are fully-connected neural layers. The first `Dense` layer has 128 nodes (or neurons). The second (and last) layer (which you've defined!) should return an array of probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the handwritten digit classes.\n",
"\n",
"That defines our fully connected model!\n",
]
},
{
"\n",
"We'll start out by using a stochastic gradient descent (SGD) optimizer initialized with a learning rate of 0.1. Since we are performing a categorical classification task, we'll want to use the [cross entropy loss](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/sparse_categorical_crossentropy).\n",
"\n",
"You'll want to experiment with both the choice of optimizer and learning rate and evaluate how these affect the accuracy of the trained model.\n",
]
},
{
"'''TODO: Experiment with different optimizers and learning rates. How do these affect\n",
" the accuracy of the trained model? Which optimizers and/or learning rates yield\n",
"We're now ready to train our model, which will involve feeding the training data (`train_images` and `train_labels`) into the model, and then asking it to learn the associations between images and labels. We'll also need to define the batch size and the number of epochs, or iterations over the MNIST dataset, to use during training.\n",
290
+
"We're now ready to train our model, which will involve feeding the training data (`train_images` and `train_labels`) into the model, and then asking it to learn the associations between images and labels. We'll also need to define the batch size and the number of epochs, or iterations over the MNIST dataset, to use during training.\n",
"\n",
"In Lab 1, we saw how we can use `GradientTape` to optimize losses and train models with stochastic gradient descent. After defining the model settings in the `compile` step, we can also accomplish training by calling the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) method on an instance of the `Model` class. We will use this to train our fully connected model\n"
]
"source": [
"### Evaluate accuracy on the test dataset\n",
"\n",
"Now that we've trained the model, we can ask it to make predictions about a test set that it hasn't seen before. In this example, the `test_images` array comprises our test dataset. To evaluate accuracy, we can check to see if the model's predictions match the labels from the `test_labels` array.\n",
"\n",
"Use the [`evaluate`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#evaluate) method to evaluate the model on the test dataset!"
]
"id": "yWfgsmVXCaXG"
},
"source": [
"You may observe that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of *overfitting*, when a machine learning model performs worse on new data than on its training data.\n",
"\n",
"What is the highest accuracy you can achieve with this first fully connected model? Since the handwritten digit classification task is pretty straightforward, you may be wondering how we can do better...\n",
"\n",
"\n",
" # TODO: Define the first convolutional layer\n",
"What is the highest accuracy you're able to achieve using the CNN model, and how does the accuracy of the CNN model compare to the accuracy of the simple fully connected network? What optimizers and learning rates seem to be optimal for training the CNN model?\n",
520
+
"What is the highest accuracy you're able to achieve using the CNN model, and how does the accuracy of the CNN model compare to the accuracy of the simple fully connected network? What optimizers and learning rates seem to be optimal for training the CNN model?\n",
"\n",
"Feel free to click the Comet links to investigate the training/accuracy curves for your model."
]
"id": "-hw1hgeSCaXN"
},
"source": [
"As you can see, a prediction is an array of 10 numbers. Recall that the output of our model is a probability distribution over the 10 digit classes. Thus, these numbers describe the model's \"confidence\" that the image corresponds to each of the 10 different digits.\n",
"\n",
"Let's look at the digit that has the highest confidence for the first image in the test dataset:"
]
"source": [
"'''TODO: identify the digit with the highest confidence prediction for the first\n",
" image in the test dataset. '''\n",
"prediction = np.argmax(predictions[0])\n",
"# prediction = # TODO\n",
"\n",
"print(prediction)"
"source": [
"## 1.4 Training the model 2.0\n",
"\n",
"Earlier in the lab, we used the [`fit`](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#fit) function call to train the model. This function is quite high-level and intuitive, which is really useful for simpler models. As you may be able to tell, this function abstracts away many details of the training call, leaving us with less control over training; finer-grained control can be useful in other contexts.\n",
"\n",
"As an alternative to this, we can use the [`tf.GradientTape`](https://www.tensorflow.org/api_docs/python/tf/GradientTape) class to record differentiation operations during training, and then call the [`tf.GradientTape.gradient`](https://www.tensorflow.org/api_docs/python/tf/GradientTape#gradient) function to actually compute the gradients. You may recall seeing this in Lab 1 Part 1, but let's take another look at this here.\n",
"\n",
"\n",
" # Backpropagation\n",
" '''TODO: Use the tape to compute the gradient against all parameters in the CNN model.\n",
"      Use cnn_model.trainable_variables to access these parameters.'''\n",
"In this part of the lab, you had the chance to play with different MNIST classifiers with different architectures (fully-connected layers only, CNN), and experiment with how different hyperparameters affect accuracy (learning rate, etc.). The next part of the lab explores another application of CNNs, facial detection, and some drawbacks of AI systems in real world applications, like issues of bias."
748
+
"In this part of the lab, you had the chance to play with different MNIST classifiers with different architectures (fully-connected layers only, CNN), and experiment with how different hyperparameters affect accuracy (learning rate, etc.). The next part of the lab explores another application of CNNs, facial detection, and some drawbacks of AI systems in real world applications, like issues of bias."