Commit 613618a

shilpakancharla authored and copybara-github committed
Update video tutorials with links and evaluation to MoViNet
PiperOrigin-RevId: 492496066
1 parent cf5f08d commit 613618a

File tree: 3 files changed, +167 -13 lines

- site/en/tutorials/load_data/video.ipynb
- site/en/tutorials/video/transfer_learning_with_movinet.ipynb
- site/en/tutorials/video/video_classification.ipynb


site/en/tutorials/load_data/video.ipynb

Lines changed: 24 additions & 4 deletions
@@ -68,15 +68,21 @@
 "id": "F-SqCosJ6-0H"
 },
 "source": [
-"This tutorial demonstrates how to load and preprocess [AVI](https://en.wikipedia.org/wiki/Audio_Video_Interleave){:.external} video data using the [UCF101 human action dataset](https://www.tensorflow.org/datasets/catalog/ucf101). Once you have preprocessed the data, it can be used for such tasks as video classification/recognition, captioning or clustering. The original dataset contains realistic action videos collected from YouTube with 101 categories, including playing cello, brushing teeth, and applying eye makeup. You will learn how to:\n",
+"This tutorial demonstrates how to load and preprocess [AVI](https://en.wikipedia.org/wiki/Audio_Video_Interleave) video data using the [UCF101 human action dataset](https://www.tensorflow.org/datasets/catalog/ucf101). Once you have preprocessed the data, it can be used for such tasks as video classification/recognition, captioning or clustering. The original dataset contains realistic action videos collected from YouTube with 101 categories, including playing cello, brushing teeth, and applying eye makeup. You will learn how to:\n",
 "\n",
 "* Load the data from a zip file.\n",
 "\n",
 "* Read sequences of frames out of the video files.\n",
 "\n",
 "* Visualize the video data.\n",
 "\n",
-"* Wrap the frame-generator [`tf.data.Dataset`](https://www.tensorflow.org/guide/data)."
+"* Wrap the frame-generator [`tf.data.Dataset`](https://www.tensorflow.org/guide/data).\n",
+"\n",
+"This video loading and preprocessing tutorial is the first part in a series of TensorFlow video tutorials. Here are the other three tutorials:\n",
+"\n",
+"- [Build a 3D CNN model for video classification](https://www.tensorflow.org/tutorials/video/video_classification): Note that this tutorial uses a (2+1)D CNN that decomposes the spatial and temporal aspects of 3D data; if you are using volumetric data such as an MRI scan, consider using a 3D CNN instead of a (2+1)D CNN.\n",
+"- [MoViNet for streaming action recognition](https://www.tensorflow.org/hub/tutorials/movinet): Get familiar with the MoViNet models that are available on TF Hub.\n",
+"- [Transfer learning for video classification with MoViNet](https://www.tensorflow.org/tutorials/video/transfer_learning_with_movinet): This tutorial explains how to use a pre-trained video classification model trained on a different dataset with the UCF-101 dataset."
 ]
 },
 {
@@ -88,7 +94,7 @@
 "## Setup\n",
 "\n",
 "Begin by installing and importing some necessary libraries, including:\n",
-"[remotezip](https://github.com/gtsystem/python-remotezip){:.external} to inspect the contents of a ZIP file, [tqdm](https://github.com/tqdm/tqdm){:.external} to use a progress bar, [OpenCV](https://opencv.org/){:.external} to process video files, and [`tensorflow_docs`](https://github.com/tensorflow/docs/tree/master/tools/tensorflow_docs){:.external} for embedding data in a Jupyter notebook."
+"[remotezip](https://github.com/gtsystem/python-remotezip) to inspect the contents of a ZIP file, [tqdm](https://github.com/tqdm/tqdm) to use a progress bar, [OpenCV](https://opencv.org/) to process video files, and [`tensorflow_docs`](https://github.com/tensorflow/docs/tree/master/tools/tensorflow_docs) for embedding data in a Jupyter notebook."
 ]
 },
 {
@@ -257,7 +263,7 @@
 " files: List of files in the dataset.\n",
 "\n",
 " Returns:\n",
-" Dictionary of class names (key) and files (values).\n",
+" Dictionary of class names (key) and files (values). \n",
 " \"\"\"\n",
 " files_for_class = collections.defaultdict(list)\n",
 " for fname in files:\n",
@@ -989,6 +995,20 @@
 " validation_data = val_ds,\n",
 " callbacks = tf.keras.callbacks.EarlyStopping(patience = 2, monitor = 'val_loss'))"
 ]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "DdJm7ojgGxtT"
+},
+"source": [
+"\n",
+"To learn more about working with video data in TensorFlow, check out the following tutorials:\n",
+"\n",
+"* [Build a 3D CNN model for video classification](https://www.tensorflow.org/tutorials/video/video_classification)\n",
+"* [MoViNet for streaming action recognition](https://www.tensorflow.org/hub/tutorials/movinet)\n",
+"* [Transfer learning for video classification with MoViNet](https://www.tensorflow.org/tutorials/video/transfer_learning_with_movinet)"
+]
 }
 ],
 "metadata": {

site/en/tutorials/video/transfer_learning_with_movinet.ipynb

Lines changed: 118 additions & 5 deletions
@@ -68,7 +68,13 @@
 "* Replace the classifier head with the number of labels of a new dataset\n",
 "* Perform transfer learning on the [UCF101 dataset](https://www.crcv.ucf.edu/data/UCF101.php)\n",
 "\n",
-"The model downloaded in this tutorial is from [official/projects/movinet](https://github.com/tensorflow/models/tree/master/official/projects/movinet). This repository contains a collection of MoViNet models that TF Hub uses in the TensorFlow 2 SavedModel format."
+"The model downloaded in this tutorial is from [official/projects/movinet](https://github.com/tensorflow/models/tree/master/official/projects/movinet). This repository contains a collection of MoViNet models that TF Hub uses in the TensorFlow 2 SavedModel format.\n",
+"\n",
+"This transfer learning tutorial is the third part in a series of TensorFlow video tutorials. Here are the other three tutorials:\n",
+"\n",
+"- [Load video data](https://www.tensorflow.org/tutorials/load_data/video): This tutorial explains much of the code used in this document; in particular, how to preprocess and load data through the `FrameGenerator` class is explained in more detail.\n",
+"- [Build a 3D CNN model for video classification](https://www.tensorflow.org/tutorials/video/video_classification). Note that this tutorial uses a (2+1)D CNN that decomposes the spatial and temporal aspects of 3D data; if you are using volumetric data such as an MRI scan, consider using a 3D CNN instead of a (2+1)D CNN.\n",
+"- [MoViNet for streaming action recognition](https://www.tensorflow.org/hub/tutorials/movinet): Get familiar with the MoViNet models that are available on TF Hub."
 ]
 },
 {
@@ -111,6 +117,7 @@
 "import cv2\n",
 "import numpy as np\n",
 "import remotezip as rz\n",
+"import seaborn as sns\n",
 "import matplotlib.pyplot as plt\n",
 "\n",
 "import keras\n",
@@ -132,7 +139,7 @@
 },
 "source": [
 "## Load data\n",
-"\n",
+" \n",
 "The hidden cell below defines helper functions to download a slice of data from the UCF-101 dataset, and load it into a `tf.data.Dataset`. The [Loading video data tutorial](https://www.tensorflow.org/tutorials/load_data/video) provides a detailed walkthrough of this code.\n",
 "\n",
 "The `FrameGenerator` class at the end of the hidden block is the most important utility here. It creates an iterable object that can feed data into the TensorFlow data pipeline. Specifically, this class contains a Python generator that loads the video frames along with its encoded label. The generator (`__call__`) function yields the frame array produced by `frames_from_video_file` and a one-hot encoded vector of the label associated with the set of frames.\n",
@@ -598,6 +605,111 @@
 " verbose=1)"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "KkLl2zF8G9W0"
+},
+"source": [
+"## Evaluate the model\n",
+"\n",
+"The model achieved high accuracy on the training dataset. Next, use Keras `Model.evaluate` to evaluate it on the test set."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "NqgbzOiKuxxT"
+},
+"outputs": [],
+"source": [
+"model.evaluate(test_ds, return_dict=True)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "OkFst2gsHBwD"
+},
+"source": [
+"To visualize model performance further, use a [confusion matrix](https://www.tensorflow.org/api_docs/python/tf/math/confusion_matrix). The confusion matrix allows you to assess the performance of the classification model beyond accuracy. In order to build the confusion matrix for this multi-class classification problem, get the actual values in the test set and the predicted values."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "hssSdW9XHF_j"
+},
+"outputs": [],
+"source": [
+"def get_actual_predicted_labels(dataset):\n",
+" \"\"\"\n",
+" Create a list of actual ground truth values and the predictions from the model.\n",
+"\n",
+" Args:\n",
+" dataset: An iterable data structure, such as a TensorFlow Dataset, with features and labels.\n",
+"\n",
+" Return:\n",
+" Ground truth and predicted values for a particular dataset.\n",
+" \"\"\"\n",
+" actual = [labels for _, labels in dataset.unbatch()]\n",
+" predicted = model.predict(dataset)\n",
+"\n",
+" actual = tf.stack(actual, axis=0)\n",
+" predicted = tf.concat(predicted, axis=0)\n",
+" predicted = tf.argmax(predicted, axis=1)\n",
+"\n",
+" return actual, predicted"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "2TmTue6THGWO"
+},
+"outputs": [],
+"source": [
+"def plot_confusion_matrix(actual, predicted, labels, ds_type):\n",
+" cm = tf.math.confusion_matrix(actual, predicted)\n",
+" ax = sns.heatmap(cm, annot=True, fmt='g')\n",
+" sns.set(rc={'figure.figsize':(12, 12)})\n",
+" sns.set(font_scale=1.4)\n",
+" ax.set_title('Confusion matrix of action recognition for ' + ds_type)\n",
+" ax.set_xlabel('Predicted Action')\n",
+" ax.set_ylabel('Actual Action')\n",
+" plt.xticks(rotation=90)\n",
+" plt.yticks(rotation=0)\n",
+" ax.xaxis.set_ticklabels(labels)\n",
+" ax.yaxis.set_ticklabels(labels)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "4RK1A1C1HH6V"
+},
+"outputs": [],
+"source": [
+"fg = FrameGenerator(subset_paths['train'], num_frames, training = True)\n",
+"label_names = list(fg.class_ids_for_name.keys())"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "r4AFi2e5HKEO"
+},
+"outputs": [],
+"source": [
+"actual, predicted = get_actual_predicted_labels(test_ds)\n",
+"plot_confusion_matrix(actual, predicted, label_names, 'test')"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {
@@ -611,10 +723,11 @@
 "\n",
 "In particular, using the `FrameGenerator` class used in this tutorial and the other video data and classification tutorials will help you load data into your models.\n",
 "\n",
-"To learn more about video data, check out:\n",
+"To learn more about working with video data in TensorFlow, check out the following tutorials:\n",
 "\n",
-"- [Load video data](https://www.tensorflow.org/tutorials/load_data/video): This tutorial explains much of the code used in this document.\n",
-"- [Build a 3D CNN model for video classification](https://www.tensorflow.org/tutorials/video/video_classification). Note that this tutorial uses a (2+1)D CNN that decomposes the spatial and temporal aspects of 3D data; if you are using volumetric data such as an MRI scan, consider using a 3D CNN instead of a (2+1)D CNN."
+"* [Load video data](https://www.tensorflow.org/tutorials/load_data/video)\n",
+"* [Build a 3D CNN model for video classification](https://www.tensorflow.org/tutorials/video/video_classification)\n",
+"* [MoViNet for streaming action recognition](https://www.tensorflow.org/hub/tutorials/movinet)"
 ]
 }
 ],
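
The evaluation cells added above hinge on tf.math.confusion_matrix, which counts how often each actual class (rows) is predicted as each class (columns). A toy illustration, independent of the notebook's model and data:

import tensorflow as tf

# Toy confusion matrix: rows are actual classes, columns are predicted classes.
actual = tf.constant([0, 1, 2, 2, 1])
predicted = tf.constant([0, 2, 2, 2, 1])
print(tf.math.confusion_matrix(actual, predicted).numpy())
# [[1 0 0]
#  [0 1 1]
#  [0 0 2]]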

site/en/tutorials/video/video_classification.ipynb

Lines changed: 25 additions & 4 deletions
@@ -66,7 +66,13 @@
 "* Build an input pipeline\n",
 "* Build a 3D convolutional neural network model with residual connections using Keras functional API\n",
 "* Train the model\n",
-"* Evaluate and test the model"
+"* Evaluate and test the model \n",
+"\n",
+"This video classification tutorial is the second part in a series of TensorFlow video tutorials. Here are the other three tutorials:\n",
+"\n",
+"- [Load video data](https://www.tensorflow.org/tutorials/load_data/video): This tutorial explains much of the code used in this document.\n",
+"- [MoViNet for streaming action recognition](https://www.tensorflow.org/hub/tutorials/movinet): Get familiar with the MoViNet models that are available on TF Hub.\n",
+"- [Transfer learning for video classification with MoViNet](https://www.tensorflow.org/tutorials/video/transfer_learning_with_movinet): This tutorial explains how to use a pre-trained video classification model trained on a different dataset with the UCF-101 dataset."
 ]
 },
 {
@@ -835,7 +841,7 @@
 },
 "outputs": [],
 "source": [
-"model.evaluate(test_ds, return_dict = True)"
+"model.evaluate(test_ds, return_dict=True)"
 ]
 },
 {
@@ -905,8 +911,8 @@
 },
 "outputs": [],
 "source": [
-"labels = ['ApplyEyeMakeup', 'ApplyLipstick', 'Archery', 'BabyCrawling', 'BalanceBeam',\n",
-" 'BandMarching', 'BaseballPitch', 'Basketball', 'BasketballDunk', 'BenchPress']"
+"fg = FrameGenerator(subset_paths['train'], num_frames, training = True)\n",
+"label_names = list(fg.class_ids_for_name.keys())"
 ]
 },
 {
@@ -1013,6 +1019,21 @@
 "source": [
 "recall"
 ]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "d4WsP4Z2HZ6L"
+},
+"source": [
+"## Next Steps\n",
+"\n",
+"To learn more about working with video data in TensorFlow, check out the following tutorials:\n",
+"\n",
+"* [Load video data](https://www.tensorflow.org/tutorials/load_data/video)\n",
+"* [MoViNet for streaming action recognition](https://www.tensorflow.org/hub/tutorials/movinet)\n",
+"* [Transfer learning for video classification with MoViNet](https://www.tensorflow.org/tutorials/video/transfer_learning_with_movinet)"
+]
 }
 ],
 "metadata": {
