|
67 | 67 | "source": [
|
68 | 68 | "Welcome to the guide on the structural pruning M by N.\n",
|
69 | 69 | "\n",
|
70 |
| - "Before reading this tutorial it is recommended to get familiar with the concept of pruning and APIs for unstructured pruning:\n", |
| 70 | + "Before reading this tutorial it is recommended to get familiar with the concept of pruning and APIs for random pruning:\n", |
71 | 71 | "* General overview of the pruning technique for the model optimization, see the [overview](https://www.tensorflow.org/model_optimization/guide/pruning).\n",
|
72 | 72 | "* Usage of API's on a single end-to-end example, see the [pruning example](https://www.tensorflow.org/model_optimization/guide/pruning/pruning_with_keras).\n",
|
73 | 73 | "\n",
|
|
80 | 80 | "id": "FbORZA_bQx1G"
|
81 | 81 | }
|
82 | 82 | },
|
| 83 | + { |
| 84 | + "cell_type": "markdown", |
| 85 | + "source": [ |
| 86 | + "## Structural pruning M by N" |
| 87 | + ], |
| 88 | + "metadata": {} |
| 89 | + }, |
| 90 | + { |
| 91 | + "cell_type": "markdown", |
| 92 | + "source": [ |
| 93 | + "Structural pruning zeroes out model weights at the beginning of the training\n", |
| 94 | + "process according to the following pattern: M weights are set to zero in the\n", |
| 95 | + "block of N weights. Note that this pattern affects only the last dimension of the weight tensor for the model that is converted by TensorFlow Lite. For example, `Conv2D` layer weights in TensorFlow Lite have the structure [channel_out, height, width, channel_in] and `Dense` layer weights have the structure [channel_out, channel_in]. The sparsity pattern is applied to the weights in the last dimension: channel_in.\n", |
| 96 | + "Special hardware can benefit from this type of sparsity in the model, and inference can be sped up by as much as 2x. Because this sparsity pattern is more restrictive, the accuracy achieved after fine-tuning is usually lower than with unrestricted magnitude-based pruning.\n", |
| 97 | + "Keep in mind that the pattern is guaranteed only for the model that is converted to TensorFlow Lite.\n", |
| 98 | + "If the model is quantized, then the accuracy could be improved using [collaborative optimization technique](https://blog.tensorflow.org/2021/10/Collaborative-Optimizations.html): Sparsity preserving quantization aware training." |
| 99 | + ], |
| 100 | + "metadata": {} |
| 101 | + }, |
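The (M, N) pattern described above can be sketched in plain NumPy. This is illustrative only, not the TFMOT implementation, and `prune_m_by_n` is a hypothetical helper: it zeroes the M smallest-magnitude weights in every block of N along the last dimension.

```python
import numpy as np

def prune_m_by_n(weights, m=2, n=4):
    """Zero the m smallest-magnitude weights in each block of n along the
    last dimension. Assumes the flattened size is divisible by n."""
    w = weights.reshape(-1, n).copy()
    # Indices of the m smallest-magnitude entries in each block of n.
    idx = np.argsort(np.abs(w), axis=1)[:, :m]
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([[0.1, -0.5, 0.3, -0.2],
              [0.7, 0.05, -0.9, 0.4]])
print(prune_m_by_n(w))
```

Each row is one block of four; the two entries with the smallest magnitudes (0.1 and -0.2 in the first block, 0.05 and 0.4 in the second) are set to zero, leaving exactly 50% sparsity per block.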
83 | 102 | {
|
84 | 103 | "cell_type": "markdown",
|
85 | 104 | "source": [
|
|
117 | 136 | "execution_count": null,
|
118 | 137 | "source": [
|
119 | 138 | "import tensorflow as tf\n",
|
| 139 | + "from tensorflow import keras\n", |
120 | 140 | "\n",
|
121 | 141 | "import tensorflow_model_optimization as tfmot\n",
|
122 |
| - "prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude\n", |
123 |
| - "\n", |
124 |
| - "from tensorflow import keras" |
| 142 | + "prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude" |
125 | 143 | ],
|
126 | 144 | "outputs": [],
|
127 | 145 | "metadata": {}
|
|
154 | 172 | "cell_type": "markdown",
|
155 | 173 | "source": [
|
156 | 174 | "Define parameters for pruning and specify the type of structural pruning that will be used: (2, 4).\n",
|
157 |
| - "It means that in a block of four elements, two with the lowest magnitude will be set to zero.\n", |
| 175 | + "It means that in a block of four elements, at least two with the lowest magnitude will be set to zero.\n", |
158 | 176 | "\n",
|
159 | 177 | "We don't set `pruning_schedule` parameter. By default, the pruning mask is defined at the first step and it is not updated during the training."
|
160 | 178 | ],
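As a sketch of the parameter dictionary this step builds: the (2, 4) pattern is selected via a `sparsity_m_by_n` entry that is later unpacked into `prune_low_magnitude` (the key name follows the TFMOT Keras pruning API; verify it against your installed version).

```python
# Sketch of the structural pruning parameters; `sparsity_m_by_n` is the
# TFMOT key for the (M, N) pattern (assumption: check your TFMOT version).
pruning_params_2_by_4 = {
    'sparsity_m_by_n': (2, 4),
}

# The dict is then unpacked into the wrapper, e.g.:
#   prune_low_magnitude(keras.layers.Dense(...), **pruning_params_2_by_4)
print(pruning_params_2_by_4['sparsity_m_by_n'])
```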
|
|
174 | 192 | {
|
175 | 193 | "cell_type": "markdown",
|
176 | 194 | "source": [
|
177 |
| - "Define parameters for unstructured pruning with the same target sparsity: 50%." |
| 195 | + "Define parameters for random pruning with the target sparsity: 50%." |
178 | 196 | ],
|
179 | 197 | "metadata": {}
|
180 | 198 | },
|
181 | 199 | {
|
182 | 200 | "cell_type": "code",
|
183 | 201 | "execution_count": null,
|
184 | 202 | "source": [
|
185 |
| - "pruning_params_unstructured = {\n", |
| 203 | + "pruning_params_sparsity_0_5 = {\n", |
186 | 204 | " 'pruning_schedule': tfmot.sparsity.keras.ConstantSparsity(target_sparsity=0.5,\n",
|
187 | 205 | " begin_step=0,\n",
|
188 | 206 | " frequency=100)\n",
|
|
198 | 216 | "\n",
|
199 | 217 | "In the example below, we prune only some of the layers. We prune `Conv2D` layer with the biggest number of parameters and an internal `Dense` layer.\n",
|
200 | 218 | "\n",
|
201 |
| - "It is important to notice that even if we marked the first `Conv2D` layer to be structural pruned, it is not structurally pruned, because the number of input channels is 1. Therefore, we prune the first `Conv2D` layer with the unstructured pruning.\n" |
| 219 | + "Note that even though we marked the first `Conv2D` layer for structural pruning, it is not structurally pruned: the pattern is applied along the input-channel dimension and needs whole blocks of N = 4 channels, while this layer has only 1 input channel. Therefore, we prune the first `Conv2D` layer with random pruning instead.\n" |
202 | 220 | ],
|
203 | 221 | "metadata": {}
|
204 | 222 | },
|
|
211 | 229 | " keras.layers.Conv2D(\n",
|
212 | 230 | " 32, 5, padding='same', activation='relu',\n",
|
213 | 231 | " input_shape=(28, 28, 1),\n",
|
214 |
| - " name=\"unstructured_pruning\"),\n", |
215 |
| - " **pruning_params_unstructured),\n", |
| 232 | + " name=\"pruning_sparsity_0_5\"),\n", |
| 233 | + " **pruning_params_sparsity_0_5),\n", |
216 | 234 | " keras.layers.MaxPooling2D((2, 2), (2, 2), padding='same'),\n",
|
217 | 235 | " prune_low_magnitude(\n",
|
218 | 236 | " keras.layers.Conv2D(\n",
|
|
264 | 282 | " callbacks=tfmot.sparsity.keras.UpdatePruningStep(),\n",
|
265 | 283 | " validation_split=0.1)\n",
|
266 | 284 | "\n",
|
267 |
| - "_, model_for_pruning_accuracy = model.evaluate(test_images, test_labels, verbose=0)\n", |
268 |
| - "print('Pruned test accuracy:', model_for_pruning_accuracy)" |
| 285 | + "_, pruned_model_accuracy = model.evaluate(test_images, test_labels, verbose=0)\n", |
| 286 | + "print('Pruned test accuracy:', pruned_model_accuracy)" |
269 | 287 | ],
|
270 | 288 | "outputs": [],
|
271 | 289 | "metadata": {}
|
|
313 | 331 | {
|
314 | 332 | "cell_type": "markdown",
|
315 | 333 | "source": [
|
316 |
| - "## Visualize and check weights." |
| 334 | + "## Visualize and check weights" |
317 | 335 | ],
|
318 | 336 | "metadata": {}
|
319 | 337 | },
|
320 | 338 | {
|
321 | 339 | "cell_type": "markdown",
|
322 | 340 | "source": [
|
323 |
| - "Now let visualize the weights structure in the `Dense` layer pruned with 2/4 sparsity. At first, we need to extract these weights from the tflite file." |
| 341 | + "Now let's visualize the weights structure in the `Dense` layer pruned with 2 by 4 sparsity. First, we need to extract these weights from the tflite file." |
324 | 342 | ],
|
325 | 343 | "metadata": {}
|
326 | 344 | },
|
|
347 | 365 | {
|
348 | 366 | "cell_type": "markdown",
|
349 | 367 | "source": [
|
350 |
| - "To check that we selected the layer that has been pruned, let us check the shape of the weight tensor." |
| 368 | + "To verify that we selected the correct layer that has been pruned, let us print the shape of the weight tensor." |
351 | 369 | ],
|
352 | 370 | "metadata": {}
|
353 | 371 | },
|
|
376 | 394 | "import matplotlib.pyplot as plt\n",
|
377 | 395 | "import numpy as np\n",
|
378 | 396 | "\n",
|
| 397 | + "# The value 24 is chosen for convenience.\n", |
379 | 398 | "width = height = 24\n",
|
380 | 399 | "\n",
|
381 | 400 | "subset_values_to_display = tensor_data[0:height, 0:width]\n",
|
|
450 | 469 | "cell_type": "code",
|
451 | 470 | "execution_count": null,
|
452 | 471 | "source": [
|
453 |
| - "# Let us get weights of the convolutional layer that has been pruned with 2/4 sparsity.\n", |
| 472 | + "# Let us get weights of the convolutional layer that has been pruned with 2 by 4 sparsity.\n", |
454 | 473 | "tensor_name = 'structural_pruning/Conv2D'\n",
|
455 | 474 | "detail = [x for x in details if tensor_name in x[\"name\"]]\n",
|
456 | 475 | "tensor_data = interpreter.tensor(detail[1][\"index\"])()\n",
|
|
491 | 510 | {
|
492 | 511 | "cell_type": "markdown",
|
493 | 512 | "source": [
|
494 |
| - "Let's see how unstructured weights look. We extract them and display a subset of the weight tensor." |
| 513 | + "Let's see how randomly pruned weights look. We extract them and display a subset of the weight tensor." |
495 | 514 | ],
|
496 | 515 | "metadata": {}
|
497 | 516 | },
|
498 | 517 | {
|
499 | 518 | "cell_type": "code",
|
500 | 519 | "execution_count": null,
|
501 | 520 | "source": [
|
502 |
| - "# Let us get weights of the convolutional layer that has been pruned with unstructured pruning.\n", |
503 |
| - "tensor_name = 'unstructured_pruning/Conv2D'\n", |
| 521 | + "# Let us get weights of the convolutional layer that has been pruned with random pruning.\n", |
| 522 | + "tensor_name = 'pruning_sparsity_0_5/Conv2D'\n", |
504 | 523 | "detail = [x for x in details if tensor_name in x[\"name\"]]\n",
|
505 | 524 | "tensor_data = interpreter.tensor(detail[0][\"index\"])()\n",
|
506 | 525 | "print(f\"Shape of the weight tensor is {tensor_data.shape}\")"
|
|
533 | 552 | {
|
534 | 553 | "cell_type": "markdown",
|
535 | 554 | "source": [
|
536 |
| - "There is a python script included in the TensorFlow Model Optimization Toolkit that could be used to check whether which layers in the model from the given flite file have the structurally pruned weights: [`check_sparsity_m_by_n.py`](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/core/sparsity/keras/tools/check_sparsity_m_by_n.py)." |
| 555 | + "There is a Python script included in the TensorFlow Model Optimization Toolkit that can be used to check which layers in the model from the given tflite file have structurally pruned weights: [`check_sparsity_m_by_n.py`](https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/core/sparsity/keras/tools/check_sparsity_m_by_n.py). The usage of this tool for the 2 by 4 case is shown below:" |
| 556 | + ], |
| 557 | + "metadata": {} |
| 558 | + }, |
| 559 | + { |
| 560 | + "cell_type": "markdown", |
| 561 | + "source": [ |
| 562 | + "`python ./tensorflow_model_optimization/python/core/sparsity/keras/tools/check_sparsity_m_by_n.py --model_tflite=pruned_model.tflite --m_by_n=2,4`" |
537 | 563 | ],
|
538 | 564 | "metadata": {}
|
539 | 565 | }
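For a quick in-notebook alternative to the script, a simplified stand-in check can be written in NumPy (`is_pruned_m_by_n` is a hypothetical helper; it assumes the pattern is applied along the flattened last dimension, as described earlier in this guide).

```python
import numpy as np

def is_pruned_m_by_n(tensor, m=2, n=4):
    """Return True if every block of n values along the last dimension
    contains at least m zeros."""
    flat = tensor.reshape(-1, tensor.shape[-1])
    if flat.shape[-1] % n != 0:
        return False
    blocks = flat.reshape(-1, n)
    zeros_per_block = (blocks == 0).sum(axis=1)
    return bool((zeros_per_block >= m).all())

example = np.array([[0.0, 0.4, 0.0, -0.1],
                    [0.3, 0.0, 0.0, 0.8]])
print(is_pruned_m_by_n(example))
```

Applied to `tensor_data` extracted from the interpreter above, this reports whether the 2 by 4 pattern survived conversion; dense (unpruned) tensors return False.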
|
|