
Commit d9c3be7

Merge pull request #978 from wwwind:docs_cpc
PiperOrigin-RevId: 455000418
2 parents 39d91a2 + a63fb72

File tree: 3 files changed, +36 -1 lines changed

tensorflow_model_optimization/g3doc/guide/clustering/clustering_comprehensive_guide.ipynb

Lines changed: 22 additions & 0 deletions

@@ -279,6 +279,27 @@
 "clustered_model.summary()"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "bU0SIhY2Q63C"
+},
+"source": [
+"### Cluster convolutional layers per channel\n",
+"\n",
+"The clustered model can be passed to further optimizations such as [post-training quantization](https://www.tensorflow.org/lite/performance/post_training_quantization). If the quantization is done per channel, the model should be clustered per channel as well. This increases the accuracy of the clustered and quantized model.\n",
+"\n",
+"**Note:** only Conv2D layers are clustered per channel.\n",
+"\n",
+"To cluster per channel, set the parameter `cluster_per_channel` to `True`. It can be set for individual layers or for the whole model.\n",
+"\n",
+"**Tips:**\n",
+"\n",
+"* If the model is to be quantized further, consider using the [cluster preserving QAT technique](https://www.tensorflow.org/model_optimization/guide/combine/collaborative_optimization).\n",
+"\n",
+"* The model can be pruned before applying per-channel clustering. If the parameter `preserve_sparsity` is set to `True`, sparsity is preserved during per-channel clustering. Note that the [sparsity and cluster preserving QAT technique](https://www.tensorflow.org/model_optimization/guide/combine/collaborative_optimization) should be used in this case."
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {

@@ -466,6 +487,7 @@
 "colab": {
 "collapsed_sections": [],
 "name": "clustering_comprehensive_guide.ipynb",
+"provenance": [],
 "toc_visible": true
 },
 "kernelspec": {
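The per-channel idea documented in the hunk above can be illustrated outside of tfmot. Below is a minimal NumPy sketch, not the tfmot implementation: the helper names (`cluster_channel`, `cluster_per_channel`) are hypothetical, and a tiny Lloyd's k-means stands in for the library's clustering. It clusters each output channel of a Conv2D-shaped kernel separately, so every channel ends up with its own small codebook of centroids.

```python
import numpy as np

def cluster_channel(weights, n_clusters, n_iter=10, seed=0):
    """Lloyd's k-means on a flat weight array; returns weights snapped to centroids."""
    rng = np.random.default_rng(seed)
    flat = weights.ravel()
    centroids = rng.choice(flat, size=n_clusters, replace=False)
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid, then recompute centroids.
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = flat[assign == k].mean()
    return centroids[assign].reshape(weights.shape)

def cluster_per_channel(kernel, n_clusters):
    """Cluster a Conv2D kernel (H, W, Cin, Cout) one output channel at a time."""
    out = np.empty_like(kernel)
    for c in range(kernel.shape[-1]):
        out[..., c] = cluster_channel(kernel[..., c], n_clusters)
    return out

kernel = np.random.default_rng(1).normal(size=(3, 3, 8, 16))
clustered = cluster_per_channel(kernel, n_clusters=8)
# Each output channel now holds at most 8 unique values (its own codebook).
print(max(len(np.unique(clustered[..., c])) for c in range(16)))
```

With per-tensor clustering a single codebook must cover channels of very different scales; per-channel codebooks adapt to each channel's value range, which is why they combine better with per-channel quantization.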

tensorflow_model_optimization/g3doc/guide/combine/collaborative_optimization.md

Lines changed: 12 additions & 0 deletions

@@ -106,6 +106,18 @@ with PQAT and CQAT collaborative optimization paths.
 </table>
 </figure>

+### CQAT and PCQAT results for models clustered per channel
+The results below were obtained with the [per-channel clustering](https://www.tensorflow.org/model_optimization/guide/clustering) technique.
+They show that clustering the convolutional layers of a model per channel yields higher accuracy. If your model has many convolutional layers, we recommend clustering per channel: the compression ratio stays the same, but the model accuracy is higher. The model optimization pipeline in our experiments is 'clustered -> cluster preserving QAT -> post-training quantization, int8'.
+<figure>
+<table class="tableizer-table">
+<tr class="tableizer-firstrow"><th>Model</th><th>Clustered -> CQAT, int8 quantized</th><th>Clustered per channel -> CQAT, int8 quantized</th></tr>
+<tr><td>DS-CNN-L</td><td>95.949%</td><td>96.44%</td></tr>
+<tr><td>MobileNet-V2</td><td>71.538%</td><td>72.638%</td></tr>
+<tr><td>MobileNet-V2 (pruned)</td><td>71.45%</td><td>71.901%</td></tr>
+</table>
+</figure>
+
 ## Examples

 For end-to-end examples of the collaborative optimization techniques described
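The pairing of per-channel clustering with per-channel int8 quantization can be sketched in plain NumPy (this is an illustration of the general scheme, not the TFLite converter's code): symmetric per-channel quantization computes one scale per output channel, so a channel whose weights were clustered to a few centroids maps onto that channel's own int8 grid.

```python
import numpy as np

def quantize_per_channel(kernel):
    """Symmetric int8 quantization with one scale per output (last) axis."""
    scales = np.abs(kernel).max(axis=(0, 1, 2)) / 127.0
    q = np.round(kernel / scales).clip(-127, 127).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

kernel = np.random.default_rng(0).normal(size=(3, 3, 4, 8)).astype(np.float32)
q, scales = quantize_per_channel(kernel)
error = np.abs(dequantize(q, scales) - kernel).max()
# Rounding error is bounded by half a quantization step of the widest channel.
print(error <= 0.5 * scales.max() + 1e-6)
```

Because each channel's scale adapts to that channel's range, centroids produced by per-channel clustering land close to representable int8 grid points, which is consistent with the accuracy gains reported in the table above.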

tensorflow_model_optimization/g3doc/guide/combine/cqat_example.ipynb

Lines changed: 3 additions & 2 deletions

@@ -253,6 +253,7 @@
 "clustering_params = {\n",
 " 'number_of_clusters': 8,\n",
-" 'cluster_centroids_init': CentroidInitialization.KMEANS_PLUS_PLUS\n",
+" 'cluster_centroids_init': CentroidInitialization.KMEANS_PLUS_PLUS,\n",
+" 'cluster_per_channel': True,\n",
 "}\n",
 "\n",
 "clustered_model = cluster_weights(model, **clustering_params)\n",

@@ -597,7 +598,7 @@
 "source": [
 "## Apply post-training quantization and compare to CQAT model\n",
 "\n",
-"Next, we use post-training quantization (no fine-tuning) on the clustered model and check its accuracy against the CQAT model. This demonstrates why you would need to use CQAT to improve the quantized model's accuracy.\n",
+"Next, we use post-training quantization (no fine-tuning) on the clustered model and check its accuracy against the CQAT model. This demonstrates why you would need CQAT to improve the quantized model's accuracy. The difference may not be very visible, because the MNIST model is quite small and overparameterized.\n",
 "\n",
 "First, define a generator for the calibration dataset from the first 1000 training images."
 ]
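The calibration generator mentioned at the end of the hunk above typically looks like the sketch below. The `train_images` array here is stand-in random data (the notebook uses the MNIST training set), and `representative_dataset` is the usual name for the generator handed to TFLite's post-training quantization hook; treat both names as assumptions, not the notebook's exact code.

```python
import numpy as np

# Stand-in for the MNIST training images used in the notebook.
train_images = np.random.default_rng(0).random((1000, 28, 28)).astype(np.float32)

def representative_dataset():
    """Yield one single-image float32 batch per calibration step."""
    for image in train_images[:1000]:
        # Add batch and channel axes: (28, 28) -> (1, 28, 28, 1).
        yield [image[np.newaxis, ..., np.newaxis]]

batches = list(representative_dataset())
print(len(batches), batches[0][0].shape)  # → 1000 (1, 28, 28, 1)
```

In the real notebook this generator would be assigned to the TFLite converter's `representative_dataset` attribute before conversion with int8 quantization.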
