
Commit 70494e3

alanchiao authored and tensorflower-gardener committed
- Update QAT docs to clarify support.
- Update QAT tutorials based on 0.3.0 release code and clarify custom Keras layers use case.

PiperOrigin-RevId: 305277833
1 parent a5d7780 commit 70494e3

File tree

3 files changed: +25, -16 lines


tensorflow_model_optimization/g3doc/guide/quantization/training.md

Lines changed: 9 additions & 4 deletions
@@ -24,12 +24,12 @@ leading to benefits during deployment.
 
 Quantization brings improvements via model compression and latency reduction.
 With the API defaults, the model size shrinks by 4x, and we typically see
-between 1.5 - 4x improvements in CPU latency in the tested backends. Further
+between 1.5 - 4x improvements in CPU latency in the tested backends. Eventually,
 latency improvements can be seen on compatible machine learning accelerators,
 such as the [EdgeTPU](https://coral.ai/docs/edgetpu/benchmarks/) and NNAPI.
 
 The technique is used in production in speech, vision, text, and translate use
-cases. The code currently supports vision use cases and will expand over time.
+cases. The code currently supports a subset of these models.
 
 #### Experiment with quantization and associated hardware

@@ -62,10 +62,12 @@ Support is available in the following areas:
 
 * Model coverage: models using
   [whitelisted layers](https://github.com/tensorflow/model-optimization/tree/master/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantize_registry.py),
-  BatchNormalization, and in limited cases, Concat.
+  BatchNormalization when it follows a convolutional or `Dense` layer, and in
+  limited cases, `Concat`.
   <!-- TODO(tfmot): add more details and ensure they are all correct. -->
 * Hardware acceleration: our API defaults are compatible with acceleration on
-  EdgeTPU, NNAPI, and TFLite backends, amongst others.
+  EdgeTPU, NNAPI, and TFLite backends, amongst others. See the caveat in the
+  roadmap.
 * Deploy with quantization: only per-axis quantization for convolutional
   layers, not per-tensor quantization, is currently supported.

@@ -75,6 +77,9 @@ It is on our roadmap to add support in the following areas:
 to launch. -->
 
 * Model coverage: extended to include RNN/LSTMs and general Concat support.
+* Hardware acceleration: ensure the TFLite converter can produce full-integer
+  models. See [this
+  issue](https://github.com/tensorflow/tensorflow/issues/38285) for details.
 * Experiment with quantization use cases:
   * Experiment with quantization algorithms that span Keras layers or
     require the training step.
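
As context for the guide text above, the "API defaults" are applied through `tfmot.quantization.keras.quantize_model`. A minimal sketch; the small Sequential model here is illustrative and not part of this commit:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# An illustrative model built only from whitelisted layers.
base_model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(28, 28)),
    tf.keras.layers.Reshape(target_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Apply the API defaults: 8-bit quantization-aware training, with
# per-axis quantization for convolutional kernels as noted above.
quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)
quant_aware_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
```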

tensorflow_model_optimization/g3doc/guide/quantization/training_comprehensive_guide.ipynb

Lines changed: 13 additions & 9 deletions
@@ -102,8 +102,8 @@
 "outputs": [],
 "source": [
 "! pip uninstall -y tensorflow\n",
-"! pip install -q tf-nightly==2.2.0.dev20200315\n",
-"! pip install -q --extra-index-url=https://test.pypi.org/simple/ tensorflow-model-optimization==0.3.0.dev6\n",
+"! pip install -q tf-nightly\n",
+"! pip install -q tensorflow-model-optimization\n",
 "\n",
 "import tensorflow as tf\n",
 "import numpy as np\n",
@@ -610,12 +610,11 @@
 "id": "YmyhI_bzWb2w"
 },
 "source": [
-"This example uses the `DefaultDenseQuantizeConfig` to quantize a `Dense` layer. In practice, the layer\n",
-"can be any custom Keras layer.\n",
+"This example uses the `DefaultDenseQuantizeConfig` to quantize the `CustomLayer`.\n",
 "\n",
 "Applying the configuration is the same across\n",
 "the \"Experiment with quantization\" use cases.\n",
-" * Apply `tfmot.quantization.keras.quantize_annotate_layer` to the `Dense` layer and pass in the `QuantizeConfig`.\n",
+" * Apply `tfmot.quantization.keras.quantize_annotate_layer` to the `CustomLayer` and pass in the `QuantizeConfig`.\n",
 " * Use\n",
 "`tfmot.quantization.keras.quantize_annotate_model` to continue to quantize the rest of the model with the API defaults.\n",
 "\n"
@@ -635,14 +634,19 @@
 "quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model\n",
 "quantize_scope = tfmot.quantization.keras.quantize_scope\n",
 "\n",
+"class CustomLayer(tf.keras.layers.Dense):\n",
+"  pass\n",
+"\n",
 "model = quantize_annotate_model(tf.keras.Sequential([\n",
-"   quantize_annotate_layer(tf.keras.layers.Dense(20, input_shape=(20,)), DefaultDenseQuantizeConfig()),\n",
+"   quantize_annotate_layer(CustomLayer(20, input_shape=(20,)), DefaultDenseQuantizeConfig()),\n",
 "   tf.keras.layers.Flatten()\n",
 "]))\n",
 "\n",
-"# `quantize_apply` requires mentioning `DefaultDenseQuantizeConfig` with `quantize_scope`:\n",
+"# `quantize_apply` requires mentioning `DefaultDenseQuantizeConfig` with `quantize_scope`\n",
+"# as well as the custom Keras layer.\n",
 "with quantize_scope(\n",
-"  {'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig}):\n",
+"  {'DefaultDenseQuantizeConfig': DefaultDenseQuantizeConfig,\n",
+"   'CustomLayer': CustomLayer}):\n",
 "  # Use `quantize_apply` to actually make the model quantization aware.\n",
 "  quant_aware_model = tfmot.quantization.keras.quantize_apply(model)\n",
 "\n",
@@ -864,7 +868,7 @@
 "    # Not needed. No new TensorFlow variables needed.\n",
 "    return {}\n",
 "\n",
-"  def __call__(self, inputs, step, training, **kwargs):\n",
+"  def __call__(self, inputs, training, weights, **kwargs):\n",
 "    return tf.keras.backend.clip(inputs, -1.0, 1.0)\n",
 "\n",
 "  def get_config(self):\n",

tensorflow_model_optimization/g3doc/guide/quantization/training_example.ipynb

Lines changed: 3 additions & 3 deletions
@@ -118,8 +118,8 @@
 "outputs": [],
 "source": [
 "! pip uninstall -y tensorflow\n",
-"! pip install -q tf-nightly==2.2.0.dev20200305\n",
-"! pip install -q --extra-index-url=https://test.pypi.org/simple/ tensorflow-model-optimization==0.3.0.dev6\n"
+"! pip install -q tf-nightly\n",
+"! pip install -q tensorflow-model-optimization\n"
 ]
 },
 {
@@ -222,7 +222,7 @@
 "\n",
 "Note that the resulting model is quantization aware but not quantized (e.g. the weights are float32 instead of int8). The sections after show how to create a quantized model from the quantization aware one.\n",
 "\n",
-"In the [comprehensive guide](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide.ipynb), you can see how to quantize some layers for model accuracy improvements."
+"In the [comprehensive guide](https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide.md), you can see how to quantize some layers for model accuracy improvements."
 ]
 },
 {
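
The "sections after" that this note points to convert the quantization-aware model with the TFLite converter. A minimal sketch, assuming `quant_aware_model` from the tutorial:

```python
import tensorflow as tf

# Create an actually-quantized model (int8 weights) from the
# quantization-aware one.
converter = tf.lite.TFLiteConverter.from_keras_model(quant_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

with open('quantized_model.tflite', 'wb') as f:
  f.write(quantized_tflite_model)
```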
