recipes_source/quantization.rst
Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ The full documentation of the `quantize_dynamic` API call is `here <https://pyto
3. Post Training Static Quantization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-This method converts both the weights and the activations to 8-bit integers beforehand so there won't be on-the-fly conversion on the activations during the inference, as the dynamic quantization does, hence improving the performance significantly.
+This method converts both the weights and the activations to 8-bit integers beforehand, so there won't be on-the-fly conversion of the activations during inference, as there is with dynamic quantization. While post-training static quantization can significantly enhance inference speed and reduce model size, it may degrade the original model's accuracy more than post-training dynamic quantization.
To apply static quantization on a model, run the following code:
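
The code block this line introduces is not included in the diff hunk above. As a rough sketch only (the ``torch.ao.quantization`` module path, the toy model, and the random calibration batch below are assumptions, not necessarily the tutorial's exact code), eager-mode post-training static quantization follows a prepare/calibrate/convert sequence:

.. code-block:: python

    # Sketch of eager-mode post-training static quantization, assuming the
    # torch.ao.quantization API; the tutorial's actual example may differ.
    import torch

    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            # QuantStub/DeQuantStub mark where tensors enter and leave int8.
            self.quant = torch.ao.quantization.QuantStub()
            self.conv = torch.nn.Conv2d(1, 1, 1)
            self.relu = torch.nn.ReLU()
            self.dequant = torch.ao.quantization.DeQuantStub()

        def forward(self, x):
            x = self.quant(x)
            x = self.conv(x)
            x = self.relu(x)
            x = self.dequant(x)
            return x

    model_fp32 = M().eval()  # static quantization is applied in eval mode

    # Attach a quantization configuration (observers for weights and activations).
    model_fp32.qconfig = torch.ao.quantization.get_default_qconfig('fbgemm')

    # Insert observers that record activation statistics.
    model_prepared = torch.ao.quantization.prepare(model_fp32)

    # Calibrate with data representative of the deployment workload so the
    # activation ranges can be fixed ahead of time.
    calibration_batch = torch.randn(4, 1, 4, 4)
    model_prepared(calibration_batch)

    # Convert weights and activations to int8 using the collected statistics.
    model_int8 = torch.ao.quantization.convert(model_prepared)

The calibration pass is what replaces the on-the-fly activation range estimation that dynamic quantization performs at inference time.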