|
141 | 141 | "cell_type": "markdown", |
142 | 142 | "metadata": {}, |
143 | 143 | "source": [ |
144 | | - "From an algorithmic point of view then the two different implementation are doing the same thing. However, as it will become clearer in later tutorials, there are currently some scenarios where picking one style over the other can make a difference when it comes to exporting to a format such as standard ONNX. In the meantime, we can just keep in mind that both alternatives exist." |
| 144 | + "From an algorithmic point of view the two different implementation are doing the same thing. However, as it will become clearer in later tutorials, there are currently some scenarios where picking one style over the other can make a difference when it comes to exporting to a format such as standard ONNX. In the meantime, we can just keep in mind that both alternatives exist." |
145 | 145 | ] |
146 | 146 | }, |
147 | 147 | { |
|
251 | 251 | "cell_type": "markdown", |
252 | 252 | "metadata": {}, |
253 | 253 | "source": [ |
254 | | - "As expected, a `QuantIdentity` with quantization disabled behaves like an identity function also when a `QuantTensor` is passed in. However, depending on whather `return_quant_tensor` is set to `False` or not, quantization metadata might be stripped out, i.e. the input `QuantTensor` is going to be returned as an implicitly quantized `torch.Tensor`:" |
| 254 | + "As expected, a `QuantIdentity` with quantization disabled behaves like an identity function also when a `QuantTensor` is passed in. However, depending on whether `return_quant_tensor` is set to `False` or not, quantization metadata might be stripped out, i.e. the input `QuantTensor` is going to be returned as an implicitly quantized `torch.Tensor`:" |
255 | 255 | ] |
256 | 256 | }, |
257 | 257 | { |
|
625 | 625 | "source": [ |
626 | 626 | "Regarding some premade activation quantizers, such as `Uint8ActPerTensorFloat`, `ShiftedUint8ActPerTensorFloat`, and `Int8ActPerTensorFloat`, a word of caution that anticipates some of the themes of the next tutorial.\n", |
627 | 627 | "To minimize user interaction, Brevitas initializes scale and zero-point by collecting statistics for a number of training steps (by default 30). This can be seen as a sort of very basic calibration step, although it typically happens during training and with quantization already enabled. These statistics are accumulated in an exponential moving average that at end of the collection phase is used to initialize a learned *parameter*.\n", |
628 | | - "During the collection phase then, the quantizer behaves differently between `train()` and `eval()` mode. In `train()` mode, the statistics for that particular batch are returned. In `eval()` mode, the exponential moving average is returned. After the collection phase is over the learned parameter is returned in both execution modes.\n", |
| 628 | + "During the collection phase then, the quantizer behaves differently between `train()` and `eval()` mode. In `train()` mode, the statistics for that particular batch are returned. In `eval()` mode, the exponential moving average is returned. After the collection phase is over, the learned parameter is returned in both execution modes.\n", |
629 | 629 | "We can easily observe this behaviour with an example. Let's first define a quantized activation and two random input tensors:" |
630 | 630 | ] |
631 | 631 | }, |
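As a compressed sketch of the behaviour described above (the notebook's subsequent cells show the actual numbers; this assumes only the default premade activation quantizer):

```python
import torch
from brevitas.nn import QuantReLU

act = QuantReLU(return_quant_tensor=True)  # default Uint8ActPerTensorFloat quantizer
x1, x2 = torch.randn(8, 16), torch.randn(8, 16)

# During the statistics collection phase:
act.train()
print(act(x1).scale)  # scale computed from this batch's statistics
print(act(x2).scale)  # a different batch gives a different scale

act.eval()
print(act(x1).scale)  # scale taken from the exponential moving average instead
```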
|
818 | 818 | "cell_type": "markdown", |
819 | 819 | "metadata": {}, |
820 | 820 | "source": [ |
821 | | - "In all of the examples that have currently been looked at in this tutorial, we have used per-tensor quantization. I.e., the output tensor of the activation, if quantized, was always quantized on a per-tensor level, with a single scale and zero-point quantization parameter per output tensor. However, one can also do per-channel quantization, where each output channel of the tensor has its own quantization parameters. In the example below, we look at per-tensor quantization of an input tensor that has 3 channels and 256 elements in the height and width dimensions. We purposely mutate the 1st channel to have its dynamic range be 3 times larger than the other 2 channels. We then feed it through a `QuantReLU`, whose default behavior is to quantize at a per-tensor granularity." |
| 821 | + "In all of the examples that have looked at so far in this tutorial, we have used per-tensor quantization. I.e., the output tensor of the activation, if quantized, was always quantized on a per-tensor level, with a single scale and zero-point quantization parameter per output tensor. However, one can also do per-channel quantization, where each output channel of the tensor has its own quantization parameters. In the example below, we look at per-tensor quantization of an input tensor that has 3 channels and 256 elements in the height and width dimensions. We purposely mutate the 1st channel to have its dynamic range be 3 times larger than the other 2 channels. We then feed it through a `QuantReLU`, whose default behavior is to quantize at a per-tensor granularity." |
822 | 822 | ] |
823 | 823 | }, |
824 | 824 | { |
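For reference, a minimal sketch of that setup, assuming only the default `QuantReLU` configuration (the notebook's actual cells follow below in the full file):

```python
import torch
from brevitas.nn import QuantReLU

inp = torch.randn(1, 3, 256, 256)
inp[:, 0, :, :] *= 3.0  # inflate the dynamic range of the 1st channel

per_tensor_relu = QuantReLU(return_quant_tensor=True)  # per-tensor by default
out = per_tensor_relu(inp)
print(out.scale)  # a single scale shared by all 3 channels
```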
|
1069 | 1069 | "cell_type": "markdown", |
1070 | 1070 | "metadata": {}, |
1071 | 1071 | "source": [ |
1072 | | - "We can see that the number of elements in the quantization scale of the outputted tensor is now 3, matching those of the 3-channel tensor! Furthermore, we see that each channel has an 8-bit quantization range that matches its data distribution, which is much more ideal in terms of reducing quantization mismatch. However, it's important to note that some hardware providers don't efficiently support per-channel quantization in production, so it's best to check if your targetted hardware will allow per-channel quantization." |
| 1072 | + "We can see that the number of elements in the quantization scale of the output tensor is now 3, matching those of the 3-channel tensor! Furthermore, we see that each channel has an 8-bit quantization range that matches its data distribution, which is much more ideal in terms of reducing quantization mismatch. However, it's important to note that some hardware providers don't efficiently support per-channel quantization in production, so it's best to check if your targetted hardware will allow per-channel quantization." |
1073 | 1073 | ] |
1074 | 1074 | }, |
1075 | 1075 | { |
1076 | 1076 | "cell_type": "markdown", |
1077 | 1077 | "metadata": {}, |
1078 | 1078 | "source": [ |
1079 | 1079 | "Finally, a reminder that mixing things up is perfectly legal and encouraged in Brevitas.\n", |
1080 | | - "For example, a `QuantIdentity` with `act_quant=Int8ActPerTensorFloatMinMaxInit` is equivalent to a default `QuantHardTanh`, or conversely a `QuantHardTanh` with `act_quant=Int8ActPerTensorFloat` is equivalent to a default `QuantIdentity`. This is allowed by the fact that - as it will be explained in the next tutorial - the same layer can accept different keyword arguments when different quantizers are set. So a QuantIdentity with `act_quant=Int8ActPerTensorFloatMinMaxInit` is going to expect arguments `min_val` and `max_val` the same way a default `QuantHardTanh` would." |
| 1080 | + "For example, a `QuantIdentity` with `act_quant=Int8ActPerTensorFloatMinMaxInit` is equivalent to a default `QuantHardTanh`, or conversely a `QuantHardTanh` with `act_quant=Int8ActPerTensorFloat` is equivalent to a default `QuantIdentity`. This is allowed by the fact that - as it will be explained in the next tutorial - the same layer can accept different keyword arguments when different quantizers are set. So a `QuantIdentity` with `act_quant=Int8ActPerTensorFloatMinMaxInit` is going to expect arguments `min_val` and `max_val` the same way a default `QuantHardTanh` would." |
1081 | 1081 | ] |
1082 | 1082 | } |
1083 | 1083 | ], |
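A hedged illustration of that equivalence, assuming the premade quantizers are importable from `brevitas.quant` and that `Int8ActPerTensorFloatMinMaxInit` takes its range from `min_val`/`max_val` as described above:

```python
from brevitas.nn import QuantHardTanh, QuantIdentity
from brevitas.quant import Int8ActPerTensorFloat, Int8ActPerTensorFloatMinMaxInit

# Behaves like a default QuantHardTanh: the quantizer range is initialized from min/max.
hard_tanh_like = QuantIdentity(
    act_quant=Int8ActPerTensorFloatMinMaxInit, min_val=-1.0, max_val=1.0)

# Behaves like a default QuantIdentity: the scale is learned from statistics instead.
identity_like = QuantHardTanh(act_quant=Int8ActPerTensorFloat)
```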
|