Commit de97a51
improve test check
1 parent 55d6155 commit de97a51

2 files changed (+8, -2 lines)

docs/source/en/quantization/torchao.md (2 additions, 2 deletions)

````diff
@@ -11,7 +11,7 @@ specific language governing permissions and limitations under the License. -->
 
 # torchao
 
-[TorchAO](https://github.com/pytorch/ao) is an architecture optimization library for PyTorch, it provides high performance dtypes, optimization techniques and kernels for inference and training, featuring composability with native PyTorch features like `torch.compile`, FSDP etc.. Some benchmark numbers can be found [here](https://github.com/pytorch/ao/tree/main/torchao/quantization#benchmarks).
+[TorchAO](https://github.com/pytorch/ao) is an architecture optimization library for PyTorch, it provides high performance dtypes, optimization techniques and kernels for inference and training, featuring composability with native PyTorch features like `torch.compile`, FSDP etc. Some benchmark numbers can be found [here](https://github.com/pytorch/ao/tree/main/torchao/quantization#benchmarks).
 
 Before you begin, make sure you have Pytorch version 2.5, or above, and TorchAO installed:
 
@@ -21,7 +21,7 @@ pip install -U torch torchao
 
 ## Usage
 
-Now you can quantize a model by passing a [`TorchAoConfig`] to [`~ModelMixin.from_pretrained`]. This works for any model in any modality, as long as it supports loading with [Accelerate](https://hf.co/docs/accelerate/index) and contains `torch.nn.Linear` layers.
+Now you can quantize a model by passing a [`TorchAoConfig`] to [`~ModelMixin.from_pretrained`]. Loading pre-quantized models is supported as well! This works for any model in any modality, as long as it supports loading with [Accelerate](https://hf.co/docs/accelerate/index) and contains `torch.nn.Linear` layers.
 
 ```python
 from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig
````

tests/quantization/torchao/test_torchao.py (6 additions, 0 deletions)

```diff
@@ -74,6 +74,7 @@ def forward(self, input, *args, **kwargs):
 if is_torchao_available():
     from torchao.dtypes import AffineQuantizedTensor
     from torchao.dtypes.affine_quantized_tensor import TensorCoreTiledLayoutType
+    from torchao.quantization.linear_activation_quantized_tensor import LinearActivationQuantizedTensor
 
 
 @require_torch
@@ -494,6 +495,11 @@ def check_serialization_expected_slice(self, expected_slice):
         output = loaded_quantized_model(**inputs)[0]
 
         output_slice = output.flatten()[-9:].detach().float().cpu().numpy()
+        self.assertTrue(
+            isinstance(
+                loaded_quantized_model.proj_out.weight, (AffineQuantizedTensor, LinearActivationQuantizedTensor)
+            )
+        )
         self.assertTrue(np.allclose(output_slice, expected_slice, atol=1e-3, rtol=1e-3))
 
     def test_serialization_expected_slice(self):
```

0 commit comments
