
Commit 06b411f

improve docs
1 parent 8f10d05 commit 06b411f

File tree

2 files changed: +7 -7 lines changed


docs/source/en/optimization/memory.md

Lines changed: 2 additions & 2 deletions
@@ -160,9 +160,9 @@ In order to properly offload models after they're called, it is required to run
 
 ## Group offloading
 
-Group offloading is a middle ground between the two above methods. It works by offloading groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`). This method is more memory-efficient than model-level offloading. It is also faster than sequential-level offloading, as the number of device synchronizations is reduced.
+Group offloading is a middle ground between the two above methods. It works by offloading groups of internal layers (either `torch.nn.ModuleList` or `torch.nn.Sequential`). This method uses lower memory than model-level offloading. It is also faster than sequential-level offloading, as the number of device synchronizations is reduced.
 
-Another supported feature (for CUDA devices with support for asynchronous data transfer streams) is the ability to overlap data transfer and computation to reduce the overall execution time. This is enabled using layer prefetching with CUDA streams, i.e., the layer that is to be executed next starts onloading to the accelerator device while the current layer is being executed - this increases the memory requirements slightly. Note that this implementation also supports leaf-level offloading but can be made much faster when using streams.
+Another supported feature (for CUDA devices with support for asynchronous data transfer streams) is the ability to overlap data transfer and computation to reduce the overall execution time compared to sequential offloading. This is enabled using layer prefetching with CUDA streams, i.e., the layer that is to be executed next starts onloading to the accelerator device while the current layer is being executed - this increases the memory requirements slightly. Note that this implementation also supports leaf-level offloading but can be made much faster when using streams.
 
 To enable group offloading, either call the [`~ModelMixin.enable_group_offloading`] method on the model or use [`~hooks.group_offloading.apply_group_offloading`]:

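To make the prefetching described above concrete, here is a minimal usage sketch of the `apply_group_offloading` entry point referenced in the docs. The model checkpoint and the parameter names (`onload_device`, `offload_device`, `offload_type`, `num_blocks_per_group`) are illustrative assumptions and are not part of this diff.

```python
# Sketch: block-level group offloading (assumed parameter names, not shown in this diff).
import torch
from diffusers import CogVideoXTransformer3DModel
from diffusers.hooks import apply_group_offloading

# Load only the transformer; the checkpoint here is an illustrative choice.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Keep weights on CPU and onload groups of two transformer blocks to the GPU
# just before they are executed.
apply_group_offloading(
    transformer,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="block_level",
    num_blocks_per_group=2,
)
```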
src/diffusers/hooks/group_offloading.py

Lines changed: 5 additions & 5 deletions
@@ -285,14 +285,14 @@ def apply_group_offloading(
     memory, but can be slower due to the excessive number of device synchronizations.
 
     Group offloading is a middle ground between the two methods. It works by offloading groups of internal layers,
-    (either `torch.nn.ModuleList` or `torch.nn.Sequential`). This method is more memory-efficient than module-level
+    (either `torch.nn.ModuleList` or `torch.nn.Sequential`). This method uses lower memory than module-level
     offloading. It is also faster than leaf-level offloading, as the number of device synchronizations is reduced.
 
     Another supported feature (for CUDA devices with support for asynchronous data transfer streams) is the ability to
-    overlap data transfer and computation to reduce the overall execution time. This is enabled using layer prefetching
-    with streams, i.e., the layer that is to be executed next starts onloading to the accelerator device while the
-    current layer is being executed - this increases the memory requirements slightly. Note that this implementation
-    also supports leaf-level offloading but can be made much faster when using streams.
+    overlap data transfer and computation to reduce the overall execution time compared to sequential offloading. This
+    is enabled using layer prefetching with streams, i.e., the layer that is to be executed next starts onloading to
+    the accelerator device while the current layer is being executed - this increases the memory requirements slightly.
+    Note that this implementation also supports leaf-level offloading but can be made much faster when using streams.
 
     Args:
         module (`torch.nn.Module`):

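The docstring's note about leaf-level offloading with streams could look roughly like this in use. The `offload_type="leaf_level"` and `use_stream` arguments are assumed names not visible in this excerpt, so treat this as a sketch rather than the function's confirmed signature.

```python
# Sketch: leaf-level offloading with stream-based prefetching (assumed arguments).
import torch
from diffusers.hooks import apply_group_offloading

# A tiny stand-in module; any torch.nn.Module with ModuleList/Sequential children works.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 64),
)

apply_group_offloading(
    model,
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",  # offload each leaf module (Linear, ReLU, ...) individually
    use_stream=True,            # prefetch the next leaf on a CUDA stream while the current one runs
)
```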