
Commit 5c4eee5

update

1 parent cb004ad

5 files changed: +18 −3 lines changed

.github/workflows/nightly_tests.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -333,7 +333,7 @@ jobs:
           additional_deps: ["peft"]
         - backend: "gguf"
           test_location: "gguf"
-          additional_deps: ["peft"]
+          additional_deps: ["peft", "kernels"]
         - backend: "torchao"
           test_location: "torchao"
           additional_deps: []
```

docs/source/en/quantization/gguf.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -61,7 +61,7 @@ Optimized CUDA kernels can accelerate GGUF quantized model inference by approxim
 pip install -U kernels
 ```
 
-Once installed, GGUF inference automatically uses optimized kernels when available. Note that CUDA kernels may introduce minor numerical differences compared to the original GGUF implementation, potentially causing subtle visual variations in generated images. To disable CUDA kernel usage, set the environment variable `DIFFUSERS_GGUF_CUDA_KERNELS=false`.
+Once installed, set `DIFFUSERS_GGUF_CUDA_KERNELS=true` to use optimized kernels when available. Note that CUDA kernels may introduce minor numerical differences compared to the original GGUF implementation, potentially causing subtle visual variations in generated images. To disable CUDA kernel usage, set the environment variable `DIFFUSERS_GGUF_CUDA_KERNELS=false`.
 
 ## Supported Quantization Types
 
````
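This doc change flips the kernels from opt-out to opt-in. Because the gate (`can_use_cuda_kernels` in the next file) is computed at module import time, the flag must be set before diffusers is imported. A minimal sketch of the opt-in, with the flag parsing mirrored here for illustration (this snippet is not the library's API):

```python
import os

# Opt in before importing diffusers: the gate is evaluated at import time.
os.environ["DIFFUSERS_GGUF_CUDA_KERNELS"] = "true"

# The library then parses the flag like this; "1", "true", and "yes"
# (case-insensitive) enable the kernels, everything else disables them.
enabled = os.environ.get("DIFFUSERS_GGUF_CUDA_KERNELS", "false").lower() in ["1", "true", "yes"]
```

Setting the variable after `import diffusers` would have no effect, since the module-level check has already run by then.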

src/diffusers/quantizers/gguf/utils.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -30,7 +30,7 @@
 
 
 can_use_cuda_kernels = (
-    os.getenv("DIFFUSERS_GGUF_CUDA_KERNELS", "true").lower() in ["1", "true", "yes"]
+    os.getenv("DIFFUSERS_GGUF_CUDA_KERNELS", "false").lower() in ["1", "true", "yes"]
     and torch.cuda.is_available()
     and torch.cuda.get_device_capability()[0] >= 7
)
```
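The net effect of the default flip can be sketched as a pure function (a hypothetical helper for illustration; the real gate is a module-level expression, and `cuda_available`/`major_capability` stand in for the `torch.cuda` calls):

```python
import os

def can_use_cuda_kernels(env=None, cuda_available=False, major_capability=0):
    """Sketch of the gate above: kernels are now opt-in (default "false"),
    and still require CUDA plus compute capability >= 7."""
    env = os.environ if env is None else env
    opted_in = env.get("DIFFUSERS_GGUF_CUDA_KERNELS", "false").lower() in ["1", "true", "yes"]
    return opted_in and cuda_available and major_capability >= 7
```

With the old default of `"true"`, leaving the variable unset enabled the kernels on capable hardware; after this commit the same unset state disables them, which is what the doc wording change above reflects.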

src/diffusers/utils/testing_utils.py

Lines changed: 13 additions & 0 deletions

```diff
@@ -35,6 +35,7 @@
     is_compel_available,
     is_flax_available,
     is_gguf_available,
+    is_kernels_available,
     is_note_seq_available,
     is_onnx_available,
     is_opencv_available,
@@ -629,6 +630,18 @@ def decorator(test_case):
     return decorator
 
 
+def require_kernels_version_greater_or_equal(kernels_version):
+    def decorator(test_case):
+        correct_kernels_version = is_kernels_available() and version.parse(
+            version.parse(importlib.metadata.version("kernels")).base_version
+        ) >= version.parse(kernels_version)
+        return unittest.skipUnless(
+            correct_kernels_version, f"Test requires kernels version greater than or equal to {kernels_version}."
+        )(test_case)
+
+    return decorator
+
+
 def deprecate_after_peft_backend(test_case):
     """
     Decorator marking a test that will be skipped after PEFT backend
```
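The new decorator follows the same pattern as the existing `require_gguf_version_greater_or_equal` helper: skip the test unless the library is installed at the required minimum version. A standalone sketch of that gate logic, with the installed version passed in directly (a hypothetical parameter replacing the `importlib.metadata.version("kernels")` lookup), assuming only `unittest` and the `packaging` library:

```python
import unittest
from packaging import version

def require_version_greater_or_equal(installed_version, required_version):
    """Skip the decorated test unless installed_version >= required_version.
    installed_version is None when the library is not installed."""
    def decorator(test_case):
        ok = installed_version is not None and version.parse(
            # base_version drops local/dev suffixes (e.g. "0.9.0.dev0" -> "0.9.0")
            version.parse(installed_version).base_version
        ) >= version.parse(required_version)
        return unittest.skipUnless(
            ok, f"Test requires version greater than or equal to {required_version}."
        )(test_case)
    return decorator

@require_version_greater_or_equal("0.8.0", "0.9.0")
class TooOld(unittest.TestCase):
    def test_noop(self):
        pass

@require_version_greater_or_equal("0.9.1", "0.9.0")
class NewEnough(unittest.TestCase):
    def test_noop(self):
        pass
```

`unittest.skipUnless` applied to a class marks the whole class as skipped when the condition is false, which is how the decorator stack on `GGUFCudaKernelsTests` below guards every test in the class at once.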

tests/quantization/gguf/test_gguf.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -32,6 +32,7 @@
     require_accelerator,
     require_big_accelerator,
     require_gguf_version_greater_or_equal,
+    require_kernels_version_greater_or_equal,
     require_peft_backend,
     torch_device,
 )
@@ -49,6 +50,7 @@
 @require_accelerate
 @require_accelerator
 @require_gguf_version_greater_or_equal("0.10.0")
+@require_kernels_version_greater_or_equal("0.9.0")
 class GGUFCudaKernelsTests(unittest.TestCase):
     def setUp(self):
         gc.collect()
```
