Commit b2c6455

Merge branch 'main' into rf-inversion
2 parents e61c023 + 63b631f commit b2c6455

73 files changed, +4298 -380 lines changed

.github/workflows/nightly_tests.yml

Lines changed: 58 additions & 0 deletions
@@ -347,6 +347,64 @@ jobs:
           pip install slack_sdk tabulate
           python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
 
+  run_nightly_quantization_tests:
+    name: Torch quantization nightly tests
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        config:
+          - backend: "bitsandbytes"
+            test_location: "bnb"
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "20gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install -U ${{ matrix.config.backend }}
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: ${{ matrix.config.backend }} quantization tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            --make-reports=tests_${{ matrix.config.backend }}_torch_cuda \
+            --report-log=tests_${{ matrix.config.backend }}_torch_cuda.log \
+            tests/quantization/${{ matrix.config.test_location }}
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_${{ matrix.config.backend }}_torch_cuda_stats.txt
+          cat reports/tests_${{ matrix.config.backend }}_torch_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_${{ matrix.config.backend }}_reports
+          path: reports
+      - name: Generate Report and Notify Channel
+        if: always()
+        run: |
+          pip install slack_sdk tabulate
+          python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
+
 # M1 runner currently not well supported
 # TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
 # run_nightly_tests_apple_m1:
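
Reviewer note: the new job derives its report names and test path from the single matrix entry. A minimal Python sketch of that substitution, using only the values visible in the hunk for `.github/workflows/nightly_tests.yml`; the `matrix` list and the `pytest_command` helper are hypothetical names for illustration and are not part of the workflow or of GitHub Actions.

```python
# Hypothetical mirror of the workflow's matrix; values are copied from the hunk.
matrix = [{"backend": "bitsandbytes", "test_location": "bnb"}]


def pytest_command(config):
    """Build the pytest argument list the job would run for one matrix entry."""
    report = f"tests_{config['backend']}_torch_cuda"
    return [
        "python", "-m", "pytest", "-n", "1",
        "--max-worker-restart=0", "--dist=loadfile",
        f"--make-reports={report}",
        f"--report-log={report}.log",
        f"tests/quantization/{config['test_location']}",
    ]


for config in matrix:
    print(" ".join(pytest_command(config)))
```

Adding another quantization backend would then presumably only need one more `backend`/`test_location` pair in the matrix, since every report name is keyed on `backend`.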

docs/source/en/api/pipelines/flux.md

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ image.save("output.png")
 **Note:** `black-forest-labs/Flux.1-Depth-dev` is _not_ a ControlNet model. [`ControlNetModel`] models are a separate component from the UNet/Transformer whose residuals are added to the actual underlying model. Depth Control is an alternate architecture that achieves effectively the same results as a ControlNet model would, by using channel-wise concatenation with input control condition and ensuring the transformer learns structure control by following the condition as closely as possible.
 
 ```python
-# !pip install git+https://github.com/asomoza/image_gen_aux.git
+# !pip install git+https://github.com/huggingface/image_gen_aux
 import torch
 from diffusers import FluxControlPipeline, FluxTransformer2DModel
 from diffusers.utils import load_image

docs/source/en/api/pipelines/pag.md

Lines changed: 4 additions & 0 deletions
@@ -96,6 +96,10 @@ Since RegEx is supported as a way for matching layer identifiers, it is crucial
   - all
   - __call__
 
+## StableDiffusion3PAGImg2ImgPipeline
+[[autodoc]] StableDiffusion3PAGImg2ImgPipeline
+  - all
+  - __call__
 
 ## PixArtSigmaPAGPipeline
 [[autodoc]] PixArtSigmaPAGPipeline

docs/source/en/conceptual/evaluation.md

Lines changed: 12 additions & 6 deletions
@@ -181,7 +181,7 @@ Then we load the [v1-5 checkpoint](https://huggingface.co/stable-diffusion-v1-5/
 
 ```python
 model_ckpt_1_5 = "stable-diffusion-v1-5/stable-diffusion-v1-5"
-sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=weight_dtype).to(device)
+sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=torch.float16).to("cuda")
 
 images_1_5 = sd_pipeline_1_5(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images
 ```
@@ -280,7 +280,7 @@ from diffusers import StableDiffusionInstructPix2PixPipeline
 
 instruct_pix2pix_pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
     "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
-).to(device)
+).to("cuda")
 ```
 
 Now, we perform the edits:
@@ -326,9 +326,9 @@ from transformers import (
 
 clip_id = "openai/clip-vit-large-patch14"
 tokenizer = CLIPTokenizer.from_pretrained(clip_id)
-text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to(device)
+text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to("cuda")
 image_processor = CLIPImageProcessor.from_pretrained(clip_id)
-image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to(device)
+image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to("cuda")
 ```
 
 Notice that we are using a particular CLIP checkpoint, i.e., `openai/clip-vit-large-patch14`. This is because the Stable Diffusion pre-training was performed with this CLIP variant. For more details, refer to the [documentation](https://huggingface.co/docs/transformers/model_doc/clip).
@@ -350,7 +350,7 @@ class DirectionalSimilarity(nn.Module):
 
     def preprocess_image(self, image):
         image = self.image_processor(image, return_tensors="pt")["pixel_values"]
-        return {"pixel_values": image.to(device)}
+        return {"pixel_values": image.to("cuda")}
 
     def tokenize_text(self, text):
         inputs = self.tokenizer(
@@ -360,7 +360,7 @@ class DirectionalSimilarity(nn.Module):
             truncation=True,
             return_tensors="pt",
         )
-        return {"input_ids": inputs.input_ids.to(device)}
+        return {"input_ids": inputs.input_ids.to("cuda")}
 
     def encode_image(self, image):
         preprocessed_image = self.preprocess_image(image)
@@ -459,6 +459,7 @@ with ZipFile(local_filepath, "r") as zipper:
 ```python
 from PIL import Image
 import os
+import numpy as np
 
 dataset_path = "sample-imagenet-images"
 image_paths = sorted([os.path.join(dataset_path, x) for x in os.listdir(dataset_path)])
@@ -477,6 +478,7 @@ Now that the images are loaded, let's apply some lightweight pre-processing on t
 
 ```python
 from torchvision.transforms import functional as F
+import torch
 
 
 def preprocess_image(image):
@@ -498,6 +500,10 @@ dit_pipeline = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=
 dit_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(dit_pipeline.scheduler.config)
 dit_pipeline = dit_pipeline.to("cuda")
 
+seed = 0
+generator = torch.manual_seed(seed)
+
+
 words = [
     "cassette player",
     "chainsaw",

docs/source/en/training/create_dataset.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Create a dataset for training
 
-There are many datasets on the [Hub](https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads) to train a model on, but if you can't find one you're interested in or want to use your own, you can create a dataset with the 🤗 [Datasets](hf.co/docs/datasets) library. The dataset structure depends on the task you want to train your model on. The most basic dataset structure is a directory of images for tasks like unconditional image generation. Another dataset structure may be a directory of images and a text file containing their corresponding text captions for tasks like text-to-image generation.
+There are many datasets on the [Hub](https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads) to train a model on, but if you can't find one you're interested in or want to use your own, you can create a dataset with the 🤗 [Datasets](https://huggingface.co/docs/datasets) library. The dataset structure depends on the task you want to train your model on. The most basic dataset structure is a directory of images for tasks like unconditional image generation. Another dataset structure may be a directory of images and a text file containing their corresponding text captions for tasks like text-to-image generation.
 
 This guide will show you two ways to create a dataset to finetune on:
 
@@ -87,4 +87,4 @@ accelerate launch --mixed_precision="fp16" train_text_to_image.py \
 
 Now that you've created a dataset, you can plug it into the `train_data_dir` (if your dataset is local) or `dataset_name` (if your dataset is on the Hub) arguments of a training script.
 
-For your next steps, feel free to try and use your dataset to train a model for [unconditional generation](unconditional_training) or [text-to-image generation](text2image)!
+For your next steps, feel free to try and use your dataset to train a model for [unconditional generation](unconditional_training) or [text-to-image generation](text2image)!
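
Reviewer note: several doc fixes in this commit replace scheme-less link targets such as `hf.co/docs/datasets` with full `https://` URLs. Markdown renderers generally resolve a target without a scheme as a relative path, so those links would not reach the intended site. A rough heuristic for spotting such targets might look like the sketch below; `find_schemeless_links` and its regex are assumptions made for illustration, not a diffusers utility, and the pattern can misfire on relative paths that happen to look like domains.

```python
import re

# Heuristic: a Markdown link target that begins with a domain-like token
# (e.g. "hf.co/...") but carries no scheme such as "https://".
SCHEMELESS_LINK = re.compile(
    r"\]\(((?:[a-z0-9-]+\.)+[a-z]{2,}/[^)\s]*)\)", re.IGNORECASE
)


def find_schemeless_links(markdown_text):
    """Return link targets that look like domains but lack a URL scheme."""
    return SCHEMELESS_LINK.findall(markdown_text)


# Samples taken from the create_dataset.md hunk.
before = "the 🤗 [Datasets](hf.co/docs/datasets) library"
after = "the 🤗 [Datasets](https://huggingface.co/docs/datasets) library"
relative = "[text-to-image generation](text2image)"

print(find_schemeless_links(before))
print(find_schemeless_links(after))
print(find_schemeless_links(relative))
```

Intentional relative links such as `text2image` are left alone because the pattern requires a dotted host followed by a slash.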

docs/source/en/tutorials/basic_training.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ For convenience, create a `TrainingConfig` class containing the training hyperpa
 
 ...     push_to_hub = True  # whether to upload the saved model to the HF Hub
 ...     hub_model_id = "<your-username>/<my-awesome-model>"  # the name of the repository to create on the HF Hub
-...     hub_private_repo = False
+...     hub_private_repo = None
 ...     overwrite_output_dir = True  # overwrite the old model when re-running the notebook
 ...     seed = 0
 

docs/source/ko/api/pipelines/stable_diffusion/stable_diffusion_xl.md

Lines changed: 2 additions & 2 deletions
@@ -121,7 +121,7 @@ image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inferen
 
 ### 이미지 결과물을 정제하기
 
-[base 모델 체크포인트](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)에서, StableDiffusion-XL 또한 고주파 품질을 향상시키는 이미지를 생성하기 위해 낮은 노이즈 단계 이미지를 제거하는데 특화된 [refiner 체크포인트](huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)를 포함하고 있습니다. 이 refiner 체크포인트는 이미지 품질을 향상시키기 위해 base 체크포인트를 실행한 후 "두 번째 단계" 파이프라인에 사용될 수 있습니다.
+[base 모델 체크포인트](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)에서, StableDiffusion-XL 또한 고주파 품질을 향상시키는 이미지를 생성하기 위해 낮은 노이즈 단계 이미지를 제거하는데 특화된 [refiner 체크포인트](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)를 포함하고 있습니다. 이 refiner 체크포인트는 이미지 품질을 향상시키기 위해 base 체크포인트를 실행한 후 "두 번째 단계" 파이프라인에 사용될 수 있습니다.
 
 refiner를 사용할 때, 쉽게 사용할 수 있습니다
 - 1.) base 모델과 refiner을 사용하는데, 이는 *Denoisers의 앙상블*을 위한 첫 번째 제안된 [eDiff-I](https://research.nvidia.com/labs/dir/eDiff-I/)를 사용하거나
@@ -215,7 +215,7 @@ image = refiner(
 
 #### 2.) 노이즈가 완전히 제거된 기본 이미지에서 이미지 출력을 정제하기
 
-일반적인 [`StableDiffusionImg2ImgPipeline`] 방식에서, 기본 모델에서 생성된 완전히 노이즈가 제거된 이미지는 [refiner checkpoint](huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)를 사용해 더 향상시킬 수 있습니다.
+일반적인 [`StableDiffusionImg2ImgPipeline`] 방식에서, 기본 모델에서 생성된 완전히 노이즈가 제거된 이미지는 [refiner checkpoint](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)를 사용해 더 향상시킬 수 있습니다.
 
 이를 위해, 보통의 "base" text-to-image 파이프라인을 수행 후에 image-to-image 파이프라인으로써 refiner를 실행시킬 수 있습니다. base 모델의 출력을 잠재 공간에 남겨둘 수 있습니다.
 

docs/source/ko/training/create_dataset.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # 학습을 위한 데이터셋 만들기
 
 [Hub](https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads) 에는 모델 교육을 위한 많은 데이터셋이 있지만,
-관심이 있거나 사용하고 싶은 데이터셋을 찾을 수 없는 경우 🤗 [Datasets](hf.co/docs/datasets) 라이브러리를 사용하여 데이터셋을 만들 수 있습니다.
+관심이 있거나 사용하고 싶은 데이터셋을 찾을 수 없는 경우 🤗 [Datasets](https://huggingface.co/docs/datasets) 라이브러리를 사용하여 데이터셋을 만들 수 있습니다.
 데이터셋 구조는 모델을 학습하려는 작업에 따라 달라집니다.
 가장 기본적인 데이터셋 구조는 unconditional 이미지 생성과 같은 작업을 위한 이미지 디렉토리입니다.
 또 다른 데이터셋 구조는 이미지 디렉토리와 text-to-image 생성과 같은 작업에 해당하는 텍스트 캡션이 포함된 텍스트 파일일 수 있습니다.

docs/source/ko/training/lora.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ specific language governing permissions and limitations under the License.
 
 [cloneofsimo](https://github.com/cloneofsimo)는 인기 있는 [lora](https://github.com/cloneofsimo/lora) GitHub 리포지토리에서 Stable Diffusion을 위한 LoRA 학습을 최초로 시도했습니다. 🧨 Diffusers는 [text-to-image 생성](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora)[DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora)을 지원합니다. 이 가이드는 두 가지를 모두 수행하는 방법을 보여줍니다.
 
-모델을 저장하거나 커뮤니티와 공유하려면 Hugging Face 계정에 로그인하세요(아직 계정이 없는 경우 [생성](hf.co/join)하세요):
+모델을 저장하거나 커뮤니티와 공유하려면 Hugging Face 계정에 로그인하세요(아직 계정이 없는 경우 [생성](https://huggingface.co/join)하세요):
 
 ```bash
 huggingface-cli login

docs/source/ko/tutorials/basic_training.md

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ huggingface-cli login
 ...     output_dir = "ddpm-butterflies-128"  # 로컬 및 HF Hub에 저장되는 모델명
 
 ...     push_to_hub = True  # 저장된 모델을 HF Hub에 업로드할지 여부
-...     hub_private_repo = False
+...     hub_private_repo = None
 ...     overwrite_output_dir = True  # 노트북을 다시 실행할 때 이전 모델에 덮어씌울지
 ...     seed = 0
 
