Description
Describe the bug
The problem is described in: https://discuss.huggingface.co/t/dreambooth-not-generating-model-index-json-and-thus-is-not-able-to-make-inference/38640
To reiterate here:
I'm using the runwayml/stable-diffusion-v1-5 model to train DreamBooth with the training script examples/dreambooth/train_dreambooth.py.
After training, the output directory contains the following folders:
checkpoint-xxx (I turned on "--checkpointing_steps")
feature_extractor
logs
safety_checker
scheduler
text_encoder
tokenizer
unet
vae
When I try to run inference, I get the following error:
OSError: Error no file named model_index.json found in directory
The most interesting thing is that the same pipeline was working perfectly several days ago.
I'm using a cloud GPU where I reinstall the environment every time.
So I assume that something has changed and the training-to-inference flow no longer works. I'd really appreciate any help with this issue.
Thanks.
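To narrow down where model_index.json is missing, a directory can be inspected before it is passed to from_pretrained. The following is a minimal sketch using only the standard library; `inspect_model_dir` is a hypothetical helper name (not part of diffusers), and the file name is taken from the error message above.

```python
from pathlib import Path

# File name taken from the OSError message; the pipeline loader looks for it
# at the top level of the directory it is given.
PIPELINE_CONFIG = "model_index.json"


def inspect_model_dir(path):
    """Return (has_model_index, subfolders) for a local model directory.

    has_model_index is True when model_index.json sits directly in `path`.
    subfolders is the sorted list of immediate subdirectory names
    (for example unet, vae, text_encoder).
    """
    root = Path(path)
    if not root.is_dir():
        # Nothing to inspect if the directory does not exist.
        return False, []
    has_index = (root / PIPELINE_CONFIG).is_file()
    subfolders = sorted(p.name for p in root.iterdir() if p.is_dir())
    return has_index, subfolders
```

Calling `inspect_model_dir(OUT_DIR)` on the training output directory, and then on a checkpoint or unet subfolder, shows which of them (if any) contains the file the loader is looking for.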
Reproduction
Env Setup:
!git clone https://github.com/huggingface/diffusers ../diffusers
!pip install -e ../diffusers
!pip install -U -r ../diffusers/examples/dreambooth/requirements.txt
!pip install bitsandbytes xformer
!accelerate config default
Training (with variables set to your local folders)
!accelerate launch scripts/dreambooth/train_dreambooth.py \
--pretrained_model_name_or_path={MODEL_ID} \
--instance_data_dir={INSTANCE_DIR} \
--output_dir={OUT_DIR} \
--instance_prompt=f"rendering of a {TOKEN_NAME} {CLASS_NAME}." \
--resolution={RESOLUTION} \
--train_batch_size=1 \
--use_8bit_adam \
--gradient_accumulation_steps=1 \
--gradient_checkpointing \
--learning_rate=1e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=3000 \
--checkpointing_steps=700 \
--checkpoints_total_limit=3
Inference
# inference from a checkpoint
from diffusers import DiffusionPipeline, UNet2DConditionModel, StableDiffusionPipeline
from transformers import CLIPTextModel
import torch
promptOrigin = ["an office dog."] * num_col
imgCol = []
pretrained_path = f"./models/{model_id}/checkpoint-1000/unet"
unet = UNet2DConditionModel.from_pretrained(pretrained_path)
pipe = DiffusionPipeline.from_pretrained(pretrained_model_name_or_path=pretrained_path, unet=unet).to("cuda")
imgs = pipe(promptOrigin, height=RESOLUTION, width=RESOLUTION).images
imgCol.extend(imgs)
Logs
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[58], line 19
13 unet = UNet2DConditionModel.from_pretrained(pretrained_path)
15 # if you have trained with `--args.train_text_encoder` make sure to also load the text encoder
16 # text_encoder = CLIPTextModel.from_pretrained(f"./models/{model_id}/checkpoint-{i}/text_encoder")
17 # pipe = DiffusionPipeline.from_pretrained(pretrained_model_name_or_path=MODEL_ID, unet=unet, text_encoder=text_encoder)
---> 19 pipe = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path=pretrained_path, unet=unet)
21 pipe.to("cuda")
23 imgs = pipe(promptOrigin, height=RESOLUTION, width=RESOLUTION).images
File /opt/conda/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py:902, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
899 else:
900 cached_folder = pretrained_model_name_or_path
--> 902 config_dict = cls.load_config(cached_folder)
904 # pop out "_ignore_files" as it is only needed for download
905 config_dict.pop("_ignore_files", None)
File /opt/conda/lib/python3.10/site-packages/diffusers/configuration_utils.py:350, in ConfigMixin.load_config(cls, pretrained_model_name_or_path, return_unused_kwargs, return_commit_hash, **kwargs)
348 config_file = os.path.join(pretrained_model_name_or_path, subfolder, cls.config_name)
349 else:
--> 350 raise EnvironmentError(
351 f"Error no file named {cls.config_name} found in directory {pretrained_model_name_or_path}."
352 )
353 else:
354 try:
355 # Load from URL or cache if already cached
OSError: Error no file named model_index.json found in directory ./models/office-dbModel-v1-1/unet.
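For comparison, the commented-out lines in the cell (lines 15 to 17 of the traceback) load the pipeline from MODEL_ID and pass the checkpoint's UNet separately, instead of pointing the pipeline at the unet folder. A minimal sketch of that pattern follows. It is not a confirmed fix for this issue: `checkpoint_unet_dir` and `load_with_checkpoint_unet` are hypothetical helper names, and the sketch assumes each checkpoint-<step> folder contains a unet subfolder, as the path in the reproduction suggests.

```python
from pathlib import Path


def checkpoint_unet_dir(output_dir, step):
    """Build the path <output_dir>/checkpoint-<step>/unet (assumed layout)."""
    return Path(output_dir) / f"checkpoint-{step}" / "unet"


def load_with_checkpoint_unet(base_model_id, output_dir, step):
    """Load the base pipeline and swap in the UNet saved at a checkpoint."""
    # Imported here so the path helper above works without diffusers installed.
    from diffusers import DiffusionPipeline, UNet2DConditionModel

    # The unet folder only holds the UNet weights and config, so it is
    # loaded with the model class rather than the pipeline class.
    unet = UNet2DConditionModel.from_pretrained(
        str(checkpoint_unet_dir(output_dir, step))
    )
    # The pipeline itself is loaded from the base model (hub ID or a local
    # directory that contains model_index.json), with the UNet overridden.
    return DiffusionPipeline.from_pretrained(base_model_id, unet=unet)
```

With the variables from the reproduction, this would be called as `load_with_checkpoint_unet(MODEL_ID, OUT_DIR, 700).to("cuda")`, where 700 is one of the steps at which a checkpoint was written.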
System Info
- diffusers version: 0.17.0.dev0
- Platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.27
- Python version: 3.10.9
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.14.1
- Transformers version: 4.29.2
- Accelerate version: 0.19.0
- xFormers version: not installed
- Using GPU in script?: 4090
- Using distributed or parallel set-up in script?: No