
Dreambooth not generating model_index.json and thus is not able to make inference #3468

@xarthurx

Description


Describe the bug

The problem is described in: https://discuss.huggingface.co/t/dreambooth-not-generating-model-index-json-and-thus-is-not-able-to-make-inference/38640

To reiterate here:

I'm using the runwayml/stable-diffusion-v1-5 model to train a DreamBooth model with the training script examples/dreambooth/train_dreambooth.py.

I get the following folders:

checkpoint-xxx (I turned on "--checkpointing_steps")
feature_extractor
logs
safety_checker
scheduler
text_encoder
tokenizer
unet
vae

When I try to run inference, I get the following error:

OSError: Error no file named model_index.json found in directory

The most interesting thing is that this exact pipeline was working perfectly just a few days ago.
I'm using a cloud GPU where I reinstall the environment every time.

Thus, I assume that something has changed and the training-to-inference flow no longer works. I really appreciate any help with this issue.
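
For reference, DiffusionPipeline.from_pretrained expects a model_index.json at the root of the directory it is pointed at, not just the component subfolders. A quick sanity check (a sketch only; the output path below is a placeholder for the actual --output_dir) to see whether the file was written anywhere:

# sanity-check sketch: look for model_index.json anywhere under the training
# output directory (placeholder path; replace with the actual --output_dir)
import os

OUT_DIR = "./models/office-dbModel-v1-1"
for root, _dirs, files in os.walk(OUT_DIR):
    if "model_index.json" in files:
        print("pipeline config found in:", root)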

Thanks.

Reproduction

Env Setup:

!git clone https://github.com/huggingface/diffusers ../diffusers
!pip install -e ../diffusers
!pip install -U -r ../diffusers/examples/dreambooth/requirements.txt
!pip install bitsandbytes xformers
!accelerate config default
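
Since the environment is reinstalled every run, a quick check of which diffusers build actually gets imported (just standard module attributes, nothing specific to this issue) can rule out a stale or mismatched install:

# sanity check: confirm the editable diffusers clone is the one being imported
import diffusers, torch
print(diffusers.__version__, diffusers.__file__)
print(torch.__version__, torch.cuda.is_available())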

Training (with the variables set to your local folders):

!accelerate launch scripts/dreambooth/train_dreambooth.py \
  --pretrained_model_name_or_path={MODEL_ID}  \
  --instance_data_dir={INSTANCE_DIR} \
  --output_dir={OUT_DIR} \
  --instance_prompt="rendering of a {TOKEN_NAME} {CLASS_NAME}." \
  --resolution={RESOLUTION} \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=3000 \
  --checkpointing_steps=700 \
  --checkpoints_total_limit=3
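
As a possible workaround (a sketch only, not a fix for the training script): if the base model is still reachable, a full pipeline can be rebuilt from it plus the trained unet folder and then saved, which writes the missing model_index.json. MODEL_ID and OUT_DIR below are placeholders matching the variables above; if --train_text_encoder was used, the text_encoder subfolder would need to be swapped in the same way.

# workaround sketch: rebuild a complete pipeline from the base model plus the
# trained UNet, then save it so model_index.json is written next to the components
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # base model used for training
OUT_DIR = "./models/office-dbModel-v1-1"     # placeholder: the --output_dir above

unet = UNet2DConditionModel.from_pretrained(OUT_DIR, subfolder="unet")
pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, unet=unet)
pipe.save_pretrained(f"{OUT_DIR}-full")      # this folder can then be loaded with from_pretrained()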

Inference

# inference from a checkpoint
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, StableDiffusionPipeline
from transformers import CLIPTextModel  # only needed if --train_text_encoder was used

num_col = 4                       # placeholder: number of images per prompt
RESOLUTION = 512                  # placeholder: same resolution as training
model_id = "office-dbModel-v1-1"  # output folder name under ./models/

promptOrigin = ["an office dog."] * num_col
imgCol = []

pretrained_path = f"./models/{model_id}/checkpoint-1000/unet"
unet = UNet2DConditionModel.from_pretrained(pretrained_path)

# this is the call that fails: pretrained_path points at the unet subfolder,
# which contains no model_index.json
pipe = DiffusionPipeline.from_pretrained(pretrained_model_name_or_path=pretrained_path, unet=unet).to("cuda")

imgs = pipe(promptOrigin, height=RESOLUTION, width=RESOLUTION).images
imgCol.extend(imgs)
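
For comparison, the variant hinted at by the commented-out lines in the traceback below (load the base pipeline and only swap in the trained UNet, so no model_index.json is needed inside the checkpoint folder) would look roughly like this; the paths are placeholders matching the snippet above:

# sketch: build the pipeline from the base model and swap in the checkpoint UNet,
# instead of pointing from_pretrained at the unet subfolder itself
from diffusers import StableDiffusionPipeline, UNet2DConditionModel

MODEL_ID = "runwayml/stable-diffusion-v1-5"                      # base model
CKPT_UNET = "./models/office-dbModel-v1-1/checkpoint-1000/unet"  # placeholder checkpoint path

unet = UNet2DConditionModel.from_pretrained(CKPT_UNET)
pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, unet=unet).to("cuda")
imgs = pipe(["an office dog."] * 4, height=512, width=512).images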

Logs

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[58], line 19
     13 unet = UNet2DConditionModel.from_pretrained(pretrained_path)
     15 # if you have trained with `--args.train_text_encoder` make sure to also load the text encoder
     16 # text_encoder = CLIPTextModel.from_pretrained(f"./models/{model_id}/checkpoint-{i}/text_encoder")
     17 # pipe = DiffusionPipeline.from_pretrained(pretrained_model_name_or_path=MODEL_ID, unet=unet, text_encoder=text_encoder)
---> 19 pipe = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path=pretrained_path, unet=unet)
     21 pipe.to("cuda")
     23 imgs = pipe(promptOrigin, height=RESOLUTION, width=RESOLUTION).images

File /opt/conda/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py:902, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    899 else:
    900     cached_folder = pretrained_model_name_or_path
--> 902 config_dict = cls.load_config(cached_folder)
    904 # pop out "_ignore_files" as it is only needed for download
    905 config_dict.pop("_ignore_files", None)

File /opt/conda/lib/python3.10/site-packages/diffusers/configuration_utils.py:350, in ConfigMixin.load_config(cls, pretrained_model_name_or_path, return_unused_kwargs, return_commit_hash, **kwargs)
    348         config_file = os.path.join(pretrained_model_name_or_path, subfolder, cls.config_name)
    349     else:
--> 350         raise EnvironmentError(
    351             f"Error no file named {cls.config_name} found in directory {pretrained_model_name_or_path}."
    352         )
    353 else:
    354     try:
    355         # Load from URL or cache if already cached

OSError: Error no file named model_index.json found in directory ./models/office-dbModel-v1-1/unet.


System Info


  • diffusers version: 0.17.0.dev0
  • Platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.27
  • Python version: 3.10.9
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Huggingface_hub version: 0.14.1
  • Transformers version: 4.29.2
  • Accelerate version: 0.19.0
  • xFormers version: not installed
  • Using GPU in script?: 4090
  • Using distributed or parallel set-up in script?: No
