Closed
Labels
bug: Something isn't working
Description
Describe the bug
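I'm running both the SDXL and the SD 1.5 LoRA training scripts (`train_lora_sdxl.py` and `train_lora_sd_15.py` in the logs), and they both have issues around saving the checkpoint. Different issues, but issues nonetheless.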
Reproduction
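Just train with either script with checkpointing enabled (the 1.5 command in the logs uses `--checkpointing_steps=10`).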
Logs
For sdxl:
Traceback (most recent call last):
  File "/home/ubuntu/notebooks/../scripts/train_lora_sdxl.py", line 2234, in <module>
    main(args)
  File "/home/ubuntu/notebooks/../scripts/train_lora_sdxl.py", line 2007, in main
    accelerator.save_state(save_path)
  File "/home/ubuntu/myenv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2767, in save_state
    hook(self._models, weights, output_dir)
  File "/home/ubuntu/notebooks/../scripts/train_lora_sdxl.py", line 1420, in save_model_hook
    embedding_handler.save_embeddings(f"{output_dir}/{args.output_dir}_emb.safetensors")
  File "/home/ubuntu/notebooks/../scripts/train_lora_sdxl.py", line 787, in save_embeddings
    save_file(tensors, file_path)
  File "/home/ubuntu/myenv/lib/python3.10/site-packages/safetensors/torch.py", line 281, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
safetensors_rust.SafetensorError: Error while serializing: IoError(Os { code: 2, kind: NotFound, message: "No such file or directory" })
For 1.5:
Traceback (most recent call last):
  File "/home/ubuntu/myenv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/myenv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/ubuntu/myenv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/ubuntu/myenv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/myenv/bin/python3', '../scripts/train_lora_sd_15.py', '--pretrained_model_name_or_path=/home/ubuntu/models/sd15/fml', '--dataset_name=/home/ubuntu/nate_pics_768/', '--output_dir=/home/ubuntu/nate_models/lora_fml', "--instance_prompt='a TOK man'", '--gradient_accumulation_steps=1', '--caption_column=prompt', '--train_batch_size=4', '--repeats=1', '--mixed_precision=bf16', '--resolution=768', '--gradient_checkpointing', '--learning_rate=1.0', '--text_encoder_lr=1.0', '--adam_beta2=0.99', '--optimizer=prodigy', '--train_text_encoder_ti', '--train_text_encoder_ti_frac=0.5', '--token_abstraction=TOK', '--snr_gamma=5.0', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--rank=32', '--max_train_steps=2160', '--checkpointing_steps=10', '--seed=0', '--with_prior_preservation', '--prior_generation_precision=bf16', '--sample_batch_size=1', "--class_prompt='a man'", '--class_data_dir=/home/ubuntu/notebooks/man_4321_imgs_768x768px', '--report_to=wandb', '--validation_prompt=a TOK man, professional headshot, hyperdetailed photography, soft light, head and shoulders portrait, cover', '--num_validation_images=3', '--validation_epochs=200']' returned non-zero exit status 1.
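A possible reading of the SDXL traceback, offered as a sketch rather than a confirmed diagnosis: the hook builds the embeddings path as `f"{output_dir}/{args.output_dir}_emb.safetensors"`. If `--output_dir` was an absolute path (as it is in the 1.5 command; the SDXL command is not shown, so this is an assumption), the result nests that absolute path inside the checkpoint folder, and the parent directory would not exist. The snippet below reproduces only the path arithmetic with the standard library. The `checkpoint-10` folder name and both helper functions are hypothetical and are not taken from the training script.

```python
import os
import tempfile


def embedding_path_as_in_traceback(checkpoint_dir: str, output_dir_arg: str) -> str:
    # Mirrors the f-string visible in the traceback's save_model_hook frame.
    return f"{checkpoint_dir}/{output_dir_arg}_emb.safetensors"


def embedding_path_with_basename(checkpoint_dir: str, output_dir_arg: str) -> str:
    # Hypothetical workaround: keep only the last component of --output_dir
    # so the file lands directly inside the checkpoint directory.
    name = os.path.basename(os.path.normpath(output_dir_arg))
    return os.path.join(checkpoint_dir, f"{name}_emb.safetensors")


with tempfile.TemporaryDirectory() as root:
    # Stand-in for an absolute --output_dir (assumption).
    output_dir_arg = os.path.join(root, "lora_out")
    # Stand-in for the checkpoint folder passed to the save hook (assumption).
    checkpoint_dir = os.path.join(output_dir_arg, "checkpoint-10")
    os.makedirs(checkpoint_dir)

    bad = embedding_path_as_in_traceback(checkpoint_dir, output_dir_arg)
    good = embedding_path_with_basename(checkpoint_dir, output_dir_arg)

    # Writing a file fails with "No such file or directory" when its
    # parent directory is missing, so check the parents of both paths.
    bad_parent_exists = os.path.isdir(os.path.dirname(bad))
    good_parent_exists = os.path.isdir(os.path.dirname(good))

    print("as in traceback:", bad, "| parent exists:", bad_parent_exists)
    print("with basename:  ", good, "| parent exists:", good_parent_exists)
```

If this is the cause, passing a relative, single-component `--output_dir` might also sidestep the error, though that is untested here.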
System Info
Linux distro, nothing special.
Who can help?
@sayakpaul